By developing HomoGT, a homogeneous graph transformer architecture for security alert triage, I demonstrate how architectural simplification and peak-activation (max) pooling can reduce analyst fatigue and support reproducible threat detection.
Core Artifact: Homogeneous Graph Transformer for SOAR Alert Triage
Existing graph-based cybersecurity anomaly detection solutions target network records, application logs, and system calls, but none target the SOAR alerts presented to Security Operations Centers (SOCs). Models repurposed from those other modalities perform poorly on alert data because they cannot exploit the relational topology that differentiates coordinated, multi-step attacks from isolated background noise.
HomoGT addresses this gap by modeling alerts, IP addresses, processes, and domains as nodes of a single homogeneous graph. All node types share one feature projection, which combines IP behavioral statistics and rule identity encodings as the primary input signals. The model is trained with per-alert supervision and aggregates each temporal window with global max-pooling, so that a window is scored by its strongest malicious signal rather than by an average over mostly benign alerts.
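The idea of a single homogeneous graph can be illustrated with a minimal sketch. This is not the HomoGT implementation: the feature sizes (`N_IP_STATS`, `N_RULES`), the `HomogeneousGraph` class, and the entity keys are all hypothetical, chosen only to show how alerts, IPs, and processes can share one node list and one feature layout, with unused slots left at zero.

```python
from dataclasses import dataclass, field

# Hypothetical sizes: 3 IP behavioral statistics and 4 detection rules.
N_IP_STATS = 3
N_RULES = 4
FEATURE_DIM = N_IP_STATS + N_RULES


@dataclass
class HomogeneousGraph:
    """All entity types live in one node list with one shared feature layout."""
    features: list = field(default_factory=list)  # one FEATURE_DIM vector per node
    edges: list = field(default_factory=list)     # undirected (node_id, node_id) pairs
    index: dict = field(default_factory=dict)     # entity key -> node id

    def add_node(self, key, ip_stats=None, rule_id=None):
        """Insert an entity once; feature slots that do not apply stay at zero."""
        if key in self.index:
            return self.index[key]
        vec = [0.0] * FEATURE_DIM
        if ip_stats is not None:
            if len(ip_stats) != N_IP_STATS:
                raise ValueError("expected %d IP statistics" % N_IP_STATS)
            vec[:N_IP_STATS] = [float(x) for x in ip_stats]
        if rule_id is not None:
            vec[N_IP_STATS + rule_id] = 1.0  # one-hot rule identity
        self.index[key] = len(self.features)
        self.features.append(vec)
        return self.index[key]

    def add_edge(self, a, b):
        """Connect two nodes; no edge types are stored, by design."""
        self.edges.append((a, b))


# Build a tiny graph: one alert linked to the IP and process it mentions.
g = HomogeneousGraph()
alert = g.add_node("alert:1", rule_id=2)
ip = g.add_node("ip:10.0.0.5", ip_stats=[12, 0.4, 3])
proc = g.add_node("process:powershell.exe")
g.add_edge(alert, ip)
g.add_edge(alert, proc)
```

Because every node carries the same `FEATURE_DIM`-wide vector, a single set of model weights can in principle be applied to all of them, which is the simplification the homogeneous design relies on.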
Empirical Evaluation & Baselines
The framework was evaluated against standard industry architectures, sequence-based log models, and frontier foundation models. HomoGT achieves the highest F1, AUROC, and AUPRC of all evaluated models. E-GraphSAGE and LogBERT reach higher precision, and Gemini 2.5 Flash marginally higher recall, but each does so at a severe cost to the complementary metric, whereas HomoGT keeps precision (0.75) and recall (0.94) in balance.
Performance Comparison of Anomaly Detection Frameworks
| Model | Precision | Recall | F1 | AUROC | AUPRC |
| --- | --- | --- | --- | --- | --- |
| HomoGT (proposed) | 0.75 | 0.94 | 0.83 | 0.977 | 0.897 |
| LogBERT | 0.80 | 0.42 | 0.55 | 0.721 | 0.438 |
| MAGIC | 0.28 | 0.47 | 0.35 | 0.834 | 0.139 |
| E-GraphSAGE | 1.00 | 0.11 | 0.19 | 0.656 | 0.185 |
| Gemini 2.5 Flash | 0.03 | 0.95 | 0.06 | 0.672 | 0.031 |
| Llama 3.1 | 0.02 | 0.47 | 0.04 | 0.533 | 0.022 |
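As a sanity check on the table above, F1 can be recomputed from the reported precision and recall as their harmonic mean. The sketch below copies the rounded table values; the function and variable names are mine, and because the inputs are already rounded to two decimals, the recomputed values can differ from the reported column by about 0.01.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the comparison table.
reported = {
    "HomoGT": (0.75, 0.94, 0.83),
    "LogBERT": (0.80, 0.42, 0.55),
    "MAGIC": (0.28, 0.47, 0.35),
    "E-GraphSAGE": (1.00, 0.11, 0.19),
    "Gemini 2.5 Flash": (0.03, 0.95, 0.06),
    "Llama 3.1": (0.02, 0.47, 0.04),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}

for name, (_, _, f1) in reported.items():
    print(f"{name:18s} reported={f1:.2f} recomputed={recomputed[name]:.3f}")
```

The check also makes the precision/recall trade-off concrete: E-GraphSAGE's perfect precision and Gemini 2.5 Flash's high recall both collapse to a low F1 because the harmonic mean is dominated by the weaker of the two metrics.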
Cross-Architecture Performance Comparison
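To evaluate viability for practical deployment, HomoGT was benchmarked against alternative graph structures, sequence-based log architectures, and frontier large language models on training time, inference latency, and throughput. HomoGT trains in about 6 minutes and processes a single alert window in 4.26 ms on consumer hardware. E-GraphSAGE is faster at inference (0.31 ms) and LogBERT trains slightly faster (~4 minutes), but neither comes close to HomoGT's detection quality, so HomoGT offers the best combination of accuracy, training cost, and real-time throughput among the models tested.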
| Model / Framework | Training Time | Inference Latency | Throughput |
| --- | --- | --- | --- |
| HomoGT (proposed) | ~6 minutes | 4.26 ms | 235 win/sec |
| E-GraphSAGE | ~45 minutes (CPU) | 0.31 ms | 3,264 win/sec |
| MAGIC | ~10 minutes (CPU) | 6.59 ms | 152 win/sec |
| LogBERT | ~4 minutes | 10.97 ms | 91 win/sec |
| Llama 3.1 | N/A | 11,690 ms | 0.09 win/sec |
| Gemini 2.5 Flash | N/A | API-Bound | Rate-limited |
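The throughput column is consistent with taking the reciprocal of the per-window latency, assuming windows are processed one at a time. That assumption is mine; the source does not state how throughput was measured. A short check using the table values:

```python
def windows_per_second(latency_ms):
    """Throughput implied by sequential processing at a fixed per-window latency."""
    return 1000.0 / latency_ms


# (latency in ms, reported throughput in windows/sec), copied from the table.
reported = {
    "HomoGT": (4.26, 235),
    "E-GraphSAGE": (0.31, 3264),
    "MAGIC": (6.59, 152),
    "LogBERT": (10.97, 91),
    "Llama 3.1": (11690, 0.09),
}

# Relative gap between the implied and the reported throughput for each model.
gaps = {}
for name, (latency_ms, throughput) in reported.items():
    implied = windows_per_second(latency_ms)
    gaps[name] = abs(implied - throughput) / throughput
    print(f"{name:12s} implied={implied:10.2f} reported={throughput} gap={gaps[name]:.1%}")
```

All rows agree to within a few percent, which is the size of discrepancy one would expect if latencies were rounded before being reported.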
Theoretical Contributions
- Topology Simplification: Shows empirically that collapsing heterogeneous entity types into a single shared projection space retains the relational signal needed for detection while reducing model complexity and training cost.
- Signal Preservation: Provides evidence that max-pooling keeps a small number of high-scoring alerts from being diluted by the many benign alerts in a noisy enterprise window, a known weakness of average pooling.
- Resource Efficiency: Demonstrates that a small, task-specific graph model can process alert windows at 235 windows per second on consumer-grade hardware, which suggests that heavy foundation models (Llama 3.1 needed 11,690 ms per window) are unnecessary for localized triage workflows.
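The signal-preservation point can be made concrete with a toy calculation. The scores below are hypothetical (0.05 for a benign alert, 0.95 for a malicious one) and the helper names are mine; the sketch only illustrates why a mean over a window shrinks toward the benign level as noise grows while a max does not.

```python
from statistics import mean


def mean_pool(scores):
    """Average pooling: every alert in the window contributes equally."""
    return mean(scores)


def max_pool(scores):
    """Max pooling: the window is scored by its single strongest alert."""
    return max(scores)


def window_with_one_attack(n_benign, benign=0.05, malicious=0.95):
    """A window of n_benign low-scoring alerts plus one high-scoring alert."""
    return [benign] * n_benign + [malicious]


# Grow the amount of background noise around a single malicious alert.
for n_benign in (9, 99, 999):
    window = window_with_one_attack(n_benign)
    print(
        f"alerts={len(window):5d} "
        f"mean={mean_pool(window):.4f} "
        f"max={max_pool(window):.2f}"
    )
```

With a fixed decision threshold, the mean-pooled score eventually drops below it as benign traffic increases, whereas the max-pooled score is unchanged. The trade-off is that max pooling is also more sensitive to a single false-positive alert.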
Academic Publication & Expansion Roadmap
This research is being extended from its initial conference paper into a journal submission, focusing on two tracks:
- Sub-Layer LLM Embedding: I am investigating the integration of HomoGT as a dedicated structural layer within Large Language Model pipelines, combining relational graph topology with semantic text processing.
- RISC-V Vector Parallelism: To counter the rising operational costs of enterprise AI GPU infrastructure, the next phase maps high-throughput graph training pipelines directly onto the RISC-V Vector (RVV) extension on bare metal. This hardware-software co-design aims to show that dense, multi-dimensional security workloads can be computed efficiently on an open instruction set architecture.
