Research papers

Each stage decision is grounded in a concrete published result.
The pipeline was not assembled from generic best practices. The choices below map directly from paper findings into implementation details inside NeroLynx Core.
LLMCloudHunter. Automated Detection Rule Generation from Cyber Threat Intelligence
Automated generation and validation of cloud detection rules from CTI using LLMs
Canonical Detection (Stage 5B)
- `falsepositives` field added to Sigma generation schema, shown to significantly improve analyst confidence and rule deployability
- Temperature calibration validated at 0.3 for detection derivation, balancing structural consistency with creative flexibility, versus a naive default of 0.7
- Task decomposition impact confirmed: removing the extraction stage caused a 37.83% drop in relationship precision, validating the pipeline's stage-separation design
- `reasoning_notes` chain-of-thought field added to canonical intent schema before rule generation
Highlighted metric: 95.90% descriptive metadata accuracy on generated Sigma rules
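The schema constraint above can be sketched as a minimal validation step. The section names and the example rule are illustrative, not the exact NeroLynx Core schema:

```python
# Minimal sketch of a Sigma generation schema check with `falsepositives`
# as a mandatory section. REQUIRED_SECTIONS and the example rule are
# illustrative assumptions, not the actual NeroLynx Core schema.
REQUIRED_SECTIONS = {"title", "logsource", "detection", "falsepositives"}

def validate_sigma_rule(rule: dict) -> list[str]:
    """Return the sorted list of missing mandatory sections (empty means valid)."""
    return sorted(REQUIRED_SECTIONS - rule.keys())

rule = {
    "title": "Suspicious Cloud API Enumeration",
    "logsource": {"product": "aws", "service": "cloudtrail"},
    "detection": {
        "selection": {"eventName": "ListBuckets"},
        "condition": "selection",
    },
    "falsepositives": ["Legitimate inventory or audit tooling"],
}

print(validate_sigma_rule(rule))            # []
print(validate_sigma_rule({"title": "x"}))  # ['detection', 'falsepositives', 'logsource']
```

Making `falsepositives` a hard schema requirement, rather than an optional nicety, is what forces every generated rule to ship with deployability context.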
IntelEX. Automated Sigma Rule Generation and Validation from CTI Reports
End-to-end pipeline for Sigma rule generation with self-correction from CTI
Canonical Detection (Stage 5B) · Self-correction loop
- Sigma self-correction loop where each generated rule is structurally validated; if issues are found, a single targeted correction call is made with the exact error injected back, reducing error cycles from 12 (baseline) to 3
- TTP-as-chain-of-thought where attack steps formatted as structured behavioral triples serve as explicit reasoning scaffolds
- Template-driven YAML generation with mandatory logsource, detection, condition, and falsepositives sections enforced at schema level
Highlighted metric: Precision improved from 0.313 baseline to 0.818 with TTP context injection
CTI-REALM. Agentic Benchmark for Detection Engineering from Threat Intelligence
Benchmark and methodology for agentic CTI-to-detection workflows
Detection derivation · Backend fanout · Prompt temperature
- Multi-field corroboration requirement where both canonical detection and backend fanout prompts require at least two distinct field conditions per rule
- Logsource context guidance where environment hypotheses (Windows, Linux, cloud) are mapped to specific Sigma logsource categories before detection, preventing mis-categorized rules that never match production SIEMs
- Medium reasoning principle applied with focused prompts at constrained temperatures, 0.3 for detection and 0.1 for compilation, rather than high-temperature open generation
Highlighted metric: Multi-field correlation rules scored 33% higher detection quality with significantly reduced false positive rates
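The two-field corroboration requirement can be sketched as a check over a Sigma-style detection block. Field names are illustrative, and modifiers such as `|endswith` are stripped before counting:

```python
# Sketch of the multi-field corroboration gate: a rule must reference at
# least two distinct fields across its selections before it is accepted.
# The detection dicts below are illustrative Sigma-shaped examples.
def distinct_fields(detection: dict) -> set[str]:
    """Collect field names across all selection maps (the condition key is skipped)."""
    fields: set[str] = set()
    for name, selection in detection.items():
        if name == "condition" or not isinstance(selection, dict):
            continue
        fields.update(key.split("|")[0] for key in selection)  # strip Sigma modifiers
    return fields

def meets_corroboration(detection: dict, minimum: int = 2) -> bool:
    return len(distinct_fields(detection)) >= minimum

weak = {"sel": {"Image|endswith": "\\rundll32.exe"}, "condition": "sel"}
strong = {
    "sel": {"Image|endswith": "\\rundll32.exe",
            "CommandLine|contains": "javascript:"},
    "condition": "sel",
}
print(meets_corroboration(weak))    # False
print(meets_corroboration(strong))  # True
```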
TTPXHunter. Extracting TTPs from Threat Intelligence with Contextual Language Models
High-precision TTP extraction from unstructured threat intelligence text
Pre-inference (Stage 3) · Backend fanout (Stage 5B)
- Confidence-filtered observable conditions where any condition with confidence below 0.35 is filtered before backend fanout, preventing low-signal conditions from polluting compiled queries
- Precision-over-recall principle where the pipeline prioritizes specificity and fewer false positives over exhaustive coverage at the rule level
- IoC semantic normalization where the detection prompt treats specific observable values (IPs, hashes, paths) as type indicators for field selection rather than literal match targets, forcing behavioral pattern detection that generalizes across environments
Highlighted metric: 97.38% precision / 96.15% recall on TTP extraction from unstructured CTI
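The 0.35 confidence gate is a simple filter; the condition shape shown here is an assumed structure, not the actual pipeline type:

```python
# Sketch of the confidence gate applied before backend fanout: any
# observable condition below the floor is dropped so low-signal conditions
# never reach compiled queries. Condition dicts are illustrative.
CONFIDENCE_FLOOR = 0.35

def filter_conditions(conditions: list[dict]) -> list[dict]:
    """Keep only conditions at or above the confidence floor."""
    return [c for c in conditions if c.get("confidence", 0.0) >= CONFIDENCE_FLOOR]

conditions = [
    {"field": "process.name", "value": "mshta.exe", "confidence": 0.91},
    {"field": "file.path", "value": "temp", "confidence": 0.22},  # dropped
]
print(filter_conditions(conditions))
```

Missing confidence defaults to 0.0 here, so unscored conditions are excluded rather than silently passed through, matching the precision-over-recall stance.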
POIROT. Aligning Cyber Threat Intelligence with Kernel Audit Records
Graph-based threat hunting aligning CTI with provenance audit logs
Attack model (Stage 3) · Detection derivation (Stage 5B)
- Attack step correlation design where structuring attack steps as ordered behavioral sequences rather than isolated indicators enables downstream detection to reason about causal chains and multi-step patterns
- Provenance and relationship context reinforced with each attack step carrying explicit actor, action, and target fields to enable correlation-based detection across multiple events
Highlighted metric: Graph-based provenance correlation enables multi-hop attack chain reconstruction
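A sketch of attack steps as ordered (actor, action, target) triples, assuming a minimal dataclass rather than the paper's full provenance-graph representation:

```python
# Sketch of attack steps carried as ordered behavioral triples, each with
# the explicit actor/action/target fields used for cross-event correlation.
# The dataclass shape and example values are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class AttackStep:
    order: int
    actor: str
    action: str
    target: str

def causal_chain(steps: list[AttackStep]) -> list[tuple[str, str, str]]:
    """Return (actor, action, target) triples in explicit temporal order."""
    return [(s.actor, s.action, s.target)
            for s in sorted(steps, key=lambda s: s.order)]

steps = [
    AttackStep(2, "powershell.exe", "writes", "C:\\ProgramData\\payload.dll"),
    AttackStep(1, "winword.exe", "spawns", "powershell.exe"),
]
print(causal_chain(steps))
```

Because each step names its actor and target explicitly, a downstream detection stage can join consecutive triples (the target of one step is the actor of the next) instead of matching isolated indicators.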
ThreatRaptor. Automated Cyber Threat Hunting from OSINT
Automated threat hunting pipeline from OSINT to system audit query generation
Attack model (Stage 3) · Telemetry planning (Stage 4)
- Structured threat behavior extraction separated from detection derivation, preventing ungrounded rule generation
- Behavioral sequence ordering where the detection stage receives attack steps in explicit temporal order, enabling sequence-aware detection conditions where supported by the target backend, such as Elastic EQL sequence syntax
Highlighted metric: Structured behavioral extraction prevents ungrounded rule generation from raw text
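Where the backend supports it, temporally ordered steps can compile into Elastic EQL `sequence` syntax. This sketch assumes steps arrive as (event category, where-clause) pairs, which is an illustrative shape, not the pipeline's internal type:

```python
# Sketch of compiling ordered attack steps into Elastic EQL `sequence`
# syntax, one bracketed event query per step. Field names and the join
# key are illustrative.
def to_eql_sequence(steps: list[tuple[str, str]],
                    by: str = "process.entity_id") -> str:
    """Each step is (event_category, where_clause), already in temporal order."""
    body = "\n".join(f"  [{category} where {clause}]" for category, clause in steps)
    return f"sequence by {by}\n{body}"

query = to_eql_sequence([
    ("process", 'process.name == "winword.exe"'),
    ("process", 'process.parent.name == "winword.exe" and '
                'process.name == "powershell.exe"'),
])
print(query)
```

The `sequence by` join key is what turns ordered extraction output into a genuinely sequence-aware detection, rather than two unrelated single-event rules.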
TTPDrill. Automatic and Accurate Extraction of Threat Actions from Unstructured CTI Sources
SVO-based extraction of threat actions from unstructured CTI with high precision
Canonical Detection (Stage 5B) · Attack model (Stage 3)
- Subject-Verb-Object (SVO) attack step framing where attack steps are formatted as Actor, Action, and Target triples, giving the detection model clear behavioral anchors for field selection
- IoC type normalization where specific observable values such as IPs, hashes, file paths, and CVE identifiers are replaced with semantic category labels during detection reasoning, preventing NLP-style misinterpretation
- Composite action detection window with sequential ordering and grouping of related behavioral steps to support detection of composite techniques spanning multiple actions, such as creating a DLL file combined with modifying the registry to achieve DLL injection
Highlighted metric: 94% precision using SVO extraction · 87% recall on composite multi-action techniques
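The IoC type normalization step can be sketched with simplified regex patterns; a real extractor would cover more IoC types and edge cases:

```python
# Sketch of IoC type normalization: concrete observable values are replaced
# with semantic category labels before detection reasoning. Patterns are
# deliberately simplified for illustration (e.g. the IPv4 regex does not
# range-check octets). SHA256 is matched before MD5 so a 64-char hash is
# never mislabeled as two shorter ones.
import re

PATTERNS = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IPV4>"),
    (re.compile(r"\b[a-fA-F0-9]{64}\b"), "<SHA256>"),
    (re.compile(r"\b[a-fA-F0-9]{32}\b"), "<MD5>"),
    (re.compile(r"\bCVE-\d{4}-\d{4,}\b"), "<CVE>"),
]

def normalize_iocs(text: str) -> str:
    for pattern, label in PATTERNS:
        text = pattern.sub(label, text)
    return text

print(normalize_iocs("Beacon to 203.0.113.7 exploiting CVE-2021-44228"))
# -> Beacon to <IPV4> exploiting <CVE>
```

After normalization, the detection model reasons over category labels ("an IPv4 beacon destination") rather than literal values, which is exactly what pushes it toward behavioral field selection instead of brittle literal matching.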