Research papers

Each stage decision is grounded in a concrete published result.
The pipeline was not assembled from generic best practices. The choices below map directly from paper findings into implementation details inside NeroLynx Core.
LLMCloudHunter. Automated Detection Rule Generation from Cyber Threat Intelligence
Automated generation and validation of cloud detection rules from CTI using LLMs
Canonical Detection (Stage 5B)
- `falsepositives` field added to Sigma generation schema, shown to significantly improve analyst confidence and rule deployability
- Temperature calibration validated at 0.3 for detection derivation, balancing structural consistency with creative flexibility, versus a naive default of 0.7
- Task decomposition impact confirmed: removing the extraction stage caused a 37.83% drop in relationship precision, validating the pipeline's stage-separation design
- `reasoning_notes` chain-of-thought field added to canonical intent schema before rule generation
Highlighted metric: 95.90% descriptive metadata accuracy on generated Sigma rules
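The schema constraint above can be sketched as a minimal validation step. The section names and the example rule are illustrative, not the exact NeroLynx Core schema:

```python
# Minimal sketch of a Sigma generation schema check with `falsepositives`
# as a mandatory section. REQUIRED_SECTIONS and the example rule are
# illustrative assumptions, not the actual NeroLynx Core schema.
REQUIRED_SECTIONS = {"title", "logsource", "detection", "falsepositives"}

def validate_sigma_rule(rule: dict) -> list[str]:
    """Return the sorted list of missing mandatory sections (empty means valid)."""
    return sorted(REQUIRED_SECTIONS - rule.keys())

rule = {
    "title": "Suspicious Cloud API Enumeration",
    "logsource": {"product": "aws", "service": "cloudtrail"},
    "detection": {
        "selection": {"eventName": "ListBuckets"},
        "condition": "selection",
    },
    "falsepositives": ["Legitimate inventory or audit tooling"],
}

print(validate_sigma_rule(rule))            # []
print(validate_sigma_rule({"title": "x"}))  # ['detection', 'falsepositives', 'logsource']
```

Making `falsepositives` a hard schema requirement, rather than an optional nicety, is what forces every generated rule to ship with deployability context.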
IntelEX. Automated Sigma Rule Generation and Validation from CTI Reports
End-to-end pipeline for Sigma rule generation with self-correction from CTI
Canonical Detection (Stage 5B) · Self-correction loop
- Sigma self-correction loop where each generated rule is structurally validated; if issues are found, a single targeted correction call is made with the exact error injected back, reducing error cycles from 12 (baseline) to 3
- TTP-as-chain-of-thought where attack steps formatted as structured behavioral triples serve as explicit reasoning scaffolds
- Template-driven YAML generation with mandatory logsource, detection, condition, and falsepositives sections enforced at schema level
Highlighted metric: Precision improved from 0.313 baseline to 0.818 with TTP context injection
CTI-REALM. Agentic Benchmark for Detection Engineering from Threat Intelligence
Benchmark and methodology for agentic CTI-to-detection workflows
Detection derivation · Backend fanout · Prompt temperature
- Multi-field corroboration requirement where both canonical detection and backend fanout prompts require at least two distinct field conditions per rule
- Logsource context guidance where environment hypotheses (Windows, Linux, cloud) are mapped to specific Sigma logsource categories before detection, preventing mis-categorized rules that never match production SIEMs
- Medium reasoning principle applied with focused prompts at constrained temperatures, 0.3 for detection and 0.1 for compilation, rather than high-temperature open generation
Highlighted metric: Multi-field correlation rules scored 33% higher detection quality with significantly reduced false positive rates
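The two-field corroboration requirement can be sketched as a check over a Sigma-style detection block. Field names are illustrative, and modifiers such as `|endswith` are stripped before counting:

```python
# Sketch of the multi-field corroboration gate: a rule must reference at
# least two distinct fields across its selections before it is accepted.
# The detection dicts below are illustrative Sigma-shaped examples.
def distinct_fields(detection: dict) -> set[str]:
    """Collect field names across all selection maps (the condition key is skipped)."""
    fields: set[str] = set()
    for name, selection in detection.items():
        if name == "condition" or not isinstance(selection, dict):
            continue
        fields.update(key.split("|")[0] for key in selection)  # strip Sigma modifiers
    return fields

def meets_corroboration(detection: dict, minimum: int = 2) -> bool:
    return len(distinct_fields(detection)) >= minimum

weak = {"sel": {"Image|endswith": "\\rundll32.exe"}, "condition": "sel"}
strong = {
    "sel": {"Image|endswith": "\\rundll32.exe",
            "CommandLine|contains": "javascript:"},
    "condition": "sel",
}
print(meets_corroboration(weak))    # False
print(meets_corroboration(strong))  # True
```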
TTPXHunter. Extracting TTPs from Threat Intelligence with Contextual Language Models
High-precision TTP extraction from unstructured threat intelligence text
Pre-inference (Stage 3) · Backend fanout (Stage 5B)
- Confidence-filtered observable conditions where any condition with confidence below 0.35 is filtered before backend fanout, preventing low-signal conditions from polluting compiled queries
- Precision-over-recall principle where the pipeline prioritizes specificity and fewer false positives over exhaustive coverage at the rule level
- IoC semantic normalization where the detection prompt treats specific observable values (IPs, hashes, paths) as type indicators for field selection rather than literal match targets, forcing behavioral pattern detection that generalizes across environments
Highlighted metric: 97.38% precision / 96.15% recall on TTP extraction from unstructured CTI
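The 0.35 confidence gate is a simple filter; the condition shape shown here is an assumed structure, not the actual pipeline type:

```python
# Sketch of the confidence gate applied before backend fanout: any
# observable condition below the floor is dropped so low-signal conditions
# never reach compiled queries. Condition dicts are illustrative.
CONFIDENCE_FLOOR = 0.35

def filter_conditions(conditions: list[dict]) -> list[dict]:
    """Keep only conditions at or above the confidence floor."""
    return [c for c in conditions if c.get("confidence", 0.0) >= CONFIDENCE_FLOOR]

conditions = [
    {"field": "process.name", "value": "mshta.exe", "confidence": 0.91},
    {"field": "file.path", "value": "temp", "confidence": 0.22},  # dropped
]
print(filter_conditions(conditions))
```

Missing confidence defaults to 0.0 here, so unscored conditions are excluded rather than silently passed through, matching the precision-over-recall stance.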
POIROT. Aligning Cyber Threat Intelligence with Kernel Audit Records
Graph-based threat hunting aligning CTI with provenance audit logs
Attack model (Stage 3) · Detection derivation (Stage 5B)
- Attack step correlation design where structuring attack steps as ordered behavioral sequences rather than isolated indicators enables downstream detection to reason about causal chains and multi-step patterns
- Provenance and relationship context reinforced with each attack step carrying explicit actor, action, and target fields to enable correlation-based detection across multiple events
Highlighted metric: Graph-based provenance correlation enables multi-hop attack chain reconstruction
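A sketch of attack steps as ordered (actor, action, target) triples, assuming a minimal dataclass rather than the paper's full provenance-graph representation:

```python
# Sketch of attack steps carried as ordered behavioral triples, each with
# the explicit actor/action/target fields used for cross-event correlation.
# The dataclass shape and example values are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class AttackStep:
    order: int
    actor: str
    action: str
    target: str

def causal_chain(steps: list[AttackStep]) -> list[tuple[str, str, str]]:
    """Return (actor, action, target) triples in explicit temporal order."""
    return [(s.actor, s.action, s.target)
            for s in sorted(steps, key=lambda s: s.order)]

steps = [
    AttackStep(2, "powershell.exe", "writes", "C:\\ProgramData\\payload.dll"),
    AttackStep(1, "winword.exe", "spawns", "powershell.exe"),
]
print(causal_chain(steps))
```

Because each step names its actor and target explicitly, a downstream detection stage can join consecutive triples (the target of one step is the actor of the next) instead of matching isolated indicators.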
ThreatRaptor. Automated Cyber Threat Hunting from OSINT
Automated threat hunting pipeline from OSINT to system audit query generation
Attack model (Stage 3) · Telemetry planning (Stage 4)
- Structured threat behavior extraction separated from detection derivation, preventing ungrounded rule generation
- Behavioral sequence ordering where the detection stage receives attack steps in explicit temporal order, enabling sequence-aware detection conditions where supported by the target backend, such as Elastic EQL sequence syntax
Highlighted metric: Structured behavioral extraction prevents ungrounded rule generation from raw text
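Where the backend supports it, temporally ordered steps can compile into Elastic EQL `sequence` syntax. This sketch assumes steps arrive as (event category, where-clause) pairs, which is an illustrative shape, not the pipeline's internal type:

```python
# Sketch of compiling ordered attack steps into Elastic EQL `sequence`
# syntax, one bracketed event query per step. Field names and the join
# key are illustrative.
def to_eql_sequence(steps: list[tuple[str, str]],
                    by: str = "process.entity_id") -> str:
    """Each step is (event_category, where_clause), already in temporal order."""
    body = "\n".join(f"  [{category} where {clause}]" for category, clause in steps)
    return f"sequence by {by}\n{body}"

query = to_eql_sequence([
    ("process", 'process.name == "winword.exe"'),
    ("process", 'process.parent.name == "winword.exe" and '
                'process.name == "powershell.exe"'),
])
print(query)
```

The `sequence by` join key is what turns ordered extraction output into a genuinely sequence-aware detection, rather than two unrelated single-event rules.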
TTPDrill. Automatic and Accurate Extraction of Threat Actions from Unstructured CTI Sources
SVO-based extraction of threat actions from unstructured CTI with high precision
Canonical Detection (Stage 5B) · Attack model (Stage 3)
- Subject-Verb-Object (SVO) attack step framing where attack steps are formatted as Actor, Action, and Target triples, giving the detection model clear behavioral anchors for field selection
- IoC type normalization where specific observable values such as IPs, hashes, file paths, and CVE identifiers are replaced with semantic category labels during detection reasoning, preventing NLP-style misinterpretation
- Composite action detection window with sequential ordering and grouping of related behavioral steps to support detection of composite techniques spanning multiple actions, such as creating a DLL file combined with modifying the registry to achieve DLL injection
Highlighted metric: 94% precision using SVO extraction · 87% recall on composite multi-action techniques
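The IoC type normalization step can be sketched with simplified regex patterns; a real extractor would cover more IoC types and edge cases:

```python
# Sketch of IoC type normalization: concrete observable values are replaced
# with semantic category labels before detection reasoning. Patterns are
# deliberately simplified for illustration (e.g. the IPv4 regex does not
# range-check octets). SHA256 is matched before MD5 so a 64-char hash is
# never mislabeled as two shorter ones.
import re

PATTERNS = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IPV4>"),
    (re.compile(r"\b[a-fA-F0-9]{64}\b"), "<SHA256>"),
    (re.compile(r"\b[a-fA-F0-9]{32}\b"), "<MD5>"),
    (re.compile(r"\bCVE-\d{4}-\d{4,}\b"), "<CVE>"),
]

def normalize_iocs(text: str) -> str:
    for pattern, label in PATTERNS:
        text = pattern.sub(label, text)
    return text

print(normalize_iocs("Beacon to 203.0.113.7 exploiting CVE-2021-44228"))
# -> Beacon to <IPV4> exploiting <CVE>
```

After normalization, the detection model reasons over category labels ("an IPv4 beacon destination") rather than literal values, which is exactly what pushes it toward behavioral field selection instead of brittle literal matching.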