Detection Rule Engine
Scanner's detection engine runs hundreds of detection rules simultaneously against massive log volumes—without repeatedly scanning the same data. Traditional scheduled queries become cost-prohibitive at scale, but Scanner's streaming architecture makes continuous detection both fast and economical.
The Challenge: Scheduled Queries Don't Scale
In traditional data lake environments, detection rules are implemented as scheduled queries that re-scan logs on a fixed interval. This approach has fundamental limitations:
Each detection rule scans the full dataset every time it runs
100 detection rules × 1 query/minute = 100 full scans per minute
1,000 detection rules × 1 query/minute = 1,000 full scans per minute
Most scans find nothing (redundant work)
Query costs scale linearly with the number of rules
Becomes economically infeasible as rule count or log volume increases
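To make the scaling concrete, here is a rough back-of-the-envelope calculation (the numbers are illustrative, not benchmarks):

```python
# Back-of-the-envelope scan volume for scheduled-query detections.
# Illustrative numbers only; adjust for your own rule count and data volume.
rules = 1_000                       # active detection rules
checks_per_rule_per_day = 24 * 60   # each rule re-runs once per minute
dataset_tb = 10                     # every check re-scans the full dataset (TB)

scans_per_day = rules * checks_per_rule_per_day
tb_scanned_per_day = scans_per_day * dataset_tb
print(f"{scans_per_day:,} scans/day, {tb_scanned_per_day:,} TB scanned/day")
# -> 1,440,000 scans/day, 14,400,000 TB scanned/day
```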
Scanner solves this with a streaming detection engine that caches intermediate query results in a specialized data structure, enabling hundreds of detection rules to run efficiently even at massive scale.
How Scanner's Detection Engine Works
Scanner's detection engine operates in two distinct phases:
Phase 1: Indexing Time (Building the Cache)
As logs are indexed, they flow through a Detection Engine state machine that executes all detection rule queries simultaneously.
What happens during indexing:
Concurrent query execution: All detection rule queries (hundreds) run against incoming logs simultaneously
Filter matching: Only logs that match the query filters (the portion before aggregation) are processed further
Partial execution: Matching logs execute up to the first aggregation in the query
Result caching: Aggregation values are stored in a time-based rollup tree data structure
Efficient storage: The rollup tree is written to the Scanner Index files S3 bucket
This happens automatically during the normal indexing process—no additional scanning required.
Learn more: For details on Scanner's indexing pipeline, see How it Works → Stage 2: Indexing
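The sketch below illustrates the indexing-time flow in simplified form. The structures and names are illustrative only, not Scanner's internal API: each rule contributes a filter plus a partial aggregation, and matching events are merged into per-minute rollup leaves.

```python
from collections import defaultdict

# Simplified sketch of the indexing-time phase (illustrative structures, not Scanner's API).
rollup_leaves = defaultdict(lambda: defaultdict(float))  # (rule_id, minute) -> {group: value}

def index_event(event, rules):
    minute = event["timestamp"] // 60 * 60                # 1-minute leaf node boundary
    for rule in rules:                                    # all rules run concurrently in practice
        if not rule["filter"](event):                     # filter portion of the query
            continue
        group, value = rule["partial_agg"](event)         # execute up to the first aggregation
        rollup_leaves[(rule["id"], minute)][group] += value  # cache the partial result

# Example rule: sum bytesTransferred by ARN for S3 GetObject events.
rules = [{
    "id": "s3-exfil",
    "filter": lambda e: e.get("eventName") == "GetObject",
    "partial_agg": lambda e: (e["userIdentity"]["arn"], e["bytesTransferred"]),
}]
index_event({"timestamp": 1_700_000_000, "eventName": "GetObject",
             "userIdentity": {"arn": "arn:aws:iam::123:user/alice"},
             "bytesTransferred": 1024}, rules)
```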
Phase 2: Detection Checking (Querying the Cache)
Separately from indexing, detection workers (continuous ECS tasks) run at intervals configured per detection rule via the run_frequency_s parameter. Each rule can have its own check frequency from a set of allowed values (1 min, 5 min, 15 min, 1 hour, 6 hours, or 1 day).
What happens during detection checking:
Query the rollup tree: For each detection rule, query the rollup tree data structure using the rule's configured time range
Execute remaining query: Complete the query by running the portions after the first aggregation
Evaluate results: Check if the result set is non-empty (i.e., detection triggered)
Send alerts: If triggered, send alerts to configured destinations (Slack, PagerDuty, webhooks, SOAR tools)
Log detection events: Store detection events in the special _detections index for investigation
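A minimal sketch of a single detection check, assuming hypothetical helpers for reading rollup nodes, running the post-aggregation query stages, and dispatching alerts:

```python
import time

# Simplified sketch of one detection check (hypothetical callables, not Scanner's API).
def check_rule(rule, read_rollup_range, send_alert, log_detection_event):
    now = time.time()
    cached = read_rollup_range(rule["id"], now - rule["time_range_s"], now)  # query the cache
    results = rule["post_agg"](cached)       # run the query stages after the first aggregation
    if results:                              # non-empty result set means the detection fired
        send_alert(rule, results)            # Slack, PagerDuty, webhooks, SOAR tools
        log_detection_event(rule, results)   # stored in the _detections index
```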
The Rollup Tree Data Structure
The rollup tree is a hierarchical, time-based data structure that stores aggregated query results at multiple time granularities.
Tree Structure
Each detection query's aggregations are stored in a tree with nodes at different time resolutions:
24 hours (root)
├── 12 hours
│   ├── 6 hours
│   │   ├── 3 hours
│   │   │   ├── 1 hour
│   │   │   │   ├── 30 minutes
│   │   │   │   │   ├── 15 minutes
│   │   │   │   │   │   ├── 5 minutes
│   │   │   │   │   │   │   ├── 1 minute
How it works:
Each node contains aggregated values for its time window
Nodes are mathematical monoids with associative and identity properties
Child nodes can be combined to answer queries for arbitrary time ranges
Only nodes covering the queried time range need to be read
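As a concrete illustration, a node for a group-by-sum aggregation might look like the sketch below (an illustrative structure, not Scanner's actual storage format):

```python
from dataclasses import dataclass, field

# Illustrative node for a group-by-sum aggregation. combine() is associative and an
# empty node acts as the identity element, so any set of nodes that tiles a time
# range can be merged in any order.
@dataclass
class RollupNode:
    start: int                                # window start (unix seconds)
    end: int                                  # window end, exclusive
    sums: dict = field(default_factory=dict)  # e.g. {user_arn: total_bytes}

    def combine(self, other: "RollupNode") -> "RollupNode":
        merged = dict(self.sums)
        for key, value in other.sums.items():
            merged[key] = merged.get(key, 0) + value
        return RollupNode(min(self.start, other.start), max(self.end, other.end), merged)

# Two adjacent 1-minute nodes combine into a single 2-minute result:
a = RollupNode(0, 60, {"alice": 10})
b = RollupNode(60, 120, {"alice": 5, "bob": 7})
print(a.combine(b).sums)  # {'alice': 15, 'bob': 7}
```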
Querying the tree:
The rollup tree is a segment tree data structure that efficiently answers range queries by reading the minimal set of nodes needed to cover the requested time range.
For a 24-hour time range query:
Best case: If the current time aligns with a 24-hour node boundary, Scanner reads just the root node (1 read)
Typical case: If the current time doesn't align, Scanner reads multiple nodes that tile together to cover the full 24 hours. For example, it might read one 12-hour child node, one 6-hour child of the other 12-hour node, and several smaller nodes to cover the exact range.
The segment tree structure guarantees that any arbitrary time range can be decomposed into at most O(log n) disjoint segments, minimizing S3 reads while ensuring complete coverage of the requested time window.
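The sketch below shows the decomposition idea. For brevity it splits windows in half at each level rather than using the exact 24h/12h/.../1m levels shown above, but the principle is identical: return the smallest set of node windows that exactly tile the query range, so only those nodes are read from S3 and combined.

```python
# Sketch of segment-tree range decomposition (illustrative, not Scanner's storage layout).
def covering_nodes(node_lo, node_hi, query_lo, query_hi):
    if query_hi <= node_lo or node_hi <= query_lo:
        return []                                  # node lies outside the query range
    if query_lo <= node_lo and node_hi <= query_hi:
        return [(node_lo, node_hi)]                # node fully covered: read it as-is
    mid = (node_lo + node_hi) // 2                 # partial overlap: split and recurse
    return (covering_nodes(node_lo, mid, query_lo, query_hi) +
            covering_nodes(mid, node_hi, query_lo, query_hi))

# A 24-hour root (in minutes), querying minutes 175-180:
print(covering_nodes(0, 24 * 60, 175, 180))        # [(175, 177), (177, 180)]
```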
Storage Characteristics
Location: Scanner Index files S3 bucket (same location as standard indexes)
Size: The rollup tree only stores aggregations for logs that match detection rule filters. Storage is typically 0.01% - 0.05% of the original log size, depending on filter selectivity.
Example:
10 TB of logs per day
~10% of logs match detection rule filters across all rules
Matching logs after filtering: ~1 TB
Rollup tree cache (compressed aggregations): 1-5 GB per day
Mathematical Properties
Aggregation nodes are designed as monoids, which provide critical properties:
Associativity: (a + b) + c = a + (b + c), so nodes can be combined in any order
Identity element: There's a neutral element that doesn't change results
Composability: Small time windows combine to answer larger time range queries
These properties enable efficient time range queries: Scanner reads only the minimal set of nodes required to cover the requested time range and combines them mathematically.
Constraints and Limits
Node size limit: Each rollup tree node stores up to 64 MB of data; data beyond this limit is truncated.
Rules without aggregations: For detection rules that have no aggregation (e.g., only filters and table), the rollup tree stores up to 64 MB of raw matching log events per time node. Best practice: Always include aggregations (stats, groupbycount, etc.) to minimize storage overhead and stay well under the 64 MB limit.
Retention period: Rollup tree nodes are retained for 1 week (7 days) to support late-arriving data. After 1 week, nodes are automatically deleted from S3.
Allowed parameter values: Both the run_frequency_s and time_range_s parameters must use one of the following values (specified in seconds): 60 (1 min), 300 (5 min), 900 (15 min), 3600 (1 hour), 21600 (6 hours), or 86400 (1 day).
Detection frequency: The run_frequency_s parameter controls how often detection workers check for matches.
Time range limits: The time_range_s parameter controls how far back to look when checking for matches, with a maximum of 24 hours (1 day). Late-arriving data is supported for up to 1 week—logs that arrive late will update the rollup tree nodes and be re-evaluated by detection workers.
Rule count: There is no maximum number of detection rules per tenant. However, compute usage and associated costs increase with the number of active rules.
Lifecycle Management
Rule updates: When a detection rule is modified (query, time range, frequency, etc.), the changes apply to new log data going forward only. Existing rollup tree nodes are not backfilled with the updated logic.
Rule deletion: When a detection rule is deleted, its rollup tree cache nodes are removed from S3. This cleanup happens asynchronously.
Rule disable/enable: When a detection rule is disabled, new rollup tree nodes are not created during indexing, and detection workers stop checking the rule. When re-enabled, the rule begins processing new logs going forward—historical data during the disabled period is not backfilled.
Detection events retention: Detection events stored in the _detections index have unlimited retention by default. Retention can be configured to a shorter period if desired via Scanner's data retention settings.
Late-Arriving and Out-of-Order Data
Scanner handles late-arriving logs gracefully for up to 1 week. When logs arrive out of order or are indexed late:
The detection engine updates the relevant rollup tree nodes with the new data
Updated nodes are flagged for re-evaluation
Detection workers re-check the affected time windows on their next run
If the late data causes a detection rule to trigger, alerts are sent and detection events are created
This ensures detection accuracy even when log delivery is delayed or arrives out of chronological order. Logs arriving more than 1 week late cannot be processed by the detection engine since rollup tree nodes older than 1 week are deleted.
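A sketch of how late arrivals could be folded into the cache, building on the illustrative structures from the indexing sketch above (hypothetical names, not Scanner's internals):

```python
# Sketch of late-arrival handling: the late event is merged into the rollup leaf for
# its original timestamp, and that window is flagged for re-evaluation.
MAX_LATENESS_S = 7 * 24 * 3600                     # nodes older than 1 week are deleted

def handle_late_event(event, rule, rollup_leaves, dirty_windows, now):
    if now - event["timestamp"] > MAX_LATENESS_S:
        return                                     # too late: the node no longer exists
    minute = event["timestamp"] // 60 * 60
    group, value = rule["partial_agg"](event)      # same partial aggregation as at index time
    rollup_leaves[(rule["id"], minute)][group] += value   # update the cached aggregation
    dirty_windows.add((rule["id"], minute))        # detection worker re-checks this window
```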
Example: S3 Data Exfiltration Detection
Here's how a detection rule with aggregations executes through the two-phase system:
name: High Volume S3 Data Exfiltration
query_text: |
%ingest.source_type="aws:cloudtrail"
eventSource="s3.amazonaws.com"
eventName="GetObject"
| stats sum(bytesTransferred) as total_bytes by userIdentity.arn
| eval gbTransferred = total_bytes / (1024 * 1024 * 1024)
| where gbTransferred > 100
time_range_s: 300 # Look back 5 minutes
run_frequency_s: 60 # Check every minute
At indexing time: S3 GetObject events matching the filters have bytesTransferred summed by userIdentity.arn and stored in 1-minute nodes of the rollup tree.
At detection checking time: Every 60 seconds, the detection worker queries the rollup tree for the last 5 minutes, applies the eval and where clauses, and checks if any user exceeds 100 GB. If Alice transferred 150 GB, an alert is sent and a detection event is created.
Query performance: Typically under 100 milliseconds, even over time ranges up to 24 hours, because we're reading cached aggregations instead of scanning raw logs.
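The worked example below traces this rule through both phases with made-up numbers: per-minute sums cached at indexing time, then combined and thresholded at check time.

```python
# Worked example of the two-phase split for this rule (illustrative data only).
GB = 1024 ** 3
minute_nodes = [                                   # cached per-minute partial aggregations
    {"arn:aws:iam::123:user/alice": 40 * GB, "arn:aws:iam::123:user/bob": 1 * GB},
    {"arn:aws:iam::123:user/alice": 60 * GB},
    {"arn:aws:iam::123:user/alice": 50 * GB},
    {},
    {"arn:aws:iam::123:user/bob": 2 * GB},
]

totals = {}
for node in minute_nodes:                          # combine nodes covering the 5-minute range
    for arn, total_bytes in node.items():
        totals[arn] = totals.get(arn, 0) + total_bytes

# | eval gbTransferred = total_bytes / (1024 * 1024 * 1024)   | where gbTransferred > 100
triggered = {arn: b / GB for arn, b in totals.items() if b / GB > 100}
print(triggered)                                   # {'arn:aws:iam::123:user/alice': 150.0}
```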
Example: Detection Rule Without Aggregations
Detection rules can be written without aggregations, though this is less efficient:
name: S3 Bucket Deletion
query_text: |
%ingest.source_type="aws:cloudtrail"
eventName="DeleteBucket"
| table timestamp, requestParameters.bucketName, userIdentity.arn
time_range_s: 60
run_frequency_s: 60
Since this rule has no aggregation (stats, groupbycount, etc.), the rollup tree stores up to 64 MB of raw matching events per time node. For high-volume events, this is less storage-efficient than using aggregations. Best practice: Use aggregations whenever possible to reduce rollup tree storage.
Query Examples and Detection Rules
Detection rules use the same query syntax as ad-hoc searches. Any query can become a detection rule.
For query syntax and examples, see:
Query Syntax - Comprehensive syntax reference
Data Exploration - Extensive query examples and patterns
Aggregation Functions - Function reference (stats, groupbycount, etc.)
For creating and managing detection rules, see:
Detection Rules - User guide for creating rules via UI
Detection Rules as Code - YAML schema and GitHub sync
Performance at Scale
Scanner's streaming detection engine makes it economically feasible to run extensive detection coverage, even at massive log volumes.
Comparison
| | Traditional scheduled queries | Scanner streaming detection |
| --- | --- | --- |
| Detection rules supported | 10-50 (practical limit) | Hundreds |
| Detection latency | Minutes to hours | Minutes (as fast as 1 min) |
| Data scanned per check | Full dataset per rule | Cached aggregations only |
| Cost scaling | Linear with rules × volume | Sublinear (shared indexing cost) |
Example at scale (500 rules, 10 TB/day logs):
Traditional: 500 rules × 60 checks/hour × 24 hours = 720,000 scans per day, each one re-scanning the full 10 TB dataset. Prohibitively expensive.
Scanner: The same 720,000 checks read only cached rollup tree aggregations (at most the ~5 GB daily cache, and usually just the nodes covering each rule's time range) instead of the raw 10 TB, roughly 2,000x less data per check.
Key Benefits
Massive scale: Run hundreds of detection rules without cost explosion
Fast alerts: Detection latency measured in minutes (as fast as 1 minute), not hours or days
Efficient time ranges: Lookback windows up to 24 hours are just as fast as 1-minute windows
No redundant scanning: Each log is processed once during indexing, then queried via cached aggregations
Shared indexing infrastructure: Aggregations happen during normal indexing—no additional log scanning required
Independent detection workers: Detection checking runs on continuous ECS tasks that scale separately from log ingestion
This architectural choice makes comprehensive, real-time detection economically viable at any scale—from gigabytes to petabytes per day.
Related Documentation
How it Works (Overview) - Scanner's four-stage architecture
How Scanner Achieves Fast Queries - Inverted indexes and query performance
Detection Rules (User Guide) - Creating and managing detection rules
Event Sinks - Alert destinations