Detection Engineering

Detection engineering with an AI assistant and Scanner MCP transforms how you write, test, and validate security rules. Instead of manually writing YAML, testing against logs, and iterating through feedback, you collaborate with your AI assistant to rapidly develop detections that work for your environment.

Choosing Your Tool

Two tools are ideal for detection engineering: Claude Code and Cursor. Both let you work with detection rule files, validate them locally, and test them against your Scanner data.

Claude Code

  • Best for: Working in your terminal, scripting rule development, integrating detection engineering into automated pipelines

  • Strengths: Direct access to scanner-cli validate and scanner-cli run-tests commands, clean file management, easy to version control rules

  • Workflow: Write rules → validate → test → iterate in the terminal, with Claude helping at each step

Cursor (IDE)

  • Best for: Visual rule development, side-by-side code/schema reference, building rule libraries

  • Strengths: Full IDE context, syntax highlighting for YAML, easy project management, can reference schema files while editing

  • Workflow: Develop rules in the editor with Claude context, immediate feedback on structure, inline validation

Both tools seamlessly integrate Scanner MCP to test rules against your actual data without leaving your development environment. For help setting up either tool, see Getting Started.


When to Use Detection Engineering with AI

Detection engineering is the process of building permanent, tested detection rules that catch threats in your environment. Use this approach when you need to:

Build new rules from threats you've seen
When your team discovers a real attack or suspicious pattern—either through incident response or threat hunting—capture it as a permanent detection rule. Instead of manually writing YAML and testing, describe the threat to your AI. It writes the rule, tests it against your data, and iterates based on results. What would take hours manually takes minutes with AI assistance.

Migrate rules from your current platform
You have detection rules in Splunk, Datadog, Elastic, or SumoLogic, but want to move them to Scanner. Rather than manually translating each rule's logic, point your AI at the rule and ask it to convert it to Scanner syntax. Your AI translates the logic, tests it against your data to ensure equivalence, and adjusts for your environment.

Reduce false positives in existing rules
A rule catches threats but also fires constantly on legitimate activity. Instead of manually tuning thresholds, exclusions, and filters, collaborate with your AI. Describe the false positive patterns you're seeing. Your AI analyzes the data, suggests refinements, tests them, and iterates until the rule is accurate.

Build test coverage for rules
Detection rules should have unit tests. Your AI writes comprehensive test cases covering true positives, true negatives, and edge cases—then validates that your rules catch what they should and exclude what they shouldn't.

Analyze and improve your detection library
You want to know: Do your detection rules actually catch real threats? Are there attack techniques you're not detecting? Your AI analyzes your entire rule library, maps coverage to MITRE ATT&CK, identifies gaps, and recommends new rules for high-risk areas.

Validate rules before deployment
Before deploying new rules to production, ensure they're syntactically correct, logically sound, and actually work against your data. Your AI runs schema validation and unit tests and checks the rule against live data—catching issues before rules reach production.

How It Works

Detection engineering with an AI assistant follows an interactive workflow:

  1. Describe what to detect — Tell your AI what threat or anomaly you want to catch

  2. AI generates YAML — Your AI writes Scanner detection rules in YAML format

  3. Test against your data — Run the rule against Scanner to see what it catches

  4. Iterate based on results — Adjust the rule to increase true positives or reduce false positives

  5. Validate with unit tests — Your AI writes tests to verify the rule works correctly

  6. Deploy — Save the validated rule to your detection library
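
The local validation and unit-test steps map onto the scanner-cli commands covered later on this page, for example:

scanner-cli validate -r -d ./rules/
scanner-cli run-tests -r -d ./rules/

The first command checks schema and syntax for every rule file in the directory; the second runs the unit tests embedded in each rule.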

See Writing Detection Rules for the complete schema and syntax reference.

The Role of MCP: From Intent to Query

The real power of detection engineering with AI comes from MCP. You describe what you want to catch, and your AI uses Scanner MCP to translate that into production-ready Scanner Query Language — without you needing to learn the syntax.

What You Normally Have to Do

Writing detection queries manually requires:

  • Learning Scanner's query syntax (%ingest.source_type, field names, aggregation functions)

  • Understanding your data model (what fields are available, their names, nesting structure)

  • Translating business logic ("detect failed login attempts") into query operators

  • Trial-and-error testing to get the syntax right

  • Debugging queries that don't work as expected

This takes hours even for experienced teams.

How MCP Changes This

With Scanner MCP, you describe the detection intent in plain language, and your AI:

  1. Translates intent to queries — "Flag users with 10+ failed logins in 1 hour" becomes a Scanner query with proper aggregations and thresholds (sketched after this list)

  2. Queries your actual data — Tests the query against your logs in real-time to validate it works and show results

  3. Iterates automatically — When you say "that's too many false positives," your AI refines the query based on actual data patterns it sees

  4. Learns your environment — The AI understands your field names, data structure, and what "normal" looks like by querying your data
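
For example, the "10+ failed logins in 1 hour" intent from item 1 might come back as something like the sketch below. It assumes CloudTrail console sign-in events; your source types and field names may differ:

%ingest.source_type="aws:cloudtrail"
eventName="ConsoleLogin"
errorMessage="Failed authentication"
| stats count() as failed_logins by userIdentity.userName
| where failed_logins >= 10

The one-hour window would come from the rule's time_range_s setting rather than from the query itself.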

Example: Intent to Query

You describe the detection:

I want to detect when an AWS IAM user performs suspicious privilege escalation activity. Specifically: multiple failed attempts to modify IAM roles within a short time. This suggests permission probing by an attacker.

Without MCP (manual):

  • You spend 30 minutes reading Scanner documentation

  • You learn the syntax: eventName=(PutRolePolicy AttachRolePolicy), errorCode=*, | stats count() by ...

  • You write a query, run it, get results you don't understand

  • You iterate multiple times to fix syntax errors and refine thresholds

With MCP:

  • Your AI immediately generates a query:

    %ingest.source_type="aws:cloudtrail"
    eventSource="iam.amazonaws.com"
    eventName=(PutRolePolicy AttachRolePolicy)
    errorCode=*
    | stats count() as failed_attempts by userIdentity.userName
    | where failed_attempts >= 3
  • Your AI tests it: "Running query against last 30 days... Found 47 matches. Most look like developers testing permissions in non-prod."

  • You ask: "What if we exclude the dev team?"

  • Your AI refines in seconds and retests: "Better. 12 matches now. These look more suspicious."

Time saved: 30 minutes of documentation → 5 minutes of conversation.

What Makes MCP Powerful for Queries

1. Iterative refinement based on data

You: "This rule triggers too much"
MCP: Analyzes the actual alerts it generated, sees patterns in false positives
AI: "I see the pattern — developers in test accounts hit this all the time.
     Let me exclude those accounts" → Updated query, retested automatically

2. Learning from your data model

The AI doesn't need you to provide field names. It queries your data to discover:

  • Which fields exist and what they contain

  • How data is nested (e.g., userIdentity.userName vs user.name)

  • Data patterns that affect thresholds (e.g., "normal users get 0-2 permission errors per day")
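
For instance, the baseline in the last bullet could come from a single aggregation over a day or week of logs (a sketch reusing the CloudTrail fields from the examples above):

%ingest.source_type="aws:cloudtrail"
eventSource="iam.amazonaws.com"
errorCode=*
| stats count() as permission_errors by userIdentity.userName

The per-user counts show what separates everyday noise from probing, which is what informs the threshold the AI proposes.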

3. Bridging platform-specific syntax

Coming from Splunk?

You: "Here's my Splunk rule:
     source="aws:cloudtrail" sourcetype="AWSCloudTrail"
     | stats count by user | where count > 10"

MCP: Translates to Scanner syntax, tests it, shows you what changed and why
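
The translated query might come out along these lines (a sketch; the exact field mapping depends on how your CloudTrail data is ingested):

%ingest.source_type="aws:cloudtrail"
| stats count() as event_count by userIdentity.userName
| where event_count > 10

Alongside the query, the AI explains the mapping it chose, for example that Splunk's user field corresponds to userIdentity.userName in your CloudTrail events.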

4. Building complex logic without syntax complexity

Describe the attack pattern and MCP handles the query mechanics:

You: "Detect when someone gains admin privileges then immediately
      downloads sensitive data — in the same session"

MCP: Writes a multi-stage query with session tracking, privilege checks,
     and data access aggregations. You just see: "Testing... Found 3 matches."

Best Practices for Working with MCP on Queries

  1. Start with clear intent — "Detect data exfiltration" is vague. Better: "Detect when a user downloads >1GB of data from sensitive S3 buckets in <1 hour"

  2. Let MCP see the data first — Before refining, ask: "Show me what a normal S3 download pattern looks like for our environment"

  3. Validate assumptions — When MCP suggests a threshold, ask: "Why 100 requests?" vs. just accepting it

  4. Ask for explanations — "Can you explain what this query does and why each part is necessary?" helps you understand and review

  5. Test against different time windows — Ask MCP to test the same query against:

    • Last 24 hours (recent activity)

    • Last 7 days (weekly patterns, recurring false positives)

    • Last 30 days (realistic volume)

  6. Combine with data exploration — Before asking for a detection rule, explore your data:

    You: "Show me S3 GetObject activity in the last 24 hours.
         What's the distribution of users, buckets, and transfer sizes?"
    MCP: Queries and visualizes patterns
    You: "Now create a detection for outliers from this pattern"

Writing Detection Rules from Scratch

Start by describing the threat you want to detect in plain language.

Your prompt:

I need to detect suspicious IAM activity in AWS. Specifically, I want
to catch when a user performs multiple failed IAM policy modification
attempts. This suggests an attacker is probing for vulnerabilities or
testing permissions they don't have.

Create a Scanner detection rule that:
- Looks for failed IAM modification attempts (PutRolePolicy, AttachRolePolicy, CreateAccessKey)
- Within a 1-hour time window
- Flags users with 3+ failures in that window
- Excludes service accounts (terraform, ci-pipeline, automation accounts)

Your AI generates YAML:

# schema: https://scanner.dev/schema/scanner-detection-rule.v1.json
name: "Excessive IAM Policy Modification Failures"
enabled: true
description: |
  Detects suspicious IAM activity where a user performs multiple failed
  attempts to modify IAM policies or create access keys. This pattern
  suggests permission probing or an attacker testing access controls.

  References:
  * https://attack.mitre.org/techniques/T1098/003/

severity: High
query_text: |
  %ingest.source_type="aws:cloudtrail"
  eventSource="iam.amazonaws.com"
  eventName=(PutRolePolicy AttachRolePolicy CreateAccessKey)
  errorCode=*
  NOT userIdentity.userName=(terraform ci-pipeline automation-user)
  | stats
    count() as failed_attempts,
    min(timestamp) as firstTime,
    max(timestamp) as lastTime
    by userIdentity.userName, eventName
  | where failed_attempts >= 3

tags:
  - techniques.ta0004.privilege_escalation

time_range_s: 3600
run_frequency_s: 300

alert_template:
  info:
    - label: User
      value: "{{userIdentity.userName}}"
    - label: Event Type
      value: "{{eventName}}"
    - label: Failed Attempts
      value: "{{failed_attempts}}"
    - label: Time Window
      value: "{{firstTime}} to {{lastTime}}"

Test it:

Run this rule against the last 7 days of data and show me:
1. How many alerts would this generate?
2. Show me 3-5 example alerts with the user, event types, and error codes
3. Are these plausible attacks or expected operational activity?

Iterate based on feedback:

I see the results. Most look like legitimate permission errors from users trying operations they don't have access to. Can you narrow this to only flag PutRolePolicy and AttachRolePolicy (remove CreateAccessKey) since those are more likely to indicate privilege escalation attempts?

Your AI updates the rule to focus on the most suspicious actions:

query_text: |
  %ingest.source_type="aws:cloudtrail"
  eventSource="iam.amazonaws.com"
  eventName=(PutRolePolicy AttachRolePolicy)
  errorCode=*
  NOT userIdentity.userName=(terraform ci-pipeline automation-user)
  | stats
    count() as failed_attempts,
    min(timestamp) as firstTime,
    max(timestamp) as lastTime
    by userIdentity.userName, eventName
  | where failed_attempts >= 3

Migrating Rules from Other Platforms

Have detection rules in Splunk, Datadog Cloud SIEM, Elastic SIEM, SumoLogic, or another platform? Point your AI at the rule and ask it to translate:

Your workflow:

  1. Export your detection rule from your current platform (Splunk, Datadog, Elastic, SumoLogic)

  2. Save it to a file (e.g., splunk_rule.txt or datadog_rule.json)

  3. Ask your AI to translate it:

I want to migrate a detection rule from [platform name] to Scanner.

I've saved the rule to [filename]. Please:

1. Translate this logic to Scanner query syntax
2. Create a Scanner YAML file with proper schema and required fields
3. Test the query against the last 30 days of historical data
4. Show me sample results - are there any hits? Are they true positives?
5. Iterate on the query based on results (adjust thresholds, filters, or logic as needed)
6. Suggest improvements for my environment
7. Include test cases that validate the rule catches the same activity

Then I'll validate it locally with scanner-cli before deploying.

Your AI will:

  • Translate the rule logic to Scanner query syntax

  • Map fields from the source platform to Scanner's data model

  • Create a Scanner YAML file with proper schema and required fields

  • Use Scanner MCP to test the query against your actual data

  • Iterate on the query based on test results (tuning thresholds, filters, aggregations)

  • Suggest optimizations for Scanner's indexed query engine

  • Provide test cases that validate the migrated rule catches the same activity


Testing and Validation

Writing Unit Tests

After creating a rule, have your AI write test cases to verify it works correctly:

Your prompt:

Write unit tests for this IAM policy modification detection rule.

The rule detects failed IAM policy modifications (PutRolePolicy, AttachRolePolicy)
with 3+ failures in a 1-hour window, excluding known service accounts.

Create test cases following Scanner's format:
1. A true positive - user making multiple failed attempts to modify policies (permission probing)
2. A true negative - legitimate admin activity (successful policy modifications)
3. An edge case - service account excluded from detection

For each test, provide:
- A test name describing the scenario
- Example JSON CloudTrail log events (must include timestamp in RFC 3339 format, %ingest.source_type, eventSource, eventName, errorCode, userIdentity)
- Expected result (true or false)
- Why this test is important

Your AI generates tests:

tests:
  - name: Test alert is triggered when user has 3+ failed policy modifications
    now_timestamp: "2024-08-21T00:10:00.000Z"
    dataset_inline: |
      {"timestamp":"2024-08-21T00:05:00.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"PutRolePolicy","userIdentity":{"userName":"attacker"},"errorCode":"UnauthorizedOperation"}
      {"timestamp":"2024-08-21T00:05:30.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"AttachRolePolicy","userIdentity":{"userName":"attacker"},"errorCode":"AccessDenied"}
      {"timestamp":"2024-08-21T00:06:00.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"PutRolePolicy","userIdentity":{"userName":"attacker"},"errorCode":"InvalidParameterException"}
    expected_detection_result: true

  - name: Test no alert for successful policy modifications
    now_timestamp: "2024-08-21T00:10:00.000Z"
    dataset_inline: |
      {"timestamp":"2024-08-21T00:05:00.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"PutRolePolicy","userIdentity":{"userName":"admin"},"requestParameters":{"roleName":"test-role"}}
      {"timestamp":"2024-08-21T00:05:30.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"AttachRolePolicy","userIdentity":{"userName":"admin"},"requestParameters":{"roleName":"test-role"}}
    expected_detection_result: false

  - name: Test no alert for excluded service accounts
    now_timestamp: "2024-08-21T00:10:00.000Z"
    dataset_inline: |
      {"timestamp":"2024-08-21T00:05:00.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"PutRolePolicy","userIdentity":{"userName":"terraform"},"errorCode":"AccessDenied"}
      {"timestamp":"2024-08-21T00:05:30.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"PutRolePolicy","userIdentity":{"userName":"terraform"},"errorCode":"AccessDenied"}
      {"timestamp":"2024-08-21T00:06:00.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"AttachRolePolicy","userIdentity":{"userName":"terraform"},"errorCode":"AccessDenied"}
    expected_detection_result: false

Test your rules locally with scanner-cli run-tests -f <filename>, or test an entire directory of rules with scanner-cli run-tests -r -d ./rules/

Validating Against Your Data

Test the rule against your actual logs to see real results:

Your prompt:

I've created a detection rule for large S3 GetObject downloads by unusual users
(>= 1GB from buckets with "sensitive" or "customer" in the name).

Run it against my Scanner data for the last 30 days and show me:

1. **Volume**: How many alerts would this generate?
2. **Sample alerts**: Show me 3-5 real examples with userIdentity.userName, bucketName, bytesOut, and timestamp
3. **False positive assessment**: Are these real exfiltration risks or legitimate data analysis jobs?
4. **Tuning recommendations**: What thresholds or filters should I adjust? Should I exclude specific service accounts or departments?
5. **Coverage assessment**: Are there other S3 API calls I should be monitoring (e.g., CopyObject, GetObjectTagging)?

Your AI runs the queries against your actual data and provides analysis with specific recommendations for tuning.
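
A first cut of the query behind that rule might look like this (a sketch built from the S3 fields used elsewhere on this page; the wildcard match on bucket names is an assumption about your naming convention):

%ingest.source_type="aws:cloudtrail"
eventSource="s3.amazonaws.com"
eventName="GetObject"
requestParameters.bucketName=(*sensitive* *customer*)
| stats
  sum(additionalEventData.bytesTransferredOut) as total_bytes
  by userIdentity.userName, requestParameters.bucketName
| eval gb = total_bytes / (1024 * 1024 * 1024)
| where gb >= 1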


Tuning for Your Environment

Detection rules need tuning to reduce false positives. Your AI can help you systematically cut noise while maintaining detection coverage:

Your prompt:

This detection is triggering too often. Can you help me tune it?

The rule is looking for: Failed IAM policy modifications (3+ in 1 hour)

Known false positives:
- Developers testing permissions in non-prod (frequent AccessDenied is expected)
- HR/Finance staff who don't normally use IAM (they get errors, not suspicious)
- New onboarded users learning the system

Actual attack scenario we want to catch:
- Focused permission testing (same event type, same user, rapid-fire attempts)
- Not just random permission errors spread throughout the hour

Suggest:
1. Filters to exclude low-risk user groups
2. Tighten the time window to detect rapid-fire probing
3. Focus on specific event types most likely to indicate attacks

Your AI refines the rule and provides test queries to validate improvements.

Your AI's refined query:

query_text: |
  %ingest.source_type="aws:cloudtrail"
  eventSource="iam.amazonaws.com"
  eventName=(PutRolePolicy AttachRolePolicy)
  errorCode=*
  NOT userIdentity.userName=(terraform ci-pipeline automation-user developer-test)
  | stats
    count() as failed_attempts,
    min(timestamp) as firstTime,
    max(timestamp) as lastTime
    by userIdentity.userName
  | where failed_attempts >= 3

Key changes:

  • Added developer-test accounts to the service account exclusions

  • Kept the focus on policy modification attempts (PutRolePolicy, AttachRolePolicy), the event types most likely to indicate privilege escalation

  • Aggregated by user only (not by event type) to catch persistent probing spread across different API calls

Test queries to validate:

# Test 1: Verify we still catch privilege escalation probing
I want to test if this rule catches a user who:
- Makes 5 failed PutRolePolicy attempts
- Spread across different roles
- Within 1 hour
Expected: Should trigger

# Test 2: Verify we excluded expected failures
# Run the query for the last 7 days filtered to developer-test users
# Verify it's NOT generating alerts anymore
Expected: No alerts for excluded service accounts
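
Test 2 can also be captured as a permanent unit test in the rule file, in the same format as the earlier tests (the timestamps and dataset here are illustrative):

tests:
  - name: Test no alert for developer-test accounts
    now_timestamp: "2024-08-21T00:10:00.000Z"
    dataset_inline: |
      {"timestamp":"2024-08-21T00:05:00.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"PutRolePolicy","userIdentity":{"userName":"developer-test"},"errorCode":"AccessDenied"}
      {"timestamp":"2024-08-21T00:05:30.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"PutRolePolicy","userIdentity":{"userName":"developer-test"},"errorCode":"AccessDenied"}
      {"timestamp":"2024-08-21T00:06:00.000Z","%ingest.source_type":"aws:cloudtrail","eventSource":"iam.amazonaws.com","eventName":"PutRolePolicy","userIdentity":{"userName":"developer-test"},"errorCode":"AccessDenied"}
    expected_detection_result: false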

Example: Complete Detection Engineering Workflow

Here's how a detection might evolve through collaboration with your AI:

Step 1: Initial Idea

I want to detect data exfiltration through S3. Specifically, large downloads from sensitive buckets by users who don't normally access them.

Step 2: AI Writes Initial Rule

Your AI generates a rule (with proper Scanner YAML format):

# schema: https://scanner.dev/schema/scanner-detection-rule.v1.json
name: "S3 Data Exfiltration - Large Downloads"
enabled: true
description: |
  Detects potential data exfiltration through S3 GetObject calls. Flags users
  performing unusually large data transfers from S3 in a single session.

severity: High
query_text: |
  %ingest.source_type="aws:cloudtrail"
  eventSource="s3.amazonaws.com"
  eventName="GetObject"
  | stats
    count() as access_count,
    sum(additionalEventData.bytesTransferredOut) as total_bytes
    by userIdentity.userName, requestParameters.bucketName
  | eval gb = total_bytes / (1024 * 1024 * 1024)
  | where access_count > 100 and gb > 0.5

tags:
  - techniques.ta0010.exfiltration

time_range_s: 3600
run_frequency_s: 300

Key detection logic:

  • GetObject calls to S3 buckets

  • Flags users with 100+ requests AND 0.5GB+ transfer in a 1-hour window

  • High request volume + large transfer indicates automated data collection

  • Aggregates by user and bucket to identify scope of access

Step 3: Test Results

Running the rule against last 30 days shows:
- 47 alerts
- 42 are false positives (legitimate data analysis jobs)
- 5 are real concerns

Step 4: Refine and Retest

You tell your AI:

The job-analysis-prod service account is generating most false positives.
Also, the finance team regularly exports large CSV files.
Can you refine to exclude these while keeping real threats?

Your AI updates the rule to:

  • Exclude known service accounts

  • Exclude users in specific departments with legitimate export jobs

  • Lower the threshold for sensitive buckets (customers, PII)

  • Keep higher threshold for general buckets
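
For example, the exclusion part of that update might look like this (a sketch: job-analysis-prod comes from the scenario above, finance-export is a hypothetical placeholder for the finance team's export account, and the per-bucket threshold split is omitted):

query_text: |
  %ingest.source_type="aws:cloudtrail"
  eventSource="s3.amazonaws.com"
  eventName="GetObject"
  NOT userIdentity.userName=(job-analysis-prod finance-export)
  | stats
    count() as access_count,
    sum(additionalEventData.bytesTransferredOut) as total_bytes
    by userIdentity.userName, requestParameters.bucketName
  | eval gb = total_bytes / (1024 * 1024 * 1024)
  | where access_count > 100 and gb > 0.5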

Step 5: Retest

New results:
- 8 alerts
- 1 confirmed data exfiltration (user accessing customer data, downloading to external IP)
- 7 legitimate but unusual (flagged for investigation, none malicious)

Step 6: Validate and Deploy

This looks good. Generate unit tests following Scanner's format and a runbook for my team.

Your AI provides:

  • Unit tests in Scanner YAML format (with tests:, dataset_inline:, expected_detection_result:)

  • Test scenarios covering: true positives, true negatives, edge cases, realistic attacks

  • A runbook explaining the threat, investigation steps, and remediation

Then you:

  1. Add the tests to the detection rule YAML

  2. Validate all rules: scanner-cli validate -r -d ./rules/ (validates all YAML files in directory recursively)

  3. Test all rules: scanner-cli run-tests -r -d ./rules/ (runs all tests recursively)

  4. Fix any failures and re-run validation/tests

  5. Deploy the validated rules to your Scanner instance


MITRE Coverage Analysis

Once you have a library of detection rules, analyze your coverage against the MITRE ATT&CK framework to identify gaps and prioritize new detections.

How It Works

  1. Export your rules — Save all detection rules from your GitHub repository or local directory

  2. Explore your data — Your AI uses Scanner MCP to:

    • Query your indexes to understand what log sources you have available

    • Discover the data types available (e.g., aws:cloudtrail, kubernetes:audit, proxy, dns, etc.)

    • Map available fields and their values to understand your data model

    • This gives the AI a concrete understanding of what's detectable in your environment

  3. Ask your AI to analyze coverage — Point your AI at the rules directory and ask it to:

    • Extract all MITRE tactics and techniques from rule tags

    • Count rules per tactic/technique

    • Identify which are well-covered and which have gaps

    • Map your coverage to your threat model and available data sources

  4. Test detection ideas — For gaps identified, your AI uses Scanner MCP to:

    • Run exploratory queries to test whether recommended detection patterns exist in your data

    • Validate that you have the necessary log sources and fields to detect proposed techniques

    • Confirm detection ideas are practical before recommending them

  5. Get recommendations — Your AI suggests new rules tailored to your environment, informed by what data you actually have

Example Workflow

Your prompt:

I want to analyze my detection rule coverage and identify gaps where I should build new detections.

I have a folder of detection rules at ./detection-rules/ organized by threat type.
All rules use MITRE tags like "techniques.ta0004.privilege_escalation".

Please:
1. Read all YAML files in the directory and extract all MITRE tactics and techniques from tags

2. First, use Scanner MCP to explore my available data:
   - Query Scanner to discover what log sources I have (use get_scanner_context)
   - Show me the available indexes and their source types (e.g., aws:cloudtrail, kubernetes:audit, proxy logs, DNS, etc.)
   - For each data source, highlight key fields available for detection

3. Create a coverage matrix showing:
   - Tactics covered: list of all unique tactics found
   - Techniques per tactic: how many techniques we detect for each tactic
   - Coverage gaps: which major tactics have 0 or 1 rules
   - Data availability: for each gap, note whether we have the necessary log sources to detect it

4. For each coverage gap, use Scanner MCP to test detection feasibility:
   - Run exploratory queries to check if the detection pattern exists in our data
   - Example: If recommending a Kerberoasting detection, query for failed Kerberos TGT requests
   - Validate we have the fields needed for the detection to work

5. Recommend 5 new detection rules for our top 3 coverage gaps, but ONLY recommend detections where:
   - We have the necessary log sources and fields
   - Exploratory queries show the attack pattern exists in our data
   - The detection is practical to implement

6. For each recommendation, explain:
   - Which tactic/technique it covers
   - What attack pattern it detects
   - Why it's important for our environment
   - What data sources it will use (be specific about field names we discovered)
   - Example query logic for how to detect it

Your AI analyzes and provides:

First, the AI explores your data:

Step 1: Exploring Available Data Sources
=========================================

Available Log Sources in Your Scanner Instance:
- aws:cloudtrail (CloudTrail logs)
  * Key fields: eventName, userIdentity.userName, sourceIPAddress, errorCode
  * Time range: Last 90 days

- aws:authentication (Auth/IAM events)
  * Key fields: user, auth_method, success, failure_reason
  * Time range: Last 30 days

- kubernetes:audit (Kubernetes API audit logs)
  * Key fields: verb, user, namespace, sourceIPs, objectRef
  * Time range: Last 7 days

- proxy (Proxy/firewall logs)
  * Key fields: source_ip, destination_ip, destination_port, domain, action
  * Time range: Last 90 days

- dns (DNS query logs)
  * Key fields: query_name, source_ip, response_code
  * Time range: Last 30 days

Note: You do NOT have:
- Endpoint/EDR logs (no Windows Event Logs, Sysmon, or osquery data)
- Email gateway logs
- VPC Flow Logs (only proxy logs as network source)

Then analyzes coverage with data awareness:

MITRE Coverage Analysis Results
================================

Total Rules: 47
Tactics Covered: 10 out of 14

Coverage by Tactic:
- TA0001 (Initial Access): 3 rules ✅
- TA0002 (Execution): 5 rules ✅
- TA0003 (Persistence): 2 rules ⚠️
- TA0004 (Privilege Escalation): 8 rules ✅
- TA0005 (Defense Evasion): 1 rule ❌
- TA0006 (Credential Access): 4 rules ✅
- TA0007 (Discovery): 6 rules ✅
- TA0008 (Lateral Movement): 3 rules ✅
- TA0009 (Collection): 2 rules ⚠️
- TA0010 (Exfiltration): 5 rules ✅
- TA0011 (Command & Control): 0 rules ❌
- TA0040 (Impact): 1 rule ❌
- TA0042 (Resource Development): 0 rules ❌
- TA0043 (Reconnaissance): 3 rules ✅

Top Coverage Gaps:
1. Defense Evasion (TA0005): 1 rule - HIGH RISK
2. Command & Control (TA0011): 0 rules - CRITICAL
3. Resource Development (TA0042): 0 rules - MEDIUM

Step 2: Testing Detection Feasibility with Scanner MCP
======================================================

For Defense Evasion gap (Log Tampering - T1070):
- Test query: Searched CloudTrail for DeleteTrail, StopLogging, PutEventSelectors
- Result: ✅ Found 12 instances in last 90 days (legitimate admin activity + 2 suspicious events)
- Data available: eventName, userIdentity, sourceIPAddress all present
- Feasibility: PRACTICAL - Can build reliable detection with thresholds

For Command & Control gap (DNS queries - T1071):
- Test query: Searched DNS logs for queries to known C2 domains and DGA patterns
- Result: ✅ Found DNS logs available with query_name and response_code
- Data available: dns logs with full query details
- Feasibility: PRACTICAL - Can detect unusual DNS patterns

For Resource Development gap (Certificate/Domain registration - T1583):
- Test query: Searched for domain registration events or certificate issuance
- Result: ❌ No data available - no domain registrar or certificate logs in your instance
- Data available: NONE
- Feasibility: NOT PRACTICAL - No data source to detect this; requires 3rd party feeds

Recommended Rules to Fill Gaps (Only for Practical Detections):

1. **Log Tampering Detection** (Defense Evasion - T1070)
   - Detects deletion/modification of CloudTrail logs or AWS API calls to disable logging
   - Data sources: aws:cloudtrail with eventName field
   - Example logic: Look for DeleteTrail, StopLogging, PutEventSelectors, UpdateTrail with LoggingEnabled=false
   - Critical for detecting attackers covering their tracks
   - Recommended priority: HIGH

2. **Suspicious DNS Queries - DGA Detection** (Command & Control - T1071)
   - Detects DNS queries with characteristics of Domain Generation Algorithms (DGA)
   - Data sources: dns logs with query_name and response_code
   - Example logic: Look for high volume of failed DNS queries to similar-looking domains (entropy analysis)
   - Indicates C2 communication channels or malware beaconing
   - Recommended priority: HIGH

3. **Privilege Escalation via Kubernetes RBAC** (Privilege Escalation - T1548)
   - Detects unauthorized attempts to grant elevated roles in Kubernetes
   - Data sources: kubernetes:audit with verb=create/patch objectRef.kind=ClusterRoleBinding
   - Example logic: Flag when non-admin users create ClusterRoleBindings or modify existing ones
   - Validates against your available Kubernetes logs (7-day retention)
   - Recommended priority: MEDIUM
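
The first recommendation might translate into a query along these lines (a sketch using the CloudTrail event names listed above; the UpdateTrail parameter check and admin exclusions are left out for brevity):

%ingest.source_type="aws:cloudtrail"
eventSource="cloudtrail.amazonaws.com"
eventName=(DeleteTrail StopLogging PutEventSelectors UpdateTrail)
| stats
  count() as tampering_events,
  min(timestamp) as firstTime,
  max(timestamp) as lastTime
  by userIdentity.userName, eventName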

Tips for Coverage Analysis

  • Run monthly — Analyze coverage regularly (at least monthly) to stay aligned with threat landscape changes and ensure your detection library keeps pace with new attacks

  • Align with threat model — Prioritize gaps in tactics relevant to your organization (e.g., if you're a cloud provider, focus on cloud-specific techniques)

  • Share with stakeholders — Use the coverage matrix to show security posture to leadership

  • Track improvements — Keep baseline coverage metrics and track how they improve over time

References

For detailed information on Scanner detection rule syntax and schema, see Writing Detection Rules.

For examples of production-ready detection rules, see the Scanner out-of-the-box detections repository. These rules demonstrate:

  • Effective query patterns for Scanner query syntax

  • Proper YAML structure with required fields

  • MITRE ATT&CK tag mapping

  • Real-world tuning for scale

You can use these as templates and adapt them with your AI's help for your specific needs.

Where to Go From Here

Detection engineering is one piece of a comprehensive security strategy. Use it alongside:

Interactive Investigations — Investigate security incidents in real-time. When you discover a new attack pattern during an investigation, document it and use detection engineering to build a permanent rule for it.

Autonomous Workflows — Scale detection and response. Once you have a library of detection rules, use autonomous workflows to hunt continuously, triage alerts, and respond automatically to threats 24/7.

The most effective teams use all three: interactive investigations for critical incidents and complex analysis, detection engineering to prevent the same attacks from recurring, and autonomous workflows to scale detection and response across your environment.
