Scanner MCP Tools Reference

Scanner provides five MCP tools that enable AI agents and interactive clients to query security data, explore your environment, and execute threat hunting operations.


Overview

Tool                | Purpose                                                                       | Usage
--------------------+-------------------------------------------------------------------------------+---------------------------------------
get_scanner_context | Load a condensed Scanner query reference, available indexes, and source types | Called first by AI agents
get_docs            | Retrieve detailed documentation for a specific topic                          | On-demand reference lookup
get_top_columns     | Discover the most frequent column names for one or more indexes               | Field discovery before writing queries
execute_query       | Run ad-hoc queries against Scanner logs                                       | Core query execution
fetch_query_results | Retrieve specific fields from cached query results                            | Result refinement


Design: Efficient Context Management

Scanner MCP tools are designed to work together efficiently, avoiding context bloat that would consume excessive tokens and reduce AI agent capability.

Querying the Right Data

The Challenge: Before an AI agent can write a useful query, it needs to know what data exists, what fields are available, and how the query language works. Loading all of this context upfront—full documentation, every index's schema, all field statistics—consumes thousands of tokens before a single query is run, leaving less room for actual investigation.

The Solution: A layered, on-demand discovery pattern:

  1. get_scanner_context returns a condensed starting point—a query documentation table of contents, a compact list of available indexes, and source types. This gives the AI enough orientation to begin without loading full documentation or field metadata.

  2. get_docs provides detailed documentation for a specific topic (syntax, aggregation, examples, or best_practices_and_mistakes) only when the AI actually needs it—for example, looking up aggregation syntax before writing a stats query.

  3. get_top_columns returns the most frequently occurring column names for one or more indexes. The AI calls this to discover available fields for the specific indexes it plans to query, rather than receiving column statistics for every index upfront.

This means the AI learns just enough to start, then deepens its knowledge on demand as the investigation requires.
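
The three steps above can be sketched as a small driver. This is a minimal illustration, not Scanner's client code: call_tool is a hypothetical stand-in for whatever MCP client you use, the empty argument list for get_scanner_context is an assumption, and the context_token key in its result is inferred from the tool description rather than from a documented response schema.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: any callable that takes a tool name and an
# arguments dict and returns the tool's parsed result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def discover(
    call_tool: ToolCaller,
    indices: List[str],
    docs_section: Optional[str] = None,
) -> Dict[str, Any]:
    """Layered discovery: orient first, then load details only on demand."""
    # Step 1: condensed orientation (assumed to take no arguments).
    context = call_tool("get_scanner_context", {})
    token = context["context_token"]  # assumed key name in the result

    # Step 2: detailed docs only if a specific topic is actually needed.
    docs = None
    if docs_section is not None:
        docs = call_tool("get_docs", {"section": docs_section})

    # Step 3: columns only for the indexes this investigation will query.
    columns = call_tool(
        "get_top_columns", {"context_token": token, "indices": indices}
    )
    return {"context_token": token, "docs": docs, "columns": columns}


# Offline stand-in so the sketch runs without a Scanner instance.
calls: List[str] = []


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(name)
    if name == "get_scanner_context":
        return {"context_token": "example-token"}
    return {"tool": name, "arguments": arguments}


result = discover(fake_call_tool, ["my-cloudtrail-index"], docs_section="aggregation")
print(calls)
```

Passing docs_section=None skips the get_docs call entirely, which mirrors the "only when needed" intent of step 2.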

Efficient Result Retrieval

The Challenge: Security queries often return thousands of rows. Loading all raw results into the AI's context would:

  • Consume massive amounts of tokens

  • Reduce the AI's ability to reason about findings

  • Make investigations slower and more expensive

The Solution: A two-stage result retrieval pattern:

  1. execute_query returns a summary of results, not raw data:

    • Field names and top values for each field

    • Row count and data patterns

    • Statistical summaries (counts, distributions, time ranges)

    • A result_handle for later access

  2. fetch_query_results allows selective retrieval:

    • AI examines the summary from execute_query

    • AI decides which fields and rows are actually relevant

    • AI uses fetch_query_results to pull only those fields and rows into context

    • Irrelevant data stays out of the context entirely

Example: A query returns 5,000 S3 access events. Rather than loading all 5,000 rows with every field, which would flood the context, the AI:

  • Sees the summary showing event types, usernames, buckets involved

  • Identifies which 50 rows are suspicious

  • Fetches only those 50 rows with only relevant fields (eventTime, userName, bucketName)

  • Loads 1% of the rows (50 of 5,000) and three fields per row, saving roughly 90% or more of the tokens while retaining the key context

This design enables AI agents to handle large datasets without context limitations becoming a bottleneck.
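
The summarize-then-select idea can be illustrated locally, without Scanner. The sketch below is not how Scanner implements its cache or summaries; the function names, the summary shape, and the sample events are all made up to show why a summary plus a narrow selection is much smaller than the raw rows.

```python
from collections import Counter
from typing import Any, Callable, Dict, List


def summarize(rows: List[Dict[str, Any]], top_n: int = 3) -> Dict[str, Any]:
    """Stage 1: describe the result set without returning raw rows."""
    counters: Dict[str, Counter] = {}
    for row in rows:
        for field, value in row.items():
            counters.setdefault(field, Counter())[value] += 1
    return {
        "row_count": len(rows),
        "top_values": {f: c.most_common(top_n) for f, c in counters.items()},
    }


def select(
    rows: List[Dict[str, Any]],
    fields: List[str],
    wanted: Callable[[Dict[str, Any]], bool],
) -> List[Dict[str, Any]]:
    """Stage 2: keep only matching rows, and only the requested fields."""
    return [{f: row.get(f) for f in fields} for row in rows if wanted(row)]


# Made-up data: 5,000 events, 50 of them from one unusual user.
rows = [
    {
        "eventTime": f"event-{i}",
        "userName": "mallory" if i % 100 == 0 else "alice",
        "bucketName": "logs",
        "userAgent": "aws-cli",
    }
    for i in range(5000)
]

summary = summarize(rows)
suspicious = select(
    rows,
    ["eventTime", "userName", "bucketName"],
    lambda row: row["userName"] == "mallory",
)
print(summary["row_count"], len(suspicious))
```

In this toy run the summary already reveals the rare userName value, so only the 50 matching rows and three fields ever need to be loaded in full.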


1. get_scanner_context

Purpose: Load a condensed Scanner query reference, available indexes, and source types.

Key Points:

  • Called first — AI agents are instructed to call this before executing any queries

  • Returns a context_token — Required input for execute_query and get_top_columns

  • Discovers your data — Shows what indexes and source types are available in your Scanner instance

  • Condensed reference — Returns a table of contents for query documentation, not the full docs (use get_docs for details)

What It Provides:

  • A condensed Scanner query language reference (table of contents)

  • Available indexes with names and descriptions

  • Discovered source types (e.g., aws:cloudtrail, kubernetes:audit, proxy, dns)

  • A context_token for use with other tools

When to Use:

  • Start of any investigation or query session

  • When exploring what data sources are available

Example Usage:
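
A minimal sketch of the call expressed as an MCP tools/call request (JSON-RPC 2.0). It assumes get_scanner_context takes no arguments, which this reference does not state explicitly; the helper name tools_call and the request id are illustrative.

```python
import json
from typing import Any, Dict


def tools_call(request_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP `tools/call` request in JSON-RPC 2.0 form."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Assumption: the tool needs no arguments.
request = tools_call(1, "get_scanner_context", {})
print(json.dumps(request, indent=2))
```

The context_token in the response is the value that execute_query and get_top_columns later require.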


2. get_docs

Purpose: Retrieve detailed documentation for a specific Scanner query language topic.

Required Parameters:

  • section — One of: syntax, aggregation, examples, best_practices_and_mistakes

Available Sections:

  • syntax — Query syntax rules, operators, and grammar

  • aggregation — Aggregation functions and usage

  • examples — Example queries for common use cases

  • best_practices_and_mistakes — Best practices and common errors to avoid

Key Features:

  • On-demand loading — Fetch only the documentation you need, when you need it

  • No context_token required — Can be called at any time

  • Reduces token usage — Avoids loading thousands of tokens of documentation upfront

When to Use:

  • Before writing a query, to look up syntax or aggregation functions

  • When a query fails, to review best practices and common mistakes

  • To find example queries for a specific use case

Example Usage:
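
A small sketch of building the arguments for this tool, checking the section name against the four documented values before sending anything. The helper name get_docs_arguments is hypothetical; only the section parameter and its allowed values come from this reference.

```python
from typing import Dict

# The four section names listed in this reference.
VALID_SECTIONS = ("syntax", "aggregation", "examples", "best_practices_and_mistakes")


def get_docs_arguments(section: str) -> Dict[str, str]:
    """Return the arguments dict for get_docs, rejecting unknown sections."""
    if section not in VALID_SECTIONS:
        raise ValueError(
            f"unknown section {section!r}; expected one of {', '.join(VALID_SECTIONS)}"
        )
    return {"section": section}


# Look up aggregation syntax before writing a stats-style query.
arguments = get_docs_arguments("aggregation")
print(arguments)
```

Validating locally avoids spending a tool call, and the tokens of its error response, on a misspelled section name.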


3. get_top_columns

Purpose: Discover the most frequently occurring column names for one or more indexes.

Required Parameters:

  • context_token — Obtained from get_scanner_context

  • indices — List of index names to query (e.g., ["my-cloudtrail-index", "_detections"])

Key Features:

  • Multi-index support — Query columns for multiple indexes in a single call

  • Frequency-sorted — Returns columns sorted by how often they appear in your data

  • Compact format — Results use a tuple format (["column_name", count]) to minimize token usage

  • Time-scoped — Reflects column usage from the last 7 days

When to Use:

  • Before writing a query, to discover what fields are available in an index

  • When exploring an unfamiliar data source

  • To understand the schema of your log data

Example Usage:
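
A sketch of the request arguments and of reading the compact pair format. The helper names are hypothetical, and the sample pairs are invented to match the documented ["column_name", count] shape; the exact response envelope around those pairs is not specified here, so the parser works on a bare list of pairs.

```python
from typing import Any, Dict, List, Sequence


def get_top_columns_arguments(context_token: str, indices: Sequence[str]) -> Dict[str, Any]:
    """Return the arguments dict for get_top_columns."""
    if not indices:
        raise ValueError("indices must name at least one index")
    return {"context_token": context_token, "indices": list(indices)}


def column_names(pairs: List[List[Any]], min_count: int = 1) -> List[str]:
    """Extract names from ["column_name", count] pairs, keeping input order."""
    return [name for name, count in pairs if count >= min_count]


arguments = get_top_columns_arguments(
    "example-token", ["my-cloudtrail-index", "_detections"]
)

# Made-up sample in the documented pair format; real names and counts will differ.
sample = [["eventName", 9120], ["userIdentity.arn", 8800], ["errorCode", 37]]
print(column_names(sample, min_count=100))
```

Filtering by count is optional; it is shown only as one way to ignore rarely populated columns when choosing fields for a query.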


4. execute_query

Purpose: Execute an ad-hoc query against Scanner to search your security logs.

Required Parameters:

  • context_token — Obtained from get_scanner_context (proves you've loaded syntax rules)

  • query — Your Scanner Query Language query

  • start_time — Query start time (ISO 8601 format, inclusive)

  • end_time — Query end time (ISO 8601 format, exclusive)

Optional Parameters:

  • max_rows — Maximum rows to return (default: 1000, max: 10000)

  • max_bytes — Memory limit in bytes (default: 128MB)

Key Features:

  • Blocking execution — Waits for results (supports configurable timeouts)

  • Result caching — Results are cached for subsequent operations

  • Field-level summaries — Returns summaries to reduce token usage

  • Time-bounded queries — Scope queries to specific time windows

What It Returns:

  • Query execution status

  • Result summary with key findings

  • A result_handle for fetching detailed results with fetch_query_results

  • Row count and execution time

  • Sample of returned fields

When to Use:

  • Execute threat hunting queries

  • Search for specific events or patterns

  • Investigate alerts or incidents

  • Test detection rule logic

  • Explore data patterns

Example Usage:
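
A sketch of assembling the arguments with the documented bounds checked up front. The helper name is hypothetical and the query string is a placeholder, not valid Scanner query syntax. Timestamps are produced with Python's datetime.isoformat(), which yields ISO 8601 with a +00:00 offset; whether Scanner prefers that form or a trailing Z is not stated in this reference.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

MAX_ROWS_LIMIT = 10_000  # documented maximum for max_rows


def execute_query_arguments(
    context_token: str,
    query: str,
    start: datetime,
    end: datetime,
    max_rows: int = 1000,
) -> Dict[str, Any]:
    """Return the arguments dict for execute_query."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid ambiguous windows")
    if start >= end:
        raise ValueError("start_time (inclusive) must be before end_time (exclusive)")
    if not 1 <= max_rows <= MAX_ROWS_LIMIT:
        raise ValueError(f"max_rows must be between 1 and {MAX_ROWS_LIMIT}")
    return {
        "context_token": context_token,
        "query": query,
        "start_time": start.isoformat(),
        "end_time": end.isoformat(),
        "max_rows": max_rows,
    }


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
arguments = execute_query_arguments(
    "example-token",
    "<your Scanner query>",  # placeholder, see get_docs for real syntax
    start,
    start + timedelta(hours=24),
)
print(arguments["start_time"], arguments["end_time"])
```

Because start_time is inclusive and end_time is exclusive, consecutive 24-hour windows built this way neither overlap nor leave gaps.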


5. fetch_query_results

Purpose: Retrieve specific fields and rows from previously executed query results.

Required Parameters:

  • result_handle — Cache handle returned from execute_query

  • fields — List of field names to retrieve (required)

Optional Parameters:

  • limit — Maximum matching rows to return (default: 50, max: 1000)

  • offset — Rows to skip before filtering (default: 0)

  • row_filter_regex — Regex pattern to filter which rows are returned

Key Features:

  • Selective field retrieval — Fetch only the fields you need

  • Pattern matching — Filter results with regex to find specific events

  • Pagination support — Use offset and limit for large result sets

  • Efficient browsing — Avoids re-executing queries

When to Use:

  • Get detailed results after an initial query

  • Extract specific fields from large result sets

  • Filter results to find matching events

  • Drill down into query results progressively

  • Refine investigation based on initial findings

Example Usage:
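
A sketch of paging through cached results with limit and offset. The helper name and the handle value are hypothetical; total_rows is assumed to come from the row count that execute_query reports. The filter is deliberately left out: offset is described as applying before filtering, so stepping by limit may not line up with filtered pages.

```python
from typing import Any, Dict, Iterator, Sequence

MAX_LIMIT = 1000  # documented maximum for limit


def page_arguments(
    result_handle: str,
    fields: Sequence[str],
    total_rows: int,
    limit: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield one fetch_query_results arguments dict per page of rows."""
    if not fields:
        raise ValueError("fields is required and must not be empty")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    for offset in range(0, total_rows, limit):
        yield {
            "result_handle": result_handle,
            "fields": list(fields),
            "limit": limit,
            "offset": offset,
        }


pages = list(
    page_arguments(
        "example-handle",
        ["eventTime", "userName", "bucketName"],
        total_rows=120,
    )
)
print([page["offset"] for page in pages])
```

Each page reuses the same result_handle, so the query itself is never re-executed while browsing.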


Version Changelog

Scanner MCP is versioned. Here's what changed between releases.

v0.0.2 (current)

  • get_scanner_context returns condensed output — Instead of returning full inline documentation and field statistics, it now returns a compact query reference table of contents, a slim list of available indexes, and source types. This significantly reduces token usage on the initial call.

  • New tool: get_docs — Retrieve detailed documentation for a specific topic (syntax, aggregation, examples, best_practices_and_mistakes). Replaces the inline documentation previously returned by get_scanner_context.

  • New tool: get_top_columns — Discover the most frequently occurring columns for one or more indexes. Replaces the per-index field statistics previously returned by get_scanner_context.

v0.0.1

  • Initial release with three tools: get_scanner_context, execute_query, fetch_query_results.

  • get_scanner_context returns full inline documentation and field statistics for all indexes.

