# Scanner MCP Tools Reference

Scanner provides five MCP tools that enable AI agents and interactive clients to query security data, explore your environment, and execute threat hunting operations.

***

## Overview

| Tool                  | Purpose                                                                       | Usage                                  |
| --------------------- | ----------------------------------------------------------------------------- | -------------------------------------- |
| `get_scanner_context` | Load a condensed Scanner query reference, available indexes, and source types | Called first by AI agents              |
| `get_docs`            | Retrieve detailed documentation for a specific topic                          | On-demand reference lookup             |
| `get_top_columns`     | Discover the most frequent column names for one or more indexes               | Field discovery before writing queries |
| `execute_query`       | Run ad-hoc queries against Scanner logs                                       | Core query execution                   |
| `fetch_query_results` | Retrieve specific fields from cached query results                            | Result refinement                      |

***

## Design: Efficient Context Management

Scanner MCP tools are designed to work together so that an agent loads only the context it needs at each step. Context bloat consumes tokens and leaves the agent less room to reason about findings.

### Querying the Right Data

**The Challenge:** Before an AI agent can write a useful query, it needs to know what data exists, what fields are available, and how the query language works. Loading all of this context upfront—full documentation, every index's schema, all field statistics—consumes thousands of tokens before a single query is run, leaving less room for actual investigation.

**The Solution:** A layered, on-demand discovery pattern:

1. **get\_scanner\_context** returns a *condensed* starting point—a query documentation table of contents, a compact list of available indexes, and source types. This gives the AI enough orientation to begin without loading full documentation or field metadata.
2. **get\_docs** provides detailed documentation for a specific topic (`syntax`, `aggregation`, `examples`, or `best_practices_and_mistakes`) only when the AI actually needs it—for example, looking up aggregation syntax before writing a stats query.
3. **get\_top\_columns** returns the most frequently occurring column names for one or more indexes. The AI calls this to discover available fields for the specific indexes it plans to query, rather than receiving column statistics for every index upfront.

This means the AI learns just enough to start, then deepens its knowledge on demand as the investigation requires.
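The layered pattern above can be sketched as a sequence of MCP `tools/call` requests. This is a minimal sketch, assuming a client that builds raw JSON-RPC 2.0 messages itself. The helper names `tool_call` and `discovery_followups` are hypothetical, and `"ctx-123"` is a placeholder for whatever token `get_scanner_context` actually returns.

```python
import json


def tool_call(request_id, name, arguments):
    """Wrap a tool invocation in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: orient. get_scanner_context takes no parameters.
first_call = tool_call(1, "get_scanner_context", {})


def discovery_followups(context_token, indexes, doc_section=None):
    """Build the on-demand calls made once the context token is known."""
    calls = []
    if doc_section is not None:
        # Step 2: pull one documentation topic, only when it is needed.
        calls.append(tool_call(2, "get_docs", {"section": doc_section}))
    # Step 3: discover fields only for the indexes about to be queried.
    calls.append(
        tool_call(
            3,
            "get_top_columns",
            {"context_token": context_token, "indices": indexes},
        )
    )
    return calls


if __name__ == "__main__":
    # "ctx-123" is a placeholder, not a real token format.
    followups = discovery_followups("ctx-123", ["my-cloudtrail-index"], "aggregation")
    for call in [first_call] + followups:
        print(json.dumps(call))
```

Most MCP client libraries wrap this envelope for you, so in practice only the tool name and the arguments object matter.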

### Efficient Result Retrieval

**The Challenge:** Security queries often return thousands of rows. Loading all raw results into the AI's context would:

* Consume massive amounts of tokens
* Reduce the AI's ability to reason about findings
* Make investigations slower and more expensive

**The Solution:** A two-stage result retrieval pattern:

1. **execute\_query** returns a *summary* of results, not raw data:
   * Field names and top values for each field
   * Row count and data patterns
   * Statistical summaries (counts, distributions, time ranges)
   * A `result_handle` for later access
2. **fetch\_query\_results** allows selective retrieval:
   * The AI examines the summary from execute\_query
   * It decides which fields and rows are actually relevant
   * It calls fetch\_query\_results to pull *only* those fields and rows into context
   * Irrelevant data never enters the context

**Example:** A query returns 5,000 S3 access events. Rather than loading all 5,000 rows with every field, the AI:

* Sees the summary showing event types, usernames, and buckets involved
* Identifies which 50 rows are suspicious
* Fetches only those 50 rows with only relevant fields (`eventTime`, `userName`, `bucketName`)
* Pulls 1% of the rows (50 of 5,000) and three fields per row into context, rather than the full result set

This design enables AI agents to handle large datasets without context limitations becoming a bottleneck.
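The two-stage idea can be illustrated with a toy in-memory model. This is not Scanner's implementation, and the summary shape is an assumption made for illustration. The functions `summarize` and `fetch` are hypothetical stand-ins for the roles of `execute_query` and `fetch_query_results`, and the rows are synthetic.

```python
from collections import Counter


def summarize(rows, top_n=3):
    """Stage 1: describe the result set without returning the rows themselves."""
    fields = sorted({name for row in rows for name in row})
    return {
        "row_count": len(rows),
        "top_values": {
            field: Counter(row[field] for row in rows if field in row).most_common(top_n)
            for field in fields
        },
    }


def fetch(rows, fields, offset=0, limit=50):
    """Stage 2: return only the requested fields for a slice of the cached rows."""
    return [
        {field: row.get(field) for field in fields}
        for row in rows[offset:offset + limit]
    ]


# Synthetic stand-in for 5,000 cached S3 access events (not real data).
cached = [
    {
        "eventName": "GetObject" if i % 10 else "DeleteObject",
        "userName": f"user{i % 4}",
        "bucketName": f"bucket{i % 2}",
        "requestId": f"req-{i}",
        "userAgent": "aws-cli",
    }
    for i in range(5000)
]

summary = summarize(cached)
subset = fetch(cached, ["userName", "bucketName"], limit=50)
```

In this toy data set the second stage returns 100 field values (50 rows × 2 fields) instead of 25,000 (5,000 rows × 5 fields), while the summary still reports the row count and the dominant value of each field.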

***

## 1. get\_scanner\_context

**Purpose:** Load a condensed Scanner query reference, available indexes, and source types.

**Key Points:**

* **Called first** — AI agents are instructed to call this before executing any queries
* **Returns a context\_token** — Required input for `execute_query` and `get_top_columns`
* **Discovers your data** — Shows what indexes and source types are available in your Scanner instance
* **Condensed reference** — Returns a table of contents for query documentation, not the full docs (use `get_docs` for details)

**What It Provides:**

* A condensed Scanner query language reference (table of contents)
* Available indexes with names and descriptions
* Discovered source types (e.g., `aws:cloudtrail`, `kubernetes:audit`, `proxy`, `dns`)
* A `context_token` for use with other tools

**When to Use:**

* Start of any investigation or query session
* When exploring what data sources are available

**Example Usage:**

```
1. Call get_scanner_context (no parameters needed)
2. Receive context_token and condensed reference
3. Use get_docs to look up specific syntax topics as needed
4. Use get_top_columns to discover fields for indexes you plan to query
5. Use context_token with execute_query
```

***

## 2. get\_docs

**Purpose:** Retrieve detailed documentation for a specific Scanner query language topic.

**Required Parameters:**

* **section** — One of: `syntax`, `aggregation`, `examples`, `best_practices_and_mistakes`

**Available Sections:**

* **syntax** — Query syntax rules, operators, and grammar
* **aggregation** — Aggregation functions and usage
* **examples** — Example queries for common use cases
* **best\_practices\_and\_mistakes** — Best practices and common errors to avoid

**Key Features:**

* **On-demand loading** — Fetch only the documentation you need, when you need it
* **No context\_token required** — Can be called at any time
* **Reduces token usage** — Avoids loading thousands of tokens of documentation upfront

**When to Use:**

* Before writing a query, to look up syntax or aggregation functions
* When a query fails, to review best practices and common mistakes
* To find example queries for a specific use case

**Example Usage:**

```
get_docs with:
- section: "aggregation"

Returns: Detailed documentation on aggregation functions, syntax, and usage
```
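Because `section` accepts only four values, a client can reject a bad topic before making the call. A minimal sketch, in which the helper name `get_docs_arguments` is hypothetical:

```python
# The four sections listed in this reference.
DOC_SECTIONS = ("syntax", "aggregation", "examples", "best_practices_and_mistakes")


def get_docs_arguments(section):
    """Return the arguments object for get_docs, rejecting unknown sections."""
    if section not in DOC_SECTIONS:
        raise ValueError(f"unknown section {section!r}; expected one of {DOC_SECTIONS}")
    return {"section": section}
```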

***

## 3. get\_top\_columns

**Purpose:** Discover the most frequently occurring column names for one or more indexes.

**Required Parameters:**

* **context\_token** — Obtained from `get_scanner_context`
* **indices** — List of index names to query (e.g., `["my-cloudtrail-index", "_detections"]`)

**Key Features:**

* **Multi-index support** — Query columns for multiple indexes in a single call
* **Frequency-sorted** — Returns columns sorted by how often they appear in your data
* **Compact format** — Results use a tuple format (`["column_name", count]`) to minimize token usage
* **Time-scoped** — Reflects column usage from the last 7 days

**When to Use:**

* Before writing a query, to discover what fields are available in an index
* When exploring an unfamiliar data source
* To understand the schema of your log data

**Example Usage:**

```
get_top_columns with:
- context_token: [from get_scanner_context]
- indices: ["my-cloudtrail-index"]

Returns: Top columns for the index, e.g., [["eventName", 482910], ["userIdentity.userName", 401822], ...]
```
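The tuple format is straightforward to consume on the client side. The sketch below assumes the column list arrives as a JSON array shaped like the example above; the full response envelope is not documented here and may differ. `parse_top_columns` and `columns_above` are hypothetical helper names.

```python
import json


def parse_top_columns(payload):
    """Turn '[["column", count], ...]' into a dict, preserving frequency order."""
    return {name: count for name, count in json.loads(payload)}


def columns_above(columns, min_count):
    """Keep only columns seen at least min_count times."""
    return [name for name, count in columns.items() if count >= min_count]


# Sample copied from the example above, truncated to two entries.
sample = '[["eventName", 482910], ["userIdentity.userName", 401822]]'
columns = parse_top_columns(sample)
```

Filtering by count is one way to skip rarely populated fields when deciding what to include in a query.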

***

## 4. execute\_query

**Purpose:** Execute an ad-hoc query against Scanner to search your security logs.

**Required Parameters:**

* **context\_token** — Obtained from `get_scanner_context` (proves you've loaded syntax rules)
* **query** — Your Scanner Query Language query
* **start\_time** — Query start time (ISO 8601 format, inclusive)
* **end\_time** — Query end time (ISO 8601 format, exclusive)

**Optional Parameters:**

* **max\_rows** — Maximum rows to return (default: 1000, max: 10000)
* **max\_bytes** — Memory limit in bytes (default: 128MB)

**Key Features:**

* **Blocking execution** — Waits for results (supports configurable timeouts)
* **Result caching** — Results are cached for subsequent operations
* **Field-level summaries** — Returns summaries to reduce token usage
* **Time-bounded queries** — Scope queries to specific time windows

**What It Returns:**

* Query execution status
* Result summary with key findings
* A `result_handle` for fetching detailed results with `fetch_query_results`
* Row count and execution time
* Sample of returned fields

**When to Use:**

* Execute threat hunting queries
* Search for specific events or patterns
* Investigate alerts or incidents
* Test detection rule logic
* Explore data patterns

**Example Usage:**

```
execute_query with:
- context_token: [from get_scanner_context]
- query: %ingest.source_type="aws:cloudtrail" eventName=DeleteTrail
- start_time: 2025-11-18T00:00:00Z
- end_time: 2025-11-19T00:00:00Z
- max_rows: 100

Returns: Summary of matching CloudTrail events with a result_handle for detailed retrieval
```
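Since `start_time` is inclusive and `end_time` is exclusive, a one-day window runs from midnight to the next midnight. The following sketch builds the arguments for the example above. The helper name `execute_query_arguments` is hypothetical, and `"ctx-123"` is a placeholder token.

```python
from datetime import date, datetime, timedelta, timezone

MAX_ROWS_LIMIT = 10000  # documented maximum for max_rows


def execute_query_arguments(context_token, query, day, max_rows=1000):
    """Build arguments covering one UTC day as a half-open window [start, end)."""
    if max_rows > MAX_ROWS_LIMIT:
        raise ValueError(f"max_rows must not exceed {MAX_ROWS_LIMIT}")
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)  # exclusive upper bound
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601, UTC
    return {
        "context_token": context_token,
        "query": query,
        "start_time": start.strftime(fmt),
        "end_time": end.strftime(fmt),
        "max_rows": max_rows,
    }


args = execute_query_arguments(
    "ctx-123",  # placeholder token
    '%ingest.source_type="aws:cloudtrail" eventName=DeleteTrail',
    date(2025, 11, 18),
    max_rows=100,
)
```

With a half-open window, consecutive days tile without overlap: an event at exactly `2025-11-19T00:00:00Z` belongs to the next day's query only.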

***

## 5. fetch\_query\_results

**Purpose:** Retrieve specific fields and rows from previously executed query results.

**Required Parameters:**

* **result\_handle** — Cache handle returned from `execute_query`
* **fields** — List of field names to retrieve (required)

**Optional Parameters:**

* **limit** — Maximum matching rows to return (default: 50, max: 1000)
* **offset** — Rows to skip before filtering (default: 0)
* **row\_filter\_regex** — Regex pattern to filter which rows are returned

**Key Features:**

* **Selective field retrieval** — Fetch only the fields you need
* **Pattern matching** — Filter results with regex to find specific events
* **Pagination support** — Use offset and limit for large result sets
* **Efficient browsing** — Avoids re-executing queries

**When to Use:**

* Get detailed results after an initial query
* Extract specific fields from large result sets
* Filter results to find matching events
* Drill down into query results progressively
* Refine investigation based on initial findings

**Example Usage:**

```
After execute_query returns a result_handle:

fetch_query_results with:
- result_handle: [from execute_query]
- fields: ["userIdentity.userName", "eventName", "sourceIPAddress", "eventTime"]
- limit: 50
- row_filter_regex: "admin.*"

Returns: Up to 50 rows that match the regex, showing only the specified fields
```
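Pagination with `offset` and `limit` can be sketched as a generator of argument objects. This sketch covers unfiltered fetches only: `offset` is counted before filtering, so stepping it by `limit` while also passing `row_filter_regex` would not necessarily produce disjoint pages. The helper name `fetch_pages` is hypothetical, `"handle-1"` is a placeholder handle, and `total_rows` is assumed to come from the row count that `execute_query` reports.

```python
MAX_LIMIT = 1000  # documented maximum for limit


def fetch_pages(result_handle, fields, total_rows, page_size=50):
    """Yield fetch_query_results arguments that page through cached rows."""
    if not 1 <= page_size <= MAX_LIMIT:
        raise ValueError(f"page_size must be between 1 and {MAX_LIMIT}")
    for offset in range(0, total_rows, page_size):
        yield {
            "result_handle": result_handle,
            "fields": fields,
            "limit": page_size,
            "offset": offset,
        }


# "handle-1" is a placeholder, not a real handle format.
pages = list(fetch_pages("handle-1", ["eventName", "eventTime"], total_rows=120))
```

Each page reuses the cached result, so browsing does not re-execute the query.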

***

## Version Changelog

Scanner MCP is versioned. Here's what changed between releases.

### v0.0.2 (current)

* **`get_scanner_context` returns condensed output** — Instead of returning full inline documentation and field statistics, it now returns a compact query reference table of contents, a slim list of available indexes, and source types. This significantly reduces token usage on the initial call.
* **New tool: `get_docs`** — Retrieve detailed documentation for a specific topic (`syntax`, `aggregation`, `examples`, `best_practices_and_mistakes`). Replaces the inline documentation previously returned by `get_scanner_context`.
* **New tool: `get_top_columns`** — Discover the most frequently occurring columns for one or more indexes. Replaces the per-index field statistics previously returned by `get_scanner_context`.

### v0.0.1

* Initial release with three tools: `get_scanner_context`, `execute_query`, `fetch_query_results`.
* `get_scanner_context` returns full inline documentation and field statistics for all indexes.

***

## Related Documentation

* [**Getting Started**](https://docs.scanner.dev/scanner/using-scanner-complete-feature-reference/mcp-and-ai-secops/getting-started) — Setup instructions for Scanner MCP in different tools
* [**Detection Engineering**](https://docs.scanner.dev/scanner/using-scanner-complete-feature-reference/mcp-and-ai-secops/using-mcp-for-security-operations/detection-engineering) — Use Scanner MCP to build and validate detection rules
* [**Autonomous Workflows**](https://docs.scanner.dev/scanner/using-scanner-complete-feature-reference/mcp-and-ai-secops/using-mcp-for-security-operations/autonomous-workflows) — Build agents using Scanner MCP tools
* [**Interactive Investigations**](https://docs.scanner.dev/scanner/using-scanner-complete-feature-reference/mcp-and-ai-secops/using-mcp-for-security-operations/interactive-investigations) — Run investigations with Scanner MCP
