Ad hoc queries

You can execute ad hoc queries with the Scanner API, which allows you to run an arbitrary query over a specified time range.

What is an ad hoc query?

An ad hoc query is a search query defined by a query string, a start_time, an end_time, and a max_rows limit. When started asynchronously, it runs in the background, and you can poll it periodically to check for results.

An ad hoc query is analogous to a query you run in the Search tab in Scanner.

  • The results of an ad hoc query are tabular, consisting of columns and rows.

  • The results table is limited in size to max_bytes bytes, which is 128MB by default. Some aggregation functions also have additional memory-bounding behavior, which is documented on a per-function basis. Note that this limit refers to the table's internal memory representation, not necessarily the size of the returned JSON blob (although the two tend to be within a factor of 2-3 of each other).

  • The table can have at most max_rows rows, but may have fewer due to the memory size limitation. Note that the groupbycount and stats aggregations do not support this limit; use max_bytes if you need to bound the size of your result set in such an aggregation.

  • If the query is over log events with no aggregations, the returned log events are guaranteed to be contiguous, and to be either the latest or the earliest log events in the time range, depending on the value of scan_back_to_front.

  • The table has a maximum in-memory size, which is 1GB; if a table would exceed that, then rows may be dropped to allow it to fit, regardless of the value of max_rows. This means that any paging queries should not check if the number of rows returned is equal to max_rows when determining if there are additional pages in a time range; they should instead only terminate if a returned page is empty, or the range is exhausted.

  • The table will contain some internal metadata under the @scnr namespace, e.g. @scnr.context_fields, @scnr.timestamp, etc. These fields are not guaranteed to be stable, and may change without notice.
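The paging rule above (stop only on an empty page or an exhausted range, never on a short page) can be sketched as follows. This is a minimal illustration, not Scanner's own client code: run_query is a hypothetical callable that executes one ad hoc query over [start, end) with scan_back_to_front left at its default of true and returns the rows, and the "time" column name is an assumption taken from the example responses on this page.

```python
from datetime import datetime
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def parse_rfc3339(ts: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def page_back_to_front(
    run_query: Callable[[datetime, datetime], List[Row]],  # hypothetical helper
    start: datetime,
    end: datetime,
    time_field: str = "time",  # assumption: column holding each row's timestamp
) -> Iterator[List[Row]]:
    """Yield pages of rows, newest first, until the range is exhausted."""
    while start < end:
        rows = run_query(start, end)
        if not rows:
            # An empty page is the only row-based stop condition. A page with
            # fewer than max_rows rows does NOT mean there is nothing left.
            return
        yield rows
        # end_time is exclusive, so every returned row is strictly earlier
        # than `end`, and this always moves the window backwards.
        end = min(parse_rfc3339(str(row[time_field])) for row in rows)
```

One caveat of this sketch: because end_time is exclusive, events that share the earliest timestamp of a page but were not returned in that page are skipped by the next one. A production pager would need to handle that case, for example by re-querying that timestamp and de-duplicating.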

There are two ways to execute an ad hoc query: asynchronous and blocking.

How to execute an asynchronous ad hoc query

POST /v1/start_query

To execute an asynchronous ad hoc query, first create it with a POST /v1/start_query request. The Scanner API returns the qr_id of the query, which you can use to poll its status with GET /v1/query_progress/{qr_id} requests.

Body

| Name | Type | Description |
| --- | --- | --- |
| query (required) | string | Query text. |
| start_time (required) | string | Start timestamp for the query (inclusive), in RFC 3339 format. |
| end_time (required) | string | End timestamp for the query (exclusive), in RFC 3339 format. |
| max_rows | number | Maximum number of rows to return. Default is 1000, max is 100000. |
| max_bytes | number | Maximum number of bytes to allocate in memory for this query. Default and max are 134217728 (128MB), min is 1048576 (1MB). |
| scan_back_to_front | boolean | Whether to scan from back (latest) to front (earliest). Default is true. |

Example

curl $API_BASE/v1/start_query \
-H "Authorization: Bearer $SCANNER_API_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
  "query": "%ingest.source_type: \"aws:cloudtrail\" and sourceIPAddress: 174.23.51.122",
  "start_time": "2024-02-04T01:00:00.000Z",
  "end_time": "2024-02-04T01:30:00.000Z"
}'

Response

When the ad hoc query has been created successfully, the response HTTP status code will be 200, and the response body will contain the ID of the ad hoc query that was just created.

{ "qr_id": "37ccf932-42e7-4e2e-b21e-e9f67384bea7" }

If Scanner was unable to create the ad hoc query because the query parameters were invalid, the response HTTP status code will be 400, and the response body will contain some information about the reason the query was rejected.

{ "error": "Failed to parse query: Type error at 4-7: Function missing arguments" }
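For readers calling the API from code rather than curl, the request above might be assembled as in the following sketch. It uses only the Python standard library. The function names and the client-side bounds checks are illustrative assumptions, not part of any official Scanner client; the endpoint, field names, limits, and the 200/400 behavior are taken from this page.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

MAX_ROWS_LIMIT = 100_000                       # documented maximum for max_rows
MIN_BYTES, MAX_BYTES = 1_048_576, 134_217_728  # documented bounds for max_bytes


def build_start_query_request(
    api_base: str,
    api_key: str,
    query: str,
    start_time: str,  # RFC 3339, inclusive
    end_time: str,    # RFC 3339, exclusive
    max_rows: Optional[int] = None,
    max_bytes: Optional[int] = None,
    scan_back_to_front: Optional[bool] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/start_query request."""
    body = {"query": query, "start_time": start_time, "end_time": end_time}
    # Optional fields are only sent when set, so server defaults apply otherwise.
    if max_rows is not None:
        if max_rows > MAX_ROWS_LIMIT:
            raise ValueError(f"max_rows must be at most {MAX_ROWS_LIMIT}")
        body["max_rows"] = max_rows
    if max_bytes is not None:
        if not MIN_BYTES <= max_bytes <= MAX_BYTES:
            raise ValueError(f"max_bytes must be in [{MIN_BYTES}, {MAX_BYTES}]")
        body["max_bytes"] = max_bytes
    if scan_back_to_front is not None:
        body["scan_back_to_front"] = scan_back_to_front
    return urllib.request.Request(
        f"{api_base}/v1/start_query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_query(request: urllib.request.Request) -> str:
    """Send the request and return the qr_id, or raise on a rejected query."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)["qr_id"]
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # The 400 body carries the reason under the "error" key.
            raise ValueError(json.load(err)["error"]) from err
        raise
```

Keeping request construction separate from sending makes the body easy to inspect or log before any network call is made.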

Check query progress

GET /v1/query_progress/{qr_id}

Gets the current progress of the query with the supplied qr_id.

Poll this endpoint with periodic GET requests to check for query results. We recommend polling once per second.

Example

curl $API_BASE/v1/query_progress/37ccf932-42e7-4e2e-b21e-e9f67384bea7 \
-H "Authorization: Bearer $SCANNER_API_KEY" \
-H "Content-Type: application/json"

Response

When the query is still in progress, the response HTTP status code will be 200, and the is_completed field will be false:

{
  "is_completed": false,
  "results": {
    "column_ordering": [],
    "rows": []
  },
  "metadata": {
    "n_bytes_scanned": 8716223
  }
}

When the query has completed successfully, the response HTTP status code will be 200, and the is_completed field will be true.

The results field will contain information you can use to render a table of results. The column_ordering field is an array of the names of the columns in the results table, and the rows field is an array of JSON objects representing the rows.

{
  "is_completed": true,
  "results": {
    "column_ordering": ["time", "@index", "raw_event"],
    "rows": [
      { "time": "2024-02-04T01:02:12.210Z", "@index": "global-cloudtrail", "raw_event": "..." },
      { "time": "2024-02-04T01:12:45.761Z", "@index": "global-cloudtrail", "raw_event": "..." },
      { "time": "2024-02-04T01:12:45.761Z", "@index": "global-cloudtrail", "raw_event": "..." },
      ...
    ]
  },
  "metadata": {
    "n_bytes_scanned": 90184761
  }
}
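The polling pattern described in this section could look roughly like the sketch below. fetch_progress is a hypothetical callable that performs one GET /v1/query_progress/{qr_id} request and returns the decoded JSON body. The timeout_s parameter is a client-side safeguard added for illustration and is not an API parameter. The one-second default interval follows the recommendation above.

```python
import time
from typing import Callable, Dict


def wait_for_results(
    fetch_progress: Callable[[], Dict],  # hypothetical: one GET, decoded JSON
    interval_s: float = 1.0,             # recommended polling interval
    timeout_s: float = 300.0,            # assumption: client-side limit only
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until is_completed is true, then return the final response body."""
    deadline = clock() + timeout_s
    while True:
        progress = fetch_progress()
        if progress["is_completed"]:
            return progress
        if clock() >= deadline:
            raise TimeoutError(f"query did not complete within {timeout_s} seconds")
        # Wait before the next poll so the API is not hit in a tight loop.
        sleep(interval_s)
```

Injecting sleep and clock keeps the loop testable without real delays; in normal use the defaults are sufficient.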

How to execute a blocking ad hoc query

POST /v1/blocking_query

To execute a blocking ad hoc query, issue a POST /v1/blocking_query request. The Scanner API holds the request open until the query completes, or times out if the query takes longer than 60 seconds.

Body

| Name | Type | Description |
| --- | --- | --- |
| query (required) | string | Query text. |
| start_time (required) | string | Start timestamp for the query (inclusive), in RFC 3339 format. |
| end_time (required) | string | End timestamp for the query (exclusive), in RFC 3339 format. |
| max_rows | number | Maximum number of rows to return. Default is 1000, max is 100000. |
| max_bytes | number | Maximum number of bytes to allocate in memory for this query. Default and max are 134217728 (128MB), min is 1048576 (1MB). |
| scan_back_to_front | boolean | Whether to scan from back (latest) to front (earliest). Default is true. |

Example

curl $API_BASE/v1/blocking_query \
-H "Authorization: Bearer $SCANNER_API_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
  "query": "%ingest.source_type: \"aws:cloudtrail\" and sourceIPAddress: 174.23.51.122",
  "start_time": "2024-02-04T01:00:00.000Z",
  "end_time": "2024-02-04T01:30:00.000Z"
}'

Response

When the query has completed successfully, the response HTTP status code will be 200, and the is_completed field will be true.

The results field will contain information you can use to render a table of results. The column_ordering field is an array of the names of the columns in the results table, and the rows field is an array of JSON objects representing the rows.

{
  "is_completed": true,
  "results": {
    "column_ordering": ["time", "@index", "raw_event"],
    "rows": [
      { "time": "2024-02-04T01:02:12.210Z", "@index": "global-cloudtrail", "raw_event": "..." },
      { "time": "2024-02-04T01:12:45.761Z", "@index": "global-cloudtrail", "raw_event": "..." },
      { "time": "2024-02-04T01:12:45.761Z", "@index": "global-cloudtrail", "raw_event": "..." },
      ...
    ]
  },
  "metadata": {
    "n_bytes_scanned": 90184761
  }
}

When the query times out, the response HTTP status code will be 504.
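As a rough sketch of how a client might interpret a blocking query response, the function below maps the two documented outcomes (200 and 504) to a result table or a timeout error. The function name is illustrative, and rendering a missing column as None is an assumption; this page does not state whether every row contains every column.

```python
from typing import Dict, List, Tuple


def table_from_response(status: int, body: Dict) -> Tuple[List[str], List[List[object]]]:
    """Turn a /v1/blocking_query response into (column names, ordered rows)."""
    if status == 504:
        # The API gives up on blocking queries that run longer than 60 seconds.
        raise TimeoutError("blocking query timed out; consider an asynchronous query")
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    results = body["results"]
    columns = list(results["column_ordering"])
    # Assumption: a row may omit a column, in which case the cell is None.
    table = [[row.get(column) for column in columns] for row in results["rows"]]
    return columns, table
```

A caller that hits the timeout can retry the same query through POST /v1/start_query and poll for the result instead.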
