stats()

stats([...fs]) by [...cols] produces an aggregation table.

stats computes the aggregations specified by [...fs] for each group of [...cols].

Any function that returns a single row and a fixed number of columns can be used as an argument in fs (see the example following this list):

  • avg()

  • count()

  • countdistinct()

  • max()

  • min()

  • percentile()

  • sum()

  • var()
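
For example, several of these functions can be combined in a single stats call. The sketch below assumes events that carry hostname and elapsed_ms fields, as in the examples further down this page:

# a table with columns named `hostname`, `@q.count`, `avg_ms`, and `p95`
* | stats avg(elapsed_ms) as avg_ms, percentile(95, elapsed_ms) as p95 by hostname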

Each f in fs can also be aliased with the as keyword, which renames it in the output. If an argument is not aliased, it is given a default name for legibility; this default name is not guaranteed to be stable, and queries should not depend on it. If a column needs to be referenced by a later function in the pipeline, alias it explicitly with as.
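
For instance, the sketch below aliases both aggregates so that later pipeline stages can refer to them as slowest and newest rather than relying on default names (hostname, elapsed_ms, and timestamp are the same assumed fields used in the examples below):

# later stages can reference `slowest` and `newest` by name
* | stats max(elapsed_ms) as slowest, max(timestamp) as newest by hostname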

fs may be empty. Regardless of whether any fs are provided, stats always calculates the count, e.g. stats by foo will aggregate counts by foo.
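
A minimal sketch of this shorthand, using the hostname field from the examples below:

# counts per hostname; the output columns are `hostname` and `@q.count`
* | stats by hostname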

cols may be empty. If it is, stats will aggregate over the entire input datastream. For example, stats max(a), max(b) will calculate a single maximum of column a and a single maximum of column b across the whole dataset, producing a table with a single row.
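
As a concrete sketch (again assuming an elapsed_ms field), the following returns a single summary row for the entire result set:

# a single row with `@q.count`, `fastest`, and `slowest`
* | stats min(elapsed_ms) as fastest, max(elapsed_ms) as slowest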

Technical Notes

  • If the total size of the result set is less than 128MB (or max_bytes if using the API), stats will return all the groups, their exact counts, and the statistics for them.

  • If the total size of the result set is over 128MB (or max_bytes if using the API), stats will return a sampling of the groups, weighted toward more-frequent groups, so that the returned result set stays under 128MB.

    • Not all groups are guaranteed to be in the result; however, every group that is in the result is guaranteed to have its correct final value.

    • Note that the 128MB limitation is on the size of the in-memory representation, and may not correspond exactly to e.g. the size of the returned JSON in the API.
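
In practice this mainly matters for high-cardinality groupings. For example, a query like the sketch below (requestId is a hypothetical field with many distinct values) can exceed the limit, in which case only a weighted sample of groups comes back, each with its exact count:

# may be sampled if the grouped results exceed 128MB; returned counts are still exact
* | stats count() by requestId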

Returns

A table with one row for each distinct value of [...cols], with the following columns:

  • One column for each col of [...cols] provided, each named col and containing the value of col.

  • @q.count, containing the number of occurrences of that value.

  • At least one column for each f of [...fs], named according to the as field, or, if no as was used, named the same as f.

    • If f is a column name, or a function that returns only one column (e.g. sum, avg, count), then that column is added to the result.

    • If f is a function that can return multiple columns (e.g. max), then all of those columns are added to the result.

Examples

# a table with columns named `hostname`, `@q.count`, `elapsed_ms`, and `max(timestamp)`
* | stats max(elapsed_ms) as elapsed_ms, max(timestamp) by hostname

# a table with columns named `@q.count`, `mean`, `p50`, and `p90`
userIdentity.type: "IAMUser" and eventSource: "s3.amazonaws.com"
| stats count() as numReqs, userIdentity.arn by userIdentity.arn
| stats
    avg(numReqs) as mean,
    percentile(50, numReqs) as p50,
    percentile(90, numReqs) as p90
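
The second example uses stats twice: the first stage counts matching S3 events per userIdentity.arn, and the second stage (with no by clause) collapses those per-principal counts into a single row containing their mean, median (p50), and 90th percentile (p90).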