# stats()

`stats([...fs]) by [...cols]` produces an aggregation table.

`stats` computes the aggregations specified by `[...fs]` for each group of `[...cols]`.

Any function that returns a single row and fixed number of columns can be used as an argument in `fs`:

* `avg()`
* `count()`
* `countdistinct()`
* `max()`
* `min()`
* `percentile()`
* `sum()`
* `var()`

Each `f` in `fs` can also be aliased with the `as` keyword, which will rename it in the output. If an argument is *not* renamed, it will be provided with a default name for legibility. <mark style="background-color:yellow;">**This default name is**</mark><mark style="background-color:yellow;">**&#x20;**</mark>*<mark style="background-color:yellow;">**not**</mark>*<mark style="background-color:yellow;">**&#x20;**</mark><mark style="background-color:yellow;">**guaranteed to be stable, and queries should not depend on it.**</mark> If a column needs to be referenced in a later function in the pipeline, please explicitly alias it using `as`.

`fs` may be empty. Regardless of whether any `fs` are provided, `stats` *always* calculates the count, e.g. `stats by foo` will aggregate counts by `foo`.

`cols` may be empty. If it is, `stats` will aggregate over the entire input datastream. For example, `stats max(a), max(b)` will calculate a single maximum of column `a` and a single maximum of column `b` across the whole dataset, producing a table with a single row.

## Technical Notes

* If the total size of the result set is less than 128MB (or `max_bytes` if using the API), `stats` will return all the groups, their exact counts, and the statistics for them.
* If the total size of the result set is over 128MB (or `max_bytes` if using the API), `stats`will return a sampling of the groups, weighted for more-frequent groups, such that the returned result set is under 128MB.
  * Not all groups are guaranteed to be in the result; however, every group that *is* in the result is guaranteed to have its correct final value.
  * Note that the 128MB limitation is on the size of the in-memory representation, and may not correspond exactly to e.g. the size of the returned JSON in the API.

## Returns

A table with one row for each distinct value of `[...cols]`, with the following columns:

* One column for each `col` of `[..cols]` provided, each named `col` and containing the value of `col`.
* `@q.count`, containing the number of occurrences of that value.
* At least one column for each `f` of `[...fs]`, named according to the `as` field, or, if no `as` was used, named the same as `f`.
  * If `f` is a column name, or a function that returns only one column (e.g. `sum`, `avg`, `count`), then that column is added to the result
  * If `f` is a function that can return multiple columns (e.g. `max`), then all of those columns are added to the result

## Examples

```python
# a table with columns named `hostname`, `@q.count`, `elapsed_ms`, and `max(timestamp)`
* | stats max(elapsed_ms) as elapsed_ms, max(timestamp) by hostname

# a table with columns named `@q.count`, `mean`, `p50`, and `p90`
userIdentity.type: "IAMUser" and eventSource: "s3.amazonaws.com"
| stats count() as numReqs, userIdentity.arn by userIdentity.arn
| stats
    avg(numReqs) as mean,
    percentile(50, numReqs) as p50,
    percentile(90, numReqs) as p90
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.scanner.dev/scanner/using-scanner-complete-feature-reference/querying-and-analysis/aggregation-functions/stats.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
