# stats()

`stats(f [, ...fs]) by col [, ...cols]`

produces an aggregation table.

`stats`

computes the aggregations specified by `f [, ...fs]`

for each group of `col [, ...cols]`

.

Any function that returns a single row and single column can be used in place of `f`

:

`avg()`

`count()`

`countdistinct()`

`max()`

`min()`

`percentile()`

`sum()`

`var()`

Each argument can also be aliased with the `as`

keyword, which will rename it in the output.

In addition, `f`

is optional. `stats`

*always* calculates the count, e.g. `stats by foo`

will aggregate counts by `foo`

.

`by`

columns are also optional. If no `by`

columns are specified, `stats`

will aggregate over the input datastream. For example, `stats max(a), max(b)`

will calculate the maximum of column `a`

and the maximum of column `b`

.

## Technical Notes

If there are fewer than 1000 distinct groups,

`stats()`

will return those groups, their exact counts, and the statistics for them.If there are >1000 distinct groups, it will return rows for the 1000 most numerous groups.

These 1000 groups are selected using a variant of the count-min sketch—specifically the count-mean-min sketch detailed here.

For most inputs, this has a p99 error of 0.1%.

The

*actual values*returned for each group/row may be exact depending on the aggregation; only the process of selecting these groups is approximate.

## Returns

A table with one row for each distinct value of `col [, ...cols]`

, with the following columns:

One column for each

`col`

provided, containing the value of`col`

.`@q.count`

, containing the number of occurrences of that value.One column for each of

`f [, ...fs]`

, named according to the`as`

field, or, if no`as`

was used, named the same as`f`

.

## Examples

Last updated