stats()
stats([...fs]) by [...cols] produces an aggregation table. stats computes the aggregations specified by [...fs] for each group of [...cols].
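The per-group behavior can be modeled outside the query language. The Python sketch below only illustrates the semantics described above and is not how stats is implemented; the sample rows and the names `latencies_by_service` and `avg_latency` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input rows; "service" plays the role of a grouping column.
rows = [
    {"service": "api", "latency": 10},
    {"service": "api", "latency": 30},
    {"service": "web", "latency": 20},
]

# Step 1: collect the values of the aggregated column for each group.
latencies_by_service = defaultdict(list)
for row in rows:
    latencies_by_service[row["service"]].append(row["latency"])

# Step 2: apply the aggregation (here an average) once per group.
avg_latency = {
    service: mean(values) for service, values in latencies_by_service.items()
}
```

Each distinct value of the grouping column ends up with exactly one aggregated value, which is the shape of the aggregation table.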
Any function that returns a single row and a fixed number of columns can be used as an argument in fs:
avg()
count()
countdistinct()
max()
min()
percentile()
sum()
var()
Each f in fs can also be aliased with the as keyword, which renames it in the output. If an argument is not aliased, it is given a default name for legibility. This default name is not guaranteed to be stable, and queries should not depend on it. If a column needs to be referenced by a later function in the pipeline, alias it explicitly using as.
fs may be empty. Regardless of whether any fs are provided, stats always calculates the count. For example, stats by foo will aggregate counts by foo.
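As a rough analogy for the count-only case, the Python sketch below counts occurrences of each value of foo. It assumes a hypothetical list of events and is meant only to illustrate what "aggregate counts by foo" produces, not to describe the engine.

```python
from collections import Counter

# Hypothetical events; only the grouping column "foo" matters here.
events = [{"foo": "a"}, {"foo": "b"}, {"foo": "a"}, {"foo": "a"}]

# With no aggregation functions, the only statistic is the per-group count.
counts_by_foo = Counter(event["foo"] for event in events)
```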
cols may be empty. If it is, stats will aggregate over the entire input datastream. For example, stats max(a), max(b) will calculate a single maximum of column a and a single maximum of column b across the whole dataset, producing a table with a single row.
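The ungrouped case can be pictured as follows. This Python sketch assumes hypothetical input rows, and the output names `max_a` and `max_b` are placeholders, since the default column names are not specified here.

```python
# Hypothetical input rows with columns "a" and "b".
rows = [{"a": 3, "b": 7}, {"a": 9, "b": 2}, {"a": 5, "b": 4}]

# With no grouping columns, every row falls into one implicit group,
# so each aggregation yields a single value.
single_row = {
    "max_a": max(r["a"] for r in rows),  # placeholder output name
    "max_b": max(r["b"] for r in rows),  # placeholder output name
}

# The resulting table has exactly one row.
result_table = [single_row]
```

Note that the two maxima are computed independently, so they may come from different input rows.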
Technical Notes
If the total size of the result set is less than 128MB (or max_bytes if using the API), stats will return all the groups, their exact counts, and the statistics for them.
If the total size of the result set is over 128MB (or max_bytes if using the API), stats will return a sampling of the groups, weighted for more-frequent groups, such that the returned result set is under 128MB. Not all groups are guaranteed to be in the result; however, every group that is in the result is guaranteed to have its correct final value.
Note that the 128MB limitation is on the size of the in-memory representation, and may not correspond exactly to e.g. the size of the returned JSON in the API.
Returns
A table with one row for each distinct value of [...cols], with the following columns:
One column for each col of [...cols] provided, each named col and containing the value of col.
@q.count, containing the number of occurrences of that value.
At least one column for each f of [...fs], named according to the as alias, or, if no as was used, named the same as f.
  If f is a column name, or a function that returns only one column (e.g. sum, avg, count), then that column is added to the result.
  If f is a function that can return multiple columns (e.g. max), then all of those columns are added to the result.
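To make the row layout concrete, the Python sketch below assembles output rows with the three kinds of columns listed above. It is a simplified model under stated assumptions: `build_result_rows` is a hypothetical helper, every aggregation is explicitly aliased, and each aggregation returns a single column.

```python
from collections import defaultdict

def build_result_rows(rows, cols, aliased_aggregations):
    """Return one output row per distinct combination of `cols`.

    `aliased_aggregations` maps an alias to a (column, function) pair.
    Hypothetical helper for illustration; not the real implementation.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in cols)].append(row)

    output = []
    for key, members in groups.items():
        out_row = dict(zip(cols, key))      # one column per grouping column
        out_row["@q.count"] = len(members)  # the count is always present
        for alias, (column, fn) in aliased_aggregations.items():
            out_row[alias] = fn([m[column] for m in members])
        output.append(out_row)
    return output

rows = [
    {"service": "api", "latency": 10},
    {"service": "api", "latency": 30},
    {"service": "web", "latency": 20},
]
table = build_result_rows(rows, ["service"], {"total": ("latency", sum)})
```

Because the alias fixes the output name, a later step can refer to `total` without depending on any default naming.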
Examples