stats()
stats([...fs]) by [...cols] produces an aggregation table. stats computes the aggregations specified by [...fs] for each group of [...cols].
Any function that returns a single row and a fixed number of columns can be used as an argument in fs:
avg()
count()
countdistinct()
max()
min()
percentile()
sum()
var()
Each f in fs can also be aliased with the as keyword, which renames it in the output. If an argument is not aliased, it is given a default name for legibility. This default name is not guaranteed to be stable, and queries should not depend on it. If a column needs to be referenced by a later function in the pipeline, alias it explicitly using as.
fs may be empty. Regardless of whether any fs are provided, stats always calculates the count; e.g. stats by foo will aggregate counts by foo.
cols may be empty. If it is, stats will aggregate over the entire input datastream. For example, stats max(a), max(b) will calculate a single maximum of column a and a single maximum of column b across the whole dataset, producing a table with a single row.
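The grouping behaviour described above can be modelled in a few lines of Python. This is only a toy emulation over a list of dicts, not the engine's implementation: the stats helper, its fs and by parameters, and the sample rows are all hypothetical names chosen for illustration. It shows that the count is produced even when fs is empty, and that empty cols collapses the input into a single row.

```python
from collections import defaultdict


def stats(rows, fs=None, by=()):
    """Toy model of `stats([...fs]) by [...cols]` over a list of dicts.

    `fs` maps an output name (the alias) to (aggregate function, column).
    Hypothetical helper for illustration only.
    """
    fs = fs or {}
    groups = defaultdict(list)
    for row in rows:
        # With empty `by`, every row gets the key (), so there is one group.
        groups[tuple(row.get(c) for c in by)].append(row)

    table = []
    for key, members in groups.items():
        out = dict(zip(by, key))          # one column per grouping column
        out["@q.count"] = len(members)    # the count is always calculated
        for name, (agg, col) in fs.items():
            out[name] = agg(m[col] for m in members if col in m)
        table.append(out)
    return table


rows = [
    {"foo": "x", "a": 1, "b": 10},
    {"foo": "x", "a": 5, "b": 2},
    {"foo": "y", "a": 3, "b": 7},
]

# Like `stats by foo`: no aggregations requested, counts only.
counts_by_foo = stats(rows, by=("foo",))

# Like `stats max(a), max(b)` with explicit aliases: one row for the whole input.
whole_dataset = stats(rows, fs={"max_a": (max, "a"), "max_b": (max, "b")})
```

Passing explicit output names in this sketch mirrors the advice to alias with as rather than rely on default names.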
Technical Notes
If the total size of the result set is less than 128MB, stats() will return all the groups, their exact counts, and the statistics for them.
If the total size of the result set is over 128MB, stats() will return a sampling of the groups, weighted for more-frequent groups, such that the returned result set is under 128MB. Not all groups are guaranteed to be in the result; however, every group that is in the result is guaranteed to have its correct final value.
Note that the 128MB limitation is on the size of the in-memory representation, and may not correspond exactly to e.g. the size of the returned JSON in the API.
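To make the size cap concrete, here is a rough Python sketch of one way such a cap could behave. The actual sampling algorithm is not described here, so everything in the block is an assumption: cap_result, size_of, and the byte budget are hypothetical, and the weighting uses Efraimidis-Spirakis style random keys as a stand-in for "weighted for more-frequent groups". The point it illustrates is that groups may be dropped, but a group that is kept retains its exact value.

```python
import random


def cap_result(groups, budget_bytes, size_of, seed=0):
    """Return a subset of `groups` (key -> exact count) under `budget_bytes`.

    Hypothetical illustration; not the engine's real sampling algorithm.
    Counts are assumed to be positive.
    """
    total = sum(size_of(k, c) for k, c in groups.items())
    if total < budget_bytes:
        return dict(groups)  # small result: every group is returned

    rng = random.Random(seed)
    # Weighted random ordering: a larger count tends to produce a larger key,
    # so more-frequent groups are more likely to be considered first.
    ranked = sorted(
        groups,
        key=lambda k: rng.random() ** (1.0 / groups[k]),
        reverse=True,
    )

    kept, used = {}, 0
    for k in ranked:
        cost = size_of(k, groups[k])
        if used + cost >= budget_bytes:
            continue  # skip anything that would reach the budget
        kept[k] = groups[k]  # kept groups carry their exact, final count
        used += cost
    return kept


example_groups = {"a": 1000, "b": 10, "c": 1}
flat_size = lambda key, count: 10  # assumed: each group costs 10 bytes

sampled = cap_result(example_groups, budget_bytes=25, size_of=flat_size)
complete = cap_result(example_groups, budget_bytes=100, size_of=flat_size)
```

Because size_of measures an assumed in-memory cost, the same result could serialize to a different number of bytes, which matches the note that the limit does not correspond exactly to the size of the returned JSON.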
Returns
A table with one row for each distinct value of [...cols], with the following columns:
One column for each col of [...cols] provided, each named col and containing the value of col.
@q.count, containing the number of occurrences of that value.
At least one column for each f of [...fs], named according to the as field, or, if no as was used, named the same as f.
  If f is a column name, or a function that returns only one column (e.g. sum, avg, count), then that column is added to the result.
  If f is a function that can return multiple columns (e.g. max), then all of those columns are added to the result.
Examples