stats()
stats(f [, ...fs]) by col [, ...cols]
produces an aggregation table.
stats
computes the aggregations specified by f [, ...fs]
for each group of col [, ...cols]
.
Any function that returns a single row and single column can be used in place of f
:
avg()
count()
countdistinct()
max()
min()
percentile()
sum()
var()
Each argument can also be aliased with the as
keyword, which will rename it in the output.
In addition, f
is optional. stats
always calculates the count, e.g. stats by foo
will aggregate counts by foo
.
by
columns are also optional. If no by
columns are specified, stats
will aggregate over the input datastream. For example, stats max(a), max(b)
will calculate the maximum of column a
and the maximum of column b
.
Technical Notes
If there are fewer than 1000 distinct groups,
stats()
will return those groups, their exact counts, and the statistics for them.If there are >1000 distinct groups, it will return rows for the 1000 most numerous groups.
These 1000 groups are selected using a variant of the count-min sketch—specifically the count-mean-min sketch detailed here.
For most inputs, this has a p99 error of 0.1%.
The actual values returned for each group/row may be exact depending on the aggregation; only the process of selecting these groups is approximate.
Returns
A table with one row for each distinct value of col [, ...cols]
, with the following columns:
One column for each
col
provided, containing the value ofcol
.@q.count
, containing the number of occurrences of that value.One column for each of
f [, ...fs]
, named according to theas
field, or, if noas
was used, named the same asf
.
Examples
Last updated