countdistinct()
countdistinct(col [, ...cols])
Counts the number of distinct values of the provided column(s) in the input stream. If multiple columns are provided, each row's values are combined into a tuple, and the number of distinct tuples is counted.
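The tuple behavior can be illustrated with a small Python sketch. This is not the query engine's implementation; the helper `count_distinct` and the sample rows are hypothetical, and treating a missing column as `None` is an assumption made only for this example.

```python
def count_distinct(rows, *cols):
    """Count distinct values of the given columns.

    When several columns are given, each row contributes one tuple,
    and distinct tuples are counted (hypothetical helper).
    """
    # Assumption: a column missing from a row is treated as None.
    seen = {tuple(row.get(c) for c in cols) for row in rows}
    return len(seen)


rows = [
    {"user": "alice", "region": "us-east-1"},
    {"user": "alice", "region": "us-west-2"},
    {"user": "bob", "region": "us-east-1"},
    {"user": "alice", "region": "us-east-1"},  # duplicate of the first row
]

print(count_distinct(rows, "user"))            # 2
print(count_distinct(rows, "user", "region"))  # 3
```

With one column there are two distinct users; with two columns there are three distinct (user, region) pairs, because the same user in two regions counts twice.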
Technical Notes
If there are at most 1000 distinct values in the input stream, it returns the exact count.
If there are more than 1000 distinct values, it falls back to a variant of the HyperLogLog algorithm, specifically the HLL+ algorithm detailed here, with a precision of 14.
For most inputs, this has an average error of 0.5% and a p95 error of 1.5%.
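The exact-then-approximate strategy described above can be sketched roughly as follows. This is a minimal illustration, not the engine's code: it uses the classic HyperLogLog estimator with a linear-counting correction rather than HLL+ (so it has no bias-correction tables and no sparse representation), and the class name `DistinctCounter`, the hash choice, and the constants' names are all assumptions. Only the 1000-value threshold and the precision of 14 come from the notes above.

```python
import hashlib
import math

EXACT_LIMIT = 1000   # exact answer up to this many distinct values
PRECISION = 14       # number of hash bits used to pick a register
M = 1 << PRECISION   # 16384 registers
REST_BITS = 64 - PRECISION


def _hash64(value):
    # Assumption: any well-mixed 64-bit hash works; blake2b is used for convenience.
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class DistinctCounter:
    """Hypothetical sketch: exact set first, HyperLogLog once it overflows."""

    def __init__(self):
        self.exact = set()
        self.registers = [0] * M

    def add(self, value):
        h = _hash64(value)
        idx = h >> REST_BITS                  # top 14 bits choose the register
        rest = h & ((1 << REST_BITS) - 1)     # remaining 50 bits
        rank = REST_BITS - rest.bit_length() + 1  # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank
        if self.exact is not None:
            self.exact.add(value)
            if len(self.exact) > EXACT_LIMIT:
                self.exact = None             # too many values: keep only the sketch

    def count(self):
        if self.exact is not None:
            return len(self.exact)
        alpha = 0.7213 / (1 + 1.079 / M)
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * M and zeros:
            # Small-range correction: linear counting on empty registers.
            return round(M * math.log(M / zeros))
        return round(raw)


counter = DistinctCounter()
for i in range(500):
    counter.add(f"user-{i % 250}")
print(counter.count())  # 250 (still on the exact path)
```

With 16384 registers, the classic estimator's relative standard error is about 1.04 / sqrt(16384), or roughly 0.8%, which is the same order of magnitude as the error figures quoted above.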
Returns
A table with one row and one column, called @q.uniq.
Example
Analyze AWS CloudTrail logs to count the number of distinct IAM users making AWS API calls.