Understanding Tokens and Query Performance
How you structure your queries affects how quickly Scanner can return results. This page covers key patterns for writing efficient queries.
For background on Scanner's architecture, see How Scanner Achieves Fast Queries.
How the Index Works
Scanner builds an inverted index that maps field values to the files containing them. The index keys are tokens stored in sorted order.
Think of it like a phone book: you can quickly find all names starting with "Sm" because names are sorted alphabetically. But finding all names ending with "son" requires reading every entry. This is why trailing wildcards are fast (prefix lookup) and leading wildcards are slow (full scan).
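The phone book analogy can be sketched in a few lines of Python. This is only an illustration of why sorted keys favor prefix lookups, not Scanner's actual implementation; the key list and both function names are made up for the example.

```python
from bisect import bisect_left

# Hypothetical index keys, kept in sorted order like a phone book.
keys = sorted(["adams", "johnson", "smith", "smythe", "snow", "wilson"])

def prefix_lookup(sorted_keys, prefix):
    """Binary-search to the first key >= prefix, then read forward while keys match."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for i in range(start, len(sorted_keys)):
        if not sorted_keys[i].startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(sorted_keys[i])
    return matches

def suffix_scan(sorted_keys, suffix):
    """Sorting does not group keys by suffix, so every key must be checked."""
    return [key for key in sorted_keys if key.endswith(suffix)]

print(prefix_lookup(keys, "sm"))
print(suffix_scan(keys, "son"))
```

The prefix lookup touches only the matching keys plus one extra, while the suffix scan reads the whole list regardless of how many keys match.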
Token Matching: The Most Important Concept
Scanner breaks field values into tokens and indexes each token separately. You can search for individual tokens directly without wildcards.
What Makes a Token
Tokens consist of alphanumeric characters (a-z, A-Z, 0-9) and underscores. Everything else is a boundary.
Part of tokens: Letters, numbers, underscores (_)
Boundaries: Spaces, hyphens, dots, colons, slashes, brackets, quotes, equals, commas, and all other punctuation
For example, user_admin_backup is a single token, but user-admin-backup splits into three tokens: user, admin, backup.
Search is case-insensitive by default. Searching for error, Error, or ERROR all match the same tokens.
IP addresses are special: Scanner indexes both the full IP address and each octet separately. So 192.168.1.100 generates tokens for 192.168.1.100, 192, 168, 1, and 100.
Examples
The value scnr-QueryWorker-95a7410[$LATEST] becomes tokens scnr, QueryWorker, 95a7410, LATEST. Searching field: "QueryWorker" matches.
The value /var/log/application/error.log becomes tokens var, log, application, error, log. Searching field: "error" matches.
The value user-admin-backup becomes tokens user, admin, backup (hyphens are boundaries). Searching field: "admin" matches.
The value 192.168.1.100 becomes tokens 192.168.1.100, 192, 168, 1, 100 (IP addresses get special handling). Searching field: "192.168.1.100" matches, and so does field: "168" since individual octets are searchable.
The value user_admin_backup stays as one token user_admin_backup (underscores are NOT boundaries). Searching field: "admin" does not match.
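The tokenization rules above can be approximated with two regular expressions. This is a rough sketch of the rules as described on this page, not Scanner's tokenizer: the function names, the IPv4 pattern, and the choice to list the full IP address first are all assumptions made for illustration.

```python
import re

# Tokens are runs of letters, digits, and underscores; everything else is a boundary.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")
# Assumed pattern for "looks like an IPv4 address" (four dot-separated groups of digits).
IPV4_RE = re.compile(r"(?<![\w.])(?:\d{1,3}\.){3}\d{1,3}(?![\w.])")

def tokenize(value):
    """Return the full IP addresses found in value, followed by its ordinary tokens."""
    tokens = IPV4_RE.findall(value)
    tokens.extend(TOKEN_RE.findall(value))
    return tokens

def contains_token(value, term):
    """Case-insensitive check that term equals one of the value's tokens."""
    wanted = term.lower()
    return any(token.lower() == wanted for token in tokenize(value))

print(tokenize("scnr-QueryWorker-95a7410[$LATEST]"))
print(tokenize("192.168.1.100"))
print(contains_token("user-admin-backup", "admin"))
print(contains_token("user_admin_backup", "admin"))
```

Running the sketch against the example values on this page is a quick way to check whether a search term lines up with a token boundary before reaching for a wildcard.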
Why This Matters
Consider searching for Lambda functions containing "QueryWorker". One approach wraps the term in wildcards (*QueryWorker*); the other searches for the token directly (field: "QueryWorker").
Both queries find the same results. The second is faster because QueryWorker is a complete token within values like scnr-QueryWorker-95a7410[$LATEST], so Scanner can look it up directly in the index.
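To make the difference concrete, here is a toy inverted index in Python. It is a sketch under simplifying assumptions (an in-memory dictionary, lowercased tokens, no IP handling), and the file names and the scnr-IndexWorker value are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical data: file name -> field values stored in that file.
files = {
    "file_a": ["scnr-QueryWorker-95a7410[$LATEST]"],
    "file_b": ["scnr-IndexWorker-12ab34c[$LATEST]"],
    "file_c": ["AdminActionProcessorWorker"],
}

def build_index(files):
    """Map each lowercased token to the set of files whose values contain it."""
    index = defaultdict(set)
    for name, values in files.items():
        for value in values:
            for token in re.findall(r"[A-Za-z0-9_]+", value):
                index[token.lower()].add(name)
    return index

def token_lookup(index, term):
    """One dictionary lookup, independent of how much data the files hold."""
    return index.get(term.lower(), set())

def substring_scan(files, needle):
    """Check every raw value in every file for the substring."""
    needle = needle.lower()
    return {
        name
        for name, values in files.items()
        if any(needle in value.lower() for value in values)
    }

index = build_index(files)
print(token_lookup(index, "QueryWorker"))
print(substring_scan(files, "QueryWorker"))
```

Both calls return the same file for "QueryWorker", but only the scan has to read every value. For "Processor" the token lookup returns nothing, because AdminActionProcessorWorker is a single token, which is the case where a wildcard is still needed.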
When You Still Need Wildcards
Wildcards are necessary when there are no token boundaries:
Prefix matching within a token: admin* matches "administrator"
Substring within a token: searching for "Processor" within AdminActionProcessorWorker requires *Processor* (slow)
If possible, restructure substring searches as prefix searches. AdminActionProcessor* is fast; *Processor* is slow.
Wildcard Performance
Trailing Wildcards are Fast
Trailing wildcards (value*) use index prefix matching on sorted keys. For example, user: "admin*" is resolved with an index prefix scan.
Leading Wildcards are Slow
Leading wildcards (*value) cannot use the index to narrow down the search. Scanner must directly scan all data in the time range and index specified, which can be significantly slower on large datasets.
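As a quick mental checklist, the wildcard rules in this section can be written as a small Python helper. The function name and the returned labels are hypothetical, and patterns with a wildcard in the middle are marked as not covered because this page does not describe them.

```python
def lookup_strategy(pattern):
    """Classify a search term by where its wildcard sits, following the rules on this page."""
    if "*" not in pattern:
        return "index lookup"       # complete token, no wildcard needed
    if pattern.startswith("*"):
        return "full scan"          # leading wildcard cannot use the sorted index
    if pattern.endswith("*") and pattern.count("*") == 1:
        return "index prefix scan"  # trailing wildcard walks a sorted key range
    return "not covered here"       # e.g. a wildcard in the middle of the term

for pattern in ["QueryWorker", "admin*", "AdminActionProcessor*", "*Processor*"]:
    print(pattern, "->", lookup_strategy(pattern))
```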
Avoid Exact Match with Wildcards
Scanner has two operators for column queries (see Query Syntax):
: (contains) - searches for a token within the field value (uses the index)
= (exact match) - matches the entire raw field value exactly
Both operators are fast when used correctly. The problem is combining = with wildcards, as in path = "*api*".
The : operator is fast because it looks up tokens in the index. The = operator with wildcards is slow because it must scan every raw field value to check if it matches the pattern.
If you're coming from Splunk, note that the = "*value*" pattern doesn't translate well. Use : "value" instead.
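The semantic difference between the two operators can be illustrated in Python. This sketch only mimics what each operator matches, not how Scanner evaluates it: the path values are invented, fnmatchcase stands in for wildcard matching (it also treats ? and [ ] specially), and case-sensitive matching for = is an assumption.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical raw values of a "path" column.
paths = ["/v1/api/users", "/health", "/v2/api/orders"]

def contains(value, term):
    """Mimics ':' - true when term is one of the value's tokens (case-insensitive)."""
    tokens = {token.lower() for token in re.findall(r"[A-Za-z0-9_]+", value)}
    return term.lower() in tokens

def exact(value, pattern):
    """Mimics '=' - the pattern must account for the entire raw value."""
    return fnmatchcase(value, pattern)

print([p for p in paths if contains(p, "api")])   # like path: "api"
print([p for p in paths if exact(p, "*api*")])    # like path = "*api*"
print([p for p in paths if exact(p, "api")])      # like path = "api"
```

On this data the first two filters select the same rows, which is why : "value" is the usual replacement for = "*value*". The third selects nothing, because no raw value is exactly "api".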
When to Use Each Operator
| Goal | Operator | Example | Performance |
| --- | --- | --- | --- |
| Find token within field | : | message: "error" | Fast (index lookup) |
| Exact raw field value | = | status = "success" | Fast (direct match) |
| Token prefix match | : with * | user: "admin*" | Fast (index prefix scan) |
| Raw value pattern match | = with *...* | path = "*api*" | Slow (scans all values) |
Index Selection
Queries run against a specific index using the @index= syntax (see examples in Data Exploration). If you don't specify an index, Scanner runs the query across all indexes you have permission to query.
Smaller, focused indexes are faster to query than large ones containing all your data. When Scanner executes a query, it scans data within the selected index and time range. A smaller index means less data to scan, especially for queries that can't fully leverage the token index (like leading wildcard searches).
See Index Organization for strategies on structuring your indexes.
Best Practices Summary
Understand token boundaries - you often don't need wildcards at all
Use contains (:) for "find within" queries - avoid = "*value*" patterns
Prefer trailing wildcards (value*) over leading wildcards (*value) when wildcards are needed
Use exact match (=) for exact values - it's fast for direct lookups like status = "success"
Apply filters before aggregations to reduce the amount of data scanned