# Query Syntax

### Log event structure

In Scanner, a log event is a collection of key-value pairs called **fields**. In a field, the **key** is always a string, and the **value** may be either a string or a number.

For example, an application might emit a JSON log event like this:

```json
{
  "message": "INFO - Successfully added item. item_id=817343 shopping_cart_id=1842101",
  "elapsed_ms": 79,
  "status_code": 200,
  "kubernetes": {
    "container_name": "shopping_cart_api",
    "pod_name": "app-3"
  },
  "@scnr": {
    "context_fields": "container_name,pod_name"
  }
}
```

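Scanner would represent it as the following **log event**. Nested objects are flattened into dot-separated column names, and in this example the `key=value` pairs inside `message` also appear as `message.%kv.*` fields: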

```python
message: "INFO - Successfully added item. item_id=817343 shopping_cart_id=1842101"
message.%kv.item_id: 817343
message.%kv.shopping_cart_id: 1842101
elapsed_ms: 79
status_code: 200
kubernetes.container_name: "shopping_cart_api"
kubernetes.pod_name: "app-3"
@scnr.context_fields: "container_name,pod_name"
```
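To make the mapping concrete, here is a minimal Python sketch of how the nested JSON event above could be flattened into these column names. It is not Scanner's implementation: `flatten`, `parse_scalar`, and the `key=value` regex are hypothetical, and the sketch assumes that array elements are addressed as `name[0]` (the notation used in the column wildcard examples below) and that all-digit values become numbers.

```python
import json
import re
from typing import Any, Dict, Optional

# Hypothetical rule for spotting `key=value` pairs inside string values.
KV_PATTERN = re.compile(r"([A-Za-z0-9_]+)=(\S+)")


def parse_scalar(raw: str) -> Any:
    """Treat all-digit values as numbers; keep everything else as a string."""
    return int(raw) if raw.isdigit() else raw


def flatten(value: Any, prefix: str = "", out: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Flatten nested JSON into dotted column names such as `kubernetes.pod_name`."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        # Array elements are addressed by index, e.g. `pet_kinds[0]`.
        for index, child in enumerate(value):
            flatten(child, f"{prefix}[{index}]", out)
    else:
        out[prefix] = value
        if isinstance(value, str):
            # Expose `key=value` pairs found in strings as `%kv` sub-fields.
            for key, raw in KV_PATTERN.findall(value):
                out[f"{prefix}.%kv.{key}"] = parse_scalar(raw)
    return out


sample = json.loads("""
{
  "message": "INFO - Successfully added item. item_id=817343 shopping_cart_id=1842101",
  "elapsed_ms": 79,
  "status_code": 200,
  "kubernetes": {"container_name": "shopping_cart_api", "pod_name": "app-3"},
  "@scnr": {"context_fields": "container_name,pod_name"}
}
""")

# Prints one `column: value` line per field, in the same order as the listing above.
for column, value in flatten(sample).items():
    print(f"{column}: {json.dumps(value)}")
```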

### Text queries

Type free-form text to search for matching log events. By default, search is case-insensitive for ASCII characters, so these two queries match the same events.

```python
info successfully added
INFO Successfully added
```

By default, terms are matched independently of their order, and terms with no operator between them are combined with `and`, so these three queries match the same events.

```python
info successfully added
info added successfully
added and info and successfully
```
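The following Python sketch illustrates both behaviors under simplifying assumptions: a token is a run of ASCII letters, digits, or underscores, every query term must equal some token after ASCII-only lowercasing, and the keyword `and` is simply skipped. `tokens` and `matches_text_query` are hypothetical helpers; quotes, wildcards, `or`, and `not` are not handled.

```python
import re
import string

# Assumption: a token is a run of ASCII letters, digits, or underscores.
TOKEN = re.compile(r"[A-Za-z0-9_]+")
# Lowercase ASCII letters only, leaving any other characters untouched.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def tokens(text: str) -> set:
    """Return the set of ASCII-lowercased tokens in `text`."""
    return {t.translate(ASCII_LOWER) for t in TOKEN.findall(text)}


def matches_text_query(query: str, text: str) -> bool:
    """Each term must equal some token; term order does not matter."""
    available = tokens(text)
    terms = [t.translate(ASCII_LOWER) for t in query.split()]
    # The explicit `and` keyword behaves like the default operator, so skip it.
    return all(term in available for term in terms if term != "and")


line = "INFO - Successfully added item. item_id=817343 shopping_cart_id=1842101"
print(matches_text_query("added and info and successfully", line))  # True
print(matches_text_query("add", line))  # False: "add" is not a whole token
```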

Search terms only match full tokens by default (see [#token-boundaries](#token-boundaries "mention")). Wildcards (see below) can be used for subtoken matches.

Bare (unquoted) strings must begin with a letter or one of `*`, `@`, `%`, or `_`, and cannot include whitespace or any of the following characters: `` :()"'<>=|,~{}!#` ``. They also cannot be reserved keywords (see [#reserved-keywords](#reserved-keywords "mention")). Use single quotes (`'`) if you need to match whitespace, reserved characters, or reserved keywords.

```python
'info - item not added'
'info - successfully added item and committed transaction'
```
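A hypothetical Python helper that applies the bare-string rules above can help decide when a value needs quotes. It assumes that `str.isalpha()` (which accepts any Unicode letter) is what "letter" means here and that reserved keywords are matched case-insensitively; this page confirms neither detail.

```python
RESERVED_KEYWORDS = {"and", "or", "not", "let", "from", "in"}
# Characters that may not appear anywhere in a bare string.
DISALLOWED = set(":()\"'<>=|,~{}!#`")


def is_valid_bare_string(s: str) -> bool:
    """Check the bare-string rules; False means the value needs quotes."""
    if not s:
        return False
    first = s[0]
    if not (first.isalpha() or first in "*@%_"):
        return False
    if any(ch.isspace() or ch in DISALLOWED for ch in s):
        return False
    # Assumption: keywords are reserved regardless of case.
    return s.lower() not in RESERVED_KEYWORDS


for candidate in ["info", "app-*", "item not added", "item_id=817343", "and"]:
    print(f"{candidate!r}: bare OK = {is_valid_bare_string(candidate)}")
```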

Use double quotes (`"`) for exact, case-sensitive matching.

```python
"item_id=817343"
"INFO - Successfully added item"
```

Quoted strings support escape sequences. See [#escape-sequences-for-strings](#escape-sequences-for-strings "mention") for a comprehensive list.

Use `*` for wildcard searches. You can use `\*` to match the actual asterisk character instead.

```python
app-*
*@protonmail.com
'andrew j*son'
"This sentence contains an actual asterisk: \*"
```
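As a rough model of the escaping rule, this Python sketch converts a search pattern into a regular expression, turning `*` into `.*` and `\*` into a literal asterisk. `wildcard_to_regex` is a hypothetical name; the sketch ignores token boundaries, case sensitivity, and every other escape sequence.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate `*` to "any characters" and `\\*` to a literal asterisk."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("\\*", i):
            parts.append(re.escape("*"))  # escaped: match a real asterisk
            i += 2
        elif pattern[i] == "*":
            parts.append(".*")  # wildcard: any run of characters
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return "".join(parts)


print(bool(re.fullmatch(wildcard_to_regex("app-*"), "app-3")))  # True
print(bool(re.fullmatch(wildcard_to_regex(r"asterisk: \*"), "asterisk: !")))  # False
```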

> **Note:** Leading wildcards like `*value` are slow because they cannot use the index and must scan all data directly. Prefer trailing wildcards (`value*`) when possible. See [Wildcard Performance](https://docs.scanner.dev/scanner/using-scanner-complete-feature-reference/tokens-and-query-performance#wildcard-performance) for details.

Prefix a string with `r` to make it a raw string. Raw strings do not support escape sequences or wildcards, so backslashes and asterisks are matched literally. They also perform case-sensitive matching.

```rust
r"line_ending=\n"
r"*** CAUTION ***"
```

To include double quote characters (`"`) in a raw string, place a `#` immediately outside each delimiting quote, as in `r#"..."#`. If you need to include `#` characters inside the string, use more of them (up to 4) on each side:

```rust
r#"{"message": "OK", "code": 200}"#
r####"### ERROR: Authentication failed for user="admin" ###"####
```
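The Python sketch below parses the raw string forms shown above and returns their contents untouched. It assumes, as with Rust raw strings, that the body ends at the first `"` followed by the same number of `#` characters as the opening delimiter; this page does not spell out that rule, and `parse_raw_string` is a hypothetical helper.

```python
MAX_HASHES = 4  # this page allows up to four `#` characters on each side


def parse_raw_string(literal: str) -> str:
    """Return the body of r"...", r#"..."#, and so on, with no escape processing."""
    if not literal.startswith("r"):
        raise ValueError("a raw string starts with 'r'")
    rest = literal[1:]
    hashes = len(rest) - len(rest.lstrip("#"))
    if hashes > MAX_HASHES:
        raise ValueError(f"at most {MAX_HASHES} '#' characters are allowed")
    opening = "#" * hashes + '"'
    closing = '"' + "#" * hashes
    if not rest.startswith(opening):
        raise ValueError("expected a double quote after the '#' characters")
    # Assumption: the body ends at the first `"` followed by the same number of `#`s.
    end = rest.find(closing, len(opening))
    if end == -1 or end + len(closing) != len(rest):
        raise ValueError("raw string is not terminated correctly")
    return rest[len(opening):end]


print(parse_raw_string(r'r"line_ending=\n"'))  # line_ending=\n (backslash kept)
print(parse_raw_string('r#"{"message": "OK", "code": 200}"#'))
```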

### Column queries

Use `column: value` to search for a `column` that *contains* `value`.

```python
message: info added
message: 'info - successfully added item'
message: "INFO - Successfully added item"
kubernetes.pod_name: app-*
email: *@protonmail.com
current_president: 'andrew j*son'
```

As with text queries, the `:` operator matches only whole tokens by default (see [#token-boundaries](#token-boundaries "mention")). Use the `*` wildcard for subtoken matching.

Use `column = value` to search for a `column` that is *exactly* `value`.

```python
name = al
# matches: {name: "Al"}, {name: "al"}
# but NOT: {name: "Big Al"}

name = "Al"
# matches: {name: "Al"}
# but NOT: {name: "al"}, {name: "Big Al"}

email = "*@protonmail.co"
# matches: {email: "al@protonmail.co"}, {email: "rob@protonmail.co"}
# but NOT: {email: "jon@protonmail.com"}
```
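A minimal Python sketch of the `=` semantics in these examples: the whole field value must match, `*` matches any run of characters, and ASCII case is ignored unless the value was double-quoted. `column_equals` and its `case_sensitive` flag are hypothetical stand-ins for bare versus double-quoted values.

```python
import re


def column_equals(field: str, value: str, case_sensitive: bool = False) -> bool:
    """Model `column = value`: the entire field must match `value`."""
    # Escape everything except `*`, which matches any run of characters.
    pattern = ".*".join(re.escape(piece) for piece in value.split("*"))
    # Bare values ignore ASCII case; double-quoted values (case_sensitive=True) do not.
    flags = 0 if case_sensitive else re.IGNORECASE | re.ASCII
    return re.fullmatch(pattern, field, flags) is not None


print(column_equals("Al", "al"))  # True:  name = al
print(column_equals("Big Al", "al"))  # False: "al" is only part of the value
print(column_equals("al", "Al", case_sensitive=True))  # False: name = "Al"
```

Under this model, `column = *` becomes the pattern `.*`, which accepts any value, so it only tests whether the field exists.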

To check only whether a column exists, use `column: *` or `column = *`.

As with string values, bare (unquoted) column names must begin with a letter or one of `*`, `@`, `%`, or `_`, and cannot include whitespace or any of the following characters: `` :()"'<>=|,~{}!` ``. They also cannot be reserved keywords (see [#reserved-keywords](#reserved-keywords "mention")).

Use backticks (`` ` ``) to quote column names that contain spaces or other disallowed characters.

```sql
`cat breed` = "Domestic shorthair"
```

Quoted column names support escape sequences. See [#escape-sequences-for-strings](#escape-sequences-for-strings "mention") for a comprehensive list.

Use `*` or `**` as a wildcard in column names. `*` matches any characters except `.`, `[`, and `]`, so it stays within a single path segment or array index; `**` matches any characters, including those separators. You can use `\*` to match a literal asterisk instead.

```python
*name = "Jackson"
# matches: {fname: "Andrew", lname: "Jackson"}, {fname: "Jackson", lname: "Pollock"}
# but NOT: {name_first: "Janet", name_last: "Jackson"}

request.*.status = 500
# matches: {request.first_part.status: 200, request.second_part.status: 500}
# but NOT: {request.first_part.connection.status: 500}

request.**.status = 500
# matches: {request.first_part.status: 200, request.second_part.status: 500}
# matches: {request.first_part.connection.status: 500}

pet_kinds[*]: "fish"
# matches: {pet_kinds[0]: "cat", pet_kinds[1]: "dog", pet_kinds[2]: "fish"}
# but NOT: {pet_kinds[0].preferred_foods[0]: "fish"}

pet_kinds[**]: "fish"
# matches: {pet_kinds[0]: "cat", pet_kinds[1]: "dog", pet_kinds[2]: "fish"}
# matches: {pet_kinds[0].preferred_foods[0]: "fish"}
```
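One way to model these rules, sketched in Python: translate `**` to `.*` and `*` to `[^.\[\]]*`, then require the whole flattened column name to match. The helpers `column_glob_to_regex` and `any_column_equals` are hypothetical, escaped `\*` is not handled, and values are compared with plain equality for brevity.

```python
import re


def column_glob_to_regex(pattern: str):
    """Compile a column-name pattern: `**` spans separators, `*` does not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # any characters, including `.`, `[`, `]`
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^.\[\]]*")  # anything except `.`, `[`, `]`
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def any_column_equals(pattern: str, event: dict, value) -> bool:
    """True if some column whose full name matches `pattern` holds `value`."""
    regex = column_glob_to_regex(pattern)
    return any(regex.fullmatch(column) and v == value for column, v in event.items())


event = {"request.first_part.connection.status": 500}
print(any_column_equals("request.*.status", event, 500))  # False
print(any_column_equals("request.**.status", event, 500))  # True
```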

### Number queries

If your log events have number fields, you can search for exact values with `:` or `=`, or for ranges with `<`, `<=`, `>`, and `>=`.

```python
elapsed_ms: 79
elapsed_ms = 79
elapsed_ms <= 100
elapsed_ms > 100
```
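A small Python sketch of how such a filter could be evaluated against a flattened event. `number_filter` is hypothetical, and it assumes that a missing or non-numeric field never satisfies the comparison.

```python
import operator

# Operators used in this page's examples; `:` and `=` both mean equality.
COMPARATORS = {
    ":": operator.eq,
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def number_filter(event: dict, column: str, op: str, literal: float) -> bool:
    """Evaluate `column op literal` against a flattened event."""
    value = event.get(column)
    # Assumption: a missing or non-numeric field never satisfies the filter.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return COMPARATORS[op](value, literal)


event = {"elapsed_ms": 79, "status_code": 200}
print(number_filter(event, "elapsed_ms", "<=", 100))  # True
print(number_filter(event, "elapsed_ms", ">", 100))  # False
```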

### Boolean queries

Scanner supports boolean queries using `and`, `or`, and `not`. These are case-insensitive.

```python
kubernetes.container_name: "shopping_cart_api"
and elapsed_ms > 100 and elapsed_ms < 10000
and not status_code >= 400
```

You can use parentheses to specify order of operations.

```python
(message.%kv.item_id: 817343 or message.%kv.item_id: 25134)
and elapsed_ms > 50
```

Without parentheses, `not` has the highest precedence, followed by `and` and then `or`, so these two queries are equivalent.

```python
elapsed_ms > 10 and not status_code >= 400 or message.%kv.item_id: 817343

(elapsed_ms > 10 and (not status_code >= 400)) or message.%kv.item_id: 817343
```

If no operator appears between two query terms, Scanner assumes `and`, so the following two queries are equivalent.

```python
kubernetes.container_name: "shopping_cart_api" and elapsed_ms > 100

kubernetes.container_name: "shopping_cart_api" elapsed_ms > 100
```
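The precedence rules and the implicit `and` can be modeled with a small recursive-descent parser. This Python sketch is an illustration, not Scanner's grammar: the `Parser` class is hypothetical, each term is a single word standing in for a filter such as `elapsed_ms > 10`, and the result is a tree of nested tuples.

```python
import re

# Simplified lexer: parentheses, or any run of non-space, non-paren characters.
LEXER = re.compile(r"\(|\)|[^\s()]+")
KEYWORDS = {"and", "or", "not"}


class Parser:
    def __init__(self, text: str):
        self.tokens = LEXER.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def keyword(self, word: str) -> bool:
        token = self.peek()
        return token is not None and token.lower() == word  # case-insensitive

    def parse(self):
        tree = self.or_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return tree

    def or_expr(self):  # lowest precedence
        left = self.and_expr()
        while self.keyword("or"):
            self.take()
            left = ("or", left, self.and_expr())
        return left

    def and_expr(self):
        left = self.not_expr()
        while True:
            if self.keyword("and"):
                self.take()
            elif self.peek() in (None, ")") or self.keyword("or"):
                return left
            # Explicit `and`, or no operator at all (implicit `and`).
            left = ("and", left, self.not_expr())

    def not_expr(self):  # highest precedence
        if self.keyword("not"):
            self.take()
            return ("not", self.not_expr())
        return self.atom()

    def atom(self):
        token = self.take()
        if token == "(":
            tree = self.or_expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if token is None or token == ")" or token.lower() in KEYWORDS:
            raise SyntaxError(f"unexpected {token!r}")
        return token


print(Parser("a and not b or c").parse())
# ('or', ('and', 'a', ('not', 'b')), 'c')
```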

Boolean operators can also be used *inside* a column filter with the `:` or `=` operator. In that case the column filter distributes over each operand, so these queries are equivalent.

```python
stdout: ("hello" and 'world')

stdout: "hello" and stdout: 'world'
```

Inside a column filter, the default operator is `or` rather than `and`, so the following queries are equivalent.

```python
message.%kv.item_id = (817343 or 25134 or 55535)

message.%kv.item_id = (817343 25134 55535)

message.%kv.item_id = 817343
or message.%kv.item_id = 25134
or message.%kv.item_id = 55535
```
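The Python sketch below models this distribution for a flat, parenthesized list of values: adjacent values are joined with `or`, and the result is rewritten as explicit column filters. `expand_column_filter` is hypothetical; it assumes `and` still binds more tightly than `or` inside a column filter, and it does not handle nested parentheses or `not`.

```python
import re

# Quoted strings (with backslash escapes), or runs of non-space characters.
VALUE_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'  # double-quoted string
    r"|'(?:\\.|[^'\\])*'"  # single-quoted string
    r"|[^\s()]+"  # bare word or number
)


def expand_column_filter(column: str, op: str, values: str) -> str:
    """Rewrite `column op (v1 v2 and v3 ...)` as explicit per-value filters."""
    inner = values.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    groups = []  # values inside a group are joined by `and`; groups by `or`
    join_with_and = False
    for token in VALUE_TOKEN.findall(inner):
        keyword = token.lower()
        if keyword in ("and", "or"):
            join_with_and = keyword == "and"
            continue
        if join_with_and and groups:
            groups[-1].append(token)
        else:
            # Explicit `or`, or no operator at all: the default inside a filter.
            groups.append([token])
        join_with_and = False
    sep = ":" if op == ":" else f" {op}"
    return " or ".join(
        " and ".join(f"{column}{sep} {value}" for value in group) for group in groups
    )


print(expand_column_filter("message.%kv.item_id", "=", "(817343 25134 55535)"))
```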

### Additional Details

#### Token Boundaries

Scanner splits field values into tokens. A token is a run of alphanumeric characters and underscores; every other character (spaces, hyphens, dots, and so on) is a boundary between tokens.

A query match always starts at the beginning of a token and ends at the end of a token. Wildcards (`*`) can span token boundaries, so they can be used for subtoken matching.

* `al` matches "Al Sharpton" but not "Walt Whitman" or "Alan Turing"
* `al*` matches "Al Sharpton" and "Alan Turing" but not "Walt Whitman"
* `*al*` matches all of the above

See [Understanding Tokens and Query Performance](https://docs.scanner.dev/scanner/using-scanner-complete-feature-reference/querying-and-analysis/tokens-and-query-performance) for details on how tokenization affects query efficiency.
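The bullet examples above can be reproduced with a short Python sketch that requires a match to start and end on token boundaries while letting `*` match anything, including boundary characters. It assumes ASCII token characters (`A-Za-z0-9_`) and ASCII-only case folding; `token_match` is a hypothetical helper.

```python
import re

TOKEN_CHARS = "A-Za-z0-9_"  # assumed token alphabet (ASCII only)


def token_match(pattern: str, text: str) -> bool:
    """True if `pattern` matches a span of `text` starting and ending on token boundaries."""
    # `*` may match anything, including spaces and punctuation between tokens.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    regex = rf"(?<![{TOKEN_CHARS}]){body}(?![{TOKEN_CHARS}])"
    return re.search(regex, text, re.IGNORECASE | re.ASCII) is not None


for query in ["al", "al*", "*al*"]:
    hits = [n for n in ["Al Sharpton", "Walt Whitman", "Alan Turing"] if token_match(query, n)]
    print(query, hits)
```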

#### Escape Sequences for Strings

You can use escape sequences for certain characters. They work in all quoted strings, including backtick-quoted column names, but not in raw strings.

<table><thead><tr><th width="257">Escape sequence</th><th>Character</th></tr></thead><tbody><tr><td><code>\"</code></td><td>double quote <code>"</code></td></tr><tr><td><code>\'</code></td><td>single quote <code>'</code></td></tr><tr><td><code>\`</code></td><td>backtick <code>`</code></td></tr><tr><td><code>\*</code></td><td>asterisk <code>*</code></td></tr><tr><td><code>\\</code></td><td>backslash <code>\</code></td></tr><tr><td><code>\/</code></td><td>forward slash <code>/</code></td></tr><tr><td><code>\b</code></td><td>backspace <code>U+0008</code></td></tr><tr><td><code>\f</code></td><td>form feed <code>U+000C</code></td></tr><tr><td><code>\n</code></td><td>line feed <code>U+000A</code></td></tr><tr><td><code>\r</code></td><td>carriage return <code>U+000D</code></td></tr><tr><td><code>\t</code></td><td>horizontal tab <code>U+0009</code></td></tr><tr><td><code>\uXXXX</code></td><td>Unicode character <code>U+XXXX</code></td></tr></tbody></table>
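A Python sketch of decoding the escape sequences in this table. `decode_escapes` is hypothetical; it assumes that an unknown or truncated escape is an error, and it discards the distinction between a wildcard `*` and an escaped `\*`, which a real query parser would need to keep.

```python
# Single-character escapes from the table above.
SIMPLE_ESCAPES = {
    '"': '"',
    "'": "'",
    "`": "`",
    "*": "*",
    "\\": "\\",
    "/": "/",
    "b": "\b",
    "f": "\f",
    "n": "\n",
    "r": "\r",
    "t": "\t",
}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_escapes(body: str) -> str:
    """Resolve escape sequences in the body of a quoted string."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        code = body[i + 1 : i + 2]
        digits = body[i + 2 : i + 6]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code == "u" and len(digits) == 4 and set(digits) <= HEX_DIGITS:
            out.append(chr(int(digits, 16)))  # \uXXXX -> U+XXXX
            i += 6
        else:
            # Assumption: unknown or truncated escapes are rejected.
            raise ValueError(f"invalid escape sequence at index {i}")
    return "".join(out)


print(decode_escapes(r'user=\"admin\"'))  # user="admin"
```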

#### Reserved Keywords

The following keywords are reserved in filters: `and`, `or`, `not`, `let`, `from`, `in`. Use quotes if you need to search for them as strings, and backticks if you need to use them as column names.
