# Regular Expressions in Index Rules

Regular expressions used in Scanner data ingestion (such as the "Additional Regex" filter for AWS S3 keys, or timestamp extraction) support the following standard syntax:

<table><thead><tr><th width="83">Syntax</th><th width="279">Matches</th><th width="113">Syntax</th><th>Matches</th></tr></thead><tbody><tr><td><code>.</code></td><td>any non-newline character</td><td><code>(a|z)</code></td><td><code>a</code> or <code>z</code></td></tr><tr><td><code>^</code></td><td>start of line</td><td><code>[az]</code></td><td><code>a</code> or <code>z</code></td></tr><tr><td><code>$</code></td><td>end of line</td><td><code>[^az]</code></td><td>not <code>a</code> or <code>z</code></td></tr><tr><td><code>\b</code></td><td>word boundary</td><td><code>[a-z]</code></td><td><code>a</code> through <code>z</code></td></tr><tr><td><code>\B</code></td><td>non-word boundary</td><td><code>(foo)</code></td><td>capture <code>foo</code></td></tr><tr><td><code>\A</code></td><td>start of subject (usually the same as <code>^</code>)</td><td><code>a?</code></td><td>0 or 1 <code>a</code>s</td></tr><tr><td><code>\z</code></td><td>end of subject (usually the same as <code>$</code>)</td><td><code>a*</code></td><td>0 or more <code>a</code>s</td></tr><tr><td><code>\d</code></td><td>decimal digit</td><td><code>a+</code></td><td>1 or more <code>a</code>s</td></tr><tr><td><code>\D</code></td><td>non-digit character</td><td><code>a{3}</code></td><td>exactly 3 <code>a</code>s</td></tr><tr><td><code>\s</code></td><td>whitespace</td><td><code>a{3,}</code></td><td>3 or more <code>a</code>s</td></tr><tr><td><code>\S</code></td><td>non-whitespace</td><td><code>a{3,5}</code></td><td>between 3 and 5 <code>a</code>s (inclusive)</td></tr><tr><td><code>\w</code></td><td>word character</td><td></td><td></td></tr><tr><td><code>\W</code></td><td>non-word character</td><td></td><td></td></tr></tbody></table>
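
As a rough illustration of how these constructs combine, the sketch below uses Python's `re` module as a stand-in; Scanner's own regex engine may differ in details. The S3 key layout, the sample log line, and the names `KEY_PATTERN` and `TIMESTAMP_PATTERN` are hypothetical and exist only for this example.

```python
import re

# Hypothetical S3 key filter: keys directly under logs/<yyyy>/<mm>/
# whose file name is lowercase letters/underscores ending in .json or .log.
KEY_PATTERN = re.compile(r"^logs/\d{4}/\d{2}/[a-z_]+\.(json|log)$")

# Hypothetical timestamp pattern: captures an ISO-8601-like timestamp after "ts=".
TIMESTAMP_PATTERN = re.compile(r"ts=(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")

keys = [
    "logs/2024/05/auth_events.json",      # matches
    "logs/2024/05/auth_events.csv",       # wrong extension
    "tmp/logs/2024/05/auth_events.json",  # rejected by the ^ anchor
]
matching_keys = [k for k in keys if KEY_PATTERN.search(k)]

line = "level=info ts=2024-05-01T12:34:56Z msg=login"
m = TIMESTAMP_PATTERN.search(line)
extracted = m.group(1) if m else None

print(matching_keys)
print(extracted)
```

Anchoring with `^` and `$` keeps the key filter from matching in the middle of a longer path, which is usually what you want for prefix-style S3 layouts.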

All regular expressions are case-sensitive and Unicode-aware. For example, `\s` matches Unicode whitespace characters as well as ASCII ones.
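
The two properties above can be sketched with Python's `re` module, which happens to behave the same way for string patterns. This only illustrates the semantics and is not Scanner's engine; the sample strings are made up.

```python
import re

# Case sensitivity: a lowercase pattern does not match uppercase text.
case_mismatch = re.search(r"error", "ERROR: disk full")
case_match = re.search(r"ERROR", "ERROR: disk full")

# Unicode awareness: \s matches non-ASCII whitespace too,
# e.g. NO-BREAK SPACE (U+00A0) and EM SPACE (U+2003).
nbsp_match = re.search(r"a\sb", "a\u00a0b")
em_space_match = re.search(r"a\sb", "a\u2003b")
ascii_space_match = re.search(r"a\sb", "a b")

print(case_mismatch is None, case_match is not None)
print(nbsp_match is not None, em_space_match is not None)
```

If you need to accept either case, spell it out in the pattern, for example `[Ee]rror`.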

## Limitations

Certain features of regular expressions aren't supported when they're used in [Index Rules](/scanner/using-scanner-complete-feature-reference/data-ingestion/sources/custom-logs-aws-s3.md). These are, specifically:

* Lookarounds (i.e. lookahead and lookbehind), both negative and positive.
  * Positive lookaround can usually be matched directly instead. E.g. `foo(?=bar)` could just be matched as `foobar`.
  * Negative lookaround *can* usually be matched as a normal regex, but it can be tricky.
    * E.g. `pre_(?!no/)[^/]*/` can be matched as `pre_([^/]?|[^n/][^/]|[^/][^o/]|[^/]{3,})/`.
      * Because complex regexes like this are hard to maintain, we recommend positively matching the specific known items instead, e.g. `pre_(yes|yeah|sure)/`.
* Backreferences.
  * Due to the nature of backreferences (i.e. that they are non-regular), it isn't generally possible to replicate the same match without them.
    * When possible, we recommend enumerating the items instead. E.g. instead of trying to match all folders with `foob(a+)rb\1z/`, you can enumerate the folders you know exist, like `foo(barbaz|baarbaaz)/`.
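
The rewrites in the list above can be sanity-checked outside Scanner. The sketch below uses Python's `re` module, which does support lookarounds and backreferences, purely to compare the unsupported forms against their replacements on a few sample strings. It assumes the negative-lookahead example means "a `pre_<name>/` folder whose name is anything except exactly `no`", written here as `pre_(?!no/)[^/]*/`, and writes the backreference to the first capture group as `\1`. The sample folder names are invented.

```python
import re

# Unsupported in Index Rules; used here only as a reference for comparison.
with_lookahead = re.compile(r"pre_(?!no/)[^/]*/")
# Lookaround-free rewrite: name of length 0-1, length 2 but not "no", or length 3+.
without_lookahead = re.compile(r"pre_([^/]?|[^n/][^/]|[^/][^o/]|[^/]{3,})/")

samples = ["pre_/", "pre_n/", "pre_no/", "pre_on/", "pre_yes/", "pre_nope/"]
lookahead_hits = [s for s in samples if with_lookahead.search(s)]
rewrite_hits = [s for s in samples if without_lookahead.search(s)]

# Backreference form (also unsupported) versus a plain enumeration.
with_backref = re.compile(r"foob(a+)rb\1z/")
enumerated = re.compile(r"foo(barbaz|baarbaaz)/")

folders = ["foobarbaz/", "foobaarbaaz/", "foobaarbaz/", "foobarbaaz/"]
backref_hits = [f for f in folders if with_backref.search(f)]
enumerated_hits = [f for f in folders if enumerated.search(f)]

print(lookahead_hits == rewrite_hits)
print(backref_hits == enumerated_hits)
```

Note that the enumeration only agrees with the backreference on the folders it lists: a folder such as `foobaaarbaaaz/` would match the backreference form but not the enumerated one, so the list has to be kept in step with the folders that actually exist.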

These features are unsupported because they allow the construction of [extremely slow (exponential-time) regexes](https://en.wikipedia.org/wiki/ReDoS) that are hard for Scanner to detect.

