# Log Event Structure

When Scanner ingests a log event, it converts the event into a flat table of **columns**: a map from column name to value. Queries match against these columns, so understanding how raw events become columns explains exactly what your queries will (and won't) match.

At a high level, ingestion is a four-stage pipeline:

```
raw file ──parse──▶ JSON object ──transform──▶ JSON object ──extract timestamp──▶ timestamped JSON object ──flatten──▶ columns
```

1. **Parse** — each event in the source file becomes a JSON object, according to its [ingestion format](#ingestion-formats).
2. **Transform** — the index rule's [transformation steps](#transformation-steps) run on the JSON object, producing another JSON object (or multiple JSON objects for some transformations).
3. **Extract timestamp** — the [event timestamp](#timestamp-extraction) is extracted from each JSON object based on the index rule's timestamp extractors.
4. **Flatten** — the final JSON object is [flattened](#flattening) into the columns queries run against, including the [reserved system fields](#reserved-system-fields) Scanner populates.

## Ingestion Formats

Each ingested file is first parsed into one **JSON object per log event**. The file format is set by the index rule configuration; it is not auto-detected from file content or extension. Each supported format maps to a JSON object differently, as described below.

### JSON Logs

JSON files yield one event per JSON object; the object is used as-is. Newline-delimited objects (JSONL), JSON arrays, and concatenated objects are all accepted, provided the index rule's format setting matches the file's layout.
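
As a rough illustration, a single streaming decoder can split all three layouts into events. This Python sketch is not Scanner's parser: it accepts every layout at once, whereas Scanner chooses the layout from the index rule configuration, and `iter_json_events` is a hypothetical name.

```python
import json
from typing import Any, Iterator


def iter_json_events(text: str) -> Iterator[Any]:
    """Yield one event per JSON object from JSONL, a JSON array, or concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip newlines/spaces between values.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            # A top-level array contributes one event per element.
            yield from value
        else:
            yield value
```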

### Plaintext Logs

Plaintext (unstructured) log files are ingested one event per line. Each line becomes the JSON object `{"message": "<line>"}`, carrying the raw text verbatim:

```
Accepted publickey for deploy from 203.0.113.5 port 52144 ssh2
```

produces the event:

```json
{ "message": "Accepted publickey for deploy from 203.0.113.5 port 52144 ssh2" }
```

A plaintext line carries no other structure — to extract individual fields or an event timestamp from the text, use a [data transformation](/scanner/using-scanner-complete-feature-reference/data-transformation-and-enrichment/data-transformations.md); the extracted fields then become columns like any other.
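
The plaintext mapping is simple enough to sketch directly. This Python version is illustrative only: how Scanner treats blank lines and `\r\n` line endings is not specified above, so the choices noted in the comments are assumptions, and `iter_plaintext_events` is a hypothetical name.

```python
from typing import Iterator


def iter_plaintext_events(text: str) -> Iterator[dict[str, str]]:
    """Yield {"message": line} for each line of a plaintext log."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # assumption: a trailing newline does not start an extra event
    for line in lines:
        # Assumption: a CRLF ending is stripped; everything else is kept verbatim.
        yield {"message": line.removesuffix("\r")}
```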

### Tabular Formats (CSV and Parquet)

**CSV**: each row becomes one event — the flat JSON object `{"<header>": "<value>", ...}`, with the header names as keys and every value a string.
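
A comparable sketch for CSV uses Python's standard `csv.DictReader`, which already returns each row as a mapping from header name to string value. Quoting, delimiter, and header handling in Scanner's own parser may differ; `iter_csv_events` is a hypothetical name.

```python
import csv
import io
from typing import Iterator


def iter_csv_events(text: str) -> Iterator[dict[str, str]]:
    """Yield one flat {header: value} event per CSV row; every value stays a string."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield dict(row)
```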

**Parquet**: each record becomes one event — the record's equivalent JSON structure, including any nesting. Field names derive from the Parquet schema.

Scanner also ships built-in parsers for several vendor-specific formats (AWS CloudTrail JSON, ELB / CloudFront / S3 server access logs, and others) that produce fixed, format-defined event shapes.

## Transformation Steps

After parsing, each event runs through the index rule's [data transformations](/scanner/using-scanner-complete-feature-reference/data-transformation-and-enrichment/data-transformations.md) (VRL). Whatever the source format, the event is treated as a nested JSON object: transformations can parse, reshape, enrich, drop, or split it (a transformation that outputs an array fans out into one event per element). The output is again a JSON object, which proceeds to flattening.
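
Real transformations are written in VRL; the Python below only models the control flow described above, under two labeled assumptions: a step returning `None` drops the event, and a step returning a list fans out into one event per element. `apply_transforms` and the steps `split_records` and `tag_environment` are hypothetical.

```python
from typing import Any, Callable, Optional, Union

Event = dict[str, Any]
# Assumption: a step returns one event, a list of events (fan-out), or None (drop).
Step = Callable[[Event], Optional[Union[Event, list[Event]]]]


def apply_transforms(event: Event, steps: list[Step]) -> list[Event]:
    """Run each step, in order, over every event produced so far."""
    events = [event]
    for step in steps:
        next_events: list[Event] = []
        for current in events:
            out = step(current)
            if out is None:
                continue  # dropped
            if isinstance(out, list):
                next_events.extend(out)  # fan out: one event per element
            else:
                next_events.append(out)
        events = next_events
    return events


# Hypothetical step: split a batch wrapper into one event per record.
def split_records(event: Event) -> Union[Event, list[Event]]:
    return event["Records"] if "Records" in event else event


# Hypothetical step: enrich every event with a static field.
def tag_environment(event: Event) -> Event:
    return {**event, "env": "prod"}
```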

## Timestamp Extraction

Every log event has exactly one **timestamp**: the event's time, at nanosecond precision. It is determined by the index rule's [timestamp extractors](/scanner/using-scanner-complete-feature-reference/data-transformation-and-enrichment/data-transformations.md#extract-timestamp), which are applied to the transformed JSON object in order — the first one to yield a timestamp wins. If no extractor succeeds, the event inherits the previous event's timestamp (or the source file's modification time, if no event in the file has yielded one yet).
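
The ordering and fallback rules can be modeled in a few lines of Python. The extractors here are hypothetical stand-ins (real timestamp extractors are configured on the index rule, not written in Python), and they read integer epoch fields so the nanosecond arithmetic stays exact.

```python
from typing import Any, Callable, Optional

Event = dict[str, Any]
Extractor = Callable[[Event], Optional[int]]  # returns epoch nanoseconds or None


def assign_timestamps(events: list[Event], extractors: list[Extractor], file_mtime_ns: int) -> list[int]:
    """Pick one timestamp per event: the first successful extractor wins, else inherit."""
    timestamps: list[int] = []
    previous = file_mtime_ns  # fallback until some event in the file yields a timestamp
    for event in events:
        ts: Optional[int] = None
        for extract in extractors:
            ts = extract(event)
            if ts is not None:
                break  # first extractor to yield a timestamp wins
        if ts is None:
            ts = previous  # inherit the previous event's timestamp (or the file mtime)
        timestamps.append(ts)
        previous = ts
    return timestamps


# Hypothetical extractors reading integer epoch fields.
def from_time_ns(event: Event) -> Optional[int]:
    value = event.get("time_ns")
    return value if isinstance(value, int) else None


def from_epoch_ms(event: Event) -> Optional[int]:
    value = event.get("epoch_ms")
    return value * 1_000_000 if isinstance(value, int) else None
```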

Timestamps are also made **unique within a file**: when multiple events carry the same timestamp, each subsequent one is nudged forward by a nanosecond. The timestamp Scanner records can therefore differ from the source data's by a few nanoseconds.
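
One way to model the uniqueness rule, assuming a nudged timestamp keeps moving forward one nanosecond at a time until it no longer collides with any timestamp already assigned in the same file. The text above does not say how a collision with an already-nudged value is resolved, so treat that case as an assumption; `make_unique_timestamps` is a hypothetical name.

```python
def make_unique_timestamps(timestamps: list[int]) -> list[int]:
    """Nudge repeated nanosecond timestamps forward so each is unique within one file."""
    used: set[int] = set()
    result: list[int] = []
    for ts in timestamps:
        # Assumption: keep nudging by 1 ns until the value is unused in this file.
        while ts in used:
            ts += 1
        used.add(ts)
        result.append(ts)
    return result
```

For example, three events stamped `100` followed by one stamped `200` would be recorded as `100`, `101`, `102`, `200`.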

The timestamp is the basis for the event's time-related behavior:

* The [reserved system fields](#reserved-system-fields) `@scnr.datetime` and `@scnr.time_ns` are derived from it.
* It determines where the event falls for time filtering, that is, which time-range queries can match it.
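
A sketch of how the two documented time fields relate to the single nanosecond timestamp. The exact RFC 3339 rendering Scanner emits (number of fractional digits, `Z` versus `+00:00`) is an assumption here, as is the half-open boundary used by `in_time_range`; both functions are hypothetical.

```python
from datetime import datetime, timezone


def system_time_fields(ts_ns: int) -> dict[str, object]:
    """Derive @scnr.datetime and @scnr.time_ns from one nanosecond timestamp."""
    seconds, nanos = divmod(ts_ns, 1_000_000_000)
    # datetime only holds microseconds, so format whole seconds and append all 9 digits.
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return {
        "@scnr.datetime": f"{base}.{nanos:09d}Z",  # assumed formatting
        "@scnr.time_ns": ts_ns,
    }


def in_time_range(ts_ns: int, start_ns: int, end_ns: int) -> bool:
    """Assumed half-open range check: start inclusive, end exclusive."""
    return start_ns <= ts_ns < end_ns
```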

## Flattening

The final JSON object is flattened into table columns by joining the path segments of each leaf value:

* Nested object keys are joined with `.` — `{"userIdentity": {"arn": "..."}}` produces the column `userIdentity.arn`.
* Array elements use `[n]` index segments — `{"tags": ["a", "b"]}` produces the columns `tags[0]` and `tags[1]`.
* A key containing any of the path control characters `.[]"` is **quoted**: the key is wrapped in double quotes, with any interior `"` doubled (`"` → `""`). So `{"a.b": 1}` produces the column `"a.b"` (with literal quotes) — a different column from the `a.b` produced by `{"a": {"b": 1}}` — and a key consisting of a single `"` character produces the column `""""`.

For example, this event:

```json
{
  "eventName": "ConsoleLogin",
  "userIdentity": {
    "type": "Root",
    "arn": "arn:aws:iam::123456789012:root",
    "session.issuer": "arn:aws:iam::123456789012:role/Admin"
  },
  "resources": [{ "accountId": "123456789012" }],
  "client.ip": "203.0.113.5"
}
```

flattens to the columns:

| Column                          | Value                                  |
| ------------------------------- | -------------------------------------- |
| `eventName`                     | `ConsoleLogin`                         |
| `userIdentity.type`             | `Root`                                 |
| `userIdentity.arn`              | `arn:aws:iam::123456789012:root`       |
| `userIdentity."session.issuer"` | `arn:aws:iam::123456789012:role/Admin` |
| `resources[0].accountId`        | `123456789012`                         |
| `"client.ip"`                   | `203.0.113.5`                          |

Note the quoted segments: `"client.ip"` and `"session.issuer"` were keys containing a literal dot (a path control character), so each is escaped to stay distinct from a nested path. Quoting applies per segment, composing with the `.` join — the nested `"session.issuer"` key becomes one quoted segment in the column `userIdentity."session.issuer"`. This applies to keys from every source format — e.g. a CSV header `location.lat` produces the column `"location.lat"`.
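
The flattening rules fit in a short recursive function. This Python sketch reproduces the table above; its handling of cases the rules don't cover (empty objects and arrays yield no columns here) is an assumption, and `flatten` and `quote_segment` are hypothetical names rather than Scanner APIs.

```python
from typing import Any

CONTROL_CHARS = set('.[]"')  # path control characters that force quoting


def quote_segment(key: str) -> str:
    """Wrap a key in double quotes, doubling interior quotes, if it needs escaping."""
    if any(ch in CONTROL_CHARS for ch in key):
        return '"' + key.replace('"', '""') + '"'
    return key


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Map each leaf value in a JSON object to its column name."""
    columns: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            segment = quote_segment(key)
            path = f"{prefix}.{segment}" if prefix else segment
            columns.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            columns.update(flatten(child, f"{prefix}[{index}]"))
    else:
        columns[prefix] = value  # leaf: string, number, boolean, or null
    return columns
```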

Every leaf value is indexed as a string, and values that parse as numbers are additionally indexed numerically, so numeric comparisons (e.g. `statusCode > 400`) work even when the source data carries the value as a string.
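
A sketch of dual string/numeric indexing for a single leaf value. Which strings count as numbers in Scanner is not spelled out above; this version assumes a JSON-style number literal, and it assumes non-string leaves are stringified as JSON (`true`, `null`). `index_value` and `matches_greater_than` are hypothetical helpers.

```python
import json
import re
from typing import Any, Optional

# Assumption: a string "parses as a number" if it is a JSON-style number literal.
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def index_value(value: Any) -> tuple[str, Optional[float]]:
    """Return the (string, numeric-or-None) forms a leaf value is indexed under."""
    if isinstance(value, str):
        numeric = float(value) if NUMBER_RE.fullmatch(value) else None
        return value, numeric
    text = json.dumps(value)  # assumption: True -> "true", None -> "null", 404 -> "404"
    if isinstance(value, bool) or value is None:
        return text, None  # booleans and null get no numeric form in this sketch
    return text, float(value)


def matches_greater_than(columns: dict[str, Any], column: str, threshold: float) -> bool:
    """Evaluate `column > threshold` against the numeric form only."""
    if column not in columns:
        return False
    _, numeric = index_value(columns[column])
    return numeric is not None and numeric > threshold
```

With this model, `statusCode > 400` matches an event whose `statusCode` column holds the string `"404"`, but not one holding `"4xx"`.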

## Reserved System Fields

The `@scnr*`, `@index*`, `@q*`, `@@*`, and `@debug*` namespaces — i.e. any key beginning with one of those prefixes — are **reserved** for internal use: the behavior of source data carrying keys with these prefixes is undefined. **Source data values under these keys may be replaced, dropped, or otherwise become impossible to reference in queries, and this behavior may change without notice**.
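
Because source keys under these prefixes have undefined behavior, it can help to flag them before ingesting. This hypothetical pre-ingest check walks every key at every nesting depth; the text above doesn't say whether nested keys are affected, so flagging them too is a deliberately conservative assumption. Prefix matching is literal: `@query` or `@indexed_at` is flagged, while `@ip` is not.

```python
from typing import Any

RESERVED_PREFIXES = ("@scnr", "@index", "@q", "@@", "@debug")


def find_reserved_keys(value: Any, path: tuple = ()) -> list[tuple]:
    """Return the path of every object key that starts with a reserved prefix."""
    found: list[tuple] = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = path + (key,)
            if key.startswith(RESERVED_PREFIXES):
                found.append(child_path)
            found.extend(find_reserved_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_reserved_keys(child, path + (index,)))
    return found
```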

Within these reserved namespaces, Scanner populates a small set of fields with defined, documented meaning:

| Field                           | Description                                                                                                                                                                                                                                                                                                 |
| ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `@scnr.source_type`             | The event's source type, e.g. `aws:cloudtrail`, `okta`. Set by the ingest pipeline for log sources with built-in Collect rules; `custom:generic` for sources Scanner doesn't natively recognize.                                                                                                            |
| `@scnr.source_type_custom_name` | A user-supplied source name; present only when `@scnr.source_type` is `custom:generic`. This field is legacy — to label a custom source, use an [Add Field transformation](/scanner/using-scanner-complete-feature-reference/data-transformation-and-enrichment/data-transformations.md#add-field) instead. |
| `@scnr.datetime`                | The event timestamp as an RFC 3339 string.                                                                                                                                                                                                                                                                  |
| `@scnr.time_ns`                 | The event timestamp in epoch nanoseconds (numeric).                                                                                                                                                                                                                                                         |
| `@index` / `@index_id`          | The alias and UUID of the index the event was scanned from.                                                                                                                                                                                                                                                 |

Scanner also synthesizes a few internal bookkeeping fields that are **not** part of the public contract: they are hidden from column listings, but a query that names one will still match against it. We recommend not relying on these values, as they largely exist for historical reasons.

| Field                                | Description                                                                                         |
| ------------------------------------ | --------------------------------------------------------------------------------------------------- |
| `@scnr.log_event_id.timestamp_nanos` | The event timestamp in epoch nanoseconds. Use `@scnr.time_ns` instead.                              |
| `@scnr.log_event_id.sub_id`          | A sub-identifier distinguishing events that share a timestamp. This should be considered arbitrary. |
| `@scnr.timestamp`                    | The event timestamp in epoch nanoseconds. Use `@scnr.time_ns` instead.                              |


