# Data Transformations

Scanner can transform your logs during ingestion. Below are the types of transformations you can configure for your Index Rules.

Note that all data added by transformations will count against your ingestion volume.

## Normalize to ECS (Elastic Common Schema)

Add normalized ECS (Elastic Common Schema) fields to the log events.

#### Parameters

* Log Source: One of the 12 log sources for which Scanner provides out-of-the-box normalization.

#### Example

```json
// Parameters
// Log source: "CloudTrail"

// Input log event
{
    "eventName": "CreateBucket",
    "awsRegion": "us-east-1",
    "recipientAccountId": "123456789012",
    "eventSource": "s3.amazonaws.com",
    "requestID": "request-1234567890",
    "sourceIPAddress": "192.168.1.1",
    "userAgent": "aws-cli/2.2.0 Python/3.8.10",
    "userIdentity": {
        "arn": "arn:aws:iam::123456789012:user/john.doe",
        "userName": "john.doe",
        "type": "IAMUser"
    },
}

// Output log event
{
    // Normalized fields are added under a new `@ecs` object
    "@ecs": {
        "event": { "action": "CreateBucket", "outcome": "success" },
        "cloud": {
            "provider": "aws",
            "region": "us-east-1",
            "account": { "id": "123456789012" },
            "service": { "name": "s3.amazonaws.com" },
        },
        "http": { "request_id": "request-1234567890" },
        "source": { "ip": "192.168.1.1" },
        "user_agent": "aws-cli/2.2.0 Python/3.8.10",
        "user": { "id": "arn:aws:iam::123456789012:user/john.doe", "name": "john.doe" },
    },
    // Existing fields are still included and unchanged
    "eventName": "CreateBucket",
    "awsRegion": "us-east-1",
    "recipientAccountId": "123456789012",
    "eventSource": "s3.amazonaws.com",
    "requestID": "request-1234567890",
    "sourceIPAddress": "192.168.1.1",
    "userAgent": "aws-cli/2.2.0 Python/3.8.10",
    "userIdentity": {
        "arn": "arn:aws:iam::123456789012:user/john.doe",
        "userName": "john.doe",
        "type": "IAMUser"
    },
}
```
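As an illustrative sketch (not Scanner's actual implementation), the CloudTrail-to-ECS field mapping shown above could be expressed in Python. The function name is hypothetical, and the derivation of `event.outcome` is omitted since it depends on error fields not shown in the example:

```python
def normalize_cloudtrail_to_ecs(event: dict) -> dict:
    """Return the event with normalized ECS fields added under `@ecs`."""
    identity = event.get("userIdentity", {})
    ecs = {
        "event": {"action": event.get("eventName")},
        "cloud": {
            "provider": "aws",
            "region": event.get("awsRegion"),
            "account": {"id": event.get("recipientAccountId")},
            "service": {"name": event.get("eventSource")},
        },
        "http": {"request_id": event.get("requestID")},
        "source": {"ip": event.get("sourceIPAddress")},
        "user_agent": event.get("userAgent"),
        "user": {"id": identity.get("arn"), "name": identity.get("userName")},
    }
    # Existing fields are preserved; normalized fields are added alongside them.
    return {**event, "@ecs": ecs}
```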

## Add Field

Add a field to the log event. Useful for tagging log events based on the Index Rule.

#### Parameters

* Target Path: The field to insert into
* Value: The string value to be inserted

#### Example

```json
// Parameters
// Target Path: ".@tag.s3_bucket"
// Value: "my-bucket-foo"

// Input log event
{
  "message": "Hello world"
}

// Output log event
{
  "@tag": {
    "s3_bucket": "my-bucket-foo"
  },
  "message": "Hello world"
}
```
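The behavior above can be sketched in Python: the dotted Target Path is split into keys, and intermediate objects are created as needed. This is an illustrative approximation, not Scanner's actual path semantics:

```python
def add_field(event: dict, target_path: str, value: str) -> dict:
    """Insert `value` at the dotted `target_path`, creating nested objects as needed."""
    # ".@tag.s3_bucket" -> ["@tag", "s3_bucket"]
    keys = target_path.lstrip(".").split(".")
    node = event
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return event
```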

## Extract Timestamp

Every log event in Scanner must have a timestamp. This transformation specifies the field from which to extract it. Each Index Rule must have at least one "Extract Timestamp" step.

#### Supported Timestamp Formats

Scanner supports various timestamp formats, including:

* RFC 2822
* RFC 3339
* Unix epoch timestamp (seconds/milliseconds/microseconds/nanoseconds since epoch)

The best way to check whether timestamps are extracted correctly is to use the preview tool. A warning will appear if Scanner fails to extract the timestamps from the specified fields.

If the existing timestamp field is not in an accepted format, you may transform it first using a custom VRL program.

#### Fallbacks

You may specify additional "Extract Timestamp" steps as fallbacks. This is useful if logs from the same source are heterogeneous (i.e. they don't all have the same timestamp field).

If all steps fail (e.g. none of the specified fields are present), Scanner will make a best guess based on:

* The timestamp of preceding log events in the same file, or
* The S3 file's "last modified" timestamp.

#### Parameters

* Source Path: The field from which the timestamp will be extracted.
* Regex (optional):
  * If the timestamp needs to be extracted from a string field (e.g. a "message" field), the regex is used to extract the value.
  * Must have exactly one capture group for the timestamp value.
  * Not needed if the field value contains just the timestamp.
  * Does not apply if the field value is not a string.

#### Example

```json
// Parameters (with multiple steps defined)
// 1. Source Path: ".started_at", Regex: (none)
// 2. Source Path: ".event.timestamp", Regex: (none)
// 3. Source Path: ".message", Regex: "^(\S+)\s"

// Input log events (from the same file)
{
    "time": "2023-04-05T12:34:56.789Z",
    "started_at": "2023-04-05T12:34:56.123Z",
    "request_id": "123",
    "message": "2023-04-05T12:34:56.234Z INFO Handling request",
}
{
    "request_id": "123",
    "message": "2023-04-05T12:34:56.345Z ERROR Request failed",
}
{
    "request_id": "123",
    "event": {
        "type": "Some event type",
        "timestamp": 1680698096567, // milliseconds since epoch
    },
}

// Extracted timestamps
"2023-04-05T12:34:56.123"
"2023-04-05T12:34:56.345"
"2023-04-05T12:34:56.567" // 1680698096567 milliseconds since epoch
```
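The fallback logic above can be sketched in Python. This is a simplified illustration, not Scanner's implementation: it only handles RFC 3339 strings and epoch values, and it assumes bare integers are milliseconds since epoch (Scanner also accepts seconds, microseconds, and nanoseconds):

```python
import re
from datetime import datetime, timezone

def get_path(event, path):
    """Resolve a dotted path like '.event.timestamp' against a nested dict."""
    node = event
    for key in path.lstrip(".").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def extract_timestamp(event, steps):
    """Try each (source_path, regex) step in order; return the first parsed timestamp."""
    for path, pattern in steps:
        value = get_path(event, path)
        if value is None:
            continue
        if isinstance(value, str) and pattern:
            match = re.search(pattern, value)
            if not match:
                continue
            value = match.group(1)  # exactly one capture group holds the timestamp
        try:
            if isinstance(value, (int, float)):
                # Simplifying assumption: milliseconds since epoch.
                return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
            return datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            continue
    return None  # the caller would then fall back to a best guess
```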

## Parse JSON Fields

Parses all fields that contain stringified JSON objects or arrays, so the structure is reflected and indexed in Scanner.

#### Example

```json
// Input log event
{
    "logStream": "abcd1234",
    "message": "{\"elapsed_ms\":238,\"status\":\"200\",\"request_id\":\"123\"}",
}

// Output log event
{
    "logStream": "abcd1234",
    // The field is replaced by the parsed JSON object.
    "message": {
        "elapsed_ms": 238,
        "status": "200",
        "request_id": "123",
    },
}
```
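A minimal Python sketch of this behavior (top-level fields only, for brevity; it is not Scanner's implementation): string fields that look like JSON objects or arrays are parsed in place, and anything unparseable is left unchanged:

```python
import json

def parse_json_fields(event: dict) -> dict:
    """Replace any string field holding a JSON object or array with the parsed value."""
    out = {}
    for key, value in event.items():
        if isinstance(value, str) and value.lstrip()[:1] in ("{", "["):
            try:
                parsed = json.loads(value)
                # Only objects and arrays are expanded; other values stay as strings.
                if isinstance(parsed, (dict, list)):
                    value = parsed
            except json.JSONDecodeError:
                pass  # leave unparseable strings unchanged
        out[key] = value
    return out
```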

## Unroll Array

Transform one log event into multiple by unrolling an array field. Useful when the actual events are wrapped in an array inside a single object.

All fields other than the unrolled field will be duplicated for each log event.

#### Parameters

* Source Path: The array to be unrolled.
* Target Path: Where each unrolled item will be placed.

#### Example

```json
// Parameters
// Source Path: ".events"
// Target Path: ".event"

// Input log event
{
    "timestamp": "2023-04-05T12:34:56Z",
    "user": "john@example.com",
    "user_ip": "192.168.1.1",
    "events": [
        { "action": "foo", "outcome": "success" },
        { "action": "bar", "outcome": "failure", "error": "AccessDenied" },
        { "action": "baz", "outcome": "success" },
    ],
}

// Output log events (multiple)
{
    // Other fields are copied for each unrolled log event
    "timestamp": "2023-04-05T12:34:56Z",
    "user": "john@example.com",
    "user_ip": "192.168.1.1",
    // The original `events` array becomes a single object
    "event": {
        // The index from the original array is added to every log event
        "%idx": 0,
        "action": "foo",
        "outcome": "success",
    },
}
{
    "timestamp": "2023-04-05T12:34:56Z",
    "user": "john@example.com",
    "user_ip": "192.168.1.1",
    "event": {
        "%idx": 1,
        "action": "bar",
        "outcome": "failure",
        "error": "AccessDenied",
    },
}
{
    "timestamp": "2023-04-05T12:34:56Z",
    "user": "john@example.com",
    "user_ip": "192.168.1.1",
    "event": {
        "%idx": 2,
        "action": "baz",
        "outcome": "success",
    },
}
```
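The unrolling above can be sketched in Python. This simplified illustration handles top-level Source and Target Paths only, and assumes (as in the example) that each array item is an object that receives the `%idx` field:

```python
def unroll_array(event: dict, source_path: str, target_path: str) -> list:
    """Produce one event per item of the array at `source_path`, placed at `target_path`."""
    source_key = source_path.lstrip(".")
    target_key = target_path.lstrip(".")
    items = event.get(source_key, [])
    # All other fields are duplicated into each unrolled event.
    rest = {k: v for k, v in event.items() if k != source_key}
    # The original array index is recorded under `%idx`.
    return [{**rest, target_key: {"%idx": idx, **item}} for idx, item in enumerate(items)]
```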

## Enrich with AlienVault OTX

Enriches log events with threat intelligence data from AlienVault Open Threat Exchange (OTX). This transformation matches log fields (IPs, domains, URLs, email addresses, file hashes) against threat indicators and appends threat metadata to matching events.

**Note**: This transformation requires setting up an [AlienVault OTX integration](https://docs.scanner.dev/scanner/using-scanner-complete-feature-reference/lookup-table-enrichment/threat-intelligence#setting-up-alienvault-otx-integration) and creating a synced lookup table first.

For detailed information about parameters, output structure, and examples, see [Threat Intelligence](https://docs.scanner.dev/scanner/using-scanner-complete-feature-reference/lookup-table-enrichment/threat-intelligence#using-alienvault-otx-enrichment).

