Every log event in Scanner must have a timestamp. This transformation lets users specify the column from which to extract the timestamp. Each import rule must have at least one of these steps.
Supported Timestamp Formats
Scanner supports various timestamp formats, including:
ISO 8601
RFC 3339
Unix epoch timestamp (seconds/milliseconds/microseconds/nanoseconds since epoch)
The best way to check whether timestamps are extracted correctly is to use the import preview tool. A warning will appear if Scanner fails to extract the timestamps from the specified columns.
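To illustrate how these formats differ, here is a minimal Python sketch (not Scanner's actual implementation) that guesses the epoch unit from the value's magnitude and otherwise falls back to ISO 8601 / RFC 3339 string parsing:

```python
from datetime import datetime, timezone

def parse_timestamp(value):
    """Illustrative parser for epoch numbers and ISO 8601 / RFC 3339 strings."""
    if isinstance(value, (int, float)):
        # Guess the epoch unit from magnitude: present-day timestamps are
        # roughly 1.7e9 s, 1.7e12 ms, 1.7e15 us, 1.7e18 ns.
        for divisor in (1, 1_000, 1_000_000, 1_000_000_000):
            seconds = value / divisor
            if seconds < 10_000_000_000:  # i.e. before the year ~2286
                return datetime.fromtimestamp(seconds, tz=timezone.utc)
        raise ValueError(f"epoch value out of range: {value}")
    # ISO 8601 / RFC 3339 string; normalize the trailing "Z" for older Pythons
    return datetime.fromisoformat(str(value).replace("Z", "+00:00"))
```

For example, `parse_timestamp(1680698096)` (seconds) and `parse_timestamp(1680698096000)` (milliseconds) resolve to the same instant.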
Fallbacks
You may specify additional "Extract Timestamp" steps as fallbacks. This is useful if the logs from the same source are heterogeneous (i.e. they don't all have the same timestamp field).
If all steps fail (e.g. none of the specified columns are present), Scanner will make a best guess based on:
The timestamp of preceding log events in the same file, or
The S3 file's "last modified" timestamp.
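The fallback order above can be sketched as follows. The function and parameter names here are hypothetical, chosen for illustration; they are not Scanner's API:

```python
from datetime import datetime, timezone

def resolve_timestamp(event, columns, prev_timestamp, file_last_modified):
    """Try each configured column in order, then fall back to the previous
    event's timestamp in the same file, then the S3 object's last-modified."""
    for col in columns:
        value = event.get(col)
        if value is not None:
            try:
                return datetime.fromisoformat(str(value).replace("Z", "+00:00"))
            except ValueError:
                continue  # malformed value: try the next fallback column
    # Best guess: a preceding event in the same file, else file metadata
    return prev_timestamp or file_last_modified
```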
Parameters
Extract from column: The timestamp column, or the column from which it will be extracted.
Regex (optional):
If the timestamp needs to be extracted from a string column (e.g. a "message" column), the regex is used to extract the value.
Must have exactly one capture group for the timestamp value.
Not needed if the column value contains just the timestamp.
Does not apply if the column value is not a string.
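The single-capture-group requirement can be sketched like this; the bracketed-timestamp pattern below is a hypothetical example of what a user might configure, not a default:

```python
import re

# Hypothetical regex parameter: pull the timestamp out of a free-text
# "message" column. Exactly one capture group holds the timestamp value.
TIMESTAMP_RE = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\]")

def extract_with_regex(column_value):
    match = TIMESTAMP_RE.search(column_value)
    return match.group(1) if match else None

extract_with_regex("[2023-04-05T12:34:56Z] worker finished")
# → "2023-04-05T12:34:56Z"
```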
Parse Stringified JSON Columns
Parses all stringified JSON objects or arrays found in string columns, so that their structure is reflected and indexed in Scanner.
Example
// Input log event
{
"logStream": "abcd1234",
"message": "{\"elapsed_ms\":238,\"status\":\"200\",\"request_id\":\"123\"}",
}
// Output log event
{
"logStream": "abcd1234",
// The original column is preserved
"message": "{\"elapsed_ms\":238,\"status\":\"200\",\"request_id\":\"123\"}",
// Parsed JSON object is added under a new `.%json` object
"message.%json": {
"elapsed_ms": 238,
"status": "200",
"request_id": "123",
},
}
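The behavior in the example above can be sketched as follows, assuming (as the example shows) that the parsed structure is added under a sibling `<column>.%json` key:

```python
import json

def parse_stringified_json(event):
    """Sketch: for each string column holding a JSON object or array,
    add the parsed structure under a new `<column>.%json` key."""
    out = dict(event)
    for key, value in event.items():
        # Only strings that look like a JSON object or array are candidates
        if isinstance(value, str) and value.lstrip()[:1] in ("{", "["):
            try:
                out[f"{key}.%json"] = json.loads(value)
            except json.JSONDecodeError:
                pass  # not valid JSON: leave the column untouched
    return out
```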
Parse Key-Value Columns
Parses all "key=value" pairs from all string columns.
Note that there isn't a single widely adopted standard for key-value pairs. If Scanner's parser does not handle your logs correctly (or produces noisy output), use the "Extract by Regex" transformation instead.
Example
// Input log event
{
"logStream": "abcd1234",
"message": "Finished running worker. elapsed_ms=238, status=200, request_id=123",
}
// Output log event
{
"logStream": "abcd1234",
"message": "Finished running worker. elapsed_ms=238, status=200, request_id=123",
// Parsed fields are added under a new `.%kv` object. All values will be strings.
"message.%kv": {
"elapsed_ms": "238",
"status": "200",
"request_id": "123",
},
}
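One way to implement key-value parsing like the example above is sketched below. The regex here encodes one possible convention (word keys, values ending at a comma or whitespace); real logs vary, which is exactly why the "Extract by Regex" transformation exists as an alternative:

```python
import re

# Assumed convention: bare word keys, values terminated by comma or whitespace
KV_RE = re.compile(r"(\w+)=([^,\s]+)")

def parse_kv(event):
    """Sketch: parse key=value pairs from every string column into a
    new `<column>.%kv` object. All parsed values remain strings."""
    out = dict(event)
    for key, value in event.items():
        if isinstance(value, str):
            pairs = dict(KV_RE.findall(value))
            if pairs:
                out[f"{key}.%kv"] = pairs
    return out
```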
Unroll Array
Transforms one log event into multiple by unrolling an array column. This is useful when multiple events are wrapped in an array within a single object.
All fields other than the "unroll column" will be duplicated for each log event.
Parameters
Unroll column: The column to be unrolled.
Example
// Parameters
// Column: "events"
// Input log event
{
"timestamp": "2023-04-05T12:34:56Z",
"user": "john@example.com",
"user_ip": "192.168.1.1",
"events": [
{ "action": "foo", "outcome": "success" },
{ "action": "bar", "outcome": "failure", "error": "AccessDenied" },
{ "action": "baz", "outcome": "success" },
],
}
// Output log events (multiple)
{
// Other columns are copied for each unrolled log event
"timestamp": "2023-04-05T12:34:56Z",
"user": "john@example.com",
"user_ip": "192.168.1.1",
// The original `events` array becomes a single object
"events": {
// The index from the original array is added to every log event
"%idx": 0,
"action": "foo",
"outcome": "success",
},
}
{
"timestamp": "2023-04-05T12:34:56Z",
"user": "john@example.com",
"user_ip": "192.168.1.1",
"events": {
"%idx": 1,
"action": "bar",
"outcome": "failure",
"error": "AccessDenied",
},
}
{
"timestamp": "2023-04-05T12:34:56Z",
"user": "john@example.com",
"user_ip": "192.168.1.1",
"events": {
"%idx": 2,
"action": "baz",
"outcome": "success",
},
}
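The unrolling shown above can be sketched in a few lines. This assumes, per the example, that each emitted event replaces the array with a single element annotated with its original index under `%idx`:

```python
def unroll_array(event, column):
    """Sketch: emit one event per array element. Other columns are copied
    into every output event; the element's index is recorded under `%idx`."""
    items = event.get(column)
    if not isinstance(items, list):
        return [event]  # nothing to unroll
    unrolled = []
    for idx, item in enumerate(items):
        copy = dict(event)
        copy[column] = {"%idx": idx, **item}
        unrolled.append(copy)
    return unrolled
```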