Creating S3 Import Rules

Tell Scanner which S3 files to index and how to parse them.

After linking your AWS account and S3 buckets, you need to create S3 Import Rules for each bucket. These rules determine:

  • Which files are indexed

  • How the files will be read and parsed

  • Which destination index these log events will go into

Most Scanner users separate their logs at the bucket level, e.g. AWS CloudTrail logs in the my-cloudtrail-logs bucket and Okta logs in the my-okta-logs bucket. If that is the case, you will only need one import rule per bucket.

S3 Import Rules can be previewed using the built-in import preview tool.

Create or View Rules

Go to the Scanner app > Settings > S3 Import Rules to create new rules or view existing ones.

Configuration

Name

The name of the import rule.

You can use any string. This is simply an identifier that will be added to each log event. You can then search for these log events using the query %ingest.import_rule_name: "my_import_rule_name".

Required: Yes

Valid characters: [A-Za-z0-9_-]

Source Type

The source type of these log events.

Scanner provides a list of suggested source types, but you can use any string. This is another identifier that will be added to each log event. You can then search for these log events using the query %ingest.source_type:"my:source:type".

Required: Yes

Valid characters: [A-Za-z0-9_-:]

Destination Index

The destination index for these log events.

You can choose from any of your indexes. Scanner indexes are access-controlled, so make sure you choose an index that can be accessed by team members who need these log events.

Required: Yes

AWS Account & S3 Bucket

The S3 bucket (and the AWS account it is in) this import rule is for.

Required: Yes

S3 Key Prefix

Files from the bucket will only be indexed if they match this key prefix.

For example, if your bucket has three folders at the root, production/, staging/, and sandbox/, you can index only one of them by specifying an S3 key prefix of production/.

The key prefix does not have to correspond to a directory. For instance, foo/b matches every key in the foo/ directory that begins with b.

Required: No

Note: This is NOT a regex. If you need to index two of the above folders, you might want to set up two separate import rules.
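
As a rough illustration of the prefix semantics described above (a plain starts-with check on the key, not a regex), here is a minimal sketch. The keys and prefixes are hypothetical examples, not Scanner internals:

# Minimal sketch of S3 key prefix matching: a plain starts-with check.
# The keys below are hypothetical examples.
keys = [
    "production/2024/01/01/app.log.gz",
    "staging/2024/01/01/app.log.gz",
    "foo/bar.log.gz",
    "foo/baz.log.gz",
]

def matches_prefix(key: str, prefix: str) -> bool:
    return key.startswith(prefix)

print([k for k in keys if matches_prefix(k, "production/")])  # only the production/ key
print([k for k in keys if matches_prefix(k, "foo/b")])        # foo/bar.log.gz and foo/baz.log.gz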

S3 Key: Additional Regex

Files from the bucket will only be indexed if the S3 key (after the key prefix) matches this regex. This regex supports the standard import rule regex syntax, and has the standard limitations.

For example, AWS CloudTrail can be configured to generate digest files, and by default stores them under the s3://<s3-bucket-name>/AWSLogs/<aws-account-id>/CloudTrail-Digest/<region>/ path, while the actual logs go to .../CloudTrail/<region>/. You can specify a regex of .*/CloudTrail/.* to skip the digest logs.

The regex is applied only to the part of the key after the specified S3 key prefix, and is not anchored. E.g. the prefix foo/ with regex [ab] will match foo/abc and foo/bbc, but also foo/cbc (as cbc contains the letter b). To match only values starting with a or b, use regex ^[ab].

Required: No
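
To make that matching behavior concrete, here is a minimal sketch (hypothetical keys, not Scanner's implementation) of a prefix check followed by an unanchored regex search on the remainder of the key:

import re

# Minimal sketch: the additional regex is applied only to the part of the key
# after the prefix, and is searched (unanchored) rather than fully matched.
def key_matches(key: str, prefix: str, pattern: str) -> bool:
    if not key.startswith(prefix):
        return False
    remainder = key[len(prefix):]
    return re.search(pattern, remainder) is not None

print(key_matches("foo/abc", "foo/", r"[ab]"))   # True
print(key_matches("foo/bbc", "foo/", r"[ab]"))   # True
print(key_matches("foo/cbc", "foo/", r"[ab]"))   # True  (cbc contains the letter b)
print(key_matches("foo/cbc", "foo/", r"^[ab]"))  # False (anchored to the start of the remainder)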

File Type & Compression

The file type and compression format of the file.

The most common format for log files is JsonLines/Gzip, but Scanner also supports other formats such as Parquet and CSV.

If your log file format is not listed, contact us! It could be easy for us to add support for it.

Required: Yes
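
For reference, JsonLines/Gzip means a gzip-compressed file containing one JSON object per line. A minimal sketch of reading such a file (the file name is a hypothetical example):

import gzip
import json

# Minimal sketch: a JSON Lines / Gzip file holds one JSON object per line,
# compressed with gzip. "app-logs.json.gz" is a hypothetical example.
with gzip.open("app-logs.json.gz", "rt") as f:
    events = [json.loads(line) for line in f if line.strip()]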

Unroll Array Column

Unroll a log event with an array in one column into multiple log events.

Use the import preview tool in the Scanner app to make sure the column is unrolled properly.

Required: No (this configuration is uncommon)

Example:

// Input log event
{
  "logGroup": "prod-logs",
  "logStream": "abc123",
  "logEvents": [
    { "time": 1704111814, "log": "INFO 1st message" },
    { "time": 1704111816, "log": "INFO 2nd message" },
    { "time": 1704111819, "log": "INFO 3rd message" }
  ]
}

// Output log events, after unrolling the "logEvents" column
[
  { "logGroup": "prod-logs", "logStream": "abc123", "logEvents": { "time": 1704111814, "log": "INFO 1st message" } },
  { "logGroup": "prod-logs", "logStream": "abc123", "logEvents": { "time": 1704111816, "log": "INFO 2nd message" } },
  { "logGroup": "prod-logs", "logStream": "abc123", "logEvents": { "time": 1704111819, "log": "INFO 3rd message" } }
]
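
A minimal sketch of the unroll transformation shown above (illustrative only, not Scanner's implementation): the event is duplicated once per array element, with the array column replaced by that element.

# Minimal sketch of unrolling an array column.
def unroll(event: dict, column: str) -> list[dict]:
    return [{**event, column: element} for element in event.get(column, [])]

input_event = {
    "logGroup": "prod-logs",
    "logStream": "abc123",
    "logEvents": [
        {"time": 1704111814, "log": "INFO 1st message"},
        {"time": 1704111816, "log": "INFO 2nd message"},
        {"time": 1704111819, "log": "INFO 3rd message"},
    ],
}

output_events = unroll(input_event, "logEvents")  # three events, one per array element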

Timestamp Extractors

Specify the column where the timestamp is located. Nested JSON columns are also supported.

Scanner can intelligently parse different timestamp representations, including RFC3339 strings and epoch seconds/milliseconds.

If the timestamp is part of a string column, you will need to provide a regex pattern with exactly one capture group which contains the timestamp. This regex supports the standard import rule regex syntax, and has the standard limitations.

If Scanner fails to parse a timestamp using the given instructions, it will default to using the S3 file's last-modified time as the timestamp for the log event.

Use the import preview tool in the Scanner app to make sure the timestamp is parsed correctly.

Required: At least one column name (additional regex is optional)

Examples:

Column: timestamp
Regex: <None>
Example Log Event: { "timestamp": 1704111814456 }

Column: logEvent.timestamp
Regex: <None>
Example Log Event: { "logGroup": "prod-logs", "logEvent": { "timestamp": 1704111814456, "log": "INFO log message" } }

Column: message
Regex: ^(\S+)\s
Example Log Event: { "message": "2024-01-01T12:23:34.456Z INFO Some log message." }

Column: raw
Regex: (?:"time":")([^"]*)(?:")
Example Log Event: { "raw": "{\"time\":\"2024-01-01T12:23:34.456Z\" }" } (a serialized JSON string)
