Creating S3 Import Rules
Tell Scanner which S3 files to index and how to parse them.
After linking your AWS account and S3 buckets, you need to create S3 Import Rules for each bucket. These rules determine:
Which files are indexed
How the files will be read and parsed
Which destination index these log events will go into
Most Scanner users have their logs separated at the bucket level. For example, you may have your AWS CloudTrail logs in the my-cloudtrail-logs bucket and your Okta logs in the my-okta-logs bucket. If that is the case, you will only need one import rule per bucket.
S3 Import Rules can be previewed using the built-in import preview tool.
Create or View Rules
Go to the Scanner app > Settings > S3 Import Rules to create new rules or view existing ones.
Configuration
Name
The name of the import rule.
You can use any string. This is simply an identifier that will be added to each log event. You can then search for these log events using the query %ingest.import_rule_name: "my_import_rule_name".
Required: Yes
Valid characters: [A-Za-z0-9_-]
Source Type
The source type of these log events.
Scanner provides a list of suggested source types, but you can use any string. This is another identifier that will be added to each log event. You can then search for these log events using the query %ingest.source_type:"my:source:type".
Required: Yes
Valid characters: [A-Za-z0-9_-:]
Destination Index
The destination index for these log events.
You can choose from any of your indexes. Scanner indexes are access-controlled, so make sure you choose an index that can be accessed by team members who need these log events.
Required: Yes
AWS Account & S3 Bucket
The S3 bucket (and the AWS account it is in) this import rule is for.
Required: Yes
S3 Key Prefix
Files from the bucket will only be indexed if they match this key prefix.
For example, if your bucket has three folders, production/, staging/, and sandbox/, at the root, you could index only one of them by specifying an S3 key prefix of production/.
The key prefix does not have to correspond to a directory. For instance, foo/b can be used to match every key in directory foo which begins with b.
Required: No
Note: This is NOT a regex. If you need to index two of the above folders, you might want to set up two separate import rules.
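As a mental model, prefix matching behaves like a plain starts-with check on the full key. A minimal sketch (hypothetical keys, not Scanner's internals):

```python
# Hypothetical keys; prefix matching is a starts-with check, not a regex.
keys = ["production/app.log.gz", "production-old/app.log.gz", "staging/app.log.gz"]

prefix = "production/"
matched = [k for k in keys if k.startswith(prefix)]
# -> ["production/app.log.gz"]  (note: "production-old/..." does not match)
```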
S3 Key: Additional Regex
Files from the bucket will only be indexed if the S3 key (after the key prefix) matches this regex. This regex supports the standard import rule regex syntax, and has the standard limitations.
For example, AWS CloudTrail can be configured to generate digest files, and by default stores them under the s3://<s3-bucket-name>/AWSLogs/<aws-account-id>/CloudTrail-Digest/<region>/ path, while the actual logs go to .../CloudTrail/<region>/. You can specify a regex of .*/CloudTrail/.* to skip the digest logs.
The regex is applied only to the part of the key after the specified S3 key prefix, and is not anchored. For example, the prefix foo/ with regex [ab] will match foo/abc and foo/bbc, but also foo/cbc (since cbc contains the letter b). To match only values starting with a or b, use the regex ^[ab].
Required: No
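The unanchored behavior described above can be reproduced with a standard regex engine. A minimal sketch, assuming Scanner's regex semantics behave like Python's re.search here:

```python
import re

# The regex is applied to the part of the key after the prefix, unanchored:
# it matches if the pattern occurs anywhere in the suffix.
prefix = "foo/"
keys = ["foo/abc", "foo/bbc", "foo/cbc"]

suffixes = [k[len(prefix):] for k in keys if k.startswith(prefix)]
print([s for s in suffixes if re.search(r"[ab]", s)])   # ['abc', 'bbc', 'cbc']
print([s for s in suffixes if re.search(r"^[ab]", s)])  # ['abc', 'bbc']
```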
File Type & Compression
The file type and compression format of the file.
The most common format for log files is JsonLines/Gzip. However, Scanner also supports other log formats like Parquet and CSV.
If your log file format is not listed, contact us! It could be easy for us to add support for it.
Required: Yes
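For reference, JSON Lines is simply one JSON object per line. Once decompressed, a file in the JsonLines/Gzip format might look like this (hypothetical events, not from the Scanner docs):

```
{"timestamp": 1704111814456, "level": "INFO", "message": "request handled"}
{"timestamp": 1704111814789, "level": "ERROR", "message": "upstream timeout"}
```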
Unroll Array Column
Unroll a log event with an array in one column into multiple log events.
Use the import preview tool in the Scanner app to make sure the column is unrolled properly.
Required: No (this configuration is uncommon)
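As an illustration, here is a minimal sketch of the unrolling semantics (a hypothetical event and field names; the exact handling of the other columns is an assumption):

```python
# Hypothetical input event with an array in the "records" column.
event = {"source": "api", "records": [{"id": 1}, {"id": 2}]}

# Unrolling the "records" column yields one log event per array element,
# with the other columns copied into each resulting event.
unrolled = [{**event, "records": item} for item in event["records"]]
# -> [{"source": "api", "records": {"id": 1}},
#     {"source": "api", "records": {"id": 2}}]
```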
Timestamp Extractors
Specify the column where the timestamp is located. Nested JSON columns are also supported.
Scanner can intelligently parse different timestamp representations, including RFC3339 strings and epoch seconds/milliseconds.
If the timestamp is part of a string column, you will need to provide a regex pattern with exactly one capture group which contains the timestamp. This regex supports the standard import rule regex syntax, and has the standard limitations.
If Scanner fails to parse a timestamp with the given instructions, it will default to using the S3 file's last-modified time as the timestamp for the log event.
Use the import preview tool in the Scanner app to make sure the timestamp is parsed correctly.
Required: At least one column name (additional regex is optional)
Examples:

| Timestamp Column | Timestamp Regex | Example Log Event |
| --- | --- | --- |
| timestamp | <None> | { "timestamp": 1704111814456 } |
| logEvent.timestamp | <None> | { "logGroup": "prod-logs", "logEvent": { "timestamp": 1704111814456, "log": "INFO log message" } } |
| message | ^(\S+)\s | { "message": "2024-01-01T12:23:34.456Z INFO Some log message." } |
| raw | (?:"time":")([^"]*)(?:") | { "raw": "{\"time\":\"2024-01-01T12:23:34.456Z\" }" } (a serialized JSON string) |
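To see how a string-column regex with exactly one capture group behaves, here is a minimal sketch using the message example from the table above (assuming Python's re semantics, not Scanner's actual parser):

```python
import re

# The single capture group must isolate the timestamp within the string column.
event = {"message": "2024-01-01T12:23:34.456Z INFO Some log message."}

match = re.search(r"^(\S+)\s", event["message"])
if match:
    print(match.group(1))  # 2024-01-01T12:23:34.456Z
```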