Custom Logs - AWS S3

This guide walks through how to set up Custom Logs from AWS S3 in Scanner Collect, enabling Scanner to index and analyze your application’s log files stored in S3.

For the sake of this guide, we’ll assume you have custom application logs being written to an S3 bucket, using JSON Lines format and compressed with Gzip. Scanner supports a wide range of formats beyond this, including Parquet, JSON arrays, CSV, plaintext, and others.
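For concreteness, a file in this shape might be produced by a sketch like the following (the bucket name, key, and event fields here are hypothetical, and boto3 is just one way to upload):

import gzip
import json

import boto3

# Two example events in JSON Lines format: one JSON object per line.
events = [
    {"timestamp": "2025-03-04T10:43:12.882Z", "level": "INFO", "message": "Received new request"},
    {"timestamp": "2025-03-04T10:43:13.001Z", "level": "WARN", "message": "Upstream responded slowly"},
]
body = "\n".join(json.dumps(event) for event in events).encode("utf-8")

# Gzip-compress the JSON Lines payload and upload it to S3.
boto3.client("s3").put_object(
    Bucket="my-app-logs-bucket",  # hypothetical bucket name
    Key="my_app/logs/2025/03/04/events-0001.json.gz",
    Body=gzip.compress(body),
)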

Step 1: Create a New Source

In the Scanner UI, navigate to the Collect tab.

  • Click Create New Source.

  • Click Select a Source Type.

  • Under Custom Logs, select AWS S3.

You’ll see that Destination is automatically set to Scanner, meaning Scanner will index the logs for search and detections.

Click Next.

Step 2: Configure the Source

  • Choose a Display Name (e.g., my-app-custom-logs).

  • Select the appropriate File Type. Common options include:

    • JsonLines

    • Parquet

    • Csv, CsvWithConfig, Plaintext, etc.

For this guide, we’ll assume the logs are in newline-delimited JSON format, so we'll select JsonLines.

Choose the Compression Type:

  • Supported: Gzip, Bz2, Deflate, Zstd

  • Most commonly: Gzip

For this guide, we’ll assume the logs were compressed with Gzip, so we'll choose Gzip.

Click Next.

Step 3: Set the Origin (S3 Bucket)

Select the S3 bucket containing your application logs.

  • (Optional) S3 Key Prefix:

    • Enter a Bucket Prefix if your logs are stored under a specific path (e.g., my_app/logs). Files from the bucket will only be indexed if their keys match this prefix. For example, if your bucket has three folders at the root, production/, staging/, and sandbox/, you could index only one of them by specifying an S3 key prefix of production/. The key prefix does not have to correspond to a directory: for instance, foo/b matches every key in the directory foo that begins with b. Note: this is NOT a regex. If you need to index two of the above folders, set up two separate import rules.

  • (Optional) S3 Key: Additional Regex

    • Files from the bucket will only be indexed if the S3 key (after the key prefix) matches this regex. This regex supports the standard import rule regex syntax, and has the standard limitations.

      For example, AWS CloudTrail can be configured to generate digest files, and by default stores them under the s3://<s3-bucket-name>/AWSLogs/<aws-account-id>/CloudTrail-Digest/<region>/ path, while the actual logs go to .../CloudTrail/<region>/. You can specify a regex of .*/CloudTrail/.* to skip the digest logs.

    • The regex is applied only to the part of the key after the specified S3 key prefix, and is not anchored. For example, the prefix foo/ with regex [ab] will match foo/abc and foo/bbc, but also foo/cbc (since cbc contains the letter b). To match only keys starting with a or b, use the regex ^[ab]. The sketch after this list illustrates these matching semantics.
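To make the prefix and regex behavior concrete, here is a minimal Python sketch of the matching semantics described above (an approximation for illustration, not Scanner's actual implementation):

import re

PREFIX = "foo/"
PATTERN = re.compile(r"[ab]")  # unanchored, as noted above

def would_index(key: str) -> bool:
    # The key prefix is a literal string match, not a regex.
    if not key.startswith(PREFIX):
        return False
    # The regex applies only to the remainder of the key after the prefix.
    return PATTERN.search(key[len(PREFIX):]) is not None

print(would_index("foo/abc"))  # True
print(would_index("foo/bbc"))  # True
print(would_index("foo/cbc"))  # True  (cbc contains the letter b)
print(would_index("bar/abc"))  # False (key does not start with the prefix)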

Click Next.

Step 4: Set the Destination

  1. Choose the Scanner Index where you want these logs to be indexed and searchable.

  2. Leave the Source Label set to the default: custom:generic.

Click Next.

Step 5: Transform and Enrich

Please see the Data Transformations & Enrichment documentation for full details on this step.

Scanner provides two default transformations:

Parse JSON Columns

Detects and extracts embedded JSON strings from fields.

Example:

"request_json": "{\"my_key\":\"my_val\"}"

Adds field: request_json.%json.my_key = "my_val"
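As a rough illustration, the effect is similar to this sketch (the %json field-naming convention comes from the example above; the parsing details are a simplification of Scanner's behavior):

import json

event = {"request_json": "{\"my_key\":\"my_val\"}"}

# For each string field that parses as a JSON object, add flattened
# <field>.%json.<key> fields alongside the original value.
for field, value in list(event.items()):
    if not isinstance(value, str):
        continue
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        continue
    if isinstance(parsed, dict):
        for key, inner_value in parsed.items():
            event[f"{field}.%json.{key}"] = inner_value

print(event["request_json.%json.my_key"])  # my_val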

Parse Key-Value Columns

Extracts key=value pairs from unstructured log lines.

Example:

"log_message": "my_key1=\"my_val1\" my_key2=\"my_val2\""

Adds fields:

  • log_message.%kv.my_key1 = "my_val1"

  • log_message.%kv.my_key2 = "my_val2"
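Conceptually, this behaves like the following sketch (a simplification that only handles quoted key="value" pairs; Scanner's actual parser may accept more variants):

import re

event = {"log_message": 'my_key1="my_val1" my_key2="my_val2"'}

# Extract key="value" pairs and add <field>.%kv.<key> fields.
KV_PATTERN = re.compile(r'(\w+)="([^"]*)"')
for key, value in KV_PATTERN.findall(event["log_message"]):
    event[f"log_message.%kv.{key}"] = value

print(event["log_message.%kv.my_key1"])  # my_val1
print(event["log_message.%kv.my_key2"])  # my_val2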

These transformations are optional. You can remove them or add additional steps as needed.

Click Next.

Step 6: Timestamp Extraction

Scanner needs to know which field represents the time of the event.

  1. Choose a Timestamp Field, such as:

    • timestamp

    • eventTime

    • ts

    • etc.

  2. (Optional) Use a regex with a capture group to extract a timestamp from inside a larger string.

Note: If Scanner is unable to parse a timestamp, it will set the log event's time to be the time of ingestion.

Example:

If your log contains:

"log_message": "[2025-03-04T10:43:12.882Z] INFO - Received new request"

You can extract the timestamp using:

  • Field: log_message

  • Regex: ^\[(.+)\]\s

The regular expression in this example works as follows:

  • ^ asserts that the match must start at the beginning of the string.

  • \[ matches a literal opening square bracket ([).

  • (.+) is a capture group that matches and captures the timestamp inside the brackets.

  • \] matches the closing square bracket (]).

  • \s matches the whitespace character that typically follows the bracketed timestamp.

In the example above, the regex extracts 2025-03-04T10:43:12.882Z.
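You can sanity-check a timestamp regex locally before saving the source; here is a minimal Python sketch (Scanner's regex engine may differ slightly in syntax, so treat this as an approximation):

import re
from datetime import datetime

log_message = "[2025-03-04T10:43:12.882Z] INFO - Received new request"

match = re.search(r"^\[(.+)\]\s", log_message)
if match:
    raw = match.group(1)  # "2025-03-04T10:43:12.882Z"
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ")
    print(raw, parsed)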

Click Next.

Step 7: Review and Create

  1. Review your settings.

  2. Click Preview to test the configuration. This is especially useful for Custom Logs - AWS S3, as it lets you verify that files are detected and parsed correctly.

  3. Once everything looks good, click Create Source.
