Data Exploration

Data exploration in Scanner helps you discover patterns, anomalies, and insights in your logs through interactive analysis. This guide covers techniques for investigating data using column statistics, progressive query refinement, and pivoting between different views.

Using Column Statistics

Column statistics show you the most common values for any field in your search results, helping you quickly identify patterns and outliers.

How It Works

Column statistics are computed on the first 1,000 events loaded in your search results table. This gives you a quick snapshot of the data distribution without processing the entire dataset.

Accessing Column Statistics

  1. Run a search query

  2. Open the Columns sidebar on the right

  3. Click on any column name to see its value distribution

Finding Anomalies

Use column statistics to identify suspicious patterns:

Example: Finding unusual error codes

errorCode: *

Click on the errorCode column in the sidebar. If you see high frequencies of AccessDenied or PermissionDenied, this might indicate:

  • Misconfigured applications

  • Privilege escalation attempts

  • Reconnaissance activity
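
Column statistics are computed on a sample of events, so to quantify the full distribution you can also aggregate directly. A minimal sketch using the groupbycount aggregation covered later in this guide:

errorCode: *
| groupbycount errorCode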

Example: Identifying suspicious IP addresses

eventName: "LoginFailure"

Click on the sourceIP column. Look for:

  • Single IPs with unusually high failure counts (brute force attacks)

  • Unfamiliar geographic locations

  • Known malicious IP ranges
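
To put numbers behind these patterns, you can follow the column check with an aggregation over the source IP (a sketch assuming the sourceIP field referenced above):

eventName: "LoginFailure"
| groupbycount sourceIP

Single IPs that dominate the counts are strong brute force candidates.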

Adding Filters from Column Statistics

Once you identify a suspicious value:

  1. Click the + button next to the value

  2. Scanner automatically adds it as a filter to your query (see the example below)

  3. Click Run to narrow your search
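
For example, starting from the LoginFailure search above, flagging a suspicious address from the sourceIP column produces a query like the following (the IP value is illustrative):

eventName: "LoginFailure" sourceIP: "203.0.113.45"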


Building Queries Progressively

Start with broad searches and iteratively refine based on what you discover. This approach helps you understand your data before jumping to conclusions.

The Progressive Refinement Pattern

Step 1: Initial Exploration

Begin with a simple filter to understand the data:

errorCode: *

Step 2: Use Column Statistics

Identify interesting patterns by examining column distributions.

Step 3: Add Specific Filters

Narrow your search based on suspicious values:

errorCode: "AccessDenied"

Step 4: Aggregate for Patterns

Summarize by relevant dimensions to see trends:

errorCode: "AccessDenied"
| groupbycount userIdentity.userName, eventName

Step 5: Pivot to Details

Once you identify anomalies in aggregated data, remove the aggregation to view individual events:

errorCode: "AccessDenied" userIdentity.userName: "suspicious_user"

Example: Investigating Failed API Calls

Start broad:

@index=cloudtrail

Find failures: Use column statistics on the errorCode column; a high count of AccessDenied stands out.

Filter to failures:

@index=cloudtrail errorCode: "AccessDenied"

Identify suspicious users:

@index=cloudtrail errorCode: "AccessDenied"
| groupbycount userIdentity.userName

Investigate specific user:

@index=cloudtrail userIdentity.userName: "suspicious_user"

Pivoting Between Views

Move fluidly between aggregated summaries and detailed event logs to investigate from multiple angles.

From Summary to Details

When you spot an anomaly in aggregated data:

  1. Note the key identifying values (username, IP address, service name, etc.)

  2. Remove the aggregation from your query (delete everything from the first | onward)

  3. Add the key values as filters

  4. View the detailed logs

Example:

Summary query showing data transfer by user:

eventName: GetObject
| stats sum(bytesTransferred) as total_bytes by userName

Results show user_a with 10GB transferred (suspicious).

Pivot to see exactly what they accessed:

eventName: GetObject userName: "user_a"

From Details to Summary

When examining individual events, you may want to see the bigger picture:

  1. Identify key fields you want to summarize (IP, user, service, action)

  2. Add an aggregation to your existing query

  3. Group by relevant dimensions

Example:

Viewing individual S3 access events:

eventSource: s3.amazonaws.com eventName: GetObject

To see which users are accessing data most frequently, add an aggregation:

eventSource: s3.amazonaws.com eventName: GetObject
| groupbycount userIdentity.userName

Multi-Dimensional Pivoting

Investigate from different angles by changing aggregation dimensions:

By user:

suspicious_activity_indicator
| groupbycount userName

By service:

suspicious_activity_indicator
| groupbycount eventSource

By service and action:

suspicious_activity_indicator
| groupbycount eventSource, eventName

Advanced Aggregation Techniques

Combine multiple aggregation functions to perform complex analysis.

Renaming Fields for Clarity

Use rename to create more readable column names, especially useful for deeply nested fields:

eventName: GetObject
| rename
    additionalEventData.bytesTransferredOut as bytes_exfiltrated,
    requestParameters.bucketName as s3_bucket,
    userIdentity.userName as user_name
| stats sum(bytes_exfiltrated) as total_bytes
    by s3_bucket, user_name

Why rename?

  • Shorter, more readable column names in results

  • Prepare field names for external systems (webhooks, APIs)

  • Clarify intent when sharing queries with team members

Combining Stats Functions

Use stats with multiple aggregation functions:

api_calls_filter
| stats
    count() as total_calls,
    sum(duration) as total_duration,
    avg(duration) as avg_duration
    by service, endpoint

Filtering Aggregated Results

Use where to filter aggregation results based on thresholds:

eventName: GetObject
| stats sum(bytesTransferred) as total_bytes by userName
| where total_bytes > 100000000

This shows only users who transferred more than 100MB.

Selecting Specific Columns

Use table to return only specific columns from your results:

suspicious_users_filter
| rename sourceIPAddress as ip_address
| groupbycount ip_address
| table ip_address

Useful when:

  • Preparing data for external webhooks

  • Simplifying results for dashboards

  • Extracting specific values for further investigation


Investigation Workflow Examples

Example 1: Detecting Data Exfiltration

Step 1: Find high-volume data transfers

eventSource: s3.amazonaws.com eventName: GetObject
| stats sum(bytesTransferred) as total_bytes by userName
| where total_bytes > 1000000000

Step 2: Identify which buckets are being accessed

eventSource: s3.amazonaws.com eventName: GetObject userName: "suspicious_user"
| groupbycount requestParameters.bucketName

Step 3: View detailed access logs

eventSource: s3.amazonaws.com 
eventName: GetObject 
userName: "suspicious_user"
requestParameters.bucketName: "sensitive-data-bucket"

Step 4: Check for other users accessing same buckets

eventSource: s3.amazonaws.com 
eventName: GetObject
requestParameters.bucketName: "sensitive-data-bucket"
| groupbycount userName

Example 2: Investigating Privilege Escalation

Step 1: Find IAM policy modification attempts

eventSource: iam.amazonaws.com
eventName: (PutRolePolicy OR AttachRolePolicy OR CreateAccessKey)

Step 2: Identify users making changes

eventSource: iam.amazonaws.com
eventName: (PutRolePolicy OR AttachRolePolicy OR CreateAccessKey)
| groupbycount userIdentity.userName, errorCode

Step 3: Check success vs. failure rates

eventSource: iam.amazonaws.com
userIdentity.userName: "suspicious_user"
| groupbycount eventName, errorCode

Step 4: View successful privilege escalations

eventSource: iam.amazonaws.com
userIdentity.userName: "suspicious_user"
NOT errorCode: *

Example 3: Analyzing Attack Patterns

Step 1: Get overview of user's activity

userIdentity.userName: "compromised_account"
| groupbycount eventSource, eventName, errorCode

Step 2: Sort by success/failure

Click the errorCode column header to sort the results, separating successful operations from failures.

Step 3: Investigate successful operations

userIdentity.userName: "compromised_account"
NOT errorCode: *
| groupbycount eventSource, eventName

Step 4: Focus on high-risk services

userIdentity.userName: "compromised_account"
eventSource: (lambda.amazonaws.com OR events.amazonaws.com)

Best Practices

Start Broad, Then Narrow

  • Begin with time range and basic filters

  • Use column statistics to guide refinement

  • Add specificity incrementally

Use Aggregations to Find Outliers

  • Look for unusually high counts (see the sketch after this list)

  • Identify rare events

  • Compare current patterns to historical baselines
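
A minimal sketch of an outlier hunt, combining the stats and where commands shown earlier (the filter name and threshold are placeholders):

api_calls_filter
| stats count() as call_count by userName
| where call_count > 1000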

Pivot Frequently

  • Don't get stuck in one view

  • Switch between summary and detail

  • Look at data from multiple dimensions

Save Useful Queries

  • Document investigation patterns that work

  • Build a library of starting points

  • Share queries with your team

Leverage Time Windows

  • Adjust time ranges to match investigation scope

  • Use histogram visualization to spot activity spikes

  • Click and drag on histogram to zoom into specific periods


Tips for Effective Exploration

Performance optimization:

  • Start with smaller time windows when exploring unfamiliar data

  • Use specific filters before aggregations

  • Limit result sets with | head 100 when testing queries

Column discovery:

  • Click on an event to view all available fields

  • Use the Filter box in the details panel to search for specific column names

  • Tab-complete in the query box to see available fields

Pattern recognition:

  • Look for deviations from normal behavior

  • Identify time-based patterns (weekday vs. weekend, business hours vs. off-hours)

  • Cross-reference multiple data sources for correlation

Collaboration:

  • Share queries with team members via saved queries

  • Document your investigation process

  • Use clear, descriptive names when saving queries
