Data Exploration

Data exploration in Scanner helps you discover patterns, anomalies, and insights in your logs through interactive analysis. This guide covers techniques for investigating data using column statistics, progressive query refinement, and pivoting between different views.

Using Column Statistics

Column statistics show you the most common values for any field in your search results, helping you quickly identify patterns and outliers.

How It Works

Column statistics are computed on the first 1,000 events loaded in your search results table. This gives you a quick snapshot of the data distribution without processing the entire dataset.

Accessing Column Statistics

  1. Run a search query

  2. Open the Columns sidebar on the right

  3. Click on any column name to see its value distribution

Finding Anomalies

Use column statistics to identify suspicious patterns:

Example: Finding unusual error codes

@index=cloudtrail
errorCode: *

Click on the errorCode column in the sidebar. If you see high frequencies of AccessDenied or PermissionDenied, this might indicate:

  • Misconfigured applications

  • Privilege escalation attempts

  • Reconnaissance activity

Example: Identifying suspicious IP addresses

Click on the sourceIPAddress column. Look for:

  • Single IPs with unusually high failure counts (brute force attacks)

  • IPs from unexpected locations or networks

  • Known malicious IP ranges

Adding Filters from Column Statistics

Once you identify a suspicious value:

  1. Click the + button next to the value

  2. Scanner automatically adds it as a filter to your query

  3. Click Run to narrow your search
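
For example, clicking + next to an AccessDenied value in the errorCode column extends the earlier query to something like:

@index=cloudtrail
errorCode: "AccessDenied"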


Understanding ECS Normalized Fields

Scanner Collect normalizes logs from all supported sources into Elastic Common Schema (ECS) fields. This means you can write queries that work across CloudTrail, Auth0, Okta, and other sources without learning source-specific field names.

Why ECS Fields Matter

When you ingest logs from different sources, they use different field names:

  • CloudTrail uses userIdentity.userName, Auth0 uses user_id, Okta uses actor.alternateId

  • CloudTrail uses eventName, but other sources might use action or activity

  • Error indicators vary: CloudTrail has errorCode, others might use status or result

ECS normalizes these into standard fields so your queries work across all sources. Learn more about ECS normalization.

Common ECS Fields

User and Identity:

  • @ecs.user.name - Username (from userIdentity.userName in CloudTrail, user_id in Auth0, etc.)

  • @ecs.user.id - User ID or ARN

  • @ecs.user_agent - User agent string

Event Information:

  • @ecs.event.action - The action performed (e.g., "DescribeNetworkInterfaces", "AssumeRole")

  • @ecs.event.outcome - Result of the action: success or failure

  • @ecs.event.category - Category like "Management", "Authentication", "Discovery"

Cloud and Network:

  • @ecs.cloud.provider - Cloud provider ("aws", "azure", "gcp")

  • @ecs.cloud.region - Cloud region

  • @ecs.cloud.account.id - Cloud account ID

  • @ecs.cloud.service.name - Service name (e.g., "s3.amazonaws.com")

  • @ecs.source.ip - Source IP address

Writing Cross-Source Queries

Instead of writing separate queries for each source, use ECS fields:

CloudTrail-only query (source-specific):
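
For example, a failed-activity search tied to CloudTrail's native field names (username illustrative):

@index=cloudtrail
userIdentity.userName: "jane_lopez"
errorCode: *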

Cross-source query (works for CloudTrail, Auth0, Okta, etc.):
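
The same intent expressed with ECS fields, so it applies to any source that populates them:

@ecs.user.name: "jane_lopez"
@ecs.event.outcome: failure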

This second query will return matching events from ANY source that Scanner Collect supports, making it much more powerful.

Availability

All sources supported by Scanner Collect provide ECS fields. Native source fields (like eventName, userIdentity.arn for CloudTrail) are still available—ECS fields are in addition to, not instead of, the original fields.


Building Queries Progressively

Start with broad searches and iteratively refine based on what you discover. This approach helps you understand your data before jumping to conclusions.

The Progressive Refinement Pattern

Step 1: Initial Exploration. Begin with a simple filter to understand the data:
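
For example, start from the same broad filter used earlier:

@index=cloudtrail
errorCode: *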

Run this and open the Columns sidebar to see the error code distribution.

Step 2: Use Column Statistics. Identify interesting patterns by examining column distributions. Click on the errorCode column and look for high-frequency values like AccessDenied or UnauthorizedOperation.

Step 3: Add Specific Filters. Narrow your search based on suspicious values:
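
For example, if AccessDenied stood out in Step 2:

@index=cloudtrail
errorCode: "AccessDenied"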

Step 4: Aggregate for Patterns. Summarize by relevant dimensions to see trends:
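
A sketch of the aggregation, assuming Scanner's pipe-based stats syntax (exact function names may differ):

@index=cloudtrail errorCode: "AccessDenied"
| stats count() by userIdentity.userName, eventName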

This shows which users are hitting access denied errors and which actions they're attempting.

Step 5: Pivot to Details. Once you identify anomalies in aggregated data, remove the aggregation to view individual events. If you see a user with a high count:
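
Continuing the sketch from Step 4, a where clause surfaces the heaviest offenders (aliasing and threshold illustrative):

@index=cloudtrail errorCode: "AccessDenied"
| stats count() as failure_count by userIdentity.userName
| where failure_count > 100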

Then pick a high-count user and drill into their details:
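
A detail view for one of those users (username illustrative):

@index=cloudtrail
errorCode: "AccessDenied"
userIdentity.userName: "jane_lopez"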

Example: Investigating Failed API Calls

Start broad:
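
For example, search the whole index with no other filters:

@index=cloudtrail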

Find failures: Use column statistics on errorCode → see high count of AccessDenied

Filter to failures:
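
Add the error code surfaced by the column statistics:

@index=cloudtrail
errorCode: "AccessDenied"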

Identify suspicious users:
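
A sketch of the aggregation (stats syntax, aliasing, and threshold illustrative):

@index=cloudtrail errorCode: "AccessDenied"
| stats count() as failure_count by userIdentity.userName
| where failure_count > 50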

Investigate specific user: Pick a user with high failure count and drill in:
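
For example (username illustrative):

@index=cloudtrail
errorCode: "AccessDenied"
userIdentity.userName: "jane_lopez"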


Pivoting Between Views

Move fluidly between aggregated summaries and detailed event logs to investigate from multiple angles.

From Summary to Details

When you spot an anomaly in aggregated data:

  1. Note the key identifying values (username, IP address, service name, etc.)

  2. Remove the aggregation from your query (delete everything from | onward)

  3. Add the key values as filters

  4. View the detailed logs

Example:

Summary query showing data transfer by user:
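
A sketch, assuming S3 data events are ingested and using CloudTrail's additionalEventData.bytesTransferredOut field (stats syntax and threshold illustrative):

@index=cloudtrail eventSource: "s3.amazonaws.com"
| stats sum(additionalEventData.bytesTransferredOut) as total_bytes by userIdentity.userName
| where total_bytes > 1000000000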

This identifies users transferring over 1GB in a single session. If user_a shows 10GB transferred (far above normal), investigate further.

Pivot to see exactly what they accessed:
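
Drop the aggregation and filter to that user:

@index=cloudtrail
eventSource: "s3.amazonaws.com"
userIdentity.userName: "user_a"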

From Details to Summary

When examining individual events, you may want to see the bigger picture:

  1. Identify key fields you want to summarize (IP, user, service, action)

  2. Add an aggregation to your existing query

  3. Group by relevant dimensions

Example:

Viewing individual S3 access events:
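
For example (event name illustrative):

@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"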

To see which users are accessing data most frequently, add an aggregation:
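
A sketch (stats syntax illustrative):

@index=cloudtrail eventSource: "s3.amazonaws.com" eventName: "GetObject"
| stats count() by userIdentity.userName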

Go to Context

When you spot a suspicious event, use Go to Context to see other events that happened around the same time with related properties.

How it works:

  1. Click on any event in your results table

  2. Click Go to Context to see events from that time period

  3. Select facets to narrow the context (e.g., userIdentity.userName=jane_lopez, eventSource=s3.amazonaws.com)

  4. Select multiple facets to find correlated activity

Example workflow:

You find a suspicious S3 access event by jane_lopez. Click Go to Context and select:

  • userIdentity.userName: jane_lopez

  • eventSource: s3.amazonaws.com

This shows all S3 events from that user in a 5-minute window, letting you see if they accessed multiple buckets, how many objects they retrieved, or if they failed before succeeding.

When to use:

  • Correlate user activity across multiple services

  • Find failed attempts that led to successful exploits

  • Trace an attacker's full activity path in a time window

  • Understand relationships between events

Multi-Dimensional Pivoting

Investigate from different angles by changing aggregation dimensions:

By user:
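
These sketches reuse the illustrative stats syntax from the earlier examples; only the grouping fields change.

@index=cloudtrail
| stats count() by userIdentity.userName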

By service:
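
@index=cloudtrail
| stats count() by eventSource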

By service and action:
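
@index=cloudtrail
| stats count() by eventSource, eventName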


Advanced Aggregation Techniques

Combine multiple aggregation functions to perform complex analysis.

Renaming Fields for Clarity

Use rename to create more readable column names, especially useful for deeply nested fields:
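
A sketch of the idea (the exact rename syntax in Scanner may differ):

@index=cloudtrail eventSource: "s3.amazonaws.com"
| stats count() by userIdentity.sessionContext.sessionIssuer.userName
| rename userIdentity.sessionContext.sessionIssuer.userName as assumed_role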

Why rename?

  • Shorter, more readable column names in results

  • Prepare field names for external systems (webhooks, APIs)

  • Clarify intent when sharing queries with team members

Combining Stats Functions

Use stats with multiple aggregation functions:
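
A sketch combining several aggregations in one stats call (function names and aliases illustrative):

@index=cloudtrail eventSource: "s3.amazonaws.com"
| stats count() as request_count, sum(additionalEventData.bytesTransferredOut) as total_bytes, min(eventTime) as first_seen, max(eventTime) as last_seen by userIdentity.userName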

Filtering Aggregated Results

Use where to filter aggregation results based on thresholds:
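
For example (threshold in bytes):

@index=cloudtrail eventSource: "s3.amazonaws.com"
| stats sum(additionalEventData.bytesTransferredOut) as total_bytes by userIdentity.userName
| where total_bytes > 100000000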

This shows only users who transferred more than 100MB.

Selecting Specific Columns

Use table to return only specific columns from your results:
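
For example, keep only the columns needed for a report (column list illustrative):

@index=cloudtrail errorCode: "AccessDenied"
| table eventTime, userIdentity.userName, eventName, sourceIPAddress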

Useful when:

  • Preparing data for external webhooks

  • Simplifying results for dashboards

  • Extracting specific values for further investigation


Investigation Workflow Examples

Example 1: Detecting Data Exfiltration

Step 1: Find high-volume data transfers
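
A sketch, reusing the illustrative S3 byte-count aggregation from the pivoting example above:

@index=cloudtrail eventSource: "s3.amazonaws.com"
| stats sum(additionalEventData.bytesTransferredOut) as total_bytes by userIdentity.userName
| where total_bytes > 1000000000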

This threshold (1GB) catches unusual single-session transfers. Adjust based on your baseline—if users typically transfer 100MB, flag anything over 500MB.

Why this matters: Normal S3 usage rarely involves single-session transfers exceeding 1GB. Large transfers may indicate exfiltration.

Step 2: Identify anomalous access patterns by user

Find users with unusually high request volumes and large data transfers in single sessions.
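
A sketch grouping by both user and bucket (field names, function names, and thresholds illustrative):

@index=cloudtrail eventSource: "s3.amazonaws.com"
| stats count() as request_count, sum(additionalEventData.bytesTransferredOut) as total_bytes by userIdentity.userName, requestParameters.bucketName
| where request_count > 100
| where total_bytes > 500000000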

Threshold rationale: 100+ requests to a single bucket is unusual (most users query a few times). Combined with 0.5GB+ transfer, this strongly suggests automated data collection rather than normal usage.

Step 3: Analyze access velocity and timing for high-volume users

Find rapid-fire access (a high request count in a short time window). Legitimate users rarely access S3 hundreds of times in 30 minutes; this points to scripted data collection.
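
One way to approximate this is to narrow the search time range to a 30-minute window and count requests per user (syntax and threshold illustrative):

@index=cloudtrail eventSource: "s3.amazonaws.com"
| stats count() as request_count by userIdentity.userName
| where request_count > 50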

Threshold rationale: 50+ requests in under 30 minutes is well beyond typical manual access. This pattern is typical of automated tools systematically exfiltrating data.

Step 4: Cross-reference with normal baseline
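
A sketch using a distinct-count aggregation per user (the function name is assumed here):

@index=cloudtrail eventSource: "s3.amazonaws.com"
| stats countdistinct(requestParameters.bucketName) as bucket_count, count() as request_count by userIdentity.userName
| where bucket_count >= 5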

A user accessing 5+ different buckets at high volume suggests cross-bucket data collection, typical of exfiltration campaigns.

Example 2: Investigating Privilege Escalation

Step 1: Find IAM policy modification attempts
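
A sketch of the filter (the boolean grouping syntax and the list of event names are illustrative):

@index=cloudtrail
eventSource: "iam.amazonaws.com"
eventName: ("PutRolePolicy" or "AttachRolePolicy" or "PutUserPolicy" or "AttachUserPolicy" or "CreatePolicyVersion")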

Step 2: Identify users attempting policy changes

Find users with multiple privilege escalation attempts. Multiple failures followed by success is a classic attack pattern.
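
Grouping by user and error code separates failed attempts from successful ones (stats syntax illustrative):

@index=cloudtrail eventSource: "iam.amazonaws.com" eventName: ("PutRolePolicy" or "AttachRolePolicy")
| stats count() by userIdentity.userName, errorCode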

Step 3: Examine successful policy modifications and targets
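
A sketch filtering to successful events via the ECS outcome field (the target role field is illustrative):

@index=cloudtrail
eventSource: "iam.amazonaws.com"
eventName: ("PutRolePolicy" or "AttachRolePolicy")
@ecs.event.outcome: success
| stats count() by userIdentity.userName, requestParameters.roleName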

This reveals which roles were successfully modified and the scope of the changes. Multiple modifications to admin or service roles indicate privilege consolidation.

Example 3: Analyzing Attack Patterns

Step 1: Identify users with unusually broad service access
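
A sketch using a distinct-count of services per user (function name and thresholds assumed):

@index=cloudtrail
| stats countdistinct(eventSource) as service_count, count() as event_count by userIdentity.userName
| where service_count >= 8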

A user accessing 8+ different AWS services with high event volume suggests reconnaissance or privilege expansion. Most legitimate users interact with only 1-3 services.

Step 2: Find trial-and-error attempts on high-risk services
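
A sketch surfacing mixed failures and successes against the same operations (boolean grouping and service list illustrative):

@index=cloudtrail
eventSource: ("lambda.amazonaws.com" or "events.amazonaws.com" or "iam.amazonaws.com" or "ec2.amazonaws.com")
| stats count() by eventSource, eventName, @ecs.event.outcome, userIdentity.userName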

Lambda, EventBridge, IAM, and EC2 are critical for persistence. The pattern of both AccessDenied failures and successful operations on the same endpoint suggests trial-and-error attacks: attackers probing with the wrong permissions or parameters until they find what works. Multiple users or IPs involved indicate coordinated activity.

Step 3: Detect lateral movement patterns
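
A sketch, assuming the search time range is narrowed to a short window (for example, 15 minutes) so that high service counts imply rapid movement:

@index=cloudtrail
| stats countdistinct(eventSource) as service_count by userIdentity.userName, sourceIPAddress
| where service_count >= 5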

Rapid movement across multiple services in a short timeframe indicates automated lateral movement, not human behavior.


Best Practices

Start Broad, Then Narrow

  • Begin with time range and basic filters

  • Use column statistics to guide refinement

  • Add specificity incrementally

Use Aggregations to Find Outliers

  • Look for unusually high counts

  • Identify rare events

  • Compare current patterns to historical baselines

Pivot Frequently

  • Don't get stuck in one view

  • Switch between summary and detail

  • Look at data from multiple dimensions

Save Useful Queries

  • Document investigation patterns that work

  • Build a library of starting points

  • Share queries with your team

Leverage Time Windows

  • Adjust time ranges to match investigation scope

  • Use histogram visualization to spot activity spikes

  • Click and drag on histogram to zoom into specific periods


Tips for Effective Exploration

Performance optimization:

  • Start with smaller time windows when exploring unfamiliar data

  • Use specific filters before aggregations

  • Use where clauses to limit results after aggregations

Column discovery:

  • Click on an event to view all available fields

  • Use the Filter box in the details panel to search for specific column names

  • Tab-complete in the query box to see available fields

Pattern recognition:

  • Look for deviations from normal behavior

  • Identify time-based patterns (weekday vs. weekend, business hours vs. off-hours)

  • Cross-reference multiple data sources for correlation

Collaboration:

  • Share queries with team members via saved queries

  • Document your investigation process

  • Use clear, descriptive names when saving queries
