Data Exploration

Data exploration in Scanner helps you discover patterns, anomalies, and insights in your logs through interactive analysis. This guide covers techniques for investigating data using column statistics, progressive query refinement, and pivoting between different views.

Using Column Statistics

Column statistics show you the most common values for any field in your search results, helping you quickly identify patterns and outliers.

How It Works

Column statistics are computed on the first 1,000 events loaded in your search results table. This gives you a quick snapshot of the data distribution without processing the entire dataset.

Accessing Column Statistics

  1. Run a search query

  2. Open the Columns sidebar on the right

  3. Click on any column name to see its value distribution

Finding Anomalies

Use column statistics to identify suspicious patterns:

Example: Finding unusual error codes

@index=cloudtrail
errorCode: *

Click on the errorCode column in the sidebar. If you see high frequencies of AccessDenied or PermissionDenied, this might indicate:

  • Misconfigured applications

  • Privilege escalation attempts

  • Reconnaissance activity

Example: Identifying suspicious IP addresses

@index=cloudtrail
eventName: "ConsoleLogin"
errorCode: *

Click on the sourceIPAddress column. Look for the following (see the query sketch after this list):

  • Single IPs with unusually high failure counts (brute force attacks)

  • IPs from unexpected locations or networks

  • Known malicious IP ranges
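
To quantify what the column statistics suggest, you can also aggregate failed console logins by source IP. A minimal sketch, reusing the groupbycount aggregation covered later in this guide:

@index=cloudtrail
eventName: "ConsoleLogin"
errorCode: *
| groupbycount sourceIPAddress

IPs with outsized counts are good candidates for the filters described in the next section.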

Adding Filters from Column Statistics

Once you identify a suspicious value:

  1. Click the + button next to the value

  2. Scanner automatically adds it as a filter to your query

  3. Click Run to narrow your search
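
For example, if you clicked + next to AccessDenied in the errorCode statistics, the refined query would look roughly like this (the exact query Scanner produces may differ slightly):

@index=cloudtrail
errorCode: "AccessDenied"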


Understanding ECS Normalized Fields

Scanner Collect normalizes logs from all supported sources into Elastic Common Schema (ECS) fields. This means you can write queries that work across CloudTrail, Auth0, Okta, and other sources without learning source-specific field names.

Why ECS Fields Matter

When you ingest logs from different sources, they use different field names:

  • CloudTrail uses userIdentity.userName, Auth0 uses user_id, Okta uses actor.alternateId

  • CloudTrail uses eventName, but other sources might use action or activity

  • Error indicators vary: CloudTrail has errorCode, others might use status or result

ECS normalizes these into standard fields so your queries work across all sources. Learn more about ECS normalization.

Common ECS Fields

User and Identity:

  • @ecs.user.name - Username (from userIdentity.userName in CloudTrail, user_id in Auth0, etc.)

  • @ecs.user.id - User ID or ARN

  • @ecs.user_agent - User agent string

Event Information:

  • @ecs.event.action - The action performed (e.g., "DescribeNetworkInterfaces", "AssumeRole")

  • @ecs.event.outcome - Result of the action: success or failure

  • @ecs.event.category - Category like "Management", "Authentication", "Discovery"

Cloud and Network:

  • @ecs.cloud.provider - Cloud provider ("aws", "azure", "gcp")

  • @ecs.cloud.region - Cloud region

  • @ecs.cloud.account.id - Cloud account ID

  • @ecs.cloud.service.name - Service name (e.g., "s3.amazonaws.com")

  • @ecs.source.ip - Source IP address

Writing Cross-Source Queries

Instead of writing separate queries for each source, use ECS fields:

CloudTrail-only query (source-specific):

@index=cloudtrail
userIdentity.userName: "admin_user"
eventName: "AssumeRole"
not errorCode: *

Cross-source query (works for CloudTrail, Auth0, Okta, etc.):

@ecs.user.name: "admin_user"
@ecs.event.action: "AssumeRole"
@ecs.event.outcome: "success"

This second query will return matching events from ANY source that Scanner Collect supports, making it much more powerful.

Availability

All sources supported by Scanner Collect provide ECS fields. Native source fields (like eventName, userIdentity.arn for CloudTrail) are still available—ECS fields are in addition to, not instead of, the original fields.
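
Because both field sets are present on the same events, a single query can mix native and ECS fields. A minimal sketch, assuming CloudTrail events ingested through Scanner Collect:

@index=cloudtrail
eventName: "AssumeRole"
@ecs.event.outcome: "success"

Here the native eventName filter matches CloudTrail's action name directly, while @ecs.event.outcome expresses success without the errorCode-based check used in the CloudTrail-only example above.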


Building Queries Progressively

Start with broad searches and iteratively refine based on what you discover. This approach helps you understand your data before jumping to conclusions.

The Progressive Refinement Pattern

Step 1: Initial Exploration

Begin with a simple filter to understand the data:

@index=cloudtrail
errorCode: *

Run this and open the Columns sidebar to see error code distribution.

Step 2: Use Column Statistics

Identify interesting patterns by examining column distributions. Click on the errorCode column and look for high-frequency values like AccessDenied or UnauthorizedOperation.

Step 3: Add Specific Filters

Narrow your search based on suspicious values:

@index=cloudtrail
errorCode: "AccessDenied"

Step 4: Aggregate for Patterns

Summarize by relevant dimensions to see trends:

@index=cloudtrail
errorCode: "AccessDenied"
| groupbycount userIdentity.userName, eventName

This shows which users are hitting access denied errors and which actions they're attempting.

Step 5: Pivot to Details

Once you identify anomalies in the aggregated data, remove the aggregation to view individual events. For example, suppose this aggregation reveals a user with an unusually high count:

@index=cloudtrail
errorCode: "AccessDenied"
| groupbycount userIdentity.userName

Then pick a high-count user and drill into their details:

@index=cloudtrail
errorCode: "AccessDenied" userIdentity.userName: "admin_user"

Example: Investigating Failed API Calls

Start broad:

@index=cloudtrail

Find failures: Use column statistics on errorCode → see high count of AccessDenied

Filter to failures:

@index=cloudtrail errorCode: "AccessDenied"

Identify suspicious users:

@index=cloudtrail errorCode: "AccessDenied"
| groupbycount userIdentity.userName

Investigate specific user: Pick a user with high failure count and drill in:

@index=cloudtrail userIdentity.userName: "admin_user"

Pivoting Between Views

Move fluidly between aggregated summaries and detailed event logs to investigate from multiple angles.

From Summary to Details

When you spot an anomaly in aggregated data:

  1. Note the key identifying values (username, IP address, service name, etc.)

  2. Remove the aggregation from your query (delete everything from the first | onward)

  3. Add the key values as filters

  4. View the detailed logs

Example:

Summary query showing data transfer by user:

@index=cloudtrail
eventName: GetObject
| stats
  sum(additionalEventData.bytesTransferredOut) as total_bytes
  by userIdentity.userName
| eval gb = total_bytes / (1024 * 1024 * 1024)
| where gb > 1

This identifies users transferring over 1GB in a single session. If user_a shows 10GB transferred (far above normal), investigate further.

Pivot to see exactly what they accessed:

@index=cloudtrail
eventName: GetObject userIdentity.userName: "user_a"

From Details to Summary

When examining individual events, you may want to see the bigger picture:

  1. Identify key fields you want to summarize (IP, user, service, action)

  2. Add an aggregation to your existing query

  3. Group by relevant dimensions

Example:

Viewing individual S3 access events:

@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"

To see which users are accessing data most frequently, add an aggregation:

@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
| groupbycount userIdentity.userName

Using Go to Context

When you spot a suspicious event, use Go to Context to see other events that happened around the same time and share related properties.

How it works:

  1. Click on any event in your results table

  2. Click Go to Context to see events from that time period

  3. Select facets to narrow the context (e.g., userIdentity.userName=jane_lopez, eventSource=s3.amazonaws.com)

  4. Select multiple facets to find correlated activity

Example workflow:

You find a suspicious S3 access event by jane_lopez. Click Go to Context and select:

  • userIdentity.userName: jane_lopez

  • eventSource: s3.amazonaws.com

This shows all S3 events from that user in a 5-minute window, letting you see if they accessed multiple buckets, how many objects they retrieved, or if they failed before succeeding.
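
To reproduce this context view as a regular search, the equivalent filters look roughly like the sketch below, with the search time range narrowed to the minutes around the suspicious event. This is an approximation, not the exact query Go to Context generates:

@index=cloudtrail
eventSource: "s3.amazonaws.com"
userIdentity.userName: "jane_lopez"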

When to use:

  • Correlate user activity across multiple services

  • Find failed attempts that led to successful exploits

  • Trace an attacker's full activity path in a time window

  • Understand relationships between events

Multi-Dimensional Pivoting

Investigate from different angles by changing aggregation dimensions:

By user:

@index=cloudtrail
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
| groupbycount userIdentity.userName

By service:

@index=cloudtrail
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
| groupbycount eventSource

By service and action:

@index=cloudtrail
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
| groupbycount eventSource, eventName

Advanced Aggregation Techniques

Combine multiple aggregation functions to perform complex analysis.

Renaming Fields for Clarity

Use rename to create more readable column names; this is especially useful for deeply nested fields:

@index=cloudtrail
eventName: GetObject
| rename
  additionalEventData.bytesTransferredOut as bytes_exfiltrated,
  requestParameters.bucketName as s3_bucket,
  userIdentity.userName as user_name
| stats
  sum(bytes_exfiltrated) as total_bytes
  by s3_bucket, user_name

Why rename?

  • Shorter, more readable column names in results

  • Prepare field names for external systems (webhooks, APIs)

  • Clarify intent when sharing queries with team members

Combining Stats Functions

Use stats with multiple aggregation functions:

@index=api_server
http.response_status: 200
| stats
  count() as total_calls,
  sum(http.duration_ms) as total_duration,
  avg(http.duration_ms) as avg_duration,
  percentile(http.duration_ms, 95) as p95_duration
  by service_name, endpoint

Filtering Aggregated Results

Use where to filter aggregation results based on thresholds:

@index=cloudtrail
eventName: GetObject
| stats
  sum(additionalEventData.bytesTransferredOut) as total_bytes
  by userIdentity.userName
| eval mb = total_bytes / (1024 * 1024)
| where mb > 100

This shows only users who transferred more than 100MB.

Selecting Specific Columns

Use table to return only specific columns from your results:

@index=cloudtrail
eventName: GetObject
| stats
  sum(additionalEventData.bytesTransferredOut) as total_bytes
  by sourceIPAddress
| eval mb = total_bytes / (1024 * 1024)
| where mb > 500
| rename sourceIPAddress as ip_address
| table ip_address

Useful when:

  • Preparing data for external webhooks

  • Simplifying results for dashboards

  • Extracting specific values for further investigation


Investigation Workflow Examples

Example 1: Detecting Data Exfiltration

Step 1: Find high-volume data transfers

@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
| stats
  sum(additionalEventData.bytesTransferredOut) as total_bytes
  by userIdentity.userName
| eval gbTransferred = total_bytes / (1024 * 1024 * 1024)
| where gbTransferred > 1

This threshold (1GB) catches unusual single-session transfers. Adjust based on your baseline—if users typically transfer 100MB, flag anything over 500MB.

Why this matters: Normal S3 usage rarely involves single-session transfers exceeding 1GB. Large transfers may indicate exfiltration.

Step 2: Identify anomalous access patterns by user

@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
| stats
  count() as access_count,
  sum(additionalEventData.bytesTransferredOut) as total_bytes
  by userIdentity.userName, requestParameters.bucketName
| eval gb = total_bytes / (1024 * 1024 * 1024)
| where access_count > 100 and gb > 0.5
| table userIdentity.userName, requestParameters.bucketName, access_count, gb

Find users with unusually high request volumes and large data transfers in single sessions.

Threshold rationale: 100+ requests to a single bucket is unusual (most users query a few times). Combined with 0.5GB+ transfer, this strongly suggests automated data collection rather than normal usage.

Step 3: Analyze access velocity and timing for high-volume users

@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
| stats
  count() as requests,
  min(@scnr.time_ns) as first_access_ns,
  max(@scnr.time_ns) as last_access_ns
  by userIdentity.userName, sourceIPAddress
| eval duration_seconds = (last_access_ns - first_access_ns) / 1000000000
| eval duration_minutes = duration_seconds / 60
| where requests > 50 and duration_minutes < 30
| table userIdentity.userName, sourceIPAddress, requests, duration_minutes

Find rapid-fire access (a high request count in a short time window). Legitimate users rarely issue this many S3 requests in 30 minutes; the pattern points to scripted data collection.

Threshold rationale: 50+ requests in under 30 minutes is well outside normal interactive access and is typical of automated tools systematically exfiltrating data.

Step 4: Cross-reference with normal baseline

@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
| stats
  count() as total_access_count,
  countdistinct(requestParameters.bucketName) as bucket_count
  by userIdentity.userName
| eval access_pattern =
  if(total_access_count > 1000, "high_volume",
    if(total_access_count > 100, "medium", "low"))
| where bucket_count > 5
| table userIdentity.userName, total_access_count, bucket_count, access_pattern

A user accessing 5+ different buckets at high volume suggests cross-bucket data collection, which is typical of exfiltration campaigns.

Example 2: Investigating Privilege Escalation

Step 1: Find IAM policy modification attempts

@index=cloudtrail
eventSource: "iam.amazonaws.com"
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)

Step 2: Identify users attempting policy changes

@index=cloudtrail
eventSource: "iam.amazonaws.com"
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
| eval has_error_code = coalesce(errorCode != null, false)
| eval is_failure = if(has_error_code, 1, 0)
| eval is_success = 1 - is_failure
| stats
  sum(is_success) as success_count,
  sum(is_failure) as failure_count,
  count() as total_attempts
  by userIdentity.userName, eventName
| eval success_pct = 100 * success_count / total_attempts
| where total_attempts > 3
| table userIdentity.userName, eventName, total_attempts, success_count, success_pct

Find users with multiple privilege escalation attempts. Multiple failures followed by success is a classic attack pattern.

Step 3: Examine successful policy modifications and targets

@index=cloudtrail
eventSource: "iam.amazonaws.com"
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
not errorCode: *
| stats
  count() as success_count,
  min(eventTime) as first_success,
  max(eventTime) as last_success
  by userIdentity.userName, eventName, requestParameters.roleName
| eval scope = if(success_count > 5, "broad", "targeted")
| table userIdentity.userName, eventName, requestParameters.roleName, success_count, first_success, last_success, scope

This reveals which roles were successfully modified and the scope of the changes. Multiple modifications to admin or service roles indicate privilege consolidation.

Example 3: Analyzing Attack Patterns

Step 1: Identify users with unusually broad service access

@index=cloudtrail
not userIdentity.userName: (terraform automation-user ci-pipeline)
| eval has_error_code = coalesce(errorCode != null, false)
| eval is_failure = if(has_error_code, 1, 0)
| eval is_success = 1 - is_failure
| stats
  sum(is_success) as success_count,
  countdistinct(eventSource) as service_count,
  count() as total_events
  by userIdentity.userName
| where service_count > 8 and total_events > 50
| eval risk =
  if(service_count > 12, "critical",
    if(service_count > 8, "high", "medium"))
| table userIdentity.userName, service_count, total_events, success_count, risk

A user accessing 8+ different AWS services with high event volume suggests reconnaissance or privilege expansion; most legitimate users interact with only 1-3 services.

Step 2: Find trial-and-error attempts on high-risk services

@index=cloudtrail
eventSource: (
  "lambda.amazonaws.com"
  "events.amazonaws.com"
  "iam.amazonaws.com"
  "ec2.amazonaws.com"
)
| eval is_access_denied = if(coalesce(errorCode == "AccessDenied", false), 1, 0)
| eval is_success = 1 - is_access_denied
| stats
  sum(is_access_denied) as access_denied_count,
  sum(is_success) as success_count,
  count() as total_events,
  countdistinct(userIdentity.userName) as user_count,
  countdistinct(sourceIPAddress) as ip_count
  by eventSource, eventName
| where access_denied_count > 0 and success_count > 0
| table
  eventSource,
  eventName,
  total_events,
  access_denied_count,
  success_count,
  user_count,
  ip_count

Lambda, EventBridge, IAM, and EC2 are critical for persistence. The pattern of both AccessDenied failures and successful operations on the same endpoint suggests trial-and-error attacks: attackers probing with wrong permissions or parameters until they find what works. Multiple users or IPs indicate coordinated activity.

Step 3: Detect lateral movement patterns

@index=cloudtrail
not userIdentity.userName: (terraform automation-user ci-pipeline)
| eval is_failure = if(coalesce(errorCode != null, false), 1, 0)
| eval is_success = 1 - is_failure
| stats
  min(@scnr.time_ns) as first_event_ns,
  max(@scnr.time_ns) as last_event_ns,
  countdistinct(eventSource) as services_touched,
  sum(is_success) as successful_actions
  by userIdentity.userName, sourceIPAddress
| eval duration_seconds = (last_event_ns - first_event_ns) / 1000000000
| eval duration_hours = duration_seconds / 3600
| where services_touched > 5 and duration_hours < 2 and successful_actions > 10
| table userIdentity.userName, sourceIPAddress, services_touched, duration_hours, successful_actions

Rapid movement across multiple services in a short timeframe indicates automated lateral movement, not human behavior.


Best Practices

Start Broad, Then Narrow

  • Begin with time range and basic filters

  • Use column statistics to guide refinement

  • Add specificity incrementally

Use Aggregations to Find Outliers

  • Look for unusually high counts

  • Identify rare events (see the sketch after this list)

  • Compare current patterns to historical baselines
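
For example, a minimal sketch for surfacing rare actions, assuming CloudTrail data and the stats and where constructs shown earlier (tune the threshold to your environment):

@index=cloudtrail
| stats
  count() as event_count
  by eventName
| where event_count < 5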

Pivot Frequently

  • Don't get stuck in one view

  • Switch between summary and detail

  • Look at data from multiple dimensions

Save Useful Queries

  • Document investigation patterns that work

  • Build a library of starting points

  • Share queries with your team

Leverage Time Windows

  • Adjust time ranges to match investigation scope

  • Use histogram visualization to spot activity spikes

  • Click and drag on the histogram to zoom into specific periods


Tips for Effective Exploration

Performance optimization:

  • Start with smaller time windows when exploring unfamiliar data

  • Use specific filters before aggregations

  • Use where clauses to limit results after aggregations
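
A minimal sketch that follows the last two tips (a specific filter before the aggregation, a where clause after it), assuming CloudTrail data:

@index=cloudtrail
eventSource: "iam.amazonaws.com"
errorCode: *
| stats
  count() as failure_count
  by userIdentity.userName
| where failure_count > 10

Filtering to a single event source first keeps the aggregation small; the where clause then trims the aggregated rows rather than the raw events.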

Column discovery:

  • Click on an event to view all available fields

  • Use the Filter box in the details panel to search for specific column names

  • Tab-complete in the query box to see available fields

Pattern recognition:

  • Look for deviations from normal behavior

  • Identify time-based patterns (weekday vs. weekend, business hours vs. off-hours)

  • Cross-reference multiple data sources for correlation

Collaboration:

  • Share queries with team members via saved queries

  • Document your investigation process

  • Use clear, descriptive names when saving queries
