Data Exploration
Data exploration in Scanner helps you discover patterns, anomalies, and insights in your logs through interactive analysis. This guide covers techniques for investigating data using column statistics, progressive query refinement, and pivoting between different views.
Using Column Statistics
Column statistics show you the most common values for any field in your search results, helping you quickly identify patterns and outliers.
How It Works
Column statistics are computed on the first 1,000 events loaded in your search results table. This gives you a quick snapshot of the data distribution without processing the entire dataset.
Accessing Column Statistics
Run a search query
Open the Columns sidebar on the right
Click on any column name to see its value distribution
Finding Anomalies
Use column statistics to identify suspicious patterns:
Example: Finding unusual error codes
@index=cloudtrail
errorCode: *
Click on the errorCode column in the sidebar. If you see high frequencies of AccessDenied or PermissionDenied, this might indicate:
Misconfigured applications
Privilege escalation attempts
Reconnaissance activity
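Column statistics sample only the first 1,000 loaded events, so a quick aggregation can confirm the distribution across the full result set. A minimal sketch, using the groupbycount aggregation covered later in this guide:
@index=cloudtrail
errorCode: *
| groupbycount errorCode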
Example: Identifying suspicious IP addresses
@index=cloudtrail
eventName: "ConsoleLogin"
errorCode: *
Click on the sourceIPAddress column. Look for:
Single IPs with unusually high failure counts (brute force attacks)
IPs from unexpected locations or networks
Known malicious IP ranges
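To rank failed console logins by source address across the full result set rather than only the sampled events, a short aggregation sketch works; it reuses only fields and operators that appear elsewhere in this guide:
@index=cloudtrail
eventName: "ConsoleLogin"
errorCode: *
| groupbycount sourceIPAddress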
Adding Filters from Column Statistics
Once you identify a suspicious value:
Click the + button next to the value
Scanner automatically adds it as a filter to your query
Click Run to narrow your search
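For example, if you add the AccessDenied value from the errorCode column statistics, the refined query would look roughly like this:
@index=cloudtrail
errorCode: "AccessDenied"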
Understanding ECS Normalized Fields
Scanner Collect normalizes logs from all supported sources into Elastic Common Schema (ECS) fields. This means you can write queries that work across CloudTrail, Auth0, Okta, and other sources without learning source-specific field names.
Why ECS Fields Matter
When you ingest logs from different sources, they use different field names:
CloudTrail uses userIdentity.userName, Auth0 uses user_id, Okta uses actor.alternateId
CloudTrail uses eventName, but other sources might use action or activity
Error indicators vary: CloudTrail has errorCode, others might use status or result
ECS normalizes these into standard fields so your queries work across all sources. Learn more about ECS normalization.
Common ECS Fields
User and Identity:
@ecs.user.name - Username (from userIdentity.userName in CloudTrail, user_id in Auth0, etc.)
@ecs.user.id - User ID or ARN
@ecs.user_agent - User agent string
Event Information:
@ecs.event.action - The action performed (e.g., "DescribeNetworkInterfaces", "AssumeRole")
@ecs.event.outcome - Result of the action: success or failure
@ecs.event.category - Category like "Management", "Authentication", "Discovery"
Cloud and Network:
@ecs.cloud.provider - Cloud provider ("aws", "azure", "gcp")
@ecs.cloud.region - Cloud region
@ecs.cloud.account.id - Cloud account ID
@ecs.cloud.service.name - Service name (e.g., "s3.amazonaws.com")
@ecs.source.ip - Source IP address
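As a quick illustration of the network and outcome fields, here is a sketch that surfaces failed actions from a single address across every source (the IP value is a placeholder):
@ecs.source.ip: "203.0.113.45"
@ecs.event.outcome: "failure"
| groupbycount @ecs.event.action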
Writing Cross-Source Queries
Instead of writing separate queries for each source, use ECS fields:
CloudTrail-only query (source-specific):
@index=cloudtrail
userIdentity.userName: "admin_user"
eventName: "AssumeRole"
NOT errorCode: *
Cross-source query (works for CloudTrail, Auth0, Okta, etc.):
@ecs.user.name: "admin_user"
@ecs.event.action: "AssumeRole"
@ecs.event.outcome: "success"
This second query will return matching events from ANY source that Scanner Collect supports, making it much more powerful.
Availability
All sources supported by Scanner Collect provide ECS fields. Native source fields (like eventName, userIdentity.arn for CloudTrail) are still available—ECS fields are in addition to, not instead of, the original fields.
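Because both field sets live on the same events, you can also mix them in a single query, for example filtering on the ECS outcome while grouping by the native CloudTrail eventName. A sketch, assuming the index contains CloudTrail data and that ECS and native fields can be combined this way:
@index=cloudtrail
@ecs.event.outcome: "failure"
| groupbycount eventName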
Building Queries Progressively
Start with broad searches and iteratively refine based on what you discover. This approach helps you understand your data before jumping to conclusions.
The Progressive Refinement Pattern
Step 1: Initial Exploration Begin with a simple filter to understand the data:
@index=cloudtrail
errorCode: *
Run this and open the Columns sidebar to see error code distribution.
Step 2: Use Column Statistics Identify interesting patterns by examining column distributions. Click on the errorCode column and look for high-frequency values like AccessDenied or UnauthorizedOperation.
Step 3: Add Specific Filters Narrow your search based on suspicious values:
@index=cloudtrail
errorCode: "AccessDenied"
Step 4: Aggregate for Patterns Summarize by relevant dimensions to see trends:
@index=cloudtrail
errorCode: "AccessDenied"
| groupbycount userIdentity.userName, eventName
This shows which users are hitting access denied errors and which actions they're attempting.
Step 5: Pivot to Details Once you identify anomalies in aggregated data, remove the aggregation to view individual events. If you see a user with a high count:
@index=cloudtrail
errorCode: "AccessDenied"
| groupbycount userIdentity.userName
Then pick a high-count user and drill into their details:
@index=cloudtrail
errorCode: "AccessDenied" userIdentity.userName: "admin_user"
Example: Investigating Failed API Calls
Start broad:
@index=cloudtrail
Find failures: Use column statistics on errorCode → see high count of AccessDenied
Filter to failures:
@index=cloudtrail errorCode: "AccessDenied"
Identify suspicious users:
@index=cloudtrail errorCode: "AccessDenied"
| groupbycount userIdentity.userName
Investigate specific user: Pick a user with a high failure count and drill in:
@index=cloudtrail userIdentity.userName: "admin_user"
Pivoting Between Views
Move fluidly between aggregated summaries and detailed event logs to investigate from multiple angles.
From Summary to Details
When you spot an anomaly in aggregated data:
Note the key identifying values (username, IP address, service name, etc.)
Remove the aggregation from your query (delete everything from | onward)
Add the key values as filters
View the detailed logs
Example:
Summary query showing data transfer by user:
@index=cloudtrail
eventName: GetObject
| stats
sum(bytesTransferred) as total_bytes
by userIdentity.userName
| eval gb = total_bytes / (1024 * 1024 * 1024)
| where gb > 1
This identifies users transferring over 1GB in a single session. If user_a shows 10GB transferred (far above normal), investigate further.
Pivot to see exactly what they accessed:
@index=cloudtrail
eventName: GetObject userIdentity.userName: "user_a"
From Details to Summary
When examining individual events, you may want to see the bigger picture:
Identify key fields you want to summarize (IP, user, service, action)
Add an aggregation to your existing query
Group by relevant dimensions
Example:
Viewing individual S3 access events:
@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
To see which users are accessing data most frequently, add an aggregation:
@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
| groupbycount userIdentity.userName
Exploring Related Context
When you spot a suspicious event, use Go to Context to see other events that happened around the same time with related properties.
How it works:
Click on any event in your results table
Click Go to Context to see events from that time period
Select facets to narrow the context (e.g., userIdentity.userName=jane_lopez, eventSource=s3.amazonaws.com)
Select multiple facets to find correlated activity
Example workflow:
You find a suspicious S3 access event by jane_lopez. Click Go to Context and select:
userIdentity.userName: jane_lopez
eventSource: s3.amazonaws.com
This shows all S3 events from that user in a 5-minute window, letting you see if they accessed multiple buckets, how many objects they retrieved, or if they failed before succeeding.
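If you prefer to stay in the query box, you can approximate the same context manually by narrowing the time range around the event and applying the facet values as filters. A rough sketch:
@index=cloudtrail
eventSource: "s3.amazonaws.com"
userIdentity.userName: "jane_lopez"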
When to use:
Correlate user activity across multiple services
Find failed attempts that led to successful exploits
Trace an attacker's full activity path in a time window
Understand relationships between events
Multi-Dimensional Pivoting
Investigate from different angles by changing aggregation dimensions:
By user:
@index=cloudtrail
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
| groupbycount userIdentity.userNameBy service:
@index=cloudtrail
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
| groupbycount eventSourceBy service and action:
@index=cloudtrail
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
| groupbycount eventSource, eventName
Advanced Aggregation Techniques
Combine multiple aggregation functions to perform complex analysis.
Renaming Fields for Clarity
Use rename to create more readable column names, especially useful for deeply nested fields:
@index=cloudtrail
eventName: GetObject
| rename
additionalEventData.bytesTransferredOut as bytes_exfiltrated,
requestParameters.bucketName as s3_bucket,
userIdentity.userName as user_name
| stats
sum(bytes_exfiltrated) as total_bytes
by s3_bucket, user_name
Why rename?
Shorter, more readable column names in results
Prepare field names for external systems (webhooks, APIs)
Clarify intent when sharing queries with team members
Combining Stats Functions
Use stats with multiple aggregation functions:
@index=api_server
http.response_status = 200
| stats
count() as total_calls,
sum(http.duration_ms) as total_duration,
avg(http.duration_ms) as avg_duration,
percentile(http.duration_ms, 95) as p95_duration
by service_name, endpoint
Filtering Aggregated Results
Use where to filter aggregation results based on thresholds:
@index=cloudtrail
eventName: GetObject
| stats
sum(bytesTransferred) as total_bytes
by userIdentity.userName
| eval mb = total_bytes / (1024 * 1024)
| where mb > 100
This shows only users who transferred more than 100MB.
Selecting Specific Columns
Use table to return only specific columns from your results:
@index=cloudtrail
eventName: GetObject
| stats
sum(bytesTransferred) as total_bytes
by sourceIPAddress
| eval mb = total_bytes / (1024 * 1024)
| where mb > 500
| rename sourceIPAddress as ip_address
| table ip_address
Useful when:
Preparing data for external webhooks
Simplifying results for dashboards
Extracting specific values for further investigation
Investigation Workflow Examples
Example 1: Detecting Data Exfiltration
Step 1: Find high-volume data transfers
@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
| stats
sum(bytesTransferred) as total_bytes
by userIdentity.userName
| eval gbTransferred = total_bytes / (1024 * 1024 * 1024)
| where gbTransferred > 1
This threshold (1GB) catches unusual single-session transfers. Adjust based on your baseline—if users typically transfer 100MB, flag anything over 500MB.
Why this matters: Normal S3 usage rarely involves single-session transfers exceeding 1GB. Large transfers may indicate exfiltration.
Step 2: Identify anomalous access patterns by user
@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
| stats
count() as access_count,
sum(additionalEventData.bytesTransferredOut) as total_bytes
by userIdentity.userName, requestParameters.bucketName
| eval gb = total_bytes / (1024 * 1024 * 1024)
| where access_count > 100 and gb > 0.5
| table userIdentity.userName, requestParameters.bucketName, access_count, gb
Find users with unusually high request volumes and large data transfers in single sessions.
Threshold rationale: 100+ requests to a single bucket is unusual (most users query a few times). Combined with 0.5GB+ transfer, this strongly suggests automated data collection rather than normal usage.
Step 3: Analyze access velocity and timing for high-volume users
@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
| stats
count() as requests,
min(@scnr.time_ns) as first_access_ns,
max(@scnr.time_ns) as last_access_ns
by userIdentity.userName, sourceIPAddress
| eval duration_seconds = (last_access_ns - first_access_ns) / 1000000000
| eval duration_minutes = duration_seconds / 60
| where requests > 50 and duration_minutes < 30
| table userIdentity.userName, sourceIPAddress, requests, duration_minutes
Find rapid-fire access (high request count in a short time window). Legitimate users rarely access S3 hundreds of times in 30 minutes - this indicates scripted data collection.
Threshold rationale: 50+ requests in under 30 minutes is rarely consistent with manual access. This pattern is typical of automated tools systematically exfiltrating data.
Step 4: Cross-reference with normal baseline
@index=cloudtrail
eventSource: "s3.amazonaws.com"
eventName: "GetObject"
| stats
count() as total_access_count,
countdistinct(requestParameters.bucketName) as bucket_count
by userIdentity.userName
| eval access_pattern =
if(total_access_count > 1000, "high_volume",
if(total_access_count > 100, "medium", "low"))
| where bucket_count > 5
| table userIdentity.userName, total_access_count, bucket_count, access_pattern
A user accessing 5+ different buckets at high volume suggests cross-bucket data collection, typical of exfiltration campaigns.
Example 2: Investigating Privilege Escalation
Step 1: Find IAM policy modification attempts
@index=cloudtrail
eventSource: "iam.amazonaws.com"
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
Step 2: Identify users attempting policy changes
@index=cloudtrail
eventSource: "iam.amazonaws.com"
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
| eval has_error_code = coalesce(errorCode != null, false)
| eval is_failure = if(has_error_code, 1, 0)
| eval is_success = 1 - is_failure
| stats
sum(is_success) as success_count,
sum(is_failure) as failure_count,
count() as total_attempts
by userIdentity.userName, eventName
| eval success_pct = 100 * success_count / total_attempts
| where total_attempts > 3
| table userIdentity.userName, eventName, total_attempts, success_count, success_pct
Find users with multiple privilege escalation attempts. Multiple failures followed by success is a classic attack pattern.
Step 3: Examine successful policy modifications and targets
@index=cloudtrail
eventSource: "iam.amazonaws.com"
eventName: (PutRolePolicy AttachRolePolicy CreateAccessKey)
not errorCode: *
| stats
count() as success_count,
min(eventTime) as first_success,
max(eventTime) as last_success
by userIdentity.userName, eventName, requestParameters.roleName
| eval scope = if(success_count > 5, "broad", "targeted")
| table userIdentity.userName, eventName, requestParameters.roleName, success_count, first_success, last_success, scope
This reveals which roles were successfully modified and the scope of changes. Multiple modifications to admin or service roles indicate privilege consolidation.
Example 3: Analyzing Attack Patterns
Step 1: Identify users with unusually broad service access
@index=cloudtrail
not userIdentity.userName: (terraform automation-user ci-pipeline)
| eval has_error_code = coalesce(errorCode != null, false)
| eval is_failure = if(has_error_code, 1, 0)
| eval is_success = 1 - is_failure
| stats
sum(is_success) as success_count,
countdistinct(eventSource) as service_count,
count() as total_events
by userIdentity.userName
| where service_count > 8 and total_events > 50
| eval risk =
if(service_count > 12, "critical",
if(service_count > 8, "high", "medium"))
| table userIdentity.userName, service_count, total_events, success_count, risk
A user accessing 8+ different AWS services with high event volume suggests reconnaissance or privilege expansion. Most legitimate users interact with 1-3 services.
Step 2: Find trial-and-error attempts on high-risk services
@index=cloudtrail
eventSource: (
"lambda.amazonaws.com"
"events.amazonaws.com"
"iam.amazonaws.com"
"ec2.amazonaws.com"
)
| eval is_access_denied = if(coalesce(errorCode == "AccessDenied", false), 1, 0)
| eval is_success = 1 - is_access_denied
| stats
sum(is_access_denied) as access_denied_count,
sum(is_success) as success_count,
count() as total_events,
countdistinct(userIdentity.userName) as user_count,
countdistinct(sourceIPAddress) as ip_count
by eventSource, eventName
| where access_denied_count > 0 and success_count > 0
| table
eventSource,
eventName,
total_events,
access_denied_count,
success_count,
user_count,
ip_count
Lambda, EventBridge, IAM, and EC2 are critical for persistence. The pattern of both AccessDenied failures and successful operations on the same endpoint suggests trial-and-error attacks: attackers probing with wrong permissions/parameters until they find what works. Multiple users or IPs indicate coordinated activity.
Step 3: Detect lateral movement patterns
@index=cloudtrail
not userIdentity.userName: (terraform automation-user ci-pipeline)
| eval is_failure = if(coalesce(errorCode != null, false), 1, 0)
| eval is_success = 1 - is_failure
| stats
min(@scnr.time_ns) as first_event_ns,
max(@scnr.time_ns) as last_event_ns,
countdistinct(eventSource) as services_touched,
sum(is_success) as successful_actions
by userIdentity.userName, sourceIPAddress
| eval duration_seconds = (last_event_ns - first_event_ns) / 1000000000
| eval duration_hours = duration_seconds / 3600
| where services_touched > 5 and duration_hours < 2 and successful_actions > 10
| table userIdentity.userName, sourceIPAddress, services_touched, duration_hours, successful_actions
Rapid movement across multiple services in a short timeframe indicates automated lateral movement, not human behavior.
Best Practices
Start Broad, Then Narrow
Begin with time range and basic filters
Use column statistics to guide refinement
Add specificity incrementally
Use Aggregations to Find Outliers
Look for unusually high counts
Identify rare events (see the sketch after this list)
Compare current patterns to historical baselines
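For example, to surface rare events rather than frequent ones, aggregate and keep only the low-count groups. A minimal sketch using the stats and where operators shown earlier:
@index=cloudtrail
| stats count() as event_count by eventName
| where event_count < 5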
Pivot Frequently
Don't get stuck in one view
Switch between summary and detail
Look at data from multiple dimensions
Save Useful Queries
Document investigation patterns that work
Build a library of starting points
Share queries with your team
Leverage Time Windows
Adjust time ranges to match investigation scope
Use histogram visualization to spot activity spikes
Click and drag on histogram to zoom into specific periods
Tips for Effective Exploration
Performance optimization:
Start with smaller time windows when exploring unfamiliar data
Use specific filters before aggregations
Use where clauses to limit results after aggregations
Column discovery:
Click on an event to view all available fields
Use the Filter box in the details panel to search for specific column names
Tab-complete in the query box to see available fields
Pattern recognition:
Look for deviations from normal behavior
Identify time-based patterns (weekday vs. weekend, business hours vs. off-hours)
Cross-reference multiple data sources for correlation
Collaboration:
Share queries with team members via saved queries
Document your investigation process
Use clear, descriptive names when saving queries