Data Exploration
Data exploration in Scanner helps you discover patterns, anomalies, and insights in your logs through interactive analysis. This guide covers techniques for investigating data using column statistics, progressive query refinement, and pivoting between different views.
Using Column Statistics
Column statistics show you the most common values for any field in your search results, helping you quickly identify patterns and outliers.
How It Works
Column statistics are computed on the first 1,000 events loaded in your search results table. This gives you a quick snapshot of the data distribution without processing the entire dataset.
Accessing Column Statistics
Run a search query
Open the Columns sidebar on the right
Click on any column name to see its value distribution
Finding Anomalies
Use column statistics to identify suspicious patterns:
Example: Finding unusual error codes
errorCode: *
Click on the errorCode column in the sidebar. If you see high frequencies of AccessDenied or PermissionDenied, this might indicate:
Misconfigured applications
Privilege escalation attempts
Reconnaissance activity
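You can also compute this distribution as a query result instead of reading it from the sidebar. A minimal sketch using the same errorCode field and the groupbycount aggregation covered later in this guide:
errorCode: *
| groupbycount errorCode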
Example: Identifying suspicious IP addresses
eventName: "LoginFailure"
Click on the sourceIP column. Look for:
Single IPs with unusually high failure counts (brute force attacks)
Unfamiliar geographic locations
Known malicious IP ranges
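To quantify this rather than scanning the sidebar, you can count failures per source IP and keep only the noisiest ones. A sketch combining the stats and where commands described later in this guide (the failure_count threshold of 100 is illustrative, not a recommended value):
eventName: "LoginFailure"
| stats count() as failure_count by sourceIP
| where failure_count > 100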
Adding Filters from Column Statistics
Once you identify a suspicious value:
Click the + button next to the value
Scanner automatically adds it as a filter to your query
Click Run to narrow your search
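For example, starting from eventName: "LoginFailure" and clicking + next to a value in the sourceIP column would produce a query along these lines (the IP address here is a placeholder):
eventName: "LoginFailure" sourceIP: "203.0.113.45"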
Building Queries Progressively
Start with broad searches and iteratively refine based on what you discover. This approach helps you understand your data before jumping to conclusions.
The Progressive Refinement Pattern
Step 1: Initial Exploration
Begin with a simple filter to understand the data:
errorCode: *
Step 2: Use Column Statistics
Identify interesting patterns by examining column distributions.
Step 3: Add Specific Filters
Narrow your search based on suspicious values:
errorCode: "AccessDenied"
Step 4: Aggregate for Patterns
Summarize by relevant dimensions to see trends:
errorCode: "AccessDenied"
| groupbycount userIdentity.userName, eventName
Step 5: Pivot to Details
Once you identify anomalies in aggregated data, remove the aggregation to view individual events:
errorCode: "AccessDenied" userIdentity.userName: "suspicious_user"
Example: Investigating Failed API Calls
Start broad:
@index=cloudtrail
Find failures: Use column statistics on errorCode → see a high count of AccessDenied
Filter to failures:
@index=cloudtrail errorCode: "AccessDenied"
Identify suspicious users:
@index=cloudtrail errorCode: "AccessDenied"
| groupbycount userIdentity.userName
Investigate specific user:
@index=cloudtrail userIdentity.userName: "compromised_user"
Pivoting Between Views
Move fluidly between aggregated summaries and detailed event logs to investigate from multiple angles.
From Summary to Details
When you spot an anomaly in aggregated data:
Note the key identifying values (username, IP address, service name, etc.)
Remove the aggregation from your query (delete everything from | onward)
Add the key values as filters
View the detailed logs
Example:
Summary query showing data transfer by user:
eventName: GetObject
| stats sum(bytesTransferred) as total_bytes by userName
Results show user_a with 10GB transferred (suspicious).
Pivot to see exactly what they accessed:
eventName: GetObject userName: "user_a"
From Details to Summary
When examining individual events, you may want to see the bigger picture:
Identify key fields you want to summarize (IP, user, service, action)
Add an aggregation to your existing query
Group by relevant dimensions
Example:
Viewing individual S3 access events:
eventSource: s3.amazonaws.com eventName: GetObject
Want to see which users are accessing data most frequently:
eventSource: s3.amazonaws.com eventName: GetObject
| groupbycount userIdentity.userName
Multi-Dimensional Pivoting
Investigate from different angles by changing aggregation dimensions:
By user:
suspicious_activity_indicator
| groupbycount userName
By service:
suspicious_activity_indicator
| groupbycount eventSource
By service and action:
suspicious_activity_indicator
| groupbycount eventSource, eventName
Advanced Aggregation Techniques
Combine multiple aggregation functions to perform complex analysis.
Renaming Fields for Clarity
Use rename to create more readable column names, especially useful for deeply nested fields:
eventName: GetObject
| rename
additionalEventData.bytesTransferredOut as bytes_exfiltrated,
requestParameters.bucketName as s3_bucket,
userIdentity.userName as user_name
| stats sum(bytes_exfiltrated) as total_bytes
by s3_bucket, user_name
Why rename?
Shorter, more readable column names in results
Prepare field names for external systems (webhooks, APIs)
Clarify intent when sharing queries with team members
Combining Stats Functions
Use stats with multiple aggregation functions:
api_calls_filter
| stats
count() as total_calls,
sum(duration) as total_duration,
avg(duration) as avg_duration
by service, endpoint
Filtering Aggregated Results
Use where to filter aggregation results based on thresholds:
eventName: GetObject
| stats sum(bytesTransferred) as total_bytes by userName
| where total_bytes > 100000000
This shows only users who transferred more than 100MB.
Selecting Specific Columns
Use table to return only specific columns from your results:
suspicious_users_filter
| rename sourceIPAddress as ip_address
| groupbycount ip_address
| table ip_address
Useful when:
Preparing data for external webhooks
Simplifying results for dashboards
Extracting specific values for further investigation
Investigation Workflow Examples
Example 1: Detecting Data Exfiltration
Step 1: Find high-volume data transfers
eventSource: s3.amazonaws.com eventName: GetObject
| stats sum(bytesTransferred) as total_bytes by userName
| where total_bytes > 1000000000
Step 2: Identify which buckets are being accessed
eventSource: s3.amazonaws.com eventName: GetObject userName: "suspicious_user"
| groupbycount requestParameters.bucketName
Step 3: View detailed access logs
eventSource: s3.amazonaws.com
eventName: GetObject
userName: "suspicious_user"
requestParameters.bucketName: "sensitive-data-bucket"
Step 4: Check for other users accessing same buckets
eventSource: s3.amazonaws.com
eventName: GetObject
requestParameters.bucketName: "sensitive-data-bucket"
| groupbycount userName
Example 2: Investigating Privilege Escalation
Step 1: Find IAM policy modification attempts
eventSource: iam.amazonaws.com
eventName: (PutRolePolicy OR AttachRolePolicy OR CreateAccessKey)
Step 2: Identify users making changes
eventSource: iam.amazonaws.com
eventName: (PutRolePolicy OR AttachRolePolicy OR CreateAccessKey)
| groupbycount userIdentity.userName, errorCode
Step 3: Check success vs. failure rates
eventSource: iam.amazonaws.com
userIdentity.userName: "suspicious_user"
| groupbycount eventName, errorCode
Step 4: View successful privilege escalations
eventSource: iam.amazonaws.com
userIdentity.userName: "suspicious_user"
NOT errorCode: *
Example 3: Analyzing Attack Patterns
Step 1: Get overview of user's activity
userIdentity.userName: "compromised_account"
| groupbycount eventSource, eventName, errorCode
Step 2: Sort by success/failure
Click the errorCode column header to separate successful operations from failures.
Step 3: Investigate successful operations
userIdentity.userName: "compromised_account"
NOT errorCode: *
| groupbycount eventSource, eventName
Step 4: Focus on high-risk services
userIdentity.userName: "compromised_account"
eventSource: (lambda.amazonaws.com OR events.amazonaws.com)
Best Practices
Start Broad, Then Narrow
Begin with time range and basic filters
Use column statistics to guide refinement
Add specificity incrementally
Use Aggregations to Find Outliers
Look for unusually high counts
Identify rare events
Compare current patterns to historical baselines
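For example, to surface rare event types in CloudTrail data, you could count events per eventName and keep only those seen a handful of times. A sketch assuming where also accepts a less-than comparison (the cutoff of 5 is arbitrary and should be tuned to your data volume):
@index=cloudtrail
| stats count() as event_count by eventName
| where event_count < 5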
Pivot Frequently
Don't get stuck in one view
Switch between summary and detail
Look at data from multiple dimensions
Save Useful Queries
Document investigation patterns that work
Build a library of starting points
Share queries with your team
Leverage Time Windows
Adjust time ranges to match investigation scope
Use histogram visualization to spot activity spikes
Click and drag on histogram to zoom into specific periods
Tips for Effective Exploration
Performance optimization:
Start with smaller time windows when exploring unfamiliar data
Use specific filters before aggregations
Limit result sets with | head 100 when testing queries
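Putting these together, a first pass over unfamiliar data might apply a specific filter and cap the output while you iterate. A sketch (head 100 simply truncates the result set):
@index=cloudtrail errorCode: "AccessDenied"
| head 100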
Column discovery:
Click on an event to view all available fields
Use the Filter box in the details panel to search for specific column names
Tab-complete in the query box to see available fields
Pattern recognition:
Look for deviations from normal behavior
Identify time-based patterns (weekday vs. weekend, business hours vs. off-hours)
Cross-reference multiple data sources for correlation
Collaboration:
Share queries with team members via saved queries
Document your investigation process
Use clear, descriptive names when saving queries