Index Organization
How you structure your indexes impacts query performance, access control, and day-to-day usability. This guide covers common patterns and helps you decide how to organize logs across indexes.
Most Common Approach: One Index Per Log Source
The most popular pattern among Scanner users is to create a dedicated index for each log source. For example:
| Log source | Index name |
| --- | --- |
| Okta | okta_system |
| CloudTrail | aws_cloudtrail |
| VPC Flow | aws_vpc_flow |
| GitHub | github_audit |
| Cloudflare HTTP | cloudflare_http |
| Cloudflare DNS | cloudflare_dns |
You can use underscores (aws_cloudtrail) or slashes (aws/cloudtrail) as separators. Underscores are more traditional, while slashes create a visual hierarchy. Pick a convention and stick with it. You can also incorporate environments, such as prod_aws_cloudtrail or prod/aws/cloudtrail.
To configure which index logs are sent to, see Sources.
Benefits
Granular RBAC permissions
With separate indexes, you can grant roles access to specific log sources. For example, your security operations team might have access to all indexes, while application developers only see logs relevant to their services.
Easier navigation
When investigating an incident, analysts can quickly target the relevant index without sifting through unrelated data. Searching @index=okta_system immediately narrows focus to identity events.
Query performance
Queries only scan index segments containing relevant data. When you search for Okta events in a dedicated okta_system index, every segment you read contains Okta logs.
Simpler management
Each index can have its own retention policies, and ownership is clear. You can easily track storage costs and query usage per log source.
Alternative: Consolidated Index
The second most common approach is to put all logs into a single index, filtering by %ingest.source_type or %ingest.import_rule_name at query time.
When you first set up Scanner, a default main index is created. Many users start here and later create additional indexes as their needs evolve. Note that reorganizing indexes only affects new data going forward—see the FAQ for details.
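As a rough sketch of what query-time filtering looks like with this approach—assuming logs land in the default main index and using the okta:system source type referenced later in this guide:

```
@index=main %ingest.source_type="okta:system"
```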
When This Might Work
Very small environments with minimal total log volume
Proof-of-concept or testing scenarios
Temporary setups before deciding on final organization
Performance Implications
A consolidated index becomes problematic when log sources have vastly different volumes.
Voluminous sources drown out smaller ones
Consider a "logs" index containing both VPC Flow logs (millions or billions of events daily) and Okta audit logs (thousands of events daily). When you search for Okta events, Scanner must read index segments that are overwhelmingly filled with VPC Flow data.
Slower queries for low-volume sources
Index segments contain mixed data, so queries must process more unrelated events to find matches. A query like @index=logs %ingest.source_type="okta:system" scans segments that are 99%+ VPC Flow data to find the small fraction of Okta events.
No RBAC granularity
You cannot restrict access to specific log types within the same index using standard index permissions. While restriction filters can help, they add complexity compared to simply using separate indexes.
Technical Background
Scanner builds inverted indexes stored as segments. Each segment contains events from whatever logs were indexed during that time window. When log sources of vastly different volumes share an index:
High-volume sources dominate segment contents
Queries for low-volume logs must scan many segments to find relevant events
Query latency and cost increase compared to dedicated indexes where every segment contains relevant data
For more details on Scanner's indexing architecture, see How Scanner Achieves Fast Queries.
Hybrid Approaches
You don't have to go fully one-index-per-source or all-in-one. Hybrid strategies can work well.
Grouping Related Sources
It's reasonable to combine closely related sources with similar volumes into a shared index:
Identity logs: Okta + Auth0 + Azure AD in an identity_audit index
Cloud audit logs: CloudTrail + GCP Audit + Azure Activity in a cloud_audit index
Endpoint logs: Sysmon + OSSEC + Osquery in an endpoint_logs index
This works well when:
The sources have comparable event volumes
The same team typically queries them together
You want unified access control for related data
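You can still narrow a grouped index to a single source at query time. A sketch, assuming the identity_audit index above and an okta:system source type:

```
@index=identity_audit %ingest.source_type="okta:system"
```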
Separating by Team
Some organizations structure indexes around team ownership rather than (or in addition to) log source. This works well when different teams have distinct data and access requirements:
Platform team: platform_kubernetes, platform_api_gateway
Security team: security_alerts, security_threat_intel
Application team: app_payments, app_checkout, app_inventory
You can also combine team and source dimensions:
platform_aws_cloudtrail
security_okta_system
app_payments_logs
This approach makes RBAC straightforward—grant each team access to their indexes—and clarifies ownership for retention policies and cost tracking.
Isolating High-Volume Sources
Always give dedicated indexes to high-volume sources:
AWS VPC Flow Logs
AWS CloudTrail Logs
DNS Logs (AWS Route53 Resolver, Cloudflare DNS)
CDN/WAF Logs (Cloudflare HTTP, AWS WAF)
Network telemetry (Zeek, Suricata)
These sources can generate orders of magnitude more events than audit logs. Mixing them with lower-volume sources forces every query to wade through massive amounts of unrelated data.
Quick Reference
When deciding how to organize a new log source:
High-volume source? (VPC Flow, DNS, CDN logs) → Dedicated index
Different teams need different access? → Separate indexes per team's data
Closely related sources with similar volumes? → Can combine into one index
Unsure? → Default to one index per source
Starting with separate indexes is generally safer. You can always consolidate later if it makes sense, but splitting an overloaded index is more disruptive.
FAQ
Can I split an existing index into multiple indexes and apply it retroactively?
Splitting into multiple indexes works going forward, but won't apply retroactively to historical data. However, this is usually fine in practice:
Scanner's built-in detection rules use %ingest.source_type rather than @index to identify which logs to analyze. This means detection rules continue working seamlessly as you transition data to new indexes.
When you query without specifying @index, Scanner automatically searches all indexes you have permission to access. Your data remains discoverable regardless of which index it lives in.
Is it faster to query one large index or multiple smaller indexes?
At high scale (2-3 TB/day or more), multiple smaller indexes typically perform better. When you query a specific low-volume source in a dedicated index, you avoid scanning segments dominated by unrelated high-volume data. With a single large index, queries for low-volume sources must wade through segments filled mostly with other data.
Do detection rules work differently with multiple indexes vs. one large index?
No. All data flowing into any index is analyzed by the detection engine. Logs that match the initial filter of a detection rule flow through the rest of the streaming query engine regardless of which index they're in.
Can I restrict what data a role can see within the same index?
Yes, using restriction filters. These filters apply to every query that role runs, ensuring users cannot return data they shouldn't see. However, we recommend splitting data into separate indexes when possible—it gives you cleaner, easier-to-manage access control compared to layering restriction filters on a shared index.
Does all data in the same index need to have the same schema?
No. Scanner is schemaless—data from many different sources with varying structures can coexist in the same index.
Can I query multiple indexes at once?
Yes. Omit @index to search all indexes you have permission to access, or combine @index filters with the or operator to target specific indexes. See Data Exploration for query techniques.
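For example, a sketch that targets just two of the indexes from earlier in this guide by combining @index filters with or:

```
@index=aws_cloudtrail or @index=okta_system
```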
Can I search for the same field across multiple log sources?
Yes, using ECS normalization. Scanner normalizes fields like IP addresses and usernames to common names (@ecs.source.ip, @ecs.user.name), so you can search across all sources without knowing each source's native field names.
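For instance, a single ECS-normalized filter can match events from every source that reports a source IP, regardless of each source's native field names (the address here is just a placeholder):

```
@ecs.source.ip="203.0.113.7"
```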
Are there limits on the number of indexes I can create?
No. There are no limits on the number of indexes, and no additional cost implications for having more indexes.
Should I create separate indexes for each AWS account?
Most users put all AWS accounts into the same index (e.g., one aws_cloudtrail index for all accounts). You can filter by account at query time using the recipientAccountId field. Separate indexes per account are only necessary if different teams need isolated access to different accounts.
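A sketch of filtering by account at query time (the account ID is a placeholder):

```
@index=aws_cloudtrail recipientAccountId="123456789012"
```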
Can I rename an index?
Yes. You can rename an index in the Scanner UI. Existing data stays with the renamed index, but saved queries and custom detection rules that reference the old name will need to be updated.
What happens to saved queries and detection rules when I reorganize indexes?
Saved queries and custom detection rules that reference indexes by name (using @index=) will need to be updated to use the new index names. Queries that omit @index and search across all permitted indexes will continue to work without changes.