Index Organization

How you structure your indexes impacts query performance, access control, and day-to-day usability. This guide covers common patterns and helps you decide how to organize logs across indexes.

Most Common Approach: One Index Per Log Source

The most popular pattern among Scanner users is to create a dedicated index for each log source. For example:

Log Source          Index Name
Okta                okta_system
CloudTrail          aws_cloudtrail
VPC Flow            aws_vpc_flow
GitHub              github_audit
Cloudflare HTTP     cloudflare_http
Cloudflare DNS      cloudflare_dns

You can use underscores (aws_cloudtrail) or slashes (aws/cloudtrail) as separators. Underscores are more traditional, while slashes create a visual hierarchy. Pick a convention and stick with it. You can also incorporate environments, such as prod_aws_cloudtrail or prod/aws/cloudtrail.

To configure which index logs are sent to, see Sources.

Benefits

Granular RBAC permissions

With separate indexes, you can grant roles access to specific log sources. For example, your security operations team might have access to all indexes, while application developers only see logs relevant to their services.

Easier navigation

When investigating an incident, analysts can quickly target the relevant index without sifting through unrelated data. Searching @index=okta_system immediately narrows focus to identity events.

Query performance

Queries only scan index segments containing relevant data. When you search for Okta events in a dedicated okta_system index, every segment you read contains Okta logs.

Simpler management

Each index can have its own retention policies, and ownership is clear. You can easily track storage costs and query usage per log source.

Alternative: Consolidated Index

The second most common approach is to put all logs into a single index, filtering by %ingest.source_type or %ingest.import_rule_name at query time.

When you first set up Scanner, a default main index is created. Many users start here and later create additional indexes as their needs evolve. Note that reorganizing indexes only affects new data going forward—see the FAQ for details.
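As a sketch, the two layouts lead to queries like the following. The first runs against a dedicated index; the second runs against the default main index and filters by source type at query time. The eventType filter is an illustrative Okta field, not something specific to Scanner:

```
@index=okta_system eventType="user.session.start"

@index=main %ingest.source_type="okta:system" eventType="user.session.start"
```

Both return the same Okta events, but in the consolidated case Scanner must read segments that also contain every other source's data.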

When This Might Work

  • Very small environments with minimal total log volume

  • Proof-of-concept or testing scenarios

  • Temporary setups before deciding on final organization

Performance Implications

A consolidated index becomes problematic when log sources have vastly different volumes.

Voluminous sources drown out smaller ones

Consider a "logs" index containing both VPC Flow logs (millions or billions of events daily) and Okta audit logs (thousands of events daily). When you search for Okta events, Scanner must read index segments that are overwhelmingly filled with VPC Flow data.

Slower queries for low-volume sources

Index segments contain mixed data, so queries must process more unrelated events to find matches. A query like @index=logs %ingest.source_type="okta:system" scans segments that are 99%+ VPC Flow data to find the small fraction of Okta events.

No RBAC granularity

You cannot restrict access to specific log types within the same index using standard index permissions. While restriction filters can help, they add complexity compared to simply using separate indexes.

Technical Background

Scanner builds inverted indexes stored as segments. Each segment contains events from whatever logs were indexed during that time window. When log sources of vastly different volumes share an index:

  • High-volume sources dominate segment contents

  • Queries for low-volume logs must scan many segments to find relevant events

  • Query latency and cost increase compared to dedicated indexes where every segment contains relevant data

For more details on Scanner's indexing architecture, see How Scanner Achieves Fast Queries.

Hybrid Approaches

You don't have to go fully one-index-per-source or all-in-one. Hybrid strategies can work well.

It's reasonable to combine closely related sources with similar volumes into a shared index:

  • Identity logs: Okta + Auth0 + Azure AD in an identity_audit index

  • Cloud audit logs: CloudTrail + GCP Audit + Azure Activity in a cloud_audit index

  • Endpoint logs: Sysmon + OSSEC + Osquery in an endpoint_logs index

This works well when:

  • The sources have comparable event volumes

  • The same team typically queries them together

  • You want unified access control for related data

Separating by Team

Some organizations structure indexes around team ownership rather than (or in addition to) log source. This works well when different teams have distinct data and access requirements:

  • Platform team: platform_kubernetes, platform_api_gateway

  • Security team: security_alerts, security_threat_intel

  • Application team: app_payments, app_checkout, app_inventory

You can also combine team and source dimensions:

  • platform_aws_cloudtrail

  • security_okta_system

  • app_payments_logs

This approach makes RBAC straightforward—grant each team access to their indexes—and clarifies ownership for retention policies and cost tracking.

Isolating High-Volume Sources

Always give dedicated indexes to high-volume sources:

  • AWS VPC Flow Logs

  • AWS CloudTrail Logs

  • DNS Logs (AWS Route53 Resolver, Cloudflare DNS)

  • CDN/WAF Logs (Cloudflare HTTP, AWS WAF)

  • Network telemetry (Zeek, Suricata)

These sources can generate orders of magnitude more events than audit logs. Mixing them with lower-volume sources forces every query to wade through massive amounts of unrelated data.

Quick Reference

When deciding how to organize a new log source:

  1. High-volume source? (VPC Flow, DNS, CDN logs) → Dedicated index

  2. Different teams need different access? → Separate indexes per team's data

  3. Closely related sources with similar volumes? → Can combine into one index

  4. Unsure? → Default to one index per source

Starting with separate indexes is generally safer. You can always consolidate later if it makes sense, but splitting an overloaded index is more disruptive.

FAQ

Can I split an existing index into multiple indexes and apply it retroactively?

Splitting into multiple indexes works going forward, but won't apply retroactively to historical data. However, this is usually fine in practice:

  • Scanner's built-in detection rules use %ingest.source_type rather than @index to identify which logs to analyze. This means detection rules continue working seamlessly as you transition data to new indexes.

  • When you query without specifying @index, Scanner automatically searches all indexes you have permission to access. Your data remains discoverable regardless of which index it lives in.

Is it faster to query one large index or multiple smaller indexes?

At high scale (2-3 TB/day or more), multiple smaller indexes typically perform better. When you query a specific low-volume source in a dedicated index, you avoid scanning segments dominated by unrelated high-volume data. With a single large index, queries for low-volume sources must wade through segments filled mostly with other data.

Do detection rules work differently with multiple indexes vs. one large index?

No. All data flowing into any index is analyzed by the detection engine. Logs that match the initial filter of a detection rule flow through the rest of the streaming query engine regardless of which index they're in.

Can I restrict what data a role can see within the same index?

Yes, using restriction filters. These filters apply to every query that role runs, ensuring users cannot return data they shouldn't see. However, we recommend splitting data into separate indexes when possible—it gives you cleaner, easier-to-manage access control compared to layering restriction filters on a shared index.

Does all data in the same index need to have the same schema?

No. Scanner is schemaless—data from many different sources with varying structures can coexist in the same index.

Can I query multiple indexes at once?

Yes. Omit @index to search all indexes you have permission to access, or combine @index filters with the or operator to target specific indexes. See Data Exploration for query techniques.
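For example, a hedged sketch of a multi-index search (the grouping parentheses and the IP value here are illustrative):

```
(@index=aws_cloudtrail or @index=okta_system) "198.51.100.7"
```

This restricts the search to just those two indexes rather than every index you can access.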

Can I search for the same field across multiple log sources?

Yes, using ECS normalization. Scanner normalizes fields like IP addresses and usernames to common names (@ecs.source.ip, @ecs.user.name), so you can search across all sources without knowing each source's native field names.
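For instance, to find every event involving a given IP address or username across all sources, without knowing each source's native field names (the values shown are placeholders):

```
@ecs.source.ip="198.51.100.7"

@ecs.user.name="alice"
```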

Are there limits on the number of indexes I can create?

No. There are no limits on the number of indexes, and no additional cost implications for having more indexes.

Should I create separate indexes for each AWS account?

Most users put all AWS accounts into the same index (e.g., one aws_cloudtrail index for all accounts). You can filter by account at query time using the recipientAccountId field. Separate indexes per account are only necessary if different teams need isolated access to different accounts.
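A sketch of filtering a shared CloudTrail index down to one account (the account ID is a placeholder):

```
@index=aws_cloudtrail recipientAccountId="123456789012"
```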

Can I rename an index?

Yes. You can rename an index in the Scanner UI. Existing data stays with the renamed index, but saved queries and custom detection rules that reference the old name will need to be updated.

What happens to saved queries and detection rules when I reorganize indexes?

Saved queries and custom detection rules that reference indexes by name (using @index=) will need to be updated to use the new index names. Queries that omit @index and search across all permitted indexes will continue to work without changes.
