# How it Works

Scanner is designed to make log search, detection, and investigation fast and cost-effective at any scale. The system is simple at its core: logs stay in your S3 buckets, and Scanner builds lightweight indexes that make them instantly searchable. Everything else builds on this foundation.

## Core Design Principles

* **Storage stays yours** — Logs remain in S3; Scanner only adds index files alongside them.
* **Schema-agnostic indexing** — No tedious ETL projects into SQL tables. Scanner builds compact indexes (posting lists for text, numerical ranges for numbers) that make selective scans lightning-fast — whether your data is perfectly tabular, semi-structured JSON, or just plaintext.
* **Serverless execution** — Queries run on short-lived Lambda workers in parallel, scaling automatically with dataset size and delivering scan speeds from hundreds of GB/s up to 1 TB/s.
* **Query-centric design** — The same query-first infrastructure powers ad-hoc searches and always-on detection rules. Every component — from indexing to alerting — is optimized around queries, simplifying architecture and operations.
* **Governed by default** — Role-based access control and a dedicated \_audit index provide accountability at every step.

## Scanner Architecture

Scanner processes logs through four stages: **Ingestion** → **Indexing** → **Querying** → **Detections**.

### Stage 1: Ingestion

* S3 bucket notifications fire when new log files arrive.
* Messages are delivered to the Scanner event bus.
* Scanner instances pull messages to initiate processing.

<figure><img src="https://974571140-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxPzBslRzquS8OU1IlC6E%2Fuploads%2Fgit-blob-11dcd9ac784f24a982ba1a09dc240e4cbc5c99c2%2Funknown.png?alt=media" alt=""><figcaption></figcaption></figure>
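The ingestion wiring above follows the standard S3 event-notification pattern. Below is a minimal sketch of the bucket-notification configuration that would route new-object events to a queue; the queue ARN and `logs/` prefix are hypothetical placeholders, not Scanner's real provisioning values.

```python
import json

# Hypothetical ARN for the queue feeding the Scanner event bus; the
# real destination is provisioned during Scanner setup.
QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:scanner-event-bus"

# S3 bucket-notification configuration: emit an event to the queue
# whenever a new object lands under the logs/ prefix.
notification_config = {
    "QueueConfigurations": [
        {
            "QueueArn": QUEUE_ARN,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [{"Name": "prefix", "Value": "logs/"}]
                }
            },
        }
    ]
}

print(json.dumps(notification_config, indent=2))
```

A configuration like this can be applied with `aws s3api put-bucket-notification-configuration --bucket <your-log-bucket> --notification-configuration file://config.json`.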

### Stage 2: Indexing

* Indexers read raw logs from your S3 buckets.
* Compact index files are generated (posting lists for strings, numerical ranges).
* Indexes are written to your Scanner index files bucket.
* Small index files merge over time for optimal query performance.

**Storage ratio**: \~150 GB of index files per 1 TB of uncompressed logs.
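Scanner's on-disk index format is proprietary, but the two structures named above are easy to illustrate. This toy sketch (with made-up log events) builds a posting list per token and a min/max range summary for a numeric field — the range lets a query like `status >= 400` skip whole segments whose range cannot match.

```python
from collections import defaultdict

# Toy log events: semi-structured JSON, as in Stage 2.
events = [
    {"id": 0, "msg": "user login failed", "status": 401},
    {"id": 1, "msg": "user login ok", "status": 200},
    {"id": 2, "msg": "payment processed", "status": 200},
]

# Posting lists: token -> list of event ids containing that token.
postings = defaultdict(list)
for e in events:
    for token in sorted(set(e["msg"].split())):
        postings[token].append(e["id"])

# Numerical range summary: min/max of a numeric field across the
# segment, enabling cheap segment skipping for range predicates.
status_range = (min(e["status"] for e in events),
                max(e["status"] for e in events))

print(postings["login"])  # → [0, 1]
print(status_range)       # → (200, 401)
```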

### Stage 3: Ad-Hoc Querying

* Queries from the browser or API are dispatched to Lambda functions, which search through data in multiple AWS accounts.
* **Index-first scanning:** The metadata database identifies relevant index files, and Lambda workers scan only the necessary data regions using posting lists and numerical ranges, dramatically reducing the search space.
* **Parallel, serverless execution:** Lambda workers traverse index segments simultaneously.
* **High-speed text matching:** Optimized in-memory routines scan relevant data efficiently.
* **Multi-account aggregation:** Queries can span data in multiple AWS accounts, and results are merged automatically, minimizing end-to-end latency.

Queries complete in seconds, even across petabytes of data. [Learn how Scanner achieves fast queries →](https://docs.scanner.dev/scanner/what-and-why/how-it-works/how-scanner-achieves-fast-queries)

<figure><img src="https://974571140-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxPzBslRzquS8OU1IlC6E%2Fuploads%2Fgit-blob-726886a9fa063bc68c47d55ff11059920262bb40%2Funknown.png?alt=media" alt=""><figcaption></figcaption></figure>
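The index-first step can be sketched in a few lines: intersect the posting lists for the query terms to shrink the candidate set, then scan only those events. The data and index shapes here are illustrative toys, not Scanner's real format.

```python
# Toy event store and posting lists (token -> event ids).
events = {
    0: "user login failed from 10.0.0.1",
    1: "user login ok from 10.0.0.2",
    2: "payment processed",
}
postings = {
    "user": [0, 1], "login": [0, 1], "failed": [0],
    "ok": [1], "from": [0, 1], "payment": [2], "processed": [2],
}

def search(terms):
    # Index-first step: intersect posting lists to get candidates.
    candidates = set(postings.get(terms[0], []))
    for t in terms[1:]:
        candidates &= set(postings.get(t, []))
    # Final scan touches only the candidates, not the whole dataset.
    return sorted(i for i in candidates
                  if all(t in events[i] for t in terms))

print(search(["login", "failed"]))  # → [0]
```

In Scanner the same narrowing happens across many Lambda workers at once, each handling a slice of the index segments.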

### Stage 4: Continuous Detections

* Detection rules are saved queries applied automatically to new log data as it arrives.
* Any query can be saved as a detection rule.
* Rules run automatically on new or recent logs.
* Matches trigger alerts to destinations such as webhooks, SOAR platforms, Slack, and PagerDuty.
* Rule creations and modifications are logged in the \_audit index.

<figure><img src="https://974571140-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxPzBslRzquS8OU1IlC6E%2Fuploads%2Fgit-blob-f56666f288f2a12ccb6899d2024479eca76d4a9d%2Funknown.png?alt=media" alt=""><figcaption></figcaption></figure>
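Conceptually, a detection rule is just a saved query run against each new batch of events as it arrives. A minimal sketch (rule names, fields, and alert shape are all hypothetical, not Scanner's API):

```python
# A saved detection rule: a name plus a predicate over an event.
RULES = {
    "failed-login": lambda e: "login" in e["msg"] and e["status"] == 401,
}

def run_detections(new_events):
    """Apply every saved rule to a batch of newly ingested events."""
    alerts = []
    for name, predicate in RULES.items():
        for e in new_events:
            if predicate(e):
                # In Scanner this would fan out to webhooks, Slack, etc.
                alerts.append({"rule": name, "event_id": e["id"]})
    return alerts

batch = [
    {"id": 7, "msg": "user login failed", "status": 401},
    {"id": 8, "msg": "user login ok", "status": 200},
]
print(run_detections(batch))  # → [{'rule': 'failed-login', 'event_id': 7}]
```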

## Deployment Models

Scanner offers three deployment models to meet different organizational requirements for **security, data governance, cost,** and **operational overhead**. Across all models, **customers retain full ownership of their logs and index files**, which always reside in their S3 buckets.

### Managed Scanner: Multi-Tenant

* **How it works:** Scanner compute infrastructure runs in a shared AWS account owned and managed by Scanner.
  * Shared compute resources for cost efficiency.
  * Scanner accesses your S3 buckets via secure, customer-managed IAM roles.
  * Fully hands-off, SaaS-like experience with zero operational overhead.
* **Best for:** Teams with daily log volumes up to \~500 GB seeking maximum simplicity and minimal operational burden.

### Managed Scanner: Single-Tenant

* **How it works:** Dedicated AWS account owned and managed by Scanner, exclusively for your compute infrastructure.
  * No shared resources or “noisy neighbor” issues.
  * Consistent performance with dedicated compute capacity.
  * Scanner accesses your S3 buckets via secure, customer-managed IAM roles.
* **Best for:** Teams ingesting 500 GB+ logs per day who want managed infrastructure with enhanced isolation.

### Bring Your Own Cloud (BYOC) / Self-Hosted

* **How it works:** Scanner compute infrastructure is deployed directly into your dedicated AWS account.
  * Scanner manages deployment, maintenance, and updates via permission-scoped IAM roles.
  * Customers handle underlying AWS infrastructure costs, leveraging any existing discounts or credits.
  * Complete visibility into CloudTrail audit logs for all Scanner operations.
* **Best for:** Organizations with strict data governance requirements, significant AWS investments, or teams needing full control over infrastructure and vendor independence.

**Cost optimization:** Deploy Scanner in the same AWS region as your log buckets to eliminate cross-region transfer costs.

[Learn more about Self-Hosted Scanner](https://docs.scanner.dev/scanner/what-and-why/how-it-works/bring-your-own-cloud-byoc-self-hosted)

## Security & Compliance Posture

Unlike traditional SIEMs that require shipping logs to vendor environments, Scanner operates directly on data in your own cloud storage. **Security is built into the architecture from the ground up**, with a focus on **data custody, sovereignty, and operational transparency**. The platform is designed to meet a wide range of regulatory and compliance requirements, including SOC 2 and data residency rules such as GDPR.

### Customer Data Custody and Control

* **Data stays in your buckets** — Both raw logs and index files are stored in your own S3 buckets. Scanner never moves your data out of your environment, ensuring full control and avoiding vendor lock-in.
* **Immutable, append-only indexes** — Once logs are indexed, the compressed index files are append-only, include internal checksums, and remain fully searchable even if original logs are deleted or altered.
* **Data residency and regional isolation** — Scanner can operate in the same AWS region as your S3 buckets, ensuring your data and compute never leave a specific geographic area, supporting GDPR and other data residency requirements while avoiding cross-region transfer costs.
* **Long-term, cost-effective retention** — Scanner index files can use any S3 storage tier that supports `GetObject` requests (S3 Standard, Standard-Infrequent Access, or Glacier Instant Retrieval) while remaining searchable, helping meet regulatory retention periods efficiently and allowing you to control your costs.

### Secure by Default Architecture

* **Encryption at rest and in transit** — All data inherits your S3 bucket encryption (SSE-S3, SSE-KMS, or SSE-C) and uses TLS 1.2+ for all transfers. Index files are stored in a proprietary binary format for additional protection.
* **Least-privilege access** — Scanner connects via IAM roles you create, with read-only permissions for logs and controlled read/write for index files. Roles use STS External IDs for secure cross-account access.
* **Internal controls and auditing** — Scanner operations use temporary, monitored "ops roles." All AssumeRole events and API actions are audited via CloudTrail and Scanner's own detection engine.
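The least-privilege pattern above is the standard cross-account IAM setup: the role you create trusts Scanner's account only when it presents the agreed External ID. A sketch of such a trust policy follows; the account ID and External ID are placeholders, not Scanner's real values.

```python
import json

# Cross-account trust policy: allow sts:AssumeRole from the vendor's
# account only when the agreed External ID is supplied. Account ID
# and External ID below are illustrative placeholders.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"sts:ExternalId": "example-external-id"}
            },
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

Attaching a read-only policy for log buckets and a scoped read/write policy for the index files bucket to this role completes the pattern described above.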

### Compliance and Audit Readiness

* **SOC 2 Type 2 certified** — Attests to security controls, operational processes, and internal auditing practices. Full documentation is available via the Drata Trust Center.
* **Automated compliance evidence** — Detection rule logs and \_audit index provide verifiable evidence for PCI daily log review requirements and SOX change management audits. Historical searches complete in seconds/minutes vs. hours/days with traditional tools, helping meet auditor deadlines efficiently.
* **Secure AI usage** — Integrated AI features (e.g., log explanation) are powered by enterprise-grade services such as Amazon Bedrock, with enterprise agreements ensuring no training on customer data.
