# How it Works

Scanner is designed to make log search, detection, and investigation fast and cost-effective at any scale. The system is simple at its core: logs stay in your S3 buckets, and Scanner builds lightweight indexes that make them instantly searchable. Everything else builds on this foundation.

## Core Design Principles

* **Storage stays yours** — Logs remain in S3; Scanner only adds index files alongside them.
* **Schema-agnostic indexing** — No tedious ETL projects into SQL tables. Scanner builds compact indexes (posting lists for text, numerical ranges for numbers) that make selective scans lightning-fast — whether your data is perfectly tabular, semi-structured JSON, or just plaintext.
* **Serverless execution** — Queries run on short-lived Lambda workers in parallel, scaling automatically with dataset size and delivering scan speeds from hundreds of GB/s up to 1 TB/s.
* **Query-centric design** — The same query-first infrastructure powers ad-hoc searches and always-on detection rules. Every component — from indexing to alerting — is optimized around queries, simplifying architecture and operations.
* **Governed by default** — Role-based access control and a dedicated \_audit index provide accountability at every step.

## Scanner Architecture

Scanner processes logs through four stages: **Ingestion** → **Indexing** → **Querying** → **Detections**.

### Stage 1: Ingestion

* S3 bucket notifications fire when new log files arrive.
* Messages are delivered to the Scanner event bus.
* Scanner instances pull messages to initiate processing.

<figure><img src="https://974571140-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxPzBslRzquS8OU1IlC6E%2Fuploads%2Fgit-blob-11dcd9ac784f24a982ba1a09dc240e4cbc5c99c2%2Funknown.png?alt=media" alt=""><figcaption></figcaption></figure>
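The ingestion wiring above follows the standard S3 event-notification pattern. Below is a minimal sketch of the bucket-notification configuration that would route new-object events to a queue; the queue ARN and `logs/` prefix are hypothetical placeholders, not Scanner's real provisioning values.

```python
import json

# Hypothetical ARN for the queue feeding the Scanner event bus; the
# real destination is provisioned during Scanner setup.
QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:scanner-event-bus"

# S3 bucket-notification configuration: emit an event to the queue
# whenever a new object lands under the logs/ prefix.
notification_config = {
    "QueueConfigurations": [
        {
            "QueueArn": QUEUE_ARN,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [{"Name": "prefix", "Value": "logs/"}]
                }
            },
        }
    ]
}

print(json.dumps(notification_config, indent=2))
```

A configuration like this can be applied with `aws s3api put-bucket-notification-configuration --bucket <your-log-bucket> --notification-configuration file://config.json`.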

### Stage 2: Indexing

* Indexers read raw logs from your S3 buckets.
* Compact index files are generated (posting lists for strings, numerical ranges).
* Indexes are written to your Scanner index files bucket.
* Small index files merge over time for optimal query performance.

**Storage ratio**: \~150 GB of index files per 1 TB of uncompressed logs.
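Scanner's on-disk index format is proprietary, but the two structures named above are easy to illustrate. This toy sketch (with made-up log events) builds a posting list per token and a min/max range summary for a numeric field — the range lets a query like `status >= 400` skip whole segments whose range cannot match.

```python
from collections import defaultdict

# Toy log events: semi-structured JSON, as in Stage 2.
events = [
    {"id": 0, "msg": "user login failed", "status": 401},
    {"id": 1, "msg": "user login ok", "status": 200},
    {"id": 2, "msg": "payment processed", "status": 200},
]

# Posting lists: token -> list of event ids containing that token.
postings = defaultdict(list)
for e in events:
    for token in sorted(set(e["msg"].split())):
        postings[token].append(e["id"])

# Numerical range summary: min/max of a numeric field across the
# segment, enabling cheap segment skipping for range predicates.
status_range = (min(e["status"] for e in events),
                max(e["status"] for e in events))

print(postings["login"])  # → [0, 1]
print(status_range)       # → (200, 401)
```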

### Stage 3: Ad-Hoc Querying

* Queries from the browser or API are dispatched to Lambda functions, which search through data in multiple AWS accounts.
* **Index-first scanning:** The metadata database identifies relevant index files, and Lambda workers scan only the necessary data regions using posting lists and numerical ranges, dramatically reducing the search space.
* **Parallel, serverless execution:** Lambda workers traverse index segments simultaneously.
* **High-speed text matching:** Optimized in-memory routines scan relevant data efficiently.
* **Multi-account aggregation:** Queries can span data in multiple AWS accounts, and results are merged automatically, minimizing end-to-end latency.

Queries complete in seconds, even across petabytes of data. [Learn how Scanner achieves fast queries →](https://docs.scanner.dev/scanner/what-and-why/how-it-works/how-scanner-achieves-fast-queries)

<figure><img src="https://974571140-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxPzBslRzquS8OU1IlC6E%2Fuploads%2Fgit-blob-726886a9fa063bc68c47d55ff11059920262bb40%2Funknown.png?alt=media" alt=""><figcaption></figcaption></figure>
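The index-first step can be sketched in a few lines: intersect the posting lists for the query terms to shrink the candidate set, then scan only those events. The data and index shapes here are illustrative toys, not Scanner's real format.

```python
# Toy event store and posting lists (token -> event ids).
events = {
    0: "user login failed from 10.0.0.1",
    1: "user login ok from 10.0.0.2",
    2: "payment processed",
}
postings = {
    "user": [0, 1], "login": [0, 1], "failed": [0],
    "ok": [1], "from": [0, 1], "payment": [2], "processed": [2],
}

def search(terms):
    # Index-first step: intersect posting lists to get candidates.
    candidates = set(postings.get(terms[0], []))
    for t in terms[1:]:
        candidates &= set(postings.get(t, []))
    # Final scan touches only the candidates, not the whole dataset.
    return sorted(i for i in candidates
                  if all(t in events[i] for t in terms))

print(search(["login", "failed"]))  # → [0]
```

In Scanner the same narrowing happens across many Lambda workers at once, each handling a slice of the index segments.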

### Stage 4: Continuous Detections

* Detection rules are saved queries applied automatically to new log data as it arrives.
* Any query can be saved as a detection rule.
* Rules run automatically on new or recent logs.
* Matches trigger alerts to destinations such as webhooks, SOAR platforms, Slack, and PagerDuty.
* Rule creations and modifications are logged in the \_audit index.

<figure><img src="https://974571140-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxPzBslRzquS8OU1IlC6E%2Fuploads%2Fgit-blob-f56666f288f2a12ccb6899d2024479eca76d4a9d%2Funknown.png?alt=media" alt=""><figcaption></figcaption></figure>
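Conceptually, a detection rule is just a saved query run against each new batch of events as it arrives. A minimal sketch (rule names, fields, and alert shape are all hypothetical, not Scanner's API):

```python
# A saved detection rule: a name plus a predicate over an event.
RULES = {
    "failed-login": lambda e: "login" in e["msg"] and e["status"] == 401,
}

def run_detections(new_events):
    """Apply every saved rule to a batch of newly ingested events."""
    alerts = []
    for name, predicate in RULES.items():
        for e in new_events:
            if predicate(e):
                # In Scanner this would fan out to webhooks, Slack, etc.
                alerts.append({"rule": name, "event_id": e["id"]})
    return alerts

batch = [
    {"id": 7, "msg": "user login failed", "status": 401},
    {"id": 8, "msg": "user login ok", "status": 200},
]
print(run_detections(batch))  # → [{'rule': 'failed-login', 'event_id': 7}]
```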

## Deployment Models

Scanner offers three deployment models to meet different organizational requirements for **security, data governance, cost,** and **operational overhead**. Across all models, **customers retain full ownership of their logs and index files**, which always reside in their S3 buckets.

### Managed Scanner: Multi-Tenant

* **How it works:** Scanner compute infrastructure runs in a shared AWS account owned and managed by Scanner.
  * Shared compute resources for cost efficiency.
  * Scanner accesses your S3 buckets via secure, customer-managed IAM roles.
  * Fully hands-off, SaaS-like experience with zero operational overhead.
* **Best for:** Teams with daily log volumes up to \~500 GB seeking maximum simplicity and minimal operational burden.

### Managed Scanner: Single-Tenant

* **How it works:** Dedicated AWS account owned and managed by Scanner, exclusively for your compute infrastructure.
  * No shared resources or “noisy neighbor” issues.
  * Consistent performance with dedicated compute capacity.
  * Scanner accesses your S3 buckets via secure, customer-managed IAM roles.
* **Best for:** Teams ingesting 500 GB+ logs per day who want managed infrastructure with enhanced isolation.

### Bring Your Own Cloud (BYOC) / Self-Hosted

* **How it works:** Scanner compute infrastructure is deployed directly into your dedicated AWS account.
  * Scanner manages deployment, maintenance, and updates via permission-scoped IAM roles.
  * Customers handle underlying AWS infrastructure costs, leveraging any existing discounts or credits.
  * Complete visibility into CloudTrail audit logs for all Scanner operations.
* **Best for:** Organizations with strict data governance requirements, significant AWS investments, or teams needing full control over infrastructure and vendor independence.

**Cost optimization:** Deploy Scanner in the same AWS region as your log buckets to eliminate cross-region transfer costs.

[Learn more about Self-Hosted Scanner](https://docs.scanner.dev/scanner/what-and-why/how-it-works/bring-your-own-cloud-byoc-self-hosted)

## Security & Compliance Posture

Unlike traditional SIEMs that require shipping logs to vendor environments, Scanner operates directly on data in your own cloud storage. **Security is built into the architecture from the ground up**, with a focus on **data custody, sovereignty, and operational transparency**. The platform is designed to meet a wide range of regulatory and compliance requirements, including SOC 2 and data residency rules such as GDPR.

### Customer Data Custody and Control

* **Data stays in your buckets** — Both raw logs and index files are stored in your own S3 buckets. Scanner never moves your data out of your environment, ensuring full control and avoiding vendor lock-in.
* **Immutable, append-only indexes** — Once logs are indexed, the compressed index files are append-only, include internal checksums, and remain fully searchable even if original logs are deleted or altered.
* **Data residency and regional isolation** — Scanner can operate in the same AWS region as your S3 buckets, ensuring your data and compute never leave a specific geographic area, supporting GDPR and other data residency requirements while avoiding cross-region transfer costs.
* **Long-term, cost-effective retention** — Scanner index files can use any S3 storage tier that supports `GetObject` requests (S3 Standard, Standard-Infrequent Access, or Glacier Instant Retrieval) while remaining searchable, helping meet regulatory retention periods efficiently and allowing you to control your costs.

### Secure by Default Architecture

* **Encryption at rest and in transit** — All data inherits your S3 bucket encryption (SSE-S3, SSE-KMS, or SSE-C) and uses TLS 1.2+ for all transfers. Index files are stored in a proprietary binary format for additional protection.
* **Least-privilege access** — Scanner connects via IAM roles you create, with read-only permissions for logs and controlled read/write for index files. Roles use STS External IDs for secure cross-account access.
* **Internal controls and auditing** — Scanner operations use temporary, monitored "ops roles." All AssumeRole events and API actions are audited via CloudTrail and Scanner's own detection engine.
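The least-privilege pattern above is the standard cross-account IAM setup: the role you create trusts Scanner's account only when it presents the agreed External ID. A sketch of such a trust policy follows; the account ID and External ID are placeholders, not Scanner's real values.

```python
import json

# Cross-account trust policy: allow sts:AssumeRole from the vendor's
# account only when the agreed External ID is supplied. Account ID
# and External ID below are illustrative placeholders.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"sts:ExternalId": "example-external-id"}
            },
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

Attaching a read-only policy for log buckets and a scoped read/write policy for the index files bucket to this role completes the pattern described above.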

### Compliance and Audit Readiness

* **SOC 2 Type 2 certified** — Attests to security controls, operational processes, and internal auditing practices. Full documentation is available via the Drata Trust Center.
* **Automated compliance evidence** — Detection rule logs and \_audit index provide verifiable evidence for PCI daily log review requirements and SOX change management audits. Historical searches complete in seconds/minutes vs. hours/days with traditional tools, helping meet auditor deadlines efficiently.
* **Secure AI usage** — Integrated AI features (e.g., log explanation) are powered by enterprise-grade services such as Amazon Bedrock, with enterprise agreements ensuring no training on customer data.
