Google Cloud Platform (GCP) Audit
GCP Audit logs provide visibility into administrative activities, API calls, and access patterns across your Google Cloud infrastructure. This guide walks through setting up GCP Audit logs in Scanner Collect, so that logs can be ingested from S3, normalized, and indexed for search and detection.
Overview
The GCP-to-S3 pipeline uses a modular Terraform setup to create a serverless architecture that automatically collects and delivers GCP Audit logs to your S3 bucket:
Cloud Logging routes logs to a Pub/Sub topic
Pub/Sub Push Subscription batches entries and writes to Google Cloud Storage (GCS)
Cloud Function transfers batched files from GCS to your S3 bucket with compression (idempotent), then deletes the temporary GCS files
Cleanup Function retries failed transfers every 30 minutes
Expected latency: 2-3 minutes from log generation to S3 availability.
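Once the Terraform in Part 1 is deployed, the components above are visible as ordinary GCP resources. As a rough sketch (resource names are chosen by the Terraform modules, so yours will differ), you can list them with gcloud:
gcloud logging sinks list        # the Cloud Logging sink routing audit logs
gcloud pubsub topics list        # the Pub/Sub topic receiving routed entries
gcloud pubsub subscriptions list # the push subscription batching into GCS
gcloud functions list            # the transfer and cleanup functions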
Prerequisites
Before setting up GCP Audit logs in Scanner, you must:
Have a GCP project with appropriate permissions - You'll need permissions to create Cloud Logging sinks, Pub/Sub topics, GCS buckets, and Cloud Functions
Have an AWS account - Required to deliver logs to S3. The Terraform code can be configured to create a new S3 bucket or point to an existing one
Install Terraform locally - Required to deploy the pipeline infrastructure
Once the Terraform setup is complete and logs are flowing to S3, you can proceed with configuring the source in Scanner Collect.
Part 1: Deploy the GCP-to-S3 Pipeline
From a terminal, clone the gcp-to-scanner-collect repository:
git clone https://github.com/scanner-inc/gcp-to-scanner-collect.git
cd gcp-to-scanner-collect
Then follow the README to deploy the Terraform infrastructure.
In main.tf, choose the configuration that matches your S3 setup:
To create a new S3 bucket:
Uncomment the audit_logs_pipeline module.
To use an existing S3 bucket:
Uncomment the audit_logs_to_existing_bucket module.
Follow the README's Setup section to:
Configure your terraform.tfvars file with required variables
Initialize and deploy the Terraform infrastructure (example commands below)
Verify that logs are flowing to your S3 bucket under gcp/audit/
Note: Logs typically appear in S3 within 2-3 minutes of being generated. If you don't see logs immediately after deployment, wait a few minutes before proceeding.
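As a minimal sketch of that workflow, assuming the AWS CLI is configured and using a placeholder bucket name (the exact variables and module outputs are documented in the repository's README):
terraform init
terraform plan
terraform apply

# After a few minutes, confirm objects are arriving under the gcp/audit/ prefix
aws s3 ls s3://<your-gcp-logs-bucket>/gcp/audit/ --recursive | head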
Once logs are flowing to S3, proceed to Part 2 below to configure the Scanner source.
Part 2: Create a Scanner Source
Once logs are flowing to S3, configure the source in Scanner Collect to index and search them.
Step 1: Create a New Source
Navigate to the Collect tab in the Scanner UI.
Click Create New Source.
Click Select a Source Type.
Choose Google Cloud Platform (GCP) - Audit logs.
You'll see that:
Ingest Method is set to AWS S3
Destination is set to Scanner
Click Next.
Step 2: Configure the Source
Set a Display Name, such as my-org-gcp-audit.
Leave File Type as JsonLines.
Leave Compression as Gzip.
Click Next.
Step 3: Set the Origin (S3 Bucket)
Select the S3 bucket where your GCP logs are being delivered by Terraform.
Enter the Bucket Prefix: gcp/audit/ (logs are organized by date: gcp/audit/YYYY/MM/DD/hh/mm_ssZ_*.jsonl.gz).
No additional File Regex configuration is needed.
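For reference, a quick listing of a single delivery window illustrates the date-based layout (the bucket name and date path below are placeholders):
aws s3 ls s3://<your-gcp-logs-bucket>/gcp/audit/2024/06/01/12/
# objects in each hour follow the mm_ssZ_*.jsonl.gz naming pattern described above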
Click Next.
Step 4: Set the Destination
Choose the Scanner index where GCP Audit logs should be stored for search and detection.
Leave the Source Label set to gcp.
Click Next.
Step 5: Transform and Enrich
Keep the default enrichment settings:
Normalize to ECS - GCP Audit (maps GCP log fields to Elastic Common Schema for cross-source queries and detections)
Parse JSON Columns (automatically parses stringified JSON if present)
(Optional) Add further transformation or enrichment steps as needed.
Click Next.
Step 6: Timestamp Extraction
The Timestamp Field will be automatically set to timestamp.
Click Next.
Step 7: Review and Create
Review your configuration.
(Optional) Use the preview feature to confirm how Scanner will match S3 keys and parse your log files.
When everything looks correct, click Create Source.
Once created, Scanner will begin monitoring your S3 bucket for new GCP Audit logs, normalize them to ECS, index them into your selected destination, and make them available for search and detection.
Troubleshooting
For issues with the Terraform deployment, infrastructure, or log flow to S3, refer to the gcp-to-scanner-collect README.
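Beyond the README, a few commands can help narrow down where the pipeline is stalling; the function and bucket names below are placeholders defined by your Terraform configuration:
# Check the transfer function's recent logs for errors
gcloud functions logs read <transfer-function-name> --limit 50

# Objects lingering in the temporary GCS bucket indicate failed transfers
# awaiting the 30-minute cleanup retry
gsutil ls gs://<temporary-gcs-bucket>/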
If logs are reaching S3 but not appearing in Scanner, check:
Bucket and Prefix: Verify you're pointing Scanner to the correct S3 bucket and the gcp/audit/ prefix
File Format: Confirm that files in S3 are gzipped JSONL (.jsonl.gz); see the example commands after this list
Source Configuration: Review your Scanner source settings, particularly the timestamp field and transformations
Permissions: Ensure Scanner has read permissions on the S3 bucket
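One way to spot-check the first two items with the AWS CLI (the bucket name and object key are placeholders):
# Confirm objects exist under the configured prefix
aws s3 ls s3://<your-gcp-logs-bucket>/gcp/audit/ --recursive | head

# Confirm a sample file is gzipped JSON Lines: this should print one JSON object per line
aws s3 cp s3://<your-gcp-logs-bucket>/gcp/audit/<path-to-object>.jsonl.gz - | gunzip -c | head -n 2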
Design Rationale
This pipeline batches logs in GCS, compresses them with gzip, and transfers them to S3 in bulk. The result is a 10x reduction in GCP egress costs compared to pushing individual log events via HTTP.
The alternative is Pub/Sub's default HTTP push behavior, where each log event becomes its own HTTP request. This approach creates two critical problems at scale:
Infrastructure load: Millions or billions of individual HTTP requests per day overwhelm backend systems
Data transfer costs: Uncompressed logs create 10x more data to transfer, multiplying egress charges
This pipeline solves both:
Batching: Consolidates millions or billions of individual requests into only a few thousand file transfers per day
Compression: Gzip reduces raw JSON to ~10% of original size, dramatically lowering data transfer volume
Example: For a significant volume like 1TB/day of logs sent uncompressed via Pub/Sub to HTTP, egress costs run ~$40k/year. With batching and gzip compression, that same volume drops to ~$4k/year—a 10x savings.
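As a back-of-the-envelope check, those figures imply an egress rate of roughly $0.11/GB (illustrative only; actual GCP egress pricing varies by destination and volume tier):
GB_PER_YEAR=$(( 1000 * 365 ))                                          # 1 TB/day of raw JSON
echo "uncompressed: \$$(( GB_PER_YEAR * 11 / 100 )) per year"          # ~ $40,000
echo "with ~10:1 gzip: \$$(( GB_PER_YEAR * 11 / 100 / 10 )) per year"  # ~ $4,000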