Google Cloud Platform (GCP) Audit
GCP Audit logs provide visibility into administrative activities, API calls, and access patterns across your Google Cloud infrastructure. This guide walks through setting up GCP Audit logs in Scanner Collect, so that logs can be ingested from S3, normalized, and indexed for search and detection.
Overview
The GCP-to-S3 pipeline uses a modular Terraform setup to create a serverless architecture that automatically collects and delivers GCP Audit logs to your S3 bucket:
Cloud Logging routes logs to a Pub/Sub topic
Pub/Sub Push Subscription batches entries and writes to Google Cloud Storage (GCS)
Cloud Function transfers batched files from GCS to your S3 bucket with compression (idempotent), then deletes the temporary GCS files
Cleanup Function retries failed transfers every 30 minutes
Expected latency: 2-3 minutes from log generation to S3 availability.
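Once the Terraform in Part 1 is deployed, the components above are visible as ordinary GCP resources. As a rough sketch (resource names are chosen by the Terraform modules, so yours will differ), you can list them with gcloud:
gcloud logging sinks list        # the Cloud Logging sink routing audit logs
gcloud pubsub topics list        # the Pub/Sub topic receiving routed entries
gcloud pubsub subscriptions list # the push subscription batching into GCS
gcloud functions list            # the transfer and cleanup functions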
Prerequisites
Before setting up GCP Audit logs in Scanner, you must:
Have a GCP project with appropriate permissions - You'll need permissions to create Cloud Logging sinks, Pub/Sub topics, GCS buckets, and Cloud Functions
Have an AWS account - Required to deliver logs to S3. The Terraform code can be configured to create a new S3 bucket or point to an existing one
Install Terraform locally - Required to deploy the pipeline infrastructure
Once the Terraform setup is complete and logs are flowing to S3, you can proceed with configuring the source in Scanner Collect.
Part 1: Deploy the GCP-to-S3 Pipeline
From a terminal, clone the gcp-to-scanner-collect repository:
git clone https://github.com/scanner-inc/gcp-to-scanner-collect.git
cd gcp-to-scanner-collect
Then follow the README to deploy the Terraform infrastructure.
In main.tf, choose the configuration that matches your S3 setup:
To create a new S3 bucket:
Uncomment the audit_logs_pipeline module.
To use an existing S3 bucket:
Uncomment the audit_logs_to_existing_bucket module.
Follow the README's Setup section to:
Configure your terraform.tfvars file with required variables
Initialize and deploy the Terraform infrastructure (example commands below)
Verify that logs are flowing to your S3 bucket under gcp/audit/
Note: Logs typically appear in S3 within 2-3 minutes of being generated. If you don't see logs immediately after deployment, wait a few minutes before proceeding.
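As a minimal sketch of that workflow, assuming the AWS CLI is configured and using a placeholder bucket name (the exact variables and module outputs are documented in the repository's README):
terraform init
terraform plan
terraform apply

# After a few minutes, confirm objects are arriving under the gcp/audit/ prefix
aws s3 ls s3://<your-gcp-logs-bucket>/gcp/audit/ --recursive | head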
Once logs are flowing to S3, proceed to Part 2 below to configure the Scanner source.
Part 2: Create a Scanner Source
Once logs are flowing to S3, configure the source in Scanner Collect to index and search them.
Step 1: Create a New Source
Navigate to the Collect tab in the Scanner UI.
Click Create New Source.
Click Select a Source Type.
Choose Google Cloud Platform (GCP) - Audit logs.
You'll see that:
Ingest Method is set to AWS S3
Destination is set to Scanner
Click Next.
Step 2: Configure the Source
Set a Display Name, such as my-org-gcp-audit.
Leave File Type as JsonLines.
Leave Compression as Gzip.
Click Next.
Step 3: Set the Origin (S3 Bucket)
Select the S3 bucket where your GCP logs are being delivered by Terraform.
Enter the Bucket Prefix: gcp/audit/ (logs are organized by date: gcp/audit/YYYY/MM/DD/hh/mm_ssZ_*.jsonl.gz).
No additional File Regex configuration is needed.
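For reference, a quick listing of a single delivery window illustrates the date-based layout (the bucket name and date path below are placeholders):
aws s3 ls s3://<your-gcp-logs-bucket>/gcp/audit/2024/06/01/12/
# objects in each hour follow the mm_ssZ_*.jsonl.gz naming pattern described above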
Click Next.
Step 4: Set the Destination
Choose the Scanner index where GCP Audit logs should be stored for search and detection.
Leave the Source Label set to gcp.
Click Next.
Step 5: Transform and Enrich
Keep the default enrichment settings:
Normalize to ECS - GCP Audit (maps GCP log fields to Elastic Common Schema for cross-source queries and detections)
Parse JSON Columns (automatically parses stringified JSON if present)
(Optional) Add further transformation or enrichment steps as needed.
Click Next.
Step 6: Timestamp Extraction
The Timestamp Field will be automatically set to timestamp.
Click Next.
Step 7: Review and Create
Review your configuration.
(Optional) Use the preview feature to confirm how Scanner will match S3 keys and parse your log files.
When everything looks correct, click Create Source.
Once created, Scanner will begin monitoring your S3 bucket for new GCP Audit logs, normalize them to ECS, index them into your selected destination, and make them available for search and detection.
Troubleshooting
For issues with the Terraform deployment, infrastructure, or log flow to S3, refer to the gcp-to-scanner-collect README.
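Beyond the README, a few commands can help narrow down where the pipeline is stalling; the function and bucket names below are placeholders defined by your Terraform configuration:
# Check the transfer function's recent logs for errors
gcloud functions logs read <transfer-function-name> --limit 50

# Objects lingering in the temporary GCS bucket indicate failed transfers
# awaiting the 30-minute cleanup retry
gsutil ls gs://<temporary-gcs-bucket>/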
If logs are reaching S3 but not appearing in Scanner, check:
Bucket and Prefix: Verify you're pointing Scanner to the correct S3 bucket and the gcp/audit/ prefix
File Format: Confirm that files in S3 are gzipped JSONL (.jsonl.gz); see the example commands after this list
Source Configuration: Review your Scanner source settings, particularly the timestamp field and transformations
Permissions: Ensure Scanner has read permissions on the S3 bucket
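One way to spot-check the first two items with the AWS CLI (the bucket name and object key are placeholders):
# Confirm objects exist under the configured prefix
aws s3 ls s3://<your-gcp-logs-bucket>/gcp/audit/ --recursive | head

# Confirm a sample file is gzipped JSON Lines: this should print one JSON object per line
aws s3 cp s3://<your-gcp-logs-bucket>/gcp/audit/<path-to-object>.jsonl.gz - | gunzip -c | head -n 2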
Design Rationale
This pipeline batches logs in GCS, compresses them with gzip, and transfers them to S3 in bulk. The result is a 10x reduction in GCP egress costs compared to pushing individual log events via HTTP.
The alternative is Pub/Sub's default HTTP push behavior, where each log event becomes its own HTTP request. This approach creates two critical problems at scale:
Infrastructure load: Millions or billions of individual HTTP requests per day overwhelm backend systems
Data transfer costs: Uncompressed logs create 10x more data to transfer, multiplying egress charges
This pipeline solves both:
Batching: Consolidates millions or billions of individual requests into only a few thousand file transfers per day
Compression: Gzip reduces raw JSON to ~10% of original size, dramatically lowering data transfer volume
Example: For a significant volume like 1TB/day of logs sent uncompressed via Pub/Sub to HTTP, egress costs run ~$40k/year. With batching and gzip compression, that same volume drops to ~$4k/year—a 10x savings.
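As a back-of-the-envelope check, those figures imply an egress rate of roughly $0.11/GB (illustrative only; actual GCP egress pricing varies by destination and volume tier):
GB_PER_YEAR=$(( 1000 * 365 ))                                          # 1 TB/day of raw JSON
echo "uncompressed: \$$(( GB_PER_YEAR * 11 / 100 )) per year"          # ~ $40,000
echo "with ~10:1 gzip: \$$(( GB_PER_YEAR * 11 / 100 / 10 )) per year"  # ~ $4,000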