S3 Integration

Getting started is fast via CloudFormation, Terraform, or Pulumi.
After you sign up, an instance of Scanner will be launched in the AWS region where your S3 buckets are located.

Concierge Onboarding

The Scanner team offers a concierge onboarding service to all new customers. We walk through the onboarding process together: executing the CloudFormation template, choosing files to index, and making sure everything is running smoothly. This meeting usually takes 30 minutes, with an optional additional 30 minutes for questions and product feedback.
To analyze your data, the CloudFormation template will configure a few things in your AWS account:
  • An IAM role with permission to read the S3 bucket(s) containing the log files that you want to index.
  • A new S3 bucket in your account where Scanner will store its skip-list index files.

CloudFormation

Use this CloudFormation template to configure your AWS account. You can get the ScannerAWSAccountID and ScannerExternalID from the Scanner team or from the Scanner UI (coming soon).
---
AWSTemplateFormatVersion: "2010-09-09"
Description: Scanner S3 Indexing Integration
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Scanner Auth Parameters
        Parameters:
          - ScannerExternalID
    ParameterLabels:
      ScannerExternalID:
        default: Scanner External ID
      ScannerAWSAccountID:
        default: Scanner AWS Account ID
      S3BucketToIndex:
        default: S3 Bucket to Index
Parameters:
  ScannerExternalID:
    Description: Scanner provides an External ID to use here.
    Type: String
  ScannerAWSAccountID:
    Description: Scanner provides its AWS Account ID to use here.
    Type: String
  S3BucketToIndex:
    Description: Enter the name of the S3 bucket that you would like Scanner to index.
    Type: String
Resources:
  ScannerRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              AWS:
                Ref: ScannerAWSAccountID
            Action: sts:AssumeRole
            Condition:
              StringEquals:
                sts:ExternalId:
                  Ref: ScannerExternalID
  ScannerIndexFilesBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub
        - "scanner-index-files-${Suffix}"
        - Suffix: !Ref ScannerExternalID
      AccessControl: Private
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - BucketKeyEnabled: true
            ServerSideEncryptionByDefault:
              SSEAlgorithm: "aws:kms"
      LifecycleConfiguration:
        Rules:
          - Id: ExpireTagging
            Status: Enabled
            TagFilters:
              - Key: "Scnr-Lifecycle"
                Value: "expire"
            ExpirationInDays: 1
          - Id: AbortIncompleteMultiPartUploads
            Status: Enabled
            AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 1
  LogsBucketEventNotificationTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: "scnr-LogsBucketEventNotificationTopic"
  LogsBucketEventNotificationTopicPolicy:
    Type: AWS::SNS::TopicPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: s3.amazonaws.com
            Action: sns:Publish
            Resource: "*"
      Topics:
        - !Ref LogsBucketEventNotificationTopic
  LogsBucketEventNotificationTopicSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref LogsBucketEventNotificationTopic
      Endpoint: !Sub
        - "arn:aws:sqs:${InstanceRegion}:${ScannerAWSAccountID}:scnr-S3ObjectCreatedNotificationsQueue"
        - InstanceRegion: !Ref "AWS::Region"
          ScannerAWSAccountID: !Ref ScannerAWSAccountID
      Protocol: "sqs"
      RawMessageDelivery: true
  ScannerPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: ScannerPolicy
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - s3:ListAllMyBuckets
              - s3:GetBucketLocation
              - s3:GetBucketTagging
            Resource: "*"
          - Effect: Allow
            Action:
              - s3:ListBucket
            Resource:
              - Fn::Join:
                  - ""
                  - - "arn:aws:s3:::"
                    - Ref: S3BucketToIndex
          - Effect: Allow
            Action:
              - s3:GetObject
            Resource:
              - Fn::Join:
                  - ""
                  - - "arn:aws:s3:::"
                    - Ref: S3BucketToIndex
                    - "/*"
          - Effect: Allow
            Action:
              - s3:GetObjectTagging
              - s3:PutObjectTagging
              - s3:ListBucket
              - s3:CreateBucket
              - s3:GetObject
              - s3:PutObject
              - s3:DeleteObject
            Resource:
              - Fn::Join:
                  - ""
                  - - "arn:aws:s3:::scanner-index-files-"
                    - Ref: ScannerExternalID
              - Fn::Join:
                  - ""
                  - - "arn:aws:s3:::scanner-index-files-"
                    - Ref: ScannerExternalID
                    - "/*"
      Roles:
        - Ref: ScannerRole
Outputs:
  RoleARN:
    Description: The ARN of the new Scanner IAM Role
    Value:
      Fn::GetAtt:
        - ScannerRole
        - Arn
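The template names the index bucket scanner-index-files-<ScannerExternalID>, so the external ID becomes part of an S3 bucket name. As a rough pre-flight check, the sketch below derives that name and tests it against a subset of the general S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit). The helper names indexBucketName and isValidBucketName are hypothetical, and the format of real external IDs is not documented here, so treat this as an illustration rather than Scanner's own validation.

```typescript
// Hypothetical helpers, not part of Scanner or the AWS SDK.
const INDEX_BUCKET_PREFIX = "scanner-index-files-";

// Mirrors the BucketName expression in the template:
// "scanner-index-files-${Suffix}" with Suffix = ScannerExternalID.
function indexBucketName(externalId: string): string {
  return `${INDEX_BUCKET_PREFIX}${externalId}`;
}

// Checks a subset of the general S3 bucket naming rules.
function isValidBucketName(bucketName: string): boolean {
  // Length must be between 3 and 63 characters.
  if (bucketName.length < 3 || bucketName.length > 63) {
    return false;
  }
  // Lowercase letters, digits, dots, and hyphens only;
  // must begin and end with a letter or digit.
  if (!/^[a-z0-9][a-z0-9.-]*[a-z0-9]$/.test(bucketName)) {
    return false;
  }
  // Two adjacent dots are not allowed.
  if (bucketName.indexOf("..") !== -1) {
    return false;
  }
  return true;
}

// "abc123" is a placeholder external ID, not a real one.
const candidate = indexBucketName("abc123");
console.log(candidate, isValidBucketName(candidate));
```

Because the prefix is 20 characters long, this check rejects any external ID longer than 43 characters or containing uppercase letters.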

Permissions needed to launch the CloudFormation template

To launch (and roll back) the template successfully, your IAM role will need the following IAM permissions.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "sid1",
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucket",
        "s3:PutEncryptionConfiguration",
        "s3:PutLifecycleConfiguration",
        "s3:PutBucketPublicAccessBlock"
      ],
      "Resource": "arn:aws:s3:::scanner-index-files-*"
    },
    {
      "Sid": "sid2",
      "Effect": "Allow",
      "Action": [
        "iam:GetRole",
        "iam:CreateRole",
        "iam:DeleteRole",
        "iam:GetRolePolicy",
        "iam:DeleteRolePolicy",
        "iam:PutRolePolicy",
        "iam:AttachRolePolicy",
        "iam:DetachRolePolicy"
      ],
      "Resource": [
        "arn:aws:iam::*:role/*-ScannerRole-*"
      ]
    },
    {
      "Sid": "sid3",
      "Effect": "Allow",
      "Action": [
        "cloudformation:CreateChangeSet",
        "cloudformation:DeleteChangeSet",
        "cloudformation:DescribeStacks",
        "cloudformation:DescribeStackEvents",
        "cloudformation:ListStacks",
        "cloudformation:CreateStack",
        "cloudformation:DeleteStack",
        "cloudformation:DescribeChangeSet",
        "cloudformation:ExecuteChangeSet",
        "cloudformation:GetTemplateSummary"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "sid4",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketNotification",
        "s3:PutBucketNotification"
      ],
      "Resource": [
        "arn:aws:s3:::<s3_bucket_you_want_to_index>"
      ]
    }
  ]
}

Setting up bucket notifications

To allow Scanner to index log files continuously, you will need to configure your S3 buckets to send "object created" notifications to an SQS queue running in your Scanner instance. This cannot be done directly in CloudFormation.
Here is how to do that manually using the AWS console.
  1. Navigate to S3 > (Click on your bucket) > Properties.
  2. Scroll down to Event notifications.
  3. Click Create event notification.
  4. Give it an Event name. Optionally provide a Prefix and Suffix to filter down to a specific set of files.
  5. Select the checkbox next to All object create events - s3:ObjectCreated:*.
  6. Scroll down to Destination. Select SQS queue. Under Specify SQS queue, select Enter SQS queue ARN.
  7. Enter the ARN of the SQS queue in your Scanner instance. The Scanner team will give this to you as part of your onboarding. Here is what it looks like:
     arn:aws:sqs:<region>:<scanner_instance_aws_account_id>:scnr-S3ObjectCreatedNotificationsQueue
  8. Click Save changes.
Object creation notifications should now be sent to your Scanner instance.
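For readers who prefer to script the console steps above, here is a minimal sketch that builds the same settings as a plain object in the shape of an S3 bucket notification configuration (a QueueConfigurations list with Id, QueueArn, Events, and an optional key filter). The function names, the default Id, and the region and account ID values are placeholders chosen for illustration; use the queue ARN the Scanner team gives you.

```typescript
// Shapes follow the S3 bucket notification configuration structure.
interface FilterRule {
  Name: "prefix" | "suffix";
  Value: string;
}

interface QueueConfiguration {
  Id: string;
  QueueArn: string;
  Events: string[];
  Filter?: { Key: { FilterRules: FilterRule[] } };
}

interface NotificationConfiguration {
  QueueConfigurations: QueueConfiguration[];
}

// Builds the queue ARN in the format shown in the console steps.
function scannerQueueArn(region: string, scannerAccountId: string): string {
  return `arn:aws:sqs:${region}:${scannerAccountId}:scnr-S3ObjectCreatedNotificationsQueue`;
}

// Hypothetical helper: one queue configuration for all object-created events,
// with optional prefix and suffix filters.
function buildScannerNotificationConfig(opts: {
  region: string;
  scannerAccountId: string;
  eventName?: string;
  prefix?: string;
  suffix?: string;
}): NotificationConfiguration {
  const rules: FilterRule[] = [];
  if (opts.prefix) rules.push({ Name: "prefix", Value: opts.prefix });
  if (opts.suffix) rules.push({ Name: "suffix", Value: opts.suffix });

  const queueConfig: QueueConfiguration = {
    Id: opts.eventName || "scanner-object-created",
    QueueArn: scannerQueueArn(opts.region, opts.scannerAccountId),
    Events: ["s3:ObjectCreated:*"],
  };
  // Only attach a filter when at least one rule was requested.
  if (rules.length > 0) {
    queueConfig.Filter = { Key: { FilterRules: rules } };
  }
  return { QueueConfigurations: [queueConfig] };
}

// Placeholder region, account ID, and prefix.
const exampleConfig = buildScannerNotificationConfig({
  region: "us-west-2",
  scannerAccountId: "123456789012",
  prefix: "logs/",
});
console.log(JSON.stringify(exampleConfig, null, 2));
```

The printed JSON could be saved to a file and applied with aws s3api put-bucket-notification-configuration. That call replaces the bucket's entire notification configuration, so any existing entries would need to be merged into the document first.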

What to do if there is an existing, conflicting bucket event notification?

S3 buckets do not allow you to create multiple bucket event notifications with conflicting prefix/suffix filters. Basically, they require that each file be directed to one destination.
To solve this, you have a few options:
  • Remove the existing event notification that conflicts with the event notification you are creating for Scanner.
  • Use an SNS topic and Lambda functions to provide fanout for the event notifications so they may be sent to multiple destinations. See this post from AWS for more details on how to accomplish this.
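To see whether an existing notification is likely to conflict before you create the one for Scanner, it can help to model the rule described above: two filters conflict when a single object key could match both. The sketch below is a simplified model under that assumption; it compares only prefix and suffix filters and ignores event types, so S3's actual validation may accept or reject cases this check does not. The KeyFilter type and filtersMayOverlap function are hypothetical names.

```typescript
// A prefix/suffix pair as entered in the Event notification form.
// A missing field means "no restriction".
interface KeyFilter {
  prefix?: string;
  suffix?: string;
}

function startsWith(text: string, head: string): boolean {
  return text.slice(0, head.length) === head;
}

function endsWith(text: string, tail: string): boolean {
  return tail.length <= text.length && text.slice(text.length - tail.length) === tail;
}

// Returns true when some object key could match both filters.
// That happens when one prefix extends the other AND one suffix extends the other.
function filtersMayOverlap(a: KeyFilter, b: KeyFilter): boolean {
  const prefixA = a.prefix || "";
  const prefixB = b.prefix || "";
  const suffixA = a.suffix || "";
  const suffixB = b.suffix || "";

  const prefixesCompatible = startsWith(prefixA, prefixB) || startsWith(prefixB, prefixA);
  const suffixesCompatible = endsWith(suffixA, suffixB) || endsWith(suffixB, suffixA);
  return prefixesCompatible && suffixesCompatible;
}

// Example: an existing rule on "logs/" versus a new Scanner rule on "logs/app/".
console.log(filtersMayOverlap({ prefix: "logs/" }, { prefix: "logs/app/" }));
```

If the check reports an overlap, one of the two options listed above applies: remove the existing notification, or fan out through an SNS topic.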

Terraform

You can use this Terraform file to set up the IAM role, IAM policies, and S3 bucket to integrate with Scanner. You provide the value for s3_bucket_to_index, and Scanner provides the values for scanner_aws_account_id and scanner_external_id.
It will also update your S3 bucket to send s3:ObjectCreated notifications to the SQS queue in your Scanner instance.
variable "scanner_external_id" {
  description = "Scanner provides an External ID to use here."
  type        = string
}
variable "scanner_aws_account_id" {
  description = "Scanner provides its AWS Account ID to use here."
  type        = string
}
variable "s3_bucket_to_index" {
  description = "Enter the name of the S3 bucket that you would like Scanner to index."
  type        = string
}
resource "aws_iam_role" "scanner_role" {
  name = "ScannerRole"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          AWS = var.scanner_aws_account_id
        }
        Action = "sts:AssumeRole"
        Condition = {
          StringEquals = {
            "sts:ExternalId" = var.scanner_external_id
          }
        }
      }
    ]
  })
}
resource "aws_sns_topic" "logs_bucket_event_notification_topic" {
  name = "scnr-LogsBucketEventNotificationTopic"
}
resource "aws_sns_topic_policy" "logs_bucket_event_notification_topic_policy" {
  arn = aws_sns_topic.logs_bucket_event_notification_topic.arn
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = ["SNS:Publish"]
        Effect = "Allow"
        Principal = {
          Service = "s3.amazonaws.com"
        }
        Resource = aws_sns_topic.logs_bucket_event_notification_topic.arn
      }
    ]
  })
}
resource "aws_s3_bucket_notification" "s3_bucket_to_index_notification" {
  bucket = var.s3_bucket_to_index
  topic {
    topic_arn = aws_sns_topic.logs_bucket_event_notification_topic.arn
    events    = ["s3:ObjectCreated:*"]
  }
}
resource "aws_sns_topic_subscription" "logs_bucket_event_notification_topic_subscription" {
  topic_arn = aws_sns_topic.logs_bucket_event_notification_topic.arn
  protocol  = "sqs"
  endpoint  = "arn:aws:sqs:${data.aws_region.current.name}:${var.scanner_aws_account_id}:scnr-S3ObjectCreatedNotificationsQueue"
}
resource "aws_s3_bucket" "scanner_index_files_bucket" {
  bucket = "scanner-index-files-${var.scanner_external_id}"
}
resource "aws_s3_bucket_public_access_block" "scanner_index_files_bucket_public_access_block" {
  bucket                  = aws_s3_bucket.scanner_index_files_bucket.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
resource "aws_s3_bucket_server_side_encryption_configuration" "scanner_index_files_bucket_encryption_config" {
  bucket = aws_s3_bucket.scanner_index_files_bucket.id
  rule {
    bucket_key_enabled = true
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
resource "aws_s3_bucket_lifecycle_configuration" "scanner_index_files_bucket_lifecycle_configuration" {
  bucket = aws_s3_bucket.scanner_index_files_bucket.id
  rule {
    id     = "ExpireTagging"
    status = "Enabled"
    filter {
      tag {
        key   = "Scnr-Lifecycle"
        value = "expire"
      }
    }
    expiration {
      days = 1
    }
  }
  rule {
    id     = "AbortIncompleteMultiPartUploads"
    status = "Enabled"
    abort_incomplete_multipart_upload {
      days_after_initiation = 1
    }
  }
}
resource "aws_iam_policy" "scanner_policy" {
  name = "ScannerPolicy"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:ListAllMyBuckets",
          "s3:GetBucketLocation",
          "s3:GetBucketTagging"
        ]
        Resource = "*"
      },
      {
        Effect   = "Allow"
        Action   = "s3:ListBucket"
        Resource = "arn:aws:s3:::${var.s3_bucket_to_index}"
      },
      {
        Effect   = "Allow"
        Action   = "s3:GetObject"
        Resource = "arn:aws:s3:::${var.s3_bucket_to_index}/*"
      },
      {
        Effect = "Allow"
        Action = [
          "s3:GetObjectTagging",
          "s3:PutObjectTagging",
          "s3:ListBucket",
          "s3:CreateBucket",
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject"
        ]
        Resource = [
          "arn:aws:s3:::scanner-index-files-${var.scanner_external_id}",
          "arn:aws:s3:::scanner-index-files-${var.scanner_external_id}/*"
        ]
      }
    ]
  })
}
resource "aws_iam_policy_attachment" "scanner_policy_attachment" {
  name       = "ScannerPolicyAttachment"
  roles      = [aws_iam_role.scanner_role.name]
  policy_arn = aws_iam_policy.scanner_policy.arn
}
output "scanner_role_arn" {
  description = "The ARN of the new Scanner IAM Role"
  value       = aws_iam_role.scanner_role.arn
}
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

Pulumi

You can use this Pulumi TypeScript function to set up the IAM role, IAM policies, and S3 bucket to integrate with Scanner. You provide the value of dataLakeS3Bucket to indicate which S3 bucket you want Scanner to index. Scanner will provide the values for scannerInstanceStsExternalId, scannerInstanceAwsAccountId, and scannerInstanceSqsIndexQueueArn.
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";
export function setUpScannerInfra(
  dataLakeS3Bucket: aws.s3.Bucket,
  scannerInstanceStsExternalId: string,
  scannerInstanceAwsAccountId: string,
  scannerInstanceSqsIndexQueueArn: string,
): { scannerRole: aws.iam.Role } {
  // Role that the Scanner AWS account assumes, guarded by the external ID.
  const scannerRole = new aws.iam.Role("ScannerRole", {
    name: "ScannerRole",
    assumeRolePolicy: aws.iam.getPolicyDocumentOutput({
      statements: [
        {
          actions: ["sts:AssumeRole"],
          effect: "Allow",
          principals: [{
            type: "AWS",
            identifiers: [scannerInstanceAwsAccountId],
          }],
          conditions: [{
            test: "StringEquals",
            variable: "sts:ExternalId",
            values: [scannerInstanceStsExternalId],
          }],
        },
      ],
    }).json,
  });
  // SNS topic that receives S3 "object created" events and forwards them to Scanner's SQS queue.
  const logsBucketsNotificationTopic = new aws.sns.Topic(
    "LogsBucketsNotificationTopic",
    {
      policy: aws.iam.getPolicyDocumentOutput({
        statements: [
          {
            effect: "Allow",
            principals: [
              {
                type: "Service",
                identifiers: ["s3.amazonaws.com"],
              },
            ],
            actions: ["SNS:Publish"],
            resources: ["*"],
          },
        ],
      }).json,
    },
  );
  new aws.sns.TopicSubscription(
    "ScannerLogsBucketsNotificationTopicSubscription",
    {
      topic: logsBucketsNotificationTopic.arn,
      protocol: "sqs",
      endpoint: scannerInstanceSqsIndexQueueArn,
      rawMessageDelivery: true,
    },
  );
  new aws.s3.BucketNotification("DataLakeBucketNotification", {
    bucket: dataLakeS3Bucket.id,
    topics: [{
      topicArn: logsBucketsNotificationTopic.arn,
      events: ["s3:ObjectCreated:*"],
    }],
  });
  // Bucket where Scanner stores its index files.
  const scannerIndexFilesS3BucketName = `scanner-index-files-${scannerInstanceStsExternalId}`;
  const scannerIndexFilesS3Bucket = new aws.s3.Bucket("ScannerIndexFilesBucket", {
    bucket: scannerIndexFilesS3BucketName,
    acl: "private",
    lifecycleRules: [
      {
        id: "ExpireTagging",
        enabled: true,
        tags: {
          'Scnr-Lifecycle': 'expire',
        },
        expiration: {
          days: 1,
        }
      },
      {
        id: "AbortIncompleteMultiPartUploads",
        enabled: true,
        abortIncompleteMultipartUploadDays: 1,
      },
    ],
    serverSideEncryptionConfiguration: {
      rule: {
        applyServerSideEncryptionByDefault: {
          sseAlgorithm: "aws:kms",
        },
        bucketKeyEnabled: true,
      }
    },
  }, {});
  new aws.s3.BucketPublicAccessBlock("ScannerIndexFilesBucketPublicAccessBlock", {
    bucket: scannerIndexFilesS3Bucket.id,
    blockPublicAcls: true,
    blockPublicPolicy: true,
    ignorePublicAcls: true,
    restrictPublicBuckets: true,
  });
  const scannerPolicy = new aws.iam.Policy("ScannerPolicy", {
    description: "Allow ScannerRole to interact with data lake S3 bucket",
    path: "/",
    policy: aws.iam.getPolicyDocumentOutput({
      statements: [
        {
          actions: [
            "s3:ListAllMyBuckets",
            "s3:GetBucketLocation",
            "s3:GetBucketTagging",
          ],
          effect: "Allow",
          resources: ["*"],
        },
        {
          actions: ["s3:ListBucket"],
          effect: "Allow",
          resources: [
            dataLakeS3Bucket.arn,
          ],
        },
        {
          actions: ["s3:GetObject"],
          effect: "Allow",
          resources: [
            pulumi.interpolate `${dataLakeS3Bucket.arn}/*`,
          ],
        },
        {
          actions: ["s3:*"],
          effect: "Allow",
          resources: [
            scannerIndexFilesS3Bucket.arn,
            pulumi.interpolate `${scannerIndexFilesS3Bucket.arn}/*`
          ],
        },
      ],
    }).json,
  });
  new aws.iam.RolePolicyAttachment("ScannerRpa", {
    role: scannerRole.name,
    policyArn: scannerPolicy.arn,
  });
  return {
    scannerRole,
  };
}
You can include this function in your Pulumi codebase or use it as a starting point as you update your infrastructure code to support what Scanner needs.