HOW TO: upload content to a private S3 Bucket…
… and scan the content for viruses.
Recently, one of our clients asked us to build a private, unlimited storage service in the cloud. This service would be used to store a wide variety of content. Because the content would be uploaded by many different users, every file needed to be scanned for viruses after upload.
By: Dennis van Bavel
We met this request by building a private S3 Bucket on Amazon Web Services (AWS) and then adding a virus scanner.
In this article we first describe how we built the private S3 Bucket and upload content to it with a pre-signed URL. Second, we explain how we added the ClamAV virus scanner to the process.
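Amazon Simple Storage Service (S3) is a service with unlimited storage in the cloud. There are different ways to upload content into an S3 Bucket; we chose a pre-signed URL. A pre-signed URL grants time-limited permission to upload an object without handing AWS credentials to the uploader, so the bucket itself can stay private.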
Step 1: make a pre-signed URL
The first step is creating a service that generates the pre-signed URL. You can do this with your own service or with an AWS Lambda function. The Amazon Web Services website has background information for Java, .NET and Ruby.
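The AWS SDKs generate pre-signed URLs for you (in Python, boto3's generate_presigned_url does this). To show what such a service actually produces, here is a minimal standard-library sketch of Signature Version 4 query-string signing for a single PUT. It is an illustration under stated assumptions, not the service we built: the function name, the virtual-hosted host name format and the default expiry of 900 seconds are our choices, and the bucket, region and keys in the usage example are placeholders.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_put_url(bucket: str, key: str, region: str,
                    access_key: str, secret_key: str,
                    expires: int = 900,
                    now: Optional[datetime.datetime] = None) -> str:
    """Build a SigV4 query-string-signed URL that allows an HTTP PUT of `key`."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"  # assumed virtual-hosted style
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    # Signed query parameters, URI-encoded and sorted by name.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )
    canonical_uri = "/" + quote(key, safe="/")  # keep '/' in object keys
    canonical_request = "\n".join([
        "PUT",
        canonical_uri,
        canonical_query,
        f"host:{host}\n",    # canonical headers block ends with a newline
        "host",              # signed headers
        "UNSIGNED-PAYLOAD",  # the body is unknown when the URL is created
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, "s3")
    k_signing = _hmac(k_service, "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"
```

Anyone holding the returned URL can upload that one key until it expires, which is why the expiry should stay short.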
Step 2: activate the API Gateway
Next, put Amazon API Gateway in front of the Lambda function. API Gateway gives you several options to manage the service (traffic, security, monitoring). Once the API Gateway is in place, run your test set to check that everything works properly.
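With the Lambda proxy integration, API Gateway passes the HTTP request to the function as an event and expects a response with statusCode, headers and a string body. The sketch below shows one way such a handler could look. The filename query parameter, the incoming/ key prefix and the file-name rule are hypothetical, and the signer is injected as a callable (for example the presign_put_url sketch for Step 1, or an SDK call) so the handler can be tested without AWS.

```python
import json
import re
import uuid
from typing import Callable

# Hypothetical rule: simple file names only, no leading dot, no slashes.
_SAFE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]{0,199}$")


def _response(status: int, payload: dict) -> dict:
    """Build a response in the Lambda proxy integration format."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handle_upload_request(event: dict, sign: Callable[[str], str]) -> dict:
    """Return a pre-signed upload URL for the requested file name."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    filename = params.get("filename", "")
    if not _SAFE_NAME.match(filename):
        return _response(400, {"error": "invalid or missing 'filename'"})
    # A random prefix keeps uploads from different users from colliding.
    key = f"incoming/{uuid.uuid4()}/{filename}"
    return _response(200, {"key": key, "uploadUrl": sign(key)})
```

In the deployed function, the Lambda entry point would simply call handle_upload_request(event, sign=...) with the real signer.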
Now we can upload content to a private S3 Bucket with a pre-signed URL.
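On the client side, an upload is a plain HTTP PUT of the file bytes to the pre-signed URL; the signature travels in the query string, so the client needs no AWS credentials. A minimal standard-library sketch follows. The application/octet-stream content type is our assumption; if your signer includes a Content-Type in the signature, the client must send that same value.

```python
import urllib.request


def build_upload_request(upload_url: str, data: bytes,
                         content_type: str = "application/octet-stream"
                         ) -> urllib.request.Request:
    """Prepare a PUT of `data` to a pre-signed S3 URL (no Authorization header)."""
    return urllib.request.Request(
        upload_url,
        data=data,
        method="PUT",
        # Set explicitly: urllib would otherwise send
        # application/x-www-form-urlencoded for a request body.
        headers={"Content-Type": content_type},
    )


def upload(upload_url: str, data: bytes, timeout: float = 30.0) -> int:
    """Send the PUT and return the HTTP status (urlopen raises on 4xx/5xx)."""
    with urllib.request.urlopen(build_upload_request(upload_url, data),
                                timeout=timeout) as resp:
        return resp.status
```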
When you activate the virus scanner and scan this file, the scanner identifies it as infected. The virus scanner is now ready to scan the whole bucket.
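The text does not say which file was used for this check. A common choice, and only our assumption here, is the EICAR anti-virus test file: a harmless 68-byte string that scanners such as ClamAV are built to report as infected. Uploading it through the pre-signed URL exercises the whole pipeline without handling real malware.

```python
# Standard EICAR anti-virus test string (68 bytes). It is not malware, but
# virus scanners deliberately flag it so detection can be tested safely.
EICAR_TEST_FILE = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"
```

For example, send EICAR_TEST_FILE as the request body of a PUT to a freshly generated pre-signed URL and check that the object ends up in the quarantine bucket.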
The whole process
Uploading content to the S3 Bucket triggers a Lambda function through an S3 object-created event (S3-CreateObject), and the function pushes the content to the virus scanner.
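S3 invokes the function with an event notification whose Records list names the bucket and key of each new object (event names of the form ObjectCreated:Put and similar). A minimal sketch of reading that event follows. The UploadedObject type is hypothetical, and the hand-off to ClamAV is left as a print placeholder. One detail that is easy to miss: object keys arrive URL-encoded, with spaces as '+'.

```python
from typing import Iterator, NamedTuple
from urllib.parse import unquote_plus


class UploadedObject(NamedTuple):
    bucket: str
    key: str
    size: int


def objects_from_s3_event(event: dict) -> Iterator[UploadedObject]:
    """Yield the newly created objects named in an S3 event notification."""
    # One invocation can carry several records, so always iterate.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore anything that is not a new upload
        s3 = record["s3"]
        yield UploadedObject(
            bucket=s3["bucket"]["name"],
            # Keys are URL-encoded in the event; decode before calling S3.
            key=unquote_plus(s3["object"]["key"]),
            size=int(s3["object"].get("size", 0)),
        )


def lambda_handler(event, context):
    for obj in objects_from_s3_event(event):
        # Placeholder: download the object and hand it to ClamAV here.
        print(f"scan requested for s3://{obj.bucket}/{obj.key} ({obj.size} bytes)")
```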
ClamAV scans every uploaded object. If an object is clean, it is moved to a ‘safe’ S3 Bucket; if it is infected, it is moved to a ‘quarantine’ S3 Bucket. The customer can access the ‘safe’ S3 Bucket. The ‘quarantine’ S3 Bucket is read-only, so you can check which files were infected.
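The routing decision can be driven by clamscan's documented exit codes: 0 means no virus was found, 1 means a virus was found, and 2 means an error occurred. The sketch below is one hedged way to turn that into a destination bucket. The bucket names are placeholders, and treating a scan error as an exception (rather than routing the file anywhere) is our design choice: the object then stays in the upload bucket for a retry instead of being marked safe.

```python
import subprocess
from enum import Enum


class Verdict(Enum):
    CLEAN = "clean"
    INFECTED = "infected"


# Placeholder names for the two destination buckets.
SAFE_BUCKET = "example-safe-bucket"
QUARANTINE_BUCKET = "example-quarantine-bucket"


def verdict_from_exit_code(code: int) -> Verdict:
    """Map clamscan exit codes: 0 = clean, 1 = virus found, other = error."""
    if code == 0:
        return Verdict.CLEAN
    if code == 1:
        return Verdict.INFECTED
    # A failed scan must never be treated as clean.
    raise RuntimeError(f"clamscan failed with exit code {code}")


def scan_file(path: str) -> Verdict:
    """Run clamscan on a local copy of the object (requires ClamAV installed)."""
    result = subprocess.run(["clamscan", "--no-summary", path],
                            capture_output=True, text=True)
    return verdict_from_exit_code(result.returncode)


def destination_bucket(verdict: Verdict) -> str:
    return SAFE_BUCKET if verdict is Verdict.CLEAN else QUARANTINE_BUCKET
```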
The S3 Bucket with the original objects is not accessible; it only serves as a trigger for the Lambda function. After an object has been successfully processed and scanned, ClamAV cleans up the private S3 Bucket.
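S3 has no rename or move operation, so moving an object out of the private bucket means copying it to the destination and then deleting the original. The sketch below orders these steps so the delete only runs after the scan and the copy succeed; if either raises, the original stays in place. ObjectStore and InMemoryStore are hypothetical names: the in-memory class is a test double, and in a real function the two methods would wrap S3's CopyObject and DeleteObject operations (copy_object and delete_object in boto3).

```python
from typing import Callable, Dict, Protocol


class ObjectStore(Protocol):
    def copy(self, src_bucket: str, key: str, dst_bucket: str) -> None: ...
    def delete(self, bucket: str, key: str) -> None: ...


class InMemoryStore:
    """Test double standing in for S3: each bucket is a dict of key -> bytes."""

    def __init__(self) -> None:
        self.buckets: Dict[str, Dict[str, bytes]] = {}

    def copy(self, src_bucket: str, key: str, dst_bucket: str) -> None:
        data = self.buckets[src_bucket][key]
        self.buckets.setdefault(dst_bucket, {})[key] = data

    def delete(self, bucket: str, key: str) -> None:
        self.buckets.get(bucket, {}).pop(key, None)


def process_upload(store: ObjectStore, upload_bucket: str, key: str,
                   pick_destination: Callable[[str], str]) -> str:
    """Scan-and-route one object, then remove it from the private upload bucket."""
    destination = pick_destination(key)  # e.g. scan, then choose safe/quarantine
    store.copy(upload_bucket, key, destination)
    # Reached only if the scan and the copy did not raise.
    store.delete(upload_bucket, key)
    return destination
```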