Migrate MongoDB GridFS Data to AWS S3
MongoDB GridFS can be used to store and retrieve large files such as images, audio, and video. However, it's generally advisable to keep such binary data outside the database itself, since storing and serving large files through the database adds load and hurts performance. In this post, we'll see how to migrate existing data in MongoDB GridFS to AWS S3.
Prerequisite: a working MongoDB setup. Here we are using a local installation of MongoDB, but the same process works for a replica set; just modify the connection string in the code.
Step 1: Create an S3 bucket named gridfs.
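This can be done from the S3 console or from code. Below is a minimal boto3 sketch; note that bucket names are globally unique (so gridfs may already be taken), and that creating the bucket needs broader credentials than the role we define in Step 2, which only grants PutObject and ListBucket.
import boto3

s3 = boto3.client("s3")

# In us-east-1, no location constraint is needed:
s3.create_bucket(Bucket="gridfs")

# In any other region, one is required, e.g.:
# s3.create_bucket(
#     Bucket="gridfs",
#     CreateBucketConfiguration={"LocationConstraint": "ap-south-1"},
# )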
Step 2: Create an IAM role for EC2 with the following IAM permissions:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::gridfs/*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::gridfs"
        }
    ]
}
Step 3: Attach this role to the EC2 instance where the script will be executed. I am executing the script on the same server where MongoDB is running, so I've attached the role to my MongoDB instance. If your MongoDB is not running on AWS, you can use IAM access keys instead.
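Either way, nothing in the script itself has to change: boto3 and the AWS CLI both resolve credentials automatically, checking environment variables and ~/.aws/credentials before falling back to the attached instance role. For example:
import boto3

# No credentials are passed here. boto3 looks for them in order:
# environment variables (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY),
# the ~/.aws/credentials file, and finally the attached IAM role
# via the EC2 instance metadata service.
s3 = boto3.client("s3")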
Step 4: Let's put some data in first. We'll push an image into two collections named test and test2 (in the example database). Download the code at https://raw.githubusercontent.com/vinycoolguy2015/awslambda/master/gridfs_upload.py. I've adapted this code from https://hubpages.com/technology/Storing-large-objects-in-MongoDB-using-Python
Next, download an image that we'll upload to GridFS, and execute the script. In the code, I've used red_fort.jpg as the image name; change it as needed. Then change the collection name from test to test2 and execute the script again. This will create the following collections in the example database: test.files, test.chunks, test2.files, and test2.chunks.
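For reference, here's a minimal sketch of what such an upload script looks like, using pymongo and gridfs; the linked gridfs_upload.py is the version to actually run. The connection string and names below match the setup above.
import gridfs
from pymongo import MongoClient

# Connect to the local MongoDB instance; for a replica set,
# adjust the connection string accordingly.
client = MongoClient("mongodb://localhost:27017")
db = client["example"]

# GridFS keeps metadata in <collection>.files and binary data
# in <collection>.chunks. Switch "test" to "test2" for the second run.
fs = gridfs.GridFS(db, collection="test")

with open("red_fort.jpg", "rb") as f:
    file_id = fs.put(f, filename="red_fort.jpg")

print("Stored file with _id:", file_id)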
Step 5: Now copy the script at https://raw.githubusercontent.com/vinycoolguy2015/awslambda/master/gridfs_to_s3.py, which we'll use to push data to S3. In the script, change the value of the filepath variable to the path where the data will be downloaded locally, and update the same path on line 36 (the aws s3 sync command).
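Conceptually, the migration reads every file out of each GridFS collection into a local folder named after that collection, then syncs the folder tree to S3. A simplified sketch under those assumptions follows; the linked gridfs_to_s3.py is the script to actually use, and the paths and names here are illustrative.
import os
import subprocess

import gridfs
from pymongo import MongoClient

filepath = "/tmp/gridfs_data"  # local staging path; set to your filepath value
bucket = "gridfs"

client = MongoClient("mongodb://localhost:27017")
db = client["example"]

# Dump each GridFS collection into a folder named after it.
for prefix in ("test", "test2"):
    fs = gridfs.GridFS(db, collection=prefix)
    outdir = os.path.join(filepath, prefix)
    os.makedirs(outdir, exist_ok=True)
    for grid_out in fs.find():
        with open(os.path.join(outdir, grid_out.filename), "wb") as f:
            f.write(grid_out.read())

# Push the staged folders to S3 (requires the AWS CLI on the instance);
# the folder names become the "folders" you see in the bucket.
subprocess.check_call(["aws", "s3", "sync", filepath, "s3://" + bucket])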
Step 6: Execute the script. Once it completes successfully, you'll see the data uploaded to S3.
As you can see, our data is now in S3, in folders corresponding to the collection names.
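If you'd rather verify from code than from the S3 console, a quick listing with boto3 (assuming the gridfs bucket name used above) shows the uploaded keys:
import boto3

s3 = boto3.client("s3")

# Keys are prefixed with the collection name, e.g. test/red_fort.jpg.
resp = s3.list_objects_v2(Bucket="gridfs")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])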