How to create a cheap, easy and simple AWS S3 Backup strategy

Amazon S3 offers several features such as object lifecycle and versioning, but there isn’t an out-of-the-box solution for cases where you just want to save an exact copy of objects that will never change (e.g. user generated content, files that

This is for you if…

  • Your objects never change, and you want to back up exact copies of them.
  • Having to wait for a few hours in case you need to recover a backup is not a problem for you.
  • You are familiar with S3, EC2 and the AWS CLI and comfortable using them.

This *might* not be for you if …

  • Your files change over time (i.e. you create versions of them).
  • You need a restore from backup to be quick.
  • You would rather stay away from unix commands and/or prefer a GUI (drag & drop backup/restore).

 

My Scenario

I’ve got about 2 TB of data in an Amazon S3 bucket that I’ve migrated from Rackspace (more on this in a future post) and I need to back it up… in case s**t goes wrong.

 

My solution

PART ONE – Our Backup Bucket

  1. We need to create a separate S3 bucket, where we will copy every new object from our main bucket. To follow convention in this tutorial, we’ll name it mybucket-backup, so we now have two S3 buckets: mybucket and mybucket-backup.
  2. The next step is to enable versioning and configure the lifecycle policy to use Glacier on the backup bucket:
    1. Go into the bucket and show its properties
    2. Click on Versioning, and then Enable Versioning
    3. Now go to the Lifecycle tab and click Add Rule; a dialog will open.
      We want the following settings:

      1. Apply the rule to: Whole Bucket
      2. Actions on objects:
        Tick the option that archives objects to Glacier, and set it to 1 day after creation.
        Untick everything else.
      3. Rule Name: Backup, then click Create
        If we did everything right, we should see our new rule listed:
        [Screenshot: the new lifecycle rule]
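
If you prefer the AWS CLI to the console, the steps above could also be scripted. The following is a minimal sketch rather than a tested procedure: setup_backup_bucket is a hypothetical helper name, and the rule it writes (ID Backup, empty prefix for the whole bucket, transition to GLACIER after 1 day) is an assumption based on the settings described in this post. It also assumes the AWS CLI is installed and configured with credentials that are allowed to create and configure buckets.

```shell
# Sketch of Part One with the AWS CLI. The function name is hypothetical.
# $1 = name of the backup bucket, e.g. mybucket-backup
setup_backup_bucket() {
  bucket="$1"
  rules_file="$(mktemp)"   # temporary file that holds the lifecycle rule

  # Lifecycle rule (assumed): move every object to Glacier 1 day after creation.
  cat > "$rules_file" <<'EOF'
{
  "Rules": [
    {
      "ID": "Backup",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [ { "Days": 1, "StorageClass": "GLACIER" } ]
    }
  ]
}
EOF

  # Create the bucket, enable versioning, then attach the lifecycle rule.
  # Each step only runs if the previous one succeeded.
  aws s3 mb "s3://$bucket" &&
  aws s3api put-bucket-versioning --bucket "$bucket" \
      --versioning-configuration Status=Enabled &&
  aws s3api put-bucket-lifecycle-configuration --bucket "$bucket" \
      --lifecycle-configuration "file://$rules_file"
  status=$?

  rm -f "$rules_file"
  return "$status"
}
```

Calling setup_backup_bucket mybucket-backup would then replace the console clicks. The console remains the easier way to double-check the result.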

PART TWO – Our EC2 instance

Now we are going to create a background task on a Free Tier EC2 machine that will keep both mybucket and mybucket-backup synchronised.

  1. Spin up an EC2 instance from your console; a t2.micro will be enough.
  2. Connect to your instance through SSH
  3. Edit your crontab:
    crontab -e
  4. Add the following line and save the file
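
A crontab entry along these lines would do the job. Treat it as a sketch under assumptions: the schedule (minute 0 of every second hour) is chosen to match the 2-hour interval described in this post, aws s3 sync is one way of copying new objects between the two buckets, and CRON_LINE and install_backup_cron are hypothetical names. It assumes the AWS CLI is installed on the instance with credentials that can read mybucket and write to mybucket-backup.

```shell
# The cron entry (an assumption): run at minute 0 of every second hour and
# copy objects that are new in mybucket over to mybucket-backup.
CRON_LINE='0 */2 * * * aws s3 sync s3://mybucket s3://mybucket-backup'

# Alternative to pasting the line via `crontab -e`:
# append it to the current user's crontab, keeping any existing entries.
install_backup_cron() {
  { crontab -l 2>/dev/null; printf '%s\n' "$CRON_LINE"; } | crontab -
}
```

The value of CRON_LINE is the line to paste when using crontab -e. Running install_backup_cron twice would add the entry twice, so run it once. Without the --delete flag, aws s3 sync does not remove objects from the destination, which is what we want for a backup. Cron runs with a minimal PATH, so you may need the full path to the aws binary (check with which aws).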

We’re done!

How it works – Our backup strategy

All the changes we make to mybucket will be replicated to mybucket-backup every 2 hours.
Meanwhile, every file on mybucket-backup that’s older than 1 day will be stored using Amazon Glacier, which reduces our backup storage costs.
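
Because the backup copies end up in Glacier, getting one back takes an extra step: the object has to be restored before it can be read. Here is a rough sketch of how that might look with the AWS CLI. The helper names request_restore and restore_status, the Standard retrieval tier and the 7-day default are all assumptions for illustration, not part of the setup above.

```shell
# Hypothetical helpers for getting one object back out of Glacier.

# Ask S3 to make a temporary, readable copy of an archived object.
# $1 = bucket, $2 = object key, $3 = days to keep the restored copy (default 7)
request_restore() {
  aws s3api restore-object --bucket "$1" --key "$2" \
      --restore-request "{\"Days\": ${3:-7}, \"GlacierJobParameters\": {\"Tier\": \"Standard\"}}"
}

# Print the object's restore status.
# The value contains ongoing-request="false" once the copy is ready to download.
# $1 = bucket, $2 = object key
restore_status() {
  aws s3api head-object --bucket "$1" --key "$2" --query Restore --output text
}
```

For example, request_restore mybucket-backup path/to/file.jpg followed later by restore_status mybucket-backup path/to/file.jpg. A Standard restore usually takes a few hours, which is the waiting time mentioned at the top of this post.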

Feel free to reach out to me if you have any questions or problems!
