Using Amazon EC2/EBS/S3 for Automated Backups

I won't go into a long-winded argument about why you should back up your data. If you're reading this, you know it's a good idea and you'd like to learn how to push your data off site into the cloud for as little as $0.11 per GB per month. Note that I'm quoting the European pricing here; the US pricing is slightly cheaper.

In essence, this post shows you how to:

  1. Create a persistent Amazon EC2 volume (an EBS volume) of your desired size
  2. Start up an Amazon EC2 cloud machine
  3. Mount your persistent volume to this temporary machine
  4. rsync your data over to the machine
  5. Shutdown the machine

All of this can be done more or less with a single script that you can schedule to run at any time. It will cost you around $0.11 per hour whilst the machine is running for your upload and then $0.11 per GB per month for storage.

Let's say you run the backup script once a day, so you're charged the minimum of one hour of use per day. Let's also say you upload around 10GB in a month and already have 50GB stored. Your charges would be:

  • 30 hours of small EC2 instance: $3.30
  • 50GB of storage for the month: $5.50
  • 10GB of uploads for the month: $1.00
  • Total: $9.80
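
If you want to plug in your own numbers, the sum above can be sketched as a small shell function. This is only a rough estimate using the figures quoted in this post; the function name is made up for illustration, and the $0.10 per GB upload rate is an assumption inferred from the "$1.00 for 10GB" line, so check current pricing before relying on it.

```shell
#!/bin/bash
# monthly_cost HOURS STORED_GB UPLOADED_GB
# Hypothetical helper: estimates the monthly bill from the rates in this post.
monthly_cost() {
   awk -v h="$1" -v s="$2" -v u="$3" 'BEGIN {
      instance = h * 0.11   # small instance, per hour
      storage  = s * 0.11   # volume storage, per GB per month
      upload   = u * 0.10   # uploads, per GB (assumed rate)
      printf "%.2f\n", instance + storage + upload
   }'
}

# The example from this post: 30 hours, 50GB stored, 10GB uploaded
monthly_cost 30 50 10   # prints 9.80
```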

So for less than $10 a month you can back up and store 50GB in a nice, safe chunk of cloud storage. Not bad.

Getting started with Amazon EC2

To even attempt this solution, you'll need to sign up with Amazon EC2 and get to grips with the basics. A good starting point is the latest Amazon EC2 Getting Started Guide.

The part of the getting started guide that we're most interested in is getting your EC2 account set up and the API tools installed.

You can get the Amazon EC2 API Tools here.

Whilst working out how Amazon EC2 works, it wouldn't hurt to have a play with their web-based console, which lets you work with the various parts of EC2 in a nice web interface: Amazon EC2 Console. It is by far the easiest way to get an understanding of the service and see what is going on with your account.

The 2 bits of security information you really need to get started are:

  1. Certificate private key: pk-XXXXXXXXXXXXXXXXXXX.pem
  2. Certificate public key: cert-XXXXXXXXXXXXXXXXXXX.pem

These identify you when accessing the Amazon EC2 API. You can generate them under the "Your Account" -> "Security Credentials" section of the Amazon Web Services website.

Creating your backup volume

The backup volume is your permanent storage space where your backups will go. You can create this on the Amazon EC2 console quite easily.

On my account I've got a 30GB volume with ID "vol-2e0aef47". One of the annoying things about EC2 is that you can't really attach any metadata to volumes, snapshots, instances and so on, so without a human-readable description it can be hard to tell which item is which. Keep track of your IDs.

Once you've got this volume, you don't need to create it each time. It will just sit there forever and store your data for $0.11 per GB per month.

SSH keypair

So that you can connect to any EC2 instances you start up, you must create an SSH keypair and let EC2 know about it. You can do this through the EC2 web console under the "Key Pairs" section. You'll need the key file on your machine so you can log on later. You'll also need to know the keypair's name, because you can have many keypairs on your account and EC2 has to assign the right one to your instance when it starts up.
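
One detail that is easy to miss: ssh refuses to use a private key file that other users can read, so it's worth locking the downloaded key down before the first run. Here is a minimal sketch of that step. The helper name `lock_down_key` is made up for illustration, and the demo works on a throwaway temp file rather than a real key.

```shell
#!/bin/bash
# lock_down_key is a hypothetical helper, not part of the EC2 tools.
# It restricts a private key file to owner read/write only and prints
# the resulting permission string.
lock_down_key() {
   chmod 600 "$1" || return 1
   ls -l "$1" | cut -c1-10
}

# Demo on a throwaway file; for real use, pass the path to your downloaded .pem
demo_key=$(mktemp)
lock_down_key "$demo_key"   # prints -rw-------
rm -f "$demo_key"
```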

Creating your EC2 instance to upload and mount the volume

Now we are onto the command line as we want to be able to script the startup and shutdown of your EC2 instance so that you can send your data to it automatically.

Below is the script that I've cobbled together from other bits on the web and a bit of experimentation. Instead of explaining each bit in turn I've commented it liberally to help explain what is going on. Take a read and we'll continue in a moment.

#!/bin/bash
### Set up the EC2 tools
export EC2_HOME=~/tools/ec2-api-tools-1.3-41620
export PATH=$EC2_HOME/bin:$PATH
### Location of your account security credentials
export EC2_PRIVATE_KEY=~/Documents/pk-XXXXXXXXXXXXXXXXXXX.pem
export EC2_CERT=~/Documents/cert-XXXXXXXXXXXXXXXXXXX.pem
### Override the default US EC2 location if you want to use the European data centre
export EC2_URL=https://eu-west-1.ec2.amazonaws.com
### Java location
export JAVA_HOME=/usr/lib/jvm/java-6-sun/jre/

### Machine image you want to use as the base for the machine you want to start up, more on this later.
export amiid="ami-0db89079"
### SSH key to set the machine up with. In the EC2 console you need to set up an SSH key that you can connect to your new machine with, as by default they do not allow access by any other means. This is my own key name on the console.
export key="ec2-backups"
### Where to launch your machine. Europe for me.
export zone="eu-west-1a"
### Local SSH key to connect to machine with. Location of the actual SSH key that you also put in the EC2 console.
export id_file="/home/kieran/ec2-backups.pem"
### Volume to mount to machine. The volume that we've previously created.
export vol_name="vol-2e0aef47"
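### Where the volume is mounted on our new machine. Kept here for reference: the ssh, rsync and umount commands below use this same path directly.
export mount_point="/mnt/data-store"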
### Device name for the mount
export device_name="/dev/sdf"
### Security group. To help me identify my machine, I use security groups as EC2 doesn't have real instance labels.
export group="backups"

#
# Start the instance
# Capture the output so that
# we can grab the INSTANCE ID field
# and use it to determine when
# the instance is running
#
echo Launching AMI ${amiid}
${EC2_HOME}/bin/ec2-run-instances ${amiid} -z ${zone} -k ${key} --group ${group}  > /tmp/a
if [ $? != 0 ]; then
   echo Error starting instance for image ${amiid}
   exit 1
fi
export iid=`cat /tmp/a | grep INSTANCE | cut -f2`

#
# Loop until the status changes to 'running'
#
sleep 30
echo Starting instance ${iid}
export RUNNING="running"
export done="false"
while [ $done == "false" ]
do
   export status=`${EC2_HOME}/bin/ec2-describe-instances ${iid} | grep INSTANCE | cut -f6`
   ### Quote both sides so the test doesn't break while the status is still empty
   if [ "$status" == "${RUNNING}" ]; then
      export done="true"
   else
      echo Waiting...
      sleep 10
   fi
done
echo Instance ${iid} is running

#
# Attach the volume to the running instance
#
echo Attaching volume ${vol_name}
${EC2_HOME}/bin/ec2-attach-volume ${vol_name} -i ${iid} -d ${device_name}
sleep 15

#
# Loop until the volume status changes
# to 'attached'
#
export ATTACHED="attached"
export done="false"
while [ $done == "false" ]
do
   export status=`${EC2_HOME}/bin/ec2-describe-volumes | grep ATTACHMENT | grep ${iid} | cut -f5`
   ### Quote both sides so the test doesn't break while the status is still empty
   if [ "$status" == "${ATTACHED}" ]; then
      export done="true"
   else
      echo Waiting...
      sleep 10
   fi
done
echo Volume ${vol_name} is attached

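### Look up the public hostname of the new instance: split the tab-separated fields onto separate lines and keep the one containing amazonaws.com
export EC2_HOST=`ec2-describe-instances | grep ${iid} | tr '\t' '\n' | grep amazonaws.com`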

### Important trick here. Because you will be starting up a different machine every time you run this script, ssh would normally stop and ask you to accept the new host key. The StrictHostKeyChecking option below makes sure this doesn't happen, so you can run this completely automated without human interaction.
### This line logs on and mounts our volume to our machine.
ssh -i ${id_file} -o "StrictHostKeyChecking no" root@$EC2_HOST "mkdir -p /mnt/data-store && mount ${device_name} /mnt/data-store"

### Run rsync, whatever options you'd like, here are a couple of examples I use.
rsync -e "ssh -i ${id_file}" -avRz --exclude-from=exclude-from-backup.txt --include-from=include-in-backup.txt /home/kieran/ root@$EC2_HOST:/mnt/data-store/kierans-laptop/

#rsync -e "ssh -i ${id_file}" -avRz --exclude-from=exclude-from-backup.txt --include-from=include-in-backup.txt /home/kieran/ root@$EC2_HOST:/mnt/data-store/kierans-laptop/ > kieranhomebackuplog.txt

#rsync -e "ssh -i ${id_file}" -avRz --exclude-from=exclude-from-backup.txt --include-from=include-in-backup.txt /linkstation/ root@$EC2_HOST:/mnt/data-store/kierans-laptop/
#rsync -e "ssh -i ${id_file}" -avRz --exclude-from=exclude-from-backup.txt --include-from=include-in-backup.txt /linkstation/ root@$EC2_HOST:/mnt/data-store/kierans-laptop/ > linkstationbackuplog.txt

### Clean up. Disconnect the volume
ssh -i ${id_file} root@$EC2_HOST "umount /mnt/data-store"
### Detach volume from machine
ec2-detach-volume ${vol_name} -i ${iid}
### Shutdown instance
ec2-terminate-instances ${iid}
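
Two things the script above doesn't guard against: the polling loops wait forever if the instance or volume never reaches the expected state, and if a step fails halfway through, the instance keeps running and keeps costing money. Below is a minimal sketch of how both could be handled. The helper names (`wait_for_state`, `terminate_instance`, `run_with_cleanup`) are my own assumptions, not part of the EC2 API tools, and the clean-up here only echoes instead of calling ec2-terminate-instances.

```shell
#!/bin/bash
# Both helpers below are hypothetical additions, not part of the EC2 API tools.

# wait_for_state EXPECTED MAX_TRIES DELAY COMMAND [ARGS...]
# Runs COMMAND until its output equals EXPECTED; gives up after MAX_TRIES.
wait_for_state() {
   expected=$1; max_tries=$2; delay=$3
   shift 3
   tries=0
   while [ "$tries" -lt "$max_tries" ]
   do
      status=$("$@")
      if [ "$status" = "$expected" ]; then
         return 0
      fi
      tries=$((tries + 1))
      echo "Waiting... (attempt ${tries} of ${max_tries})"
      sleep "$delay"
   done
   return 1
}

# Stand-in for the real clean-up; in the backup script this would run
#   ec2-terminate-instances ${iid}
terminate_instance() {
   echo "Terminating instance $1"
}

# run_with_cleanup INSTANCE_ID COMMAND [ARGS...]
# Runs COMMAND in a subshell whose EXIT trap always calls terminate_instance.
run_with_cleanup() {
   (
      cleanup_iid=$1
      shift
      trap 'terminate_instance "$cleanup_iid"' EXIT
      "$@"
      exit $?   # explicit exit so the trap fires after the command returns
   )
}

# Demo with a stand-in command: the "backup" step fails, the clean-up still runs
run_with_cleanup i-91d837e6 false || echo "Backup step failed"
```

In the real script, the first loop could become a call like `wait_for_state running 30 10 some_status_command`, where the status command wraps the same ec2-describe-instances pipeline the loop already uses. The limits of 30 tries and 10 seconds are arbitrary examples.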

Output

kieran@kieran-laptop:~/Backup$ ./run-ec2-backup.sh
Launching AMI ami-0db89079
Starting instance i-91d837e6
Instance i-91d837e6 is running
Attaching volume vol-2e0aef47
ATTACHMENT    vol-2e0aef47    i-91d837e6    /dev/sdf    attaching    2009-10-22T09:45:18+0000
Volume vol-2e0aef47 is attached
Warning: Permanently added 'ec2-79-125-42-121.eu-west-1.compute.amazonaws.com,79.125.42.121' (RSA) to the list of known hosts.
sending incremental file list
/home/kieran/
/home/kieran/Pictures/
/home/kieran/Pictures/blogs/
/home/kieran/Pictures/blogs/ec2backup/
/home/kieran/Pictures/blogs/ec2backup/createdvolume.png
/home/kieran/Pictures/blogs/ec2backup/createvolume.png
ATTACHMENT    vol-2e0aef47    i-91d837e6    /dev/sdf    detaching    2009-10-22T09:45:18+0000
INSTANCE    i-91d837e6    running    shutting-down
kieran@kieran-laptop:~/Backup$
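
The script picks values out of the tools' output with grep and cut, which relies on that output being tab-separated. As a small illustration, the snippet below feeds the first ATTACHMENT line from the output above through the same pipeline the attach loop uses. It assumes the fields are separated by single tabs, as the `cut -f5` in the script implies; the line is rebuilt with printf here rather than fetched from EC2, and the function name is made up.

```shell
#!/bin/bash
# Rebuild the ATTACHMENT line from the sample output with tab separators
# (assumption: cut splits on tabs by default, which the script relies on).
sample_line=$(printf 'ATTACHMENT\tvol-2e0aef47\ti-91d837e6\t/dev/sdf\tattaching\t2009-10-22T09:45:18+0000')

# Same extraction as the attach loop: field 5 is the attachment status
attachment_status() {
   grep ATTACHMENT | grep "$1" | cut -f5
}

echo "$sample_line" | attachment_status i-91d837e6   # prints attaching
```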

So there you have it, that should just about do it. One of the important bits in there is the use of an AMI. An AMI is a base image for a machine that you want to start up. I've used an unaltered Ubuntu 9.04 image that someone else has set up and made publicly available for other EC2 users. You may want to customise your image and take a copy yourself to protect against someone getting rid of that public image.

Ongoing maintenance

Although in theory your volume will stay there forever, nicely backed up, you may not trust Amazon completely. Handily, you can easily take snapshots of your volume and back them up to Amazon S3. You can even take that snapshot and create a new volume from it to get a handily accessible copy of your data.

You may also want to resize your volume as the size you choose at the start might be too small. To do this, take a snapshot and create a new volume of a larger size from that snapshot. You can then just alter your backup script to use this new volume.
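
The snapshot and resize steps can also be done from the command line with the same API tools. The sketch below only prints the commands unless you set DRY_RUN=0, so you can check them first. The new size of 60GB and the snapshot ID are placeholders, and the `run` wrapper is a made-up helper; substitute the snapshot ID that ec2-create-snapshot reports for your account.

```shell
#!/bin/bash
# Sketch only: with DRY_RUN=1 (the default here) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
run() {
   if [ "$DRY_RUN" = "1" ]; then
      echo "would run: $*"
   else
      "$@"
   fi
}

vol_name="vol-2e0aef47"      # existing backup volume, as in the script above
zone="eu-west-1a"            # same zone you launch your instances in
new_size_gb=60               # hypothetical new size, pick your own
snapshot_id="snap-XXXXXXXX"  # placeholder: use the ID printed by ec2-create-snapshot

# 1. Snapshot the existing volume
run ec2-create-snapshot "$vol_name"

# 2. Create a larger volume from that snapshot
run ec2-create-volume --snapshot "$snapshot_id" --size "$new_size_gb" -z "$zone"
```

As far as I know, the filesystem on the new volume stays at its old size until you grow it (for ext3, with resize2fs), so treat that as a further step to check before relying on the extra space.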

There are no doubt lots of refinements to this script you could come up with, so if you have any ideas or comments, please leave a comment below.