I maintain several personal websites, and felt they should be backed up in case my Xen host has issues.
Step 1
Create an Amazon S3 account at http://aws.amazon.com/s3/. Once your account is created, you will need to create 'credentials', which allow us to authenticate with S3. Go to "Amazon -> Account -> AWS Identity and Access Management", click 'Security Credentials' on the left, and create an 'Access Key'. These keys are composed of two parts: a public portion, called the 'Access Key ID', and a private portion (never to be shared) called the 'Secret Access Key'.
Step 2
We need to install a program called ‘s3cmd’. This will allow us to interface with Amazon S3 via the command line. On Ubuntu:
sudo apt-get install s3cmd
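On other distributions the package is usually also named 's3cmd'. Either way, you can confirm the install worked (and see which version you got) with:

s3cmd --version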
Step 3
Now we need to configure s3cmd so it saves the settings for our setup. Make sure you have the Access Key ID and the Secret Access Key handy. Run the following command to get started:
s3cmd --configure
From here you will get an interactive prompt:
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3
Access Key: 231231232
Secret Key: 213123123

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: ubuntu
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP and can't be used if you're behind a proxy
Use HTTPS protocol [No]: yes

New settings:
  Access Key: 231231232
  Secret Key: 213123123
  Encryption password: ubuntu
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n]
I chose to enable "Use HTTPS protocol", which uploads over an encrypted connection. This is a good idea, although it will slightly impact performance and may use slightly more traffic. In addition, s3cmd can encrypt the files themselves using gpg (pass the -e/--encrypt option to 'put'), which means that even if someone broke into your S3 account, they would still need that passphrase to decrypt your data.
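For reference, your answers end up in a plain-text config file, ~/.s3cfg. A trimmed example of the relevant fields (the values here are placeholders, and the exact field names can vary slightly between s3cmd versions):

[default]
access_key = 231231232
secret_key = 213123123
gpg_command = /usr/bin/gpg
gpg_passphrase = ubuntu
use_https = True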
Step 4
We can now test s3cmd by uploading a file. First you will need to create a 'bucket', which is where the files for this project will be stored. You can have many buckets, so if you want to separate your projects you can create an additional one for each. Bucket names are global across all of S3, so you will want to pick something not likely to be taken:
s3cmd mb s3://sharms.org-wordpress-blog
If that command runs successfully, we now have a new bucket called ‘sharms.org-wordpress-blog’. If not, pick a different name and try again. Now we can test uploading a file:
s3cmd put /home/sharms/testfile.txt s3://sharms.org-wordpress-blog

# Verify it's where we think it is
s3cmd ls s3://sharms.org-wordpress-blog
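To be confident a backup is actually recoverable, it is worth pulling a file back down once. A quick round trip (the local file name here is arbitrary):

s3cmd get s3://sharms.org-wordpress-blog/testfile.txt restored-testfile.txt
diff /home/sharms/testfile.txt restored-testfile.txt && echo "Round trip OK"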
Step 5
Using bash, we can automate this and back up all of our files daily, weekly, monthly, etc. Here is an example, which I put at '/usr/local/bin/backup_blog_to_s3.sh':
#!/bin/bash
bucket="s3://sharms.org-wordpress-blog"

logger -t backup_blog_to_s3.sh "Backing up sharms.org blog to S3"
# Archive and compress the blog directory, then upload it
cd /var/www
tar -cf sharms.org.tar blog
bzip2 -9 sharms.org.tar
s3cmd put sharms.org.tar.bz2 ${bucket}
rm /var/www/sharms.org.tar.bz2

logger -t backup_blog_to_s3.sh "Backing up MySQL database to S3"
# Dump the database to a file, compress it, then upload it
mysqldump -u databaseuser -pdatabasepassword -r sharms-wordpress.sql sharms-wordpress
bzip2 -9 sharms-wordpress.sql
s3cmd put sharms-wordpress.sql.bz2 ${bucket}
rm sharms-wordpress.sql.bz2
You can see from the example that we back up all of the files in the 'blog' directory and export all of our data out of the MySQL database. You can even change the file names so they include the date they were backed up:
tar -cf sharms.org-wordpress-$(date +%d%m%y).tar blog
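For completeness, restoring is just the reverse. Here is a minimal sketch, assuming the fixed file names from the script above, an existing 'sharms-wordpress' database, and that the uploads were not gpg-encrypted:

# Fetch and unpack the site files
s3cmd get s3://sharms.org-wordpress-blog/sharms.org.tar.bz2
bunzip2 sharms.org.tar.bz2
tar -xf sharms.org.tar -C /var/www

# Fetch, decompress and import the database dump
s3cmd get s3://sharms.org-wordpress-blog/sharms-wordpress.sql.bz2
bunzip2 sharms-wordpress.sql.bz2
mysql -u databaseuser -p sharms-wordpress < sharms-wordpress.sql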
Running Automatically
If we want to back up the system every day, this is very easy. One caveat: run-parts, which executes the scripts in /etc/cron.daily, skips file names containing a dot, so we drop the '.sh' extension when copying:
sudo cp /usr/local/bin/backup_blog_to_s3.sh /etc/cron.daily/backup_blog_to_s3
sudo chmod 755 /etc/cron.daily/backup_blog_to_s3
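Because the script logs via logger, you can check afterwards that the nightly run actually happened (on Ubuntu these messages end up in /var/log/syslog):

grep backup_blog_to_s3 /var/log/syslog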
Security Notes
When considering this setup, you are most vulnerable to someone gaining access to your server and getting your Amazon keys. You can always revoke them from the Amazon Web Services control panel, but you don't want an attacker using your S3 account for nefarious means. Beyond the scope of this document, you could set up a dedicated user called 'backups' and give the config file '~backups/.s3cfg' permissions '600', to stop other users from reading its contents.
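A rough sketch of that hardening step, using the 'backups' user name suggested above (adjust to taste):

# Create a dedicated, unprivileged user for backups
sudo adduser backups

# Configure s3cmd as that user, then lock the config file down
sudo -H -u backups s3cmd --configure
sudo chmod 600 ~backups/.s3cfg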