15 Feb

Backups, Automated and Off Site

One of the biggest issues in running a server[1] is making sure that, if everything disappears, you can be up and running again as quickly as possible. So how do I do it?

The simple answer: I use a cron job that runs every day, takes daily, weekly and monthly database and file system backups, and pushes them to Amazon S3. I rolled my own bash script to perform the backups, and after a few months of testing and improving, it’s ready to be shown off.

The script is extremely simple:

  1. Import config settings from a file
  2. Dump MySQL Databases, gzip and move the file to your backup folder
  3. Dump PostgreSQL Databases, gzip and move the file to your backup folder
  4. Dump MongoDB Databases, gzip and move the file to your backup folder
  5. Tar and gzip the local webroot and move the file to your backup folder
  6. Delete daily backup files older than 7 days from the backup folder
  7. If Monday
    1. Copy just created database and webroot backups to be weekly backups
    2. Delete weekly backup files older than 28 days from the backup folder
  8. If First of Month
    1. Copy just created database and webroot backups to be monthly backups
    2. Delete monthly backup files older than 365 days from the backup folder
  9. Use S3 Tools to essentially rsync the backup folder with an Amazon S3 Bucket
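The rotation rules in steps 6 to 8 can be sketched in a few lines. This is only an illustration in Python, not the bash script itself: the names `RETENTION_DAYS`, `tiers_for` and `is_expired` are made up for the example, and "older than N days" is assumed to mean strictly more than N days old.

```python
from datetime import date, timedelta
from typing import List

# Retention windows in days, taken from the list above.
RETENTION_DAYS = {"daily": 7, "weekly": 28, "monthly": 365}


def tiers_for(day: date) -> List[str]:
    """Return the backup tiers a backup created on `day` is copied into."""
    tiers = ["daily"]
    if day.weekday() == 0:  # Monday
        tiers.append("weekly")
    if day.day == 1:  # first of the month
        tiers.append("monthly")
    return tiers


def is_expired(tier: str, created: date, today: date) -> bool:
    """True if a backup in `tier` is older than that tier's retention window."""
    return (today - created) > timedelta(days=RETENTION_DAYS[tier])
```

A backup taken on a Monday that is also the first of the month lands in all three tiers, so `tiers_for(date(2021, 2, 1))` returns `["daily", "weekly", "monthly"]`.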

It’s clean, quick and, above all, has worked without fail for several months now. The slowest part of the process is uploading the files to S3, and even that has never taken long. It also follows the mantra from my earlier post: “tar it, then sync”.
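To make “tar it, then sync” concrete, here is a rough Python sketch rather than the script itself. It assumes the `s3cmd` command-line client for the sync step; the function names and the bucket URL in the usage note are hypothetical. The sync command is only built, not run, so nothing is uploaded by accident.

```python
import tarfile
from pathlib import Path
from typing import List


def archive_tree(source_dir: str, archive_path: str) -> str:
    """Pack `source_dir` into a single gzip-compressed tarball."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the tree's own name.
        tar.add(str(source), arcname=source.name)
    return archive_path


def s3_sync_command(backup_dir: str, bucket_url: str) -> List[str]:
    """Build (but do not run) an s3cmd call that mirrors the backup folder."""
    # The trailing slash syncs the folder's contents rather than the folder itself.
    return ["s3cmd", "sync", backup_dir.rstrip("/") + "/", bucket_url]
```

In use, you would call `archive_tree("/var/www", "/backups/webroot.tar.gz")` and then hand the result of `s3_sync_command("/backups", "s3://example-backups/")` to `subprocess.run`. The point is that one large archive goes over the wire instead of thousands of small files.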

This method is simple and it seems to work great for most single server setups. I haven’t optimized the database dumps, mainly because that is highly dependent upon your particular use of each. If you have multiple servers or separate database and web servers, why are you taking sys admin advice from me?

It’s available on GitHub: S3_Backup


  1. I use a virtual host from Linode for this site and a few others; they are great.

15 Feb

Incubaid Research – Rediscovering the RSync Algorithm

Don’t walk the folder and ‘rsync’ each file you encounter. A small calculation will show you how bad it really is.

Suppose you have 20000 files, each 1KB. Suppose 1 rsync costs you about 0.1s (reading the file, sending over the signature, building the stream of updates, applying them). This costs you about 2000s or more than half an hour.

System administrators know better; they would not hesitate: “tar the tree, sync the tars, and untar the synced tar”.

Suppose each of the actions takes 5s (overestimating); you’re still synced in 15s.
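The arithmetic in the quote is easy to check. The sketch below just restates its numbers (20000 files, 0.1s per rsync, three steps of 5s each); the function names are invented for the example, and the per-file cost is given in milliseconds to keep the sums exact.

```python
def per_file_seconds(files: int, cost_ms: int) -> float:
    """Total time when every file gets its own rsync call."""
    return files * cost_ms / 1000


def tar_sync_seconds(step_seconds: int, steps: int = 3) -> int:
    """Total time for tar, sync and untar, each taking `step_seconds`."""
    return steps * step_seconds


slow = per_file_seconds(20_000, 100)  # 0.1s per file
fast = tar_sync_seconds(5)

print(f"per-file rsync: {slow:.0f}s (about {slow / 60:.0f} minutes)")
print(f"tar, sync, untar: {fast}s")
print(f"ratio: roughly {slow / fast:.0f}x")
```

The estimates are rough, but even if each tar step took ten times longer, the single-archive approach would still come out well ahead.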

via Incubaid Research – Rediscovering the RSync Algorithm. The right way to synch two remote file systems.