Warning: This article is technical, but it pertains to something I have been working on for the past week and is pretty important: the backing up of data on the network.

You may have heard it said before, but it bears repeating: it is not a matter of "if" your hard drive or computer is going to crash but "when". If you care about the data on your computer, you should be prepared. Our file/print/intranet server has four 250 GB hard drives in a RAID-5 configuration, so that if one drive fails the other drives pick up the slack. It works; I have experienced a failed drive in a three-drive array. Because of the redundant parity data, the drives actually store about 650 GB rather than the full terabyte. The data is backed up to another four-drive, one-terabyte RAID-5 server that is remotely located. There are many backup strategies, but today I am just going to cover what I do with a free, open source program called rsync.

From the rsync web site:

rsync is a file transfer program for Unix systems. rsync uses the "rsync algorithm" which provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand.

Some features of rsync include

  • can update whole directory trees and filesystems
  • optionally preserves symbolic links, hard links, file ownership, permissions, devices and times
  • requires no special privileges to install
  • internal pipelining reduces latency for multiple files
  • can use rsh, ssh or direct sockets as the transport
  • supports anonymous rsync which is ideal for mirroring

This command-line utility is very powerful and relatively easy to use. The following command will back up one directory to another: rsync -av /src/foo /dest. Useful daily incremental backups, however, require more options. There are some very complex scripts written around rsync, but with rsync's --backup option the following short script will back up multiple directories to a remote server via ssh (a secure connection), retain 29-31 days of incremental backups (depending on the month), store information in a log, and email the network admin when the job is complete. The first backup may take a while, but the following ones will be much faster since only the items that have changed will be transferred.
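To see what the backup option does before pointing it at a real server, here is a minimal local sketch. It assumes rsync is installed and works entirely inside a temporary folder; the directory names (src, dest, archive) and the file notes.txt are made up for illustration and do not appear in my setup.

```shell
#!/bin/bash
# Local demonstration of --backup / --backup-dir (no network needed).
# All paths and file names here are hypothetical.
WORK=$(mktemp -d)
STAMP=$(date +%d)                 # day of month, e.g. 07
mkdir -p "$WORK/src" "$WORK/dest" "$WORK/archive"

if command -v rsync >/dev/null 2>&1; then
    echo "version 1" > "$WORK/src/notes.txt"
    # First run: a plain copy, there is nothing to back up yet.
    rsync -a --delete --backup --backup-dir="$WORK/archive/$STAMP" "$WORK/src/" "$WORK/dest/"

    echo "version 2, now longer" > "$WORK/src/notes.txt"
    # Second run: the old copy in dest is moved into archive/$STAMP
    # before the new version takes its place.
    rsync -a --delete --backup --backup-dir="$WORK/archive/$STAMP" "$WORK/src/" "$WORK/dest/"

    cat "$WORK/dest/notes.txt"              # current version
    cat "$WORK/archive/$STAMP/notes.txt"    # previous version
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

The destination always holds the latest copy, while the dated directory collects whatever was replaced or deleted on that day.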

Here is the script with comments. You'll want to schedule it via cron, the standard Unix job scheduler. Here is an article on how to generate and share an SSH key.

#!/bin/bash
# usage: backup.sh [ -d ]
BACKUPS=user@domainname:/c/archive
TIME_STAMP=$(date +%d)   # day of the month, 01-31
RSYNC_OPTS="-avz --timeout=600 --force --ignore-errors --delete --backup --backup-dir=/c/archive/$TIME_STAMP"
DEBUG=0
LOG_FILE=/var/log/your_rsync.log     ## Keeps a copy in /var/log
TMP_LOG_FILE=/tmp/your-tmp-rsync.log  ## Mails the current session's log
rm -f $TMP_LOG_FILE   # -f: no error if the file does not exist yet

CURRENT_DATE=`date`

# this option allows me to easily do a test run when I make changes
if [ "x$1" == "x-d" ]; then DEBUG=1; fi
if [ $DEBUG -eq 1 ]; then RSYNC_OPTS="$RSYNC_OPTS --dry-run"; fi

# the following lines clear last month's incremental directory for this day
[ -d $HOME/emptydir ] || mkdir $HOME/emptydir
rsync --delete -aq -e "ssh -i /your-ssh-key" $HOME/emptydir/ $BACKUPS/$TIME_STAMP
rmdir $HOME/emptydir

#Clearly shows me in the log the start of each backup
echo "Starting Backup of your server on : $CURRENT_DATE" >> $TMP_LOG_FILE

#this is a for loop to go through each of the directories I want backed up
for DATA in /etc /var/flexshare /var/www /home
do
rsync $RSYNC_OPTS -e "ssh -i /your-ssh-key" $DATA $BACKUPS >>$TMP_LOG_FILE 2>&1
done

#separate lines for the mysql database backup, because mysql needs to be stopped
#This step backs up the entire database
/etc/init.d/mysqld stop
rsync $RSYNC_OPTS -e "ssh -i /your-ssh-key" /var/lib/mysql $BACKUPS >>$TMP_LOG_FILE 2>&1
/etc/init.d/mysqld start

#appends the log for the current session to the main log file. 
cat $TMP_LOG_FILE >> $LOG_FILE
#mails the log for the current session (options go before the address)
mail -s "Rsync Backup Results" you@domainname.com < $TMP_LOG_FILE

exit $?

I would have included the ssh command in the RSYNC_OPTS variable, but whenever I did, rsync complained that "-i" is not a known option. And yes, I did try various combinations of single quotes inside double quotes and vice versa, to no avail. However, the above solution works fine. Using ssh with a key keeps the connection secure and means you do not have to enter your password manually, so the job can be scheduled to run at any time of the day.
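A likely explanation for the "-i" error, offered as an assumption rather than something I verified on the server: when bash expands an unquoted variable, it splits the value on whitespace and treats any quote characters inside it as ordinary text, so "ssh -i /your-ssh-key" no longer reaches rsync as a single argument. A bash array keeps each option as its own word. The sketch below only counts arguments; count_args is a helper made up for illustration.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Options stored in a plain string: the inner quotes are just characters.
OPTS_STRING='-avz -e "ssh -i /your-ssh-key"'
# Options stored in an array: the quoted ssh command stays one element.
OPTS_ARRAY=(-avz -e "ssh -i /your-ssh-key")

STRING_COUNT=$(count_args $OPTS_STRING)          # unquoted on purpose
ARRAY_COUNT=$(count_args "${OPTS_ARRAY[@]}")

echo "string form: $STRING_COUNT arguments"
echo "array form:  $ARRAY_COUNT arguments"

# With the array, the rsync call in the loop could look like this:
# rsync "${OPTS_ARRAY[@]}" "$DATA" "$BACKUPS"
```

The string form is split into five words, with -i standing on its own, while the array form passes three, which is what rsync expects.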

This job takes only a few minutes to run after the initial backup (which took a few hours), and I run it nightly. If you are really paranoid, you could set it up to run every hour. If you do, change the time stamp to include the hour as well as the day, so that each run keeps its own incremental directory. You will end up with many more incremental backups, so make sure you have the free space.
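As a rough sketch of that change, the time stamp could combine the day and the hour. The format string and the cron line here are assumptions for illustration, and the script path /usr/local/bin/backup.sh is hypothetical. With this format the rotation grows from at most 31 directories to at most 31 x 24 = 744.

```shell
#!/bin/bash
# Day of month plus hour, e.g. 07-14 for 2 p.m. on the 7th.
TIME_STAMP=$(date +%d-%H)
echo "backup directory for this run: /c/archive/$TIME_STAMP"

# Upper bound on the number of incremental directories kept.
MAX_DIRS=$((31 * 24))
echo "at most $MAX_DIRS incremental directories"

# Hypothetical crontab entry that runs the script at the top of every hour
# (add it with "crontab -e"; the script path is an assumption):
CRON_ENTRY='0 * * * * /usr/local/bin/backup.sh'
echo "$CRON_ENTRY"
```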