Not too long ago I created my own automated backup script. Shortly afterwards, helpful people sent me links to other, more robust scripts that have been written. One of those was called Snapback2.
Snapback2 is a backup script based on rsync and hard-links. I could explain what that means, but why reinvent an already well invented wheel? Again?
My original script was alright, but it worked by making an exact copy of everything each time it ran. For my 4 GB home directory, four or five weekly backups over the course of a month would result in a backup directory that is 16-20 GB in size! That’s a lot of wasted space, especially when some files don’t change at all.
Snapback2 uses hard links and stores only what changed from one backup to the next. If I changed 30 MB worth of files, then the next backup will take up about 30 MB as well. If no changes were made, then almost no additional space is used. Clearly this method is superior to what I had written.
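To make the hard-link idea concrete, here is a minimal sketch of the general technique. It is not Snapback2's own code, and the directory and file names are made up for the demo. It assumes GNU cp and stat, which are on the prerequisite list anyway.

```shell
#!/bin/sh
# Hypothetical demo directory; nothing here is part of Snapback2 itself.
demo=$(mktemp -d)
mkdir "$demo/snapshot.0"
echo "unchanged data" > "$demo/snapshot.0/notes.txt"

# "Copy" the snapshot using hard links (-l) instead of duplicating file contents.
cp -al "$demo/snapshot.0" "$demo/snapshot.1"

# Both names show the same inode number, so the data is stored only once.
ls -i "$demo/snapshot.0/notes.txt" "$demo/snapshot.1/notes.txt"

# The link count tells you how many names point at that one file.
links=$(stat -c %h "$demo/snapshot.0/notes.txt")
echo "link count: $links"

# Clean up the demo directory.
rm -rf "$demo"
```

The second snapshot looks like a full copy, but an unchanged file costs only a directory entry. Only files that differ between runs need fresh space.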
Setting up Snapback2 is supposed to be very simple, but I found that the documentation assumes you know what you’re doing. The following is my Snapback2 How-To:
You can download Snapback2 at http://search.cpan.org/~mikeh/Snapback2-0.5/ for the latest version as of this writing. Technically you should be able to use Perl to download it from CPAN, but I didn’t. Most of the prerequisites should be on your Linux-based system already. According to the documentation, you’ll need:
GNU toolset, including cp, rm, and mv
rsync 2.5.7 or higher
ssh
Perl 5.8 or higher
Perl module Config::ApacheFormat
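If you aren’t sure what you already have, something like the following can tell you. It is only a rough sketch: the have_cmd helper is a name I made up, and it only checks that the tools exist and prints version lines for you to compare by eye.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the required tools are present.
for tool in cp rm mv rsync ssh perl; do
    if have_cmd "$tool"; then
        echo "found: $tool"
    else
        echo "MISSING: $tool"
    fi
done

# Print version lines so they can be compared against the list above.
if have_cmd rsync; then rsync --version | head -n 1; fi
if have_cmd perl; then perl -e 'printf "perl %vd\n", $^V'; fi

# Try to load the Perl module; loading fails if it is not installed.
if have_cmd perl && perl -MConfig::ApacheFormat -e 1 2>/dev/null; then
    echo "found: Config::ApacheFormat"
else
    echo "MISSING: Config::ApacheFormat"
fi
```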
On my Debian Sarge system, I have rsync 2.6.4, so your distribution will likely have at least 2.5.7. Similarly, I have Perl 5.8.4. The one thing that you need to do is get and install Config::ApacheFormat. To do so, make sure you have root privileges and run:
# perl -MCPAN -e 'install Config::ApacheFormat'
If it is the first time you’ve used CPAN through Perl, you will be prompted to configure it. If you aren’t sure, you can simply cancel the configuration step and it apparently grabs some defaults just fine. Any and all dependencies will also be installed.
Once you have all of the prerequisites, you can install Snapback2. Again, you could probably do the same thing above to grab it from CPAN, and it will probably grab Config::ApacheFormat for you, but as I didn’t do that, I won’t cover it here.
If you grabbed the tar.gz file from the link I provided above, you should run the following:
# tar xzf Snapback2-0.5.tar.gz
It will create a directory called Snapback2-0.5. The README tells you what to do, but for completeness, here are the next steps:
# cd Snapback2-0.5
# perl Makefile.PL
# make
# make test
# make install
Snapback2 should now be installed on your system. If it isn’t, you should double-check that you have all the prerequisites. The fourth command above, make test, runs tests before installing. If something failed, you should know why from the test results. Even if you did install it successfully, it isn’t going to do anything yet. You now need to make a configuration file.
You can read the documentation on setting up the configuration file in the man page for snapback2, but you can also view it online.
Here is my file, snapback.conf:
Hourlies 4
Dailies 7
Weeklies 4
Monthlies 12
AutoTime Yes
AdminEmail gberardi
LogFile /var/log/snapback.log
ChargeFile /var/log/snapback.charges
Exclude core.*
SnapbackRoot /etc/snapback
DestinationList /home/gberardi/LauraGB
<Backup 192.168.2.41>
Directory /home
</Backup>
I didn’t change it much from what was in Snapback2-0.5/examples. I installed Snapback2 on the machine called MariaGB. MariaGB will connect to 192.168.2.41, which is the IP address of my main machine, LauraGB. This is why the DestinationList refers to LauraGB. If I wanted to back up another system, say BobGB, I would keep those backups separate in their own directory called BobGB.
Normally, the ssh/rsync request would ask for a password. Once I set up the backups to run automatically, that won’t be useful to me, since I would need to be present to log in. You can do the following to create a secure public/private key pair:
$ ssh-keygen -t rsa
The above line will create keys based on RSA encryption, although you could alternatively use DSA. You will be prompted for a passphrase, which is optional. A good passphrase is much better than no passphrase at all, but keep in mind that a passphrase-protected key will prompt for the passphrase unless something like ssh-agent holds it for you, which matters for unattended backups. Using the defaults, you should now have two files in your .ssh directory: id_rsa and id_rsa.pub. The first one is your private key. DO NOT give it to anyone. The second one is your public key, which you could give to anyone. When setting up key-based authentication, you will append the contents of this file to the server’s .ssh/authorized_keys file. Next time you log in, instead of being prompted for a password, you will find yourself at a prompt, ready to work. For more detailed information, read this document about using key-based authentication with SSH.
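Here is a sketch of the append step, to be run on the server once the public key file has been copied there. The function name add_authorized_key and the /tmp/id_rsa.pub path are my own placeholders, and the strict permissions reflect what sshd usually expects, not anything specific to Snapback2.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file,
# creating the file and its directory with strict permissions if needed.
add_authorized_key() {
    pubkey_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")

    mkdir -p "$auth_dir"
    chmod 700 "$auth_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # Append only if this exact key line is not already present.
    grep -qxF -e "$(cat "$pubkey_file")" "$auth_file" ||
        cat "$pubkey_file" >> "$auth_file"
}

# Example call on the server (the key's location is an assumption):
# add_authorized_key /tmp/id_rsa.pub "$HOME/.ssh/authorized_keys"
```

Running it twice with the same key leaves a single entry, so it should be safe to repeat if you aren’t sure whether the key was already added.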
So now, if I run the following command on MariaGB:
# snapback2
It will back up any changes from LauraGB’s /home directory to MariaGB. If, however, it hasn’t been an hour since the last backup, it won’t do anything.
Still, manually running this command isn’t very useful, and while I could install a cron job to run snapback2, I will instead make sure that snapback_loop is running. It acts as a daemon, checking to see if a file gets created in /tmp/backups. Now I can create the following entry in my crontab:
# Create file for snapback_loop to run
0,30 * * * * touch /tmp/backups/snapback
So now, every 30 minutes, I create the file /tmp/backups/snapback, which snapback_loop will take as its cue to delete that file and run snapback2. Then snapback2 will make a backup if there has been enough time since the last backup was made.
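The trigger-file pattern itself is simple enough to sketch. To be clear, this is not the real snapback_loop. It is a hypothetical stand-in with a made-up function name, only meant to show the idea of "see the file, delete it, run the command".

```shell
#!/bin/sh
# Hypothetical stand-in for the trigger-file pattern; NOT the real snapback_loop.
# Usage: run_if_triggered TRIGGER_FILE COMMAND [ARGS...]
run_if_triggered() {
    trigger=$1
    shift
    if [ -e "$trigger" ]; then
        # Remove the trigger first so the next touch queues a fresh run.
        rm -f "$trigger"
        "$@"
    fi
}

# A daemon built on this idea would poll in a loop, for example:
# while true; do
#     run_if_triggered /tmp/backups/snapback snapback2
#     sleep 60
# done
```

The nice part of this design is that cron only has to touch a file, which any user allowed to write to /tmp/backups can do, while the process that actually runs the backup keeps its own privileges.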
Now, I have automated backups that run regularly. Some caveats:
- Verify that snapback2 is in /usr/local/bin. On my system, snapback2 would run manually, but snapback_loop would output errors to /tmp/backups/errors that weren’t too clear. I had to create a symlink to /usr/bin/snapback2 in /usr/local/bin in order to get it to run.
- Make sure snapback_loop is running with root privileges. It has to call snapback2, which will need access to files in /var/log and other directories which will have restricted access. If you run it as a regular user, you may get errors. You could also change the location of the log file, but /var/log is a standard spot to keep such output.
- Because you are running it with root privileges, you’ll need to make sure root is the one with the public key in authorized_keys rather than your user account. Otherwise, you’ll get errors like “permission denied” when rsync tries to connect to the other machine.
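The caveats above can be turned into a quick sanity check. This is a sketch with helper names I invented. The ssh options BatchMode and ConnectTimeout are standard OpenSSH settings, and the IP address is the one from my configuration file, so adjust it for yours.

```shell
#!/bin/sh
# Hypothetical helper: report whether the given path is an executable file.
check_executable() {
    if [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Hypothetical helper: succeed only if ssh can log in without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_key_login() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Caveat 1: snapback2 should be reachable in /usr/local/bin.
check_executable /usr/local/bin/snapback2 ||
    echo "hint: ln -s /usr/bin/snapback2 /usr/local/bin/snapback2"

# Caveat 2: the checks are only meaningful when run as root.
[ "$(id -u)" -eq 0 ] || echo "warning: not running as root"

# Caveat 3: uncomment to test root's key-based login to the other machine.
# check_key_login 192.168.2.41 || echo "key-based login failed"
```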
If you don’t use a second computer, you can always use a second hard drive instead. Either way, you now have an effortless system for automating your backups!