Backup using rsync and hard links
Overview
What is this?
backup.sh is a backup script I put together some time ago, and it does its job quite well for my needs. It uses rsync to create incremental backups in which unchanged files are shared between backup points through hard links.
Advantages
- Incremental backups: never lose any work
- Storage space: only store changed files
- Immediate access to every backup date, using DEST/DATE/path/to/file
- Easy deletion of backups, without affecting future or past backup points
- KISS: Easy to understand, and no uncommon dependencies
- Probably works across networks
Disadvantages
- Heavy usage of hard links might confuse some user space tools
- No deduplication
- Small changes in big files cause the whole file to be stored
- CPU-intensive migration to a new backup medium (keeping hard links together has quadratic runtime)
Usage
Configuration
Adapt the SRC and DEST variables in the script header to your needs. Then place a .backup.exclude file (note the leading dot) in the SRC directory, listing all the files to exclude. The file may be empty.
To exclude all files named 'foo', add a 'foo' line. To exclude only 'SRC/foo' (but not 'SRC/subdir/foo'), add a '/foo' line. For more details, consult the rsync man page.
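As an illustration, a .backup.exclude combining both pattern styles might be created like this. The names foo and bar are placeholders, and a temporary directory stands in for SRC so that the sketch is safe to run as-is.

```shell
# Stand-in for the SRC directory from the script header (assumption: a
# throwaway temp dir, so running this sketch touches nothing real).
SRC=$(mktemp -d)

# rsync ignores blank lines and whole-line comments starting with '#' or ';'.
cat > "$SRC/.backup.exclude" <<'EOF'
# Skip everything named 'foo', at any depth below SRC.
foo
# Skip only SRC/bar, but keep SRC/subdir/bar.
/bar
EOF

cat "$SRC/.backup.exclude"
```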
Back it up!
Just fire up the script, lean back and enjoy the names of files scrolling down your screen.
Deleting backups
If you want to free some space, just delete the backup points you no longer need with
rm -rf DEST/20140215 DEST/20141210 ...
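Deleting one backup point is safe for the others because a hard-linked file stays on disk until its last link is removed. The following minimal sketch shows this with plain ln and rm; the temporary directory and the file name notes.txt are made up for the demonstration.

```shell
# Throwaway stand-in for DEST with two backup points (names are examples).
DEST=$(mktemp -d)
mkdir "$DEST/20140215" "$DEST/20141210"

# An unchanged file exists once on disk but is reachable from both backups.
echo "important data" > "$DEST/20140215/notes.txt"
ln "$DEST/20140215/notes.txt" "$DEST/20141210/notes.txt"

# Delete the older backup point ...
rm -rf "$DEST/20140215"

# ... and the newer one still holds the complete file.
cat "$DEST/20141210/notes.txt"
```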
Implementation details
How it works
The script rsyncs your directory contents to DEST/DATE/. The --link-dest rsync option tells rsync to compare the files with those in DEST/LAST_DATE/. If a file differs, rsync copies it to DEST/DATE/file; if it is unchanged, rsync hard-links DEST/DATE/file to DEST/LAST_DATE/file.
The .backup.exclude file in the source directory root is passed to rsync for excluding unwanted files from being backed up.
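The mechanism described above can be sketched roughly as follows. This is not the actual content of backup.sh: the function name backup_once and its arguments are hypothetical, and it assumes that rsync's --link-dest option is what produces the hard links against the previous backup.

```shell
# backup_once SRC DEST DATE
# Hypothetical helper, not taken from backup.sh: copies SRC into DEST/DATE,
# hard-linking unchanged files against the newest earlier backup in DEST.
backup_once() {
    local src="$1" dest="$2" date="$3" last="" dir name

    mkdir -p "$dest"

    # Globs expand in sorted order, so the last directory seen is the newest.
    for dir in "$dest"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        [ "$name" = "$date" ] && continue   # skip this run's own directory
        last="$name"
    done

    # First run: no earlier backup exists, so every file is copied in full.
    # Later runs: --link-dest lets rsync hard-link files that are unchanged.
    # (A relative --link-dest is resolved against the destination directory,
    # so DEST is assumed to be an absolute path here.)
    set -- -a --exclude-from="$src/.backup.exclude"
    [ -n "$last" ] && set -- "$@" --link-dest="$dest/$last"

    rsync "$@" "$src/" "$dest/$date/"
}
```

With SRC and DEST set as in the script header, a call such as backup_once "$SRC" "$DEST" "$(date +%Y%m%d)" would then create today's backup point.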
Further tweaking
You can enable or disable the lines that calculate the size of this backup, or of all backups taken, by commenting them in or out. I recommend watching these sizes for your first five or so backups to make sure that your .backup.exclude is correct. I commented them out after a few backups because they just slowed things down.
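If you want to check sizes by hand, du can do it. The helper below is a sketch under the assumption that each backup point is a directory directly below DEST; the name backup_sizes is made up, and the script's own size lines may look different.

```shell
# backup_sizes DEST
# Hypothetical helper: prints disk usage in KiB for each backup point,
# followed by the total for all backups.
backup_sizes() {
    local dest="$1"

    # Within one du run a hard-linked file is counted only once, so the
    # oldest backup shows its full size and later ones only what they added.
    du -sk "$dest"/*/

    # Total space used by all backups together.
    du -sk "$dest"
}
```

Because du runs over every file, this gets slower as the number of backup points grows.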
If you want multiple backups per day, you can easily adapt the naming scheme to include hours/minutes as well. Just make sure that alphanumerical sorting stays consistent with time (i.e., don't use DD.MM.YYYY, but YYYY-MM-DD).
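A small sketch of such a naming scheme, and of why the field order matters. The format string is only one possible choice, and the dates are examples.

```shell
# Backup name with hours and minutes, e.g. 2014-02-15_0930.
stamp=$(date +%Y-%m-%d_%H%M)
echo "new backup point: $stamp"

# Year-first names: plain sorting lists the backups oldest to newest.
printf '%s\n' 2014-12-10_0800 2014-02-15_1745 2014-02-15_0930 | LC_ALL=C sort

# Day-first names: the same sort puts the newest backup (01.03.2015) first.
printf '%s\n' 15.02.2014 10.12.2014 01.03.2015 | LC_ALL=C sort
```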