Dec 01 2012
 
About

By popular request, I’m putting up a polished version of the backup script that we’ve been using at Lanedo over the years to back up our systems remotely. This script uses rsync(1)'s --link-dest feature (as of v2.6.4) to create backups that share storage space with previous backups by hard-linking unchanged files.
The rsync and ssh options needed to minimize transfer bandwidth over the Internet, the time-stamping of backups, and the workarounds for several rsync oddities made it worthwhile to encapsulate the logic in a dedicated script.
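The core hard-linking idea fits in a few lines of shell. This is a minimal, local-only sketch that assumes rsync v2.6.4 or newer; the helper name snapshot_backup is hypothetical, and the sketch leaves out the ssh, bandwidth and error-handling logic that sayebackup.sh itself takes care of. It mimics the bak-...-snap and bak-current naming of the example.

```shell
#!/bin/sh
# Minimal local sketch of hard-linked snapshots (assumes rsync >= 2.6.4).
# snapshot_backup is a hypothetical helper, not code from sayebackup.sh.
# Usage: snapshot_backup <source> <backup-dir> [prefix]
snapshot_backup() {
  src="$1" destdir="$2" prefix="${3:-bak}"
  snap="$prefix-$(date +%Y-%m-%d-%H:%M:%S)-snap"   # time-stamped snapshot name
  mkdir -p "$destdir" || return
  if [ -d "$destdir/$prefix-current" ]; then
    # A relative --link-dest is resolved against the destination directory,
    # so unchanged files become hard links into the previous snapshot.
    rsync -a --link-dest="../$prefix-current" "$src" "$destdir/$snap" || return
  else
    # First run: nothing to link against, so make a plain full copy.
    rsync -a "$src" "$destdir/$snap" || return
  fi
  # Repoint the *-current symlink only after rsync has succeeded.
  ln -sfn "$snap" "$destdir/$prefix-current"
}
```

Calling it twice with the same source (at least one second apart, since the names have one-second resolution) leaves two bak-*-snap directories and a bak-current symlink; unchanged files in the second snapshot have a link count of 2.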

Resources

The GitHub release tag is here: backups-0.0.1
Script URL for direct downloads: sayebackup.sh

Example

This example shows creation of two consecutive backups and displays the sizes.

$ sayebackup.sh -i ~/.ssh/id_examplecom user@example.com:mydir # create backup as bak-.../mydir
$ sayebackup.sh -i ~/.ssh/id_examplecom user@example.com:mydir # create second bak-2012...-snap/
$ ls -l # show all the backups that have been created
drwxrwxr-x 3 user group 4096 Dez  1 03:16 bak-2012-12-01-03:16:50-snap
drwxrwxr-x 3 user group 4096 Dez  1 03:17 bak-2012-12-01-03:17:12-snap
lrwxrwxrwx 1 user group   28 Dez  1 03:17 bak-current -> bak-2012-12-01-03:17:12-snap
$ du -sh bak-* # the second backup is smaller due to hard links
4.1M    bak-2012-12-01-03:16:50-snap
128K    bak-2012-12-01-03:17:12-snap
4.0K    bak-current

Usage
Usage: sayebackup.sh [options] sources...
OPTIONS:
  --inc         make reverse incremental backup
  --dry         run and show rsync with --dry-run option
  --help        print usage summary
  -C <dir>      backup directory (default: '.')
  -E <exclfile> file with rsync exclude list
  -l <account>  ssh user name to use (see ssh(1) -l)
  -i <identity> ssh identity key file to use (see ssh(1) -i)
  -P <sshport>  ssh port to use on the remote system
  -L <linkdest> hardlink dest files from <linkdest>/
  -o <prefix>   output directory name (default: 'bak')
  -q, --quiet   suppress progress information
  -c            perform checksum based file content comparisons
  --one-file-system
  -x            disable crossing of filesystem boundaries
  --version     script and rsync versions
DESCRIPTION:
  This script creates full or reverse incremental backups using the
  rsync(1) command. Backup directory names contain the date and time
  of each backup run to allow sorting and selective pruning.
  At the end of each successful backup run, a symlink '*-current' is
  updated to always point at the latest backup. To reduce remote file
  transfers, the '-L' option can be used (possibly multiple times) to
  specify existing local file trees from which files will be
  hard-linked into the backup.
 Full Backups:
  Upon each invocation, a new backup directory is created that contains
  all files of the source system. Hard links are created to files of
  previous backups where possible, so extra storage space is only required
  for contents that changed between backups.
 Incremental Backups:
  In incremental mode, the most recent backup is always a full backup,
  while the previous full backup is degraded to a reverse incremental
  backup, which only contains differences between the current and the
  last backup.
 RSYNC_BINARY Environment variable used to override the rsync binary path.
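For readers curious how reverse incrementals can be produced with plain rsync, one common technique is the --backup/--backup-dir pair. This is a hedged sketch of that technique only, not necessarily how sayebackup.sh implements --inc; the function reverse_incremental and the current/ and inc-* directory names are hypothetical, and only local paths are handled.

```shell
#!/bin/sh
# Sketch of reverse incremental backups with plain rsync (--backup-dir).
# One common technique, not necessarily how sayebackup.sh --inc works;
# reverse_incremental, current/ and inc-* are hypothetical names.
# Usage: reverse_incremental <source-dir> <backup-dir>
reverse_incremental() {
  src="$1" destdir="$2"
  mkdir -p "$destdir/current" || return
  dest_abs="$(cd "$destdir" && pwd)" || return
  stamp="$(date +%Y-%m-%d-%H:%M:%S)"
  # current/ is kept as a full mirror of the source. Every file that rsync
  # would overwrite or delete there is first moved into inc-$stamp/, so that
  # directory ends up holding only what differed in the previous backup.
  rsync -a --delete --backup --backup-dir="$dest_abs/inc-$stamp" \
    "$src/" "$dest_abs/current/"
}
```

The most recent state is always the full tree in current/, while each inc-* directory holds only the older versions of files that changed or disappeared, which mirrors the "degraded" previous backup described above.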
See Also

Testbit Tools – Version 11.09 Release



  6 Responses to “Sayebackup.sh – deduplicating backups with rsync”

  1. That looks sufficiently nice and simple, great!

    If more than one directory is to be backed up, how is that done? Or do we need multiple calls like this:
    #!/bin/bash
    /usr/local/bin/get_packages
    /usr/local/bin/sayebackup.sh -C /mnt/2TBbackup/sayebackup/eno/etc /etc
    /usr/local/bin/sayebackup.sh -C /mnt/2TBbackup/sayebackup/eno/usr/local /usr/local
    /usr/local/bin/sayebackup.sh -C /mnt/2TBbackup/sayebackup/eno/home/eg /home/eg
    The disadvantage here is that each run has a different time stamp (maybe something I could live with).

    Or is there a better way?

    Thanks in advance.

    Redil

    • I’d recommend backing up multiple directories like this:
      sayebackup.sh -o multi /var/log /etc

      This creates a multi-/ backup directory that contains etc/ and log/.
      Be careful not to append trailing slashes to the source directories on this command line: just like with rsync, a trailing slash backs up the directory's contents rather than the directory entry itself.
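The trailing-slash rule comes straight from rsync. A small sketch, assuming rsync is installed, copies a scratch directory both ways so the two resulting layouts can be compared; demo_trailing_slash and the directory names are hypothetical.

```shell
#!/bin/sh
# Demonstrates rsync's trailing-slash rule in a scratch directory.
# demo_trailing_slash and the directory names are hypothetical.
demo_trailing_slash() {
  base="$1"
  mkdir -p "$base/src/etc" "$base/without-slash" "$base/with-slash" || return
  touch "$base/src/etc/hosts"
  # No trailing slash: the directory entry itself is copied -> without-slash/etc/hosts
  rsync -a "$base/src/etc" "$base/without-slash/"
  # Trailing slash: only the directory's contents are copied -> with-slash/hosts
  rsync -a "$base/src/etc/" "$base/with-slash/"
}
```

Passing /var/log and /etc without slashes is therefore what makes log/ and etc/ appear as separate subdirectories inside the backup.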

  2. Very nice. I guess everybody should write their own rsync backup script. +1 for sharing your code!

    Mine runs as a service and takes care of removing old backups when the disk space runs out, hence it is called RSYNC BACKUP MADE EASY 🙂

    • Haha, you’re right, Schlomo. 😉
      As for removing old backups, I plan to tackle that in a future installment…
      I looked at rbme; it’s quite extensive. Thanks for sharing the link!

  3. In the example you gave, the second backup is smaller “due to hard links”, but du also counts the size of hard-linked files. So the difference in directory size must come from another source.

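    • You are probably thinking of the --count-links (-l) option to du, which counts the contents of hard-linked files multiple times. By default, du counts each hard-linked file only once per invocation, so files in the second snapshot that are hard links to files already counted in the first one add nothing to its total. I deliberately don't use that option in the scenario above, so that the gains from hard linking show up. Here’s a copy-paste example: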

      $ du -sh etc-*
      19M     etc-2013-02-11-15:44:44-snap
      0       etc-2013-02-11-15:44:50-snap
      $ du -sh --count-links etc-*
      19M     etc-2013-02-11-15:44:44-snap
      19M     etc-2013-02-11-15:44:50-snap
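The same accounting can be reproduced without running a backup. A minimal sketch, assuming GNU coreutils du; du_hardlink_demo and the snap1/snap2 directories are hypothetical stand-ins for two snapshots where the second one only hard-links an unchanged file.

```shell
#!/bin/sh
# Reproduces du's hard-link accounting with two fake "snapshots".
# Assumes GNU coreutils du; du_hardlink_demo is a hypothetical helper.
du_hardlink_demo() {
  base="$1"
  mkdir -p "$base/snap1" "$base/snap2" || return
  # 1 MiB of random data (random, so filesystem compression cannot shrink it)
  head -c 1048576 /dev/urandom > "$base/snap1/big"
  # snap2 only holds a hard link, like an unchanged file under --link-dest
  ln "$base/snap1/big" "$base/snap2/big"
  echo "default (each inode counted once per invocation):"
  du -sk "$base/snap1" "$base/snap2"
  echo "with --count-links:"
  du -sk --count-links "$base/snap1" "$base/snap2"
}
```

In the default run, snap2 reports only the size of its directory entry; with --count-links it reports roughly the same size as snap1, matching the pattern in the listing above.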
      
