Feb 08 2013

Determine deletion candidates in a set of directories containing aging backups, and delete them.
As a follow-up to the release of sayebackup.sh last December, here’s a complementary tool we’re using at Lanedo. Suppose a number of backup directories have piled up after a while, created by sayebackup.sh or any other tool that produces time-stamped file names:

 drwxrwxr-x etc-2010-02-02-06:06:01-snap
 drwxrwxr-x etc-2011-07-07-06:06:01-snap
 drwxrwxr-x etc-2011-07-07-12:45:53-snap
 drwxrwxr-x etc-2012-12-28-06:06:01-snap
 drwxrwxr-x etc-2013-02-02-06:06:01-snap
 lrwxrwxrwx etc-current -> etc-2012-12-28-06:06:01-snap

Which backups should be deleted once the backup device starts to fill up?
Sayepurge parses the timestamps from the names of this set of backup directories, computes the time deltas, and determines good deletion candidates so that the remaining backups are spaced as evenly as possible over time. The exact behavior can be tuned by specifying the number of recent files to guard against deletion (-g), the number of historic backups to keep around (-k), and the maximum number of deletions for any given run (-d). In the above set of files, the two backups from 2011-07-07 are less than 7 hours apart, so they make good purging candidates. For example:

 $ sayepurge.sh -o etc -g 1 -k 3 
 Ignore: ./etc-2013-02-02-06:06:01-snap
 Purge:  ./etc-2011-07-07-06:06:01-snap
 Keep:   ./etc-2012-12-28-06:06:01-snap
 Keep:   ./etc-2011-07-07-12:45:53-snap
 Keep:   ./etc-2010-02-02-06:06:01-snap
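
The selection in the example above can be approximated with a short shell sketch. It assumes a scoring rule that the usage text does not spell out: among the unguarded backups, purge the one whose removal leaves the smallest gap between its two neighbours, and never purge the oldest or newest entry of the list. The function names stamp_to_epoch and pick_purge_candidate are made up for illustration, and date -d is a GNU extension; the real sayepurge.sh may well work differently.

```shell
#!/bin/sh
# Sketch only: an ASSUMED scoring rule, not taken from sayepurge.sh itself.

# Convert "prefix-YYYY-MM-DD-HH:MM:SS-snap" to epoch seconds (needs GNU date).
stamp_to_epoch() (
  stamp=$(printf '%s\n' "$1" | sed -n \
    's/.*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)-\([0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\).*/\1 \2/p')
  [ -n "$stamp" ] || exit 1            # no timestamp found in the name
  date -u -d "$stamp" +%s
)

# Print the backup whose removal leaves the smallest gap between its two
# neighbours. Arguments must be sorted oldest first; the first and last
# arguments are never candidates.
pick_purge_candidate() (
  best="" best_gap="" prev_name="" prev_epoch="" prev2_epoch=""
  for name in "$@"; do
    epoch=$(stamp_to_epoch "$name") || exit 1
    if [ -n "$prev2_epoch" ]; then
      # prev_name sits between two neighbours; removing it merges both gaps.
      gap=$((epoch - prev2_epoch))
      if [ -z "$best_gap" ] || [ "$gap" -lt "$best_gap" ]; then
        best_gap=$gap
        best=$prev_name
      fi
    fi
    prev2_epoch=$prev_epoch
    prev_epoch=$epoch
    prev_name=$name
  done
  [ -n "$best" ] && printf '%s\n' "$best"
)

# The four unguarded backups from the example, oldest first:
pick_purge_candidate \
  etc-2010-02-02-06:06:01-snap \
  etc-2011-07-07-06:06:01-snap \
  etc-2011-07-07-12:45:53-snap \
  etc-2012-12-28-06:06:01-snap
# prints etc-2011-07-07-06:06:01-snap
```

Under this rule the 06:06:01 backup wins over the 12:45:53 one because its outer neighbour (2010-02-02) is closer than the other's (2012-12-28), which matches the Purge line in the output above.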

For day-to-day use, it makes sense to combine both tools, e.g. via crontab. Here’s a sample command that performs a daily backup of /etc/ and then keeps six directories’ worth of daily backups (three guarded plus three older ones, per -g 3 -k 3) in a top-level directory for backups:

 /bin/sayebackup.sh -q -C /backups/ -o etc /etc/ && /bin/sayepurge.sh -q -C /backups/ -o etc -g 3 -k 3
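
One way to wire this into cron is a small wrapper script, sketched below. The wrapper path, the schedule in the comment, and the variable names SAYEBACKUP, SAYEPURGE and BACKUP_DIR are assumptions for illustration. The sketch passes -C to both tools so the purge acts on the backup directory regardless of cron's working directory; the usage text lists -C for sayepurge.sh as well.

```shell
#!/bin/sh
# Hypothetical cron wrapper; paths and schedule are assumptions.
# Example crontab line (daily at 03:15):
#   15 3 * * * /usr/local/sbin/etc-backup.sh

SAYEBACKUP=${SAYEBACKUP:-/bin/sayebackup.sh}
SAYEPURGE=${SAYEPURGE:-/bin/sayepurge.sh}
BACKUP_DIR=${BACKUP_DIR:-/backups/}

backup_and_purge() {
  # Purge only after a successful backup, so a failed run never
  # shrinks the existing history.
  "$SAYEBACKUP" -q -C "$BACKUP_DIR" -o etc /etc/ &&
    "$SAYEPURGE" -q -C "$BACKUP_DIR" -o etc -g 3 -k 3
}
```

An installed wrapper would end with a call to backup_and_purge; it is left out here so the snippet can be sourced without touching any backups.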

Let me know in the comments what mechanisms you are using to purge aging backups!


The GitHub release tag is here: backups-0.0.2
Script URL for direct downloads: sayepurge.sh

Usage: sayepurge.sh [options] sources...
  --inc         merge incremental backups
  -g <nguarded> recent files to guard (8)
  -k <nkeeps>   non-recent to keep (8)
  -d <maxdelet> maximum number of deletions
  -C <dir>      backup directory
  -o <prefix>   output directory name (default: 'bak')
  -q, --quiet   suppress progress information
  --fake        only simulate deletions or merges
  -L            list all backup files with delta times
  Delete candidates from a set of aging backups to spread backups most evenly
  over time, based on time stamps embedded in directory names.
  Backups older than <nguarded> are purged, so that only <nkeeps> backups
  remain. In other words, the number of backups is reduced to <nguarded>
  + <nkeeps>, where <nguarded> are the most recent backups.
  The purging logic will always pick the backup with the shortest time
  distance to other backups. Thus, the number of <nkeeps> remaining
  backups is most evenly distributed across the total time period within
  which backups have been created.
  Purging of incremental backups happens via merging of newly created
  files into the backup's predecessor. Thus merged incrementals may
  contain newly created files from after the incremental backup's creation
  time, but the function of reverse incremental backups is fully
  preserved. Merged incrementals use a different file name ending (-xinc).
See Also

Sayebackup.sh – deduplicating backups with rsync

Dec 01 2012

By popular request, I’m putting up a polished version of the backup script that we’ve been using over the years at Lanedo to back up our systems remotely. This script uses a special feature of rsync(1) v2.6.4 to create backups that share storage space with previous backups by hard-linking files.
Getting this right takes a number of rsync and ssh options to minimize transfer bandwidth over the Internet, time-stamping of the backups, and workarounds for several rsync oddities, which warranted encapsulating the logic in a dedicated script.
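
The post does not name the rsync feature; the -L <linkdest> option in the usage text suggests it is --link-dest, which makes rsync hard-link unchanged files against an existing tree instead of copying them. Below is a minimal sketch of that technique under that assumption. It leaves out the ssh handling, excludes and other details that sayebackup.sh takes care of, and the function name snapshot is hypothetical.

```shell
#!/bin/sh
# Sketch only: hard-linked snapshots via rsync --link-dest.
# Not the sayebackup.sh implementation; "snapshot" is a made-up helper.

# snapshot <source> <backup-root> [prefix]
# Prints the name of the snapshot directory it created.
snapshot() (
  src=$1
  root=$(cd "$2" && pwd) || exit 1   # --link-dest wants an absolute path
  prefix=${3:-bak}
  new="$prefix-$(date +%Y-%m-%d-%H:%M:%S)-snap"
  current="$root/$prefix-current"

  if [ -d "$current" ]; then
    # Unchanged files become hard links into the previous snapshot.
    rsync -a --link-dest="$current/" "$src" "$root/$new/" || exit 1
  else
    # First run: nothing to link against, so make a plain copy.
    rsync -a "$src" "$root/$new/" || exit 1
  fi

  # Repoint the "current" symlink only after rsync succeeded.
  ln -sfn "$new" "$current" && printf '%s\n' "$new"
)
```

Calling snapshot /etc /backups etc twice, at least a second apart, would leave two etc-...-snap directories in /backups, with the second one storing only the files that changed in between.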


The GitHub release tag is here: backups-0.0.1
Script URL for direct downloads: sayebackup.sh


This example creates two consecutive backups and displays their sizes.

$ sayebackup.sh -i ~/.ssh/id_examplecom user@example.com:mydir # create backup as bak-.../mydir
$ sayebackup.sh -i ~/.ssh/id_examplecom user@example.com:mydir # create second bak-2012...-snap/
$ ls -l # show all the backups that have been created
drwxrwxr-x 3 user group 4096 Dez  1 03:16 bak-2012-12-01-03:16:50-snap
drwxrwxr-x 3 user group 4096 Dez  1 03:17 bak-2012-12-01-03:17:12-snap
lrwxrwxrwx 1 user group   28 Dez  1 03:17 bak-current -> bak-2012-12-01-03:17:12-snap
$ du -sh bak-* # the second backup is smaller due to hard links
4.1M    bak-2012-12-01-03:16:50-snap
128K    bak-2012-12-01-03:17:12-snap
4.0K    bak-current
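
The size difference comes from hard links: du counts each inode only once per run, so files the second backup shares with the first are not charged again. One way to see which files a snapshot does not share is to list its regular files with a link count of 1. A small sketch follows; the helper name unshared_files is made up, and a link count above 1 only shows that a file is shared, not with which backup.

```shell
#!/bin/sh
# List regular files below a directory that no other directory entry
# links to, i.e. files whose data is stored only in this snapshot.
unshared_files() {
  find "$1" -type f -links 1
}
```

For the listing above, unshared_files bak-2012-12-01-03:17:12-snap would print the files that were new or changed in the second run.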
Usage: sayebackup.sh [options] sources...
  --inc         make reverse incremental backup
  --dry         run and show rsync with --dry-run option
  --help        print usage summary
  -C <dir>      backup directory (default: '.')
  -E <exclfile> file with rsync exclude list
  -l <account>  ssh user name to use (see ssh(1) -l)
  -i <identity> ssh identity key file to use (see ssh(1) -i)
  -P <sshport>  ssh port to use on the remote system
  -L <linkdest> hardlink dest files from <linkdest>/
  -o <prefix>   output directory name (default: 'bak')
  -q, --quiet   suppress progress information
  -c            perform checksum based file content comparisons
  -x            disable crossing of filesystem boundaries
  --version     script and rsync versions
  This script creates full or reverse incremental backups using the
  rsync(1) command. Backup directory names contain the date and time
  of each backup run to allow sorting and selective pruning.
  At the end of each successful backup run, a symlink '*-current' is
  updated to always point at the latest backup. To reduce remote file
  transfers, the '-L' option can be used (possibly multiple times) to
  specify existing local file trees from which files will be
  hard-linked into the backup.
 Full Backups:
  Upon each invocation, a new backup directory is created that contains
  all files of the source system. Hard links are created to files of
  previous backups where possible, so extra storage space is only required
  for contents that changed between backups.
 Incremental Backups:
  In incremental mode, the most recent backup is always a full backup,
  while the previous full backup is degraded to a reverse incremental
  backup, which only contains differences between the current and the
  last backup.
 RSYNC_BINARY Environment variable used to override the rsync binary path.
See Also

Testbit Tools – Version 11.09 Release