About
Determine deletion candidates in a set of directories containing aging backups and delete them.
As a follow-up to the release of sayebackup.sh last December, here’s a complementary tool we’re using at Lanedo.
Suppose a number of backup directories have piled up after a while, created by sayebackup.sh or any other tool that produces timestamped file names:
drwxrwxr-x etc-2010-02-02-06:06:01-snap
drwxrwxr-x etc-2011-07-07-06:06:01-snap
drwxrwxr-x etc-2011-07-07-12:45:53-snap
drwxrwxr-x etc-2012-12-28-06:06:01-snap
drwxrwxr-x etc-2013-02-02-06:06:01-snap
lrwxrwxrwx etc-current -> etc-2012-12-28-06:06:01-snap
Which backup should be deleted once the backup device starts to fill up?
Sayepurge parses the timestamps from the names of this set of backup directories, computes the time deltas, and determines good deletion candidates so that the remaining backups are spaced as evenly as possible over time.
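The parsing and delta step could look roughly like the following. This is a minimal sketch, not code taken from sayepurge.sh: the function names stamp_to_epoch and print_deltas are made up for illustration, it assumes the <prefix>-YYYY-MM-DD-HH:MM:SS-snap naming shown in the listing, and it relies on the -d option of GNU date.

```shell
#!/bin/sh
# Sketch only, NOT the actual sayepurge.sh implementation.
# Assumptions: names look like <prefix>-YYYY-MM-DD-HH:MM:SS-snap, GNU date is available.

# stamp_to_epoch NAME PREFIX: print the embedded timestamp as epoch seconds (UTC).
stamp_to_epoch() (
  name=${1##*/}            # drop a leading "./" or other directory part
  stamp=${name#"$2"-}      # strip the "<prefix>-" part
  stamp=${stamp%-snap}     # strip the "-snap" suffix
  day=${stamp%-*}          # "2011-07-07"
  time=${stamp##*-}        # "06:06:01"
  date -u -d "$day $time" +%s
)

# print_deltas PREFIX NAME...: print every name, oldest first, followed by the
# number of seconds since the next older backup ("-" for the oldest one).
print_deltas() (
  prefix=$1
  shift
  prev=
  # Fixed-width timestamps sort chronologically as plain strings.
  printf '%s\n' "$@" | LC_ALL=C sort | while IFS= read -r name; do
    epoch=$(stamp_to_epoch "$name" "$prefix")
    if [ -n "$prev" ]; then
      echo "$name $((epoch - prev))"
    else
      echo "$name -"
    fi
    prev=$epoch
  done
)
```

Called as print_deltas etc etc-*-snap on the listing above, the sketch reports a delta of 23992 seconds (6h 39m 52s) for the second backup from 2011-07-07.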
The exact behavior can be tuned by specifying the number of recent files to guard against deletion (-g), the number of historic backups to keep around (-k), and the maximum number of deletions for any given run (-d). In the listing above, the two backups from 2011-07-07 are less than seven hours apart, so one of them makes a good purge candidate. For example:
$ sayepurge.sh -o etc -g 1 -k 3
Ignore: ./etc-2013-02-02-06:06:01-snap
Purge: ./etc-2011-07-07-06:06:01-snap
Keep: ./etc-2012-12-28-06:06:01-snap
Keep: ./etc-2011-07-07-12:45:53-snap
Keep: ./etc-2010-02-02-06:06:01-snap
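To illustrate how such a selection can work, here is a small sketch of one possible rule: guard the newest backups, then repeatedly drop the backup whose removal leaves the smallest gap between its two neighbours, until only the requested number remains. The rule itself and the function names pick_candidate and list_purges are assumptions for illustration; sayepurge.sh may well decide differently. Input is expected on stdin as lines of "<epoch-seconds> <name>".

```shell
#!/bin/sh
# Sketch only, NOT the actual sayepurge.sh logic.
# Simplification: the oldest and newest unguarded backups are never purged.

# pick_candidate: read "<epoch> <name>" lines, oldest first, and print the
# line of the interior backup whose neighbours are closest together.
pick_candidate() (
  pp=; p=; pline=; best=; bestgap=
  while read -r epoch name; do
    if [ -n "$pp" ]; then
      gap=$((epoch - pp))          # gap left behind if the previous line were removed
      if [ -z "$best" ] || [ "$gap" -lt "$bestgap" ]; then
        best=$pline
        bestgap=$gap
      fi
    fi
    pp=$p
    p=$epoch
    pline="$epoch $name"
  done
  [ -n "$best" ] && printf '%s\n' "$best"
)

# list_purges GUARD KEEP: print the names to purge, one per line.
list_purges() (
  guard=$1
  keep=$2
  # Drop the $guard newest entries, then order the rest oldest first.
  cands=$(sort -rn | tail -n +"$((guard + 1))" | sort -n)
  while [ "$(printf '%s\n' "$cands" | grep -c .)" -gt "$keep" ]; do
    victim=$(printf '%s\n' "$cands" | pick_candidate) || break
    printf '%s\n' "${victim#* }"   # print only the name part
    cands=$(printf '%s\n' "$cands" | grep -v -x -F -- "$victim")
  done
)
```

Fed with the five directories from the listing as epoch/name pairs, list_purges 1 3 prints ./etc-2011-07-07-06:06:01-snap, which agrees with the tool's choice in this particular case.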
For day-to-day use, it makes sense to combine both tools, e.g. via crontab. Here’s a sample command that performs a daily backup of /etc/ and then keeps 6 directories’ worth of daily backups (3 guarded plus 3 older ones) in a top-level backup directory:
/bin/sayebackup.sh -q -C /backups/ -o etc /etc/ && /bin/sayepurge.sh -q -o etc -g 3 -k 3
Let me know in the comments what mechanisms you are using to purge aging backups!
Resources
The GitHub release tag is here: backups-0.0.2
Script URL for direct downloads: sayepurge.sh
Usage
Usage: sayepurge.sh [options] sources...
OPTIONS:
  --inc           merge incremental backups
  -g <nguarded>   recent files to guard (8)
  -k <nkeeps>     non-recent to keep (8)
  -d <maxdelet>   maximum number of deletions
  -C <dir>        backup directory
  -o <prefix>     output directory name (default: 'bak')
  -q, --quiet     suppress progress information
  --fake          only simulate deletions or merges
  -L              list all backup files with delta times
DESCRIPTION:
Delete candidates from a set of aging backups to spread backups most evenly
over time, based on time stamps embedded in directory names.
Backups older than <nguarded> are purged, so that only <nkeeps> backups
remain. In other words, the number of backups is reduced to <nguarded>
+ <nkeeps>, where <nguarded> are the most recent backups.
The purging logic will always pick the backup with the shortest time
distance to other backups. Thus, the number of <nkeeps> remaining
backups is most evenly distributed across the total time period within
which backups have been created.
Purging of incremental backups happens via merging of newly created
files into the backup's predecessor. Thus merged incrementals may
contain newly created files from after the incremental backup's creation
time, but the function of reverse incremental backups is fully
preserved. Merged incrementals use a different file name ending (-xinc).
See Also
Sayebackup.sh - deduplicating backups with rsync