Jim's Depository

this code is not yet written
 

After decades of backing up with dump, I no longer use it. I suppose I got into the habit back in the days of tapes and just stayed with it through the disk years.

rsync is far better. I should have switched years ago.

  • Efficient incremental backups over networks, even for appending little bits to the ends of long files.
  • The backups are real directories of real files, easy to ferret about and find what you need.
  • Nifty trick to keep N days of backups without using N times the space, but still each tree looks like a snapshot.
  • Easy ssh-based security.
  • You can do either a push or a pull depending on your security requirements.

The first thing to do is to read about the rsync --link-dest option. It lets you use hard links to share the contents of unchanged files across days of your backups.

The second thing to do is to decide on your backup strategy. For many machines I just keep 7 days of backups; it keeps things easy. There is a backup for each day of the week, and each one is overwritten when the week wraps around. For other applications, where I have to go further back, I rename directories, much like logrotate would.

The third thing to think about is what happens if your ssh key or rsync password is compromised. If you are running backups from cron, then there will be a machine-readable key on your machine somewhere. This may or may not be a danger depending on how you have things secured. In my setups, if you could get the key you could have gotten the data anyway. (Remember that your backup archive machine needs to be at least as secure as the live machine.)

Enough talking, more sample code:

Scenario #1: Many big machines, lots of bits to push, on the same secure network. We want to go fast. All the machines have different security policies.

I use push in this situation. There is a dedicated backup machine to receive and hold the bits. Only two trusted people have access to this machine. The backup machine runs an rsync daemon with a module for each host that lets the host write backups (write only, and only into its host-specific area). On each host there is a root cron job with the rsync password embedded to run the backup.

Sample backup script… cron these, and offset their run times to keep contention down….

#!/bin/sh
HOST=`hostname`
export RSYNC_PASSWORD=1234567890abcdef12345678890abcdef

DAY=`date +%a | tr '[A-Z]' '[a-z]'`
case $DAY in
  sun ) PDAY=sat ;;
  mon ) PDAY=sun ;;
  tue ) PDAY=mon ;;
  wed ) PDAY=tue ;;
  thu ) PDAY=wed ;;
  fri ) PDAY=thu ;;
  sat ) PDAY=fri ;;
esac

OPTS="-aqH --link-dest=/$PDAY/ --no-devices --no-specials --exclude=/proc/ --exclude=/sys/ --exclude=/dev/ --exclude=/tmp/ --delete"

time rsync $OPTS / $HOST@warehouse.federated.com::$HOST/$DAY

Sample module from rsyncd.conf…

[nexus]
    auth users = nexus
    secrets file = /etc/rsyncd.secrets
    use chroot = yes
    path = /warehouse/nexus
    numeric ids = yes
    list = no
    read only = no
    write only = yes
    uid = 0
    gid = 0
    hosts allow = 111.222.33.44
    hosts deny = *
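The module references /etc/rsyncd.secrets, so for completeness, a sketch of that file on the backup machine: it is plain "user:password" lines, pairing the module's auth user with the password embedded in the host's cron job. By default the daemon refuses a secrets file that is readable by others, so keep it mode 600.

```
# /etc/rsyncd.secrets -- must be owned by root, chmod 600
nexus:1234567890abcdef12345678890abcdef
```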

Scenario #2: Offsite backup of Virtual Private Server

I have a machine that lives in a hosting facility. I have broken my first rule of service providers. They are not close enough for me to pop over and wrap my hands around someone if there is a problem, so I content myself with a full backup and the ability to be up and running at a new provider in 60 minutes if needed. I don’t want any credentials sitting on a machine at the hosting facility, so I do a pull in this situation. I also use ssh to protect my data in transit, but use the rsync daemon and modules on the far side to get better control, for instance to make it read only.

Cron job on my backup server (pardon the dbclient-related noise; that machine has to run dropbear instead of a more common ssh. But do notice that I use --rsh to force a tunnel and :: to use daemon mode and modules.)

#!/bin/bash

function saveone () {
    TAG=$1
    SRC=$2

    rm -rf vhosts/$TAG.9
    [ -d vhosts/$TAG.8 ] && mv vhosts/$TAG.8 vhosts/$TAG.9
    [ -d vhosts/$TAG.7 ] && mv vhosts/$TAG.7 vhosts/$TAG.8
    [ -d vhosts/$TAG.6 ] && mv vhosts/$TAG.6 vhosts/$TAG.7
    [ -d vhosts/$TAG.5 ] && mv vhosts/$TAG.5 vhosts/$TAG.6
    [ -d vhosts/$TAG.4 ] && mv vhosts/$TAG.4 vhosts/$TAG.5
    [ -d vhosts/$TAG.3 ] && mv vhosts/$TAG.3 vhosts/$TAG.4
    [ -d vhosts/$TAG.2 ] && mv vhosts/$TAG.2 vhosts/$TAG.3
    [ -d vhosts/$TAG.1 ] && mv vhosts/$TAG.1 vhosts/$TAG.2
    [ -d vhosts/$TAG.0 ] && mv vhosts/$TAG.0 vhosts/$TAG.1
    [ -d vhosts/$TAG ]   && mv vhosts/$TAG   vhosts/$TAG.0

    RSYNC_PASSWORD=a8e261e7bac90138087f770caa5fea5b
    export RSYNC_PASSWORD

    OPTS="-aqHz --bwlimit=400 --exclude lost+found --exclude /tmp --exclude /var/tmp --exclude /proc --exclude /sys --no-devices --no-specials --delete"

    rsync $OPTS --rsh "dbclient -l root -i .ssh/id_archivist.db" --link-dest=/home/archivist/vhosts/$TAG.0/ $SRC /home/archivist/vhosts/$TAG/ > ~/$TAG.log
}

saveone studt-net rhth.lunarware.com::rhth

~root/.ssh/authorized_keys on the virtual private server. (Look at the bit in front of the ssh-dss: it restricts what that key can do, in particular it makes it only able to run the rsync daemon.)

no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="/rsync --server --daemon ." ssh-dss adfeadfaefasdfefeI_DELETED_MY_KEY_HEREadfasdfefadfae backups

rsyncd.conf

[machine]
    auth users = archivist
    secrets file = /etc/rsyncd.secrets
    path = /
    numeric ids = yes
    list = no
    read only = yes
    write only = no
    uid = 0
    gid = 0

There you have it: reasonably safe backups. There is room for improvement. For instance, rather than coming straight in as root with the restricted command, the connection could use a different account and run the command via “super”, and it should check the source IP and only accept connections from the backup machine.
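That source-IP check can be done in the key options themselves. A hypothetical hardened entry (the from= address is a placeholder for the backup machine's IP; everything else is as above):

```
from="203.0.113.5",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="/rsync --server --daemon ." ssh-dss I_DELETED_MY_KEY_HERE backups
```

With from= in place, sshd rejects this key entirely when the connection comes from any other address, even if the key itself leaks.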

 
