After decades of backing up with dump I no longer do it. I suppose I
got into the habit back in the days of tapes and just stayed with it
through the disk years. rsync is far better; I should have switched
years ago.
- Efficient incremental backups over networks, even for appending
little bits to the ends of long files.
- The backups are real directories of real files, easy to ferret about
and find what you need (see the restore sketch just after this list).
- Nifty trick to keep N days of backups without using N times the
space, but still each tree looks like a snapshot.
- Easy ssh based security.
- You can do either a push or a pull depending on your security
requirements.
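Restoring needs no special tooling; you just copy files back out of a
snapshot tree. A minimal sketch, with hypothetical paths:

# Pull /etc back out of Tuesday's snapshot tree (paths are made up
# for illustration).
rsync -aH /warehouse/nexus/tue/etc/ /etc/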
The first thing to do is to read about the rsync --link-dest option. It
lets you use hard links to share the contents of files across days of
your backups.
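A minimal sketch of the idea, with made-up paths: files unchanged since
the previous day's tree are hard linked rather than copied, so each new
day only costs the space of what changed.

# Tuesday's tree reuses Monday's bits for anything unchanged.
rsync -aH --link-dest=/backups/mon/ /home/ /backups/tue/
# Shared files show the same inode number in 'ls -li'.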
The second thing to do is to decide on your backup strategy. For many
machines I just keep 7 days of backups; it makes things easy. There is a
backup for each day of the week, and each one is overwritten when the
week wraps around. For other applications where I have to go further
back I rename directories, much like logrotate would (see the rotation
sketch below).
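The renaming can be a chain of mv commands, as in scenario #2 below, or
a small loop. A minimal sketch assuming a hypothetical daily.N naming
scheme:

# The oldest tree falls off the end; everything else shifts down one.
rm -rf daily.6
for i in 5 4 3 2 1 0; do
    [ -d daily.$i ] && mv daily.$i daily.$((i+1))
done
[ -d daily ] && mv daily daily.0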
The third thing to think about is what happens if your ssh key or rsync
password is compromised. If you are running backups from cron, then
there will be a machine-readable key on your machine somehow. This may
or may not be a danger depending on how you have things secured. In my
setups, if you could get the key you could have gotten the data anyway.
(Remember that your backup archive machine needs to be at least as
secure as the live machine.)
Enough talking, more sample code:
Scenario #1: Many big machines, lots of bits to push, on the same
secure network. We want to go fast. All the machines have different
security policies.
I use push in this situation. There is a dedicated backup machine to
receive and hold the bits. Only two trusted people have access to this
machine. The backup machine runs an rsync daemon with a module for each
host that lets the host write backups (only into its host-specific
area, and write-only at that). On each host there is a root cron job
with the rsync password embedded to run the backup.
Sample backup script… cron these, and offset their run times to keep
contention down (sample crontab entries follow the script)…
#!/bin/sh
HOST=`hostname`
export RSYNC_PASSWORD=1234567890abcdef12345678890abcdef
DAY=`date +%a | tr '[A-Z]' '[a-z]'`
case $DAY in
    sun ) PDAY=sat ;;
    mon ) PDAY=sun ;;
    tue ) PDAY=mon ;;
    wed ) PDAY=tue ;;
    thu ) PDAY=wed ;;
    fri ) PDAY=thu ;;
    sat ) PDAY=fri ;;
esac
OPTS="-aqH --link-dest=/$PDAY/ --no-devices --no-specials --exclude=/proc/ --exclude=/sys/ --exclude=/dev/ --exclude=/tmp/ --delete"
time rsync $OPTS / $HOST@warehouse.federated.com::$HOST/$DAY
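The crontab side is plain; the only trick is giving each host a
different start minute. A sketch with a hypothetical script path, in
/etc/crontab format:

# On one host...
17 2 * * * root /usr/local/sbin/push-backup.sh
# ...and on another, offset to keep contention down.
43 2 * * * root /usr/local/sbin/push-backup.sh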
Sample module from rsyncd.conf…
[nexus]
    auth users = nexus
    secrets file = /etc/rsyncd.secrets
    use chroot = yes
    path = /warehouse/nexus
    numeric ids = yes
    list = no
    read only = no
    write only = yes
    uid = 0
    gid = 0
    hosts allow = 111.222.33.44
    hosts deny = *
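For reference, the secrets file named by the module is just
user:password pairs, one per line; rsync refuses to use it if other
users can read it (unless you turn off strict modes). A sketch using
the placeholder password from the script above:

# /etc/rsyncd.secrets on the backup machine (chmod 600)
nexus:1234567890abcdef12345678890abcdef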
Scenario #2: Offsite backup of a Virtual Private Server
I have a machine that lives in a hosting facility. I have broken my
first rule of service providers. They are not close enough for me to pop
over and wrap my hands around someone if there is a problem, so I
content myself with a full backup and the ability to be up and running
at a new provider in 60 minutes if needed. I don’t want any credentials
sitting on a machine at the hosting facility, so I do a pull in this
situation. I also use ssh to protect my data in transit, but use the
rsync daemon and modules on the far side to get better control, for
instance to make it read only.
Cron job on my backup server (pardon the dbclient-related noise; that
machine has to run dropbear instead of a more common ssh. But do
notice that I have a --rsh to force a tunnel and a :: to use daemon
mode and modules.)
#!/bin/bash
function saveone () {
    TAG=$1
    SRC=$2
    rm -rf vhosts/$TAG.9
    [ -d vhosts/$TAG.8 ] && mv vhosts/$TAG.8 vhosts/$TAG.9
    [ -d vhosts/$TAG.7 ] && mv vhosts/$TAG.7 vhosts/$TAG.8
    [ -d vhosts/$TAG.6 ] && mv vhosts/$TAG.6 vhosts/$TAG.7
    [ -d vhosts/$TAG.5 ] && mv vhosts/$TAG.5 vhosts/$TAG.6
    [ -d vhosts/$TAG.4 ] && mv vhosts/$TAG.4 vhosts/$TAG.5
    [ -d vhosts/$TAG.3 ] && mv vhosts/$TAG.3 vhosts/$TAG.4
    [ -d vhosts/$TAG.2 ] && mv vhosts/$TAG.2 vhosts/$TAG.3
    [ -d vhosts/$TAG.1 ] && mv vhosts/$TAG.1 vhosts/$TAG.2
    [ -d vhosts/$TAG.0 ] && mv vhosts/$TAG.0 vhosts/$TAG.1
    [ -d vhosts/$TAG ] && mv vhosts/$TAG vhosts/$TAG.0
    RSYNC_PASSWORD=a8e261e7bac90138087f770caa5fea5b
    export RSYNC_PASSWORD
    OPTS="-aqHz --bwlimit=400 --exclude lost+found --exclude /tmp --exclude /var/tmp --exclude /proc --exclude /sys --no-devices --no-specials --delete"
    rsync $OPTS --rsh "dbclient -l root -i .ssh/id_archivist.db" --link-dest=/home/archivist/vhosts/$TAG.0/ $SRC /home/archivist/vhosts/$TAG/ > ~/$TAG.log
}
saveone studt-net rhth.lunarware.com::rhth
~root/.ssh/authorized_keys on the virtual private server (Look at the
bit in front of the ssh-dss; it restricts what that key can do, in
particular it makes it only able to run the rsync daemon.)
no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="/rsync --server --daemon ." ssh-dss adfeadfaefasdfefeI_DELETED_MY_KEY_HEREadfasdfefadfae backups
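A quick way to convince yourself the restriction works is to ask for a
shell with that key; the forced command answers instead (the exact
greeting varies by rsync version):

dbclient -l root -i .ssh/id_archivist.db rhth.lunarware.com
# -> @RSYNCD: 30.0 ... rsync daemon protocol, not a shell prompt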
rsyncd.conf
[machine]
    auth users = archivist
    secrets file = /etc/rsyncd.secrets
    path = /
    numeric ids = yes
    list = no
    read only = yes
    write only = no
    uid = 0
    gid = 0
There you have it: reasonably safe backups. There is room for
improvement. For instance, rather than coming straight in as root with
the restricted command, the login could be a different account that
uses "super" to run the command, and the key should check the source
IP so it only works from the backup machine.
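For that last improvement, OpenSSH can pin a key to a source address
right in authorized_keys with the from= option. A sketch with a made-up
backup server IP added to the options shown earlier:

from="203.0.113.10",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="/rsync --server --daemon ." ssh-dss adfeadfaefasdfefeI_DELETED_MY_KEY_HEREadfasdfefadfae backups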