Hardened File Backup Routine

Mar. 27, 2025 [technology] [privacy-security] [hardening] [guides] [libre]

One overlooked aspect of data security is availability. If one cannot guarantee the ability to access information, particularly following data damage or loss, one does not have security. I’ve arrived at the conclusion that another major component of a good backup system is simplicity. There are hundreds of programs that purport to solve the issue of file backup. But I’m wary of such programs. Who maintains them? If some guy in Nebraska passes away, will that backup program fall out of maintenance? Do you have the gumption to roll your own? I shelved my own overly complicated attempt.

You don’t want to end up in a situation where files that you backed up years ago using XYZ program cannot easily be read or restored by the current-year version, for example because the way it compresses or encrypts files has changed since. You also want some certainty that if you’re locked away in a madhouse for questioning the JFK assassination, your backup files are there, waiting for you when the institution returns your belongings from holding.

Drives and cables

It took several reimaginings of my backup system to evolve it into what it is today. A bit of history, starting with me doing things the wrong way:

When I was in my adolescence, my entire “backup solution” was simply to drag and drop files in the Windows XP file explorer from my internal drive to a 120GB external hard drive at the other end of a USB 2.0 cable. Complete with the little animated papers flying between folders on the file transfer dialogue. As scrappy as it sounds, such a basic practice is still more forward-thinking than keeping no backup at all, something my time in computer repair taught me almost nobody does.

Once I’d become more serious about maintaining a consistent backup of my files, and partially motivated by witnessing data loss from a front-row seat, I upgraded to a dual set of external hard drives. They both received simultaneous weekly backups, at first using my DE’s file explorer and later using automated scripts. That kept me going until I finally committed to following the 3-2-1 backup axiom proper: three copies of your data, on two different types of media, with one copy off-site (no, two identical drives and an internal production drive don’t count).

It wasn’t until 2018 that I devised the backup routine that I still use to this day.

Good File Backup

First, what makes a good backup? This is my informed opinion, and note that I emphasize file backup since that is not the same as a system backup. A system can be reprovisioned, while irreplaceable personal files cannot be recreated. Hence, I only back up files and some configs. But if you are interested in system backup solutions, I’d like to point you over to Dig Deeper’s Backing up and Restoring Operating Systems.

A good file backup solution should:

A good file backup solution should NOT:

Some backup programs (deja-dup, I think it was) offer options to encrypt files during backup. But I consider this the wrong approach: only the files themselves get encrypted, then handed off to a (presumably) non-encrypted volume. This leaves room for tampering, and it relies on the backup program to decide for you how, and with what encryption tool, the files get encrypted. I find it much more reasonable to simply use full disk encryption with something like cryptsetup.

Provisioning

I recommend procuring three external USB hard drives. They should be a mix of both mechanical and solid-state storage, with no two drives sharing the same vendor. Purchase in person, with cash, of course.

Next, set up dm-crypt encrypted volumes on each drive.

cryptsetup -v --cipher aes-xts-plain64 --key-size 512 --use-urandom --verify-passphrase luksFormat /dev/sdX

AES-XTS-PLAIN64 is recognized as one of the most secure encryption modes available for full disk encryption. Note that a 512-bit key size is actually the default when using AES-XTS, and it gets split into two 256-bit AES keys anyway. I only include that switch for historic consistency.
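If you want to confirm what actually landed on the disk, dumping the LUKS header will report the cipher and key size in use (assuming /dev/sdX is still the device in question):

cryptsetup luksDump /dev/sdX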

Open the new drive and create a filesystem on it (substituting “DriveName” and “VolumeName” for your intended naming scheme):

cryptsetup luksOpen /dev/sdX DriveName
mkfs.ext4 -L VolumeName /dev/mapper/DriveName
cryptsetup luksClose /dev/mapper/DriveName

Set the new volume as writable by your standard user account.

udisksctl unlock -b /dev/sdX
udisksctl mount -b /dev/dm-X
chown -R user /media/user/VolumeName
chgrp -R user /media/user/VolumeName
udisksctl unmount -b /dev/dm-X
udisksctl lock -b /dev/sdX

You may wish to take LUKS header backups, on the off chance that header information gets corrupted or overwritten (or borked by user error):

cryptsetup luksHeaderBackup /dev/sdX --header-backup-file /path/to/destination/luks_header_backup_DriveName

Each drive could also hold the header backups of its two sister devices.
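Restoring works the same way in reverse, should a mangled header ever need to be written back. A sketch, and do triple-check the target device before running it:

cryptsetup luksHeaderRestore /dev/sdX --header-backup-file /path/to/destination/luks_header_backup_DriveName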

Also consider taking a SMART stats read to keep on each drive so that you have something to compare future attributes against.

smartctl -a /dev/sdX > /media/user/Backup/smartstats.txt

Backup Script

rsync, an established and well-maintained tool, meets all of the criteria above. Any script using it does not need to be complex, and in fact shouldn’t be. This is a genericized version of what I run:

#!/bin/bash
# Mirror the directories listed in file-list.txt (paths relative to $HOME) onto the backup drive
rsync -aEvv --delete-delay --progress --files-from="$PWD/file-list.txt" "$HOME" "/media/user/Backup/Files/"

I include --delete-delay because I’m rather paranoid about files no longer in the source being pruned from the backup before the full transfer has finished, just in case something goes wrong mid-run.

The --files-from= switch is an easy way to define a list of directories to include without needing to resort to writing for loops. I store my file list among the host files so that independent copies don’t need to be individually maintained across the multiple backup drives. It can look something like this:

Documents
Downloads
Music
Pictures
Videos
.cache
.config
.gnupg
.local
.mozilla
.ssh

Those willing to poke around the guides section may also infer including additional things like .newsboat. Everyone’s backup list will look different, but don’t forget about those hidden directories!

Make the script executable, and run it from the directory that holds the file list (since the list’s path is given relative to $PWD):

chmod +x backup.sh

I recommend keeping the backup script on the host (just like the file list) to avoid accidentally forgetting to distribute changes out to multiple copies.

rsync also includes a nifty feature to write logs with the --log-file= switch. That way you can keep a historical record of changes made to the backup file set.
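As a sketch of how that might look, here is the same backup line with a date-stamped log added. The backup-logs directory and the exact naming scheme are my own assumptions, but putting the year in the filename is the convention the yearly tarball below relies on:

rsync -aEvv --delete-delay --progress --log-file="$HOME/backup-logs/backup-$(date +%Y-%m-%d).log" --files-from="$PWD/file-list.txt" "$HOME" "/media/user/Backup/Files/"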

Procedure

What I do is take a backup to one drive each week, rotating it back into storage and pushing the oldest one off premises. If you’re not an enemy of the state, the off-premises location can be something like a lock box at a bank. Otherwise it can assuredly go into that capsule buried next to the stone wall in the woods.

Each drive will see roughly eighteen backups in a year. And at year’s end, I like to compress all of the log files into a tarball for posterity.

tar --exclude='*.tar.gz' -czvf Logs-2024.tar.gz *2024*.log

The naming convention used for your logs will dictate how that last argument of the tar command is formulated. My logs encode the year the backup was taken into the filename, so wildcarding like that is reliable enough.
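You can also list the tarball’s contents afterwards to confirm every log made it in:

tar -tzf Logs-2024.tar.gz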

Individual disks should be replaced after about five years. Even though they don’t see a whole lot of power-on hours, being backup drives, they receive tons of writes. Taking a diff of the SMART stats from the beginning of a drive’s deployment to its retirement only confirms this.
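That comparison is as simple as dumping fresh stats at retirement time and diffing them against the readout taken at provisioning (a sketch, assuming you kept the original smartstats.txt on the drive as suggested earlier):

smartctl -a /dev/sdX > /tmp/smartstats-retirement.txt
diff /media/user/Backup/smartstats.txt /tmp/smartstats-retirement.txt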

Lastly, consider physically differentiating the drives in some way to simplify their rotation. Otherwise you need to resort to checking the last backup date to confirm you’re indeed updating the least recent of the bunch.
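If marking the drives isn’t practical, one small trick, my own embellishment rather than part of the routine proper, is to have the backup script drop a timestamp file onto the drive at the end of each run so a quick glance tells you when it last received a backup:

date > /media/user/Backup/last-backup.txt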

It’s not sexy. It doesn’t use the latest tech trends. It requires diligent habit. And that’s the point. Backups are not something where you want to be trying new and novel tricks. It calls for slow and steady iteration over sweeping changes.