Archive is a dedicated backup VM that continuously mirrors every machine's home directory (and Vault's /mnt) via rsync. The entire archive lives in a single git repository, committed on every sync. It's a parallel, independent backup system: it does not replace snapshots but is layered on top of them.
Archive pulls. Nobody pushes to it. It has SSH access to every machine. No machine has SSH access to Archive. It's a one-way mirror.
| Machine | Home Dir Size | Special Paths | SSH Status |
|---|---|---|---|
| Walter | 23 GB | ~/events/ (~15K relay files) | localhost |
| Walter Jr | 3.9 GB | none | reachable |
| Vault | 1.6 GB home + 7.7 GB /mnt | /mnt/public/, /mnt/git/ | reachable |
| Matilda | 4.1 GB | none | reachable |
| Jamie | 3.4 GB | none | reachable |
| Foreman | ? (SSH denied) | web content | key not authorized |
| Amy | ? (stopped) | amy-bot.py, bridge | VM stopped |
| Others | ? (stopped) | none | VM stopped |
Total estimated data to archive (reachable machines): ~44 GB (23 + 3.9 + 9.3 + 4.1 + 3.4, where 9.3 is Vault's 1.6 GB home plus 7.7 GB /mnt). After the initial sync, deltas will be small: mostly config changes and new event files.
Name: archive
Type: e2-small (2 vCPU, 2 GB RAM; needs enough for rsync + git)
Zone: us-central1-a (same zone as Vault for fast transfers)
Boot disk: archive, 10 GB pd-ssd (OS only)
Data disk: archive-mnt, 100 GB pd-ssd (mounted at /mnt)
100 GB gives us ~2x the current data with room to grow. As machines grow, we resize the disk, same as Vault.
/mnt/
├── mirrors/
│   ├── walter-20260203-1811z/       # named: {disk}-{creation-timestamp}
│   │   └── home/daniel/             # rsync of /home/daniel from Walter
│   ├── walter-jr-20260215-0900z/
│   │   └── home/daniel/
│   ├── vault-20260101-0000z/
│   │   ├── home/daniel/
│   │   └── mnt/                     # Vault also gets /mnt
│   ├── matilda-20260310-1200z/
│   │   └── home/daniel/
│   ├── jamie-20260101-0000z/
│   │   └── home/daniel/
│   └── foreman-20260101-0000z/
│       └── home/daniel/
└── .git/                            # entire /mnt is one git repo
The naming convention {disk}-{creation-timestamp}z (the disk's creation time in UTC, as in walter-20260203-1811z) ensures that if a machine is destroyed and recreated, the old mirror stays and a new directory is created for the new disk. No collisions. No overwrites. History preserved.
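As a rough sketch of how such a directory name could be derived, the helper below converts a disk's creation timestamp to UTC and formats it. It assumes GNU date (as shipped with Debian 12) and an RFC 3339 timestamp with a UTC offset; the function name mirror_name is hypothetical.

```sh
# Build a mirror directory name of the form {disk}-{YYYYMMDD-HHMM}z.
# Assumes GNU date, which parses RFC 3339 timestamps that carry an offset.
mirror_name() {
    disk=$1
    created=$2
    # -u converts the parsed time to UTC before formatting.
    stamp=$(date -u -d "$created" +%Y%m%d-%H%M) || return 1
    printf '%s-%sz\n' "$disk" "$stamp"
}

# 10:11 at UTC-08:00 is 18:11 UTC.
mirror_name walter "2026-02-03T10:11:00.000-08:00"   # prints walter-20260203-1811z
```

Converting to UTC first keeps the name stable regardless of the offset the timestamp happens to be reported in.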
For most machines: /home/daniel/
For Vault: /home/daniel/ AND /mnt/ (this is where all the important data lives)
Excluded: .cache/, node_modules/, .npm/, snap/, .local/share/Trash/, .venv/ (large ephemeral directories that can be recreated).
A script on Archive runs every 5 minutes via cron. For each machine in the gold file:
1. Check if machine is reachable (ssh -o ConnectTimeout=5)
2. If reachable: rsync -az --delete \
--exclude='.cache/' --exclude='node_modules/' \
--exclude='.npm/' --exclude='snap/' \
--exclude='.local/share/Trash/' --exclude='.venv/' \
{host}:/home/daniel/ /mnt/mirrors/{disk-name}/home/daniel/
3. If Vault: also rsync /mnt/ (excluding the mirrors themselves)
4. After all syncs: cd /mnt && git add -A && git commit -m "sync: {timestamp}"
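The steps above could be sketched roughly as follows. This is not the actual archive-sync.sh: HOSTS_FILE is a hypothetical plain-text list ("host mirror-dir" per line) standing in for the gold file, whose format is not described here, and DRY_RUN, run, sync_dir, sync_all and commit_snapshot are illustrative names.

```sh
#!/bin/sh
# Sketch only. HOSTS_FILE is a hypothetical stand-in for the gold file.
MIRROR_ROOT="${MIRROR_ROOT:-/mnt/mirrors}"
HOSTS_FILE="${HOSTS_FILE:-/home/daniel/archive-hosts.txt}"

# With DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Mirror one remote directory ($1) into a local directory ($2).
sync_dir() {
    run mkdir -p "$2"
    run rsync -az --delete \
        --exclude='.cache/' --exclude='node_modules/' \
        --exclude='.npm/' --exclude='snap/' \
        --exclude='.local/share/Trash/' --exclude='.venv/' \
        "$1" "$2"
}

# Stage everything and commit only if something changed
# (git commit exits non-zero when nothing is staged).
commit_snapshot() {
    git -C "$1" add -A || return 1
    git -C "$1" diff --cached --quiet ||
        git -C "$1" commit -q -m "sync: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

sync_all() {
    while read -r host mirror; do
        [ -n "$host" ] || continue
        # </dev/null keeps ssh and rsync from consuming the host list on stdin.
        if run ssh -o ConnectTimeout=5 -o BatchMode=yes "$host" true </dev/null; then
            sync_dir "$host:/home/daniel/" "$MIRROR_ROOT/$mirror/home/daniel/" </dev/null
            case $mirror in
                vault-*) sync_dir "$host:/mnt/" "$MIRROR_ROOT/$mirror/mnt/" </dev/null ;;
            esac
        else
            echo "skip: $host unreachable" >&2
        fi
    done < "$HOSTS_FILE"
    [ "${DRY_RUN:-0}" = 1 ] || commit_snapshot /mnt
}
# A real script would end by calling sync_all.
```

Running it with DRY_RUN=1 prints the ssh, mkdir and rsync commands without touching any machine, which may be a convenient way to review the host list before the first real sync.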
The --delete flag means the mirror is an exact copy. Files deleted on the source are deleted in the mirror. But git preserves all history, so every deleted file is recoverable from an earlier commit.
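To illustrate the recovery path, here is a minimal sketch that pulls a deleted file back out of history. The function name recover_deleted and the REPO variable are hypothetical; the git commands themselves (rev-list with a path filter, show with a commit:path argument) are standard.

```sh
REPO="${REPO:-/mnt}"

# $1: path relative to the repo root, $2: where to write the recovered copy.
recover_deleted() {
    # The last commit that touched the path; for a deleted file, the deletion.
    commit=$(git -C "$REPO" rev-list -n 1 HEAD -- "$1") || return 1
    [ -n "$commit" ] || { echo "no history for $1" >&2; return 1; }
    # The file still exists in the parent of the deleting commit.
    git -C "$REPO" show "$commit^:$1" > "$2"
}
```

The copy is written to a separate destination rather than back into the mirror, because the next rsync --delete run would remove a file restored in place.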
1. Create the disks:
   gcloud compute disks create archive --size=10GB --type=pd-ssd --zone=us-central1-a
   gcloud compute disks create archive-mnt --size=100GB --type=pd-ssd --zone=us-central1-a
2. Create the VM archive with both disks. Debian 12. No external IP needed if we use internal networking, but an external IP makes initial setup easier.
3. Authorize Archive's SSH public key on every machine (in ~daniel/.ssh/authorized_keys). Archive can SSH to everyone. Nobody can SSH to Archive (except Daniel, for maintenance).
4. Verify SSH access from Archive: ssh walter.1.foo, ssh vault.1.foo, etc.
5. Initialize the repository: cd /mnt && git init && mkdir -p mirrors/
6. Install the sync script at /home/daniel/bin/archive-sync.sh.
7. Add the cron entry:
   */5 * * * * /home/daniel/bin/archive-sync.sh >> /home/daniel/logs/archive-sync.log 2>&1
8. Update fleet-gold.json to include the new VM. It will appear on clankers.discount.
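Because cron fires every 5 minutes whether or not the previous run has finished (and the initial ~44 GB sync probably will not finish in that window), a guard against overlapping runs may be worth having. The sketch below is one possible approach, not part of the plan above; LOCK_DIR and with_lock are hypothetical names.

```sh
LOCK_DIR="${LOCK_DIR:-/tmp/archive-sync.lock}"

# Run a command only if no other run holds the lock.
# mkdir either creates the directory or fails, atomically, so it acts as a mutex.
with_lock() {
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous sync still running, skipping this cycle" >&2
        return 0
    fi
    "$@"
    status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}
```

The sync script could wrap its main function, for example with_lock sync_all for a hypothetical sync_all. Note that a crashed run leaves the directory behind; on Debian, flock(1) from util-linux avoids that problem and could be used in the cron line instead.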
Q1: Zone placement? us-central1-a is same zone as Vault (fast, cheap transfers). But for disaster recovery, a different zone/region would be safer. Different region = slower + egress costs. Recommendation: same zone for now, different region later when we have more budget.
Q2: External IP? Archive doesn't need to serve anything. Could run without external IP (internal-only). But then we can't SSH to it from outside GCP. Recommendation: give it an external IP but no DNS record and no open ports except SSH.
Q3: Git at this scale? Git handles the events folder (15K small files) fine. But git add + commit on 44 GB every 5 minutes could be slow. Mitigation: most files won't change between syncs, so git add -A only has to rehash files whose stat data changed, which is fast. If it gets slow, we increase the interval to 15 or 30 minutes. We could also use git-annex for large binary files, but that's complexity we don't need yet.
Q4: Disk creation timestamps? You mentioned naming mirrors after the disk's creation date. We can get this from GCP: gcloud compute disks describe {name} --format="get(creationTimestamp)". The sync script can auto-discover this.
Q5: What about stopped machines? Archive can only sync running machines. Stopped machines are already covered by GCP disk snapshots (daily at 04:00 UTC). When a stopped machine is started, Archive picks it up on the next sync cycle automatically.
vCPU quota. Current quota is 12 global. An e2-small is 2 vCPUs. Current running machines use: Walter (2) + Walter Jr (2) + Vault (shared) + Matilda (2) + Foreman (2) + Ghost Jr (shared) + Jamie (2) = ~10-12. We may hit the quota. Need to check before creating.
Cost. e2-small (~$12/month) + 10 GB pd-ssd boot disk (~$1.70/month) + 100 GB pd-ssd (~$17/month) = ~$31/month. Plus network egress if cross-region. Within the same zone, internal traffic is free.
Git repo bloat. If large binary files change frequently, the git repo will grow fast. The .git directory could eventually exceed the data itself. Mitigation: monitor .git size, consider git gc, or switch to git-annex for binaries if needed.
Walter's events folder. Most of Walter's 23 GB is ~/events/ (15K relay files). These are small text files, which git handles well. But the initial commit will be large.
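For the git repo bloat risk, monitoring could start as simply as comparing the size of .git with the size of the mirrored data. The following is a minimal sketch; git_overhead_pct and REPO are hypothetical names, and it assumes the layout described earlier (a mirrors/ directory next to .git).

```sh
REPO="${REPO:-/mnt}"

# Print the size of .git as a percentage of the mirrored data.
git_overhead_pct() {
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
    git_kb=$(du -sk "$REPO/.git" | cut -f1)
    data_kb=$(du -sk "$REPO/mirrors" | cut -f1)
    awk -v g="$git_kb" -v d="$data_kb" \
        'BEGIN { if (d == 0) print "n/a"; else printf "%d\n", 100 * g / d }'
}
```

A value creeping toward or past 100 would be the signal to try git gc or reconsider how binaries are stored. For a more detailed breakdown, git count-objects -vH reports loose and packed object sizes.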
| Item | Monthly Cost |
|---|---|
| e2-small VM (2 vCPU, 2 GB) | ~$12 |
| 10 GB pd-ssd (boot) | ~$1.70 |
| 100 GB pd-ssd (archive-mnt) | ~$17 |
| Network (intra-zone) | Free |
| Total | ~$31/month |
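The total can be checked, and re-run when the data disk is resized, with a small calculation like the one below. The 0.17 USD per GB-month rate is simply what the table implies (100 GB for ~$17); monthly_cost is a hypothetical helper name.

```sh
# Monthly estimate: VM price plus pd-ssd storage for both disks.
# $1: VM cost in USD, $2: boot disk GB, $3: data disk GB.
monthly_cost() {
    awk -v vm="$1" -v boot_gb="$2" -v data_gb="$3" -v gb_rate=0.17 \
        'BEGIN { printf "%.2f\n", vm + (boot_gb + data_gb) * gb_rate }'
}

monthly_cost 12 10 100   # prints 30.70
```

Doubling the data disk to 200 GB with the same assumed rate gives monthly_cost 12 10 200, or 47.70.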
Once Archive is stable, create a second archive in a different region (europe-west, for example) that mirrors the first archive. True disaster recovery.
Add a section to clankers.discount showing Archive status: last sync time per machine, data sizes, git commit count, disk usage.
If a machine that should be RUNNING hasn't been synced in >1 hour, Archive sends an alert to the group chat.
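A staleness check along these lines could back that alert. It is a sketch under stated assumptions: the sync script would have to write the epoch time of each machine's last successful rsync into one file per machine (the stamp directory is hypothetical), and delivery to the group chat is left out, so the function only prints the alert text. It also does not filter by whether a machine is supposed to be RUNNING.

```sh
STALE_AFTER="${STALE_AFTER:-3600}"   # one hour, in seconds

# Succeeds when last sync ($1) is more than STALE_AFTER seconds before now ($2).
is_stale() {
    [ $(( $2 - $1 )) -gt "$STALE_AFTER" ]
}

# $1: directory with one file per machine, each holding an epoch timestamp.
# Prints one line per machine whose last sync is too old.
report_stale() {
    now=$(date +%s)
    for stamp in "$1"/*; do
        [ -f "$stamp" ] || continue
        last=$(cat "$stamp")
        if is_stale "$last" "$now"; then
            echo "ALERT: $(basename "$stamp") not synced for $(( (now - last) / 60 )) minutes"
        fi
    done
}
```

The output of report_stale could then be piped to whatever sends messages to the group chat.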