Incident Report: Power Outage + Git Reset Config Wipe¶
Date: 2026-03-24 Duration: ~2 hours (power restoration + troubleshooting) Severity: High — multiple critical services down, cameras offline, Authelia SSO broken Trigger: Power disconnection at home (manual NAS restart required)
Summary¶
A power disconnection took the NAS offline. The NAS did not auto-restart (known BIOS limitation — boots from USB). After manual restart, 16 of 36 Docker VM containers failed to start due to cascading effects from a previous git reset --hard that had wiped local .env secrets, Authelia password hashes, and Frigate camera configurations.
Three distinct failures surfaced:
- NAS offline — power loss, no auto-restart (BIOS/USB boot issue)
- Authelia crash loop — password hashes replaced with repo placeholders after
git reset --hard - Immich migration failure — pinned v2.5.6 incompatible with DB upgraded to v2.6.1 by Watchtower
- Frigate unhealthy — camera IPs and zone coordinates replaced with
<REDACTED>placeholders
Timeline (PYT, UTC-3)¶
| Time | Event |
|---|---|
| ~09:30 | Power disconnection at home |
| ~09:30 | NAS goes offline (no UPS, no auto-restart) |
| ~09:30 | Docker VM: 16 NFS-dependent containers stop working |
| 09:45 | Investigation begins — Mac on phone hotspot, Docker VM reachable via Tailscale |
| 09:50 | Docker VM has internet (OPNsense OK), but NAS (192.168.0.12) unreachable |
| 09:55 | NAS manually restarted (power button) |
| 10:00 | NAS back online — all 19 containers healthy |
| 10:05 | Docker VM: NFS mounts partially restored (only /srv/frigate) |
| 10:10 | mount -a fails for media/downloads/photos NFS (exports don't exist yet — 8TB recovery pending) |
| 10:15 | Start remaining Docker VM stacks — Authelia crash loop detected |
| 10:20 | Root cause: users_database.yml has placeholder hashes (from git reset --hard during filter-repo) |
| 10:35 | Regenerated Argon2 hashes for both users, restarted Authelia — healthy |
| 10:40 | Documents + Photos stacks fail: RESTIC_PASSWORD missing from .env (same git reset cause) |
| 10:42 | Restored RESTIC_PASSWORD to documents and photos .env files |
| 10:45 | All stacks starting — Immich server still crash-looping |
| 10:50 | Root cause: DB has migration from v2.6.1, but server pinned to v2.5.6 |
| 10:55 | Switched Immich to release tag (v2.6.1) — healthy |
| 11:00 | Frigate unhealthy — camera IPs are <REDACTED> placeholders |
| 11:05 | Restored real camera IPs and zone coordinates from frigate-zones-real.local |
| 11:10 | Frigate restarted — healthy, cameras green |
| 11:15 | Full recovery: 36/36 Docker VM + 19/19 NAS containers healthy |
Root Causes¶
1. Power Outage (no UPS)¶
Home power disconnection. The NAS boots from USB (ASUS BIOS can't detect SSD UEFI) and does not auto-restart after power loss. Manual power button press required.
Impact: NAS offline until physically restarted. Docker VM NFS-dependent containers fail.
2. Git Reset Config Wipe (the real problem)¶
Multiple git filter-repo operations (for secrets purge and domain strategy purge) required git reset --hard origin/main on deployed hosts to sync with the rewritten history. This overwrote:
- Authelia
users_database.yml: real Argon2 password hashes → placeholder$REPLACE_WITH_REAL_HASH - Frigate
frigate.yml: real camera IPs →<CAMERA_FRONT_IP>placeholders, real zone coordinates →<REDACTED> - Documents
.env: RESTIC_PASSWORD removed (added by:?validation, not in repo) - Photos
.env: same RESTIC_PASSWORD issue
These changes were silent — containers continued running with cached config until restarted (by this power outage).
3. Immich Version Mismatch¶
Watchtower auto-upgraded Immich from v2.5.6 to v2.6.1 before we pinned the version. The DB received migration 1773242919341-EncodedVideoAssetFiles from v2.6.1. When we pinned to v2.5.6, the server couldn't start because it didn't recognize the newer migration.
4. NFS Mounts Not Configured¶
Only /srv/frigate NFS export exists on the NAS. Media, downloads, and photos exports are planned for after the 8TB recovery. Docker VM fstab has nofail so these fail silently, but containers expecting the mounts start with the root filesystem instead.
Actions Taken¶
| # | Action | Result |
|---|---|---|
| 1 | Manually restarted NAS | NAS online, 19 containers healthy |
| 2 | NFS remount (mount -a) |
Only frigate mount succeeded (others don't exist yet) |
| 3 | Regenerated Authelia Argon2 hashes via Docker | Authelia healthy |
| 4 | Restored RESTIC_PASSWORD to documents + photos .env |
Stacks started |
| 5 | Switched Immich from v2.5.6 to v2.6.1 (matching DB) | Immich healthy (PR #45) |
| 6 | Restored camera IPs + zone coordinates in frigate.yml | Frigate healthy, cameras online |
Cascading Failure Analysis¶
Power Outage
→ NAS offline (no UPS, no auto-restart)
→ Docker VM NFS mounts fail
→ 16 containers stop/crash
→ Containers restart after NAS returns
→ Authelia crash loop (placeholder hashes from git reset)
→ Authelia-dependent services can't start
→ Immich crash loop (DB ahead of pinned version)
→ Frigate unhealthy (redacted camera IPs from git reset)
→ Documents/Photos fail (missing RESTIC_PASSWORD from git reset)
Root problem: git reset --hard wiped local configs silently.
Containers ran with cached config until power outage forced restarts.
Prevention Plan¶
P0 — Immediate¶
1. Never Run git reset --hard on Deployed Hosts¶
The git reset --hard origin/main pattern after filter-repo destroys local configuration that differs from the repo (secrets, redacted values, pinned versions).
Alternative: After a filter-repo force push, on deployed hosts use:
Or better: use Ansible to deploy, which applies repo changes but preserves .env files and local overrides.
2. Backup .env Files Before Any Git Reset¶
# On deployed host, before git operations:
mkdir -p /tmp/env-backup-$(date +%Y%m%d)
find /opt/homelab/repo/docker -name ".env" -exec cp --parents {} /tmp/env-backup-$(date +%Y%m%d)/ \;
Add this to MEMORY.md as a gotcha.
3. Pin Immich to Correct Version — DONE¶
Updated from v2.5.6 to v2.6.1 matching DB migration state (PR #45).
4. Document Frigate Local Config Restoration¶
The frigate-zones-real.local file (gitignored) contains real camera IPs and zone coordinates. After any git reset, these must be manually restored. Consider an Ansible task for this.
P1 — Short Term¶
5. UPS for NAS¶
The NAS has no UPS. A power flicker takes it offline and requires manual restart. The UPS purchase is already on the pending list.
6. NAS Auto-Restart After Power Loss¶
ASUS BIOS: check for "Restore on AC Power Loss" setting. Current behavior is "stay off" after power loss. If the BIOS supports it, enable "Power On" after AC restore.
7. Ansible-Based Deployment¶
Instead of git pull && docker compose up on deployed hosts, use Ansible playbooks that:
- Pull latest repo changes
- Preserve
.envfiles (never overwrite) - Apply local overrides (camera IPs, zone coordinates)
- Restart only changed containers
This eliminates the entire class of "git reset wiped my config" incidents.
8. .env Backup in Restic¶
The backup-env.sh script exists but only covers Docker VM. Extend to NAS stacks. Add to the offsite sync so .env files are backed up to Google Drive.
Lessons Learned¶
-
git reset --hardis a deployment anti-pattern. It wipes everything that differs from the repo — including secrets, local overrides, and version pins. Everyfilter-repoforce push has caused downstream issues on deployed hosts. -
Redacted configs in the repo create a deployment trap. Camera IPs, zone coordinates, and password hashes are redacted for the public repo but needed for deployment. The gap between "what's in git" and "what's deployed" grows with each security scrub.
-
Silent config drift is dangerous. Containers continued running with cached old configs after the git reset. The problems only surfaced when containers restarted (power outage). This could have been weeks later with harder-to-diagnose symptoms.
-
Immich version pinning must match DB state. Watchtower upgraded the DB before we pinned the server version. Always check
docker logsfor the running version before pinning. -
A UPS would have prevented this entire incident. The NAS going offline was the trigger. A $40 UPS riding through a brief power disconnection would have avoided 2 hours of troubleshooting.
References¶
- Previous incidents:
incident-2026-03-05-isp-outage.md,incident-2026-03-18-wan-nat-outage.md - Frigate zones backup:
docker/fixed/docker-vm/security/frigate-zones-real.local .envbackup script:docker/shared/backup/backup-env.sh- NAS boot issue: documented in MEMORY.md ("NAS boots from USB — ASUS BIOS can't detect SSD UEFI")