I’m Blaming Y’all For This: The Inevitable Escalation of Self-Hosted Infrastructure
Introduction
The post title echoes a sentiment every seasoned sysadmin has felt during late-night troubleshooting sessions: “This wasn’t the plan.” What begins as a simple backup solution inevitably snowballs into a complex ecosystem of self-hosted services, virtualization platforms, and network reconfigurations.
This phenomenon - which I’ve dubbed Infrastructure Scope Creep Syndrome (ISCS) - is particularly prevalent in homelab environments where the line between necessity and curiosity blurs. The Reddit user’s journey from a simple QNAP NAS to a full Home Assistant deployment mirrors countless real-world scenarios:
- Problem-Solution Spiral: Each optimization creates new technical debt (Wasabi costs → Restic → Hardware limitations → New NAS)
- Service Decentralization: Google Photos → Immich, Google Home → Home Assistant
- Infrastructure Expansion: Physical hardware → Virtualization → Network overlay (Tailscale)
For DevOps professionals, this serves as both cautionary tale and masterclass in systems thinking. Through this 3,500-word guide, we’ll examine:
- The technical cascade effect of infrastructure decisions
- When self-hosting becomes counterproductive
- Performance/cost tradeoffs in personal DevOps environments
- Maintaining operational sanity while exploring new technologies
Understanding Infrastructure Scope Creep
The Psychology of Homelab Escalation
Self-hosted infrastructure follows Maslow’s hierarchy of needs:
[Data Preservation]
↑
[Cost Optimization]
↑
[Performance Tuning]
↑
[Service Decentralization]
↑
[Infrastructure Mastery]
Each solved “need” reveals higher-order desires. What begins as backup hygiene evolves into:
- Financial Optimization: Transition from managed cloud (Wasabi $6/TB/month) to DIY solutions (Hetzner Storage Box €4.90/TB/month)
- Hardware Awareness: Recognizing ARM-based QNAP limitations for cryptographic operations (Restic chunking)
- Architectural Independence: Replacing Google Photos with Immich (Docker-based alternative with machine learning)
- Network Security: Implementing Tailscale mesh VPN over port-forwarding
- IoT Sovereignty: Migrating from Google Home to Home Assistant (YAML-configured automation)
Technical Comparison Matrix
| Component | Initial State | Final State | Cost Delta | Complexity Delta |
|---|---|---|---|---|
| Storage | QNAP 2-Bay NAS | unRAID Server | +$500 | +++ |
| Cloud Backup | Wasabi | Hetzner + Restic | -$15/mo | ++ |
| Photo Management | Google Photos | Immich (Self-Hosted) | -$20/mo | ++++ |
| Remote Access | Port Forwarding | Tailscale Mesh | $0 | + |
| Smart Home | Google Home | Home Assistant VM | -$10/mo | ++++ |
Key Insight: Monthly savings of $45 come at the cost of 14 complexity points (subjective scale). This is the crux of ISCS - financial efficiency versus operational overhead.
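The arithmetic behind that insight is easy to sanity-check; the per-service deltas below are copied from the matrix (negative numbers are savings):

```shell
#!/bin/bash
# Tally the monthly cost deltas from the comparison matrix.
# Values are illustrative: Hetzner+Restic, Immich, Tailscale, Home Assistant.
deltas=(-15 -20 0 -10)
total=0
for d in "${deltas[@]}"; do
  total=$(( total + d ))
done
echo "Net monthly delta: \$${total}/mo"   # -45, i.e. $45/month saved
```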
Prerequisites for Managed Escalation
Hardware Requirements
The QNAP-to-unRAID transition exemplifies hardware creep:
# Original QNAP Specs (TS-231P)
CPU: Annapurna Labs AL-214 1.7GHz ARMv7 (2 cores)
RAM: 1GB DDR3 (non-expandable)
DRIVE: 2x 8TB HDD (RAID 1)
# Resulting unRAID Build
CPU: Intel i3-10100 (4c/8t)
RAM: 32GB DDR4 ECC
DRIVE: 6x mixed HDD (parity-protected array)
GPU: NVIDIA T400 (for Immich machine-learning workloads)
Critical Thresholds:
- 10TB+ data: Benefits strongly from hardware crypto acceleration (e.g. x86 AES-NI) for Restic's encryption
- 5+ services: Demands >16GB RAM for container overhead
- Media processing: Requires GPU acceleration
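These thresholds can be checked before committing to a migration. A rough preflight sketch (the planned service count and the 16GB cutoff are assumptions mirroring the list above):

```shell
#!/bin/bash
# Rough preflight check against the RAM threshold listed above.
ram_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
ram_gb=$(( ram_kb / 1024 / 1024 ))
services=5   # assumed number of planned containers
if [ "$services" -ge 5 ] && [ "$ram_gb" -lt 16 ]; then
  echo "WARN: ${ram_gb}GB RAM is below the 16GB guideline for ${services}+ services"
else
  echo "OK: ${ram_gb}GB RAM for ${services} services"
fi
```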
Software Dependencies
The stack evolution demands specific version control:
# Core Components
unRAID 6.12.8
Docker 24.0.7
Restic 0.16.2
# Critical Containers
immich_app:v1.106.0
homeassistant:2024.7.4
tailscale:v1.62.0
# Virtualization
QEMU 7.2 + Libvirt 9.7.0
Network Preconditions:
- Router with UPnP/NAT-PMP enabled (eases NAT traversal for Tailscale exit nodes)
- VLAN segmentation for IoT devices
- TLS certificates via Let’s Encrypt (wildcard DNS)
Installation & Configuration Deep Dive
Phase 1: Restic Backup Optimization
The original pain point - slow backups on ARM hardware - requires architectural changes:
# QNAP Restic Limitations
$ time restic -r sftp:user@hetzner:/backup backup ~/photos
real 142m18s # Unacceptable for incremental backups
# x86 Optimization Flags
restic backup \
--repo sftp:hetzner:/backup \
--exclude="*.tmp" \
--pack-size=32 \
--compression max \
--limit-upload 1000 \
~/photos
Configuration Tuning (/etc/restic/env):
AWS_MAX_CONCURRENT_REQUESTS=8 # Up from default 5
RESTIC_CACHE_DIR=/mnt/ssd/cache # Avoid NAS I/O penalty
TMPDIR=/dev/shm # Use RAMdisk for temp files
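A wrapper that ties the env file and the tuned flags together might look like this. It is a sketch, not the author's actual script: `DRY_RUN` defaults to 1 so it prints the command instead of contacting Hetzner; set `DRY_RUN=0` to execute.

```shell
#!/bin/bash
# Sketch of a backup wrapper combining /etc/restic/env with the tuned flags.
# DRY_RUN defaults to 1: print the command rather than executing it.
ENV_FILE="${ENV_FILE:-/etc/restic/env}"
if [ -f "$ENV_FILE" ]; then
  . "$ENV_FILE"
fi

CMD=(restic backup
  --repo sftp:hetzner:/backup
  --exclude='*.tmp'
  --pack-size=32
  --compression max
  --limit-upload 1000
  "$HOME/photos")

if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "${CMD[*]}"
else
  "${CMD[@]}"
fi
```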
Phase 2: Immich Deployment
Google Photos replacement requires GPU acceleration:
services:
  immich:
    image: ghcr.io/immich-app/immich:v1.106.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - TZ=America/New_York
    volumes:
      - /mnt/user/media:/usr/src/app/upload
Face Recognition Optimization:
# Increase TensorFlow parallelism
MACHINE_LEARNING_FACE_RECOGNITION_MIN_SCORE=0.7
MACHINE_LEARNING_THREADS=6 # Logical cores ÷ 2
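The "logical cores ÷ 2" heuristic from the comment above can be derived at deploy time rather than hard-coded; a minimal sketch:

```shell
#!/bin/bash
# Derive MACHINE_LEARNING_THREADS as half the logical cores, floored at 1.
cores=$(nproc)
threads=$(( cores / 2 ))
if [ "$threads" -lt 1 ]; then
  threads=1
fi
echo "MACHINE_LEARNING_THREADS=${threads}"
```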
Phase 3: Home Assistant VM
Virtualization via unRAID’s Libvirt:
<!-- vm.xml configuration snippet -->
<cpu mode='host-passthrough'>
<topology sockets='1' cores='4' threads='2'/>
</cpu>
<devices>
<hostdev mode='subsystem' type='usb'>
<source>
<vendor id='0x1a86'/> <!-- Zigbee dongle -->
</source>
</hostdev>
</devices>
Startup Sequence:
- PCIe USB controller passthrough
- Z-Wave JS UI container for protocol translation
- MQTT broker for sensor data aggregation
Performance Optimization Techniques
Storage Tiering Strategy
The unRAID advantage lies in mixed-media pooling:
/mnt/user/media (Pool Structure)
├── /mnt/cache (1TB NVMe)
│ ├── appdata # Docker volumes
│ └── transfers # Incoming uploads
└── /mnt/array (6x HDD)
├── photos # Immich primary storage
└── backups # Restic repository
SMB Share Tuning:
# /etc/samba/smb-extra.conf
[Media]
path = /mnt/user/media
spotlight = yes
vfs objects = catia fruit streams_xattr
Network Optimization
Tailscale mitigates CGNAT limitations:
# Enable Funnel for selective exposure
tailscale serve --bg --https=443 http://immich:2283
tailscale funnel 443 on
# Exit node configuration
tailscale up --advertise-exit-node --ssh
Throughput Comparison:
| Protocol | Direct | Tailscale | WireGuard |
|---|---|---|---|
| LAN (GigE) | 112MB/s | 89MB/s | 97MB/s |
| WAN (100Mbps) | 11MB/s | 8.9MB/s | 10.2MB/s |
| Mobile LTE | N/A | 3.1MB/s | N/A |
Operational Workflows
Automated Backup Verification
Restic integrity checks via cron:
#!/bin/bash
restic -r sftp:hetzner:/backup check \
--read-data-subset=5% \
--with-cache
if [ $? -ne 0 ]; then
curl -X POST http://homeassistant:8123/api/services/notify/push \
-H "Authorization: Bearer $HA_TOKEN" \
-H "Content-Type: application/json" \
-d '{"message":"Backup verification failed"}'
fi
Schedule Balance:
- Daily: Incremental backup (22:00 local)
- Weekly: Prune (Sunday 02:00)
- Monthly: Full check (First Saturday)
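That cadence maps onto cron roughly as follows (a sketch: paths and script names are placeholders, and note that cron ORs day-of-month with day-of-week when both are restricted, hence the `date` guard for "first Saturday"):

```
# Daily incremental backup at 22:00 local
0 22 * * *  /usr/local/bin/restic-backup.sh
# Weekly prune, Sunday 02:00
0 2 * * 0   /usr/local/bin/restic-prune.sh
# Monthly full check, first Saturday 03:00
# (cron ORs DOM and DOW, so restrict DOM and test the weekday instead)
0 3 1-7 * * [ "$(date +\%u)" -eq 6 ] && /usr/local/bin/restic-check.sh
```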
Troubleshooting Methodology
Common Failure Modes
Symptom: Immich thumbnail generation stalls
# Identify stuck processes
docker exec $CONTAINER_ID ps aux | grep convert
# Check GPU utilization
nvidia-smi --query-compute-apps=pid,name,used_memory \
--format=csv
Resolution Path:
- Increase GPU memory allocation
- Enable direct I/O bypassing FUSE:
# immich compose override
environment:
  - DISABLE_FUSE=true
Symptom: Tailscale intermittent connectivity
# Diagnose packet flow
tailscale netcheck --verbose
# Check NAT traversal
tailscale ping --until-direct=true 100.101.102.103
Resolution:
# Force DERP fallback
tailscale up --force-derp=https://derp.region.example.com
Conclusion
The journey from “simple backup” to “full home infrastructure” exemplifies three core DevOps principles:
- The Law of Conservation of Complexity: Saved costs (Wasabi → Hetzner) transfer operational burden to the sysadmin
- Conway’s Corollary: System architecture mirrors organizational structure (single admin → decentralized services)
- Hyrum’s Law: Every observable behavior will eventually be depended upon (Google Photos API → Immich self-hosting)
While technically impressive, this escalation serves as a reminder: Infrastructure exists to serve needs, not curiosity. Periodic architecture reviews should ask:
- Does this solve an actual problem?
- Is the TCO (time + money) less than managed alternatives?
- Can I sustain this through hardware failures/vacations?
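The second question is answerable with arithmetic. A back-of-envelope sketch, where the maintenance hours and hourly rate are assumptions you should replace with your own:

```shell
#!/bin/bash
# Back-of-envelope TCO: cash savings minus the value of maintenance time.
savings=45        # $/month saved, from the comparison matrix
maint_hours=3     # assumed upkeep hours per month
hourly_rate=50    # assumed value of the admin's time, $/hour
net=$(( maint_hours * hourly_rate - savings ))
echo "Self-hosting effectively costs \$${net}/month once time is priced in"
```

With these assumptions the "savings" invert into a net cost, which is exactly the ISCS tradeoff the matrix quantified.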
For those committed to the path, remember:
- Backing up Immich to the same primary NAS violates the 3-2-1 backup rule
- Home Assistant requires physical access redundancy
- Tailscale ≠ backup connectivity solution
Further Resources:
- Restic Compression Benchmarks
- Immich GPU Configuration Guide
- Tailscale Best Practices
- Home Assistant VM Passthrough
The final lesson? When your homelab becomes more reliable than your cloud services, you’ve either succeeded spectacularly - or failed to recognize the sunk cost fallacy. Choose wisely.