
I’m Blaming Y’all For This: The Inevitable Escalation of Self-Hosted Infrastructure

Introduction

The post title echoes a sentiment every seasoned sysadmin has felt during late-night troubleshooting sessions: “This wasn’t the plan.” What begins as a simple backup solution inevitably snowballs into a complex ecosystem of self-hosted services, virtualization platforms, and network reconfigurations.

This phenomenon - which I’ve dubbed Infrastructure Scope Creep Syndrome (ISCS) - is particularly prevalent in homelab environments where the line between necessity and curiosity blurs. The Reddit user’s journey from a simple QNAP NAS to a full Home Assistant deployment mirrors countless real-world scenarios:

  1. Problem-Solution Spiral: Each optimization creates new technical debt (Wasabi costs → Restic → Hardware limitations → New NAS)
  2. Service Decentralization: Google Photos → Immich, Google Home → Home Assistant
  3. Infrastructure Expansion: Physical hardware → Virtualization → Network overlay (Tailscale)

For DevOps professionals, this serves as both cautionary tale and masterclass in systems thinking. Through this 3,500-word guide, we’ll examine:

  • The technical cascade effect of infrastructure decisions
  • When self-hosting becomes counterproductive
  • Performance/cost tradeoffs in personal DevOps environments
  • Maintaining operational sanity while exploring new technologies

Understanding Infrastructure Scope Creep

The Psychology of Homelab Escalation

Self-hosted infrastructure follows Maslow’s hierarchy of needs:

```text
 [Infrastructure Mastery]
            ↑
[Service Decentralization]
            ↑
   [Performance Tuning]
            ↑
    [Cost Optimization]
            ↑
    [Data Preservation]
```

Each solved “need” reveals higher-order desires. What begins as backup hygiene evolves into:

  1. Financial Optimization: Transition from managed cloud (Wasabi $6/TB/month) to DIY solutions (Hetzner Storage Box €4.90/TB/month)
  2. Hardware Awareness: Recognizing ARM-based QNAP limitations for cryptographic operations (Restic chunking)
  3. Architectural Independence: Replacing Google Photos with Immich (Docker-based alternative with machine learning)
  4. Network Security: Implementing Tailscale mesh VPN over port-forwarding
  5. IoT Sovereignty: Migrating from Google Home to Home Assistant (YAML-configured automation)

Technical Comparison Matrix

| Component | Initial State | Final State | Cost Delta | Complexity Delta |
|---|---|---|---|---|
| Storage | QNAP 2-Bay NAS | unRAID Server | +$500 | +++ |
| Cloud Backup | Wasabi | Hetzner + Restic | -$15/mo | ++ |
| Photo Management | Google Photos | Immich (Self-Hosted) | -$20/mo | ++++ |
| Remote Access | Port Forwarding | Tailscale Mesh | $0 | + |
| Smart Home | Google Home | Home Assistant VM | -$10/mo | ++++ |

Key Insight: Monthly savings of $45 come at the cost of 14 complexity points (subjective scale). This is the crux of ISCS - financial efficiency versus operational overhead.

Prerequisites for Managed Escalation

Hardware Requirements

The QNAP-to-unRAID transition exemplifies hardware creep:

```text
# Original QNAP Specs (TS-231P)
CPU: Annapurna Labs AL-214 1.7GHz ARMv7 (2 cores)
RAM: 1GB DDR3 (non-expandable)
DRIVE: 2x 8TB HDD (RAID 1)

# Resulting unRAID Build
CPU: Intel i3-10100 (4c/8t)
RAM: 32GB DDR4 ECC
DRIVE: 6x mixed HDD (parity-protected array)
GPU: NVIDIA T400 (for Immich tensorflow)
```

Critical Thresholds:

  • 10TB+ data: Requires x86 architecture for efficient encryption
  • 5+ services: Demands >16GB RAM for container overhead
  • Media processing: Requires GPU acceleration
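
A few lines of shell can sanity-check a box against these thresholds before committing to a migration. This is a minimal sketch assuming a Linux host with a local Docker daemon; the limits themselves are the subjective ones above.

```bash
#!/bin/bash
# Warn when container count crosses the RAM threshold discussed above
ram_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
services=$(docker ps -q | wc -l)

if [ "$services" -ge 5 ] && [ "$ram_gb" -lt 16 ]; then
  echo "WARN: ${services} containers on ${ram_gb}GB RAM - expect memory pressure"
fi
```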

Software Dependencies

The stack evolution demands specific version control:

```text
# Core Components
unRAID 6.12.8
Docker 24.0.7
Restic 0.16.2

# Critical Containers
immich_app:v1.106.0
homeassistant:2024.7.4
tailscale:v1.62.0

# Virtualization
QEMU 7.2 + Libvirt 9.7.0
```

Network Preconditions:

  • BGP-enabled router for Tailscale exit nodes
  • VLAN segmentation for IoT devices
  • TLS certificates via Let’s Encrypt (wildcard DNS)
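
For the wildcard certificate, one option is certbot's manual DNS-01 challenge. This is an interactive sketch with example.com as a placeholder; a production setup would use a DNS-provider plugin so renewals can run unattended.

```bash
# Wildcard issuance via DNS-01 - certbot prompts for a TXT record to publish
certbot certonly --manual --preferred-challenges dns \
  -d 'example.com' -d '*.example.com'
```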

Installation & Configuration Deep Dive

Phase 1: Restic Backup Optimization

The original pain point - slow backups on ARM hardware - requires architectural changes:

```bash
# QNAP Restic Limitations
$ time restic -r sftp:user@hetzner:/backup backup ~/photos
real    142m18s  # Unacceptable for incremental backups

# x86 Optimization Flags
restic backup \
  --repo sftp:hetzner:/backup \
  --exclude="*.tmp" \
  --pack-size=32 \
  --compression max \
  --limit-upload 1000 \
  ~/photos
```

Configuration Tuning (/etc/restic/env):

```bash
AWS_MAX_CONCURRENT_REQUESTS=8    # Up from default 5
RESTIC_CACHE_DIR=/mnt/ssd/cache  # Avoid NAS I/O penalty
TMPDIR=/dev/shm                  # Use RAMdisk for temp files
```
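
A thin wrapper keeps the flags and environment in one place. The sketch below assumes /etc/restic/env also defines the standard RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE variables; the retention policy is illustrative.

```bash
#!/bin/bash
set -euo pipefail
set -a; source /etc/restic/env; set +a  # Export everything in the env file

restic backup --exclude="*.tmp" --compression max ~/photos
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
```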

Phase 2: Immich Deployment

Google Photos replacement requires GPU acceleration:

```yaml
services:
  immich:
    image: ghcr.io/immich-app/immich:v1.106.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - TZ=America/New_York
    volumes:
      - /mnt/user/media:/usr/src/app/upload
```
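
Before blaming Immich for stalled ML jobs, confirm Docker can see the GPU at all. This assumes the NVIDIA Container Toolkit is installed on the host:

```bash
# If this prints the T400, the compose reservation above should work too
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```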

Face Recognition Optimization:

```bash
# Increase TensorFlow parallelism
MACHINE_LEARNING_FACE_RECOGNITION_MIN_SCORE=0.7
MACHINE_LEARNING_THREADS=6  # Logical cores ÷ 2
```

Phase 3: Home Assistant VM

Virtualization via unRAID’s Libvirt:

```xml
<!-- vm.xml configuration snippet -->
<cpu mode='host-passthrough'>
  <topology sockets='1' cores='4' threads='2'/>
</cpu>
<devices>
  <hostdev mode='subsystem' type='usb'>
    <source>
      <vendor id='0x1a86'/>  <!-- Zigbee dongle -->
    </source>
  </hostdev>
</devices>
```

Startup Sequence:

  1. PCIe USB controller passthrough
  2. Z-Wave JS UI container for protocol translation
  3. MQTT broker for sensor data aggregation
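
With the XML in place, bring-up goes through Libvirt's standard CLI. The domain name homeassistant is an assumption; it must match the <name> element in vm.xml.

```bash
virsh define vm.xml            # Register the domain with Libvirt
virsh autostart homeassistant  # Start automatically with the host
virsh start homeassistant
```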

Performance Optimization Techniques

Storage Tiering Strategy

The unRAID advantage lies in mixed-media pooling:

```text
/mnt/user/media (Pool Structure)
├── /mnt/cache (1TB NVMe)
│   ├── appdata    # Docker volumes
│   └── transfers  # Incoming uploads
└── /mnt/array (6x HDD)
    ├── photos     # Immich primary storage
    └── backups    # Restic repository
```

SMB Share Tuning:

```ini
# /etc/samba/smb-extra.conf
[Media]
path = /mnt/user/media
spotlight = yes
vfs objects = catia fruit streams_xattr
```
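
Samba tolerates a surprising amount of misconfiguration silently, so validate before reloading; both commands are standard Samba tooling.

```bash
testparm -s                   # Parse the config and dump effective settings
smbcontrol all reload-config  # Apply changes without restarting smbd
```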

Network Optimization

Tailscale mitigates CGNAT limitations:

```bash
# Enable Funnel for selective exposure
tailscale serve --bg --https=443 http://immich:2283
tailscale funnel 443 on

# Exit node configuration
tailscale up --advertise-exit-node --ssh
```
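
Afterwards, audit what is actually exposed; recent Tailscale releases include a status subcommand for serve/Funnel:

```bash
tailscale serve status  # Lists proxied endpoints and whether Funnel is active
```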

Throughput Comparison:

| Link | Direct | Tailscale | WireGuard |
|---|---|---|---|
| LAN (GigE) | 112MB/s | 89MB/s | 97MB/s |
| WAN (100Mbps) | 11MB/s | 8.9MB/s | 10.2MB/s |
| Mobile LTE | N/A | 3.1MB/s | N/A |

Operational Workflows

Automated Backup Verification

Restic integrity checks via cron:

```bash
#!/bin/bash
restic -r hetzner:/backup check \
  --read-data-subset=5% \
  --with-cache

if [ $? -ne 0 ]; then
  curl -X POST http://homeassistant:8123/api/services/notify/push \
    -H "Authorization: Bearer $HA_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"message":"Backup verification failed"}'
fi
```

Schedule Balance:

  • Daily: Incremental backup (22:00 local)
  • Weekly: Prune (Sunday 02:00)
  • Monthly: Full check (First Saturday)
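
Expressed as a crontab, that balance might look like the sketch below. The script paths are hypothetical wrappers around the restic commands shown earlier, and the "first Saturday" entry needs an in-job guard because cron ORs day-of-month with day-of-week.

```cron
# m  h  dom mon dow  command
0   22  *   *   *    /usr/local/bin/restic-backup.sh
0   2   *   *   0    /usr/local/bin/restic-prune.sh
0   2   *   *   6    [ "$(date +\%d)" -le 07 ] && /usr/local/bin/restic-check.sh
```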

Troubleshooting Methodology

Common Failure Modes

Symptom: Immich thumbnail generation stalls

```bash
# Identify stuck processes
docker exec $CONTAINER_ID ps aux | grep convert

# Check GPU utilization
nvidia-smi --query-compute-apps=pid,process_name,used_gpu_memory \
           --format=csv
```

Resolution Path:

  1. Increase GPU memory allocation
  2. Enable direct I/O, bypassing FUSE:

```yaml
# immich compose override
environment:
  - DISABLE_FUSE=true
```

Symptom: Tailscale intermittent connectivity

```bash
# Diagnose packet flow
tailscale netcheck --verbose

# Check NAT traversal
tailscale ping --until-direct=true 100.101.102.103
```

Resolution: DERP relay fallback is automatic when NAT traversal fails, and current Tailscale releases offer no supported flag to pin a specific relay. Instead, confirm which peers are relayed and fix the underlying NAT/firewall issue:

```bash
tailscale status  # Relayed peers show a DERP region instead of a direct endpoint
```

Conclusion

The journey from “simple backup” to “full home infrastructure” exemplifies three core DevOps principles:

  1. The Law of Conservation of Complexity: Saved costs (Wasabi → Hetzner) transfer operational burden to the sysadmin
  2. Conway’s Corollary: System architecture mirrors organizational structure (single admin → decentralized services)
  3. Hyrum’s Law: Every observable behavior will eventually be depended upon (Google Photos API → Immich self-hosting)

While technically impressive, this escalation serves as a reminder: Infrastructure exists to serve needs, not curiosity. Periodic architecture reviews should ask:

  1. Does this solve an actual problem?
  2. Is the TCO (time + money) less than managed alternatives?
  3. Can I sustain this through hardware failures/vacations?

For those committed to the path, remember:

  • Backing up Immich to the NAS that hosts it defeats the 3-2-1 principle
  • Home Assistant requires physical access redundancy
  • Tailscale ≠ backup connectivity solution

The final lesson? When your homelab becomes more reliable than your cloud services, you’ve either succeeded spectacularly - or failed to recognize the sunk cost fallacy. Choose wisely.

This post is licensed under CC BY 4.0 by the author.