Learn From My Mistakes - What I Learnt Over The Years Of Self-Hosting And What I Should've Done Differently

Introduction

Self-hosting infrastructure is a rite of passage for DevOps engineers and system administrators. What starts as a simple Minecraft server shared via Hamachi often evolves into a complex home lab running mission-critical services. After a decade of managing self-hosted environments, I’ve accumulated enough painful lessons to fill a disaster recovery manual.

The allure of self-hosting is undeniable: complete control over your infrastructure, cost savings compared to cloud services, and unparalleled learning opportunities. But without proper foundations, these environments become ticking time bombs. I’ve personally lost terabytes of data, suffered multi-day outages, and rebuilt entire clusters from scratch - all due to preventable mistakes.

In this comprehensive guide, we’ll examine critical infrastructure management lessons through the lens of hard-earned experience. You’ll learn:

  • Backup strategies that actually work when disaster strikes
  • Configuration management techniques that prevent “works on my machine” syndrome
  • Security hardening approaches for internet-exposed services
  • Maintenance workflows that don’t consume your weekends
  • Architectural decisions that pay dividends as your environment grows

Whether you’re running a Raspberry Pi cluster or a basement data center, these battle-tested practices will transform your self-hosted environment from a fragile house of cards into a resilient infrastructure worthy of production workloads.

Understanding Self-Hosting Infrastructure

What is Self-Hosting?

Self-hosting refers to deploying and managing services on infrastructure you control, typically in a home or private lab environment. Unlike cloud hosting where providers manage the underlying hardware and virtualization layer, self-hosted environments require hands-on management of:

  • Physical/Virtual servers
  • Networking equipment
  • Storage systems
  • Security controls
  • Service deployments

Evolution of Home Lab Technology

The self-hosting landscape has dramatically changed over the past decade:

Era          | Hardware              | Software        | Key Innovations
2010-2013    | Consumer PCs          | Manual installs | VirtualBox, Hamachi
2014-2016    | Used enterprise gear  | Proxmox/ESXi    | ZFS adoption
2017-2019    | ARM devices           | Docker Swarm    | Containerization
2020-Present | Mini PCs/NVMe storage | Kubernetes/IaC  | GitOps, Tailscale

Critical Components of Modern Self-Hosted Environments

  1. Virtualization Layer: Proxmox VE (open-source) or VMware ESXi
  2. Container Orchestration: Docker Compose or Kubernetes
  3. Storage: ZFS or Btrfs for data integrity
  4. Networking: VLANs, reverse proxies (Traefik/Caddy), and VPNs
  5. Automation: Ansible/Terraform for configuration management

When Self-Hosting Makes Sense

Consider self-hosting when:

  • Handling sensitive data that can’t go to third-party clouds
  • Developing infrastructure management skills
  • Running specialized hardware (GPU clusters, high-performance storage)
  • Maintaining legacy systems that cloud providers don’t support
  • Cost optimization for long-running workloads

Prerequisites for Stable Self-Hosting

Hardware Requirements

Minimum viable setup:

  • 64-bit x86 processor with virtualization support (Intel VT-x/AMD-V)
  • 16GB RAM (32GB recommended)
  • 256GB SSD for OS + 1TB HDD for storage
  • Dual Gigabit NICs

Enterprise-grade setup:

  • ECC memory (strongly recommended for ZFS)
  • IPMI/iDRAC for remote management
  • UPS with network shutdown capability
  • 10GbE networking for storage traffic

Software Requirements

Core components:

  • Hypervisor: Proxmox VE 8.x or VMware ESXi 8
  • Container Runtime: Docker 24.x with containerd
  • Orchestration: Kubernetes 1.28+ or Docker Compose v2
  • OS: Debian 12 Bookworm or Ubuntu 22.04 LTS

Security Foundations

Before exposing any services:

  1. Implement network segmentation:
    
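    # Create VLAN 30 for IoT devices (eth0 is the parent interface)
    ip link add link eth0 name eth0.30 type vlan id 30
    ip addr add 192.168.30.1/24 dev eth0.30
    # Bring the VLAN interface up so it starts passing traffic
    ip link set dev eth0.30 up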
    
  2. Configure firewall defaults:
    
    ufw default deny incoming
    ufw default allow outgoing
    ufw allow from 192.168.1.0/24 to any port 22 proto tcp
    # Enable only after the SSH rule is in place to avoid locking yourself out
    ufw enable
    
  3. Enable automatic security updates:
    
    apt install unattended-upgrades
    dpkg-reconfigure -plow unattended-upgrades
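
Before exposing anything, it is worth confirming that the firewall defaults actually took effect. The following is a minimal sketch, assuming `ufw status verbose` prints a `Status:` line and a `Default:` line in its usual format; the function name `ufw_defaults_ok` is hypothetical.

```shell
#!/bin/sh
# Sketch: check that ufw is active with deny-incoming / allow-outgoing defaults.
# Assumes `ufw status verbose` output contains lines such as:
#   Status: active
#   Default: deny (incoming), allow (outgoing), disabled (routed)

# Reads `ufw status verbose` output on stdin.
# Exit status 0 only if the firewall is active and both defaults match.
ufw_defaults_ok() {
    awk '
        /^Status: active/ { active = 1 }
        /^Default:/ && /deny \(incoming\)/ && /allow \(outgoing\)/ { defaults = 1 }
        END { exit !(active && defaults) }
    '
}

# Usage (needs root):
#   ufw status verbose | ufw_defaults_ok && echo "firewall defaults OK"
```

Parsing stdin rather than calling ufw inside the function keeps the check easy to try against saved output from another host.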
    

Pre-Installation Checklist

  1. Verify hardware compatibility
  2. Document physical network layout
  3. Prepare offline installation media
  4. Test UPS shutdown procedures
  5. Create a secrets vault (Passbolt or HashiCorp Vault)

Installation & Configuration Walkthrough

Proxmox VE Base Installation

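# Download the latest ISO from https://www.proxmox.com/en/downloads and write it to a USB stick
# (replace /dev/sdX with the target device; everything on it is overwritten)
dd if=proxmox-ve_8.0.iso of=/dev/sdX bs=4M conv=fsync status=progress

# Post-install configuration (run on the Proxmox host)
pveceph install --version reef   # only needed if you plan to use Ceph storage
pveam update                     # refresh the container template index
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst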

ZFS Storage Pool Creation

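# Identify disks for storage pool
lsblk -o NAME,SIZE,MODEL -d

# Create mirrored pool using stable by-id paths (ashift=12 assumes 4K-sector disks)
zpool create -o ashift=12 tank mirror \
  /dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-WX12D1234567 \
  /dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-WX12D1234568

# Enable compression, automatic TRIM (only useful on SSDs), and weekly scrubs
zfs set compression=lz4 tank
zpool set autotrim=on tank
# Append to the existing crontab; piping a bare echo into `crontab -` would replace every entry
(crontab -l 2>/dev/null; echo "0 0 * * 0 /sbin/zpool scrub tank") | crontab -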
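
A scheduled scrub only helps if someone looks at the result. As a rough sketch, the check below relies on `zpool status -x` printing exactly "all pools are healthy" when nothing needs attention; the function name `pools_healthy` is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only when ZFS reports no pool problems.
# Assumes `zpool status -x` prints "all pools are healthy" in the good case
# and a full status report for any pool that needs attention.
pools_healthy() {
    [ "$(zpool status -x 2>&1)" = "all pools are healthy" ]
}

# Usage, e.g. from cron after the weekly scrub:
#   pools_healthy || zpool status -x | mail -s "ZFS needs attention" root
```

The `mail` call in the usage comment is only one possible notification path; any alerting hook you already have works just as well.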

Docker Rootless Configuration

# Install prerequisites
sudo apt-get install -y uidmap dbus-user-session

# Configure rootless mode (run as the unprivileged user, not as root)
dockerd-rootless-setuptool.sh install

# Verify operation
docker run --rm hello-world

# Enable lingering so the user's Docker daemon keeps running after logout
sudo loginctl enable-linger "$USER"

Automated Backups with Restic

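# Initialize repository (replace with your storage target)
restic init --repo sftp:user@backup-host:/restic-repos/homelab

# Save the following script as /usr/local/bin/homelab-backup and make it executable
#!/bin/bash
set -euo pipefail
# Point restic at the repository initialized above
export RESTIC_REPOSITORY="sftp:user@backup-host:/restic-repos/homelab"
export RESTIC_PASSWORD="$(pass restic/homelab)"
restic backup \
  --exclude-caches \
  --exclude-file=/etc/restic/excludes \
  /etc /var/lib/docker/volumes /opt

# Create the systemd service and timer units, then enable the timer
# (reading /etc and /var/lib/docker/volumes needs root; drop --user if the units are installed system-wide)
systemctl --user enable --now restic-backup.timer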
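
The timer enabled above needs two unit files that the walkthrough does not show. Here is one possible sketch of them, written out by a small helper. The unit names match the timer above; the daily schedule, the randomized delay, and the helper name `write_backup_units` are my assumptions, so adjust them to taste.

```shell
#!/bin/sh
# Sketch: generate a oneshot service and a daily timer for the backup script.
# $1 = directory to write the units into, for example
#      ~/.config/systemd/user (user units) or /etc/systemd/system (system units)
write_backup_units() {
    unit_dir=$1

    cat > "$unit_dir/restic-backup.service" <<'EOF'
[Unit]
Description=Restic backup of homelab data

[Service]
Type=oneshot
ExecStart=/usr/local/bin/homelab-backup
EOF

    cat > "$unit_dir/restic-backup.timer" <<'EOF'
[Unit]
Description=Run the restic backup daily

[Timer]
OnCalendar=daily
RandomizedDelaySec=30m
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Usage:
#   mkdir -p ~/.config/systemd/user
#   write_backup_units ~/.config/systemd/user
#   systemctl --user daemon-reload
```

`Persistent=true` makes systemd run a missed backup at the next boot, which matters on home lab machines that are not always powered on.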

Configuration & Optimization

Security Hardening Checklist

  1. Container Isolation:
    
    # docker-compose.yml security options
    services:
      app:
        security_opt:
          - no-new-privileges:true
        cap_drop:
          - ALL
        read_only: true
    
  2. Network Policies:
    
    # Create docker network with no external access
    docker network create --internal secured-net
    
  3. Automated Vulnerability Scanning:
    
    # Install trivy scanner
    curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
    
    # Scan Docker images
    trivy image --severity HIGH,CRITICAL myapp:latest
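
Scanning a single image by hand tends to be forgotten, so here is a rough sketch that scans every image currently in use. It assumes both `docker` and `trivy` are on the PATH; the function name `scan_running_images` is hypothetical.

```shell
#!/bin/sh
# Sketch: run trivy against each distinct image used by a running container.
# Returns non-zero if any image has HIGH or CRITICAL findings.
scan_running_images() {
    failed=0
    # Image references contain no whitespace, so word splitting is safe here
    for image in $(docker ps --format '{{.Image}}' | sort -u); do
        echo "Scanning $image"
        # --exit-code 1 makes trivy exit non-zero when matching findings exist
        trivy image --severity HIGH,CRITICAL --exit-code 1 --quiet "$image" || failed=1
    done
    return "$failed"
}

# Usage:
#   scan_running_images || echo "at least one image needs attention"
```

Because the function keeps going after a failed scan, one vulnerable image does not hide the results for the rest.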
    

Performance Optimization

ZFS Tunables for Mixed Workloads:

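# Cap the ARC at 16 GiB (50% of RAM on a 32 GB host); this runtime setting is lost on reboot
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max

# Cache both data and metadata in ARC and L2ARC for database datasets (this is also the ZFS default)
zfs set primarycache=all tank/databases
zfs set secondarycache=all tank/databases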
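
The hard-coded byte value above only equals 50% of RAM on a 32 GB machine. A small sketch for deriving it on any host, assuming the total is taken from the `MemTotal` line of /proc/meminfo (which is reported in KiB); the function name `arc_max_bytes` is hypothetical.

```shell
#!/bin/sh
# Sketch: compute a zfs_arc_max value as a percentage of total memory.
#   $1 = total memory in KiB (MemTotal from /proc/meminfo)
#   $2 = percentage of RAM to allow the ARC to use
arc_max_bytes() {
    echo $(( $1 * 1024 * $2 / 100 ))
}

# Usage (as root):
#   mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   arc_max_bytes "$mem_kib" 50 > /sys/module/zfs/parameters/zfs_arc_max
```

To keep the limit across reboots, the same number can go into /etc/modprobe.d/zfs.conf as `options zfs zfs_arc_max=<bytes>`.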

Container Resource Constraints:

# docker-compose.yml resource limits
services:
  db:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '0.5'
          memory: 512M

Monitoring Stack Configuration

# prometheus.yml excerpt
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']
  # Requires "metrics-addr" to be set in the Docker daemon configuration
  - job_name: 'docker'
    static_configs:
      - targets: ['docker-host:9323']

Usage & Operational Procedures

Daily Maintenance Checklist

  1. Check backup status:
    
    restic snapshots --latest 3
    
  2. Verify container health:
    
    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.Ports}}"
    
  3. Review security updates:
    
    apt list --upgradable
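
Eyeballing `docker ps` every day gets old quickly. As a sketch, the container check can be reduced to a single pass/fail line using Docker's `health=unhealthy` filter. Only containers that define a HEALTHCHECK report a health state, and the function names here are hypothetical.

```shell
#!/bin/sh
# Sketch: report containers whose health check is currently failing.

# Prints the names of unhealthy containers, one per line.
unhealthy_containers() {
    docker ps --filter health=unhealthy --format '{{.Names}}'
}

# Prints a one-line summary; returns non-zero if anything is unhealthy.
report_container_health() {
    names=$(unhealthy_containers)
    if [ -n "$names" ]; then
        # $names is deliberately unquoted so the newlines collapse to spaces
        echo "UNHEALTHY:" $names
        return 1
    fi
    echo "all containers healthy"
}

# Usage:
#   report_container_health || echo "check the containers listed above"
```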
    

Disaster Recovery Workflow

Database Restoration:

# Identify latest backup
restic snapshots --path /var/lib/docker/volumes/db_data

# Restore to temporary location
restic restore latest --target /tmp/restore --include /var/lib/docker/volumes/db_data

# Replace volume contents
docker stop db
rsync -av --delete /tmp/restore/var/lib/docker/volumes/db_data/ /var/lib/docker/volumes/db_data/
docker start db
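
One caveat with the workflow above: `rsync --delete` from an empty or missing source directory wipes the live volume. A minimal guard, as a sketch, is to refuse to continue unless the restored tree contains at least one file. The function name `restore_has_data` is hypothetical.

```shell
#!/bin/sh
# Sketch: sanity-check a restored directory before syncing it over live data.
# Succeeds only if $1 is a directory containing at least one regular file.
restore_has_data() {
    [ -d "$1" ] && [ -n "$(find "$1" -type f | head -n 1)" ]
}

# Usage, placed between the restore and the rsync step:
#   src=/tmp/restore/var/lib/docker/volumes/db_data
#   restore_has_data "$src" || { echo "restore is empty, aborting" >&2; exit 1; }
```

This does not prove the data is valid, only that the restore produced something; a real recovery test still means starting the service against the restored copy.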

Capacity Planning Metrics

Monitor these key metrics:

Metric             | Warning Threshold | Critical Threshold | Collection Method
ZFS pool capacity  | 80%               | 90%                | zpool list
ARC hit rate       | below 90%         | below 80%          | node_exporter
Docker node memory | 70%               | 85%                | cAdvisor
Network throughput | 70% of link speed | 90% of link speed  | Prometheus/node_exporter
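
To make the thresholds concrete, here is a small sketch that turns a measured value into ok, warning, or critical. It handles both directions: for pool capacity a higher number is worse, while for ARC hit rate a lower number is worse. The function name `classify` and the integer-only comparison are my simplifications.

```shell
#!/bin/sh
# Sketch: classify VALUE against WARN and CRIT thresholds (integers only).
# If WARN <= CRIT, higher is worse (e.g. pool capacity: warn 80, crit 90).
# If WARN >  CRIT, lower is worse  (e.g. ARC hit rate: warn 90, crit 80).
classify() {
    value=$1 warn=$2 crit=$3
    if [ "$warn" -le "$crit" ]; then
        if   [ "$value" -ge "$crit" ]; then echo critical
        elif [ "$value" -ge "$warn" ]; then echo warning
        else echo ok
        fi
    else
        if   [ "$value" -lt "$crit" ]; then echo critical
        elif [ "$value" -lt "$warn" ]; then echo warning
        else echo ok
        fi
    fi
}

# Usage:
#   cap=$(zpool list -H -o capacity tank | tr -d '%')
#   echo "tank capacity: $(classify "$cap" 80 90)"
```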

Troubleshooting Common Issues

Backup Failures

Symptoms: Restic exits with Fatal: unable to open repo

# Verify the repository is reachable and passes an integrity check
restic --repo sftp:user@backup-host:/restic-repos/homelab check

# Restrict SSH key permissions (ssh refuses private keys that other users can read)
chmod 600 ~/.ssh/restic-key

Container Networking Problems

Diagnosis Steps:

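# Inspect container network settings
docker inspect --format '{{json .NetworkSettings.Networks}}' "$CONTAINER_ID"

# Test DNS resolution (the image must include nslookup)
docker exec -it "$CONTAINER_ID" nslookup github.com

# Check iptables rules (-n skips slow reverse DNS lookups)
iptables -L DOCKER-USER -v -n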

ZFS Performance Degradation

Diagnosis:

# Check scrub status
zpool status tank

# Monitor ARC efficiency (current OpenZFS releases ship the tool as arc_summary, without the .py suffix)
arc_summary | grep -i -A 10 "ARC size"

# Identify slow disks
zpool iostat -v 1
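
When arc_summary is not installed, the hit rate can be worked out from the raw kernel counters. A sketch, assuming the usual three-column layout (name, type, value) of /proc/spl/kstat/zfs/arcstats on Linux; the function name `arc_hit_rate` is hypothetical.

```shell
#!/bin/sh
# Sketch: compute the ARC hit rate from arcstats-formatted text on stdin.
# Prints an integer percentage (truncated), or "n/a" if there were no lookups.
arc_hit_rate() {
    awk '
        $1 == "hits"   { hits = $3 }
        $1 == "misses" { misses = $3 }
        END {
            total = hits + misses
            if (total == 0) { print "n/a"; exit }
            printf "%d\n", hits * 100 / total
        }
    '
}

# Usage:
#   arc_hit_rate < /proc/spl/kstat/zfs/arcstats
```

The counters are cumulative since the module was loaded, so a freshly booted machine will show a lower rate until the cache warms up.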

Conclusion

Through a decade of self-hosting misadventures, one truth emerges: resilience isn’t about avoiding failures but engineering systems that fail gracefully. The difference between a catastrophic outage and a minor inconvenience often comes down to foundational practices implemented before disaster strikes.

The most valuable lesson? Start simple. A single external drive backup beats no backup. A basic Docker Compose file is better than undocumented manual installs. Incremental improvements compound over time - my current robust environment evolved from years of iterative enhancements, not overnight transformations.

For those embarking on their self-hosting journey, prioritize these fundamentals:

  1. Automated Backups: Validate recovery weekly
  2. Immutable Infrastructure: Treat servers as cattle, not pets
  3. Observability: You can’t manage what you can’t measure
  4. Documentation: Future you will thank past you
  5. Security Boundaries: Assume breach and contain damage

Further Resources

Self-hosting remains one of the most effective ways to develop infrastructure expertise - provided you learn from others’ mistakes before making them yourself. The road ahead is filled with challenging problems waiting to be solved, and now you’re better equipped to solve them.

This post is licensed under CC BY 4.0 by the author.