Learn From My Mistakes - What I Learnt Over The Years Of Self-Hosting And What I Should've Done Differently
Introduction
Self-hosting infrastructure is a rite of passage for DevOps engineers and system administrators. What starts as a simple Minecraft server shared via Hamachi often evolves into a complex home lab running mission-critical services. After a decade of managing self-hosted environments, I’ve accumulated enough painful lessons to fill a disaster recovery manual.
The allure of self-hosting is undeniable: complete control over your infrastructure, cost savings compared to cloud services, and unparalleled learning opportunities. But without proper foundations, these environments become ticking time bombs. I’ve personally lost terabytes of data, suffered multi-day outages, and rebuilt entire clusters from scratch - all due to preventable mistakes.
In this comprehensive guide, we’ll examine critical infrastructure management lessons through the lens of hard-earned experience. You’ll learn:
- Backup strategies that actually work when disaster strikes
- Configuration management techniques that prevent “works on my machine” syndrome
- Security hardening approaches for internet-exposed services
- Maintenance workflows that don’t consume your weekends
- Architectural decisions that pay dividends as your environment grows
Whether you’re running a Raspberry Pi cluster or a basement data center, these battle-tested practices will transform your self-hosted environment from a fragile house of cards into a resilient infrastructure worthy of production workloads.
Understanding Self-Hosting Infrastructure
What is Self-Hosting?
Self-hosting refers to deploying and managing services on infrastructure you control, typically in a home or private lab environment. Unlike cloud hosting where providers manage the underlying hardware and virtualization layer, self-hosted environments require hands-on management of:
- Physical/Virtual servers
- Networking equipment
- Storage systems
- Security controls
- Service deployments
Evolution of Home Lab Technology
The self-hosting landscape has dramatically changed over the past decade:
| Era | Hardware | Software | Key Innovations |
|---|---|---|---|
| 2010-2013 | Consumer PCs | Manual installs | VirtualBox, Hamachi |
| 2014-2016 | Used enterprise gear | Proxmox/ESXi | ZFS adoption |
| 2017-2019 | ARM devices | Docker Swarm | Containerization |
| 2020-Present | Mini PCs/NVMe storage | Kubernetes/IaC | GitOps, Tailscale |
Critical Components of Modern Self-Hosted Environments
- Virtualization Layer: Proxmox VE (open-source) or VMware ESXi
- Container Orchestration: Docker Compose or Kubernetes
- Storage: ZFS or Btrfs for data integrity
- Networking: VLANs, reverse proxies (Traefik/Caddy), and VPNs
- Automation: Ansible/Terraform for configuration management
When Self-Hosting Makes Sense
Consider self-hosting when:
- Handling sensitive data that can’t go to third-party clouds
- Developing infrastructure management skills
- Running specialized hardware (GPU clusters, high-performance storage)
- Maintaining legacy systems that cloud providers don’t support
- Cost optimization for long-running workloads
Prerequisites for Stable Self-Hosting
Hardware Requirements
Minimum viable setup:
- 64-bit x86 processor with virtualization support (Intel VT-x/AMD-V)
- 16GB RAM (32GB recommended)
- 256GB SSD for OS + 1TB HDD for storage
- Dual Gigabit NICs
Enterprise-grade setup:
- ECC memory (strongly recommended with ZFS)
- IPMI/iDRAC for remote management
- UPS with network shutdown capability
- 10GbE networking for storage traffic
Software Requirements
Core components:
- Hypervisor: Proxmox VE 8.x or VMware ESXi 8
- Container Runtime: Docker 24.x with containerd
- Orchestration: Kubernetes 1.28+ or Docker Compose v2
- OS: Debian 12 Bookworm or Ubuntu 22.04 LTS
Security Foundations
Before exposing any services:
- Implement network segmentation:
# Create VLAN 30 for IoT devices
ip link add link eth0 name eth0.30 type vlan id 30
ip addr add 192.168.30.1/24 dev eth0.30
- Configure firewall defaults:
ufw default deny incoming
ufw default allow outgoing
ufw allow from 192.168.1.0/24 to any port 22
- Enable automatic security updates:
apt install unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades
Pre-Installation Checklist
- Verify hardware compatibility
- Document physical network layout
- Prepare offline installation media
- Test UPS shutdown procedures
- Create a vault for cryptographic secrets (Passbolt or HashiCorp Vault)
Installation & Configuration Walkthrough
Proxmox VE Base Installation
# Download latest ISO from https://www.proxmox.com/en/downloads
dd if=proxmox-ve_8.0.iso of=/dev/sdX bs=4M conv=fsync
# Post-install configuration
pveceph install --version reef
pveam update
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst
ZFS Storage Pool Creation
# Identify disks for storage pool
lsblk -o NAME,SIZE,MODEL -d
# Create mirrored pool
zpool create -o ashift=12 tank mirror /dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-WX12D1234567 /dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-WX12D1234568
# Enable compression and regular scrubs
zfs set compression=lz4 tank
zpool set autotrim=on tank
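# Append the weekly scrub to the existing crontab instead of overwriting it
(crontab -l 2>/dev/null; echo "0 0 * * 0 /sbin/zpool scrub tank") | crontab -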
Docker Rootless Configuration
# Install prerequisites
sudo apt-get install uidmap dbus-user-session
# Configure rootless mode
dockerd-rootless-setuptool.sh install
# Verify operation
docker run --rm hello-world
# Enable lingering for service persistence
sudo loginctl enable-linger $USER
Automated Backups with Restic
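# Initialize repository (replace with your storage target)
restic init --repo sftp:user@backup-host:/restic-repos/homelab
# Create backup script /usr/local/bin/homelab-backup
#!/bin/bash
set -euo pipefail
# Tell restic which repository to use; without this the backup has no target
export RESTIC_REPOSITORY="sftp:user@backup-host:/restic-repos/homelab"
# Read the repository password from the pass password store
export RESTIC_PASSWORD="$(pass restic/homelab)"
restic backup \
  --exclude-caches \
  --exclude-file=/etc/restic/excludes \
  /etc /var/lib/docker/volumes /opt
# Create systemd service and timer, then enable the timer
# (/etc and /var/lib/docker/volumes need root to read in full; drop --user if the units are installed system-wide)
systemctl --user enable --now restic-backup.timer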
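The listing above enables `restic-backup.timer` without showing the units behind it. Here is a minimal sketch of what they could look like, written as a shell function that generates both files. The unit contents, the daily schedule, and the function name `write_restic_units` are my assumptions. Only the timer name and the script path come from the steps above.

```bash
#!/usr/bin/env bash
# Sketch: generate the service and timer units that the enable command expects.
# Unit contents and the daily schedule are assumptions, not a canonical setup.

write_restic_units() {
  local unit_dir="$1"   # e.g. ~/.config/systemd/user for --user units
  mkdir -p "$unit_dir"

  # One-shot service that runs the backup script once and exits.
  cat > "$unit_dir/restic-backup.service" <<'EOF'
[Unit]
Description=Restic backup of the homelab

[Service]
Type=oneshot
ExecStart=/usr/local/bin/homelab-backup
EOF

  # Timer that triggers the service daily and catches up after downtime.
  cat > "$unit_dir/restic-backup.timer" <<'EOF'
[Unit]
Description=Run the restic backup daily

[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=15m

[Install]
WantedBy=timers.target
EOF
}
```

Running `write_restic_units ~/.config/systemd/user` followed by `systemctl --user daemon-reload` should make the timer available to the enable command. `Persistent=true` is there so that a backup missed while the machine was off runs at the next boot.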
Configuration & Optimization
Security Hardening Checklist
- Container Isolation:
# docker-compose.yml security options
services:
  app:
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    read_only: true
- Network Policies:
# Create docker network with no external access
docker network create --internal secured-net
- Automated Vulnerability Scanning:
# Install trivy scanner
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
# Scan Docker images
trivy image --severity HIGH,CRITICAL myapp:latest
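Scanning a single image by hand does not scale once a host runs a dozen containers. A rough sketch of looping the same trivy command over a list of image names follows. The function name `scan_images` and the `FLAGGED` label are hypothetical, and a non-zero exit can also mean the scan itself failed rather than that a vulnerability was found.

```bash
#!/usr/bin/env bash
# Sketch: scan every image name read from stdin and list the ones trivy flags.

scan_images() {
  local image
  while read -r image; do
    # Skip blank lines and dangling images, which have no usable name.
    [[ -z "$image" || "$image" == *"<none>"* ]] && continue
    # --exit-code 1 makes trivy return non-zero when findings match the filter.
    # stdin is redirected so trivy cannot consume the remaining image names.
    if ! trivy image --severity HIGH,CRITICAL --exit-code 1 "$image" \
         < /dev/null > /dev/null; then
      echo "FLAGGED $image"
    fi
  done
}
```

One way to feed it is `docker image ls --format '{{.Repository}}:{{.Tag}}' | scan_images`, which prints only the images that need attention.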
Performance Optimization
ZFS Tunables for Mixed Workloads:
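# Cap ARC size at 16 GiB (50% of RAM on a 32 GB host); this setting does not persist across reboots
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
# Cache both data and metadata in ARC and L2ARC for database datasets ("all" is the default)
zfs set primarycache=all tank/databases
zfs set secondarycache=all tank/databases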
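The hard-coded byte count above only fits one RAM size. A small sketch that derives the value from installed memory and prints the module option line that would make it persistent. The function names are hypothetical, and the 50% default simply mirrors the figure used above.

```bash
#!/usr/bin/env bash
# Sketch: derive zfs_arc_max from installed RAM instead of hard-coding it.

# Convert a MemTotal value in kB and a percentage into bytes.
arc_max_bytes() {
  local mem_kb="$1" percent="$2"
  echo $(( mem_kb * 1024 * percent / 100 ))
}

# Print the modprobe option line for the given percentage (default 50).
arc_modprobe_line() {
  local mem_kb
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  echo "options zfs zfs_arc_max=$(arc_max_bytes "$mem_kb" "${1:-50}")"
}
```

The output of `arc_modprobe_line 50` can be placed in `/etc/modprobe.d/zfs.conf` so the limit is applied when the module loads. Check that file for existing `options zfs` lines before overwriting it.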
Container Resource Constraints:
# docker-compose.yml resource limits
services:
  db:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '0.5'
          memory: 512M
Monitoring Stack Configuration
# prometheus.yml excerpt
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']
  - job_name: 'docker'
    static_configs:
      - targets: ['docker-host:9323']
Usage & Operational Procedures
Daily Maintenance Checklist
- Check backup status:
restic snapshots --latest 3
- Verify container health:
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.Ports}}"
- Review security updates:
apt list --upgradable
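A checklist is easier to keep up with when each item boils down to a number. As a small sketch, the pending-update check can be reduced to a count. The function name `count_upgradable` is hypothetical, and it assumes the usual `[upgradable from: ...]` suffix that apt prints on each package line.

```bash
#!/usr/bin/env bash
# Sketch: turn `apt list --upgradable` output into a single number.

count_upgradable() {
  # Package lines end with "[upgradable from: <old version>]".
  # grep -c exits non-zero on zero matches, so mask that with `|| true`.
  grep -c 'upgradable from:' || true
}
```

Typical use would be `apt list --upgradable 2>/dev/null | count_upgradable`, with stderr discarded because apt warns that its CLI output is not a stable interface.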
Disaster Recovery Workflow
Database Restoration:
# Identify latest backup
restic snapshots --path /var/lib/docker/volumes/db_data
# Restore to temporary location
restic restore latest --target /tmp/restore --include /var/lib/docker/volumes/db_data
# Replace volume contents
docker stop db
rsync -av --delete /tmp/restore/var/lib/docker/volumes/db_data/ /var/lib/docker/volumes/db_data/
docker start db
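The workflow above copies data back but never checks it. One way to run a restore drill is to compare a restored tree against a known-good copy by content hash. This is a minimal sketch, and the function names `tree_checksums` and `verify_restore` are hypothetical. It assumes GNU `find`, `sort`, `xargs`, and `sha256sum`.

```bash
#!/usr/bin/env bash
# Sketch: compare a restored tree against a reference tree by content hash.

# Print "<sha256>  <relative path>" for every regular file under a directory.
tree_checksums() {
  ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum )
}

# Succeed only if both trees contain the same files with the same contents.
verify_restore() {
  local reference="$1" restored="$2"
  diff <(tree_checksums "$reference") <(tree_checksums "$restored") > /dev/null
}
```

For a meaningful comparison the reference has to be static, for example the source volume while the container is stopped right after a backup. A live database volume will differ from any restore by design.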
Capacity Planning Metrics
Monitor these key metrics:
| Metric | Warning Threshold | Critical Threshold | Collection Method |
|---|---|---|---|
| ZFS pool capacity | ≥ 80% | ≥ 90% | zpool list |
| ARC hit rate | < 90% | < 80% | node_exporter |
| Docker node memory | ≥ 70% | ≥ 85% | cAdvisor |
| Network throughput | ≥ 70% of link speed | ≥ 90% of link speed | prometheus/node_exporter |
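The pool capacity row can be checked without a full monitoring stack. A sketch that classifies usage against the 80% and 90% thresholds from the table follows. The function names are hypothetical, and it expects input in the shape printed by `zpool list -H -o name,capacity`.

```bash
#!/usr/bin/env bash
# Sketch: classify pool usage against the warning/critical thresholds in the table.

# classify_usage <percent> [warn] [crit] -> prints ok, warning, or critical
classify_usage() {
  local used="${1%\%}" warn="${2:-80}" crit="${3:-90}"   # strip a trailing %
  if   (( used >= crit )); then echo critical
  elif (( used >= warn )); then echo warning
  else echo ok
  fi
}

# Read "name capacity" pairs from stdin and report each pool's status.
report_pools() {
  local name cap
  while read -r name cap; do
    printf '%s %s %s\n' "$name" "$cap" "$(classify_usage "$cap")"
  done
}
```

Piping `zpool list -H -o name,capacity` into `report_pools` gives one line per pool, which is easy to grep for `critical` in a cron job or a login banner.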
Troubleshooting Common Issues
Backup Failures
Symptoms: Restic exits with Fatal: unable to open repo
# Check that the repository is reachable and its structure is consistent
restic --repo sftp:user@backup-host:/restic-repos/homelab check
# Verify SSH key permissions
chmod 600 ~/.ssh/restic-key
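Rather than fixing key permissions after a failed run, a backup script can check them up front. A minimal sketch follows. The function name `key_perms_ok` is hypothetical, and it assumes GNU `stat`.

```bash
#!/usr/bin/env bash
# Sketch: fail early if the private key is accessible to group or others.

key_perms_ok() {
  local key="$1" mode
  # Return 2 if the file cannot be inspected at all (missing, unreadable dir).
  mode="$(stat -c '%a' "$key")" || return 2
  # OpenSSH refuses private keys that anyone but the owner can access.
  [[ "$mode" == "600" || "$mode" == "400" ]]
}
```

A line such as `key_perms_ok ~/.ssh/restic-key || exit 1` near the top of the backup script turns a confusing repository error into an obvious one.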
Container Networking Problems
Diagnosis Steps:
# Inspect container network
docker inspect "$CONTAINER_ID" --format '{{json .NetworkSettings.Networks}}'
# Test DNS resolution
docker exec -it $CONTAINER_ID nslookup github.com
# Check iptables rules
iptables -L DOCKER-USER -v
ZFS Performance Degradation
Diagnosis:
# Check scrub status
zpool status tank
# Monitor ARC efficiency
arc_summary | grep -i -A 10 "ARC size"
# Identify slow disks
zpool iostat -v 1
Conclusion
Through a decade of self-hosting misadventures, one truth emerges: resilience isn’t about avoiding failures but engineering systems that fail gracefully. The difference between a catastrophic outage and a minor inconvenience often comes down to foundational practices implemented before disaster strikes.
The most valuable lesson? Start simple. A single external drive backup beats no backup. A basic Docker Compose file is better than undocumented manual installs. Incremental improvements compound over time - my current robust environment evolved from years of iterative enhancements, not overnight transformations.
For those embarking on their self-hosting journey, prioritize these fundamentals:
- Automated Backups: Validate recovery weekly
- Immutable Infrastructure: Treat servers as cattle, not pets
- Observability: You can’t manage what you can’t measure
- Documentation: Future you will thank past you
- Security Boundaries: Assume breach and contain damage
Further Resources
- Proxmox VE Documentation - Official virtualization platform docs
- ZFS Best Practices Guide - Storage configuration recommendations
- Docker Security Best Practices - Container hardening techniques
- Restic Manual - Backup system documentation
- Linux Server Hardening Guide - Comprehensive security reference
Self-hosting remains one of the most effective ways to develop infrastructure expertise - provided you learn from others’ mistakes before making them yourself. The road ahead is filled with challenging problems waiting to be solved, and now you’re better equipped to solve them.