My First Homelab Setup: A DevOps Engineer’s Practical Guide
Introduction
The Reddit post showing a homelab running on “a mix of Ethernet and ethanol” perfectly captures the essence of this journey. As DevOps engineers and sysadmins, our professional expertise often collides with the chaotic reality of personal infrastructure projects. This guide documents a practical, production-grade approach to building your first homelab - complete with the inevitable troubleshooting whiskey bottle within arm’s reach.
Homelabs serve as critical learning environments where we experiment with technologies too risky for production systems. They’re sandboxes for mastering infrastructure-as-code, container orchestration, network segmentation, and high-availability configurations. Unlike cloud playgrounds that disappear with a billing cycle, physical homelabs provide tangible experience with hardware limitations, thermal constraints, and real-world failure scenarios.
In this guide, you’ll learn:
- Hardware selection balancing performance and power efficiency
- Enterprise-grade virtualization using Proxmox VE
- Container orchestration with Docker Swarm (Kubernetes-light)
- Network segmentation and security best practices
- Monitoring and alerting stack implementation
- Automated backup strategies for bare-metal recovery
- Power management and operational cost optimization
We’ll implement these solutions while acknowledging the “Fireball whiskey” reality of homelab operations - where high availability sometimes means having spare hardware in the closet and backups might consist of external drives in a fireproof safe.
Understanding Homelab Fundamentals
What Exactly is a Homelab?
A homelab is a personal technology sandbox that replicates enterprise infrastructure environments at a smaller scale. Unlike corporate data centers, homelabs typically:
- Run on consumer-grade or decommissioned enterprise hardware
- Prioritize learning over uptime (though we pretend otherwise)
- Combine production services (media servers, file shares) with experimental setups
- Operate within residential power and thermal constraints
Key Homelab Components
| Component | Enterprise Equivalent | Homelab Reality |
|---|---|---|
| Compute | VMware ESXi Cluster | Used Intel NUC/ProLiant DL |
| Storage | SAN/NAS with SSD caching | ZFS array in old PC case |
| Networking | Cisco/Juniper stack | Ubiquiti Dream Machine SE |
| Backup | Veeam/Commvault | Rclone to Backblaze B2 |
| Monitoring | Datadog/New Relic | Prometheus+Grafana VM |
| High Availability | Redundant power/network | Single PSU with UPS backup |
The Homelab Evolution Cycle
- Phase 1 - The Accidental Server: Old desktop running Plex
- Phase 2 - Virtualization Enlightenment: Proxmox/Hyper-V cluster
- Phase 3 - Network Segmentation: VLANs and pfSense firewall
- Phase 4 - Infrastructure as Code: Terraform/Ansible adoption
- Phase 5 - The Whiskey Phase: Realization that Kubernetes on Raspberry Pis was a bad idea
Why Homelabs Matter for DevOps Professionals
- Risk-Free Experimentation: Test kernel updates, breaking changes, and security patches without career consequences
- Deep Technology Understanding: Learn how storage actually works when you lose a ZFS vdev
- Troubleshooting Skills: Develop patience when debugging why NFS shares disappear after reboots
- Budget-Constrained Creativity: Implement HAProxy because you can’t afford an F5 BIG-IP
Prerequisites
Hardware Requirements (Minimum)
| Component | Specification | Notes |
|---|---|---|
| Host Machine | Intel i5 8th Gen / Ryzen 5 3600 | AES-NI for encryption, VT-d/AMD-V |
| RAM | 16GB DDR4 | ECC recommended for ZFS |
| Storage | 2x500GB SSD (Boot/VM) + 2x4TB HDD | Separate boot/media storage |
| Network | Dual Gigabit NIC | VLAN separation for management/data |
| Power | UPS 650VA | Protect against dirty power |
Software Requirements
- Hypervisor: Proxmox VE 8.x
- Containers: Docker 24.x
- Networking: Open vSwitch 3.1.x
- Monitoring: Prometheus 2.47 + Grafana 10.1
Network Planning
Create this VLAN structure before installation:
| VLAN ID | Purpose | Subnet | DHCP Scope |
|---|---|---|---|
| 10 | Management | 192.168.10.0/24 | .100-.200 |
| 20 | Services | 192.168.20.0/24 | .100-.200 |
| 30 | IoT | 192.168.30.0/24 | .100-.200 |
| 40 | Guest | 192.168.40.0/24 | .100-.200 |
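Before racking anything, confirm the switch actually tags these VLANs. A quick test from any Linux machine plugged into a trunk port (a minimal sketch, assuming iproute2; eno1 is a placeholder interface name, and the addressing follows the Services VLAN above):

```bash
# Create a tagged subinterface for VLAN 20 (Services) on the trunk port
ip link add link eno1 name eno1.20 type vlan id 20
ip addr add 192.168.20.50/24 dev eno1.20   # static, outside the .100-.200 DHCP scope
ip link set eno1.20 up

# The Services gateway should answer if the switch passes tagged frames
ping -c 3 192.168.20.1

# Clean up the test interface
ip link del eno1.20
```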
Security Checklist
- Physically secure server location (no “basement flooding” disasters)
- Disable IPMI default credentials
- Plan firewall rules between VLANs
- Generate SSH keys for host access
- Note down all MAC addresses for port security
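For the SSH key item, ed25519 keys plus disabling password logins covers most of the attack surface (a sketch; the management IP matches the VLAN 10 plan above):

```bash
# Generate a modern key pair on your workstation
ssh-keygen -t ed25519 -C "homelab-admin"

# Install the public key on the future Proxmox host
ssh-copy-id root@192.168.10.101

# Then, on the host, disable password authentication entirely
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
systemctl reload sshd
```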
Installation & Configuration
Proxmox VE Installation
# Download latest ISO
wget https://download.proxmox.com/iso/proxmox-ve_8.1-1.iso
# Create bootable USB (Linux example)
sudo dd if=proxmox-ve_8.1-1.iso of=/dev/sdb bs=4M status=progress conv=fdatasync
# Installation steps:
# 1. Select "Install Proxmox VE"
# 2. Set country, timezone, and keyboard layout
# 3. Password for root@host (use 32+ char password)
# 4. Configure management interface on VLAN 10
# 5. Use ZFS mirror for boot drives
Post-install configuration:
# Disable the enterprise repo (requires a subscription)
sed -i 's/^deb/#deb/' /etc/apt/sources.list.d/pve-enterprise.list
# Add the no-subscription repo (Proxmox VE 8.x is based on Debian Bookworm)
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list
# Update and upgrade
apt update && apt dist-upgrade -y
# Install common tools
apt install tmux zsh git curl net-tools openvswitch-switch -y
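A quick sanity check after the upgrade confirms the host is ready for VMs (pveversion, zpool, and ovs-vsctl are all present after the steps above):

```bash
# Confirm the PVE version and kernel after the dist-upgrade
pveversion --verbose | head -n 5

# Verify the ZFS boot mirror is healthy before building on top of it
zpool status rpool

# Confirm Open vSwitch is installed for the networking section below
ovs-vsctl --version
```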
Docker Swarm Setup
On first manager node:
# Install Docker
curl -fsSL https://get.docker.com | sh
# Initialize swarm
docker swarm init --advertise-addr 192.168.10.101
# Get join token
docker swarm join-token worker
# Sample output:
# docker swarm join --token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c 192.168.10.101:2377
On worker nodes:
docker swarm join --token <TOKEN> 192.168.10.101:2377
Verify cluster status:
docker node ls
# Example output:
ID                          HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
l4gku8f7a8j5zq6z9x3x9x3x9 * node1      Ready    Active         Leader           24.0.7
3x9x3x9x3x9x3x9x3x9x3x9x3   node2      Ready    Active         Reachable        24.0.7
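With both nodes Ready, a throwaway service verifies scheduling and the overlay data path end to end (the service name, published port, and image here are arbitrary):

```bash
# Deploy a two-replica test service across the swarm
docker service create --name smoke-test --replicas 2 -p 8088:80 nginx:alpine

# Confirm a task landed on each node and reached "Running"
docker service ps smoke-test

# Remove the test service once verified
docker service rm smoke-test
```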
Network Configuration with Open vSwitch
# Create bridge for VM traffic
ovs-vsctl add-br vmbr0
# eno1 carries untagged management traffic (VLAN 10 as native)
ovs-vsctl add-port vmbr0 eno1 tag=10 vlan_mode=native-untagged
# eno2 trunks all homelab VLANs
ovs-vsctl add-port vmbr0 eno2 trunks=10,20,30,40

# Create internal VLAN interfaces on the bridge
ovs-vsctl add-port vmbr0 mgmt0 tag=10 -- set interface mgmt0 type=internal
ovs-vsctl add-port vmbr0 services0 tag=20 -- set interface services0 type=internal

# Persistent configuration
cat << EOF > /etc/network/interfaces.d/ovs
auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports eno1 eno2 mgmt0 services0

allow-vmbr0 mgmt0
iface mgmt0 inet static
    address 192.168.10.101/24
    gateway 192.168.10.1
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=10

allow-vmbr0 services0
iface services0 inet manual
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=20
EOF
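Verify the bridge layout before rebooting; ovs-vsctl show plus a ping from the management interface catches most mis-tagged ports:

```bash
# Dump bridges, ports, and the VLAN tags OVS actually applied
ovs-vsctl show

# The management gateway should answer via the VLAN 10 internal port
ping -c 3 -I mgmt0 192.168.10.1
```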
Configuration & Optimization
Proxmox Resource Allocation Best Practices
- CPU Pinning: Assign dedicated cores to critical VMs
qm set 100 --affinity 0-3
- Memory Ballooning: Allow dynamic memory allocation
qm set 100 -balloon 1024
- Storage Tiering:
- SSD: VM operating systems and databases
- HDD: Media storage and backups
- NVMe: Ceph or ZFS caching
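One way to express that tiering in Proxmox itself (a sketch using the standard pvesm CLI; the storage IDs, pool, and path names are placeholders for your own layout):

```bash
# SSD-backed ZFS dataset for VM operating systems and databases
pvesm add zfspool ssd-vms --pool ssdpool/vms --content images,rootdir

# HDD-backed directory storage for media and backup targets
pvesm add dir bulk --path /tank/bulk --content backup,iso

# Confirm the resulting storage layout
pvesm status
```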
Docker Swarm Security Hardening
- Enable autolock (protects the keys that encrypt the swarm’s Raft logs) and trim task history:
docker swarm update --task-history-limit 50 --autolock=true
- Create container security profiles:
{
  "defaults": {
    "user": "nobody",
    "no-new-privileges": true
  },
  "sysctls": {
    "net.ipv4.tcp_syncookies": "1",
    "net.ipv4.conf.all.rp_filter": "1"
  }
}
- Implement resource constraints (deployment sketch after this list):
services:
  nginx:
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
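Deploying the constrained service and confirming the limits took effect (a sketch; the stack name and compose filename are placeholders):

```bash
# Deploy the compose file above as a swarm stack
docker stack deploy -c nginx-stack.yml web

# Inspect the running service's resource spec
docker service inspect web_nginx --format '{{json .Spec.TaskTemplate.Resources}}'
```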
Monitoring Stack Implementation
Prometheus configuration for Proxmox:
# prometheus.yml
scrape_configs:
  - job_name: 'proxmox'
    metrics_path: '/pve'
    params:
      module: [proxmox]
    static_configs:
      - targets: ['192.168.10.101']  # the Proxmox node to scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.101:9221  # the pve_exporter address
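This scrape job assumes prometheus-pve-exporter is listening on port 9221. One way to set it up (a sketch; the user and token names are placeholders, and the exporter's invocation flags vary by release):

```bash
# On the Proxmox host: read-only user and API token for the exporter
pveum user add monitoring@pve
pveum acl modify / --users monitoring@pve --roles PVEAuditor
pveum user token add monitoring@pve exporter --privsep 0

# Install and run the exporter (github.com/prometheus-pve/prometheus-pve-exporter)
pip install prometheus-pve-exporter
pve_exporter --config.file /etc/prometheus/pve_exporter.yml
```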
Grafana dashboard import via docker-compose:
version: '3.8'
services:
  grafana:
    image: grafana/grafana:10.1.0
    volumes:
      - grafana_data:/var/lib/grafana
      - ./dashboards:/etc/grafana/provisioning/dashboards
    deploy:
      mode: replicated
      replicas: 1
      resources:
        limits:
          memory: 512M
    networks:
      - metrics

volumes:
  grafana_data:

networks:
  metrics:
    driver: overlay
    attachable: true
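Deploying the stack and watching provisioning happen (the stack name and filename are placeholders):

```bash
# Deploy the monitoring stack onto the swarm
docker stack deploy -c grafana-stack.yml monitoring

# Tail Grafana's logs to confirm the provisioned dashboards load
docker service logs -f monitoring_grafana
```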
Operations & Maintenance
Daily Operations Checklist
- Resource Monitoring:
# Check Proxmox cluster status
pvecm status
# Docker node health (tasks on a given node)
docker node ps $NODE_ID
# ZFS pool status
zpool status -v
- Backup Verification:
# List Proxmox backups
pvesm list local-backup
# Test Docker volume restore
docker run --rm -v backup_verify:/data alpine ls -l /data
- Security Updates:
# Proxmox updates
apt update && apt dist-upgrade -y
# Docker image updates
docker images | awk '(NR>1) && ($2!="<none>") {print $1":"$2}' | xargs -L1 docker pull
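All three checks fold neatly into one script for cron or a systemd timer (a minimal sketch; swap the echo for your own alerting hook):

```bash
#!/usr/bin/env bash
# daily-checks.sh - morning health sweep; exits non-zero on the first failure
set -u

fail() { echo "CHECK FAILED: $1" >&2; exit 1; }

# Proxmox cluster quorum
pvecm status > /dev/null 2>&1 || fail "Proxmox cluster status"

# Every swarm node should report Ready
docker node ls --format '{{.Hostname}} {{.Status}}' | grep -qv 'Ready' \
    && fail "one or more swarm nodes are not Ready"

# zpool status -x prints a single healthy line unless a pool has problems
zpool status -x | grep -q 'all pools are healthy' || fail "ZFS pool errors detected"

echo "All checks passed: $(date -Is)"
```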
Backup Strategy Implementation
Three-tier backup approach:
- Local Snapshots (15-minute intervals):
# ZFS automated snapshots (consumed by zfs-auto-snapshot)
zfs set com.sun:auto-snapshot=true rpool/data
- NAS Replication (Nightly):
# ZFS send/receive to backup server
zfs send rpool/data@snap-20240501 | ssh backup-host zfs recv backup/data
- Cloud Backup (Weekly):
# Rclone encrypted backup to B2
rclone sync /backup b2:homelab-backup --b2-hard-delete --transfers 16
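The nightly tier works best as incremental sends driven by a timer; a minimal sketch, assuming daily snapshots named by date and an SSH alias backup-host:

```bash
#!/usr/bin/env bash
# replicate.sh - incremental ZFS replication to the backup host
set -euo pipefail

DATASET="rpool/data"
TODAY="snap-$(date +%Y%m%d)"
YESTERDAY="snap-$(date -d yesterday +%Y%m%d)"

# Snapshot today, then send only the delta since yesterday
zfs snapshot "${DATASET}@${TODAY}"
zfs send -i "${DATASET}@${YESTERDAY}" "${DATASET}@${TODAY}" \
    | ssh backup-host zfs recv -F backup/data
```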
Scaling Considerations
When your whiskey collection outgrows your rack space:
- Vertical Scaling:
- Add NVMe caching layer
- Upgrade to 10Gbps networking
- Implement ECC memory
- Horizontal Scaling:
- Add compute nodes with identical specs
- Deploy Ceph distributed storage
- Implement load balancing with HAProxy (minimal config sketch after this list)
- Density Optimization:
- Replace towers with rack-mounted servers
- Implement PoE switches for low-power devices
- Consolidate services through containerization
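For the HAProxy item above, a minimal round-robin TCP frontend across two swarm nodes looks like this (a sketch; the IPs, port, and backend names are placeholders):

```bash
# Append a minimal layer-4 load-balancer config
cat << 'EOF' >> /etc/haproxy/haproxy.cfg
frontend web_in
    bind *:80
    mode tcp
    default_backend swarm_nodes

backend swarm_nodes
    mode tcp
    balance roundrobin
    server node1 192.168.20.101:8080 check
    server node2 192.168.20.102:8080 check
EOF

systemctl reload haproxy
```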
Troubleshooting
Common Issues and Resolutions
Problem: All VMs lose network connectivity after switch reboot
Solution:
# Reinitialize OVS bridges
systemctl restart openvswitch-switch
# Verify bridge mappings
ovs-vsctl show
Problem: Docker swarm nodes show “Unreachable” status
Debugging:
# Check swarm manager logs
journalctl -u docker.service --since "10 minutes ago"
# Verify firewall rules
iptables -L DOCKER-USER -v -n
# Test overlay network
docker network create -d overlay --attachable test-net
Problem: ZFS pool reports “corrupted data” errors
Recovery:
# Scrub pool
zpool scrub tank
# Check errors
zpool status -v
# Roll back to the last snapshot taken before the corruption
zfs rollback tank/data@snap-pre-corruption
Performance Tuning
- Disk I/O Optimization:
# Set ZFS ARC limit (16 GiB)
echo $((16 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max
# Adjust VM disk scheduler
echo kyber > /sys/block/sda/queue/scheduler