My First Homelab Setup: A DevOps Engineer’s Practical Guide

Introduction

The Reddit post showing a homelab running on “a mix of Ethernet and ethanol” perfectly captures the essence of this journey. As DevOps engineers and sysadmins, our professional expertise often collides with the chaotic reality of personal infrastructure projects. This guide documents a practical, production-grade approach to building your first homelab - complete with the inevitable troubleshooting whiskey bottle within arm’s reach.

Homelabs serve as critical learning environments where we experiment with technologies too risky for production systems. They’re sandboxes for mastering infrastructure-as-code, container orchestration, network segmentation, and high-availability configurations. Unlike cloud playgrounds that disappear with a billing cycle, physical homelabs provide tangible experience with hardware limitations, thermal constraints, and real-world failure scenarios.

In this guide, you’ll learn:

  1. Hardware selection balancing performance and power efficiency
  2. Enterprise-grade virtualization using Proxmox VE
  3. Container orchestration with Docker Swarm (Kubernetes-light)
  4. Network segmentation and security best practices
  5. Monitoring and alerting stack implementation
  6. Automated backup strategies for bare-metal recovery
  7. Power management and operational cost optimization

We’ll implement these solutions while acknowledging the “Fireball whiskey” reality of homelab operations - where high availability sometimes means having spare hardware in the closet and backups might consist of external drives in a fireproof safe.

Understanding Homelab Fundamentals

What Exactly is a Homelab?

A homelab is a personal technology sandbox that replicates enterprise infrastructure environments at a smaller scale. Unlike corporate data centers, homelabs typically:

  • Run on consumer-grade or decommissioned enterprise hardware
  • Prioritize learning over uptime (though we pretend otherwise)
  • Combine production services (media servers, file shares) with experimental setups
  • Operate within residential power and thermal constraints

Key Homelab Components

| Component | Enterprise Equivalent | Homelab Reality |
| --- | --- | --- |
| Compute | VMware ESXi cluster | Used Intel NUC / ProLiant DL |
| Storage | SAN/NAS with SSD caching | ZFS array in an old PC case |
| Networking | Cisco/Juniper stack | Ubiquiti Dream Machine SE |
| Backup | Veeam/Commvault | Rclone to Backblaze B2 |
| Monitoring | Datadog/New Relic | Prometheus + Grafana VM |
| High Availability | Redundant power/network | Single PSU with UPS backup |

The Homelab Evolution Cycle

  1. Phase 1 - The Accidental Server: Old desktop running Plex
  2. Phase 2 - Virtualization Enlightenment: Proxmox/Hyper-V cluster
  3. Phase 3 - Network Segmentation: VLANs and pfSense firewall
  4. Phase 4 - Infrastructure as Code: Terraform/Ansible adoption
  5. Phase 5 - The Whiskey Phase: Realization that Kubernetes on Raspberry Pis was a bad idea

Why Homelabs Matter for DevOps Professionals

  1. Risk-Free Experimentation: Test kernel updates, breaking changes, and security patches without career consequences
  2. Deep Technology Understanding: Learn how storage actually works when you lose a ZFS vdev
  3. Troubleshooting Skills: Develop patience when debugging why NFS shares disappear after reboots
  4. Budget Constraints Creativity: Implement HAProxy because you can’t afford F5 BIG-IP

Prerequisites

Hardware Requirements (Minimum)

| Component | Specification | Notes |
| --- | --- | --- |
| Host Machine | Intel i5 8th Gen / Ryzen 5 3600 | AES-NI for encryption, VT-d/AMD-V |
| RAM | 16GB DDR4 | ECC recommended for ZFS |
| Storage | 2x 500GB SSD (boot/VMs) + 2x 4TB HDD | Separate boot and media storage |
| Network | Dual Gigabit NIC | VLAN separation for management/data |
| Power | UPS, 650VA | Protect against dirty power |

Software Requirements

  • Hypervisor: Proxmox VE 8.x
  • Containers: Docker 24.x
  • Networking: Open vSwitch 3.1.x
  • Monitoring: Prometheus 2.47 + Grafana 10.1

Network Planning

Create this VLAN structure before installation:

| VLAN ID | Purpose | Subnet | DHCP Scope |
| --- | --- | --- | --- |
| 10 | Management | 192.168.10.0/24 | .100-.200 |
| 20 | Services | 192.168.20.0/24 | .100-.200 |
| 30 | IoT | 192.168.30.0/24 | .100-.200 |
| 40 | Guest | 192.168.40.0/24 | .100-.200 |
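Small as this plan is, it pays to expand it into concrete addresses before touching the router. A minimal bash sketch (the gateways at `.1` are an assumption; the IDs, subnets, and DHCP scopes come from the table):

```shell
# Print the planned addressing for each VLAN so the scheme can be
# reviewed before it is configured on the router/switch.
printf '%-8s %-18s %-16s %s\n' "VLAN" "Subnet" "Gateway" "DHCP scope"
for vlan in 10 20 30 40; do
    subnet="192.168.${vlan}.0/24"
    gateway="192.168.${vlan}.1"         # assumed gateway convention
    scope="192.168.${vlan}.100 - 192.168.${vlan}.200"
    printf '%-8s %-18s %-16s %s\n' "$vlan" "$subnet" "$gateway" "$scope"
done
```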

Security Checklist

  1. Physically secure server location (no “basement flooding” disasters)
  2. Disable IPMI default credentials
  3. Plan firewall rules between VLANs
  4. Generate SSH keys for host access
  5. Note down all MAC addresses for port security
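For item 4, ed25519 keys are a sensible default. A minimal sketch (the key path and comment are examples, not a convention this guide prescribes):

```shell
# Generate an ed25519 keypair for host access.
# The file name and comment below are examples -- adjust to taste.
mkdir -p "$HOME/.ssh"
ssh-keygen -t ed25519 -a 100 -C "homelab-admin" \
    -f "$HOME/.ssh/homelab_ed25519" -N ""

# Then push the public key to each host, e.g.:
# ssh-copy-id -i ~/.ssh/homelab_ed25519.pub root@192.168.10.101
```

Using a dedicated key for the homelab keeps it easy to revoke later without touching keys used elsewhere.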

Installation & Configuration

Proxmox VE Installation

```shell
# Download latest ISO
wget https://download.proxmox.com/iso/proxmox-ve_8.1-1.iso

# Create bootable USB (Linux example -- double-check the target device first!)
sudo dd if=proxmox-ve_8.1-1.iso of=/dev/sdb bs=4M status=progress conv=fdatasync

# Installation steps:
# 1. Select "Install Proxmox VE"
# 2. Set country, timezone, and keyboard layout
# 3. Set the root password (use a long, unique password)
# 4. Configure the management interface on VLAN 10
# 5. Use a ZFS mirror for the boot drives
```
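Before writing the USB, verify the ISO against the checksum Proxmox publishes on its download page. The pattern looks like this (demonstrated on a stand-in file here, since the real checksum belongs to the real ISO):

```shell
# Stand-in for the downloaded ISO; in practice you already have the
# real file from wget and the published checksum from the website.
printf 'stand-in iso contents\n' > proxmox-ve_8.1-1.iso

# The SHA256SUMS file would normally be the published checksum list.
sha256sum proxmox-ve_8.1-1.iso > SHA256SUMS

# Verify -- prints "<file>: OK" when the checksum matches.
sha256sum -c SHA256SUMS
```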

Post-install configuration:

```shell
# Disable the enterprise repo (it requires a subscription)
echo "# deb https://enterprise.proxmox.com/debian/pve bookworm pve-enterprise" > /etc/apt/sources.list.d/pve-enterprise.list

# Add the no-subscription repo
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list

# Update and upgrade
apt update && apt dist-upgrade -y

# Install common tools
apt install tmux zsh git curl net-tools openvswitch-switch -y
```

Docker Swarm Setup

On first manager node:

```shell
# Install Docker
curl -fsSL https://get.docker.com | sh

# Initialize swarm
docker swarm init --advertise-addr 192.168.10.101

# Get join token
docker swarm join-token worker

# Sample output:
# docker swarm join --token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c 192.168.10.101:2377
```

On worker nodes:

```shell
docker swarm join --token <TOKEN> 192.168.10.101:2377
```

Verify cluster status:

```shell
docker node ls

# Expected output (IDs and versions will differ):
# ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
# l4gku8f7a8j5zq6z9x3x9x3x9     node1      Ready     Active         Leader           24.0.7
# 3x9x3x9x3x9x3x9x3x9x3x9x3     node2      Ready     Active         Reachable        24.0.7
```
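Because the column layout of `docker node ls` is stable, the health check is easy to script. A sketch, run here against the sample output (on a live swarm, feed it `docker node ls` instead of the canned `$sample`):

```shell
# Sample docker node ls output, used so the sketch runs offline.
sample='ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
l4gku8f7a8j5zq6z9x3x9x3x9     node1      Ready     Active         Leader           24.0.7
3x9x3x9x3x9x3x9x3x9x3x9x3     node2      Ready     Active         Reachable        24.0.7'

# STATUS is the third whitespace-separated column; skip the header row.
ready=$(printf '%s\n' "$sample" | awk 'NR > 1 && $3 == "Ready"' | wc -l)
echo "nodes ready: $ready"
```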

Network Configuration with Open vSwitch

```shell
# Create bridge for VM traffic
ovs-vsctl add-br vmbr0
ovs-vsctl add-port vmbr0 eno1 tag=10 vlan_mode=native-untagged
ovs-vsctl add-port vmbr0 eno2 trunks=10,20,30,40

# Create internal VLAN interfaces on the bridge
ovs-vsctl add-port vmbr0 mgmt0 tag=10 -- set Interface mgmt0 type=internal
ovs-vsctl add-port vmbr0 services0 tag=20 -- set Interface services0 type=internal

# Persistent configuration
cat << EOF > /etc/network/interfaces.d/ovs
auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports eno1 eno2 mgmt0 services0

allow-vmbr0 mgmt0
iface mgmt0 inet static
    address 192.168.10.101/24
    gateway 192.168.10.1
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=10

allow-vmbr0 services0
iface services0 inet manual
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=20
EOF
```
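A typo in this file means no management network after the next reboot, so a quick grep-based sanity check is cheap insurance. A sketch, run here against a local copy of the fragment (on the host, point it at `/etc/network/interfaces.d/ovs`):

```shell
# Write a copy of the fragment to a scratch file, then confirm each
# expected internal port stanza is present before restarting networking.
cfg=ovs-interfaces-check.conf
cat << 'EOF' > "$cfg"
allow-vmbr0 mgmt0
iface mgmt0 inet static
    ovs_type OVSIntPort
    ovs_options tag=10

allow-vmbr0 services0
iface services0 inet manual
    ovs_type OVSIntPort
    ovs_options tag=20
EOF

for port in mgmt0 services0; do
    grep -q "^allow-vmbr0 ${port}" "$cfg" && echo "${port}: defined"
done
```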

Configuration & Optimization

Proxmox Resource Allocation Best Practices

  1. CPU Pinning: Assign dedicated cores to critical VMs (PVE 7.3+):

    ```shell
    qm set 100 --affinity 0-3
    ```

  2. Memory Ballooning: Allow dynamic memory allocation down to a minimum target:

    ```shell
    qm set 100 --balloon 1024
    ```
  3. Storage Tiering:
    • SSD: VM operating systems and databases
    • HDD: Media storage and backups
    • NVMe: Ceph or ZFS caching

Docker Swarm Security Hardening

  1. Enable autolock, so the keys that encrypt swarm state must be unlocked after a restart:

    ```shell
    docker swarm update --task-history-limit 50 --autolock=true
    ```

  2. Create container security profiles:

    ```json
    {
      "defaults": {
        "user": "nobody",
        "no-new-privileges": true
      },
      "sysctls": {
        "net.ipv4.tcp_syncookies": "1",
        "net.ipv4.conf.all.rp_filter": "1"
      }
    }
    ```

  3. Implement resource constraints:

    ```yaml
    services:
      nginx:
        deploy:
          resources:
            limits:
              cpus: '0.50'
              memory: 512M
            reservations:
              cpus: '0.25'
              memory: 256M
    ```

Monitoring Stack Implementation

Prometheus configuration for Proxmox:

```yaml
# prometheus.yml -- scrape the pve-exporter on the Proxmox host
scrape_configs:
  - job_name: 'proxmox'
    metrics_path: '/pve'
    params:
      module: [proxmox]
    static_configs:
      - targets: ['192.168.10.101:9221']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.10.101:9221
```

Grafana dashboard import via docker-compose:

```yaml
version: '3.8'

services:
  grafana:
    image: grafana/grafana:10.1.0
    volumes:
      - grafana_data:/var/lib/grafana
      - ./dashboards:/etc/grafana/provisioning/dashboards
    deploy:
      mode: replicated
      replicas: 1
      resources:
        limits:
          memory: 512M
    networks:
      - metrics

volumes:
  grafana_data:

networks:
  metrics:
    driver: overlay
    attachable: true
```
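The `./dashboards` bind mount only takes effect if Grafana finds a provider definition in that directory. A minimal sketch of a `dashboards.yml` to drop next to the exported JSON dashboards (the provider name and folder are example values):

```yaml
# ./dashboards/dashboards.yml -- dashboard provisioning provider
apiVersion: 1
providers:
  - name: homelab          # example provider name
    folder: Homelab        # Grafana folder the dashboards land in
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /etc/grafana/provisioning/dashboards
```

With this in place, any dashboard JSON dropped into `./dashboards` appears in Grafana without clicking through the import wizard.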

Operations & Maintenance

Daily Operations Checklist

  1. Resource Monitoring:

    ```shell
    # Check Proxmox cluster status
    pvecm status

    # Docker task state on every node
    docker node ps $(docker node ls -q)

    # ZFS pool status
    zpool status -v
    ```

  2. Backup Verification:

    ```shell
    # List Proxmox backups
    pvesm list local-backup

    # Test Docker volume restore
    docker run --rm -v backup_verify:/data alpine ls -l /data
    ```

  3. Security Updates:

    ```shell
    # Proxmox updates
    apt update && apt dist-upgrade -y

    # Docker image updates
    docker images | awk '(NR>1) && ($2!="<none>") {print $1":"$2}' | xargs -L1 docker pull
    ```
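None of these checks run themselves, so wiring them into cron keeps the daily list honest. A sketch of an `/etc/cron.d` fragment (the script path, log path, and schedule are examples):

```
# /etc/cron.d/homelab-checks -- run the daily checklist script at 07:00
# m  h  dom mon dow  user  command
0  7  *   *   *    root  /usr/local/sbin/homelab-daily-checks.sh >> /var/log/homelab-checks.log 2>&1
```

The script itself can simply be the three blocks above concatenated, with non-zero exits mailed to you by cron.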

Backup Strategy Implementation

Three-tier backup approach:

  1. Local Snapshots (15-minute intervals):

    ```shell
    # ZFS automated snapshots (e.g. via zfs-auto-snapshot)
    zfs set com.sun:auto-snapshot=true rpool/data
    ```

  2. NAS Replication (nightly):

    ```shell
    # ZFS send/receive to backup server
    zfs send rpool/data@snap-20240501 | ssh backup-host zfs recv backup/data
    ```

  3. Cloud Backup (weekly):

    ```shell
    # Rclone encrypted backup to B2
    rclone sync /backup b2:homelab-backup --b2-hard-delete --transfers 16
    ```
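A backup that has never been restored is a rumor, not a backup. The restore drill can be made mechanical with a checksum manifest; a minimal sketch using stand-in directories (replace `backup-src` and `restore-test` with a real backup source and restore target):

```shell
# Stand-in source with one file; in practice this is your backup tree.
src=backup-src
dst=restore-test
mkdir -p "$src" "$dst"
printf 'vm-100 config\n' > "$src/vm-100.conf"

# 1. Build a checksum manifest of the source.
(cd "$src" && find . -type f -exec sha256sum {} + > ../manifest.sha256)

# 2. Restore (simulated here with a plain copy).
cp -a "$src/." "$dst/"

# 3. Verify every file in the restore against the manifest.
(cd "$dst" && sha256sum -c ../manifest.sha256)
```

Running this against the nightly ZFS replica or a scratch rclone pull turns "the backup job succeeded" into "the data actually came back".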

Scaling Considerations

When your whiskey collection outgrows your rack space:

  1. Vertical Scaling:
    • Add NVMe caching layer
    • Upgrade to 10Gbps networking
    • Implement ECC memory
  2. Horizontal Scaling:
    • Add compute nodes with identical specs
    • Deploy Ceph distributed storage
    • Implement load balancing with HAProxy
  3. Density Optimization:
    • Replace towers with rack-mounted servers
    • Implement PoE switches for low-power devices
    • Consolidate services through containerization

Troubleshooting

Common Issues and Resolutions

Problem: All VMs lose network connectivity after switch reboot
Solution:

```shell
# Reinitialize OVS bridges
systemctl restart openvswitch-switch

# Verify bridge mappings
ovs-vsctl show
```

Problem: Docker swarm nodes show “Unreachable” status
Debugging:

```shell
# Check swarm manager logs
journalctl -u docker.service --since "10 minutes ago"

# Verify firewall rules
iptables -L DOCKER-USER -v -n

# Test overlay network
docker network create -d overlay --attachable test-net
```

Problem: ZFS pool reports “corrupted data” errors
Recovery:

```shell
# Scrub the pool and let ZFS repair what it can
zpool scrub tank

# Check errors
zpool status -v

# Roll back to a known-good snapshot if needed
zfs rollback tank/data@snap-pre-corruption
```

Performance Tuning

  1. Disk I/O Optimization:

    ```shell
    # Cap the ZFS ARC at 16 GiB
    echo $((16 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max

    # Switch the VM disk's I/O scheduler
    echo kyber > /sys/block/sda/queue/scheduler
    ```
This post is licensed under CC BY 4.0 by the author.