How Many Computers Do You Need

Introduction

The perennial question in homelab and professional DevOps circles strikes a chord with every infrastructure enthusiast: How many computers do you really need? From the Reddit user running 7 active machines to enterprises managing thousands of nodes, this question exposes fundamental truths about infrastructure design philosophy.

In an era where Kubernetes clusters span continents and Raspberry Pis host critical services, the answer is never simple. This guide dissects the technical considerations behind infrastructure sprawl versus consolidation, examining:

  • The role of physical separation in security-critical workloads
  • Performance isolation strategies for mixed workloads
  • Cost/benefit analysis of distributed systems
  • The containerization paradox: fewer machines running more services
  • When specialized hardware justifies dedicated nodes

We’ll analyze real-world deployments through the lens of professional system administration, evaluating when multiple machines provide tangible benefits versus when they create unnecessary complexity. By the end, you’ll possess a structured framework for making informed infrastructure decisions.

Understanding Infrastructure Scaling

The Evolution of Compute Density

| Era | Typical Setup | Key Driver |
| --- | --- | --- |
| 1990s | Single physical server | Hardware costs |
| 2000s | Virtual machines (4-8/core) | Virtualization efficiency |
| 2010s | Containers (10-100+/core) | Microservices adoption |
| 2020s | Serverless + edge compute | Distributed workloads |

Modern infrastructure exists on a spectrum between two extremes:

  1. Hyperconverged Infrastructure (HCI)
    • Example: Single Proxmox server handling NAS, gaming, and services
    • Pros: Lower power consumption, simplified management
    • Cons: Single point of failure, noisy neighbor issues
  2. Disaggregated Architecture
    • Example: Dedicated NAS, gaming PC, Kubernetes nodes
    • Pros: Hardware optimization, fault isolation
    • Cons: Higher costs, management overhead

The Specialization Calculus

Specialized hardware often justifies dedicated machines:

  • NAS Requirements
    • ECC memory for ZFS integrity
    • Hot-swap drive bays
    • Low-power idle states
  • Gaming PC Needs
    • High-end GPU
    • Low-latency peripherals
    • Real-time performance guarantees
  • Server Workloads
    • IPMI/BMC for remote management
    • Redundant power supplies
    • Diskless configurations
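
Before committing a box to one of these roles, it is worth confirming the hardware actually delivers the feature the role depends on. As a quick sanity check (assuming dmidecode is installed and run as root), ECC support shows up in the DMI memory tables:

# Report the error-correction capability of installed memory
dmidecode -t memory | grep -i "error correction"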

The Containerization Paradox

While containers enable higher density, they introduce new challenges:

# Compare container vs VM density on a 32-core server
# Container: fractional CPU shares and a hard memory cap
docker run -it --cpus="0.5" --memory="512m" nginx

# VM: whole vCPUs and a fixed memory allocation
qemu-system-x86_64 -enable-kvm -m 2G -smp 2

Containers provide:

  • 5-10x higher density than VMs
  • Milliseconds vs minutes startup times
  • A shared-kernel security model (lighter weight, but weaker isolation than a VM)

But require:

  • Careful cgroup tuning
  • Storage driver optimization
  • Network namespace management
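
As a minimal sketch of what "careful cgroup tuning" means in practice (assuming cgroup v2 with the systemd driver, and a container named nextcloud purely for illustration), the limits the runtime actually applied can be read straight from the cgroup filesystem:

# Resolve the container ID, then inspect its effective CPU and memory limits
CID=$(docker inspect --format '{{.Id}}' nextcloud)
cat /sys/fs/cgroup/system.slice/docker-${CID}.scope/cpu.max
cat /sys/fs/cgroup/system.slice/docker-${CID}.scope/memory.max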

Prerequisites for Effective Consolidation

Hardware Considerations

Minimum specs for multi-role servers:

| Workload | CPU Cores | RAM | Storage | Network |
| --- | --- | --- | --- | --- |
| Light services | 2 | 4GB | SATA SSD | 1GbE |
| NAS + Media | 4 | 16GB | ZFS HDD Array | 10GbE |
| Gaming + VMs | 8+ | 32GB+ | NVMe Cache | 2.5GbE |
| Production K8s | 16+ | 64GB+ | NVMe RAID | 25GbE+ |

Software Foundation

Critical tools for infrastructure unification:

  1. Hypervisors
  2. Orchestration
  3. Configuration Management

    # Ansible playbook for unified node configuration
    - hosts: all
      become: yes
      tasks:
        - name: Ensure standard packages
          apt:
            name: ["htop", "iotop", "nload"]
            state: present

Network Design

Consolidated setups require advanced networking:

  • VLAN Segmentation

    # Proxmox VLAN configuration example (/etc/network/interfaces)
    auto vmbr0
    iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-vlan-aware yes

    auto vmbr0.10
    iface vmbr0.10 inet static
        address 192.168.10.1/24
    
  • Traffic Prioritization
    
    # tc rules for gaming PC traffic prioritization
    tc qdisc add dev eth0 root handle 1: htb
    tc class add dev eth0 parent 1: classid 1:1 htb rate 1gbit
    tc class add dev eth0 parent 1:1 classid 1:10 htb rate 900mbit ceil 950mbit prio 0
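
    The classes above only take effect once a filter maps traffic into them. A minimal example (the gaming PC's address 192.168.10.50 is illustrative):

    tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dst 192.168.10.50/32 flowid 1:10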
    

Installation & Configuration Strategies

Hypervisor-Based Consolidation

Proxmox VE Deployment

  1. Prepare boot media:
    
    wget https://download.proxmox.com/iso/proxmox-ve_8.0.iso
    dd if=proxmox-ve_8.0.iso of=/dev/sdc bs=4M status=progress
    
  2. Post-install configuration:

    # Join an existing cluster or initialize a new one
    pvecm create CLUSTER_NAME
    pvecm add IP_EXISTING_NODE

    # Configure storage: create the ZFS pool, then register it with Proxmox
    zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc
    pvesm add zfspool local-zfs -pool tank
    
  3. GPU Passthrough for gaming VM:
    
    # Load VFIO modules
    echo "vfio" >> /etc/modules
    echo "vfio_iommu_type1" >> /etc/modules
    echo "vfio_pci" >> /etc/modules
    
    # Identify GPU IDs
    lspci -nn | grep NVIDIA
    # 01:00.0 VGA [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204]
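
    A typical next step, sketched here under the assumption that 10de:2204 is the ID reported on your system (add the GPU's audio function ID as well), is binding the device to vfio-pci so the host driver never claims it:

    # Reserve the GPU for passthrough and rebuild the initramfs
    echo "options vfio-pci ids=10de:2204" > /etc/modprobe.d/vfio.conf
    update-initramfs -u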
    

Container-First Architecture

Docker Swarm Setup

  1. Initialize swarm cluster:
    
    docker swarm init --advertise-addr 192.168.1.100
    docker node update --availability drain $NODE_ID
    
  2. Deploy stack with resource constraints:

    # docker-compose.prod.yml
    version: '3.8'
    services:
      nextcloud:
        image: nextcloud:27
        deploy:
          resources:
            limits:
              cpus: '0.5'
              memory: 1G
        volumes:
          - nextcloud_data:/var/www/html

    volumes:
      nextcloud_data:
    
  3. Verify resource allocation:

    docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
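
The stack defined in step 2 is rolled out with the standard swarm commands (the stack name homelab is only a placeholder):

docker stack deploy -c docker-compose.prod.yml homelab
docker service ls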
    

Performance Optimization Techniques

NUMA-Aware Scheduling

Critical for high-performance consolidated systems:

# Start container with NUMA constraints (CPUs 0-3, memory from NUMA node 0)
docker run -it --cpuset-cpus=0-3 --cpuset-mems=0 nginx

# Verify NUMA topology
numactl --hardware

Storage Tiering Optimization

Combine performance and capacity layers:

# ZFS storage pool: raidz2 HDDs provide the capacity tier
zpool create tank raidz2 /dev/sd[a-d]

# NVMe mirror as a special vdev for metadata and small blocks
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

# Add SSD read cache (L2ARC)
zpool add tank cache /dev/nvme2n1
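
The resulting layout is easy to verify per vdev, assuming the pool name tank used above:

zpool status tank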

Network QoS Implementation

Prioritize latency-sensitive traffic:

# Linux tc rules for gaming traffic (rates sized for a 1 Gbit link)
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:10 htb rate 900mbit ceil 950mbit prio 0
tc class add dev eth0 parent 1: classid 1:20 htb rate 100mbit ceil 1gbit prio 1
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dport 27036 0xffff flowid 1:10

Security Hardening Strategies

Hypervisor-Level Protections

  1. Mandatory Access Control

    # AppArmor profile for libvirt's virt-aa-helper (QEMU guest confinement)
    # /etc/apparmor.d/usr.lib.libvirt.virt-aa-helper
    #include <tunables/global>

    profile virt-aa-helper /usr/{lib,lib64}/libvirt/virt-aa-helper {
      #include <abstractions/base>

      capability dac_override,
      /usr/{lib,lib64}/libvirt/virt-aa-helper mr,
    }
    
  2. VM Escape Mitigation

    # Kernel parameters for KVM hardening
    GRUB_CMDLINE_LINUX="... kvm-intel.nested=0 mitigations=auto,nosmt spec_store_bypass_disable=on"
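
    After a reboot, the kernel reports which mitigations are actually active, which confirms whether the parameters took effect:

    grep . /sys/devices/system/cpu/vulnerabilities/*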
    

Container Security Best Practices

  1. Rootless Docker Configuration
    
    # Install rootless mode
    dockerd-rootless-setuptool.sh install
    
    # Verify context
    docker context ls
    
  2. Seccomp Profiles

    {
      "defaultAction": "SCMP_ACT_ERRNO",
      "syscalls": [
        {
          "names": ["read", "write"],
          "action": "SCMP_ACT_ALLOW"
        }
      ]
    }
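
    Saved as custom-seccomp.json, the profile is attached at container start. Note this allow-list is deliberately extreme to illustrate the mechanism; a real image needs far more than read and write:

    docker run --security-opt seccomp=./custom-seccomp.json nginx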
    

Monitoring and Maintenance

Unified Observability Stack

Prometheus configuration for hybrid environments:

# prometheus.yml
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.20:9100']
  - job_name: 'proxmox'
    params:
      module: [pve]
    static_configs:
      - targets: ['192.168.1.100:9221']
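
Before reloading the server, the file can be validated with promtool, which ships alongside Prometheus:

promtool check config prometheus.yml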

Automated Patch Management

Ansible playbook for rolling updates:

- hosts: k8s_nodes
  serial: 1
  tasks:
    - name: Drain node
      # kubectl runs from the Ansible controller, which is assumed to have cluster access
      command: kubectl drain {{ inventory_hostname }} --ignore-daemonsets --delete-emptydir-data
      delegate_to: localhost
      when: inventory_hostname in groups['k8s_workers']

    - name: Update packages
      apt:
        upgrade: dist
        update_cache: yes

    - name: Check whether a reboot is required
      stat:
        path: /var/run/reboot-required
      register: reboot_required

    - name: Reboot if needed
      reboot:
        msg: "Kernel updated, rebooting"
        reboot_timeout: 300
      when: reboot_required.stat.exists

    - name: Uncordon node
      command: kubectl uncordon {{ inventory_hostname }}
      delegate_to: localhost
      when: inventory_hostname in groups['k8s_workers']
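
Run it against the cluster inventory (the inventory and playbook file names below are placeholders):

ansible-playbook -i inventory.ini rolling-update.yml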

Troubleshooting Common Issues

Resource Contention Diagnosis

  1. Identify noisy neighbors

    # Show CPU pressure (PSI, kernel 4.20+)
    cat /proc/pressure/cpu

    # Detect memory pressure
    cat /proc/pressure/memory
    grep -E 'pgscan_direct|pgsteal_direct' /proc/vmstat
    
  2. Storage Latency Analysis
    
    # ZFS performance stats
    zpool iostat -v 1
    
    # Block device latency
    iostat -xmdz 1
    
  3. Network Saturation

    # Per-class traffic statistics from the shaper
    tc -s class show dev eth0

    # Capture traffic for deeper offline inspection
    tcpdump -ni eth0 -s0 -w capture.pcap
    

Debugging Hypervisor Issues

Common Proxmox errors and solutions:

  1. PCI Passthrough Failures

    dmesg | grep -i vfio

    # Ensure IOMMU groups are properly isolated
    find /sys/kernel/iommu_groups/ -type l

    # Check kernel parameters
    grep -o 'intel_iommu=on' /proc/cmdline
    
  2. Ceph Performance Problems

    # Per-OSD commit/apply latency
    ceph osd perf
    # Summary of placement group states
    ceph pg stat
    

Conclusion

The question “How many computers do you need?” reveals fundamental truths about infrastructure design philosophy. Through our analysis of consolidation strategies, performance isolation techniques, and security hardening approaches, we’ve established a framework for decision-making:

  1. Specialization Threshold - When hardware requirements diverge by >40%, physical separation becomes justified
  2. Fault Domain Budget - Acceptable risk level determines replica count (N+1 vs N+2)
  3. Management Overhead Index - Each additional node increases complexity non-linearly
  4. Energy Efficiency Curve - Consolidation benefits diminish beyond 80% resource utilization

The optimal number balances these factors while aligning with your organizational constraints. For most homelabs, a 3-node cluster with GPU passthrough provides the best balance. Enterprises generally benefit from scale-out architectures beyond 8 nodes.

The infrastructure landscape continues evolving with technologies like WebAssembly microVMs and DPU offloading. What remains constant is the need for deliberate, metrics-driven infrastructure design - whether you’re managing one machine or ten thousand.

This post is licensed under CC BY 4.0 by the author.