I Need To Study Clusters So I Handmade This Longboi

INTRODUCTION

The Raspberry Pi cluster pictured in the Reddit post – nicknamed “Longboi” by its creator – represents more than just an impressive homelab flex. It embodies a fundamental DevOps truth: there’s no substitute for hands-on infrastructure experimentation. When you need to understand distributed systems, container orchestration, and high-availability architectures, building physical clusters delivers insights no cloud console can match.

In professional environments, we increasingly manage infrastructure through abstraction layers – Kubernetes, Terraform, cloud APIs. But when these abstractions fail (and they will fail), engineers who understand the underlying metal-and-silicon reality have a critical advantage. This is why serious DevOps practitioners maintain homelabs: physical environments where we can break things without consequences, test failure scenarios, and observe systems at the packet level.

In this comprehensive guide, we’ll dissect a real-world Raspberry Pi cluster build while exploring:

  1. Cluster fundamentals – What makes distributed systems different
  2. Hardware selection – Why RPi 5 + NVMe creates capable nodes
  3. Orchestration choices – Lightweight Kubernetes vs Docker Swarm
  4. Storage challenges – Making NVMe work in ARM environments
  5. Power management – The critical role of proper UPS integration
  6. Observability – Monitoring bare-metal clusters effectively

Whether you’re preparing for CKA certification, architecting production systems, or simply satisfying technical curiosity, this deep dive will equip you with practical cluster operations knowledge that translates directly to enterprise environments.

UNDERSTANDING THE TOPIC

What Is a Computing Cluster?

A computer cluster is a group of interconnected machines (“nodes”) that work together as a single system to provide:

  • High Availability (HA): Continuous operation through redundancy
  • Horizontal Scaling: Distributed workload processing
  • Fault Tolerance: Automatic failure recovery
  • Parallel Processing: Simultaneous task execution

Clusters differ from single-server setups in their networked coordination – nodes communicate via high-speed links to synchronize state and distribute workloads. The Reddit poster’s “Longboi” exemplifies a homogeneous cluster (identical Raspberry Pi 5 nodes), but production environments often mix hardware profiles.

Historical Context

Cluster computing evolved through three key phases:

  1. Mainframe Era (1960s): Single massive computers with redundancy
  2. Beowulf Clusters (1990s): COTS (Commercial Off-The-Shelf) machines networked with Linux
  3. Cloud & Container Era (2010s): Virtualized clusters managed by orchestration systems

Modern DevOps clusters combine Beowulf’s hardware pragmatism with cloud-native orchestration – exactly what our RPi 5 build demonstrates.

Why Raspberry Pi for Cluster Learning?

RPis offer unique advantages for cluster experimentation:

Factor               RPi Cluster                Cloud Instances
Hardware Costs       $200-$500 total            $50+/month ongoing
Network Latency      Real LAN (0.1-1 ms)        Virtual (2-5 ms)
Physical Access      Direct hardware control    Abstracted via API
Failure Simulation   Pull power cables          Artificial shutdowns
Learning Depth       OS-to-application stack    Limited to cloud layer

The tactile experience of assembling physical nodes – connecting NVMe drives, configuring power supplies, troubleshooting SD card issues – builds foundational knowledge that cloud platforms abstract away.

Key Cluster Components in “Longboi”

Breaking down the Reddit build:

  1. 5x Raspberry Pi 5 Nodes
    • Broadcom BCM2712 (ARM Cortex-A76) quad-core CPU
    • 8GB LPDDR4X RAM per node
    • Dual 4K HDMI output (hence the “impressive” screen)
    • PCIe 2.0 x1 interface for NVMe connectivity
  2. 4x NVMe Drives
    • Likely attached via M.2 HATs on the Pi 5’s PCIe FPC connector (a single Gen 2 lane)
    • Provides high-speed storage (~450MB/s at Gen 2, roughly double if the link is forced to Gen 3) compared to SD cards (~100MB/s)
  3. UPS (Uninterruptible Power Supply)
    • Critical for graceful shutdowns during outages
    • Prevents filesystem corruption on power loss
  4. Networking Gear (Not Pictured but Implied)
    • Gigabit Ethernet switch
    • VLAN-capable router

This configuration balances performance and cost: each Pi 5 node delivers compute roughly comparable to a small cloud instance such as a t3a.small, for a one-time hardware cost rather than an ongoing hourly bill.

Software Choices for ARM Clusters

When orchestrating ARM-based nodes, traditional x86 tools require careful consideration:

Tool         ARM Support                    RPi Considerations
Kubernetes   Official since v1.14           Use k3s for resource efficiency
Docker       ARM64 packages available       Requires a 64-bit OS
Prometheus   Multi-architecture binaries    No special config needed
Ansible      Architecture-agnostic          Use pip install, not distro packages
Ceph         ARM64 compatible               Requires kernel modules for NVMe

The poster likely chose k3s (lightweight Kubernetes) given its minimal footprint: a standard kubeadm control plane wants roughly 2GB of RAM per node, while k3s can run in about 512MB.

PREREQUISITES

Hardware Requirements

To replicate the “Longboi” build:

Component         Minimum Spec              Notes
Raspberry Pi 5    4GB RAM model             8GB preferred for k3s workloads
NVMe Drives       500GB PCIe 3.0 x4         DRAM-less models reduce power draw
PCIe Adapter      M.2 HAT (PCIe FPC)        Ensure Linux driver support
UPS               600VA/360W                APC Back-UPS Pro recommended
Ethernet Switch   8-port Gigabit managed    VLAN support for network isolation
Power Supply      27W USB-C PD per Pi       Anker 735 Charger (GaNPrime 65W) ideal
Cooling           Active heatsinks          Pi 5 throttles at 80°C without cooling
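A quick sanity check on the UPS sizing above: five boards at the full 27W USB-PD budget leave generous headroom under a 360W unit. The peripheral allowance below is a rough assumption, not a measured figure.

```shell
# Back-of-envelope power budget, in watts.
boards=$((5 * 27))               # worst-case draw for five Pi 5s: 135 W
peripherals=40                   # rough allowance for NVMe drives + switch (assumed)
total=$((boards + peripherals))
echo "estimated peak draw: $total W of a 360 W UPS"
```

Even at peak, the cluster uses roughly half the UPS’s rated wattage, which also extends runtime during an outage.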

Software Requirements

  1. Operating System:
    • Raspberry Pi OS Lite 64-bit (Debian 12 Bookworm)
    • Ubuntu Server 22.04 LTS for ARM64
  2. Orchestration:
    • k3s v1.28+ (lightweight Kubernetes)
    • containerd v1.7+ (container runtime)
  3. Provisioning:
    • Ansible Core 2.15+
    • Terraform v1.6+ (optional)
  4. Monitoring:
    • Prometheus v2.47+
    • Grafana v10.1+

Network Configuration

Before powering on nodes:

  1. Reserve DHCP IPs for all Pis in your router
  2. Configure hostnames (e.g., node01.cluster.lan)
  3. Set up SSH keys for password-less access:
    
    ssh-keygen -t ed25519 -C "cluster-admin"
    ssh-copy-id -i ~/.ssh/id_ed25519.pub pi@node01
    
  4. Enable VLANs if isolating cluster traffic
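With five nodes, pushing the key by hand gets tedious. A minimal sketch that loops over the naming scheme from step 2 (the node01..node05 hostnames are an assumption carried over from that step):

```shell
#!/bin/sh
# Build each node's FQDN from the naming scheme above (an assumed convention).
node_name() {
  printf 'node%02d.cluster.lan' "$1"
}

for i in 1 2 3 4 5; do
  host=$(node_name "$i")
  echo "would copy key to pi@$host"
  # ssh-copy-id -i ~/.ssh/id_ed25519.pub "pi@$host"   # uncomment once nodes are up
done
```

The copy command is left commented so the loop can be dry-run first; the same pattern works for any per-node setup step.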

Pre-Installation Checklist

  1. Flash OS images to SD cards using Raspberry Pi Imager
  2. Configure /boot/firmware/config.txt for NVMe boot:

    # Enable the external PCIe connector (Gen 2 by default)
    dtparam=pciex1
    # Optionally force Gen 3 for faster NVMe (unofficial but widely used)
    dtparam=pciex1_gen=3
  3. Verify NVMe detection:
    
    lsblk | grep nvme
    # Should show /dev/nvme0n1
    
  4. Test UPS communication via NUT (Network UPS Tools):
    
    upsc ups@localhost
    
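For that upsc query to return anything, NUT itself needs a driver section and a monitor entry. A hedged sketch of the two relevant files, assuming a USB-attached APC unit (the driver choice and credentials here are assumptions, not from the original build):

```
# /etc/nut/ups.conf -- define the UPS and its driver
[ups]
  driver = usbhid-ups    # common driver for USB-attached APC units
  port = auto

# /etc/nut/upsmon.conf -- shut this node down when the UPS reports low battery
MONITOR ups@localhost 1 upsmon secret primary
```

With upsmon running on every node, a power outage triggers graceful shutdowns instead of the filesystem corruption mentioned above.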

INSTALLATION & SETUP

Bare-Metal OS Configuration

Step 1: Initial Pi Setup

After booting from SD card:

sudo raspi-config
# Set hostname, locale, timezone
# Enable SSH, I2C, SPI, PCIe
# Expand filesystem to NVMe

Step 2: Switch Root to NVMe

  1. Partition and format the NVMe drive:

    sudo parted /dev/nvme0n1 --script mklabel gpt mkpart primary ext4 0% 100%
    sudo mkfs.ext4 /dev/nvme0n1p1

  2. Mount and copy root:

    sudo mount /dev/nvme0n1p1 /mnt
    sudo rsync -axHAWX --numeric-ids --info=progress2 / /mnt

  3. Update /boot/firmware/cmdline.txt:

    root=/dev/nvme0n1p1 rootfstype=ext4 rootwait
    
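After rebooting, it is worth confirming the switch actually took effect. A small check using only standard coreutils:

```shell
# Report which block device currently backs the root filesystem.
root_dev=$(df / | tail -1 | awk '{print $1}')
echo "root filesystem is on: $root_dev"
case "$root_dev" in
  /dev/nvme*) echo "OK: running from NVMe" ;;
  *)          echo "NOTE: still running from $root_dev" ;;
esac
```

If the root device still shows the SD card (typically /dev/mmcblk0p2), recheck cmdline.txt before proceeding.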

Step 3: Kernel Optimization

Edit /etc/sysctl.conf:

# Increase network buffers
net.core.rmem_max=26214400
net.core.wmem_max=26214400

# Improve TCP performance
net.ipv4.tcp_rmem=4096 87380 25165824
net.ipv4.tcp_wmem=4096 65536 25165824

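The magic numbers above are whole-MiB values written out in bytes; a quick sanity check on the arithmetic, plus the command that applies the file without a reboot:

```shell
# The buffer sizes above are whole-MiB values expressed in bytes:
echo $((25 * 1024 * 1024))    # 26214400 -> net.core.rmem_max / wmem_max (25 MiB)
echo $((24 * 1024 * 1024))    # 25165824 -> tcp_rmem / tcp_wmem max (24 MiB)
# Apply the edited file on a live node without rebooting (needs root):
# sudo sysctl --system
```

Larger buffers mainly help sustained node-to-node transfers (image pulls, rsync, etcd snapshots) on the gigabit LAN.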
Kubernetes Cluster with k3s

Control Plane (Node01):

curl -sfL https://get.k3s.io | sh -s - \
  --disable traefik \
  --cluster-init \
  --tls-san cluster.lan \
  --write-kubeconfig-mode 644

Worker Nodes (Node02-05):

The join token lives only on the control plane (at /var/lib/rancher/k3s/server/node-token on node01), so fetch it from there rather than reading a local path; then on each worker:

curl -sfL https://get.k3s.io | K3S_URL=https://node01:6443 \
  K3S_TOKEN=$(ssh pi@node01 sudo cat /var/lib/rancher/k3s/server/node-token) \
  sh -s - --kubelet-arg="eviction-hard=memory.available<100Mi"

Verify Cluster Status:

kubectl get nodes -o wide
# All nodes should show Ready status

Persistent Storage with NVMe

  1. Install OpenEBS for local PV provisioning:
    
    kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml
    
  2. Create StorageClass for NVMe (OpenEBS Local PV takes its hostpath settings via the cas.openebs.io/config annotation):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: nvme-ssd
      annotations:
        openebs.io/cas-type: local
        cas.openebs.io/config: |
          - name: StorageType
            value: hostpath
          - name: BasePath
            value: /mnt/nvme
    provisioner: openebs.io/local
    volumeBindingMode: WaitForFirstConsumer
    reclaimPolicy: Delete
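Because the class uses WaitForFirstConsumer, a claim against it sits Pending until some pod actually consumes it. A throwaway claim makes that behavior easy to observe (the name and size here are illustrative):

```yaml
# test-pvc.yaml -- illustrative claim against the nvme-ssd class above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvme-test
spec:
  storageClassName: nvme-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Apply it with kubectl apply -f test-pvc.yaml; it should stay Pending until a pod references it, at which point OpenEBS carves a hostpath volume under /mnt/nvme on that pod’s node.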

CONFIGURATION & OPTIMIZATION

Security Hardening

RBAC Configuration

Limit default permissions:

# cluster-auth.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: restricted-users
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io

Network Policies

Isolate namespaces:

# default-deny.yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

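One caveat worth knowing: once egress is denied, pods in the namespace can no longer reach cluster DNS, so nearly every workload breaks. A common companion policy reopens just DNS (the policy name is illustrative):

```yaml
# allow-dns.yaml -- companion to default-deny; without it, pods cannot
# resolve names once egress is blocked
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

Additional allow policies can then be layered per application, keeping the default-deny posture intact.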
Performance Tuning

Kernel Parameters

Add to /etc/sysctl.d/99-k8s.conf:

# Increase inotify watches
fs.inotify.max_user_watches=1048576

# Optimize virtual memory
vm.swappiness=10
vm.vfs_cache_pressure=50

k3s Systemd Service

Override defaults in /etc/systemd/system/k3s.service.d/override.conf:

[Service]
CPUQuota=300%
MemoryHigh=6G
MemoryMax=7G

Monitoring Stack

Prometheus Configuration

Target k3s metrics:

# prometheus-config.yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'k3s'
    static_configs:
      - targets: ['node01:6443', 'node02:6443']
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

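The job above only scrapes the apiserver; board-level data (CPU temperature, throttling, disk I/O) usually comes from node_exporter running on each Pi. A hedged extra scrape job, assuming node_exporter listens on its default port 9100 on every node:

```yaml
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node01:9100', 'node02:9100', 'node03:9100',
                  'node04:9100', 'node05:9100']
```

On a Pi cluster, the thermal metrics are the ones to watch: they reveal throttling long before workloads visibly slow down.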
Grafana Dashboard

Import dashboard ID 13770 (Raspberry Pi Cluster) via Grafana UI.

USAGE & OPERATIONS

Common kubectl Commands

Cluster Info:

kubectl get nodes -L topology.kubernetes.io/zone
kubectl top nodes

Workload Management:

# Deploy test pod
kubectl run -i --tty busybox --image=busybox -- sh

# Scale deployment
kubectl scale deployment nginx --replicas=5

# Drain node for maintenance
kubectl drain node03 --ignore-daemonsets

Persistent Volume Claims

Example MySQL Deployment:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  storageClassName: nvme-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD   # image refuses to start without it
          value: "change-me"          # illustrative; use a Secret in practice
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pvc

Maintenance Procedures

OS Updates:

  1. Drain node (cordon alone only blocks new pods; running pods would die ungracefully on reboot):

    kubectl drain node02 --ignore-daemonsets

  2. SSH to node and update:

    sudo apt update && sudo apt upgrade -y

  3. Reboot and uncordon:

    kubectl uncordon node02

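Repeated across four workers, those three steps beg for a loop. A cautious sketch that prints the command sequence instead of executing it, so it can be reviewed first (node names follow the scheme used throughout; pipe the output to sh once satisfied):

```shell
#!/bin/sh
# Emit the drain/update/uncordon sequence for each worker, one node at a time.
for n in node02 node03 node04 node05; do
  echo "kubectl drain $n --ignore-daemonsets"
  echo "ssh pi@$n 'sudo apt update && sudo apt upgrade -y && sudo reboot'"
  echo "kubectl uncordon $n"
done
```

Printing rather than executing is deliberate: an unattended rolling reboot needs a readiness wait between nodes, which this sketch leaves to the operator.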
Cluster Backups: Use Velero with NVMe storage:

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.7.0 \
  --bucket cluster-backups \
  --backup-location-config region=default,s3ForcePathStyle="true",s3Url=http://minio.cluster.lan:9000
# The backup-location values above are illustrative: they assume an
# S3-compatible MinIO endpoint on the LAN. Point s3Url at whatever
# object store actually backs your Velero backups.

This post is licensed under CC BY 4.0 by the author.