I Need To Study Clusters So I Handmade This Longboi

INTRODUCTION

The Raspberry Pi cluster pictured in the Reddit post – nicknamed “Longboi” by its creator – represents more than just an impressive homelab flex. It embodies a fundamental DevOps truth: there’s no substitute for hands-on infrastructure experimentation. When you need to understand distributed systems, container orchestration, and high-availability architectures, building physical clusters delivers insights no cloud console can match.

In professional environments, we increasingly manage infrastructure through abstraction layers – Kubernetes, Terraform, cloud APIs. But when these abstractions fail (and they will fail), engineers who understand the underlying metal-and-silicon reality have a critical advantage. This is why serious DevOps practitioners maintain homelabs: physical environments where we can break things without consequences, test failure scenarios, and observe systems at the packet level.

In this comprehensive guide, we’ll dissect a real-world Raspberry Pi cluster build while exploring:

  1. Cluster fundamentals – What makes distributed systems different
  2. Hardware selection – Why RPi 5 + NVMe creates capable nodes
  3. Orchestration choices – Lightweight Kubernetes vs Docker Swarm
  4. Storage challenges – Making NVMe work in ARM environments
  5. Power management – The critical role of proper UPS integration
  6. Observability – Monitoring bare-metal clusters effectively

Whether you’re preparing for CKA certification, architecting production systems, or simply satisfying technical curiosity, this deep dive will equip you with practical cluster operations knowledge that translates directly to enterprise environments.

UNDERSTANDING THE TOPIC

What Is a Computing Cluster?

A computer cluster is a group of interconnected machines (“nodes”) that work together as a single system to provide:

  • High Availability (HA): Continuous operation through redundancy
  • Horizontal Scaling: Distributed workload processing
  • Fault Tolerance: Automatic failure recovery
  • Parallel Processing: Simultaneous task execution

Clusters differ from single-server setups in their networked coordination – nodes communicate via high-speed links to synchronize state and distribute workloads. The Reddit poster’s “Longboi” exemplifies a homogeneous cluster (identical Raspberry Pi 5 nodes), but production environments often mix hardware profiles.

Historical Context

Cluster computing evolved through three key phases:

  1. Mainframe Era (1960s): Single massive computers with redundancy
  2. Beowulf Clusters (1990s): COTS (Commercial Off-The-Shelf) machines networked with Linux
  3. Cloud & Container Era (2010s): Virtualized clusters managed by orchestration systems

Modern DevOps clusters combine Beowulf’s hardware pragmatism with cloud-native orchestration – exactly what our RPi 5 build demonstrates.

Why Raspberry Pi for Cluster Learning?

RPis offer unique advantages for cluster experimentation:

Factor               RPi Cluster                Cloud Instances
Hardware Costs       $200-$500 total            $50+/month ongoing
Network Latency      Real LAN (0.1-1 ms)        Virtual (2-5 ms)
Physical Access      Direct hardware control    Abstracted via API
Failure Simulation   Pull power cables          Artificial shutdowns
Learning Depth       OS-to-application stack    Limited to cloud layer

The tactile experience of assembling physical nodes – connecting NVMe drives, configuring power supplies, troubleshooting SD card issues – builds foundational knowledge that cloud platforms abstract away.

Key Cluster Components in “Longboi”

Breaking down the Reddit build:

  1. 5x Raspberry Pi 5 Nodes
    • Broadcom BCM2712 (ARM Cortex-A76) quad-core CPU
    • 8GB LPDDR4X RAM per node
    • Dual 4K HDMI output (hence the “impressive” screen)
    • PCIe 2.0 x1 interface for NVMe connectivity
  2. 4x NVMe Drives
    • Likely attached via M.2 HATs on the Pi 5’s PCIe FPC connector (a single Gen 2 lane)
    • Provides high-speed storage (~450MB/s at Gen 2, roughly double if the link is forced to Gen 3) compared to SD cards (~100MB/s)
  3. UPS (Uninterruptible Power Supply)
    • Critical for graceful shutdowns during outages
    • Prevents filesystem corruption on power loss
  4. Networking Gear (Not Pictured but Implied)
    • Gigabit Ethernet switch
    • VLAN-capable router

This configuration balances performance and cost: each Pi 5 node delivers compute roughly comparable to a small cloud instance such as a t3a.small, for a one-time hardware cost rather than an ongoing hourly bill.

Software Choices for ARM Clusters

When orchestrating ARM-based nodes, traditional x86 tools require careful consideration:

Tool         ARM Support                    RPi Considerations
Kubernetes   Official since v1.14           Use k3s for resource efficiency
Docker       ARM64 packages available       Requires a 64-bit OS
Prometheus   Multi-architecture binaries    No special config needed
Ansible      Architecture-agnostic          Use pip install, not distro packages
Ceph         ARM64 compatible               Requires kernel modules for NVMe

The poster likely chose k3s (lightweight Kubernetes) given its minimal footprint: a standard kubeadm control plane wants roughly 2GB of RAM per node, while k3s can run in about 512MB.

PREREQUISITES

Hardware Requirements

To replicate the “Longboi” build:

Component         Minimum Spec              Notes
Raspberry Pi 5    4GB RAM model             8GB preferred for k3s workloads
NVMe Drives       500GB PCIe 3.0 x4         DRAM-less models reduce power draw
PCIe Adapter      M.2 HAT (PCIe FPC)        Ensure Linux driver support
UPS               600VA/360W                APC Back-UPS Pro recommended
Ethernet Switch   8-port Gigabit managed    VLAN support for network isolation
Power Supply      27W USB-C PD per Pi       Anker 735 Charger (GaNPrime 65W) ideal
Cooling           Active heatsinks          Pi 5 throttles at 80°C without cooling
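A quick sanity check on the UPS sizing above: five boards at the full 27W USB-PD budget leave generous headroom under a 360W unit. The peripheral allowance below is a rough assumption, not a measured figure.

```shell
# Back-of-envelope power budget, in watts.
boards=$((5 * 27))               # worst-case draw for five Pi 5s: 135 W
peripherals=40                   # rough allowance for NVMe drives + switch (assumed)
total=$((boards + peripherals))
echo "estimated peak draw: $total W of a 360 W UPS"
```

Even at peak, the cluster uses roughly half the UPS’s rated wattage, which also extends runtime during an outage.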

Software Requirements

  1. Operating System:
    • Raspberry Pi OS Lite 64-bit (Debian 12 Bookworm)
    • Ubuntu Server 22.04 LTS for ARM64
  2. Orchestration:
    • k3s v1.28+ (lightweight Kubernetes)
    • containerd v1.7+ (container runtime)
  3. Provisioning:
    • Ansible Core 2.15+
    • Terraform v1.6+ (optional)
  4. Monitoring:
    • Prometheus v2.47+
    • Grafana v10.1+

Network Configuration

Before powering on nodes:

  1. Reserve DHCP IPs for all Pis in your router
  2. Configure hostnames (e.g., node01.cluster.lan)
  3. Set up SSH keys for password-less access:
    
    ssh-keygen -t ed25519 -C "cluster-admin"
    ssh-copy-id -i ~/.ssh/id_ed25519.pub pi@node01
    
  4. Enable VLANs if isolating cluster traffic
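With five nodes, pushing the key by hand gets tedious. A minimal sketch that loops over the naming scheme from step 2 (the node01..node05 hostnames are an assumption carried over from that step):

```shell
#!/bin/sh
# Build each node's FQDN from the naming scheme above (an assumed convention).
node_name() {
  printf 'node%02d.cluster.lan' "$1"
}

for i in 1 2 3 4 5; do
  host=$(node_name "$i")
  echo "would copy key to pi@$host"
  # ssh-copy-id -i ~/.ssh/id_ed25519.pub "pi@$host"   # uncomment once nodes are up
done
```

The copy command is left commented so the loop can be dry-run first; the same pattern works for any per-node setup step.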

Pre-Installation Checklist

  1. Flash OS images to SD cards using Raspberry Pi Imager
  2. Configure /boot/firmware/config.txt for NVMe boot:

    # Enable the external PCIe connector (Gen 2 by default)
    dtparam=pciex1
    # Optionally force Gen 3 for faster NVMe (unofficial but widely used)
    dtparam=pciex1_gen=3
  3. Verify NVMe detection:
    
    lsblk | grep nvme
    # Should show /dev/nvme0n1
    
  4. Test UPS communication via NUT (Network UPS Tools):
    
    upsc ups@localhost
    
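For that upsc query to return anything, NUT itself needs a driver section and a monitor entry. A hedged sketch of the two relevant files, assuming a USB-attached APC unit (the driver choice and credentials here are assumptions, not from the original build):

```
# /etc/nut/ups.conf -- define the UPS and its driver
[ups]
  driver = usbhid-ups    # common driver for USB-attached APC units
  port = auto

# /etc/nut/upsmon.conf -- shut this node down when the UPS reports low battery
MONITOR ups@localhost 1 upsmon secret primary
```

With upsmon running on every node, a power outage triggers graceful shutdowns instead of the filesystem corruption mentioned above.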

INSTALLATION & SETUP

Bare-Metal OS Configuration

Step 1: Initial Pi Setup

After booting from SD card:

sudo raspi-config
# Set hostname, locale, timezone
# Enable SSH, I2C, SPI, PCIe
# Expand filesystem to NVMe

Step 2: Switch Root to NVMe

  1. Partition and format the NVMe drive:

    sudo parted /dev/nvme0n1 --script mklabel gpt mkpart primary ext4 0% 100%
    sudo mkfs.ext4 /dev/nvme0n1p1

  2. Mount and copy root:

    sudo mount /dev/nvme0n1p1 /mnt
    sudo rsync -axHAWX --numeric-ids --info=progress2 / /mnt

  3. Update /boot/firmware/cmdline.txt:

    root=/dev/nvme0n1p1 rootfstype=ext4 rootwait
    
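After rebooting, it is worth confirming the switch actually took effect. A small check using only standard coreutils:

```shell
# Report which block device currently backs the root filesystem.
root_dev=$(df / | tail -1 | awk '{print $1}')
echo "root filesystem is on: $root_dev"
case "$root_dev" in
  /dev/nvme*) echo "OK: running from NVMe" ;;
  *)          echo "NOTE: still running from $root_dev" ;;
esac
```

If the root device still shows the SD card (typically /dev/mmcblk0p2), recheck cmdline.txt before proceeding.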

Step 3: Kernel Optimization

Edit /etc/sysctl.conf:

# Increase network buffers
net.core.rmem_max=26214400
net.core.wmem_max=26214400

# Improve TCP performance
net.ipv4.tcp_rmem=4096 87380 25165824
net.ipv4.tcp_wmem=4096 65536 25165824

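The magic numbers above are whole-MiB values written out in bytes; a quick sanity check on the arithmetic, plus the command that applies the file without a reboot:

```shell
# The buffer sizes above are whole-MiB values expressed in bytes:
echo $((25 * 1024 * 1024))    # 26214400 -> net.core.rmem_max / wmem_max (25 MiB)
echo $((24 * 1024 * 1024))    # 25165824 -> tcp_rmem / tcp_wmem max (24 MiB)
# Apply the edited file on a live node without rebooting (needs root):
# sudo sysctl --system
```

Larger buffers mainly help sustained node-to-node transfers (image pulls, rsync, etcd snapshots) on the gigabit LAN.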
Kubernetes Cluster with k3s

Control Plane (Node01):

curl -sfL https://get.k3s.io | sh -s - \
  --disable traefik \
  --cluster-init \
  --tls-san cluster.lan \
  --write-kubeconfig-mode 644

Worker Nodes (Node02-05):

The join token lives only on the control plane (at /var/lib/rancher/k3s/server/node-token on node01), so fetch it from there rather than reading a local path; then on each worker:

curl -sfL https://get.k3s.io | K3S_URL=https://node01:6443 \
  K3S_TOKEN=$(ssh pi@node01 sudo cat /var/lib/rancher/k3s/server/node-token) \
  sh -s - --kubelet-arg="eviction-hard=memory.available<100Mi"

Verify Cluster Status:

kubectl get nodes -o wide
# All nodes should show Ready status

Persistent Storage with NVMe

  1. Install OpenEBS for local PV provisioning:
    
    kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml
    
  2. Create StorageClass for NVMe (OpenEBS Local PV takes its hostpath settings via the cas.openebs.io/config annotation):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: nvme-ssd
      annotations:
        openebs.io/cas-type: local
        cas.openebs.io/config: |
          - name: StorageType
            value: hostpath
          - name: BasePath
            value: /mnt/nvme
    provisioner: openebs.io/local
    volumeBindingMode: WaitForFirstConsumer
    reclaimPolicy: Delete
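Because the class uses WaitForFirstConsumer, a claim against it sits Pending until some pod actually consumes it. A throwaway claim makes that behavior easy to observe (the name and size here are illustrative):

```yaml
# test-pvc.yaml -- illustrative claim against the nvme-ssd class above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvme-test
spec:
  storageClassName: nvme-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Apply it with kubectl apply -f test-pvc.yaml; it should stay Pending until a pod references it, at which point OpenEBS carves a hostpath volume under /mnt/nvme on that pod’s node.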

CONFIGURATION & OPTIMIZATION

Security Hardening

RBAC Configuration

Limit default permissions:

# cluster-auth.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: restricted-users
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io

Network Policies

Isolate namespaces:

# default-deny.yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

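One caveat worth knowing: once egress is denied, pods in the namespace can no longer reach cluster DNS, so nearly every workload breaks. A common companion policy reopens just DNS (the policy name is illustrative):

```yaml
# allow-dns.yaml -- companion to default-deny; without it, pods cannot
# resolve names once egress is blocked
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

Additional allow policies can then be layered per application, keeping the default-deny posture intact.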
Performance Tuning

Kernel Parameters

Add to /etc/sysctl.d/99-k8s.conf:

# Increase inotify watches
fs.inotify.max_user_watches=1048576

# Optimize virtual memory
vm.swappiness=10
vm.vfs_cache_pressure=50

k3s Systemd Service

Override defaults in /etc/systemd/system/k3s.service.d/override.conf:

[Service]
CPUQuota=300%
MemoryHigh=6G
MemoryMax=7G

Monitoring Stack

Prometheus Configuration

Target k3s metrics:

# prometheus-config.yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'k3s'
    static_configs:
      - targets: ['node01:6443', 'node02:6443']
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

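The job above only scrapes the apiserver; board-level data (CPU temperature, throttling, disk I/O) usually comes from node_exporter running on each Pi. A hedged extra scrape job, assuming node_exporter listens on its default port 9100 on every node:

```yaml
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node01:9100', 'node02:9100', 'node03:9100',
                  'node04:9100', 'node05:9100']
```

On a Pi cluster, the thermal metrics are the ones to watch: they reveal throttling long before workloads visibly slow down.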
Grafana Dashboard

Import dashboard ID 13770 (Raspberry Pi Cluster) via Grafana UI.

USAGE & OPERATIONS

Common kubectl Commands

Cluster Info:

kubectl get nodes -L topology.kubernetes.io/zone
kubectl top nodes

Workload Management:

# Deploy test pod
kubectl run -i --tty busybox --image=busybox -- sh

# Scale deployment
kubectl scale deployment nginx --replicas=5

# Drain node for maintenance
kubectl drain node03 --ignore-daemonsets

Persistent Volume Claims

Example MySQL Deployment:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  storageClassName: nvme-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD   # image refuses to start without it
          value: "change-me"          # illustrative; use a Secret in practice
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pvc

Maintenance Procedures

OS Updates:

  1. Drain node (cordon alone only blocks new pods; running pods would die ungracefully on reboot):

    kubectl drain node02 --ignore-daemonsets

  2. SSH to node and update:

    sudo apt update && sudo apt upgrade -y

  3. Reboot and uncordon:

    kubectl uncordon node02

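Repeated across four workers, those three steps beg for a loop. A cautious sketch that prints the command sequence instead of executing it, so it can be reviewed first (node names follow the scheme used throughout; pipe the output to sh once satisfied):

```shell
#!/bin/sh
# Emit the drain/update/uncordon sequence for each worker, one node at a time.
for n in node02 node03 node04 node05; do
  echo "kubectl drain $n --ignore-daemonsets"
  echo "ssh pi@$n 'sudo apt update && sudo apt upgrade -y && sudo reboot'"
  echo "kubectl uncordon $n"
done
```

Printing rather than executing is deliberate: an unattended rolling reboot needs a readiness wait between nodes, which this sketch leaves to the operator.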
Cluster Backups: Use Velero with NVMe storage:

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.7.0 \
  --bucket cluster-backups \
  --backup-location-config region=default,s3ForcePathStyle="true",s3Url=http://minio.cluster.lan:9000
# The backup-location values above are illustrative: they assume an
# S3-compatible MinIO endpoint on the LAN. Point s3Url at whatever
# object store actually backs your Velero backups.

This post is licensed under CC BY 4.0 by the author.