Finally Upgraded My Homelab
Introduction
For years, my homelab ran on a humble 4-node Raspberry Pi cluster – an excellent platform for learning Kubernetes and Terraform fundamentals, but plagued by the inherent limitations of microSD storage. Random node failures, corruption during power outages, and sluggish I/O created operational headaches that undermined the reliability of my home automation. When TrueNAS SCALE 25 arrived with native Kubernetes integration, it was clear the SD card experiment had served its purpose: it was time for enterprise-grade infrastructure.
This guide documents my journey from Raspberry Pi nodes to a purpose-built homelab with TrueNAS Scale, Kubernetes, and robust storage. We’ll dissect:
- Hardware selection – Why the DeskPi RackMate T1 and 2.5G PoE networking?
- Storage architecture – Separating apps (SSD) from bulk storage (HDD)
- Software stack – TrueNAS Scale’s Kubernetes integration versus vanilla K8s
- Operational gains – How ZFS and hardware redundancy eliminated my SD card woes
Whether you’re battling unreliable nodes or planning a homelab overhaul, this deep-dive into infrastructure modernization delivers actionable insights for DevOps engineers and sysadmins managing self-hosted environments.
Understanding the Homelab Evolution
What Defines a Modern Homelab?
Homelabs have evolved from hobbyist playgrounds to production-like environments for testing cloud-native technologies. Key characteristics include:
- Self-healing infrastructure: Kubernetes, Proxmox HA, ZFS redundancy
- Automated provisioning: Terraform, Ansible, cloud-init
- Persistent storage: NAS/SAN solutions replacing USB drives
- Enterprise networking: VLANs, firewalls, load balancers
The Problem with SD Cards in Production
MicroSD cards – while convenient for Pi clusters – introduce critical failure points:
| Failure Mode | Impact | Mitigation in New Build |
|---|---|---|
| Write endurance | Corrupted root FS | M.2 SSD with wear leveling |
| I/O bottlenecks | Slow PVC provisioning | NVMe-backed RWO volumes |
| Power-loss corruption | Manual fsck recovery | ZFS transaction groups + UPS |
| Limited capacity | Frequent garbage collection | 32TB raw storage + compression |
Why TrueNAS Scale for Kubernetes?
TrueNAS SCALE 25.10 merges Linux-based NAS management with Kubernetes:
```
[TrueNAS SCALE Architecture]
├── Debian 12 Base
├── ZFS 2.2.2
├── Kubernetes 1.28
└── Web UI → K3s API Proxy
```
Key Advantages:
- Direct PV Provisioning: Create ZVOLs or datasets as PersistentVolumes
- GPU Passthrough: Assign NVIDIA cards to worker nodes via UI
- Integrated Load Balancer: MetalLB integration for on-prem services
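As a hedged illustration of that load balancer integration: a `Service` of type `LoadBalancer` should receive an address from the MetalLB pool configured in the Apps settings (shown later in this guide). The service name and namespace below are hypothetical, not part of the build itself.

```bash
# Hypothetical example: expose a web app via MetalLB's address pool.
# "home-dashboard" and "automation" are placeholder names.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: home-dashboard
  namespace: automation
spec:
  type: LoadBalancer
  selector:
    app: home-dashboard
  ports:
    - port: 80
      targetPort: 8080
EOF

# MetalLB should assign an external IP from the configured range
kubectl -n automation get svc home-dashboard
```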
Comparison to Alternatives:
| Solution | Storage | K8s Management | Learning Curve |
|---|---|---|---|
| TrueNAS SCALE | Native ZFS | GUI + CLI | Moderate |
| Rancher + Harvester | Ceph | Full UI | High |
| MicroK8s + OpenZFS | Manual | CLI only | Low |
Prerequisites
Hardware Specifications
My build targets 24/7 operation with <150W power draw:
```yaml
# hardware.yml
motherboard: MSI MPG B650I Edge (DDR5, PCIe 5.0)
cpu: AMD Ryzen 5 7600 (6C/12T, 65W TDP)
ram: 64GB Kingston Fury DDR5-5200 (ECC Unbuffered)
storage:
  boot: 500GB WD Red SN700 NVMe
  apps: 2x 2TB Samsung 870 QVO SATA SSD (Mirror)
  data: 4x 8TB Seagate IronWolf HDD (RAIDZ1)
networking:
  main: 2.5Gbps RJ45 (Intel I225-V)
  secondary: 1Gbps (Realtek RTL8125)
power: Corsair SF750 Platinum (750W)
```
Critical Considerations:
- ECC RAM: Not officially validated on consumer AM5, but it protects ZFS against silent in-memory corruption
- HDD Choice: CMR drives mandatory for ZFS (avoid SMR like WD Red SMR)
- Power Supply: 80+ Platinum efficiency for 24/7 operation
Network Pre-Configuration
Before installing TrueNAS:
```bash
# On a Ubiquiti EdgeRouter (EdgeOS CLI)
# Create VLANs
configure
set interfaces switch switch0 vif 30 description "Storage"
set interfaces switch switch0 vif 30 address 10.10.30.1/24
set interfaces switch switch0 vif 40 description "Kubernetes"
set interfaces switch switch0 vif 40 address 10.10.40.1/24
commit
```
Software Requirements
| Component | Version | Notes |
|---|---|---|
| TrueNAS SCALE | 25.10 | Requires UEFI boot with Secure Boot off |
| Kubernetes | 1.28 | Via TrueNAS Apps (k3s) |
| Terraform | 1.8.2 | State stored on S3-compatible MinIO |
Installation & Configuration
TrueNAS SCALE Base Setup
- Flash Installer:

```bash
# Linux
sudo dd if=truenas-scale-25.10.iso of=/dev/sdX bs=4M status=progress conv=fsync
```
- Web UI Initialization:
  - Set admin IP on dedicated VLAN (10.10.30.5/24)
  - Disable root password login, enforce SSH key authentication:

```bash
ssh-copy-id -i ~/.ssh/truenas_ed25519 admin@10.10.30.5
```
- ZFS Pool Creation:

```bash
# CLI alternative to the UI
zpool create -f -o ashift=12 tank raidz1 \
  /dev/disk/by-id/ata-ST8000VN004-2M2101_ABCD1234 \
  /dev/disk/by-id/ata-ST8000VN004-2M2101_EFGH5678 \
  /dev/disk/by-id/ata-ST8000VN004-2M2101_IJKL9012 \
  /dev/disk/by-id/ata-ST8000VN004-2M2101_MNOP3456
zfs set compression=lz4 tank
zfs set atime=off tank
```
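The hardware list also calls for a mirrored SSD pool for apps. The commands below sketch how that pool could be created from the CLI under the same conventions; the disk IDs are placeholders, so substitute your actual `/dev/disk/by-id/` paths.

```bash
# Sketch: mirrored SSD pool for app data (disk IDs are placeholders)
zpool create -f -o ashift=12 apps mirror \
  /dev/disk/by-id/ata-Samsung_SSD_870_QVO_2TB_SERIAL1 \
  /dev/disk/by-id/ata-Samsung_SSD_870_QVO_2TB_SERIAL2
zfs set compression=lz4 apps
zfs set atime=off apps
```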
Kubernetes via TrueNAS Apps
TrueNAS uses a modified k3s distribution:
- Enable Kubernetes:
  - System Settings → Apps → Enable
  - Select VLAN 40 (10.10.40.0/24)
  - Configure Load Balancer IP Range (10.10.40.100-10.10.40.150)
- Storage Classes (a quick test PVC follows after these steps):

```yaml
# apps.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: truenas-iscsi-lz4
provisioner: csi.truenas.com
parameters:
  compression: lz4
  aclmode: passthrough
  sync: standard
volumeBindingMode: WaitForFirstConsumer
```
- Node Configuration:

```bash
# Access the k3s cluster
truenas-cli kubernetes kubeconfig > ~/.kube/homelab-config
export KUBECONFIG=~/.kube/homelab-config
kubectl get nodes -o wide
```
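To confirm dynamic provisioning against the `truenas-iscsi-lz4` class defined above, a throwaway claim is a quick smoke test; the claim name here is hypothetical.

```bash
# Hypothetical test claim against the StorageClass defined above
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-smoke-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: truenas-iscsi-lz4
  resources:
    requests:
      storage: 1Gi
EOF

# With WaitForFirstConsumer the PVC stays Pending until a pod mounts it
kubectl get pvc pvc-smoke-test
```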
Terraform Bootstrap
Initialize Terraform with remote state:
```hcl
# main.tf
terraform {
  backend "s3" {
    endpoint                    = "https://minio.mydomain.net"
    bucket                      = "terraform-state"
    key                         = "homelab/network"
    region                      = "main"
    skip_credentials_validation = true
    skip_region_validation      = true
  }
}
```
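Before the first apply, the backend has to be initialized. A minimal sketch, assuming the MinIO access keys are exported as the AWS-style environment variables the S3 backend expects (the values shown are placeholders):

```bash
# MinIO credentials for the S3-compatible backend (placeholder values)
export AWS_ACCESS_KEY_ID="minio-access-key"
export AWS_SECRET_ACCESS_KEY="minio-secret-key"

# Initialize the remote state backend and download providers
terraform init
terraform plan -target=module.nfs_storage
```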
Apply initial config:
```bash
export TF_VAR_nas_ip="10.10.30.5"
terraform apply -target=module.nfs_storage
```
Optimization & Hardening
ZFS Tuning for Mixed Workloads
Adjust ARC limits for Kubernetes + SMB/CIFS:
```
# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_min=2147483648       # 2 GB floor
options zfs zfs_arc_max=34359738368      # 32 GB max (50% of 64 GB RAM)
options zfs l2arc_write_boost=400000000  # Burst writes to L2ARC (SSD)
```
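Module options only take effect after the zfs module is reloaded or the host reboots. On Linux the ARC tunables are also exposed through sysfs, so they can be checked and adjusted at runtime; a sketch:

```bash
# Current ARC limits (bytes)
cat /sys/module/zfs/parameters/zfs_arc_max
cat /sys/module/zfs/parameters/zfs_arc_min

# Apply the 32 GB cap immediately, without waiting for a reboot
echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max

# Confirm ARC usage against the new limit (if arc_summary is installed)
arc_summary | grep -A3 "ARC size"
```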
Kubernetes Security Policies
- Pod Security Admission:

```yaml
# psa.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1
      kind: PodSecurityConfiguration
      defaults:
        enforce: "restricted"
        enforce-version: "latest"
      exemptions:
        usernames: ["system:serviceaccount:kube-system:*"]
```
- Network Policies:

```yaml
# default-deny.yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
  namespace: automation
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```
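A default-deny egress policy also blocks DNS, so most namespaces need a follow-up allow rule. A minimal sketch that permits UDP/TCP 53 to kube-system; the `kubernetes.io/metadata.name` label assumes the default namespace labels present on Kubernetes 1.21+:

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: automation
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
```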
Day-to-Day Operations
Monitoring Stack
Deploy Prometheus with TrueNAS storage:
```bash
helm install prometheus prometheus-community/kube-prometheus-stack \
  --set persistentVolume.storageClass=truenas-iscsi-lz4 \
  --namespace monitoring
```
Key metrics to alert on:
- `zfs_arc_size / zfs_arc_max > 0.8` – ARC pressure
- `kube_pod_status_ready{condition="false"}` – pods failing readiness (CrashLoopBackOff detection)
- `node_filesystem_avail_bytes{mountpoint="/mnt/tank"} / node_filesystem_size_bytes < 0.2` – storage capacity
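The ARC expression can be turned into an alert with a `PrometheusRule` that kube-prometheus-stack picks up. The `release: prometheus` label matches the Helm release name used above, and the ZFS metric names depend on whichever exporter you run, so treat this as a sketch rather than a drop-in rule:

```bash
kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: zfs-arc-pressure
  namespace: monitoring
  labels:
    release: prometheus   # must match the rule selector of the Helm release
spec:
  groups:
    - name: zfs.rules
      rules:
        - alert: ZfsArcPressure
          expr: zfs_arc_size / zfs_arc_max > 0.8
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "ZFS ARC is above 80% of its configured maximum"
EOF
```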
Backup Strategy
- ZFS Snapshots:

```bash
# Daily recursive snapshots
zfs snapshot -r tank@$(date +%Y%m%d)

# Offsite replication
zfs send tank/apps@20240501 | ssh backup-host "zfs recv backup/apps"
```
- Velero for Kubernetes:

```bash
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket velero-backups \
  --secret-file ./credentials \
  --backup-location-config region=default,s3ForcePathStyle="true",s3Url=https://minio.mydomain.net
```
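One-off `velero backup create` runs are easy to forget, so a schedule keeps the Kubernetes side in step with the nightly ZFS snapshots. The namespace and retention below are illustrative:

```bash
# Nightly backup of the automation namespace, kept for 7 days (168h)
velero schedule create automation-nightly \
  --schedule="0 3 * * *" \
  --include-namespaces automation \
  --ttl 168h

velero schedule get
```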
Troubleshooting Guide
Common Issues and Solutions
Problem: ZFS pool shows DEGRADED state
Diagnosis:
```bash
zpool status -v
  scan: scrub in progress since Tue May  1 12:00:00 2024
        1.14T scanned at 1.2G/s, 256G issued at 300M/s
        0B repaired, 21.96% done
```
Solution: Replace the faulted drive using Web UI → Storage → Pools → Status → Replace (or from the CLI, as sketched below).
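For reference, the CLI equivalent is a `zpool replace` followed by watching the resilver; the disk IDs below are placeholders for the faulted device and its replacement:

```bash
# Replace the faulted member (IDs are placeholders), then watch the resilver
zpool replace tank \
  /dev/disk/by-id/ata-ST8000VN004-2M2101_FAULTED \
  /dev/disk/by-id/ata-ST8000VN004-2M2101_NEWDISK
zpool status tank
```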
Problem: Kubernetes pods stuck in Pending
Diagnosis:
```bash
kubectl describe pod $POD_NAME
Events:
  Type     Reason            Age  From               Message
  ----     ------            ---- ----               -------
  Warning  FailedScheduling  5m   default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
```

Solution: Lower the pod's resource requests, add capacity, or fix scheduling constraints (taints/tolerations, node selectors); a quick adjustment is sketched below.
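If the cause is oversized requests, they can be dialed down in place. A hedged example using a hypothetical deployment name:

```bash
# Hypothetical deployment; lower requests so the pod fits on the single node
kubectl set resources deployment home-dashboard -n automation \
  --requests=cpu=250m,memory=256Mi \
  --limits=cpu=500m,memory=512Mi

kubectl -n automation get pods -w
```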
Performance Tuning
Slow NFS Exports:
```ini
# /etc/nfs.conf
[nfsd]
threads=16

[exportd]
manage-gids=true

[mountd]
# Disable UDP, force TCP
udp=n
tcp=y
```
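On a plain Debian base the changes take effect after restarting the NFS server; TrueNAS manages the service through its middleware, so the exact restart path may differ there. A sketch of the generic route:

```bash
# Restart NFS and confirm the new thread count (plain Debian; TrueNAS may differ)
systemctl restart nfs-server
cat /proc/fs/nfsd/threads
```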
High Kubernetes API Latency:
```bash
# Edit the k3s systemd service
ExecStart=/usr/local/bin/k3s server \
  --kube-apiserver-arg="default-watch-cache-size=2000" \
  --kube-apiserver-arg="watch-cache-sizes=services#500,endpointslices#1000"
```
Conclusion
Migrating from Raspberry Pis to a purpose-built TrueNAS SCALE homelab eliminated the reliability issues inherent in SD card-based infrastructure while unlocking enterprise-grade capabilities:
- Storage Resilience: ZFS scrubbing detects bit rot before it impacts data
- Kubernetes Efficiency: Direct PVC provisioning vs. Longhorn overhead
- Operational Simplicity: Unified UI for storage and container management
Next Steps:
- Implement Istio for service mesh across on-prem workloads
- Test Ceph integration for hyperconverged storage
- Explore GPU partitioning for AI workloads
The investment in proper homelab infrastructure pays dividends through reduced maintenance overhead and a platform capable of testing production-grade workflows. Whether you’re running home automation or staging cloud migrations, eliminating storage bottlenecks is the foundation for reliable self-hosting.