
Micro Lab Self-Contained Cluster For Air-Gapped Platform Engineering

1. INTRODUCTION

Air-gapped environments represent the final frontier in secure infrastructure - isolated networks where data sovereignty and security trump all other concerns. Yet these environments pose unique challenges for DevOps practitioners: How do you implement modern platform engineering practices without internet access? How do you maintain Kubernetes clusters, CI/CD pipelines, and infrastructure-as-code tooling when disconnected from global repositories?

This is where purpose-built micro lab clusters shine. By combining compact hardware with carefully selected open-source technologies, engineers can create fully functional platform engineering environments that operate entirely offline. These self-contained systems enable:

  • Secure development/testing of air-gapped production systems
  • Local artifact caching for dependency management
  • Isolated network security testing
  • Portable DevOps training environments

In this comprehensive guide, we’ll dissect a real-world micro lab implementation based on practical homelab experience, examining:

  1. Hardware selection for dense computing in constrained spaces
  2. Network architecture for isolated environment segmentation
  3. Kubernetes deployment strategies without internet access
  4. Air-gapped package management solutions
  5. Security hardening for offline operations

Whether you’re building a compliance-ready development environment or creating a portable platform engineering lab, this deep dive into self-contained cluster design will provide actionable insights for your air-gapped infrastructure projects.

2. UNDERSTANDING THE TOPIC

What is a Micro Lab Self-Contained Cluster?

A micro lab self-contained cluster is a compact, fully isolated computing environment designed to support platform engineering workflows without external network dependencies. These systems typically feature:

  • Physical isolation: No inbound/outbound internet connectivity
  • Compact form factor: Often sub-5U rack space or portable desktop units
  • Service independence: All required dependencies hosted internally
  • Infrastructure-as-code: Reproducible configuration management

Key Components Breakdown

Based on the referenced homelab implementation, we see these critical elements:

  1. Network Foundation:
    • GL-iNet Slate7 router running OpenWrt
    • Dual UniFi Flex Mini 2.5G switches
    • Separate VLANs for storage/service traffic
  2. Compute Layer:
    • 3-node Kubernetes cluster (presumably ARM/x86 microcomputers)
    • Shared storage backend (likely NFS or Ceph)
  3. Control Plane:
    • Lightweight Kubernetes distribution (k3s recommended)
    • Local container registry
    • Air-gapped package repository

Air-Gapped Challenges and Solutions

| Challenge | Conventional Approach | Air-Gapped Solution |
|---|---|---|
| Package Management | Public repositories | Local Artifactory/mirror |
| Container Images | Docker Hub/GCR | Private registry with pre-cached images |
| OS Updates | Online repositories | Local mirror with apt-cacher-ng |
| CI/CD Pipelines | Cloud-hosted runners | Self-hosted build agents |
| Monitoring | Cloud services | Local Prometheus/Grafana stack |

Benefits Over Traditional Homelabs

  1. Security: Complete control over network traffic flows
  2. Reproducibility: Exactly mimics production air-gapped constraints
  3. Portability: Compact design enables field deployments
  4. Cost Efficiency: Low-power components reduce operational expense

Real-World Use Cases

  1. Defense Contracting: Developing secure applications for classified environments
  2. Industrial Control Systems: Maintaining isolated SCADA infrastructure
  3. Financial Systems: Building payment processing with strict compliance requirements
  4. Research Labs: Protecting intellectual property during development

3. PREREQUISITES

Hardware Requirements

| Component | Minimum Specification | Recommended Specification |
|---|---|---|
| Compute Nodes | 3x Raspberry Pi 4 (4GB) | 3x Intel NUC 12 Pro (32GB RAM) |
| Network Switch | 1GbE managed switch | Dual 2.5GbE switches with VLAN support |
| Router | GL-iNet travel router | OPNsense/pfSense appliance |
| Storage | 500GB SSD (USB-connected) | NVMe RAID array with hardware controller |
| Power | 60W USB-C PD | PoE++ with battery backup |

Software Prerequisites

  1. Base Operating System:
    ```shell
    # For ARM devices (Raspberry Pi)
    wget https://downloads.raspberrypi.org/raspios_lite_arm64/images/raspios_lite_arm64-2023-05-03/2023-05-03-raspios-bullseye-arm64-lite.img.xz

    # For x86 nodes
    wget https://releases.ubuntu.com/22.04.3/ubuntu-22.04.3-live-server-amd64.iso
    ```
    
  2. Kubernetes Distribution:
    • k3s v1.27.6+ (lightweight and air-gap installable)
    • MicroK8s v1.28+ (alternative for single-node setups)
  3. Critical Dependencies:
    ```shell
    # Common packages for all nodes
    sudo apt-get install -y \
      apt-transport-https \
      ca-certificates \
      curl \
      gnupg \
      lsb-release \
      nfs-common
    ```
    

Network Preconfiguration

Implement strict network segmentation before installing cluster components:

```
# Example VLAN configuration (Cisco-style CLI shown for illustration;
# UniFi Flex Mini switches are actually configured via the UniFi controller)
configure
vlan 10
 name service_network
exit
vlan 20
 name storage_network
exit
interface 0/1
 switchport mode trunk
 switchport trunk allowed vlan 10,20
exit
```

Security Preparation

  1. Generate SSH keys for password-less authentication:

    ```shell
    ssh-keygen -t ed25519 -C "airgap-cluster-admin"
    ```

  2. Create a certificate authority for the internal PKI:

    ```shell
    openssl genrsa -out airgap-ca.key 4096
    openssl req -x509 -new -nodes -key airgap-ca.key \
      -sha256 -days 3650 -out airgap-ca.crt
    ```

  3. Set up a WireGuard VPN for secure administrative access:

    ```shell
    wg genkey | sudo tee /etc/wireguard/private.key
    sudo cat /etc/wireguard/private.key | wg pubkey | sudo tee /etc/wireguard/public.key
    ```

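The CA from step 2 only becomes useful once it signs service certificates, for example for the internal container registry configured later. A minimal sketch of issuing such a certificate (the hostname and filenames are illustrative, and the CA generation from step 2 is repeated with a `-subj` flag so the snippet runs non-interactively and stands alone):

```shell
# Recreate the CA as in step 2 (skip if airgap-ca.key/.crt already exist)
openssl genrsa -out airgap-ca.key 4096
openssl req -x509 -new -nodes -key airgap-ca.key -subj "/CN=airgap-ca" \
  -sha256 -days 3650 -out airgap-ca.crt

# Issue a server certificate for the internal registry (illustrative hostname)
openssl genrsa -out registry.key 4096
openssl req -new -key registry.key -subj "/CN=registry.airgap.local" \
  -out registry.csr
openssl x509 -req -in registry.csr -CA airgap-ca.crt -CAkey airgap-ca.key \
  -CAcreateserial -sha256 -days 825 -out registry.crt

# Confirm the chain validates against the CA
openssl verify -CAfile airgap-ca.crt registry.crt
```

Note that modern TLS clients also require a subjectAltName extension on server certificates; in practice you would add one via an extensions file or OpenSSL's `-addext` option.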

4. INSTALLATION & SETUP

Air-Gapped Kubernetes Deployment

  1. Prepare k3s Installation Bundle:
    ```shell
    # On an internet-connected machine (set ARCH=amd64 or arm64):
    # fetch the install script, the k3s binary, and the air-gap image bundle
    mkdir k3s-airgap && cd k3s-airgap
    curl -sfL https://get.k3s.io -o k3s-install.sh
    curl -LO "https://github.com/k3s-io/k3s/releases/download/v1.27.6%2Bk3s1/k3s"
    curl -LO "https://github.com/k3s-io/k3s/releases/download/v1.27.6%2Bk3s1/k3s-airgap-images-$ARCH.tar.gz"
    ```
    
  2. Transfer Bundle to Air-Gapped Network:
    ```shell
    # Using physical media transfer
    tar cvzf k3s-airgap-bundle.tar.gz k3s-airgap/
    ```
    
  3. Offline k3s Installation:
    ```shell
    # On air-gapped nodes
    tar xvzf k3s-airgap-bundle.tar.gz && cd k3s-airgap
    sudo install -m 755 k3s /usr/local/bin/k3s
    sudo mkdir -p /var/lib/rancher/k3s/agent/images/
    sudo cp k3s-airgap-images-$ARCH.tar.gz /var/lib/rancher/k3s/agent/images/
    sudo INSTALL_K3S_SKIP_DOWNLOAD=true \
      K3S_TOKEN=secret-airgap-token \
      sh k3s-install.sh
    ```
    
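Environment variables work for a one-off install, but k3s also reads `/etc/rancher/k3s/config.yaml`, which keeps node settings reproducible across reinstalls. A sketch for the first server node (the token and address are placeholders):

```yaml
# /etc/rancher/k3s/config.yaml (first server node; values are placeholders)
token: secret-airgap-token
node-ip: 192.168.10.11
disable:
  - traefik              # swap in an internally mirrored ingress controller
write-kubeconfig-mode: "0644"
```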

Private Container Registry Setup

Configure Harbor as the internal registry:

```yaml
# harbor-values.yaml
exposureType: ingress
ingress:
  core:
    hostname: registry.airgap.local
persistence:
  persistentVolumeClaim:
    registry:
      storageClass: "local-path"
chartmuseum:
  enabled: false
notary:
  enabled: false
trivy:
  enabled: false
```

Install using Helm (air-gapped):

```shell
helm upgrade --install harbor . \
  --namespace harbor-system \
  --create-namespace \
  -f harbor-values.yaml \
  --set externalURL=https://registry.airgap.local
```
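
For k3s nodes to pull from the private registry, each node's containerd must be told about it. k3s reads `/etc/rancher/k3s/registries.yaml` at startup; a sketch assuming the hostname and CA certificate from earlier sections (restart k3s after editing):

```yaml
# /etc/rancher/k3s/registries.yaml (per node)
mirrors:
  "registry.airgap.local":
    endpoint:
      - "https://registry.airgap.local"
configs:
  "registry.airgap.local":
    tls:
      ca_file: /etc/rancher/k3s/airgap-ca.crt
```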

Offline Package Management

Implement Artifactory for universal package management:

  1. Create Debian Repository Mirror:
    ```shell
    # Using approx as a local Debian package cache/proxy
    sudo apt-get install approx
    # Map a repository name to the internal mirror (placeholder hostname)
    echo "debian http://mirror.airgap.local/debian" | sudo tee -a /etc/approx/approx.conf
    ```
    
  2. Populate Python Package Cache:
    ```shell
    pip download -r requirements.txt --dest ./python-packages
    ```
    
  3. Configure Air-Gapped NPM Registry:

    ```shell
    npm config set registry http://artifactory.airgap.local/artifactory/api/npm/npm-virtual/
    ```
    
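The mirrors above only help if clients are pointed at them. For Python, pip can be configured machine-wide (the hostname is a placeholder); for ad-hoc installs from the cache populated in step 2, `pip install --no-index --find-links ./python-packages -r requirements.txt` works without any configuration at all:

```
# /etc/pip.conf — route all pip traffic to the internal index (placeholder host)
[global]
index-url = http://pypi.airgap.local/simple
trusted-host = pypi.airgap.local
```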

5. CONFIGURATION & OPTIMIZATION

Kubernetes Security Hardening

  1. Pod Security Admission:

    ```yaml
    # psa.yaml
    apiVersion: apiserver.config.k8s.io/v1
    kind: AdmissionConfiguration
    plugins:
    - name: PodSecurity
      configuration:
        apiVersion: pod-security.admission.config.k8s.io/v1
        kind: PodSecurityConfiguration
        defaults:
          enforce: "restricted"
          enforce-version: "latest"
          audit: "restricted"
          audit-version: "latest"
          warn: "restricted"
          warn-version: "latest"
        exemptions:
          usernames: ["system:serviceaccount:kube-system:calico-kube-controllers"]
    ```

  2. Network Policy Enforcement:

    ```yaml
    # default-deny.yaml
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-all
    spec:
      podSelector: {}
      policyTypes:
      - Ingress
      - Egress
    ```

Performance Tuning

  1. Kernel Parameters:
    ```
    # /etc/sysctl.d/99-airgap.conf
    net.core.rmem_max=268435456
    net.core.wmem_max=268435456
    net.ipv4.tcp_rmem=4096 87380 134217728
    net.ipv4.tcp_wmem=4096 65536 134217728
    vm.swappiness=10
    ```
    
  2. Container Runtime Optimization (`/etc/docker/daemon.json`; note that k3s ships containerd by default, so this applies only where Docker is used, e.g. on build hosts):

    ```json
    {
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "10m",
        "max-file": "3"
      },
      "storage-driver": "overlay2",
      "storage-opts": [
        "overlay2.override_kernel_check=true"
      ]
    }
    ```
    

Storage Configuration

Implement Rook-Ceph for persistent storage:

```yaml
# cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph-airgap
spec:
  dataDirHostPath: /var/lib/rook
  cephVersion:
    image: ceph/ceph:v17.2.6
  mon:
    count: 3
  dashboard:
    enabled: true
  storage:
    storageClassDeviceSets:
    - name: ssd
      count: 3
      portable: false
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          storageClassName: local-path
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
```

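The CephCluster above only provisions the Ceph daemons; workloads still need a pool and a StorageClass before PVCs such as the `rook-ceph-block` class referenced later in this guide will bind. A sketch abridged from the upstream Rook examples (the namespace and pool names are assumptions matching the default `rook-ceph` deployment):

```yaml
# storageclass.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3          # one replica per node in the 3-node lab
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
```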
6. USAGE & OPERATIONS

Daily Management Tasks

  1. Node Maintenance:
    ```shell
    # Safely drain node
    kubectl drain $NODE_NAME --ignore-daemonsets --delete-emptydir-data

    # Post-maintenance uncordon
    kubectl uncordon $NODE_NAME
    ```
    
  2. Air-Gapped Application Deployment:
    ```shell
    # Build image using internal dependencies
    docker build --network host -t registry.airgap.local/app:v1 .

    # Push to private registry
    docker push registry.airgap.local/app:v1

    # Deploy with Helm
    helm upgrade --install myapp ./chart \
      --set image.repository=registry.airgap.local/app \
      --set image.tag=v1
    ```
    

Monitoring Stack

Implement Prometheus/Grafana with air-gapped dashboards:

```yaml
# prometheus-values.yaml
server:
  persistentVolume:
    enabled: true
    storageClass: rook-ceph-block
alertmanager:
  enabled: false
thanos:
  enabled: false
```

Backup Strategy

  1. Cluster State Backup:
    ```shell
    # Velero installation backed by a local MinIO endpoint instead of cloud S3
    velero install \
      --provider aws \
      --plugins velero/velero-plugin-for-aws:v1.7.0 \
      --bucket velero-backups \
      --use-volume-snapshots=false \
      --backup-location-config region=minio,s3ForcePathStyle=true,s3Url=http://minio.airgap.local:9000
    ```
    
  2. Application Data Backup:
    ```shell
    # Kasten K10 air-gapped installation
    helm install kasten k10/k10 --namespace=kasten-io \
      --set global.persistence.storageClass=rook-ceph-block \
      --set externalGateway.create=true \
      --set auth.tokenAuth.enabled=true
    ```
    
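One-off backups are easy to forget in a disconnected lab, so a recurring Velero `Schedule` is worth creating once the installation above succeeds (the cron expression and retention are illustrative):

```yaml
# daily-backup-schedule.yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"        # 02:00 every night
  template:
    includedNamespaces: ["*"]
    ttl: 168h0m0s              # keep one week of backups
```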

7. TROUBLESHOOTING

Common Issues and Solutions

  1. DNS Resolution Failures:
    ```shell
    # Verify CoreDNS functionality
    kubectl -n kube-system logs -l k8s-app=kube-dns

    # Check local DNS configuration
    kubectl run -it --rm --restart=Never dnscheck \
      --image=registry.airgap.local/busybox:latest \
      -- nslookup kubernetes.default
    ```
    
  2. Certificate Validation Errors:
    ```shell
    # Distribute the custom CA as a secret; workloads must still mount it
    # (or the CA must be added to each node's trust store) before it is trusted
    kubectl create secret generic airgap-ca \
      --from-file=ca.crt=airgap-ca.crt \
      -n kube-system
    ```
    
  3. Storage Provisioning Issues:
    ```shell
    # Check storage class parameters
    kubectl describe storageclass rook-ceph-block

    # Verify Ceph cluster health
    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
    ```
    
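Since there is no upstream resolver to fall back on, internal hostnames such as `registry.airgap.local` must be answered inside the cluster. One approach is a dedicated zone in the CoreDNS Corefile using the `hosts` plugin (the addresses below are placeholders):

```
# Extra server block for the CoreDNS Corefile
# (kubectl -n kube-system edit configmap coredns)
airgap.local:53 {
    errors
    cache 30
    hosts {
        192.168.10.20 registry.airgap.local
        192.168.10.21 mirror.airgap.local artifactory.airgap.local
        fallthrough
    }
}
```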

Performance Diagnostics

  1. Node Resource Inspection:
    ```shell
    # Install metrics-server from locally mirrored manifests
    kubectl apply -f metrics-server-components.yaml

    # View resource utilization
    kubectl top nodes
    kubectl top pods --all-namespaces
    ```
    
  2. Network Latency Analysis:
    ```shell
    # Create network benchmark pod
    kubectl run net-test --image=registry.airgap.local/nicolaka/netshoot -it --rm

    # Inside the pod, run iperf3 against a peer that is running `iperf3 -s`
    iperf3 -c 10.42.1.5 -p 5201
    ```
    


This post is licensed under CC BY 4.0 by the author.