When Will They Learn

Introduction

The eternal debate rages on in every DevOps channel and sysadmin forum: self-hosted infrastructure versus managed cloud services. We’ve all seen the passionate arguments - from the homelab enthusiast preaching the virtues of physical hardware ownership to the cloud-native evangelist advocating for serverless architectures.

But when will we learn that this isn’t a binary choice? The recent Reddit discussion highlights the tribal nature of this debate: “Careful OP, the cloud fan boys will get mad” juxtaposed with “I self-host at home and do cloud work professionally. There are different reasons for different solutions, folks.” This polarization misses the fundamental truth - modern infrastructure management requires understanding both approaches and knowing when each is appropriate.

For DevOps professionals and system administrators, this knowledge isn’t academic. The decision to self-host or use cloud services impacts:

  • Total cost of ownership (TCO)
  • System reliability and uptime
  • Security postures
  • Maintenance overhead
  • Technical debt accumulation

This comprehensive guide cuts through the dogma to examine practical infrastructure strategies. You’ll learn:

  • How to evaluate self-hosting vs cloud solutions objectively
  • Architectural patterns for hybrid deployments
  • Cost optimization techniques for both models
  • Maintenance strategies that prevent 3 AM outages
  • Security considerations for mixed environments

Whether you’re managing a homelab Kubernetes cluster or enterprise-grade cloud infrastructure, the principles here will help you make informed decisions that balance control, cost, and complexity.

Understanding the Topic

Defining the Battle Lines

Self-Hosting refers to deploying and managing infrastructure on hardware you physically control - whether that’s a Raspberry Pi in your basement or a colocation facility rack. The key characteristics include:

  • Direct hardware access
  • Full control over networking stack
  • Responsibility for all maintenance
  • Upfront capital expenditure (CapEx)

Cloud Services encompass managed infrastructure offerings from providers like AWS, Azure, or Google Cloud Platform (GCP). Key attributes:

  • Consumption-based pricing (OpEx)
  • Shared responsibility model
  • Elastic scalability
  • Managed maintenance and updates

Historical Context

The self-hosting vs cloud debate mirrors computing’s evolution:

  1. Mainframe Era (1960s-1980s): Centralized computing with dumb terminals
  2. Client-Server Model (1990s): Distributed computing with on-premises servers
  3. Virtualization Boom (2000s): Improved hardware utilization through VMs
  4. Cloud Revolution (2010s): On-demand infrastructure as a service
  5. Hybrid/Multi-Cloud Present (2020s): Strategic mixing of deployment models

Feature Comparison

| Characteristic | Self-Hosted | Cloud Services |
|----------------|-------------|----------------|
| Cost Structure | High CapEx, lower OpEx | No CapEx, variable OpEx |
| Control | Complete hardware/network | Limited to service tiers |
| Scalability | Manual, hardware-limited | Instant, API-driven |
| Maintenance | Full owner responsibility | Provider-managed patching |
| Compliance | Self-certified | Provider certifications |
| Latency | Controllable (local) | Depends on region selection |
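
The CapEx/OpEx split is easiest to reason about as a multi-year total cost of ownership. A minimal back-of-the-envelope sketch, using entirely hypothetical prices that you should replace with your own quotes:

#!/usr/bin/env bash
# Rough 3-year TCO comparison with made-up numbers; substitute real quotes.
YEARS=3

# Self-hosted: hardware bought up front plus power, bandwidth, and spares per month
SELF_CAPEX=6000          # server, switch, UPS (one-time, hypothetical)
SELF_OPEX_MONTHLY=120    # electricity + ISP line + parts budget (hypothetical)
SELF_TOTAL=$((SELF_CAPEX + SELF_OPEX_MONTHLY * 12 * YEARS))

# Cloud: no upfront spend, everything is a monthly bill
CLOUD_OPEX_MONTHLY=450   # instances, storage, egress (hypothetical)
CLOUD_TOTAL=$((CLOUD_OPEX_MONTHLY * 12 * YEARS))

echo "Self-hosted ${YEARS}-year TCO: \$${SELF_TOTAL}"
echo "Cloud ${YEARS}-year TCO:       \$${CLOUD_TOTAL}"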

Real-World Applications

When Self-Hosting Wins:

  • Data sovereignty requirements
  • Specialized hardware needs (HPC, GPU clusters)
  • Predictable workloads with static capacity
  • Legacy systems with compatibility constraints

Cloud Advantages:

  • Bursty or unpredictable traffic patterns
  • Global distribution requirements
  • Rapid prototyping needs
  • Compliance-heavy industries (HIPAA, PCI DSS)

The Reddit comment about using Cloudflare for self-hosted projects illustrates a hybrid approach - leveraging cloud services to enhance self-hosted infrastructure. This pattern combines the control of self-hosting with cloud benefits like DDoS protection and global CDN caching.

Prerequisites

Hardware Requirements

For self-hosted deployments:

| Component | Minimum Specification | Recommended Specification |
|-----------|-----------------------|----------------------------|
| CPU | 4 cores (x86_64) | 8+ cores with VT-x/AMD-V |
| RAM | 8GB DDR4 | 32GB ECC RAM |
| Storage | 250GB SSD | RAID 10 with NVMe SSDs |
| Network | 1Gbps NIC | 10Gbps with LACP bonding |
| Power | Single PSU | Dual redundant PSUs |
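
A few stock Linux commands are enough to check a candidate box against these numbers (interface and device names will differ on your hardware):

# CPU core count and virtualization extensions (VT-x/AMD-V)
lscpu | grep -E '^CPU\(s\)|Virtualization'
# Installed memory
free -h
# Disks and layout
lsblk -o NAME,SIZE,TYPE,ROTA,MOUNTPOINT
# NIC link speed (replace eth0 with your interface)
ethtool eth0 | grep Speed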

Software Requirements

Base operating systems:

  • Ubuntu Server 22.04 LTS (Linux 5.15+ kernel)
  • CentOS Stream 9 or RHEL 9 equivalent
  • VMware ESXi 8.0 as a bare-metal hypervisor option

Critical dependencies:

  • Docker CE 24.0+ or Containerd 1.7+
  • Kubernetes 1.28+ (for orchestration)
  • Terraform 1.5+ (for hybrid provisioning)
  • Ansible 8.3+ (for configuration management)
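
A quick way to confirm the toolchain meets these minimums before proceeding (each command simply prints its version):

docker --version            # expect 24.0 or newer
containerd --version        # expect 1.7 or newer
kubectl version --client    # expect 1.28 or newer
kubeadm version
terraform version           # expect 1.5 or newer
ansible --version           # Ansible 8.x reports ansible-core 2.15+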

Network Considerations

Security essentials:

  • Hardware firewall (pfSense/OPNsense)
  • VLAN segmentation for services
  • VPN termination (WireGuard/OpenVPN)
  • Reverse proxy (Traefik/Nginx)
  • DNS filtering (Pi-hole/AdGuard Home)

Pre-Installation Checklist

  1. Validate hardware compatibility
  2. Configure BIOS/UEFI settings:
    • Enable virtualization extensions
    • Set power failure recovery mode
  3. Document physical network topology
  4. Establish backup strategy (3-2-1 rule; see the restic sketch after this list):
    • 3 copies of data
    • 2 different media types
    • 1 offsite copy
  5. Test UPS battery runtime under load
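
A minimal sketch of the 3-2-1 rule using restic, assuming a local repository on a second disk plus an S3-compatible offsite bucket; the paths, bucket name, and password file here are placeholders:

# Copy 1: live data; Copy 2: local restic repo on different media; Copy 3: offsite bucket
export RESTIC_PASSWORD_FILE=/root/.restic-pass

# Local repository on a second disk (run "init" once per repository)
restic -r /mnt/backup-disk/restic init
restic -r /mnt/backup-disk/restic backup /srv/data

# Offsite repository (credentials via AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY)
restic -r s3:s3.amazonaws.com/my-offsite-bucket backup /srv/data

# Verify both repositories periodically
restic -r /mnt/backup-disk/restic check
restic -r s3:s3.amazonaws.com/my-offsite-bucket check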

Installation & Setup

Bare-Metal Provisioning

For self-hosted Kubernetes clusters:

# Prerequisites: a container runtime (e.g. containerd) installed and swap disabled (sudo swapoff -a)

# Install kubeadm, kubelet and kubectl
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# Initialize control plane
sudo kubeadm init --pod-network-cidr=192.168.0.0/16

# Configure kubectl access
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install network plugin (Calico)
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml
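
To add worker nodes, generate a join command on the control plane and run its output on each worker, a standard kubeadm step sketched below:

# On the control plane: print a ready-to-run join command with a fresh token
kubeadm token create --print-join-command

# On each worker: run the printed command, which looks like
# sudo kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>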

Hybrid Cloud Integration

Connecting self-hosted infrastructure to AWS:

# Install AWS Systems Manager Agent for hybrid management
sudo mkdir /tmp/ssm
cd /tmp/ssm
sudo wget https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/debian_amd64/amazon-ssm-agent.deb
sudo dpkg -i amazon-ssm-agent.deb

# Register with a hybrid activation created in the Systems Manager console
# (activation code, ID, and region below are placeholders)
sudo systemctl stop amazon-ssm-agent
sudo amazon-ssm-agent -register -code "<activation-code>" -id "<activation-id>" -region "us-east-1"
sudo systemctl enable amazon-ssm-agent
sudo systemctl start amazon-ssm-agent

# Verify instance registration
aws ssm describe-instance-information --filters "Key=ResourceType,Values=ManagedInstance"
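
Once the server appears as a managed instance (its ID starts with "mi-"), you can open a shell to it through Systems Manager without exposing SSH. This assumes the Session Manager plugin is installed for the AWS CLI and the account's advanced-instances tier is enabled for on-premises machines:

# Replace mi-0123456789abcdef0 with the managed instance ID from the previous command
aws ssm start-session --target mi-0123456789abcdef0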

Cloudflare Tunnel Setup

Securely exposing self-hosted services without public IPs:

  1. Create Cloudflare Zero Trust account
  2. Install cloudflared daemon:
# For Debian/Ubuntu
wget https://github.com/cloudflare/cloudflared/releases/download/2023.8.2/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared-linux-amd64.deb

# Authenticate
cloudflared tunnel login

# Create tunnel
cloudflared tunnel create usman-tunnel

# Configure ingress rules
nano ~/.cloudflared/config.yaml

Example config.yaml:

tunnel: 6a145a39-1a85-4ed4-8956-3a15f3f8e6e7
credentials-file: /home/usman/.cloudflared/6a145a39-1a85-4ed4-8956-3a15f3f8e6e7.json

ingress:
  - hostname: gitlab.
    service: http://localhost:3000
  - hostname: prometheus.
    service: http://localhost:9090
  - service: http_status:404

Verification Steps

Validate Kubernetes cluster health:

kubectl get nodes -o wide
kubectl get pods -A
kubectl describe node $NODE_NAME

Test Cloudflare Tunnel connectivity:

cloudflared tunnel route dns usman-tunnel gitlab.
cloudflared tunnel run usman-tunnel
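
For anything beyond a quick test, run the tunnel as a managed service rather than in a foreground shell; cloudflared can install its own systemd unit:

# Install and start cloudflared as a systemd service
sudo cloudflared service install
sudo systemctl enable --now cloudflared
sudo systemctl status cloudflared

# Inspect tunnel status and active connections
cloudflared tunnel info usman-tunnel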

Configuration & Optimization

Security Hardening

Kubernetes Pod Security: the PodSecurityPolicy API (policy/v1beta1) was removed in Kubernetes 1.25, so on the 1.28+ clusters used here the equivalent restrictions are enforced with the built-in Pod Security Admission controller by labelling namespaces with a Pod Security Standard:

apiVersion: v1
kind: Namespace
metadata:
  name: production        # example namespace; label existing namespaces the same way
  labels:
    # "restricted" is the strictest Pod Security Standard: no privileged pods,
    # no privilege escalation, capabilities dropped, runAsNonRoot required, and
    # only safe volume types (configMap, emptyDir, projected, secret, downwardAPI,
    # persistentVolumeClaim, ephemeral, csi) are allowed.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted

Network Policies:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
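
A blanket default-deny also blocks DNS lookups from every pod in the namespace, so it is normally paired with a narrowly scoped allow rule. A sketch that permits egress to the cluster DNS service, assuming the standard k8s-app=kube-dns label used by CoreDNS:

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
EOF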

Performance Optimization

Kernel Parameters for High-Traffic Servers:

# /etc/sysctl.conf
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.core.somaxconn=65535
net.ipv4.tcp_max_syn_backlog=65535
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_fin_timeout=30
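
These settings only take effect once they are reloaded; a quick way to apply and spot-check them:

# Reload sysctl settings from /etc/sysctl.conf and /etc/sysctl.d/
sudo sysctl --system

# Verify a couple of the values landed
sysctl net.core.somaxconn net.ipv4.tcp_syncookies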

Docker Daemon Optimization (the deprecated overlay2.override_kernel_check storage option has been removed from recent Docker Engine releases and should not be set on 24.0+):

// /etc/docker/daemon.json
{
  "live-restore": true,
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 65535,
      "Soft": 65535
    }
  }
}
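
After editing daemon.json, validate the JSON and restart the daemon; with live-restore enabled, running containers stay up across the restart:

# Syntax-check the file, then apply it
python3 -m json.tool /etc/docker/daemon.json
sudo systemctl restart docker

# Confirm the daemon picked up the settings (logging driver and storage driver)
docker info --format '{{.LoggingDriver}} {{.Driver}}'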

Hybrid Monitoring Setup

Combining self-hosted Prometheus scraping with cloud instance discovery (EC2 service discovery in this example):

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'selfhosted-nodes'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']

  - job_name: 'aws-ec2-instances'
    ec2_sd_configs:
      - region: us-west-2
        # If access_key/secret_key are omitted, the default AWS credential chain is used
        port: 9100
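
Before reloading Prometheus, lint the configuration with the promtool binary that ships alongside it:

# Validate the configuration, then reload without a full restart
promtool check config /etc/prometheus/prometheus.yml
curl -X POST http://localhost:9090/-/reload   # requires --web.enable-lifecycle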

Usage & Operations

Daily Maintenance Checklist

  1. Storage Monitoring:
    
    df -h / /var/lib/docker
    docker system df
    kubectl describe pvc
    
  2. Log Review:
    
    journalctl --since "24 hours ago" -u docker
    kubectl logs -l app=nginx --since=1h
    
  3. Backup Verification:
    
    restic -r /backups check
    velero backup get
    
  4. Security Updates:
    
    apt list --upgradable
    kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(.spec.containers[].image | test(":[0-9]+\\.")) | .metadata.name'
    
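These checks are easy to wrap in one script and schedule from cron, so the daily review becomes a report in your inbox rather than a manual routine. A minimal sketch, assuming a local MTA and a placeholder recipient address:

#!/usr/bin/env bash
# daily-checks.sh - run the routine checks above and mail the output
# crontab example: 0 6 * * * /usr/local/bin/daily-checks.sh
set -euo pipefail

{
  echo "== Disk and Docker storage =="
  df -h / /var/lib/docker
  docker system df

  echo "== Docker service errors (last 24h) =="
  journalctl --since "24 hours ago" -u docker -p err --no-pager

  echo "== Backup repository check =="
  restic -r /backups check

  echo "== Pending security updates =="
  apt list --upgradable 2>/dev/null
} | mail -s "Daily infrastructure report: $(hostname)" ops@example.com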

Hybrid Scaling Patterns

Burst to Cloud During Traffic Spikes:

# Terraform autoscaling policy
# Simple scaling: add 4 cloud instances per scaling event (triggered by a CloudWatch alarm, not shown)
resource "aws_autoscaling_policy" "burst_policy" {
  name                   = "onprem_burst"
  scaling_adjustment     = 4
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.burst_group.name
}

# Keep the on-prem Deployment between 3 and 10 replicas, targeting 80% CPU utilization
resource "kubernetes_horizontal_pod_autoscaler" "onprem_hpa" {
  metadata {
    name = "onprem-autoscaler"
  }
  spec {
    scale_target_ref {
      kind = "Deployment"
      name = "frontend"
    }
    min_replicas = 3
    max_replicas = 10
    target_cpu_utilization_percentage = 80
  }
}
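
Applying this is the usual Terraform workflow; reviewing the plan is especially important here because it touches both the cloud ASG and the on-prem cluster:

terraform init
terraform plan -out=burst.tfplan
terraform apply burst.tfplan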

Troubleshooting

Common Issues Matrix

| Symptom | Self-Hosted Likely Cause | Cloud Service Likely Cause |
|---------|--------------------------|-----------------------------|
| Intermittent connectivity | NIC bonding misconfiguration | Security group rules |
| DNS resolution failures | Local resolver issues | Route53 private zone config |
| Storage performance drops | Disk failure in RAID array | EBS volume throughput limits |
| Authentication failures | LDAP/AD sync issues | IAM role misconfiguration |
| Certificate errors | Let's Encrypt renewal failure | ACM certificate |
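
For the self-hosted column, a handful of standard commands usually narrows the cause down quickly (interface, hostname, and certificate names below are placeholders):

# Intermittent connectivity: check bond state and link partners
cat /proc/net/bonding/bond0

# DNS resolution failures: compare the local resolver with a public one
resolvectl status
dig @1.1.1.1 example.com +short

# Storage performance drops: look for a degraded md array and saturated disks
cat /proc/mdstat
iostat -x 5 3

# Certificate errors: check expiry dates on the served certificate
echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -dates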