My Husband Works In IT And He Knows Way More Than You Do: Infrastructure Management in Modern DevOps

Introduction

The infamous phrase “My husband works in IT and knows way more than you do” echoes through helpdesk logs and sysadmin war stories like a bad joke with serious consequences. This attitude – whether from end-users, stakeholders, or even colleagues – represents a fundamental challenge in infrastructure management: the disconnect between perceived technical competence and actual systems knowledge.

The Reddit thread “Umm, I’m Gen Z. I know how to use computers” perfectly encapsulates this modern paradox. While younger generations grow up with intuitive UIs and cloud services, foundational infrastructure knowledge – the kind that kept a hotel’s bonded T1 lines and corporate Exchange servers running 15 years ago – remains critical. Today’s DevOps engineers face similar challenges when managing distributed systems, container orchestration, and hybrid cloud environments.

This comprehensive guide examines infrastructure management through the lens of modern DevOps practices, covering:

  1. Infrastructure-as-Code (IaC) implementation
  2. Container orchestration best practices
  3. Network performance optimization
  4. Legacy system modernization strategies
  5. Cross-generational knowledge transfer techniques

Whether you’re managing a homelab Kubernetes cluster or enterprise-grade cloud infrastructure, these battle-tested approaches will help you avoid becoming the subject of someone else’s “my spouse knows better” horror story.

Understanding Modern Infrastructure Management

Evolution from Sysadmin to DevOps

The shift from traditional system administration to DevOps represents more than just a title change. Consider these key differences:

| Traditional Sysadmin | Modern DevOps |
|---|---|
| Manual server provisioning | Infrastructure-as-Code |
| Static environments | Ephemeral containers |
| Physical hardware focus | Cloud-native architectures |
| Reactive troubleshooting | Proactive monitoring |
| Siloed responsibilities | Cross-functional collaboration |

This transition began in earnest with tools like Puppet (2005) and Chef (2009), accelerating with Docker’s release in 2013. The bonded T1 lines from our hotel story would now be replaced by SD-WAN solutions, while on-prem Exchange servers have largely migrated to cloud-based services like Microsoft 365.

Core Components of Modern Infrastructure

  1. Orchestration Systems: Kubernetes, Docker Swarm, Nomad
  2. Configuration Management and Provisioning: Ansible, Terraform, Pulumi
  3. Monitoring Stack: Prometheus, Grafana, ELK
  4. Networking: Service meshes (Istio, Linkerd), API gateways
  5. Security: Zero-trust architectures, SPIFFE/SPIRE

The Knowledge Gap Challenge

The original hotel scenario highlights three persistent infrastructure challenges:

  1. Bandwidth Management: From T1 lines to 5G/WiFi6
  2. Centralized Services: Exchange → Office 365 → Hybrid models
  3. User Expectations: “Always-on” mentality vs physical constraints

Modern solutions address these through:

# Example: Kubernetes Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: bandwidth-limit
spec:
  podSelector:
    matchLabels:
      app: exchange
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: outlook
    ports:
    - protocol: TCP
      port: 443

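Despite its name, this policy does not cap bandwidth: a NetworkPolicy only controls which connections are allowed. It restricts ingress to pods labelled app: exchange so that only pods labelled app: outlook can reach them on TCP 443, mimicking the controlled access of legacy systems with modern tooling. It takes effect only when the cluster's CNI plugin enforces NetworkPolicy.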

Prerequisites for Modern Infrastructure

Hardware Requirements

While cloud providers abstract hardware concerns, on-prem/homelab setups require careful planning:

| Component | Minimum | Recommended | Enterprise |
|---|---|---|---|
| CPU | 4 cores | 8 cores | 16+ cores |
| RAM | 8GB | 32GB | 128GB+ |
| Storage | 100GB | 1TB NVMe | SAN/NAS |
| Network | 1GbE | 10GbE | 25/100GbE |
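
A quick preflight check can confirm that a node clears the Minimum column before anything is installed. The sketch below is illustrative only: the function name `meets_minimum` is hypothetical, and the thresholds passed to it are simply the 4-core / 8GB figures from the table.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a node has at least the required
# CPU cores and RAM (in GB). All four arguments are plain integers.
meets_minimum() {
  have_cores=$1; have_gb=$2; need_cores=$3; need_gb=$4
  [ "$have_cores" -ge "$need_cores" ] && [ "$have_gb" -ge "$need_gb" ]
}

# Example: an 8-core, 32 GB node against the 4-core / 8 GB minimum.
if meets_minimum 8 32 4 8; then
  echo "node meets minimum"
else
  echo "node below minimum"
fi
```

On Linux, `nproc` and the MemTotal line of /proc/meminfo can supply the first two arguments. Keep in mind that a machine sold as 8GB usually reports slightly less usable memory, so rounding down may make it fail an exact 8 GB threshold.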

Software Requirements

  1. Operating System: RHEL 8+/Ubuntu 20.04 LTS+
  2. Container Runtime: containerd 1.6+, Docker Engine 24.0+
  3. Orchestration: Kubernetes 1.27+, Nomad 1.5+
  4. Automation: Ansible Core 2.14+, Terraform 1.5+

Security Preconfiguration

Before installation:

  1. Configure SSH key authentication:
    
    ssh-keygen -t ed25519 -C "admin@example.com"
    ssh-copy-id -i ~/.ssh/id_ed25519.pub user@server
    
  2. Harden SSH configuration (/etc/ssh/sshd_config):
    
    PermitRootLogin no
    PasswordAuthentication no
    MaxAuthTries 3
    ClientAliveInterval 300
    
  3. Set up basic firewall rules:
    
    sudo ufw default deny incoming
    sudo ufw allow 22/tcp
    sudo ufw allow 6443/tcp  # Kubernetes API
    sudo ufw enable
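
Before restarting sshd, it can help to confirm that the hardened values are actually present in the file. The following is a minimal sketch, not a full parser: `sshd_setting_is` is a hypothetical helper, it runs against a throwaway sample file, and it ignores `Include` and `Match` blocks.

```shell
#!/bin/sh
# Hypothetical helper: report whether a config file sets KEY to VALUE.
# Only the first uncommented occurrence counts, since sshd uses the
# first value it reads for each keyword. Keywords are case-insensitive.
sshd_setting_is() {
  file=$1; key=$2; want=$3
  got=$(awk -v k="$key" 'tolower($1) == tolower(k) { print $2; exit }' "$file")
  [ "$got" = "$want" ]
}

# Demonstrate against a throwaway copy rather than the live config.
sample=$(mktemp)
printf '%s\n' '# comment' 'PermitRootLogin no' 'PasswordAuthentication no' \
  'MaxAuthTries 3' 'ClientAliveInterval 300' > "$sample"

for pair in "PermitRootLogin no" "PasswordAuthentication no" "MaxAuthTries 3"; do
  set -- $pair
  if sshd_setting_is "$sample" "$1" "$2"; then
    echo "ok:   $1 $2"
  else
    echo "FAIL: $1 is not $2"
  fi
done
rm -f "$sample"
```

For the effective configuration on a real host, `sudo sshd -T` prints what the daemon will actually use and is the more authoritative check.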
    

Installation & Setup: Kubernetes Cluster

Base System Configuration

  1. Update all packages:
    
    sudo apt update && sudo apt upgrade -y
    
  2. Install container runtime:
    
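    sudo apt install -y containerd runc
    sudo mkdir -p /etc/containerd
    sudo containerd config default | sudo tee /etc/containerd/config.toml
    # kubeadm defaults the kubelet to the systemd cgroup driver, so containerd must match
    sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
    sudo systemctl restart containerd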
    
  3. Disable swap:
    
    sudo swapoff -a
    sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
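
The sed expression in the swap step is easy to get wrong, so it is worth trying on a copy first. This sketch applies the same substitution to a sample file and counts the swap entries before and after; `active_swap_entries` and the sample fstab lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count fstab entries that still enable swap,
# ignoring lines that are already commented out.
active_swap_entries() {
  awk '$1 !~ /^#/ && $3 == "swap" { n++ } END { print n + 0 }' "$1"
}

# Work on throwaway files so the real /etc/fstab is never touched.
sample=$(mktemp)
printf '%s\n' \
  'UUID=aaaa / ext4 defaults 0 1' \
  '/swap.img none swap sw 0 0' > "$sample"

echo "before: $(active_swap_entries "$sample")"
# Same substitution as the swap step, written to a second file
# instead of editing in place.
sed '/ swap / s/^\(.*\)$/#\1/g' "$sample" > "$sample.out"
echo "after:  $(active_swap_entries "$sample.out")"
rm -f "$sample" "$sample.out"
```

On the node itself, `swapon --show` printing nothing is a simple confirmation that no swap is active after `swapoff -a`.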
    

Kubernetes Installation

  1. Add repository:
    
    sudo apt-get install -y apt-transport-https ca-certificates curl gpg
    sudo mkdir -p -m 755 /etc/apt/keyrings  # may not exist on older releases
    curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
    echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
    
  2. Install components:
    
    sudo apt-get update
    sudo apt-get install -y kubelet=1.28.5-1.1 kubeadm=1.28.5-1.1 kubectl=1.28.5-1.1
    sudo apt-mark hold kubelet kubeadm kubectl
    
  3. Initialize control plane:
    
    sudo kubeadm init --pod-network-cidr=10.244.0.0/16 \
      --apiserver-advertise-address=192.168.1.100 \
      --control-plane-endpoint=cluster.example.com:6443
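
After `kubeadm init` succeeds, `kubectl` still needs a kubeconfig before the commands in the following sections will work; kubeadm prints instructions for copying /etc/kubernetes/admin.conf into ~/.kube/config. The sketch below wraps that copy in a small function. The name `install_kubeconfig` is hypothetical, and it assumes the caller can already read the source file.

```shell
#!/bin/sh
# Hypothetical helper: copy a kubeconfig into DEST_DIR/config with
# owner-only permissions. SRC is typically /etc/kubernetes/admin.conf.
install_kubeconfig() {
  src=$1; dest_dir=$2
  [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
  mkdir -p "$dest_dir" &&
    cp "$src" "$dest_dir/config" &&
    chmod 600 "$dest_dir/config"
}

# On the control-plane node (admin.conf is root-owned, so the file
# must be readable by the calling user first):
#   install_kubeconfig /etc/kubernetes/admin.conf "$HOME/.kube"
```

The equivalent that kubeadm itself suggests is `sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config` followed by `sudo chown $(id -u):$(id -g) $HOME/.kube/config`.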
    

Network Plugin Setup

Install Calico CNI:

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml

Verify node status:

kubectl get nodes -o wide

Expected output:

NAME       STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
master01   Ready    control-plane   5m    v1.28.5   192.168.1.100   <none>        Ubuntu 22.04.3 LTS   5.15.0-78-generic   containerd://1.6.21
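
On a cluster with more than one node, scanning the STATUS column by eye gets tedious. A small filter can list only the nodes that need attention. This is a sketch under assumptions: `not_ready_nodes` is a hypothetical helper, and `worker01` is an invented node used only as sample input.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get nodes` output on stdin and
# print the name of every node whose STATUS column is not exactly
# "Ready". Cordoned nodes ("Ready,SchedulingDisabled") are listed too.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Sample input; worker01 is invented for illustration.
printf '%s\n' \
  'NAME       STATUS     ROLES           AGE   VERSION' \
  'master01   Ready      control-plane   5m    v1.28.5' \
  'worker01   NotReady   <none>          1m    v1.28.5' | not_ready_nodes
```

Against a live cluster the same filter would be used as `kubectl get nodes | not_ready_nodes`. Nodes commonly stay NotReady until the CNI plugin pods are running.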

Configuration & Optimization

Cluster Configuration Best Practices

  1. Resource Quotas: Prevent namespace resource exhaustion
    
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-resources
    spec:
      hard:
        requests.cpu: "8"
        requests.memory: 16Gi
        limits.cpu: "16"
        limits.memory: 32Gi
    
  2. Pod Security Standards:
    
    apiVersion: apiserver.config.k8s.io/v1
    kind: AdmissionConfiguration
    plugins:
    - name: PodSecurity
      configuration:
        apiVersion: pod-security.admission.config.k8s.io/v1
        kind: PodSecurityConfiguration
        defaults:
          enforce: "restricted"
          enforce-version: "latest"
    

Performance Optimization

  1. Kubelet Configuration (/var/lib/kubelet/config.yaml):
    
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    systemReserved:
      cpu: 500m
      memory: 1Gi
    evictionHard:
      memory.available: "500Mi"
      nodefs.available: "10%"
    
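  2. ETCD Tuning (for etcd running as a systemd service; kubeadm instead runs etcd as a static pod defined in /etc/kubernetes/manifests/etcd.yaml):
    
    sudo systemctl edit etcd.service
    
    # Contents of the override file:
    [Service]
    Environment="ETCD_HEARTBEAT_INTERVAL=100"
    Environment="ETCD_ELECTION_TIMEOUT=500"
    Environment="ETCD_SNAPSHOT_COUNT=10000"
    
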
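The heartbeat interval and election timeout are not independent. To the best of our knowledge, etcd refuses to start when the election timeout is less than five times the heartbeat interval, and its tuning guidance suggests sizing the election timeout at roughly ten times the round-trip time between members. The sketch below checks only the 5x ratio; `etcd_timing_ok` is a hypothetical helper and the 5x floor is an assumption to verify against the etcd version in use.

```shell
#!/bin/sh
# Hypothetical helper: succeed when ELECTION_MS is at least five times
# HEARTBEAT_MS. Both arguments are integers in milliseconds.
etcd_timing_ok() {
  heartbeat_ms=$1; election_ms=$2
  [ "$election_ms" -ge $((heartbeat_ms * 5)) ]
}

# The values from the etcd settings above: 100 ms heartbeat, 500 ms election.
if etcd_timing_ok 100 500; then
  echo "timing ratio acceptable"
else
  echo "election timeout too short for this heartbeat"
fi
```

The 100/500 pair sits exactly on that floor, so raising the heartbeat for a slow network means raising the election timeout with it.
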
Security Hardening

  1. Enable Pod Security Admission:
    
    kubectl label --overwrite ns default \
      pod-security.kubernetes.io/enforce=baseline \
      pod-security.kubernetes.io/warn=restricted
    
  2. Network Policy Enforcement:
    
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny
    spec:
      podSelector: {}
      policyTypes:
      - Ingress
      - Egress
    

Usage & Operations

Daily Management Tasks

  1. Cluster status check:
    
    kubectl get --raw='/readyz?verbose'  # componentstatuses is deprecated since v1.19
    kubectl top nodes
    
  2. Pod management:
    
    # List pods with extended information
    kubectl get pods -o wide --sort-by='.status.startTime'
    
    # Debugging pod issues
    kubectl describe pod $pod_name
    kubectl logs $pod_name --previous
    

Backup Procedures

  1. ETCD snapshot:
    
    sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      snapshot save /backup/etcd-snapshot-$(date +%Y%m%d).db
    
  2. Resource manifests backup:
    
    kubectl get all --all-namespaces -o yaml > cluster-state-$(date +%Y%m%d).yaml
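
Dated snapshots accumulate quickly if nothing removes the old ones. A retention step along these lines could run after each backup. This is a sketch: `prune_snapshots` is a hypothetical helper, the 7-day window is an arbitrary example, and the file pattern assumes the etcd-snapshot-*.db naming used for the snapshot.

```shell
#!/bin/sh
# Hypothetical helper: delete etcd snapshot files in DIR that are older
# than DAYS days. find rounds file age down to whole days, so "+7"
# matches files last modified eight or more days ago.
prune_snapshots() {
  dir=$1; days=$2
  find "$dir" -type f -name 'etcd-snapshot-*.db' -mtime +"$days" -exec rm -f {} +
}

# Example (assumes the /backup directory and a 7-day window):
#   prune_snapshots /backup 7
```

Only files matching the snapshot pattern are touched, so other files in the same directory are left alone. Note also that `kubectl get all` omits ConfigMaps, Secrets, and custom resources, so the manifest dump complements the etcd snapshot rather than replacing it.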
    

Monitoring Setup

Prometheus installation via Helm:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set alertmanager.enabled=false \
  --set grafana.enabled=true

Access Grafana dashboard:

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

Troubleshooting Common Issues

Network Connectivity Problems

  1. Check CNI plugin status:
    
    kubectl get pods -n kube-system -l k8s-app=calico-node
    
  2. Validate DNS resolution:
    
    kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never \
      -- nslookup kubernetes.default
    

Resource Constraints

  1. Identify resource hogs:
    
    kubectl top pods --sort-by=memory --containers
    
  2. Analyze pod evictions:
    
    kubectl get events --field-selector=reason=Evicted
    

Persistent Volume Issues

  1. Check storage class:
    
    kubectl get storageclass
    
  2. Verify volume attachments:
    
    kubectl describe pvc $pvc_name
    kubectl describe pv $pv_name
    

Conclusion

The “my husband knows IT” mentality fails precisely where modern infrastructure management succeeds: in recognizing that true expertise requires continuous learning and validation through hands-on practice. The hotel’s bonded T1 lines have given way to SD-WAN and fiber, and on-prem Exchange servers have largely moved to cloud services like Microsoft 365, but the core challenge remains: managing complex systems while bridging knowledge gaps between generations of technologists.

Key takeaways from our deep dive:

  1. Infrastructure-as-Code eliminates configuration drift and “works on my machine” scenarios
  2. Container orchestration enables reproducible environments across generations
  3. Modern monitoring provides visibility that legacy systems lacked
  4. Security must be baked in, not bolted on

To continue your infrastructure management journey:

  1. Kubernetes Official Documentation
  2. Terraform Best Practices
  3. Linux Foundation Sysadmin Guide

The next time someone claims superior knowledge through association, remember: real expertise comes from understanding the systems, not just using them. Now go deploy something properly documented.

This post is licensed under CC BY 4.0 by the author.