Is This Normal Or Something Is Wrong With Me: The DevOps Homelab Reality Check
Introduction
That moment when you stare at your infrastructure setup - cables snaking across floors, servers blinking ominously in dark corners, monitors displaying cryptic terminal outputs - and wonder: “Is this normal, or should I seek professional help?” You’re not alone. This existential question haunts every DevOps engineer and sysadmin who’s ever built a homelab or managed infrastructure at scale.
In the world of self-hosted environments and infrastructure management, the line between professional necessity and obsessive hoarding blurs faster than a kernel panic. The Reddit thread that inspired this post perfectly captures our collective anxiety - comments ranging from “there’s still visible floor” to “you might be a hoarder” reflect the unspoken tension in our community.
This comprehensive guide examines the realities of infrastructure management through the lens of professional DevOps practices. We’ll explore:
- The psychology of infrastructure accumulation
- Objective metrics for evaluating your setup
- Optimization strategies for homelabs and production environments
- When “normal” becomes technical debt
- Sustainable approaches to infrastructure growth
Whether you’re managing a Raspberry Pi cluster in your basement or enterprise Kubernetes deployments, you’ll learn to distinguish between healthy infrastructure growth and problematic technical sprawl.
Understanding Infrastructure Sprawl
What Constitutes “Normal” in DevOps Environments?
In infrastructure management, “normal” is a spectrum bounded by two extremes:
Minimalist Ideal:

- 1-3 servers
- Standard monitoring stack
- Version-controlled configurations
- Documented disaster recovery plan
Common Reality:

- 8+ repurposed workstations
- Mixed-generation hardware
- Multiple hypervisors
- Ad-hoc monitoring solutions
- "Works on my lab" deployment processes
The key differentiator isn’t quantity, but manageability. As Google’s Site Reliability Engineering book notes: “The service’s management system should be uniform and not require significant manual intervention.”
The Psychology of Tech Hoarding
Why do we accumulate infrastructure? Several factors drive this behavior:
- The “Just-in-Case” Syndrome: Keeping legacy systems “in case we need them”
- Tool FOMO: Deploying every new DevOps tool that trends on Hacker News
- Skill Stockpiling: Maintaining obsolete systems to preserve niche expertise
- Monitoring Overcompensation: Implementing 5 monitoring solutions because “Prometheus might miss something”
A 2022 SysAdmin Survey revealed that 68% of professionals maintain systems they know should be decommissioned.
Technical Debt vs. Healthy Experimentation
Not all infrastructure sprawl is bad. The critical distinction lies in intentionality:
| Characteristic   | Healthy Experimentation     | Technical Debt           |
|------------------|-----------------------------|--------------------------|
| Documentation    | Comprehensive               | Non-existent             |
| Resource Usage   | Monitored and constrained   | Unchecked                |
| Clear Purpose    | Defined learning objective  | "Might need it someday"  |
| Update Frequency | Regular maintenance         | Never touched            |
| Security Posture | Properly isolated           | Exposed vulnerabilities  |
Prerequisites for Sustainable Infrastructure
Before evaluating your setup, establish these foundational elements:
Hardware Requirements
Minimum viable monitoring for any environment:
```bash
# Resource monitoring basics
sudo apt install htop iotop iftop nmon
```
Organizational Principles
Implement these constraints before adding new components:
- Naming Convention Standard: `{environment}-{function}-{number}`, e.g. prod-db-01, dev-app-03 (see the validation sketch after this list)
- Resource Budget

  ```yaml
  # inventory.yaml
  environments:
    production:
      cpu_cores: 48
      memory_gb: 256
      storage_tb: 10
    development:
      cpu_cores: 16
      memory_gb: 64
      storage_tb: 2
  ```
- Lifecycle Policy: any system unused for 90 days gets automatically decommissioned
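The naming standard is only useful if something enforces it. A minimal validation sketch in bash (the environment names and regex are assumptions; adapt them to your own convention):

```bash
#!/bin/bash
# Check hostnames against the {environment}-{function}-{number} convention
valid_name='^(prod|dev|staging)-[a-z]+-[0-9]{2}$'

for host in prod-db-01 dev-app-03 random-box; do
  if [[ "$host" =~ $valid_name ]]; then
    echo "$host: OK"
  else
    echo "$host: violates naming standard"
  fi
done
```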
Security Baseline
Every new component must meet:
```text
# Basic security checklist
- Automatic security updates enabled
- SSH key authentication only
- Firewall restricting ingress/egress
- Non-root operation
- Log aggregation configured
```
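Two of these items are easy to spot-check from a shell. A sketch assuming OpenSSH and a Debian/Ubuntu host running unattended-upgrades:

```bash
# Dump the effective sshd config and confirm password auth is off
sudo sshd -T | grep -E '^(passwordauthentication|permitrootlogin)'

# Confirm the automatic-updates service is enabled (Debian/Ubuntu)
systemctl is-enabled unattended-upgrades
```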
Installation & Setup: Building With Intent
Step 1: Infrastructure as Code Foundation
Start with version-controlled environment definition:
```bash
mkdir infrastructure && cd infrastructure
git init
touch {servers,network,storage}.tf
```
```hcl
# servers.tf
resource "proxmox_vm_qemu" "base_server" {
  count       = 3
  name        = "prod-base-${count.index}"
  target_node = "pve-primary"
  clone       = "ubuntu-2204-template"

  # Constrain resources from the start
  cores  = 4
  memory = 8192

  disk {
    size    = "50G"
    storage = "ssd-pool"
  }
}
```
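With the definitions in place, the standard Terraform workflow applies. This assumes the Proxmox provider is already configured and the ubuntu-2204-template clone source exists on the node:

```bash
terraform init               # Download the provider plugins
terraform plan -out=tfplan   # Preview the three VMs before creating anything
terraform apply tfplan       # Create exactly what the plan showed
```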
Step 2: Monitoring Implementation
Deploy a minimal observability stack:
```yaml
# docker-compose.monitoring.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:v2.47.2
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    restart: unless-stopped

  node_exporter:
    image: prom/node-exporter:v1.6.1
    pid: host
    restart: unless-stopped
```
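The compose file mounts a prometheus.yml that you still have to provide. A minimal sketch that scrapes the bundled node exporter (the target name and port match the compose service above; extend scrape_configs as you add hosts):

```bash
# Write a minimal Prometheus config next to the compose file
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["node_exporter:9100"]
EOF
```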
Step 3: Resource Constraints
Enforce boundaries through tooling:
```bash
# Create a namespace for constrained workloads
kubectl create ns constrained

# Apply default limits to every container in the namespace
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
  namespace: constrained
spec:
  limits:
    - default:
        cpu: "1"
        memory: "1Gi"
      defaultRequest:
        cpu: "100m"
        memory: "256Mi"
      type: Container
EOF
```
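To confirm the defaults are active before deploying anything into the namespace:

```bash
kubectl describe limitrange resource-limits -n constrained
```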
Configuration & Optimization
The 70% Utilization Rule
Maintain healthy resource headroom:
| Resource Type | Ideal Usage (%) | Action Threshold |
|---------------|-----------------|------------------|
| CPU           | 40-60           | >70% sustained   |
| Memory        | 60-70           | >85% sustained   |
| Disk          | 50-60           | >75%             |
| Network       | 30-40           | >60% sustained   |
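Thresholds only help if something watches them. A sketch of the sustained-CPU case as a Prometheus alerting rule, written via heredoc (the expression assumes node_exporter metrics; the file name and 15m window are arbitrary choices):

```bash
cat > alerts.yml <<'EOF'
groups:
  - name: capacity
    rules:
      - alert: CPUSustainedHigh
        # 100 minus idle percentage, averaged per instance over 5m
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 70
        for: 15m
        labels:
          severity: warning
EOF
```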
Security Hardening Checklist
For any Linux system:
```bash
# 1. Audit user accounts (system accounts with UID < 1000)
sudo awk -F: '($3 < 1000) {print $1}' /etc/passwd

# 2. List setuid binaries and review for anything unexpected
sudo find / -perm -4000 -type f -exec ls -ld {} \;

# 3. Check for unnecessary services
sudo systemctl list-unit-files --state=enabled

# 4. Validate firewall rules
sudo iptables -L -v -n

# 5. Confirm log configuration
sudo ls -l /var/log/
```
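If step 4 turns up an empty ruleset, a minimal default-deny baseline with ufw satisfies the checklist's firewall requirement (Ubuntu-flavoured; adjust the SSH port if you've moved it):

```bash
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp   # Keep SSH reachable before enabling
sudo ufw enable
```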
Storage Optimization Strategies
Combat “just one more disk” syndrome:
```bash
# Identify storage hotspots
sudo du -h --max-depth=1 / | sort -hr

# Implement automated cleanup (logs older than 30 days)
sudo find /var/log -name "*.log" -type f -mtime +30 -delete

# Set filesystem quotas (soft 50G, hard 55G for the current user)
sudo setquota -u $USER 50G 55G 0 0 /
```
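To make the cleanup genuinely automated rather than a command you remember to run, schedule it (a sketch using /etc/cron.d; the file name and schedule are arbitrary):

```bash
# Install a daily log-cleanup job that runs as root at 03:15
echo '15 3 * * * root find /var/log -name "*.log" -type f -mtime +30 -delete' | sudo tee /etc/cron.d/log-cleanup
```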
Usage & Operations: Maintaining Sanity
Daily Maintenance Routine
```text
07:00 - Review overnight alerts (critical only)
09:00 - Check backup status reports
11:00 - Validate resource utilization trends
15:00 - Security patch assessment
17:00 - Infrastructure-as-Code updates
```
Backup Verification Protocol
```bash
#!/bin/bash
# validate_backups.sh

BACKUP_DIR="/backups/daily"
RETENTION_DAYS=7

# Check retention compliance: drop anything older than the window
find "$BACKUP_DIR" -type f -mtime +"$RETENTION_DAYS" -exec rm {} \;

# Verify latest backup integrity (readable gzipped tar)
latest=$(ls -t "$BACKUP_DIR" | head -n 1)
tar -tzf "$BACKUP_DIR/$latest" >/dev/null || echo "Backup corrupt!"
```
Capacity Planning Formula
Predict when you’ll need more resources:
```python
# growth_predictor.py

current_storage = 500  # GB used today
daily_growth = 2.5     # GB/day average growth
threshold = 750        # GB expansion trigger

days_remaining = (threshold - current_storage) / daily_growth
print(f"Expand storage in {days_remaining:.1f} days")
```
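With the sample numbers above this prints `Expand storage in 100.0 days`; swap in real growth rates from your monitoring stack, since a hardcoded daily average is only a first approximation.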
Troubleshooting Common Issues
System Overload Diagnostics
```bash
# 1. Identify resource bottlenecks
dstat -tcmnd --disk-util

# 2. Check per-cgroup resource usage across the process hierarchy
systemd-cgtop

# 3. Analyze disk I/O (active processes only, accumulated totals)
iotop -oPa

# 4. Watch memory trends for leak patterns (MB units, 10 samples)
vmstat -SM 1 10

# 5. Network saturation diagnosis
nload -m -u G
```
When to Declare Infrastructure Bankruptcy
Signs you need a complete reset:
- No documentation exists for >40% of systems
- More than 3 generations of hardware present
- Critical services depend on deprecated technology
- Security patches haven’t been applied in >180 days
- You have VMs running solely to host forgotten services
Rebuild procedure:
1. Inventory essential services
2. Define migration priorities
3. Build new environment with IaC
4. Perform phased migrations
5. Enforce constraints from day one
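A starting point for step 1, assuming SSH access to each legacy host (the hostnames here are placeholders):

```bash
# Collect enabled services from each legacy host into one inventory file
for host in old-web-01 old-db-01 old-misc-01; do
  echo "== $host ==" >> service-inventory.txt
  ssh "$host" 'systemctl list-unit-files --state=enabled' >> service-inventory.txt
done
```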
Conclusion
The question "Is this normal, or is something wrong with me?" reveals deeper truths about infrastructure management. Healthy environments balance three competing demands:
- Functionality: Does it serve its intended purpose?
- Maintainability: Can we support it without heroic efforts?
- Sustainability: Does it align with available resources?
Remember the wisdom from UNIX philosophy: “Do one thing well.” Apply this to your infrastructure by regularly asking:
- What problem does this component solve?
- What would happen if we removed it?
- Does its value justify the maintenance cost?
For further learning:
- Google’s Site Reliability Engineering
- The Twelve-Factor App Methodology
- Linux Performance Analysis in 60 Seconds
Ultimately, “normal” is what lets you sleep through the night without alerts. If your setup meets business requirements while maintaining operational sanity, embrace its quirks - visible floor space optional.