
Do You Remember When Mice Had Balls


Introduction

The distinctive click-clack of cleaning a mechanical mouse’s rollers remains etched in the memory of every sysadmin who worked through the 90s. Like those physical maintenance routines, modern infrastructure management demands its own form of digital hygiene - not with alcohol swabs and compressed air, but with configuration management and observability pipelines.

In today’s ephemeral environments where containers spin up and down in seconds, establishing robust infrastructure maintenance practices has become both more critical and more complex. This guide bridges nostalgic system administration wisdom with contemporary DevOps practices, transforming “cleaning mouse balls” into actionable strategies for maintaining cloud-native systems.

You’ll learn:

  • The infrastructure hygiene parallels between physical hardware and cloud resources
  • How to implement automated maintenance workflows using modern tools
  • Configuration management patterns that prevent “digital dust” accumulation
  • Monitoring strategies that replace physical inspection of components
  • Performance optimization techniques for containerized environments

We’ll leverage open-source tools including Docker, Ansible, and Prometheus to build maintenance systems that would make any 90s sysadmin proud of their digital successor.

Understanding Infrastructure Hygiene

From Physical to Digital Maintenance

Mechanical mice required direct physical interaction - removing the ball, scraping rubber-coated rollers, and clearing gunk from optical sensors. Each component had clear failure modes:

  1. Ball traction degradation (dust accumulation)
  2. Roller encoder misalignment
  3. Physical switch wear-out

Modern infrastructure presents analogous challenges:

  • Resource leaks: Zombie containers, orphaned volumes
  • Configuration drift: Unmanaged changes to IaC definitions
  • Performance degradation: Resource contention in shared environments
  • Security vulnerabilities: Unpatched dependencies in container images

Evolution of Maintenance Paradigms

| Era   | Maintenance Approach   | Tools                       | Failure Detection  |
|-------|------------------------|-----------------------------|--------------------|
| 1990s | Physical inspection    | Screwdrivers, cleaning kits | User complaints    |
| 2000s | Scheduled scripts      | Cron jobs, batch files      | Nagios alerts      |
| 2010s | Infrastructure as Code | Ansible, Terraform          | CloudWatch metrics |
| 2020s | Declarative automation | Kubernetes operators        | AIOps correlation  |

Key Components of Modern Hygiene

  1. Immutable Infrastructure: Treating servers as disposable cattle rather than pets
  2. Declarative Configuration: Version-controlled infrastructure definitions
  3. Automated Remediation: Self-healing systems with operator patterns
  4. Observability Pipelines: Centralized metrics, logs, and traces
  5. Chaos Engineering: Proactive failure injection testing
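
Automated remediation, the third component above, often starts as nothing more exotic than a scheduled cleanup script. The sketch below is a hypothetical example (not from the original post): each cleanup action goes through a `clean` wrapper that supports a dry-run mode, so the script can be reviewed safely before it is allowed to delete anything.

```shell
#!/bin/sh
# Sketch of a minimal remediation loop (assumes the docker CLI is available
# when DRY_RUN=0). With DRY_RUN=1 (the default) it only prints the commands
# it would run, which makes it safe to review and to test.
DRY_RUN="${DRY_RUN:-1}"

clean() {
  # $1 = human-readable description, remaining args = command to run
  desc="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run ($desc): $*"
  else
    "$@"
  fi
}

clean "remove stopped containers" docker container prune --force
clean "remove dangling images"    docker image prune --force
clean "remove orphaned volumes"   docker volume prune --force
```

Running it with `DRY_RUN=0` under cron turns the same script into the "preventive cleaning" step of the hygiene routine.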

Real-World Analogy: Mouse Ball vs. Container Orchestration

| Mouse Component  | Modern Equivalent | Maintenance Strategy           |
|------------------|-------------------|--------------------------------|
| Rubber ball      | Container image   | Regular vulnerability scanning |
| X/Y-axis rollers | Cluster nodes     | Node auto-scaling groups       |
| Ball casing      | Container runtime | Runtime security hardening     |
| PS/2 connector   | Service mesh      | Network policy enforcement     |

Prerequisites

System Requirements

Minimum Hardware:

  • 2 CPU cores (x86_64 or ARMv8)
  • 4GB RAM
  • 20GB storage (SSD recommended)

Operating Systems:

  • Ubuntu 22.04 LTS
  • RHEL 9+ or compatible
  • Debian 11 (Bullseye)

Network Considerations:

  • Outbound HTTPS access for package retrieval
  • Inbound ports for management interfaces (SSH:22, Prometheus:9090)
  • Firewall rules restricting access to management interfaces
  • VLAN segmentation for production vs. management traffic

Software Dependencies

  1. Container Runtime:
    • Docker Engine 24.0+

      # Installation command for Ubuntu 22.04
      sudo apt-get install docker-ce=5:24.0.7-1~ubuntu.22.04~jammy docker-ce-cli=5:24.0.7-1~ubuntu.22.04~jammy containerd.io

  2. Configuration Management:
    • Ansible Core 2.15+

      python3 -m pip install ansible-core==2.15.6

  3. Monitoring Stack:
    • Prometheus 2.47+
    • Node Exporter 1.6+

Security Preparation

  1. Create dedicated service accounts:

    sudo useradd -r -s /sbin/nologin prometheus
    sudo useradd -r -s /sbin/nologin node_exporter

  2. Configure SSH key authentication:

    ssh-keygen -t ed25519 -f ~/.ssh/infra_hygiene

  3. Set up encrypted credential storage:

    mkdir ~/.infra-secrets && chmod 700 ~/.infra-secrets
    
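The directory created in step 3 only restricts permissions; the credentials themselves should also be encrypted at rest. One lightweight way to do that, sketched below with a scratch directory standing in for `~/.infra-secrets`, is symmetric `openssl` encryption. The hard-coded passphrase is purely for illustration; in practice it would come from a prompt or an external vault.

```shell
#!/bin/sh
# Sketch: encrypt a credential file at rest with openssl (assumption: the
# passphrase would normally NOT be hard-coded as it is here for demo purposes).
SECRETS_DIR=$(mktemp -d)      # stand-in for ~/.infra-secrets
chmod 700 "$SECRETS_DIR"

printf 'db_password=hunter2\n' > "$SECRETS_DIR/db.env"
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -pass pass:example-passphrase \
  -in "$SECRETS_DIR/db.env" -out "$SECRETS_DIR/db.env.enc"
rm "$SECRETS_DIR/db.env"      # keep only the encrypted copy

# Decrypt on demand:
openssl enc -d -aes-256-cbc -pbkdf2 \
  -pass pass:example-passphrase \
  -in "$SECRETS_DIR/db.env.enc"
```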

Pre-Installation Checklist

  1. Verify CPU virtualization support

    lscpu | grep Virtualization

  2. Confirm time synchronization

    timedatectl status | grep synchronized

  3. Validate filesystem permissions

    df -Th /var/lib/docker

  4. Test network throughput

    iperf3 -c <test_server>
    
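The checklist above can be wrapped in a small preflight runner that reports every check instead of aborting on the first failure. In this sketch the real commands (`lscpu`, `timedatectl`, `iperf3`) are stubbed with `true`/`false` so the harness itself can be shown in isolation:

```shell
#!/bin/sh
# Hypothetical preflight wrapper: runs each check, reports PASS/FAIL,
# and never aborts early. The actual checks are stubbed for illustration.
preflight() {
  name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $name"
  else
    echo "FAIL $name"
  fi
}

preflight "virtualization" true    # stand-in for: lscpu | grep -q Virtualization
preflight "time sync"      true    # stand-in for: timedatectl status | grep -q synchronized
preflight "network"        false   # stand-in for: iperf3 -c <test_server>
```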

Installation & Setup

Container Runtime Configuration

Docker Daemon Settings (/etc/docker/daemon.json):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 65536,
      "Soft": 65536
    }
  },
  "live-restore": true,
  "experimental": false
}

Key Configuration Directives:

  1. log-driver / log-opts: Caps each container's JSON logs at three 10 MB files, preventing unbounded disk consumption
  2. default-ulimits: Sets open file handle limits for all containers
  3. live-restore: Keeps containers running across daemon restarts and upgrades

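A syntax error in daemon.json prevents dockerd from starting at all, so it pays to validate the file before restarting the daemon. A minimal sketch using only python3's standard library (the path here is a scratch copy, not the live /etc/docker file):

```shell
#!/bin/sh
# Validate a daemon.json candidate before it goes anywhere near the daemon.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" },
  "live-restore": true
}
EOF

if python3 -m json.tool "$cfg" > /dev/null 2>&1; then
  verdict="valid"
else
  verdict="invalid"
fi
echo "daemon.json: $verdict"
```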
Monitoring Stack Deployment

Prometheus Docker Compose (docker-compose-monitoring.yml):

version: '3.8'

services:
  prometheus:
    image: prom/prometheus:v2.47.0
    container_name: prometheus
    user: "nobody"  # host account names aren't resolvable in-container; prom images ship "nobody"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prom_data:/prometheus
    ports:
      - "9090:9090"
    restart: unless-stopped

  node_exporter:
    image: prom/node-exporter:v1.6.1
    container_name: node_exporter
    user: "nobody"  # same reason as above
    command:
      - "--path.rootfs=/host"
    pid: "host"
    volumes:
      - /:/host:ro,rslave
    restart: unless-stopped

volumes:
  prom_data:

Prometheus Configuration (prometheus.yml):

global:
  scrape_interval: 15s
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node_exporter:9100']

  - job_name: 'docker'
    # assumes a cAdvisor container named "cadvisor" (not part of the
    # compose file above) exposing container metrics on port 8080
    static_configs:
      - targets: ['cadvisor:8080']

Verification Workflow

  1. Check container status:

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.Ports}}"

  2. Validate metrics collection:

    curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[].health'

  3. Test alert pipeline (generates sustained CPU load):

    docker run --rm -it busybox sh -c "while true; do dd if=/dev/zero of=/dev/null; done"
    
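The curl in step 2 returns JSON; a compact health summary can be computed from it as below. The API payload is stubbed inline here so the parsing can be demonstrated without a live Prometheus:

```shell
#!/bin/sh
# Summarize target health from a Prometheus /api/v1/targets response.
# (Real use: payload=$(curl -s http://localhost:9090/api/v1/targets))
payload='{"data":{"activeTargets":[{"health":"up"},{"health":"down"},{"health":"up"}]}}'
summary=$(printf '%s' "$payload" | python3 -c '
import json, sys
targets = json.load(sys.stdin)["data"]["activeTargets"]
up = sum(1 for t in targets if t["health"] == "up")
print(f"{up}/{len(targets)} targets up")
')
echo "$summary"
```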

Configuration & Optimization

Security Hardening

Container Runtime Protections:

# Run container as non-root user
docker run --user 1000:1000 nginx:alpine

# Mount the root filesystem read-only, with a writable tmpfs for /tmp
docker run --read-only --tmpfs /tmp alpine

# Create a network with no external connectivity for isolated workloads
docker network create --internal isolated_net

Linux Kernel Parameters (/etc/sysctl.d/99-hygiene.conf):

# Hide kernel pointers and block unprivileged BPF
kernel.kptr_restrict=2
kernel.unprivileged_bpf_disabled=1

# Harden network stack
net.ipv4.conf.all.log_martians=1
net.ipv4.icmp_echo_ignore_broadcasts=1

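Writing the file does not apply it (that takes `sysctl --system`), so a drift check against the running kernel is a useful companion. A sketch, with the fragment written to a scratch path; keys that cannot be read (for example inside an unprivileged container) simply report as DRIFT:

```shell
#!/bin/sh
# Compare each desired sysctl value against the running kernel.
conf=$(mktemp)
cat > "$conf" <<'EOF'
kernel.kptr_restrict=2
net.ipv4.conf.all.log_martians=1
EOF

report=$(while IFS='=' read -r key want; do
  case "$key" in ''|'#'*) continue ;; esac
  have=$(sysctl -n "$key" 2>/dev/null || echo "unreadable")
  if [ "$have" = "$want" ]; then
    echo "OK    $key=$want"
  else
    echo "DRIFT $key want=$want have=$have"
  fi
done < "$conf")
echo "$report"
```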
Performance Optimization

Resource Constraints:

# docker-compose-resources.yml
services:
  webapp:
    image: nginx:alpine
    deploy:
      resources:
        limits:
          cpus: '1.5'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
    ulimits:
      nproc: 65535
      nofile:
        soft: 20000
        hard: 40000

Filesystem Tuning:

# Mount SSD with optimal options
mkfs.xfs -f /dev/sdb1
mount -o noatime,nodiratime,discard /dev/sdb1 /var/lib/docker

Observability Integration

Prometheus Alert Rules (alerts.yml):

groups:
- name: hygiene_alerts
  rules:
  - alert: HighMemoryUsage
    expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Host memory exhausted (instance {{ $labels.instance }})"
      description: "Available memory is below 10%"

  - alert: UnhealthyContainer
    # requires cAdvisor; fires when a container hasn't reported for 5 minutes
    expr: time() - container_last_seen{name=~".+"} > 300
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Container not reporting ({{ $labels.name }})"

Usage & Operations

Daily Maintenance Commands

Container Hygiene:

# Remove stopped containers older than 24h
docker container prune --filter "until=24h" --force

# Clean unused images
docker image prune --all --filter "until=168h" --force

# Inspect container resource usage
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

Filesystem Monitoring:

# Find largest container logs
find /var/lib/docker/containers/ -name "*.log" -exec du -sh {} + | sort -rh | head -n 10

# Analyze storage drivers
docker system df -v
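
The "largest logs" pipeline above can be exercised against a throwaway directory tree instead of /var/lib/docker, which also makes its behavior easy to verify:

```shell
#!/bin/sh
# Build a scratch tree with one large and one small log, then find the largest.
root=$(mktemp -d)
mkdir -p "$root/a" "$root/b"
dd if=/dev/zero of="$root/a/big.log"   bs=1024 count=64 2>/dev/null
dd if=/dev/zero of="$root/b/small.log" bs=1024 count=8  2>/dev/null

biggest=$(find "$root" -name '*.log' -exec du -k {} + | sort -rn | head -n 1 | awk '{print $2}')
echo "largest log: $biggest"
```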

Backup Strategies

Volume Backup Procedure:

# Create snapshot of named volume
docker run --rm -v db_data:/volume -v /backups:/backup alpine \
  tar czf /backup/db_data_$(date +%Y%m%d).tar.gz -C /volume ./

# Restore from backup
docker run --rm -v db_data:/restore -v /backups:/backup alpine \
  sh -c "rm -rf /restore/* && tar xzf /backup/db_data_20240101.tar.gz -C /restore"

Cron Job for Regular Backups:

0 2 * * * docker run --rm -v db_data:/volume -v /backups:/backup alpine tar czf /backup/db_data_$(date +\%Y\%m\%d).tar.gz -C /volume ./
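
The cron job above accumulates archives forever, so a retention pass is a natural companion (the 7-day cutoff is an assumption, not from the original procedure). Demonstrated here against a temp directory rather than /backups:

```shell
#!/bin/sh
# Prune backup archives older than 7 days.
backups=$(mktemp -d)                                       # stand-in for /backups
touch -d '10 days ago' "$backups/db_data_20240101.tar.gz"  # stale archive
touch "$backups/db_data_today.tar.gz"                      # fresh archive

find "$backups" -name 'db_data_*.tar.gz' -mtime +7 -delete
ls "$backups"
```

A second cron entry running this nightly keeps the backup volume bounded.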

Scaling Patterns

Horizontal Scaling with Docker Swarm:

# Create a replicated service with resource limits
docker service create --name web --replicas 3 \
  --limit-cpu 0.5 --limit-memory 256M \
  --restart-condition any \
  nginx:alpine

# Scale manually (Swarm has no built-in CPU-based autoscaler; pair it with
# an external controller, or use Kubernetes HPA for metric-driven scaling)
docker service scale web=10

Troubleshooting

Common Issues and Solutions

1. Container Failing to Start

# Check logs from the previous run
docker logs --tail 50 <container_name>

# Verify image integrity via its registry digest
docker inspect --format='{{index .RepoDigests 0}}' <image>

2. High CPU/Memory Usage

# Identify the busiest process inside the container
docker exec <container_name> top -bn1

# Profile the container's main process with perf from the host PID namespace
pid=$(docker inspect --format '{{.State.Pid}}' <container_name>)
docker run --rm --privileged --pid=host alpine sh -c \
  "apk add --no-cache perf && perf top -p $pid"

3. Network Connectivity Issues

# Test container DNS resolution
docker run --rm busybox nslookup google.com

# Inspect iptables rules
sudo iptables -L DOCKER-USER -v

Diagnostic Commands

System Inspection:

# Capture a system-level report (daemon config, storage driver, resource totals)
docker info > system_report.txt && docker system df -v >> system_report.txt

# Analyze container performance
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}\t{{.BlockIO}}"

Log Investigation:

# Follow logs across service replicas
docker service logs -f --since 5m --raw web | grep -i error

# Export logs (stdout and stderr) for analysis
docker logs <container_name> > container.log 2>&1

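The grep step can be sanity-checked against a stub log file before pointing it at production output:

```shell
#!/bin/sh
# Count error lines in a stub log, mirroring the grep used on live services.
log=$(mktemp)
cat > "$log" <<'EOF'
2024-01-01T00:00:00Z INFO  service started
2024-01-01T00:00:01Z ERROR connection refused
2024-01-01T00:00:02Z INFO  heartbeat ok
EOF

errors=$(grep -ci error "$log")
echo "$errors error line(s)"
```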
Conclusion

The discipline of keeping mechanical mice functional through regular cleaning finds its modern counterpart in systematic infrastructure hygiene practices. Where we once scraped rubber rollers clean of accumulated grime, we now automate configuration drift remediation and resource leak detection.

Key maintenance parallels:

  • Physical inspection → Continuous monitoring
  • Component replacement → Immutable infrastructure
  • Preventive cleaning → Automated vulnerability scanning
  • Performance degradation → Resource usage alerts

To deepen your infrastructure hygiene practice:

  1. Implement scheduled reconciliation jobs using Ansible
  2. Study container security fundamentals with Docker Bench
  3. Explore advanced monitoring with Prometheus Operator

The essence of system administration remains unchanged: vigilant maintenance prevents catastrophic failures. Only the tools have evolved - from screwdrivers to kubectl, from cleaning kits to CI/CD pipelines. Our mission endures: keep the systems running smoothly, whether they track mouse balls or Kubernetes pods.

This post is licensed under CC BY 4.0 by the author.