First Time Not Playing The Hero Feels Good

Introduction

Walking into a homelab or a self‑hosted environment and hearing the familiar phrase “We need you to fix this now” is a scenario many seasoned engineers recognize all too well. Yet there is a growing cadre of practitioners who, for the first time, experience the quiet satisfaction of not being the hero who scrambles at the last minute. This feeling isn’t about ego; it’s about establishing a resilient, automated, and predictable infrastructure that lets you step back from constant firefighting and focus on strategic growth.

In the world of DevOps, the transition from reactive heroics to proactive stewardship is often marked by a series of deliberate choices: robust monitoring, automated ticket routing, disciplined onboarding, and clear ownership boundaries. This guide unpacks those choices, offering a step‑by‑step blueprint for building a system where the “hero” role becomes optional rather than inevitable.

You will learn:

  • How to shift from ad‑hoc incident response to systematic, repeatable processes.
  • Which open‑source tools and patterns best support a self‑hosted homelab.
  • Concrete Docker‑based installations that avoid the pitfalls of placeholder syntax that conflicts with Jekyll Liquid templating.
  • Strategies for securing, optimizing, and scaling your stack without introducing hidden technical debt.
  • Practical troubleshooting techniques that keep the system reliable when it matters most.

By the end of this guide, you’ll have a clear roadmap for creating an environment where the first time you don’t get tapped on the shoulder during an office celebration is not a coincidence, but a design decision.


Understanding the Topic

What Does “Not Playing The Hero” Mean in DevOps?

In traditional IT and even modern DevOps narratives, the “hero” is the engineer who rushes in at 2 a.m. to resolve a critical outage, often bypassing standard procedures to restore service. While heroic efforts can be commendable, they are also symptomatic of systemic weaknesses: missing alerts, opaque configuration, or manual processes that lack documentation.

The phrase “First Time Not Playing The Hero Feels Good” captures the psychological shift an engineer experiences when a system no longer requires last‑minute heroics. It signals:

  1. Predictability – Alerts fire before issues become crises.
  2. Ownership – Clear escalation paths and documented runbooks exist.
  3. Automation – Repetitive tasks are handled by code, not by human intervention.

Historical Context

The concept of “hero culture” in operations dates back to the early days of mainframe support, where a single operator could keep an entire data center running. As cloud computing and containerization matured, the industry gravitated toward infrastructure as code (IaC) and observability. However, many on‑premise homelabs still cling to manual ticketing and ad‑hoc scripts, perpetuating the hero cycle.

Key Features and Capabilities

  • Self‑Hosted Ticketing & Incident Management – Tools like TheHive, Cortex, and OSS‑based ticketing platforms can be containerized and run locally, giving you full control over data and workflow customization.
  • Observability Stack – Prometheus, Grafana, and Alertmanager provide metrics and alerting that surface problems before they explode.
  • Automated Onboarding – Using Ansible or Bash scripts to provision users, grant permissions, and enforce security policies reduces the chance of “last‑minute” access requests.
  • Policy‑Driven Access Control – Integrating with LDAP or OAuth2 providers ensures that only authorized personnel can trigger critical actions.
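
As a rough illustration of the automated onboarding item, the sketch below prints the provisioning commands for a new user instead of running them, so the plan can be reviewed first. The function name onboard_plan and the example group are assumptions made for this sketch, not part of any tool named above.

```shell
#!/bin/sh
# Dry-run onboarding sketch: prints the commands instead of running them.
# onboard_plan is a hypothetical helper; group names are examples only.

# Print the provisioning commands for one user.
# Usage: onboard_plan USERNAME [GROUP...]
onboard_plan() {
    user=$1
    shift
    # Reject empty names, names starting with "-", and names containing
    # anything other than a-z, 0-9, "_" and "-".
    case $user in
        ''|-*|*[!a-z0-9_-]*)
            echo "invalid username: $user" >&2
            return 1
            ;;
    esac
    echo "useradd --create-home --shell /bin/bash $user"
    for group in "$@"; do
        echo "usermod --append --groups $group $user"
    done
}
```

Calling onboard_plan alice docker prints one useradd line and one usermod line. Once the output looks right, it can be piped to a root shell or translated into an Ansible task.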

Pros and Cons

| Advantage | Description |
|---|---|
| Reduced Burnout | Fewer emergency calls mean lower stress levels. |
| Higher Reliability | Automated checks catch drift early. |
| Scalable Growth | New services can be added without re‑inventing the wheel. |
| Clear Accountability | Roles and responsibilities are documented. |

| Drawback | Mitigation |
|---|---|
| Initial Investment | Time spent designing automation pays off over time. |
| Learning Curve | New tools require familiarization. |
| Complexity | Over‑engineering can introduce unnecessary moving parts. |

Use Cases and Scenarios

  • Home Lab with Multiple Services – A developer runs a personal cloud stack (Nextcloud, Plex, Home Assistant) and wants alerts when storage exceeds 80 % or when a container crashes.
  • Small Office Server – A sysadmin manages a mail and file server, needing a ticketing workflow for hardware replacement requests.
  • Community Open‑Source Project – Maintainers want a centralized issue tracker that integrates with CI/CD pipelines for automated testing.
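
The storage alert in the first scenario can be approximated with a few lines of shell before a full alerting stack is in place. This is a minimal sketch that assumes POSIX df -P output and mount points without spaces; over_threshold is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: flag filesystems in `df -P` output that exceed a usage limit.
# Assumes mount points contain no spaces.

# Reads POSIX-format df output on stdin and prints "MOUNT PCT%" for each
# filesystem whose use percentage is above the limit (default 80).
# Returns 1 if at least one filesystem is over the limit, 0 otherwise.
over_threshold() {
    limit=${1:-80}
    awk -v limit="$limit" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)            # "83%" becomes "83"
            if (pct + 0 > limit + 0) {
                print $6, $5
                found = 1
            }
        }
        END { exit found ? 1 : 0 }
    '
}
```

A cron entry could run df -P | over_threshold 80 || echo "storage alert" and send the output to whatever notification channel you already use.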

Comparison to Alternatives

| Solution | Strengths | Weaknesses |
|---|---|---|
| Traditional Ticketing (e.g., JIRA Cloud) | Rich UI, extensive plugins | Cloud‑only, may not fit air‑gapped environments |
| Custom Bash Scripts | Simple, lightweight | Hard to maintain, lack auditability |
| Fully Managed SaaS | Zero‑ops, quick start | Data residency concerns, recurring cost |
| Self‑Hosted Open‑Source Stack | Full control, customizable | Requires initial setup effort |

The self‑hosted stack offers the best balance for homelab enthusiasts who value data sovereignty and want to avoid vendor lock‑in.


Prerequisites

Before diving into installation, verify that your environment meets the following baseline requirements.

Hardware and OS

| Requirement | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4 cores |
| RAM | 4 GB | 8 GB+ |
| Storage | 20 GB SSD | 50 GB SSD (for logs and backups) |
| Network | 1 Gbps Ethernet | 1 Gbps+ with VLAN support |

Software Dependencies

| Component | Version | Reason |
|---|---|---|
| Docker Engine | 24.0+ | Required for container orchestration |
| Docker Compose | 2.20+ | Simplifies multi‑service deployment |
| Linux Kernel | 5.15+ | Supports latest container features |
| Optional: Ansible | 2.15+ | For configuration automation |
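
The minimum versions above can be checked with a small script rather than by eye. The sketch below is one way to do it, assuming GNU sort with the -V (version sort) option; version_ge and check_min are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compare an installed version against a required minimum.
# Relies on `sort -V` (version sort), which GNU coreutils provides.

# Succeeds when version $1 is greater than or equal to minimum $2.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Report one component. Usage: check_min NAME INSTALLED MINIMUM
check_min() {
    if version_ge "$2" "$3"; then
        echo "ok   $1 $2 (needs $3+)"
    else
        echo "FAIL $1 $2 (needs $3+)"
        return 1
    fi
}

# Example calls (the docker commands assume Docker is installed):
#   check_min "Linux kernel" "$(uname -r | cut -d- -f1)" 5.15
#   check_min "Docker Engine" "$(docker version --format '{{.Server.Version}}')" 24.0
#   check_min "Docker Compose" "$(docker compose version --short)" 2.20
```

Version sort matters here because a plain string comparison would rank 2.9 above 2.20.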

Network and Security

  • Open ports 80, 443, and 8080 (or custom ports you intend to expose).
  • Ensure firewall rules allow inbound traffic only from trusted IP ranges.
  • Generate strong TLS certificates for HTTPS endpoints; consider using Let’s Encrypt for automated renewal.

User Permissions

  • Create a dedicated system user (e.g., devops) that owns the Docker socket and configuration directories.
  • Grant the user sudo rights for Docker commands only, avoiding broad sudo privileges.

Pre‑Installation Checklist

  1. Verify Docker daemon is running (systemctl status docker).
  2. Pull required base images (docker pull alpine, docker pull prom/prometheus).
  3. Create persistent directories (/opt/homelab/data, /opt/homelab/config).
  4. Set appropriate ownership (chown -R devops:devops /opt/homelab).
  5. Document any existing firewall rules that may conflict with new services.
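
Step 3 of the checklist is easy to script so that it can be re-run safely. The following is a minimal sketch; the prepare_dirs name and its optional base-directory argument are additions made here so the function can be tried against a scratch location first.

```shell
#!/bin/sh
# Sketch of checklist step 3: create the persistent directories idempotently.
# prepare_dirs is a hypothetical helper; the default path mirrors the checklist.

# Usage: prepare_dirs [BASE_DIR]
prepare_dirs() {
    base=${1:-/opt/homelab}
    for sub in data config; do
        # mkdir -p succeeds whether or not the directory already exists
        mkdir -p "$base/$sub" || return 1
        echo "ready: $base/$sub"
    done
}
```

Running prepare_dirs with no argument as root targets /opt/homelab, after which the chown from step 4 hands the tree to the dedicated user.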

Installation & Setup

Below is a step‑by‑step guide to deploying a self‑hosted observability and incident‑management stack using Docker. Container names are written as shell‑style variables such as $PROMETHEUS_CONTAINER_NAME to stay compatible with Jekyll Liquid templating.

1. Pull Required Images

docker pull prom/prometheus:latest
docker pull grafana/grafana:latest
docker pull thehiveproject/thehive:3.2.0
docker pull thehiveproject/cortex:1.2.0
docker pull nginx:latest
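
If you prefer one command over five, the pulls can be wrapped in a loop that stops at the first failure. This is a sketch only; pull_images is a hypothetical helper that calls the same docker pull command shown in this step.

```shell
#!/bin/sh
# Sketch: pull a list of images and stop at the first failure.
# pull_images is a hypothetical helper, not a Docker subcommand.

# Usage: pull_images IMAGE...
pull_images() {
    for image in "$@"; do
        echo "pulling $image"
        if ! docker pull "$image"; then
            echo "failed to pull $image" >&2
            return 1
        fi
    done
}
```

For example, pull_images prom/prometheus:latest grafana/grafana:latest nginx:latest pulls three of the images and reports which one failed if the registry is unreachable.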

2. Create a Docker Compose File

Create a file named docker-compose.yml with the following content. Each service is annotated with comments that explain its role.

version: "3.8"

services:
  # Prometheus for metrics collection
  prometheus:
    image: prom/prometheus:latest
    container_name: $PROMETHEUS_CONTAINER_NAME
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"

  # Grafana for visualization
  grafana:
    image: grafana/grafana:latest
    container_name: $GRAFANA_CONTAINER_NAME
    restart: unless-stopped
    depends_on:
      - prometheus
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"

  # TheHive for case management
  thehive:
    image: thehiveproject/thehive:3.2.0
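    container_name: $THEHIVE_CONTAINER_NAME
    restart: unless-stopped
    depends_on:
      - elasticsearch
      - cortex
    environment:
      # Cortex listens on 9001 by default; 9200 is the Elasticsearch port.
      # Check these variable names against the image documentation.
      - CORTEX_URL=http://$CORTEX_CONTAINER_NAME:9001
      - ES_URL=http://$ELASTICSEARCH_CONTAINER_NAME:9200
      - TZ=UTC
    ports:
      # TheHive listens on 9000 inside the container by default
      - "9091:9000"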

  # Cortex for observable analysis and automated response
  cortex:
    image: thehiveproject/cortex:1.2.0
    container_name: $CORTEX_CONTAINER_NAME
    restart: unless-stopped
    volumes:
      - ./cortex.yaml:/etc/cortex/cortex.yaml:ro
      - cortex_storage:/cortex
    ports:
      # Cortex listens on 9001 inside the container by default
      - "9411:9001"

  # Elasticsearch for indexing
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    container_name: $ELASTICSEARCH_CONTAINER_NAME
    restart: unless-stopped
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

  # Nginx as reverse proxy
  nginx:
    image: nginx:latest
    container_name: $NGINX_CONTAINER_NAME
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      - grafana
      - thehive

volumes:
  prometheus_data:
  grafana_data:
  cortex_storage:
  elasticsearch_data:
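
The compose file only resolves if every *_CONTAINER_NAME variable has a value. Docker Compose reads a .env file in the project directory for this substitution, so one option is to generate that file once. The sketch below assumes example names prefixed with homelab-, and write_env_file is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: write a .env file so Compose can substitute the container names.
# The names on the right-hand side are examples; choose your own.

# Usage: write_env_file [PATH]
write_env_file() {
    target=${1:-.env}
    cat > "$target" <<'EOF'
PROMETHEUS_CONTAINER_NAME=homelab-prometheus
GRAFANA_CONTAINER_NAME=homelab-grafana
THEHIVE_CONTAINER_NAME=homelab-thehive
CORTEX_CONTAINER_NAME=homelab-cortex
ELASTICSEARCH_CONTAINER_NAME=homelab-elasticsearch
NGINX_CONTAINER_NAME=homelab-nginx
EOF
}
```

After calling write_env_file next to docker-compose.yml, running docker compose config prints the file with the substitutions applied, which is a quick way to spot a variable that is still unset.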

Explanation of Placeholders

  • $PROMETHEUS_CONTAINER_NAME, $GRAFANA_CONTAINER_NAME, etc., are environment variables that you set before running docker compose up. They stand in for Docker's Go‑template placeholders such as {{.ID}} and {{.Names}}, whose double braces would otherwise clash with Jekyll Liquid templating.

3. Configure Prometheus

Create prometheus.yml in the same directory:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'thehive'
    metrics_path: '/api/v1/metrics'
    static_configs:
      - targets: ['the
This post is licensed under CC BY 4.0 by the author.