First Time Not Playing The Hero Feels Good

Posted Jun 13, 2026

By Usman Masood Ashraf


Introduction

Walking into a homelab or a self‑hosted environment and hearing the familiar phrase “We need you to fix this now” is a scenario many seasoned engineers recognize all too well. Yet there is a growing cadre of practitioners who, for the first time, experience the quiet satisfaction of not being the hero who scrambles at the last minute. This feeling isn’t about ego; it’s about establishing a resilient, automated, and predictable infrastructure that lets you step back from constant firefighting and focus on strategic growth.

In the world of DevOps, the transition from reactive heroics to proactive stewardship is often marked by a series of deliberate choices: robust monitoring, automated ticket routing, disciplined onboarding, and clear ownership boundaries. This guide unpacks those choices, offering a step‑by‑step blueprint for building a system where the “hero” role becomes optional rather than inevitable.

You will learn:

How to shift from ad‑hoc incident response to systematic, repeatable processes.
Which open‑source tools and patterns best support a self‑hosted homelab.
Concrete Docker‑based installations that avoid the pitfalls of placeholder syntax that conflicts with Jekyll Liquid templating.
Strategies for securing, optimizing, and scaling your stack without introducing hidden technical debt.
Practical troubleshooting techniques that keep the system reliable when it matters most.

By the end of this guide, you’ll have a clear roadmap for creating an environment where the first time you don’t get tapped on the shoulder during an office celebration is not a coincidence, but a design decision.

Understanding the Topic

What Does “Not Playing The Hero” Mean in DevOps?

In traditional IT and even modern DevOps narratives, the “hero” is the engineer who rushes in at 2 a.m. to resolve a critical outage, often bypassing standard procedures to restore service. While heroic efforts can be commendable, they are also symptomatic of systemic weaknesses: missing alerts, opaque configuration, or manual processes that lack documentation.

The phrase “First Time Not Playing The Hero Feels Good” captures the psychological shift an engineer feels when a system no longer demands last‑minute heroics. It signals:

Predictability – Alerts fire before issues become crises.
Ownership – Clear escalation paths and documented runbooks exist.
Automation – Repetitive tasks are handled by code, not by human intervention.

Historical Context

The concept of “hero culture” in operations dates back to the early days of mainframe support, where a single operator could keep an entire data center running. As cloud computing and containerization matured, the industry gravitated toward infrastructure as code (IaC) and observability. However, many on‑premise homelabs still cling to manual ticketing and ad‑hoc scripts, perpetuating the hero cycle.

Key Features and Capabilities

Self‑Hosted Ticketing & Incident Management – Tools like TheHive, Cortex, and OSS‑based ticketing platforms can be containerized and run locally, giving you full control over data and workflow customization.
Observability Stack – Prometheus, Grafana, and Alertmanager provide metrics and alerting that surface problems before they explode.
Automated Onboarding – Using Ansible or Bash scripts to provision users, grant permissions, and enforce security policies reduces the chance of “last‑minute” access requests.
Policy‑Driven Access Control – Integrating with LDAP or OAuth2 providers ensures that only authorized personnel can trigger critical actions.

Pros and Cons

Advantage	Description
Reduced Burnout	Fewer emergency calls mean lower stress levels.
Higher Reliability	Automated checks catch drift early.
Scalable Growth	New services can be added without re‑inventing the wheel.
Clear Accountability	Roles and responsibilities are documented.

Drawback	Description
Initial Investment	Time spent designing automation pays off over time.
Learning Curve	New tools require familiarization.
Complexity	Over‑engineering can introduce unnecessary moving parts.

Use Cases and Scenarios

Home Lab with Multiple Services – A developer runs a personal cloud stack (Nextcloud, Plex, Home Assistant) and wants alerts when storage exceeds 80 % or when a container crashes.
Small Office Server – A sysadmin manages a mail and file server, needing a ticketing workflow for hardware replacement requests.
Community Open‑Source Project – Maintainers want a centralized issue tracker that integrates with CI/CD pipelines for automated testing.
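
The storage alert in the first scenario can be sketched as a small shell check. This is a minimal sketch under stated assumptions, not a substitute for Prometheus and Alertmanager: the function names disk_usage_percent and storage_status are hypothetical, and the default of 80 simply mirrors the threshold mentioned above.

```shell
#!/bin/sh
# Hypothetical helpers for a threshold-based storage check.

# Print the used-space percentage (digits only) of the filesystem holding $1.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Print an ALERT or OK line; $1 is the used percentage, $2 the threshold.
storage_status() {
  used="$1"
  threshold="${2:-80}"
  if [ "$used" -gt "$threshold" ]; then
    echo "ALERT: storage at ${used}% (threshold ${threshold}%)"
  else
    echo "OK: storage at ${used}% (threshold ${threshold}%)"
  fi
}
```

Called from cron as storage_status "$(disk_usage_percent /)", the check prints one line per run. Forwarding the ALERT line to a notifier is deliberately left out.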

Current State and Future Trends

The market for open‑source incident management and observability is maturing rapidly. Projects like TheHive 3.0 and Cortex 1.0 now support native Docker deployments, making it easier to spin up a complete ticketing pipeline on a single host. Future trends point toward tighter integration with service meshes (e.g., Istio) and AI‑driven anomaly detection, but the foundational principle remains the same: automate the predictable, document the unpredictable, and eliminate the need for heroic rescues.

Comparison to Alternatives

Solution	Strengths	Weaknesses
Traditional Ticketing (e.g., JIRA Cloud)	Rich UI, extensive plugins	Cloud‑only, may not fit air‑gapped environments
Custom Bash Scripts	Simple, lightweight	Hard to maintain, lack auditability
Fully Managed SaaS	Zero‑ops, quick start	Data residency concerns, recurring cost
Self‑Hosted Open‑Source Stack	Full control, customizable	Requires initial setup effort

The self‑hosted stack offers the best balance for homelab enthusiasts who value data sovereignty and want to avoid vendor lock‑in.

Prerequisites

Before diving into installation, verify that your environment meets the following baseline requirements.

Hardware and OS

Requirement	Minimum	Recommended
CPU	2 cores	4 cores
RAM	4 GB	8 GB+
Storage	20 GB SSD	50 GB SSD (for logs and backups)
Network	1 Gbps Ethernet	1 Gbps+ with VLAN support

Software Dependencies

Component	Version	Reason
Docker Engine	24.0+	Required for container orchestration
Docker Compose	2.20+	Simplifies multi‑service deployment
Linux Kernel	5.15+	Supports latest container features
Ansible (optional)	2.15+	For configuration automation

Network and Security

Open ports 80, 443, and 8080 (or custom ports you intend to expose).
Ensure firewall rules allow inbound traffic only from trusted IP ranges.
Generate strong TLS certificates for HTTPS endpoints; consider using Let’s Encrypt for automated renewal.
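
The "trusted IP ranges only" rule above could be expressed as firewall commands. The sketch below assumes ufw as the firewall, which this guide does not prescribe, and uses a hypothetical helper name, print_ufw_rules. It only prints the commands so they can be reviewed before anything is applied.

```shell
#!/bin/sh
# Print ufw rules that admit one trusted CIDR range to each listed TCP port.
# Assumption: ufw is the firewall in use; nothing is executed here.
print_ufw_rules() {
  trusted_cidr="$1"
  shift
  echo "ufw default deny incoming"
  for port in "$@"; do
    echo "ufw allow from ${trusted_cidr} to any port ${port} proto tcp"
  done
}
```

For example, print_ufw_rules 192.168.1.0/24 80 443 8080 lists four commands, where 192.168.1.0/24 is a placeholder range. Check that your SSH access is covered before running them with root privileges, because the default‑deny rule applies to all inbound traffic.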

User Permissions

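Create a dedicated system user (e.g., devops) that owns the configuration directories and can reach the Docker socket, for example through membership in the docker group.
Grant the user sudo rights for Docker commands only, avoiding broad sudo privileges. Keep in mind that access to the Docker daemon is effectively root‑equivalent, so treat this account as privileged.

Pre‑Installation Checklist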

Verify Docker daemon is running (systemctl status docker).
Pull required base images (docker pull alpine, docker pull prom/prometheus).
Create persistent directories (/opt/homelab/data, /opt/homelab/config).
Set appropriate ownership (chown -R devops:devops /opt/homelab).
Document any existing firewall rules that may conflict with new services.
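
The checklist above can be collected into one preflight script. This is a sketch under assumptions: the function names (ensure_dirs, require_cmd, preflight) are hypothetical, the host is assumed to use systemd, and the chown step needs root and assumes the devops user already exists.

```shell
#!/bin/sh
# Preflight sketch for the checklist; BASE_DIR defaults to the guide's path.
BASE_DIR="${BASE_DIR:-/opt/homelab}"

# Create the persistent data and config directories under the given base.
ensure_dirs() {
  mkdir -p "$1/data" "$1/config"
}

# Report whether a required command is available on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Run the checks in order and stop at the first failure.
preflight() {
  require_cmd docker || return 1
  systemctl is-active --quiet docker || { echo "docker daemon not running"; return 1; }
  docker pull alpine && docker pull prom/prometheus || return 1
  ensure_dirs "$BASE_DIR" && chown -R devops:devops "$BASE_DIR"
}
```

Sourcing the file and calling preflight as root runs every step. Documenting existing firewall rules remains a manual task.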

Installation & Setup

Below is a comprehensive, step‑by‑step guide to deploy a self‑hosted observability and incident‑management stack using Docker. Container names are supplied through environment variables such as $PROMETHEUS_CONTAINER_NAME instead of brace‑delimited placeholders, which keeps the snippets compatible with Jekyll Liquid templating.

1. Pull Required Images

docker pull prom/prometheus:latest
docker pull grafana/grafana:latest
docker pull thehiveproject/thehive:3.2.0
docker pull thehiveproject/cortex:1.2.0
docker pull nginx:latest

2. Create a Docker Compose File

Create a file named docker-compose.yml with the following content. Each service is annotated with comments that explain its role.

  
version: "3.8"

services:
  # Prometheus for metrics collection
  prometheus:
    image: prom/prometheus:latest
    container_name: $PROMETHEUS_CONTAINER_NAME
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"

  # Grafana for visualization
  grafana:
    image: grafana/grafana:latest
    container_name: $GRAFANA_CONTAINER_NAME
    restart: unless-stopped
    depends_on:
      - prometheus
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"

  # TheHive for case management
  thehive:
    image: thehiveproject/thehive:3.2.0
    container_name: $THEHIVE_CONTAINER_NAME
    restart: unless-stopped
    depends_on:
      - elasticsearch
      - cortex
    environment:
      # Cortex listens on 9001 by default; 9200 is the Elasticsearch port
      - CORTEX_URL=http://$CORTEX_CONTAINER_NAME:9001
      - ES_URL=http://$ELASTICSEARCH_CONTAINER_NAME:9200
      - TZ=UTC
    ports:
      - "9000:9000"  # TheHive serves HTTP on 9000 by default

  # Cortex for observable analysis and active response
  cortex:
    image: thehiveproject/cortex:1.2.0
    container_name: $CORTEX_CONTAINER_NAME
    restart: unless-stopped
    volumes:
      - ./cortex.yaml:/etc/cortex/cortex.yaml:ro
      - cortex_storage:/cortex
    ports:
      - "9001:9001"  # Cortex serves HTTP on 9001 by default

  # Elasticsearch for indexing
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    container_name: $ELASTICSEARCH_CONTAINER_NAME
    restart: unless-stopped
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

  # Nginx as reverse proxy
  nginx:
    image: nginx:latest
    container_name: $NGINX_CONTAINER_NAME
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      - grafana
      - thehive

volumes:
  prometheus_data:
  grafana_data:
  cortex_storage:
  elasticsearch_data:
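
The compose file above takes its container names from variables such as $PROMETHEUS_CONTAINER_NAME. One way to supply them is a .env file next to docker-compose.yml, which Docker Compose reads for variable substitution. The sketch below writes such a file. The helper name write_env_file and the container names it assigns are illustrative assumptions, not values taken from this guide.

```shell
#!/bin/sh
# Write a .env file for Docker Compose variable substitution to the path in $1.
# The container names are placeholder choices; rename them to suit your setup.
write_env_file() {
  cat > "$1" <<'EOF'
PROMETHEUS_CONTAINER_NAME=prometheus
GRAFANA_CONTAINER_NAME=grafana
THEHIVE_CONTAINER_NAME=thehive
CORTEX_CONTAINER_NAME=cortex
ELASTICSEARCH_CONTAINER_NAME=elasticsearch
NGINX_CONTAINER_NAME=nginx
EOF
}
```

After calling write_env_file .env in the project directory, docker compose config prints the file with the variables resolved, which is a quick way to confirm that no name was left empty.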

Explanation of Placeholders

$PROMETHEUS_CONTAINER_NAME, $GRAFANA_CONTAINER_NAME, and the other container‑name entries are environment variables that you set before running docker-compose up; Compose substitutes their values when it reads the file. They are used in place of brace‑delimited placeholders such as {.ID} and {.Names}, which would otherwise clash with Jekyll Liquid templating.

3. Configure Prometheus

Create prometheus.yml in the same directory:

  
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'thehive'
    metrics_path: '/api/v1/metrics'
    static_configs:
      - targets: ['the

Open Source, Reddit Guides, Docker

This post is licensed under CC BY 4.0 by the author.
