Had An Interview Yesterday

Introduction

Yesterday I walked into a job interview that seemed perfectly aligned with my background in self‑hosted infrastructure and automation. The posting highlighted a mature IT team, a ticketing system, and a clear path for collaboration. Yet, halfway through the conversation, the interviewer dropped a bombshell: there is no IT team. The role demands a one‑person army capable of supporting ten geographically dispersed locations, 200 end‑users, and roughly 500 endpoints across offices, warehouses, and remote sites.

This scenario is a reality for many homelab enthusiasts, small‑to‑medium enterprises, and even larger organizations that have opted for a lean, cost‑effective approach to infrastructure management. The challenge is not just about deploying services; it is about designing a resilient, scalable, and secure environment that can be operated solo without sacrificing uptime or productivity.

In this guide we will dissect the problem, explore the underlying technologies that make a one‑person IT army viable, and provide a step‑by‑step roadmap for building, configuring, and maintaining such a system. Whether you are a seasoned DevOps engineer, a homelab hobbyist, or a sysadmin looking to transition to a more autonomous setup, this comprehensive article will equip you with the knowledge, tools, and best practices needed to succeed.

Key takeaways:

  • Understand the scope of a one‑person IT operation and why it matters for modern infrastructure.
  • Learn how to design a modular architecture that scales across multiple sites.
  • Master the installation and configuration of core services using proven open‑source tools.
  • Implement security hardening, performance tuning, and monitoring strategies.
  • Gain practical troubleshooting techniques for real‑world scenarios.

By the end of this guide, you will have a clear blueprint for turning a seemingly impossible workload into a manageable, repeatable, and sustainable operation.

Understanding the Topic

What Does “One‑Person IT Army” Really Mean?

A “one‑person IT army” refers to a single individual who assumes full responsibility for the entire technology stack of an organization. This includes:

  • Network design – routing, VPNs, firewalls, and WAN optimization.
  • Server and endpoint management – provisioning, patching, and monitoring of physical and virtual machines.
  • Application deployment – CI/CD pipelines, container orchestration, and service discovery.
  • Security and compliance – identity management, vulnerability scanning, and incident response.
  • User support – ticket handling, troubleshooting, and service requests.

The term does not describe a literal army. It describes a single operator who uses automation, scripting, and robust tooling to eliminate manual toil and approach zero‑touch operations.

Historical Context

The concept emerged from the early days of self‑hosted infrastructure, where small businesses and hobbyists built their own servers using Linux, open‑source services, and basic scripting. As cloud computing matured, the pendulum swung toward managed services, but the cost and vendor lock‑in motivated many to return to self‑hosted solutions. Today, the rise of infrastructure as code (IaC), containerization, and observability tools has made it feasible for a single operator to manage complex environments that previously required entire teams.

Core Features and Capabilities

| Feature | Description | Typical Tools |
|---|---|---|
| Automation | Scripts and declarative configurations replace manual steps. | Ansible, Terraform, Bash, Python |
| Container Orchestration | Deploy and manage services in isolated environments. | Docker, Podman, Kubernetes |
| Configuration Management | Enforce consistent state across all nodes. | Ansible, Chef, Puppet |
| Monitoring & Alerting | Real‑time visibility into health and performance. | Prometheus, Grafana, Zabbix |
| Ticketing Integration | Centralize user requests and incident tracking. | Jira Service Management, RT, GitLab Issues |
| Backup & Recovery | Protect data and ensure rapid restoration. | Restic, Duplicity, BorgBackup |
| Security Hardening | Reduce attack surface and meet compliance. | OpenSCAP, Falco, SELinux |

Pros and Cons

Pros

  • Cost Efficiency – Eliminates the need for multiple staff salaries.
  • Rapid Decision‑Making – No bureaucracy; changes can be rolled out instantly.
  • Full Visibility – Operator has end‑to‑end insight into every component.
  • Scalable Automation – Once scripts are written, scaling is a matter of execution.

Cons

  • Single Point of Failure – If the operator is unavailable, the entire stack may stall.
  • Knowledge Breadth – Requires proficiency across many domains (networking, security, etc.).
  • Time Intensive – Initial setup can be demanding before automation pays off.

Use Cases and Scenarios

  • Distributed Offices – Multiple sites with limited local IT staff.
  • Warehouse Automation – IoT devices, barcode scanners, and PLCs needing centralized monitoring.
  • Remote Workforce – Supporting 200 users with VPN, email, and collaboration tools.
  • Homelab to Production – Transitioning a personal testbed into a production‑grade environment.

The ecosystem for solo operators is maturing rapidly. Rancher and Portainer simplify managing multiple clusters and container hosts from one interface, Crossplane extends Kubernetes‑style declarative management to external infrastructure, and GitOps tools such as Argo CD deploy from a Git repository with minimal manual intervention. Machine‑learning‑based anomaly detection is also beginning to reduce the manual effort of log analysis.

Comparison with Alternatives

| Alternative | Typical Team Size | Management Overhead | Ideal For |
|---|---|---|---|
| Traditional MSP | 5‑10 engineers | High (multiple tiers) | Large enterprises |
| Fully Managed Cloud | None (provider) | Low (limited control) | Organizations prioritizing convenience |
| One‑Person IT Army | 1 | Medium (requires automation) | Small businesses, homelabs, cost‑conscious teams |

Prerequisites

Before embarking on the journey, ensure that your environment meets the following baseline requirements.

Hardware

  • CPU – Minimum 8 cores (preferably 16) to handle concurrent container workloads.
  • RAM – At least 32 GB; 64 GB recommended for larger workloads.
  • Storage – 2 TB SSD for fast access to logs, backups, and container images.
  • Network – Gigabit Ethernet with redundant uplink for high availability.

Software

| Component | Minimum Version | Purpose |
|---|---|---|
| Operating System | Ubuntu 22.04 LTS or Debian 12 | Stable base for containers and networking. |
| Docker Engine | 24.0+ | Container runtime for application isolation. |
| Docker Compose | 2.20+ | Multi‑container orchestration on a single host. |
| Ansible | 2.15+ | Configuration management and automation. |
| Prometheus | 2.48+ | Metrics collection and alerting. |
| Grafana | 10.2+ | Visualization and dashboarding. |
| Git | 2.43+ | Source control for IaC repositories. |
| VPN Server (e.g., WireGuard) | 1.0+ | Secure remote access across sites. |
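
The minimum versions above can be checked with a short script before installation starts. The following is a minimal sketch, not a definitive preflight tool: the helper names (`parse_version`, `meets_minimum`, `check_tools`) are hypothetical, and the command lines are assumptions about how each tool reports its version.

```python
import re
import shutil
import subprocess

# Minimum versions copied from the table above. The commands are assumptions
# about how each tool prints its version string.
MINIMUMS = {
    "docker": ((24, 0), ["docker", "--version"]),
    "docker compose": ((2, 20), ["docker", "compose", "version"]),
    "ansible": ((2, 15), ["ansible", "--version"]),
    "git": ((2, 43), ["git", "--version"]),
}


def parse_version(text):
    """Return the first dotted version found in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare version tuples; a missing version never passes."""
    return found is not None and found[: len(minimum)] >= minimum


def check_tools():
    """Run each version command and report pass/fail per tool."""
    results = {}
    for name, (minimum, command) in MINIMUMS.items():
        if shutil.which(command[0]) is None:
            results[name] = False
            continue
        output = subprocess.run(command, capture_output=True, text=True).stdout
        results[name] = meets_minimum(parse_version(output), minimum)
    return results


if __name__ == "__main__":
    for tool, ok in check_tools().items():
        print(f"{tool}: {'ok' if ok else 'missing or too old'}")
```

Comparing tuples of integers rather than raw strings avoids the classic mistake of ranking "2.9" above "2.15".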

Network and Security

  • Firewall – Configured to allow only required ports (e.g., 22 for SSH, 80/443 for web services).
  • TLS – All external services must be served over HTTPS with valid certificates.
  • User Permissions – Non‑root user with sudo privileges for Docker and Ansible operations.

Pre‑Installation Checklist

  1. Verify hardware specifications and power redundancy.
  2. Install the base OS and apply all security patches.
  3. Create a dedicated non‑root user (e.g., devops) and configure SSH key authentication.
  4. Install Docker Engine and enable the service.
  5. Set up a Git repository for all configuration files.
  6. Draft an initial network diagram outlining site connectivity.

Installation & Setup

Below is a detailed, step‑by‑step guide to deploy a core stack that can serve as the backbone of a one‑person IT army. The example uses Docker Compose for service orchestration, Ansible for configuration management, and Prometheus + Grafana for monitoring.

1. Clone the Repository

git clone https://github.com/example/one-person-it-army.git
cd one-person-it-army

2. Prepare Environment Variables

Create a .env file with the following variables (adjust values to match your environment).

# .env
DOMAIN=example.com
VPN_SUBNET=10.0.0.0/24
PROMETHEUS_VERSION=2.48.1
GRAFANA_VERSION=10.2.0
# Base name prefixed to every container (placeholder value; pick your own).
CONTAINER_NAMES=ops
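
With ten locations sharing one VPN range, it helps to decide up front which addresses belong to which site. The sketch below reads `VPN_SUBNET` from `.env`-style text and splits it into equal blocks using Python's standard `ipaddress` module. The one-block-per-site layout, the /28 block size, and the `site-NN` labels are assumptions for illustration, not part of the original setup.

```python
import ipaddress


def read_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def allocate_site_subnets(vpn_subnet, site_count, new_prefix=28):
    """Split the VPN range into equal blocks and return one per site."""
    network = ipaddress.ip_network(vpn_subnet)
    blocks = list(network.subnets(new_prefix=new_prefix))
    if site_count > len(blocks):
        raise ValueError(
            f"{vpn_subnet} only yields {len(blocks)} /{new_prefix} blocks"
        )
    return blocks[:site_count]


ENV_TEXT = """\
# .env
DOMAIN=example.com
VPN_SUBNET=10.0.0.0/24
"""

env = read_env(ENV_TEXT)
site_blocks = allocate_site_subnets(env["VPN_SUBNET"], site_count=10)
for index, block in enumerate(site_blocks, start=1):
    # Subtract the network and broadcast addresses of each block.
    print(f"site-{index:02d}: {block} ({block.num_addresses - 2} usable hosts)")
```

A /24 split into /28 blocks leaves 14 usable addresses per site, which suits gateways and admin devices rather than all 500 endpoints. Addressing every endpoint over the VPN would need a larger `VPN_SUBNET`.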

3. Deploy Core Services with Docker Compose

# docker-compose.yml
version: "3.8"

services:
  prometheus:
    image: prom/prometheus:${PROMETHEUS_VERSION}
    # Falls back to "ops" when CONTAINER_NAMES is not set in .env.
    container_name: ${CONTAINER_NAMES:-ops}-prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:${GRAFANA_VERSION}
    container_name: ${CONTAINER_NAMES:-ops}-grafana
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
    environment:
      # Placeholder password; change it before exposing Grafana.
      - GF_SECURITY_ADMIN_PASSWORD=admin

  node-exporter:
    image: prom/node-exporter:latest
    container_name: ${CONTAINER_NAMES:-ops}-node-exporter
    restart: unless-stopped
    network_mode: host

volumes:
  prometheus_data:
  grafana_data:

Explanation of Key Directives

  • ${CONTAINER_NAMES:-ops} – base name prefixed to each container. Compose substitutes the value of CONTAINER_NAMES and falls back to ops when the variable is unset; replace it with a meaningful identifier.
  • restart: unless-stopped – restarts the container after a crash or host reboot unless it was stopped explicitly.
  • network_mode: host – allows node‑exporter to collect host metrics without additional networking overhead.
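
The Compose file mounts `./prometheus.yml`, and that file must tell Prometheus where each node‑exporter lives. One option for a multi‑site setup is Prometheus's file‑based service discovery, which reads target groups from JSON files listed under `file_sd_configs` in a scrape job. The sketch below generates such a file. The site names and host addresses are hypothetical, and port 9100 is node‑exporter's default.

```python
import json

NODE_EXPORTER_PORT = 9100  # node-exporter's default listen port

# Hypothetical inventory: site name -> hosts running node-exporter.
SITES = {
    "hq": ["10.0.0.2", "10.0.0.3"],
    "warehouse-1": ["10.0.0.18"],
}


def build_target_groups(sites, port=NODE_EXPORTER_PORT):
    """Return one file_sd target group per site, labelled with the site name."""
    groups = []
    for site, hosts in sorted(sites.items()):
        groups.append(
            {
                "targets": [f"{host}:{port}" for host in hosts],
                "labels": {"site": site},
            }
        )
    return groups


if __name__ == "__main__":
    # Redirect this output into a JSON file that the scrape job points at.
    print(json.dumps(build_target_groups(SITES), indent=2))
```

The `site` label travels with every metric scraped from that group, so dashboards and alerts can filter by location. The generated file also has to be mounted into the Prometheus container alongside `prometheus.yml`.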

4. Start the Stack

docker compose up -d

Verify that all containers are running:

docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Image}}"

5. Configure Ansible for Host Hardening

Create an inventory file inventory.ini:

[all]
localhost ansible_connection=local

Create a playbook hardening.yml:

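# hardening.yml
# The ufw module ships in the community.general collection.
- name: Harden host security
  hosts: all
  become: true
  tasks:
    # Add the allow rules first so enabling the firewall cannot lock out SSH.
    - name: Allow required inbound ports
      community.general.ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop: ["22", "80", "443", "3000", "9090"]

    - name: Enable firewall and deny all other inbound traffic
      community.general.ufw:
        state: enabled
        direction: incoming
        policy: deny

    - name: Install Fail2Ban
      ansible.builtin.apt:
        name: fail2ban
        state: present
        update_cache: true

    # Debian 12 logs to the journal by default; if /var/log/auth.log is
    # missing, replace the logpath line with "backend = systemd".
    - name: Configure Fail2Ban for SSH
      ansible.builtin.copy:
        dest: /etc/fail2ban/jail.d/ssh.conf
        content: |
          [sshd]
          enabled = true
          port = ssh
          filter = sshd
          logpath = /var/log/auth.log
          maxretry = 5
          bantime = 600
          findtime = 600
        mode: '0644'
      notify: Restart fail2ban

  handlers:
    - name: Restart fail2ban
      ansible.builtin.service:
        name: fail2ban
        state: restarted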

Run the playbook:

ansible-playbook -i inventory.ini hardening.yml
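
The three numeric jail settings interact: `maxretry` failures inside a `findtime` window trigger a ban lasting `bantime` seconds. The toy model below illustrates that rule for a single IP address. It is a simplified sketch of the behaviour, not Fail2Ban's actual implementation, and the `BanTracker` class is hypothetical.

```python
from collections import deque

MAXRETRY = 5     # failures allowed inside the window
FINDTIME = 600   # window length in seconds
BANTIME = 600    # ban length in seconds


class BanTracker:
    """Toy model of one jail watching a single IP address."""

    def __init__(self, maxretry=MAXRETRY, findtime=FINDTIME, bantime=BANTIME):
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.failures = deque()
        self.banned_until = None

    def is_banned(self, now):
        """Return True while the ban started earlier is still active."""
        return self.banned_until is not None and now < self.banned_until

    def record_failure(self, now):
        """Register a failed login at time `now`; return True if it triggers a ban."""
        if self.is_banned(now):
            return False
        self.failures.append(now)
        # Drop failures that have aged out of the findtime window.
        while self.failures and now - self.failures[0] > self.findtime:
            self.failures.popleft()
        if len(self.failures) >= self.maxretry:
            self.banned_until = now + self.bantime
            self.failures.clear()
            return True
        return False


# Five failures within 400 seconds: the fifth one triggers the ban.
fast = BanTracker()
fast_results = [fast.record_failure(t) for t in (0, 100, 200, 300, 400)]

# One failure every 200 seconds never accumulates five inside the window.
slow = BanTracker()
slow_results = [slow.record_failure(t) for t in (0, 200, 400, 600, 800)]
```

The second case shows the trade‑off in these values: an attacker who paces attempts slowly enough stays under the threshold, which is why key‑only SSH authentication remains the stronger control.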

6. Verify Monitoring Stack

  • Access Prometheus at http://<host_ip>:9090.
  • Access Grafana at http://<host_ip>:3000 (default credentials: admin / admin).

Add Prometheus as a data source in Grafana and import dashboards for node‑exporter metrics.
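
Opening each URL in a browser works for one host but becomes tedious across sites. Both services expose health endpoints (`/-/healthy` on Prometheus and `/api/health` on Grafana), so the check can be scripted. The following is a minimal sketch using only the standard library. The function names are hypothetical and the ports are the ones published in the Compose file.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=5):
    """Return True when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def check_stack(host):
    """Probe the health endpoints of both services on the given host."""
    endpoints = {
        "prometheus": f"http://{host}:9090/-/healthy",
        "grafana": f"http://{host}:3000/api/health",
    }
    return {name: is_healthy(url) for name, url in endpoints.items()}


if __name__ == "__main__":
    for service, ok in check_stack("127.0.0.1").items():
        print(f"{service}: {'healthy' if ok else 'unreachable'}")
```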

Common Installation Pitfalls

| Issue | Symptom | Fix |
|---|---|---|
| Docker daemon not running | “Cannot connect to the Docker daemon” | Start the service (systemctl start docker) and enable it at boot (systemctl enable docker). |
| Port conflicts | “address already in use” errors | Change host ports in docker-compose.yml or stop conflicting services. |
| Ansible permission denied | Permission denied (publickey) | Verify SSH key permissions (chmod 600 ~/.ssh/id_rsa) and add the public key to ~/.ssh/authorized_keys on the target host. |
| Prometheus scrape failures | No metrics displayed | Check prometheus.yml for correct targets and network reachability. |
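
For scrape failures, Prometheus reports the state of every target through its HTTP API at `/api/v1/targets`. The sketch below filters that response for targets that are not up and prints the last scrape error for each. The `SAMPLE` payload is a trimmed illustration of the response shape, not captured output, and the function names are hypothetical.

```python
import json
import urllib.request


def unhealthy_targets(payload):
    """Extract (scrape URL, last error) for every active target that is not up."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [
        (target.get("scrapeUrl", "?"), target.get("lastError", ""))
        for target in active
        if target.get("health") != "up"
    ]


def fetch_targets(base_url, timeout=5):
    """Download the target list from a running Prometheus server."""
    url = f"{base_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Trimmed example of the response shape, not real output.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://10.0.0.2:9100/metrics", "health": "up",
             "lastError": ""},
            {"scrapeUrl": "http://10.0.0.18:9100/metrics", "health": "down",
             "lastError": "connection refused"},
        ]
    },
}

for url, error in unhealthy_targets(SAMPLE):
    print(f"{url}: {error}")
```

Against a live server, `unhealthy_targets(fetch_targets("http://<host_ip>:9090"))` would return the same kind of list, and the `lastError` text usually separates a wrong target address from a blocked port.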

Configuration & Optimization

1. Security Hardening

  • TLS Termination – Use Caddy or Traefik as a reverse proxy to terminate HTTPS.
  • Container Isolation – Enable AppArmor or SELinux profiles for each container.
This post is licensed under CC BY 4.0 by the author.