Had An Interview Yesterday
Introduction
Yesterday I walked into a job interview that seemed perfectly aligned with my background in self‑hosted infrastructure and automation. The posting highlighted a mature IT team, a ticketing system, and a clear path for collaboration. Yet, halfway through the conversation, the interviewer dropped a bombshell: there is no IT team. The role demands a one‑person army capable of supporting ten geographically dispersed locations, 200 end‑users, and roughly 500 endpoints across offices, warehouses, and remote sites.
This scenario is a reality for many homelab enthusiasts, small‑to‑medium enterprises, and even larger organizations that have opted for a lean, cost‑effective approach to infrastructure management. The challenge is not just about deploying services; it is about designing a resilient, scalable, and secure environment that can be operated solo without sacrificing uptime or productivity.
In this guide we will dissect the problem, explore the underlying technologies that make a one‑person IT army viable, and provide a step‑by‑step roadmap for building, configuring, and maintaining such a system. Whether you are a seasoned DevOps engineer, a homelab hobbyist, or a sysadmin looking to transition to a more autonomous setup, this comprehensive article will equip you with the knowledge, tools, and best practices needed to succeed.
Key takeaways:
- Understand the scope of a one‑person IT operation and why it matters for modern infrastructure.
- Learn how to design a modular architecture that scales across multiple sites.
- Master the installation and configuration of core services using proven open‑source tools.
- Implement security hardening, performance tuning, and monitoring strategies.
- Gain practical troubleshooting techniques for real‑world scenarios.
By the end of this guide, you will have a clear blueprint for turning a seemingly impossible workload into a manageable, repeatable, and sustainable operation.
Understanding the Topic
What Does “One‑Person IT Army” Really Mean?
A “one‑person IT army” refers to a single individual who assumes full responsibility for the entire technology stack of an organization. This includes:
- Network design – routing, VPNs, firewalls, and WAN optimization.
- Server and endpoint management – provisioning, patching, and monitoring of physical and virtual machines.
- Application deployment – CI/CD pipelines, container orchestration, and service discovery.
- Security and compliance – identity management, vulnerability scanning, and incident response.
- User support – ticket handling, troubleshooting, and service requests.
The term does not describe a literal army. It describes a single operator who leverages automation, scripting, and robust tooling to eliminate manual toil and achieve near‑zero‑touch operations.
Historical Context
The concept emerged from the early days of self‑hosted infrastructure, where small businesses and hobbyists built their own servers using Linux, open‑source services, and basic scripting. As cloud computing matured, the pendulum swung toward managed services, but the cost and vendor lock‑in motivated many to return to self‑hosted solutions. Today, the rise of infrastructure as code (IaC), containerization, and observability tools has made it feasible for a single operator to manage complex environments that previously required entire teams.
Core Features and Capabilities
| Feature | Description | Typical Tools |
|---|---|---|
| Automation | Scripts and declarative configurations replace manual steps. | Ansible, Terraform, Bash, Python |
| Container Orchestration | Deploy and manage services in isolated environments. | Docker, Podman, Kubernetes |
| Configuration Management | Enforce consistent state across all nodes. | Ansible, Chef, Puppet |
| Monitoring & Alerting | Real‑time visibility into health and performance. | Prometheus, Grafana, Zabbix |
| Ticketing Integration | Centralize user requests and incident tracking. | Jira Service Management, RT, GitLab Issues |
| Backup & Recovery | Protect data and ensure rapid restoration. | Restic, Duplicity, BorgBackup |
| Security Hardening | Reduce attack surface and meet compliance. | OpenSCAP, Falco, SELinux |
Pros and Cons
Pros
- Cost Efficiency – Eliminates the need for multiple staff salaries.
- Rapid Decision‑Making – No bureaucracy; changes can be rolled out instantly.
- Full Visibility – Operator has end‑to‑end insight into every component.
- Scalable Automation – Once scripts are written, scaling is a matter of execution.
Cons
- Single Point of Failure – If the operator is unavailable, the entire stack may stall.
- Knowledge Breadth – Requires proficiency across many domains (networking, security, etc.).
- Time Intensive – Initial setup can be demanding before automation pays off.
Use Cases and Scenarios
- Distributed Offices – Multiple sites with limited local IT staff.
- Warehouse Automation – IoT devices, barcode scanners, and PLCs needing centralized monitoring.
- Remote Workforce – Supporting 200 users with VPN, email, and collaboration tools.
- Homelab to Production – Transitioning a personal testbed into a production‑grade environment.
Current State and Future Trends
The ecosystem for solo operators is maturing rapidly. Projects like Crossplane, Rancher, and Portainer are simplifying multi‑cluster management, while GitOps frameworks (e.g., Argo CD) enable declarative deployments with minimal manual intervention. Artificial intelligence is beginning to assist in anomaly detection, reducing the manual effort required for log analysis.
Comparison with Alternatives
| Alternative | Typical Team Size | Management Overhead | Ideal For |
|---|---|---|---|
| Traditional MSP | 5‑10 engineers | High (multiple tiers) | Large enterprises |
| Fully Managed Cloud | None (provider) | Low (limited control) | Organizations prioritizing convenience |
| One‑Person IT Army | 1 | Medium (requires automation) | Small businesses, homelabs, cost‑conscious teams |
Prerequisites
Before embarking on the journey, ensure that your environment meets the following baseline requirements.
Hardware
- CPU – Minimum 8 cores (preferably 16) to handle concurrent container workloads.
- RAM – At least 32 GB; 64 GB recommended for larger workloads.
- Storage – 2 TB SSD for fast access to logs, backups, and container images.
- Network – Gigabit Ethernet with redundant uplink for high availability.
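The hardware minimums above can be checked with a short script. The following is a minimal sketch for a Linux host; the function names `check_hardware` and `read_host_specs` are illustrative and not part of any tool mentioned in this guide.

```python
import os
import shutil

# Minimums taken from the hardware list above.
MIN_CORES = 8
MIN_RAM_GB = 32
MIN_DISK_GB = 2000  # 2 TB


def check_hardware(cores, ram_gb, disk_gb):
    """Return a list of shortfalls; an empty list means the host passes."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU: {cores} cores, need at least {MIN_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.0f} GB, need at least {MIN_RAM_GB}")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Storage: {disk_gb:.0f} GB, need at least {MIN_DISK_GB}")
    return problems


def read_host_specs(path="/"):
    """Collect core count, RAM, and filesystem size (Linux only)."""
    cores = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    return cores, ram_bytes / 1e9, disk_bytes / 1e9
```

Calling `check_hardware(*read_host_specs())` returns the list of shortfalls for the current machine. Reported RAM and filesystem sizes are usually slightly below the nominal hardware figures, so treat a marginal failure as a prompt to look closer, not as a hard verdict.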
Software
| Component | Minimum Version | Purpose |
|---|---|---|
| Operating System | Ubuntu 22.04 LTS or Debian 12 | Stable base for containers and networking. |
| Docker Engine | 24.0+ | Container runtime for application isolation. |
| Docker Compose | 2.20+ | Multi‑container orchestration on a single host. |
| Ansible | 2.15+ | Configuration management and automation. |
| Prometheus | 2.48+ | Metrics collection and alerting. |
| Grafana | 10.2+ | Visualization and dashboarding. |
| Git | 2.43+ | Source control for IaC repositories. |
| VPN Server (e.g., WireGuard) | 1.0+ | Secure remote access across sites. |
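Version minimums like these are easy to get wrong with plain string comparison, because "2.9" sorts after "2.20" as text. The sketch below compares versions numerically; the `MINIMUMS` dictionary keys and the helper names are illustrative assumptions, and the input strings would come from commands such as `docker --version` or `git --version`.

```python
import re

# Minimum versions copied from the table above (keys are illustrative).
MINIMUMS = {
    "docker": "24.0",
    "docker-compose": "2.20",
    "ansible": "2.15",
    "prometheus": "2.48",
    "grafana": "10.2",
    "git": "2.43",
}


def parse_version(text):
    """Extract the first dotted version number from text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare numerically, so 2.9 is correctly treated as older than 2.20."""
    return parse_version(installed) >= parse_version(minimum)
```

For example, `meets_minimum("git version 2.43.0", MINIMUMS["git"])` evaluates to `True`, while `meets_minimum("2.9.1", "2.20")` evaluates to `False`.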
Network and Security
- Firewall – Configured to allow only required ports (e.g., 22 for SSH, 80/443 for web services).
- TLS – All external services must be served over HTTPS with valid certificates.
- User Permissions – Non‑root user with sudo privileges for Docker and Ansible operations.
Pre‑Installation Checklist
- Verify hardware specifications and power redundancy.
- Install the base OS and apply all security patches.
- Create a dedicated non‑root user (e.g., devops) and configure SSH key authentication.
- Install Docker Engine and enable the service.
- Set up a Git repository for all configuration files.
- Draft an initial network diagram outlining site connectivity.
Installation & Setup
Below is a detailed, step‑by‑step guide to deploy a core stack that can serve as the backbone of a one‑person IT army. The example uses Docker Compose for service orchestration, Ansible for configuration management, and Prometheus + Grafana for monitoring.
1. Clone the Repository
git clone https://github.com/example/one-person-it-army.git
cd one-person-it-army
2. Prepare Environment Variables
Create a .env file with the following variables (adjust values to match your environment).
# .env
DOMAIN=example.com
VPN_SUBNET=10.0.0.0/24
PROMETHEUS_VERSION=2.48.1
GRAFANA_VERSION=10.2.0
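A typo in this file tends to surface only later as a confusing Compose error, so it can help to validate it first. The following is a minimal sketch that parses simple KEY=VALUE lines and checks the four variables listed above; `parse_env` and `validate_env` are hypothetical helper names, and the parser does not handle quoting or multi-line values.

```python
import ipaddress

REQUIRED_KEYS = ("DOMAIN", "VPN_SUBNET", "PROMETHEUS_VERSION", "GRAFANA_VERSION")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def validate_env(values):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    subnet = values.get("VPN_SUBNET")
    if subnet:
        try:
            # Strict parsing rejects host bits, e.g. 10.0.0.1/24.
            ipaddress.ip_network(subnet)
        except ValueError as exc:
            problems.append(f"VPN_SUBNET is not a valid network: {exc}")
    return problems
```

A typical call would be `validate_env(parse_env(open(".env").read()))`, run before starting the stack.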
3. Deploy Core Services with Docker Compose
# docker-compose.yml
version: "3.8"
services:
  prometheus:
    image: prom/prometheus:${PROMETHEUS_VERSION}
    container_name: ${CONTAINER_NAMES}-prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:${GRAFANA_VERSION}
    container_name: ${CONTAINER_NAMES}-grafana
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
    environment:
      # Placeholder password; change it before exposing Grafana.
      - GF_SECURITY_ADMIN_PASSWORD=admin
  node-exporter:
    image: prom/node-exporter:latest
    container_name: ${CONTAINER_NAMES}-node-exporter
    restart: unless-stopped
    network_mode: host
volumes:
  prometheus_data:
  grafana_data:
Explanation of Key Directives
- ${CONTAINER_NAMES} – placeholder for the base name of the containers; replace it with a meaningful identifier or define CONTAINER_NAMES in .env, because Compose substitutes an empty string for an unset variable.
- restart: unless-stopped – the restart policy; a container is restarted automatically unless it was stopped explicitly.
- network_mode: host – allows node‑exporter to collect host metrics without additional networking overhead.
4. Start the Stack
docker compose up -d
Verify that all containers are running:
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Image}}"
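For a solo operator, this check is worth scripting so it can run from cron or a deployment hook. The sketch below is one possible approach: it runs `docker ps -a` (the `-a` flag also lists exited containers, which plain `docker ps` hides) and reports anything that is not up. The function names are illustrative.

```python
import subprocess


def unhealthy_containers(ps_output):
    """Return names of containers that are not up or are marked unhealthy.

    Expects tab-separated 'name<TAB>status' lines.
    """
    failing = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if not status.startswith("Up") or "(unhealthy)" in status:
            failing.append(name)
    return failing


def docker_ps():
    """Return name and status for every container, including stopped ones."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

`unhealthy_containers(docker_ps())` returns an empty list when every container is running. Keeping the parsing separate from the subprocess call makes the logic testable without a Docker daemon.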
5. Configure Ansible for Host Hardening
Create an inventory file inventory.ini:
[all]
localhost ansible_connection=local
Create a playbook hardening.yml:
- name: Harden host security
  hosts: all
  become: true
  tasks:
    # The ufw module has no "rules" parameter, so each port gets its own rule.
    - name: Allow required inbound ports
      community.general.ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop: ["22", "80", "443", "3000", "9090"]

    # Enabled after the allow rules so the SSH session is not cut off.
    - name: Enable firewall and deny other inbound traffic
      community.general.ufw:
        state: enabled
        policy: deny
        direction: incoming

    - name: Install Fail2Ban
      ansible.builtin.apt:
        name: fail2ban
        state: present

    - name: Configure Fail2Ban for SSH
      ansible.builtin.copy:
        dest: /etc/fail2ban/jail.d/ssh.conf
        mode: '0644'
        content: |
          [sshd]
          enabled = true
          port = ssh
          filter = sshd
          logpath = /var/log/auth.log
          maxretry = 5
          bantime = 600
          findtime = 600
      notify: Restart Fail2Ban

  handlers:
    - name: Restart Fail2Ban
      ansible.builtin.service:
        name: fail2ban
        state: restarted
Run the playbook:
ansible-playbook -i inventory.ini hardening.yml
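The Fail2Ban values in the playbook interact: a ban is triggered when maxretry failures occur within findtime seconds, and it lasts bantime seconds. The following is a simplified model of that rule, not Fail2Ban's actual implementation; the exact boundary handling is an assumption.

```python
def would_ban(failure_times, maxretry=5, findtime=600):
    """Return True if maxretry failures fall inside any findtime-second window.

    failure_times is a list of timestamps in seconds for one source address.
    """
    times = sorted(failure_times)
    for i in range(len(times) - maxretry + 1):
        # Compare the first and last of maxretry consecutive failures.
        if times[i + maxretry - 1] - times[i] <= findtime:
            return True
    return False
```

With the values above, five failed logins spread over four minutes would trigger a ban, while five failures spread over fifteen minutes would not.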
6. Verify Monitoring Stack
- Access Prometheus at http://<host_ip>:9090.
- Access Grafana at http://<host_ip>:3000 (default credentials: admin/admin).
Add Prometheus as a data source in Grafana and import dashboards for node‑exporter metrics.
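Both services also expose health endpoints that can be probed without a browser: Prometheus answers on `/-/healthy` and Grafana on `/api/health`. The sketch below assumes the default ports from the Compose file; `check_stack` and `http_get` are illustrative names, and the `fetch` parameter exists only so the logic can be exercised without a live stack.

```python
import json
import urllib.request


def http_get(url, timeout=5):
    """Return (status_code, body) for a GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


def check_stack(host, fetch=http_get):
    """Probe the Prometheus and Grafana health endpoints.

    Returns a dict mapping service name to True (healthy) or False.
    """
    results = {}
    try:
        status, _ = fetch(f"http://{host}:9090/-/healthy")
        results["prometheus"] = status == 200
    except OSError:  # covers connection errors and urllib's URLError
        results["prometheus"] = False
    try:
        status, body = fetch(f"http://{host}:3000/api/health")
        healthy = json.loads(body).get("database") == "ok"
        results["grafana"] = status == 200 and healthy
    except (OSError, ValueError):  # ValueError covers malformed JSON
        results["grafana"] = False
    return results
```

Running `check_stack("<host_ip>")` against a healthy deployment should report both services as `True`.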
Common Installation Pitfalls
| Issue | Symptom | Fix |
|---|---|---|
| Docker daemon not running | "Cannot connect to the Docker daemon" errors | Start the service (sudo systemctl start docker) and enable it at boot (sudo systemctl enable docker). |
| Port conflicts | “address already in use” errors | Change host ports in docker-compose.yml or stop conflicting services. |
| Ansible permission denied | Permission denied (publickey) | Verify private key permissions (chmod 600 ~/.ssh/id_rsa) and add the public key to ~/.ssh/authorized_keys on the target host. |
| Prometheus scrape failures | No metrics displayed | Check prometheus.yml for correct targets and network reachability. |
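For the scrape-failure case in the last row, the Prometheus HTTP API reports per-target health at `/api/v1/targets`, including the last scrape error. The following is a minimal sketch that extracts the failing targets from that response; `down_targets` and `fetch_targets` are illustrative names.

```python
import json
import urllib.request


def down_targets(payload):
    """List (job, scrape_url, last_error) for active targets that are not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            target.get("labels", {}).get("job", "?"),
            target.get("scrapeUrl", "?"),
            target.get("lastError", ""),
        )
        for target in targets
        if target.get("health") != "up"
    ]


def fetch_targets(host, port=9090, timeout=5):
    """Fetch and decode the targets endpoint of a Prometheus server."""
    url = f"http://{host}:{port}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

`down_targets(fetch_targets("<host_ip>"))` returns an empty list when every target is being scraped successfully; otherwise the last error usually points to a wrong address in prometheus.yml or a blocked port.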
Configuration & Optimization
1. Security Hardening
- TLS Termination – Use Caddy or Traefik as a reverse proxy to terminate HTTPS.
- Container Isolation – Enable AppArmor or SELinux profiles for each container