I Forgot To Put The

Posted Sep 3, 2025

By Usman Masood Ashraf

8 min read

I Forgot To Put The: A DevOps Cautionary Tale About Infrastructure Management

1. Introduction

That sinking feeling when your terminal cursor blinks innocently after you’ve just executed:

  
rm -rf ./*

…only to realize a half-second too late that you forgot the critical directory path in your command. This visceral experience - memorialized in countless Reddit horror stories and sysadmin war stories - represents one of the most universal yet preventable disasters in infrastructure management.

In self-hosted environments and homelabs where systems administrators often operate without enterprise-grade safeguards, the consequences of a mistyped command can be catastrophic. One errant space character or misplaced wildcard can erase years of carefully curated configurations, media libraries, or application data. As DevOps professionals managing critical infrastructure, we must engineer resilience against these human errors.

This comprehensive guide examines:

The technical anatomy of destructive commands like rm -rf
System hardening techniques to prevent accidental deletions
Recovery strategies when disaster strikes
Architectural approaches to make infrastructure self-healing
Enterprise-grade safeguards you can implement in homelabs

We’ll turn that gut-wrenching “I forgot to put the…” moment from a career lowlight into a learning opportunity, using practical system administration techniques that professional DevOps teams rely on.

2. Understanding the Problem Space

2.1 The Anatomy of Disaster

At its core, the infamous rm -rf incident combines three critical elements:

Recursive Deletion (-r):
Operates on directories and their contents recursively
Force Flag (-f):
Suppresses confirmation prompts and ignores nonexistent files
Path Specification:
The make-or-break component where a single character can determine infrastructure fate

When combined with shell globbing patterns (*, .*, /*), these parameters become increasingly dangerous:

Command	Effect	Risk Level
`rm -rf /home/user/*`	Deletes every non-hidden entry in the directory	High
`rm -rf /home/user/.*`	Deletes hidden entries (dotfiles and dot-directories)	High
`rm -rf /home/user /*`	Stray space: deletes /home/user, then every top-level directory under / (!)	Critical
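
The shell expands globs before rm ever runs, so rm only sees the final argument list. Here is a minimal sketch of that behavior, using a throwaway directory from mktemp instead of real paths. The count_args helper and the sandbox layout are illustrative assumptions, not part of any tool:

```shell
# count_args is a hypothetical helper: it reports how many arguments
# a command would receive once the shell has finished expanding globs.
count_args() { echo "$#"; }

# Build a disposable tree that mimics / with a user directory inside it.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/user" "$sandbox/etc" "$sandbox/var"
touch "$sandbox/user/a" "$sandbox/user/b" "$sandbox/user/.hidden"

# Intended command: the glob matches only the non-hidden entries of user/.
intended=$(count_args "$sandbox"/user/*)

# Typo with a stray space: user/ itself plus every top-level entry of the sandbox.
typo=$(count_args "$sandbox"/user "$sandbox"/*)

echo "intended: $intended argument(s), typo: $typo argument(s)"

# Clean up the disposable tree.
rm -rf -- "$sandbox"
```

With default bash globbing this prints 2 arguments for the intended form and 4 for the typo. Note that --preserve-root would not help in the typo case, because rm receives /bin, /etc, and so on rather than / itself.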

2.2 Why Homelabs Are Vulnerable

Self-hosted environments present unique risk factors:

Lack of Enterprise Safeguards:
No enterprise backup systems, change controls, or approval workflows
Mixed Criticality Workloads:
Personal media shares coexist with critical services like home automation
Experimental Mindset:
Frequent system modifications increase “command fatigue”
Resource Constraints:
Limited hardware for implementing robust redundancy

2.3 Historical Context

The rm -rf disaster has deep roots in Unix history:

1971: First rm command appears in Unix Version 1
1979: Version 7 Unix documents rm with the recursive (-r), force (-f), and interactive (-i) flags
2006: Google employee accidentally deletes production index with rm -rf /*
2012: Linus Torvalds advocates for making rm -rf / require extra flags
2018: Kubernetes CVE-2018-1002105 allows privilege escalation through the API server's handling of proxied requests

3. Prerequisites for Safe Operations

3.1 System Requirements

Implement these foundational safeguards before any dangerous operations:

User Account Restrictions:

  
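# Create a separate admin user (assumes the name "admin" is not already taken)
sudo useradd -m -s /bin/bash admin
sudo usermod -aG sudo admin
# Add via visudo: "admin ALL=(ALL) NOPASSWD: ALL, !/usr/bin/rm"
# Caveat: this only blocks a direct "sudo rm"; "sudo bash" or "sudo find -delete" bypass it
sudo visudo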

Filesystem Selection:
Use copy-on-write filesystems with snapshots:
- ZFS (zfs snapshot tank/home@daily)
- Btrfs (btrfs subvolume snapshot /home /home/.snapshots/daily)

Mandatory Access Control:

  
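# AppArmor profile for /usr/bin/rm (a sketch; load it in complain mode first)
/usr/bin/rm {
  # Base abstraction lets rm load its shared libraries
  #include <abstractions/base>
  # File rules need an access mode; "w" covers unlinking
  deny /**/.* w,
  deny /* w,
  deny /bin/** w,
  deny /sbin/** w,
  deny /usr/** w,
  # Allow, but log, deletions under /home
  audit /home/** rw,
}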

3.2 Pre-Execution Checklist

Before executing any destructive commands:

Verify current directory with pwd && ls
Test command with echo rm -rf * first
Confirm backups are current (borg list /backup)
Open second terminal session as fail-safe
Set filesystem immutable flag (chattr +i critical_file)
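
The first two checklist items can be folded into a small wrapper. The sketch below is one possible shape, not an established tool: confirm_rm is a hypothetical function name, and it assumes you are willing to type "yes" before every recursive delete.

```shell
# confirm_rm is a hypothetical wrapper: it shows the working directory and the
# fully expanded targets, then deletes only after an explicit "yes".
confirm_rm() {
    if [ "$#" -eq 0 ]; then
        echo "confirm_rm: no targets given" >&2
        return 2
    fi
    echo "Working directory: $(pwd)"
    echo "About to delete $# item(s):"
    printf '  %s\n' "$@"
    printf "Type 'yes' to continue: "
    read -r answer
    if [ "$answer" = "yes" ]; then
        # "command" skips any shell function named rm; "--" ends option parsing.
        command rm -rf -- "$@"
    else
        echo "Aborted; nothing was deleted."
        return 1
    fi
}
```

Because the targets are printed after glob expansion, a stray space shows up as an unexpectedly long list before anything is removed.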

4. System Hardening Against Accidental Deletion

4.1 Aliasing Dangerous Commands

Modify shell configuration to prevent footguns:

  
# ~/.bashrc or ~/.zshrc
alias rm='rm -I --preserve-root'
alias chmod='chmod --preserve-root'
alias chown='chown --preserve-root'
alias mv='mv -i'
alias cp='cp -i'
alias ln='ln -i'
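
Aliases only take effect in interactive shells, so scripts need their own guard. A minimal sketch follows, assuming a cleanup script that empties a directory named by a variable. The clean_dir name and its short list of protected paths are illustrative assumptions, and the path check is deliberately simple (it does not normalize trailing slashes or symlinks):

```shell
# clean_dir is a hypothetical helper for scripts: it empties a directory
# without ever letting an empty variable turn into "rm -rf /*".
clean_dir() {
    # ${1:?} aborts with an error if the argument is unset or empty.
    local dir="${1:?clean_dir: directory argument is empty}"
    case "$dir" in
        /|/home|"$HOME")
            echo "clean_dir: refusing to empty protected path: $dir" >&2
            return 1
            ;;
    esac
    if [ ! -d "$dir" ]; then
        echo "clean_dir: not a directory: $dir" >&2
        return 1
    fi
    # -mindepth 1 removes the contents (dotfiles included) but keeps $dir itself.
    find "$dir" -mindepth 1 -delete
}
```

In a non-interactive script the `${1:?}` expansion stops the whole script when the variable is empty, which is usually the behavior you want before a destructive step.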

4.2 Implementing a Trash System

Replace rm with trash-cli for recoverable deletions:

  
# Install on Debian/Ubuntu
sudo apt install trash-cli

# Replace system rm (use at your own risk!)
echo 'alias rm="trash-put"' >> ~/.bashrc

# Restore deleted files
trash-list
trash-restore   # interactive: lists candidates and prompts for the number to restore
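
The idea behind a trash system is simply to move files instead of unlinking them. The sketch below illustrates that idea only; it is not how trash-cli is implemented and does not follow the FreeDesktop trash layout. TRASH_DIR and to_trash are hypothetical names chosen for this example.

```shell
# Hypothetical trash location; override TRASH_DIR to point somewhere else.
TRASH_DIR=${TRASH_DIR:-"$HOME/.local/share/simple-trash"}

# to_trash moves its arguments into a per-invocation trash directory
# and prints that directory so the items can be moved back by hand.
to_trash() {
    local stamp dest item
    stamp=$(date +%Y%m%d-%H%M%S)
    # The shell PID keeps two invocations in the same second apart.
    dest="$TRASH_DIR/$stamp-$$"
    mkdir -p "$dest" || return 1
    for item in "$@"; do
        mv -- "$item" "$dest/" || return 1
    done
    echo "$dest"
}
```

Two targets with the same base name would collide inside one trash directory, which is one of the cases a real tool such as trash-cli handles and this sketch does not.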

4.3 Kernel-Level Protections

Enable kernel-level file auditing with auditd:

  
# Watch writes and attribute changes (including deletions) in critical directories
sudo auditctl -w /home -p wa -k home_dir_changes
sudo auditctl -w /etc -p wa -k etc_changes

# Generate reports
sudo ausearch -k home_dir_changes -ts today

5. Recovery Strategies When Prevention Fails

5.1 Filesystem-Specific Recovery

Ext4:

  
# 1. Immediately unmount filesystem
sudo umount /dev/sda1

# 2. Use extundelete
sudo extundelete /dev/sda1 --restore-all --output-dir /recovery

ZFS:

  
# List available snapshots
zfs list -t snapshot

# Rollback to last known good state
zfs rollback tank/home@autosnap_2023-10-15_04:00:00_daily

5.2 Forensic Recovery Tools

TestDisk:
Recovers partition tables and boot sectors
sudo testdisk /dev/sdb
PhotoRec:
File carving for 300+ file types
sudo photorec /dev/sdb

Scalpel:
Customizable file carver

# /etc/scalpel/scalpel.conf
# Columns: extension, case-sensitive, max carve size (bytes), header, footer
jpg y 200000000 \xff\xd8\xff\xe0\x00\x10 \xff\xd9

5.3 Backup Restoration Workflows

Implement the 3-2-1 backup rule with verification:

  
# BorgBackup example
borg create --stats --progress /backup::'{hostname}-{now}' ~

# Verify backup integrity
borg check /backup

# Restore most recent backup (extracts into the current directory)
borg extract /backup::$(borg list --last 1 --short /backup)
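
Verification is only meaningful if a restored copy is compared against what was backed up. As a tool-agnostic sketch, a checksum manifest can be written at backup time and checked after a test restore. The helper names below are hypothetical, and the example assumes GNU sha256sum, sort, and xargs:

```shell
# write_manifest DIR MANIFEST: hash every regular file under DIR.
# Paths are stored relative to DIR so the manifest can be checked against
# a copy restored anywhere else. MANIFEST should be an absolute path outside DIR.
write_manifest() {
    local dir="$1" manifest="$2"
    (cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$manifest"
}

# verify_manifest DIR MANIFEST: succeed only if every listed file in DIR
# exists and matches its recorded hash. --quiet prints failures only.
verify_manifest() {
    local dir="$1" manifest="$2"
    (cd "$dir" && sha256sum --quiet -c "$manifest")
}
```

A monthly drill could restore into a scratch directory and run verify_manifest against it. The check catches missing or altered files, but not extra files that appear only in the restored copy.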

6. Architectural Safeguards

6.1 Immutable Infrastructure Patterns

Convert volatile systems to immutable artifacts:

Docker:

  
FROM alpine
RUN apk add --no-cache critical_service
VOLUME /config
CMD ["critical_service"]

Systemd:

  
# /etc/systemd/system/critical.service
[Service]
ProtectSystem=strict
ReadWritePaths=/var/lib/critical
InaccessiblePaths=/home

6.2 Declarative Configuration Management

Ansible playbook snippet for safe permissions:

  
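- name: Harden file permissions
  hosts: all
  tasks:
    - name: Set recursive ownership
      ansible.builtin.file:
        path: /etc
        state: directory   # recurse only applies when state is directory
        owner: root
        group: root
        # No blanket mode here: a recursive '0644' would strip the execute bit from directories
        recurse: true
      check_mode: true   # dry run: report what would change without applying it
      diff: true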

6.3 Infrastructure as Code Verification

Terratest validation pipeline:

  
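package test

import (
    "testing"

    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestStorageDeletion(t *testing.T) {
    terraformOptions := &terraform.Options{
        TerraformDir: "../examples/storage",
    }

    // Tear the test resources down once the assertions have run
    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)

    // Verify deletion protection
    output := terraform.Output(t, terraformOptions, "deletion_protection")
    assert.Equal(t, "true", output)
}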

7. Psychological Safeguards for Operators

7.1 Command Line Mindfulness

Implement pre-execution pauses:

  
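# Add 2-second delay to destructive commands (press Ctrl-C during the pause to abort)
dangerous_commands=("rm" "dd" "mkfs" "fdisk")
for cmd in "${dangerous_commands[@]}"; do
    # The "function name" form avoids a syntax error when $cmd is also an alias
    eval "function $cmd { sleep 2; command $cmd \"\$@\"; }"
done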

7.2 Shell Prompt Engineering

Make critical states visually obvious:

  
# ~/.bashrc
if [[ $EUID -eq 0 ]]; then
    PS1="\[\e[1;41m\][DANGER ROOT]\[\e[0m\] \w # "
else
    PS1="\[\e[1;33m\]\u@\h\[\e[0m\] \w \$ "
fi

8. Enterprise Patterns for Homelabs

8.1 Change Approval Workflows

Implement lightweight review processes using Git:

  
# Infrastructure change workflow
git checkout -b storage-change
# Make changes
git commit -m "Modify storage config"
git push origin storage-change
# Require pull request approval before applying
ansible-playbook --check site.yml

8.2 Canary Deployments

Stage changes before full rollout:

  
# Phase 1: Single node
ansible-playbook -l canary_node site.yml

# Monitor for 24 hours
journalctl -f -u critical_service

# Phase 2: Full rollout
ansible-playbook site.yml

9. When Disaster Strikes: Incident Response

9.1 Immediate Containment Protocol

Freeze State:

  
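# Stop further writes so deleted blocks are not overwritten (adjust the mount point;
# the remount can fail while files are still open for writing)
sudo mount -o remount,ro /home
# For remote systems: keep the existing SSH session open; stopping networking would lock you out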

Capture Forensic Evidence:

  
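# Memory capture (most modern kernels restrict /dev/mem, so this may fail or return partial data)
sudo dd if=/dev/mem of=/evidence/mem.dump bs=1M count=1024

# Disk image (write to a separate disk, never to the affected filesystem)
sudo dd if=/dev/sda of=/evidence/sda.img conv=noerror,sync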

Damage Assessment:

# Compare against known good state
diff -qr /current /backup/last_known_good
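
For a recursive delete, the most useful view is often just the list of files the backup still has and the live tree does not. A minimal sketch of that comparison is below; list_missing is a hypothetical helper, and the directory arguments stand in for whatever paths apply on your system:

```shell
# list_missing CURRENT BACKUP: print files (relative paths) that exist
# in BACKUP but are absent from CURRENT.
list_missing() {
    local current="$1" backup="$2" cur_list bak_list
    cur_list=$(mktemp) || return 1
    bak_list=$(mktemp) || return 1
    # Collect sorted relative file lists for both trees.
    (cd "$current" && find . -type f | sort) > "$cur_list"
    (cd "$backup" && find . -type f | sort) > "$bak_list"
    # comm -13 keeps only lines unique to the second list: paths only the backup has.
    comm -13 "$cur_list" "$bak_list"
    rm -f -- "$cur_list" "$bak_list"
}
```

The output can be fed straight into a restore step, which keeps the recovery limited to what was actually lost instead of overwriting files that changed legitimately since the backup.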

9.2 Post-Mortem Template

Incident Report: Accidental Deletion 2023-10-15

Timeline

14:32: Command executed via SSH session
14:33: Nagios alerts on service downtime
14:37: Restore initiated from offsite backup

Root Causes

Lack of rm alias hardening
No confirmation prompt for recursive deletes
Recent backup hadn’t completed due to disk full error

Corrective Actions

Implement safe-rm system-wide
Add backup monitoring to alert channel
Conduct monthly recovery drills
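
The "backup hadn’t completed" root cause is the kind of thing a small freshness check can surface before it matters. The sketch below assumes the backup target is a directory whose files are rewritten on every successful run; backup_is_fresh and the 24-hour threshold in the usage note are illustrative assumptions, and wiring it to an alert channel is left to your monitoring setup.

```shell
# backup_is_fresh DIR HOURS: succeed only if DIR contains at least one
# regular file modified within the last HOURS hours.
backup_is_fresh() {
    local backup_dir="$1" max_age_hours="$2" max_age_minutes recent
    max_age_minutes=$((max_age_hours * 60))
    # -mmin -N matches files modified less than N minutes ago; one hit is enough.
    recent=$(find "$backup_dir" -type f -mmin "-$max_age_minutes" 2>/dev/null | head -n 1)
    [ -n "$recent" ]
}
```

A cron job could run `backup_is_fresh /backup 24 || echo "backup is stale"` and forward the message to whatever channel already receives your alerts.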

10. Conclusion

The “I forgot to put the…” moment remains an ever-present risk in infrastructure management, but through deliberate system design and operational practices, we can transform these near-disasters into valuable resilience tests. By implementing the technical safeguards, architectural patterns, and psychological practices outlined in this guide:

Critical Systems Gain Immunity through immutable infrastructure patterns
Human Errors Become Contained with filesystem safeguards and aliases
Recovery Processes Become Routine via automated backup verification
Operators Develop Resilience through mindfulness practices

The true measure of DevOps maturity isn’t preventing every mistake - that’s impossible - but rather creating systems where a single mistyped command can’t cascade into catastrophic failure. As you rebuild your NAS or reconfigure your homelab, let these principles guide you toward infrastructure that survives both hardware failures and human fallibility.

Remember: The most powerful command in your terminal isn’t rm or dd - it’s sudo shutdown -h now when you recognize you’re about to make a catastrophic mistake. Sometimes walking away is the most professional recovery strategy of all.


This post is licensed under CC BY 4.0 by the author.