
AWS Is Down: Who's Laughing Right Now?

1. Introduction

When AWS US-EAST-1 stumbles, half the internet collapses. Docker builds fail, DynamoDB APIs vanish, and engineers scramble to explain why “IT IS ALWAYS DNS.” Yet in homelabs worldwide, $15/month self-hosted services hum along untouched. This is the reality of modern infrastructure centralization – and the growing rebellion against it.

For DevOps engineers and sysadmins, today’s outages are tomorrow’s resume-generating events. This guide dissects why decentralized infrastructure matters, how to build outage-resistant systems, and why your side-project VPS might outlive AWS Region-wide failures. We’ll explore:

  • The fragility of hyperscale dependency (as demonstrated by the June 2024 DynamoDB DNS outage)
  • Battle-tested self-hosting patterns for critical services
  • Multi-cloud mitigation strategies that don’t require enterprise budgets
  • DNS resilience techniques beyond “just use Route53”

By the end, you’ll transform from cloud consumer to infrastructure contrarian – the engineer laughing when status pages turn red.

2. Understanding Decentralized Infrastructure

What Is Self-Hosted Resilience?

Self-hosting critical services means maintaining operational control when:

  • Cloud provider DNS fails (AWS us-east-1, June 2024)
  • Container registries become unavailable (Docker Hub during AWS outages)
  • Region-specific APIs go dark

Historical Context: The 2016 Dyn DNS attack, the 2021 Fastly outage, and the 2024 AWS DNS failure all show that single-point dependencies risk internet-wide disruption.

Key Advantages of Decentralization

| Centralized Cloud | Self-Hosted Alternative |
|---------------------|---------------------------|
| Single DNS provider | Unbound + DNS-over-HTTPS |
| Managed DynamoDB | SQLite/PostgreSQL on NVMe |
| S3 blob storage | MinIO with WAL-G backups |
| ECR container registry | Harbor with Redis cache |

Real-World Example: The Redditor's $15/month Immich instance survived the AWS outage because (a quick audit sketch follows this list):

  1. No dependency on AWS DNS resolvers
  2. Local Docker image cache
  3. Stateless service design
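
To check the same three properties on your own box, here is a rough audit sketch (the container and image names are illustrative, not the Redditor's actual setup):

# 1. DNS: the host should point at a local resolver, not a cloud-provider endpoint
cat /etc/resolv.conf

# 2. Image cache: the Immich images should already be present locally
docker images --format '{{.Repository}}:{{.Tag}}' | grep -i immich

# 3. Statelessness: only named volumes should carry state
docker inspect --format '{{range .Mounts}}{{.Type}} {{.Destination}}{{"\n"}}{{end}}' immich-server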

When Decentralization Becomes Liability

Counterintuitively, self-hosting increases availability only when you:

  • Implement automated patching (unattended-upgrades; see the sketch below)
  • Run monitoring equivalent to CloudWatch (Prometheus + Grafana)
  • Maintain tested backups (Borgmatic + Rclone)
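
For the first bullet, a minimal sketch on Debian/Ubuntu (package names as shipped in the distro repos):

# Install and enable unattended security updates
sudo apt install -y unattended-upgrades
sudo dpkg-reconfigure -plow unattended-upgrades   # writes /etc/apt/apt.conf.d/20auto-upgrades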

3. Prerequisites

Hardware Requirements

| Service | Minimum Specs | Recommended Production |
|----------------------|------------------------|------------------------------|
| Container Host | 2 vCPU, 4GB RAM | Dedicated NVMe, 32GB ECC RAM |
| DNS Resolver | 1 vCPU, 512MB RAM | Anycast-enabled cluster |
| Object Storage | 1TB HDD | Ceph cluster with erasure coding |

Critical Dependencies:

# Base OS (Debian 12 example; assumes Docker's apt repository is already configured)
sudo apt install -y docker-ce=5:24.0.7-1~debian.12~bookworm \
  containerd.io=1.6.31-1 \
  docker-buildx-plugin=0.11.2-1~debian.12~bookworm

# Verify no AWS dependencies in the registry's DNS delegation
dig +trace docker.com @8.8.8.8 | grep 'awsdns'

Security Pre-Checks

  1. Network Isolation:

     ufw default deny incoming
     ufw allow from 192.168.1.0/24 to any port 443

  2. DNS Control (a quick sanity check follows):

     # /etc/unbound/unbound.conf
     forward-zone:
       name: "."
       forward-addr: 9.9.9.9@853#dns.quad9.net
       forward-ssl-upstream: yes
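
A quick sanity check for the resolver config above, assuming Unbound listens on 127.0.0.1:

unbound-checkconf /etc/unbound/unbound.conf
sudo systemctl restart unbound
dig @127.0.0.1 deb.debian.org +short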

4. Installation & Setup

Stateless Service Template (Immich Example)

# Pull images while upstream registries are healthy
docker pull ghcr.io/immich-app/immich-server:release

# Verify local cache
docker images | grep immich-server

# Persistent volumes only for critical data
docker volume create immich_pgdata
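
Optionally, keep an offline copy of the images so a registry outage can never block a redeploy. A sketch (the /srv/images path is arbitrary):

# Export the image to a tarball while the registry is reachable
docker save ghcr.io/immich-app/immich-server:release -o /srv/images/immich-server.tar
# Later, restore on any host without touching a registry
docker load -i /srv/images/immich-server.tar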

docker-compose.yml Resilience Tweaks:

services:
  immich-server:
    image: ghcr.io/immich-app/immich-server:release
    networks:
      - internal_isolated
    dns:
      - 192.168.1.53 # Your local resolver
    deploy:
      resources:
        limits:
          memory: 4G

networks:
  internal_isolated:
    internal: true # No accidental internet exposure

Validation Steps:

# Confirm no external DNS leaks
docker exec $CONTAINER_ID cat /etc/resolv.conf

# Test service isolation
docker run --rm --network container:$CONTAINER_ID nicolaka/netshoot \
  curl -sI https://aws.amazon.com | head -n1
# Should fail (timeout or resolution error) if properly isolated

5. Configuration & Optimization

DNS Armor Plating

Stubby Config (DNS-over-TLS):

# /etc/stubby/stubby.yml
resolution_type: GETDNS_RESOLUTION_STUB
dns_transport_list:
  - GETDNS_TRANSPORT_TLS
tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
tls_query_padding_blocksize: 128
round_robin_upstreams: 1 # Rotate across upstreams so one failure doesn't block resolution
upstream_recursive_servers:
  - address_data: 9.9.9.9
    tls_auth_name: "dns.quad9.net"
  - address_data: 1.1.1.1
    tls_auth_name: "cloudflare-dns.com"
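
To put Stubby in the resolution path, a minimal sketch assuming the stock listen address of 127.0.0.1 on port 53 (adjust if you changed listen_addresses):

sudo systemctl enable --now stubby
dig @127.0.0.1 example.com +short   # should be answered via the TLS upstreams above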

Caching Unbound Setup:

# /etc/unbound/unbound.conf
server:
    prefetch: yes
    prefetch-key: yes
    cache-min-ttl: 3600 # Survive upstream outages
    serve-expired: yes
    serve-expired-ttl: 86400
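
A hedged way to confirm the cache will actually absorb an upstream outage: resolve a name twice and check Unbound's hit counters (assumes unbound-control has been set up):

dig @127.0.0.1 registry-1.docker.io +short
dig @127.0.0.1 registry-1.docker.io +short   # second query should be served from cache
unbound-control stats_noreset | grep -E 'num.cachehits|num.prefetch'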

Container Registry Mirror

# Harbor ships as an offline installer (a docker compose bundle), not a single image
curl -LO https://github.com/goharbor/harbor/releases/download/v2.10.0/harbor-offline-installer-v2.10.0.tgz
tar xzf harbor-offline-installer-v2.10.0.tgz
cd harbor && cp harbor.yml.tmpl harbor.yml   # set hostname and TLS cert paths here
sudo ./install.sh

# Docker client config
echo '{"registry-mirrors": ["https://harbor.example.com"]}' | \
  sudo tee /etc/docker/daemon.json
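
Apply the mirror and confirm the daemon picked it up (the hostname matches the placeholder above):

sudo systemctl restart docker
docker info | grep -A1 'Registry Mirrors'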

6. Usage & Operations

Outage-Proof Daily Operations

Backup Strategy:

# PostgreSQL with WAL-G to MinIO (S3 alternative)
wal-g backup-push /var/lib/postgresql/data \
  --config /etc/wal-g/config.json

# Verify without AWS S3 API
MINIO_ALIAS=localbackup
mc ls $MINIO_ALIAS/postgres-backups
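
The mc commands assume the alias has been registered once; a sketch with a placeholder endpoint and credentials:

mc alias set localbackup https://minio.lan:9000 "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
mc ls localbackup/postgres-backups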

Automated Image Updates:

# Watchtower; with the registry mirror above, image pulls avoid a direct Docker Hub dependency
docker run -d --name watchtower \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e WATCHTOWER_POLL_INTERVAL=86400 \
  -e WATCHTOWER_NO_PULL=false \
  --restart unless-stopped \
  containrrr/watchtower \
  --label-enable --include-stopped
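
With --label-enable, Watchtower only touches containers that opt in via its label; for example:

# Opt a container into automated updates
docker run -d --name immich-server \
  --label com.centurylinklabs.watchtower.enable=true \
  ghcr.io/immich-app/immich-server:release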

7. Troubleshooting

“When DNS Is Down” Diagnostic Toolkit

Bypass Cloud Resolvers:

# Direct root server query
dig +norecurse @h.root-servers.net dynamodb.us-east-1.amazonaws.com

# Verify local cache hit
unbound-control dump_cache | grep dynamodb

Container Fallback Testing:

# Force offline mode for bridged containers (their traffic traverses DOCKER-USER, not OUTPUT)
sudo iptables -I DOCKER-USER -p tcp --dport 443 -j DROP

# Validate degraded functionality
docker exec $CONTAINER_ID curl https://aws.amazon.com -m 5
# Expected: "Connection timed out"

# Check service health endpoint
docker exec $CONTAINER_ID wget -qO- localhost:8080/health

# Restore connectivity afterwards
sudo iptables -D DOCKER-USER -p tcp --dport 443 -j DROP

8. Conclusion

The June 2024 AWS outage wasn’t an anomaly – it was a stress test. Engineers who designed systems expecting cloud failure maintained availability through:

  1. Decentralized DNS: Local resolvers with aggressive caching
  2. On-Premises Redundancy: Critical service mirrors (container registries, object storage)
  3. State Management: Knowing when SQLite outperforms DynamoDB

The cloud’s greatest irony? Its most resilient users treat it as expendable. Build accordingly.

This post is licensed under CC BY 4.0 by the author.