Don’t Know Everything, Quiet Quit, Be Mediocre: It’ll Save Your Sanity In The Long Run
Introduction
The sysadmin’s terminal flickers with yet another ticket: “NTP server shows correct time but wall clock is 10 minutes off.” You verify NTP sync, check firewall rules, confirm stratum sources - everything works. Yet the analog clock on the wall remains stubbornly wrong. The punchline? It’s not your clock. You didn’t install it. You didn’t configure it. But now it’s your problem.
This scenario from Reddit’s r/sysadmin perfectly captures the DevOps trap we’ve all faced: the compulsion to own every technical problem within line of sight, regardless of actual responsibility. This post isn’t about NTP troubleshooting - it’s about the psychological infrastructure we build (or fail to build) around our technical work.
In an era where Kubernetes clusters span continents and SaaS sprawl creates invisible dependencies, the old sysadmin mantra “know everything, control everything” has become a recipe for burnout. We’ll examine:
- The cultural shift from “hero sysadmin” to sustainable DevOps practice
- Technical boundary-setting using modern infrastructure patterns
- Documentation strategies that protect your sanity
- When and how to say “not my problem” professionally
You’ll learn concrete techniques to:
- Define operational responsibility boundaries in complex environments
- Create self-service documentation that deflects trivial requests
- Implement monitoring that automatically answers “is this my problem?”
- Preserve mental bandwidth for high-value engineering work
Understanding the Problem Space
The Death of the Omniscient Sysadmin
In legacy IT environments, system administrators were expected to be:
- Hardware technicians
- Network engineers
- Application specialists
- Security auditors
- Desktop support
- Procurement managers
This “full-stack human” model collapsed under cloud-native complexity. Consider these statistics from recent DevOps reports:
| Responsibility Area | % of Teams Reporting Ownership Fatigue |
|---|---|
| Cloud Infrastructure | 73% |
| CI/CD Pipelines | 68% |
| Developer Tooling | 61% |
| Legacy Systems | 89% |
| Third-Party SaaS | 57% |
The Reddit clock scenario exemplifies responsibility creep - when auxiliary systems become your problem through organizational osmosis.
Modern Responsibility Boundaries
Effective DevOps teams use technical guardrails to define operational boundaries:
Infrastructure Ownership Matrix
| System Component | Primary Owner | Escalation Path | Monitoring Responsibility |
|---|---|---|---|
| NTP Servers | Network Team | Tier 2 Support | Central Monitoring |
| Physical Clocks | Facilities | Vendor Support | None (Manual Checks) |
| VM Time Sync | DevOps | Cloud Team | Prometheus/Grafana |
| Application Timezone Logic | Dev Team | DevOps | App Performance Monitoring |
- Automated Responsibility Tagging

Modern infrastructure-as-code tools allow ownership tagging:

```hcl
# Terraform resource with ownership metadata
resource "aws_instance" "ntp_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Owner          = "Network Team"
    SupportContact = "network-support@example.com"
    Runbook        = "https://wiki.example.com/NTP-Troubleshooting"
  }
}
```
- Service Boundary Monitoring

Prometheus alert rules that differentiate responsibility:

```yaml
- alert: NTPStratumHigh
  expr: node_ntp_stratum > 5
  annotations:
    description: 'NTP stratum too high (). Contact network team.'
    playbook: 'https://wiki.example.com/NTP-Stratum-Alert'

- alert: ContainerTimeDrift
  expr: abs(time() - container_last_seen) > 60
  annotations:
    description: 'Container time drift detected. Check Docker host sync.'
    playbook: 'https://wiki.example.com/Container-Time-Sync'
```
The Mediocrity Principle
“Mediocrity” in this context means strategically limiting deep expertise to your actual responsibility domain. This isn’t about doing poor work - it’s about resisting the urge to:
- Reverse-engineer every black box system
- Maintain tribal knowledge of deprecated systems
- Accept responsibility for systems without authority
As the classic Google SRE book notes: “100% reliability is both impossible and economically nonviable.” Apply this to personal knowledge - 100% system mastery across all dependencies is impossible.
Prerequisites for Sanity Preservation
Technical Requirements
- Clear CMDB (Configuration Management Database)
  - ServiceNow, NetBox, or open-source alternatives like iTop
  - Must contain:
    - System ownership
    - Support contacts
    - Documentation links
- Unified Monitoring
  - Prometheus + Grafana stack
  - Proper alert routing (e.g., Alertmanager -> Slack/Teams channels by team)
- Documentation Portal
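A CMDB is only useful if its records are complete. The required fields above can be checked mechanically; here is a minimal sketch, assuming hypothetical field names rather than any real CMDB schema:

```python
# Completeness check for CMDB records (field names are illustrative).
REQUIRED_FIELDS = ("owner", "support_contact", "documentation_url")

def missing_fields(record: dict) -> list:
    """Return the required CMDB fields a record is missing or leaves empty."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

# Example: a record with a blank owner fails the check.
record = {
    "name": "ntp01",
    "owner": "",
    "support_contact": "network-support@example.com",
    "documentation_url": "https://wiki.example.com/NTP-Troubleshooting",
}
print(missing_fields(record))  # ['owner']
```

Running a check like this nightly surfaces ownerless systems before they surface as someone else’s ticket in your queue.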
Organizational Requirements
- Formal RACI Matrix
  - Responsible, Accountable, Consulted, Informed
  - Published for all critical systems
- Change Advisory Board (CAB) Process
  - Prevent shadow IT deployments
  - Mandatory ownership assignment for new systems
- Escalation Protocol
  - Defined SLAs for cross-team issues
  - Automatic ticket routing based on infrastructure tags
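Tag-based ticket routing can be as simple as a lookup from a resource’s `Owner` tag to a team queue. A sketch under stated assumptions - the routing table and queue names are hypothetical, and the tag keys mirror the Terraform examples in this post:

```python
# Sketch: route a ticket to the owning team's queue based on resource tags.
DEFAULT_QUEUE = "triage"  # fallback when no owner tag is present

ROUTING = {  # hypothetical team-to-queue mapping
    "Network Team": "network-queue",
    "Facilities": "facilities-queue",
    "DevOps": "devops-queue",
}

def route_ticket(tags: dict) -> str:
    """Pick a ticket queue from the resource's Owner tag, else fall back to triage."""
    return ROUTING.get(tags.get("Owner"), DEFAULT_QUEUE)

print(route_ticket({"Owner": "Network Team"}))  # network-queue
print(route_ticket({}))                         # triage
```

The fallback queue matters: tickets for untagged systems should land somewhere visible, not silently on whoever happened to notice them.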
Implementing Boundary Controls
Infrastructure as Code Ownership
Add ownership metadata to all IaC resources:
```hcl
# AWS Terraform module with support metadata
module "ntp_cluster" {
  source         = "terraform-aws-modules/ec2-instance/aws"
  instance_count = 3
  name           = "ntp-server"

  tags = {
    Service     = "NTP"
    Owner       = "network-team@example.com"
    Runbook     = "https://wiki.example.com/NTP-Maintenance"
    SupportTier = "2"
  }
}
```
Automated Documentation Generation
Use tools like Terraform-Docs to create self-maintaining ownership manifests:
```bash
# Generate documentation from IaC
terraform-docs markdown table --output-file OWNERSHIP.md .
```

Resulting `OWNERSHIP.md`:
| Resource | Type | Owner | Support Tier |
|---|---|---|---|
| ntp_cluster | aws_instance | network-team@example.com | 2 |
Alert Routing with Ownership Metadata
Configure Alertmanager to route based on tags:
```yaml
route:
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'slack-general'
  routes:
    - match:
        severity: critical
        Owner: network-team
      receiver: 'slack-network'
```
The Art of Professional Pushback
When receiving requests outside your domain:
- Verify ownership

```bash
# Query CMDB for system owner
curl "https://cmdb.example.com/api/systems/$(hostname)/owner"
```
- Provide an actionable handoff

  Bad response: “Not my problem.”

  Good response: “Our monitoring shows NTP sync working at the OS level. The physical clock appears to be out of sync. Per our CMDB, physical devices are managed by Facilities (ext. 555). I’ve cc’d their lead on this ticket.”
- Document the interaction

  Update the ticket with:
- CMDB ownership record
- Monitoring screenshots
- Escalation path followed
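If your team writes many of these handoffs, the good-response pattern above can be templated so every escalation carries the same information. A small sketch, with illustrative field names rather than a real CMDB schema:

```python
# Sketch: turn a CMDB ownership record into an actionable handoff note.
def handoff_note(system: str, finding: str, owner: dict) -> str:
    """Build a consistent escalation message from a finding and an ownership record."""
    return (
        f"Our monitoring shows {finding} on {system}. "
        f"Per the CMDB, this system is owned by {owner['team']} "
        f"({owner['contact']}). Escalating per {owner['runbook']}."
    )

note = handoff_note(
    "wall-clock-bldg2",
    "OS-level NTP sync healthy; physical clock out of sync",
    {
        "team": "Facilities",
        "contact": "ext. 555",
        "runbook": "https://wiki.example.com/System-Ownership",
    },
)
print(note)
```

A templated note forces you to gather the finding, the owner, and the escalation path before handing off - exactly the three items the ticket update above requires.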
Configuration Examples
NTP Boundary Monitoring
Prometheus rules that differentiate sync issues:
```yaml
groups:
  - name: time-sync
    rules:
      - alert: OSLevelTimeDrift
        expr: abs(node_timex_offset_seconds) > 0.5
        labels:
          severity: critical
          Owner: devops-team
        annotations:
          description: 'Host time drift detected - check NTP configuration'
      - alert: PhysicalClockDrift
        expr: last_over_time(physical_clock_offset_seconds[5m]) > 60
        labels:
          severity: warning
          Owner: facilities-team
        annotations:
          description: 'Building clock out of sync - contact facilities'
```
Automated Responsibility Checks
Bash script to verify system ownership before troubleshooting:
```bash
#!/bin/bash
SYSTEM=$1

# Query CMDB
OWNER=$(curl -s "https://cmdb.example.com/api/systems/$SYSTEM/owner")

if [[ "$OWNER" != "devops-team" ]]; then
  echo "System $SYSTEM is owned by $OWNER"
  echo "Escalating ticket to $OWNER..."
  echo "See documentation: https://wiki.example.com/System-Ownership"
  exit 1
fi

# Proceed with troubleshooting
ntpq -p
chronyc sources
```
Operational Workflows
Daily Boundary Maintenance
- CMDB Hygiene Check

```bash
# Report systems without clear ownership
curl "https://cmdb.example.com/api/systems?owner=null" | jq .
```
- Alert Ownership Audit

```sql
-- Query alerts handled by wrong team
SELECT * FROM alert_history
WHERE resolved_by NOT IN (
  SELECT team FROM system_owners
  WHERE system = alert_history.system_name
);
```
- Documentation Link Validation

```python
# Check for broken runbook links
import requests

for system in cmdb.systems:
    response = requests.get(system.runbook_url)  # runbook_url: hypothetical CMDB client attribute
    if response.status_code != 200:
        log_error(f"Broken link for {system.name}")
```
Troubleshooting Boundary Issues
Common Problems and Solutions
| Symptom | Likely Cause | Resolution |
|---|---|---|
| “Why is this my problem?” responses | Missing CMDB data | `UPDATE cmdb.systems SET owner='network-team' WHERE name='ntp01';` |
| Alerts routing to wrong team | Misconfigured Alertmanager | Add `match: {Owner: 'network-team'}` to route config |
| Recurring shadow IT issues | No CAB process | Implement Terraform Sentinel policies requiring owner tags |
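Where Sentinel isn’t available, the same owner-tag requirement can be enforced with a small CI gate over `terraform show -json` output. A sketch, with the plan structure abbreviated to the fields the check actually reads:

```python
# Sketch: reject planned Terraform resources that lack an Owner tag.
def untagged_resources(plan: dict) -> list:
    """Return addresses of planned resources whose tags have no Owner key."""
    bad = []
    resources = (
        plan.get("planned_values", {}).get("root_module", {}).get("resources", [])
    )
    for res in resources:
        tags = (res.get("values") or {}).get("tags") or {}
        if "Owner" not in tags:
            bad.append(res["address"])
    return bad

# Abbreviated example of `terraform show -json` plan output.
plan = {"planned_values": {"root_module": {"resources": [
    {"address": "aws_instance.ntp", "values": {"tags": {"Owner": "Network Team"}}},
    {"address": "aws_instance.rogue", "values": {"tags": {}}},
]}}}
print(untagged_resources(plan))  # ['aws_instance.rogue']
```

Failing the pipeline when this list is non-empty makes ownership assignment a precondition of deployment rather than an after-the-fact cleanup.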
Debugging Ownership Conflicts
- Check historical ownership:

```bash
# Git blame for IaC ownership tags
git blame terraform/main.tf | grep -i owner
```
- Audit access patterns:

```sql
-- Find who actually maintains the system
SELECT DISTINCT user FROM audit_logs
WHERE system='ntp-server'
AND action IN ('restart', 'config_change');
```
- Verify monitoring coverage:

```bash
# Check if system has associated alerts
curl "https://prometheus.example.com/api/v1/alerts" | jq '.data.alerts[] | select(.annotations.system == "ntp-server")'
```
Conclusion
The wall clock that shouldn’t be your problem is more than a meme - it’s a warning sign of unhealthy responsibility spread. By implementing technical ownership boundaries through:
- CMDB-enforced system attribution
- Metadata-driven alert routing
- Self-service documentation
- Professional escalation protocols
We move from the unsustainable “know everything” model to a sustainable DevOps practice where:
- Teams focus on their actual domains
- Cross-system issues have clear paths
- Engineers preserve cognitive bandwidth
Your next steps:
- Audit three systems you “sort of” support with no formal ownership
- Implement at least one metadata tag in your IaC
- Create a single runbook page with explicit escalation instructions
Remember: Mediocrity isn’t about quality - it’s about scope. The best DevOps engineers aren’t those who fix everything, but those who know exactly what not to fix.