Your On-Call Rotation Is Writing Requirements. Nobody Is Reading Them.
There's a specific moment most on-call engineers recognize. You're in a post-mortem, walking through a memory exhaustion failure that took longer to diagnose than it should have. Someone mentions they've seen this pattern before. Someone else says they filed a ticket after the last one. You go looking for the ticket. It's closed — marked resolved after the immediate fix, never connected to anything in the roadmap. The underlying failure mode is still there. You're just meeting it again. That moment is worth examining, because it reveals something about how reliability work actually moves — or doesn't — through most technology organizations. The on-call engineers usually have detailed, accurate knowledge of where the system is fragile. They know which failure modes the runbooks don't fully cover, which write patterns cause CPU plateaus the monitoring doesn't catch, which edge cases will eventually surface as customer escalations. That knowledge gets expre...