If You're Writing a Post-Mortem, It's Too Late

The best time to prevent a production disaster was six months ago. The second best time is now.

Aug 03, 2025

We've all been there. It's 3 AM, your service is down, customers are angry, and executives are asking hard questions. Hours later, after the fire is finally out, someone inevitably says: "We need to write a post-mortem."

But here's the uncomfortable truth: if you're writing a post-mortem, you've already failed. Not at incident response—you might have handled that brilliantly—but at the more fundamental job of engineering leadership: anticipating failure before it happens.

Post-mortems are archaeology. Pre-mortems are architecture.

What Is a Pre-Mortem?

A pre-mortem is exactly what it sounds like: conducting a failure analysis before the failure occurs. It's a structured exercise where teams imagine their project, system, or initiative has failed catastrophically, then work backwards to identify what could have caused that failure.

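The technique was developed by psychologist Gary Klein, who drew on earlier research into "prospective hindsight." A 1989 study by Deborah Mitchell, Jay Russo, and Nancy Pennington found that imagining an event has already occurred increases people's ability to correctly identify reasons for future outcomes by 30%. In engineering, where the cost of failure can be measured in millions of dollars and sleepless nights, that kind of improvement isn't just valuable; it's essential.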

Pre-Mortem vs. Risk Assessment: What's the Difference?

Traditional risk assessments ask: "What could go wrong?" Pre-mortems ask: "We failed spectacularly. What happened?"

The difference is profound. Risk assessments often generate sanitized lists of theoretical problems. Pre-mortems tap into your team's deepest fears and gut instincts about what will actually break in production.

The Anatomy of Engineering Pre-Mortems

1. The Failure Scenario Exercise

Start every major project with this question: "It's six months from now. Our system is completely down, customers are furious, and we're trending on Twitter for all the wrong reasons. What happened?"

Give your team 15 minutes to brainstorm failure scenarios without judgment. You'll be amazed at what emerges:

"Our database connection pool gets exhausted during the Super Bowl"
"That third-party API we depend on gets acquired and shut down"
"We discover our authentication system has been broken for weeks"

2. The Organizational Failure Audit

Technical failures rarely happen in isolation. Ask:

"What organizational dynamics contributed to this failure?"
"What process broke down?"
"What did we not communicate?"

Often, the real failure modes are human: unclear ownership, poor communication, or conflicting priorities.

3. The Success Bias Challenge

Teams naturally focus on how things should work. Pre-mortems force you to imagine how they will break. Challenge every assumption:

"This service never goes down" → "What happens when it does?"
"Our users understand the interface" → "What if they don't?"
"The network is reliable" → "What about that fiber cut last year?"

Pre-Mortem Templates That Work

The Technical Pre-Mortem Template

Project: [Name]
Imagined Failure Date: [6 months from now]
Failure Description: [One sentence describing catastrophic failure]

Technical Failure Modes:

Data layer failures
Service dependencies
Performance bottlenecks
Security vulnerabilities
Operational complexity

Organizational Failure Modes:

Communication breakdowns
Process gaps
Resource constraints
Knowledge silos

Early Warning Signals:

What metrics would indicate we're heading toward failure?
What behaviors should concern us?

Prevention Strategies:

What can we build/change now to prevent this?
What monitoring/alerting do we need?
What processes should we establish?
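If you keep the template as a structured record, for example in the project repository next to the design doc, it becomes easy to see which failure categories nobody has filled in yet. Here is a minimal Python sketch. The `PreMortem` class, the `_categories` helper, and the `empty_categories` method are hypothetical names and are not part of any existing tool:

```python
from dataclasses import dataclass, field


def _categories(*names: str) -> dict[str, list[str]]:
    """Build an empty bucket of failure modes for each template category."""
    return {name: [] for name in names}


@dataclass
class PreMortem:
    """Hypothetical structured form of the technical pre-mortem template."""

    project: str
    imagined_failure_date: str
    failure_description: str
    technical_failure_modes: dict[str, list[str]] = field(default_factory=lambda: _categories(
        "Data layer failures", "Service dependencies", "Performance bottlenecks",
        "Security vulnerabilities", "Operational complexity"))
    organizational_failure_modes: dict[str, list[str]] = field(default_factory=lambda: _categories(
        "Communication breakdowns", "Process gaps", "Resource constraints", "Knowledge silos"))
    early_warning_signals: list[str] = field(default_factory=list)
    prevention_strategies: list[str] = field(default_factory=list)

    def empty_categories(self) -> list[str]:
        """Categories with no failure modes recorded yet: prompts for more discussion."""
        modes = {**self.technical_failure_modes, **self.organizational_failure_modes}
        return [name for name, items in modes.items() if not items]


pm = PreMortem(
    project="Checkout migration",
    imagined_failure_date="six months from kickoff",
    failure_description="Checkout cannot handle holiday peak load.",
)
pm.technical_failure_modes["Data layer failures"].append(
    "Connection pool exhausted during peak traffic")
print(pm.empty_categories())
```

The printed list contains the eight categories that are still empty. That makes a reasonable agenda for the next round of discussion.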

The Launch Pre-Mortem

Before any major launch, run this specific exercise:

"Our launch was a disaster. Traffic spiked, systems crashed, and we had to roll back. The CEO is asking why we didn't see this coming. What happened?"

Common discoveries:

Load testing didn't account for real user behavior
Database migrations took longer than expected
Monitoring gaps left blind spots
Rollback procedures weren't tested

Case Study: The Migration That Didn't Fail

A team at a major e-commerce company was migrating their checkout system to a new architecture. Instead of jumping straight into planning, they ran a pre-mortem.

Imagined Failure: "Black Friday arrives, and our new checkout system can't handle the load. We lose $10M in revenue."

What They Discovered:

Their load testing was based on average traffic, not peak holiday patterns
The new system had different memory characteristics that could cause issues
Their rollback plan required manual steps that would take hours
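Simple arithmetic reproduces the first discovery: a load test sized from average traffic can hide the peak entirely. Here is a minimal sketch. The hourly request counts are made up for illustration and are not data from this case study, and the 1.5x headroom multiplier is an assumption:

```python
from statistics import mean


def load_test_targets(hourly_requests: list[int], headroom: float = 1.5) -> dict[str, float]:
    """Compare an average-based load target with a peak-based one.

    `headroom` is an assumed safety multiplier applied on top of the observed peak.
    """
    average = mean(hourly_requests)
    peak = max(hourly_requests)
    return {
        "average_rps": average / 3600,
        "peak_rps": peak / 3600,
        "target_rps": peak * headroom / 3600,
        "peak_to_average": peak / average,
    }


# Hypothetical hourly request counts for one holiday day (24 hours).
holiday = [40_000] * 18 + [90_000, 150_000, 360_000, 300_000, 120_000, 60_000]
targets = load_test_targets(holiday)
for name, value in targets.items():
    print(f"{name}: {value:,.1f}")
```

With these made-up numbers, a test sized to the daily average (about 21 requests per second) would exercise less than a quarter of the 100 requests per second seen at peak. Hourly buckets also smooth over minute-level spikes, so finer-grained history gives a more honest peak.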

Actions Taken:

Rewrote load tests using actual Black Friday traffic patterns from previous years
Built automated rollback procedures
Created a "circuit breaker" that could instantly route traffic back to the old system
Established dedicated war room protocols with clear decision-making authority

Result: The migration succeeded flawlessly. But more importantly, it succeeded because they imagined it failing.
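The case study doesn't say how its "circuit breaker" was built. One plausible shape is a router that sends requests to the new checkout path until failures pile up, then falls back to the old system, with a manual switch for the war room. The sketch below makes several assumptions: it trips after a run of consecutive failures, it waits through a fixed cool-down before probing again, and it is single-threaded. `MigrationBreaker` and its parameters are hypothetical names. A production version would also need thread safety and metrics.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class MigrationBreaker:
    """Send traffic to the new system, falling back to the old one on trouble.

    Single-threaded sketch: trips open after `failure_threshold` consecutive
    failures, stays open for `cooldown_s` seconds, then lets a probe through.
    """

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None
        self.force_legacy = False  # manual kill switch: route everything to the old system

    def _is_open(self) -> bool:
        if self.force_legacy:
            return True
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, allow a probe request to the new system.
        return self.clock() - self.opened_at < self.cooldown_s

    def call(self, new: Callable[[], T], legacy: Callable[[], T]) -> T:
        if self._is_open():
            return legacy()
        try:
            result = new()
        except Exception:  # any failure in the new path counts against it
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed probe
            return legacy()
        self.consecutive_failures = 0
        self.opened_at = None
        return result
```

Setting `force_legacy = True` is the instant route back that a war room would use. The automatic trip covers the case where nobody is watching the dashboards.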

The Pre-Mortem Meeting Structure

Phase 1: Set the Scene (5 minutes)

Define the project/system scope
Establish the imagined failure scenario
Set ground rules: no idea is too paranoid

Phase 2: Individual Brainstorming (10 minutes)

Each person writes down failure modes privately
No discussion or filtering yet
Encourage wild, specific scenarios

Phase 3: Collective Brain Dump (15 minutes)

Round-robin sharing of ideas
Record everything without judgment
Build on others' ideas

Phase 4: Categorization (10 minutes)

Group failures by type (technical, organizational, external)
Identify patterns and dependencies

Phase 5: Prioritization (10 minutes)

Rate by likelihood and impact
Focus on high-probability, high-impact scenarios
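One common way to do the Phase 5 rating is to score each axis on a small scale and multiply the two scores. Here is a minimal sketch. The 1-5 scales, the cut-off of 12, and the example risks (borrowed loosely from the brainstorm and case study above) are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    description: str
    likelihood: int  # 1 (rare) to 5 (almost certain): assumed scale
    impact: int      # 1 (minor) to 5 (catastrophic): assumed scale

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def top_risks(risks: list[Risk], threshold: int = 12) -> list[Risk]:
    """Keep risks scoring at or above the threshold, highest score first."""
    return sorted((r for r in risks if r.score >= threshold),
                  key=lambda r: r.score, reverse=True)


risks = [
    Risk("Connection pool exhausted during peak traffic", likelihood=4, impact=5),
    Risk("Third-party API shut down after acquisition", likelihood=1, impact=5),
    Risk("Rollback requires hours of manual steps", likelihood=3, impact=4),
    Risk("Dashboard typo confuses on-call", likelihood=4, impact=1),
]
for r in top_risks(risks):
    print(r.score, r.description)
```

Multiplying the ratings has a trade-off. The acquired-API risk has the highest possible impact but scores only 5, so it drops below the cut-off. Teams may want to review maximum-impact items separately, even when those items look unlikely.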

Phase 6: Prevention Planning (20 minutes)

For top risks, define specific prevention measures
Assign owners and timelines
Identify early warning signals
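An early warning signal is most useful when you write it down as a concrete threshold that fires well before the failure itself. Here is a minimal sketch based on the connection-pool exhaustion scenario from the brainstorm. The `pool_warning_level` function and the 70% and 90% thresholds are illustrative assumptions, not recommendations:

```python
from enum import Enum


class Level(Enum):
    OK = "ok"
    WARN = "warn"
    CRITICAL = "critical"


def pool_warning_level(in_use: int, pool_size: int,
                       warn_at: float = 0.70, critical_at: float = 0.90) -> Level:
    """Classify connection-pool utilization against thresholds agreed in the pre-mortem."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    utilization = in_use / pool_size
    if utilization >= critical_at:
        return Level.CRITICAL
    if utilization >= warn_at:
        return Level.WARN
    return Level.OK


for in_use in (20, 75, 95):
    print(in_use, pool_warning_level(in_use, pool_size=100).value)
```

In practice, these thresholds would live in whatever monitoring system the team already uses. The point is that the pre-mortem produces a number someone can alert on, not a vague worry.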

Making Pre-Mortems Stick

1. Leadership Buy-In

Pre-mortems only work if leadership genuinely wants to hear bad news. If your culture punishes pessimism, pre-mortems become theater.

2. Follow-Through Discipline

The document isn't the deliverable—the prevention measures are. Track and review your pre-mortem action items like any other project requirement.

3. Psychological Safety

Teams need to feel safe expressing their deepest concerns about a project. If people are afraid to voice doubts, you'll miss the most important failure modes.

4. Regular Revisiting

Pre-mortems aren't one-time exercises. As projects evolve, new failure modes emerge. Schedule regular "failure assumption reviews."

The ROI of Paranoia

Pre-mortems feel like overhead until they prevent your first major incident. Consider the math:

Average cost of critical system downtime: $300,000/hour
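Time to run a thorough pre-mortem: 4 hours. Assuming, for illustration, eight engineers at a loaded cost of $150/hour, that is $4,800 of team time
If a pre-mortem prevents just one hour-long major incident per year: ROI of ($300,000 − $4,800) / $4,800 ≈ 6,150%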
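The ROI depends entirely on what those four hours cost in people's time, so it's worth making that explicit. Here is a small calculator. The attendee count, the loaded hourly rate, and the length of the prevented outage are illustrative assumptions, not figures from any study:

```python
def premortem_roi(downtime_cost_per_hour: float,
                  outage_hours_prevented: float,
                  meeting_hours: float,
                  attendees: int,
                  loaded_rate_per_hour: float) -> float:
    """Return ROI as a percentage: (benefit - cost) / cost * 100."""
    cost = meeting_hours * attendees * loaded_rate_per_hour
    benefit = downtime_cost_per_hour * outage_hours_prevented
    return (benefit - cost) / cost * 100


# Illustrative assumptions: 8 engineers at $150/hour, one 1-hour outage avoided.
roi = premortem_roi(300_000, 1, 4, 8, 150)
print(f"ROI: {roi:,.0f}%")
```

With eight engineers at $150/hour and one avoided hour of downtime, this prints `ROI: 6,150%`. Calling `premortem_roi(300_000, 1, 4, 1, 100)`, a single attendee at $100/hour, gives 74,900%. The gap shows how sensitive the headline number is to who is in the room.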

But the real value isn't just incident prevention—it's better system design. Teams that regularly practice pre-mortems build more resilient systems from the ground up.

Common Pre-Mortem Pitfalls

The Optimism Trap

"That could never happen to us." This is exactly the mindset pre-mortems are designed to overcome. If you're not uncomfortable with some of your failure scenarios, you're not thinking creatively enough.

Analysis Paralysis

Don't let pre-mortems become endless catastrophizing sessions. Set time limits and focus on actionable risks.

The Prevention Theater

Going through the motions without actually implementing prevention measures. Pre-mortems without follow-through are worse than no pre-mortems at all.

The Single-Point-of-Failure Fallacy

Most real failures are cascading. Don't just identify individual failure modes—explore how they compound.
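One way to explore how failures compound is to write down the service dependency graph and ask what else goes down when a single node fails. Here is a minimal breadth-first sketch. The service names and edges are hypothetical. The sketch also assumes every dependency is hard, whereas real systems often degrade partially instead of failing outright.

```python
from collections import deque


def blast_radius(depends_on: dict[str, list[str]], failed: str) -> set[str]:
    """Return every service that transitively depends on `failed`.

    Assumes hard dependencies: if anything a service depends on is down, it is down too.
    """
    # Invert the graph so we can ask "who depends on this service?"
    dependents: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    down: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in down:
                down.add(service)
                queue.append(service)
    return down


graph = {
    "checkout": ["payments", "inventory", "auth"],
    "payments": ["postgres", "fraud-api"],
    "inventory": ["postgres"],
    "auth": ["redis"],
    "search": ["inventory"],
}
print(sorted(blast_radius(graph, "postgres")))
```

In this toy graph, a single database outage takes down four services, including search, which sits one hop away from the database and never talks to it directly.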

Building a Pre-Mortem Culture

Start small. Pick one upcoming project and run a 30-minute pre-mortem. When team members see how many blind spots you uncover, they'll become believers.

Make it routine. Add pre-mortems to your project kickoff checklist, right alongside requirements gathering and technical design.

Celebrate prevented failures. When your monitoring catches an issue that your pre-mortem identified, make sure the team knows their paranoia paid off.

The Meta-Pre-Mortem Question

Here's the question that should keep every engineering leader awake at night: "What failure are we not even considering?"

The most dangerous failures are the ones outside your mental model entirely. The database technology that seemed rock-solid but had a rare edge case. The vendor that seemed too big to fail. The assumption that seemed too obvious to question.

This is why diverse teams matter for pre-mortems. Different backgrounds, experiences, and paranoia patterns help you see blind spots.

Conclusion: The Courage to Imagine Failure

Post-mortems make us feel responsible and thorough. But they're ultimately about documenting failure, not preventing it. Pre-mortems require something harder: the intellectual humility to assume we're wrong and the courage to imagine our creations failing.

The best engineering teams I've worked with share one trait: they're professionally paranoid. They don't just build systems—they build systems that are designed to fail gracefully when the unthinkable happens.

Because in production, the unthinkable happens every Tuesday.

Your next project will face challenges you haven't considered. Your systems will fail in ways you didn't anticipate. The question isn't whether this will happen—it's whether you'll see it coming.

So before you write your next post-mortem, ask yourself: what pre-mortem should we have written instead?

Runtime Decisions
