Big-Bang Rewrites Almost Always Fail
Replacing an entire system at once feels decisive and clean. It is also the single most reliable way to turn a working operation into an expensive crater.

I build and replace software systems for a living, and the most dangerous words a client can say to me are "let's just rebuild the whole thing at once." The big-bang rewrite (scrap the old system, build its replacement in parallel, flip a switch on launch day) is the most seductive plan in technology and the one I fight hardest against, precisely because it looks so decisive.
I have watched teams pour a year and a fortune into a from-scratch replacement, only to discover on launch day that the old system did a hundred quiet things nobody documented — and now none of them work. The rewrite was supposed to be the upgrade. It became the outage.
Why the conventional wisdom is wrong
The conventional wisdom says a messy old system is best replaced cleanly: freeze it, build its successor properly this time, and cut over once the new one is ready. It sounds responsible. The hidden flaw is that the old system is not just code — it is years of accumulated edge cases, fixes, and undocumented behavior the business now silently depends on. A rewrite throws all of that away and bets it can be re-derived from memory. It almost never can.
The old system's quirks are load-bearing — customers, integrations, and reports quietly rely on behavior nobody ever wrote down.
The rewrite has to hit a moving target, because the old system keeps changing to serve the business while you rebuild it.
No value is delivered until launch day, so a year of work produces zero real feedback until it is far too late to course-correct.
The cutover is all-or-nothing, so every risk in the entire project converges on a single terrifying moment.
What is actually true
Big systems are replaced safely by strangling them, not by detonating them; Martin Fowler calls this the Strangler Fig pattern. You wrap the old system, route one slice of functionality at a time through the new one, and shrink the old system gradually until nothing is left. Each step is small, reversible, and delivers value immediately. Risk is spread across dozens of tiny cutovers instead of concentrated in one. By the time the old system is gone, the new one has already been proven in production, piece by piece.
This is slower to start and far less satisfying than a clean-slate rewrite. There is no triumphant launch day, no moment where the old thing dies and the new thing is born — just a steady, boring migration where the lights never go out. That boredom is the point. The dramatic launch is where projects die; the incremental migration is where they survive.
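To make the strangling mechanics concrete, here is a minimal Python sketch of the routing layer, assuming a single facade that every request passes through. `StranglerRouter`, `legacy_system`, `new_system`, and the feature names are hypothetical illustrations, not any particular product's API. In real deployments this role is usually played by a reverse proxy or API gateway rather than in-process code.

```python
from typing import Callable, Set

# Hypothetical handler type: takes a feature name and a payload, returns a response.
Handler = Callable[[str, dict], str]


class StranglerRouter:
    """Hypothetical facade sitting in front of both the old and new systems.

    Every request enters here. Features (slices) that have been migrated go
    to the new system; everything else still goes to the legacy system.
    """

    def __init__(self, legacy: Handler, new: Handler) -> None:
        self.legacy = legacy
        self.new = new
        self.migrated: Set[str] = set()  # features currently served by the new system

    def migrate(self, feature: str) -> None:
        # One small cutover: route a single slice to the new system.
        self.migrated.add(feature)

    def rollback(self, feature: str) -> None:
        # Reversible: send the slice back to the legacy system if something breaks.
        self.migrated.discard(feature)

    def handle(self, feature: str, payload: dict) -> str:
        # Pick the target per request, based only on the current migration state.
        target = self.new if feature in self.migrated else self.legacy
        return target(feature, payload)


def legacy_system(feature: str, payload: dict) -> str:
    # Stand-in for the old system (hypothetical).
    return f"legacy:{feature}"


def new_system(feature: str, payload: dict) -> str:
    # Stand-in for the replacement system (hypothetical).
    return f"new:{feature}"


if __name__ == "__main__":
    router = StranglerRouter(legacy_system, new_system)
    print(router.handle("invoices", {}))  # legacy:invoices
    router.migrate("invoices")
    print(router.handle("invoices", {}))  # new:invoices
    print(router.handle("reports", {}))   # legacy:reports (not migrated yet)
    router.rollback("invoices")
    print(router.handle("invoices", {}))  # legacy:invoices
```

The design choice that matters is that migration state lives in one place: each `migrate` call is one of the small cutovers, `rollback` undoes it by removing a single entry, and the old system keeps serving every slice that has not moved yet. When the `migrated` set covers every feature, the legacy handler receives no traffic and can be retired.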
When a rewrite is most tempting
The urge to rewrite is strongest at exactly the moments you should resist it hardest.
The old system is "a mess," so a clean rebuild feels morally satisfying — but messy and working still beats clean and unproven.
A new team inherited the code, does not understand it, and would rather rebuild than learn it — which guarantees they will relearn its hidden lessons the hard way.
A new technology is exciting, so the rewrite is really a pretext to use it — a terrible reason to bet the operation.
Estimating an incremental migration is hard, while "just rewrite it" sounds simple — so the safer, harder plan loses the meeting to the easy, dangerous one.
What we see at TTGC
When clients come to us wanting to scrap a system and rebuild it from zero, our first job is usually to talk them out of the big bang. We have seen the craters — the year-long rewrites that launched broken, the cutovers that took the business down, the rebuilds abandoned half-finished after the budget ran out, leaving two half-working systems instead of one whole one. So we migrate incrementally almost without exception: wrap the old system, replace it slice by slice, and keep it running the entire time. Clients sometimes find it anticlimactic — they wanted a bold new platform, and we gave them a quiet series of small swaps. But their operation never went dark, and that is the whole job. The rewrites that make headlines are the ones that fail. The migrations that work are the ones nobody noticed.
The honest take
If you are about to replace a system, resist the clean-slate rewrite no matter how satisfying it feels. The big bang concentrates every risk into one launch day and delivers nothing until then, which is why it so reliably fails. Strangle the old system instead — replace it one small, reversible slice at a time, keep it running throughout, and let the new system earn its place in production gradually. It is slower, less heroic, and far less likely to end in a crater. In system replacement, boring is not a compromise. It is the strategy.
Sources
Martin Fowler — the Strangler Fig pattern for incrementally replacing legacy systems. martinfowler.com
Standish Group CHAOS Report — research on software project failure rates and the risk of large, all-at-once efforts.
TTGC — patterns across client systems and migration work.


