When to rewrite vs refactor a legacy system
Quick answer: refactor when the legacy system’s architecture is broadly sound but the implementation has decayed, the test coverage is reasonable, and the business needs continuity. Rewrite when the architecture is structurally wrong for current requirements, the technical debt is so deep that incremental improvement is slower than starting over, or the underlying technology is genuinely obsolete. The default should be refactor — rewrites usually cost more and ship later than expected.
The full-rewrite temptation is one of the most expensive instincts in software engineering. The legacy system is painful to work with. The codebase is messy. The tests are sparse. The team wants to start fresh. Surely a clean rewrite would be faster than slogging through the existing code.
It almost never is.
Joel Spolsky famously called this “the single worst strategic mistake that any software company can make.” That was 2000. Twenty-six years later, the same pattern keeps repeating. Here’s why, and the cases where rewrites actually do work.
Why most rewrites fail
The pattern that catches teams:
1. The legacy system encodes more business logic than the team realises. A function that looks like five lines of dead code turns out to handle a regulatory requirement from three years ago that nobody remembers but customers depend on. The legacy system is full of these: quirks, edge cases, special handling for specific scenarios, added over the years in response to real problems.
The rewrite team starts from a fresh requirements document that captures the general shape of the system. The specific quirks aren’t in the document because nobody knows them. Six months in, the rewrite is being patched with discoveries: “oh, also we need to handle this 14-character invoice format because of a 2018 partnership”, “oh, also the system has to send a specific email to compliance every Tuesday”, and on and on.
2. The rewrite ships later than estimated. Rewrites consistently take longer to build than the original system did, even when the team is more experienced. Reasons: the original system had years of small fixes layered in; the team is rebuilding to a higher quality bar; the testing burden is substantial; and integration with surrounding systems often requires running old and new versions in parallel.
A two-year-estimated rewrite typically takes three. A six-month-estimated rewrite typically takes a year. The maths matters because:
3. The legacy system can’t be paused. While the rewrite happens, the legacy system continues to be fixed and extended. New features are added (because the business doesn’t stop). Bugs are patched. Two years in, the rewrite has to catch up to a moving target rather than the target it started chasing.
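A back-of-the-envelope model makes the moving-target problem concrete. The sketch below is illustrative only: the scope, velocity, and growth numbers are assumptions, not measurements, but the shape of the result holds whenever the legacy system keeps growing.

```ts
// Illustrative pursuit model: a rewrite chasing a moving target.
// All numbers are assumptions for the example, not measured data.
const existingScope = 24;    // legacy scope, in team-months of work to rebuild
const rewriteVelocity = 1.5; // scope the rewrite team rebuilds per month
const legacyGrowth = 0.5;    // new scope added to the legacy system per month

// The rewrite reaches parity when rebuilt scope catches total scope:
//   rewriteVelocity * t = existingScope + legacyGrowth * t
//   t = existingScope / (rewriteVelocity - legacyGrowth)
const monthsToParity = existingScope / (rewriteVelocity - legacyGrowth);

console.log(monthsToParity); // 24 months, vs 16 if the target stood still
```

Note what happens as legacyGrowth approaches rewriteVelocity: the denominator shrinks and the catch-up time diverges. That divergence is the abandoned-rewrite scenario in the next point.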
4. The rewrite team gets demoralised. Eighteen months into a multi-year rewrite, the team is exhausted, the original architects have left, the requirements have shifted, and the executives who approved the rewrite are wondering when it’ll ship. A meaningful percentage of rewrites are abandoned at this point, leaving the legacy system in place and the investment written off as sunk cost.
5. The replacement system has its own debt by the time it ships. A four-year rewrite ships in 2026 against requirements set in 2022. By 2026, those requirements are wrong. The new system needs to be modernised before it’s even deployed. The cycle continues.
When rewrites actually do work
Some scenarios where a rewrite genuinely is the right answer:
1. The underlying technology is structurally obsolete. A system in a language with no security updates, on a database that’s end-of-life, with no path to incremental migration. The cost of staying is the cost of running an unsupported stack indefinitely.
2. The architecture is structurally wrong for current requirements. A monolithic system that needs to handle 100× the scale it was designed for and can’t. A single-tenant system that needs to be multi-tenant. An on-premise system that needs to run in the cloud. Sometimes the architecture is so fundamentally misaligned with current needs that incremental migration genuinely doesn’t close the gap.
3. The codebase is so unmaintainable that any change is high-risk. Test coverage near zero, no documentation, original developers all gone, every change introduces three new bugs. The cost of any feature work is so high that the team can’t deliver anything; the rewrite is the cheaper path to any progress.
4. The system is small enough to rebuild quickly. A small system (a few thousand lines of code, well-bounded scope) can sometimes be rewritten in 2–6 weeks. At that scale, the rewrite’s risks are manageable and the benefit is real.
5. The business model has changed fundamentally. The legacy system was built for one business; the business now does something different. The legacy is solving the wrong problem.
In each of these cases, the cost-benefit genuinely favours rewriting. The mistake is treating these as the typical case rather than the exception.
What refactoring actually looks like
The alternative isn’t “leave the legacy system alone.” It’s a structured set of incremental changes that move the system toward where it needs to be:
1. Add tests before changing anything. The first investment in any legacy system is a meaningful test suite. Not exhaustive coverage; enough to detect regressions when changes are made. This is slow and unglamorous and it’s the foundation everything else builds on.
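A minimal sketch of what that first investment can look like, using Node’s built-in test runner: characterisation tests that record what the system does today, quirks included, rather than what a spec says it should do. The function name and expected values here are hypothetical.

```ts
// Characterisation tests: pin down current behaviour before changing it.
// `calculateInvoiceTotal` and its expected outputs are hypothetical.
import { test } from "node:test";
import assert from "node:assert/strict";
import { calculateInvoiceTotal } from "./legacy/invoicing";

test("standard invoice matches current behaviour", () => {
  // Expected value captured from the running system, not from a spec.
  assert.equal(calculateInvoiceTotal({ lines: [{ qty: 2, unitPrice: 9.99 }] }), 19.98);
});

test("14-character legacy invoice reference still accepted", () => {
  // Nobody remembers why this format exists; the test preserves it anyway.
  assert.equal(calculateInvoiceTotal({ lines: [], legacyRef: "INV-2018-00042" }), 0);
});
```

The point is not elegance; it is that every later refactoring step can be checked against behaviour the system demonstrably had.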
2. Strangler fig migrations. Build new functionality alongside the old, and progressively route traffic to the new version. The old system continues to handle what it’s already handling; the new system handles new use cases. Over time the old system shrinks as the new one grows. Eventually the old system is small enough to retire.
This pattern works because there’s never a flag-day cutover. Each incremental migration is small, reversible, and tested. Risk is bounded.
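As a sketch of what the routing layer can look like (Express-style middleware; the hosts and path prefixes are assumptions for the example): a small allow-list decides which requests reach the new service, and growing that list is the whole migration.

```ts
// Strangler-fig edge router: one front door, two backends.
// Growing `migratedPrefixes` is the migration; shrinking it is the rollback.
import express from "express";

const LEGACY = "http://legacy.internal:3000";
const NEW_SERVICE = "http://new-service.internal:8080";
const migratedPrefixes = ["/api/invoices", "/api/customers"];

const app = express();
app.use(express.text({ type: "*/*" })); // capture raw body for forwarding

app.use(async (req, res) => {
  const target = migratedPrefixes.some((p) => req.path.startsWith(p))
    ? NEW_SERVICE
    : LEGACY;

  // Minimal forwarding for illustration; production setups usually do this
  // in a reverse proxy (nginx, Envoy) that streams bodies and headers.
  const upstream = await fetch(target + req.originalUrl, {
    method: req.method,
    headers: { "content-type": req.get("content-type") ?? "application/json" },
    body: ["GET", "HEAD"].includes(req.method) ? undefined : req.body,
  });

  res.status(upstream.status).send(await upstream.text());
});

app.listen(8000);
```

Keeping the routing table in one place also makes the migration state auditable: anyone can see exactly what has moved and what hasn’t.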
3. Extract services from the monolith. A specific function or module gets carved out into its own service. The old system calls the new service for that function. Over time more functions are extracted and the monolith shrinks. Eventually most functionality lives in services and the monolith is a thin orchestration layer.
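One way the seam inside the monolith can look (all names here are hypothetical): the call site depends on an interface, and a flag chooses between the original in-process code and the extracted service, which keeps the extraction reversible.

```ts
// Service extraction seam: the monolith keeps one interface, and a flag
// decides whether the work happens in-process or in the new service.
declare function legacyVatLookup(orderId: string): Promise<number>; // existing monolith code

interface TaxCalculator {
  vatFor(orderId: string): Promise<number>;
}

// The original in-process implementation, untouched.
class InProcessTaxCalculator implements TaxCalculator {
  async vatFor(orderId: string): Promise<number> {
    return legacyVatLookup(orderId);
  }
}

// Adapter for the newly extracted service.
class TaxServiceClient implements TaxCalculator {
  constructor(private baseUrl: string) {}
  async vatFor(orderId: string): Promise<number> {
    const res = await fetch(`${this.baseUrl}/vat/${orderId}`);
    if (!res.ok) throw new Error(`tax service: ${res.status}`);
    const body = (await res.json()) as { vat: number };
    return body.vat;
  }
}

// Flipping the flag back is the rollback plan.
export const taxCalculator: TaxCalculator =
  process.env.USE_TAX_SERVICE === "1"
    ? new TaxServiceClient("http://tax.internal:8080")
    : new InProcessTaxCalculator();
```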
4. Modernise dependencies and tooling. Upgrading frameworks, libraries, and language versions. Each upgrade is small, incremental, and improves the foundation without changing user-visible behaviour.
5. Improve the user-facing layer separately. Sometimes the legacy system’s real problem is that the UI feels dated. A modern frontend can be built against the legacy backend, often with Astro, Next.js, or similar, without touching the backend at all. This delivers most of the visible benefit of a rewrite at a fraction of the cost and risk.
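As a sketch of how thin that layer can be, here is a Next.js-style route handler that passes straight through to the legacy backend (the internal URL and response shape are assumptions for the example):

```ts
// app/api/orders/route.ts: a thin pass-through from the new frontend
// to the untouched legacy backend.
const LEGACY_API = process.env.LEGACY_API ?? "http://legacy.internal:3000";

export async function GET(request: Request) {
  const { searchParams } = new URL(request.url);

  // Forward the query to the legacy endpoint as-is; the legacy backend
  // remains the single source of truth for business logic.
  const upstream = await fetch(`${LEGACY_API}/orders?${searchParams}`, {
    headers: { accept: "application/json" },
  });

  // Reshape only what the new UI needs: presentation, not behaviour.
  const orders = (await upstream.json()) as Array<{ id: string; total_cents: number }>;
  return Response.json(orders.map((o) => ({ id: o.id, total: o.total_cents / 100 })));
}
```

The new frontend owns presentation; the legacy backend keeps owning the rules.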
These patterns aren’t exotic. They’re the strategies that consistently produce real progress on legacy systems, while a full rewrite consistently doesn’t.
What a refactoring strategy actually costs
For a typical mid-size legacy system:
- Year 1: Foundation work. Tests, dependency upgrades, documentation, architectural decisions. Visible progress is modest; foundation matters.
- Year 2: First strangler migrations. New features built alongside; high-touch areas extracted. Visible progress accelerates.
- Year 3+: Compound benefits. Each new piece of work is faster than the last because the foundation is stronger. The system gradually becomes the system it should have been.
Total cost over three years: typically 1.5–2.5× what the same team would spend on a rewrite, and that spend ships features along the way. Total delivered value over three years: typically 3–5× what a rewrite delivers, because the rewrite hasn’t shipped yet.
The maths consistently favours refactoring for systems that can be refactored. The work is less glamorous; the outcome is better.
The political problem
The biggest obstacle to refactoring is rarely technical. It’s political. Refactoring doesn’t produce a clean “we’ve modernised our system” announcement. It produces three years of incremental improvements that compound. Executives find it harder to communicate to boards. Engineers find it less satisfying than greenfield work. Sales teams find it harder to demo than “our new platform.”
The honest answer: the right strategy is the one that produces the best outcomes, not the one that’s easiest to communicate. For most legacy systems, that’s refactoring — even when the temptation to rewrite is strong.
What we do with legacy systems
For legacy work we take on:
- We default to refactoring rather than rewriting unless the legacy system falls into one of the genuine rewrite categories
- We do thorough analysis upfront to understand what the system actually does (often more than the team realises)
- We propose strangler-pattern migrations rather than big-bang rewrites
- We build test coverage as the first investment
- We recommend modernising the user-facing layer separately from the backend when that’s the source of pain
This isn’t universal — sometimes a rewrite is genuinely the right call — but it’s the right default. We turn down rewrites where refactoring would produce a better outcome.
Common questions
Should I rewrite my legacy system or refactor it? Default to refactoring unless the system falls into one of the genuine rewrite categories: structurally obsolete technology, an architecture fundamentally wrong for current needs, a codebase so unmaintainable that any change is high-risk, a system small enough to rebuild quickly, or a fundamentally changed business model. Refactoring usually produces better outcomes for less cost; rewrites consistently take longer and cost more than estimated.
Why do most rewrites fail? Rewrites take longer than estimated; the legacy system can’t be paused while the rewrite happens; the team is rebuilding a moving target; institutional knowledge in the legacy code is lost; and the eventual replacement is built to outdated requirements by the time it ships. A meaningful percentage of multi-year rewrites are abandoned partway.
What is the strangler fig pattern? A migration approach where new functionality is built alongside the legacy system, with traffic progressively routed to the new version. Over time the legacy system shrinks as the new system grows. Avoids flag-day cutovers and bounds risk on each incremental migration.
How do I know if my system needs a rewrite? Honest signals: the underlying technology is end-of-life with no migration path; the architecture genuinely can’t scale to current requirements; test coverage is so poor that any change is high-risk; the codebase is small enough to rebuild quickly; or the business model has changed fundamentally. Without one of these, refactoring is usually the better call.
How long does refactoring take vs rewriting? Refactoring delivers compounding value over years. Rewrites deliver no value until they ship, which is typically later than estimated. Over a three-year horizon, refactoring usually delivers 3–5× the value of an attempted rewrite, even though year-one progress can look modest.
If you’re weighing rewrite vs refactor for an existing system and want an honest assessment, start a project. We turn down rewrite work where refactoring is genuinely the better path.
More reading
What AI actually costs to run in production
AI demos are cheap. Production is not. Where the money actually goes when you ship an AI feature, and how to size the engineering investment around the model.
Why integrations break in production (and what to design for)
Every integration that "just calls an API" eventually breaks. The five places they fail first, and the design patterns that keep them running unattended.
The hidden costs of SaaS once your business is established
The per-seat licence is the visible cost. Integration tax, lock-in, configuration drift, and the seat tax at scale are the SaaS costs no one quotes up front.