Strangler Fig: Bidirectional Sync Failure Lost Finances
40% traffic to new service lost finances.
20+ years shipping large-scale distributed systems. Drawn from code that ran under real load.
- Intercept traffic at the edge proxy, route individual features to new services
- Old system stays live until all functionality is migrated — no big bang
- Risk is bounded to the slice currently being migrated, not the entire system
- Data sync between old and new is the hardest part — expect weeks of reconciliation
- Rollback means flipping traffic back to legacy, cheap and fast
- Biggest mistake: migrating the database before the service — you'll need dual writes
Imagine a giant old oak tree in your garden. A strangler fig vine wraps around it, growing its own roots and branches, slowly taking over — until one day the oak rots away and only the fig is left, strong and healthy. Nobody had to chop the oak down overnight. That's exactly what this pattern does to legacy software: you grow a new system around the old one, route traffic to the new parts gradually, and quietly retire the old code piece by piece.
Every senior engineer has a war story about the legacy monolith. The codebase that nobody dares touch, where a one-line change takes three weeks of regression testing and still breaks something in production at 2 AM on a Friday. These systems didn't become terrifying overnight — they grew that way over years of feature additions, hotfixes, and 'we'll clean this up later' compromises. The business depends on them. You cannot simply turn them off.
The Strangler Fig Pattern, coined by Martin Fowler in 2004 after observing actual strangler fig trees in Australian rainforests, is an architectural migration strategy that solves one specific problem: how do you replace a working-but-painful system with a better one without a risky, all-or-nothing 'big-bang' rewrite? The answer is that you don't replace it all at once. You intercept traffic at the edge, divert individual capabilities to new services as they're built, and let the old system die by starvation rather than demolition. The risk at any point in time is bounded to the slice you're currently migrating.
By the end of this article you'll understand the full mechanics of the pattern — the proxy/facade layer, feature-by-feature traffic routing, data synchronisation between old and new, rollback strategies, and the production gotchas that turn a smooth migration into a nightmare if you don't see them coming. You'll also have working code for the routing facade and a feature-flag-driven traffic splitter you can adapt to your own stack today.
What Is the Strangler Fig Pattern? (And Why Your Team Needs It)
The Strangler Fig Pattern is a migration strategy that lets you replace a legacy system incrementally, one feature at a time. You put a routing layer — a reverse proxy, API gateway, or even a smart load balancer — in front of the existing monolith. Every incoming request hits this facade instead of the legacy app directly.
The facade checks a routing table (often backed by feature flags) and decides whether to send the request to the old system or the new service. Over time, you build a replacement service for each capability while the legacy app still handles everything else. When a replacement is ready, you route that capability's traffic to it. Once the feature is fully replaced and verified in production, you remove the corresponding legacy code.
This isn't a new idea — Martin Fowler described it in 2004. But most teams still default to the 'rewrite it all' approach, which collapses under its own risk. The Strangler Fig pattern caps the blast radius of any mistake to exactly one feature.
- You never rewrite 'everything'. You pick one capability — login, search, payments — and replace that.
- The legacy system continues running all unmigrated features. Apart from the facade itself, which every request now passes through, nothing outside the slice changes.
- If the new service fails, you flip the routing rule back. The legacy system never stopped.
- This pattern works because each slice is small enough to reason about, test, and roll back independently.
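The facade's routing decision can be sketched in a few lines of Python. This is a minimal illustration, not a production gateway: the `ROUTES` table, the backend URLs, and `resolve_backend` are all hypothetical names, and a real deployment would keep the table in gateway config or a flag service so it can change at runtime.

```python
# Hypothetical routing table: path prefix -> backend. Anything not listed
# falls through to the legacy monolith, so unmigrated features keep working.
LEGACY = "http://legacy.internal"

ROUTES = {
    "/api/search": "http://search-service.internal",
    "/api/login": "http://auth-service.internal",
}


def resolve_backend(path: str, routes: dict[str, str] = ROUTES) -> str:
    """Return the backend for a request path, using the longest matching prefix."""
    # Match whole path segments only, so "/api/searching" does not hit "/api/search".
    matches = [p for p in routes if path == p or path.startswith(p + "/")]
    if not matches:
        return LEGACY
    return routes[max(matches, key=len)]
```

Under this sketch, rolling a feature back is just removing its entry from the table: the request falls through to `LEGACY` again.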
Building the Routing Facade: Feature Flags and Traffic Splitting
The facade is the single most critical piece of a Strangler Fig migration. It must be performant, stateless (or keep its state in an external store), and observable. Most teams use an API gateway (Kong, Nginx, Envoy) or a reverse proxy with dynamic routing. The key requirement: routing decisions must be changeable at runtime without a deployment.
Feature flags control which users or requests go to the new service. You start at 0% traffic, enable the flag for internal users only, move to a small slice of real traffic (say 1%), then gradually increase to 100%. The flag can key on a user ID hash, geographic region, or any other request attribute. If something breaks, you turn the flag off, and traffic returns to legacy as soon as the facade picks up the change.
Don't build a full feature flag platform in-house. Use LaunchDarkly, Unleash, or, at the simple end, a Redis-backed toggle. Your job is to read the flag in the facade, not to build flag infrastructure.
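A percentage rollout keyed on a user ID hash might look like the sketch below. It assumes the rollout percentage arrives from whatever flag service you use; `bucket_for` and `route_to_new` are illustrative names, not part of any flag vendor's API.

```python
import hashlib


def bucket_for(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given feature."""
    # Including the feature name gives each feature an independent split,
    # so the same users are not always the first to see every new service.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route_to_new(user_id: str, feature: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return bucket_for(user_id, feature) < rollout_percent
```

Because the bucket is derived from the user ID rather than drawn at random per request, a user who lands on the new service at 10% stays there at 50%, which avoids bouncing one person between the two systems as you ramp.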
Data Synchronisation: The Real Challenge of Strangler Fig
Routing traffic is the easy part. The hard part is keeping data consistent between the legacy database and your new service's database. During migration, both systems need to access and modify the same user data, orders, or inventory. If you don't have a solid data sync strategy, you'll end up with silent corruption.
The safest approach is to keep a single source of truth (the legacy database). The new service reads from it and writes both to its own database and to the legacy one (dual writes), which keeps the two stores in step. However, dual writes are error-prone: one side can fail while the other succeeds, and no transaction spans both. A better approach is change data capture (CDC) from the legacy database: every change in the legacy DB is streamed to a message topic, and the new service consumes that stream to update its own store. Writes made by the new service flow back to the legacy DB through a second pipeline running in the opposite direction (reverse sync). Each change needs a marker for the system that originated it, so the two pipelines don't echo the same change back and forth.
Alternatively, you can migrate data at the database level first (e.g., use database views or federation), but that adds a different kind of coupling. The key is bidirectional replication until you cut over completely.
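Here is a rough sketch of the consuming side of a CDC stream, using an in-memory dict in place of a real database. The `ChangeEvent` shape, the `origin` field, and the per-key `version` are assumptions for illustration; real CDC tools emit their own envelope formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    key: str       # primary key of the changed row
    version: int   # assumed to increase monotonically per key
    value: dict    # new row contents
    origin: str    # system that made the original write: "legacy" or "new"


class NewServiceStore:
    """In-memory stand-in for the new service's database."""

    def __init__(self) -> None:
        self.rows: dict[str, tuple[int, dict]] = {}

    def apply(self, event: ChangeEvent) -> bool:
        """Apply a change streamed from legacy. Returns True if the store changed."""
        # Loop prevention: skip changes that originated here and were
        # reverse-synced into legacy, otherwise they echo back forever.
        if event.origin == "new":
            return False
        # Idempotency: brokers redeliver, so ignore stale or duplicate versions.
        current = self.rows.get(event.key)
        if current is not None and current[0] >= event.version:
            return False
        self.rows[event.key] = (event.version, event.value)
        return True
```

The two guards are the point: without the origin check, bidirectional sync loops; without the version check, a redelivered message can overwrite newer data.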
- There is no distributed transaction between two databases in a strangler fig migration.
- Your write path must handle: legacy success + new failure, legacy failure + new success, or both failing.
- The legacy system must remain the authoritative source until cutover is complete.
- Use a reconciliation cron job to detect and fix differences between the two stores hourly.
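One way to handle the partial-failure cases above is to order the writes so that legacy always goes first. The sketch below assumes two hypothetical write callables and a plain list standing in for a durable repair queue.

```python
from typing import Callable

Write = Callable[[str, dict], None]


def dual_write(key: str, value: dict, write_legacy: Write, write_new: Write,
               repair_queue: list) -> str:
    """Write to legacy first (authoritative), then to the new store.

    Returns "ok", "new_failed", or "legacy_failed".
    """
    try:
        write_legacy(key, value)
    except Exception:
        # Legacy is authoritative: if it rejects the write, never touch the
        # new store, and surface the failure to the caller.
        return "legacy_failed"
    try:
        write_new(key, value)
    except Exception:
        # Legacy succeeded, new failed: record the gap so it can be replayed.
        repair_queue.append((key, value))
        return "new_failed"
    return "ok"
```

Writing legacy first means "legacy failure + new success" cannot happen by construction, and "both failing" collapses into `legacy_failed`. The only inconsistent outcome left is `new_failed`, which the repair queue and the hourly reconciliation job are there to close.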
Rollback Strategy: How to Undo a Migration Without Pain
A good strangler fig migration must have a rapid rollback plan for every slice. The beauty of the pattern is that the legacy system never goes away until the last feature is migrated. You can always flip the routing flag back to legacy for a particular feature.
But a simple routing rollback isn't always enough — you also need to handle data. If the new service wrote data that doesn't exist in legacy, you can lose it on rollback. The rule: the legacy system must be the authoritative writer until cutover. Any writes from the new service must be replicated back to legacy (dual writes or CDC reverse sync). That way, when you flip the routing back, the legacy system has all the data.
Your rollback sequence:
1) Turn off the feature flag (stop routing traffic to the new service).
2) Verify the legacy system can serve all the data (run a data consistency check).
3) If data is missing, run a backfill from the new service's database.
4) Decommission the new service only after the rollback has run cleanly for at least 48 hours.
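The first three rollback steps can be sketched as follows, with dicts standing in for the flag store and the two databases. Function names are illustrative, and the choice to backfill only rows that legacy lacks entirely (while reporting conflicting rows for manual review) is an assumption that follows from treating legacy as authoritative.

```python
def diff_stores(legacy: dict, new: dict) -> tuple[dict, list]:
    """Return (rows missing from legacy, keys present in both with different values)."""
    missing = {k: v for k, v in new.items() if k not in legacy}
    conflicts = sorted(k for k in new if k in legacy and legacy[k] != new[k])
    return missing, conflicts


def roll_back(flags: dict, feature: str, legacy: dict, new: dict) -> tuple[int, list]:
    """Run steps 1-3. Returns (rows backfilled, conflicting keys to review)."""
    flags[feature] = False                        # 1) stop routing to the new service
    missing, conflicts = diff_stores(legacy, new)  # 2) data consistency check
    legacy.update(missing)                        # 3) backfill rows legacy never saw
    return len(missing), conflicts
```

Conflicts are returned rather than overwritten on purpose: silently replacing a legacy row with the new service's copy is exactly the kind of decision you want a human to make during an incident.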
Test your rollback before you need it. Simulate a failure scenario in staging: let the new service crash and verify that the routing facade correctly falls back to legacy without any UX interruption.
When NOT to Use the Strangler Fig Pattern
The Strangler Fig pattern isn't a silver bullet. It works best for replacing parts of a monolithic system where you can isolate a single capability. It fails when:
- The legacy system has no clear interface boundaries — everything is tightly coupled through a shared database or global state. In that case, you can't extract a single feature without dragging half the monolith with it.
- The new system requires a fundamentally different data model that can't be mapped to the legacy one. If every request needs to transform heavily between old and new schema, the proxy becomes a bottleneck.
- You need performance improvements immediately — the strangler fig approach adds latency from the proxy and dual writes for many months. If you need to make the system 2x faster this quarter, a rewrite (with careful planning) might be the better call.
- The team is unwilling to maintain two codebases in parallel. The pattern requires you to keep the legacy app around until migration is complete. If your team can't handle that cognitive load, consider a big-bang migration with a well-tested rollback plan instead.
Evaluate your specific context. The pattern is a tool, not a religion.
The Real Enemy: Shared State Between Old and New Systems
Here's the trap most teams walk into. They split traffic between legacy monolith and shiny new microservices, and everything works fine for three weeks. Then customers start seeing stale data in one system and fresh data in another. Orders vanish. User sessions collide. Your on-call phone becomes ground zero for a war between two versions of reality.
The core problem is shared mutable state. Both systems read and write to the same database, or worse, they cache independently. The legacy system mutates a user record, the new service reads an old replica, and suddenly you're explaining to a VP why customer invoices don't match.
The solution isn't elegant. It's practical. You need a single source of truth with a write-through pattern. Route all writes through the strangler facade. Both old and new systems read from the freshest data, but only one system owns writes for a given piece of data at any time. Or you implement event sourcing, where the legacy system emits domain events that the new system consumes. Either way, you must treat data like explosives: handle it with clear ownership boundaries.
Everything else is technical debt waiting to explode.
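Single write ownership is easier to enforce when it is written down in code. The sketch below is a hypothetical registry the facade could consult before forwarding a write; the class and method names are made up for illustration.

```python
class WriteOwnership:
    """Tracks which system owns writes for each entity type (hypothetical helper)."""

    def __init__(self, default_owner: str = "legacy") -> None:
        self.default_owner = default_owner
        self.owners: dict[str, str] = {}

    def owner_of(self, entity: str) -> str:
        return self.owners.get(entity, self.default_owner)

    def hand_over(self, entity: str, new_owner: str) -> None:
        """Move ownership in one step, so there is never more than one writer."""
        self.owners[entity] = new_owner

    def check_write(self, entity: str, system: str) -> None:
        """Raise if a system tries to write an entity it does not own."""
        owner = self.owner_of(entity)
        if system != owner:
            raise PermissionError(f"{system} may not write {entity}; owner is {owner}")
```

Defaulting to legacy means an entity nobody has thought about yet stays with the system that already owns it.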
Kill the Facade Before It Kills You
The strangler facade is a hero in month one. By month twelve, it's technical debt wearing a crown. Teams get comfortable. They stop migrating the last 20% of legacy features because "the facade handles it." The facade becomes a god object: routing, authorization, logging, and rate limiting all tangled in one middleware that nobody wants to touch.
You need an expiration date on your facade. Treat it like a feature flag that defaults to legacy, not a permanent infrastructure piece. Every route added to the facade should trigger a ticket to either fully migrate that feature or explicitly sunset the legacy path. If you don't, you'll end up with a distributed monolith that's harder to operate than the original.
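Giving every facade route an expiry date is simple to model. In this sketch the `FacadeRoute` fields and the ticket IDs are hypothetical; the idea is only that a scheduled check can list routes that have outlived their deadline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FacadeRoute:
    prefix: str
    ticket: str        # hypothetical tracking ticket for migrating or sunsetting
    expires_on: date   # date by which the legacy path should be gone


def overdue_routes(routes: list[FacadeRoute], today: date) -> list[FacadeRoute]:
    """Routes whose expiry date has passed, oldest first."""
    return sorted((r for r in routes if r.expires_on < today),
                  key=lambda r: r.expires_on)
```

Run something like this in CI or a weekly job and fail loudly when the list is non-empty, so an overdue route becomes somebody's problem before it becomes permanent.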
Best practice: After 80% of traffic routes to new services, schedule a two-week sprint to decommission the facade entirely. Replace remaining routes with direct service-to-service calls. The facade served its purpose – it let you change the system without stopping the business. Don't let it become a permanent complexity multiplier.
Teams that ignore this advice spend six months debugging why the facade times out on Tuesdays at 3 PM.
Problems and Considerations: Where the Strangler Fig Breaks
The Strangler Fig pattern looks great on a whiteboard. In production, it hits real friction. First: traffic splitting is easy until you need session affinity. If your old system holds user state in memory and the new system uses a different session store, a 50/50 split means users get logged out randomly. You need sticky sessions or a shared token store—neither is trivial.
Second: reporting and analytics. Your old system logs to one database, your new system logs to another. Suddenly your dashboards show half the data. You'll need a reporting facade that reads from both and deduplicates. That's a hidden cost most teams miss until month three.
Third: domain coupling. If your old monolith has a single table that powers billing, shipping, and notifications, strangling one feature breaks the others. You don't get to pick and choose—you either migrate the shared dependency first or build a translation layer. Both slow you down.
Finally: team coordination. The pattern demands a steady stream of small migrations. If your team rotates or loses context, the facade rots. Keep a living document of which features are migrated and which routes still hit the old system. One stale route and you're debugging production incidents at 2 AM.
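The reporting facade mentioned above boils down to a merge with deduplication. This sketch assumes both systems stamp events with a shared `id` and a comparable `ts` field, which is often the hard part in practice.

```python
def merge_events(legacy_events: list[dict], new_events: list[dict]) -> list[dict]:
    """Union two event logs, dropping duplicates by event ID, ordered by timestamp."""
    by_id: dict[str, dict] = {}
    for event in legacy_events + new_events:
        # First occurrence wins, so a legacy copy takes precedence over a
        # duplicate logged by the new system.
        by_id.setdefault(event["id"], event)
    return sorted(by_id.values(), key=lambda e: e["ts"])
```

If the two systems cannot agree on an event ID, deduplication has to fall back to a composite key (user, action, timestamp window), and that is where the hidden cost shows up.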
Related Resources: What to Read After You Hit the Wall
You've read the Strangler Fig page. Now stop treating it like a silver bullet. Here's what you actually need next.
Feature Toggles (Pete Hodgson, published on martinfowler.com) — This is the routing facade's brain. Understand the difference between release toggles, experiment toggles, and ops toggles. Your Strangler Fig uses all three, and if you mix them up you'll deploy features that aren't ready or kill rollbacks.
Bulkhead Pattern — Once you have old and new systems running side by side, one bad query in the old system can starve the new system of connection pool resources. Bulkhead isolates them. Without it, a crash in legacy code takes down your shiny new microservice.
Change Data Capture (CDC) — For data synchronization, skip the batch jobs. Read Debezium's documentation or Jay Kreps' "The Log: What Every Software Engineer Should Know About Real-Time Data's Unifying Abstraction." CDC lets you stream database changes from your old system to your new one without writing a single ETL cron.
Anti-Corruption Layer (Eric Evans, DDD) — When the old system's data model is garbage (and it always is), this pattern translates between contexts. Your Strangler Fig route handler should call an anti-corruption layer, not the old database directly.
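To make the anti-corruption layer concrete, here is a small translation function. The legacy column names (`CUST_NO`, `FNAME`, `LNAME`, `STATUS_CD`) and the `Customer` model are invented for illustration; your legacy schema will have its own quirks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """The new service's model. Field names are illustrative."""
    customer_id: str
    full_name: str
    is_active: bool


def from_legacy(row: dict) -> Customer:
    """Translate a legacy row (hypothetical column names) into the new model."""
    return Customer(
        customer_id=str(row["CUST_NO"]),
        full_name=f'{row["FNAME"].strip()} {row["LNAME"].strip()}',
        is_active=row["STATUS_CD"] == "A",
    )
```

Everything that knows about padded strings and one-letter status codes lives in this one function, so the rest of the new service never sees the legacy shape.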
The Midnight Data Loss That Killed a Migration
- Data sync must be bidirectional during the migration period — not just one way.
- Assume every user can be served by either system at any time until migration is complete.
- Change data capture with a message broker is the most reliable way to keep both stores in sync without coupling the two applications through dual writes.
curl -H "X-Force-Route: new-service" to isolate the issue.
curl -v http://proxy/api/users/123 --header "X-Original-Route: legacy"
diff current_routing.yaml previous_routing.yaml
Key takeaways
Common mistakes to avoid
- Migrating the database before the service
- Not planning for bidirectional data sync
- Ramping traffic too fast without load testing the new service
- Assuming the proxy is stateless when it holds routing state
Interview Questions on This Topic
Explain the Strangler Fig Pattern. How does it differ from a big-bang rewrite?