Senior 9 min · March 05, 2026

Strangler Fig: Bidirectional Sync Failure Lost Finances

Q: What is the Strangler Fig Pattern in simple terms?

It's a way to replace an old software system by gradually building a new one around it. You route some users to the new parts and other users to the old system. Over time, you move all users to the new system and turn off the old one. No single 'cut-over' risk.

Q: Do I need an API gateway to use Strangler Fig?

Not necessarily — you can use a reverse proxy like Nginx with Lua scripts, or even embed the routing logic in your load balancer. But an API gateway (Kong, Envoy, AWS API Gateway) makes it easier to manage routing rules dynamically with feature flags.

Q: How long does a typical Strangler Fig migration take?

It depends on the number of features and team size. A medium-size monolith (10-15 features) with a dedicated team of 4-6 engineers typically takes 6-18 months. The first feature will be slowest because you're setting up the proxy, CI/CD for the new system, and sync pipelines.

Q: What tools do you recommend for feature flags?

LaunchDarkly (SaaS, expensive but powerful), Unleash (open-source, self-hosted), or FeatureHub. For simple cases, you can use Redis with a config service, but avoid building your own flag system — it's more complex than it looks.

Q: Can I use Strangler Fig for database migration alone?

Yes — the pattern applies to database migration too. You can put a 'database proxy' (like ProxySQL or a custom view layer) that routes reads/writes to old or new schema based on feature flags. This is trickier because of transactional consistency, but doable with careful CDC and versioned schemas.

At 40% of traffic on the new service, users started missing recent financial transactions.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Drawn from code that ran under real load.

Last updated: May 24, 2026

⚡Quick Answer

Intercept traffic at the edge proxy, route individual features to new services
Old system stays live until all functionality is migrated — no big bang
Risk is bounded to the slice currently being migrated, not the entire system
Data sync between old and new is the hardest part — expect weeks of reconciliation
Rollback means flipping traffic back to legacy, cheap and fast
Biggest mistake: migrating the database before the service — you'll need dual writes

Definition (~90s read)

What is Strangler Fig Pattern?

The Strangler Fig Pattern is a migration strategy that lets you replace a legacy system incrementally, one feature at a time. You put a routing layer — a reverse proxy, API gateway, or even a smart load balancer — in front of the existing monolith. Every incoming request hits this facade instead of the legacy app directly.

You route traffic to the new service when it's ready. Once a feature is fully replaced and tested, you remove the legacy code for that feature.

This isn't a new idea — Martin Fowler described it in 2004. But most teams still default to the 'rewrite it all' approach, which collapses under its own risk. The Strangler Fig pattern caps the blast radius of any mistake to exactly one feature.

Plain-English First

Imagine a giant old oak tree in your garden. A strangler fig vine wraps around it, growing its own roots and branches, slowly taking over — until one day the oak rots away and only the fig is left, strong and healthy. Nobody had to chop the oak down overnight. That's exactly what this pattern does to legacy software: you grow a new system around the old one, route traffic to the new parts gradually, and quietly retire the old code piece by piece.

Every senior engineer has a war story about the legacy monolith. The codebase that nobody dares touch, where a one-line change takes three weeks of regression testing and still breaks something in production at 2 AM on a Friday. These systems didn't become terrifying overnight — they grew that way over years of feature additions, hotfixes, and 'we'll clean this up later' compromises. The business depends on them. You cannot simply turn them off.

The Strangler Fig Pattern, coined by Martin Fowler in 2004 after observing actual strangler fig trees in Australian rainforests, is an architectural migration strategy that solves one specific problem: how do you replace a working-but-painful system with a better one without a risky, all-or-nothing 'big-bang' rewrite? The answer is that you don't replace it all at once. You intercept traffic at the edge, divert individual capabilities to new services as they're built, and let the old system die by starvation rather than demolition. The risk at any point in time is bounded to the slice you're currently migrating.

By the end of this article you'll understand the full mechanics of the pattern — the proxy/facade layer, feature-by-feature traffic routing, data synchronisation between old and new, rollback strategies, and the production gotchas that turn a smooth migration into a nightmare if you don't see them coming. You'll also have working code for the routing facade and a feature-flag-driven traffic splitter you can adapt to your own stack today.

What Is the Strangler Fig Pattern? (And Why Your Team Needs It)

The facade checks a routing table (often backed by feature flags) and decides whether to send the request to the old system or the new service. Over time, you build replacement services for each functionality while the legacy app still handles everything else. You route traffic to the new service when it's ready. Once a feature is fully replaced and tested, you remove the legacy code for that feature.

io/thecodeforge/proxy/RoutingFacade.java (Java)

package io.thecodeforge.proxy;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RoutingFacade {
    private final Map<String, String> routeTable = new ConcurrentHashMap<>();
    private final String legacyBackend;

    public RoutingFacade(String legacyBackend) {
        this.legacyBackend = legacyBackend;
    }

    public void registerMigration(String featurePattern, String newServiceUrl) {
        routeTable.put(featurePattern, newServiceUrl);
    }

    public String resolveBackend(String requestPath) {
        // Check if request path matches any migrated feature
        for (Map.Entry<String, String> entry : routeTable.entrySet()) {
            if (requestPath.startsWith(entry.getKey()) && isActive(entry.getKey())) {
                return entry.getValue();
            }
        }
        return legacyBackend;
    }

    private boolean isActive(String feature) {
        // In production, check a feature flag service (LaunchDarkly, FF4J, etc.)
        return true;
    }
}

Output

RoutingFacade resolves backend based on feature prefix. Returns new service URL when feature is migrated and flag is active.

Mental Model: The Bounded Slice

You never rewrite 'everything'. You pick one capability — login, search, payments — and replace that.
The legacy system continues running all unmigrated features. Zero risk outside the slice.
If the new service fails, you flip the routing rule back. The legacy system never stopped.
This pattern works because each slice is small enough to reason about, test, and rollback independently.

Production Insight

If you try to migrate more than one feature at a time, your rollback gets complicated.

Keep the number of in-flight migrations to one or two — otherwise you'll be debugging which new service broke what.

Rule: one slice at a time, done means done (all traffic switched, legacy code removed).

Key Takeaway

Incremental migration caps risk per slice.

Keep in-flight migrations to one at a time, two at most.

The proxy is the only component that knows about the migration — the rest of the world doesn't need to.

Building the Routing Facade: Feature Flags and Traffic Splitting

The facade is the single most critical piece of a Strangler Fig migration. It must be performant, stateless (or externalise state), and observable. Most teams use an API gateway (Kong, Nginx, Envoy) or a reverse proxy with dynamic routing. The key requirement: routing decisions must be changeable at runtime without a deployment.

Feature flags control which users or requests go to the new service. You start at 0% traffic, enable it for internal testing (1% of users), then gradually increase to 100%. The flag can be based on user ID hash, geographic region, or any attribute. If something breaks, you turn the flag off — traffic instantly goes back to legacy.
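
A minimal sketch of the percentage rollout described above, assuming the flag value is a plain integer `traffic_percent` read from your flag service. The names `bucket_for` and `choose_backend` are hypothetical, not part of any flag SDK. Hashing the user ID keeps each user on the same side between requests, and raising the percentage only ever moves users from legacy to new.

```python
import hashlib

def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def choose_backend(user_id: str, traffic_percent: int) -> str:
    """Send users whose bucket falls below traffic_percent to the new service."""
    if not 0 <= traffic_percent <= 100:
        raise ValueError("traffic_percent must be between 0 and 100")
    return "new" if bucket_for(user_id) < traffic_percent else "legacy"

if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    share = sum(choose_backend(u, 30) == "new" for u in users) / len(users)
    print(f"share routed to new service at 30%: {share:.2f}")
```

Setting `traffic_percent` to 0 acts as the kill switch: every bucket is at least 0, so nobody is routed to the new service.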

Don't implement your own feature flag system in-house. Use LaunchDarkly, Unleash, or even a simple Redis-backed toggle. Your only job is to read the flag in the facade, not implement the flag infrastructure.

io/thecodeforge/proxy/routing-config.yaml (YAML)

# TheCodeForge: Strangler Fig routing rules (illustrative schema; translate into your proxy's native config, e.g. Envoy)
# Feature flags are external, loaded from a config service
routing:
  - feature: user-profile
    pattern: "/api/users/**"
    legacy: "monolith.internal"
    new: "user-service.internal"
    # Flag: strangler.user-profile.enabled
    traffic_percent: 30  # will be overridden at runtime via API

  - feature: payments
    pattern: "/api/payments/**"
    legacy: "monolith.internal"
    new: "payment-service.internal"
    traffic_percent: 0  # not yet migrated

  - feature: legacy-fallback
    pattern: "/**"
    backend: "monolith.internal"

Output

Routing config with one migration in progress (user-profile at 30%). Payments is registered but still at 0%, so no traffic reaches the new payment service.

Watch Out: Hot Reload Without Validation

If your proxy reloads routing configuration without validating the new backend is healthy, you can blackhole traffic. Always require a health check pass before allowing a routing change.
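
One way to enforce this is to make the health check part of the routing change itself. The sketch below is illustrative only: `RouteTable` and its `health_check` callable are hypothetical names, and a real proxy would run the probe over HTTP rather than through an injected function.

```python
from typing import Callable, Dict

class RouteTable:
    """Holds feature -> backend mappings and refuses unhealthy targets."""

    def __init__(self, health_check: Callable[[str], bool]):
        self._routes: Dict[str, str] = {}
        self._health_check = health_check

    def apply_change(self, feature: str, backend: str) -> bool:
        """Switch a feature to a backend only if that backend is healthy."""
        if not self._health_check(backend):
            return False  # keep the previous route in place
        self._routes[feature] = backend
        return True

    def backend_for(self, feature: str, default: str = "monolith.internal") -> str:
        """Return the configured backend, falling back to the legacy default."""
        return self._routes.get(feature, default)
```

A rejected change leaves the old route untouched, so a bad config push degrades to "nothing happened" instead of blackholing traffic.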

Production Insight

The hardest part is not the routing — it's knowing when to flip the flag.

If your new service can't handle the load, 100% traffic will cause a cascade failure.

Rule: load test the new service at 2x expected production traffic before increasing beyond 10%.
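
The rule can be turned into a small gate that decides the next rollout step. This is a sketch under stated assumptions: the ramp schedule and the 0.1% error budget are made-up values, and `next_traffic_percent` is a hypothetical helper, not a feature of any flag product.

```python
RAMP_STEPS = [0, 1, 5, 10, 25, 50, 100]  # hypothetical schedule

def next_traffic_percent(current: int, error_rate: float, load_tested: bool,
                         max_error_rate: float = 0.001) -> int:
    """Return the next rollout percentage, or hold or retreat when checks fail."""
    if error_rate > max_error_rate:
        return 0  # flag off: all traffic back to legacy
    higher = [step for step in RAMP_STEPS if step > current]
    if not higher:
        return current  # already at 100%
    candidate = higher[0]
    if candidate > 10 and not load_tested:
        return current  # do not pass 10% without the 2x load test
    return candidate
```

For example, `next_traffic_percent(10, 0.0, load_tested=False)` stays at 10, while the same call with `load_tested=True` moves to 25.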

Key Takeaway

Routing facade = feature flags + health checks + observable metrics.

Never change routing without validation.

Start at 1% traffic, ramp up slowly, monitor every increment.

Choose Your Routing Strategy

If: You need feature-level routing (e.g., move 'search' but not 'profile')
→ Use: An API gateway with URL pattern matching and a feature flag per pattern.

If: You need user-level routing (e.g., test new system with power users first)
→ Use: Sticky sessions or a user ID hash with percentage rollout in the flag service.

If: You have no API gateway and can't deploy one
→ Use: A simple routing filter embedded in your load balancer (Nginx Lua script) or a sidecar proxy.

Data Synchronisation: The Real Challenge of Strangler Fig

Routing traffic is the easy part. The hard part is keeping data consistent between the legacy database and your new service's database. During migration, both systems need to access and modify the same user data, orders, or inventory. If you don't have a solid data sync strategy, you'll end up with silent corruption.

The simplest approach is to keep a single source of truth (the legacy database) and have the new service read from it while writing to both its own database and the legacy one (dual writes). This keeps both systems in sync, but dual writes are error-prone: one side can fail while the other succeeds. A more robust approach is change data capture (CDC) from the legacy database: every change in the legacy DB is streamed to a message topic, and the new service consumes that stream to update its own store. Writes made by the new service flow back to the legacy DB through a second CDC stream in the opposite direction (reverse sync), and each event must carry its origin so that a replicated write is not echoed back in a loop.

Alternatively, you can migrate data at the database level first (e.g., use database views or federation), but that adds a different kind of coupling. The key is bidirectional replication until you cut over completely.
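
Bidirectional replication has two classic failure modes: a write bouncing back and forth between the stores, and events being delivered twice or out of order. The sketch below shows one way to guard against both, assuming each change event carries an `origin` tag and a per-key `version`. Both fields, and the `ChangeEvent` and `Replica` names, are assumptions for illustration; real CDC tooling exposes its own event format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class ChangeEvent:
    key: str        # primary key of the changed row
    value: str      # new row contents (simplified to a string)
    version: int    # monotonically increasing per key
    origin: str     # "legacy" or "new": which system made the write

class Replica:
    """One side of the sync: applies changes that originated on the other side."""

    def __init__(self, name: str):
        self.name = name
        self.rows: Dict[str, Tuple[int, str]] = {}  # key -> (version, value)

    def apply(self, event: ChangeEvent) -> bool:
        if event.origin == self.name:
            return False  # our own write echoed back: skip to avoid a sync loop
        current_version = self.rows.get(event.key, (0, ""))[0]
        if event.version <= current_version:
            return False  # duplicate or out-of-order delivery: already applied
        self.rows[event.key] = (event.version, event.value)
        return True
```

Because `apply` is idempotent, the consumer can safely reprocess a stream from an earlier offset after a crash.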

io/thecodeforge/migration/DualWriter.java (Java)

package io.thecodeforge.migration;

import io.thecodeforge.db.LegacyRepository;
import io.thecodeforge.db.NewServiceRepository;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// UserProfile is assumed to be a domain class in this package.
public class DualWriter {
    private static final Logger log = LoggerFactory.getLogger(DualWriter.class);

    private final LegacyRepository legacy;
    private final NewServiceRepository newService;

    public DualWriter(LegacyRepository legacy, NewServiceRepository newService) {
        this.legacy = legacy;
        this.newService = newService;
    }

    public void writeUserProfile(String userId, UserProfile profile) {
        // Legacy is the source of truth, so it is written first.
        try {
            legacy.saveProfile(userId, profile);
        } catch (Exception e) {
            // Nothing has been written to the new store yet, so there is nothing to undo.
            log.error("Legacy write failed for user {}, skipping new-service write", userId, e);
            throw e;
        }
        try {
            newService.saveProfile(userId, profile);
        } catch (Exception e) {
            // Legacy succeeded but the new store is now behind: queue a repair.
            log.warn("New service write failed for user {}, scheduling reconciliation", userId, e);
            scheduleReconciliation(userId, profile);
        }
    }

    private void scheduleReconciliation(String userId, UserProfile profile) {
        // Push to dead-letter queue for later retry
    }
}

Output

DualWriter writes to both databases. If new write fails, a reconciliation job is scheduled.

Mental Model: The Two-Phase Commit That Isn't

There is no distributed transaction between two databases in a strangler fig migration.
Your write path must handle three cases: legacy success + new failure, legacy failure + new success, and both failing.
The legacy system must remain the authoritative source until cutover is complete.
Use a reconciliation cron job to detect and fix differences between the two stores hourly.
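
A reconciliation pass can be sketched as a comparison that treats legacy as authoritative. The example uses plain dictionaries as stand-ins for the two stores, and `reconcile` is a hypothetical name; a real job would page through both databases and compare checksums rather than whole rows.

```python
from typing import Dict, List, Tuple

def reconcile(legacy: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Repair `new` from `legacy`; return (repaired keys, keys only in `new`)."""
    repaired = []
    for key, value in legacy.items():
        if new.get(key) != value:
            new[key] = value  # legacy is authoritative until cutover
            repaired.append(key)
    # Rows that exist only in the new store are reported, not deleted:
    # they may be writes that never reached legacy and need a manual decision.
    orphans = [key for key in new if key not in legacy]
    return sorted(repaired), sorted(orphans)
```

Reporting orphans instead of deleting them matters during rollback, when the new store may hold the only copy of a recent write.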

Production Insight

Dual writes double your write latency and failure surface.

One team I consulted lost 3 hours of order data because the new service's DB was full but the legacy wrote fine — no alerting on partial failure.

Rule: monitor dual write success/fail rate per operation type, alert if >0.01% of writes fail on either side.
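
A minimal sketch of that rule, assuming you record every write attempt per operation type and per side. `DualWriteMonitor` is a hypothetical in-memory counter; in production these numbers would come from your metrics system.

```python
from collections import Counter

class DualWriteMonitor:
    """Tracks write outcomes per (operation, side) and flags excessive failures."""

    ALERT_THRESHOLD = 0.0001  # 0.01% expressed as a fraction

    def __init__(self):
        self.attempts = Counter()
        self.failures = Counter()

    def record(self, operation: str, side: str, ok: bool) -> None:
        self.attempts[(operation, side)] += 1
        if not ok:
            self.failures[(operation, side)] += 1

    def failure_rate(self, operation: str, side: str) -> float:
        total = self.attempts[(operation, side)]
        return self.failures[(operation, side)] / total if total else 0.0

    def should_alert(self, operation: str, side: str) -> bool:
        return self.failure_rate(operation, side) > self.ALERT_THRESHOLD
```

Keeping the counters per side is what catches the partial-failure case: legacy can be at 0% failures while the new store silently rejects writes.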

Key Takeaway

Legacy DB remains source of truth until cutover.

Dual writes need partial failure handling + reconciliation.

CDC pipeline with a message broker is the production-grade answer.

Rollback Strategy: How to Undo a Migration Without Pain

A good strangler fig migration must have a rapid rollback plan for every slice. The beauty of the pattern is that the legacy system never goes away until the last feature is migrated. You can always flip the routing flag back to legacy for a particular feature.

But a simple routing rollback isn't always enough — you also need to handle data. If the new service wrote data that doesn't exist in legacy, you can lose it on rollback. The rule: the legacy system must be the authoritative writer until cutover. Any writes from the new service must be replicated back to legacy (dual writes or CDC reverse sync). That way, when you flip the routing back, the legacy system has all the data.

Your rollback sequence:
1) Turn off the feature flag (stop routing traffic to the new service).
2) Verify the legacy system can serve all the data (run a data consistency check).
3) If data is missing, run a backfill from the new service's database.
4) Decommission the new service only after a clean rollback window of at least 48 hours.

Test your rollback before you need it. Simulate a failure scenario in staging: let the new service crash and verify that the routing facade correctly falls back to legacy without any UX interruption.

rollback.sh (Bash)

#!/bin/bash
# TheCodeForge: Strangler Fig rollback script
# Usage: ./rollback.sh <feature-name>

FEATURE="${1:?Usage: ./rollback.sh <feature-name>}"
FLAG_SERVICE="https://flags.internal/flag"

# Step 1: Disable the feature flag (-f makes curl fail on HTTP errors)
if ! curl -fsS -X POST "$FLAG_SERVICE/strangler.$FEATURE.enabled" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'; then
  echo "ERROR: could not disable flag for $FEATURE"
  exit 1
fi

# Step 2: Wait for proxy to pick up the change
sleep 5  # depends on your proxy cache TTL

# Step 3: Verify traffic is going to legacy (header names are case-insensitive)
if curl -sI "http://proxy/api/$FEATURE/health" | grep -qi "X-Backend: legacy"; then
  echo "Rollback successful: traffic now to legacy"
else
  echo "ERROR: Traffic still hitting new service"
  exit 1
fi

# Step 4: Check data consistency
# (assuming a reconciliation job runs offline)
echo "Rollback complete. Monitor reconciliation status."

Output

Rollback script disables feature flag, waits for proxy to sync, then verifies backend header.

Never Skip Step 3: Verification

I've seen teams disable the flag but the proxy had a 60-second TTL on config reload. Users hit the new service for another full minute, which then returned 503s because the service was already half-undeployed. Always verify, then proceed.
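
A fixed sleep only works if you know the TTL. A polling check is more forgiving, and the sketch below shows the idea. It assumes a `probe` callable that returns the backend name the proxy reports (for example by reading a response header); `wait_for_backend` and its parameters are hypothetical.

```python
import time
from typing import Callable

def wait_for_backend(probe: Callable[[], str], expected: str = "legacy",
                     timeout_s: float = 90.0, interval_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the proxy reports the expected backend or the timeout expires."""
    waited = 0.0
    while waited <= timeout_s:
        if probe() == expected:
            return True
        sleep(interval_s)
        waited += interval_s
    return False
```

Pick a timeout longer than the proxy's config TTL, and only start undeploying the new service once the function returns `True`.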

Production Insight

Rollback is not just about routing — it's about data.

If you didn't replicate new writes back to legacy, you lose data on rollback.

Rule: write to legacy first (or replicate asynchronously in both directions) so that the legacy store is complete before you roll back.

Key Takeaway

Rollback = feature flag off + proxy TTL + data reconciliation.

Test rollback in staging before production.

New service writes must be replicated back to legacy before any rollback is safe.

When to Roll Back vs. Forward-Fix

If: New service is returning 500s for all requests
→ Use: Roll back immediately. Turn off the flag, all traffic goes to legacy.

If: New service is slow but functional (e.g., 2s vs 200ms)
→ Use: Consider forward-fix: keep flag on for internal users only, fix performance, then ramp up.

If: Data inconsistency detected but small scope
→ Use: Fix the data programmatically (backfill script) and keep migration running with monitoring.

When NOT to Use the Strangler Fig Pattern

The Strangler Fig pattern isn't a silver bullet. It works best for replacing parts of a monolithic system where you can isolate a single capability. It fails when:

The legacy system has no clear interface boundaries — everything is tightly coupled through a shared database or global state. In that case, you can't extract a single feature without dragging half the monolith with it.
The new system requires a fundamentally different data model that can't be mapped to the legacy one. If every request needs to transform heavily between old and new schema, the proxy becomes a bottleneck.
You need performance improvements immediately — the strangler fig approach adds latency from the proxy and dual writes for many months. If you need to make the system 2x faster this quarter, a rewrite (with careful planning) might be the better call.
The team is unwilling to maintain two codebases in parallel. The pattern requires you to keep the legacy app around until migration is complete. If your team can't handle that cognitive load, consider a big-bang migration with a well-tested rollback plan instead.

Evaluate your specific context. The pattern is a tool, not a religion.

Production Insight

I worked on a migration where the legacy system had 80 stored procedures shared across all features. Extracting one feature meant copying 40 procedures — we might as well have rewritten the whole thing.

Rule: if the cost of extracting a feature exceeds the cost of building it from scratch, don't use Strangler Fig.

Key Takeaway

Strangler Fig works when features are loosely coupled.

If coupling is high, consider big-bang with careful rollback or a database-first migration.

Always measure the extraction cost before committing to the pattern.

The Real Enemy: Shared State Between Old and New Systems

Here's the trap most teams walk into. They split traffic between legacy monolith and shiny new microservices, and everything works fine for three weeks. Then customers start seeing stale data in one system and fresh data in another. Orders vanish. User sessions collide. Your on-call phone becomes ground zero for a war between two versions of reality.

The core problem is shared mutable state. Both systems read and write to the same database, or worse, they cache independently. The legacy system mutates a user record, the new service reads an old replica, and suddenly you're explaining to a VP why customer invoices don't match.

The solution isn't elegant. It's practical. You need a single source of truth with a write-through pattern. Route all writes through the strangler facade. Both old and new systems read from freshest data, but only one system owns writes at any time. Or you implement event sourcing where the legacy system emits domain events that the new system consumes. Either way, you must treat data like explosives – handle with clear ownership boundaries.

Everything else is technical debt waiting to explode.

WriteThroughProxy.py (Python)

# io.thecodeforge: system-design tutorial

class StranglerWriteProxy:
    def __init__(self, legacy_db, new_db):
        self.legacy = legacy_db
        self.new = new_db
        self.active_service = "legacy"  # toggles per migration phase

    def update_order(self, order_id, status):
        # Writes go to both stores so that switching the read side does not
        # expose divergent data. There is no transaction across the two calls:
        # if the second one fails, the stores differ until reconciliation runs.
        self.legacy.orders.update(order_id, status=status)
        self.new.orders.update(order_id, status=status)

    def get_order(self, order_id):
        # Reads come from the active store only
        if self.active_service == "legacy":
            return self.legacy.orders.find(order_id)
        return self.new.orders.find(order_id)

    def migrate_order(self, order_id):
        # Copy one order from legacy into the new store (backfill step)
        order = self.legacy.orders.find(order_id)
        self.new.orders.upsert(order)
        return order

    def cut_over(self):
        # Switch reads to the new store once every order has been backfilled
        self.active_service = "new"

Output

Proxy writes to both stores and reads only from the active one, so a cutover does not expose stale data as long as both writes succeed.

Production Trap: Dual Writes Without Ordering

If you don't sequence writes with a distributed transaction or eventual consistency guarantees, you'll hit the 'order created in new system but invoice generated in legacy' nightmare. Use idempotency keys on every mutation.
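
An idempotency key makes a retried mutation a no-op. The following is a minimal in-memory sketch; `IdempotentStore` is a hypothetical class, and a real implementation would persist the seen keys in the same transaction as the write.

```python
class IdempotentStore:
    """Applies each mutation at most once, keyed by an idempotency key."""

    def __init__(self):
        self.orders = {}
        self._seen = {}  # idempotency key -> result of the first application

    def update_order(self, idempotency_key, order_id, status):
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]  # replay: return the first result
        self.orders[order_id] = status
        result = {"order_id": order_id, "status": status}
        self._seen[idempotency_key] = result
        return result
```

With this in place, a retry of an old "paid" update that arrives after a newer "shipped" update no longer overwrites the newer state.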

Key Takeaway

One system must own writes per domain. Dual reads are safe. Dual writes without coordination are a production incident waiting to happen.

Kill the Facade Before It Kills You

The strangler facade is a hero in month one. By month twelve, it's technical debt wearing a crown. Teams get comfortable. They stop migrating the last 20% of legacy features because "the facade handles it." The facade becomes a god object -- routing, authorization, logging, rate limiting all tangled in one middleware that nobody wants to touch.

You need an expiration date on your facade. Treat it like a feature flag that defaults to legacy, not a permanent infrastructure piece. Every route added to the facade should trigger a ticket to either fully migrate that feature or explicitly sunset the legacy path. If you don't, you'll end up with a distributed monolith that's harder to operate than the original.

Best practice: After 80% of traffic routes to new services, schedule a two-week sprint to decommission the facade entirely. Replace remaining routes with direct service-to-service calls. The facade served its purpose – it let you change the system without stopping the business. Don't let it become a permanent complexity multiplier.

Teams that ignore this advice spend six months debugging why the facade times out on Tuesdays at 3 PM.

FacadeExpiry.py (Python)

# io.thecodeforge: system-design tutorial

class StranglerFacade:
    def __init__(self):
        self.routes = {
            "/users": {"target": "legacy", "migrated": False, "added_at": "2024-01-15"},
            "/orders": {"target": "new", "migrated": True, "added_at": "2024-03-01"},
        }

    def cutoff(self):
        # ISO dates compare correctly as plain strings
        return "2024-04-30"

    def is_overdue(self, route):
        return route["migrated"] and route["added_at"] < self.cutoff()

    def route_request(self, path):
        route = self.routes.get(path)
        if not route:
            raise LookupError(f"No route for {path}")
        if self.is_overdue(route):
            # Keep serving traffic; flag the route for removal instead of failing requests
            print(f"WARNING: route {path} is overdue for removal")
        return route["target"]

    def list_dead_routes(self):
        return [p for p, r in self.routes.items() if self.is_overdue(r)]

facade = StranglerFacade()
print(facade.list_dead_routes())

# Output:
# ['/orders']

Output

['/orders']

Senior Shortcut: Facade Reduction Metrics

Track 'facade surface area' – number of routes the facade touches. Every sprint, it must shrink. If it grows, you're not doing strangler fig, you're building a distributed monolith.
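
The metric is easy to automate if you record the facade's route count at the end of each sprint. A small sketch, with `sprints_where_facade_grew` as a hypothetical helper:

```python
from typing import List

def sprints_where_facade_grew(route_counts: List[int]) -> List[int]:
    """Return the indexes of sprints whose route count exceeds the previous sprint's."""
    return [
        index
        for index in range(1, len(route_counts))
        if route_counts[index] > route_counts[index - 1]
    ]
```

For counts of `[42, 37, 39, 30]` the function returns `[2]`: the third sprint added routes and deserves a closer look.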

Key Takeaway

The facade is a scaffold, not a foundation. Demolish it once 80% of traffic is on the new system, or it will become your next legacy problem.

Problems and Considerations: Where the Strangler Fig Breaks

The Strangler Fig pattern looks great on a whiteboard. In production, it hits real friction. First: traffic splitting is easy until you need session affinity. If your old system holds user state in memory and the new system uses a different session store, a 50/50 split means users get logged out randomly. You need sticky sessions or a shared token store—neither is trivial.

Second: reporting and analytics. Your old system logs to one database, your new system logs to another. Suddenly your dashboards show half the data. You'll need a reporting facade that reads from both and deduplicates. That's a hidden cost most teams miss until month three.

Third: domain coupling. If your old monolith has a single table that powers billing, shipping, and notifications, strangling one feature breaks the others. You don't get to pick and choose—you either migrate the shared dependency first or build a translation layer. Both slow you down.

Finally: team coordination. The pattern demands a steady stream of small migrations. If your team rotates or loses context, the facade rots. Keep a living document of which features are migrated and which routes still hit the old system. One stale route and you're debugging production incidents at 2 AM.

SessionStickyRouter.py (Python)

# io.thecodeforge: system-design tutorial

import hashlib

class MockFlagService:
    """Stand-in for a real flag client; reports every flag as active."""
    def is_active(self, flag_name):
        return True

def route_user_request(user_id, path, feature_flag_service, rollout_percent=100):
    # Flag off: nobody reaches the new system
    if not feature_flag_service.is_active("billing_v2"):
        return "old-system"

    # Hash user_id into a stable bucket (0-99) so a user always lands on the same side
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100

    # Sticky percentage rollout: buckets below the threshold go to the new system
    if bucket < rollout_percent:
        return "new-system"
    return "old-system"

# Example usage: flag on, 100% rollout
print(route_user_request("user_42", "/api/billing", MockFlagService()))
# Output: new-system

Output

new-system
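
For the reporting problem described in this section, a reporting facade can be sketched as a merge that keeps one row per event. The example assumes both systems tag rows with a shared `event_id`, which is an assumption you would have to engineer in; `merged_report` is a hypothetical name.

```python
from typing import Dict, List

def merged_report(legacy_rows: List[Dict], new_rows: List[Dict]) -> List[Dict]:
    """Union two event logs, keeping one row per event_id."""
    by_id = {row["event_id"]: row for row in new_rows}
    # Legacy rows overwrite duplicates: legacy stays authoritative until cutover
    by_id.update({row["event_id"]: row for row in legacy_rows})
    return [by_id[event_id] for event_id in sorted(by_id)]
```

Which side wins on a duplicate is a design choice. Here legacy wins, to stay consistent with treating it as the source of truth during migration.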

Real-World Trap:

If your old system uses IP-based rate limiting and your new system uses token-based, migrating traffic 10% at a time will throttle your own users. Always migrate infrastructure dependencies (auth, rate limiting, logging) before feature routes.

Key Takeaway

The Strangler Fig's hidden cost is state coupling—fix shared databases, sessions, and logging before touching routes.

You've read the Strangler Fig page. Now stop treating it like a silver bullet. Here's what you actually need next.

Feature Toggles (Pete Hodgson, on martinfowler.com): This is the routing facade's brain. Understand the difference between release toggles, experiment toggles, and ops toggles. Your Strangler Fig uses all three, and if you mix them up you'll deploy features that aren't ready or kill rollbacks.

Bulkhead Pattern — Once you have old and new systems running side by side, one bad query in the old system can starve the new system of connection pool resources. Bulkhead isolates them. Without it, a crash in legacy code takes down your shiny new microservice.

Change Data Capture (CDC) — For data synchronization, skip the batch jobs. Read Debezium's documentation or Jay Kreps' "The Log: What Every Software Engineer Should Know About Real-Time Data's Unifying Abstraction." CDC lets you stream database changes from your old system to your new one without writing a single ETL cron.

Anti-Corruption Layer (Eric Evans, DDD) — When the old system's data model is garbage (and it always is), this pattern translates between contexts. Your Strangler Fig route handler should call an anti-corruption layer, not the old database directly.

AntiCorruptionTranslator.py (Python)

# io.thecodeforge: system-design tutorial

class OldCustomer:
    def __init__(self, data):
        self.first = data.get("FIRST_NAME", "")
        self.last = data.get("LAST_NM", "")
        self.email = data.get("EMAIL_ADDR", "")

class AntiCorruptionLayer:
    def translate_old_to_new(self, old_customer):
        return {
            "full_name": f"{old_customer.first} {old_customer.last}".strip(),
            "email": old_customer.email.lower(),
            "preferences": {"marketing": True}  # default
        }

# Simulating input from old system
raw = {"FIRST_NAME": "Alice", "LAST_NM": "Smith", "EMAIL_ADDR": "ALICE@EXAMPLE.COM"}
translated = AntiCorruptionLayer().translate_old_to_new(OldCustomer(raw))
print(translated)
# Output: {'full_name': 'Alice Smith', 'email': 'alice@example.com', 'preferences': {'marketing': True}}

Output

{'full_name': 'Alice Smith', 'email': 'alice@example.com', 'preferences': {'marketing': True}}
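
The Bulkhead pattern listed above can be sketched in the same spirit: give each backend its own cap on concurrent calls so that slow legacy requests cannot use up capacity the new service needs. This is an illustrative sketch, not a production limiter; the `Bulkhead` class and its limits are hypothetical.

```python
import threading
from contextlib import contextmanager

class Bulkhead:
    """Caps concurrent calls per backend so one backend cannot exhaust shared capacity."""

    def __init__(self, limits):
        # One independent semaphore per backend name
        self._slots = {name: threading.BoundedSemaphore(n) for name, n in limits.items()}

    @contextmanager
    def call(self, backend):
        slot = self._slots[backend]
        if not slot.acquire(blocking=False):
            raise RuntimeError(f"bulkhead full for {backend}")  # fail fast, do not queue
        try:
            yield
        finally:
            slot.release()
```

Usage is `with bulkhead.call("legacy"): ...` around each outbound request. When the legacy pool is saturated, calls to it fail fast while calls to the new service still proceed.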

Senior Shortcut:

Don't read system design books cover to cover. Go straight to the Anticorruption Layer section of "Domain-Driven Design" by Eric Evans (Chapter 14, "Maintaining Model Integrity"). Then read Martin Fowler's original Strangler Fig Application article from 2004. It's ten minutes and contains more signal than three conference talks.

Key Takeaway

The Strangler Fig is a migration tactic, not a system design. Link it to Anti-Corruption Layer, Feature Toggles, and Bulkhead to make it production-ready.

Production incident · Post-mortem · Severity: high

The Midnight Data Loss That Killed a Migration

Symptom

After 40% of traffic was routed to the new service, users reported missing recent financial transactions. The legacy system still had the data, but the new service couldn't see updates made on the old side.

Assumption

The team assumed a one-way sync from legacy to new was enough once the new service became the primary writer. They forgot that some users were still served by the legacy app during the gradual rollout — it wrote to the old database, which was never replicated back to the new service.

Root cause

Bidirectional data synchronisation was never designed. The team had a script that copied legacy data to the new service, but no reverse sync. When a user on legacy updated their profile, the new service's version became stale.

Fix

Implemented a change data capture (CDC) pipeline using Debezium on the legacy database. Every write on either side was replicated to a shared event stream, and both systems consumed the stream to stay consistent. Took three weeks to backfill and reconcile.

Key lesson

Data sync must be bidirectional during the migration period — not just one way.
Assume every user can be served by either system at any time until migration is complete.
Change data capture with a message broker is the only reliable way to handle dual writes without application-level coupling.

Production debug guide: diagnose and resolve the most common failures during incremental migration (4 entries)

Symptom · 01

Traffic to new service returns 404 or 502 after routing change

→

Fix

Check the proxy/facade routing table — verify the feature flag or URL pattern maps to the correct backend. Use curl -H "X-Force-Route: new-service" to isolate the issue.

Symptom · 02

Users see inconsistent data between old and new UI

Fix

Compare the sync lag between databases. Query the CDC stream offset and the primary key ranges. If lag >30 seconds, throttle traffic to new service until sync catches up.
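
The lag check can be reduced to two timestamps: the newest write on the source and the newest write the consumer has applied. The sketch below assumes you can obtain both. Halving the traffic share is an assumed throttle policy, not something the guidance above prescribes.

```python
from datetime import datetime, timedelta, timezone

MAX_LAG = timedelta(seconds=30)  # threshold taken from the guidance above


def sync_lag(last_source_write: datetime, last_applied: datetime) -> timedelta:
    """How far the consumer trails the newest write on the source."""
    return max(last_source_write - last_applied, timedelta(0))


def allowed_new_percent(current_percent: int, lag: timedelta) -> int:
    """Halve the new service's traffic share while lag exceeds the threshold."""
    if lag > MAX_LAG:
        return current_percent // 2
    return current_percent


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
lag = sync_lag(now, now - timedelta(seconds=45))
allowed_new_percent(40, lag)  # 20, because 45 s exceeds the 30 s threshold
```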

Symptom · 03

Rollback doesn't restore all data — some writes lost

Fix

You likely have a partial dual-write failure. Check application logs for exceptions in the synchronous replication path. Implement a retry with dead-letter queue for failed write operations.
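
A minimal sketch of the retry-plus-dead-letter idea, assuming the replication write is a plain callable that raises on failure. `write_with_retry` and the in-memory `dead_letters` list are hypothetical; in production the dead-letter store would be a durable queue or table.

```python
import time
from typing import Callable


def write_with_retry(
    write: Callable[[dict], None],
    record: dict,
    dead_letters: list,
    attempts: int = 3,
    base_delay: float = 0.1,
) -> bool:
    """Try the replication write; park the record if every attempt fails."""
    last_error = None
    for attempt in range(attempts):
        try:
            write(record)
            return True
        except Exception as exc:  # narrow this to your client's error types
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    # Never drop the write silently: keep it with the reason for later replay.
    dead_letters.append({"record": record, "error": repr(last_error)})
    return False
```

The return value lets the caller decide whether to fail the user's request or accept it and rely on a later replay from the dead-letter store.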

Symptom · 04

New service is slower than legacy under load

Fix

The new service may not be tuned yet. Compare response times. If it's a database query issue, add indexes or caching. Do NOT blame the pattern — it's a performance problem, not a migration problem.

★ Quick Debug Cheat Sheet: Strangler Fig Failures

The three most common production failures during a strangler fig migration and exactly what to do when they hit.

Proxy routing sends traffic to wrong backend

Immediate action

Check the routing config file and revert the last change. Use the rollback button in your CI/CD or manually update the proxy.

Commands

curl -v http://proxy/api/users/123 --header "X-Original-Route: legacy"

diff current_routing.yaml previous_routing.yaml

Fix now

Apply the previous routing config and verify with curl. Then investigate why the new rule failed.

Data mismatch between old and new databases

Feature flag stuck in half-open state

Migration Strategies Compared

Strategy	Risk per change	Rollback time	Parallel maintenance	Best for
Strangler Fig	Bounded (one feature)	Minutes (flag off)	Long (months)	Large monoliths with clear service boundaries
Big Bang Rewrite	Entire system	Hours to days (data migration)	Short (weeks)	Small systems (<50k LOC) or when current system is net new
Branch by Abstraction	Bounded (one abstraction)	Hours (commit revert + cache clear)	Medium (months)	When you can't add a proxy layer (e.g., mobile SDK)

Key takeaways

Strangler Fig pattern replaces legacy systems one feature at a time through a routing proxy.

Risk is bounded to the feature slice being migrated; rollback is a flag toggle away.

Data synchronisation (bidirectional) is the hardest and most failure-prone part.

Always load-test new services at 2x traffic before ramping beyond 10%.

Test rollback in staging before going to production; verify the proxy actually reverts.

If features are tightly coupled via shared state, extract coupling first or choose another strategy.

Common mistakes to avoid

4 patterns

Migrating the database before the service

Symptom

The new service can't read legacy data because the schema differs, or dual writes fail silently.

Fix

Migrate the service first (just new code talking to old DB via abstraction), then migrate the database later. Or use database views to present a unified schema during migration.

Not planning for bidirectional data sync

Symptom

Users see stale data when switching between old and new UI during the rollout phase.

Fix

Implement CDC (Debezium, AWS DMS) from both databases to a message queue. Both systems consume the stream to stay eventually consistent.

Ramping traffic too fast without load testing the new service

Symptom

New service crashes under sudden full load, taking down the entire feature for all users.

Fix

Load test the new service at 2x expected production traffic. Ramp traffic from 1% → 5% → 10% → 25% → 50% → 100% with 24-hour observation windows at each step.
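
The ramp schedule above can be written down as a small gate function so that nobody advances a step by hand too early. The 1% error-rate ceiling and the "drop to 0% on breach" rule are assumptions for this sketch; pick thresholds that match your own service-level objectives.

```python
RAMP_STEPS = [1, 5, 10, 25, 50, 100]  # percentages from the schedule above


def next_step(
    current: int,
    hours_observed: float,
    error_rate: float,
    max_error_rate: float = 0.01,
) -> int:
    """Return the traffic percentage the new service should receive next."""
    if error_rate > max_error_rate:
        return 0  # assumed policy: any breach sends all traffic back to legacy
    if hours_observed < 24 or current not in RAMP_STEPS:
        # Hold until the 24-hour window has passed; values outside the
        # schedule are held as well and left for manual review.
        return current
    index = RAMP_STEPS.index(current)
    return RAMP_STEPS[min(index + 1, len(RAMP_STEPS) - 1)]


next_step(5, hours_observed=24, error_rate=0.001)  # 10
next_step(5, hours_observed=12, error_rate=0.001)  # 5, window not finished
```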

Assuming the proxy is stateless when it holds routing state

Symptom

Proxy restarts lose the current traffic split percentage, sending all traffic to one backend.

Fix

Externalise routing state to a config service (Consul, Etcd) or a database. The proxy should reload config on startup, not start with hardcoded defaults.
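
To illustrate the shape of the fix, the sketch below persists the traffic split outside the proxy process and refuses to start without it. A JSON file stands in for Consul or Etcd here purely to keep the example self-contained; `RoutingState` and the split format are hypothetical.

```python
import json
from pathlib import Path


class RoutingState:
    """Traffic split persisted outside the proxy process."""

    def __init__(self, path: Path):
        self.path = path

    def load(self) -> dict:
        # Fail loudly if state is missing rather than falling back to a
        # hardcoded split that may route everything to one backend.
        if not self.path.exists():
            raise RuntimeError(f"routing state not found at {self.path}")
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, split: dict) -> None:
        # Write to a temp file, then rename, so a crash mid-write cannot
        # leave a half-written state file behind.
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(split), encoding="utf-8")
        tmp.replace(self.path)
```

On startup the proxy would call `load()` before accepting traffic, and every ramp change would go through `save()` first.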

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Explain the Strangler Fig Pattern. How does it differ from a big-bang re...

Q02 · SENIOR

How do you handle data consistency during a Strangler Fig migration?

Q03 · SENIOR

What happens if the new service fails after 60% traffic is routed to it?...

Q04 · SENIOR

When would you NOT recommend the Strangler Fig pattern?

Q01 of 04 · SENIOR

Explain the Strangler Fig Pattern. How does it differ from a big-bang rewrite?

ANSWER

The Strangler Fig pattern replaces a legacy system incrementally by routing traffic through a proxy that redirects requests to new microservices one feature at a time. The legacy system continues running in parallel for unmigrated features. A big-bang rewrite replaces the entire system at once, which carries enormous risk — a single bug affects all users, rollback is slow, and the project often fails due to scope creep. Strangler Fig caps risk to the current feature slice and enables instant rollback (just turn off the feature flag).


