OSPF Metric Blind Spot — Why 10Gbps and 100Mbps Look Equal
Default OSPF cost=1 on both 10Gbps and transatlantic 100Mbps caused >200ms latency.
- RIP uses Bellman-Ford for distance vector routing with a hop-count metric
- OSPF uses Dijkstra's SPF for link-state routing with a cost derived from bandwidth
- BGP is a path vector protocol that exchanges reachability and policy attributes
- Convergence: RIP can take minutes (count-to-infinity), OSPF converges in under 10s
- Production insight: metric misconfiguration causes silent suboptimal routing
Imagine your city has thousands of roads and you need to drive a package from New York to Los Angeles. A routing protocol is like a GPS system that every intersection uses to talk to its neighbours — each junction shares what roads it knows about, how congested they are, and updates its map when a road closes. The 'protocol' is just the agreed language all those intersections use to gossip with each other so every driver always takes the best available path.
Every time you load a webpage, your request hops across dozens of routers spanning continents, undersea cables, and data-centres owned by completely different companies. None of those routers were pre-programmed with a static map of the entire internet — that would be impossible to maintain. Instead, they run routing protocols: living, breathing algorithms that continuously discover the network topology, elect the best paths, and heal themselves when links go dark. Understanding this machinery isn't academic; it's what separates an engineer who can debug a production outage from one who just restarts the router and hopes.
The core problem routing protocols solve is dynamic reachability at scale. A static route you add manually works fine for a lab with five subnets. It collapses the moment a link fails or a new site comes online, because nothing automatically redistributes that knowledge. Routing protocols replace human intervention with distributed computation: every router converges on a consistent, loop-free set of routes without a central coordinator, in anything from minutes (RIP) to well under a second (tuned OSPF with BFD), depending on the protocol and its timers.
By the end of this article you'll understand exactly how Bellman-Ford powers RIP and why it causes count-to-infinity, how Dijkstra's SPF algorithm inside OSPF builds a loop-free topology, why BGP is a policy engine masquerading as a routing protocol, and how to reason about convergence time and route selection in production networks. You'll also walk away with concrete Python simulations you can run locally to watch these algorithms think.
Where Routing Protocols Fit in Networking
Every router maintains a routing table — the list of every destination network it knows about and how to reach it. But routers don't build that table by magic. They rely on routing protocols to dynamically discover paths, adapt to failures, and distribute reachability across an autonomous system or the entire internet.
The distinction between routing protocols and routed protocols is crucial for production thinking. Routed protocols like IP carry user data. Routing protocols like OSPF, BGP, and RIP carry routing information between routers. If the routing protocol goes down, the IP traffic doesn't necessarily stop — the routers still have stale routes until they time out. That stale state is a common root cause of traffic blackholes.
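To make the routing table and its stale-state risk concrete, here is a minimal sketch of a table that does a longest-prefix-match lookup and keeps serving a learned route until it times out. The `Route` and `RoutingTable` classes are hypothetical, and the 180-second timeout is an assumption borrowed from RIP's default route timeout; real routers use very different data structures.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

# Assumed timeout, borrowed from RIP's default 180-second route timeout.
ROUTE_TIMEOUT_S = 180.0


@dataclass
class Route:
    prefix: ipaddress.IPv4Network
    next_hop: str
    learned_at: float  # when the routing protocol last refreshed this route


class RoutingTable:
    """Hypothetical routing table: longest-prefix match plus route ageing."""

    def __init__(self) -> None:
        self.routes: List[Route] = []

    def add(self, prefix: str, next_hop: str, now: float) -> None:
        self.routes.append(Route(ipaddress.IPv4Network(prefix), next_hop, now))

    def lookup(self, dest: str, now: float) -> Optional[str]:
        """Next hop of the longest matching prefix that has not yet timed out."""
        addr = ipaddress.IPv4Address(dest)
        live = [
            r for r in self.routes
            if addr in r.prefix and now - r.learned_at < ROUTE_TIMEOUT_S
        ]
        if not live:
            return None
        return max(live, key=lambda r: r.prefix.prefixlen).next_hop


table = RoutingTable()
table.add("10.0.0.0/8", "192.0.2.1", now=0.0)
table.add("10.1.0.0/16", "192.0.2.2", now=0.0)

print(table.lookup("10.1.2.3", now=10.0))   # 192.0.2.2 (the /16 beats the /8)
print(table.lookup("10.9.9.9", now=10.0))   # 192.0.2.1
# Suppose the neighbour at 192.0.2.2 dies at t=20 and never sends an update:
# the stale route keeps attracting traffic until the timeout expires.
print(table.lookup("10.1.2.3", now=170.0))  # 192.0.2.2
print(table.lookup("10.1.2.3", now=200.0))  # None (neither route was refreshed)
```

Between t=20 and t=180 in this sketch, packets for 10.1.0.0/16 are forwarded to a dead neighbour: that window is the blackhole.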
RIP — Distance Vector Routing and Its Production Limits
RIP (Routing Information Protocol) is one of the simplest routing protocols. Every router sends its entire routing table to its neighbours every 30 seconds. Each route carries a hop count, a metric that counts how many routers you must cross to reach the destination. The maximum is 15 hops; 16 means unreachable.
RIP runs a distributed form of the Bellman-Ford algorithm: each router rebuilds its table from the distance vectors its neighbours advertise, keeping for each destination the neighbour that offers the lowest hop count (that neighbour's metric + 1). RIP converges slowly because of the count-to-infinity problem: when a link fails, the bad news spreads slowly, and routers may keep believing a path exists through a neighbour that has itself lost the route. Each exchange adds one to the hop count, which creeps upward until it reaches 16 and the route is finally declared unreachable.
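The Bellman-Ford exchange described above can be sketched as synchronous update rounds on a small topology. This is a simplified model, not RIP itself: the topology is hypothetical, all routers update in lockstep (real RIP timers are unsynchronised), only metrics are tracked (no next hops), and nothing is ever withdrawn.

```python
INFINITY = 16  # RIP treats a metric of 16 as unreachable

# Hypothetical topology: a chain A-B-C-D with an extra A-C link.
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
ROUTERS = sorted({r for link in LINKS for r in link})
NEIGHBOURS = {r: set() for r in ROUTERS}
for a, b in LINKS:
    NEIGHBOURS[a].add(b)
    NEIGHBOURS[b].add(a)


def exchange_round(tables):
    """One update period: every router relaxes its table against the vectors
    its neighbours advertised in the previous period (Bellman-Ford)."""
    new_tables, changed = {}, False
    for r in ROUTERS:
        table = dict(tables[r])
        for n in NEIGHBOURS[r]:
            for dest, metric in tables[n].items():
                candidate = min(metric + 1, INFINITY)
                if candidate < table.get(dest, INFINITY):
                    table[dest] = candidate
                    changed = True
        new_tables[r] = table
    return new_tables, changed


# Each router initially knows only itself, at metric 0.
tables = {r: {r: 0} for r in ROUTERS}
rounds, changed = 0, True
while changed:
    tables, changed = exchange_round(tables)
    rounds += 1

print(rounds)                       # 3: two rounds of changes, then one quiet round
print(sorted(tables["A"].items()))  # [('A', 0), ('B', 1), ('C', 1), ('D', 2)]
```

Each round corresponds to one 30-second update period in the worst case, so even good news takes roughly one period per hop of network diameter to spread.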
RIP is rarely used in production today. Its 15-hop limit caps the network diameter, which rules out large networks, and convergence can take minutes, which is unacceptable for modern applications. However, RIP's simplicity makes it an excellent teaching tool: understanding its flaws explains why OSPF and BGP exist.
- When a link fails, the router that discovers it sets the metric to 16 and announces it.
- Neighbours that learned the route through that router have not yet heard the update, so they still consider the destination reachable and keep advertising it.
- The router that lost the link hears a neighbour's stale advertisement, assumes there is a path through that neighbour, and sets its own metric to the neighbour's metric + 1.
- This cycle repeats, each time incrementing the metric, until it reaches 16 and is finally considered unreachable.
- RIP mitigates this with split horizon, route poisoning, and poison reverse, but these only stop loops between two directly connected neighbours; loops involving three or more routers can still count to infinity.
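The steps above can be replayed in a two-router sketch. Assumptions: routers R1 and R2 share a link, network N hangs off R1, the R1-N link has just failed, and R2's periodic update reaches R1 before R1 announces the failure. Updates alternate between the two routers; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

INFINITY = 16
Route = Tuple[int, Optional[str]]  # (metric, next hop); None = directly connected


def receive(route: Route, adv_metric: int, sender: str) -> Route:
    """Distance-vector update rule: always believe your current next hop;
    otherwise switch only to a strictly better path."""
    metric, next_hop = route
    candidate = min(adv_metric + 1, INFINITY)
    if next_hop == sender or candidate < metric:
        return (candidate, sender)
    return route


def count_to_infinity(split_horizon: bool, max_steps: int = 50) -> List[Tuple[int, int]]:
    """Return the (R1 metric, R2 metric) history for N, one entry per update."""
    r1: Route = (INFINITY, None)  # R1 has just lost its directly connected route
    r2: Route = (2, "R1")         # R2 still thinks N is 2 hops away via R1
    history = [(r1[0], r2[0])]
    r2_speaks = True              # alternate senders, R2 first (the race)
    for _ in range(max_steps):
        if r2_speaks:
            # Split horizon: never advertise a route back to its next hop.
            if not (split_horizon and r2[1] == "R1"):
                r1 = receive(r1, r2[0], "R2")
        else:
            if not (split_horizon and r1[1] == "R2"):
                r2 = receive(r2, r1[0], "R1")
        history.append((r1[0], r2[0]))
        r2_speaks = not r2_speaks
        if r1[0] == INFINITY and r2[0] == INFINITY:
            break
    return history


no_sh = count_to_infinity(split_horizon=False)
with_sh = count_to_infinity(split_horizon=True)
print(len(no_sh) - 1, no_sh[-1])      # 15 (16, 16)
print(len(with_sh) - 1, with_sh[-1])  # 2 (16, 16)
```

Without split horizon the metrics climb 3, 4, 5, ... until both routers hit 16 after 15 updates; with 30-second periodic updates and no triggered updates, that alone would take about seven and a half minutes. With split horizon, R2 never feeds the route back to R1, so the failure settles in two updates.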
OSPF — Link State Routing with Fast Convergence
OSPF (Open Shortest Path First) is a link-state routing protocol. Instead of exchanging routing tables, OSPF routers flood Link State Advertisements (LSAs) to all routers in the same area. Every router builds an identical Link State Database (LSDB) of the entire network topology. Then each router runs Dijkstra's Shortest Path First (SPF) algorithm on this database to compute the shortest path tree to every destination.
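The SPF step can be sketched with Python's `heapq`. This is a simplified model under stated assumptions: the LSDB is a hypothetical adjacency map of link costs rather than real router and network LSAs, and equal-cost paths are not kept (real OSPF can install ECMP routes).

```python
import heapq
from typing import Dict, Optional, Tuple

# Hypothetical LSDB, identical on every router in the area after flooding:
# each router lists its neighbours and the OSPF cost of the link to each.
LSDB: Dict[str, Dict[str, int]] = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R4": 1},
    "R3": {"R1": 1, "R4": 1},
    "R4": {"R2": 1, "R3": 1},
}


def spf(lsdb: Dict[str, Dict[str, int]], root: str) -> Dict[str, Tuple[int, Optional[str]]]:
    """Dijkstra's SPF: map each reachable router to (total cost, first hop from root)."""
    settled: Dict[str, Tuple[int, Optional[str]]] = {}
    heap = [(0, root, None)]  # (cost so far, router, first hop out of root)
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in settled:
            continue  # already reached via an equal or cheaper path
        settled[node] = (cost, first_hop)
        for neighbour, link_cost in lsdb[node].items():
            if neighbour not in settled:
                hop = neighbour if node == root else first_hop
                heapq.heappush(heap, (cost + link_cost, neighbour, hop))
    return settled


for dest, (cost, hop) in sorted(spf(LSDB, "R1").items()):
    print(dest, cost, hop)
# R1 0 None
# R2 3 R3
# R3 1 R3
# R4 2 R3
```

R1 reaches R2 through R3 at cost 3 (three cost-1 hops) rather than over the direct cost-10 link, which is exactly the kind of decision the cost metric drives.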
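OSPF uses a metric called cost, which defaults to reference_bandwidth / interface_bandwidth. This makes path selection sensitive to bandwidth, but only up to the reference. With the common default reference bandwidth of 100 Mbps, a 100Mbps link gets cost=1, and so do 1Gbps and 10Gbps links, because the cost is an integer that never drops below 1. Raise the reference to 100Gbps and the costs become 1000, 100, and 10 respectively. That's why you must adjust the reference bandwidth to match your fastest links, and set it identically on every router in the domain.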
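The cost formula is easy to check numerically. This sketch assumes the usual vendor behaviour of integer division with a floor of 1 and a cap of 65535 (the cost is a 16-bit field); exact rounding may differ by platform.

```python
MBPS = 1_000_000


def ospf_cost(interface_bps: int, reference_bps: int = 100 * MBPS) -> int:
    """cost = reference bandwidth / interface bandwidth, as an integer.

    Assumed rounding: integer division, never below 1, capped at 65535.
    """
    return min(65535, max(1, reference_bps // interface_bps))


speeds = {"100Mbps": 100 * MBPS, "1Gbps": 1_000 * MBPS, "10Gbps": 10_000 * MBPS}
for reference in (100 * MBPS, 100_000 * MBPS):
    costs = {name: ospf_cost(bps, reference) for name, bps in speeds.items()}
    print(reference // MBPS, costs)
# 100 {'100Mbps': 1, '1Gbps': 1, '10Gbps': 1}
# 100000 {'100Mbps': 1000, '1Gbps': 100, '10Gbps': 10}
```

With the 100 Mbps reference, all three speeds collapse to cost 1; with a 100 Gbps reference they separate by factors of ten.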
OSPF converges in seconds because a link failure triggers an immediate LSA flood, and each router then recalculates its SPF tree independently from its synchronised LSDB; unlike distance-vector protocols, no router has to wait for its neighbours to recompute before the news moves on. However, SPF recalculations can be CPU-intensive in large topologies. OSPF designs mitigate this with areas (hierarchical design), and many implementations add SPF throttling timers and incremental SPF (iSPF).
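The flooding that keeps every LSDB identical can be sketched as follows. This is a simplified model, assuming a hypothetical four-router area: it keeps only the core rule (install an LSA only if its sequence number is newer, then re-flood to every neighbour except the sender) and omits acknowledgements, retransmission, and LSA ageing.

```python
from collections import deque
from typing import Dict, List

# Hypothetical area: which routers are adjacent to which.
ADJ: Dict[str, List[str]] = {
    "R1": ["R2", "R3"],
    "R2": ["R1", "R4"],
    "R3": ["R1", "R4"],
    "R4": ["R2", "R3"],
}


def flood(adj, origin, lsa_id, seq, lsdbs):
    """Install an LSA only if its sequence number is newer than the stored
    copy, then re-flood it to every neighbour except the one it came from.
    Returns the number of LSA transmissions."""
    lsdbs[origin][lsa_id] = seq  # the originator installs its own new LSA
    queue = deque((origin, n) for n in adj[origin])
    transmissions = 0
    while queue:
        sender, receiver = queue.popleft()
        transmissions += 1
        if lsdbs[receiver].get(lsa_id, -1) >= seq:
            continue  # duplicate or older copy: drop it, do not re-flood
        lsdbs[receiver][lsa_id] = seq
        queue.extend((receiver, n) for n in adj[receiver] if n != sender)
    return transmissions


lsdbs = {r: {} for r in ADJ}
# R1 announces a change to its links (for example, after a link failure).
sent = flood(ADJ, "R1", "router-lsa:R1", seq=1, lsdbs=lsdbs)
print(sent)                                                     # 5
print(all(db == {"router-lsa:R1": 1} for db in lsdbs.values()))  # True
```

The sequence-number check is what stops the flood: R4 receives the LSA twice, installs it once, and the duplicate dies there. Once every copy matches, each router runs SPF on its own.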
BGP — The Internet's Policy Engine
BGP (Border Gateway Protocol) is not a typical routing protocol. It doesn't find the shortest path based on bandwidth or hops. Instead, it exchanges reachability information and applies policy. BGP is a path vector protocol: each route advertisement carries the entire AS_PATH — the list of autonomous systems the route has traversed. This prevents loops (an AS will reject its own AS in the path).
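AS_PATH loop prevention reduces to two rules: prepend your own AS number when advertising to an eBGP peer, and reject any update whose path already contains it. A minimal sketch, modelling eBGP only; the function names are hypothetical and the AS numbers come from the documentation range.

```python
from typing import Optional, Tuple

AsPath = Tuple[int, ...]


def accept_from_ebgp_peer(local_as: int, as_path: AsPath) -> Optional[AsPath]:
    """Loop prevention: reject an update whose AS_PATH already contains our AS."""
    return None if local_as in as_path else as_path


def advertise_to_ebgp_peer(local_as: int, as_path: AsPath) -> AsPath:
    """Prepend our own AS number before sending the route to an eBGP neighbour."""
    return (local_as,) + as_path


# A prefix originated in AS 64496 propagates 64496 -> 64497 -> 64498.
path = advertise_to_ebgp_peer(64496, ())
for asn in (64497, 64498):
    received = accept_from_ebgp_peer(asn, path)
    if received is None:
        break
    path = advertise_to_ebgp_peer(asn, received)

print(path)                                # (64498, 64497, 64496)
# If AS 64498 advertises it back towards the originator, the loop check fires.
print(accept_from_ebgp_peer(64496, path))  # None
```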
BGP's primary metric is not latency or bandwidth but administrative policy. Network operators use BGP attributes like Local Preference (LOCAL_PREF), Multi-Exit Discriminator (MED), AS_PATH length, and community tags to influence inbound and outbound traffic. The BGP decision process has around a dozen tie-breaking steps, and the exact list varies slightly by vendor; it runs from highest LOCAL_PREF through shortest AS_PATH and lowest MED to lowest IGP metric to the next hop, and finally to the lowest router ID.
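A simplified subset of that decision process can be expressed as a sort key. Assumptions: only five steps are modelled (vendor-specific weight, origin, eBGP-over-iBGP and others are skipped), MED is compared across all routes even though real BGP normally compares it only between routes from the same neighbouring AS, and LOCAL_PREF defaults to the common value of 100.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BgpRoute:
    peer: str
    local_pref: int = 100         # a common default LOCAL_PREF
    as_path: Tuple[int, ...] = ()
    med: int = 0
    igp_metric: int = 0           # IGP cost to reach the route's BGP next hop
    router_id: str = "0.0.0.0"    # router ID of the advertising BGP speaker


def preference_key(route: BgpRoute):
    """Smaller key = more preferred. A simplified subset of the decision process."""
    return (
        -route.local_pref,                            # highest LOCAL_PREF
        len(route.as_path),                           # shortest AS_PATH
        route.med,                                    # lowest MED (simplified)
        route.igp_metric,                             # lowest IGP metric to next hop
        int(ipaddress.IPv4Address(route.router_id)),  # lowest router ID
    )


def best_path(routes: List[BgpRoute]) -> BgpRoute:
    return min(routes, key=preference_key)


routes = [
    BgpRoute("transit", local_pref=200, as_path=(64497, 64499), router_id="192.0.2.1"),
    BgpRoute("backup", local_pref=100, as_path=(64499,), router_id="192.0.2.2"),
    BgpRoute("peer", local_pref=200, as_path=(64498, 64500, 64499), router_id="192.0.2.3"),
]
print(best_path(routes).peer)  # transit
```

LOCAL_PREF is checked first, so the transit route wins even though the backup route has the shortest AS_PATH; between the two LOCAL_PREF 200 routes, AS_PATH length decides. Router IDs are compared as numbers, so 192.0.2.9 beats 192.0.2.10.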
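BGP convergence is complex, especially after a route is withdrawn. The path exploration problem can cause delays of minutes: BGP speakers try successively longer alternative paths before giving up, and each round is paced by the MinRouteAdvertisementInterval (MRAI), which many implementations set to 30 seconds on eBGP sessions. Techniques like BGP Prefix Independent Convergence (PIC) and ADD-PATH (advertising more than one path per prefix) mitigate this.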
- A route is like a treaty: 'I will send traffic to network X through AS Y because we have a peering agreement.'
- Local Preference is your internal policy: 'I prefer routes learned from my transit provider over my backup.'
- AS_PATH is a rough distance measure: a shorter path means fewer intermediary networks, but it says nothing about latency, and a longer path may still be the more reliable one.
- MED is a suggestion: 'I'd prefer you enter my AS through this specific border router.'
- Communities are tags that convey intent across administrative boundaries.
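Local Preference and communities usually meet in an inbound policy, the job a route-map does on a real router. The sketch below is hypothetical throughout: the role-to-LOCAL_PREF table and the community `64496:80` meaning "deprioritise me" are invented values, since every network defines its own.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical policy values; every network defines its own.
LOCAL_PREF_BY_ROLE = {"transit": 200, "backup": 100}
COMMUNITY_TO_LOCAL_PREF = {"64496:80": 80}  # hypothetical "deprioritise me" tag


@dataclass
class Route:
    prefix: str
    neighbour_role: str
    communities: Set[str] = field(default_factory=set)
    local_pref: int = 100


def apply_inbound_policy(route: Route) -> Route:
    """Route-map-like sketch: set LOCAL_PREF from the neighbour's role,
    then let any recognised community override it."""
    route.local_pref = LOCAL_PREF_BY_ROLE.get(route.neighbour_role, route.local_pref)
    for community in sorted(route.communities):  # sorted for a deterministic order
        if community in COMMUNITY_TO_LOCAL_PREF:
            route.local_pref = COMMUNITY_TO_LOCAL_PREF[community]
    return route


a = apply_inbound_policy(Route("198.51.100.0/24", "transit"))
b = apply_inbound_policy(Route("198.51.100.0/24", "backup"))
c = apply_inbound_policy(Route("203.0.113.0/24", "transit", {"64496:80"}))
print(a.local_pref, b.local_pref, c.local_pref)  # 200 100 80
```

The transit route outranks the backup, matching the Local Preference example above, while the community lets the sender ask for a lower preference across the administrative boundary.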
Convergence and Performance: Engineering for Fast Recovery
Convergence is the time it takes for all routers to agree on a consistent view of the network after a change. The required convergence time depends on the application: voice calls need recovery well under a second, email tolerates outages of several seconds or more, and financial trading systems often target sub-50ms recovery.
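RIP converges in tens of seconds to minutes due to hold-down timers and count-to-infinity. OSPF typically converges in 5-10 seconds with default timers when the failure is signalled by the interface going down; if the failure is only noticed through missing hellos, the default 40-second dead interval dominates. With BFD, detection and convergence can drop below 1 second. BGP convergence is trickier: after a route withdrawal, BGP may explore alternative paths for minutes (path exploration).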
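A rough convergence budget shows why failure detection usually dominates. In this sketch only the 40-second OSPF dead interval is a protocol default; the BFD settings, SPF delay, SPF run time, and FIB update time are assumed values chosen for illustration.

```python
# Rough link-state convergence budget: detect + wait + compute + program the FIB.
DEAD_INTERVAL_MS = 40_000   # default OSPF dead interval on broadcast/point-to-point links
BFD_DETECT_MS = 50 * 3      # assumed BFD session: 50 ms interval, multiplier 3
SPF_DELAY_MS = 50           # assumed SPF throttle initial wait
SPF_RUN_MS = 20             # assumed SPF computation time
FIB_UPDATE_MS = 100         # assumed time to reprogram forwarding entries


def convergence_ms(detect_ms: int) -> int:
    """Total outage estimate for a given failure-detection time."""
    return detect_ms + SPF_DELAY_MS + SPF_RUN_MS + FIB_UPDATE_MS


print(convergence_ms(DEAD_INTERVAL_MS))  # 40170: the hello timeout dominates
print(convergence_ms(BFD_DETECT_MS))     # 320: BFD brings it under a second
```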
- BFD (Bidirectional Forwarding Detection): sub-second link failure detection independent of routing protocol.
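- BGP PIC (Prefix Independent Convergence): pre-computes backup paths and installs them in the FIB, so failover time does not grow with the number of prefixes.
- LFA (Loop-Free Alternate) in OSPF/IS-IS: installs a pre-computed backup next hop so traffic can be switched without waiting for SPF.
- Graceful Restart: lets a router keep forwarding while its control plane restarts, provided its neighbours cooperate as helpers.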
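The trade-off: faster convergence usually costs more state, CPU, or memory. BFD adds packet and CPU overhead, and LFA adds a backup next hop per protected prefix, which can roughly double next-hop state in the FIB. Choose based on your failure rate and how much disruption your traffic can tolerate.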
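The LFA technique rests on one inequality, the basic loop-free condition from RFC 5286: a neighbour N of source S is a safe alternate for destination D if dist(N, D) < dist(N, S) + dist(S, D), meaning N's own shortest path to D does not lead back through S. A minimal sketch on a hypothetical topology; node protection, downstream-path conditions, and ECMP are ignored.

```python
import heapq
from typing import Dict, List

# Hypothetical topology with symmetric link costs.
GRAPH: Dict[str, Dict[str, int]] = {
    "S": {"E": 1, "N1": 1, "N2": 1},
    "E": {"S": 1, "D": 1},
    "N1": {"S": 1, "D": 2, "N2": 5},
    "N2": {"S": 1, "N1": 5},
    "D": {"E": 1, "N1": 2},
}


def distances(graph: Dict[str, Dict[str, int]], root: str) -> Dict[str, int]:
    """Plain Dijkstra: shortest-path cost from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            if cost + weight < dist.get(nbr, float("inf")):
                dist[nbr] = cost + weight
                heapq.heappush(heap, (cost + weight, nbr))
    return dist


def loop_free_alternates(graph, source: str, dest: str, primary: str) -> List[str]:
    """Neighbours N of source, other than the primary next hop, satisfying
    dist(N, D) < dist(N, S) + dist(S, D), so N will not loop traffic back to S."""
    dist_s = distances(graph, source)
    alternates = []
    for n in graph[source]:
        if n == primary:
            continue
        dist_n = distances(graph, n)
        if dist_n[dest] < dist_n[source] + dist_s[dest]:
            alternates.append(n)
    return sorted(alternates)


print(distances(GRAPH, "S")["D"])                   # 2, via primary next hop E
print(loop_free_alternates(GRAPH, "S", "D", "E"))   # ['N1']
```

N2 is rejected: its shortest path to D (cost 3) runs back through S, so handing it traffic when E fails would just bounce packets back to S.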
OSPF Metric Misconfiguration Caused East-West Traffic to Traverse a Transatlantic Link
- Never assume default OSPF costs reflect real path quality.
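- Always tune the reference bandwidth to match your fastest links, and keep it consistent on every router.
- Set explicit interface costs where bandwidth alone misrepresents path quality (for example, on long-haul links), and use administrative distance or route-maps when you need to choose between protocols or filter routes.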
- Monitor traffic flows after any routing configuration change — a silent diversion can be worse than a visible outage.
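The incident topology itself is not described here, so the following uses a hypothetical two-path network to show how the default reference bandwidth can pull traffic onto a slow transatlantic link. It assumes vendor-style cost rounding (integer division, floor of 1) and compares total path costs directly rather than running a full SPF.

```python
MBPS = 1_000_000


def ospf_cost(interface_bps: int, reference_bps: int) -> int:
    """Assumed vendor-style rounding: integer division, never below 1."""
    return max(1, reference_bps // interface_bps)


# Hypothetical topology: two candidate paths between the same pair of routers.
PATHS = {
    "local 10Gbps fabric (2 hops)": [10_000 * MBPS, 10_000 * MBPS],
    "transatlantic 100Mbps (1 hop)": [100 * MBPS],
}


def choose_path(reference_bps: int):
    """Total each path's OSPF cost and return (winner, totals)."""
    totals = {
        name: sum(ospf_cost(bps, reference_bps) for bps in links)
        for name, links in PATHS.items()
    }
    return min(totals, key=totals.get), totals


for reference in (100 * MBPS, 100_000 * MBPS):
    winner, totals = choose_path(reference)
    print(f"reference {reference // MBPS} Mbps: {totals} -> {winner}")
```

With the 100 Mbps reference, the transatlantic path costs 1 against 2 for the local fabric, so OSPF quietly prefers it; raising the reference to 100 Gbps makes the costs 1000 and 20, and traffic stays local.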
Common mistakes to avoid
- Assuming OSPF cost default works for all link speeds
- Not filtering BGP routes inbound and outbound
- Using RIP in production with default timers
- Forgetting to configure OSPF network type on point-to-point links
- Mismatching BGP timers or update source interface
Interview Questions on This Topic
Explain the difference between distance vector and link state routing protocols. Give one advantage and one disadvantage of each.