Design Tinder: Building a Geo-Distributed Matching Engine That Won't Swipe Left at 3AM
Design Tinder's matching system with real-world trade-offs: geo-sharding, Redis sorted sets, and the 500ms swipe SLA.
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
Tinder's matching system uses geo-sharded Redis sorted sets for proximity queries, Cassandra for user profiles, and a Kafka-backed event pipeline for real-time updates. The core challenge is maintaining low latency for swipe actions while ensuring eventual consistency across regions.
Imagine a giant map of your city with every single person as a glowing dot. When you swipe right, you're basically shouting 'I'm interested!' within a 50-mile radius. The system has to instantly find all nearby dots, check who already swiped right on you, and if there's a match, light up both dots. Now do this for 50 million people simultaneously, and you've got Tinder's backend.
Everyone thinks Tinder is just a fancy SQL query with a WHERE clause on distance. That's cute. Until your database melts at 2 AM on a Saturday because 10,000 people in Manhattan all swiped right at once. The real challenge isn't matching — it's doing it in under 500 milliseconds while handling 1.8 million swipes per second at peak. This isn't a CRUD app. It's a real-time geo-distributed event system with a side of social graph. By the end of this, you'll know how to build a matching engine that doesn't fall over when a Taylor Swift concert ends.
Why Geo-Sharding Is Non-Negotiable
Your first instinct is to put all user locations in a single Redis sorted set. That works for 100,000 users. At 10 million, every ZRANGEBYSCORE over a dense area returns so many members (the command costs O(log N + M), where M is the number of results) that it takes 500ms. At 100 million, your Redis instance runs out of memory and, depending on maxmemory-policy, either starts evicting keys or starts rejecting writes. The fix is geo-sharding: partition your data by geohash cells. Each cell holds a manageable number of users, and queries only hit the relevant cells. This is the difference between a system that scales and a system that burns.
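A minimal sketch of how a per-cell key could be derived, assuming the standard base-32 geohash encoding and a hypothetical `location:<geohash>` key per cell. At five characters a cell is roughly 5 km × 5 km at the equator. Redis's own GEOADD and GEOSEARCH commands store a 52-bit geohash as the sorted-set score, so in production you would likely let Redis do the encoding; this pure-Python version only makes the bucketing visible.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a coordinate as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, n_bits = 0, 0
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)

def cell_key(lat: float, lon: float, precision: int = 5) -> str:
    # Hypothetical key scheme: one sorted set per geohash cell.
    return f"location:{geohash_encode(lat, lon, precision)}"

print(cell_key(40.7484, -73.9857))  # location:dr5ru (midtown Manhattan)
```

A longer geohash of the same point always extends the shorter one, which is what makes splitting a cell into finer cells cheap. A radius query near a cell edge must also read the neighboring cells; this sketch does not compute them.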
The Matching Pipeline: From Swipe to Match in Under 500ms
When user A swipes right on user B, you need to check whether B already swiped right on A. Done naively, that's a synchronous read against the swipe log (plus a profile lookup in a different store) on every single swipe, and doing it in the request path kills latency. Instead, use an event-driven pipeline: swipe right → Kafka event → a consumer adds B to A's Redis set of right swipes and checks whether A is already in B's set. When a match is detected (both swiped right), another event triggers the match notification. This decouples the write path from the read path and keeps swipe latency predictable.
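A single-process sketch of the consumer side, with a Python dict of sets standing in for Redis and a deque standing in for the Kafka topic. The `likes:<user>` key naming in the comments is hypothetical. Each right swipe is recorded (the SADD step), then the reverse swipe is checked (the SISMEMBER step); a hit produces a match event.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass(frozen=True)
class SwipeEvent:
    swiper: str
    target: str
    right: bool

class MatchConsumer:
    """In-memory stand-in for the Kafka consumer plus Redis 'right swipe' sets."""

    def __init__(self):
        # likes[user] plays the role of a Redis set such as likes:<user> (hypothetical key)
        self.likes = defaultdict(set)
        # stand-in for the downstream match-notification topic
        self.match_events = deque()

    def handle(self, event: SwipeEvent) -> None:
        if not event.right:
            return  # left swipes need no reciprocity check
        self.likes[event.swiper].add(event.target)   # ~ SADD likes:<swiper> <target>
        if event.swiper in self.likes[event.target]:  # ~ SISMEMBER likes:<target> <swiper>
            self.match_events.append((event.swiper, event.target))

# Simulate the topic as a queue drained by one consumer, in order.
topic = deque([
    SwipeEvent("alice", "bob", True),
    SwipeEvent("carol", "bob", True),
    SwipeEvent("bob", "alice", True),
])
consumer = MatchConsumer()
while topic:
    consumer.handle(topic.popleft())
print(list(consumer.match_events))  # [('bob', 'alice')]
```

With a single consumer processing events in order this is safe. Once several consumers handle the two halves of a pair concurrently, the double-swipe race described later in this article appears.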
Why Cassandra Beats PostgreSQL for User Profiles
PostgreSQL with PostGIS can do geo queries. But at Tinder's scale, you need write throughput that PostgreSQL can't handle without sharding. Cassandra gives you linear write scalability and tunable consistency. Use eventual consistency for profile reads (you can tolerate seeing a slightly outdated bio). Use quorum consistency for match writes (you can't afford to lose a match). The trade-off: no joins, no complex queries. You design your schema around query patterns.
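Tunable consistency comes down to replica arithmetic. A sketch, assuming a single datacenter where QUORUM means floor(RF/2) + 1 replicas: a read is guaranteed to overlap the latest acknowledged write only when read replicas plus write replicas exceed the replication factor. The helper names are illustrative, not driver APIs.

```python
def quorum(replication_factor: int) -> int:
    """Replicas a single-datacenter QUORUM waits for: floor(RF/2) + 1."""
    return replication_factor // 2 + 1

def read_sees_latest_write(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """A read overlaps the last acknowledged write only if R + W > RF."""
    return read_replicas + write_replicas > rf

RF = 3
ONE, QUORUM = 1, quorum(RF)  # QUORUM == 2 when RF == 3

# Profile read at ONE after a QUORUM write: no overlap guarantee, may be stale.
print(read_sees_latest_write(ONE, QUORUM, RF))     # False
# Match read and write both at QUORUM: every read overlaps the write.
print(read_sees_latest_write(QUORUM, QUORUM, RF))  # True
```

This is why a profile read at ONE can return a slightly outdated bio, while a match written at QUORUM and read at QUORUM will be seen.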
The Swipe Queue: Rate Limiting and Backpressure
Without rate limiting, a viral user can draw 10k swipes per second, and a scripted account can send just as many, overwhelming your Redis cluster. Cap each user's outgoing swipes with a token bucket. Then, to absorb inbound floods, put a bounded queue (e.g., an LMAX Disruptor ring buffer) in the swipe service so it applies backpressure when downstream systems lag. If the queue fills up, reject new swipes with HTTP 429. Better to drop a swipe than to crash the service.
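A sketch of both mechanisms, assuming an injectable clock so behavior is deterministic. `SwipeIngress`, the bucket sizes, and the 202 status for accepted swipes are illustrative choices; Python's bounded `queue.Queue` stands in for a Disruptor ring buffer.

```python
import queue

class TokenBucket:
    """Per-user bucket: bursts up to `capacity`, refills at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, now: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SwipeIngress:
    """Hypothetical front door of the swipe service."""

    def __init__(self, capacity: float = 5, rate: float = 1.0, queue_size: int = 1000):
        self.capacity = capacity
        self.rate = rate
        self.buckets = {}
        # Bounded queue standing in for a Disruptor ring buffer.
        self.pending = queue.Queue(maxsize=queue_size)

    def submit(self, user: str, target: str, now: float) -> int:
        if user not in self.buckets:
            self.buckets[user] = TokenBucket(self.capacity, self.rate, now)
        if not self.buckets[user].allow(now):
            return 429  # this user is swiping faster than the bucket allows
        try:
            self.pending.put_nowait((user, target))
        except queue.Full:
            return 429  # downstream is lagging: shed load instead of crashing
        return 202  # accepted; a worker would drain `pending` asynchronously


ingress = SwipeIngress(capacity=3, rate=1.0)
print([ingress.submit("bot", f"u{i}", now=0.0) for i in range(5)])  # [202, 202, 202, 429, 429]
print(ingress.submit("bot", "u9", now=1.0))  # 202: one token refilled after 1s
```

Returning 429 from both checks keeps the client contract simple: the client backs off either way, whether the limit was its own or the service's.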
Handling the 'Double Swipe' Race Condition
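Two users swipe right on each other at the exact same millisecond. Each consumer records its own swipe, then checks SISMEMBER for the reverse swipe, and both see true. Both write to match_events, and you get duplicate matches. (Reverse the order, checking first and writing second, and both can see false instead, so the match is silently lost.) Fix: make the match write conditional. When processing the swipe, create a lock key for the user pair with SETNX (or SET with NX and an expiry, so a crashed consumer can't hold it forever), and only proceed if the lock was acquired. Or consume from Redis streams with consumer groups and make the match handler idempotent. The simplest: derive the match ID deterministically from the sorted pair of user IDs, make it the primary key in Cassandra, and use INSERT IF NOT EXISTS so only the first write is applied.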
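A sketch of the Cassandra option, with an in-memory table standing in for Cassandra. `insert_if_not_exists` only mimics the applied/not-applied result that INSERT ... IF NOT EXISTS returns; it is not a driver call. The key step is that both consumers compute the same match ID from the sorted pair, so the second insert is rejected and only one notification goes out.

```python
import threading

def match_id(a: str, b: str) -> str:
    """Order-independent ID: both consumers derive the same key for a pair."""
    lo, hi = sorted((a, b))
    return f"{lo}:{hi}"


class MatchTable:
    """In-memory stand-in for a Cassandra table whose primary key is match_id."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, mid: str, row: dict) -> bool:
        # Mimics the [applied] flag of INSERT ... IF NOT EXISTS: True only for the first writer.
        with self._lock:
            if mid in self._rows:
                return False
            self._rows[mid] = row
            return True


table = MatchTable()
notifications = []

def on_mutual_like(swiper: str, target: str) -> None:
    """Called by each consumer after its SISMEMBER check came back true."""
    mid = match_id(swiper, target)
    if table.insert_if_not_exists(mid, {"users": (swiper, target)}):
        notifications.append(mid)  # only the winning consumer notifies


# Simulate the race: both consumers saw the reverse swipe and try to record the match.
threads = [
    threading.Thread(target=on_mutual_like, args=pair)
    for pair in [("alice", "bob"), ("bob", "alice")]
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(notifications)  # ['alice:bob']
```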
When Not to Use This Architecture
If you're building a dating app for a small town with 10,000 users, don't copy Tinder's architecture. You don't need Kafka, Cassandra, or Redis clusters. A single PostgreSQL instance with PostGIS and a simple swipe table will work fine. The overhead of distributed systems will kill your velocity. Only reach for this when you have millions of users and a 500ms SLA. Otherwise, keep it simple.
The 4GB Container That Kept Dying
The fix was to key the sorted sets by geohash cell: location:dr5ru instead of location:nyc. Each geohash cell holds ~5000 users, so ZRANGEBYSCORE now scans 5000 entries, not 2 million. Read replicas were also added for the sorted sets.
- Always pre-filter by geohash before running geo-distance queries.
- A sorted set with 2 million members is a liability, not a feature.
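A sketch of the pre-filter, assuming each cell key maps to a small dict of user coordinates (toy data below, with coordinates chosen to fall inside each cell). Stage one reads only the cells you name, so users elsewhere are never scanned; stage two applies the exact haversine distance, because a cell's rectangular boundary does not match a circular radius. In a real query the cell list would be the query point's cell plus its neighbors.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby(cells: dict, cell_keys: list, lat: float, lon: float, radius_km: float) -> list:
    """Stage 1: scan only the named cells. Stage 2: keep users within radius_km."""
    hits = []
    for key in cell_keys:
        for user, (ulat, ulon) in cells.get(key, {}).items():
            if haversine_km(lat, lon, ulat, ulon) <= radius_km:
                hits.append(user)
    return sorted(hits)

# Toy data: each user's coordinates fall inside the cell they are filed under.
cells = {
    "location:dr5ru": {"a": (40.7484, -73.9857), "b": (40.78, -73.962)},
    "location:9q8yy": {"c": (37.7749, -122.4194)},
}
print(nearby(cells, ["location:dr5ru"], 40.7484, -73.9857, 2.0))  # ['a'] (b is ~4 km away)
print(nearby(cells, ["location:dr5ru"], 40.7484, -73.9857, 5.0))  # ['a', 'b']
```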
Checklist for oversized sorted sets:
- Run redis-cli --bigkeys to find large sorted sets.
- If a geohash cell has more than 50k members, split it.
- Increase the number of Redis cluster shards.

Checklist for Kafka consumer lag:
- Run kafka-consumer-groups --bootstrap-server ... --group match-consumer --describe.
- If lag exceeds 100k, increase partitions and consumers.
- Check for poison-pill messages (deserialization errors).

Useful Redis commands:
- redis-cli --cluster check <node>:6379
- redis-cli info stats | grep instantaneous_ops_per_sec
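One way to split an oversized cell, sketched under an assumption: each member's full-precision geohash string is available alongside the cell (a hypothetical layout; Redis GEO keeps an integer score instead). Splitting then re-buckets members one geohash character deeper, giving at most 32 child keys. `split_cell`, `maybe_split`, and the threshold default mirror the 50k rule of thumb in the checklist and are illustrative.

```python
from collections import defaultdict

SPLIT_THRESHOLD = 50_000  # the ">50k members" rule of thumb

def split_cell(parent_key: str, members: dict) -> dict:
    """Re-bucket an oversized cell's members one geohash character deeper.

    `members` maps user_id -> full-precision geohash string (hypothetical layout).
    Returns {child_key: {user_id: geohash}} with at most 32 child keys.
    """
    prefix = parent_key.split(":", 1)[1]
    children = defaultdict(dict)
    for user, full_hash in members.items():
        if not full_hash.startswith(prefix) or len(full_hash) <= len(prefix):
            raise ValueError(f"{user!r} cannot be placed under {parent_key}")
        child_key = f"location:{full_hash[:len(prefix) + 1]}"
        children[child_key][user] = full_hash
    return dict(children)

def maybe_split(parent_key: str, members: dict, threshold: int = SPLIT_THRESHOLD) -> dict:
    """Leave small cells alone; split the ones over the threshold."""
    if len(members) <= threshold:
        return {parent_key: members}
    return split_cell(parent_key, members)

# Toy example with a tiny threshold so the split actually happens.
members = {"a": "dr5ru6j", "b": "dr5ru7k", "c": "dr5rug0"}
print(sorted(maybe_split("location:dr5ru", members, threshold=2)))
# ['location:dr5ru6', 'location:dr5ru7', 'location:dr5rug']
```

Readers then need some way to know a cell was split (for example, a small routing table of split cells) so queries hit the child keys instead of the parent.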
Interview Questions on This Topic
How does Tinder handle the 'double swipe' race condition where both users swipe right at the same time?