To design Airbnb, you need a microservices architecture with separate services for search, booking, payment, and user management. Use a read-optimized search index (Elasticsearch), a relational database for bookings with optimistic locking, and a message queue for async payment processing. Cache aggressively for read-heavy workloads.
What is Design Airbnb?
Design Airbnb is the process of architecting a scalable, reliable system that handles property search, listing management, booking, payments, and user management for a vacation rental marketplace. It requires careful handling of high read/write loads, concurrency conflicts, and distributed transactions.
Plain-English First
Think of Airbnb like a giant hotel booking system where every hotel room is owned by a different person. You need a catalog (search), a reservation system (booking), a cash register (payments), and a guest list (users). The trick is making sure two people don't book the same room at the same time, and that the host gets paid correctly. It's like coordinating a massive potluck dinner where everyone brings a dish but you need to ensure no one brings the same thing.
I've seen booking systems fail at 3 AM because of a single missing index. The classic rookie mistake? Treating Airbnb like a simple CRUD app. It's not. It's a distributed system that must handle search, real-time availability, payments, and concurrency — all while staying consistent. Most tutorials gloss over the hard parts: race conditions, idempotency, and eventual consistency. This article covers the real architecture that powers production booking systems, with the exact patterns that prevent double bookings and data loss. By the end, you'll know how to design a system that handles millions of listings and thousands of bookings per second without falling over.
Why Microservices? The Monolith That Couldn't Scale
A monolithic Airbnb clone works for 100 users. At 10 million, it collapses. The search endpoint competes with booking writes. A single DB connection pool gets exhausted by heavy read queries. The fix? Split into services: Search Service (read-heavy, can use Elasticsearch), Booking Service (write-heavy, needs strong consistency), Payment Service (async, idempotent), and User Service (CRUD). Each scales independently. Each owns its data. The trade-off: eventual consistency between services. A booking might take a few seconds to appear in search results. That's acceptable. What's not acceptable is a double booking because of a shared database.
MicroserviceArchitecture.systemdesign
// io.thecodeforge — SystemDesign tutorial
// Service boundaries and communication
// SearchService: read-optimized, uses Elasticsearch
// - Handles listing queries, filters, pagination
// - Data synced from BookingService via CDC (Debezium)
// BookingService: write-optimized, uses PostgreSQL
// - Handles booking creation, cancellation, availability
// - Uses optimistic locking with version column
// PaymentService: async, uses Kafka + Stripe
// - Processes payments via queue
// - Idempotency keys prevent duplicate charges
// UserService: CRUD, uses PostgreSQL
// - Manages user profiles, authentication
// - Caches sessions in Redis
// Communication:
// - Synchronous: gRPC for low-latency internal calls (e.g., Booking -> Payment for idempotency check)
// - Asynchronous: Kafka for events (e.g., BookingCreated -> Search to update index)
// APIGateway: routes requests, handles auth, rate limiting
Output
No direct output — architecture diagram described.
Production Trap: Shared Database
Never let multiple services directly access the same database. You'll get coupling, contention, and a single point of failure. Each service must own its data store. Communicate via APIs or events.
Figure: Airbnb System Architecture at Scale
Search at Scale: Elasticsearch Isn't Optional
Search is the most read-heavy endpoint. Users filter by location, price, dates, amenities. A relational DB with LIKE queries dies under load. You need a search index. Elasticsearch is the standard. It handles full-text search, geo queries, and aggregations. The challenge: keeping the index in sync with the booking database. When a booking is made, the property's availability changes. If the index is stale, users see booked properties as available. Solution: use Change Data Capture (CDC) with Debezium to stream changes from PostgreSQL to Elasticsearch. This gives near-real-time sync without adding work to the booking write path.
For popular properties, cache availability in Redis with a TTL of 30 seconds and use it only to filter search results and listing pages. This reduces load on both DB and Elasticsearch. The cached value is a hint, not a decision: booking creation always re-checks availability in the database under a lock or version check.
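A CDC consumer can receive the same change twice or out of order, so the index update should be safe to repeat. The sketch below is a minimal in-memory illustration of that idea, not the Debezium event format or the Elasticsearch client: `CalendarChange`, `AvailabilityIndex`, and the `sequence` field are hypothetical names, and `sequence` is assumed to be any value that grows with each committed change to a row.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified change event; this is not the Debezium envelope. */
record CalendarChange(String propertyId, String date, boolean available, long sequence) {}

/** In-memory stand-in for the availability field of the search index. */
class AvailabilityIndex {
    private record Entry(boolean available, long sequence) {}

    private final Map<String, Entry> docs = new HashMap<>();

    /** Applies an event unless a newer (or identical) one was already applied. */
    boolean apply(CalendarChange change) {
        String key = change.propertyId() + "|" + change.date();
        Entry current = docs.get(key);
        if (current != null && current.sequence() >= change.sequence()) {
            return false; // duplicate or out-of-order delivery: keep the newer state
        }
        docs.put(key, new Entry(change.available(), change.sequence()));
        return true;
    }

    /** Unknown dates count as unavailable so search never over-promises. */
    boolean isAvailable(String propertyId, String date) {
        Entry entry = docs.get(propertyId + "|" + date);
        return entry != null && entry.available();
    }
}
```

With this guard, replaying a Kafka partition after a consumer restart leaves the index in the same state as the first pass.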
Booking Concurrency: How to Prevent Double Bookings
The hardest part of Airbnb is booking. Two users can click 'Book' on the same property at the same time. Without proper locking, both succeed. The fix: optimistic locking with a version column. Each property_calendar row has a version integer. The booking update statement includes WHERE version = :old_version. If the version changed, the update affects zero rows, and the booking fails. The application retries. This works for most cases. For high-contention properties (e.g., a cheap apartment in Manhattan), use pessimistic locking: SELECT ... FOR UPDATE on the availability row before checking. This serializes access but reduces throughput. Choose based on contention level.
BookingOptimisticLocking.systemdesign
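-- io.thecodeforge - SystemDesign tutorial
-- PostgreSQL booking creation with optimistic locking
-- Table: property_calendar
-- Columns: property_id, date, is_available, version
-- IF and RAISE are PL/pgSQL, so the logic is wrapped in a function. It runs in the
-- caller's transaction; the raised exception rolls back the partial update.
-- p_dates / p_versions are the calendar rows the caller read earlier (one version per night).
CREATE OR REPLACE FUNCTION create_booking(
    p_property_id UUID, p_guest_id UUID,
    p_dates DATE[], p_versions INT[]
) RETURNS UUID
LANGUAGE plpgsql AS $$
DECLARE
    v_updated INT;
    v_booking UUID;
BEGIN
    -- Update availability with a per-row version check (optimistic check).
    -- For high-contention listings, lock the rows first with SELECT ... FOR UPDATE.
    UPDATE property_calendar AS pc
    SET is_available = false, version = pc.version + 1
    FROM unnest(p_dates, p_versions) AS seen(date, version)
    WHERE pc.property_id = p_property_id
      AND pc.date = seen.date
      AND pc.version = seen.version
      AND pc.is_available;
    GET DIAGNOSTICS v_updated = ROW_COUNT;

    -- Every requested night must have been claimed
    IF v_updated <> cardinality(p_dates) THEN
        RAISE EXCEPTION 'Concurrency conflict or property not available, retry';
    END IF;

    -- Insert booking record (dates are assumed contiguous; end_date is the last night)
    INSERT INTO bookings (property_id, guest_id, start_date, end_date, status)
    VALUES (p_property_id, p_guest_id,
            (SELECT min(d) FROM unnest(p_dates) AS d),
            (SELECT max(d) FROM unnest(p_dates) AS d),
            'confirmed')
    RETURNING id INTO v_booking;
    RETURN v_booking;
END;
$$;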
Output
No direct output — SQL transaction.
The Classic Bug: Missing Version Check
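I've seen teams skip the version check and run a plain SELECT followed by an UPDATE. That has a race: two transactions can both read the same version and both see the row as available, then both update. The second update silently overwrites the first. Either hold a SELECT ... FOR UPDATE lock in the same transaction as the update, or include WHERE version = :old_version so the second update matches zero rows.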
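The version check is a compare-and-set: the write only lands if the row still carries the version the writer read. The following is a minimal in-memory sketch of that behaviour, assuming a single calendar row; `CalendarRow` and `OptimisticCalendar` are hypothetical names, and `AtomicReference.compareAndSet` stands in for the database's guarded UPDATE.

```java
import java.util.concurrent.atomic.AtomicReference;

/** One calendar row: availability plus the version used for the optimistic check. */
record CalendarRow(boolean available, int version) {}

class OptimisticCalendar {
    private final AtomicReference<CalendarRow> row =
            new AtomicReference<>(new CalendarRow(true, 1));

    CalendarRow read() {
        return row.get();
    }

    /**
     * Mirrors: UPDATE ... SET is_available = false, version = version + 1
     *          WHERE version = :old_version
     * Returns the number of "rows affected" (0 or 1).
     */
    int markBooked(int expectedVersion) {
        CalendarRow current = row.get();
        if (current.version() != expectedVersion) {
            return 0; // someone else updated the row since we read it
        }
        CalendarRow next = new CalendarRow(false, current.version() + 1);
        return row.compareAndSet(current, next) ? 1 : 0;
    }

    /** Read, check, guarded update; retry only when the version moved underneath us. */
    boolean tryBook(int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            CalendarRow seen = read();
            if (!seen.available()) {
                return false; // the dates are already taken, retrying cannot help
            }
            if (markBooked(seen.version()) == 1) {
                return true;
            }
            // zero rows affected: re-read and try again
        }
        return false;
    }
}
```

Two callers that both read version 1 cannot both succeed: the first `markBooked(1)` returns 1 and bumps the version to 2, the second returns 0.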
Figure: Optimistic Locking for Booking
Payment Processing: Async and Idempotent
Payments are the most failure-prone part. Network timeouts, duplicate requests, provider outages. Never process payments synchronously in the booking request. Use an async queue (Kafka, RabbitMQ). The booking service creates a booking with status 'pending', enqueues a payment event, and returns immediately. The payment service consumes the event, charges the guest, and updates the booking status. Idempotency is critical: each payment event has a unique idempotency key that stays the same across retries (e.g., derived from booking_id). Do not put a retry counter in the key, or every retry looks like a new charge. The payment provider (Stripe) uses this key to ensure the same charge isn't made twice. If the payment fails, the booking is cancelled, and the availability is restored via a compensating transaction.
AsyncPayment.systemdesign
// io.thecodeforge — SystemDesign tutorial
// Booking service: after successful DB insert, enqueue payment event
// Kafka producer (Java)
producer.send(new ProducerRecord<>("payment_events", bookingId,
        new PaymentEvent(bookingId, guestId, amount, idempotencyKey)));

// Payment service consumer (Java)
@KafkaListener(topics = "payment_events")
public void handlePayment(PaymentEvent event) {
    // Check if already processed (idempotency)
    if (paymentRepository.existsByIdempotencyKey(event.getIdempotencyKey())) {
        return; // already processed
    }
    try {
        // Charge via Stripe; the idempotency key is passed in RequestOptions
        ChargeCreateParams params = ChargeCreateParams.builder()
                .setAmount(event.getAmount())   // smallest currency unit (cents for USD)
                .setCurrency("usd")
                .setSource("tok_visa")          // Stripe test token, placeholder for the guest's payment source
                .build();
        RequestOptions options = RequestOptions.builder()
                .setIdempotencyKey(event.getIdempotencyKey())
                .build();
        Charge charge = Charge.create(params, options);
        // Update booking status to 'confirmed'
        bookingService.updateStatus(event.getBookingId(), "confirmed");
        // Store idempotency key
        paymentRepository.save(new PaymentRecord(event.getIdempotencyKey(), charge.getId()));
    } catch (StripeException e) {
        // Payment failed, cancel booking and restore availability.
        // Simplification: transient errors (timeouts, rate limits) would normally be retried first.
        bookingService.cancelBooking(event.getBookingId());
        // Optionally notify user
    }
}
Output
No direct output — code logic.
Interview Gold: Idempotency Keys
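Always mention idempotency keys in system design interviews. It shows you've dealt with real payment failures. The key is generated by the client (booking service), must be unique per operation, and must be reused unchanged on every retry of that operation. When Stripe sees a key it has already processed, it returns the result of the original request instead of creating a second charge.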
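To make the retry behaviour concrete, here is a minimal sketch of an idempotent payment handler. It does not call Stripe: `FakeGateway` is a hypothetical stand-in for a provider that honours idempotency keys, `PaymentHandler` is an illustrative name, and the key format is an assumption. The point it shows is that a key derived only from the booking survives redelivery and service restarts.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a payment provider that honours idempotency keys. */
class FakeGateway {
    private final Map<String, String> chargeByKey = new HashMap<>();
    int chargesCreated = 0;

    /** A repeated key returns the original charge id instead of charging again. */
    String charge(String idempotencyKey, long amountCents) {
        return chargeByKey.computeIfAbsent(idempotencyKey, key -> {
            chargesCreated++;
            return "ch_" + chargesCreated;
        });
    }
}

class PaymentHandler {
    private final FakeGateway gateway;
    private final Map<String, String> processed = new HashMap<>(); // key -> charge id

    PaymentHandler(FakeGateway gateway) {
        this.gateway = gateway;
    }

    /** The key depends only on the booking, so every redelivery produces the same key. */
    static String idempotencyKey(String bookingId) {
        return "booking-payment:" + bookingId;
    }

    /** Safe to call any number of times for the same booking. */
    String handle(String bookingId, long amountCents) {
        String key = idempotencyKey(bookingId);
        String known = processed.get(key);
        if (known != null) {
            return known; // already recorded locally, skip the provider call
        }
        String chargeId = gateway.charge(key, amountCents);
        processed.put(key, chargeId);
        return chargeId;
    }
}
```

A handler that lost its local state (a new `PaymentHandler` over the same gateway) still ends up with the original charge id, which is the crash-after-charge case.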
Caching Strategy: What to Cache and What Not To
Cache aggressively for read-heavy endpoints. Search results: cache in Redis with TTL of 1 minute. Property details: cache in CDN (CloudFront) with TTL of 5 minutes. User sessions: cache in Redis with TTL of 30 minutes. But never cache availability for booking decisions. That's a recipe for double bookings. Availability must always be read from the source of truth (database) during booking creation. For search, eventual consistency is fine. For booking, strong consistency is mandatory. Use cache-aside pattern: on cache miss, load from DB and populate cache. Set appropriate TTLs to balance freshness and performance.
CacheAside.systemdesign
// io.thecodeforge — SystemDesign tutorial
// Cache-aside for property details (JavaScript)
// redis and db are assumed to be promise-based clients;
// db.query is assumed to return a single row object or null
async function getPropertyDetails(propertyId) {
  // Try cache first
  const cached = await redis.get("property:" + propertyId);
  if (cached !== null) {
    return JSON.parse(cached);
  }
  // Cache miss, load from DB
  const property = await db.query("SELECT * FROM properties WHERE id = ?", propertyId);
  if (property == null) {
    return null;
  }
  // Populate cache with TTL
  await redis.setex("property:" + propertyId, 300, JSON.stringify(property)); // 5 min TTL
  return property;
}
Output
No direct output — function logic.
Never Do This: Cache Availability for Booking
I've seen a startup cache availability in Redis with a 10-second TTL. Two users read 'available' from cache, both booked. Double booking. Always read availability from the database with proper locking during booking.
Figure: Cache vs. No-Cache for Bookings
Database Schema: The Core Tables
The database is the backbone. Key tables: users, properties, property_calendar (availability per date), bookings, payments. The property_calendar table is denormalized for performance: one row per property per date. This makes availability checks fast (single range query). The bookings table stores the reservation. Payments table records transactions. Use UUIDs as primary keys to avoid sequential ID guessing. Indexes: composite index on property_calendar(property_id, date) for availability queries; index on bookings(property_id, start_date, end_date) for conflict checks. Use foreign keys but be careful with cascading deletes — you don't want to accidentally delete bookings when a property is removed.
DatabaseSchema.systemdesign
-- io.thecodeforge - SystemDesign tutorial
-- PostgreSQL schema
-- btree_gist provides the GiST equality operator used by the exclusion constraint on bookings.
-- gen_random_uuid() is built in from PostgreSQL 13; older versions need the pgcrypto extension.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE properties (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    host_id UUID REFERENCES users(id),
    title VARCHAR(255) NOT NULL,
    description TEXT,
    price_per_night DECIMAL(10,2) NOT NULL,
    max_guests INT,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE property_calendar (
    property_id UUID REFERENCES properties(id),
    date DATE NOT NULL,
    is_available BOOLEAN DEFAULT true,
    price DECIMAL(10,2), -- can override base price
    version INT DEFAULT 1,
    PRIMARY KEY (property_id, date)
);
-- The primary key already creates the composite index on (property_id, date),
-- so availability queries need no separate CREATE INDEX.

CREATE TABLE bookings (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    property_id UUID REFERENCES properties(id),
    guest_id UUID REFERENCES users(id),
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    status VARCHAR(20) DEFAULT 'pending', -- pending, confirmed, cancelled
    total_price DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT NOW(),
    CONSTRAINT no_overlap EXCLUDE USING gist (
        property_id WITH =,
        daterange(start_date, end_date, '[]') WITH &&
    ) WHERE (status = 'confirmed')
);
-- Index for conflict checks by property and date range
CREATE INDEX idx_bookings_property_dates ON bookings(property_id, start_date, end_date);

CREATE TABLE payments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    booking_id UUID REFERENCES bookings(id),
    amount DECIMAL(10,2) NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    stripe_charge_id VARCHAR(255),
    idempotency_key VARCHAR(255) UNIQUE,
    created_at TIMESTAMP DEFAULT NOW()
);
Output
No direct output — SQL DDL.
Senior Shortcut: Exclusion Constraint
Use PostgreSQL exclusion constraints to prevent overlapping bookings at the database level. This is a safety net even if application logic fails. The constraint above uses a GiST index on daterange.
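The overlap test that the `&&` operator performs on two inclusive date ranges can be written in one line. The sketch below is an application-side mirror of that predicate, useful for unit tests or early validation and not a replacement for the constraint; `Stay` is a hypothetical type, and both bounds are treated as inclusive to match the `'[]'` in the schema.

```java
import java.time.LocalDate;

/** A stay with inclusive bounds, matching daterange(start_date, end_date, '[]'). */
record Stay(LocalDate start, LocalDate end) {

    /** True when the two stays share at least one date. */
    boolean overlaps(Stay other) {
        // Two inclusive ranges overlap unless one starts after the other ends.
        return !start.isAfter(other.end) && !other.start.isAfter(end);
    }
}
```

Because the bounds are inclusive, a stay ending on June 5 conflicts with one starting on June 5. If end_date is meant to be the checkout day rather than the last night, a half-open range (`'[)'`) would allow same-day turnover; which of the two the schema intends is a modelling decision.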
API Design: RESTful Endpoints That Don't Suck
Design APIs for the frontend, not for your database schema. Key endpoints: GET /api/properties (search with filters), GET /api/properties/:id (details), POST /api/bookings (create booking), POST /api/payments (process payment). Use pagination for list endpoints (cursor-based, not offset). Return meaningful error codes: 409 Conflict for double booking attempts, 402 Payment Required for payment failures. Use idempotency keys on POST endpoints to allow safe retries. Version your API from day one (/v1/).
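Cursor-based pagination means the client sends back an opaque token identifying the last item it saw, and the server continues after it. The following sketch shows the mechanics over an in-memory sorted list; `Page`, `CursorPager`, and the cursor encoding are illustrative assumptions, and the list stands in for a query of the form WHERE id > :last ORDER BY id LIMIT :limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** One page of results plus an opaque cursor; nextCursor is null on the last page. */
record Page(List<String> ids, String nextCursor) {}

class CursorPager {
    /** Opaque cursor: URL-safe Base64 of the last id the client has seen. */
    static String encode(String lastId) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(lastId.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String cursor) {
        return new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
    }

    /** sortedIds must be in ascending order. A null cursor starts from the beginning. */
    static Page fetch(List<String> sortedIds, String cursor, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        String last = cursor == null ? null : decode(cursor);
        List<String> out = new ArrayList<>();
        boolean more = false;
        for (String id : sortedIds) {
            if (last != null && id.compareTo(last) <= 0) {
                continue; // already returned on an earlier page
            }
            if (out.size() == limit) {
                more = true; // at least one further item exists
                break;
            }
            out.add(id);
        }
        String next = more ? encode(out.get(out.size() - 1)) : null;
        return new Page(out, next);
    }
}
```

Unlike offset pagination, a listing inserted or removed between two requests does not shift the window, because the next page is defined relative to the last id rather than a row count.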
Search and property details are read-heavy. Use read replicas for the booking database to offload SELECT queries. The booking service writes to the primary, reads from replicas. Be aware of replication lag: a user might not see their own booking immediately after creation. Acceptable for most cases. For property images and static assets, use a CDN (CloudFront, Cloudflare). Cache images with long TTLs (1 year) and use cache busting with versioned URLs. For the search index, Elasticsearch handles high read throughput natively. Use multiple nodes and split the index into several shards with replicas; Elasticsearch routes each document to a shard by hashing its ID, so load spreads evenly without manual range partitioning.
ReadReplicas.systemdesign
// io.thecodeforge — SystemDesign tutorial
// Database configuration with read replicas
// Primary: handles writes (INSERT, UPDATE, DELETE)
// Replicas: handle reads (SELECT)
// Connection pool configuration (HikariCP example)
// Primary datasource
HikariConfig primaryConfig = new HikariConfig();
primaryConfig.setJdbcUrl("jdbc:postgresql://primary-host:5432/airbnb");
primaryConfig.setMaximumPoolSize(50);
// Replica datasource
HikariConfig replicaConfig = new HikariConfig();
replicaConfig.setJdbcUrl("jdbc:postgresql://replica-host:5432/airbnb");
replicaConfig.setMaximumPoolSize(200); // more connections for reads
// Use @Transactional(readOnly = true) for read-only methods.
// Spring does not pick the replica by itself: routing has to be configured, for example with an
// AbstractRoutingDataSource that checks TransactionSynchronizationManager.isCurrentTransactionReadOnly().
Output
No direct output — configuration.
Production Trap: Stale Reads After Booking
If you read from replicas immediately after a booking, you might not see the new booking due to replication lag. Solution: after booking creation, read from primary for the next 1 second (or use session stickiness).
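The "read from primary for a short window after a write" rule can be expressed as a small router. This is a minimal single-process sketch under stated assumptions: `ReadRouter` is a hypothetical class, the clock is passed in as a parameter so the behaviour is easy to test, and in a multi-instance deployment the last-write timestamp would have to live somewhere shared (a session or cookie, for instance).

```java
import java.util.HashMap;
import java.util.Map;

/** Routes a user's reads to the primary for a short window after that user writes. */
class ReadRouter {
    enum Target { PRIMARY, REPLICA }

    private final long pinMillis;
    private final Map<String, Long> lastWriteAt = new HashMap<>();

    ReadRouter(long pinMillis) {
        this.pinMillis = pinMillis;
    }

    /** Call after a successful write, for example a booking insert. */
    void recordWrite(String userId, long nowMillis) {
        lastWriteAt.put(userId, nowMillis);
    }

    /** Reads inside the pin window go to the primary; everything else may use a replica. */
    Target routeRead(String userId, long nowMillis) {
        Long last = lastWriteAt.get(userId);
        if (last != null && nowMillis - last < pinMillis) {
            return Target.PRIMARY;
        }
        return Target.REPLICA;
    }
}
```

The window should be at least as long as the replication lag observed in production; a fixed 1 second is the value from the text, not a universal constant.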
Monitoring and Alerting: What to Watch
You can't fix what you don't measure. Key metrics: booking success rate (target >99.9%), payment success rate, search latency (p99 <200ms), DB connection pool utilization, queue depth for payment events. Set up alerts: if booking success rate drops below 99%, page the on-call. If queue depth exceeds 1000, scale payment consumers. Use distributed tracing (Jaeger) to debug slow requests across services. Log every booking creation and payment event with correlation IDs.
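The two alert rules above reduce to simple threshold checks. A minimal sketch, assuming the counters come from whatever metrics system is in use; `AlertRules` and the action names are hypothetical, and the thresholds are the ones quoted in the text.

```java
import java.util.ArrayList;
import java.util.List;

class AlertRules {
    /** Success rate below 99% pages the on-call; queue depth above 1000 scales consumers. */
    static List<String> evaluate(long bookingAttempts, long bookingSuccesses, long paymentQueueDepth) {
        List<String> actions = new ArrayList<>();
        if (bookingAttempts > 0) { // no traffic means no rate to judge
            double successRate = (double) bookingSuccesses / bookingAttempts;
            if (successRate < 0.99) {
                actions.add("PAGE_ON_CALL");
            }
        }
        if (paymentQueueDepth > 1000) {
            actions.add("SCALE_PAYMENT_CONSUMERS");
        }
        return actions;
    }
}
```

In practice the rate would be computed over a sliding time window so that a short burst of failures on low traffic does not page anyone.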
Two guests checked into the same property on the same night. Both had confirmed bookings in the system.
Assumption
The team assumed the database's default READ COMMITTED isolation level was sufficient to prevent race conditions.
Root cause
Booking creation ran under READ COMMITTED isolation. Two concurrent transactions read the same availability row (both saw 'available'), then both inserted a booking. No locking or version check was in place. The application-level check 'if available then insert' had a race window.
Fix
Changed booking creation to use SERIALIZABLE isolation level. Added a version column on the property_calendar table. The update statement includes WHERE version = :old_version. If zero rows are affected, or the transaction aborts with a serialization failure, the booking attempt fails and is retried.
Key lesson
Never trust application-level checks for critical resources.
Use database-level optimistic locking or pessimistic locks for booking systems.
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit.
Symptom · 01
Double booking detected (two confirmed bookings for same property/dates)
Fix
1. Check booking table for overlapping entries.
2. Review application logs for concurrent booking creation.
3. Verify optimistic locking is implemented (version column).
4. Check transaction isolation level (should be SERIALIZABLE or use SELECT FOR UPDATE).
5. Add exclusion constraint if missing.
Symptom · 02
Search shows booked properties as available
Fix
1. Check CDC pipeline (Debezium) health.
2. Verify Kafka topic 'booking_events' has recent messages.
3. Check Elasticsearch index mapping for availability field.
4. Reduce cache TTL on search results.
5. Force reindex of affected property.
Symptom · 03
Payment processing stuck (queue growing)
Fix
1. Check payment service health and logs.
2. Verify Stripe API key and connectivity.
3. Check for rate limiting from Stripe.
4. Increase number of consumers.
5. Manually reprocess failed events from dead letter queue.
Design Airbnb Triage Cheat Sheet
First-response commands for when things go wrong, copy-paste ready.
Double booking: two users confirmed same property
Immediate action
Check for overlapping bookings in DB
Commands
SELECT * FROM bookings WHERE property_id = 'X' AND status = 'confirmed' AND daterange(start_date, end_date, '[]') && daterange('2025-06-01', '2025-06-05', '[]');
SHOW transaction_isolation;
Fix now
Set isolation level to SERIALIZABLE and add version column to property_calendar.
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';
Fix now
Increase pool size in HikariCP config or optimize slow queries.
| Aspect | Optimistic Locking | Pessimistic Locking |
| --- | --- | --- |
| Concurrency handling | Version check on update; retry on conflict | SELECT ... FOR UPDATE; blocks other transactions |
| Performance under low contention | Fast, no blocking | Slower due to lock overhead |
| Performance under high contention | Many retries, can degrade | Better throughput, serialized access |
| Deadlock risk | Low (no locks held) | Possible if not careful with lock ordering |
| Use case | Low to medium contention properties | High contention (e.g., cheap popular listings) |
Key takeaways
1. Never trust application-level checks for critical resources; use database-level locking or versioning to prevent double bookings.
2. Process payments asynchronously with idempotency keys to handle failures gracefully and avoid duplicate charges.
3. Use CDC (Debezium) to keep search index in sync with the booking database in near real-time.
4. Cache aggressively for reads, but never cache availability for booking decisions: always read from the source of truth with proper locking.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01 of 06 · SENIOR
How does your system handle a double booking attempt under concurrent load? Walk through the database transaction.
ANSWER
The booking service uses SERIALIZABLE isolation. It reads availability with SELECT FOR UPDATE, then updates the property_calendar row with a version check: UPDATE ... SET is_available=false, version=version+1 WHERE property_id=? AND date BETWEEN ? AND ? AND version=?. If zero rows affected, it retries. This prevents two concurrent transactions from both succeeding.
Q02 of 06 · SENIOR
When would you choose optimistic locking over pessimistic locking for a booking system?
ANSWER
Optimistic locking for low to medium contention properties (most listings). Pessimistic locking for high-contention properties (e.g., a cheap popular apartment) where retries would be frequent and degrade UX. Pessimistic locking serializes access but reduces retries.
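Q03 of 06 · SENIOR
What happens if the payment service crashes after charging the guest but before updating the booking status? How do you recover?
ANSWER
The payment event is in Kafka with an idempotency key. Provided the consumer commits its offset only after processing, the event is redelivered when the payment service restarts. The service first checks the idempotency key against the payments table. If the charge was recorded (Stripe charge ID stored), it skips the charge and updates the booking status. If the crash happened before the charge was recorded, it calls Stripe again with the same key; Stripe returns the original charge rather than creating a new one, and the service records it and confirms the booking. Delivery is at-least-once, and the idempotency key makes the processing effectively exactly-once.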
Q04 of 06 · JUNIOR
What is the difference between a read replica and a cache? When would you use each?
ANSWER
A read replica is a copy of the database that serves read queries, reducing load on the primary. It can answer arbitrary SQL queries and is usually only as stale as the replication lag. A cache (Redis) stores frequently accessed data in memory with a TTL. It's faster but can serve data as stale as the TTL allows. Use cache for hot, rarely changing data such as property details, and read replicas for varied queries such as a user's booking history. Reads that must reflect a write immediately (for example, right after booking creation) go to the primary.
Q05 of 06 · SENIOR
A user reports that they booked a property, but the host says it's not available. How do you debug?
ANSWER
Check the bookings table for overlapping confirmed bookings. Look for double booking incidents. Verify the exclusion constraint is in place. Check application logs for concurrent booking creation. If double booking occurred, the fix is to add optimistic locking and SERIALIZABLE isolation. Compensate the affected user.
Q06 of 06 · SENIOR
How would you design the system to handle 10x the current traffic? What bottlenecks would you address first?
ANSWER
Scale horizontally: add more instances of each microservice. Add read replicas for the database. Shard Elasticsearch index. Use CDN for static assets. The first bottleneck is usually the database: use connection pooling, optimize queries, add caching. Next is the search index: add more Elasticsearch nodes. Finally, ensure the message queue can handle the load by partitioning topics.
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
How do you prevent double bookings in Airbnb system design?
Use optimistic locking with a version column on the availability table. The update statement includes a WHERE version = old_version check. If the version changed, the update fails and the application retries. For high-contention properties, use pessimistic locking with SELECT FOR UPDATE.
02
What's the difference between optimistic and pessimistic locking for booking systems?
Optimistic locking assumes conflicts are rare and retries on failure. Pessimistic locking locks the row upfront, blocking other transactions. Use optimistic for most properties, pessimistic for high-contention ones where retries would be costly.
03
How do you handle payment failures in a booking system?
Use an async queue (Kafka) to process payments. The booking is created with status 'pending'. The payment service consumes the event, charges the guest with an idempotency key, and updates the booking status. If payment fails, the booking is cancelled and availability restored via a compensating transaction.
04
What database schema do you use for Airbnb's availability calendar?
A denormalized property_calendar table with one row per property per date. Columns: property_id, date, is_available, price, version. This allows fast range queries for availability checks. Use a composite index on (property_id, date).