PHP MongoDB — Silent Data Loss from Default Write Concern
MongoDB default {w:1} write concern causes silent data loss after replica set elections.
20+ years shipping production PHP systems at scale. Notes here come from systems that actually shipped.
- PHP + MongoDB pairs PHP's dynamic typing with MongoDB's schemaless documents
- The mongodb/mongodb library wraps ext-mongodb C extension for near-native speed
- BSON serialisation handles type conversion between PHP types and MongoDB types
- A single document can hold up to 16MB, but keep it under 1MB for cursor performance
- Wrong index choice adds 200ms+ per query in production — use explain() before trusting
- Biggest mistake: ignoring write concern leads to silent data loss in replica sets
Imagine a traditional MySQL database is like a perfectly organised filing cabinet where every folder must follow the exact same template — same slots, same fields, no exceptions. MongoDB is like a collection of labeled shoeboxes where each box can hold whatever you want — photos, receipts, a rubber duck — and you can find anything fast because each box has a smart index on the outside. PHP is the person opening those boxes, reading what's inside, and deciding what to do next. When your data is irregular, fast-changing, or shaped differently for every user, the shoebox system wins hands down.
Most PHP applications start life with MySQL — structured, reliable, familiar. But somewhere around the time your product manager asks for 'flexible user profiles', 'nested product attributes', or 'activity feeds that look different for every user type', a relational schema starts to feel like wearing shoes two sizes too small. You spend more time writing ALTER TABLE migrations and LEFT JOIN acrobatics than you spend actually building features. That's not a MySQL problem — it's a data-shape mismatch problem, and MongoDB was built to solve it at scale.
MongoDB stores data as BSON documents — Binary JSON objects that can nest arrays, sub-documents, and mixed types without a schema police officer stopping you at the door. PHP connects to it via the official mongodb/mongodb Composer package, which wraps the low-level ext-mongodb C extension. This split architecture (C extension for raw performance, PHP library for ergonomic developer experience) means you get near-native speed without writing a single line of C. The driver handles connection pooling, BSON serialisation, cursor streaming, and write concern negotiation transparently.
By the end of this article you'll know how to wire up PHP to MongoDB correctly in production, write efficient CRUD operations and aggregation pipelines, design indexes that don't destroy your write throughput, and dodge the six most painful production mistakes that nobody warns you about until your on-call phone rings at 2am.
What is PHP and MongoDB?
At its heart, PHP+MongoDB means using PHP to interact with a document-oriented NoSQL database. You store data as BSON documents — think JSON but with strongly-typed fields. The PHP driver handles the serialisation and deserialisation so you work with native PHP arrays or objects. This pairing shines when your data model changes frequently, when you need to embed related data directly, or when you need high write throughput. For example, a user profile with address history and preferences can be stored in a single document instead of normalising across three tables. That's not just convenience — it eliminates expensive joins and makes read operations 10x faster in many cases.
Here's how a basic query looks in PHP:
```php <?php namespace Io\TheCodeForge\MongoDb;
use MongoDB\Client;
$client = new Client(getenv('MONGODB_URI')); $collection = $client->selectDatabase('ecommerce')->selectCollection('products');
$product = $collection->findOne(['sku' => 'PHONE-001']); echo $product['name'] . ' - $' . $product['price']; ```
Notice you're working with arrays directly — no schema definition, no migration, no ORM configuration. That's the speed advantage that matters when your data shape changes weekly.
Driver Installation and Connection Setup
Getting PHP to talk to MongoDB involves two components: the C extension (ext-mongodb) and the PHP library (mongodb/mongodb). Install both via PECL and Composer:
``bash pecl install mongodb echo "extension=mongodb.so" >> php.ini composer require mongodb/mongodb ``
Then create a connection manager. Never hardcode credentials — use environment variables.
```php <?php namespace Io\TheCodeForge\MongoDb;
use MongoDB\Client; use MongoDB\Driver\Manager;
class ConnectionManager { private static ?Client $client = null;
public static function getClient(): Client { if (self::$client === null) { $uri = getenv('MONGODB_URI') ?: 'mongodb://localhost:27017'; $uriOptions = [ 'readPreference' => 'secondaryPreferred', 'w' => 'majority', 'journal' => true, 'connectTimeoutMS' => 3000, 'socketTimeoutMS' => 10000 ]; $driverOptions = [ 'typeMap' => ['root' => 'array', 'document' => 'array', 'array' => 'array'], ]; self::$client = new Client($uri, $uriOptions, $driverOptions); } return self::$client; } } ```
The typeMap option converts BSON documents to PHP arrays instead of stdClass objects — faster and less error-prone. The readPreference lets read queries hit secondaries, offloading the primary.
One more thing: always set a reasonable connectTimeoutMS (like 3000ms) and socketTimeoutMS (like 10000ms) in your uriOptions. Without these, a MongoDB node that's slow to respond can hang your PHP process indefinitely. Trust me, you'll learn this the hard way when your FPM workers pile up waiting for a dead secondary.
maxPoolSize in uriOptions:
'maxPoolSize' => 200. Also monitor db.serverStatus().connections on MongoDB.CRUD Operations with Documents
MongoDB documents are BSON objects with nested structures. PHP's mongodb/mongodb library maps PHP arrays to BSON automatically. But watch out for type conversions — timestamps, ObjectIds, and large integers are common traps.
Insert: Pass an array with keys. Use ['_id' => new MongoDB\BSON\ only if you need client-generated IDs. Otherwise let MongoDB handle it.ObjectId()]
Find: The find() method returns a MongoDB\Collection object that iterates lazily. Use ->toArray() for small result sets, cursor iteration for large ones. Always pass projection to limit fields returned over the wire.
Update: Use updateOne() or updateMany() with atomic operators like $set, $unset, $inc. Never read-modify-write — that's a race condition in disguise.
Delete: deleteOne() and deleteMany() are final. Use them sparingly; mark documents as deleted: true instead for recoverability.
Example with error handling and bulk write:
```php <?php namespace Io\TheCodeForge\MongoDb;
use MongoDB\Collection; use MongoDB\Driver\Exception\RuntimeException;
class UserRepository { private Collection $collection;
public function { $this->collection = ConnectionManager::getClient() ->selectDatabase('myapp') ->selectCollection('users'); }__construct()
public function updateEmail(string $userId, string $newEmail): bool { try { $result = $this->collection->updateOne( ['_id' => new \MongoDB\BSON\ObjectId($userId)], ['$set' => ['email' => $newEmail]] ); return $result->getModifiedCount() === 1; } catch (RuntimeException $e) { error_log("MongoDB update failed: " . $e->getMessage()); return false; } }
public function batchUpdateStatus(array $userIds, string $status): int { $operations = []; foreach ($userIds as $id) { $operations[] = [ 'updateOne' => [ ['_id' => new \MongoDB\BSON\ObjectId($id)], ['$set' => ['status' => $status]] ] ]; } $result = $this->collection->bulkWrite($operations, ['ordered' => false]); return $result->getModifiedCount(); } } ```
bulkWrite with ordered => false runs operations in parallel — use it for batch jobs where order doesn't matter. For account transfers, keep ordered => true to maintain sequence.
One more thing: findOneAndUpdate is your atomic read-modify-write tool. It returns the updated document and guarantees no other process snuck in between. Perfect for counters, reservation systems, and queue pop operations.
- updateOne() applies changes to the first matching document
- updateMany() applies changes to all matching documents
- Use $set to change specific fields, $unset to remove fields
- Use $inc to safely increment/decrement counters
- Never read-then-write — replace with update or findAndModify
find() followed by updateOne() based on condition — that's a race condition.find() to reduce network payload.Aggregation Pipeline: Powerful Data Processing
MongoDB's aggregation pipeline processes documents through a sequence of stages: $match, $group, $sort, $project, $lookup, $unwind, etc. Each stage transforms the document stream, and the pipeline is executed in memory unless allowDiskUse is set.
$lookupwithout indexes on the foreign collection (adds seconds per query)$unwindon large arrays followed by$group(creates massive intermediate results)$sortwithout index at the start (forces all documents into memory)$matchafter$group(filters reduced data, wastes early elimination)
Best practice: Place $match as early as possible. Use $project to drop unused fields. For expensive $lookup, consider denormalizing data if the foreign collection is mostly static.
Example: Aggregate orders by status with total value:
``php $pipeline = [ ['$match' => ['created_at' => ['$gte' => new \MongoDB\BSON\UTCDateTime($startOfDay)]]], ['$group' => [ '_id' => '$status', 'count' => ['$sum' => 1], 'totalValue' => ['$sum' => '$total_amount'] ]], ['$sort' => ['totalValue' => -1]] ]; $orders = $collection->aggregate($pipeline, ['allowDiskUse' => true, 'maxTimeMS' => 5000]); foreach ($orders as $order) { echo "{$order['_id']}: {$order['count']} orders, total \${$order['totalValue']} "; } ``
Note: allowDiskUse is necessary for large datasets — otherwise MongoDB may throw a 16819 error when memory limit is exceeded.
A common trap with $lookup is forgetting that the localField value is taken as-is, so if it's a string, the foreign field must also be a string — not an ObjectId. This mismatch causes silent empty results and hours of debugging. Always check types in the explain output.
$lookup across sharded collections triggers scatter-gather — each shard executes the lookup independently, and results are merged by the primary shard. This can be extremely slow. If possible, use $lookup only on unsharded or small collections. Alternatively, duplicate data instead of joining.{foreignKey: 1} in the foreign collection before deploying pipelines with $lookup.Indexing Strategy and Production Gotchas
Indexes in MongoDB are B-trees, similar to relational databases. But compound index prefix rules, query shape sensitivity, and the impact on write throughput differ.
Single field index: createIndex(['email' => 1]) — for exact matches or sort on email.
Compound index: createIndex(['status' => 1, 'created_at' => -1]) — supports queries on status, or status+created_at, but NOT on created_at alone.
Text index: createIndex(['description' => 'text']) — for full-text search with $text queries. Only one text index per collection.
Production pitfalls: 1. Too many indexes on a write-heavy collection -> each insert/update touches every index, doubling write time. 2. Unused indexes: use $indexStats to identify dead indexes. 3. Overusing to force an index -> query optimizer disabled. 4. Not using hint()dropIndexes() during migrations; building new indexes on large collections blocks read operations.
Use before deploying any query to production:explain()
``php $cursor = $collection->find( ['status' => 'active', 'created_at' => ['$gte' => $date]], ['projection' => ['_id' => 0, 'email' => 1]] ); $explain = $collection->find( ['status' => 'active', 'created_at' => ['$gte' => $date]], ['projection' => ['_id' => 0, 'email' => 1], 'explain' => true] ); ``
The explain output shows whether a COLLSCAN (collection scan) or IXSCAN (index scan) is used. Aim for IXSCAN with low totalDocsExamined relative to nReturned.
One more gotcha: MongoDB's query planner caches plans. If a suboptimal plan gets cached because of a skewed data distribution, you'll see slow queries even with the right indexes. Run db.collection.getPlanCache().clear() occasionally to force re-evaluation.
{background: true} (default in MongoDB 4.2+) to allow reads/writes during build. For replica sets, build on secondary first, then step it up to primary.Explain() every query before production.Transactions, Write Concerns, and Data Integrity
MongoDB supports multi-document ACID transactions since version 4.0. But they come with caveats: they only work on replica sets, have a 60-second default timeout, and should not be used for high-throughput operations that can be done with atomic operators.
Write concern controls durability. The default w: 1 acknowledges after primary memory write — not durable. w: majority waits for majority of replica set members. Adding j: true forces journal commit before acknowledgment.
Read concerns: local (default) returns data that may be rolled back. majority returns only committed data. Use linearizable only when you need absolute latest data, but it's slow.
- Multiple documents that must be updated together (e.g., funds transfer)
- Cross-collection consistency
- Not for simple counters or array pushes — those are atomic anyway.
Example transaction with retry logic:
```php <?php namespace Io\TheCodeForge\MongoDb;
$client = ConnectionManager::getClient(); $maxRetries = 3; $retry = 0;
do { $session = $client->startSession(); $session->startTransaction([ 'readConcern' => new \MongoDB\Driver\ReadConcern(\MongoDB\Driver\ReadConcern::SNAPSHOT), 'writeConcern' => new \MongoDB\Driver\WriteConcern(\MongoDB\Driver\WriteConcern::MAJORITY) ]); try { $accounts = $client->selectDatabase('bank')->selectCollection('accounts'); $accounts->updateOne( ['_id' => new \MongoDB\BSON\ObjectId($fromId)], ['$inc' => ['balance' => -100]], ['session' => $session] ); $accounts->updateOne( ['_id' => new \MongoDB\BSON\ObjectId($toId)], ['$inc' => ['balance' => 100]], ['session' => $session] ); $session->commitTransaction(); break; } catch (\MongoDB\Driver\Exception\CommandException $e) { $session->abortTransaction(); if ($e->hasErrorLabel('TransientTransactionError') && ++$retry < $maxRetries) { usleep(100000); // 100ms backoff continue; } throw $e; } } while (true); ```
Performance: transactions add 20-50ms overhead due to conflict detection. Use them sparingly.
If you're on MongoDB 4.2+, you can also use distributed transactions across sharded clusters. But the latency penalty is higher — expect 100-200ms per transaction. Only do this when the business requirement genuinely demands cross-shard atomicity.
- startSession() creates a logical session
- startTransaction() opens a transaction within that session
- All operations must include the session option
- commitTransaction() writes all changes atomically
- abortTransaction() undoes all changes in that transaction
Change Streams: Real-Time Data Feeds with PHP
MongoDB change streams allow you to subscribe to real-time changes on a collection, database, or entire deployment. They're built on the oplog and provide a reliable, ordered stream of events. In PHP, you use the MongoDB\Collection::watch() method to start listening.
- Real-time dashboards that update when data changes
- Cross-service synchronisation (e.g., push changes to Elasticsearch)
- Audit logging of all modifications
- Cache invalidation triggers
Example: Watch for new orders with resume token persistence:
```php <?php namespace Io\TheCodeForge\MongoDb;
$orderCollection = ConnectionManager::getClient() ->selectDatabase('ecommerce') ->selectCollection('orders');
$resumeToken = $metadataCollection->findOne(['_id' => 'changeStreamResume'])['token'] ?? null; $options = ['maxAwaitTimeMS' => 1000]; if ($resumeToken) { $options['startAfter'] = $resumeToken; }
$pipeline = [['$match' => ['operationType' => 'insert']]]; $cursor = $orderCollection->watch($pipeline, $options);
foreach ($cursor as $event) { $order = $event['fullDocument']; // Send to a queue, update cache, etc. echo "New order: " . $order['_id'] . PHP_EOL; // Persist resume token $metadataCollection->updateOne( ['_id' => 'changeStreamResume'], ['$set' => ['token' => $event['_id']]], ['upsert' => true] ); } ```
- Change streams require a replica set (not a standalone).
- The cursor blocks waiting for events. Use
maxAwaitTimeMSto control polling interval. - If the cursor is idle for too long (default 10 minutes), MongoDB may close it. Reconnect gracefully.
$fullDocument: 'updateLookup'option gives you the full document after an update, but it adds a round-trip per event.- In sharded clusters, change streams are ordered within each shard but not globally. Use
startAfterto resume from a specific point after a crash.
Change streams don't work with transactions — you can't see uncommitted changes. That's fine, because you only want to react to committed data anyway.
_id field of the last processed event and pass it as startAfter when restarting the stream. This ensures exactly-once processing semantics — critical for payment integrations.startAfter on restart.Replica Set Configuration and Failover Handling
A MongoDB replica set provides automatic failover and data redundancy. The PHP driver discovers replica set members from the connection string and routes writes to the primary. But proper configuration is critical — missteps can cause silent failover delays or split-brain scenarios.
Connection string: Use the SRV format for automatic discovery: mongodb+srv://user:pass@cluster0.example.mongodb.net/db?retryWrites=true&w=majority. This tells the driver to query DNS for all replica set members.
Read preference: Set to secondaryPreferred for read-heavy workloads, but be aware of stale reads. If you read-after-write, use primary or enable causal consistency.
Retry writes: Always enable retryWrites=true in the connection string. This re-runs write operations if the primary steps down during a write. It doesn't replace write concern, but adds resilience.
Failover test script:
```php <?php namespace Io\TheCodeForge\MongoDb;
$uri = 'mongodb+srv://user:pass@cluster0.example.mongodb.net/db?retryWrites=true&w=majority'; $client = new MongoDB\Client($uri);
$collection = $client->selectDatabase('test')->selectCollection('failover');
// Step down the primary (requires admin) $admin = $client->selectDatabase('admin'); $admin->command(['replSetStepDown' => 60, 'force' => true]);
// Try to write immediately — should retry automatically $collection->insertOne(['test' => 'failover', 'ts' => new MongoDB\BSON\UTCDateTime()]); echo "Write succeeded after failover. "; ```
Always test failover in a staging environment. Simulate primary step-down, network partitions, and secondary lag before going live.
Handling Schema Migrations Without Downtime
You shipped MongoDB because you thought 'schemaless' meant you'd never touch migrations again. Wrong. Schemaless just means the database won't scream at you, but your application will silently corrupt data when you add a field that old documents don't have. Real downtime happens when a new code path expects a field that doesn't exist in production documents from six months ago.
The fix: write defensive read paths with $exists checks or use MongoDB's schema validation with $jsonSchema. But the real trick is incremental migrations. Never update all documents at once. Run a background script that iterates with a cursor noBatchSize and a write concern of acknowledged. Add the new field with a default, then update your application to read both old and new shapes. Only after you've confirmed zero read errors do you drop the fallback logic. This is how you avoid a 3 AM outage because your aggregation pipeline exploded on a null field that doesn't exit gracefully.
Time-Series Data: Writes That Won't Burn Your IO Budget
MongoDB time-series collections are not a magic bullet. They're a specialized tool that forces you to think about data granularity before you write a single line of PHP. The default approach—write one document per event per second—will destroy your write throughput and blow up your storage costs. Why? Because every write triggers an index update, and with millions of small documents, you're paying for metadata overhead.
The right move: bucket your data. MongoDB time-series collections do this automatically, but only if you define the metaField and granularity correctly. Set granularity to 'seconds' if you're logging IoT sensor data every tick. Use 'minutes' for aggregated metrics like API response times. If you don't, MongoDB will guess and likely get it wrong, forcing your queries to scan more buckets than necessary. Then you'll wonder why your $avg aggregation on the last hour takes eight seconds.
Here's the pattern: group your measurements by sensor_id as the metaField, then let MongoDB's internal bucketing compress writes. Your PHP code just inserts flat documents—MongoDB handles the compression. But remember: a poorly chosen granularity means your data stays uncompressed. Profiling your query patterns before picking granularity saves you a rewrite.
collMod options. But never omit it; default will burn you.Bulletproof Error Handling: Don't Let a Dead MongoDB Takedown Your App
Production MongoDB goes down. Drivers throw exceptions. Network partitions happen. The difference between a senior dev and a junior is how gracefully you handle that moment. You catch, log, circuit-break, and retry — not bury errors in a try-catch with a generic 'something went wrong'.
MongoDB PHP driver throws MongoDB\Driver\Exception\ConnectionTimeoutException when it can't reach the server. Your code must treat this as a first-class failure path. Implement exponential backoff. Use a health-check loop before touching the database. Log the full error context including server host and operation.
Don't let a missing database bring down your entire page. Catch specific exceptions, present a stale cache, and alert ops. Your users forgive a stale dashboard. They don't forgive a 500 error.
\Exception swallows bugs like syntax errors. Always catch the most specific MongoDB exception type first. Let unexpected exceptions bubble up to your global exception handler—don't silence them in a query function.Bulk Writes at Scale: Batch Your Operations or Eat the Latency
Sending one INSERT per document kills throughput. MongoDB PHP driver gives you bulkWrite(). Use it. A single network round-trip can insert 10,000 documents instead of 10,000 separate calls. That's the difference between 2 seconds and 2 minutes.
Bulk writes support inserts, updates, deletes in one batch. Ordered mode stops on first error — useful for sequential operations. Unordered mode runs all operations even if some fail — perfect for loading logs where a few duplicates don't matter.
Batch size matters. MongoDB can't handle an infinite payload. Cap your batches at 10,000 operations or 16MB of data per batch, whichever hits first. Split your array into chunks. Use in PHP to keep your memory sane.array_chunk()
Why wait for each document? Fire a bulk write and move on. Your API response times will thank you.
ordered => false for high-throughput ingestion where order doesn't matter. It avoids a single bad document blocking the entire batch. Error handling is on you — check getWriteErrors() on the result object after each batch.The Silent Write Failure: When MongoDB Accepts Your Data and Then Loses It
$collection->insertOne($doc, ['writeConcern' => new \MongoDB\Driver\WriteConcern(\MongoDB\Driver\WriteConcern::MAJORITY)]);. For high-durability workloads, also enable journaling: j: true.- Never trust default write concern for business-critical data.
- Always set w: majority and j: true for financial or user-signup data.
- Test failover scenarios before going to production — don't learn this during an outage.
- Use retryWrites=true to automatically re-attempt writes after a primary stepdown — it's not a substitute for proper write concern, but it reduces the window for data loss.
telnet <host> 27017. Verify firewall rules. Ensure MongoDB is running and reachable. Check connection string for typos. Test with mongosh --host <host> from the app server.security.authorization: enabled).db.collection.explain('executionStats').find(...) to check if MongoDB is using the expected index. Look for COLLSCAN as a sign of missing index. Check for query patterns that don't match index prefix. Use hint() to force an index temporarily.Please reconnect or use maxTimeMS``maxTimeMS on the command. Use batchSize to control each batch. Avoid processing large datasets one document at a time — batch updates instead.new MongoDB\BSON\UTCDateTime() for dates. Manually convert with MongoDB\BSON\fromPHP() and debug.mongosh --host <host> --port 27017telnet <host> 27017Key takeaways
Common mistakes to avoid
7 patternsNot setting write concern to majority
Using fine-grained collections instead of embedded documents
Ignoring the 16MB document size limit
Relying on the default read preference 'primary' for read-heavy workloads
Not using BSON type classes for dates and ObjectIds
Overusing transactions for single-document updates
Forgetting to set socket and connection timeouts
Interview Questions on This Topic
How do you handle schema evolution in MongoDB when using PHP?
phone was added, missing it returns null instead of an error. For complex migrations, write a script that iterates through documents and updates them using bulkWrite with $set. Never use findAndModify for bulk updates — it processes one document at a time.
``php
$bulk = [];
foreach ($collection->find(['schema_version' => 1]) as $doc) {
$bulk[] = [
'updateOne' => [
['_id' => $doc['_id']],
['$set' => ['schema_version' => 2, 'phone' => null]]
]
];
if (count($bulk) >= 100) {
$collection->bulkWrite($bulk);
$bulk = [];
}
}
if ($bulk) $collection->bulkWrite($bulk);
``
Always test migrations on a replica set member off the critical path.Frequently Asked Questions
20+ years shipping production PHP systems at scale. Notes here come from systems that actually shipped.
That's PHP & MySQL. Mark it forged?
13 min read · try the examples if you haven't