PHP MongoDB — Silent Data Loss from Default Write Concern
A {w:1} write concern, MongoDB's default before version 5.0, can silently lose acknowledged writes after a replica set election.
- PHP + MongoDB pairs PHP's dynamic typing with MongoDB's schemaless documents
- The mongodb/mongodb library wraps ext-mongodb C extension for near-native speed
- BSON serialisation handles type conversion between PHP types and MongoDB types
- A single document can hold up to 16MB, but keep it under 1MB for cursor performance
- Wrong index choice adds 200ms+ per query in production — use explain() before trusting
- Biggest mistake: ignoring write concern leads to silent data loss in replica sets
Most PHP applications start life with MySQL — structured, reliable, familiar. But somewhere around the time your product manager asks for 'flexible user profiles', 'nested product attributes', or 'activity feeds that look different for every user type', a relational schema starts to feel like wearing shoes two sizes too small. You spend more time writing ALTER TABLE migrations and LEFT JOIN acrobatics than you spend actually building features. That's not a MySQL problem — it's a data-shape mismatch problem, and MongoDB was built to solve it at scale.
MongoDB stores data as BSON documents: binary-encoded, JSON-like objects that can nest arrays, sub-documents, and mixed types without a schema police officer stopping you at the door. PHP connects to it via the official mongodb/mongodb Composer package, which wraps the low-level ext-mongodb C extension. This split architecture (C extension for raw performance, PHP library for ergonomic developer experience) means you get near-native speed without writing a single line of C. The driver handles connection persistence across requests, BSON serialisation, cursor streaming, and write concern negotiation transparently.
By the end of this article you'll know how to wire up PHP to MongoDB correctly in production, write efficient CRUD operations and aggregation pipelines, design indexes that don't destroy your write throughput, and dodge the seven most painful production mistakes that nobody warns you about until your on-call phone rings at 2am.
What is PHP and MongoDB?
At its heart, PHP+MongoDB means using PHP to interact with a document-oriented NoSQL database. You store data as BSON documents: think JSON, but with typed fields such as dates, ObjectIds, and 64-bit integers. The PHP driver handles the serialisation and deserialisation so you work with native PHP arrays or objects. This pairing shines when your data model changes frequently, when you need to embed related data directly, or when you need high write throughput. For example, a user profile with address history and preferences can be stored in a single document instead of normalising across three tables. That's not just convenience: one document read replaces a multi-table join, which can make reads substantially faster.
Here's how a basic query looks in PHP:
```php
<?php
namespace Io\TheCodeForge\MongoDb;

use MongoDB\Client;

$client = new Client(getenv('MONGODB_URI'));
$collection = $client->selectDatabase('ecommerce')->selectCollection('products');

// findOne() returns null when nothing matches, so guard before reading fields
$product = $collection->findOne(['sku' => 'PHONE-001']);
if ($product !== null) {
    echo $product['name'] . ' - $' . $product['price'];
}
```
Notice you're working with arrays directly — no schema definition, no migration, no ORM configuration. That's the speed advantage that matters when your data shape changes weekly.
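To make the single-document user profile mentioned earlier concrete, here is a minimal sketch. The helper `buildUserProfile()` and the field names (`addresses`, `preferences`) are illustrative assumptions, not part of any library; only the commented `insertOne()` call is the real mongodb/mongodb method.

```php
<?php
declare(strict_types=1);

/**
 * Builds one user document that embeds address history and preferences,
 * instead of spreading them over three relational tables.
 * All field names here are hypothetical.
 */
function buildUserProfile(string $email, array $addresses, array $preferences): array
{
    return [
        'email' => $email,
        // Embedded array of sub-documents: travels with the user in one read
        'addresses' => array_values($addresses),
        // Embedded sub-document with optional, free-form keys
        'preferences' => $preferences,
    ];
}

$profile = buildUserProfile(
    'ada@example.com',
    [
        ['city' => 'London', 'current' => false],
        ['city' => 'Turin', 'current' => true],
    ],
    ['newsletter' => true, 'theme' => 'dark']
);

// With a live connection this becomes a single write:
// $collection->insertOne($profile);
echo json_encode($profile, JSON_PRETTY_PRINT), PHP_EOL;
```

Adding a new optional key to `preferences` later needs no migration; older documents simply lack it, so readers should treat missing keys as normal.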
Driver Installation and Connection Setup
Getting PHP to talk to MongoDB involves two components: the C extension (ext-mongodb) and the PHP library (mongodb/mongodb). Install both via PECL and Composer:
```bash
pecl install mongodb
echo "extension=mongodb.so" >> php.ini
composer require mongodb/mongodb
```
Then create a connection manager. Never hardcode credentials — use environment variables.
```php
<?php
namespace Io\TheCodeForge\MongoDb;

use MongoDB\Client;

class ConnectionManager
{
    private static ?Client $client = null;

    public static function getClient(): Client
    {
        if (self::$client === null) {
            $uri = getenv('MONGODB_URI') ?: 'mongodb://localhost:27017';
            $uriOptions = [
                'readPreference' => 'secondaryPreferred',
                'w' => 'majority',
                'journal' => true,
                'connectTimeoutMS' => 3000,
                'socketTimeoutMS' => 10000,
            ];
            $driverOptions = [
                // Return plain PHP arrays for documents and BSON arrays
                'typeMap' => ['root' => 'array', 'document' => 'array', 'array' => 'array'],
            ];
            self::$client = new Client($uri, $uriOptions, $driverOptions);
        }
        return self::$client;
    }
}
```
The typeMap option makes the library return plain PHP arrays instead of its default MongoDB\Model\BSONDocument and BSONArray objects, so results behave like ordinary arrays everywhere. The readPreference lets read queries hit secondaries, offloading the primary.
One more thing: always set a reasonable connectTimeoutMS (like 3000ms) and socketTimeoutMS (like 10000ms) in your uriOptions. The driver's defaults are far more patient (socketTimeoutMS defaults to five minutes), so a MongoDB node that's slow to respond can tie up your PHP process long after the user has given up. Trust me, you'll learn this the hard way when your FPM workers pile up waiting for a dead secondary.
CRUD Operations with Documents
MongoDB documents are BSON objects with nested structures. PHP's mongodb/mongodb library maps PHP arrays to BSON automatically. But watch out for type conversions — timestamps, ObjectIds, and large integers are common traps.
Insert: Pass an array with keys. Use ['_id' => new MongoDB\BSON\ObjectId()] only if you need client-generated IDs. Otherwise let MongoDB handle it.
Find: The find() method returns a cursor (MongoDB\Driver\Cursor), not an array, and it iterates lazily. Use ->toArray() for small result sets, cursor iteration for large ones. Always pass projection to limit fields returned over the wire.
Update: Use updateOne() or updateMany() with atomic operators like $set, $unset, $inc. Never read-modify-write — that's a race condition in disguise.
Delete: deleteOne() and deleteMany() are final. Use them sparingly; mark documents as deleted: true instead for recoverability.
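The find, projection and soft-delete advice above could be combined roughly as follows. This is a sketch under assumptions: the helper names (`liveFilter`, `activeEmails`, `softDelete`) and the `status` and `deleted_at` fields are hypothetical; `find()`, `updateOne()` and the BSON classes are the library's own.

```php
<?php
declare(strict_types=1);

use MongoDB\BSON\ObjectId;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Collection;

/** Adds the "not soft-deleted" condition to any filter. */
function liveFilter(array $filter = []): array
{
    // $ne: true also matches documents that have no `deleted` field at all
    return $filter + ['deleted' => ['$ne' => true]];
}

/** Streams e-mail addresses of live, active users without loading them all. */
function activeEmails(Collection $users): \Generator
{
    $cursor = $users->find(
        liveFilter(['status' => 'active']),
        ['projection' => ['_id' => 0, 'email' => 1]] // only `email` crosses the wire
    );
    foreach ($cursor as $user) { // lazy: documents arrive in batches
        yield $user['email'];
    }
}

/** Soft delete: flag the document instead of calling deleteOne(). */
function softDelete(Collection $users, string $id): bool
{
    $result = $users->updateOne(
        liveFilter(['_id' => new ObjectId($id)]),
        ['$set' => ['deleted' => true, 'deleted_at' => new UTCDateTime()]]
    );
    return $result->getModifiedCount() === 1;
}
```

Routing every read through `liveFilter()` is a design choice: it keeps soft-deleted documents out of results without each call site having to remember the extra condition.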
Example with error handling and bulk write:
```php
<?php
namespace Io\TheCodeForge\MongoDb;

use MongoDB\Collection;
use MongoDB\Driver\Exception\RuntimeException;

class UserRepository
{
    private Collection $collection;

    public function __construct()
    {
        $this->collection = ConnectionManager::getClient()
            ->selectDatabase('myapp')
            ->selectCollection('users');
    }

    public function updateEmail(string $userId, string $newEmail): bool
    {
        try {
            $result = $this->collection->updateOne(
                ['_id' => new \MongoDB\BSON\ObjectId($userId)],
                ['$set' => ['email' => $newEmail]]
            );
            // 0 when nothing matched or the stored email was already identical
            return $result->getModifiedCount() === 1;
        } catch (RuntimeException $e) {
            error_log("MongoDB update failed: " . $e->getMessage());
            return false;
        }
    }

    public function batchUpdateStatus(array $userIds, string $status): int
    {
        $operations = [];
        foreach ($userIds as $id) {
            $operations[] = [
                'updateOne' => [
                    ['_id' => new \MongoDB\BSON\ObjectId($id)],
                    ['$set' => ['status' => $status]],
                ],
            ];
        }
        $result = $this->collection->bulkWrite($operations, ['ordered' => false]);
        return $result->getModifiedCount();
    }
}
```
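bulkWrite with ordered => false lets the server execute the operations in any order and keep going after an individual failure, which suits batch jobs where order doesn't matter. With the default ordered => true, operations run in sequence and stop at the first error; keep that for work like account transfers, where later operations depend on earlier ones.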
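One more thing: findOneAndUpdate is your atomic read-modify-write tool. It finds and updates a document in one step, guaranteeing no other process snuck in between. By default it returns the document as it was before the update; pass the returnDocument option set to MongoDB\Operation\FindOneAndUpdate::RETURN_DOCUMENT_AFTER to get the updated version. Perfect for counters, reservation systems, and queue pop operations.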
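As an illustration of the counter use case, here is a minimal sketch built on `findOneAndUpdate()`. The `counters` collection layout (`_id` as the counter name, `seq` as the value) and the function name `nextSequence()` are assumptions made for this example.

```php
<?php
declare(strict_types=1);

use MongoDB\Collection;
use MongoDB\Operation\FindOneAndUpdate;

/**
 * Atomically increments and returns a named counter.
 * The collection layout ({_id: name, seq: int}) is hypothetical.
 */
function nextSequence(Collection $counters, string $name): int
{
    $doc = $counters->findOneAndUpdate(
        ['_id' => $name],
        ['$inc' => ['seq' => 1]],
        [
            'upsert' => true, // create the counter on first use
            // Ask for the post-update document rather than the previous one
            'returnDocument' => FindOneAndUpdate::RETURN_DOCUMENT_AFTER,
        ]
    );
    return (int) $doc['seq'];
}
```

A call such as `nextSequence($db->selectCollection('counters'), 'orders')` would hand out 1, 2, 3 and so on, even when several PHP workers call it concurrently, because the increment and the read happen in one server-side operation.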
Aggregation Pipeline: Powerful Data Processing
MongoDB's aggregation pipeline processes documents through a sequence of stages: $match, $group, $sort, $project, $lookup, $unwind, etc. Each stage transforms the document stream, and each stage is limited to 100MB of memory unless allowDiskUse is set.
Pipeline patterns that commonly hurt performance:
- $lookup without indexes on the foreign collection (adds seconds per query)
- $unwind on large arrays followed by $group (creates massive intermediate results)
- $sort without index at the start (forces all documents into memory)
- $match after $group (filters reduced data, wastes early elimination)
Best practice: Place $match as early as possible. Use $project to drop unused fields. For expensive $lookup, consider denormalizing data if the foreign collection is mostly static.
Example: Aggregate orders by status with total value:
```php
// $startOfDay: milliseconds since the Unix epoch, or a DateTimeInterface
$pipeline = [
    ['$match' => ['created_at' => ['$gte' => new \MongoDB\BSON\UTCDateTime($startOfDay)]]],
    ['$group' => [
        '_id' => '$status',
        'count' => ['$sum' => 1],
        'totalValue' => ['$sum' => '$total_amount'],
    ]],
    ['$sort' => ['totalValue' => -1]],
];
$orders = $collection->aggregate($pipeline, ['allowDiskUse' => true, 'maxTimeMS' => 5000]);
foreach ($orders as $order) {
    echo "{$order['_id']}: {$order['count']} orders, total \${$order['totalValue']}\n";
}
```
Note: allowDiskUse is necessary for large datasets — otherwise MongoDB may throw a 16819 error when memory limit is exceeded.
A common trap with $lookup is forgetting that the localField value is taken as-is, so if it's a string, the foreign field must also be a string, not an ObjectId. This mismatch causes silent empty results and hours of debugging. When a $lookup comes back empty, compare the stored BSON types on both sides first.
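One way to work around the type mismatch is to convert the local value before joining. The sketch below assumes a hypothetical schema in which orders store `customer_id` as a 24-character hex string while `customers._id` is an ObjectId; the function name and all field names are illustrative. `$toObjectId` is a standard aggregation operator (MongoDB 4.0+).

```php
<?php
declare(strict_types=1);

/**
 * Builds a pipeline that joins orders to customers when orders store the
 * customer id as a string but customers use ObjectId for _id.
 * Collection and field names are hypothetical.
 */
function ordersWithCustomerPipeline(string $status): array
{
    return [
        // 1. Filter first so later stages see fewer documents
        ['$match' => ['status' => $status]],
        // 2. Convert the string id so both sides of the join share a type
        ['$addFields' => ['customer_oid' => ['$toObjectId' => '$customer_id']]],
        // 3. Join; customers._id always has an index
        ['$lookup' => [
            'from' => 'customers',
            'localField' => 'customer_oid',
            'foreignField' => '_id',
            'as' => 'customer',
        ]],
        // 4. Keep only what the caller needs
        ['$project' => ['total_amount' => 1, 'customer.name' => 1]],
    ];
}

// Usage with a live collection:
// $rows = $orders->aggregate(ordersWithCustomerPipeline('paid'), ['maxTimeMS' => 5000]);
$pipeline = ordersWithCustomerPipeline('paid');
echo implode(' -> ', array_map('array_key_first', $pipeline)), PHP_EOL;
```

Note that `$toObjectId` raises an error on strings that are not valid ObjectId hex, so this approach fits best when the string column is known to be clean; storing the reference as an ObjectId in the first place avoids the conversion entirely.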
Indexing Strategy and Production Gotchas
Indexes in MongoDB are B-trees, similar to relational databases. But compound index prefix rules, query shape sensitivity, and the impact on write throughput differ.
Single field index: createIndex(['email' => 1]) — for exact matches or sort on email.
Compound index: createIndex(['status' => 1, 'created_at' => -1]) — supports queries on status, or status+created_at, but NOT on created_at alone.
Text index: createIndex(['description' => 'text']) — for full-text search with $text queries. Only one text index per collection.
Production pitfalls:
1. Too many indexes on a write-heavy collection -> each insert/update touches every index, doubling write time.
2. Unused indexes: use $indexStats to identify dead indexes.
3. Overusing hint() to force an index -> query optimizer disabled.
4. Not using dropIndexes() during migrations; building new indexes on large collections blocks read operations.
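The $indexStats check from the pitfalls list might be scripted along these lines. The function names are hypothetical, and the sample rows at the bottom are hand-written stand-ins for server output, included only so the filtering logic can be seen without a database.

```php
<?php
declare(strict_types=1);

use MongoDB\Collection;

/**
 * Returns the names of indexes whose usage counter is zero.
 * $stats is any iterable of $indexStats result documents.
 */
function unusedIndexNames(iterable $stats): array
{
    $unused = [];
    foreach ($stats as $stat) {
        // The mandatory _id index cannot be dropped, so skip it
        if ($stat['name'] !== '_id_' && (int) $stat['accesses']['ops'] === 0) {
            $unused[] = $stat['name'];
        }
    }
    return $unused;
}

/** Runs $indexStats against a live collection. */
function findUnusedIndexes(Collection $collection): array
{
    // $indexStats takes an empty document; a PHP [] would be sent as an array
    $stats = $collection->aggregate([['$indexStats' => new \stdClass()]]);
    return unusedIndexNames($stats);
}

// Offline illustration with hand-written sample rows (not real server output)
$sample = [
    ['name' => '_id_', 'accesses' => ['ops' => 0]],
    ['name' => 'email_1', 'accesses' => ['ops' => 1842]],
    ['name' => 'legacy_flag_1', 'accesses' => ['ops' => 0]],
];
print_r(unusedIndexNames($sample));
```

The counters are kept per node and reset when a node restarts, so a zero is only meaningful after the server has been up through a representative traffic period, and ideally after checking every replica set member that serves reads.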
Use explain() before deploying any query to production:
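```php
$filter = ['status' => 'active', 'created_at' => ['$gte' => $date]];
$options = ['projection' => ['_id' => 0, 'email' => 1]];

// The query as the application runs it
$cursor = $collection->find($filter, $options);

// find() has no 'explain' option; wrap the same query in an explainable operation
$find = new \MongoDB\Operation\Find(
    $collection->getDatabaseName(),
    $collection->getCollectionName(),
    $filter,
    $options
);
$explain = $collection->explain($find, ['verbosity' => 'executionStats']);
```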
The explain output shows whether a COLLSCAN (collection scan) or IXSCAN (index scan) is used. With executionStats verbosity, aim for IXSCAN with totalDocsExamined close to nReturned.
One more gotcha: MongoDB's query planner caches plans. If a suboptimal plan gets cached because of a skewed data distribution, you'll see slow queries even with the right indexes. Run db.collection.getPlanCache().clear() occasionally to force re-evaluation.
Transactions, Write Concerns, and Data Integrity
MongoDB has supported multi-document ACID transactions since version 4.0. But they come with caveats: they require a replica set (or, from 4.2, a sharded cluster), have a 60-second default timeout, and should not be used for high-throughput operations that can be done with atomic operators.
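Write concern controls durability. With w: 1 (the server default before MongoDB 5.0), a write is acknowledged as soon as the primary has applied it in memory, so it can be rolled back if the primary fails before replicating it. w: majority (the implicit default from 5.0 on for most replica set configurations) waits for a majority of replica set members. Adding j: true forces a journal commit before acknowledgment.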
Read concerns: local (default) returns data that may be rolled back. majority returns only committed data. Use linearizable only when you need absolute latest data, but it's slow.
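Write concern can also be set per operation rather than only in the connection string. The following is a minimal sketch of that idea; the function name `insertDurably()` and the 5000 ms wtimeout are assumptions chosen for the example, while `WriteConcern`, `insertOne()` and `BulkWriteException` are the driver's and library's own.

```php
<?php
declare(strict_types=1);

use MongoDB\Collection;
use MongoDB\Driver\Exception\BulkWriteException;
use MongoDB\Driver\WriteConcern;

/**
 * Inserts a document and reports success only once a majority of members
 * have journaled it. The 5000 ms wtimeout is an arbitrary example value.
 */
function insertDurably(Collection $collection, array $document): bool
{
    // Arguments: w, wtimeout in milliseconds, journal
    $majorityJournaled = new WriteConcern(WriteConcern::MAJORITY, 5000, true);
    try {
        $result = $collection->insertOne($document, ['writeConcern' => $majorityJournaled]);
        return $result->isAcknowledged() && $result->getInsertedCount() === 1;
    } catch (BulkWriteException $e) {
        // A write concern error means the primary applied the write but the
        // majority did not confirm in time, so it could still be rolled back
        $wcError = $e->getWriteResult()->getWriteConcernError();
        error_log('Insert not confirmed: ' . ($wcError ? $wcError->getMessage() : $e->getMessage()));
        return false;
    }
}
```

Returning false on a write concern timeout does not mean the document is absent; it means durability is unknown. Callers that retry should therefore use an idempotent key (for example a client-generated _id) to avoid duplicates.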
When to use transactions:
- Multiple documents that must be updated together (e.g., funds transfer)
- Cross-collection consistency
- Not for simple counters or array pushes: those are atomic anyway.
Example transaction with retry logic:
```php
<?php
namespace Io\TheCodeForge\MongoDb;

// $fromId and $toId hold the 24-character hex ids of the two accounts
$client = ConnectionManager::getClient();
$accounts = $client->selectDatabase('bank')->selectCollection('accounts');
$maxRetries = 3;
$retry = 0;

do {
    $session = $client->startSession();
    $session->startTransaction([
        'readConcern' => new \MongoDB\Driver\ReadConcern(\MongoDB\Driver\ReadConcern::SNAPSHOT),
        'writeConcern' => new \MongoDB\Driver\WriteConcern(\MongoDB\Driver\WriteConcern::MAJORITY),
    ]);
    try {
        $accounts->updateOne(
            ['_id' => new \MongoDB\BSON\ObjectId($fromId)],
            ['$inc' => ['balance' => -100]],
            ['session' => $session]
        );
        $accounts->updateOne(
            ['_id' => new \MongoDB\BSON\ObjectId($toId)],
            ['$inc' => ['balance' => 100]],
            ['session' => $session]
        );
        $session->commitTransaction();
        break;
    } catch (\MongoDB\Driver\Exception\RuntimeException $e) {
        // Error labels live on RuntimeException, so network errors are covered too
        if ($session->isInTransaction()) {
            $session->abortTransaction();
        }
        if ($e->hasErrorLabel('TransientTransactionError') && ++$retry < $maxRetries) {
            usleep(100000); // 100ms backoff
            continue;
        }
        throw $e;
    }
} while (true);
```
Performance: transactions add 20-50ms overhead due to conflict detection. Use them sparingly.
If you're on MongoDB 4.2+, you can also use distributed transactions across sharded clusters. But the latency penalty is higher — expect 100-200ms per transaction. Only do this when the business requirement genuinely demands cross-shard atomicity.
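The manual retry loop shown earlier can usually be replaced by the library's `MongoDB\with_transaction()` helper, which starts the transaction, commits it, and retries on transient errors. The sketch below is one possible shape; the function name `transferFunds()` is hypothetical, and the database, collection and field names simply mirror the earlier example.

```php
<?php
declare(strict_types=1);

use MongoDB\BSON\ObjectId;
use MongoDB\Client;
use MongoDB\Driver\Session;
use MongoDB\Driver\WriteConcern;

use function MongoDB\with_transaction;

/**
 * Moves $amount between two accounts inside one transaction.
 * Database, collection and field names mirror the earlier example.
 */
function transferFunds(Client $client, ObjectId $from, ObjectId $to, int $amount): void
{
    $accounts = $client->selectDatabase('bank')->selectCollection('accounts');
    $session = $client->startSession();

    // with_transaction() starts, commits and retries on transient errors
    with_transaction(
        $session,
        function (Session $session) use ($accounts, $from, $to, $amount): void {
            $accounts->updateOne(
                ['_id' => $from],
                ['$inc' => ['balance' => -$amount]],
                ['session' => $session]
            );
            $accounts->updateOne(
                ['_id' => $to],
                ['$inc' => ['balance' => $amount]],
                ['session' => $session]
            );
        },
        ['writeConcern' => new WriteConcern(WriteConcern::MAJORITY)]
    );
}
```

Because the callback may run more than once, it should contain only database operations bound to the session; side effects such as sending e-mail belong after `with_transaction()` returns.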
Change Streams: Real-Time Data Feeds with PHP
MongoDB change streams allow you to subscribe to real-time changes on a collection, database, or entire deployment. They're built on the oplog and provide a reliable, ordered stream of events. In PHP, you use the MongoDB\Collection::watch() method to start listening.
Typical use cases:
- Real-time dashboards that update when data changes
- Cross-service synchronisation (e.g., push changes to Elasticsearch)
- Audit logging of all modifications
- Cache invalidation triggers
Example: Watch for new orders with resume token persistence:
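```php
<?php
namespace Io\TheCodeForge\MongoDb;

$database = ConnectionManager::getClient()->selectDatabase('ecommerce');
$orderCollection = $database->selectCollection('orders');
// Hypothetical bookkeeping collection that stores the resume token
$metadataCollection = $database->selectCollection('stream_metadata');

$resumeToken = $metadataCollection->findOne(['_id' => 'changeStreamResume'])['token'] ?? null;
$options = ['maxAwaitTimeMS' => 1000];
if ($resumeToken) {
    $options['startAfter'] = $resumeToken;
}

$pipeline = [['$match' => ['operationType' => 'insert']]];
$changeStream = $orderCollection->watch($pipeline, $options);

// A plain foreach ends as soon as no event is waiting, so loop explicitly
for ($changeStream->rewind(); true; $changeStream->next()) {
    if (!$changeStream->valid()) {
        continue; // nothing arrived within maxAwaitTimeMS
    }
    $event = $changeStream->current();
    $order = $event['fullDocument'];
    // Send to a queue, update cache, etc.
    echo "New order: " . $order['_id'] . PHP_EOL;
    // Persist resume token
    $metadataCollection->updateOne(
        ['_id' => 'changeStreamResume'],
        ['$set' => ['token' => $event['_id']]],
        ['upsert' => true]
    );
}
```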
- Change streams require a replica set (not a standalone).
- The cursor blocks waiting for events. Use maxAwaitTimeMS to control polling interval.
- If the cursor is idle for too long (default 10 minutes), MongoDB may close it. Reconnect gracefully.
- The fullDocument: 'updateLookup' option gives you the full document after an update, but it adds a lookup per event.
- In sharded clusters, the per-shard streams are merged into a single stream ordered by cluster time, which costs some extra latency.
- Use startAfter to resume from a specific point after a crash.
Change streams never surface uncommitted transaction writes: events for a transaction appear only after it commits. That's fine, because you only want to react to committed data anyway.
Replica Set Configuration and Failover Handling
A MongoDB replica set provides automatic failover and data redundancy. The PHP driver discovers replica set members from the connection string and routes writes to the primary. But proper configuration is critical — missteps can cause silent failover delays or split-brain scenarios.
Connection string: Use the SRV format for automatic discovery: mongodb+srv://user:pass@cluster0.example.mongodb.net/db?retryWrites=true&w=majority. This tells the driver to query DNS for all replica set members.
Read preference: Set to secondaryPreferred for read-heavy workloads, but be aware of stale reads. If you read-after-write, use primary or enable causal consistency.
Retry writes: Always enable retryWrites=true in the connection string. This re-runs write operations if the primary steps down during a write. It doesn't replace write concern, but adds resilience.
Failover test script:
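```php
<?php
namespace Io\TheCodeForge\MongoDb;

// Import the classes: unqualified MongoDB\... names would otherwise be
// resolved relative to this namespace and fail to load
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;
use MongoDB\Driver\Exception\RuntimeException;

$uri = 'mongodb+srv://user:pass@cluster0.example.mongodb.net/db?retryWrites=true&w=majority';
$client = new Client($uri);

$collection = $client->selectDatabase('test')->selectCollection('failover');

// Step down the primary (requires admin)
try {
    $client->selectDatabase('admin')->command(['replSetStepDown' => 60, 'force' => true]);
} catch (RuntimeException $e) {
    // Servers before 4.2 close client connections on step-down; expected here
}

// Try to write immediately: should retry automatically
$collection->insertOne(['test' => 'failover', 'ts' => new UTCDateTime()]);
echo "Write succeeded after failover.\n";
```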
Always test failover in a staging environment. Simulate primary step-down, network partitions, and secondary lag before going live.
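The causal consistency option mentioned under read preference can be sketched as follows. This assumes a hypothetical `users` collection with an `email` field; the function name `updateThenRead()` is illustrative. The session option, read/write concern classes and `selectCollection()` options are the driver's and library's own.

```php
<?php
declare(strict_types=1);

use MongoDB\BSON\ObjectId;
use MongoDB\Client;
use MongoDB\Driver\ReadConcern;
use MongoDB\Driver\ReadPreference;
use MongoDB\Driver\WriteConcern;

/**
 * Updates a user's e-mail, then reads it back through the same causally
 * consistent session so a secondary cannot return the older value.
 * Database, collection and field names are hypothetical.
 */
function updateThenRead(Client $client, ObjectId $userId, string $email): ?string
{
    // Spelled out explicitly to make the intent visible
    $session = $client->startSession(['causalConsistency' => true]);

    $users = $client->selectDatabase('myapp')->selectCollection('users', [
        // Causal guarantees rely on majority read and write concerns
        'readConcern' => new ReadConcern(ReadConcern::MAJORITY),
        'writeConcern' => new WriteConcern(WriteConcern::MAJORITY),
        'readPreference' => new ReadPreference(ReadPreference::SECONDARY_PREFERRED),
    ]);

    // Both operations must carry the same session
    $users->updateOne(['_id' => $userId], ['$set' => ['email' => $email]], ['session' => $session]);
    $user = $users->findOne(['_id' => $userId], ['session' => $session]);

    return $user['email'] ?? null;
}
```

The guarantee only holds within one session, so in a typical web request the write and the follow-up read need to share it; across separate requests, reading from the primary remains the simpler option.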
| Aspect | PHP + MongoDB | MySQL + ORM (e.g., Doctrine) |
|---|---|---|
| Schema flexibility | Schemaless — add fields on the fly | Requires migrations to change schema |
| Joins | No native joins; use $lookup in aggregation (expensive) | Native joins via SQL, optimizer-driven |
| Query language | MQL (MongoDB Query Language) as PHP arrays | SQL, wrapped by ORM DQL |
| Transactions | Multi-document transactions since 4.0 (limited overhead) | Mature ACID transactions |
| Indexing | B-tree, text, geospatial, TTL indexes | B-tree, full-text, spatial (varies by engine) |
| PHP integration | mongodb/mongodb library, ext-mongodb C extension | PDO/MySQLi or ORM libs |
| Best for | Rapidly evolving schemas, nested data, high write throughput | Complex relationships, reporting, strong consistency |
Key Takeaways
- MongoDB schemaless model fits flexible data but requires discipline in versioning and index design.
- The PHP driver persists connections per worker process rather than sharing a pool, so total connections scale with the number of FPM workers; size your FPM worker count with that in mind.
- Use writeConcern majority and journal true for critical writes; test failover scenarios.
- Aggregation pipelines: place $match first, index foreign keys for $lookup, set allowDiskUse and maxTimeMS.
- Explain every query before production — look for IXSCAN, avoid COLLSCAN.
- Limit indexes on write-heavy collections and monitor usage with $indexStats.
- Use atomic operators instead of transactions for single-document updates; transactions are for multi-document consistency only.
- Change streams need replica sets and resume token persistence for reliable real-time feeds.
- Always set retryWrites=true and timeouts in connection options to handle outages gracefully.
Common Mistakes to Avoid
- Not setting write concern to majority
Symptom: Data loss after replica set failover: writes appear successful but vanish when primary goes down.
Fix: Set w: 'majority' as default in the connection URI options, and j: true for critical data.
- Using fine-grained collections instead of embedded documents
Symptom: N+1 query problem: fetching a user and their addresses requires separate queries, just like in SQL.
Fix: Embed related data (e.g., address as an array inside user document) unless the relationship is truly many-to-many and large.
- Ignoring the 16MB document size limit
Symptom: Insert fails with 'Document too large' error when storing arrays of sub-documents that push over 16MB.
Fix: Use GridFS for files or break large arrays into separate collections referenced by _id.
- Relying on the default read preference 'primary' for read-heavy workloads
Symptom: Primary server CPU at 90% while secondaries are idle; queries get slower under load.
Fix: Set readPreference to 'secondaryPreferred' for read-only queries. But be aware of stale data: use 'primary' for read-after-write consistency.
- Not using BSON type classes for dates and ObjectIds
Symptom: Dates stored as strings, ObjectIds as integers; queries fail to match because types differ.
Fix: Always use MongoDB\BSON\UTCDateTime for dates, MongoDB\BSON\ObjectId for _id references.
- Overusing transactions for single-document updates
Symptom: Write latency jumps from 5ms to 60ms for every update, transaction conflict errors under load.
Fix: Use atomic operators ($inc, $set, $push) for single-document updates. Reserve transactions for multi-document consistency only.
- Forgetting to set socket and connection timeouts
Symptom: PHP-FPM workers pile up waiting for a slow or dead MongoDB node, eventually exhausting the process pool.
Fix: Always set connectTimeoutMS (e.g., 3000) and socketTimeoutMS (e.g., 10000) in the connection options.
Interview Questions on This Topic
- Q: How do you handle schema evolution in MongoDB when using PHP? (Senior)
- Q: Explain how MongoDB's PHP driver handles connection pooling. What can go wrong in high-concurrency FPM environments? (Senior)
- Q: Write a PHP script that demonstrates a safe update using a write concern of 'majority' and journaled writes. (Mid-level)
- Q: What are the main performance considerations when using aggregation pipelines with $lookup in PHP? (Senior)
- Q: How would you implement a change stream consumer in PHP that survives restarts? (Senior)
- Q: Describe the impact of using 'secondaryPreferred' read preference on data consistency and how to mitigate stale reads. (Senior)
Frequently Asked Questions
What is PHP and MongoDB in simple terms?
PHP and MongoDB is a way to store and retrieve data without a fixed schema. You send PHP arrays to MongoDB, and it stores them as BSON documents. You don't need to define columns up front — PHP arrays directly become documents. This is ideal for data that changes shape often, like user profiles with optional fields or nested product attributes.
How do I install the PHP MongoDB driver for production?
You need two components: ext-mongodb (C extension) via PECL, and the mongodb/mongodb Composer package. For production, ensure your server has the correct PHP version and use a version-locked Composer requirement like ^1.15. Also install mongodb in your Dockerfile with pecl install mongodb. Verify the extension is loaded with php -m | grep mongodb.
Can I use MongoDB and MySQL together in the same PHP project?
Yes, it's common to use MySQL for transactional, relational data (e.g., orders, accounts) and MongoDB for flexible, high-read data (e.g., product catalogs, logs, user profiles). Each connection is separate; manage them via dependency injection. Just be careful with transactions — you can't mix them across databases.
What is the maximum document size in MongoDB? How do I handle larger data?
The hard limit is 16MB per BSON document. For larger binary data, use GridFS, which splits files into chunks (default 255KB) stored in separate collections. For large nested arrays, consider splitting into a separate collection and referencing by _id. Also, keep documents under 1MB for optimal cursor performance.
Why are my MongoDB queries slow despite creating indexes?
Possible causes: Index prefix not matching query pattern (compound index order matters), query using $regex without anchor, or documents being large so I/O dominates. Use explain() to see if the index is used. Also check that the index is actually created (listIndexes()). In production, monitor query performance with db.currentOp() and the profiler.
When should I use transactions instead of atomic operators?
Use transactions only when you need to atomically update multiple documents across collections (e.g., transferring funds between accounts). For single-document updates, atomic operators like $inc, $set, $push are sufficient and much faster. Transactions add 20-50ms overhead and require replica sets.