IPC — Misaligned uint64 Torn Read in Shared Memory
A misaligned uint64 write at offset 1022 caused a torn read crash.
- IPC lets isolated processes exchange data through kernel-managed channels
- Pipes: unidirectional byte stream, parent-child or named (FIFO) for unrelated procs
- Sockets: bidirectional, works across processes on same host (AF_UNIX) or network (TCP)
- Shared memory: fastest (70–150 ns access) — zero kernel copy — but requires explicit sync
- Message queues: async, kernel-buffered, priority delivery — slower but decoupled in time
- Production risk: shared memory races corrupt data silently — never skip mutexes
- Biggest mistake: assuming all IPC is interchangeable — latency varies from 70ns to 50µs
Imagine two chefs working in the same restaurant kitchen. They can't read each other's minds, so they need ways to pass information — one chef shouts across the kitchen (like a pipe), another writes on a shared whiteboard (like shared memory), and a third drops orders into a ticket queue (like a message queue). IPC is exactly that: the set of rules and tools the OS provides so that separate running programs can talk to each other, coordinate work, and share data without crashing into each other.
The OS keeps every process in its own virtual address space — that sandbox is why a crash in one tab doesn't bring down your whole browser. But it creates a hard problem: how do two isolated processes cooperate?
IPC solves exactly this. The kernel provides controlled channels — pipes, sockets, shared memory, message queues — that let data flow between address spaces safely. Each makes a different trade-off between speed, complexity, and what breaks under load.
Here's what most guides skip: the wrong IPC choice can tank your system's throughput by 100x or introduce data corruption that only reproduces on multi-socket machines. This article covers not just how each mechanism works but what fails in production and how to fix it.
Most engineers treat IPC as a one-dimensional decision — faster is better. That's wrong. The real question is what failure mode you can tolerate. Shared memory gives you speed but a single lock mistake corrupts all data. Message queues add latency but isolate failures. Choose based on your outage cost, not a microbenchmark.
What is Inter-Process Communication?
Inter-Process Communication is a core concept in OS design. The kernel keeps processes isolated — each in its own virtual address space. IPC bridges that gap, providing controlled channels for data exchange. There are four classic mechanisms: pipes, sockets, shared memory, and message queues.
Think of IPC as a spectrum. Pipes are simple byte streams for parent-child processes. Sockets add network transparency. Shared memory delivers raw speed (70–150 ns access) but demands manual synchronisation. Message queues give structured, async delivery with kernel buffering.
Here's the rule that matters: you rarely choose IPC directly — your libraries decide. PostgreSQL uses shared memory for its buffer pool. Get the size wrong and you'll either OOM or hit terrible disk reads. Profile IPC overhead under your actual workload, not synthetic benchmarks.
A common misconception: IPC mechanisms are drop-in replacements. They're not. Switch from shared memory to a message queue in a latency-critical path and you'll see a 100x slowdown. Always validate with your workload.
Another nuance: IPC performance depends on the kernel version and CPU architecture. On a NUMA system, accessing a shared memory page allocated on a remote NUMA node adds 2–3x latency. Use numactl to pin processes and memory allocations to the same node. Ignoring topology is one of the most common causes of inconsistent IPC performance in production.
Tip: use numactl --membind to keep memory local to the processes that use it.
Pipes: Simple Byte Streams Between Related Processes
Pipes are the oldest and simplest IPC mechanism. An unnamed pipe created with pipe() gives two file descriptors: the read end and the write end. Data written to the write end is buffered in the kernel and read sequentially from the read end.
Pipes are unidirectional. For bidirectional communication, create two pipes or switch to a socketpair. The kernel buffer is finite: on Linux a pipe holds 64 KiB by default, and fcntl(fd, F_SETPIPE_SZ, size) can raise that up to fs.pipe-max-size (1 MB by default for unprivileged processes). If the buffer fills, write() blocks until the reader drains data.
Named pipes (FIFOs) use a filesystem path and allow unrelated processes to communicate. They are created with mkfifo(). The same read/write semantics apply, but now any process with permissions can open the file.
Pipes are great for one-shot data flows like grep | sort or for passing small control messages between a parent and its children. Don't force them into high-throughput scenarios — the kernel copy and scheduling overhead adds up.
One production gotcha: if the reader closes its end before the writer finishes, the writer receives SIGPIPE (termination) or gets EPIPE error on write if SIGPIPE is ignored. Always handle EPIPE in your writer loop.
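The EPIPE handling described above can be sketched as follows. This is a minimal illustration in Python, whose os.write maps onto write(2); the helper name write_all is hypothetical. It assumes CPython's default of ignoring SIGPIPE at startup, so a closed read end surfaces as BrokenPipeError (errno EPIPE) instead of terminating the process.

```python
import os


def write_all(fd: int, data: bytes) -> bool:
    """Write all of data to fd; return False if the reader has gone away."""
    view = memoryview(data)
    while view:
        try:
            written = os.write(fd, view)   # may be a partial write
        except BrokenPipeError:            # errno EPIPE: read end is closed
            return False
        view = view[written:]
    return True


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    print(write_all(write_fd, b"hello"))   # reader still open
    os.close(read_fd)                      # reader exits early
    print(write_all(write_fd, b"world"))   # write now fails with EPIPE
    os.close(write_fd)
```

Returning a flag rather than raising lets the writer loop decide whether to reconnect, buffer, or drop; a C writer would check for errno == EPIPE after write() in the same place.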
A real-world story: a log aggregation pipeline used pipes between a producer and a consumer. The consumer was slow due to disk I/O, and the producer would block on write, stalling the entire logging thread. The fix was to set O_NONBLOCK on the write end and buffer in userspace — but only after losing a day of logs. Always test your pipe throughput against your worst-case burst rate.
Another subtle issue: the pipe buffer size is shared across all writers. If you have multiple writers, a single slow reader can cause all of them to block. Consider using a separate pipe per writer or a socket with a larger buffer.
In high-throughput scenarios, consider using splice() to move data between pipes and sockets without copying to userspace. It is still a system call, but it can reduce CPU overhead significantly when chaining pipes. However, splice() requires one end to be a pipe. That's a niche optimisation but worth knowing.
- splice() avoids user-space copies but requires a pipe as one end.
- Unnamed pipes (pipe()): simple, no filesystem artifacts.
- Named pipes (mkfifo()): must handle permissions and cleanup.
Sockets: Network-Transparent Bidirectional Communication
Sockets are the most versatile IPC mechanism. They can communicate within a single host using Unix domain sockets (AF_UNIX) or across a network using TCP (AF_INET). Unix domain sockets are file-based and support both stream (SOCK_STREAM) and datagram (SOCK_DGRAM) semantics.
For local IPC, Unix domain sockets are the gold standard: they support bidirectional byte streams, have lower latency than TCP (no loopback interface), and integrate well with select/poll/epoll. They also support passing file descriptors between processes via sendmsg(), which is extremely useful for delegating server sockets to workers.
A common pattern: a server creates a Unix domain socket at /tmp/app.sock, accepts connections from multiple clients, and services them using a thread pool or event loop. The biggest pitfall is forgetting to unlink() the socket file before bind(), or getting permissions wrong (the socket file must be writable by the client).
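A minimal sketch of the unlink-then-bind pattern, written in Python on top of the same socket calls. The helper name bind_unix, the 0o660 mode, and the temporary path are illustrative assumptions, not part of any particular server.

```python
import os
import socket
import tempfile


def bind_unix(path: str, backlog: int = 16) -> socket.socket:
    """Create a listening AF_UNIX stream socket at path, replacing a stale file."""
    try:
        os.unlink(path)            # remove a leftover socket file from a previous run
    except FileNotFoundError:
        pass
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)              # fails with EADDRINUSE if the file still exists
    os.chmod(path, 0o660)          # clients need write permission on the socket file
    server.listen(backlog)
    return server


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "app.sock")
    first = bind_unix(path)
    first.close()                  # simulates a crash: the file is left behind
    second = bind_unix(path)       # succeeds because the stale file is unlinked first
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    conn, _ = second.accept()
    client.sendall(b"ping")
    print(conn.recv(4))
    for s in (conn, client, second):
        s.close()
    os.unlink(path)
```

One caveat with this design: an unconditional unlink() will also remove the path of a server that is still running, so deployments that can start twice usually guard it with a lock file or a test connect() first.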
TCP sockets add network transparency but at a cost: across a real network, latency is 10–100x higher than local IPC, and you need to handle connection management, reconnection, and message framing yourself (length-prefix encoding is standard).
One real-world lesson: if you forget to set SO_RCVTIMEO or use a blocking accept, a single slow client can stall your entire server event loop. Use non-blocking I/O + epoll for any production socket server.
Another subtle issue: partial reads. TCP is a stream, not a message API. You might read only half of what was sent. Always loop and check return values. A common bug is assuming a single read() gets the full message. Build a simple framing layer (four bytes of length followed by the payload) and you'll save hours of debugging.
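The framing layer described above can be sketched as follows: a 4-byte big-endian length followed by the payload, with a loop that tolerates short reads. The function names send_msg, recv_msg, and recv_exact are hypothetical, and the demo uses socket.socketpair() in place of a real client and server.

```python
import socket
import struct

HEADER = struct.Struct("!I")   # 4-byte big-endian length prefix


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Loop until exactly n bytes arrive; a single recv may return fewer."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one framed message: length header, then payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_msg(sock: socket.socket) -> bytes:
    """Receive one framed message, however the stream was chunked."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_msg(a, b"first")
    send_msg(a, b"second")
    print(recv_msg(b), recv_msg(b))
    a.close()
    b.close()
```

A production version would also cap the accepted length so that a corrupt or hostile header cannot make the receiver allocate gigabytes.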
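Passing file descriptors via SCM_RIGHTS is a powerful feature: a process that accepts connections can hand the sockets to worker processes. The receiver does not need the same UID or root; it only needs to be connected to the Unix domain socket, and it receives a new descriptor that refers to the same open file. A single message can carry several descriptors in one SCM_RIGHTS control message, up to a kernel limit (253 on Linux), so size the ancillary buffer for the number you expect.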
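A minimal sketch of SCM_RIGHTS descriptor passing, using Python's sendmsg()/recvmsg() wrappers. The helper names pass_fds and take_fds are hypothetical, and both ends live in one process here purely to keep the example short; in practice the two ends of the socket belong to different processes.

```python
import array
import os
import socket


def pass_fds(sock: socket.socket, fds, msg: bytes = b"x") -> None:
    """Send msg plus the given descriptors in one SCM_RIGHTS control message."""
    anc = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    sock.sendmsg([msg], anc)


def take_fds(sock: socket.socket, max_fds: int, msglen: int = 16):
    """Receive a message and up to max_fds descriptors sent with it."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(
        msglen, socket.CMSG_LEN(max_fds * fds.itemsize)
    )
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Drop any trailing partial int before decoding.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return msg, list(fds)


if __name__ == "__main__":
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    pass_fds(left, [r, w])                 # two descriptors, one message
    _, (r2, w2) = take_fds(right, max_fds=2)
    os.write(w2, b"via passed fd")         # new numbers, same underlying pipe
    print(os.read(r2, 32))
    for fd in (r, w, r2, w2):
        os.close(fd)
    left.close()
    right.close()
```

The received descriptors are new numbers in the receiver's table, so the sender can close its own copies once the message has been sent.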
Performance-wise, Unix domain sockets are about 2–3x faster than TCP loopback for small messages (under 1KB). For bulk transfers, the gap narrows but still favours Unix sockets. Benchmark your own workload.
- socket() = get a phone device
- bind() = assign your phone number
- listen() = wait for incoming calls
- accept() = pick up the ring
- connect() = dial
- read/write = speak and listen
Shared Memory: Raw Speed with Manual Synchronisation
Shared memory is the fastest IPC mechanism because data moves directly between process address spaces without kernel copies. On Linux, you create a shared memory object with shm_open() and then mmap() it into each process's address space. Both processes see the same physical pages.
Once mapped, you can write and read from the shared region using pointer dereferencing — no system calls. That's nanosecond access. But it comes with a huge responsibility: race conditions are guaranteed if you don't use synchronisation. The typical approach is to embed a pthread_mutex_t (or a futex-based lock) directly in the shared memory region, placed at the start, and protect critical sections.
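As a rough illustration of the create, map, and attach steps, the sketch below uses Python's multiprocessing.shared_memory module, which wraps POSIX shared memory on Linux. The helper names and the 4096-byte size are assumptions, the second handle stands in for a second process, and no locking is shown because only one process touches the region here.

```python
import struct
from multiprocessing import shared_memory

COUNTER = struct.Struct("<Q")          # one uint64 at offset 0 (8-byte aligned)


def write_counter(buf, value: int) -> None:
    """Store a uint64 at the start of the shared region."""
    COUNTER.pack_into(buf, 0, value)


def read_counter(buf) -> int:
    """Load the uint64 stored at the start of the shared region."""
    (value,) = COUNTER.unpack_from(buf, 0)
    return value


if __name__ == "__main__":
    # Creator: roughly shm_open(O_CREAT) + ftruncate() + mmap().
    owner = shared_memory.SharedMemory(create=True, size=4096)
    write_counter(owner.buf, 42)

    # A second handle attached by name stands in for another process.
    peer = shared_memory.SharedMemory(name=owner.name)
    print(read_counter(peer.buf))

    peer.close()
    owner.close()
    owner.unlink()                     # like shm_unlink(): remove the name
```

With two real processes, every access to the counter would need to sit inside the critical section of a process-shared lock, as described above.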
A common production pattern: a producer writes a new version of a large data structure (e.g., a lookup table) and bumps a generation counter around the write; consumers read the counter before and after reading the data and retry if it was odd or changed in between. That's a sequence lock (seqlock). But even then, the compiler and the CPU can reorder memory accesses, so you need memory barriers (atomic_thread_fence).
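The generation-counter protocol can be sketched as below. This is an illustration of the read and retry logic only, under loud assumptions: a single writer, a hypothetical layout (uint64 counter at offset 0, two uint64 fields at offset 8), and Python, which has no memory fences. A real C or C++ implementation needs atomic loads and stores with acquire/release ordering around the same steps.

```python
import struct

SEQ = struct.Struct("<Q")       # generation counter at offset 0
VAL = struct.Struct("<QQ")      # two-field payload at offset 8


def seq_write(buf, a: int, b: int) -> None:
    """Single-writer update: odd counter means a write is in progress."""
    (seq,) = SEQ.unpack_from(buf, 0)
    SEQ.pack_into(buf, 0, seq + 1)      # odd: write in progress
    VAL.pack_into(buf, 8, a, b)
    SEQ.pack_into(buf, 0, seq + 2)      # even: write complete


def seq_read(buf, max_retries: int = 1000):
    """Return a consistent (a, b) snapshot, or None if retries run out."""
    for _ in range(max_retries):
        (before,) = SEQ.unpack_from(buf, 0)
        if before % 2:                  # writer is mid-update, retry
            continue
        payload = VAL.unpack_from(buf, 8)
        (after,) = SEQ.unpack_from(buf, 0)
        if before == after:             # nothing changed while we read
            return payload
    return None


if __name__ == "__main__":
    region = bytearray(24)              # stands in for a mapped shared region
    seq_write(region, 10, 20)
    print(seq_read(region))
```

Readers never block the writer in this scheme, which is why it suits data that is read far more often than it is written; the cost is that readers may spin while a write is in flight.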
The biggest mistake? Thinking that because you only write to your own partition, you don't need synchronisation. See the production incident above — torn reads happen.
Another subtle issue: cache line bouncing when two processes on different CPU cores repeatedly modify adjacent variables in the same cache line (false sharing). Pad your shared structures to 64-byte boundaries to avoid this.
One more trap: forgetting to initialise the mutex with the PTHREAD_PROCESS_SHARED attribute. A mutex with the default (process-private) attribute has undefined behaviour when another process locks it: it may appear to work, deadlock, or fail. Always check that the mutex is process-shared and initialised exactly once.
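Additionally, be aware of the size of the shared memory object. ftruncate() sets the size, and touching a mapped page beyond that size raises SIGBUS. Size the object with ftruncate() before mapping and never map more than you sized; MAP_POPULATE pre-faults pages to avoid page-fault latency on first access, but it does not protect against SIGBUS. In production, remember that POSIX shared memory is limited by the size of the /dev/shm tmpfs mount, while kernel.shmmax and kernel.shmall govern SysV segments.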
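For large-scale deployments, consider using huge pages (2MB or 1GB) to reduce TLB misses. MAP_HUGETLB is an mmap() flag rather than a shm_open() option: on Linux you get huge-page shared regions from a hugetlbfs-backed file, memfd_create() with MFD_HUGETLB, or shmget() with SHM_HUGETLB. This can significantly improve performance for large shared regions, but it requires kernel configuration and reserving huge pages.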
A mutex placed in shared memory must be initialised with the PTHREAD_PROCESS_SHARED attribute; otherwise the behaviour of pthread_mutex_lock is undefined when a process other than the initialiser tries to lock it. Do this exactly once: call pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) before pthread_mutex_init.
Message Queues: Structured Asynchronous Communication
Message queues (SysV msgget family or POSIX mq_open family) allow processes to send and receive discrete messages. Each message has a type or priority, and the kernel buffers them until the receiver picks them up. The sender doesn't block unless the queue is full.
POSIX message queues provide:
- named queues (/myqueue) accessible by unrelated processes
- prioritised delivery (messages with higher priority are received first)
- notification (via mq_notify with a signal or thread) when a message arrives
- asynchronous send with O_NONBLOCK
The queue has a fixed maximum number of messages and maximum message size, set at creation. These are capped by the kernel: fs.mqueue.msg_max and fs.mqueue.msgsize_max for POSIX queues, kernel.msgmax and kernel.msgmnb for SysV queues. If you need dynamic sizing, you must handle failures.
Message queues are ideal for event-driven systems where producers and consumers don't need to be alive at the same time. They decouple the sender and receiver in time (the kernel holds the message) and in space (the receiver can be any other process that opens the queue).
But they're slower than shared memory — each mq_send/mq_receive involves a system call and a kernel buffer copy. In high-throughput paths, shared memory + a condition variable beats message queues by orders of magnitude.
Be aware of starvation: with priority delivery, a steady stream of high-priority messages can keep low-priority ones waiting indefinitely. Monitor queue depths and set timeouts on both ends.
A production scenario: a logging service used a message queue between the app and the file writer. When the queue filled up, the app's mq_send blocked, which blocked the request handler. The entire site went down because the logger became the bottleneck. The fix: mq_timedsend with a timeout and fallback to dropping messages under backpressure.
Another subtlety: POSIX message queue names are limited to NAME_MAX (often 255) characters and must start with a slash. SysV queues use a numeric key. Prefer POSIX for portability and notification support.
Also note: message queues are not suitable for real-time systems with tight deadlines due to unpredictable kernel scheduling. The time between mq_send and mq_receive can vary by microseconds to milliseconds under load.
Instead of polling mq_receive in a loop, use mq_notify to get a signal or spawn a thread when a message arrives. This reduces CPU usage and latency. But beware: the notification is one-shot, so you must re-arm it after each receipt.
IPC Performance Tuning: Kernel Parameters That Actually Matter
The Linux kernel exposes dozens of sysctl knobs that control IPC behavior. Getting these wrong means your pipe buffer starves, shared memory allocations fail, or message queues block unnecessarily.
For pipes: fs.pipe-max-size (default 1 MB) caps the kernel buffer. Use fcntl(fd, F_SETPIPE_SZ, size) to increase per-pipe up to the max. For high-throughput pipelines, increase fs.pipe-max-size and set appropriate per-pipe sizes.
For shared memory: kernel.shmmax (default 32 MB on some distros) sets the max size of a single shared memory segment. kernel.shmall limits total pages. If your app needs a large shared region (say 16 GB), you must increase both. Also, kernel.shm_rmid_forced can help clean up orphaned segments after a crash.
For message queues: kernel.msgmnb (default 16 KB) caps total bytes in a queue. kernel.msgmax caps per-message size. kernel.msgmni caps number of queues system-wide. Monitor with ipcs -q -l.
For semaphores: kernel.sem sets semaphore limits. If you use System V semaphores for IPC sync, ensure SEMMSL isn't too low.
A common production trap: a team increased shmmax but forgot shmall, so shmget failed with ENOSPC (the error Linux returns when the system-wide SHMALL limit would be exceeded). Always set both. Also, changes via sysctl -w are ephemeral; persist them in /etc/sysctl.conf.
Here's a snippet that sets sane defaults for a database workload:
```bash
# Increase shared memory limits for Postgres-like workloads
sysctl -w kernel.shmmax=68719476736   # 64 GB
sysctl -w kernel.shmall=16777216      # 64 GB in pages (4K each)
sysctl -w kernel.msgmnb=65536         # 64 KB per queue
sysctl -w fs.pipe-max-size=4194304    # 4 MB per pipe
```
After editing /etc/sysctl.conf, reload it with sysctl -p (a reboot also applies it).
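Because shmall is expressed in pages while shmmax is in bytes, the pairing is easy to get wrong. The small helper below (hypothetical name shmall_pages) does the conversion; it assumes you want shmall to cover exactly one segment of shmmax bytes, and it reads the page size from the running system unless one is given.

```python
import os
from typing import Optional


def shmall_pages(shmmax_bytes: int, page_size: Optional[int] = None) -> int:
    """Smallest kernel.shmall (in pages) that can hold shmmax_bytes."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")   # typically 4096 on x86-64
    return -(-shmmax_bytes // page_size)         # ceiling division


if __name__ == "__main__":
    # 64 GB with 4 KB pages, matching the sysctl values above.
    print(shmall_pages(64 * 2**30, page_size=4096))   # 16777216
```

If several large segments must coexist, shmall has to cover their sum, not just the largest one.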
kernel.shmall must be at least kernel.shmmax / PAGE_SIZE. If you increase shmmax without adjusting shmall, large allocations still fail even though the per-segment limit looks sufficient. Always check with ipcs -lm after changes.
- shmmax and shmall must be set together.
- Increase pipe-max-size and set per-pipe sizes via fcntl. Don't rely on the default 1 MB cap.
- pipe-max-size and msgmnb are the most commonly misconfigured.
Real-World IPC: How PostgreSQL and Redis Use Shared Memory
Two of the most popular databases rely on shared memory for performance-critical paths.
PostgreSQL uses shared memory for its shared buffer pool, WAL buffers, and lock tables. The shared_buffers parameter sizes the main shared region, which is a SysV segment on older releases and, since PostgreSQL 9.3, an anonymous mmap() region by default with only a small SysV segment. Where the region is allocated as a SysV segment, setting kernel.shmmax or kernel.shmall smaller than shared_buffers causes PostgreSQL to fail to start with an IPC error. A common fix is to increase kernel parameters or reduce shared_buffers.
Redis is the counter-example: it does not use shared memory for its persistence model. It executes commands on a single thread and takes snapshots by calling fork() and relying on copy-on-write. The bigger point: Redis uses socket IPC for replication and networking.
But the real shared-memory workhorse is PostgreSQL. A production story: a PostgreSQL deployment had shared_buffers set to 8 GB but kernel.shmmax was only 1 GB. The server refused to start. The fix was to increase kernel.shmmax to 12 GB. Always check ipcs -lm after changes.
Another example: Apache HTTP uses shared memory for the scoreboard to track worker processes. Nginx uses shared memory for rate limiting and load balancing across workers. Both depend on correct kernel tuning.
For custom applications, a ring buffer in shared memory is a common pattern for high-throughput logging or metrics collection. The key is to use a lock-free design with atomic sequence numbers and enough padding to avoid false sharing.
In production, always monitor shared memory usage with ipcs -m and ensure your monitoring alerts on segment allocation failures. A sudden spike in shared memory usage can indicate a bug that leaks segments — set up a job to clean up orphaned segments on startup.
PostgreSQL fails to start if kernel.shmmax is smaller than shared_buffers. The error message usually says 'invalid value for parameter shared_buffers' or 'could not create shared memory segment'. Increase kernel parameters as needed.
The Shared Memory Race That Brought Down a Trading Engine
- Shared memory is not safe without synchronisation, even with 'non-overlapping' writes
- Assume all data races are possible: the compiler and CPU can reorder operations
- Always align shared variables to their type size
- Use proper sync primitives — lock-free algorithms require proof, not hope
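The alignment rule can be checked with simple arithmetic. The sketch below assumes 64-byte cache lines (the figure this article uses for padding) and uses hypothetical helper names; it shows why a uint64 at offset 1022 is a problem and where it should go instead.

```python
CACHE_LINE = 64   # assumed cache-line size in bytes


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def straddles(offset: int, size: int, boundary: int = CACHE_LINE) -> bool:
    """True if [offset, offset + size) crosses a boundary-sized block edge."""
    return offset // boundary != (offset + size - 1) // boundary


if __name__ == "__main__":
    print(straddles(1022, 8))     # True: bytes 1022..1029 cross the line at 1024
    print(align_up(1022, 8))      # 1024: next 8-byte aligned offset
    print(straddles(1024, 8))     # False: the value sits inside one cache line
```

A value that spans two cache lines cannot be read or written in one atomic access on common hardware, so a reader may observe half of the old value and half of the new one.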
- Pipes: run lsof -c <process> to see open file descriptors. Ensure the reader hasn't crashed or exited early.
- Sockets: run strace -e trace=connect,recvfrom,sendto -p <PID> to see connection state. Use ss -tlnp to verify the listen backlog and port. Check firewall rules with iptables -L.
- Shared memory: 1. Run under valgrind --tool=helgrind. 2. Use cat /proc/<PID>/maps to confirm the shared memory region is mapped with correct permissions. 3. Dump raw bytes with dd if=/dev/shm/<name> bs=1 count=<size> | xxd to inspect corruption.
- Message queues: check ipcs -q. Compare message count to msg_qbytes. Use strace -e trace=msgsnd to see if it's blocking on a full queue. Increase the kernel parameter kernel.msgmnb or add the non-blocking flag.
- Unix socket files: check ls -l /path/to/socket. Remove stale files with rm /path/to/socket. Ensure your application unlinks the socket on shutdown. Use lsof /path/to/socket to see if another process still has it open.
Key takeaways
- Tune kernel parameters (shmmax, pipe-max-size, msgmnb) for your workload
Common mistakes to avoid
- Using pipes for bidirectional communication
- Forgetting to unlink Unix socket files before bind
- Not aligning shared memory variables to their type size
- Assuming message queue send never blocks. Use mq_timedsend() with a timeout, or set O_NONBLOCK. Monitor queue depth with ipcs -q.
- Initialising a mutex in shared memory without PTHREAD_PROCESS_SHARED
Interview Questions on This Topic
Compare pipes and Unix domain sockets for local IPC. When would you use one over the other?