Advanced 14 min · March 06, 2026

Inter-Process Communication

IPC — Misaligned uint64 Torn Read in Shared Memory

Q: Can pipes be used for unrelated processes?

Yes, but only named pipes (FIFOs). Created with `mkfifo`, they appear as files on the filesystem. Any process with the right permissions can open the FIFO for reading or writing. However, FIFOs are still unidirectional. For bidirectional communication between unrelated processes, use Unix domain sockets.

Q: Why is shared memory faster than pipes?

Shared memory involves zero kernel copies once the mapping is established – data is read and written directly via memory access. Pipes, sockets, and message queues all copy data through kernel buffers (at least two copies: user→kernel and kernel→user). That's why shared memory can achieve sub-microsecond latency while other mechanisms are in the microsecond range.

Q: What happens if a message queue fills up?

By default, `mq_send` blocks until space becomes available. This can cause cascading failures if the sender is on a critical path. The fix is to use `O_NONBLOCK` or `mq_timedsend()` with a timeout. You can also create the queue with a larger `mq_maxmsg` (capped by `fs.mqueue.msg_max`; `kernel.msgmnb` is the SysV equivalent), but that only delays the problem. You still need a backpressure mechanism.

Q: How do I check the current size of my pipe buffer?

Use `fcntl(fd, F_GETPIPE_SZ)` to get the capacity in bytes for a specific pipe. The system-wide maximum is in `/proc/sys/fs/pipe-max-size`. Remember that the kernel buffer is a shared resource – if a writer fills it, it blocks until the reader drains data.

Q: Is it safe to share memory between processes without any locks if each process only writes to its own region?

No. Even with partitioned writes, reads from another process's region can see torn values if the write is misaligned. Additionally, the compiler and CPU can reorder memory operations. The only safe way is to use proper synchronisation – mutexes, atomic operations with memory ordering, or sequence locks. The production incident in this article shows exactly how that fails.

Q: What is splice() and when should I use it for pipes?

splice() is a Linux syscall that moves data between two file descriptors without copying through userspace. It's useful for zero-copy forwarding from a pipe to a socket or vice versa. Use it in high-throughput pipelines where you don't need to inspect the data. One end must be a pipe. It can reduce CPU usage significantly but adds complexity.

Q: How do I monitor shared memory usage in production?

Use `ipcs -m` to list SysV shared memory segments, and check `cat /proc/sysvipc/shm` for detailed per-segment info. POSIX shared memory objects do not appear there; on Linux they show up as files under `/dev/shm`. Monitor `kernel.shmmax` and `kernel.shmall` via sysctl. Set up alerts for when shared memory usage approaches kernel limits. Also, use `smem` or `pmap` to see which processes are using shared memory.

A misaligned uint64 write at offset 1022 caused a torn read crash.

Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Written from production experience, not tutorials.


July 18, 2026

last updated


Before you start⏱ 30 min

✓Deep production experience
✓Understanding of internals and trade-offs
✓Experience debugging complex systems


⚡Quick Answer

IPC lets isolated processes exchange data through kernel-managed channels
Pipes: unidirectional byte stream, parent-child or named (FIFO) for unrelated procs
Sockets: bidirectional, works across processes on same host (AF_UNIX) or network (TCP)
Shared memory: fastest (70–150 ns access) — zero kernel copy — but requires explicit sync
Message queues: async, kernel-buffered, priority delivery — slower but decoupled in time
Production risk: shared memory races corrupt data silently — never skip mutexes
Biggest mistake: assuming all IPC is interchangeable — latency varies from 70ns to 50µs

✦ Definition~90s read

What is Inter-Process Communication?

Inter-Process Communication is a core concept in OS design. The kernel keeps processes isolated — each in its own virtual address space. IPC bridges that gap, providing controlled channels for data exchange. There are four classic mechanisms: pipes, sockets, shared memory, and message queues.


Think of IPC as a spectrum. Pipes are simple byte streams for parent-child processes. Sockets add network transparency. Shared memory delivers raw speed (70–150 ns access) but demands manual synchronisation. Message queues give structured, async delivery with kernel buffering.

Here's the rule that matters: you rarely choose IPC directly — your libraries decide. PostgreSQL uses shared memory for its buffer pool. Get the size wrong and you'll either OOM or hit terrible disk reads. Profile IPC overhead under your actual workload, not synthetic benchmarks.

Most engineers treat IPC as one-dimensional: faster is better. That's wrong. The real question is what failure mode you can tolerate. Shared memory gives speed but a single lock mistake corrupts all data. Message queues add latency but isolate failures. Choose based on your outage cost, not a microbenchmark.

A common misconception: IPC mechanisms are drop-in replacements. They're not. Switch from shared memory to a message queue in a latency-critical path and you'll see a 100x slowdown. Always validate with your workload.

Another nuance: IPC performance depends on the kernel version and CPU architecture. On a NUMA system, accessing a shared memory page allocated on a remote NUMA node adds 2–3x latency. Use numactl to pin processes and memory allocations to the same node. Ignoring topology is one of the most common causes of inconsistent IPC performance in production.

Plain-English First

Imagine two chefs working in the same restaurant kitchen. They can't read each other's minds, so they need ways to pass information — one chef shouts across the kitchen (like a pipe), another writes on a shared whiteboard (like shared memory), and a third drops orders into a ticket queue (like a message queue). IPC is exactly that: the set of rules and tools the OS provides so that separate running programs can talk to each other, coordinate work, and share data without crashing into each other.

The OS keeps every process in its own virtual address space — that sandbox is why a crash in one tab doesn't bring down your whole browser. But it creates a hard problem: how do two isolated processes cooperate?

IPC solves exactly this. The kernel provides controlled channels — pipes, sockets, shared memory, message queues — that let data flow between address spaces safely. Each makes a different trade-off between speed, complexity, and what breaks under load.

Here's what most guides skip: the wrong IPC choice can tank your system's throughput by 100x or introduce data corruption that only reproduces on multi-socket machines. This article covers not just how each mechanism works but what fails in production and how to fix it.


What is Inter-Process Communication?

io_thecodeforge_ipc_overview.c (C)

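#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    if (pipe(fd) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); return 1; }

    if (pid == 0) {
        // Child: close the unused read end, then write the message
        close(fd[0]);
        const char *msg = "io.thecodeforge says: IPC works";
        if (write(fd[1], msg, strlen(msg)) == -1) perror("write");
        close(fd[1]);
        return 0;
    }

    // Parent: close the unused write end, then read what the child sent
    close(fd[1]);
    char buf[64];
    ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
    if (n == -1) { perror("read"); return 1; }
    buf[n] = '\0';  // read() does not NUL-terminate
    printf("Child said: %s\n", buf);
    close(fd[0]);
    wait(NULL);
    return 0;
}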

🔥Nuance: NUMA Matters

On multi-socket machines, shared memory allocated on one NUMA node can be slower when accessed from another. Use numactl --membind to keep memory local to the processes that use it.

📊 Production Insight

Your libraries choose IPC for you, not your config files.

PostgreSQL uses shared memory for its buffer pool — misconfigure it and you get OOM or terrible disk reads.

Rule: profile IPC overhead under load; 10µs pipe latency is fine for logs, deadly for a trading engine.

🎯 Key Takeaway

IPC isolates processes safely but forces data through kernel-managed channels.

Latency: shared memory < 1µs, pipes ~10µs, sockets ~50µs (local) to ms (remote).

Choose based on latency budget, data size, and whether processes are on the same host.

IPC Mechanism Decision Tree

If processes are parent-child and need a simple unidirectional byte stream → use an unnamed pipe (pipe()). Zero config, lowest overhead.

If processes are unrelated, on the same host, and need a bidirectional byte stream → use a Unix domain socket (AF_UNIX). socketpair() is a convenient shortcut, but only for related processes.

If processes need high throughput and low latency and share a large data set → use shared memory (shm_open + mmap). Synchronisation is mandatory.

If processes are unrelated and need async message delivery with kernel buffering → use POSIX message queues (mq_open). Better than SysV; supports notification.

If processes are on different machines → use TCP sockets (AF_INET). Add message framing (length prefix), because TCP does not preserve message boundaries.

thecodeforge.io

Inter Process Communication

Pipes are the oldest and simplest IPC mechanism. An unnamed pipe created with pipe() gives two file descriptors: the write end and the read end. Data written to the write end is buffered in the kernel and read sequentially from the read end.

Pipes are unidirectional. For bidirectional communication, create two pipes or switch to a socketpair. The kernel buffer is finite: on Linux a new pipe holds 64 KiB by default, and fcntl(fd, F_SETPIPE_SZ, size) can resize it up to the limit in /proc/sys/fs/pipe-max-size (1 MiB by default for unprivileged processes). If the buffer fills, write() blocks until the reader drains data.
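To make the buffer limits above concrete, here is a minimal Linux-only sketch that queries and resizes a pipe's capacity. The helper names `pipe_capacity` and `pipe_resize` are made up for illustration; only the `fcntl` commands come from the kernel API.

```c
#define _GNU_SOURCE   /* exposes F_GETPIPE_SZ / F_SETPIPE_SZ on glibc */
#include <fcntl.h>
#include <unistd.h>

/* Returns the pipe's current capacity in bytes, or -1 on error.
 * (Hypothetical helper name, not part of any library.) */
int pipe_capacity(int fd) {
    return fcntl(fd, F_GETPIPE_SZ);
}

/* Asks the kernel to resize the pipe. Returns the capacity actually
 * granted (the kernel may round the request up), or -1 on error, for
 * example when the request exceeds fs.pipe-max-size. */
int pipe_resize(int fd, int requested_bytes) {
    return fcntl(fd, F_SETPIPE_SZ, requested_bytes);
}
```

Either end of the pipe can be passed in, since both descriptors refer to the same kernel buffer. Checking the return value of `pipe_resize` matters: the granted size can differ from the requested one.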

Named pipes (FIFOs) use a filesystem path and allow unrelated processes to communicate. Created with mkfifo(). The same read/write semantics apply, but now any process with permissions can open the file.
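As a rough illustration of the FIFO workflow, the sketch below creates a named pipe, forks a writer, and reads the message back in the parent. The function name `fifo_roundtrip` and the path passed to it are assumptions for the example; in a real deployment the two sides would be unrelated programs that simply agree on the path.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a FIFO at `path`, forks a writer that sends `msg`, and reads it
 * back into `out` (always NUL-terminated). Returns bytes read, or -1.
 * (Hypothetical helper for illustration.) */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *out, size_t out_len) {
    if (out_len == 0) return -1;
    unlink(path);                        /* drop a stale FIFO from an earlier run */
    if (mkfifo(path, 0600) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) { unlink(path); return -1; }

    if (pid == 0) {
        /* Writer: open() blocks until a reader opens the other end. */
        int wfd = open(path, O_WRONLY);
        if (wfd == -1) _exit(1);
        ssize_t w = write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(w == -1 ? 1 : 0);
    }

    /* Reader: open() blocks until the writer has opened its end. */
    ssize_t n = -1;
    int rfd = open(path, O_RDONLY);
    if (rfd != -1) {
        n = read(rfd, out, out_len - 1);
        close(rfd);
    } else {
        kill(pid, SIGKILL);              /* do not leave the writer blocked */
    }
    out[n > 0 ? n : 0] = '\0';
    waitpid(pid, NULL, 0);
    unlink(path);                        /* FIFOs persist until removed */
    return n;
}
```

Note the two blocking `open()` calls: each side waits for the other, which is the usual source of "my FIFO program hangs" reports. The final `unlink()` is the cleanup step that unnamed pipes never need.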

Pipes are great for one-shot data flows like grep | sort or for passing small control messages between a parent and its children. Don't force them into high-throughput scenarios — the kernel copy and scheduling overhead adds up.

One production gotcha: if the reader closes its end before the writer finishes, the writer receives SIGPIPE (termination) or gets EPIPE error on write if SIGPIPE is ignored. Always handle EPIPE in your writer loop.
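One way to handle this in a writer loop is sketched below. It assumes the process ignores SIGPIPE so that a vanished reader surfaces as an EPIPE error rather than a signal; `ignore_sigpipe` and `write_all` are illustrative names, not library functions.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Call once at startup so a vanished reader shows up as EPIPE from
 * write() instead of terminating the process with SIGPIPE. */
void ignore_sigpipe(void) {
    signal(SIGPIPE, SIG_IGN);
}

/* Writes all `len` bytes, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error with errno set
 * (errno == EPIPE means the reader is gone). */
int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            return -1;                      /* EPIPE, EAGAIN, ...: caller decides */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

On EPIPE the caller would typically close its end and either reconnect or shut down cleanly. Retrying the same write cannot succeed once every reader has closed.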

A real-world story: a log aggregation pipeline used pipes between a producer and a consumer. The consumer was slow due to disk I/O, and the producer would block on write, stalling the entire logging thread. The fix was to set O_NONBLOCK on the write end and buffer in userspace — but only after losing a day of logs. Always test your pipe throughput against your worst-case burst rate.

Another subtle issue: the pipe buffer size is shared across all writers. If you have multiple writers, a single slow reader can cause all of them to block. Consider using a separate pipe per writer or a socket with a larger buffer.

In high-throughput scenarios, consider using splice() to move data between pipes and sockets without copying to userspace. It's a syscall but can reduce CPU overhead significantly when chaining pipes. However, splice requires one end to be a pipe. That's a niche optimisation but worth knowing.

io_thecodeforge_pipe_example.c (C)

#include <unistd.h>
#include <stdio.h>
#include <sys/wait.h>
#include <string.h>

int main(void) {
    int fd[2];
    if (pipe(fd) == -1) {
        perror("pipe");
        return 1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        // Child: write the message, including its terminating NUL
        close(fd[0]);
        const char *msg = "io.thecodeforge says: data through pipe";
        if (write(fd[1], msg, strlen(msg) + 1) == -1)
            perror("write");
        close(fd[1]);
        return 0;
    }

    // Parent: read
    close(fd[1]);
    char buf[64];
    ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
    if (n == -1) {
        perror("read");
        return 1;
    }
    buf[n] = '\0';  // guarantee termination even on a short read
    printf("Received: %s\n", buf);
    close(fd[0]);
    wait(NULL);
    return 0;
}

⚠ Pipe Gotcha: Writing After the Reader Closes

If the reader closes its end before the writer finishes, the writer receives SIGPIPE (termination) or gets EPIPE error on write if SIGPIPE is ignored. Always handle EPIPE in your writer loop.

📊 Production Insight

Pipes have no user-space buffering: data sits in a fixed-size kernel buffer, so a slow reader applies backpressure directly to the writer.

In a log-aggregation pipeline, a slow reader caused the writer to block, stalling all logging and cascading into a full system freeze.

Fix: set O_NONBLOCK on the write end and log to a separate thread with a user-space buffer.

For high-throughput, splice() avoids user-space copies but requires a pipe as one end.

🎯 Key Takeaway

Pipes are simple, zero-config, but limited to parent-child or unrelated via FIFO.

Blocking write can kill throughput if reader is slow.

Use socketpair() for bidirectional traffic between related processes on the same host: it behaves like a two-way pipe and returns two connected file descriptors.

Pipes Decision Tree

If you need a one-way byte stream between parent and child → use an unnamed pipe (pipe()). Simple, no filesystem artifacts.

If you need bidirectional communication and the processes are related → use socketpair(AF_UNIX, SOCK_STREAM) and avoid managing two pipes.

If you need a one-way stream between unrelated processes → use a named pipe (FIFO) via mkfifo(). Must handle permissions and cleanup.

If you have multiple writers and a single slow reader → switch to a Unix socket or add per-writer pipes. O_NONBLOCK with a user-space buffer can help.

Sockets: Network-Transparent Bidirectional Communication

Sockets are the most versatile IPC mechanism. They can communicate within a single host using Unix domain sockets (AF_UNIX) or across a network using TCP (AF_INET). Unix domain sockets are file-based and support both stream (SOCK_STREAM) and datagram (SOCK_DGRAM) semantics.

For local IPC, Unix domain sockets are the gold standard: they support bidirectional byte streams, have lower latency than TCP (no loopback interface), and integrate well with select/poll/epoll. They also support passing file descriptors between processes via sendmsg() — extremely useful for delegating server sockets to workers.

A common pattern: a server creates a Unix domain socket at /tmp/app.sock, accepts connections from multiple clients, and services them using a thread pool or event loop. The biggest pitfall is forgetting to unlink() the socket file before bind(), or getting permissions wrong (the socket file must be writable by the client).

TCP sockets add network transparency but at a cost: latency is 10–100x higher, and you need to handle connection management, reconnection, and message framing yourself (length-prefix encoding is standard).

One real-world lesson: if you forget to set SO_RCVTIMEO or use a blocking accept, a single slow client can stall your entire server event loop. Use non-blocking I/O + epoll for any production socket server.

Another subtle issue: partial reads. TCP is a stream, not a message API. You might read only half of what was sent. Always loop and check return values. A common bug is assuming a single read() gets the full message. Build a simple framing layer — four bytes of length followed by the payload — and you'll save hours of debugging.
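The framing layer described above can be sketched in a few dozen lines. This is one possible shape, assuming a 4-byte big-endian length prefix; `send_frame` and `recv_frame` are hypothetical names, and a production version would also want timeouts and a sanity limit on frame size.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Reads exactly `len` bytes. Returns 0 on success, -1 on error or early EOF. */
static int read_full(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0) return -1;                    /* peer closed mid-message */
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Writes exactly `len` bytes. Returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends one frame: 4-byte big-endian length, then the payload. */
int send_frame(int fd, const void *payload, uint32_t len) {
    uint32_t be_len = htonl(len);
    if (write_full(fd, &be_len, sizeof be_len) == -1) return -1;
    return write_full(fd, payload, len);
}

/* Receives one frame into `buf` (capacity `cap`). Returns the payload
 * length, or -1 on error, EOF, or a frame larger than `cap`. */
ssize_t recv_frame(int fd, void *buf, size_t cap) {
    uint32_t be_len;
    if (read_full(fd, &be_len, sizeof be_len) == -1) return -1;
    uint32_t len = ntohl(be_len);
    if (len > cap) return -1;          /* never trust a length from the wire */
    if (read_full(fd, buf, len) == -1) return -1;
    return (ssize_t)len;
}
```

The same two functions work unchanged over TCP sockets, Unix stream sockets, and pipes, because all three are byte streams with no message boundaries of their own.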

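Passing file descriptors via SCM_RIGHTS is a powerful feature. Multi-process servers use it to hand listening or accepted sockets from one process to another. No special privilege is needed beyond being able to send on the socket: the receiver gets a new descriptor that refers to the same open file description, much as if it had been dup()'d across the process boundary. A single message can carry several descriptors (Linux allows up to 253), but on a stream socket each message must also carry at least one byte of ordinary data, and the receiver must supply a control buffer that is large enough, or the descriptors that do not fit are closed.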
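For readers who have not used ancillary data before, here is a minimal sketch of passing a single descriptor over a connected Unix domain socket. The names `send_fd` and `recv_fd` are illustrative; the one-byte payload is included because stream sockets need some ordinary data to carry the control message.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends `fd_to_send` over the connected Unix socket `sock`. Returns 0 or -1. */
int send_fd(int sock, int fd_to_send) {
    char dummy = 'F';                          /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    union {                                    /* keeps the buffer aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one descriptor from `sock`. Returns the new fd, or -1. */
int recv_fd(int sock) {
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) <= 0) return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;                             /* no descriptor attached */

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

The number returned by `recv_fd` will usually differ from the sender's, since the kernel installs the descriptor in the receiver's own table. The sender can close its copy once `send_fd` returns.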

Performance-wise, Unix domain sockets are about 2–3x faster than TCP loopback for small messages (under 1KB). For bulk transfers, the gap narrows but still favours Unix sockets. Benchmark your own workload.

io_thecodeforge_unix_domain_socket_server.c (C)

#include <sys/socket.h>
#include <sys/un.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    const char *path = "/tmp/io_thecodeforge.sock";
    unlink(path);  // remove a stale socket file from a previous run

    int sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock == -1) { perror("socket"); return 1; }

    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("bind"); return 1;
    }
    if (listen(sock, 5) == -1) { perror("listen"); return 1; }

    int client = accept(sock, NULL, NULL);
    if (client == -1) { perror("accept"); return 1; }

    // A single read() may return only part of a message; fine for a demo
    char buf[128];
    ssize_t n = read(client, buf, sizeof(buf) - 1);
    if (n == -1) { perror("read"); return 1; }
    buf[n] = '\0';
    printf("Server got: %s\n", buf);

    const char *reply = "Welcome from io.thecodeforge";
    if (write(client, reply, strlen(reply) + 1) == -1) perror("write");

    close(client); close(sock); unlink(path);
    return 0;
}

Mental Model

Socket Mental Model: Phone System

Think of a socket like a phone call: you dial an address, wait for the other side to answer, then talk both ways.

socket() = get a phone device
bind() = assign your phone number
listen() = switch the ringer on so incoming calls can queue
accept() = pick up when it rings
connect() = dial
read/write = speak and listen

📊 Production Insight

Unix domain sockets are 2–3x faster than loopback TCP for small messages.

But if you forget to set SO_RCVTIMEO or use a blocking accept, a single slow client can stall your entire server event loop.

Rule: use non-blocking I/O + epoll for any production socket server.

Partial reads can happen on any call, so always loop. Build a framing layer (length prefix) to avoid app-level corruption.

🎯 Key Takeaway

Unix sockets: bidirectional, local, low-latency, can pass FDs.

TCP sockets: network-enabled, higher overhead, need message framing.

Always set non-blocking and handle partial reads/writes — TCP is a stream, not a message API.

Socket Decision Tree

If same host, bidirectional, need FD passing → use a Unix stream socket (AF_UNIX, SOCK_STREAM). Best latency, can pass FDs.

If same host, need datagrams (message boundaries preserved; reliable and ordered on Linux) → use a Unix datagram socket (AF_UNIX, SOCK_DGRAM). No framing layer needed, since each datagram is one message.

If cross-network, need a reliable stream → use a TCP socket (AF_INET, SOCK_STREAM). Must handle partial reads, reconnection, and framing.

If cross-network, need low latency and can tolerate loss → use a UDP socket (AF_INET, SOCK_DGRAM). Build your own reliability if needed.


Shared Memory: Raw Speed with Manual Synchronisation

Shared memory is the fastest IPC mechanism because data moves directly between process address spaces without kernel copies. On Linux, you create a shared memory object with shm_open() and then mmap() it into each process's address space. Both processes see the same physical pages.

Once mapped, you can write and read from the shared region using pointer dereferencing — no system calls. That's nanosecond access. But it comes with a huge responsibility: race conditions are guaranteed if you don't use synchronisation. The typical approach is to embed a pthread_mutex_t (or a futex-based lock) directly in the shared memory region, placed at the start, and protect critical sections.

A common production pattern: a producer writes a new version of a large data structure (e.g., a lookup table), then atomically increments a generation counter that consumers check before reading. That's a form of sequence lock (seqlock). But even then, the compiler can reorder memory accesses — you need memory barriers (atomic_thread_fence).
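A single-writer seqlock along these lines might look like the sketch below, which follows the commonly cited C11 formulation. The `shared_quote` layout and function names are invented for the example. The payload fields are themselves relaxed atomics so that a reader racing with the writer is not a data race in the language sense.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical record living at the start of the shared mapping.
 * Zero-filled memory (as a fresh mapping is) is a valid initial state. */
typedef struct {
    atomic_uint seq;             /* even = stable, odd = write in progress */
    _Atomic uint64_t price;
    _Atomic uint64_t volume;
} shared_quote;

/* Writer side. Assumes exactly ONE writer; multiple writers would need
 * a lock around this function. */
void quote_write(shared_quote *q, uint64_t price, uint64_t volume) {
    unsigned s = atomic_load_explicit(&q->seq, memory_order_relaxed);
    atomic_store_explicit(&q->seq, s + 1, memory_order_relaxed);  /* odd */
    atomic_thread_fence(memory_order_release);   /* seq bump before data */
    atomic_store_explicit(&q->price, price, memory_order_relaxed);
    atomic_store_explicit(&q->volume, volume, memory_order_relaxed);
    atomic_store_explicit(&q->seq, s + 2, memory_order_release);  /* even */
}

/* Reader side: retries until it sees the same even sequence number
 * before and after copying the payload. */
void quote_read(shared_quote *q, uint64_t *price, uint64_t *volume) {
    unsigned s1, s2;
    do {
        s1 = atomic_load_explicit(&q->seq, memory_order_acquire);
        *price = atomic_load_explicit(&q->price, memory_order_relaxed);
        *volume = atomic_load_explicit(&q->volume, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* data before re-check */
        s2 = atomic_load_explicit(&q->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);
}
```

Readers never block the writer, which is the point of the pattern, but they can spin if updates are very frequent. The fields must be naturally aligned for the atomics to be lock-free, which matters once the struct sits at an arbitrary offset inside a mapping.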

The biggest mistake? Thinking that because you only write to your own partition, you don't need synchronisation. See the production incident above — torn reads happen.
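The arithmetic behind the offset-1022 incident is easy to check in code. The helpers below are illustrative and assume 64-byte cache lines, which is typical for x86-64 but not universal.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: 64-byte lines, typical for x86-64 */

/* True when a field of `size` bytes at `offset` sits on its natural boundary. */
bool is_naturally_aligned(size_t offset, size_t size) {
    return offset % size == 0;
}

/* True when the bytes [offset, offset + size) span two cache lines,
 * which is where a plain 8-byte store can be observed half-written. */
bool straddles_cache_line(size_t offset, size_t size) {
    return offset / CACHE_LINE != (offset + size - 1) / CACHE_LINE;
}

/* Rounds `offset` up to the next multiple of `align` (a power of two). */
size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}
```

For an 8-byte value at offset 1022, `is_naturally_aligned` is false and `straddles_cache_line` is true, because the value covers bytes 1022 to 1029 and the line boundary falls at 1024. `align_up(1022, 8)` gives 1024, the first safe offset. Running every field offset of a shared layout through checks like these at startup is a cheap guard.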

Another subtle issue: cache line bouncing when two processes on different CPU cores repeatedly modify adjacent variables in the same cache line (false sharing). Pad your shared structures to 64-byte boundaries to avoid this.
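A padded layout can be expressed directly with C11 alignment specifiers. The struct below is a made-up example of two counters, one written by a producer and one by a consumer, again assuming 64-byte cache lines.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: 64-byte lines, typical for x86-64 */

/* Hypothetical pair of counters for a shared ring buffer. Each member is
 * forced onto its own cache line so the producer's stores never
 * invalidate the line the consumer is writing, and vice versa. */
typedef struct {
    alignas(CACHE_LINE) _Atomic uint64_t produced;   /* producer writes */
    alignas(CACHE_LINE) _Atomic uint64_t consumed;   /* consumer writes */
} ring_counters;

/* Compile-time checks that the layout really is what we intended. */
_Static_assert(sizeof(ring_counters) == 2 * CACHE_LINE,
               "each counter should occupy its own cache line");
_Static_assert(offsetof(ring_counters, consumed) == CACHE_LINE,
               "consumed must start on the second cache line");
```

The struct itself must also be placed at a 64-byte-aligned offset inside the mapping. `mmap()` returns page-aligned memory, so putting it at offset 0 or any multiple of 64 satisfies that.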

One more trap: the mutex must be initialised with the PTHREAD_PROCESS_SHARED attribute. If you forget, locking it from any process other than the initialiser is undefined behaviour: it may appear to work, fail with an error, or deadlock under contention. Always check that the mutex is process-shared and initialised exactly once.
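The initialisation sequence is short. The sketch below wraps it in a hypothetical helper, `init_shared_mutex`, intended to be called once by whichever process creates the shared region.

```c
#include <pthread.h>

/* Initialises a mutex that lives in memory shared between processes.
 * Call exactly once, from the creating process, before any other process
 * touches the mutex. Returns 0 on success or an errno-style code.
 * (Hypothetical helper name for illustration.) */
int init_shared_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;

    /* Without this attribute the mutex is only valid inside one process. */
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);   /* the mutex keeps its own copy */
    return rc;
}
```

In practice `m` would point into the mapped region, for example at offset 0. Deciding which process performs the one-time initialisation is a separate problem. A common approach is to let the process that successfully creates the object with `O_CREAT | O_EXCL` do it.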

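Additionally, be aware of the size of the shared memory object. ftruncate() sets the size, and touching a mapped page beyond that size raises SIGBUS. Size the object before mapping it and never map more than you sized; MAP_POPULATE pre-faults the pages so the first access does not pay page-fault latency. On Linux, POSIX shared memory lives in the tmpfs mounted at /dev/shm, so the practical limit is that filesystem's size; kernel.shmmax applies to SysV segments created with shmget().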

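For large-scale deployments, consider using huge pages (2MB or 1GB) to reduce TLB misses, which can significantly improve performance for large shared regions. Note that MAP_HUGETLB is an mmap() flag for anonymous mappings; for named shared regions, use a file on a hugetlbfs mount or SysV shmget() with SHM_HUGETLB. Either way it requires kernel configuration and reserving huge pages in advance.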

io_thecodeforge_shared_memory_producer.c (C)

#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <pthread.h>

#define SHM_NAME "/io_thecodeforge_shm"
#define SHM_SIZE 4096

int main(void) {
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    if (fd == -1) { perror("shm_open"); return 1; }
    if (ftruncate(fd, SHM_SIZE) == -1) { perror("ftruncate"); return 1; }

    void *ptr = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (ptr == MAP_FAILED) { perror("mmap"); return 1; }

    // mutex must be initialised elsewhere with PTHREAD_PROCESS_SHARED
    pthread_mutex_t *mtx = (pthread_mutex_t *)ptr;
    if (pthread_mutex_lock(mtx) != 0) {
        fprintf(stderr, "pthread_mutex_lock failed\n");
        return 1;
    }

    // Payload starts right after the mutex
    char *data = (char *)ptr + sizeof(pthread_mutex_t);
    const char *msg = "io.thecodeforge shared memory payload";
    memcpy(data, msg, strlen(msg) + 1);

    pthread_mutex_unlock(mtx);
    munmap(ptr, SHM_SIZE);
    return 0;
}

⚠ Shared Memory Initialisation Trap

The mutex inside shared memory must be initialised with the PTHREAD_PROCESS_SHARED attribute. Otherwise, calling pthread_mutex_lock from a process other than the initialiser is undefined behaviour: it may seem to work, return an error, or deadlock. Do this exactly once: call pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) before pthread_mutex_init.

📊 Production Insight

Shared memory is the mechanism behind PostgreSQL's buffer pool and many other inter-process data stores.

It's also where the nastiest race conditions live: ones that only show up on multi-socket machines under load.

Rule: stress-test with concurrent readers and writers. ThreadSanitizer helps with threads inside one process, but it does not see races between separate processes.

False sharing can silently halve throughput — pad critical structs to 64 bytes.

🎯 Key Takeaway

Shared memory is the fastest IPC (zero kernel copies) but demands explicit synchronisation.

Always use mutexes or atomics with memory ordering — never trust partitioned writes alone.

For read-heavy workloads, consider a seqlock or RCU pattern.

Shared Memory Decision Tree

If you need the fastest IPC for large data structures on the same host → use shared memory with a mutex or RW lock. Embed sync primitives at offset 0.

If the workload is read-heavy with a single writer and infrequent updates → consider a seqlock (sequence counter + barrier) to avoid exclusive locks on reads.

If there are multiple writers that need atomic updates on counters → use C11 atomics or GCC __sync builtins. Align to an 8-byte boundary.

If it is performance critical with a large dataset (>1GB) → enable huge pages (MAP_HUGETLB). Reserve via kernel boot params.

Message Queues: Structured Asynchronous Communication

Message queues (SysV msgget family or POSIX mq_open family) allow processes to send and receive discrete messages. Each message has a type or priority, and the kernel buffers them until the receiver picks them up. The sender doesn't block unless the queue is full.

POSIX message queues (preferred over SysV) provide

named queues (/myqueue) accessible by unrelated processes
prioritised delivery (messages with higher priority are received first)
notification (via mq_notify with a signal or thread) when a message arrives
asynchronous send with O_NONBLOCK

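The queue has a fixed maximum number of messages (mq_maxmsg) and maximum message size (mq_msgsize), both set at creation through mq_open() and capped system-wide by fs.mqueue.msg_max and fs.mqueue.msgsize_max. They cannot be changed afterwards, so if you need a different size you must recreate the queue, and senders must handle a full queue.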

Message queues are ideal for event-driven systems where producers and consumers don't need to be alive at the same time. They decouple the sender and receiver in time (the kernel holds the message) and in space (receiver can be on a different process).

But they're slower than shared memory — each mq_send/mq_receive involves a system call and a kernel buffer copy. In high-throughput paths, shared memory + a condition variable beats message queues by orders of magnitude.

Be aware of priority effects: a steady stream of high-priority messages can starve low-priority ones, and a full queue blocks every sender regardless of the priority of the message it is trying to send. Monitor queue depths and set timeouts on both ends.

A production scenario: a logging service used a message queue between the app and the file writer. When the queue filled up, the app's mq_send blocked, which blocked the request handler. The entire site went down because the logger became the bottleneck. The fix: mq_timedsend with a timeout and fallback to dropping messages under backpressure.

Another subtlety: POSIX message queue names are limited to NAME_MAX (often 255) characters and must start with a slash. SysV queues use a numeric key. Prefer POSIX for portability and notification support.

Also note: message queues are not suitable for real-time systems with tight deadlines due to unpredictable kernel scheduling. The time between mq_send and mq_receive can vary by microseconds to milliseconds under load.

io_thecodeforge_mq_receiver.c (C)

#include <fcntl.h>      // O_CREAT, O_RDONLY
#include <mqueue.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 256 };
    mqd_t mq = mq_open("/io_thecodeforge_mq", O_CREAT | O_RDONLY, 0644, &attr);
    if (mq == (mqd_t)-1) { perror("mq_open"); return 1; }

    // One extra byte so a full-size message can still be NUL-terminated
    char buf[257];
    unsigned int prio;
    // The length passed must be at least mq_msgsize (256)
    ssize_t n = mq_receive(mq, buf, sizeof(buf) - 1, &prio);
    if (n == -1) { perror("mq_receive"); return 1; }

    buf[n] = '\0';
    printf("Received: %s (priority %u)\n", buf, prio);
    mq_close(mq);
    mq_unlink("/io_thecodeforge_mq");
    return 0;
}

💡Use mq_notify for Event-Driven Consumption

Instead of polling mq_receive in a loop, use mq_notify to get a signal or spawn a thread when a message arrives. This reduces CPU usage and latency. But beware: the notification is one-shot — you must rearm it after each receipt.

📊 Production Insight

Message queues are ideal for decoupled, async workflows but can become a bottleneck.

A logging service using a full queue blocked the entire request handler, taking the site down.

Fix: set a timeout on send and drop messages under backpressure instead of blocking.

Priority inversion is real — monitor depths and consider separate queues for critical vs bulk messages.

🎯 Key Takeaway

Message queues decouple sender and receiver in time and space.

Slower than shared memory (syscall + copy) but safer and easier to use across unrelated processes.

Prefer POSIX queues over SysV — they're more portable and support notification.

Message Queue Decision Tree

If you need async, decoupled messaging on the same host at moderate throughput → use a POSIX message queue (mq_open). Named, priority support, notification available.

If you need fixed-priority delivery and the sender must never block → use O_NONBLOCK and handle EAGAIN with a fallback, or use mq_timedsend.

If you need persistent distributed messaging across a network → use Kafka or RabbitMQ. Message queues are local-only; don't try to stretch POSIX MQ across machines.

If the path is real-time critical with tight latency bounds → avoid message queues. Use shared memory with condition variables or lock-free ring buffers.

IPC Performance Tuning: Kernel Parameters That Actually Matter

The Linux kernel exposes dozens of sysctl knobs that control IPC behavior. Getting these wrong means your pipe buffer starves, shared memory allocations fail, or message queues block unnecessarily.

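For pipes: a new pipe gets a 64 KiB buffer by default. fs.pipe-max-size (default 1 MiB) is the ceiling an unprivileged process can request with fcntl(fd, F_SETPIPE_SZ, size). For high-throughput pipelines, raise fs.pipe-max-size and then set an appropriate size on each pipe.

For shared memory: kernel.shmmax sets the maximum size of a single SysV shared memory segment (as low as 32 MB on older kernels and distros; recent kernels default to an effectively unlimited value). kernel.shmall limits total pages. If your app needs a large SysV region (say 16 GB), you must check both. Also, kernel.shm_rmid_forced can help clean up orphaned segments after a crash. POSIX objects created with shm_open are not governed by these limits; they live in the tmpfs mounted at /dev/shm and are bounded by its size.

For message queues: kernel.msgmnb (default 16 KB) caps total bytes in a SysV queue. kernel.msgmax caps per-message size. kernel.msgmni caps the number of queues system-wide. Monitor with ipcs -q -l. POSIX queues are governed separately by fs.mqueue.msg_max, fs.mqueue.msgsize_max, and fs.mqueue.queues_max.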

For semaphores: kernel.sem sets semaphore limits. If you use System V semaphores for IPC sync, ensure SEMMSL isn't too low.

A common production trap: a team increased shmmax but forgot shmall, so shmget kept failing for large segments (exceeding shmall is reported as ENOSPC, exceeding shmmax as EINVAL). Always set both. Also, changes via sysctl -w are lost on reboot, so persist them in /etc/sysctl.conf.

Here's a snippet that sets sane defaults for a database workload:

# Increase shared memory limits for Postgres-like workloads
sysctl -w kernel.shmmax=68719476736   # 64 GB
sysctl -w kernel.shmall=16777216      # 64 GB in pages (4K each)
sysctl -w kernel.msgmnb=65536         # 64 KB per queue
sysctl -w fs.pipe-max-size=4194304    # 4 MB per pipe

After editing /etc/sysctl.conf, apply the settings with sysctl -p; no reboot is needed.

io_thecodeforge_ipc_sysctl.sh (Bash)

#!/bin/bash
# Set IPC kernel parameters for production workloads
# Run as root
set -euo pipefail

sysctl -w kernel.shmmax=68719476736   # 64 GiB, in bytes
sysctl -w kernel.shmall=16777216      # 64 GiB, in 4 KiB pages
sysctl -w kernel.msgmnb=65536
sysctl -w kernel.msgmax=65536
sysctl -w fs.pipe-max-size=4194304

# Persist across reboots.
# This appends, so run it once; re-running duplicates the entries.
cat >> /etc/sysctl.conf <<EOF
kernel.shmmax=68719476736
kernel.shmall=16777216
kernel.msgmnb=65536
kernel.msgmax=65536
fs.pipe-max-size=4194304
EOF

⚠ Parameter Order Matters

Some kernel parameters are interdependent. For example, kernel.shmall must be at least kernel.shmmax / PAGE_SIZE. If you increase shmmax without adjusting shmall, large allocations still fail (shmget returns ENOSPC) even though the per-segment limit looks generous. Always check with ipcs -lm after changes.

📊 Production Insight

A team increased shmmax but forgot shmall, causing shmget to fail for large segments at scale.

Always set both and persist in /etc/sysctl.conf.

Verify with ipcs -lm after each change.

For pipe-intensive workloads, increase pipe-max-size and set per-pipe via fcntl. Don't rely on the default 64 KiB pipe buffer.

🎯 Key Takeaway

Kernel parameters control IPC resource limits.

shmmax and shmall must be set together.

pipe-max-size and msgmnb are the most commonly misconfigured.

IPC Tuning Decision Tree

If shmget fails for a large segment (EINVAL above shmmax, ENOSPC above shmall) → increase kernel.shmmax and kernel.shmall (shmall >= shmmax / PAGE_SIZE). Verify with ipcs -lm.

If a pipe write blocks even with space in the user buffer → increase fs.pipe-max-size via sysctl, and set per-pipe with fcntl(fd, F_SETPIPE_SZ, new_size).

If a message queue send blocks (full queue) → increase kernel.msgmnb and kernel.msgmax for SysV queues, or recreate a POSIX queue with a larger mq_maxmsg. Consider using non-blocking send with a fallback.

If you need to clean up orphaned shared memory after a crash → enable kernel.shm_rmid_forced. Also consider using shm_unlink in application cleanup handlers.

If semaphore operations fail with ENOSPC → increase kernel.sem values: SEMMSL, SEMMNS, SEMOPM, SEMMNI. Use ipcs -sl to check current limits.

Real-World IPC: How PostgreSQL and Redis Use Shared Memory

Two of the most popular databases rely on shared memory for performance-critical paths.

PostgreSQL uses shared memory for its shared buffer pool, WAL buffers, and lock tables. On older releases (before 9.3) the whole region, sized largely by shared_buffers, was a single SysV segment, so a kernel.shmmax or kernel.shmall smaller than that region stopped PostgreSQL from starting with an IPC error. Since 9.3 the main region is an anonymous mmap() by default and only a very small SysV segment remains, so these limits rarely bite on modern versions unless the server is configured to use SysV shared memory. Where they do apply, the fix is to raise the kernel parameters or reduce shared_buffers.

Redis is often assumed to use shared memory for persistence. It does not: for snapshots and append-only-file rewrites it fork()s a child that sees a copy-on-write view of the parent's memory. Replication and client traffic go over sockets.

But the real shared-memory workhorse is PostgreSQL. A production story: a PostgreSQL deployment had shared_buffers set to 8 GB but kernel.shmmax was only 1 GB. The server refused to start. The fix was to increase kernel.shmmax to 12 GB. Always check ipcs -lm after changes.

Another example: Apache HTTP uses shared memory for the scoreboard to track worker processes. Nginx uses shared memory for rate limiting and load balancing across workers. Both depend on correct kernel tuning.

For custom applications, a ring buffer in shared memory is a common pattern for high-throughput logging or metrics collection. The key is to use a lock-free design with atomic sequence numbers and enough padding to avoid false sharing.
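The padding part of that design can be sketched with ctypes. This shows only the memory layout: Python has no atomic loads or stores, so a real lock-free ring would implement the sequence-number updates in C, C++, or Rust. The 64-byte cache line and the names `RingHeader` and `slot_index` are assumptions for illustration.

```python
import ctypes

CACHE_LINE = 64  # assumed cache-line size; common on x86-64, not universal


class RingHeader(ctypes.Structure):
    """Control block for a single-producer, single-consumer ring buffer.

    head and tail sit on separate cache lines so the producer's writes to
    head do not invalidate the line the consumer uses for tail.
    """
    _fields_ = [
        ("head", ctypes.c_uint64),                     # written by the producer
        ("_pad0", ctypes.c_uint8 * (CACHE_LINE - 8)),  # fill the rest of line 0
        ("tail", ctypes.c_uint64),                     # written by the consumer
        ("_pad1", ctypes.c_uint8 * (CACHE_LINE - 8)),  # fill the rest of line 1
    ]


def slot_index(sequence, capacity):
    """Map a monotonically increasing sequence number to a slot index.

    capacity must be a power of two so a bit mask can replace the modulo.
    """
    if capacity <= 0 or capacity & (capacity - 1):
        raise ValueError("capacity must be a power of two")
    return sequence & (capacity - 1)


if __name__ == "__main__":
    print("tail offset:", RingHeader.tail.offset)
    print("header size:", ctypes.sizeof(RingHeader))
```

The same structure can be laid over a shared mapping with `RingHeader.from_buffer(...)`, provided the mapping itself starts on a cache-line boundary.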

In production, always monitor shared memory usage with ipcs -m and ensure your monitoring alerts on segment allocation failures. A sudden spike in shared memory usage can indicate a bug that leaks segments — set up a job to clean up orphaned segments on startup.

io_thecodeforge_check_shm.sh · BASH

#!/bin/bash
# Check shared memory usage and kernel limits

echo "=== Shared Memory Limits ==="
sysctl kernel.shmmax kernel.shmall 2>/dev/null

echo ""
echo "=== Current Shared Memory Segments ==="
ipcs -m

echo ""
echo "=== IPC Limits (ipcs -lm) ==="
ipcs -lm
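For the orphaned-segment check, the same information that ipcs -m prints is available on Linux in /proc/sysvipc/shm. The sketch below parses that file by header name and lists System V segments with no attached processes. The function names are hypothetical, and it assumes the header row contains `shmid`, `size`, and `nattch` columns.

```python
def parse_sysvipc_shm(text):
    """Parse the contents of /proc/sysvipc/shm into a list of dicts."""
    lines = text.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    segments = []
    for line in lines[1:]:
        values = line.split()
        if len(values) != len(header):
            continue  # skip rows that do not match the header
        try:
            segments.append(dict(zip(header, (int(v) for v in values))))
        except ValueError:
            continue  # skip rows with non-numeric fields
    return segments


def orphaned_segments(segments):
    """Segments with zero attached processes are cleanup candidates."""
    return [s for s in segments if s["nattch"] == 0]


if __name__ == "__main__":
    try:
        with open("/proc/sysvipc/shm") as f:
            segs = parse_sysvipc_shm(f.read())
    except OSError:
        segs = []  # not Linux, or /proc is unavailable
    for seg in orphaned_segments(segs):
        print(f"orphan shmid={seg['shmid']} size={seg['size']} bytes")
```

A segment with nattch == 0 is not necessarily leaked: a process may have created it and not attached yet, so treat the list as candidates to review rather than to delete blindly.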

🔥PostgreSQL IPC Startup Failure

PostgreSQL will refuse to start if it allocates shared_buffers as a System V segment and kernel.shmmax is smaller than the request. The error is typically 'could not create shared memory segment: Invalid argument', followed by the failed shmget call. Increase the kernel parameters or reduce shared_buffers.

📊 Production Insight

PostgreSQL's shared_buffers is a direct consumer of shared memory.

If shmmax is too low, the database won't start — you get a cryptic IPC error.

Rule: always set kernel.shmmax >= shared_buffers + some margin.

Redis doesn't use shared memory for data, but its replication uses socket IPC.

Monitor ipcs -m for orphaned segments that can slowly exhaust system memory.

🎯 Key Takeaway

PostgreSQL, Apache, Nginx — all use shared memory behind the scenes.

Get the kernel parameters wrong and they fail at startup or under load, often with a cryptic IPC error.

Always verify shared memory limits before deploying a database or web server at scale.

Real-World IPC Decision Tree

If: Running PostgreSQL with large shared_buffers

→ Use: Ensure kernel.shmmax and kernel.shmall are sufficiently large. Increase if necessary.

If: Custom high-throughput logging using ring buffer

→ Use: Shared memory with atomic sequence numbers and padded structs to avoid false sharing.

If: Nginx load balancing across workers

→ Use: Shared memory for rate limit zones. Pre-allocate with adequate size.

If: Apache HTTP with MPM worker

→ Use: Shared memory for the scoreboard. Monitor with ipcs -m; clean up after graceful restart.

Process Control Blocks: The OS's Cheat Sheet for Your Process

Every time your process does something — reads a file, sends a signal, spawns a child — the kernel looks up a data structure called the Process Control Block (PCB). The PCB is not an abstraction. It's a concrete struct in kernel memory that holds everything the OS needs to switch between processes, handle interrupts, and enforce security boundaries.

The PCB contains the process state (running, waiting, zombie), program counter, CPU registers, memory limits, open file descriptors, and the process ID. If you've ever wondered why fork() is fast but not instant: the kernel has to copy and reinitialize parts of the PCB. Missed a real-time deadline because your process got preempted? Blame the context switch time — the kernel saving one PCB and loading another.

Production trap: thinking that higher priority always means faster response. The scheduler looks at more than the nice value stored in the PCB: it also tracks how much CPU time the process has already consumed. A process that spins on shared memory in a tight loop without yielding uses up its share and gets preempted, whatever its priority. Starvation is the other side of the same accounting: a process that keeps losing out to others never runs long enough to make progress.

PCBMetrics.py · PYTHON

# io.thecodeforge - cs-fundamentals tutorial

import os

def check_process_state():
    """
    Reads PCB-derived fields from /proc/<pid>/stat (Linux only).
    In production, this is how you'd introspect a process.
    """
    pid = os.getpid()
    proc_path = f"/proc/{pid}/stat"

    try:
        with open(proc_path, 'r') as f:
            raw = f.read()
    except FileNotFoundError:
        print("No /proc entry: the process is gone or /proc is unavailable.")
        return

    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so split only the text after the last ')'.
    fields = raw[raw.rindex(')') + 2:].split()
    state = fields[0]        # field 3 in proc(5): process state (single char)
    utime = int(fields[11])  # field 14: user-mode CPU time in clock ticks
    stime = int(fields[12])  # field 15: kernel-mode CPU time in clock ticks

    print(f"PID: {pid}")
    print(f"State: {state}")
    print(f"User time (ticks): {utime}")
    print(f"System time (ticks): {stime}")

if __name__ == "__main__":
    check_process_state()

Output (PID and tick counts vary per run; a process reading its own entry is running, so the state is R)

PID: 12345

State: R

User time (ticks): 47

System time (ticks): 12
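The tick counts above are only meaningful once converted to seconds. A short sketch of that conversion, assuming a hypothetical helper name `ticks_to_seconds`; the tick rate comes from `os.sysconf("SC_CLK_TCK")`, which is typically 100 on Linux but should be queried rather than hard-coded:

```python
import os


def ticks_to_seconds(ticks, ticks_per_second=None):
    """Convert a utime/stime tick count from /proc/<pid>/stat to seconds."""
    if ticks_per_second is None:
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    return ticks / ticks_per_second


if __name__ == "__main__":
    rate = os.sysconf("SC_CLK_TCK")
    print(f"clock ticks per second: {rate}")
    print(f"47 ticks = {ticks_to_seconds(47):.2f} s of CPU time")
```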

⚠ Production Trap:

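Every read of /proc/PID/stat is a system call in which the kernel gathers and formats the process's fields on demand. Polling it in a tight loop burns CPU and can contend with kernel locks on the target process. As a rule of thumb, don't poll faster than every 100 ms in production.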

🎯 Key Takeaway

The PCB is the kernel's single source of truth for your process. If you're debugging weird performance, read the PCB fields before guessing.

Types of Processes: Not All Processes Are Equally Dangerous

The kernel classifies processes by how they were created and what they're allowed to do. The two fundamental types: independent and cooperating. Independent processes share nothing — they run in their own address space and can't affect each other (except through the OS). Cooperating processes intentionally share data via IPC. That's where the fun starts.

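Within cooperating processes, you have parent-child (forked) and sibling (same parent) relationships. The parent-child relationship is critical because of what happens when the parent dies: by default the children are not signalled. They are reparented to init (PID 1) or a subreaper and keep running. They receive SIGHUP only in specific cases, such as a session leader with a controlling terminal exiting, and a child gets a parent-death signal only if it asked for one via prctl(PR_SET_PDEATHSIG). Miss this in design, and your orphaned child processes will keep holding shared memory segments, causing leaks that outlast the parent.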

Real-world rule: daemonize your long-lived worker processes. A daemon process has no controlling terminal and gets reparented to init (PID 1). If you fork a worker from a web server, the worker inherits the server's environment variables, file descriptors, signal dispositions, session, and process group. A terminal hangup or a signal aimed at the server's process group then takes your worker down too. Detach it properly by forking twice and calling setsid().

DaemonizeWorker.py · PYTHON

# io.thecodeforge - cs-fundamentals tutorial

import os
import sys
import time

def daemonize():
    """
    Fork twice to become a daemon process.
    First fork: run in background.
    Second fork: prevent reacquiring a terminal.
    """
    pid = os.fork()
    if pid > 0:
        # Parent exits so child runs in background
        sys.exit(0)
    
    # Child: become session leader
    os.setsid()
    
    # Second fork to ensure no terminal reattach
    pid = os.fork()
    if pid > 0:
        sys.exit(0)
    
    # Now we're a daemon
    print(f"Daemon PID: {os.getpid()}", file=sys.stderr)
    
    while True:
        # Simulate IPC work — polling a shared memory flag
        time.sleep(10)

if __name__ == "__main__":
    daemonize()

Output

Daemon PID: 54321

🔥Senior Shortcut:

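If you're using Python's multiprocessing module with shared memory, setting daemon=True on your workers makes sure they are terminated when the parent exits. But watch out: this flag is unrelated to Unix daemons, daemonic workers are killed without running cleanup code (so they cannot release shared memory themselves), and they can't create child processes of their own.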

🎯 Key Takeaway

Know your process type before choosing an IPC mechanism. A daemon worker needs different signal handling than a short-lived forked child.

Problems in Inter-Process Communication (IPC)

IPC is not free. The fundamental problem is coordination without corruption. When two processes access a shared resource (memory, a file, a pipe), they race against each other. The classic lost update occurs when process A reads a value, process B modifies it, and A writes back a result computed from the stale value: B's update vanishes. Deadlock is another killer: process 1 holds resource X and waits for Y, while process 2 holds Y and waits for X. Both freeze forever. Starvation happens when a low-priority process never gets access because higher-priority processes keep cutting in line. Synchronization primitives like mutexes and semaphores solve races but introduce overhead: contention on a lock can destroy throughput. Portability is another hidden cost: POSIX pipes and Windows named pipes have different APIs and semantics, so IPC code rarely ports unchanged. Finally, security: a shared memory segment with overly permissive access modes leaks data between processes that should never see each other. Understanding these problems tells you why you pick a specific IPC mechanism: speed vs. safety, structure vs. flexibility.

race_demo.py · PYTHON

# io.thecodeforge - cs-fundamentals tutorial
import multiprocessing

def increment(counter, lock, n):
    for _ in range(n):
        with lock:
            counter.value += 1  # critical section: read-modify-write

if __name__ == '__main__':
    # Create the shared objects under the main guard and pass them explicitly,
    # so the demo also works with the "spawn" start method (macOS, Windows).
    counter = multiprocessing.Value('i', 0)
    lock = multiprocessing.Lock()
    workers = [
        multiprocessing.Process(target=increment, args=(counter, lock, 100000))
        for _ in range(2)
    ]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(f"Final counter: {counter.value} (expected 200000)")

Output

Final counter: 200000 (expected 200000)

⚠ Production Trap:

Removing the lock from the demo above silently produces a final counter like 142893 instead of 200000. Race bugs are non-deterministic—they pass your unit tests and crash at 3 AM under load.
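The deadlock scenario described earlier (one side holds X and waits for Y, the other holds Y and waits for X) is usually avoided by acquiring locks in one global order. The sketch below uses threads to stay short; multiprocessing.Lock objects support the same `with` protocol. The helper names and the order-by-name convention are assumptions for illustration, not a prescribed API.

```python
import threading
from contextlib import ExitStack


def acquire_ordered(named_locks):
    """Acquire every lock in name order; the returned stack releases them."""
    stack = ExitStack()
    try:
        for name in sorted(named_locks):
            stack.enter_context(named_locks[name])
    except BaseException:
        stack.close()  # release whatever was already acquired
        raise
    return stack


def transfer(balances, locks, src, dst, amount):
    """Move amount from src to dst while holding both locks."""
    with acquire_ordered({src: locks[src], dst: locks[dst]}):
        balances[src] -= amount
        balances[dst] += amount


def run_demo(rounds=1000):
    balances = {"X": 1000, "Y": 1000}
    locks = {name: threading.Lock() for name in balances}

    def worker(src, dst):
        for _ in range(rounds):
            transfer(balances, locks, src, dst, 1)

    # The two workers request the locks in opposite logical order, which is
    # the classic deadlock setup; sorting by name removes the cycle.
    threads = [
        threading.Thread(target=worker, args=("X", "Y")),
        threading.Thread(target=worker, args=("Y", "X")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balances


if __name__ == "__main__":
    print(run_demo())
```

Because both workers always take X before Y, neither can hold one lock while waiting for the other in the reverse order.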

🎯 Key Takeaway

All IPC problems stem from shared state and concurrency; always use a synchronization primitive around any mutable shared resource.

Modes of Inter-Process Communication

IPC modes fall into two families: synchronous and asynchronous. Synchronous modes block the sending process until the transport can accept the data. A pipe write blocks when the pipe buffer is full, and a blocking socket send blocks when the kernel send buffer is full. These give you backpressure: the sender cannot outrun the receiver, which prevents memory blowups. The cost is latency spikes when the receiver is slow.

Asynchronous modes, like message queues, return as soon as the message is queued, and the receiver picks it up later. This decouples the processes in time, which is great for bursty workloads, but flow control becomes your job. Kernel message queues are bounded, so a full queue blocks the sender or fails with EAGAIN in non-blocking mode; an unbounded user-space queue grows until memory runs out.

Shared memory is a third axis: it's neither fully synchronous nor asynchronous. Two processes read and write a common memory region with no kernel intervention on the data path. Synchronization is manual (spinlocks, futexes). Speed is maximum, but the cost is complexity and the risk of data races.

Signals are another mode: a notification sent to a process. They're fast, but standard signals carry no payload, only the fact that something happened. Each mode trades off latency, throughput, safety, and developer effort.

async_queue.py · PYTHON

# io.thecodeforge - cs-fundamentals tutorial
from multiprocessing import Queue, Process
import time

def producer(q):
    for i in range(5):
        q.put(i)
        print(f"Produced {i}")

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print(f"Consumed {item}")
        time.sleep(0.2)

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start(); p2.start()
    p1.join()
    q.put(None)  # poison pill
    p2.join()

Output (one possible interleaving; the order of Produced and Consumed lines varies between runs)

Produced 0

Produced 1

Consumed 0

Produced 2

Consumed 1

Produced 3

Consumed 2

Produced 4

Consumed 3

Consumed 4

⚠ Production Trap:

A poison pill (None sentinel) is required for graceful shutdown of a consumer. Without it, the consumer blocks forever on q.get() after the last real message, leaking process resources.
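The backpressure that synchronous modes provide can be observed directly on a pipe. The sketch below puts the write end in non-blocking mode and writes until the kernel refuses more data, which is the point where a blocking writer would stall. The helper names `fill_pipe` and `drain` are hypothetical; the byte count reported depends on the platform's pipe capacity.

```python
import os


def fill_pipe(write_fd, chunk=4096):
    """Write until the kernel buffer is full; return the bytes accepted."""
    os.set_blocking(write_fd, False)
    payload = b"x" * chunk
    total = 0
    while True:
        try:
            total += os.write(write_fd, payload)
        except BlockingIOError:
            return total  # buffer full: a blocking writer would sleep here


def drain(read_fd, nbytes):
    """Read up to nbytes from the pipe; return the bytes actually read."""
    remaining = nbytes
    while remaining > 0:
        chunk = os.read(read_fd, min(remaining, 65536))
        if not chunk:
            break  # all writers closed
        remaining -= len(chunk)
    return nbytes - remaining


if __name__ == "__main__":
    r, w = os.pipe()
    filled = fill_pipe(w)
    print(f"pipe accepted {filled} bytes before pushing back")
    drain(r, filled)
    print("after draining, write accepts", os.write(w, b"x"), "byte again")
    os.close(r)
    os.close(w)
```

The sender only makes progress again once the reader drains data, which is exactly the bounded-memory property the paragraph on synchronous modes describes.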

🎯 Key Takeaway

Choose synchronous IPC for backpressure and bounded memory; choose asynchronous for decoupling and burst tolerance; choose shared memory only when you own the synchronization.

Unix Domain Sockets vs TCP Localhost: Performance Comparison

Unix Domain Sockets (UDS) and TCP localhost connections are two common methods for IPC on the same machine. UDS use the filesystem as their address namespace, while TCP localhost uses the loopback network interface (127.0.0.1). The key performance difference lies in overhead: TCP runs the full protocol stack (headers, checksums, congestion control) even on localhost, whereas UDS bypasses the network stack and copies data between socket buffers inside the kernel. Published benchmarks often show UDS 2-3x faster for small messages, with lower latency. For example, a simple echo benchmark with 1 KB messages might show UDS throughput of ~500K messages/sec vs TCP's ~200K messages/sec; the exact numbers depend on hardware, kernel version, and message size. However, TCP localhost offers the advantage of using the same API for both local and remote communication, simplifying code. In production, choose UDS for performance-critical local IPC (e.g., database connections, web server backends) and TCP for portability or when you may later distribute processes across machines. Practical example: Nginx uses Unix socket pairs for communication between master and worker processes. To benchmark, use a custom socket program that runs the same echo loop over both transports (network tools such as iperf measure only the TCP side). Note that UDS also support ancillary data via sendmsg: SCM_RIGHTS passes open file descriptors between processes, and SCM_CREDENTIALS passes the sender's PID, UID, and GID.

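uds_vs_tcp_benchmark.py · PYTHON

import os
import socket
import tempfile
import threading
import time

def recv_exact(sock, n):
    """Read exactly n bytes (stream sockets may return short reads)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)

def echo_server(listener, msg_size, count):
    """Accept one client and echo count messages back to it."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(count):
            conn.sendall(recv_exact(conn, msg_size))

def benchmark(use_uds, msg_size=1024, count=10000):
    if use_uds:
        path = os.path.join(tempfile.mkdtemp(), "bench.sock")
        family, addr = socket.AF_UNIX, path
    else:
        family, addr = socket.AF_INET, ("127.0.0.1", 0)  # port 0: pick a free port
    listener = socket.socket(family, socket.SOCK_STREAM)
    listener.bind(addr)
    listener.listen(1)
    # The echo server runs in a thread, so the numbers include GIL overhead.
    server = threading.Thread(target=echo_server, args=(listener, msg_size, count))
    server.start()

    client = socket.socket(family, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    msg = b"x" * msg_size
    start = time.perf_counter()
    for _ in range(count):
        client.sendall(msg)
        recv_exact(client, msg_size)
    elapsed = time.perf_counter() - start
    print(f"{'UDS' if use_uds else 'TCP'}: {count / elapsed:.0f} round trips/sec")

    client.close()
    server.join()
    listener.close()
    if use_uds:
        os.unlink(path)  # remove the socket file so reruns can bind again

if __name__ == "__main__":
    benchmark(True)
    benchmark(False)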

📊 Production Insight

In production, many high-performance services (e.g., PostgreSQL, Redis) use UDS for local connections. When benchmarking, ensure you account for connection setup overhead; persistent connections amortize this cost.
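One capability that TCP localhost cannot match is passing an open file descriptor to another process over a Unix domain socket (SCM_RIGHTS). The sketch below shows the mechanism inside a single process with a socket pair, using `socket.send_fds` and `socket.recv_fds` from Python 3.9+. The wrapper names `send_fd` and `recv_fd` are hypothetical.

```python
import os
import socket


def send_fd(sock, fd, note=b"fd"):
    """Send one file descriptor plus a short message over a Unix socket."""
    socket.send_fds(sock, [note], [fd])


def recv_fd(sock):
    """Receive the message and the descriptor; returns (note, fd)."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return data, fds[0]


if __name__ == "__main__":
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()

    send_fd(left, r)             # hand over the read end of the pipe
    note, received = recv_fd(right)

    os.write(w, b"hello")
    print(note, os.read(received, 5))  # read through the received descriptor

    for fd in (r, w, received):
        os.close(fd)
    left.close()
    right.close()
```

The receiver gets a new descriptor number that refers to the same open file, which is how a supervisor can hand accepted connections or log files to worker processes.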

🎯 Key Takeaway

Unix Domain Sockets often outperform TCP localhost by 2-3x in small-message benchmarks because they bypass the network stack, making them ideal for high-performance local IPC.

D-Bus: Desktop IPC on Linux

D-Bus is a message bus system that provides inter-process communication for desktop applications on Linux (and other Unix-like systems). It allows applications to communicate with each other and with system services. D-Bus operates on a bus topology: there is a system bus for hardware events and system-wide services, and a session bus for per-user desktop applications. Messages are typed and structured, supporting method calls, signals, and properties. Applications register object paths and interfaces, and clients invoke methods or listen for signals. For example, a media player can expose a 'Play' method on the bus, and a remote control app can call it. D-Bus uses a binary wire protocol for efficiency and supports introspection (querying which methods an object offers). Practical use: NetworkManager uses D-Bus to allow apps to control network settings. To use D-Bus in C, you'd link against libdbus-1; higher-level bindings exist for Python (dbus-python), Qt (QtDBus), and GLib (GDBus). D-Bus also handles service activation: if a service is not running, the bus can start it automatically. However, D-Bus adds overhead compared to raw sockets or shared memory, because messages are routed through the bus daemon, so it's best suited for desktop IPC where flexibility and structured communication matter.

dbus_example.py · PYTHON

import dbus

def get_session_bus_names():
    bus = dbus.SessionBus()
    names = bus.list_names()
    for name in names:
        print(name)

def call_method():
    bus = dbus.SessionBus()
    # Example: call a method on a hypothetical media player
    proxy = bus.get_object('org.example.MediaPlayer', '/org/example/MediaPlayer')
    iface = dbus.Interface(proxy, 'org.example.MediaPlayer')
    result = iface.Play()
    print(f"Play returned: {result}")

if __name__ == '__main__':
    get_session_bus_names()
    # call_method()

📊 Production Insight

When building desktop applications, use D-Bus for integration with system services (e.g., notifications, network management). For custom high-performance IPC, avoid D-Bus due to its overhead.

🎯 Key Takeaway

D-Bus provides a standardized, structured IPC mechanism for desktop Linux applications, supporting method calls, signals, and service activation.

Shared Memory with mmap and shm_open

Shared memory is the fastest IPC mechanism because processes access the same memory region directly, without kernel intervention for data transfer. On POSIX systems, you create shared memory with shm_open (which returns a file descriptor backed by a shared memory object) and then mmap it into the process address space. Compared with System V shared memory (shmget), this gives you an ordinary file descriptor that works with ftruncate, fstat, and mmap, and a name you can remove with shm_unlink. Example: a producer writes data to a shared buffer, and a consumer reads it. Synchronization is required (e.g., using mutexes or atomic operations). Here's the producer half of a simple producer-consumer using POSIX shared memory and a semaphore: the producer creates a shared memory object with shm_open, sets its size with ftruncate, maps it with mmap, and writes data. The consumer opens the same object and maps it. A semaphore (named, or unnamed and placed inside the mapping) coordinates access: the producer posts the semaphore after writing, and the consumer waits on it before reading. This pattern is used in high-frequency trading systems and multimedia pipelines. On Linux, shm_open creates a file under /dev/shm (a tmpfs mount). Cleanup requires munmap and shm_unlink. In production, ensure proper synchronization to avoid torn reads/writes. For example, use atomic operations for simple flags or a mutex for complex structures. The mmap approach also allows mapping files directly (file-backed mapping) for persistent shared data.

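shm_producer_consumer.c · C

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <semaphore.h>
#include <string.h>

#define SHM_NAME "/my_shm"
#define BUF_SIZE 1024

/* Layout shared by producer and consumer. The unnamed semaphore lives
 * inside the mapping, so no separate named semaphore is needed. */
struct shared {
    sem_t sem;
    char buf[BUF_SIZE];
};

int main(void) {
    /* Producer: create the object (owner-only; widen the mode if the
     * consumer runs as another user), size it, and map it. */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd == -1) { perror("shm_open"); return EXIT_FAILURE; }
    if (ftruncate(fd, sizeof(struct shared)) == -1) {
        perror("ftruncate");
        shm_unlink(SHM_NAME);
        return EXIT_FAILURE;
    }
    struct shared *s = mmap(NULL, sizeof(struct shared),
                            PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (s == MAP_FAILED) {
        perror("mmap");
        shm_unlink(SHM_NAME);
        return EXIT_FAILURE;
    }

    /* pshared = 1: the semaphore is shared between processes. */
    if (sem_init(&s->sem, 1, 0) == -1) {
        perror("sem_init");
        munmap(s, sizeof(struct shared));
        shm_unlink(SHM_NAME);
        return EXIT_FAILURE;
    }
    snprintf(s->buf, BUF_SIZE, "%s", "Hello from producer!");
    sem_post(&s->sem);  /* signal the consumer that buf is ready */
    printf("Producer wrote: %s\n", s->buf);
    sleep(2);  /* give a consumer time to sem_wait() and read */

    /* Tear down in order: destroy the semaphore while it is still mapped. */
    sem_destroy(&s->sem);
    munmap(s, sizeof(struct shared));
    shm_unlink(SHM_NAME);
    return EXIT_SUCCESS;
}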
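The consumer side of the same pattern (attach to an existing object by name and read from it) can be sketched with Python's multiprocessing.shared_memory, which wraps shm_open and mmap on POSIX systems. This sketch runs both sides sequentially in one process, so it deliberately omits the semaphore; the length-prefixed layout and the helper names are assumptions for illustration.

```python
import struct
from multiprocessing import shared_memory

# 8-byte little-endian message length stored at offset 0, so it is
# naturally aligned within the mapping.
HEADER = struct.Struct("<Q")


def write_message(shm, payload):
    """Copy payload into the segment, then publish its length."""
    if HEADER.size + len(payload) > shm.size:
        raise ValueError("payload does not fit in the segment")
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    HEADER.pack_into(shm.buf, 0, len(payload))


def read_message(shm):
    """Read the length header, then return a copy of the payload."""
    (length,) = HEADER.unpack_from(shm.buf, 0)
    return bytes(shm.buf[HEADER.size:HEADER.size + length])


if __name__ == "__main__":
    producer = shared_memory.SharedMemory(create=True, size=4096)
    try:
        write_message(producer, b"Hello from producer!")
        # A consumer in another process would attach with the same name.
        consumer = shared_memory.SharedMemory(name=producer.name)
        print(read_message(consumer).decode())
        consumer.close()
    finally:
        producer.close()
        producer.unlink()  # equivalent of shm_unlink
```

With two real processes, the length header would need to be published with proper synchronization (a semaphore or lock), for the reasons the incident below illustrates.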

📊 Production Insight

Use shared memory for high-throughput, low-latency IPC where data is large or frequent. In production, prefer POSIX shared memory over System V for better integration with modern APIs and easier cleanup.

🎯 Key Takeaway

POSIX shared memory with shm_open and mmap provides the fastest IPC by allowing direct memory access, but requires explicit synchronization to avoid data corruption.

● Production incident · POST-MORTEM · severity: high

The Shared Memory Race That Brought Down a Trading Engine

Symptom

Intermittent data corruption: order book entries showed prices from different stocks, trades failed to match, system logged 'impossible' states.

Assumption

The team assumed that because each process only wrote to its own offset within the shared region, no race existed. They used a fixed partitioning scheme — process A owned bytes 0–1023, process B owned 1024–2047 — but both processes also read from each other's regions to detect updates.

Root cause

An 8-byte (uint64) store is not guaranteed to be atomic on x86 when it is misaligned and straddles a cache-line boundary. Process A started writing an 8-byte price at offset 1022, which crosses both the partition boundary and a 64-byte cache line at offset 1024, and process B's reader saw a torn value: the first 2 bytes from the new write and the remaining 6 from the previous value. That torn integer was read as a negative price, which cascaded into a market order being sent on the wrong side.

Fix

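1. Align all shared variables to their natural size (uint64 on an 8-byte boundary).
2. Add a pthread_mutex_t per partition, initialised with the PTHREAD_PROCESS_SHARED attribute.
3. Use C11 atomics (stdatomic.h) with acquire/release ordering for lock-free flags where possible. volatile sig_atomic_t is meant for signal handlers and gives no ordering guarantees between processes.
4. Run with ThreadSanitizer on the test cluster before every production deploy.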

Key lesson

Shared memory is not safe without synchronisation, even with 'non-overlapping' writes
Assume all data races are possible: the compiler and CPU can reorder operations
Always align shared variables to their type size
Use proper sync primitives — lock-free algorithms require proof, not hope
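The arithmetic behind the incident can be checked in a few lines. The sketch below tests whether an access is naturally aligned, whether it straddles a cache line, and reconstructs the kind of torn value a reader sees when only part of a little-endian 8-byte store has landed. The 64-byte line size, the function names, and the sample values are assumptions for illustration.

```python
import struct

CACHE_LINE = 64  # assumed cache-line size in bytes


def is_naturally_aligned(offset, size):
    """True if offset is a multiple of the access size."""
    return offset % size == 0


def crosses_cache_line(offset, size, line=CACHE_LINE):
    """True if the first and last byte of the access fall in different lines."""
    return offset // line != (offset + size - 1) // line


def torn_read(old, new, bytes_already_written):
    """Value seen if only the first bytes of a little-endian uint64 store landed."""
    old_bytes = struct.pack("<Q", old)
    new_bytes = struct.pack("<Q", new)
    mixed = new_bytes[:bytes_already_written] + old_bytes[bytes_already_written:]
    return struct.unpack("<Q", mixed)[0]


if __name__ == "__main__":
    print("offset 1022 aligned:", is_naturally_aligned(1022, 8))
    print("offset 1022 crosses a line:", crosses_cache_line(1022, 8))
    print("offset 1024 crosses a line:", crosses_cache_line(1024, 8))
    print(hex(torn_read(0x1111111111111111, 0x2222222222222222, 2)))
```

A check like `is_naturally_aligned` is cheap enough to run as an assertion when the shared layout is computed, long before any process maps it.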

Production debug guide: common IPC failures and the exact commands to diagnose them

Symptom · 01

Pipe write returns EPIPE (Broken pipe)

→

Fix

Check if the reading end is closed. Use lsof -c <process> to see open file descriptors. Ensure reader hasn't crashed or exited early.

Symptom · 02

Socket connect hangs indefinitely

→

Fix

Run strace -e trace=connect,recvfrom,sendto -p <PID> to see where the connection stalls. Use ss -tlnp to verify the listener, its port, and its accept backlog. Check firewall rules with iptables -L.

Symptom · 03

Shared memory read returns stale or corrupted data

→

Fix

1. Verify mutex locking with valgrind --tool=helgrind. 2. Use cat /proc/<PID>/maps to confirm shared memory region is mapped with correct permissions. 3. Dump raw bytes with dd if=/dev/shm/<name> bs=1 count=<size> | xxd to inspect corruption.

Symptom · 04

Message queue send blocks forever

→

Fix

Check queue usage with ipcs -q and compare the used bytes against the queue's msg_qbytes limit. Attach strace -e trace=msgsnd -p <PID> to confirm the sender is blocked on a full queue. Increase kernel.msgmnb, or send with IPC_NOWAIT and handle EAGAIN.

Symptom · 05

Unix socket bind fails with EADDRINUSE

→

Fix

Check if the socket file exists with ls -l /path/to/socket. Remove stale files with rm /path/to/socket. Ensure your application unlinks the socket on shutdown. Use lsof /path/to/socket to see if another process still has it open.
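The stale-socket case in the last entry can be handled at startup instead of by hand. A minimal sketch, assuming a hypothetical helper `bind_unix_socket`: it probes the existing path with a connect, removes the file only if nothing is listening, and then binds. There is a small window between the probe and the bind, so two servers starting at the same instant still need an external lock.

```python
import errno
import os
import socket
import tempfile


def socket_is_live(path):
    """True if a server is accepting connections on this socket path."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
        return True
    except (ConnectionRefusedError, FileNotFoundError):
        return False
    finally:
        probe.close()


def bind_unix_socket(path):
    """Bind a listening Unix socket, removing a stale socket file if needed."""
    if os.path.exists(path):
        if socket_is_live(path):
            raise OSError(errno.EADDRINUSE, "socket is in use by a live server", path)
        os.unlink(path)  # stale file left behind by an unclean shutdown
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(16)
    return server


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    first = bind_unix_socket(path)
    first.close()                    # simulates a crash: the file stays behind
    second = bind_unix_socket(path)  # stale file is detected and replaced
    print("rebound on", path)
    second.close()
    os.unlink(path)
```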

★ IPC Quick Debug Commands: the most common IPC failures and what to run in under 30 seconds.

Pipe broken (EPIPE)

Immediate action

`strace -e write -p <writer_pid>`

Commands

`lsof -c <process_name> | grep pipe`

`echo 'pipe size' ; cat /proc/sys/fs/pipe-max-size`

Fix now

Ensure reader process stays alive; add SIGPIPE handler or set it to SIG_IGN


IPC Mechanism Comparison

Mechanism	Typical Latency	Data Shape	Scope	Sync Needed	Max Throughput
Pipe	~10 µs	Byte stream	Same host (parent-child or FIFO)	No (kernel serializes)	~1 GB/s
Unix Socket	~50 µs	Byte stream or datagram	Same host	No (kernel serializes)	~2 GB/s
TCP Socket	~100 µs (local) – 10 ms (WAN)	Byte stream	Any network	No (kernel serializes)	~1 GB/s (local)
Shared Memory	~70–150 ns	Raw memory	Same host	Yes (mutex/atomics)	~10 GB/s
Message Queue	~10–100 µs	Messages (priority)	Same host	No (kernel buffers)	~100 MB/s


Key takeaways

IPC mechanisms are not interchangeable: typical latency ranges from ~70 ns (shared memory) to ~100 µs (message queues and local TCP).

Shared memory is the fastest but requires explicit synchronisation; a missing mutex corrupts data silently.

Pipes are simple and zero-config but unidirectional; anonymous pipes only connect related processes, so use a FIFO for unrelated ones.

Unix domain sockets are the gold standard for local bidirectional IPC: use them over TCP on the same host.

Message queues decouple sender and receiver but can become a bottleneck if queue size is misconfigured.

Tune kernel parameters (shmmax, pipe-max-size, msgmnb) for your workload: defaults are often too small.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR

Compare pipes and Unix domain sockets for local IPC. When would you use ...

Q02SENIOR

How does shared memory achieve the lowest latency? What's the main risk?

Q03SENIOR

What is the difference between POSIX and System V message queues? Which ...

Q04SENIOR

Explain the role of `kernel.shmmax` and `kernel.shmall`. What happens if...

Q05SENIOR

What is false sharing in the context of shared memory IPC, and how do yo...

Q06SENIOR

How does PostgreSQL use shared memory, and what kernel parameters must b...

Q01 of 06JUNIOR

Compare pipes and Unix domain sockets for local IPC. When would you use one over the other?

ANSWER

Pipes are simpler and lighter – they're just file descriptors for parent-child communication. But they're unidirectional. Unix domain sockets are bidirectional, support multiple connections (server model), and can pass file descriptors. Use pipes for simple one-way data flows; use Unix sockets when you need bidirectional communication or a client-server pattern. Both are kernel-buffered, but Unix sockets are slightly more flexible.

FAQ · 7 QUESTIONS

Frequently Asked Questions

Can pipes be used for unrelated processes?

Why is shared memory faster than pipes?

What happens if a message queue fills up?

How do I check the current size of my pipe buffer?

Is it safe to share memory between processes without any locks if each process only writes to its own region?

What is splice() and when should I use it for pipes?

How do I monitor shared memory usage in production?

Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Written from production experience, not tutorials.

✓ Verified

production tested

July 18, 2026

last updated

