Python Threading vs Multiprocessing: Race Condition Gotcha
Duplicate entries and HTTP 429 errors from concurrent list pops.
- Threads share memory, but CPython's GIL lets only one thread execute Python bytecode at a time.
- Multiprocessing runs separate processes, each with its own GIL, enabling full CPU parallelism.
- Threads excel at I/O-bound tasks; processes excel at CPU-bound tasks.
- Sharing data between processes requires serialization (pickle) which adds overhead.
- Deadlocks and race conditions occur in both; consistent lock ordering prevents deadlocks, and locks or message passing prevent races.
- Profile before optimizing: a wrong choice can slow your app by 10x.
Every Python developer eventually hits the wall: their code is slow, the CPU is barely breaking a sweat, and adding a loop makes it worse. At that moment, concurrency stops being a theory and becomes urgent. Threading and multiprocessing are Python's two primary answers to that problem, and choosing the wrong one doesn't just cost performance — it can introduce bugs that only appear in production at 3 AM under heavy load.
The core problem both tools solve is the same: doing more than one thing at a time. But the reason your choice matters so much is the Global Interpreter Lock — the GIL. CPython, the standard Python interpreter, uses a mutex that allows only one thread to execute Python bytecode at any given moment. This single design decision splits Python's concurrency world in two: threads that share memory but battle the GIL, and processes that sidestep the GIL entirely by running in separate interpreter instances at the cost of higher overhead and no shared memory by default.
By the end of this article you'll understand exactly when threads win, when processes win, how to safely share data between both, how to avoid the race conditions and deadlocks that bite even experienced engineers, and how to profile your choice to confirm it actually helps. We'll go deep into the CPython internals that explain the behaviour, not just the surface-level API.
What is threading and multiprocessing in Python?
Threading and multiprocessing are Python's standard library modules for running code concurrently. threading creates multiple threads within the same process — they share memory but are constrained by the Global Interpreter Lock (GIL). multiprocessing spawns separate processes, each with its own Python interpreter and memory space, bypassing the GIL entirely.
You don't need to choose one forever. Many production systems use both: threads for I/O, processes for CPU-heavy work. The key is understanding the cost of each. Thread creation is cheap (~50 µs), but context switching between threads under GIL contention can waste cycles. Process creation is expensive (~10 ms), but once running, they can max out every CPU core without stepping on each other.
Let's start with a concrete example. The code below shows a simple CPU-intensive task running single-threaded, then with threads, then with processes. The output will surprise you if you expect threads to speed it up.
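One way to run that comparison is sketched here. The names cpu_task, timed and run_with, the worker count of 4, and the workload size are illustrative assumptions. Timings depend on the machine, so no figures are quoted.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n):
    """Pure-Python arithmetic: sum of squares below n (no I/O involved)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(label, run):
    """Call a zero-argument callable, print elapsed wall-clock time, return its result."""
    start = time.perf_counter()
    result = run()
    print(f"{label:<10} {time.perf_counter() - start:.2f}s")
    return result


def run_with(executor_cls, jobs):
    """Map cpu_task over jobs using the given executor class (4 workers is arbitrary)."""
    with executor_cls(max_workers=4) as pool:
        return list(pool.map(cpu_task, jobs))


if __name__ == "__main__":
    jobs = [2_000_000] * 4  # hypothetical workload size
    sequential = timed("sequential", lambda: [cpu_task(n) for n in jobs])
    threaded = timed("threads", lambda: run_with(ThreadPoolExecutor, jobs))
    parallel = timed("processes", lambda: run_with(ProcessPoolExecutor, jobs))
    assert sequential == threaded == parallel  # same answers, different timings
```

The `if __name__ == "__main__":` guard matters for the process pool: on platforms that start workers by re-importing the main module, unguarded code would run again in every child.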
The Global Interpreter Lock (GIL) — Why Threads Are Not Parallel
CPython's GIL ensures only one thread executes Python bytecode at any instant. This isn't a bug — it simplified CPython's memory management and made C extension modules easier to write. But it also means that CPU-bound Python threads do not run in parallel on multiple cores; they take turns. For I/O-bound tasks, threads are still useful because they release the GIL while waiting for I/O, allowing other threads to run.
Let's visualise: imagine a lock that each thread must hold before it can do any Python work. When a thread makes a blocking I/O call, such as reading from a socket, it releases the lock and another thread can grab it. This is why threading works for web scraping, database queries, and file downloads. But if you're doing pure math in a loop, no I/O happens, so the lock changes hands only when the interpreter forces a periodic switch, and still only one thread runs at a time. You get no parallelism, and the result is often slower than single-threaded code because of the switching overhead.
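The I/O case can be observed with a small sketch in which time.sleep stands in for a blocking network call (it releases the GIL while waiting). The helper names fake_io and run_concurrently are assumptions made for illustration.

```python
import threading
import time


def fake_io(delay):
    """Stand-in for a blocking network call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)


def run_concurrently(count, delay):
    """Start `count` threads that each wait `delay` seconds; return total elapsed seconds."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_concurrently(count=5, delay=0.2)
print(f"5 waits of 0.2s finished in {elapsed:.2f}s")
```

Run one after another, the five waits would take about one second; because the waits overlap, the total stays close to a single wait.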
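Since Python 3.2, a CPU-bound thread holding the GIL is asked to release it after a fixed time slice, 5 ms by default, rather than after a fixed number of bytecode instructions (older versions checked every 100 instructions); blocking I/O releases it immediately. The interval is adjustable via sys.setswitchinterval(), but don't change it unless profiling shows a need. Even with shorter intervals, CPU-bound threads still fight for the lock.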
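To inspect the interval on your own interpreter, a minimal sketch follows. In CPython 3.2 and later the interval is time-based, and the setting is process-wide, which is why the example restores the previous value.

```python
import sys

# Read the current thread switch interval, in seconds.
interval = sys.getswitchinterval()
print(f"switch interval: {interval * 1000:.1f} ms")

# Temporarily shorten it for an experiment, then restore the original value.
sys.setswitchinterval(0.001)
shortened = sys.getswitchinterval()
sys.setswitchinterval(interval)
```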
Multiprocessing — True Parallelism but Higher Overhead
The multiprocessing module spawns separate Python processes, each with its own interpreter, memory space, and — crucially — its own GIL. This means you can actually use all CPU cores for parallel computation. But this freedom comes at a cost: creating a process is expensive (forking or spawning takes tens of milliseconds), and sharing data between processes requires serialization (pickling) which adds overhead and limits what can be shared.
Common patterns: Pool for map-reduce style parallelism, Process for long-running workers, and Queue or Pipe for inter-process communication. Shared memory via multiprocessing.Value or Array can avoid serialization but only works for primitive types. Manager objects allow sharing Python objects across processes but are slower due to a server process mediating access.
When you call Pool.map(), the data is split into chunks, each chunk is pickled, sent over a pipe to a worker process, unpickled, computed, re-pickled, and sent back. This overhead can dominate if each task is tiny. Use the chunksize parameter to batch multiple tasks per call, reducing IPC overhead.
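The effect of chunksize can be measured with a sketch like the following. The task (square), the input size, and the two chunk sizes are arbitrary assumptions; passing chunksize=1 forces one IPC round trip per item, which is the worst case for tiny tasks.

```python
import time
from multiprocessing import Pool


def square(x):
    """Deliberately tiny task, so IPC cost outweighs the computation."""
    return x * x


def timed_map(pool, data, chunksize):
    """Return (results, seconds) for pool.map with the given chunksize."""
    start = time.perf_counter()
    results = pool.map(square, data, chunksize=chunksize)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    data = range(200_000)  # hypothetical input size
    with Pool(processes=4) as pool:
        for chunksize in (1, 1_000):
            results, seconds = timed_map(pool, data, chunksize)
            print(f"chunksize={chunksize:<5} {seconds:.2f}s")
```

When chunksize is omitted, Pool.map chooses a value itself from the input length and pool size, so an explicit value is mainly worth setting after measuring.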
Sharing State Between Threads and Processes
Threads share everything: same address space, same Python objects. That's convenient but dangerous. Without proper synchronization, two threads can read and write the same variable in unpredictable ways — a race condition. Python's threading.Lock is the basic tool to protect critical sections. Use with lock: blocks around all access to shared mutable state.
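A minimal sketch of that pattern, assuming a hypothetical SafeCounter class that guards every read-modify-write with one lock:

```python
import threading


class SafeCounter:
    """Counter whose increments are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `value += 1` is a read, an add and a write; the lock keeps them together.
        with self._lock:
            self.value += 1


def add_many(counter, times):
    for _ in range(times):
        counter.increment()


counter = SafeCounter()
threads = [threading.Thread(target=add_many, args=(counter, 50_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 200000
```

Removing the `with self._lock:` line turns this into the classic lost-update race: the final count may then come out lower than 200000, depending on interpreter version and timing.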
Processes do not share memory by default. To share data, you must use explicit IPC mechanisms:
- multiprocessing.Queue: thread- and process-safe FIFO, great for producer-consumer.
- multiprocessing.Pipe: faster but only for two endpoints.
- multiprocessing.Value / multiprocessing.Array: raw shared memory for C types (ctypes). Requires locking on writes.
- multiprocessing.Manager: creates a server process that proxies Python objects, easier but much slower.
Each approach has trade-offs between speed and flexibility. Default to Queue unless you have a strong reason not to. Manager objects are convenient but add 2-5x latency per access because every attribute access crosses a pipe.
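As a sketch of the Queue-based default, here is a single consumer process fed through one queue and answering through another. The worker function and the None sentinel are conventions chosen for this example, not part of the multiprocessing API.

```python
import multiprocessing as mp

SENTINEL = None  # marks the end of the work stream (a convention for this sketch)


def worker(tasks, results):
    """Consume numbers from `tasks` until the sentinel arrives; emit their squares."""
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put(item * item)


if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    consumer = mp.Process(target=worker, args=(tasks, results))
    consumer.start()
    for n in range(5):
        tasks.put(n)
    tasks.put(SENTINEL)
    squares = [results.get() for _ in range(5)]  # drain results before joining
    consumer.join()
    print(squares)  # [0, 1, 4, 9, 16]
```

Draining the result queue before join() is deliberate: a child that still has queued data to flush may not exit until that data has been read.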
For thread safety beyond locks, consider threading.local() data or immutable data structures. Avoid relying on the GIL to protect shared state: it doesn't protect against context switches between bytecode instructions.
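A short sketch of per-thread storage follows. The names context, handle and request_id are hypothetical; the barrier only exists to make every thread write before any thread reads, so that a shared variable would visibly be overwritten.

```python
import threading

context = threading.local()      # attributes set here are private to each thread
barrier = threading.Barrier(3)   # all threads write before any thread reads
seen = {}
seen_lock = threading.Lock()


def handle(request_id):
    """Store a per-thread value, wait for the other threads, then read it back."""
    context.request_id = request_id
    barrier.wait()
    with seen_lock:
        seen[threading.current_thread().name] = context.request_id


threads = [
    threading.Thread(target=handle, args=(i,), name=f"worker-{i}") for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
```

Each thread reads back exactly the value it wrote, even though all three assigned to the same attribute name on the same object.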
Choosing Between Threading, Multiprocessing, and asyncio
Python offers three main concurrency tools: threading, multiprocessing, and asyncio. The right choice depends on the nature of your workload.
- Threading: best for I/O-bound tasks where many operations spend their time waiting and those waits can overlap (e.g., web scraping, database queries). Threads are lightweight and share memory, making coordination simple if done correctly.
- Multiprocessing: best for CPU-bound tasks where you need to leverage multiple cores. Each process runs independently, so you avoid the GIL. Overhead is higher, and inter-process communication is slower.
- asyncio: best for I/O-bound tasks with a single thread, using cooperative multitasking via an event loop. It avoids thread-switching overhead and most races on shared state, because tasks switch only at await points, but you must use async-friendly libraries and cannot block the event loop.
In practice, many senior developers mix these: use asyncio for network I/O, and farm out CPU-heavy work to a multiprocessing pool (using loop.run_in_executor). This gives you the scalability of async I/O with the parallelism of processes.
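The mixed pattern can be sketched as follows. The functions cpu_task and fetch are stand-ins invented for this example, with asyncio.sleep playing the role of network I/O.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_task(n):
    """CPU-heavy stand-in: sum of squares below n."""
    return sum(i * i for i in range(n))


async def fetch(delay):
    """I/O stand-in: asyncio.sleep yields to the event loop like a network wait."""
    await asyncio.sleep(delay)
    return f"fetched after {delay}s"


async def main(executor, n):
    """Run one CPU job in the executor while an I/O coroutine proceeds on the loop."""
    loop = asyncio.get_running_loop()
    cpu_future = loop.run_in_executor(executor, cpu_task, n)
    return await asyncio.gather(fetch(0.1), cpu_future)


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(asyncio.run(main(pool, 10)))  # ['fetched after 0.1s', 285]
```

Passing None instead of a pool makes run_in_executor fall back to the loop's default thread pool, which is fine for blocking I/O but does not give CPU parallelism.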
One more nuance: if your I/O-bound task involves many concurrent connections (thousands), asyncio scales better than threading because every thread carries its own stack (commonly an 8 MB reservation of virtual memory on Linux) plus OS scheduling cost, whereas an asyncio task costs on the order of a few kilobytes. For 10,000 connections, asyncio is the clear winner.
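To get a feel for that scale, the sketch below runs 10,000 concurrent waits on one thread. The connection coroutine is a placeholder for real socket handling, not an actual network client.

```python
import asyncio


async def connection(i):
    """Placeholder for one client connection: wait briefly, then return its id."""
    await asyncio.sleep(0.1)
    return i


async def serve_all(count):
    """Run `count` connections concurrently; gather returns results in input order."""
    return await asyncio.gather(*(connection(i) for i in range(count)))


results = asyncio.run(serve_all(10_000))
print(len(results))  # 10000
```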
Performance Profiling and Debugging Concurrency Issues
Never assume your concurrency choice makes things faster. Always profile before and after. Python's built-in cProfile works with multithreaded programs, but a profiler started in the main thread may not capture what worker threads do, so profile inside the worker when in doubt. For multiprocessing, profile each child process separately. For a lightweight view of threaded code, log threading.current_thread().name alongside timings to see which thread did what.
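One hedged way to profile a worker thread is to enable cProfile inside the thread's own function. The names busy, profiled_worker and reports are assumptions for this sketch, and only one profiled thread runs at a time to keep the example simple.

```python
import cProfile
import io
import pstats
import threading


def busy(n):
    """Some CPU work worth measuring."""
    return sum(i * i for i in range(n))


def profiled_worker(n, reports):
    """Enable the profiler inside the worker so that its calls are recorded."""
    profiler = cProfile.Profile()
    profiler.enable()
    busy(n)
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    reports[threading.current_thread().name] = buffer.getvalue()


reports = {}
worker = threading.Thread(target=profiled_worker, args=(100_000, reports), name="worker-1")
worker.start()
worker.join()
print(reports["worker-1"])
```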
Common bottlenecks to look for:
- Too many threads/processes: context switching or memory exhaustion.
- Chunksizes too small in Pool.map: IPC overhead dominates.
- Locks held too long: reduce scope of critical sections.
- Pickling overhead for large data: consider shared memory or array-based solutions.
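Debugging deadlocks or hangs: run with python -u to disable output buffering so that log lines appear immediately. Call faulthandler.enable() in your code to dump thread tracebacks on a crash. On POSIX systems, faulthandler.register(signal.SIGQUIT, all_threads=True) makes SIGQUIT (Ctrl+\) print a traceback of every thread; without that registration, SIGQUIT simply terminates the process. For processes, use ps or strace to see where they're blocked.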
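A sketch of the faulthandler approach follows. The helper names install_handlers and dump_all_threads, and the file name threads.txt, are assumptions for illustration; signal registration is only attempted where the platform supports it.

```python
import faulthandler
import os
import signal
import tempfile
import threading


def install_handlers(log_file):
    """Dump tracebacks to log_file on fatal errors and, where available, on SIGQUIT."""
    faulthandler.enable(file=log_file, all_threads=True)
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGQUIT"):
        faulthandler.register(signal.SIGQUIT, file=log_file, all_threads=True)


def dump_all_threads(path):
    """Write the current stack of every thread to `path` and return the text."""
    with open(path, "w") as out:
        faulthandler.dump_traceback(file=out, all_threads=True)
    with open(path) as saved:
        return saved.read()


# Demonstration: one thread blocks on an event while the main thread takes a dump.
stop = threading.Event()
waiter = threading.Thread(target=stop.wait, name="blocked-worker")
waiter.start()
with tempfile.TemporaryDirectory() as tmp:
    report = dump_all_threads(os.path.join(tmp, "threads.txt"))
stop.set()
waiter.join()
print(report)
```

In a real service, install_handlers would typically be called once at startup with a file that stays open for the life of the process, such as sys.stderr.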
One production technique: wrap each thread's main loop in a try/except that logs the thread name and exception. This helps identify which thread is failing without a full core dump.
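That technique might look like the sketch below. The guarded wrapper, the failures list and the logger name are hypothetical names chosen for this example.

```python
import logging
import threading

log = logging.getLogger("workers")
failures = []  # (thread name, exception repr) pairs, for later inspection


def guarded(target):
    """Wrap a thread target so any exception is logged with the thread's name."""
    def runner(*args, **kwargs):
        try:
            target(*args, **kwargs)
        except Exception as exc:
            name = threading.current_thread().name
            failures.append((name, repr(exc)))
            log.exception("thread %s crashed", name)
    return runner


def flaky():
    raise ValueError("boom")


t = threading.Thread(target=guarded(flaky), name="worker-7")
t.start()
t.join()
print(failures)  # [('worker-7', "ValueError('boom')")]
```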
| Concept | Use Case | Example |
|---|---|---|
| Threading | I/O-bound tasks (web scraping, DB queries, file reads) | Multiple HTTP requests using concurrent.futures.ThreadPoolExecutor |
| Multiprocessing | CPU-bound tasks (image processing, data crunching, simulations) | multiprocessing.Pool.map with square function on large array |
| asyncio | High-concurrency I/O (thousands of connections, real-time services) | aiohttp to fetch 10,000 URLs concurrently with single thread |
| Mixed (async + processes) | I/O-heavy app with occasional CPU work | asyncio event loop with loop.run_in_executor(pool, cpu_task) |
Key Takeaways
- Threads: shared memory, limited by GIL, ideal for I/O-bound tasks.
- Processes: bypass GIL, real parallelism, higher IPC overhead.
- asyncio: cooperative multitasking for high-concurrency I/O, single thread.
- Always profile before optimizing concurrency choices.
- Deadlocks are preventable: lock ordering, timeouts, and minimal shared state.
- Shared state is the root of most concurrency bugs — prefer message passing.
Common Mistakes to Avoid
- Memorising syntax before understanding the concept
Symptom: Cannot apply concurrency to new problems; copy-pasting code without knowing why it works.
Fix: Build mental models (e.g., the GIL as a bathroom key). Write small experiments: create two threads that update a shared counter without locks and observe the corruption.
- Skipping practice and only reading theory
Symptom: Confident in interviews but unable to debug a real deadlock in production.
Fix: Set up a local project with ThreadPoolExecutor and multiprocessing.Pool. Introduce bugs intentionally and debug them.
- Using threads for CPU-bound tasks expecting parallel speedup
Symptom: No performance gain, often slower due to GIL contention and context switching.
Fix: Use multiprocessing, or asyncio with a process pool executor, instead. Profile first.
- Not protecting shared mutable state with locks in threads
Symptom: Intermittent data corruption, unexplained crashes, incorrect results.
Fix: Always use threading.Lock when multiple threads read and write the same objects; prefer immutable data or thread-safe queues.
- Creating too many processes (e.g., 1000 processes on a 4-core machine)
Symptom: System becomes unresponsive, memory exhausted, high swap usage.
Fix: Cap the number of processes to os.cpu_count() using multiprocessing.Pool(processes=os.cpu_count()).
- Forgetting to call `join()` on multiprocessing.Process
Symptom: Zombie processes accumulate, eventually hitting the OS limit.
Fix: Always join processes (or use the Pool context manager); set daemon=True only for short-lived tasks.
- Assuming `multiprocessing.Queue` is as fast as `queue.Queue`
Symptom: Unexpected latency in inter-process data passing.
Fix: Profile IPC overhead. Use multiprocessing.Pipe for simple two-way communication, or shared memory for primitive types.
Interview Questions on This Topic
- Q: Explain how the GIL affects Python threading. When would you still choose threads over processes? (Junior)
- Q: What is the difference between a Lock and an RLock in threading? Give a scenario where RLock is necessary. (Mid-level)
- Q: How does multiprocessing.Queue differ from queue.Queue? What happens when you send an unpicklable object? (Mid-level)
- Q: Design a system that handles 10,000 concurrent socket connections in Python. Compare threading vs asyncio vs multiprocessing. Which would you choose? (Senior)
- Q: You have a mixed workload: 80% I/O (HTTP requests) and 20% CPU (parsing JSON). How do you design the concurrency? (Senior)
- Q: Explain how the GIL is released during I/O operations. What happens when a C extension releases the GIL? (Senior)
Frequently Asked Questions
What is the GIL and how does it affect Python concurrency?
The Global Interpreter Lock is a mutex that protects CPython's memory management. It ensures only one thread executes Python bytecode at a time. This makes threads useful for I/O-bound tasks but not for CPU-bound parallelism. Multiprocessing sidesteps the GIL by running separate interpreter processes.
Does multiprocessing always make Python code faster?
No. If the overhead of spawning processes and serializing data outweighs the computational gain, it can be slower. Always profile with realistic data. For very small tasks, a single process may be faster.
How can I avoid deadlocks in Python threading?
Acquire locks in a consistent order, use timeouts (Lock.acquire(timeout=5)), and prefer RLock if reentrancy is needed. Also, keep critical sections as short as possible to reduce contention.
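The first two rules can be combined in a small helper, sketched here under the assumption that sorting by id() is an acceptable global lock order. The names acquire_in_order, release_all and transfer are hypothetical.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()


def acquire_in_order(*locks, timeout=5):
    """Acquire locks in one global order (by id); return them, or None on timeout."""
    held = []
    for lock in sorted(locks, key=id):
        if lock.acquire(timeout=timeout):
            held.append(lock)
        else:
            release_all(held)  # back out instead of waiting forever
            return None
    return held


def release_all(held):
    for lock in reversed(held):
        lock.release()


def transfer(first, second, log, label):
    """Needs both locks; callers may name them in any order without deadlocking."""
    held = acquire_in_order(first, second)
    if held is None:
        log.append(f"{label}: timed out")
        return
    try:
        log.append(f"{label}: done")
    finally:
        release_all(held)


log = []
t1 = threading.Thread(target=transfer, args=(lock_a, lock_b, log, "t1"))
t2 = threading.Thread(target=transfer, args=(lock_b, lock_a, log, "t2"))
t1.start()
t2.start()
t1.join()
t2.join()
print(sorted(log))  # ['t1: done', 't2: done']
```

The two threads ask for the locks in opposite orders, which is the classic deadlock setup; the helper removes the hazard by always taking them in the same order internally.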
Why can't I pickle a lambda to send to a multiprocessing process?
Lambda functions are defined inline and do not have a globally accessible qualified name that pickle can find. To fix, define the function using def at the module level. Alternatively, use the dill library which can serialize lambdas.
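The difference is easy to check with the standard pickle module. The can_pickle helper is a hypothetical name for this sketch, and the result for square assumes the code runs as a normal script or module so that the function can be found by name.

```python
import pickle


def square(x):
    """Module-level function: pickled by reference to its qualified name."""
    return x * x


def can_pickle(obj):
    """Return True if pickle can serialize obj, False if it refuses."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


print(can_pickle(lambda x: x * x))  # False
print(can_pickle(square))
```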
What is the difference between `Pool.map` and `Pool.imap`?
Pool.map blocks until all results are ready and returns them in order as a list. Pool.imap returns an iterator that yields results lazily, still in input order, so you can start processing early results before all tasks finish and avoid holding the full result list in memory. If order does not matter, Pool.imap_unordered yields each result as soon as it completes.
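The three methods can be compared in a small sketch. It uses multiprocessing.pool.ThreadPool as a stand-in, since it exposes the same map, imap and imap_unordered methods as Pool and keeps the example runnable without a `__main__` guard; slow_square and its sleep times are assumptions.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(x):
    """Smaller inputs sleep longer, so tasks finish in reverse submission order."""
    time.sleep(0.05 * (3 - x))
    return x * x


with ThreadPool(3) as pool:
    as_list = pool.map(slow_square, range(3))                 # blocks, returns a list
    in_order = list(pool.imap(slow_square, range(3)))         # lazy, input order
    by_completion = list(pool.imap_unordered(slow_square, range(3)))  # completion order

print(as_list)        # [0, 1, 4]
print(in_order)       # [0, 1, 4]
print(by_completion)  # same values; order depends on which task finishes first
```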