Python Memory Management — Event Bus That Ate 8GB

📍 Part of: Advanced Python → Topic 4 of 17
RSS grew ~200MB/day from never-removed event bus handlers.
🔥 Advanced — solid Python foundation required
In this tutorial, you'll learn
  • CPython's small-object allocator is three-tiered: arenas (256 KB; 1 MiB on 64-bit builds since 3.10), pools (4 KB; 16 KiB on those builds), and size classes up to 512 bytes.
  • Reference counting frees most objects instantly; the cyclic GC handles reference cycles with a generational cycle detector.
  • Weak references and weakref.WeakValueDictionary/WeakSet break cycles without manual cleanup.
Quick Answer
  • Python uses reference counting for immediate deterministic cleanup
  • Cyclic garbage collector supplements refcounting for cycles
  • Objects over 512 bytes bypass pymalloc and use raw malloc
  • Generational GC (3 tiers) optimises for young objects
  • Weak references break cycles without manual cleanup
  • tracemalloc and gc module are first-line debugging tools
🚨 START HERE

Python Memory Debug Cheat Sheet

Quick commands for diagnosing memory issues in production
🟡

Suspected memory leak

Immediate Action: Check the RSS growth pattern and enable tracemalloc on demand
Commands
import tracemalloc; tracemalloc.start(); baseline_snapshot = tracemalloc.take_snapshot()
leak_snapshot = tracemalloc.take_snapshot(); diff = leak_snapshot.compare_to(baseline_snapshot, 'lineno') # Run after the leak window
Fix Now: Identify the top allocator and apply weakref or explicit cleanup
🟡

GC not collecting cycles quickly enough

Immediate Action: Inspect GC generation counts and thresholds
Commands
import gc; print(gc.get_threshold()); print(gc.get_count())
gc.collect(2) # Force full collection
Fix Now: Lower the gen0 threshold, or disable GC only if you're sure no cycles exist
🟡

Object count growing unbounded

Immediate Action: Use gc.get_objects() to list all tracked objects
Commands
import gc; from collections import Counter; cnt = Counter(type(o).__name__ for o in gc.get_objects())
cnt.most_common(20)
Fix Now: Check the top types. If it's 'function' or 'dict', look for listener leaks
🟡

High GC pause time

Immediate Action: Check per-generation collection counts with gc.get_stats(); it does not report timings, so time collections with a gc.callbacks hook
Commands
gc.get_stats()
gc.set_threshold(500, 5, 5) # Tune for your app
Fix Now: Lower the gen0 threshold so young objects are collected more often and fewer survive into gen2
Production Incident

The Event Bus That Ate 8GB — A Python Memory Leak Diagnosis

A long-running web service's memory grew steadily over weeks until it OOM-killed the container. The culprit: a naive event bus that never deregistered callback handlers.
Symptom: RSS memory climbed ~200MB per day. No single request leaked much, but cumulative handler registrations consumed all available memory. Restarting fixed it temporarily.
Assumption: Engineers assumed Python's GC would clean up request-scoped objects automatically. They didn't realise that closures registered as callbacks kept references to those objects, and that the registry kept the closures reachable.
Root cause: The event bus stored handler functions in a plain dictionary. Each request registered a new closure that captured the request object. The closures also formed cycles through their enclosing scope, but the real issue was that the dictionary never removed handlers, so they were always reachable and no collector could free them.
Fix: Replace the plain dict with a weakref.WeakSet for the listener registry. Handlers can then be garbage collected once no other strong references remain, even without explicit deregistration.
Key Lesson
Never store long-lived references to short-lived objects without a cleanup path.
WeakValueDictionary and WeakSet are your first line of defense against listener leaks.
Always monitor object counts for container types (lists, dicts, function objects) in long-running services.
Production Debug Guide

Symptom → Action checklist for production memory issues

Memory grows over hours or days, never stabilises → Take two tracemalloc snapshots with a time interval. Diff them to find the top allocation sources. Look for closures, lists, or dicts accumulating.
Memory stays high after deleting large objects → CPython's free lists and partially used arenas keep memory mapped. Check gc.get_objects() for surviving container objects. If counts are stable, it's allocator retention, not a leak.
GC collections take too long (affects latency) → Time collections with a gc.callbacks hook and check gc.get_stats() for per-generation counts. Tune thresholds with gc.set_threshold() so young generations are collected more often and fewer objects reach gen2.
__del__ methods not firing → Check for reference cycles involving objects with __del__. Since Python 3.4 (PEP 442) such cycles are collected, but only when the cyclic GC runs, and the finalizer order within a cycle is undefined. Use weakref to break cycles.
Unexpected object identity (a is b) failing → Verify whether values are within CPython's caching range. Small ints (-5 to 256) are cached; larger ints are not. Use == for value comparison.

Python feels effortless compared to C or C++. You never call malloc, you never worry about dangling pointers, and memory just... works. But that magic has a cost, and if you don't understand what's happening under the hood, you'll hit memory leaks in long-running services, inexplicable slowdowns in data pipelines, and bugs that only reproduce under load — the worst kind. Every production Python engineer has a horror story here.

The problem memory management solves is deceptively simple: who owns this chunk of memory, and when is it safe to give it back? Python answers that question with a two-layer system — reference counting as the fast first pass, and a cyclic garbage collector as the slower safety net for the cases reference counting can't handle. Understanding both layers — and how they interact — is what separates engineers who debug memory issues in minutes from those who spend days guessing.

By the end of this article you'll be able to explain CPython's memory allocator hierarchy, predict when the garbage collector fires and how to tune it, use weak references to break memory-leaking cycles, read tracemalloc snapshots to pinpoint leaks in production, and avoid the five most common memory traps that catch even experienced Python developers off guard.

CPython's Memory Architecture: From OS Blocks to Python Objects

CPython doesn't talk directly to the OS for every tiny allocation. That would be catastrophically slow: a system call for every integer? No. Instead it builds a three-tier pyramid.

At the base, CPython requests large raw memory blocks from the OS (via mmap where available, otherwise malloc). Its arena allocator manages these as arenas: 256 KB historically, 1 MiB on 64-bit builds since CPython 3.10. Each arena is divided into pools (4 KB historically, 16 KiB on those newer builds), and each pool serves objects of a single size class. Size classes step in multiples of 8 bytes (16 on recent 64-bit builds) up to 512 bytes. This is the pymalloc subsystem, and it exists specifically to avoid the overhead of the general-purpose allocator for small, short-lived objects.

Objects larger than 512 bytes skip pymalloc entirely and go straight to malloc. This means a 600-byte bytes object and a 100-byte dict have completely different allocation paths — a fact that matters when you're profiling.

Pools maintain a free list internally. When an object is freed, its slot goes back onto the pool's free list rather than returning memory to the OS immediately. This is why Python processes sometimes look like they're holding onto memory even after you've deleted everything — the memory is logically free but still mapped to the process. Arenas are only released back to the OS when every pool inside them is completely empty, which is harder to achieve than it sounds.
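
To make the two allocation paths concrete, here is a rough sketch of how a request size maps onto a size class. `allocation_path` is a hypothetical helper written for this article, not a CPython API; the 512-byte threshold comes from the text above, and the default alignment of 8 is an assumption you should change to 16 for recent 64-bit builds.

```python
SMALL_REQUEST_THRESHOLD = 512  # bytes; larger requests bypass pymalloc


def allocation_path(nbytes, alignment=8):
    """Return (allocator, block_size) for a request of nbytes.

    Hypothetical helper for illustration only; CPython does this in C.
    """
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    if nbytes > SMALL_REQUEST_THRESHOLD:
        return ("malloc", nbytes)
    # Round up to the next multiple of the alignment to find the size class.
    block_size = (nbytes + alignment - 1) // alignment * alignment
    return ("pymalloc", block_size)


for request in (1, 28, 100, 512, 513, 600):
    print(request, allocation_path(request))
```

The rounding is why two objects of slightly different sizes can share a pool: both land in the same size class, and the slack is internal fragmentation.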

memory_architecture_demo.py · PYTHON
import sys
import tracemalloc

# Start tracing memory allocations
tracemalloc.start()

# --- Demonstrate size classes and sys.getsizeof ---

# Small integers are cached by CPython (-5 to 256)
small_int = 42
large_int = 1000

print(f"Size of integer 42:    {sys.getsizeof(small_int)} bytes")
print(f"Size of integer 1000:  {sys.getsizeof(large_int)} bytes")
print(f"Size of empty list:    {sys.getsizeof([])} bytes")
print(f"Size of empty dict:    {sys.getsizeof({})} bytes")
print(f"Size of empty str:     {sys.getsizeof('')} bytes")
print()

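# --- Show that small ints are the SAME object in memory ---
# CPython caches integers from -5 to 256 to avoid repeated allocation.
# Values are built at runtime with int() because identical literals in one
# file are merged by the compiler and would be the same object regardless.
a = int("256")
b = int("256")
print(f"a = 256, b = 256 -> same object? {a is b}")  # True: cached

c = int("257")
d = int("257")
print(f"c = 257, d = 257 -> same object? {c is d}")  # False: not cached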
print()

# --- Demonstrate pymalloc vs raw malloc boundary ---
# Objects <= 512 bytes use pymalloc pools; larger use malloc directly
small_bytes = bytes(100)   # 100 bytes -> pymalloc
large_bytes = bytes(600)   # 600 bytes -> malloc directly

print(f"Size of 100-byte object: {sys.getsizeof(small_bytes)} bytes (pymalloc pool)")
print(f"Size of 600-byte object: {sys.getsizeof(large_bytes)} bytes (raw malloc)")
print()

# --- Snapshot: see what tracemalloc recorded ---
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("Top 3 memory allocations in this script:")
for stat in top_stats[:3]:
    print(f"  {stat}")

tracemalloc.stop()
▶ Output
Size of integer 42: 28 bytes
Size of integer 1000: 28 bytes
Size of empty list: 56 bytes
Size of empty dict: 64 bytes
Size of empty str: 49 bytes

a = 256, b = 256 -> same object? True
c = 257, d = 257 -> same object? False

Size of 100-byte object: 133 bytes (pymalloc pool)
Size of 600-byte object: 633 bytes (raw malloc)

Top 3 memory allocations in this script:
memory_architecture_demo.py:8: size=1024 B, count=4, average=256 B
memory_architecture_demo.py:29: size=633 B, count=1, average=633 B
memory_architecture_demo.py:28: size=133 B, count=1, average=133 B
⚠ Watch Out: sys.getsizeof Is Shallow
sys.getsizeof only reports the memory of the object itself, not the objects it references. A list of 1000 large strings will report ~8056 bytes (the list shell) — not the gigabytes those strings actually consume. Use tracemalloc or the third-party 'pympler' library for deep size measurements in production diagnostics.
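
As a rough illustration of the gap between shallow and deep size, the sketch below walks the built-in containers recursively. `deep_getsizeof` is a hypothetical helper, not a standard-library function, and it deliberately ignores instance attributes and other object types, so treat its result as a lower bound.

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Approximate total size of obj plus the built-in containers it holds."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # Count shared or cyclic objects only once
        return 0
    seen.add(id(obj))
    total = sys.getsizeof(obj)
    if isinstance(obj, dict):
        total += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                     for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        total += sum(deep_getsizeof(item, seen) for item in obj)
    return total


# Built at runtime so each string is a distinct object
payload = [str(i) * 1000 for i in range(100)]
print("shallow:", sys.getsizeof(payload))
print("deep:   ", deep_getsizeof(payload))
```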
📊 Production Insight
Arena deallocation depends on completely empty pools.
In practice, a single surviving object in a pool prevents the entire arena from being released back to the kernel.
Rule: If your RSS is stubbornly high after deleting many objects, check if a few long-lived ones are fragmenting the arena.
🎯 Key Takeaway
pymalloc handles objects ≤512 bytes via size-class pools and arenas.
Large objects bypass directly to malloc.
Memory appears retained because arenas are rarely freed back to the OS.
Understand that peak RSS doesn't mean peak live memory.

Reference Counting and the Cyclic Garbage Collector — How Objects Actually Die

Every Python object carries an ob_refcnt field — a simple integer baked right into the PyObject C struct. Every time you bind a name, append to a list, or pass something to a function, that counter goes up. When the binding is destroyed — scope exits, del is called, the container is cleared — it goes down. Hit zero, and CPython calls the object's destructor and frees the memory immediately. No pause, no waiting. That's reference counting's superpower: instant, deterministic cleanup.

But reference counting has one fatal blind spot: cycles. If object A holds a reference to object B, and object B holds a reference back to A, both counters stay at 1 even when nothing else in the program can reach either of them. They're orphaned but immortal under pure reference counting.

This is where CPython's generational cyclic garbage collector steps in. It supplements — never replaces — reference counting. The GC tracks container objects (lists, dicts, sets, user-defined classes) that could potentially form cycles. It ignores scalars like ints and strings, which can never form cycles on their own.

The GC runs in three generations. New objects start in generation 0. If they survive a GC pass, they're promoted to generation 1, then generation 2. The idea: most objects die young (your loop variable, your temp dict), so collecting generation 0 frequently is cheap and catches most garbage. Collecting generation 2 is rare and expensive, but that's fine because long-lived objects are unlikely to be cyclic garbage.

reference_counting_and_gc.py · PYTHON
import gc
import sys

# ── PART 1: Observe reference counts directly ──────────────────────────────

class TrackedNode:
    """A simple node we'll use to build a reference cycle."""
    def __init__(self, label):
        self.label = label
        self.partner = None  # Will point to another TrackedNode

    def __del__(self):
        # This fires when the object is actually destroyed
        print(f"  [destructor] TrackedNode '{self.label}' was freed")

# Create a single node and watch the refcount
node_alpha = TrackedNode("alpha")
# getrefcount always reports +1 because the function argument itself is a reference
print(f"Refcount of node_alpha (just created): {sys.getrefcount(node_alpha) - 1}")

alias = node_alpha  # Second binding — refcount goes to 2
print(f"Refcount after creating alias:         {sys.getrefcount(node_alpha) - 1}")

del alias           # Remove one binding — refcount drops to 1
print(f"Refcount after deleting alias:         {sys.getrefcount(node_alpha) - 1}")
print()

# ── PART 2: Create an unreachable cycle and prove GC finds it ──────────────

# Disable automatic GC so we can control exactly when it runs
gc.disable()

node_one = TrackedNode("one")
node_two = TrackedNode("two")

# Wire them into a cycle: one -> two -> one
node_one.partner = node_two
node_two.partner = node_one

# Now remove the only external references to both nodes
# Reference counting CANNOT free these — each has refcount 1 from the other
print("Deleting external references to node_one and node_two...")
del node_one
del node_two
print("(No destructor fired yet — cycle keeps both alive)")
print()

# Manually check what the GC considers unreachable
unreachable_count = gc.collect()  # Collect all generations
print(f"GC collected {unreachable_count} unreachable objects")
print()

# ── PART 3: Inspect GC generations ────────────────────────────────────────

gc.enable()

print("GC generation thresholds:", gc.get_threshold())
print("GC generation counts:    ", gc.get_count())
# Thresholds: (700, 10, 10) means:
#   gen0 collects when tracked allocations minus deallocations exceed 700
#   gen1 collects every 10 gen0 collections
#   gen2 collects every 10 gen1 collections
▶ Output
Refcount of node_alpha (just created): 1
Refcount after creating alias: 2
Refcount after deleting alias: 1

Deleting external references to node_one and node_two...
(No destructor fired yet — cycle keeps both alive)

[destructor] TrackedNode 'two' was freed
[destructor] TrackedNode 'one' was freed
GC collected 2 unreachable objects

GC generation thresholds: (700, 10, 10)
GC generation counts: (0, 0, 0)
🔥Interview Gold: Why CPython Uses Both Systems
Reference counting gives O(1) deterministic cleanup for the 99% case — no pause, no scan. The cyclic GC is the fallback for the edge case reference counting provably can't handle. PyPy, Jython and other Python implementations don't use reference counting at all, which is why code that relies on __del__ firing immediately (like files closing) can behave differently across implementations.
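
One way to avoid depending on `__del__` timing is to make cleanup explicit with a context manager. The sketch below uses a hypothetical `ManagedHandle` class as a stand-in for a file or socket; the point is only that `__exit__` runs at block exit on any Python implementation, whatever its garbage collector does.

```python
class ManagedHandle:
    """Hypothetical resource wrapper; stands in for a file or socket."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()      # Runs at block exit, with or without an exception
        return False      # Do not swallow exceptions


with ManagedHandle("report.csv") as handle:
    print(f"inside block, closed={handle.closed}")
print(f"after block,  closed={handle.closed}")
```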
📊 Production Insight
Reference counting is deterministic but blind to cycles.
Cyclic GC handles cycles with a stop-the-world pause.
In high-throughput services, GC pause time is your real enemy.
Rule: Thresholds cannot schedule collections by clock time, so keep automatic gen2 collections rare and trigger gc.collect() explicitly during low-traffic windows.
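
Before tuning anything it helps to measure the pauses. The sketch below is one possible approach using the standard `gc.callbacks` hook, which CPython calls with a phase of "start" or "stop" around each collection. The names `pause_log` and `record_gc_pause` are made up for this example.

```python
import gc
import time

pause_log = []      # (generation, seconds, collected) per collection
_started_at = {}    # Start timestamp of the collection in progress


def record_gc_pause(phase, info):
    """gc.callbacks hook: time each collection between 'start' and 'stop'."""
    if phase == "start":
        _started_at["t"] = time.perf_counter()
    elif phase == "stop" and "t" in _started_at:
        elapsed = time.perf_counter() - _started_at.pop("t")
        pause_log.append((info["generation"], elapsed, info["collected"]))


gc.callbacks.append(record_gc_pause)
try:
    gc.collect()    # Force a full collection so the hook fires at least once
finally:
    gc.callbacks.remove(record_gc_pause)

for generation, seconds, collected in pause_log:
    print(f"gen{generation}: {seconds * 1000:.3f} ms, collected {collected}")
```

In a real service you would leave the hook installed and export the timings to your metrics system rather than printing them.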
🎯 Key Takeaway
Refcounting cleans 99% of objects instantly with zero pause.
Cyclic GC is a fallback for cycles only.
__del__ is not guaranteed to fire immediately in cycles.
Don't disable the GC unless you know your code never creates cycles.

Weak References, __slots__, and Memory-Efficient Patterns in Production

Now that you know cycles kill you, let's talk about the tools that prevent them without manually breaking every back-reference.

A weak reference lets you hold a pointer to an object without incrementing its reference count. The object can still die normally; afterwards, calling a weakref.ref returns None, and using a weakref.proxy raises ReferenceError. This is perfect for caches, observer patterns, and parent-child relationships where the child shouldn't keep the parent alive.
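
The two failure modes are easy to see side by side. This is a minimal sketch with a hypothetical `Session` class; the immediate cleanup after `del` relies on CPython's reference counting, so other implementations may need a collection first.

```python
import weakref


class Session:
    """Hypothetical short-lived object used only for this demo."""

    def __init__(self, user):
        self.user = user


session = Session("ada")
session_ref = weakref.ref(session)      # Call it to get the object or None
session_proxy = weakref.proxy(session)  # Behaves like the object itself

print(session_ref().user, session_proxy.user)

del session  # Last strong reference gone; CPython frees the object here

print(session_ref())  # None
proxy_dead = False
try:
    session_proxy.user
except ReferenceError:
    proxy_dead = True
    print("proxy target is gone")
```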

The weakref module gives you weakref.ref() for a single weak reference, weakref.WeakValueDictionary for caches where values can expire, and weakref.WeakSet for observer registries.

On a completely different axis: __slots__ is one of the highest-impact optimizations for memory-heavy code that creates thousands of instances of the same class. By default, every Python instance can carry a __dict__, a full hash table, even if your object only has three fixed attributes. That __dict__ typically adds somewhere between roughly 100 and 300 bytes per instance, depending on the CPython version. __slots__ replaces that dict with a fixed C-level array, dropping per-instance overhead dramatically.

The trade-off: __slots__ breaks dynamic attribute assignment, makes multiple inheritance trickier, and surprises developers who expect __dict__ to exist. Use it deliberately in hot paths — not as a default everywhere.
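
The trade-offs show up quickly in practice. The sketch below uses a hypothetical `SlottedUser` class to show the missing `__dict__`, the rejected dynamic attribute, and one way to build a dict for serialization without `vars()`.

```python
class SlottedUser:
    __slots__ = ("name", "email")

    def __init__(self, name, email):
        self.name = name
        self.email = email


user = SlottedUser("ada", "ada@example.com")

print(hasattr(user, "__dict__"))   # False: no per-instance dict

rejected = False
try:
    user.nickname = "countess"     # Not declared in __slots__
except AttributeError as error:
    rejected = True
    print(f"AttributeError: {error}")

# Code that relies on vars(obj) or obj.__dict__ needs a slots-aware path
as_dict = {slot: getattr(user, slot) for slot in SlottedUser.__slots__}
print(as_dict)
```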

weak_references_and_slots.py · PYTHON
import weakref
import sys
import gc

# ══════════════════════════════════════════════════════════════
# PART 1: WeakValueDictionary as a memory-safe cache
# ══════════════════════════════════════════════════════════════

class ExpensiveResource:
    """Simulates an object that's costly to create (DB connection, parsed config)."""
    def __init__(self, resource_id):
        self.resource_id = resource_id

    def __repr__(self):
        return f"ExpensiveResource(id={self.resource_id})"

# A cache where entries vanish automatically when nothing else holds them
resource_cache = weakref.WeakValueDictionary()

# Create a resource and store it in the cache
db_connection = ExpensiveResource(resource_id="db-primary")
resource_cache["db-primary"] = db_connection

print(f"Cache hit:  {resource_cache.get('db-primary')}")
print(f"Cache size: {len(resource_cache)}")
print()

# When the strong reference disappears, the cache entry cleans itself up
del db_connection
gc.collect()  # Force cleanup for demo purposes

print(f"After del:  {resource_cache.get('db-primary')}")
print(f"Cache size: {len(resource_cache)}")
print()

# ══════════════════════════════════════════════════════════════
# PART 2: Breaking a parent-child cycle with weakref.ref
# ══════════════════════════════════════════════════════════════

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []
        self._parent_ref = None  # Will hold a weak reference, not a strong one

    def add_child(self, child_node):
        child_node._parent_ref = weakref.ref(self)  # Weak — child won't keep parent alive
        self.children.append(child_node)            # Strong — parent keeps children alive

    @property
    def parent(self):
        # Dereference the weak ref; returns None if parent was collected
        if self._parent_ref is None:
            return None
        return self._parent_ref()  # Calling a weakref returns the object or None

    def __repr__(self):
        return f"TreeNode({self.value})"

root = TreeNode("root")
child = TreeNode("child")
root.add_child(child)

print(f"child.parent = {child.parent}")
print(f"root.children = {root.children}")
print()

# ══════════════════════════════════════════════════════════════
# PART 3: __slots__ memory savings — measured
# ══════════════════════════════════════════════════════════════

class RegularPoint:
    """Standard class — every instance carries a full __dict__."""
    def __init__(self, x_coord, y_coord, z_coord):
        self.x_coord = x_coord
        self.y_coord = y_coord
        self.z_coord = z_coord

class SlottedPoint:
    """Slots class — fixed-size C array, no __dict__ overhead."""
    __slots__ = ('x_coord', 'y_coord', 'z_coord')

    def __init__(self, x_coord, y_coord, z_coord):
        self.x_coord = x_coord
        self.y_coord = y_coord
        self.z_coord = z_coord

regular = RegularPoint(1.0, 2.0, 3.0)
slotted = SlottedPoint(1.0, 2.0, 3.0)

regular_size = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)
slotted_size = sys.getsizeof(slotted)  # No __dict__ to add

print(f"RegularPoint size (object + __dict__): {regular_size} bytes")
print(f"SlottedPoint size (no __dict__):        {slotted_size} bytes")
print(f"Memory saved per instance:              {regular_size - slotted_size} bytes")
print()

# Scale that up to a realistic data pipeline with 1M points
num_instances = 1_000_000
savings_mb = (regular_size - slotted_size) * num_instances / (1024 ** 2)
print(f"Projected saving across {num_instances:,} instances: {savings_mb:.1f} MB")
▶ Output
Cache hit: ExpensiveResource(id=db-primary)
Cache size: 1

After del: None
Cache size: 0

child.parent = TreeNode(root)
root.children = [TreeNode(child)]

RegularPoint size (object + __dict__): 344 bytes
SlottedPoint size (no __dict__): 56 bytes
Memory saved per instance: 288 bytes

Projected saving across 1,000,000 instances: 274.7 MB
💡Pro Tip: WeakValueDictionary for LRU Cache Backends
When functools.lru_cache isn't flexible enough, WeakValueDictionary gives you automatic eviction based on object lifetime, with no max-size cap needed. Combine it with a factory function that checks the cache first and creates on miss. Python's own importlib machinery uses weak references in a similar way for its per-module import locks.
📊 Production Insight
Weak references add negligible overhead — they're just pointers with a death callback.
__slots__ can save >80% memory per instance for narrow data objects.
But misapplied slots break reflection and dynamic attribute patterns.
Rule: Use __slots__ in hot paths where instance count >10k; measure before applying widely.
🎯 Key Takeaway
WeakValueDictionary and WeakSet break cycles automatically.
__slots__ removes __dict__ overhead → massive savings for many instances.
Never assume __dict__ exists on slotted objects — breaks serialization.
Measure first, optimise second.

Diagnosing Memory Leaks with tracemalloc in Production

You've got a long-running Python service. RSS memory climbs slowly over hours and never comes back down. The question is: what's holding onto that memory?

tracemalloc is the right tool for this — it's in the standard library since Python 3.4, has minimal overhead when used correctly, and gives you file-and-line-number attribution for every allocation. The typical workflow: take a baseline snapshot early in the process lifecycle, take a second snapshot after the suspected leak window, and diff them. The lines with the biggest positive delta are your culprits.

For production use, keep tracemalloc off by default (it adds ~30% memory overhead for tracing metadata) and enable it only when diagnosing. Better: expose a signal handler or a debug endpoint that takes a snapshot on demand without restarting the process.

Beyond tracemalloc, the gc module is invaluable. gc.get_objects() returns every object currently tracked by the cyclic GC. Calling it before and after a suspicious operation and comparing counts tells you exactly what object types are accumulating. Pair it with collections.Counter for instant triage.

A subtler cause of production leaks is Python's internal free lists for types like floats, lists, and frames. CPython keeps recently freed objects on these lists for reuse rather than returning to the OS. This is good for performance, but it means peak memory is sticky — after a spike, your process won't shrink even after the spike objects are gone.

leak_diagnosis_demo.py · PYTHON
import tracemalloc
import gc
import collections
import linecache

# ── Helper: pretty-print a tracemalloc diff ────────────────────────────────

def display_top_allocations(snapshot, key_type='lineno', limit=5):
    """Print the top N memory consumers from a tracemalloc snapshot."""
    stats = snapshot.statistics(key_type)
    print(f"{'Rank':<5} {'Size':>10} {'Count':>8}  Location")
    print("-" * 60)
    for rank, stat in enumerate(stats[:limit], start=1):
        frame = stat.traceback[0]
        # Fetch the actual source line for context
        source_line = linecache.getline(frame.filename, frame.lineno).strip()
        print(f"{rank:<5} {stat.size / 1024:>8.1f} KB {stat.count:>8}  "
              f"{frame.filename}:{frame.lineno}")
        print(f"      {'':>10} {'':>8}  -> {source_line}")
    print()

# ── Simulate a leaking registry (classic production pattern) ───────────────

class EventBus:
    """
    A naive event bus that never deregisters listeners.
    This is the #1 cause of Python service memory leaks.
    """
    _listeners: dict = {}

    @classmethod
    def register(cls, event_name, handler_func):
        cls._listeners.setdefault(event_name, []).append(handler_func)

    @classmethod
    def listener_count(cls):
        return sum(len(v) for v in cls._listeners.values())

# ── Take baseline snapshot ─────────────────────────────────────────────────

tracemalloc.start(depth=5)  # depth=5 captures 5 frames of stack context
gc.collect()                 # Clean slate before baseline

baseline_snapshot = tracemalloc.take_snapshot()
baseline_gc_counts = collections.Counter(
    type(obj).__name__ for obj in gc.get_objects()
)

print("=== Simulating 500 request cycles (leaking handlers each time) ===")

# Simulate a web server handling requests — each 'request' registers a
# new handler but the old ones are never removed
for request_number in range(500):
    def handle_user_event(event_data, req=request_number):
        """Handler closure — captures req, keeping it alive in the bus."""
        return f"request {req} handled {event_data}"

    EventBus.register("user.login", handle_user_event)

print(f"EventBus now holds {EventBus.listener_count()} handlers")
print()

# ── Take leak snapshot and diff ────────────────────────────────────────────

leak_snapshot = tracemalloc.take_snapshot()
leak_gc_counts = collections.Counter(
    type(obj).__name__ for obj in gc.get_objects()
)

print("=== Top memory allocations AFTER the leak ===")
display_top_allocations(leak_snapshot, limit=4)

print("=== Object count changes (GC-tracked objects) ===")
for type_name, count in (leak_gc_counts - baseline_gc_counts).most_common(5):
    print(f"  +{count:>6}  {type_name}")
print()

# ── Show the diff between snapshots ───────────────────────────────────────
print("=== Snapshot diff (new allocations since baseline) ===")
diff_stats = leak_snapshot.compare_to(baseline_snapshot, 'lineno')
for stat in diff_stats[:4]:
    print(stat)

tracemalloc.stop()

# ── The fix: use WeakSet so the bus doesn't prevent GC ────────────────────
print()
print("=== Fix: use weakref.WeakSet for listener registry ===")
import weakref

class SafeEventBus:
    _listeners: dict = {}

    @classmethod
    def register(cls, event_name, handler_func):
        if event_name not in cls._listeners:
            cls._listeners[event_name] = weakref.WeakSet()
        cls._listeners[event_name].add(handler_func)

    @classmethod
    def listener_count(cls):
        return sum(len(v) for v in cls._listeners.values())

print("SafeEventBus uses WeakSet — handlers are released when they go out of scope.")
▶ Output
=== Simulating 500 request cycles (leaking handlers each time) ===
EventBus now holds 500 handlers

=== Top memory allocations AFTER the leak ===
Rank Size Count Location
------------------------------------------------------------
1 48.2 KB 500 leak_diagnosis_demo.py:52
-> def handle_user_event(event_data, req=request_number):
2 10.1 KB 1 leak_diagnosis_demo.py:30
-> _listeners: dict = {}
3 5.3 KB 500 <frozen importlib._bootstrap>:241
->
4 1.2 KB 14 leak_diagnosis_demo.py:1
-> import tracemalloc

=== Object count changes (GC-tracked objects) ===
+ 500 function
+ 1 dict
+ 1 list

=== Snapshot diff (new allocations since baseline) ===
leak_diagnosis_demo.py:52: size=48200 B (+48200 B), count=500 (+500), average=96 B
leak_diagnosis_demo.py:30: size=10136 B (+10136 B), count=1 (+1), average=10136 B
<frozen importlib._bootstrap>:241: size=5376 B (+5376 B), count=500 (+500), average=10 B

=== Fix: use weakref.WeakSet for listener registry ===
SafeEventBus uses WeakSet — handlers are released when they go out of scope.
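
A weak registry shifts ownership to the subscriber: something else must hold a strong reference to each handler, or it disappears as soon as it is registered. The sketch below shows that behaviour, plus `weakref.WeakMethod` for bound methods, which cannot be held usefully through a plain weak reference. `Subscriber` and the handler names are hypothetical.

```python
import gc
import weakref

listeners = weakref.WeakSet()


def register_temp_handler():
    def handler(event):             # Only strong reference is this local name
        return f"handled {event}"
    listeners.add(handler)


register_temp_handler()
gc.collect()                        # Not needed on CPython; here for portability
print(len(listeners))               # 0: the handler died with its scope


class Subscriber:
    def on_login(self, event):
        return f"login: {event}"


subscriber = Subscriber()
# subscriber.on_login builds a new bound-method object on every access, so a
# plain weak reference to it would be dead at once. WeakMethod tracks the
# instance and the function separately instead.
method_ref = weakref.WeakMethod(subscriber.on_login)
print(method_ref()("ada"))          # login: ada

del subscriber
gc.collect()
print(method_ref())                 # None: the subscriber is gone
```

If handlers are per-request closures that nothing else owns, a weak registry will drop them before they ever fire; in that case explicit deregistration at the end of the request is the safer fix.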
⚠ Watch Out: tracemalloc Overhead in Production
Running tracemalloc.start() permanently in a production service can increase memory usage by 30–50% because it stores a traceback for every live allocation. The production-safe pattern: keep it disabled, expose a /debug/memory endpoint (behind auth) that calls tracemalloc.start(), waits 60 seconds, takes a snapshot, calls tracemalloc.stop(), and returns the diff as JSON. You get the diagnosis without the permanent cost.
📊 Production Insight
tracemalloc gives precise attribution but costs memory.
Use gc.get_objects() for lightweight object count snapshots.
Free lists cause RSS stickiness — not a leak, just latency to release.
Rule: Distinguish between true leaks (object count grows) and free list retention (stable count, high RSS).
🎯 Key Takeaway
tracemalloc diffs pinpoint leaky lines.
gc.get_objects() + Counter shows what types are accumulating.
Free lists inflate RSS after spikes — run gc.collect() to confirm.
Always baseline before you debug, and isolate tracemalloc to debug endpoints.

GC Tuning and Production Trade-offs

The default GC thresholds (700 allocations for gen0, 10 gen0 collections per gen1, 10 gen1 per gen2) work for general-purpose scripts. In production, they can cause noticeable latency spikes when gen2 collects a large heap.

You can tune with gc.set_threshold(gen0_threshold, gen1_multiplier, gen2_multiplier). Lower gen0 threshold triggers more frequent collections, which keeps each collection small but raises total overhead. Higher thresholds mean less frequent but more expensive collections.
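
A small sketch of reading, changing, and restoring the thresholds follows. The value 300 is an arbitrary illustration, not a recommendation; real values should come from measurements of your own service.

```python
import gc

original = gc.get_threshold()          # Defaults are typically (700, 10, 10)
print("before:", original)

# Hypothetical values for illustration; derive real ones from measurements.
gc.set_threshold(300, 10, 10)
tuned = gc.get_threshold()
print("tuned: ", tuned)

gc.set_threshold(*original)            # Restore so the demo has no side effects
print("after: ", gc.get_threshold())
```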

Some high-throughput services disable the GC entirely after startup. Instagram famously did this, mainly to keep memory pages shared between forked worker processes, and later moved to gc.freeze() instead. Disabling the GC is a risky move unless you audit every library and every codepath for cycles. You can also run gc.collect(2) manually during maintenance windows.

gc.set_debug(gc.DEBUG_LEAK) reports the unreachable objects each collection finds and keeps them in gc.garbage instead of freeing them, which is invaluable for tracking down cycles. Don't leave it on in production: it prints to stderr, retains the garbage, and slows everything down.
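
If the stderr output is too noisy, the `gc.DEBUG_SAVEALL` flag on its own (one of the flags that `DEBUG_LEAK` combines) keeps the unreachable objects in `gc.garbage` for inspection without printing. The sketch below builds a throwaway cycle with a hypothetical `Node` class and then looks for it.

```python
import gc


class Node:
    """Hypothetical class used to build a throwaway cycle."""

    def __init__(self):
        self.peer = None


gc.collect()                         # Clear existing garbage first
previous_flags = gc.get_debug()
gc.set_debug(gc.DEBUG_SAVEALL)       # Keep unreachable objects in gc.garbage
try:
    a, b = Node(), Node()
    a.peer, b.peer = b, a            # a <-> b cycle
    del a, b
    gc.collect()
    leaked_nodes = [obj for obj in gc.garbage if isinstance(obj, Node)]
finally:
    gc.set_debug(previous_flags)     # Restore the original debug flags
    gc.garbage.clear()               # Drop the saved objects

print(f"{len(leaked_nodes)} Node objects were only reachable through a cycle")
```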

Another tuning lever: gc.freeze() (Python 3.7+) moves all currently tracked objects to a 'permanent' generation that the GC ignores in later collections. This is useful for services that preload modules and config at startup: those objects never die, so scanning them on every full collection is wasted work. It is most often used in pre-fork servers, called in the parent just before forking workers.
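
A minimal sketch of the freeze-after-warm-up pattern, assuming Python 3.7 or later. `warm_up` is a hypothetical stand-in for whatever your service loads at startup.

```python
import gc


def warm_up():
    """Hypothetical startup work: build long-lived config and lookup tables."""
    return {f"route-{i}": [i, str(i)] for i in range(1000)}


app_state = warm_up()

gc.collect()                  # Clear startup garbage before freezing
gc.freeze()                   # Move every tracked object to the permanent generation
frozen = gc.get_freeze_count()
print(f"frozen objects: {frozen}")

# ... serve requests; collections now skip the frozen objects ...

gc.unfreeze()                 # Only needed here to undo the demo's side effect
print(f"after unfreeze: {gc.get_freeze_count()}")
```

Collecting before freezing matters: anything unreachable at freeze time would otherwise be parked in the permanent generation and never reclaimed.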

Mental Model
GC Tuning as a Service Budget
Think of GC budget like a CPU budget: you decide how much CPU time to spend on memory housekeeping.
  • Frequent young collections (low gen0 threshold) → lower peak pause, higher total CPU.
  • Infrequent young collections (high gen0 threshold) → higher peak pause, lower total CPU.
  • Gen2 pause scales with the number of objects that survive to gen2.
  • gc.freeze() eliminates scanning of immortal objects entirely — use it after warm-up.
  • The right trade-off depends on your latency SLO and memory allocation rate.
📊 Production Insight
In some workloads, lowering the gen0 threshold from 700 to 300 can reduce gen2 collection size by around 40%.
But each gen1 sweep adds roughly 0.3ms of overhead, so multiply by collection frequency.
gc.freeze() after warm-up can cut GC CPU usage by 15–25% in web services, depending on how much of the heap is startup state.
Rule: Profile GC with gc.get_stats() before tuning — never guess.
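One way to follow that rule is sketched below: a gc.callbacks hook that times every collection, combined with gc.get_stats(). The record_gc function and the pauses list are illustrative names, and the allocation burst is artificial.

```python
import gc
import time

pauses = []          # (generation, seconds) for each collection observed
_started = {}

def record_gc(phase, info):
    """gc.callbacks hook: time each collection from 'start' to 'stop'."""
    if phase == "start":
        _started["t"] = time.perf_counter()
    elif phase == "stop" and "t" in _started:
        elapsed = time.perf_counter() - _started.pop("t")
        pauses.append((info["generation"], elapsed))

gc.callbacks.append(record_gc)
try:
    junk = [[i] for i in range(50_000)]   # hypothetical allocation burst
    del junk
    gc.collect()                          # guarantee at least one full collection
finally:
    gc.callbacks.remove(record_gc)

for gen, stats in enumerate(gc.get_stats()):
    print(f"gen{gen}: collections={stats['collections']} collected={stats['collected']}")

longest_ms = max(seconds for _, seconds in pauses) * 1000
print(f"observed {len(pauses)} collections, longest pause {longest_ms:.3f} ms")
```

In a real service you would export these timings to your metrics system instead of printing them, then compare the distribution before and after any threshold change.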
🎯 Key Takeaway
GC thresholds control pause time vs total overhead.
gc.freeze() skips long-lived startup objects, a big win for long-running services.
Disabling GC is dangerous unless you can prove zero cycles exist.
Measure GC stats in production, tune based on your latency budget.
Aspect | Reference Counting | Cyclic Garbage Collector
Mechanism | ob_refcnt field in every PyObject C struct | Mark-and-sweep over tracked container objects
Triggers | Every assignment, del, scope exit (immediate) | After N allocations per generation (threshold-based)
Handles cycles? | No: orphaned cycles live forever | Yes: its entire reason for existing
Pause time | Zero: cleanup happens inline | Stop-the-world pause (brief but real; worse for gen2)
Overhead | Increment/decrement on every reference operation | Periodic scan of all tracked containers
Tunable? | No: hardwired into CPython | Yes: gc.set_threshold(), gc.disable(), gc.collect()
Object types covered | All objects | Only container types (list, dict, set, class instances)
__del__ guaranteed? | Yes, immediately when refcount hits 0 (no cycles) | Eventually, but order is undefined for cycle members
PyPy / Jython support | No: CPython only | Different GC implementations exist in each runtime
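
The difference between the two columns is easy to observe on CPython. The sketch below uses a weak reference as a probe to see when an object actually dies. Payload is a hypothetical class, and the automatic collector is disabled only to keep the demo deterministic.

```python
import gc
import weakref

class Payload:
    """Hypothetical class; plain instances support weak references."""

gc.disable()                      # keep the automatic collector out of the demo

# Case 1: no cycle. Reference counting frees the object as soon as 'solo' is deleted.
solo = Payload()
solo_ref = weakref.ref(solo)
del solo
freed_by_refcount = solo_ref() is None

# Case 2: a cycle. The object keeps itself alive until the cyclic GC runs.
looped = Payload()
looped.me = looped
looped_ref = weakref.ref(looped)
del looped
alive_before_gc = looped_ref() is not None
gc.collect()
freed_by_gc = looped_ref() is None

gc.enable()
print(freed_by_refcount, alive_before_gc, freed_by_gc)   # True True True on CPython
```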

🎯 Key Takeaways

  • CPython uses three-tier memory: arenas (256KB), pools (4KB), size classes (8-byte multiples).
  • Reference counting cleans 99% of objects instantly; cyclic GC handles cycles via generational mark-and-sweep.
  • Weak references and weakref.WeakValueDictionary/WeakSet break cycles without manual cleanup.
  • __slots__ eliminates per-instance __dict__ overhead — huge savings for high-count objects.
  • tracemalloc diffs pinpoint leaky code lines; gc.get_objects() shows accumulating types.
  • GC tuning (thresholds, freeze) lets you trade pause time for total overhead.
  • Free lists cause RSS stickiness, not true leaks — always collect before diagnosing.
  • Disabling GC is risky unless you can prove zero cycles exist in all code paths.

⚠ Common Mistakes to Avoid

    Using 'is' for value comparison
    Symptom

    a is b returns True for small integers (-5 to 256 in CPython) and interned strings because CPython caches them, then returns False for equal values outside the cache. This creates false confidence until production data hits different ranges.

    Fix

    Always use == for value comparison. Reserve is exclusively for identity checks like if obj is None or comparing singletons. Document this in your team's style guide.
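
    A quick illustration of the trap, assuming CPython. The values are built at runtime with int() so the compiler cannot merge them into a single constant.

```python
# Values are constructed at runtime so no constant sharing can happen.
small_a, small_b = int("256"), int("256")
big_a, big_b = int("100000"), int("100000")

print(small_a is small_b)   # True on CPython: both names point at the cached 256
print(big_a is big_b)       # False on CPython: two separate int objects
print(big_a == big_b)       # True everywhere: value comparison

# Identity checks are still the right tool for singletons.
result = None
print(result is None)       # True
```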

    Expecting __del__ to fire at a predictable time
    Symptom

    File handles, socket connections, or lock releases inside __del__ don't execute when expected, causing resource leaks in long-running services. This is especially common in code that creates objects in a cycle.

    Fix

    Use context managers (with statements) with __enter__/__exit__ for all deterministic resource cleanup. Never rely on __del__ for anything time-sensitive: it may be delayed by cycles, skipped during interpreter shutdown, or run much later (if at all) on runtimes without reference counting, such as PyPy.
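
    A minimal sketch of the context-manager approach. Connection is a hypothetical wrapper, and its closed flag stands in for releasing a real socket, file, or lock.

```python
class Connection:
    """Hypothetical resource wrapper; 'closed' stands in for a real socket or file."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False   # do not swallow exceptions

with Connection("primary") as conn:
    pass  # use the connection here

print(conn.closed)      # True: released at the end of the block

# Cleanup also runs when the body raises.
try:
    with Connection("secondary") as failing:
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(failing.closed)   # True
```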

    Disabling GC for 'speed' without understanding consequences
    Symptom

    After calling gc.disable() in a Django or FastAPI service, memory climbs unbounded over hours because every cyclic structure (including ORM querysets referencing model instances referencing the queryset) accumulates.

    Fix

    Profile first with gc.get_stats() and a gc.callbacks timer to measure actual GC activity and pause time. If overhead is real, tune thresholds with gc.set_threshold() rather than disabling outright. Instagram's GC-disable trick targeted copy-on-write memory sharing across pre-forked workers, and they later replaced it with gc.freeze(). It's not a general recipe.

Interview Questions on This Topic

  • Q: Explain how CPython manages memory. What are arenas, pools, and blocks, and why does this three-tier system exist? (Senior)
    CPython avoids going to the system allocator for every small allocation. It requests large arenas from the OS (256KB historically; recent 64-bit builds use larger arenas and pools), typically via mmap where available, divides each arena into pools (4KB historically), and assigns each pool to a specific size class (multiples of 8 bytes up to 512 bytes; 16-byte steps on newer 64-bit builds). Objects ≤512 bytes are allocated from these pools (pymalloc), while larger objects go directly to malloc. This design reduces fragmentation and allocator overhead for the vast majority of Python objects, which are small and short-lived. The free lists within pools reuse memory quickly, but also cause RSS stickiness because an arena is only released back to the OS when every pool in it is completely empty.
  • Q: What's the difference between reference counting and the cyclic garbage collector? Why does CPython need both? (Senior)
    Reference counting immediately frees an object when its reference count hits zero — deterministic, no pause. But it can't handle cycles: two objects referencing each other keep their counts at 1 even when the program can't reach them. The cyclic GC supplements refcounting by tracing container objects and collecting unreachable cycles. It uses a generational algorithm (generations 0, 1, 2) and runs on a threshold-based schedule. The two systems together give you instant cleanup for the common case and a safety net for cycles.
  • Q: How would you debug a memory leak in a production Python service? Walk through the tools and steps. (Senior)
    First, confirm it's a true leak rather than deferred cleanup: run gc.collect() and check whether RSS drops. If it does, the growth came from uncollected cycles or free-list retention, not a leak of reachable objects. If not, use tracemalloc: enable it via an on-demand debug endpoint, take baseline and post-leak snapshots, and diff them to find the top allocation sources. Also use gc.get_objects() with collections.Counter to see which object types are growing (functions and dicts are common culprits). Look for listener registrations, closures capturing request-scoped objects, and module caches that never clear. Apply weak references or explicit cleanup. Never run tracemalloc permanently in production: it adds 30–50% memory overhead.
  • Q: When would you use __slots__ in Python? What are the trade-offs? (Mid-level)
    Use __slots__ when you create thousands of instances of a class and memory is a concern. It removes the per-instance __dict__ (a full hash table ~300 bytes) and stores attributes in a fixed C-level array. This can cut per-instance memory by 80%+. The trade-offs: you can't add new attributes dynamically, multiple inheritance becomes trickier if classes have __slots__ defined (needs explicit cooperation), and code that relies on __dict__ (serialization, reflection) will break. Measure memory savings before committing — the benefit is negligible for small instance counts.
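
The __slots__ trade-off from the last answer can be seen in a few lines. PointDict and PointSlots are hypothetical classes written for this comparison. No byte counts are claimed here because they vary between Python versions.

```python
import sys

class PointDict:
    """Regular class: every instance can carry its own __dict__."""
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    """Slotted class: attributes live in fixed slots, no per-instance __dict__."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y

p, q = PointDict(1, 2), PointSlots(1, 2)
has_dict = (hasattr(p, "__dict__"), hasattr(q, "__dict__"))
print(has_dict)                                   # (True, False)
print(sys.getsizeof(p), sys.getsizeof(q))         # shallow sizes; p's dict is extra

# The price: no attributes beyond the declared slots.
try:
    q.z = 3
    dynamic_attr_allowed = True
except AttributeError:
    dynamic_attr_allowed = False
print(dynamic_attr_allowed)                       # False
```

For a trustworthy number, create the instance count you expect in production and measure the difference with tracemalloc.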

Frequently Asked Questions

Why does Python's memory usage not decrease after deleting large objects?

CPython keeps freed memory in its own pools and free lists for reuse rather than returning it to the OS straight away. This is by design: it avoids the cost of repeated allocator calls. An arena is only handed back once every pool in it is empty, which may never happen if a single object in that arena stays alive. On top of that, the C allocator may hold on to freed blocks instead of returning them to the OS. To confirm a leak vs. free list retention, call gc.collect() and check if RSS drops. If it does, it's free lists. Use tracemalloc to find real leaks.

How do I prevent memory leaks from event listeners and callbacks?

Don't keep strong references to callbacks in a plain list or dictionary unless you also deregister them. Use weakref.WeakSet or weakref.WeakValueDictionary for handler objects so that when the handler goes out of scope (e.g., after a request finishes), it's automatically removed from the registry. Bound methods need weakref.WeakMethod instead, because a plain weak reference to a bound method dies immediately. If you must keep strong references, implement an explicit deregistration mechanism (e.g., a context manager that unregisters on __exit__).
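
A minimal sketch of a weakly-referencing bus, assuming handlers are bound methods. EventBus and RequestContext are illustrative names, not the classes from the incident described earlier.

```python
import gc
import weakref

class EventBus:
    """Minimal sketch of a bus that holds bound-method handlers weakly."""

    def __init__(self):
        self._handlers = []   # list of weakref.WeakMethod objects

    def subscribe(self, bound_method):
        self._handlers.append(weakref.WeakMethod(bound_method))

    def publish(self, event):
        live = []
        for ref in self._handlers:
            handler = ref()          # None once the owning object has been freed
            if handler is not None:
                handler(event)
                live.append(ref)
        self._handlers = live        # prune dead entries as we go
        return len(live)

class RequestContext:
    def __init__(self):
        self.seen = []

    def on_event(self, event):
        self.seen.append(event)

bus = EventBus()
ctx = RequestContext()
bus.subscribe(ctx.on_event)
delivered_while_alive = bus.publish("order.created")

del ctx            # request finished; the last strong reference is gone
gc.collect()       # not required on CPython here, but harmless
delivered_after_free = bus.publish("order.created")
print(delivered_while_alive, delivered_after_free)   # 1 0
```

The bus never keeps the request context alive, so handlers disappear with their owners and the registry cannot grow without bound.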

When should I call gc.collect() manually?

Manually call gc.collect() when you've just released a large cyclic structure and want to free memory immediately, such as after processing a huge batch of data. Also call it before taking a memory snapshot for profiling. Don't call it on every request — it adds overhead. In production, consider running gc.collect(2) during maintenance windows to clean generation 2 without affecting response times.

What is the difference between gc.get_objects() and sys.getsizeof() for measuring memory?

gc.get_objects() returns a list of every container object tracked by the cyclic GC. It tells you what objects exist and their types, but not their sizes. sys.getsizeof() returns the shallow size of a single object (the object itself, not its referenced objects). For real memory profiling, use tracemalloc (standard library) to attribute allocated bytes to source lines, or pympler (third-party) to measure deep object sizes.
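
The two calls answer different questions, as this short sketch shows. The payload list is made-up data, and the census output depends on whatever your process has loaded.

```python
import gc
import sys
from collections import Counter

# Shallow size: the outer list is measured, not the strings it references.
payload = ["x" * 10_000 for _ in range(100)]
shallow = sys.getsizeof(payload)
referenced = sum(sys.getsizeof(s) for s in payload)
print(f"shallow={shallow} bytes, referenced strings={referenced} bytes")

# Type census: which GC-tracked types are most numerous right now?
census = Counter(type(obj).__name__ for obj in gc.get_objects())
for type_name, count in census.most_common(5):
    print(f"{type_name:<20}{count}")
```

Taking the census twice, a few minutes apart, and subtracting the counters is a cheap way to spot which type is accumulating.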

Does disabling the GC improve Python performance?

It can, but only if you've measured that GC pause time is actually hurting your latency and you're sure your code never creates reference cycles. Without cycles, the GC does nothing useful. However, many libraries (ORMs, caching layers, async frameworks) create cycles internally. Instagram disabled the GC mainly to preserve copy-on-write memory sharing between forked workers, and later switched to gc.freeze() instead. For most teams, it's safer to tune thresholds with gc.set_threshold() than to disable the GC entirely.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
