Python Magic Methods — Set Corruption When __hash__ Missing
Orders go missing from a set when __eq__ is defined without __hash__, which triggers a TypeError and then silent corruption.
- Magic methods are hooks Python calls automatically on operator, attribute, or built-in usage — they are looked up on the class type, not the instance, so monkey-patching a dunder on a single object has no effect
- __init__ initializes instances; __new__ controls creation itself — override __new__ only for singletons or immutable subclasses
- __repr__ must be unambiguous for debugging; __str__ is user-friendly — if you only implement one, implement __repr__
- __eq__ and __hash__ must agree; break this rule and sets and dicts silently corrupt. Implementing __eq__ without __hash__ makes instances unhashable, so adding one to a set or using it as a dict key raises TypeError
- __slots__ saves 40–60% memory per instance but requires every class in the hierarchy to define its own slots, and @dataclass(slots=True) is the safer way to get the same benefit in Python 3.10+
- __getattr__ is a fallback for missing attributes only; __getattribute__ intercepts every access — confusing the two is a common source of silent bugs
- __call__ makes any instance callable; __enter__ and __exit__ power the with statement — both are more production-common than most tutorials acknowledge
Imagine you buy a fancy coffee machine. Out of the box it already knows how to turn on, make a sound when it is done, and show its status on a little screen — you did not program any of that, it just came built-in. Python magic methods are the same idea: they are pre-agreed slots that Python calls automatically when certain things happen to your object, like printing it, adding two of them together, or checking if they are equal. You fill in the slot, Python does the calling. The important detail is that Python looks for that slot on the class, not on the individual object. You cannot give one specific coffee machine a custom startup sound by sticking a label on it — you have to update the model's factory spec.
Every time Python evaluates len(my_list), compares two objects with ==, or prints something with print(), it is secretly delegating that work to a special method buried inside the object's class. This is not magic in the stage-trick sense — it is a precisely defined protocol that makes Python's data model tick. Understanding it is the difference between writing classes that play nicely with the rest of Python's ecosystem and writing classes that feel bolted-on and awkward.
Before magic methods existed as a concept, languages forced you to register callbacks or inherit from a god-class just to make your objects behave like built-in types. Python solved this with a clean contract: implement a double-underscore method with a specific name (hence 'dunder'), and the interpreter will call it at the right moment. The result is that your custom Vector class can support +, len(), slicing, context managers, and even pickling — all without inheriting from anything.
By the end of this article you will understand not just the syntax but the CPython internals that make these calls happen, the subtle ordering rules Python follows when resolving them, the performance traps hiding in __getattr__ and __slots__, how __call__ and __enter__/__exit__ work in production, why __del__ is almost always the wrong answer to resource cleanup, and the patterns senior engineers use in production libraries. You will also have concrete answers for the interview questions that trip up even experienced Python developers.
What Are Magic Methods? The CPython Dispatch Contract
Magic methods are special method names with double underscores on both ends that Python's interpreter calls implicitly when you use certain syntax or built-in operations. The 'magic' is not runtime wizardry — it is a well-defined protocol baked into CPython's bytecode execution and the C-level type structure.
When you write len(obj), CPython calls type(obj).__len__(obj), not obj.__len__() directly. Python looks up the method on the class, not the instance. This distinction is critical: if you try to monkey-patch a dunder onto a single object (obj.__len__ = lambda: 42), it will have no effect when you call len(obj). The built-in function goes through the type, always.
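The type-level lookup described above can be checked directly. This is a minimal sketch; the Basket class and its items attribute are hypothetical names used only for illustration.

```python
class Basket:
    """Hypothetical container used only to illustrate type-level dunder lookup."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


basket = Basket(["apple", "pear"])

# Assigning a dunder on the instance is allowed, but len() never consults it.
basket.__len__ = lambda: 42

patched_direct = basket.__len__()         # explicit attribute access finds the instance attribute
via_builtin = len(basket)                 # len() goes through type(basket).__len__
via_type = type(basket).__len__(basket)   # the call len() effectively makes

print(patched_direct, via_builtin, via_type)
```

Only the explicit basket.__len__() call sees the patched lambda; len(basket) keeps using the method defined on the class.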
The same dispatch applies to obj + other: Python first calls __add__ as found on type(obj); if that returns NotImplemented, it tries __radd__ on type(other); if that also returns NotImplemented, it raises TypeError. This two-step lookup is why int + float works: int.__add__ returns NotImplemented for floats, so Python falls back to float.__radd__. The protocol lets two types cooperate even when neither was written with the other in mind.
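A small sketch of the __add__ / __radd__ fallback, assuming a hypothetical Metres value type. Returning NotImplemented (rather than raising) is what lets Python try the other operand.

```python
class Metres:
    """Hypothetical value type used to show the __add__ / __radd__ fallback."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Metres):
            return Metres(self.value + other.value)
        if isinstance(other, (int, float)):
            return Metres(self.value + other)
        return NotImplemented  # lets Python try the reflected method on the other operand

    def __radd__(self, other):
        # Reached for `3 + Metres(2)` after int.__add__ returned NotImplemented.
        return self.__add__(other)

    def __repr__(self):
        return f"Metres({self.value!r})"


left = Metres(2) + 3     # Metres.__add__ handles it
right = 3 + Metres(2)    # int.__add__ gives up, Metres.__radd__ takes over

try:
    Metres(1) + "a string"   # both sides decline, so Python raises TypeError
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

print(left, right, outcome)
```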
A practical guide for choosing when to implement a dunder versus a regular method:
- Use a dunder when you want Python syntax (+, len(), str(), ==) to work naturally on your object.
- Use a regular named method when you need a descriptive API: add_item() is clearer than __add__ for domain-specific logic.
- Use __lt__ and related comparison dunders when you want your objects to work with sorted(), min(), and max().
- Use __call__ when you want an instance to behave like a function. This is more common in production than tutorials suggest.
The key mental model: each dunder is a slot in CPython's type structure (PyTypeObject). If the class defines the dunder, Python fills the slot. If not, the slot is NULL and the built-in raises TypeError. This is also why __getattr__ cannot intercept len() — __getattr__ operates at the Python level, but len() goes through a C-level slot that bypasses Python attribute lookup entirely.
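To illustrate the slot behaviour, here is a sketch with a hypothetical LoggingProxy that forwards attribute access through __getattr__. The class names and the _seen list are assumptions made for the example.

```python
class LoggingProxy:
    """Hypothetical proxy that records attribute names resolved via __getattr__."""

    def __init__(self, target):
        self._target = target
        self._seen = []

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        self._seen.append(name)
        return getattr(self._target, name)


proxy = LoggingProxy([1, 2, 3])
count_result = proxy.count(2)    # ordinary attribute access reaches __getattr__

try:
    len(proxy)                   # the type has no __len__ slot, __getattr__ is never asked
    len_outcome = "intercepted"
except TypeError:
    len_outcome = "TypeError"


class SizedProxy(LoggingProxy):
    """Same proxy, but with the dunder defined explicitly on the class."""

    def __len__(self):
        self._seen.append("__len__")
        return len(self._target)


sized = SizedProxy([1, 2, 3])
sized_len = len(sized)
print(count_result, len_outcome, sized_len)
```

The forwarding proxy handles proxy.count(2) but not len(proxy); only the subclass that defines __len__ on the class fills the slot.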
__getattr__ cannot intercept len() calls, which is surprising if you relied on it for logging. __getattr__ operates at the Python attribute level, while len() goes through a C-level slot that bypasses Python attribute lookup entirely. To intercept len(), override __len__ directly. To intercept all attribute access including dunders, you need a proxy class with explicitly defined dunders, not just __getattribute__. Assigning a dunder to an instance does not cause len() or + to use it; the lookup always goes through the class.
__init__ vs __new__: Object Creation Under the Hood
Most developers know __init__ as the constructor, but technically __new__ is the true constructor — it allocates the object. __init__ only initializes an already-created instance. This distinction matters when you need immutable objects or when you subclass immutable types like tuple or str.
__new__ receives the class as its first argument and must return an instance, usually by calling super().__new__(cls). If __new__ returns an instance of a different class, __init__ is NOT called. That is not a bug — it is a deliberate rule: __init__ only runs if __new__ returned an instance of the class being constructed.
In production, you rarely override __new__ unless you need a singleton, a flyweight, or to subclass an immutable type. For 95% of Python classes, __init__ is all you need.
- Subclassing immutable types (tuple, str, int, frozenset) — __init__ cannot modify an already-created immutable, so modification must happen in __new__.
- Singleton or flyweight patterns — __new__ can return an existing cached instance.
- Factory patterns where the returned type depends on the arguments — __new__ can return a subclass instance.
One rule worth memorising: if __new__ returns an instance of a completely different class, __init__ is skipped. This is how some serialisation libraries work — they call __new__ to allocate the shell of an object, then populate attributes directly without going through __init__.
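Two of the cases above can be sketched briefly: an immutable subclass that validates in __new__, and a naive singleton. Celsius and Registry are hypothetical names, and the singleton is deliberately simplistic (no thread safety).

```python
class Celsius(float):
    """Immutable subclass: the value has to be fixed in __new__, not __init__."""

    def __new__(cls, degrees):
        if degrees < -273.15:
            raise ValueError("below absolute zero")
        return super().__new__(cls, degrees)


class Registry:
    """Naive singleton, for illustration only."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, name="default"):
        # Runs on every Registry(...) call, even when __new__ returned the cached object.
        self.name = name


room = Celsius(21.5)

first = Registry("first")
second = Registry("second")

print(room, first is second, first.name)
```

Because __init__ runs again on the second call, first.name ends up as "second". If that is not what you want, guard the initialisation or move it into __new__.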
Always call super().__new__(cls) and return an instance of cls unless you have a very deliberate reason to return something else. If __new__ forgets to return the instance, the constructor call evaluates to None and __init__ never runs. In the Singleton pattern, note that __init__ still runs on every call even though __new__ returns the same object, so initialisation logic in __init__ will overwrite state on every construction.
__str__ vs __repr__: Which to Use for Debugging and Logging
Both __repr__ and __str__ return string representations, but the contract is different. __repr__ should be unambiguous — ideally a string you could pass to eval() to recreate the object. __str__ should be readable to an end user. Python uses __str__ for print() and str(), and __repr__ for the interactive interpreter and the f-string debug format (f"{obj!r}").
If you define only one, define __repr__. When __str__ is missing, Python falls back to __repr__. The reverse is not true: if __str__ is defined but __repr__ is not, the interactive interpreter and logging frameworks that call repr() still show the default <__main__.MyClass object at 0x...> — completely useless in a production log.
A practical rule from years of on-call experience: if you ship a class to production without __repr__, you will eventually spend time staring at a log file trying to figure out which object was which. It takes about five minutes to implement and saves hours over the lifetime of the service.
For sensitive data — passwords, API keys, personal information — mask the value in __repr__ rather than omitting it entirely. Showing User(id=42, email='a***@example.com') is far more useful for debugging than User(id=42) and still protects the data.
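A minimal sketch of the two representations, including masking. The User class, its fields, and the masking rule (first character of the local part) are assumptions for illustration; a masked repr is no longer eval()-able, which is an accepted trade-off here.

```python
class User:
    """Hypothetical model with a debugging repr and a user-facing str."""

    def __init__(self, user_id, email):
        self.user_id = user_id
        self.email = email

    @staticmethod
    def _mask(email):
        # Keep the first character and the domain, hide the rest.
        local, _, domain = email.partition("@")
        return f"{local[:1]}***@{domain}"

    def __repr__(self):
        return f"User(id={self.user_id!r}, email={self._mask(self.email)!r})"

    def __str__(self):
        return f"user #{self.user_id}"


user = User(42, "alice@example.com")

as_repr = repr(user)        # used by the REPL, !r in f-strings, and containers
as_str = str(user)          # used by print() and str()
in_list = str([user])       # containers render their elements with repr()

print(as_repr)
print(as_str)
print(in_list)
```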
Without __repr__, repr() will show a useless memory address. This makes it impossible to distinguish objects in logs without extra context. Every class that gets passed to a logger, stored in an exception, or printed in a traceback deserves a __repr__. It is the highest-ROI dunder you can implement.
__eq__ and __hash__: The Contract That Keeps Dicts and Sets Consistent
Python's dict and set rely on hash tables. When you look up a key, Python computes its hash (via __hash__) to find the bucket, then checks equality (via __eq__) to confirm the match. The contract is simple: if two objects are equal (__eq__ returns True), their hashes MUST be equal. Break this contract and you corrupt the data structure — objects disappear, duplicates appear, and lookups return wrong results.
There are two distinct failure modes that get conflated:
Failure mode one — __eq__ without __hash__: Python implicitly sets __hash__ to None, making the object unhashable. Any attempt to add it to a set or use it as a dict key raises TypeError: unhashable type immediately. This is the loud, obvious failure.
Failure mode two — mutable hash: You implement both __eq__ and __hash__, but __hash__ is based on a mutable field. The object is inserted into a set. The field changes. Now the hash is different, the object is in the wrong bucket, and it can never be found or removed. The set appears to contain an object it cannot retrieve. This is the silent, dangerous failure — no exception, just corrupted state.
The fix for both is the same: base __hash__ only on immutable fields, and match those fields to what __eq__ uses.
For mutable objects where value-based equality is needed, the right answer is usually: do not implement __hash__ at all (leave it None explicitly), and use a list or a different data structure that does not require hashing. If you need to deduplicate mutable objects, extract an immutable key and deduplicate on that.
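Both failure modes, and one possible fix, can be reproduced in a few lines. The Order classes and their fields are hypothetical; integer fields are used so the hashes are deterministic across runs.

```python
class OrderNoHash:
    """Failure mode one: __eq__ without __hash__ makes instances unhashable."""

    def __init__(self, order_id):
        self.order_id = order_id

    def __eq__(self, other):
        if not isinstance(other, OrderNoHash):
            return NotImplemented
        return self.order_id == other.order_id


try:
    {OrderNoHash(1)}
    mode_one = "ok"
except TypeError:
    mode_one = "TypeError"


class OrderMutableHash:
    """Failure mode two: the hash depends on a field that later changes."""

    def __init__(self, order_id, quantity):
        self.order_id = order_id
        self.quantity = quantity

    def __eq__(self, other):
        if not isinstance(other, OrderMutableHash):
            return NotImplemented
        return (self.order_id, self.quantity) == (other.order_id, other.quantity)

    def __hash__(self):
        return hash((self.order_id, self.quantity))


order = OrderMutableHash(1, 5)
orders = {order}
order.quantity = 6                     # hash changes while the object sits in the set
still_found = order in orders          # lookup uses the new hash and misses
still_stored = any(o is order for o in orders)


class Order:
    """One possible fix: equality and hash both use only the immutable identity."""

    def __init__(self, order_id, quantity):
        self._order_id = order_id      # treated as read-only after construction
        self.quantity = quantity

    def __eq__(self, other):
        if not isinstance(other, Order):
            return NotImplemented
        return self._order_id == other._order_id

    def __hash__(self):
        return hash(self._order_id)


fixed = Order(1, 5)
fixed_set = {fixed}
fixed.quantity = 6
fixed_found = fixed in fixed_set
print(mode_one, still_found, still_stored, fixed_found)
```

The second case is the dangerous one: the set still holds the object, yet membership tests, discard(), and remove() can no longer reach it.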
__getattr__, __setattr__, and __delattr__: Attribute Access Control and Pitfalls
Python gives you fine-grained control over attribute access via three hooks: __getattr__ (fallback for failed lookups), __setattr__ (every attribute assignment), and __delattr__ (attribute deletion). __getattr__ is called only when normal attribute lookup fails — it is NOT the same as __getattribute__, which intercepts every access.
The difference matters enormously in practice. If you access an attribute that exists, __getattr__ is never called — __getattribute__ handles it. __getattr__ is specifically the fallback of last resort.
The underscore guard in the proxy example below deserves an explicit explanation because engineers routinely remove it thinking it is defensive noise. Without it, accessing self._config inside __getattr__ would trigger __getattr__ again (because _config might not exist yet during the early stages of __init__ before the line self._config = config executes). The guard raises AttributeError immediately for underscore-prefixed names, which tells Python to abort the lookup rather than recurse.
Inside __setattr__, never write self.x = value — that calls __setattr__ again and creates infinite recursion. Always delegate through super().__setattr__(name, value) or object.__setattr__(self, name, value) for attributes that need to bypass the custom logic.
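A sketch of a configuration proxy that combines the underscore guard with a non-recursive __setattr__. ConfigProxy, _config, and get() are hypothetical names chosen for this example.

```python
class ConfigProxy:
    """Hypothetical proxy exposing dict keys as attributes."""

    def __init__(self, config):
        # Bypass our own __setattr__ when storing internal state.
        object.__setattr__(self, "_config", dict(config))

    def __getattr__(self, name):
        # Only reached when normal lookup fails.
        if name.startswith("_"):
            # Guard: without this, a missing _config would re-enter __getattr__ forever.
            raise AttributeError(name)
        try:
            return self._config[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if name.startswith("_"):
            super().__setattr__(name, value)   # never `self.x = value` in here
        else:
            self._config[name] = value

    def get(self, name, default=None):
        # Explicit defaults live here rather than inside attribute lookup.
        return self._config.get(name, default)


cfg = ConfigProxy({"host": "db.internal"})
cfg.port = 5432                                # routed into the underlying dict

# Early-construction scenario: an instance whose __init__ has not run yet.
shell = ConfigProxy.__new__(ConfigProxy)
shell_has_host = hasattr(shell, "host")        # guard turns this into a clean False

print(cfg.host, cfg.port, cfg.get("timeout", 30), shell_has_host)
```

Removing the guard would make hasattr(shell, "host") recurse until RecursionError, because self._config inside __getattr__ would itself fall through to __getattr__.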
Inside __setattr__, always delegate through super().__setattr__(name, value) or object.__setattr__(self, name, value) to avoid infinite recursion. The underscore guard in __getattr__ (raise AttributeError for names starting with an underscore) is your safety net for early-construction scenarios; do not remove it. Where a default value is needed, expose a get() method with an explicit default argument; do not hide missing configuration inside attribute lookup.
__call__, __enter__, __exit__, and __del__: The Production Dunders Nobody Teaches
Most tutorials cover __init__, __repr__, and __eq__ and stop there. But three other dunders appear constantly in production Python and deserve explicit coverage.
__call__ makes any instance behave like a function. Any class with __call__ defined can be invoked with parentheses — instance(args). This is how decorators implemented as classes work, how stateful callables maintain configuration, and how middleware layers stay composable without inheritance. In 2026, __call__ is especially common in ML inference pipelines where a model object is callable, and in dependency injection containers that need to produce objects on demand.
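As an illustration of a class-based decorator built on __call__, here is a minimal sketch. CountCalls and square are hypothetical names; functools.update_wrapper copies the wrapped function's metadata onto the instance.

```python
import functools


class CountCalls:
    """Class-based decorator: the instance replaces the function and stays callable."""

    def __init__(self, func):
        functools.update_wrapper(self, func)   # copy __name__, __doc__, etc.
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1                        # state survives between invocations
        return self.func(*args, **kwargs)


@CountCalls
def square(x):
    """Return x squared."""
    return x * x


results = [square(n) for n in range(4)]
print(results, square.calls, callable(square))
```

The decorated name now refers to a CountCalls instance, so the call counter lives on the object rather than in a closure or a global.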
__enter__ and __exit__ power the with statement. __enter__ is called at the start of a with block and its return value is bound to the as variable. __exit__ is called at the end, whether the block exits normally or through an exception. __exit__ receives the exception type, value, and traceback — if it returns True, the exception is suppressed; if it returns False or None, the exception propagates. This is the correct pattern for resource management in Python, replacing try/finally in almost every case.
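The __exit__ return-value rule can be seen in a short sketch. ManagedResource, its events list, and the suppress parameter are hypothetical; a real resource would open and close something in place of the list appends.

```python
class ManagedResource:
    """Hypothetical resource that records its lifecycle in an events list."""

    def __init__(self, name, suppress=()):
        self.name = name
        self.suppress = suppress      # exception types __exit__ should swallow
        self.events = []

    def __enter__(self):
        self.events.append("open")
        return self                   # bound to the `as` variable

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("close")   # runs on normal exit and on exceptions
        # True suppresses the exception; False or None lets it propagate.
        return exc_type is not None and issubclass(exc_type, self.suppress)


res = ManagedResource("db")
propagated = False
try:
    with res as handle:
        handle.events.append("work")
        raise ValueError("boom")
except ValueError:
    propagated = True

quiet = ManagedResource("cache", suppress=(KeyError,))
with quiet:
    raise KeyError("missing")         # swallowed because __exit__ returns True
reached_after_block = True

print(res.events, propagated, quiet.events, reached_after_block)
```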
__del__ is the finalizer. In CPython it is called when an object's reference count drops to zero. It looks like the right place for cleanup, but it is unreliable in practice: objects caught in reference cycles are finalized only when the cycle garbage collector eventually runs, implementations without reference counting such as PyPy run it at an unpredictable later time, and it is not guaranteed to run at all for objects still alive at interpreter shutdown. The rule is simple: if a resource must be released, use a context manager (__enter__/__exit__) or an explicit close() method. Reserve __del__ for logging or debugging only, never for resource release that must happen.
If a resource must be released, use a context manager or an explicit close() method. __del__ is for optional cleanup at best, and for logging at worst.
__slots__: Memory Efficiency, Inheritance Rules, and @dataclass(slots=True)
__slots__ is a declarative way to prevent the creation of a per-instance __dict__, saving memory and speeding up attribute access. Each slot reserves space for an attribute as a direct pointer in a fixed-size array, bypassing the dict lookup entirely.
The memory saving is concrete. A regular class instance in CPython carries a per-instance __dict__ on top of the standard instance overhead, and the cost of that dictionary grows with the number of attributes. A slotted instance replaces it with a fixed array of pointers, typically 40–50% smaller per instance, though the exact figure depends on the Python version and the attribute count, so measure it for your own classes. For a service that creates millions of geolocation points or financial tick records, that difference matters.
The complexity cost is real too. Every class in the inheritance hierarchy must define its own __slots__. A subclass that does not define __slots__ will have a __dict__ anyway, defeating the memory saving. Multiple inheritance with __slots__ requires careful coordination: if two parent classes both define non-empty __slots__, CPython refuses to create the subclass and raises TypeError (instance lay-out conflict), so at most one base can carry non-empty slots and the others should declare __slots__ = ().
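Private attributes with name mangling need care. CPython mangles the names listed in __slots__ the same way it mangles attribute access inside the class body, so a class Foo that assigns self.__price can declare the slot as '__price' and the storage is created under '_Foo__price'. Trouble starts when the slot and the access are mangled against different class names, for example when a subclass method touches self.__price. The result is an AttributeError that looks nothing like a slots issue.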
In Python 3.10+, @dataclass(slots=True) generates both the class and its __slots__ automatically, handles the mangled names correctly, and avoids most of the manual slot management pitfalls. This is the recommended way to use slots for most teams in 2026.
The Silent Set Corruption: When __eq__ Without __hash__ Breaks Production
- Never implement __eq__ without __hash__ unless you explicitly want unhashable instances — and if you want unhashable, set __hash__ = None explicitly so the intent is clear.
- Use @dataclass(frozen=True) for value objects to avoid manual boilerplate and to enforce immutability.
- Add a unit test that inserts two equal objects into a set and checks the set has exactly one element. That test would have caught this on day one.
- When a TypeError surfaces in production and you apply a workaround under pressure, the root cause investigation still matters. The workaround may introduce a second bug that is harder to find.
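The frozen dataclass and the set test suggested in the list above might look like this. OrderKey and its fields are hypothetical; the test function is written so it can run under pytest or be called directly.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class OrderKey:
    """Hypothetical value object: frozen=True generates __eq__ and __hash__ together."""

    order_id: int
    customer: str


def test_equal_orders_collapse_in_set():
    a = OrderKey(1, "acme")
    b = OrderKey(1, "acme")
    assert a == b
    assert hash(a) == hash(b)      # the contract: equal objects, equal hashes
    assert len({a, b}) == 1        # so the set keeps exactly one element


def test_order_key_is_immutable():
    key = OrderKey(1, "acme")
    try:
        key.order_id = 2           # frozen dataclasses reject assignment
    except FrozenInstanceError:
        return
    raise AssertionError("expected FrozenInstanceError")


test_equal_orders_collapse_in_set()
test_order_key_is_immutable()
```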
If __len__ is attached to an instance rather than the class, len() will not find it because Python looks up dunders on the type, not the object. Also confirm the method returns an integer; returning a float raises TypeError.
Common mistakes to avoid
- Defining __eq__ without __hash__
- Using self.attr = value inside __setattr__ without delegating to super(). Use super().__setattr__(name, value) for internal assignments inside __setattr__.
- Assuming __getattr__ intercepts all attribute access. To intercept every access, call super().__getattribute__(name) inside __getattribute__ and handle AttributeError to delegate to __getattr__.
- Using __slots__ without accounting for mangled private attribute names
- Using __del__ for resource cleanup. If an explicit close() method is needed, document it clearly and never rely on __del__ to call it.
- Not implementing __repr__ in production classes
- Monkey-patching a dunder on an instance expecting built-in functions to use it
Interview Questions on This Topic
What is the difference between __new__ and __init__? When would you override __new__?