Python Pickle — __reduce__() Remote Code Execution
__reduce__() enables arbitrary code execution during deserialization.
- pickle converts arbitrary Python objects to byte streams for storage or transmission.
- Use pickle.dump() to write to a file and pickle.load() to restore.
- Protocol versions (0-5) trade compatibility for speed and size; use protocol 4 or 5 for modern apps.
- Pickle is often faster than JSON for Python-native objects, but it is insecure with untrusted data.
- Production gotcha: unpickling attacker-controlled data executes arbitrary code via __reduce__.
Quick Pickle Debug Cheat Sheet
UnpicklingError: invalid load key, '\x00'.
```shell
python -c "with open('data.pkl','rb') as f: print(f.read(20))"
python -c "with open('data.pkl','rb') as f: import pickle; print(pickle.load(f))"
```
AttributeError: Can't get attribute 'MyClass'
```shell
python -c "from mymodule import MyClass; import pickle; data = pickle.load(open('data.pkl','rb'))"
python -c "import pickle; print(type(pickle.load(open('data.pkl','rb'))).__name__)"
```
Production Incident
Never call pickle.load() on data from an untrusted source. Implement a restricted Unpickler if you must use pickle with external input. Consider JSON or MessagePack instead; dill carries the same code-execution risk as pickle, so it is not a safe alternative.
Production Debug Guide
Common symptoms when pickle.load() fails and how to fix them
…pickle.dump(), or verify file size.

Every serious Python application eventually hits the same wall: you spend time computing something valuable (a trained machine learning model, a complex graph structure, a parsed configuration tree), and then your program ends and all of it vanishes. The next run starts from scratch, wasting time and resources. This is one of the most quietly expensive problems in Python development, and most beginners don't realise there's a clean, built-in solution sitting right in the standard library.
The pickle module solves this with object serialization: it converts most Python objects into a byte stream that can be written to disk, stored in a database, or sent across a network, and then deserializes that stream back into an equivalent live object. Unlike writing to a CSV or JSON file, pickle doesn't care what shape your data is in. It handles nested containers, custom class instances, module-level functions (stored by reference), and even circular references without you lifting a finger. The notable exceptions are lambdas, nested functions, and live resources such as open files or sockets.
By the end of this article you'll understand exactly how pickle works under the hood, when it's the right tool versus when you should reach for JSON or shelve, how to safely serialize and deserialize complex Python objects including class instances, and — critically — the security trap that catches even experienced developers off guard. You'll walk away with patterns you can drop into real projects immediately.
What is the pickle module in Python?
pickle is Python's built-in serialization module. It converts most Python objects into a byte stream that can be saved to disk or sent over a network, then reconstructed later. The key advantage over formats like JSON or CSV is that pickle handles Python-specific objects, including custom class instances, nested structures, and even circular references, without you writing any conversion code.
Here's the simplest possible example: serialize a dictionary to a file, then read it back.
```python
import pickle

# Serialize a dict to a file
data = {'name': 'model_v1', 'accuracy': 0.95, 'layers': [128, 64, 10]}
with open('model.pkl', 'wb') as f:
    pickle.dump(data, f)

# Deserialize it back
with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored == data)  # True
print(restored['name'])  # model_v1
```

Output:
```
True
model_v1
```
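The claim that pickle copes with nested structures and circular references is easy to check directly. The sketch below (all variable names are illustrative) builds a dict that contains itself and a tuple holding the same list twice, then round-trips both with pickle.dumps() and pickle.loads(). Because pickle memoizes objects it has already written, identity inside the payload survives, while the restored objects are new copies rather than the originals.

```python
import pickle

# A dict that refers back to itself through its "children" list (a cycle)
node = {"name": "root", "children": []}
node["children"].append(node)

# Two references to the same list inside one container
shared = [1, 2, 3]
pair = (shared, shared)

node_copy = pickle.loads(pickle.dumps(node))
pair_copy = pickle.loads(pickle.dumps(pair))

print(node_copy["children"][0] is node_copy)  # True: cycle preserved
print(pair_copy[0] is pair_copy[1])           # True: sharing preserved
print(pair_copy[0] is shared)                 # False: a new, equal list
```

The same memoization is why pickle does not recurse forever on self-referencing data, which a naive recursive serializer would.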
How to Serialize and Deserialize Objects with pickle
The two core functions are pickle.dump() to write an object to a file, and pickle.load() to read it back. You can also use pickle.dumps() to get bytes and pickle.loads() from bytes. Always open your files in binary mode: 'wb' for writing, 'rb' for reading. The protocol argument controls the binary format version.
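When you don't need a file (for example, to store the result in a cache or a database column), the in-memory pair pickle.dumps() and pickle.loads() works the same way. A minimal sketch, with a hypothetical config dict as the payload:

```python
import pickle

config = {"lr": 0.001, "layers": (128, 64), "tags": {"baseline", "v1"}}

# dumps() returns bytes instead of writing to a file
blob = pickle.dumps(config, protocol=pickle.HIGHEST_PROTOCOL)
print(type(blob))  # <class 'bytes'>

# loads() rebuilds an equal object from those bytes
restored = pickle.loads(blob)
print(restored == config)  # True
print(restored is config)  # False: a new object, not the original
```

Note that tuples and sets come back as tuples and sets, which a JSON round trip would not preserve.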
Protocol versions range from 0 (ASCII-based, human-readable) to 5 (binary, with support for out-of-band buffers). Specifying protocol=pickle.HIGHEST_PROTOCOL usually gives the best speed and smallest output, but Python versions that predate that protocol cannot read the result.
```python
import pickle

# Pickle a complex object with different protocols
data = {"name": "model", "params": [1.2, 3.4], "hyperparams": {"lr": 0.001}}

with open('data_proto4.pkl', 'wb') as f:
    pickle.dump(data, f, protocol=4)

with open('data_proto5.pkl', 'wb') as f:
    pickle.dump(data, f, protocol=5)

# Read back (protocol auto-detected)
with open('data_proto4.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored)
```
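"Protocol auto-detected" works because pickles written with protocol 2 or later start with the PROTO opcode (byte 0x80) followed by one byte holding the protocol number; protocols 0 and 1 have no such header. The sketch below uses a hypothetical helper, sniff_protocol(), to read that header for every protocol the running interpreter supports. The same first-bytes check is useful when debugging "invalid load key" errors: a file that begins with b'\x00' does not start with any valid pickle opcode, which often means it was never fully written.

```python
import pickle

data = {"name": "model", "params": [1.2, 3.4]}

# Both values depend on the running Python version
print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)


def sniff_protocol(blob: bytes):
    """Hypothetical helper: return the protocol from a pickle header, or None.

    Protocol 2+ pickles begin with b'\\x80' plus one protocol byte;
    protocols 0 and 1 carry no header, and non-pickles have none either.
    """
    if len(blob) >= 2 and blob[0] == 0x80:
        return blob[1]
    return None


for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=proto)
    assert pickle.loads(blob) == data  # loads() needs no protocol argument
    print(proto, sniff_protocol(blob), len(blob), blob[:12])
```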
Tip: use pickle.dump() for files and pickle.dumps() for bytes.
Customizing Serialization with __getstate__ and __setstate__
Not every object attribute is serializable by default. File handles, network connections, and database sessions need special handling. Override __getstate__ to return a dict of picklable state, and __setstate__ to restore resources after deserialization.
A common pattern: your class holds a database connection that cannot be pickled. You exclude it in __getstate__ and recreate it in __setstate__.
```python
import pickle


class DatabaseConnection:
    def __init__(self, dsn):
        self.dsn = dsn
        self.conn = self._create_connection()

    def _create_connection(self):
        # Stand-in for a real connection object, which could not be pickled
        return f"Connected to {self.dsn}"

    def __getstate__(self):
        # Pickle only the DSN; leave the live connection out
        return {'dsn': self.dsn}

    def __setstate__(self, state):
        # Restore the DSN, then rebuild the connection
        self.__dict__.update(state)
        self.conn = self._create_connection()


conn = DatabaseConnection("db://host:port/db")
with open('conn.pkl', 'wb') as f:
    pickle.dump(conn, f)

with open('conn.pkl', 'rb') as f:
    loaded = pickle.load(f)

print(loaded.conn)  # Connected to db://host:port/db
```
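The string above only stands in for a connection. For a resource that genuinely cannot be pickled, a threading.Lock is a convenient example: pickling an object that holds one raises TypeError. The sketch below uses a hypothetical Counter class. Its __getstate__ copies __dict__ before dropping the lock, so pickling does not mutate the live object, and __setstate__ creates a fresh lock on the restored copy.

```python
import pickle
import threading


class Counter:
    """Hypothetical thread-safe counter that holds an unpicklable lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()  # locks cannot be pickled

    def increment(self):
        with self._lock:
            self.value += 1

    def __getstate__(self):
        # Copy first so the original object keeps its lock
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore plain attributes, then recreate the lock
        self.__dict__.update(state)
        self._lock = threading.Lock()


c = Counter()
c.increment()
c.increment()

restored = pickle.loads(pickle.dumps(c))
restored.increment()  # works because __setstate__ recreated the lock
print(restored.value)  # 3
```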
Security Risks and Safe Usage
The biggest danger with pickle is that pickle.load() can execute arbitrary code. This happens through the __reduce__ protocol, which allows objects to specify any callable and arguments to reconstruct themselves. A malicious pickle can run os.system, open network sockets, or delete files.
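To see why, here is a deliberately harmless demonstration. The hypothetical Payload class returns a (callable, args) pair from __reduce__, and pickle.loads() calls that callable while rebuilding the object. A real attack would name something like os.system instead of the record() function used here. Note that the pickled bytes reference only record, not Payload, so the attacker's class never needs to exist on the machine that loads the data.

```python
import pickle

executed = []  # records side effects so we can observe them


def record(message):
    # Harmless stand-in for os.system: it only appends to a list
    executed.append(message)
    return message


class Payload:
    def __reduce__(self):
        # "To rebuild me, call record('code ran during unpickling')"
        return (record, ("code ran during unpickling",))


blob = pickle.dumps(Payload())  # nothing runs yet
result = pickle.loads(blob)     # record() runs here

print(executed)  # ['code ran during unpickling']
print(result)    # the return value of record(), not a Payload
```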
The only safe rule: never unpickle data from an untrusted source. If you must accept external input, use a restricted Unpickler that whitelists allowed classes. Even then, consider using a separate process or container for isolation.
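A restricted Unpickler works by overriding find_class, which pickle calls for every global (class or function) a stream references. The sketch below follows the allow-list pattern from the Python documentation; the exact SAFE_BUILTINS set is an assumption to adapt to the types you actually expect. Plain dicts, lists, strings, and numbers are encoded with dedicated opcodes and never reach find_class, so they load normally, while anything else is refused before it can be called.

```python
import builtins
import io
import pickle

# Assumed allow-list: tailor it to the types your data really contains
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle references
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in replacement for pickle.loads() that enforces the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    # Hypothetical hostile object: asks pickle to call print() on load
    def __reduce__(self):
        return (print, ("this should never run",))


print(restricted_loads(pickle.dumps({"ok": [1, 2, range(3)]})))

try:
    restricted_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)
```

Because find_class raises before the REDUCE step, the forbidden callable is never invoked.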
pickle.load() on attacker-controlled data can execute any Python code. If you must unpickle external input, restrict which classes can be loaded with a custom Unpickler that overrides find_class. Even that is not foolproof, so prefer JSON or Protocol Buffers for untrusted input.
Never call pickle.load() on data you didn't create.
Performance Comparison: pickle vs JSON vs dill
pickle is generally faster than JSON for Python-native objects because it uses a binary format and can encode arbitrary types without additional conversion. However, for simple dicts and lists, JSON can be faster due to optimized C implementations. dill extends pickle and adds support for lambdas and closures, but comes with a size and speed overhead.
Here's a rough benchmark: pickling a 10MB list of strings with protocol 4 or 5 was about 3x faster than JSON, and the output was about 20% smaller. Expect different numbers for different data, Python versions, and hardware. But if you need interoperability, JSON wins.
| Feature | pickle | JSON | dill |
|---|---|---|---|
| Python object support | Most objects (class instances, module-level functions by reference, circular refs) | Basic types only (dict, list, str, int, etc.) | pickle's coverage plus lambdas, nested functions, and closures |
| Security | Dangerous — executes arbitrary code | Safe — no code execution | Same risk as pickle |
| Speed (large objects) | Fast (binary, protocol 5) | Slower (text-based, conversion overhead) | Slower than pickle (extension overhead) |
| Interoperability | Python only | Cross-language | Python only |
| File size (nested data) | 20-30% smaller than JSON | Larger due to text overhead | Similar to pickle |
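The speed and size figures above come from one rough measurement, so it is worth timing your own data. A minimal sketch using the standard library's timeit follows; the 100,000 short strings are a hypothetical stand-in rather than the 10MB dataset, the helper measure() is illustrative, and protocol=5 assumes Python 3.8 or later. dill is a third-party package and is left out.

```python
import json
import pickle
import timeit

# Hypothetical test data: 100,000 short strings (not the 10MB dataset)
data = [f"item-{i}" for i in range(100_000)]


def measure(label, dumps, loads, number=5):
    """Return output size and average dump/load time for one serializer."""
    blob = dumps(data)
    assert loads(blob) == data  # sanity check: the round trip is lossless
    dump_s = timeit.timeit(lambda: dumps(data), number=number) / number
    load_s = timeit.timeit(lambda: loads(blob), number=number) / number
    return {"format": label, "bytes": len(blob), "dump_s": dump_s, "load_s": load_s}


results = [
    measure("pickle-5", lambda d: pickle.dumps(d, protocol=5), pickle.loads),
    measure("json", lambda d: json.dumps(d).encode("utf-8"), json.loads),
]

for r in results:
    print(f"{r['format']:>8}: {r['bytes']:>9,} bytes, "
          f"dump {r['dump_s'] * 1e3:.1f} ms, load {r['load_s'] * 1e3:.1f} ms")
```

Encoding the JSON text to UTF-8 bytes keeps the size comparison fair, since pickle already produces bytes.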
🎯 Key Takeaways
- pickle converts Python objects to byte streams and back — great for internal caching and model persistence.
- Always use binary mode ('wb'/'rb') and specify a protocol version to avoid version incompatibilities.
- Never unpickle data from untrusted sources — it can execute arbitrary code via __reduce__.
- Use __getstate__ and __setstate__ to control serialization of complex objects with non-serializable fields.
- For cross-language or user-supplied data, choose JSON or protocol buffers over pickle.
Interview Questions on This Topic
- What is Python's pickle module and how does it differ from JSON serialization? (Junior)
- Explain the pickle protocol versions and when to use each. (Mid-level)
- How would you handle a pickle security vulnerability in a production system that accepts serialized objects from authenticated users? (Senior)
- Describe a scenario where __getstate__ and __setstate__ are essential in pickle. (Mid-level)
Frequently Asked Questions
What is the pickle module in Python, in simple terms?
Think of pickle as Python's way to take a snapshot of an object and save it to a file. Later, you can load that snapshot and get back an equivalent object, even if it's a complex nested structure or a custom class instance.
Why is pickle considered insecure?
pickle can execute arbitrary Python code during unpickling because it can instantiate any class and call any function via the __reduce__ method. Attackers can craft a malicious pickle payload that runs system commands or exfiltrates data.
How can I make pickle safe for limited use?
Create a subclass of pickle.Unpickler and override the find_class method to allow only a whitelist of safe modules and classes. Even so, consider stronger isolation like running the unpickling in a container or using a non-Python serialization format.
Can pickle handle lambda functions?
No, the standard pickle cannot serialize lambdas, nested functions, or interactive sessions. For that, use the dill library which extends pickle's capabilities.
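The reason is that pickle stores functions by reference (module plus name) rather than by code. A module-level function round-trips to the very same object because it can be looked up again by name; a lambda has no importable name, so pickling it fails. A minimal sketch with a hypothetical square() function (run as a script, so it lives in __main__):

```python
import pickle


def square(x):
    # Module-level function: pickle stores only its module and name
    return x * x


blob = pickle.dumps(square)
print(pickle.loads(blob) is square)  # True: looked up by name, not copied

lambda_error = None
try:
    pickle.dumps(lambda x: x * x)  # no importable name to look up
except (pickle.PicklingError, AttributeError) as exc:
    lambda_error = exc
    print("cannot pickle lambda:", type(exc).__name__)
```

Because only the name is stored, the loading process must be able to import the same function; the code itself is never in the pickle.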
What protocol should I use for cross-version Python compatibility?
Protocol 4 was added in Python 3.4, so use protocol=4 when some readers run Python 3.4 to 3.7. Protocol 5 was added in Python 3.8; it is the best choice when every reader runs 3.8 or later, but older interpreters cannot load it. If you control both ends and use the same Python version, use pickle.HIGHEST_PROTOCOL.
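That rule can be written down as a small lookup. The sketch below is a hypothetical helper, protocol_for(), that takes the oldest Python version any reader will use and returns the highest protocol that version can load, capped at what the current interpreter can write. It assumes all readers run Python 3 and encodes only the release each protocol first appeared in (3 in 3.0, 4 in 3.4, 5 in 3.8).

```python
import pickle

# Python 3 release (major, minor) in which each protocol first appeared
PROTOCOL_ADDED_IN = {3: (3, 0), 4: (3, 4), 5: (3, 8)}


def protocol_for(oldest_reader: tuple) -> int:
    """Hypothetical helper: highest protocol every reader >= oldest_reader can load."""
    if oldest_reader < (3, 0):
        raise ValueError("this sketch only covers Python 3 readers")
    usable = [p for p, added in PROTOCOL_ADDED_IN.items() if added <= oldest_reader]
    # Never ask for more than the running interpreter can write
    return min(max(usable), pickle.HIGHEST_PROTOCOL)


blob = pickle.dumps({"ok": True}, protocol=protocol_for((3, 6)))
print(protocol_for((3, 6)), protocol_for((3, 12)))
```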
Why does pickle.load() sometimes raise 'AttributeError' for missing class?
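Pickle stores a reference to the class (its module path and name), not the class's code, so the class must be importable under that same path at unpickling time. If you renamed, moved, or deleted the class, pickle cannot reconstruct the object. A common case is a class defined in a script run as __main__ and then loaded from a different entry point. Fixes include keeping an alias at the old location, overriding Unpickler.find_class to map old names to new ones, or pickling plain data (dicts, lists) instead of class instances.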
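Here is a minimal sketch of the find_class mapping approach. It pickles an instance of a hypothetical OldName class, deletes that class to simulate a refactor, and then loads the data into a hypothetical NewName class through a renaming table keyed by (module, name). Any name not in the table falls through to the normal lookup, so unrelated classes behave as before.

```python
import io
import pickle


class OldName:
    """Hypothetical class as it existed when the data was pickled."""

    def __init__(self, value):
        self.value = value


blob = pickle.dumps(OldName(42))
del OldName  # simulate renaming the class after the data was written


class NewName:
    """Hypothetical replacement class with the same attributes."""

    def __init__(self, value):
        self.value = value


# Hypothetical mapping from the old (module, name) to the current class
RENAMED = {(__name__, "OldName"): NewName}


class RenamingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        target = RENAMED.get((module, name))
        if target is not None:
            return target
        return super().find_class(module, name)  # normal lookup otherwise


obj = RenamingUnpickler(io.BytesIO(blob)).load()
print(type(obj).__name__, obj.value)  # NewName 42
```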
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.