Python Pickle — __reduce__() Remote Code Execution

pickle's __reduce__() enables arbitrary code execution during deserialization.
⚙️ Intermediate — basic Python knowledge assumed
In this tutorial, you'll learn
  • pickle converts Python objects to byte streams and back — great for internal caching and model persistence.
  • Always use binary mode ('wb'/'rb') and specify a protocol version to avoid version incompatibilities.
  • Never unpickle data from untrusted sources — it can execute arbitrary code via __reduce__.
Quick Answer
  • pickle converts arbitrary Python objects to byte streams for storage or transmission.
  • Use pickle.dump() to write to a file and pickle.load() to restore.
  • Protocol versions (0-5) trade compatibility for speed and size; use protocol 4 or 5 for modern apps.
  • Pickle is faster than JSON for Python-native objects but insecure with untrusted data.
  • Production gotcha: unpickling attacker-controlled data executes arbitrary code via __reduce__.
Quick Pickle Debug Cheat Sheet

Use these commands to diagnose and fix pickle issues fast.

UnpicklingError: invalid load key '\x00'

Immediate Action: Check that the file was opened in binary mode ('rb', not 'r').
Commands:
python -c "with open('data.pkl','rb') as f: print(f.read(20))"
python -c "with open('data.pkl','rb') as f: import pickle; print(pickle.load(f))"
Fix Now: Re-open the file in 'rb' mode and confirm that the reading Python version supports the protocol the file was written with.

AttributeError: Can't get attribute 'MyClass'

Immediate Action: Import the class or module that contains MyClass before unpickling.
Commands:
python -c "from mymodule import MyClass; import pickle; data = pickle.load(open('data.pkl','rb'))"
python -c "import pickle; print(type(pickle.load(open('data.pkl','rb'))).__name__)"
Fix Now: Make the class importable under the module path recorded in the pickle, either by restoring the module or by adding an alias at the old location.
Production Incident

Remote Code Execution via Unpickling Untrusted Data

A single pickle.load() call on a user-uploaded file ran arbitrary shell commands.
Symptom: The development server suddenly started spawning unexpected processes. Investigation showed that an attacker had uploaded a crafted pickle file that executed os.system('rm -rf /').
Assumption: The team assumed pickle was safe because it is part of the standard library and they only loaded files from authenticated users.
Root cause: pickle's __reduce__ method lets an object name an arbitrary callable and arguments to be invoked during deserialization. The attacker embedded a malicious __reduce__ that called os.system with a destructive command.
Fix: Never unpickle untrusted data. Switch to a safe serialization format (JSON, protobuf) for user-supplied input. If pickle is unavoidable, implement a custom Unpickler that restricts allowed classes using find_class.
Key Lesson
  • Never pass data from an untrusted source to pickle.load().
  • Implement a restricted Unpickler if you must use pickle with external input.
  • Prefer JSON or MessagePack for external input. dill carries the same code-execution risk as pickle.
Production Debug Guide

Common symptoms when pickle.load() fails and how to fix them

  • AttributeError: Can't get attribute 'ClassName' on <module>
    The class definition is missing or the module path changed. Add the class to the namespace or import the correct module before unpickling.
  • pickle.UnpicklingError: invalid load key
    The byte stream is corrupted or not a valid pickle. Check file integrity and protocol version, and make sure you are reading in binary mode ('rb').
  • Pickle data truncated or EOFError
    The file was not fully written. Write inside a with block (or call .flush() and .close() after pickle.dump()), and verify the file size.
  • ModuleNotFoundError when unpickling
    The pickled object depends on a module that is missing in the current environment. Install the required module or use a portable serialization format such as JSON.
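
One way to avoid the truncated-file case in the list above is to write the pickle to a temporary file and rename it into place only once it is complete. This is a minimal sketch under that assumption, not something the pickle module does for you; `atomic_pickle_dump` is a hypothetical helper name.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(obj, path, protocol=4):
    """Write obj to path so readers never see a half-written file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=protocol)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # Swap the finished file into place in one step (atomic on POSIX).
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies mid-write, the previous file at `path` is left untouched and only a stray `.tmp` file remains.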

Every serious Python application eventually hits the same wall: you spend time computing something valuable — a trained machine learning model, a complex graph structure, a parsed configuration tree — and then your program ends and all of it vanishes. The next run starts from scratch, wasting time and resources. This is one of the most quietly expensive problems in Python development, and most beginners don't realise there's a clean, built-in solution sitting right in the standard library.

The pickle module solves this by giving you object serialization: the ability to convert almost any Python object into a byte stream that can be written to disk, stored in a database, or sent across a network, and then deserialized back into an equivalent live object. Unlike writing to a CSV or JSON file, pickle doesn't care what shape your data is in. It handles nested objects, custom class instances, and even circular references without you lifting a finger. The main exceptions are lambdas, nested functions, and live resources such as open files or sockets, which the standard pickle cannot serialize.

By the end of this article you'll understand exactly how pickle works under the hood, when it's the right tool versus when you should reach for JSON or shelve, how to safely serialize and deserialize complex Python objects including class instances, and — critically — the security trap that catches even experienced developers off guard. You'll walk away with patterns you can drop into real projects immediately.

What is the pickle Module in Python?

pickle is Python's built-in serialization module. It converts almost any Python object into a byte stream that can be saved to disk or sent over a network, then reconstructed later. The key advantage over formats like JSON or CSV is that pickle can handle arbitrary Python data structures, including custom class instances, nested structures, and even circular references, without you writing any conversion code.

Here's the simplest possible example: serialize a dictionary to a file, then read it back.

io_thecodeforge/pickle_basics.py · PYTHON
import pickle

# Serialize a dict to a file
data = {'name': 'model_v1', 'accuracy': 0.95, 'layers': [128, 64, 10]}
with open('model.pkl', 'wb') as f:
    pickle.dump(data, f)

# Deserialize it back
with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored == data)  # True
print(restored['name'])  # model_v1
▶ Output
True
model_v1
🔥Where pickle shines
Use pickle for internal Python persistence — caching results, saving ML models, checkpointing long computations. Avoid it for cross-language or untrusted data.
📊 Production Insight
The biggest gotcha: pickle relies on class definitions being importable at unpickling time.
If you rename a module or delete a class, old pickle files become unreadable.
Rule: Always version your pickle format and keep backward compatibility in mind.
🎯 Key Takeaway
pickle serializes most Python objects to bytes.
Binary mode ('wb'/'rb') is mandatory: in Python 3, text mode raises an error or corrupts the stream.
The deserializing environment must be able to import the same classes.
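
The advice above to version your pickle format could be sketched as a small envelope around the payload. Everything here is an illustrative assumption: the `format_version` key, the `save`/`load` helper names, and the version 1 layout with a `sizes` field are hypothetical, not part of pickle.

```python
import pickle

FORMAT_VERSION = 2  # hypothetical: bump whenever the payload layout changes


def save(payload, path):
    # Wrap the payload so a future reader can tell which layout it holds.
    envelope = {"format_version": FORMAT_VERSION, "payload": payload}
    with open(path, "wb") as f:
        pickle.dump(envelope, f, protocol=4)


def load(path):
    with open(path, "rb") as f:
        envelope = pickle.load(f)  # trusted, locally written files only
    version = envelope.get("format_version")
    if version == 1:
        # Hypothetical migration: version 1 stored the layers under 'sizes'.
        payload = dict(envelope["payload"])
        payload["layers"] = payload.pop("sizes")
        return payload
    if version == FORMAT_VERSION:
        return envelope["payload"]
    raise ValueError(f"Unsupported pickle format version: {version!r}")
```

Storing plain dicts inside the envelope, rather than class instances, also keeps old files readable when a class is renamed.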

How to Serialize and Deserialize Objects with pickle

The two core functions are pickle.dump() to write an object to a file, and pickle.load() to read it back. You can also use pickle.dumps() to get bytes and pickle.loads() from bytes. Always open your files in binary mode: 'wb' for writing, 'rb' for reading. The protocol argument controls the binary format version.

Protocol versions range from 0 (text-based, readable) to 5 (binary, with out-of-band data support). Specifying protocol=pickle.HIGHEST_PROTOCOL gives you the best speed and smallest size, but may not be compatible with older Python versions.

io_thecodeforge/protocol_example.py · PYTHON
import pickle

# Pickle a complex object with different protocols
data = {"name": "model", "params": [1.2, 3.4], "hyperparams": {"lr": 0.001}}

with open('data_proto4.pkl', 'wb') as f:
    pickle.dump(data, f, protocol=4)

with open('data_proto5.pkl', 'wb') as f:
    pickle.dump(data, f, protocol=5)

# Read back (protocol auto-detected)
with open('data_proto4.pkl', 'rb') as f:
    restored = pickle.load(f)
print(restored)
▶ Output
{'name': 'model', 'params': [1.2, 3.4], 'hyperparams': {'lr': 0.001}}
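
The in-memory variants mentioned above, pickle.dumps() and pickle.loads(), work the same way without a file. The short sketch below round-trips the same dictionary through every protocol the running interpreter supports and records the size of each result; the sizes are deliberately not listed here because they vary with the data and the Python version.

```python
import pickle

data = {"name": "model", "params": [1.2, 3.4], "hyperparams": {"lr": 0.001}}

sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=protocol)  # bytes, no file involved
    assert pickle.loads(blob) == data             # protocol is auto-detected on load
    sizes[protocol] = len(blob)

for protocol, size in sizes.items():
    print(f"protocol {protocol}: {size} bytes")
```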
📊 Production Insight
Using protocol 5 (Python 3.8+) with out-of-band data can reduce memory copies for large numpy arrays.
But if you share pickle files with Python 3.7 or earlier, protocol 5 will fail.
Rule: For maximum compatibility, use protocol=4. For internal use, use HIGHEST_PROTOCOL.
🎯 Key Takeaway
Use pickle.dump() for files, pickle.dumps() for bytes.
Always specify a protocol version explicitly to avoid surprises.
The highest protocol gives best performance but may not be backward compatible.
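
The out-of-band feature of protocol 5 mentioned in this section can be sketched with the standard library alone. This example assumes Python 3.8 or later and uses a plain bytearray as a stand-in for a large array buffer; libraries such as numpy wire this up for you, so treat it as an illustration of the mechanism rather than a recipe.

```python
import pickle

payload = bytearray(b"x" * 1_000_000)  # stand-in for a large array buffer

# In-band: the megabyte of data is copied into the pickle stream itself.
in_band = pickle.dumps(pickle.PickleBuffer(payload), protocol=5)

# Out-of-band: the stream holds only a marker; the buffer goes to the callback.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(payload), protocol=5, buffer_callback=buffers.append
)

# The same buffers must be supplied, in the same order, when loading.
restored = pickle.loads(out_of_band, buffers=buffers)

print(len(in_band), len(out_of_band), len(buffers))
print(bytes(memoryview(restored)[:4]))
```

The out-of-band stream stays tiny because the large buffer travels separately, which is what lets a transport layer avoid extra copies.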

Customizing Serialization with __getstate__ and __setstate__

Not every object attribute is serializable by default. File handles, network connections, and database sessions need special handling. Override __getstate__ to return a dict of picklable state, and __setstate__ to restore resources after deserialization.

A common pattern: your class holds a database connection that cannot be pickled. You exclude it in __getstate__ and recreate it in __setstate__.

io_thecodeforge/db_connection_custom.py · PYTHON
import pickle


class DatabaseConnection:
    def __init__(self, dsn):
        self.dsn = dsn
        self.conn = self._create_connection()

    def _create_connection(self):
        # Stand-in for a real connection, which could not be pickled
        return f"Connected to {self.dsn}"

    def __getstate__(self):
        # Pickle only the DSN; leave the live connection out
        return {'dsn': self.dsn}

    def __setstate__(self, state):
        # Restore the saved attributes, then reconnect
        self.__dict__.update(state)
        self.conn = self._create_connection()


conn = DatabaseConnection("db://host:port/db")
with open('conn.pkl', 'wb') as f:
    pickle.dump(conn, f)

with open('conn.pkl', 'rb') as f:
    loaded = pickle.load(f)
print(loaded.conn)
▶ Output
Connected to db://host:port/db
📊 Production Insight
If your class contains file handles or database connections, they won't pickle automatically.
Implement __getstate__ to return a dict of serializable attributes and __setstate__ to restore connections.
Rule: Always test round-trip serialization of your custom classes with unit tests.
🎯 Key Takeaway
__getstate__ controls what gets pickled; __setstate__ rebuilds resources on load.
Without them, pickling an object that holds a live connection or file handle fails, or the loaded copy carries a stale resource.
Test round-trips early: these problems often stay hidden until the first real dump or load.
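
The rule about testing round-trip serialization could look like the following pytest-style function. `Counter` is a hypothetical class invented for this sketch; it holds a threading.Lock, which pickle cannot serialize, so the test exercises both __getstate__ and __setstate__.

```python
import pickle
import threading


class Counter:
    """Hypothetical class holding a lock, which pickle cannot serialize."""

    def __init__(self, start=0):
        self.value = start
        self._lock = threading.Lock()

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]  # drop the unpicklable attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # recreate the resource on load


def test_round_trip_preserves_state():
    original = Counter(start=41)
    clone = pickle.loads(pickle.dumps(original, protocol=4))
    assert clone.value == 41
    assert clone._lock is not original._lock
    with clone._lock:  # the recreated lock is usable
        clone.value += 1
    assert clone.value == 42
```

Removing `__getstate__` from the class makes `pickle.dumps` raise a TypeError, which is exactly the failure this kind of test is meant to catch before production does.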

Security Risks and Safe Usage

The biggest danger with pickle is that pickle.load() can execute arbitrary code. This happens through the __reduce__ protocol, which allows objects to specify any callable and arguments to reconstruct themselves. A malicious pickle can run os.system, open network sockets, or delete files.

The only safe rule: never unpickle data from an untrusted source. If you must accept external input, use a restricted Unpickler that whitelists allowed classes. Even then, consider using a separate process or container for isolation.
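
To make the mechanism concrete, here is a deliberately harmless sketch of the __reduce__ behaviour described above. The class name `Innocent` is made up for illustration, and the callable is the built-in sorted rather than anything dangerous; a real attack would substitute os.system and a shell command.

```python
import pickle


class Innocent:
    """Looks harmless, but controls what runs when it is unpickled."""

    def __reduce__(self):
        # (callable, args): pickle records this pair and calls it on load.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Innocent())

# Loading does not rebuild an Innocent instance; it simply calls sorted([3, 1, 2]).
result = pickle.loads(blob)
print(type(result).__name__, result)  # result == [1, 2, 3]
```

The loader never needs the `Innocent` class at all: the bytes alone decide which callable runs, which is why the source of a pickle matters more than its contents.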

⚠ Never Unpickle Untrusted Data
A single pickle.load() from an attacker can execute any Python code. Always restrict what classes can be unpickled using a custom Unpickler with overridden find_class. But even that is not foolproof — prefer JSON or protocol buffers for untrusted input.
📊 Production Insight
A crafted pickle can execute arbitrary Python code via __reduce__.
Production incidents often involve file upload servers that trust pickle input.
Rule: Use JSON or a restricted Unpickler for any data from external sources. Or better, don't accept pickle at all.
🎯 Key Takeaway
Never call pickle.load() on data you didn't create.
Implement a custom Unpickler with find_class whitelist if absolutely required.
For configuration or user data, choose JSON or protocol buffers — pickle is not worth the risk.
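
A restricted Unpickler of the kind recommended above can be sketched by overriding find_class, following the pattern shown in the Python documentation. The contents of `SAFE_BUILTINS` and the helper name `restricted_loads` are choices made for this example; your own allow-list will differ, and as the warning says, this reduces the risk without eliminating it.

```python
import builtins
import io
import pickle

# Example allow-list: only these builtins may be referenced by a pickle.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle references; reject anything unlisted.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but only allow-listed globals can be resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps([1, 2, range(15)])))


class Evil:
    def __reduce__(self):
        return (eval, ("1 + 1",))  # harmless stand-in for a real payload


try:
    restricted_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

Plain containers, numbers, and strings never reach find_class, so the allow-list only needs to cover the classes and functions you actually expect.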
Pickle vs JSON vs dill
  • If you need to serialize only basic types (dict, list, int, str) and share them across languages: use JSON.
  • If you need to serialize custom Python objects (classes, functions) for internal use: use pickle with protocol 4 or 5.
  • If you need to serialize lambdas, closures, or interactive objects: use dill (extends pickle).
  • If the data comes from an untrusted source: avoid pickle entirely; use JSON, or a restricted Unpickler with extreme caution.

Performance Comparison: pickle vs JSON vs dill

pickle is generally faster than JSON for Python-native objects because it uses a binary format and can encode arbitrary types without additional conversion. However, for simple dicts and lists, JSON can be faster due to optimized C implementations. dill extends pickle and adds support for lambdas and closures, but comes with a size and speed overhead.

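As a rough benchmark, pickling a 10MB list of strings with protocol 4 or 5 is about 3x faster than serializing it with JSON, and the file is about 20% smaller. Treat these figures as indicative only: they depend on the data, the Python version, and the hardware. If you need interoperability, JSON wins regardless.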
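
Because such figures shift with the workload, it is worth measuring on your own data. The sketch below is one way to do that with timeit; the 100,000-string list and the `bench` helper are assumptions made for this example, and no timings are quoted because they will differ from machine to machine.

```python
import json
import pickle
import timeit

# Hypothetical workload: 100,000 short strings. Swap in your own data.
data = [f"item-{i}" for i in range(100_000)]


def bench(label, dump, load, number=5):
    """Return the serialized size and average dump/load time in seconds."""
    blob = dump(data)
    dump_s = timeit.timeit(lambda: dump(data), number=number) / number
    load_s = timeit.timeit(lambda: load(blob), number=number) / number
    return {"label": label, "bytes": len(blob), "dump_s": dump_s, "load_s": load_s}


results = [
    bench("pickle p4", lambda o: pickle.dumps(o, protocol=4), pickle.loads),
    bench("pickle p5", lambda o: pickle.dumps(o, protocol=5), pickle.loads),
    bench("json", lambda o: json.dumps(o).encode("utf-8"), json.loads),
]

for r in results:
    print(
        f"{r['label']:10} {r['bytes']:>9} bytes  "
        f"dump {r['dump_s'] * 1e3:7.2f} ms  load {r['load_s'] * 1e3:7.2f} ms"
    )
```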

📊 Production Insight
As a rough guide, pickle can be 3-5x faster than JSON for Python-native objects, but up to 2x slower for simple dicts.
Output size: pickle with protocol 5 is roughly 20% smaller than JSON for nested structures.
Rule: Measure before optimizing; JSON wins for interoperability, pickle for internal persistence.
🎯 Key Takeaway
Use pickle for internal Python object storage — it's faster and more compact.
Use JSON for cross-language or untrusted data.
dill extends pickle to handle lambdas and closures but adds overhead.
🗂 Serialization Format Comparison
Key trade-offs at a glance
| Feature | pickle | JSON | dill |
|---|---|---|---|
| Python object support | Full (classes, functions, circular refs) | Basic types only (dict, list, str, int, etc.) | Full plus lambdas, closures, generators |
| Security | Dangerous: executes arbitrary code | Safe: no code execution | Same risk as pickle |
| Speed (large objects) | Fast (binary, protocol 5) | Slower (text-based, conversion overhead) | Slower than pickle (extension overhead) |
| Interoperability | Python only | Cross-language | Python only |
| File size (nested data) | 20-30% smaller than JSON | Larger due to text overhead | Similar to pickle |

🎯 Key Takeaways

  • pickle converts Python objects to byte streams and back — great for internal caching and model persistence.
  • Always use binary mode ('wb'/'rb') and specify a protocol version to avoid version incompatibilities.
  • Never unpickle data from untrusted sources — it can execute arbitrary code via __reduce__.
  • Use __getstate__ and __setstate__ to control serialization of complex objects with non-serializable fields.
  • For cross-language or user-supplied data, choose JSON or protocol buffers over pickle.

⚠ Common Mistakes to Avoid

    Unpickling data from untrusted sources
    Symptom

    Malicious code execution, data breaches, or system compromise after loading a pickle file from a user or external API.

    Fix

    Never use pickle.load() on untrusted data. Use a safe format like JSON for external input, or implement a custom Unpickler with a strict allowed classes list.

    Forgetting to open files in binary mode
    Symptom

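    In Python 3, a TypeError (for example "write() argument must be str, not bytes") or a UnicodeDecodeError, because text mode works with str rather than bytes. A file corrupted by a text-mode transfer can also surface as UnpicklingError: invalid load key.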

    Fix

    Always use 'wb' for pickle.dump() and 'rb' for pickle.load(). Never use 'w' or 'r' which are text mode.

    Assuming pickle works across Python versions without protocol compatibility
    Symptom

    ValueError: unsupported pickle protocol when loading a pickle created with a newer protocol (e.g., protocol 5 loaded in Python 3.7).

    Fix

    Define a fixed protocol version when pickling (e.g., protocol=4) for cross-version compatibility. Reserve pickle.HIGHEST_PROTOCOL for cases where the writer and reader run the same Python version.

    Relying on pickle for class instances without stable class definitions
    Symptom

    AttributeError: Can't get attribute 'MyClass' on <module> when the class has been moved or renamed.

    Fix

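    Keep class definitions stable and importable. If a class must move, leave an alias at the old import path or remap the old name in a custom Unpickler's find_class. Note that __getstate__/__setstate__ control an object's state, not where its class is looked up. Consider using a schema registry if classes evolve.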
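
For the moved-class case, one possible approach is an Unpickler that translates old (module, name) pairs while loading. The module names `old_models` and `new_models` and the `RENAMED` table are hypothetical, and the first half of the sketch only simulates the move with in-memory modules so that the example is self-contained.

```python
import io
import pickle
import sys
import types


# --- Simulate the problem: a class pickled under an old module path. ---
class Model:
    def __init__(self, name):
        self.name = name


old_module = types.ModuleType("old_models")  # hypothetical old location
old_module.Model = Model
Model.__module__ = "old_models"
sys.modules["old_models"] = old_module
blob = pickle.dumps(Model("v1"))  # records 'old_models.Model'

# The class "moves": the old module disappears and a new one takes over.
del sys.modules["old_models"]
new_module = types.ModuleType("new_models")  # hypothetical new location
new_module.Model = Model
Model.__module__ = "new_models"
sys.modules["new_models"] = new_module

# --- The fix: translate old (module, name) pairs while loading. ---
RENAMED = {("old_models", "Model"): ("new_models", "Model")}


class RemappingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        module, name = RENAMED.get((module, name), (module, name))
        return super().find_class(module, name)


restored = RemappingUnpickler(io.BytesIO(blob)).load()
print(restored.name, type(restored).__module__)
```

A plain `pickle.loads(blob)` at this point fails because `old_models` can no longer be imported, while the remapping loader reads the same bytes. Use this only on pickles you wrote yourself, since it does nothing to restrict what can be loaded.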

Interview Questions on This Topic

  • Q: What is Python's pickle module and how does it differ from JSON serialization? (Junior)
    pickle is a Python-specific serialization format that can handle most Python objects, including custom class instances, references to importable functions, and circular references. It outputs binary data. JSON is a text-based, cross-language format that only supports basic data types. pickle is faster for Python objects but insecure with untrusted data.
  • Q: Explain the pickle protocol versions and when to use each. (Mid-level)
    Protocol 0 is ASCII text (readable but inefficient). Protocols 1 and 2 are older binary formats. Protocol 3 was introduced with Python 3.0. Protocol 4 (Python 3.4+) supports very large objects and became the default in Python 3.8. Protocol 5 (Python 3.8+) adds out-of-band data for zero-copy transfers. Use protocol 4 for broad compatibility, or HIGHEST_PROTOCOL for best performance.
  • Q: How would you handle a pickle security vulnerability in a production system that accepts serialized objects from authenticated users? (Senior)
    First, assess whether pickle is necessary at all. If it can be replaced, switch to a data-only format such as JSON. If it cannot, implement a restricted Unpickler by overriding find_class to whitelist only specific classes, and sandbox the unpickling in a separate process with limited permissions. Document the security posture and conduct code reviews.
  • Q: Describe a scenario where __getstate__ and __setstate__ are essential in pickle. (Mid-level)
    When an object contains non-serializable resources like database connections, file handles, or network sockets. __getstate__ should return a dict of the picklable state, and __setstate__ should recreate the resource from that state. Without them, pickling such an object typically raises a TypeError, or the loaded copy ends up with a broken resource.

Frequently Asked Questions

What is the pickle module in Python, in simple terms?

Think of pickle as Python's way to take a snapshot of an object and save it to a file. Later, you can load that snapshot and get back the exact same object — even if it's a complex nested structure or a custom class instance.

Why is pickle considered insecure?

pickle can execute arbitrary Python code during unpickling because it can instantiate any class and call any function via the __reduce__ method. Attackers can craft a malicious pickle payload that runs system commands or exfiltrates data.

How can I make pickle safe for limited use?

Create a subclass of pickle.Unpickler and override the find_class method to allow only a whitelist of safe modules and classes. Even so, consider stronger isolation like running the unpickling in a container or using a non-Python serialization format.

Can pickle handle lambda functions?

No, the standard pickle cannot serialize lambdas, nested functions, or interactive sessions. For that, use the dill library which extends pickle's capabilities.
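
A quick way to see this limitation is to try it. The sketch below catches both PicklingError and AttributeError because the exact exception type and message have varied between Python versions; math.sqrt is used as the contrasting case simply because it is an importable named function.

```python
import math
import pickle

square = lambda x: x * x

try:
    pickle.dumps(square)
except (pickle.PicklingError, AttributeError) as exc:
    # pickle stores functions by import path, and '<lambda>' has none.
    failed = True
    print(f"could not pickle lambda: {type(exc).__name__}")
else:
    failed = False

# A named, importable function pickles fine: only its module and name are stored.
restored = pickle.loads(pickle.dumps(math.sqrt))
print(restored is math.sqrt)
```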

What protocol should I use for cross-version Python compatibility?

Use protocol=4 if you need compatibility with Python 3.4 to 3.7. For Python 3.8+, protocol=5 is best but not backward compatible. If you control both ends and use the same Python version, use pickle.HIGHEST_PROTOCOL.

Why does pickle.load() sometimes raise 'AttributeError' for missing class?

The class definition used when pickling must be importable at unpickling time under the same module path. If you renamed, moved, or deleted the class, pickle cannot reconstruct the object. Keep an alias at the old import path, or remap the old name in a custom Unpickler's find_class.

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
