Working with JSON in Python: Read, Write, and Parse Like a Pro
Every time you tap 'place order' on an e-commerce site, check the weather on your phone, or log into an app, JSON is quietly doing the heavy lifting behind the scenes. It's the format web APIs use to send data back and forth, the format config files are stored in, and the format data pipelines use to pass records between services. If you're writing Python in 2024 and you're not comfortable with JSON, you've got a gap that'll slow you down on almost every real project.
The problem JSON solves is surprisingly simple: computers need to share structured data with each other, but every language stores data differently in memory. A Python dictionary isn't the same as a JavaScript object internally — but both can be serialized into a JSON string that looks identical. JSON is the agreed-upon middle ground, a plain-text format that any language can read and write without needing to know anything about the other side's internals.
By the end of this article you'll be able to read JSON from a file, write Python data structures back out as JSON, parse API responses with confidence, handle encoding edge cases, and avoid the three mistakes that trip up even experienced developers. You'll also know exactly what to say when an interviewer asks you about serialization.
json.loads() vs json.load() — The Difference That Trips Everyone Up
Python's built-in json module gives you four functions you'll use constantly: json.loads(), json.load(), json.dumps(), and json.dump(). The naming is deceptively similar, and mixing them up is the single most common JSON mistake in Python.
Here's the mental model: the functions with an 's' on the end work with strings. The ones without the 's' work with file objects. That's it. json.loads() takes a JSON-formatted string and returns a Python object. json.load() takes an open file handle and reads JSON directly from it. Same relationship on the output side: json.dumps() returns a string, json.dump() writes directly to a file.
Why does this distinction matter? Because when you're hitting a web API, requests.get().text gives you a string — so you reach for json.loads(). When you're reading a config file from disk, you open the file and reach for json.load(). Picking the wrong one gives you a confusing AttributeError or TypeError that's hard to debug if you don't know the root cause.
```python
import json

# ── CASE 1: Parsing a JSON STRING (e.g. from an API response) ──
# Imagine requests.get(...).text returned this string:
api_response_text = '{"user": "alice", "score": 42, "active": true}'

# json.loads() deserializes a STRING into a Python dict
user_data = json.loads(api_response_text)

print(type(user_data))       # Confirm it's a dict, not a string
print(user_data["user"])     # Access like any normal Python dict
print(user_data["active"])   # JSON 'true' becomes Python True (bool)
print("---")

# ── CASE 2: Reading JSON from a FILE ──
# First, let's write a sample file so the example is fully self-contained
sample_config = {
    "app_name": "DataPipeline",
    "version": "2.1.0",
    "debug": False,
    "max_retries": 3
}

# Write the config to disk first
with open("config.json", "w", encoding="utf-8") as config_file:
    json.dump(sample_config, config_file)  # dump() writes to a FILE OBJECT

# Now read it back — json.load() reads from a FILE OBJECT, not a string
with open("config.json", "r", encoding="utf-8") as config_file:
    loaded_config = json.load(config_file)

print(type(loaded_config))           # Also a dict
print(loaded_config["app_name"])     # 'DataPipeline'
print(loaded_config["debug"])        # False (Python bool)
print(loaded_config["max_retries"])  # 3 (Python int)
```
```
<class 'dict'>
alice
True
---
<class 'dict'>
DataPipeline
False
3
```
Writing JSON Files the Right Way — Formatting, Encoding, and Sorting
Writing raw JSON to a file works fine, but the output is a single compressed line that's nearly unreadable when you open the file. In production you often need human-readable output — for config files, logs, or debugging. The json.dump() and json.dumps() functions have optional parameters that give you full control over the output format.
The indent parameter is the big one. Pass indent=2 or indent=4 and your output is immediately readable with proper nesting. The sort_keys=True parameter alphabetises the keys, which is invaluable for config files or any output that goes into version control — it makes diffs clean and predictable instead of random.
The ensure_ascii=False parameter is critical and often forgotten. By default, Python's json module escapes every non-ASCII character — so a name like 'María' becomes 'Mar\u00eda'. That's technically valid JSON, but it's unreadable and bloated. Setting ensure_ascii=False writes the actual Unicode characters directly, which is almost always what you want when handling international data.
Finally, always open your JSON files with encoding='utf-8' explicitly. Don't rely on the system default — it varies between Windows, Mac, and Linux and will cause encoding bugs that are maddening to debug across environments.
```python
import json

# Real-world example: saving a user profile with international characters
user_profile = {
    "user_id": 1047,
    "full_name": "María García",  # Non-ASCII characters — common in real data
    "email": "maria.garcia@example.com",
    "preferences": {
        "theme": "dark",
        "language": "es",
        "notifications": True
    },
    "tags": ["premium", "verified", "beta-tester"]
}

# ── BAD: Default output — technically valid but unreadable ──
bad_output = json.dumps(user_profile)
print("Raw (hard to read):")
print(bad_output)
print()

# ── GOOD: Formatted, Unicode-safe, sorted keys ──
good_output = json.dumps(
    user_profile,
    indent=2,            # 2-space indentation for readability
    sort_keys=True,      # Alphabetical keys — great for version control diffs
    ensure_ascii=False   # Write 'María', not 'Mar\u00eda'
)
print("Formatted (production-ready):")
print(good_output)

# ── Writing to a file with explicit UTF-8 encoding ──
output_path = "user_profile.json"
with open(output_path, "w", encoding="utf-8") as output_file:
    json.dump(
        user_profile,
        output_file,
        indent=2,
        sort_keys=True,
        ensure_ascii=False  # Critical: don't escape María into \u sequences
    )

print(f"\nProfile saved to {output_path}")
```
```
Raw (hard to read):
{"user_id": 1047, "full_name": "Mar\u00eda Garc\u00eda", "email": "maria.garcia@example.com", "preferences": {"theme": "dark", "language": "es", "notifications": true}, "tags": ["premium", "verified", "beta-tester"]}

Formatted (production-ready):
{
  "email": "maria.garcia@example.com",
  "full_name": "María García",
  "preferences": {
    "language": "es",
    "notifications": true,
    "theme": "dark"
  },
  "tags": [
    "premium",
    "verified",
    "beta-tester"
  ],
  "user_id": 1047
}

Profile saved to user_profile.json
```
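One consequence of sort_keys=True worth making concrete: two dicts holding the same data always serialize to identical strings, no matter what order the keys were inserted in. A minimal check:

```python
import json

# Same data, different insertion order
first = json.dumps({"b": 1, "a": 2}, sort_keys=True)
second = json.dumps({"a": 2, "b": 1}, sort_keys=True)

print(first)            # {"a": 2, "b": 1}
print(first == second)  # True: deterministic output, clean version-control diffs
```

This is exactly why sorted keys keep diffs quiet: re-serializing unchanged data produces a byte-identical file.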
Handling Non-Serializable Types — Dates, Decimals, and Custom Objects
Here's where Python and JSON have a real fight. JSON only natively supports strings, numbers, booleans, null, arrays, and objects. It has no concept of a Python datetime, a Decimal, a set, or any custom class you've built. The moment you try to serialize one of these, Python throws a TypeError: Object of type X is not JSON serializable — and it stops everything.
This isn't a bug, it's a design decision. JSON is meant to be language-agnostic, so it can't encode Python-specific types. Your job is to tell Python how to translate those types into something JSON understands.
The cleanest way is to write a custom encoder by subclassing json.JSONEncoder and overriding its default() method. You get called for any type the encoder doesn't know how to handle, and you return a JSON-compatible representation. This is the approach used in production codebases because it's explicit, testable, and reusable — you define the encoding logic once and pass the encoder class anywhere you call json.dump() or json.dumps().
For quick scripts, the default parameter shortcut works fine — pass a lambda or small function. But for anything going into a team codebase, write a proper encoder class. It's more readable and your colleagues will thank you.
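For reference, here's what that default parameter shortcut looks like in practice. This is a sketch: default=str is a popular quick fix because str() accepts datetimes, Decimals, and most other objects, while a small named function (encode_extra here is an illustrative name, not a standard one) gives you per-type control.

```python
import json
from datetime import datetime
from decimal import Decimal

record = {"when": datetime(2024, 3, 15, 10, 30), "amount": Decimal("19.99")}

# default= is called for any object the encoder doesn't know how to serialize.
# default=str simply stringifies it: quick, but no type-specific control.
print(json.dumps(record, default=str))
# {"when": "2024-03-15 10:30:00", "amount": "19.99"}

# A small named function gives more control than a bare str or lambda
def encode_extra(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()        # ISO 8601 instead of str()'s space-separated form
    if isinstance(obj, Decimal):
        return str(obj)               # string preserves precision
    raise TypeError(f"Not serializable: {type(obj).__name__}")

print(json.dumps(record, default=encode_extra))
# {"when": "2024-03-15T10:30:00", "amount": "19.99"}
```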
```python
import json
from datetime import datetime, date
from decimal import Decimal

# ── The types that break vanilla JSON serialization ──
order_record = {
    "order_id": "ORD-9921",
    "created_at": datetime(2024, 3, 15, 10, 30, 0),  # datetime — not JSON serializable
    "ship_date": date(2024, 3, 17),                  # date — also not JSON serializable
    "total_amount": Decimal("199.99"),               # Decimal — not JSON serializable
    "item_ids": {101, 205, 309},                     # set — not JSON serializable
    "status": "pending"
}

# Try without a custom encoder — this will raise TypeError
try:
    json.dumps(order_record)
except TypeError as encoding_error:
    print(f"Without custom encoder: {encoding_error}")
    print()

# ── Solution: Custom JSONEncoder subclass ──
class AppJSONEncoder(json.JSONEncoder):
    """Handles Python types that vanilla JSON can't serialize."""

    def default(self, obj):
        # datetime: serialize as ISO 8601 string — universally understood.
        # (Check datetime BEFORE date: datetime is a subclass of date.)
        if isinstance(obj, datetime):
            return obj.isoformat()  # e.g. '2024-03-15T10:30:00'
        # date: serialize as ISO date string
        if isinstance(obj, date):
            return obj.isoformat()  # e.g. '2024-03-17'
        # Decimal: convert to float — fine for display, use string for finance
        if isinstance(obj, Decimal):
            return float(obj)  # or str(obj) if precision matters
        # set: JSON has arrays, not sets — convert and sort for determinism
        if isinstance(obj, set):
            return sorted(obj)  # sort so output is always consistent
        # Let the parent class raise TypeError for anything we don't handle
        return super().default(obj)

# Now serialize cleanly using our custom encoder
serialised_order = json.dumps(
    order_record,
    cls=AppJSONEncoder,  # Pass the encoder CLASS (not an instance)
    indent=2,
    ensure_ascii=False
)
print("Serialized order record:")
print(serialised_order)

# ── Deserializing back: parse the date string yourself ──
loaded_order = json.loads(serialised_order)
print("\nRound-trip — created_at is now a string:")
print(type(loaded_order["created_at"]), loaded_order["created_at"])

# Convert back to datetime when needed
parsed_datetime = datetime.fromisoformat(loaded_order["created_at"])
print(f"Re-parsed datetime: {parsed_datetime}")
```
```
Without custom encoder: Object of type datetime is not JSON serializable

Serialized order record:
{
  "order_id": "ORD-9921",
  "created_at": "2024-03-15T10:30:00",
  "ship_date": "2024-03-17",
  "total_amount": 199.99,
  "item_ids": [
    101,
    205,
    309
  ],
  "status": "pending"
}

Round-trip — created_at is now a string:
<class 'str'> 2024-03-15T10:30:00
Re-parsed datetime: 2024-03-15 10:30:00
```
Real-World Pattern: Fetching and Processing an API Response
Everything we've covered comes together the moment you touch a real API. The pattern is always the same: make an HTTP request, get back a JSON string in the response body, parse it into a Python dict, do your work, optionally write results to a file. Understanding each step means you're never guessing.
The requests library's response object has a convenient .json() shortcut method that calls json.loads() on the response body for you. Note that requests doesn't check the Content-Type header first — it simply tries to parse the body, so if the server sent something that isn't valid JSON (an HTML error page, for example), you'll get a requests.exceptions.JSONDecodeError. It's worth knowing this so you're not surprised.
The pattern below also shows defensive coding: checking that the response key you expect actually exists before accessing it, and handling the case where an API returns a successful HTTP 200 but puts error details in the JSON body — which is extremely common with third-party APIs. This is the difference between code that works in a demo and code that survives contact with the real world.
```python
import json
import urllib.request  # Using stdlib so no pip install needed for this example
import urllib.error
from datetime import datetime

# ── Simulating an API response (using a real public API) ──
# We'll use JSONPlaceholder — a free, stable fake REST API for testing
API_URL = "https://jsonplaceholder.typicode.com/users/1"

def fetch_user_profile(user_url: str) -> dict | None:
    """Fetch a user profile from an API and return it as a Python dict."""
    try:
        with urllib.request.urlopen(user_url, timeout=10) as response:
            # response.read() returns bytes — decode to string first
            raw_bytes = response.read()
            json_string = raw_bytes.decode("utf-8")
            # Parse the JSON string into a Python dict
            user_data = json.loads(json_string)
            return user_data
    except urllib.error.URLError as network_error:
        print(f"Network error: {network_error}")
        return None
    except json.JSONDecodeError as parse_error:
        print(f"Response wasn't valid JSON: {parse_error}")
        return None

def save_profile_snapshot(user_data: dict, output_path: str) -> None:
    """Enrich the profile with a timestamp and save it to a JSON file."""
    # Add metadata before saving — this is extremely common in pipelines
    enriched_profile = {
        # Note: datetime.utcnow() is deprecated since Python 3.12;
        # datetime.now(timezone.utc) is the modern spelling
        "fetched_at": datetime.utcnow().isoformat() + "Z",  # ISO 8601 UTC
        "source_url": API_URL,
        "profile": user_data
    }
    with open(output_path, "w", encoding="utf-8") as output_file:
        json.dump(
            enriched_profile,
            output_file,
            indent=2,
            ensure_ascii=False  # Safe for names with accents or non-Latin chars
        )
    print(f"Snapshot saved to: {output_path}")

# ── Main flow ──
print(f"Fetching profile from {API_URL}...")
user = fetch_user_profile(API_URL)

if user:
    # Defensive access — check keys exist before using them
    user_name = user.get("name", "Unknown")
    user_email = user.get("email", "No email provided")
    company_name = user.get("company", {}).get("name", "No company")

    print(f"Name: {user_name}")
    print(f"Email: {user_email}")
    print(f"Company: {company_name}")

    # Save the enriched snapshot
    save_profile_snapshot(user, "user_snapshot.json")

    # ── Reading it back to verify the round-trip ──
    print("\nVerifying saved file...")
    with open("user_snapshot.json", "r", encoding="utf-8") as saved_file:
        reloaded = json.load(saved_file)
    print(f"Fetched at: {reloaded['fetched_at']}")
    print(f"Stored name: {reloaded['profile']['name']}")
```
```
Fetching profile from https://jsonplaceholder.typicode.com/users/1...
Name: Leanne Graham
Email: Sincere@april.biz
Company: Romaguera-Crona
Snapshot saved to: user_snapshot.json

Verifying saved file...
Fetched at: 2024-03-15T10:30:00Z
Stored name: Leanne Graham
```
| Aspect | json.load() / json.dump() | json.loads() / json.dumps() |
|---|---|---|
| Input / Output type | File object (open file handle) | Python string |
| Typical use case | Reading/writing JSON files on disk | Parsing API responses, network data |
| Memory efficiency | dump() streams chunks to the file; load() still reads the whole file into memory (it calls fp.read() internally) | Entire string must be held in memory |
| Encoding parameter | Controlled by open() — always set encoding='utf-8' | String already decoded — no encoding param |
| Error on wrong input | AttributeError: 'str' object has no attribute 'read' | TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper |
| indent / sort_keys work? | Yes — same optional parameters | Yes — same optional parameters |
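Both error rows are easy to reproduce. A small runnable sketch showing each failure mode:

```python
import json

# Write a small file so the example is self-contained
with open("demo.json", "w", encoding="utf-8") as f:
    json.dump({"ok": True}, f)

# Wrong: json.load() on a string. load() tries to call .read() on its argument
try:
    json.load('{"ok": true}')
except AttributeError as err:
    print(f"load() on a string -> {err}")
    # AttributeError: 'str' object has no attribute 'read'

# Wrong: json.loads() on a file object. loads() expects str/bytes/bytearray
try:
    with open("demo.json", "r", encoding="utf-8") as f:
        json.loads(f)
except TypeError as err:
    print(f"loads() on a file -> {err}")
```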
🎯 Key Takeaways
- The 's' rule is permanent: json.loads() and json.dumps() work with strings; json.load() and json.dump() work with file objects. Mix them up and you get an AttributeError or TypeError that's confusing without this context.
- Always open JSON files with encoding='utf-8' explicitly and always pass ensure_ascii=False when writing — these two habits alone prevent an entire class of encoding bugs that are notoriously hard to reproduce across different operating systems.
- JSON has no native type for datetime, Decimal, or set. Build a reusable JSONEncoder subclass early in any project that deals with these types — it's a one-time cost that pays dividends every time you add a new serialization call.
- Use dict.get('key', fallback) instead of dict['key'] when parsing external JSON — APIs change, fields disappear, and a KeyError at runtime is far worse than a sensible default value.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Calling json.load() on a string instead of json.loads() — Symptom: AttributeError: 'str' object has no attribute 'read' — Fix: If your JSON is already a string (e.g. from an API response or a variable), use json.loads(). Reserve json.load() exclusively for open file handles. The 's' = string rule is your cheat code.
- ✕ Mistake 2: Forgetting ensure_ascii=False when writing international data — Symptom: Names like 'José' appear as 'Jos\u00e9' in your JSON files, making them unreadable and bloated — Fix: Always pass ensure_ascii=False to json.dump() and json.dumps(). This tells Python to write real Unicode characters instead of escape sequences, which is almost always the right behaviour for modern applications.
- ✕ Mistake 3: Trying to serialize a datetime or Decimal without a custom encoder — Symptom: TypeError: Object of type datetime is not JSON serializable, thrown at runtime, often mid-pipeline — Fix: Write a custom JSONEncoder subclass that handles your non-standard types, or convert them manually before serializing (e.g. str(my_date) or my_datetime.isoformat()). Build the encoder once, reuse it everywhere in your codebase.
Interview Questions on This Topic
- Q: What's the difference between json.load() and json.loads() in Python, and when would you use each one?
- Q: You're serializing a Python object to JSON and you get a TypeError saying your datetime is not JSON serializable. Walk me through two different ways to fix that.
- Q: If you're storing financial amounts like prices and discounts in a JSON file, would you serialize them as floats or strings, and why? What could go wrong with floats?
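The third question has a runnable answer: floats carry binary rounding error, strings round-trip exactly, and json.loads() also accepts a parse_float hook that lets you get Decimal objects back directly:

```python
import json
from decimal import Decimal

# Floats accumulate binary rounding error through arithmetic
print(0.1 + 0.2)  # 0.30000000000000004, not what an invoice should say

# Option 1: serialize money as strings, parse back into Decimal
invoice_json = json.dumps({"total": str(Decimal("19.99"))})
total = Decimal(json.loads(invoice_json)["total"])
print(total)  # 19.99, exact

# Option 2: parse_float= tells loads() how to build every float it meets,
# so numeric JSON comes back as exact Decimals with no string conversion
exact = json.loads('{"total": 19.99}', parse_float=Decimal)
print(type(exact["total"]), exact["total"])  # <class 'decimal.Decimal'> 19.99
```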
Frequently Asked Questions
How do I read a JSON file in Python?
Open the file with open('yourfile.json', 'r', encoding='utf-8') and pass the file handle to json.load(). This gives you a Python dict or list you can work with immediately. Always specify encoding='utf-8' explicitly to avoid platform-specific encoding surprises.
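As a minimal sketch (the filename is just a placeholder; the snippet writes a sample file first so it runs as-is):

```python
import json

# Create a sample file so this snippet is self-contained
with open("yourfile.json", "w", encoding="utf-8") as f:
    json.dump({"greeting": "hello"}, f)

# The actual answer: open with explicit UTF-8, hand the file object to load()
with open("yourfile.json", "r", encoding="utf-8") as f:
    data = json.load(f)

print(data["greeting"])  # hello
```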
Why does Python throw 'Object of type datetime is not JSON serializable'?
JSON is a language-agnostic format and only supports strings, numbers, booleans, null, arrays, and objects. Python's datetime type doesn't map to any of these natively. Fix it by subclassing json.JSONEncoder and converting datetime objects to ISO strings (datetime.isoformat()) in the default() method, then pass your encoder class using the cls= parameter.
What's the difference between json.dumps() returning a string versus json.dump() writing to a file?
json.dumps() (with an 's') serializes a Python object into a JSON-formatted string in memory and returns it — useful when you need to send data over a network or store it in a variable. json.dump() (without the 's') writes the serialized JSON directly into an open file object, which is more efficient for writing to disk since you don't need to hold the entire string in memory first.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.