JSON in Python — Unicode Escapes Cause Downstream Failure
- Core concept: Python's json module converts between Python objects and JSON strings/files
- Key functions: json.loads() for strings, json.load() for files — the 's' rule is permanent
- Performance: json.dump() streams data to a file handle — use for large payloads, not json.dumps()
- Production insight: Omitting ensure_ascii=False escapes every non-ASCII character as \uXXXX, which breaks consumers that don't decode escapes
- Biggest mistake: Calling json.load() on a string gives AttributeError — always check input type first
Think of JSON like a shipping label on a package. The label has structured information — sender, receiver, contents, weight — written in a way both the post office and the recipient can read instantly. JSON is exactly that: a universal 'shipping label' format that lets your Python app send data to a website, a database, or another program, and have it arrive perfectly readable on the other side. It doesn't matter if the other end is written in JavaScript, Go, or Java — JSON is the common language they all speak.
Every time you tap 'place order' on an e-commerce site, check the weather on your phone, or log into an app, JSON is quietly doing the heavy lifting behind the scenes. It's the format web APIs use to send data back and forth, the format config files are stored in, and the format data pipelines use to pass records between services. If you're writing Python in 2024 and you're not comfortable with JSON, you've got a gap that'll slow you down on almost every real project.
The problem JSON solves is surprisingly simple: computers need to share structured data with each other, but every language stores data differently in memory. A Python dictionary isn't the same as a JavaScript object internally — but both can be serialized into a JSON string that looks identical. JSON is the agreed-upon middle ground, a plain-text format that any language can read and write without needing to know anything about the other side's internals.
By the end of this article you'll be able to read JSON from a file, write Python data structures back out as JSON, parse API responses with confidence, handle encoding edge cases, and avoid the three mistakes that trip up even experienced developers. You'll also know exactly what to say when an interviewer asks you about serialization.
json.loads() vs json.load() — The Difference That Trips Everyone Up
Python's built-in json module gives you four functions you'll use constantly: json.loads(), json.load(), json.dumps(), and json.dump(). The naming is deceptively similar, and mixing them up is the single most common JSON mistake in Python.
Here's the mental model: the functions with an 's' on the end work with strings. The ones without the 's' work with file objects. That's it. json.loads() takes a JSON-formatted string and returns a Python object. json.load() takes an open file handle and reads JSON directly from it. Same relationship on the output side: json.dumps() returns a string, json.dump() writes directly to a file.
Why does this distinction matter? Because when you're hitting a web API, requests.get().text gives you a string, so you reach for json.loads(). When you're reading a config file from disk, you open the file and reach for json.load(). Picking the wrong one gives you a confusing AttributeError or TypeError that's hard to debug if you don't know the root cause.
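A minimal sketch of the four functions side by side. The payload and the temporary file are illustrative assumptions, not taken from a real project.

```python
import json
import os
import tempfile

# Illustrative payload; the keys are made up for this sketch.
order = {"id": 42, "items": ["pen", "ink"], "paid": True}

# String side: dumps() returns a str, loads() parses a str.
as_text = json.dumps(order)
from_text = json.loads(as_text)

# File side: dump() writes to a file handle, load() reads from one.
path = os.path.join(tempfile.mkdtemp(), "order.json")
with open(path, "w", encoding="utf-8") as handle:
    json.dump(order, handle)
with open(path, encoding="utf-8") as handle:
    from_file = json.load(handle)

# Passing a str to load() fails because a str has no .read() method.
try:
    json.load(as_text)
except AttributeError as exc:
    wrong_call_error = str(exc)
```

The last block reproduces the classic mix-up on purpose, so the error message is easy to recognise when it shows up in a real traceback.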
json.loads() and json.dumps() = string in, string out. json.load() and json.dump() = file in, file out. Tattoo this on your brain and you'll never mix them up again.
Writing JSON Files the Right Way: Formatting, Encoding, and Sorting
Writing raw JSON to a file works fine, but the output is a single compressed line that's nearly unreadable when you open the file. In production you often need human-readable output for config files, logs, or debugging. The json.dumps() and json.dump() functions have optional parameters that give you full control over the output format.
The indent parameter is the big one. Pass indent=2 or indent=4 and your output is immediately readable with proper nesting. The sort_keys=True parameter alphabetises the keys, which is invaluable for config files or any output that goes into version control — it makes diffs clean and predictable instead of random.
The ensure_ascii=False parameter is critical and often forgotten. By default, Python's json module escapes every non-ASCII character, so a name like 'María' becomes 'Mar\u00eda' (the ASCII letters are left alone; only the 'í' is escaped). That's technically valid JSON, but it's unreadable and bloated. Setting ensure_ascii=False writes the actual Unicode characters directly, which is almost always what you want when handling international data.
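To see the difference concretely, here is a small sketch that serializes the same record both ways. The record itself is an illustrative assumption.

```python
import json

record = {"name": "María"}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keep the real characters

print(escaped)   # {"name": "Mar\u00eda"}
print(readable)  # {"name": "María"}

# A compliant parser decodes both forms to the same Python value.
same_value = json.loads(escaped) == json.loads(readable) == record
```

Both strings are valid JSON. The escaped form only becomes a problem when whatever reads it does not decode \uXXXX sequences.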
Finally, always open your JSON files with encoding='utf-8' explicitly. Don't rely on the system default — it varies between Windows, Mac, and Linux and will cause encoding bugs that are maddening to debug across environments.
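Putting the formatting and encoding options together, a write might look like the sketch below. The settings dict and the temporary path are assumptions made for the example.

```python
import json
import os
import tempfile

# Illustrative config; the keys are made up for this sketch.
settings = {"title": "Café", "retries": 3, "debug": False}

path = os.path.join(tempfile.mkdtemp(), "settings.json")

# Explicit encoding on write and read, so the result does not depend
# on the platform default.
with open(path, "w", encoding="utf-8") as handle:
    json.dump(settings, handle, indent=2, sort_keys=True, ensure_ascii=False)

with open(path, encoding="utf-8") as handle:
    written = handle.read()

print(written)
```

With sort_keys=True the keys come out as debug, retries, title regardless of insertion order, which is what keeps version-control diffs stable.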
Handling Non-Serializable Types — Dates, Decimals, and Custom Objects
Here's where Python and JSON have a real fight. JSON only natively supports strings, numbers, booleans, null, arrays, and objects. It has no concept of a Python datetime, a Decimal, a set, or any custom class you've built. The moment you try to serialize one of these, Python throws a TypeError: Object of type X is not JSON serializable — and it stops everything.
This isn't a bug, it's a design decision. JSON is meant to be language-agnostic, so it can't encode Python-specific types. Your job is to tell Python how to translate those types into something JSON understands.
The cleanest way is to write a custom encoder by subclassing json.JSONEncoder and overriding its default() method. It gets called for any type the encoder doesn't know how to handle, and you return a JSON-compatible representation. This is the approach used in production codebases because it's explicit, testable, and reusable: you define the encoding logic once and pass the encoder class anywhere you call json.dumps() or json.dump().
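A minimal sketch of such an encoder. The class name AppJSONEncoder and the chosen representations (ISO strings for dates, strings for Decimal, sorted lists for sets) are assumptions for illustration; other mappings are equally valid.

```python
import json
from datetime import date, datetime
from decimal import Decimal


class AppJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder; the type mappings below are one possible choice."""

    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return str(obj)      # a string keeps the exact decimal digits
        if isinstance(obj, set):
            return sorted(obj)   # assumes the elements are sortable
        # Let the base class raise TypeError for anything else.
        return super().default(obj)


invoice = {
    "issued": datetime(2024, 3, 1, 9, 30),
    "total": Decimal("19.99"),
    "tags": {"b", "a"},
}
encoded = json.dumps(invoice, cls=AppJSONEncoder)
```

Falling through to super().default(obj) matters: it keeps the usual TypeError for types the encoder was never taught, instead of silently writing something wrong.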
For quick scripts, the default parameter shortcut works fine — pass a lambda or small function. But for anything going into a team codebase, write a proper encoder class. It's more readable and your colleagues will thank you.
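For comparison, the default parameter shortcut could look like this. The function name to_jsonable and the sample event are hypothetical.

```python
import json
from datetime import datetime
from decimal import Decimal


def to_jsonable(obj):
    """Fallback used by json.dumps() for types it cannot serialize itself."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


event = {"at": datetime(2024, 1, 15, 12, 0), "amount": Decimal("5.50")}

line = json.dumps(event, default=to_jsonable)

# The shortest variant: stringify anything unknown. Convenient in a
# throwaway script, but it accepts every type without complaint.
quick = json.dumps(event, default=str)
```

Note that default=str formats the datetime with a space rather than the ISO 'T' separator, one reason an explicit function or encoder class is easier to reason about.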
A TypeError mid-pipeline stops everything. In production this often happens when a new field type is added to a model but the encoder isn't updated.
Real-World Pattern: Fetching and Processing an API Response
Everything we've covered comes together the moment you touch a real API. The pattern is always the same: make an HTTP request, get back a JSON string in the response body, parse it into a Python dict, do your work, optionally write results to a file. Understanding each step means you're never guessing.
The requests library's response object has a convenient .json() shortcut method that parses the response body for you, much like calling json.loads() on response.text. It does not check the Content-Type header; it simply tries to parse whatever came back. If the body isn't valid JSON (an HTML error page, for example), recent versions of requests raise requests.exceptions.JSONDecodeError. It's worth knowing this so you're not surprised.
The pattern below also shows defensive coding: checking that the response key you expect actually exists before accessing it, and handling the case where an API returns a successful HTTP 200 but puts error details in the JSON body — which is extremely common with third-party APIs. This is the difference between code that works in a demo and code that survives contact with the real world.
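A sketch of that defensive pattern, kept free of network calls so it runs anywhere. The 'error' and 'results' keys, the ApiError class, and the function name are all assumptions for illustration; a real API documents its own shape.

```python
import json


class ApiError(Exception):
    """Raised when a response body is unusable. Hypothetical name."""


def extract_results(raw_body):
    """Parse a response body and return its 'results' list.

    The 'error' and 'results' keys are assumed for this sketch.
    """
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ApiError(f"body is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ApiError("expected a JSON object at the top level")

    # Some APIs answer HTTP 200 and report the failure in the body.
    if "error" in payload:
        raise ApiError(f"API reported an error: {payload['error']}")

    results = payload.get("results")
    if not isinstance(results, list):
        raise ApiError("missing or malformed 'results' key")
    return results
```

With requests, raw_body would typically be response.text, read after response.raise_for_status() has ruled out HTTP-level failures.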
- For large payloads, use json.dump() directly to avoid holding the full string in memory.
- Never assume response.json() will succeed; wrap it in try-except.
- Wrap json.loads() in a try-except for API responses.
Graceful Error Handling for JSON Parsing and Encoding
Production systems encounter malformed JSON more often than you'd think. A truncated network response, a misconfigured upstream service, or a file corrupted mid-write can all produce invalid JSON. Your code needs to handle these failures without crashing the entire pipeline.
The most common JSON parsing error is JSONDecodeError, which is a subclass of ValueError. It tells you exactly where the parser failed: line number, column number, and character offset, plus a short message describing what it expected. Use this information in your logging to speed up debugging.
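A small sketch of logging those details. The logger name and the parse_or_none helper are hypothetical.

```python
import json
import logging

logger = logging.getLogger("json_ingest")  # hypothetical logger name


def parse_or_none(text):
    """Return the parsed value, or None after logging where parsing failed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        logger.warning(
            "invalid JSON at line %d, column %d (offset %d): %s",
            exc.lineno, exc.colno, exc.pos, exc.msg,
        )
        return None
```

Because JSONDecodeError subclasses ValueError, an existing except ValueError block will also catch it, though catching the specific class is what gives access to the position attributes.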
Another frequent issue is files that aren't plain UTF-8. Some legacy systems output UTF-16, Latin-1, or UTF-8 with a BOM (byte order mark). Opening such files with open() and encoding='utf-8' tends to fail in one of two ways: bytes that aren't valid UTF-8 raise UnicodeDecodeError, while a UTF-8 BOM is read as a stray '\ufeff' character that the json module rejects with a JSONDecodeError. Use encoding='utf-8-sig' to strip the BOM automatically, or detect encoding with the chardet library for unknown sources.
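The BOM case is easy to reproduce. The sketch below writes a file with a UTF-8 BOM by hand (the file name and content are made up) and reads it back both ways.

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "legacy.json")

# Simulate a legacy export: UTF-8 bytes preceded by a BOM.
with open(path, "wb") as handle:
    handle.write(b"\xef\xbb\xbf" + '{"city": "Zürich"}'.encode("utf-8"))

# Plain utf-8 keeps the BOM as "\ufeff", which the json module rejects.
try:
    with open(path, encoding="utf-8") as handle:
        json.load(handle)
    plain_utf8_failed = False
except json.JSONDecodeError:
    plain_utf8_failed = True

# utf-8-sig strips the BOM when present and changes nothing otherwise.
with open(path, encoding="utf-8-sig") as handle:
    data = json.load(handle)
```

Since utf-8-sig reads BOM-less UTF-8 identically, it is a reasonable default for reading files whose origin is unknown.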
A defensive strategy: never let a single malformed JSON entry crash your entire batch job. If you're processing a line-delimited JSON file (one JSON object per line), wrap each line parse in a try-except, log the error, and continue. This is standard in ETL pipelines.
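One way to sketch that per-line strategy. The function name and the sample lines are assumptions; in a real job the lines would come from an open file and the errors would go to a logger.

```python
import json


def parse_ndjson(lines):
    """Parse line-delimited JSON, collecting bad lines instead of raising."""
    records, errors = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            errors.append((number, exc.msg))
    return records, errors


# The second line is truncated on purpose.
sample = ['{"id": 1}\n', '{"id": 2\n', '\n', '{"id": 3}\n']
records, errors = parse_ndjson(sample)
```

Returning the errors alongside the records lets the caller decide whether one bad line in a million is acceptable or whether the batch should be rejected.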
- The json.JSONDecodeError object has .lineno, .colno, and .pos attributes; use them in your logs.
- Try json.loads() on a sample first.
- Use encoding='utf-8-sig' when you don't know if the source includes a BOM; it's harmless if there's no BOM.
- Wrap json.loads() in try-except for every external source.
The Unicode Escape Incident: Downstream Systems Break on Escaped Names
The team assumed json.dump() would write readable text. Nobody read the default behaviour documentation.
By default, json.dump() and json.dumps() escape all non-ASCII characters. Names like 'José' become 'Jos\u00e9'. The consuming service's parser didn't handle Unicode escape sequences correctly.
The fix: add ensure_ascii=False to every json.dump() call in the pipeline. Also add a post-deployment validation script that checks for escape sequences in output files.
- Always pass ensure_ascii=False when writing JSON for production consumption; human-readable or not, it's safer.
- Test round-trips with non-ASCII test data in CI. A simple unit test with 'María' would have caught this before deploy.
- Document encoding assumptions in your API contract — don't assume the next service handles escaped Unicode.
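A rough sketch of both safeguards: a check for \uXXXX sequences in serialized output and a round-trip test with non-ASCII names. The helper name, the pattern, and the test data are assumptions. The check is deliberately simple: it would also flag a legitimately escaped control character or a literal backslash-u in the data, so treat a hit as a prompt to look, not as proof of a bug.

```python
import json
import re

# Matches a \uXXXX escape sequence in serialized JSON text.
ESCAPE_PATTERN = re.compile(r"\\u[0-9a-fA-F]{4}")


def has_unicode_escapes(serialized):
    """Return True if the serialized text contains a \\uXXXX sequence."""
    return ESCAPE_PATTERN.search(serialized) is not None


def test_names_round_trip_unescaped():
    names = ["María", "José", "Zoë"]
    serialized = json.dumps({"names": names}, ensure_ascii=False)
    assert not has_unicode_escapes(serialized)
    assert json.loads(serialized) == {"names": names}
```

The test function follows the naming convention pytest discovers by default, so it can sit in an existing test suite unchanged.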
- You called json.load() on a string instead of json.loads(). Check whether the input is a file path (use open(), then json.load()) or a string (use json.loads()).
- The file holds more than one JSON document. Parse it line by line with json.loads() (NDJSON), or wrap the documents in a list. For a file with one JSON document per line, iterate with: for line in file: json.loads(line).
- You tried to json.dump() a non-serializable Python type. Use a custom JSONEncoder subclass that handles datetime, Decimal, set, etc. Or convert manually before calling json.dump().
Key takeaways
- json.loads() and json.dumps() work with strings; json.load() and json.dump() work with file objects. Mix them up and you get an AttributeError or TypeError that's confusing without this context.
- Wrap json.loads() in try-except when parsing external data. Log the line and column from JSONDecodeError. Never let a single malformed entry crash a batch job.
Common mistakes to avoid
Calling json.load() on a string instead of json.loads()
This happens when you pass a string to json.load(), which expects a file handle.
For strings, use json.loads(). Reserve json.load() exclusively for open file handles. The 's' = string rule is your cheat code.
Forgetting ensure_ascii=False when writing international data
Pass ensure_ascii=False to json.dump() and json.dumps(). This tells Python to write real Unicode characters instead of escape sequences, which is almost always the right behaviour for modern applications.
Trying to serialize a datetime or Decimal without a custom encoder
Write a custom JSONEncoder subclass, or convert the value to a string first (for example my_datetime.isoformat()). Build the encoder once, reuse it everywhere in your codebase.
Interview Questions on This Topic
What's the difference between json.load() and json.loads() in Python, and when would you use each one?
json.load() reads JSON from a file object; json.loads() reads JSON from a string. The 's' stands for 'string'. Use json.load() when you have a file path: open the file and pass the handle. Use json.loads() when you already have the JSON as a string, typically from an API response or a variable. Confusing them leads to AttributeError ('str' object has no attribute 'read') when you pass a string to json.load().
Frequently Asked Questions