Senior 5 min · March 05, 2026

JSON in Python — Unicode Escapes Cause Downstream Failure

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Core concept: Python's json module converts between Python objects and JSON strings/files
  • Key functions: json.loads() for strings, json.load() for files — the 's' rule is permanent
  • Performance: json.dump() streams data to a file handle — use for large payloads, not json.dumps()
  • Production insight: Without ensure_ascii=False, non-ASCII text is written as \uXXXX escapes. That is valid JSON, but it can break consumers that do not decode escapes
  • Biggest mistake: Calling json.load() on a string gives AttributeError — always check input type first
Plain-English First

Think of JSON like a shipping label on a package. The label has structured information — sender, receiver, contents, weight — written in a way both the post office and the recipient can read instantly. JSON is exactly that: a universal 'shipping label' format that lets your Python app send data to a website, a database, or another program, and have it arrive perfectly readable on the other side. It doesn't matter if the other end is written in JavaScript, Go, or Java — JSON is the common language they all speak.

Every time you tap 'place order' on an e-commerce site, check the weather on your phone, or log into an app, JSON is quietly doing the heavy lifting behind the scenes. It's the format web APIs use to send data back and forth, the format config files are stored in, and the format data pipelines use to pass records between services. If you're writing Python today and you're not comfortable with JSON, you've got a gap that'll slow you down on almost every real project.

The problem JSON solves is surprisingly simple: computers need to share structured data with each other, but every language stores data differently in memory. A Python dictionary isn't the same as a JavaScript object internally — but both can be serialized into a JSON string that looks identical. JSON is the agreed-upon middle ground, a plain-text format that any language can read and write without needing to know anything about the other side's internals.

By the end of this article you'll be able to read JSON from a file, write Python data structures back out as JSON, parse API responses with confidence, handle encoding edge cases, and avoid the three mistakes that trip up even experienced developers. You'll also know exactly what to say when an interviewer asks you about serialization.

json.loads() vs json.load() — The Difference That Trips Everyone Up

Python's built-in json module gives you four functions you'll use constantly: json.loads(), json.load(), json.dumps(), and json.dump(). The naming is deceptively similar, and mixing them up is the single most common JSON mistake in Python.

Here's the mental model: the functions with an 's' on the end work with strings. The ones without the 's' work with file objects. That's it. json.loads() takes a JSON-formatted string and returns a Python object. json.load() takes an open file handle and reads JSON directly from it. Same relationship on the output side: json.dumps() returns a string, json.dump() writes directly to a file.

Why does this distinction matter? Because when you're hitting a web API, requests.get().text gives you a string — so you reach for json.loads(). When you're reading a config file from disk, you open the file and reach for json.load(). Picking the wrong one gives you a confusing AttributeError or TypeError that's hard to debug if you don't know the root cause.

json_loads_vs_load.py · PYTHON
import json

# ── CASE 1: Parsing a JSON STRING (e.g. from an API response) ──
# Imagine requests.get(...).text returned this string:
api_response_text = '{"user": "alice", "score": 42, "active": true}'

# json.loads() deserializes a STRING into a Python dict
user_data = json.loads(api_response_text)

print(type(user_data))        # Confirm it's a dict, not a string
print(user_data["user"])      # Access like any normal Python dict
print(user_data["active"])    # JSON 'true' becomes Python True (bool)

print("---")

# ── CASE 2: Reading JSON from a FILE ──
# First, let's write a sample file so the example is fully self-contained
sample_config = {
    "app_name": "DataPipeline",
    "version": "2.1.0",
    "debug": False,
    "max_retries": 3
}

# Write the config to disk first
with open("config.json", "w", encoding="utf-8") as config_file:
    json.dump(sample_config, config_file)  # dump() writes to a FILE OBJECT

# Now read it back — json.load() reads from a FILE OBJECT, not a string
with open("config.json", "r", encoding="utf-8") as config_file:
    loaded_config = json.load(config_file)

print(type(loaded_config))              # Also a dict
print(loaded_config["app_name"])        # 'DataPipeline'
print(loaded_config["debug"])           # False (Python bool)
print(loaded_config["max_retries"])     # 3 (Python int)
Output
<class 'dict'>
alice
True
---
<class 'dict'>
DataPipeline
False
3
Memory Hook:
Think of the 's' as standing for 'string'. json.loads() and json.dumps() = string in, string out. json.load() and json.dump() = file in, file out. Tattoo this on your brain and you'll never mix them up again.
Production Insight
In production, the AttributeError from mixing load/loads is confusing because it doesn't mention JSON.
Engineers often waste 30 minutes checking file permissions before realising the input was a string.
Rule: always check type(input) before calling the parse function — add a guard in helper code.
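The guard suggested above could look something like the following. This is a minimal sketch: parse_json_input is a hypothetical helper name, and the dispatch rules (text or bytes go to json.loads(), anything with a .read() method goes to json.load()) are one reasonable choice, not a standard API.

```python
import io
import json
from typing import Any


def parse_json_input(source: Any) -> Any:
    """Parse JSON from a str, bytes, or an open file-like object.

    Hypothetical helper: the name and dispatch rules are illustrative.
    """
    if isinstance(source, (str, bytes, bytearray)):
        # Already-loaded text or bytes: the 's' functions apply.
        return json.loads(source)
    if hasattr(source, "read"):
        # Anything with .read() is treated as a file handle.
        return json.load(source)
    raise TypeError(
        f"Expected str, bytes, or a file-like object, got {type(source).__name__}"
    )


from_string = parse_json_input('{"user": "alice"}')
from_handle = parse_json_input(io.StringIO('{"user": "bob"}'))
print(from_string["user"], from_handle["user"])
```

The explicit TypeError names the offending type, which is easier to act on than an AttributeError about a missing 'read' attribute.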
Key Takeaway
The 's' rule is permanent.
json.loads/json.dumps work with strings; json.load/json.dump work with file objects.
Mix them up and you get an AttributeError that tells you nothing about JSON.
Which function to use?
If: Input is a Python string (e.g., from requests.text)
Use: json.loads()
If: Input is an open file handle
Use: json.load()
If: Output to a string (for API response, logging)
Use: json.dumps()
If: Output directly to a file
Use: json.dump()

Writing JSON Files the Right Way — Formatting, Encoding, and Sorting

Writing raw JSON to a file works fine, but the output is a single compressed line that's nearly unreadable when you open the file. In production you often need human-readable output — for config files, logs, or debugging. The json.dump() and json.dumps() functions have optional parameters that give you full control over the output format.

The indent parameter is the big one. Pass indent=2 or indent=4 and your output is immediately readable with proper nesting. The sort_keys=True parameter alphabetises the keys, which is invaluable for config files or any output that goes into version control — it makes diffs clean and predictable instead of random.

The ensure_ascii=False parameter is critical and often forgotten. By default, Python's json module escapes every non-ASCII character, so a name like 'María' becomes 'Mar\u00eda' (the ASCII letters are left alone; only 'í' is escaped). That's valid JSON, but it's harder to read and larger on disk. Setting ensure_ascii=False writes the actual Unicode characters directly, which is almost always what you want when handling international data.

Finally, always open your JSON files with encoding='utf-8' explicitly. Don't rely on the system default — it varies between Windows, Mac, and Linux and will cause encoding bugs that are maddening to debug across environments.

write_json_formatted.py · PYTHON
import json

# Real-world example: saving a user profile with international characters
user_profile = {
    "user_id": 1047,
    "full_name": "María García",       # Non-ASCII characters — common in real data
    "email": "maria.garcia@example.com",
    "preferences": {
        "theme": "dark",
        "language": "es",
        "notifications": True
    },
    "tags": ["premium", "verified", "beta-tester"]
}

# ── BAD: Default output — technically valid but unreadable ──
bad_output = json.dumps(user_profile)
print("Raw (hard to read):")
print(bad_output)
print()

# ── GOOD: Formatted, Unicode-safe, sorted keys ──
good_output = json.dumps(
    user_profile,
    indent=2,              # 2-space indentation for readability
    sort_keys=True,        # Alphabetical keys — great for version control diffs
    ensure_ascii=False     # Write 'María' not 'Mar\u00eda'
)
print("Formatted (production-ready):")
print(good_output)

# ── Writing to a file with explicit UTF-8 encoding ──
output_path = "user_profile.json"
with open(output_path, "w", encoding="utf-8") as output_file:
    json.dump(
        user_profile,
        output_file,
        indent=2,
        sort_keys=True,
        ensure_ascii=False   # Critical: don't escape María into \u sequences
    )

print(f"\nProfile saved to {output_path}")
Output
Raw (hard to read):
{"user_id": 1047, "full_name": "Mar\u00eda Garc\u00eda", "email": "maria.garcia@example.com", "preferences": {"theme": "dark", "language": "es", "notifications": true}, "tags": ["premium", "verified", "beta-tester"]}

Formatted (production-ready):
{
  "email": "maria.garcia@example.com",
  "full_name": "María García",
  "preferences": {
    "language": "es",
    "notifications": true,
    "theme": "dark"
  },
  "tags": [
    "premium",
    "verified",
    "beta-tester"
  ],
  "user_id": 1047
}

Profile saved to user_profile.json
Watch Out:
If you're writing JSON files that go into a Git repository, always use sort_keys=True. Without it, Python's dict iteration order (insertion order since 3.7) means two logically identical dicts can produce different JSON output if keys were inserted in different orders — creating noisy, meaningless diffs that pollute your pull requests.
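To make the diff-noise problem concrete, here is a small sketch with two made-up config dicts that hold the same data in a different insertion order. The key names are illustrative only.

```python
import json

first = {"host": "db1", "port": 5432}
second = {"port": 5432, "host": "db1"}   # same data, different insertion order

# The dicts compare equal, but their default JSON text does not.
unsorted_differs = json.dumps(first) != json.dumps(second)
sorted_matches = (
    json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)
)
print(first == second, unsorted_differs, sorted_matches)   # True True True
```

Note that sort_keys only orders object keys; list items keep their original order.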
Production Insight
The biggest production issue is not format but encoding. A JSON file written on Windows with ensure_ascii=False but without explicit encoding='utf-8' can contain cp1252 bytes, breaking Linux consumers that expect UTF-8.
Always add ensure_ascii=False AND encoding='utf-8' — they are separate concerns.
Skip one and you'll get a ticket at 2am about 'corrupted' JSON files.
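A short sketch of why these are separate concerns: ensure_ascii decides which characters appear in the JSON text, while the encoding passed to open() decides which bytes those characters become. The sample name is taken from the example above; cp1252 stands in for a typical Windows default.

```python
import json

name = {"full_name": "María"}

escaped = json.dumps(name)                       # default: ensure_ascii=True
literal = json.dumps(name, ensure_ascii=False)

print(escaped)   # {"full_name": "Mar\u00eda"}
print(literal)   # {"full_name": "María"}

# The escaped form is pure ASCII, so every ASCII-compatible encoding
# produces the same bytes for it.
print(escaped.isascii())                                      # True

# The literal form depends on the encoding chosen when writing.
print(literal.encode("utf-8") == literal.encode("cp1252"))    # False

# Both forms decode to the same Python value.
print(json.loads(escaped) == json.loads(literal))             # True
```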
Key Takeaway
Always open JSON files with encoding='utf-8'.
Always pass ensure_ascii=False when writing.
Always pass sort_keys=True when writing for version control.
These three habits prevent an entire class of encoding bugs.

Handling Non-Serializable Types — Dates, Decimals, and Custom Objects

Here's where Python and JSON have a real fight. JSON only natively supports strings, numbers, booleans, null, arrays, and objects. It has no concept of a Python datetime, a Decimal, a set, or any custom class you've built. The moment you try to serialize one of these, Python throws a TypeError: Object of type X is not JSON serializable — and it stops everything.

This isn't a bug, it's a design decision. JSON is meant to be language-agnostic, so it can't encode Python-specific types. Your job is to tell Python how to translate those types into something JSON understands.

The cleanest way is to write a custom encoder by subclassing json.JSONEncoder and overriding its default() method. You get called for any type the encoder doesn't know how to handle, and you return a JSON-compatible representation. This is the approach used in production codebases because it's explicit, testable, and reusable — you define the encoding logic once and pass the encoder class anywhere you call json.dump() or json.dumps().

For quick scripts, the default parameter shortcut works fine — pass a lambda or small function. But for anything going into a team codebase, write a proper encoder class. It's more readable and your colleagues will thank you.
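The default= shortcut mentioned above can be sketched as follows. The function name to_jsonable and the sample payload are assumptions made for illustration; json.dumps() calls the function only for objects it cannot encode itself.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def to_jsonable(obj):
    """Fallback passed as default=; called only for types json cannot encode."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)          # keep the exact digits
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {"ship_date": date(2024, 3, 17), "total": Decimal("199.99")}
encoded = json.dumps(payload, default=to_jsonable)
print(encoded)   # {"ship_date": "2024-03-17", "total": "199.99"}
```

Raising TypeError for anything unrecognised keeps the same failure behaviour as the standard encoder.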

custom_json_encoder.py · PYTHON
import json
from datetime import datetime, date
from decimal import Decimal

# ── The types that break vanilla JSON serialization ──
order_record = {
    "order_id": "ORD-9921",
    "created_at": datetime(2024, 3, 15, 10, 30, 0),  # datetime — not JSON serializable
    "ship_date": date(2024, 3, 17),                   # date — also not JSON serializable
    "total_amount": Decimal("199.99"),                # Decimal — not JSON serializable
    "item_ids": {101, 205, 309},                      # set — not JSON serializable
    "status": "pending"
}

# Try without a custom encoder — this will raise TypeError
try:
    json.dumps(order_record)
except TypeError as encoding_error:
    print(f"Without custom encoder: {encoding_error}")

print()

# ── Solution: Custom JSONEncoder subclass ──
class AppJSONEncoder(json.JSONEncoder):
    """Handles Python types that vanilla JSON can't serialize."""

    def default(self, obj):
        # datetime: serialize as ISO 8601 string — universally understood
        if isinstance(obj, datetime):
            return obj.isoformat()          # e.g. '2024-03-15T10:30:00'

        # date: serialize as ISO date string
        if isinstance(obj, date):
            return obj.isoformat()          # e.g. '2024-03-17'

        # Decimal: convert to float — fine for display, use string for finance
        if isinstance(obj, Decimal):
            return float(obj)               # or str(obj) if precision matters

        # set: JSON has arrays, not sets — convert and sort for determinism
        if isinstance(obj, set):
            return sorted(obj)              # sorted output is consistent; elements must be mutually comparable

        # Let the parent class raise TypeError for anything we don't handle
        return super().default(obj)


# Now serialize cleanly using our custom encoder
serialised_order = json.dumps(
    order_record,
    cls=AppJSONEncoder,    # Pass the encoder CLASS (not an instance)
    indent=2,
    ensure_ascii=False
)

print("Serialized order record:")
print(serialised_order)

# ── Deserializing back: parse the date string yourself ──
loaded_order = json.loads(serialised_order)
print("\nRound-trip — created_at is now a string:")
print(type(loaded_order["created_at"]), loaded_order["created_at"])

# Convert back to datetime when needed
parsed_datetime = datetime.fromisoformat(loaded_order["created_at"])
print(f"Re-parsed datetime: {parsed_datetime}")
Output
Without custom encoder: Object of type datetime is not JSON serializable

Serialized order record:
{
  "order_id": "ORD-9921",
  "created_at": "2024-03-15T10:30:00",
  "ship_date": "2024-03-17",
  "total_amount": 199.99,
  "item_ids": [
    101,
    205,
    309
  ],
  "status": "pending"
}

Round-trip — created_at is now a string:
<class 'str'> 2024-03-15T10:30:00
Re-parsed datetime: 2024-03-15 10:30:00
Finance Warning:
Never convert Decimal to float when the value represents money. float(Decimal('199.99')) can introduce floating-point precision errors. Serialize Decimal as str(obj) instead, and document that the field is a decimal string. Your future self (and your accountants) will thank you.
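The sketch below illustrates the warning with the classic 0.1 + 0.2 case and shows two ways to keep amounts exact, assuming a field named total_amount as in the example above. The parse_float argument is a standard json.loads() parameter; it receives the number's original text.

```python
import json
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)                                      # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))     # True

# Option 1: store the amount as a decimal string and convert on load.
encoded = json.dumps({"total_amount": str(Decimal("199.99"))})
total = Decimal(json.loads(encoded)["total_amount"])

# Option 2: if a producer already emits JSON numbers, parse them as Decimal.
parsed = json.loads('{"total_amount": 199.99}', parse_float=Decimal)

print(total, parsed["total_amount"])                         # 199.99 199.99
```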
Production Insight
A TypeError mid-pipeline stops everything. In production this often happens when a new field type is added to a model but the encoder isn't updated.
Always include a catch-all in your encoder for unknown types — log a warning and convert to str or skip.
Better: unit test every new model field with a round-trip serialization test.
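One way to sketch the catch-all idea is an encoder whose default() logs a warning and falls back to str(). LenientJSONEncoder, the logger name, and the Point class are all hypothetical; the trade-off is that unknown types no longer fail fast, so the warning needs to be monitored.

```python
import json
import logging

logger = logging.getLogger("json_fallback")   # hypothetical logger name


class LenientJSONEncoder(json.JSONEncoder):
    """Falls back to str() for unknown types instead of raising TypeError."""

    def default(self, obj):
        logger.warning("No JSON encoding for %s; using str()", type(obj).__name__)
        return str(obj)


class Point:                                   # stand-in for a newly added model type
    def __str__(self):
        return "Point(1, 2)"


encoded = json.dumps({"location": Point()}, cls=LenientJSONEncoder)
print(encoded)   # {"location": "Point(1, 2)"}
```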
Key Takeaway
JSON has no native datetime, Decimal, or set.
Write a reusable JSONEncoder subclass early in any project.
Build it once, reuse it everywhere — a one-time cost that pays every time you add a new field.

Real-World Pattern: Fetching and Processing an API Response

Everything we've covered comes together the moment you touch a real API. The pattern is always the same: make an HTTP request, get back a JSON string in the response body, parse it into a Python dict, do your work, optionally write results to a file. Understanding each step means you're never guessing.

The requests library's response object has a convenient .json() method that parses the response body for you. It does not check the Content-Type header: it attempts to decode whatever body came back, and raises requests.exceptions.JSONDecodeError (requests 2.27 and later) if the body is not valid JSON, for example an HTML error page. It's worth knowing this so you're not surprised.

The pattern below also shows defensive coding: using .get() so that a missing key yields a fallback instead of a KeyError. A second case to plan for, which this example does not cover, is an API that returns a successful HTTP 200 but puts error details in the JSON body, which is extremely common with third-party APIs. This is the difference between code that works in a demo and code that survives contact with the real world.

api_json_pipeline.py · PYTHON
import json
import urllib.request   # Using stdlib so no pip install needed for this example
import urllib.error
from datetime import datetime, timezone

# ── Fetching from a real public API ──
# We'll use JSONPlaceholder, a free, stable fake REST API for testing
API_URL = "https://jsonplaceholder.typicode.com/users/1"

def fetch_user_profile(user_url: str) -> dict | None:
    """Fetch a user profile from an API and return it as a Python dict."""
    try:
        with urllib.request.urlopen(user_url, timeout=10) as response:
            # response.read() returns bytes; decode to string first
            raw_bytes = response.read()
            json_string = raw_bytes.decode("utf-8")

            # Parse the JSON string into a Python dict
            user_data = json.loads(json_string)
            return user_data

    except urllib.error.URLError as network_error:
        print(f"Network error: {network_error}")
        return None
    except json.JSONDecodeError as parse_error:
        print(f"Response wasn't valid JSON: {parse_error}")
        return None


def save_profile_snapshot(user_data: dict, output_path: str) -> None:
    """Enrich the profile with a timestamp and save it to a JSON file."""
    # Add metadata before saving; this is extremely common in pipelines
    # datetime.utcnow() is deprecated since Python 3.12, so build an aware UTC time
    fetched_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    enriched_profile = {
        "fetched_at": fetched_at.replace("+00:00", "Z"),  # ISO 8601 UTC
        "source_url": API_URL,
        "profile": user_data
    }

    with open(output_path, "w", encoding="utf-8") as output_file:
        json.dump(
            enriched_profile,
            output_file,
            indent=2,
            ensure_ascii=False   # Safe for names with accents or non-Latin chars
        )
    print(f"Snapshot saved to: {output_path}")


# ── Main flow ──
print(f"Fetching profile from {API_URL}...")
user = fetch_user_profile(API_URL)

if user:
    # Defensive access — check keys exist before using them
    user_name = user.get("name", "Unknown")
    user_email = user.get("email", "No email provided")
    company_name = user.get("company", {}).get("name", "No company")

    print(f"Name:    {user_name}")
    print(f"Email:   {user_email}")
    print(f"Company: {company_name}")

    # Save the enriched snapshot
    save_profile_snapshot(user, "user_snapshot.json")

    # ── Reading it back to verify the round-trip ──
    print("\nVerifying saved file...")
    with open("user_snapshot.json", "r", encoding="utf-8") as saved_file:
        reloaded = json.load(saved_file)

    print(f"Fetched at: {reloaded['fetched_at']}")
    print(f"Stored name: {reloaded['profile']['name']}")
Output
Fetching profile from https://jsonplaceholder.typicode.com/users/1...
Name:    Leanne Graham
Email:   Sincere@april.biz
Company: Romaguera-Crona
Snapshot saved to: user_snapshot.json

Verifying saved file...
Fetched at: 2024-03-15T10:30:00Z
Stored name: Leanne Graham
Pro Tip:
Always use dict.get('key', default) instead of dict['key'] when parsing API responses. APIs change — a key that's always been there can disappear in a new version, and dict['key'] raises a KeyError while dict.get('key') lets you set a sensible fallback. This single habit will save you from countless 3am on-call incidents.
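One caveat with chained .get() calls is worth a sketch: the default only applies when the key is missing, not when the API sends an explicit null. The user dict below is made-up sample data.

```python
user = {"name": "Leanne Graham", "company": None}   # key present, value is JSON null

# .get() only falls back when the key is missing, not when the value is None.
try:
    user.get("company", {}).get("name", "No company")
except AttributeError as err:
    print(f"Chained .get() failed: {err}")

# 'or {}' covers both a missing key and an explicit null.
company_name = (user.get("company") or {}).get("name", "No company")
print(company_name)   # No company
```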
Production Insight
Third-party APIs often return HTTP 200 with a JSON error body. Your code must distinguish success from failure by checking the body content, not the status code alone.
Also: API responses can be large. If you're saving to disk, use json.dump() directly to avoid holding the full string in memory.
Rule: never assume response.json() will succeed — wrap it in try-except.
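A minimal sketch of body-level validation follows. ApiError and unwrap_body are hypothetical names, and the convention that failures are reported under an "error" key is an assumption; real APIs use many different shapes, so check the provider's documentation.

```python
import json


class ApiError(Exception):
    """Raised when a response body reports a failure."""


def unwrap_body(status_code: int, body_text: str) -> dict:
    """Return the parsed body, or raise ApiError.

    The 'error' key is an assumed convention; real APIs differ.
    """
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError as err:
        raise ApiError(f"HTTP {status_code} with non-JSON body: {err}") from err
    if not isinstance(body, dict):
        raise ApiError(f"Expected a JSON object, got {type(body).__name__}")
    if status_code != 200 or "error" in body:
        raise ApiError(f"HTTP {status_code}: {body.get('error', 'unknown error')}")
    return body


print(unwrap_body(200, '{"id": 1, "name": "Leanne Graham"}'))
try:
    unwrap_body(200, '{"error": "rate limit exceeded"}')
except ApiError as err:
    print(f"Rejected: {err}")
```

Keeping the status code and body text as plain arguments makes the check easy to unit test without a network call.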
Key Takeaway
Always use .get() for dict access.
Always wrap json.loads() in a try-except for API responses.
Treat HTTP 200 as a suggestion — validate the actual JSON content.

Graceful Error Handling for JSON Parsing and Encoding

Production systems encounter malformed JSON more often than you'd think. A truncated network response, a misconfigured upstream service, or a file corrupted mid-write can all produce invalid JSON. Your code needs to handle these failures without crashing the entire pipeline.

The most common JSON parsing error is JSONDecodeError, which is a subclass of ValueError. It tells you exactly where the parser failed: line number, column number, and character offset (the .lineno, .colno, and .pos attributes), plus a short description in .msg. Use this information in your logging to speed up debugging.
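Here is a small sketch of reading those details from the exception. The broken document is made up (a value is missing after "score"), and the context slice is just one way to make a log line more useful.

```python
import json

broken = '{\n  "user": "alice",\n  "score": \n}'   # value missing after "score":

caught = None
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    caught = err
    print(isinstance(err, ValueError))      # True
    print(err.msg)                          # Expecting value
    print(err.lineno, err.colno, err.pos)   # line, column, character offset
    # A little surrounding text helps when the document is large.
    print(repr(err.doc[max(0, err.pos - 15): err.pos + 5]))
```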

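Another frequent issue is a file that isn't plain UTF-8. Some legacy systems output UTF-16, Latin-1, or UTF-8 with a BOM (byte order mark). Opening a UTF-16 file, or a Latin-1 file containing accented characters, with encoding='utf-8' typically raises UnicodeDecodeError. A UTF-8 file with a BOM opens without error, but json.load() then raises JSONDecodeError because of the leading BOM character. Use encoding='utf-8-sig' to strip the BOM automatically, or detect the encoding with the third-party chardet library for unknown sources.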
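For the Unicode encodings, one option is to hand raw bytes to json.loads(), which accepts bytes and detects UTF-8, UTF-16, and UTF-32 (including a BOM) by itself. A sketch with a made-up document; Latin-1 is not detected and still needs an explicit decode.

```python
import json

document = {"city": "Zürich"}
text = json.dumps(document, ensure_ascii=False)

samples = {
    "utf-8": text.encode("utf-8"),
    "utf-8 with BOM": text.encode("utf-8-sig"),
    "utf-16": text.encode("utf-16"),
}

# json.loads() accepts bytes and works out the Unicode encoding itself.
results = {label: json.loads(raw) for label, raw in samples.items()}
print(all(value == document for value in results.values()))      # True

# Latin-1 is not one of the detected encodings, so decode it explicitly.
latin1_bytes = text.encode("latin-1")
print(json.loads(latin1_bytes.decode("latin-1")) == document)    # True
```

The same applies to files: opening in binary mode ("rb") and calling json.load() passes the bytes through the same detection.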

A defensive strategy: never let a single malformed JSON entry crash your entire batch job. If you're processing a line-delimited JSON file (one JSON object per line), wrap each line parse in a try-except, log the error, and continue. This is standard in ETL pipelines.

graceful_json_parsing.py · PYTHON
import json
from pathlib import Path

# Simulate a file with one bad line
log_file_content = """{"user": "alice", "action": "login"}
{"user": "bob", "action": "logout"
{"user": "carol", "action": "purchase"}
"""

# Write to temp file
Path("events.log").write_text(log_file_content, encoding="utf-8")

# ── Defensive line-by-line parsing ──
valid_events = []
with open("events.log", "r", encoding="utf-8") as handle:
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
            valid_events.append(event)
        except json.JSONDecodeError as err:
            print(f"[WARN] Line {line_number}: {err.msg} at position {err.pos} — skipped")

print(f"\nProcessed {len(valid_events)} valid events")

# ── Handling Unicode with BOM ──
# Simulate a file with UTF-8 BOM
bom_data = b'\xef\xbb\xbf{"version": 1}'  # UTF-8 BOM + JSON
Path("with_bom.json").write_bytes(bom_data)

# This fails:
try:
    with open("with_bom.json", "r", encoding="utf-8") as f:
        json.load(f)
except json.JSONDecodeError as err:
    print(f"Without BOM handling: {err}")

# This works:
with open("with_bom.json", "r", encoding="utf-8-sig") as f:
    data = json.load(f)
print(f"With utf-8-sig: {data}")
Output
[WARN] Line 2: Expecting ',' delimiter at position 34 — skipped

Processed 2 valid events
Without BOM handling: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
With utf-8-sig: {'version': 1}
Production Rule:
Never let a single JSON parse failure crash your entire ETL batch. Wrap each line/fragment in try-except, log the error with line number and character position, and continue. The json.JSONDecodeError object has .lineno, .colno, and .pos attributes — use them in your logs.
Production Insight
In production, malformed JSON from a single upstream service can halt an entire pipeline if not handled gracefully.
Always validate JSON before processing large files: a quick json.loads() on a sample first.
Use encoding='utf-8-sig' when you don't know if the source includes a BOM — it's harmless if there's no BOM.
Rule: defensive parsing is not optional in event-driven architectures.
Key Takeaway
Wrap json.loads() in try-except for every external source.
Use encoding='utf-8-sig' for files from unknown origins.
Log line and column from JSONDecodeError.
Never let one bad entry kill the whole batch.
● Production incident · POST-MORTEM · severity: high

The Unicode Escape Incident: Downstream Systems Break on Escaped Names

Symptom
Java service reading JSON files threw MalformedInputException every time a non-ASCII character appeared. Data team spent three hours before someone checked the actual file content.
Assumption
Everyone assumed Python's json.dump() would write readable text. Nobody read the default behaviour documentation.
Root cause
By default, json.dump() and json.dumps() escape all non-ASCII characters. Names like 'José' become 'Jos\u00e9'. The consuming service's parser didn't handle Unicode escape sequences correctly.
Fix
Add ensure_ascii=False to every json.dump() call in the pipeline. Also add a post-deployment validation script that checks for escape sequences in output files.
Key lesson
  • Always pass ensure_ascii=False when writing JSON for production consumption — human-readable or not, it's safer.
  • Test round-trips with non-ASCII test data in CI. A simple unit test with 'María' would have caught this before deploy.
  • Document encoding assumptions in your API contract — don't assume the next service handles escaped Unicode.
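The round-trip test from the lessons above might look like this sketch. The roundtrip helper and the sample strings are illustrative; in a real project the assertions would sit in a test function run by CI.

```python
import json


def roundtrip(value, **dump_options):
    """Serialize and parse again; a correct consumer must see the same value."""
    return json.loads(json.dumps(value, **dump_options))


samples = ["María", "José", "東京", "emoji 🙂"]

for sample in samples:
    # Both output styles must decode to the original text.
    assert roundtrip(sample) == sample
    assert roundtrip(sample, ensure_ascii=False) == sample

# The raw text differs, which is what a strict downstream parser may trip over.
print(json.dumps("José"))                       # "Jos\u00e9"
print(json.dumps("José", ensure_ascii=False))   # "José"
```

A Python-only round-trip proves the data is intact; it does not prove the downstream parser handles escapes, so a contract test against the real consumer is still worthwhile.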
Production debug guide
Common JSON failures and how to diagnose them on a running system (4 entries)
Symptom · 01
AttributeError: 'str' object has no attribute 'read'
Fix
You called json.load() on a string instead of json.loads(). Check if the input is a file path (use open() then json.load()) or a string (use json.loads()).
Symptom · 02
JSONDecodeError: Extra data
Fix
Multiple JSON objects are concatenated in one input. For a file with one JSON object per line (NDJSON), parse each line with json.loads(), not json.load(): for line in file: json.loads(line). Alternatively, have the producer wrap the objects in a single JSON array.
Symptom · 03
TypeError: Object of type datetime is not JSON serializable
Fix
You tried to json.dump() a non-serializable Python type. Use a custom JSONEncoder subclass that handles datetime, Decimal, set, etc. Or convert manually before calling json.dump().
Symptom · 04
UnicodeDecodeError when opening JSON file
Fix
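The file is not valid UTF-8. Confirm the encoding with the producer or detect it with chardet. encoding='latin-1' always decodes without error because every byte maps to a character, so a wrong guess yields garbled text rather than an exception. A UTF-8 BOM shows up as a JSONDecodeError rather than a UnicodeDecodeError; open with encoding='utf-8-sig' to strip it.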
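For the "Extra data" symptom, when the objects are not neatly one per line, json.JSONDecoder.raw_decode() can walk through them one at a time. This is a sketch; iter_concatenated is a hypothetical helper and the stream is made-up data.

```python
import json


def iter_concatenated(text: str):
    """Yield each JSON value from a string holding several back to back."""
    decoder = json.JSONDecoder()
    index = 0
    while index < len(text):
        # raw_decode() does not skip leading whitespace, so do it here.
        while index < len(text) and text[index].isspace():
            index += 1
        if index == len(text):
            break
        value, index = decoder.raw_decode(text, index)
        yield value


stream = '{"id": 1}{"id": 2}\n{"id": 3}'
try:
    json.loads(stream)
except json.JSONDecodeError as err:
    print(err.msg)                       # Extra data

records = list(iter_concatenated(stream))
print(records)                           # [{'id': 1}, {'id': 2}, {'id': 3}]
```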
★ Quick Debug: JSON Parse Failures
Fast commands to diagnose JSON issues on a production system. Run these from the shell.
Quickly test if a string is valid JSON
Immediate action
Use Python one-liner on command line
Commands
echo '{"key": "value"}' | python -c "import sys,json; json.loads(sys.stdin.read()); print('valid')"
python -c "import json; json.loads(open('data.json').read()); print('valid')"
Fix now
If it fails, check for trailing commas, single quotes, or missing quotes around keys.
JSON file contains escape sequences (\uXXXX) instead of real characters
Immediate action
Check if file has raw Unicode or escapes
Commands
head -c 200 data.json | cat -v | grep -o '\\u[0-9a-fA-F]\{4\}'
python -c "import json; data = json.load(open('data.json')); print(json.dumps(data, ensure_ascii=False, indent=2))" > fixed.json
Fix now
Add ensure_ascii=False to your python script and regenerate the file.
API response returns HTML instead of JSON
Immediate action
Print first 200 characters of response.text
Commands
curl -s https://api.example.com/endpoint | head -c 200
curl -s -H "Accept: application/json" https://api.example.com/endpoint | head -c 200
Fix now
Set proper Accept header or check API URL. If the API is returning an error page, check authentication or endpoint path.
json.load / dump vs json.loads / dumps
Aspect | json.load() / json.dump() | json.loads() / json.dumps()
Input / Output type | File object (open file handle) | Python string (loads() also accepts bytes)
Typical use case | Reading/writing JSON files on disk | Parsing API responses, network data
Memory efficiency | dump() writes to the file in chunks; load() reads the whole file before parsing | Entire string must fit in memory
Encoding parameter | Controlled by open(); always set encoding='utf-8' | String already decoded; no encoding param
Error on wrong input | AttributeError: 'str' object has no attribute 'read' | TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper
indent / sort_keys work? | Yes, same optional parameters | Yes, same optional parameters
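The chunked-output behaviour in the memory row can be observed through JSONEncoder.iterencode(), which yields the document piece by piece. A brief sketch with made-up data; that json.dump() writes such chunks one at a time is a CPython implementation detail rather than a documented guarantee.

```python
import json

data = {"rows": [{"id": n, "ok": True} for n in range(3)]}

# iterencode() yields the document in pieces instead of one big string.
chunks = list(json.JSONEncoder().iterencode(data))
print(len(chunks) > 1)                        # True
print("".join(chunks) == json.dumps(data))    # True
```

Reading is different: json.load() calls read() on the file and parses the full text, so very large inputs still need to fit in memory or be split into line-delimited records.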

Key takeaways

1. The 's' rule is permanent: json.loads() and json.dumps() work with strings; json.load() and json.dump() work with file objects. Mix them up and you get an AttributeError or TypeError that's confusing without this context.
2. Always open JSON files with encoding='utf-8' explicitly and always pass ensure_ascii=False when writing: these two habits alone prevent an entire class of encoding bugs that are notoriously hard to reproduce across different operating systems.
3. JSON has no native type for datetime, Decimal, or set. Build a reusable JSONEncoder subclass early in any project that deals with these types: it's a one-time cost that pays dividends every time you add a new serialization call.
4. Use dict.get('key', fallback) instead of dict['key'] when parsing external JSON: APIs change, fields disappear, and a KeyError at runtime is far worse than a sensible default value.
5. Wrap json.loads() in try-except when parsing external data. Log the line and column from JSONDecodeError. Never let a single malformed entry crash a batch job.

Common mistakes to avoid

3 patterns
Calling json.load() on a string instead of json.loads()

Symptom
AttributeError: 'str' object has no attribute 'read' — confusing because it doesn't mention JSON. The error occurs when you pass a string variable to json.load(), which expects a file handle.
Fix
If your JSON is already a string (e.g., from an API response or a variable), use json.loads(). Reserve json.load() exclusively for open file handles. The 's' = string rule is your cheat code.
Forgetting ensure_ascii=False when writing international data

Symptom
Names like 'José' appear as 'Jos\u00e9' in your JSON files, making them unreadable and bloated. Downstream consumers that don't expect escaped Unicode may fail to parse.
Fix
Always pass ensure_ascii=False to json.dump() and json.dumps(). This tells Python to write real Unicode characters instead of escape sequences, which is almost always the right behaviour for modern applications.
Trying to serialize a datetime or Decimal without a custom encoder

Symptom
TypeError: Object of type datetime is not JSON serializable, thrown at runtime, often mid-pipeline, stopping everything.
Fix
Write a custom JSONEncoder subclass that handles your non-standard types, or convert them manually before serializing (e.g., str(my_date) or my_datetime.isoformat()). Build the encoder once, reuse it everywhere in your codebase.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR
What's the difference between json.load() and json.loads() in Python, an...
Q02 · SENIOR
You're serializing a Python object to JSON and you get a TypeError sayin...
Q03 · SENIOR
If you're storing financial amounts like prices and discounts in a JSON ...
Q01 of 03 · JUNIOR

What's the difference between json.load() and json.loads() in Python, and when would you use each one?

ANSWER
json.load() reads JSON from an open file object (file handle). json.loads() reads JSON from a string. The 's' stands for 'string'. Use json.load() when you have a file path—open the file and pass the handle. Use json.loads() when you already have the JSON as a string, typically from an API response or a variable. Confusing them leads to AttributeError ('str' object has no attribute 'read') when you pass a string to json.load().
FAQ · 3 QUESTIONS

Frequently Asked Questions

01
How do I read a JSON file in Python?
02
Why does Python throw 'Object of type datetime is not JSON serializable'?
03
What's the difference between json.dumps() returning a string versus json.dump() writing to a file?