
Working with JSON in Python: Read, Write, and Parse Like a Pro

In Plain English 🔥
Think of JSON like a shipping label on a package. The label has structured information — sender, receiver, contents, weight — written in a way both the post office and the recipient can read instantly. JSON is exactly that: a universal 'shipping label' format that lets your Python app send data to a website, a database, or another program, and have it arrive perfectly readable on the other side. It doesn't matter if the other end is written in JavaScript, Go, or Java — JSON is the common language they all speak.
⚡ Quick Answer
Use Python's built-in json module. json.loads() and json.dumps() convert between JSON strings and Python objects; json.load() and json.dump() do the same for open file objects. The mnemonic: the 's' stands for 'string'.

Every time you tap 'place order' on an e-commerce site, check the weather on your phone, or log into an app, JSON is quietly doing the heavy lifting behind the scenes. It's the format web APIs use to send data back and forth, the format config files are stored in, and the format data pipelines use to pass records between services. If you're writing Python in 2024 and you're not comfortable with JSON, you've got a gap that'll slow you down on almost every real project.

The problem JSON solves is surprisingly simple: computers need to share structured data with each other, but every language stores data differently in memory. A Python dictionary isn't the same as a JavaScript object internally — but both can be serialized into a JSON string that looks identical. JSON is the agreed-upon middle ground, a plain-text format that any language can read and write without needing to know anything about the other side's internals.

By the end of this article you'll be able to read JSON from a file, write Python data structures back out as JSON, parse API responses with confidence, handle encoding edge cases, and avoid the three mistakes that trip up even experienced developers. You'll also know exactly what to say when an interviewer asks you about serialization.

json.loads() vs json.load() — The Difference That Trips Everyone Up

Python's built-in json module gives you four functions you'll use constantly: json.loads(), json.load(), json.dumps(), and json.dump(). The naming is deceptively similar, and mixing them up is the single most common JSON mistake in Python.

Here's the mental model: the functions with an 's' on the end work with strings. The ones without the 's' work with file objects. That's it. json.loads() takes a JSON-formatted string and returns a Python object. json.load() takes an open file handle and reads JSON directly from it. Same relationship on the output side: json.dumps() returns a string, json.dump() writes directly to a file.

Why does this distinction matter? Because when you're hitting a web API, requests.get().text gives you a string — so you reach for json.loads(). When you're reading a config file from disk, you open the file and reach for json.load(). Picking the wrong one gives you a confusing AttributeError or TypeError that's hard to debug if you don't know the root cause.

json_loads_vs_load.py · PYTHON
import json

# ── CASE 1: Parsing a JSON STRING (e.g. from an API response) ──
# Imagine requests.get(...).text returned this string:
api_response_text = '{"user": "alice", "score": 42, "active": true}'

# json.loads() deserializes a STRING into a Python dict
user_data = json.loads(api_response_text)

print(type(user_data))        # Confirm it's a dict, not a string
print(user_data["user"])      # Access like any normal Python dict
print(user_data["active"])    # JSON 'true' becomes Python True (bool)

print("---")

# ── CASE 2: Reading JSON from a FILE ──
# First, let's write a sample file so the example is fully self-contained
sample_config = {
    "app_name": "DataPipeline",
    "version": "2.1.0",
    "debug": False,
    "max_retries": 3
}

# Write the config to disk first
with open("config.json", "w", encoding="utf-8") as config_file:
    json.dump(sample_config, config_file)  # dump() writes to a FILE OBJECT

# Now read it back — json.load() reads from a FILE OBJECT, not a string
with open("config.json", "r", encoding="utf-8") as config_file:
    loaded_config = json.load(config_file)

print(type(loaded_config))              # Also a dict
print(loaded_config["app_name"])        # 'DataPipeline'
print(loaded_config["debug"])           # False (Python bool)
print(loaded_config["max_retries"])     # 3 (Python int)
▶ Output
<class 'dict'>
alice
True
---
<class 'dict'>
DataPipeline
False
3
⚠️
Memory Hook: Think of the 's' as standing for 'string'. json.loads() and json.dumps() = string in, string out. json.load() and json.dump() = file in, file out. Tattoo this on your brain and you'll never mix them up again.
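To see both failure modes concretely, here's a minimal sketch that deliberately passes the wrong type to each function (using io.StringIO to stand in for an open file):

```python
import json
import io

payload = '{"user": "alice"}'

# Wrong: json.load() calls .read() on its argument, and strings have no .read()
try:
    json.load(payload)  # should have been json.loads(payload)
except AttributeError as err:
    print(f"load() on a string -> {err}")

# Wrong: json.loads() wants a string, not a file-like object
fake_file = io.StringIO(payload)
try:
    json.loads(fake_file)  # should have been json.load(fake_file)
except TypeError as err:
    print(f"loads() on a file  -> {err}")
```

Once you've seen these two tracebacks side by side, the root cause is obvious the next time either one appears in a log.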

Writing JSON Files the Right Way — Formatting, Encoding, and Sorting

Writing raw JSON to a file works fine, but the output is a single compressed line that's nearly unreadable when you open the file. In production you often need human-readable output — for config files, logs, or debugging. The json.dump() and json.dumps() functions have optional parameters that give you full control over the output format.

The indent parameter is the big one. Pass indent=2 or indent=4 and your output is immediately readable with proper nesting. The sort_keys=True parameter alphabetises the keys, which is invaluable for config files or any output that goes into version control — it makes diffs clean and predictable instead of random.

The ensure_ascii=False parameter is critical and often forgotten. By default, Python's json module escapes every non-ASCII character — so a name like 'María' becomes 'Mar\u00eda'. That's technically valid JSON, but it's unreadable and bloated. Setting ensure_ascii=False writes the actual Unicode characters directly, which is almost always what you want when handling international data.

Finally, always open your JSON files with encoding='utf-8' explicitly. Don't rely on the system default — it varies between Windows, Mac, and Linux and will cause encoding bugs that are maddening to debug across environments.

write_json_formatted.py · PYTHON
import json

# Real-world example: saving a user profile with international characters
user_profile = {
    "user_id": 1047,
    "full_name": "María García",       # Non-ASCII characters — common in real data
    "email": "maria.garcia@example.com",
    "preferences": {
        "theme": "dark",
        "language": "es",
        "notifications": True
    },
    "tags": ["premium", "verified", "beta-tester"]
}

# ── BAD: Default output — technically valid but unreadable ──
bad_output = json.dumps(user_profile)
print("Raw (hard to read):")
print(bad_output)
print()

# ── GOOD: Formatted, Unicode-safe, sorted keys ──
good_output = json.dumps(
    user_profile,
    indent=2,              # 2-space indentation for readability
    sort_keys=True,        # Alphabetical keys — great for version control diffs
    ensure_ascii=False     # Write 'María' not 'Mar\u00eda'
)
print("Formatted (production-ready):")
print(good_output)

# ── Writing to a file with explicit UTF-8 encoding ──
output_path = "user_profile.json"
with open(output_path, "w", encoding="utf-8") as output_file:
    json.dump(
        user_profile,
        output_file,
        indent=2,
        sort_keys=True,
        ensure_ascii=False   # Critical: don't escape María into \u sequences
    )

print(f"\nProfile saved to {output_path}")
▶ Output
Raw (hard to read):
{"user_id": 1047, "full_name": "Mar\u00eda Garc\u00eda", "email": "maria.garcia@example.com", "preferences": {"theme": "dark", "language": "es", "notifications": true}, "tags": ["premium", "verified", "beta-tester"]}

Formatted (production-ready):
{
  "email": "maria.garcia@example.com",
  "full_name": "María García",
  "preferences": {
    "language": "es",
    "notifications": true,
    "theme": "dark"
  },
  "tags": [
    "premium",
    "verified",
    "beta-tester"
  ],
  "user_id": 1047
}

Profile saved to user_profile.json
⚠️
Watch Out: If you're writing JSON files that go into a Git repository, always use sort_keys=True. Without it, Python's dict iteration order (insertion order since 3.7) means two logically identical dicts can produce different JSON output if keys were inserted in different orders — creating noisy, meaningless diffs that pollute your pull requests.
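The point is easy to verify with a minimal sketch: two logically identical dicts whose keys were inserted in different orders serialize differently until sort_keys=True normalises them:

```python
import json

# Two logically identical configs, built with keys in different orders
config_a = {"debug": False, "version": "2.1.0", "app_name": "DataPipeline"}
config_b = {"app_name": "DataPipeline", "version": "2.1.0", "debug": False}

# Without sort_keys, output follows insertion order — the serialized text differs
print(json.dumps(config_a) == json.dumps(config_b))   # False

# With sort_keys=True, both produce byte-identical output — clean Git diffs
print(json.dumps(config_a, sort_keys=True) == json.dumps(config_b, sort_keys=True))   # True
```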

Handling Non-Serializable Types — Dates, Decimals, and Custom Objects

Here's where Python and JSON have a real fight. JSON only natively supports strings, numbers, booleans, null, arrays, and objects. It has no concept of a Python datetime, a Decimal, a set, or any custom class you've built. The moment you try to serialize one of these, Python throws a TypeError: Object of type X is not JSON serializable — and it stops everything.

This isn't a bug, it's a design decision. JSON is meant to be language-agnostic, so it can't encode Python-specific types. Your job is to tell Python how to translate those types into something JSON understands.

The cleanest way is to write a custom encoder by subclassing json.JSONEncoder and overriding its default() method. You get called for any type the encoder doesn't know how to handle, and you return a JSON-compatible representation. This is the approach used in production codebases because it's explicit, testable, and reusable — you define the encoding logic once and pass the encoder class anywhere you call json.dump() or json.dumps().

For quick scripts, the default parameter shortcut works fine — pass a lambda or small function. But for anything going into a team codebase, write a proper encoder class. It's more readable and your colleagues will thank you.

custom_json_encoder.py · PYTHON
import json
from datetime import datetime, date
from decimal import Decimal

# ── The types that break vanilla JSON serialization ──
order_record = {
    "order_id": "ORD-9921",
    "created_at": datetime(2024, 3, 15, 10, 30, 0),  # datetime — not JSON serializable
    "ship_date": date(2024, 3, 17),                   # date — also not JSON serializable
    "total_amount": Decimal("199.99"),                # Decimal — not JSON serializable
    "item_ids": {101, 205, 309},                      # set — not JSON serializable
    "status": "pending"
}

# Try without a custom encoder — this will raise TypeError
try:
    json.dumps(order_record)
except TypeError as encoding_error:
    print(f"Without custom encoder: {encoding_error}")

print()

# ── Solution: Custom JSONEncoder subclass ──
class AppJSONEncoder(json.JSONEncoder):
    """Handles Python types that vanilla JSON can't serialize."""

    def default(self, obj):
        # datetime: serialize as ISO 8601 string — universally understood
        if isinstance(obj, datetime):
            return obj.isoformat()          # e.g. '2024-03-15T10:30:00'

        # date: serialize as ISO date string
        if isinstance(obj, date):
            return obj.isoformat()          # e.g. '2024-03-17'

        # Decimal: convert to float — fine for display, use string for finance
        if isinstance(obj, Decimal):
            return float(obj)               # or str(obj) if precision matters

        # set: JSON has arrays, not sets — convert and sort for determinism
        if isinstance(obj, set):
            return sorted(list(obj))        # sort so output is always consistent

        # Let the parent class raise TypeError for anything we don't handle
        return super().default(obj)


# Now serialize cleanly using our custom encoder
serialised_order = json.dumps(
    order_record,
    cls=AppJSONEncoder,    # Pass the encoder CLASS (not an instance)
    indent=2,
    ensure_ascii=False
)

print("Serialized order record:")
print(serialised_order)

# ── Deserializing back: parse the date string yourself ──
loaded_order = json.loads(serialised_order)
print("\nRound-trip — created_at is now a string:")
print(type(loaded_order["created_at"]), loaded_order["created_at"])

# Convert back to datetime when needed
parsed_datetime = datetime.fromisoformat(loaded_order["created_at"])
print(f"Re-parsed datetime: {parsed_datetime}")
▶ Output
Without custom encoder: Object of type datetime is not JSON serializable

Serialized order record:
{
  "order_id": "ORD-9921",
  "created_at": "2024-03-15T10:30:00",
  "ship_date": "2024-03-17",
  "total_amount": 199.99,
  "item_ids": [
    101,
    205,
    309
  ],
  "status": "pending"
}

Round-trip — created_at is now a string:
<class 'str'> 2024-03-15T10:30:00
Re-parsed datetime: 2024-03-15 10:30:00
⚠️
Finance Warning: Never convert Decimal to float when the value represents money. float(Decimal('199.99')) can introduce floating-point precision errors. Serialize Decimal as str(obj) instead, and document that the field is a decimal string. Your future self (and your accountants) will thank you.
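For the quick-script route mentioned earlier, the default= parameter accepts any plain function. Here's a minimal sketch that also follows the finance advice by serializing Decimal as a string — to_jsonable is a hypothetical helper name, not a stdlib function:

```python
import json
from datetime import datetime
from decimal import Decimal

def to_jsonable(obj):
    """Fallback for types json can't serialize natively (quick-script version)."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)  # string, not float — preserves monetary precision
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

invoice = {"amount": Decimal("199.99"), "issued": datetime(2024, 3, 15, 10, 30)}
print(json.dumps(invoice, default=to_jsonable))
# {"amount": "199.99", "issued": "2024-03-15T10:30:00"}
```

Note the difference from the class approach above: default= is passed per call, while a JSONEncoder subclass is passed once via cls= and reused everywhere.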

Real-World Pattern: Fetching and Processing an API Response

Everything we've covered comes together the moment you touch a real API. The pattern is always the same: make an HTTP request, get back a JSON string in the response body, parse it into a Python dict, do your work, optionally write results to a file. Understanding each step means you're never guessing.

The requests library's response object has a convenient .json() shortcut method that calls json.loads() on the response body for you. It doesn't check the Content-Type header before parsing — it simply tries, so if the server returns something that isn't valid JSON (an HTML error page, for example), you'll get a requests.exceptions.JSONDecodeError. It's worth knowing this so you're not surprised.

The pattern below also shows defensive coding: checking that the response key you expect actually exists before accessing it, and handling the case where an API returns a successful HTTP 200 but puts error details in the JSON body — which is extremely common with third-party APIs. This is the difference between code that works in a demo and code that survives contact with the real world.

api_json_pipeline.py · PYTHON
import json
import urllib.request   # Using stdlib so no pip install needed for this example
import urllib.error
from datetime import datetime

# ── Simulating an API response (using a real public API) ──
# We'll use JSONPlaceholder — a free, stable fake REST API for testing
API_URL = "https://jsonplaceholder.typicode.com/users/1"

def fetch_user_profile(user_url: str) -> dict | None:
    """Fetch a user profile from an API and return it as a Python dict."""
    try:
        with urllib.request.urlopen(user_url, timeout=10) as response:
            # response.read() returns bytes — decode to string first
            raw_bytes = response.read()
            json_string = raw_bytes.decode("utf-8")

            # Parse the JSON string into a Python dict
            user_data = json.loads(json_string)
            return user_data

    except urllib.error.URLError as network_error:
        print(f"Network error: {network_error}")
        return None
    except json.JSONDecodeError as parse_error:
        print(f"Response wasn't valid JSON: {parse_error}")
        return None


def save_profile_snapshot(user_data: dict, output_path: str) -> None:
    """Enrich the profile with a timestamp and save it to a JSON file."""
    # Add metadata before saving — this is extremely common in pipelines
    enriched_profile = {
        "fetched_at": datetime.utcnow().isoformat() + "Z",  # ISO 8601 UTC
        "source_url": API_URL,
        "profile": user_data
    }

    with open(output_path, "w", encoding="utf-8") as output_file:
        json.dump(
            enriched_profile,
            output_file,
            indent=2,
            ensure_ascii=False   # Safe for names with accents or non-Latin chars
        )
    print(f"Snapshot saved to: {output_path}")


# ── Main flow ──
print(f"Fetching profile from {API_URL}...")
user = fetch_user_profile(API_URL)

if user:
    # Defensive access — check keys exist before using them
    user_name = user.get("name", "Unknown")
    user_email = user.get("email", "No email provided")
    company_name = user.get("company", {}).get("name", "No company")

    print(f"Name:    {user_name}")
    print(f"Email:   {user_email}")
    print(f"Company: {company_name}")

    # Save the enriched snapshot
    save_profile_snapshot(user, "user_snapshot.json")

    # ── Reading it back to verify the round-trip ──
    print("\nVerifying saved file...")
    with open("user_snapshot.json", "r", encoding="utf-8") as saved_file:
        reloaded = json.load(saved_file)

    print(f"Fetched at: {reloaded['fetched_at']}")
    print(f"Stored name: {reloaded['profile']['name']}")
▶ Output
Fetching profile from https://jsonplaceholder.typicode.com/users/1...
Name: Leanne Graham
Email: Sincere@april.biz
Company: Romaguera-Crona
Snapshot saved to: user_snapshot.json

Verifying saved file...
Fetched at: 2024-03-15T10:30:00Z
Stored name: Leanne Graham
⚠️
Pro Tip: Always use dict.get('key', default) instead of dict['key'] when parsing API responses. APIs change — a key that's always been there can disappear in a new version, and dict['key'] raises a KeyError while dict.get('key') lets you set a sensible fallback. This single habit will save you from countless 3am on-call incidents.
json.load() / json.dump() vs json.loads() / json.dumps() — At a Glance

  • Input / output type — load/dump: file object (an open file handle); loads/dumps: Python string
  • Typical use case — load/dump: reading and writing JSON files on disk; loads/dumps: parsing API responses and other in-memory data
  • Memory behaviour — json.dump() writes to the file in chunks without building the full string, but json.load() still reads the entire file into memory via fp.read(); loads/dumps always hold the whole string in memory
  • Encoding — load/dump: controlled by open(), so always set encoding='utf-8'; loads/dumps: the string is already decoded, there is no encoding parameter
  • Error on wrong input — json.load() on a string: AttributeError: 'str' object has no attribute 'read'; json.loads() on a file: TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper
  • indent / sort_keys — both pairs accept the same optional formatting parameters

🎯 Key Takeaways

  • The 's' rule is permanent: json.loads() and json.dumps() work with strings; json.load() and json.dump() work with file objects. Mix them up and you get an AttributeError or TypeError that's confusing without this context.
  • Always open JSON files with encoding='utf-8' explicitly and always pass ensure_ascii=False when writing — these two habits alone prevent an entire class of encoding bugs that are notoriously hard to reproduce across different operating systems.
  • JSON has no native type for datetime, Decimal, or set. Build a reusable JSONEncoder subclass early in any project that deals with these types — it's a one-time cost that pays dividends every time you add a new serialization call.
  • Use dict.get('key', fallback) instead of dict['key'] when parsing external JSON — APIs change, fields disappear, and a KeyError at runtime is far worse than a sensible default value.

⚠ Common Mistakes to Avoid

  • Mistake 1: Calling json.load() on a string instead of json.loads() — Symptom: AttributeError: 'str' object has no attribute 'read' — Fix: If your JSON is already a string (e.g. from an API response or a variable), use json.loads(). Reserve json.load() exclusively for open file handles. The 's' = string rule is your cheat code.
  • Mistake 2: Forgetting ensure_ascii=False when writing international data — Symptom: Names like 'José' appear as 'Jos\u00e9' in your JSON files, making them unreadable and bloated — Fix: Always pass ensure_ascii=False to json.dump() and json.dumps(). This tells Python to write real Unicode characters instead of escape sequences, which is almost always the right behaviour for modern applications.
  • Mistake 3: Trying to serialize a datetime or Decimal without a custom encoder — Symptom: TypeError: Object of type datetime is not JSON serializable, thrown at runtime, often mid-pipeline — Fix: Write a custom JSONEncoder subclass that handles your non-standard types, or convert them manually before serializing (e.g. str(my_date) or my_datetime.isoformat()). Build the encoder once, reuse it everywhere in your codebase.

Interview Questions on This Topic

  • Q: What's the difference between json.load() and json.loads() in Python, and when would you use each one?
  • Q: You're serializing a Python object to JSON and you get a TypeError saying your datetime is not JSON serializable. Walk me through two different ways to fix that.
  • Q: If you're storing financial amounts like prices and discounts in a JSON file, would you serialize them as floats or strings, and why? What could go wrong with floats?

Frequently Asked Questions

How do I read a JSON file in Python?

Open the file with open('yourfile.json', 'r', encoding='utf-8') and pass the file handle to json.load(). This gives you a Python dict or list you can work with immediately. Always specify encoding='utf-8' explicitly to avoid platform-specific encoding surprises.
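A minimal, self-contained sketch of that pattern — it writes a sample file first so it runs anywhere; yourfile.json is just a placeholder name:

```python
import json

# Create a sample file so the example is self-contained
with open("yourfile.json", "w", encoding="utf-8") as f:
    f.write('{"name": "demo", "count": 3}')

# The actual pattern: open with explicit UTF-8, hand the file object to json.load()
with open("yourfile.json", "r", encoding="utf-8") as f:
    data = json.load(f)

print(data["name"])   # demo
print(data["count"])  # 3
```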

Why does Python throw 'Object of type datetime is not JSON serializable'?

JSON is a language-agnostic format and only supports strings, numbers, booleans, null, arrays, and objects. Python's datetime type doesn't map to any of these natively. Fix it by subclassing json.JSONEncoder and converting datetime objects to ISO strings (datetime.isoformat()) in the default() method, then pass your encoder class using the cls= parameter.

What's the difference between json.dumps() returning a string versus json.dump() writing to a file?

json.dumps() (with an 's') serializes a Python object into a JSON-formatted string in memory and returns it — useful when you need to send data over a network or store it in a variable. json.dump() (without the 's') writes the serialized JSON directly into an open file object, which is more efficient for writing to disk since you don't need to hold the entire string in memory first.

🔥
TheCodeForge Editorial Team Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged