
Pydantic for Data Validation in Python — A Real-World Guide

In Plain English 🔥
Imagine you run a hotel and guests fill out a check-in form. Before handing them a key, a receptionist checks every field — 'Is the name filled in? Is the phone number actually a number? Is the check-in date in the future?' Pydantic is that receptionist for your Python code. You describe what valid data looks like, and Pydantic checks every piece of data that comes in — rejecting anything that doesn't fit, before it ever causes a problem deeper in your system.

Every production Python application eats data it didn't create — JSON from an API, environment variables, user form submissions, database records. Any one of those can arrive malformed, missing a field, or with the wrong type. When that bad data reaches your business logic, the errors that surface are cryptic, hard to trace, and expensive to debug in production. Data validation isn't optional; it's the front door of a reliable system.

Before Pydantic, the typical approach was a tangle of if statements, isinstance() checks, and manual type coercion spread across dozens of functions. It worked, but it was brittle, verbose, and impossible to maintain at scale. Pydantic solves this by letting you declare your data's shape as a plain Python class — using type hints you're already writing — and then automatically validates, coerces, and documents that shape for you. One model definition does the work of fifty hand-rolled checks.

By the end of this article you'll know how to build Pydantic models that validate real API payloads, write custom validators for business rules that go beyond basic types, handle nested data structures confidently, and avoid the three mistakes that catch most intermediate developers off guard. You'll also understand why FastAPI, one of the most popular Python web frameworks of recent years, bets its entire request/response pipeline on Pydantic.

Your First Pydantic Model — Why Type Hints Alone Aren't Enough

Python's type hints are annotations — they're hints to your IDE and to tools like mypy, but the Python runtime itself ignores them completely. Write age: int in a plain dataclass and pass the string 'thirty' — Python won't complain. That silence is dangerous when data is coming from the outside world.
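To see that silence concretely, here's a minimal sketch (the `User` dataclass is a hypothetical example, not from the article's code): a plain dataclass happily stores a string where an int was declared.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int  # A hint only — nothing enforces it at runtime

# No error, no warning: the annotation is ignored when the object is built
silent_user = User(name="Ada", age="thirty")
print(type(silent_user.age).__name__)  # str — the bad value sailed straight through
```

The bad value now lives inside your object, waiting to blow up in whatever function first tries `silent_user.age + 1`.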

Pydantic changes the contract. When you inherit from BaseModel, those same type hints become enforced rules. Pydantic reads them at class-creation time and builds a validator for every field. At instantiation, every value is checked and coerced if possible — or rejected with a clear, structured error if it isn't.

The magic keyword here is coercion. If a field is typed as int and you pass '42', Pydantic doesn't refuse — it converts '42' to 42 for you. That's deliberate: real-world data sources like JSON, query strings, and environment variables are always strings first. Pydantic meets the real world halfway. But if you pass 'forty-two', it can't coerce that — and it tells you exactly which field failed, what value it received, and what type it expected.

user_registration_model.py · PYTHON
from pydantic import BaseModel, EmailStr, ValidationError

# Install pydantic with: pip install "pydantic[email]"  (quote the extra — some shells, e.g. zsh, expand brackets)
# The [email] extra adds email validation support via the 'email-validator' package

class UserRegistration(BaseModel):
    username: str          # Must be a string — Pydantic v2 will NOT silently coerce numbers to str
    email: EmailStr        # Must be a valid email address format, not just any string
    age: int               # Pydantic will coerce '25' -> 25, but fail on 'twenty-five'
    is_active: bool = True # Optional field with a default value

# --- Happy path: valid data coming in from a registration form ---
new_user = UserRegistration(
    username="ada_lovelace",
    email="ada@babbage.co.uk",
    age="28",          # Passed as a string (as it would be from a web form) — Pydantic coerces it
    is_active=True
)

print("=== Successful Registration ===")
print(f"Username : {new_user.username}")
print(f"Email    : {new_user.email}")
print(f"Age      : {new_user.age} (type: {type(new_user.age).__name__})")  # Confirm it's now an int
print(f"Active   : {new_user.is_active}")
print(f"As dict  : {new_user.model_dump()}")  # model_dump() serializes the model to a plain dict

print()

# --- Unhappy path: bad data — what happens in production when input isn't sanitized ---
print("=== Failed Registration (bad data) ===")
try:
    bad_user = UserRegistration(
        username="ghost_user",
        email="not-an-email",   # Fails EmailStr validation
        age="twenty-five",      # Cannot be coerced to int
    )
except ValidationError as validation_error:
    # Pydantic gives us a structured error — we can log it, return it as JSON, or display it
    print(f"Caught {validation_error.error_count()} validation error(s):")
    for error in validation_error.errors():
        print(f"  Field : {' -> '.join(str(loc) for loc in error['loc'])}")
        print(f"  Issue : {error['msg']}")
        print(f"  Input : {error['input']}")
        print()
▶ Output
=== Successful Registration ===
Username : ada_lovelace
Email    : ada@babbage.co.uk
Age      : 28 (type: int)
Active   : True
As dict  : {'username': 'ada_lovelace', 'email': 'ada@babbage.co.uk', 'age': 28, 'is_active': True}

=== Failed Registration (bad data) ===
Caught 2 validation error(s):
  Field : email
  Issue : value is not a valid email address: An email address must have an @-sign.
  Input : not-an-email

  Field : age
  Issue : Input should be a valid integer, unable to parse string as an integer
  Input : twenty-five
⚠️ Pro Tip: model_dump() is your serialization best friend
Call `.model_dump()` on any Pydantic model to get a plain Python dict — perfect for returning JSON from an API endpoint or writing to a database. Pass `mode='json'` to get JSON-safe types (e.g., `datetime` objects become ISO strings automatically).
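A quick sketch of the `mode='json'` behaviour (the `Event` model is a made-up example): the same model dumps to Python-native types by default and to JSON-safe types on request.

```python
from datetime import datetime
from pydantic import BaseModel

class Event(BaseModel):
    title: str
    starts_at: datetime

event = Event(title="Launch", starts_at=datetime(2024, 8, 15, 9, 30))

# Default dump keeps Python objects; mode='json' converts them to JSON-safe types
as_python = event.model_dump()
as_json_safe = event.model_dump(mode="json")

print(type(as_python["starts_at"]).__name__)   # datetime
print(as_json_safe["starts_at"])               # 2024-08-15T09:30:00
```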

Custom Validators and Field Constraints — Encoding Business Rules

Type checking gets you 70% of the way. But real business rules are more nuanced: a username can't contain spaces, a price can't be negative, a start date must be before an end date. These rules live in your domain — Pydantic can't guess them. That's where Field() constraints and the @field_validator decorator come in.

Field() lets you attach constraints directly to a type declaration — things like minimum/maximum values, string length limits, and regex patterns. This is the right tool for single-field rules that are purely about the value's shape. They're also automatically reflected in the JSON Schema Pydantic generates, which means FastAPI's auto-docs (Swagger UI) will show these constraints to API consumers for free.
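As a minimal illustration before the full example below (the `Ticket` model is hypothetical), violating a `Field()` constraint produces a structured error with no custom code at all:

```python
from pydantic import BaseModel, Field, ValidationError

class Ticket(BaseModel):
    seat_count: int = Field(ge=1, le=10)  # must be between 1 and 10 inclusive

try:
    Ticket(seat_count=0)  # violates ge=1
except ValidationError as exc:
    first_error = exc.errors()[0]
    print(first_error["loc"])  # ('seat_count',)
    print(first_error["msg"])  # explains which bound was violated
```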

For rules that involve logic — especially rules that span multiple fields — @field_validator and @model_validator are your tools. A @field_validator runs after the type has been coerced, so you're always working with a Python int, not the raw string '42'. A @model_validator with mode='after' fires after the entire model is populated, giving you access to all fields at once. That's how you implement 'end date must be after start date' without hacks.

product_listing_model.py · PYTHON
from datetime import date
from pydantic import BaseModel, Field, field_validator, model_validator
from typing import Optional

class ProductListing(BaseModel):
    # Field() constraints are checked BEFORE custom validators run
    name: str = Field(
        min_length=2,
        max_length=100,
        description="Product display name"
    )
    price_usd: float = Field(
        gt=0,           # gt = greater than (strictly). Use ge= for greater-than-or-equal
        le=99999.99,    # le = less than or equal
        description="Price in US dollars"
    )
    discount_pct: Optional[float] = Field(
        default=0.0,
        ge=0.0,   # Cannot be negative
        lt=100.0  # Cannot be 100% or more (would be free or negative price)
    )
    available_from: date
    available_until: Optional[date] = None
    sku: str = Field(
        pattern=r'^[A-Z]{2,4}-\d{4,8}$',  # e.g. ELEC-00142 or TV-9982100
        description="Stock Keeping Unit code"
    )

    # @field_validator runs AFTER the field is already its correct Python type
    @field_validator('name')
    @classmethod
    def name_must_not_be_generic(cls, product_name: str) -> str:
        # Strip whitespace first — a common real-world data cleaning step
        cleaned_name = product_name.strip()
        generic_names = {'product', 'item', 'thing', 'unnamed', 'test'}
        if cleaned_name.lower() in generic_names:
            raise ValueError(f"Product name '{cleaned_name}' is too generic. Use a specific name.")
        return cleaned_name  # Always return the (possibly cleaned) value

    # @model_validator with mode='after' has access to the fully-built model instance
    @model_validator(mode='after')
    def availability_window_must_be_valid(self) -> 'ProductListing':
        if self.available_until is not None:
            if self.available_until <= self.available_from:
                raise ValueError(
                    f"available_until ({self.available_until}) must be "
                    f"after available_from ({self.available_from})"
                )
        return self  # Always return self from a model_validator


# --- Test 1: Valid product ---
laptop = ProductListing(
    name="  ThinkPad X1 Carbon  ",  # Leading/trailing spaces — our validator cleans this
    price_usd=1299.99,
    discount_pct=15.0,
    available_from=date(2024, 1, 1),
    available_until=date(2024, 12, 31),
    sku="COMP-00891"
)
print(f"Product name (cleaned): '{laptop.name}'")
print(f"Price after discount: ${laptop.price_usd * (1 - laptop.discount_pct / 100):.2f}")
print()

# --- Test 2: Invalid product — multiple errors at once ---
from pydantic import ValidationError
try:
    bad_product = ProductListing(
        name="product",             # Fails our custom validator
        price_usd=-50.0,            # Fails gt=0 constraint
        available_from=date(2024, 6, 1),
        available_until=date(2024, 1, 1),  # Invalid window — but a mode='after'
                                           # model validator only runs if ALL field
                                           # validation passes, so it stays silent here
        sku="invalid-sku-format"    # Fails regex pattern
    )
except ValidationError as e:
    print(f"Found {e.error_count()} errors:")
    for err in e.errors():
        field = ' -> '.join(str(loc) for loc in err['loc'])
        print(f"  [{field}] {err['msg']}")
▶ Output
Product name (cleaned): 'ThinkPad X1 Carbon'
Price after discount: $1104.99

Found 3 errors:
  [name] Value error, Product name 'product' is too generic. Use a specific name.
  [price_usd] Input should be greater than 0
  [sku] String should match pattern '^[A-Z]{2,4}-\d{4,8}$'
⚠️ Watch Out: Always return a value from @field_validator
If you forget to return the value from a `@field_validator`, the field silently becomes `None` — even if validation passed. This is one of the most common Pydantic bugs in production code. Always end your validator with `return value` (or the cleaned version of it).
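Here's the bug in miniature (a hypothetical model): the validator checks the value correctly but forgets the final `return`.

```python
from pydantic import BaseModel, field_validator

class BuggyUser(BaseModel):
    nickname: str

    @field_validator('nickname')
    @classmethod
    def must_not_be_blank(cls, value: str):
        if not value.strip():
            raise ValueError("nickname cannot be blank")
        # BUG: no `return value` here — Python implicitly returns None,
        # and Pydantic uses that None as the field's final value

user = BuggyUser(nickname="ada")
print(user.nickname)  # None — validation passed, but the value was lost
```

The fix is a one-liner: end the validator with `return value.strip()` (or whatever cleaned version you want stored).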

Nested Models and Real API Payloads — Where Pydantic Gets Powerful

Real-world data is almost never flat. An order has a customer, a customer has an address, an address has a country. When you nest Pydantic models inside each other, each layer validates independently — so an error in a deeply nested field tells you exactly where the problem is, not just 'something in the payload was wrong'.

You can nest models by declaring a field's type as another BaseModel subclass. Pydantic handles the recursive validation automatically. You can also use List[SomeModel] or Dict[str, SomeModel] for collections of validated objects — all the standard Python typing constructs work as you'd expect.
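A small sketch of that pinpointing (the `Address` and `Customer` models here are hypothetical): the error's `loc` tuple traces the full path through the nesting, including list indices.

```python
from typing import List
from pydantic import BaseModel, ValidationError

class Address(BaseModel):
    city: str
    country_code: str

class Customer(BaseModel):
    name: str
    addresses: List[Address]

try:
    Customer(
        name="Grace",
        addresses=[
            {"city": "Arlington", "country_code": "US"},
            {"city": "New York"},  # second address is missing country_code
        ],
    )
except ValidationError as exc:
    error_path = exc.errors()[0]["loc"]
    print(error_path)  # ('addresses', 1, 'country_code') — the exact failing field
```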

This is exactly how FastAPI uses Pydantic under the hood: every incoming request body is a nested model graph. The framework deserializes the raw JSON into Python objects, runs the full validation tree, and hands your route function a fully-typed, already-validated object. No manual parsing, no defensive dict.get() calls, no hidden None values where you expected a string.

ecommerce_order_model.py · PYTHON
from pydantic import BaseModel, Field, field_validator
from typing import List
from datetime import datetime
from enum import Enum

class OrderStatus(str, Enum):
    # Inheriting from str makes JSON serialization work seamlessly
    PENDING   = "pending"
    CONFIRMED = "confirmed"
    SHIPPED   = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"

class ShippingAddress(BaseModel):
    street_line_1: str = Field(min_length=5)
    street_line_2: str = ""
    city: str
    postcode: str
    country_code: str = Field(pattern=r'^[A-Z]{2}$')  # ISO 3166-1 alpha-2, e.g. 'US', 'GB'

class OrderItem(BaseModel):
    product_sku: str
    product_name: str
    quantity: int = Field(ge=1)            # Must order at least 1
    unit_price_usd: float = Field(gt=0)

    @property
    def line_total(self) -> float:
        # Computed property — not a stored field, so Pydantic doesn't validate it
        # but it's always consistent with the validated data
        return round(self.quantity * self.unit_price_usd, 2)

class CustomerOrder(BaseModel):
    order_id: str
    customer_email: str
    status: OrderStatus = OrderStatus.PENDING  # Enum default
    shipping_address: ShippingAddress           # Nested model — validated recursively
    items: List[OrderItem] = Field(min_length=1) # Order must have at least one item
    created_at: datetime = Field(default_factory=datetime.utcnow)  # Auto-set on creation; on Python 3.12+ prefer lambda: datetime.now(timezone.utc), as utcnow() is deprecated

    @field_validator('order_id')
    @classmethod
    def order_id_must_be_uppercase(cls, raw_id: str) -> str:
        return raw_id.upper()  # Normalise — so 'ord-1234' and 'ORD-1234' are treated the same

    @property
    def order_total(self) -> float:
        return round(sum(item.line_total for item in self.items), 2)


# Simulating a JSON payload arriving from an API client (e.g. a mobile app checkout).
# In real life this would be CustomerOrder.model_validate(body) — or FastAPI does it for you.
raw_payload = {
    "order_id": "ord-20240815-9921",
    "customer_email": "grace@hopper.dev",
    "shipping_address": {
        "street_line_1": "123 Compiler Street",
        "city": "Arlington",
        "postcode": "22201",
        "country_code": "US"
    },
    "items": [
        {"product_sku": "BOOK-0042", "product_name": "The Art of Computer Programming", "quantity": 2, "unit_price_usd": 79.99},
        {"product_sku": "BOOK-0187", "product_name": "Clean Code", "quantity": 1, "unit_price_usd": 34.50}
    ]
}

order = CustomerOrder(**raw_payload)

print(f"Order ID    : {order.order_id}")             # Note: auto-uppercased
print(f"Customer    : {order.customer_email}")
print(f"Status      : {order.status.value}")
print(f"Ship to     : {order.shipping_address.city}, {order.shipping_address.country_code}")
print(f"Items       : {len(order.items)}")
for item in order.items:
    print(f"  - {item.product_name} x{item.quantity} @ ${item.unit_price_usd} = ${item.line_total}")
print(f"Order Total : ${order.order_total}")
print()

# Serialise the whole nested structure to a dict (ready to store in a database or return as JSON)
order_dict = order.model_dump()
print("Serialised keys at top level:", list(order_dict.keys()))
print("Nested address keys:", list(order_dict['shipping_address'].keys()))
▶ Output
Order ID    : ORD-20240815-9921
Customer    : grace@hopper.dev
Status      : pending
Ship to     : Arlington, US
Items       : 2
  - The Art of Computer Programming x2 @ $79.99 = $159.98
  - Clean Code x1 @ $34.5 = $34.5
Order Total : $194.48

Serialised keys at top level: ['order_id', 'customer_email', 'status', 'shipping_address', 'items', 'created_at']
Nested address keys: ['street_line_1', 'street_line_2', 'city', 'postcode', 'country_code']
🔥 Interview Gold: Why does FastAPI use Pydantic?
FastAPI uses Pydantic because its models double as both a validation layer and an OpenAPI schema source. One `BaseModel` subclass gives you request parsing, response serialization, and auto-generated Swagger docs — with zero duplication. That's a compelling answer in any FastAPI or API design interview.
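You can see the schema half of that story directly. A sketch (hypothetical `Ticket` model) showing how `Field()` constraints surface in the generated JSON Schema, which is what FastAPI feeds into Swagger UI:

```python
from pydantic import BaseModel, Field

class Ticket(BaseModel):
    seat_count: int = Field(ge=1, le=10, description="Seats requested")

schema = Ticket.model_json_schema()
props = schema["properties"]["seat_count"]
print(props["minimum"], props["maximum"])  # 1 10 — constraints exposed to API consumers
print(props["description"])               # Seats requested
```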

Pydantic Settings — Validating Config and Environment Variables

BaseModel is for validating data that flows through your application. BaseSettings — from the pydantic-settings package — is for validating the environment your application runs in. The distinction matters: your app's config (database URL, API keys, port number, debug flags) is just data that happens to come from environment variables and .env files instead of JSON.

Without BaseSettings, it's common to scatter os.environ.get('DATABASE_URL', 'localhost') calls through your codebase. The problem: you only find out a required variable is missing when the code path that uses it runs — possibly hours or days into a production deployment. BaseSettings validates all of your config at startup. If DATABASE_URL is missing, your app refuses to start and tells you exactly which variable is missing before it does any damage.

This pattern — fail fast with a clear message rather than fail slowly with a cryptic error — is one of the most valuable architectural habits you can build. BaseSettings makes it trivially easy.

app_config_settings.py · PYTHON
# Install: pip install pydantic-settings
# Create a .env file in the same directory with the content shown below before running

from pydantic import Field, AnyHttpUrl, field_validator
from pydantic_settings import BaseSettings, SettingsConfigDict
from typing import List

# --- Contents of your .env file (do NOT commit this to git) ---
# DATABASE_URL=postgresql://user:pass@localhost:5432/myapp
# API_SECRET_KEY=super-secret-key-change-this-in-prod
# DEBUG_MODE=false
# ALLOWED_ORIGINS=["http://localhost:3000","https://myapp.com"]
# MAX_CONNECTIONS=10

class AppSettings(BaseSettings):
    # SettingsConfigDict tells Pydantic WHERE to look for values
    model_config = SettingsConfigDict(
        env_file='.env',            # Read from .env file
        env_file_encoding='utf-8',
        case_sensitive=False,       # DATABASE_URL and database_url are the same
        extra='ignore'              # Don't crash if .env has extra keys we don't declare
    )

    # Required fields — no default means the app REFUSES to start if these are missing
    database_url: str = Field(description="Full PostgreSQL connection string")
    api_secret_key: str = Field(min_length=16, description="JWT signing secret")

    # Optional fields with sensible defaults
    debug_mode: bool = False
    app_port: int = Field(default=8000, ge=1024, le=65535)
    max_connections: int = Field(default=5, ge=1, le=100)

    # List-typed settings are parsed as JSON from env vars by pydantic-settings —
    # e.g. ALLOWED_ORIGINS=["http://localhost:3000","https://myapp.com"]
    allowed_origins: List[str] = Field(default=["http://localhost:3000"])

    @field_validator('api_secret_key')
    @classmethod
    def secret_key_must_not_be_default(cls, key_value: str) -> str:
        insecure_defaults = {'secret', 'password', 'changeme', 'default', 'test'}
        if key_value.lower() in insecure_defaults:
            raise ValueError(
                "API_SECRET_KEY is set to an insecure default value. "
                "Set a strong random secret before deploying."
            )
        return key_value


# This is the pattern: load settings ONCE at module level.
# Import `settings` from this module everywhere else — never call AppSettings() twice.
try:
    settings = AppSettings()
    print("=== Application Config Loaded Successfully ===")
    print(f"Database     : {settings.database_url[:30]}...")  # Don't log full credentials
    print(f"Port         : {settings.app_port}")
    print(f"Debug Mode   : {settings.debug_mode}")
    print(f"Max Conns    : {settings.max_connections}")
    print(f"CORS Origins : {settings.allowed_origins}")
except Exception as config_error:
    # In a real app you'd use logging.critical() here and sys.exit(1)
    print(f"FATAL: Could not load application config — {config_error}")
    print("Application startup aborted.")
▶ Output
=== Application Config Loaded Successfully ===
Database     : postgresql://user:pass@localho...
Port         : 8000
Debug Mode   : False
Max Conns    : 10
CORS Origins : ['http://localhost:3000', 'https://myapp.com']
⚠️ Pro Tip: Singleton Settings Pattern
Create your `AppSettings` instance once at module level (e.g. in `config.py`) and import that single `settings` object everywhere else. Never call `AppSettings()` inside a function — it re-reads the `.env` file on every call, which is slow and makes testing a nightmare.
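One common way to enforce that, sketched here with only the standard library (the `get_settings` function and the plain `Settings` class are stand-ins for the real `AppSettings`): cache the constructor behind `functools.lru_cache` so every caller gets the same instance.

```python
import os
from functools import lru_cache

os.environ.setdefault("MYAPP_PORT", "8000")  # simulate an environment variable

class Settings:
    """Stand-in for AppSettings — reads the environment once, at construction."""
    def __init__(self) -> None:
        self.app_port = int(os.environ["MYAPP_PORT"])

@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # First call constructs Settings; every later call returns the cached instance
    return Settings()

assert get_settings() is get_settings()  # same object — the env is read exactly once
```

In FastAPI apps this cached accessor also slots neatly into `Depends(get_settings)`, and tests can call `get_settings.cache_clear()` to force a fresh read after monkeypatching the environment.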
| Aspect | Plain Python Dataclasses | Pydantic BaseModel |
| --- | --- | --- |
| Runtime type validation | None — type hints are ignored | Full validation on every instantiation |
| Type coercion (str -> int) | Not performed | Automatic where safe (e.g. '42' -> 42) |
| Error messages | Python's default TypeError (often cryptic) | Structured errors with field name, input, and reason |
| Nested model validation | Manual — you write the logic yourself | Automatic and recursive out of the box |
| JSON serialization | Requires manual dict() or asdict() | Built-in .model_dump() and .model_dump_json() |
| JSON Schema generation | Not supported | Built-in .model_json_schema() |
| Custom validators | Not a built-in concept | @field_validator and @model_validator decorators |
| Default factories | field(default_factory=...) supported | Field(default_factory=...) supported |
| Environment variable parsing | Not supported | BaseSettings from pydantic-settings package |
| Performance (Pydantic v2) | Native Python speed | Rust-powered core — ~5-50x faster than Pydantic v1 |

🎯 Key Takeaways

  • Pydantic turns Python type hints from passive documentation into active runtime enforcement — the same syntax, a completely different guarantee.
  • Type coercion is a feature, not a bug: '42' becomes 42, but 'forty-two' raises a clear ValidationError — Pydantic meets real-world string data halfway.
  • Nest models inside models freely — Pydantic validates each layer recursively and pinpoints exactly which nested field failed, saving hours of debugging.
  • Use BaseSettings (from pydantic-settings) to validate environment variables at startup — apps that fail fast with a clear config error are infinitely easier to operate than apps that crash mysteriously at runtime.

⚠ Common Mistakes to Avoid

  • Mistake 1: Forgetting to return the value from @field_validator — If you validate but don't return the (possibly transformed) value, Pydantic silently sets the field to None even though no error was raised. Always end every @field_validator with return value. Run a quick test: print the field after instantiation to confirm it's not None.
  • Mistake 2: Relying on mutable-default intuition from plain Python — In ordinary classes and function signatures, tags: List[str] = [] means everyone shares one list object. Pydantic actually protects you here: it deep-copies mutable defaults for each model instance. Even so, prefer tags: List[str] = Field(default_factory=list) — it states the intent explicitly and is the only correct option for dynamic defaults such as default_factory=datetime.utcnow, where each instance needs a freshly computed value.
  • Mistake 3: Confusing Pydantic v1 and v2 API — Pydantic v2 (released 2023) changed many APIs: .dict() became .model_dump(), .json() became .model_dump_json(), @validator became @field_validator, and @root_validator became @model_validator. If you're copying code from older tutorials or Stack Overflow answers, check which version is installed with python -c 'import pydantic; print(pydantic.VERSION)' before debugging mysterious import errors.

Interview Questions on This Topic

  • Q: What is the difference between Pydantic's @field_validator and @model_validator, and when would you choose one over the other?
  • Q: How does Pydantic handle data that doesn't exactly match the declared type — for example, a string passed to an integer field? What is coercion and when can it cause problems?
  • Q: If you had a FastAPI endpoint that accepts a nested JSON body with 15 fields across 4 nested objects, how would you structure your Pydantic models and why would you avoid putting everything in a single flat model?

Frequently Asked Questions

What is Pydantic used for in Python?

Pydantic is primarily used for data validation and settings management. You define the shape and rules of your data as a Python class, and Pydantic automatically validates, coerces, and serializes that data at runtime. It's most commonly used in FastAPI applications, configuration management, and anywhere external data (API responses, user input, environment variables) needs to be validated before it reaches business logic.

Is Pydantic only for FastAPI?

Not at all — Pydantic is a standalone library that works in any Python project. FastAPI made it famous because it uses Pydantic for all request/response handling, but you can use Pydantic in CLI tools, data pipelines, background workers, Django apps, and configuration management. Anywhere you receive data from an external source is a valid use case.

What's the difference between Pydantic v1 and Pydantic v2?

Pydantic v2 rewrote the validation core in Rust, making it roughly 5-50x faster than v1. The API also changed significantly: .dict() is now .model_dump(), .json() is now .model_dump_json(), @validator is now @field_validator, and @root_validator is now @model_validator. If you're following a tutorial from before mid-2023, assume it's using v1 syntax and check the Pydantic migration guide before adapting examples.

TheCodeForge Editorial Team Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.
