Pydantic for Data Validation in Python — A Real-World Guide
Every production Python application eats data it didn't create — JSON from an API, environment variables, user form submissions, database records. Any one of those can arrive malformed, missing a field, or with the wrong type. When that bad data reaches your business logic, the errors that surface are cryptic, hard to trace, and expensive to debug in production. Data validation isn't optional; it's the front door of a reliable system.
Before Pydantic, the typical approach was a tangle of if statements, isinstance() checks, and manual type coercion spread across dozens of functions. It worked, but it was brittle, verbose, and impossible to maintain at scale. Pydantic solves this by letting you declare your data's shape as a plain Python class — using type hints you're already writing — and then automatically validates, coerces, and documents that shape for you. One model definition does the work of fifty hand-rolled checks.
By the end of this article you'll know how to build Pydantic models that validate real API payloads, write custom validators for business rules that go beyond basic types, handle nested data structures confidently, and avoid the three mistakes that catch most intermediate developers off guard. You'll also understand why FastAPI, one of the most popular Python web frameworks of recent years, bets its entire request/response pipeline on Pydantic.
Your First Pydantic Model — Why Type Hints Alone Aren't Enough
Python's type hints are annotations — they're hints to your IDE and to tools like mypy, but the Python runtime itself ignores them completely. Write age: int in a plain dataclass and pass the string 'thirty' — Python won't complain. That silence is dangerous when data is coming from the outside world.
Pydantic changes the contract. When you inherit from BaseModel, those same type hints become enforced rules. Pydantic reads them at class-creation time and builds a validator for every field. At instantiation, every value is checked and coerced if possible — or rejected with a clear, structured error if it isn't.
The magic keyword here is coercion. If a field is typed as int and you pass '42', Pydantic doesn't refuse — it converts '42' to 42 for you. That's deliberate: real-world data sources like JSON, query strings, and environment variables are always strings first. Pydantic meets the real world halfway. But if you pass 'forty-two', it can't coerce that — and it tells you exactly which field failed, what value it received, and what type it expected.
```python
from pydantic import BaseModel, EmailStr, ValidationError

# Install pydantic with: pip install "pydantic[email]"
# The [email] extra adds email validation support via the 'email-validator' package

class UserRegistration(BaseModel):
    username: str           # Must be a string — note Pydantic v2 will NOT silently turn ints into strings
    email: EmailStr         # Must be a valid email address format, not just any string
    age: int                # Pydantic will coerce '25' -> 25, but fail on 'twenty-five'
    is_active: bool = True  # Optional field with a default value

# --- Happy path: valid data coming in from a registration form ---
new_user = UserRegistration(
    username="ada_lovelace",
    email="ada@babbage.co.uk",
    age="28",  # Passed as a string (as it would be from a web form) — Pydantic coerces it
    is_active=True
)

print("=== Successful Registration ===")
print(f"Username : {new_user.username}")
print(f"Email : {new_user.email}")
print(f"Age : {new_user.age} (type: {type(new_user.age).__name__})")  # Confirm it's now an int
print(f"Active : {new_user.is_active}")
print(f"As dict : {new_user.model_dump()}")  # model_dump() serializes the model to a plain dict
print()

# --- Unhappy path: bad data — what happens in production when input isn't sanitized ---
print("=== Failed Registration (bad data) ===")
try:
    bad_user = UserRegistration(
        username="ghost_user",
        email="not-an-email",  # Fails EmailStr validation
        age="twenty-five",     # Cannot be coerced to int
    )
except ValidationError as validation_error:
    # Pydantic gives us a structured error — we can log it, return it as JSON, or display it
    print(f"Caught {validation_error.error_count()} validation error(s):")
    for error in validation_error.errors():
        print(f"  Field : {' -> '.join(str(loc) for loc in error['loc'])}")
        print(f"  Issue : {error['msg']}")
        print(f"  Input : {error['input']}")
        print()
```
=== Successful Registration ===
Username : ada_lovelace
Email : ada@babbage.co.uk
Age : 28 (type: int)
Active : True
As dict : {'username': 'ada_lovelace', 'email': 'ada@babbage.co.uk', 'age': 28, 'is_active': True}
=== Failed Registration (bad data) ===
Caught 2 validation error(s):
Field : email
Issue : value is not a valid email address: An email address must have an @-sign.
Input : not-an-email
Field : age
Issue : Input should be a valid integer, unable to parse string as an integer
Input : twenty-five
Custom Validators and Field Constraints — Encoding Business Rules
Type checking gets you 70% of the way. But real business rules are more nuanced: a username can't contain spaces, a price can't be negative, a start date must be before an end date. These rules live in your domain — Pydantic can't guess them. That's where Field() constraints and the @field_validator decorator come in.
Field() lets you attach constraints directly to a type declaration — things like minimum/maximum values, string length limits, and regex patterns. This is the right tool for single-field rules that are purely about the value's shape. They're also automatically reflected in the JSON Schema Pydantic generates, which means FastAPI's auto-docs (Swagger UI) will show these constraints to API consumers for free.
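To see that schema generation in action, here's a minimal sketch — the `Price` model is a throwaway example for illustration, not part of this article's code:

```python
from pydantic import BaseModel, Field

class Price(BaseModel):
    amount: float = Field(gt=0, le=99999.99, description="Price in US dollars")

# gt/le become exclusiveMinimum/maximum in the generated JSON Schema —
# this is the same schema FastAPI surfaces in Swagger UI
schema = Price.model_json_schema()
print(schema["properties"]["amount"])
```

The printed property entry carries the `description`, `exclusiveMinimum`, and `maximum` keys, so API consumers see your constraints without you writing a line of documentation.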
For rules that involve logic — especially rules that span multiple fields — @field_validator and @model_validator are your tools. A @field_validator runs after the type has been coerced, so you're always working with a Python int, not the raw string '42'. A @model_validator with mode='after' fires after the entire model is populated, giving you access to all fields at once — though only if every individual field passed its own validation first; if any field fails, the model-level check is skipped. That's how you implement 'end date must be after start date' without hacks.
```python
from datetime import date
from typing import Optional

from pydantic import BaseModel, Field, ValidationError, field_validator, model_validator

class ProductListing(BaseModel):
    # Field() constraints are checked BEFORE custom validators run
    name: str = Field(
        min_length=2,
        max_length=100,
        description="Product display name"
    )
    price_usd: float = Field(
        gt=0,         # gt = greater than (strictly). Use ge= for greater-than-or-equal
        le=99999.99,  # le = less than or equal
        description="Price in US dollars"
    )
    discount_pct: Optional[float] = Field(
        default=0.0,
        ge=0.0,   # Cannot be negative
        lt=100.0  # Cannot be 100% or more (would be free or negative price)
    )
    available_from: date
    available_until: Optional[date] = None
    sku: str = Field(
        pattern=r'^[A-Z]{2,4}-\d{4,8}$',  # e.g. ELEC-00142 or TV-9982100
        description="Stock Keeping Unit code"
    )

    # @field_validator runs AFTER the field is already its correct Python type
    @field_validator('name')
    @classmethod
    def name_must_not_be_generic(cls, product_name: str) -> str:
        # Strip whitespace first — a common real-world data cleaning step
        cleaned_name = product_name.strip()
        generic_names = {'product', 'item', 'thing', 'unnamed', 'test'}
        if cleaned_name.lower() in generic_names:
            raise ValueError(f"Product name '{cleaned_name}' is too generic. Use a specific name.")
        return cleaned_name  # Always return the (possibly cleaned) value

    # @model_validator with mode='after' has access to the fully-built model instance
    @model_validator(mode='after')
    def availability_window_must_be_valid(self) -> 'ProductListing':
        if self.available_until is not None:
            if self.available_until <= self.available_from:
                raise ValueError(
                    f"available_until ({self.available_until}) must be "
                    f"after available_from ({self.available_from})"
                )
        return self  # Always return self from a model_validator

# --- Test 1: Valid product ---
laptop = ProductListing(
    name="  ThinkPad X1 Carbon  ",  # Leading/trailing spaces — our validator cleans this
    price_usd=1299.99,
    discount_pct=15.0,
    available_from=date(2024, 1, 1),
    available_until=date(2024, 12, 31),
    sku="COMP-00891"
)
print(f"Product name (cleaned): '{laptop.name}'")
print(f"Price after discount: ${laptop.price_usd * (1 - laptop.discount_pct / 100):.2f}")
print()

# --- Test 2: Invalid product — multiple errors at once ---
try:
    bad_product = ProductListing(
        name="product",                    # Fails our custom validator
        price_usd=-50.0,                   # Fails gt=0 constraint
        available_from=date(2024, 6, 1),
        available_until=date(2024, 1, 1),  # Would fail the model validator — but see the note below
        sku="invalid-sku-format"           # Fails regex pattern
    )
except ValidationError as e:
    print(f"Found {e.error_count()} errors:")
    for err in e.errors():
        field = ' -> '.join(str(loc) for loc in err['loc'])
        print(f"  [{field}] {err['msg']}")
```

Product name (cleaned): 'ThinkPad X1 Carbon'
Price after discount: $1104.99

Found 3 errors:
  [name] Value error, Product name 'product' is too generic. Use a specific name.
  [price_usd] Input should be greater than 0
  [sku] String should match pattern '^[A-Z]{2,4}-\d{4,8}$'

Notice the date-window error is absent even though available_until comes before available_from: because three fields failed their own validation, the mode='after' model validator never ran. Fix the field-level errors and the cross-field check fires on the next attempt.
Nested Models and Real API Payloads — Where Pydantic Gets Powerful
Real-world data is almost never flat. An order has a customer, a customer has an address, an address has a country. When you nest Pydantic models inside each other, each layer validates independently — so an error in a deeply nested field tells you exactly where the problem is, not just 'something in the payload was wrong'.
You can nest models by declaring a field's type as another BaseModel subclass. Pydantic handles the recursive validation automatically. You can also use List[SomeModel] or Dict[str, SomeModel] for collections of validated objects — all the standard Python typing constructs work as you'd expect.
This is exactly how FastAPI uses Pydantic under the hood: every incoming request body is a nested model graph. The framework deserializes the raw JSON into Python objects, runs the full validation tree, and hands your route function a fully-typed, already-validated object. No manual parsing, no defensive dict.get() calls, no hidden None values where you expected a string.
```python
from datetime import datetime
from enum import Enum
from typing import List

from pydantic import BaseModel, Field, field_validator

class OrderStatus(str, Enum):
    # Inheriting from str makes JSON serialization work seamlessly
    PENDING = "pending"
    CONFIRMED = "confirmed"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"

class ShippingAddress(BaseModel):
    street_line_1: str = Field(min_length=5)
    street_line_2: str = ""
    city: str
    postcode: str
    country_code: str = Field(pattern=r'^[A-Z]{2}$')  # ISO 3166-1 alpha-2, e.g. 'US', 'GB'

class OrderItem(BaseModel):
    product_sku: str
    product_name: str
    quantity: int = Field(ge=1)  # Must order at least 1
    unit_price_usd: float = Field(gt=0)

    @property
    def line_total(self) -> float:
        # Computed property — not a stored field, so Pydantic doesn't validate it,
        # but it's always consistent with the validated data
        return round(self.quantity * self.unit_price_usd, 2)

class CustomerOrder(BaseModel):
    order_id: str
    customer_email: str
    status: OrderStatus = OrderStatus.PENDING     # Enum default
    shipping_address: ShippingAddress             # Nested model — validated recursively
    items: List[OrderItem] = Field(min_length=1)  # Order must have at least one item
    created_at: datetime = Field(default_factory=datetime.utcnow)  # Auto-set on creation

    @field_validator('order_id')
    @classmethod
    def order_id_must_be_uppercase(cls, raw_id: str) -> str:
        return raw_id.upper()  # Normalise — so 'ord-1234' and 'ORD-1234' are treated the same

    @property
    def order_total(self) -> float:
        return round(sum(item.line_total for item in self.items), 2)

# Simulating a JSON payload arriving from an API client (e.g. a mobile app checkout).
# In real life this would be: CustomerOrder(**request.json()) — or FastAPI does it automatically
raw_payload = {
    "order_id": "ord-20240815-9921",
    "customer_email": "grace@hopper.dev",
    "shipping_address": {
        "street_line_1": "123 Compiler Street",
        "city": "Arlington",
        "postcode": "22201",
        "country_code": "US"
    },
    "items": [
        {"product_sku": "BOOK-0042", "product_name": "The Art of Computer Programming",
         "quantity": 2, "unit_price_usd": 79.99},
        {"product_sku": "BOOK-0187", "product_name": "Clean Code",
         "quantity": 1, "unit_price_usd": 34.50}
    ]
}

order = CustomerOrder(**raw_payload)
print(f"Order ID : {order.order_id}")  # Note: auto-uppercased
print(f"Customer : {order.customer_email}")
print(f"Status : {order.status.value}")
print(f"Ship to : {order.shipping_address.city}, {order.shipping_address.country_code}")
print(f"Items : {len(order.items)}")
for item in order.items:
    print(f" - {item.product_name} x{item.quantity} @ ${item.unit_price_usd} = ${item.line_total}")
print(f"Order Total : ${order.order_total}")
print()

# Serialise the whole nested structure to a dict (ready to store in a database or return as JSON)
order_dict = order.model_dump()
print("Serialised keys at top level:", list(order_dict.keys()))
print("Nested address keys:", list(order_dict['shipping_address'].keys()))
```
Order ID : ORD-20240815-9921
Customer : grace@hopper.dev
Status : pending
Ship to : Arlington, US
Items : 2
- The Art of Computer Programming x2 @ $79.99 = $159.98
- Clean Code x1 @ $34.5 = $34.5
Order Total : $194.48
Serialised keys at top level: ['order_id', 'customer_email', 'status', 'shipping_address', 'items', 'created_at']
Nested address keys: ['street_line_1', 'street_line_2', 'city', 'postcode', 'country_code']
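To make the 'pinpointed errors' claim concrete, here's a sketch with deliberately bad nested data — `Address` and `Order` are small stand-in models for this demo, not the full `CustomerOrder` above:

```python
from pydantic import BaseModel, Field, ValidationError

class Address(BaseModel):
    city: str
    country_code: str = Field(pattern=r'^[A-Z]{2}$')

class Order(BaseModel):
    order_id: str
    shipping_address: Address  # Nested model — validated recursively

try:
    # 'usa' is lowercase and three letters — it fails the nested pattern constraint
    Order(order_id="A1", shipping_address={"city": "Arlington", "country_code": "usa"})
except ValidationError as exc:
    for err in exc.errors():
        # The error location is a path through the nested structure
        print(" -> ".join(str(loc) for loc in err["loc"]))
# prints: shipping_address -> country_code
```

The error's `loc` is a full path into the payload, so a client (or a log line) can point at the exact nested field rather than the whole request body.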
Pydantic Settings — Validating Config and Environment Variables
BaseModel is for validating data that flows through your application. BaseSettings — from the pydantic-settings package — is for validating the environment your application runs in. The distinction matters: your app's config (database URL, API keys, port number, debug flags) is just data that happens to come from environment variables and .env files instead of JSON.
Without BaseSettings, it's common to scatter os.environ.get('DATABASE_URL', 'localhost') calls through your codebase. The problem: you only find out a required variable is missing when the code path that uses it runs — possibly hours or days into a production deployment. BaseSettings validates all of your config at startup. If DATABASE_URL is missing, your app refuses to start and tells you exactly which variable is missing before it does any damage.
This pattern — fail fast with a clear message rather than fail slowly with a cryptic error — is one of the most valuable architectural habits you can build. BaseSettings makes it trivially easy.
```python
# Install: pip install pydantic-settings
# Create a .env file in the same directory with the content shown below before running

from typing import List

from pydantic import Field, field_validator
from pydantic_settings import BaseSettings, SettingsConfigDict

# --- Contents of your .env file (do NOT commit this to git) ---
# DATABASE_URL=postgresql://user:pass@localhost:5432/myapp
# API_SECRET_KEY=super-secret-key-change-this-in-prod
# DEBUG_MODE=false
# ALLOWED_ORIGINS=["http://localhost:3000","https://myapp.com"]
# MAX_CONNECTIONS=10

class AppSettings(BaseSettings):
    # SettingsConfigDict tells Pydantic WHERE to look for values
    model_config = SettingsConfigDict(
        env_file='.env',            # Read from .env file
        env_file_encoding='utf-8',
        case_sensitive=False,       # DATABASE_URL and database_url are the same
        extra='ignore'              # Don't crash if .env has extra keys we don't declare
    )

    # Required fields — no default means the app REFUSES to start if these are missing
    database_url: str = Field(description="Full PostgreSQL connection string")
    api_secret_key: str = Field(min_length=16, description="JWT signing secret")

    # Optional fields with sensible defaults
    debug_mode: bool = False
    app_port: int = Field(default=8000, ge=1024, le=65535)
    max_connections: int = Field(default=5, ge=1, le=100)

    # A list in an env var — pydantic-settings parses complex types (List, Dict)
    # from the environment as JSON, which is why the .env example above uses JSON syntax
    allowed_origins: List[str] = Field(default=["http://localhost:3000"])

    @field_validator('api_secret_key')
    @classmethod
    def secret_key_must_not_be_default(cls, key_value: str) -> str:
        insecure_defaults = {'secret', 'password', 'changeme', 'default', 'test'}
        if key_value.lower() in insecure_defaults:
            raise ValueError(
                "API_SECRET_KEY is set to an insecure default value. "
                "Set a strong random secret before deploying."
            )
        return key_value

# This is the pattern: load settings ONCE at module level.
# Import `settings` from this module everywhere else — never call AppSettings() twice.
try:
    settings = AppSettings()
    print("=== Application Config Loaded Successfully ===")
    print(f"Database : {settings.database_url[:30]}...")  # Don't log full credentials
    print(f"Port : {settings.app_port}")
    print(f"Debug Mode : {settings.debug_mode}")
    print(f"Max Conns : {settings.max_connections}")
    print(f"CORS Origins : {settings.allowed_origins}")
except Exception as config_error:
    # In a real app you'd use logging.critical() here and sys.exit(1)
    print(f"FATAL: Could not load application config — {config_error}")
    print("Application startup aborted.")
```
=== Application Config Loaded Successfully ===
Database : postgresql://user:pass@localho...
Port : 8000
Debug Mode : False
Max Conns : 10
CORS Origins : ['http://localhost:3000', 'https://myapp.com']
| Aspect | Plain Python Dataclasses | Pydantic BaseModel |
|---|---|---|
| Runtime type validation | None — type hints are ignored | Full validation on every instantiation |
| Type coercion (str -> int) | Not performed | Automatic where safe (e.g. '42' -> 42) |
| Error messages | Python's default TypeError (often cryptic) | Structured errors with field name, input, and reason |
| Nested model validation | Manual — you write the logic yourself | Automatic and recursive out of the box |
| JSON serialization | Requires manual dict() or asdict() | Built-in .model_dump() and .model_dump_json() |
| JSON Schema generation | Not supported | Built-in .model_json_schema() |
| Custom validators | Not a built-in concept | @field_validator and @model_validator decorators |
| Default factories | field(default_factory=...) supported | Field(default_factory=...) supported |
| Environment variable parsing | Not supported | BaseSettings from pydantic-settings package |
| Performance (Pydantic v2) | Native Python speed | Rust-powered core — ~5-50x faster than Pydantic v1 |
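The first two rows of the comparison, demonstrated side by side — `DCUser` and `PUser` are throwaway names for illustration:

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError

@dataclass
class DCUser:
    age: int  # A hint only — the dataclass never checks it

class PUser(BaseModel):
    age: int  # An enforced rule — checked on every instantiation

print(DCUser(age="thirty").age)  # 'thirty' — the dataclass accepts it silently
try:
    PUser(age="thirty")
except ValidationError as e:
    print(e.error_count())  # 1 — Pydantic rejects it at instantiation
```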
🎯 Key Takeaways
- Pydantic turns Python type hints from passive documentation into active runtime enforcement — the same syntax, a completely different guarantee.
- Type coercion is a feature, not a bug: '42' becomes 42, but 'forty-two' raises a clear ValidationError — Pydantic meets real-world string data halfway.
- Nest models inside models freely — Pydantic validates each layer recursively and pinpoints exactly which nested field failed, saving hours of debugging.
- Use BaseSettings (from pydantic-settings) to validate environment variables at startup — apps that fail fast with a clear config error are infinitely easier to operate than apps that crash mysteriously at runtime.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Forgetting to return the value from @field_validator — If you validate but don't return the (possibly transformed) value, Pydantic silently sets the field to None even though no error was raised. Always end every @field_validator with `return value`. Run a quick test: print the field after instantiation to confirm it's not None.
- ✕ Mistake 2: Reaching for mutable defaults out of habit — Writing `tags: List[str] = []` at the class level is a classic shared-state gotcha in plain Python classes. Pydantic actually protects you here: it deep-copies the default for each new model instance, so the list isn't shared. Even so, `tags: List[str] = Field(default_factory=list)` is the clearer spelling — it states your intent explicitly and also works in dataclasses, where a bare mutable default is rejected outright.
- ✕ Mistake 3: Confusing Pydantic v1 and v2 API — Pydantic v2 (released 2023) changed many APIs: `.dict()` became `.model_dump()`, `.json()` became `.model_dump_json()`, `@validator` became `@field_validator`, and `@root_validator` became `@model_validator`. If you're copying code from older tutorials or Stack Overflow answers, check which version is installed with `python -c 'import pydantic; print(pydantic.VERSION)'` before debugging mysterious import errors.
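Mistake 1 is easy to reproduce. This deliberately broken model — a hypothetical example, not from the article's code — shows the silent None:

```python
from pydantic import BaseModel, field_validator

class Broken(BaseModel):
    name: str

    @field_validator('name')
    @classmethod
    def clean(cls, v: str):
        v.strip()  # BUG: result discarded, and no return — the validator implicitly returns None

print(Broken(name="ada").name)  # None — no error raised, the field was silently overwritten
```

Because validator return values replace the field and are not re-validated by default, the `str` annotation doesn't save you here.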
Interview Questions on This Topic
- Q: What is the difference between Pydantic's @field_validator and @model_validator, and when would you choose one over the other?
- Q: How does Pydantic handle data that doesn't exactly match the declared type — for example, a string passed to an integer field? What is coercion and when can it cause problems?
- Q: If you had a FastAPI endpoint that accepts a nested JSON body with 15 fields across 4 nested objects, how would you structure your Pydantic models, and why would you avoid putting everything in a single flat model?
Frequently Asked Questions
What is Pydantic used for in Python?
Pydantic is primarily used for data validation and settings management. You define the shape and rules of your data as a Python class, and Pydantic automatically validates, coerces, and serializes that data at runtime. It's most commonly used in FastAPI applications, configuration management, and anywhere external data (API responses, user input, environment variables) needs to be validated before it reaches business logic.
Is Pydantic only for FastAPI?
Not at all — Pydantic is a standalone library that works in any Python project. FastAPI made it famous because it uses Pydantic for all request/response handling, but you can use Pydantic in CLI tools, data pipelines, background workers, Django apps, and configuration management. Anywhere you receive data from an external source is a valid use case.
What's the difference between Pydantic v1 and Pydantic v2?
Pydantic v2 rewrote the validation core in Rust, making it roughly 5-50x faster than v1. The API also changed significantly: .dict() is now .model_dump(), .json() is now .model_dump_json(), @validator is now @field_validator, and @root_validator is now @model_validator. If you're following a tutorial from before mid-2023, assume it's using v1 syntax and check the Pydantic migration guide before adapting examples.
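The renames in practice, on a minimal model (`M` is just a placeholder name):

```python
from pydantic import BaseModel

class M(BaseModel):
    x: int

m = M(x=1)
print(m.model_dump())       # v2 spelling — replaces the v1 m.dict()
print(m.model_dump_json())  # v2 spelling — replaces the v1 m.json()
```

The v1 methods still exist in v2 as deprecated shims that emit warnings, which is why old tutorial code often "works" while flooding your logs.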
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.