DynamoDB Basics Explained: Tables, Keys, and Real-World Patterns
Every app that grows long enough eventually hits a wall with its database. A single relational server starts to strain under hundreds of millions of rows, joins get expensive, and scaling horizontally becomes an engineering nightmare. Amazon DynamoDB was built to get past that wall. It builds on the design of Dynamo, the key-value store Amazon originally created for its shopping cart, and AWS has reported DynamoDB peaks of tens of millions of requests per second during Prime Day. If you're building anything at scale on AWS, DynamoDB will cross your path sooner or later.
The core problem DynamoDB solves is predictable performance at any scale. Relational databases tend to slow down as data grows: unindexed queries scan more rows, indexes get deeper, and joins touch more data. DynamoDB sidesteps this by requiring you to declare your access patterns upfront. You design your keys so that each query targets a single partition key value. In practice, that means building the fast path before traffic arrives, not scrambling to optimize after the fact.
By the end of this article you'll understand how DynamoDB's partition and sort keys actually work under the hood, how to model a real e-commerce order system without a single SQL join, why single-table design exists and when it's overkill, and three mistakes that will silently hurt your DynamoDB application in production.
Partition Keys and Sort Keys: The Engine Under the Hood
Every DynamoDB table has a primary key. That key comes in two flavors: a simple primary key (just a partition key) or a composite primary key (partition key + sort key). Understanding the difference isn't just syntax trivia — it determines what queries you can run efficiently.
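DynamoDB hashes the partition key internally to decide which partition stores your item. A partition is a unit of storage that DynamoDB replicates across multiple Availability Zones. This is why the partition key is sometimes called a hash key. Every read and write for that item goes to that one partition. The golden rule: items with the same partition key live in the same partition. That's powerful because related data is co-located. It also creates a risk. If one partition key receives 90% of your traffic, you have a 'hot partition' problem. A single partition can serve at most 3,000 read capacity units and 1,000 write capacity units per second, so requests beyond that are throttled even when the table as a whole has spare capacity.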
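To make the hashing idea concrete, here is a toy model in plain Python. DynamoDB's real hash function and partition count are internal and not exposed. This sketch assumes MD5 modulo a fixed number of partitions as a stand-in, and the function names are hypothetical. The model shows that every request for one hot key lands on a single partition. It also shows that the same traffic spread over suffixed keys (the write-sharding fix described under Common Mistakes) reaches several partitions.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumption: real partition counts are managed by DynamoDB


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Toy stand-in for DynamoDB's internal hash: MD5 of the key, modulo N."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_load(request_keys):
    """Count how many requests each partition would receive."""
    return Counter(partition_for(key) for key in request_keys)


# 1,000 requests that all target the same partition key: one hot partition
hot_load = partition_load(["CUST-42"] * 1000)

# The same 1,000 requests spread over 20 suffixed keys (write sharding)
sharded_load = partition_load([f"CUST-42#{i % 20}" for i in range(1000)])

print("hot key:", dict(hot_load))
print("sharded:", dict(sharded_load))
```

The hash makes placement deterministic: the same key always maps to the same partition. That is exactly why a single popular key cannot spread itself out.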
The sort key (also called a range key) doesn't affect which partition stores the item, but it determines the order of items within that partition. This lets you query a range — 'give me all orders for customer-42 placed in the last 30 days' — in a single, efficient request. That range query is only possible because the sort key values are stored in sorted order on disk within each partition. Think of the partition key as the drawer label in a filing cabinet, and the sort key as the alphabetical tabs inside that drawer.
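The filing-cabinet picture can be sketched as an in-memory model. One partition key's items are kept in a list sorted by sort key, and a range query uses binary search. This only illustrates why sorted storage makes range reads cheap. It is not how DynamoDB is implemented internally, and the `ItemCollection` class is hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class ItemCollection:
    """Hypothetical model: all items for ONE partition key, kept sorted by sort key."""
    sort_keys: list = field(default_factory=list)
    items: list = field(default_factory=list)

    def put(self, sort_key: str, item: dict) -> None:
        # Insert where sort_keys stays ordered; an equal key overwrites (like put_item)
        i = bisect.bisect_left(self.sort_keys, sort_key)
        if i < len(self.sort_keys) and self.sort_keys[i] == sort_key:
            self.items[i] = item
        else:
            self.sort_keys.insert(i, sort_key)
            self.items.insert(i, item)

    def between(self, low: str, high: str) -> list:
        # Binary search finds both ends (inclusive), so only matching items are touched
        start = bisect.bisect_left(self.sort_keys, low)
        end = bisect.bisect_right(self.sort_keys, high)
        return self.items[start:end]


orders = ItemCollection()
orders.put("2024-06-10#ORD-003", {"id": "ORD-003"})
orders.put("2024-01-15#ORD-001", {"id": "ORD-001"})
orders.put("2024-03-22#ORD-002", {"id": "ORD-002"})

recent = orders.between("2024-03-01", "2024-06-30")
print([item["id"] for item in recent])  # ['ORD-002', 'ORD-003']
```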
```python
import boto3

# Create a DynamoDB resource pointing at a local DynamoDB instance for dev.
# In production, remove 'endpoint_url' and use your AWS region.
dynamodb = boto3.resource(
    'dynamodb',
    region_name='us-east-1',
    endpoint_url='http://localhost:8000'  # local DynamoDB for safe testing
)

# Create the Orders table with a COMPOSITE primary key:
#   partition key = customer_id          (groups all orders for one customer together)
#   sort key      = order_date_order_id  (values like "2024-01-15#ORD-001" allow range queries by date)
table = dynamodb.create_table(
    TableName='Orders',
    KeySchema=[
        {'AttributeName': 'customer_id', 'KeyType': 'HASH'},          # partition key
        {'AttributeName': 'order_date_order_id', 'KeyType': 'RANGE'}  # sort key (composite value)
    ],
    AttributeDefinitions=[
        # Only KEY attributes go here; DynamoDB is schemaless for everything else
        {'AttributeName': 'customer_id', 'AttributeType': 'S'},  # S = String
        {'AttributeName': 'order_date_order_id', 'AttributeType': 'S'}
    ],
    BillingMode='PAY_PER_REQUEST'  # on-demand pricing: no capacity planning needed for dev
)

# Wait until the table is ACTIVE before writing to it
table.wait_until_exists()
table.reload()  # refresh cached attributes; the create_table response may still say CREATING

print(f"Table status: {table.table_status}")
print(f"Table name: {table.table_name}")
```
Table status: ACTIVE
Table name: Orders
Writing and Reading Data — With a Real E-Commerce Pattern
Now that the table exists, let's populate it with real data and then query it the way a production app would. The key insight here is that DynamoDB offers two fundamentally different read operations:

- GetItem fetches one item by its exact primary key. It is always fast, effectively O(1).
- Query fetches all items sharing a partition key, optionally narrowed by a sort key condition. It is still fast because it stays within that one partition key's items.

There's also Scan, which reads every item in the table. Treat it as a last resort in production.
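A rough way to see why these operations differ in cost is to count how many items each one has to touch. The sketch below models a table as a dict mapping partition key to a dict of sort key to item. The `table_model` layout and the helper names are hypothetical. Note that real DynamoDB bills by bytes read, not by item count.

```python
# Hypothetical in-memory layout: partition key -> {sort key -> item}.
# 100 customers with 3 orders each = 300 items in total.
table_model = {
    f"CUST-{c}": {
        f"2024-0{m}-01#ORD-{c}-{m}": {"customer": c, "month": m}
        for m in range(1, 4)
    }
    for c in range(1, 101)
}


def get_item(pk, sk):
    """GetItem model: exact primary key, so exactly one item is touched."""
    return table_model.get(pk, {}).get(sk), 1


def query(pk):
    """Query model: only the items under one partition key are examined."""
    collection = table_model.get(pk, {})
    return list(collection.values()), len(collection)


def scan(predicate):
    """Scan model: every item in every partition is read, then filtered."""
    examined, matches = 0, []
    for collection in table_model.values():
        for item in collection.values():
            examined += 1
            if predicate(item):
                matches.append(item)
    return matches, examined


_, get_cost = get_item("CUST-42", "2024-02-01#ORD-42-2")
_, query_cost = query("CUST-42")
_, scan_cost = scan(lambda item: item["customer"] == 42)
print(get_cost, query_cost, scan_cost)  # 1 3 300
```

All three calls answer questions about the same customer. Only the Scan pays for the whole table to do it.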
For our e-commerce example, the pattern is: partition key = customer_id, sort key = a string that combines the order date and the order ID, separated by a # symbol. The date-first layout is deliberate. DynamoDB orders string sort keys lexicographically (comparing their UTF-8 bytes), and under that ordering zero-padded ISO dates (YYYY-MM-DD) sort chronologically. So 'give me all orders on or after 2024-03-01' becomes a simple sort key condition without scanning the whole table. That condition can be a >= comparison, a between range, or begins_with('2024-03') for a single month.
This is the core discipline of DynamoDB design: your sort key isn't just an ID, it's a query tool. Every character in it is a deliberate choice about what queries you want to be able to answer efficiently.
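Here is a small sketch that builds these sort keys and checks how they compare as plain strings. DynamoDB compares string sort keys byte by byte in the same way, and for ASCII keys Python's string comparison matches that order. The `make_order_sort_key` helper is hypothetical. Note the last comparison: the key for an order placed on 2024-12-31 sorts after the bare string '2024-12-31'. An upper bound like between('2024-03-01', '2024-12-31') would therefore silently drop that day's orders.

```python
from datetime import date


def make_order_sort_key(order_date: date, order_id: str) -> str:
    """Hypothetical helper: ISO date first, then '#', then the order ID."""
    return f"{order_date.isoformat()}#{order_id}"


keys = [
    make_order_sort_key(date(2024, 11, 5), "ORD-010"),
    make_order_sort_key(date(2024, 3, 22), "ORD-002"),
    make_order_sort_key(date(2024, 12, 31), "ORD-011"),
    make_order_sort_key(date(2024, 1, 15), "ORD-001"),
]

# Zero-padded ISO dates make lexicographic order equal chronological order
print(sorted(keys))

# 'On or after March 1' is a simple >= comparison on the string
print([k for k in keys if k >= "2024-03-01"])

# Pitfall: '2024-12-31#ORD-011' > '2024-12-31', so an inclusive upper bound of
# '2024-12-31' excludes it, while an exclusive bound (< '2025-01-01') keeps it
last_day = make_order_sort_key(date(2024, 12, 31), "ORD-011")
print(last_day <= "2024-12-31", last_day < "2025-01-01")  # False True
```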
```python
import boto3
from boto3.dynamodb.conditions import Key
from decimal import Decimal  # DynamoDB uses Decimal, not float, for numbers

dynamodb = boto3.resource(
    'dynamodb',
    region_name='us-east-1',
    endpoint_url='http://localhost:8000'
)
table = dynamodb.Table('Orders')

# ─── WRITE: Put three orders for the same customer ─────────────────────────
# Sort key format: "YYYY-MM-DD#ORDER-ID" (date first so range queries work)
orders_to_insert = [
    {
        'customer_id': 'CUST-42',
        'order_date_order_id': '2024-01-15#ORD-001',
        'total_usd': Decimal('59.99'),
        'status': 'DELIVERED',
        'items': ['Mechanical Keyboard', 'USB Hub']
    },
    {
        'customer_id': 'CUST-42',
        'order_date_order_id': '2024-03-22#ORD-002',
        'total_usd': Decimal('149.00'),
        'status': 'SHIPPED',
        'items': ['Monitor Stand']
    },
    {
        'customer_id': 'CUST-42',
        'order_date_order_id': '2024-06-10#ORD-003',
        'total_usd': Decimal('29.99'),
        'status': 'PROCESSING',
        'items': ['Desk Mat']
    }
]

for order in orders_to_insert:
    table.put_item(Item=order)  # put_item: create or fully overwrite an item
print("Inserted 3 orders for CUST-42")

# ─── READ: Get ONE specific order by its full primary key ──────────────────
response = table.get_item(
    Key={
        'customer_id': 'CUST-42',
        'order_date_order_id': '2024-03-22#ORD-002'
    }
)
specific_order = response['Item']  # 'Item' is absent if no item has this key
# DynamoDB trims trailing zeros from numbers (149.00 comes back as 149), so format explicitly
print(f"\nSpecific order: {specific_order['order_date_order_id']} | ${specific_order['total_usd']:.2f}")

# ─── QUERY: Get all orders for CUST-42 placed on or after 2024-03-01 ───────
# A condition on the sort key stays inside one partition, so this is efficient.
# gte() is used instead of between('2024-03-01', '2024-12-31') because a key like
# '2024-12-31#ORD-...' sorts AFTER '2024-12-31' and would be missed by between.
range_response = table.query(
    KeyConditionExpression=(
        Key('customer_id').eq('CUST-42')
        & Key('order_date_order_id').gte('2024-03-01')
    )
)

print(f"\nOrders from March 2024 onwards ({range_response['Count']} found):")
for order in range_response['Items']:
    print(f"  {order['order_date_order_id']} | ${order['total_usd']:.2f} | {order['status']}")

# ─── UPDATE: Change just the status field without rewriting the whole item ──
table.update_item(
    Key={
        'customer_id': 'CUST-42',
        'order_date_order_id': '2024-03-22#ORD-002'
    },
    UpdateExpression='SET #s = :new_status',
    ExpressionAttributeNames={'#s': 'status'},  # #s avoids the 'status' reserved-word conflict
    ExpressionAttributeValues={':new_status': 'DELIVERED'}
)
print("\nUpdated ORD-002 status to DELIVERED")
```
Inserted 3 orders for CUST-42
Specific order: 2024-03-22#ORD-002 | $149.00
Orders from March 2024 onwards (2 found):
2024-03-22#ORD-002 | $149.00 | SHIPPED
2024-06-10#ORD-003 | $29.99 | PROCESSING
Updated ORD-002 status to DELIVERED
Global Secondary Indexes: Querying Data a Different Way
Here's the problem you'll hit in week two of using DynamoDB: your table is perfectly designed for 'get all orders by customer', but product management just asked for 'get all orders with status SHIPPED'. That query doesn't match your partition key at all. In SQL, you'd add an index. In DynamoDB, you add a Global Secondary Index (GSI).
A GSI is essentially a separate, automatically maintained copy of your table organized around a different key. You define a new partition key (and optionally a sort key), and DynamoDB replicates writes to that index asynchronously. Queries against a GSI are just as fast as queries against the base table — because the same partitioning logic applies.
The word 'global' means the index spans the entire table, not just one partition. There's also a Local Secondary Index (LSI), which must share the base table's partition key but can have a different sort key — and it must be defined at table creation time, not added later. GSIs can be added to existing tables, making them far more flexible in practice.
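The design trade-off: every GSI costs you money. You pay for its own storage, plus write capacity every time a base-table write touches it. Reads from a GSI are also always eventually consistent, because updates propagate to the index asynchronously (usually within a fraction of a second). You're not getting something for free; you're choosing which query patterns deserve their own fast path.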
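Conceptually, a GSI is a second item collection keyed differently. The sketch below builds one in memory from base-table items. It shows what an INCLUDE projection keeps, and it shows that items lacking an index key attribute simply don't appear in the index, which matches DynamoDB's behavior. The `build_gsi` function is hypothetical. The real index is maintained asynchronously by DynamoDB, not rebuilt on demand.

```python
from collections import defaultdict


def build_gsi(items, index_pk, index_sk, table_keys, include=()):
    """Hypothetical model of a GSI with an INCLUDE projection.

    Returns {index partition key value: [projected items sorted by index_sk]}.
    """
    keep = set(table_keys) | {index_pk, index_sk} | set(include)
    index = defaultdict(list)
    for item in items:
        if index_pk not in item or index_sk not in item:
            continue  # missing an index key attribute: the item is not in the index
        index[item[index_pk]].append({k: v for k, v in item.items() if k in keep})
    for bucket in index.values():
        bucket.sort(key=lambda projected: projected[index_sk])
    return dict(index)


base_items = [
    {"customer_id": "CUST-42", "order_date_order_id": "2024-01-15#ORD-001",
     "status": "DELIVERED", "total_usd": "59.99", "items": ["Mechanical Keyboard"]},
    {"customer_id": "CUST-42", "order_date_order_id": "2024-06-10#ORD-003",
     "status": "PROCESSING", "total_usd": "29.99", "items": ["Desk Mat"]},
    {"customer_id": "CUST-7", "order_date_order_id": "2024-02-02#ORD-004",
     "total_usd": "10.00"},  # no 'status' attribute: left out of the index
]

status_index = build_gsi(
    base_items,
    index_pk="status",
    index_sk="order_date_order_id",
    table_keys=("customer_id", "order_date_order_id"),  # always projected
    include=("total_usd",),
)
print(status_index["DELIVERED"])
```

The non-projected `items` attribute is absent from the index copy. That is where an INCLUDE projection saves storage compared with ALL.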
```python
import time

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource(
    'dynamodb',
    region_name='us-east-1',
    endpoint_url='http://localhost:8000'
)
client = dynamodb.meta.client

# Add a GSI to the existing Orders table so we can query by order status
# GSI partition key: status | GSI sort key: order_date_order_id
# This lets us ask: "give me all orders with a given status, most recent first"
client.update_table(
    TableName='Orders',
    AttributeDefinitions=[
        {'AttributeName': 'status', 'AttributeType': 'S'},
        {'AttributeName': 'order_date_order_id', 'AttributeType': 'S'}
    ],
    GlobalSecondaryIndexUpdates=[
        {
            'Create': {
                'IndexName': 'StatusByDate-GSI',
                'KeySchema': [
                    {'AttributeName': 'status', 'KeyType': 'HASH'},
                    {'AttributeName': 'order_date_order_id', 'KeyType': 'RANGE'}
                ],
                'Projection': {
                    # INCLUDE only the fields we actually need (cheaper than ALL).
                    # Base-table key attributes (customer_id, order_date_order_id)
                    # are always projected, so they don't need listing here.
                    'ProjectionType': 'INCLUDE',
                    'NonKeyAttributes': ['total_usd']
                }
            }
        }
    ]
)
print("GSI creation initiated (takes a moment to backfill)...")

# The 'table_exists' waiter only checks TableStatus, not the index,
# so poll until the GSI itself reports ACTIVE before querying it.
while True:
    description = client.describe_table(TableName='Orders')['Table']
    index_status = {
        gsi['IndexName']: gsi['IndexStatus']
        for gsi in description.get('GlobalSecondaryIndexes', [])
    }
    if index_status.get('StatusByDate-GSI') == 'ACTIVE':
        break
    time.sleep(2)

table = dynamodb.Table('Orders')

# ─── Query via GSI: all DELIVERED orders, newest first ─────────────────────
# (The previous step moved ORD-002 from SHIPPED to DELIVERED.)
gsi_response = table.query(
    IndexName='StatusByDate-GSI',  # tell DynamoDB to use our new index
    KeyConditionExpression=Key('status').eq('DELIVERED'),
    ScanIndexForward=False  # descending sort key order = most recent first
)

print(f"\nAll DELIVERED orders ({gsi_response['Count']} found):")
for order in gsi_response['Items']:
    print(f"  Customer: {order['customer_id']} | "
          f"Order: {order['order_date_order_id']} | "
          f"Total: ${order['total_usd']:.2f}")
```
GSI creation initiated (takes a moment to backfill)...
All DELIVERED orders (2 found):
Customer: CUST-42 | Order: 2024-03-22#ORD-002 | Total: $149.00
Customer: CUST-42 | Order: 2024-01-15#ORD-001 | Total: $59.99
Single-Table Design: Why One Table Often Beats Many
Single-table design (STD) is the concept that trips up virtually every developer migrating from SQL. The instinct is to create one DynamoDB table per entity — an Orders table, a Customers table, a Products table — mirroring what you'd do in a relational database. But DynamoDB wasn't designed for that, and doing it that way throws away its biggest advantage.
In SQL, you join across tables at query time. DynamoDB has no joins. Instead, single-table design stores heterogeneous entity types in the same table, using generic attribute names like PK and SK (partition key and sort key) whose values encode the entity type. For example: PK='CUSTOMER#CUST-42', SK='PROFILE' for a customer record, and PK='CUSTOMER#CUST-42', SK='ORDER#2024-03-22#ORD-002' for an order belonging to that customer. Now a single Query call with PK='CUSTOMER#CUST-42' returns the customer profile AND all their orders in one network round-trip.
This is powerful for read-heavy access patterns, but it comes with a real cost: the model is harder to reason about, harder to query ad-hoc (e.g., for analytics), and painful if your access patterns change. Single-table design is a deliberate trade: you sacrifice flexibility for performance and cost efficiency. It's the right call for a microservice with stable, well-understood access patterns. It's the wrong call for an exploratory analytics workload — use Athena or Redshift for that.
```python
import boto3
from boto3.dynamodb.conditions import Key
from decimal import Decimal

dynamodb = boto3.resource(
    'dynamodb',
    region_name='us-east-1',
    endpoint_url='http://localhost:8000'
)

# One table to rule them all: stores Customers AND Orders together.
# Generic key names PK / SK are conventional in single-table design.
table = dynamodb.create_table(
    TableName='ECommerceApp',
    KeySchema=[
        {'AttributeName': 'PK', 'KeyType': 'HASH'},
        {'AttributeName': 'SK', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'PK', 'AttributeType': 'S'},
        {'AttributeName': 'SK', 'AttributeType': 'S'}
    ],
    BillingMode='PAY_PER_REQUEST'
)
table.wait_until_exists()

# ─── Write a CUSTOMER record ────────────────────────────────────────────────
# The PK and SK values encode what the item IS
table.put_item(Item={
    'PK': 'CUSTOMER#CUST-42',   # partition key identifies the customer
    'SK': 'PROFILE',            # sort key distinguishes record type
    'entity_type': 'Customer',  # explicit type tag, helps with filtering
    'full_name': 'Alex Rivera',
    'email': 'alex@example.com',
    'loyalty_tier': 'Gold'
})

# ─── Write two ORDERS for the same customer ─────────────────────────────────
# Same PK as the customer: they share a partition, so one Query fetches both
table.put_item(Item={
    'PK': 'CUSTOMER#CUST-42',
    'SK': 'ORDER#2024-01-15#ORD-001',  # ORDER# prefix separates orders from the profile
    'entity_type': 'Order',
    'total_usd': Decimal('59.99'),
    'status': 'DELIVERED'
})
table.put_item(Item={
    'PK': 'CUSTOMER#CUST-42',
    'SK': 'ORDER#2024-06-10#ORD-003',
    'entity_type': 'Order',
    'total_usd': Decimal('29.99'),
    'status': 'PROCESSING'
})
print("Inserted customer profile and 2 orders into single table")

# ─── Fetch customer profile + ALL orders in ONE query call ──────────────────
# Items come back in SK order: 'ORDER#...' sorts before 'PROFILE' ('O' < 'P')
all_items_response = table.query(
    KeyConditionExpression=Key('PK').eq('CUSTOMER#CUST-42')
)

print(f"\nAll items for CUST-42 ({all_items_response['Count']} total):")
for item in all_items_response['Items']:
    print(f"  [{item['entity_type']}] SK={item['SK']}")

# ─── Get ONLY the orders: begins_with on the sort key skips the PROFILE item ─
orders_only_response = table.query(
    KeyConditionExpression=(
        Key('PK').eq('CUSTOMER#CUST-42')
        & Key('SK').begins_with('ORDER#')
    )
)

print(f"\nOrders only ({orders_only_response['Count']} found):")
for order in orders_only_response['Items']:
    print(f"  {order['SK']} | ${order['total_usd']:.2f} | {order['status']}")
```
Inserted customer profile and 2 orders into single table
All items for CUST-42 (3 total):
[Order] SK=ORDER#2024-01-15#ORD-001
[Order] SK=ORDER#2024-06-10#ORD-003
[Customer] SK=PROFILE
Orders only (2 found):
ORDER#2024-01-15#ORD-001 | $59.99 | DELIVERED
ORDER#2024-06-10#ORD-003 | $29.99 | PROCESSING
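Because one Query now returns several entity types, the application usually splits the result by the `entity_type` tag (or by SK prefix) before using it. Below is a minimal sketch of that step. It operates on plain dicts shaped like the items returned by the Query above. `split_customer_collection` is a hypothetical helper, not part of boto3.

```python
def split_customer_collection(items):
    """Hypothetical helper: separate one customer's profile from their orders.

    `items` is the list from a Query on PK='CUSTOMER#...'; each item carries
    an 'entity_type' tag as in the single-table example.
    """
    profile = None
    orders = []
    for item in items:
        if item["entity_type"] == "Customer":
            profile = item
        elif item["entity_type"] == "Order":
            orders.append(item)
    # Order sort keys embed the date, so sorting by SK gives chronological order
    orders.sort(key=lambda order: order["SK"])
    return profile, orders


query_items = [  # plain dicts shaped like the Query results above
    {"PK": "CUSTOMER#CUST-42", "SK": "ORDER#2024-06-10#ORD-003", "entity_type": "Order", "status": "PROCESSING"},
    {"PK": "CUSTOMER#CUST-42", "SK": "PROFILE", "entity_type": "Customer", "full_name": "Alex Rivera"},
    {"PK": "CUSTOMER#CUST-42", "SK": "ORDER#2024-01-15#ORD-001", "entity_type": "Order", "status": "DELIVERED"},
]

profile, orders = split_customer_collection(query_items)
print(profile["full_name"], [o["SK"] for o in orders])
```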
| Feature / Aspect | DynamoDB (NoSQL) | PostgreSQL (Relational SQL) |
|---|---|---|
| Data Model | Key-value + document (schemaless per item) | Fixed schema with typed columns |
| Query Flexibility | Efficient only by primary key or secondary index; access patterns must be pre-planned | Any column at any time via ad-hoc SQL |
| Join Support | None; use single-table design or denormalization | Full JOIN support across tables |
| Horizontal Scaling | Automatic partitioning with no sharding to configure (per-partition throughput limits still apply) | Complex: read replicas scale reads, sharding is needed to scale writes |
| Read Consistency | Eventually consistent by default; strongly consistent reads optional (not on GSIs) | ACID-compliant; strongly consistent on the primary (asynchronous read replicas can lag) |
| Schema Changes | Add attributes anytime; no migration needed | ALTER TABLE required; some changes can lock tables at scale |
| Pricing Model | Provisioned read/write capacity or on-demand per-request pricing, plus storage | Pay for server/instance size (always on) |
| Best For | Predictable, high-throughput access patterns at scale | Complex queries, reporting, transactional integrity |
🎯 Key Takeaways
- DynamoDB's partition key is hashed to determine physical storage location. Choose high-cardinality values, or you'll create hot partitions that throttle requests for your busiest keys.
- The sort key is a query tool, not just an identifier — use composite sort key values like 'DATE#ID' to enable efficient range queries that stay within a single partition.
- Global Secondary Indexes let you query by a different key, but they cost money, add eventual consistency lag, and are not a substitute for proper upfront key design — they supplement it.
- Single-table design co-locates related entities under one partition key, eliminating the need for joins and reducing network round-trips — but it only pays off when your access patterns are stable and well-defined.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Choosing a low-cardinality partition key (e.g., 'status' with only 3 values). Symptom: one or two partitions get most of the traffic (a hot partition). Requests are throttled even though the table is under its overall capacity limit, and you see ProvisionedThroughputExceededException on specific keys. Fix: Use a high-cardinality partition key like user_id or order_id. If you truly need to query by status, you have two options. You can add a GSI with status as the partition key, knowing that the index's few partitions can run hot in the same way and that a throttled GSI can throttle writes to the base table. Or you can use write sharding: append a random suffix to the partition key on write, then query all shards in parallel on read.
- ✕ Mistake 2: Using Scan instead of Query for production reads. Symptom: reads get slower and costs spike as the table grows. A Scan reads every item before any FilterExpression is applied, so a full Scan of a 10 GB table reads all 10 GB every time (over a million read capacity units even with eventually consistent reads), regardless of how many items match. Fix: Redesign your access pattern to use a Query (requires a known partition key). If you genuinely need to search without a partition key, add a GSI whose partition key matches your search field. Reserve Scan strictly for one-off data migrations or admin scripts that run during off-peak hours. Use the Limit parameter to keep each page small, and pause between pages to cap throughput.
- ✕ Mistake 3: Treating update_item like a safe partial update without condition expressions. Symptom: concurrent Lambda functions both read an item, both compute a new value, and both write back; the last write wins silently, losing the first update (the classic lost-update race condition). Fix: Use ConditionExpression in update_item to implement optimistic locking. Add a version attribute to each item, increment it on every write, and include ConditionExpression='version = :expected_version' in your update. DynamoDB will reject the write if another process already incremented the version, and you can retry safely.
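The optimistic-locking fix from Mistake 3 can be sketched with boto3 as follows. The sketch assumes each item already carries a numeric `version` attribute; DynamoDB does not add one for you. The function name is hypothetical. The exception class is looked up through `table.meta.client.exceptions`, where boto3 exposes DynamoDB's modeled errors, so no extra import is needed.

```python
def update_status_optimistically(table, key, new_status, expected_version):
    """Set 'status' only if the item's 'version' still equals expected_version.

    `table` is a boto3 DynamoDB Table resource. Returns True on success and
    False if another writer bumped the version first (the caller can re-read
    and retry). Assumes items already have a numeric 'version' attribute.
    """
    try:
        table.update_item(
            Key=key,
            UpdateExpression="SET #s = :new_status, #v = :next_version",
            ConditionExpression="#v = :expected_version",
            # Placeholders sidestep reserved-word conflicts ('status' is reserved)
            ExpressionAttributeNames={"#s": "status", "#v": "version"},
            ExpressionAttributeValues={
                ":new_status": new_status,
                ":expected_version": expected_version,
                ":next_version": expected_version + 1,
            },
        )
        return True
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        # Someone else wrote first: our read is stale, so don't overwrite
        return False
```

When the function returns False, a caller would re-read the item with get_item, recompute the new value, and try again with the fresh version number.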
Interview Questions on This Topic
- Q: Can you explain the difference between a partition key and a sort key, and give an example of when you'd use a composite primary key over a simple one?
- Q: What is a hot partition in DynamoDB, how does it happen, and what strategies do you use to prevent it in a high-traffic application?
- Q: If a colleague proposes creating separate DynamoDB tables for each entity type in a new microservice (one for Users, one for Orders, one for Products), how would you evaluate that approach and what alternative would you consider?
Frequently Asked Questions
What is the difference between DynamoDB GetItem and Query?
GetItem fetches a single item when you know its complete primary key (the partition key, plus the sort key if the table has one). It's always O(1) and the fastest possible read. Query fetches multiple items that share the same partition key, optionally filtered by sort key conditions. Query is efficient because it reads only one partition key's items, but it returns a collection rather than a single item. Never use Scan when Query will do the job.
Can I change my DynamoDB partition key after creating the table?
No — the primary key (partition key and sort key) is set at table creation and cannot be changed. If you need a different key structure, you have to create a new table and migrate your data to it. This is the most important reason to design your access patterns carefully before creating your DynamoDB table.
Is DynamoDB always eventually consistent, or can I get strongly consistent reads?
DynamoDB offers both. By default, reads are eventually consistent, meaning a read issued right after a write might not reflect that write yet; copies of the data usually converge within a second. You can request strongly consistent reads by setting ConsistentRead=True in your GetItem, Query, or Scan call, which returns the latest committed data. Strongly consistent reads cost twice as many read capacity units, and they are not available on Global Secondary Indexes at all.
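The capacity arithmetic behind this answer, and behind the Scan warning in Mistake 2, can be sketched in a few lines. The sketch assumes the standard read rules. One RCU covers one strongly consistent read of up to 4 KB per second, and an eventually consistent read costs half that. Query and Scan round the total size of the items they read up to the next 4 KB. The `read_capacity_units` helper is hypothetical; on a real table you can ask DynamoDB for actual usage with the ReturnConsumedCapacity parameter.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one RCU covers up to 4 KB for a strongly consistent read


def read_capacity_units(bytes_read: int, strongly_consistent: bool = False) -> float:
    """Approximate RCUs for reading `bytes_read` bytes.

    For Query/Scan, bytes_read is the total size of all items read (before any
    FilterExpression is applied), rounded up to the next 4 KB boundary.
    Eventually consistent reads cost half as much.
    """
    blocks = math.ceil(bytes_read / RCU_BLOCK_BYTES)
    return float(blocks) if strongly_consistent else blocks / 2


# A single 1 KB item read via GetItem
print(read_capacity_units(1024))                              # 0.5
print(read_capacity_units(1024, strongly_consistent=True))    # 1.0

# A full Scan of a 10 GiB table, however few items the filter keeps
ten_gib = 10 * 1024 ** 3
print(read_capacity_units(ten_gib))                            # 1310720.0
print(read_capacity_units(ten_gib, strongly_consistent=True))  # 2621440.0
```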
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.