Python Iterators and Iterables Explained — How Loops Really Work
Every Python developer uses for-loops from day one, but almost nobody stops to ask: how does Python actually know what to do next on each loop cycle? The answer lives inside a two-method protocol — __iter__ and __next__ — and once you understand it, you'll see it everywhere: in file reading, database cursors, API pagination, and streaming data pipelines. This isn't just academic knowledge; it's the engine under the hood of the language itself.
The problem this solves is memory and control. If Python loaded every item from a collection into memory before you could loop over it, working with a 10-million-row CSV file or an infinite sequence of sensor readings would be impossible. Iterators let you process one item at a time, on demand, without ever needing to know how many items exist in total. This lazy evaluation is what makes Python practical for real data-engineering work.
By the end of this article you'll be able to explain exactly what happens when Python executes a for-loop, write your own custom iterator class from scratch, spot the difference between an iterable and an iterator in a code review, and avoid the subtle bugs that trip up even experienced developers when they assume an iterator can be reused.
The Two-Protocol System: Iterable vs Iterator
Python draws a firm line between two roles. An iterable is any object that knows how to produce an iterator — it has an __iter__ method that returns one. A list, a string, a tuple, a dict — all iterables. An iterator is the stateful worker that actually does the traversal. It has both __iter__ (which just returns itself) and __next__ (which delivers the next item or raises StopIteration when it's done).
This separation exists for a good reason: you want to be able to loop over the same list a hundred times without it 'running out'. The list (iterable) stays neutral. Each time you start a new loop, Python silently calls iter(your_list) to create a fresh iterator — a brand-new dealer for your deck of cards.
You can manually step through this process with Python's built-in iter() and next() functions, which is exactly what a for-loop does internally on every single iteration. Understanding this unlocks everything else in this article.
```python
# A plain list is an ITERABLE — it can produce iterators on demand
playlist = ["Bohemian Rhapsody", "Hotel California", "Stairway to Heaven"]

# iter() calls playlist.__iter__() and returns a fresh ITERATOR
playlist_iterator = iter(playlist)

# next() calls playlist_iterator.__next__() each time
print(next(playlist_iterator))  # Fetches item 1 — iterator remembers position
print(next(playlist_iterator))  # Fetches item 2 — picks up exactly where it left off
print(next(playlist_iterator))  # Fetches item 3 — last item

# The iterator is now exhausted — calling next() again raises StopIteration
try:
    print(next(playlist_iterator))
except StopIteration:
    print("Iterator exhausted — no more songs!")

# The original list is UNCHANGED — start a fresh iterator anytime
fresh_iterator = iter(playlist)
print(next(fresh_iterator))  # Back to the beginning — "Bohemian Rhapsody"

# A for-loop does ALL of this invisibly on every iteration
for song in playlist:  # Python calls iter(playlist) once here
    print(f"Now playing: {song}")  # Then calls next() on each cycle
```

Output:

```
Bohemian Rhapsody
Hotel California
Stairway to Heaven
Iterator exhausted — no more songs!
Bohemian Rhapsody
Now playing: Bohemian Rhapsody
Now playing: Hotel California
Now playing: Stairway to Heaven
```
Building a Custom Iterator — A Real-World File Chunker
Here's where things get genuinely useful. Let's say you're processing a large log file and you want to read it in fixed-size chunks rather than line by line or all at once. You can't do this elegantly with a plain list. This is the exact scenario custom iterators were made for.
To make an object an iterator, you implement two methods: __iter__ returns self (because the iterator is its own iterable), and __next__ returns the next value or raises StopIteration. That's the entire contract.
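Before the full file-chunking example below, here is that contract in its smallest possible form: a hypothetical Countdown class (the name and behaviour are illustrative, not from a library) that implements nothing beyond the two required methods.

```python
class Countdown:
    """Minimal iterator: counts down from `start` to 1, then stops."""

    def __init__(self, start):
        self._current = start  # the only state the iterator carries

    def __iter__(self):
        return self  # an iterator is its own iterable

    def __next__(self):
        if self._current < 1:
            raise StopIteration  # signals the for-loop to stop cleanly
        value = self._current
        self._current -= 1
        return value


print(list(Countdown(3)))  # → [3, 2, 1]
```

That is the whole protocol: one method returns `self`, the other returns a value or raises `StopIteration`.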
The power here is state. Your iterator class can carry any state it needs between calls to __next__ — a file handle, a counter, a buffer, a database cursor position. This is what separates a custom iterator from a simple function: it pauses between calls and picks up exactly where it left off, making it perfect for streaming, pagination, and lazy computation.
```python
class FileChunkIterator:
    """
    Reads a text file in fixed-size chunks.
    Useful for processing large files without loading them entirely into memory.
    """

    def __init__(self, filepath, chunk_size=5):
        self.filepath = filepath
        self.chunk_size = chunk_size  # How many lines per chunk
        self._file_handle = None      # Will hold the open file — initialised in __iter__
        self._line_buffer = []        # Accumulates lines until chunk is full

    def __iter__(self):
        # Open the file fresh each time iteration starts
        # This allows the iterator to be restarted cleanly
        self._file_handle = open(self.filepath, "r")
        self._line_buffer = []
        return self  # An iterator must return itself here

    def __next__(self):
        # Try to fill a chunk from the file
        while len(self._line_buffer) < self.chunk_size:
            raw_line = self._file_handle.readline()
            if not raw_line:  # readline() returns '' at end of file
                break
            self._line_buffer.append(raw_line.strip())

        if not self._line_buffer:          # Buffer empty — file is fully consumed
            self._file_handle.close()      # Clean up the file handle
            raise StopIteration            # Signal to the for-loop: we're done

        # Pull exactly chunk_size lines (or fewer if we're near the end)
        chunk = self._line_buffer[:self.chunk_size]
        self._line_buffer = self._line_buffer[self.chunk_size:]
        return chunk


# --- Demo: create a temporary log file and iterate over it in chunks ---
import tempfile
import os

# Write 12 fake log lines to a temp file
log_lines = [f"[INFO] Event #{i} processed successfully" for i in range(1, 13)]
with tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False) as temp_log:
    temp_log.write("\n".join(log_lines))
    temp_filepath = temp_log.name

# Iterate over the log in chunks of 5 lines
chunk_reader = FileChunkIterator(temp_filepath, chunk_size=5)
for chunk_number, log_chunk in enumerate(chunk_reader, start=1):
    print(f"--- Chunk {chunk_number} ({len(log_chunk)} lines) ---")
    for log_entry in log_chunk:
        print(f"  {log_entry}")

os.unlink(temp_filepath)  # Clean up the temp file
```

Output:

```
--- Chunk 1 (5 lines) ---
  [INFO] Event #1 processed successfully
  [INFO] Event #2 processed successfully
  [INFO] Event #3 processed successfully
  [INFO] Event #4 processed successfully
  [INFO] Event #5 processed successfully
--- Chunk 2 (5 lines) ---
  [INFO] Event #6 processed successfully
  [INFO] Event #7 processed successfully
  [INFO] Event #8 processed successfully
  [INFO] Event #9 processed successfully
  [INFO] Event #10 processed successfully
--- Chunk 3 (2 lines) ---
  [INFO] Event #11 processed successfully
  [INFO] Event #12 processed successfully
```
Generator Functions — The Shortcut Python Gives You
Writing a full iterator class is powerful, but verbose. Python gives you a shortcut: generator functions. Any function that contains a yield statement automatically becomes a factory for iterator objects called generators. Python handles all the __iter__ and __next__ plumbing for you.
Under the hood, calling a generator function doesn't execute the body at all — it returns a generator object. Each call to next() on that object resumes execution from the last yield, suspending again at the next one. This is exactly the same pause-and-resume behaviour as our custom iterator, but expressed in a fraction of the code.
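You can watch that pause-and-resume behaviour directly by driving a tiny generator by hand with next(). This is a minimal sketch (the greeter function is made up for illustration), not part of the pagination example that follows.

```python
def greeter():
    print("before first yield")
    yield "hello"
    print("between yields")
    yield "world"
    print("after last yield")


gen = greeter()  # nothing in the body runs yet: we just get a generator object
print(type(gen).__name__)  # prints: generator

print(next(gen))  # runs up to the first yield ("before first yield"), returns "hello"
print(next(gen))  # resumes after the first yield ("between yields"), returns "world"

try:
    next(gen)  # resumes, prints "after last yield", then raises StopIteration
except StopIteration:
    print("generator exhausted")
```

Note that the first print inside the body only fires on the first next() call, not when greeter() is called: that is the lazy execution described above.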
The real-world sweet spot for generators is producing sequences that are either very large or computationally expensive — think paginated API responses, mathematical series, or streaming transformations. If your data source is 'pull-based' (you ask for the next item when you're ready), a generator is almost always the right tool.
```python
# Simulates a paginated REST API that returns users in pages
def mock_api_fetch(page_number, page_size=3):
    """
    Pretend this is calling requests.get('https://api.example.com/users?page=N').
    Returns a list of users for that page, or an empty list when pages run out.
    """
    all_users = [
        {"id": 1, "name": "Alice Nakamura"},
        {"id": 2, "name": "Ben Okafor"},
        {"id": 3, "name": "Carmen Reyes"},
        {"id": 4, "name": "David Chen"},
        {"id": 5, "name": "Elena Petrov"},
        {"id": 6, "name": "Farhan Malik"},
        {"id": 7, "name": "Grace Osei"},
    ]
    start_index = (page_number - 1) * page_size
    return all_users[start_index : start_index + page_size]


def paginated_user_stream(page_size=3):
    """
    Generator that fetches users page-by-page.
    Callers never need to know pagination exists — they just iterate.
    """
    current_page = 1
    while True:
        print(f"  [Generator] Fetching page {current_page} from API...")
        page_data = mock_api_fetch(current_page, page_size)
        if not page_data:  # Empty page means no more data
            print("  [Generator] All pages consumed — raising StopIteration")
            return  # 'return' inside a generator raises StopIteration automatically
        for user in page_data:
            yield user  # Suspend here, hand one user to the caller
        current_page += 1  # Only advance the page after all users on it are yielded


# The caller just iterates — zero pagination logic leaks out
print("Streaming all users from API:\n")
for user in paginated_user_stream(page_size=3):
    print(f"  Processing user: {user['name']} (ID: {user['id']})")
print("\nDone — all users processed.")
```

Output:

```
Streaming all users from API:

  [Generator] Fetching page 1 from API...
  Processing user: Alice Nakamura (ID: 1)
  Processing user: Ben Okafor (ID: 2)
  Processing user: Carmen Reyes (ID: 3)
  [Generator] Fetching page 2 from API...
  Processing user: David Chen (ID: 4)
  Processing user: Elena Petrov (ID: 5)
  Processing user: Farhan Malik (ID: 6)
  [Generator] Fetching page 3 from API...
  Processing user: Grace Osei (ID: 7)
  [Generator] Fetching page 4 from API...
  [Generator] All pages consumed — raising StopIteration

Done — all users processed.
```
The Exhaustion Trap and the iter() Sentinel Form
Here's the behaviour that catches almost everyone at some point: iterators are one-shot. Once an iterator is exhausted, it stays exhausted. Calling iter() on an already-exhausted iterator just returns the same dead object — it does not reset. This is different from calling iter() on an iterable like a list, which creates a brand-new iterator.
This distinction has a practical consequence: if you pass an iterator (not an iterable) to two functions, the second one will silently get nothing. No error. Just an empty loop. These bugs are genuinely hard to track down.
Python also has a lesser-known second form of iter() — iter(callable, sentinel) — which wraps any zero-argument callable into an iterator that keeps calling it until the return value equals the sentinel. This is incredibly useful for reading data in fixed-size blocks, processing queue items, or any situation where you have a 'pull until done' data source.
```python
# ── PART 1: The Exhaustion Trap ──────────────────────────────────────────────
team_members = ["Priya", "Jordan", "Kwame"]  # This is an ITERABLE
team_iterator = iter(team_members)           # This is an ITERATOR (stateful)

print("First loop:")
for member in team_iterator:
    print(f"  Hello, {member}")

print("\nSecond loop over the SAME iterator (iterator is exhausted):")
for member in team_iterator:
    # This loop body never executes — silent failure!
    print(f"  Hello again, {member}")
print("  (nothing printed — iterator was already exhausted)")

print("\nSecond loop over the ORIGINAL LIST (always works):")
for member in team_members:  # list is an iterable — fresh iterator each time
    print(f"  Hello again, {member}")

# ── PART 2: The iter(callable, sentinel) Form ────────────────────────────────
import io

# Simulate reading binary data in 4-byte blocks from a stream
binary_stream = io.BytesIO(b"TheCodeForge.io rocks!")

print("\nReading binary stream in 4-byte blocks:")
# iter(callable, sentinel): calls binary_stream.read(4) repeatedly
# Stops automatically when read() returns b'' (empty bytes — end of stream)
for block in iter(lambda: binary_stream.read(4), b""):
    print(f"  Block: {block}")  # Each block is exactly 4 bytes (except maybe the last)
```

Output:

```
First loop:
  Hello, Priya
  Hello, Jordan
  Hello, Kwame

Second loop over the SAME iterator (iterator is exhausted):
  (nothing printed — iterator was already exhausted)

Second loop over the ORIGINAL LIST (always works):
  Hello again, Priya
  Hello again, Jordan
  Hello again, Kwame

Reading binary stream in 4-byte blocks:
  Block: b'TheC'
  Block: b'odeF'
  Block: b'orge'
  Block: b'.io '
  Block: b'rock'
  Block: b's!'
```
| Feature / Aspect | Iterable | Iterator |
|---|---|---|
| Required methods | __iter__() only | __iter__() and __next__() |
| Holds state between calls | No — stateless | Yes — remembers current position |
| Can be looped multiple times | Yes — creates a fresh iterator each time | No — exhausted after one full pass |
| Examples in Python stdlib | list, tuple, str, dict, set | file objects, enumerate(), zip(), map() |
| What iter() returns | A new iterator object | Itself (self) |
| Memory usage pattern | Usually stores all data | Usually produces one item at a time |
| Can be passed to a for-loop? | Yes | Yes — but only for one full pass |
| Restarting iteration | Automatic — just loop again | Must call iter() on the source iterable again |
🎯 Key Takeaways
- An iterable produces iterators; an iterator IS the stateful cursor — confusing the two causes silent, hard-to-debug empty-loop bugs.
- Every Python for-loop is secretly calling iter() once and then next() on every cycle until StopIteration is raised — there is no other mechanism.
- Iterators are one-shot by design: once exhausted, they stay exhausted. Always pass the original iterable — not the iterator — when the same data needs to be traversed more than once.
- Generator functions (using yield) are shorthand for writing iterator classes — use them for clean, memory-efficient lazy sequences; use a full class when you need restartable iteration or complex internal state.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Iterating an exhausted iterator and expecting results — Symptom: a for-loop that runs but never executes its body, producing no output and no error — Fix: never store the result of iter() in a variable you plan to loop over more than once. Keep a reference to the original iterable (the list, tuple, or custom class) and call iter() fresh each time you need a new traversal.
- ✕ Mistake 2: Forgetting to raise StopIteration in a custom __next__ — Symptom: an infinite loop when your data source runs dry, or a cryptic RecursionError if you accidentally call __next__ recursively — Fix: every code path through __next__ must either return a value or raise StopIteration. Add a guard at the top — `if self._position >= len(self._data): raise StopIteration` — before any return statement.
- ✕ Mistake 3: Treating a generator object as if it's reusable — Symptom: the second call to a function that returns a generator produces no results, or downstream code that depends on the data silently processes nothing — Fix: a generator function (using yield) returns a new generator object each time it's called. If you need to iterate the same data twice, call the generator function again to get a fresh generator, or convert the first pass to a list with list(my_generator()).
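To make the fix for Mistake 2 concrete, here is a sketch of a __next__ with the guard in place. The class name and fields (`SafeListIterator`, `_data`, `_position`) are hypothetical; the point is that every path either returns or raises.

```python
class SafeListIterator:
    """Iterator over a list, with the StopIteration guard FIRST in __next__."""

    def __init__(self, data):
        self._data = data
        self._position = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Guard at the top: no code path can fall through without
        # either returning a value or raising StopIteration
        if self._position >= len(self._data):
            raise StopIteration
        item = self._data[self._position]
        self._position += 1
        return item


print(list(SafeListIterator(["a", "b"])))  # → ['a', 'b']
print(list(SafeListIterator([])))          # → [] (guard fires on the very first call)
```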
Interview Questions on This Topic
- Q: What's the difference between an iterable and an iterator in Python, and how does a for-loop use both of them internally?
- Q: If you call iter() on an iterator object rather than a plain list, what does it return and why? What are the implications of this for writing reusable code?
- Q: You have a generator function that yields results from a paginated API. A colleague calls it once, stores the result in a variable, and passes that variable to two different processing functions. What bug will they hit, and how would you redesign the code to fix it?
Frequently Asked Questions
Is every iterator also an iterable in Python?
Yes — by convention and by the protocol definition, every iterator must implement __iter__ returning self. This means you can pass an iterator directly to a for-loop or any function that expects an iterable. The reverse is not true: an iterable is not necessarily an iterator unless it also implements __next__.
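You can verify both halves of that guarantee in a few lines (a small sketch, using a plain list):

```python
numbers = [10, 20, 30]
it = iter(numbers)

# Calling iter() on an ITERATOR returns the very same object, not a fresh one
assert iter(it) is it

# Calling iter() on an ITERABLE returns a brand-new iterator every time
assert iter(numbers) is not iter(numbers)

# Because every iterator is itself iterable, a for-loop accepts it directly
for n in it:
    print(n)  # 10, then 20, then 30
```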
What is the difference between a generator and an iterator in Python?
A generator is a specific type of iterator created either by a generator function (using yield) or a generator expression. All generators are iterators, but not all iterators are generators. The practical difference is implementation style: generators write themselves via yield, while iterators require explicit __iter__ and __next__ methods on a class.
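The distinction is easy to check at runtime with the standard types module; the squares function below is an illustrative example, not from the article's earlier code.

```python
import types


def squares(n):
    """Generator function: calling it returns a generator."""
    for i in range(n):
        yield i * i


gen = squares(4)                   # from a generator function
gen_expr = (i * i for i in range(4))  # from a generator expression

print(isinstance(gen, types.GeneratorType))       # True
print(isinstance(gen_expr, types.GeneratorType))  # True

# A list's iterator satisfies the iterator protocol but is NOT a generator
list_iter = iter([1, 2, 3])
print(isinstance(list_iter, types.GeneratorType))  # False

print(list(gen) == list(gen_expr))  # True: both yield 0, 1, 4, 9
```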
Why does Python use StopIteration instead of returning None or a special value to signal the end?
Using an exception means the protocol works even when None or any other sentinel value is a legitimate item in your sequence. If your iterator yields None, returning None to signal 'done' would be ambiguous. StopIteration is unambiguous, and Python's for-loop catches it automatically so you never see it in normal usage — it's a control-flow mechanism, not an error.
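A small demonstration of that ambiguity: the hypothetical generator below legitimately yields None as a data point, and iteration still ends at the right place because StopIteration, not any value, marks the end.

```python
def sparse_readings():
    """Sensor readings where None means 'no data this tick': a valid item."""
    yield 21.5
    yield None  # a real item in the sequence, NOT an end-of-data signal
    yield 22.1


collected = list(sparse_readings())
print(collected)       # [21.5, None, 22.1]
print(len(collected))  # 3: the None in the middle did not end iteration early
```

If the protocol used None as its end marker instead, this sequence would be silently truncated to a single item.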
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.