Introduction

This is the fourth and final post in our Python basics series. We have covered functions, error handling, and the standard library. Together with the earlier posts on basic syntax and OOP, you now have a solid foundation in Python.

Generators are the topic I have been looking forward to most. They are one of those features that, once you understand them, you start seeing everywhere — and you wonder how you ever managed without them. The idea is simple and beautiful: instead of computing an entire sequence up front and storing it in memory, you compute one value at a time, only when it is actually needed.

Lazy birds, in other words. They do not arrive all at once. They come one by one, when conditions are right.

What Is an Iterator?

Before generators, let us understand iterators, because a generator is a particular kind of iterator.

An iterator is any object that implements two methods: __iter__() (which returns the iterator itself) and __next__() (which returns the next value, or raises StopIteration when there are no more values).

You use iterators constantly without realising it. Every for loop in Python works by calling iter() once on the object you loop over, and then calling __next__() on the resulting iterator until StopIteration is raised:

birds = ["Eagle", "Pigeon", "Stork"]

# What a for loop actually does, manually
iterator = iter(birds)
print(next(iterator))  # Eagle
print(next(iterator))  # Pigeon
print(next(iterator))  # Stork

try:
    print(next(iterator))  # raises StopIteration
except StopIteration:
    print("No more birds.")

Eagle
Pigeon
Stork
No more birds.

The for loop does all of this for you invisibly. But knowing the mechanism matters when you write your own iterators and generators.
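To make that mechanism concrete, here is a rough sketch of what a for loop does, written out as a while loop. The helper name manual_for is made up for this illustration; it only collects the items so the result is easy to inspect.

```python
birds = ["Eagle", "Pigeon", "Stork"]


def manual_for(iterable):
    """Collect items the way a for loop would, using the iterator protocol directly."""
    seen = []
    iterator = iter(iterable)      # step 1: ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # step 2: ask for the next value
        except StopIteration:
            break                  # step 3: stop quietly once the iterator is exhausted
        seen.append(item)          # the body of the for loop would run here
    return seen


print(manual_for(birds))  # ['Eagle', 'Pigeon', 'Stork']
```

The StopIteration exception never reaches you in a normal for loop, because the loop itself catches it and ends.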

You can write a full iterator class with __iter__ and __next__ methods. It works, but it is verbose. Generators give you the same behaviour with a fraction of the code.
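For comparison, here is roughly what such a hand-written iterator might look like, next to a generator function that behaves the same way. BirdIterator and bird_generator are names invented for this sketch.

```python
class BirdIterator:
    """Hypothetical hand-written iterator over a list of bird names."""

    def __init__(self, birds):
        self._birds = birds
        self._index = 0

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self._index >= len(self._birds):
            raise StopIteration  # signals that there are no more values
        bird = self._birds[self._index]
        self._index += 1
        return bird


def bird_generator(birds):
    """The same behaviour, written as a generator function."""
    for bird in birds:
        yield bird


names = ["Eagle", "Pigeon", "Stork"]
print(list(BirdIterator(names)))    # ['Eagle', 'Pigeon', 'Stork']
print(list(bird_generator(names)))  # ['Eagle', 'Pigeon', 'Stork']
```

The class has to track its own position by hand. The generator version gets that bookkeeping for free.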

Generators with yield

A generator function looks like a regular function but uses yield instead of return. Each time yield is reached, the function pauses, sends the value to the caller, and resumes from exactly that point the next time next() is called:

def bird_migration(birds: list[str], destination: str):
    """Yield birds one by one as they complete their migration."""
    print(f"Migration to {destination} begins.")
    for bird in birds:
        print(f"  {bird} is on its way...")
        yield bird
        print(f"  {bird} has arrived.")
    print(f"Migration to {destination} complete.")


flock = ["Stork", "Swallow", "Warbler"]
migration = bird_migration(flock, "Amsterdam")

print("Waiting for first bird...")
first = next(migration)
print(f"First arrival: {first}\n")

print("Waiting for the rest...")
for bird in migration:
    print(f"Arrived: {bird}")

Waiting for first bird...
Migration to Amsterdam begins.
  Stork is on its way...
First arrival: Stork

Waiting for the rest...
  Stork has arrived.
  Swallow is on its way...
Arrived: Swallow
  Swallow has arrived.
  Warbler is on its way...
Arrived: Warbler
  Warbler has arrived.
Migration to Amsterdam complete.

Read that output carefully. The generator pauses after each yield and resumes when asked for the next value. The code before and after yield bird runs in two separate steps. This is fundamentally different from a function that returns a list all at once.

Calling a generator function does not execute any of the function body. It simply returns a generator object. Execution starts only when you call next() for the first time (or when a for loop does so for you).
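A small experiment makes this visible. The sketch below uses a made-up generator, noisy_generator, that records in a list when its body actually runs.

```python
log = []


def noisy_generator():
    log.append("body started")
    yield "Stork"
    log.append("body finished")


gen = noisy_generator()
print(log)         # []  (calling the function ran none of the body)

first = next(gen)
print(first)       # Stork
print(log)         # ['body started']
```

The second log entry is added only when the generator is asked for another value and runs to the end.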

Why Generators Save Memory

The practical importance of generators becomes clear with large data. Compare these two approaches:

import sys

# Eager: builds the whole list in memory
def all_wingspans_list(max_birds: int) -> list[int]:
    return [i * 10 for i in range(1, max_birds + 1)]

# Lazy: yields one value at a time
def all_wingspans_gen(max_birds: int):
    for i in range(1, max_birds + 1):
        yield i * 10


n = 1_000_000

list_version = all_wingspans_list(n)
gen_version  = all_wingspans_gen(n)

print(f"List size in memory: {sys.getsizeof(list_version):,} bytes")
print(f"Generator size:      {sys.getsizeof(gen_version):,} bytes")

List size in memory: 8,448,728 bytes   (~8 MB)
Generator size:              200 bytes

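The generator holds almost no memory regardless of how many values it will eventually produce. (The exact figures vary by Python version, and sys.getsizeof counts only the list object itself, not the integers it refers to, so the true cost of the list is higher still.) For a million birds, or a billion sensor readings, or a file with ten million lines, this is not just a convenience: it can be the difference between a program that runs and one that runs out of memory.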
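If you want to see the difference while the values are actually being consumed, the standard library's tracemalloc module can report peak allocations. This is a rough sketch, not a benchmark: peak_bytes, wingspans_list and wingspans_gen are helper names made up here, and the numbers you get will depend on your Python version.

```python
import tracemalloc


def peak_bytes(make_iterable, n):
    """Sum n wingspans and return (total, peak traced allocation in bytes)."""
    tracemalloc.start()
    total = sum(make_iterable(n))
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()
    return total, peak


def wingspans_list(n):
    return [i * 10 for i in range(1, n + 1)]   # eager: all values exist at once


def wingspans_gen(n):
    return (i * 10 for i in range(1, n + 1))   # lazy: one value at a time


n = 100_000
list_total, list_peak = peak_bytes(wingspans_list, n)
gen_total, gen_peak = peak_bytes(wingspans_gen, n)

print(f"Same answer: {list_total == gen_total}")
print(f"Peak with list:      {list_peak:,} bytes")
print(f"Peak with generator: {gen_peak:,} bytes")
```

Both versions compute the same sum. Only the list version has to keep every value alive at the same time.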

Generator Expressions

Just as list comprehensions give you a concise syntax for lists, generator expressions give you the same for generators. The only difference is parentheses instead of square brackets:

birds = ["Eagle", "Pigeon", "Stork", "Swan", "Penguin", "Ostrich"]

# List comprehension — creates the whole list immediately
flying_list = [b for b in birds if b not in ("Penguin", "Ostrich")]

# Generator expression — creates nothing yet, computes on demand
flying_gen = (b for b in birds if b not in ("Penguin", "Ostrich"))

print(type(flying_list))   # <class 'list'>
print(type(flying_gen))    # <class 'generator'>

for bird in flying_gen:
    print(bird)

<class 'list'>
<class 'generator'>
Eagle
Pigeon
Stork
Swan

Generator expressions compose naturally. You can chain them together without ever building an intermediate list:

observations = [
    ("Eagle",   "Amsterdam", 3),
    ("Pigeon",  "Den Haag",  12),
    ("Heron",   "Amsterdam", 1),
    ("Coot",    "Den Haag",  8),
    ("Pigeon",  "Amsterdam", 7),
]

# Chain of generator expressions — nothing computed until consumed
amsterdam_obs = (obs for obs in observations if obs[1] == "Amsterdam")
large_flocks  = (obs for obs in amsterdam_obs if obs[2] >= 3)
names_only    = (obs[0] for obs in large_flocks)

print(list(names_only))

['Eagle', 'Pigeon']

Three transformations, no intermediate lists. Data flows through the pipeline one item at a time.
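To check that claim for yourself, you can record the order in which items pass through each stage. The traced helper below is made up for this sketch; it passes items through unchanged and notes each one as it goes by.

```python
trace = []


def traced(label, items):
    """Pass items through unchanged, recording each one as it goes by."""
    for item in items:
        trace.append(f"{label}:{item}")
        yield item


birds = ["Eagle", "Heron", "Pigeon"]

stage_one = traced("read", birds)
stage_two = traced("filter", (b for b in stage_one if b != "Heron"))

result = list(stage_two)
print(result)  # ['Eagle', 'Pigeon']
print(trace)   # ['read:Eagle', 'filter:Eagle', 'read:Heron', 'read:Pigeon', 'filter:Pigeon']
```

Eagle travels through both stages before Heron is even read. With lists, all the reading would happen first, then all the filtering.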

Infinite Sequences

One of the most striking things generators can do is represent sequences that never end. A list cannot be infinite. A generator can:

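import random


def bird_arrivals(species: str, start_day: int = 1, seed: int = 42):
    """Yield an infinite sequence of daily arrival counts."""
    # A private RNG keeps the sequence reproducible without
    # reseeding the global random state for the rest of the program.
    rng = random.Random(seed)
    day = start_day
    while True:
        count = rng.randint(0, 10)
        yield day, species, count
        day += 1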


# Take just the first 5 days — the generator could go on forever
arrivals = bird_arrivals("Swallow")

print("First 5 days of swallow arrivals:")
for _ in range(5):
    day, species, count = next(arrivals)
    print(f"  Day {day}: {count} {species}(s)")

First 5 days of swallow arrivals:
  Day 1: 10 Swallow(s)
  Day 2: 1 Swallow(s)
  Day 3: 0 Swallow(s)
  Day 4: 4 Swallow(s)
  Day 5: 3 Swallow(s)

This kind of infinite generator is useful for simulations, event streams, sensor data, and anywhere you are processing data that arrives continuously rather than all at once.
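One thing to watch with infinite generators: the consumer decides when to stop, because the generator never will. A minimal sketch, using a made-up day_numbers generator:

```python
def day_numbers(start_day=1):
    """Hypothetical infinite generator: yields day numbers forever."""
    day = start_day
    while True:
        yield day
        day += 1


first_week = []
for day in day_numbers():
    if day > 7:
        break              # without this, the loop would never end
    first_week.append(day)

print(first_week)  # [1, 2, 3, 4, 5, 6, 7]
```

For the same reason, calling list() on an infinite generator never returns. Always take a bounded number of values or break out on a condition.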

itertools: Generator Combinators

The itertools module from the standard library is a collection of functions that work with iterators and generators. Together they form a kind of algebra for lazy sequences. Here are a few of the most useful:

import itertools

birds = ["Heron", "Coot", "Kingfisher"]
rings = ["red", "blue"]

# Cartesian product — every combination
print("All ringed birds:")
for bird, ring in itertools.product(birds, rings):
    print(f"  {bird} with {ring} ring")

print()

# Chain — iterate multiple sequences as one
early  = ["Swallow", "Warbler"]
late   = ["Stork", "Swift"]
all_migrants = itertools.chain(early, late)
print("All migrants:", list(all_migrants))

# islice — take a slice from an iterator without materialising it
def count_up(start=1):
    while True:
        yield start
        start += 1

first_ten = list(itertools.islice(count_up(), 10))
print("First ten:", first_ten)

# groupby — group consecutive elements by a key
observations = [
    ("Heron",  "Amsterdam"),
    ("Heron",  "Amsterdam"),
    ("Coot",   "Amsterdam"),
    ("Pigeon", "Den Haag"),
    ("Pigeon", "Den Haag"),
]
print("\nGrouped by species:")
for species, group in itertools.groupby(observations, key=lambda x: x[0]):
    locations = [obs[1] for obs in group]
    print(f"  {species}: {locations}")

All ringed birds:
  Heron with red ring
  Heron with blue ring
  Coot with red ring
  Coot with blue ring
  Kingfisher with red ring
  Kingfisher with blue ring

All migrants: ['Swallow', 'Warbler', 'Stork', 'Swift']
First ten: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Grouped by species:
  Heron: ['Amsterdam', 'Amsterdam']
  Coot: ['Amsterdam']
  Pigeon: ['Den Haag', 'Den Haag']

Keep in mind that itertools.groupby only groups consecutive elements with equal keys. If the same species appears in non-consecutive positions, it shows up as several separate groups. When you want exactly one group per key, sort the input by the grouping key first.
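Here is a short sketch of the difference. The observations are sample data made up for this example, and group_locations is a helper name invented here.

```python
import itertools

observations = [
    ("Heron",  "Amsterdam"),
    ("Pigeon", "Den Haag"),
    ("Heron",  "Utrecht"),   # same species again, but not adjacent
]


def group_locations(obs):
    """Return a list of (species, [locations]) pairs using groupby."""
    return [
        (species, [location for _, location in group])
        for species, group in itertools.groupby(obs, key=lambda x: x[0])
    ]


unsorted_groups = group_locations(observations)
sorted_groups = group_locations(sorted(observations, key=lambda x: x[0]))

print(unsorted_groups)
# [('Heron', ['Amsterdam']), ('Pigeon', ['Den Haag']), ('Heron', ['Utrecht'])]
print(sorted_groups)
# [('Heron', ['Amsterdam', 'Utrecht']), ('Pigeon', ['Den Haag'])]
```

Note the trade-off: sorted() has to read the whole input and build a list, so sorting first gives up the laziness for that step.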

A Complete Migration Pipeline

Let us close with a generator pipeline that processes a stream of bird sightings — a real-world pattern you might use for log processing, CSV file parsing, or live sensor data:

import itertools
from dataclasses import dataclass
from datetime import date

@dataclass
class Sighting:
    species: str
    location: str
    count: int
    date: date


def read_sightings(raw_data: list[dict]):
    """Generator: parse raw dicts into Sighting objects."""
    for record in raw_data:
        try:
            yield Sighting(
                species=record["species"],
                location=record["location"],
                count=int(record["count"]),
                date=date.fromisoformat(record["date"]),
            )
        except (KeyError, ValueError) as e:
            print(f"Skipping malformed record {record}: {e}")


def filter_by_location(sightings, location: str):
    """Generator: yield only sightings from a given location."""
    return (s for s in sightings if s.location == location)


def large_flocks(sightings, min_count: int = 5):
    """Generator: yield only sightings with count >= min_count."""
    return (s for s in sightings if s.count >= min_count)


# --- Sample data with one malformed record ---
raw = [
    {"species": "Heron",      "location": "Amsterdam", "count": "3",  "date": "2026-05-10"},
    {"species": "Coot",       "location": "Amsterdam", "count": "12", "date": "2026-05-10"},
    {"species": "Pigeon",     "location": "Den Haag",  "count": "7",  "date": "2026-05-10"},
    {"species": "Kingfisher", "location": "Amsterdam", "count": "bad","date": "2026-05-10"},
    {"species": "Swan",       "location": "Amsterdam", "count": "6",  "date": "2026-05-10"},
]

# Build the pipeline — nothing executes yet
pipeline = read_sightings(raw)
amsterdam = filter_by_location(pipeline, "Amsterdam")
notable   = large_flocks(amsterdam, min_count=5)

# Consume — now it all runs
print("Notable Amsterdam sightings:")
for sighting in notable:
    print(f"  {sighting.species}: {sighting.count} birds")

Notable Amsterdam sightings:
  Coot: 12 birds
Skipping malformed record {'species': 'Kingfisher', ...}: invalid literal for int()...
  Swan: 6 birds

Three generators composed into a pipeline. The malformed record is skipped gracefully (thanks to the error handling from the previous post). Because the pipeline is lazy, the skip message appears only when the loop actually reaches that record, between Coot and Swan, rather than up front. Only the records that pass all filters are printed. And none of it builds an intermediate list.
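The same shape carries over to CSV parsing, since csv.DictReader from the standard library is itself lazy and produces one dict per row. The sketch below is a simplified stand-in for the pipeline above: it uses io.StringIO in place of a real file, made-up sample rows, and a hypothetical read_rows helper.

```python
import csv
import io

# Stand-in for an open file; with a real file you would pass
# the object returned by open(path, newline="") instead.
csv_text = io.StringIO(
    "species,location,count,date\n"
    "Coot,Amsterdam,12,2026-05-10\n"
    "Pigeon,Den Haag,7,2026-05-10\n"
    "Swan,Amsterdam,6,2026-05-10\n"
)


def read_rows(file_obj):
    """Yield one dict per CSV row, reading lazily from the file object."""
    for row in csv.DictReader(file_obj):
        yield row


rows = read_rows(csv_text)
amsterdam_rows = (r for r in rows if r["location"] == "Amsterdam")
notable_rows = [
    (r["species"], int(r["count"]))
    for r in amsterdam_rows
    if int(r["count"]) >= 5
]

print(notable_rows)  # [('Coot', 12), ('Swan', 6)]
```

Each row is a dict of strings, the same shape as the raw records above, so a reader like this could feed read_sightings directly.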

Conclusion

Generators are, to me, one of the most elegant things about Python. The yield keyword looks small but the idea behind it — pause here, remember where I was, resume when asked — is powerful and applicable to a surprisingly large range of problems.

With this post the series is complete. Starting from basic syntax and data structures, through functions, error handling, and the standard library, and now to generators — you have the building blocks of real Python programs. The OOP post showed how to organise that knowledge into classes. The rest is practice and projects.

The next step? Pick something you want to build and build it. A bird tracking tool, a blog pipeline, a data processor. The learning that happens in a real project is different in kind from anything you get from a tutorial — it is the kind that sticks.

Good luck, and as always — let me know what you think!

Python Generators and Iterators: Lazy Birds

📚 This post is part of the "Python Basics" series