— python, object-oriented programming, iterators, generators — 11 min read
This article explains the structure underlying iteration in Python. It starts by looking at the iterator protocol and implements iterator objects using object-oriented programming (OOP). Afterwards, it introduces generator functions and the yield statement, as well as generator expressions, and shows how these simplify the way iterators are created. Additionally, we'll touch on some newer topics like returning values from generators using the return statement. Finally, it gives a simple primer on how to use generators in OOP.
All examples in this article require Python 3.3 or newer, with the exception of a few examples that use the walrus operator (:=), which is only available in Python 3.8 and newer.
Because of its dynamic nature, Python doesn't impose strong interfaces. However, there are protocols and the so-called dunder methods, methods whose names start and end with a double underscore, that are responsible for the way objects behave.
The __iter__ (and __getitem__) dunder method is fundamental for iteration because any object with this method is able to interact with the iter built-in function. Objects that have an __iter__ method are also called iterable objects (or iterables for short). The iter function calls the __iter__ method of an object and creates an iteration procedure that Python follows in order to iterate over the object.
Here's a practical example: say we have a tuple t = (0, 1, 2,) and we'd like to iterate over its elements. We'd expect that in the first iteration Python finds 0, in the second iteration 1, and in the third iteration 2. After iterating over the last element we'd expect Python to stop the iteration process. The logic behind this iteration is prepared by the t.__iter__ method, and the object it returns is called an iterator object (or iterator for short).
Here's the code for our previous example:
>>> t = (0, 1, 2,)
>>> iterator_t = iter(t)
The iter function internally called the t.__iter__ dunder method and returned an iterator, which was assigned to iterator_t. Once the iterator is created, Python knows how to iterate over the iterable t = (0, 1, 2,), as we'll see.
Built-in types such as lists, tuples, dictionaries, sets, and others are iterable objects and have the __iter__ (or __getitem__) dunder method. Therefore, there's no need to implement these dunder methods for the built-in types because Python already knows how to do this. On the other hand, for user-defined types (classes) it's important to implement an __iter__ method which returns an iterator in order to iterate over them (or use them in a for loop).
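To make this concrete, here's a minimal sketch (the Playlist class and its contents are invented for this illustration) of a user-defined type whose __iter__ method simply delegates to the iterator of an internal list:

class Playlist:
    def __init__(self, *songs):
        self._songs = list(songs)

    def __iter__(self):
        # Delegate to the list's own iterator; Playlist instances
        # can now be used anywhere Python expects an iterable.
        return iter(self._songs)

for song in Playlist("first song", "second song"):
    print(song)

The final section of this article returns to this pattern with a fuller example.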
Next we'll learn about the structure of iterators and how to write them using OOP.
In order for any object to be an iterator it has to support the iterator protocol. There are three rules which every iterator must satisfy:
1. it must keep a state which tells what the current item in the iteration process is;
2. it has to modify that state inside the __next__ dunder method, advancing it to the next element in the iteration process and returning that element, or raising the StopIteration exception in case there are no more elements;
3. it must have an __iter__ method which returns the same iterator.
It's easy to show that the tuple t = (0, 1, 2,) doesn't respect the third rule of the iterator protocol whereas iterator_t does:
>>> t = (0, 1, 2,)
>>> t is (iterator_t := iter(t))
False
>>> iterator_t is iter(iterator_t)
True
Because the iter built-in function simply calls the t.__iter__ method, the previous code block is semantically equivalent to:
>>> t = (0, 1, 2,)
>>> t is (iterator_t := t.__iter__())
False
>>> iterator_t is iterator_t.__iter__()
True
Additionally, we can show that our tuple doesn't have a __next__ method while its iterator does:
>>> hasattr(t, "__next__")
False
>>> hasattr(iterator_t, "__next__")
True
The same way the iter function calls the __iter__ dunder method, the next built-in function calls the __next__ dunder method. The __next__ method does all of the heavy lifting because it changes the state of an iterator and returns the next element from it. Let's use it on our iterator to change its state and return values:
>>> next(iterator_t)
0
>>> next(iterator_t)
1
>>> next(iterator_t)
2
>>> next(iterator_t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Notice that the only way to interact with an iterator is to make it produce a new value using the next function. Because of their simplicity, iterators are very fast and memory efficient, and therefore they are used in many places in Python where performance is important. The article Revisiting the Mechanism Behind the for Statement explains how iterators are used for iteration inside the for loop.
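As a rough sketch (the real implementation lives inside the interpreter, but the protocol calls are the same), a for loop can be modeled with iter, next, and a while loop:

t = (0, 1, 2,)

# Roughly what "for element in t: print(element)" does under the hood:
iterator = iter(t)                # calls t.__iter__()
while True:
    try:
        element = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break                     # the exception silently ends the loop
    print(element)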
Next, we'll implement the iterator protocol inside a class which we'll use to create iterator objects. There, we'll show how iterators work internally.
According to the iterator protocol, we'll create a class called Countdown. This class will be used to instantiate iterators that count backwards from a starting number to zero in steps of one. You can think of this class as an iterator factory that is used for creating new iterators.
Let's define our new class and its behaviour:
class Countdown:
    def __init__(self, start):
        self._count = start

    def __iter__(self):
        return self

    def __next__(self):
        if self._count > 0:
            self._count, count = self._count - 1, self._count
            return count
        else:
            raise StopIteration
The internal state of the class is kept inside self._count. This variable updates only when the __next__ method is called by the next function. The leading underscore in the name indicates that this is a private variable.
The Countdown class has three key features:
1. it keeps the state inside self._count,
2. it modifies the state only inside __next__; more specifically, it decrements self._count every time the next function is used, and raises the StopIteration exception when self._count reaches zero to signal that iteration has ended,
3. it returns the same object from the __iter__ method.
Notice that the __iter__ method of an iterator object returns that same object (the iterator itself). Therefore, iterators are iterable objects, but the opposite is not true.
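A quick sketch illustrates the asymmetry: calling iter on a plain iterable hands out a fresh, independent iterator each time, while calling it on an iterator returns the same object:

t = (0, 1, 2,)
it1, it2 = iter(t), iter(t)  # two independent iterators over one iterable

print(next(it1), next(it1))  # 0 1
print(next(it2))             # 0 -- it2 keeps its own separate state
print(iter(it1) is it1)      # True -- an iterator's __iter__ returns itself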
Let's create an instance of the Countdown class and test it:
>>> new_year = Countdown(3)
>>> new_year is iter(new_year)
True
>>> next(new_year)
3
>>> next(new_year)
2
>>> next(new_year)
1
>>> next(new_year)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
This test shows that an instance of the Countdown class fulfills the iterator protocol. Namely, it keeps the state and changes it each time Python asks for the next element with the next function. Also, calling the iter function with the iterator as an argument returns the same iterator.
It's important that an iterator raises the StopIteration exception when it doesn't have more values to return. Iterators that have raised the StopIteration exception are called exhausted iterators. The previous example has shown that iterators instantiated with the Countdown class became exhausted when their internal state (self._count) reached zero. An exhausted iterator cannot produce values. Repopulating an exhausted iterator is the same as creating a new instance.
Interestingly, there exist infinite iterators. These iterators don't have a condition that raises the StopIteration exception. Let's write a class that is used for creating infinite iterators which echo whatever is passed to them:
class Echo:
    def __init__(self, message):
        self._message = message

    def __iter__(self):
        return self

    def __next__(self):
        return self._message
Amazingly, the code inside the Echo.__next__ method is simpler than the one in the Countdown.__next__ method. To create a class that instantiates infinite iterators it's sufficient to remove the condition for raising the StopIteration exception.
Let's also make sure that it works:
>>> troll_kid = Echo("Why?")
>>> next(troll_kid)
'Why?'
>>> next(troll_kid)
'Why?'
>>> next(troll_kid)
'Why?'
We've used Echo to create an infinite iterator that represents a kid who always asks the question "Why?" and trolls people that way. This iterator never gets exhausted because it always knows how to create a new value when the next function requests it. Plugging this iterator into the for statement would really troll Python because it would go into an infinite loop waiting for a StopIteration exception that never comes.
Python's Standard Library has a module called itertools that defines a number of functions used for creating infinite iterators. The Echo class reimplements a simpler version of the itertools.repeat function, which creates an infinite iterator that repeats any object given to it.
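Here's a brief sketch of itertools.repeat in action, together with itertools.islice, which takes a finite slice of an iterator and therefore keeps the infinite repeat from hanging the list call:

from itertools import islice, repeat

troll_kid = repeat("Why?")  # infinite iterator, like Echo("Why?")
print(next(troll_kid))      # 'Why?'
print(next(troll_kid))      # 'Why?'

# islice bounds the number of values consumed, so list() terminates:
print(list(islice(repeat("Why?"), 3)))  # ['Why?', 'Why?', 'Why?']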
You've probably realized that iterator objects have two states: suspended and activated. When they're suspended they keep the internal state unchanged, and as soon as the next function calls them they activate and produce a value. After the activation they suspend again. The iterator protocol allows iterators to produce values only when another part of the code requires that; otherwise they stay suspended. This kind of execution is referred to as lazy evaluation (or call-by-need)1.
Lazy evaluation is heavily used for code optimization. Instead of creating memory-demanding data structures (such as lists, tuples, or ranges) and passing them around, lazy evaluation allows us to pass data as a sequence of values from one place to another without keeping it in memory after the computation is completed.
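A minimal sketch of the difference (the exact byte counts vary between Python versions, so treat the numbers in the comments as illustrative):

import sys

eager = [i * i for i in range(1000000)]  # a million values held in memory
lazy = (i * i for i in range(1000000))   # values produced one at a time

print(sys.getsizeof(eager))  # several megabytes
print(sys.getsizeof(lazy))   # on the order of a hundred bytes, regardless of length

print(sum(lazy))  # consumes the generator without materializing a list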
It turns out that implementing the iterator protocol in a class is a lot of work. This is where generator objects (or generators for short) come in! Instead of creating iterators with a class, we'll see that it is much easier to create iterators using a special function that has the yield statement in its body. These special functions are called generator functions and are able to create iterator objects called generators.
The yield statement, just like the return statement, is bound to the body of a function and cannot be used outside of it:
>>> yield 42
  File "<stdin>", line 1
SyntaxError: 'yield' outside function
Functions that have the yield statement inside the function body are called generator functions. These are different from regular functions without the yield statement because they create generator objects. You can think of generator functions as generator factories that instantiate generators.
Let's write a regular function and a generator function and call them to see the difference:
>>> def f():
...     return 42
...
>>> f()
42
>>> def generator_factory():
...     yield 42
...
>>> generator_factory()
<generator object generator_factory at 0x7f49eb1f8f90>
The regular function returns the value which follows the return statement in the function definition, while calling the generator function creates a generator object.
Like iterators, generators implement the iterator protocol. We'll create a generator by calling the generator function and show that generators pass all the tests like iterators:
>>> g = generator_factory()  # create a generator
>>> hasattr(g, "__iter__")
True
>>> g is iter(g)
True
>>> hasattr(g, "__next__")
True
>>> next(g)
42
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
These are similar to the tests used to show that iterators support the iterator protocol.
A generator instance with __iter__ and __next__ methods is created by calling a generator function. The iter function called on a generator returns the same generator instance. Additionally, calling the next function activates the generator and executes the body of the generator function until the next yield statement is reached, producing the value which follows the statement. Once it produces a value, the generator suspends and waits for the next activation with the next function. When there are no more yield statements in the function body, the generator gets exhausted and raises the StopIteration exception.
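The inspect module can report these phases directly. Here's a small sketch using inspect.getgeneratorstate with the generator_factory function from above:

from inspect import getgeneratorstate

def generator_factory():
    yield 42

g = generator_factory()
print(getgeneratorstate(g))  # GEN_CREATED -- not activated yet

next(g)                      # runs the body up to the yield statement
print(getgeneratorstate(g))  # GEN_SUSPENDED -- waiting at the yield

try:
    next(g)                  # no more yields, so StopIteration is raised
except StopIteration:
    pass
print(getgeneratorstate(g))  # GEN_CLOSED -- the generator is exhausted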
The previous section shows that generator functions provide a powerful framework for creating iterator objects called generators. Therefore, we'll rewrite the previous examples, iterator_t, Countdown, and Echo, using generator functions.
iterator_t instance:
Let's write a generator function which creates generators that are similar to iterator_t:
def gen_iterator_t():
    yield 0
    yield 1
    yield 2
    return None
Generator functions are usually written without the return None statement because in Python any function that doesn't have a return statement returns None by default. Python versions 3.3 and newer support the use of both return and yield statements inside the body of a generator function.
Let's call gen_iterator_t to create a generator instance and, since generators implement the iterator protocol, use the next function to change the generator state and produce values:
>>> iterator_t = gen_iterator_t()
>>> next(iterator_t)
0
>>> next(iterator_t)
1
>>> next(iterator_t)
2
>>> next(iterator_t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
The next function activates the generator and advances the code inside the generator function until the next yield statement. When a yield statement is reached, the generator produces a value and suspends. It reactivates when it gets called with the next function. When there are no more yield statements, or when the return statement inside the generator function is reached, the generator raises the StopIteration exception, at which point it behaves like an exhausted iterator.
Countdown class:
The following generator function is able to substitute the Countdown class:
def gen_countdown(start):
    count = start
    while count > 0:
        yield count
        count -= 1
    return None
Generator functions, like regular functions, accept arguments. Our gen_countdown is simpler to read and easier to maintain compared to the Countdown class, which does the same job. Notice also that the yield statement is used inside a loop; this is a very common pattern for writing generator functions.
Let's show that generators created with gen_countdown behave like iterators instantiated with the Countdown class:
>>> new_year = gen_countdown(3)
>>> new_year is iter(new_year)
True
>>> next(new_year)
3
>>> next(new_year)
2
>>> next(new_year)
1
>>> next(new_year)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
The generator instance new_year behaves exactly like the iterator instance new_year. Therefore, in Python generators and iterators are most often considered synonyms.
Echo class:
Using an infinite while loop it's possible to write a generator function whose yield statement executes an unlimited number of times. This pattern supports the construction of infinite generators.
The following generator function creates generators similar to the iterators created with the Echo class:
def gen_echo(message):
    while True:
        yield message
Notice that we've intentionally left out the return None. This is probably for the better because this generator function can never reach the end of its body, so it's not able to return anything.
Let's create an infinite generator:
>>> troll_kid = gen_echo("Why?")
>>> next(troll_kid)
'Why?'
>>> next(troll_kid)
'Why?'
>>> next(troll_kid)
'Why?'
This generator cannot get exhausted because the yield statement is stuck inside the infinite loop. Next, we'll talk about retrieving return values from generators once they're exhausted.
PEP 380 was implemented in Python 3.3, and ever since then generators have been able to return a value once they're exhausted. The return value of a generator is passed inside the StopIteration exception, in an attribute called value.
Take a look at the following generator function:
from collections import namedtuple

Stat = namedtuple("Stat", ["total", "positive", "negative"])

def filter_from(words, f):
    c = 0
    for total, w in enumerate(words, start=1):
        if f(w):
            c += 1
            yield w
    return Stat(total, c, total - c)
This generator function filters elements from words according to the rule given as the second argument. The second argument is a function which takes a string and returns a boolean. While the generator is active it counts the total number of elements and the ones that have passed through the filter. Once the generator is exhausted it returns a Stat namedtuple with statistics about the total and the filtered number of items.
Let's create a generator that searches for palindromes (symmetric words):
>>> p = filter_from(
...     words := "Boeing 737 landed at noon in Cairo.".split(),
...     lambda x: x == x[::-1]
... )
The first argument is the list of words from the sentence and the second argument is a lambda function that checks whether a word is a palindrome. Additionally, using the walrus operator (:=) we've assigned the list of words from the sentence to the variable words so that it's easy to recreate the same generator once it's exhausted.
Let's filter out palindromes with our generator and iterate over them with the next function until the generator gets exhausted:
>>> next(p)
'737'
>>> next(p)
'noon'
>>> next(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: Stat(total=7, positive=2, negative=5)
Notice that when the generator gets exhausted, the traceback message indicates that the StopIteration exception object carries the return value from the generator. The StopIteration object must be caught with a try/except statement before the return value of the generator can be retrieved from the attribute called value.
Here's an example of a while loop that captures the StopIteration exception, retrieves the return value of the generator, and breaks out of the loop:
>>> p = filter_from(words, lambda x: x == x[::-1])
>>> while True:
...     try:
...         v = next(p)
...     except StopIteration as exception:
...         print(return_value := exception.value)
...         break
...     else:
...         print(v)
...
737
noon
Stat(total=7, positive=2, negative=5)
Returning values from generator functions is more practical when working in the context of coroutines, although sometimes returning a value from a generator is useful on its own because it allows the programmer to reflect on the generated elements and make further decisions.
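As a side note, PEP 380 also introduced the yield from expression, which delegates to an inner generator and evaluates to its return value, so there's no need to catch StopIteration by hand. A minimal sketch that reuses filter_from from above (the report function is invented for this example):

def report(words, f):
    # yield from re-yields every filtered word, then evaluates to the
    # Stat namedtuple that filter_from returns once it's exhausted.
    stat = yield from filter_from(words, f)
    print("summary:", stat)

r = report("Boeing 737 landed at noon in Cairo.".split(),
           lambda x: x == x[::-1])
print(list(r))  # prints the summary first, then ['737', 'noon']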
Generator expressions further shorten the process of creating generators by creating anonymous generators directly, without the need for calling generator functions. The syntax for generator expressions is similar to list comprehensions, with the difference that generator expressions are created inside parentheses "()" while list comprehensions are created inside square brackets "[]".
Let's create a generator instance similar to the one we've created using gen_countdown and show that the created generator supports the iterator protocol:
>>> new_year = (i for i in reversed(range(1, 4)))
>>> new_year is iter(new_year)
True
>>> next(new_year)
3
>>> next(new_year)
2
>>> next(new_year)
1
>>> next(new_year)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
With this generator expression we've iterated over every element of reversed(range(1, 4)) and yielded it. Just like any other iterator object, generator expressions can be used inside the for statement:
>>> for count in (i for i in reversed(range(1, 4))):
...     print(count)
...
3
2
1
Notice that generator expressions are anonymous just like lambda functions. Although many Pythonistas dread lambda functions, they seem to be happy using generator expressions. However, when there is a need to reuse a generator in different places it's almost always better to write a separate generator function.
The flexibility of generator expressions makes it easy to combine them with function arguments. Similar to list comprehensions, we can create lists by combining generator expressions and the list function:
>>> list(i for i in reversed(range(1, 4)))
[3, 2, 1]
Additionally, generator expressions support an if clause which allows them to behave like the filter built-in function. Let's filter out even numbers from the previous example and create a list with only odd numbers:
>>> list(i for i in reversed(range(1, 4)) if i % 2 != 0)
[3, 1]
As an exercise, rewrite the troll_kid generator using a generator expression (hint: use itertools.count or itertools.repeat)2.
More information and better examples of generator expressions are found in PEP 289 which was written by the ever-amusing Python core developer Raymond Hettinger (check out his amazing YouTube videos).
It was previously mentioned that the __iter__ method must always return an iterator. Because generator functions return generators (which are iterators), they are suitable for implementing the __iter__ method. This is the preferred way of creating user-defined classes for iterable objects.
Let's create a container class PicnicBasket which has an __iter__ method. This means that instances of this class are going to be iterable objects:
class PicnicBasket:
    def __init__(self, *content):
        self._content = content

    def __iter__(self):
        for item in self._content:
            yield item
The __iter__ method is implemented as a generator function. Notice that the __iter__ method doesn't contain a return statement because the generator it creates is expected to return None once it's exhausted.
Using a generator expression it's possible to rewrite the PicnicBasket.__iter__ method:
class PicnicBasket:
    def __init__(self, *content):
        self._content = content

    def __iter__(self):
        return (item for item in self._content)
This example clearly shows that the __iter__ method returns a generator. PicnicBasket instances are iterable because they have an __iter__ method which returns an iterator. We'll create an instance of PicnicBasket and then put a few items in it for an afternoon picnic:
>>> wood_basket = PicnicBasket("blanket", "lemonade", "sandwiches",)
>>> for item in wood_basket:
...     print(item)
...
blanket
lemonade
sandwiches
Now our wood_basket is an iterable object (iterable but not an iterator) and therefore it can be used inside the for statement.
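One practical consequence worth sketching: because every for loop (or list call) asks wood_basket for a fresh generator via iter, the basket can be iterated many times, while a single iterator is exhausted after one pass:

wood_basket = PicnicBasket("blanket", "lemonade", "sandwiches")

print(list(wood_basket))  # ['blanket', 'lemonade', 'sandwiches']
print(list(wood_basket))  # same again -- each call gets a fresh generator

single_pass = iter(wood_basket)
print(list(single_pass))  # ['blanket', 'lemonade', 'sandwiches']
print(list(single_pass))  # [] -- this particular iterator is exhausted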
In this article we've distinguished between iterables, objects with __iter__ or __getitem__ dunder methods, and iterators, which are objects that implement the iterator protocol. In the context of object-oriented programming we've implemented classes that create iterators and talked about the concept of lazy evaluation.
Generator functions provide a simple framework for creating iterators called generators. Instead of implementing the iterator protocol inside a class, it's easier and more readable to control the state of a generator with the yield statement. Additionally, we're able to create anonymous generators using generator expressions.
The last OOP example is only one of many where generators are handy. Now that you have the right tools and knowledge about iterables, iterators, and generators, try to discover others!
If you're looking for other examples where iterators are used in Python, I'd suggest reading Revisiting the Mechanism Behind the for Statement. That article explains how the for statement relies on the iterator protocol for iterating over iterable objects.