— python, iterators, functional programming — 6 min read
This article provides alternative ways to implement the `count`, `cycle` and `repeat` generator functions that are part of the `itertools` module in the Python Standard Library (PSL). The proposed solutions are based on the principle of functional recursion and are similar to design patterns used in functional programming. The main message of the article is, however, not to propose potential improvements to the PSL but to challenge programmers and open them up to a new way of thinking about iterators. It also goes to show that in the context of iterators the idea of functional recursion comes naturally with the use of the `yield from` statement. Hence, it's very unfortunate that Python doesn't have better support for recursion.
Generator functions are functions that contain `yield` or `yield from` statements inside their function body. Unlike regular functions, generator functions return generator objects when they are called. If you feel that you would like to brush up on your knowledge about generators, take a look at my other article called Going Over Iterators and Generators.
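As a quick refresher, here's a minimal sketch of a generator function (the name `firstn` is just an illustration and isn't used elsewhere in this article):

```python
def firstn(n):
    # Calling firstn(n) doesn't execute this body; it returns a generator
    # object that produces the values 0, 1, ..., n-1 lazily on demand.
    for i in range(n):
        yield i
```

Only iterating over the returned generator (e.g., `list(firstn(3))`, which produces `[0, 1, 2]`) actually runs the body.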
Recursive generator functions are a subset of regular generator functions. More specifically, a recursive generator function is defined by calling itself inside its own function body. The main obstacle to achieving recursion with generator functions is the fact that they return a generator instance which waits for the `next` built-in function to be called on it in order to produce a value. This behaviour obviously hinders recursion. Therefore, calling a generator function returns a generator that needs to be unpacked in order to establish recursion. We'll see that in Python this can be achieved using the `yield from` statement.
The following subsection introduces the concept of the `yield from` statement which is essential for unpacking iterators (generators). Afterwards, we'll use it to establish recursive iteration in recursive generator functions.
`yield from` Statement: The Unpacking Mechanism

The `yield from` statement was introduced to delegate to subgenerators in the context of coroutines (see PEP 380). However, when it's used in the context of simple generators (non-coroutine objects) its usage becomes simple. Therefore, for the purpose of this article, consider the `yield from` statement as a way to unpack iterable objects (objects with `__iter__` or `__getitem__` dunder methods) until they raise the `StopIteration` exception, at which point the unpacking stops. Consider the following generator function:
```python
def unpack(iterable):
    yield from iterable
```
As long as we make sure not to pass a coroutine as the function argument, the above generator function behaves the same as the following:
```python
def unpack_(iterable):
    for i in iterable:
        yield i
```
Both `unpack` and `unpack_` are generator functions that accept a single argument and return a generator. To show that the functions produce the same results, the following example unpacks a tuple and saves it in a list:
```python
>>> c = (1, 2, 3,)
>>> list(unpack(c))
[1, 2, 3]
>>> list(unpack_(c))
[1, 2, 3]
```
Now let's create an iterator and pass it to the `yield from` statement:
```python
>>> i = iter(c)
>>> list(unpack(i))
[1, 2, 3]
```
This shows that the behaviour of the `yield from` statement is the same for containers (`tuple`, `list`, `range`) and iterators. The `yield from` statement provides a simple framework for unpacking iterable objects until the `StopIteration` exception signals to stop unpacking.
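For instance, a `range` object, which is a container rather than an iterator, unpacks just the same:

```python
>>> list(unpack(range(3)))
[0, 1, 2]
```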
In the following example we'll look at the recursive nature of the Matriyoshka doll and create a recursive generator function that unpacks the doll.
We'll use the `NamedTuple` type to model the doll and specify the size of the doll in a field called `size`. This field reflects the number of dolls that the original holds inside of it. The smallest doll has the `size` field equal to zero (i.e., it doesn't contain other dolls). Here's the model for a matriyoshka doll:
```python
from typing import NamedTuple


class Matriyoshka(NamedTuple):
    size: int
```
For example, `Matriyoshka(size=2)` is an object that models a doll that has two dolls inside it, namely the `Matriyoshka(size=1)` and `Matriyoshka(size=0)` objects. Note that the relationship between the objects is weak in the sense that the original doll only has a larger size but doesn't explicitly enclose the smaller dolls.
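One `NamedTuple` detail worth noting before we continue: the `_make` classmethod builds a new instance from an iterable of field values. The recursive function below uses it to construct the next, smaller doll:

```python
>>> Matriyoshka._make([1])
Matriyoshka(size=1)
>>> Matriyoshka(size=2)._make([1])  # _make is a classmethod, so it works on instances too
Matriyoshka(size=1)
```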
Let's create a recursive generator function which recursively opens each doll until the smallest doll is reached (i.e., the doll that has `size` equal to zero):
```python
from typing import Iterator


def open_matriyoshka(m: Matriyoshka) -> Iterator[Matriyoshka]:
    yield m
    if m.size > 0:
        yield from open_matriyoshka(m._make([m.size - 1]))
```
This function is a recursive generator function because it calls itself inside of the function body. When the generator function is called with one argument it returns a generator object. Generator objects are iterable and therefore interact nicely with the `yield from` statement. The `yield from` statement unpacks the generator which is created by calling `open_matriyoshka` with a doll of decremented size as the function argument. The decrementing ensures that the doll size becomes smaller. The recursive iteration continues until the base case for recursion is reached (i.e., until the algorithm decrements the doll size sufficiently many times to reach the smallest doll with the `size` field equal to zero). Once the base case is reached the recursion stops.
Let's test our recursive generator function and show that it can be used to unpack `Matriyoshka` objects:
```python
>>> next(g := open_matriyoshka(Matriyoshka(size=2)))
Matriyoshka(size=2)
>>> next(g)
Matriyoshka(size=1)
>>> next(g)
Matriyoshka(size=0)
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```
Generator `g` is created by calling `open_matriyoshka` with the argument `Matriyoshka(size=2)`, i.e., an object that represents a doll which has two dolls inside it. Each time the `next` built-in function is called on the generator `g`, the code inside the generator function runs until the next `yield` statement and produces the value which follows it. When the code reaches the `yield from` statement it creates a new generator and continues searching for the next `yield` statement inside it. When there are no more `yield` statements, the original generator `g` is exhausted and the `StopIteration` exception is raised.
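Since this particular generator is finite, we can also drain it in one go instead of calling `next` by hand:

```python
>>> list(open_matriyoshka(Matriyoshka(size=2)))
[Matriyoshka(size=2), Matriyoshka(size=1), Matriyoshka(size=0)]
```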
In the following section we'll use the recursive mechanism to rewrite three functions from the `itertools` module.
`count`, `cycle` and `repeat` Functions

`count`, `cycle` and `repeat` are iterator factories defined in the `itertools` module that are used for creating well-optimized iterators that are both fast and memory efficient. The functions are implemented in the C programming language and live inside the `itertoolsmodule.c` file. For the sake of completeness, every function from the module has an equivalent Python implementation which complements the documentation of the module. Before reimplementing these functions we'll cover the basic use cases and the way they are implemented according to the Python documentation:
`count` creates an infinite counter (`count(10) --> 10 11 12 13 14 ...`):
```python
def count(start=0, step=1):
    n = start
    while True:
        yield n
        n += step
```
`cycle` indefinitely cycles over an iterable object (`cycle("ABC") --> A B C A B C A ...`):
```python
def cycle(iterable):
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element
```
`repeat` repeats an object infinitely or a finite number of times (`repeat(10, 3) --> 10 10 10`):
```python
def repeat(object, times=None):
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object
```
Note that these functions share the same design pattern. They instantiate infinite iterators by dropping into an infinite `while` loop (the `while` loops in the above code examples). This ensures that the iterators always have a `yield` statement that produces a new value and that the iterator is unable to raise the `StopIteration` exception (i.e., to get exhausted). This particular design, using an infinite `while` loop, is an obvious solution that a lot of programmers are comfortable with in the context of imperative programming.
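As a side note, a convenient way to peek at such infinite iterators without looping forever is `itertools.islice` (used here purely for demonstration, it's not part of the reimplementations below):

```python
>>> from itertools import count, islice
>>> list(islice(count(10), 5))  # take just the first five values of the infinite counter
[10, 11, 12, 13, 14]
```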
Next we'll put on our "functional-programming" hats and reimplement these functions in the form of previously introduced recursive generator functions.
`count` Function

The `count` function is probably the simplest one to implement:
```python
from typing import Iterator, Union

Number = Union[int, float]


def recount(x: Number, step: Number = 1) -> Iterator[Number]:
    yield x
    yield from recount(x + step, step)
```
When this generator function is called it creates a generator object. The generator simply yields the first argument that was passed when the generator function was called. Then it creates a new generator with an incremented value. The new generator is unpacked using the `yield from` statement. The code continues to search through the newly created generator for the next `yield` statement. That way new values are produced. Each time a new value is produced, a new generator with an incremented value is created. Here's an example:
```python
>>> r = recount(10, 2)
>>> next(r)
10
>>> next(r)
12
>>> next(r)
14
>>> next(r)
16
```
We'll use the same idea to reimplement the `cycle` and `repeat` functions. First the generator needs to yield a value and then it recursively creates a new generator which gets unpacked.
`cycle` Function

Looking at the `cycle` function you'll notice that there is a quirk. Before `cycle` drops into the infinite loop it saves elements from the iterable into a `list` object. This ensures that the iterable over which the code cycles doesn't get exhausted. This is not needed for container-type objects (e.g., `list`, `tuple`, `range`) but is important for iterators and generators since they get exhausted once they have no more values to produce. To create a recursive generator function we'll use a similar trick and make a copy of the iterator. The idea is to exhaust one iterator copy and use the other as input for the recursive generator function:
```python
from itertools import tee
from typing import Iterable


def recycle(iterable: Iterable) -> Iterable:
    i, i_copy = tee(iterable)
    yield from i_copy
    yield from recycle(i)
```
If the function didn't create a copy it would only work for container-type objects. This implementation, however, is able to cycle indefinitely over any iterable object. We can test this by combining two recursive generator functions:
```python
>>> g = recycle(open_matriyoshka(Matriyoshka(size=2)))
>>> next(g)
Matriyoshka(size=2)
>>> next(g)
Matriyoshka(size=1)
>>> next(g)
Matriyoshka(size=0)
>>> next(g)
Matriyoshka(size=2)
```
On the downside this recursive generator function comes with a performance overhead because it makes a copy of the iterable each time it enters a new cycle.
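If you want to get a rough feel for that overhead, a quick (and admittedly unscientific) comparison with `timeit` might look something like the sketch below; the absolute numbers will of course depend on your machine and Python version:

```python
from itertools import cycle, islice
from timeit import timeit

# Pull 1,000 values out of each implementation, 100 times in a row.
c_version = timeit(lambda: list(islice(cycle(range(10)), 1000)), number=100)
recursive = timeit(lambda: list(islice(recycle(range(10)), 1000)), number=100)
print(f"itertools.cycle: {c_version:.3f}s  recycle: {recursive:.3f}s")
```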
`repeat` Function

The examples of `recount` and `recycle` have shown how to create infinite recursive generators. Reimplementing `repeat` poses a new problem: since `repeat` can create finite iterators, the recursive generator function requires a base case to stop the recursion. Take a look at the following implementation:
```python
from typing import Iterator, TypeVar

T = TypeVar('T')


def rerepeat(x: T, times: int = 0) -> Iterator[T]:
    yield x
    if (new_times := times - 1) != 0:
        yield from rerepeat(x, new_times)
```
Generators created by calling `rerepeat` yield the first argument used during the function call. Afterwards the `times` argument gets decremented. For generators with positive `times`, the decrementing means that the value gets closer to one. When `times` becomes one, the base case is reached: the `if` statement doesn't activate and the generator stops unpacking new generators. However, for generators with negative (or zero) `times`, the decrementing leads to ever smaller numbers which drive the value away from the base case. Consequently, generators with `times` lower than or equal to zero are infinite generators.
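To see the infinite case in action without hanging the interpreter, we can sample it with `itertools.islice`:

```python
>>> from itertools import islice
>>> list(islice(rerepeat("spam", -1), 4))  # negative times never reaches the base case
['spam', 'spam', 'spam', 'spam']
>>> list(islice(rerepeat("spam"), 4))      # the default times=0 is infinite as well
['spam', 'spam', 'spam', 'spam']
```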
Here's a taste of what we can do with the `rerepeat` function:
```python
>>> '-'.join(rerepeat("NANA", 8)) + " BATMAN!"
'NANA-NANA-NANA-NANA-NANA-NANA-NANA-NANA BATMAN!'
```
This works only because the `times` argument is positive and the iterator is finite. However, testing the same example with an infinite iterator would blow up in our face, since `str.join` would try to consume a never-ending iterator while the chain of delegating generators kept growing.
The concept of functional recursion is applicable in the context of iterators and results in the so-called recursive generator functions. We've shown that functional recursion is a natural approach for creating both finite and infinite iterators using the simplest use case of the `yield from` statement. It provides the unpacking mechanism for iterable objects and establishes the flow of recursive iteration. This circumvents the need for loops in generator functions.
We've used the concept of functional recursion to reimplement pure functions from the `itertools` module. These exercises have shown how to set a base case for recursion. While implementing the `recycle` function it was pointed out that container-type objects, unlike iterators, cannot be exhausted. Therefore, careful consideration has to be given to the use case when designing a recursive generator function.
It is well-known that recursion in Python is a crippled tool since it builds on top of the C stack. This is very unfortunate because it limits the application of recursive generator functions. Until the problem with recursion gets solved, recursive generators will only be a mental exercise aimed at helping programmers expand their way of thinking in terms of the tools used in functional programming.