Generators as semi-coroutines

What's New in Python 2.2

Generators are a very interesting feature of Python 2.2; they are essentially semi-coroutines.

Generators are another new feature, one that interacts with the introduction of iterators.

You're doubtless familiar with how function calls work in Python or C. When you call a function, it gets a private namespace where its local variables are created. When the function reaches a return statement, the local variables are destroyed and the resulting value is returned to the caller. A later call to the same function will get a fresh new set of local variables. But, what if the local variables weren't thrown away on exiting a function? What if you could later resume the function where it left off? This is what generators provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

def generate_ints(N):
    for i in range(N):
        yield i

A new keyword, yield, was introduced for generators. Any function containing a yield statement is a generator function; Python's bytecode compiler detects this and compiles the function specially as a result. Because a new keyword was introduced, generators must be explicitly enabled in a module by including a from __future__ import generators statement near the top of the module's source code. In Python 2.3 this statement will become unnecessary.

When you call a generator function, it doesn't return a single value; instead it returns a generator object that supports the iterator protocol. On executing the yield statement, the generator outputs the value of i, similar to a return statement. The big difference between yield and a return statement is that on reaching a yield the generator's state of execution is suspended and local variables are preserved. On the next call to the generator's .next() method, the function will resume executing immediately after the yield statement. (For complicated reasons, the yield statement isn't allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the interaction between yield and exceptions.)

Here's a sample usage of the generate_ints generator:

>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration

You could equally write for i in generate_ints(5), or a,b,c = generate_ints(3).
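Both forms work because the for statement and tuple unpacking drive the iterator protocol themselves. A minimal sketch, with generate_ints repeated so the block is self-contained; it is written to run on current Python, where print is a function:

```python
def generate_ints(N):
    for i in range(N):
        yield i

# A for loop asks the generator for its next value on each pass
# and stops cleanly when StopIteration is raised.
collected = []
for i in generate_ints(5):
    collected.append(i)
print(collected)            # [0, 1, 2, 3, 4]

# Tuple unpacking consumes the generator the same way; the number
# of targets must match the number of values produced.
a, b, c = generate_ints(3)
print(a, b, c)              # 0 1 2
```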

Inside a generator function, the return statement can only be used without a value, and signals the end of the procession of values; afterwards the generator cannot return any further values. return with a value, such as return 5, is a syntax error inside a generator function. The end of the generator's results can also be indicated by raising StopIteration manually, or by just letting the flow of execution fall off the bottom of the function.
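As an illustration of ending a generator early, here is a small sketch. The function name take_until_negative is hypothetical. Only the bare return and the fall-off-the-end forms are shown, because later Python versions (3.7 onward, per PEP 479) turn a StopIteration raised inside a generator body into a RuntimeError.

```python
def take_until_negative(values):
    # Hypothetical helper: yield items until the first negative number.
    for v in values:
        if v < 0:
            return          # bare return: no further values are produced
        yield v
    # Falling off the end here finishes the generator as well.

result = list(take_until_negative([3, 1, 4, -1, 5]))
print(result)   # [3, 1, 4]
```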

You could achieve the effect of generators manually by writing your own class and storing all the local variables of the generator as instance variables. For example, returning a list of integers could be done by setting self.count to 0, and having the next() method increment self.count and return it. However, for a moderately complicated generator, writing a corresponding class would be much messier. Lib/test/test_generators.py contains a number of more interesting examples. The simplest one implements an in-order traversal of a tree using generators recursively.
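The hand-written class described above might look roughly like the following. This is a sketch, not code from the Python distribution; the class name IntCounter is made up for illustration. Python 2.2 spells the method next(), current Python spells it __next__(), so the sketch provides both.

```python
class IntCounter:
    """Hand-written counterpart of generate_ints(N) (hypothetical name)."""

    def __init__(self, N):
        self.N = N
        self.count = 0      # the generator's local i becomes instance state

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count += 1
        return value

    next = __next__         # keep the Python 2 spelling available too

print(list(IntCounter(3)))  # [0, 1, 2]
```

Even for this trivial case the class is several times longer than the three-line generator, and the bookkeeping grows quickly once there is more than one loop variable to save. The in-order traversal from Lib/test/test_generators.py mentioned above reads: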

# A recursive generator that yields the labels of all Tree nodes in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
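To try the traversal you need some node type with label, left, and right attributes. The Tree class below is a hypothetical minimal stand-in, not the one used in test_generators.py, and inorder is repeated so the block runs on its own:

```python
class Tree:
    # Hypothetical minimal node type with the three attributes inorder() uses.
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

#       D
#      / \
#     B   E
#    / \
#   A   C
root = Tree("D", Tree("B", Tree("A"), Tree("C")), Tree("E"))
labels = list(inorder(root))
print(labels)   # ['A', 'B', 'C', 'D', 'E']
```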

Two other examples in Lib/test/test_generators.py produce solutions for the N-Queens problem (placing N queens on an NxN chess board so that no queen threatens another) and the Knight's Tour (a route that takes a knight to every square of an NxN chessboard without visiting any square twice).

The idea of generators comes from other programming languages, especially Icon (http://www.cs.arizona.edu/icon/), where the idea of generators is central. In Icon, every expression and function call behaves like a generator. One example from "An Overview of the Icon Programming Language" at http://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks like:

sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)

In Icon the find() function returns the indexes at which the substring "or" is found: 3, 23, 33. In the if statement, i is first assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon retries it with the second value of 23. 23 is greater than 5, so the comparison now succeeds, and the code prints the value 23 to the screen.
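For comparison, a rough Python analogue of the Icon snippet can be written with a generator. The helper find_all is hypothetical; it yields 1-based positions so the numbers match Icon's. Python has no implicit retry, so the backtracking that Icon performs automatically becomes an explicit loop.

```python
def find_all(sub, s):
    # Hypothetical helper: yield every 1-based position of sub in s,
    # mirroring the positions Icon's find() produces.
    start = s.find(sub)
    while start != -1:
        yield start + 1
        start = s.find(sub, start + 1)

sentence = "Store it in the neighboring harbor"
print(list(find_all("or", sentence)))   # [3, 23, 33]

# Try each candidate in turn until the comparison succeeds.
for i in find_all("or", sentence):
    if i > 5:
        print(i)                        # 23
        break
```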

Python doesn't go nearly as far as Icon in adopting generators as a central concept. Generators are considered a new part of the core Python language, but learning or using them isn't compulsory; if they don't solve any problems that you have, feel free to ignore them. One novel feature of Python's interface as compared to Icon's is that a generator's state is represented as a concrete object (the iterator) that can be passed around to other functions or stored in a data structure.
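The point about the generator's state being an ordinary object can be seen in a short sketch. The helper take and the registry dictionary are hypothetical names; next(iterator) is the current spelling of the iterator.next() call used in Python 2.2.

```python
def generate_ints(N):
    for i in range(N):
        yield i

def take(n, iterator):
    # Hypothetical helper: pull at most n values from any iterator.
    result = []
    for _ in range(n):
        try:
            result.append(next(iterator))
        except StopIteration:
            break
    return result

gen = generate_ints(10)
first = take(3, gen)            # another function advances the generator
registry = {"pending": gen}     # the suspended state sits in a data structure
rest = take(2, registry["pending"])
print(first, rest)              # [0, 1, 2] [3, 4]
```

The second call picks up exactly where the first one stopped, because both refer to the same generator object.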

See Also:

PEP 255, Simple Generators
Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.

[Dec 20, 2001] Semi-coroutines in Python 2.2

From: Steven Majewski (sdm7g@Virginia.EDU)
Subject: Re: (semi) stackless python
Newsgroups: comp.lang.python

Date: 2001-12-20 13:01:46 PST

 The new generators in Python2.2 implement semi-coroutines, not
full coroutines:  the limitation is that they always return to
their caller -- they can't take an arbitrary continuation as
a return target.

 So you can't do everything you can do in Stackless, or at least,
you can't do it the same way. I'm not sure what the limitations
are yet, but you might be surprised with what you CAN do.
 If all the generator objects yield back to the same 'driver'
procedure, then you can do a sort of cooperative multithreading.
In that case, you're executing the generators for their side
effects -- the value returned by yield may be unimportant except
perhaps as a status code.
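One way such a driver could look is sketched below. It is not code from this post; the names worker and run_round_robin are invented for illustration, and it is written for current Python (next(task) rather than task.next()).

```python
from collections import deque

def worker(name, steps, log):
    # Hypothetical task: does one unit of "work" per resumption and
    # yields a status string that the driver is free to ignore.
    for step in range(steps):
        log.append((name, step))    # the side effect is the real work
        yield "running"

def run_round_robin(tasks):
    # Driver: every generator yields back here, so control alternates
    # between tasks without any operating-system threads.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue                # task finished; drop it
        queue.append(task)          # still alive; reschedule at the back

log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)
# [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('b', 2)]
```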

[ See also Tim Peters' post on the performance advantages of
  generators -- you only parse args once for many generator
  calls, and you keep and reuse the same stack frame. So I
  believe you get some of the same benefits of Stackless
  microthreads. ]

Maybe I can find the time to post an example of what I mean.
In the meantime, some previous posts on generators might give
you some ideas.

<http://groups.google.com/groups?q=generators+group:comp.lang.python+author:Majewski&hl=en&scoring=d&as_drrb=b&as_mind=12&as_minm=1&as_miny=2001&as_maxd=20&as_maxm=12&as_maxy=2001&rnum=1&selm=mailman.1008090197.6318.python-list%40python.org>

<http://groups.google.com/groups?hl=en&threadm=mailman.996870501.28562.python-list%40python.org&rnum=6&prev=/groups%3Fas_q%3Dgenerators%26as_ugroup%3Dcomp.lang.python%26as_uauthors%3DMajewski%26as_drrb%3Db%26as_mind%3D12%26as_minm%3D1%26as_miny%3D2001%26as_maxd%3D20%26as_maxm%3D12%26as_maxy%3D2001%26num%3D100%26as_scoring%3Dd%26hl%3Den>

The ungrammatical title of that last thread above:
"Nested generators is the Python equivalent of unix pipe cmds."
might also suggest the solution I'm thinking of.


-- Steve Majewski
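
The pipe analogy in that thread title can be illustrated with a short sketch. It is not taken from the linked posts; the stage names read_lines, grep, and upper are hypothetical. Each generator consumes the one nested inside it, and values flow through one at a time, much as lines flow through a shell pipeline.

```python
def read_lines(text):
    # Source stage, playing the role of `cat`.
    for line in text.splitlines():
        yield line

def grep(pattern, lines):
    # Filter stage: pass through only lines containing pattern.
    for line in lines:
        if pattern in line:
            yield line

def upper(lines):
    # Transform stage, roughly `tr a-z A-Z`.
    for line in lines:
        yield line.upper()

text = "spam\neggs and spam\nham\n"
pipeline = upper(grep("spam", read_lines(text)))
result = list(pipeline)
print(result)   # ['SPAM', 'EGGS AND SPAM']
```

Nothing runs until the outermost generator is consumed, so no stage builds an intermediate list.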