Softpanorama

May the source be with you, but remember the KISS principle ;-)
Home Switchboard Unix Administration Red Hat TCP/IP Networks Neoliberalism Toxic Managers
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Generators

When the body of a function contains one or more occurrences of the keyword yield, the function is known as a generator, or more precisely a generator function. When you call a generator, the function body does not execute. Instead, the generator function returns a special iterator object, known as a generator object (sometimes, somewhat confusingly, also referred to as just “a generator”), that wraps the function body, its local variables (including its parameters), and the current point of execution, which is initially the start of the function.

When you call next on a generator object, the function body executes from the current point up to the next yield, which takes the form:

yield expression

A bare yield without the expression is also legal, and equivalent to yield None. When a yield executes, the function execution is “frozen,” with current point of execution and local variables preserved, and the expression following yield returned as the result of next. When you call next again, execution of the function body resumes where it left off, again up to the next yield. When the function body ends or executes a return statement, the iterator raises a StopIteration exception to indicate that the iteration is finished. In v2, return statements in a generator cannot contain expressions; in v3, the expression after return, if any, is an argument to the resulting StopIteration.

yield is an expression, not a statement. When you call g.send(value) on a generator object g, the value of the yield is value; when you call next(g), the value of the yield is None. We cover this in “Generators as near-coroutines”: it’s the elementary building block to implement coroutines in Python.

A generator function is often a very handy way to build an iterator. Since the most common way to use an iterator is to loop on it with a for statement, you typically call a generator like this (with the call to next implicit in the for statement):

for avariable in somegenerator(arguments):

For example, say that you want a sequence of numbers counting up from 1 to N and then down to 1 again. A generator can help:

def updown(N):
    for x in range(1, N):
        yield x
    for x in range(N, 0, -1):
        yield x
for i in updown(3): print(i)                  # prints: 1 2 3 2 1

Here is a generator that works somewhat like built-in range, but returns an iterator on floating-point values rather than on integers:

def frange(start, stop, stride=1.0):
    while start < stop:
        yield start
        start += stride

This frange example is only somewhat like range because, for simplicity, it makes arguments start and stop mandatory, and assumes step is positive.

Generators are more flexible than functions that return lists. A generator may return an unbounded iterator, meaning one that yields an infinite stream of results (to use only in loops that terminate by other means, e.g., via a break statement). Further, a generator-object iterator performs lazy evaluation: the iterator computes each successive item only when and if needed, just in time, while the equivalent function does all computations in advance and may require large amounts of memory to hold the results list. Therefore, if all you need is the ability to iterate on a computed sequence, it is most often best to compute the sequence in a generator, rather than in a function that returns a list. If the caller needs a list of all the items produced by some bounded generator g(arguments), the caller can simply use the following code to explicitly request that a list be built:

resulting_list = list(g(arguments))

yield from (v3-only)

In v3 only, to improve execution efficiency and clarity when multiple levels of iteration are yielding values, you can use the form yield from expression, where expression is iterable. This yields the values from expression one at a time into the calling environment, avoiding the need to yield repeatedly. The updown generator defined earlier can be simplified in v3:

def updown(N):
    yield from range(1, N)
    yield from range(N, 0, -1)
for i in updown(3): print(i)                  # prints: 1 2 3 2 1

Moreover, the ability to use yield from means that, in v3 only, generators can be used as full-fledged coroutines, as covered in Chapter 18.

Generator expressions

Python offers an even simpler way to code particularly simple generators: generator expressions, commonly known as genexps. The syntax of a genexp is just like that of a list comprehension (as covered in “List comprehensions”), except that a genexp is within parentheses (()) instead of brackets ([]). The semantics of a genexp are the same as those of the corresponding list comprehension, except that a genexp produces an iterator yielding one item at a time, while a list comprehension produces a list of all results in memory (therefore, using a genexp, when appropriate, saves memory). For example, to sum the squares of all single-digit integers, you could code sum([x*x for x in range(10)]); however, you can express this even better, coding it as sum(x*x for x in range(10)) (just the same, but omitting the brackets), and obtain exactly the same result while consuming less memory. Note that the parentheses that indicate the function call also “do double duty” and enclose the genexp (no need for extra parentheses).

Generators as near-coroutines

In all modern Python versions (both v2 and v3), generators are further enhanced, with the possibility of receiving a value (or an exception) back from the caller as each yield executes. This lets generators implement coroutines, as explained in PEP 342. The main change from older versions of Python is that yield is not a statement, but an expression, so it has a value. When a generator resumes via a call to next on it, the corresponding yield’s value is None. To pass a value x into some generator g (so that g receives x as the value of the yield on which it’s suspended), instead of calling next(g), call g.send(x) (g.send(None) is just like next(g)).

Other modern enhancements to generators have to do with exceptions, and we cover them in “Generators and Exceptions”.


Top Visited
Switchboard
Latest
Past week
Past month

Old News ;-)

 

[Apr 3, 2007] Charming Python Python elegance and warts, Part 1

Generators as not-quite-sequences

Over several versions, Python has hugely enhanced its "laziness." For several versions, we have had generators defined with the yield statement in a function body. But along the way we also got the itertools modules to combine and create various types of iterators. We have the iter() built-in function to turn many sequence-like objects into iterators. With Python 2.4, we got generator expressions, and with 2.5 we will get enhanced generators that make writing coroutines easier. Moreover, more and more Python objects have become iterators or iterator-like; for example, what used to require the .xreadlines() method or before that the xreadlines module, is now simply the default behavior of open() to read files.

Similarly, looping through a dict lazily used to require the .iterkeys() method; now it is just the default for key in dct behavior. Functions like xrange() are a bit "special" in being generator-like, but neither quite a real iterator (no .next() method), nor a realized list like range() returns. However, enumerate() returns a true generator, and usually does what you had earlier wanted xrange() for. And itertools.count() is another lazy call that does almost the same thing as xrange(), but as a full-fledged iterator.

Python is strongly moving towards lazily constructing sequence-like objects; and overall this is an excellent direction. Lazy pseudo-sequences both save memory space and speed up operations (especially when dealing with very large sequence-like "things").

The problem is that Python still has a schizoaffective condition when it comes to deciding what the differences and similarities between "hard" sequences and iterators are. The troublesome part of this is that it really violates Python's idea of "duck typing": the ability to use a given object for a purpose just as long as it has the right behaviors, but not necessarily any inheritance or type restriction. The various things that are iterators or iterator-like sometimes act sequence-like, but other times do not; conversely, sequences often act iterator-like, but not always. Outside of those steeped in Python arcana, what does what is not obvious.

Divergences

The main point of similarity is that everything that is sequence- or iterator-like lets you loop over it, whether using a for loop, a list comprehension, or a generator comprehension. Past that, divergences occur. The most important of these differences is that sequences can be indexed, and directly sliced, while iterators cannot. In fact, indexing into a sequence is probably the most common thing you ever do with a sequence -- why on earth does it fall down so badly on iterators? For example:


Listing 9. Sequence-like and iterator-like things
>>> r = range(10)

>>> i = iter(r)

>>> x = xrange(10)

>>> g = itertools.takewhile(lambda n: n<10, itertools.count())

#...etc...

For all of these, you can use for n in thing. In fact, if you "concretize" any of them with list(thing), you wind up with exactly the same result. But if you wish to obtain a specific item -- or a slice of a few items -- you need to start caring about the exact type of thing. For example:


Listing 10. When indexing succeeds and fails
>>> r[4]
4

>>> i[4]
TypeError: unindexable object

With enough contortions, you can get an item for every type of sequence/iterator. One way is to loop until you get there. Another hackish combination might be something like:


Listing 11. Contortions to obtain an index
>>> thing, temp = itertools.tee(thing)

>>> zip(temp, '.'*5)[-1][0]
4

The pre-call to itertools.tee() preserves the original iterator. For a slice, you might use the itertools.islice() function, wrapped up in contortions.


Listing 12. Contortions to obtain a slice
>>> r[4:9:2]
[4, 6, 8]

>>> list(itertools.islice(r,4,9,2))  # works for iterators
[4, 6, 8]

A class wrapper

You might combine these techniques into a class wrapper for convenience, using some magic methods:


Listing 13. Making iterators indexable
>>> class Indexable(object):
...     def __init__(self, it):
...         self.it = it
...     def __getitem__(self, x):
...         self.it, temp = itertools.tee(self.it)
...         if type(x) is slice:
...             return list(itertools.islice(self.it, x.start, x.stop, x.step))
...         else:
...             return zip(temp, range(x+1))[-1][0]
...     def __iter__(self):
...         self.it, temp = itertools.tee(self.it)
...         return temp
...

>>> integers = Indexable(itertools.count())

>>> integers[4]
4
>>> integers[4:9:2]
[4, 6, 8]
      

So with some effort, you can coax an object to behave like both a sequence and an iterator. But this much effort should really not be necessary; indexing and slicing should "just work" whether a concrete sequence or a iterator is involved.

Notice that the Indexable class wrapper is still not as flexible as might be desirable. The main problem is that we create a new copy of the iterator every time. A better approach would be to cache the head of the sequence when we slice it, then use that cached head for future access of elements already examined. Of course, there is a trade-off between memory used and the speed penalty of running through the iterator. Nonetheless, the best thing would be if Python itself would do all of this "behind the scenes" -- the behavior might be fine-tuned somehow by "power users," but average programmers should not have to think about any of this.

In the next installment in this series, I'll discuss accessing methods using attribute syntax.

Recommended Links

Google matched content

Softpanorama Recommended

Top articles

Sites



Etc

Society

Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers :   Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism  : The Iron Law of Oligarchy : Libertarian Philosophy

Quotes

War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda  : SE quotes : Language Design and Programming Quotes : Random IT-related quotesSomerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose BierceBernard Shaw : Mark Twain Quotes

Bulletin:

Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 :  Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method  : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law

History:

Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds  : Larry Wall  : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOSProgramming Languages History : PL/1 : Simula 67 : C : History of GCC developmentScripting Languages : Perl history   : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-MonthHow to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D


Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to to buy a cup of coffee for authors of this site

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: November, 13, 2019