"It is rather difficult to find short, simple examples of coroutines which illustrate the importance of the idea; the most useful coroutine applications generally are quite lengthy"

Knuth [The Art of Computer Programming]

"Subroutines are special cases of ... coroutines."

 —Donald Knuth[2]

Coroutines are a classic programming-in-the-large methodology, which most programmers know via the concept of pipes, brought to the mainstream by Unix.

The term "coroutine" was coined by Melvin Conway in 1958, when he invented the concept and applied it to the construction of a compiler. He is also famous for formulation of so called Conway Law (first published in Datamation in 1968 and popularized by F. Books in The Mythical Man-Month) and Think Pascal debugger

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.

The fate of the coroutine concept was much more difficult than the fate of Conway's Law, which instantly found almost universal recognition. As often happens with innovative concepts, Conway had difficulties publishing his paper, and it appeared only later as "Design of a Separable Transition-Diagram Compiler", CACM 6 (1963), 396-408. In this paper Melvin Conway presented coroutines as a natural compiler-writing paradigm, which was a tremendous breakthrough in compiler design for the time: you can write a multipass compiler and debug it one stage at a time using intermediate files (the output of each stage can simply be saved), and then integrate all the passes using coroutines, achieving much greater efficiency and eliminating writing to disk everywhere except in the final stage of the compiler.

But the concept itself was discovered much earlier, and probably by multiple people. A primitive form of coroutine linkage had already been noted briefly as a 'programming tip' in an early UNIVAC publication. Coroutines were studied independently by J. Erdwinn and J. Merner at about the same time as Conway. The first academic discussion appeared in the first volume of The Art of Computer Programming by Donald Knuth (1968). Later the concept was described, with several interesting examples, in the influential 1972 book Structured Programming by Edsger W. Dijkstra, C. A. R. Hoare, and Ole-Johan Dahl, based on Dahl's experience with Simula.

Coroutines are a natural programming technique for assembler programmers; in some ways they are actually more natural than subroutines. You just need to save the return address at the point where you leave the coroutine (yielding control), and later jump to this saved address, with appropriate saving and restoring of registers.

Like many other brilliant ideas in programming, the concept of the coroutine was far ahead of its time, and outside the very limited field of simulation languages it was almost forgotten. It was never included in any major programming language (although PL/1 exception handling was semantically a special form of coroutine). The first really influential implementation of the concept happened only 15 years later (counting from 1958), in the Unix OS in the form of pipes. Despite its great value, as a programming language construct the coroutine was treated like a red-haired stepchild: it took almost 50 years to integrate it into mainstream programming languages, and that happened only with the growth in popularity of scripting languages, relatively recently, say, less than a decade ago. The first scripting language that included coroutines as a language construct was probably Python, where coroutines have been available since version 2.5. Ruby and JavaScript followed. Earlier attempts were by and large limited to specialized languages, or to general-purpose but far from mainstream languages, such as Simula I (1965), Simula 67 (1967) and Modula-2 (1980).

The reason is that coroutines can't be implemented using a stack, which has been the standard, dominant implementation of subroutines since the days of Algol-60. There were a couple of "deviants", but of them only Modula-2 reached some level of popularity (Simula 67 was a general-purpose language too, but never managed to become a popular one). The only place where coroutines were actively used in programming from 1973 till, say, 2003 was Unix shell programming.

Good things never disappear, and the situation changed for the better some 50 years later, with scripting languages coming into the mainstream. Currently (as of 2012) coroutines are available in three major scripting languages: JavaScript (since 1.7), Python (since version 2.5) and Ruby (since 1.9). In Perl 5 they are still not a first-class construct, but are available via the Coro package. One huge blunder of Java's design is that, despite the fact that the language uses the heap as its main variable allocation mechanism, it does not include coroutines. But, in a way, Java is the "language for mediocrity, created by mediocrity" (aka the new Cobol). Despite the early hype, it is a deeply flawed language that threw out a lot of valuable parts of the PL/1-C-C++ line of languages. In this sense this is far from surprising ;-).

In languages that do not support coroutines directly, threads can serve as a possible workaround, although threads separate namespaces more than coroutines do, losing some of the elegance of the concept. In the most primitive form, coroutines can be implemented using a switch statement with one static variable that remembers the last "yield" point, plus a jump to this point from the beginning of the subroutine.
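A minimal sketch of this trick, transposed to Python (the class and names here are illustrative, not from any library): an instance variable plays the role of the static variable, and an if-chain plays the role of the switch.

```python
class ThreeSteps:
    """Emulated coroutine: self.state remembers the last "yield" point,
    and the if-chain at the top of resume() jumps back to it."""
    def __init__(self):
        self.state = 0

    def resume(self):
        if self.state == 0:          # initial entry point
            self.state = 1
            return "first"           # first "yield"
        if self.state == 1:          # re-entered after the first yield
            self.state = 2
            return "second"          # second "yield"
        return "done"                # all subsequent resumes

c = ThreeSteps()
print(c.resume())  # first
print(c.resume())  # second
print(c.resume())  # done
```

In C the same effect is usually achieved with a switch on a static variable, which is the basis of Simon Tatham's well-known "Coroutines in C" macros and of the protothreads library.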

Coroutines also have higher computational costs than subroutines, since they can't rely on the stack, but on modern computers this penalty is negligible. And while coroutines and threads may be used for the same purpose, they are definitely not the same in a very fundamental way: coroutines share one processor, with explicit transfer of control (at the level of the coroutine primitives), while threads may be executed in parallel or participate in time-sharing (without explicit transfer of control). One problem with coroutines is that blocking I/O causes all the coroutines to block.

Since coroutines can have more points of entry and exit than subroutines, it is possible to implement any subroutine as a coroutine. As Donald Knuth remarked, "Subroutines are special cases of ... coroutines."


Exception handling is essentially based on coroutines, and it is difficult to implement properly if the language uses a stack for the implementation of subroutines. An exception handler is a coroutine that is activated when control reaches it and then suspended; it is resumed only when an exception occurs. In this sense PL/1 was one of the first languages to support coroutines, long before they made their way into Modula-2.


Coroutines are more generic than subroutines. The lifespan of subroutines is dictated by the stack, which often contains their temporary variables and is last-in, first-out (the last subroutine called is the first to return); in contrast, the lifespan of coroutines is completely arbitrary, and such variables need to be allocated either statically or on the heap.

The start of a subroutine is its only point of entry, and a subroutine can return control to the calling point only once per invocation; in contrast, a coroutine can return control (yield) several times. The start of a coroutine is its first point of entry, and subsequent points of entry follow yield statements. Each execution of a yield returns a result to the caller and gives control back, like a usual subroutine return. However, the next time the coroutine is called, execution does not start at the beginning of the coroutine but at the first statement after the yield. So a coroutine remembers the context of its previous call, while a subroutine does not.
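In Python, mentioned above as the first scripting language to include coroutines, this resume-after-yield behavior can be observed directly with a generator; a minimal sketch:

```python
def steps():
    # Entry point: this line runs only on the first next().
    yield "first"
    # Execution resumes here on the second next(), not at the top.
    yield "second"
    yield "third"

g = steps()
print(next(g))  # first
print(next(g))  # second  (continued after the first yield)
print(next(g))  # third
```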

A good introduction to coroutines can be found in Wikipedia. Here is an abridged version:

Coroutines are more generic than subroutines. The lifespan of subroutines is dictated by last in, first out (the last subroutine called is the first to return); in contrast, the lifespan of coroutines is dictated entirely by their use and need.

The start of a subroutine is the only point of entry. Subroutines can return only once; in contrast, coroutines can return (yield) several times. The start of a coroutine is the first point of entry and subsequent points of entry are following yield commands. Practically, yielding returns the result to the calling coroutine and gives it back control, like a usual subroutine. However, the next time the coroutine is called, the execution does not start at the beginning of the coroutine but just after the yield call.

Here is a simple example of how coroutines can be useful. Suppose you have a consumer-producer relationship where one routine creates items and adds them to a queue and another removes items from the queue and uses them. For reasons of efficiency, you want to add and remove several items at once. The code might look like this:

var q := new queue

coroutine produce
        while q is not full
            create some new items
            add the items to q
        yield to consume

coroutine consume
        while q is not empty
            remove some items from q
            use the items
        yield to produce

The queue is completely filled or emptied before control is yielded to the other coroutine via the yield command. Subsequent coroutine calls start right after the yield, in the inner coroutine loop.

Although this example is often used to introduce multithreading, it is not necessary to have two threads to achieve this: the yield statement can be implemented by a branch directly from one routine into the other.
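The pseudocode above can be transposed into runnable Python using generators. This is a sketch under some added assumptions (items are just integers, the queue capacity is 3, and the names mirror the pseudocode):

```python
from collections import deque

QUEUE_CAP = 3
q = deque()

def produce(source):
    """Fill the queue from `source`, then yield control to the consumer."""
    while True:
        while len(q) < QUEUE_CAP and source:
            q.append(source.pop(0))
        if not q:
            return            # nothing left to produce: we are done
        yield                 # "yield to consume"

def consume(out):
    """Drain the queue into `out`, then yield control to the producer."""
    while True:
        while q:
            out.append(q.popleft())
        yield                 # "yield to produce"

items = [1, 2, 3, 4, 5]
out = []
producer, consumer = produce(items), consume(out)
for _ in producer:            # each pass: producer fills, then hands off
    next(consumer)            # consumer drains
print(out)                    # [1, 2, 3, 4, 5]
```

Note that, just as the excerpt says, no threads are involved: control simply branches back and forth between the two routines at the yield points.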

Detailed comparison

Since coroutines can have more points of entry and exit than subroutines, it is possible to implement any subroutine as a coroutine. "Subroutines are special cases of ... coroutines." —Donald Knuth[2]

Each time a subroutine is called (invoked), execution starts at the beginning of the invoked subroutine. Likewise, the first time a coroutine is invoked, execution starts at the beginning of the coroutine; however, each subsequent time a coroutine is invoked, execution resumes following the place where the coroutine last returned (yielded).

A subroutine returns only once. In contrast, a coroutine can return multiple times, making it possible to return additional values upon subsequent calls to the coroutine. Coroutines in which subsequent calls yield additional results are often known as generators.
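A Python generator makes this concrete: each subsequent call to next() resumes after the yield and produces one additional value.

```python
def fib():
    # Each next() resumes here, after the yield, and emits one more value.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

g = fib()
first_six = [next(g) for _ in range(6)]
print(first_six)   # [0, 1, 1, 2, 3, 5]
```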

Subroutines only require a single stack that can be preallocated at the beginning of program execution. In contrast, coroutines, able to call on other coroutines as peers, are best implemented using continuations. Continuations may require allocation of additional stacks and therefore are more commonly implemented in garbage-collected high-level languages. Coroutine creation can be done cheaply by preallocating stacks or caching previously allocated stacks.

... ... ...

Coroutines are useful to implement the following:

Programming languages supporting coroutines

Since continuations can be used to implement coroutines, programming languages that support them can also quite easily support coroutines.



Old News ;-)

[Oct 04, 2012] Introduction to System Software, Chapter 15

A  good description of both coroutines and generators...

A block of code is said to be active while it is being executed. An activation record is a data structure which describes a single activation of a procedure, function or other block of code. The activation record of a block holds the values of all variables local to the block, and it holds sufficient information to establish an environment for interpreting references global to that block. In the context of procedures and functions, the times at which the activation records are allocated and deallocated determine the lifetimes of the local variables and whether or not the procedures or functions are allowed to be recursive.

Older languages such as FORTRAN had ill-specified local variable lifetimes, and most implementations of such languages use static allocation of activation records. As a result, classic FORTRAN implementations do not allow recursion, and local variables retain their values from one call to a procedure to the next. In contrast, the languages descended from Algol, including Pascal, C, Ada and Java, all allocate a new activation record with each call to a procedure and deallocate it on return. Thus, local variables in such languages do not retain their values from one call to the next but recursion is allowed.
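The two allocation policies can be mimicked in Python (the function names are illustrative): a function attribute stands in for a statically allocated local, surviving between calls as in classic FORTRAN implementations, while an ordinary local is freshly allocated on every call, as in the Algol family.

```python
def algol_style():
    count = 0          # fresh activation record: reinitialized on every call
    count += 1
    return count

def fortran_style():
    # The function attribute survives between calls, like a local variable
    # in a statically allocated activation record.
    fortran_style.count += 1
    return fortran_style.count
fortran_style.count = 0

print(algol_style(), algol_style())      # 1 1
print(fortran_style(), fortran_style())  # 1 2
```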

When one procedure or function calls another, the activation of that procedure is not considered to be terminated; instead, it is considered to be suspended until the called procedure or function returns. As a result, when a procedure calls itself recursively, there will be a number of coexisting activations of that procedure, each with its own activation record. All but one of these activations will be suspended awaiting the termination of others, and the one which is not suspended will be the one which was called most recently.

The sequence of machine instructions which is used to transfer control to a procedure is called the calling sequence for that procedure. The code at the head of the called procedure to which the call transfers control is called the receiving sequence.

Other terms, such as register save sequence and entry sequence, are sometimes used. Finally, the code at the end of the procedure which returns control to the caller is called the return sequence. These sequences of instructions are strongly dependent on each other, so it is common to use the term calling sequences in a more generic sense to refer to all of them. The remainder of this section illustrates the concept of activation record management with a discussion of the calling sequences which might be used for procedures on a typical computer.

Most computers have a special procedure calling instruction which will be at the heart of any calling sequence. For the examples used here, the JSR instruction will be used, a mnemonic for jump to subroutine. The basic resources of the example machine and the semantics of the JSR, JMP and MOV instructions are given in Figure 15.1.

type { basic resources }
     word = integer;
     address = integer;

var  { registers in the central processor }
     R: array [0...] of word;
     PC: address;

     { main memory }
     M: array [address] of word;

{ definitions of instructions }

procedure JSR( var link: word; destination: address );
begin { jump to subroutine }
     link := PC;
     PC := destination;
end { JSR };

procedure JMP( destination: address );
begin { jump }
     PC := destination;
end { JMP };

procedure MOV( source: word; var destination: word );
begin { move data }
     destination := source;
end { MOV };

Figure 15.1. Formal definition of a partial machine architecture.

It will be assumed that the example machine has multiple general registers (the particular number of registers is unimportant), and it will be assumed that the basic process performed by the central processor is to repeatedly fetch instructions from M[PC], increment PC, and then execute the instruction as suggested by the procedural descriptions of the instruction semantics given in Figure 15.1.

As an example of a simple instruction, MOV(R[1],R[2]) will move the contents of the register R[1] into the register R[2]. Note that, in the examples given here, it will be assumed that the assembler uses a Pascal or C like syntax to specify operand addressing modes.

The JSR instruction first moves the old value of the program counter register PC into a linkage register (or memory location) specified by the programmer and then it transfers control to the destination address. Because of the convention that the program counter is incremented after fetching the instruction and prior to execution, the saved value of the program counter refers to the instruction immediately following the JSR instruction; that is, it is the return address.

The JSR instruction described here is very similar to the corresponding instructions on the IBM 360/370/390, the Power PC, the Silicon Graphics R series super-microprocessors, and other modern RISC machines. Many older architectures, notably the classic Intel microprocessors, the widely studied DEC PDP-11 and VAX architectures, and the old Motorola 680x0 microprocessors have procedure call instructions which are far more complex.

All calling sequences on the example machine will have a structure which is similar to that shown in Figure 15.2.

; calling sequence
  -- save any important register values
  -- place parameters in agreed upon registers
  JSR( R[link], <destination> )
  -- get results from agreed upon registers
  -- restore any important register values

; receiving sequence
  MOV( R[link], M[linksave] )
  -- save parameters (if needed)

; return sequence
  -- get parameters (if needed)
  JMP( M[linksave] )

Figure 15.2. A skeletal calling sequence.

Lines starting with -- in Figure 15.2 indicate sequences of additional instructions which might have to be added when this skeletal calling sequence is used as part of a real program. The particular register used for linkage is of no importance, and the memory location in which the linkage register is saved depends on the procedure being called. Some calling sequences allow the use of a linkage register to be bypassed entirely by having the procedure call instruction save the return address directly in memory, for example, on a stack.

The skeletal calling sequence shown in Figure 15.2 is typical of the sequences used for hand-coded assembly language procedures with small numbers of parameters and statically allocated activation records. If large numbers of parameters are to be passed, data structures must be constructed in memory to hold them, and if activation records are dynamically allocated, code must be added to the calling, receiving and return sequences to take care of this.

The division of responsibility between the calling, receiving, and return sequences may vary considerably. Activation records may be allocated by either the calling sequence or the receiving sequence, and they may be deallocated by either the return sequence or the part of the calling sequence after the JSR instruction.

Activation records for languages such as Pascal, C, C++ and Java can be allocated on a stack, since they are allocated in a strictly last-in first-out order. Although most language implementations use a stack, there is no requirement that a stack be used for this. It would be perfectly practical to allocate activation records on a heap, using general purpose allocate and deallocate routines, but calls to these routines must use a special calling sequence and their activation records must be allocated specially, since we cannot call the allocate routine in order to allocate an activation record for itself.

In addition to the local variables of the called procedure, each activation record must contain at least two other items: The identity of the activation record of the caller, and information about how to return control to the caller. These fields have many different common names, but the former is typically called the dynamic link or return link, while the latter is called either a return address or a continuation point. Unfortunately, the latter terms are not interchangeable. Both refer to the place where the linkage register is saved, that is, the address in the calling routine where execution is to resume when the called routine returns. If the linkage register is saved in the called routine's activation record, it is called the return address; if it is saved in the calling routine's activation record, it is called the continuation point.

A typical activation record format using continuation points is shown in Figure 15.3.

       |        |
       |        |
       |========|  <------ bottom of activation record
       |  link  |  the return link
       |  cont  |  the continuation point
       |   ..   |  parameters (if any)
       |   ..   |  local variables (if any)
       |   ..   |  temporary variables (if any)
       |========|  <------ top of activation record
       |        |

Figure 15.3. The structure of an activation record.

During the execution of any procedure, some register must be reserved to point to the current activation record. This will be called R[local] in the examples used here, since this register is used as the index register for the local variables of the current procedure. Other names for this register abound, among them, FP, the frame pointer register; this name comes from the common use of the term stack frame to refer to an activation record allocated on a stack. The constants link and cont will be used for the offsets of the return link and continuation point fields of the activation record, so M[R[local]+link] and M[R[local]+cont] will be the memory locations holding the return link and continuation point of the current activation record.

The activation record illustrated in Figure 15.3 includes no assumptions about the use of a stack. While most modern languages push activation records onto a stack, nothing requires this! The calling sequence given in Figure 15.4, for example, makes no assumptions about the use of a stack.

; special calling sequence for allocate
  -- put size of desired block in R[temp]
  JSR( M[alloclink], allocate )
  -- pointer to allocated block is returned in R[temp]

; special calling sequence for deallocate
  -- put pointer to deallocated block in R[temp]
  JSR( M[alloclink], deallocate )

; regular calling sequence
  -- save any important register values in caller's activation record
  MOV( <destination activation record size>, R[temp] )
  JSR( M[alloclink], allocate )
  -- insert parameter values in new activation record
  MOV( R[local], M[R[temp]+link] )
  JSR( M[R[local]+cont], <destination> )
  JSR( M[alloclink], deallocate )

; receiving sequence
  MOV( R[temp], R[local] )

; return sequence
  MOV( R[local], R[temp] )
  MOV( M[R[temp]+link], R[local] )
  JMP( M[R[local]+cont] )

Figure 15.4. A calling sequence with activation records on the heap.

The calling sequence given in Figure 15.4 uses calls to allocate and deallocate to manage activation records on the heap; as noted previously, the calling sequences for these routines must be special. As coded here, they use a special dedicated location, M[alloclink] to store their return address, and they presumably have statically allocated activation records. This calling sequence requires that the caller do most of the work! The result is receiving and return sequences which are relatively simple, and the code to move parameter values into place can be simple because the activation record of the caller is available for computing parameter values at the same time that the activation record of the called routine is available for storing them.

One disadvantage of the calling sequence given in Figure 15.4 is that the caller must know the activation record size needed by the called routine in order to allocate it. Simple programming conventions can overcome this problem; for example, it can be required that each procedure declaration provide a symbolic definition of the activation record size for that procedure, for example, we might have the convention that the procedure proc has its activation record size given by procARS.

Many systems have calling sequences which are carefully designed to put most of the work in the receiving and return sequences. If this is done right, the result is a 'universal calling sequence' which allows programs written in any language to call procedures written in any other language. One way in which operating system designers encourage the use of a system wide standard calling sequence is by providing a set of macros, giving them names such as CALL, PROC and RETURN, which make the use of the standard sequences so convenient that few programmers invent other calling sequences. Where such macros are not provided, assembly language programmers are well advised to write their own in order to avoid problems with calling sequences. Programmers should be aware that the standard calling sequences provided by some machines are not particularly efficient, so it is sometimes profitable to design and use special purpose calling sequences.

The calling sequences used by languages such as Algol, Pascal, PL/I and Ada are more complex than that given above because of the problems of dealing with nested blocks. C and C++ have far more restricted nesting because they forbid functions from being declared local to other functions. The possibility of local procedures and functions introduces a need for a static link in each activation record which is used to identify the activation record of the block that statically encloses the block associated with the current activation record. Whenever a variable declared in one of the enclosing blocks is referenced, the chain of static links is followed out to the appropriate activation record. There is an alternative to the static link called a display, which is a global array of pointers to the current activation records of all accessible blocks, but this also requires a new entry in each activation record to save the previous display entry for each block. These issues are beyond the scope of this work.

Calling sequences using a stack are not difficult to design; space can be allocated by pushing the initial values of the allocated storage on the stack, and deallocation can be done by popping the stack. Figure 15.5 illustrates a typical structure for an activation record on a stack.

          |        |
          |        |  previous activation record
          |        |
R[local]->|  link  |  the return link        |
          |--------|                          - a stack mark
          |  cont  |  the continuation point |
          |   ..   |  parameters, local variables, and temporaries (if any)
          |   ..   |  top word on stack
R[stack]->|        |  first free word beyond stack top
          |        |  

Figure 15.5. A typical activation record on the stack top.

In Figure 15.5, it is assumed that the stack pointer register R[stack] normally points to the first free word beyond the top item on the stack. We will also assume that there are instructions PUSH and POP for operating on the stack, and that these increment and decrement the stack pointer, respectively. The basic structure of the activation record shown here is the same as that shown in Figure 15.3 except for the fact that activation records are embedded in the stack.

When activation records are pushed on a stack, the pair of locations containing the return link and the continuation point are sometimes called a stack mark. Consecutive activation records on the stack are separated by the stack marks, so these locations mark the point in the stack which determines how far the stack must be popped when deallocating an activation record. Some machines have special instructions for pushing a stack mark onto the stack.

A calling sequence using the stack based activation record format shown in Figure 15.5 is given in Figure 15.6.

; regular calling sequence
  -- save any important register values in caller's activation record
  MOV( R[stack], R[temp] )   |
  PUSH( R[local] )            - push the stack mark
  PUSH( 0 )                  |
  -- PUSH parameter values on stack
  JSR( M[R[local]+cont], <destination> )

; receiving sequence
  MOV( R[temp], R[local] )
  -- PUSH initial values of local variables

; return sequence
  MOV( R[local], R[stack] )  
  MOV( M[R[local]+link], R[local] )
  JMP( M[R[local]+cont] )

Figure 15.6. A stack based calling sequence.

The calling sequence in Figure 15.6 is a straightforward modification of that shown in Figure 15.4. It differs from the original in the use of PUSH instructions to build the stack mark and allocate storage for the new activation record. As a result of this, the caller need only push the stack mark and the required number of parameters on the stack; the called routine is responsible for pushing the local variables and any temporary storage. In fact, if the programming language does not advertise any particular initial value for local variables, the called routine may simply increment the stack pointer by the number of local variables needed in order to avoid the need to push default initial values one by one, only to have the programmer ignore the default initial values.

The return sequence shown in Figure 15.6 deallocates the activation record by resetting the stack pointer to point to the start of its activation record; note that the base register for the activation record was initially derived from the value of the stack pointer prior to the call, so this simple assignment to the stack pointer undoes the effects of any pushes since that time. It should be noted that the use of a stack pointer register is not always required when pushing activation records on a stack! Simpler, although less intuitive, calling sequences make use of the fact that it is usually possible to statically determine the size of the stack within each activation record from an inspection of the code, so the stack top at any point in the code can be determined by a constant displacement from the base of the current activation record.

Some machines designed in the early 1970's include stack mechanisms which are particularly inconvenient from the point of view of activation record management. These include the DEC PDP-11, the 6502, the Intel 8080 and its descendants up through the Pentium. The problem on these machines is that the procedure call instruction itself pushes the return address on the stack and the return instruction simply pops the stack into the program counter. As a result, the return address is typically above any parameters which were pushed on the stack instead of being incorporated into a stack mark.

The basic return instructions on these machines do nothing to solve the problem of activation record deallocation. As a result, the return sequence must typically pop all local variables off of the stack prior to the return instruction, and the calling sequence, after control returns, must finish deallocating the activation record by popping the parameters. Late in the history of these architectures, special instructions, for example the MARK instruction on the PDP-11, were introduced to try to solve these problems, but very few programmers ever understood how to use such instructions, and as a result, they are rarely used.


A coroutine is a routine which does not enforce a last-in first-out ordering on control transfers that we associate with procedures and functions. As a result, activation records for coroutines cannot usually be allocated on a stack, but must be allocated from a heap. Like a procedure, a coroutine is a body of code which must be associated with an activation record before it is run. Unlike procedures, however, the activation records for coroutines are not automatically allocated as a result of control transfers. Instead, the user must explicitly allocate coroutine activation records prior to a transfer of control, and the user may transfer control to a given coroutine, running in a given activation record, a number of times before that coroutine terminates.

The transfer of control to a coroutine is accomplished by a resume operation. This causes the computation which initiated the control transfer to be suspended, as a coroutine, and it causes the resumption of the computation to which control is transferred. As a result, two or more coroutine activations may easily transfer control back and forth, where each time a coroutine regains control, it picks up at the point it was most recently executing. Note the distinction between the terms coroutine and coroutine activation. A coroutine is a block of code which may be activated, producing an activation to which control may be transferred. The alternate term coroutine instance is sometimes used as a synonym for coroutine activation, because the activation record for a coroutine can be viewed as an instance of a special kind of class.

When procedures were the only concern, there was little reason to choose between the use of continuation points and return addresses in designing activation records and calling sequences. With coroutines, on the other hand, the choice is obvious: Continuation points should be used. This is because, when control is transferred to a coroutine, it must be possible to find the correct value of the program counter for that coroutine without knowledge of how control was most recently transferred away from that coroutine. Recall that continuation points are stored in the activation record of the routine as control is passed to some other routine, while return addresses tend to be stored in the activation record of the routine to which control is being transferred.

As motivation for the development of coroutines, consider the problem which was originally presented in Figure 10.3 through Figure 10.5. The problem was to write a program which processed data arriving at some rate at an input port, deleted all occurrences of 'z' from the input, doubled all occurrences of 'x', and passed the data on to an output port. Input/output at these ports happens at varying rates, so queues must be used to buffer the transfer of data between the input/output ports and the applications program.

A coroutine-based solution to this problem is given in Figure 15.7, using a Pascal-like language supporting coroutines.

{ global variables }
var poll, appl: activation;

     coroutine pollingloop;
     begin
          repeat
               if inputstatus = ready and not full( inputqueue )
                 then enqueue( inputqueue, inputdata );

               if outputstatus = ready and not empty( outputqueue )
                 then dequeue( outputqueue, outputdata );

               resume appl;
          forever;
     end { pollingloop };

     coroutine application;
     var ch: char;
     begin
          repeat
               while empty( inputqueue ) do resume poll;
               dequeue( inputqueue, ch );
               if ch <> 'z' then begin
                    if ch = 'x' then begin
                         while full( outputqueue ) do resume poll;
                         enqueue( outputqueue, ch )
                    end;
                    while full( outputqueue ) do resume poll;
                    enqueue( outputqueue, ch )
               end
          forever;
     end { application };

begin { main program }
     poll := activate pollingloop;
     appl := activate application;
     resume poll;
end { main program }.

Figure 15.7. An example using coroutines.

In Figure 15.7, variables of type activation hold pointers to activation records, and the special operator activate takes the name of a coroutine as an operand and returns a pointer to an activation record for that coroutine. The activation record allocated by activate has a size sufficient for the coroutine in question and has its continuation point set to the first instruction of the body of that coroutine. The resume statement takes a pointer to an activation record as an argument and transfers control to that activation record.

The important thing to note about Figure 15.7 is that it allowed the problem originally posed in Chapter 10 to be solved with what look like two main programs, each of which is a coroutine. The first of these, pollingloop, emphasizes the structure of the problem from the point of view of performing input/output; thus, the control structure is the same as that of the main polling loop solution illustrated in Figure 10.2. The second coroutine, application, has a structure which emphasizes the problem which is to be solved; thus, the control structure is the same as that of the polling procedure solution shown in Figure 10.5.
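The control flow of Figure 15.7 (though not its activation-record machinery) can be sketched in Python using generators, with in-memory deques standing in for the input and output ports. All names below are illustrative, and the queues are unbounded, so the "while full" waits of the Pascal version are omitted:

```python
from collections import deque

input_port = deque("axbzxc")   # stands in for the hardware input port
output_port = []               # stands in for the hardware output port
inputqueue = deque()
outputqueue = deque()

def pollingloop():
    # Mirrors the pollingloop coroutine: move one character per resume.
    while True:
        if input_port:                        # "inputstatus = ready"
            inputqueue.append(input_port.popleft())
        if outputqueue:                       # "outputstatus = ready"
            output_port.append(outputqueue.popleft())
        yield                                 # resume appl

def application():
    # Mirrors the application coroutine: drop 'z', double 'x'.
    while True:
        while not inputqueue:
            yield                             # resume poll
        ch = inputqueue.popleft()
        if ch != 'z':
            if ch == 'x':
                outputqueue.append(ch)        # first copy of the doubled 'x'
            outputqueue.append(ch)

poll, appl = pollingloop(), application()
for _ in range(40):            # alternate resumes between the two activations
    next(poll)
    next(appl)

print("".join(output_port))    # -> axxbxxc
```

As in the figure, each generator looks like a small main program, and control bounces between the two activations, each resuming where it last left off.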

The code for the application coroutine shown in Figure 15.7 could have been improved by the use of read and write procedures which would encapsulate the references to the input and output queues and hide the small polling loops which await status changes in these queues. Unfortunately, this change introduces the problem of how a procedure or function can be called from within a coroutine and resume another. This problem has been solved, but the solutions are beyond the scope of this discussion.

As with calling sequences, there are many possible coroutine resume sequences. The resume sequences discussed here use the activation record structure shown in Figure 15.3. This structure includes a link field which is not really needed in the context of "pure" coroutines, but is quite useful in situations where it is useful for a coroutine to be able to resume its caller. The structure also includes provisions for parameters. Parameters were not used or needed in the example given in Figure 15.7, but many coroutine systems allow parameters to be passed to a coroutine at the time of activation record allocation.

Two coroutine resume sequences are shown in Figure 15.8.

; resume sequence 1
-- save any important register values
MOV( R[local], R[temp] )
MOV( <destination activation record>, R[local] )
JSR( M[R[temp]+cont], M[R[local]+cont] )
-- restore saved register values

; resume sequence 2
-- save any important register values
MOV( <destination activation record>, R[temp] )
JSR( M[R[local]+cont], M[R[temp]+cont] )
MOV( R[temp], R[local] )
-- restore saved register values

Figure 15.8. Two coroutine resume sequences.

The first resume sequence shown in Figure 15.8 does all of the work before transferring control, while the second transfers control as soon as possible, requiring that the destination coroutine finish the job. These resume sequences are equally good, but it is important to note that they are not compatible; a coroutine which uses one of these resume sequences cannot resume one which uses the other. The second resume sequence may be somewhat confusing because the first two instructions are executed prior to a transfer of control to a different coroutine, while the third instruction is executed as a result of some other coroutine resuming the original one.


A generator is a special kind of coroutine which is similar to a procedure in many ways. As with coroutines, the allocation of an activation record for a generator must be done explicitly prior to any transfer of control to that generator, but unlike coroutines, the transfer of control to a generator activation resembles a procedure call. The difference between a call to a generator activation and a call to a procedure is that each call to a generator activation causes it to begin execution immediately after the point where it most recently returned.

The term 'generator' stems from the use of functions which generate sequences of values. Conventionally, such a function would be written using global variables to remember previous values to be used in generating the next. Generators allow these to be replaced with local variables which persist from one call to the generator to the next. The function and generator approaches are contrasted in Figure 15.9.

{ global variables needed to generate the Fibonacci numbers }
var i: integer { initially 0 };
    j: integer { initially 1 };

function fibonacci: integer;
var k: integer;
begin
     fibonacci := i;
     k := i + j;
     i := j;
     j := k
end { fibonacci };

generator function fibonacci: integer;
var i,j,k: integer;
begin
     i := 0;
     j := 1;
     repeat
          return i;
          k := i + j;
          i := j;
          j := k
     forever
end { fibonacci };

Figure 15.9. Generators and functions contrasted.

Note that a 'generator function' is a generator which returns a value each time it returns to its caller.

Assuming that the global variables used by the function fibonacci in Figure 15.9 are properly initialized, and assuming that they are not accidentally used by other code, successive calls to this function will return successive values of the Fibonacci series. The generator returns these same values, but it has the advantage of having complete control over its local variables and their initialization. Note that, unlike the function, the generator cannot be called until an activation record has been allocated, for example, by an activate operation, and that the call to the generator must be done using something like the resume statement used with coroutines.
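In a language with built-in generators the same contrast is easy to demonstrate. Here is a sketch of both styles in Python (names are illustrative):

```python
# Function style: state lives in globals that must be pre-initialized
# and can be corrupted by any other code that touches them.
i, j = 0, 1

def fibonacci_function():
    global i, j
    result = i
    i, j = j, i + j
    return result

# Generator style: state lives in locals that persist between calls
# and are initialized by the generator itself.
def fibonacci_generator():
    a, b = 0, 1
    while True:
        yield a              # return a value, then freeze here until resumed
        a, b = b, a + b

gen = fibonacci_generator()
print([fibonacci_function() for _ in range(6)])  # -> [0, 1, 1, 2, 3, 5]
print([next(gen) for _ in range(6)])             # -> [0, 1, 1, 2, 3, 5]
```

Note that, as the text says, the generator cannot produce values until an activation has been created (here, by calling `fibonacci_generator()`), and each `next` call plays the role of a resume.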

In an object oriented language, generators would be modeled as objects, where the local variables of the generator are private components of the object, and the generator function itself is the method of the object. This object oriented solution is intermediate between the simple functional model and the coroutine model for achieving the same effect.
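A minimal sketch of that object-oriented middle ground, in Python (the class name is invented for illustration):

```python
class FibonacciStream:
    """State that a generator would keep in locals is held in
    private-by-convention fields; the method plays the generator's role."""

    def __init__(self):
        self._i, self._j = 0, 1    # initialization done by the object itself

    def next_value(self):
        result = self._i
        self._i, self._j = self._j, self._i + self._j
        return result

stream = FibonacciStream()
print([stream.next_value() for _ in range(6)])   # -> [0, 1, 1, 2, 3, 5]
```

Constructing the object corresponds to allocating the activation record, and each method call corresponds to a resume, but the "continuation point" is fixed: the method always restarts from its top.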

The transfer of control to a generator is accomplished by a generator resume sequence, and the return of control from a generator to its caller is done by a generator return sequence. These strongly resemble hybrids of the sequences used with procedures and coroutines. As was the case with coroutines, the activation record structure shown in Figure 15.3 is also sufficient for generators. Two example sets of sequences for use with generators are shown in Figure 15.10.

; generator resume sequence 1
-- save any needed registers
MOV( R[local], R[temp] )
MOV( <destination activation record>, R[local] )
MOV( R[temp], M[R[local]+link] )
JSR( M[R[temp]+cont], M[R[local]+cont] )
-- restore any needed registers

; generator return sequence 1
-- save any needed registers
MOV( R[local], R[temp] )
MOV( M[R[temp]+link], R[local] )
JSR( M[R[temp]+cont], M[R[local]+cont] )
-- restore any needed registers

; generator resume sequence 2
-- save any needed registers
MOV( <destination activation record>, R[temp] )
JSR( M[R[local]+cont], M[R[temp]+cont] )
-- restore any needed registers

; generator return sequence 2
-- save any needed registers
MOV( R[local], R[temp] )
MOV( M[R[temp]+link], R[local] )
JSR( M[R[temp]+cont], M[R[local]+cont] )
MOV( R[local], M[R[temp]+link] )
MOV( R[temp], R[local] )
-- restore any needed registers

Figure 15.10. Two sets of generator calling and return sequences.

Both sets of sequences given in Figure 15.10 accomplish the same thing, but one sequence puts an equal burden on the caller and the generator itself, while the other puts most of the work in the generator, with very little work in the caller.

Note that the execution of the first half of a resume sequence will transfer control to the second half of some return sequence, and the execution of the first half of a return sequence will transfer control to the second half of a resume sequence. This symmetry illustrates the close relationship between generators and coroutines, but it also allows the relationship between generators and procedures to be clarified: A generator resume sequence is similar to a procedure calling sequence, and a generator return sequence is similar to a procedure return sequence followed immediately by a procedure receiving sequence, with the additional constraint that the procedure return sequence is modified to save the continuation point of the procedure so that the next call will transfer control to the following receiving sequence.

[ Oct 03, 2012 ] Coro-Intro.pod

Implementation of coroutines in Perl 5.

Coro started as a simple module that implemented a specific form of first-class continuations called Coroutines. These basically allow you to capture the current point of execution and jump to another point, while allowing you to return at any time, as a kind of non-local jump, not unlike C's setjmp/longjmp. This is nowadays known as a Coro::State.

One natural application for these is to include a scheduler, resulting in cooperative threads, which is the main use case for Coro today. Still, much of the documentation and custom refers to these threads as "coroutines" or often just "coros".

A thread is very much like a stripped-down perl interpreter, or a process: Unlike a full interpreter process, a thread doesn't have its own variable or code namespaces - everything is shared. That means that when one thread modifies a variable (or any value, e.g. through a reference), then other threads immediately see this change when they look at the same variable or location.

Cooperative means that these threads must cooperate with each other when it comes to CPU usage - only one thread ever has the CPU, and if another thread wants the CPU, the running thread has to give it up. The latter happens either explicitly, by calling a function to do so, or implicitly, when waiting on a resource (such as a Semaphore, or the completion of some I/O request). This threading model is popular in scripting languages (such as python or ruby), and this implementation is typically far more efficient than threads implemented in other languages.

Perl itself uses rather confusing terminology - what perl calls a "thread" (or "ithread") is actually called a "process" everywhere else: the so-called "perl threads" are artifacts of the unix process emulation code used on Windows, which is why they are actually processes and not threads. The biggest difference is that neither variables nor code are shared between processes or ithreads.

Cooperative Threads

Cooperative threads is what the Coro module gives you. Obviously, you have to use it first:

   use Coro;

To create a thread, you can use the async function that automatically gets exported from that module:

   async {
      print "hello\n";
   };

Async expects a code block as first argument (in indirect object notation). You can actually pass it extra arguments, and these will end up in @_ when executing the codeblock, but since it is a closure, you can also just refer to any lexical variables that are currently visible.

The above lines create a thread, but if you save them in a file and execute it as a perl program, you will not get any output.

The reason is that, although you created a thread, and the thread is ready to execute (because async puts it into the so-called ready queue), it never gets any CPU time to actually execute, as the main program - which also is a thread, almost like any other - never gives up the CPU but instead exits the whole program by running off the end of the file. Since Coro threads are cooperative, the main thread has to cooperate and give up the CPU.

To explicitly give up the CPU, use the cede function (which is often called yield in other thread implementations):

   use Coro;

   async {
      print "hello\n";
   };

   cede;

Running the above prints hello and exits.

Now, this is not very interesting, so let's try a slightly more interesting program:

   use Coro;

   async {
      print "async 1\n";
      cede;
      print "async 2\n";
   };

   print "main 1\n";
   cede;
   print "main 2\n";
   cede;

Running this program prints:

   main 1
   async 1
   main 2
   async 2

This nicely illustrates the non-local jump ability: the main program prints the first line, and then yields the CPU to whatever other threads there are. And there is one other, which runs and prints "async 1", and itself yields the CPU. Since the only other thread available is the main program, it continues running and so on.

Let's look at the example in more detail: async first creates a new thread. All new threads start in a suspended state. To make them run, they need to be put into the ready queue, which is the second thing that async does. Each time a thread gives up the CPU, Coro runs a so-called scheduler. The scheduler selects the next thread from the ready queue, removes it from the queue, and runs it.

cede also does two things: first it puts the running thread into the ready queue, and then it jumps into the scheduler. This has the effect of giving up the CPU, but also ensures that, eventually, the thread gets run again.
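The ready queue/scheduler/cede machinery just described can be mimicked in a few lines of Python, with generators standing in for Coro threads. This is a toy model of the behavior, not Coro's implementation:

```python
from collections import deque

ready = deque()          # the ready queue
trace = []

def async_(thread):
    """Like Coro's async: create a thread and put it in the ready queue."""
    ready.append(thread)

def schedule():
    """Run threads from the ready queue until it is empty."""
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run until the thread gives up the CPU
        except StopIteration:
            continue              # thread ran off its end: terminated
        ready.append(thread)      # its yield acted like cede: requeue it

def worker(name):
    trace.append(name + " 1")
    yield                         # cede
    trace.append(name + " 2")

async_(worker("async"))
async_(worker("main"))
schedule()
print(trace)   # -> ['async 1', 'main 1', 'async 2', 'main 2']
```

Each `yield` both requeues the running thread and jumps back into the scheduler, which is exactly the two-step behavior of cede.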

In fact, cede could be implemented like this:

   sub my_cede {
      $Coro::current->ready;
      Coro::schedule;
   }

This works because $Coro::current always contains the currently running thread, and the scheduler itself can be called directly via Coro::schedule.

What is the effect of just calling schedule without putting the current thread into the ready queue first? Simple: the scheduler selects the next ready thread and runs it. And the current thread, as it hasn't been put into the ready queue, will go to sleep until something wakes it up. If. Ever.

The following example remembers the current thread in a variable, creates a thread and then puts the main program to sleep.

The newly created thread uses rand to wake up the main thread by calling its ready method - or not.

   use Coro;

   my $wakeme = $Coro::current;

   async {
      $wakeme->ready if 0.5 > rand;
   };

   schedule;

Now, when you run it, one of two things happen: Either the async thread wakes up the main thread again, in which case the program silently exits, or it doesn't, in which case you get something like this:

   FATAL: deadlock detected.
         PID SC  RSS USES Description              Where
    31976480 -C  19k    0 [main::]                 [program:9]
    32223768 UC  12k    1                          []
    32225088 -- 2068    1 [coro manager]           []
    32225184 N-  216    0 [unblock_sub scheduler]  -

Why is that? Well, when the async thread runs into the end of its block, it will be terminated (via a call to Coro::terminate) and the scheduler is called again. Since the async thread hasn't woken up the main thread, and there aren't any other threads, there is nothing to wake up, and the program cannot continue. Since there are threads that could be running (main) but none are ready to do so, Coro signals a deadlock - no progress is possible. Usually you also get a listing of all threads, which might help you track down the problem.

However, there is an important case where progress is, in fact, possible, despite no threads being ready - namely in an event-based program. In such a program, some threads could wait for external events, such as a timeout, or some data to arrive on a socket.

Since a deadlock in such a case would not be very useful, there is a module named Coro::AnyEvent that integrates threads into an event loop. It configures Coro in a way that, instead of dying with an error message, it instead runs the event loop in the hope of receiving an event that will wake up some thread.

Error Handling - Error Codes, Exceptions and Beyond

Dru’s Blog

About 10 months ago, I was writing a library. As I was writing it, I started to look at the whole issue of notifying the caller of errors. In typical fashion, I tried to optimize the error-handling problem rather than just doing the right thing and using error codes. I did a ton of research. Here is a current list of links and articles on the subject.

Getting Started

To get you started here are some good starting points. They both received a lot of attention on the internet.

A colorful post by Damien Katz.

A nice opinion piece that is pro-error codes by the famous Joel of Joel on Software.

Read my original post with excellent comments by Daniel Lyons, Paul Clegg, and Neville of the North.


The default and standard way of handling errors since the beginning is to just use error codes with some convention for noticing them. For example, you could document the error condition in an API and then set a global variable for the actual code. It is up to the programmer calling the function to notice the error and do the right thing.

This is the technique used by operating systems and most libraries. Historically, these systems have never been consistent or compatible with other conventions. The most evolved system for this would probably be the Microsoft COM system: all functions return an HRESULT, which is essentially an error code.

The next system was the ‘exception-handling’ system. In this system errors cannot be ignored. Exception handlers are declared, optionally, at a given scope. If an exception is thrown (i.e. an error has occurred), handlers are searched up the stack until a matching handler is found.

IMHO, the exception system isn’t used properly in 90% of the cases. There is a fine balance between a soft error and something exceptional. The syntax also tends to get in the way for even the simplest of errors. I agree that there should be errors that are not ignored, but there has to be a better way.

So, old schoolers say ‘we use error codes, and we like them, dammit’ - aka super-disciplined programming, usually for real-time, embedded and smaller systems.

The new schoolers say ‘you have to be kidding about error codes, use exceptions’ - aka ‘yeah, we use exceptions, that is what the language gives us… and btw, no, we don’t mind typing on our keyboards a lot’.

Somehow, there has to be a better way. Maybe it will be system or application, specific.

Moving On - Old / New Ideas

If you don’t mind it being a C++ article, here is an amazing one from Andrei Alexandrescu and Petru Marginean. (Andrei is widely known for his great work on Policy Based design with C++, which is excellent.) The article is well written and practical. In fact, the idea was so good, the language ‘D’ made it part of the language.

Here is an example:

void User::AddFriend(User& newFriend)
{
    friends_.push_back(&newFriend);
    try {
        pDB_->AddFriend(GetName(), newFriend.GetName());
    }
    catch (...) {
        friends_.pop_back();
        throw;
    }
}

10 lines, and this is for the super-simple example.

With ScopeGuard, the cleanup is declared once and dismissed on success:

void User::AddFriend(User& newFriend)
{
    friends_.push_back(&newFriend);
    ScopeGuard guard = MakeObjGuard(friends_, &UserCont::pop_back);
    pDB_->AddFriend(GetName(), newFriend.GetName());
    guard.Dismiss();
}

In D it would look even cleaner:

void User::AddFriend(User& newFriend)
{
    friends_.push_back(&newFriend);
    scope(failure) friends_.pop_back();
    pDB_->AddFriend(GetName(), newFriend.GetName());
}

IMHO, exception handling will move more towards systems like this: higher level, simpler and cleaner.

Other interesting systems are the ones developed for Common Lisp, Erlang, and Smalltalk. I’m sure Haskell has something to say about this as well.

The Common Lisp and Smalltalk ones are similar. Instead of unconditionally unwinding the stack like most exception handlers, these systems give the exception ‘catcher’ the choice of retrying or doing something different at the point of the exception. Very powerful.

Speaking of smalltalk, here is an excellent article called Subsystem Exception Handling in Smalltalk. I highly recommend it.

My Recommendation

If you are building a library, use error codes. Error codes are much easier to turn into exceptions by the language wrapper that will eventually be built on top.
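A sketch of what such a wrapping layer looks like in Python; the error-code function below is a made-up stand-in for a library entry point, not any real API:

```python
# Hypothetical library layer: C-style functions that report failure
# through integer error codes.
E_OK, E_NOTFOUND, E_DENIED = 0, 1, 2
_MESSAGES = {E_NOTFOUND: "record not found", E_DENIED: "permission denied"}

def db_add_friend(db, name):          # stand-in for a real library call
    if name not in db:
        return E_NOTFOUND
    return E_OK

# Language wrapper built on top: turns codes into exceptions once,
# so application code never has to remember to check them.
class LibraryError(Exception):
    def __init__(self, code):
        super().__init__(_MESSAGES.get(code, "error %d" % code))
        self.code = code

def add_friend(db, name):
    code = db_add_friend(db, name)
    if code != E_OK:
        raise LibraryError(code)

add_friend({"alice"}, "alice")        # succeeds silently
try:
    add_friend({"alice"}, "bob")
except LibraryError as e:
    print(e)                          # -> record not found
```

The library stays simple and portable; the exception policy lives entirely in the thin wrapper.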

When programming, don’t get trapped into thinking about the little picture. A lot of these errors are just pawns in the grand scheme of assuring that you have all of your resources in place before you begin your task at hand. If you present your code in that manner, it will be much easier to understand for all parties.

More Links

Error Codes vs. Exceptions by Damien Katz.

opinion piece that is pro-error codes by the famous Joel of Joel on Software.

Read my original post with excellent comments by Daniel Lyons, Paul Clegg, and Neville of the North.

Microsoft COM

D Language - Exception Safe Programming

Subsystem Exception Handling in Smalltalk - nice section on history as well

A nice long thread on comp.lang.c++.moderated

*Slightly Wacky, But Neat*

Knuth preprint P160 Efficient coroutine generation of constrained Gray sequences, aka ``Deconstructing coroutines'' (written with Frank Ruskey).

TeX file deco.tex; MetaPost illustrations; another illustration; Compressed PostScript

An interesting thread at

From: Alan Kennedy (
Subject: Re: Help with coroutine-based state machines?
View: Complete Thread (21 articles)
Original Format
Newsgroups: comp.lang.python
Date: 2003-06-03 02:21:55 PST
[Alan spends the weekend upgrading an old laptop, so he can look at
this stuff in the comfort of the garden, which was bathed in glorious
sunshine for the entirety of the Irish Bank Holiday weekend B-). He
also prints and reads the writings of Christian Tismer[1], David
Mertz[2,3] and the Timbot[4,5], in relation to generators,
coroutines, continuations, pseudo/weightless/micro-threads, etc].

Steven Taschuk wrote:

 Lots of great stuff which illustrates the problems of calling from
 generator/coroutine code to functional code and vice versa.

I now understand. Thanks for taking the time to explain.

Thanks to your clear examples, I now picture coroutines in the
following way (which I hope is correct :-).

1. The "traditional" function/method call paradigm builds a stack
frame every time a function/method is called. Since this stack frame
holds the references to the local variables for the function/method,
it must be destroyed and collected if memory is not to be "leaked".
The only time when the stack frame can be destroyed is when the
references it contains are no longer required: i.e. after the
instance of the function/method call has finished.

2. The only method by which the "traditional" function call can
invoke another "traditional" function/method call is to call it
"recursively", thus causing the construction of another stack frame.
There is no mechanism for it to "substitute" the stack frame of the
called function "in place of" its own stack frame. (Although I
believe that Stackless can do this, after much rewriting and
reinvention of truth :-).

3. Because all "traditional" function/method calls, involve the
creation and eventual destruction of a stack frame, I label the space
in which they operate "Linear stack space".

4. There is another space, which I call "Generator space". When a
call is made into "Generator space", a stack frame is not
constructed: at least not every time. Instead, a resumable and
persistent stack frame is "resumed": this "resumable stack frame" was
created once, sometime in the past: it is termed, in current python
terminology, the generator-iterator.

5. When the code in "Generator space" (i.e. the generator-iterator)
is called, it resumes immediately after where it last 'yield'ed, and
continues until it 'yield's again, or returns or excepts. When it
'yield's, two things happen. A: The resumable stack frame is
"frozen", so that it can later be resumed again. and B: A python
object, which may be None, is transferred back to the caller.

6. For any call from "Linear Stack space" into "Generator space",
there is a crossover point, P. When the called code in "Generator
space" finishes executing, it can only enter back into "Linear stack
space" through point P: it cannot exit through any other point.
(According to the current python model).

7. If any calls are made from "Generator space" into "Linear stack
space", this leads to the creation of a stack frame, which must be
destroyed. If the called function/method in "Linear stack space" then
calls back into "Generator space", and the "Linear space" function is
not allowed to exit naturally, this is essentially unbound recursion,
which will lead eventually to a blown stack. (The non-functioning
Producer/Consumer example illustrates this: the code in "Generator
space" could be made to work by "traditionally" calling the write and
read functions, but this mutual recursion would eventually blow the
"Linear space" stack).

8. When a stack frame in "Generator space" 'yield's, it is possible
to adopt a convention for the 'yield'ed value: the "dispatcher"
convention. Under this convention, the value 'yield'ed can be a
reference to another resumable stack frame in Generator space. A
special "dispatch"-function is coded to recognise these references,
and to immediately call back into "Generator space". The newly
resumed stack frame is still limited by the same constraints as the
original call into "Generator space": it must eventually return to
"Linear stack space" through point P, in this case the dispatcher function.

9. A complex network of interacting "resumable stack frames", or
generators, can mutually 'yield' control to each other, assuming the
cooperation of the dispatcher function. Execution continually
"bounces" back and forth between the dispatch function (Point P, in
"Linear stack space") and various 'yield'ed resumable states in
"Generator space".

10. A network of interacting generator-iterators could potentially
execute much faster than an equivalent series of "traditional"
function/method calls in "Linear stack space", since there is no
overhead associated with con/destruction of stack frames, no parsing of
parameters, etc. Such a network of interacting generators are called
"coroutines", and require the presence of a dispatcher function: the
dispatcher function must be explicitly created by the programmer, as
python currently exists.

10a: refinement: In fact python generators are really
"semi"-coroutines, because of the requirement to keep 'yield'ing to a
dispatcher function. In a language which implemented
"full"-coroutines (maybe a __future__ version of python?), the
dispatcher would not be required (at least explicitly), and resumable
states could transfer control directly to any other resumable state
for which they hold a reference.

11. The efficiency benefits of generators can be also realised by
*not* adopting the dispatcher convention. Instead, a generator can
simply 'yield' the series of values it has been coded to generate:
the nodes in a data structure, the numbers in a computed series, etc.
This can lead to more natural code, particularly for the calling
function which utilises the series of values 'yield'ed: it is mostly
unaware that there is a resumable state involved in the generation of
the values. Simple example:

def squares(n):
    for i in xrange(n):
        yield i, i*i

# Instantiate a generator-iterator
sqgen = squares(100)
# We can do the following because the generator-iterator
# conveniently implements the iterator protocol.
for i, sq in sqgen:
    print "The square of %d is %d" % (i, sq)

12. It is possible to avoid the "mutual" recursion problem of calling
from "Generator space" into "Linear stack space", by the following

 o Moving all "traditional" function/method calls required from
   "Linear stack space" into "Generator space".
 o By expanding the "dispatcher" convention to allow generators to
   yield special values representing a function call or a return value.
   The dispatcher function must then explicitly manage a call stack.

13. It is possible to model ultra-lightweight threads by representing
each thread as a simple generator which steps through the states
required to implement the functionality required of the "thread". The
"execution pointer" of such threads is advanced simply by calling the
".next()" method of the "thread"s generator-iterator. (Incidentally,
as well as being highly efficient in terms of resource consumption,
these ultra-lightweight threads offer much finer control of
thread-prioritisation, thread-creation, destruction and "collection",

Now that I think I've got my head around it, I think I'm going to try
my hand at an example or two, which I will post to the newsgroup
(might be a few weeks, I'm kinda busy right now). The two main
interesting examples that come to mind are

1. A ultra-lightweight thread implementation of an
   asyncore/medusa/twisted style socket server.
2. A coroutine based version of something that is normally
   (relatively) resource-intensive: a series of SAX2 callbacks to a
   chain of ContentHandlers/Filters.

Lastly, I think I'm beginning to "get" continuations. Although the
examples given in Christian Tismer's paper "Continuations and
Stackless Python"[1] are difficult to follow without the definition
of a continuation object (which seems to require intimate familiarity
with the Stackless implementation), I think I understand the general
principle.

And it's a principle I'm not really interested in pursuing, because I
can't see that it will make me any more productive, or my programs
more efficient (than they could be using the "coroutines" and
"weightless-threads" described above). And I can imagine that
continuation code could be a little brain-bending to write (thus
making me *less* productive %-), as this example of a while loop
(which I sort of understand) shows:

def looptest(n):
    this = continuation.current()
    k = this.update(n)
    if k:

But I can see the power inherent in the continuations paradigm.

Again, many thanks to all who responded to my original post.

Reading material.


kind regards,

alan kennedy
From: Terry Reedy
Subject: Re: PEP 288: Generator Attributes
View: Complete Thread (11 articles)
Original Format
Newsgroups: comp.lang.python
Date: 2002-12-09 13:19:27 PST
"John Roth" <> wrote in message
> "Rocco Moretti" <> wrote
> > process of set attributes -> call processing function -> repeat is an
> > awkward way of achieving what you really want to do - continue the
> > generator with given data. I'd agree with what Guido said on
> > Python-Dev: there is no advantage in this scheme over passing a
> > mutable dummy object when the generator is created (you could even
> > call it __self__ if you really wanted to ...). 
> I thoroughly agree.

Me too.

> with some bemusement, because, for me at least, the most
> intuitive thing to do is make yield a built in function rather than
> a statement.

But then we would need another keyword (possible, of course) to tell
the compiler to change the generated bytecode.  It would also break
the parallel between 'return something' and 'yield something'.  The
only difference in execution is that yield does less -- by *not*
deleting the execution frame.

> That way, it can return a value, which is what
> was passed in on the reinvocation.

One of the design features of generators is that resumption is
extremely quick precisely because the argument processing and function
setup steps are bypassed.  Just restore the execution frame and go on
with the next statement [bytecode actually].

Another design feature is that they are meant to work seamlessly with
for statements, with only one explicit call (to produce the iterator
that 'for' needs).  In this context, passing in additional values is
not possible.

Another problem is with multiple yields.  Consider the following
hypothetical generator with the proposed yield() 'function' (see next
comment for why the '' marks):

def genf(a):
gen = genf(1)

Then the documentation must say: on the first call to next(), pass
one arg; on the next, pass two; finally, don't pass any.  Not good.

> The trouble with the whole thing is that conceptually, "yield" has two
> values: one that it returns to the caller, and one that it
> gets from the caller. It's a two way pipe.

(You mean, 'would be'.)  Pipes and such usually have explicit read and
write methods that make the order of operations clear.  The yield()
proposal hijacks function notation to squash a read and write
together -- with the order switched at the two ends.

To explain: y=f(x) usually means 'set y to a value depending on
(calculated from) x'  -- this is the meaning of 'function'.
Operationally, a function call means to send x to process f to
initialize the corresponding parameter,  suspend operation while f
operates, and set y to the value returned by f.  The combined
next(arg)/yield() idea warps and shuffles these semantics.
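For the record, later Python versions (2.5, via PEP 342) adopted essentially the proposal under discussion as the generator method gen.send(value). The sketch below is mine, in modern Python, not code from the thread; note how the one-call shift Terry describes shows up as the need to "prime" the generator with an initial next():

```python
def accumulate():
    total = 0
    while True:
        x = yield total   # deliver total to the caller, then receive the next x
        total += x

acc = accumulate()
next(acc)            # prime: run to the first yield (its value is discarded)
print(acc.send(10))  # -> 10
print(acc.send(5))   # -> 15
```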

Let p = pipe or whatever:  Then (omitting suspend) y = yield(x) could
mean y =; p.write(x),
where (contrary to the implication of function notation) the value
read cannot depend on x since it is calculated at the other end before
the write.

The corresponding x = next(y) then means; p.write(y),
which makes the order of value passing the opposite of what a function
call means.  The only reason to write y is for a possible future call
to next().  I think it much better to separate the read and write
and pair the writing of y with the reading of the x that functionally
depends on the value written.

One could reverse the meaning of x = next(y) to be p.write(y); x =,
but the x read would still not depend on the y written since the
latter would have to be set aside and ignored until the predetermined
x was sent back.  IE, y=yield(x) would have to mean and be implemented as
  <hidden-slot> =; p.write(x); y=<hidden-slot>

In either case, there is a synchronization problem in that the values
sent to the generator are shifted by one call.  The last
call must send a dummy that will be ignored.  On the other hand, if,
for instance, there is one yield function which depends on the
variable reset by the yield function, then that variable must be
separately initialized in the initial genf() call.

So it arguably makes just as much sense to initialize via genf() with
a mutable with paired write/read, send/receive, or set/get methods or
the equivalent operations (as with dicts).  One can then make explicit
calls to pass data in either direction without fake 'function' calls.
If one pairs 'yield None' with '' or 'for dummy in
genf(mutable):', one can even do all value passing, in both
directions, via the two-way channel or mutable and use 'yield'
strictly for suspend/resume synchronization of the coroutines.
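The mutable-channel approach advocated above can be sketched concretely. This is my own illustration in modern Python (the dict channel and the name doubler are invented here): all data passes through the shared mutable object in both directions, and a bare yield is used purely for suspend/resume synchronization.

```python
def doubler(chan):
    # read input and write output through the shared mutable channel;
    # "yield" carries no values, only suspends
    while True:
        chan['out'] = chan['in'] * 2
        yield

chan = {'in': None, 'out': None}
g = doubler(chan)
results = []
for value in [1, 2, 3]:
    chan['in'] = value           # write the input...
    next(g)                      # ...resume the coroutine for one step...
    results.append(chan['out'])  # ...then read the output
print(results)  # -> [2, 4, 6]
```

Because reads and writes on the channel are explicit, there is no one-call shift and no dummy final value, which is exactly the synchronization problem the post raises against the yield() proposal.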

This proposal comes close to asking for what the original Stackless did
with continuations.  These allowed some mind-boggling code.  Almost
too fancy for what Python is meant to be 8-).

Terry J. Reedy
From: Tim Peters
Subject: PEP 255: Simple Generators, Revised Posting
This is the only article in this thread
View: Original Format
Newsgroups: comp.lang.python.announce
Date: 2001-06-24 15:31:44 PST
Major revision:  more details about exceptions, return vs StopIteration, and
interactions with try/except/finally; more Q&A; and a BDFL Pronouncement.  The
reference implementation appears solid and works as described here in all
respects, so I expect this will be the last major revision (and so also last
full posting) of this PEP.

The output below is in ndiff format (see Tools/scripts/ in your Python
distribution).  Just the new text can be seen in HTML form here:

"Feature discussions" should take place primarily on the Python Iterators list:

Implementation discussions may wander in and out of Python-Dev too.

  PEP: 255
  Title: Simple Generators
- Version: $Revision: 1.3 $
?                       ^
+ Version: $Revision: 1.12 $
?                       ^^
  Author: Neil Schemenauer, Tim Peters, Magnus Lie Hetland
  Status: Draft
  Type: Standards Track
  Requires: 234
  Created: 18-May-2001
  Python-Version: 2.2
- Post-History: 14-Jun-2001
+ Post-History: 14-Jun-2001, 23-Jun-2001
?                          +++++++++++++


      This PEP introduces the concept of generators to Python, as well
      as a new statement used in conjunction with them, the "yield"
      statement.


      When a producer function has a hard enough job that it requires
      maintaining state between values produced, most programming languages
      offer no pleasant and efficient solution beyond adding a callback
      function to the producer's argument list, to be called with each value

      For example, in the standard library takes this approach:
      the caller must pass a "tokeneater" function to tokenize(), called
      whenever tokenize() finds the next token.  This allows tokenize to be
      coded in a natural way, but programs calling tokenize are typically
      convoluted by the need to remember between callbacks which token(s)
      were seen last.  The tokeneater function in is a good
      example of that, maintaining a state machine in global variables, to
      remember across callbacks what it has already seen and what it hopes to
      see next.  This was difficult to get working correctly, and is still
      difficult for people to understand.  Unfortunately, that's typical of
      this approach.

      An alternative would have been for tokenize to produce an entire parse
      of the Python program at once, in a large list.  Then tokenize clients
      could be written in a natural way, using local variables and local
      control flow (such as loops and nested if statements) to keep track of
      their state.  But this isn't practical:  programs can be very large, so
      no a priori bound can be placed on the memory needed to materialize the
      whole parse; and some tokenize clients only want to see whether
      something specific appears early in the program (e.g., a future
      statement, or, as is done in IDLE, just the first indented statement),
      and then parsing the whole program first is a severe waste of time.

      Another alternative would be to make tokenize an iterator[1],
      delivering the next token whenever its .next() method is invoked.  This
      is pleasant for the caller in the same way a large list of results
      would be, but without the memory and "what if I want to get out early?"
      drawbacks.  However, this shifts the burden on tokenize to remember
      *its* state between .next() invocations, and the reader need only
      glance at tokenize.tokenize_loop() to realize what a horrid chore that
      would be.  Or picture a recursive algorithm for producing the nodes of
      a general tree structure:  to cast that into an iterator framework
      requires removing the recursion manually and maintaining the state of
      the traversal by hand.

      A fourth option is to run the producer and consumer in separate
      threads.  This allows both to maintain their states in natural ways,
      and so is pleasant for both.  Indeed, Demo/threads/ in the
      Python source distribution provides a usable synchronized-communication
      class for doing that in a general way.  This doesn't work on platforms
      without threads, though, and is very slow on platforms that do
      (compared to what is achievable without threads).

      A final option is to use the Stackless[2][3] variant implementation of
      Python instead, which supports lightweight coroutines.  This has much
      the same programmatic benefits as the thread option, but is much more
      efficient.  However, Stackless is a controversial rethinking of the
      Python core, and it may not be possible for Jython to implement the
      same semantics.  This PEP isn't the place to debate that, so suffice it
      to say here that generators provide a useful subset of Stackless
      functionality in a way that fits easily into the current CPython
      implementation, and is believed to be relatively straightforward for
      other Python implementations.

      That exhausts the current alternatives.  Some other high-level
      languages provide pleasant solutions, notably iterators in Sather[4],
      which were inspired by iterators in CLU; and generators in Icon[5], a
      novel language where every expression "is a generator".  There are
      differences among these, but the basic idea is the same:  provide a
      kind of function that can return an intermediate result ("the next
      value") to its caller, but maintaining the function's local state so
      that the function can be resumed again right where it left off.  A
      very simple example:

          def fib():
              a, b = 0, 1
              while 1:
                  yield b
                  a, b = b, a+b

      When fib() is first invoked, it sets a to 0 and b to 1, then yields b
      back to its caller.  The caller sees 1.  When fib is resumed, from its
      point of view the yield statement is really the same as, say, a print
      statement:  fib continues after the yield with all local state intact.
      a and b then become 1 and 1, and fib loops back to the yield, yielding
      1 to its invoker.  And so on.  From fib's point of view it's just
      delivering a sequence of results, as if via callback.  But from its
      caller's point of view, the fib invocation is an iterable object that
      can be resumed at will.  As in the thread approach, this allows both
      sides to be coded in the most natural ways; but unlike the thread
      approach, this can be done efficiently and on all platforms.  Indeed,
      resuming a generator should be no more expensive than a function call.
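The caller's side of the fib example can be shown directly. This sketch uses modern Python (the built-in next() and itertools.islice stand in for the .next() method of the era); the fib body is the PEP's own:

```python
from itertools import islice

def fib():
    a, b = 0, 1
    while True:       # "while 1" in the PEP's Python
        yield b
        a, b = b, a + b

# The fib() call is an iterable object resumable at will; each
# resumption continues the frozen frame right after the yield.
first8 = list(islice(fib(), 8))
print(first8)  # -> [1, 1, 2, 3, 5, 8, 13, 21]
```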

      The same kind of approach applies to many producer/consumer functions.
      For example, could yield the next token instead of invoking
      a callback function with it as argument, and tokenize clients could
      iterate over the tokens in a natural way:  a Python generator is a kind
      of Python iterator[1], but of an especially powerful kind.

- Specification
+ Specification:  Yield
?              ++++++++

      A new statement is introduced:

          yield_stmt:    "yield" expression_list

      "yield" is a new keyword, so a future statement[8] is needed to phase
-     this in.  [XXX spell this out]
+     this in.  [XXX spell this out -- but new keywords have ripple effects
+     across tools too, and it's not clear this can be forced into the future
+     framework at all -- it's not even clear that Python's parser alone can
+     be taught to swing both ways based on a future stmt]

      The yield statement may only be used inside functions.  A function that
-     contains a yield statement is called a generator function.
+     contains a yield statement is called a generator function.  A generator
?                                                               +++++++++++++
+     function is an ordinary function object in all respects, but has the
+     new CO_GENERATOR flag set in the code object's co_flags member.

      When a generator function is called, the actual arguments are bound to
      function-local formal argument names in the usual way, but no code in
      the body of the function is executed.  Instead a generator-iterator
      object is returned; this conforms to the iterator protocol[6], so in
      particular can be used in for-loops in a natural way.  Note that when
      the intent is clear from context, the unqualified name "generator" may
      be used to refer either to a generator-function or a generator-
      iterator.

      Each time the .next() method of a generator-iterator is invoked, the
      code in the body of the generator-function is executed until a yield
      or return statement (see below) is encountered, or until the end of
      the body is reached.

      If a yield statement is encountered, the state of the function is
      frozen, and the value of expression_list is returned to .next()'s
      caller.  By "frozen" we mean that all local state is retained,
      including the current bindings of local variables, the instruction
      pointer, and the internal evaluation stack:  enough information is
      saved so that the next time .next() is invoked, the function can
      proceed exactly as if the yield statement were just another external
      call.

+     Restriction:  A yield statement is not allowed in the try clause of a
+     try/finally construct.  The difficulty is that there's no guarantee
+     the generator will ever be resumed, hence no guarantee that the finally
+     block will ever get executed; that's too much a violation of finally's
+     purpose to bear.
+ Specification:  Return
      A generator function can also contain return statements of the form:

          "return"

      Note that an expression_list is not allowed on return statements
      in the body of a generator (although, of course, they may appear in
      the bodies of non-generator functions nested within the generator).

-     When a return statement is encountered, nothing is returned, but a
+     When a return statement is encountered, control proceeds as in any
+     function return, executing the appropriate finally clauses (if any
-     StopIteration exception is raised, signalling that the iterator is
?                                                           ------------
+     exist).  Then a StopIteration exception is raised, signalling that the
?    ++++++++++++++++
-     exhausted.   The same is true if control flows off the end of the
+     iterator is exhausted.   A StopIteration exception is also raised if
+     control flows off the end of the generator without an explicit return.
-     function.  Note that return means "I'm done, and have nothing
?   -----------
+     Note that return means "I'm done, and have nothing interesting to
?                                                       +++++++++++++++
-     interesting to return", for both generator functions and non-generator
?    ---------------
+     return", for both generator functions and non-generator functions.
?                                                            +++++++++++
-     functions.
+     Note that return isn't always equivalent to raising StopIteration:  the
+     difference lies in how enclosing try/except constructs are treated.
+     For example,
+         >>> def f1():
+         ...     try:
+         ...         return
+         ...     except:
+         ...        yield 1
+         >>> print list(f1())
+         []
+     because, as in any function, return simply exits, but
+         >>> def f2():
+         ...     try:
+         ...         raise StopIteration
+         ...     except:
+         ...         yield 42
+         >>> print list(f2())
+         [42]
+     because StopIteration is captured by a bare "except", as is any
+     exception.
+ Specification:  Generators and Exception Propagation
+     If an unhandled exception-- including, but not limited to,
+     StopIteration --is raised by, or passes through, a generator function,
+     then the exception is passed on to the caller in the usual way, and
+     subsequent attempts to resume the generator function raise
+     StopIteration.  In other words, an unhandled exception terminates a
+     generator's useful life.
+     Example (not idiomatic but to illustrate the point):
+     >>> def f():
+     ...     return 1/0
+     >>> def g():
+     ...     yield f()  # the zero division exception propagates
+     ...     yield 42   # and we'll never get here
+     >>> k = g()
+     >>>
+     Traceback (most recent call last):
+       File "<stdin>", line 1, in ?
+       File "<stdin>", line 2, in g
+       File "<stdin>", line 2, in f
+     ZeroDivisionError: integer division or modulo by zero
+     >>>  # and the generator cannot be resumed
+     Traceback (most recent call last):
+       File "<stdin>", line 1, in ?
+     StopIteration
+     >>>
+ Specification:  Try/Except/Finally
+     As noted earlier, yield is not allowed in the try clause of a try/
+     finally construct.  A consequence is that generators should allocate
+     critical resources with great care.  There is no restriction on yield
+     otherwise appearing in finally clauses, except clauses, or in the try
+     clause of a try/except construct:
+     >>> def f():
+     ...     try:
+     ...         yield 1
+     ...         try:
+     ...             yield 2
+     ...             1/0
+     ...             yield 3  # never get here
+     ...         except ZeroDivisionError:
+     ...             yield 4
+     ...             yield 5
+     ...             raise
+     ...         except:
+     ...             yield 6
+     ...         yield 7     # the "raise" above stops this
+     ...     except:
+     ...         yield 8
+     ...     yield 9
+     ...     try:
+     ...         x = 12
+     ...     finally:
+     ...         yield 10
+     ...     yield 11
+     >>> print list(f())
+     [1, 2, 4, 5, 8, 9, 10, 11]
+     >>>


          # A binary tree class.
          class Tree:

              def __init__(self, label, left=None, right=None):
                  self.label = label
                  self.left = left
                  self.right = right

              def __repr__(self, level=0, indent="    "):
                  s = level*indent + `self.label`
                  if self.left:
                      s = s + "\n" + self.left.__repr__(level+1, indent)
                  if self.right:
                      s = s + "\n" + self.right.__repr__(level+1, indent)
                  return s

              def __iter__(self):
                  return inorder(self)

          # Create a Tree from a list.
          def tree(list):
              n = len(list)
              if n == 0:
                  return []
              i = n / 2
              return Tree(list[i], tree(list[:i]), tree(list[i+1:]))

          # A recursive generator that generates Tree leaves in in-order.
          def inorder(t):
              if t:
                  for x in inorder(t.left):
                      yield x
                  yield t.label
                  for x in inorder(t.right):
                      yield x

          # Show it off: create a tree.
          t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
          # Print the nodes of the tree in in-order.
          for x in t:
              print x,

          # A non-recursive generator.
          def inorder(node):
              stack = []
              while node:
                  while node.left:
                      stack.append(node)
                      node = node.left
                  yield node.label
                  while not node.right:
                      try:
                          node = stack.pop()
                      except IndexError:
                          return
                      yield node.label
                  node = node.right

          # Exercise the non-recursive generator.
          for x in t:
              print x,

+     Both output blocks display:
+         A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

  Q & A

+     Q. Why not a new keyword instead of reusing "def"?
+     A. See BDFL Pronouncements section below.
-     Q. Why a new keyword?  Why not a builtin function instead?
+     Q. Why a new keyword for "yield"?  Why not a builtin function instead?
?                         ++++++++++++

      A. Control flow is much better expressed via keyword in Python, and
         yield is a control construct.  It's also believed that efficient
         implementation in Jython requires that the compiler be able to
         determine potential suspension points at compile-time, and a new
-        keyword makes that easy.
+        keyword makes that easy.  The CPython reference implementation also
+        exploits it heavily, to detect which functions *are* generator-
+        functions (although a new keyword in place of "def" would solve that
+        for CPython -- but people asking the "why a new keyword?" question
+        don't want any new keyword).
+     Q: Then why not some other special syntax without a new keyword?  For
+        example, one of these instead of "yield 3":
+        return 3 and continue
+        return and continue 3
+        return generating 3
+        continue return 3
+        return >> , 3
+        from generator return 3
+        return >> 3
+        return << 3
+        >> 3
+        << 3
+     A: Did I miss one <wink>?  Out of hundreds of messages, I counted two
+        suggesting such an alternative, and extracted the above from them.
+        It would be nice not to need a new keyword, but nicer to make yield
+        very clear -- I don't want to have to *deduce* that a yield is
+        occurring from making sense of a previously senseless sequence of
+        keywords or operators.  Still, if this attracts enough interest,
+        proponents should settle on a single consensus suggestion, and Guido
+        will Pronounce on it.
+     Q. Why allow "return" at all?  Why not force termination to be spelled
+        "raise StopIteration"?
+     A. The mechanics of StopIteration are low-level details, much like the
+        mechanics of IndexError in Python 2.1:  the implementation needs to
+        do *something* well-defined under the covers, and Python exposes
+        these mechanisms for advanced users.  That's not an argument for
+        forcing everyone to work at that level, though.  "return" means "I'm
+        done" in any kind of function, and that's easy to explain and to use.
+        Note that "return" isn't always equivalent to "raise StopIteration"
+        in try/except construct, either (see the "Specification: Return"
+        section).
+     Q. Then why not allow an expression on "return" too?
+     A. Perhaps we will someday.  In Icon, "return expr" means both "I'm
+        done", and "but I have one final useful value to return too, and
+        this is it".  At the start, and in the absence of compelling uses
+        for "return expr", it's simply cleaner to use "yield" exclusively
+        for delivering values.
+ BDFL Pronouncements
+     Issue:  Introduce another new keyword (say, "gen" or "generator") in
+     place of "def", or otherwise alter the syntax, to distinguish
+     generator-functions from non-generator functions.
+     Con:  In practice (how you think about them), generators *are*
+     functions, but with the twist that they're resumable.  The mechanics of
+     how they're set up is a comparatively minor technical issue, and
+     introducing a new keyword would unhelpfully overemphasize the
+     mechanics of how generators get started (a vital but tiny part of a
+     generator's life).
+     Pro:  In reality (how you think about them), generator-functions are
+     actually factory functions that produce generator-iterators as if by
+     magic.  In this respect they're radically different from non-generator
+     functions, acting more like a constructor than a function, so reusing
+     "def" is at best confusing.  A "yield" statement buried in the body is
+     not enough warning that the semantics are so different.
+     BDFL:  "def" it stays.  No argument on either side is totally
+     convincing, so I have consulted my language designer's intuition.  It
+     tells me that the syntax proposed in the PEP is exactly right - not too
+     hot, not too cold.  But, like the Oracle at Delphi in Greek mythology,
+     it doesn't tell me why, so I don't have a rebuttal for the arguments
+     against the PEP syntax.  The best I can come up with (apart from
+     agreeing with the rebuttals ... already made) is "FUD".  If this had
+     been part of the language from day one, I very much doubt it would have
+     made Andrew Kuchling's "Python Warts" page.

  Reference Implementation

-     A preliminary patch against the CVS Python source is available[7].
+     The current implementation, in a preliminary state (no docs and no
+     focused tests), is part of Python's CVS development tree[9].
+     Using this requires that you build Python from source.
+     This was derived from an earlier patch by Neil Schemenauer[7].

  Footnotes and References

      [1] PEP 234,
      [3] PEP 219,
      [4] "Iteration Abstraction in Sather"
          Murer , Omohundro, Stoutamire and Szyperski

      [6] The concept of iterators is described in PEP 234

+     [9] To experiment with this implementation, check out Python from CVS
+         according to the instructions at


      This document has been placed in the public domain.

  Local Variables:
  mode: indented-text
  indent-tabs-mode: nil

[Dec 26, 2001] coroutines and string processing

From: Tim Peters
Subject: RE: Stackless & String-processing
Newsgroups: comp.lang.python
View: Complete Thread (32 articles) | Original Format

Date: 1999/07/15

[Neel Krishnaswami]
> ...
> I've been looking at Icon, and it occurs to me that if coroutines and
> generators were available at the Python level, it might yield a way of
> doing string processing that is more "Pythonic" than regexps.

Yes, and you can find much more about this in the very early days of the
String SIG archives.

> Regexps are nice because when you have a pattern that they can
> directly represent, then you can simply specify a pattern and then you
> don't have to worry about the tedious bookkeeping of looping over the
> string and keeping track of the state variable.
> However, when you need to match a pattern that a regexp can't match,
> then suddenly you need to break the string into tokens with regexps
> and then loop over the pieces and keep track of a bunch of state
> variables that don't necessarily correspond to the pieces you are
> actually interested in.
> This is unpleasant, because a) clumsy bookkeeping is bad, and b)
> there's two different sorts of syntax needed to do basically
> similar tasks.

The latter is one of the arguments Griswold gave in the paper that
introduced Icon, contrasting Icon's uniform approach to SNOBOL4's distinct
pattern and procedural sublanguages.

I don't think he'd make the same argument today!  Icon is an idiosyncratic
delight, but most string pattern matching was in fact easier (to write,
modify, or read) in SNOBOL4.  Once you start using Icon for complex pattern
matching, you'll soon discover that it's very hard to go back to old code
and disentangle "the pattern" from "the processing" -- *because* everything
looks the same, and it's all intermixed.  The distinct sublanguages in
SNOBOL4 enforced a clean separation; and that was more often an aid than a
hindrance.
Have noted before that writing string matching code in Icon feels a lot like
writing in an assembly language for SNOBOL4.  Of course the latter didn't
have Icon's power in other areas (e.g. it's at least possible to program
structural pattern-matchers in Icon, using generators to match e.g. tree
patterns; SNOBOL4's patterns started & ended with strings).

> If we could compose generators just like functions, then the bookkeeping
> can be abstracted away and the same approach will work for arbitrarily
> complicated parsing tasks.

The real advantage of regexps is speed, & that's probably why they'll
always be much more popular.  SNOBOL4 and Icon didn't define "bal" builtins
because you couldn't code that pattern yourself <wink>.  bal is beyond a
regexp's abilities, but it's still a simple kind of pattern, and just runs
too darned slow if implemented via the recursive definition as run with the
general backtracking machinery.
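To make "bal" concrete: SNOBOL4's bal matches a nonempty prefix balanced with respect to parentheses, which a (classical) regular expression cannot express because it needs an unbounded counter. The sketch below is mine, not from the post; it enumerates balanced prefixes lazily with a generator, in the alternative-producing spirit of Icon:

```python
def bal(s):
    """Yield each nonempty prefix of s that is balanced w.r.t. parentheses."""
    depth = 0
    for i, ch in enumerate(s):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                return          # no balanced prefix can extend past here
        if depth == 0:
            yield s[:i + 1]     # a candidate match; caller may backtrack

print(list(bal("(a)b(c)")))  # -> ['(a)', '(a)b', '(a)b(c)']
```

A backtracking caller can consume these candidates one at a time, resuming bal only when a shorter match fails, which is exactly the generator-as-pattern-matcher idea under discussion.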

> So I think it would be nice if these lovely toys were available at
> the Python level. Is this a possibility?

I've been bugging Guido about generators since '91, but for the billion and
one uses other than string matching.  Anything is possible -- except that
I'll ever stop bugging him <wink>.

> (I'd love to be able to define coroutines in Python like so:
> def evens(z):
>     for elt in z:
>         if elt % 2 == 0:
>             suspend(elt)

How do you intend to resume it?

> It would probably require some rethinking of Python's iteration
> protocol though.)

That's not a hangup; Guido already knows how to do it cleanly.  Like any Big
New Feature there are real questions about costs vs benefits, and so far
this stuff has been widely viewed as "too esoteric".  I think it's natural
as breathing, to the point that a *non*-resumable function strikes me as
never inhaling <wink>.

For now, there's a working implementation of a generator protocol (somewhat
more flexible than Icon's) in the source distribution, under
Demo/threads/  I also posted a general (thread-based) coroutine
implementation a bit over 5 years ago.  Building these on Christian's
stackless Python instead should run much faster.

BTW, generators are much easier to implement than coroutines -- the former
don't require threads or stacklessness or continuations under the covers,
just some relatively straightforward changes to Python's current
implementation (Steven Majewski noted this 5 years ago).  This is why
generators work in all ports of Icon, but coroutines are an optional feature
that's supported only if a platform guru has taken the time to write the
platform-dependent context-switching assembly code Icon coexps require.

degeneratoredly y'rs  - tim

[Dec 26, 2001] Nice post with an explanation of coroutines to a person outside the academic community

From: Daniel Larsson (
Subject: Re: Python and Java
Newsgroups: comp.lang.python, comp.lang.modula3
View: Complete Thread (22 articles) | Original Format

Date: 1996/07/03

Aaron Watters wrote:
> James C. Phillips wrote:
> >
> > Daniel Larsson wrote:
> > > Coroutines are still rare in newer languages. I'm sure there
> > > are more, but I know only of Modula-2 and BETA.
> >
> > For us non-computer-science-types, what is a coroutine?
> > I've never heard this term before.
> >
> > -Jim
> I've never really used them, but as I recall they are like
> subroutines, except that more than one coroutine can be
> active at once and one coroutine can explicitly give control
> to another or something... I personally don't understand why
> you need programming language features to emulate this kind of
> behaviour, but I'm probably ill informed and wrong.
> Maybe some expert can tell me: is there anything you can do
> with coroutines that can't be emulated directly with instances
> of classes in Python or M3?  Please educate...  -- Aaron Watters

I would hardly describe myself as an expert in coroutines, but

Suppose you have a binary tree class:

class Node:
   def __init__(self, elem, left=None, right=None):
      self.elem = elem
      self.left, self.right = left, right

class BinTree:
   def __init__(self):
      self.root = None

   def scan(self, node):
      if node:
         self.scan(node.left)
         suspend node.elem # Suspend coroutine and return value
         self.scan(node.right)

   coroutine traverse(self):
      self.scan(self.root)
      return None # Terminate coroutine

A coroutine has its own stack. When a coroutine is called, the
caller's stack is saved, and the execution is moved to the callee's
stack. When a coroutine suspends, stacks are reversed again.

In the above case, each call to traverse will do one inorder step
in the tree and return the value at that node. Here's how to use
it to print the contents of a binary tree:

tree = BinTree()
... # init tree
while 1:
   elem = tree.traverse()
   if elem == None: break
   print elem
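[Editorial note] The hypothetical `suspend`/`coroutine` syntax above maps directly onto the generators Python later acquired. A runnable sketch of the same inorder walk (names are illustrative):

```python
class Node:
    def __init__(self, elem, left=None, right=None):
        self.elem = elem
        self.left, self.right = left, right

def inorder(node):
    """Each yield suspends the traversal; next() resumes it in place."""
    if node:
        for e in inorder(node.left):
            yield e
        yield node.elem
        for e in inorder(node.right):
            yield e

tree = Node(2, Node(1), Node(3))
print(list(inorder(tree)))  # [1, 2, 3]
```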

This could certainly be implemented in Python without coroutines,
and I don't even know if this is even a good example of the
benefits of coroutines. Anyway, some problems where you would like
to use threads, might be easier to solve with coroutines, since
you don't have any synchronization problems (you have explicit
control of when switching between threads).

Hope I didn't confuse too many people out there...
Daniel Larsson, ABB Industrial Systems AB
From: Christian Tismer (
Subject: Re: Iterators & generators (RE: Real Problems with Python)
View: Complete Thread (21 articles)
Original Format
Newsgroups: comp.lang.python
Date: 2000/02/15

Andrew Cooke wrote:
> In article <000001bf768e$48e40580$45a0143f@tim>,
>   "Tim Peters" <> wrote:
> >    def traverse_post(self):
> >        for child in self.left, self.right:
> >            if child is not None:
> >                suspend child.traverse_post()
> >        suspend
> That finally hammered home to me just what continuations are all about.
> Does anyone have something similarly elegant that shows a coroutine?
> I've just had a look at the stackless python documentation and
> coroutines seem to be described as something coming out of a single
> detailed example.  Is there a simpler definition?  (I did look back
> through some posts on Deja, but there was nothing recent that seemed to
> explain what a coroutine actually is - sorry if I've missed something
> obvious).

What did you read, actually?
Here some pointers:
IPC8 paper:
The latter gives you a bit of understanding what a continuation is.

Tim Peters about coroutines can be found here:

More on coroutines by Sam Rushing:

On Scheme, continuations, generators and coroutines:

Revised Scheme 5 report:

The following books are also highly recommended:
- Andrew W. Appel, Compiling with Continuations, Cambridge University
Press, 1992
- Daniel P. Friedman, Mitchell Wand, and Christopher T. Haynes,
Essentials of Programming Languages, MIT Press, 1993
- Christopher T. Haynes, Daniel P. Friedman, and Mitchell Wand,
Continuations and Coroutines,  Computer Languages, 11(3/4): 143-153,
- Strachey and Wadsworth, Continuations: A mathematical semantics which
can deal with full jumps.  Technical monograph PRG-11, Programming
Research Group, Oxford, 1974.

I don't think this is easy stuff at all, and you can't expect
to find a simple answer by skimming a couple of web pages.
It took me a lot of time to understand a fair part of this,
and this is also a showstopper to get this stuff to be used.

> Also, comp.lang.lisp is currently dissing continuations.  Would that be
> because the alternative is to pass the code that will process the node
> into the iterator as a (first class) function?  Obviously, in this case,
> yes, but is that true in general (are there examples where continuations
> or coroutines make something possible that really is tricky to do in
> other ways)?

An iterator is quite an easy thing, and it can be implemented
without continuations rather easily. Continuations cover problems
of another order of magnitude. It is due to too simple examples
that everybody thinks that coroutines and generators are the
only target. Continuations can express any kind of control flow,
and they can model iterators, coroutines and generators easily.
The opposite is not true!

I know this isn't enough to convince you, but at the time I have
no chance. I need to build some very good demo applications
which use continuations without exposing them to the user,
and this is admittedly difficult.

ciao - chris

Christian Tismer             :^)   <>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Düppelstr. 31                :    *Starship*
12163 Berlin                 :     PGP key ->
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     we're tired of banana software - shipped green, ripens at home

[May 15 2001] call-cc and coroutines in scheme

We add a new special form, (coroutine (lambda (v) <body>)) that evaluates to a coroutine object.

Name a coroutine the same way you name an ordinary function; e.g.

 (define Coroutine-1 (coroutine (lambda (v) ...)))

The body of a coroutine is a non-terminating expression using standard Scheme constructs and one additional form. This is (resume <co> <val>), where <co> refers to any other coroutine defined in the same way.

The parameter v is passed the first time the coroutine is called or resumed.

Suppose coroutine A executes (resume B x).

[May 15 2001] coroutines for Ruby

PEP 219 -- Stackless Python

This PEP discusses changes required to core Python in order to efficiently support generators, microthreads and coroutines. It is related to PEP 220, which describes how Python should be extended to support these facilities. The focus of this PEP is strictly on the changes required to allow these extensions to work.

While these PEPs are based on Christian Tismer's Stackless[1] implementation, they do not regard Stackless as a reference implementation. Stackless (with an extension module) implements continuations, and from continuations one can implement coroutines, microthreads (as has been done by Will Ware[2]) and generators. But in more than a year, no one has found any other productive use of continuations, so there seems to be no demand for their support.

However, Stackless support for continuations is a relatively minor piece of the implementation, so one might regard it as "a" reference implementation (rather than "the" reference implementation).


Generators and coroutines have been implemented in a number of languages in a number of ways. Indeed, Tim Peters has done pure Python implementations of generators[3] and coroutines[4] using threads (and a thread-based coroutine implementation exists for Java). However, the horrendous overhead of a thread-based implementation severely limits the usefulness of this approach.

Microthreads (a.k.a "green" or "user" threads) and coroutines involve transfers of control that are difficult to accommodate in a language implementation based on a single stack. (Generators can be done on a single stack, but they can also be regarded as a very simple case of coroutines.)

Real threads allocate a full-sized stack for each thread of control, and this is the major source of overhead. However, coroutines and microthreads can be implemented in Python in a way that involves almost no overhead. This PEP, therefore, offers a way of making Python able to realistically manage thousands of separate "threads" of activity (vs. today's limit of perhaps dozens of separate threads of activity).

Another justification for this PEP (explored in PEP 220) is that coroutines and generators often allow a more direct expression of an algorithm than is possible in today's Python.
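A small illustration of that "more direct expression" claim (an editorial sketch, not part of the PEP): flattening a nested list with a generator keeps the natural recursive shape, with no explicit stack or accumulator.

```python
def flatten(x):
    """Yield the leaves of an arbitrarily nested list, in order."""
    for item in x:
        if isinstance(item, list):
            for leaf in flatten(item):   # inner recursion suspends transparently
                yield leaf
        else:
            yield item

result = list(flatten([1, [2, [3, 4]], 5]))  # [1, 2, 3, 4, 5]
```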


The first thing to note is that, while Python mingles interpreter data (normal C stack usage) with Python data (the state of the interpreted program) on the stack, the two are logically separate. They just happen to use the same stack.

A real thread gets something approaching a process-sized stack because the implementation has no way of knowing how much stack space the thread will require. The stack space required for an individual frame is likely to be reasonable, but stack switching is an arcane and non-portable process, not supported by C.

Once Python stops putting Python data on the C stack, however, stack switching becomes easy.

The fundamental approach of the PEP is based on these two ideas. First, separate C's stack usage from Python's stack usage. Secondly, associate with each frame enough stack space to handle that frame's execution.

In the normal usage, Stackless Python has a normal stack structure, except that it is broken into chunks. But in the presence of a coroutine / microthread extension, this same mechanism supports a stack with a tree structure. That is, an extension can support transfers of control between frames outside the normal "call / return" path.


The major difficulty with this approach is C calling Python. The problem is that the C stack now holds a nested execution of the byte-code interpreter. In that situation, a coroutine / microthread extension cannot be permitted to transfer control to a frame in a different invocation of the byte-code interpreter. If a frame were to complete and exit back to C from the wrong interpreter, the C stack could be trashed.

The ideal solution is to create a mechanism where nested executions of the byte code interpreter are never needed. The easy solution is for the coroutine / microthread extension(s) to recognize the situation and refuse to allow transfers outside the current invocation.

We can categorize code that involves C calling Python into two camps: Python's implementation, and C extensions. And hopefully we can offer a compromise: Python's internal usage (and C extension writers who want to go to the effort) will no longer use a nested invocation of the interpreter. Extensions which do not go to the effort will still be safe, but will not play well with coroutines / microthreads.

Generally, when a recursive call is transformed into a loop, a bit of extra bookkeeping is required. The loop will need to keep its own "stack" of arguments and results since the real stack can now only hold the most recent. The code will be more verbose, because it's not quite as obvious when we're done. While Stackless is not implemented this way, it has to deal with the same issues.

In normal Python, PyEval_EvalCode is used to build a frame and execute it. Stackless Python introduces the concept of a FrameDispatcher. Like PyEval_EvalCode, it executes one frame. But the interpreter may signal the FrameDispatcher that a new frame has been swapped in, and the new frame should be executed. When a frame completes, the FrameDispatcher follows the back pointer to resume the "calling" frame.

So Stackless transforms recursions into a loop, but it is not the FrameDispatcher that manages the frames. This is done by the interpreter (or an extension that knows what it's doing).

The general idea is that where C code needs to execute Python code, it creates a frame for the Python code, setting its back pointer to the current frame. Then it swaps in the frame, signals the FrameDispatcher and gets out of the way. The C stack is now clean - the Python code can transfer control to any other frame (if an extension gives it the means to do so).

In the vanilla case, this magic can be hidden from the programmer (even, in most cases, from the Python-internals programmer). Many situations present another level of difficulty, however.

The map builtin function involves two obstacles to this approach. It cannot simply construct a frame and get out of the way, not just because there's a loop involved, but because each pass through the loop requires some "post" processing. In order to play well with others, Stackless constructs a frame object for map itself.

Most recursions of the interpreter are not this complex, but fairly frequently, some "post" operations are required. Stackless does not fix these situations because of the amount of code changes required. Instead, Stackless prohibits transfers out of a nested interpreter. While not ideal (and sometimes puzzling), this limitation is hardly crippling.


For normal Python, the advantage to this approach is that C stack usage becomes much smaller and more predictable. Unbounded recursion in Python code becomes a memory error, instead of a stack error (and thus, in non-Cupertino operating systems, something that can be recovered from). The price, of course, is the added complexity that comes from transforming recursions of the byte-code interpreter loop into a higher order loop (and the attendant bookkeeping involved).

The big advantage comes from realizing that the Python stack is really a tree, and the frame dispatcher can transfer control freely between leaf nodes of the tree, thus allowing things like microthreads and coroutines.




[3] Demo/threads/ in the source distribution


Python semi-coroutines From: Steven D. Majewski (

Subject: Python semi-coroutines

View: (This is the only article in this thread) | Original Format

Date: 1998/07/24

Since this has come up a couple of times, I finally dug thru
my files and found a few traces of this experiment and put
the files on <>

BTW: I considered the experiment a partial failure, because it
would not work to implement full-coroutines ( because ceval.c
is recursive and some of the state is implicit in the C calling
stack. ) but I believe it did successfully implement semi-coroutines 
( i.e. Icon-like generators ) -- which seems to be all that
some folks want. 

The mods were done to python-1.0.2 
I haven't looked at how hard this would be to update to 1.5.1 

--- README --- 

There were experimental mods to python-1.0.2 to make frameobjects
into resumable continuations. ( put here for the curious and just
in case anyone wants to try a similar experiment with a more
current release. )

the files modified were frameobject.h and ceval.c

A new opcode was added for SUSPEND, but no new suspend statement
was added to the parser. I used a python disassembler/assembler
to change specific RETURN opcodes into SUSPENDs.
( various .py files here were hacked in that manner and used for testing. )

co.txt & cont.semantics were the beginning of some notes on
various thread control issues.

py-co is part of the mailing list correspondence between me, Tim and Guido
that started me on that experiment.

More notes on this later when I find time.

- Steve Majewski <sdm7g@Virginia.EDU>

Python Microthreads  Will Ware, Christian Tismer, Just van Rossum, Mike Fletcher

Microthreads are useful when you want to program many behaviors happening simultaneously. Simulations and games often want to model the simultaneous and independent behavior of many people, many businesses, many monsters, many physical objects, many spaceships, and so forth. With microthreads, you can code these behaviors as Python functions. You will still need to think a teeny bit about the fact that context switching occurs between threads, hence this documentation. The microthread package uses Stackless Python.

Microthreads switch faster and use much less memory than OS threads. You can run thousands of microthreads simultaneously. Additionally, the microthread library includes a rich set of objects for inter-thread communication, synchronization, and execution control.
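The scheduling idea can be sketched in plain Python (an editorial illustration, not the microthread package's actual API): each task is a generator, every `yield` is a voluntary context switch, and a tiny round-robin scheduler does the rest.

```python
from collections import deque

def scheduler(tasks):
    """Run generator-based micro-threads round-robin until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until it yields
            ready.append(task)  # reschedule it at the back of the queue
        except StopIteration:
            pass                # task finished; drop it

def worker(name, out, n):
    for i in range(n):
        out.append((name, i))
        yield                   # voluntary context switch

out = []
scheduler([worker("a", out, 2), worker("b", out, 2)])
print(out)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Because switching is explicit, there are no locks and no preemption; thousands of such tasks cost only a frame object each.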

Linux Today - Python-dev summary, November 1-15, 2000 Stackless Python, and microthreads

Some sort of resolution to Stackless Python seems likely for 2.1. Guido is inclined to take a solution for 90% of the problems: "I still think that the current Stackless implementation is too complex, and that continuations aren't worth the insanity they seem to require (or cause :-), but that microthreads and coroutines *are* worth having and that something not completely unlike Stackless will be one day the way to get there..." He then went on to post a strawman API for microthreads:

Christian Tismer agreed with him that continuations aren't really necessary. "I'm happy to toss continuations for core Python, if we can find the right building blocks for coro/gen/uthreads. I think Guido comes quite near this, already."

And so did Tim: "I don't know of any comprehensible application of continuations that can't be done without them."

Christian Tismer enumerated the changes to ceval.c made by Stackless:

Finally, Gordon McMillan put up a Stackless intro and tutorial:


Traditional pattern matching languages, like SNOBOL4, represent patterns as an abstract data type. A special and specific pattern evaluation routine traverses both the pattern structure and the subject text, applying evaluation rules as indicated by the pattern, and advancing or regressing depending on their outcome.

Co-expressions in Icon by Shamim Mohamed

A co-expression can be thought of as an independent, encapsulated thread-like context, where the results of the expression can be picked off one at a time. Let us consider an example: suppose you are writing a program that generates code, and you need something that will generate names for you. This expression will generate names:

   "name" || seq()

(seq produces an infinite sequence of integers, by default starting at 1.) Of course, an expression exists at one point in the code; we need to separate the evaluation of the expression from its textual position in the program. We do this by creating a co-expression:

   c := create ("name" || seq())

Now wherever a label name is desired, it can be obtained by activating the co-expression c:

   tempvar_name := @c

After a co-expression has produced all its results, further evaluation with @ will fail. The ^ operator produces a new co-expression with the same expression as its argument, but `rewound' to the beginning.

   c := ^c
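For readers who don't know Icon, a rough Python analogue (an editorial sketch; `make_names` is an illustrative name): `create` corresponds to building a generator, `@c` to `next(c)`, and the rewind operator `^c` to constructing a fresh generator.

```python
import itertools

def make_names():                 # Icon:  create ("name" || seq())
    return ("name%d" % i for i in itertools.count(1))

c = make_names()
tempvar_name = next(c)            # Icon:  @c  -> "name1"
second = next(c)                  # -> "name2"
c = make_names()                  # Icon:  c := ^c  (rewound to the beginning)
```

The analogy is loose: an exhausted Icon co-expression fails on further activation, while this particular generator is infinite.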

Coroutines In Python -- a lot of interesting links

Coroutines in Python by Tim Peters

Python Archives (1994q2) coroutines and continuations ( in Python - long discussion )

A while ago there was a (crossposted) thread about pipes: most of the discussion was about Perl, but it caused me to repost my modules ( to show how *easy* the problem was in Python! ;-), and also to think more about the utility of having a more general 2-way interface.

The tostring/tolines functions in redirect are able to package the printed output from a function and return it as a python object. This gives some of the capability of unix pipes without having to actually pipe thru the OS - the non-trivial semantic difference being that redirect does not produce any concurrency - the inner function must run to completion before the redirecting wrapper can return the collected output to the calling function.

Actual concurrency of independent threads is not required, though: coroutine would be sufficient since the read/write is a blocking operation that could be an implicit transfer of control.

I know Guido is not as enamored of coroutines as old Icon-ers like Tim and I. He has (more than once, I think) stated the view that anything you could do with coroutines, you can do with classes. This view is certainly true from a theoretical POV - complementarity of the object=method+data and closure=function+environment and all that, and coroutines are just a subset of continuations, etc., etc. But the difference is that saving of state into an object's instance variables has to be explicitly programmed, where coroutines and continuations are implicit. ( And being able to turn a pair of read/writes in a pair of functions into a coroutine return/continue, mediated transparently by a file-like python object, would seem to be a potential big win for code-reuse. )
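The contrast drawn here can be made concrete (an editorial sketch): the class must save its state by hand in instance variables, while the generator's suspended frame keeps the same state implicitly.

```python
class Fib:
    """Explicit version: the programmer saves state in instance variables."""
    def __init__(self):
        self.a, self.b = 0, 1
    def next(self):
        self.a, self.b = self.b, self.a + self.b
        return self.a

def fib():
    """Coroutine-style version: the suspended frame keeps a and b implicitly."""
    a, b = 0, 1
    while True:
        a, b = b, a + b
        yield a

explicit = Fib()
implicit = fib()
print([explicit.next() for _ in range(5)])   # [1, 1, 2, 3, 5]
print([next(implicit) for _ in range(5)])    # [1, 1, 2, 3, 5]
```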

It is possible to program a pair of classes that cooperate with each other, but there is no way to make a class that can link two arbitrary functions by their read/write methods. The flow of control is not symmetrical: one of the functions must RETURN before the other can continue.

The fact that Python frames are (heap allocated) objects and not just pushed on the stack means that they can persist whether that function is active or not. In fact, an exception passes you a link to a whole bunch of frameobjects in its traceback object.

A frame object appears to contain all of the necessary state to serve as a continuation, except that the interpreter's stack pointer and block pointer are not preserved, and when a stack frame is unwound by an exception, the stack is emptied by popping and DECREF-ing the contents.

It looks to me that it would not be difficult to give Python continuations: if a stack pointer and block pointer are added to the frameobject, all that is missing is an operation to package a frameobject-continuation, and a 'resume' operation ( a change to ceval.c ) that takes a frame/continuation object instead of creating a new frameobject.

I am far from a complete understanding of ceval and what happens to frames and blocks in Python, so I may be missing some obvious problem ( And all the details of *exactly* how to do this aren't completely clear to me yet, either ).

Comments ?

What I initially wanted was some sort of cooperative coroutine control between two functions mediated by a file-like object that used it's read/write methods to alternate control on the two functions. Except for inet servers, most of the cases where I might want threads seem to actually fit a blocking coroutine model better than parallel threads, and in the cases where it doesn't, I'm quite happy to use unix processes for threading.
I would like to have threads on dos/mac, but I don't know if any of the portable (on unix) threads packages work on those platforms, or how much work it would be to get really portable threads written in C to work in Python. But since the easiest method of adding coroutine control appears to be the more general case of adding continuations to Python, I would think that this also gives a possible method of adding preemptive threads to the Python interpreter in a portable manner by coding it into the virtual machine rather than the real machine.

Comments ?

[ Encouraging or _discouraging_ comments invited! If I'm blind to some fatal flaw or don't understand what's going on in ceval, I'd be happy to hear about it before I actually waste time in trying to implement it! ]

-- Steve Majewski (804-982-0831) <sdm7g@Virginia.EDU> --
-- UVA Department of Molecular Physiology and Biological Physics --
-- Box 449 Health Science Center Charlottesville,VA 22908 --
[ "Cognitive Science is where Philosophy goes when it dies ... if it hasn't been good!" - Jerry Fodor ]

COROUTINES - what they are and how they work.

Co-routines are of particular use in file processing and simulation applications.

In file processing, co-routines enable the programmer to separate records or characters in time, rather than in space (i.e. physical file position), and view his program in terms of data flow from module to module, rather than flow of control.
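That data-flow view is easy to demonstrate with generators (an editorial sketch): each stage consumes records from the previous module and yields records to the next, so the program reads as a flow of data rather than a flow of control.

```python
def records(lines):
    """Stage 1: strip raw lines into records."""
    for line in lines:
        yield line.strip()

def nonblank(recs):
    """Stage 2: drop empty records."""
    for r in recs:
        if r:
            yield r

def numbered(recs):
    """Stage 3: number the surviving records."""
    for i, r in enumerate(recs, 1):
        yield "%d: %s" % (i, r)

src = ["alpha\n", "\n", "beta\n"]
result = list(numbered(nonblank(records(src))))   # ['1: alpha', '2: beta']
```

Only one record is "in flight" at a time, which is exactly the separation in time (rather than file position) described above.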

In simulation, co-routines allow a more natural modelling of a set of co-operating processes. It should be stressed that although co-routines share a number of properties with asynchronous processes (preservation of local variables etc.), which make modelling easy, co-routines are not separate processes, and the user must still manage flow of control between them.

For a formal definition of a co-routine and a full explanation of the fundamental concepts, the reader is referred to the technical literature (Knuth, CACM etc.).

Preserving State:

The current state of execution of a function can be largely described by its program instruction counter (IC) and its stack frame and pointer. The IC gives the location where execution is taking place, and the stack frame provides the values of all arguments and other storage local to the function. If the IC, stack frame, and stack pointer are preserved when a function is suspended, the function may be resumed at the exact point of suspension by restoring these values.

In regular functions, the current state of the function is lost when the function returns; the only way the state may be preserved when transferring to another function is to call the other function as a subroutine of the first. Then, of course, the state of the subroutine must be lost when it returns to resume execution in its caller.

In a like manner, one can conceive of a group of functions, like those that constitute a program, which can only preserve the group state by calling other groups of functions (programs) as sub-programs. The state of a sub-program, as with the state of a called function, vanishes when the sub-program returns to its caller. The states of both the caller and callee cannot be easily preserved.

Co-routines, on the other hand, are groups of functions which have been given a mechanism for preserving their states before transferring to other co-routines. Transfers among co-routines do not involve the regular caller/callee hierarchy, and the states of all co-routines may be thought of as existing concurrently.

See Also:

expl b coro - the B co-routine package

COROUTINES - what they are and how they work. From Learning Korn shell. reproduced without permission.

8.5 Coroutines

We've spent the last several pages on almost microscopic details of process behavior. Rather than continue our descent into the murky depths, we'll revert to a higher-level view of processes.

Earlier in this chapter, we covered ways of controlling multiple simultaneous jobs within an interactive login session; now we'll consider multiple process control within shell programs. When two (or more) processes are explicitly programmed to run simultaneously and possibly communicate with each other, we call them coroutines.

This is actually nothing new: a pipeline is an example of coroutines. The shell's pipeline construct encapsulates a fairly sophisticated set of rules about how processes interact with each other. If we take a closer look at these rules, we'll be better able to understand other ways of handling coroutines-most of which turn out to be simpler than pipelines.

When you invoke a simple pipeline, say ls | more, the shell invokes a series of UNIX primitive operations, a.k.a. system calls. In effect, the shell tells UNIX to do the following things; in case you're interested, we include in parentheses the actual system call used at each step:

  1. Create two subprocesses, which we'll call P1 and P2 (the fork system call).

  2. Set up I/O between the processes so that P1's standard output feeds into P2's standard input (pipe).

  3. Start /bin/ls in process P1 (exec).

  4. Start /bin/more in process P2 (exec).

  5. Wait for both processes to finish (wait).

You can probably imagine how the above steps change when the pipeline involves more than two processes.
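Those five steps can be sketched directly with the same system calls from Python (an editorial illustration; POSIX only, and `pipeline` is an illustrative helper, not a shell builtin):

```python
import os

def pipeline(cmd1, cmd2):
    """Run cmd1 | cmd2 the way the shell does, using the same system calls."""
    r, w = os.pipe()                      # pipe(): connect P1's stdout to P2's stdin
    p1 = os.fork()                        # fork(): create subprocess P1
    if p1 == 0:
        os.dup2(w, 1)                     # P1's standard output feeds the pipe
        os.close(r); os.close(w)
        os.execv(cmd1[0], cmd1)           # exec(): start the first program
    p2 = os.fork()                        # fork(): create subprocess P2
    if p2 == 0:
        os.dup2(r, 0)                     # the pipe feeds P2's standard input
        os.close(r); os.close(w)
        os.execv(cmd2[0], cmd2)           # exec(): start the second program
    os.close(r); os.close(w)
    _, s1 = os.waitpid(p1, 0)             # wait(): for both processes to finish
    _, s2 = os.waitpid(p2, 0)
    return s1, s2

# The equivalent of "ls | more" would be:
#   pipeline(["/bin/ls"], ["/bin/more"])
```

On success each child never returns from `execv`; the parent only closes its pipe ends and waits.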

Now let's make things simpler. We'll see how to get multiple processes to run at the same time if the processes do not need to communicate. For example, we want the processes dave and bob to run as coroutines, without communication, in a shell script. Our initial solution would be this:

dave &
bob

Assume for the moment that bob is the last command in the script. The above will work-but only if dave finishes first. If dave is still running when the script finishes, then it becomes an orphan, i.e., it enters one of the "funny states" we mentioned earlier in this chapter. Never mind the details of orphanhood; just believe that you don't want this to happen, and if it does, you may need to use the "runaway process" method of stopping it, discussed earlier in this chapter.

8.5.1 wait

There is a way of making sure the script doesn't finish before dave does: the built-in command wait. Without arguments, wait simply waits until all background jobs have finished. So to make sure the above code behaves properly, we would add wait, like this:

dave &
bob
wait

Here, if bob finishes first, the parent shell will wait for dave to finish before finishing itself.

If your script has more than one background job and you need to wait for specific ones to finish, you can give wait the same type of job argument (with a percent sign) as you would use with kill, fg, or bg.

However, you will probably find that wait without arguments suffices for all coroutines you will ever program. Situations in which you would need to wait for specific background jobs are quite complex and beyond the scope of this book.

8.5.2 Advantages and Disadvantages of Coroutines

In fact, you may be wondering why you would ever need to program coroutines that don't communicate with each other. For example, why not just run bob after dave in the usual way? What advantage is there in running the two jobs simultaneously?

If you are running on a computer with one processor (CPU), then there is a performance advantage-but only if you have the bgnice option turned off (see Chapter 3, Customizing Your Environment), and even then only in certain situations.

Roughly speaking, you can characterize a process in terms of how it uses system resources in three ways: whether it is CPU intensive (e.g., does lots of number crunching), I/O intensive (does a lot of reading or writing to the disk), or interactive (requires user intervention).

We already know from Chapter 1 that it makes no sense to run an interactive job in the background. But apart from that, the more two or more processes differ with respect to these three criteria, the more advantage there is in running them simultaneously. For example, a number-crunching statistical calculation would do well when running at the same time as a long, I/O-intensive database query.

On the other hand, if two processes use resources in similar ways, it may even be less efficient to run them at the same time as it would be to run them sequentially. Why? Basically, because under such circumstances, the operating system often has to "time-slice" the resource(s) in contention.

For example, if both processes are "disk hogs," the operating system may enter a mode where it constantly switches control of the disk back and forth between the two competing processes; the system ends up spending at least as much time doing the switching as it does on the processes themselves. This phenomenon is known as thrashing; at its most severe, it can cause a system to come to a virtual standstill. Thrashing is a common problem; system administrators and operating system designers both spend lots of time trying to minimize it.

8.5.3 Parallelization

But if you have a computer with multiple CPUs (such as a Pyramid, Sequent, or Sun MP), you should be less concerned about thrashing. Furthermore, coroutines can provide dramatic increases in speed on this type of machine, which is often called a parallel computer; analogously, breaking up a process into coroutines is sometimes called parallelizing the job.

Normally, when you start a background job on a multiple-CPU machine, the computer will assign it to the next available processor. This means that the two jobs are actually (not just metaphorically) running at the same time.

In this case, the running time of the coroutines is essentially equal to that of the longest-running job plus a bit of overhead, instead of the sum of the run times of all processes (although if the CPUs all share a common disk drive, the possibility of I/O-related thrashing still exists). In the best case (all jobs having the same run time and no I/O contention), you get a speedup factor equal to the number of jobs.
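You can see this arithmetic with a toy experiment. This is a sketch of our own, not from the book: the busy function below is just a stand-in for a CPU-bound job, and you would wrap each of the two runs in time(1) to compare elapsed times.

```shell
# A CPU-bound stand-in job; a real workload would crunch numbers or sort.
busy() {
    i=0
    while [ "$i" -lt 50000 ]; do
        i=$((i+1))
    done
}

# Sequential: elapsed time is the sum of the two runs.
busy; busy

# As coroutines: elapsed time approaches that of a single run
# on a machine with two or more CPUs.
busy & busy &
wait
```

On a single-CPU machine the second form shows little or no improvement, which is exactly the point of the preceding paragraphs.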

Parallelizing a program is often not easy; there are several subtle issues involved and there's plenty of room for error. Nevertheless, it's worthwhile to know how to parallelize a shell script whether or not you have a parallel machine, especially since such machines are becoming more and more common.

We'll show how to do this, and give you an idea of some of the problems involved, by means of a simple task whose solution is amenable to parallelization.

Task 8.3

Write a utility that allows you to make multiple copies of a file at the same time.

We'll call this script mcp. The command mcp filename dest1 dest2 ... should copy filename to all of the destinations given. The code for this should be fairly obvious:

file=$1
shift
for dest in "$@"; do
    cp $file $dest
done

Now let's say we have a parallel computer and we want this command to run as fast as possible. To parallelize this script, it's a simple matter of firing off the cp commands in the background and adding a wait at the end:

file=$1
shift
for dest in "$@"; do
    cp $file $dest &
done
wait

Simple, right? Well, there is one little problem: what happens if the user specifies duplicate destinations? If you're lucky, the file just gets copied to the same place twice. Otherwise, the identical cp commands will interfere with each other, possibly resulting in a file that contains two interspersed copies of the original file. In contrast, if you give the regular cp command two arguments that point to the same file, it will print an error message and do nothing.

To fix this problem, we would have to write code that checks the argument list for duplicates. Although this isn't too hard to do (see the exercises at the end of this chapter), the time it takes that code to run might offset any gain in speed from parallelization; furthermore, the code that does the checking detracts from the simple elegance of the script.
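One way to screen out duplicate destinations is a linear scan of the names seen so far. Here is a minimal sketch of our own (not the book's solution); it assumes destination names contain no whitespace or glob characters:

```shell
# Hypothetical parallel mcp that skips duplicate destinations.
# Assumes destination names contain no whitespace or glob characters.
mcp() {
    file=$1
    shift
    seen=" "
    for dest in "$@"; do
        case "$seen" in
        *" $dest "*) continue ;;    # already copying there; skip it
        esac
        seen="$seen$dest "
        cp "$file" "$dest" &        # fire off the copy as a coroutine
    done
    wait                            # don't return until all copies finish
}
```

The scan is quadratic in the number of destinations, which is precisely the kind of bookkeeping overhead that can eat into the parallel speedup.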

As you can see, even a seemingly trivial parallelization task has problems resulting from multiple processes having concurrent access to a given system resource (a file in this case). Such problems, known as concurrency control issues, become much more difficult as the complexity of the application increases. Complex concurrent programs often have much more code for handling the special cases than for the actual job the program is supposed to do!

Therefore it shouldn't surprise you that much research has been and is being done on parallelization, the ultimate goal being to devise a tool that parallelizes code automatically. (Such tools do exist; they usually work in the confines of some narrow subset of the problem.) Even if you don't have access to a multiple-CPU machine, parallelizing a shell script is an interesting exercise that should acquaint you with some of the issues that surround coroutines.

8.5.4 Coroutines with Two-way Pipes

Now that we have seen how to program coroutines that don't communicate with each other, we'll build on that foundation and discuss how to get them to communicate, in a more sophisticated way than with a pipeline. The Korn shell has a set of features that allow programmers to set up two-way communication between coroutines. These features aren't included in most Bourne shells.

If you start a background process by appending |& to a command instead of &, the Korn shell will set up a special two-way pipeline between the parent shell and the new background process. read -p in the parent shell reads a line of the background process' standard output; similarly, print -p in the parent shell feeds into the standard input of the background process. Figure 8.2 shows how this works.

Figure 8.2: Coroutine I/O

This scheme has some intriguing possibilities. Notice the following things: first, the parent shell communicates with the background process independently of its own standard input and output. Second, the background process need not have any idea that a shell script is communicating with it in this manner. This means that the background process can be any pre-existing program that uses its standard input and output in normal ways.
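As a minimal illustration of these mechanics (a sketch of our own, not from the book), a shell while loop can serve as the coprocess, which sidesteps the stdio output buffering you can hit when an external command writes to a pipe. The session is wrapped in "ksh -c" only to keep the sketch runnable from any shell; in a ksh script you would write the body directly.

```shell
# Demonstrate |& with print -p / read -p, using a shell loop as the coprocess.
if command -v ksh >/dev/null 2>&1; then
    ksh -c '
        while read line; do print "got: $line"; done |&   # start the coprocess

        print -p "hello"     # feed the coprocess standard input
        read -p reply        # read a line of its standard output
        print "$reply"       # prints: got: hello
    '
fi
```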

Here's a task that shows a simple example:

Task 8.4

You would like to have an online calculator, but the existing UNIX utility dc(1) uses Reverse Polish Notation (RPN), a la Hewlett-Packard calculators. You'd rather have one that works like the $3.95 model you got with that magazine subscription. Write a calculator program that accepts standard algebraic notation.

The objective here is to write the program without re-implementing the calculation engine that dc already has; in other words, to write a program that translates algebraic notation to RPN and passes the translated line to dc to do the actual calculation. [12]

[12] The utility bc(1) actually provides similar functionality.

We'll assume that the function alg2rpn, which does the translation, already exists: given a line of algebraic notation as argument, it prints the RPN equivalent on the standard output. If we have this, then the calculator program (which we'll call adc) is very simple:

dc |&

while read line'?adc> '; do
    print -p "$(alg2rpn $line)"
    read -p answer
    print "    = $answer"
done

The first line of this code starts dc as a coroutine with a two-way pipe. Then the while loop prompts the user for a line and reads it until the user types [CTRL-D] for end-of-input. The loop body converts the line to RPN, passes it to dc through the pipe, reads dc's answer, and prints it after an equal sign. For example:

$ adc
adc> 2 + 3
    = 5
adc> (7 * 8) + 54
    = 110
adc> ^D

Actually, as you may have noticed, it's not entirely necessary to have a two-way pipe with dc. You could do it with a standard pipe and let dc do its own output, like this:

{ while read line'?adc> '; do
      print "$(alg2rpn $line)"
  done
} | dc

The only difference from the above is the lack of an equal sign before each answer is printed.

But: what if you wanted to make a fancy graphical user interface (GUI), like the xcalc program that comes with many X Window System installations? Then, clearly, dc's own output would not be satisfactory, and you would need full control of your own standard output in the parent process. The user interface would have to capture dc's output and display it in the window properly. The two-way pipe is an excellent solution to this problem: just imagine that, instead of print " = $answer ", there is a call to a routine that displays the answer in the "readout" section of the calculator window.

All of this suggests that the two-way pipe scheme is great for writing shell scripts that interpose a software layer between the user (or some other program) and an existing program that uses standard input and output. In particular, it's great for writing new interfaces to old, standard UNIX programs that expect line-at-a-time, character-based user input and output. The new interfaces could be GUIs, or they could be network interface programs that talk to users over links to remote machines. In other words, the Korn shell's two-way pipe construct is designed to help develop very up-to-date software!

8.5.5 Two-way Pipes Versus Standard Pipes

Before we leave the subject of coroutines, we'll complete the circle by showing how the two-way pipe construct compares to regular pipelines. As you may have been able to figure out by now, it is possible to program a standard pipeline by using |& with print -p.

This has the advantage of reserving the parent shell's standard output for other use. The disadvantage is that the child process' standard output is directed to the two-way pipe: if the parent process doesn't read it with read -p, then it's effectively lost.

Bentley's Rules from Writing Efficient Programs

3. Use Coroutines

A multiphase algorithm in which the phases are linked by temporary files (or arrays) can be reduced to a single-pass algorithm using coroutines.

Coroutines are defined and described in detail in Knuth (Volume I) and most other modern books on algorithmics. Under IRIX, you can write coroutines in a natural way using any one of three models:

Control Structures

Coroutines are subroutines, with neither the caller nor the callee being "in charge". Instead, they allow program-controlled interleaving of instructions generated by both. Suppose A calls B. Then B wants to allow A to perform some more computation. B can "resume A", which then runs until it "resumes B". Then A can execute until it needs data from B, which might produce part of that data, and resume A, to examine or compute with the part produced so far. Coroutines have been exploited in the past in compilers, where the "parser" asks the "lexer" to run until the lexer has to stop (say at end of line). The lexer then resumes the parser to process that line's data, and is itself resumed to continue reading input characters. The text also shows an example of a tree-comparison problem solved logically by coroutines. Their advantage is that the cooperative behavior allows the "high-level" program to terminate the computation early, before the companion routine "completes" its assigned task. I have also used them to simulate parallel computation, when I want to build my own control over the task scheduling process.

As an interesting exercise, the text shows how coroutines can be simulated in C, using C's "setjmp()" and "longjmp()" library procedures. These procedures are intended for use in setting exception-handler routines. However, they have the property that they create concrete realizations of a "stopped" task -- an instruction counter, along with a variable reference context is stored when a setjmp occurs, and is resumed when a longjmp to the saved item is performed. The longjmp(Buf, Return) causes the setjmp(Buf) to return (again), this time returning value Return, instead of the 0 setjmp(Buf) returns when it is called.

Python Archives (1994q2) coroutines and continuations ( in Python - long discussion )


Steven D. Majewski (
Fri, 29 Apr 1994 13:58:25 -0400

A while ago there was a (crossposted) thread about pipes: most of the discussion was about Perl, but it caused me to repost my modules ( to show how *easy* the problem was in Python! ;-),
and also to think more about the utility of having a more general 2-way interface.

The tostring/tolines functions in redirect are able to package the printed output from a function and return it as a python object. This gives some of the capability of unix pipes without having to actually pipe thru the OS - the non-trivial semantic difference being that redirect does not produce any concurrency - the inner function must run to completion before the redirecting wrapper can return the collected output to the calling function.

Actual concurrency of independent threads is not required, though: coroutine would be sufficient since the read/write is a blocking operation that could be an implicit transfer of control.

I know Guido is not as enamored of coroutines as old Icon-ers like Tim and I. He has (more than once, I think) stated the view that anything you could do with coroutines, you can do with classes. This view is certainly true from a theoretical POV - complementarity of the object=method+data and closure=function+environment and all that, and coroutines are just a subset of continuations, etc., etc. But the difference is that saving of state into an object's instance variables has to be explicitly programmed, where coroutines and continuations are implicit. ( And being able to turn a pair of read/writes in a pair of functions into a coroutine return/continue, mediated transparently by a file-like python object, would seem to be a potential big win for code-reuse. )

It is possible to program a pair of classes that cooperate with each other, but there is no way to make a class that can link two arbitrary functions by their read/write methods. The flow of control is not symmetrical: one of the functions must RETURN before the other can continue.

The fact that Python frames are (heap allocated) objects and not just pushed on the stack means that they can persist whether that function is active or not. In fact, an exception passes you a link to a whole bunch of frameobjects in its traceback object.

A frame object appears to contain all of the necessary state to serve as a continuation, except that the interpreter's stack pointer and block pointer are not preserved, and when a stack frame is unwound by an exception, the stack is emptied by popping and DECREF-ing the contents.

It looks to me that it would not be difficult to give Python continuations: if a stack pointer and block pointer are added to the frameobject, all that is missing is an operation to package a frameobject-continuation, and a 'resume' operation ( a change to ceval.c ) that takes a frame/continuation object instead of creating a new frameobject.

I am far from a complete understanding of ceval and what happens to frame and blocks in Python, so I may be missing some obvious problem ( And all the details of *exactly* how to do this aren't completely clear to me yet, either ).

Comments ?

What I initially wanted was some sort of cooperative coroutine control between two functions, mediated by a file-like object that used its read/write methods to alternate control between the two functions. Except for inet servers, most of the cases where I might want threads seem to actually fit a blocking coroutine model better than parallel threads, and in the cases where it doesn't, I'm quite happy to use unix processes for threading. I would like to have threads on dos/mac, but I don't know if any of the portable (on unix) threads packages work on those platforms, or how much work it would be to get really portable threads written in C to work in Python. But since the easiest method of adding coroutine control appears to be the more general case of adding continuations to Python, I would think that this also gives a possible method of adding preemptive threads to the Python interpreter in a portable manner, by coding it into the virtual machine rather than the real machine.

Comments ?

[ Encouraging or _discouraging_ comments invited! If I'm blind to some fatal flaw or don't understand what's going on in ceval, I'd be happy to hear about it before I actually waste time in trying to implement it! ]

-- Steve Majewski (804-982-0831) <sdm7g@Virginia.EDU> --
-- UVA Department of Molecular Physiology and Biological Physics --
-- Box 449 Health Science Center Charlottesville,VA 22908 --
[ "Cognitive Science is where Philosophy goes when it dies ...
if it hasn't been good!" - Jerry Fodor ]

Subject: Re: Coroutines (was: Compiler.)
From: (Eric Werme - replace nobody with werme)
Date: 1998/05/05
Message-ID: <6ilsj1$d3q$>
Newsgroups: alt.folklore.computers,alt.sys.pdp10

(Alan Bowler) writes in a.f.c:

>In article <6i72h2$hhh$> [20] writes
>>Another neat thing that was used was the notion of co-routines.
>>Perhaps somebody more qualified would talk about this?

>You mean "co-operative multitasking with threads" :-)

Well, there are coroutines that take many instructions to switch between and there are coroutines that take one instruction. Barbara was talking about the latter. The definition of a coroutine as I heard it is that the two code paths use the same mechanism to transfer control back and forth. Of the many styles of subroutine calls on the PDP-10, JSP ac,addr is the fastest, as it's the only one that doesn't require a memory store. Its ISP is something like:

ac = PC
PC = effective address [addr in the usual case]

The subroutine return, of course, is:

JRST (ac)

Here, the effective address is the contents of the register.

The coroutine instruction combined the two:

JSP ac,(ac)

This essentially exchanged the PC with ac.

There's one big catch here - there can't be unanswered pushes or pops in either piece of code; this is one reason why many people equate coroutines with context switches and the exchange of many registers.

I have two good examples to describe. I'll put the second one in a separate posting.

I wrote the client side of TOPS-10 TELNET for the Arpanet that ran at Carnegie Mellon, Harvard, and Rutgers. Telnet has several multi-character sequences and they can be split across message boundaries. TOPS-10 made it easiest for us to get a message at interrupt level, and step through each octet in sequence, calling the TTY input code as necessary. However, parsing the Telnet options is more easily done by code that can call a subroutine to fetch one character at a time.

So I compromised. The network side looked something like:

prcmsg: PUSHJ P,SAVE1        ;Save P1
        MOVE  P1,LAST_PC(F)  ;Get PC where telnet left off
        PUSHJ P,GET_CH       ;Get next byte from message
         JRST DONE           ;None left
        JSP   P1,(P1)        ;call telnet processor


Not too exciting.

The telnet side looked something like:

PRCTEL: CAIN  T1,IAC         ;Telnet escape?
         JRST NORMCH         ;just text
        JSP   P1,(P1)        ;get next character
        CAIGE T1,MINTEL      ;command in range 240 - 255
         JRST BAD            ;Out of range

WONT:   JSP   P1,(P1)        ;Get option code

The nice thing about all this was that the telnet processor had no idea that some of its coroutine calls actually resulted in dismissing an interrupt.

Another way of looking at this code is to see a state machine where the PC is the state variable.

A decade later when I was at Alliant, I fielded a phone call from a customer with a Macintosh machine that was sometimes having trouble with its Telnet link. The customer had managed to trace it to telnet commands split between two TCP messages. I really tried to be sympathetic, but I was firm that Alliant's system, while perhaps not being Mac-friendly, was compliant with the protocols and that I was sure a Macintosh should be able to handle split telnet commands.

Other architectures have coroutine instructions too. On the PDP-11:

JSR Rx,@Rx

The Intel 860 _almost_ has one:

calli r1

(Calli is like jsp r1,ea in that the return pc is stored in r1.) However,
the i860 manual says this is a no-no.

I should know if Alpha has one, but I just don't do enough assembly language
<> Eric (Ric) Werme <> The above is unlikely to contain <>
<> ROT-13 addresses: <> official claims or policies of <>
<> <jrezr@mx3.qrp.pbz> <> Digital Equipment Corp. <>
<> <jrezr@plorecbegny.arg> <>

Recommended Links


Coroutine - Wikipedia, the free encyclopedia

Art of Assembly Chapter Nineteen

Chapter 19 - Processes, Coroutines, and Concurrency
19.1 - DOS Processes
19.1.1 - Child Processes in DOS
         - Load and Execute
         - Load Program
         - Loading Overlays
         - Terminating a Process
         - Obtaining the Child Process Return Code
19.1.2 - Exception Handling in DOS: The Break Handler
19.1.3 - Exception Handling in DOS: The Critical Error Handler
19.1.4 - Exception Handling in DOS: Traps
19.1.5 - Redirection of I/O for Child Processes
19.2 - Shared Memory
19.2.1 - Static Shared Memory
19.2.2 - Dynamic Shared Memory
19.3 - Coroutines
19.3.1 - AMAZE.ASM
19.3.2 - 32-bit Coroutines
19.4 - Multitasking
19.4.1 - Lightweight and HeavyWeight Processes
19.4.2 - The UCR Standard Library Processes Package
19.4.3 - Problems with Multitasking
19.4.4 - A Sample Program with Threads
19.5 - Synchronization
19.5.1 - Atomic Operations, Test & Set, and Busy-Waiting
19.5.2 - Semaphores
19.5.3 - The UCR Standard Library Semaphore Support
19.5.4 - Using Semaphores to Protect Critical Regions
19.5.5 - Using Semaphores for Barrier Synchronization
19.6 - Deadlock

Ulm's Coroutine Scheme

A simple and powerful coroutine scheme has been offered in Modula-2 by N. Wirth. The two basic operations (exported by the SYSTEM module) of Modula-2 are:

PROCEDURE NEWPROCESS(proc: PROC;
                     addr: ADDRESS; size: CARDINAL;
                     VAR new: ADDRESS);

NEWPROCESS creates a new coroutine with a stack starting at addr with size size. The coroutine starts execution by calling the parameterless global procedure proc. A handle to the new coroutine is returned in new.

The first disadvantage of NEWPROCESS

Introduction to coroutines

The coroutine facilities of Modula-2 allow multi-threaded programs to be constructed. In such programs, several threads may be at various stages of execution at the same time. These threads are quasi-concurrent. That is, only one thread is actually active at any one time, but by interleaving the execution of the various threads all may progress apparently in parallel. The use of coroutines allows certain unique forms of program organization which are rather under-utilized in current practice, probably since few languages support coroutine primitives. In particular, coroutines form a natural foundation for simulation programs. Program threads are sometimes also known as lightweight processes, since they provide some of the functionality of processes, but are many, many orders of magnitude less costly in execution time.

Execution of each coroutine is explicitly suspended by transferring control to another coroutine. Each coroutine has its own activation stack at runtime, and these stacks are explicitly created and initialized by a call to the procedure Coroutines.NEWPROCESS.

Programs which do not use the coroutines library, so-called single-stack programs, have little need to perform stack overflow testing. Typically, several hundred megabytes of virtual memory are available for expansion of the stack segment of such programs, although it is usual for a process size limit to be exceeded well before this. Programs which use the coroutines library have a separate stack for each coroutine, suggesting the prudent use of stack overflow testing. The facilities provided for this are also available for single-stack programs, although the default continues to be for stack overflow testing to be disabled.





Copyright © 1996-2014 by Dr. Nikolai Bezroukov. The site was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. The site uses AdSense, so you need to be aware of Google's privacy policy. Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only, in compliance with the fair use doctrine. This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...



The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: February 19, 2014