
Insertion sort


Note: As with other sorting algorithms, reading Knuth's book is extremely beneficial, and doing the exercises from it even more so.

Among simple sorting algorithms, insertion sort is the best: it uses fewer comparisons than bubble sort and selection sort. It is the most efficient sorting algorithm for small N (say, N<25). The algorithm displays strong inner-loop locality, which means that on modern computers most operands will be in cache. That's why it is sometimes used as the final stage of more complex sorting algorithms, once the subsequence to be sorted shrinks to a certain number of elements (for example, in Quicksort it is typically applied when the length of a segment drops to the low teens).

For large N it is excruciatingly slow unless the array to be sorted is already almost in the right order (which is more common than any of us think: resorting an almost sorted area is a favorite pastime of corporate dwellers ;-).  Those so-called "nearly sorted" arrays are a very important practical case that is often ignored in textbooks devoted to sorting algorithms.

For example, sorting is sometimes used (very inefficiently) for insertion of new elements: the new elements are first appended to the end of the array and then the whole array is sorted. Yes, this is a very inefficient merge algorithm, but it nevertheless represents common practice, especially in scripts. So the Unix sort utility should be able to deal with this situation efficiently.

In case the array is already sorted, insertion sort performs exactly N-1 comparisons before stopping; you can't beat that performance. It does not work well for arrays sorted in reverse order, and it does not take into account the existence of large sorted subsequences in the initial array (see Natural Merge Sort). In other words, it is not a completely dumb sorting algorithm, but it still cannot distinguish, and thus cannot take full advantage of, different kinds of input data (simple selection sort is close to a dumb sorting algorithm in this respect, and the same is true of quicksort); some variants of the mergesort algorithm are more intelligent.  You can see how it slows down on large data sets in this demo

Another advantage of insertion sort is that it is very easy to explain: it corresponds to the intuitive way people sort various sequences, for example playing cards. We will assume that we are sorting an array in ascending order.  Key ideas:

  1. Insertion sort always maintains two zones in the array being sorted: sorted and unsorted.
  2. At the beginning the sorted zone consists of a single element (the first element of the array we are sorting).
  3. On each step the algorithm expands the sorted zone by one element, inserting the first element of the unsorted zone into its proper place in the sorted zone and shifting all larger elements one slot toward the end.

It is the algorithm that many people intuitively use when sorting cards, and it is very easy to illustrate with a deck of cards.  Here is an example (the sorted zone is to the left of the bar, the unsorted zone to the right):

5 | 3 1 7 9 -> 3 5 | 1 7 9 -> 1 3 5 | 7 9 ->  1 3 5 7 | 9 -> 1 3 5 7 9

See the animation to get a better idea of the details.

Since multiple keys with the same value are placed in the sorted array in the same order that they appear in the input array, insertion sort is stable.

If the input data contain already sorted fragments, insertion sort works slightly faster, but unless the sorted fragment is at the very beginning of the data set, it still moves the fragment's elements one position at a time.

If the input data are sorted in the opposite direction, insertion sort can't take advantage of this.

Of the three elementary sorting algorithms (bubble sort, selection sort and insertion sort), insertion sort uses the fewest comparisons, so it may be faster when comparisons are expensive, for example when sorting long, similar strings (see the sketch below).
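As an illustration of this last point, here is a sketch (mine, not code from the original article; the function name is my own) of insertion sort over an array of C strings, where each comparison is a relatively expensive strcmp() call and only pointers are moved:

#include <string.h>

/* Sketch: insertion sort of an array of C strings.
   Only the pointers are moved; each comparison is a strcmp() call,
   which is where the low comparison count of insertion sort pays off. */
void insertionSortStrings(char *a[], int n)
{
    int i, j;
    char *current;
    for (i = 1; i < n; i++) {
        current = a[i];
        for (j = i; j > 0 && strcmp(a[j-1], current) > 0; j--) {
            a[j] = a[j-1];   /* shift larger strings one slot toward the end */
        }
        a[j] = current;
    }
}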

A C implementation usually contains two nested loops.

Please note that C control structures are not very well suited for a precise implementation of the control graphs of sorting and searching algorithms, but they are good enough. At the same time we should avoid the temptation to adjust the algorithm to the capabilities of the language, an error that is typical of many textbooks.

Here is a sample non-optimal insertion sort implementation that contains the flaws I mentioned:

void insertionSort(int a[], int array_size)
{
  int i, j, current;
  for (i=1; i < array_size; i++) {
    current = a[i];
    j = i; // j starts just past the end of the sorted region
    while ((j > 0) && (a[j-1] > current)) {
      a[j] = a[j-1];
      j = j - 1;
    }
    a[j] = current;
  }
}

Here is a slightly better implementation that optimizes the handling of already sorted subsequences. I prepared it for my algorithms class many years ago (note that the author is no Knuth, and the algorithm is somewhat skewed toward the control structures available in C; this is not how this algorithm should be programmed in assembler):

/* insert sort */
/* Nikolai Bezroukov */


#include <stdio.h>
#include <stdlib.h>

typedef int itemtype;     // type of item to be sorted

void insertSort(
itemtype *a, // array to be sorted
int n  // size of the array; note that (n-1) is the upper index of the array
)
{
    itemtype t;
    int i, j;

    /* Outer "boundary definition" loop */
    for (i = 1; i < n; i++) {
        if ( a[i-1] <= a[i] ) { continue; }  // already in place: skips over already sorted subsequences
        t = a[i];
        /* inner loop: elements are shifted one slot toward the end until the insertion point is found */
        for (j = i-1; j >= 0 && ( a[j] > t ); j--) {
            a[j+1] = a[j];
        }
        /* insert */
        a[j+1] = t;
    }
}
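Here is a minimal test driver for the function above (my addition, not part of the original listing); it relies on the #include <stdio.h> already present at the top of the file:

int main(void)
{
    itemtype a[] = { 5, 3, 1, 7, 9, 2, 2, 8 };
    int n = sizeof(a) / sizeof(a[0]);
    int i;

    insertSort(a, n);
    for (i = 0; i < n; i++) {
        printf("%d ", a[i]);   /* expected output: 1 2 2 3 5 7 8 9 */
    }
    printf("\n");
    return 0;
}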

Optimization and enhancements

There are various ways to improve this algorithm, but few of them are worth the trouble:

  1. Since the key to an efficient algorithm for large sequences is "long distance" moves, the way to improve the efficiency of insertion sort is to find a way to increase the distance over which elements are moved. A very elegant and highly non-obvious approach was invented in Shellsort.
  2. You can try to insert multiple elements instead of a single element. In this modification you scan the unsorted area for sequences in ascending order (their maximum length can be capped at, for example, ten or 21 elements), copying them to a temporary area. Then you merge the temporary area and the sorted area, working from the last element. The main advantage is that for each ascending sequence present in the data you shift elements by more than one position, and you reuse the position of the previously inserted element when inserting the next element of the detected ascending sequence (instead of repeating comparisons from the bottom of the sorted area). In my opinion this is a realistic optimization: it requires a fixed amount of additional storage but cuts the number of comparisons by working with "blocks" instead of single elements. You can also generalize this and handle both ascending and descending sequences (you simply fill the temporary array with a descending sequence starting from its end and moving toward its beginning). A somewhat similar idea is implemented in strand sort.
  3. An obvious improvement is to use binary search to find the point of insertion, but that leaves the most important problem, shifting the elements "below" the insertion point, open. This actually makes the optimization moot when the cost of shifting an element is the same as, or higher than, the cost of a comparison. See binary insertion sort and the sketch after this list. If the cost of a comparison and the cost of a move are equal, I don't see any major improvement, so this optimization is of mostly theoretical interest.
  4. You can try to remember the previous insertion point and the just-inserted element, imitating a heuristic search algorithm. This idea is already implemented, more elegantly, in optimization 2 above.
  5. Insertion sort can be made bidirectional, much like bidirectional bubble sort. In this case you build sorted areas from both ends of the array until they meet.
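As an illustration of optimization 3, here is a sketch of binary insertion sort in C (my own sketch, not code from this site; the function name is mine). Binary search cuts the number of comparisons to about log2(i) per insertion, but shifting the tail of the sorted zone still costs up to i moves, which is why the overall running time stays O(n²):

/* Sketch of binary insertion sort (illustration only). */
void binaryInsertionSort(int a[], int n)
{
    int i, j, lo, hi, mid, current;
    for (i = 1; i < n; i++) {
        current = a[i];
        /* binary search for the first position whose element is greater
           than current; equal keys keep their original order (stability) */
        lo = 0; hi = i;
        while (lo < hi) {
            mid = (lo + hi) / 2;
            if (a[mid] <= current)
                lo = mid + 1;
            else
                hi = mid;
        }
        /* shift a[lo..i-1] one slot toward the end and insert */
        for (j = i; j > lo; j--)
            a[j] = a[j-1];
        a[lo] = current;
    }
}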


Old News ;-)

[Nov 27, 2011] insertion sort

NIST page

Definition: Sort by repeatedly taking the next item and inserting it into the final data structure in its proper order with respect to items already inserted. Run time is O(n²) because of moves.

Also known as linear insertion sort.

Generalization (I am a kind of ...)
sort.

Specialization (... is a kind of me.)
binary insertion sort.

See also gnome sort.

Note: Sorting can be done in place by moving the next item into place by repeatedly swapping it with the preceding item until it is in place - a linear search and move combined. This implementation is given in C. J. Shaw and T. N. Trimble, Algorithm 175 Shuttle Sort, CACM, 6(6):312-313, June 1963.
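The swap-based formulation described in this note can be sketched in C as follows (my illustration, not the published Algorithm 175 code; the function name is mine); it does the same work as the shift-based versions above but uses roughly three assignments per step instead of one:

/* Sketch of the swap-based ("shuttle sort") formulation: the next item
   is repeatedly swapped with its predecessor until it is in place. */
void shuttleSort(int a[], int n)
{
    int i, j, tmp;
    for (i = 1; i < n; i++) {
        for (j = i; j > 0 && a[j-1] > a[j]; j--) {
            tmp = a[j-1];
            a[j-1] = a[j];
            a[j] = tmp;
        }
    }
}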

If comparing items is very expensive, use binary search to reduce the number of comparisons needed to find where the item should be inserted, then open a space by moving all later items down one. However a binary search is likely to make this not a stable sort.

Author: PEB

Implementation

(Java). An implementation (Java) due to Sedgewick and Wayne (search for Insertion sort). Algorithms and Data Structures' explanation and code (Java and C++). Other implementations may be available through the Stony Brook Algorithm Repository, Sorting. (Scheme). (Fortran).

More information

demonstration of several sort algorithms, with particular emphasis on insertion sort; more demonstrations; an animation (Java) of insertion sort.

[Oct 06, 2010] InformIT Simple Sorting in Java Insertion Sort

In most cases the insertion sort is the best of the elementary sorts described in this chapter. It still executes in O(N²) time, but it's about twice as fast as the bubble sort and somewhat faster than the selection sort in normal situations. It's also not too complex, although it's slightly more involved than the bubble and selection sorts. It's often used as the final stage of more sophisticated sorts, such as quicksort.
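To illustrate the "final stage of more sophisticated sorts" remark, here is a sketch (mine, not taken from the quoted source; all names and the cutoff of 16 are arbitrary choices for illustration) of a quicksort that stops partitioning once a segment shrinks below a small cutoff and lets a single insertion sort pass over the nearly sorted array finish the job:

#define CUTOFF 16   /* arbitrary small-segment threshold */

/* plain insertion sort pass, used as the final stage */
static void insertionPass(int a[], int n)
{
    int i, j, t;
    for (i = 1; i < n; i++) {
        t = a[i];
        for (j = i; j > 0 && a[j-1] > t; j--)
            a[j] = a[j-1];
        a[j] = t;
    }
}

/* quicksort that leaves small segments unsorted */
static void quickRec(int a[], int lo, int hi)
{
    int i, j, t, pivot;
    if (hi - lo < CUTOFF)
        return;                    /* small segment: handled by the final pass */
    pivot = a[(lo + hi) / 2];
    i = lo; j = hi;
    while (i <= j) {
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) {
            t = a[i]; a[i] = a[j]; a[j] = t;
            i++; j--;
        }
    }
    quickRec(a, lo, j);
    quickRec(a, i, hi);
}

void hybridQuickSort(int a[], int n)
{
    quickRec(a, 0, n - 1);
    insertionPass(a, n);           /* array is now nearly sorted */
}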

[Oct 06, 2010] Insertion sort (C) - LiteratePrograms

When sorting an array with insertion sort, we conceptually separate it into two parts:

In outline, our primary function looks like this:

[Apr 10, 2009] Insertion Sort - C++, Java, Algorithm and Data Structure Tutorials by Denis Kulagin

Contains Java and C++ implementation...
Visualizers
  1. Insertion Sort Animation (with source code line by line visualization)
  2. Insertion Sort in Java Applets Centre
  3. Animated Sorting Algorithms: Insertion Sort

Recommended Links


Insertion sort - Wikipedia, the free encyclopedia

insertion sort (NIST)

Insertion sort (C) - LiteratePrograms  Implementations in many languages with some commentary.

Insertion Sort (ciips.ee.uwa.edu.au)

Insertion sort H.W. Lang   FH Flensburg  See also Sortierverfahren

Insertion Sort (nice description)

Insertion Sort (www.personal.kent.edu)

Insertion Sort simple animation

InformIT Simple Sorting in Java Insertion Sort

CCAA - Sorting Algorithms

binary insertion sort -- an optimization of the insertion sort where the proper location is found by binary search.

Binary Insertion Sort

bender...004librarysort.ps (PS.gz, PS, PDF)

Insertion Sort (linux.wku.edu)

The insertion sort works just like its name suggests - it inserts each item into its proper place in the final list. The simplest implementation of this requires two list structures - the source list and the list into which sorted items are inserted. To save memory, most implementations use an in-place sort that works by moving the current item past the already sorted items and repeatedly swapping it with the preceding item until it is in place.

Like the bubble sort, the insertion sort has a complexity of O(n²). Although it has the same complexity, the insertion sort is twice as efficient as the bubble sort.

Pros: Relatively simple and easy to implement.
Cons: Inefficient for large lists.

Abstract: Traditional INSERTION SORT runs in O(n²) time because each insertion takes O(n) time. When people run INSERTION SORT in the physical world, they leave gaps between items to accelerate insertions. Gaps help in computers as well. This paper shows that GAPPED INSERTION SORT has insertion times of O(log n) with high probability, yielding a total running time of O(n log n) with high probability.

insertionsort

Dichotomic insertion consists of using a dichotomic (binary) search to determine where to insert the new item, taking advantage of the fact that the sub-array is already sorted. This yields the most efficient algorithm in terms of the number of comparisons, with a guaranteed bound of sum(log2 i) for i in [1..n-1], i.e. log2((n-1)!), which is strictly less than n log2 n. Unfortunately, this algorithm is not efficient in practice because, unlike selection sort, it performs on the order of n*n exchanges on average. There are two variants, depending on whether we assume the array to be almost sorted (DichoSort) or random or inverted (DichoSort2).





Last modified: February 19, 2014