Softpanorama

May the source be with you, but remember the KISS principle ;-)
Home Switchboard Unix Administration Red Hat TCP/IP Networks Neoliberalism Toxic Managers
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Insertion sort

News

Sorting Algorithms Recommended Books Recommended Links Selection Sort Bubblesort
Animations Sorting Algorithms Coding Style Donald Knuth  Testing Sorting Algorithms Humor Etc

Introduction

Among simple sort algorithms insertion sort is the best (uses on of the same data less number of comparisons then Bubble sort and selection sort). It is the most efficient sorting algorithm for small N (say N<25). It is also extremely efficient of the common case of "almost sorted" arrays. The algorithms displays strong inner loop locality and that means that most operands will be cached on modern computers. That's why it is used as a final stage in complex sorting algorithm when the subsequence to be sorted shrinks to teens or single digits (for example, in Quicksort it is typically used when the length of the segment drops to low teens). 

Another advantage of insertion sort is that it is very easy to explain and corresponds to an intuitive way people sort various sequences, for example, when playing cards. But this intuitive transparency does not mean that algorithm is the simplest. Selection sort is simpler. Actually insertion sort algorithm contains a caveat -- boundary conditions for inner "shifting down" cycle and this cycle often is implemented with errors.

Among the simplest comparison based sorting algorithms it is the most intelligent and the most efficient.

Note: As with other sorting algorithms, reading Knuth book is extremely beneficial and doing exercises from it even more so.

Explanation of the algorithm

 We will assume that we are sorting an array in the ascending order.  Key ideas:

  1. Insertion sort always maintains two zones in the array to be sorted: sorted and unsorted.
  2. At the beginning the sorted zone consist of one element (the first element of array that we are sorting).
  3. On each step the algorithms expand the sorted zone by one element inserting the first element from the unsorted zone in the proper place in the sorted zone and shifting all larger elements one slot down.

It is an algorithms that many people intuitively use for sorting cards and it is very easy to illustrate on the deck of cards.  Here is an example (sorted zone is in blue, unsorted is in red):

5 | 3 1 7 0 -> 3 5 | 1 7 9 -> 1 3 5 | 7 9 ->  1 3 5 7 | 9 -> 1 3 5 7 9

See animation to get better idea of the details.

Since multiple keys with the same value are placed in the sorted array in the same order that they appear in the input array, insertion sort is stable.

If input data contains already sorted fragments insertion sort works slightly faster, but unless the sorted fragment is found of the very beginning of the data set, it moves its elements up one by one, not in chunks.

If the input data are sorted in the opposite direction, insertion soft can't take advantage of this.

Out of three elementary sort algorithms (bublesort, selection and insertion sort algorithms),  insertion sort uses fewer comparisons, so it might be faster when comparisons are expensive, for example for sorting long similar strings. 

C implementation usually contains two nested loops:

Please note that C language control structures are not very well suited for precise implementation of the control graphs of sorting or searching algorithms.  But they are good enough. At the same time we should avoid temptation to adjust algorithm to the capabilities of the language, the error that is typical in so many textbooks.

Here is sample non-optimal insertion sort implementation that contains the flaws that I mentioned:

void insertionSort(int a[], int array_size)
{
  int i, j, current;
  for (i=1; i < array_size; i++) {
    current = a[i];
    j = i; # index of the end of sorted region
    while ((j > 0) && (a[j-1] > current)) {
      a[j] = a[j-1];
      j = j - 1;
    }
    a[j] = current;
  }
}

Here is a slightly better implementation that optimized finding of already sorted subsequences. I prepared it for my algorithms class many years ago (note that the author is no Knuth and algorithms is somewhat skewed to use available in C control structures; this is not how this algorithm should be programmed in assembler):

/* insert sort */
/* Nikolai Bezroukov */


#include <stdio.h>
#include <stdlib.h>

typedef int itemtype;     // type of item to be sorted

void insertSort(
itemtype *a, // array to be sorted
int n  // size of the array; note that (n-1) is the upper index of the array
)
{
    itemtype t;
    int i, j;

   /* Outer "boundary definition" loop */
    for (i = 1; i < n; i++) {
         if ( a[i-1]<=a[i] ) {  continue;} // optimized finding of already sorted subsequences
        t = a[i];
        /* inner loop: elements shifted down until insertion point found  */
        for (j = i-1; j >= 0 && ( a[j] > t ); j--) {
             a[j+1] = a[j];
        }
        /* insert */
        a[j+1] = t;
    }
}

Below is the same algorithm written in Perl in which it is easier to debug.

#!/usr/bin/perl
# insertion_sort.pl
# Get array a from the file and sort it
# invocation insercion_sort test15.txt
# Debug 0 - production
#       1 - produce debug output 
    @a=<>; # slirp input array -- the array to be sorted
    for ($k=0; $k<@a; $k++) { chomp $a[$k]; }
    
    ($debug) && print "Initial:".join(' ',@a)."\n";
    for ($i = 1; $i < @a; $i++) {
        ($debug) && print "i=$i a[i-1] <= a[i]: ".$a[$i-1]."<=> $a[$i]\n" ;
        next if ( $a[$i-1]<=$a[$i] ); # // optimized finding of already sorted subsequences
        $t=$a[$i];
        ($debug) && print "t=$t, i=$i";
        for ($j = $i-1; $j >= 0 && ( $a[$j] > $t ); $j--) {
            ($debug) && print "move down: j=$j ". $a[$j+1]."<= $a[$j]\n";
            $a[$j+1] = $a[$j];
        }
        $a[$j+1] = $t;
        ($debug) && print "status of array".join(' ',@a)."\n";
     }
     ($debug) && print "final:".join(' ',@a)."\n";

It is actually very easy to make mistake in this algorithm. Not as easy as in binary search, but still. As an exercise try to modify this algorithm so that it sorted only a fragment of array from say L to N.

Nearly sorted arrays

For large N it is excruciatingly slow unless the array to be sorted is almost in the right sorting order (which is more common that any of us think: resorting almost sorted area is favorite pasture of corporate dwellers ;-).  Those, so called "nearly sorted" arrays and are very important practical cases that are often ignored in textbooks devoted to sorting algorithms.

For those nearly sorted array with few permutations the algorithm is quite efficient up to, say, N=250. For example if there is one permutation and the distance is 1 (adjacent elements), it will do just single element swap. For example, sorting blog entries that are already in "almost" correct order with few permutations is a good task for insertion sort.

For example,  sometimes (of very often, if you wish, if we are talking about Unix scripts) sorting is used (very inefficiently) for insertion of new elements: elements first are written to the end of already sorted array and then the whole array is sorted again. Yes, this is a very inefficient merge algorithms, but nevertheless it represents a common practice, especially in scripts. So Unix sort  utility should be able to deal with this situation efficiently. 

In case the array is already sorted insertion sort performs exactly N comparison before stopping. You can't beat that performance.

Insertion sort does not work well for reversely sorted arrays and it only partially  takes into account existence of large sorted subsequences in the initial array (see Natural Merge Sort). In other words it is a not completely dumb sort algorithm that does not distinguish and thus can't take advantage of various input data (for example simple selection is close to dumb sorting algorithms, the same is true about quicksort), but some variants of mergesort algorithm are more intelligent.  You can see how it slows down on large data set from this demo

Optimization and enhancements

There are various way to improve this algorithms, but few of them are worth the trouble:

  1. As the key for efficient algorithm for large sequences are "large distance swaps" the way to improve efficiency of insertion sort is to find a way to increase the distance between element swaps. A very elegant and highly non-obvious approach was invented in  Shellsort
  2. You can try insert multiple elements instead of single element. In this modification you scan unsorted area for sequences in ascending order (the max number can be fixed by, for example ten or 21) copying them to a temp area. Then you merge temp area and the sorted area working from the last element.  The main advantage is that for each ascending sequence that is present in the data you shift elements by more then one element and reuse the position of previous inserted element for inserting the next in the detected ascending sequence (not repeating comparisons from the bottom of sorted area). In my opinion this is a realistic optimization that requires some fixed amount of additional storage but cuts the number of comparison by using "blocks" instead of single element. You can also generalize this and handle both ascending and descending sequences (you just put descending sequence into temp array starting from the end of temp array and moving toward beginning). Somewhat similar idea is implemented in strand sort
  3. An obvious improvement is to use binary search to find the point of insertion, but that leaves the most important problem of shifting the elements "below" insertion point open. Which actually makes this optimization mute as the cost of shifting elements in most cases is the same or higher then cost of a comparison.  See binary insertion sort. If the cost of comparison and the cost of move of the element are equal I don't see any major improvements. As such this optimization is of purely theoretical interest.
  4. You can try to remember previous insertion point and the just inserted element imitating heuristic search algorithm. This idea was already and more elegantly implemented in optimization 1. Still you can do the following:
  5. Insert sort can be done bidirectional much like bidirectional bubble sort. In this case you built sorted areas from two sides of the array until they merge.

Top Visited
Switchboard
Latest
Past week
Past month

NEWS CONTENTS

Old News ;-)

[Mar 11, 2015] insertion sort

NIST page

(algorithm)

Definition: Sort by repeatedly taking the next item and inserting it into the final data structure in its proper order with respect to items already inserted. Run time is O(n2) because of moves.

Also known as linear insertion sort.

Generalization (I am a kind of ...)
sort.

Specialization (... is a kind of me.)
binary insertion sort.

See also gnome sort.

Note: Sorting can be done in place by moving the next item into place by repeatedly swapping it with the preceding item until it is in place - a linear search and move combined. This implementation is given in C. J. Shaw and T. N. Trimble, Algorithm 175 Shuttle Sort, CACM, 6(6):312-313, June 1963.

If comparing items is very expensive, use binary search to reduce the number of comparisons needed to find where the item should be inserted, then open a space by moving all later items down one. However a binary search is likely to make this not a stable sort.

Author: PEB

Implementation

(Java). An implementation (Java) due to Sedgewick and Wayne (search for Insertion sort). Algorithms and Data Structures' explanation and code (Java and C++). Other implementations may be available through the Stony Brook Algorithm Repository, Sorting. (Fortran).

More information

demonstration of several sort algorithms, with particular emphasis on insertion sort; more demonstrations; a animation (Java) of insertion sort.


Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul Black.

Entry modified 2 March 2015.
HTML page formatted Mon Mar 2 16:13:48 2015.

[Nov 27, 2011] insertion sort

NIST page

Definition: Sort by repeatedly taking the next item and inserting it into the final data structure in its proper order with respect to items already inserted. Run time is O(n2) because of moves.

Also known as linear insertion sort.

Generalization (I am a kind of ...)
sort.

Specialization (... is a kind of me.)
binary insertion sort.

See also gnome sort.

Note: Sorting can be done in place by moving the next item into place by repeatedly swapping it with the preceding item until it is in place - a linear search and move combined. This implementation is given in C. J. Shaw and T. N. Trimble, Algorithm 175 Shuttle Sort, CACM, 6(6):312-313, June 1963.

If comparing items is very expensive, use binary search to reduce the number of comparisons needed to find where the item should be inserted, then open a space by moving all later items down one. However a binary search is likely to make this not a stable sort.

Author: PEB

Implementation

(Java). An implementation (Java) due to Sedgewick and Wayne (search for Insertion sort). Algorithms and Data Structures' explanation and code (Java and C++). Other implementations may be available through the Stony Brook Algorithm Repository, Sorting. (Scheme). (Fortran).

More information

demonstration of several sort algorithms, with particular emphasis on insertion sort; more demonstrations; a animation (Java) of insertion sort.

[Oct 06, 2010] InformIT Simple Sorting in Java Insertion Sort

In most cases the insertion sort is the best of the elementary sorts described in this chapter. It still executes in O(N2) time, but it's about twice as fast as the bubble sort and somewhat faster than the selection sort in normal situations. It's also not too complex, although it's slightly more involved than the bubble and selection sorts. It's often used as the final stage of more sophisticated sorts, such as quicksort.

[Oct 06, 2010] Insertion sort (C) - LiteratePrograms

When sorting an array with insertion sort, we conceptually separate it into two parts:

In outline, our primary function looks like this:

[Apr 10, 2009] Insertion Sort - C++, Java, Algorithm and Data Structure Tutorials by Denis Kulagin

Contains Java and C++ implementation...
Visualizers
  1. Insertion Sort Animation (with source code line by line visualization)
  2. Insertion Sort in Java Applets Centre
  3. Animated Sorting Algorithms: Insertion Sort

Recommended Links

Google matched content

Softpanorama Recommended

Top articles

Sites

The insertion sort works just like its name suggests - it inserts each item into its proper place in the final list. The simplest implementation of this requires two list structures - the source list and the list into which sorted items are inserted. To save memory, most implementations use an in-place sort that works by moving the current item past the already sorted items and repeatedly swapping it with the preceding item until it is in place.

Like the bubble sort, the insertion sort has a complexity of O(n2). Although it has the same complexity, the insertion sort is twice as efficient as the bubble sort.

Pros: Relatively simple and easy to implement.
Cons: Inefficient for large lists.

Abstract: Traditional INSERTION SORT runs in O(n&sup2;) time because each insertion takes O(n) time. When people run INSERTION SORT in the physical world, they leave gaps between items to accelerate insertions. Gaps help in computers as well. This paper shows that GAPPED INSERTION SORT has insertion times of O(log n) with high probability, yielding a total running time of O(n log n) with high probability. (Update)

insertionsort

Dichotomic insertion consists in using a dichotomic search to determine where to insert the new item, taking advantage of the fact the sub array is already sorted. This yield the most efficient algorithm in terms on number of comparisons, with a guaranteed bound of sum(log2 i) for i in [1..n-1], or log2 (n-1)!, which is strictly less than n log2 n. Unfortunately, this algorithm is not efficient in practice, because, to the opposite of Selection sort, it does on average of n*n exchanges. There are two variants, depending if we assume the array to be almost sorted (DichoSort) or random or inverted (DichoSort2).



Etc

Society

Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers :   Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism  : The Iron Law of Oligarchy : Libertarian Philosophy

Quotes

War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda  : SE quotes : Language Design and Programming Quotes : Random IT-related quotesSomerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose BierceBernard Shaw : Mark Twain Quotes

Bulletin:

Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 :  Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method  : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law

History:

Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds  : Larry Wall  : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOSProgramming Languages History : PL/1 : Simula 67 : C : History of GCC developmentScripting Languages : Perl history   : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-MonthHow to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D


Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to to buy a cup of coffee for authors of this site

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: March 12, 2019