Softpanorama
May the source be with you, but remember the KISS principle ;-)
There are two major variants of insertion sort: linear and binary. They differ in the method of finding the place to insert the current element. It's natural to expect that for large sets binary search can cut the number of comparisons per insertion severalfold and thus achieve higher sorting speed at the cost of some additional complexity; note, however, that it does not reduce the number of element moves.
Linear insertion is easier to explain, so let's start with it. We will assume that we are sorting an array in ascending order.
Insertion sort always maintains two zones in the array being sorted: sorted and unsorted. At the beginning the sorted zone consists of one element. On each step the algorithm expands it by one element, inserting the first element from the unsorted zone into its proper place in the sorted zone and shifting all larger elements one slot toward the end of the array. It is an algorithm that many people intuitively use for sorting cards, and it is very easy to illustrate with a deck of cards. Here is an example (the sorted zone is to the left of the bar, the unsorted zone to the right):
5 | 3 1 7 9 -> 3 5 | 1 7 9 -> 1 3 5 | 7 9 -> 1 3 5 7 | 9 -> 1 3 5 7 9
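The zone bookkeeping can be made visible with a small tracing routine. This is only a sketch: the names `print_zones` and `trace_insertion_sort` are made up for illustration and are not part of the original page.

```c
#include <stdio.h>

/* Print the array with a '|' after the first `sorted` elements
   (hypothetical helper, for illustration only). */
static void print_zones(const int a[], int n, int sorted)
{
    for (int k = 0; k < n; k++) {
        printf("%d ", a[k]);
        if (k == sorted - 1 && sorted < n)
            printf("| ");
    }
    printf("\n");
}

/* Linear insertion sort that prints both zones before each step
   and once more when the array is fully sorted. */
void trace_insertion_sort(int a[], int n)
{
    for (int i = 1; i < n; i++) {
        print_zones(a, n, i);          /* a[0..i-1] is the sorted zone */
        int current = a[i];            /* first element of the unsorted zone */
        int j = i;
        while (j > 0 && a[j - 1] > current) {
            a[j] = a[j - 1];           /* shift a larger element one slot */
            j--;
        }
        a[j] = current;                /* drop it into the gap */
    }
    print_zones(a, n, n);              /* no separator: everything is sorted */
}
```

Calling it on the array 5 3 1 7 9 prints the successive states of the array, one per line, with the bar moving one position to the right on each step.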
Since multiple keys with the same value are placed in the sorted array in the same order that they appear in the input array, insertion sort is stable.
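Stability is easiest to see on records that carry something besides the key. The sketch below assumes a hypothetical `struct record` with an integer key and a one-character tag that marks the input order; neither name comes from the original page.

```c
#include <stddef.h>

/* Hypothetical record type: sorted by `key`; `tag` only marks input order. */
struct record {
    int  key;
    char tag;
};

void insertion_sort_records(struct record a[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        struct record current = a[i];
        size_t j = i;
        /* Strict '>' means an element is never shifted past an equal key,
           so equal keys keep their input order. With '>=' the sort would
           still be correct but no longer stable. */
        while (j > 0 && a[j - 1].key > current.key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = current;
    }
}
```

For the input (2,a) (1,b) (2,c) (1,d) the result is (1,b) (1,d) (2,a) (2,c): within each key the tags stay in their original order.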
If the input data contains already sorted fragments, insertion sort works slightly faster, but unless the sorted fragment is at the very beginning of the data set, it still moves keys one by one.
If the input data is sorted in the opposite direction, insertion sort cannot take advantage of this; reverse-ordered input is in fact its worst case, because every new element has to be shifted past the whole sorted zone.
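The effect of the initial order can be measured by counting element shifts. A minimal sketch, assuming a hypothetical counting variant named `count_shifts`:

```c
/* Linear insertion sort that returns the number of element shifts
   (hypothetical helper name, for illustration only). */
long count_shifts(int a[], int n)
{
    long shifts = 0;
    for (int i = 1; i < n; i++) {
        int current = a[i];
        int j = i;
        while (j > 0 && a[j - 1] > current) {
            a[j] = a[j - 1];   /* one shift per larger element to the left */
            j--;
            shifts++;
        }
        a[j] = current;
    }
    return shifts;
}
```

On an already sorted array the function returns 0; on a reverse-ordered array of n elements it returns n(n-1)/2, for example 10 for n = 5.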
Of the three elementary sorting algorithms, insertion sort uses the fewest comparisons, so it may be faster when comparisons are expensive, for example when the keys are long, similar strings.
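To illustrate the expensive-comparison case, here is a sketch that sorts C strings with `strcmp` and counts how many times it is called. The function name `insertion_sort_strings` is an assumption made for this example.

```c
#include <string.h>

/* Sorts an array of C strings in ascending strcmp order and returns
   the number of strcmp calls (hypothetical helper, for illustration). */
long insertion_sort_strings(const char *a[], int n)
{
    long comparisons = 0;
    for (int i = 1; i < n; i++) {
        const char *current = a[i];
        int j = i;
        while (j > 0) {
            comparisons++;
            if (strcmp(a[j - 1], current) <= 0)
                break;             /* insertion point found */
            a[j] = a[j - 1];       /* moving a pointer is cheap;
                                      strcmp on long similar strings is not */
            j--;
        }
        a[j] = current;
    }
    return comparisons;
}
```

On already sorted input this makes n-1 comparisons, whereas a plain selection sort makes n(n-1)/2 comparisons regardless of the input order.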
A C implementation usually contains two nested loops.
Here is a sample non-optimal insertion sort implementation; note that it copies the current element out and back even when that element is already in place:
void insertionSort(int a[], int array_size)
{
    int i, j, current;
    for (i = 1; i < array_size; i++)
    {
        current = a[i];
        j = i; /* index of the end of sorted region */
        while ((j > 0) && (a[j-1] > current))
        {
            a[j] = a[j-1];
            j = j - 1;
        }
        a[j] = current;
    }
}
Here is a slightly better implementation, which skips the inner loop entirely when the current element is already in place. I prepared it for my algorithms class many years ago (note that the author is no Knuth, and the algorithm is somewhat skewed toward the control structures available in C; this is not how it should be programmed in assembler):
/* insert sort */
/* Nikolai Bezroukov, 1996 */
#include <stdio.h>
#include <stdlib.h>
typedef int itemtype; // type of item to be sorted
int total_comp=0;
int total_moves=0;
void insertSort(
    itemtype *a, // array to be sorted
    int n        // size of the array; note that (n-1) is the upper index of the array
)
{
    itemtype t;
    int i, j;
    /* Outer "boundary definition" loop */
    for (i = 1; i < n; i++) {
        if ( a[i-1]<=a[i] ) { total_comp++; continue;}
        t = a[i]; total_moves++;
        /* inner loop: elements shifted down until insertion point found */
        for (j = i-1; j >= 0; j--) {
            total_comp++;
            if ( a[j] <= t ) { break; }
            a[j+1] = a[j]; total_moves++;
        }
        /* insert */
        a[j+1] = t; total_moves++;
    }
}
Insertion Sort (ciips.ee.uwa.edu.au)
Insertion Sort (nice description)
Insertion Sort (www.personal.kent.edu)
Insertion Sort simple animation
binary insertion sort -- an optimization of the insertion sort where the proper location is found by binary search.
bender...004librarysort.ps
Insertion Sort (linux.wku.edu)
The insertion sort works just like its name suggests - it inserts each item into its proper place in the final list. The simplest implementation of this requires two list structures - the source list and the list into which sorted items are inserted. To save memory, most implementations use an in-place sort that works by moving the current item past the already sorted items and repeatedly swapping it with the preceding item until it is in place.
Like the bubble sort, the insertion sort has a complexity of O(n²). Although it has the same complexity, the insertion sort is twice as efficient as the bubble sort.
Pros: Relatively simple and easy to implement.
Cons: Inefficient for large lists.
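The swap-based in-place variant described in this quote can be sketched as follows. The function name `insertion_sort_swaps` is made up for the example.

```c
/* In-place insertion sort that moves the current item toward the front
   by repeatedly swapping it with the preceding item
   (hypothetical helper name, for illustration only). */
void insertion_sort_swaps(int a[], int n)
{
    for (int i = 1; i < n; i++) {
        /* Keep swapping while the preceding item is larger. */
        for (int j = i; j > 0 && a[j - 1] > a[j]; j--) {
            int tmp  = a[j];
            a[j]     = a[j - 1];
            a[j - 1] = tmp;
        }
    }
}
```

Each swap costs three assignments, while the shifting versions shown earlier on this page hold the current element in a temporary variable and spend one assignment per position, so the swap form is shorter to write but does more memory traffic.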
Abstract: Traditional INSERTION SORT runs in O(n²) time because each insertion takes O(n) time. When people run INSERTION SORT in the physical world, they leave gaps between items to accelerate insertions. Gaps help in computers as well. This paper shows that GAPPED INSERTION SORT has insertion times of O(log n) with high probability, yielding a total running time of O(n log n) with high probability.
Dichotomic insertion uses a dichotomic (binary) search to determine where to insert the new item, taking advantage of the fact that the sub-array is already sorted. This yields the most efficient algorithm in terms of the number of comparisons, with a guaranteed bound of sum(log2 i) for i in [1..n-1], or log2 (n-1)!, which is strictly less than n log2 n. Unfortunately, this algorithm is not efficient in practice because, unlike selection sort, it performs on the order of n*n element moves on average. There are two variants, depending on whether we assume the array to be almost sorted (DichoSort) or random or inverted (DichoSort2).
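A minimal sketch of binary (dichotomic) insertion is given below. It is not the DichoSort or DichoSort2 code referred to above, whose source is not reproduced here; the function name `binary_insertion_sort` and the comparison counter are assumptions made for illustration.

```c
#include <string.h>

/* Binary insertion sort; returns the number of key comparisons.
   (Hypothetical helper, not the DichoSort code mentioned in the text.) */
long binary_insertion_sort(int a[], int n)
{
    long comparisons = 0;
    for (int i = 1; i < n; i++) {
        int current = a[i];
        int lo = 0, hi = i;              /* insertion point lies in [lo, hi] */
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            comparisons++;
            if (a[mid] <= current)
                lo = mid + 1;            /* go right of equal keys: stays stable */
            else
                hi = mid;
        }
        /* Shift a[lo..i-1] one slot toward the end; this step is still
           linear in the size of the sorted zone. */
        memmove(&a[lo + 1], &a[lo], (size_t)(i - lo) * sizeof a[0]);
        a[lo] = current;
    }
    return comparisons;
}
```

The search needs only a logarithmic number of comparisons per element, but the `memmove` still shifts up to i elements, which is why the overall running time remains quadratic.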
Copyright © 1996-2007 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: February 28, 2008