Wednesday, January 29, 2014

Sorting Algorithms

Today we are going to implement the most common sorting algorithms and compare their results. In some cases we also give the standard algorithm together with some optimizations.

Simple Sorts:

In this section we deal with three of the simplest sorting algorithms: bubble sort, insertion sort (with a variation called binary insertion sort) and selection sort. Bubble sort is usually considered the least efficient of the three. As we will show, insertion sort is generally faster than selection sort, due to fewer comparisons and good performance on almost-sorted data; however, when a write is much more costly than a comparison, selection sort is preferred because it performs at most n-1 swaps. All the snippets below rely on the small helper class sketched next.
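The snippets in this post compare elements with compareTo and exchange them with a swap helper that is never shown; the following is a minimal sketch of the surrounding class they assume (the class name Sorter and the generic bound are my own choice, not taken from the original code):

   //Hypothetical container class for the sorting methods in this post.
   //The bound E extends Comparable<E> is assumed so that
   //array[i].compareTo(array[j]) compiles; swap is the helper that
   //every snippet below relies on.
   public class Sorter<E extends Comparable<E>>
   {
      //Exchanges the elements at positions i and j of the array
      private void swap(E[] array, int i, int j)
      {
         E tmp = array[i];
         array[i] = array[j];
         array[j] = tmp;
      }

      //The sorting methods of the post (BubbleSort, InsertionSort, ...) go here.
   }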

    Bubble sort:

   public void BubbleSort(E[] array)
   {
      for(int i=0; i<array.length; i++)
      {
         for(int j=array.length-1; j>i; j--)
         {
            if(array[j].compareTo(array[j-1])<0)
               swap(array,j,j-1);
         }
      }
   }

    Insertion Sort:

   public void InsertionSort(E[] array)
   {
      for(int j=1; j<array.length; j++)
      {
         E key = array[j];
         int i = j-1;
         while(i>=0 && array[i].compareTo(key)>0)
         {
            array[i+1] = array[i];
            i = i-1;
         }
         array[i+1] = key;
      }
   }

  
An improvement can be made by using binary search to locate the position where the element should be inserted, which reduces the number of comparisons to O(n log n). The algorithm still has an average running time of O(n^2) because of the element shifts required for each insertion.

   Binary Insertion Sort:

   public void BinaryInsertionSort(E[] array)
   {
      for(int j=1; j<array.length; j++)
      {
         E key = array[j];
         //Binary search (the insertion position is indicated by imax)
         int imin = 0;
         int imax = j;
         while(imax>imin)
         {
            int midpoint = (imin+imax)/2;
            if(array[midpoint].compareTo(key)>0)
               imax = midpoint;
            else
               imin = midpoint+1;
         }
         //Shift the larger elements one position to the right and insert the key
         for(int i=j-1; i>=imax; i--)
            array[i+1] = array[i];
         array[imax] = key;
      }
   }

  Selection Sort:


   public void SelectionSort(E[] array)
   {
      for(int i=0; i<array.length-1; i++)
      {
         int min = i;
         for(int j=i+1; j<array.length; j++)
         {
            if(array[min].compareTo(array[j])>0)
               min = j;
         }
         swap(array,i,min);
      }
   }

Efficient Algorithms:

In this section we implement the sorting algorithms with O(n log n) average complexity (which is also the worst-case complexity for heap sort and merge sort), of which the most common are heap sort, merge sort and quicksort. We also provide an improved version of merge sort whose merge step copies only the smaller of the two halves (at most half of the elements) into the temporary array, instead of all of them.

Another widely used sorting algorithm based on merge sort is Timsort, which takes advantage of already sorted sequences in the input. It is the default sorting algorithm in Python and, since Java 7, the one used to sort arrays of objects (previous versions used merge sort). We will implement Timsort in a future post; for now, the library version can be called as shown below.
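As a point of comparison with the hand-written implementations, a minimal example of calling the library sort (which for object arrays is Timsort since Java 7) might look like this; the example values are arbitrary:

   import java.util.Arrays;

   public class LibrarySortExample
   {
      public static void main(String[] args)
      {
         //Arrays.sort on an object array uses Timsort since Java 7
         Double[] data = {3.5, 1.2, 4.8, 1.2, 2.0};
         Arrays.sort(data);
         System.out.println(Arrays.toString(data)); //[1.2, 1.2, 2.0, 3.5, 4.8]
      }
   }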

   Heap Sort:

   public void heapSort(E[] array)
   {
      int heapSize = buildHeap(array);
      for(int i=0; i<array.length-1; i++)
      {
         heapSize--;
         swap(array,0,heapSize);
         maxheapify(array,0,heapSize);
      }
   }
   private void maxheapify(E[] array, int i, int heapSize)
   {
      int l = 2*i+1;
      int r = 2*i+2;
      int largest = i;
      if(l<heapSize && array[l].compareTo(array[i])>0)
         largest = l;
      if(r<heapSize && array[r].compareTo(array[largest])>0)
         largest = r;
      if(largest!=i)
      {
         swap(array,i,largest);
         maxheapify(array,largest,heapSize);
      }
   }
   private int buildHeap(E[] array)
   {
      for(int i=(array.length-1)/2; i>=0; i--)
         maxheapify(array,i,array.length);
      return array.length;
   }

   Quicksort:

   public void quickSort(E[] array)
   {
      //Shuffling the array first (e.g. with a Fisher-Yates shuffle, sketched below)
      //can improve the performance when the input is not already random
      quickSort(array,0,array.length-1);
   }

   private void quickSort(E[] array, int p, int r)
   {
      if(p<r)
      {
         int q = partition(array,p,r);
         quickSort(array,p,q-1);
         quickSort(array,q+1,r);
      }
   }
   private int partition(E[] array, int p, int r)
   {
      //Uses the first element of the subarray as pivot
      E pivot = array[p];
      int i = p;
      for(int j=p+1; j<=r; j++)
      {
         if(array[j].compareTo(pivot)<0)
         {
            i++;
            swap(array,j,i);
         }
      }
      swap(array,i,p);
      return i;
   }
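The shuffle mentioned in the comment of quickSort is not implemented in the post; a minimal Fisher-Yates sketch (the method name shuffle is my own) that could be called before quickSort(array,0,array.length-1) is:

   //Hypothetical helper (not part of the original post): randomizes the input so
   //that already sorted arrays do not trigger the O(n^2) worst case of this
   //first-element-pivot quicksort.
   private void shuffle(E[] array)
   {
      java.util.Random random = new java.util.Random();
      for(int i=array.length-1; i>0; i--)
      {
         int j = random.nextInt(i+1); //uniform index in [0, i]
         swap(array,i,j);
      }
   }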

    Merge sort:

   public void mergeSort(E[] array)
   {
      mergeSort(array,0,array.length-1);
   }

   private void mergeSort(E[] array, int p, int r)
   {
      if(p<r)
      {
         int q = (p+r)/2;
         mergeSort(array,p,q);
         mergeSort(array,q+1,r);
         merge(array,p,q,r);
      }
   }

   private void merge(E[] array, int p, int q, int r)
   {
      //Input: array, with array[p...q] and array[q+1...r] in sorted order and p<=q<=r
      //Output: array, with array[p...r] sorted
      int n1 = q-p+1;
      int n2 = r-q;
      Comparable[] array1 = new Comparable[n1];
      Comparable[] array2 = new Comparable[n2];
      for(int i=0; i<n1; i++)
         array1[i] = array[p+i];
      for(int i=0; i<n2; i++)
         array2[i] = array[q+1+i];
      int i = 0;
      int j = 0;
      for(int k=p; k<=r; k++)
      {
         //Take from the left half on ties (<=0) so that the sort is stable
         if((i<n1 && j<n2 && array1[i].compareTo(array2[j])<=0) || (j==n2))
         {
            array[k] = (E) array1[i++];
         }
         else
         {
            array[k] = (E) array2[j++];
         }
      }
   }

    An improvement of merge that copies only the smaller of the two halves into the temporary array:
   
   private void merge2(E[] array, int p, int q, int r)
   {
      //Input: array, with array[p...q] and array[q+1...r] in sorted order and p<=q<=r
      //Output: array, with array[p...r] sorted
      int n1 = q-p+1;
      int n2 = r-q;
      if(n1<n2)
      {
         //Copy the smaller left half and merge forward
         Comparable[] arrayTemp = new Comparable[n1];
         for(int i=0; i<n1; i++)
            arrayTemp[i] = array[p+i];
         int i = 0;
         int j = q+1;
         for(int k=p; k<=r; k++)
         {
            //Take from the left half on ties (<=0) so that the sort is stable
            if((i<n1 && j<=r && arrayTemp[i].compareTo(array[j])<=0) || (j==r+1))
            {
               array[k] = (E) arrayTemp[i++];
            }
            else
            {
               array[k] = (E) array[j++];
            }
         }
      }
      else
      {
         //Copy the smaller right half and merge backward
         Comparable[] arrayTemp = new Comparable[n2];
         for(int i=0; i<n2; i++)
            arrayTemp[i] = array[q+1+i];
         int i = q;
         int j = n2-1;
         for(int k=r; k>=p; k--)
         {
            //Take from the right half on ties (>=0) so that the sort is stable
            if((i>=p && j>=0 && arrayTemp[j].compareTo(array[i])>=0) || (i==p-1))
            {
               array[k] = (E) arrayTemp[j--];
            }
            else
            {
               array[k] = (E) array[i--];
            }
         }
      }
   }
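The post does not show the recursive driver that uses merge2; a hypothetical mergeSort2, mirroring mergeSort above, would simply call merge2 in place of merge:

   public void mergeSort2(E[] array)
   {
      mergeSort2(array,0,array.length-1);
   }

   private void mergeSort2(E[] array, int p, int r)
   {
      if(p<r)
      {
         int q = (p+r)/2;
         mergeSort2(array,p,q);
         mergeSort2(array,q+1,r);
         merge2(array,p,q,r); //same recursion as mergeSort, but with the cheaper merge
      }
   }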

Results:

Small size

 This experiment compares all the algorithms above for arrays of size 1,000 or less. We sort a randomly generated Double array, and each experiment is repeated 100,000 times. The next figure shows the average over those runs.

Big size
This experiment compares only the efficient algorithms, for arrays of size 1,000 to 10,000,000. Each experiment is repeated 100 times; everything else is identical to the previous experiment. A sketch of the kind of harness used for both experiments is shown below.
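The benchmarking code itself is not included in the post; the following is a minimal sketch of such a harness (the class name SortBenchmark is hypothetical, and it reuses the Sorter class sketched at the top of this post):

   import java.util.Random;

   public class SortBenchmark
   {
      public static void main(String[] args)
      {
         int size = 1000;        //array size for the experiment
         int repetitions = 1000; //number of repetitions to average over
         Random random = new Random();
         Sorter<Double> sorter = new Sorter<Double>();
         long total = 0;
         for(int rep=0; rep<repetitions; rep++)
         {
            //Generate a fresh random array for every repetition
            Double[] data = new Double[size];
            for(int i=0; i<size; i++)
               data[i] = random.nextDouble();
            long start = System.nanoTime();
            sorter.quickSort(data); //replace with the algorithm under test
            total += System.nanoTime() - start;
         }
         System.out.println("Average time (ns): " + total/repetitions);
      }
   }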


Conclusions:
We see that for tiny arrays, fewer than 60 elements, insertion sort is the fastest algorithm. These results seem consistent with the Timsort notes [1], which recommend a minrun between 32 and 64 elements, each minrun being sorted with binary insertion sort. Our experiments seem to indicate that insertion sort has less overhead than binary insertion sort for such small arrays; a hybrid that switches to insertion sort below that cutoff is sketched after this paragraph.
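As an illustration of this conclusion (not code from the post), a hybrid quicksort that switches to insertion sort below a cutoff of around 60 elements, reusing the partition method above, might look like this; the cutoff value and the method name hybridSort are my own:

   //Hypothetical hybrid: falls back to insertion sort for small subarrays,
   //where it showed the least overhead in the experiments above.
   private static final int CUTOFF = 60;

   public void hybridSort(E[] array, int p, int r)
   {
      if(r-p+1 <= CUTOFF)
      {
         //Insertion sort on array[p...r]
         for(int j=p+1; j<=r; j++)
         {
            E key = array[j];
            int i = j-1;
            while(i>=p && array[i].compareTo(key)>0)
            {
               array[i+1] = array[i];
               i--;
            }
            array[i+1] = key;
         }
      }
      else
      {
         int q = partition(array,p,r); //partition from the quicksort above
         hybridSort(array,p,q-1);
         hybridSort(array,q+1,r);
      }
   }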

Another conclusion is that quicksort is the fastest algorithm on average for random arrays (and shuffling the array at the beginning makes this average-case behavior hold regardless of the input order). This happens even though quicksort does about 39% more comparisons than merge sort (roughly 1.39 n log n versus n log n). The main reason quicksort outperforms merge sort and heapsort is its good locality of reference: the in-place partition amounts to linear scans over the subarray, so it causes very few cache misses. This is also the reason why heapsort performs worst (it takes more than twice as long as quicksort), since it has to access very distant positions of the array.

[1] Peters, Tim "python_timsort", http://bugs.python.org/file4451/timsort.txt
