Sunday, November 2, 2008

Algorithms & Data Structures - Sorting 1 and Analysis

  • The sorting problem is familiar to everyone
  • Several distinct algorithms to solve it
  • Algorithms use different data structures
  • Algorithm performance varies widely
  • See Chapters 1, 2, and 7
    • For now, skip ShellSort and QuickSort
  • Bubble Sort
  • Insertion Sort
  • Merge Sort
  • And analysis as we go…


BubbleSort
procedure BubbleSort( var A : InputArray; N : integer);
var
j, P : integer;
begin
for P := N downto 2 do begin
for j := 1 to P - 1 do begin
if A[ j ] > A[ j + 1 ] then
Swap( A[ j ], A[ j + 1 ] );
end; {for}
end; {for}
end;

Template BubbleSort class
template<class T> class CBubbleSort {
public:
void Sort(T *A, int N)
{
for(int P = N-1; P >= 1; P--)
{
for(int j = 0; j <= P-1; j++)
{
if(A[j + 1] < A[j])
{
T t = A[j];       // swap the out-of-order pair
A[j] = A[j + 1];
A[j + 1] = t;
}
}
}
}
};

Utilizing it…

// Data … (requires #include <vector>)
std::vector<double> data;

// fill in the data…

// Instantiate the sort object…
CBubbleSort<double> sort;

sort.Sort(&data[0], (int)data.size());   // size() returns size_t; Sort takes int
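Put together as a complete program (a minimal sketch; the sample values are mine):

#include <iostream>
#include <vector>

int main()
{
    // Data … (normally filled in from elsewhere)
    std::vector<double> data;
    data.push_back(3.5);
    data.push_back(1.2);
    data.push_back(9.9);
    data.push_back(2.4);

    // Instantiate the sort object and sort the vector's buffer.
    // Note: &data[0] is only valid because data is non-empty.
    CBubbleSort<double> sort;
    sort.Sort(&data[0], (int)data.size());

    for(int i = 0; i < (int)data.size(); i++)
        std::cout << data[i] << " ";       // prints: 1.2 2.4 3.5 9.9
    std::cout << "\n";
    return 0;
}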

Running time for BubbleSort
How many times do the comparison (c) and the swap (d) get executed?
for P := N downto 2 do begin
for j := 1 to P - 1 do begin
if A[ j ] > A[ j + 1 ] then      {c}
Swap( A[ j ], A[ j + 1 ] );      {d}
end; {for}
end; {for}

Running time for BubbleSort
How many times do c and d get executed? The inner loop runs P - 1 times for each P from N down to 2, so the comparison (c) executes exactly
(N-1) + (N-2) + … + 1 = N(N-1)/2
times, and the swap (d) executes at most that many times.

So, we could say: T(N) = O(N²)
Big-Oh notation
Definition: T(N) = O( f(N) ) if there are positive constants c and n0 such that T(N) <= c·f(N) when N >= n0
Another notation: T(N) ∈ O( f(N) ), reading O( f(N) ) as the set of all functions bounded above by a constant multiple of f(N).

Intuitively what does this mean? The growth rate of T(N) is no faster than the growth rate of f(N): f(N) is an upper bound on T(N), ignoring constant factors and small inputs.
Asymptotic Upper Bound

In a picture: to the right of n0, the curve c·f(N) lies on or above T(N); to the left of n0, the bound says nothing.
Example of Asymptotic Upper Bound

For instance, T(N) = 3N² + 2N is O(N²): take c = 4 and n0 = 2, since 3N² + 2N <= 4N² whenever N >= 2.
Is BubbleSort O(N²)?
  • Can we find a c and n0?
  • Yes: T(N) = N(N-1)/2 <= (1/2)N² for all N >= 1, so c = 1/2 and n0 = 1 work.
  • Some rules to simplify things…
    • Loops:
      • Running time is body time times # iterations
    • Nested loops:
      • Analysis from the inside out…
    • Statements:
      • Assume time of 1 each
        • Unless the statement is an algorithm
    • So, we could say:
      • The inner loop body is constant time and runs at most N-1 times; the outer loop runs N-1 times. From the inside out: O(N) per pass × N passes = O(N²).
    Usage


    Exercise in Big O-notation


    What’s the minimum time?

    Big-Oh only bounds the running time from above. To claim an algorithm must take at least a certain time, we need a lower-bound notation.

    Omega notation

    Definition: T(N) = Ω( g(N) ) if there are positive constants c and n0 such that T(N) >= c·g(N) when N >= n0

    Asymptotic Lower Bound

    The mirror image of Big-Oh: to the right of n0, the curve c·g(N) lies on or below T(N).
    What these tell us
  • O(f(N)) - Upper bound on running time
  • Ω(g(N)) - Lower bound on running time


  • What about the “constants”?
    • Algorithm 1: O(N), actual time is:
      • T(N)=1,000,000N
    • Algorithm 2: O(N²), actual time is:
      • T(N)=10N²
      • When is algorithm 2 faster? When 10N² < 1,000,000N, i.e. when N < 100,000. Up to that point, the "worse" algorithm wins on constants.
    Why do we care?
    • After all, computers just keep getting faster, don’t they?
      • Is today’s slow algorithm acceptable tomorrow?
    • Algorithm 1: O(N²)
    • Algorithm 2: O(N)
      • If your computer is suddenly 10 times faster, can you handle 10 times as much data in the same time?
      • With the O(N) algorithm, yes. With the O(N²) algorithm, no: 10N² = N'² gives N' = √10·N ≈ 3.16N.
    Faster Computer or Algorithm?
    • What happens when we buy a computer 10 times faster?
      • In the same time we can solve problems 10× larger with an O(N) algorithm, about √10 ≈ 3.16× larger with O(N²), and only about 3.3 items more with O(2^N). A better algorithm beats a faster computer.
    Constant running time
    O(1): the running time is independent of N (for example, indexing A[i] in an array).
    Theta Notation
    Definition: T(N) = Θ( h(N) ) if and only if T(N) = O( h(N) ) and T(N) = Ω( h(N) )

    Big O, Omega, Theta
    When the upper and lower bounds meet, Θ pins down the growth rate exactly.
    In general:
    O is used to denote an upper bound on the complexity
    Ω is used to denote a lower bound on the complexity
    Θ is used to denote a tight bound on the complexity

    For example, the general sorting problem has time complexity Ω(n log n): any sorting based on compare-interchange takes at least c·n log n time for some constant c.

    The sorting problem also has time complexity O(n log n), since there is an algorithm which sorts in O(n log n) time. Together, sorting is Θ(n log n).
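    Where does the Ω(n log n) come from? A sketch of the standard decision-tree argument (my summary, not from the original notes): any comparison sort can be drawn as a binary tree in which each internal node is one comparison and each leaf is one of the n! possible input orderings. A tree with n! leaves has height at least log2(n!), so some input forces at least

    log2(n!) >= log2( (n/2)^(n/2) ) = (n/2)·log2(n/2) = Ω(n log n)

    comparisons, using n! >= (n/2)^(n/2) because the largest n/2 factors of n! are each at least n/2.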

    Aside: Can we make this faster in some cases?
    procedure BubbleSort( var A : InputArray; N : integer);
    var
    j, P : integer;
    begin
    for P := N downto 2 do begin
    for j := 1 to P - 1 do begin
    if A[ j ] > A[ j + 1 ] then
    Swap( A[ j ], A[ j + 1 ] );
    end; {for}
    end; {for}
    end;
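    One answer (a sketch of the usual early-exit trick; the flag name Swapped is my own): if a full inner pass makes no swaps, the array is already sorted and we can stop. On already-sorted input this version is O(N) instead of O(N²).

    template<class T> class CBubbleSortEarlyExit {
    public:
        void Sort(T *A, int N)
        {
            for(int P = N - 1; P >= 1; P--)
            {
                bool Swapped = false;          // did this pass swap anything?
                for(int j = 0; j <= P - 1; j++)
                {
                    if(A[j + 1] < A[j])
                    {
                        T t = A[j];            // swap the adjacent pair
                        A[j] = A[j + 1];
                        A[j + 1] = t;
                        Swapped = true;
                    }
                }
                if(!Swapped) return;           // no swaps: already sorted
            }
        }
    };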

    Insertion Sort
    procedure InsertionSort( var A : InputArray; N : integer);
    var j, P : integer; Tmp : InputType;
    begin
    for P := 2 to N do begin
    j := P;
    Tmp := A[ P ];

    while (j > 1) and (Tmp < A[ j - 1 ]) do begin
    A[ j ] := A[ j - 1 ];
    j := j - 1;
    end; {while}
    A[ j ] := Tmp;
    end; {for}
    end;
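    Following the pattern of CBubbleSort above, here is a C++ rendering of the same algorithm (my own sketch; the class name CInsertionSort is not from the notes):

    template<class T> class CInsertionSort {
    public:
        void Sort(T *A, int N)
        {
            for(int P = 1; P < N; P++)          // A[0..P-1] is already sorted
            {
                T Tmp = A[P];
                int j = P;
                while(j > 0 && Tmp < A[j - 1])  // shift larger elements right
                {
                    A[j] = A[j - 1];
                    j--;
                }
                A[j] = Tmp;                     // drop Tmp into its final slot
            }
        }
    };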

    Worst case analysis
    • N outer loops
    • Each item moved maximum distance
      • For item i, that would be i-1, right?

    Total moves: 1 + 2 + … + (N-1) = N(N-1)/2, so the worst case (reverse-sorted input) is O(N²).
    Best case analysis
    • N outer loops
    • Each is not moved at all
      • For item i, 0 moves, which is constant time.

    The while test fails immediately every time, so already-sorted input costs just N-1 comparisons: O(N).
    Worst-Case and Average-Case Analysis
    • Worst-case analysis
      • Big O: an upper bound on the running time for any input of size N.
    • Average-case analysis
      • Some algorithms are fast in practice even when their worst case is bad.
      • The analysis is usually much harder.
      • We need to assume that all inputs are equally likely.
      • Sometimes it is not obvious what the average even is.


    InsertionSort Average Case
  • On average, how much would you have to move an item to sort the list?
  • For item i, the minimum is 0 and the maximum is i-1, so the average is about i/2.
  • Summing over all items gives roughly N(N-1)/4 moves, so the average case is still O(N²).
  Insertion Sort Average Case: a more formal solution
    • For sorting, we assume randomly ordered data.
    • Inversions:
      • An inversion in an array of numbers is any ordered pair (i, j) with i < j and A[i] > A[j]
    • How many inversions in this list:
      • 12, 17, 5, 7, 21, 1, 8   (count them: there are 12)
      • How many are possible altogether? With N = 7, N(N-1)/2 = 21

    Inversions
    • The maximum number of inversions in a list of size N is N(N-1)/2
    • Theorem: The average number of inversions in an array of N distinct numbers is N(N-1)/4
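    As a quick sanity check on the example list (a brute-force sketch of my own, not from the notes):

    #include <iostream>

    int CountInversions(const double *A, int N)
    {
        int count = 0;
        for(int i = 0; i < N; i++)          // every ordered pair (i, j), i < j
            for(int j = i + 1; j < N; j++)
                if(A[i] > A[j])             // out of order: one inversion
                    count++;
        return count;
    }

    int main()
    {
        double data[] = { 12, 17, 5, 7, 21, 1, 8 };
        std::cout << CountInversions(data, 7) << "\n";   // prints 12 (of 21 possible)
        return 0;
    }

    An O(N log N) inversion count is possible via a modified mergesort, but brute force is enough to check a 7-element example.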

    Average case for exchanging adjacent elements
    • When we exchange adjacent elements we remove exactly one inversion
      • 12, 17, 5, 7, 21, 1, 8
    • On average we must remove N(N-1)/4 inversions, so any sort that exchanges only adjacent elements takes Ω(N²) time on average

    Mergesort
    procedure MergeSort( var A : InputArray; N : integer);
    begin
    MSort(A, 1, N);
    end;

    procedure MSort(var A : InputArray; F, L : integer);
    var Q : integer;
    begin
    if F < L then begin
    Q := (F + L) div 2;    { midpoint: integer division }
    MSort(A, F, Q);
    MSort(A, Q+1, L);

    Merge(A, F, Q, L);
    end;
    end;
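    The Merge step itself is not shown in the notes; here is a C++ sketch of merging two sorted runs (the names and the temporary buffer are my own choices; note this version is 0-based, unlike the 1-based Pascal):

    #include <vector>

    // Merge sorted runs A[F..Q] and A[Q+1..L] (inclusive) using a temp buffer.
    template<class T> void Merge(T *A, int F, int Q, int L)
    {
        std::vector<T> tmp;
        tmp.reserve(L - F + 1);
        int i = F, j = Q + 1;
        while(i <= Q && j <= L)                 // take the smaller front element
            tmp.push_back(A[j] < A[i] ? A[j++] : A[i++]);
        while(i <= Q) tmp.push_back(A[i++]);    // copy any leftovers
        while(j <= L) tmp.push_back(A[j++]);
        for(int k = 0; k < (int)tmp.size(); k++)
            A[F + k] = tmp[k];                  // copy back into A
    }

    Each element is touched a constant number of times, so merging N elements is O(N).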



    How long does merge take?
    • Merging two sorted subarrays with N elements total takes O(N) time: each comparison places one element into the output.
    • Mergesort is one of the earliest sorting algorithms, invented by John von Neumann in 1945.
    • Mergesort is a divide-and-conquer algorithm:
      • Given an instance of the problem to be solved, split it into several smaller sub-instances (of the same problem), independently solve each sub-instance, and then combine the sub-instance solutions to yield a solution for the original instance.
      • The problem is divided into smaller problems that are solved recursively.
      • Quicksort and heapsort are also divide-and-conquer algorithms.





    Running Time
  • T(1)=O(1)
  • T(N)=T(N/2)+T(N/2)+O(N)
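  Solving the recurrence by the standard telescoping argument (assume N is a power of 2, write the O(N) term as cN, and take T(1) = 1):

  T(N)/N = T(N/2)/(N/2) + c
         = T(N/4)/(N/4) + c + c
         = …
         = T(1)/1 + c·log2 N

  so T(N) = N + c·N·log2 N = O(N log N).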


  • Mergesort
    • O(n log n) worst-case running time
    • The same operations are done regardless of input order
    • Mergesort is a divide-and-conquer algorithm
    • Copying to and from a temporary array
      • Extra memory requirement
      • Extra work






