Fastest Sorting Algorithms ... Guv?

**Nucleus** · Mar 17th, 2002, 03:22 AM

What is the fastest sorting algorithm out there?

I was bored yesterday..... so I came up with an algorithm myself.... and when I tested it, it turned out faster than the implementations of:

Insertion, Shell, QuickSort, QSort

that I found here:

http://epaperpress.com/sortsearch/index.html

I only spent a day on the algorithm so I am dubious that these algorithms are the fastest available. I want to find the fastest algorithm out there and preferable a VB implementation of it.

**kovan** · Mar 17th, 2002, 05:22 AM

each of those sorts are for different types of info and how your info is layed out
quicksort in some cases could be fastest, and in some cases be slowest and so on and so forth, it all depends on type of data, and structure of data

**Guv** · Mar 17th, 2002, 08:21 PM

Nucleus: If you devised an algorithm which is faster than KwikSort, could you post a description of it? KwikSort is known to be one of the faster methods.

Insertion, Shell, & KwikSort are sort in place methods. As far as I know, a proper implementation of KwikSort should usually be faster that the other two unless there is something very special about the data.

BTW: Using KwikSort au lieu de Quick Sort used to be jargon that identified a programmer as an expert in the world of sorting.

Insert Sort is pretty good for files if most records are close to their final position. It might be faster than KwikSort for such locally sequenced records. Otherwise it is a poor method, but better than the pathetic ones like Bubble Sort. I never studied Shell Sort, but assume that it is slightly better than Insert Sort because it moves some records long distances. Like Insert Sort, it seems well suited to files with records fairly close to their final positions, but is better than Insert Sort when a lot of records are not close to their final position.

KwikSort has an advantage in that it can be implemented using half exchanges au lieu de full exchanges. When sorting large records, this is an advantage which is almost impossible for other methods to overcome. KwikSort tends to work efficiently when many records are far away from their final position. In most implementations of KwikSort, the KwikSort method is used until the file has groups of 5-12 records in their final position. Then Insert Sort is used for a final cleanup phase. I do not know why it is done this way, but assume that somebody did an analysis which indicated this is the way to go.

There is an algorithm called Heap Sort, which is supposed to be very fast, but I know nothing about this algorithm. There is a citation in some thread which mentions a book containing descriptions of almost every known sorting algorithm. Included (I think) is a an analysis of the efficiency of each method.

The really big sorting jobs used to involve humongous mainframe files on magnetic tape. Merging methods were generally used. Sometimes distribution sorts were used.

A Merging Sort starts with an initial pass at the entire file. This initial phase creates groups (called strings of records in order. The groups are written alternately to two or more tape files. From that point on, multiple passes are used to merge the groups into ever larger groups until the entire file is sorted.

Merging sorts were the only possible methods when magnetic tape was the only medium with enough capacity for the files being sorted. In recent times, I guess (but am not sure) that many (if not most) mainframe sorting is done with disk files. When sorting a huge disk file with large records, key sorting techniques are used to minimize I/O time. The file is read once to build a file of short records, each containing a sort key and the disk address of the corresponding record. These short records are sorted and then used to put the file in order.

There were some ingenuous methods devised to make merging tape sorts more efficient. There were methods which required sorting some groups of records into reverse order. These methods eliminated tape rewinds (a time consuming process) by allowing a tape to be read backwards after having been written. One such method was called a Fibonacci Sort because it was based on the Fibonacci sequence.

Distribution sorts take advantage of natural sequences in a file. Two obvious examples relate to huge Open Order files in customer sequence which require sorting to Sales Office & Customer sequence for sales statistics reports and to Product/Plant sequence for Production Control Reports.

Most companies have at most 50-100 Sales Offices, with 10-15 of those offices being responsible for 70% or more of the total orders. A distribution sort starts by reading the file and writing a file for each major Sales Office. Other records are written to a miscellaneous file. After one pass, all the records are in Sales Office Customer sequence, except for those in the miscellaneous file, which is then sorted using another Distribution sort or some other method. If you can assign one file to each Sales Office, the entire file is sorted in one pass at the original file, because the customer sequencing of the original file is not disturbed by this method.
Similarly, most companies have at most 25-50 product/plant combinations, with certain combinations comprising 70% or more of the file. The method here is similar to the above.

When usable, distribution sorting tends to be faster than any other method.

**Nucleus** · Mar 18th, 2002, 03:54 AM

Thanks for all that info Guv!

As I said, this was my first attempt at creating a sorting function, so you can imagine my surprise at finding it much faster at sorting say 2 million records than those I found at the site I previously posted.

What I focused on was an algorithm that tries to minimise the time taken to perform a "switch", as I saw that as the performance bottleneck.... (rather than minimising the number of switches themselves)

So I created a second array that holds the links to higher and lower level elements. In this way I do not have to move data around in the array to switch the values.

Checks are faster than switches but I still tried to minimise checks by implementing a midpoint optimisation for the new values.

Anyway I have attached a project with the function I wrote. I haven't really done much reading yet, so I am probably covering old territory..... even so it is good to start getting to know this stuff. You'll see that it is much faster than the implementation of quicksort there, perhaps a faster implementation of quicksort exists?

I think I'll read more about QuickSort and see if I can improve on the implementation....

**Nucleus** · Mar 18th, 2002, 03:56 AM

Guv, I also notice that over 2 million records, because I use a second array... I start running into memory problems

**Guv** · Mar 18th, 2002, 12:55 PM

Nucleus: I downloaded your Sorting Demo and used it just a little bit. Then I scanned your code. A few questions and comments follow.

I did not see a Randomize Statement, which probably makes sense. You will always use the same numbers, making the comparisons more valid. There is a very small risk of the particular set of numbers being biased in favor of one algorithm. The data is unlikely to favor insertion sort, but I am not sure about the other methods.

The description of your own method did not enlighten me much. I did not analyze the code to get a better idea of how it works.

I have no idea whether or not your code actually sorts the array. Did you ever analyze the array after sorting it to make sure that each sort really puts all the items in order? I added the following code after your display of the elapsed time.

VB Code:

For i = 1 To FDemo.ItemCount - 1
        If A(i + 1) < A(i) Then
            Junk = MsgBox("Item" & CStr(i) & "out of order")
          End If
    Next i

It indicated that KwikSort had a bug.

KwikSort seems to use full switches au lieu de half switches, which makes it slower than necessary.

BTW: Most people say exchanges rather than switches, but in this context you are not likely to confuse anybody.

Also, I usually have seen Kwiksort implemented with the cleanup Insertion Sort called after KwikSort has finished. Internally, KwikSort ignores short groups rather than using Insertion Sort inside the KwikSort Subroutine.

While recursion is slicker, KwikSort is often implemented with a stack rather than recursion. This tends to use less memory, and perhaps is faster. I have no idea about the efficiency of stack versus recursion.

**Alphanos** · Mar 19th, 2002, 03:07 PM

I was recently doing some work with sorting as well, and have a quicksort that is probably faster (for numbers at least). Check this out and tell me what you find

. Using a numeric array of 32k elements ranging from 0 to 100k, this quicksort was much faster, but I haven't done a lot of tests

.

VB Code:

Public Sub IterativeQuickSort(ByRef LongArray() As Long)
    If UBound(LongArray) - LBound(LongArray) < 5 Then MiniSort LongArray, LBound(LongArray), UBound(LongArray): Exit Sub
    
    Dim lngMid As Long
    Dim lngHi As Long
    Dim lngLo As Long
    Dim Temp As Long
    Dim Lower As Long
    Dim Upper As Long
    Dim Count As Long
    Dim MyStack As Collection
    Set MyStack = New Collection
    
    MyStack.Add UBound(LongArray)
    MyStack.Add LBound(LongArray)
    Count = 2
    
    Do Until Count = 0
        Lower = MyStack(Count)
        MyStack.Remove Count
        Count = Count - 1
        Upper = MyStack(Count)
        MyStack.Remove Count
        
        If Upper - Lower > 4 Then
            Temp = 0.5 * (Upper - Lower) + Lower
            
            lngMid = LongArray(Temp)
            LongArray(Temp) = LongArray(Lower)
            lngLo = Lower
            lngHi = Upper
            
            Do
                Do While LongArray(lngHi) >= lngMid And lngHi > lngLo
                    lngHi = lngHi - 1
                Loop
                
                If lngHi = lngLo Then
                    LongArray(lngLo) = lngMid
                    Exit Do
                End If
                
                LongArray(lngLo) = LongArray(lngHi)
                lngLo = lngLo + 1
                
                Do While LongArray(lngLo) <= lngMid And lngLo < lngHi
                    lngLo = lngLo + 1
                Loop
                
                If lngLo = lngHi Then
                    LongArray(lngHi) = lngMid
                    Exit Do
                End If
                
                LongArray(lngHi) = LongArray(lngLo)
                lngHi = lngHi - 1
            Loop
            
            If (lngLo - Lower) > (Upper - lngLo) Then
                If lngLo > 1 Then
                    MyStack.Add lngLo - 1
                    MyStack.Add Lower
                End If
                MyStack.Add Upper
                MyStack.Add lngLo + 1
            Else
                MyStack.Add Upper
                MyStack.Add lngLo + 1
                If lngLo > 1 Then
                    MyStack.Add lngLo - 1
                    MyStack.Add Lower
                End If
            End If
        Else
            MiniSort LongArray, Lower, Upper
        End If
        Count = MyStack.Count
    Loop
    
    Set MyStack = Nothing
End Sub
 
 
Public Sub MiniSort(ByRef LongArray() As Long, ByRef Lower As Long, ByRef Upper As Long)
    Dim lngLargest As Long
    Dim lngMid1 As Long
    Dim lngMid2 As Long
    Dim lngMid3 As Long
    
    Select Case Upper - Lower
        Case 1
            If LongArray(Lower) > LongArray(Upper) Then SwapL LongArray(Lower), LongArray(Upper)
            Exit Sub
        Case 2
            lngMid1 = Lower + 1
            
            lngLargest = Lower
            If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
            If Not (LongArray(Upper) > LongArray(lngLargest)) Then SwapL LongArray(lngLargest), LongArray(Upper)
 
            If LongArray(Lower) > LongArray(lngMid1) Then SwapL LongArray(Lower), LongArray(lngMid1)
            Exit Sub
        Case 3
            lngMid1 = Lower + 1
            lngMid2 = lngMid1 + 1
 
            lngLargest = Lower
            If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
            If LongArray(lngMid2) > LongArray(lngLargest) Then lngLargest = Lower + 2
            If Not (LongArray(Upper) > LongArray(lngLargest)) Then SwapL LongArray(Upper), LongArray(lngLargest)
 
            lngLargest = Lower
            If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
            If Not (LongArray(lngMid2) > LongArray(lngLargest)) Then SwapL LongArray(lngMid2), LongArray(lngLargest)
 
            If LongArray(lngMid1) < LongArray(Lower) Then SwapL LongArray(Lower), LongArray(lngMid1)
            Exit Sub
        Case 4
            lngMid1 = Lower + 1
            lngMid2 = lngMid1 + 1
            lngMid3 = lngMid2 + 1
 
            lngLargest = Lower
            If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
            If LongArray(lngMid2) > LongArray(lngLargest) Then lngLargest = Lower + 2
            If LongArray(lngMid3) > LongArray(lngLargest) Then lngLargest = Lower + 3
            If Not (LongArray(Upper) > LongArray(lngLargest)) Then SwapL LongArray(Upper), LongArray(lngLargest)
 
            lngLargest = Lower
            If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
            If LongArray(lngMid2) > LongArray(lngLargest) Then lngLargest = Lower + 2
            If Not (LongArray(lngMid3) > LongArray(lngLargest)) Then SwapL LongArray(lngMid3), LongArray(lngLargest)
 
            lngLargest = Lower
            If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
            If Not (LongArray(lngMid2) > LongArray(lngLargest)) Then SwapL LongArray(lngMid2), LongArray(lngLargest)
 
            If LongArray(lngMid1) < LongArray(Lower) Then SwapL LongArray(Lower), LongArray(lngMid1)
            Exit Sub
    End Select
 
End Sub
 
 
 
Private Sub SwapL(ByRef Num1 As Long, ByRef Num2 As Long)
    Dim Temp As Long
    Temp = Num1
    Num1 = Num2
    Num2 = Temp
End Sub

Its iterative, not recursive; uses a collection to store pairs of upper and lower boundaries yet to sort

.

BTW, tell me if you find ways to optimize it further

.

**Guv** · Mar 19th, 2002, 04:11 PM

Alphanos: You seem to have a lot of code for the KwikSort algorithm. I am too lazy to analyze that much code.

A quick scan indicates that, like Nucleus, you are doing full exchanges au lieu de half exchanges. When sorting a huge number of items or large records from a disk file, the difference is significant. You call them swaps, while Nucleus calls them switches. The normal jargon is exchanges. although anything that is undestandable in context is okay.

Try downloading the Nucleus application and modifying it to execute your sort as well as the ones already implemented. If you add the code I suggested in a previous post, you will be able to make sure you really are sorting an entire array into the correct sequence.

The Nucleus application is easy to work with and it does a time comparison of various methods.

I hope that you and Nucleus realize that we are reinventing wheels.

**kedaman** · Mar 19th, 2002, 04:16 PM

Some of heapsort:

Heapsort is specifically well suited for datastructures called heaps, that are full binary trees where each child is smaller(or larger) than its parent, and is implemented as an array. Heaps are priority queues, which are associative arrays that pops only the smallest(or largest) item.

**Alphanos** · Mar 21st, 2002, 01:49 PM

As a matter of fact I just copied a piece of code from my project which I was using to test sorting algorithms, I've had it around for a while. I'll post the whole thing if you'd like. It only compares two algorithms at once, but you can always alter which ones easily enough.

You'll notice that most recently I was comparing nusort with my optimized iterative quicksort. It will display the number of clock ticks for the first sort, then the second, and list the sorted number arrays in two listboxes. Its not in any way fancy, as I only ever intended it for my personal tests. It will make sure its sorting them the same though.

Although I haven't taken any time to check it out, I'll take a wild guess and assume that by half exchanges you mean sorting pointers instead of records. Since the algorithm I posted only sorts longs, and not other variable types, this would be pointless, although in a more universal sorting algorithm it would possibly be faster.

I'm not sure if you can say that we're re-inventing the wheel, at least for me, cause I'm just trying to make sure that I have a well-optimized implementation of a well-known algorithm, I'm not really trying to make something new up.

**Guv** · Mar 21st, 2002, 04:08 PM

Alphanos: When I referred to half exchanges, I was referring to the fact that KwikSort makes a lot of comparisons and Exchanges using the same key. I guess it is best illustrated by one phase of a quick sort using single character keys and no associated records. Assume you pick the middle item as the KwikSort Pivot. An initial string might be: KXACDWQMPRDYN. The stages of one phase of the algorithm would be the following (Exchanges indicated by bold letters).

KXACDWQMPRDYN

KQACDWXMPRDYN
KNACDWXMPRDYQ
KNACDQXMPRDYW
KNACDDXMPRQYW
KNACDDQMPRXYW
KNACDDPMQRXYW

Notice that Q was involved in every Exchange (Switch, Swap). It could have been put into a temporary location when the KwikSort phase started. Then the other elements would be copied, using Half Exchanges. At the end of the phase, Q would be copied from the temporary location to its final position. The array would look like the following using Half Exchanges..

KXACDWQMPRDYN

KXACDWXMPRDYN
KNACDWXMPRDYN
KNACDWXMPRDYW
KNACDDXMPRDYW
KNACDDXMPRXYW
KNACDDPMPRXYW
KNACDDPMQRXYW

The final step copied Q from the temporary to its final position. The duplicates are position where Q would have been put if Full Exchanges were used.

KwikSort has two advantages over the inferior algorithms like Bubble Sort.

It is not restricted to Local Exchanges. Id est: One exchange often moves an item a long distance in the array. This tends to be efficient.
It uses Half Exchanges, as illustrated above. This gains a lot if entire records instead of merely keys are being Exchanged.

**kedaman** · Mar 22nd, 2002, 06:16 AM

Guv
Shellsort is rather closer to O(log2(n)) algoritms than inserition sort, which is the fastest of O(n^2) algoritms. Shellsort starts out with global exchanges and then goes more and more local, its implementation is basically insertion sort on subranges picked of the total range for each n'th element where n decrements (halves in simple implementations of shellsort, while jumps with somewhat irregular jump like (..15,7,3,1) in more optimized versions) thus it performs about O(log2(n)) but is still generally slower than quicksort. Unlike quicksort shell has no worstcase problem, sorting a already sorted range tends to slow down to the bubble sort class, although you can develope further on the quicksort with pivot choosing. Insertion short is faster than all other algoritms including quicksort on short ranges with less than 30 items, reason why you can see them implemented within quicksort. Radixsort can be faster than quicksort at times, not too sure but probably on shorter and not too spread out ranges. Bucket sort is extreemly fast O(n) but is only good limited (and preferably filled) ranges, and hashsort similar, a bit slower but better for ranges.

jim mcnamara · Mar 22nd, 2002, 11:17 PM

The current degree of 'sortedness' affects many algorithms, as does the overal distribution of values, as does redunancy.

Guv got it on the reinventing wheels - see a 1968 3-set volume 'Art of Computing' by Don Knuth.

If you code in C, ansi C provides qsort - a quicksort algorithm.
It swaps pointers to array elements which is faster than 'swapping' strings or other long datatypes.

Also - look up linear time sort it's part of basic computer sci today.

http://www.cs.unc.edu/~lin/COMP122-F01/

wossname · Apr 3rd, 2002, 01:36 PM

I too recommend reading TAOCP by Knuth (I've got the 1997 version (smug grin

!)).

The number of 'cut and shut' versions of Quicksort (Knuth doesn't say Kwiksort and he must be the god of sorting!) out there is amazing. I recently saw someone palming off his take on it, it was about 7 lines of code long and bore no relation to the genuine algorithm at all.

I'm sure Knuth has the real McCoy in his mucho compendious tome. He also gives verbose investigations of speed attributes of more than 20 sorting algorithms.

Heavy reading matter but the technical content is unbeatable.

**Guv** · Apr 3rd, 2002, 04:34 PM

Sorting gurus use Kwiksort instead of Quick Sort, in writing memos and letters, but in a formal book on algorithms, they cater to the academic types who would think they did not know how to spell if they wrote Kwiksort.

Nobody wants to seem to be a diseducated bum.

Thread: Fastest Sorting Algorithms ... Guv?

Thread Tools

Display

Fastest Sorting Algorithms ... Guv?

Posting Permissions