Results 1 to 15 of 15

Thread: Fastest Sorting Algorithms ... Guv?

  1. #1

    Thread Starter
    Registered User Nucleus's Avatar
    Join Date
    Apr 2001
    Location
    So that's what you are up to ;)
    Posts
    2,530

    Wink Fastest Sorting Algorithms ... Guv?

    What is the fastest sorting algorithm out there?


    I was bored yesterday..... so I came up with an algorithm myself.... and when I tested it, it turned out faster than the implementations of:

    Insertion, Shell, QuickSort, QSort

    that I found here:

    http://epaperpress.com/sortsearch/index.html

    I only spent a day on the algorithm so I am dubious that these algorithms are the fastest available. I want to find the fastest algorithm out there and preferable a VB implementation of it.

  2. #2
    Frenzied Member
    Join Date
    Aug 2000
    Posts
    1,539
    each of those sorts are for different types of info and how your info is layed out
    quicksort in some cases could be fastest, and in some cases be slowest and so on and so forth, it all depends on type of data, and structure of data

  3. #3
    Frenzied Member
    Join Date
    Jul 1999
    Location
    Huntingdon Valley, PA 19006
    Posts
    1,151
    Nucleus: If you devised an algorithm which is faster than KwikSort, could you post a description of it? KwikSort is known to be one of the faster methods.

    Insertion, Shell, & KwikSort are sort in place methods. As far as I know, a proper implementation of KwikSort should usually be faster that the other two unless there is something very special about the data.

    BTW: Using KwikSort au lieu de Quick Sort used to be jargon that identified a programmer as an expert in the world of sorting.

    Insert Sort is pretty good for files if most records are close to their final position. It might be faster than KwikSort for such locally sequenced records. Otherwise it is a poor method, but better than the pathetic ones like Bubble Sort. I never studied Shell Sort, but assume that it is slightly better than Insert Sort because it moves some records long distances. Like Insert Sort, it seems well suited to files with records fairly close to their final positions, but is better than Insert Sort when a lot of records are not close to their final position.

    KwikSort has an advantage in that it can be implemented using half exchanges au lieu de full exchanges. When sorting large records, this is an advantage which is almost impossible for other methods to overcome. KwikSort tends to work efficiently when many records are far away from their final position. In most implementations of KwikSort, the KwikSort method is used until the file has groups of 5-12 records in their final position. Then Insert Sort is used for a final cleanup phase. I do not know why it is done this way, but assume that somebody did an analysis which indicated this is the way to go.

    There is an algorithm called Heap Sort, which is supposed to be very fast, but I know nothing about this algorithm. There is a citation in some thread which mentions a book containing descriptions of almost every known sorting algorithm. Included (I think) is a an analysis of the efficiency of each method.

    The really big sorting jobs used to involve humongous mainframe files on magnetic tape. Merging methods were generally used. Sometimes distribution sorts were used.

    A Merging Sort starts with an initial pass at the entire file. This initial phase creates groups (called strings of records in order. The groups are written alternately to two or more tape files. From that point on, multiple passes are used to merge the groups into ever larger groups until the entire file is sorted.

    Merging sorts were the only possible methods when magnetic tape was the only medium with enough capacity for the files being sorted. In recent times, I guess (but am not sure) that many (if not most) mainframe sorting is done with disk files. When sorting a huge disk file with large records, key sorting techniques are used to minimize I/O time. The file is read once to build a file of short records, each containing a sort key and the disk address of the corresponding record. These short records are sorted and then used to put the file in order.

    There were some ingenuous methods devised to make merging tape sorts more efficient. There were methods which required sorting some groups of records into reverse order. These methods eliminated tape rewinds (a time consuming process) by allowing a tape to be read backwards after having been written. One such method was called a Fibonacci Sort because it was based on the Fibonacci sequence.

    Distribution sorts take advantage of natural sequences in a file. Two obvious examples relate to huge Open Order files in customer sequence which require sorting to Sales Office & Customer sequence for sales statistics reports and to Product/Plant sequence for Production Control Reports.
    • Most companies have at most 50-100 Sales Offices, with 10-15 of those offices being responsible for 70% or more of the total orders. A distribution sort starts by reading the file and writing a file for each major Sales Office. Other records are written to a miscellaneous file. After one pass, all the records are in Sales Office Customer sequence, except for those in the miscellaneous file, which is then sorted using another Distribution sort or some other method. If you can assign one file to each Sales Office, the entire file is sorted in one pass at the original file, because the customer sequencing of the original file is not disturbed by this method.
    • Similarly, most companies have at most 25-50 product/plant combinations, with certain combinations comprising 70% or more of the file. The method here is similar to the above.
    When usable, distribution sorting tends to be faster than any other method.
    Live long & prosper.

    The Dinosaur from prehistoric era prior to computers.

    Eschew obfuscation!
    If a billion people believe a foolish idea, it is still a foolish idea!
    VB.net 2010 Express
    64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.

  4. #4

    Thread Starter
    Registered User Nucleus's Avatar
    Join Date
    Apr 2001
    Location
    So that's what you are up to ;)
    Posts
    2,530
    Thanks for all that info Guv!

    As I said, this was my first attempt at creating a sorting function, so you can imagine my surprise at finding it much faster at sorting say 2 million records than those I found at the site I previously posted.

    What I focused on was an algorithm that tries to minimise the time taken to perform a "switch", as I saw that as the performance bottleneck.... (rather than minimising the number of switches themselves)

    So I created a second array that holds the links to higher and lower level elements. In this way I do not have to move data around in the array to switch the values.

    Checks are faster than switches but I still tried to minimise checks by implementing a midpoint optimisation for the new values.

    Anyway I have attached a project with the function I wrote. I haven't really done much reading yet, so I am probably covering old territory..... even so it is good to start getting to know this stuff. You'll see that it is much faster than the implementation of quicksort there, perhaps a faster implementation of quicksort exists?

    I think I'll read more about QuickSort and see if I can improve on the implementation....
    Attached Files Attached Files

  5. #5

    Thread Starter
    Registered User Nucleus's Avatar
    Join Date
    Apr 2001
    Location
    So that's what you are up to ;)
    Posts
    2,530
    Guv, I also notice that over 2 million records, because I use a second array... I start running into memory problems

  6. #6
    Frenzied Member
    Join Date
    Jul 1999
    Location
    Huntingdon Valley, PA 19006
    Posts
    1,151
    Nucleus: I downloaded your Sorting Demo and used it just a little bit. Then I scanned your code. A few questions and comments follow.

    I did not see a Randomize Statement, which probably makes sense. You will always use the same numbers, making the comparisons more valid. There is a very small risk of the particular set of numbers being biased in favor of one algorithm. The data is unlikely to favor insertion sort, but I am not sure about the other methods.

    The description of your own method did not enlighten me much. I did not analyze the code to get a better idea of how it works.

    I have no idea whether or not your code actually sorts the array. Did you ever analyze the array after sorting it to make sure that each sort really puts all the items in order? I added the following code after your display of the elapsed time.
    VB Code:
    1. For i = 1 To FDemo.ItemCount - 1
    2.         If A(i + 1) < A(i) Then
    3.             Junk = MsgBox("Item" & CStr(i) & "out of order")
    4.           End If
    5.     Next i
    It indicated that KwikSort had a bug.

    KwikSort seems to use full switches au lieu de half switches, which makes it slower than necessary.

    BTW: Most people say exchanges rather than switches, but in this context you are not likely to confuse anybody.

    Also, I usually have seen Kwiksort implemented with the cleanup Insertion Sort called after KwikSort has finished. Internally, KwikSort ignores short groups rather than using Insertion Sort inside the KwikSort Subroutine.

    While recursion is slicker, KwikSort is often implemented with a stack rather than recursion. This tends to use less memory, and perhaps is faster. I have no idea about the efficiency of stack versus recursion.
    Live long & prosper.

    The Dinosaur from prehistoric era prior to computers.

    Eschew obfuscation!
    If a billion people believe a foolish idea, it is still a foolish idea!
    VB.net 2010 Express
    64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.

  7. #7
    Hyperactive Member
    Join Date
    Dec 2001
    Location
    I'm in front of the computer.
    Posts
    270
    I was recently doing some work with sorting as well, and have a quicksort that is probably faster (for numbers at least). Check this out and tell me what you find. Using a numeric array of 32k elements ranging from 0 to 100k, this quicksort was much faster, but I haven't done a lot of tests.

    VB Code:
    1. Public Sub IterativeQuickSort(ByRef LongArray() As Long)
    2.     If UBound(LongArray) - LBound(LongArray) < 5 Then MiniSort LongArray, LBound(LongArray), UBound(LongArray): Exit Sub
    3.    
    4.     Dim lngMid As Long
    5.     Dim lngHi As Long
    6.     Dim lngLo As Long
    7.     Dim Temp As Long
    8.     Dim Lower As Long
    9.     Dim Upper As Long
    10.     Dim Count As Long
    11.     Dim MyStack As Collection
    12.     Set MyStack = New Collection
    13.    
    14.     MyStack.Add UBound(LongArray)
    15.     MyStack.Add LBound(LongArray)
    16.     Count = 2
    17.    
    18.     Do Until Count = 0
    19.         Lower = MyStack(Count)
    20.         MyStack.Remove Count
    21.         Count = Count - 1
    22.         Upper = MyStack(Count)
    23.         MyStack.Remove Count
    24.        
    25.         If Upper - Lower > 4 Then
    26.             Temp = 0.5 * (Upper - Lower) + Lower
    27.            
    28.             lngMid = LongArray(Temp)
    29.             LongArray(Temp) = LongArray(Lower)
    30.             lngLo = Lower
    31.             lngHi = Upper
    32.            
    33.             Do
    34.                 Do While LongArray(lngHi) >= lngMid And lngHi > lngLo
    35.                     lngHi = lngHi - 1
    36.                 Loop
    37.                
    38.                 If lngHi = lngLo Then
    39.                     LongArray(lngLo) = lngMid
    40.                     Exit Do
    41.                 End If
    42.                
    43.                 LongArray(lngLo) = LongArray(lngHi)
    44.                 lngLo = lngLo + 1
    45.                
    46.                 Do While LongArray(lngLo) <= lngMid And lngLo < lngHi
    47.                     lngLo = lngLo + 1
    48.                 Loop
    49.                
    50.                 If lngLo = lngHi Then
    51.                     LongArray(lngHi) = lngMid
    52.                     Exit Do
    53.                 End If
    54.                
    55.                 LongArray(lngHi) = LongArray(lngLo)
    56.                 lngHi = lngHi - 1
    57.             Loop
    58.            
    59.             If (lngLo - Lower) > (Upper - lngLo) Then
    60.                 If lngLo > 1 Then
    61.                     MyStack.Add lngLo - 1
    62.                     MyStack.Add Lower
    63.                 End If
    64.                 MyStack.Add Upper
    65.                 MyStack.Add lngLo + 1
    66.             Else
    67.                 MyStack.Add Upper
    68.                 MyStack.Add lngLo + 1
    69.                 If lngLo > 1 Then
    70.                     MyStack.Add lngLo - 1
    71.                     MyStack.Add Lower
    72.                 End If
    73.             End If
    74.         Else
    75.             MiniSort LongArray, Lower, Upper
    76.         End If
    77.         Count = MyStack.Count
    78.     Loop
    79.    
    80.     Set MyStack = Nothing
    81. End Sub
    82.  
    83.  
    84. Public Sub MiniSort(ByRef LongArray() As Long, ByRef Lower As Long, ByRef Upper As Long)
    85.     Dim lngLargest As Long
    86.     Dim lngMid1 As Long
    87.     Dim lngMid2 As Long
    88.     Dim lngMid3 As Long
    89.    
    90.     Select Case Upper - Lower
    91.         Case 1
    92.             If LongArray(Lower) > LongArray(Upper) Then SwapL LongArray(Lower), LongArray(Upper)
    93.             Exit Sub
    94.         Case 2
    95.             lngMid1 = Lower + 1
    96.            
    97.             lngLargest = Lower
    98.             If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
    99.             If Not (LongArray(Upper) > LongArray(lngLargest)) Then SwapL LongArray(lngLargest), LongArray(Upper)
    100.  
    101.             If LongArray(Lower) > LongArray(lngMid1) Then SwapL LongArray(Lower), LongArray(lngMid1)
    102.             Exit Sub
    103.         Case 3
    104.             lngMid1 = Lower + 1
    105.             lngMid2 = lngMid1 + 1
    106.  
    107.             lngLargest = Lower
    108.             If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
    109.             If LongArray(lngMid2) > LongArray(lngLargest) Then lngLargest = Lower + 2
    110.             If Not (LongArray(Upper) > LongArray(lngLargest)) Then SwapL LongArray(Upper), LongArray(lngLargest)
    111.  
    112.             lngLargest = Lower
    113.             If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
    114.             If Not (LongArray(lngMid2) > LongArray(lngLargest)) Then SwapL LongArray(lngMid2), LongArray(lngLargest)
    115.  
    116.             If LongArray(lngMid1) < LongArray(Lower) Then SwapL LongArray(Lower), LongArray(lngMid1)
    117.             Exit Sub
    118.         Case 4
    119.             lngMid1 = Lower + 1
    120.             lngMid2 = lngMid1 + 1
    121.             lngMid3 = lngMid2 + 1
    122.  
    123.             lngLargest = Lower
    124.             If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
    125.             If LongArray(lngMid2) > LongArray(lngLargest) Then lngLargest = Lower + 2
    126.             If LongArray(lngMid3) > LongArray(lngLargest) Then lngLargest = Lower + 3
    127.             If Not (LongArray(Upper) > LongArray(lngLargest)) Then SwapL LongArray(Upper), LongArray(lngLargest)
    128.  
    129.             lngLargest = Lower
    130.             If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
    131.             If LongArray(lngMid2) > LongArray(lngLargest) Then lngLargest = Lower + 2
    132.             If Not (LongArray(lngMid3) > LongArray(lngLargest)) Then SwapL LongArray(lngMid3), LongArray(lngLargest)
    133.  
    134.             lngLargest = Lower
    135.             If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
    136.             If Not (LongArray(lngMid2) > LongArray(lngLargest)) Then SwapL LongArray(lngMid2), LongArray(lngLargest)
    137.  
    138.             If LongArray(lngMid1) < LongArray(Lower) Then SwapL LongArray(Lower), LongArray(lngMid1)
    139.             Exit Sub
    140.     End Select
    141.  
    142. End Sub
    143.  
    144.  
    145.  
    146. Private Sub SwapL(ByRef Num1 As Long, ByRef Num2 As Long)
    147.     Dim Temp As Long
    148.     Temp = Num1
    149.     Num1 = Num2
    150.     Num2 = Temp
    151. End Sub


    Its iterative, not recursive; uses a collection to store pairs of upper and lower boundaries yet to sort.

    BTW, tell me if you find ways to optimize it further.
    Alphanos

  8. #8
    Frenzied Member
    Join Date
    Jul 1999
    Location
    Huntingdon Valley, PA 19006
    Posts
    1,151
    Alphanos: You seem to have a lot of code for the KwikSort algorithm. I am too lazy to analyze that much code.

    A quick scan indicates that, like Nucleus, you are doing full exchanges au lieu de half exchanges. When sorting a huge number of items or large records from a disk file, the difference is significant. You call them swaps, while Nucleus calls them switches. The normal jargon is exchanges. although anything that is undestandable in context is okay.

    Try downloading the Nucleus application and modifying it to execute your sort as well as the ones already implemented. If you add the code I suggested in a previous post, you will be able to make sure you really are sorting an entire array into the correct sequence.

    The Nucleus application is easy to work with and it does a time comparison of various methods.

    I hope that you and Nucleus realize that we are reinventing wheels.
    Live long & prosper.

    The Dinosaur from prehistoric era prior to computers.

    Eschew obfuscation!
    If a billion people believe a foolish idea, it is still a foolish idea!
    VB.net 2010 Express
    64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.

  9. #9
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Some of heapsort:

    Heapsort is specifically well suited for datastructures called heaps, that are full binary trees where each child is smaller(or larger) than its parent, and is implemented as an array. Heaps are priority queues, which are associative arrays that pops only the smallest(or largest) item.
    Use
    writing software in C++ is like driving rivets into steel beam with a toothpick.
    writing haskell makes your life easier:
    reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
    To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.

  10. #10
    Hyperactive Member
    Join Date
    Dec 2001
    Location
    I'm in front of the computer.
    Posts
    270
    As a matter of fact I just copied a piece of code from my project which I was using to test sorting algorithms, I've had it around for a while. I'll post the whole thing if you'd like. It only compares two algorithms at once, but you can always alter which ones easily enough.

    You'll notice that most recently I was comparing nusort with my optimized iterative quicksort. It will display the number of clock ticks for the first sort, then the second, and list the sorted number arrays in two listboxes. Its not in any way fancy, as I only ever intended it for my personal tests. It will make sure its sorting them the same though.

    Although I haven't taken any time to check it out, I'll take a wild guess and assume that by half exchanges you mean sorting pointers instead of records. Since the algorithm I posted only sorts longs, and not other variable types, this would be pointless, although in a more universal sorting algorithm it would possibly be faster.

    I'm not sure if you can say that we're re-inventing the wheel, at least for me, cause I'm just trying to make sure that I have a well-optimized implementation of a well-known algorithm, I'm not really trying to make something new up.
    Attached Files Attached Files
    Alphanos

  11. #11
    Frenzied Member
    Join Date
    Jul 1999
    Location
    Huntingdon Valley, PA 19006
    Posts
    1,151
    Alphanos: When I referred to half exchanges, I was referring to the fact that KwikSort makes a lot of comparisons and Exchanges using the same key. I guess it is best illustrated by one phase of a quick sort using single character keys and no associated records. Assume you pick the middle item as the KwikSort Pivot. An initial string might be: KXACDWQMPRDYN. The stages of one phase of the algorithm would be the following (Exchanges indicated by bold letters).
    KXACDWQMPRDYN

    KQACDWXMPRDYN
    KNACDWXMPRDYQ
    KNACDQXMPRDYW
    KNACDDXMPRQYW
    KNACDDQMPRXYW
    KNACDDPMQRXYW
    Notice that Q was involved in every Exchange (Switch, Swap). It could have been put into a temporary location when the KwikSort phase started. Then the other elements would be copied, using Half Exchanges. At the end of the phase, Q would be copied from the temporary location to its final position. The array would look like the following using Half Exchanges..
    KXACDWQMPRDYN

    KXACDWXMPRDYN
    KNACDWXMPRDYN
    KNACDWXMPRDYW
    KNACDDXMPRDYW
    KNACDDXMPRXYW
    KNACDDPMPRXYW
    KNACDDPMQRXYW
    The final step copied Q from the temporary to its final position. The duplicates are position where Q would have been put if Full Exchanges were used.

    KwikSort has two advantages over the inferior algorithms like Bubble Sort.
    • It is not restricted to Local Exchanges. Id est: One exchange often moves an item a long distance in the array. This tends to be efficient.
    • It uses Half Exchanges, as illustrated above. This gains a lot if entire records instead of merely keys are being Exchanged.
    Live long & prosper.

    The Dinosaur from prehistoric era prior to computers.

    Eschew obfuscation!
    If a billion people believe a foolish idea, it is still a foolish idea!
    VB.net 2010 Express
    64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.

  12. #12
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    Guv
    Shellsort is rather closer to O(log2(n)) algoritms than inserition sort, which is the fastest of O(n^2) algoritms. Shellsort starts out with global exchanges and then goes more and more local, its implementation is basically insertion sort on subranges picked of the total range for each n'th element where n decrements (halves in simple implementations of shellsort, while jumps with somewhat irregular jump like (..15,7,3,1) in more optimized versions) thus it performs about O(log2(n)) but is still generally slower than quicksort. Unlike quicksort shell has no worstcase problem, sorting a already sorted range tends to slow down to the bubble sort class, although you can develope further on the quicksort with pivot choosing. Insertion short is faster than all other algoritms including quicksort on short ranges with less than 30 items, reason why you can see them implemented within quicksort. Radixsort can be faster than quicksort at times, not too sure but probably on shorter and not too spread out ranges. Bucket sort is extreemly fast O(n) but is only good limited (and preferably filled) ranges, and hashsort similar, a bit slower but better for ranges.
    Use
    writing software in C++ is like driving rivets into steel beam with a toothpick.
    writing haskell makes your life easier:
    reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
    To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.

  13. #13
    jim mcnamara
    Guest
    The current degree of 'sortedness' affects many algorithms, as does the overal distribution of values, as does redunancy.

    Guv got it on the reinventing wheels - see a 1968 3-set volume 'Art of Computing' by Don Knuth.

    If you code in C, ansi C provides qsort - a quicksort algorithm.
    It swaps pointers to array elements which is faster than 'swapping' strings or other long datatypes.

    Also - look up linear time sort it's part of basic computer sci today.

    http://www.cs.unc.edu/~lin/COMP122-F01/

  14. #14
    wossname
    Guest
    I too recommend reading TAOCP by Knuth (I've got the 1997 version (smug grin !)).

    The number of 'cut and shut' versions of Quicksort (Knuth doesn't say Kwiksort and he must be the god of sorting!) out there is amazing. I recently saw someone palming off his take on it, it was about 7 lines of code long and bore no relation to the genuine algorithm at all.

    I'm sure Knuth has the real McCoy in his mucho compendious tome. He also gives verbose investigations of speed attributes of more than 20 sorting algorithms.

    Heavy reading matter but the technical content is unbeatable.

  15. #15
    Frenzied Member
    Join Date
    Jul 1999
    Location
    Huntingdon Valley, PA 19006
    Posts
    1,151
    Sorting gurus use Kwiksort instead of Quick Sort, in writing memos and letters, but in a formal book on algorithms, they cater to the academic types who would think they did not know how to spell if they wrote Kwiksort.

    Nobody wants to seem to be a diseducated bum.
    Live long & prosper.

    The Dinosaur from prehistoric era prior to computers.

    Eschew obfuscation!
    If a billion people believe a foolish idea, it is still a foolish idea!
    VB.net 2010 Express
    64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width