I only spent a day on the algorithm so I am dubious that these algorithms are the fastest available. I want to find the fastest algorithm out there and preferable a VB implementation of it.
each of those sorts are for different types of info and how your info is layed out
quicksort in some cases could be fastest, and in some cases be slowest and so on and so forth, it all depends on type of data, and structure of data
Nucleus: If you devised an algorithm which is faster than KwikSort, could you post a description of it? KwikSort is known to be one of the faster methods.
Insertion, Shell, & KwikSort are sort in place methods. As far as I know, a proper implementation of KwikSort should usually be faster that the other two unless there is something very special about the data.
BTW: Using KwikSort au lieu de Quick Sort used to be jargon that identified a programmer as an expert in the world of sorting.
Insert Sort is pretty good for files if most records are close to their final position. It might be faster than KwikSort for such locally sequenced records. Otherwise it is a poor method, but better than the pathetic ones like Bubble Sort. I never studied Shell Sort, but assume that it is slightly better than Insert Sort because it moves some records long distances. Like Insert Sort, it seems well suited to files with records fairly close to their final positions, but is better than Insert Sort when a lot of records are not close to their final position.
KwikSort has an advantage in that it can be implemented using half exchanges au lieu de full exchanges. When sorting large records, this is an advantage which is almost impossible for other methods to overcome. KwikSort tends to work efficiently when many records are far away from their final position. In most implementations of KwikSort, the KwikSort method is used until the file has groups of 5-12 records in their final position. Then Insert Sort is used for a final cleanup phase. I do not know why it is done this way, but assume that somebody did an analysis which indicated this is the way to go.
There is an algorithm called Heap Sort, which is supposed to be very fast, but I know nothing about this algorithm. There is a citation in some thread which mentions a book containing descriptions of almost every known sorting algorithm. Included (I think) is a an analysis of the efficiency of each method.
The really big sorting jobs used to involve humongous mainframe files on magnetic tape. Merging methods were generally used. Sometimes distribution sorts were used.
A Merging Sort starts with an initial pass at the entire file. This initial phase creates groups (called strings of records in order. The groups are written alternately to two or more tape files. From that point on, multiple passes are used to merge the groups into ever larger groups until the entire file is sorted.
Merging sorts were the only possible methods when magnetic tape was the only medium with enough capacity for the files being sorted. In recent times, I guess (but am not sure) that many (if not most) mainframe sorting is done with disk files. When sorting a huge disk file with large records, key sorting techniques are used to minimize I/O time. The file is read once to build a file of short records, each containing a sort key and the disk address of the corresponding record. These short records are sorted and then used to put the file in order.
There were some ingenuous methods devised to make merging tape sorts more efficient. There were methods which required sorting some groups of records into reverse order. These methods eliminated tape rewinds (a time consuming process) by allowing a tape to be read backwards after having been written. One such method was called a Fibonacci Sort because it was based on the Fibonacci sequence.
Distribution sorts take advantage of natural sequences in a file. Two obvious examples relate to huge Open Order files in customer sequence which require sorting to Sales Office & Customer sequence for sales statistics reports and to Product/Plant sequence for Production Control Reports.
Most companies have at most 50-100 Sales Offices, with 10-15 of those offices being responsible for 70% or more of the total orders. A distribution sort starts by reading the file and writing a file for each major Sales Office. Other records are written to a miscellaneous file. After one pass, all the records are in Sales Office Customer sequence, except for those in the miscellaneous file, which is then sorted using another Distribution sort or some other method. If you can assign one file to each Sales Office, the entire file is sorted in one pass at the original file, because the customer sequencing of the original file is not disturbed by this method.
Similarly, most companies have at most 25-50 product/plant combinations, with certain combinations comprising 70% or more of the file. The method here is similar to the above.
When usable, distribution sorting tends to be faster than any other method.
Live long & prosper.
The Dinosaur from prehistoric era prior to computers.
Eschew obfuscation!
If a billion people believe a foolish idea, it is still a foolish idea!
VB.net 2010 Express
64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.
As I said, this was my first attempt at creating a sorting function, so you can imagine my surprise at finding it much faster at sorting say 2 million records than those I found at the site I previously posted.
What I focused on was an algorithm that tries to minimise the time taken to perform a "switch", as I saw that as the performance bottleneck.... (rather than minimising the number of switches themselves)
So I created a second array that holds the links to higher and lower level elements. In this way I do not have to move data around in the array to switch the values.
Checks are faster than switches but I still tried to minimise checks by implementing a midpoint optimisation for the new values.
Anyway I have attached a project with the function I wrote. I haven't really done much reading yet, so I am probably covering old territory..... even so it is good to start getting to know this stuff. You'll see that it is much faster than the implementation of quicksort there, perhaps a faster implementation of quicksort exists?
I think I'll read more about QuickSort and see if I can improve on the implementation....
Nucleus: I downloaded your Sorting Demo and used it just a little bit. Then I scanned your code. A few questions and comments follow.
I did not see a Randomize Statement, which probably makes sense. You will always use the same numbers, making the comparisons more valid. There is a very small risk of the particular set of numbers being biased in favor of one algorithm. The data is unlikely to favor insertion sort, but I am not sure about the other methods.
The description of your own method did not enlighten me much. I did not analyze the code to get a better idea of how it works.
I have no idea whether or not your code actually sorts the array. Did you ever analyze the array after sorting it to make sure that each sort really puts all the items in order? I added the following code after your display of the elapsed time.
VB Code:
For i = 1 To FDemo.ItemCount - 1
If A(i + 1) < A(i) Then
Junk = MsgBox("Item" & CStr(i) & "out of order")
End If
Next i
It indicated that KwikSort had a bug.
KwikSort seems to use full switches au lieu de half switches, which makes it slower than necessary.
BTW: Most people say exchanges rather than switches, but in this context you are not likely to confuse anybody.
Also, I usually have seen Kwiksort implemented with the cleanup Insertion Sort called after KwikSort has finished. Internally, KwikSort ignores short groups rather than using Insertion Sort inside the KwikSort Subroutine.
While recursion is slicker, KwikSort is often implemented with a stack rather than recursion. This tends to use less memory, and perhaps is faster. I have no idea about the efficiency of stack versus recursion.
Live long & prosper.
The Dinosaur from prehistoric era prior to computers.
Eschew obfuscation!
If a billion people believe a foolish idea, it is still a foolish idea!
VB.net 2010 Express
64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.
I was recently doing some work with sorting as well, and have a quicksort that is probably faster (for numbers at least). Check this out and tell me what you find. Using a numeric array of 32k elements ranging from 0 to 100k, this quicksort was much faster, but I haven't done a lot of tests.
VB Code:
Public Sub IterativeQuickSort(ByRef LongArray() As Long)
If UBound(LongArray) - LBound(LongArray) < 5 Then MiniSort LongArray, LBound(LongArray), UBound(LongArray): Exit Sub
Dim lngMid As Long
Dim lngHi As Long
Dim lngLo As Long
Dim Temp As Long
Dim Lower As Long
Dim Upper As Long
Dim Count As Long
Dim MyStack As Collection
Set MyStack = New Collection
MyStack.Add UBound(LongArray)
MyStack.Add LBound(LongArray)
Count = 2
Do Until Count = 0
Lower = MyStack(Count)
MyStack.Remove Count
Count = Count - 1
Upper = MyStack(Count)
MyStack.Remove Count
If Upper - Lower > 4 Then
Temp = 0.5 * (Upper - Lower) + Lower
lngMid = LongArray(Temp)
LongArray(Temp) = LongArray(Lower)
lngLo = Lower
lngHi = Upper
Do
Do While LongArray(lngHi) >= lngMid And lngHi > lngLo
lngHi = lngHi - 1
Loop
If lngHi = lngLo Then
LongArray(lngLo) = lngMid
Exit Do
End If
LongArray(lngLo) = LongArray(lngHi)
lngLo = lngLo + 1
Do While LongArray(lngLo) <= lngMid And lngLo < lngHi
lngLo = lngLo + 1
Loop
If lngLo = lngHi Then
LongArray(lngHi) = lngMid
Exit Do
End If
LongArray(lngHi) = LongArray(lngLo)
lngHi = lngHi - 1
Loop
If (lngLo - Lower) > (Upper - lngLo) Then
If lngLo > 1 Then
MyStack.Add lngLo - 1
MyStack.Add Lower
End If
MyStack.Add Upper
MyStack.Add lngLo + 1
Else
MyStack.Add Upper
MyStack.Add lngLo + 1
If lngLo > 1 Then
MyStack.Add lngLo - 1
MyStack.Add Lower
End If
End If
Else
MiniSort LongArray, Lower, Upper
End If
Count = MyStack.Count
Loop
Set MyStack = Nothing
End Sub
Public Sub MiniSort(ByRef LongArray() As Long, ByRef Lower As Long, ByRef Upper As Long)
Dim lngLargest As Long
Dim lngMid1 As Long
Dim lngMid2 As Long
Dim lngMid3 As Long
Select Case Upper - Lower
Case 1
If LongArray(Lower) > LongArray(Upper) Then SwapL LongArray(Lower), LongArray(Upper)
Exit Sub
Case 2
lngMid1 = Lower + 1
lngLargest = Lower
If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
If Not (LongArray(Upper) > LongArray(lngLargest)) Then SwapL LongArray(lngLargest), LongArray(Upper)
If LongArray(Lower) > LongArray(lngMid1) Then SwapL LongArray(Lower), LongArray(lngMid1)
Exit Sub
Case 3
lngMid1 = Lower + 1
lngMid2 = lngMid1 + 1
lngLargest = Lower
If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
If LongArray(lngMid2) > LongArray(lngLargest) Then lngLargest = Lower + 2
If Not (LongArray(Upper) > LongArray(lngLargest)) Then SwapL LongArray(Upper), LongArray(lngLargest)
lngLargest = Lower
If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
If Not (LongArray(lngMid2) > LongArray(lngLargest)) Then SwapL LongArray(lngMid2), LongArray(lngLargest)
If LongArray(lngMid1) < LongArray(Lower) Then SwapL LongArray(Lower), LongArray(lngMid1)
Exit Sub
Case 4
lngMid1 = Lower + 1
lngMid2 = lngMid1 + 1
lngMid3 = lngMid2 + 1
lngLargest = Lower
If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
If LongArray(lngMid2) > LongArray(lngLargest) Then lngLargest = Lower + 2
If LongArray(lngMid3) > LongArray(lngLargest) Then lngLargest = Lower + 3
If Not (LongArray(Upper) > LongArray(lngLargest)) Then SwapL LongArray(Upper), LongArray(lngLargest)
lngLargest = Lower
If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
If LongArray(lngMid2) > LongArray(lngLargest) Then lngLargest = Lower + 2
If Not (LongArray(lngMid3) > LongArray(lngLargest)) Then SwapL LongArray(lngMid3), LongArray(lngLargest)
lngLargest = Lower
If LongArray(lngMid1) > LongArray(Lower) Then lngLargest = Lower + 1
If Not (LongArray(lngMid2) > LongArray(lngLargest)) Then SwapL LongArray(lngMid2), LongArray(lngLargest)
If LongArray(lngMid1) < LongArray(Lower) Then SwapL LongArray(Lower), LongArray(lngMid1)
Exit Sub
End Select
End Sub
Private Sub SwapL(ByRef Num1 As Long, ByRef Num2 As Long)
Dim Temp As Long
Temp = Num1
Num1 = Num2
Num2 = Temp
End Sub
Its iterative, not recursive; uses a collection to store pairs of upper and lower boundaries yet to sort.
BTW, tell me if you find ways to optimize it further.
Alphanos: You seem to have a lot of code for the KwikSort algorithm. I am too lazy to analyze that much code.
A quick scan indicates that, like Nucleus, you are doing full exchanges au lieu de half exchanges. When sorting a huge number of items or large records from a disk file, the difference is significant. You call them swaps, while Nucleus calls them switches. The normal jargon is exchanges. although anything that is undestandable in context is okay.
Try downloading the Nucleus application and modifying it to execute your sort as well as the ones already implemented. If you add the code I suggested in a previous post, you will be able to make sure you really are sorting an entire array into the correct sequence.
The Nucleus application is easy to work with and it does a time comparison of various methods.
I hope that you and Nucleus realize that we are reinventing wheels.
Live long & prosper.
The Dinosaur from prehistoric era prior to computers.
Eschew obfuscation!
If a billion people believe a foolish idea, it is still a foolish idea!
VB.net 2010 Express
64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.
Heapsort is specifically well suited for datastructures called heaps, that are full binary trees where each child is smaller(or larger) than its parent, and is implemented as an array. Heaps are priority queues, which are associative arrays that pops only the smallest(or largest) item.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
As a matter of fact I just copied a piece of code from my project which I was using to test sorting algorithms, I've had it around for a while. I'll post the whole thing if you'd like. It only compares two algorithms at once, but you can always alter which ones easily enough.
You'll notice that most recently I was comparing nusort with my optimized iterative quicksort. It will display the number of clock ticks for the first sort, then the second, and list the sorted number arrays in two listboxes. Its not in any way fancy, as I only ever intended it for my personal tests. It will make sure its sorting them the same though.
Although I haven't taken any time to check it out, I'll take a wild guess and assume that by half exchanges you mean sorting pointers instead of records. Since the algorithm I posted only sorts longs, and not other variable types, this would be pointless, although in a more universal sorting algorithm it would possibly be faster.
I'm not sure if you can say that we're re-inventing the wheel, at least for me, cause I'm just trying to make sure that I have a well-optimized implementation of a well-known algorithm, I'm not really trying to make something new up.
Alphanos: When I referred to half exchanges, I was referring to the fact that KwikSort makes a lot of comparisons and Exchanges using the same key. I guess it is best illustrated by one phase of a quick sort using single character keys and no associated records. Assume you pick the middle item as the KwikSort Pivot. An initial string might be: KXACDWQMPRDYN. The stages of one phase of the algorithm would be the following (Exchanges indicated by bold letters).
Notice that Q was involved in every Exchange (Switch, Swap). It could have been put into a temporary location when the KwikSort phase started. Then the other elements would be copied, using Half Exchanges. At the end of the phase, Q would be copied from the temporary location to its final position. The array would look like the following using Half Exchanges..
The final step copied Q from the temporary to its final position. The duplicates are position where Q would have been put if Full Exchanges were used.
KwikSort has two advantages over the inferior algorithms like Bubble Sort.
It is not restricted to Local Exchanges. Id est: One exchange often moves an item a long distance in the array. This tends to be efficient.
It uses Half Exchanges, as illustrated above. This gains a lot if entire records instead of merely keys are being Exchanged.
Live long & prosper.
The Dinosaur from prehistoric era prior to computers.
Eschew obfuscation!
If a billion people believe a foolish idea, it is still a foolish idea!
VB.net 2010 Express
64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.
Guv
Shellsort is rather closer to O(log2(n)) algoritms than inserition sort, which is the fastest of O(n^2) algoritms. Shellsort starts out with global exchanges and then goes more and more local, its implementation is basically insertion sort on subranges picked of the total range for each n'th element where n decrements (halves in simple implementations of shellsort, while jumps with somewhat irregular jump like (..15,7,3,1) in more optimized versions) thus it performs about O(log2(n)) but is still generally slower than quicksort. Unlike quicksort shell has no worstcase problem, sorting a already sorted range tends to slow down to the bubble sort class, although you can develope further on the quicksort with pivot choosing. Insertion short is faster than all other algoritms including quicksort on short ranges with less than 30 items, reason why you can see them implemented within quicksort. Radixsort can be faster than quicksort at times, not too sure but probably on shorter and not too spread out ranges. Bucket sort is extreemly fast O(n) but is only good limited (and preferably filled) ranges, and hashsort similar, a bit slower but better for ranges.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
The current degree of 'sortedness' affects many algorithms, as does the overal distribution of values, as does redunancy.
Guv got it on the reinventing wheels - see a 1968 3-set volume 'Art of Computing' by Don Knuth.
If you code in C, ansi C provides qsort - a quicksort algorithm.
It swaps pointers to array elements which is faster than 'swapping' strings or other long datatypes.
Also - look up linear time sort it's part of basic computer sci today.
I too recommend reading TAOCP by Knuth (I've got the 1997 version (smug grin !)).
The number of 'cut and shut' versions of Quicksort (Knuth doesn't say Kwiksort and he must be the god of sorting!) out there is amazing. I recently saw someone palming off his take on it, it was about 7 lines of code long and bore no relation to the genuine algorithm at all.
I'm sure Knuth has the real McCoy in his mucho compendious tome. He also gives verbose investigations of speed attributes of more than 20 sorting algorithms.
Heavy reading matter but the technical content is unbeatable.
Sorting gurus use Kwiksort instead of Quick Sort, in writing memos and letters, but in a formal book on algorithms, they cater to the academic types who would think they did not know how to spell if they wrote Kwiksort.
Nobody wants to seem to be a diseducated bum.
Live long & prosper.
The Dinosaur from prehistoric era prior to computers.
Eschew obfuscation!
If a billion people believe a foolish idea, it is still a foolish idea!
VB.net 2010 Express
64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.