Sorting algorithms (sort array)
While it's no doubt been covered many times before, I was hoping to set up a nice repository of the various sorting algorithms in a single thread. So if you have VB6 code for a technique that isn't listed, please post it.
To keep the code as simple as possible, assume we are sorting single-dimension typed arrays. Your code can work for string or numeric arrays; it will be up to the reader to make any necessary conversions.
I'll start us off with the old standby, the QuickSort:
Code:
' Omit plngLeft & plngRight; they are used internally during recursion
Public Sub QuickSortLong(ByRef plngArray() As Long, Optional ByVal plngLeft As Long, Optional ByVal plngRight As Long)
Dim lngFirst As Long
Dim lngMid As Long
Dim lngLast As Long
Dim lngSwap As Long
If plngRight = 0 Then
plngLeft = LBound(plngArray)
plngRight = UBound(plngArray)
End If
lngFirst = plngLeft
lngLast = plngRight
lngMid = plngArray((plngLeft + plngRight) \ 2)
Do
Do While plngArray(lngFirst) < lngMid And lngFirst < plngRight
lngFirst = lngFirst + 1
Loop
Do While lngMid < plngArray(lngLast) And lngLast > plngLeft
lngLast = lngLast - 1
Loop
If lngFirst <= lngLast Then
lngSwap = plngArray(lngFirst)
plngArray(lngFirst) = plngArray(lngLast)
plngArray(lngLast) = lngSwap
lngFirst = lngFirst + 1
lngLast = lngLast - 1
End If
Loop Until lngFirst > lngLast
If plngLeft < lngLast Then QuickSortLong plngArray, plngLeft, lngLast
If lngFirst < plngRight Then QuickSortLong plngArray, lngFirst, plngRight
End Sub
Re: Sorting algorithms (sort array)
I finished the benchmark and added it to the CodeBank here. I ended up deciding to only include native algorithms, excluding hybrids like JSort. (JSort's performance was awful, btw.) Mostly because including hybrids opens up an ocean of additional possibilities, but also because I never found a hybrid that outperformed the "better" native function.
For example, no matter how I hybridized (?) merge sort, none of them were as fast as a pure merge sort. And JSort was nowhere near the speed of a native heap sort. Maybe that's a quirk of the VB6 compiler; I don't know.
A couple bug reports:
Code Doc, Jump Sort fails if there are only two elements, because (UBound() - LBound()) \ 2 = 0, so it compares (i) with (i+0), which isn't very helpful. It doesn't hang or throw a runtime error or anything, rather it just leaves the array unsorted. I added a check to handle this issue.
Merri, there is a bug in your JSort implementation in both the Heap and InvHeap functions. The Do lines...
Code:
Do While lngChild <= UpBound And Not blnDone
...should use "<" instead of "<=", otherwise the loops hang when lngChild = UpBound.
Re: Sorting algorithms (sort array)
There appears to be two different versions of JSort, the other being under the name J sort. A writer claims it can be up to 30 times faster than quicksort; guess that can only be tested by writing the function, but this far I haven't found a code example (and don't have the time right now to do so).
Re: Sorting algorithms (sort array)
That sounds intriguing. If you run across some code and don't have time, post a link here and I'll take a crack at it. Doesn't much matter to me what language the code is in. (Oddly enough, I've become fairly adept at sorting logic in the past week.)
About the international separator character, I have a couple questions if you have a second. Assuming your separator character is a comma (,)...- Wouldn't/shouldn't Str(1/2) return "0,5"?
- Is CStr(1/2) <> Str(1/2)? (Assuming they're Trim'ed, if that matters.)
- How does Format(1/2, "0.0") behave? "0,5" or "0.5"? Thinking about it, should it instead be Format(1/2, "0,0")?
- Any other tips or points of interest that you can think of?
Re: Sorting algorithms (sort array)
Str$() always uses dots. It's return value is " 0.5" for positive and "-0.5" for negative numbers. Val() always uses dots.
CStr() always uses current locale. All datatype conversion/coersion functions are like this.
Format$() always uses current locale. Logic in this is that you can use Format$ to date formatting for an example, so you can get the name of the date or default date formatting in current locale settings.
In the other hand, imo using database for a simplish benchmarking program is an overkill. You wouldn't have a problem with locale if you didn't use database.
Then, some more reading for you:
http://groups.google.com/group/fido7...084cdb04008ab3
http://www.nist.gov/dads/HTML/jsort.html
http://www.softpanorama.org/Algorithms/sorting.shtml
Re: Sorting algorithms (sort array)
Overkill? Me? hehheh.
I decided to go with a database for two reasons. First, all the long text displays called for Memo fields instead of hardcoded literal strings. (The strings were too long to fit on a single line in the IDE, and editing them when I had them as underscore-chained assignments was a major PITA. Much easier to edit Memo fields.) The other is because I didn't want to have to hold a copy of the unsorted array in memory while the algorithms were doing their thing; I wanted to leave all the memory available to the algorithms themselves.
Thanks much for the locale info, I appeciate it.
Re: Sorting algorithms (sort array)
Ellis Dee said, "Code Doc, Jump Sort fails if there are only two elements, because (UBound() - LBound()) \ 2 = 0, so it compares (i) with (i+0), which isn't very helpful. It doesn't hang or throw a runtime error or anything, rather it just leaves the array unsorted. I added a check to handle this issue."
-----------------
Thanks for finding this one.:thumb:
For some unknown reason, I neglected to check this code with a sample of size 2. It's rather amazing that you found it in such a short period of time. Perhaps I never found it because there is a 50:50 chance that the two items are already sorted when generated randomly. Please forgive me for this oversight.:blush:
I am somewhat surprised that you converted my While...Wend loops to Do...Loop Until loops for the code bank. I suppose you had a reason for doing this, but I tend to lean toward simplicity whenever possible. Most of the microprocessors tend to like this also.
Historically, I wrote this code initially back in the 1980's by looking at a deck of cards. I took 10 playing cards from the same suit from a deck, shuffled them, and laid them face down. Then I compared the first with the sixth, the second with the seventh and so on. When I found two that were out of sequence, I swapped them and repeated until no swaps occurred.
Then I cut the jump interval in half and compared the first with the third, the second with the fourth, etc., swapping as needed. The last possible comparison is the first with the second, the second with the third, etc.
Then I turned the cards over. All were in sequence and I seldom had to repeat the last sequence twice, even if a swap was detected. The worst case scenario is when all are in reverse order before you start. The best case scenario occurs when the items are almost already in correct sequence. That's when my jump sort beats the Quicksort and stays at least even with the Shell. At that point, I wrote the source code, and it's been with my applications for over two decades.
Perhaps a third variable could be added that somehow checks the sequence on a given jump interval so that even if a swap has just occurred, a passthrough check for a swap is unnecesary at that same jump interval level, the sort would then run even faster because redundant passthroughs would vanish. However, any check that you make adds time and defeats the purpose.
Re: Sorting algorithms (sort array)
Quote:
Originally Posted by Code Doc
For some unknown reason, I neglected to check this code with a sample of size 2. It's rather amazing that you found it in such a short period of time. Perhaps I never found it because there is a 50:50 chance that the two items are already sorted when generated randomly. Please forgive me for this oversight.:blush:
I wasn't actually looking for it. When I first implemented merge sort -- which I wrote from scratch -- it kept coming up blank in the results window. A blank time (as opposed to N/A) means that it failed the verification check. So I dropped the array size down to two to see if it would work occasionally, but it still failed every time. It then became obvious that I had reversed the comparison, using < instead of >. But while I was doing this, I noticed that jump sort also came up blank a few times. Armed with the advance knowledge that it sometimes failed with an array size of two -- but never any other size -- it wasn't hard to identify the problem.
As for the While...Wend conversion, the main reason was that I am irrationally averse to initializing the flag variable before the loop starts. (None of the implementations do any initialization.) This is a very bad VB habit, but I've made my peace with that.
Is While...Wend faster than Do...Loop?
Quote:
Perhaps a third variable could be added that somehow checks the sequence on a given jump interval so that even if a swap has just occurred, a passthrough check for a swap is unnecesary at that same jump interval level, the sort would then run even faster because redundant passthroughs would vanish. However, any check that you make adds time and defeats the purpose.
This intrigues me, but I can't quite follow it. Could you elaborate?
Your description of the genesis of jump sort gives me an idea for a non-recursive quicksort, which I'm sure would be much faster and less memory-intensive without the hundreds or thousands of recursive calls.
Re: Sorting algorithms (sort array)
Now that I'm looking more closely at jump sort, there's still optimization to be had. Consider what happens when the jump value gets to 1: it runs through the inner loop, and then because it's non-zero, the outer loops runs again. This (final) iteration integer halves the jump value, and since 1 \ 2 = 0, the inner loop fires with a jump value of zero. That means that in every sort, the final pass compares every element against itself. Thus, n comparisons are wasted each call.
This behavior happens in both my implementation as well as the original code you posted. To correct it is simple: you need to change "While Jump" to "While Jump > 1", and I need to change "Loop Until lngJump = 0" to "Loop Until lngJump = 1".
This won't have a huge impact on performance, but hey, every little bit helps.