PDA

Click to See Complete Forum and Search --> : VB6: All purpose sort


Merri
May 24th, 2009, 07:31 AM
Since VB6 is missing a generic all purpose sort procedures, maybe it would make sense to create one? There are many functions available pretty much everywhere, but not that many that would have a generic speed optimization for VB nor any that would take "anything" and just sort it.

My initial idea would be to have three subs:
' CaseSensitive sort (A < a, å < ä)
Public Sub Sort(ParamArray Arrays())

End Sub

' Binary sort (A < a, å > ä)
Public Sub SortB(ParamArray Arrays())

End Sub

' CaseInsensitive sort (A = a, å < ä)
Public Sub SortI(ParamArray Arrays())

End Sub
The ParamArray would take any number of arrays, check that the given variables are real string or numeric data type arrays and then also check for dimensions.


There are a few things that require thinking. For strings the obvious one is "which sorting method to use?" – there are quite a few ones out there, for benchmarks I guess the most interesting is Super Fast String Sorts (http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=67522&lngWId=1) over at PSC. Numeric data types are a bit different case and probably require a different algorithm to use.

Should a simpler but slower sort function be used with small arrays?


One more additional thought: if a control is passed such as ComboBox or ListBox, should that be sorted too? Or worse yet, a control array of those controls? I guess I haven't seen such implementations even though those would be possible to do. As far as I can remember it is possible to get the buffer of a ListBox so it would possible to sort it efficiently with a few memory tricks.

Ellis Dee
May 24th, 2009, 09:10 AM
I think the ParamArray is overkill, as is worrying about control arrays. I'm not sure "sorting" a control array is even a meaningful concept. Wouldn't it be far better to just pass a single array/control and then go from there?

If you're trying to find the most efficient way to sort in all cases, you'll have to commit to a single pass through the array. This reduces efficiency compared to a case-specific optimization, but I don't see any way around it.

One key element mentioned in your linked article is to use your safe array hack to change the LBound() to 0 before starting, then setting it back to whatever it was when finished.

Ellis Dee
May 24th, 2009, 09:11 AM
Also, I don't think there is any practical application where you want to sort text in a case-sensitive way, much less a binary way.

Sorting is the raison d'etre of StrComp(), whose primary functionality is to make A = a = å = ä.

Ellis Dee
May 24th, 2009, 09:32 AM
Did you want to support sorting single-dimension and multi-dimension arrays using a single, unified entrypoint? I always segregate them. Here's an example using one of the shortest algorithms, just to give you an idea of how I dealt with 2-dimensional sorting:
Public Sub InsertionSort1(ByRef pvarArray As Variant)
Dim i As Long
Dim j As Long
Dim iMin As Long
Dim iMax As Long
Dim varSwap As Variant

iMin = LBound(pvarArray) + 1
iMax = UBound(pvarArray)
For i = iMin To iMax
varSwap = pvarArray(i)
For j = i To iMin Step -1
If varSwap < pvarArray(j - 1) Then pvarArray(j) = pvarArray(j - 1) Else Exit For
Next j
pvarArray(j) = varSwap
Next i
End Sub

' Sort a 2-dimensional array on either dimension
' Sample usage to sort on column 4
' Dim MyArray(1 to 5, 1 to 1000) As Long
' InsertionSort2 MyArray, 1, 4
' Dim MyArray(1 to 1000, 1 to 5) As Long
' InsertionSort2 MyArray, 2, 4
Public Sub InsertionSort2(pvarArray As Variant, plngDim As Long, plngCol As Long)
Dim i As Long
Dim j As Long
Dim iMin As Long
Dim iMax As Long
Dim varSwap As Variant
Dim c As Long
Dim cMin As Long
Dim cMax As Long

cMin = LBound(pvarArray, plngDim)
cMax = UBound(pvarArray, plngDim)
ReDim varSwap(cMin To cMax)
Select Case plngDim
Case 1
iMin = LBound(pvarArray, 2) + 1
iMax = UBound(pvarArray, 2)
For i = iMin To iMax
For c = cMin To cMax
varSwap(c) = pvarArray(c, i)
Next
For j = i To iMin Step -1
If varSwap(plngCol) < pvarArray(plngCol, j - 1) Then
For c = cMin To cMax
pvarArray(c, j) = pvarArray(c, j - 1)
Next
Else
Exit For
End If
Next j
For c = cMin To cMax
pvarArray(c, j) = varSwap(c)
Next
Next i
Case 2
iMin = LBound(pvarArray, 1) + 1
iMax = UBound(pvarArray, 1)
For i = iMin To iMax
For c = cMin To cMax
varSwap(c) = pvarArray(i, c)
Next
For j = i To iMin Step -1
If varSwap(plngCol) < pvarArray(j - 1, plngCol) Then
For c = cMin To cMax
pvarArray(j, c) = pvarArray(j - 1, c)
Next
Else
Exit For
End If
Next j
For c = cMin To cMax
pvarArray(j, c) = varSwap(c)
Next
Next i
End Select
End SubAs you can see, I basically just made two copies of the single-dimension version and put in a check to see which dimension contains the column definitions, branching to the appropriate copy of the code.

Merri
May 24th, 2009, 09:35 AM
Just for the account both Å and Ä are always != A (in Swedish/Finnish they come after Z as ÅÄÖ).


As for efficiency, Sort would call a case specific function once it has detected the format of the variable. Since it receives a Variant you can check for the VarType and work from there.

Ellis Dee
May 24th, 2009, 09:51 AM
As for efficiency, Sort would call a case specific function once it has detected the format of the variable. Since it receives a Variant you can check for the VarType and work from there.That's going to be quite a bit of code sprawl. Figure you need three copies of the algorithm for a single implementation: The basic algorithm for single dimension arrays, plus two copies for two-dimensional arrays to support column definitions in either the first or second dimension. (I don't even want to think about three-dimensional, much less n-dimensional arrays. Gah!)

If you want a separate implementation for each native type:

String
Byte
Integer
Long
Single
Double
Currency
Date
(Let's ignore boolean)

That's 8 different implementations, 3 copies of the algorithm per implementation.

If Radix sort is the best algorithm for all possible starting conditions: already sorted, inverse ordered, camel hump (quicksort killer), random, 5% shuffled, etc..., then it's ideal. A mere 24 copies of Radix sort later and you're done.

Is Radix sort the best algorithm for all starting conditions and all list sizes? If not, every change gets multiplied by 24 copies of the code.

I'm happy to be corrected, because I'm not a huge fan of code sprawl.Just for the account both Å and Ä are always != A (in Swedish/Finnish they come after Z as ÅÄÖ).I overstated my point when I said they are all equal, but they only come after Z in a binary sort. Surely if you open up a Finnish dictionary you won't find Å alphabetized after Z.

Merri
May 24th, 2009, 10:09 AM
Actually I do, when sorted they are always after Z. As a character code Ä is before Å, so in binary sort Z < Ä < Å < Ö but in text Z < Å < Ä < Ö.

Date = Double, so you can use the same code for both when sorting.

I think for the numeric data types having one code is enough. I'm not sure yet, but I think multidimensional arrays can be hacked temporarily to look like one dimensional arrays. I can remember that safe array structure had some oddities though, so it may turn out to be a bit harder, but may be doable with relative ease instead of creating tons of code for many multiple dimensions.

Basically Sort, SortB, SortI in total would have three different string functions and seven for the numeric data types (one for each). Only two implementations are required if one sort will be chosen as the main one: string implementation and numeric implementation. Making all the variation function would only require some simple replaces (the first numeric should be of course be done with other than Long or there will be unnecessary replace issues).

Of course before doing any extra copy'n'replace the two implementations must be validated to work perfectly to avoid unnecessary work.

Ellis Dee
May 24th, 2009, 09:36 PM
Actually I do, when sorted they are always after Z. As a character code Ä is before Å, so in binary sort Z < Ä < Å < Ö but in text Z < Å < Ä < Ö.I find this almost impossible to believe. You're telling me that in a Finnish dictionary -- as in a physical book with a cover and pages and everything -- an "a" with an accent mark is found after the letter z?

Could you give me an example of two words that start with the same consonant, with the second letter being a plain "a" in the first word and an accented "a" in the second?

Ellis Dee
May 24th, 2009, 10:05 PM
I think for the numeric data types having one code is enough. Only two implementations are required if one sort will be chosen as the main one: string implementation and numeric implementation.Radix sort, yes?

At a couple hundred lines for esch Radix sort implementation, this is quite ambitious. You shouldn't needs thousands of lines of code to sort something.

I'm not convinced that the efficiency of native types vs variants is justified. I think that Radix sort is so much faster than comparison sorts that just using variants throughout will still be faster than anyone would expect. Plus it's a stable algorithm, so that's pretty darn hardcore.

Using variants would reduce the code to two or three implementations. (I think a dedicated, hard-coded and strongly-typed counting sort implementation for byte arrays would be worth it.)

Ellis Dee
May 24th, 2009, 10:14 PM
I envision entrypoints something like: (Pseudocode only.)
SortTextInPlace(String(), Compare, Descending)
SortTextIndex(String(), Compare, Descending) As Long()

SortInPlace(Variant, Descending)
SortIndex(Variant, Descending) As Long()Your thoughts?

Merri
May 25th, 2009, 01:14 AM
My head is on the fuzzy side at the moment (less than 4 hours of sleep and off to work, I even awoke before the alarm...), but here are some thoughts:

I don't see a point in descending, since you can just loop the array in inverse order to switch between ascending/descending. Not really a problem for a sort function to handle.

Creating index arrays would be a good option. One thing there is to account for though: returning an array as a function return value is less efficient than passing an array as a parameter (a thing I guess I should fix in the Merge Sort implementation I made).

I haven't taken a in-depth look into Radix sort yet. Apparently the American Flag is a variation of a Radix sort?


As for Å Ä Ö, Wikipedia uses binary sort (http://fi.wikipedia.org/w/index.php?title=Luokka:Suomalaiset_kirjailijat&from=Tiainen,+Marja-Leena) and fails with automated list, but as you can see in the top of the page, Å Ä Ö are after Z. Each of them is accounted as a separate letter, Å is pretty much same as O, but Ä and Ö as used in Finnish sound entirely different from A and O. Thus accounted and should be sorted after Z. Good thing we don't need to think about sorting by locale, we can just use the generic Unicode sorting.

Ellis Dee
May 25th, 2009, 01:30 AM
I don't see a point in descending, since you can just loop the array in inverse order to switch between ascending/descending. Not really a problem for a sort function to handle.For a stable algorithm you can sort on multiple keys one at a time and end up with the same results as if you sorted on multiple keys in one pass. For such an implementation, sorting DESC becomes meaningful. (This obviously only applies to multi-column data.)

Merri
May 25th, 2009, 01:29 PM
When sorting multidimension sort, you onedimensionalize it and then can jump over by the multiplication of the first dimensions of items.

Option Explicit

Private Sub Form_Load()
Dim Test() As Byte
ReDim Test(1, 1)
Debug.Print VarPtr(Test(0, 1)) - VarPtr(Test(0, 0))
' = 2

Erase Test
ReDim Test(1, 1, 1)
Debug.Print VarPtr(Test(0, 0, 1)) - VarPtr(Test(0, 0, 0))
' = 4 (2 * 2)

Erase Test
ReDim Test(2, 1, 1)
Debug.Print VarPtr(Test(0, 0, 1)) - VarPtr(Test(0, 0, 0))
' = 6 (3 * 2)

Erase Test
ReDim Test(2, 2, 1)
Debug.Print VarPtr(Test(0, 0, 1)) - VarPtr(Test(0, 0, 0))
' = 9 (3 * 3)
End Sub

Basically you can make one dimensional sort aware of this simply by passing an extra parameter that tells how many items to jump and fake the passed multidimensional array to look like one dimensional. That ought to make sorting multidimensional arrays just as effective as one dimensional.