PDA

Click to See Complete Forum and Search --> : VB6: 2-dimensional array sorter contest


Ellis Dee
Apr 7th, 2007, 08:38 AM
This is a friendly contest to see who can come up with the best two-dimensional array sorter.


Purpose

Generally speaking, whenever you see a multi-dimensional array with fixed dimensions, programmers tend to declare them as MyArray(Rows, Cols). That works perfectly well, and is intuitive to access. For example:

Dim FixedArray(10,2)
FixedArray(0,0) = 1
FixedArray(0,1) = "John Smith"
FixedArray(0,2) = "123 Main Street"
FixedArray(1,0) = 2
FixedArray(1,1) = "Jane Doe"
FixedArray(1,2) = "96 Edgemont Drive"
...etc...

The problem creeps in when you need to use a dynamic array, because you can't Redim Preserve that first dimension to add more rows; you can only redim the last dimension. So for dynamic arrays you'll usually see the opposite setup:

Redim Preserve DynamicArray(2,10)
DynamicArray(0,0) = 1
DynamicArray(1,0) = "John Smith"
DynamicArray(2,0) = "123 Main Street"
DynamicArray(0,1) = 2
DynamicArray(1,1) = "Jane Doe"
DynamicArray(2,1) = "96 Edgemont Drive"
...etc...

Because both setups are common, your solution must be flexible enough to handle either. To finish these examples, let's say you wanted to sort on the name column. The calls would end up looking something like this:

SortArray FixedArray, 2, 1 ' Sort on second dimension, index 1
SortArray DynamicArray, 1, 1 ' Sort on first dimension, index 1

Another use is when you want to sort a large UDT array. The simplest (and fastest) approach would be to create a separate two-dimensional array of the same size as the UDT, stuff the UDT index into one of the dimensions and whatever field you wanted to sort on in the other. After sorting the index array on that field, you end up with a list of the UDT indexes in the desired order giving you a quick and easy way to loop through the UDT.

The net result is that you'll be creating a useful and flexible indexing tool that may just come in handy some day.


Requirements

1) Your function must accept at least the following parameters: An array to be sorted
The dimension that contains the column to be sorted
The column to be sorted2) Your solution must be able to sort numbers, dates and strings.

3) Your solution can only be a single function. (Unlimited API declarations -- plus any applicable UDTs -- are allowed, but no variables outside the scope of your function.)


Judging

There are three main criteria that will be used for judging, plus a tiebreaker: Speed is a trump card. Make it super-fast, and you'll probably win.
Extra features/functonality is huge. A good enough set of features can trump even speed. (eg: Sorting on multiple columns.)
Ease of use for the programmer trying to incorporate your solution into their project.
Elegance will be used as a tiebreaker.Note that code-stealing is allowed. Bonus points go to the originator of an idea, and minus points if you improve on somebody else's idea without giving credit.


Time Limit

None, until further notice.


Prizes

If anyone has any ideas, I'm all ears. I'm thinking at the very least I'll broadcast your achievement in a signature on all my posts for a month.

Merri
Apr 7th, 2007, 12:13 PM
You can have much faster access to one dimensional array that is handled as a multidimensional.

Dim Cells(19) As String

X = Index Mod 10
Y = Index \ 10

Thus:
Cells(0) -> X = 0, Y = 0
Cells(9) -> X = 9, Y = 0
Cells(10) -> X = 0, Y = 1
Cells(19) -> X = 9, Y = 1
This even allows to resize arrays in a way you can't do with multidimensional arrays in VB6 (without working into an entirely new array), although repositioning of elements must be done manually.

This can give a lot of speed in the long run. The downside is that it requires getting used into it.

Zero based arrays are also faster than other bases, because there is no need to fix given position values to zero based.


I don't know if anyone is interested enough to go for this kind of solution. Atleast it is a challenge as such and it can be rewarding once the speeds are compared.

Milk
Apr 8th, 2007, 06:29 AM
This sounds like fun (once the chores are done)<snip>
Redim Preserve DynamicArray(2,10)
DynamicArray(0,0) = 1
DynamicArray(1,0) = "John Smith"
DynamicArray(2,0) = "123 Main Street"
DynamicArray(0,1) = 2
DynamicArray(1,1) = "Jane Doe"
DynamicArray(2,1) = "96 Edgemont Drive"
...etc...:eek: these examples look like variant arrays? *shudders*

plus @Merri I also tend to use one dimensional arrays for multidimensional data.

dare i suggest stepping back a level and presenting a challenge to read, store and sort some multi field CSV data of random data types?

Ellis Dee
Apr 9th, 2007, 12:13 PM
Merri, that's a cool tip for when speed is absolutely critical. I'm curious how much faster it actually is.these examples look like variant arrays? *shudders*

dare i suggest stepping back a level and presenting a challenge to read, store and sort some multi field CSV data of random data types?Conceptually, that's the basic idea. It really is more applicable to sorting a UDT array, but as far as I am aware, any functions dealing with UDTs must explicitly reference the individual element names, and thus can't be generalized. (Which is annoying.)

There are no doubt countless ways to optimize the situation when you have different data types. Splitting each data type into its own array is one way, for example. Or just create typed index arrays, leaving the variant array alone except for reading.

For the purposes of this thread, it is okay to assume that the reason the array needs to be sorted is because the data started out in CSV format. But a further assumption is that the programmer already set up routines for saving and loading it to a multi-dimensional array, and that his array structure is so entrenched in his program that it's unrealistic to change the format. But he needs it sorted. The question is, what is the fastest easy solution?

Maven
Apr 10th, 2007, 11:43 AM
Merri, that's a cool tip for when speed is absolutely critical. I'm curious how much faster it actually is.Conceptually, that's the basic idea. It really is more applicable to sorting a UDT array, but as far as I am aware, any functions dealing with UDTs must explicitly reference the individual element names, and thus can't be generalized. (Which is annoying.)

There are no doubt countless ways to optimize the situation when you have different data types. Splitting each data type into its own array is one way, for example. Or just create typed index arrays, leaving the variant array alone except for reading.

For the purposes of this thread, it is okay to assume that the reason the array needs to be sorted is because the data started out in CSV format. But a further assumption is that the programmer already set up routines for saving and loading it to a multi-dimensional array, and that his array structure is so entrenched in his program that it's unrealistic to change the format. But he needs it sorted. The question is, what is the fastest easy solution?

Would a link list type of solution be better for a situation like this? It'll grow and shrink as you need it and you can insert anywhere into the list without really moving anything.

Ellis Dee
Apr 10th, 2007, 08:11 PM
Would a link list type of solution be better for a situation like this? It'll grow and shrink as you need it and you can insert anywhere into the list without really moving anything.Almost certainly yes. If you need to then search it, though, linked lists are less than ideal.

Thinking about it, I am unaware of any way to search a linked list other than iterating through one item at a time, in order. Is there a better way?

My solution to this thread is posted here (http://www.vbforums.com/showthread.php?t=462586). It includes a companion binary search function. I have no doubt that my functions are slow, and could be improved. That's pretty much the only reason I created this thread to begin with.

Maven
Apr 10th, 2007, 09:15 PM
Almost certainly yes. If you need to then search it, though, linked lists are less than ideal.

Thinking about it, I am unaware of any way to search a linked list other than iterating through one item at a time, in order. Is there a better way?

My solution to this thread is posted here (http://www.vbforums.com/showthread.php?t=462586). It includes a companion binary search function. I have no doubt that my functions are slow, and could be improved. That's pretty much the only reason I created this thread to begin with.

There is a whole pot full of algorithms that are pretty much extensions on the link list concept, called binary trees. Which one you pick really depends on the nature of the data you are working with and how you intend to work with that data.

AVL Trees, Red-Black Tree's, Ternary Search Trees, etc are all good algorithms. Ternary is useful when your dealing with sets of strings.

http://algorithm.diy.myrice.com/resources/technical_artile/red-black_tree/g2.gif

That should give you an idea of how a self balancing tree works.

Ellis Dee
Apr 10th, 2007, 09:57 PM
There is a whole pot full of algorithms that are pretty much extensions on the link list concept, called binary trees. Which one you pick really depends on the nature of the data you are working with and how you intend to work with that data.

AVL Trees, Red-Black Tree's, Ternary Search Trees, etc are all good algorithms. Ternary is useful when your dealing with sets of strings.

http://algorithm.diy.myrice.com/resources/technical_artile/red-black_tree/g2.gif

That should give you an idea of how a self balancing tree works.Ha, that right there looks like some kind of fancy book-learning. The fact that I am a self-taught programmer who never took a computer class in my life has finally been exposed.

Doing a google search on "searching a linked list", it appears that you do have to search sequentially. The other factor I never considered (having never bothered to use an actual linked list) is that you can't do random access on them; accessing the thousandth element requires iterating through the first 999.

I see I have a bunch of reading up to do on wikipedia just for the sake of knowledge. Thanks for the search terms to get me started.

All of this appears to be OT for this thread, though.

Maven
Apr 10th, 2007, 10:23 PM
Ha, that right there looks like some kind of fancy book-learning. The fact that I am a self-taught programmer who never took a computer class in my life has finally been exposed.

Doing a google search on "searching a linked list", it appears that you do have to search sequentially. The other factor I never considered (having never bothered to use an actual linked list) is that you can't do random access on them; accessing the thousandth element requires iterating through the first 999.

I see I have a bunch of reading up to do on wikipedia just for the sake of knowledge. Thanks for the search terms to get me started.

All of this appears to be OT for this thread, though.

Well basically a binary tree is a link list with some added features to make it faster to search. That picture shows how it would look in memory and do its search. It works by adding a extra variable to a node, instead of having a pointer to the next node like you do in a link list, you have a pointer to left and right nodes. The node on the left is smaller and the node on the right is larger. This allows the algorithm to take a divide and conquer approach to searching.

At the end of the day, an AVL tree for example has a worst case of o(log n).

Milk
Apr 12th, 2007, 06:15 PM
<snip> as far as I am aware, any functions dealing with UDTs must explicitly reference the individual element names, and thus can't be generalized.
I've found a way, it's Hacky, involvolving Copymemory Api, but it works... a least so far anyway, and it's fast compared to any Variant alternative. I'm still working on 8 byte variables in the UDT, these are more tricky because unlike variables of 4 bytes and less they don't always divide into the element length of the containing UDT array. I'll post when I have a chance to finish it.

Merri
Apr 12th, 2007, 08:53 PM
You can indeed hack into UDTs. One way is to make a custom safearrayheader that points into the same memory location as the UDT's main data is. If you for example have:
Private Type CUSTOM_UTD
Element1 As String
Element2 As String
Element3 As String
End Type


Then you can have a Long type array which header is replaced temporarily with a custom one that has three elements (0 - 2). What you are then provided in this array are three pointers to String data in memory, but as Long variables. So calling lngArray(0) would return the pointer to Element1 data.

Numeric datatypes are stored directly as values in the UDT instead of pointers to elsewhere, so if we have:Private Type CUSTOM_UTD
Element1 As Long
Element2 As Byte
Element3 As Date
End Type
We have a total of 13 bytes of data: 4 bytes Long, 1 byte Byte and 8 bytes Date. Filling this into an array of Longs so that it is actually useable is impossible. So you'd have to have atleast 4 Bytes to be able to do it. This is a limitation to the safearray technique.

Favoring Long has a very good reason: it is the fastest datatype as it is native to the 32-bit processor architecture. This means also pointers into memory locations are 32-bit values, for example the string pointers.

You can find a few examples by searching for safearray and by putting me in the writer field. As a warning, you can easily make a mess and crash VB or Windows with this, so it needs careful thinking to make sure you're not leaking memory and that you're restoring original state once done etc.

Milk
Apr 15th, 2007, 02:20 AM
I been taking advantage of the pad bytes in UDTs. They seem to always pad to a DWord if they contain any elements of 4 bytes or more, or a Word if the largest element is 2 bytes. This means that the Safearray method works well for all data types of 4 bytes and less. As soon as you start messing with UDTs holding 8+ byte variables and where the overall length is divisable by 4 but not 8, I need to use some other cunning. At the moment I'm using 1 array structure where I keep altering the start point up and down by 4 bytes for every other index, but i think there might be a better way... like, is it possible to create a custom UDT at runtime?

Here's a little demo of the padding.
Option Explicit

Private Type CUSTOM_UTD0
Element1 As Long
Element2 As Byte
Element3 As Date
End Type

Private Type CUSTOM_UTD1
Element1 As Long
Element2 As Byte
Element3 As Date
Element4 As Integer
End Type

Private Type CUSTOM_UTD2
Element1 As Long
Element2 As Byte
Element3 As Integer
Element4 As Date
End Type

Private Sub Form_Load()
Dim UDT0 As CUSTOM_UTD0
Dim UDT1 As CUSTOM_UTD1
Dim UDT2 As CUSTOM_UTD2
Dim lMemAddress As Long

Debug.Print "CUSTOM_UTD0 is " & LenB(UDT0) & " bytes long"
Debug.Print "CUSTOM_UTD1 is " & LenB(UDT1) & " bytes long"
Debug.Print "CUSTOM_UTD2 is " & LenB(UDT2) & " bytes long"
lMemAddress = VarPtr(UDT2)

Debug.Print "UDT2 starts at address " & lMemAddress
Debug.Print "The Element addresses relative to that are..."
Debug.Print "UDT2.Element1 (long) ---- " & VarPtr(UDT2.Element1) - lMemAddress
Debug.Print "UDT2.Element2 (byte) ---- " & VarPtr(UDT2.Element2) - lMemAddress
Debug.Print "UDT2.Element3 (integer) - " & VarPtr(UDT2.Element3) - lMemAddress
Debug.Print "UDT2.Element4 (date) ---- " & VarPtr(UDT2.Element4) - lMemAddress
End Sub