[RESOLVED] Why is my Multi-Column Collection extremely slow?
I need an ultra-high-performance Multi-Column Collection that takes up as little memory as possible. I wrote a couple of array classes and combined them into a Multi-Column Collection, but it was extremely slow. I don't know what the reason is.
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by bahbahbah
Your remove function scales poorly, at worst O(n).
Which "Remove"-Function? The one of the "SuperCollection" or the one for the item?
The Remove of the Collection just runs "horizontally" through the "columns".
So, if he has 10 Columns (which means 10 Arrays), he calls 10 Times the CopyMemory-API within the Remove-Function of the Array.
Instead of an Array for each column, i might have looked at an UDT and/or separate class, representing the "Entry/Item" in the Collection, but having itself the "number" of "columns".
Nutshell: Such an item would represent a whole "row".
Removing such an item from the Collection would indeed be one single call to "CopyMemory".
OP wants to have different Datatypes for the Columns (currently at String and Long). In such an Item-Class as described above, i'd go with Variant and be done with it.
In the "Collection" i might keep a private Array holding the the DataType Description for each column (which would be Variants), since you need the "real" Datatype only for Calculations and/or displaying, but not for storing
Last edited by Zvoni; Mar 27th, 2025 at 02:42 AM.
Last edited by Zvoni; Tomorrow at 31:69 PM.
----------------------------------------------------------------------------------------
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------------------
People call me crazy because i'm jumping out of perfectly fine airplanes.
---------------------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad
Re: Why is my Multi-Column Collection extremely slow?
Thank you, bahbahbah.
Originally Posted by bahbahbah
Your remove function scales poorly, at worst O(n).
Yes, the remove method is the biggest cause of slow performance. Because I'm using arrays instead of hash-table and trees. There doesn't seem to be an effective way for arrays to solve the problem of efficiency of inserts and deletions in the case of large amounts of data. This is why RC6.ArrayList is equally inefficient when performing Remove operations.
vamvam provides a great Multi-Column-Collection, but unfortunately it's implemented in C++ and I can't apply it to my project.
Yes, I took a closer look at these two threads, which took me back to the good old days of vbForums, when people were talking about technical issues in high spirits, even if there were occasional arguments, but they didn't affect the technical discussions.
The information provided by the above two threads doesn't solve my problem though. If I can translate vamvam's C code into VB6 code might solve my problem, but I don't have the time and energy to do this right now.
Last edited by SearchingDataOnly; Mar 27th, 2025 at 10:25 AM.
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by Zvoni
Which "Remove"-Function? The one of the "SuperCollection" or the one for the item?
The Remove of the Collection just runs "horizontally" through the "columns".
So, if he has 10 Columns (which means 10 Arrays), he calls 10 Times the CopyMemory-API within the Remove-Function of the Array.
Instead of an Array for each column, i might have looked at an UDT and/or separate class, representing the "Entry/Item" in the Collection, but having itself the "number" of "columns".
Nutshell: Such an item would represent a whole "row".
Removing such an item from the Collection would indeed be one single call to "CopyMemory".
OP wants to have different Datatypes for the Columns (currently at String and Long). In such an Item-Class as described above, i'd go with Variant and be done with it.
In the "Collection" i might keep a private Array holding the the DataType Description for each column (which would be Variants), since you need the "real" Datatype only for Calculations and/or displaying, but not for storing
Thank you, Zvoni.
If we can dynamically set the members of the UDT, then using the UDT is a great solution.
For example:
Code:
With MyCollection
.Cols = 4
.ColType(0) = vbLong
.ColType(1) = vbString
.ColType(2) = vbString
.ColType(3) = vbString
End With
So, how do we dynamically generate the following UDT based on the above example?
Code:
Type MyUDT
Field1 As Long
Field2 As String
Field3 As String
Field4 As String
End Type
Last edited by SearchingDataOnly; Mar 27th, 2025 at 09:45 PM.
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by SearchingDataOnly
Thank you, Zvoni.
If we can dynamically set the members of the UDT, then using the UDT is a great solution.
For example:
Code:
With MyCollection
.Cols = 4
.ColType(0) = vbLong
.ColType(1) = vbString
.ColType(2) = vbString
.ColType(3) = vbString
End With
So, how do we dynamically generate the following UDT based on the above example?
Code:
Type MyUDT
Field1 As Long
Field2 As String
Field3 As String
Field4 As String
End Type
None they i know of (maybe with Class-Factory?).
My idea was mor along the lines of
Code:
Type MyUDT
Fields() As Variant
End Type
The Variant itself will know if it's a Long or a String.
Aircode
In Collection-Class
Code:
Private DataTypesCols() As VBVarType
Private myArray() AS MyUDT
...
Public Sub Create(ByVal ColNo As Long, Optional SizeOfArray As Long=-1)
Dim sa As Long
Dim i as long
ReDim DataTypeCols(0 To ColNo-1)
If SizeOfArray>-1 Then sa=SizeOfArray Else sa=InitialCapacity
ReDim myArray(0 To sa-1)
For i=LBound(myArray) To UBound(myArray)
ReDim myArray(i).Fields(0 To ColNo-1)
Next
'Set DataTypeCols
'Fill MyArray
End Sub
'Blablabla
The UDT represents a complete row, so adding or deleting involves only one call to CopyMemory
Last edited by Zvoni; Tomorrow at 31:69 PM.
----------------------------------------------------------------------------------------
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------------------
People call me crazy because i'm jumping out of perfectly fine airplanes.
---------------------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by SearchingDataOnly
I need an ultra-high-performance Multi-Column Collection that takes up as little memory as possible.
If you use an SQLite InMemory-(DB)-Table - in conjunction with the RC6.cCursor-Class -
(wrapped behind some convenience methods)
then you'll get high performance and low memory without larger efforts.
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by Schmidt
If you use an SQLite InMemory-(DB)-Table - in conjunction with the RC6.cCursor-Class -
(wrapped behind some convenience methods)
then you'll get high performance and low memory without larger efforts.
Olaf
Yep. That would also be my approach (at least in FreePascal).
No idea if OP uses RC6.
Would also simplify stuff like (multi-column) sorting, selecting "sub-sections" of the Dataset (a.k.a filtering, searching for specific Datasets etc.), "Duplicates yes/no"
Last edited by Zvoni; Tomorrow at 31:69 PM.
----------------------------------------------------------------------------------------
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------------------
People call me crazy because i'm jumping out of perfectly fine airplanes.
---------------------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by Zvoni
None they i know of (maybe with Class-Factory?).
My idea was mor along the lines of
Code:
Type MyUDT
Fields() As Variant
End Type
The Variant itself will know if it's a Long or a String.
Aircode
In Collection-Class
Code:
Private DataTypesCols() As VBVarType
Private myArray() AS MyUDT
...
Public Sub Create(ByVal ColNo As Long, Optional SizeOfArray As Long=-1)
Dim sa As Long
Dim i as long
ReDim DataTypeCols(0 To ColNo-1)
If SizeOfArray>-1 Then sa=SizeOfArray Else sa=InitialCapacity
ReDim myArray(0 To sa-1)
For i=LBound(myArray) To UBound(myArray)
ReDim myArray(i).Fields(0 To ColNo-1)
Next
'Set DataTypeCols
'Fill MyArray
End Sub
'Blablabla
The UDT represents a complete row, so adding or deleting involves only one call to CopyMemory
I'll take a closer look at your options. Thank you, Zvoni.
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by Schmidt
If you use an SQLite InMemory-(DB)-Table - in conjunction with the RC6.cCursor-Class -
(wrapped behind some convenience methods)
then you'll get high performance and low memory without larger efforts.
Olaf
Thank you for your guidance, Olaf.
I don't know how to add and delete records with RC6.cCursor yet. I used RC6.cCommand and RC6.cRecordSet to simulate a Muti-Column-Collection. Here are the test results.
Code:
====================================
Test RC6-cMcCollection
Add million items at the end: 5,613.53msec
Delete 500000 items from the middle: 551.07msec
Update values: 2,482.29msec
====================================
Test VMMC
Add million items at the end: 3,382.08msec
Delete 500000 items from the middle: 315.41msec
Update values: 744.66msec
That is basically a "Collection" that can be specifically customized to your needs, rather than "generalized" (with Variants) like the VB6 Collection.
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
That is basically a "Collection" that can be specifically customized to your needs, rather than "generalized" (with Variants) like the VB6 Collection.
Hi Elroy,
Thank you for the information. I tested your BST but I don't seem to know how to write a Multi-Column-Collection on top of your BST yet. I've combined your BST_LngKey_NoVal and BST_StrKey_NoVal into a Multi-Column-Collection, but there seems to be something wrong with the performance.
Here are the test results:
Code:
===================================
Test cSuperCollection-BST
===================================
Add million items at the end: 72.344 seconds
Re: Why is my Multi-Column Collection extremely slow?
Hello SDO,
I guess I don't know what you mean by "multi-column collection". If it were me, I'd probabily just create multiple (perfectly matched) collections if I needed "multiple columns". My BST routines could be modified for a specific UDT (which could represent columns), but I'm not sure I'm willing to do that work for some custom-UDT, when just using "paired-collections" will get it done.
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by Elroy
Hello SDO,
I guess I don't know what you mean by "multi-column collection". If it were me, I'd probabily just create multiple (perfectly matched) collections if I needed "multiple columns". My BST routines could be modified for a specific UDT (which could represent columns), but I'm not sure I'm willing to do that work for some custom-UDT, when just using "paired-collections" will get it done.
It seems difficult to simulate the high performance of vamvam's Multi-Column-Collection with VB.Collection, but not only that, it's also extremely easy to use and has a low memory footprint.
Re: Why is my Multi-Column Collection extremely slow?
Btw, I usually simulate "Multi-Column-Collections" with a simple array of UDTs. Works fine for me *and* is strongly typed i.e. no need c.ColType(0) = vmeInteger and c.ColType(1) = vmeString.
Now, for deleting an element in the *middle* this probably sucks but then there are ways to optimize it by marking deleted records, keeping an index and what not, which should be attempted if deleting in the *middle* is a core requirement for the app in question (usually not).
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by wqweto
Btw, I usually simulate "Multi-Column-Collections" with a simple array of UDTs. Works fine for me *and* is strongly typed i.e. no need c.ColType(0) = vmeInteger and c.ColType(1) = vmeString.
Now, for deleting an element in the *middle* this probably sucks but then there are ways to optimize it by marking deleted records, keeping an index and what not, which should be attempted if deleting in the *middle* is a core requirement for the app in question (usually not).
cheers,
</wqw>
Searching an array for your key, or keeping it sorted, can also be tricky. That's what makes collection/dictionary-type things so useful.
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
Re: Why is my Multi-Column Collection extremely slow?
Also, SDO, it was mentioned above but a disconnected ADO recordset is another excellent alternative. And it's inherently multi-column, and also easily searchable.
Here's a codebank link where I put together code to make these things easy to create-and-use.
I haven't done comparative speed tests on them, but they should be ok.
Also, you mentioned "millions of items" above. If you're getting into that volume of possible records, that's starting to sound more like a disk-based recordset in some database.
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by wqweto
Btw, I usually simulate "Multi-Column-Collections" with a simple array of UDTs. Works fine for me *and* is strongly typed i.e. no need c.ColType(0) = vmeInteger and c.ColType(1) = vmeString.
Yes, using UDT arrays is the fastest when performing Add and Update operations, but the code is a bit verbose.. Here's the code for my MyMultiColumnList (it looks a little cleaner and more concise):
Code:
Dim oList As New MyMultiColumnList
oList.Init vbInteger, vbString, vbString, vbString
With oExternLib
For i = 0 To oExternLib.Count -1
oList.Add .Id(i), .Caption(i), .Tag(i), .Description(i)
Next i
End With
Also, as a class, MyMultiColumnList is more convenient to pass between different modules and different functions. The only drawback is that the deletion operation performance is extremely low in large data volumes.
Originally Posted by wqweto
Now, for deleting an element in the *middle* this probably sucks but then there are ways to optimize it by marking deleted records, keeping an index and what not, which should be attempted if deleting in the *middle* is a core requirement for the app in question (usually not).
It's a good idea to set a deleted flag. Thank you, wqweto.
Last edited by SearchingDataOnly; Mar 30th, 2025 at 10:26 PM.
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by Elroy
Also, SDO, it was mentioned above but a disconnected ADO recordset is another excellent alternative. And it's inherently multi-column, and also easily searchable.
Here's a codebank link where I put together code to make these things easy to create-and-use.
====================================
Test RC6-cMcCollection
Add million items at the end: 5,613.53msec
Delete 500000 items from the middle: 551.07msec
Update values: 2,482.29msec
====================================
Test VMMC
Add million items at the end: 3,382.08msec
Delete 500000 items from the middle: 315.41msec
Update values: 744.66msec
Originally Posted by Elroy
I haven't done comparative speed tests on them, but they should be ok.
IMO, the performance of database-based recordsets cannot be compared to arrays, lists, and trees.
Re: Why is my Multi-Column Collection extremely slow?
Originally Posted by Zvoni
None they i know of (maybe with Class-Factory?).
My idea was mor along the lines of
Code:
Type MyUDT
Fields() As Variant
End Type
The Variant itself will know if it's a Long or a String.
Aircode
In Collection-Class
Code:
Private DataTypesCols() As VBVarType
Private myArray() AS MyUDT
...
Public Sub Create(ByVal ColNo As Long, Optional SizeOfArray As Long=-1)
Dim sa As Long
Dim i as long
ReDim DataTypeCols(0 To ColNo-1)
If SizeOfArray>-1 Then sa=SizeOfArray Else sa=InitialCapacity
ReDim myArray(0 To sa-1)
For i=LBound(myArray) To UBound(myArray)
ReDim myArray(i).Fields(0 To ColNo-1)
Next
'Set DataTypeCols
'Fill MyArray
End Sub
'Blablabla
The UDT represents a complete row, so adding or deleting involves only one call to CopyMemory
I tested your method and it has the following problems:
(1) It takes up too much memory
(2) The performance is not excellent
(3) An error occurred when using CopyMemory to manipulate Variant-Array(Fields array).
Re: Why is my Multi-Column Collection extremely slow?
Someone sent me a private message with an AI-generated piece of code. I tested it carefully and it was extremely fast and it won the championship in terms of performance. Thank you very very much, good Samaritan.
Here are the test results.
Code:
====================================
Test AI-MultiColumnList
Add million items at the end: 3.031 seconds
Delete 500000 items from the middle: 0.063 seconds
Update values: 0.828 seconds
Last edited by SearchingDataOnly; Apr 1st, 2025 at 04:18 AM.
Re: [RESOLVED] Why is my Multi-Column Collection extremely slow?
This code snippet is unnecessary
Code:
Case vbObject
' For objects, we need to carefully preserve existing references
Dim TempObjects() As Object
ReDim TempObjects(NewSize)
' Copy existing object references
For j = 0 To UBound(Columns(i).ObjectValues)
Set TempObjects(j) = Columns(i).ObjectValues(j)
Next j
' Initialize new object references to Nothing
For j = UBound(Columns(i).ObjectValues) + 1 To NewSize
Set TempObjects(j) = Nothing
Next j
' Replace the array
Columns(i).ObjectValues = TempObjects
Erase TempObjects