My application uses the standard VB6 Collection class heavily.
There are about 10 collections containing typically 50 elements for some and 500 elements for others.
I think that at least 50% of the CPU time is spent accessing these collections.
I would be interrested to speed up the application, and I would like to consider the use of some replacement for the standard Collection class.
Where could I find some replacement?
What would be the advantages of the replacements?
Is there some hope to speed up the use of Collections? (or should I better try optimize the other 50% CPU activity)
You could use arrays of the specific datatype, which is the fastest way to go. Of course you lack the convenience of keys, so if you must use them, you'd have to keep indexes of the keys yourself.
You could also make an array of your own type:
Code:
Public Type PersonInformation
Name As String
DOB As Date
DOD As Date
End Type
Dim People() As PersonInformation
Accessing array of a type is only a little bit slower than array of a specific datatype.
There is also a Dictionary object which is better than Collections, but I don't know about it's speed and have never used it as I think it had an extra dependancy.
Given a collection is just a managed array of variants with the option to search by key you could presumably get a performance boost by creating a managed array of the datatype you wish to hold.
Consider two arrays, one holding the collection data and one holding the string keys.
I imagine the really slow part of a collection is where you access an item by it's key rather than it's index. When comparing strings of different lengths you can get a marked performance boost by first comparing the LenB (faster than Len) of the string and then the string itself (if the length matches.)
You could get a massive performance boost if you stored the keys alphabetically, then you could search by divide and conquer. 10000 keys would only require 13 comparisons. This would not be so easy to set up, you would need two more arrays to link the indexes of the items to the keys and back, and adding removing items would be slow. Accessing items would be like lightning.
Edit: rather a slow post be me there, started posting before you merri
I am using Collection objects mainly to convert keys to indexes.
All the collections I use (exept one) are a mean to get an index in an array based on the name of something.
(or a double index sometimes)
To sort an initially unsorted list, use quicksort or shellsort. Once sorted, if you add a single item and want to maintain the order, use insertion sort or gnome sort.
Then a simple binary search as Milk describes would make lookups very fast.
Here we go, a pretty fast key collection class for cross referencing keys, item values and collection indexes:
Code:
' KeyCollection.cls
Option Explicit
Private Type IndexInformation
Pos As Long
Value As Long
End Type
Private m_Index() As IndexInformation
Private m_Keys As String
Public Function Add(ByVal Item As Long, ByRef Key As String) As Long
' must have key
If LenB(Key) Then
' must not exist
If InStr(m_Keys, vbNullChar & Key & vbNullChar) = 0 Then
' yay, add it!
Add = UBound(m_Index) + 1
' space for it
ReDim Preserve m_Index(Add)
' remember key position
With m_Index(Add)
.Pos = Len(m_Keys) + 1
.Value = Item
End With
' add to our key string
m_Keys = m_Keys & (Key & vbNullChar)
Exit Function
Else
' uncomment if you want an error if the item exists
'Err.Raise 5, "KeyCollection.Add", "Key already exists."
End If
Else
' uncomment if you want an error if the key is invalid
'Err.Raise 5, "KeyCollection.Add", "Invalid key."
End If
End Function
Public Function Count() As Long
' this is straightforward
Count = UBound(m_Index)
End Function
Public Function Item(ByRef Key As String) As Long
Dim lngA As Long, lngPos As Long
' must have key
If LenB(Key) Then
' find position
lngPos = InStr(m_Keys, vbNullChar & Key & vbNullChar) + 1
' do we have it?
If lngPos > 1 Then
' now we just have to find which item it is
For lngA = 1 To UBound(m_Index)
With m_Index(lngA)
' it is more likely to not have a match: True condition is faster
If .Pos <> lngPos Then
' not found
Else
' found!
Item = .Value
Exit Function
End If
End With
Next lngA
' uncomment if you want to know about this critical error
'Err.Raise 5, "KeyCollection.Item", "Key not found yet key exists!"
Else
' uncomment if you want an error if the key doesn't exist
'Err.Raise 5, "KeyCollection.Item", "Key does not exist."
End If
Else
' uncomment if you want an error if the key is invalid
'Err.Raise 5, "KeyCollection.Item", "Invalid key."
End If
End Function
Public Function Key(ByVal Item As Long) As String
Dim lngA As Long, lngPos As Long
For lngA = 1 To UBound(m_Index)
With m_Index(lngA)
If .Value <> Item Then
' not found
Else
' get the key for this index
lngPos = InStr(.Pos, m_Keys, vbNullChar)
If lngPos > 1 Then
' return it
Key = Mid$(m_Keys, .Pos, lngPos - .Pos)
Exit Function
Else
' uncomment if you want this critical error
'Err.Raise 5, "KeyCollection.Key", "Key string not found!"
End If
End If
End With
Next lngA
' uncomment if you want an error if the key doesn't exist
'Err.Raise 5, "KeyCollection.Key", "Item value not found."
End Function
Public Function KeyByIndex(ByVal Index As Long) As String
Dim lngPos As Long
' validate range
If Index >= 1 And Index <= UBound(m_Index) Then
With m_Index(Index)
' get the key for this index
lngPos = InStr(.Pos, m_Keys, vbNullChar)
If lngPos > 1 Then
' return it
KeyByIndex = Mid$(m_Keys, .Pos, lngPos - .Pos)
Exit Function
Else
' uncomment if you want this critical error
'Err.Raise 5, "KeyCollection.KeyByIndex", "Key string not found!"
End If
End With
Else
' uncomment if you want an error if the index is invalid
'Err.Raise 5, "KeyCollection.KeyByIndex", "Invalid."
End If
End Function
Public Function Remove(ByRef Key As String) As Boolean
Dim lngA As Long, lngB As Long, lngPos As Long, lngLen As Long
' must have key
If LenB(Key) Then
' find position
lngPos = InStr(m_Keys, vbNullChar & Key & vbNullChar) + 1
' do we have it?
If lngPos > 1 Then
' now we just have to find which item it is
For lngA = 1 To UBound(m_Index)
' it is more likely to not have a match: True condition is faster
If m_Index(lngA).Pos <> lngPos Then
' not found
Else
' found, start removal
lngLen = Len(Key) + 1
' copy information
For lngB = lngA To UBound(m_Index) - 1
m_Index(lngB).Pos = m_Index(lngB + 1).Pos - lngLen
m_Index(lngB).Value = m_Index(lngB + 1).Value
Next lngB
' free memory
ReDim Preserve m_Index(lngB - 1)
' now, remove the key
m_Keys = Left$(m_Keys, lngPos - 1) & Mid$(m_Keys, lngPos + lngLen)
' success!
Remove = True
Exit Function
End If
Next lngA
' uncomment if you want to know about this critical error
'Err.Raise 5, "KeyCollection.Item", "Key not found yet key exists!"
Else
' uncomment if you want an error if the key doesn't exist
'Err.Raise 5, "KeyCollection.Item", "Key does not exist."
End If
Else
' uncomment if you want an error if the key is invalid
'Err.Raise 5, "KeyCollection.Item", "Invalid key."
End If
End Function
Private Sub Class_Initialize()
m_Keys = vbNullChar
ReDim m_Index(0)
End Sub
Private Sub Class_Terminate()
m_Keys = vbNullString
Erase m_Index
End Sub
Sample:
Code:
Option Explicit
Private Sub Form_Load()
Dim Keys As New KeyCollection
Keys.Add 4, "A"
Keys.Add 3, "B"
Keys.Add 2, "C"
Keys.Add 1, "D"
' make key "D" invalid, index 4 invalid and item 1 invalid
Keys.Remove "D"
' by index "ABC" (note: I have set KeyByIndex as (Default) in Tools > Procedure Attributes)
MsgBox Keys(1) & Keys(2) & Keys(3) & Keys(4)
' by key string "4320"
MsgBox Keys.Item("A") & Keys.Item("B") & Keys.Item("C") & Keys.Item("D")
' by item "CBA"
MsgBox Keys.Key(1) & Keys.Key(2) & Keys.Key(3) & Keys.Key(4)
End Sub
Class also attached. Tell me if you find any problems in it Uncomment error lines if you want Collection style errors.
Because the post got too long, here we have some more information:
Add returns index if it succeeds or 0 if it fails.
Count returns the number of items. It never fails.
Item returns item value by given key or 0 if it fails.
Key returns key by given item value or null string if it fails.
KeyByIndex returns key by given index or nullstring if it fails.
Remove returns True if it succeeded to remove item by given key or False if it fails.
If you need more features, it shouldn't be too hard to add them in
The performance will drop for adding new items when the m_Keys string gets past about 60000 characters. This is because the string gets longer than VB's internal buffer for strings.
You may not use vbNullChar in any key strings. For speed reasons there is no error check for this.
The best way to proceed is to profile your code which will help you discover where your performance problems actualy are. There are many good code profilers around, and some of them come in trial versions that work on small projects.
Collections might well be your performance bottleneck, especially if you are using them as you suggest above:
I am using Collection objects mainly to convert keys to indexes.
All the collections I use (exept one) are a mean to get an index in an array based on the name of something.
(or a double index sometimes)
A Collection isn't intended as a hash index to an external data structure, it is a data structure that contains its own hash index. In other words the object of a search or keyed access is what you are supposed to store in each Collection item, not something like a number to be used to index something else.
I'd think this extra indirection may be what is eating your performance.
Without more specific knowledge of what you are trying to accomplish it is very hard to suggest a specific remedy. It feels as if you are trying to model something like a Collection with multi-values items by using a 2D array and indexing it through a set of single-valued Collections.
There are two standard approaches for doing this. One is to use a single Collection where each item value is a Variant array. This has the limitation of offering just one automatic hash index however.
A more flexible alternative is to use a fabricated ADO 2.5+ Recordset. Here you can add an automatic hash index for any Field simply by setting that Field's "Optimize" ADO dynamic property to True.
In either case the result will typically provide plenty of performance. This performance will often outstrip hand crafted solutions using array sorting and binary searches, which have the disadvantages of being bug-prone and clumsy to maintain as well. This doesn't mean you can't do better with hand coding, but the cost/benefit almost always weighs against it - especially when changes are required.
An ADO solution also offers a host of additional features such as serialization and persistence: you can save and load the data trivially.
The hard part of this is retrofitting it into an existing program that has been written against the "world is flat (arrays)" programming model. However by learning how to use the ADO object model as an internal datastore you can apply the approach in future programs and see an increase in your development productivity.
I'd start with code profiling though. There is no point in guessing where you need to optimize. Have you looked at anything like VB Watch v2 before?
Easy enough...you can save/load a UDT array directly from disk, even if it has arrays of its own, and it's very fast...and doesn't add a dependency...
Not sure what you mean about "metadata"
He may not know that you can save an entire udt with sub-arrays with a single line of code. I agree that it would be worth a pointless dependency to avoid having to read/write each individual record -- or even worse, each indivdual variable for each individual record -- in some huge project-specific code.
Luckily, like most most programming languages VB6 assumes you'll want to save/load udts to/from the harddrive, so it lets you do it as a one-liner.
rs.Save strFilename
vs
Put #1, 1, MyUDTWithSubArrays
Doesn't seem worth a dependency to me. Especially not MDAC.
I would also wager that just about anything you do with a udt array will be faster than treating it as a recordset for the simple fact of having way less overhead. Plus it gives you control over how you handle optimization.
Last edited by Ellis Dee; Jul 13th, 2008 at 07:59 AM.