|
-
Apr 4th, 2022, 03:07 PM
#1
Thread Starter
PowerPoster
[RESOLVED] to undestand array dictionarary, collection
Wath is the best performance to use an array, and fast find string in it?
dictionarary, collection or join&instr
-
Apr 4th, 2022, 04:01 PM
#2
Hyperactive Member
Re: to undestand array dictionarary, collection
If you're starting with an array, then the fastest solution is looping through the array. You have to access every element to set up the other options you're considering.
What size are the data sets you're talking about? For small sets, the only appreciable difference is going to be the internal exception that is raised when a string is not found in the collection index.
-
Apr 4th, 2022, 08:27 PM
#3
Re: to undestand array dictionarary, collection
For 99% of purposes, regular VB functions will be more than fast enough. If you're in one of those rare situations where VB isn't keeping up,
http://www.xbeat.net/vbspeed
-
Apr 4th, 2022, 11:20 PM
#4
Re: to undestand array dictionarary, collection
 Originally Posted by ahenry
If you're starting with an array, then the fastest solution is looping through the array. You have to access every element to set up the other options you're considering.
*rubs chin beard* really? Can you explain that? I'm not sure I completely agree with those two statements.
-tg
-
Apr 5th, 2022, 01:09 AM
#5
Re: to undestand array dictionarary, collection
For a 1D-Array of Strings there is always the Filter-Function https://www.automateexcel.com/vba/fi...rays-function/
Last edited by Zvoni; Tomorrow at 31:69 PM.
----------------------------------------------------------------------------------------
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------------------
People call me crazy because i'm jumping out of perfectly fine airplanes.
---------------------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad
-
Apr 5th, 2022, 02:31 AM
#6
Re: to undestand array dictionarary, collection
When the array is sorted then you can use a binary search on the array.
That's quite fast.
-
Apr 5th, 2022, 02:41 AM
#7
Re: to undestand array dictionarary, collection
 Originally Posted by Arnoutdv
When the array is sorted then you can use a binary search on the array.
That's quite fast.
True, but it would return only the first match.
As for dictionary and collection: to work reasonably quick finding a string, the added strings have to be both: Key and Value
limiting both (Dic and Col) to no duplicates and exact matches.
The Filter-Function for 1D-StringArray can do partial match, case-(in)sensitive and even negation
Last edited by Zvoni; Tomorrow at 31:69 PM.
----------------------------------------------------------------------------------------
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------------------
People call me crazy because i'm jumping out of perfectly fine airplanes.
---------------------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad
-
Apr 5th, 2022, 06:04 AM
#8
Re: to undestand array dictionarary, collection
here with Instr
Code:
Private FileName As String
Private cCol As Collection
Private Sub Command1_Click()
Dim s As String
On Error GoTo Fehler
s = Text1.Text & " = " & cCol(Text1.Text)
MsgBox s
On Error GoTo 0
Exit Sub
Fehler:
MsgBox Text1.Text & " not found"
On Error GoTo 0
End Sub
Private Sub Form_Load()
FileName = "E:\TestFolder\Luca.txt"
'create TestFile
WriteFile
ReadFileInCollection
Text1.Text = "Apple"
End Sub
Private Sub WriteFile()
Dim FNr As Integer
FNr = FreeFile
Open FileName For Output As #FNr
Print #FNr, "'comment = whatever"
Print #FNr, "Apple = green"
Print #FNr, "Tomato = red"
Print #FNr, "Banana = yellow"
Close #FNr
End Sub
Private Sub ReadFileInCollection()
Dim FNr As Integer
Dim s As String, s1() As String
Dim i As Long
Set cCol = Nothing
Set cCol = New Collection
FNr = FreeFile
Open FileName For Input As #FNr
Do While Not EOF(FNr)
Line Input #FNr, s
i = InStr(1, s, "'")
If i > 0 Then
'remove comment in File
s = Left$(s, i - 1)
End If
If InStr(1, s, "=") > 0 Then
'= found
s1() = Split(s, "=")
cCol.Add Trim(s1(1)), Trim(s1(0))
End If
Loop
Close #FNr
End Sub
but as usuall you never show what you tried to code yourself, so here another Code for HDD
to hunt a species to extinction is not logical !
since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.
-
Apr 5th, 2022, 08:23 AM
#9
Fanatic Member
Re: to undestand array dictionarary, collection
In my opinion it depends on what problem you are trying to solve. Could you please perhaps elaborate.
Each of the things you mention have their own strengths and weaknesses.
thanks
-
Apr 5th, 2022, 10:09 AM
#10
Re: to undestand array dictionarary, collection
 Originally Posted by vbwins
In my opinion it depends on what problem you are trying to solve. Could you please perhaps elaborate.
Each of the things you mention have their own strengths and weaknesses.
That is precisely correct.
Collections are very nice in that they're organized as a binary tree (if you supply keys) for quick searching/finding. They also have these features:
- Their creation class is built into VB6.
- You can iterate them with the For Each syntax.
- Their data item is a Variant so pretty much anything can be stored in them.
- They also have forward and backward index pointers for looped access.
- Additions and (more importantly) deletions are trivial.
- There is "balancing" code for the binary tree when adding items.
When I have a list of items that's going to be massaged, I find Collections extremely useful.
There is also the Dictionary object. Compared to Collections, it has one upside and one downside. The upside is that it uses a hash table for key lookups, so it's faster at that. The downside is that it's not a VB6 built-in class.
---------
I suppose, the downside of Collections compared to regular arrays is that there's a bit more overhead, specifically in two areas: 1) Each item takes a structure (see below), and 2) The data is a Variant rather than the specific data type you wish to store. Regarding #2, I've gotten very comfortable with Variants through the years, and find that small amount of overhead very acceptable (just look up the VT and then fetch the data out of the Variant's remaining 14 bytes, no biggie). Regarding #1, here's the structure of each Collection's item.
Collection Item structure:
Code:
Private Type VbCollectionItem
Data As Variant ' Ox00
Key As String ' Ox10
pPrevIndexedItem As Long ' Ox14
pNextIndexedItem As Long ' Ox18
pvUnknown As Long ' Ox1C
pParentItem As Long ' Ox20
pRightBranch As Long ' Ox24
pLeftBranch As Long ' Ox28
bFlag As Boolean ' Ox2C
End Type ' Ox30 ' Length. (boolean padded to 4)
And each instantiated Collection has the following header:
Code:
Private Type VbCollectionHeader
pInterface1 As Long ' Ox00
pInterface2 As Long ' Ox04
pInterface3 As Long ' Ox08
lRefCounter As Long ' Ox0C
Count As Long ' Ox10
pvUnk1 As Long ' Ox14
pFirstIndexedItem As Long ' Ox18
pLastIndexedItem As Long ' Ox1C
pvUnk4 As Long ' Ox20
pRootTreeItem As Long ' Ox24 ' This is actually a pointer to what's typically thought of as the root.
pEndTreePtr As Long ' Ox28 ' This is effectively an EOF marker for the tree (bottom of it). It points to the end of the VbCollectionHeader (HdrPtr + &h30)
pvUnk5 As Long ' Ox2C
End Type ' Ox30 ' Length.
With that information, we could essentially "roll our own" pretty easily if we wanted. Possibly a part a bit more complex is the "tree balancing" that the VB6 Collection Class performs, but that's only run on an Add item.
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
-
Apr 5th, 2022, 10:29 AM
#11
Re: to undestand array dictionarary, collection
downsides for collections when used that way (using the Key):
no duplicates allowed
no partial match
no case-insensivity
Don‘t get me wrong: i use collections all the time, we just shouldn’t forget the downsides
Last edited by Zvoni; Tomorrow at 31:69 PM.
----------------------------------------------------------------------------------------
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------------------
People call me crazy because i'm jumping out of perfectly fine airplanes.
---------------------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad
-
Apr 5th, 2022, 10:37 AM
#12
Re: to undestand array dictionarary, collection
 Originally Posted by Zvoni
downsides for collections when used that way (using the Key):
no duplicates allowed
no partial match
no case-insensivity
Don‘t get me wrong: i use collections all the time, we just shouldn’t forget the downsides
Zvoni, you are correct but the plusses and minuses of lookup keys is another whole discussion. I guess another approach to a list is to create a memory-based ADO recordset. Doing that, we'd have all the search power allowed for SQL SELECT statements. We would have the ADO reference, but that's no biggie. Also, setting up those memory-based ADO recordsets does take a bit of doing, but also, no biggie. Someone should create a class for that and post it in the codebank. I haven't searched, maybe it's already done.
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
-
Apr 5th, 2022, 12:04 PM
#13
Re: to undestand array dictionarary, collection
 Originally Posted by Zvoni
downsides for collections when used that way (using the Key):
no duplicates allowed
no partial match
no case-insensivity
Don‘t get me wrong: i use collections all the time, we just shouldn’t forget the downsides
2/3 of those can be eliminated by using Olaf's CHashD class: https://www.vbforums.com/showthread....lass-(no-APIs). As an added bonus, it's faster than the VB6 Collection class for adds & lookups.
-
Apr 5th, 2022, 12:09 PM
#14
Thread Starter
PowerPoster
Re: to undestand array dictionarary, collection
Tks tò the all.
But really my problem Is the numbers of item in my dataset, Is approx 120.xxx items.
And if i use a loop with a for next, i think not Is a good solution, for a find,or not?
-
Apr 5th, 2022, 12:28 PM
#15
Re: to undestand array dictionarary, collection
Luca,
are those items „unique“ as in: no duplicates?
if yes, a collection is your best bet.
and you mention „dataset“: by any chance, is there a database in play?
Last edited by Zvoni; Tomorrow at 31:69 PM.
----------------------------------------------------------------------------------------
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------------------
People call me crazy because i'm jumping out of perfectly fine airplanes.
---------------------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad
-
Apr 5th, 2022, 12:47 PM
#16
Re: to undestand array dictionarary, collection
I just did a test of 466550 words stored in both an Array and in a CHashD object. I performed 1000 look-ups for randomly selected Keys. The array looping lookups took around 6 seconds (compiled with all optimizations). The CHashD lookups took about 6 milliseconds.
-
Apr 5th, 2022, 01:14 PM
#17
Thread Starter
PowerPoster
Re: to undestand array dictionarary, collection
 Originally Posted by Zvoni
Luca,
are those items „unique“ as in: no duplicates?
if yes, a collection is your best bet.
and you mention „dataset“: by any chance, is there a database in play?
In The data Not possibile duplicate Key! Sure
-
Apr 5th, 2022, 02:35 PM
#18
Re: to undestand array dictionarary, collection
Then an sorted array with a binary search should be the fastest method
-
Apr 6th, 2022, 03:23 AM
#19
Re: to undestand array dictionarary, collection
 Originally Posted by Arnoutdv
Then an sorted array with a binary search should be the fastest method
Not necessarily.
When you have (as mentioned by the OP) about 120,000 entries -
then a binary-search (on a sorted Array) will need about 16-17 "full string-coparisons" (with the searchstring).
Most HashList-implementations can complete an exists-check with fewer operations/comparisons
(when the amount of data to search, is as large as the OPs).
E.g. the cHashD-Class which was mentioned, is faster than the RC5/6-cSortedDictionary-Class in such a scenario.
Olaf
-
Apr 6th, 2022, 05:43 AM
#20
Re: to undestand array dictionarary, collection
I did a basic test with a binary search and the cHashList.
Compiled, fast code only, the binary search performs about 15% better compared to cHashList
But I really like the elegancy of cHashList!
Code:
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long
Private Sub Form_Click()
Dim t As Long
Dim sArray() As String
Dim i As Long, j As Long
Dim s(999) As String
Dim lIndex As Long, bExist As Boolean
ReDim sArray(120000 - 1)
For i = 0 To UBound(sArray)
sArray(i) = "K:" & Format(CStr(i), "0000000")
Next i
Dim cHL As clsHashList
Set cHL = New clsHashList
cHL.ReInit 120000
For i = 0 To UBound(sArray)
cHL.Add i, sArray(i)
Next i
For i = 0 To 999
s(i) = "K:" & Format(Int(Rnd * 120000), "0000000")
Next i
t = GetTickCount
For j = 1 To 1000: For i = 0 To 999
lIndex = GetIndexStringArray(s(i), sArray)
Next: Next
Print "GetIndex: " & GetTickCount - t
t = GetTickCount
For j = 1 To 1000: For i = 0 To 999
bExist = cHL.Exists(s(i))
Next: Next
Print "HashList: " & GetTickCount - t
End Sub
Public Function GetIndexStringArray(ByVal sID As String, sArray() As String) As Long
Dim lFirst As Long
Dim lLast As Long
Dim lMiddle As Long
Dim lSteps As Long
' assume search failed
GetIndexStringArray = -1
lFirst = 0
lLast = UBound(sArray)
Do
'lSteps = lSteps + 1
lMiddle = (lFirst + lLast) \ 2
Select Case StrComp(sID, sArray(lMiddle), vbBinaryCompare)
Case 0
GetIndexStringArray = lMiddle
Exit Do
Case 1
lFirst = lMiddle + 1
Case Else
lLast = lMiddle - 1
End Select
Loop Until lFirst > lLast
End Function
-
Apr 6th, 2022, 06:23 AM
#21
Re: to undestand array dictionarary, collection
Try against the modified CHashD class linked a bit further down the post (Or directly here: https://www.vbforums.com/attachment....9&d=1473008813)
In my test it was about almost twice as fast as the binary search. Note that the Add method parameters are flipped, so you should reverse the order of the parameters in your test code.
-
Apr 6th, 2022, 06:50 AM
#22
Re: to undestand array dictionarary, collection
 Originally Posted by jpbro
Try against the modified CHashD class ...
Yep, this one performs even better than the "bare-bones, first implementation".
(despite supporting "all kind of Key-Types" via Variant-Key-Params).
Also, the CompareMode didn't match in Arnouts example (cHashD defaults to vbTextCompare)...
below is an adjusted example (using the cHashD-Code from your Zip), which native-copiled is:
- about twice as fast in bincomp-mode
- and about 2.8 times as fast in text-comparemode
Code:
Option Explicit
Private Declare Function GetTickCount Lib "kernel32" () As Long
Private Sub Form_Click()
Dim t As Long
Dim sArray() As String
Dim i As Long, j As Long
Dim S(999) As String
Dim lIndex As Long
ReDim sArray(120000 - 1)
For i = 0 To UBound(sArray)
sArray(i) = "K:" & Format(CStr(i), "0000000")
Next i
Const CompareMode = vbTextCompare
' Const CompareMode = vbBinaryCompare
Dim cHL As clsHashList
Set cHL = New clsHashList
cHL.ReInit 120000
cHL.CompareMode = CompareMode
For i = 0 To UBound(sArray)
cHL.Add sArray(i), i
Next i
For i = 0 To 999
S(i) = "K:" & Format(Int(Rnd * 120000), "0000000")
Next i
t = GetTickCount
For j = 1 To 1000: For i = 0 To 999
lIndex = GetIndexStringArray(S(i), sArray, CompareMode)
Next: Next
Print "GetIndex: " & GetTickCount - t
t = GetTickCount
For j = 1 To 1000: For i = 0 To 999
lIndex = cHL.IndexByKey(S(i))
Next: Next
Print "HashList: " & GetTickCount - t
End Sub
Public Function GetIndexStringArray(ByVal sID As String, sArray() As String, Optional ByVal CompareMode As VbCompareMethod) As Long
Dim lFirst As Long
Dim lLast As Long
Dim lMiddle As Long
Dim lSteps As Long
' assume search failed
GetIndexStringArray = -1
lFirst = 0
lLast = UBound(sArray)
Do
'lSteps = lSteps + 1
lMiddle = (lFirst + lLast) \ 2
Select Case StrComp(sID, sArray(lMiddle), CompareMode)
Case 0
GetIndexStringArray = lMiddle
Exit Do
Case 1
lFirst = lMiddle + 1
Case Else
lLast = lMiddle - 1
End Select
Loop Until lFirst > lLast
End Function
HTH
Olaf
Last edited by Schmidt; Apr 6th, 2022 at 06:58 AM.
Reason: forgot to swap value and key in cHL.Add
-
Apr 6th, 2022, 06:55 AM
#23
Re: to undestand array dictionarary, collection
I missed the update in the original thread.
Indeed with the cHashD the performance is twice as fast, thanks for the heads up!
-
Apr 6th, 2022, 07:27 AM
#24
Re: to undestand array dictionarary, collection
 Originally Posted by Arnoutdv
I missed the update in the original thread.
Indeed with the cHashD the performance is twice as fast, thanks for the heads up!
FWIW, a further optimized variant of cHashD is contained in RC6
(performing in this scenario about factor 3-3.5 better in both modes, binary and textcompare):
Code:
Option Explicit
Private Sub Form_Click()
Dim sArray() As String, K(999) As String
Dim i As Long, lIndex As Long
ReDim sArray(120000 - 1)
For i = 0 To UBound(sArray)
sArray(i) = "K:" & Format(CStr(i), "0000000")
Next i
' Const CompareMode = vbTextCompare
Const CompareMode = vbBinaryCompare
Dim cHL As cHashD
Set cHL = New_c.HashD(120000, False)
cHL.StringCompareMode = CompareMode
For i = 0 To UBound(sArray)
cHL.Add sArray(i), i
Next i
For i = 0 To UBound(K)
K(i) = "K:" & Format(Int(Rnd * 120000), "0000000")
Next i
For i = 0 To UBound(K) 'just to test for "formal correctness" of the algos
If cHL.IndexByKey(K(i)) <> GetIndexStringArray(K(i), sArray, CompareMode) Then
MsgBox "The found indexes do not match": Exit Sub
End If
Next
New_c.Timing True
For i = 0 To UBound(K)
lIndex = GetIndexStringArray(K(i), sArray, CompareMode)
Next
Print "GetIndex: " & New_c.Timing
New_c.Timing True
For i = 0 To UBound(K)
lIndex = cHL.IndexByKey(K(i))
Next
Print "HashList: " & New_c.Timing
End Sub
Public Function GetIndexStringArray(ByVal sID As String, sArray() As String, Optional ByVal CompareMode As VbCompareMethod) As Long
Dim lFirst As Long
Dim lLast As Long
Dim lMiddle As Long
Dim lSteps As Long
' assume search failed
GetIndexStringArray = -1
lFirst = 0
lLast = UBound(sArray)
Do
'lSteps = lSteps + 1
lMiddle = (lFirst + lLast) \ 2
Select Case StrComp(sID, sArray(lMiddle), CompareMode)
Case 0
GetIndexStringArray = lMiddle
Exit Do
Case 1
lFirst = lMiddle + 1
Case Else
lLast = lMiddle - 1
End Select
Loop Until lFirst > lLast
End Function
Olaf
-
Apr 6th, 2022, 09:19 AM
#25
Fanatic Member
Re: to undestand array dictionarary, collection
There is also the Dictionary object. Compared to Collections, it has one upside and one downside. The upside is that it uses a hash table for key lookups, so it's faster at that. The downside is that it's not a VB6 built-in class.
Only problem with this is that its limited to 1200 hash buckets. When there are a lot of collisions it gets much slower.
-
Apr 6th, 2022, 09:22 AM
#26
Fanatic Member
Re: to undestand array dictionarary, collection
 Originally Posted by luca90
Tks tò the all.
But really my problem Is the numbers of item in my dataset, Is approx 120.xxx items.
And if i use a loop with a for next, i think not Is a good solution, for a find,or not?
In case I missed it what is the key. A string?
-
Apr 6th, 2022, 09:26 AM
#27
Re: to undestand array dictionarary, collection
 Originally Posted by vbwins
In case I missed it what is the key. A string?
Code:
Private Type VbCollectionItem
Data As Variant ' Ox00
Key As String ' <---------------- Ox10
pPrevIndexedItem As Long ' Ox14
pNextIndexedItem As Long ' Ox18
pvUnknown As Long ' Ox1C
pParentItem As Long ' Ox20
pRightBranch As Long ' Ox24
pLeftBranch As Long ' Ox28
bFlag As Boolean ' Ox2C
End Type ' Ox30 ' Length. (boolean padded to 4)
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
-
Apr 6th, 2022, 09:51 AM
#28
Fanatic Member
Re: to undestand array dictionarary, collection
Sorry I meant what is key the OP wants to use.
-
Apr 6th, 2022, 11:19 AM
#29
Re: to undestand array dictionarary, collection
 Originally Posted by vbwins
Sorry I meant what is key the OP wants to use.
It‘s a simple 1D-array of strings, all unique (no duplicates).
for Collection/Dictionary the array-member would be key and value.
i still don‘t understand why he just doesn’t test the performance on his dataset with all possibilities mentioned:
collection
dictionary
filter-function for string-arrays
Hash-thingy in RC6
binary search on array
whatever i missed….
EDIT: one downside not mentioned for binary search: array must be sorted.
so there might be another penalty hidden there
Last edited by Zvoni; Apr 6th, 2022 at 11:24 AM.
Last edited by Zvoni; Tomorrow at 31:69 PM.
----------------------------------------------------------------------------------------
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------------------
People call me crazy because i'm jumping out of perfectly fine airplanes.
---------------------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad
-
Apr 6th, 2022, 11:26 AM
#30
Re: to undestand array dictionarary, collection
 Originally Posted by Zvoni
Hash-thingy in RC6
The Hash-thingy is also available as pure VB6 source, without requiring RC6 at all (though apparently there is a further optimized version in RC6).
Just thought I'd mention this in case anyone was scared off thinking they had to use a separate library/reference.
-
Apr 6th, 2022, 11:45 AM
#31
Re: to undestand array dictionarary, collection
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|