-
1 Attachment(s)
Simple and fast, lightweight HashList-Class (no APIs)
Not much to add to the Threads Title...
the implementation of cHashList comes in only about 100 lines of code, and it can
be used as a VB.Collection-compatible replacement with much better performance.
The following Methods/Properties are supported:
- CompareMode (to switch between case-insensitive and case-sensitive compares, default-behaviour is like the VB.Collection)
- UniqueKeys (to allow multiple entries with the same Key, when switched from the True-defaultsetting to False)
- Count
- Add(Item As Variant, Optional Key As String)
- Exists(Key As String)
- IndexByKey(Key As String)
- KeyByIndex(ByVal IndexOneBased As Long)
- Item(KeyOrOneBasedIndex As Variant) ... (Get, Let and Set)
- ReInit(Optional ByVal ExpectedItemCount As Long = 5000)
Indexed access (for both, Keys and Items) is by orders of magnitude faster than the VB-Collection.
What's possible now as well (compared to the VB-Collection) is the ability to overwrite Item-Values
(at a given Index- or Key-position).
Note, that in the above List a Remove-Method is missing -
I've left this out for two reasons:
1) to demonstrate that a simplified HashList-Implementation doesn't necessarily need to be a linked List
2) because Remove is not used very often, when a Collection is used primarily as a fast "Key-Lookup-Container"
... (for Queue- or Stack-scenarios one can always use the normal VBA.Collection)
Performance is about 6 times as fast, when Key-Value-pairs are added -
and about twice as fast when Items are retrieved per Key-Lookup...
Here's a ScreenShot:
http://vbRichClient.com/Downloads/HashList.png
Here's the Class-Code of cHashList:
Code:
Option Explicit
Private Type DataTableEntry
Key As String
Value As Variant
End Type
Private Type HashTableEntry
DataIndexes() As Long
End Type
Private DataTable() As DataTableEntry, HashTable() As HashTableEntry
Private mCount As Long, mDTUBound As Long, mHashTableSize As Long
Public CompareMode As VbCompareMethod, UniqueKeys As Boolean
Private Sub Class_Initialize()
UniqueKeys = True
CompareMode = vbTextCompare
ReInit
End Sub
Public Sub ReInit(Optional ByVal ExpectedItemCount As Long = 5000)
mHashTableSize = 8
Do Until mHashTableSize * 5 > ExpectedItemCount: mHashTableSize = mHashTableSize * 2: Loop
ReDim HashTable(0 To mHashTableSize - 1)
Dim i As Long
For i = 0 To UBound(HashTable): ReDim HashTable(i).DataIndexes(0 To 0): Next
mDTUBound = 16: ReDim DataTable(0 To mDTUBound)
mCount = 0
End Sub
Public Property Get Count() As Long
Count = mCount
End Property
Public Function Exists(Key As String) As Boolean
Exists = FindIndex(Key, CalcHash(Key)) > 0
End Function
Public Function IndexByKey(Key As String) As Long
IndexByKey = FindIndex(Key, CalcHash(Key))
End Function
Public Sub Add(Item, Optional Key As String)
Dim HashValue As Long, UB As Long
HashValue = CalcHash(Key)
If UniqueKeys Then If FindIndex(Key, HashValue) Then Err.Raise 457
'prolong and add to the new entries to the DataTable-Array
mCount = mCount + 1
If mDTUBound < mCount Then mDTUBound = mDTUBound * 1.5: ReDim Preserve DataTable(0 To mDTUBound)
DataTable(mCount).Key = Key
DataTable(mCount).Value = Item
'add the new DataIndex to the proper Hash-Buckets .DataIndexes-Array
With HashTable(HashValue)
UB = UBound(.DataIndexes): UB = UB + 1
ReDim Preserve .DataIndexes(0 To UB)
.DataIndexes(UB) = mCount
End With
End Sub
Public Property Get KeyByIndex(ByVal IndexOneBased As Long)
If IndexOneBased < 1 Or IndexOneBased > mCount Then Err.Raise 9
KeyByIndex = DataTable(IndexOneBased).Key
End Property
Public Property Get Item(KeyOrOneBasedIndex)
Dim Index As Long
If VarType(KeyOrOneBasedIndex) = vbString Then
Index = FindIndex(KeyOrOneBasedIndex, CalcHash(KeyOrOneBasedIndex))
If Index = 0 Then Err.Raise 457
Else
Index = KeyOrOneBasedIndex
If Index < 1 Or Index > mCount Then Err.Raise 9
End If
If IsObject(DataTable(Index).Value) Then
Set Item = DataTable(Index).Value
Else
Item = DataTable(Index).Value
End If
End Property
Public Property Let Item(KeyOrOneBasedIndex, NewItem)
Dim Index As Long
If VarType(KeyOrOneBasedIndex) = vbString Then
Index = FindIndex(KeyOrOneBasedIndex, CalcHash(KeyOrOneBasedIndex))
If Index = 0 Then Err.Raise 457
Else
Index = KeyOrOneBasedIndex
If Index < 1 Or Index > mCount Then Err.Raise 9
End If
If IsObject(NewItem) Then
Set DataTable(Index).Value = NewItem
Else
DataTable(Index).Value = NewItem
End If
End Property
Public Property Set Item(KeyOrOneBasedIndex, NewItem)
Item(KeyOrOneBasedIndex) = NewItem
End Property
Private Function FindIndex(Key, ByVal HashValue As Long) As Long
Dim i As Long, CM As VbCompareMethod
With HashTable(HashValue)
CM = CompareMode
For i = 1 To UBound(.DataIndexes)
If StrComp(Key, DataTable(.DataIndexes(i)).Key, CM) = 0 Then
FindIndex = .DataIndexes(i): Exit Function
End If
Next
End With 'returns Zero, when no Key can be found
End Function
Private Function CalcHash(Key) As Long
Dim i As Long, L As Long, B() As Byte
If CompareMode Then B = LCase$(Key) Else B = Key
L = 7919
For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next
CalcHash = L * B(0) Mod mHashTableSize
End Function
Friend Sub CheckHashDistribution()
Dim i As Long, UB As Long, cc As Long, Min As Long, Max As Long
Min = &H7FFFFFFF
For i = 0 To UBound(HashTable)
UB = UBound(HashTable(i).DataIndexes)
If UB Then
If Min > UB Then Min = UB
If Max < UB Then Max = UB
cc = cc + 1
End If
Next
Debug.Print "Distribution over a HashTable with"; UBound(HashTable) + 1; "slots:"
Debug.Print "Used-HashSlots:"; cc
Debug.Print "Min-Entries:"; Min
Debug.Print "Max-Entries:"; Max
End Sub
And here the Code of the Test-Form:
Code:
Option Explicit
Private Const TestEntryCount As Long = 100000
Private C As Collection, H As cHashList
Private Sub Form_Click()
Dim i As Long, T!, Item
AutoRedraw = True
Cls
Print "Count of Test-Entries:"; TestEntryCount; vbLf
Set C = New Collection
Set H = New cHashList
H.ReInit TestEntryCount
T = Timer
For i = 1 To TestEntryCount
C.Add i, "K" & i
Next
Print "Collection-Add:", Timer - T & "sec"
T = Timer
For i = 1 To TestEntryCount
H.Add i, "K" & i
Next
Print "cHashList-Add:", Timer - T & "sec"; vbLf
T = Timer
For i = 1 To TestEntryCount
Item = C.Item("K" & i)
Next
Print "Collection-ItemByKey:", Timer - T & "sec"
T = Timer
For i = 1 To TestEntryCount
Item = H.Item("K" & i)
Next
Print "cHashList-ItemByKey:", Timer - T & "sec"
Print vbLf; "Indexed access is not compared (it would be much faster per HashList)"
H.CheckHashDistribution
End Sub
Have fun with it...
Edit: Whilst the above Demo was more a "Concept-Showcase"... below is a performance-optimized version,
acting more like the MS-Scripting-Dictionary now (with a Param-wise inversed Add-Method, but feel free
to change it at will - the code-volume is still bearable with about 230 lines in cHashD).
In the meantime I've also added support for Integer-, Double- and Object-Keys - but I think I'll leave it at that -
since the Class is easy understandable and expandable...
Attachment 140909
Performance of Case-Insensitive-TextcompareMode for Scripting-Dictionary and cHashD
http://vbRichClient.com/Downloads/Ha...xtCompares.png
Performance of Binary-TextcompareModes for Scripting-Dictionary and cHashD
http://vbRichClient.com/Downloads/Ha...ryCompares.png
Olaf
-
2 Attachment(s)
Re: Simple and fast, lightweight HashList-Class (no APIs)
Hi.
Here is a quick comparison of the last wraps over VB.Collection
cHashList much faster, but only for very large collections
Attachment 140703
If we take for example the value iteration 20000, it is not so simple - Scripting.Dictionary shows advantage over cHashList
Attachment 140705
I compared only operation .Add, .Item, .Exist
Elroy - http://www.vbforums.com/showthread.p...B6-Collections
Schmidt - http://www.vbforums.com/showthread.p...Indexed-Access
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
In C#, String "K0" lower case return Hash -843466811, cHashList CalcHash returns 8611. Is it safe hash number?
Code:
public static int GetHashFromKey(string key)
{
if (key == null || key.Length == 0)
{
return -2147483648;
}
return key.ToLower().GetHashCode();
}
Code:
Private Function CalcHash(Key) As Long
Dim i As Long, L As Long, B() As Byte
If CompareMode Then B = LCase$(Key) Else B = Key
L = 7919
For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next
CalcHash = L * B(0) Mod mHashTableSize
End Function
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Code:
Private Function CalcHash(Key) As Long
Dim i As Long, L As Long, B() As Byte
If CompareMode Then B = LCase$(Key) Else B = Key
L = 7919
For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next
CalcHash = L * B(0) Mod mHashTableSize
End Function
What is the name of algorithm?
The algorithm is very smart. The possibility of collision is relatively ZERO. JAVA string hash has famous Aa vs BB.
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
@Romeo91
According to your own tests, even with only 20,000 entries - cHashList is faster than Dictionary at Retrieval by Key.
Scripting.Dictionary-ItemByKey:
.046875 sec
cHshList-ItemByKey:
2.734E-02 --> .027344 sec
watch out for those decimals...
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Just as an FYI, since my code is mentioned in this thread ... my Collections wrapper was written for one primary reason (and just had some extra methods thrown in for good measure).
Hopefully, this code illustrates the motivation for my wrapper:
Code:
Dim c As New CollectionEx
'Dim c As New Collection ' To just use built-in VB6 collection.
Dim k1 As String
Dim k2 As String
k1 = ChrW$(&H1056) & ChrW$(&H1060) & ChrW$(&HBFD) & ChrW$(&H15F1) & ChrW$(&H31B1) & ChrW$(&HA19E)
k2 = ChrW$(&HD9A2) & ChrW$(&H2770) & ChrW$(&H4DE5) & ChrW$(&H1AF0) & ChrW$(&HA28B) & ChrW$(&H2F86)
c.Add "asdf", k1
c.Add "qwer", k2 ' <---- Error on built-in, error on Schmidt, NO error on Elroy.
I knew I was sacrificing some speed to accomplish this. But, for people who want to use binary keys, it gets the job done.
Regards,
Elroy
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
cHashlist doesn't suffer this problem, as it uses its own binary hash function.
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
I abuse the code for learning purpose.
If we change mHashTableSize from 8 to 9 in ReInit, then the Hash of string "KK" and "LL" both are 0:
H.ReInit
H.Add 1, "KK"
H.Add 2, "LL"
I am worried about the Hash value is not identical or has collision.
Code:
Public Sub ReInit(Optional ByVal ExpectedItemCount As Long = 5000)
mHashTableSize = 9
Do Until mHashTableSize * 5 > ExpectedItemCount: mHashTableSize = mHashTableSize * 2: Loop
ReDim HashTable(0 To mHashTableSize - 1)
Dim i As Long
For i = 0 To UBound(HashTable): ReDim HashTable(i).DataIndexes(0 To 0): Next
mDTUBound = 16: ReDim DataTable(0 To mDTUBound)
mCount = 0
End Sub
Private Function CalcHash(Key) As Long
Dim i As Long, L As Long, B() As Byte
If CompareMode Then B = LCase$(Key) Else B = Key
L = 7919
For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next
CalcHash = L * B(0) Mod mHashTableSize
End Function
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
Romeo91
If we take for example the value iteration 20000, it is not so simple - Scripting.Dictionary shows advantage over cHashList
The MS-Dictionary operates per default in Binary-StringComparemode,
whilst cHashList is operating in (case-insensitive) TextComparemode by default.
Maybe that was throwing things off a bit.
However - cHashList was just showcasing the concept of a simple HashTable-approach...
If you need more speed - please check-out the new performanceoptimized derivate (cHashD),
I've just put a Project-Zip-Attachment online at the bottom of the opener-post.
Olaf
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
DaveDavis
I abuse the code for learning purpose.
If we change mHashTableSize from 8 to 9 in ReInit, then the Hash of string "KK" and "LL" both are 0:
H.ReInit
H.Add 1, "KK"
H.Add 2, "LL"
I am worried about the Hash value is not identical or has collision.
As to your other question - the algorithm is "my own invention" (inspired a bit
by the Random-Key-Generator of the RC4-StreamCypher-algo).
And that two different String-Keys produce the same Hash-Value is entirely
normal - and expected in this scenario...
The Hash-Algorithm is only expected to deliver:
- always the same HashValue for a given Key and a given HashTable-Size
- thereby maintaining a relative good distribution over the HashTable-Size (the amount ot buckets).
The Zero you get for both "KK" and "LL" is "just another index" into the given HashTable.
The HashTable works well enough and quite fast, when later, in any given Hash-Bucket,
there's roughly 3-6 "data-index-pointers", after the expected amount of Key-Value-Pairs "has settled in".
Example for the new Zip-Project (cHashD) in the opener-post, which has a slightly changed algo
(now looping over WChars in a VB-SafeArray which was spanned over the String-Key):
E.g. when you expect about 20000 Key-Value-Pairs in your Hashing-Dictionary
(as is the default in the Test-Routine of the just mentioned new Demo),
the calculation for the HashTable-Size (which prefers Powers of two) -
comes up with 8192 (as the available amount of Buckets in the HashTable-Array).
That means, that the later on added 20000 Key-Value-Pairswould need to be distributed
(ideally) with 20000/8192 = 2.4 ---> meaning: "2-3 data-index-pointers per bucket".
Now, since we cannot expect the Hashing-Algo to provide such an ideal
distribution (especially not with such short String-Keys as used in the example),
we have to be satisfied with the following output:
(produced by the Friend Sub CheckHashDistribution() at the end of cHashD)
Code:
Distribution over a HashTable with 8192 slots:
Used-HashSlots: 8109
Min-Entries: 1
Max-Entries: 6
From an "ideally distributing" Hash-Algo, we would have expected:
Code:
Distribution over a HashTable with 8192 slots:
Used-HashSlots: 8192
Min-Entries: 2
Max-Entries: 3
But the results we achieved from the small and fast, but "less than ideal" algo are good enough,
to outperform the MS-Scripting.Dictionary. Between the endpoints of 1 and 6 as seen in the
Debug-Output, there'll be a gaussian normal-distribution IMO, so most of the entries *are*
in the expected 2-3-range...
So, the HashAlgo as seen in cHashList or (slightly changed in cHashD) is not thought
as a "secure Hash with at least 128Bit, which minimizes the probability of a collision to near zero"
(the Modulo-Op at the end the algo, which "folds" any calculated HashValue back onto
the available Index-Space for the HashTable-Array prevents that anyways, as does
the "only 32Bit long, randomizing carrier-variable" L).
The quality of such fast performing "Short-Hashes" will only show in the achieved
(hopefully equal) distribution across the available buckets (or Index-Slots).
I hope that clears a few things up...
Olaf
-
2 Attachment(s)
Re: Simple and fast, lightweight HashList-Class (no APIs)
BTW, The Microsoft Dictionary from the scrrun.dll uses next hashing algorithm for strings:
Code:
INT CalcHashFromStringItem (BSTR bstrKey) {
INT iResult = 0;
for (INT i = 0; i < SysStringLen(bstrKey); i++)
iResult = iResult * 17 + bstrKey[i];
iResult %= 1201;
return iResult;
}
I make an analog for testing it in VB6:
Code:
Option Explicit
Private Declare Function GetMem4 Lib "msvbvm60" ( _
ByRef src As Any, _
ByRef Dst As Any) As Long
Private Sub Form_Load()
Dim sKey As String
Dim pDic As Dictionary
Dim index As Long
Set pDic = New Dictionary
For index = 0 To 10
sKey = MakeRandomString()
Debug.Print pDic.HashVal(sKey), Dictionaty_HashVal(sKey)
Next
End Sub
Private Function MakeRandomString() As String
Dim l As Long: Dim i As Long
l = Int(Rnd * 10) + 5
Do While l
i = Int(Rnd * 62)
Select Case i
Case Is < 10
i = i + 48
Case Is < 36
i = i - 10 + 65
Case Else
i = i - 36 + 97
End Select
MakeRandomString = MakeRandomString & Chr$(i)
l = l - 1
Loop
End Function
Private Function Dictionaty_HashVal( _
ByRef sValue As String) As Long
Dim index As Long
Dim result As Long
#If COMPILED Then
' // We can't use unsigned arithmetic without overflow in IDE
For index = 0 To Len(sValue) - 1
result = result * 17 + AscW(Mid$(sValue, index + 1, 1))
Next
result = result Mod 1201
#Else
' // In order to bypass overflow error, we use currency 64-bit numbers
Dim cResult As Currency
For index = 0 To Len(sValue) - 1
cResult = cResult * 17 + AscW(Mid$(sValue, index + 1, 1)) / 10000@
' // Zero high dword
If cResult > 429496.7295@ Then
GetMem4 0&, ByVal VarPtr(cResult) + 4
End If
Next
cResult = cResult * 10000@
result = cResult - Int(cResult / 1201@) * 1201@
#End If
Dictionaty_HashVal = result
End Function
Also i attached the project that allows make some tricks with MS Dictionary (sorting/get key/etc.). This knowledge had been obtained during reversing scrrun.dll code on Windows 7 x64 therefore it can distinct on the different platforms.
Attachment 140753
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
You are the genius, The Trick.
I didn't see VB's implementation of C# string GetHashCode. From Reflector we can see the code, but hard part is that VB's Long is not equal C#'s Int. Probably we will get Over Flow error during LeftShift or RightShift. If we have to use BigInteger, It is harder.
Code:
public override unsafe int GetHashCode()
{
if (HashHelpers.s_UseRandomizedStringHashing)
{
return InternalMarvin32HashString(this, this.Length, 0L);
}
fixed (char* str = ((char*) this))
{
char* chPtr = str;
if (chPtr != null)
{
chPtr += RuntimeHelpers.OffsetToStringData;
}
int num = 0x15051505;
int num2 = num;
int* numPtr = (int*) chPtr;
int length = this.Length;
while (length > 2)
{
num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
numPtr += 2;
length -= 4;
}
if (length > 0)
{
num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
}
return (num + (num2 * 0x5d588b65));
}
}
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
DaveDavis
You are the tenant, The Trick.
What do you mean?
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
The trick
What do you mean?
:p:p
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
I have been wondering, although it's a VB Forum, that if this whole thing is implemented in C++, will it be much faster? What's the difference if it's implemented in C++ and used by VB?
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
Resurrected
I have been wondering, although it's a VB Forum, that if this whole thing is implemented in C++, will it be much faster? What's the difference if it's implemented in C++ and used by VB?
Not really (not by much).
See, the MS-Scripting.Dictionary *is* already implmented in C/C++ (by people who knew their stuff) -
and still the VB-Implementation (along with the HashFunction I've used) runs circles around it,
when larger amounts of Key/Value-Pairs are used within cHashD.
The VB6-native compiler is based on VC++6 and produces decent enough binaries speedwise.
What it then comes down to (as always), is the choosen algorithm (and a bit of experience,
to avoid unnecessary memory-allocations)...
Olaf
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Resurrected, you can test my hash table. It works faster than Dictionary on my computer. This class has been written on VB6.
-
1 Attachment(s)
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
The trick
Resurrected, you can test my
hash table. It works faster than
Dictionary on my computer. This class has been written on VB6.
Anatoli, this thread here is to discuss the Demo-Code for:
- cHashList (roughly compatible to the VB-Collection)
- cHashD (roughly compatible to the Scripting.Dictionary)
I don't have anything against pointing out better alternatives in CodeBank-Threads,
but if someone does, there should be a few good reasons, as e.g. better performance,
or less and easier to understand code.
Your HashTable-Class-Implementation is not really an alternative to cHashD in either regard because:
- it is much slower in most tests
- and it contains much more code to read, maintain and understand (including assembler-blocks)
Performance-Results, using your slightly adapted Test-Project: Attachment 140839
Using 100000 Key/Value-pairs (as in your Demo) cHashD is:
- about factor 4 as fast whilst adding these Key/Value-pairs
- about factor 6 as fast with Keyed-Item access
- about factor 5 as fast in an enumeration of Key/Value-pairs
- about factor 1.5 slower in the For Each Enumeration of Keys
So, besides the last point, there's only huge gains when using cHashD -
(instead of the MS-Dictionary or your HashList-implementation)
and all of them in areas which are really important for such a Container.
Olaf
-
1 Attachment(s)
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
Schmidt
Anatoli, this thread here is to discuss the Demo-Code for:
- cHashList (roughly compatible to the VB-Collection)
- cHashD (roughly compatible to the Scripting.Dictionary)
Olaf, my apologies but i responded to Resurrected. I showed him that full analog of MS Dictionary written on VB6 works faster than origin written on C++. I showed him the FULL analog of Dictionary as response to:
Quote:
I have been wondering, although it's a VB Forum, that if this whole thing is implemented in C++, will it be much faster? What's the difference if it's implemented in C++ and used by VB?
I don't have something against your class, i answered to Resurrected post. I was doing my class as replacement for MS Dictionary.
As test of performance:
Your class is far NO analog of Dictionary:
- Your class have only string keys, dictionary and mine variant. And all tests works with string keys and String variables therefore it faster than mine. My class can use any keys 1, 1.0434@, &H100, Me, Null, Empty, etc like Dictionary, your doesn't. Just look at my CalcHash procedure.
- If i compile project and tries to access to your cHashD through For Each it get crash (bug?), therefore i can't check performance through For each.
Attachment 140841 - Your code can cause potential memory leak. For example:
Code:
Dim z As Long
Dim p As Long
Dim l As Long
For z = 0 To 100
For p = 0 To UBound(HD.Items)
l = l + Len(HD.ItemByIndex(p))
Next
Next
MsgBox l
Compile and look memory usage. Same with Keys properties. - Also please make correct test in For each ;):
Code:
For Each dat In HD.Items
out = out + Len(dat)
Next
- I can't add Object item. This is probably bug you should use either VariantCopy or make checking IsObject.
- I've not found Remove method. How can i remove an item from your hash table by a key/index?
- In the most cases i can just replace Dictionary without changing any source code by my hash table. Your code requires to do changes in source code.
Again, i don't have something against your class. I just wanted to answer to the Resurrected's post and as the example i suggested my hash table. I didn't tell that my class is better than you or something else. Our classes has different functional your class is quite small and understandable, mine is more functional but bigger and slower, but as an analog of Dictionary it's better (this class was creating as replacement of Dictionary):
Quote:
Represent a standalone class implements a hash table, which in many cases can be a substitute for the dictionary (Dictionary) of Scripting runtime.
Regarding,
Krivous Anatoly (The trick).
-
1 Attachment(s)
Re: Simple and fast, lightweight HashList-Class (no APIs)
Just a few short comments to the above:
Thanks for pointing out the Bug in the Add-method (which is now accepting Objects as Items, too).
As for "the crash" ... I tried my best to reproduce it (unsuccessfully), but changed the return-values of .Items() and Keys()
to a normal Variant-Container now anyways (in my opinion there's a good chance, that the crash has more to do with your
Assembly-stuff, sitting in the same project).
As for your HashTableClass (since you made it a topic here):
- no, it is not a full replacement for the MS-Dictionary (for that you will have to change your Item-Prop-Implementation)
- and no, it is only partly an example to serve as "VB-Code being faster than C" because it is quite an amount slower than the MS-Dictionary in Binary-Comparemode.
- no, my For Each (over the Keys) did the same as yours (since your Class was saying: hsh.EnumMode = ENUM_BY_KEY = True)
As for the whole topic of "being compatible to the MS-Dictionary":
- neither cHashD nor clsTrickHashTable are fully compatible (the difference is that you claim that to be the case, whereas I did not. I said "roughly compatible")
- my aim is, to cover about 90% of the use-cases (and no, the .Remove-Single-Item call is not among those)
- with a bit of additional work, you could make yours probably 99% compatible - maybe even 100%
- I will not do that for cHashD
Nevertheless, one can throw all three Class-Types into a real-world project - and use them there without
changing the code (much). In most cases only the ClassVariable-Definitions have to be changed (type-wise).
Here's an example, which makes good use of quite a lot of Scripting.Dictionary methods - and
both (cHashD and clsTrickHashTable) can be used without changes in that WordCounting-Scenario:
Here's the test-results for scanning a "work of Shakespeare-Text-File": http://www.gutenberg.org/files/100/
Performance in Binary-CompareMode (case-sensitive):
- 465msec (cHashD)
- 755msec (Dictionary)
- 1201msec (clsTrickHashTable)
Performance in Text-CompareMode (case-insensitive):
- 647msec (cHashD)
- 1277msec (clsTrickHashTable)
- 3341msec (Dictionary)
Your class is not bad (especially in TextCompareMode) - but as said, not always faster than the MS-Dictionary.
Here's the WordScan-Test-Code: Attachment 140903
And here a ScreenShot:
http://vbRichClient.com/Downloads/FastWordScan.png
Thanks again for testing cHashD, pointing out that Object-Add-Bug (the Zip-Download in the Opener-Post is now updated).
Olaf
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
As for "the crash" ... I tried my best to reproduce it (unsuccessfully), but changed the return-values of .Items() and Keys()
to a normal Variant-Container now anyways (in my opinion there's a good chance, that the crash has more to do with your
Assembly-stuff, sitting in the same project).
https://www.youtube.com/watch?v=BrGiUvb4GPw
Quote:
- no, it is not a full replacement for the MS-Dictionary (for that you will have to change your Item-Prop-Implementation)
What do you mean?
Quote:
- and no, it is only partly an example to serve as "VB-Code being faster than C" because it is quite an amount slower than the MS-Dictionary in Binary-Comparemode.
I can bet with you. This is another test:
Code:
Option Explicit
Private Declare Function QueryPerformanceFrequency Lib "kernel32" ( _
ByRef lpFrequency As Any) As Long
Private Declare Function QueryPerformanceCounter Lib "kernel32" ( _
ByRef lpPerformanceCount As Any) As Long
Dim d As Dictionary
Dim t As clsTrickHashTable
Private Sub Form_Load()
Dim index As Long
Dim cFreq As Currency
Dim cVal As Currency
Dim time As Currency
Dim tDic As Currency
Dim tTrick As Currency
Dim sKeys() As String
MakeRandomKeys sKeys()
QueryPerformanceFrequency cFreq
Set d = New Dictionary
Set t = New clsTrickHashTable
' // Check binary compare
t.CompareMode = BinaryCompare
d.CompareMode = BinaryCompare
QueryPerformanceCounter time
For index = 0 To UBound(sKeys)
d.Add sKeys(index) & index, Me
Next
QueryPerformanceCounter tDic: tDic = tDic - time
QueryPerformanceCounter time
For index = 0 To UBound(sKeys)
t.Add sKeys(index) & index, Me
Next
QueryPerformanceCounter tTrick: tTrick = tTrick - time
MsgBox "Dictionary: " & (tDic / cFreq) * 1000 & " ms." & vbNewLine & _
"TrickHashTable: " & (tTrick / cFreq) * 1000 & " ms." & vbNewLine
End Sub
Private Sub MakeRandomKeys( _
ByRef sKeys() As String)
Dim index As Long
Dim lLen As Long
Dim char As Integer
ReDim sKeys(500000)
For index = 0 To UBound(sKeys)
lLen = Int(Rnd * 50) + 10
Do While lLen
char = Int(Rnd * 62)
Select Case char
Case Is < 10: char = char + 48
Case Is < 36: char = char + 55
Case Else: char = char + 61
End Select
sKeys(index) = sKeys(index) & Chr$(char)
lLen = lLen - 1
Loop
Next
End Sub
On my computer Windows 7 x64 SP1 AMD Athlon(tm) II X2 245 2.90 Ghz 6GB RAM:
Quote:
---------------------------
Project1
---------------------------
Dictionary: 24905,1339625567 ms.
TrickHashTable: 15037,1265761431 ms.
---------------------------
ОК
---------------------------
https://www.youtube.com/watch?v=xL-9Ow_fcWg
Overall (mine and yours tests) both Dictionary and TrickHashTable work identically with Binary compare mode (however i wrote so in the original topic):
Quote:
Run fast enough on my computer about as well (even a bit faster) as a dictionary with binary comparison...
With Text compare mode my class works faster than Dictionary.
Quote:
- no, my For Each (over the Keys) did the same as yours (since your Class was saying: hsh.EnumMode = ENUM_BY_KEY = True)
No, you enumerated String keys. Mine Variant, ie. any Object, Boolean, String, etc. simultaneously.
Quote:
the difference is that you claim that to be the case, whereas I did not. I said "roughly compatible"
Then why did you compare our classes? Maybe i'm wrong but you didn't say why my class isn't FULL compatible with MS Dictionary. Please tell me then i make it full compatible.
Quote:
- my aim is, to cover about 90% of the use-cases (and no, the .Remove-Single-Item call is not among those)
The controversial statement, my class is replacement of MS Dictionary. Dictionary/Collection contains Remove method.
Quote:
- with a bit of additional work, you could make yours probably 99% compatible - maybe even 100%
Please tell me what should i add in order to make full compatibility.
Quote:
Nevertheless, one can throw all three Class-Types into a real-world project - and use them there without
changing the code (much). In most cases only the ClassVariable-Definitions have to be changed (type-wise).
Okay, if you compare our classes and MS Dictionary you should suggest alternative. Your class isn't alternative. You can add 100 arrays with different keys types, but it won't be replacement.
Quote:
since you made it a topic here
You became to compare our classes.
About your test. Your class don't support different keys types (like TrickHashTable, Dictionary). For example if i want to have associate an object with other object i can't do it through your class. Of course you can say "my class shouldn't support it" - okay, then don't compare them:
Quote:
Your HashTable-Class-Implementation is not really an alternative to cHashD in either regard
I can also write same.
Regarding,
Krivous Anatoly (The trick)
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
@trick - Olaf alluded to a problem with your hash table re: 100% compatibility with the scripting Dictionary object when he said: "(for that you will have to change your Item-Prop-Implementation)"
I think he was referring to the fact that the Dictionary Item property Let will perform an Add operation if the item doesn't already exist, whereas your class will raise an Error 5. See this test:
Code:
Sub Test()
Dim x As New Scripting.Dictionary
x.Item("Test") = "Hello"
Debug.Print x.Item("Test")
Dim y As New clsTrickHashTable
y.Item("Test") = "Goodbye"
Debug.Print x.Item("Test")
End Sub
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
jpbro
@trick - Olaf alluded to a problem with your hash table re: 100% compatibility with the scripting Dictionary object when he said: "(for that you will have to change your Item-Prop-Implementation)"
I think he was referring to the fact that the Dictionary Item property Let will perform an Add operation if the item doesn't already exist, whereas your class will raise an Error 5. See this test:
Code:
Sub Test()
Dim x As New Scripting.Dictionary
x.Item("Test") = "Hello"
Debug.Print x.Item("Test")
Dim y As New clsTrickHashTable
y.Item("Test") = "Goodbye"
Debug.Print x.Item("Test")
End Sub
Thank you very much. I'll fix it.
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
The trick
I can bet with you. This is another test:
Code:
Option Explicit
Private Declare Function QueryPerformanceFrequency Lib "kernel32" ( _
ByRef lpFrequency As Any) As Long
Private Declare Function QueryPerformanceCounter Lib "kernel32" ( _
ByRef lpPerformanceCount As Any) As Long
Dim d As Dictionary
Dim t As clsTrickHashTable
Private Sub Form_Load()
Dim index As Long
Dim cFreq As Currency
Dim cVal As Currency
Dim time As Currency
Dim tDic As Currency
Dim tTrick As Currency
Dim sKeys() As String
MakeRandomKeys sKeys()
QueryPerformanceFrequency cFreq
Set d = New Dictionary
Set t = New clsTrickHashTable
' // Check binary compare
t.CompareMode = BinaryCompare
d.CompareMode = BinaryCompare
QueryPerformanceCounter time
For index = 0 To UBound(sKeys)
d.Add sKeys(index) & index, Me
Next
QueryPerformanceCounter tDic: tDic = tDic - time
QueryPerformanceCounter time
For index = 0 To UBound(sKeys)
t.Add sKeys(index) & index, Me
Next
QueryPerformanceCounter tTrick: tTrick = tTrick - time
MsgBox "Dictionary: " & (tDic / cFreq) * 1000 & " ms." & vbNewLine & _
"TrickHashTable: " & (tTrick / cFreq) * 1000 & " ms." & vbNewLine
End Sub
Private Sub MakeRandomKeys( _
ByRef sKeys() As String)
Dim index As Long
Dim lLen As Long
Dim char As Integer
ReDim sKeys(500000)
For index = 0 To UBound(sKeys)
lLen = Int(Rnd * 50) + 10
Do While lLen
char = Int(Rnd * 62)
Select Case char
Case Is < 10: char = char + 48
Case Is < 36: char = char + 55
Case Else: char = char + 61
End Select
sKeys(index) = sKeys(index) & Chr$(char)
lLen = lLen - 1
Loop
Next
End Sub
Nope, no success here regarding "your bet", since on my Win8.1-system
I get (native compiled of course) the following results with your above code-snippet:
http://vbRichClient.com/Downloads/TestDictionaries.png
That the Dictionary performs similarily to your Class might be due to a better Mem-Allocator
in Win8.1 (compared to the one in your Win7) - but more likely it is the CPU-differences (AMD vs. Intel)
which is favouring one or the other implementation (and theres by far more Intel-based machines
out there these days than AMD-powered ones).
Also note, that cHashD is more than 10 times faster than either of the other Dictionaries
in such a "mass amount test" (like your 500,000 entries) of Key-Value-Pairs.
Quote:
Originally Posted by
The trick
Okay, if you compare our classes and MS Dictionary you should suggest alternative. Your class isn't alternative.
I thought I already gave a real-world-example (the Word-Counting one), where each of our Classes
can be used as a simple "drop-in" replacement to the MS-Dictionary (without changing the CodeBase,
by just changing the Class-Types).
As I already said, I don't aim to be fully compatible to the MS-Dictionary - but cHashD is "compatible enough"
to be used as a direct replacments in 90% of the real-world-scenarios out there (that's what I meant with "use-cases").
These real-world-scenarios involve:
- first and foremost: fast adding and retrieving of Variant-Items per StringKey
- to a lesser extend: a fast Exists-Method (also mostly used with StringKeys)
- the ability to exchange an Item behind an existing Key to a different one
- and occasionally the usage of the RemoveAll-Method
- fast enumeration of the complete Key-Value-Pair by accessing them per Index also comes up quite often
The last one is not available in the MS-Dictionary or in your Class - but it is a real-world-requirement
which is very handy to have in many situations.
And in all my decades of experience with real-world applications, I very rarely needed the Remove-Method
on such Key/Value-pair containers (your mileage may vary).
Quote:
Originally Posted by
The trick
You became to compare our classes.
Well, I didn't intend to - not here in this CodeBank-Thread - it was you who brought clsTrickHashTable into the discussion.
Quote:
Originally Posted by
The trick
Your class don't support different keys types (like TrickHashTable, Dictionary).
It does support Integer Keys (Byte, Integer, Long up to Currency) and "Float-Keys" (Singles, Dates and Doubes) now.
In these modes cHashD will act even faster than with String-Keys.
What it doesn't support is, to use these different Key-Types in a "mixed way" - (would appreciate though,
when you could give me a real-world-example, where you used such "mixed-Keys-capabilities" to great success).
Quote:
Originally Posted by
The trick
For example if i want to have associate an object with other object i can't do it through your class. Of course you can say "my class shouldn't support it" - okay, then don't compare them:
Well, it was easy enough to add that feature with a dozen lines of code (have now updated
the HashD.zip-Download at the end of the Top-Posting in this thread accordingly)...
Besides having it added now (just for kicks) - I'd like to see a real-world-example from you, where such Object-associations per Dictionary would be "nice to have"...
Usually when I need that, I simply add a Property:
Code:
Public AssociatedObject As Object
...into the Class in question - later assign the "Side-Object" to it and be done with the task.
As for your: "don't compare our Classes"... why not, when they both are useful in scenarios,
where the VB6-community formerly used the MS-Scripting.Dictionary.
cHashD is extremely fast (especially with a larger amount of Key/Value-pairs, where the
over-all-timings are often in the range of seconds) - scenarios similar to my WordCounting-example
are a quite common use-case.
In Use-Cases where e.g. the Remove-Method becomes paramount, your Class seems to be a good alternative, too.
Olaf
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
If we maintain Hash value in sorted Asc Order, whatever inserting or adding or changing a Key, then FindIndex or Key removing is faster by using Binary Search.
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
In tests, it looks good, it cHashD better than other analogs in speed. But in the real application I see a noticeable performance drop.
I use methods add, item, exist, removeall, type of comparison-vbbinarycompare.
My program works with a lot of text information and uses the dictionary to find duplicate rows (exist method).
My app - https://github.com/ADIAProject/DIA
The program need the driver packages (DP*.7z) - can be downloaded:
http://driveroff.net/SamDrivers_16.8.torrent
http://driverpacks.net/downloads
-
1 Attachment(s)
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
Romeo91
In tests, it looks good, it cHashD better than other analogs in speed. But in the real application I see a noticeable performance drop.
I use methods add, item, exist, removeall, type of comparison-vbbinarycompare.
My program works with a lot of text information and uses the dictionary to find duplicate rows (exist method).
First, to get the best speed out of VB6-code, you should compile such performance-critical stuff
natively -> with all extended Options.
In your project (in case you "hung-in" cHashD as is) this is not the case.
If you want to retain your current Compile-Settings as they are in your Main-Project,
then compile cHashD (with all extended native Compile-Options) into an ActiveX-Dll.
That said, for your special case you can also fine-tune cHashD "as you need it"
(it's codebase is simple and small enough for that)...
Also, I've seen a lot of lines in your code like:
Code:
If Not Dictionary.Exists(SomeKey) Then
Dictionary.Add SomeKey, SomeItem
End If
That means, that you already ensure uniqueness in said Dictionaries on your own ("by hand") -
and thus you could instruct cHashD, to leave out its Test for Uniqueness in the Add-Method
(either by setting the Public Prop UniqueKeys to False after Instantiation, or already ensuring
mUniqueKeys to False in Class_Initialize - as the new Default).
Also, in case you use only String-Keys, why not changing all Key-Parameters to Strings instead of
Variants - that will give it another little performance-kick.
What's also important is, that you can give hints to cHashD about the (roughly) expected
number of entries - in your case these seem to be "number of Text-Lines in *.inf-Files" or something...
What your performance-critical code also needs, is a bit more "structuring" IMO ...
Long Procedures with about 1000 lines of code (like your DevParserByRegExp) are not
really nice to read, nor easy maintainable.
Not sure, whether the following example is useful for you, but it calculates the
Line-differences of two "modern" (UTF16-LE) *.inf Files, as they exist these days e.g.
in the MS-SxS Folder.
The adapted cHashD-version in the following Demo-Zip: Attachment 141021
performs about 50% faster than the MS-Dictionary in this test (finding the LineDifferences
of two blue-tooth related, nearly identical Inf-Files with about 32K WChars each.
The time to read all the lines of these two files into two Dictionaries first:
- cHashD needs about 2msec
- MS-Dictionary about 3.7msec
To find the Line-differences in these two Dictionaries:
- cHashD needs about 0.4msec
- MS-Dictionary about 0.6msec
(the above measurement was done in a 100x repetition-loop, to allow for easier timing-comparison).
As said, not sure whether such a Line-Delta-Scan is useful in your scenario,
but if your App is thought to manage (or switch) between (nearly) identical Driver-Files,
then I can imagine that this could be of help (comparing only the handful of different lines
in two Files with a "deeper analysis" should be faster, than performing the deep analysis on
the Files as a whole).
HTH
Olaf
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Olaf:
Can u help make KeyByIndex property to be editable?
I have and these codes to cHashD, but can not make it.
Code:
Private Declare Function VariantCopyLng Lib "oleaut32" Alias "VariantCopy" (Dst As Long, Src As Long) As Long
Public Property Let KeyByIndex(ByVal IndexZeroBased As Long, RHS)
If IndexZeroBased < 0 Or IndexZeroBased >= mCount Then Err.Raise 9
Select Case mKeyMode
Case eUseStrings:
RHS = CStr(RHS)
VariantCopyLng ByVal VarPtr(mKeys(IndexZeroBased)), ByVal VarPtr(RHS)
Case eUseIntegers:
VariantCopyLng ByVal VarPtr(mCurs(IndexZeroBased)), ByVal VarPtr(RHS)
Case eUseDoubles:
VariantCopyLng ByVal VarPtr(mDbls(IndexZeroBased)), ByVal VarPtr(RHS)
End Select
End Property
Public Property Set KeyByIndex(ByVal IndexZeroBased As Long, RHS)
If IndexZeroBased < 0 Or IndexZeroBased >= mCount Then Err.Raise 9
KeyByIndex(IndexZeroBased) = RHS
End Property
Code:
Sub testll()
Dim c As New cHashD
c.Add 1, 11
c.KeyByIndex(0) = "2"
Debug.Print c.KeyByIndex(0) 'the key is still "1"
End Sub
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
loquat
Can u help make KeyByIndex property to be editable?
Why do you need that (I ask, because the need for something like that was near zero in the scenarios I've encountered over the last two decades).
I mean, there is some scenarios where it is needed, but these I usually cover with DBs and SQL.
Quote:
Originally Posted by
loquat
I have and these codes to cHashD, but can not make it.
Code:
Private Declare Function VariantCopyLng Lib "oleaut32" Alias "VariantCopy" (Dst As Long, Src As Long) As Long
Public Property Let KeyByIndex(ByVal IndexZeroBased As Long, RHS)
If IndexZeroBased < 0 Or IndexZeroBased >= mCount Then Err.Raise 9
Select Case mKeyMode
Case eUseStrings:
RHS = CStr(RHS)
VariantCopyLng ByVal VarPtr(mKeys(IndexZeroBased)), ByVal VarPtr(RHS)
Case eUseIntegers:
VariantCopyLng ByVal VarPtr(mCurs(IndexZeroBased)), ByVal VarPtr(RHS)
Case eUseDoubles:
VariantCopyLng ByVal VarPtr(mDbls(IndexZeroBased)), ByVal VarPtr(RHS)
End Select
End Property
There's no need to overcomplicate things with API-calls in this case -
simple Value-assignments would be enough, like: mKeys(IndexZeroBased) = CStr(RHS)
But with something like that, you're only "halfway-there", because the old Key you have just overriden,
will still link to the old Index-Position of Keys and Value (when doing a Hash-based search), -
and for the new Key no such Hash-Link exists yet.
So in general, such requirements are solved, by removing the old Pair (OldKey and OldItemValue) completely from the List -
followed by adding the new Pair (NewKey/OldItemValue) over the normal Add-mechanism.
Olaf
-
1 Attachment(s)
Re: Simple and fast, lightweight HashList-Class (no APIs)
Update: The latest version is contained in the Zip below, with the following benefits:
- better performance
- RemoveByIndex and Remove (by Key) is now included
- variable Keys (incl. Objects), which can also be mixed
Here's the result of a performance-comparison to the MS-Scripting.Dictionary (native compiled, all options checked):
http://vbRichClient.com/Downloads/cHashDWithRemove.png
Ok, here the Zip with the new version (including the little PerformanceTest).
Attachment 177177
Olaf
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
Schmidt
Update: The latest version is contained in the Zip below, with the following benefits:
- better performance
- RemoveByIndex and Remove (by Key) is now included
- variable Keys (incl. Objects), which can also be mixed
Here's the result of a performance-comparison to the MS-Scripting.Dictionary (native compiled, all options checked):
http://www.vbforums.com/images/ieimages/2020/05/1.png
Ok, here the Zip with the new version (including the little PerformanceTest).
Attachment 177177
Olaf
I just tried to copy this post code and try to build it to DLL and try it in VBA environment, the result of hash table runs slower than Ms's Dictionary !!!
Also on VB6 try with: Const Count As Long = 1048500
Code:
Sub HashTable_Dictionary()
Const Count As Long = 1048500
Dim SD As Scripting.Dictionary
Dim i As Long, T@
Set SD = New Scripting.Dictionary
Dim HD As cHashTable
Set HD = New cHashTable
T = msTimer
For i = 1 To Count
SD.Add "khdkjhjhkjhkhkhkhkjhk " & i, i
Next
Debug.Print "Dictionary - AddItems", msTimer - T, SD.Count
T = msTimer
HD.ReInit Count
For i = 1 To Count
HD.Add "khdkjhjhkjhkhkhkhkjhk " & i, i
Next
Debug.Print "HashTable - AddItems", msTimer - T, HD.Count
End Sub
Attachment 180419
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
PhuongNam
I just tried to copy this post code and try to build it to DLL and try it in VBA environment, the result of hash table runs slower than Ms's Dictionary !!!
Wild guess: you missed this text in parentheses in OP -- "(native compiled, all options checked)"
*Everything* in the IDE/p-code is slow because array index bounds checking cannot be switched off. Only *native* compiled executable has the benefit of accessing entries in a buffer without the overhead.
cheers,
</wqw>
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Hi ~ Schmidt
why when string array , it error ?
dim w() As string, saW As SAFEARRAY1D
BindArray w, VarPtr(saW) ' ----> VB6 show it is error !
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
Schmidt
Not much to add to the Threads Title...
the implementation of cHashList comes in only about 100 lines of code, and it can
be used as a VB.Collection-compatible replacement with much better performance. ... .. .
...
..
.
Olaf
Dear Schmidt,
Such sleek code (as usual) and such amazing speed (in comparison to Dictionary, etc.). A big wow, as usual. Thanks a TON.
Well, I tried out your code (just the adding routine alone, commenting out the rest, incl. the code related to the scripting Dictionary) and was curious to know as to what is the largest Count your hash table would support. In my system with 16GB RAM, I was able to add 450,83,000 records when I just added a long value as key as follows. More than 450,83,000 records, it resulted in error (Run-time error '7': Out of memory).
Code:
For i = 1 To Count: HD.Add i, i: Next
When I added a two-characters string as follows, I was able to add around 245,00,000 records. More than that, it resulted in error (Run-time error '7': Out of memory).
Code:
For i = 1 To Count: HD.Add "S " & i, i: Next
So, obviously, more the length of the key, less the no. of records I was able to add. Perhaps in a system with only 2GB memory, may be I will be able to add still further less records only for the same above codes. So, in real time, when my app is adding some records in user-end, how will my code be able to know that it might result in 'Out of memory' at some point of time and hence can't keep adding more records (and hence have to inform the user)? As such, as of now, I don't have the need to add 450,00,000+ records but I am just curious to know as to how to find out in real time whether adding any further records rmay result in 'out of memory' situation and inform the user accordingly OR take some other action.
Ideally, upto what value of Count can I safely use your 'hash table code' in any system (assuming 2GB is the lowest RAM I might encounter in an user's system)?
Kind regards.
EDIT-1:
I am using your latest code only (cHashDWithRemove)
EDIT-2:
My above observations were with the compiled code only; not in the IDE. By the way, I sometimes observed that for large values of Count, even though .Add (adding records) worked correctly uptil Count, if I give HD.Clear later (say, after printing HD.Count), it caused "Run-time error '7': Out of memory"
Kind regards.
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
Schmidt
Update: The latest version is contained in the Zip below, with the following benefits:
- better performance
- RemoveByIndex and Remove (by Key) is now included
- variable Keys (incl. Objects), which can also be mixed
... .. .
Ok, here the Zip with the new version (including the little PerformanceTest).
cHashDWithRemove.zip
Olaf
Dear Schmidt,
I tried the following code, wherein I have added duplicate keys and also tried to fetch all the items for a particular duplicate key.
Code:
HD.Add 1, 1
HD.Add 2, 2
HD.Add 1, 2
HD.Add 1, 3
HD.Add 1, 4
HD.Add 3, 3
HD.Add 1, 5
For i = 6 To 50000: HD.Add i, i: Next
Dim v() As Variant
v = HD.ItemsAll(1)
Debug.Print UBound(v)
For "ItemsAll" function, I wrote my own code as follows as I could not find a way with your existing code to get that info. Sorry if a way exists and I did not know it. Kindly let me know that way, if it exists.
The following code works correctly for the above case but I am not sure whether it will work perfectly for all cases. Also, I think there must be a more optimal way to do it. Can you please provide me the optimal code for "ItemsAll", when your time permits, please.
I did not know how to use IndexByKey to find out the last index of a non-unique key. IndexByKey returns the first found index of a duplicate key only. Ideally, FindFirst, FindNext, and FindLast for any duplicate key would be fine. Can you provide me with the optimal codes for these functions as I don't know how to write them in the perfect/optimal way. Thanks in advance.
Code:
Public Function ItemsAll(key) As Variant() 'hand-out all the items for a non-unique Key
If mCount = 0 Then ItemsAll = Array(): Exit Function
Dim i As Long, j As Long, v()
Dim itemsCnt As Long
Dim idx As Long
idx = IndexByKey(key)
ReDim v(0 To mCount - (idx + 1))
itemsCnt = 0
j = idx
For i = idx To mCount - 1
Do While mValues(j) = NoEntry: j = j + 1: Loop
If KeyByIndex(i) = key Then
VariantCopy v(itemsCnt), ByVal VarPtr(mValues(j))
itemsCnt = itemsCnt + 1
End If
j = j + 1
Next
ReDim Preserve v(0 To itemsCnt - 1) As Variant
ItemsAll = v
End Function
Kind regards.
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Perhaps a cHashD list of RC6.cArrayLists would be more appropriate? You use each cHashD key only once, and fill it with a cArrayList of values. Something like this:
Code:
Option Explicit
Sub Test()
Dim lo_Lists As cHashD
Dim lo_List As RC6.cArrayList
Dim l_Key As String
Dim ii As Long
Randomize
Set lo_Lists = New cHashD
For ii = 1 To 5000
l_Key = Int(Rnd * 10) + 1
If lo_Lists.Exists(l_Key) Then
Set lo_List = lo_Lists(l_Key)
Else
Set lo_List = New_c.ArrayList(vbLong)
lo_Lists.Add l_Key, lo_List
End If
lo_List.Add Int(Rnd * 100) + 1
Next ii
l_Key = Int(Rnd * 10) + 1
Set lo_List = ItemsAll(lo_Lists, l_Key)
For ii = 0 To lo_List.Count - 1
Debug.Print "Item #" & ii + 1 & " for key " & l_Key & ": " & lo_List(ii)
Next ii
End Sub
Private Function ItemsAll(po_HashList As cHashD, ByVal p_Key As String) As RC6.cArrayList
If po_HashList.Exists(p_Key) Then
Set ItemsAll = po_HashList(p_Key)
Else
Err.Raise vbObjectError, , "Key not found."
End If
End Function
If you don't want the RC6 reference, you could always use a regular VB6 array, it just take a bit more work to manage the ReDimming efficiently.
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
jpbro
Perhaps a cHashD list of RC6.cArrayLists would be more appropriate? You use each cHashD key only once, and fill it with a cArrayList of values. Something like this:
... .. .
..
.
If you don't want the RC6 reference, you could always use a regular VB6 array, it just take a bit more work to manage the ReDimming efficiently.
First of all, thank you sooo much for taking time to answer my query and that too in such a helpful manner. Thank you sooo much.
Actually, in a project I commenced recently, I was/am handling it the way you have suggested only. I am reading a file once and holding its data in an array of structures. The structure has many fields and one of the date fields has multiple values, in 1000s, but for some keys alone. In other words, this date field can have a minimum of 1 value to any number of indeterminate values (which could be in 1000s). As written earlier, I am using variable array only for this field. And, as you said, Redim was an overhead since I had to Redim this array field (increasing by 1) as and when a new value came for the same key.
A few days after working with the aforesaid structure, I started learning about Collections. The need to use Collections had not arisen earlier and hence I did not know anything about it, as such. Once I started learning about Collections, I came to know about Dictionaries too and that they were faster than Collections. And, I read on the internet (from multiple sources) that the best and ideal way to handle multiple values for a single key is to have them as separate key-value pairs. This was with respect to database tables though. But, I thought perhaps the same would be ideal for Collections/Dictionaries too. And, while searching for Collection/Dictionary alternatives, I came upon Olaf's CHashD class which was super super fast when compared to Dictionaries. That is when I tried it and I posted my query with respect to ItemsAll function.
Well, if you feel even with 1000s of multiple values for a single key, the method which I was already employing and which is the one which you have also recommended is the best option, then I would go with it. I am not an expert in these areas. So, kindly give your confirmed opinion on whether this is the best way. Also, let me know what is the best way to redim. Since the count of the multiple values are not known beforehand, I am redimming every time a new value comes in for the same key. Is there a better way? I wish to mention that the number of keys is also in the 1000s.
Note:
Regarding RC6, I know about it well but because of lack of time, I am unable to explore, learn and integrate it with my app yet. And, a database table will not suit my case since I have to make some queries on user input instantly (as the user keeps typing) and display the muti-layer results real-time (on a microsecond level). Also, in addition to collections, I am not in a position to spend time to learn additionally about databases also (their integration, querying them, testing them, speed comparison, etc. etc.). So, a super-fast collection like Olaf's cHashTable will be the ideal one.
Very kind regards.
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
softv
I am reading a file once and holding its data in an array of structures.
The structure has many fields and one of the date fields has multiple values, in 1000s,
Is there any chance we could see the definition of this "structure" (the UDT-definition)?
(along with an info, which field of the structure is the "key-field")?
Olaf
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
I tried 3 different approaches (compiled with all advanced optimizations enabled except Assume No Aliasing):
1. Using a cHashD object of cHashD object
2. Using a cHashD object of RC6.cArrayList object
3. Using a cHashD object of Variant Arrays
Here's the code:
Code:
Option Explicit
Sub Test(Optional po_Form As Form)
Dim lo_Lists As cHashD
Dim lo_List As cHashD
Dim l_Key As String
Dim ii As Long
Randomize
New_c.Timing True
Set lo_Lists = New cHashD
For ii = 1 To 5000
l_Key = "KEY_" & Int(Rnd * 10) + 1
If lo_Lists.Exists(l_Key) Then
Set lo_List = lo_Lists(l_Key)
Else
Set lo_List = New cHashD
lo_List.UniqueKeys = False
lo_Lists.Add l_Key, lo_List
End If
lo_List.Add "", Int(Rnd * 100) + 1
Next ii
l_Key = "KEY_" & Int(Rnd * 10) + 1
Set lo_List = ItemsAll(lo_Lists, l_Key)
If po_Form Is Nothing Then
Debug.Print New_c.Timing
Else
po_Form.Print New_c.Timing
End If
End Sub
Private Function ItemsAll(po_HashList As cHashD, ByVal p_Key As String) As cHashD
If po_HashList.Exists(p_Key) Then
Set ItemsAll = po_HashList(p_Key)
Else
Err.Raise vbObjectError, , "Key not found."
End If
End Function
Sub Test2(Optional po_Form As Form)
Dim lo_Lists As cHashD
Dim lo_List As RC6.cArrayList
Dim l_Key As String
Dim ii As Long
Randomize
New_c.Timing True
Set lo_Lists = New cHashD
For ii = 1 To 5000
l_Key = "KEY_" & Int(Rnd * 10) + 1
If lo_Lists.Exists(l_Key) Then
Set lo_List = lo_Lists(l_Key)
Else
Set lo_List = New_c.ArrayList(vbLong)
lo_Lists.Add l_Key, lo_List
End If
lo_List.Add Int(Rnd * 100) + 1
Next ii
l_Key = "KEY_" & Int(Rnd * 10) + 1
Set lo_List = ItemsAll2(lo_Lists, l_Key)
If po_Form Is Nothing Then
Debug.Print New_c.Timing
Else
po_Form.Print New_c.Timing
End If
End Sub
Private Function ItemsAll2(po_HashList As cHashD, ByVal p_Key As String) As RC6.cArrayList
If po_HashList.Exists(p_Key) Then
Set ItemsAll2 = po_HashList(p_Key)
Else
Err.Raise vbObjectError, , "Key not found."
End If
End Function
Sub Test3(Optional po_Form As Form)
Dim lo_Lists As cHashD
Dim la_ListsIdx() As Long
Dim la_List() As Variant
Dim l_ListsIdx As Long
Dim l_ListIdx As Long
Dim l_Key As String
Dim ii As Long
Randomize
New_c.Timing True
Set lo_Lists = New cHashD
ReDim la_ListsIdx(32)
For ii = 1 To 5000
l_Key = "KEY_" & Int(Rnd * 10) + 1
If lo_Lists.Exists(l_Key) Then
la_List = lo_Lists(l_Key)
l_ListsIdx = lo_Lists.IndexByKey(l_Key)
l_ListIdx = la_ListsIdx(l_ListsIdx) + 1
If l_ListIdx > UBound(la_List) Then
ReDim Preserve la_List(UBound(la_List) * 2)
End If
lo_Lists(l_Key) = la_List
la_ListsIdx(l_ListsIdx) = l_ListIdx
Else
l_ListIdx = 0
la_List = MakeList
lo_Lists.Add l_Key, la_List
l_ListsIdx = lo_Lists.Count - 1
If l_ListsIdx > UBound(la_ListsIdx) Then
ReDim Preserve la_ListsIdx(UBound(la_ListsIdx) * 2)
End If
la_ListsIdx(l_ListsIdx) = 0
End If
la_List(l_ListIdx) = Int(Rnd * 100) + 1
Next ii
For ii = 0 To lo_Lists.Count - 1
la_List = lo_Lists.ItemByIndex(ii)
ReDim Preserve la_List(la_ListsIdx(ii))
lo_Lists.ItemByIndex(ii) = la_List
Next ii
l_Key = "KEY_" & Int(Rnd * 10) + 1
la_List = ItemsAll3(lo_Lists, l_Key)
If po_Form Is Nothing Then
Debug.Print New_c.Timing
Else
po_Form.Print New_c.Timing
End If
End Sub
Private Function MakeList() As Variant()
ReDim MakeList(32)
End Function
Private Function ItemsAll3(po_HashList As cHashD, ByVal p_Key As String) As Variant
If po_HashList.Exists(p_Key) Then
ItemsAll3 = po_HashList(p_Key)
Else
Err.Raise vbObjectError, , "Key not found."
End If
End Function
I'm not sure I've programmed the Variant Array approach optimally, as I didn't want to spend too much time on it. Fiddling with ReDim's and indexes is tiring when we have the cArrayList or cHashD classes handy. That said, I've tried to limit the # of ReDim's to save time, as ReDimmming+1 is a performance killer. Instead what I did is start with a "bucket" of 33 elements, then I keep doubling the size of the bucket as needed. This means we also need to track the current index for each bucket. That said, either I have a bug or the Variant Array list approach won't work with cHashD. I kept getting empty arrays back, so I suspect cHashD doesn't support Arrays as items.
The cArrayList approach was the fastest at ~7ms. The cHashD approach took ~35ms.
Anyway,I would recommend using a cHashD object of cHashD objects if you don't want to use the faster RC6.cArrayList method. At least until someone comes along with a better option.
-
Re: Simple and fast, lightweight HashList-Class (no APIs)
Quote:
Originally Posted by
jpbro
I tried 3 different approaches (compiled with all advanced optimizations enabled except Assume No Aliasing):
1. Using a cHashD object of cHashD object
2. Using a cHashD object of RC6.cArrayList object
3. Using a cHashD object of Variant Arrays
... .. .
..
.
The cArrayList approach was the fastest at ~7ms. The cHashD approach took ~35ms.
Anyway, I would recommend using a cHashD object of cHashD objects if you don't want to use the faster RC6.cArrayList method. At least until someone comes along with a better option.
Thank you so much once again for taking time amidst your own hectic schedules to chalk out such detailed scenarios and also test cases for each of them. It was really helpful. Today I could get time to run and understand the 1st case ("cHashD object of cHashD object"). I think I will go with it, until I am able to get the right time to integrate the fabulous RC6 in my apps (I am sure, some day in future, I will integrate RC6, and I am eagerly looking forward to that day too).
Your idea (of cHashD object of cHashD object) just did not occur to my mind at all. Brilliant! Thanks a TON, once again.
Kind regards.