Simple and fast, lightweight HashList-Class (no APIs)

1 Attachment(s)

Not much to add to the Threads Title...

the implementation of cHashList comes in only about 100 lines of code, and it can
be used as a VB.Collection-compatible replacement with much better performance.

The following Methods/Properties are supported:
- CompareMode (to switch between case-insensitive and case-sensitive compares, default-behaviour is like the VB.Collection)
- UniqueKeys (to allow multiple entries with the same Key, when switched from the True-defaultsetting to False)
- Count
- Add(Item As Variant, Optional Key As String)
- Exists(Key As String)
- IndexByKey(Key As String)
- KeyByIndex(ByVal IndexOneBased As Long)
- Item(KeyOrOneBasedIndex As Variant) ... (Get, Let and Set)
- ReInit(Optional ByVal ExpectedItemCount As Long = 5000)

Indexed access (for both, Keys and Items) is by orders of magnitude faster than the VB-Collection.
What's possible now as well (compared to the VB-Collection) is the ability to overwrite Item-Values
(at a given Index- or Key-position).

Note, that in the above List a Remove-Method is missing -
I've left this out for two reasons:
1) to demonstrate that a simplified HashList-Implementation doesn't necessarily need to be a linked List
2) because Remove is not used very often, when a Collection is used primarily as a fast "Key-Lookup-Container"
... (for Queue- or Stack-scenarios one can always use the normal VBA.Collection)

Performance is about 6 times as fast, when Key-Value-pairs are added -
and about twice as fast when Items are retrieved per Key-Lookup...

Here's a ScreenShot:
http://vbRichClient.com/Downloads/HashList.png

Here's the Class-Code of cHashList:

Code:

Option Explicit Private Type DataTableEntry Key As String Value As Variant End Type Private Type HashTableEntry DataIndexes() As Long End Type Private DataTable() As DataTableEntry, HashTable() As HashTableEntry Private mCount As Long, mDTUBound As Long, mHashTableSize As Long Public CompareMode As VbCompareMethod, UniqueKeys As Boolean Private Sub Class_Initialize() UniqueKeys = True CompareMode = vbTextCompare ReInit End Sub Public Sub ReInit(Optional ByVal ExpectedItemCount As Long = 5000) mHashTableSize = 8 Do Until mHashTableSize * 5 > ExpectedItemCount: mHashTableSize = mHashTableSize * 2: Loop ReDim HashTable(0 To mHashTableSize - 1) Dim i As Long For i = 0 To UBound(HashTable): ReDim HashTable(i).DataIndexes(0 To 0): Next mDTUBound = 16: ReDim DataTable(0 To mDTUBound) mCount = 0 End Sub Public Property Get Count() As Long Count = mCount End Property Public Function Exists(Key As String) As Boolean Exists = FindIndex(Key, CalcHash(Key)) > 0 End Function Public Function IndexByKey(Key As String) As Long IndexByKey = FindIndex(Key, CalcHash(Key)) End Function Public Sub Add(Item, Optional Key As String) Dim HashValue As Long, UB As Long HashValue = CalcHash(Key) If UniqueKeys Then If FindIndex(Key, HashValue) Then Err.Raise 457 'prolong and add to the new entries to the DataTable-Array mCount = mCount + 1 If mDTUBound < mCount Then mDTUBound = mDTUBound * 1.5: ReDim Preserve DataTable(0 To mDTUBound) DataTable(mCount).Key = Key DataTable(mCount).Value = Item 'add the new DataIndex to the proper Hash-Buckets .DataIndexes-Array With HashTable(HashValue) UB = UBound(.DataIndexes): UB = UB + 1 ReDim Preserve .DataIndexes(0 To UB) .DataIndexes(UB) = mCount End With End Sub Public Property Get KeyByIndex(ByVal IndexOneBased As Long) If IndexOneBased < 1 Or IndexOneBased > mCount Then Err.Raise 9 KeyByIndex = DataTable(IndexOneBased).Key End Property Public Property Get Item(KeyOrOneBasedIndex) Dim Index As Long If VarType(KeyOrOneBasedIndex) = vbString Then Index = FindIndex(KeyOrOneBasedIndex, CalcHash(KeyOrOneBasedIndex)) If Index = 0 Then Err.Raise 457 Else Index = KeyOrOneBasedIndex If Index < 1 Or Index > mCount Then Err.Raise 9 End If If IsObject(DataTable(Index).Value) Then Set Item = DataTable(Index).Value Else Item = DataTable(Index).Value End If End Property Public Property Let Item(KeyOrOneBasedIndex, NewItem) Dim Index As Long If VarType(KeyOrOneBasedIndex) = vbString Then Index = FindIndex(KeyOrOneBasedIndex, CalcHash(KeyOrOneBasedIndex)) If Index = 0 Then Err.Raise 457 Else Index = KeyOrOneBasedIndex If Index < 1 Or Index > mCount Then Err.Raise 9 End If If IsObject(NewItem) Then Set DataTable(Index).Value = NewItem Else DataTable(Index).Value = NewItem End If End Property Public Property Set Item(KeyOrOneBasedIndex, NewItem) Item(KeyOrOneBasedIndex) = NewItem End Property Private Function FindIndex(Key, ByVal HashValue As Long) As Long Dim i As Long, CM As VbCompareMethod With HashTable(HashValue) CM = CompareMode For i = 1 To UBound(.DataIndexes) If StrComp(Key, DataTable(.DataIndexes(i)).Key, CM) = 0 Then FindIndex = .DataIndexes(i): Exit Function End If Next End With 'returns Zero, when no Key can be found End Function Private Function CalcHash(Key) As Long Dim i As Long, L As Long, B() As Byte If CompareMode Then B = LCase$(Key) Else B = Key L = 7919 For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next CalcHash = L * B(0) Mod mHashTableSize End Function Friend Sub CheckHashDistribution() Dim i As Long, UB As Long, cc As Long, Min As Long, Max As Long Min = &H7FFFFFFF For i = 0 To UBound(HashTable) UB = UBound(HashTable(i).DataIndexes) If UB Then If Min > UB Then Min = UB If Max < UB Then Max = UB cc = cc + 1 End If Next Debug.Print "Distribution over a HashTable with"; UBound(HashTable) + 1; "slots:" Debug.Print "Used-HashSlots:"; cc Debug.Print "Min-Entries:"; Min Debug.Print "Max-Entries:"; Max End Sub

And here the Code of the Test-Form:

Code:

Option Explicit Private Const TestEntryCount As Long = 100000 Private C As Collection, H As cHashList Private Sub Form_Click() Dim i As Long, T!, Item AutoRedraw = True Cls Print "Count of Test-Entries:"; TestEntryCount; vbLf Set C = New Collection Set H = New cHashList H.ReInit TestEntryCount T = Timer For i = 1 To TestEntryCount C.Add i, "K" & i Next Print "Collection-Add:", Timer - T & "sec" T = Timer For i = 1 To TestEntryCount H.Add i, "K" & i Next Print "cHashList-Add:", Timer - T & "sec"; vbLf T = Timer For i = 1 To TestEntryCount Item = C.Item("K" & i) Next Print "Collection-ItemByKey:", Timer - T & "sec" T = Timer For i = 1 To TestEntryCount Item = H.Item("K" & i) Next Print "cHashList-ItemByKey:", Timer - T & "sec" Print vbLf; "Indexed access is not compared (it would be much faster per HashList)" H.CheckHashDistribution End Sub

Have fun with it...

Edit: Whilst the above Demo was more a "Concept-Showcase"... below is a performance-optimized version,
acting more like the MS-Scripting-Dictionary now (with a Param-wise inversed Add-Method, but feel free
to change it at will - the code-volume is still bearable with about 230 lines in cHashD).
In the meantime I've also added support for Integer-, Double- and Object-Keys - but I think I'll leave it at that -
since the Class is easy understandable and expandable...
Attachment 140909

Performance of Case-Insensitive-TextcompareMode for Scripting-Dictionary and cHashD
http://vbRichClient.com/Downloads/Ha...xtCompares.png

Performance of Binary-TextcompareModes for Scripting-Dictionary and cHashD
http://vbRichClient.com/Downloads/Ha...ryCompares.png

Olaf

2 Attachment(s)

Re: Simple and fast, lightweight HashList-Class (no APIs)

Hi.
Here is a quick comparison of the last wraps over VB.Collection

cHashList much faster, but only for very large collections
Attachment 140703
If we take for example the value iteration 20000, it is not so simple - Scripting.Dictionary shows advantage over cHashList
Attachment 140705
I compared only operation .Add, .Item, .Exist

Elroy - http://www.vbforums.com/showthread.p...B6-Collections
Schmidt - http://www.vbforums.com/showthread.p...Indexed-Access

Re: Simple and fast, lightweight HashList-Class (no APIs)

In C#, String "K0" lower case return Hash -843466811, cHashList CalcHash returns 8611. Is it safe hash number?

Code:

public static int GetHashFromKey(string key) { if (key == null || key.Length == 0) { return -2147483648; } return key.ToLower().GetHashCode(); }

Code:

Private Function CalcHash(Key) As Long Dim i As Long, L As Long, B() As Byte If CompareMode Then B = LCase$(Key) Else B = Key L = 7919 For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next CalcHash = L * B(0) Mod mHashTableSize End Function

Re: Simple and fast, lightweight HashList-Class (no APIs)

Code:

Private Function CalcHash(Key) As Long Dim i As Long, L As Long, B() As Byte If CompareMode Then B = LCase$(Key) Else B = Key L = 7919 For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next CalcHash = L * B(0) Mod mHashTableSize End Function

What is the name of algorithm?
The algorithm is very smart. The possibility of collision is relatively ZERO. JAVA string hash has famous Aa vs BB.

Re: Simple and fast, lightweight HashList-Class (no APIs)

@Romeo91

According to your own tests, even with only 20,000 entries - cHashList is faster than Dictionary at Retrieval by Key.

Scripting.Dictionary-ItemByKey:
.046875 sec
cHshList-ItemByKey:
2.734E-02 --> .027344 sec

watch out for those decimals...

Re: Simple and fast, lightweight HashList-Class (no APIs)

Just as an FYI, since my code is mentioned in this thread ... my Collections wrapper was written for one primary reason (and just had some extra methods thrown in for good measure).

Hopefully, this code illustrates the motivation for my wrapper:

Code:

Dim c As New CollectionEx 'Dim c As New Collection ' To just use built-in VB6 collection. Dim k1 As String Dim k2 As String k1 = ChrW$(&H1056) & ChrW$(&H1060) & ChrW$(&HBFD) & ChrW$(&H15F1) & ChrW$(&H31B1) & ChrW$(&HA19E) k2 = ChrW$(&HD9A2) & ChrW$(&H2770) & ChrW$(&H4DE5) & ChrW$(&H1AF0) & ChrW$(&HA28B) & ChrW$(&H2F86) c.Add "asdf", k1 c.Add "qwer", k2 ' <---- Error on built-in, error on Schmidt, NO error on Elroy.

I knew I was sacrificing some speed to accomplish this. But, for people who want to use binary keys, it gets the job done.

Regards,
Elroy

Re: Simple and fast, lightweight HashList-Class (no APIs)

cHashlist doesn't suffer this problem, as it uses its own binary hash function.

Re: Simple and fast, lightweight HashList-Class (no APIs)

I abuse the code for learning purpose.
If we change mHashTableSize from 8 to 9 in ReInit, then the Hash of string "KK" and "LL" both are 0:
H.ReInit
H.Add 1, "KK"
H.Add 2, "LL"

I am worried about the Hash value is not identical or has collision.

Code:

Public Sub ReInit(Optional ByVal ExpectedItemCount As Long = 5000) mHashTableSize = 9 Do Until mHashTableSize * 5 > ExpectedItemCount: mHashTableSize = mHashTableSize * 2: Loop ReDim HashTable(0 To mHashTableSize - 1) Dim i As Long For i = 0 To UBound(HashTable): ReDim HashTable(i).DataIndexes(0 To 0): Next mDTUBound = 16: ReDim DataTable(0 To mDTUBound) mCount = 0 End Sub Private Function CalcHash(Key) As Long Dim i As Long, L As Long, B() As Byte If CompareMode Then B = LCase$(Key) Else B = Key L = 7919 For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next CalcHash = L * B(0) Mod mHashTableSize End Function

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by Romeo91

If we take for example the value iteration 20000, it is not so simple - Scripting.Dictionary shows advantage over cHashList

The MS-Dictionary operates per default in Binary-StringComparemode,
whilst cHashList is operating in (case-insensitive) TextComparemode by default.

Maybe that was throwing things off a bit.

However - cHashList was just showcasing the concept of a simple HashTable-approach...
If you need more speed - please check-out the new performanceoptimized derivate (cHashD),
I've just put a Project-Zip-Attachment online at the bottom of the opener-post.

Olaf

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by DaveDavis

I abuse the code for learning purpose.
If we change mHashTableSize from 8 to 9 in ReInit, then the Hash of string "KK" and "LL" both are 0:
H.ReInit
H.Add 1, "KK"
H.Add 2, "LL"

I am worried about the Hash value is not identical or has collision.

As to your other question - the algorithm is "my own invention" (inspired a bit
by the Random-Key-Generator of the RC4-StreamCypher-algo).

And that two different String-Keys produce the same Hash-Value is entirely
normal - and expected in this scenario...

The Hash-Algorithm is only expected to deliver:
- always the same HashValue for a given Key and a given HashTable-Size
- thereby maintaining a relative good distribution over the HashTable-Size (the amount ot buckets).

The Zero you get for both "KK" and "LL" is "just another index" into the given HashTable.

The HashTable works well enough and quite fast, when later, in any given Hash-Bucket,
there's roughly 3-6 "data-index-pointers", after the expected amount of Key-Value-Pairs "has settled in".

Example for the new Zip-Project (cHashD) in the opener-post, which has a slightly changed algo
(now looping over WChars in a VB-SafeArray which was spanned over the String-Key):

E.g. when you expect about 20000 Key-Value-Pairs in your Hashing-Dictionary
(as is the default in the Test-Routine of the just mentioned new Demo),
the calculation for the HashTable-Size (which prefers Powers of two) -
comes up with 8192 (as the available amount of Buckets in the HashTable-Array).

That means, that the later on added 20000 Key-Value-Pairswould need to be distributed
(ideally) with 20000/8192 = 2.4 ---> meaning: "2-3 data-index-pointers per bucket".

Now, since we cannot expect the Hashing-Algo to provide such an ideal
distribution (especially not with such short String-Keys as used in the example),
we have to be satisfied with the following output:
(produced by the Friend Sub CheckHashDistribution() at the end of cHashD)

Code:

Distribution over a HashTable with 8192 slots: Used-HashSlots: 8109 Min-Entries: 1 Max-Entries: 6

From an "ideally distributing" Hash-Algo, we would have expected:

Code:

Distribution over a HashTable with 8192 slots: Used-HashSlots: 8192 Min-Entries: 2 Max-Entries: 3

But the results we achieved from the small and fast, but "less than ideal" algo are good enough,
to outperform the MS-Scripting.Dictionary. Between the endpoints of 1 and 6 as seen in the
Debug-Output, there'll be a gaussian normal-distribution IMO, so most of the entries *are*
in the expected 2-3-range...

So, the HashAlgo as seen in cHashList or (slightly changed in cHashD) is not thought
as a "secure Hash with at least 128Bit, which minimizes the probability of a collision to near zero"
(the Modulo-Op at the end the algo, which "folds" any calculated HashValue back onto
the available Index-Space for the HashTable-Array prevents that anyways, as does
the "only 32Bit long, randomizing carrier-variable" L).

The quality of such fast performing "Short-Hashes" will only show in the achieved
(hopefully equal) distribution across the available buckets (or Index-Slots).

I hope that clears a few things up...

Olaf

2 Attachment(s)

Re: Simple and fast, lightweight HashList-Class (no APIs)

BTW, The Microsoft Dictionary from the scrrun.dll uses next hashing algorithm for strings:

Code:

INT CalcHashFromStringItem (BSTR bstrKey) { INT iResult = 0; for (INT i = 0; i < SysStringLen(bstrKey); i++) iResult = iResult * 17 + bstrKey[i]; iResult %= 1201; return iResult; }

I make an analog for testing it in VB6:

Code:

Option Explicit Private Declare Function GetMem4 Lib "msvbvm60" ( _ ByRef src As Any, _ ByRef Dst As Any) As Long Private Sub Form_Load() Dim sKey As String Dim pDic As Dictionary Dim index As Long Set pDic = New Dictionary For index = 0 To 10 sKey = MakeRandomString() Debug.Print pDic.HashVal(sKey), Dictionaty_HashVal(sKey) Next End Sub Private Function MakeRandomString() As String Dim l As Long: Dim i As Long l = Int(Rnd * 10) + 5 Do While l i = Int(Rnd * 62) Select Case i Case Is < 10 i = i + 48 Case Is < 36 i = i - 10 + 65 Case Else i = i - 36 + 97 End Select MakeRandomString = MakeRandomString & Chr$(i) l = l - 1 Loop End Function Private Function Dictionaty_HashVal( _ ByRef sValue As String) As Long Dim index As Long Dim result As Long #If COMPILED Then ' // We can't use unsigned arithmetic without overflow in IDE For index = 0 To Len(sValue) - 1 result = result * 17 + AscW(Mid$(sValue, index + 1, 1)) Next result = result Mod 1201 #Else ' // In order to bypass overflow error, we use currency 64-bit numbers Dim cResult As Currency For index = 0 To Len(sValue) - 1 cResult = cResult * 17 + AscW(Mid$(sValue, index + 1, 1)) / 10000@ ' // Zero high dword If cResult > 429496.7295@ Then GetMem4 0&, ByVal VarPtr(cResult) + 4 End If Next cResult = cResult * 10000@ result = cResult - Int(cResult / 1201@) * 1201@ #End If Dictionaty_HashVal = result End Function

Also i attached the project that allows make some tricks with MS Dictionary (sorting/get key/etc.). This knowledge had been obtained during reversing scrrun.dll code on Windows 7 x64 therefore it can distinct on the different platforms.
Attachment 140753

Re: Simple and fast, lightweight HashList-Class (no APIs)

You are the genius, The Trick.
I didn't see VB's implementation of C# string GetHashCode. From Reflector we can see the code, but hard part is that VB's Long is not equal C#'s Int. Probably we will get Over Flow error during LeftShift or RightShift. If we have to use BigInteger, It is harder.

Code:

public override unsafe int GetHashCode() { if (HashHelpers.s_UseRandomizedStringHashing) { return InternalMarvin32HashString(this, this.Length, 0L); } fixed (char* str = ((char*) this)) { char* chPtr = str; if (chPtr != null) { chPtr += RuntimeHelpers.OffsetToStringData; } int num = 0x15051505; int num2 = num; int* numPtr = (int*) chPtr; int length = this.Length; while (length > 2) { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0]; num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1]; numPtr += 2; length -= 4; } if (length > 0) { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0]; } return (num + (num2 * 0x5d588b65)); } }

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by DaveDavis

You are the tenant, The Trick.

What do you mean?

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by The trick

What do you mean?

:p:p

Re: Simple and fast, lightweight HashList-Class (no APIs)

I have been wondering, although it's a VB Forum, that if this whole thing is implemented in C++, will it be much faster? What's the difference if it's implemented in C++ and used by VB?

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by Resurrected

I have been wondering, although it's a VB Forum, that if this whole thing is implemented in C++, will it be much faster? What's the difference if it's implemented in C++ and used by VB?

Not really (not by much).

See, the MS-Scripting.Dictionary *is* already implmented in C/C++ (by people who knew their stuff) -
and still the VB-Implementation (along with the HashFunction I've used) runs circles around it,
when larger amounts of Key/Value-Pairs are used within cHashD.

The VB6-native compiler is based on VC++6 and produces decent enough binaries speedwise.
What it then comes down to (as always), is the choosen algorithm (and a bit of experience,
to avoid unnecessary memory-allocations)...

Olaf

Re: Simple and fast, lightweight HashList-Class (no APIs)

Resurrected, you can test my hash table. It works faster than Dictionary on my computer. This class has been written on VB6.

1 Attachment(s)

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by The trick

Resurrected, you can test my hash table. It works faster than Dictionary on my computer. This class has been written on VB6.

Anatoli, this thread here is to discuss the Demo-Code for:
- cHashList (roughly compatible to the VB-Collection)
- cHashD (roughly compatible to the Scripting.Dictionary)

I don't have anything against pointing out better alternatives in CodeBank-Threads,
but if someone does, there should be a few good reasons, as e.g. better performance,
or less and easier to understand code.

Your HashTable-Class-Implementation is not really an alternative to cHashD in either regard because:
- it is much slower in most tests
- and it contains much more code to read, maintain and understand (including assembler-blocks)

Performance-Results, using your slightly adapted Test-Project: Attachment 140839
Using 100000 Key/Value-pairs (as in your Demo) cHashD is:
- about factor 4 as fast whilst adding these Key/Value-pairs
- about factor 6 as fast with Keyed-Item access
- about factor 5 as fast in an enumeration of Key/Value-pairs
- about factor 1.5 slower in the For Each Enumeration of Keys

So, besides the last point, there's only huge gains when using cHashD -
(instead of the MS-Dictionary or your HashList-implementation)
and all of them in areas which are really important for such a Container.

Olaf

1 Attachment(s)

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by Schmidt

Anatoli, this thread here is to discuss the Demo-Code for:
- cHashList (roughly compatible to the VB-Collection)
- cHashD (roughly compatible to the Scripting.Dictionary)

Olaf, my apologies but i responded to Resurrected. I showed him that full analog of MS Dictionary written on VB6 works faster than origin written on C++. I showed him the FULL analog of Dictionary as response to:

Quote:

I have been wondering, although it's a VB Forum, that if this whole thing is implemented in C++, will it be much faster? What's the difference if it's implemented in C++ and used by VB?

I don't have something against your class, i answered to Resurrected post. I was doing my class as replacement for MS Dictionary.
As test of performance:

Your class is far NO analog of Dictionary:

Your class have only string keys, dictionary and mine variant. And all tests works with string keys and String variables therefore it faster than mine. My class can use any keys 1, 1.0434@, &H100, Me, Null, Empty, etc like Dictionary, your doesn't. Just look at my CalcHash procedure.
If i compile project and tries to access to your cHashD through For Each it get crash (bug?), therefore i can't check performance through For each.
Attachment 140841
Your code can cause potential memory leak. For example:

Code:

Dim z As Long Dim p As Long Dim l As Long For z = 0 To 100 For p = 0 To UBound(HD.Items) l = l + Len(HD.ItemByIndex(p)) Next Next MsgBox l

Compile and look memory usage. Same with Keys properties.
Also please make correct test in For each ;):

Code:

For Each dat In HD.Items out = out + Len(dat) Next

I can't add Object item. This is probably bug you should use either VariantCopy or make checking IsObject.
I've not found Remove method. How can i remove an item from your hash table by a key/index?
In the most cases i can just replace Dictionary without changing any source code by my hash table. Your code requires to do changes in source code.

Again, i don't have something against your class. I just wanted to answer to the Resurrected's post and as the example i suggested my hash table. I didn't tell that my class is better than you or something else. Our classes has different functional your class is quite small and understandable, mine is more functional but bigger and slower, but as an analog of Dictionary it's better (this class was creating as replacement of Dictionary):

Quote:

Represent a standalone class implements a hash table, which in many cases can be a substitute for the dictionary (Dictionary) of Scripting runtime.

Regarding,
Krivous Anatoly (The trick).

1 Attachment(s)

Re: Simple and fast, lightweight HashList-Class (no APIs)

Just a few short comments to the above:

Thanks for pointing out the Bug in the Add-method (which is now accepting Objects as Items, too).

As for "the crash" ... I tried my best to reproduce it (unsuccessfully), but changed the return-values of .Items() and Keys()
to a normal Variant-Container now anyways (in my opinion there's a good chance, that the crash has more to do with your
Assembly-stuff, sitting in the same project).

As for your HashTableClass (since you made it a topic here):
- no, it is not a full replacement for the MS-Dictionary (for that you will have to change your Item-Prop-Implementation)
- and no, it is only partly an example to serve as "VB-Code being faster than C" because it is quite an amount slower than the MS-Dictionary in Binary-Comparemode.
- no, my For Each (over the Keys) did the same as yours (since your Class was saying: hsh.EnumMode = ENUM_BY_KEY = True)

As for the whole topic of "being compatible to the MS-Dictionary":
- neither cHashD nor clsTrickHashTable are fully compatible (the difference is that you claim that to be the case, whereas I did not. I said "roughly compatible")
- my aim is, to cover about 90% of the use-cases (and no, the .Remove-Single-Item call is not among those)
- with a bit of additional work, you could make yours probably 99% compatible - maybe even 100%
- I will not do that for cHashD

Nevertheless, one can throw all three Class-Types into a real-world project - and use them there without
changing the code (much). In most cases only the ClassVariable-Definitions have to be changed (type-wise).

Here's an example, which makes good use of quite a lot of Scripting.Dictionary methods - and
both (cHashD and clsTrickHashTable) can be used without changes in that WordCounting-Scenario:

Here's the test-results for scanning a "work of Shakespeare-Text-File": http://www.gutenberg.org/files/100/
Performance in Binary-CompareMode (case-sensitive):
- 465msec (cHashD)
- 755msec (Dictionary)
- 1201msec (clsTrickHashTable)

Performance in Text-CompareMode (case-insensitive):
- 647msec (cHashD)
- 1277msec (clsTrickHashTable)
- 3341msec (Dictionary)

Your class is not bad (especially in TextCompareMode) - but as said, not always faster than the MS-Dictionary.

Here's the WordScan-Test-Code: Attachment 140903

And here a ScreenShot:
http://vbRichClient.com/Downloads/FastWordScan.png

Thanks again for testing cHashD, pointing out that Object-Add-Bug (the Zip-Download in the Opener-Post is now updated).

Olaf

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

As for "the crash" ... I tried my best to reproduce it (unsuccessfully), but changed the return-values of .Items() and Keys()
to a normal Variant-Container now anyways (in my opinion there's a good chance, that the crash has more to do with your
Assembly-stuff, sitting in the same project).

https://www.youtube.com/watch?v=BrGiUvb4GPw

Quote:

- no, it is not a full replacement for the MS-Dictionary (for that you will have to change your Item-Prop-Implementation)

What do you mean?

Quote:

- and no, it is only partly an example to serve as "VB-Code being faster than C" because it is quite an amount slower than the MS-Dictionary in Binary-Comparemode.

I can bet with you. This is another test:

Code:

Option Explicit Private Declare Function QueryPerformanceFrequency Lib "kernel32" ( _ ByRef lpFrequency As Any) As Long Private Declare Function QueryPerformanceCounter Lib "kernel32" ( _ ByRef lpPerformanceCount As Any) As Long Dim d As Dictionary Dim t As clsTrickHashTable Private Sub Form_Load() Dim index As Long Dim cFreq As Currency Dim cVal As Currency Dim time As Currency Dim tDic As Currency Dim tTrick As Currency Dim sKeys() As String MakeRandomKeys sKeys() QueryPerformanceFrequency cFreq Set d = New Dictionary Set t = New clsTrickHashTable ' // Check binary compare t.CompareMode = BinaryCompare d.CompareMode = BinaryCompare QueryPerformanceCounter time For index = 0 To UBound(sKeys) d.Add sKeys(index) & index, Me Next QueryPerformanceCounter tDic: tDic = tDic - time QueryPerformanceCounter time For index = 0 To UBound(sKeys) t.Add sKeys(index) & index, Me Next QueryPerformanceCounter tTrick: tTrick = tTrick - time MsgBox "Dictionary: " & (tDic / cFreq) * 1000 & " ms." & vbNewLine & _ "TrickHashTable: " & (tTrick / cFreq) * 1000 & " ms." & vbNewLine End Sub Private Sub MakeRandomKeys( _ ByRef sKeys() As String) Dim index As Long Dim lLen As Long Dim char As Integer ReDim sKeys(500000) For index = 0 To UBound(sKeys) lLen = Int(Rnd * 50) + 10 Do While lLen char = Int(Rnd * 62) Select Case char Case Is < 10: char = char + 48 Case Is < 36: char = char + 55 Case Else: char = char + 61 End Select sKeys(index) = sKeys(index) & Chr$(char) lLen = lLen - 1 Loop Next End Sub

On my computer Windows 7 x64 SP1 AMD Athlon(tm) II X2 245 2.90 Ghz 6GB RAM:

Quote:

---------------------------
Project1
---------------------------
Dictionary: 24905,1339625567 ms.

TrickHashTable: 15037,1265761431 ms.

---------------------------
ОК
---------------------------

https://www.youtube.com/watch?v=xL-9Ow_fcWg
Overall (mine and yours tests) both Dictionary and TrickHashTable work identically with Binary compare mode (however i wrote so in the original topic):

Quote:

Run fast enough on my computer about as well (even a bit faster) as a dictionary with binary comparison...

With Text compare mode my class works faster than Dictionary.

Quote:

- no, my For Each (over the Keys) did the same as yours (since your Class was saying: hsh.EnumMode = ENUM_BY_KEY = True)

No, you enumerated String keys. Mine Variant, ie. any Object, Boolean, String, etc. simultaneously.

Quote:

the difference is that you claim that to be the case, whereas I did not. I said "roughly compatible"

Then why did you compare our classes? Maybe i'm wrong but you didn't say why my class isn't FULL compatible with MS Dictionary. Please tell me then i make it full compatible.

Quote:

- my aim is, to cover about 90% of the use-cases (and no, the .Remove-Single-Item call is not among those)

The controversial statement, my class is replacement of MS Dictionary. Dictionary/Collection contains Remove method.

Quote:

- with a bit of additional work, you could make yours probably 99% compatible - maybe even 100%

Please tell me what should i add in order to make full compatibility.

Quote:

Nevertheless, one can throw all three Class-Types into a real-world project - and use them there without
changing the code (much). In most cases only the ClassVariable-Definitions have to be changed (type-wise).

Okay, if you compare our classes and MS Dictionary you should suggest alternative. Your class isn't alternative. You can add 100 arrays with different keys types, but it won't be replacement.

Quote:

since you made it a topic here

You became to compare our classes.
About your test. Your class don't support different keys types (like TrickHashTable, Dictionary). For example if i want to have associate an object with other object i can't do it through your class. Of course you can say "my class shouldn't support it" - okay, then don't compare them:

Quote:

Your HashTable-Class-Implementation is not really an alternative to cHashD in either regard

I can also write same.
Regarding,
Krivous Anatoly (The trick)

Re: Simple and fast, lightweight HashList-Class (no APIs)

@trick - Olaf alluded to a problem with your hash table re: 100% compatibility with the scripting Dictionary object when he said: "(for that you will have to change your Item-Prop-Implementation)"

I think he was referring to the fact that the Dictionary Item property Let will perform an Add operation if the item doesn't already exist, whereas your class will raise an Error 5. See this test:

Code:

Sub Test() Dim x As New Scripting.Dictionary x.Item("Test") = "Hello" Debug.Print x.Item("Test") Dim y As New clsTrickHashTable y.Item("Test") = "Goodbye" Debug.Print x.Item("Test") End Sub

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by jpbro

@trick - Olaf alluded to a problem with your hash table re: 100% compatibility with the scripting Dictionary object when he said: "(for that you will have to change your Item-Prop-Implementation)"

I think he was referring to the fact that the Dictionary Item property Let will perform an Add operation if the item doesn't already exist, whereas your class will raise an Error 5. See this test:

Code:

Sub Test() Dim x As New Scripting.Dictionary x.Item("Test") = "Hello" Debug.Print x.Item("Test") Dim y As New clsTrickHashTable y.Item("Test") = "Goodbye" Debug.Print x.Item("Test") End Sub

Thank you very much. I'll fix it.

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by The trick

I can bet with you. This is another test:

Code:

Option Explicit Private Declare Function QueryPerformanceFrequency Lib "kernel32" ( _ ByRef lpFrequency As Any) As Long Private Declare Function QueryPerformanceCounter Lib "kernel32" ( _ ByRef lpPerformanceCount As Any) As Long Dim d As Dictionary Dim t As clsTrickHashTable Private Sub Form_Load() Dim index As Long Dim cFreq As Currency Dim cVal As Currency Dim time As Currency Dim tDic As Currency Dim tTrick As Currency Dim sKeys() As String MakeRandomKeys sKeys() QueryPerformanceFrequency cFreq Set d = New Dictionary Set t = New clsTrickHashTable ' // Check binary compare t.CompareMode = BinaryCompare d.CompareMode = BinaryCompare QueryPerformanceCounter time For index = 0 To UBound(sKeys) d.Add sKeys(index) & index, Me Next QueryPerformanceCounter tDic: tDic = tDic - time QueryPerformanceCounter time For index = 0 To UBound(sKeys) t.Add sKeys(index) & index, Me Next QueryPerformanceCounter tTrick: tTrick = tTrick - time MsgBox "Dictionary: " & (tDic / cFreq) * 1000 & " ms." & vbNewLine & _ "TrickHashTable: " & (tTrick / cFreq) * 1000 & " ms." & vbNewLine End Sub Private Sub MakeRandomKeys( _ ByRef sKeys() As String) Dim index As Long Dim lLen As Long Dim char As Integer ReDim sKeys(500000) For index = 0 To UBound(sKeys) lLen = Int(Rnd * 50) + 10 Do While lLen char = Int(Rnd * 62) Select Case char Case Is < 10: char = char + 48 Case Is < 36: char = char + 55 Case Else: char = char + 61 End Select sKeys(index) = sKeys(index) & Chr$(char) lLen = lLen - 1 Loop Next End Sub

Nope, no success here regarding "your bet", since on my Win8.1-system
I get (native compiled of course) the following results with your above code-snippet:
http://vbRichClient.com/Downloads/TestDictionaries.png

That the Dictionary performs similarily to your Class might be due to a better Mem-Allocator
in Win8.1 (compared to the one in your Win7) - but more likely it is the CPU-differences (AMD vs. Intel)
which is favouring one or the other implementation (and theres by far more Intel-based machines
out there these days than AMD-powered ones).

Also note, that cHashD is more than 10 times faster than either of the other Dictionaries
in such a "mass amount test" (like your 500,000 entries) of Key-Value-Pairs.

Quote:

Originally Posted by The trick

Okay, if you compare our classes and MS Dictionary you should suggest alternative. Your class isn't alternative.

I thought I already gave a real-world-example (the Word-Counting one), where each of our Classes
can be used as a simple "drop-in" replacement to the MS-Dictionary (without changing the CodeBase,
by just changing the Class-Types).

As I already said, I don't aim to be fully compatible to the MS-Dictionary - but cHashD is "compatible enough"
to be used as a direct replacments in 90% of the real-world-scenarios out there (that's what I meant with "use-cases").

These real-world-scenarios involve:
- first and foremost: fast adding and retrieving of Variant-Items per StringKey
- to a lesser extend: a fast Exists-Method (also mostly used with StringKeys)
- the ability to exchange an Item behind an existing Key to a different one
- and occasionally the usage of the RemoveAll-Method
- fast enumeration of the complete Key-Value-Pair by accessing them per Index also comes up quite often

The last one is not available in the MS-Dictionary or in your Class - but it is a real-world-requirement
which is very handy to have in many situations.

And in all my decades of experience with real-world applications, I very rarely needed the Remove-Method
on such Key/Value-pair containers (your mileage may vary).

Quote:

Originally Posted by The trick

You became to compare our classes.

Well, I didn't intend to - not here in this CodeBank-Thread - it was you who brought clsTrickHashTable into the discussion.

Quote:

Originally Posted by The trick

Your class don't support different keys types (like TrickHashTable, Dictionary).

It does support Integer Keys (Byte, Integer, Long up to Currency) and "Float-Keys" (Singles, Dates and Doubes) now.
In these modes cHashD will act even faster than with String-Keys.

What it doesn't support is, to use these different Key-Types in a "mixed way" - (would appreciate though,
when you could give me a real-world-example, where you used such "mixed-Keys-capabilities" to great success).

Quote:

Originally Posted by The trick

For example if i want to have associate an object with other object i can't do it through your class. Of course you can say "my class shouldn't support it" - okay, then don't compare them:

Well, it was easy enough to add that feature with a dozen lines of code (have now updated
the HashD.zip-Download at the end of the Top-Posting in this thread accordingly)...

Besides having it added now (just for kicks) - I'd like to see a real-world-example from you, where such Object-associations per Dictionary would be "nice to have"...
Usually when I need that, I simply add a Property:

Code:

Public AssociatedObject As Object

...into the Class in question - later assign the "Side-Object" to it and be done with the task.

As for your: "don't compare our Classes"... why not, when they both are useful in scenarios,
where the VB6-community formerly used the MS-Scripting.Dictionary.

cHashD is extremely fast (especially with a larger amount of Key/Value-pairs, where the
over-all-timings are often in the range of seconds) - scenarios similar to my WordCounting-example
are a quite common use-case.

In Use-Cases where e.g. the Remove-Method becomes paramount, your Class seems to be a good alternative, too.

Olaf

Re: Simple and fast, lightweight HashList-Class (no APIs)

If we maintain Hash value in sorted Asc Order, whatever inserting or adding or changing a Key, then FindIndex or Key removing is faster by using Binary Search.

Re: Simple and fast, lightweight HashList-Class (no APIs)

In tests, it looks good, it cHashD better than other analogs in speed. But in the real application I see a noticeable performance drop.

I use methods add, item, exist, removeall, type of comparison-vbbinarycompare.
My program works with a lot of text information and uses the dictionary to find duplicate rows (exist method).
My app - https://github.com/ADIAProject/DIA

The program need the driver packages (DP*.7z) - can be downloaded:
http://driveroff.net/SamDrivers_16.8.torrent
http://driverpacks.net/downloads

1 Attachment(s)

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by Romeo91

In tests, it looks good, it cHashD better than other analogs in speed. But in the real application I see a noticeable performance drop.

I use methods add, item, exist, removeall, type of comparison-vbbinarycompare.
My program works with a lot of text information and uses the dictionary to find duplicate rows (exist method).

First, to get the best speed out of VB6-code, you should compile such performance-critical stuff
natively -> with all extended Options.

In your project (in case you "hung-in" cHashD as is) this is not the case.

If you want to retain your current Compile-Settings as they are in your Main-Project,
then compile cHashD (with all extended native Compile-Options) into an ActiveX-Dll.

That said, for your special case you can also fine-tune cHashD "as you need it"
(it's codebase is simple and small enough for that)...

Also, I've seen a lot of lines in your code like:

Code:

If Not Dictionary.Exists(SomeKey) Then Dictionary.Add SomeKey, SomeItem End If

That means, that you already ensure uniqueness in said Dictionaries on your own ("by hand") -
and thus you could instruct cHashD, to leave out its Test for Uniqueness in the Add-Method
(either by setting the Public Prop UniqueKeys to False after Instantiation, or already ensuring
mUniqueKeys to False in Class_Initialize - as the new Default).

Also, in case you use only String-Keys, why not changing all Key-Parameters to Strings instead of
Variants - that will give it another little performance-kick.

What's also important is, that you can give hints to cHashD about the (roughly) expected
number of entries - in your case these seem to be "number of Text-Lines in *.inf-Files" or something...

What your performance-critical code also needs, is a bit more "structuring" IMO ...
Long Procedures with about 1000 lines of code (like your DevParserByRegExp) are not
really nice to read, nor easy maintainable.

Not sure, whether the following example is useful for you, but it calculates the
Line-differences of two "modern" (UTF16-LE) *.inf Files, as they exist these days e.g.
in the MS-SxS Folder.

The adapted cHashD-version in the following Demo-Zip: Attachment 141021
performs about 50% faster than the MS-Dictionary in this test (finding the LineDifferences
of two blue-tooth related, nearly identical Inf-Files with about 32K WChars each.

The time to read all the lines of these two files into two Dictionaries first:
- cHashD needs about 2msec
- MS-Dictionary about 3.7msec

To find the Line-differences in these two Dictionaries:
- cHashD needs about 0.4msec
- MS-Dictionary about 0.6msec
(the above measurement was done in a 100x repetition-loop, to allow for easier timing-comparison).

As said, not sure whether such a Line-Delta-Scan is useful in your scenario,
but if your App is thought to manage (or switch) between (nearly) identical Driver-Files,
then I can imagine that this could be of help (comparing only the handful of different lines
in two Files with a "deeper analysis" should be faster, than performing the deep analysis on
the Files as a whole).

HTH

Olaf

Re: Simple and fast, lightweight HashList-Class (no APIs)

Olaf:
Can u help make KeyByIndex property to be editable?

I have and these codes to cHashD, but can not make it.

Code:

Private Declare Function VariantCopyLng Lib "oleaut32" Alias "VariantCopy" (Dst As Long, Src As Long) As Long Public Property Let KeyByIndex(ByVal IndexZeroBased As Long, RHS) If IndexZeroBased < 0 Or IndexZeroBased >= mCount Then Err.Raise 9 Select Case mKeyMode Case eUseStrings: RHS = CStr(RHS) VariantCopyLng ByVal VarPtr(mKeys(IndexZeroBased)), ByVal VarPtr(RHS) Case eUseIntegers: VariantCopyLng ByVal VarPtr(mCurs(IndexZeroBased)), ByVal VarPtr(RHS) Case eUseDoubles: VariantCopyLng ByVal VarPtr(mDbls(IndexZeroBased)), ByVal VarPtr(RHS) End Select End Property Public Property Set KeyByIndex(ByVal IndexZeroBased As Long, RHS) If IndexZeroBased < 0 Or IndexZeroBased >= mCount Then Err.Raise 9 KeyByIndex(IndexZeroBased) = RHS End Property

Code:

Sub testll() Dim c As New cHashD c.Add 1, 11 c.KeyByIndex(0) = "2" Debug.Print c.KeyByIndex(0) 'the key is still "1" End Sub

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by loquat

Can u help make KeyByIndex property to be editable?

Why do you need that (I ask, because the need for something like that was near zero in the scenarios I've encountered over the last two decades).

I mean, there is some scenarios where it is needed, but these I usually cover with DBs and SQL.

Quote:

Originally Posted by loquat

I have and these codes to cHashD, but can not make it.

Code:

Private Declare Function VariantCopyLng Lib "oleaut32" Alias "VariantCopy" (Dst As Long, Src As Long) As Long Public Property Let KeyByIndex(ByVal IndexZeroBased As Long, RHS) If IndexZeroBased < 0 Or IndexZeroBased >= mCount Then Err.Raise 9 Select Case mKeyMode Case eUseStrings: RHS = CStr(RHS) VariantCopyLng ByVal VarPtr(mKeys(IndexZeroBased)), ByVal VarPtr(RHS) Case eUseIntegers: VariantCopyLng ByVal VarPtr(mCurs(IndexZeroBased)), ByVal VarPtr(RHS) Case eUseDoubles: VariantCopyLng ByVal VarPtr(mDbls(IndexZeroBased)), ByVal VarPtr(RHS) End Select End Property

There's no need to overcomplicate things with API-calls in this case -
simple Value-assignments would be enough, like: mKeys(IndexZeroBased) = CStr(RHS)

But with something like that, you're only "halfway-there", because the old Key you have just overriden,
will still link to the old Index-Position of Keys and Value (when doing a Hash-based search), -
and for the new Key no such Hash-Link exists yet.

So in general, such requirements are solved, by removing the old Pair (OldKey and OldItemValue) completely from the List -
followed by adding the new Pair (NewKey/OldItemValue) over the normal Add-mechanism.

Olaf

1 Attachment(s)

Re: Simple and fast, lightweight HashList-Class (no APIs)

Update: The latest version is contained in the Zip below, with the following benefits:
- better performance
- RemoveByIndex and Remove (by Key) is now included
- variable Keys (incl. Objects), which can also be mixed

Here's the result of a performance-comparison to the MS-Scripting.Dictionary (native compiled, all options checked):
http://vbRichClient.com/Downloads/cHashDWithRemove.png

Ok, here the Zip with the new version (including the little PerformanceTest).
Attachment 177177

Olaf

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by Schmidt

Update: The latest version is contained in the Zip below, with the following benefits:
- better performance
- RemoveByIndex and Remove (by Key) is now included
- variable Keys (incl. Objects), which can also be mixed

Here's the result of a performance-comparison to the MS-Scripting.Dictionary (native compiled, all options checked):
http://www.vbforums.com/images/ieimages/2020/05/1.png

Ok, here the Zip with the new version (including the little PerformanceTest).
Attachment 177177

Olaf

I just tried to copy this post code and try to build it to DLL and try it in VBA environment, the result of hash table runs slower than Ms's Dictionary !!!
Also on VB6 try with: Const Count As Long = 1048500

Code:

Sub HashTable_Dictionary() Const Count As Long = 1048500 Dim SD As Scripting.Dictionary Dim i As Long, T@ Set SD = New Scripting.Dictionary Dim HD As cHashTable Set HD = New cHashTable T = msTimer For i = 1 To Count SD.Add "khdkjhjhkjhkhkhkhkjhk " & i, i Next Debug.Print "Dictionary - AddItems", msTimer - T, SD.Count T = msTimer HD.ReInit Count For i = 1 To Count HD.Add "khdkjhjhkjhkhkhkhkjhk " & i, i Next Debug.Print "HashTable - AddItems", msTimer - T, HD.Count End Sub

Attachment 180419

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by PhuongNam

I just tried to copy this post code and try to build it to DLL and try it in VBA environment, the result of hash table runs slower than Ms's Dictionary !!!

Wild guess: you missed this text in parentheses in OP -- "(native compiled, all options checked)"

*Everything* in the IDE/p-code is slow because array index bounds checking cannot be switched off. Only *native* compiled executable has the benefit of accessing entries in a buffer without the overhead.

cheers,
</wqw>

Re: Simple and fast, lightweight HashList-Class (no APIs)

Hi ~ Schmidt

why when string array , it error ?

dim w() As string, saW As SAFEARRAY1D
BindArray w, VarPtr(saW) ' ----> VB6 show it is error !

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by Schmidt

Not much to add to the Threads Title...

the implementation of cHashList comes in only about 100 lines of code, and it can
be used as a VB.Collection-compatible replacement with much better performance. ... .. .
...
..
.
Olaf

Dear Schmidt,

Such sleek code (as usual) and such amazing speed (in comparison to Dictionary, etc.). A big wow, as usual. Thanks a TON.

Well, I tried out your code (just the adding routine alone, commenting out the rest, incl. the code related to the scripting Dictionary) and was curious to know as to what is the largest Count your hash table would support. In my system with 16GB RAM, I was able to add 450,83,000 records when I just added a long value as key as follows. More than 450,83,000 records, it resulted in error (Run-time error '7': Out of memory).

Code:

For i = 1 To Count: HD.Add i, i: Next

When I added a two-characters string as follows, I was able to add around 245,00,000 records. More than that, it resulted in error (Run-time error '7': Out of memory).

Code:

For i = 1 To Count: HD.Add "S " & i, i: Next

So, obviously, more the length of the key, less the no. of records I was able to add. Perhaps in a system with only 2GB memory, may be I will be able to add still further less records only for the same above codes. So, in real time, when my app is adding some records in user-end, how will my code be able to know that it might result in 'Out of memory' at some point of time and hence can't keep adding more records (and hence have to inform the user)? As such, as of now, I don't have the need to add 450,00,000+ records but I am just curious to know as to how to find out in real time whether adding any further records rmay result in 'out of memory' situation and inform the user accordingly OR take some other action.

Ideally, upto what value of Count can I safely use your 'hash table code' in any system (assuming 2GB is the lowest RAM I might encounter in an user's system)?

Kind regards.

EDIT-1:
I am using your latest code only (cHashDWithRemove)

EDIT-2:
My above observations were with the compiled code only; not in the IDE. By the way, I sometimes observed that for large values of Count, even though .Add (adding records) worked correctly uptil Count, if I give HD.Clear later (say, after printing HD.Count), it caused "Run-time error '7': Out of memory"

Kind regards.

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by Schmidt

Update: The latest version is contained in the Zip below, with the following benefits:
- better performance
- RemoveByIndex and Remove (by Key) is now included
- variable Keys (incl. Objects), which can also be mixed
... .. .
Ok, here the Zip with the new version (including the little PerformanceTest).
cHashDWithRemove.zip

Olaf

Dear Schmidt,

I tried the following code, wherein I have added duplicate keys and also tried to fetch all the items for a particular duplicate key.

Code:

HD.Add 1, 1 HD.Add 2, 2 HD.Add 1, 2 HD.Add 1, 3 HD.Add 1, 4 HD.Add 3, 3 HD.Add 1, 5 For i = 6 To 50000: HD.Add i, i: Next Dim v() As Variant v = HD.ItemsAll(1) Debug.Print UBound(v)

For "ItemsAll" function, I wrote my own code as follows as I could not find a way with your existing code to get that info. Sorry if a way exists and I did not know it. Kindly let me know that way, if it exists.

The following code works correctly for the above case but I am not sure whether it will work perfectly for all cases. Also, I think there must be a more optimal way to do it. Can you please provide me the optimal code for "ItemsAll", when your time permits, please.

I did not know how to use IndexByKey to find out the last index of a non-unique key. IndexByKey returns the first found index of a duplicate key only. Ideally, FindFirst, FindNext, and FindLast for any duplicate key would be fine. Can you provide me with the optimal codes for these functions as I don't know how to write them in the perfect/optimal way. Thanks in advance.

Code:

Public Function ItemsAll(key) As Variant() 'hand-out all the items for a non-unique Key If mCount = 0 Then ItemsAll = Array(): Exit Function Dim i As Long, j As Long, v() Dim itemsCnt As Long Dim idx As Long idx = IndexByKey(key) ReDim v(0 To mCount - (idx + 1)) itemsCnt = 0 j = idx For i = idx To mCount - 1 Do While mValues(j) = NoEntry: j = j + 1: Loop If KeyByIndex(i) = key Then VariantCopy v(itemsCnt), ByVal VarPtr(mValues(j)) itemsCnt = itemsCnt + 1 End If j = j + 1 Next ReDim Preserve v(0 To itemsCnt - 1) As Variant ItemsAll = v End Function

Kind regards.

Re: Simple and fast, lightweight HashList-Class (no APIs)

Perhaps a cHashD list of RC6.cArrayLists would be more appropriate? You use each cHashD key only once, and fill it with a cArrayList of values. Something like this:

Code:

Option Explicit Sub Test() Dim lo_Lists As cHashD Dim lo_List As RC6.cArrayList Dim l_Key As String Dim ii As Long Randomize Set lo_Lists = New cHashD For ii = 1 To 5000 l_Key = Int(Rnd * 10) + 1 If lo_Lists.Exists(l_Key) Then Set lo_List = lo_Lists(l_Key) Else Set lo_List = New_c.ArrayList(vbLong) lo_Lists.Add l_Key, lo_List End If lo_List.Add Int(Rnd * 100) + 1 Next ii l_Key = Int(Rnd * 10) + 1 Set lo_List = ItemsAll(lo_Lists, l_Key) For ii = 0 To lo_List.Count - 1 Debug.Print "Item #" & ii + 1 & " for key " & l_Key & ": " & lo_List(ii) Next ii End Sub Private Function ItemsAll(po_HashList As cHashD, ByVal p_Key As String) As RC6.cArrayList If po_HashList.Exists(p_Key) Then Set ItemsAll = po_HashList(p_Key) Else Err.Raise vbObjectError, , "Key not found." End If End Function

If you don't want the RC6 reference, you could always use a regular VB6 array, it just take a bit more work to manage the ReDimming efficiently.

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by jpbro

Perhaps a cHashD list of RC6.cArrayLists would be more appropriate? You use each cHashD key only once, and fill it with a cArrayList of values. Something like this:
... .. .
..
.
If you don't want the RC6 reference, you could always use a regular VB6 array, it just take a bit more work to manage the ReDimming efficiently.

First of all, thank you sooo much for taking time to answer my query and that too in such a helpful manner. Thank you sooo much.

Actually, in a project I commenced recently, I was/am handling it the way you have suggested only. I am reading a file once and holding its data in an array of structures. The structure has many fields and one of the date fields has multiple values, in 1000s, but for some keys alone. In other words, this date field can have a minimum of 1 value to any number of indeterminate values (which could be in 1000s). As written earlier, I am using variable array only for this field. And, as you said, Redim was an overhead since I had to Redim this array field (increasing by 1) as and when a new value came for the same key.

A few days after working with the aforesaid structure, I started learning about Collections. The need to use Collections had not arisen earlier and hence I did not know anything about it, as such. Once I started learning about Collections, I came to know about Dictionaries too and that they were faster than Collections. And, I read on the internet (from multiple sources) that the best and ideal way to handle multiple values for a single key is to have them as separate key-value pairs. This was with respect to database tables though. But, I thought perhaps the same would be ideal for Collections/Dictionaries too. And, while searching for Collection/Dictionary alternatives, I came upon Olaf's CHashD class which was super super fast when compared to Dictionaries. That is when I tried it and I posted my query with respect to ItemsAll function.

Well, if you feel even with 1000s of multiple values for a single key, the method which I was already employing and which is the one which you have also recommended is the best option, then I would go with it. I am not an expert in these areas. So, kindly give your confirmed opinion on whether this is the best way. Also, let me know what is the best way to redim. Since the count of the multiple values are not known beforehand, I am redimming every time a new value comes in for the same key. Is there a better way? I wish to mention that the number of keys is also in the 1000s.

Note:
Regarding RC6, I know about it well but because of lack of time, I am unable to explore, learn and integrate it with my app yet. And, a database table will not suit my case since I have to make some queries on user input instantly (as the user keeps typing) and display the muti-layer results real-time (on a microsecond level). Also, in addition to collections, I am not in a position to spend time to learn additionally about databases also (their integration, querying them, testing them, speed comparison, etc. etc.). So, a super-fast collection like Olaf's cHashTable will be the ideal one.

Very kind regards.

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by softv

I am reading a file once and holding its data in an array of structures.

The structure has many fields and one of the date fields has multiple values, in 1000s,

Is there any chance we could see the definition of this "structure" (the UDT-definition)?
(along with an info, which field of the structure is the "key-field")?

Olaf

Re: Simple and fast, lightweight HashList-Class (no APIs)

I tried 3 different approaches (compiled with all advanced optimizations enabled except Assume No Aliasing):

1. Using a cHashD object of cHashD object
2. Using a cHashD object of RC6.cArrayList object
3. Using a cHashD object of Variant Arrays

Here's the code:

Code:

Option Explicit Sub Test(Optional po_Form As Form) Dim lo_Lists As cHashD Dim lo_List As cHashD Dim l_Key As String Dim ii As Long Randomize New_c.Timing True Set lo_Lists = New cHashD For ii = 1 To 5000 l_Key = "KEY_" & Int(Rnd * 10) + 1 If lo_Lists.Exists(l_Key) Then Set lo_List = lo_Lists(l_Key) Else Set lo_List = New cHashD lo_List.UniqueKeys = False lo_Lists.Add l_Key, lo_List End If lo_List.Add "", Int(Rnd * 100) + 1 Next ii l_Key = "KEY_" & Int(Rnd * 10) + 1 Set lo_List = ItemsAll(lo_Lists, l_Key) If po_Form Is Nothing Then Debug.Print New_c.Timing Else po_Form.Print New_c.Timing End If End Sub Private Function ItemsAll(po_HashList As cHashD, ByVal p_Key As String) As cHashD If po_HashList.Exists(p_Key) Then Set ItemsAll = po_HashList(p_Key) Else Err.Raise vbObjectError, , "Key not found." End If End Function Sub Test2(Optional po_Form As Form) Dim lo_Lists As cHashD Dim lo_List As RC6.cArrayList Dim l_Key As String Dim ii As Long Randomize New_c.Timing True Set lo_Lists = New cHashD For ii = 1 To 5000 l_Key = "KEY_" & Int(Rnd * 10) + 1 If lo_Lists.Exists(l_Key) Then Set lo_List = lo_Lists(l_Key) Else Set lo_List = New_c.ArrayList(vbLong) lo_Lists.Add l_Key, lo_List End If lo_List.Add Int(Rnd * 100) + 1 Next ii l_Key = "KEY_" & Int(Rnd * 10) + 1 Set lo_List = ItemsAll2(lo_Lists, l_Key) If po_Form Is Nothing Then Debug.Print New_c.Timing Else po_Form.Print New_c.Timing End If End Sub Private Function ItemsAll2(po_HashList As cHashD, ByVal p_Key As String) As RC6.cArrayList If po_HashList.Exists(p_Key) Then Set ItemsAll2 = po_HashList(p_Key) Else Err.Raise vbObjectError, , "Key not found." End If End Function Sub Test3(Optional po_Form As Form) Dim lo_Lists As cHashD Dim la_ListsIdx() As Long Dim la_List() As Variant Dim l_ListsIdx As Long Dim l_ListIdx As Long Dim l_Key As String Dim ii As Long Randomize New_c.Timing True Set lo_Lists = New cHashD ReDim la_ListsIdx(32) For ii = 1 To 5000 l_Key = "KEY_" & Int(Rnd * 10) + 1 If lo_Lists.Exists(l_Key) Then la_List = lo_Lists(l_Key) l_ListsIdx = lo_Lists.IndexByKey(l_Key) l_ListIdx = la_ListsIdx(l_ListsIdx) + 1 If l_ListIdx > UBound(la_List) Then ReDim Preserve la_List(UBound(la_List) * 2) End If lo_Lists(l_Key) = la_List la_ListsIdx(l_ListsIdx) = l_ListIdx Else l_ListIdx = 0 la_List = MakeList lo_Lists.Add l_Key, la_List l_ListsIdx = lo_Lists.Count - 1 If l_ListsIdx > UBound(la_ListsIdx) Then ReDim Preserve la_ListsIdx(UBound(la_ListsIdx) * 2) End If la_ListsIdx(l_ListsIdx) = 0 End If la_List(l_ListIdx) = Int(Rnd * 100) + 1 Next ii For ii = 0 To lo_Lists.Count - 1 la_List = lo_Lists.ItemByIndex(ii) ReDim Preserve la_List(la_ListsIdx(ii)) lo_Lists.ItemByIndex(ii) = la_List Next ii l_Key = "KEY_" & Int(Rnd * 10) + 1 la_List = ItemsAll3(lo_Lists, l_Key) If po_Form Is Nothing Then Debug.Print New_c.Timing Else po_Form.Print New_c.Timing End If End Sub Private Function MakeList() As Variant() ReDim MakeList(32) End Function Private Function ItemsAll3(po_HashList As cHashD, ByVal p_Key As String) As Variant If po_HashList.Exists(p_Key) Then ItemsAll3 = po_HashList(p_Key) Else Err.Raise vbObjectError, , "Key not found." End If End Function

I'm not sure I've programmed the Variant Array approach optimally, as I didn't want to spend too much time on it. Fiddling with ReDim's and indexes is tiring when we have the cArrayList or cHashD classes handy. That said, I've tried to limit the # of ReDim's to save time, as ReDimmming+1 is a performance killer. Instead what I did is start with a "bucket" of 33 elements, then I keep doubling the size of the bucket as needed. This means we also need to track the current index for each bucket. That said, either I have a bug or the Variant Array list approach won't work with cHashD. I kept getting empty arrays back, so I suspect cHashD doesn't support Arrays as items.

The cArrayList approach was the fastest at ~7ms. The cHashD approach took ~35ms.

Anyway,I would recommend using a cHashD object of cHashD objects if you don't want to use the faster RC6.cArrayList method. At least until someone comes along with a better option.

Re: Simple and fast, lightweight HashList-Class (no APIs)

Quote:

Originally Posted by jpbro

I tried 3 different approaches (compiled with all advanced optimizations enabled except Assume No Aliasing):

1. Using a cHashD object of cHashD object
2. Using a cHashD object of RC6.cArrayList object
3. Using a cHashD object of Variant Arrays

... .. .
..
.

The cArrayList approach was the fastest at ~7ms. The cHashD approach took ~35ms.

Anyway, I would recommend using a cHashD object of cHashD objects if you don't want to use the faster RC6.cArrayList method. At least until someone comes along with a better option.

Thank you so much once again for taking time amidst your own hectic schedules to chalk out such detailed scenarios and also test cases for each of them. It was really helpful. Today I could get time to run and understand the 1st case ("cHashD object of cHashD object"). I think I will go with it, until I am able to get the right time to integrate the fabulous RC6 in my apps (I am sure, some day in future, I will integrate RC6, and I am eagerly looking forward to that day too).

Your idea (of cHashD object of cHashD object) just did not occur to my mind at all. Brilliant! Thanks a TON, once again.

Kind regards.