Page 1 of 2 12 LastLast
Results 1 to 40 of 54

Thread: Simple and fast, lightweight HashList-Class (no APIs)

  1. #1

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Simple and fast, lightweight HashList-Class (no APIs)

    Not much to add to the Threads Title...

    the implementation of cHashList comes in only about 100 lines of code, and it can
    be used as a VB.Collection-compatible replacement with much better performance.

    The following Methods/Properties are supported:
    - CompareMode (to switch between case-insensitive and case-sensitive compares, default-behaviour is like the VB.Collection)
    - UniqueKeys (to allow multiple entries with the same Key, when switched from the True-defaultsetting to False)
    - Count
    - Add(Item As Variant, Optional Key As String)
    - Exists(Key As String)
    - IndexByKey(Key As String)
    - KeyByIndex(ByVal IndexOneBased As Long)
    - Item(KeyOrOneBasedIndex As Variant) ... (Get, Let and Set)
    - ReInit(Optional ByVal ExpectedItemCount As Long = 5000)

    Indexed access (for both, Keys and Items) is by orders of magnitude faster than the VB-Collection.
    What's possible now as well (compared to the VB-Collection) is the ability to overwrite Item-Values
    (at a given Index- or Key-position).

    Note, that in the above List a Remove-Method is missing -
    I've left this out for two reasons:
    1) to demonstrate that a simplified HashList-Implementation doesn't necessarily need to be a linked List
    2) because Remove is not used very often, when a Collection is used primarily as a fast "Key-Lookup-Container"
    ... (for Queue- or Stack-scenarios one can always use the normal VBA.Collection)

    Performance is about 6 times as fast, when Key-Value-pairs are added -
    and about twice as fast when Items are retrieved per Key-Lookup...

    Here's a ScreenShot:


    Here's the Class-Code of cHashList:
    Code:
    Option Explicit
     
    Private Type DataTableEntry
      Key As String
      Value As Variant
    End Type
    Private Type HashTableEntry
      DataIndexes() As Long
    End Type
     
    Private DataTable() As DataTableEntry, HashTable() As HashTableEntry
    Private mCount As Long, mDTUBound As Long, mHashTableSize As Long
     
    Public CompareMode As VbCompareMethod, UniqueKeys As Boolean
    
    Private Sub Class_Initialize()
      UniqueKeys = True
      CompareMode = vbTextCompare
      ReInit
    End Sub
    
    Public Sub ReInit(Optional ByVal ExpectedItemCount As Long = 5000)
      mHashTableSize = 8
      Do Until mHashTableSize * 5 > ExpectedItemCount: mHashTableSize = mHashTableSize * 2: Loop
      ReDim HashTable(0 To mHashTableSize - 1)
      
      Dim i As Long
      For i = 0 To UBound(HashTable): ReDim HashTable(i).DataIndexes(0 To 0): Next
      mDTUBound = 16: ReDim DataTable(0 To mDTUBound)
      mCount = 0
    End Sub
    
    Public Property Get Count() As Long
      Count = mCount
    End Property
    
    Public Function Exists(Key As String) As Boolean
      Exists = FindIndex(Key, CalcHash(Key)) > 0
    End Function
    Public Function IndexByKey(Key As String) As Long
      IndexByKey = FindIndex(Key, CalcHash(Key))
    End Function
    
    Public Sub Add(Item, Optional Key As String)
    Dim HashValue As Long, UB As Long
      HashValue = CalcHash(Key)
      If UniqueKeys Then If FindIndex(Key, HashValue) Then Err.Raise 457
      
      'prolong and add to the new entries to the DataTable-Array
      mCount = mCount + 1
      If mDTUBound < mCount Then mDTUBound = mDTUBound * 1.5: ReDim Preserve DataTable(0 To mDTUBound)
      DataTable(mCount).Key = Key
      DataTable(mCount).Value = Item
      
      'add the new DataIndex to the proper Hash-Buckets .DataIndexes-Array
      With HashTable(HashValue)
        UB = UBound(.DataIndexes): UB = UB + 1
        ReDim Preserve .DataIndexes(0 To UB)
        .DataIndexes(UB) = mCount
      End With
    End Sub
    
    Public Property Get KeyByIndex(ByVal IndexOneBased As Long)
      If IndexOneBased < 1 Or IndexOneBased > mCount Then Err.Raise 9
      KeyByIndex = DataTable(IndexOneBased).Key
    End Property
    
    Public Property Get Item(KeyOrOneBasedIndex)
    Dim Index As Long
      If VarType(KeyOrOneBasedIndex) = vbString Then
        Index = FindIndex(KeyOrOneBasedIndex, CalcHash(KeyOrOneBasedIndex))
        If Index = 0 Then Err.Raise 457
      Else
        Index = KeyOrOneBasedIndex
        If Index < 1 Or Index > mCount Then Err.Raise 9
      End If
      If IsObject(DataTable(Index).Value) Then
        Set Item = DataTable(Index).Value
      Else
        Item = DataTable(Index).Value
      End If
    End Property
    
    Public Property Let Item(KeyOrOneBasedIndex, NewItem)
    Dim Index As Long
      If VarType(KeyOrOneBasedIndex) = vbString Then
        Index = FindIndex(KeyOrOneBasedIndex, CalcHash(KeyOrOneBasedIndex))
        If Index = 0 Then Err.Raise 457
      Else
        Index = KeyOrOneBasedIndex
        If Index < 1 Or Index > mCount Then Err.Raise 9
      End If
      If IsObject(NewItem) Then
        Set DataTable(Index).Value = NewItem
      Else
        DataTable(Index).Value = NewItem
      End If
    End Property
    Public Property Set Item(KeyOrOneBasedIndex, NewItem)
      Item(KeyOrOneBasedIndex) = NewItem
    End Property
    
    Private Function FindIndex(Key, ByVal HashValue As Long) As Long
    Dim i As Long, CM As VbCompareMethod
      With HashTable(HashValue)
        CM = CompareMode
        For i = 1 To UBound(.DataIndexes)
          If StrComp(Key, DataTable(.DataIndexes(i)).Key, CM) = 0 Then
            FindIndex = .DataIndexes(i): Exit Function
          End If
        Next
      End With 'returns Zero, when no Key can be found
    End Function
    
    Private Function CalcHash(Key) As Long
    Dim i As Long, L As Long, B() As Byte
      If CompareMode Then B = LCase$(Key) Else B = Key
      L = 7919
        For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next
      CalcHash = L * B(0) Mod mHashTableSize
    End Function
    
    Friend Sub CheckHashDistribution()
    Dim i As Long, UB As Long, cc As Long, Min As Long, Max As Long
      Min = &H7FFFFFFF
      For i = 0 To UBound(HashTable)
        UB = UBound(HashTable(i).DataIndexes)
        If UB Then
          If Min > UB Then Min = UB
          If Max < UB Then Max = UB
          cc = cc + 1
        End If
      Next
      Debug.Print "Distribution over a HashTable with"; UBound(HashTable) + 1; "slots:"
      Debug.Print "Used-HashSlots:"; cc
      Debug.Print "Min-Entries:"; Min
      Debug.Print "Max-Entries:"; Max
    End Sub
    And here the Code of the Test-Form:
    Code:
    Option Explicit
    
    Private Const TestEntryCount As Long = 100000
    
    Private C As Collection, H As cHashList
    
    Private Sub Form_Click()
    Dim i As Long, T!, Item
      AutoRedraw = True
      Cls
      Print "Count of Test-Entries:"; TestEntryCount; vbLf
      
      Set C = New Collection
      Set H = New cHashList
          H.ReInit TestEntryCount
          
      T = Timer
        For i = 1 To TestEntryCount
          C.Add i, "K" & i
        Next
      Print "Collection-Add:", Timer - T & "sec"
      
      T = Timer
        For i = 1 To TestEntryCount
          H.Add i, "K" & i
        Next
      Print "cHashList-Add:", Timer - T & "sec"; vbLf
      
      T = Timer
        For i = 1 To TestEntryCount
          Item = C.Item("K" & i)
        Next
      Print "Collection-ItemByKey:", Timer - T & "sec"
      
      T = Timer
        For i = 1 To TestEntryCount
          Item = H.Item("K" & i)
        Next
      Print "cHashList-ItemByKey:", Timer - T & "sec"
     
      Print vbLf; "Indexed access is not compared (it would be much faster per HashList)"
      H.CheckHashDistribution
    End Sub
    Have fun with it...

    Edit: Whilst the above Demo was more a "Concept-Showcase"... below is a performance-optimized version,
    acting more like the MS-Scripting-Dictionary now (with a Param-wise inversed Add-Method, but feel free
    to change it at will - the code-volume is still bearable with about 230 lines in
    cHashD).
    In the meantime I've also added support for Integer-, Double- and Object-Keys - but I think I'll leave it at that -
    since the Class is easy understandable and expandable...

    HashD.zip

    Performance of Case-Insensitive-TextcompareMode for Scripting-Dictionary and cHashD


    Performance of Binary-TextcompareModes for Scripting-Dictionary and cHashD


    Olaf
    Last edited by Schmidt; Sep 4th, 2016 at 12:07 PM.

  2. #2
    Lively Member
    Join Date
    Mar 2012
    Posts
    68

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Hi.
    Here is a quick comparison of the last wraps over VB.Collection

    cHashList much faster, but only for very large collections
    Name:  vb_compare.png
Views: 2805
Size:  32.7 KB
    If we take for example the value iteration 20000, it is not so simple - Scripting.Dictionary shows advantage over cHashList
    Name:  vb_compare2.png
Views: 2703
Size:  26.1 KB
    I compared only operation .Add, .Item, .Exist

    Elroy - http://www.vbforums.com/showthread.p...B6-Collections
    Schmidt - http://www.vbforums.com/showthread.p...Indexed-Access

  3. #3
    Hyperactive Member
    Join Date
    Aug 2016
    Posts
    437

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    In C#, String "K0" lower case return Hash -843466811, cHashList CalcHash returns 8611. Is it safe hash number?

    Code:
    public static int GetHashFromKey(string key)
       {
          if (key == null || key.Length == 0)
          {
             return -2147483648;
          }
          return key.ToLower().GetHashCode();
       }
    Code:
    Private Function CalcHash(Key) As Long
    Dim i As Long, L As Long, B() As Byte
      If CompareMode Then B = LCase$(Key) Else B = Key
      L = 7919
        For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next
      CalcHash = L * B(0) Mod mHashTableSize
    End Function
    Last edited by DaveDavis; Aug 30th, 2016 at 03:12 AM.

  4. #4
    Hyperactive Member
    Join Date
    Aug 2016
    Posts
    437

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Code:
    Private Function CalcHash(Key) As Long
    Dim i As Long, L As Long, B() As Byte
      If CompareMode Then B = LCase$(Key) Else B = Key
      L = 7919
        For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next
      CalcHash = L * B(0) Mod mHashTableSize
    End Function
    What is the name of algorithm?
    The algorithm is very smart. The possibility of collision is relatively ZERO. JAVA string hash has famous Aa vs BB.
    Last edited by DaveDavis; Aug 30th, 2016 at 07:10 AM.

  5. #5
    PowerPoster
    Join Date
    Jun 2015
    Posts
    2,224

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    @Romeo91

    According to your own tests, even with only 20,000 entries - cHashList is faster than Dictionary at Retrieval by Key.

    Scripting.Dictionary-ItemByKey:
    .046875 sec
    cHshList-ItemByKey:
    2.734E-02 --> .027344 sec

    watch out for those decimals...
    Last edited by DEXWERX; Aug 30th, 2016 at 10:22 AM.

  6. #6
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    7,840

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Just as an FYI, since my code is mentioned in this thread ... my Collections wrapper was written for one primary reason (and just had some extra methods thrown in for good measure).

    Hopefully, this code illustrates the motivation for my wrapper:

    Code:
    
        Dim c As New CollectionEx
        'Dim c As New Collection ' To just use built-in VB6 collection. 
        Dim k1 As String
        Dim k2 As String
    
    
        k1 = ChrW$(&H1056) & ChrW$(&H1060) & ChrW$(&HBFD) & ChrW$(&H15F1) & ChrW$(&H31B1) & ChrW$(&HA19E)
        k2 = ChrW$(&HD9A2) & ChrW$(&H2770) & ChrW$(&H4DE5) & ChrW$(&H1AF0) & ChrW$(&HA28B) & ChrW$(&H2F86)
    
    
        c.Add "asdf", k1
        c.Add "qwer", k2 ' <---- Error on built-in,  error on Schmidt,  NO error on Elroy. 
    
    I knew I was sacrificing some speed to accomplish this. But, for people who want to use binary keys, it gets the job done.

    Regards,
    Elroy
    Any software I post in these forums written by me is provided “AS IS” without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. Please understand that I’ve been programming since the mid-1970s and still have some of that code. My contemporary VB6 project is approaching 1,000 modules. In addition, I have a “VB6 random code folder” that is overflowing. I’ve been at this long enough to truly not know with absolute certainty from whence every single line of my code has come, with much of it coming from programmers under my employ who signed intellectual property transfers. I have not deliberately attempted to remove any licenses and/or attributions from any software. If someone finds that I have inadvertently done so, I sincerely apologize, and, upon notice and reasonable proof, will re-attach those licenses and/or attributions. To all, peace and happiness.

  7. #7
    PowerPoster
    Join Date
    Jun 2015
    Posts
    2,224

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    cHashlist doesn't suffer this problem, as it uses its own binary hash function.

  8. #8
    Hyperactive Member
    Join Date
    Aug 2016
    Posts
    437

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    I abuse the code for learning purpose.
    If we change mHashTableSize from 8 to 9 in ReInit, then the Hash of string "KK" and "LL" both are 0:
    H.ReInit
    H.Add 1, "KK"
    H.Add 2, "LL"

    I am worried about the Hash value is not identical or has collision.

    Code:
    Public Sub ReInit(Optional ByVal ExpectedItemCount As Long = 5000)
      mHashTableSize = 9
      Do Until mHashTableSize * 5 > ExpectedItemCount: mHashTableSize = mHashTableSize * 2: Loop
      ReDim HashTable(0 To mHashTableSize - 1)
      
      Dim i As Long
      For i = 0 To UBound(HashTable): ReDim HashTable(i).DataIndexes(0 To 0): Next
      mDTUBound = 16: ReDim DataTable(0 To mDTUBound)
      mCount = 0
    End Sub
    
    Private Function CalcHash(Key) As Long
    Dim i As Long, L As Long, B() As Byte
      If CompareMode Then B = LCase$(Key) Else B = Key
      L = 7919
        For i = UBound(B) To 0 Step -1: L = (i + B(i) + L) * 37 And &H7FFFFF: Next
      CalcHash = L * B(0) Mod mHashTableSize
    End Function
    Last edited by DaveDavis; Aug 30th, 2016 at 07:09 PM.

  9. #9

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by Romeo91 View Post
    If we take for example the value iteration 20000, it is not so simple - Scripting.Dictionary shows advantage over cHashList
    The MS-Dictionary operates per default in Binary-StringComparemode,
    whilst cHashList is operating in (case-insensitive) TextComparemode by default.

    Maybe that was throwing things off a bit.

    However - cHashList was just showcasing the concept of a simple HashTable-approach...
    If you need more speed - please check-out the new performanceoptimized derivate (cHashD),
    I've just put a Project-Zip-Attachment online at the bottom of the opener-post.

    Olaf

  10. #10

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by DaveDavis View Post
    I abuse the code for learning purpose.
    If we change mHashTableSize from 8 to 9 in ReInit, then the Hash of string "KK" and "LL" both are 0:
    H.ReInit
    H.Add 1, "KK"
    H.Add 2, "LL"

    I am worried about the Hash value is not identical or has collision.
    As to your other question - the algorithm is "my own invention" (inspired a bit
    by the Random-Key-Generator of the RC4-StreamCypher-algo).

    And that two different String-Keys produce the same Hash-Value is entirely
    normal - and expected in this scenario...

    The Hash-Algorithm is only expected to deliver:
    - always the same HashValue for a given Key and a given HashTable-Size
    - thereby maintaining a relative good distribution over the HashTable-Size (the amount ot buckets).

    The Zero you get for both "KK" and "LL" is "just another index" into the given HashTable.

    The HashTable works well enough and quite fast, when later, in any given Hash-Bucket,
    there's roughly 3-6 "data-index-pointers", after the expected amount of Key-Value-Pairs "has settled in".

    Example for the new Zip-Project (cHashD) in the opener-post, which has a slightly changed algo
    (now looping over WChars in a VB-SafeArray which was spanned over the String-Key):

    E.g. when you expect about 20000 Key-Value-Pairs in your Hashing-Dictionary
    (as is the default in the Test-Routine of the just mentioned new Demo),
    the calculation for the HashTable-Size (which prefers Powers of two) -
    comes up with 8192 (as the available amount of Buckets in the HashTable-Array).

    That means, that the later on added 20000 Key-Value-Pairswould need to be distributed
    (ideally) with 20000/8192 = 2.4 ---> meaning: "2-3 data-index-pointers per bucket".

    Now, since we cannot expect the Hashing-Algo to provide such an ideal
    distribution (especially not with such short String-Keys as used in the example),
    we have to be satisfied with the following output:
    (produced by the Friend Sub CheckHashDistribution() at the end of cHashD)
    Code:
    Distribution over a HashTable with 8192 slots:
    Used-HashSlots: 8109 
    Min-Entries: 1 
    Max-Entries: 6
    From an "ideally distributing" Hash-Algo, we would have expected:
    Code:
    Distribution over a HashTable with 8192 slots:
    Used-HashSlots: 8192 
    Min-Entries: 2 
    Max-Entries: 3
    But the results we achieved from the small and fast, but "less than ideal" algo are good enough,
    to outperform the MS-Scripting.Dictionary. Between the endpoints of 1 and 6 as seen in the
    Debug-Output, there'll be a gaussian normal-distribution IMO, so most of the entries *are*
    in the expected 2-3-range...

    So, the HashAlgo as seen in cHashList or (slightly changed in cHashD) is not thought
    as a "secure Hash with at least 128Bit, which minimizes the probability of a collision to near zero"
    (the Modulo-Op at the end the algo, which "folds" any calculated HashValue back onto
    the available Index-Space for the HashTable-Array prevents that anyways, as does
    the "only 32Bit long, randomizing carrier-variable" L).

    The quality of such fast performing "Short-Hashes" will only show in the achieved
    (hopefully equal) distribution across the available buckets (or Index-Slots).

    I hope that clears a few things up...

    Olaf
    Last edited by Schmidt; Aug 30th, 2016 at 09:55 PM.

  11. #11
    PowerPoster
    Join Date
    Feb 2015
    Posts
    2,181

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    BTW, The Microsoft Dictionary from the scrrun.dll uses next hashing algorithm for strings:
    Code:
    INT CalcHashFromStringItem (BSTR bstrKey) {
        INT iResult = 0;
        
        for (INT i = 0; i < SysStringLen(bstrKey); i++)
            iResult = iResult * 17 + bstrKey[i];
        
        iResult %= 1201;
        
        return iResult;
        
    }
    I make an analog for testing it in VB6:
    Code:
    Option Explicit
    
    Private Declare Function GetMem4 Lib "msvbvm60" ( _
                             ByRef src As Any, _
                             ByRef Dst As Any) As Long
    
    Private Sub Form_Load()
        Dim sKey    As String
        Dim pDic    As Dictionary
        Dim index   As Long
        
        Set pDic = New Dictionary
        
        For index = 0 To 10
            sKey = MakeRandomString()
            Debug.Print pDic.HashVal(sKey), Dictionaty_HashVal(sKey)
        Next
        
    End Sub
    
    Private Function MakeRandomString() As String
        Dim l As Long:  Dim i As Long
        
        l = Int(Rnd * 10) + 5
        
        Do While l
            
            i = Int(Rnd * 62)
            
            Select Case i
            Case Is < 10
                i = i + 48
            Case Is < 36
                i = i - 10 + 65
            Case Else
                i = i - 36 + 97
            End Select
            
            MakeRandomString = MakeRandomString & Chr$(i)
            
            l = l - 1
            
        Loop
        
    End Function
    
    Private Function Dictionaty_HashVal( _
                     ByRef sValue As String) As Long
        Dim index   As Long
        Dim result  As Long
        
    #If COMPILED Then
        
        ' // We can't use unsigned arithmetic without overflow in IDE
        For index = 0 To Len(sValue) - 1
            result = result * 17 + AscW(Mid$(sValue, index + 1, 1))
        Next
        
        result = result Mod 1201
        
    #Else
        
        ' // In order to bypass overflow error, we use currency 64-bit numbers
        Dim cResult As Currency
        
        For index = 0 To Len(sValue) - 1
        
            cResult = cResult * 17 + AscW(Mid$(sValue, index + 1, 1)) / 10000@
            
            ' // Zero high dword
            If cResult > 429496.7295@ Then
                GetMem4 0&, ByVal VarPtr(cResult) + 4
            End If
            
        Next
        
        cResult = cResult * 10000@
        result = cResult - Int(cResult / 1201@) * 1201@
        
    #End If
        
        Dictionaty_HashVal = result
        
    End Function
    Also i attached the project that allows make some tricks with MS Dictionary (sorting/get key/etc.). This knowledge had been obtained during reversing scrrun.dll code on Windows 7 x64 therefore it can distinct on the different platforms.
    Name:  DicViever.png
Views: 2681
Size:  14.7 KB
    Attached Files Attached Files

  12. #12
    Hyperactive Member
    Join Date
    Aug 2016
    Posts
    437

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    You are the genius, The Trick.
    I didn't see VB's implementation of C# string GetHashCode. From Reflector we can see the code, but hard part is that VB's Long is not equal C#'s Int. Probably we will get Over Flow error during LeftShift or RightShift. If we have to use BigInteger, It is harder.

    Code:
    public override unsafe int GetHashCode()
            {
                if (HashHelpers.s_UseRandomizedStringHashing)
                {
                    return InternalMarvin32HashString(this, this.Length, 0L);
                }
                fixed (char* str = ((char*) this))
                {
                    char* chPtr = str;
                    if (chPtr != null)
                    {
                        chPtr += RuntimeHelpers.OffsetToStringData;
                    }
                    int num = 0x15051505;
                    int num2 = num;
                    int* numPtr = (int*) chPtr;
                    int length = this.Length;
                    while (length > 2)
                    {
                        num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
                        num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
                        numPtr += 2;
                        length -= 4;
                    }
                    if (length > 0)
                    {
                        num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
                    }
                    return (num + (num2 * 0x5d588b65));
                }
            }
    Last edited by DaveDavis; Aug 31st, 2016 at 06:32 PM.

  13. #13

  14. #14
    Lively Member
    Join Date
    Aug 2016
    Posts
    101

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by The trick View Post
    What do you mean?

  15. #15
    Lively Member
    Join Date
    Aug 2016
    Posts
    101

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    I have been wondering, although it's a VB Forum, that if this whole thing is implemented in C++, will it be much faster? What's the difference if it's implemented in C++ and used by VB?

  16. #16

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by Resurrected View Post
    I have been wondering, although it's a VB Forum, that if this whole thing is implemented in C++, will it be much faster? What's the difference if it's implemented in C++ and used by VB?
    Not really (not by much).

    See, the MS-Scripting.Dictionary *is* already implmented in C/C++ (by people who knew their stuff) -
    and still the VB-Implementation (along with the HashFunction I've used) runs circles around it,
    when larger amounts of Key/Value-Pairs are used within cHashD.

    The VB6-native compiler is based on VC++6 and produces decent enough binaries speedwise.
    What it then comes down to (as always), is the choosen algorithm (and a bit of experience,
    to avoid unnecessary memory-allocations)...

    Olaf
    Last edited by Schmidt; Sep 2nd, 2016 at 11:04 AM.

  17. #17

  18. #18

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by The trick View Post
    Resurrected, you can test my hash table. It works faster than Dictionary on my computer. This class has been written on VB6.
    Anatoli, this thread here is to discuss the Demo-Code for:
    - cHashList (roughly compatible to the VB-Collection)
    - cHashD (roughly compatible to the Scripting.Dictionary)

    I don't have anything against pointing out better alternatives in CodeBank-Threads,
    but if someone does, there should be a few good reasons, as e.g. better performance,
    or less and easier to understand code.

    Your HashTable-Class-Implementation is not really an alternative to cHashD in either regard because:
    - it is much slower in most tests
    - and it contains much more code to read, maintain and understand (including assembler-blocks)

    Performance-Results, using your slightly adapted Test-Project: Trick Hash.zip
    Using 100000 Key/Value-pairs (as in your Demo) cHashD is:
    - about factor 4 as fast whilst adding these Key/Value-pairs
    - about factor 6 as fast with Keyed-Item access
    - about factor 5 as fast in an enumeration of Key/Value-pairs

    - about factor 1.5 slower in the For Each Enumeration of Keys

    So, besides the last point, there's only huge gains when using cHashD -
    (instead of the MS-Dictionary or your HashList-implementation)
    and all of them in areas which are really important for such a Container.

    Olaf
    Last edited by Schmidt; Sep 2nd, 2016 at 11:44 AM.

  19. #19
    PowerPoster
    Join Date
    Feb 2015
    Posts
    2,181

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by Schmidt View Post
    Anatoli, this thread here is to discuss the Demo-Code for:
    - cHashList (roughly compatible to the VB-Collection)
    - cHashD (roughly compatible to the Scripting.Dictionary)
    Olaf, my apologies but i responded to Resurrected. I showed him that full analog of MS Dictionary written on VB6 works faster than origin written on C++. I showed him the FULL analog of Dictionary as response to:
    I have been wondering, although it's a VB Forum, that if this whole thing is implemented in C++, will it be much faster? What's the difference if it's implemented in C++ and used by VB?
    I don't have something against your class, i answered to Resurrected post. I was doing my class as replacement for MS Dictionary.
    As test of performance:

    Your class is far NO analog of Dictionary:
    1. Your class have only string keys, dictionary and mine variant. And all tests works with string keys and String variables therefore it faster than mine. My class can use any keys 1, 1.0434@, &H100, Me, Null, Empty, etc like Dictionary, your doesn't. Just look at my CalcHash procedure.
    2. If i compile project and tries to access to your cHashD through For Each it get crash (bug?), therefore i can't check performance through For each.
      Name:  HD_Bug.jpg
Views: 2617
Size:  28.5 KB
    3. Your code can cause potential memory leak. For example:
      Code:
      Dim z   As Long
      Dim p   As Long
      Dim l   As Long
      
      For z = 0 To 100
          For p = 0 To UBound(HD.Items)
              l = l + Len(HD.ItemByIndex(p))
          Next
      Next
      
      MsgBox l
      Compile and look memory usage. Same with Keys properties.
    4. Also please make correct test in For each :
      Code:
      For Each dat In HD.Items
          out = out + Len(dat)
      Next
    5. I can't add Object item. This is probably bug you should use either VariantCopy or make checking IsObject.
    6. I've not found Remove method. How can i remove an item from your hash table by a key/index?
    7. In the most cases i can just replace Dictionary without changing any source code by my hash table. Your code requires to do changes in source code.


    Again, i don't have something against your class. I just wanted to answer to the Resurrected's post and as the example i suggested my hash table. I didn't tell that my class is better than you or something else. Our classes has different functional your class is quite small and understandable, mine is more functional but bigger and slower, but as an analog of Dictionary it's better (this class was creating as replacement of Dictionary):
    Represent a standalone class implements a hash table, which in many cases can be a substitute for the dictionary (Dictionary) of Scripting runtime.
    Regarding,
    Krivous Anatoly (The trick).

  20. #20

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Just a few short comments to the above:

    Thanks for pointing out the Bug in the Add-method (which is now accepting Objects as Items, too).

    As for "the crash" ... I tried my best to reproduce it (unsuccessfully), but changed the return-values of .Items() and Keys()
    to a normal Variant-Container now anyways (in my opinion there's a good chance, that the crash has more to do with your
    Assembly-stuff, sitting in the same project).

    As for your HashTableClass (since you made it a topic here):
    - no, it is not a full replacement for the MS-Dictionary (for that you will have to change your Item-Prop-Implementation)
    - and no, it is only partly an example to serve as "VB-Code being faster than C" because it is quite an amount slower than the MS-Dictionary in Binary-Comparemode.
    - no, my For Each (over the Keys) did the same as yours (since your Class was saying: hsh.EnumMode = ENUM_BY_KEY = True)

    As for the whole topic of "being compatible to the MS-Dictionary":
    - neither cHashD nor clsTrickHashTable are fully compatible (the difference is that you claim that to be the case, whereas I did not. I said "roughly compatible")
    - my aim is, to cover about 90% of the use-cases (and no, the .Remove-Single-Item call is not among those)
    - with a bit of additional work, you could make yours probably 99% compatible - maybe even 100%
    - I will not do that for cHashD

    Nevertheless, one can throw all three Class-Types into a real-world project - and use them there without
    changing the code (much). In most cases only the ClassVariable-Definitions have to be changed (type-wise).

    Here's an example, which makes good use of quite a lot of Scripting.Dictionary methods - and
    both (cHashD and clsTrickHashTable) can be used without changes in that WordCounting-Scenario:

    Here's the test-results for scanning a "work of Shakespeare-Text-File": http://www.gutenberg.org/files/100/
    Performance in Binary-CompareMode (case-sensitive):
    - 465msec (cHashD)
    - 755msec (Dictionary)
    - 1201msec (clsTrickHashTable)

    Performance in Text-CompareMode (case-insensitive):
    - 647msec (cHashD)
    - 1277msec (clsTrickHashTable)
    - 3341msec (Dictionary)

    Your class is not bad (especially in TextCompareMode) - but as said, not always faster than the MS-Dictionary.

    Here's the WordScan-Test-Code: FastWordScan.zip

    And here a ScreenShot:


    Thanks again for testing cHashD, pointing out that Object-Add-Bug (the Zip-Download in the Opener-Post is now updated).

    Olaf
    Last edited by Schmidt; Sep 3rd, 2016 at 06:49 PM. Reason: Corrected the link to the WordScan-Demo

  21. #21
    PowerPoster
    Join Date
    Feb 2015
    Posts
    2,181

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    As for "the crash" ... I tried my best to reproduce it (unsuccessfully), but changed the return-values of .Items() and Keys()
    to a normal Variant-Container now anyways (in my opinion there's a good chance, that the crash has more to do with your
    Assembly-stuff, sitting in the same project).

    - no, it is not a full replacement for the MS-Dictionary (for that you will have to change your Item-Prop-Implementation)
    What do you mean?
    - and no, it is only partly an example to serve as "VB-Code being faster than C" because it is quite an amount slower than the MS-Dictionary in Binary-Comparemode.
    I can bet with you. This is another test:
    Code:
    Option Explicit
    
    Private Declare Function QueryPerformanceFrequency Lib "kernel32" ( _
                             ByRef lpFrequency As Any) As Long
    Private Declare Function QueryPerformanceCounter Lib "kernel32" ( _
                             ByRef lpPerformanceCount As Any) As Long
                             
    Dim d   As Dictionary
    Dim t   As clsTrickHashTable
    
    Private Sub Form_Load()
        Dim index   As Long
        Dim cFreq   As Currency
        Dim cVal    As Currency
        Dim time    As Currency
        Dim tDic    As Currency
        Dim tTrick  As Currency
        Dim sKeys() As String
        
        MakeRandomKeys sKeys()
        
        QueryPerformanceFrequency cFreq
        
        Set d = New Dictionary
        Set t = New clsTrickHashTable
        
        ' // Check binary compare
        t.CompareMode = BinaryCompare
        d.CompareMode = BinaryCompare
        
        QueryPerformanceCounter time
        
        For index = 0 To UBound(sKeys)
            d.Add sKeys(index) & index, Me
        Next
        
        QueryPerformanceCounter tDic: tDic = tDic - time
        
        QueryPerformanceCounter time
        
        For index = 0 To UBound(sKeys)
            t.Add sKeys(index) & index, Me
        Next
        
        QueryPerformanceCounter tTrick: tTrick = tTrick - time
        
        MsgBox "Dictionary: " & (tDic / cFreq) * 1000 & " ms." & vbNewLine & _
               "TrickHashTable: " & (tTrick / cFreq) * 1000 & " ms." & vbNewLine
    End Sub
    
    Private Sub MakeRandomKeys( _
                ByRef sKeys() As String)
        Dim index   As Long
        Dim lLen    As Long
        Dim char    As Integer
        
        ReDim sKeys(500000)
        
        For index = 0 To UBound(sKeys)
            
            lLen = Int(Rnd * 50) + 10
                
            Do While lLen
                
                char = Int(Rnd * 62)
                
                Select Case char
                Case Is < 10:   char = char + 48
                Case Is < 36:   char = char + 55
                Case Else:      char = char + 61
                End Select
                
                sKeys(index) = sKeys(index) & Chr$(char)
                
                lLen = lLen - 1
                
            Loop
        
        Next
        
    End Sub
    On my computer Windows 7 x64 SP1 AMD Athlon(tm) II X2 245 2.90 Ghz 6GB RAM:
    ---------------------------
    Project1
    ---------------------------
    Dictionary: 24905,1339625567 ms.

    TrickHashTable: 15037,1265761431 ms.


    ---------------------------
    ОК
    ---------------------------

    Overall (mine and yours tests) both Dictionary and TrickHashTable work identically with Binary compare mode (however i wrote so in the original topic):
    Run fast enough on my computer about as well (even a bit faster) as a dictionary with binary comparison...
    With Text compare mode my class works faster than Dictionary.
    - no, my For Each (over the Keys) did the same as yours (since your Class was saying: hsh.EnumMode = ENUM_BY_KEY = True)
    No, you enumerated String keys. Mine Variant, ie. any Object, Boolean, String, etc. simultaneously.
    the difference is that you claim that to be the case, whereas I did not. I said "roughly compatible"
    Then why did you compare our classes? Maybe i'm wrong but you didn't say why my class isn't FULL compatible with MS Dictionary. Please tell me then i make it full compatible.
    - my aim is, to cover about 90% of the use-cases (and no, the .Remove-Single-Item call is not among those)
    The controversial statement, my class is replacement of MS Dictionary. Dictionary/Collection contains Remove method.
    - with a bit of additional work, you could make yours probably 99% compatible - maybe even 100%
    Please tell me what should i add in order to make full compatibility.
    Nevertheless, one can throw all three Class-Types into a real-world project - and use them there without
    changing the code (much). In most cases only the ClassVariable-Definitions have to be changed (type-wise).
    Okay, if you compare our classes and MS Dictionary you should suggest alternative. Your class isn't alternative. You can add 100 arrays with different keys types, but it won't be replacement.
    since you made it a topic here
    You became to compare our classes.
    About your test. Your class don't support different keys types (like TrickHashTable, Dictionary). For example if i want to have associate an object with other object i can't do it through your class. Of course you can say "my class shouldn't support it" - okay, then don't compare them:
    Your HashTable-Class-Implementation is not really an alternative to cHashD in either regard
    I can also write same.
    Regarding,
    Krivous Anatoly (The trick)

  22. #22
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,114

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    @trick - Olaf alluded to a problem with your hash table re: 100% compatibility with the scripting Dictionary object when he said: "(for that you will have to change your Item-Prop-Implementation)"

    I think he was referring to the fact that the Dictionary Item property Let will perform an Add operation if the item doesn't already exist, whereas your class will raise an Error 5. See this test:

    Code:
    Sub Test()
       Dim x As New Scripting.Dictionary
       
       x.Item("Test") = "Hello"
       Debug.Print x.Item("Test")
       
       Dim y As New clsTrickHashTable
       
       y.Item("Test") = "Goodbye"
       Debug.Print x.Item("Test")
    End Sub

  23. #23
    PowerPoster
    Join Date
    Feb 2015
    Posts
    2,181

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by jpbro View Post
    @trick - Olaf alluded to a problem with your hash table re: 100% compatibility with the scripting Dictionary object when he said: "(for that you will have to change your Item-Prop-Implementation)"

    I think he was referring to the fact that the Dictionary Item property Let will perform an Add operation if the item doesn't already exist, whereas your class will raise an Error 5. See this test:

    Code:
    Sub Test()
       Dim x As New Scripting.Dictionary
       
       x.Item("Test") = "Hello"
       Debug.Print x.Item("Test")
       
       Dim y As New clsTrickHashTable
       
       y.Item("Test") = "Goodbye"
       Debug.Print x.Item("Test")
    End Sub
    Thank you very much. I'll fix it.

  24. #24

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by The trick View Post

    I can bet with you. This is another test:
    Code:
    Option Explicit
    
    Private Declare Function QueryPerformanceFrequency Lib "kernel32" ( _
                             ByRef lpFrequency As Any) As Long
    Private Declare Function QueryPerformanceCounter Lib "kernel32" ( _
                             ByRef lpPerformanceCount As Any) As Long
                            
    Dim d   As Dictionary
    Dim t   As clsTrickHashTable
    
    Private Sub Form_Load()
        Dim index   As Long
        Dim cFreq   As Currency
        Dim cVal    As Currency
        Dim time    As Currency
        Dim tDic    As Currency
        Dim tTrick  As Currency
        Dim sKeys() As String
        
        MakeRandomKeys sKeys()
        
        QueryPerformanceFrequency cFreq
        
        Set d = New Dictionary
        Set t = New clsTrickHashTable
        
        ' // Check binary compare
        t.CompareMode = BinaryCompare
        d.CompareMode = BinaryCompare
        
        QueryPerformanceCounter time
        
        For index = 0 To UBound(sKeys)
            d.Add sKeys(index) & index, Me
        Next
        
        QueryPerformanceCounter tDic: tDic = tDic - time
        
        QueryPerformanceCounter time
        
        For index = 0 To UBound(sKeys)
            t.Add sKeys(index) & index, Me
        Next
        
        QueryPerformanceCounter tTrick: tTrick = tTrick - time
        
        MsgBox "Dictionary: " & (tDic / cFreq) * 1000 & " ms." & vbNewLine & _
               "TrickHashTable: " & (tTrick / cFreq) * 1000 & " ms." & vbNewLine
    End Sub
    
    Private Sub MakeRandomKeys( _
                ByRef sKeys() As String)
        Dim index   As Long
        Dim lLen    As Long
        Dim char    As Integer
        
        ReDim sKeys(500000)
        
        For index = 0 To UBound(sKeys)
            
            lLen = Int(Rnd * 50) + 10
                
            Do While lLen
                
                char = Int(Rnd * 62)
                
                Select Case char
                Case Is < 10:   char = char + 48
                Case Is < 36:   char = char + 55
                Case Else:      char = char + 61
                End Select
                
                sKeys(index) = sKeys(index) & Chr$(char)
                
                lLen = lLen - 1
                
            Loop
        
        Next
        
    End Sub
    Nope, no success here regarding "your bet", since on my Win8.1-system
    I get (native compiled of course) the following results with your above code-snippet:


    That the Dictionary performs similarily to your Class might be due to a better Mem-Allocator
    in Win8.1 (compared to the one in your Win7) - but more likely it is the CPU-differences (AMD vs. Intel)
    which is favouring one or the other implementation (and theres by far more Intel-based machines
    out there these days than AMD-powered ones).

    Also note, that cHashD is more than 10 times faster than either of the other Dictionaries
    in such a "mass amount test" (like your 500,000 entries) of Key-Value-Pairs.

    Quote Originally Posted by The trick View Post
    Okay, if you compare our classes and MS Dictionary you should suggest alternative. Your class isn't alternative.
    I thought I already gave a real-world-example (the Word-Counting one), where each of our Classes
    can be used as a simple "drop-in" replacement to the MS-Dictionary (without changing the CodeBase,
    by just changing the Class-Types).

    As I already said, I don't aim to be fully compatible to the MS-Dictionary - but cHashD is "compatible enough"
    to be used as a direct replacments in 90% of the real-world-scenarios out there (that's what I meant with "use-cases").

    These real-world-scenarios involve:
    - first and foremost: fast adding and retrieving of Variant-Items per StringKey
    - to a lesser extend: a fast Exists-Method (also mostly used with StringKeys)
    - the ability to exchange an Item behind an existing Key to a different one
    - and occasionally the usage of the RemoveAll-Method
    - fast enumeration of the complete Key-Value-Pair by accessing them per Index also comes up quite often

    The last one is not available in the MS-Dictionary or in your Class - but it is a real-world-requirement
    which is very handy to have in many situations.

    And in all my decades of experience with real-world applications, I very rarely needed the Remove-Method
    on such Key/Value-pair containers (your mileage may vary).

    Quote Originally Posted by The trick View Post
    You became to compare our classes.
    Well, I didn't intend to - not here in this CodeBank-Thread - it was you who brought clsTrickHashTable into the discussion.

    Quote Originally Posted by The trick View Post
    Your class don't support different keys types (like TrickHashTable, Dictionary).
    It does support Integer Keys (Byte, Integer, Long up to Currency) and "Float-Keys" (Singles, Dates and Doubes) now.
    In these modes cHashD will act even faster than with String-Keys.

    What it doesn't support is, to use these different Key-Types in a "mixed way" - (would appreciate though,
    when you could give me a real-world-example, where you used such "mixed-Keys-capabilities" to great success).

    Quote Originally Posted by The trick View Post
    For example if i want to have associate an object with other object i can't do it through your class. Of course you can say "my class shouldn't support it" - okay, then don't compare them:
    Well, it was easy enough to add that feature with a dozen lines of code (have now updated
    the HashD.zip-Download at the end of the Top-Posting in this thread accordingly)...

    Besides having it added now (just for kicks) - I'd like to see a real-world-example from you, where such Object-associations per Dictionary would be "nice to have"...
    Usually when I need that, I simply add a Property:
    Code:
    Public AssociatedObject As Object
    ...into the Class in question - later assign the "Side-Object" to it and be done with the task.

    As for your: "don't compare our Classes"... why not, when they both are useful in scenarios,
    where the VB6-community formerly used the MS-Scripting.Dictionary.

    cHashD is extremely fast (especially with a larger amount of Key/Value-pairs, where the
    over-all-timings are often in the range of seconds) - scenarios similar to my WordCounting-example
    are a quite common use-case.

    In Use-Cases where e.g. the Remove-Method becomes paramount, your Class seems to be a good alternative, too.

    Olaf

  25. #25
    Hyperactive Member
    Join Date
    Aug 2016
    Posts
    437

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    If we maintain Hash value in sorted Asc Order, whatever inserting or adding or changing a Key, then FindIndex or Key removing is faster by using Binary Search.

  26. #26
    Lively Member
    Join Date
    Mar 2012
    Posts
    68

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    In tests, it looks good, it cHashD better than other analogs in speed. But in the real application I see a noticeable performance drop.

    I use methods add, item, exist, removeall, type of comparison-vbbinarycompare.
    My program works with a lot of text information and uses the dictionary to find duplicate rows (exist method).
    My app - https://github.com/ADIAProject/DIA

    The program need the driver packages (DP*.7z) - can be downloaded:
    http://driveroff.net/SamDrivers_16.8.torrent
    http://driverpacks.net/downloads

  27. #27

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by Romeo91 View Post
    In tests, it looks good, it cHashD better than other analogs in speed. But in the real application I see a noticeable performance drop.

    I use methods add, item, exist, removeall, type of comparison-vbbinarycompare.
    My program works with a lot of text information and uses the dictionary to find duplicate rows (exist method).
    First, to get the best speed out of VB6-code, you should compile such performance-critical stuff
    natively -> with all extended Options.

    In your project (in case you "hung-in" cHashD as is) this is not the case.

    If you want to retain your current Compile-Settings as they are in your Main-Project,
    then compile cHashD (with all extended native Compile-Options) into an ActiveX-Dll.

    That said, for your special case you can also fine-tune cHashD "as you need it"
    (it's codebase is simple and small enough for that)...

    Also, I've seen a lot of lines in your code like:
    Code:
    If Not Dictionary.Exists(SomeKey) Then
       Dictionary.Add SomeKey, SomeItem
    End If
    That means, that you already ensure uniqueness in said Dictionaries on your own ("by hand") -
    and thus you could instruct cHashD, to leave out its Test for Uniqueness in the Add-Method
    (either by setting the Public Prop UniqueKeys to False after Instantiation, or already ensuring
    mUniqueKeys to False in Class_Initialize - as the new Default).

    Also, in case you use only String-Keys, why not changing all Key-Parameters to Strings instead of
    Variants - that will give it another little performance-kick.

    What's also important is, that you can give hints to cHashD about the (roughly) expected
    number of entries - in your case these seem to be "number of Text-Lines in *.inf-Files" or something...

    What your performance-critical code also needs, is a bit more "structuring" IMO ...
    Long Procedures with about 1000 lines of code (like your DevParserByRegExp) are not
    really nice to read, nor easy maintainable.

    Not sure, whether the following example is useful for you, but it calculates the
    Line-differences of two "modern" (UTF16-LE) *.inf Files, as they exist these days e.g.
    in the MS-SxS Folder.

    The adapted cHashD-version in the following Demo-Zip: DeltaParsing.zip
    performs about 50% faster than the MS-Dictionary in this test (finding the LineDifferences
    of two blue-tooth related, nearly identical Inf-Files with about 32K WChars each.

    The time to read all the lines of these two files into two Dictionaries first:
    - cHashD needs about 2msec
    - MS-Dictionary about 3.7msec

    To find the Line-differences in these two Dictionaries:
    - cHashD needs about 0.4msec
    - MS-Dictionary about 0.6msec
    (the above measurement was done in a 100x repetition-loop, to allow for easier timing-comparison).

    As said, not sure whether such a Line-Delta-Scan is useful in your scenario,
    but if your App is thought to manage (or switch) between (nearly) identical Driver-Files,
    then I can imagine that this could be of help (comparing only the handful of different lines
    in two Files with a "deeper analysis" should be faster, than performing the deep analysis on
    the Files as a whole).

    HTH

    Olaf

  28. #28
    Hyperactive Member
    Join Date
    Jan 2015
    Posts
    276

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Olaf:
    Can u help make KeyByIndex property to be editable?

    I have and these codes to cHashD, but can not make it.



    Code:
    Private Declare Function VariantCopyLng Lib "oleaut32" Alias "VariantCopy" (Dst As Long, Src As Long) As Long
    
    Public Property Let KeyByIndex(ByVal IndexZeroBased As Long, RHS)
        If IndexZeroBased < 0 Or IndexZeroBased >= mCount Then Err.Raise 9
        Select Case mKeyMode
            Case eUseStrings:
                RHS = CStr(RHS)
                VariantCopyLng ByVal VarPtr(mKeys(IndexZeroBased)), ByVal VarPtr(RHS)
            Case eUseIntegers:
                VariantCopyLng ByVal VarPtr(mCurs(IndexZeroBased)), ByVal VarPtr(RHS)
            Case eUseDoubles:
                VariantCopyLng ByVal VarPtr(mDbls(IndexZeroBased)), ByVal VarPtr(RHS)
        End Select
    End Property
    
    Public Property Set KeyByIndex(ByVal IndexZeroBased As Long, RHS)
        If IndexZeroBased < 0 Or IndexZeroBased >= mCount Then Err.Raise 9
        KeyByIndex(IndexZeroBased) = RHS
    End Property

    Code:
    Sub testll()
    Dim c As New cHashD
    c.Add 1, 11
    c.KeyByIndex(0) = "2"
    Debug.Print c.KeyByIndex(0)   'the key is still "1"
    End Sub

  29. #29

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by loquat View Post
    Can u help make KeyByIndex property to be editable?
    Why do you need that (I ask, because the need for something like that was near zero in the scenarios I've encountered over the last two decades).

    I mean, there is some scenarios where it is needed, but these I usually cover with DBs and SQL.

    Quote Originally Posted by loquat View Post
    I have and these codes to cHashD, but can not make it.

    Code:
    Private Declare Function VariantCopyLng Lib "oleaut32" Alias "VariantCopy" (Dst As Long, Src As Long) As Long
    
    Public Property Let KeyByIndex(ByVal IndexZeroBased As Long, RHS)
        If IndexZeroBased < 0 Or IndexZeroBased >= mCount Then Err.Raise 9
        Select Case mKeyMode
            Case eUseStrings:
                RHS = CStr(RHS)
                VariantCopyLng ByVal VarPtr(mKeys(IndexZeroBased)), ByVal VarPtr(RHS)
            Case eUseIntegers:
                VariantCopyLng ByVal VarPtr(mCurs(IndexZeroBased)), ByVal VarPtr(RHS)
            Case eUseDoubles:
                VariantCopyLng ByVal VarPtr(mDbls(IndexZeroBased)), ByVal VarPtr(RHS)
        End Select
    End Property
    There's no need to overcomplicate things with API-calls in this case -
    simple Value-assignments would be enough, like: mKeys(IndexZeroBased) = CStr(RHS)

    But with something like that, you're only "halfway-there", because the old Key you have just overriden,
    will still link to the old Index-Position of Keys and Value (when doing a Hash-based search), -
    and for the new Key no such Hash-Link exists yet.

    So in general, such requirements are solved, by removing the old Pair (OldKey and OldItemValue) completely from the List -
    followed by adding the new Pair (NewKey/OldItemValue) over the normal Add-mechanism.

    Olaf

  30. #30

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Update: The latest version is contained in the Zip below, with the following benefits:
    - better performance
    - RemoveByIndex and Remove (by Key) is now included
    - variable Keys (incl. Objects), which can also be mixed

    Here's the result of a performance-comparison to the MS-Scripting.Dictionary (native compiled, all options checked):


    Ok, here the Zip with the new version (including the little PerformanceTest).
    cHashDWithRemove.zip

    Olaf

  31. #31
    Member
    Join Date
    May 2020
    Posts
    35

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by Schmidt View Post
    Update: The latest version is contained in the Zip below, with the following benefits:
    - better performance
    - RemoveByIndex and Remove (by Key) is now included
    - variable Keys (incl. Objects), which can also be mixed

    Here's the result of a performance-comparison to the MS-Scripting.Dictionary (native compiled, all options checked):


    Ok, here the Zip with the new version (including the little PerformanceTest).
    cHashDWithRemove.zip

    Olaf
    I just tried to copy this post code and try to build it to DLL and try it in VBA environment, the result of hash table runs slower than Ms's Dictionary !!!
    Also on VB6 try with: Const Count As Long = 1048500

    Code:
    Sub HashTable_Dictionary()
        Const Count As Long = 1048500
        Dim SD As Scripting.Dictionary
        Dim i As Long, T@
        Set SD = New Scripting.Dictionary
        
        Dim HD As cHashTable
        Set HD = New cHashTable
        
        T = msTimer
        For i = 1 To Count
            SD.Add "khdkjhjhkjhkhkhkhkjhk " & i, i
        Next
        Debug.Print "Dictionary - AddItems", msTimer - T, SD.Count
        
        T = msTimer
        HD.ReInit Count
        For i = 1 To Count
            HD.Add "khdkjhjhkjhkhkhkhkjhk " & i, i
        Next
        Debug.Print "HashTable - AddItems", msTimer - T, HD.Count
    End Sub
    Attachment 180419

  32. #32
    PowerPoster wqweto's Avatar
    Join Date
    May 2011
    Posts
    3,931

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by PhuongNam View Post
    I just tried to copy this post code and try to build it to DLL and try it in VBA environment, the result of hash table runs slower than Ms's Dictionary !!!
    Wild guess: you missed this text in parentheses in OP -- "(native compiled, all options checked)"

    *Everything* in the IDE/p-code is slow because array index bounds checking cannot be switched off. Only *native* compiled executable has the benefit of accessing entries in a buffer without the overhead.

    cheers,
    </wqw>

  33. #33
    Addicted Member
    Join Date
    Jun 2017
    Posts
    199

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Hi ~ Schmidt

    why when string array , it error ?

    dim w() As string, saW As SAFEARRAY1D
    BindArray w, VarPtr(saW) ' ----> VB6 show it is error !
    Last edited by quickbbbb; Sep 29th, 2021 at 12:50 AM.

  34. #34
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    143

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by Schmidt View Post
    Not much to add to the Threads Title...

    the implementation of cHashList comes in only about 100 lines of code, and it can
    be used as a VB.Collection-compatible replacement with much better performance. ... .. .
    ...
    ..
    .
    Olaf
    Dear Schmidt,

    Such sleek code (as usual) and such amazing speed (in comparison to Dictionary, etc.). A big wow, as usual. Thanks a TON.

    Well, I tried out your code (just the adding routine alone, commenting out the rest, incl. the code related to the scripting Dictionary) and was curious to know as to what is the largest Count your hash table would support. In my system with 16GB RAM, I was able to add 450,83,000 records when I just added a long value as key as follows. More than 450,83,000 records, it resulted in error (Run-time error '7': Out of memory).
    Code:
    For i = 1 To Count: HD.Add i, i: Next
    When I added a two-characters string as follows, I was able to add around 245,00,000 records. More than that, it resulted in error (Run-time error '7': Out of memory).
    Code:
    For i = 1 To Count: HD.Add "S " & i, i: Next
    So, obviously, more the length of the key, less the no. of records I was able to add. Perhaps in a system with only 2GB memory, may be I will be able to add still further less records only for the same above codes. So, in real time, when my app is adding some records in user-end, how will my code be able to know that it might result in 'Out of memory' at some point of time and hence can't keep adding more records (and hence have to inform the user)? As such, as of now, I don't have the need to add 450,00,000+ records but I am just curious to know as to how to find out in real time whether adding any further records rmay result in 'out of memory' situation and inform the user accordingly OR take some other action.

    Ideally, upto what value of Count can I safely use your 'hash table code' in any system (assuming 2GB is the lowest RAM I might encounter in an user's system)?

    Kind regards.

    EDIT-1:
    I am using your latest code only (cHashDWithRemove)

    EDIT-2:
    My above observations were with the compiled code only; not in the IDE. By the way, I sometimes observed that for large values of Count, even though .Add (adding records) worked correctly uptil Count, if I give HD.Clear later (say, after printing HD.Count), it caused "Run-time error '7': Out of memory"

    Kind regards.
    Last edited by softv; Nov 9th, 2021 at 04:30 AM.

  35. #35
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    143

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by Schmidt View Post
    Update: The latest version is contained in the Zip below, with the following benefits:
    - better performance
    - RemoveByIndex and Remove (by Key) is now included
    - variable Keys (incl. Objects), which can also be mixed
    ... .. .
    Ok, here the Zip with the new version (including the little PerformanceTest).
    cHashDWithRemove.zip

    Olaf
    Dear Schmidt,

    I tried the following code, wherein I have added duplicate keys and also tried to fetch all the items for a particular duplicate key.
    Code:
     
      HD.Add 1, 1
      HD.Add 2, 2
      HD.Add 1, 2
      HD.Add 1, 3
      HD.Add 1, 4
      HD.Add 3, 3
      HD.Add 1, 5
      
      For i = 6 To 50000: HD.Add i, i: Next
      
      Dim v() As Variant    
      v = HD.ItemsAll(1)
      Debug.Print UBound(v)
    For "ItemsAll" function, I wrote my own code as follows as I could not find a way with your existing code to get that info. Sorry if a way exists and I did not know it. Kindly let me know that way, if it exists.

    The following code works correctly for the above case but I am not sure whether it will work perfectly for all cases. Also, I think there must be a more optimal way to do it. Can you please provide me the optimal code for "ItemsAll", when your time permits, please.

    I did not know how to use IndexByKey to find out the last index of a non-unique key. IndexByKey returns the first found index of a duplicate key only. Ideally, FindFirst, FindNext, and FindLast for any duplicate key would be fine. Can you provide me with the optimal codes for these functions as I don't know how to write them in the perfect/optimal way. Thanks in advance.

    Code:
    Public Function ItemsAll(key) As Variant() 'hand-out all the items for a non-unique Key
      If mCount = 0 Then ItemsAll = Array(): Exit Function
      
      Dim i As Long, j As Long, v()
      Dim itemsCnt As Long
      Dim idx As Long
      
      idx = IndexByKey(key)
      ReDim v(0 To mCount - (idx + 1))
      
      itemsCnt = 0
      j = idx
      For i = idx To mCount - 1
        Do While mValues(j) = NoEntry: j = j + 1: Loop
        If KeyByIndex(i) = key Then
          VariantCopy v(itemsCnt), ByVal VarPtr(mValues(j))
          itemsCnt = itemsCnt + 1
        End If
        j = j + 1
      Next
      
      ReDim Preserve v(0 To itemsCnt - 1) As Variant
      
      ItemsAll = v
      
    End Function
    Kind regards.
    Last edited by softv; Nov 9th, 2021 at 04:04 AM.

  36. #36
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,114

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Perhaps a cHashD list of RC6.cArrayLists would be more appropriate? You use each cHashD key only once, and fill it with a cArrayList of values. Something like this:

    Code:
    Option Explicit
    
    Sub Test()
       Dim lo_Lists As cHashD
       Dim lo_List As RC6.cArrayList
       Dim l_Key As String
       Dim ii As Long
       
       Randomize
       
       Set lo_Lists = New cHashD
       
       For ii = 1 To 5000
          l_Key = Int(Rnd * 10) + 1
          
          If lo_Lists.Exists(l_Key) Then
             Set lo_List = lo_Lists(l_Key)
          Else
             Set lo_List = New_c.ArrayList(vbLong)
             lo_Lists.Add l_Key, lo_List
          End If
          
          lo_List.Add Int(Rnd * 100) + 1
       Next ii
       
       l_Key = Int(Rnd * 10) + 1
       
       Set lo_List = ItemsAll(lo_Lists, l_Key)
       
       For ii = 0 To lo_List.Count - 1
          Debug.Print "Item #" & ii + 1 & " for key " & l_Key & ": " & lo_List(ii)
       Next ii
    End Sub
    
    Private Function ItemsAll(po_HashList As cHashD, ByVal p_Key As String) As RC6.cArrayList
       If po_HashList.Exists(p_Key) Then
          Set ItemsAll = po_HashList(p_Key)
       Else
          Err.Raise vbObjectError, , "Key not found."
       End If
    End Function
    If you don't want the RC6 reference, you could always use a regular VB6 array, it just take a bit more work to manage the ReDimming efficiently.
    Last edited by jpbro; Nov 9th, 2021 at 08:51 AM.

  37. #37
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    143

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by jpbro View Post
    Perhaps a cHashD list of RC6.cArrayLists would be more appropriate? You use each cHashD key only once, and fill it with a cArrayList of values. Something like this:
    ... .. .
    ..
    .
    If you don't want the RC6 reference, you could always use a regular VB6 array, it just take a bit more work to manage the ReDimming efficiently.
    First of all, thank you sooo much for taking time to answer my query and that too in such a helpful manner. Thank you sooo much.

    Actually, in a project I commenced recently, I was/am handling it the way you have suggested only. I am reading a file once and holding its data in an array of structures. The structure has many fields and one of the date fields has multiple values, in 1000s, but for some keys alone. In other words, this date field can have a minimum of 1 value to any number of indeterminate values (which could be in 1000s). As written earlier, I am using variable array only for this field. And, as you said, Redim was an overhead since I had to Redim this array field (increasing by 1) as and when a new value came for the same key.

    A few days after working with the aforesaid structure, I started learning about Collections. The need to use Collections had not arisen earlier and hence I did not know anything about it, as such. Once I started learning about Collections, I came to know about Dictionaries too and that they were faster than Collections. And, I read on the internet (from multiple sources) that the best and ideal way to handle multiple values for a single key is to have them as separate key-value pairs. This was with respect to database tables though. But, I thought perhaps the same would be ideal for Collections/Dictionaries too. And, while searching for Collection/Dictionary alternatives, I came upon Olaf's CHashD class which was super super fast when compared to Dictionaries. That is when I tried it and I posted my query with respect to ItemsAll function.

    Well, if you feel even with 1000s of multiple values for a single key, the method which I was already employing and which is the one which you have also recommended is the best option, then I would go with it. I am not an expert in these areas. So, kindly give your confirmed opinion on whether this is the best way. Also, let me know what is the best way to redim. Since the count of the multiple values are not known beforehand, I am redimming every time a new value comes in for the same key. Is there a better way? I wish to mention that the number of keys is also in the 1000s.

    Note:
    Regarding RC6, I know about it well but because of lack of time, I am unable to explore, learn and integrate it with my app yet. And, a database table will not suit my case since I have to make some queries on user input instantly (as the user keeps typing) and display the muti-layer results real-time (on a microsecond level). Also, in addition to collections, I am not in a position to spend time to learn additionally about databases also (their integration, querying them, testing them, speed comparison, etc. etc.). So, a super-fast collection like Olaf's cHashTable will be the ideal one.

    Very kind regards.

  38. #38

    Thread Starter
    PowerPoster
    Join Date
    Jun 2013
    Posts
    6,049

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by softv View Post
    I am reading a file once and holding its data in an array of structures.

    The structure has many fields and one of the date fields has multiple values, in 1000s,
    Is there any chance we could see the definition of this "structure" (the UDT-definition)?
    (along with an info, which field of the structure is the "key-field")?

    Olaf

  39. #39
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,114

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    I tried 3 different approaches (compiled with all advanced optimizations enabled except Assume No Aliasing):

    1. Using a cHashD object of cHashD object
    2. Using a cHashD object of RC6.cArrayList object
    3. Using a cHashD object of Variant Arrays

    Here's the code:

    Code:
    Option Explicit
    
    Sub Test(Optional po_Form As Form)
       Dim lo_Lists As cHashD
       Dim lo_List As cHashD
       Dim l_Key As String
       Dim ii As Long
       
       Randomize
       
       New_c.Timing True
       
       Set lo_Lists = New cHashD
       
       For ii = 1 To 5000
          l_Key = "KEY_" & Int(Rnd * 10) + 1
          
          If lo_Lists.Exists(l_Key) Then
             Set lo_List = lo_Lists(l_Key)
          Else
             Set lo_List = New cHashD
             lo_List.UniqueKeys = False
             
             lo_Lists.Add l_Key, lo_List
          End If
          
          lo_List.Add "", Int(Rnd * 100) + 1
       Next ii
       
       l_Key = "KEY_" & Int(Rnd * 10) + 1
       
       Set lo_List = ItemsAll(lo_Lists, l_Key)
       
       If po_Form Is Nothing Then
          Debug.Print New_c.Timing
       Else
          po_Form.Print New_c.Timing
       End If
    End Sub
    
    Private Function ItemsAll(po_HashList As cHashD, ByVal p_Key As String) As cHashD
       If po_HashList.Exists(p_Key) Then
          Set ItemsAll = po_HashList(p_Key)
       Else
          Err.Raise vbObjectError, , "Key not found."
       End If
    End Function
    
    Sub Test2(Optional po_Form As Form)
       Dim lo_Lists As cHashD
       Dim lo_List As RC6.cArrayList
       Dim l_Key As String
       Dim ii As Long
       
       Randomize
       
       New_c.Timing True
       
       Set lo_Lists = New cHashD
       
       For ii = 1 To 5000
          l_Key = "KEY_" & Int(Rnd * 10) + 1
          
          If lo_Lists.Exists(l_Key) Then
             Set lo_List = lo_Lists(l_Key)
          Else
             Set lo_List = New_c.ArrayList(vbLong)
             lo_Lists.Add l_Key, lo_List
          End If
          
          lo_List.Add Int(Rnd * 100) + 1
       Next ii
       
       l_Key = "KEY_" & Int(Rnd * 10) + 1
       
       Set lo_List = ItemsAll2(lo_Lists, l_Key)
       
       If po_Form Is Nothing Then
          Debug.Print New_c.Timing
       Else
          po_Form.Print New_c.Timing
       End If
    End Sub
    
    Private Function ItemsAll2(po_HashList As cHashD, ByVal p_Key As String) As RC6.cArrayList
       If po_HashList.Exists(p_Key) Then
          Set ItemsAll2 = po_HashList(p_Key)
       Else
          Err.Raise vbObjectError, , "Key not found."
       End If
    End Function
    
    Sub Test3(Optional po_Form As Form)
       Dim lo_Lists As cHashD
       Dim la_ListsIdx() As Long
       Dim la_List() As Variant
       Dim l_ListsIdx As Long
       Dim l_ListIdx As Long
       Dim l_Key As String
       Dim ii As Long
       
       Randomize
       
       New_c.Timing True
       
       Set lo_Lists = New cHashD
       ReDim la_ListsIdx(32)
       
       For ii = 1 To 5000
          l_Key = "KEY_" & Int(Rnd * 10) + 1
          
          If lo_Lists.Exists(l_Key) Then
             la_List = lo_Lists(l_Key)
             l_ListsIdx = lo_Lists.IndexByKey(l_Key)
             l_ListIdx = la_ListsIdx(l_ListsIdx) + 1
             If l_ListIdx > UBound(la_List) Then
                ReDim Preserve la_List(UBound(la_List) * 2)
             End If
             lo_Lists(l_Key) = la_List
             la_ListsIdx(l_ListsIdx) = l_ListIdx
          
          Else
             l_ListIdx = 0
             la_List = MakeList
             lo_Lists.Add l_Key, la_List
             l_ListsIdx = lo_Lists.Count - 1
             If l_ListsIdx > UBound(la_ListsIdx) Then
                ReDim Preserve la_ListsIdx(UBound(la_ListsIdx) * 2)
             End If
             la_ListsIdx(l_ListsIdx) = 0
          End If
          
          la_List(l_ListIdx) = Int(Rnd * 100) + 1
       Next ii
       
       For ii = 0 To lo_Lists.Count - 1
          la_List = lo_Lists.ItemByIndex(ii)
          ReDim Preserve la_List(la_ListsIdx(ii))
          lo_Lists.ItemByIndex(ii) = la_List
       Next ii
       
       l_Key = "KEY_" & Int(Rnd * 10) + 1
       
       la_List = ItemsAll3(lo_Lists, l_Key)
       
       If po_Form Is Nothing Then
          Debug.Print New_c.Timing
       Else
          po_Form.Print New_c.Timing
       End If
    End Sub
    
    Private Function MakeList() As Variant()
       ReDim MakeList(32)
    End Function
    
    Private Function ItemsAll3(po_HashList As cHashD, ByVal p_Key As String) As Variant
       If po_HashList.Exists(p_Key) Then
          ItemsAll3 = po_HashList(p_Key)
       Else
          Err.Raise vbObjectError, , "Key not found."
       End If
    End Function
    I'm not sure I've programmed the Variant Array approach optimally, as I didn't want to spend too much time on it. Fiddling with ReDim's and indexes is tiring when we have the cArrayList or cHashD classes handy. That said, I've tried to limit the # of ReDim's to save time, as ReDimmming+1 is a performance killer. Instead what I did is start with a "bucket" of 33 elements, then I keep doubling the size of the bucket as needed. This means we also need to track the current index for each bucket. That said, either I have a bug or the Variant Array list approach won't work with cHashD. I kept getting empty arrays back, so I suspect cHashD doesn't support Arrays as items.

    The cArrayList approach was the fastest at ~7ms. The cHashD approach took ~35ms.

    Anyway,I would recommend using a cHashD object of cHashD objects if you don't want to use the faster RC6.cArrayList method. At least until someone comes along with a better option.

  40. #40
    Addicted Member
    Join Date
    Apr 2017
    Location
    India
    Posts
    143

    Re: Simple and fast, lightweight HashList-Class (no APIs)

    Quote Originally Posted by jpbro View Post
    I tried 3 different approaches (compiled with all advanced optimizations enabled except Assume No Aliasing):

    1. Using a cHashD object of cHashD object
    2. Using a cHashD object of RC6.cArrayList object
    3. Using a cHashD object of Variant Arrays

    ... .. .
    ..
    .

    The cArrayList approach was the fastest at ~7ms. The cHashD approach took ~35ms.

    Anyway, I would recommend using a cHashD object of cHashD objects if you don't want to use the faster RC6.cArrayList method. At least until someone comes along with a better option.
    Thank you so much once again for taking time amidst your own hectic schedules to chalk out such detailed scenarios and also test cases for each of them. It was really helpful. Today I could get time to run and understand the 1st case ("cHashD object of cHashD object"). I think I will go with it, until I am able to get the right time to integrate the fabulous RC6 in my apps (I am sure, some day in future, I will integrate RC6, and I am eagerly looking forward to that day too).

    Your idea (of cHashD object of cHashD object) just did not occur to my mind at all. Brilliant! Thanks a TON, once again.

    Kind regards.

Page 1 of 2 12 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width