Results 1 to 40 of 40

Thread: Binary Search Not Finding Capitals

  1. #1

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Binary Search Not Finding Capitals

    In my implementation of a Binary Search function, the return does not find a match of a Capitalized word.

    Obviously it should.
    What am I doing wrong?

    Here is my test code:
    Code:
    Dim myArr() As String
    
    
    Private Sub Form_Load()
    
        Command1.Caption = "Hyphenate"
        
        ReDim myArr(20)
        
        myArr(0) = "abacus, ab-a-cus"
        myArr(1) = "above, a-bove"
        myArr(2) = "congratulate, con-gra-tu-late"
        myArr(3) = "diamond, dia-mond"
        myArr(4) = "high, high"
        myArr(5) = "how, how"
        myArr(6) = "I, I"
        myArr(7) = "in, in"
        myArr(8) = "like, like"
        myArr(9) = "little, lit-tle"
        myArr(10) = "so, so"
        myArr(11) = "stardust, star-dust"
        myArr(12) = "Sweden, Swe-den"
        myArr(13) = "the, the"
        myArr(14) = "twinkle, twin-kle"
        myArr(15) = "up, up"
        myArr(16) = "what, what"
        myArr(17) = "wonderful, won-der-ful"
        myArr(18) = "workability, work-a-bi-li-ty"
        myArr(19) = "Yarmouth, Yar-mouth"
        myArr(20) = "Zurich, Zu-rich"
        
        Text1.Text = "diamond"
        
    End Sub
    
    
    Private Sub Command1_Click()
        
        Dim key As String
        key = Text1.Text
        
        ' Search for key in array
        Dim searchResult As Long
        searchResult = BinarySearch(myArr, key)
        
        ' Dispaly results
        If searchResult = -1 Then
            MsgBox "Not in dictionary!"
        Else
            
            MsgBox Mid(myArr(searchResult), 1, InStr(1, myArr(searchResult), ",") - 1) & _
                   " (" & _
                   Mid(myArr(searchResult), InStr(1, myArr(searchResult), ",") + 2) & _
                   ")"
        End If
    
    End Sub
    
    
    Private Function BinarySearch(aSortedArray() As String, key As String) As Long
        
        Dim minIndex     As Integer
        Dim maxIndex     As Integer
        Dim midIndex     As Integer
        
        minIndex = 0
        maxIndex = UBound(aSortedArray)
        
        Do While minIndex <= maxIndex
            midIndex = Int((minIndex + maxIndex) / 2)
            If key < aSortedArray(midIndex) Then
                maxIndex = midIndex - 1
            ElseIf key > aSortedArray(midIndex) Then
                minIndex = midIndex + 1
            End If
        Loop
        
        '--------------------------------------------------------------------------------------
        'Prevent "Subscript out of range" error if key > last array entry - KLUDGE!!!!!!!!!!!!!
        If minIndex > UBound(aSortedArray) Then minIndex = UBound(aSortedArray)
        Debug.Print minIndex
        '--------------------------------------------------------------------------------------
        
        If key = Mid(aSortedArray(minIndex), 1, InStr(1, aSortedArray(minIndex), ",") - 1) Then
            BinarySearch = minIndex
        Else
            BinarySearch = -1
        End If
    
    End Function

  2. #2
    PowerPoster wqweto's Avatar
    Join Date
    May 2011
    Location
    Sofia, Bulgaria
    Posts
    5,120

    Re: Binary Search Not Finding Capitals

    If you are going to compare if key = Mid(aSortedArray(minIndex), 1, InStr(1, aSortedArray(minIndex), ",") - 1) in the final compare then you should also compare key < Mid(aSortedArray(midIndex), 1, InStr(1, aSortedArray(midIndex), ",") - 1) not just key < aSortedArray(midIndex) in the actual search.

    Also note that there is StrComp built-in function which returns -1, 0 or 1 when first string is less than, equal to or greater than second string *and* supports case-insensitive compares which is probably what you want here too.

    cheers,
    </wqw>

  3. #3
    PowerPoster Arnoutdv's Avatar
    Join Date
    Oct 2013
    Posts
    5,872

    Re: Binary Search Not Finding Capitals

    Your first problem is that you search for "diamond"
    This string is NOT in your array.
    There is a string which starts with "diamond", but that's not a exact match.
    Code:
    myArr(3) = "diamond, dia-mond"
    Second if you want to do a case insensitive match then you should use the StrComp() function.

    Code:
    Option Explicit
    Private myArr() As String
    
    Private Sub Form_Click()
     Debug.Print BinarySearch("diamond", myArr) ' will fail
     Debug.Print BinarySearch("diamond, dia-mond", myArr)
     Debug.Print BinarySearch("Sweden, Swe-den", myArr)
     
    End Sub
    
    Private Sub Form_Load()
      
      ReDim myArr(20)
        
      myArr(0) = "abacus, ab-a-cus"
      myArr(1) = "above, a-bove"
      myArr(2) = "congratulate, con-gra-tu-late"
      myArr(3) = "diamond, dia-mond"
      myArr(4) = "high, high"
      myArr(5) = "how, how"
      myArr(6) = "I, I"
      myArr(7) = "in, in"
      myArr(8) = "like, like"
      myArr(9) = "little, lit-tle"
      myArr(10) = "so, so"
      myArr(11) = "stardust, star-dust"
      myArr(12) = "Sweden, Swe-den"
      myArr(13) = "the, the"
      myArr(14) = "twinkle, twin-kle"
      myArr(15) = "up, up"
      myArr(16) = "what, what"
      myArr(17) = "wonderful, won-der-ful"
      myArr(18) = "workability, work-a-bi-li-ty"
      myArr(19) = "Yarmouth, Yar-mouth"
      myArr(20) = "Zurich, Zu-rich"
        
    End Sub
    
    Public Function BinarySearch(ByVal sKey As String, sStringArray() As String) As Long
      Dim lFirst As Long, lLast As Long, lMiddle As Long
      Dim iCompare As Integer
      
      ' assume search failed
      BinarySearch = -1
      
      lFirst = 0
      lLast = UBound(sStringArray)
      
      Do
        lMiddle = (lFirst + lLast) \ 2
        iCompare = StrComp(sKey, sStringArray(lMiddle), vbTextCompare)
        If iCompare = 0 Then
          BinarySearch = lMiddle
          Exit Do
        ElseIf iCompare = 1 Then
          lFirst = lMiddle + 1
        Else
          lLast = lMiddle - 1
        End If
      Loop Until lFirst > lLast
      
    End Function

  4. #4
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,853

    Re: Binary Search Not Finding Capitals

    Hi mms: I didn't test your code, but I have written binary search functions in the past. From experience, they can be a bit tricky at the edge conditions. Be sure to check both "even number item" lists and "odd number item" lists, and make sure you test for finding items right where things get split when doing the "binary cutting".

    I haven't written a binary search function in quite a while though, as our built-in Collection object does this for us, and you don't have to mess with sorting. It builds a binary tree as keys are added, which effectively allows quick binary searches of your list. If I have a list of keys (with no data values), I'll sometimes just create a Collection, and use 0& for all the data values, putting my list in the keys.

    Also, just as an FYI, key searches in a Collection are case insensitive.

    Just an idea for you. Personally, I love Collections.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  5. #5
    PowerPoster Arnoutdv's Avatar
    Join Date
    Oct 2013
    Posts
    5,872

    Re: Binary Search Not Finding Capitals

    Indeed, in this case a binary search can not be used directly on the array.
    The array should be a key-value pair array (using a user defined type) or even better a collection.
    Code:
    Option Explicit
    Private m_cLookUp As Collection
    
    Private Sub Form_Click()
      
      Debug.Print Hypenate("diamond")
      Debug.Print Hypenate("congratulate")
      Debug.Print Hypenate("bla")
      Debug.Print Hypenate("stardust")
     
    End Sub
    
    Private Sub Form_Load()
      Dim myArr() As String
      
      ReDim myArr(20)
        
      myArr(0) = "abacus|ab-a-cus"
      myArr(1) = "above|a-bove"
      myArr(2) = "congratulate|con-gra-tu-late"
      myArr(3) = "diamond|dia-mond"
      myArr(4) = "high|high"
      myArr(5) = "how|how"
      myArr(6) = "I|I"
      myArr(7) = "in|in"
      myArr(8) = "like|like"
      myArr(9) = "little|lit-tle"
      myArr(10) = "so|so"
      myArr(11) = "stardust|star-dust"
      myArr(12) = "Sweden|Swe-den"
      myArr(13) = "the|the"
      myArr(14) = "twinkle|twin-kle"
      myArr(15) = "up|up"
      myArr(16) = "what|what"
      myArr(17) = "wonderful|won-der-ful"
      myArr(18) = "workability|work-a-bi-li-ty"
      myArr(19) = "Yarmouth|Yar-mouth"
      myArr(20) = "Zurich|Zu-rich"
        
      Set m_cLookUp = ArrayToCollection(myArr)
    End Sub
    
    Private Function ArrayToCollection(aData() As String) As Collection
      Dim i As Long
      Dim aLine() As String
      
      Set ArrayToCollection = New Collection
      For i = 0 To UBound(aData)
        aLine = Split(aData(i), "|")
        ArrayToCollection.Add aLine(1), aLine(0)
      Next i
    End Function
    
    Private Function Hypenate(ByVal sWord As String) As String
      On Error GoTo notFound
      Hypenate = m_cLookUp(sWord)
      Exit Function
    notFound:
      Hypenate = "! No match for: " & sWord
    End Function

  6. #6
    PowerPoster Zvoni's Avatar
    Join Date
    Sep 2012
    Location
    To the moon and then left
    Posts
    4,418

    Re: Binary Search Not Finding Capitals

    How big is that Array?
    IMO a BinarySearch only makes sense if we're talking a few thousand entries.
    An alternative might be the Filter-Function for String-Arrays.
    The Filter-Function works like Excel's Filter-Option "contains", so a search for "mo" returns your members 3 and 19
    Code:
    Dim myArr() As String
    Dim arrResult() As String
    Dim s As String
    Dim i As Long
        
        ReDim myArr(20)
        s = "diamond"
        myArr(0) = "abacus, ab-a-cus"
        myArr(1) = "above, a-bove"
        myArr(2) = "congratulate, con-gra-tu-late"
        myArr(3) = "diamond, dia-mond"
        myArr(4) = "high, high"
        myArr(5) = "how, how"
        myArr(6) = "I, I"
        myArr(7) = "in, in"
        myArr(8) = "like, like"
        myArr(9) = "little, lit-tle"
        myArr(10) = "so, so"
        myArr(11) = "stardust, star-dust"
        myArr(12) = "Sweden, Swe-den"
        myArr(13) = "the, the"
        myArr(14) = "twinkle, twin-kle"
        myArr(15) = "up, up"
        myArr(16) = "what, what"
        myArr(17) = "wonderful, won-der-ful"
        myArr(18) = "workability, work-a-bi-li-ty"
        myArr(19) = "Yarmouth, Yar-mouth"
        myArr(20) = "Zurich, Zu-rich"
        arrResult = Filter(myArr, s, True, vbTextCompare)
        For i = LBound(arrResult) To UBound(arrResult)
            Debug.Print arrResult(i)
        Next
    Last edited by Zvoni; Tomorrow at 31:69 PM.
    ----------------------------------------------------------------------------------------

    One System to rule them all, One Code to find them,
    One IDE to bring them all, and to the Framework bind them,
    in the Land of Redmond, where the Windows lie
    ---------------------------------------------------------------------------------
    People call me crazy because i'm jumping out of perfectly fine airplanes.
    ---------------------------------------------------------------------------------
    Code is like a joke: If you have to explain it, it's bad

  7. #7

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    OK thanks for all the advice.

    Arnoutdv's collection code works and I can use it.
    Zvoni's Filter function code makes me aware of a new function.

    Being as stubborn as I am, I am still trying to figure out what I am doing wrong with my approach
    wqweto offers some clues, but as of yet I can still not figure out where/how to implement his suggestions.

  8. #8
    PowerPoster Arnoutdv's Avatar
    Join Date
    Oct 2013
    Posts
    5,872

    Re: Binary Search Not Finding Capitals

    Two things.

    1.
    When using a string comparison using the = operator it will "fail" when mixing upper and lower case.
    "aa" is not equal "Aa". Better use the StrComp() method by passing vbTextCompare as the compare parameter.

    2.
    You are actually looking for substrings in your string array myArr
    "diamond" will never match "diamond, dia-mond"
    To do a partial compare you need to do something like this:
    Code:
    ' If key < aSortedArray(midIndex) Then
    If key < Left$(aSortedArray(midIndex), 1, Len(key) Then

  9. #9
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Binary Search Not Finding Capitals

    Text mode comparing has other pitfalls because it takes additional lexical rules into account such as ligatures:

    Code:
    Option Explicit
    Option Compare Text
    
    Private Sub Main()
        'Western locale, e.g. US English:
        MsgBox "Ae" = "Æ"
        MsgBox StrComp("Ae", "Æ", vbTextCompare)
    End Sub

  10. #10
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Binary Search Not Finding Capitals

    Another possibility:

    Code:
    Option Explicit
    
    Private RS As ADODB.Recordset
    
    Private Sub Command1_Click()
        With RS
            On Error Resume Next
            .Find "[Word] = '" & Replace$(Text1.Text, "'", "''") & "'", _
                  0, _
                  adSearchForward, _
                  adBookmarkFirst
            If Err Then
                Text2.Text = "*Error " & CStr(Err.Number) & " " & Err.Description & "*"
                On Error GoTo 0
            Else
                On Error GoTo 0
                If .EOF Then
                    Text2.Text = "*Word not found*"
                Else
                    Text2.Text = .Fields("Hyphenated").Value
                End If
            End If
        End With
        Text1.SetFocus
    End Sub
    
    Private Sub Form_Load()
        Dim FldIndexes As Variant
    
        Set RS = New ADODB.Recordset
        With RS
            .CursorLocation = adUseClient
            .LockType = adLockBatchOptimistic
            With .Fields
                .Append "Word", adVarWChar, 255
                .Append "Hyphenated", adVarWChar, 255
            End With
            .Open
            .Fields("Word").Properties("Optimize").Value = True
            FldIndexes = Array(0, 1)
            .AddNew FldIndexes, Array("abacus", "ab-a-cus")
            .AddNew FldIndexes, Array("above", "a-bove")
            .AddNew FldIndexes, Array("congratulate", "con-gra-tu-late")
            .AddNew FldIndexes, Array("diamond", "dia-mond")
            .AddNew FldIndexes, Array("don't", "don't")
            .AddNew FldIndexes, Array("high", "high")
            .AddNew FldIndexes, Array("how", "how")
            .AddNew FldIndexes, Array("I", "I")
            .AddNew FldIndexes, Array("in", "in")
            .AddNew FldIndexes, Array("like", "like")
            .AddNew FldIndexes, Array("little", "lit-tle")
            .AddNew FldIndexes, Array("so", "so")
            .AddNew FldIndexes, Array("stardust", "star-dust")
            .AddNew FldIndexes, Array("Sweden", "Swe-den")
            .AddNew FldIndexes, Array("the", "the")
            .AddNew FldIndexes, Array("twinkle", "twin-kle")
            .AddNew FldIndexes, Array("up", "up")
            .AddNew FldIndexes, Array("what", "what")
            .AddNew FldIndexes, Array("wonderful", "won-der-ful")
            .AddNew FldIndexes, Array("workability", "work-a-bi-li-ty")
            .AddNew FldIndexes, Array("Yarmouth", "Yar-mouth")
            .AddNew FldIndexes, Array("Zurich", "Zu-rich")
            .UpdateBatch
        End With
    End Sub
    
    Private Sub Form_Unload(Cancel As Integer)
        RS.Close
    End Sub

  11. #11

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    Thanks dilettante, that works perfectly!
    Finds "sweden" and "Sweden" etc, handles all fringe cases I can think of " ", "a", "zzzz", etc

    I will see which one I can adopt easier for my purposes; dilettante's or Arnoutdv's

    I was hoping to use a .txt file I found on web (formatted as in my first post)
    abacus, ab-a-cus
    above, a-bove
    congratulate, con-gra-tu-late
    etc
    The file has approx 167,000 entries

    I wanted 20-30 lines of code to read from file and then hyphenate.
    Wanted also to not have to reference additional libraries.
    Last edited by mms_; Nov 16th, 2021 at 09:59 AM.

  12. #12
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Binary Search Not Finding Capitals

    You can read the file any of several ways. One is to use the Jet Text IISAM and afterward disconnect the Recordset. Another is to use the Input # statement.

    ADO and Jet are both installed as part of Windows, just like the VB6 runtime and the hundreds of other libraries even the simplest VB6 program depends on.

  13. #13

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    Good to know about ADO and Jet being installed as part of Windows.

  14. #14
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Binary Search Not Finding Capitals

    If you have a really massive list you might want to consider alternatives though. After all, if your program already does a lot it might be using quite a bit of RAM already.

    Then you must wrestle with the battle between "space and time" as well as more complexity.

    For example storing the text as ASCII, problematic ANSI (if your program crosses locales), or even UTF-8 can save half the space all by itself. Even doing that for the "looked up" hyphenation can save a lot of space.

    Or maybe just using a read-only database for lookups can be a practical alternative if performance doesn't become a problem.

  15. #15
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,853

    Re: Binary Search Not Finding Capitals

    This brings back old memories. I once wrote an algorithm for doing disk-based binary searches, the index sorted and on disk. I'd have to dig to find it as that was back in the 80s. Personally though, yeah, if I were wanting that today, I'd get a database going, and create an index and use the Seek statement to get that work done. I don't know exactly how indexes are organized in MS-databases, but I do know they're highly optimized, probably much faster than any disk-based binary search.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  16. #16

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    After all, if your program already does a lot it might be using quite a bit of RAM already.
    Yes, I was already considering this.

    The way I think I want to go is
    - use the .txt file as it already is (I don't want to have to even touch it)
    - do some sort of binary search on the .txt file (but this has already proved cumbersome as I have two-part entries as <abacus, ab-a-cus> and getting a match on abacus does not work)

    I'd rather not use collections or databases if I don't have to, because this old brain doesn't want to learn new concepts.

    Will try the text file approach, but admittedly don't know where to start.

    Elroy if you happen to find that old code I'd like to look at it
    Last edited by mms_; Nov 16th, 2021 at 11:29 AM.

  17. #17
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,853

    Re: Binary Search Not Finding Capitals

    Quote Originally Posted by mms_ View Post
    Elroy if you happen to find that old code I'd like to look at it
    mms, this is about my latest iteration of that disk-based binary search code (attached). I ripped this out of a larger project. Sorry, but I'm not willing to post all of that larger project at this time.

    And, the following module does have references to code that's not included, so it won't run as it is. However, all of the disk-based binary searching is included in it.

    This code dates back to the 1990s, and I take no responsibility for it whatsoever. At the time, it was working with no known bugs, but it hasn't been used in a LONG time. If memory serves, it was written just after VB6 came out (or maybe before, as it doesn't have class modules).
    Attached Files Attached Files
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  18. #18

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    So I modified my code from Post #1 slightly, and my BinarySearch finds all words in the array, and correctly handles all edge conditions.
    Got rid also of the kludge to handle "Subscript out of range" errors.
    It only works however with uncapitalized words.

    So my question is What is the difference between a capital letter and its non-capital counterpart?
    they both are just a series of bits
    A = 01000001
    a = 01100001
    ???

    I can work with all small letters if I must, and handle any capitalization that might be required separately.

    Revised code that works.
    Code:
    Option Explicit
    
    
    Dim myArr() As String
    
    
    Private Sub Form_Load()
    
        Command1.Caption = "Hyphenate"
        
        ReDim myArr(20)
    
        myArr(0) = "abacus, ab-a-cus"
        myArr(1) = "above, a-bove"
        myArr(2) = "congratulate, con-gra-tu-late"
        myArr(3) = "diamond, dia-mond"
        myArr(4) = "high, high"
        myArr(5) = "how, how"
        myArr(6) = "i, I"
        myArr(7) = "in, in"
        myArr(8) = "like, like"
        myArr(9) = "little, lit-tle"
        myArr(10) = "so, so"
        myArr(11) = "stardust, star-dust"
        myArr(12) = "sweden, Swe-den"
        myArr(13) = "the, the"
        myArr(14) = "twinkle, twin-kle"
        myArr(15) = "up, up"
        myArr(16) = "what, what"
        myArr(17) = "wonderful, won-der-ful"
        myArr(18) = "workability, work-a-bi-li-ty"
        myArr(19) = "yarmouth, Yar-mouth"
        myArr(20) = "zurich, Zu-rich"
    
        Text1.Text = "diamond"
        
    End Sub
    
    
    Private Sub Command1_Click()
        
        Dim key As String
        key = Text1.Text
        
        ' Search for key in array
        Dim searchResult As Long
        searchResult = BinarySearch(myArr, key)
        
        ' Display results
        If searchResult = -1 Then
            MsgBox "Not in dictionary!"
        Else
            
            MsgBox Mid(myArr(searchResult), 1, InStr(1, myArr(searchResult), ",") - 1) & _
                   " (" & _
                   Mid(myArr(searchResult), InStr(1, myArr(searchResult), ",") + 2) & _
                   ")"
        End If
    
    End Sub
    
    
    Private Function BinarySearch(aSorted() As String, key As String) As Long
        
        Dim minIndex     As Integer
        Dim maxIndex     As Integer
        Dim midIndex     As Integer
        
        minIndex = 0
        maxIndex = UBound(aSorted)
    
        Do While minIndex <= maxIndex
            midIndex = Int((minIndex + maxIndex) / 2)
            If key < Mid(aSorted(midIndex), 1, InStr(1, aSorted(midIndex), ",") - 1) Then
                maxIndex = midIndex - 1
            ElseIf key > Mid(aSorted(midIndex), 1, InStr(1, aSorted(midIndex), ",") - 1) Then
                minIndex = midIndex + 1
            Else
                Exit Do
            End If
        Loop
        
        If key = Mid(aSorted(midIndex), 1, InStr(1, aSorted(midIndex), ",") - 1) Then
            BinarySearch = midIndex
        Else
            BinarySearch = -1
        End If
    
    End Function
    I will try to move this to a read text file approach.
    Last edited by mms_; Nov 16th, 2021 at 01:34 PM.

  19. #19
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,853

    Re: Binary Search Not Finding Capitals

    Quote Originally Posted by mms_ View Post
    So my question is What is the difference between a capital letter and its non-capital counterpart?
    they both are just a series of bits
    If you consider ANSI, different languages, code pages, and all of Unicode, that can be a very complex question, especially if you're attempting to reduce it to an examination of bits.

    If you restrict yourself to ASCII, it becomes fairly straightforward, but most people aren't willing to do this.

    For ASCII, if we're dealing with letters (A thru Z) or (a thru z), we can force Upper case by ANDing the numeric encoding with &hDF. To force Lower case (again, dealing only with letters), we can OR with &h20. But personally, I'd recommend using LCase and UCase to get these things done.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  20. #20

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    Thanks Elroy
    and thanks for code in Post #17 (missed it somehow)

  21. #21

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    So I am going to attempt to answer my own question
    What is the difference between a capital letter and its non-capital counterpart?
    I think it is because mixing Capitalized words in the array, puts the array out-of-order, and the BinarySearch algorithm requires a sorted array.
    My tests of all capitalized or all uncapitalized words in array both work as I would expect.

    Correct me if I'm wrong.

  22. #22
    Smooth Moperator techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,531

    Re: Binary Search Not Finding Capitals

    It depends on what was used to sort the array... "a" and "A" have different values... "a" will sort before "A" will... unless you use a case-agnostic means to sort & compare the data. It's the same kind of sorting that causes this to happen: "1" "11" "12" "120" "20" "25" as strings, those are sorted correctly. As numbers they are not.
    So if you have "a word" and "A BUG" ... unless you make them both upper or both lower case, and use text sorting, then they will sort "a word" and "A BUG" which could make finding "A bug" harder" ...

    This is why most string comparison method have a way of indicating if case should be important or not.

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  23. #23
    PowerPoster Zvoni's Avatar
    Join Date
    Sep 2012
    Location
    To the moon and then left
    Posts
    4,418

    Re: Binary Search Not Finding Capitals

    Quote Originally Posted by mms_ View Post
    So my question is What is the difference between a capital letter and its non-capital counterpart?
    they both are just a series of bits
    A = 01000001
    a = 01100001
    ???*snipp*
    I will try to move this to a read text file approach.
    The difference is 32 (I nearly wrote 42....)
    Code:
    Dim s As String
        s = "A"
        Debug.Print Chr(Asc(s) Xor 32)   '--> Returns "a"
        Debug.Print Chr(Asc(s) Or 32)   '--> Returns "a"
        s = "b"
        Debug.Print Chr(Asc(s) Xor 32)   '--> Returns "B"
        Debug.Print Chr(Asc(s) Or 32)   '--> Returns "b"
    EDIT: Obviously, this works only for ASCII "a..zA..Z" in a reliable way
    No idea about Unicode, UTF8, Umlauts, extended ASCII or other strange letters
    Last edited by Zvoni; Nov 17th, 2021 at 03:17 AM.
    Last edited by Zvoni; Tomorrow at 31:69 PM.
    ----------------------------------------------------------------------------------------

    One System to rule them all, One Code to find them,
    One IDE to bring them all, and to the Framework bind them,
    in the Land of Redmond, where the Windows lie
    ---------------------------------------------------------------------------------
    People call me crazy because i'm jumping out of perfectly fine airplanes.
    ---------------------------------------------------------------------------------
    Code is like a joke: If you have to explain it, it's bad

  24. #24
    Fanatic Member
    Join Date
    Jan 2013
    Posts
    759

    Re: Binary Search Not Finding Capitals

    Quote Originally Posted by mms_ View Post
    In my implementation of a Binary Search function, the return does not find a match of a Capitalized word.
    . . .
    Code:
        . . . 
        myArr(0) = "abacus, ab-a-cus"
        . . . 
        myArr(20) = "Zurich, Zu-rich"
        . . . 
        Do While minIndex <= maxIndex
            midIndex = Int((minIndex + maxIndex) / 2)
            If key < aSortedArray(midIndex) Then
                maxIndex = midIndex - 1
            ElseIf key > aSortedArray(midIndex) Then
                minIndex = midIndex + 1
            End If
        Loop
    A Binary Search requires that the values in the list be sorted in ascending order.

    Does this surprise you at all?

    Code:
    ? "Z" > "a" 
    False
    In terms of their ASCII value, [all of] the lower case letters appear after [all of] the upper case letters.

    Try sorting your list into truly ascending order and things might work a bit better.

    However, you'll never match a specific word unless you search for the whole entry in the list, e.g. "abacus, ab-a-cus". Searching for the value "abacus" will probably return you the entry immediately before this, because "abacus" < "abacus, ab-a-cus".

    Best option is to only have the values that you want to search for in your list but, failing that, this might work a bit better:

    Code:
    If key < Split( aSortedArray(midIndex), "," )( 0 ) Then
        maxIndex = midIndex - 1
    ElseIf key > Split( aSortedArray(midIndex), "," )( 0 ) Then
        minIndex = midIndex + 1
    End If
    Regards, Phill W.

  25. #25

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    The BinarySearch on Capitalized/Uncapitalized lists all makes perfect sense to me now

    Still working on BinarySearch in Text file.
    Looks like I must use Random mode???

  26. #26
    Fanatic Member
    Join Date
    Jan 2013
    Posts
    759

    Re: Binary Search Not Finding Capitals

    Unless the file is huge, it'll be quickest to read the file into a list and search it there.

    The variable lengths of each line in a text file make searching them directly in the file a complete Nightmare.

    Code:
    Len	Line
    55	If key < Split( aSortedArray(midIndex), "," )( 0 ) Then
    27	    maxIndex = midIndex - 1
    59	ElseIf key > Split( aSortedArray(midIndex), "," )( 0 ) Then
    27	    minIndex = midIndex + 1
     6	End If
    Of course, if your file has fixed-length records, then you stand a far better chance.

    Regards, Phill W.

  27. #27
    PowerPoster Zvoni's Avatar
    Join Date
    Sep 2012
    Location
    To the moon and then left
    Posts
    4,418

    Re: Binary Search Not Finding Capitals

    Quote Originally Posted by mms_ View Post
    Looks like I must use Random mode???
    If you do not read the file into memory beforehand, then yes, you need random mode.
    BUT: AFAIK, the File must be saved in Random mode beforehand, meaning you probably need to change everything to fixed-length string.
    Never tried it without UDT resp. fixed-length string.
    Some experiments?

    I have such a BinSearch for a file, but i use an UDT with fixed-length strings, so the recordsize is always the same.
    AND, as already mentioned: The Entries have to be sorted in ascending order.
    with 180K entries my search time is under 1 second
    EDIT: Just did a quick test with a few calls. The highest result was about 0.125 seconds, everything else below 0.1 seconds
    Last edited by Zvoni; Nov 17th, 2021 at 10:25 AM.
    Last edited by Zvoni; Tomorrow at 31:69 PM.
    ----------------------------------------------------------------------------------------

    One System to rule them all, One Code to find them,
    One IDE to bring them all, and to the Framework bind them,
    in the Land of Redmond, where the Windows lie
    ---------------------------------------------------------------------------------
    People call me crazy because i'm jumping out of perfectly fine airplanes.
    ---------------------------------------------------------------------------------
    Code is like a joke: If you have to explain it, it's bad

  28. #28

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    OK thanks Phill.W and Zvoni
    Good to know.

    I will work with Random Access Files using UDT with fixed-length strings.

  29. #29
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Binary Search Not Finding Capitals

    Results (times in seconds, on a slow PC with slow hard drive):

    Code:
    ---------------------------
    MakeDB
    ---------------------------
    Imported 161,521 rows
    ---------------------------
    OK   
    ---------------------------
    
    ---------------------------
    UseDB
    ---------------------------
    0.017165, zymosthenic, zy-mos-then-ic
    0.001675, able, a-ble
    0.001315, Garbage*Value, *not found*
    0.001274, table, ta-ble
    0.001220, Label, la-bel
    0.001175, pepperoni, pep-per-o-ni
    0.001254, anteater, ant-eat-er
    0.001473, Systematic, sys-tem-a-tic
    0.001248, UNDERWEAR, un-der-wear
    0.001785, Popsicle, pop-si-cle
    ---------------------------
    OK   
    ---------------------------
    MakeDB.bas:

    Code:
    Option Explicit
    
    Private Sub Main()
        Dim Connection As ADODB.Connection
        Dim InsertedCount As Long
    
        ChDir App.Path
        ChDrive App.Path
        On Error Resume Next
        Kill "Hyphenation.mdb"
        On Error GoTo 0
        With CreateObject("ADOX.Catalog")
            .Create "Provider=Microsoft.Jet.OLEDB.4.0;" _
                  & "Jet OLEDB:Engine Type=5;Data Source='TEMP.mdb';" _
                  & "Mode=Share Exclusive"
            Set Connection = .ActiveConnection
        End With
        With Connection
            .Execute "SELECT FIRST([Word]) AS [Word], FIRST([Hyphenation]) AS [Hyphenation] " _
                   & "INTO [TEMP] " _
                   & "FROM [Text;Database=" & App.Path & "].[Hyphenation.txt] " _
                   & "WHERE LEN([Word]) > 0 AND LEN(Hyphenation) > 0 " _
                   & "GROUP BY LCASE$([Word]) ORDER BY LCASE$([Word])", _
                     , _
                     adCmdText Or adExecuteNoRecords
            .Execute "CREATE TABLE [Hyphenation](" _
                   & "[Word] TEXT(255) WITH COMPRESSION CONSTRAINT [pkWord] PRIMARY KEY, " _
                   & "[Hyphenation] TEXT(255) WITH COMPRESSION)", _
                     , _
                     adCmdText Or adExecuteNoRecords
            .Execute "INSERT INTO [Hyphenation] " _
                   & "SELECT LCASE$([Word]) AS [Word], LCASE$([Hyphenation]) AS [Hyphenation]" _
                   & "FROM [TEMP]", _
                     InsertedCount, _
                     adCmdText Or adExecuteNoRecords
            .Execute "DROP TABLE [TEMP]", _
                     , _
                     adCmdText Or adExecuteNoRecords
            .Close
        End With
        With New JRO.JetEngine
            .CompactDatabase "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='TEMP.mdb'", _
                             "Provider=Microsoft.Jet.OLEDB.4.0;" _
                           & "Jet OLEDB:Engine Type=5;Data Source='Hyphenation.mdb'"
        End With
        On Error Resume Next
        Kill "TEMP.mdb"
        On Error GoTo 0
    
        'We now have "Hyphenation.mdb":
        '
        '   o One table "Hyphenation"
        '   o Two fields imported: "Word" and "Hyphenation" both lowercased with duplicate
        '     "Word" rows and any empty-string rows eliminated.
    
        MsgBox "Imported " & Format$(InsertedCount, "#,##0") & " rows"
    End Sub
    Schema.ini used to import text into the database:

    Code:
    [Hyphenation.txt]
    CharacterSet=1252
    Format=Delimited(,)
    TextDelimiter=none
    MaxScanRows=1
    ColNameHeader=False
    Col1="Word" TEXT
    Col2="Hyphenation" TEXT
    UseDB.bas:

    Code:
    Option Explicit
    
    Private Declare Function GetTickCount Lib "kernel32" () As Long
    Private Declare Function QueryPerformanceFrequency Lib "kernel32" ( _
        ByRef Frequency As Currency) As Long
    Private Declare Function QueryPerformanceCounter Lib "kernel32" ( _
        ByRef PerformanceCount As Currency) As Long
    
    Private mFrequency As Currency
    
    Private Connection As ADODB.Connection
    Private QueryCommand As ADODB.Command
    Private Result As ADODB.Recordset
    Private Msg As String
    
    Private Function Frequency() As Long
        If mFrequency < 0 Then
            Frequency = 1000
        Else
            Frequency = CLng(mFrequency * 10000@)
        End If
    End Function
    
    Private Function GetTicks() As Double
        'Returns a time in seconds.
        Dim PerformanceCount As Currency
    
        If mFrequency < 0 Then
            'Not supported, substitute:
            GetTicks = CDbl(GetTickCount()) / 1000
        Else
            QueryPerformanceCounter PerformanceCount
            GetTicks = CDbl(PerformanceCount) / mFrequency
        End If
    End Function
    
    Private Sub TicksInit()
        If mFrequency = 0 Then If QueryPerformanceFrequency(mFrequency) = 0 Then mFrequency = -1
    End Sub
    
    Private Sub Lookup(ByRef Word As String)
        Dim Secs As Double
    
        Secs = GetTicks()
        Connection.Lookup Word, Result
        Secs = GetTicks() - Secs
        Msg = Msg & Format$(Secs, "#,##0.000000") & ", " & Word & ", "
        If Result.EOF Then
            Msg = Msg & "*not found*" & vbNewLine
        Else
            Msg = Msg & Result(0).Value & vbNewLine
        End If
        Result.Close
    End Sub
    
    Private Sub Main()
        ChDir App.Path
        ChDrive App.Path
        Set Connection = New ADODB.Connection
        Connection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" _
                      & "Data Source='Hyphenation.mdb';Mode=Read"
        Set QueryCommand = New ADODB.Command
        With QueryCommand
            .Name = "Lookup"
            .CommandType = adCmdText
            .CommandText = "SELECT [Hyphenation] FROM [Hyphenation] WHERE [Word] = ?"
            .Prepared = True
            Set .ActiveConnection = Connection
        End With
        Set Result = New ADODB.Recordset
        With Result
            .CursorLocation = adUseClient
            .CursorType = adOpenStatic
            .LockType = adLockReadOnly
        End With
    
        TicksInit
    
        Lookup "zymosthenic"
        Lookup "able"
        Lookup "Garbage*Value"
        Lookup "table"
        Lookup "Label"
        Lookup "pepperoni"
        Lookup "anteater"
        Lookup "Systematic"
        Lookup "UNDERWEAR"
        Lookup "Popsicle"
    
        Set QueryCommand = Nothing
        Connection.Close
    
        MsgBox Msg
    End Sub
    Pretty hard to justify a lot of convoluted tricky and error-prone code when we already have power tools available to us.

    The database came to ~ 8.5MB. The text I found to import was a bit nutty, with some empty values here and there as well as many duplicates with different casing. Otherwise the MakeDB code could have been a lot simpler.

  30. #30

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    I would like to try your sample code.
    I guess it is impossible to upload that 8.5MB database due to its size?

    I have also come up with a BinarySearch on Random Mode Text File solution,
    but can't test it with my 167,000 entry text file because it is not in the correct format.

  31. #31
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Binary Search Not Finding Capitals

    If you have a text file with two columns per line with a comma or comma-space between them then the MakeDB should run fine. Mine looks like:

    Code:
    Aalborg,Aal-borg
    Aalto,Aal-to
    aardvark,aard-vark
    aardwolf,aard-wolf
    Aarau,Aar-au
    Aargau,Aar-gau
    Aarhus,Aar-hus
    Aaron,Aar-on
    Aaronitic,Aar-on-i-tic
    Aalesund,Aa-le-sund
    aalii,aa-li-i
    Aaron,Aa-ron
    Aaronic,Aa-ro-nic
    Aaronite,Aa-ron-ite
    Aaronical,Aa-ron-i-cal
    abb,abb
    Abbevillian,Abbe-vill-i-an
    Abbevilean,Abbe-vil-e-an
    abbrev,abbrev
    absinth,ab-sinth
    aba,ab-a
    ...
    Note that it needs references to:

    Microsoft ActiveX Data Objects 2.5 Library (or newer)
    Microsoft Jet and Replication Objects 2.6 Library (or newer)

    The UseDB program only needs the first reference.

  32. #32
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Binary Search Not Finding Capitals

    Note that the Schema.ini file goes "next to" the text data file (Hyphenation.txt) and the Jet Text IISAM uses that to determine how to parse the text data file. As written above, those both go into the Project folder.

  33. #33

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    Mine is an Excel file with 2 columns.
    I should be able to figure out how to convert to text file in that format.

  34. #34

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    I ran the program, but get

    Code:
    Run-time error '-2147467259 (80004005)':
    
    Could not find file
    'C:\Desktop\Hyphenate_dilettante_DB\Hyphenation.mdb'.
    I will try again tomorrow.

  35. #35
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Binary Search Not Finding Capitals

    Ok, well there are two programs: MakeDB imports the text data, UseDB uses the created MDB file.

  36. #36

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    I had mistakenly put one module in Form1 code, and other in Module1.code

  37. #37
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Binary Search Not Finding Capitals

    I'd have zipped and uploaded the two Projects but but my upload allowance is running on fumes.

  38. #38

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    It's OK I got it working
    I've only used your 20 line test file from Post #31
    I will play some more tomorrow, and compare to my BinarySearch/Text file solution (with BIG test file).

    Thanks again for all your help!!

  39. #39

    Thread Starter
    Fanatic Member
    Join Date
    Nov 2018
    Posts
    602

    Re: Binary Search Not Finding Capitals

    I was able to convert my Excel file to a format compatible with yours, and your solution works flawlessly!
    Thank you!!
    Also, what you posted teaches me some totally NEW concepts.

    I also converted the Excel file to a text file format compatible with my BinarySearch code, but those "wonky" special characters cause my solution to not work.
    I will see if I can get rid of those, so my code can work.

  40. #40
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Binary Search Not Finding Capitals

    Some of these techniques have awkward limitations. This example works fine, but if the word lookup needed to be case-sensitive instead there is no simple change that makes that possible and retains good performance.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width