Thread: MyInstr: Skip the contents of the quotes to find a substring

  1. #1

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Resolved MyInstr: Skip the contents of the quotes to find a substring

    I need to search some substrings frequently in many strings, but I need to skip the contents of quotes (single quotes, double quotes, and back-quotes). For example:

    String1 = "The title of the book is 'Harry Potter'. "
    String2 = "Harry"

    Then, the return value of MyInstr(1, String1, String2, vbTextCompare) should be 0.

    Currently, I plan to compare and judge character by character, and I need to consider vbBinaryCompare and vbTextCompare. I wonder if there are some clever and efficient ways to achieve this? Thanks.


    VB Code:
    Public Function MyInstr(Start, S1, Optional S2, Optional ByVal Cmp As VbCompareMethod, _
                                    Optional ByVal SkipQuotationContent As Boolean = True) As Long
    
    End Function
    Last edited by dreammanor; Jun 11th, 2019 at 05:52 AM.

  2. #2
    PowerPoster Zvoni's Avatar
    Join Date
    Sep 2012
    Location
    To the moon and then left
    Posts
    4,418

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Huh?
    Why not just:
    1) Search your String 1 for the (opening) quote. Result be saved in S (=Start)
    2) Search your String 1 for the (closing) quote starting at S+1, being saved in E (=End)
    3) Replace the String between S and E with BLANK (this includes the quotes)
    3a) If you have more "quoted" strings, repeat 1), 2) and 3) until search for quotes returns 0 (or whatever value signaling "not found")
    4) Run your If InStr=0 Then....

    EDIT: To search for the Quotes i'd use the C-API-Function StrCSpnW/StrCSpnIW
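
    A minimal sketch of the steps above, under a few assumptions: the helper names MaskQuoted, NextQuote and MyInstrMasked are made up for illustration, and instead of removing the quoted text it overwrites each quoted run with Chr$(0) so that the positions returned by InStr still refer to the original string. It only does a binary compare; how vbTextCompare treats the Chr$(0) filler has not been checked here.

```vb
Option Explicit

' Hypothetical helper: returns a copy of Text in which every quoted run
' (including its quote characters) is overwritten with Chr$(0).
Public Function MaskQuoted(ByVal Text As String) As String
    Dim S As Long          ' position of the opening quote
    Dim E As Long          ' position of the matching closing quote
    Dim Pos As Long        ' where the next search for an opening quote begins
    Dim Q As String        ' the quote character that opened the current run

    Pos = 1
    Do
        S = NextQuote(Text, Pos)
        If S = 0 Then Exit Do                    ' no more opening quotes

        Q = Mid$(Text, S, 1)
        E = InStr(S + 1, Text, Q, vbBinaryCompare)
        If E = 0 Then E = Len(Text)              ' unterminated quote: mask to the end

        Mid$(Text, S, E - S + 1) = String$(E - S + 1, vbNullChar)
        Pos = E + 1
    Loop While Pos <= Len(Text)

    MaskQuoted = Text
End Function

' Hypothetical helper: position of the first ", ' or ` at or after Start (0 if none).
Private Function NextQuote(ByRef Text As String, ByVal Start As Long) As Long
    Dim P As Long
    Dim V As Variant

    For Each V In Array("""", "'", "`")
        P = InStr(Start, Text, CStr(V), vbBinaryCompare)
        If P > 0 Then
            If NextQuote = 0 Or P < NextQuote Then NextQuote = P
        End If
    Next V
End Function

' Binary search for S2 in S1, ignoring anything inside quotes.
Public Function MyInstrMasked(ByVal Start As Long, ByVal S1 As String, ByVal S2 As String) As Long
    MyInstrMasked = InStr(Start, MaskQuoted(S1), S2, vbBinaryCompare)
End Function
```

    With the example from the first post, MyInstrMasked(1, "The title of the book is 'Harry Potter'. ", "Harry") returns 0, while MyInstrMasked(1, "Harry is 'cool'", "Harry") returns 1.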
    Last edited by Zvoni; Jun 5th, 2019 at 08:04 AM.
    Last edited by Zvoni; Tomorrow at 31:69 PM.
    ----------------------------------------------------------------------------------------

    One System to rule them all, One Code to find them,
    One IDE to bring them all, and to the Framework bind them,
    in the Land of Redmond, where the Windows lie
    ---------------------------------------------------------------------------------
    People call me crazy because i'm jumping out of perfectly fine airplanes.
    ---------------------------------------------------------------------------------
    Code is like a joke: If you have to explain it, it's bad

  3. #3

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Hi Zvoni, Your method is similar to my original idea: replacing the contents of the quotes with Chr(0). But I'd like to know if there is a more efficient way.

    I'm going to find the information of StrCSpnW/StrCSpnIW now, thank you very much, Zvoni.

  4. #4
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Here's my kick at it...there are almost definitely bugs since I haven't tested it too thoroughly, but it might get you started (or give you a comparison approach for benchmarking against other approaches):

    Code:
    Option Explicit
    
    Public Function MyInstr(ByVal Start As Long, _
                            ByVal S1 As String, _
                            ByVal S2 As String, _
                            Optional ByVal Cmp As VBA.VbCompareMethod = vbBinaryCompare, _
                            Optional ByVal SearchQuotedContent As Boolean = False) As Long
       Dim ii As Long
       Dim l1 As Long
       Dim l2 As Long
       Dim l_Char As Integer
       Dim l_InQuote As Integer
       Dim l_QuoteEnd As Long
       
       l1 = Len(S1)
       If l1 = 0 Then Exit Function  ' Can't match empty string
       
       l2 = Len(S2)
       If l2 = 0 Then Exit Function  ' Can't match empty string
       
       If l1 < l2 Then Exit Function ' Can't find a longer string in a smaller string
       
       If Start > l1 - l2 + 1 Then Exit Function ' Can't find if start is after end of string1 less the length of string2
        
       l_QuoteEnd = Start   ' Never search before Start
       
       If Not SearchQuotedContent Then
          ' Scan from position 1 (not Start) so we know whether Start itself falls inside a quoted run
          For ii = 1 To l1
             l_Char = AscW(Mid$(S1, ii, 1))
          
             Select Case l_Char
             Case 34, 39, 96   ' ", ', `
                ' Found a quote character
    
                If l_InQuote Then
                   ' We are already within a quoted block of text
                   If l_InQuote = l_Char Then
                      ' and in a matching quote character
                      ' So close off the quoted content run and remember the starting position of the unquoted run to come
                      l_InQuote = 0
                      l_QuoteEnd = ii + 1
                      If l_QuoteEnd < Start Then l_QuoteEnd = Start   ' Quote closed before Start, so the run to search still begins at Start
                   End If
                   
                Else
                   ' Entering quote - check previous non-quoted chunk to see if we have a match
                   l_InQuote = l_Char
                   
                   If ii - l_QuoteEnd >= l2 Then
                      ' The previous unquoted run is long enough for a possible match
                      MyInstr = InStr(1, Mid$(S1, l_QuoteEnd, ii - l_QuoteEnd), S2, Cmp)
                      If MyInstr > 0 Then
                         ' We found a match so short-circuit
                         Exit For
                      End If
                   End If
                End If
             End Select
          Next ii
       End If
       
       If MyInstr = 0 Then
          ' No match so far
          If l_InQuote = 0 Then   ' Compare with 0 explicitly: Not on an Integer is bitwise, so Not 39 is still non-zero (True)
             ' We're not currently in a quoted run at the end of the string, so check the remaining characters
             If l1 - l_QuoteEnd + 1 >= l2 Then
                ' There are enough remaining characters for a possible match
                MyInstr = InStr(1, Mid$(S1, l_QuoteEnd, l1 - l_QuoteEnd + 1), S2, Cmp)
             End If
          End If
       End If
       
       If MyInstr > 0 Then
          If l_QuoteEnd > 0 Then
             ' Add position of closing quote to the matches starting character position
             MyInstr = MyInstr + l_QuoteEnd - 1
          End If
       End If
    End Function
    
    Sub TestSpeed()
       Dim ii As Long
       Dim ll As Long
       Dim d As Double
       
       d = New_c.HPTimer
    
       Do
          ll = MyInstr(11, "Harry is 'cool'", "cool")
          Debug.Assert ll = 0
          
          ii = ii + 1
       Loop While New_c.HPTimer - d < 1
       
       MsgBox ii & " ops/s"
    End Sub
    I've tried to put reasonable short-circuits in to prevent unnecessary comparisons/processing, but that adds some complexity and I may have missed some edge cases where bugs may lurk.

    When the SearchQuotedContent param is True you'd be better off just calling the VB Instr() method directly I think.
    Last edited by jpbro; Jun 5th, 2019 at 03:04 PM. Reason: Fixed a bug

  5. #5
    Fanatic Member
    Join Date
    Feb 2019
    Posts
    706

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    According to the following test case, using vbBinaryCompare is 9 times faster than vbTextCompare:

    VB Code:
    Option Explicit
    
    Private Sub Form_Load()
        Dim t As Single
        Dim s As String
        Dim i As Long
        Dim pos As Long
    
        s = String(1000, 65) & "BCD"
    
        t = Timer
        For i = 1 To 1000000
            pos = InStr(1, s, "BCD", vbTextCompare)
        Next
        Debug.Print "vbTextCompare: " & Timer - t
    
        t = Timer
        For i = 1 To 1000000
            pos = InStr(1, s, "BCD", vbBinaryCompare)
        Next
        Debug.Print "vbBinaryCompare: " & Timer - t
    End Sub

    Output:

    vbTextCompare: 4.574219
    vbBinaryCompare: 0.5273438

  6. #6
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Yikes! Instr with TextCompare is really slow. Check out VBSpeed for an Instr replacement that beats the pants off the native implementation: http://www.xbeat.net/vbspeed/c_InStr.htm

  7. #7
    PowerPoster Arnoutdv's Avatar
    Join Date
    Oct 2013
    Posts
    5,872

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    I would use the InStr method.
    If a match is found then check position - 1 and position + length of search string for a '

  8. #8
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    TextCompare does a lot of extra work. It is not as simple as a case-insensitive compare, for example it respects ligatures.

    For an English locale:

    Code:
    MsgBox InStr(1, "Abcœefg", "oe", vbTextCompare)
    Displays 4, not 0. The 4th character is a ligature.

  9. #9
    PowerPoster Zvoni's Avatar
    Join Date
    Sep 2012
    Location
    To the moon and then left
    Posts
    4,418

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by dreammanor View Post
    Hi Zvoni, Your method is similar to my original idea: replacing the contents of the quotes with Chr(0). But I'd like to know if there is a more efficient way.

    I'm going to find the information of StrCSpnW/StrCSpnIW now, thank you very much, Zvoni.
    Don't!
    Rather use vbNullString or the classic ""

  10. #10

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Hi jpbro, sorry for the late reply. I tested your code, it's 3 times faster than my method (replacing the contents of the quotes with Chr(0)). Thank you very much.

  11. #11

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Thank you, qvb6.

    Quote Originally Posted by Arnoutdv View Post
    I would use the InStr method.
    If a match is found then check position - 1 and position + length of search string for a '
    Yes, InStr is the easiest and most effective way. Thank you, Arnoutdv.

    Quote Originally Posted by dilettante View Post
    TextCompare does a lot of extra work. It is not as simple as a case-insensitive compare, for example it respects ligatures.

    For an English locale:

    Code:
    MsgBox InStr(1, "Abcœefg", "oe", vbTextCompare)
    Displays 4, not 0. The 4th character is a ligature.
    Thank you, dilettante. For TextCompare, my solution is to convert both S1 and S2 to lowercase, and then compare them with BinaryCompare.
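
    A minimal sketch of that idea, assuming a hypothetical helper name InStrNoCase. Both strings are lower-cased once and then searched with a binary compare. This is not a drop-in replacement for vbTextCompare: it only folds case, so the ligature match described above would no longer be found.

```vb
Option Explicit

' Hypothetical helper: case-insensitive InStr built on a binary compare.
' Only case is folded; locale rules such as ligature expansion are not applied.
Public Function InStrNoCase(ByVal Start As Long, ByRef S1 As String, ByRef S2 As String) As Long
    InStrNoCase = InStr(Start, LCase$(S1), LCase$(S2), vbBinaryCompare)
End Function
```

    For example, InStrNoCase(1, "Harry Potter", "POTTER") returns 7. When the same S1 is searched many times, it is probably worth lower-casing it once outside the loop rather than on every call.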

    Quote Originally Posted by Zvoni View Post
    Don't!
    Rather use vbNullString or the classic ""
    Thank you, Zvoni. I decided to use jpbro's method, which is three times faster than my method (replacing the contents of the quotes with Chr(0)).

  12. #12
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    @DreamManor. One tweak you may want to consider... Test the string for a quote/apostrophe (InStr binary compare) before looping through the string. If most of your strings do not have quotes/apostrophes then that tweak should improve overall speed. If no special characters, then perform the InStr(String1,String2) immediately without looping.

    Just a thought and an easy enough test...
    Code:
    Const VBquote = """"
    Const VBapos = "'"
    If InStr(1, String1, VBquote, vbBinaryCompare) = 0 Then
        If InStr(1, String1, VBapos, vbBinaryCompare) = 0 Then
            If InStr(1, String1, String2, vbTextCompare) Then
                ' ... match
            Else
                ' ... no match
            End If
            Exit Sub
        End If
    End If
    
    ' ... do the loop
    Swap the two tests around if you expect more strings with apostrophes than those with quotes
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  13. #13
    PowerPoster ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    3,046

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by dreammanor View Post


    Thank you, dilettante. For TextCompare, my solution is to convert both S1 and S2 to lowercase, and then compare them with BinaryCompare.
    setting to UCase is faster

    Code:
    Private Type SearchTxtType
       SearchFor As String
       Found     As Long
    End Type
    
    Private Sub Command1_Click()
       Dim SearchTxt() As SearchTxtType
       Dim i As Long
    
       ReDim SearchTxt(3)
       SearchTxt(0).SearchFor = "Hi"
       SearchTxt(1).SearchFor = "with"
       SearchTxt(2).SearchFor = "this"
       SearchTxt(3).SearchFor = "'"
    
       For i = 0 To UBound(SearchTxt)
          ' True = count case-insensitively
          SearchTxt(i).Found = CountStringInString(Text1.Text, SearchTxt(i).SearchFor, True)
          Debug.Print SearchTxt(i).SearchFor, SearchTxt(i).Found
       Next
    End Sub
    
    Private Sub Form_Load()
       Dim FileNo As Integer
       Dim TempData As String
    
       FileNo = FreeFile
       Open "E:\Testword.txt" For Input As FileNo
          TempData = Input(LOF(FileNo), FileNo)
       Close FileNo
       Text1.Text = TempData
    End Sub
    
    Public Function CountStringInString(Text As String, SearchFor As String, _
                                        Optional CompareAsText As Boolean = False) As Long
       Dim i As Long, j As Long, z As Long
       Dim s As String, s1 As String
    
       If Len(SearchFor) = 0 Then Exit Function   ' an empty search string would never advance i
    
       If CompareAsText Then
          ' Fold case once, then use the faster binary compare
          s = UCase$(Text)
          s1 = UCase$(SearchFor)
       Else
          s = Text
          s1 = SearchFor
       End If
    
       i = 1
       Do
          j = InStr(i, s, s1, vbBinaryCompare)
          If j = 0 Then
             Exit Do
          End If
          i = j + Len(s1)   ' continue after this match (non-overlapping count)
          z = z + 1
       Loop
       CountStringInString = z
    End Function
    another option would be to use Regex to separate the parts in double quotes
    and leave only the words

    hth
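
    A rough sketch of the Regex idea, assuming the VBScript.RegExp object is available on the machine (it is created late bound here, so no project reference is needed). StripQuoted is a made-up name, and the pattern is extended to single quotes and back-quotes as well as double quotes. Because the quoted parts are removed, any position InStr returns afterwards refers to the stripped string, not the original.

```vb
Option Explicit

' Sketch only: removes every "...", '...' and `...` run from Text.
Public Function StripQuoted(ByVal Text As String) As String
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                                ' replace every quoted run, not just the first
    re.Pattern = """[^""]*""|'[^']*'|`[^`]*`"       ' "..." or '...' or `...`
    StripQuoted = re.Replace(Text, "")
End Function
```

    For the example from the first post, InStr(1, StripQuoted("The title of the book is 'Harry Potter'. "), "Harry", vbTextCompare) returns 0.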
    to hunt a species to extinction is not logical !
    since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.

  14. #14

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Very helpful advice, thank you, LaVolpe. I'm currently working with HTML, CSS, and JavaScript strings. I need to detect not only quotes (Chr(34), Chr(39), Chr(96)) but also comment markers ("//", "/* ... */"), so I need to use jpbro's method and scan every character.

  15. #15
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by dreammanor View Post
    Very helpful advice, thank you, LaVolpe. I'm currently working on HTML, CSS, JavaScript strings. I not only need to judge quotes (Chr(34), Chr(39), Chr(96)), I also need to judge the comment symbols ("//", "/* ... */") , so I need to use jpbro's method to scan every character.
    I see. Ignore the following if it doesn't apply...

    Not sure how many strings you are talking about, e.g., are you parsing entire documents? If so, it may be much faster to use an overlay array and loop through the array elements. The advantages can be significant:

    - The array is an overlay; you don't do myArray() = theString. It requires CopyMemory and a SafeArray structure, and the result is that no data is copied, which would otherwise be a speed hit.

    - The string characters and the array data share the same binary information, so when looping you compare numbers rather than string characters. Ultimately you would still use InStr() for the comparison, but you loop over the array elements. Looping with string characters incurs a speed hit because temporary strings are created, i.e., Mid$(...), AscW(Mid$(...)), etc.
    Code:
    For x = 1 To Len(String1)
        If arrInts(x) = 34 Then 
    
        End If
    Next
    will be faster than
    Code:
    For x = 1 To Len(String1)
        If AscW(Mid$(String1, x, 1)) = 34 Then
    
        End If
    Next
    The usage of arrays requires more work, but can really improve speed when parsing KBs or MBs of text.
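
    For comparison, here is a simpler variant that is not the zero-copy overlay described above: assigning the string to a Byte array copies the data once, but the loop then still works on numbers and avoids Mid$/AscW. CountQuoteChars is a made-up name for illustration, and whether the single copy is acceptable would need benchmarking on real input.

```vb
Option Explicit

' Sketch only: counts quote characters (", ', `) by looping over a Byte array.
' "b = Text" copies the string data; it is not the SafeArray overlay technique.
Public Function CountQuoteChars(ByRef Text As String) As Long
    Dim b() As Byte
    Dim i As Long

    If LenB(Text) = 0 Then Exit Function

    b = Text                                  ' UTF-16LE: two bytes per character, low byte first
    For i = 0 To UBound(b) - 1 Step 2
        If b(i + 1) = 0 Then                  ' high byte is 0, so the character code is below 256
            Select Case b(i)
            Case 34, 39, 96                   ' ", ', `
                CountQuoteChars = CountQuoteChars + 1
            End Select
        End If
    Next i
End Function
```

    For example, CountQuoteChars("Harry is 'cool'") returns 2.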

  16. #16

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by ChrisE View Post
    setting to UCase is faster
    Thank you, ChrisE.

    Quote Originally Posted by ChrisE View Post
    another option would be to use Regex to separate the parts in double quotes
    and leave only the words

    hth
    I know that RegEx offers better flexibility and extensibility, but RegEx always gives me a headache. If I don't use it for 3 months, I forget all of its rules, and the next time I need it I have to relearn it.

    The following is the language syntax definition from Monaco Editor (most of which consists of RegEx expressions), which I find shocking:

    Code:
    // Difficulty: "Nightmare!"
    /*
    Ruby language definition
    
    Quite a complex language due to elaborate escape sequences
    and quoting of literate strings/regular expressions, and
    an 'end' keyword that does not always apply to modifiers like until and while,
    and a 'do' keyword that sometimes starts a block, but sometimes is part of
    another statement (like 'while').
    
    (1) end blocks:
    'end' may end declarations like if or until, but sometimes 'if' or 'until'
    are modifiers where there is no 'end'. Also, 'do' sometimes starts a block
    that is ended by 'end', but sometimes it is part of a 'while', 'for', or 'until'
    To do proper brace matching we do some elaborate state manipulation.
    some examples:
    
      until bla do
        work until tired
        list.each do
          foo if test
        end
      end
    
    or
    
    if test
     foo (if test then x end)
     bar if bla
    end
    
    or, how about using class as a property..
    
    class Foo
      def endpoint
        self.class.endpoint || routes
      end
    end
    
    (2) quoting:
    there are many kinds of strings and escape sequences. But also, one can
    start many string-like things as '%qx' where q specifies the kind of string
    (like a command, escape expanded, regular expression, symbol etc.), and x is
    some character and only another 'x' ends the sequence. Except for brackets
    where the closing bracket ends the sequence.. and except for a nested bracket
    inside the string like entity. Also, such strings can contain interpolated
    ruby expressions again (and span multiple lines). Moreover, expanded
    regular expression can also contain comments.
    */
    return {
    	tokenPostfix: '.ruby',
    
    	keywords: [
    		'__LINE__', '__ENCODING__', '__FILE__', 'BEGIN', 'END', 'alias', 'and', 'begin',
    		'break', 'case', 'class', 'def', 'defined?', 'do', 'else', 'elsif', 'end',
    		'ensure', 'for', 'false', 'if', 'in', 'module', 'next', 'nil', 'not', 'or', 'redo',
    		'rescue', 'retry', 'return', 'self', 'super', 'then', 'true', 'undef', 'unless',
    		'until', 'when', 'while', 'yield',
    	],
    
    	keywordops: [
    		'::', '..', '...', '?', ':', '=>'
    	],
    
    	builtins: [
    		'require', 'public', 'private', 'include', 'extend', 'attr_reader',
    		'protected', 'private_class_method', 'protected_class_method', 'new'
    	],
    
    	// these are closed by 'end' (if, while and until are handled separately)
    	declarations: [
    		'module', 'class', 'def', 'case', 'do', 'begin', 'for', 'if', 'while', 'until', 'unless'
    	],
    
    	linedecls: [
    		'def', 'case', 'do', 'begin', 'for', 'if', 'while', 'until', 'unless'
    	],
    
    	operators: [
    		'^', '&', '|', '<=>', '==', '===', '!~', '=~', '>', '>=', '<', '<=', '<<', '>>', '+',
    		'-', '*', '/', '%', '**', '~', '+@', '-@', '[]', '[]=', '`',
    		'+=', '-=', '*=', '**=', '/=', '^=', '%=', '<<=', '>>=', '&=', '&&=', '||=', '|='
    	],
    
    	brackets: [
    		{ open: '(', close: ')', token: 'delimiter.parenthesis' },
    		{ open: '{', close: '}', token: 'delimiter.curly' },
    		{ open: '[', close: ']', token: 'delimiter.square' }
    	],
    
    	// we include these common regular expressions
    	symbols: /[=><!~?:&|+\-*\/\^%\.]+/,
    
    	// escape sequences
    	escape: /(?:[abefnrstv\\"'\n\r]|[0-7]{1,3}|x[0-9A-Fa-f]{1,2}|u[0-9A-Fa-f]{4})/,
    	escapes: /\\(?:C\-(@escape|.)|c(@escape|.)|@escape)/,
    
    	decpart: /\d(_?\d)*/,
    	decimal: /0|@decpart/,
    
    	delim: /[^a-zA-Z0-9\s\n\r]/,
    	heredelim: /(?:\w+|'[^']*'|"[^"]*"|`[^`]*`)/,
    
    	regexpctl: /[(){}\[\]\$\^|\-*+?\.]/,
    	regexpesc: /\\(?:[AzZbBdDfnrstvwWn0\\\/]|@regexpctl|c[A-Z]|x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4})?/,
    
    
    	// The main tokenizer for our languages
    	tokenizer: {
    		// Main entry.
    		// root.<decl> where decl is the current opening declaration (like 'class')
    		root: [
    			// identifiers and keywords
    			// most complexity here is due to matching 'end' correctly with declarations.
    			// We distinguish a declaration that comes first on a line, versus declarations further on a line (which are most likey modifiers)
    			[/^(\s*)([a-z_]\w*[!?=]?)/, ['white',
    				{
    					cases: {
    						'for|until|while': { token: 'keyword.$2', next: '@dodecl.$2' },
    						'@declarations': { token: 'keyword.$2', next: '@root.$2' },
    						'end': { token: 'keyword.$S2', next: '@pop' },
    						'@keywords': 'keyword',
    						'@builtins': 'predefined',
    						'@default': 'identifier'
    					}
    				}]],
    			[/[a-z_]\w*[!?=]?/,
    				{
    					cases: {
    						'if|unless|while|until': { token: 'keyword.$0x', next: '@modifier.$0x' },
    						'for': { token: 'keyword.$2', next: '@dodecl.$2' },
    						'@linedecls': { token: 'keyword.$0', next: '@root.$0' },
    						'end': { token: 'keyword.$S2', next: '@pop' },
    						'@keywords': 'keyword',
    						'@builtins': 'predefined',
    						'@default': 'identifier'
    					}
    				}],
    
    			[/[A-Z][\w]*[!?=]?/, 'constructor.identifier'],     // constant
    			[/\$[\w]*/, 'global.constant'],               // global
    			[/@[\w]*/, 'namespace.instance.identifier'], // instance
    			[/@@[\w]*/, 'namespace.class.identifier'],    // class
    
    			// here document
    			[/<<[-~](@heredelim).*/, { token: 'string.heredoc.delimiter', next: '@heredoc.$1' }],
    			[/[ \t\r\n]+<<(@heredelim).*/, { token: 'string.heredoc.delimiter', next: '@heredoc.$1' }],
    			[/^<<(@heredelim).*/, { token: 'string.heredoc.delimiter', next: '@heredoc.$1' }],
    
    
    			// whitespace
    			{ include: '@whitespace' },
    
    			// strings
    			[/"/, { token: 'string.d.delim', next: '@dstring.d."' }],
    			[/'/, { token: 'string.sq.delim', next: '@sstring.sq' }],
    
    			// % literals. For efficiency, rematch in the 'pstring' state
    			[/%([rsqxwW]|Q?)/, { token: '@rematch', next: 'pstring' }],
    
    			// commands and symbols
    			[/`/, { token: 'string.x.delim', next: '@dstring.x.`' }],
    			[/:(\w|[$@])\w*[!?=]?/, 'string.s'],
    			[/:"/, { token: 'string.s.delim', next: '@dstring.s."' }],
    			[/:'/, { token: 'string.s.delim', next: '@sstring.s' }],
    
    			// regular expressions. Lookahead for a (not escaped) closing forwardslash on the same line
    			[/\/(?=(\\\/|[^\/\n])+\/)/, { token: 'regexp.delim', next: '@regexp' }],
    
    			// delimiters and operators
    			[/[{}()\[\]]/, '@brackets'],
    			[/@symbols/, {
    				cases: {
    					'@keywordops': 'keyword',
    					'@operators': 'operator',
    					'@default': ''
    				}
    			}],
    
    			[/[;,]/, 'delimiter'],
    
    			// numbers
    			[/0[xX][0-9a-fA-F](_?[0-9a-fA-F])*/, 'number.hex'],
    			[/0[_oO][0-7](_?[0-7])*/, 'number.octal'],
    			[/0[bB][01](_?[01])*/, 'number.binary'],
    			[/0[dD]@decpart/, 'number'],
    			[/@decimal((\.@decpart)?([eE][\-+]?@decpart)?)/, {
    				cases: {
    					'$1': 'number.float',
    					'@default': 'number'
    				}
    			}],
    
    		],
    
    		// used to not treat a 'do' as a block opener if it occurs on the same
    		// line as a 'do' statement: 'while|until|for'
    		// dodecl.<decl> where decl is the declarations started, like 'while'
    		dodecl: [
    			[/^/, { token: '', switchTo: '@root.$S2' }], // get out of do-skipping mode on a new line
    			[/[a-z_]\w*[!?=]?/, {
    				cases: {
    					'end': { token: 'keyword.$S2', next: '@pop' }, // end on same line
    					'do': { token: 'keyword', switchTo: '@root.$S2' }, // do on same line: not an open bracket here
    					'@linedecls': { token: '@rematch', switchTo: '@root.$S2' }, // other declaration on same line: rematch
    					'@keywords': 'keyword',
    					'@builtins': 'predefined',
    					'@default': 'identifier'
    				}
    			}],
    			{ include: '@root' }
    		],
    
    		// used to prevent potential modifiers ('if|until|while|unless') to match
    		// with 'end' keywords.
    		// modifier.<decl>x where decl is the declaration starter, like 'if'
    		modifier: [
    			[/^/, '', '@pop'], // it was a modifier: get out of modifier mode on a new line
    			[/[a-z_]\w*[!?=]?/, {
    				cases: {
    					'end': { token: 'keyword.$S2', next: '@pop' }, // end on same line
    					'then|else|elsif|do': { token: 'keyword', switchTo: '@root.$S2' }, // real declaration and not a modifier
    					'@linedecls': { token: '@rematch', switchTo: '@root.$S2' }, // other declaration => not a modifier
    					'@keywords': 'keyword',
    					'@builtins': 'predefined',
    					'@default': 'identifier'
    				}
    			}],
    			{ include: '@root' }
    		],
    
    		// single quote strings (also used for symbols)
    		// sstring.<kind>  where kind is 'sq' (single quote) or 's' (symbol)
    		sstring: [
    			[/[^\\']+/, 'string.$S2'],
    			[/\\\\|\\'|\\$/, 'string.$S2.escape'],
    			[/\\./, 'string.$S2.invalid'],
    			[/'/, { token: 'string.$S2.delim', next: '@pop' }]
    		],
    
    		// double quoted "string".
    		// dstring.<kind>.<delim> where kind is 'd' (double quoted), 'x' (command), or 's' (symbol)
    		// and delim is the ending delimiter (" or `)
    		dstring: [
    			[/[^\\`"#]+/, 'string.$S2'],
    			[/#/, 'string.$S2.escape', '@interpolated'],
    			[/\\$/, 'string.$S2.escape'],
    			[/@escapes/, 'string.$S2.escape'],
    			[/\\./, 'string.$S2.escape.invalid'],
    			[/[`"]/, {
    				cases: {
    					'$#==$S3': { token: 'string.$S2.delim', next: '@pop' },
    					'@default': 'string.$S2'
    				}
    			}]
    		],
    
    		// literal documents
    		// heredoc.<close> where close is the closing delimiter
    		heredoc: [
    			[/^(\s*)(@heredelim)$/, {
    				cases: {
    					'$2==$S2': ['string.heredoc', { token: 'string.heredoc.delimiter', next: '@pop' }],
    					'@default': ['string.heredoc', 'string.heredoc']
    				}
    			}],
    			[/.*/, 'string.heredoc'],
    		],
    
    		// interpolated sequence
    		interpolated: [
    			[/\$\w*/, 'global.constant', '@pop'],
    			[/@\w*/, 'namespace.class.identifier', '@pop'],
    			[/@@\w*/, 'namespace.instance.identifier', '@pop'],
    			[/[{]/, { token: 'string.escape.curly', switchTo: '@interpolated_compound' }],
    			['', '', '@pop'], // just a # is interpreted as a #
    		],
    
    		// any code
    		interpolated_compound: [
    			[/[}]/, { token: 'string.escape.curly', next: '@pop' }],
    			{ include: '@root' },
    		],
    
    		// %r quoted regexp
    		// pregexp.<open>.<close> where open/close are the open/close delimiter
    		pregexp: [
    			{ include: '@whitespace' },
    			// turns out that you can quote using regex control characters, aargh!
    			// for example; %r|kgjgaj| is ok (even though | is used for alternation)
    			// so, we need to match those first
    			[/[^\(\{\[\\]/, {
    				cases: {
    					'$#==$S3': { token: 'regexp.delim', next: '@pop' },
    					'$#==$S2': { token: 'regexp.delim', next: '@push' }, // nested delimiters are allowed..
    					'~[)}\\]]': '@brackets.regexp.escape.control',
    					'~@regexpctl': 'regexp.escape.control',
    					'@default': 'regexp'
    				}
    			}],
    			{ include: '@regexcontrol' },
    		],
    
    		// We match regular expression quite precisely
    		regexp: [
    			{ include: '@regexcontrol' },
    			[/[^\\\/]/, 'regexp'],
    			['/[ixmp]*', { token: 'regexp.delim' }, '@pop'],
    		],
    
    		regexcontrol: [
    			[/(\{)(\d+(?:,\d*)?)(\})/, ['@brackets.regexp.escape.control', 'regexp.escape.control', '@brackets.regexp.escape.control']],
    			[/(\[)(\^?)/, ['@brackets.regexp.escape.control', { token: 'regexp.escape.control', next: '@regexrange' }]],
    			[/(\()(\?[:=!])/, ['@brackets.regexp.escape.control', 'regexp.escape.control']],
    			[/\(\?#/, { token: 'regexp.escape.control', next: '@regexpcomment' }],
    			[/[()]/, '@brackets.regexp.escape.control'],
    			[/@regexpctl/, 'regexp.escape.control'],
    			[/\\$/, 'regexp.escape'],
    			[/@regexpesc/, 'regexp.escape'],
    			[/\\\./, 'regexp.invalid'],
    			[/#/, 'regexp.escape', '@interpolated'],
    		],
    
    		regexrange: [
    			[/-/, 'regexp.escape.control'],
    			[/\^/, 'regexp.invalid'],
    			[/\\$/, 'regexp.escape'],
    			[/@regexpesc/, 'regexp.escape'],
    			[/[^\]]/, 'regexp'],
    			[/\]/, '@brackets.regexp.escape.control', '@pop'],
    		],
    
    		regexpcomment: [
    			[/[^)]+/, 'comment'],
    			[/\)/, { token: 'regexp.escape.control', next: '@pop' }]
    		],
    
    
    		// % quoted strings
    		// A bit repetitive since we need to often special case the kind of ending delimiter
    		pstring: [
    			[/%([qws])\(/, { token: 'string.$1.delim', switchTo: '@qstring.$1.(.)' }],
    			[/%([qws])\[/, { token: 'string.$1.delim', switchTo: '@qstring.$1.[.]' }],
    			[/%([qws])\{/, { token: 'string.$1.delim', switchTo: '@qstring.$1.{.}' }],
    			[/%([qws])</, { token: 'string.$1.delim', switchTo: '@qstring.$1.<.>' }],
    			[/%([qws])(@delim)/, { token: 'string.$1.delim', switchTo: '@qstring.$1.$2.$2' }],
    
    			[/%r\(/, { token: 'regexp.delim', switchTo: '@pregexp.(.)' }],
    			[/%r\[/, { token: 'regexp.delim', switchTo: '@pregexp.[.]' }],
    			[/%r\{/, { token: 'regexp.delim', switchTo: '@pregexp.{.}' }],
    			[/%r</, { token: 'regexp.delim', switchTo: '@pregexp.<.>' }],
    			[/%r(@delim)/, { token: 'regexp.delim', switchTo: '@pregexp.$1.$1' }],
    
    			[/%(x|W|Q?)\(/, { token: 'string.$1.delim', switchTo: '@qqstring.$1.(.)' }],
    			[/%(x|W|Q?)\[/, { token: 'string.$1.delim', switchTo: '@qqstring.$1.[.]' }],
    			[/%(x|W|Q?)\{/, { token: 'string.$1.delim', switchTo: '@qqstring.$1.{.}' }],
    			[/%(x|W|Q?)</, { token: 'string.$1.delim', switchTo: '@qqstring.$1.<.>' }],
    			[/%(x|W|Q?)(@delim)/, { token: 'string.$1.delim', switchTo: '@qqstring.$1.$2.$2' }],
    
    			[/%([rqwsxW]|Q?)./, { token: 'invalid', next: '@pop' }], // recover
    			[/./, { token: 'invalid', next: '@pop' }], // recover
    		],
    
    		// non-expanded quoted string.
    		// qstring.<kind>.<open>.<close>
    		//  kind = q|w|s  (single quote, array, symbol)
    		//  open = open delimiter
    		//  close = close delimiter
    		qstring: [
    			[/\\$/, 'string.$S2.escape'],
    			[/\\./, 'string.$S2.escape'],
    			[/./, {
    				cases: {
    					'$#==$S4': { token: 'string.$S2.delim', next: '@pop' },
    					'$#==$S3': { token: 'string.$S2.delim', next: '@push' }, // nested delimiters are allowed..
    					'@default': 'string.$S2'
    				}
    			}],
    		],
    
    		// expanded quoted string.
    		// qqstring.<kind>.<open>.<close>
    		//  kind = Q|W|x  (double quote, array, command)
    		//  open = open delimiter
    		//  close = close delimiter
    		qqstring: [
    			[/#/, 'string.$S2.escape', '@interpolated'],
    			{ include: '@qstring' }
    		],
    
    
    		// whitespace & comments
    		whitespace: [
    			[/[ \t\r\n]+/, ''],
    			[/^\s*=begin\b/, 'comment', '@comment'],
    			[/#.*$/, 'comment'],
    		],
    
    		comment: [
    			[/[^=]+/, 'comment'],
    			[/^\s*=begin\b/, 'comment.invalid'],    // nested comment
    			[/^\s*=end\b.*/, 'comment', '@pop'],
    			[/[=]/, 'comment']
    		],
    	}
    };
    Last edited by dreammanor; Jun 8th, 2019 at 12:40 AM.

  17. #17

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by LaVolpe View Post
    I see. Ignore the following if it doesn't apply...

    Not sure how many strings you are talking about, i.e., parsing entire documents? If so, it may be much faster to use an overlay array and loop thru the array elements. The advantages can be significant:

    - The array is an overlay, you don't do myArray()=theString. Requires CopyMemory and SafeArray structures & result is no copying of data which would be a speed hit

    - The string characters and array data share the same binary information. You would be comparing numbers vs string characters when looping. Ultimately, you would use InStr() for comparison, but looping via the bytes. A speed hit by looping with string characters is the temporary creation of strings, i.e., Mid$(...), AscW(Mid$(...)), etc
    Code:
    For x = 1 To Len(String1)
        If arrInts(x) = 34 Then 
    
        End If
    Next
    will be faster than
    Code:
    For x = 1 To Len(String1)
        If AscW(Mid$(String1, x, 1)) = 34 Then
    
        End If
    Next
    The usage of arrays requires more work, but can really improve speed when parsing KBs or MBs of text.
    Yes, I need to parse entire documents. Once the parsing algorithm is complete, I'll try CopyMemory and SafeArray structures to improve performance further. Thank you very much, LaVolpe.
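
    For reference, the overlay LaVolpe describes could be sketched roughly as follows. This is only an illustration and not code from this thread: CountChar, SAFEARRAY1D and arrInts are made-up names, and it assumes 32-bit VB6 with the usual RtlMoveMemory and VarPtrArray declares. The array has to be detached before it goes out of scope, otherwise VB will try to free memory that belongs to the string.

    ```vb
    Option Explicit

    ' One-dimensional SAFEARRAY descriptor (32-bit layout).
    Private Type SAFEARRAY1D
        cDims As Integer
        fFeatures As Integer
        cbElements As Long
        cLocks As Long
        pvData As Long
        cElements As Long
        lLbound As Long
    End Type

    Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
        (Dest As Any, Src As Any, ByVal cb As Long)
    Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" _
        (Ptr() As Any) As Long

    ' Hypothetical helper: count one character code without creating temporary strings.
    Public Function CountChar(ByRef S As String, ByVal CharCode As Integer) As Long
        Dim sa As SAFEARRAY1D
        Dim arrInts() As Integer
        Dim x As Long

        If Len(S) = 0 Then Exit Function   ' nothing to overlay

        sa.cDims = 1
        sa.cbElements = 2                   ' one UTF-16 code unit per element
        sa.pvData = StrPtr(S)               ' point at the string's own buffer, no copy
        sa.cElements = Len(S)
        sa.lLbound = 1                      ' arrInts(x) lines up with Mid$(S, x, 1)

        ' Attach the descriptor to the (unallocated) array variable
        CopyMemory ByVal VarPtrArray(arrInts), VarPtr(sa), 4

        For x = 1 To sa.cElements
            If arrInts(x) = CharCode Then CountChar = CountChar + 1
        Next

        ' Detach again so VB does not free the string's memory
        CopyMemory ByVal VarPtrArray(arrInts), 0&, 4
    End Function
    ```

    With this in a module, Debug.Print CountChar("say ""hi""", 34) should print 2. The sketch has no error handling, so a runtime error between attach and detach would leave the array attached.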

  18. #18
    Fanatic Member
    Join Date
    Feb 2019
    Posts
    706

    Re: [RESOLVED] (String search algorithm) Skip the contents of the quotes to find a su

    InStrB is slightly faster than InStr with vbBinaryCompare. In the test case in post #5 I got 0.4 seconds, so it's good for searching for quotes, but the position you get is in bytes, not characters.
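
    As a rough sketch of the byte-to-character conversion (InStrBChar is a hypothetical name, not something posted in this thread): VB6 strings hold two bytes per character, so character n starts at byte 2n - 1, and only matches on an odd byte offset begin on a character boundary.

    ```vb
    ' Hypothetical helper: search with InStrB, return a 1-based character position.
    Public Function InStrBChar(ByVal Start As Long, ByRef S1 As String, ByRef S2 As String) As Long
        Dim bytePos As Long

        If Start < 1 Then Exit Function
        bytePos = Start * 2 - 1                 ' first byte of character number Start

        Do
            bytePos = InStrB(bytePos, S1, S2, vbBinaryCompare)
            If bytePos = 0 Then Exit Function   ' not found
            If bytePos And 1 Then               ' odd offset: starts on a character boundary
                InStrBChar = (bytePos + 1) \ 2
                Exit Function
            End If
            bytePos = bytePos + 1               ' match straddles two characters, keep looking
        Loop
    End Function
    ```

    For example, Debug.Print InStrBChar(1, "say ""hi""", """") should print 5, the same value InStr returns.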

  19. #19
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: [RESOLVED] (String search algorithm) Skip the contents of the quotes to find a su

    There is one potential issue in my code that you will need to be aware of, and may need to fix/change the behaviour. It has to do with setting the Start parameter to a value inside a quoted run of text. This can produce possible unexpected/undesired results.

    Consider the following example:

    Code:
    Debug.Print MyInstr(2, "'Harry' is 'cool'", "Harry")
    That will return 2 (though you may expect 0) because my routine does no back checking to see if it is in a string - it only scans in the forward direction and the scanning has been instructed to begin after the opening apostrophe.

    Likewise, you might expect Debug.Print MyInstr(2, "'Harry' is 'cool'", "is") to return 9, but it returns 0.

    I don't have a solution for this right now, just wanted to bring it to your attention. I think you'll always have to start the scan at the beginning of the string and only start looking for matches once the Start parameter value has been passed and you are outside a quote block.

    Also I agree with LaVolpe that mapping the string to an array (as discussed in an earlier thread of yours) would be a good optimization. I didn't go that far with my example because I wanted to take a quick hack at the logic.
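
    The idea of always scanning from the beginning and only accepting matches once Start has been passed could be sketched like this. It is a minimal illustration, not the routine posted earlier: InStrOutsideQuotes is a made-up name, it assumes S2 contains no quote characters itself, and it is written for clarity rather than speed.

    ```vb
    ' Hypothetical sketch: track the quote state from character 1,
    ' but only report matches at or after Start and outside quotes.
    Public Function InStrOutsideQuotes(ByVal Start As Long, ByRef S1 As String, ByRef S2 As String, _
                                       Optional ByVal Cmp As VbCompareMethod = vbBinaryCompare) As Long
        Dim i As Long
        Dim l1 As Long
        Dim l2 As Long
        Dim ch As Integer
        Dim inQuote As Integer      ' 0 = outside quotes, else the opening quote's character code

        l1 = Len(S1)
        l2 = Len(S2)
        If l2 = 0 Or l1 < l2 Then Exit Function
        If Start < 1 Then Start = 1

        For i = 1 To l1 - l2 + 1
            ch = AscW(Mid$(S1, i, 1))
            If inQuote <> 0 Then
                If ch = inQuote Then inQuote = 0    ' closing quote of the same kind
            ElseIf ch = 34 Or ch = 39 Or ch = 96 Then
                inQuote = ch                        ' ", ' or ` opens a quoted run
            ElseIf i >= Start Then
                If StrComp(Mid$(S1, i, l2), S2, Cmp) = 0 Then
                    InStrOutsideQuotes = i
                    Exit Function
                End If
            End If
        Next i
    End Function
    ```

    With the examples above, InStrOutsideQuotes(2, "'Harry' is 'cool'", "Harry") gives 0 and InStrOutsideQuotes(2, "'Harry' is 'cool'", "is") gives 9. An unpaired quote character is treated as opening a quoted run that never closes, so everything after it is skipped.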

  20. #20
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: [RESOLVED] (String search algorithm) Skip the contents of the quotes to find a su

    @jpbro & dreammanor. The logic can get more complicated when special characters are not matched/paired,
    i.e. "Harry's car is cool"

  21. #21

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: [RESOLVED] (String search algorithm) Skip the contents of the quotes to find a su

    Quote Originally Posted by qvb6 View Post
    InStrB is slightly faster than InStr(vbBinaryCompare). In the test case in post #5, I got 0.4 Seconds, so it's good for searching for quotes, but the position you get is in Bytes.
    If InStrB is used, handling Chinese characters and other Unicode characters becomes complicated.

    Quote Originally Posted by LaVolpe View Post
    @jpbro & dreammanor. The logic can get more complicated when special characters are not matched/paired,
    i.e. "Harry's car is cool"
    Yes, I've made some additions and enhancements to jpbro's code, and the logic of the code has become a bit complicated, but it is still faster than my original method (replacing the contents of the quotes with Chr(0)).
    Last edited by dreammanor; Jun 9th, 2019 at 06:55 PM.

  22. #22

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: [RESOLVED] (String search algorithm) Skip the contents of the quotes to find a su

    Quote Originally Posted by jpbro View Post
    There is one potential issue in my code that you will need to be aware of, and may need to fix/change the behaviour. It has to do with setting the Start parameter to a value inside a quoted run of text. This can produce possible unexpected/undesired results.

    Consider the following example:

    Code:
    Debug.Print MyInstr(2, "'Harry' is 'cool'", "Harry")
    That will return 2 (though you may expect 0) because my routine does no back checking to see if it is in a string - it only scans in the forward direction and the scanning has been instructed to begin after the opening apostrophe.

    Likewise, you might expect Debug.Print MyInstr(2, "'Harry' is 'cool'", "is") to return 9, but it returns 0.

    I don't have a solution for this right now, just wanted to bring it to your attention. I think you'll always have to start the scan at the beginning of the string and only start looking for matches once the Start parameter value has been passed and you are outside a quote block.

    Also I agree with LaVolpe that mapping the string to an array (as discussed in an earlier thread of yours) would be a good optimization. I didn't go that far with my example because I wanted to take a quick hack at the logic.
    Hi jpbro, I modified your code and MyInstr now returns the correct results. As LaVolpe said, the logic of the code has become a bit complicated, but it is still faster than my original method (replacing the contents of the quotes with Chr(0)). Thank you very much.

    Debug.Print MyInstr(2, "'Harry' is 'cool'", "Harry") ==> 0
    Debug.Print MyInstr(2, "'Harry' is 'cool'", "is") ==> 9
    Debug.Print MyInstr(1, "Harry's car is cool", "Harry") ==> 1

    Code:
    Public Function MyInstr(ByVal Start As Long, _
                            ByVal S1 As String, _
                            ByVal S2 As String, _
                            Optional ByVal Cmp As VBA.VbCompareMethod = vbBinaryCompare, _
                            Optional ByVal SearchQuotedContent As Boolean = False) As Long
       Dim ii As Long
       Dim l1 As Long
       Dim l2 As Long
       Dim l_Char As Integer
       Dim l_InQuote As Integer
       Dim l_QuoteEnd As Long
       Dim l_FirstChar As Integer
       Dim l_S3 As String
       Dim l_Pos As Long
       
       l1 = Len(S1)
       If l1 = 0 Then Exit Function  ' Can't match empty string
       
       l2 = Len(S2)
       If l2 = 0 Then Exit Function  ' Can't match empty string
       
       If l1 < l2 Then Exit Function ' Can't find a longer string in a smaller string
       
       If Start > l1 - l2 + 1 Then Exit Function ' Can't find if start is after end of string1 less the length of string2
        
       '--- DreamManor Added on 2019-06-08 -------------------------------------------
       l_FirstChar = AscW(Left$(S2, 1))
       If Cmp <> vbBinaryCompare Then
          l_S3 = UCase(S2)
       End If
       
       If Start > 1 And Not SearchQuotedContent Then
          l_Pos = MyInstr(1, S1, S2, Cmp, SearchQuotedContent)
          If l_Pos = 0 Then Exit Function
          Do While l_Pos < Start
             l_Pos = MyInstr(l_Pos + 1, S1, S2, Cmp, SearchQuotedContent)
             If l_Pos = 0 Then Exit Function
          Loop
          MyInstr = l_Pos
          Exit Function
       End If
       '---------------------------------------------------------------------------------
       
       l_QuoteEnd = Start   ' Assume everything before Start is in quotes so we don't check it
       
       If Not SearchQuotedContent Then
          For ii = Start To l1
             l_Char = AscW(Mid$(S1, ii, 1))
          
             Select Case l_Char
             Case 34, 39, 96   ' ", ', `
                ' Found a quote character
    
                If l_InQuote Then
                   ' We are already within a quoted block of text
                   If l_InQuote = l_Char Then
                      ' and in a matching quote character
                      ' So close off the quoted content run and remember the starting position of the unquoted run to come
                      l_InQuote = 0
                      l_QuoteEnd = ii + 1
                   End If
                   
                Else
                   ' Entering quote - check previous non-quoted chunk to see if we have a match
                   l_InQuote = l_Char
                   
                   If ii - l_QuoteEnd >= l2 Then
                      ' The previous unquoted run is long enough for a possible match
                      MyInstr = InStr(1, Mid$(S1, l_QuoteEnd, ii - l_QuoteEnd), S2, Cmp)
                      If MyInstr > 0 Then
                         ' We found a match so short-circuit
                         Exit For
                      End If
                   End If
                End If
             
             Case l_FirstChar
                 '--- DreamManor Added on 2019-06-08 -----------------
                 If l_InQuote = 0 Then
                     If Cmp = vbBinaryCompare Then
                        If Mid$(S1, ii, l2) = S2 Then
                           l_QuoteEnd = 0:   MyInstr = ii:   Exit For
                        End If
                    Else
                       If UCase(Mid$(S1, ii, l2)) = l_S3 Then
                           l_QuoteEnd = 0:   MyInstr = ii:   Exit For
                        End If
                    End If
                 End If
                 '-------------------------------------------------------
                
             End Select
          Next ii
       End If
       
       If MyInstr = 0 Then
          ' No match so far
          If l_InQuote = 0 Then
             ' We're not currently in a quoted run at the end of the string, so check the remaining characters
             If l1 - l_QuoteEnd + 1 >= l2 Then
                ' There are enough remaining characters for a possible match
                MyInstr = InStr(1, Mid$(S1, l_QuoteEnd, l1 - l_QuoteEnd + 1), S2, Cmp)
             End If
          End If
       End If
       
       If MyInstr > 0 Then
          If l_QuoteEnd > 0 Then
             ' Add position of closing quote to the matches starting character position
             MyInstr = MyInstr + l_QuoteEnd - 1
          End If
       End If
    End Function
    Edit:
    Sorry, I missed an important parameter, CheckPreviousContent. The corrected code is in post #24.
    Last edited by dreammanor; Jun 11th, 2019 at 05:48 AM.

  23. #23
    PowerPoster ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    3,046

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Hi,

    I don't know what the text or text file you want to search looks like, but perhaps splitting
    the problem into smaller parts is an option.

    I'm just guessing here.
    I created a text file like this:

    Code:
    Hi there "The title of the book is 'Harry Potter'. " see some "Harry" movie
    start in Cinema  "in quotes!", world , "more words" bar
    Then separate the text in quotes with Regex:
    Code:
    Option Explicit
    Private pRegEx As Object
    
    Public Property Get oRegex() As Object
       If (pRegEx Is Nothing) Then
          Set pRegEx = CreateObject("Vbscript.Regexp")
       End If
       Set oRegex = pRegEx
    End Property
    
    Public Function ReadFile(ByRef Path As String) As String
       Dim FileNr As Long
       On Error Resume Next
       If FileLen(Path) = 0 Then Exit Function
       On Error GoTo 0
       FileNr = FreeFile
       Open Path For Binary As #FileNr
       ReadFile = Space$(LOF(FileNr))
       Get #FileNr, , ReadFile
       Close #FileNr
    End Function
    
    Private Sub Command1_Click()
     Dim cMatches As Object
       Dim m As Object
    
       With oRegex
        .Pattern = "\""(.+?)\""" 'get all Text between "..."
        .Global = True
        .MultiLine = True
        
        Set cMatches = .Execute(ReadFile("E:\zTestq.txt"))
          For Each m In cMatches
           Debug.Print m
    'the output:
    '"The title of the book is 'Harry Potter'. "
    '"Harry"
    '"in quotes!"
    '"more words"
    
          Next
       End With
       Set m = Nothing
       Set cMatches = Nothing
    
    End Sub
    
    Private Sub Command2_Click()
     Dim cMatches As Object
     Dim m As Object
       With oRegex
            .Pattern = "\""(.+?)\""|\s(\w+)" 'get Text outside double quotes
        .Global = True
        .MultiLine = True
        Set cMatches = .Execute(ReadFile("E:\zTestq.txt"))
          For Each m In cMatches
           
           Debug.Print m.submatches(1)
    'the output:
    'Hi
    'there
    '
    'see
    'Some
    '
    'movie
    'start
    'in
    'Cinema
    '
    'world
    '
    'bar
           
          Next
       End With
       Set m = Nothing
       Set cMatches = Nothing
       End Sub
    Write the output to new files and perform the search/count there.

    hth

  24. #24

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: [RESOLVED] (String search algorithm) Skip the contents of the quotes to find a su

    Correction to the code in #22: I missed an important parameter, CheckPreviousContent.

    Code:
    Public Function MyInstr(ByVal Start As Long, _
                            ByVal S1 As String, _
                            ByVal S2 As String, _
                            Optional ByVal Cmp As VBA.VbCompareMethod = vbBinaryCompare, _
                            Optional ByVal SearchQuotedContent As Boolean = False, _
                            Optional ByVal CheckPreviousContent As Boolean = True) As Long
       Dim ii As Long
       Dim l1 As Long
       Dim l2 As Long
       Dim l_Char As Integer
       Dim l_InQuote As Integer
       Dim l_QuoteEnd As Long
       Dim l_FirstChar As Integer
       Dim l_S3 As String
       Dim l_Pos As Long
       
       l1 = Len(S1)
       If l1 = 0 Then Exit Function  ' Can't match empty string
       
       l2 = Len(S2)
       If l2 = 0 Then Exit Function  ' Can't match empty string
       
       If l1 < l2 Then Exit Function ' Can't find a longer string in a smaller string
       
       If Start > l1 - l2 + 1 Then Exit Function ' Can't find if start is after end of string1 less the length of string2
        
       '--- DreamManor Added on 2019-06-08 -------------------------------------------
       l_FirstChar = AscW(Left$(S2, 1))
       If Cmp <> vbBinaryCompare Then
          l_S3 = UCase(S2)
       End If
       
       If CheckPreviousContent And Start > 1 And Not SearchQuotedContent Then   ' inner calls pass False here, otherwise the recursion never ends
          l_Pos = MyInstr(1, S1, S2, Cmp, SearchQuotedContent)
          If l_Pos = 0 Then Exit Function
          Do While l_Pos < Start
             l_Pos = MyInstr(l_Pos + 1, S1, S2, Cmp, SearchQuotedContent, CheckPreviousContent:= False)
             If l_Pos = 0 Then Exit Function
          Loop
          MyInstr = l_Pos
          Exit Function
       End If
       '---------------------------------------------------------------------------------
       
       l_QuoteEnd = Start   ' Assume everything before Start is in quotes so we don't check it
       
       If Not SearchQuotedContent Then
          For ii = Start To l1
             l_Char = AscW(Mid$(S1, ii, 1))
          
             Select Case l_Char
             Case 34, 39, 96   ' ", ', `
                ' Found a quote character
    
                If l_InQuote Then
                   ' We are already within a quoted block of text
                   If l_InQuote = l_Char Then
                      ' and in a matching quote character
                      ' So close off the quoted content run and remember the starting position of the unquoted run to come
                      l_InQuote = 0
                      l_QuoteEnd = ii + 1
                   End If
                   
                Else
                   ' Entering quote - check previous non-quoted chunk to see if we have a match
                   l_InQuote = l_Char
                   
                   If ii - l_QuoteEnd >= l2 Then
                      ' The previous unquoted run is long enough for a possible match
                      MyInstr = InStr(1, Mid$(S1, l_QuoteEnd, ii - l_QuoteEnd), S2, Cmp)
                      If MyInstr > 0 Then
                         ' We found a match so short-circuit
                         Exit For
                      End If
                   End If
                End If
             
             Case l_FirstChar
                 '--- DreamManor Added on 2019-06-08 -----------------
                 If l_InQuote = 0 Then
                     If Cmp = vbBinaryCompare Then
                        If Mid$(S1, ii, l2) = S2 Then
                           l_QuoteEnd = 0:   MyInstr = ii:   Exit For
                        End If
                    Else
                       If UCase(Mid$(S1, ii, l2)) = l_S3 Then
                           l_QuoteEnd = 0:   MyInstr = ii:   Exit For
                        End If
                    End If
                 End If
                 '-------------------------------------------------------
                
             End Select
          Next ii
       End If
       
       If MyInstr = 0 Then
          ' No match so far
          If l_InQuote = 0 Then
             ' We're not currently in a quoted run at the end of the string, so check the remaining characters
             If l1 - l_QuoteEnd + 1 >= l2 Then
                ' There are enough remaining characters for a possible match
                MyInstr = InStr(1, Mid$(S1, l_QuoteEnd, l1 - l_QuoteEnd + 1), S2, Cmp)
             End If
          End If
       End If
       
       If MyInstr > 0 Then
          If l_QuoteEnd > 0 Then
             ' Add position of closing quote to the matches starting character position
             MyInstr = MyInstr + l_QuoteEnd - 1
          End If
       End If
    End Function
    Last edited by dreammanor; Jun 11th, 2019 at 05:48 AM.

  25. #25

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Hi ChrisE, thank you for your code.

    I need to search for some characters in HTML, CSS, JavaScript or TypeScript, for example:
    (1) Search for the first "{" and the last "}" of the following code block
    Code:
    /**
     * Simple example: search for the first `{` and the last `}`
     */
    const enum JSONTokenType {
    	UNKNOWN = 0,
    	STRING = 1,
    	LEFT_SQUARE_BRACKET = 2, // [
    	LEFT_CURLY_BRACKET = 3, // {
    	RIGHT_SQUARE_BRACKET = 4, // ]
    	RIGHT_CURLY_BRACKET = 5, // }
    	COLON = 6, // :
    	COMMA = 7, // ,
    	NULL = 8,
    	TRUE = 9,
    	FALSE = 10,
    	NUMBER = 11
    }
    (2) Search for the start symbol "{" and the end symbol "}" of a TypeScript function body
    Code:
    /**
     * Complex example: search for the left curly brace ("{") of the function code block and the corresponding right curly brace ("}")
     */
    function testMatchers<T>(selector: string, matchesName: (names: string[], matcherInput: T) => { return someObject }): MatcherWithPriority<T>[] {
    	var results = <MatcherWithPriority<T>[]> [];
    	var tokenizer = newTokenizer(selector);
    	var token = tokenizer.next();
    	while (token !== null) {
    		let priority : -1 | 0 | 1 = 0;
    		if (token.length === 2 && token.charAt(1) === ':') {
    			switch (token.charAt(0)) {
    				case 'R': priority = 1; break;
    				case 'L': priority = -1; break;
    				case '{': priority = 1; break;		// {
    				case '}': priority = -1; break;		// }
    				default:
    					console.log(`Unknown priority ${token} in scope selector`);
    			}
    			token = tokenizer.next();
    		}
    		let matcher = parseConjunction();
    		if (matcher) {
    			results.push({ matcher, priority });
    		}
    		if (token !== '}') {
    			break;
    		}
    		token = tokenizer.next();
    	}
    	return results;
    }
    How can I achieve the above goals with regular expressions? Thanks!

  26. #26
    PowerPoster ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    3,046

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by dreammanor View Post
    Hi ChrisE, thank you for your code.

    I need to search for some characters from HTML, CSS, JavaScript or TypeScript, for example:
    (1) Search for the first "{" and the last "}" of the following code block
    Code:
    /**
     * Simple example: search for the first `{` and the last `}`
     */
    const enum JSONTokenType {
    	UNKNOWN = 0,
    	STRING = 1,
    	LEFT_SQUARE_BRACKET = 2, // [
    	LEFT_CURLY_BRACKET = 3, // {
    	RIGHT_SQUARE_BRACKET = 4, // ]
    	RIGHT_CURLY_BRACKET = 5, // }
    	COLON = 6, // :
    	COMMA = 7, // ,
    	NULL = 8,
    	TRUE = 9,
    	FALSE = 10,
    	NUMBER = 11
    }
    (2) Search for the start symbol "{" and the end symbol "}" of a TypeScript function body
    Code:
    /**
     * Complex example: search for the left curly brace ("{") of the function code block and the corresponding right curly brace ("}")
     */
    function testMatchers<T>(selector: string, matchesName: (names: string[], matcherInput: T) => { return someObject }): MatcherWithPriority<T>[] {
    	var results = <MatcherWithPriority<T>[]> [];
    	var tokenizer = newTokenizer(selector);
    	var token = tokenizer.next();
    	while (token !== null) {
    		let priority : -1 | 0 | 1 = 0;
    		if (token.length === 2 && token.charAt(1) === ':') {
    			switch (token.charAt(0)) {
    				case 'R': priority = 1; break;
    				case 'L': priority = -1; break;
    				case '{': priority = 1; break;		// {
    				case '}': priority = -1; break;		// }
    				default:
    					console.log(`Unknown priority ${token} in scope selector`);
    			}
    			token = tokenizer.next();
    		}
    		let matcher = parseConjunction();
    		if (matcher) {
    			results.push({ matcher, priority });
    		}
    		if (token !== '}') {
    			break;
    		}
    		token = tokenizer.next();
    	}
    	return results;
    }
    How can I achieve the above goals with regular expressions? Thanks!
    Regex is the wrong tool for the above. It would work up to a certain point, but you will have to write your own parser for that search.
    to hunt a species to extinction is not logical !
    since 2010 the number of Tigers are rising again in 2016 - 3900 were counted. with Baby Callas it's 3901, my wife and I had 2-3 months the privilege of raising a Baby Tiger.

  27. #27

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by ChrisE View Post
    Regex is the wrong tool for the above. It would work up to a certain point, but you will have to write your own parser for that search.
    Yes, you are right. Currently, jpbro's approach seems to be the most feasible.

  28. #28
    PowerPoster wqweto's Avatar
    Join Date
    May 2011
    Location
    Sofia, Bulgaria
    Posts
    5,120

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by dreammanor View Post
    Yes, you are right. Currently, jpbro's approach seems to be the most feasible.
    You'll need a lexer that tokenizes the input for a non-palliative solution. jpbro's approach is pretty unextendable and falls flat with string literals and open/close brackets inside block/line comments, for instance.

    IMO you don't need a full language parser, just a lexer to impl keywords/strings/numbers highlighting and/or "match opening/closing bracket" functionality.

    Btw, PEG parsers combine lexer/parser (i.e. they don't have a separate lexer) but I'm positive VbPeg can be used to impl a JS/TS lexer that returns an array of (token_type, offset+size) tuples from an input string. It's the nesting of the { } that a JS/TS parser would handle while the lexer just marks these as OPEN_BRACKET/CLOSE_BRACKET types only, w/ no nesting level tracked.
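
    The lexer idea above can be sketched by hand in plain VB6. This is only an illustration under stated assumptions, not VbPeg output: the names LexTokenType, LexToken and LexSource are hypothetical, the code is meant for a standard module (.bas), and it recognizes only curly brackets, quoted strings and comments. Regex literals and `${...}` interpolation inside back-quoted strings are not handled.

    ```vb
    Option Explicit

    ' Hypothetical token kinds for this sketch; not part of VbPeg or any library.
    Public Enum LexTokenType
        ltOpenBracket = 1    ' {
        ltCloseBracket = 2   ' }
        ltString = 3         ' '...', "..." or `...`
        ltComment = 4        ' // ... or /* ... */
    End Enum

    Public Type LexToken
        Kind As LexTokenType
        Offset As Long       ' 1-based position of the token's first character
        Size As Long         ' token length in characters
    End Type

    ' Scans Source once and fills the dynamic array Tokens() with (Kind, Offset, Size)
    ' entries. Returns the number of tokens; all other text is skipped.
    Public Function LexSource(ByRef Source As String, ByRef Tokens() As LexToken) As Long
        Dim pos As Long, startPos As Long, n As Long, tokCount As Long
        Dim ch As String, tokKind As LexTokenType

        n = Len(Source)
        ReDim Tokens(0 To n)     ' upper bound on the token count; trimmed at the end
        pos = 1
        Do While pos <= n
            ch = Mid$(Source, pos, 1)
            startPos = pos
            tokKind = 0
            Select Case ch
            Case "{": tokKind = ltOpenBracket: pos = pos + 1
            Case "}": tokKind = ltCloseBracket: pos = pos + 1
            Case """", "'", "`"
                tokKind = ltString
                pos = pos + 1
                Do While pos <= n
                    If Mid$(Source, pos, 1) = "\" Then
                        pos = pos + 2           ' skip the backslash and the escaped character
                    ElseIf Mid$(Source, pos, 1) = ch Then
                        pos = pos + 1: Exit Do  ' consume the closing quote
                    Else
                        pos = pos + 1
                    End If
                Loop
            Case "/"
                If Mid$(Source, pos, 2) = "//" Then
                    tokKind = ltComment         ' line comment: runs up to the line break
                    Do While pos <= n
                        If Mid$(Source, pos, 1) = vbLf Or Mid$(Source, pos, 1) = vbCr Then Exit Do
                        pos = pos + 1
                    Loop
                ElseIf Mid$(Source, pos, 2) = "/*" Then
                    tokKind = ltComment         ' block comment: runs through "*/"
                    pos = InStr(pos + 2, Source, "*/")
                    If pos = 0 Then pos = n + 1 Else pos = pos + 2
                Else
                    pos = pos + 1
                End If
            Case Else
                pos = pos + 1
            End Select
            If pos > n + 1 Then pos = n + 1     ' a trailing backslash may overshoot
            If tokKind <> 0 Then
                Tokens(tokCount).Kind = tokKind
                Tokens(tokCount).Offset = startPos
                Tokens(tokCount).Size = pos - startPos
                tokCount = tokCount + 1
            End If
        Loop
        If tokCount > 0 Then ReDim Preserve Tokens(0 To tokCount - 1) Else Erase Tokens
        LexSource = tokCount
    End Function
    ```

    For the input a = { b: '}' } // { this yields four tokens: an open bracket at offset 5, a string at offset 10 (size 3), a close bracket at offset 14 and a comment at offset 16 (size 4). The brackets inside the string and the comment never show up as bracket tokens, so a caller can match brackets by counting depth over ltOpenBracket/ltCloseBracket tokens only.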

    cheers,
    </wqw>

  29. #29
    PowerPoster ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    3,046

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by dreammanor View Post
    Yes, you are right. Currently, jpbro's approach seems to be the most feasible.
    I tried your two samples for the search out of interest.

    Here are the results: you'll see that the first seems to work correctly, but the second doesn't,
    so with regex it 'kinda' works a little.

    Code:
    Private Sub Command3_Click()
     Dim cMatches As Object
     Dim m As Object
     
       With oRegex
       'get first { and ignore any closing } brackets in between
       'go to the last closing bracket }
        .Pattern = "\{[^()]*\}*"
        .Global = True
        .MultiLine = True
        Set cMatches = .Execute(ReadFile("E:\zSearch.txt"))
          For Each m In cMatches
           
           Debug.Print m
    
    ''output from Textfile zSearch.txt:
    '{
    '    UNKNOWN = 0,
    '    STRING = 1,
    '    LEFT_SQUARE_BRACKET = 2, // [
    '    LEFT_CURLY_BRACKET = 3, // {
    '    RIGHT_SQUARE_BRACKET = 4, // ]
    '    RIGHT_CURLY_BRACKET = 5, // }
    '    COLON = 6, // :
    '    COMMA = 7, // ,
    '    NULL = 8,
    '    TRUE = 9,
    '    FALSE = 10,
    '    Number = 11
    '}
    
    'output other textfile zSearch2.txt:
    
    '{ return someObject }
    '{
    '    var results = <MatcherWithPriority<T>[]> [];
    '    Var tokenizer = newTokenizer
    '{
    '        let priority : -1 | 0 | 1 = 0;
    '        if
    '{
    '            Switch
    '{
    '                case 'R': priority = 1; break;
    '                case 'L': priority = -1; break;
    '                case '{': priority = 1; break;      // {
    '                case '}': priority = -1; break;     // }
    'default:
    '                    console.Log
    '{token} in scope selector`
    '{
    '            results.push
    '{ matcher, priority }
    '{
    '            break;
    '        }
    '        token = tokenizer.Next
    
          Next
       End With
       Set m = Nothing
       Set cMatches = Nothing
    End Sub
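
    A note on why the second file gets chopped up: [^()]* matches any run of characters other than "(" and ")", including line breaks and further braces, and the trailing \}* is allowed to match zero braces. Each match therefore runs from a "{" to the next parenthesis or to the end of the input. The first sample contains no parentheses, which is why it looks correct. A minimal sketch to reproduce this, assuming the VBScript.RegExp COM object is available (late-bound here); DemoBracePattern is a hypothetical name.

    ```vb
    Option Explicit

    ' Hypothetical demo: shows where the pattern stops on a short input.
    Public Sub DemoBracePattern()
        Dim re As Object, m As Object

        Set re = CreateObject("VBScript.RegExp")
        re.Pattern = "\{[^()]*\}*"
        re.Global = True

        For Each m In re.Execute("if (a) { f(x); } else { g }")
            Debug.Print "[" & m.Value & "]"
        Next
        ' Prints:
        ' [{ f]
        ' [{ g }]
    End Sub
    ```

    The first match ends at the "(" of f(x), not at a closing brace. The second one looks right only because nothing but brace-free, parenthesis-free text follows it.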
    Last edited by ChrisE; Jun 14th, 2019 at 05:49 AM.

  30. #30
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by wqweto View Post
    You'll need a lexer that tokenizes the input for a non-palliative solution. jpbro's approach is pretty unextendable and falls flat with string literals and open/close brackets inside block/line comments, for instance.
    Agreed - my approach was only intended as a response to the original question for an InStr replacement that ignores text within various "quotes". Even then it was only posted as a nudge in a possible direction as I wrote it in a few minutes and didn't test it much at all. So anyone using it please beware - it's not polished/production-ready code! If the ultimate need is for a lexer, then my approach is not appropriate.

  31. #31

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by jpbro View Post
    Agreed - my approach was only intended as a response to the original question for an InStr replacement that ignores text within various "quotes". Even then it was only posted as a nudge in a possible direction as I wrote it in a few minutes and didn't test it much at all. So anyone using it please beware - it's not polished/production-ready code! If the ultimate need is for a lexer, then my approach is not appropriate.
    Yes, I need not only a lexer but also a full-language parser. But your MyInstr code is still very valuable to me. I'll improve it further and develop MySplit based on it; these functions can be used to search for strings in HTML and CSS. Thank you, jpbro.

  32. #32

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by ChrisE View Post
    I tried your two samples for the search out of interest.

    Here are the results: you'll see that the first seems to work correctly, but the second doesn't,
    so with regex it 'kinda' works a little.

    Code:
    Private Sub Command3_Click()
     Dim cMatches As Object
     Dim m As Object
     
       With oRegex
       'get first { and ignore any closing } brackets in between
       'go to the last closing bracket }
        .Pattern = "\{[^()]*\}*"
        .Global = True
        .MultiLine = True
        Set cMatches = .Execute(ReadFile("E:\zSearch.txt"))
          For Each m In cMatches
           
           Debug.Print m
    
    ''output from Textfile zSearch.txt:
    '{
    '    UNKNOWN = 0,
    '    STRING = 1,
    '    LEFT_SQUARE_BRACKET = 2, // [
    '    LEFT_CURLY_BRACKET = 3, // {
    '    RIGHT_SQUARE_BRACKET = 4, // ]
    '    RIGHT_CURLY_BRACKET = 5, // }
    '    COLON = 6, // :
    '    COMMA = 7, // ,
    '    NULL = 8,
    '    TRUE = 9,
    '    FALSE = 10,
    '    Number = 11
    '}
    
    'output other textfile zSearch2.txt:
    
    '{ return someObject }
    '{
    '    var results = <MatcherWithPriority<T>[]> [];
    '    Var tokenizer = newTokenizer
    '{
    '        let priority : -1 | 0 | 1 = 0;
    '        if
    '{
    '            Switch
    '{
    '                case 'R': priority = 1; break;
    '                case 'L': priority = -1; break;
    '                case '{': priority = 1; break;      // {
    '                case '}': priority = -1; break;     // }
    'default:
    '                    console.Log
    '{token} in scope selector`
    '{
    '            results.push
    '{ matcher, priority }
    '{
    '            break;
    '        }
    '        token = tokenizer.Next
    
          Next
       End With
       Set m = Nothing
       Set cMatches = Nothing
    End Sub
    Thank you, ChrisE. Is it possible to accomplish some very complex search logic with multiple RegExp patterns?

  33. #33

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by wqweto View Post
    You'll need a lexer that tokenizes the input for a non-palliative solution. jpbro's approach is pretty unextendable and falls flat with string literals and open/close brackets inside block/line comments, for instance.

    IMO you don't need a full language parser, just a lexer to impl keywords/strings/numbers highlighting and/or "match opening/closing bracket" functionality.

    Btw, PEG parsers combine lexer/parser (i.e. they don't have a separate lexer) but I'm positive VbPeg can be used to impl a JS/TS lexer that returns an array of (token_type, offset+size) tuples from an input string. It's the nesting of the { } that a JS/TS parser would handle while the lexer just marks these as OPEN_BRACKET/CLOSE_BRACKET types only, w/ no nesting level tracked.

    cheers,
    </wqw>
    Hi wqweto, I've been learning about PEG for a few days, but obviously I still need to spend more time studying. Could you explain the technical difference between your VbPEG and Gold Parser and PEG.js? Thank you.

    In addition, I'd like to know if VbPEG can convert between different languages. If you could demonstrate how to convert a small piece of kscope code into VB code, that would be great.

    Edit:
    When I execute "VbPeg.exe VbPeg.peg -tree" and "VbPeg.exe VbPeg.peg -ir", the result displayed in the console is the content of cParser.cls.
    Last edited by dreammanor; Jun 15th, 2019 at 10:51 PM.

  34. #34
    PowerPoster ChrisE's Avatar
    Join Date
    Jun 2017
    Location
    Frankfurt
    Posts
    3,046

    Re: (String search algorithm) Skip the contents of the quotes to find a substring

    Quote Originally Posted by dreammanor View Post
    Thank you, ChrisE. Is it possible to accomplish some very complex search logic with multiple RegExp patterns?
    Like I said, it 'kinda' works. Take the advice from wqweto:
    regex is the wrong tool for this.
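
    What a regex cannot do here, and a small hand-written scanner can, is count nesting depth while skipping strings and comments. A minimal sketch under stated assumptions: FindMatchingBrace is a hypothetical helper, it treats ', " and ` as string delimiters with backslash escapes, and it knows only // and /* */ comments (no regex literals, no `${...}` interpolation).

    ```vb
    Option Explicit

    ' Hypothetical helper: returns the 1-based position of the "}" that closes the
    ' "{" at OpenPos, or 0 if OpenPos is not a "{" or no matching "}" exists.
    Public Function FindMatchingBrace(ByRef Source As String, ByVal OpenPos As Long) As Long
        Dim pos As Long, n As Long, depth As Long
        Dim ch As String, quoteCh As String

        n = Len(Source)
        If OpenPos < 1 Or OpenPos > n Then Exit Function
        If Mid$(Source, OpenPos, 1) <> "{" Then Exit Function

        pos = OpenPos
        Do While pos <= n
            ch = Mid$(Source, pos, 1)
            Select Case ch
            Case "{"
                depth = depth + 1
            Case "}"
                depth = depth - 1
                If depth = 0 Then
                    FindMatchingBrace = pos
                    Exit Function
                End If
            Case """", "'", "`"
                ' Skip the string literal, honouring backslash escapes
                quoteCh = ch
                pos = pos + 1
                Do While pos <= n
                    ch = Mid$(Source, pos, 1)
                    If ch = "\" Then
                        pos = pos + 1           ' step over the backslash; the escaped char is skipped below
                    ElseIf ch = quoteCh Then
                        Exit Do                 ' pos is on the closing quote
                    End If
                    pos = pos + 1
                Loop
            Case "/"
                Select Case Mid$(Source, pos, 2)
                Case "//"
                    ' Line comment: stop on the last character before the line feed
                    Do While pos < n
                        If Mid$(Source, pos + 1, 1) = vbLf Then Exit Do
                        pos = pos + 1
                    Loop
                Case "/*"
                    pos = InStr(pos + 2, Source, "*/")
                    If pos = 0 Then Exit Function   ' unterminated comment: no match
                    pos = pos + 1                   ' pos is now on the "/" of "*/"
                End Select
            End Select
            pos = pos + 1
        Loop
    End Function
    ```

    For example, Debug.Print FindMatchingBrace("{ a '}' /* } */ { b } }", 1) prints 23: the "}" inside the string and the one inside the comment are skipped, and the inner { b } pair is counted and closed before the outer one. Deciding which "{" actually starts a function body (as opposed to the `=> { return someObject }` in a parameter list) still needs a parser.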

  35. #35
    Addicted Member gilman's Avatar
    Join Date
    Jan 2017
    Location
    Bilbao
    Posts
    176

    Re: [RESOLVED] (String search algorithm) Skip the contents of the quotes to find a su

    Quote Originally Posted by dreammanor View Post
    Correct the code of #22: I missed an important parameter: CheckPreviousContent

    Code:
    Public Function MyInstr(ByVal Start As Long, _
                            ByVal S1 As String, _
                            ByVal S2 As String, _
                            Optional ByVal Cmp As VBA.VbCompareMethod = vbBinaryCompare, _
                            Optional ByVal SearchQuotedContent As Boolean = False, _
                            Optional ByVal CheckPreviousContent As Boolean = True) As Long
       Dim ii As Long
       Dim l1 As Long
       Dim l2 As Long
       Dim l_Char As Integer
       Dim l_InQuote As Integer
       Dim l_QuoteEnd As Long
       Dim l_FirstChar As Integer
       Dim l_S3 As String
       Dim l_Pos As Long
       
       l1 = Len(S1)
       If l1 = 0 Then Exit Function  ' Can't match empty string
       
       l2 = Len(S2)
       If l2 = 0 Then Exit Function  ' Can't match empty string
       
       If l1 < l2 Then Exit Function ' Can't find a longer string in a smaller string
       
       If Start > l1 - l2 + 1 Then Exit Function ' Can't find if start is after end of string1 less the length of string2
        
       '--- DreamManor Added on 2019-06-08 -------------------------------------------
       l_FirstChar = AscW(Left$(S2, 1))
       If Cmp <> vbBinaryCompare Then
          l_S3 = UCase(S2)
       End If
       
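       ' The recursive calls below pass CheckPreviousContent:=False so that they scan
       ' from Start directly; without this test they would recurse endlessly
       If Start > 1 And Not SearchQuotedContent And CheckPreviousContent Then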
          l_Pos = MyInstr(1, S1, S2, Cmp, SearchQuotedContent)
          If l_Pos = 0 Then Exit Function
          Do While l_Pos < Start
             l_Pos = MyInstr(l_Pos + 1, S1, S2, Cmp, SearchQuotedContent, CheckPreviousContent:= False)
             If l_Pos = 0 Then Exit Function
          Loop
          MyInstr = l_Pos
          Exit Function
       End If
       '---------------------------------------------------------------------------------
       
       l_QuoteEnd = Start   ' Assume everything before Start is in quotes so we don't check it
       
       If Not SearchQuotedContent Then
          For ii = Start To l1
             l_Char = AscW(Mid$(S1, ii, 1))
          
             Select Case l_Char
             Case 34, 39, 96   ' ", ', `
                ' Found a quote character
    
                If l_InQuote Then
                   ' We are already within a quoted block of text
                   If l_InQuote = l_Char Then
                      ' and in a matching quote character
                      ' So close off the quoted content run and remember the starting position of the unquoted run to come
                      l_InQuote = 0
                      l_QuoteEnd = ii + 1
                   End If
                   
                Else
                   ' Entering quote - check previous non-quoted chunk to see if we have a match
                   l_InQuote = l_Char
                   
                   If ii - l_QuoteEnd >= l2 Then
                      ' The previous unquoted run is long enough for a possible match
                      MyInstr = InStr(1, Mid$(S1, l_QuoteEnd, ii - l_QuoteEnd), S2, Cmp)
                      If MyInstr > 0 Then
                         ' We found a match so short-circuit
                         Exit For
                      End If
                   End If
                End If
             
             Case l_FirstChar
                 '--- DreamManor Added on 2019-06-08 -----------------
                 If l_InQuote = 0 Then
                     If Cmp = vbBinaryCompare Then
                        If Mid$(S1, ii, l2) = S2 Then
                           l_QuoteEnd = 0:   MyInstr = ii:   Exit For
                        End If
                    Else
                       If UCase(Mid$(S1, ii, l2)) = l_S3 Then
                           l_QuoteEnd = 0:   MyInstr = ii:   Exit For
                        End If
                    End If
                 End If
                 '-------------------------------------------------------
                
             End Select
          Next ii
       End If
       
       If MyInstr = 0 Then
          ' No match so far
          If l_InQuote = 0 Then
             ' We're not currently in a quoted run at the end of the string, so check the remaining characters
             If l1 - l_QuoteEnd + 1 >= l2 Then
                ' There are enough remaining characters for a possible match
                MyInstr = InStr(1, Mid$(S1, l_QuoteEnd, l1 - l_QuoteEnd + 1), S2, Cmp)
             End If
          End If
       End If
       
       If MyInstr > 0 Then
          If l_QuoteEnd > 0 Then
             ' Add position of closing quote to the matches starting character position
             MyInstr = MyInstr + l_QuoteEnd - 1
          End If
       End If
    End Function
    This code has a problem, and I don't know if it has a solution. You can try:
    Code:
        Debug.Print MyInstr(1, "Mark O'Brian, Tim O'Sullivan", "Tim")
    Returns 0 but I think the correct answer is 15

  36. #36

    Thread Starter
    PowerPoster
    Join Date
    Sep 2012
    Posts
    2,083

    Re: [RESOLVED] (String search algorithm) Skip the contents of the quotes to find a su

    Quote Originally Posted by gilman View Post
    This code has a problem, and I don't know if it has a solution. You can try:
    Code:
        Debug.Print MyInstr(1, "Mark O'Brian, Tim O'Sullivan", "Tim")
    Returns 0 but I think the correct answer is 15
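    Hi Gilman, the correct return value should be 0: the apostrophes in O'Brian and O'Sullivan are treated as an opening and a closing single quote, so "Tim" falls inside the quoted run.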

    In addition, you can add handling of the escape character ("\") to MyInstr. In that case, if you want "Tim" (position 15 in the string above) to be found, you can put an escape character ("\") to the left of each single quote.
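
    A minimal sketch of that escape handling, written as a separate function rather than a patch to MyInstr. InStrOutsideQuotes is a hypothetical name, and the sketch assumes that a backslash escapes exactly one following character, both inside and outside quotes, so a search string that itself starts with a backslash is never found.

    ```vb
    Option Explicit

    ' Hypothetical variant of the quote-skipping search with backslash escapes.
    ' Returns the 1-based position of S2 in S1 outside of '...', "..." and `...`
    ' runs, or 0 if there is no such match.
    Public Function InStrOutsideQuotes(ByRef S1 As String, ByRef S2 As String, _
            Optional ByVal Cmp As VbCompareMethod = vbBinaryCompare) As Long
        Dim pos As Long, l1 As Long, l2 As Long
        Dim ch As String, quoteCh As String

        l1 = Len(S1): l2 = Len(S2)
        If l2 = 0 Or l1 < l2 Then Exit Function

        pos = 1
        Do While pos <= l1
            ch = Mid$(S1, pos, 1)
            If ch = "\" Then
                pos = pos + 1                   ' the next character is escaped: skip it unexamined
            ElseIf Len(quoteCh) > 0 Then
                If ch = quoteCh Then quoteCh = vbNullString   ' closing quote
            ElseIf ch = """" Or ch = "'" Or ch = "`" Then
                quoteCh = ch                    ' opening quote
            ElseIf pos <= l1 - l2 + 1 Then
                ' Outside quotes: compare the candidate substring with S2
                If StrComp(Mid$(S1, pos, l2), S2, Cmp) = 0 Then
                    InStrOutsideQuotes = pos
                    Exit Function
                End If
            End If
            pos = pos + 1
        Loop
    End Function
    ```

    With this, InStrOutsideQuotes("Mark O'Brian, Tim O'Sullivan", "Tim") returns 0, while InStrOutsideQuotes("Mark O\'Brian, Tim O\'Sullivan", "Tim") returns 16. The result is 16 rather than 15 because the added backslash shifts everything after it by one character.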
    Last edited by dreammanor; Jun 16th, 2019 at 07:08 AM.
