|
-
Apr 18th, 2007, 05:46 AM
#1
Faster InStrRev
As many of us know the InStrRev in VB6 is on the slower side. The bigger strings you give it the slower it gets, leaving room for a lot of improvement.
I made this replacement function that gives some improvement against the original function, although I didn't optimize the maximum out of it to keep it simple for anyone to use.
Code:
Option Explicit
Private Declare Sub RtlMoveMemory Lib "ntdll.dll" (ByRef lpvDest As Any, ByRef lpvSrc As Any, ByVal cbLen As Long)
'Private Declare Function VarPtrArray Lib "msvbvm50.dll" Alias "VarPtr" (Var() As Any) As Long
Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" (Var() As Any) As Long
Public Function InStrRev(ByRef String1 As String, ByRef String2 As String, Optional ByVal Start As Long = -1, Optional ByVal CompareMethod As VbCompareMethod = vbBinaryCompare) As Long
Static BufStrHeader(5) As Long, BufFindHeader(5) As Long, BufFind2Header(5) As Long
Dim BufStr() As Integer, BufFind() As Integer, BufFind2() As Integer, KeyLC As String, KeyUC As String
Dim intComp1 As Integer, intComp2 As Integer, intComp3 As Integer, intComp4 As Integer
Dim intComp1b As Integer, intComp3b As Integer
Dim lngA As Long, lngB As Long, lngStartEnd As Long, lngKeyEnd As Long
lngKeyEnd = Len(String2)
If LenB(String1) = 0 Or lngKeyEnd = 0 Then Exit Function
lngStartEnd = Len(String1) - lngKeyEnd
If Start < 1 Then
Start = lngStartEnd
ElseIf Start > lngStartEnd Then
Start = lngStartEnd
Else
Start = Start - 1
End If
BufStrHeader(0) = 1
BufStrHeader(1) = 2
BufStrHeader(3) = StrPtr(String1)
BufStrHeader(4) = &H7FFFFFFF
RtlMoveMemory ByVal VarPtrArray(BufStr), VarPtr(BufStrHeader(0)), 4
If CompareMethod = vbBinaryCompare Then
BufFindHeader(0) = 1
BufFindHeader(1) = 2
BufFindHeader(3) = StrPtr(String2)
BufFindHeader(4) = &H7FFFFFFF
RtlMoveMemory ByVal VarPtrArray(BufFind), VarPtr(BufFindHeader(0)), 4
If lngKeyEnd = 1 Then
intComp1 = BufFind(0)
For lngA = Start To 0 Step -1
intComp2 = BufStr(lngA)
If intComp1 = intComp2 Then Exit For
Next lngA
If lngA > -1 Then InStrRev = lngA + 1
Else
lngKeyEnd = lngKeyEnd - 1
intComp1 = BufFind(lngKeyEnd)
For lngA = Start + lngKeyEnd To lngKeyEnd Step -1
intComp2 = BufStr(lngA)
If intComp1 = intComp2 Then
For lngB = 0 To lngKeyEnd - 1
intComp3 = BufFind(lngB)
intComp4 = BufStr(lngA - lngKeyEnd + lngB)
If intComp3 <> intComp4 Then Exit For
Next lngB
If lngB = lngKeyEnd Then InStrRev = lngA - lngKeyEnd + 1: Exit For
End If
Next lngA
End If
RtlMoveMemory ByVal VarPtrArray(BufStr), 0&, 4
RtlMoveMemory ByVal VarPtrArray(BufFind), 0&, 4
Else
KeyUC = UCase$(String2)
KeyLC = LCase$(String2)
BufFindHeader(0) = 1
BufFindHeader(1) = 2
BufFindHeader(3) = StrPtr(KeyUC)
BufFindHeader(4) = &H7FFFFFFF
RtlMoveMemory ByVal VarPtrArray(BufFind), VarPtr(BufFindHeader(0)), 4
BufFind2Header(0) = 1
BufFind2Header(1) = 2
BufFind2Header(3) = StrPtr(KeyLC)
BufFind2Header(4) = &H7FFFFFFF
RtlMoveMemory ByVal VarPtrArray(BufFind2), VarPtr(BufFind2Header(0)), 4
If lngKeyEnd = 1 Then
intComp1 = BufFind(0)
intComp1b = BufFind2(0)
For lngA = Start To 0 Step -1
intComp2 = BufStr(lngA)
If intComp1 = intComp2 Or intComp1b = intComp2 Then Exit For
Next lngA
If lngA > -1 Then InStrRev = lngA + 1
Else
lngKeyEnd = lngKeyEnd - 1
intComp1 = BufFind(lngKeyEnd)
intComp1b = BufFind2(lngKeyEnd)
For lngA = Start + lngKeyEnd To lngKeyEnd Step -1
intComp2 = BufStr(lngA)
If intComp1 = intComp2 Or intComp1b = intComp2 Then
For lngB = 0 To lngKeyEnd - 1
intComp3 = BufFind(lngB)
intComp3b = BufFind2(lngB)
intComp4 = BufStr(lngA - lngKeyEnd + lngB)
If intComp3 <> intComp4 And intComp3b <> intComp4 Then Exit For
Next lngB
If lngB = lngKeyEnd Then InStrRev = lngA - lngKeyEnd + 1: Exit For
End If
Next lngA
End If
RtlMoveMemory ByVal VarPtrArray(BufStr), 0&, 4
RtlMoveMemory ByVal VarPtrArray(BufFind), 0&, 4
RtlMoveMemory ByVal VarPtrArray(BufFind2), 0&, 4
End If
End Function
The major optimizations I can think of:- Use Boyer-Moore or other text searching algorithm for keywords that are longer than 5 characters (binary compare) or 3 characters (text compare).
- Pass string pointers and string lengths instead of passing the string datatype (in which case I'd suggest renaming the function to InPtrRevW for UTF-16 version and InPtrRevA for ANSI version).
- Only call RtlMoveMemory when required and when program ends (instead of always in the beginning of each function call). This could mean wrapping the function into a Class Module to avoid extra function calls to initialize and clean up; although classes are far slower.
- It is possible to declare API functions in a faster way in VB6 by making a custom function and faking it to point to the API function. Thus we could get faster access to RtlMoveMemory and VarPtrArray.
You don't need to use this code as a base to your own work, although you might find some tricks in it that are useful (especially the safearray structure).
Note!
It is important when working with speed orientated things that you benchmark only compiled code. Another big thing that affects speed are the advanced optimizations in the compile tab when you're setting the options of your executable. Ticking everything on can have a major effects on the compiled speed of your program if you've used a lot of math and arrays. However, this also makes your program more unstable if you have any errors in your code as there will be no array boundary checks for example.
-
Apr 19th, 2007, 03:35 AM
#2
Re: Faster InStrRev
I tested your code and found it slightly better looking thru a 5mb (the Permutation challenge) file only by about 41ms. Even with the advanced optimizations compiled into the code.
Could that be better?
-
Apr 19th, 2007, 05:29 AM
#3
Re: Faster InStrRev
The function could do some more optimization for short keywords by caching more of the keyword into variables for fast access (like three or four characters depending on odd or even length of string). However, a lot of the slowdown is caused by constant API calls which are in this implementation forced to be in there (as easy usage won against speed in this case; and I wanted to leave space for speed improvements anyway).
-
Apr 21st, 2007, 06:31 AM
#4
Re: Faster InStrRev
Without using any API, my initial thought was to combine InStr() with a binary search. Sadly, this solution ends up running extremely slowly, even if the bulky up-front validation is remarked out.
Code:
Public Function InStrRev_BinSearch(ByVal pstrText As String, ByVal pstrFind As String, Optional plngStart As Long = -1, Optional penCompare As VbCompareMethod = vbBinaryCompare) As Long
Dim lngLeft As Long
Dim lngMid As Long
Dim lngRight As Long
Dim lngPos As Long
Dim lngMatch As Long
' =====> Begin Prep
If Not penCompare = vbBinaryCompare Then
pstrText = LCase$(pstrText)
pstrFind = LCase$(pstrFind)
End If
' Validate pstrText
Select Case Len(pstrText)
Case 0: Exit Function
Case Is < Len(pstrFind): Exit Function
End Select
' Validate pstrFind
If Len(pstrFind) = 0 Then
Select Case plngStart
Case -1: InStrRev_BinSearch = Len(pstrText)
Case Is < 1: Err.Raise 5
Case Is > Len(pstrText)
Case Else: InStrRev_BinSearch = plngStart
End Select
Exit Function
End If
' Identify boundaries
lngLeft = 1
If plngStart = -1 Then
lngRight = Len(pstrText) - Len(pstrFind) + 1
Else
lngRight = plngStart - Len(pstrFind) + 1
End If
' <===== End Prep
Do While lngLeft <= lngRight
lngMid = (lngLeft + lngRight) \ 2
lngPos = InStr(lngMid, pstrText, pstrFind)
Select Case lngPos
Case 0
lngRight = lngMid - 1
Case Is > lngRight
lngRight = lngMid - 1
Case lngRight
lngMatch = lngPos
Exit Do
Case Else
lngMatch = lngPos
lngLeft = lngPos + 1
End Select
Loop
InStrRev_BinSearch = lngMatch
End Function
Then I tried a braindead approach of looping through the string until InStr() returned 0, and that ended up being twice as fast as InStrRev(). Go figure.
Code:
Public Function InStrReverse(ByVal pstrText As String, ByVal pstrFind As String, Optional plngStart As Long = -1, Optional penCompare As VbCompareMethod = vbBinaryCompare) As Long
Dim lngLeft As Long
Dim lngRight As Long
Dim lngPos As Long
Dim lngMatch As Long
' =====> Begin Prep
If Not penCompare = vbBinaryCompare Then
pstrText = LCase$(pstrText)
pstrFind = LCase$(pstrFind)
End If
' Validate pstrText
Select Case Len(pstrText)
Case 0: Exit Function
Case Is < Len(pstrFind): Exit Function
End Select
' Validate pstrFind
If Len(pstrFind) = 0 Then
Select Case plngStart
Case -1: InStrReverse = Len(pstrText)
Case Is < 1: Err.Raise 5
Case Is > Len(pstrText)
Case Else: InStrReverse = plngStart
End Select
Exit Function
End If
' Identify boundaries
lngLeft = 1
If plngStart = -1 Then
lngRight = Len(pstrText) - Len(pstrFind) + 1
Else
lngRight = plngStart - Len(pstrFind) + 1
End If
' <===== End Prep
For lngPos = lngLeft To lngRight
lngPos = InStr(lngPos, pstrText, pstrFind)
Select Case lngPos
Case 0: Exit For
Case Is > lngRight: Exit For
Case Else: lngMatch = lngPos
End Select
Next
InStrReverse = lngMatch
End Function
Both mirror InStrRev() functionality exactly. (ETA: Except, of course, for textual comparison involving foreign characters.)
Last edited by Ellis Dee; Apr 23rd, 2007 at 07:44 AM.
-
Apr 21st, 2007, 12:31 PM
#5
Re: Faster InStrRev
I am not surprised. Instr by default does a binary search...
-
Apr 21st, 2007, 09:53 PM
#6
Re: Faster InStrRev
Ellis: you're applying LCase$ when doing a binary comparison and not when you're doing a text comparison when you should do it the other way around. In the other hand, why you're using somewhat slow LCase$ when you could just pass CompareMethod directly to InStr?
Simplest test strings:
Search = "aA"
Keyword = "a"
Binary comparison should return 1 and text comparison 2.
InStr is very fast as it is assembly optimized and it is very fast to access from VB code as well. However, InStr isn't as fast anymore when you do text comparison with it and you can do native VB code that runs faster. InStrRev calls InStr as well, only that it does it once it has inverted the strings (on this I have to note that I base this on what I've read elsewhere, I can't check or verify this behavior myself).
But what we have to notice is that you can't make InStr method fast if the input strings are long: the worst case scenario is that what you're looking for is at the end of the search string. And even worse case scenario is that you should work in text compare mode.
-
Apr 21st, 2007, 11:18 PM
#7
Re: Faster InStrRev
 Originally Posted by Merri
Ellis: you're applying LCase$ when doing a binary comparison and not when you're doing a text comparison when you should do it the other way around. In the other hand, why you're using somewhat slow LCase$ when you could just pass CompareMethod directly to InStr?
No, I'm applying LCase$ when not matching case. Matching the case is a binary comparison; ignoring case is a text comparison.
I'll grant that the LCase$ calls are slower than a single InStr() call even when using vbTextCompare, but the margin isn't huge. Since the function makes multiple InStr() calls, the two LCase$ calls upfront result in faster execution than multiple InStr() calls doing text comparison.
InStr is very fast as it is assembly optimized and it is very fast to access from VB code as well. However, InStr isn't as fast anymore when you do text comparison with it and you can do native VB code that runs faster.
Or just normalize the text case and use the super-fast binary compare method of InStr(). (As in my example.)
But what we have to notice is that you can't make InStr method fast if the input strings are long: the worst case scenario is that what you're looking for is at the end of the search string. And even worse case scenario is that you should work in text compare mode.
That was my original thought to the binary search method, but it didn't seem to be much help outside of extreme cases. (For example, searching for the last "x" in a string of 500 "x"s. But how useful is that, really?)
Last edited by Ellis Dee; Apr 22nd, 2007 at 12:18 AM.
-
Apr 21st, 2007, 11:27 PM
#8
Re: Faster InStrRev
 Originally Posted by Ellis Dee
Or just normalize the text case and use the super-fast binary compare method of InStr().
Specifically, my benchmarks show the following results when ignoring case. (vbTextCompare)
InStr(): 0.91101 seconds (included simply for comparison)
InStrRev(): 1.53296 seconds
InStrReverse(): 0.70105 seconds
So using my benchmark program, a single native InStr() call using vbTextCompare is noticeably slower (.9 seconds) than my InStrReverse() implementation (.7 seconds), which does two LCase$ calls upfront followed by a loop of InStr() calls using vbBinaryCompare.
Just for completeness, the numbers I get when doing a binary comparison:
InStr(): 0.33008 seconds
InStrRev(): 0.89111 seconds
InStrReverse(): 0.45117 seconds
Last edited by Ellis Dee; Apr 22nd, 2007 at 12:25 AM.
-
Apr 22nd, 2007, 01:56 AM
#9
Re: Faster InStrRev
To avoid confusions, you should keep the function call as identical to the original function as possible.
Benchmarks don't really tell anything unless you give the strings you used. Results can change dramatically depending on conditions. Things that can affect:- The length of the test strings.
- The position where a match can be found.
- Binary compare or text compare.
This is why I didn't give any specific results as I didn't do a complete testing which would have required several different kinds of benchmarks to see what are the strengths and weaknesses.
For example:
Code:
KEYWORD = "abcdefghijklmnopqrstuvwxyzåäö"
TEST = KEYWORD & String$(100000, "X")
Resulted in my function being three times faster than InStrReverse when doing text compare, despite this being the best possible condition for your function and the worst possible for my function. Binary comparison is of course a different story, because there is no LCase$ done and it is impossible to compete with a highly optimized ASM code with VB6 code: both codes loop through every single byte. So for this I got 2.0 seconds vs 4.7 seconds (with 10000 loops).
But if we turn things around in the search string:
Code:
KEYWORD = "abcdefghijklmnopqrstuvwxyzåäö"
TEST = String$(100000, "X") & KEYWORD
We're still working in binary compare, but InStrReverse gives the exact same result, 2 seconds, while my function enjoys it's best case and is already done in 0.01 seconds. When I switch to text compare, your function runs for a whopping 16 seconds while my function is still at 0.01 seconds (original InStrRev ran for 23 seconds in this latest test).
This is why it is important to do multiple benchmarks and test with various sized strings.
Last edited by Merri; Apr 22nd, 2007 at 02:01 AM.
-
Apr 22nd, 2007, 03:58 AM
#10
Re: Faster InStrRev
 Originally Posted by Merri
Benchmarks don't really tell anything unless you give the strings you used. Results can change dramatically depending on conditions.
[...] When I switch to text compare, your function runs for a whopping 16 seconds while my function is still at 0.01 seconds (original InStrRev ran for 23 seconds in this latest test).
This is why it is important to do multiple benchmarks and test with various sized strings.
I make no claims that my function is faster than yours; only that it is faster than InStrRev().
As for the benchmark, mine is much smaller. (100,000+ characters? Good lord.)
I create 25000 text and search string pairs. Each string's length is randomly determined, and each character is randomly generated. Search strings are between 3 and 7 characters long, and text strings are between 100 and 1000 characters long. In order to ensure there are a reasonable number of hits, each character in the text string has a one in three chance to be one of the characters from its corresponding search string. This results in roughly 4000 matches out of the 25000 trials.
I used pblnMatchCase instead of a vbCompare enumeration because vbDatabaseCompare only applies to Access VBA, and is therefore inappropriate to a VB implementation. But on testing I see that it does a text comparison, so I'll update my functions with that now.
-
Apr 22nd, 2007, 06:14 AM
#11
Re: Faster InStrRev
I'm in the process of adding your API version to my benchmark program. I'll post the complete code here in a bit. In the meantime, I notice that neither my nor your function perfectly mimics InStrRev() return values or runtime errors. These special cases are a good reference for anyone wanting to craft a solution to the OP:
InStrRev("ABCDE", "BCD", 0) generates runtime error 5
InStrRev("ABCDE", "BCD", -2) generates runtime error 5
InStrRev("ABCDE", "BCD", 6) returns 0
InStrRev("ABCDE", "BCD", 3) returns 0
InStrRev("ABCDE", "") returns 5
InStrRev("ABCDE", "", 3) returns 3
Also inconsistent is how both of our solutions handle vbTextCompare when foreign characters are allowed. InStrRev is a bit goofy on that; I'm not even sure how to fix my solution for that. (Shouldn't InStr() work the same way?)
I've added some enhancements to the benchmark program. The prep has been optimized to be much faster, and I've added several new constants to make it easier to use. Most should be reasonably self-evident.
Settings
Tests = 25000
Compare = vbBinaryCompare
FindMinLen = 3
FindMaxLen = 7
TextMinLen = 100
TextMaxLen = 1000
AscMin = 32
AscMax = 167
Seed = 4
Results
1663 of 25000 trials find match
InStr: 0.27930 seconds
InStrRev: 0.96289 seconds
InStrReverse: 0.35938 seconds
InStrRev_API: 0.44141 seconds
Settings
Tests = 25000
Compare = vbTextCompare
FindMinLen = 3
FindMaxLen = 7
TextMinLen = 100
TextMaxLen = 1000
AscMin = 32
AscMax = 127
Seed = 4
Results
1996 of 25000 trials find match
InStr: 0.86133 seconds
InStrRev: 1.57227 seconds
InStrReverse: 0.62109 seconds
InStrRev_API: 0.64062 seconds
Notice that when using text comparison, I had to limit the eligible ASCII values to 127 because the foreign characters make both of our solutions fail.
Also, the Seed value represents how much you want to "sweeten" pstrText to increase matches. Every character in pstrText has a 1 in Seed chance to be randomly pulled from the Find string instead of the AscMin to AscMax range. So if you want more matches, lower the Seed; raise it to get fewer matches.
Last edited by Ellis Dee; Apr 22nd, 2007 at 06:17 AM.
-
Apr 22nd, 2007, 06:20 AM
#12
Re: Faster InStrRev
Here's the complete benchmark program. I had to remove the root indentation to get it under the 10000 character maximum.
Code:
Option Explicit
Private Const Tests = 25000
Private Const Compare = vbTextCompare
Private Const FindMinLen = 3
Private Const FindMaxLen = 7
Private Const TextMinLen = 100
Private Const TextMaxLen = 1000
Private Const AscMin = 32
Private Const AscMax = 127
Private Const Seed = 4
Private Type TestType
Text As String
Find As String
End Type
Private typTest(1 To Tests) As TestType
Private Declare Sub RtlMoveMemory Lib "ntdll.dll" (ByRef lpvDest As Any, ByRef lpvSrc As Any, ByVal cbLen As Long)
'Private Declare Function VarPtrArray Lib "msvbvm50.dll" Alias "VarPtr" (Var() As Any) As Long
Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" (Var() As Any) As Long
Sub Main()
Dim lngReturn As Long
Dim strResults As String
Dim lngHits As Long
Dim i As Long
Dim sngStart As Single
Randomize Timer
If Not PrepareTest(lngHits) Then Exit Sub
strResults = lngHits & " of " & Tests & " trials find match" & vbCrLf & vbCrLf
' Run benchmark - InStr
sngStart = Timer
For i = 1 To Tests
With typTest(i)
lngReturn = InStr(1, .Text, .Find, Compare)
End With
Next
strResults = strResults & "InStr: " & Format(Timer - sngStart, "0.00000") & " seconds" & vbCrLf
' Run benchmark - InStrRev
sngStart = Timer
For i = 1 To Tests
With typTest(i)
lngReturn = InStrRev(.Text, .Find, -1, Compare)
End With
Next
strResults = strResults & "InStrRev: " & Format(Timer - sngStart, "0.00000") & " seconds" & vbCrLf
' Run benchmark - InStrReverse
sngStart = Timer
For i = 1 To Tests
With typTest(i)
lngReturn = InStrReverse(.Text, .Find, -1, Compare)
End With
Next
strResults = strResults & "InStrReverse: " & Format(Timer - sngStart, "0.00000") & " seconds" & vbCrLf
' Run benchmark - InStrRev_API
sngStart = Timer
For i = 1 To Tests
With typTest(i)
lngReturn = InStrRev_API(.Text, .Find, -1, Compare)
End With
Next
strResults = strResults & "InStrRev_API: " & Format(Timer - sngStart, "0.00000") & " seconds" & vbCrLf
MsgBox strResults, vbInformation, "Notice"
End Sub
Private Function PrepareTest(plngHits As Long) As Boolean
Dim lngReturn As Long
Dim i As Long
Dim ii As Long
Dim blnSuccess As Boolean
Dim bytFind() As Byte
Dim bytText() As Byte
Dim lngFindLen As Long
Dim lngTextLen As Long
' Create the samples
For i = 1 To Tests
With typTest(i)
' Generate find string
lngFindLen = Int((FindMaxLen - FindMinLen + 1) * Rnd + FindMinLen)
ReDim bytFind(lngFindLen - 1)
For ii = 0 To lngFindLen - 1
bytFind(ii) = Int((AscMax - AscMin + 1) * Rnd + AscMin)
Next
.Find = StrConv(bytFind, vbUnicode)
' Generate the text string
lngTextLen = Int((TextMaxLen - TextMinLen + 1) * Rnd + TextMinLen)
ReDim bytText(lngTextLen - 1)
For ii = 0 To lngTextLen - 1
' Give preference to key characters
If Int(Seed * Rnd) = 0 Then
bytText(ii) = bytFind(Int(lngFindLen * Rnd))
Else
bytText(ii) = Int((AscMax - AscMin + 1) * Rnd + AscMin)
End If
Next
.Text = StrConv(bytText, vbUnicode)
' Count hits
lngReturn = InStrRev(.Text, .Find, -1, Compare)
If lngReturn <> 0 Then plngHits = plngHits + 1
' Verify InStrRev_API
blnSuccess = (InStrRev_API(.Text, .Find, -1, Compare) = lngReturn)
If Not blnSuccess Then
Debug.Print "InStrRev_API failed:" & vbCrLf & vbCrLf & "pstrText = " & .Text & vbCrLf & "pstrFind = " & .Find
MsgBox "InStrRev_API failed:" & vbCrLf & vbCrLf & "pstrText = " & .Text & vbCrLf & "pstrFind = " & .Find
Exit Function
End If
' Verify InStrReverse
blnSuccess = (InStrReverse(.Text, .Find, -1, Compare) = lngReturn)
If Not blnSuccess Then
Debug.Print "InStrReverse failed:" & vbCrLf & vbCrLf & "pstrText = " & .Text & vbCrLf & "pstrFind = " & .Find
MsgBox "InStrReverse failed:" & vbCrLf & vbCrLf & "pstrText = " & .Text & vbCrLf & "pstrFind = " & .Find
Exit Function
End If
End With
Next
Erase bytFind
Erase bytText
PrepareTest = True
End Function
Public Function InStrReverse(ByVal pstrText As String, ByVal pstrFind As String, Optional ByVal plngStart As Long = -1, Optional penCompare As VbCompareMethod = vbBinaryCompare) As Long
Dim lngPos As Long
Dim lngMatch As Long
' =====> Begin Prep
If Not penCompare = vbBinaryCompare Then
pstrText = LCase$(pstrText)
pstrFind = LCase$(pstrFind)
End If
' Validate plngStart and pstrFind
Select Case plngStart
Case -1
If Len(pstrFind) <> 0 Then
plngStart = Len(pstrText) - Len(pstrFind) + 1
Else
InStrReverse = Len(pstrText)
Exit Function
End If
Case Is < 1
Err.Raise 5
Case Is > Len(pstrText)
Exit Function
Case Else
If Len(pstrFind) <> 0 Then
plngStart = plngStart - Len(pstrFind) + 1
Else
InStrReverse = plngStart
Exit Function
End If
End Select
' Validate pstrText
Select Case Len(pstrText)
Case 0: Exit Function
Case Is < Len(pstrFind): Exit Function
End Select
' <===== End Prep
For lngPos = 1 To plngStart
lngPos = InStr(lngPos, pstrText, pstrFind)
Select Case lngPos
Case 0: Exit For
Case Is > plngStart: Exit For
Case Else: lngMatch = lngPos
End Select
Next
InStrReverse = lngMatch
End Function
Public Function InStrRev_API(ByRef String1 As String, ByRef String2 As String, Optional ByVal Start As Long = -1, Optional ByVal CompareMethod As VbCompareMethod = vbBinaryCompare) As Long
Static BufStrHeader(5) As Long, BufFindHeader(5) As Long, BufFind2Header(5) As Long
Dim BufStr() As Integer, BufFind() As Integer, BufFind2() As Integer, KeyLC As String, KeyUC As String
Dim intComp1 As Integer, intComp2 As Integer, intComp3 As Integer, intComp4 As Integer
Dim intComp1b As Integer, intComp3b As Integer
Dim lngA As Long, lngB As Long, lngStartEnd As Long, lngKeyEnd As Long
lngKeyEnd = Len(String2)
If LenB(String1) = 0 Or lngKeyEnd = 0 Then Exit Function
lngStartEnd = Len(String1) - lngKeyEnd
If Start < 1 Then
Start = lngStartEnd
ElseIf Start > lngStartEnd Then
Start = lngStartEnd
Else
Start = Start - 1
End If
BufStrHeader(0) = 1
BufStrHeader(1) = 2
BufStrHeader(3) = StrPtr(String1)
BufStrHeader(4) = &H7FFFFFFF
RtlMoveMemory ByVal VarPtrArray(BufStr), VarPtr(BufStrHeader(0)), 4
If CompareMethod = vbBinaryCompare Then
BufFindHeader(0) = 1
BufFindHeader(1) = 2
BufFindHeader(3) = StrPtr(String2)
BufFindHeader(4) = &H7FFFFFFF
RtlMoveMemory ByVal VarPtrArray(BufFind), VarPtr(BufFindHeader(0)), 4
If lngKeyEnd = 1 Then
intComp1 = BufFind(0)
For lngA = Start To 0 Step -1
intComp2 = BufStr(lngA)
If intComp1 = intComp2 Then Exit For
Next lngA
If lngA > -1 Then InStrRev_API = lngA + 1
Else
lngKeyEnd = lngKeyEnd - 1
intComp1 = BufFind(lngKeyEnd)
For lngA = Start + lngKeyEnd To lngKeyEnd Step -1
intComp2 = BufStr(lngA)
If intComp1 = intComp2 Then
For lngB = 0 To lngKeyEnd - 1
intComp3 = BufFind(lngB)
intComp4 = BufStr(lngA - lngKeyEnd + lngB)
If intComp3 <> intComp4 Then Exit For
Next lngB
If lngB = lngKeyEnd Then InStrRev_API = lngA - lngKeyEnd + 1: Exit For
End If
Next lngA
End If
RtlMoveMemory ByVal VarPtrArray(BufStr), 0&, 4
RtlMoveMemory ByVal VarPtrArray(BufFind), 0&, 4
Else
KeyUC = UCase$(String2)
KeyLC = LCase$(String2)
BufFindHeader(0) = 1
BufFindHeader(1) = 2
BufFindHeader(3) = StrPtr(KeyUC)
BufFindHeader(4) = &H7FFFFFFF
RtlMoveMemory ByVal VarPtrArray(BufFind), VarPtr(BufFindHeader(0)), 4
BufFind2Header(0) = 1
BufFind2Header(1) = 2
BufFind2Header(3) = StrPtr(KeyLC)
BufFind2Header(4) = &H7FFFFFFF
RtlMoveMemory ByVal VarPtrArray(BufFind2), VarPtr(BufFind2Header(0)), 4
If lngKeyEnd = 1 Then
intComp1 = BufFind(0)
intComp1b = BufFind2(0)
For lngA = Start To 0 Step -1
intComp2 = BufStr(lngA)
If intComp1 = intComp2 Or intComp1b = intComp2 Then Exit For
Next lngA
If lngA > -1 Then InStrRev_API = lngA + 1
Else
lngKeyEnd = lngKeyEnd - 1
intComp1 = BufFind(lngKeyEnd)
intComp1b = BufFind2(lngKeyEnd)
For lngA = Start + lngKeyEnd To lngKeyEnd Step -1
intComp2 = BufStr(lngA)
If intComp1 = intComp2 Or intComp1b = intComp2 Then
For lngB = 0 To lngKeyEnd - 1
intComp3 = BufFind(lngB)
intComp3b = BufFind2(lngB)
intComp4 = BufStr(lngA - lngKeyEnd + lngB)
If intComp3 <> intComp4 And intComp3b <> intComp4 Then Exit For
Next lngB
If lngB = lngKeyEnd Then InStrRev_API = lngA - lngKeyEnd + 1: Exit For
End If
Next lngA
End If
RtlMoveMemory ByVal VarPtrArray(BufStr), 0&, 4
RtlMoveMemory ByVal VarPtrArray(BufFind), 0&, 4
RtlMoveMemory ByVal VarPtrArray(BufFind2), 0&, 4
End If
End Function
-
Apr 22nd, 2007, 08:32 AM
#13
Re: Faster InStrRev
I'd suggest to follow InStr behavior and not InStrRev as they differ from each other. In my opinion a generic helper function should never raise an error, because that makes error handling bothersome. Thus returning 0 instead of raising an error would be correct, and this is also how InStr works.
-
Apr 22nd, 2007, 09:04 PM
#14
Re: Faster InStrRev
 Originally Posted by Merri
I'd suggest to follow InStr behavior and not InStrRev as they differ from each other. In my opinion a generic helper function should never raise an error, because that makes error handling bothersome. Thus returning 0 instead of raising an error would be correct, and this is also how InStr works.
It's your challenge.
Just seems to me that to replacing a function with a faster version should mirror the replaced function's behavior, to avoid confusion. In any case, the following scenario should probably be mirrored properly...
InStrRev("ABCDE", "BCD", 3) returns 0
...since "BCD" doesn't actually exist starting from character 3 and working left.
ETA: InStr() does not return 0 for invalid starting positions (below 1); like InStrRev(), it generates runtime error 5.
Last edited by Ellis Dee; Apr 22nd, 2007 at 09:18 PM.
-
Apr 22nd, 2007, 10:53 PM
#15
Re: Faster InStrRev
Shouldn't do things when tired: prone to errors. I guess I happened to have vbNullString as a keyword parameter. InStr's error handling seems to first check for the length of the string, then length of the search keyword and then the starting position.
- String length: if 0, exit immediately, otherwise continue
- Keyword length: if 0, return starting position value, otherwise continue
- String length vs. keyword length: if keyword is longer, exit immediately, otherwise continue
- Starting position: if below 1, raise error 5, otherwise continue
- Do the search
Exit immediately means that the function return value is 0.
InStrRev allows for -1 of course, but -2 and 0 raise error 5 again.
But now I'm off to work.
-
Apr 22nd, 2007, 11:29 PM
#16
Re: Faster InStrRev
 Originally Posted by Merri
Shouldn't do things when tired: prone to errors. I guess I happened to have vbNullString as a keyword parameter. InStr's error handling seems to first check for the length of the string, then length of the search keyword and then the starting position.
- String length: if 0, exit immediately, otherwise continue
- Keyword length: if 0, return starting position value, otherwise continue
- String length vs. keyword length: if keyword is longer, exit immediately, otherwise continue
- Starting position: if below 1, raise error 5, otherwise continue
- Do the search
Exit immediately means that the function return value is 0.
I went through the exact same frustration when trying to mirror the behavior. It is annoying. And inconsistent, as you have pointed out.
Of note is that InStrRev() apparently has a different order to its list of checks:
InStr(0,"","") returns 0
InStrRev("","",0) generates runtime error 5
Stupid VB authors. I'm guessing that the guys who wrote InStrRev() were not only different people than the ones who wrote InStr(), but that they were unpaid interns. heh.
-
Apr 23rd, 2007, 01:45 AM
#17
Re: Faster InStrRev
Here is something that I use that works really fast
Code:
Public Function InStrRev(ByVal StringCheck As String, ByVal StringMatch As String, Optional ByVal Start As Integer = -1, Optional ByVal Compare As VbCompareMethod = vbBinaryCompare) As Integer
Dim sStr As String
Dim i As Long
Dim j As Long
Dim CurLoc As Long
Dim LastLoc As Long
Dim Offset As Long
Dim strItem As String
Dim BackupCount As Long
InStrRev = 0
i = Len(StringCheck)
j = Len(StringMatch)
If i = 0 Or j = 0 Then Exit Function
If i < j Then Exit Function
sStr = StrReverse(StringCheck)
strItem = StrReverse(StringMatch)
BackupCount = BackupCount + (Len(strItem) - 1) ' Adjust for reversed string offset
LastLoc = 1
CurLoc = InStr(LastLoc, sStr, strItem, Compare)
If CurLoc > 0 Then InStrRev = ((Len(sStr) - CurLoc) - BackupCount) + 1
End Function
Last edited by randem; Apr 23rd, 2007 at 01:49 AM.
-
Apr 23rd, 2007, 02:36 AM
#18
Re: Faster InStrRev
If you are looking for a faster Instr or InstrRev and prepared to cut a few corners, theres another way. I will make the principle clear but I won't work out a fully fledged Instr or InstrRev Function
Things to remember:
- WARNING it is NOT UNICODE complient (that's the corner to cut ).
- Due to legacy compatibility Strings can be converted to Byte arrays.
on the fly:
'asuming you want to find 1 char not a string if you want to you can add that functionality
Code:
Public Function NoUnicodeInstr(Start as Long, StringToSearch as String, StringToFind) as Long
Dim Bt() as Byte
Dim CharVal as Long
Dim Counter as Long
Dim Found as Boolean
Bt() = StringToSearch
CharVal = val(left$(StringToFind))
For Counter= 0 To UBound(Bt) Step 2
If Bt(Counter) = CharVal Then
Found = True
Exit For
end if
Next
IIf (Found,NoUnicodeInstr = Counter/2, NoUnicodeInstr = -1)
End Function
 why can't programmers keep and 31 Oct and 25 dec apart. Why Rating is Useful
for every question you ask provide an answer on another thread.
-
Apr 23rd, 2007, 07:20 AM
#19
Re: Faster InStrRev
randem: check your function's values with the test strings I provided. It gives odd results.
KEYWORD = "abcdefghijklmnopqrstuvwxyzåäö"
TEST = String$(100000, "X") & KEYWORD
Returns: -31071 (should be 100001)
KEYWORD = "abcdefghijklmnopqrstuvwxyzåäö"
TEST = KEYWORD & String$(100000, "X")
Returns: 1 (correct)
You can also optimize your function by about 25% (with the big string I used) if you switch ByVal to ByRef in strings. ByVal causes strings to be copied, which isn't so nice.
-
Apr 23rd, 2007, 07:35 AM
#20
Re: Faster InStrRev
 Originally Posted by randem
Here is something that I use that works really fast
Stipulating that the return behavior isn't perfectly mirrored, it's still a nice peppy solution, and as an added bonus it's nice and simple. Using the same two parameter sets as outline upthread, the benchmark results are:
Binary Search
InStr: 0.28125
InStrRev: 0.97070
Ellis Dee: 0.36133
Merri: 0.45117
Randem: 0.50977
Text Search
InStr: 0.87109
InStrRev: 1.55078
Ellis Dee: 0.62109
Merri: 0.65234
Randem: 1.07227
Note that like Merri's API solution, yours also incorrectly returns 2 (should be 0) when calling InStrRev("ABCDE","BCD",3)
I find your solution's times to be particularly interesting. If InStrRev is indeed slower because it reverses the strings, then your solution shouldn't be faster at all. Yet it is significantly faster. Just how bad is the native InStrRev implementation? I'm wondering if it has to do with the foreign characters. Do you think maybe it's trying to replace them all with the English equivalents before searching?
After some quick testing, that doesn't seem to be the case:
?InStrRev("Straße","ss",-1,vbTextCompare)
5
?InStr(1,"Straße","ss",vbTextCompare)
5
Both functions handle them "properly", so it's not like InStrRev is wasting time doing something InStr skips. Incidentally, these return results are virtually unreproducible by us. Well, they could be reproduced by our functions, but there is no way they'd end up faster than the native calls. That would be a good reason to ditch the vbCompare parameter altogether and instead opt for a boolean CaseSensitive flag.
ETA: Those of us using native InStr() calls in our solutions (like me and Randem) can mirror the foreign character behavior by using vbTextCompare instead of LCase$. Thus, Randem's solution should theoretically work as-is, and mine could with minor changes. IMO, though, the performance hit isn't worth it.
Last edited by Ellis Dee; Apr 23rd, 2007 at 07:49 AM.
-
Apr 23rd, 2007, 07:55 AM
#21
Re: Faster InStrRev
 Originally Posted by Ellis Dee
Note that like Merri's API solution, yours also incorrectly returns 2 (should be 0) when calling InStrRev("ABCDE","BCD",3)
Lets see:- Starting position 3.
- InStrRev works to the left.
- Position 3: "CDE", not found.
- Move to the left.
- Position 2: "BCD", found.
- Return 2.
So I see nothing wrong with the behavior.
-
Apr 23rd, 2007, 08:14 AM
#22
Re: Faster InStrRev
 Originally Posted by Merri
Lets see: - Starting position 3.
- InStrRev works to the left.
- Position 3: "CDE", not found.
- Move to the left.
- Position 2: "BCD", found.
- Return 2.
So I see nothing wrong with the behavior.
That's not how the native InStrRev works.
Try it in the debug window. InStrRev("ABCDE","BCD",3) returns 0. That's because the starting position is considered the end of the string. When you start at position 3, positions 4 and 5 do not exist.
Think of it in the (non)reverse. InStr(3,"ABCDE","BCD") doesn't return 2 either. If we assume that the conceptual operation of InStrRev is an InStr call on the reversed strings, then it becomes clearer:
InStr(3,"EDCBA","DCB") should return...?
Last edited by Ellis Dee; Apr 23rd, 2007 at 08:17 AM.
-
Apr 23rd, 2007, 09:15 AM
#23
Re: Faster InStrRev
We shouldn't replicate bugs of InStrRev. This is a bug. And I'll explain why.
We have a keyword that is three characters long: BCD
We check that we have a valid starting point (highlighting the last three characters from the starting point): ABCDE
We can work from here, there is no error in giving starting position 3. This is identical to giving a starting position of 1 in InStr. At this point native InStrRev has a validation error: everything is correct, we could work with this string with the given keyword!
Now for an example that shouldn't work. What if we have a keyword that is four characters long?
We have a keyword that is four characters long: BCDE
We check that we have a valid starting point (highlighting the last four characters from the starting point): ABCDE!
We go over the string, this is an error condition!
I guess this latter condition might be incorrectly implemented in my solution; I can't remember for sure and I don't have the time to check that right now as I'm on train and soon at my destination.
Here is a simple sample application on why this is a bug. What I'd expect is to get three numbers: 3, 2 and 1.
Code:
Option Explicit
Private Sub Form_Load()
Dim lngPos As Long
lngPos = InStrRev("AAAAA", "AAA")
MsgBox lngPos
lngPos = InStrRev("AAAAA", "AAA", lngPos - 1)
MsgBox lngPos
lngPos = InStrRev("AAAAA", "AAA", lngPos - 1)
MsgBox lngPos
End Sub
Here is the InStr counterpart that works as expected:
Code:
Option Explicit
Private Sub Form_Load()
Dim lngPos As Long
lngPos = InStr("AAAAA", "AAA")
MsgBox lngPos
lngPos = InStr(lngPos + 1, "AAAAA", "AAA")
MsgBox lngPos
lngPos = InStr(lngPos + 1, "AAAAA", "AAA")
MsgBox lngPos
End Sub
See it now?
Last edited by Merri; Apr 23rd, 2007 at 09:20 AM.
-
Apr 23rd, 2007, 11:14 AM
#24
Re: Faster InStrRev
 Originally Posted by Merri
We shouldn't replicate bugs of InStrRev. This is a bug.
[...]
See it now?
I understand what you're saying, but I think your example is flawed. When searching for "AAA" within "AAAAA":
InStrRev returns the last occurrence, which includes the length of the search string. So it returns 3. The next call, starting at 2, finds nothing.
InStr returns the first occurence, which is 1. This is where your example breaks down, because you didn't factor in the length of the search string, which is inherent in the InStrRev concept. So the next place to look would not be 1 plus 1, but rather 1 + 3. Sending InStr a starting position of 4 would return zero, which is the expected complement to InStrRev's behavior.
I could easily envision that changing the behavior in the way you suggest could result in unexpected problems for a programmer trying to implement the change. For example, say they are looking for the last vbCrLf in the first 80 characters. If there were one at position 65 and one at position 80, instead of returning 66 characters you would be returning 81, which could mess up their formatting if they had an 80 character limit.
Being a very old, well-worn language, if the native behavior were a bug I'd expect to be able to dig up a cite somewhere on the internet documenting the bug. I can find no such cite.
In the process of looking for a cite, I did find an avalanche of InStrRev replacement functions. Some used your method, some used Randem's, some used mine, and a couple even used my binary search method. All handled this particular issue the way the native function does; not the way you are proposing.
Here is a nicely put together InStrRev coding contest. It specifically addresses this particular issue:
Start Optional. Numeric expression that sets the starting position for each search (ie, looks from here left-bound; note that for a match the last char of sMatch must be at pos <= Start).
If omitted, -1 is used, which means that the search begins at the last character position.
They seem to feel the behavior is by design, as do I.
If you would rather have the behavior you describe, that's fine. But that's a different contest than the one you describe in the OP. It would not be a replacement for InStrRev, but rather just a function with similar functionality. If that's the case, I would recommend we ditch the vbCompareMethod parameter and go with a boolean CaseSensitive one, since your approach may or may not even be capable of mirroring a true textual comparison.
If you can find a cite documenting the behavior as a bug, I will concede.
-
Apr 23rd, 2007, 12:55 PM
#25
Re: Faster InStrRev
Here is a correction to the function
Code:
Option Explicit
Public Function InStrRev(ByRef StringCheck As String, ByRef StringMatch As String, Optional ByVal Start As Integer = -1, Optional ByVal Compare As VbCompareMethod = vbBinaryCompare) As Long
Dim sStr As String
Dim i As Long
Dim j As Long
Dim CurLoc As Long
Dim StartLoc As Long
Dim Offset As Long
Dim strItem As String
Dim BackupCount As Long
InStrRev = 0
i = Len(StringCheck)
j = Len(StringMatch)
If i = 0 Or j = 0 Then Exit Function
If i < j Then Exit Function
sStr = StrReverse(StringCheck)
strItem = StrReverse(StringMatch)
BackupCount = BackupCount + (Len(strItem) - 1) ' Adjust for reversed string offset
If Start = -1 Then
StartLoc = 1
Else
StartLoc = Start
End If
CurLoc = InStr(StartLoc, sStr, strItem, Compare)
If CurLoc > 0 Then InStrRev = ((Len(sStr) - CurLoc) - BackupCount) + StartLoc
End Function
-
Apr 23rd, 2007, 12:55 PM
#26
Re: Faster InStrRev
First of all I have to remind that this isn't a contest: this is just a regular "Code It Better" thread. That does leave us space to question how the original functions work.
We're getting down to the basics of programming languages here, eventually. There is one major problem that is causing us a lot of trouble: that is the fact we're not working with zero base, instead we're working on one base.
Code:
Option Explicit
Private Sub Form_Load()
Dim lngPos As Long
lngPos = InStrRev("AAAAAA", "AAA", 3)
MsgBox lngPos
lngPos = InStrRev("AAAAAA", "AAA", lngPos + Len("AAA") - 2)
MsgBox lngPos
lngPos = InStrRev("AAAAAA", "AAA", lngPos + Len("AAA") - 2)
MsgBox lngPos
End Sub
This just doesn't feel or look right: it certainly isn't obvious that we're looping from end to beginning. I wouldn't like to code a code that is this complex when doing something this simple. With zero base we could even get rid of - 2 and have much more logical - 1 instead.
However, I still feel that the current behavior is wrong. I think the function should in this case also return positions in this "reversed" way, based on end of the keyword, instead of returning the first position like InStr. So I do think this hasn't been really thought of carefully. If it returned values like it handles the Start parameter, it would immediately be obvious and logical how the Start parameter works.
So all this has lead me into a new idea: making it possible to use Integer arrays instead of Strings. This would mean creating a lot of custom functions for making use of Integer arrays as easy as Strings, including conversion to and from String (as fast as possible, of course). And this would of course leave a lot of space for defining how functions should work.
-
Apr 25th, 2007, 12:16 AM
#27
Re: Faster InStrRev
Here is the updated code.
Code:
Option Explicit
Public Function InStrRev(ByRef StringCheck As String, ByRef StringMatch As String, Optional ByVal Start As Integer = -1, Optional ByVal Compare As VbCompareMethod = vbBinaryCompare) As Long
Dim sStr As String
Dim i As Long
Dim j As Long
Dim CurLoc As Long
Dim StartLoc As Long
Dim Offset As Long
Dim strItem As String
Dim BackupCount As Long
InStrRev = 0
i = Len(StringCheck)
j = Len(StringMatch)
If i = 0 Or j = 0 Then Exit Function
If i < j Then Exit Function
sStr = StrReverse(StringCheck)
strItem = StrReverse(StringMatch)
BackupCount = Len(strItem) - 1 ' Adjust for reversed string offset
If Start = -1 Then
StartLoc = 1
Else
StartLoc = i - Start + 1
End If
CurLoc = InStr(StartLoc, sStr, strItem, Compare)
If CurLoc > 0 Then InStrRev = (i - CurLoc - BackupCount) + 1
End Function
-
Apr 25th, 2007, 12:44 AM
#28
Re: Faster InStrRev
Regarding my benchmark program, it occurs to me that it's only measuring the average case. I want to add in logic to force a (roughly) equal number of best and worst case scenarios by stuffing the find string into the start or end of some of the non-matches.
A good generalized solution wouldn't be impacted by this, but if a solution were weak in the extreme case it would be likelier to show up. I don't really have the motivation to do this just now, but I'll try to get to it in the next couple days.
One other thought is that it might be good to include a start parameter in the benchmarks, though I'm not sure the best way to do that. Any ideas would be appreciated.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|