-
[RESOLVED] Fastest way to remove strings between two substrings?
I am looking for a fastest possible way to remove a set of strings (they are not the same) placed between two characters or sub-strings.
For example:
If I have a sentence like this:
I am looking for a <-- fastest possible --> way to remove <--a set of--> strings <--(they are not the same)--> placed between two <--characters or--> sub-strings.
How would I remove the texts between <-- and --> in the fastest possible way.
Thank you.
-
Re: Fastest way to remove strings between two substrings?
Code:
Private Sub Form_Load()
Dim strOriginal As String
Dim strNew As String
strOriginal = "I am looking for a <-- fastest possible --> way to remove <--a set of--> strings <--(they are not the same)--> placed between two <--characters or--> sub-strings."
Do Until InStr(strOriginal, "<--") = 0
strOriginal = Replace$(strOriginal, Mid$(strOriginal, InStr(strOriginal, "<--"), InStr(strOriginal, "-->") - InStr(strOriginal, "<--") + 3), "")
Loop
MsgBox strOriginal
End Sub
But maybe it can be faster using regular expressions.
-
Re: Fastest way to remove strings between two substrings?
You can get a good speed by using Mid$() = Mid$() technique, working within the same string. For best optimization for speed you would need to first go through all occurances, storing all start positions and lengths, then moving the remaining substrings towards the beginning of the string. Then you can simply cut off the rest. The most important thing is to reduce amount of memory consumption by working with the same string, using Mid$ on both sides.
Replace and regular expressions by comparison are a very slow method.
-
Re: Fastest way to remove strings between two substrings?
I will try to apply this onto what I need. RegEx wouldn't do much good because the enclosing sub-strings vary,,,
I will let you know if this does it (of course in a modified form for what I need :) )
-
Re: Fastest way to remove strings between two substrings?
Quote:
The most important thing is to reduce amount of memory consumption by working with the same string, using Mid$ on both sides.
Yes,, this is what I have problems with,, LARGE strings I have here,, lol
-
Re: Fastest way to remove strings between two substrings?
@ baja_yu
I think I will get a significant increase in speed with your sample. I had to modify a little bit but it works pretty good :) Still a few tweaks to do and I am pretty sure this is it.
Thank you.
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
Does this function perform better?
Code:
Public Function CleanBetween(Text As String, Before As String, After As String) As String
Dim lngA As Long, lngCur As Long, lngLength As Long
Dim lngPos() As Long, lngLen() As Long
' calculate maximum possible number of matches
lngA = Len(Text) \ (Len(Before) + Len(After))
' work arrays
ReDim lngPos(lngA)
ReDim lngLen(lngA)
' find first
lngPos(0) = 1
lngLen(0) = InStr(Text, Before) - 1
' got anything to work with?
If lngLen(0) >= 0 Then
' total length found
lngLength = lngLen(0)
' keep on working until no match found or maximum theoretical items found
For lngCur = 1 To lngA
' find next start position for a substring to preserve
lngPos(lngCur) = InStr(lngPos(lngCur - 1) + lngLen(lngCur - 1), Text, After)
If lngPos(lngCur) Then
' fix start position to after the tag
lngPos(lngCur) = lngPos(lngCur) + Len(After)
' then find the next start tag to ignore
lngLen(lngCur) = InStr(lngPos(lngCur), Text, Before)
' did we get it?
If lngLen(lngCur) Then
' get actual length
lngLen(lngCur) = lngLen(lngCur) - lngPos(lngCur)
' increase total length
lngLength = lngLength + lngLen(lngCur)
Else
' add all remaining text as no start tag found
lngLen(lngCur) = Len(Text) - lngPos(lngCur) + 1
' increase total length
lngLength = lngLength + lngLen(lngCur)
' nothing more to process
Exit For
End If
Else
' ignore last one (we have a start/end tag mismatch!)
lngCur = lngCur - 1
' nothing more to process
Exit For
End If
Next lngCur
' create output buffer
CleanBetween = Left$(Text, lngLength)
' copy position
lngA = 1
For lngCur = 0 To lngCur
' see if anything to copy and copy if there is!
If lngLen(lngCur) Then
Mid$(CleanBetween, lngA, lngLen(lngCur)) = Mid$(Text, lngPos(lngCur), lngLen(lngCur))
' next copy position
lngA = lngA + lngLen(lngCur)
End If
Next lngCur
Else
' return everything
CleanBetween = Text
End If
End Function
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
@ Merri
I think it does actually,, I am using a very large sample and the speed is not noticeably different but it seems smoother, so I will have to time it to see exactly,, but the function itself is very usable as you presented it.
Awesome. Thanks for the help. This is a good solution for what I need.
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
Updated with comments & fixed a potential endless loop error :)
The biggest potential performance issue is with are the constantly redimmed Long arrays, it would gain speed if not used as a function but instead would be modified into a inline code that would only ReDim those arrays if they needed to be made bigger.
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
Here is a version with Static work arrays: it will consume more memory and actual processing slows down, but as the arrays are not constantly being redimmed and thus memory consumption doesn't go up'n'down all the time it should be overall faster.
Code:
Public Function CleanBetween(Text As String, Before As String, After As String) As String
Dim lngA As Long, lngCur As Long, lngLength As Long
Static lngPos() As Long, lngLen() As Long
' calculate maximum possible number of matches
lngA = Len(Text) \ (Len(Before) + Len(After))
' work arrays
If Not Not lngPos Then
If lngA > UBound(lngPos) Then
ReDim lngPos(lngA)
ReDim lngLen(lngA)
End If
Else
ReDim lngPos(lngA)
ReDim lngLen(lngA)
End If
' VB6 IDE error fix (with Not Array)
Debug.Assert App.hInstance
' find first
lngPos(0) = 1
lngLen(0) = InStr(Text, Before) - 1
' got anything to work with?
If lngLen(0) >= 0 Then
' total length found
lngLength = lngLen(0)
' keep on working until no match found or maximum theoretical items found
For lngCur = 1 To lngA
' find next start position for a substring to preserve
lngPos(lngCur) = InStr(lngPos(lngCur - 1) + lngLen(lngCur - 1), Text, After)
If lngPos(lngCur) Then
' fix start position to after the tag
lngPos(lngCur) = lngPos(lngCur) + Len(After)
' then find the next start tag to ignore
lngLen(lngCur) = InStr(lngPos(lngCur), Text, Before)
' did we get it?
If lngLen(lngCur) Then
' get actual length
lngLen(lngCur) = lngLen(lngCur) - lngPos(lngCur)
' increase total length
lngLength = lngLength + lngLen(lngCur)
Else
' add all remaining text as no start tag found
lngLen(lngCur) = Len(Text) - lngPos(lngCur) + 1
' increase total length
lngLength = lngLength + lngLen(lngCur)
' nothing more to process
Exit For
End If
Else
' ignore last one (we have a start/end tag mismatch!)
lngCur = lngCur - 1
' nothing more to process
Exit For
End If
Next lngCur
' create output buffer
CleanBetween = Left$(Text, lngLength)
' copy position
lngA = 1
For lngCur = 0 To lngCur
' see if anything to copy and copy if there is!
If lngLen(lngCur) Then
Mid$(CleanBetween, lngA, lngLen(lngCur)) = Mid$(Text, lngPos(lngCur), lngLen(lngCur))
' next copy position
lngA = lngA + lngLen(lngCur)
End If
Next lngCur
Else
' return everything
CleanBetween = Text
End If
End Function
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
I am forced to use it as a function :) My delimiters are variable.. and I use it in multiple locations.. This is really convenient.
and again.. THANKS!
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
I tried for 10 minutes but fail to understand Merri's method.
This is my way. Much simpler and perhaps not bad with speed.
Code:
Public Function RemoveBetween(sText As String, sBefore As String, sAfter As String) As String
Dim arTemp() As String
Dim i As Long
arTemp = Split(sText, sBefore)
For i = 1 To UBound(arTemp)
arTemp(i) = Mid$(arTemp(i), InStr(arTemp(i), sAfter) + Len(sAfter))
Next
RemoveBetween = Join(arTemp, "")
End Function
-
2 Attachment(s)
Re: [RESOLVED] Fastest way to remove strings between two substrings?
anhn: it is slower than baja_yu's Replace method, which is slower than CleanBetween. This is on the easy side to see, because the sole purpose of CleanBetween's logic is to avoid creating new strings (and thus memory allocation/deallocation). What Split does is to create an array of many strings, and then almost each of these strings is modified again creating more new strings (and also removing old ones). This is why it is so bad for performance. Replace does quite well for such a short code, so it would be the #1 choice in most cases.
CleanBetween has two arrays, which contain the start position to copy from and the length to copy. These are the substrings that are to be preserved. Handling these arrays is much faster than multiple strings. The end of the procedure is the important part, where CleanBetween is first filled with the final size using Left$ and then Mid$ is used to construct the final string from the substrings that are wanted.
However, the speed difference only becomes truly noticeable when string size grows to very big. If you had a lot of very short strings then baja_yu's suggestion would be better in code vs. results ratio.
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
Okay, then I noticed one thing that I ignored earlier: baya_yu's Replace method is only very effective if there is a lot of same matches, which is something that happens with a replicating method that I used to build the test string. If I change test string builder code to this:
Code:
Private Sub Form_Load()
Dim strOriginal As String
Dim lngA As Long
For lngA = 1 To 10000
TEST = TEST & ((ChrW$(lngA) & "<-- " & ChrW$(lngA) & " --> ") & String$(lngA \ 100 + 10, ChrW$(lngA)))
Next lngA
Me.Caption = "Test string length: " & Format$(Len(TEST) / 1024 / 1024, "0.00") & " MB"
End Sub
The test string ends up being "only" 0.67 MB, but the Replace method runs over a minute. CleanBetween does the same in 7 ms and RemoveBetween 14 ms. Thus this makes RemoveBetween the better short code alternative. I knew there was something I was missing with Replace but just didn't get into my head in time :) (I put the blame on my gf... and she'll just end up on laughing and tell I'm unfair, which I am.)
-
2 Attachment(s)
Re: [RESOLVED] Fastest way to remove strings between two substrings?
This doesn't use arrays at all and is a much shorter code, and faster!
Code:
Public Function CleanBetweenShort2(ByVal Text As String, Before As String, After As String) As String
Dim lngTo As Long, lngFrom As Long, lngLenBefore As Long, lngLenAfter As Long
Dim lngCur As Long, lngLen As Long
lngLenBefore = Len(Before)
lngLenAfter = Len(After)
lngTo = InStr(Text, Before)
lngCur = lngTo
If lngTo Then
Do
lngFrom = InStr(lngTo + lngLenBefore, Text, After)
If lngFrom Then
lngTo = InStr(lngFrom + lngLenAfter, Text, Before)
If lngTo Then
lngLen = lngTo - lngFrom - lngLenAfter
Else
lngLen = Len(Text) - lngFrom - lngLenAfter + 1
End If
Mid$(Text, lngCur, lngLen) = Mid$(Text, lngFrom + lngLenAfter, lngLen)
lngCur = lngCur + lngLen
Else
lngTo = 0
End If
Loop While lngTo
CleanBetweenShort2 = Left$(Text, lngCur - 1)
End If
End Function
CleanBetween = original try
CleanBetween2 = use only one array
CleanBetween3 = use a Collection instead of array (not a very bright idea as you can see)
CleanBetweenShort = try to simplify the idea
CleanBetweenShort2 = Get It Right
And that "Same!" is just a confirmation that CleanBetweenShort2 gives the same result as RemoveBetween.
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
I know that this has been marked as Resolved but regualr expression is still a great option here.
Code:
Private Sub Command1_Click()
Dim strInput As String
Dim strRet As String
Dim regEx As RegExp
strInput = "I am looking for a <-- fastest possible --> way to remove <--a set of--> strings <--(they are not the same)--> placed between two <--characters or--> sub-strings."
Set regEx = CreateObject("VBScript.RegExp")
' Set the pattern to remove everything between <-- and -->
regEx.Pattern = "<--(.*?)-->"
regEx.Global = True
' Remove everything between <-- and -->
strRet = regEx.Replace(strInput, "")
' Set the pattern to replace 2 spaces with 1
regEx.Pattern = " "
regEx.Global = True
' Remove double spaces
strRet = regEx.Replace(strRet, " ")
Debug.Print strRet
' Cleanup
Set regEx = Nothing
End Sub
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
Using the following:
Code:
Dim RE As Object
Set RE = CreateObject("VBScript.RegExp")
RE.Pattern = "<--(.*?)-->"
RE.Global = True
Timing = 0
strResult = RE.Replace(TEST, vbNullString)
List1.AddItem Format$(Timing * 1000, "0.00000") & " RegExp"
Hitting @ ~5 ms using the attached project, thus faster than RemoveBetween. Probably the best short code solution, although for some reason it gives me a failing match against valid output data. I haven't checked character by character where the difference is, if it is related to characters 1 - 31 then there is no problem, but if it because of Unicode characters then there is a bigger issue for some cases.
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
Merri, surprisingly this is very similar logic but shorter.
Code:
Public Function DeleteBetween(ByVal sText As String, sBefore As String, sAfter As String) As String
Dim p1 As Long, p2 As Long, p As Long, n As Long
Dim L1 As Long: L1 = Len(sBefore)
Dim L2 As Long: L2 = Len(sAfter)
If L1 = 0 Or L2 = 0 Then Exit Function
p2 = 1 - L2
Do
p2 = p2 + L2
p1 = InStr(p2, sText, sBefore)
If p1 = 0 Then p1 = Len(sText) + 1
p = p1 - p2
Mid$(sText, n + 1, p) = Mid$(sText, p2, p)
n = n + p
p2 = InStr(p1 + L1, sText, sAfter)
Loop While p2
DeleteBetween = Left$(sText, n)
End Function
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
It is very much the same logic, Mid$ = Mid$ is the same, which is what brings the speed, other than that it mostly moves around some of the numeric variables to minify the number of code lines :) Thus it runs the same speed as my last submission. The only real complain would be that it does not check whether sText has anything or not, so empty or null string crashes it. Easy to fix, my only change suggestions, of which the two latter are a matter of taste:
If LenB(sText) = 0 Or L1 = 0 Or L2 = 0 Then Exit Function
If p Then Mid$(sText, n + 1, p) = Mid$(sText, p2, p)
If n Then DeleteBetween = Left$(sText, n)
-
Re: [RESOLVED] Fastest way to remove strings between two substrings?
If change all Instr, Mid, Left, Len to InstrB, MidB, LeftB, LenB that can speed up about 10%.
If test on LenB(sText) for quick exit then other tests on p and n are not required.
-
2 Attachment(s)
Re: [RESOLVED] Fastest way to remove strings between two substrings?
The 10% increase comes purely from InStr vs. InStrB, LeftB$ and MidB$ are equal in speed to Left$ and Mid$ :)
The n test is only for returning a NULL string instead of EMPTY string when there is nothing to return. A very minor detail, personally prefer returning vbNullString over "".
Edit!
Checking for Mid$ copy length does matter. If there are empty items then speed increases. Here is an updated version of DeleteBetween.
Code:
Public Function DeleteBetween(ByVal sText As String, sBefore As String, sAfter As String) As String
Dim P1 As Long, P2 As Long, L As Long, T As Long
Dim L1 As Long: L1 = LenB(sBefore)
Dim L2 As Long: L2 = LenB(sAfter)
If LenB(sText) = 0 Or L1 = 0 Or L2 = 0 Then Exit Function
P2 = 1 - L2
Do
P2 = P2 + L2: P1 = P2 - 1
Do: P1 = InStrB(P1 + 1, sText, sBefore)
Loop While (P1 And 1) = 0 And (P1 > 0)
If P1 = 0 Then P1 = LenB(sText) + 1
L = P1 - P2
If L Then MidB$(sText, T + 1, L) = MidB$(sText, P2, L)
T = T + L
P2 = P1 + L1 - 1
Do: P2 = InStrB(P2 + 1, sText, sAfter)
Loop While (P2 And 1) = 0 And (P2 > 0)
Loop While P2
If T Then DeleteBetween = LeftB$(sText, T)
End Function
Updated attachment code includes the earlier test with some empty extra stuff. If you remove If L Then you'll see it slowing down –*testing the length is very quick.