PDA

Click to See Complete Forum and Search --> : [VB6] Faster Split & Join (development)


Merri
Sep 20th, 2008, 11:49 AM
Old short QuickSplit versions in post #11 (http://www.vbforums.com/showpost.php?p=3344191&postcount=11).

Strings.Split
This is an optimized code using advanced methods for providing a "no external files" solution using a single BAS module file. No TLB, no DLL.

There is many advantages:
100% syntax compatible with native Split: just add the module to your project and you have a faster Split!1)
Faster with any possible call! Be the output empty array, empty strings, one array item only, short strings, mid length strings, long strings, this Split is always faster.
BinaryCompare for ultimate speed (InStrB), CharacterCompare for possible problems with some special case characters (InStr). InStrB is always faster than native Split; InStr version is faster or rougly equal depending on case.

1) TextCompare not supported, you can call VBA.Split for TextCompare. Adding support for TextCompare is too complex as it might require analyzing the entire Unicode character map.

Merri
Sep 20th, 2008, 11:50 AM
Here is a byte version that splits bytewise instead of characterwise, a feature not supported natively by VB:Public Sub QuickSplitB(Expression As String, ResultSplit() As String, Optional Delimiter As String = " ", Optional ByVal Limit As Long = -1, Optional ByVal Compare As VbCompareMethod = vbBinaryCompare, Optional ByRef IgnoreDelimiterWithin As String = vbNullString)
Dim lngA As Long, lngCount As Long, lngDelLen As Long, lngExpLen As Long, lngIgnLen As Long, lngResults() As Long
lngExpLen = LenB(Expression)
lngDelLen = LenB(Delimiter)
If lngExpLen > 0 And lngDelLen > 0 And (Limit > 0 Or Limit = -1&) Then
lngIgnLen = LenB(IgnoreDelimiterWithin)
If lngIgnLen Then
lngA = InStrB(1, Expression, Delimiter, Compare)
lngB = InStrB(1, Expression, IgnoreDelimiterWithin, Compare)
If Limit = -1& Then
ReDim lngResults(0 To (lngExpLen \ lngDelLen))
Do While lngA > 0
If lngA + lngDelLen <= lngB Or lngB = 0 Then
lngResults(lngCount) = lngA
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
lngCount = lngCount + 1
Else
lngB = InStrB(lngB + lngIgnLen, Expression, IgnoreDelimiterWithin, Compare)
If lngB Then
lngA = InStrB(lngB + lngIgnLen, Expression, Delimiter, Compare)
If lngA Then
lngB = InStrB(lngB + lngIgnLen, Expression, IgnoreDelimiterWithin, Compare)
End If
End If
End If
Loop
Else
ReDim lngResults(0 To Limit - 1)
Do While lngA > 0
If lngA + lngDelLen <= lngB Or lngB = 0 Then
lngResults(lngCount) = lngA
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
lngCount = lngCount + 1
If lngCount = Limit Then Exit Do
Else
lngB = InStrB(lngB + lngIgnLen, Expression, IgnoreDelimiterWithin, Compare)
If lngB Then
lngA = InStrB(lngB + lngIgnLen, Expression, Delimiter, Compare)
If lngA Then
lngB = InStrB(lngB + lngIgnLen, Expression, IgnoreDelimiterWithin, Compare)
End If
End If
End If
Loop
End If
Else
lngA = InStrB(1, Expression, Delimiter, Compare)
If Limit = -1& Then
ReDim lngResults(0 To (lngExpLen \ lngDelLen))
Do While lngA > 0
lngResults(lngCount) = lngA
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
lngCount = lngCount + 1
Loop
Else
ReDim lngResults(0 To Limit - 1)
Do While lngA > 0 And lngCount < Limit
lngResults(lngCount) = lngA
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
lngCount = lngCount + 1
Loop
End If
End If
ReDim Preserve ResultSplit(0 To lngCount)
If lngCount = 0 Then
ResultSplit(0) = Expression
Else
ResultSplit(0) = LeftB$(Expression, lngResults(0) - 1)
For lngCount = 0 To lngCount - 2
ResultSplit(lngCount + 1) = MidB$(Expression, lngResults(lngCount) + lngDelLen, lngResults(lngCount + 1) - lngResults(lngCount) - lngDelLen)
Next lngCount
ResultSplit(lngCount + 1) = RightB$(Expression, lngExpLen - lngResults(lngCount) - lngDelLen + 1)
End If
Else
ResultSplit = VBA.Split(vbNullString)
End If
End Sub

Edit!
Removed unused variables.

Merri
Sep 20th, 2008, 02:41 PM
A version for those who want it commented, simplified and just to be a super fast Split replacement that is usable in both VB5 & VB6:Option Explicit

Private Declare Function SafeArrayRedim Lib "oleaut32" (ByVal saPtr As Long, saBound As Long) As Long

' Short, commented & fully VB5 compatible version of QuickSplit at http://www.vbforums.com/showthread.php?t=540323
Public Sub QuickSplit(Expression As String, ResultSplit() As String, Optional Delimiter As String = " ", Optional ByVal Limit As Long = -1)
' general variables that we need
Dim lngA As Long, lngCount As Long, lngDelLen As Long, lngExpLen As Long, lngResults() As Long
' some dummy variables that we happen to need
Dim Compare As VbCompareMethod, SafeArrayBound(1) As Long
' length information
lngExpLen = LenB(Expression)
lngDelLen = LenB(Delimiter)
' validate lengths and limit (limit must be larger than 0 or it must be unlimited)
If lngExpLen > 0 And lngDelLen > 0 And (Limit > 0 Or Limit = -1&) Then
' now look up for the first position
lngA = InStrB(1, Expression, Delimiter, Compare)
' InStrB is very fast, but it may give "between characters" results
Do Until (lngA And 1) Or (lngA = 0)
' this is why we look for odd positions (1, 3, 5, 7 etc. are a valid position)
lngA = InStrB(lngA + 1, Expression, Delimiter, Compare)
Loop
' unlimited or limited?
If Limit = -1& Then
' unlimited, reserve space for maximum possible amount of returned items
ReDim lngResults(0 To (lngExpLen \ lngDelLen))
' index positions until none is found
Do While lngA > 0
' remember this position
lngResults(lngCount) = lngA
' look for the next one
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
Do Until (lngA And 1) Or (lngA = 0)
lngA = InStrB(lngA + 1, Expression, Delimiter, Compare)
Loop
' increase found counter
lngCount = lngCount + 1
Loop
Else
' limited, reserve space for the wanted amount of items
ReDim lngResults(0 To Limit - 1)
' index positions until none is found or until counter hits limitation
Do While lngA > 0 And lngCount < Limit
' remember this position
lngResults(lngCount) = lngA
' look for the next one
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
Do Until (lngA And 1) Or (lngA = 0)
lngA = InStrB(lngA + 1, Expression, Delimiter, Compare)
Loop
' increase found counter
lngCount = lngCount + 1
Loop
End If
' set results to actual findings
ReDim Preserve ResultSplit(0 To lngCount)
' see if we found any results
If lngCount = 0 Then
' nope, just set the only item to be the whole string
ResultSplit(0) = Expression
Else
' get the first item
ResultSplit(0) = LeftB$(Expression, lngResults(0) - 1)
' get the other items except the last one
For lngCount = 0 To lngCount - 2
ResultSplit(lngCount + 1) = MidB$(Expression, lngResults(lngCount) + lngDelLen, lngResults(lngCount + 1) - lngResults(lngCount) - lngDelLen)
Next lngCount
' get the last item
ResultSplit(lngCount + 1) = RightB$(Expression, lngExpLen - lngResults(lngCount) - lngDelLen + 1)
End If
Else
' clean any possible data that exists in the passed string array (like if it is multidimensional)
If Not Not ResultSplit Then Erase ResultSplit
' mysterious IDE error fix
Debug.Assert App.hInstance
' reset to one element, one dimension
ReDim ResultSplit(0 To 0)
' custom redimension: remove the items (this duplicates the VB6 Split behavior)
SafeArrayRedim Not Not ResultSplit, SafeArrayBound(0)
End If
End Sub

' Short, commented & fully VB5 compatible version of QuickSplitB at http://www.vbforums.com/showthread.php?t=540323
Public Sub QuickSplitB(Expression As String, ResultSplit() As String, Optional Delimiter As String = " ", Optional ByVal Limit As Long = -1)
' general variables that we need
Dim lngA As Long, lngCount As Long, lngDelLen As Long, lngExpLen As Long, lngResults() As Long
' some dummy variables that we happen to need
Dim Compare As VbCompareMethod, SafeArrayBound(1) As Long
' length information
lngExpLen = LenB(Expression)
lngDelLen = LenB(Delimiter)
' validate lengths and limit (limit must be larger than 0 or it must be unlimited)
If lngExpLen > 0 And lngDelLen > 0 And (Limit > 0 Or Limit = -1&) Then
' now look up for the first position
lngA = InStrB(1, Expression, Delimiter, Compare)
' unlimited or limited?
If Limit = -1& Then
' unlimited, reserve space for maximum possible amount of returned items
ReDim lngResults(0 To (lngExpLen \ lngDelLen))
' index positions until none is found
Do While lngA > 0
' remember this position
lngResults(lngCount) = lngA
' look for the next one
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
' increase found counter
lngCount = lngCount + 1
Loop
Else
' limited, reserve space for the wanted amount of items
ReDim lngResults(0 To Limit - 1)
' index positions until none is found or until counter hits limitation
Do While lngA > 0 And lngCount < Limit
' remember this position
lngResults(lngCount) = lngA
' look for the next one
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
' increase found counter
lngCount = lngCount + 1
Loop
End If
' set results to actual findings
ReDim Preserve ResultSplit(0 To lngCount)
' see if we found any results
If lngCount = 0 Then
' nope, just set the only item to be the whole string
ResultSplit(0) = Expression
Else
' get the first item
ResultSplit(0) = LeftB$(Expression, lngResults(0) - 1)
' get the other items except the last one
For lngCount = 0 To lngCount - 2
ResultSplit(lngCount + 1) = MidB$(Expression, lngResults(lngCount) + lngDelLen, lngResults(lngCount + 1) - lngResults(lngCount) - lngDelLen)
Next lngCount
' get the last item
ResultSplit(lngCount + 1) = RightB$(Expression, lngExpLen - lngResults(lngCount) - lngDelLen + 1)
End If
Else
' clean any possible data that exists in the passed string array (like if it is multidimensional)
If Not Not ResultSplit Then Erase ResultSplit
' mysterious IDE error fix
Debug.Assert App.hInstance
' reset to one element, one dimension
ReDim ResultSplit(0 To 0)
' custom redimension: remove the items (this duplicates the VB6 Split behavior)
SafeArrayRedim Not Not ResultSplit, SafeArrayBound(0)
End If
End Sub

This test shows it is a perfect VB6 Split replacement by behavior:
Private Sub Form_Load()
Dim strArray() As String
QuickSplit vbNullString, strArray
Debug.Print LBound(strArray), UBound(strArray)
End SubVB6 Split would also return a similar array.

Also to show how the byte version works:Private Sub Form_Load()
Dim strArray() As String
QuickSplitB String$(2, ChrW$(&H2020)), strArray, ChrB$(&H20)
Debug.Print LBound(strArray), UBound(strArray)
End Sub
You will get 5 items (because there are 4 space 8-bit ANSI characters). Very useful if you happen to read files directly into strings via API without any character set conversions from ANSI to Unicode (see an example of reading a text file fully using API (http://www.vbforums.com/showpost.php?p=3337813&postcount=3)).

Merri
Sep 20th, 2008, 03:05 PM
Finally, for those interested, I have made a version of QuickSplit that splits directly into string variables instead of a string array, SplitToVar (http://www.vbforums.com/showthread.php?t=538612). Please note that unlike Split and QuickSplit it modifies the expression string passed into the sub.

snookered
Sep 21st, 2008, 01:45 AM
Merri,
I am using VB5 and am finding it hard to follow your QuickSplitB code.
My situation is that I have a textbox with string, including commas or underscores, etc and I want to make an array to output to individual textboxes or a listbox (essentially the same as the Split function in VB6).
How does your code relate to my situation, as well as the situation where the data is in a text file? Where does the code refer to the textbox, label, etc?
I would appreciate any assistance.
Cheers,
Snookered.

Merri
Sep 21st, 2008, 07:12 AM
With VB6's Split you would do something like this to add the separated items to listbox:
Dim strArray() As String, lngA As Long
' split by comma
strArray = Split(Text1.Text, ",")
' clean up the listbox
List1.Clear
' fill the listbox with all the resulting items
For lngA = 0 To UBound(strArray)
List1.AddItem strArray(lngA)
Next lngA

When QuickSplit has been added to the project (into a module), the usage is slightly different, instead of this line:

strArray = Split(Text1.Text, ",")

You use:

QuickSplit Text1.Text, strArray, ","

This allows getting a true string array in VB5 and it is also much faster.

Dim strArray() As String, lngA As Long
' split by comma
QuickSplit Text1.Text, strArray, ","
' clean up the listbox
List1.Clear
' fill the listbox with all the resulting items
For lngA = 0 To UBound(strArray)
List1.AddItem strArray(lngA)
Next lngA

snookered
Sep 26th, 2008, 04:21 PM
I have got this working fine. Thanks for your help.
Snookered.

Ellis Dee
Sep 27th, 2008, 09:26 AM
You could tighten that code up a little. Instead of having two mostly duplicate code blocks for limited/unlimited, if you switch the logic around to a For...Next loop I think it'll be very close to just as fast. ' unlimited or limited?
If Limit = -1& Then
' unlimited, reserve space for maximum possible amount of returned items
ReDim lngResults(0 To (lngExpLen \ lngDelLen))
' index positions until none is found
Do While lngA > 0
' remember this position
lngResults(lngCount) = lngA
' look for the next one
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
' increase found counter
lngCount = lngCount + 1
Loop
Else
' limited, reserve space for the wanted amount of items
ReDim lngResults(0 To Limit - 1)
' index positions until none is found or until counter hits limitation
Do While lngA > 0 And lngCount < Limit
' remember this position
lngResults(lngCount) = lngA
' look for the next one
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
' increase found counter
lngCount = lngCount + 1
Loop
End IfAdding a variable would reduce the above code to: ' unlimited or limited?
If Limit = -1& Then
lngMax = lngExpLen \ lngDelLen
Else
lngMax = Limit - 1
End If
ReDim lngResults(lngMax)
For lngCount = 0 To lngMax
If lngA <= 0 Then Exit For
' remember this position
lngResults(lngCount) = lngA
' look for the next one
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
Next

tanjiunnyann
Sep 28th, 2008, 11:02 AM
Merri, I'm now using SplitB05 that downloaded from VBSpeed. And luckily I found another Split function at here! I want to know, QuickSplit and QuickSplitB which is faster? Are they support Chinese character as well? I'm using SplitB05(VBSpeed) has no problem with the chinese character.

Merri
Sep 28th, 2008, 12:42 PM
tinjiunnyann: They do different slightly things. QuickSplit works with character data (two bytes = one character). QuickSplitB works with byte data (one byte = half a character). With most strings you want to use QuickSplit. Both work with any character you give them.


Ellis Dee: thanks, I'll take a look at some point (hopefully later today... I was requested do some LAN work at home so I'm just in a quick visit at the moment)

Merri
Sep 28th, 2008, 04:10 PM
Ellis Dee: These are the short versions updated to the even shorter code. I did a minor logical change: I made the loop code run via a True condition, instead of only checking whether to exit the loop, switching that to be the slightly slower False condition. I think this kind of improves readibility too, but that is a matter of opinion.

I also did some further optimization that gives 0 to 4 % of performance boost, depends on the usage (how often called, how big input data, how many possible output results).
Option Explicit

Private Declare Function SafeArrayRedim Lib "oleaut32" (ByVal saPtr As Long, saBound As Long) As Long

' Short, commented & fully VB5 compatible version of QuickSplit at http://www.vbforums.com/showthread.php?t=540323
Public Sub QuickSplit(Expression As String, ResultSplit() As String, Optional Delimiter As String = " ", Optional ByVal Limit As Long = -1)
Static lngResults() As Long
' general variables that we need
Dim lngA As Long, lngCount As Long, lngDelLen As Long, lngExpLen As Long
' some dummy variables that we happen to need
Dim Compare As VbCompareMethod, SafeArrayBound(1) As Long
' length information
lngExpLen = LenB(Expression)
lngDelLen = LenB(Delimiter)
' validate lengths and limit (limit must be larger than 0 or it must be unlimited)
If lngExpLen > 0 And lngDelLen > 0 And (Limit > 0 Or Limit = -1&) Then
' unlimited or limited?
If Limit = -1& Then
' unlimited, reserve space for maximum possible amount of returned items
lngCount = lngExpLen \ lngDelLen
Else
' limited, reserve space for maximum wanted amount of returned items
lngCount = Limit - 1
End If
' data of found positions
ReDim Preserve lngResults(0 To lngCount)
' now look up for the first position
lngA = InStrB(1, Expression, Delimiter, Compare)
' InStrB is very fast, but it may give "between characters" results
Do Until (lngA And 1) Or (lngA = 0)
' this is why we look for odd positions (1, 3, 5, 7 etc. are a valid position)
lngA = InStrB(lngA + 1, Expression, Delimiter, Compare)
Loop
' now work until maximum limit
For lngCount = 0 To lngCount
' did we find anything?
If lngA > 0 Then
' remember this position
lngResults(lngCount) = lngA
' look for the next one
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
Do Until (lngA And 1) Or (lngA = 0)
lngA = InStrB(lngA + 1, Expression, Delimiter, Compare)
Loop
Else
' we did not find the next string, we are done
Exit For
End If
Next lngCount
' set results to actual findings
ReDim Preserve ResultSplit(0 To lngCount)
' see if we found any results
If lngCount = 0 Then
' nope, just set the only item to be the whole string
ResultSplit(0) = Expression
Else
' get the first item
ResultSplit(0) = LeftB$(Expression, lngResults(0) - 1)
' get the other items except the last one
For lngCount = 0 To lngCount - 2
ResultSplit(lngCount + 1) = MidB$(Expression, lngResults(lngCount) + lngDelLen, lngResults(lngCount + 1) - lngResults(lngCount) - lngDelLen)
Next lngCount
' get the last item
ResultSplit(lngCount + 1) = RightB$(Expression, lngExpLen - lngResults(lngCount) - lngDelLen + 1)
End If
Else
' clean any possible data that exists in the passed string array (like if it is multidimensional)
If Not Not ResultSplit Then Erase ResultSplit
' mysterious IDE error fix
Debug.Assert App.hInstance
' reset to one element, one dimension
ReDim ResultSplit(0 To 0)
' remove the item (this duplicates the VB6 Split behavior)
SafeArrayRedim Not Not ResultSplit, SafeArrayBound(0)
End If
End Sub

' Short, commented & fully VB5 compatible version of QuickSplitB at http://www.vbforums.com/showthread.php?t=540323
Public Sub QuickSplitB(Expression As String, ResultSplit() As String, Optional Delimiter As String = " ", Optional ByVal Limit As Long = -1)
Static lngResults() As Long
' general variables that we need
Dim lngA As Long, lngCount As Long, lngDelLen As Long, lngExpLen As Long
' some dummy variables that we happen to need
Dim Compare As VbCompareMethod, SafeArrayBound(1) As Long
' length information
lngExpLen = LenB(Expression)
lngDelLen = LenB(Delimiter)
' validate lengths and limit (limit must be larger than 0 or it must be unlimited)
If lngExpLen > 0 And lngDelLen > 0 And (Limit > 0 Or Limit = -1&) Then
' unlimited or limited?
If Limit = -1& Then
' unlimited, reserve space for maximum possible amount of returned items
lngCount = lngExpLen \ lngDelLen
Else
' limited, reserve space for maximum wanted amount of returned items
lngCount = Limit - 1
End If
' data of found positions
ReDim Preserve lngResults(0 To lngCount)
' now look up for the first position
lngA = InStrB(1, Expression, Delimiter, Compare)
' now work until maximum limit
For lngCount = 0 To lngCount
' did we find anything?
If lngA > 0 Then
' remember this position
lngResults(lngCount) = lngA
' look for the next one
lngA = InStrB(lngA + lngDelLen, Expression, Delimiter, Compare)
Else
' we did not find the next string, we are done
Exit For
End If
Next lngCount
' set results to actual findings
ReDim Preserve ResultSplit(0 To lngCount)
' see if we found any results
If lngCount = 0 Then
' nope, just set the only item to be the whole string
ResultSplit(0) = Expression
Else
' get the first item
ResultSplit(0) = LeftB$(Expression, lngResults(0) - 1)
' get the other items except the last one
For lngCount = 0 To lngCount - 2
ResultSplit(lngCount + 1) = MidB$(Expression, lngResults(lngCount) + lngDelLen, lngResults(lngCount + 1) - lngResults(lngCount) - lngDelLen)
Next lngCount
' get the last item
ResultSplit(lngCount + 1) = RightB$(Expression, lngExpLen - lngResults(lngCount) - lngDelLen + 1)
End If
Else
' clean any possible data that exists in the passed string array (like if it is multidimensional)
If Not Not ResultSplit Then Erase ResultSplit
' mysterious IDE error fix
Debug.Assert App.hInstance
' reset to one element, one dimension
ReDim ResultSplit(0 To 0)
' remove the item (this duplicates the VB6 Split behavior)
SafeArrayRedim Not Not ResultSplit, SafeArrayBound(0)
End If
End Sub

I haven't perfectly tested these codes, but they appear to work the same.


Edit!
A screenshot showing some of the performance difference against Splits available over at VBspeed:
http://kontu.selfip.info/vb6/quicksplit_vbspeed2.png

http://kontu.selfip.info/vb6/quicksplit_vbspeed.png

Edit #2!
One final addition... the longer the output strings in the array, the better QuickSplit does against what is available over at VBspeed:
http://kontu.selfip.info/vb6/quicksplit_vbspeed3.png

Ellis Dee
Sep 28th, 2008, 11:30 PM
Ellis Dee: These are the short versions updated to the even shorter code. I did a minor logical change: I made the loop code run via a True condition, instead of only checking whether to exit the loop, switching that to be the slightly slower False condition. I think this kind of improves readibility too, but that is a matter of opinion.Yeah, looks much nicer. I agree on the readability of the logical flow.

tanjiunnyann
Sep 29th, 2008, 12:42 AM
Merri, your QuickSplit is very nice, and I've using SplitB05(VBSpeed), QuickSplit and QuickSplitB for doing a comparison in my environment. I found that QuickSplit and QuickSplitB are 50% faster than SplitB05(VBSpeed). And as my coding result shows me, QuickSplitB is slightly faster than QuickSplit!

As the above testing benchmark are based on my own coding and my own usage.

Ellis Dee
Sep 29th, 2008, 03:10 AM
Oh, and I like how you reuse lngCount instead of introducing a new variable. But I'm a fan of reusing variables as a concept; no doubt others will find it less readable/more confusing/whatever.

For lngCount = 0 To lngCount

tanjiunnyann
Sep 29th, 2008, 05:40 AM
Oh, and I like how you reuse lngCount instead of introducing a new variable. But I'm a fan of reusing variables as a concept; no doubt others will find it less readable/more confusing/whatever.

For lngCount = 0 To lngCount

Is it will improve a bit of performance if we reuse the same variable for looping?

Ellis Dee
Sep 29th, 2008, 05:57 AM
Is it will improve a bit of performance if we reuse the same variable for looping?No. I just like that there is one less variable. Makes it seem cleaner to me.

Merri
Sep 29th, 2008, 08:57 AM
tanjiunnyann: you should not use QuickSplitB for regular character data! Especially with Chinese you may get incorrect results at times, because a character may be understood "from the middle" of a character.

This is a theoretical example: say you're looking for character which hex representation is |2020|. You have two characters within a string: |2040| and |4020| - when put together as a byte stream, they appear as |4020|2040|. Now, when QuickSplitB is processing the data, you may get a in between result: |4020|2040| - this is something that cannot happen with QuickSplit, because it only works per character, not by byte. Thus they should not be compared to each other as they're doing a slightly different thing (except of course for curiosity).


Ellis Dee: I reused lngCount, because I didn't see a need for a new variable as lngCount was still working for the same purpose: being a counter. I could have, however, used the Limit variable too, it is passed ByVal so I could do whatever I'd like with it, thus the code could be For lngCount = 0 To Limit instead.

tanjiunnyann
Oct 6th, 2008, 04:34 AM
Merri, thanks for your advice! I'm using QuickSplit instaed of QuickSplitB.

Your Split function is now the fastest split function what I've ever use before! I hope got some more improvement of performance from you next time.

Tom Moran
Oct 26th, 2008, 12:08 PM
Nice Job! :)

I converted a VB6 Split function to QuickSplit using a .csv file that needed to be parsed. I didn't time it but the visual display by itself tells me it is much faster. The data is already displayed when the form pops-up. Can't see it being drawn!

Thanks for the code!

Tom

Merri
Jun 13th, 2010, 02:07 PM
I spent some time doing further research on speeding up this procedure. I found a case where QuickSplit performs badly: a string where there are a lot of longer substrings causes performance issues. I also tweaked a lot of other areas. The resulting code is a lengthy one:
' VBspeedMerri4.bas
Option Explicit

Private Declare Function ArrPtr Lib "msvbvm60" Alias "VarPtr" (Arr() As Any) As Long
Private Declare Sub GetMem4 Lib "msvbvm60" (ByVal Ptr As Long, Value As Long)
Private Declare Sub PutMem4 Lib "msvbvm60" (ByVal Ptr As Long, ByVal Value As Long)
Private Declare Sub PutMem8 Lib "msvbvm60" (ByVal Ptr As Long, ByVal Value As Currency)
Private Declare Function SysAllocStringByteLen Lib "oleaut32" (ByVal Ptr As Long, ByVal Length As Long) As Long

Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Private Declare Function GetCurrentProcessId Lib "kernel32" () As Long
Private Declare Function OpenProcess Lib "kernel32" (ByVal dwDesiredAccess As Long, ByVal bInheritHandle As Long, ByVal dwProcessId As Long) As Long
Private Declare Function WriteProcessMemory Lib "kernel32" (ByVal hProcess As Long, lpBaseAddress As Any, lpBuffer As Any, ByVal nSize As Long, Optional lpNumberOfBytesWritten As Long) As Long

Private Function InIDE(Optional IDE) As Boolean
If IsMissing(IDE) Then Debug.Assert Not InIDE(InIDE) Else IDE = True
End Function

Private Property Get Procedure(ByVal AddressOfDest As Long) As Long
Procedure = AddressOfDest
End Property

Private Property Let Procedure(ByVal AddressOfDest As Long, ByVal AddressOfSrc As Long)
Dim JMP As Currency, PID As Long
' get process handle
PID = OpenProcess(&H1F0FFF, 0&, GetCurrentProcessId)
If PID Then
If InIDE Then
' get correct pointers to procedures in IDE
GetMem4 AddressOfDest + &H16&, AddressOfDest
GetMem4 AddressOfSrc + &H16&, AddressOfSrc
End If
Debug.Assert App.hInstance
' ASM JMP (0xE9) followed by bytes to jump in memory
JMP = (&HE9& * 0.0001@) + (AddressOfSrc - AddressOfDest - 5@) * 0.0256@
' write the JMP over the destination procedure
WriteProcessMemory PID, ByVal AddressOfDest, JMP, 5
' close process handle
CloseHandle PID
End If
End Property

Public Sub QuickSplit4(ResultSplit() As String, Expression As String, Optional Delimiter As String = " ", Optional ByVal Limit As Long = -1)
' general variables that we need
Dim D() As Integer, E() As Integer, H() As Long, HE() As Long, HS() As Long, P() As Long, S() As Long
Dim C As Long, I As Long, J As Long, K As Long, N As Long, LD As Long, LE As Long, PL As Long, V As Long
Dim PD As Long, PE As Long, PS As Long
' length information
LE = LenB(Expression)
LD = LenB(Delimiter)
' unlimited or limited?
If Limit = -1 Then Limit = LE \ LD + 1
' validate lengths and limit
If LE > 0 And LD > 0 And Limit >= 0 Then
' find the first item
If Limit > 1 Then
Do: I = InStrB(I + 1, Expression, Delimiter)
Loop Until (I And 1) = 1 Or (I = 0)
End If
' did we find an item?
If I Then
' space for knowing the positions
PL = Limit \ 64
ReDim P(0 To PL)
Do
' remember position
P(C) = I
' find next
I = I + LD - 1
Do: I = InStrB(I + 1, Expression, Delimiter)
Loop Until (I And 1) = 1 Or (I = 0)
' increase counter
C = C + 1
If C > PL Then PL = PL * 2: ReDim Preserve P(PL)
Loop While I > 0 And C <= Limit
P(C) = LE + 1
' get pointer
I = Not Not ResultSplit
Debug.Assert App.hInstance
' check if it is multidimensional array
If I Then GetMem4 I, I: If I And &HFFFF& <> 1 Then Erase ResultSplit
' make space for the new items
ReDim Preserve ResultSplit(0 To C)
If C < 32 Then
' there is not a lot to do so keep it simple!
I = 1
For C = 0 To C
J = P(C) - I
ResultSplit(C) = MidB$(Expression, I, J)
I = P(C) + LD
Next C
Else
' okay, now there should be enough to work with to have use of safe arrays...
ReDim H(0 To 5)
HS = H
HS(0) = 1: HS(1) = 4: HS(4) = 1
' S = current ResultSplit array item (Long)
PS = ArrPtr(S)
PutMem4 PS, VarPtr(HS(0))
H(0) = 1: H(1) = 2: H(4) = LenB(Expression)
HE = H
HE(3) = StrPtr(Expression)
' D = current ResultSplit item's data (Integer array)
PD = ArrPtr(D)
PutMem4 PD, VarPtr(H(0))
' E = Expression's data (Integer array)
PE = ArrPtr(E)
PutMem4 PE, VarPtr(HE(0))
V = VarPtr(ResultSplit(0))
' then start working...
I = 1
For C = 0 To C
J = P(C) - I
If J Then
If J > 191 Then
' MidB$ is faster than Integer array with long copy
ResultSplit(C) = MidB$(Expression, I, J)
Else
' Integer array is faster with a short copy
HS(3) = V + C * 4
If S(0) Then
If LenB(ResultSplit(C)) <> J Then
ResultSplit(C) = vbNullString
S(0) = SysAllocStringByteLen(0, J)
End If
Else
S(0) = SysAllocStringByteLen(0, J)
End If
H(3) = S(0)
N = (I - 1) \ 2
For K = 0 To (J - 1) \ 2
D(K) = E(N + K)
Next K
End If
End If
I = P(C) + LD
Next C
PutMem4 PS, 0
PutMem4 PD, 0
PutMem4 PE, 0
End If
Else
' one item
ReDim ResultSplit(0)
ResultSplit(0) = Expression
End If
Else
I = Not Not ResultSplit
' VB6 IDE mysterious bug fix with Not Array
Debug.Assert App.hInstance
If I Then
If LBound(ResultSplit) = 0 And UBound(ResultSplit) = -1 Then Exit Sub
Erase ResultSplit
End If
' safe array header for an empty string array
ReDim H(0 To 6): H(0) = vbString: H(1) = &H1800001: H(2) = 4
' H(1) becomes ArrPtr; H(0) is a negative item telling array datatype
PutMem4 StrArrPtr(ResultSplit), VarPtr(H(1))
' remove items from H array
PutMem8 (Not Not H) + 12, 0
' VB6 IDE mysterious bug fix with Not Array
Debug.Assert App.hInstance
End If
End Sub

Public Function StrArrPtr(Arr() As String) As Long
Procedure(AddressOf VBspeedMerri4.StrArrPtr) = Procedure(AddressOf VBspeedMerri4.z_ArrPtr)
StrArrPtr = VBspeedMerri4.StrArrPtr(Arr)
End Function

Public Function z_ArrPtr(ByVal Value As Long) As Long
z_ArrPtr = Value
End Function

The name QuickSplit4 only comes from the test version: I did two other versions to find out where things could be improved. Whatever the call this procedure should be faster than the native Split.

Btw, VBspeed's SplitB03 is actually quite a good one! The tests in VBspeed's site are poor for finding out all the cases. Here is a modified version:Public Sub SplitB03(Expression$, ResultSplit$(), Optional Delimiter$ = " ")
' by G.Beckmann, G.Beckmann@NikoCity.de
' modified from InStr to InStrB by Merri

Dim C&, iLen&, iLast&, iCur&

iLen = LenB(Delimiter)

If iLen Then

'/ count delimiters
Do: iCur = InStrB(iCur + 1, Expression, Delimiter)
Loop Until (iCur And 1) = 1 Or (iCur = 0)
Do While iCur
iCur = iCur + iLen - 1
Do: iCur = InStrB(iCur + 1, Expression, Delimiter)
Loop Until (iCur And 1) = 1 Or (iCur = 0)
C = C + 1
Loop

'/ initalization
ReDim Preserve ResultSplit(0 To C)
C = 0: iCur = 0: iLast = 1

'/ search again...
Do: iCur = InStrB(iCur + 1, Expression, Delimiter)
Loop Until (iCur And 1) = 1 Or (iCur = 0)
Do While iCur
ResultSplit(C) = MidB$(Expression, iLast, iCur - iLast)
iLast = iCur + iLen
iCur = iLast - 1
Do: iCur = InStrB(iCur + 1, Expression, Delimiter)
Loop Until (iCur And 1) = 1 Or (iCur = 0)
C = C + 1
Loop
ResultSplit(C) = MidB$(Expression, iLast)

Else
ReDim Preserve ResultSplit(0 To 0)
ResultSplit(0) = Expression
End If

End Sub
This is a good short code to work with, but it doesn't beat the native Split by a great margin very often (depends a lot on what you're doing).


Edit!
Test benchmark project attached. Note that I've added anti-cheating code: ReDim Preserve messes up true comparison between procedures in a multicall benchmark with exact same Expression and Delimiter. Thus I added Erase to clean up the array, just like what happens with native Split. This gives a truer comparison between Split and other solutions.

Merri
Jun 14th, 2010, 12:32 PM
First post (http://www.vbforums.com/showthread.php?p=3338200#post3338200) has been updated with a hopefully "final" version. I don't know if I can come up with anything to improve it more using VB6 code: ASM thunks would be "cheating".

Merri
Jun 20th, 2010, 06:08 PM
More improvements here! Both faster & shorter!

' Strings2.bas
Option Explicit

Public Enum SplitCompareMethod
[Split BinaryCompare] = VbCompareMethod.vbBinaryCompare ' InStrB
' [Split TextCompare] = VbCompareMethod.vbTextCompare ' InStr(TextCompare)
[Split CharacterCompare] = VbCompareMethod.vbDatabaseCompare ' InStr(BinaryCompare)
End Enum

Private Declare Function ArrPtr Lib "msvbvm60" Alias "VarPtr" (Arr() As Any) As Long
Private Declare Sub GetMem4 Lib "msvbvm60" (ByVal ptr As Long, Value As Long)
Private Declare Function InitStringArray Lib "oleaut32" Alias "SafeArrayCreate" (Optional ByVal VarType As VbVarType = vbString, Optional ByVal Dims As Integer = 1, Optional saBound As Currency) As Long
Private Declare Sub PutMem4 Lib "msvbvm60" (ByVal ptr As Long, ByVal Value As Long)
Private Declare Function SysAllocStringByteLen Lib "oleaut32" (ByVal ptr As Long, ByVal Length As Long) As Long
Private Declare Function SysAllocStringLen Lib "oleaut32" (ByVal ptr As Long, ByVal Length As Long) As Long

Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Private Declare Function GetCurrentProcessId Lib "kernel32" () As Long
Private Declare Function OpenProcess Lib "kernel32" (ByVal dwDesiredAccess As Long, ByVal bInheritHandle As Long, ByVal dwProcessId As Long) As Long
Private Declare Function WriteProcessMemory Lib "kernel32" (ByVal hProcess As Long, lpBaseAddress As Any, lpBuffer As Any, ByVal nSize As Long, Optional lpNumberOfBytesWritten As Long) As Long

Private Function InIDE(Optional IDE) As Boolean
If IsMissing(IDE) Then Debug.Assert Not InIDE(InIDE) Else IDE = True
End Function

Private Property Get Procedure(ByVal AddressOfDest As Long) As Long
Procedure = AddressOfDest
End Property

Private Property Let Procedure(ByVal AddressOfDest As Long, ByVal AddressOfSrc As Long)
Dim JMP As Currency, PID As Long
' get process handle
PID = OpenProcess(&H1F0FFF, 0&, GetCurrentProcessId)
If PID Then
If InIDE Then
' get correct pointers to procedures in IDE
GetMem4 AddressOfDest + &H16&, AddressOfDest
GetMem4 AddressOfSrc + &H16&, AddressOfSrc
End If
Debug.Assert App.hInstance
' ASM JMP (0xE9) followed by bytes to jump in memory
JMP = (&HE9& * 0.0001@) + (AddressOfSrc - AddressOfDest - 5@) * 0.0256@
' write the JMP over the destination procedure
WriteProcessMemory PID, ByVal AddressOfDest, JMP, 5
' close process handle
CloseHandle PID
End If
End Property

Public Function Split(Expression As String, Optional Delimiter As String = " ", Optional ByVal Limit As Long = -1, Optional ByVal Compare As SplitCompareMethod) As String()
Procedure(AddressOf Strings2.Split) = Procedure(AddressOf Strings2.z_Split)
Split = Strings2.Split(Expression, Delimiter, Limit, Compare)
End Function

Public Function z_Split(Expression As String, Optional Delimiter As String = " ", Optional ByVal Limit As Long = -1, Optional ByVal Compare As SplitCompareMethod) As Long
' general variables that we need
Dim P() As Long, R() As Long
Dim C As Long, I As Long, J As Long, K As Long, LD As Long, LE As Long, PL As Long, PS As Long
' get pointer
PS = StrPtr(Expression)
' length information
LE = LenB(Expression)
LD = LenB(Delimiter)
' unlimited or limited?
If Limit = -1 Then If LD Then Limit = LE \ LD + 1
' validate lengths and limit
If LE > 0 And LD > 0 And Limit >= 0 Then
' find the first item
If Limit > 1 Then
If Compare = [Split BinaryCompare] Then
Do: I = InStrB(I + 1, Expression, Delimiter)
Loop Until (I And 1) = 1 Or (I = 0)
Else
I = InStr(Expression, Delimiter)
End If
End If
' did we find an item?
If I Then
' space for knowing the positions
PL = Limit \ 80
ReDim P(0 To PL)
' InStrB?
If Compare = [Split BinaryCompare] Then
Do
' remember position
P(C) = I - 1
' find next
I = I + LD - 1
Do: I = InStrB(I + 1, Expression, Delimiter)
Loop Until (I And 1) = 1 Or (I = 0)
' increase counter
C = C + 1
If C > PL Then PL = PL + C: ReDim Preserve P(PL)
Loop While I > 0 And C <= Limit
Else ' InStr
Do
' remember position
P(C) = (I - 1) * 2
' find next
I = InStr(I + LD \ 2, Expression, Delimiter)
' increase counter
C = C + 1
If C > PL Then PL = PL + C: ReDim Preserve P(PL)
Loop While I > 0 And C <= Limit
End If
P(C) = LE
' make space for the new items
z_Split = InitStringArray(, , (C + 1) * 0.0001@)
PutMem4 ArrPtr(R), z_Split
' keep it simple, stupid!
I = 0
For C = 0 To C
K = P(C)
J = K - I
If J Then R(C) = SysAllocStringByteLen(PS + I, J)
I = K + LD
Next C
Else
' one item
z_Split = InitStringArray(, , 0.0001@)
PutMem4 ArrPtr(R), z_Split
R(0) = SysAllocStringByteLen(PS, LE)
End If
' clean up z_Split reference
PutMem4 ArrPtr(R), 0
Else
z_Split = InitStringArray
End If
End Function

This is one of those improvements after which I have this "boy do I feel like an idiot" feeling. This is massively faster with large amount of data.

Merri
Jun 23rd, 2010, 03:22 PM
I got interested on improving Join function. Initially the native VB6 function seems nearly unbeatable (without using ASM thunks). Even the functions available at VBspeed (http://www.xbeat.net/vbspeed/c_Join.htm) have a lot of trouble being faster. Also all VBspeed's benchmarks are for joining just one character delimiters: use a two character delimiter and you're back being either equal to or slower than the native Join.

So, with this in mind I started looking for solutions. A major thing that I found (that was missing in VBspeed solutions) was the use of non-array variables. Using this trick I ended up with a very long specialized code that is still experimental, but it is fast with short delimiters (up to 8 ~ 16 characters) & very short array items (0 to 4 characters). Being experimental I'm still working my way with it. One thing that I simply had to do was adding Goto. Without it the function would be simply unreadable long.

Anyway, here is the current version of Strings.bas that now has fast Join & Split. For joining large data (long string array items and/or delimiters longer than 16 characters) I still have to recommend VBA.Join as this function does not provide speed there, just a barely equal speed. With short stuff the speed can be almost twice at fast (and some null length cases are some four times faster, but they're rare & special).


I think I may next move to using only Integer & Long datatypes to simplify the code & make things simpler. At the moment there are arrays for Byte, Integer, Long & Currency. The Currency speed doesn't seem to be as good as I'd like it to be.

Merri
Jun 29th, 2010, 02:31 PM
Another update. Some tweaks that improve both Join & Split, especially with lightweight calls. Also I present the fastest StrReverse implementation in VB code. It is nowhere near the native VB6 function (it is ASM optimized), but it is clearly faster than any of the VBspeed solutions.

Merri
Jun 30th, 2010, 04:40 PM
Quick update. First, a bug fix: I noticed I had a small bug in the Join as it had a comma as a delimiter. Otherwise, I've improved Join so that with short delimiters and/or short strings to join it is always faster than the native Join, up to four times in some cases! With both long delimiters & long array strings native Join is still faster (by a small margin).

I think at this point the Join function has reached "maturity", it no longer has clearly visible weak spots.


I still have some ideas for improvement... for example, RtlMoveMemory copies from right to left. The current copy order goes from left to right for the most part. I throw a guess that if all copying always progressed from right to left then it may improve the overall speed of filling the string, but I can't be sure before I've tested it.

Edit!
I think I had a brainfart while thinking that, but anyway... I did find some interesting results.

Merri
Jul 1st, 2010, 10:57 AM
The code attached is for those curious to check out how reversed code performs. What I have to say here is based on what I found out in my benchmarks:
RtlMoveMemory copies stuff from left to right. It performs optimally when it can run continually from left to right. It performs worse if it needs to jump right to left between calls.
VB6 arrays perform better when stuff is copied right to left – the better if it can run continually from right to left. Arrays perform worse if the copy order is left to right.


My only guess is that VB6's compiler somehow optimizes right to left copying better, or that this is somehow true in general for safe arrays.


Based on this information I've also improved Join function. Especially the 8 character copy performs much better now. However I won't post a new version of Strings2.bas at this point... I want to do more this time before a new release. Just wanted to post this information for those that may still do some performance tuning in VB6. (And yes, I'm doing this only because I'm interested where this old warhorse stands performance wise.)