String concatenation (which should use &, not +, btw) is slow. ReDim Preserve is very slow, especially inside a loop. I'd rewrite the logic to remove both of these elements.
Here's a pure native VB6 solution (no API) that runs a hair slower than your solution. It uses the native Split() function, which is slow. No doubt implementing one of Merri's faster Split() variations would speed it up.
vb Code:
Public Function GetNumbers(ByVal pstrText As String) As String()
Yes, according to the benchmark program I tossed together. I ran his sample string through both algorithms 100,000 times and got:
His: 1.094
Mine: 1.171
What were your results?
Also (splitting hairs) the original code doesn't use any API calls. ArrayPtr and PutMem4 are both VB6 runtime functions.
Yes it does use API calls. If you comment out the two declarations at the top of his code it'll throw a "Sub or Function not defined" compilation error. By definition, that's not native VB6.
I modified the code, using only one time ReDim Preserve() but i tested it and is slower than the previous one did. :S
Why?
Thanks!
Yeah, your code confused me by being counter-intuitively fast. My tips are true in the general sense, but I think your particular implementation is the exception that proves the rule.
I think it's because the tokens are so very short that concatenation is plenty fast -- what are we thinking, an average of 2-3 digits per token? -- and the difference between the amount of memory that needs to be cleared by redimming the maximum possible space is much larger than redim preserving as needed inside the loop.
If your strings were 1000 characters long and your numbers were 20 digits long I think your second implementation would be faster. That's my best guess, at least.
I'm only useful for optimizing native code. Others that post regularly are orders of magnitude better with esoteric optimization methods using API, so keep an eye on the thread over the next couple days.
EDIT: We're now in Code It Better, which gets very little traffic. You might consider linking to this thread from your previous one to get the big guns to come over and play. (Merri in particular enjoys squeezing the most speed possible out of VB6.)
Last edited by Ellis Dee; Jan 13th, 2011 at 05:01 AM.
Public Function GetNumbers(ByVal strText As String) As Integer()
Dim r As Integer
Dim bHaveDigit As Boolean
Dim intChar As Integer
Dim intCounter As Integer = -1
Dim intExponent As Integer = -1
Dim intNumbers As Integer() = Nothing
Array.Resize(intNumbers, 9)
For r = Len(strText) To 1 Step -1
intChar = Asc(Mid(strText, r, 1))
Select Case intChar
Case 48 To 57
If Not bHaveDigit Then
intCounter += 1
If intCounter / 10 = Int(intCounter / 10) Then
Array.Resize(intNumbers, intCounter + 10)
End If
End If
bHaveDigit = True
intExponent += 1
intNumbers(intCounter) += CType((intChar - 48) * 10 ^ intExponent, Integer)
Case Else
If bHaveDigit Then
bHaveDigit = False
End If
intExponent = -1
End Select
Next
Array.Resize(intNumbers, intCounter)
Array.Reverse(intNumbers)
Return intNumbers
End Function
Public Function GetNumbers(ByVal strText As String) As Integer()
Dim r As Integer
Dim bHaveDigit As Boolean
Dim intChar As Integer
Dim intCounter As Integer
Dim intExponent As Integer
Dim intNumbers() As Integer
intCounter = -1
intExponent = -1
redim intNumbers (9)
For r = Len(strText) To 1 Step -1
intChar = Asc(Mid(strText, r, 1))
Select Case intChar
Case 48 To 57
If Not bHaveDigit Then
intCounter = intCounter + 1
If intCounter / 10 = Int(intCounter / 10) Then
redim preserve intNumbers(intCounter + 10)
End If
End If
bHaveDigit = True
intExponent = intExponent + 1
intNumbers(intCounter) = intNumbers(intCounter) + (intChar - 48) * 10 ^ intExponent
Case Else
If bHaveDigit Then
bHaveDigit = False
End If
intExponent = -1
End Select
Next
redim preserve intNumbers (intCounter)
'Array.Reverse(intNumbers)
GetNumbers = intNumbers
End Function
I don't have VB6 anymore and haven't used it in awhile so let me know if it doesn't compile.
Also the numbers are stored in the array in opposite order which can easily be corrected if the speed increase is significant.
Last edited by MarMan; Jan 13th, 2011 at 10:12 AM.
Reason: Missed a couple of VB.NET to VB6 conversions.
Public Function GetNumberM(ByVal Text As String) As String()
Dim B() As Byte, i As Long, M As Boolean, L As Long, P As Long, PC As Long, PT As Long
PT = 1
If LenB(Text) Then
B = Text
For i = 0 To UBound(B) - 1 Step 2
Select Case B(i)
Case 48 To 57
M = B(i + 1) = 0
Case Else
M = False
End Select
If M Then
If P = 0 Then P = (i \ 2) + 1
ElseIf P Then
PC = (i \ 2) + 1
L = PC - P
If PT < P Then Mid$(Text, PT, L + 1) = Mid$(Text, P, L) & " " Else Mid$(Text, PC, 1) = " "
PT = PT + L + 1
P = 0
End If
Next i
If P Then
PC = (i \ 2) + 1
L = PC - P
If PT < P Then Mid$(Text, PT, L + 1) = Mid$(Text, P, L)
PT = PT + L + 1
End If
GetNumberM = Split(Left$(Text, PT - 2), " ")
Else
GetNumberM = Split(vbNullString)
End If
End Function
Don't have the time for API-doped version just right now.
Last edited by Merri; Jan 14th, 2011 at 07:12 AM.
Reason: Update to fix a bug
Merri put the result here, now i'm not in my own computer and i have the portable version wich can't compilate...
I'm seeing yor new functions.
It would better if you put the Ubound() of your array in a varible, if you doo this the For...Next will not have to calculate it every time.
I love your way to do it but i have some questions that i will post tomorrow
---------------------------------------------------------------------------------------------------------------------------
Edit:
Any advice about my function Merri?
Last edited by *PsyKE1*; Jan 13th, 2011 at 03:51 PM.
Attached is a benchmark program that includes the four methods posted to the thread. My solution had a bug where I forgot to specify what I was splitting the return value by, and it appears Merri had the same bug. I corrected that for both of us in the benchmark program.
I included a "validate" button to verify the results, which revealed the limitation of MarMan's method of using a numeric array: leading zeroes are lost. It also revealed that Psyke's method leaves a trailing blank in the returned array. Here's the validation results:
This forum is for intellectual exercises, not functional solutions.
not according to this: The Purpose Of This Forum (by Brad Jones)
anyway, here's my attempt. I haven't programmed in vb6 in a while, and needed the challenge.
smaller code is "usually" faster but i'll let someone else benchmark it. I tried to keep mine as a direct replacement for the first function. If i were to write my own function, i would have made use of a string and split it into an array instead of using a method that required a variant.
Code:
Private Function GetNumbersbyStringI(ByRef s As String) As String()
Dim cl As Long, Tempchar As String, tempchar2 As String, TempNumString As String
Dim myarray() As String, arraycount As Long
For cl = 1 To Len(s)
Tempchar = Mid(s, cl, 1)
If IsNumeric(Tempchar) Then
Do
TempNumString = TempNumString + Tempchar
cl = cl + 1
Tempchar = Mid(s, cl, 1)
If Not IsNumeric(Tempchar) Then
arraycount = arraycount + 1
ReDim Preserve myarray(1 To arraycount)
myarray(arraycount) = TempNumString
TempNumString = ""
Exit Do
End If
Loop
End If
Next cl
GetNumbersbyStringI = myarray
End Function
Edit: Don't bother. It's about 50% slower. I actually managed to get this down to about 5 lines of code and every single thing i did (except getting rid of the redim preserve) slowed it down. I give up
Last edited by Lord Orwell; Jan 14th, 2011 at 05:16 AM.
Private Sub Form_Load()
Dim S As String
S = "1A2B3C4D5E6F7G8H9"
Combo1.AddItem S
S = Space$(10241)
Mid$(S, 1, 18) = "1A2B3C4D5E6F7G8H9I"
Mid$(S, 19) = S
Combo1.AddItem S
S = "123456789"
Combo1.AddItem S
S = Space$(10241)
Mid$(S, 1, 10) = "123456789A"
Mid$(S, 11) = S
Combo1.AddItem S
Combo1.ListIndex = 0
End Sub
In output it now it creates:
a short list of very short strings
a long list of very short strings
a single item of a long string (for a number nine digits is pretty long in regular use)
a long list of long strings
So this benchmark tests the extreme cases, not your avarage ones.
Compiled benchmark one:
Psyke: 73 ms
Ellis Dee: 70 ms
MarMan: 53 ms
Merri 1: 53 ms
Merri 2: 30 ms
Compiled benchmark two:
Psyke: 5555 ms
Ellis Dee: 3544 ms
MarMan: 2940 ms
Merri 1: 2333 ms
Merri 2: 1508 ms
Compiled benchmark three:
Psyke: 32 ms
Ellis Dee: 28 ms
MarMan: 37 ms
Merri 1: 20 ms
Merri 2: 14 ms
Compiled benchmark four:
Psyke: 2427 ms
Ellis Dee: 1736 ms
MarMan: 3114 ms
Merri 1: 553 ms
Merri 2: 357 ms
As my faster one was missed being only in the attachment:
Code:
Private Declare Function ArrPtr Lib "msvbvm60" Alias "VarPtr" (Arr() As Any) As Long
Private Declare Sub PutMem4 Lib "msvbvm60" (ByVal Ptr As Long, ByVal Value As Long)
Public Sub GetNumberM2(Text As String, O() As String)
Dim LA() As Long, LH(0 To 5) As Long, LP As Long
Dim IA() As Integer, IH(0 To 5) As Long, IP As Long
Dim C As Integer, i As Long, P As Long, T As Long
LP = ArrPtr(LA)
LH(0) = 1: LH(1) = 4: LH(4) = &H3FFFFFFF
PutMem4 LP, VarPtr(LH(0))
IP = ArrPtr(IA)
IH(0) = 1: IH(1) = 2: IH(3) = StrPtr(Text): IH(4) = Len(Text)
LH(3) = IP: LA(0) = VarPtr(IH(0))
ReDim O(0 To Len(Text) \ 2)
For i = 0 To UBound(IA)
C = IA(i)
If C > 57 Then
If P Then
O(T) = Mid$(Text, P, i - P + 1)
T = T + 1
P = 0
End If
ElseIf C < 48 Then
If P Then
O(T) = Mid$(Text, P, i - P + 1)
T = T + 1
P = 0
End If
ElseIf P = 0 Then
P = i + 1
End If
Next i
If P Then
O(T) = Mid$(Text, P, i - P + 1)
ReDim Preserve O(T)
ElseIf T Then
ReDim Preserve O(T - 1)
Else
O = Split(vbNullString)
End If
LH(3) = IP: LA(0) = 0
LH(3) = LP: LA(0) = 0
End Sub
The attached project also contains fixed code for MarMan (Integer -> Long) and my first function.
This isn't true in VB6. Smaller code usually means you are using built-in functions, for example Split, to keep your code short. With longer code that I write (and it is often the longest solution) I do a lot of small tweaking, which results in a long code. But it is often the fastest as well.
The world of .NET may be a lot different though, as they probably have optimized a lot of the functions. In VB6 only a few built-in functions have been truly optimized for speed.
MarMan: here is an optimized version of your function:
Code:
Public Function GetNumbers(strText As String) As Long()
Dim LA() As Long, LH(0 To 5) As Long, LP As Long
Dim IA() As Integer, IH(0 To 5) As Long, IP As Long
Dim r As Long
Dim bHaveDigit As Boolean
Dim intChar As Integer
Dim intCounter As Long
Dim intValue As Long
Dim intNumbers() As Long
LP = ArrPtr(LA)
LH(0) = 1: LH(1) = 4: LH(4) = &H3FFFFFFF
PutMem4 LP, VarPtr(LH(0))
IP = ArrPtr(IA)
IH(0) = 1: IH(1) = 2: IH(3) = StrPtr(strText): IH(4) = Len(strText)
LH(3) = IP: LA(0) = VarPtr(IH(0))
ReDim intNumbers(255)
For r = 0 To UBound(IA)
intChar = IA(r) - 48
If intChar >= 0 And intChar <= 9 Then
intValue = intValue * 10 + intChar
bHaveDigit = True
ElseIf bHaveDigit Then
intNumbers(intCounter) = intValue
intCounter = intCounter + 1
If (intCounter Mod 256) = 0 Then
ReDim Preserve intNumbers(intCounter + 255)
End If
intValue = 0
bHaveDigit = False
End If
Next
If bHaveDigit Then intNumbers(intCounter) = intValue: intCounter = intCounter + 1
If intCounter > 0 Then
ReDim Preserve intNumbers(intCounter - 1)
GetNumbers = intNumbers
End If
LH(3) = IP: LA(0) = 0
LH(3) = LP: LA(0) = 0
End Function
Exponent is slow, so I turned the logic around to multiply by 10. This also cures the problem of having results in the reversed order. I also removed Select Case structure to a somewhat simpler If structure, which in this case executes faster (yes, I tested this). Finally I added the Integer array instead for Asc & Mid combination which makes this solution the fastest. If you want a pure VB6 solution instead then the change is quite easy to do:
Code:
Public Function GetNumbers(strText As String) As Long()
Dim r As Long
Dim bHaveDigit As Boolean
Dim intChar As Integer
Dim intCounter As Long
Dim intValue As Long
Dim intNumbers() As Long
ReDim intNumbers(255)
For r = 1 To Len(Text)
intChar = AscW(Mid$(strText, r, 1)) - 48
If intChar >= 0 And intChar <= 9 Then
intValue = intValue * 10 + intChar
bHaveDigit = True
ElseIf bHaveDigit Then
intNumbers(intCounter) = intValue
intCounter = intCounter + 1
If (intCounter Mod 256) = 0 Then
ReDim Preserve intNumbers(intCounter + 255)
End If
intValue = 0
bHaveDigit = False
End If
Next
If bHaveDigit Then intNumbers(intCounter) = intValue: intCounter = intCounter + 1
If intCounter > 0 Then
ReDim Preserve intNumbers(intCounter - 1)
GetNumbers = intNumbers
End If
End Function
But this slows it down quite a bit, because Mid$ creates a new string for each character. The earlier code just reads the strText data directly as an Integer array.
Edit: Please disregard this. I didn't read the above post.
My results weren't very good. (optimized for short strings with sequential digits does poorly with single digits interspersed with other characters in very large strings)
I tried to improve it to work well with different sized strings. Thanks for testing it for me!
Code:
Public Function GetNumbers(ByVal strText As String) As Integer()
Dim r As Integer
Dim bHaveDigit As Boolean
Dim intChar As Integer
Dim intCounter As Integer
Dim intExponent As Integer
Dim intNumbers() As Integer
Dim lngGuessCount As Long
intCounter = -1
intExponent = -1
lngGuessCount = 10 * (Len(Str(Len(strText))) - 2)
if lngGuessCount < 10 then
lngGuessCount = 10
elseif lngGuessCount > 1000000 then
lngGuessCount = 1000000
endif
redim intNumbers (lngGuessCount - 1)
For r = Len(strText) To 1 Step -1
intChar = Asc(Mid(strText, r, 1))
Select Case intChar
Case 48 To 57
If Not bHaveDigit Then
intCounter = intCounter + 1
If intCounter / lngGuessCount = Int(intCounter / lngGuessCount) Then
redim preserve intNumbers(intCounter + lngGuessCount)
End If
End If
bHaveDigit = True
intExponent = intExponent + 1
intNumbers(intCounter) = intNumbers(intCounter) + (intChar - 48) * 10 ^ intExponent
Case Else
If bHaveDigit Then
bHaveDigit = False
End If
intExponent = -1
End Select
Next
redim preserve intNumbers (intCounter)
'Array.Reverse(intNumbers)
GetNumbers = intNumbers
End Function
Last edited by MarMan; Jan 14th, 2011 at 10:03 AM.
This isn't true in VB6. Smaller code usually means you are using built-in functions, for example Split, to keep your code short. With longer code that I write (and it is often the longest solution) I do a lot of small tweaking, which results in a long code. But it is often the fastest as well.
The world of .NET may be a lot different though, as they probably have optimized a lot of the functions. In VB6 only a few built-in functions have been truly optimized for speed.
my first idea was to convert to a byte array and work with it that way, but you had already done that and i didn't want to rehash your work.
Well, my first byte array solution isn't the fastest possible, so feel free to improve on the byte array method
Atm I have two functions quite finished, GetNumbersToLong & GetNumbersToString – I guess there isn't much that could still be improved so I'll post them:
vb Code:
Option Explicit
Private Declare Function ArrPtr Lib "msvbvm60" Alias "VarPtr" (Arr() As Any) As Long
Private Declare Sub PutMem4 Lib "msvbvm60" (ByVal Ptr As Long, ByVal Value As Long)
Private Declare Function SafeArrayRedim Lib "oleaut32" (ByVal saPtr As Long, Optional saBound As Currency) As Long
Public Function GetNumbersToLong(Text As String) As Long()
' temporary safe array variables
Dim LA() As Long, LH(0 To 5) As Long, LP As Long
Dim IA() As Integer, IH(0 To 5) As Long, IP As Long
' other variables
Dim C As Long, D As Long, I As Long, N As Boolean, O() As Long, V As Long
' create a temporary Long array to replace the need for PutMem4
LP = ArrPtr(LA)
' create safe array header for Long array
LH(0&) = 1&: LH(1&) = 4&: LH(4&) = &H3FFFFFFF
' this is the only PutMem4 call we need, accessing a Long array is much faster than calling PutMem4!
PutMem4 LP, VarPtr(LH(0&))
' create a temporary Integer array to access the contents of Text
lngTextLen = LenB(sText) \ 2 + 1
Len is faster than LenB(sText) \ 2 – I don't know why.
ReDim strTempArr$(0)
This would be equal to thinking "I expect to have only one item, it is rare to have more than one"
Code:
Do
If intAsc(lngPos) < 48 Then Exit Do
If intAsc(lngPos) > 57 Then Exit Do
strNum = strNum + ChrW$(intAsc(lngPos))
lngPos = lngPos + 1
Loop Until lngPos = lngTextLen
While you have optimized the check for string length (lngPos = lngTextLen) it doesn't help much as you're doing a string concatenation. String concatenation is actually relatively fast with short strings, but it is always faster to create the final full size string if possible.
Code:
If LenB(strNum) Then
Q = Q + 1
ReDim Preserve strTempArr$(Q)
End If
You do ReDim Preserve each time you add a new item. This is too often and is the biggest performance bottleneck in your code.
Finally, your code doesn't account for the empty extra item at the end of your array.
I've been playing around with the idea of reserving one big space in memory for all the BSTRs that are created and then manually fill this space. Well, I found out that I can't do this. BSTRs are reserved from process heap, you can validate this with HeapSize(GetProcessHeap, 0, StrPtr(strText) - 4) – and this reveals the fact that each string is a single item in the heap, and one such item has always a specific size. So you can't have one continuous, minimal block of memory that has all the output strings there, because that could be only one big item in the heap. The system will not be able to follow this and you'll eventually get a crash, because it expects each BSTR to be a single item in the heap.
So the conclusion is that it'll be very hard to optimize the string solution, because creating strings via API calls will be slower than native VB6 code (such as Mid$).
Hi Merri!
Thanks for the tips men!
I have two little questions for you:
1.-Are you sure that Len() is faster that LenB()\2 ?
I think that i tested it days ago and if i compile the program LenB()\2 was faster... I think... :P
2.-I can´t create 2 intarrays using PutMem4 api...
Imagine that i have 2 string variables in the call, how is the way to convert it to Integer? my vb crash when I try do something...
Thanks friend, now i haven't much time, but in a few days i'll look your new functions.
1) It is quite minor so as it isn't in a loop it doesn't really matter – in this particular use Len() would win as it is shorter to write. Many speed optimizations only really matter within a loop.
2) Probably better to just post code, but I guess it would be a matter of another topic (to keep this one cleaner).
If intCounter / lngGuessCount = Int(intCounter / lngGuessCount) Then
Could also be dealt this way:
If (intCounter Mod lngGuessCount) = 0 Then
I think it should be faster, I don't bother to test.
Also if you used a value that is a multiple of 2 then you could use And, ie. in case lngGuessCount = 1024:
If (intCounter And (lngGuessCount - 1)) = 0 Then
It would trigger every 1024th, because 1023 would match all the lower bits. But only works with multiple of 2 (4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 etc.) – and this would certainly be the fastest.
Well, I did use Mod there anyway as I didn't stop to really think about And until my last post Mod works with any number.
Instead I used 256 as, without testing really, I thought it may work more optimally with memory. The array would always grow with 1024 bytes when more space is needed. However I'm not certain whether this in itself would be a good reason for better speed, but using And would definately give a nice boost over Mod.
If to be compared with memory allocation benchmark it has to be remembered that each computer is a bit different. Something that works great with one computer in comparison to another method may not give equal kind of difference on another computer. So for me allocation style is something that comes with another kind of priority: minimizing memory usage. Increasing 1 kB step-by-step is quite minimal waste, but if size of array would be doubled each time then it would be likely the array would at one point have a lot of waste. Worst case scenario: 100000 items in array, ReDim Preserve to 200000 items, whoops, the next item is the last one to be added, ReDim Preserve back to 100001 items...
So if one wanted to make the allocation more intelligent then the remaining amount of max. possibilities could be accounted for before doing such waste. Or one could allocate more items based on avarage findings until current position... this kind of optimizations could be done, but they'd require new test strings that are not as straightforward as the ones that for example I used in my test.