-
Apr 24th, 2007, 12:51 AM
#1
Let's build... IntegerString
This is an idea that popped up in the faster InStrRev thread: a replacement for String that is zero-based and has more logical functions. I ended up building a Class Module, which of course you can't use in quite the same way you are used to using Strings. It also wastes some more memory, but on the other hand it allows speed optimizations that would otherwise make things more complex to use, as you'd need to initialize and clean up separately. With a Class Module you have the Initialize and Terminate events to handle that.
So this is what I have ended up with so far: code removed, see second post
I haven't thought about the procedure names carefully yet, so those are likely to change. I'm also thinking about splitting them up further so that each one takes only a minimal number of parameters, to keep the final syntax easy to read.
So anyway, is this better than regular strings? Let's benchmark it against regular string usage:
Code:
' in Form1
Option Explicit
Private Sub Command1_Click()
Dim sngStart As Single, lngA As Long, strResult As String
sngStart = Timer
For lngA = 1 To 100000
strResult = strResult & "A!"
Next lngA
Command1.Caption = Format$(Timer - sngStart, "0.00000") & " - " & Len(strResult)
End Sub
Private Sub Command2_Click()
Dim sngStart As Single, lngA As Long, strResult As New IntegerString
sngStart = Timer
For lngA = 1 To 100000
strResult.ValueAppend "A!"
Next lngA
Command2.Caption = Format$(Timer - sngStart, "0.00000") & " - " & Len(strResult)
End Sub
Both append the string "A!" to the end of the string 100000 times, resulting in a string that is 200000 characters long. When I compiled this code using all advanced optimizations, I didn't actually expect much: after all, each append requires a ReDim Preserve on the integer array. I was surprised:
The native string way took 20 seconds! I knew it was slow, but I didn't remember it was that slow. Now how about IntegerString? It was done immediately, in about 0.05 seconds. The result from both is the same, a string of 200000 characters.
So does this look like something that is worth developing further?
Edit!
Prepending to the IntegerString, on the other hand, is far slower and needs some rethinking on how it should be done. Currently it runs for over 50 seconds, and that isn't acceptable.
Edit #2!
Now improved to be almost as fast as a native string, though it still needs some thinking on the implementation side.
Last edited by Merri; Apr 25th, 2007 at 08:22 AM.
-
Apr 24th, 2007, 04:15 AM
#2
Re: Let's build... IntegerString
Added a few features, most importantly Mid. You can do things with it that you can't do with VB's native Mid: use it on the left side of an assignment and try a string whose length doesn't match the length parameter...
Code:
Dim Test As New IntegerString
Test = "ABC"
Test.Mid(0, 1) = "CBA"
MsgBox Test
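For comparison, here is what the native Mid statement does with the same kind of mismatch. This is a minimal sketch using only built-in VB6 (the procedure name is made up for the example), so it shows the limitation being worked around rather than the class's own result:

```vb
' Native VB6 only; IntegerString itself is not used here.
Private Sub DemoNativeMid()
    Dim strTest As String
    strTest = "ABC"
    ' Replace at most 1 character starting at position 1 (native strings are 1-based).
    Mid$(strTest, 1, 1) = "CBA"
    ' The length argument caps the copy, so only the leading "C" is written
    ' and the string keeps its original length of 3.
    Debug.Print strTest   ' prints CBC
End Sub
```

The native statement can never grow or shrink the target string, which is the restriction a class-based Mid is free to drop.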
Edit!
See post #8 for an attachment.
Last edited by Merri; Apr 25th, 2007 at 03:07 PM.
-
Apr 24th, 2007, 10:49 PM
#3
Re: Let's build... IntegerString
That is way cool. In reading about string optimization under VB6, I saw mention of a class to handle very large strings using byte arrays. (That could be switched to integer arrays like yours to handle unicode.) It was mentioned that the class "allocates more space from time to time. Because it allocates far less often than VB, CString performs much better than a regular String." That might be an approach to consider.
The reference I saw is here; scroll down to Building large strings with CString.
-
Apr 25th, 2007, 03:00 AM
#4
Re: Let's build... IntegerString
I found out that the class leaks memory somewhere: using two instances of the class appears to be crash prone. At the moment I suspect CharLen is somehow related to this, because when I create a new IntegerString within the class, the value of CharLen is suddenly reset to zero. This shouldn't happen. So for some reason the memory space of the two objects is shared, and this can only lead to problems.
But what I really wonder is why this happens in the first place: why does a new class object use the same variables as the one made earlier?
Edit!
Now things are starting to clear up: for some reason Char is shared between both objects!
Last edited by Merri; Apr 25th, 2007 at 03:07 AM.
-
Apr 25th, 2007, 08:56 AM
#5
Re: Let's build... IntegerString
Okay, I was able to figure it out eventually. I didn't always update the pointer to CharLenHeader(3) when I used ReDim. I solved this by making an Allocate procedure that handles everything related to allocation. This allows for some optimizations, such as keeping UBound(Char) in a variable, as well as doing what CString does: allocating more space less often. I decided to use a buffer of 2 kB (counted in bytes).
I also did some tricks and added Left, Right and InStr. Next I'm considering adding the Boyer-Moore search algorithm to InStr, as well as adding the remaining basic string functions that everyone is familiar with from VB. I've also considered adding string functions that people might know from PHP.
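The idea of a central Allocate procedure that grows the array in chunks can be sketched roughly like this. It is not the actual IntegerString source: the names CHUNK_CHARS and CharMax, the plain Sub form of ValueAppend and the character-by-character copy are all assumptions made for illustration, and the CharLenHeader pointer handling is left out entirely.

```vb
Option Explicit

' Hypothetical class-module sketch, not the real IntegerString code.
Private Const CHUNK_CHARS As Long = 1024   ' 2 kB when each character is a 2-byte Integer

Private Char() As Integer   ' character storage, zero based
Private CharLen As Long     ' number of characters actually in use
Private CharMax As Long     ' cached UBound(Char), or -1 while nothing is allocated

Private Sub Class_Initialize()
    CharMax = -1
End Sub

' Make sure at least lngNewLen characters fit. The array grows in whole
' chunks, so ReDim Preserve runs far less often than once per append.
Private Sub Allocate(ByVal lngNewLen As Long)
    If lngNewLen - 1 > CharMax Then
        CharMax = ((lngNewLen + CHUNK_CHARS - 1) \ CHUNK_CHARS) * CHUNK_CHARS - 1
        ReDim Preserve Char(0 To CharMax)
    End If
End Sub

' Simplified append: copies one character at a time for clarity.
Public Sub ValueAppend(ByRef strText As String)
    Dim lngA As Long
    Allocate CharLen + Len(strText)
    For lngA = 1 To Len(strText)
        Char(CharLen + lngA - 1) = AscW(Mid$(strText, lngA, 1))
    Next lngA
    CharLen = CharLen + Len(strText)
End Sub

Public Property Get Length() As Long
    Length = CharLen
End Property
```

With this layout, the 100000 appends of "A!" from the first post would trigger about 196 reallocations instead of 100000, since each ReDim Preserve adds room for 1024 more characters.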
Then a small example of what you can do with this class atm:
Code:
Private Sub Form_Load()
Dim Test As New IntegerString
Test = "Moi"
MsgBox Test.Left(2).Right(1).ValuePrepend("[").ValueAppend("]").ValueAppend(vbNewLine).ValueAppend("Can you follow all this?")
End Sub
Yeah, freaky stuff, but it works. And I think it does it faster than native VB code, though I didn't benchmark this.
-
Apr 25th, 2007, 10:00 AM
#6
Re: Let's build... IntegerString
 Originally Posted by Merri
Yeah, freaky stuff, but it works.
Ha, that line scares me.
About the 2k buffer; seems a solid value to use. I don't know much about low-level stuff, but I'm guessing there is an actual optimal value. Something about a page? I bet the folks in the C++ forum would know.
Unless you know and that's why you chose 2k, but your wording seems to imply it was somewhat arbitrary. (Other than making it a whole number of k.)
One feature that I'd recommend is ProperCase to go along with Upper and Lower.
-
Apr 25th, 2007, 10:08 AM
#7
Re: Let's build... IntegerString
Although on second thought I'm thinking of how I typically use strings. For example, with my benchmark program I think nothing of defining a UDT with two strings and then declaring an array with 25,000 elements. At 50,000 strings, if the minimum size of a string is 2k, we're looking at around 100 megs. That could lead to some serious HD thrashing. (Of course, the average size in memory of those 50,000 strings is 559, so it's using around 14 megs anyway.)
Maybe a sliding scale with variable cutoffs? Just spitballing here, but say 8, 16, 32, 64, 128, 256, etc... up to the first 2048 characters, then 2k blocks after that?
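A sliding scale like that could be computed with a small helper along these lines. This is only a sketch of the suggestion: the function name is made up, and it assumes plain doubling at every step below 2048 characters and whole 2048-character blocks above that.

```vb
' Hypothetical helper: how many characters to reserve for a string
' that currently needs lngNeeded characters.
Public Function SlidingCapacity(ByVal lngNeeded As Long) As Long
    Const BLOCK As Long = 2048
    Dim lngCap As Long
    If lngNeeded <= BLOCK Then
        ' Small strings: double from 8 until the request fits.
        lngCap = 8
        Do While lngCap < lngNeeded
            lngCap = lngCap * 2
        Loop
    Else
        ' Large strings: round up to a whole number of 2048-character blocks.
        lngCap = ((lngNeeded + BLOCK - 1) \ BLOCK) * BLOCK
    End If
    SlidingCapacity = lngCap
End Function
```

A 3-character string would then reserve 8 characters and a 5000-character string would reserve 6144, so short strings stay cheap while long ones still reallocate rarely.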
-
Apr 25th, 2007, 02:17 PM
#8
Re: Let's build... IntegerString
I've been doing some improvements again, including the LCase, ProperCase and UCase that you requested. Unfortunately I had to separate some stuff into a new module, because otherwise there would have been tons of extra memory wasted for every instance of the class loaded into memory. So the separate module includes LTable, PTable and UTable, which hold the lower and upper case conversions and the proper case separators (yes, I did find out which characters StrConv uses as case separators).
I improved the buffer and allocation handling; it is now fully customizable. If there is a need for it, it is possible to reserve any amount of memory explicitly before putting stuff in. I also did some work to mirror how cString works, to keep things familiar to those who have used it. I didn't follow its Left, Right and Mid behavior exactly, though: I decided that allowing Mid for placements is enough, and Left and Right can only return a new copy of the portion.
IntegerString now has more features than cString and it also seems to be faster and fully Unicode aware, so we should have a win-win situation here now
The attachment includes a simple benchmark for basic appending to a string and it also has a sample of ProperCase. Or CaseP as I call it: I couldn't use LCase and UCase because I need them elsewhere in the class, so I decided to go with CaseL, CaseP and CaseU: nicely together
Edit!
Now includes a cool Trim function as well as a license (which I want to have as a legal protection for retaining my name in every copy of this class; there shouldn't be any other restrictions, you're free to use this in commercial and open source projects alike).
Edit #2!
I've now vastly improved the performance of Prepend thanks to adding buffering to the left side of the data as well. I also did some work to get Trim faster, although I'm still not fully pleased with it. On the other hand, Trim can work with any character or multiple characters, unlike VB's Trim, so that should make someone happy.
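To illustrate what trimming with any character or multiple characters can mean, here is a rough sketch on native Strings. It is not the class's implementation (which works on its Integer array): the function name and signature are hypothetical, and it assumes the second argument is treated as a set of individual characters to strip from both ends.

```vb
' Hypothetical helper on native Strings, for illustration only.
Public Function TrimChars(ByRef strText As String, ByRef strChars As String) As String
    Dim lngFirst As Long, lngLast As Long
    lngFirst = 1
    lngLast = Len(strText)
    ' Skip leading characters that appear in strChars.
    Do While lngFirst <= lngLast
        If InStr(1, strChars, Mid$(strText, lngFirst, 1), vbBinaryCompare) = 0 Then Exit Do
        lngFirst = lngFirst + 1
    Loop
    ' Skip trailing characters that appear in strChars.
    Do While lngLast >= lngFirst
        If InStr(1, strChars, Mid$(strText, lngLast, 1), vbBinaryCompare) = 0 Then Exit Do
        lngLast = lngLast - 1
    Loop
    ' Yields "" when everything was trimmed away.
    TrimChars = Mid$(strText, lngFirst, lngLast - lngFirst + 1)
End Function
```

For example, TrimChars("--[abc]--", "-[]") returns "abc", something the native Trim, which only strips spaces, cannot do in one call.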
Edit #3!
Added some more benchmark tests and fixed or improved a few things.
Last edited by Merri; Apr 27th, 2007 at 09:49 AM.
-
Apr 26th, 2007, 02:23 AM
#9
Re: Let's build... IntegerString
I looked at and ran your code but what puzzles me is why you wrote some of the functions that were slower than native VB functions, what gives?
I also noticed that when you did you actually wrote vb code. Of course that will slow things down quite a bit. The more vb code you write the more code you generate and nothing is faster than assembly code. So, you are generating more assembly code to run. Anytime you write a loop in vb you are adding to the time tremendously.
-
Apr 26th, 2007, 02:24 AM
#10
Re: Let's build... IntegerString
BTW: you are missing an icon file...
-
Apr 26th, 2007, 05:27 AM
#11
Re: Let's build... IntegerString
 Originally Posted by randem
I looked at and ran your code but what puzzles me is why you wrote some of the functions that were slower than native VB functions, what gives?
First I want to remind you that what you see is a work in progress. I haven't spent a whole lot of hours on this project.
The main aim is on features and ease of use, which eventually means some loss on the speed side (and some extra memory consumption). For example, many functions pass a reference to the class itself as a return value or even a fresh new copy of themselves. Slowdowns caused by this kind of thing are very noticeable when benchmarking very small strings; on the other hand, many VB functions become far slower in comparison when they hit a certain size.
Also, in some situations I can't use the optimized VB functions (like StrReverse). In other situations I've used them for ease of use and haven't yet bothered with optimization (such as the use of Space$). Last but not least, many input values are strings. There'd be a big speed boost in many cases if the input values were StrPtrs and the length of the string was given as well, but that would make the class harder to use. I don't want that. I also don't want to add an extra file dependency (such as TLB files, which would allow for ASM optimized code snippets).
The benchmark doesn't account for extra features not possible in VB, either: you can do stuff with Mid and Trim you can't do with VB's native functions. Also, my ProperCase implementation works with Unicode strings, unlike the native StrConv version, and my code is faster as well.
This is why Reverse is slower: the native VB6 implementation is fast. As for Trim, there might be something that can be done to improve it. Left side Mid$ in VB is optimized very well.
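As a rough idea of what a StrPtr-plus-length input could look like, here is a minimal sketch. The helper name and its parameters are hypothetical; the only outside piece is the well-known RtlMoveMemory export of kernel32, declared under the customary CopyMemory alias.

```vb
Option Explicit

' RtlMoveMemory is the kernel32 export usually declared as CopyMemory in VB6.
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (ByRef Destination As Any, ByRef Source As Any, ByVal Length As Long)

' Hypothetical helper: copy lngChars UTF-16 code units from a string pointer
' into a zero-based Integer array without creating a temporary String.
Public Sub CopyFromPtr(ByRef Char() As Integer, ByVal lngPtr As Long, ByVal lngChars As Long)
    If lngChars <= 0 Then
        Erase Char
        Exit Sub
    End If
    ReDim Char(0 To lngChars - 1)
    ' Two bytes per character; ByVal passes the address held in lngPtr
    ' instead of the address of the lngPtr variable itself.
    CopyMemory Char(0), ByVal lngPtr, lngChars * 2
End Sub
```

A caller would write something like CopyFromPtr aintChars, StrPtr(strText), Len(strText), which shows the trade-off: the copy is a single memory move, but every call site now has to pass a pointer and a length instead of just a string.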
 Originally Posted by randem
The more vb code you write the more code you generate and nothing is faster than assembly code. So, you are generating more assembly code to run.
The length of the written code and the resulting amount of compiled code don't go hand in hand. Neither does speed follow from the amount of code. If it did, almost all of VB's native code would be very, very fast!
Last edited by Merri; Apr 26th, 2007 at 05:50 AM.
-
Apr 26th, 2007, 08:02 PM
#12
Re: Let's build... IntegerString
I've made further changes, but I'm not posting the current version. The simple reason is that I have a memory leak issue somewhere. I haven't located the exact reason for it yet, but it is related to the new Replicate function that I built; or at least that's the function that triggers the problem (the function code itself seems ok).
I really have to rewrite the Allocate procedure, as it was put together without any central design behind it. And because of that there is probably something broken there.
I've ended up renaming Mid to Middle so that I can use Mid$() - I included a new string variable for fast string mode access that always points to the correct position. This allows me to use StrReverse for reversing strings (although the class module's overhead shows up pretty badly in benchmarks, you'd get the same or a bigger disadvantage if you used StrReverse outside the class to do it) and Mid$ for quick non-API based data copy.
But now I'm off to sleep.
-
Apr 27th, 2007, 09:59 AM
#13
Re: Let's build... IntegerString
Here's the latest. I've been experimenting with some optimization tricks and also fixed bugs.
Edit!
Added to CodeBank (class and module only) and submitted to Planet Source Code (includes a sample benchmarker project)
This thread remains for the main thing: "do it better".
Last edited by Merri; Apr 27th, 2007 at 06:57 PM.
-
May 1st, 2007, 08:45 AM
#14
Retired VBF Adm1nistrator
Re: Let's build... IntegerString
System.Text.StringBuilder FTW
Microsoft MVP : Visual Developer - Visual Basic [2004-2005]
-
May 1st, 2007, 09:45 AM
#15
Re: Let's build... IntegerString
VB6 FTW