Merging two data streams (string or byte array)
This is probably going to be a bit advanced for some people...it definitely is for me...what I basically want is to be able to take two strings and merge their *values* together to make a third string.
For instance, if I had "aaaaaa" and the values of the second string were 1, 2, 3, 4, 5, 6 (note the string *ISN'T* "123456"; those would be its ASCII values), then the output string would be "bcdefg".
This is a piece of code that I have written that can do it the simple way:
VB Code:
str1 = "aaaaaa"
str2 = "123456"
For b = 1 To Len(str1)
tmpnum = Asc(Mid(str1, b, 1)) + Val(Mid(str2, b, 1))
If tmpnum > 255 Then tmpnum = tmpnum - 256
str3 = str3 & Chr(tmpnum)
Next b
For simplicity, I have made this use val rather than asc for the second string, and for the purposes of my requirements either way works fine for me...and if val was used, the string COULD be "123456" rather than an ascii equivalent. The code also *HAS* to deduct 256 from the value if it's over 255 to keep it a valid character
Basically, this piece of code will be run something like 90%-95% of the time in my program and I need it to be as fast as possible. In my case, the code works with 50k strings of text and will need to process many gigabytes of data, so any speed increase would help.
For the record, this sort of code used with a prng (search wiki for the word :-)) would make an excellent encryption system...although that's not what I am doing with it. :-)
Another aspect of this code I will need is the ability to do the opposite...deduct the value from string 1 and if it's below 0 add 256 to it...which again would be useful for decrypting the encrypted data...the simple way to do this is to add 256 minus the value in string 2 and then deduct 256 if the result is over 255...or have two lines of code where one adds while the other subtracts, with an if/then on each to check the range.
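The add and deduct steps described above can be sketched as a pair of small functions. This is only an illustration in the same If/Then style as the original loop; the names AddWrap and SubWrap are made up for this sketch and do not come from anyone's code in the thread.

```vb
' A minimal sketch; AddWrap and SubWrap are hypothetical helper names.
Option Explicit

' Add lngAmount to lngValue and wrap the result back into 0..255
Public Function AddWrap(ByVal lngValue As Long, ByVal lngAmount As Long) As Byte
    Dim lngTmp As Long
    lngTmp = lngValue + lngAmount
    If lngTmp > 255 Then lngTmp = lngTmp - 256
    AddWrap = CByte(lngTmp)
End Function

' Deduct lngAmount from lngValue and wrap the result back into 0..255
Public Function SubWrap(ByVal lngValue As Long, ByVal lngAmount As Long) As Byte
    Dim lngTmp As Long
    lngTmp = lngValue - lngAmount
    If lngTmp < 0 Then lngTmp = lngTmp + 256
    SubWrap = CByte(lngTmp)
End Function

Public Sub WrapDemo()
    ' Both inputs are assumed to be in the range 0..255
    Debug.Print AddWrap(250, 10)                ' prints 4
    Debug.Print SubWrap(AddWrap(250, 10), 10)   ' prints 250
End Sub
```

Deducting the same amount that was added always gives back the starting value, which is what makes the filter reversible.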
And if anyone is wondering why I would want this, it's just a filter for something I'm working on...you could say it's encryption as that *is* the effect it has but the resulting effect for me has nothing to do with encryption :-P
Re: Merging two data streams (string or byte array)
To see a "near fastest possible on VB6" method of processing strings, check out this post by me. It probably goes way over your head, but what you see there is really fast. It avoids making an extra copy of the string, it avoids string processing (which is slow), and it avoids any unnecessary extra calls, sticking to the minimum.
You'd need to have two arrays instead of the single one I've used in that code.
Note: hitting Stop while that code is running probably crashes your IDE. Don't do it. Manipulating memory follows the rule "if you change something that is done by something else, restore the original state once you're done".
Re: Merging two data streams (string or byte array)
Yeah, most of it is beyond me on first glance, I don't see how it would be able to do what was required by me :-)
BTW, I noticed you use gettickcount but don't have it declared anywhere in that code :-P
Re: Merging two data streams (string or byte array)
I'm just about to go to sleep so I'm just going to go with a short "introduction".
When you have a string variable, it has two things in memory: the data part, where all the characters are, and right before this data part the length of the data. The data part is referenced by a pointer in memory. Basically a string variable is a pointer, a four bytes long value.
Then we have an integer array that is empty. When an integer array is defined, it has a SAFEARRAYHEADER. The array variable in VB6 is a pointer to this header. This header in turn has another pointer value to where the data part resides.
What the code does is change the integer array's pointer to point to a custom SAFEARRAYHEADER, which in turn has its settings set so that it points to the data part of a wanted string. In practise this allows handling the string's contents directly using the integer array, so iarText(0) would be the same as AscW(Left$(strText, 1)), with the exception that accessing the integer array is far faster.
A simple training function that you can use instead of BeforeNumbers, you can paste this into the clsSA.cls:
VB Code:
Public Function Test(ByRef Text As String) As Integer
' make iarText data pointer to point to Text string's data
SA(3) = StrPtr(Text)
' this makes UBound work correctly
SA(4) = Len(Text)
' since this is a simple function, it simply returns the ASCII value of the first character
Test = iarText(0)
End Function
Usage would be something like
VB Code:
MsgBox "Character code for A is " & SA.Test("A")
You can use this line of code to replace everything that is between Set SA = New clsSA and Set SA = Nothing
Re: Merging two data streams (string or byte array)
I think I understand most of that...how it all actually works is probably beyond me though so maybe I'll stick with either the code I pasted originally or an optimised version of that using some sort of array so I don't have to keep doing asc/chr/instr on parts of the string :-)
I will have a play about with your code to see if it helps though, of course...maybe it'll speed up certain aspects :-)
Re: Merging two data streams (string or byte array)
"Certain aspects" might not be descriptive enough: the code might end up being 100 times faster or more than the current code you have.
An alternative "easy" method is to use a byte array. You can convert strings directly to byte arrays and vice versa. Why this is slower than the method I've shown above is simple: you copy memory. Whenever you copy memory around, you cause a slowdown. If you're able to work without moving memory a lot, you get fast results, and the longer the data to be processed, the bigger the gain.
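A minimal sketch of the byte array route, assuming plain ANSI text: StrConv with vbFromUnicode gives one byte per character, and StrConv with vbUnicode turns the bytes back into a normal string. Each conversion makes a copy, which is the cost mentioned above. The Sub name is made up for illustration.

```vb
Option Explicit

' Hypothetical demo routine: string -> byte array -> string
Private Sub ByteArrayRoundTrip()
    Dim bytData() As Byte
    Dim strText As String

    strText = "Hello"
    ' Copy the string into a byte array, one byte per character
    bytData = StrConv(strText, vbFromUnicode)
    ' Work on the bytes directly: "H" (72) becomes "I" (73)
    bytData(0) = bytData(0) + 1
    ' Copy the bytes back into a normal VB string
    strText = StrConv(bytData, vbUnicode)
    Debug.Print strText   ' prints Iello
End Sub
```

StrConv goes through the system code page, so for arbitrary file data it is probably safer to read the file straight into a byte array in binary mode than to pass it through a string.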
But I guess I'm now off to bed.
Re: Merging two data streams (string or byte array)
I am actually currently using a byte array, and the 50k block of data is converted once to a byte array and the "filter" I mentioned above is to be run on the byte array up to 100 times then converted back to a string.
I estimate that my program is going to take about 55 hours to do what it needs to do, and a few minor tweaks I have already done to the code has maybe got it down to 20 hours...I am going to need to get it to 1 hour or less to be worth it :-)
For now my plan is just to get it doing what it needs to do (which it currently doesn't do at all :-)) then I will work on improving the speed of it...possibly to the extent of giving the code to a friend to see if they can rewrite it in a faster language like C++ :-)
Re: Merging two data streams (string or byte array)
For pure raw speed write it in FORTH. That can be even faster than assembly. Of course you have to learn to think backwards first ...
Re: Merging two data streams (string or byte array)
Quote:
Originally Posted by Al42
For pure raw speed write it in FORTH. That can be even faster than assembly. Of course you have to learn to think backwards first...
I can only barely do it forwards :-)
Re: Merging two data streams (string or byte array)
Now that I've had my sleep, let's go with "what you're actually doing" type of thinking. If you're just reading normal ANSI text files, then byte arrays are the way to go. If you read Unicode text files (UTF-16), then integer arrays are your choice.
Integers give you a headache by being signed: values above 32767 show up as negative numbers, so you'd have to convert them to Long and then back to Integer. You also need to take care that you're not trying to store numbers that are too big, i.e. 65536 is too big for an integer and 256 is too big for a byte. With bytes:
VB Code:
ByteArray(X) = CByte((CLng(ByteArray(X)) + 10) And &HFF&)
A faster method for VB would be to actually make a table (array) with precalculated values. In this example, Table(0) would be 10 and Table(255) would be 9. The major advantage with this would be the lack of data type conversions when the actual process is going on. I actually bothered to code this:
VB Code:
' in a module
Option Explicit
Public Type MYBYTETABLE
ToThis(255) As Byte
End Type
Public ByteAdd(255) As MYBYTETABLE
Public Sub InitByteTable()
Dim lngA As Long, lngB As Long
For lngA = 0 To 255
With ByteAdd(lngA)
For lngB = 0 To 255
.ToThis(lngB) = CByte((lngB + lngA) And &HFF&)
Next lngB
End With
Next lngA
End Sub
VB Code:
' a sample program
Option Explicit
Private Sub Form_Load()
InitByteTable
MsgBox "Oh wow! 255 + 10 = " & ByteAdd(10).ToThis(255)
End Sub
Oh, one last thing: VB works much much faster when you compile the program. Also, if you set all advanced optimizations on, the array handling becomes far far faster. But then you can't rely on errors being raised: you must make sure you never use too big or too small indexes, because indexes that aren't in the array won't raise an error and might crash your program. The same applies to data.
Re: Merging two data streams (string or byte array)
Thanks for that, i'll try it out...the actual data will be any file at all so any byte value from 0 to 255...whether what I have planned works or not I will tell people here about it when I know either way :-)
Just a few things...
I don't understand what "Table(0) would be 10 and Table(255) would be 9." means exactly.
"MsgBox "Oh wow! 255 + 10 = " & ByteAdd(10).ToThis(255)"...i am assuming here without running it that it'd return 265...hmmm...(reads previous line)...(loud clanking noise)...(steam coming out of ears)...
...ah, i see now what you're saying...would this code actually be efficient no matter how many times it was used and how many different values were passed (for instance 0/0 to 255/255) or would running it 10000s of times actually be slower...I am sure I am totally missing something here looking at this but obviously can't place it :-)
I am thinking now that the best way to use this (if it works as I now believe it does) is to write a function that makes use of two byte arrays with one of them being the 50k block of data I am processing and the other being the data I am filtering it with...does that sound right to you?
Re: Merging two data streams (string or byte array)
Yes, that's how you should do it. First array for the file, second array for the filter data. Then you can use what I've shown above to "convert" the file data into another format. For speed, you should work directly back to the file array so you don't need to reserve memory for a third array, which would hold the end result.
So code would be like:
VB Code:
FileArray(X) = ByteAdd(FilterArray(Y)).ToThis(FileArray(X))
We'll see if you can figure out how to process these X and Y variables efficiently :)
Btw, character "0", which you used in your examples above, is actually 48 if you look into it memorywise. Character "A" is 65. If you add "0" to "A" in pure math, you get "q" (113) and not "A" (65).
Re: Merging two data streams (string or byte array)
So what i think you are saying at the end is that when splitting the filter data into the array I shouldn't use 0-9 but should instead use chr(0) to chr(9)...don't worry, I plan to either use a standard prng (pseudo-random number generator) or write my own based on established methods...that way the end project I am planning has as small a footprint as possible on the hard drive :-)
Because I am using a prng, I would store the numbers directly as numbers into the byte array so there should be no issues :-)
Edit: Forgot to ask...you have "ByteAdd(10).ToThis(255)"...I am pretty sure I could write my own ByteDed(x). to deduct the value, but how would I do the ".ToThis(lngB) = CByte((lngB + lngA) And &HFF&)" in the InitByteTable? (Yes, I know I don't edit the one in ByteAdd, I'd make a ByteDed one below :-))...I don't really understand how the &HFF& affects it...I know it's 255, but don't know exactly what it is doing to the algorithm :-)
Re: Merging two data streams (string or byte array)
Just made use of the code, and it *seems* to be working almost perfectly...I've written a piece of code to generate a string of "a"s and a random string of 0-9 numbers then split them into two byte arrays...the above code takes about 6s to complete but the actual filter takes 30ms...both uncompiled...and this is without any tweaking by me yet :-)
Only bug I can see is that it only seems to filter half the byte array and leave the rest alone...here's the code I used:
VB Code:
Dim str1() As Byte, str2() As Byte
For b = 1 To 50000: tx1 = tx1 & "a": tx2 = tx2 & Chr(Int(Rnd(1) * 10)): Next b
str1 = tx1
str2 = tx2
For b = 1 To 50000
str1(b) = ByteAdd(str2(b)).ToThis(str1(b))
Next b
Text1 = str1
It's early/late here in the UK (6am) so I am off to bed now...hopefully tomorrow I will be able to sort this if no-one else has pointed out my glaring error (if it's that visible :-))
Re: Merging two data streams (string or byte array)
Your error is very clear: strings are Unicode, two bytes per character. 50000 characters = 100000 bytes.
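A small sketch of the size difference, for illustration only (the Sub and variable names are made up). Assigning a string straight to a byte array keeps both bytes of every character, while StrConv with vbFromUnicode gives one byte per character.

```vb
Option Explicit

' Hypothetical demo routine comparing the two conversions
Private Sub ShowByteCounts()
    Dim bytWide() As Byte, bytAnsi() As Byte
    Dim strText As String

    strText = String$(50000, "a")
    bytWide = strText                           ' direct assignment: two bytes per character
    bytAnsi = StrConv(strText, vbFromUnicode)   ' converted: one byte per character

    Debug.Print Len(strText)      ' prints 50000 (characters)
    Debug.Print LenB(strText)     ' prints 100000 (bytes)
    Debug.Print UBound(bytWide)   ' prints 99999
    Debug.Print UBound(bytAnsi)   ' prints 49999
    Debug.Print bytWide(0)        ' prints 97, the code for "a"
    Debug.Print bytWide(1)        ' prints 0, the high byte of the same character
End Sub
```

A loop that stops at 50000 therefore only reaches the first half of the directly assigned array.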
Also, your method of adding up strings like that is slow. You could just use other features VB provides:
VB Code:
Dim str1() As Byte, b As Long
InitByteTable
str1 = StrConv(String$(5000, "0"), vbFromUnicode)
For b = 0 To UBound(str1)
str1(b) = ThisByte(str1(b)).Add(Int(Rnd * 10))
Next b
' note: if you use a very long string, you get a major slowdown
' when you put it into textbox
Text1.Text = StrConv(CStr(str1), vbUnicode)
Erase str1
But note that you can't restore the original state this way. You probably want to use a certain key to scramble things up, although a good hacker can figure that out rather easily (especially if the key is short). Basically you'd use a shorter key to loop through a bigger file.
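Looping a shorter key over a bigger array could look roughly like the sketch below. It is an illustration under assumptions, not the code used in this thread: ApplyKey, its parameters and KeyDemo are hypothetical names, and the arithmetic is done inline rather than through a lookup table so the sketch stands on its own.

```vb
Option Explicit

' lngDirection = 1 adds the key to the data, lngDirection = -1 removes it again
Public Sub ApplyKey(ByRef bytFile() As Byte, ByRef bytKey() As Byte, ByVal lngDirection As Long)
    Dim X As Long, Y As Long, lngKeyLast As Long

    Y = LBound(bytKey)
    lngKeyLast = UBound(bytKey)
    For X = LBound(bytFile) To UBound(bytFile)
        ' Adding 256 first keeps the sum from going negative before the And
        bytFile(X) = CByte((CLng(bytFile(X)) + lngDirection * bytKey(Y) + 256) And &HFF&)
        ' Step through the key and start over when it runs out
        Y = Y + 1
        If Y > lngKeyLast Then Y = LBound(bytKey)
    Next X
End Sub

Public Sub KeyDemo()
    Dim bytFile() As Byte, bytKey() As Byte

    bytFile = StrConv("aaaaaa", vbFromUnicode)
    ReDim bytKey(2)
    bytKey(0) = 1: bytKey(1) = 2: bytKey(2) = 3

    ApplyKey bytFile, bytKey, 1
    Debug.Print StrConv(bytFile, vbUnicode)   ' prints bcdbcd
    ApplyKey bytFile, bytKey, -1
    Debug.Print StrConv(bytFile, vbUnicode)   ' prints aaaaaa
End Sub
```

Because the same key values are used in both directions, the second call restores the original data, which a fresh run of Rnd values would not.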
Re: Merging two data streams (string or byte array)
Quote:
Originally Posted by Merri
Your error is very clear: strings are Unicode, two bytes per character. 50000 characters = 100000 bytes.
I've had problems like that before, and just used strconv to fix it...I'm still a bit unsure of what to do and when to do it with these things though :-)
Quote:
Originally Posted by Merri
Also, your method of adding up strings like that is slow. You could just use other features VB provides
Yeah, I'm aware that there's other ways to do it...just for the purposes of testing I was writing quick and dirty code :-)
Quote:
Originally Posted by Merri
But note that you can't restore the original state this way. You probably want to use a certain key to scramble things up, although a good hacker can figure that out rather easily (especially if the key is short). Basically you'd use a shorter key to loop through a bigger file.
I say again, it's not an encryption code (although it could be used as such) :-P
I actually have a much more secure custom encryption code I wrote a while back...basically generates filter strings from a password and uses a special replacement function to filter it...similar to this code but it's simpler :-)
Although restoring the original state will be important...read a couple of posts up, as I ask about modifying ByteAdd(x) so I can also have a ByteDed(x) which also loops. Be aware that at NO point will I need both ByteAdd and ByteDed at the same time so they could both use ToThis() so maybe initbytetable(1,255) for initialising an add array (the 255 is explained below) and initbytetable(2,255) for initialising a deduct array for rebuilding the data using the filters
There *IS* another issue I forgot about...is there any way to set an upper limit for the values? For instance, if I type initbytetable(255) it would allow a limit of 0-255 so once the number passes 255 it would loop back, but if I typed initbytetable(100) it would allow only 0-100 then loop...my guess would be modifying the "and &HFF&" but I wouldn't know how esp. as the number would be variable :-)
Re: Merging two data streams (string or byte array)
VB6 does some ANSI <-> Unicode conversions. The basic rules:
- When you load a file in text mode, VB6 converts ANSI to Unicode. When you save in text mode, Unicode is converted to ANSI.
- When you call an API function, strings passed are converted to ANSI. When you get a string result from the API, the string is converted to Unicode. (VB6 does this whether you declare the ANSI or the Unicode version of the API function.)
- VB6's native controls don't support Unicode. Thus Unicode and ANSI conversions happen all the time under the hood.
- RichTextBox supports Unicode, but the VB6 port of it does not. The same applies to some other controls.
I made a version in which I swapped some code around:
VB Code:
' in a module
Option Explicit
Public Type BYTETABLE
Add(255) As Byte
Remove(255) As Byte
End Type
Public ThisByte(255) As BYTETABLE
Public Sub InitByteTable()
Dim lngA As Long, lngB As Long
For lngA = 0 To 255
For lngB = 0 To 255
ThisByte(lngB).Add(lngA) = CByte((lngB + lngA) And &HFF&)
ThisByte(lngB).Remove(lngA) = CByte((256 + (lngB - lngA)) And &HFF&)
Next lngB
Next lngA
End Sub
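You can use Mod to limit a value. 100 Mod 100 = 0. 101 Mod 100 = 1. And does a simple bitwise comparison: And &HFF& keeps only the lowest eight bits of the number, so for values that aren't negative it gives the same result as Mod 256 (265 And &HFF& = 9).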
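A variable limit built with Mod might look like the sketch below. It is only an illustration: LIMITTABLE, LimitByte, InitLimitTable and lngCount are made-up names, and lngCount is assumed to be the number of allowed values (256 for 0..255, 100 for 0..99).

```vb
' in a module
Option Explicit

Public Type LIMITTABLE
    Add(255) As Byte
    Remove(255) As Byte
End Type

Public LimitByte(255) As LIMITTABLE

' lngCount must be between 1 and 256; only entries 0..lngCount - 1 are filled in
Public Sub InitLimitTable(ByVal lngCount As Long)
    Dim lngA As Long, lngB As Long
    For lngA = 0 To lngCount - 1
        For lngB = 0 To lngCount - 1
            LimitByte(lngB).Add(lngA) = CByte((lngB + lngA) Mod lngCount)
            ' Adding lngCount first keeps the value from going negative before Mod
            LimitByte(lngB).Remove(lngA) = CByte((lngCount + lngB - lngA) Mod lngCount)
        Next lngB
    Next lngA
End Sub

Public Sub LimitDemo()
    InitLimitTable 100
    Debug.Print LimitByte(99).Add(1)     ' prints 0
    Debug.Print LimitByte(0).Remove(1)   ' prints 99
End Sub
```

Mod in VB6 returns a negative result for a negative left operand, which is why the Remove line adds lngCount before taking the remainder. Calling InitLimitTable again with a different count simply overwrites the entries it covers.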
Re: Merging two data streams (string or byte array)
That's a useful piece of info, thanks...I used to basically use a 3 or 4 part line where first thing it did was divide the number by 256, int it then multiply and deduct that from the original number...mod should be a lot simpler :-)
Regarding the mod thing, what I am actually talking about is modifying how many numbers InitByteTable uses...default there is 256 (0 to 255) but I may actually want to use it with only (for instance) 100 (0 to 99) or 200 (0 to 199) or any number basically between 1 and 255...I assume InitByteTable doesn't take too long to build the tables so I should be able to re-initialise it at any time. Now I understand that I can change the for/next values in your code to achieve this PARTIALLY but how do I edit the bit with the &HFF& at the end...can I use a variable there or would I have to do it a different way...or am I right in thinking that I just take out the "and" and replace with mod X (x being the upper limit)...and I assume X+1 replacing the 256 in the second line.
I will actually make use of this code and test out these theories I am having about how to do this tomorrow...I am currently on a TV as my monitor is FUBAR, but should have a new monitor (or the old one fixed) tomorrow :-)
Thanks for your help...will mark it resolved as soon as I've got the code all working in my program :-)