|
-
Feb 4th, 2011, 02:01 PM
#1
Thread Starter
Frenzied Member
Optimizing byte array to string encoder
Hi, below is a byte array to string encoder. It uses the yEnc algorithm that is used for Usenet binaries (Usenet only supports text).
Doing a performance analysis in VS2010 shows that this is the most CPU intensive part of the application.
The encoder function is used thousands of times, so I'd like to optimize it as much as possible.
Does anybody know if more improvements can be made?
I'm using 'ReDim Preserve' instead of 'Take()ToArray', because I'm targeting .NET Framework 2.0. Is there a more efficient way of getting only a part of a byte array?
vb.net Code:
'Values for yEnc Encoder
Public Structure EncoderValues
Public lChar As Byte
Public IsSingleChar As Boolean
Public CheckLine0 As Boolean
Public Line0Char As Byte
End Structure
Public m_EncoderTable(255) As EncoderValues
'This is called in the Form_Load event.
Public Sub SetupEncoder()
Dim lngInput As Integer, bTest As Byte
For lngInput = 0 To 255
With m_EncoderTable(lngInput)
bTest = CByte((lngInput + 42) Mod 256)
Select Case bTest
Case 1 To 8, 11 To 12, 14 To 45, 47 To 60, 62 To 255 'Normal processing
.lChar = bTest
.IsSingleChar = True
Case 46 'Test for column one, escape if in that col.
.lChar = bTest
.IsSingleChar = False
.CheckLine0 = True
.Line0Char = CByte((bTest + 64) Mod 256)
Case Else 'Critical char, needs to be escaped.
.lChar = CByte((bTest + 64) Mod 256)
.IsSingleChar = False
End Select
End With
Next lngInput
End Sub
Public Function EncodeData(ByVal bIn() As Byte) As String
Dim bOut() As Byte, lChar As Integer, lPos As Integer, lLine As Integer
ReDim bOut(UBound(bIn) * 2) 'Make the output buffer. Double size will almost always be enough. 2 extra
'chars per 72 and worst case would be all characters escaped. If so, use Base64.
For lChar = 0 To UBound(bIn)
With m_EncoderTable(bIn(lChar))
If .IsSingleChar Then 'Normal processing
bOut(lPos) = .lChar
lPos += 1
lLine += 1
ElseIf .CheckLine0 Then 'Test for column one, escape if in that col.
If lLine <> 0 Then
bOut(lPos) = .lChar
lPos += 1
lLine += 1
Else
bOut(lPos) = 61
bOut(lPos + 1) = .Line0Char
lPos += 2
lLine += 2
End If
Else 'Critical char, needs to be escaped.
bOut(lPos) = 61
bOut(lPos + 1) = .lChar
lPos += 2
lLine += 2
End If
End With
If lLine >= 128 Then 'Add a vbCrLf
bOut(lPos) = 13
bOut(lPos + 1) = 10
lPos += 2
lLine = 0
End If
Next lChar
ReDim Preserve bOut(lPos - 1) 'Truncate the unused portion of the buffer.
Return System.Text.Encoding.GetEncoding(1252).GetString(bOut)
End Function
-
Feb 4th, 2011, 03:10 PM
#2
Re: Optimizing byte array to string encoder
Is this what you are trying to do? From yenc.org:
* Fetch the character
* Add 42
* Check for NULL, TAB(ascii 9), LF(ascii 10), CR (ascii 13) and '='
* If one of the critical chars encounters then write '=' as escape
character to the output stream followed by the critical+64.
(NULL -> =@, TAB --> =I, LF --> =J, CR --> =M, '=' --> =}
I think "=" is supposed to be ==
-
Feb 4th, 2011, 03:14 PM
#3
Re: Optimizing byte array to string encoder
I see nothing in particular about that code, except that there is lots of simple math in it. Simple math is not highly effective in .NET. It is effective, but no more effective than multiplication or division. Therefore, you could probably benefit considerably by building the method into unmanaged C where you can get the advantage of all the integer math you are using, which means instructions that fall into the RISC subset of the CPU.
Other than that, I doubt you will gain anything by tightening the existing code. That means that the only opportunity for improvement would be by changing the algorithm. There may be room to work there, as I note that, despite the various conditions, you really only have two alternatives, though you add different values to different places in those alternatives. Hard to say, though.
My usual boring signature: Nothing
 
-
Feb 4th, 2011, 03:33 PM
#4
Thread Starter
Frenzied Member
Re: Optimizing byte array to string encoder
@dbasnett
The encoder works perfectly fine. I'm just wondering if it can be made faster/more efficient.
@Shaggy Hiker
Changing the algorithm is not an option, as yEnc is the most used algorithm with the least amount of overhead. If I use my own algorithm, then other applications can't decode the data, because they only support yEnc, MIME or UUencode.
I have no idea how to create an unmanaged C library, but I'll have a look around and see if I can find a solution.
-
Feb 4th, 2011, 03:43 PM
#5
Re: Optimizing byte array to string encoder
Assuming the algorithm I posted is correct...
Code:
Dim buf(255) As Byte 'test buffer
'generate all possible byte values for testing
For bv As Integer = 0 To 255
buf(bv) = CByte(bv)
Next
Dim eqAsByte As Byte = Asc("=")
Dim critCH() As Byte = New Byte() {0, _
Asc(ControlChars.Tab), _
Asc(ControlChars.Lf), _
Asc(ControlChars.Cr), _
eqAsByte}
Dim encodedBuf As New List(Of Byte)
'encode
For Each byt As Byte In buf 'fetch the character
Dim addByte As Integer
If Not critCH.Contains(byt) Then 'is it critical
'no
addByte = byt + 42
Else
'yes, is a critical? character
encodedBuf.Add(eqAsByte) 'add equal sign to output
addByte = byt + 64
End If
addByte = addByte And 255
encodedBuf.Add(CByte(addByte))
Next
Dim outBuf() As Byte = encodedBuf.ToArray
'Debug.WriteLine(System.Text.ASCIIEncoding.ASCII.GetChars(encodedBuf.ToArray))
edit - Wow!
Last edited by dbasnett; Feb 4th, 2011 at 04:23 PM.
-
Feb 4th, 2011, 03:56 PM
#6
Re: Optimizing byte array to string encoder
Summary: I agree with SH, there's not much you can do.
Normally I'd say use List(Of T) instead of an array as a reflex to seeing ReDim Preserve, but in this case you have to produce an array to feed to GetString() so you're going to pay the price somewhere. Even Take() is going to use roughly the same amount of time/resources.
I'm going to go out on a limb and say that ReDim Preserve isn't your bottleneck anyway. Without putting a profiler on it or adding some extra code for instrumentation, it's hard to figure out where you spend the most time. You don't know what the problem is if you aren't measuring, so I started to measure.
I added two stopwatches: one around the entire operation ("outer") and one that doesn't include either array resizing ("inner"). I had to make a HUGE data array to get some reasonable times. With 100,000,000 bytes of data, I got 1,310ms for the inner stopwatch and 1,705 for the outer; 400ms for a big array copy seems reasonable, so the biggest problem is the rest of the code. I thought maybe using a structure for EncoderValues might introduce value copy overhead, but making it a Class only shed about 100ms, that's negligible. I tried caching the EncoderValue to see if the With statement was behaving inefficiently; that had practically no effect. That left nothing but accesses to the lookup and simple math; there's not really much you can do to optimize that unless you have a clever algorithm. I thought a pre-generated lookup table for each byte might perform better, but it made performance worse. CPU effort's hard to compare; I watched task manager and hit 13% in both the lookup table approach (similar to dbasnett's) and your original code.
For each byte, you have to do a little bit of work, and that's going to take some amount of CPU and time. Sometimes you just can't get around it.
-
Feb 4th, 2011, 04:11 PM
#7
Re: Optimizing byte array to string encoder
Just a thought: how about splitting the input byte array to spin off multiple threads (each one encodes small part of the input array) and then join the thread to build the final output array?
Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
- Abraham Lincoln -
-
Feb 4th, 2011, 04:33 PM
#8
Re: Optimizing byte array to string encoder
My internet connection is horrible. Maybe it is because of the foot of snow piled up in front of the antenna on the roof... Finally got it posted.
-
Feb 4th, 2011, 05:22 PM
#9
Thread Starter
Frenzied Member
Re: Optimizing byte array to string encoder
Thanks everyone.
dbasnett, I tried your code and although it's much smaller, it's a bit slower. The average of your code is about 46 ms and the average of my code is about 38 ms. Normally those few ms are neglectable, but some people have a 100/120 Mbit upload connection and are uploading with 10000 kB/s.
10000 kB = 10240000 bytes
Each byte array that needs to be encoded = 640000 bytes
10240000 bytes / 640000 bytes = 16 encodes per second
1000 (1 sec) / 16 = 62.5 ms
So in order to keep up with those high upload speeds, each encode should not take longer than 62.5 ms. Both are still well in that range, but this is the reason why every ms is important.
-
Feb 4th, 2011, 06:02 PM
#10
Re: Optimizing byte array to string encoder
When I mentioned a different algorithm, I wasn't talking about HOW the conversion took place, but how the code was operating. Sitten covered some of those, and Stanav covered what is probably the best one, considering the volumes you just posted. Multi-threading may well be the only alternative to speed this up and stay within .NET. However, you will probably hurt your speed if you use too many threads for the number of cores you have, as each thread is going to have zero wait time. I hope to be testing this concept in a year or so (I'm way too busy at the moment, and will be until August), but I suspect that for this type of task, you will optimize core usage if your number of threads is somewhere between N and 2N-1 where N is the number of cores. Therefore, on a quad core processor, you might make 6 threads (actually, 7, by that math, but 6 is easier to work with), where each thread gets 100,000 bytes to encode (plus some change), while one thread is listening for bytes.
However, the unmanaged C route seems very promising in this case. You would be looking at building a C dll that has a method that takes a byte array and returns the string. When I started out in .NET, I took the time to examine the difference in speed between integer addition, integer division, and floating point addition and division. There was no appreciable difference between the four. However, since the dawn of the x86 processor, integer addition and subtraction were up to 70 times as fast as integer division. That gap has changed with each version (and floating point math was horrid, even with a math co-processor, up until the Pentium), but it remains pretty big. I lost track of the timing after the early Pentium chips, and specific instruction timing no longer exists on modern CPU architectures (the actual time taken for any instruction is based on the context it is in), but integer addition and subtraction was the fastest instruction on the CPU. There were a few others in that category, but they were in a class by themselves on the early Pentium, and are probably still there. Integer division should have been at least 30x slower than integer addition, yet my tests in .NET showed no significant difference. Perhaps there was a few percent, but not 3000%. This couldn't have been achieved by integer division being made vastly faster in code generated by .NET, so it could only be that integer addition, the most common thing you do in that code, is massively slowed by .NET.
While unmanaged code doesn't always have to be significantly faster, this is a case where it seems likely to be FAR faster, especially if you use a language like C that is only a baby step above ASM.
My usual boring signature: Nothing
 
-
Feb 4th, 2011, 09:06 PM
#11
Thread Starter
Frenzied Member
Re: Optimizing byte array to string encoder
Thank you, Shaggy Hiker. I've never programmed in C before, so that won't be easy, but below is an example.
I only need the actual encoding part and not the header/footer parts ('=ybegin part=', 'ypart begin=', '=yend size=') or CRC32 part.
This doesn't belong in this forum, but perhaps you know how to turn this code into a function that accepts a byte array and returns a string?
http://www.yenc.org/yencode-c.txt
c Code:
deslen=0; desp=desbuf; restlen=filelen;
while (restlen>0)
{
srclen=restlen;
if (srclen>4096) srclen=4096;
id=fread(srcbuf,srclen,1,fSrc);
if (id != 1)
{
eprint("Error in reading file for encoding (code=%d)\r\n",errno);
return(1);
}
restlen=restlen-srclen;
srcp=srcbuf;
while (srclen>0)
{
c=*srcp; // Get a source byte
CrcAdd(c); // Add it to the CRC
c=(unsigned char) (c+42); // and add the secret number
srcp++; srclen--;
switch (c) // Solve special in NNTP 'forbidden' characters
{
case 0:
case 9:
case 10:
case 13:
case '=': // Including the escape character itself
case '.': // Some usual servers have problems with a dot in the first column
*desp='='; desp++; deslen++;
c=(unsigned char)(c+64);
}
*desp=c; desp++; deslen++;
if ((deslen>=linelength)|((srclen==0)&(restlen==0))) // Block full - or end of file
{
*desp=13; desp++; deslen++;
*desp=10; deslen++;
id=fwrite(desbuf,deslen,1,fDes);
if (id!=1)
{
eprint("Error in writing encoded file (code=%d)\r\n",errno);
return(1);
}
deslen=0; desp=desbuf;
}
} // end of while (srclen > 0)
} // end of while (restlen > 0)
-
Feb 5th, 2011, 08:37 AM
#12
Re: Optimizing byte array to string encoder
Three approaches. The fastest is very close to the OP's.
Code:
Public Class Form1
Private Sub Button1_Click(ByVal sender As System.Object, _
ByVal e As System.EventArgs) Handles Button1.Click
Dim l As New List(Of Byte)
'generate byte values for testing
For i As Integer = 1 To 2510
For bv As Integer = 0 To 255
l.Add(CByte(bv))
Next
Next
Dim buf() As Byte = l.ToArray 'test buffer
loadEncVals() 'build output values
Const numLoops As Integer = 5
Dim stpw As New Stopwatch
Dim nb() As Byte = New Byte() {}
stpw.Start()
For x As Integer = 1 To numLoops
'nb = yencEncode1(buf)
'nb = yencEncode2(buf)
nb = yencEncode3(buf) 'fastest of the three - very similar to OP's
Next
Dim s As String = System.Text.Encoding.GetEncoding(28591).GetChars(nb)
stpw.Stop()
Debug.WriteLine(s.Length & " " & nb.Length)
Label1.Text = stpw.ElapsedMilliseconds.ToString("n0")
RichTextBox1.Text = s
End Sub
Dim encValues As New List(Of Byte())
Private Sub loadEncVals()
If encValues.Count = 0 Then
'create output values
Debug.WriteLine("create output values")
For ch As Integer = 0 To 255
If critCH.Contains(CByte(ch)) Then
encValues.Add(New Byte() {eqAsByte, CByte((ch + 64) And 255)})
Else
encValues.Add(New Byte() {CByte((ch + 42) And 255)})
End If
Next
End If
End Sub
Dim eqAsByte As Byte = Asc("=")
Dim critCH() As Byte = New Byte() {0, _
Asc(ControlChars.Tab), _
Asc(ControlChars.Lf), _
Asc(ControlChars.Cr), _
eqAsByte}
Private Function yencEncode1(ByVal buf() As Byte) As Byte()
Dim encodedBuf As New List(Of Byte)(2 * buf.Length)
'encode
For idx As Integer = 0 To buf.Length - 1 'fetch the character
Dim addByte As Integer
If Not critCH.Contains(buf(idx)) Then 'is it critical
'no
addByte = buf(idx) + 42
Else
'yes, is a critical? character
encodedBuf.Add(eqAsByte) 'add equal sign to output
addByte = buf(idx) + 64
End If
addByte = addByte And 255
encodedBuf.Add(CByte(addByte))
Next
Return encodedBuf.ToArray
End Function
Private Function yencEncode2(ByVal buf() As Byte) As Byte()
Dim encodedBuf As New List(Of Byte)(2 * buf.Length)
'encode
For idx As Integer = 0 To buf.Length - 1 'fetch the character
encodedBuf.AddRange(encValues(buf(idx)))
Next
Return encodedBuf.ToArray
End Function
Private Function yencEncode3(ByVal buf() As Byte) As Byte()
Dim encodedBuf(buf.Length * 2) As Byte
Dim offs As Integer = 0
'encode
For idx As Integer = 0 To buf.Length - 1 'fetch the character
For Each byt As Byte In encValues(buf(idx))
encodedBuf(offs) = byt
offs += 1
Next
Next
Array.Resize(encodedBuf, offs)
Return encodedBuf
End Function
End Class
-
Feb 5th, 2011, 08:53 AM
#13
Re: Optimizing byte array to string encoder
On my PC yencEncode3 runs in less than 20ms. for a buffer of 642,560 bytes.
-
Feb 5th, 2011, 10:46 AM
#14
Thread Starter
Frenzied Member
Re: Optimizing byte array to string encoder
Thank you very much. It is a bit faster than my code. The only problem is that the algorithm is not entirely correct, but that's not too difficult to fix.
Each line of encoded data should be 128 characters and it also doesn't take care of lines that start with a dot (char 46). Lines that start with a dot need to be double dots.
Thanks again.
-
Feb 5th, 2011, 10:51 AM
#15
Re: Optimizing byte array to string encoder
What reference are you using for the encoding rules?
-
Feb 5th, 2011, 11:21 AM
#16
Thread Starter
Frenzied Member
Re: Optimizing byte array to string encoder
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|