Results 1 to 16 of 16

Thread: Optimizing byte array to string encoder

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Optimizing byte array to string encoder

    Hi, below is a byte array to string encoder. It uses the yEnc algorithm that is used for Usenet binaries (Usenet only supports text).

    Doing a performance analysis in VS2010 shows that this is the most CPU intensive part of the application.

    The encoder function is used thousands of times, so I'd like to optimize it as much as possible.
    Does anybody know if more improvements can be made?

    I'm using 'ReDim Preserve' instead of 'Take()ToArray', because I'm targeting .NET Framework 2.0. Is there a more efficient way of getting only a part of a byte array?


    vb.net Code:
    1. 'Values for yEnc Encoder
    2. Public Structure EncoderValues
    3.     Public lChar As Byte
    4.     Public IsSingleChar As Boolean
    5.     Public CheckLine0 As Boolean
    6.     Public Line0Char As Byte
    7. End Structure
    8. Public m_EncoderTable(255) As EncoderValues
    9.  
    10. 'This is called in the Form_Load event.
    11. Public Sub SetupEncoder()
    12.     Dim lngInput As Integer, bTest As Byte
    13.     For lngInput = 0 To 255
    14.         With m_EncoderTable(lngInput)
    15.             bTest = CByte((lngInput + 42) Mod 256)
    16.             Select Case bTest
    17.                 Case 1 To 8, 11 To 12, 14 To 45, 47 To 60, 62 To 255    'Normal processing
    18.                     .lChar = bTest
    19.                     .IsSingleChar = True
    20.                 Case 46                                           'Test for column one, escape if in that col.
    21.                     .lChar = bTest
    22.                     .IsSingleChar = False
    23.                     .CheckLine0 = True
    24.                     .Line0Char = CByte((bTest + 64) Mod 256)
    25.                 Case Else                                         'Critical char, needs to be escaped.
    26.                     .lChar = CByte((bTest + 64) Mod 256)
    27.                     .IsSingleChar = False
    28.             End Select
    29.         End With
    30.     Next lngInput
    31. End Sub
    32.  
    33.  
    34. Public Function EncodeData(ByVal bIn() As Byte) As String
    35.  
    36.     Dim bOut() As Byte, lChar As Integer, lPos As Integer, lLine As Integer
    37.  
    38.     ReDim bOut(UBound(bIn) * 2)             'Make the output buffer.  Double size will almost always be enough.  2 extra
    39.     'chars per 72 and worst case would be all characters escaped.  If so, use Base64.
    40.     For lChar = 0 To UBound(bIn)
    41.         With m_EncoderTable(bIn(lChar))
    42.             If .IsSingleChar Then             'Normal processing
    43.                 bOut(lPos) = .lChar
    44.                 lPos += 1
    45.                 lLine += 1
    46.             ElseIf .CheckLine0 Then           'Test for column one, escape if in that col.
    47.                 If lLine <> 0 Then
    48.                     bOut(lPos) = .lChar
    49.                     lPos += 1
    50.                     lLine += 1
    51.                 Else
    52.                     bOut(lPos) = 61
    53.                     bOut(lPos + 1) = .Line0Char
    54.                     lPos += 2
    55.                     lLine += 2
    56.                 End If
    57.             Else                              'Critical char, needs to be escaped.
    58.                 bOut(lPos) = 61
    59.                 bOut(lPos + 1) = .lChar
    60.                 lPos += 2
    61.                 lLine += 2
    62.             End If
    63.         End With
    64.         If lLine >= 128 Then           'Add a vbCrLf
    65.             bOut(lPos) = 13
    66.             bOut(lPos + 1) = 10
    67.             lPos += 2
    68.             lLine = 0
    69.         End If
    70.     Next lChar
    71.  
    72.     ReDim Preserve bOut(lPos - 1)           'Truncate the unused portion of the buffer.
    73.     Return System.Text.Encoding.GetEncoding(1252).GetString(bOut)
    74. End Function

  2. #2
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,897

    Re: Optimizing byte array to string encoder

    Is this what you are trying to do? From yenc.org:

    * Fetch the character
    * Add 42
    * Check for NULL, TAB(ascii 9), LF(ascii 10), CR (ascii 13) and '='
    * If one of the critical chars encounters then write '=' as escape
    character to the output stream followed by the critical+64.
    (NULL -> =@, TAB --> =I, LF --> =J, CR --> =M, '=' --> =}

    I think "=" is supposed to be ==
    My First Computer -- Documentation Link (RT?M) -- Using the Debugger -- Prime Number Sieve
    Counting Bits -- Subnet Calculator -- UI Guidelines -- >> SerialPort Answer <<

    "Those who use Application.DoEvents have no idea what it does and those who know what it does never use it." John Wein

  3. #3
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    40,104

    Re: Optimizing byte array to string encoder

    I see nothing in particular about that code, except that there is lots of simple math in it. Simple math is not highly effective in .NET. It is effective, but no more effective than multiplication or division. Therefore, you could probably benefit considerably by building the method into unmanaged C where you can get the advantage of all the integer math you are using, which means instructions that fall into the RISC subset of the CPU.

    Other than that, I doubt you will gain anything by tightening the existing code. That means that the only opportunity for improvement would be by changing the algorithm. There may be room to work there, as I note that, despite the various conditions, you really only have two alternatives, though you add different values to different places in those alternatives. Hard to say, though.
    My usual boring signature: Nothing

  4. #4

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Optimizing byte array to string encoder

    @dbasnett

    The encoder works perfectly fine. I'm just wondering if it can be made faster/more efficient.


    @Shaggy Hiker

    Changing the algorithm is not an option, as yEnc is the most used algorithm with the least amount of overhead. If I use my own algorithm, then other applications can't decode the data, because they only support yEnc, MIME or UUencode.

    I have no idea how to create an unmanaged C library, but I'll have a look around and see if I can find a solution.

  5. #5
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,897

    Re: Optimizing byte array to string encoder

    Assuming the algorithm I posted is correct...

    Code:
            Dim buf(255) As Byte 'test buffer
    
            'generate all possible byte values for testing
            For bv As Integer = 0 To 255
                buf(bv) = CByte(bv)
            Next
    
    
            Dim eqAsByte As Byte = Asc("=")
            Dim critCH() As Byte = New Byte() {0, _
                                               Asc(ControlChars.Tab), _
                                               Asc(ControlChars.Lf), _
                                               Asc(ControlChars.Cr), _
                                               eqAsByte}
    
            Dim encodedBuf As New List(Of Byte)
            'encode
            For Each byt As Byte In buf 'fetch the character
                Dim addByte As Integer
                If Not critCH.Contains(byt) Then 'is it critical
                    'no
                    addByte = byt + 42
                Else
                    'yes, is a critical? character
                    encodedBuf.Add(eqAsByte) 'add equal sign to output
                    addByte = byt + 64
                End If
                addByte = addByte And 255
                encodedBuf.Add(CByte(addByte))
            Next
    
            Dim outBuf() As Byte = encodedBuf.ToArray
            'Debug.WriteLine(System.Text.ASCIIEncoding.ASCII.GetChars(encodedBuf.ToArray))
    edit - Wow!
    Last edited by dbasnett; Feb 4th, 2011 at 04:23 PM.
    My First Computer -- Documentation Link (RT?M) -- Using the Debugger -- Prime Number Sieve
    Counting Bits -- Subnet Calculator -- UI Guidelines -- >> SerialPort Answer <<

    "Those who use Application.DoEvents have no idea what it does and those who know what it does never use it." John Wein

  6. #6
    You don't want to know.
    Join Date
    Aug 2010
    Posts
    4,578

    Re: Optimizing byte array to string encoder

    Summary: I agree with SH, there's not much you can do.

    Normally I'd say use List(Of T) instead of an array as a reflex to seeing ReDim Preserve, but in this case you have to produce an array to feed to GetString() so you're going to pay the price somewhere. Even Take() is going to use roughly the same amount of time/resources.

    I'm going to go out on a limb and say that ReDim Preserve isn't your bottleneck anyway. Without putting a profiler on it or adding some extra code for instrumentation, it's hard to figure out where you spend the most time. You don't know what the problem is if you aren't measuring, so I started to measure.

    I added two stopwatches: one around the entire operation ("outer") and one that doesn't include either array resizing ("inner"). I had to make a HUGE data array to get some reasonable times. With 100,000,000 bytes of data, I got 1,310ms for the inner stopwatch and 1,705 for the outer; 400ms for a big array copy seems reasonable, so the biggest problem is the rest of the code. I thought maybe using a structure for EncoderValues might introduce value copy overhead, but making it a Class only shed about 100ms, that's negligible. I tried caching the EncoderValue to see if the With statement was behaving inefficiently; that had practically no effect. That left nothing but accesses to the lookup and simple math; there's not really much you can do to optimize that unless you have a clever algorithm. I thought a pre-generated lookup table for each byte might perform better, but it made performance worse. CPU effort's hard to compare; I watched task manager and hit 13&#37; in both the lookup table approach (similar to dbasnett's) and your original code.

    For each byte, you have to do a little bit of work, and that's going to take some amount of CPU and time. Sometimes you just can't get around it.

  7. #7
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290

    Re: Optimizing byte array to string encoder

    Just a thought: how about splitting the input byte array to spin off multiple threads (each one encodes small part of the input array) and then join the thread to build the final output array?
    Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
    - Abraham Lincoln -

  8. #8
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,897

    Re: Optimizing byte array to string encoder

    My internet connection is horrible. Maybe it is because of the foot of snow piled up in front of the antenna on the roof... Finally got it posted.
    My First Computer -- Documentation Link (RT?M) -- Using the Debugger -- Prime Number Sieve
    Counting Bits -- Subnet Calculator -- UI Guidelines -- >> SerialPort Answer <<

    "Those who use Application.DoEvents have no idea what it does and those who know what it does never use it." John Wein

  9. #9

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Optimizing byte array to string encoder

    Thanks everyone.

    dbasnett, I tried your code and although it's much smaller, it's a bit slower. The average of your code is about 46 ms and the average of my code is about 38 ms. Normally those few ms are neglectable, but some people have a 100/120 Mbit upload connection and are uploading with 10000 kB/s.


    10000 kB = 10240000 bytes

    Each byte array that needs to be encoded = 640000 bytes

    10240000 bytes / 640000 bytes = 16 encodes per second

    1000 (1 sec) / 16 = 62.5 ms

    So in order to keep up with those high upload speeds, each encode should not take longer than 62.5 ms. Both are still well in that range, but this is the reason why every ms is important.

  10. #10
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    40,104

    Re: Optimizing byte array to string encoder

    When I mentioned a different algorithm, I wasn't talking about HOW the conversion took place, but how the code was operating. Sitten covered some of those, and Stanav covered what is probably the best one, considering the volumes you just posted. Multi-threading may well be the only alternative to speed this up and stay within .NET. However, you will probably hurt your speed if you use too many threads for the number of cores you have, as each thread is going to have zero wait time. I hope to be testing this concept in a year or so (I'm way too busy at the moment, and will be until August), but I suspect that for this type of task, you will optimize core usage if your number of threads is somewhere between N and 2N-1 where N is the number of cores. Therefore, on a quad core processor, you might make 6 threads (actually, 7, by that math, but 6 is easier to work with), where each thread gets 100,000 bytes to encode (plus some change), while one thread is listening for bytes.

    However, the unmanaged C route seems very promising in this case. You would be looking at building a C dll that has a method that takes a byte array and returns the string. When I started out in .NET, I took the time to examine the difference in speed between integer addition, integer division, and floating point addition and division. There was no appreciable difference between the four. However, since the dawn of the x86 processor, integer addition and subtraction were up to 70 times as fast as integer division. That gap has changed with each version (and floating point math was horrid, even with a math co-processor, up until the Pentium), but it remains pretty big. I lost track of the timing after the early Pentium chips, and specific instruction timing no longer exists on modern CPU architectures (the actual time taken for any instruction is based on the context it is in), but integer addition and subtraction was the fastest instruction on the CPU. There were a few others in that category, but they were in a class by themselves on the early Pentium, and are probably still there. Integer division should have been at least 30x slower than integer addition, yet my tests in .NET showed no significant difference. Perhaps there was a few percent, but not 3000&#37;. This couldn't have been achieved by integer division being made vastly faster in code generated by .NET, so it could only be that integer addition, the most common thing you do in that code, is massively slowed by .NET.

    While unmanaged code doesn't always have to be significantly faster, this is a case where it seems likely to be FAR faster, especially if you use a language like C that is only a baby step above ASM.
    My usual boring signature: Nothing

  11. #11

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Optimizing byte array to string encoder

    Thank you, Shaggy Hiker. I've never programmed in C before, so that won't be easy, but below is an example.
    I only need the actual encoding part and not the header/footer parts ('=ybegin part=', 'ypart begin=', '=yend size=') or CRC32 part.

    This doesn't belong in this forum, but perhaps you know how to turn this code into a function that accepts a byte array and returns a string?

    http://www.yenc.org/yencode-c.txt

    c Code:
    1. deslen=0; desp=desbuf; restlen=filelen;
    2.     while (restlen>0)
    3.         {
    4.         srclen=restlen;
    5.         if (srclen>4096) srclen=4096;
    6.         id=fread(srcbuf,srclen,1,fSrc);
    7.         if (id != 1)
    8.             {
    9.             eprint("Error in reading file for encoding (code=%d)\r\n",errno);
    10.             return(1);
    11.             }
    12.         restlen=restlen-srclen;
    13.         srcp=srcbuf;
    14.  
    15.         while (srclen>0)
    16.             {
    17.  
    18.             c=*srcp;                    // Get a source byte
    19.             CrcAdd(c);                  // Add it to the CRC
    20.             c=(unsigned char) (c+42);   // and add the secret number
    21.             srcp++; srclen--;
    22.             switch (c)  // Solve special in NNTP 'forbidden' characters
    23.                 {
    24.             case 0:
    25.             case 9:
    26.             case 10:
    27.             case 13:
    28.             case '=':   // Including the escape character itself
    29.             case '.':   // Some usual servers have problems with a dot in the first column
    30.                 *desp='='; desp++; deslen++;
    31.                 c=(unsigned char)(c+64);
    32.                 }
    33.             *desp=c; desp++; deslen++;
    34.             if ((deslen>=linelength)|((srclen==0)&(restlen==0))) // Block full - or end of file
    35.                 {
    36.                 *desp=13; desp++; deslen++;
    37.                 *desp=10; deslen++;
    38.                 id=fwrite(desbuf,deslen,1,fDes);
    39.                 if (id!=1)
    40.                     {
    41.                     eprint("Error in writing encoded file (code=%d)\r\n",errno);
    42.                     return(1);
    43.                     }
    44.                 deslen=0; desp=desbuf;
    45.                 }
    46.             }  // end of while (srclen > 0)
    47.  
    48.         }   // end of while (restlen > 0)

  12. #12
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,897

    Re: Optimizing byte array to string encoder

    Three approaches. The fastest is very close to the OP's.

    Code:
    Public Class Form1
    
        Private Sub Button1_Click(ByVal sender As System.Object, _
                                  ByVal e As System.EventArgs) Handles Button1.Click
            Dim l As New List(Of Byte)
            'generate byte values for testing
            For i As Integer = 1 To 2510
                For bv As Integer = 0 To 255
                    l.Add(CByte(bv))
                Next
            Next
            Dim buf() As Byte = l.ToArray 'test buffer
    
            loadEncVals() 'build output values
            Const numLoops As Integer = 5
            Dim stpw As New Stopwatch
            Dim nb() As Byte = New Byte() {}
    
            stpw.Start()
            For x As Integer = 1 To numLoops
                'nb = yencEncode1(buf)
                'nb = yencEncode2(buf)
                nb = yencEncode3(buf) 'fastest of the three - very similar to OP's
            Next
            Dim s As String = System.Text.Encoding.GetEncoding(28591).GetChars(nb)
            stpw.Stop()
    
            Debug.WriteLine(s.Length & " " & nb.Length)
            Label1.Text = stpw.ElapsedMilliseconds.ToString("n0")
            RichTextBox1.Text = s
        End Sub
    
        Dim encValues As New List(Of Byte())
        Private Sub loadEncVals()
            If encValues.Count = 0 Then
                'create output values
                Debug.WriteLine("create output values")
                For ch As Integer = 0 To 255
                    If critCH.Contains(CByte(ch)) Then
                        encValues.Add(New Byte() {eqAsByte, CByte((ch + 64) And 255)})
                    Else
                        encValues.Add(New Byte() {CByte((ch + 42) And 255)})
                    End If
    
                Next
            End If
        End Sub
    
        Dim eqAsByte As Byte = Asc("=")
        Dim critCH() As Byte = New Byte() {0, _
                                           Asc(ControlChars.Tab), _
                                           Asc(ControlChars.Lf), _
                                           Asc(ControlChars.Cr), _
                                           eqAsByte}
    
        Private Function yencEncode1(ByVal buf() As Byte) As Byte()
    
            Dim encodedBuf As New List(Of Byte)(2 * buf.Length)
            'encode
            For idx As Integer = 0 To buf.Length - 1 'fetch the character
                Dim addByte As Integer
                If Not critCH.Contains(buf(idx)) Then 'is it critical
                    'no
                    addByte = buf(idx) + 42
                Else
                    'yes, is a critical? character
                    encodedBuf.Add(eqAsByte) 'add equal sign to output
                    addByte = buf(idx) + 64
                End If
                addByte = addByte And 255
                encodedBuf.Add(CByte(addByte))
            Next
    
            Return encodedBuf.ToArray
        End Function
    
        Private Function yencEncode2(ByVal buf() As Byte) As Byte()
    
            Dim encodedBuf As New List(Of Byte)(2 * buf.Length)
            'encode
            For idx As Integer = 0 To buf.Length - 1 'fetch the character
                encodedBuf.AddRange(encValues(buf(idx)))
            Next
            Return encodedBuf.ToArray
        End Function
    
        Private Function yencEncode3(ByVal buf() As Byte) As Byte()
    
            Dim encodedBuf(buf.Length * 2) As Byte
            Dim offs As Integer = 0
            'encode
            For idx As Integer = 0 To buf.Length - 1 'fetch the character
                For Each byt As Byte In encValues(buf(idx))
                    encodedBuf(offs) = byt
                    offs += 1
                Next
            Next
            Array.Resize(encodedBuf, offs)
            Return encodedBuf
        End Function
    End Class
    My First Computer -- Documentation Link (RT?M) -- Using the Debugger -- Prime Number Sieve
    Counting Bits -- Subnet Calculator -- UI Guidelines -- >> SerialPort Answer <<

    "Those who use Application.DoEvents have no idea what it does and those who know what it does never use it." John Wein

  13. #13
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,897

    Re: Optimizing byte array to string encoder

    On my PC yencEncode3 runs in less than 20ms. for a buffer of 642,560 bytes.
    My First Computer -- Documentation Link (RT?M) -- Using the Debugger -- Prime Number Sieve
    Counting Bits -- Subnet Calculator -- UI Guidelines -- >> SerialPort Answer <<

    "Those who use Application.DoEvents have no idea what it does and those who know what it does never use it." John Wein

  14. #14

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Optimizing byte array to string encoder

    Thank you very much. It is a bit faster than my code. The only problem is that the algorithm is not entirely correct, but that's not too difficult to fix.

    Each line of encoded data should be 128 characters and it also doesn't take care of lines that start with a dot (char 46). Lines that start with a dot need to be double dots.

    Thanks again.

  15. #15
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,897

    Re: Optimizing byte array to string encoder

    What reference are you using for the encoding rules?
    My First Computer -- Documentation Link (RT?M) -- Using the Debugger -- Prime Number Sieve
    Counting Bits -- Subnet Calculator -- UI Guidelines -- >> SerialPort Answer <<

    "Those who use Application.DoEvents have no idea what it does and those who know what it does never use it." John Wein

  16. #16

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Optimizing byte array to string encoder


Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width