Thread: [RESOLVED] Best way to extract string from byte array or memorystream

  1. #1
    Frenzied Member
    Join Date
    Nov 05
    Posts
    1,808

    [RESOLVED] Best way to extract string from byte array or memorystream

    Hi, I'm using WinPcap to capture incoming data. Basically I'm dealing with a continuous stream of incoming data, either as a Byte array or as a MemoryStream (pieces of up to 64 KB).

    I need to check this data to see if it contains a certain URL and then extract it.

    The URL I need to find looks like this:

    Code:
    http://201.122.38.5/data/today/798572987-589571?139805890582
    The length of the 'code' after "/today/" is always the same. The IP address changes.

    What's the best/fastest/most efficient way to continuously check these Byte arrays or MemoryStreams and extract the URLs without hogging the CPU too much?

    I'm using .NET Framework 4.0.


    vb.net Code:
    Private Sub PacketHandler(ByVal packet As Packet)

        ' TCP payload as a MemoryStream
        packet.Ethernet.IpV4.Tcp.Payload.ToMemoryStream()

        ' or as a Byte array
        packet.Ethernet.IpV4.Tcp.Payload.ToArray()
    End Sub

  2. #2
    .NUT jmcilhinney's Avatar
    Join Date
    May 05
    Location
    Sydney, Australia
    Posts
    80,747

    Re: Best way to extract string from byte array or memorystream

    Encoding.GetString will convert a Byte array to a String. You just have to pick the appropriate Encoding object, e.g. Encoding.ASCII.
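
    A minimal sketch of that call, assuming the payload is ASCII text. The sample bytes are made up for illustration; in the original handler they would come from Payload.ToArray().

```vbnet
Imports System
Imports System.Text

Module DecodeExample
    Sub Main()
        ' Hypothetical payload; stands in for packet.Ethernet.IpV4.Tcp.Payload.ToArray().
        Dim payload As Byte() = Encoding.ASCII.GetBytes("GET /data/today/798572987-589571?139805890582 HTTP/1.1")

        ' Decode the whole array as ASCII.
        Dim decoded As String = Encoding.ASCII.GetString(payload)

        ' Decode only a slice: GetString(bytes, index, count).
        Dim verb As String = Encoding.ASCII.GetString(payload, 0, 3)

        Console.WriteLine(decoded)
        Console.WriteLine(verb)
    End Sub
End Module
```

    The three-argument overload is the useful one when only part of a buffer needs to become a String.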

  3. #3
    Frenzied Member
    Join Date
    Nov 05
    Posts
    1,808

    Re: Best way to extract string from byte array or memorystream

    Thanks. I'm wondering if it wouldn't be better to loop through the byte array and search for a pattern of bytes and only convert the required bytes to a String?

    The data might be coming in at a few megabytes per second, and converting all of that data to a String will probably hog the CPU quite a lot.
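
    The idea above can be sketched roughly as follows: scan the bytes for a fixed marker and decode only the slice that follows it. This is a naive search for illustration, not a tuned one. The helper name IndexOfBytes, the "/data/today/" marker, the sample payload and the 29-character code length are all assumptions taken from the example URL.

```vbnet
Imports System
Imports System.Text

Module ByteSearchExample
    ' Returns the index of the first occurrence of pattern in data, or -1 if absent.
    Function IndexOfBytes(ByVal data As Byte(), ByVal pattern As Byte()) As Integer
        For i As Integer = 0 To data.Length - pattern.Length
            Dim matched As Boolean = True
            For j As Integer = 0 To pattern.Length - 1
                If data(i + j) <> pattern(j) Then
                    matched = False
                    Exit For
                End If
            Next
            If matched Then Return i
        Next
        Return -1
    End Function

    Sub Main()
        Dim marker As Byte() = Encoding.ASCII.GetBytes("/data/today/")
        ' Hypothetical payload; stands in for Payload.ToArray().
        Dim payload As Byte() = Encoding.ASCII.GetBytes("GET /data/today/798572987-589571?139805890582 HTTP/1.1")
        ' Assumption: the fixed-length code is the 29 characters after the marker.
        Const CodeLength As Integer = 29

        Dim pos As Integer = IndexOfBytes(payload, marker)
        If pos >= 0 AndAlso pos + marker.Length + CodeLength <= payload.Length Then
            ' Only these bytes are converted to a String.
            Dim code As String = Encoding.ASCII.GetString(payload, pos + marker.Length, CodeLength)
            Console.WriteLine(code)
        End If
    End Sub
End Module
```

    Because the marker is plain ASCII, comparing raw bytes is safe here; the length check guards against a code that is cut off at the end of a buffer.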

  4. #4
    Burning Member Niya's Avatar
    Join Date
    Nov 11
    Posts
    3,095

    Re: Best way to extract string from byte array or memorystream

    If it's an ANSI string (basically 1 byte per character), then that would be more efficient. However, if it's a Unicode string then it may not be so simple, since Unicode encodings have some nuances that may get in the way (e.g. in UTF-8 a character can take anywhere from 1 to 4 bytes, and the lengths can vary within the same string). Evil_Giraffe seems to know a lot about Unicode strings and strings in general in the .NET Framework. He may be able to provide a more solid answer.

  5. #5
    Frenzied Member
    Join Date
    Nov 05
    Posts
    1,808

    Re: Best way to extract string from byte array or memorystream

    The best solution seems to be the Boyer–Moore string search algorithm. It's also used in the ngrep command-line tool.

    http://en.wikipedia.org/wiki/Boyer%E...arch_algorithm
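
    For reference, a sketch of the Boyer–Moore–Horspool variant, which keeps only the bad-character shift table of full Boyer–Moore and is therefore simpler to write. It is not taken from ngrep; the function name HorspoolIndexOf and the sample data are hypothetical.

```vbnet
Imports System
Imports System.Text

Module HorspoolExample
    ' Boyer-Moore-Horspool search: first index of pattern in data, or -1 if absent.
    Function HorspoolIndexOf(ByVal data As Byte(), ByVal pattern As Byte()) As Integer
        Dim m As Integer = pattern.Length
        If m = 0 Then Return 0
        If m > data.Length Then Return -1

        ' Shift table (256 entries): distance to slide the window,
        ' keyed by the byte aligned with the pattern's last position.
        Dim shiftTable(255) As Integer
        For b As Integer = 0 To 255
            shiftTable(b) = m
        Next
        For k As Integer = 0 To m - 2
            shiftTable(pattern(k)) = m - 1 - k
        Next

        Dim pos As Integer = 0
        While pos <= data.Length - m
            ' Compare the window against the pattern from right to left.
            Dim j As Integer = m - 1
            While j >= 0 AndAlso data(pos + j) = pattern(j)
                j -= 1
            End While
            If j < 0 Then Return pos
            pos += shiftTable(data(pos + m - 1))
        End While
        Return -1
    End Function

    Sub Main()
        Dim marker As Byte() = Encoding.ASCII.GetBytes("/data/today/")
        ' Hypothetical payload; stands in for Payload.ToArray().
        Dim payload As Byte() = Encoding.ASCII.GetBytes("GET /data/today/798572987-589571?139805890582 HTTP/1.1")
        Console.WriteLine(HorspoolIndexOf(payload, marker))
    End Sub
End Module
```

    The shift table depends only on the pattern, so for a fixed marker it could be built once and reused for every packet.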

  6. #6
    Frenzied Member Evil_Giraffe's Avatar
    Join Date
    Aug 02
    Location
    Suffolk, UK
    Posts
    1,876

    Re: Best way to extract string from byte array or memorystream

    Quote Originally Posted by Niya
    Evil_Giraffe seems to know a lot about Unicode strings and strings in general in the .NET Framework. He may be able to provide a more solid answer.
    The only thing I would have said is that you must know the encoding used at the source. But jmc already said that:

    Quote Originally Posted by jmcilhinney
    You just have to pick the appropriate Encoding object, e.g. Encoding.ASCII.
    If you don't know the encoding used, then you don't have an encoded string, you have a bunch of bytes.

    The other issue to deal with is that the stream is not going to be one long string; you will have 'messages' from which you need to extract the string. It may be tempting to do a byte search on the low-level stream, but I think you'll stay saner if you use the message protocol to decompose the stream into messages, grab the bytes that relate to the address directly, and then decode those into a string.
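
    As a rough illustration of the protocol-level approach, assuming the traffic is plain HTTP and that one payload holds a complete request header: the path comes from the request line and the address from the Host header. The function name ExtractUrl and the sample request are hypothetical, and a real capture would also need to handle requests split across several TCP segments.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module HttpRequestExample
    ' Rebuilds the requested URL from a raw HTTP request, or returns Nothing
    ' if the bytes do not look like a complete request with a Host header.
    Function ExtractUrl(ByVal payload As Byte()) As String
        Dim request As String = Encoding.ASCII.GetString(payload)

        ' The header block ends at the first blank line.
        Dim headerEnd As Integer = request.IndexOf(vbCrLf & vbCrLf, StringComparison.Ordinal)
        If headerEnd < 0 Then Return Nothing

        Dim lines As String() = request.Substring(0, headerEnd).Split(New String() {vbCrLf}, StringSplitOptions.None)

        ' Request line: METHOD SP path SP version
        Dim parts As String() = lines(0).Split(" "c)
        If parts.Length <> 3 Then Return Nothing

        Dim host As String = Nothing
        For i As Integer = 1 To lines.Length - 1
            If lines(i).StartsWith("Host:", StringComparison.OrdinalIgnoreCase) Then
                host = lines(i).Substring(5).Trim()
                Exit For
            End If
        Next
        If host Is Nothing Then Return Nothing

        Return "http://" & host & parts(1)
    End Function

    Sub Main()
        ' Hypothetical request built from the URL in the first post.
        Dim raw As String = "GET /data/today/798572987-589571?139805890582 HTTP/1.1" & vbCrLf &
                            "Host: 201.122.38.5" & vbCrLf & vbCrLf
        Console.WriteLine(ExtractUrl(Encoding.ASCII.GetBytes(raw)))
    End Sub
End Module
```

    This sketch decodes the whole payload for simplicity; to limit the conversion cost, the decode could be restricted to the bytes before the blank line that ends the header.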
