-
Jun 16th, 2012, 11:02 AM
#1
Thread Starter
Frenzied Member
[RESOLVED] Best way to extract string from byte array or memorystream
Hi, I'm using WinPcap to capture incoming data. Basically I'm dealing with a continuous stream of incoming data, either as a byte array or as a MemoryStream (pieces of up to 64 KB).
I need to check this data to see if there's a certain url and then extract it.
The url I need to find looks like this:
Code:
http://201.122.38.5/data/today/798572987-589571?139805890582
The length of the 'code' after "/today/" is always the same. The IP address changes.
What's the best/fastest/most efficient way to continuously check these byte arrays or memorystreams and extract the urls without hogging the CPU too much?
I'm using Framework 4.0.
vb.net Code:
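Private Sub PacketHandler(ByVal packet As Packet)
    ' The TCP payload is available either as a stream...
    Dim stream As MemoryStream = packet.Ethernet.IpV4.Tcp.Payload.ToMemoryStream()
    ' ...or as a byte array
    Dim bytes As Byte() = packet.Ethernet.IpV4.Tcp.Payload.ToArray()
End Sub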
-
Jun 16th, 2012, 11:14 AM
#2
Re: Best way to extract string from byte array or memorystream
Encoding.GetString will convert a Byte array to a String. You just have to pick the appropriate Encoding object, e.g. Encoding.ASCII.
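A minimal sketch of that suggestion, assuming the payload is plain ASCII. The module and function names (`PayloadDecoding`, `PayloadToText`, `SliceToText`) are made up for illustration; only `Encoding.ASCII.GetString` comes from the framework.

```vbnet
Imports System.Text

Module PayloadDecoding
    ' Decodes the whole buffer as ASCII. Bytes above 127 are not valid ASCII
    ' and come out as "?".
    Public Function PayloadToText(ByVal payload As Byte()) As String
        Return Encoding.ASCII.GetString(payload)
    End Function

    ' Decodes only count bytes starting at index, leaving the rest untouched.
    Public Function SliceToText(ByVal payload As Byte(), ByVal index As Integer, ByVal count As Integer) As String
        Return Encoding.ASCII.GetString(payload, index, count)
    End Function
End Module
```

The three-argument overload is the useful one when only part of a packet needs to become a String.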
-
Jun 16th, 2012, 11:26 AM
#3
Thread Starter
Frenzied Member
Re: Best way to extract string from byte array or memorystream
Thanks. I'm wondering if it wouldn't be better to loop through the byte array and search for a pattern of bytes and only convert the required bytes to a String?
The data might be coming in at a few megabytes per second, and converting all of it to a String will probably hog the CPU quite a lot.
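The idea above could look roughly like this: scan the raw bytes for the ASCII prefix "http://" and decode only the bytes that belong to the URL. This is a sketch under assumptions, not a tested capture filter: it assumes the URL appears verbatim in a single payload as ASCII and ends at the first whitespace or non-printable byte. `ByteSearch`, `IndexOfBytes` and `ExtractFirstUrl` are hypothetical names.

```vbnet
Imports System.Text

Module ByteSearch
    ' Returns the index of the first occurrence of pattern in data at or
    ' after start, or -1 if there is none (straightforward nested-loop scan).
    Public Function IndexOfBytes(ByVal data As Byte(), ByVal pattern As Byte(), ByVal start As Integer) As Integer
        If pattern.Length = 0 Then Return -1
        For i As Integer = start To data.Length - pattern.Length
            Dim matched As Boolean = True
            For j As Integer = 0 To pattern.Length - 1
                If data(i + j) <> pattern(j) Then
                    matched = False
                    Exit For
                End If
            Next
            If matched Then Return i
        Next
        Return -1
    End Function

    ' Finds the first "http://" and decodes from there up to (not including)
    ' the next space, control byte or non-ASCII byte. Returns Nothing if the
    ' prefix is not present.
    Public Function ExtractFirstUrl(ByVal data As Byte()) As String
        Dim prefix As Byte() = Encoding.ASCII.GetBytes("http://")
        Dim first As Integer = IndexOfBytes(data, prefix, 0)
        If first < 0 Then Return Nothing
        Dim last As Integer = first + prefix.Length
        While last < data.Length AndAlso data(last) > 32 AndAlso data(last) < 127
            last += 1
        End While
        Return Encoding.ASCII.GetString(data, first, last - first)
    End Function
End Module
```

Only the matched slice is ever turned into a String, so the cost of decoding no longer depends on how much traffic passes through.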
-
Jun 17th, 2012, 02:02 AM
#4
Re: Best way to extract string from byte array or memorystream
If it's an ANSI string (basically 1 byte per character), then that would be more efficient. However, if it's a Unicode string then it may not be so simple, since Unicode encodings have some nuances that may get in the way (e.g. in a variable-width encoding such as UTF-8, one character could be represented as 3 bytes and another as 2... in the same string!). Evil_Giraffe seems to know a lot about Unicode strings and strings in general in the .NET Framework. He may be able to provide a more solid answer.
-
Jun 18th, 2012, 04:12 AM
#5
Thread Starter
Frenzied Member
Re: Best way to extract string from byte array or memorystream
The best solution seems to be the Boyer–Moore string search algorithm. It's also used in the ngrep commandline tool.
http://en.wikipedia.org/wiki/Boyer%E...arch_algorithm
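For reference, here is a sketch of the Horspool simplification of Boyer–Moore working directly on byte arrays. Note that this is not the full algorithm: it uses only a bad-character shift table and leaves out the good-suffix rule. `HorspoolSearch`, `BuildShiftTable` and `IndexOf` are names chosen for this example.

```vbnet
Module HorspoolSearch
    ' For each possible byte value, stores how far the search window may
    ' move when that byte is aligned with the last position of the pattern.
    Public Function BuildShiftTable(ByVal pattern As Byte()) As Integer()
        Dim shifts(255) As Integer ' 256 entries, indices 0..255
        For i As Integer = 0 To 255
            shifts(i) = pattern.Length
        Next
        ' The last pattern byte is deliberately excluded from the table.
        For i As Integer = 0 To pattern.Length - 2
            shifts(pattern(i)) = pattern.Length - 1 - i
        Next
        Return shifts
    End Function

    ' Returns the index of the first occurrence of pattern in data, or -1.
    Public Function IndexOf(ByVal data As Byte(), ByVal pattern As Byte()) As Integer
        If pattern.Length = 0 OrElse data.Length < pattern.Length Then Return -1
        Dim shifts As Integer() = BuildShiftTable(pattern)
        Dim last As Integer = pattern.Length - 1
        Dim pos As Integer = 0
        While pos <= data.Length - pattern.Length
            ' Compare right to left inside the current window.
            Dim j As Integer = last
            While j >= 0 AndAlso data(pos + j) = pattern(j)
                j -= 1
            End While
            If j < 0 Then Return pos
            ' Skip ahead based on the byte under the window's last position.
            pos += shifts(data(pos + last))
        End While
        Return -1
    End Function
End Module
```

Since the pattern is fixed here, the shift table would normally be built once and reused for every packet rather than rebuilt on each call as this sketch does.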
-
Jun 18th, 2012, 04:21 AM
#6
Re: Best way to extract string from byte array or memorystream
 Originally Posted by Niya
Evil_Giraffe seems to know a lot about Unicode strings and strings in general in the .NET Framework. He may be able to provide a more solid answer.
The only thing I would have said is that you must know the encoding used at the source. But jmc already said that:
 Originally Posted by jmcilhinney
You just have to pick the appropriate Encoding object, e.g. Encoding.ASCII.
If you don't know the encoding used, then you don't have an encoded string, you have a bunch of bytes.
The other issue to deal with is that the stream is not going to be one long string; you will have 'messages' that you need to extract and then search. It may be tempting to do a byte search down at the low-level stream, but I think you'll stay saner if you use the message protocol to decompose the stream into messages, grab the bytes that relate to the address directly, and then decode those into a string.
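As an illustration of working at the message level, the sketch below assumes the traffic is plain HTTP/1.x and that a payload contains a complete request header block. The thread does not confirm either of these. It decodes only the header bytes and rebuilds the URL from the request line and the Host header. `HttpRequestUrl` and `TryGetRequestUrl` are hypothetical names, and a request target in absolute form (as sent to proxies) is not handled.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module HttpRequestUrl
    ' Rebuilds "http://host/path" from an HTTP/1.x request header block.
    ' Returns Nothing if the payload does not look like such a request.
    Public Function TryGetRequestUrl(ByVal payload As Byte()) As String
        ' Find the blank line (CR LF CR LF) that ends the header block.
        Dim headerEnd As Integer = -1
        For i As Integer = 0 To payload.Length - 4
            If payload(i) = 13 AndAlso payload(i + 1) = 10 AndAlso
               payload(i + 2) = 13 AndAlso payload(i + 3) = 10 Then
                headerEnd = i
                Exit For
            End If
        Next
        If headerEnd < 0 Then Return Nothing

        ' Decode the header bytes only; any body is never converted.
        Dim headerText As String = Encoding.ASCII.GetString(payload, 0, headerEnd)
        Dim lines As String() = headerText.Split(New String() {vbCrLf}, StringSplitOptions.None)

        ' Request line: METHOD SP target SP HTTP/x.y
        Dim parts As String() = lines(0).Split(" "c)
        If parts.Length <> 3 OrElse Not parts(2).StartsWith("HTTP/", StringComparison.Ordinal) Then
            Return Nothing
        End If

        For i As Integer = 1 To lines.Length - 1
            If lines(i).StartsWith("Host:", StringComparison.OrdinalIgnoreCase) Then
                Return "http://" & lines(i).Substring(5).Trim() & parts(1)
            End If
        Next
        Return Nothing
    End Function
End Module
```

A request split across several TCP segments would need reassembly before a function like this could see the full header block.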