Thread: [RESOLVED] Converting String to ByteArray

  1. #1

    Thread Starter
    PowerPoster
    Join Date
    Jul 2006
    Location
    Maldon, Essex. UK
    Posts
    6,334

    [RESOLVED] Converting String to ByteArray

    I've written an Asynchronous Socket client, which is working ok. I'm still on the learning curve from VB6 to .NET and there's one or two things I'm not clear about.

    Part of the code requires sending the user's input to the Socket, using the BeginSend method. This requires the data to be sent as a Byte Array. Currently I'm doing this:
    Code:
            If txtToSend.Text <> vbNullString Then
                MessageToSend = txtToSend.Text & vbNewLine
                Dim bytSend(MessageToSend.Length - 1) As Byte
                For i = 0 To MessageToSend.Length - 1
                    bytSend(i) = Asc(MessageToSend.Substring(i, 1))
                Next
                Client.Client.BeginSend(bytSend, 0, MessageToSend.Length, 0, AddressOf SendCallback, Client)
            End If
    which seems to be a bit 'clunky' and 'vb6ish'. I feel as though I ought to be able to use 'MessageToSend.ToCharArray' and somehow coerce the result into a byte array.

    Or is there something quite fundamental I'm missing?

  2. #2
    Angel of Code Niya's Avatar
    Join Date
    Nov 2011
    Posts
    9,017

    Re: Converting String to ByteArray



    If you're gonna move to VB.Net, I implore you to stop thinking of text as directly interchangeable with bytes. Think of a string as a sequence of Unicode characters where conversion is necessary to switch between representations. In a Unicode world, thinking of strings as a 1 byte per character byte array could be detrimental. The exact same string could be represented by totally different byte sequences according to which Unicode encoding it's in. The closest thing to ASCII, which you would be used to from VB6, would be UTF8:-
    vbnet Code:
            Dim message As String = "Hello world"

            'Gets a byte array that represents the string in UTF8 format
            Dim byMessage As Byte() = System.Text.Encoding.UTF8.GetBytes(message)

    The great thing about this is that it would work with text in any language since UTF8 can represent any character, even non-latin characters. UTF8 in particular is backward compatible with ASCII so you can save the bytes directly into a text file and even an old DOS text editor would be able to read it as long as you use normal latin characters from the ASCII codepage.

    Your code should look like this:-
    vbnet Code:
            If txtToSend.Text <> vbNullString Then
                MessageToSend = txtToSend.Text & vbNewLine
                Dim bytSend As Byte() = System.Text.Encoding.UTF8.GetBytes(MessageToSend)

                'Pass the byte count, not the character count: in UTF8 the two can differ
                Client.Client.BeginSend(bytSend, 0, bytSend.Length, 0, AddressOf SendCallback, Client)
            End If
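
    One thing worth checking when switching to UTF8 is that the number of bytes is not always the number of characters, so the length handed to BeginSend should come from the byte array rather than the string. A minimal sketch of the difference (the module name and sample text are just for illustration):

    ```vbnet
    Imports System
    Imports System.Text

    Module ByteCountDemo
        Sub Main()
            ' "caf" + e-acute + CR LF. ChrW is used so the result does not
            ' depend on how this source file itself is saved.
            Dim message As String = "caf" & ChrW(233) & vbCrLf
            Dim bytes As Byte() = Encoding.UTF8.GetBytes(message)

            Console.WriteLine(message.Length) ' 6 characters
            Console.WriteLine(bytes.Length)   ' 7 bytes: the e-acute takes two
        End Sub
    End Module
    ```

    For plain ASCII input the two counts are equal, which is why the mismatch is easy to miss in testing.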
    Last edited by Niya; Nov 5th, 2012 at 02:54 AM.
    Treeview with NodeAdded/NodesRemoved events | BlinkLabel control | Calculate Permutations | Object Enums | ComboBox with centered items | .Net Internals article(not mine) | Wizard Control | Understanding Multi-Threading | Simple file compression | Demon Arena

    Copy/move files using Windows Shell | I'm not wanted

    C++ programmers will dismiss you as a cretinous simpleton for your inability to keep track of pointers chained 6 levels deep and Java programmers will pillory you for buying into the evils of Microsoft. Meanwhile C# programmers will get paid just a little bit more than you for writing exactly the same code and VB6 programmers will continue to whitter on about "footprints". - FunkyDexter

    There's just no reason to use garbage like InputBox. - jmcilhinney

    The threads I start are Niya and Olaf free zones. No arguing about the benefits of VB6 over .NET here please. Happiness must reign. - yereverluvinuncleber

  3. #3

    Thread Starter
    PowerPoster
    Join Date
    Jul 2006
    Location
    Maldon, Essex. UK
    Posts
    6,334

    Re: Converting String to ByteArray

    Thanks for that.

    I seem to be on quite a steep learning curve and it's difficult getting my head round ASCII vs UTF8 etc after 40 years of programming. The little grey cells are not quite as active as they were. However, I suppose now is as good a time as any to start.

  4. #4
    Angel of Code Niya's Avatar
    Join Date
    Nov 2011
    Posts
    9,017

    Re: [RESOLVED] Converting String to ByteArray

    Don't worry, you'll get it. It was confusing for me too in the beginning. When I realized that I should stop interfering with the bytes in Strings directly and treat them as black boxes where the only thing I know is the format and the character sequence, everything became clearer.

  5. #5
    Frenzied Member
    Join Date
    Mar 2005
    Location
    Sector 001
    Posts
    1,577

    Re: [RESOLVED] Converting String to ByteArray

    UTF-8 is not Unicode.
    .NET is mostly using Unicode if the encoding is not explicitly specified but the forms themselves are not saved in Unicode.
    Socket communication and pretty much everything else Internet is still using extended ASCII (the 0-255 range)
    :-D

    When I am sending 대한민국, I am not sending 대한민국 but the bytes EB,8C,80,ED,95,9C,EB,AF,BC,EA,B5,AD. Just like you are doing it in your original code. I think it is great to use what's easiest but only if you know why and how it is easier. Otherwise when your users say they are getting squares and question marks you won't know if the GetBytes' encoding was wrong, they don't have an adequate font, the send went wrong etc. etc.

    I now find the basics to be really simple, the only thing I had to read back then:
    http://www.joelonsoftware.com/articles/Unicode.html
    VB 2005, Win Xp Pro sp2

  6. #6
    PowerPoster Evil_Giraffe's Avatar
    Join Date
    Aug 2002
    Location
    Suffolk, UK
    Posts
    2,555

    Re: [RESOLVED] Converting String to ByteArray

    Quote Originally Posted by Half View Post
    UTF-8 is not Unicode.
    Um, yes it is.

    Quote Originally Posted by The very article you linked to
    Thus was invented the brilliant concept of UTF-8. UTF-8 was another system for storing your string of Unicode code points, those magic U+ numbers, in memory using 8 bit bytes. In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 6 bytes.
    Quote Originally Posted by Half View Post
    Socket communication and pretty much everything else Internet is still using extended ASCII (the 0-255 range)
    No, "Socket communication and pretty much everything else Internet" is using bytes. How the receiving systems interpret those bytes is up to them. The sending and receiving systems simply have to agree. This can either be by decree (as in this example: it is stated that text will be sent encoded as UTF-8), or by some form of negotiation/command (as in web pages, where a meta tag near the top of the html document should specify what encoding the document is in; yes, this is a bit screwy, but fortunately that declaration only uses characters that are the same in basically every encoding ever invented).

  7. #7
    Frenzied Member
    Join Date
    Mar 2005
    Location
    Sector 001
    Posts
    1,577

    Re: [RESOLVED] Converting String to ByteArray

    Quote Originally Posted by Evil_Giraffe View Post
    Quote Originally Posted by Half View Post
    UTF-8 is not Unicode.
    Um, yes it is.
    Quote Originally Posted by The very article I linked to
    Thus was invented the brilliant concept of UTF-8. UTF-8 was another system for storing your string of Unicode code points, those magic U+ numbers, in memory using 8 bit bytes. In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 6 bytes.
    Note the 'another system' piece of text. You can use a system of dead raccoons to store Unicode data and call it URCns-D but it won't make it Unicode. To illustrate the humongous difference between actual Unicode and a distorted thing like UTF-8:
    input string: "대한민국"
    UTF8.GetBytes("대한민국") => Byte Seq: 235 140 128 237 149 156 235 175 188 234 181 173
    Unicode.GetBytes("대한민국") => Byte Seq: 0 179 92 213 252 187 109 173

    -----------------------------------

    Quote Originally Posted by Evil_Giraffe View Post
    No, "Socket communication and pretty much everything else Internet" is using bytes. How the receiving systems interpret those bytes is up to them. The sending and receiving systems simply have to agree. This can either be by decree (as in this example: it is stated that text will be sent encoded as UTF-8), or by some form of negotiation/command (as in web pages, where a meta tag near the top of the html document should specify what encoding the document is in; yes, this is a bit screwy, but fortunately that declaration only uses characters that are the same in basically every encoding ever invented).
    Bytes and ASCII chars are interchangeable terms in most dev communities. You may not know it but hex editors, for example, almost always show the bytes in the text portion as ASCII and not in other encodings.

    Of course I can see how beginners can become a bit frustrated when a byte is treated like a char and a char is treated like a byte but after battling with APIs for a while or working with webservers it all becomes a bit clearer. In any case it boils down to: either use simple extended ASCII when generating byte sequences for transfer and tell the client the encoding to use when treating those bytes (my preference)

    or

    Use some other encoding in generating the byte sequence and when Windows 9 decides it's time for middle-endianness, start digging for the source code to make sense of the output.

  8. #8
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,897

    Re: [RESOLVED] Converting String to ByteArray

    What is Unicode? In today's computing environment it would be the exception for a character set not to be Unicode. The only trick is that the encoder and decoder agree on the encoding. There is even one for extended ASCII, Windows-28591 (code page 28591, ISO-8859-1), which supplies a one-to-one code for each of the 256 byte values.
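
    To make the "encoder and decoder agree" point concrete, here is a small sketch, with made-up names and sample text, that decodes the same bytes twice: once with the encoding the sender used and once with code page 28591.

    ```vbnet
    Imports System
    Imports System.Text

    Module EncodingAgreementDemo
        Sub Main()
            ' "naive" with an i-diaeresis, built with ChrW so the source
            ' file's own encoding does not matter.
            Dim original As String = "na" & ChrW(239) & "ve"

            ' The sender encodes as UTF-8.
            Dim wire As Byte() = Encoding.UTF8.GetBytes(original)

            ' A receiver that agrees on UTF-8 recovers the text.
            Dim agreed As String = Encoding.UTF8.GetString(wire)

            ' A receiver that assumes code page 28591 maps every byte to one
            ' character, so the two-byte sequence becomes two wrong characters.
            Dim latin1 As Encoding = Encoding.GetEncoding(28591)
            Dim mismatched As String = latin1.GetString(wire)

            Console.WriteLine(agreed = original)     ' True
            Console.WriteLine(mismatched = original) ' False
            Console.WriteLine(mismatched.Length)     ' 6 instead of 5
        End Sub
    End Module
    ```

    Nothing fails or throws in the mismatched case, and that is the trap: the bytes arrive intact and the text is silently wrong.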

    One other thing. If you are going to quote a link it is always best to read it thoroughly. From The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

    "The Single Most Important Fact About Encodings

    If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that "plain" text is ASCII."
    Last edited by dbasnett; Nov 6th, 2012 at 02:22 PM.
    My First Computer -- Documentation Link (RT?M) -- Using the Debugger -- Prime Number Sieve
    Counting Bits -- Subnet Calculator -- UI Guidelines -- >> SerialPort Answer <<

    "Those who use Application.DoEvents have no idea what it does and those who know what it does never use it." John Wein

  9. #9
    PowerPoster Evil_Giraffe's Avatar
    Join Date
    Aug 2002
    Location
    Suffolk, UK
    Posts
    2,555

    Re: [RESOLVED] Converting String to ByteArray

    Quote Originally Posted by Half View Post
    Note the 'another system' piece of text. You can use a system of dead raccoons to store Unicode data and call it URCns-D but it won't make it Unicode.
    I think you're confusing "Unicode" with "some specific encoding of Unicode". Unicode text is simply a string of code points. Representing these code points as bytes (or, as in this [hopefully] hypothetical example, dead raccoons) is the job of the encoding. So the 'another system' refers to a different encoding system of Unicode code points, not a non-Unicode system.

    Quote Originally Posted by Half View Post
    To illustrate the humongous difference between actual Unicode and a distorted thing like UTF-8:
    input string: "대한민국"
    UTF8.GetBytes("대한민국") => Byte Seq: 235 140 128 237 149 156 235 175 188 234 181 173
    Unicode.GetBytes("대한민국") => Byte Seq: 0 179 92 213 252 187 109 173
    Yes, two different encodings end up with different bytes. Hence why the sending and receiving party need to agree on the encoding used. Maybe you're confused because the encoding class is called "Unicode". I agree it's oddly named, but look up the documentation and you'll find that it's simply little-endian UTF-16.
    The "actual Unicode" isn't the result of Unicode.GetBytes. It's this: U+B300 U+D55C U+BBFC U+AD6D
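
    For anyone who wants to check this on their own machine, a short sketch along these lines (module and variable names are only illustrative) prints the code points and then the bytes each encoding produces for them:

    ```vbnet
    Imports System
    Imports System.Text

    Module CodePointsDemo
        Sub Main()
            ' The four Korean characters, built from their code points so the
            ' source file's own encoding cannot interfere.
            Dim sample As String = ChrW(&HB300) & ChrW(&HD55C) & ChrW(&HBBFC) & ChrW(&HAD6D)

            ' The code points themselves: U+B300 U+D55C U+BBFC U+AD6D
            For Each c As Char In sample
                Console.Write("U+" & AscW(c).ToString("X4") & " ")
            Next
            Console.WriteLine()

            ' UTF-8: EB-8C-80-ED-95-9C-EB-AF-BC-EA-B5-AD (three bytes each)
            Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(sample)))

            ' Encoding.Unicode, i.e. little-endian UTF-16: 00-B3-5C-D5-FC-BB-6D-AD
            Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(sample)))
        End Sub
    End Module
    ```

    Same four code points, two different byte sequences; neither one is more "actual" than the other.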

    Quote Originally Posted by Half View Post
    Use some other encoding in generating the byte sequence and when Windows 9 decides it's time for middle-endiannes, start digging for the source code to make sense of the output.
    No, as the quote that dbasnett has pulled out of the article states, you simply have to know what encoding was used to generate that set of bytes. Windows 9 doesn't decide what the encoding is, the application chooses the encoding scheme it uses to decode the bytes into a string.

    I really think you need to go back and read that article again.

  10. #10
    Frenzied Member
    Join Date
    Mar 2005
    Location
    Sector 001
    Posts
    1,577

    Re: [RESOLVED] Converting String to ByteArray

    Mhhh this whole thing started by me trying to say that it is ok to simply loop through a string and get the bytes of each char. It seemed 'clunky' and 'vb6ish' to the OP but it is neither. If anything, it is C-ish.

    I did not link to the now infamous webpage in order to somehow try to prove that encodings are irrelevant or useless, but to make it easier for those who would like to know why and how UTF8.GetBytes, Unicode.GetBytes, Default.GetBytes etc etc differ from one another.

    I was going to ignore dbasnett's remark since it serves no other purpose other than a shot at being sarcastic but meh: the quoted text is indeed scary but it is just for drama. What happens if we do have a string without knowing its encoding? The Earth becomes a unipolar magnet, a hole in space-time opens, Stephen Hawking turns into a black hole? If it were such a big deal we would use it as an encryption system.

    Quote Originally Posted by Evil_Giraffe View Post
    Quote Originally Posted by Half
    To illustrate the humongous difference between actual Unicode and a distorted thing like UTF-8:
    input string: "대한민국"
    UTF8.GetBytes("대한민국") => Byte Seq: 235 140 128 237 149 156 235 175 188 234 181 173
    Unicode.GetBytes("대한민국") => Byte Seq: 0 179 92 213 252 187 109 173
    Yes, two different encodings end up with different bytes. Hence why the sending and receiving party need to agree on the encoding used. Maybe you're confused because the encoding class is called "Unicode". I agree it's oddly named, but look up the documentation and you'll find that it's simply little-endian UTF-16.
    The "actual Unicode" isn't the result of Unicode.GetBytes. It's this: U+B300 U+D55C U+BBFC U+AD6D
    but but but but... The bytes 0 179 92 213 252 187 109 173 in hex are 00 B3 | 5C D5 | FC BB | 6D AD
    & we all know about endianness

  11. #11
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,897

    Re: [RESOLVED] Converting String to ByteArray

    I was NOT being sarcastic. Several statements were made that were just wrong, and I felt they needed to be corrected. I wouldn't call the quoted text dramatic, unless that is what stating the obvious is.

    Code:
    82BB82EA82CD89E4815882AA96E291E882C582A082E982B182C682F0926D82C182C482A282E982B782D782C482C582CD82C882A2814182BB82EA82CD89E4815882AA82A082E982B182C682F08D7382A482C68E7682ED82EA82E982B782D782C482C582B78142

  13. #13

    Thread Starter
    PowerPoster
    Join Date
    Jul 2006
    Location
    Maldon, Essex. UK
    Posts
    6,334

    Re: [RESOLVED] Converting String to ByteArray

    Although I marked this as resolved (as my original question was answered) the continuing discussion has me interested.

    Let's see if I've got a grip on it yet.....

    If I were developing a multi-lingual international Chat program where some clients may be using, for instance, a Chinese character set and others may be using a Latin based character set, I would need to support multi-byte character transfer. In order to 'unscramble' the data I would need to know the character set that the client is using.

    e.g.
    Client 1 (using Chinese characters) sends a message, the Server picks it up and forwards it to Client 2 who's using English. In order for the Chinese characters to be displayed at Client 2 I would need to also send the Code Page ID(?) or at least something to tell Client 2's program how to interpret the multi-byte format of the data (since multi-byte data can be from 1 to 6 bytes(?)) so that the original Chinese characters are displayed at Client 2.

  14. #14
    Angel of Code Niya's Avatar
    Join Date
    Nov 2011
    Posts
    9,017

    Re: [RESOLVED] Converting String to ByteArray

    Actually, you don't have to do a thing more than encode your strings in a format that can express the Unicode code points used to represent Chinese characters. UTF8 should be sufficient for this, as long as both clients agree that strings passed between them are encoded in UTF8.
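
    As a rough sketch of what that looks like on both ends (the buffer size, variable names and sample text are assumptions, not taken from your client), the sender calls GetBytes and the receiver calls GetString on however many bytes arrived:

    ```vbnet
    Imports System
    Imports System.Text

    Module ChatRoundTripDemo
        Sub Main()
            ' Client 1: two Chinese characters and Latin text in one message.
            Dim outgoing As String = ChrW(&H4F60) & ChrW(&H597D) & " hello" & vbCrLf
            Dim wire As Byte() = Encoding.UTF8.GetBytes(outgoing)

            ' Client 2: a receive buffer larger than the message. bytesRead
            ' stands in for the count a receive call would report.
            Dim buffer(1023) As Byte
            Array.Copy(wire, buffer, wire.Length)
            Dim bytesRead As Integer = wire.Length

            ' Decode only the bytes that were actually received.
            Dim incoming As String = Encoding.UTF8.GetString(buffer, 0, bytesRead)

            Console.WriteLine(incoming = outgoing) ' True
            Console.WriteLine(bytesRead)           ' 14: 3 + 3 + 6 + 2
        End Sub
    End Module
    ```

    No code page ID travels with the message. One caveat for a real socket: a single receive may end in the middle of a multi-byte character, in which case a stateful System.Text.Decoder obtained from Encoding.UTF8.GetDecoder() can carry the partial sequence over to the next receive.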

  15. #15

    Thread Starter
    PowerPoster
    Join Date
    Jul 2006
    Location
    Maldon, Essex. UK
    Posts
    6,334

    Re: [RESOLVED] Converting String to ByteArray

    Ah ha, got it. I could think of UTF8 as a 'universal panacea' in terms of unicode code-points. Once everyone's agreed that is the way data is going to be 'encoded' there shouldn't be any problems. (I'm always a bit wary about 'universal panaceas'.)

  16. #16
    Angel of Code Niya's Avatar
    Join Date
    Nov 2011
    Posts
    9,017

    Re: [RESOLVED] Converting String to ByteArray

    Exactly
