The original CSocket/SocketMaster emulated the Microsoft Winsock Control. cSocket2 introduced the newer dual-stack calls that support IPv6 as well as IPv4, but remained procedurally the same. All of them supported sending and receiving all types of data (byte, string, integer, long, etc.). WSA sockets treat everything as byte data and could not care less what type of data it is; it is up to the higher-level program to decide how that data is treated. What I found was that all my programs using cSocket2 used string data, which left a lot of the code supporting the other data types unused and redundant. Even though converting Unicode string data (16-bit) to byte data (8-bit) is inefficient, VB6 contains a plethora of commands that make manipulating string data relatively easy, and it inherently takes care of garbage collection.
NewSocket sends and receives string data only. There is only a single conversion made to and from byte data.
I was not happy with the way cSocket2 handled errors, so now all errors are returned to the calling program for handling through the Error Event.
Modern cryptographic techniques are built on top of TCP/IP, so a new event called "EncrDataArrival" was added, along with a flag that allows incoming encrypted data to be treated differently from plain data.
Originally, the LocalPort (Property Get) routine would simply return the value of m_lngLocalPort, which was assigned during the Bind process. However, on a Connect call where Bind has not been called, the system automatically assigns the first available local port, and the routine would return zero. It was changed to use GetLocalPort.
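To illustrate the LocalPort change, here is a minimal sketch of how the port of a connected but unbound socket could be read back from Winsock. It is not the actual GetLocalPort routine from NewSocket; the function name QueryLocalPort and the hSocket parameter are hypothetical. It assumes WSAStartup has already been called and that hSocket is a valid, connected socket handle.

```vb
Private Declare Function getsockname Lib "ws2_32.dll" ( _
    ByVal s As Long, _
    ByRef name As Any, _
    ByRef namelen As Long) As Long

Private Const SOCKET_ERROR As Long = -1

'Hypothetical helper, not the class's own GetLocalPort.
'Returns the local port the system assigned to hSocket, or 0 on failure.
Public Function QueryLocalPort(ByVal hSocket As Long) As Long
    '28 bytes is large enough for sockaddr_in6 (and therefore sockaddr_in)
    Dim bAddr(0 To 27) As Byte
    Dim lLen As Long

    lLen = 28
    If getsockname(hSocket, bAddr(0), lLen) = SOCKET_ERROR Then
        QueryLocalPort = 0
    Else
        'In both sockaddr_in and sockaddr_in6 the port sits at offset 2,
        'stored in network byte order (high byte first)
        QueryLocalPort = bAddr(2) * 256& + bAddr(3)
    End If
End Function
```

Reading the two port bytes directly avoids declaring separate IPv4 and IPv6 address structures, which seemed the simplest choice for a dual-stack socket.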
I have never had a problem using this function to convert from Unicode string to byte array and back again. If there are other issues, please point me in the right direction.
On Japanese systems even ANSI is double-byte. StrConv cuts off the second byte, so there are some situations where StrConv causes a loss of data.
The help file says that it is supposed to do the conversion according to the default code page of the system it is running on (much like the date settings).
-----------------------------------------------------------------------
vbUnicode - 64 - Converts the string to Unicode using the default code page of the system.
vbFromUnicode - 128 - Converts the string from Unicode to the default code page of the system.
-----------------------------------------------------------------------
Are you saying that this is not always the case? And if so, under what conditions?
It certainly seems to work for standard ANSI characters.
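One way to check whether a given string is affected is a round trip through the system code page. The sketch below is only an illustration; the function name SurvivesAnsiRoundTrip is made up for this example. It converts a string to ANSI bytes with StrConv, converts it back, and compares the result with the original.

```vb
'Hypothetical helper: True if sText comes back unchanged after being
'converted to the system ANSI code page and back again.
Public Function SurvivesAnsiRoundTrip(ByRef sText As String) As Boolean
    Dim bAnsi() As Byte
    Dim sBack As String

    bAnsi = StrConv(sText, vbFromUnicode)   'Unicode -> default code page
    sBack = StrConv(bAnsi, vbUnicode)       'default code page -> Unicode
    SurvivesAnsiRoundTrip = (StrComp(sText, sBack, vbBinaryCompare) = 0)
End Function

Private Sub Command2_Click()
    'Plain ASCII is part of every Windows ANSI code page
    Debug.Print SurvivesAnsiRoundTrip("Testing")
    'The result for these two CJK characters depends on the system code page
    Debug.Print SurvivesAnsiRoundTrip(ChrW$(&H4E2D) & ChrW$(&H6587))
End Sub
```

A False result means the string contains characters that the default code page cannot represent, so sending it as ANSI bytes would lose data.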
Found this little routine that could be used to replace StrConv. Trouble is I don't know how you would determine when to use it, since it does require setting a flag.
J.A. Coutts
Edit: A much cleaner (and presumably faster) way to convert Unicode byte arrays is to coerce it into a string using the Let statement. Sample code below has been modified to reflect that.
Code:
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (hpvDest As Any, hpvSource As Any, ByVal cbCopy As Long)
'CodePage
Private Const CP_ACP = 0 'ANSI
Private Const CP_MACCP = 2 'Mac
Private Const CP_OEMCP = 1 'OEM
Private Const CP_UTF7 = 65000
Private Const CP_UTF8 = 65001
'dwFlags
Private Const WC_NO_BEST_FIT_CHARS = &H400
Private Const WC_COMPOSITECHECK = &H200
Private Const WC_DISCARDNS = &H10
Private Const WC_SEPCHARS = &H20 'Default
Private Const WC_DEFAULTCHAR = &H40
'The WideCharToMultiByte function maps a wide-character string to a new character string.
'The function is faster when both lpDefaultChar and lpUsedDefaultChar are NULL.
Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal CodePage As Long, _
ByVal dwFlags As Long, _
ByVal lpWideCharStr As Long, _
ByVal cchWideChar As Long, _
ByVal lpMultiByteStr As Long, _
ByVal cbMultiByte As Long, _
ByVal lpDefaultChar As Long, _
ByVal lpUsedDefaultChar As Long) As Long
Private Sub Command1_Click()
Dim bANSI() As Byte
Dim bUNI() As Byte
Dim sTest As String
Dim sRestore As String
Dim I As Long
sTest = "Testing"
bANSI = StringToByteArray(sTest, False)
Debug.Print "ANSI = '" & ByteArrayToString(bANSI) & "' Len = " & UBound(bANSI) + 1
For I = 0 To UBound(bANSI)
Debug.Print Hex$(bANSI(I)) & " ";
Next
Debug.Print
bUNI = StringToByteArray(sTest)
Debug.Print "UNICODE = '" & ByteArrayToString(bUNI) & "' Len = " & UBound(bUNI) + 1
For I = 0 To UBound(bUNI)
Debug.Print Hex$(bUNI(I)) & " ";
Next
Debug.Print
'Using StrConv to convert a Unicode (UTF-16) byte array directly
'will leave extra embedded nulls in the resulting string, because
'StrConv treats every byte as an ANSI character
Debug.Print "StrConv ANSI = " & StrConv(bANSI, vbUnicode)
Debug.Print "StrConv UNICODE = " & StrConv(bUNI, vbUnicode)
Let sRestore = bANSI
Debug.Print "Assign ANSI = " & sRestore
Let sRestore = bUNI
Debug.Print "Assign UNICODE = " & sRestore
End Sub
Private Function ByteArrayToString(Bytes() As Byte) As String
Dim iUnicode As Long, I As Long, j As Long
On Error Resume Next
I = UBound(Bytes)
If (I < 1) Then
'ANSI, just convert to unicode and return
ByteArrayToString = StrConv(Bytes, vbUnicode)
Exit Function
End If
I = I + 1
'Examine the first two bytes
CopyMemory iUnicode, Bytes(0), 2
If iUnicode = Bytes(0) Then 'Unicode
'Account for terminating null
If (I Mod 2) Then I = I - 1
'Set up a buffer to receive the string (two bytes per character)
ByteArrayToString = String$(I \ 2, 0)
'Copy to string
CopyMemory ByVal StrPtr(ByteArrayToString), Bytes(0), I
Else 'ANSI
ByteArrayToString = StrConv(Bytes, vbUnicode)
End If
End Function
Private Function StringToByteArray(strInput As String, Optional bReturnAsUnicode As Boolean = True, Optional bAddNullTerminator As Boolean = False) As Byte()
Dim lRet As Long
Dim byteBuffer() As Byte
Dim lLenB As Long
If bReturnAsUnicode Then
lLenB = LenB(strInput)
'Resize buffer, do we want terminating null?
If bAddNullTerminator Then
ReDim byteBuffer(lLenB)
Else
ReDim byteBuffer(lLenB - 1)
End If
'Copy characters from string to byte array
CopyMemory byteBuffer(0), ByVal StrPtr(strInput), lLenB
Else
lLenB = Len(strInput)
'Num of characters
If bAddNullTerminator Then
ReDim byteBuffer(lLenB)
Else
ReDim byteBuffer(lLenB - 1)
End If
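'Pass the exact character count instead of -1; with -1 the API also converts
'the terminating null, which does not fit in cbMultiByte and makes the call fail.
'Note: this assumes one byte per character, which does not hold for double-byte code pages.
lRet = WideCharToMultiByte(CP_ACP, 0&, StrPtr(strInput), lLenB, VarPtr(byteBuffer(0)), lLenB, 0&, 0&)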
End If
StringToByteArray = byteBuffer
End Function
Last edited by couttsj; Mar 19th, 2014 at 01:08 PM.
For Unicode-capable socket channels, UTF-8 (without BOM) causes the fewest problems as the byte-array format that is finally transported,
because the character range 0 - 127 is encoded 1:1 compatible with ASCII, and
vice versa.
This ensures that protocols such as HTTP (all the header prefixes and so on) translate as they should, while the content
that follows can still contain character sequences from the full Unicode range.
With the MultiByte APIs you're already on the right track - just use them with the correct UTF-8 code page, 65001
(ready-to-use conversion functions for UTF8ToVBString and vice versa already exist on the web, though).
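As a rough sketch of that suggestion, the two functions below convert a VB string to UTF-8 bytes and back using code page 65001. The names StringToUtf8 and Utf8ToString are made up for this example and are not part of NewSocket. Each function calls the API twice: once to ask for the required size, then again to do the conversion.

```vb
Private Declare Function WideCharToMultiByte Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long
Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Private Const CP_UTF8 As Long = 65001

'Hypothetical helper: returns sText as UTF-8 bytes (no BOM, no terminating null).
'An empty string returns an unallocated array.
Public Function StringToUtf8(ByRef sText As String) As Byte()
    Dim lChars As Long, lBytes As Long
    Dim bOut() As Byte

    lChars = Len(sText)
    If lChars = 0 Then Exit Function
    'First call: ask how many bytes are needed
    lBytes = WideCharToMultiByte(CP_UTF8, 0, StrPtr(sText), lChars, 0, 0, 0, 0)
    If lBytes = 0 Then Exit Function
    ReDim bOut(0 To lBytes - 1)
    'Second call: do the conversion into the buffer
    WideCharToMultiByte CP_UTF8, 0, StrPtr(sText), lChars, VarPtr(bOut(0)), lBytes, 0, 0
    StringToUtf8 = bOut
End Function

'Hypothetical helper: converts UTF-8 bytes back to a VB (UTF-16) string.
Public Function Utf8ToString(ByRef bData() As Byte) As String
    Dim lBytes As Long, lChars As Long

    On Error Resume Next            'UBound fails on an unallocated array
    lBytes = UBound(bData) - LBound(bData) + 1
    On Error GoTo 0
    If lBytes <= 0 Then Exit Function
    'First call: ask how many characters are needed
    lChars = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bData(LBound(bData))), lBytes, 0, 0)
    If lChars = 0 Then Exit Function
    Utf8ToString = String$(lChars, vbNullChar)
    'Second call: convert directly into the string buffer
    MultiByteToWideChar CP_UTF8, 0, VarPtr(bData(LBound(bData))), lBytes, StrPtr(Utf8ToString), lChars
End Function
```

One thing to keep in mind with stream sockets: a multi-byte UTF-8 sequence can be split across two receives, so it is probably safer to convert only once a complete message has been assembled.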
I have a class, obtained from somewhere, that claims to support Unicode, but I never got a chance to test it.
Refer to attachment.
Last edited by Jonney; Mar 20th, 2014 at 10:05 PM.