Thread: [RESOLVED] Problem with InkEdit Control

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Dec 2012
    Posts
    1,477

    [RESOLVED] Problem with InkEdit Control

    This one has me stumped. I am using an InkEdit Control called "txtDisp" in order to display UTF-8 samples.
    Code:
    Private Sub cmdHebrew_Click()
        Dim sUtf8 As String
        Dim bRev() As Byte
        DispByte "Hebrew", bHebrew
        sUtf8 = Utf8ToStr(bHebrew)
        txtDisp = txtDisp & sUtf8 & vbCrLf
        bRev = StrToUtf8(sUtf8)
        DispByte "Reversed", bRev
        txtDisp = txtDisp & "String Size:" & Len(sUtf8) & " Byte Size:" & GetbSize(bHebrew) & vbLf
    End Sub
    On 4 of those samples, everything works as it should. However, on the above sample it shows:
    Code:
    Hebrew:
    61 62 63 20 D7 9B D7 A9 D7 A8 20 66 31 32 33    
    abc כשר f123
    Reversed:
    61 62 63 20 D7 9B D7 A9 D7 A33 32 31 66 20 8  
    String Size:12 Byte Size:15
    I have narrowed it down to the line:
    txtDisp = txtDisp & sUtf8 & vbCrLf
    If I remove that line, I get:
    Code:
    Hebrew:
    61 62 63 20 D7 9B D7 A9 D7 A8 20 66 31 32 33    
    Reversed:
    61 62 63 20 D7 9B D7 A9 D7 A8 20 66 31 32 33    
    String Size:12 Byte Size:15
    Any ideas?

    J.A. Coutts
    Last edited by couttsj; Aug 5th, 2021 at 12:36 AM.

  2. #2
    PowerPoster wqweto's Avatar
    Join Date
    May 2011
    Location
    Sofia, Bulgaria
    Posts
    5,156

    Re: Problem with InkEdit Control

    You can append text to a TextBox/RichTextBox/InkEdit more efficiently by using the SelText property, like this:

    Code:
        txtDisp.SelStart = &H7FFFFFFF
        txtDisp.SelText = sUtf8 & vbCrLf
    cheers,
    </wqw>

  3. #3
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Problem with InkEdit Control

    It looks like you have shoved UTF-8 into an ANSI String and then blithely treated it as Unicode.

    "Unicode" means UTF-16LE in Windows. UTF-8 is entirely different and requires special handling. This is mainly historical. There was no UTF-8 yet when Microsoft first implemented Unicode in NT 3.

    Windows 11 begins encouraging use of UTF-8 for text. We may get additional system APIs for dealing with it, but OLE and of course VB are unlikely to get any enhancements for dealing with UTF-8 natively.


    EM_STREAMIN and EM_STREAMOUT can use the SF_USECODEPAGE flag and the UTF-8 code page to import/export UTF-8 text or RTF. That requires a 3.0 or later RichEdit, which the InkEdit wraps.

    But the InkEdit.Text and .TextRTF properties do not magically detect and process a string of UTF-8 as far as I know. Even if they did, concatenation (&) does not.
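
    To see what "Unicode means UTF-16LE" looks like in VB6 terms, here is a rough sketch. HexOfString is a hypothetical helper, not something from the code in this thread. It relies on the fact that assigning a String to a dynamic Byte array copies the String's internal UTF-16LE bytes with no ANSI conversion.

    ```vb
    ' Hypothetical helper: hex dump of the raw bytes VB6 stores for a String.
    Private Function HexOfString(ByVal S As String) As String
        Dim Bytes() As Byte
        Dim Parts() As String
        Dim I As Long

        If LenB(S) = 0 Then Exit Function   ' Empty String: nothing to dump.
        Bytes = S                            ' String -> Byte() copies the UTF-16LE bytes as-is.
        ReDim Parts(UBound(Bytes))
        For I = 0 To UBound(Bytes)
            Parts(I) = Right$("0" & Hex$(Bytes(I)), 2)   ' Two hex digits per byte.
        Next
        HexOfString = Join(Parts, " ")
    End Function

    Private Sub DemoHexOfString()
        ' ChrW$ builds the three Hebrew letters without depending on the IDE's code page.
        Debug.Print HexOfString("abc " & ChrW$(&H5DB) & ChrW$(&H5E9) & ChrW$(&H5E8) & " f123")
        ' Prints: 61 00 62 00 63 00 20 00 DB 05 E9 05 E8 05 20 00 66 00 31 00 32 00 33 00
    End Sub
    ```

    Every character takes two bytes here, including the plain ASCII ones. The UTF-8 form of the same text uses one byte for each ASCII character and two for each Hebrew letter, so the two dumps are not interchangeable.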

  4. #4

    Thread Starter
    Frenzied Member
    Join Date
    Dec 2012
    Posts
    1,477

    Re: Problem with InkEdit Control

    Historically browsers have used % encoding. The RFCs are very vague about how URLs and URIs are encoded, but browsers seem to accept UTF-8 encoding without % encoding. I find it very difficult to test Unicode, so I found a demo on DI Management to try and get some data to use. It extracts samples from an Excel Spreadsheet and uses WideCharToMultiByte to convert them to UTF-8 strings. Since I had no desire to use Excel, I took the UTF-8 byte arrays and converted them back to Unicode characters using MultiByteToWideChar. I had to use InkEdit in order to display the results, and this glitch popped up when I tried to convert the Hebrew sample back to UTF-8. I am only assuming that the Hebrew byte array is correct and that the problem is with the InkEdit control.

    Regardless, it is only a demo, which I will post in the CodeBank.

    J.A. Coutts

  5. #5
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Problem with InkEdit Control

    Well you seem to be very, very lost.

    UTF-8 is not Unicode. At least not to Windows. Instead it is a multibyte character encoding.

    VB6 and InkEdit both work with Unicode, i.e. UTF-16LE.

    There is no problem with InkEdit. It works fine here when properly used:

    Code:
    Option Explicit
    
    Private Enum CBTS_FLAGS
        CRYPT_STRING_HEX = &H4&
        CRYPT_STRING_HEXRAW = &HC&
        CRYPT_STRING_NOCRLF = &H40000000
    End Enum
    
    Private Declare Function CryptBinaryToString Lib "Crypt32" _
        Alias "CryptBinaryToStringW" ( _
        ByRef pbBinary As Byte, _
        ByVal cbBinary As Long, _
        ByVal dwFlags As Long, _
        ByVal pszString As Long, _
        ByRef pcchString As Long) As Long
    
    Private Const CP_UTF8 As Long = 65001
    
    Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
        ByVal CodePage As Long, _
        ByVal dwFlags As Long, _
        ByVal lpMultiByteStr As Long, _
        ByVal cchMultiByte As Long, _
        ByVal lpWideCharStr As Long, _
        ByVal cchWideChar As Long) As Long
    
    Private Declare Function WideCharToMultiByte Lib "kernel32" ( _
        ByVal CodePage As Long, _
        ByVal dwFlags As Long, _
        ByVal lpWideCharStr As Long, _
        ByVal cchWideChar As Long, _
        ByVal lpMultiByteStr As Long, _
        ByVal cchMultiByte As Long, _
        ByVal lpDefaultChar As Long, _
        ByVal lpUsedDefaultChar As Long) As Long
    
    Private Function DumpHex( _
        ByRef Bytes() As Byte, _
        Optional ByVal Flags As CBTS_FLAGS = CRYPT_STRING_HEX Or CRYPT_STRING_NOCRLF) As String
    
        Dim OutLength As Long
        
        CryptBinaryToString Bytes(LBound(Bytes)), _
                            UBound(Bytes) - LBound(Bytes) + 1, _
                            Flags, _
                            0&, _
                            OutLength
        DumpHex = String$(OutLength, 0)
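        CryptBinaryToString Bytes(LBound(Bytes)), _
                            UBound(Bytes) - LBound(Bytes) + 1, _
                            Flags, _
                            StrPtr(DumpHex), _
                            OutLength
        'The sizing call counts the terminating NUL, this call does not, so trim it:
        DumpHex = Left$(DumpHex, OutLength)
    End Function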
    
    Private Function FromUTF8(ByRef UTF8() As Byte) As String
        Dim CountBytes As Long
        Dim OutLength As Long
        
        CountBytes = UBound(UTF8) - LBound(UTF8) + 1
        OutLength = MultiByteToWideChar(CP_UTF8, 0, VarPtr(UTF8(LBound(UTF8))), _
                                        CountBytes, 0, 0)
        FromUTF8 = String$(OutLength, 0)
        MultiByteToWideChar CP_UTF8, 0, VarPtr(UTF8(LBound(UTF8))), _
                            CountBytes, StrPtr(FromUTF8), OutLength
    End Function
    
    Private Function ToUTF8(ByVal Wide As String) As Byte()
        Dim OutLength As Long
        Dim UTF8() As Byte
        
        OutLength = WideCharToMultiByte(CP_UTF8, 0, StrPtr(Wide), Len(Wide), _
                                        0, 0, 0, 0)
        ReDim UTF8(OutLength - 1)
        WideCharToMultiByte CP_UTF8, 0, StrPtr(Wide), Len(Wide), _
                            VarPtr(UTF8(0)), OutLength, 0, 0
        ToUTF8 = UTF8
    End Function
    
    Private Sub Form_Load()
        Dim F As Integer
        Dim Bytes() As Byte
    
        With InkEdit1
            .SelText = "Hebrew:"
            .SelText = vbNewLine
    
            'Get bytes from a file of Hebrew text, UTF-8 No BOM:
            F = FreeFile(0)
            Open "Hebrew UTF-8.txt" For Binary Access Read As #F
            ReDim Bytes(LOF(F) - 1)
            Get #F, , Bytes
            Close #F
            Text1.Text = DumpHex(Bytes)
            .SelText = FromUTF8(Bytes)
    
            'The Hebrew text ended with a newline, so just:
            .SelText = "More text."
            .SelStart = 0
        End With
    End Sub
    
    Private Sub Form_Resize()
        If WindowState <> vbMinimized Then
            With InkEdit1
                .Move 0, 0, ScaleWidth, ScaleHeight / 2
                Text1.Move 0, .Height, ScaleWidth, ScaleHeight - .Height
            End With
        End If
    End Sub
    
    Private Sub InkEdit1_SelChange()
        mnuDumpSelection.Enabled = InkEdit1.SelLength <> 0
    End Sub
    
    Private Sub mnuDumpSelection_Click()
        'Dump UTF-8 of selection:
        Text1.Text = DumpHex(ToUTF8(InkEdit1.SelText))
    End Sub

  6. #6

    Thread Starter
    Frenzied Member
    Join Date
    Dec 2012
    Posts
    1,477

    Re: Problem with InkEdit Control

    I will make it simpler.
    Code:
    Private Sub cmdTest_Click()
        Dim sUtf8 As String
        Dim sTmp As String
        sUtf8 = Utf8ToStr(bHebrew)
        sTmp = sUtf8 & vbLf & HexDump(bHebrew)
        Debug.Print sTmp
        InkEdit1 = sTmp
    End Sub
    The Debug.Print produces:
    Code:
    abc ??? f123
    61 62 63 20 D7 9B D7 A9 D7 A8 20 66 31 32 33
    The InkEdit1 display produces:
    Code:
    abc כשר f123
    61 62 63 20 D9 7B D7 A9 D7 A33 32 31 66 20 8
    J.A. Coutts

  7. #7
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Problem with InkEdit Control

    Well it works just fine here:

    Code:
    Option Explicit
    
    Private Enum CBTS_FLAGS
        CRYPT_STRING_HEX = &H4&
        CRYPT_STRING_HEXRAW = &HC&
        CRYPT_STRING_NOCRLF = &H40000000
    End Enum
    
    Private Declare Function CryptBinaryToString Lib "Crypt32" _
        Alias "CryptBinaryToStringW" ( _
        ByRef pbBinary As Any, _
        ByVal cbBinary As Long, _
        ByVal dwFlags As Long, _
        ByVal pszString As Long, _
        ByRef pcchString As Long) As Long
    
    Private Const CP_UTF8 As Long = 65001
    
    Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
        ByVal CodePage As Long, _
        ByVal dwFlags As Long, _
        ByVal lpMultiByteStr As Long, _
        ByVal cchMultiByte As Long, _
        ByVal lpWideCharStr As Long, _
        ByVal cchWideChar As Long) As Long
    
    Private Function DumpHex( _
        ByRef S As String, _
        Optional ByVal Flags As CBTS_FLAGS = CRYPT_STRING_HEX Or CRYPT_STRING_NOCRLF) As String
    
        Dim OutLength As Long
        
        CryptBinaryToString ByVal StrPtr(S), _
                            LenB(S), _
                            Flags, _
                            0&, _
                            OutLength
        DumpHex = String$(OutLength, 0)
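        CryptBinaryToString ByVal StrPtr(S), _
                            LenB(S), _
                            Flags, _
                            StrPtr(DumpHex), _
                            OutLength
        'The sizing call counts the terminating NUL, this call does not, so trim it:
        DumpHex = Left$(DumpHex, OutLength)
    End Function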
    
    Private Function FromUTF8(ByRef UTF8() As Byte) As String
        Dim CountBytes As Long
        Dim OutLength As Long
        
        CountBytes = UBound(UTF8) - LBound(UTF8) + 1
        OutLength = MultiByteToWideChar(CP_UTF8, 0, VarPtr(UTF8(LBound(UTF8))), _
                                        CountBytes, 0, 0)
        FromUTF8 = String$(OutLength, 0)
        MultiByteToWideChar CP_UTF8, 0, VarPtr(UTF8(LBound(UTF8))), _
                            CountBytes, StrPtr(FromUTF8), OutLength
    End Function
    
    Private Sub Form_Load()
        Dim F As Integer
        Dim Bytes() As Byte
        Dim SFromUTF8 As String
        Dim SOut As String
    
        'Get bytes from a file of Hebrew text, UTF-8 No BOM:
        F = FreeFile(0)
        Open "Hebrew UTF-8.txt" For Binary Access Read As #F
        ReDim Bytes(LOF(F) - 1)
        Get #F, , Bytes
        Close #F
    
        SFromUTF8 = FromUTF8(Bytes)
        SOut = SFromUTF8 & vbNewLine & DumpHex(SFromUTF8)
    
        'Results:
        InkEdit1.Text = SOut
        Text1.Text = SOut
        Debug.Print SOut
    End Sub
    
    Private Sub Form_Resize()
        If WindowState <> vbMinimized Then
            With InkEdit1
                .Move 0, 0, ScaleWidth, ScaleHeight / 2
                Text1.Move 0, .Height, ScaleWidth, ScaleHeight - .Height
            End With
        End If
    End Sub
    Hebrew UTF-8.txt:
    Code:
    abc כשר f123
    Seems most likely you have something else going on there.

  8. #8
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Problem with InkEdit Control

    BTW:

    It should look like:

    Code:
    abc ??? f123
    61 00 62 00 63 00 20 00 db 05 e9 05 e8 05 20 00 66 00 31 00 32 00 33 00
    and:

    Code:
    abc כשר f123
    61 00 62 00 63 00 20 00 db 05 e9 05 e8 05 20 00 66 00 31 00 32 00 33 00
    So even your "hex dump" looks a bit dicey.

    Are you using ANSI-converting functions like Asc() or something?
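
    In case it helps, this is the kind of difference I mean. It is just a sketch (ShowAscVsAscW is a made-up name), and the Asc result depends on the system ANSI code page, so treat the second comment as an expectation rather than a guarantee.

    ```vb
    ' Hypothetical demo: Asc() goes through the ANSI code page, AscW() does not.
    Private Sub ShowAscVsAscW()
        Dim Kaf As String

        Kaf = ChrW$(&H5DB)            ' HEBREW LETTER KAF, U+05DB

        Debug.Print Hex$(AscW(Kaf))   ' 5DB: the UTF-16 code unit, unchanged.
        Debug.Print Hex$(Asc(Kaf))    ' Converted to ANSI first. Typically 3F ("?") when the
                                      ' system code page has no Hebrew, otherwise a
                                      ' code-page-specific byte.
    End Sub
    ```

    A hex dump built on Asc(), Chr$() or StrConv(..., vbFromUnicode) will show the ANSI-converted values, not the bytes that are actually in the String.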
    Last edited by dilettante; Aug 5th, 2021 at 07:30 PM.

  9. #9

    Thread Starter
    Frenzied Member
    Join Date
    Dec 2012
    Posts
    1,477

    Re: Problem with InkEdit Control

    Quote: Originally posted by dilettante
    So even your "hex dump" looks a bit dicey.

    Are you using ANSI-converting functions like Asc() or something?
    The full code is available here:
    https://www.vbforums.com/showthread....896-UTF-8-Demo
    It looks like the display of the Hebrew string is somehow affecting the display of the ASCII text that follows it. If you look closely at the hex values, you can see that the last six values (33 32 31 66 20 8) are the reverse of what they should be (8 20 66 31 32 33). The difference between the Hebrew sample and the others is that it seems to be a mixture of 8-bit and 16-bit values.
    61|62|63|20|D7 9B|D7 A9|D7 A8|20|66|31|32|33

    My knowledge of these non-Latin character sets is very limited, as is my knowledge of the internal workings of the InkEdit Control. For your info, here is a complete hex dump of the combined string.
    Code:
    sTmp:
    61 00 62 00 63 00 20 00 DB 05 E9 05 E8 05 20 00 
    66 00 31 00 32 00 33 00 0A 00
    36 00 31 00 20 00 36 00 32 00 20 00 36 00 33 00 
    20 00 32 00 30 00 20 00 44 00 37 00 20 00 39 00 
    42 00 20 00 44 00 37 00 20 00 41 00 39 00 20 00 
    44 00 37 00 20 00 41 00 38 00 20 00 32 00 30 00 
    20 00 36 00 36 00 20 00 33 00 31 00 20 00 33 00 
    32 00 20 00 33 00 33 00 20 00 0A 00
    J.A. Coutts
    Last edited by couttsj; Aug 5th, 2021 at 08:10 PM.

  10. #10

    Thread Starter
    Frenzied Member
    Join Date
    Dec 2012
    Posts
    1,477

    Re: Problem with InkEdit Control

    Following wqweto's advice, the following works properly.
    Code:
    Private Sub cmdtest_Click()
        Dim sUtf8 As String
        Dim sTmp As String
        sUtf8 = Utf8ToStr(bHebrew)
        txtDisp = sUtf8 & vbLf
        sTmp = HexDump(bHebrew)
        txtDisp.SelStart = Len(txtDisp.Text)
        txtDisp.SelText = sTmp
    End Sub
    J.A. Coutts

  11. #11
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Problem with InkEdit Control

    No, UTF-16 Windows Unicode is always 16 bits... well except when it is 32 bits. But that's a rare case known as surrogate pairs.

    UTF-8 encoded characters can be 8, 16, 24, or 32 bits. Normally this is stated as 1 to 4 octets.
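
    As a rough illustration of those sizes, here is a sketch that works purely from the code point value. Utf8Octets and Utf16Units are hypothetical helper names, and neither one validates its input (negative values and surrogate code points are not rejected).

    ```vb
    ' Hypothetical helper: UTF-8 octets needed for one Unicode code point.
    ' Note the trailing "&" on the literals: without it, hex literals from
    ' &H8000 to &HFFFF are Integers and come out negative (&HFFFF = -1).
    Private Function Utf8Octets(ByVal CodePoint As Long) As Long
        Select Case CodePoint
            Case Is <= &H7F&
                Utf8Octets = 1      ' U+0000..U+007F (ASCII)
            Case Is <= &H7FF&
                Utf8Octets = 2      ' U+0080..U+07FF (includes Hebrew)
            Case Is <= &HFFFF&
                Utf8Octets = 3      ' U+0800..U+FFFF (rest of the BMP)
            Case Else
                Utf8Octets = 4      ' U+10000..U+10FFFF
        End Select
    End Function

    ' Hypothetical helper: 16-bit UTF-16 code units needed for one code point.
    Private Function Utf16Units(ByVal CodePoint As Long) As Long
        If CodePoint <= &HFFFF& Then
            Utf16Units = 1          ' One 16-bit unit inside the BMP.
        Else
            Utf16Units = 2          ' A surrogate pair (32 bits) above it.
        End If
    End Function
    ```

    Utf8Octets(&H5DB) returns 2, which matches the D7 9B pair in your dump. Nine ASCII characters at 1 octet each plus three Hebrew letters at 2 octets each gives 15 bytes for 12 characters, the same "String Size:12 Byte Size:15" you reported in post #1.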

    As I showed above, there is no InkEdit problem. You'll need to debug your code to figure out where it goes wrong.

    Your post #6 code displays the same String twice. Aside from the characters with no ANSI mapping that display as a "?" or "▯" depending on the control and font, the two should appear identical.

  12. #12
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Problem with InkEdit Control

    I did a bit more fiddling, and I did indeed find some ways that using right-to-left characters with RichEdit controls (like the InkEdit) can cause weird problems when you mix them with left-to-right characters.

    I didn't find a pattern. Some combinations work fine, others mess up as you showed.
