
Thread: FYI: UTF-8 + Manifest on Win10.1903 or better

  1. #1

    Thread Starter
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    FYI: UTF-8 + Manifest on Win10.1903 or better

    Starting with Win10.1903, one can use UTF-8 encoded text and display it in a VB textbox as if the textbox were Unicode-compatible.
    https://docs.microsoft.com/en-us/win...utf8-code-page

    Caveats: a manifest is required, and it must include entries for
    - common controls v6 (i.e., theming)
    - <activeCodePage> element with its value set to: UTF-8
    Code:
    <?xml version="1.0" standalone="yes"?>
    <assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
    	<assemblyIdentity name="My.Cool.New.Application" version="1.0.0.0" type="win32" processorArchitecture="x86"/>
    	<dependency>
    		<dependentAssembly>
    			<assemblyIdentity name="Microsoft.Windows.Common-Controls" version="6.0.0.0" type="win32" processorArchitecture="x86" publicKeyToken="6595b64144ccf1df" language="*"/>
    		</dependentAssembly>
    	</dependency>
    	<trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
    		<security>
    			<requestedPrivileges>
    				<requestedExecutionLevel level="asInvoker" uiAccess="false"/>
    			</requestedPrivileges>
    		</security>
    	</trustInfo>
    	<compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
    		<application>
    			<supportedOS Id="{e2011457-1546-43c5-a5fe-008deee3d3f0}"/>
    			<supportedOS Id="{35138b9a-5d96-4fbd-8e2d-a2440225f93a}"/>
    			<supportedOS Id="{4a2f28e3-53b9-4441-ba9c-d69d4a4a6e38}"/>
    			<supportedOS Id="{1f676c76-80e1-4239-95bb-83d0f6d0da78}"/>
    			<supportedOS Id="{8e0f7a12-bfb3-4fe8-b9a5-48fd50a15a9a}"/>
    		</application>
    	</compatibility>
    	<application xmlns="urn:schemas-microsoft-com:asm.v3">
    		<windowsSettings>
    			<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    		</windowsSettings>
    	</application>
    </assembly>
    A quick test showed that the text appears correctly with or without a BOM. In a sample project, I dumped a bunch of Unicode text into Notepad and had it save the file as UTF-8. I then simply read the file into a byte array, ensured the final byte was zero, and sent that array to the textbox via the SetWindowText API.

    Here's the declaration. Notice that we are using the A (ANSI) version of the API, not the W (Unicode) version. Windows added UTF-8 ability for A-version APIs starting with, I think, Win10.1803.
    Code:
    Private Declare Function SetWindowText Lib "user32.dll" Alias "SetWindowTextA" (ByVal hwnd As Long, lpString As Any) As Long
    
        ' ... read the UTF-8 file into Byte array aData(), making sure the final byte is zero
        SetWindowText Text1.hwnd, aData(0)
    Note: trying this without manifesting for common controls failed to display the text correctly. Trying with common controls but without the new activeCodePage entry also failed. I did not test this with theming disabled.

    This is kinda new. Feel free to comment especially regarding gotchas from personal experiences. Not sure how we might use this in the VB world.

    Edited:
    Should be doable with API-created fonts using the character set 65001. However, VB's stdFont will not accept that character set: stdFont.Charset is an Integer, and 65001 converted to Integer is -535. Negative values are rejected. Attempts to use COM's OleCreateFontIndirect also reject the character set and reset it to zero. Wouldn't be surprised if that changes down the road. But fonts created with APIs like CreateFont/CreateFontIndirect should be OK.
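
    For anyone wondering where -535 comes from, here is a minimal sketch of the arithmetic. The Sub name is made up for illustration; it only mimics squeezing 65001 into 16 signed bits and does not touch stdFont itself.

    ```vb
    Option Explicit

    'Hypothetical demo: reinterpret a code page number as a 16-bit signed Integer.
    Private Sub ShowCharsetWrap()
        Dim CodePage As Long
        Dim Wrapped As Long

        CodePage = 65001
        'Keep the low 16 bits, then map the upper half of the range to negatives.
        Wrapped = CodePage And &HFFFF&
        If Wrapped > 32767 Then Wrapped = Wrapped - 65536
        Debug.Print Wrapped    'prints -535
    End Sub
    ```

    Equivalently, 65001 is &HFDE9, and a four-digit hex literal is an Integer in VB, so Debug.Print &HFDE9 shows the same value.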

    Prior to Win10.1903, UTF-8 parsing/conversion still needs to be done using other methods, converting to Unicode for use with W-suffix APIs. This manifest entry does not apply to those earlier operating systems.
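
    One common way to do that conversion is MultiByteToWideChar with code page 65001. The following is only a sketch under assumptions: Utf8ToString and ShowUtf8 are hypothetical names, the byte array is assumed to be zero-based, and the code is assumed to live in a Form module (it uses Me.hWnd).

    ```vb
    Option Explicit

    Private Const CP_UTF8 As Long = 65001

    Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
        ByVal CodePage As Long, _
        ByVal dwFlags As Long, _
        ByVal lpMultiByteStr As Long, _
        ByVal cbMultiByte As Long, _
        ByVal lpWideCharStr As Long, _
        ByVal cchWideChar As Long) As Long

    Private Declare Function MessageBoxW Lib "user32" ( _
        ByVal hWnd As Long, _
        ByVal lpText As Long, _
        ByVal lpCaption As Long, _
        ByVal uType As Long) As Long

    'Hypothetical helper: converts the first ByteCount bytes of Utf8() to a VB String.
    Private Function Utf8ToString(ByRef Utf8() As Byte, ByVal ByteCount As Long) As String
        Dim CharCount As Long

        If ByteCount <= 0 Then Exit Function
        'First call: ask how many UTF-16 code units are needed.
        CharCount = MultiByteToWideChar(CP_UTF8, 0, VarPtr(Utf8(0)), ByteCount, 0, 0)
        If CharCount = 0 Then Exit Function
        'Second call: convert into the preallocated String buffer.
        Utf8ToString = String$(CharCount, vbNullChar)
        MultiByteToWideChar CP_UTF8, 0, VarPtr(Utf8(0)), ByteCount, _
            StrPtr(Utf8ToString), CharCount
    End Function

    'Usage, assuming aData() holds UTF-8 bytes plus a final zero byte.
    Private Sub ShowUtf8(ByRef aData() As Byte)
        Dim S As String
        Dim Title As String

        S = Utf8ToString(aData, UBound(aData))   'UBound leaves out the terminator
        Title = "UTF-8 test"
        MessageBoxW Me.hWnd, StrPtr(S), StrPtr(Title), vbOKOnly
    End Sub
    ```

    The sketch does not strip a BOM, so a file saved with one will yield a String that starts with U+FEFF.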
    Last edited by LaVolpe; Dec 8th, 2019 at 12:46 PM.
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  2. #2
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: FYI: UTF-8 + Manifest on Win10.1903 or better

    Good find. Interesting even if not obviously useful.

    VB's intrinsic controls' .Text, .Caption, etc. seem to go through OLE plumbing that looks at control.Font.Charset, so I'm not sure we can jigger things to make those work.

    Code:
    Option Explicit
    
    Private Enum WINDOW_MESSAGES
        WM_SETTEXT = &HC&
    End Enum
    #If False Then
    Dim WM_SETTEXT
    #End If
    
    Private Declare Function SendMessageA Lib "user32" ( _
        ByVal hWnd As Long, _
        ByVal wMsg As WINDOW_MESSAGES, _
        Optional ByVal wParam As Long, _
        Optional ByVal lParam As Long) As Long
    
    Private Sub Form_Load()
        Dim F As Integer
        Dim SampleBytes() As Byte
    
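        F = FreeFile(0)
        Open App.Path & "\sample.txt" For Binary Access Read As #F
        'Read the whole file (assumed non-empty) as raw UTF-8 bytes.
        ReDim SampleBytes(LOF(F) - 1)
        Get #F, , SampleBytes
        'Grow by one byte; the new last element is 0 and acts as the terminator.
        ReDim Preserve SampleBytes(LOF(F))
        Close #F
        'Pass a pointer to the bytes straight to the A version of the API.
        SendMessageA Text1.hWnd, WM_SETTEXT, , VarPtr(SampleBytes(0))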
    End Sub
    Code:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
      <assemblyIdentity type="win32" name="Joeblow.Project1" version="1.0.0.0"/>
      <dependency>
        <dependentAssembly>
          <assemblyIdentity language="*" name="Microsoft.Windows.Common-Controls" processorArchitecture="X86" publicKeyToken="6595b64144ccf1df" type="win32" version="6.0.0.0" />
        </dependentAssembly>
      </dependency>
      <application>
        <windowsSettings>
          <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
        </windowsSettings>
      </application>
    </assembly>

  3. #3
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: FYI: UTF-8 + Manifest on Win10.1903 or better

    VB6 native text I/O seems to use some other ANSI-Unicode conversion (OLE? VB runtime?).

    I wonder if FSO's TextStream.ReadLine & .WriteLine might work properly for UTF-8 text with this setting (and opened ANSI) though?

    FSO.OpenTextFile docs say "ASCII" but I wonder if it really might use the current ANSI codepage with system codec calls like MultiByteToWideChar?


    Line by line reading seems to be a common headache when working with MultiByteToWideChar manually to convert blocks of input text. You have to parse the blocks to extract whole lines and since CRLF might span blocks the logic gets a little tricky.
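
    One way to sidestep the CRLF-spanning-blocks problem is to split on the raw bytes before converting, since the LF byte (10) never occurs inside a multi-byte UTF-8 sequence. A rough sketch, assuming manual conversion with MultiByteToWideChar; ReadUtf8Lines, DecodeUtf8 and BLOCK_SIZE are made-up names, and a BOM, if present, is left on the first line.

    ```vb
    Option Explicit

    Private Const CP_UTF8 As Long = 65001

    Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
        ByVal CodePage As Long, ByVal dwFlags As Long, _
        ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
        ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

    'Hypothetical helper: reads a UTF-8 file block by block, returns its lines.
    Private Function ReadUtf8Lines(ByVal FileName As String) As Collection
        Const BLOCK_SIZE As Long = 4096
        Dim Lines As New Collection
        Dim F As Integer
        Dim Block() As Byte
        Dim LineBuf() As Byte      'bytes of the line being assembled
        Dim LineLen As Long
        Dim Remaining As Long
        Dim N As Long
        Dim I As Long

        ReDim LineBuf(0 To BLOCK_SIZE - 1)
        F = FreeFile(0)
        Open FileName For Binary Access Read As #F
        Remaining = LOF(F)
        Do While Remaining > 0
            N = IIf(Remaining < BLOCK_SIZE, Remaining, BLOCK_SIZE)
            ReDim Block(0 To N - 1)
            Get #F, , Block
            Remaining = Remaining - N
            For I = 0 To N - 1
                If Block(I) = 10 Then
                    'LF ends the line; a CR before it is dropped in DecodeUtf8.
                    Lines.Add DecodeUtf8(LineBuf, LineLen)
                    LineLen = 0
                Else
                    'Partial lines simply carry over into the next block.
                    If LineLen > UBound(LineBuf) Then
                        ReDim Preserve LineBuf(0 To LineLen * 2 - 1)
                    End If
                    LineBuf(LineLen) = Block(I)
                    LineLen = LineLen + 1
                End If
            Next
        Loop
        Close #F
        'Final line without a trailing line break.
        If LineLen > 0 Then Lines.Add DecodeUtf8(LineBuf, LineLen)
        Set ReadUtf8Lines = Lines
    End Function

    'Converts the first Count bytes of Bytes() to a String, minus a trailing CR.
    Private Function DecodeUtf8(ByRef Bytes() As Byte, ByVal Count As Long) As String
        Dim Cch As Long

        If Count > 0 Then
            If Bytes(Count - 1) = 13 Then Count = Count - 1
        End If
        If Count = 0 Then Exit Function
        Cch = MultiByteToWideChar(CP_UTF8, 0, VarPtr(Bytes(0)), Count, 0, 0)
        If Cch = 0 Then Exit Function
        DecodeUtf8 = String$(Cch, vbNullChar)
        MultiByteToWideChar CP_UTF8, 0, VarPtr(Bytes(0)), Count, StrPtr(DecodeUtf8), Cch
    End Function
    ```

    Because only whole lines are ever handed to the converter, a multi-byte character or a CRLF pair that straddles two blocks is never split. The byte-at-a-time loop trades speed for simplicity.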

  4. #4
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: FYI: UTF-8 + Manifest on Win10.1903 or better

    I tried FSO I/O and it works just well enough to mislead you into hurting yourself.

    TextStream.ReadAll() seems to work fine. .ReadLine() seems to work in many cases and fail in others, the failures manifesting as a multi-byte UTF-8 character being incorrectly read as several single-byte characters.

  5. #5
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: FYI: UTF-8 + Manifest on Win10.1903 or better

    Here's a case that fails on both methods:

    [Attached image: sshot.png]

  6. #6

    Thread Starter
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: FYI: UTF-8 + Manifest on Win10.1903 or better

    However, the APIs seem to work fine. I'm guessing FSO has problems converting UTF-8? The following was the result of simply using SetWindowText and MsgBox Text1.Text. As we can see below, I think VB's internal Unicode-ANSI conversion has a negative effect: the message box is wider than it should be. So if UTF-8 strings are going to be used, it looks like they should be passed via APIs rather than VB's text/caption properties.

    Left: VB's MsgBox on Text1.Text. Right: MessageBox API passed byte array directly
    [Image: SShot.jpg]
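
    For reference, passing the byte array to the A version of MessageBox could look something like this. It is a sketch, not necessarily the exact call used for the screenshot: ShowUtf8Message is a made-up name, aData() is assumed to hold zero-terminated UTF-8 bytes as in post #1, and the code is assumed to sit in a Form module of an app manifested as described there.

    ```vb
    Option Explicit

    Private Declare Function MessageBoxA Lib "user32" ( _
        ByVal hWnd As Long, _
        ByRef lpText As Any, _
        ByRef lpCaption As Any, _
        ByVal uType As Long) As Long

    'Hypothetical helper: shows zero-terminated UTF-8 bytes without VB converting them.
    Private Sub ShowUtf8Message(ByRef aData() As Byte)
        Dim Caption() As Byte

        'Plain ASCII is byte-for-byte identical in ANSI and UTF-8,
        'so StrConv is safe for this caption; the null char terminates it.
        Caption = StrConv("UTF-8 test" & vbNullChar, vbFromUnicode)
        MessageBoxA Me.hWnd, aData(0), Caption(0), vbOKOnly
    End Sub
    ```

    Passing the first array element ByRef hands the API a pointer to the raw bytes, so VB's own Unicode-ANSI conversion never gets involved.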

    Edited.
    FSO.OpenTextFile docs say "ASCII" but I wonder if it really might use the current ANSI codepage with system codec calls like MultiByteToWideChar?
    ^^ May be spot on

    FYI Your manifest seems to be technically incorrect, but didn't fail. The <application> & <windowsSettings> elements are in the asm.v3 namespace, not asm.v1. At least that is how it is referenced in every MSDN example I've seen.
    Last edited by LaVolpe; Dec 8th, 2019 at 12:57 PM.

  7. #7
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: FYI: UTF-8 + Manifest on Win10.1903 or better

    I'm pretty sure most of asm.v3 got folded back into asm.v1 and asm.v2 discarded with Windows 10 1803.

    Look at the example given at the page you linked to, or pretty much every newer example anywhere at docs.microsoft.com these days. But you can probably still freely decorate with asm.v3 for backward compatibility.

    It does seem arbitrary to the point of near chaos though. I can't find anything that authoritatively lists tags and namespaces for Fusion manifests.

  8. #8

    Thread Starter
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: FYI: UTF-8 + Manifest on Win10.1903 or better

    Quote Originally Posted by dilettante View Post
    I'm pretty sure most of asm.v3 got folded back into asm.v1 and asm.v2 discarded with Windows 10 1803.
    Would be really nice to find some non-conflicting documentation from Microsoft. Even the various windowsSettings namespaces can be iffy depending on which MSDN page you happen to land on.
