FYI: UTF-8 + Manifest on Win10.1903 or better
Starting with Win10.1903, you can use UTF-8 encoded text and display it in a VB textbox as if the textbox were Unicode-compatible.
https://docs.microsoft.com/en-us/win...utf8-code-page
Caveat: a manifest is required, and it must include entries for:
- common controls v6 (i.e., theming)
- the <activeCodePage> element, with its value set to UTF-8
Code:
<?xml version="1.0" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
<assemblyIdentity name="My.Cool.New.Application" version="1.0.0.0" type="win32" processorArchitecture="x86"/>
<dependency>
<dependentAssembly>
<assemblyIdentity name="Microsoft.Windows.Common-Controls" version="6.0.0.0" type="win32" processorArchitecture="x86" publicKeyToken="6595b64144ccf1df" language="*"/>
</dependentAssembly>
</dependency>
<trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
<security>
<requestedPrivileges>
<requestedExecutionLevel level="asInvoker" uiAccess="false"/>
</requestedPrivileges>
</security>
</trustInfo>
<compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
<application>
<supportedOS Id="{e2011457-1546-43c5-a5fe-008deee3d3f0}"/>
<supportedOS Id="{35138b9a-5d96-4fbd-8e2d-a2440225f93a}"/>
<supportedOS Id="{4a2f28e3-53b9-4441-ba9c-d69d4a4a6e38}"/>
<supportedOS Id="{1f676c76-80e1-4239-95bb-83d0f6d0da78}"/>
<supportedOS Id="{8e0f7a12-bfb3-4fe8-b9a5-48fd50a15a9a}"/>
</application>
</compatibility>
<application xmlns="urn:schemas-microsoft-com:asm.v3">
<windowsSettings>
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
</windowsSettings>
</application>
</assembly>
A quick test showed that the text appears correctly with or without a BOM. In a sample project, I pasted a bunch of Unicode text into Notepad and had it save the file as UTF-8. I then read the file into a byte array, made sure the final byte was zero (a NUL terminator), and sent that array to the textbox via the SetWindowText API.
Here's the declaration. Notice that we are using the A (ANSI) version of the API, not the W (Unicode) version. Windows added UTF-8 support for the A-version APIs starting, I think, with Win10.1803.
Code:
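' A (ANSI) version; lpString As Any lets us pass the first element of a Byte array ByRef
Private Declare Function SetWindowText Lib "user32.dll" Alias "SetWindowTextA" (ByVal hwnd As Long, lpString As Any) As Long
' ... read utf-8 file into BYTE array: aData(), last element = 0
SetWindowText Text1.hwnd, aData(0)   ' passes the address of the first byte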
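For reference, here is a fuller sketch of the steps described above. It assumes a form with a TextBox named Text1 and a UTF-8 file next to the EXE; the file name sample.txt and the LoadUtf8Bytes helper are hypothetical. It reads the raw bytes in Binary mode (so VB does no text conversion) and appends the terminating zero byte.

```vb
Option Explicit

' A (ANSI) version. With the UTF-8 activeCodePage manifest entry on Win10.1903+,
' the bytes passed here are interpreted as UTF-8.
Private Declare Function SetWindowText Lib "user32.dll" Alias "SetWindowTextA" ( _
    ByVal hwnd As Long, lpString As Any) As Long

' Hypothetical helper: returns the file's raw bytes followed by one zero byte.
Private Function LoadUtf8Bytes(ByVal Path As String) As Byte()
    Dim F As Integer, Size As Long
    Dim aData() As Byte
    F = FreeFile
    Open Path For Binary Access Read As #F
    Size = LOF(F)
    If Size > 0 Then
        ReDim aData(Size - 1)
        Get #F, , aData              ' fills all Size bytes
        ReDim Preserve aData(Size)   ' the new last element is 0: the terminator
    Else
        ReDim aData(0)               ' empty file: just the terminator
    End If
    Close #F
    LoadUtf8Bytes = aData
End Function

Private Sub Form_Load()
    Dim aData() As Byte
    aData = LoadUtf8Bytes(App.Path & "\sample.txt")   ' hypothetical file name
    SetWindowText Text1.hwnd, aData(0)   ' As Any + ByRef passes the first byte's address
End Sub
```

Binary mode is the important choice here: reading with Input/Line Input would run the bytes through VB's own ANSI-to-Unicode conversion before you ever get to the API.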
Note: without the common-controls manifest entry, the text did not display correctly. With common controls but without the new activeCodePage entry, it also failed to display correctly. I did not test this with theming disabled.
This is fairly new. Feel free to comment, especially about gotchas from personal experience. I'm not sure yet how we might use this in the VB world.
Edited:
This should be doable with API-created fonts using character set 65001. However, VB's StdFont will not accept that character set: StdFont.Charset is an Integer, and 65001 reinterpreted as a 16-bit Integer is -535. Negative values are rejected. Creating the font through COM's OleCreateFontIndirect also rejects the character set and resets it to zero. I wouldn't be surprised if that changes down the road. But fonts created with APIs like CreateFont/CreateFontIndirect should be OK.
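To spell out the -535: 65001 is &HFDE9, and those 16 bits read as a signed Integer give 65001 - 65536 = -535 (a plain CInt(65001) would raise an overflow error instead). A small sketch that reinterprets the low word of a Long via RtlMoveMemory, declared here under the common CopyMemory alias:

```vb
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" ( _
    Destination As Any, Source As Any, ByVal Length As Long)

Private Sub Form_Load()
    Dim CodePage As Long
    Dim AsInteger As Integer
    CodePage = 65001                   ' &HFDE9, the UTF-8 code page number
    CopyMemory AsInteger, CodePage, 2  ' copy the low 16 bits into an Integer
    Debug.Print AsInteger              ' prints -535
End Sub
```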
Prior to Win10.1903, UTF-8 parsing and conversion still have to be done by other means, typically by converting to Unicode (UTF-16) and using the W-suffix APIs. This manifest entry has no effect on those earlier versions of Windows.
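A minimal sketch of that conversion route, assuming a hypothetical Utf8ToString helper: MultiByteToWideChar with code page 65001 (CP_UTF8) is called once to get the required length and once to decode straight into a VB String. The result is shown with MessageBoxW because, as far as I know, VB's intrinsic TextBox is an ANSI window, so SetWindowTextW on it would be converted back to the ANSI code page. A UTF-8 BOM is not stripped; it comes through as U+FEFF.

```vb
Option Explicit

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Declare Function MessageBoxW Lib "user32" ( _
    ByVal hWnd As Long, ByVal lpText As Long, _
    ByVal lpCaption As Long, ByVal uType As Long) As Long

Private Const CP_UTF8 As Long = 65001

' Hypothetical helper: convert a whole (non-empty) UTF-8 byte array to a VB String.
Private Function Utf8ToString(Bytes() As Byte) As String
    Dim Count As Long, Chars As Long, S As String
    Count = UBound(Bytes) - LBound(Bytes) + 1
    ' First call: ask how many UTF-16 characters are needed.
    Chars = MultiByteToWideChar(CP_UTF8, 0, VarPtr(Bytes(LBound(Bytes))), Count, 0, 0)
    If Chars = 0 Then Exit Function
    S = String$(Chars, vbNullChar)
    ' Second call: decode directly into the String's buffer.
    MultiByteToWideChar CP_UTF8, 0, VarPtr(Bytes(LBound(Bytes))), Count, StrPtr(S), Chars
    Utf8ToString = S
End Function

Private Sub Form_Load()
    Dim aData(1) As Byte, S As String
    aData(0) = &HC3: aData(1) = &HA9   ' UTF-8 encoding of U+00E9 (e with acute)
    S = Utf8ToString(aData)
    Debug.Print Len(S), AscW(S)        ' expect 1 and 233
    MessageBoxW Me.hWnd, StrPtr(S), StrPtr("UTF-8 test"), 0
End Sub
```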
Re: FYI: UTF-8 + Manifest on Win10.1903 or better
Good find. Interesting even if not obviously useful.
VB's intrinsic controls' .Text, .Caption, etc. seem to go through OLE plumbing that looks at control.Font.Charset, so I'm not sure we can jigger things to make those work.
Code:
Option Explicit
Private Enum WINDOW_MESSAGES
WM_SETTEXT = &HC&
End Enum
#If False Then
Dim WM_SETTEXT
#End If
Private Declare Function SendMessageA Lib "user32" ( _
ByVal hWnd As Long, _
ByVal wMsg As WINDOW_MESSAGES, _
Optional ByVal wParam As Long, _
Optional ByVal lParam As Long) As Long
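Private Sub Form_Load()
Dim F As Integer
Dim SampleBytes() As Byte
' Read the raw bytes of sample.txt; Binary mode does no text conversion.
F = FreeFile(0)
Open App.Path & "\sample.txt" For Binary Access Read As #F
ReDim SampleBytes(LOF(F) - 1)
Get #F, , SampleBytes
' Grow the array by one element; the new last byte is 0, the NUL terminator.
ReDim Preserve SampleBytes(LOF(F))
Close #F
' WM_SETTEXT via the A function: with the activeCodePage manifest entry the bytes are treated as UTF-8.
SendMessageA Text1.hWnd, WM_SETTEXT, , VarPtr(SampleBytes(0))
End Sub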
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
<assemblyIdentity type="win32" name="Joeblow.Project1" version="1.0.0.0"/>
<dependency>
<dependentAssembly>
<assemblyIdentity language="*" name="Microsoft.Windows.Common-Controls" processorArchitecture="X86" publicKeyToken="6595b64144ccf1df" type="win32" version="6.0.0.0" />
</dependentAssembly>
</dependency>
<application>
<windowsSettings>
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
</windowsSettings>
</application>
</assembly>
Re: FYI: UTF-8 + Manifest on Win10.1903 or better
VB6 native text I/O seems to use some other ANSI-Unicode conversion (OLE? VB runtime?).
I wonder if FSO's TextStream.ReadLine & .WriteLine might work properly for UTF-8 text with this setting (and opened ANSI) though?
FSO.OpenTextFile docs say "ASCII" but I wonder if it really might use the current ANSI codepage with system codec calls like MultiByteToWideChar?
Line-by-line reading seems to be a common headache when using MultiByteToWideChar manually to convert blocks of input text. You have to parse the blocks to extract whole lines, and since a CRLF pair might span two blocks, the logic gets a little tricky.
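One way around that, sketched with a hypothetical ReadUtf8Lines helper and an arbitrary 4 KB block size: split on the LF byte before decoding rather than after. This is safe for UTF-8 because every byte of a multibyte character is &H80 or higher, so an LF byte (&HA) is always a real line feed and never part of a character. Whatever follows the last LF in a block, including a CR whose LF lands in the next block, is carried over and completed by the next read. A leading BOM is not stripped.

```vb
Option Explicit

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Private Const CP_UTF8 As Long = 65001
Private Const BLOCK_SIZE As Long = 4096   ' arbitrary block size for this sketch

' Decode Count UTF-8 bytes starting at Bytes(Start) into a VB (UTF-16) String.
Private Function DecodeUtf8(Bytes() As Byte, ByVal Start As Long, ByVal Count As Long) As String
    Dim Chars As Long, S As String
    If Count <= 0 Then Exit Function
    Chars = MultiByteToWideChar(CP_UTF8, 0, VarPtr(Bytes(Start)), Count, 0, 0)
    If Chars = 0 Then Exit Function
    S = String$(Chars, vbNullChar)
    MultiByteToWideChar CP_UTF8, 0, VarPtr(Bytes(Start)), Count, StrPtr(S), Chars
    DecodeUtf8 = S
End Function

' Hypothetical helper: read a UTF-8 file in blocks and return its lines.
Private Function ReadUtf8Lines(ByVal Path As String) As Collection
    Dim Lines As New Collection
    Dim F As Integer, Remaining As Long, Got As Long
    Dim Block() As Byte, Buf() As Byte, BufLen As Long
    Dim I As Long, LineStart As Long, LineLen As Long

    F = FreeFile
    Open Path For Binary Access Read As #F
    Remaining = LOF(F)
    Do While Remaining > 0
        ' Read the next block (the last one may be shorter).
        Got = BLOCK_SIZE
        If Got > Remaining Then Got = Remaining
        ReDim Block(Got - 1)
        Get #F, , Block
        Remaining = Remaining - Got
        ' Append it to the unfinished line left over from the previous block.
        ReDim Preserve Buf(BufLen + Got - 1)
        For I = 0 To Got - 1
            Buf(BufLen + I) = Block(I)
        Next
        BufLen = BufLen + Got
        ' Emit every complete line, dropping the CR of a CRLF pair.
        LineStart = 0
        For I = 0 To BufLen - 1
            If Buf(I) = 10 Then
                LineLen = I - LineStart
                If LineLen > 0 Then
                    If Buf(I - 1) = 13 Then LineLen = LineLen - 1
                End If
                Lines.Add DecodeUtf8(Buf, LineStart, LineLen)
                LineStart = I + 1
            End If
        Next
        ' Move the unfinished tail (maybe just a lone CR) to the front of Buf.
        For I = LineStart To BufLen - 1
            Buf(I - LineStart) = Buf(I)
        Next
        BufLen = BufLen - LineStart
    Loop
    Close #F
    ' A last line with no trailing line break.
    If BufLen > 0 Then
        If Buf(BufLen - 1) = 13 Then BufLen = BufLen - 1
        Lines.Add DecodeUtf8(Buf, 0, BufLen)
    End If
    Set ReadUtf8Lines = Lines
End Function
```

Usage would be something like For Each L In ReadUtf8Lines(App.Path & "\sample.txt"), with L declared As Variant. The byte-by-byte copying keeps the sketch short and readable; a real version would probably use RtlMoveMemory instead.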
Re: FYI: UTF-8 + Manifest on Win10.1903 or better
I tried FSO I/O and it works just well enough to mislead you into hurting yourself.
TextStream.ReadAll() seems to work fine. .ReadLine() seems to work in many cases and fail in others, the failures manifesting as a multi-byte UTF-8 character being incorrectly read as several single-byte characters.
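If you want to reproduce the ReadAll vs. ReadLine comparison, here is a minimal late-bound sketch (the FsoReadAll and FsoReadLines names are hypothetical). It opens the file with Format = TristateFalse, i.e. "ASCII", which is the "opened ANSI" case discussed above. For a file that uses CRLF line breaks and ends with one, the two results should be identical; a difference points at the ReadLine problem.

```vb
Option Explicit

' Late-bound FileSystemObject, so no reference to Microsoft Scripting Runtime is needed.
Private Const ForReading As Long = 1
Private Const TristateFalse As Long = 0   ' "ASCII" format, i.e. the ANSI code page

Private Function FsoReadAll(ByVal Path As String) As String
    Dim FSO As Object, TS As Object
    Set FSO = CreateObject("Scripting.FileSystemObject")
    Set TS = FSO.OpenTextFile(Path, ForReading, False, TristateFalse)
    If Not TS.AtEndOfStream Then FsoReadAll = TS.ReadAll   ' ReadAll errors on an empty file
    TS.Close
End Function

' Same file, reassembled from ReadLine calls with vbCrLf after every line.
Private Function FsoReadLines(ByVal Path As String) As String
    Dim FSO As Object, TS As Object, S As String
    Set FSO = CreateObject("Scripting.FileSystemObject")
    Set TS = FSO.OpenTextFile(Path, ForReading, False, TristateFalse)
    Do Until TS.AtEndOfStream
        S = S & TS.ReadLine & vbCrLf
    Loop
    TS.Close
    FsoReadLines = S
End Function
```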
Re: FYI: UTF-8 + Manifest on Win10.1903 or better
Here's a case that fails on both methods:
Attachment 172867
Re: FYI: UTF-8 + Manifest on Win10.1903 or better
However, the APIs seem to work fine; I'm guessing FSO has problems converting UTF-8. Below is the result of simply using SetWindowText and then MsgBox Text1.Text. I think VB's internal Unicode-ANSI conversion has a negative effect: the message box is wider than it should be. So if UTF-8 strings are going to be used, it looks like they should be passed via APIs rather than through VB's Text/Caption properties.
Left: VB's MsgBox on Text1.Text. Right: the MessageBox API passed the byte array directly.
Attachment 172877
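The "MessageBox API passed the byte array directly" case can be sketched like this, assuming aData() holds the NUL-terminated UTF-8 bytes as in the earlier SetWindowText example. The lpText As Any declaration and the ShowUtf8 helper are my assumptions, mirroring that SetWindowText declaration.

```vb
Option Explicit

' A (ANSI) version; lpText As Any lets us pass the first byte of a Byte array ByRef.
Private Declare Function MessageBoxA Lib "user32" ( _
    ByVal hWnd As Long, lpText As Any, _
    ByVal lpCaption As String, ByVal uType As Long) As Long

Private Const MB_OK As Long = 0

' Hypothetical helper: show NUL-terminated UTF-8 bytes without going through
' VB's own Unicode/ANSI conversion. The caption is plain ASCII, so VB's
' conversion of the ByVal String is harmless there.
Private Sub ShowUtf8(ByVal hWnd As Long, aData() As Byte)
    MessageBoxA hWnd, aData(LBound(aData)), "UTF-8 test", MB_OK
End Sub
```

Usage: ShowUtf8 Me.hWnd, aData. Compare the result with MsgBox Text1.Text to see the difference in the screenshot above.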
Edited.
Quote:
FSO.OpenTextFile docs say "ASCII" but I wonder if it really might use the current ANSI codepage with system codec calls like MultiByteToWideChar?
That may be spot on.
FYI, your manifest seems to be technically incorrect, although it didn't fail: the <application> and <windowsSettings> elements belong to the asm.v3 namespace, not asm.v1. At least, that is how they are referenced in every MSDN example I've seen.
Re: FYI: UTF-8 + Manifest on Win10.1903 or better
I'm pretty sure most of asm.v3 got folded back into asm.v1 and asm.v2 discarded with Windows 10 1803.
Look at the example given on the page you linked to, or at pretty much every newer example on docs.microsoft.com these days. But you can probably still decorate freely with asm.v3 for backward compatibility.
It does seem arbitrary to the point of near chaos though. I can't find anything that authoritatively lists tags and namespaces for Fusion manifests.
Re: FYI: UTF-8 + Manifest on Win10.1903 or better
Quote (originally posted by dilettante):
I'm pretty sure most of asm.v3 got folded back into asm.v1 and asm.v2 discarded with Windows 10 1803.
It would be really nice to find some non-conflicting documentation from Microsoft. Even the various WindowsSettings namespaces can be iffy, depending on which MSDN page you happen to land on.