I made a simple RTF writer that draws a textbox and puts Chinese characters into it, but the Chinese characters don't show; only the English ANSI text does.
CHS:\u27426\u-28722to TaiWan
It is supposed to show "CHS:欢迎to Taiwan", but it only shows "CHS:to Taiwan".
According to RTF specification 1.6 and above, what is wrong with the RTF text below? Why does it drop \u27426\u-28722? Am I missing some tags?
Each piece of RTF text needs a corresponding font + charset defined in the \fonttbl each time there is a language change.
You can circumvent this by using "Arial Unicode MS" as the default font with \fcharset0, and writing all non-ASCII text as Unicode escapes rather than in some other codepage.
Since you are inserting RTF, you can skip the MBCS convention and go exclusively with "\uNNNN?", where NNNN is the character's integer value in decimal. \u takes a signed 16-bit number, so values above 32767 are written as negatives (that is why 迎 appears as \u-28722).
ASCII values below 128 can be left intact. You also need to convert 128-255 to Unicode escapes so that any diacritics are properly encoded.
Here is an example of a multilanguage RTF using the above method. You can load this into the VB6 RichTextBox TextRTF property and it should properly display all your strings:
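Public Function Uni_RTF(ByVal sText As String) As String
    ' Converts a VB6 (UTF-16) string to RTF body text using \uN? escapes.
    Dim lLen As Long
    Dim i As Long
    Dim sChar As String
    Dim lChar As Integer ' AscW returns a signed 16-bit value, which is what \u expects

    lLen = Len(sText)
    If lLen Then
        For i = 1 To lLen
            sChar = Mid$(sText, i, 1)
            lChar = AscW(sChar)
            Select Case lChar
                Case 92, 123, 125
                    ' Backslash and braces are RTF syntax and must be escaped
                    Uni_RTF = Uni_RTF & "\" & sChar
                Case 0 To 127
                    ' Plain ASCII passes through unchanged
                    Uni_RTF = Uni_RTF & sChar
                Case Else
                    ' "?" is the one-character fallback that a reader skips under \uc1
                    Uni_RTF = Uni_RTF & "\u" & lChar & "?"
            End Select
        Next
        ' Turn line breaks into RTF paragraph marks
        Uni_RTF = Replace$(Uni_RTF, vbNewLine, "\par" & vbNewLine)
    End If
End Function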
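As a rough usage sketch, the function above could be wrapped in a complete RTF document like this. BuildRtfDocument is a hypothetical helper name, and the header (\ansicpg1252, \deff0, \fs20) is just one plausible minimal header, not the one from the original post; it assumes the Uni_RTF function shown above is in the same module and that "Arial Unicode MS" is installed.

```vb
' Hypothetical helper: wraps Uni_RTF output in a minimal single-font RTF document.
' The header values are assumptions for illustration; adjust them to your needs.
Public Function BuildRtfDocument(ByVal sText As String) As String
    BuildRtfDocument = "{\rtf1\ansi\ansicpg1252\deff0\uc1" & _
                       "{\fonttbl{\f0\fnil\fcharset0 Arial Unicode MS;}}" & _
                       "\f0\fs20 " & Uni_RTF(sText) & "}"
End Function

' Example call from a form that has a RichTextBox named RichTextBox1 (assumed):
' RichTextBox1.TextRTF = BuildRtfDocument("CHS:" & ChrW$(&H6B22) & ChrW$(&H8FCE) & "to Taiwan")
```

\uc1 in the header matches the single "?" that Uni_RTF writes after each \uN, so the reader skips exactly that one fallback character.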
If I remove all the "?" and change "\uc1" in the RTF header to "\uc0", all the Unicode characters also display correctly.
I don't really understand the exact meaning of control words such as \ucN, \uN, \upr and \ud. Could the problem with my RTF text above be due to those control words? (The RTF specification document has no examples.)
Edited:
After some trial and error: \uc2 means the MBCS fallback that follows each \uN is two bytes per character, e.g. \uc2\u27426\'bb\'b6. 欢 is \'bb\'b6 in charset 134 and \u27426 in Unicode.
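A minimal sketch of how such a \ucN\uN\'xx\'xx sequence could be generated for one character. Uni_RTF_Fallback is a hypothetical name, and it assumes the codepage for the given LCID (2052 for Simplified Chinese) is installed so that StrConv can produce the MBCS bytes.

```vb
' Hypothetical helper: emits \ucN\uN followed by the MBCS fallback bytes as \'xx.
' lLcid selects the codepage for the fallback, e.g. 2052 for Simplified Chinese.
Public Function Uni_RTF_Fallback(ByVal sChar As String, ByVal lLcid As Long) As String
    Dim b() As Byte
    Dim i As Long
    Dim sHex As String

    ' Convert the single UTF-16 character to its MBCS bytes
    b = StrConv(sChar, vbFromUnicode, lLcid)

    For i = LBound(b) To UBound(b)
        ' Each byte becomes \'xx with two lowercase hex digits
        sHex = sHex & "\'" & LCase$(Right$("0" & Hex$(b(i)), 2))
    Next

    ' \ucN tells the reader how many fallback bytes to skip after \uN
    Uni_RTF_Fallback = "\uc" & (UBound(b) - LBound(b) + 1) & _
                       "\u" & AscW(sChar) & sHex
End Function
```

Called as Uni_RTF_Fallback(ChrW$(&H6B22), 2052), this should build the same shape as the \uc2\u27426\'bb\'b6 example above, provided the conversion to charset 134 succeeds on your system.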
Last edited by Jonney; May 15th, 2011 at 09:12 AM.
The control word \uc0 can be used to indicate that subsequent Unicode escape sequences within the current group do not specify a substitution character. http://en.wikipedia.org/wiki/Rich_Text_Format
If you remove the "?" from "\uNNNN?" and use "\uc0", doesn't that mean it will print nothing if the font does not have the requested glyph?
With the "?", you at least get an indication that your font doesn't have that character.
In an English environment (LCID 1033), I type or copy some Unicode text into MS Word and save it in RTF format. When I rename the file to .txt and open it in Notepad, I see lots of charsetXXX entries (fcharset124, fcharset128, fcharset186, ...). How does MS Word automatically know the charset, the default font name, and the \uxxxx and \'xx\'xx values? Is there a table that maps them to Unicode blocks? Is there an algorithm for this in VB6?
I doubt that you will find much source code that calls usp10.dll functions from VB6. In addition, you have to jump through hoops and write wrappers to make them VB friendly.
'-----
If you find everything you need in "Arial Unicode MS" there is probably no need to even mess with codepages.