[RESOLVED] RTF didn't show Unicode
I wrote a simple RTF writer that draws a text box and puts Chinese characters into it, but the Chinese characters are not displayed; only the English ANSI text shows up.
CHS:\u27426\u-28722to TaiWan
It is supposed to show "CHS:欢迎to Taiwan", but it only shows "CHS:to Taiwan".
According to the RTF specification (version 1.6 and above), what is wrong with the RTF text below? Is \u27426\u-28722 being dropped because some tags are missing?
{{\rtf1\ansi\ansicpg1253\uc0\deff0{\fonttbl{\f0\fnil\fcharset0\fprq2 Arial;}}{\colortbl;\red0\green0\blue0;\red255\green255\blue255;}{\stylesheet{\fs20 \snext0 Normal;}}{\info{\author }{\operator }{\title }{\subject }{\comment Athanasios Gardos - RTFWriter.dll}{\creatim\yr2011\mo5\dy14\hr20\min26}{\revtim\yr2011\mo5\dy14\hr20\min26}{\version2}{\edmins7}{\nofpages1}{\nofwords0}{\nofchars0}{\vern8351}}\paperw11906\paperh16837\margl1440\margr1440 \margt1440\margb1440\widowctrl\ftnbj \sectd \pmmetafile28\endnhere\pard\plain\ql\f0\fs20\cb2\cf1 {{{\pard\plain{\*\do\dobxcolumn\dobypara\dodhgt2250\dpx0\dpy0\dpxsize4500\dpysize2250\dpgroup\dpcount2\dpx25\dpy230\dpxsize4500\dpysize2250\dptxbx\dpx25\dpy230\dpxsize4500\dpysize2250\dpfillfgcr0\dpfillfgcg0\dpfillfgcb0\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinew15\dplinecor0\dplinecog0\dplinecob0{\dptxbxtext\pard\plain CHS:\u27426\u-28722to TaiWan\par}\dpendgroup\dpx0\dpy0\dpxsize0\dpysize0}}\brdrt\brdrnone\brdrb\brdrnone\brdrl\brdrnone\brdrr\brdrnone\ql\par
}}}
Re: RTF didn't show Unicode
After editing the RTF text manually, it shows the Unicode characters.
1. Add \ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial Unicode MS;}{\f1\fnil\fcharset134 Arial Unicode MS;}}
2. Change the Unicode encoding "CHS:\u27426\u-28722to TaiWan" to "CHS:\f1\'bb\'b6\'d3\'ad\f0"
I still wonder why RTF doesn't recognize "CHS:\u27426\u-28722to TaiWan". Does it need additional tags to mark the text as Unicode?
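The manual fix above swaps the \u escapes for codepage bytes written as \'xx hex escapes, under a font whose charset is 134 (GB2312). Here is a minimal sketch of producing those escapes, in Python rather than VB6 purely for illustration. The function name to_hex_escapes is made up for this example, and it assumes Python's "gb2312" codec yields the bytes an RTF reader expects for fcharset134:

```python
def to_hex_escapes(text: str, codec: str = "gb2312") -> str:
    """Encode text in a legacy codepage and write each byte as an RTF \\'xx escape."""
    # Every byte of the encoded text becomes one \'xx escape (two lowercase hex digits).
    return "".join("\\'%02x" % b for b in text.encode(codec))


# The two characters from the post above:
print(to_hex_escapes("欢迎"))  # \'bb\'b6\'d3\'ad
```

The output matches the bytes typed by hand in step 2. Note that this sketch hex-escapes every byte, ASCII included, which is valid RTF but more verbose than needed.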
Re: RTF didn't show Unicode
I have tried more strings. Some display correctly, but why do others not?
{\rtf1\ansi\ansicpg1253\uc0\deff0{\fonttbl{\f0\fnil\fcharset0\fprq2 Arial;}}{\colortbl;\red0\green0\blue0;\red255\green255\blue255;}{\stylesheet{\fs20 \snext0 Normal;}}{\info{\author }{\operator }{\title }{\subject }{\comment Athanasios Gardos - RTFWriter.dll}{\creatim\yr2011\mo5\dy14\hr22\min52}{\revtim\yr2011\mo5\dy14\hr22\min52}{\version2}{\edmins7}{\nofpages1}{\nofwords0}{\nofchars0}{\vern8351}}\paperw11906\paperh16837\margl1440\margr1440 \margt1440\margb1440\widowctrl\ftnbj \sectd \pmmetafile28\endnhere\pard\plain\ql\f0\fs20\cb2\cf1 {{{\pard\plain{\b\f0\fs40 CHS:\u27426\u-28722to TaiWan}{\*\do\dobxcolumn\dobypara\dodhgt2250\dpx0\dpy0\dpxsize4500\dpysize2250\dpgroup\dpcount2\dpx25\dpy230\dpxsize4500\dpysize2250\dptxbx\dpx25\dpy230\dpxsize4500\dpysize2250\dpfillfgcr0\dpfillfgcg0\dpfillfgcb0\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinew15\dplinecor0\dplinecog0\dplinecob0{\dptxbxtext\pard\plain CHS:\u27426\u-28722to TaiWan\par}\dpendgroup\dpx0\dpy0\dpxsize0\dpysize0}}\brdrt\brdrnone\brdrb\brdrnone\brdrl\brdrnone\brdrr\brdrnone\ql\par }}}
Re: RTF didn't show Unicode
I guess I don't really understand the control words (\ucN, \uN, \upr, \ud). Can anyone explain them?
Re: RTF didn't show Unicode
Each run of RTF text needs a corresponding font and charset defined in the \fonttbl whenever the language changes.
You can circumvent this by using the default font "Arial Unicode MS" with fcharset0, so that the text is carried by Unicode escapes rather than by some other codepage.
Since you are inserting RTF, you can skip the MBCS convention and use "\uNNNN?" exclusively, where NNNN is the code value in decimal (written as a signed 16-bit integer, so values above 32767 come out negative).
ASCII values < 128 can be left intact. You also need to convert 128-255 to Unicode escapes so that any diacritics are properly encoded.
Here is an example of a multilanguage RTF using the above method. You can load this into the VB6 RichTextBox TextRTF property and it should properly display all your strings:
{\rtf1\fbidis\ansi\deff0{\fonttbl
{\f0\fnil\fcharset0 Arial Unicode MS;}}
{\*\generator Msftedit 5.41.21.2509;}\viewkind4\uc1\pard\ltrpar\lang1033\f0\fs24
ARA: \rtlch\u1605?\u1600?\u1585?\u1581?\u1576?\u1600?\u1600?\u1575?\u1611?\ltrch\par
CHS: \u27426?\u-28722?\par
CHT: \u27489?\u-28722?\par
ENG: Welcome\par
GEO: \u4321?\u4304?\u4321?\u4323?\u4320?\u4309?\u4308?\u4314?\u4312?\par
GRK: \u922?\u945?\u955?\u974?\u962? \u942?\u955?\u952?\u945?\u964?\u949?\par
HEB: \rtlch\u1489?\u1512?\u1493?\u1499?\u1497?\u1501? \u1492?\u1489?\u1488?\u1497?\u1501?\ltrch\par
HIN: \u2352?\u2357?\u2366?\u2327?\u2340?\par
JPN: \u12424?\u12358?\u12371?\u12381?\par
KOR: \u-14868?\u-17164?\u-16072?\u-14700?\par
PTB: Bem-vindo\par
PUN: \u2588?\u2624? \u2566?\u2567?\u2566?\u2562? \u2600?\u2626?\u2672?\par
RUS: \u1044?\u1086?\u1073?\u1088?\u1086? \u1087?\u1086?\u1078?\u1072?\u1083?\u1086?\u1074?\u1072?\u1090?\u1100?\par
TAM: \u2949?\u2969?\u3021?\u2965?\u3007?\u2965?\u2992?\u3007?\par
THA: \u3585?\u3634?\u3619?\u3605?\u3657?\u3629?\u3609?\u3619?\u3633?\u3610?\par
URD: \u2360?\u2381?\u2357?\u2366?\u2327?\u2340?\par
VIE: t\u237?nh t\u7915?\par
ARM: \u1329?\u1330?\u1331?\u1332?\u1333?\u1334?\u1335?\u1336?\u1337?\par
GER: Umlaute A\u776?I\u776?O\u776?\par
}
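The decimal values in the sample are signed 16-bit numbers, so any code above 32767 shows up negative: 迎 is U+8FCE = 36814, and 36814 - 65536 = -28722. A rough sketch for turning such escapes back into text to check a line by eye, in Python rather than VB6 for illustration only. The function decode_u_escapes is hypothetical, assumes \uc1 (exactly one fallback character after each \uN), and ignores every other part of RTF, including surrogate pairs:

```python
import re


def decode_u_escapes(rtf_text: str) -> str:
    r"""Replace each \uN plus its one-character fallback with the real character."""
    def repl(m: re.Match) -> str:
        n = int(m.group(1))
        # Negative values are 16-bit codes above 32767; add 65536 to recover them.
        return chr(n + 65536 if n < 0 else n)

    # \uN, an optional delimiting space, then exactly one fallback character.
    return re.sub(r"\\u(-?\d+) ?.", repl, rtf_text)


print(decode_u_escapes(r"CHS: \u27426?\u-28722?"))  # CHS: 欢迎
```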
Re: RTF didn't show Unicode
Conversion Function Unicode to RTF:
Code:
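Public Function Uni_RTF(ByVal sText As String) As String
    ' Converts a VB6 string to RTF text, writing every non-ASCII
    ' character as a \uN? escape (N in decimal, "?" as the fallback).
    Dim lLen As Long
    Dim i As Long
    Dim sChar As String
    Dim lChar As Integer ' AscW returns a signed 16-bit value, which is the form \uN uses
    lLen = Len(sText)
    If lLen Then
        For i = 1 To lLen
            sChar = Mid$(sText, i, 1)
            lChar = AscW(sChar)
            Select Case lChar
                Case 92, 123, 125
                    ' "\", "{" and "}" are RTF syntax and must be escaped
                    Uni_RTF = Uni_RTF & "\" & sChar
                Case 0 To 127
                    ' Plain ASCII passes through unchanged
                    Uni_RTF = Uni_RTF & sChar
                Case Else
                    ' Codes above 32767 come back negative, e.g. \u-28722?
                    Uni_RTF = Uni_RTF & "\u" & lChar & "?"
            End Select
        Next
        ' Turn line breaks into RTF paragraph marks
        Uni_RTF = Replace$(Uni_RTF, vbNewLine, "\par" & vbNewLine)
    End If
End Function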
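For cross-checking the output of the VB6 function outside the IDE, here is a rough Python counterpart (Python is used only for illustration; the name uni_rtf is made up). VB6 strings are UTF-16, so AscW sees one 16-bit code unit at a time. The sketch mirrors that by encoding each character as UTF-16 and reading the units as signed 16-bit integers, which means a character outside the BMP comes out as two \uN? escapes (a surrogate pair). It also escapes "\", "{" and "}", and assumes "\n" is the line break:

```python
import struct


def uni_rtf(text: str) -> str:
    """Write non-ASCII characters as \\uN? escapes, N being a signed 16-bit decimal."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF syntax characters must be escaped
        elif ord(ch) < 128:
            out.append(ch)                 # plain ASCII passes through
        else:
            data = ch.encode("utf-16-le")  # 2 bytes, or 4 for a surrogate pair
            for unit in struct.unpack("<%dh" % (len(data) // 2), data):
                out.append("\\u%d?" % unit)
    # Turn line breaks into RTF paragraph marks.
    return "".join(out).replace("\n", "\\par\n")


print(uni_rtf("CHS:欢迎to TaiWan"))  # CHS:\u27426?\u-28722?to TaiWan
```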
Re: RTF didn't show Unicode
If I remove every "?" from the example RTF in the previous post and change "\uc1" in the header to "\uc0", all the Unicode characters still display correctly.
I don't really understand the exact meaning of control words such as \ucN, \uN, \upr and \ud. Could the problem with my RTF text above be due to those control words? (The RTF specification document gives no examples.)
Edited:
After some trial and error: \uc2 means that the MBCS fallback after each \u escape is two bytes long, e.g. \uc2\u27426\'bb\'b6 for 欢 (charset 134: \'bb\'b6, Unicode: \u27426).
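The \uc2 observation can be reproduced mechanically: the number after \uc is simply the count of fallback bytes written after the \uN escape. A small sketch, in Python rather than VB6 for illustration only. The helper u_with_fallback is hypothetical, handles one BMP character at a time, and assumes Python's "gb2312" codec corresponds to charset 134:

```python
def u_with_fallback(ch: str, codec: str = "gb2312") -> str:
    """Write one character as \\ucN\\uN followed by its codepage bytes as \\'xx escapes."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("this sketch only handles characters in the BMP")
    signed = code - 65536 if code > 32767 else code  # \uN takes a signed 16-bit value
    fallback = ch.encode(codec)                      # bytes an older reader would show
    hex_part = "".join("\\'%02x" % b for b in fallback)
    return "\\uc%d\\u%d%s" % (len(fallback), signed, hex_part)


print(u_with_fallback("欢"))  # \uc2\u27426\'bb\'b6
```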
Re: RTF didn't show Unicode
The control word \uc0 can be used to indicate that subsequent Unicode escape sequences within the current group do not specify a substitution character.
http://en.wikipedia.org/wiki/Rich_Text_Format
If you remove the "?" from "\uNNNN?" and use "\uc0", doesn't that mean it will print nothing if the font does not have the requested glyph?
With the "?", you have an indication that your font doesn't have that character.
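As far as I read the RTF specification, \ucN tells a reader how many fallback characters follow each \uN. A reader that understands \uN skips that many characters, while a reader that does not understand \uN shows them instead. On that reading, the "?" is aimed at older readers rather than at missing glyphs, but treat that as my interpretation. The toy reader below shows the skipping. It is written in Python rather than VB6 for illustration only, the name read_unicode_run is made up, and it ignores groups, so \uc is not scoped to a group here as it would be in a real reader:

```python
import re

# A \'xx hex escape, a control word with optional number and one delimiting space,
# or a single plain character.
TOKEN = re.compile(r"\\'[0-9a-fA-F]{2}|\\([a-z]+)(-?\d+)? ?|[^\\]")


def read_unicode_run(rtf: str, uc: int = 1) -> str:
    out = []
    skip = 0  # fallback characters still to be skipped after the last \uN
    for m in TOKEN.finditer(rtf):
        tok, word, num = m.group(0), m.group(1), m.group(2)
        if skip and word is None:
            skip -= 1            # a hex escape or plain character counts as one
            continue
        if word == "uc":
            uc = int(num or 1)   # change the fallback length from here on
        elif word == "u":
            n = int(num)
            out.append(chr(n + 65536 if n < 0 else n))
            skip = uc
        elif word is None and not tok.startswith("\\"):
            out.append(tok)      # ordinary text
        # other control words and stray hex escapes are ignored in this sketch
    return "".join(out)


print(read_unicode_run(r"\uc1 CHS: \u27426?\u-28722?"))  # CHS: 欢迎
print(read_unicode_run(r"\uc0\u27426?"))                 # 欢?
```

With \uc0 the "?" is no longer skipped, so it has to be removed from the text, which matches what was observed earlier in the thread.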
Re: RTF didn't show Unicode
Quote:
Originally Posted by
DrUnicode
The control word \uc0 can be used to indicate that subsequent Unicode escape sequences within the current group do not specify a substitution character.
http://en.wikipedia.org/wiki/Rich_Text_Format
If you remove the "?" from "\uNNNN?" and use "\uc0", doesn't that mean it will print nothing if the font does not have the requested glyph?
With the "?", you have an indication that your font doesn't have that character.
In an English environment (LCID 1033), I typed or copied some Unicode text into MS Word and saved it in RTF format. After renaming the file to .txt and opening it in Notepad, I see lots of fcharsetXXX entries (fcharset124, fcharset128, fcharset186...). How does MS Word automatically know the charset, the default font name, and the \uxxxx and \'xx\'xx values? Is there a table that maps Unicode blocks to charsets? Is there an algorithm for this in VB6?
Re: RTF didn't show Unicode
Yes, there are tables for block ranges.
You can get them from my Tutorial at http://cyberactivex.com/UnicodeTutor...#UnicodeBlocks
Also a utility I use here often is BabelMap at http://www.babelstone.co.uk/Software/BabelMap.html
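To give a feel for what a block-range lookup might look like, here is a small sketch in Python (used for illustration only, not VB6). The table is a hand-picked subset that pairs a few Unicode blocks with the corresponding Windows charset numbers. It is not the table Word or RichEdit actually uses, and the names BLOCK_CHARSETS and guess_charset are made up. CJK ideographs are deliberately left out because the same code points are shared by Chinese, Japanese and Korean, so a block range alone cannot pick between charsets 134, 136, 128 and 129:

```python
# (first code point, last code point, fcharset value, description)
BLOCK_CHARSETS = [
    (0x0370, 0x03FF, 161, "Greek"),
    (0x0400, 0x04FF, 204, "Cyrillic"),
    (0x0590, 0x05FF, 177, "Hebrew"),
    (0x0600, 0x06FF, 178, "Arabic"),
    (0x0E00, 0x0E7F, 222, "Thai"),
    (0x3040, 0x30FF, 128, "Japanese kana (Shift-JIS)"),
    (0xAC00, 0xD7A3, 129, "Hangul syllables"),
]


def guess_charset(ch: str, default: int = 0) -> int:
    """Return the fcharset value for the block containing ch, or default if unknown."""
    code = ord(ch)
    for first, last, charset, _name in BLOCK_CHARSETS:
        if first <= code <= last:
            return charset
    return default


print(guess_charset("Д"), guess_charset("よ"), guess_charset("A"))  # 204 128 0
```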
So how does the RichEdit control decide which charset to use?
RichEdit uses Uniscribe DLL (usp10.dll).
ScriptGetProperties:
http://msdn.microsoft.com/en-us/libr...=VS.85%29.aspx
SCRIPT_PROPERTIES Structure
typedef struct {
    DWORD langid                 :16;
    DWORD fNumeric               :1;
    DWORD fComplex               :1;
    DWORD fNeedsWordBreaking     :1;
    DWORD fNeedsCaretInfo        :1;
    DWORD bCharSet               :8;
    DWORD fControl               :1;
    DWORD fPrivateUseArea        :1;
    DWORD fNeedsCharacterJustify :1;
    DWORD fInvalidGlyph          :1;
    DWORD fInvalidLogAttr        :1;
    DWORD fCDM                   :1;
    DWORD fAmbiguousCharSet      :1;
    DWORD fClusterSizeVaries     :1;
    DWORD fRejectInvalid         :1;
} SCRIPT_PROPERTIES;
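The bit widths above add up to 37 bits, so the structure occupies two DWORDs. If you ever read it as raw memory (for example from a wrapper that hands back two Longs), the fields can be pulled apart with shifts and masks. The sketch below is in Python rather than VB6, for illustration only, and rests on an assumption: that the compiler packs bitfields starting from the least significant bit and starts a new DWORD when a field does not fit, as Microsoft's compiler does as far as I know. The function name unpack_script_properties is made up:

```python
# Field names and bit widths, in declaration order.
FIELDS = [
    ("langid", 16), ("fNumeric", 1), ("fComplex", 1), ("fNeedsWordBreaking", 1),
    ("fNeedsCaretInfo", 1), ("bCharSet", 8), ("fControl", 1), ("fPrivateUseArea", 1),
    ("fNeedsCharacterJustify", 1), ("fInvalidGlyph", 1), ("fInvalidLogAttr", 1),
    ("fCDM", 1), ("fAmbiguousCharSet", 1), ("fClusterSizeVaries", 1),
    ("fRejectInvalid", 1),
]


def unpack_script_properties(dwords):
    """Split two 32-bit values into the named bitfields (assumed LSB-first layout)."""
    result = {}
    unit, bit = 0, 0
    for name, width in FIELDS:
        if bit + width > 32:      # field does not fit: continue in the next DWORD
            unit, bit = unit + 1, 0
        result[name] = (dwords[unit] >> bit) & ((1 << width) - 1)
        bit += width
    return result


# Made-up sample value: langid 1033, fComplex set, bCharSet 134.
props = unpack_script_properties([1033 | (1 << 17) | (134 << 20), 0])
print(props["langid"], props["fComplex"], props["bCharSet"])  # 1033 1 134
```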
I doubt that you will find much in the way of source code calling usp10.dll functions from VB6. In addition, you have to jump through hoops and write wrappers to make them VB friendly.
'-----
If you find everything you need in "Arial Unicode MS", there is probably no need to even mess with codepages.
Re: RTF didn't show Unicode
Source code for using Usp10.dll from vb6:
http://blogs.msdn.com/b/michkap/arch...12/628714.aspx
This code is part of what is in "Internationalization with Visual Basic" by Michael S. Kaplan - http://i18nwithvb.com/
The book is out of print and you may have a hard time finding a copy.