Thread: [RESOLVED] RTF didn't show Unicode

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Jan 2010
    Posts
    1,103

    [RESOLVED] RTF didn't show Unicode

    I wrote a simple RTF writer that draws a textbox and puts Chinese characters into it, but the textbox doesn't show the Chinese characters, only the English ANSI text.

    CHS:\u27426\u-28722to TaiWan

    It is supposed to show "CHS:欢迎to TaiWan", but it only shows "CHS:to TaiWan".
    According to the RTF specification (1.6 and above), what is wrong with the RTF text below? Why does it drop \u27426\u-28722? Is it missing some tags?

    {{\rtf1\ansi\ansicpg1253\uc0\deff0{\fonttbl{\f0\fnil\fcharset0\fprq2 Arial;}}{\colortbl;\red0\green0\blue0;\red255\green255\blue255;}{\stylesheet{\fs20 \snext0 Normal;}}{\info{\author }{\operator }{\title }{\subject }{\comment Athanasios Gardos - RTFWriter.dll}{\creatim\yr2011\mo5\dy14\hr20\min26}{\revtim\yr2011\mo5\dy14\hr20\min26}{\version2}{\edmins7}{\nofpages1}{\nofwords0}{\nofchars0}{\vern8351}}\paperw11906\paperh16837\margl1440\margr1440 \margt1440\margb1440\widowctrl\ftnbj \sectd \pmmetafile28\endnhere\pard\plain\ql\f0\fs20\cb2\cf1 {{{\pard\plain{\*\do\dobxcolumn\dobypara\dodhgt2250\dpx0\dpy0\dpxsize4500\dpysize2250\dpgroup\dpcount2\dpx25\dpy230\dpxsize4500\dpysize2250\dptxbx\dpx25\dpy230\dpxsize4500\dpysize2250\dpfillfgcr0\dpfillfgcg0\dpfillfgcb0\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinew15\dplinecor0\dplinecog0\dplinecob0{\dptxbxtext\pard\plain CHS:\u27426\u-28722to TaiWan\par}\dpendgroup\dpx0\dpy0\dpxsize0\dpysize0}}\brdrt\brdrnone\brdrb\brdrnone\brdrl\brdrnone\brdrr\brdrnone\ql\par
    }}}
    Last edited by Jonney; May 14th, 2011 at 07:33 AM.

  2. #2

    Thread Starter
    Frenzied Member
    Join Date
    Jan 2010
    Posts
    1,103

    Re: RTF didn't show Unicode

    After editing the RTF text manually, it shows the Unicode characters.

    1. Add \ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial Unicode MS;}{\f1\fnil\fcharset134 Arial Unicode MS;}} to the header.

    2. Change the unicode encoding "CHS:\u27426\u-28722to TaiWan" to "CHS:\f1\'bb\'b6\'d3\'ad\f0"
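
    For reference, the \'hh escapes in step 2 are just the bytes of the text in the code page that goes with \fcharset134 (GB2312/GBK). A minimal Python sketch, assuming Python's built-in "gbk" codec is a close enough stand-in for that charset; the helper name hex_escape is made up for illustration:

```python
def hex_escape(text: str, codec: str = "gbk") -> str:
    """Encode text in a legacy code page and write each byte as an RTF \\'hh escape."""
    return "".join("\\'%02x" % byte for byte in text.encode(codec))


print(hex_escape("欢迎"))  # \'bb\'b6\'d3\'ad
```

    The result only displays correctly while a font with the matching charset (\f1 here) is selected, which is why step 2 wraps the bytes in \f1 ... \f0.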

    I wonder why RTF doesn't recognize "CHS:\u27426\u-28722to TaiWan". Does it need some additional tags to tell the reader that these are Unicode characters?
    Last edited by Jonney; May 14th, 2011 at 08:30 AM.

  3. #3

    Thread Starter
    Frenzied Member
    Join Date
    Jan 2010
    Posts
    1,103

    Re: RTF didn't show Unicode

    I have tried more strings. Some display correctly, but why do others not?

    {\rtf1\ansi\ansicpg1253\uc0\deff0{\fonttbl{\f0\fnil\fcharset0\fprq2 Arial;}}{\colortbl;\red0\green0\blue0;\red255\green255\blue255;}{\stylesheet{\fs20 \snext0 Normal;}}{\info{\author }{\operator }{\title }{\subject }{\comment Athanasios Gardos - RTFWriter.dll}{\creatim\yr2011\mo5\dy14\hr22\min52}{\revtim\yr2011\mo5\dy14\hr22\min52}{\version2}{\edmins7}{\nofpages1}{\nofwords0}{\nofchars0}{\vern8351}}\paperw11906\paperh16837\margl1440\margr1440 \margt1440\margb1440\widowctrl\ftnbj \sectd \pmmetafile28\endnhere\pard\plain\ql\f0\fs20\cb2\cf1 {{{\pard\plain{\b\f0\fs40 CHS:\u27426\u-28722to TaiWan}{\*\do\dobxcolumn\dobypara\dodhgt2250\dpx0\dpy0\dpxsize4500\dpysize2250\dpgroup\dpcount2\dpx25\dpy230\dpxsize4500\dpysize2250\dptxbx\dpx25\dpy230\dpxsize4500\dpysize2250\dpfillfgcr0\dpfillfgcg0\dpfillfgcb0\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinew15\dplinecor0\dplinecog0\dplinecob0{\dptxbxtext\pard\plain CHS:\u27426\u-28722to TaiWan\par}\dpendgroup\dpx0\dpy0\dpxsize0\dpysize0}}\brdrt\brdrnone\brdrb\brdrnone\brdrl\brdrnone\brdrr\brdrnone\ql\par }}}

  4. #4

    Thread Starter
    Frenzied Member
    Join Date
    Jan 2010
    Posts
    1,103

    Re: RTF didn't show Unicode

    I guess I don't really understand the control words \ucN, \uN, \upr and \ud. Can anyone explain them?

  5. #5
    DrUnicode
    Fanatic Member
    Join Date
    Mar 2008
    Location
    Natal, Brazil
    Posts
    631

    Re: RTF didn't show Unicode

    Each piece of RTF text needs a corresponding font + charset defined in the fonttbl each time there is a language change.
    You can circumvent this by using the default font "Arial Unicode MS" with fcharset0, to indicate that the text is Unicode and not in some other codepage.
    Since you are inserting RTF you can skip the MBCS convention and go exclusively with "\uNNNN?", where NNNN is the character code as a decimal integer (a signed 16-bit value, so codes above 32767 appear as negative numbers) and "?" is the fallback character.
    ASCII values < 128 can be left intact. You also need to convert 128-255 to Unicode so that any diacritics are properly encoded.

    Here is an example of a multilanguage RTF using the above method. You can load this into the VB6 RichTextBox TextRTF property and it should properly display all your strings:

    {\rtf1\fbidis\ansi\deff0{\fonttbl
    {\f0\fnil\fcharset0 Arial Unicode MS;}}
    {\*\generator Msftedit 5.41.21.2509;}\viewkind4\uc1\pard\ltrpar\lang1033\f0\fs24
    ARA: \rtlch\u1605?\u1600?\u1585?\u1581?\u1576?\u1600?\u1600?\u1575?\u1611?\ltrch\par
    CHS: \u27426?\u-28722?\par
    CHT: \u27489?\u-28722?\par
    ENG: Welcome\par
    GEO: \u4321?\u4304?\u4321?\u4323?\u4320?\u4309?\u4308?\u4314?\u4312?\par
    GRK: \u922?\u945?\u955?\u974?\u962? \u942?\u955?\u952?\u945?\u964?\u949?\par
    HEB: \rtlch\u1489?\u1512?\u1493?\u1499?\u1497?\u1501? \u1492?\u1489?\u1488?\u1497?\u1501?\ltrch\par
    HIN: \u2352?\u2357?\u2366?\u2327?\u2340?\par
    JPN: \u12424?\u12358?\u12371?\u12381?\par
    KOR: \u-14868?\u-17164?\u-16072?\u-14700?\par
    PTB: Bem-vindo\par
    PUN: \u2588?\u2624? \u2566?\u2567?\u2566?\u2562? \u2600?\u2626?\u2672?\par
    RUS: \u1044?\u1086?\u1073?\u1088?\u1086? \u1087?\u1086?\u1078?\u1072?\u1083?\u1086?\u1074?\u1072?\u1090?\u1100?\par
    TAM: \u2949?\u2969?\u3021?\u2965?\u3007?\u2965?\u2992?\u3007?\par
    THA: \u3585?\u3634?\u3619?\u3605?\u3657?\u3629?\u3609?\u3619?\u3633?\u3610?\par
    URD: \u2360?\u2381?\u2357?\u2366?\u2327?\u2340?\par
    VIE: tính t\u7915?\par
    ARM: \u1329?\u1330?\u1331?\u1332?\u1333?\u1334?\u1335?\u1336?\u1337?\par
    GER: Umlaute A\u776?I\u776?O\u776?\par
    }
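
    The negative numbers in the sample (for example \u-28722) come from \uN taking a signed 16-bit decimal: a UTF-16 code unit above 32767 is written as its value minus 65536, so 36814 becomes -28722. A rough Python sketch of the encoding described above. The function name to_rtf_unicode is hypothetical, and escaping backslashes and braces is an extra step I am assuming is wanted because they are RTF syntax characters:

```python
import struct


def to_rtf_unicode(text: str, fallback: str = "?") -> str:
    """Escape text for RTF: ASCII stays as is, the rest becomes \\uN + fallback."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)      # RTF syntax characters need a backslash
        elif ord(ch) < 128:
            out.append(ch)             # plain ASCII passes through unchanged
        else:
            # Work on UTF-16 code units, read as signed 16-bit integers.
            # A character outside the BMP yields two units (a surrogate pair).
            units = ch.encode("utf-16-le")
            for (value,) in struct.iter_unpack("<h", units):
                out.append("\\u%d%s" % (value, fallback))
    return "".join(out)


print(to_rtf_unicode("CHS: 欢迎"))  # CHS: \u27426?\u-28722?
```

    Line breaks are not converted to \par in this sketch.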
    Last edited by DrUnicode; May 14th, 2011 at 10:59 PM.

  6. #6
    DrUnicode
    Fanatic Member
    Join Date
    Mar 2008
    Location
    Natal, Brazil
    Posts
    631

    Re: RTF didn't show Unicode

    Conversion Function Unicode to RTF:

    Code:
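    Public Function Uni_RTF(ByVal sText As String) As String
       ' Converts a VB string to RTF text using "\uN?" escapes.
       ' Note: "\", "{" and "}" are passed through unescaped.
       Dim lLen             As Long
       Dim i                As Long
       Dim sChar            As String
       Dim lChar            As Integer   ' AscW returns a signed 16-bit value, as \uN expects
    
       lLen = Len(sText)
    
       If lLen Then
          For i = 1 To lLen
             sChar = Mid$(sText, i, 1)
             lChar = AscW(sChar)
             Select Case lChar
                Case 0 To 127
                   ' Plain ASCII is copied unchanged
                   Uni_RTF = Uni_RTF & sChar
                Case Else
                   ' Everything else becomes \uN with "?" as the fallback character
                   Uni_RTF = Uni_RTF & "\u" & lChar & "?"
             End Select
          Next
          
          ' Turn line breaks into RTF paragraph marks
          Uni_RTF = Replace$(Uni_RTF, vbNewLine, "\par" & vbNewLine)
    
       End If
    End Function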

  7. #7

    Thread Starter
    Frenzied Member
    Join Date
    Jan 2010
    Posts
    1,103

    Re: RTF didn't show Unicode

    Quote: Originally posted by DrUnicode
    Each piece of RTF text needs a corresponding font + charset defined in the fonttbl each time there is a language change.
    You can circumvent this by using the default font "Arial Unicode MS" with fcharset0, to indicate that the text is Unicode and not in some other codepage.
    Since you are inserting RTF you can skip the MBCS convention and go exclusively with "\uNNNN?", where NNNN is the character code as a decimal integer (a signed 16-bit value, so codes above 32767 appear as negative numbers) and "?" is the fallback character.
    ASCII values < 128 can be left intact. You also need to convert 128-255 to Unicode so that any diacritics are properly encoded.

    Here is an example of a multilanguage RTF using the above method. You can load this into the VB6 RichTextBox TextRTF property and it should properly display all your strings:

    {\rtf1\fbidis\ansi\deff0{\fonttbl
    {\f0\fnil\fcharset0 Arial Unicode MS;}}
    {\*\generator Msftedit 5.41.21.2509;}\viewkind4\uc1\pard\ltrpar\lang1033\f0\fs24
    ARA: \rtlch\u1605?\u1600?\u1585?\u1581?\u1576?\u1600?\u1600?\u1575?\u1611?\ltrch\par
    CHS: \u27426?\u-28722?\par
    CHT: \u27489?\u-28722?\par
    ENG: Welcome\par
    GEO: \u4321?\u4304?\u4321?\u4323?\u4320?\u4309?\u4308?\u4314?\u4312?\par
    GRK: \u922?\u945?\u955?\u974?\u962? \u942?\u955?\u952?\u945?\u964?\u949?\par
    HEB: \rtlch\u1489?\u1512?\u1493?\u1499?\u1497?\u1501? \u1492?\u1489?\u1488?\u1497?\u1501?\ltrch\par
    HIN: \u2352?\u2357?\u2366?\u2327?\u2340?\par
    JPN: \u12424?\u12358?\u12371?\u12381?\par
    KOR: \u-14868?\u-17164?\u-16072?\u-14700?\par
    PTB: Bem-vindo\par
    PUN: \u2588?\u2624? \u2566?\u2567?\u2566?\u2562? \u2600?\u2626?\u2672?\par
    RUS: \u1044?\u1086?\u1073?\u1088?\u1086? \u1087?\u1086?\u1078?\u1072?\u1083?\u1086?\u1074?\u1072?\u1090?\u1100?\par
    TAM: \u2949?\u2969?\u3021?\u2965?\u3007?\u2965?\u2992?\u3007?\par
    THA: \u3585?\u3634?\u3619?\u3605?\u3657?\u3629?\u3609?\u3619?\u3633?\u3610?\par
    URD: \u2360?\u2381?\u2357?\u2366?\u2327?\u2340?\par
    VIE: tính t\u7915?\par
    ARM: \u1329?\u1330?\u1331?\u1332?\u1333?\u1334?\u1335?\u1336?\u1337?\par
    GER: Umlaute A\u776?I\u776?O\u776?\par
    }
    If I remove all the "?" characters and change "\uc1" in the RTF header to "\uc0", all the Unicode characters still display correctly.
    I don't really understand the exact meaning of control words such as \ucN, \uN, \upr and \ud. Could the problem with my RTF text above be caused by those control words? (The RTF specification document gives no examples.)
    Edited:
    After some trials: \uc2 means that the MBCS fallback for each character is two bytes long, e.g. \uc2\u27426\'bb\'b6 for 欢 (charset 134 bytes \'bb\'b6, Unicode \u27426).
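
    Put differently, \ucN says how many fallback characters follow each \u escape, and a \'hh escape counts as one of them. A small Python sketch that writes text in this form, with the code-page bytes as the fallback. The function name is made up, and using Python's "gbk" codec for charset 134 is my assumption:

```python
import struct


def u_with_fallback(text: str, codec: str = "gbk") -> str:
    """Write each non-ASCII character as \\ucN\\uM followed by N code-page bytes.

    Only handles BMP characters that exist in the chosen code page;
    anything else raises an exception.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        fallback = ch.encode(codec)  # legacy bytes, two per character for GBK
        # \u takes the UTF-16 code unit as a signed 16-bit decimal
        (signed,) = struct.unpack("<h", ch.encode("utf-16-le"))
        hex_bytes = "".join("\\'%02x" % b for b in fallback)
        out.append("\\uc%d\\u%d%s" % (len(fallback), signed, hex_bytes))
    return "".join(out)


print(u_with_fallback("欢"))  # \uc2\u27426\'bb\'b6
```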
    Last edited by Jonney; May 15th, 2011 at 09:12 AM.

  8. #8
    DrUnicode
    Fanatic Member
    Join Date
    Mar 2008
    Location
    Natal, Brazil
    Posts
    631

    Re: RTF didn't show Unicode

    The control word \uc0 can be used to indicate that subsequent Unicode escape sequences within the current group do not specify a substitution character.
    http://en.wikipedia.org/wiki/Rich_Text_Format

    If you remove the "?" from "\uNNNN?" and use "\uc0", doesn't that mean nothing at all is printed by a reader that can't handle the \u escape?

    With the "?" such a reader at least shows a placeholder, so you can see that a character is missing.
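
    To make the role of the fallback concrete: a Unicode-aware reader shows the \uN character and then skips the next N fallback characters, where N is the current \ucN value. As I read the RTF specification, a control word or a \'hh escape counts as one skippable character. A simplified Python sketch of such a reader. It only understands \ucN, \uN and \'hh in a flat ASCII string (no groups, no other control words), so treat it as an illustration and not as a parser:

```python
import re

# One token per match: \ucN, \uN, \'hh, or any single character.
TOKEN = re.compile(r"\\uc(\d+) ?|\\u(-?\d+) ?|\\'([0-9a-fA-F]{2})|(.)", re.S)


def decode_u(rtf: str, uc: int = 1) -> str:
    """Decode \\uN escapes, skipping `uc` fallback characters after each one."""
    units = []  # UTF-16 code units
    skip = 0
    for m in TOKEN.finditer(rtf):
        uc_val, u_val, hex_val, char = m.groups()
        if skip:
            skip -= 1                              # fallback for the previous \uN
        elif uc_val is not None:
            uc = int(uc_val)                       # new fallback length
        elif u_val is not None:
            units.append(int(u_val) & 0xFFFF)      # signed 16-bit -> code unit
            skip = uc
        elif hex_val is not None:
            units.append(int(hex_val, 16))         # simplification: byte as Latin-1
        else:
            units.append(ord(char))                # input is assumed to be ASCII
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le")


print(decode_u(r"CHS: \u27426?\u-28722?"))            # CHS: 欢迎
print(decode_u(r"\uc0CHS:\u27426\u-28722to TaiWan"))  # CHS:欢迎to TaiWan
```

    With \uc1 in effect and the "?" removed, this sketch treats the second \u escape as the fallback of the first and drops it, which is why removing the "?" goes together with switching to \uc0.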

  9. #9

    Thread Starter
    Frenzied Member
    Join Date
    Jan 2010
    Posts
    1,103

    Re: RTF didn't show Unicode

    Quote: Originally posted by DrUnicode
    The control word \uc0 can be used to indicate that subsequent Unicode escape sequences within the current group do not specify a substitution character.
    http://en.wikipedia.org/wiki/Rich_Text_Format

    If you remove the "?" from "\uNNNN?" and use "\uc0", doesn't that mean nothing at all is printed by a reader that can't handle the \u escape?

    With the "?" such a reader at least shows a placeholder, so you can see that a character is missing.
    In an English environment (LCID 1033), I typed or copied some Unicode text into MS Word and saved it in RTF format. I renamed the file to .txt and opened it in Notepad, and I see lots of charset entries (fcharset124, fcharset128, fcharset186, ...). How does MS Word automatically know the charset, the default font name, and the \uxxxx and \'xx\'xx values? Is there a table that maps Unicode blocks to charsets? Is there an algorithm for this in VB6?

  10. #10
    DrUnicode
    Fanatic Member
    Join Date
    Mar 2008
    Location
    Natal, Brazil
    Posts
    631

    Re: RTF didn't show Unicode

    Yes, there are tables for block ranges.
    You can get them from my Tutorial at http://cyberactivex.com/UnicodeTutor...#UnicodeBlocks

    Also a utility I use here often is BabelMap at http://www.babelstone.co.uk/Software/BabelMap.html
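
    As a rough illustration of such a block table, here is a Python sketch that maps a character to a Windows \fcharsetN value by code point range. The charset numbers are the standard Windows constants (GREEK_CHARSET = 161 and so on), but the ranges are a small hand-picked subset and the function name is made up. Han ideographs are shared between Chinese, Japanese and Korean, so returning GB2312 (134) for them is purely an assumption for this example:

```python
import bisect

# (first code point, last code point, Windows charset number), sorted by start
RANGES = [
    (0x0370, 0x03FF, 161),  # Greek               -> GREEK_CHARSET
    (0x0400, 0x04FF, 204),  # Cyrillic            -> RUSSIAN_CHARSET
    (0x0590, 0x05FF, 177),  # Hebrew              -> HEBREW_CHARSET
    (0x0600, 0x06FF, 178),  # Arabic              -> ARABIC_CHARSET
    (0x0E00, 0x0E7F, 222),  # Thai                -> THAI_CHARSET
    (0x3040, 0x30FF, 128),  # Hiragana + Katakana -> SHIFTJIS_CHARSET
    (0x4E00, 0x9FFF, 134),  # CJK ideographs      -> GB2312_CHARSET (assumption)
    (0xAC00, 0xD7AF, 129),  # Hangul syllables    -> HANGUL_CHARSET
]
STARTS = [first for first, _, _ in RANGES]


def guess_charset(ch: str, default: int = 0) -> int:
    """Return a charset number for ch, or `default` (ANSI_CHARSET) if no range matches."""
    cp = ord(ch)
    i = bisect.bisect_right(STARTS, cp) - 1   # last range starting at or before cp
    if i >= 0 and cp <= RANGES[i][1]:
        return RANGES[i][2]
    return default


print(guess_charset("欢"), guess_charset("Κ"), guess_charset("A"))  # 134 161 0
```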

    So how does the RichEdit control decide which charset to use?
    RichEdit uses the Uniscribe DLL (usp10.dll).
    ScriptGetProperties:
    http://msdn.microsoft.com/en-us/libr...=VS.85%29.aspx
    SCRIPT_PROPERTIES Structure
    typedef struct {
      DWORD langid                 :16;
      DWORD fNumeric               :1;
      DWORD fComplex               :1;
      DWORD fNeedsWordBreaking     :1;
      DWORD fNeedsCaretInfo        :1;
      DWORD bCharSet               :8;
      DWORD fControl               :1;
      DWORD fPrivateUseArea        :1;
      DWORD fNeedsCharacterJustify :1;
      DWORD fInvalidGlyph          :1;
      DWORD fInvalidLogAttr        :1;
      DWORD fCDM                   :1;
      DWORD fAmbiguousCharSet      :1;
      DWORD fClusterSizeVaries     :1;
      DWORD fRejectInvalid         :1;
    } SCRIPT_PROPERTIES;

    I doubt that you will find much in the way of source code calling usp10.dll functions from VB6. In addition, you have to jump through hoops and write wrappers to make them VB friendly.

    '-----

    If you find everything you need in "Arial Unicode MS" there is probably no need to even mess with codepages.

  11. #11
    DrUnicode
    Fanatic Member
    Join Date
    Mar 2008
    Location
    Natal, Brazil
    Posts
    631

    Re: RTF didn't show Unicode

    Source code for using Usp10.dll from vb6:
    http://blogs.msdn.com/b/michkap/arch...12/628714.aspx
    This code is part of what is in "Internationalization with Visual Basic" by Michael S. Kaplan - http://i18nwithvb.com/
    The book is out of print and you may have a hard time finding a copy.
