
Thread: UTF Unicode (Urdu language Database)

  1. #1

    Thread Starter, New Member | Join Date: May 2006 | Posts: 5

    UTF Unicode (Urdu language Database)

    Hi,
    I am a new user.
    I downloaded an Urdu keyboard for Windows XP from the web. The keyboard is called "Phonetic", and with it we can type Urdu in Word and Excel.

    My problem is this: I have a database in Excel (in Urdu).

    I copied the data into Notepad and saved it with Encoding = UTF-8.
    Then I inserted a RichTextBox into a VB6 EXE application and loaded the text file into it. Everything is OK: the file opens and shows all the data correctly.

    Then I copied all the data (including the Urdu fields) into Access, and the Urdu field looks fine in MS Access.

    But when I try to read that data through a Recordset, or process it with the normal VB string functions, I do not get proper results: whenever I read the data, I get ? ? ? ? instead of the text.

    Does anyone know how to read UTF data through a Recordset, or any other way to process it in VB6?

    Regards
    Manzer
    Last edited by manzerehsan; May 26th, 2006 at 02:32 AM.

  2. #2
    Merri (VB6, XHTML & CSS hobbyist) | Join Date: Oct 2002 | Location: Finland | Posts: 6,654

    Re: UTF Unicode (Urdu language Database)

    The RichTextBox in VB6 is not Unicode-aware. Basically, when you access the RTB, the data is converted to ANSI and then handed to your VB6 program. The same happens when you assign something to the RTB: the Unicode string (VB6 strings are UTF-16) is converted to ANSI, which the RTB then converts back to Unicode.

    This happens for historical reasons: the original Visual Basic did not support Unicode, and neither did Windows. So, for backwards compatibility, a lot of ANSI conversion goes on.
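    A language-neutral sketch (in Python, not VB6) of why the Urdu text comes back as question marks: it simulates a UTF-16 string making a round trip through an ANSI code page. The code page `cp1252` (Western European) is an assumption; the real conversion uses whatever ANSI code page the system locale selects, and `ansi_round_trip` is a hypothetical helper name.

```python
def ansi_round_trip(text: str, codepage: str = "cp1252") -> str:
    """Simulate a Unicode string passing through an ANSI-only API."""
    # Characters with no mapping in the ANSI code page are replaced by '?'
    ansi_bytes = text.encode(codepage, errors="replace")
    # Converting back to Unicode cannot recover the lost characters
    return ansi_bytes.decode(codepage)


if __name__ == "__main__":
    urdu_name = "\u0639\u0644\u06cc"  # the name "علی" from the Test table
    print(ansi_round_trip(urdu_name))  # ???
    print(ansi_round_trip("Manzer"))   # Manzer
```

    English text survives because every ASCII character exists in every ANSI code page. Each Urdu letter, by contrast, is replaced one-for-one by `?`.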

    So, do not load the data into a textbox. None of the native VB6 controls are Unicode-aware. Instead, open the text file in binary mode into a byte array, and then convert the byte array's contents to what you need. For the UTF-8 to Unicode conversion, I have a module that can be very helpful: it lets you convert any codepage to Unicode and vice versa.

    VB Code:
    Private Function OpenUTF8toStr(ByVal Filename As String) As String
        Dim FN As Integer, barBuffer() As Byte
        ' check if the file exists
        If LenB(Dir$(Filename)) = 0 Then Exit Function
        ' open file for reading
        FN = FreeFile
        Open Filename For Binary Access Read As #FN
        ' the UTF-8 Byte Order Mark (BOM) is three bytes (EF BB BF), not two;
        ' this assumes the file starts with one, as Notepad's UTF-8 option writes it
        If LOF(FN) > 3 Then
            ' resize byte array to hold everything after the BOM
            ReDim barBuffer(LOF(FN) - 4)
            ' read file contents, starting at byte 4 (file positions are 1-based)
            Get #FN, 4, barBuffer
            ' convert to UTF-16 and to string and return
            ' (ANSItoUTF16 and CP_UTF8 come from the module)
            OpenUTF8toStr = CStr(ANSItoUTF16(barBuffer, CP_UTF8))
            ' clear memory
            Erase barBuffer
        End If
        Close #FN
    End Function
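    For comparison, here is the same BOM handling as a Python sketch. Rather than assuming a BOM is present, it checks for the three-byte UTF-8 BOM (EF BB BF). The function names are hypothetical and only mirror the VB6 routine.

```python
import codecs


def decode_utf8_bytes(data: bytes) -> str:
    """Decode UTF-8 file contents, dropping a leading BOM if present."""
    # codecs.BOM_UTF8 is b'\xef\xbb\xbf': three bytes, not two
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


def open_utf8_to_str(filename: str) -> str:
    """Read a whole file in binary mode and decode it as UTF-8."""
    with open(filename, "rb") as f:
        return decode_utf8_bytes(f.read())
```

    Python's built-in `utf-8-sig` codec strips an optional BOM in the same way, so `data.decode("utf-8-sig")` gives the same result.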

    One last note: this module is still a bit of a work-in-progress. It works, but it isn't optimized or cleaned up.

  3. #3

    Thread Starter, New Member | Join Date: May 2006 | Posts: 5

    Re: UTF Unicode (Urdu language Database)

    Thanks Merri,

    I have not tried your solution yet, but I get the idea. My database is in Excel and has 18 fields in total; one of them is in Urdu. When I convert Excel to Access, the process goes smoothly.

    With your solution, must I convert all my files to text format or open them in binary mode, and then change all my internal code to handle binary data?

    I need a solution that handles the Unicode (UTF) fields of the Access database. I need to read the Unicode field together with all the other English-language fields.

    Thanks again.

    Waiting for your response.


    regards
    Manzer

  4. #4
    Merri (VB6, XHTML & CSS hobbyist) | Join Date: Oct 2002 | Location: Finland | Posts: 6,654

    Re: UTF Unicode (Urdu language Database)

    I don't know how you get data from Excel or Access in VB, but they should be Unicode-aware and thus usable from VB, so you can access them directly. The problem is displaying the data, because the native VB controls don't support Unicode. Excel or Access objects used from VB should support Unicode without the ANSI conversion, though.

    If you want to check that the data you have in a string is Unicode, copy the data from a cell that contains Unicode characters into a string (say strCell), and then do MsgBox AscW(strCell) to see the character code of the first character. If it is above 255, the data is internally in the format you want it to be.
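    A Python sketch of that check. One VB6 detail to keep in mind is that AscW returns a signed 16-bit Integer, so characters above U+7FFF (Chinese or Korean, for example) come out negative. Masking with `And &HFFFF&` undoes that. Urdu letters, in the U+0600 to U+06FF block, are not affected. The helper names `ascw` and `looks_like_unicode` are hypothetical.

```python
def ascw(ch: str) -> int:
    """Mimic VB6 AscW: first UTF-16 code unit as a signed 16-bit Integer."""
    unit = int.from_bytes(ch.encode("utf-16-le")[:2], "little")
    # VB6 Integer is signed, so code units above &H7FFF come out negative
    return unit - 0x10000 if unit > 0x7FFF else unit


def looks_like_unicode(s: str) -> bool:
    """True if the first character lies above the 0-255 range."""
    if not s:
        return False
    # Mask with 0xFFFF to undo the sign, like AscW(x) And &HFFFF& in VB6
    return (ascw(s[0]) & 0xFFFF) > 255


if __name__ == "__main__":
    print(ascw("\u0639"))                    # 1593 (Arabic letter ain)
    print(looks_like_unicode("\u0639\u0644"))  # True
    print(looks_like_unicode("Manzer"))        # False
```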

    If the data gets converted to ANSI when you save it, you can use the conversion functions to turn it into UTF-8 first. Because of the automatic ANSI conversion, you can use this trick when saving data:

    VB Code:
    StringToSave = StrConv(UTF16toANSI(barTemp, CP_UTF8), vbUnicode)

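    StrConv with vbUnicode basically adds a zero byte after every byte. When VB later converts the string back to ANSI on output, the UTF-8 bytes are therefore written out unchanged.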
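    A Python sketch of what the trick does at the byte level. It assumes a code page in which each byte maps to the code unit with the same value. Latin-1 behaves exactly that way; a real ANSI code page such as cp1252 differs for a few bytes in the 0x80 to 0x9F range, which is why the post says "basically". The helper names are hypothetical.

```python
def widen_bytes(data: bytes) -> bytes:
    """Return the UTF-16LE form of data when each byte becomes one code unit."""
    # Decoding as Latin-1 maps byte b to code point b, so encoding the result
    # as UTF-16LE puts a zero byte after every original byte
    return data.decode("latin-1").encode("utf-16-le")


def narrow_bytes(widened: bytes) -> bytes:
    """Reverse step: the ANSI conversion on output drops the zero bytes again."""
    return widened.decode("utf-16-le").encode("latin-1")


if __name__ == "__main__":
    utf8 = "\u0639".encode("utf-8")  # b'\xd8\xb9'
    print(widen_bytes(utf8))         # b'\xd8\x00\xb9\x00'
```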


    You don't really need to change the way you process the data internally. You only need to make sure that automatic ANSI conversions and the like don't occur whenever you process data.


    (This may be a slightly confusing post; I've been awake for quite a while.)

  5. #5

    Thread Starter, New Member | Join Date: May 2006 | Posts: 5

    Re: UTF Unicode (Urdu language Database)

    I have described my problem in pictorial form.

    Please find the attached file “Unicode Issue.doc”.

    Regards,
    Manzer

  6. #6
    Merri (VB6, XHTML & CSS hobbyist) | Join Date: Oct 2002 | Location: Finland | Posts: 6,654

    Re: UTF Unicode (Urdu language Database)

    You're trying to show Unicode in a control that does not support Unicode.


    Solution 1: try to change RichTextBox1.Font.Charset to one of the following:
    VB Code:
    Global Const ANSI_CHARSET = 0
    Global Const SHIFTJIS_CHARSET = 128
    Global Const CHINESEBIG5_CHARSET = 136
    Global Const HANGEUL_CHARSET = 129
    Global Const GB2312_CHARSET = 134
    Global Const DEFAULT_CHARSET = 1
    Global Const SYMBOL_CHARSET = 2
    Global Const OEM_CHARSET = 255
    Global Const JOHAB_CHARSET = 130
    Global Const HEBREW_CHARSET = 177
    Global Const ARABIC_CHARSET = 178
    Global Const GREEK_CHARSET = 161
    Global Const TURKISH_CHARSET = 162
    Global Const VIETNAMESE_CHARSET = 163
    Global Const THAI_CHARSET = 222
    Global Const EASTEUROPE_CHARSET = 238
    Global Const RUSSIAN_CHARSET = 204
    Global Const MAC_CHARSET = 77
    Global Const BALTIC_CHARSET = 186

    I don't know if you can display Urdu without Unicode support (changing Charset only changes the ANSI/DBCS support mode). Please note that even if you get it to show the characters, you will only be able to show that one specific language. For example, you can't have Urdu and, say, Chinese together.
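    A short Python check that illustrates the one-code-page limitation. It assumes cp1256 (Windows Arabic) is the ANSI code page an Arabic-script system would use, and `fits_codepage` is a hypothetical helper.

```python
def fits_codepage(text: str, codepage: str) -> bool:
    """True if every character in text exists in the given ANSI code page."""
    try:
        text.encode(codepage)  # strict mode raises on unmappable characters
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(fits_codepage("\u0639", "cp1256"))        # Arabic letter ain: True
    print(fits_codepage("\u4e2d", "cp1256"))        # Chinese character: False
    print(fits_codepage("\u0639\u4e2d", "cp1256"))  # mixed text: False
```

    A single ANSI code page only holds the characters of its own script family. Mixed Urdu and Chinese text therefore fails in every one of them, and only Unicode can hold both.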


    Solution 2: use Unicode controls. I've made a few myself: a label, a command button and a listbox. I also have a component that is able to show Unicode in the menus.

    UniLabel - UniCommand - UniListBox - UniMenu

    I really think you need to understand Unicode in VB6 better than you do now. It takes time, but it may well be worth it. Here is something that can get you started:
    Unicode info on VBForums FAQ forum
    A ton of information about Unicode in VB6

  7. #7

    Thread Starter, New Member | Join Date: May 2006 | Posts: 5

    Re: UTF Unicode (Urdu language Database)

    I know that I am not an expert in VB, and this is my first Unicode program. Learning Unicode will take some time.

    Could you please write some sample code for my problem?

    This is the database in MS Access:

    Table: Test

    IDs Name Number
    1 علی 55
    2 اکبر 22
    3 محمود 33


    Please write VB code for the table above that reads the data from the “Test” table and shows it in a text box (normal or rich), or puts it into a string that can be saved as a text file.

    This will be a great help to me.

    Regards
    Manzer

  8. #8
    Merri (VB6, XHTML & CSS hobbyist) | Join Date: Oct 2002 | Location: Finland | Posts: 6,654

    Re: UTF Unicode (Urdu language Database)

    The module in the next post lets you save as UTF-8. Paste it into a new empty module and give it a name (modCharset or similar).

    Sample usage:
    VB Code:
    Option Explicit

    Private Sub Form_Load()
        Dim strTemp As String, blnBOM As Boolean
        If Not SaveAsUTF8("C:\test.txt", "ÅÄÖ", True) Then MsgBox "Saving failed": Exit Sub
        If Not OpenFromUTF8("C:\test.txt", strTemp, blnBOM) Then MsgBox "Open failed": Exit Sub
        MsgBox strTemp, , "Byte Order Mark: " & blnBOM
    End Sub

    You can save what you get from the DB directly to the file; it is already Unicode.
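    Here is a Python sketch of the same idea applied to the "Test" table from the previous post. It writes the rows as tab-separated UTF-8 text with a BOM, then reads them back. The rows are hard-coded as an assumption (a real program would take them from the database query). The functions are hypothetical counterparts of SaveAsUTF8 and OpenFromUTF8, not a port of the module.

```python
import codecs
import os
import tempfile

# Rows of the "Test" table, hard-coded here as an assumption; a real
# program would take them from the database query instead.
TEST_ROWS = [(1, "علی", 55), (2, "اکبر", 22), (3, "محمود", 33)]


def rows_to_text(rows) -> str:
    """Join the rows into tab-separated lines with CRLF line endings."""
    lines = ["IDs\tName\tNumber"]
    lines += [f"{ids}\t{name}\t{number}" for ids, name, number in rows]
    return "\r\n".join(lines) + "\r\n"


def save_as_utf8(filename: str, text: str, bom: bool = True) -> None:
    """Write text as UTF-8, optionally preceded by the EF BB BF byte order mark."""
    with open(filename, "wb") as f:
        if bom:
            f.write(codecs.BOM_UTF8)
        f.write(text.encode("utf-8"))


def open_from_utf8(filename: str) -> tuple[str, bool]:
    """Return (text, had_bom) for a UTF-8 file."""
    with open(filename, "rb") as f:
        data = f.read()
    had_bom = data.startswith(codecs.BOM_UTF8)
    # utf-8-sig strips the BOM if present and decodes plain UTF-8 otherwise
    return data.decode("utf-8-sig"), had_bom


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "test_table.txt")
    save_as_utf8(path, rows_to_text(TEST_ROWS))
    text, had_bom = open_from_utf8(path)
    print(had_bom, text == rows_to_text(TEST_ROWS))  # True True
```

    Writing the BOM matters mainly for Notepad and similar editors, which use it to recognise the file as UTF-8.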

  9. #9
    Merri (VB6, XHTML & CSS hobbyist) | Join Date: Oct 2002 | Location: Finland | Posts: 6,654

    Re: UTF Unicode (Urdu language Database)

    Code:
    Option Explicit
    
    Public Enum WINCODEPAGE
        CP_UNKNOWN = -1
        CP_ACP = 0
        CP_OEMCP = 1
        CP_MACCP = 2
        CP_THREAD_ACP = 3
        CP_SYMBOL = 42
        CP_AWIN = 101
        CP_709 = 102
        CP_720 = 103
        CP_A708 = 104
        CP_A449 = 105
        CP_TARB = 106
        CP_NAE = 107
        CP_V4 = 108
        CP_MA2 = 109
        CP_I864 = 110
        CP_A437 = 111
        CP_AMAC = 112
        CP_HWIN = 201
        CP_862I = 202
        CP_7BIT = 203
        CP_ISO = 204
        CP_H437 = 205
        CP_HMAC = 206
        CP_OEM_437 = 437
        CP_ARABICDOS = 708
        CP_DOS720 = 720
        CP_DOS737 = 737
        CP_DOS775 = 775
        CP_IBM850 = 850
        CP_IBM852 = 852
        CP_DOS861 = 861
        CP_DOS862 = 862
        CP_IBM866 = 866
        CP_DOS869 = 869
        CP_THAI = 874
        CP_EBCDIC = 875
        CP_JAPAN = 932
        CP_CHINA = 936
        CP_KOREA = 949
        CP_TAIWAN = 950
        CP_UNICODELITTLE = 1200
        CP_UNICODEBIG = 1201
        CP_EASTEUROPE = 1250
        CP_RUSSIAN = 1251
        CP_WESTEUROPE = 1252
        CP_GREEK = 1253
        CP_TURKISH = 1254
        CP_HEBREW = 1255
        CP_ARABIC = 1256
        CP_BALTIC = 1257
        CP_VIETNAMESE = 1258
        CP_JOHAB = 1361
        CP_MAC_ROMAN = 10000
        CP_MAC_JAPAN = 10001
        CP_MAC_ARABIC = 10004
        CP_MAC_GREEK = 10006
        CP_MAC_CYRILLIC = 10007
        CP_MAC_LATIN2 = 10029
        CP_MAC_TURKISH = 10081
        CP_CHINESECNS = 20000
        CP_CHINESEETEN = 20002
        CP_IA5WEST = 20105
        CP_IA5GERMAN = 20106
        CP_IA5SWEDISH = 20107
        CP_IA5NORWEGIAN = 20108
        CP_ASCII = 20127
        CP_RUSSIANKOI8R = 20866
        CP_RUSSIANKOI8U = 21866
        CP_ISOLATIN1 = 28591
        CP_ISOEASTEUROPE = 28592
        CP_ISOTURKISH = 28593
        CP_ISOBALTIC = 28594
        CP_ISORUSSIAN = 28595
        CP_ISOARABIC = 28596
        CP_ISOGREEK = 28597
        CP_ISOHEBREW = 28598
        CP_ISOTURKISH2 = 28599
        CP_ISOLATIN9 = 28605
        CP_HEBREWLOG = 38598
        CP_JAPANNHK = 50220
        CP_JAPANESC = 50221
        CP_JAPANISO = 50222
        CP_KOREAISO = 50225
        CP_TAIWANISO = 50227
        CP_CHINAISO = 50229
        CP_JAPANEUC = 51932
        CP_CHINAEUC = 51936
        CP_KOREAEUC = 51949
        CP_TAIWANEUC = 51950
        CP_CHINAHZ = 52936
        CP_GB18030 = 54936
        CP_UTF7 = 65000
        CP_UTF8 = 65001
    End Enum
    
    Public Declare Function GetACP Lib "kernel32" () As Long
    Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
    Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, lpUsedDefaultChar As Long) As Long
    ' ANSI/DBCS/UTF-8 byte array to Unicode string
    Public Function CBarToUStr(ByRef Text() As Byte, Optional ByVal cPage As WINCODEPAGE = CP_UNKNOWN, Optional lFlags As Long, Optional bValidate As Boolean = True) As String
        Static barTemp() As Byte, barNew() As Byte
        Dim lngStrLen As Long, lngNewLen As Long
        ' array initialized?
        If (Not Text) = -1 Then Exit Function
        ' get string length
        lngStrLen = UBound(Text) + 1
        If lngStrLen = 0 Then Exit Function
        ' we have to add one null character to the end to have valid input for the conversion function...
        barTemp = Text
        ReDim Preserve barTemp(lngStrLen)
        ' validate/autodetect character set?
        If bValidate Then
            If IsUTF8(Text) Then
                ' it is UTF-8!
                cPage = CP_UTF8
            Else
                Select Case cPage
                    Case CP_UNKNOWN, CP_UTF8
                        ' use default character set
                        cPage = GetACP
                End Select
            End If
        End If
        ' change size of new string
        lngNewLen = lngStrLen * 2
        ReDim Preserve barNew(lngNewLen + 1)
        ' get new string
        lngNewLen = (2 * MultiByteToWideChar(CLng(cPage), lFlags, ByVal VarPtr(barTemp(0)), lngStrLen, ByVal VarPtr(barNew(0)), lngNewLen)) - 1
        ' check string length (-1 means the conversion failed; 1 is one valid character)
        Select Case lngNewLen
            Case Is < 1
                Exit Function
            Case Is < UBound(barNew)
                ReDim Preserve barNew(lngNewLen)
        End Select
        ' output result
        CBarToUStr = barNew
    End Function
    ' Validate contents of a byte array as UTF-8
    Public Function IsUTF8(ByRef bytArray() As Byte, Optional ByVal lngReadSize As Long = 2048) As Boolean
        Dim lngArraySize As Long, lngReadPosition As Long, lngUtf8ByteSize As Long, lngIsUtf8 As Long
        Dim i As Long
     
        If lngReadSize < 0 Then Exit Function
        If (Not bytArray) = -1 Then Exit Function
        lngArraySize = UBound(bytArray) + 1
        If lngReadSize > lngArraySize Then lngReadSize = lngArraySize
        
        Do While lngReadPosition < lngReadSize
            If bytArray(lngReadPosition) <= &H7F Then
                lngReadPosition = lngReadPosition + 1
            ElseIf bytArray(lngReadPosition) < &HC0 Then
                Exit Function
            ElseIf (bytArray(lngReadPosition) >= &HC0) And (bytArray(lngReadPosition) <= &HFD) Then
                If (bytArray(lngReadPosition) And &HFC) = &HFC Then
                    lngUtf8ByteSize = 5
                ElseIf (bytArray(lngReadPosition) And &HF8) = &HF8 Then
                    lngUtf8ByteSize = 4
                ElseIf (bytArray(lngReadPosition) And &HF0) = &HF0 Then
                    lngUtf8ByteSize = 3
                ElseIf (bytArray(lngReadPosition) And &HE0) = &HE0 Then
                    lngUtf8ByteSize = 2
                ElseIf (bytArray(lngReadPosition) And &HC0) = &HC0 Then
                    lngUtf8ByteSize = 1
                End If
                If (lngReadPosition + lngUtf8ByteSize) >= lngReadSize Then Exit Do
                For i = (lngReadPosition + 1) To (lngReadPosition + lngUtf8ByteSize) Step 1
                    If Not ((bytArray(i) >= &H80) And (bytArray(i) <= &HBF)) Then Exit Function
                Next i
                lngIsUtf8 = lngIsUtf8 + 1
                lngReadPosition = lngReadPosition + lngUtf8ByteSize + 1
            Else
                lngReadPosition = lngReadPosition + 1
            End If
        Loop
        IsUTF8 = lngIsUtf8 > 0
    End Function
    ' Open to string from UTF-8 text file
    Public Function OpenFromUTF8(ByVal Filename As String, ByRef Text As String, Optional ByRef BOM As Boolean) As Boolean
        Dim barUTF8() As Byte, barBOM(2) As Byte, blnBOM As Boolean
        Dim lngFN As Long, lngFilesize As Long
        ' make sure the filename exists
        If LenB(Filename) = 0 Then Exit Function
        If LenB(Dir$(Filename)) = 0 Then Exit Function
        ' any size?
        lngFilesize = FileLen(Filename)
        ' if no size then just return a null string
        If lngFilesize = 0 Then Text = vbNullString: OpenFromUTF8 = True: Exit Function
        ' get free file
        lngFN = FreeFile
        ' open file
        Open Filename For Binary Access Read Lock Write As #lngFN
            If lngFilesize > 2 Then
                ' check for BOM
                Get #lngFN, , barBOM
                ' validate BOM
                blnBOM = ((barBOM(0) = &HEF) And (barBOM(1) = &HBB) And (barBOM(2) = &HBF))
                If blnBOM Then
                    lngFilesize = lngFilesize - 3
                Else
                    Seek #lngFN, 1
                End If
            End If
            ' see if anything to read
            If lngFilesize > 0 Then
                ' resize buffer and read file
                ReDim barUTF8(lngFilesize - 1)
                Get #lngFN, , barUTF8
            End If
        Close #lngFN
        ' return values (IsMissing only works with Variant parameters, so assign directly)
        BOM = blnBOM
        ' convert UTF-8 to a Unicode string
        Text = CBarToUStr(barUTF8, CP_UTF8)
        ' return true
        OpenFromUTF8 = True
    End Function
    ' Save string as UTF-8 text file
    Public Function SaveAsUTF8(ByVal Filename As String, ByRef Text As String, Optional ByVal BOM As Boolean = True) As Boolean
        Dim barUTF8() As Byte, barBOM(2) As Byte
        Dim lngFN As Long
        ' validate filename and text
        If LenB(Filename) = 0 Then Exit Function
        If LenB(Text) = 0 Then Exit Function
        ' remove the existing file if it exists
        If LenB(Dir$(Filename)) > 0 Then Kill Filename
        ' convert to byte array
        barUTF8 = UStrToCBar(Text, CP_UTF8, , False)
        ' set byte order mark (BOM)
        barBOM(0) = &HEF
        barBOM(1) = &HBB
        barBOM(2) = &HBF
        ' get free file
        lngFN = FreeFile
        ' open file for writing
        Open Filename For Binary Access Write As #lngFN
            ' save BOM and UTF-8 data
            If BOM Then Put #lngFN, , barBOM
            Put #lngFN, , barUTF8
        Close #lngFN
        ' return true
        SaveAsUTF8 = True
    End Function
    ' Unicode string to ANSI/DBCS/UTF-8 byte array
    Public Function UStrToCBar(ByRef Text As String, Optional ByVal cPage As WINCODEPAGE = CP_UTF8, Optional lFlags As Long, Optional bValidate As Boolean = True) As Byte()
        Static barNew() As Byte
        Dim lngStrLen As Long, lngNewLen As Long
        ' check length
        lngStrLen = LenB(Text)
        If lngStrLen = 0 Then Exit Function
        ' validate character set?
        If bValidate Then
            Select Case cPage
                Case CP_UNKNOWN
                    cPage = GetACP
            End Select
        End If
        ' reserve enough space to the new array
        lngNewLen = lngStrLen * 2
        ReDim Preserve barNew(lngNewLen)
        ' convert from Unicode to given character set or UTF-8
        lngNewLen = WideCharToMultiByte(CLng(cPage), lFlags, ByVal StrPtr(Text), lngStrLen \ 2, ByVal VarPtr(barNew(0)), lngNewLen, ByVal 0&, ByVal 0&) - 1
        ' what is the length of the new string?
        Select Case lngNewLen
            Case Is < 0
                Exit Function
            Case Is < UBound(barNew)
                ' remove unused bytes
                ReDim Preserve barNew(lngNewLen)
        End Select
        ' return result
        UStrToCBar = barNew
    End Function
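    To make the detection heuristic in IsUTF8 easier to follow, here is a Python port of the same logic. It is a sketch, not a full UTF-8 validator. Like the original, it accepts the obsolete 5- and 6-byte forms and skips 0xFE/0xFF bytes. It also returns True only when it has seen at least one multi-byte sequence, so pure ASCII counts as "not proven UTF-8". The name `is_utf8` is hypothetical.

```python
def is_utf8(data: bytes, read_size: int = 2048) -> bool:
    """Heuristic UTF-8 check ported from the VB6 IsUTF8 function."""
    if read_size < 0:
        return False
    read_size = min(read_size, len(data))
    pos = 0
    multibyte_sequences = 0
    while pos < read_size:
        lead = data[pos]
        if lead <= 0x7F:
            pos += 1  # plain ASCII byte
        elif lead < 0xC0:
            return False  # continuation byte without a lead byte
        elif lead <= 0xFD:
            # number of continuation bytes announced by the lead byte
            if (lead & 0xFC) == 0xFC:
                extra = 5
            elif (lead & 0xF8) == 0xF8:
                extra = 4
            elif (lead & 0xF0) == 0xF0:
                extra = 3
            elif (lead & 0xE0) == 0xE0:
                extra = 2
            else:
                extra = 1
            if pos + extra >= read_size:
                break  # sequence is cut off by the end of the sample
            for i in range(pos + 1, pos + extra + 1):
                if not 0x80 <= data[i] <= 0xBF:
                    return False  # expected a continuation byte
            multibyte_sequences += 1
            pos += extra + 1
        else:
            pos += 1  # 0xFE / 0xFF: skipped, as in the original
    return multibyte_sequences > 0
```

    For example, "ÅÄÖ" saved in cp1252 is the bytes C5 C4 D6. The C5 announces one continuation byte, but C4 is outside 80 to BF, so the data is rejected as UTF-8. CBarToUStr then falls back to the system code page.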

  10. #10

    Thread Starter, New Member | Join Date: May 2006 | Posts: 5

    Re: UTF Unicode (Urdu language Database)

    I am working on it. After testing this example, I will give you my feedback.
    Thanks for everything.
    Bye
    Manzer
