UTF Unicode (Urdu language Database)

**manzerehsan** · May 26th, 2006, 02:25 AM

Hi,
I am a new user.
I downloaded an Urdu keyboard layout from the web. The keyboard is named "Phonatic" and is for Windows XP. With the help of that keyboard we can type Urdu in Word and Excel.

I am facing a problem: I have a database in Excel that contains Urdu text.

I copied the data into Notepad and saved it with Encoding = UTF-8.
Then I inserted a RichTextBox into a VB6 exe application and loaded the txt file into it. Everything is OK: the file opens and shows all the data correctly.

Then I copied all the data (including the Urdu fields) into Access, and the Urdu fields look fine in MS Access.

But when I try to read that data through a Recordset, or process it with the normal VB string functions, I do not get proper results: whenever I read the data I get ? ? ? ? instead of the text.

Please, does anyone know how to read UTF data through a Recordset, or any other way to process it in VB6?

Regards
Manzer

**Merri** · May 26th, 2006, 03:20 AM

RichTextBox in VB6 is not Unicode-aware. Basically, when you try to access the RTB, the data is converted to ANSI and then given to your VB6 program. Same when you set something to RTB: Unicode (VB6 strings are UTF-16) is converted to ANSI which is then again converted to Unicode in the RTB.

This odd behavior exists for historical reasons: the original Visual Basic wasn't compatible with Unicode, and neither was Windows. So for backwards compatibility there are a lot of ANSI conversions going on.

So, do not load the data into a textbox. None of the native VB6 controls is Unicode aware. Instead, open the textfile in binary mode into a byte array and then convert this byte array's contents to what you need. To help with the internal UTF-8 to Unicode conversion I've a module that can be very helpful. It lets you convert any codepage to Unicode and vice versa.

VB Code:

Private Function OpenUTF8toStr(ByVal Filename As String) As String
    ' FreeFile returns an Integer, so the file number is declared as one
    Dim FN As Integer, barBuffer() As Byte
    ' check if the file exists
    If LenB(Dir$(Filename)) = 0 Then Exit Function
    ' open file for reading
    FN = FreeFile
    Open Filename For Binary Access Read As #FN
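    ' nothing to read besides the BOM? then return an empty string
    If LOF(FN) < 4 Then Close #FN: Exit Function
    ' resize byte array to the file size minus the three BOM bytes
    ReDim barBuffer(LOF(FN) - 4)
    ' read file contents
    ' we skip the first three bytes (the UTF-8 Byte Order Mark aka BOM: EF BB BF);
    ' Get positions are 1-based, so reading starts at byte 4
    Get #FN, 4, barBuffer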
    Close #FN
    ' convert to UTF-16 and to string and return
    OpenUTF8toStr = CStr(ANSItoUTF16(barBuffer, CP_UTF8))
    ' clear memory
    Erase barBuffer
End Function

One last note: this module is still a bit of a work-in-progress. It works, but it isn't optimized or cleaned up.

**manzerehsan** · May 27th, 2006, 04:06 AM

Thanks Merri,

I have not tried your solution yet, but I get the idea. My database is in Excel and has 18 fields in total, one of which is in Urdu. When I convert from Excel to Access, the process goes smoothly.

With your solution, must I convert all my files to text format or open them in binary mode? Would I then have to change all my internal code to handle binary data?

I need a solution that handles Unicode (UTF) fields in the Access database. I need to open the Unicode field together with all the other English-language fields.

Thanks again.

Waiting for your response.

Regards
Manzer

**Merri** · May 27th, 2006, 05:49 AM

I don't know how you get data from Excel or Access in VB, but they should be Unicode-aware and thus useable in VB. So you can actually access them directly. The problem is displaying the data as native VB controls don't support Unicode. Excel or Access objects within VB should support Unicode though without doing the ANSI conversion.

If you want to check that the data you hold in a string is Unicode, copy data into a string from a cell that has Unicode characters and then do MsgBox AscW(strValue), where strValue is your string variable, to see the character code of the first character. If it is above 255, the data is internally in the format you want it to be.
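
As a minimal sketch of that check, the snippet below builds a three-letter Arabic-script string in code instead of reading it from a cell. The procedure name, the variable name and the exact code points are assumptions made for illustration. AscW returns a signed Integer, so the value is masked to keep it in the 0 to 65535 range.

```vb
Private Sub CheckFirstChar()
    Dim strCell As String
    ' Stand-in for a value read from a cell or field (assumed code points:
    ' U+0639, U+0644, U+06CC)
    strCell = ChrW$(&H639) & ChrW$(&H644) & ChrW$(&H6CC)
    ' AscW looks only at the first character; mask the signed Integer result
    Debug.Print AscW(strCell) And &HFFFF&   ' prints 1593 (&H639), well above 255
End Sub
```

If the same test on a string coming from the database prints 63 (the code of "?"), the text was already lost in an ANSI conversion before it reached the string.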

If the data gets converted to ANSI when saving, you could use the conversion functions to change the data to UTF-8. Because of the automatic ANSI conversion, you can use this kind of trick when saving data:

VB Code:

StringToSave = StrConv(UTF16toANSI(barTemp, CP_UTF8), vbUnicode)

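StrConv with vbUnicode widens every byte to a two-byte character using the system code page; for ASCII bytes that simply means adding a zero byte after each one.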
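
The widening can be seen in isolation with plain ASCII text. This is only an illustrative sketch (the procedure and variable names are made up here), and it uses nothing beyond the built-in StrConv function:

```vb
Private Sub DemoStrConv()
    Dim barAnsi() As Byte, strWide As String
    ' "abc" as three single bytes: 61 62 63 (hex)
    barAnsi = StrConv("abc", vbFromUnicode)
    ' widen each byte to a two-byte character: 61 00 62 00 63 00
    strWide = StrConv(barAnsi, vbUnicode)
    ' byte count before and after the widening
    Debug.Print UBound(barAnsi) + 1, LenB(strWide)   ' 3 and 6
End Sub
```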

You don't really need to change the internal way you process the data. You only need to make sure that automatic ANSI conversions and such don't occur whenever you process data.

(Might be a bit confusing post, I've been awake quite a while).

**manzerehsan** · May 29th, 2006, 02:30 AM

I have described my problem in pictorial form.

Please find the attached file “Unicode Issue.doc”.

Regards,
Manzer

**Merri** · May 29th, 2006, 05:22 AM

You're trying to show Unicode in a control that does not support Unicode.

Solution 1: try to change RichTextBox1.Font.Charset to one of the following:

VB Code:

Global Const ANSI_CHARSET = 0
Global Const SHIFTJIS_CHARSET = 128
Global Const CHINESEBIG5_CHARSET = 136
Global Const HANGEUL_CHARSET = 129
Global Const GB2312_CHARSET = 134
Global Const DEFAULT_CHARSET = 1
Global Const SYMBOL_CHARSET = 2
Global Const OEM_CHARSET = 255
Global Const JOHAB_CHARSET = 130
Global Const HEBREW_CHARSET = 177
Global Const ARABIC_CHARSET = 178
Global Const GREEK_CHARSET = 161
Global Const TURKISH_CHARSET = 162
Global Const VIETNAMESE_CHARSET = 163
Global Const THAI_CHARSET = 222
Global Const EASTEUROPE_CHARSET = 238
Global Const RUSSIAN_CHARSET = 204
Global Const MAC_CHARSET = 77
Global Const BALTIC_CHARSET = 186

I don't know if you can have Urdu without Unicode support (changing Charset only changes the ANSI/DBCS support mode). Please note that even if you get it to show the characters, you're only able to show that one specific language. For example, you can't have Urdu and say, Chinese, together.
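
As a rough sketch of Solution 1, assuming a form with a RichTextBox named RichTextBox1 and a font that contains Arabic-script glyphs (Tahoma here is only a guess), the charset could be set like this. Whether it is enough for every Urdu letter is not something this thread confirms.

```vb
' Value taken from the constant list above
Private Const ARABIC_CHARSET As Long = 178

Private Sub Form_Load()
    ' RichTextBox1 is assumed to be a RichTextBox placed on this form
    With RichTextBox1.Font
        .Name = "Tahoma"            ' assumption: a font with Arabic-script glyphs
        .Charset = ARABIC_CHARSET
    End With
End Sub
```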

Solution 2: use Unicode controls. I've made a few myself: a label, a command button and a listbox. I also have a component that is able to show Unicode in the menus.

UniLabel - UniCommand - UniListBox - UniMenu

I really think you need to understand Unicode in VB6 better than you do now. It takes time to understand it, but it may be worth the time it takes. Here is something that can get you started:
Unicode info on VBForums FAQ forum
A ton of information about Unicode in VB6

**manzerehsan** · May 30th, 2006, 04:43 AM

I know that I am not an expert in VB, and this is my first Unicode program. Learning Unicode will take some time.

Please write some sample code for my problem.

This is the database in MS Access

Table: Test

| IDs | Name | Number |
|---|---|---|
| 1 | علی | 55 |
| 2 | اکبر | 22 |
| 3 | محمود | 33 |

Please write VB code for the above table.
It should read the data from the “Test” table and show it in a text box (normal or rich), or put it into a string that can be saved as a text file.

This will be a great help to me.

Regards
Manzer

**Merri** · May 30th, 2006, 01:27 PM

This allows you to save as UTF-8. Paste the code in the next post into a new empty module and give it a name (modCharset or similar).

Sample usage:

VB Code:

Option Explicit
 
Private Sub Form_Load()
    Dim strTemp As String, blnBOM As Boolean
    If Not SaveAsUTF8("C:\test.txt", "ÅÄÖ", True) Then MsgBox "Saving failed": Exit Sub
    If Not OpenFromUTF8("C:\test.txt", strTemp, blnBOM) Then MsgBox "Open failed": Exit Sub
    MsgBox strTemp, , "Byte Order Mark: " & blnBOM
End Sub

You can save what you've gotten from the DB directly to the file; it is already Unicode.
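
For the “Test” table from the question, that could look roughly like the sketch below. Everything about the data access is an assumption: it uses late-bound ADO with the Jet 4.0 provider, a made-up database path (C:\test.mdb) and a made-up output file. Only SaveAsUTF8 comes from the module in the next post. The field names are bracketed because Name and Number can clash with reserved words.

```vb
Private Sub ExportTestTable()
    Dim cn As Object, rs As Object
    Dim strOut As String

    ' Late binding, so no project reference is needed (assumes ADO is installed)
    Set cn = CreateObject("ADODB.Connection")
    ' Hypothetical path to the Access database
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\test.mdb"
    Set rs = cn.Execute("SELECT [IDs], [Name], [Number] FROM [Test] ORDER BY [IDs]")

    ' Build one tab-separated line per record; the strings stay UTF-16 all the way
    Do Until rs.EOF
        strOut = strOut & rs.Fields("IDs").Value & vbTab & _
                 rs.Fields("Name").Value & vbTab & _
                 rs.Fields("Number").Value & vbCrLf
        rs.MoveNext
    Loop

    rs.Close
    cn.Close

    ' Write the collected text as UTF-8 with a BOM
    If Not SaveAsUTF8("C:\test_out.txt", strOut, True) Then MsgBox "Saving failed"
End Sub
```

The string never passes through a VB control here, so no automatic ANSI conversion gets a chance to replace the Urdu text with question marks.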

**Merri** · May 30th, 2006, 01:28 PM

Code:

Option Explicit

Public Enum WINCODEPAGE
    CP_UNKNOWN = -1
    CP_ACP = 0
    CP_OEMCP = 1
    CP_MACCP = 2
    CP_THREAD_ACP = 3
    CP_SYMBOL = 42
    CP_AWIN = 101
    CP_709 = 102
    CP_720 = 103
    CP_A708 = 104
    CP_A449 = 105
    CP_TARB = 106
    CP_NAE = 107
    CP_V4 = 108
    CP_MA2 = 109
    CP_I864 = 110
    CP_A437 = 111
    CP_AMAC = 112
    CP_HWIN = 201
    CP_862I = 202
    CP_7BIT = 203
    CP_ISO = 204
    CP_H437 = 205
    CP_HMAC = 206
    CP_OEM_437 = 437
    CP_ARABICDOS = 708
    CP_DOS720 = 720
    CP_DOS737 = 737
    CP_DOS775 = 775
    CP_IBM850 = 850
    CP_IBM852 = 852
    CP_DOS861 = 861
    CP_DOS862 = 862
    CP_IBM866 = 866
    CP_DOS869 = 869
    CP_THAI = 874
    CP_EBCDIC = 875
    CP_JAPAN = 932
    CP_CHINA = 936
    CP_KOREA = 949
    CP_TAIWAN = 950
    CP_UNICODELITTLE = 1200
    CP_UNICODEBIG = 1201
    CP_EASTEUROPE = 1250
    CP_RUSSIAN = 1251
    CP_WESTEUROPE = 1252
    CP_GREEK = 1253
    CP_TURKISH = 1254
    CP_HEBREW = 1255
    CP_ARABIC = 1256
    CP_BALTIC = 1257
    CP_VIETNAMESE = 1258
    CP_JOHAB = 1361
    CP_MAC_ROMAN = 10000
    CP_MAC_JAPAN = 10001
    CP_MAC_ARABIC = 10004
    CP_MAC_GREEK = 10006
    CP_MAC_CYRILLIC = 10007
    CP_MAC_LATIN2 = 10029
    CP_MAC_TURKISH = 10081
    CP_CHINESECNS = 20000
    CP_CHINESEETEN = 20002
    CP_IA5WEST = 20105
    CP_IA5GERMAN = 20106
    CP_IA5SWEDISH = 20107
    CP_IA5NORWEGIAN = 20108
    CP_ASCII = 20127
    CP_RUSSIANKOI8R = 20866
    CP_RUSSIANKOI8U = 21866
    CP_ISOLATIN1 = 28591
    CP_ISOEASTEUROPE = 28592
    CP_ISOTURKISH = 28593
    CP_ISOBALTIC = 28594
    CP_ISORUSSIAN = 28595
    CP_ISOARABIC = 28596
    CP_ISOGREEK = 28597
    CP_ISOHEBREW = 28598
    CP_ISOTURKISH2 = 28599
    CP_ISOLATIN9 = 28605
    CP_HEBREWLOG = 38598
    CP_JAPANNHK = 50220
    CP_JAPANESC = 50221
    CP_JAPANISO = 50222
    CP_KOREAISO = 50225
    CP_TAIWANISO = 50227
    CP_CHINAISO = 50229
    CP_JAPANEUC = 51932
    CP_CHINAEUC = 51936
    CP_KOREAEUC = 51949
    CP_TAIWANEUC = 51950
    CP_CHINAHZ = 52936
    CP_GB18030 = 54936
    CP_UTF7 = 65000
    CP_UTF8 = 65001
End Enum

Public Declare Function GetACP Lib "kernel32" () As Long
Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, lpUsedDefaultChar As Long) As Long
' ANSI/DBCS/UTF-8 byte array to Unicode string
Public Function CBarToUStr(ByRef Text() As Byte, Optional ByVal cPage As WINCODEPAGE = CP_UNKNOWN, Optional lFlags As Long, Optional bValidate As Boolean = True) As String
    Static barTemp() As Byte, barNew() As Byte
    Dim lngStrLen As Long, lngNewLen As Long
    ' array initialized?
    If (Not Text) = -1 Then Exit Function
    ' get string length
    lngStrLen = UBound(Text) + 1
    If lngStrLen = 0 Then Exit Function
    ' we have to add one null character to the end to have valid input for the conversion function...
    barTemp = Text
    ReDim Preserve barTemp(lngStrLen)
    ' validate/autodetect character set?
    If bValidate Then
        If IsUTF8(Text) Then
            ' it is UTF-8!
            cPage = CP_UTF8
        Else
            Select Case cPage
                Case CP_UNKNOWN, CP_UTF8
                    ' use default character set
                    cPage = GetACP
            End Select
        End If
    End If
    ' change size of new string
    lngNewLen = lngStrLen * 2
    ReDim Preserve barNew(lngNewLen + 1)
    ' get new string
    lngNewLen = (2 * MultiByteToWideChar(CLng(cPage), lFlags, ByVal VarPtr(barTemp(0)), lngStrLen, ByVal VarPtr(barNew(0)), lngNewLen)) - 1
    ' check string length
    Select Case lngNewLen
        Case Is < 1
            ' conversion failed (a single converted character gives 1, which is valid)
            Exit Function
        Case Is < UBound(barNew)
            ReDim Preserve barNew(lngNewLen)
    End Select
    ' output result
    CBarToUStr = barNew
End Function
' Validate contents of a byte array as UTF-8
Public Function IsUTF8(ByRef bytArray() As Byte, Optional ByVal lngReadSize As Long = 2048) As Boolean
    Dim lngArraySize As Long, lngReadPosition As Long, lngUtf8ByteSize As Long, lngIsUtf8 As Long
    Dim i As Long
 
    If lngReadSize < 0 Then Exit Function
    If (Not bytArray) = -1 Then Exit Function
    lngArraySize = UBound(bytArray) + 1
    If lngReadSize > lngArraySize Then lngReadSize = lngArraySize
    
    Do While lngReadPosition < lngReadSize
        If bytArray(lngReadPosition) <= &H7F Then
            lngReadPosition = lngReadPosition + 1
        ElseIf bytArray(lngReadPosition) < &HC0 Then
            Exit Function
        ElseIf (bytArray(lngReadPosition) >= &HC0) And (bytArray(lngReadPosition) <= &HFD) Then
            If (bytArray(lngReadPosition) And &HFC) = &HFC Then
                lngUtf8ByteSize = 5
            ElseIf (bytArray(lngReadPosition) And &HF8) = &HF8 Then
                lngUtf8ByteSize = 4
            ElseIf (bytArray(lngReadPosition) And &HF0) = &HF0 Then
                lngUtf8ByteSize = 3
            ElseIf (bytArray(lngReadPosition) And &HE0) = &HE0 Then
                lngUtf8ByteSize = 2
            ElseIf (bytArray(lngReadPosition) And &HC0) = &HC0 Then
                lngUtf8ByteSize = 1
            End If
            If (lngReadPosition + lngUtf8ByteSize) >= lngReadSize Then Exit Do
            For i = (lngReadPosition + 1) To (lngReadPosition + lngUtf8ByteSize) Step 1
                If Not ((bytArray(i) >= &H80) And (bytArray(i) <= &HBF)) Then Exit Function
            Next i
            lngIsUtf8 = lngIsUtf8 + 1
            lngReadPosition = lngReadPosition + lngUtf8ByteSize + 1
        Else
            lngReadPosition = lngReadPosition + 1
        End If
    Loop
    IsUTF8 = lngIsUtf8 > 0
End Function
' Open to string from UTF-8 text file
Public Function OpenFromUTF8(ByVal Filename As String, ByRef Text As String, Optional ByRef BOM As Boolean) As Boolean
    Dim barUTF8() As Byte, barBOM(2) As Byte, blnBOM As Boolean
    Dim lngFN As Long, lngFilesize As Long
    ' make sure the filename exists
    If LenB(Filename) = 0 Then Exit Function
    If LenB(Dir$(Filename)) = 0 Then Exit Function
    ' any size?
    lngFilesize = FileLen(Filename)
    ' if no size then just return a null string
    If lngFilesize = 0 Then Text = vbNullString: OpenFromUTF8 = True: Exit Function
    ' get free file
    lngFN = FreeFile
    ' open file
    Open Filename For Binary Access Read Lock Write As #lngFN
        If lngFilesize > 2 Then
            ' check for BOM
            Get #lngFN, , barBOM
            ' validate BOM
            blnBOM = ((barBOM(0) = &HEF) And (barBOM(1) = &HBB) And (barBOM(2) = &HBF))
            If blnBOM Then
                lngFilesize = lngFilesize - 3
            Else
                Seek #lngFN, 1
            End If
        End If
        ' see if anything to read
        If lngFilesize > 0 Then
            ' resize buffer and read file
            ReDim barUTF8(lngFilesize - 1)
            Get #lngFN, , barUTF8
        End If
    Close #lngFN
    ' return values...
    If Not IsMissing(BOM) Then BOM = blnBOM
    ' convert UTF-8 to a Unicode string
    Text = CBarToUStr(barUTF8, CP_UTF8)
    ' return true
    OpenFromUTF8 = True
End Function
' Save string as UTF-8 text file
Public Function SaveAsUTF8(ByVal Filename As String, ByRef Text As String, Optional ByVal BOM As Boolean = True) As Boolean
    Dim barUTF8() As Byte, barBOM(2) As Byte
    Dim lngFN As Long
    ' validate filename and text
    If LenB(Filename) = 0 Then Exit Function
    If LenB(Text) = 0 Then Exit Function
    ' remove the existing file if it exists
    If LenB(Dir$(Filename)) > 0 Then Kill Filename
    ' convert to byte array
    barUTF8 = UStrToCBar(Text, CP_UTF8, , False)
    ' set byte order mark (BOM)
    barBOM(0) = &HEF
    barBOM(1) = &HBB
    barBOM(2) = &HBF
    ' get free file
    lngFN = FreeFile
    ' open file for writing
    Open Filename For Binary Access Write As #lngFN
        ' save BOM and UTF-8 data
        If BOM Then Put #lngFN, , barBOM
        Put #lngFN, , barUTF8
    Close #lngFN
    ' return true
    SaveAsUTF8 = True
End Function
' Unicode string to ANSI/DBCS/UTF-8 byte array
Public Function UStrToCBar(ByRef Text As String, Optional ByVal cPage As WINCODEPAGE = CP_UTF8, Optional lFlags As Long, Optional bValidate As Boolean = True) As Byte()
    Static barNew() As Byte
    Dim lngStrLen As Long, lngNewLen As Long
    ' check length
    lngStrLen = LenB(Text)
    If lngStrLen = 0 Then Exit Function
    ' validate character set?
    If bValidate Then
        Select Case cPage
            Case CP_UNKNOWN
                cPage = GetACP
        End Select
    End If
    ' reserve enough space to the new array
    lngNewLen = lngStrLen * 2
    ReDim Preserve barNew(lngNewLen)
    ' convert from Unicode to given character set or UTF-8
    lngNewLen = WideCharToMultiByte(CLng(cPage), lFlags, ByVal StrPtr(Text), lngStrLen \ 2, ByVal VarPtr(barNew(0)), lngNewLen, ByVal 0&, ByVal 0&) - 1
    ' what is the length of the new string?
    Select Case lngNewLen
        Case Is < 0
            Exit Function
        Case Is < UBound(barNew)
            ' remove unused bytes
            ReDim Preserve barNew(lngNewLen)
    End Select
    ' return result
    UStrToCBar = barNew
End Function
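
A quick way to try the two conversion functions without touching any files is a round trip through UTF-8. This is a sketch only: the procedure name and the test string (three Arabic-script letters, assumed code points U+0639, U+0644, U+06CC) are chosen for illustration, and the procedure assumes the module above is part of the project.

```vb
Private Sub DemoRoundTrip()
    Dim strUrdu As String, barUTF8() As Byte, strBack As String
    ' Three letters, each of which takes two bytes in UTF-8
    strUrdu = ChrW$(&H639) & ChrW$(&H644) & ChrW$(&H6CC)
    ' UTF-16 string -> UTF-8 byte array
    barUTF8 = UStrToCBar(strUrdu, CP_UTF8, , False)
    Debug.Print UBound(barUTF8) + 1      ' 6 bytes
    ' UTF-8 byte array -> UTF-16 string
    strBack = CBarToUStr(barUTF8, CP_UTF8)
    Debug.Print (strBack = strUrdu)      ' True
End Sub
```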

**manzerehsan** · Jun 1st, 2006, 02:00 AM

I am working on it. After testing this example I will give you my feedback.
Thanks for everything.
Bye
Manzer
