Hi,
I am a new user.
I downloaded an Urdu keyboard from the web. The keyboard is named "Phonatic" and is for Windows XP. With its help we can type Urdu in Word and Excel.
My problem is this: I have a database in Excel that contains Urdu text.
I copy the data into Notepad and save it with Encoding = UTF-8.
Then I insert a RichTextBox in a VB6 exe application and load the text file into it. Everything is OK: the file opens and all the data is shown correctly.
Then I copy all the data (including the Urdu fields) into Access, and the Urdu field looks fine in MS Access.
But when I try to read that data through a Recordset, or process it with the normal VB string functions, I do not get proper results. Whenever I try to read the data I get ? ? ? ? instead of the data.
Does anyone know how to read UTF data through a Recordset, or any other way to process it in VB6?
Regards
Manzer
Last edited by manzerehsan; May 26th, 2006 at 02:32 AM.
RichTextBox in VB6 is not Unicode-aware. Basically, when you read from the RTB, the data is converted to ANSI and then handed to your VB6 program. The same happens when you assign text to the RTB: the Unicode string (VB6 strings are UTF-16) is converted to ANSI, which is then converted back to Unicode inside the RTB.
This weird behavior exists for historical reasons: the original Visual Basic wasn't compatible with Unicode, and neither was Windows. So for backwards compatibility there are a lot of ANSI conversions going on.
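To make the effect concrete, here is a minimal sketch of what an ANSI round trip does to Urdu text. It assumes a system whose ANSI code page has no Arabic-script characters (for example 1252); the procedure name `DemoAnsiLoss` is made up for illustration.

```vb
' Sketch only: simulates the Unicode -> ANSI -> Unicode trip that
' native controls force on a string.
Private Sub DemoAnsiLoss()
    Dim strUrdu As String, strBack As String
    ' build the text from code points so the source file can stay ANSI
    strUrdu = ChrW$(&H639) & ChrW$(&H644) & ChrW$(&H6CC)
    ' convert to the system ANSI code page and back again
    strBack = StrConv(StrConv(strUrdu, vbFromUnicode), vbUnicode)
    ' the original first character is U+0639
    Debug.Print AscW(strUrdu)
    ' after the round trip this is typically 63 ("?") when the
    ' code page has no mapping for the character
    Debug.Print AscW(strBack)
End Sub
```

If the second value printed is 63, the question marks come from the conversion itself and not from the stored data.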
So, do not load the data into a textbox. None of the native VB6 controls is Unicode-aware. Instead, open the text file in binary mode into a byte array and then convert the contents of that byte array to what you need. To help with the UTF-8 to Unicode conversion, I have a module that can be very helpful. It lets you convert any code page to Unicode and vice versa.
VB Code:
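Private Function OpenUTF8toStr(ByVal Filename As String) As String
    Dim FN As Integer, barBuffer() As Byte
    ' check if the file exists
    If LenB(Dir$(Filename)) = 0 Then Exit Function
    ' open file for reading
    FN = FreeFile
    Open Filename For Binary Access Read As #FN
    ' only read if there is data after the three-byte UTF-8 Byte Order Mark (BOM)
    If LOF(FN) > 3 Then
        ' resize byte array to hold everything after the BOM
        ReDim barBuffer(LOF(FN) - 4)
        ' read file contents
        ' we skip the first three bytes (Byte Order Mark aka BOM: EF BB BF);
        ' this assumes the file starts with a BOM, as UTF-8 files saved by Notepad do
        Get #FN, 4, barBuffer
        ' convert the UTF-8 bytes to a VB6 string using CBarToUStr from the module
        OpenUTF8toStr = CBarToUStr(barBuffer, CP_UTF8)
    End If
    Close #FN
End Function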
I did not try your solution, but I get the idea. My database is in Excel and has 18 fields in total; one of the fields is in Urdu. When I convert from Excel to Access, the process goes smoothly.
With your solution, would I have to convert my whole file to text format or open it in binary mode, and then change all my internal code to handle binary data?
I need a solution that handles Unicode or UTF fields in the Access database. I need to open the Unicode field together with all the other English-language fields.
I don't know how you get data from Excel or Access in VB, but they should be Unicode-aware and thus usable in VB, so you can actually access them directly. The problem is displaying the data, as native VB controls don't support Unicode. Excel or Access objects within VB should support Unicode, though, without doing the ANSI conversion.
If you want to check that the data you possess in a string is Unicode, copy data to a string from a cell that has Unicode characters and then do MsgBox AscW(String) to see the character code of the first character. If it is above 255 then the data is internally in the format you want it to be.
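The check described above could look like the following sketch when the data comes from Access through ADO. It assumes a project reference to the Microsoft ActiveX Data Objects library and the Jet 4.0 OLE DB provider. The database path is hypothetical, and the table and field names are borrowed from the sample table posted later in this thread.

```vb
' Sketch only: reads one Urdu field and shows the code of its first character.
Private Sub CheckFieldIsUnicode()
    Dim cn As ADODB.Connection, rs As ADODB.Recordset
    Dim strName As String
    Set cn = New ADODB.Connection
    ' hypothetical path, replace with your own .mdb file
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\test.mdb"
    Set rs = New ADODB.Recordset
    rs.Open "SELECT [Name] FROM Test", cn, adOpenForwardOnly, adLockReadOnly
    If Not rs.EOF Then
        ' appending vbNullString turns a Null field into an empty string
        strName = rs.Fields("Name").Value & vbNullString
        ' a value above 255 means the string still holds real Unicode
        If LenB(strName) > 0 Then MsgBox AscW(strName)
    End If
    rs.Close
    cn.Close
End Sub
```

MsgBox is only showing a number here, so it does not matter that MsgBox itself cannot display Urdu text.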
If the data gets converted to ANSI when you save it, you could use the conversion functions to change the data to UTF-8. Because of the automatic ANSI conversion, you can do this kind of trick when saving data:
You don't really need to change the way you process the data internally. You only need to make sure that automatic ANSI conversions and the like don't occur while you process it.
(Might be a bit confusing post, I've been awake quite a while).
You're trying to show Unicode in a control that does not support Unicode.
Solution 1: try to change RichTextBox1.Font.Charset to one of the following:
VB Code:
Global Const ANSI_CHARSET = 0
Global Const SHIFTJIS_CHARSET = 128
Global Const CHINESEBIG5_CHARSET = 136
Global Const HANGEUL_CHARSET = 129
Global Const GB2312_CHARSET = 134
Global Const DEFAULT_CHARSET = 1
Global Const SYMBOL_CHARSET = 2
Global Const OEM_CHARSET = 255
Global Const JOHAB_CHARSET = 130
Global Const HEBREW_CHARSET = 177
Global Const ARABIC_CHARSET = 178
Global Const GREEK_CHARSET = 161
Global Const TURKISH_CHARSET = 162
Global Const VIETNAMESE_CHARSET = 163
Global Const THAI_CHARSET = 222
Global Const EASTEUROPE_CHARSET = 238
Global Const RUSSIAN_CHARSET = 204
Global Const MAC_CHARSET = 77
Global Const BALTIC_CHARSET = 186
I don't know if you can have Urdu without Unicode support (changing Charset only changes the ANSI/DBCS support mode). Please note that even if you get it to show the characters, you will only be able to show that one specific language. For example, you can't have Urdu and, say, Chinese together.
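A minimal sketch of Solution 1, assuming a form with a RichTextBox named RichTextBox1 and an installed font with Arabic-script glyphs (Tahoma is used here as an assumption). Whether Urdu then displays correctly depends on the system, as noted above.

```vb
Private Sub Form_Load()
    ' 178 is ARABIC_CHARSET in the list of constants above
    With RichTextBox1.Font
        .Name = "Tahoma"
        .Charset = 178
    End With
End Sub
```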
Solution 2: use Unicode controls. I've made a few myself: a label, a command button and a listbox. I also have a component that is able to show Unicode in the menus.
I know that I am not an expert in VB, and this is my first Unicode program. Learning Unicode will take some time.
Please write some sample code for my problem.
This is the database in MS Access
Table:Test
IDs Name Number
1 علی 55
2 اکبر 22
3 محمود 33
Please write VB code for the table above.
It should read the data from the “Test” table and show it in a text box (normal or rich), or put it into a string that can be saved as a text file.
Option Explicit
Public Enum WINCODEPAGE
CP_UNKNOWN = -1
CP_ACP = 0
CP_OEMCP = 1
CP_MACCP = 2
CP_THREAD_ACP = 3
CP_SYMBOL = 42
CP_AWIN = 101
CP_709 = 102
CP_720 = 103
CP_A708 = 104
CP_A449 = 105
CP_TARB = 106
CP_NAE = 107
CP_V4 = 108
CP_MA2 = 109
CP_I864 = 110
CP_A437 = 111
CP_AMAC = 112
CP_HWIN = 201
CP_862I = 202
CP_7BIT = 203
CP_ISO = 204
CP_H437 = 205
CP_HMAC = 206
CP_OEM_437 = 437
CP_ARABICDOS = 708
CP_DOS720 = 720
CP_DOS737 = 737
CP_DOS775 = 775
CP_IBM850 = 850
CP_IBM852 = 852
CP_DOS861 = 861
CP_DOS862 = 862
CP_IBM866 = 866
CP_DOS869 = 869
CP_THAI = 874
CP_EBCDIC = 875
CP_JAPAN = 932
CP_CHINA = 936
CP_KOREA = 949
CP_TAIWAN = 950
CP_UNICODELITTLE = 1200
CP_UNICODEBIG = 1201
CP_EASTEUROPE = 1250
CP_RUSSIAN = 1251
CP_WESTEUROPE = 1252
CP_GREEK = 1253
CP_TURKISH = 1254
CP_HEBREW = 1255
CP_ARABIC = 1256
CP_BALTIC = 1257
CP_VIETNAMESE = 1258
CP_JOHAB = 1361
CP_MAC_ROMAN = 10000
CP_MAC_JAPAN = 10001
CP_MAC_ARABIC = 10004
CP_MAC_GREEK = 10006
CP_MAC_CYRILLIC = 10007
CP_MAC_LATIN2 = 10029
CP_MAC_TURKISH = 10081
CP_CHINESECNS = 20000
CP_CHINESEETEN = 20002
CP_IA5WEST = 20105
CP_IA5GERMAN = 20106
CP_IA5SWEDISH = 20107
CP_IA5NORWEGIAN = 20108
CP_ASCII = 20127
CP_RUSSIANKOI8R = 20866
CP_RUSSIANKOI8U = 21866
CP_ISOLATIN1 = 28591
CP_ISOEASTEUROPE = 28592
CP_ISOTURKISH = 28593
CP_ISOBALTIC = 28594
CP_ISORUSSIAN = 28595
CP_ISOARABIC = 28596
CP_ISOGREEK = 28597
CP_ISOHEBREW = 28598
CP_ISOTURKISH2 = 28599
CP_ISOLATIN9 = 28605
CP_HEBREWLOG = 38598
CP_JAPANNHK = 50220
CP_JAPANESC = 50221
CP_JAPANISO = 50222
CP_KOREAISO = 50225
CP_TAIWANISO = 50227
CP_CHINAISO = 50229
CP_JAPANEUC = 51932
CP_CHINAEUC = 51936
CP_KOREAEUC = 51949
CP_TAIWANEUC = 51950
CP_CHINAHZ = 52936
CP_GB18030 = 54936
CP_UTF7 = 65000
CP_UTF8 = 65001
End Enum
Public Declare Function GetACP Lib "kernel32" () As Long
Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, lpUsedDefaultChar As Long) As Long
' ANSI/DBCS/UTF-8 byte array to Unicode string
Public Function CBarToUStr(ByRef Text() As Byte, Optional ByVal cPage As WINCODEPAGE = CP_UNKNOWN, Optional lFlags As Long, Optional bValidate As Boolean = True) As String
    Static barTemp() As Byte, barNew() As Byte
    Dim lngStrLen As Long, lngNewLen As Long
    ' array initialized?
    If (Not Text) = -1 Then Exit Function
    ' get string length
    lngStrLen = UBound(Text) + 1
    If lngStrLen = 0 Then Exit Function
    ' we have to add one null character to the end to have valid input for the conversion function...
    barTemp = Text
    ReDim Preserve barTemp(lngStrLen)
    ' validate/autodetect character set?
    If bValidate Then
        If IsUTF8(Text) Then
            ' it is UTF-8!
            cPage = CP_UTF8
        Else
            Select Case cPage
                Case CP_UNKNOWN, CP_UTF8
                    ' use default character set
                    cPage = GetACP
            End Select
        End If
    End If
    ' change size of new string
    lngNewLen = lngStrLen * 2
    ReDim Preserve barNew(lngNewLen + 1)
    ' get new string
    lngNewLen = (2 * MultiByteToWideChar(CLng(cPage), lFlags, ByVal VarPtr(barTemp(0)), lngStrLen, ByVal VarPtr(barNew(0)), lngNewLen)) - 1
    ' check string length (lngNewLen is now the upper bound of the result in bytes,
    ' so 1 means exactly one character was converted)
    Select Case lngNewLen
        Case Is < 1
            ' conversion failed or produced nothing
            Exit Function
        Case Is < UBound(barNew)
            ReDim Preserve barNew(lngNewLen)
    End Select
    ' output result
    CBarToUStr = barNew
End Function
' Validate contents of a byte array as UTF-8
Public Function IsUTF8(ByRef bytArray() As Byte, Optional ByVal lngReadSize As Long = 2048) As Boolean
    Dim lngArraySize As Long, lngReadPosition As Long, lngUtf8ByteSize As Long, lngIsUtf8 As Long
    Dim i As Long
    If lngReadSize < 0 Then Exit Function
    ' array initialized?
    If (Not bytArray) = -1 Then Exit Function
    ' never read past the end of the array
    lngArraySize = UBound(bytArray) + 1
    If lngReadSize > lngArraySize Then lngReadSize = lngArraySize
    Do While lngReadPosition < lngReadSize
        If bytArray(lngReadPosition) <= &H7F Then
            ' plain ASCII byte
            lngReadPosition = lngReadPosition + 1
        ElseIf bytArray(lngReadPosition) < &HC0 Then
            ' continuation byte without a lead byte: not UTF-8
            Exit Function
        ElseIf (bytArray(lngReadPosition) >= &HC0) And (bytArray(lngReadPosition) <= &HFD) Then
            ' lead byte: work out how many continuation bytes must follow
            If (bytArray(lngReadPosition) And &HFC) = &HFC Then
                lngUtf8ByteSize = 5
            ElseIf (bytArray(lngReadPosition) And &HF8) = &HF8 Then
                lngUtf8ByteSize = 4
            ElseIf (bytArray(lngReadPosition) And &HF0) = &HF0 Then
                lngUtf8ByteSize = 3
            ElseIf (bytArray(lngReadPosition) And &HE0) = &HE0 Then
                lngUtf8ByteSize = 2
            ElseIf (bytArray(lngReadPosition) And &HC0) = &HC0 Then
                lngUtf8ByteSize = 1
            End If
            ' stop if the sequence runs past the part we are allowed to read
            If (lngReadPosition + lngUtf8ByteSize) >= lngReadSize Then Exit Do
            ' every continuation byte must be in the range &H80 to &HBF
            For i = (lngReadPosition + 1) To (lngReadPosition + lngUtf8ByteSize) Step 1
                If Not ((bytArray(i) >= &H80) And (bytArray(i) <= &HBF)) Then Exit Function
            Next i
            ' one valid multi-byte sequence found
            lngIsUtf8 = lngIsUtf8 + 1
            lngReadPosition = lngReadPosition + lngUtf8ByteSize + 1
        Else
            lngReadPosition = lngReadPosition + 1
        End If
    Loop
    ' pure ASCII data reports False, as no multi-byte sequence was seen
    IsUTF8 = lngIsUtf8 > 0
End Function
' Open to string from UTF-8 text file
Public Function OpenFromUTF8(ByVal Filename As String, ByRef Text As String, Optional ByRef BOM As Boolean) As Boolean
    Dim barUTF8() As Byte, barBOM(2) As Byte, blnBOM As Boolean
    Dim lngFN As Long, lngFilesize As Long
    ' make sure the filename exists
    If LenB(Filename) = 0 Then Exit Function
    If LenB(Dir$(Filename)) = 0 Then Exit Function
    ' any size?
    lngFilesize = FileLen(Filename)
    ' if no size then just return a null string
    If lngFilesize = 0 Then Text = vbNullString: OpenFromUTF8 = True: Exit Function
    ' get free file
    lngFN = FreeFile
    ' open file
    Open Filename For Binary Access Read Lock Write As #lngFN
    If lngFilesize > 2 Then
        ' check for BOM
        Get #lngFN, , barBOM
        ' validate BOM
        blnBOM = ((barBOM(0) = &HEF) And (barBOM(1) = &HBB) And (barBOM(2) = &HBF))
        If blnBOM Then
            lngFilesize = lngFilesize - 3
        Else
            ' no BOM: go back to the start of the file
            Seek #lngFN, 1
        End If
    End If
    ' see if anything to read
    If lngFilesize > 0 Then
        ' resize buffer and read file
        ReDim barUTF8(lngFilesize - 1)
        Get #lngFN, , barUTF8
    End If
    Close #lngFN
    ' return values...
    If Not IsMissing(BOM) Then BOM = blnBOM
    ' convert UTF-8 to a Unicode string
    Text = CBarToUStr(barUTF8, CP_UTF8)
    ' return true
    OpenFromUTF8 = True
End Function
' Save string as UTF-8 text file
Public Function SaveAsUTF8(ByVal Filename As String, ByRef Text As String, Optional ByVal BOM As Boolean = True) As Boolean
    Dim barUTF8() As Byte, barBOM(2) As Byte
    Dim lngFN As Long
    ' validate filename and text
    If LenB(Filename) = 0 Then Exit Function
    If LenB(Text) = 0 Then Exit Function
    ' remove the existing file if it exists
    If LenB(Dir$(Filename)) > 0 Then Kill Filename
    ' convert to byte array
    barUTF8 = UStrToCBar(Text, CP_UTF8, , False)
    ' set byte order mark (BOM)
    barBOM(0) = &HEF
    barBOM(1) = &HBB
    barBOM(2) = &HBF
    ' get free file
    lngFN = FreeFile
    ' open file for writing
    Open Filename For Binary Access Write As #lngFN
    ' save BOM and UTF-8 data
    If BOM Then Put #lngFN, , barBOM
    Put #lngFN, , barUTF8
    Close #lngFN
    ' return true
    SaveAsUTF8 = True
End Function
' Unicode string to ANSI/DBCS/UTF-8 byte array
Public Function UStrToCBar(ByRef Text As String, Optional ByVal cPage As WINCODEPAGE = CP_UTF8, Optional lFlags As Long, Optional bValidate As Boolean = True) As Byte()
    Static barNew() As Byte
    Dim lngStrLen As Long, lngNewLen As Long
    ' check length
    lngStrLen = LenB(Text)
    If lngStrLen = 0 Then Exit Function
    ' validate character set?
    If bValidate Then
        Select Case cPage
            Case CP_UNKNOWN
                cPage = GetACP
        End Select
    End If
    ' reserve enough space to the new array
    lngNewLen = lngStrLen * 2
    ReDim Preserve barNew(lngNewLen)
    ' convert from Unicode to given character set or UTF-8
    lngNewLen = WideCharToMultiByte(CLng(cPage), lFlags, ByVal StrPtr(Text), lngStrLen \ 2, ByVal VarPtr(barNew(0)), lngNewLen, ByVal 0&, ByVal 0&) - 1
    ' what is the length of the new string?
    Select Case lngNewLen
        Case Is < 0
            Exit Function
        Case Is < UBound(barNew)
            ' remove unused bytes
            ReDim Preserve barNew(lngNewLen)
    End Select
    ' return result
    UStrToCBar = barNew
End Function
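As a usage sketch for the sample table posted earlier, the following reads the "Test" table through ADO and writes it to a UTF-8 text file with SaveAsUTF8 from this module. It assumes a project reference to the Microsoft ActiveX Data Objects library and the Jet 4.0 OLE DB provider. Both file paths and the procedure name `ExportTestTable` are hypothetical.

```vb
' Sketch only: exports the Test table as tab-separated UTF-8 text.
Public Sub ExportTestTable()
    Dim cn As ADODB.Connection, rs As ADODB.Recordset
    Dim strOut As String
    Set cn = New ADODB.Connection
    ' hypothetical path, replace with your own .mdb file
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\test.mdb"
    Set rs = New ADODB.Recordset
    rs.Open "SELECT [IDs], [Name], [Number] FROM Test", cn, adOpenForwardOnly, adLockReadOnly
    Do Until rs.EOF
        ' field values arrive as Unicode; the & operator treats Null as empty
        strOut = strOut & rs.Fields("IDs").Value & vbTab & _
                 rs.Fields("Name").Value & vbTab & _
                 rs.Fields("Number").Value & vbCrLf
        rs.MoveNext
    Loop
    rs.Close
    cn.Close
    ' write the string without passing it through any ANSI-only control
    If SaveAsUTF8("C:\Data\test.txt", strOut) Then MsgBox "Saved"
End Sub
```

The string never touches a TextBox or RichTextBox here, so no ANSI conversion takes place. Displaying the Urdu text on a form would still need a Unicode-aware control.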