-
Oct 29th, 2005, 12:47 AM
#1
Thread Starter
Junior Member
[RESOLVE] How to read an UTF-8 text file?
I wrote some Chinese words in Notepad, then saved the file with UTF-8 encoding.
When I open it in VB, all of the Chinese words turn into wrong characters and can't be read.
The file must stay saved as UTF-8.
Please help me, thank you!
VB Code:
Dim FileHandle As Integer
Dim Contents As String
FileHandle = FreeFile
Open filename For Binary As #FileHandle
Contents = Input(LOF(FileHandle), #FileHandle) & vbCrLf
Close #FileHandle
LoadFileEx = Contents
---------------------------------------
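I typed some Chinese text into Notepad and saved it as a UTF-8 encoded document.
When I open it in VB, the Chinese all turns into garbled characters and can't be read.
It must be saved with UTF-8 encoding.
Please help me, thanks.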
Last edited by lichkingCN; Oct 29th, 2005 at 05:18 AM.
Reason: resolve
-
Oct 29th, 2005, 01:36 AM
#2
Re: How to read an UTF-8 text file?
Try a RichTextBox instead of a Textbox. You can use Unicode.
-
Oct 29th, 2005, 01:38 AM
#3
Thread Starter
Junior Member
Re: How to read an UTF-8 text file?
I tried it, but it failed. The result is just the same.
-
Oct 29th, 2005, 02:05 AM
#4
Re: How to read an UTF-8 text file?
Check the properties. You need to use the correct Font, I think.
-
Oct 29th, 2005, 02:19 AM
#5
Thread Starter
Junior Member
Re: How to read an UTF-8 text file?
But... I think I have the correct font.
I'm Chinese, so my OS is Chinese.
I think the problem is the text file's encoding.
-
Oct 29th, 2005, 02:40 AM
#6
Re: How to read an UTF-8 text file?
You have a two-part problem. First of all, VB can't read UTF-8 directly into a TextBox; you'd see garbage. So you first need to convert the UTF-8 data to VB's native string format (two bytes per character, i.e. Unicode UTF-16). The Windows API provides a way to do this conversion.
The other problem is that VB controls don't natively support Unicode. A TextBox can only hold SBCS and DBCS text (single-byte and double-byte character sets). So you need to set the right character set before you assign the text to the TextBox, or else you will see just question marks.
To get past these problems, here is some useful code:
Code:
'modCharset.bas
Option Explicit
Public Enum KnownCodePage
CP_UNKNOWN = -1
CP_ACP = 0
CP_OEMCP = 1
CP_SYMBOL = 42
' ARABIC
CP_AWIN = 101 ' Bidi Windows codepage
CP_709 = 102 ' MS-DOS Arabic Support CP 709
CP_720 = 103 ' MS-DOS Arabic Support CP 720
CP_A708 = 104 ' ASMO 708
CP_A449 = 105 ' ASMO 449+
CP_TARB = 106 ' MS Transparent Arabic
CP_NAE = 107 ' Nafitha Enhanced Arabic Char Set
CP_V4 = 108 ' Nafitha v 4.0
CP_MA2 = 109 ' Mussaed Al Arabi (MA/2) CP 786
CP_I864 = 110 ' IBM Arabic Supplement CP 864
CP_A437 = 111 ' Ansi 437 codepage
CP_AMAC = 112 ' Macintosh Code Page
' HEBREW
CP_HWIN = 201 ' Bidi Windows codepage
CP_862I = 202 ' IBM Hebrew Supplement CP 862
CP_7BIT = 203 ' IBM Hebrew Supplement CP 862 Folded
CP_ISO = 204 ' ISO Hebrew 8859-8 Character Set
CP_H437 = 205 ' Ansi 437 codepage
CP_HMAC = 206 ' Macintosh Code Page
' CODE PAGES
CP_OEM_437 = 437
CP_ARABICDOS = 708
CP_DOS720 = 720
CP_IBM850 = 850
CP_IBM852 = 852
CP_DOS862 = 862
CP_IBM866 = 866
CP_THAI = 874
CP_JAPAN = 932
CP_CHINA = 936
CP_KOREA = 949
CP_TAIWAN = 950
' UNICODE
CP_UNICODELITTLE = 1200
CP_UNICODEBIG = 1201
' CODE PAGES
CP_EASTEUROPE = 1250
CP_RUSSIAN = 1251
CP_WESTEUROPE = 1252
CP_GREEK = 1253
CP_TURKISH = 1254
CP_HEBREW = 1255
CP_ARABIC = 1256
CP_BALTIC = 1257
CP_VIETNAMESE = 1258
' KOREAN
CP_JOHAB = 1361
' MAC
CP_MAC_ROMAN = 10000
CP_MAC_JAPAN = 10001
CP_MAC_ARABIC = 10004
CP_MAC_GREEK = 10006
CP_MAC_CYRILLIC = 10007
CP_MAC_LATIN2 = 10029
CP_MAC_TURKISH = 10081
' CODE PAGES
CP_ASCII = 20127
CP_RUSSIANKOI8R = 20866
CP_RUSSIANKOI8U = 21866
CP_ISOLATIN1 = 28591
CP_ISOEASTEUROPE = 28592
CP_ISOTURKISH = 28593
CP_ISOBALTIC = 28594
CP_ISORUSSIAN = 28595
CP_ISOARABIC = 28596
CP_ISOGREEK = 28597
CP_ISOHEBREW = 28598
CP_ISOTURKISH2 = 28599
CP_ISOLATIN9 = 28605
CP_HEBREWLOG = 38598
CP_USER = 50000
CP_AUTOALL = 50001
CP_JAPANNHK = 50220
CP_JAPANESC = 50221
CP_JAPANISO = 50222
CP_KOREAISO = 50225
CP_TAIWANISO = 50227
CP_CHINAISO = 50229
CP_AUTOJAPAN = 50932
CP_AUTOCHINA = 50936
CP_AUTOKOREA = 50949
CP_AUTOTAIWAN = 50950
CP_AUTORUSSIAN = 51251
CP_AUTOGREEK = 51253
CP_AUTOARABIC = 51256
CP_JAPANEUC = 51932
CP_CHINAEUC = 51936
CP_KOREAEUC = 51949
CP_TAIWANEUC = 51950
CP_CHINAHZ = 52936
' UNICODE
CP_UTF7 = 65000
CP_UTF8 = 65001
End Enum
' Flags
Public Const MB_PRECOMPOSED = &H1
Public Const MB_COMPOSITE = &H2
Public Const MB_USEGLYPHCHARS = &H4
Public Const MB_ERR_INVALID_CHARS = &H8
Public Const WC_DEFAULTCHECK = &H100 ' check for default char
Public Const WC_COMPOSITECHECK = &H200 ' convert composite to precomposed
Public Const WC_DISCARDNS = &H10 ' discard non-spacing chars
Public Const WC_SEPCHARS = &H20 ' generate separate chars
Public Const WC_DEFAULTCHAR = &H40 ' replace with default char
' API
Private Declare Function GetACP Lib "kernel32" () As Long
Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, lpUsedDefaultChar As Long) As Long
Public Function ANSItoUTF16(ByRef Text() As Byte, Optional ByVal cPage As KnownCodePage = CP_UNKNOWN, Optional lFlags As Long) As Byte()
    Static tmpArr() As Byte, textStr As String
    Dim tmpLen As Long, textLen As Long, A As Long
    ' nothing to do for an uninitialized array
    If (Not Text) = True Then Exit Function
    ' set code page to a valid one
    If cPage = CP_UNKNOWN Then cPage = GetACP
    If cPage = CP_ACP Or cPage = CP_WESTEUROPE Then
        ' fast path: zero-extend every byte to a 16-bit character
        textLen = UBound(Text)
        tmpLen = textLen + textLen + 1
        ' plain ReDim clears the static buffer, so no high bytes survive from an earlier call
        ReDim tmpArr(tmpLen)
        For A = 0 To UBound(Text)
            tmpArr(A + A) = Text(A)
        Next A
    Else
        ' append a "|" marker so the API always finds a terminated string
        textStr = CStr(Text) & "|"
        textLen = LenB(textStr)
        tmpLen = textLen + textLen
        ReDim Preserve tmpArr(tmpLen + 1)
        'Debug.Print "SIZE OF TMPARR: " & tmpLen + 1
        ' get the new string to tmpArr
        tmpLen = MultiByteToWideChar(CLng(cPage), lFlags, ByVal StrPtr(textStr), -1, ByVal VarPtr(tmpArr(0)), tmpLen)
        'Debug.Print "ANSI to Unicode: " & tmpLen
        If tmpLen = 0 Then Exit Function
        ' the returned count includes the "|" marker and the terminating null;
        ' drop both and turn the byte count into an upper bound
        tmpLen = tmpLen + tmpLen - 5
        'If tmpArr(tmpLen - 1) = 0 And tmpArr(tmpLen) = 0 Then tmpLen = tmpLen - 2
        If UBound(tmpArr) <> tmpLen Then ReDim Preserve tmpArr(tmpLen)
        'Debug.Print "SIZE OF TMPARR: " & tmpLen
    End If
    ' return the result
    ANSItoUTF16 = tmpArr
End Function
Public Function UTF16toANSI(ByRef Text() As Byte, Optional ByVal cPage As KnownCodePage = CP_UNKNOWN, Optional lFlags As Long) As Byte()
    Static tmpArr() As Byte
    Dim tmpLen As Long, textLen As Long, A As Long
    ' nothing to do for an uninitialized array
    If (Not Text) = True Then Exit Function
    ' set code page to a valid one
    If cPage = CP_UNKNOWN Then cPage = GetACP
    If cPage = CP_ACP Or cPage = CP_WESTEUROPE Then
        ' fast path: keep only the low byte of each character (lossy above U+00FF)
        textLen = UBound(Text)
        tmpLen = (textLen + 1) \ 2 - 1
        If (Not tmpArr) = True Then ReDim Preserve tmpArr(tmpLen)
        If UBound(tmpArr) <> tmpLen Then ReDim Preserve tmpArr(tmpLen)
        For A = 0 To tmpLen
            tmpArr(A) = Text(A + A)
        Next A
    Else
        ' number of 16-bit characters in the input
        textLen = (UBound(Text) + 1) \ 2
        ' at maximum ANSI can be four bytes per character in new Chinese encoding GB18030–2000
        tmpLen = textLen + textLen + textLen + textLen + 1
        ReDim Preserve tmpArr(tmpLen - 1)
        ' get the new string to tmpArr
        tmpLen = WideCharToMultiByte(CLng(cPage), lFlags, ByVal VarPtr(Text(0)), textLen, ByVal VarPtr(tmpArr(0)), tmpLen, ByVal 0&, ByVal 0&)
        'Debug.Print "Unicode to ANSI: " & tmpLen
        If tmpLen = 0 Then Exit Function
        ' shrink the buffer to the number of bytes actually written
        ReDim Preserve tmpArr(tmpLen - 1)
    End If
    ' return the result
    UTF16toANSI = tmpArr
End Function
These add the ANSItoUTF16 and UTF16toANSI functions to your program. What they actually do is convert from one character set to another (e.g. Unicode to some common Chinese character set). The conversion takes a few steps: first, read the file (preferably into a byte array to avoid an extra conversion), then convert the UTF-8 byte array to Unicode and put the result in a string. Finally, display the result in a TextBox that is set to show the correct character set.
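For readers who only need the UTF-8 to Unicode step, here is a smaller sketch of the same idea. It is not the code from the module above: the helper name Utf8BytesToString and the constant CODEPAGE_UTF8 are made up for illustration, and it assumes it lives in its own standard module. It calls MultiByteToWideChar twice, once to ask how many characters are needed and once to convert.

```vb
Option Explicit

' Same Windows API the module above declares; repeated here so the
' sketch is self-contained when placed in a separate module.
Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

' Hypothetical name, chosen to avoid clashing with the CP_UTF8 enum member.
Private Const CODEPAGE_UTF8 As Long = 65001

' Hypothetical helper: converts a UTF-8 byte array to a VB (UTF-16) string.
' Assumes Bytes has been dimensioned; UBound raises an error otherwise.
Public Function Utf8BytesToString(ByRef Bytes() As Byte) As String
    Dim cb As Long          ' number of input bytes
    Dim cch As Long         ' number of UTF-16 characters needed
    Dim sResult As String

    cb = UBound(Bytes) - LBound(Bytes) + 1
    If cb <= 0 Then Exit Function

    ' First call: no output buffer, so the API only reports the required length.
    cch = MultiByteToWideChar(CODEPAGE_UTF8, 0, VarPtr(Bytes(LBound(Bytes))), cb, 0, 0)
    If cch = 0 Then Exit Function

    ' Second call: convert straight into a pre-sized string buffer.
    sResult = String$(cch, vbNullChar)
    MultiByteToWideChar CODEPAGE_UTF8, 0, VarPtr(Bytes(LBound(Bytes))), cb, StrPtr(sResult), cch

    Utf8BytesToString = sResult
End Function
```

Because the byte count is passed explicitly, this version does not need a terminator or marker character appended to the input.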
VB Code:
' a simple sample
Dim barTemp() As Byte
Open Filename For Binary Access Read As #1
' set buffer to the size of the file
ReDim barTemp(FileLen(Filename) - 1)
' read file to buffer
Get #1, , barTemp
Close #1
' set TextBox character set
' (this will make sure the font is correct and able to display the characters)
Text1.Font.Charset = 134
' you could define those as constants:
' Const GB2312_CHARSET = 134
' Const CHINESEBIG5_CHARSET = 136
' you can find other charsets using Google
' convert from UTF-8 to Unicode and assign to textbox
' (byte array is automatically converted to string)
Text1.Text = ANSItoUTF16(barTemp, CP_UTF8, 0)
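One thing to watch for with the sample above: Notepad of that era writes a three-byte byte order mark (EF BB BF) at the start of a file saved as UTF-8. After conversion it becomes the character U+FEFF, which may show up as a stray character at the start of the text. The sketch below drops it if present. StripBom is a hypothetical helper name, not part of the module above.

```vb
' Hypothetical helper: removes a leading U+FEFF left over from a UTF-8 BOM.
Public Function StripBom(ByVal Text As String) As String
    ' Nested Ifs because VB6 does not short-circuit And,
    ' and AscW raises an error on an empty string.
    If Len(Text) > 0 Then
        ' AscW returns an Integer, and &HFEFF is also an Integer literal,
        ' so both sides of the comparison are the same (negative) value.
        If AscW(Text) = &HFEFF Then
            StripBom = Mid$(Text, 2)
            Exit Function
        End If
    End If
    StripBom = Text
End Function

' Possible use with the sample above:
'   Dim sText As String
'   sText = ANSItoUTF16(barTemp, CP_UTF8, 0)   ' byte array converts to String on assignment
'   Text1.Text = StripBom(sText)
```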
Hope you get this to work 
For more information about Unicode in VB6, see this tutorial
-
Oct 29th, 2005, 02:48 AM
#7
Thread Starter
Junior Member
Re: How to read an UTF-8 text file?
thank you very much!
It works very well!
thanks a lot!
-
Oct 29th, 2005, 04:32 AM
#8
Re: How to read an UTF-8 text file?
You can use Thread Tools at the top of the page and select Mark Thread Resolved from there. Great that it worked (I didn't test it myself).
-
Dec 25th, 2021, 09:02 AM
#9
New Member
Re: How to read an UTF-8 text file?
-
Dec 25th, 2021, 03:17 PM
#10
Re: [RESOLVE] How to read an UTF-8 text file?
Yaso, welcome to VBForums. You should come join us in the more recent threads, found here, rather than a 16 year old thread.
-
Dec 26th, 2021, 01:31 PM
#11
New Member
Re: [RESOLVE] How to read an UTF-8 text file?
 Originally Posted by Elroy
Yaso, welcome to VBForums.  You should come join us in the more recent threads, found here, rather than a 16 year old thread. 
You are right, but Google still returns this VBForums thread as the solution. So ...