Currently I am able to load an ANSI text file and loop through all the lines, reading each one. I do this like this:
Code:
lngFileNum = FreeFile()
Open strpath For Input As #lngFileNum
Do While Seek(lngFileNum) <= LOF(lngFileNum)
    Line Input #lngFileNum, strResult
    'Do Something with strResult
Loop
Close #lngFileNum
Now when I do the same with a Unicode file, I get garbage characters even though the correct charset and font are being used for the controls.
When I add the Unicode strings to a resource file and load it, everything displays correctly. However, if I load a text file into memory I get garbage data.
Can anyone tell me how to cycle through the Unicode text file line by line?
The VB6 native file I/O statements are limited to handling files encoded in "ANSI" using the current locale setting.
Look up the ADO Stream object. It can be loaded from disk and read line by line in text mode. It will handle files in ASCII, UTF-16, UTF-8, whatever. It defaults to "Unicode" (UTF-16).
You could also read such a file using the FSO, but it is limited to ASCII and Unicode (UTF-16).
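A minimal sketch of the ADO Stream approach, assuming ADO 2.5 or later is installed. It uses late binding (CreateObject), so the project needs no reference to the ADO library; the procedure name ReadLinesWithAdoStream is hypothetical, and the two constants mirror the ADO enum values adTypeText and adReadLine.

```vb
' Sketch only: reads a UTF-16 text file line by line through an ADO Stream.
Public Sub ReadLinesWithAdoStream(ByVal strPath As String)
    Const adTypeText As Long = 2     ' StreamTypeEnum value for text mode
    Const adReadLine As Long = -2    ' StreamReadEnum value: read up to the next line separator

    Dim stm As Object
    Dim strLine As String

    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeText
    stm.Charset = "unicode"          ' UTF-16; "utf-8" is another accepted value
    stm.Open
    stm.LoadFromFile strPath

    Do Until stm.EOS
        strLine = stm.ReadText(adReadLine)
        ' Do something with strLine
    Loop

    stm.Close
End Sub
```

The default line separator of the Stream is CR+LF, which matches files saved by Notepad.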
Thanks for the reply. I don't want to use the ADO Stream or FSO, as I would then have to ship extra files in the installation package. I know this can be done by loading the file as a byte array; however, once I have the file in a byte array, I don't know how to read the Unicode text line by line and put it into a string.
It is hard to imagine anyone using a version of Windows too old to have at least ADO 2.5 preinstalled as part of the OS, but whatever.
You can convert the Byte array to a String by a simple assignment:
s = b
You'll have to pick apart lines with more code though, such as by using Split().
Thanks again for the quick reply. I tried that but get the same results as when I loaded it as shown above. I think that once I have the byte array I need to do some manipulation so that the characters display correctly. As I said, when I have the strings in the resource file and load them they display correctly, so I know the characters can be displayed. The problem is when I load from a text file.
I assumed you were reading into the Byte array using a file opened in Binary mode. It sounds as if you are opening it in text mode instead. Use something like:
Open "file.txt" For Binary Access Read As #intFile
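Putting the Binary mode read and the Byte array assignment together might look like the sketch below. It assumes the file is UTF-16LE (what Notepad calls "Unicode"); the function name ReadUtf16LeText is hypothetical.

```vb
' Hypothetical helper: returns the contents of a UTF-16LE file as a String.
Public Function ReadUtf16LeText(ByVal strPath As String) As String
    Dim intFile As Integer
    Dim bytData() As Byte
    Dim strText As String

    intFile = FreeFile()
    Open strPath For Binary Access Read As #intFile
    If LOF(intFile) > 0 Then
        ReDim bytData(0 To LOF(intFile) - 1)
        Get #intFile, , bytData
        ' Assigning a Byte array to a String copies the bytes as-is,
        ' two bytes per character, with no ANSI conversion.
        strText = bytData
    End If
    Close #intFile

    ' Drop the byte order mark (U+FEFF) if the file starts with one.
    If Left$(strText, 1) = ChrW$(&HFEFF) Then strText = Mid$(strText, 2)

    ReadUtf16LeText = strText
End Function
```

The lines can then be picked apart with something like Split(ReadUtf16LeText("file.txt"), vbCrLf).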
you can work with unicode text files as strings, as long as you do not need to display them in vb controls. i have an example with text files containing arabic characters, using find and replace then writing the output to a new file; i found it unnecessary to use byte arrays
i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case. Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next
dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part
come back and mark your original post as resolved if your problem is fixed
pete
If you don't want external files then search for the ReadFile and WriteFile APIs.
If I have time, I'll post an example later...
_____________________________________________________________________
----If this post has helped you. Please take time to Rate it.
----If you've solved your problem, then please mark it as RESOLVED from Thread Tools.
You don't need to use API calls to read Unicode (UTF-16LE) text files. You just need to do as I suggested earlier.
The attached demo contains a simple UnicodeReader.cls I threw together based on another similar class. It has a few "extras" you might not need but it is pretty lightweight, has no external dependencies, and should not have any serious bugs. You could rip out the logic and use it, pretty it up a bit, or just use it as-is in a new Project.
To run the demo you need the Microsoft Forms 2.0 controls, which of course have issues including not being redistributable. You can get these a number of ways, including by having Office (2000 or later?) installed. The TextBox control from this set made the demo easier because it can handle pure "Unicode" and display multilingual Charsets simultaneously.
To run the demo you need the Microsoft Forms 2.0 controls
Of course you need the APIs.
If you read properly, he tries to avoid the use of any external files that needed to be redistributed. So Microsoft Forms 2.0 is no exception!
I've managed to dig up a module I found on my HD, which contains different ways of reading and writing Unicode.
Last edited by some1uk03; Sep 13th, 2009 at 04:01 PM.
No, the question was how to read Unicode files line by line. I was only using the MS Forms 2.0 controls to demo the results of doing it. Those controls have nothing to do with the process of reading the text from disk.
If the real problem is displaying Unicode symbols then it would be another story of course.
Thanks everybody for your replies and the sample code that you sent. I will work on these and let you know how it went. After a two-minute test I am still not seeing the Arabic characters, but I will play around with the code and see if I can fix it. I will keep you posted.
my comment was supposed to indicate that you can work with the arabic and save it to another text file but not display it in standard vb6 controls or debug window, only richtext control is unicode enabled
I am able to display Arabic script in VB6 controls under these conditions:
1) The system is Middle East enabled (Regional and Language Options in Control Panel)
2) I have set the Font.Charset of all the controls to 178 (Arabic)
3) The strings are located in the resource file.
My difficulty is when the string is located in a text file. I can read the text file but when displaying it in controls it doesn't work. I think I need to do some sort of conversion.
This seems to be a persistent rumor. RichTextBox is based on the old Riched32.dll, i.e. version 1.0 of Microsoft's Rich Edit control. It does not support Unicode, as can be easily seen by substituting one for the MS Forms 2.0 TextBox I used in the project I attached above.
there are 3rd party unicode controls available, but i have never needed to use any, search in this forum should return some results
sorry for any misinformation about richtext box, i thought i had used it with some unicode testing, but must be mistaken
I have tried using the following code to load the attached sample test file (text.txt), which includes some Arabic script. However, the data that comes out is garbage; if I use Arabic script that is located in the resource file, it shows correctly. Is there a way to modify the code below so that the characters display correctly?
Code:
Dim fnum As Integer
Dim data() As Byte
Dim str As String
fnum = FreeFile(0)
Open FileName For Binary Access Read As #fnum
ReDim data(LOF(fnum) - 1)
Get #fnum, , data
Close #fnum
str = data
MsgBox StrConv(str, vbUnicode)
You need the Unicode-aware version of the message box to display Unicode characters.
Declare Function MessageBoxIndirectW Lib "user32" (lpMsgBoxParams As MSGBOXPARAMSW) As Long
Private Const MB_USERICON = &H80&
Type MSGBOXPARAMSW
cbSize As Long
hwndOwner As Long
hInstance As Long
lpszText As Long
lpszCaption As Long
dwStyle As Long
lpszIcon As Long
dwContextHelpId As Long
lpfnMsgBoxCallback As Long
dwLanguageId As Long
End Type
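A hedged sketch of how these declarations might be called from a standard module. The declarations are repeated (as Private) so the block stands alone; the procedure name ShowUnicodeMessage is hypothetical. The strings are passed as pointers via StrPtr so that VB6 does not convert them to ANSI on the way to the API.

```vb
Private Type MSGBOXPARAMSW
    cbSize As Long
    hwndOwner As Long
    hInstance As Long
    lpszText As Long
    lpszCaption As Long
    dwStyle As Long
    lpszIcon As Long
    dwContextHelpId As Long
    lpfnMsgBoxCallback As Long
    dwLanguageId As Long
End Type

Private Declare Function MessageBoxIndirectW Lib "user32" _
    (lpMsgBoxParams As MSGBOXPARAMSW) As Long

' Hypothetical wrapper: shows strText in a Unicode-aware message box.
Public Sub ShowUnicodeMessage(ByVal strText As String, ByVal strCaption As String)
    Dim mbp As MSGBOXPARAMSW

    mbp.cbSize = Len(mbp)                   ' size of the structure in bytes
    mbp.hwndOwner = 0                       ' no owner window
    mbp.lpszText = StrPtr(strText)          ' pointer to the UTF-16 text
    mbp.lpszCaption = StrPtr(strCaption)    ' pointer to the UTF-16 caption
    mbp.dwStyle = vbOKOnly Or vbInformation ' same values as MB_OK Or MB_ICONINFORMATION

    MessageBoxIndirectW mbp
End Sub
```

Whether the characters actually render still depends on the fonts and language support installed on the system.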
Thanks for your reply. I was just using the message box as an example; however, your API call will be useful.
The same garbage output also appears on controls like frames, command buttons, option buttons, etc. These same controls display correctly when the source of the Unicode text is the resource file. The problem arises when the source is a Unicode text file.
I think I have a better picture of what you're doing now. The part about things working for a text resource seems a little odd, but perhaps that is a case where the program was built on a machine where a Middle Eastern system locale was set... or something.
In any case, if we limit ourselves to one Charset and Locale it is fairly easy to spoof most regular VB6 ANSI controls into displaying text for a non-system locale. The process is a little weird, but basically you have to:
1) Get a Unicode String containing characters from the desired Locale.
2) Translate the String from Unicode to ANSI using the desired Locale ID value.
3) Translate the String from ANSI to Unicode using the default (system) Locale ID, creating "mystery meat" Unicode.
4) Set the control's Charset to the desired script (e.g. Arabic), which can be done through the Font Property dialog or in code.
5) Set the control's Text or Caption property to the "mystery meat" mangled Unicode.
When the Text/Caption value is assigned, VB6 will perform an implicit Unicode to ANSI conversion using the system Locale ID value, undoing the mangling and converting the mystery meat to the desired ANSI codepage. Everything should display fine - aside from left-to-right vs. right-to-left ordering, another topic.
There are multiple Arabic Locales. I have put a few in my sample program as Consts, more can be found at Locale IDs Assigned by Microsoft.
The attached project uses a regular intrinsic VB6 TextBox control. Note that this technique does not display Unicode characters, it converts a Unicode String with characters from one ANSI codepage into something that can be displayed using an ANSI control.
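The translation steps described above can be sketched roughly as follows. This is not the code from the attached project: the function name ToMysteryMeat, the constant name LCID_ARABIC_SAUDI and the TextBox name Text1 are hypothetical, and &H401 is assumed here as the Locale ID for Arabic (Saudi Arabia).

```vb
' Locale ID for Arabic (Saudi Arabia); other Arabic locales have other IDs.
Private Const LCID_ARABIC_SAUDI As Long = &H401

' Hypothetical helper: mangles a Unicode String so that VB6's implicit
' Unicode-to-ANSI conversion yields bytes in the desired locale's code page.
Public Function ToMysteryMeat(ByVal strUnicode As String, _
                              ByVal lngLocale As Long) As String
    Dim strAnsi As String

    ' Unicode -> ANSI using the code page of the desired locale.
    strAnsi = StrConv(strUnicode, vbFromUnicode, lngLocale)

    ' ANSI -> Unicode using the system default locale ("mystery meat").
    ToMysteryMeat = StrConv(strAnsi, vbUnicode)
End Function

' Example use on a form with an intrinsic TextBox named Text1 (assumed).
Public Sub ShowArabic(ByVal strArabic As String)
    Text1.Font.Charset = 178    ' Arabic charset
    Text1.Text = ToMysteryMeat(strArabic, LCID_ARABIC_SAUDI)
End Sub
```

Characters that have no equivalent in the chosen ANSI code page are lost in the first conversion, so this only suits text from a single script.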
Last edited by dilettante; Sep 14th, 2009 at 05:37 PM.
Alhumdilah (Praise be to God), I have finally found the solution. The problem was the Unicode format: the trouble lies with the UTF-8 and UTF-16BE formats. Once I saved the file as UTF-16LE (in Notepad, Save As... Unicode), everything worked. Thanks for everybody's help.
Here is the code for anybody who needs this in the future:
Code:
Private Function LoadLanguageFileUnicode(strPath As String) As Boolean
    Dim lngFileNum As Long
    Dim strResult() As String
    Dim bytResults() As Byte
    Dim iCounter As Long
    On Error GoTo ErrorHandler
    'Read the whole file into a byte array
    lngFileNum = FreeFile()
    Open strPath For Binary Access Read As #lngFileNum
    ReDim bytResults(LOF(lngFileNum) - 1)
    Get #lngFileNum, , bytResults
    Close #lngFileNum
    'Check the byte order mark (BOM) to identify the encoding
    If bytResults(0) = 255 And bytResults(1) = 254 Then 'UTF16LE
        'Mid$ treats the bytes as a String; starting at 2 skips the BOM character
        strResult = Split(Mid$(bytResults, 2), vbCrLf)
    ElseIf bytResults(0) = 254 And bytResults(1) = 255 Then 'UTF16BE Unicode format not supported
        LoadLanguageFileUnicode = False
        Exit Function
    ElseIf bytResults(0) = 239 And bytResults(1) = 187 And bytResults(2) = 191 Then 'UTF8 Unicode format not supported
        LoadLanguageFileUnicode = False
        Exit Function
    End If
    'With no recognized BOM, strResult is unallocated: UBound raises an error and the handler returns False
    For iCounter = 0 To UBound(strResult)
        Label(iCounter).Caption = strResult(iCounter)
    Next iCounter
    LoadLanguageFileUnicode = True
    Exit Function
ErrorHandler:
    LoadLanguageFileUnicode = False
End Function