Thread: [RESOLVED] Loading a unicode text file

  1. #1

    Thread Starter
    Hyperactive Member Hassan Basri's Avatar
    Join Date
    Sep 2006
    Posts
    324

    [RESOLVED] Loading a unicode text file

    Hello everybody,

    Currently I am able to load an ANSI text file and loop through all the lines, reading each one. I do this like this:

    Code:
    lngFileNum = FreeFile()
    Open strpath For Input As #lngFileNum
    Do While Seek(lngFileNum) <= LOF(lngFileNum)
        Line Input #lngFileNum, strResult
        'Do Something with strResult
    Loop
    Close #lngFileNum
    Now when I do the same with a Unicode file, I get garbage characters, even though the correct charset and font are being used for the controls.

    When I add the Unicode strings to a resource file and load them from there, everything displays correctly. However, if I load a text file into memory I get garbage data.

    Can anyone tell me how to cycle through the Unicode text file line by line?

  2. #2
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Loading a unicode text file

    The VB6 native file I/O statements are limited to handling files encoded in "ANSI" using the current locale setting.

    Look up the ADO Stream object. It can be loaded from disk and read line by line in text mode. It will handle files in ASCII, UTF-16, UTF-8, whatever. It defaults to "Unicode" (UTF-16).

    You could also read such a file using the FSO, but it is limited to ASCII and Unicode (UTF-16).
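
    A minimal sketch of the ADO Stream approach, assuming late binding via CreateObject so no project reference is needed. The Sub name and the "utf-8" charset are illustrative choices; the numeric constants mirror ADO's adTypeText, adReadLine and adCRLF values.

    ```vb
    ' Hypothetical helper: reads a text file line by line through ADODB.Stream.
    Private Sub ReadLinesWithAdoStream(ByVal strPath As String)
        Const adTypeText As Long = 2     ' StreamTypeEnum
        Const adReadLine As Long = -2    ' StreamReadEnum
        Const adCRLF As Long = -1        ' LineSeparatorEnum

        Dim stm As Object
        Dim strLine As String

        Set stm = CreateObject("ADODB.Stream")
        stm.Type = adTypeText
        stm.Charset = "utf-8"            ' assumption: use "unicode" for UTF-16 files
        stm.LineSeparator = adCRLF
        stm.Open
        stm.LoadFromFile strPath

        Do Until stm.EOS
            strLine = stm.ReadText(adReadLine)   ' one line, separator removed
            Debug.Print strLine                  ' do something with the line
        Loop

        stm.Close
    End Sub
    ```

    Setting Charset to match the file's real encoding is the part that matters; the rest is plumbing.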

  3. #3

    Thread Starter
    Hyperactive Member Hassan Basri's Avatar
    Join Date
    Sep 2006
    Posts
    324

    Re: Loading a unicode text file

    Quote Originally Posted by dilettante View Post
    The VB6 native file I/O statements are limited to handling files encoded in "ANSI" using the current locale setting.

    Look up the ADO Stream object. It can be loaded from disk and read line by line in text mode. It will handle files in ASCII, UTF-16, UTF-8, whatever. It defaults to "Unicode" (UTF-16).

    You could also read such a file using the FSO, but it is limited to ASCII and Unicode (UTF-16).
    Thanks for the reply. I don't want to use the ADO Stream or FSO, as I would then have to ship extra files in the installation package. I know this can be done by loading the file into a byte array; however, once I have the file in a byte array I don't know how to read the Unicode text line by line and put it into a string.

  4. #4
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Loading a unicode text file

    It is hard to imagine anyone using a version of Windows too old to have at least ADO 2.5 preinstalled as part of the OS, but whatever.

    You can convert the Byte array to a String by a simple assignment:

    s = b

    You'll have to pick apart lines with more code though, such as by using Split().
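
    To make the assignment concrete, here is a small sketch that needs no file at all. It builds the UTF-16LE bytes for "A", CR, LF, "B" by hand (an invented sample), assigns the array to a String, and splits the result. The Sub and variable names are illustrative.

    ```vb
    Private Sub DemoByteArrayToLines()
        Dim b() As Byte
        Dim s As String
        Dim strLines() As String

        ' UTF-16LE: two bytes per character, low byte first
        ReDim b(0 To 7)
        b(0) = &H41: b(1) = 0    ' "A"
        b(2) = &HD: b(3) = 0     ' CR
        b(4) = &HA: b(5) = 0     ' LF
        b(6) = &H42: b(7) = 0    ' "B"

        s = b                    ' raw byte copy, no ANSI/Unicode conversion
        Debug.Print Len(s)       ' 4 characters (8 bytes)

        strLines = Split(s, vbCrLf)
        Debug.Print UBound(strLines) + 1   ' 2 lines
    End Sub
    ```

    The assignment copies the bytes as they are, so it only gives sensible text when the bytes are already UTF-16LE, which is what VB6 strings hold internally.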

  5. #5

    Thread Starter
    Hyperactive Member Hassan Basri's Avatar
    Join Date
    Sep 2006
    Posts
    324

    Re: Loading a unicode text file

    Quote Originally Posted by dilettante View Post
    It is hard to imagine anyone using a version of Windows too old to have at least ADO 2.5 preinstalled as part of the OS, but whatever.

    You can convert the Byte array to a String by a simple assignment:

    s = b

    You'll have to pick apart lines with more code though, such as by using Split().
    Thanks again for the quick reply. I tried that but get the same results as when I loaded it the way I did above. I think once I have the byte array I need to do some manipulation so that the characters display correctly. Like I said, when I have the strings in the resource file and load them they display correctly, so I know the characters can be displayed. The problem is when I load from a text file.

  6. #6
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Loading a unicode text file

    I assumed you were reading into the Byte array using a file opened in Binary mode. It sounds as if you are opening it in text mode instead. Use something like:

    Open "file.txt" For Binary Access Read As #intFile

  7. #7
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Loading a unicode text file

    you can work with unicode text files as strings, as long as you do not need to display in vb controls, i have an example with textfiles containing arabic characters, using find and replace then writing the output to a new file, i found it unnecessary to use byte arrays

    http://www.vbforums.com/showthread.p...arabic+unicode
    the sample files i was working with were posted earlier in that thread
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  8. #8
    Frenzied Member some1uk03's Avatar
    Join Date
    Jun 2006
    Location
    London, UK
    Posts
    1,675

    Re: Loading a unicode text file

    If you don't want external files, then search for the ReadFile and WriteFile APIs.

    If I have time, I'll post an example later...
    _____________________________________________________________________

    ----If this post has helped you. Please take time to Rate it.
    ----If you've solved your problem, then please mark it as RESOLVED from Thread Tools.



  9. #9
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Loading a unicode text file

    You don't need to use API calls to read Unicode (UTF-16LE) text files. You just need to do as I suggested earlier.

    The attached demo contains a simple UnicodeReader.cls I threw together based on another similar class. It has a few "extras" you might not need but it is pretty lightweight, has no external dependencies, and should not have any serious bugs. You could rip out the logic and use it, pretty it up a bit, or just use it as-is in a new Project.

    To run the demo you need the Microsoft Forms 2.0 controls, which of course have issues including not being redistributable. You can get these a number of ways, including by having Office (2000 or later?) installed. The TextBox control from this set made the demo easier because it can handle pure "Unicode" and display multilingual Charsets simultaneously.
    Attached Files

  10. #10
    Frenzied Member some1uk03's Avatar
    Join Date
    Jun 2006
    Location
    London, UK
    Posts
    1,675

    Re: Loading a unicode text file

    To run the demo you need the Microsoft Forms 2.0 controls
    Of course you need the APIs.

    If you read properly, he is trying to avoid the use of any external files that need to be redistributed. So Microsoft Forms 2.0 is no exception!

    I've managed to dig up a module I found on my HD, which contains different ways of reading & writing in Unicode.
    Attached Files
    Last edited by some1uk03; Sep 13th, 2009 at 04:01 PM.



  11. #11
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Loading a unicode text file

    No, the question was how to read Unicode files line by line. I was only using the MS Forms 2.0 controls to demo the results of doing it. Those controls have nothing to do with the process of reading the text from disk.

    If the real problem is displaying Unicode symbols then it would be another story of course.

  12. #12

    Thread Starter
    Hyperactive Member Hassan Basri's Avatar
    Join Date
    Sep 2006
    Posts
    324

    Re: Loading a unicode text file

    Thanks everybody for your replies and the sample code that you sent. I will work on these and let you know how it goes. After a two-minute test I am still not seeing the Arabic characters, but I will play around with the code and see if I can fix it. I will keep you posted.

  13. #13
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Loading a unicode text file

    my comment was supposed to indicate that you can work with the arabic and save it to another text file but not display it in standard vb6 controls or debug window, only richtext control is unicode enabled

  14. #14

    Thread Starter
    Hyperactive Member Hassan Basri's Avatar
    Join Date
    Sep 2006
    Posts
    324

    Re: Loading a unicode text file

    Quote Originally Posted by westconn1 View Post
    my comment was supposed to indicate that you can work with the arabic and save it to another text file but not display it in standard vb6 controls or debug window, only richtext control is unicode enabled
    I am able to display Arabic script in VB6 controls under these conditions:

    1) The system is Middle East enabled (Regional and Language Options in Control Panel)

    2) I have set the Font.Charset of all the controls to 178 (Arabic)

    3) The strings are located in the resource file.

    My difficulty is when the string is located in a text file. I can read the text file, but when I display it in controls it doesn't work. I think I need to do some sort of conversion.

  15. #15
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Loading a unicode text file

    Quote Originally Posted by westconn1 View Post
    ... only richtext control is unicode enabled
    This seems to be a persistent rumor. RichTextBox is based on the old Riched32.dll, i.e. version 1.0 of Microsoft's Rich Edit control. It does not support Unicode, as can be easily seen by substituting one for the MS Forms 2.0 TextBox I used in the project I attached above.

    Overview of rich edit control versions

    About Rich Edit Controls

    Unicode text in a VB6 RichTextBox?

    I'd love to be proven wrong. Have a Demo?
    Last edited by dilettante; Sep 13th, 2009 at 11:14 PM.

  16. #16
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Loading a unicode text file

    there are 3rd party unicode controls available, but i have never needed to use any, search in this forum should return some results

    sorry for any misinformation about richtext box, i thought i had used it with some unicode testing, but must be mistaken

  17. #17

    Thread Starter
    Hyperactive Member Hassan Basri's Avatar
    Join Date
    Sep 2006
    Posts
    324

    Re: Loading a unicode text file

    I have tried using the following code to load the attached sample test file (text.txt), which includes some Arabic script. However, the data that comes out is garbage; if I use some Arabic script that is located in the resource file, it shows correctly. Is there a way to modify the code below so that the characters display correctly?

    Code:
    Dim fnum As Integer
    Dim data() As Byte
    Dim str As String
    
    fnum = FreeFile(0)
    Open FileName For Binary Access Read As #fnum
    ReDim data(LOF(fnum) - 1)
    Get #fnum, , data
    Close #fnum
    
    str = data
    
    MsgBox StrConv(str, vbUnicode)
    Attached Files

  18. #18
    Frenzied Member some1uk03's Avatar
    Join Date
    Jun 2006
    Location
    London, UK
    Posts
    1,675

    Re: Loading a unicode text file

    You need the Unicode-aware version of the message box to display Unicode characters.

    Declare Function MessageBoxIndirectW Lib "user32" (lpMsgBoxParams As MSGBOXPARAMSW) As Long

    Private Const MB_USERICON = &H80&

    Type MSGBOXPARAMSW
    cbSize As Long
    hwndOwner As Long
    hInstance As Long
    lpszText As Long
    lpszCaption As Long
    dwStyle As Long
    lpszIcon As Long
    dwContextHelpId As Long
    lpfnMsgBoxCallback As Long
    dwLanguageId As Long
    End Type
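
    A hedged usage sketch for the declaration above, written so it can be pasted into a form module on its own (hence the Private scope and the repeated Type). ShowUnicodeMessage is a hypothetical wrapper name. The string pointers come from StrPtr, so the text is handed to the API as UTF-16 without VB6's automatic ANSI conversion.

    ```vb
    Private Type MSGBOXPARAMSW
        cbSize As Long
        hwndOwner As Long
        hInstance As Long
        lpszText As Long
        lpszCaption As Long
        dwStyle As Long
        lpszIcon As Long
        dwContextHelpId As Long
        lpfnMsgBoxCallback As Long
        dwLanguageId As Long
    End Type

    Private Declare Function MessageBoxIndirectW Lib "user32" _
        (lpMsgBoxParams As MSGBOXPARAMSW) As Long

    ' Hypothetical wrapper: shows strText in a Unicode-aware message box.
    Private Sub ShowUnicodeMessage(ByVal strText As String, ByVal strCaption As String)
        Dim mbp As MSGBOXPARAMSW

        mbp.cbSize = Len(mbp)                  ' 40 bytes: ten Long members
        mbp.hwndOwner = 0                      ' no owner window
        mbp.lpszText = StrPtr(strText)         ' pointer to the UTF-16 text
        mbp.lpszCaption = StrPtr(strCaption)
        mbp.dwStyle = vbOKOnly Or vbInformation

        MessageBoxIndirectW mbp
    End Sub
    ```

    Whether the glyphs actually render still depends on the fonts installed on the machine.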



  19. #19

    Thread Starter
    Hyperactive Member Hassan Basri's Avatar
    Join Date
    Sep 2006
    Posts
    324

    Re: Loading a unicode text file

    Quote Originally Posted by some1uk03 View Post
    You need the Unicode-aware version of the message box to display Unicode characters.

    Declare Function MessageBoxIndirectW Lib "user32" (lpMsgBoxParams As MSGBOXPARAMSW) As Long

    Private Const MB_USERICON = &H80&

    Type MSGBOXPARAMSW
    cbSize As Long
    hwndOwner As Long
    hInstance As Long
    lpszText As Long
    lpszCaption As Long
    dwStyle As Long
    lpszIcon As Long
    dwContextHelpId As Long
    lpfnMsgBoxCallback As Long
    dwLanguageId As Long
    End Type
    Thanks for your reply. I was just using the message box as an example; however, your API call will be useful.

    The same garbage is also output on controls like frames, command buttons, option buttons, etc. These same controls display correctly when the source of the Unicode text is the resource file. The problem lies in loading from a Unicode text file.

  20. #20
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Loading a unicode text file

    I think I have a better picture of what you're doing now. The part about things working for a text resource seems a little odd, but perhaps that is a case where the program was built on a machine where a Middle Eastern system locale was set... or something.

    In any case, if we limit ourselves to one Charset and Locale it is fairly easy to spoof most regular VB6 ANSI controls into displaying text for a non-system locale. The process is a little weird, but basically you have to:
    • Get a Unicode String containing characters from the desired Locale.
    • Translate the String from Unicode to ANSI using the desired Locale ID value.
    • Translate the String from ANSI to Unicode using the default (system) Locale ID, creating "mystery meat" Unicode.
    • Set the control's Charset to the desired script (e.g. Arabic), which can be done through the Font Property dialog or in code.
    • Set the control's Text or Caption property to the "mystery meat" mangled Unicode.

    When the Text/Caption value is assigned, VB6 will perform an implicit Unicode to ANSI conversion using the system Locale ID value, undoing the mangling and converting the mystery meat to the desired ANSI codepage. Everything should display fine - aside from left-to-right vs. right-to-left ordering, another topic.
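
    The steps above can be sketched roughly as follows. This is not the code from the attached project; it is a minimal illustration that assumes the Arabic (Saudi Arabia) locale ID &H401 and the Arabic charset value 178 mentioned earlier in the thread. The Sub and constant names are made up for the example.

    ```vb
    Private Const LCID_ARABIC_SAUDI As Long = &H401   ' assumption: one of several Arabic LCIDs
    Private Const ARABIC_CHARSET As Integer = 178

    ' Hypothetical helper: shows Arabic text from a Unicode String in an ANSI TextBox.
    Private Sub ShowInAnsiTextBox(txt As TextBox, ByVal strUnicode As String)
        Dim strAnsi As String
        Dim strMangled As String

        ' Unicode -> ANSI bytes using the desired (Arabic) locale
        strAnsi = StrConv(strUnicode, vbFromUnicode, LCID_ARABIC_SAUDI)

        ' ANSI bytes -> "mystery meat" Unicode using the system locale
        strMangled = StrConv(strAnsi, vbUnicode)

        ' The control undoes the second step when the text is assigned
        txt.Font.Charset = ARABIC_CHARSET
        txt.Text = strMangled
    End Sub
    ```

    The round trip relies on the system code page mapping every byte value back to itself, which is worth testing on the target machines.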

    There are multiple Arabic Locales. I have put a few in my sample program as Consts, more can be found at Locale IDs Assigned by Microsoft.

    The attached project uses a regular intrinsic VB6 TextBox control. Note that this technique does not display Unicode characters, it converts a Unicode String with characters from one ANSI codepage into something that can be displayed using an ANSI control.
    Attached Images
    Attached Files
    Last edited by dilettante; Sep 14th, 2009 at 05:37 PM.

  21. #21

    Thread Starter
    Hyperactive Member Hassan Basri's Avatar
    Join Date
    Sep 2006
    Posts
    324

    Re: Loading a unicode text file

    Alhumdilah (Praise be to God) I have finally found the solution. The problem was the Unicode format: the trouble lies with the UTF-8 and UTF-16BE formats. Once I saved the file as UTF-16LE (in Notepad: Save As... Unicode) everything worked. Thanks for everybody's help.

    Here is the code for anybody who needs this in the future:

    Code:
    Private Function LoadLanguageFileUnicode(strPath As String) As Boolean
    
    Dim lngFileNum As Long
    Dim strResult() As String
    Dim bytResults() As Byte
    Dim iCounter As Long
    
    On Error GoTo ErrorHandler
    
    lngFileNum = FreeFile()
    Open strPath For Binary Access Read As #lngFileNum
    ReDim bytResults(LOF(lngFileNum) - 1)
    Get #lngFileNum, , bytResults
    Close #lngFileNum
    
    If bytResults(0) = 255 And bytResults(1) = 254 Then 'UTF16LE
        strResult = Split(Mid$(bytResults, 2), vbCrLf)
    ElseIf bytResults(0) = 254 And bytResults(1) = 255 Then 'UTF16BE Unicode format not supported
        LoadLanguageFileUnicode = False
        Exit Function
    ElseIf bytResults(0) = 239 And bytResults(1) = 187 And bytResults(2) = 191 Then 'UTF8 Unicode format not supported
        LoadLanguageFileUnicode = False
        Exit Function
    End If
    
    For iCounter = 0 To UBound(strResult)
        Label(iCounter).Caption = strResult(iCounter)
    Next iCounter
    
    LoadLanguageFileUnicode = True
    
    Exit Function
    ErrorHandler:
    LoadLanguageFileUnicode = False
    
    End Function
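
    For anyone who cannot re-save the file, the UTF-16BE branch could in principle be handled by swapping each byte pair before the assignment. The following is only a sketch under assumptions: Utf16BeToString is a hypothetical helper, and it expects a zero-based array that starts with the FE FF byte order mark.

    ```vb
    ' Hypothetical helper: converts UTF-16BE bytes (with BOM) to a VB6 String.
    Private Function Utf16BeToString(bytData() As Byte) As String
        Dim bytLE() As Byte
        Dim lngLast As Long
        Dim i As Long

        lngLast = UBound(bytData)
        ' Ignore a stray trailing byte so only whole byte pairs are processed
        If (lngLast And 1) = 0 Then lngLast = lngLast - 1
        If lngLast < 3 Then Exit Function      ' nothing after the BOM

        ReDim bytLE(0 To lngLast - 2)
        For i = 2 To lngLast - 1 Step 2        ' start after the 2-byte BOM
            bytLE(i - 2) = bytData(i + 1)      ' low byte first
            bytLE(i - 1) = bytData(i)          ' then high byte
        Next i

        Utf16BeToString = bytLE                ' raw copy into the String
    End Function
    ```

    In the function above, the UTF16BE branch could then read strResult = Split(Utf16BeToString(bytResults), vbCrLf) instead of returning False.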
