Thread: How to read line-wise any text file

  1. #1

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2010
    Posts
    844

    How to read line-wise any text file

    I am trying to read a text file line-wise into a string array (each line of the text file going into one member of the array).

    The name and/or path of the text file may be ANSI or may be unicode.
    The CONTENTS of the text file may also be ANSI or Unicode. In other words, I am trying to read, line by line, the contents of either an ANSI text file or a Unicode text file.

    Nothing that I do works.

    For example this code:
    Code:
       Dim New_c          As New cConstructor
       Dim FileContent    As String
       Dim LineList()     As String
    
       With New_c.fso.OpenFileStream(File_Name)
          .ReadToByteArr bytResults()
       End With
       
       'Convert to string
       If (bytResults(0) = &HFF) And (bytResults(1) = &HFE) Then
          FileContent = Mid$(bytResults, 2)
       Else
          FileContent = bytResults()                          
       End If
    
       LineList() = Split(FileContent, vbCrLf)
    When I read the contents of the text file using the above code and display them in an InkEdit textbox, it shows completely wrong output (strange characters) that looks nothing like the real contents of the text file.

    Also, when I step through the above code, I realize that for a unicode text file (a text file with its CONTENTS being unicode):
    Unlike what I expect, the first byte of the file is NOT &HFF
    And the second byte is NOT &HFE
    So, the Else part of that If statement kicks in.

    I really don't understand what's going on.

    How can I write a simple piece of code (using whatever technique) that reads, line by line, the contents of any text file (ANSI or Unicode file name and/or ANSI or Unicode content)?

    There should be a way of doing this, but I don't know why nothing works.

    Please help.
    Thanks.

  2. #2
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,892

    Re: How to read line-wise any text file

    You might have a BOM-less file. Maybe the files are UTF-8 instead of UTF-16 or UTF-32 or some other encoding.

    Can you post some example files that you are trying to read?

    Please note that it might take some time to get back to you though due to the holidays.

  3. #3
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,910

    Re: How to read line-wise any text file

    Ilia,

    If you give us a file as a sample, we can probably figure out what's going on much better.

    There are several considerations: ASCII, ANSI, Unicode (UTF-8, UTF-16, etc.), the BOM, line terminators (CRLF or just LF), etc.
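
    On the line-terminator point, a minimal sketch (assuming the file content is already decoded into a VB String; SplitLines is a hypothetical helper name, not something from the thread or from RC5) that splits on CRLF, lone LF, or lone CR alike:

    ```vb
    Option Explicit

    ' Hypothetical helper: splits already-decoded text into lines,
    ' regardless of whether the file used CRLF, LF or CR terminators.
    Public Function SplitLines(ByVal Text As String) As String()
      Text = Replace(Text, vbCrLf, vbLf)   ' collapse Windows terminators first
      Text = Replace(Text, vbCr, vbLf)     ' then any remaining lone CRs
      SplitLines = Split(Text, vbLf)       ' one array member per line
    End Function
    ```

    Note that a file ending with a terminator yields one trailing empty member, just as Split(FileContent, vbCrLf) does.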

    Also, I've got no idea what your cConstructor class is.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  4. #4
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,892

    Re: How to read line-wise any text file

    Quote Originally Posted by Elroy View Post
    Also, I've got no idea what your cConstructor class is.
    It's a vbRichClient5 global multi-use class factory for creating other RC5 class objects.

    Ilia - you don't need to instantiate it by declaring "Dim New_c As New cConstructor", so I recommend deleting that line.

  5. #5

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2010
    Posts
    844

    Re: How to read line-wise any text file

    Here is my code:
    Code:
    Private Function Read_TextFile_LineWise(ByVal Text_File_Name As String) As String()
       
       'Var
       Dim bytResults()       As Byte
       Dim FileContent        As String
    '   Dim New_c              As New cConstructor
       
       With New_c.fso.OpenFileStream(Text_File_Name)
          .ReadToByteArr bytResults()
       End With
       
       'Convert to string
       If (bytResults(0) = &HFF) And (bytResults(1) = &HFE) Then
          FileContent = Mid$(bytResults, 2)
       Else
          FileContent = bytResults()
          Read_TextFile_LineWise = Split(FileContent, vbCrLf)
       End If
       
       'Result
       Read_TextFile_LineWise = Split(FileContent, vbCrLf)
       
    End Function
    Called like this:
    Code:
    Private Sub Command113_Click()
       
       Dim Text_File_Name     As String
       Dim LineList()         As String
       Dim s                  As String
       Dim i                  As Long
       Dim L                  As Long
       
       s = ""
       
       Text_File_Name = "D:\Temp1\Unicode_TextFile.txt"
       Text_File_Name = "D:\Temp1\ANSI_TextFile.txt"
       
       LineList() = Read_TextFile_LineWise(Text_File_Name)
       L = UBound(LineList()) - LBound(LineList()) + 1
       
       For i = 0 To L - 1
          s = s & LineList(i) & vbCrLf
       Next i
       
       txtOutput.Text = s
    End Sub

    Here are the two files that I am using:
    http://www.mediafire.com/file/9qmqrs...em001.zip/file

    And this is a set of two screenprints for these two files as my code displays them in an InkEdit textbox
    https://i.imgur.com/wS8BclG.jpg

    Please advise.
    Thanks.

  6. #6
    Fanatic Member
    Join Date
    Jul 2007
    Location
    Essex, UK.
    Posts
    579

    Re: How to read line-wise any text file

    Ilia, I looked at both files with Notepad++ and they are both UTF-8, so you need to decode them first.

    Why not download Notepad++? It is free and very useful for this kind of thing.

    Merry Christmas.

  7. #7
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,454

    Re: How to read line-wise any text file

    The cFSO-Class has a built-in function for TextContent-Reading:
    New_c.FSO.ReadTextContent(...)

    Here is Example-Code showing how to use it with different TextFiles (ANSI, UTF8 with/without BOM + UTF16-LE and UTF16-BE):
    Code:
    Option Explicit
    
    Private Declare Function TextOutW Lib "gdi32" (ByVal hDC As Long, ByVal x As Long, ByVal y As Long, ByVal pS As Long, ByVal SLen As Long) As Long
    
    Private Sub Form_Load()
      AutoRedraw = True: FontName = "Arial"
      Dim FileName As String
      
      FileName = App.Path & "\ANSI.txt"
      PrintInfo FileName, New_c.FSO.ReadTextContent(FileName)
     
      FileName = App.Path & "\UTF8_without_BOM.txt"
      PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, False, CP_UTF8)
     
      FileName = App.Path & "\UTF8_with_BOM.txt"
      PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, False, CP_UTF8)
     
      FileName = App.Path & "\UTF16_LE.txt"
      PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, True)
     
      FileName = App.Path & "\UTF16_BE.txt"
      PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, True)
    End Sub
    
    Private Sub PrintInfo(FileName As String, ByVal S As String)
      Static FNam$, SLen As Long, FLen As Long, SOut$, yOffs As Long
      
      FNam = New_c.FSO.GetFileNameFromFullPath(FileName)
      TextOutW hDC, 0, yOffs, StrPtr(FNam), Len(FNam): yOffs = yOffs + 13
      
      SLen = Len(S)
      FLen = New_c.FSO.FileLen(FileName)
      
      SOut = "FLen: " & FLen & ", SLen: " & SLen & "  [" & S & "]"
      TextOutW hDC, 0, yOffs, StrPtr(SOut), Len(SOut): yOffs = yOffs + 20
    End Sub
    Here is the complete Test-Project (including the different TextFiles):
    TextReading.zip

    As for fast line-reading...
    There's also a specialized RC5-Class for that (cCSV)... but if your Files are < 1MB or so,
    the Split-function-based approach will work "well enough" - and is easier to implement.

    IMO you don't "experiment enough" with the offered Functions (behind cFSO - after looking them up in the VB6-ObjectExplorer or via Intellisense).


    HTH

    Olaf

  8. #8
    Fanatic Member
    Join Date
    Jul 2007
    Location
    Essex, UK.
    Posts
    579

    Re: How to read line-wise any text file

    Olaf, is VBRC able to open a text file and decode it even if you do not know what format it is in, just like Notepad++?

    Merry Christmas to you.

  9. #9
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,454

    Re: How to read line-wise any text file

    Quote Originally Posted by Steve Grant View Post
    Olaf, is VBRC able to open a text file and decode it even if you do not know what format it is in, just like Notepad++?
    It does not really have such a "Universal TextReader"-function built-in.
    (because there are TextFiles out there for which such a universal read-function will deliver the wrong result).

    One can only be sure for TextFiles which have a leading BOM... (or when you know, how they were produced).

    New_c.FSO.ReadTextContent(FileName) will ("kind of universally") read such Files correctly,
    (as long as a BOM is in place) - by leaving the two optional Extra-Params out of the Function-call.

    You can check this out by removing the two Extra-Params in the 5 calls of the example (only leaving the FileName-Param in place).
    Everything should be read out correctly with only one exception -> on this File:
    - UTF8_without_BOM.txt
    So, the "No BOM was found"-default-behaviour of cFSO.ReadTextContent is currently:
    "treat it as ANSI in the current locale" (to be compatible to VB6-behaviour).

    Other "universal functions" (Notepad++ among them, I assume) will probably treat the "No BOM was found"-case as a UTF8-encoded file by default.

    There are certain "heuristics" one could apply to the FileContent-Bytes of "No BOM Files", to determine if it is:
    - 16Bit WChar-content
    - if not - then one could try to look for UTF8-sequences (to "guess" at least that CodePage right)
    - and in case it is "8Bit" without UTF8-sequences, one could try to determine a "most likely ANSI-codepage" by other means
    ..(but that last part is hard, and will have a high error-probability)
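
    A minimal sketch of the first two steps, assuming the raw file bytes are already in a non-empty, zero-based Byte array (GuessEncoding is a hypothetical helper, not an RC5 function; it checks for a BOM and then for well-formed UTF8-sequences, and it skips the BOM-less 16Bit check entirely):

    ```vb
    Option Explicit

    ' Hypothetical helper: a best-effort guess, not a guarantee.
    ' B() is assumed to be a non-empty, zero-based array of raw file bytes.
    Public Function GuessEncoding(B() As Byte) As String
      Dim n As Long, i As Long, Trail As Long, SawMultiByte As Boolean
      n = UBound(B) + 1

      ' 1) A leading BOM settles the question.
      If n >= 3 Then
        If B(0) = &HEF And B(1) = &HBB And B(2) = &HBF Then GuessEncoding = "UTF-8 (BOM)": Exit Function
      End If
      If n >= 2 Then
        If B(0) = &HFF And B(1) = &HFE Then GuessEncoding = "UTF-16LE (BOM)": Exit Function
        If B(0) = &HFE And B(1) = &HFF Then GuessEncoding = "UTF-16BE (BOM)": Exit Function
      End If

      ' 2) No BOM: check whether every byte >= 128 belongs to a
      '    well-formed UTF-8 multi-byte sequence.
      For i = 0 To n - 1
        If Trail > 0 Then
          ' inside a sequence: the byte must look like 10xxxxxx
          If (B(i) And &HC0) <> &H80 Then GuessEncoding = "ANSI?": Exit Function
          Trail = Trail - 1
        ElseIf B(i) >= &H80 Then
          SawMultiByte = True
          If (B(i) And &HE0) = &HC0 Then
            Trail = 1            ' 110xxxxx starts a 2-byte sequence
          ElseIf (B(i) And &HF0) = &HE0 Then
            Trail = 2            ' 1110xxxx starts a 3-byte sequence
          ElseIf (B(i) And &HF8) = &HF0 Then
            Trail = 3            ' 11110xxx starts a 4-byte sequence
          Else
            GuessEncoding = "ANSI?": Exit Function
          End If
        End If
      Next i

      If Trail > 0 Then
        GuessEncoding = "ANSI?"            ' file ends in the middle of a sequence
      ElseIf SawMultiByte Then
        GuessEncoding = "UTF-8 (no BOM)"
      Else
        GuessEncoding = "ASCII"            ' only bytes below 128
      End If
    End Function
    ```

    The "ANSI?" result only says "8Bit, but not valid UTF8"; it does not tell you which ANSI codepage was used, and a short ANSI file can pass the UTF8-check by coincidence.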

    In my own Apps (and also at the place where I work) we follow the simple pattern of:
    Know your Sources (the guys, or environments who produced the File) -
    followed by: If they used anything else but UTF8-encoding, try to convince them to switch to it.

    Just for completeness, one can enforce the routine to "default to UTF8" in the "No-BOM-cases" as well,
    reading all Files correctly using this "general Param-Setting":
    Code:
    Private Sub Form_Load()
      AutoRedraw = True: FontName = "Arial"
      Dim FileName As String
      
      FileName = App.Path & "\ANSI.txt"
      PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
     
      FileName = App.Path & "\UTF8_without_BOM.txt"
      PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
     
      FileName = App.Path & "\UTF8_with_BOM.txt"
      PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
     
      FileName = App.Path & "\UTF16_LE.txt"
      PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
     
      FileName = App.Path & "\UTF16_BE.txt"
      PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    End Sub
    Quote Originally Posted by Steve Grant View Post
    Merry Christmas to you.
    Same to you (and all others)

    Olaf
    Last edited by Schmidt; Dec 25th, 2019 at 07:30 AM.

  10. #10

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2010
    Posts
    844

    Re: How to read line-wise any text file

    Quote Originally Posted by Schmidt View Post
    It does not really have such a "Universal TextReader"-function built-in.
    ......
    Just for completeness, one can enforce the routine to "default to UTF8" in the "No-BOM-cases" as well,
    reading all Files correctly using this "general Param-Setting":
    Code:
       New_c.FSO.ReadTextContent(FileName, , CP_UTF8)

    Thanks a lot for the great help and advice.

    In the last part of your post you are putting forth a genuinely universal TextReader.
    I just used this:
    Code:
       New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    for all kinds of text files, and it reads all of them correctly.

    So, why are you saying that it is not a universal text reader?
    Looks like it is!

    My point is: If I ALWAYS call this function like this:
    Code:
       New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    will there be a situation that it would result in wrong output?

    At least with the five different files that we discussed, it looks like it ALWAYS gives the correct output.
    Doesn't it?

    In other words if I write a general-purpose function like this:
    Code:
    Public Function Read_TextFile(File_Name As String) As String
       Read_TextFile = New_c.FSO.ReadTextContent(File_Name, , CP_UTF8)
    End Function
    and put it in a bas module that is used by many vbp projects, and let all of those projects call this function, and I totally forget about it, will you agree with this and endorse it?
    Or, do you advise against it?
    And why?

    By the way, Merry Christmas to everyone.

    Thanks
    Ilia

  11. #11

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2010
    Posts
    844

    Re: How to read line-wise any text file

    Any comments on this will be greatly appreciated.

    I just need to know whether the code that I proposed in my previous post (post #10), initially proposed by Schmidt in post #9:
    Code:
    Public Function Read_TextFile(File_Name As String) As String
       Read_TextFile = New_c.FSO.ReadTextContent(File_Name, , CP_UTF8)
    End Function
    is safe to be used as a universal text file reader or not?

    All my tests show that it can read any and every text file.
    But, I still need to know what other people think about it.

    If I put it in a bas module that is used by many vbp projects, and let all of those projects call this function, and I totally forget about it, will you agree with this and endorse it?
    Or, do you advise against it?
    And why?

    Thanks
    And special thanks to Schmidt.
    Ilia

  12. #12
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,454

    Re: How to read line-wise any text file

    Quote Originally Posted by IliaPreston View Post
    In the last part of your post you are putting forth a genuinely universal TextReader.
    I just used this:
    Code:
       New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    for all kinds of text files, and it reads all of them correctly.

    So, why are you saying that it is not a universal text reader?
    Looks like it is!
    On a system with an English locale, the "used Char-range, when operating in ANSI-mode" -
    is a subset of UTF8 (which below CharCode 128 is absolutely the same as "US-ASCII").

    Quote Originally Posted by IliaPreston View Post
    My point is: If I ALWAYS call this function like this:
    Code:
       New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    will there be a situation that it would result in wrong output?

    At least with the five different files that we discussed, it looks like it ALWAYS gives the correct output.
    Doesn't it?
    The function will work flawlessly, as long as all the input-files were stored using one of the "unicode-encodings"
    (when they come in either UTF8, or UTF16-LE or UTF16-BE).

    It will also work fine with any ANSI-file, which was written using an English locale
    (char-range of 0-127 - because then, the CP_UTF8-CodePage-Param will do no harm).

    The "ANSI-situation" is entirely different in Europe (e.g. when you get ANSI-files,
    which were created using a Danish, or Czech or Greek or Russian CodePage... heck,
    just a "simple German Umlaut" in your ANSI-file would come out garbled, when you decode it with the CP_UTF8 codepage-setting).

    So, in these "non-english" ANSI-cases, you will have to match the last optional "CP_Param" -
    exactly to the CodePage which was used at creation-time of the file -
    if you don't do that and leave it at CP_UTF8 - it will "scramble the content".

    That's what this optional CodePage-param is for (for all the "non-UTF8", "non-english-ANSI-cases").
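
    To see the Umlaut-problem without any files involved, here is a small sketch using the Win32-API MultiByteToWideChar directly (BytesToString and DemoUmlaut are hypothetical helper names; codepage 1252 is assumed as the "German ANSI" codepage; no RC5 is needed for this one):

    ```vb
    Option Explicit

    Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
        ByVal CodePage As Long, ByVal dwFlags As Long, _
        ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
        ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

    Private Const CP_UTF8 As Long = 65001

    ' Hypothetical helper: decodes raw bytes with an explicitly given codepage.
    Public Function BytesToString(B() As Byte, ByVal CodePage As Long) As String
      Dim nBytes As Long, nChars As Long, S As String
      nBytes = UBound(B) - LBound(B) + 1
      ' first call only asks how many WChars the result will need
      nChars = MultiByteToWideChar(CodePage, 0, VarPtr(B(LBound(B))), nBytes, 0, 0)
      If nChars = 0 Then Exit Function
      S = Space$(nChars)
      ' second call performs the actual conversion into the String-buffer
      MultiByteToWideChar CodePage, 0, VarPtr(B(LBound(B))), nBytes, StrPtr(S), nChars
      BytesToString = S
    End Function

    Private Sub DemoUmlaut()
      Dim B(0 To 2) As Byte, sAnsi As String, sUtf8 As String
      ' the three bytes a Windows-1252 editor stores for "f", "u-Umlaut", "r"
      B(0) = &H66: B(1) = &HFC: B(2) = &H72

      sAnsi = BytesToString(B, 1252)      ' decoded with the matching codepage
      sUtf8 = BytesToString(B, CP_UTF8)   ' decoded as UTF8 (&HFC is not valid UTF8)

      ' The two results are expected to differ: the Umlaut does not survive the UTF8-decoding.
      Debug.Print "Same result: "; (sAnsi = sUtf8)
    End Sub
    ```

    The single byte &HFC is a complete character in codepage 1252, but it can never appear in well-formed UTF8, so the decoder has nothing sensible to map it to.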

    Therefore I would not "hide" this optional Param behind a wrapper-function.
    (who knows, when you will need it)...

    But maybe you *are* absolutely sure, where your Source-Files "come from",
    "always" getting either Unicode- or ANSI-files from only English locales -
    in that scenario the Function really would work universally for such "restricted Input".

    HTH

    Olaf

  13. #13

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2010
    Posts
    844

    Re: How to read line-wise any text file

    Thanks for all the great help and advice.

    And, why is it that it is recommended that I should not declare New_c like this:
    Code:
    Dim New_c As New cConstructor
    What is the problem with declaring it as above?

    And also, why is it that it even works without declaration?
    When I remove the declaration, it still works. I don't understand why.
    There is Option Explicit, so it should force the declaration.
    Why is it that it doesn't?

    Thanks.
