-
Dec 24th, 2019, 06:16 PM
#1
Thread Starter
Fanatic Member
How to read line-wise any text file
I am trying to read a text file line-wise into a string array (each line of the text file going into one member of the array).
The name and/or path of the text file may be ANSI or may be unicode.
The CONTENTS of the text file also may be ANSI or may be unicode. In other words trying to line-wise read the contents of either an ANSI text file or a unicode text file.
Nothing that I do works.
For example this code:
Code:
Dim New_c As New cConstructor
Dim bytResults() As Byte
Dim FileContent As String
Dim LineList() As String
With New_c.fso.OpenFileStream(File_Name)
    .ReadToByteArr bytResults()
End With
'Convert to string
If (bytResults(0) = &HFF) And (bytResults(1) = &HFE) Then
    FileContent = Mid$(bytResults, 2)
Else
    FileContent = bytResults()
End If
LineList() = Split(FileContent, vbCrLf)
When I read the contents of the text file using the above code and display them in an InkEdit textbox, I get completely wrong output (strange characters) that looks nothing like the real contents of the text file.
Also, when I step through the above code with a unicode text file (a text file whose CONTENTS are unicode), I find that, contrary to what I expect, the first byte of the file is NOT &HFF and the second byte is NOT &HFE. So the Else part of that If statement kicks in.
I really don't understand what's going on.
How can I write a simple piece of code (using whatever technique) that would read line-wise the contents of any text file (ANSI or unicode file name and/or ANSI or unicode content)?
There should be a way of doing this, but I don't know why nothing works.
Please help.
Thanks.
-
Dec 24th, 2019, 06:52 PM
#2
Re: How to read line-wise any text file
You might have a BOM-less file. Maybe the files are UTF-8 instead of UTF-16 or UTF-32 or some other encoding.
Can you post some example files that you are trying to read?
Please note that it might take some time to get back to you though due to the holidays.
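For reference, checking for a BOM on the raw bytes could be sketched roughly as below. This is only an illustration: the names TextBomKind and DetectBom are made up for this sketch (they are not part of VB6 or vbRichClient5), and it assumes the byte array has already been filled from the file. It does not tell UTF-32 apart from UTF-16 (a UTF-32-LE BOM also starts with FF FE).
Code:
' Hypothetical helper type, not part of any library
Public Enum TextBomKind
    bomNone = 0
    bomUtf8 = 1
    bomUtf16LE = 2
    bomUtf16BE = 3
End Enum

' Returns which byte-order mark (if any) starts the buffer.
' Assumes Bytes() has been dimensioned (e.g. filled from a file).
Public Function DetectBom(Bytes() As Byte) As TextBomKind
    Dim First As Long, Count As Long
    First = LBound(Bytes)
    Count = UBound(Bytes) - First + 1
    If Count >= 3 Then
        ' UTF-8 BOM: EF BB BF
        If Bytes(First) = &HEF And Bytes(First + 1) = &HBB And Bytes(First + 2) = &HBF Then
            DetectBom = bomUtf8
            Exit Function
        End If
    End If
    If Count >= 2 Then
        If Bytes(First) = &HFF And Bytes(First + 1) = &HFE Then
            DetectBom = bomUtf16LE   ' FF FE
        ElseIf Bytes(First) = &HFE And Bytes(First + 1) = &HFF Then
            DetectBom = bomUtf16BE   ' FE FF
        End If
    End If
    ' Falls through with bomNone (0) when no BOM was recognised
End Function
A result of bomNone only means that no marker was found; the file could still be UTF-8, ANSI, or BOM-less UTF-16.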
-
Dec 24th, 2019, 07:07 PM
#3
Re: How to read line-wise any text file
Ilia,
If you give us a file as a sample, we can probably figure out what's going on much better.
There are several considerations: ASCII, ANSI, Unicode (UTF-8, UTF-16, etc), BOM marker, line terminators (CRLF or just LF), etc.
Also, I've got no idea what your cConstructor class is.
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
-
Dec 24th, 2019, 07:14 PM
#4
Re: How to read line-wise any text file
 Originally Posted by Elroy
Also, I've got no idea what your cConstructor class is.
It's a vbRichClient5 global multi-use class factory for creating other RC5 class objects.
Ilia - you don't need to instantiate it by declaring "Dim New_c As New cConstructor", so I recommend deleting that line.
-
Dec 24th, 2019, 09:45 PM
#5
Thread Starter
Fanatic Member
Re: How to read line-wise any text file
Here is my code:
Code:
Private Function Read_TextFile_LineWise(ByVal Text_File_Name As String) As String()
    'Var
    Dim bytResults() As Byte
    Dim FileContent As String
    ' Dim New_c As New cConstructor
    With New_c.fso.OpenFileStream(Text_File_Name)
        .ReadToByteArr bytResults()
    End With
    'Convert to string
    If (bytResults(0) = &HFF) And (bytResults(1) = &HFE) Then
        FileContent = Mid$(bytResults, 2)
    Else
        FileContent = bytResults()
        Read_TextFile_LineWise = Split(FileContent, vbCrLf)
    End If
    'Result
    Read_TextFile_LineWise = Split(FileContent, vbCrLf)
End Function
Called like this:
Code:
Private Sub Command113_Click()
    Dim Text_File_Name As String
    Dim LineList() As String
    Dim s As String
    Dim i As Long
    Dim L As Long
    s = ""
    Text_File_Name = "D:\Temp1\Unicode_TextFile.txt"
    Text_File_Name = "D:\Temp1\ANSI_TextFile.txt"
    LineList() = Read_TextFile_LineWise(Text_File_Name)
    L = UBound(LineList()) - LBound(LineList()) + 1
    For i = 0 To L - 1
        s = s & LineList(i) & vbCrLf
    Next i
    txtOutput.Text = s
End Sub
Here are the two files that I am using:
http://www.mediafire.com/file/9qmqrs...em001.zip/file
And this is a set of two screenprints for these two files as my code displays them in an InkEdit textbox
https://i.imgur.com/wS8BclG.jpg
Please advise.
Thanks.
-
Dec 25th, 2019, 04:04 AM
#6
Re: How to read line-wise any text file
Ilia, I looked at both files with Notepad++ and they are both UTF-8, so you need to decode them first.
Why not download Notepad++? It is free and very useful for this kind of thing.
Merry Christmas.
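In case it helps, decoding UTF-8 bytes into a VB6 String without any extra library could look roughly like the sketch below, which uses the Win32 MultiByteToWideChar API. The function name Utf8BytesToString and the constant name CODEPAGE_UTF8 are made up for this illustration, and it is meant to be pasted into a standard module on its own. It assumes the byte array has already been filled from the file.
Code:
Option Explicit

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Private Const CODEPAGE_UTF8 As Long = 65001   ' Windows code-page id for UTF-8

' Converts a buffer of UTF-8 bytes into a (UTF-16) VB6 String.
' Assumes Bytes() has been dimensioned and holds at least one byte.
Public Function Utf8BytesToString(Bytes() As Byte) As String
    Dim First As Long, ByteCount As Long, CharCount As Long
    First = LBound(Bytes)
    ByteCount = UBound(Bytes) - First + 1
    If ByteCount <= 0 Then Exit Function

    ' First call: ask how many UTF-16 code units the result needs
    CharCount = MultiByteToWideChar(CODEPAGE_UTF8, 0, VarPtr(Bytes(First)), ByteCount, 0, 0)
    If CharCount = 0 Then Exit Function

    ' Second call: convert straight into the pre-sized string buffer
    Utf8BytesToString = String$(CharCount, vbNullChar)
    MultiByteToWideChar CODEPAGE_UTF8, 0, VarPtr(Bytes(First)), ByteCount, _
                        StrPtr(Utf8BytesToString), CharCount
End Function
If the file starts with a UTF-8 BOM (EF BB BF), the result will begin with the character U+FEFF, which you would normally strip before splitting into lines.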
-
Dec 25th, 2019, 04:08 AM
#7
Re: How to read line-wise any text file
The cFSO-Class has a builtin function for TextContent-Reading:
New_c.FSO.ReadTextContent(...)
Here is Example-Code, how to use it with different TextFiles (ANSI, UTF8 with/without BOM + UTF16-LE and UTF16-BE):
Code:
Option Explicit
Private Declare Function TextOutW Lib "gdi32" (ByVal hDC As Long, ByVal x As Long, ByVal y As Long, _
    ByVal pS As Long, ByVal SLen As Long) As Long

Private Sub Form_Load()
    AutoRedraw = True: FontName = "Arial"
    Dim FileName As String
    FileName = App.Path & "\ANSI.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName)
    FileName = App.Path & "\UTF8_without_BOM.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, False, CP_UTF8)
    FileName = App.Path & "\UTF8_with_BOM.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, False, CP_UTF8)
    FileName = App.Path & "\UTF16_LE.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, True)
    FileName = App.Path & "\UTF16_BE.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, True)
End Sub

Private Sub PrintInfo(FileName As String, ByVal S As String)
    Static FNam$, SLen As Long, FLen As Long, SOut$, yOffs As Long
    FNam = New_c.FSO.GetFileNameFromFullPath(FileName)
    TextOutW hDC, 0, yOffs, StrPtr(FNam), Len(FNam): yOffs = yOffs + 13
    SLen = Len(S)
    FLen = New_c.FSO.FileLen(FileName)
    SOut = "FLen: " & FLen & ", SLen: " & SLen & " [" & S & "]"
    TextOutW hDC, 0, yOffs, StrPtr(SOut), Len(SOut): yOffs = yOffs + 20
End Sub
Here the complete Test-Project (including the different TextFiles):
TextReading.zip
As for fast line-reading...
There's also a specialized RC5-Class for that (cCSV)... but if your Files are < 1MB or so,
the Split-function-based approach will work "well enough" - and is easier to implement.
IMO you don't "experiment enough" with the offered Functions (behind cFSO - after looking them up in the VB6-ObjectExplorer or via Intellisense).
HTH
Olaf
-
Dec 25th, 2019, 04:26 AM
#8
Re: How to read line-wise any text file
Olaf, is VBRC able to open a text file and decode it even if you do not know what format it is in, just like Notepad++?
Merry Christmas to you.
-
Dec 25th, 2019, 07:25 AM
#9
Re: How to read line-wise any text file
 Originally Posted by Steve Grant
Olaf, is VBRC able to open a text file and decode it even if you do not know what format it is in, just like Notepad++?
It does not really have such a "Universal TextReader"-function built-in
(because there are TextFiles out there for which such a universal read-function will deliver the wrong result).
One can only be sure for TextFiles which have a leading BOM... (or when you know how they were produced).
New_c.FSO.ReadTextContent(FileName) will ("kind of universally") read such Files correctly,
(as long as a BOM is in place) - by leaving the two optional Extra-Params out of the Function-call.
You can check this out by removing the two Extra-Params in the 5 calls of the example (only leaving the FileName-Param in place).
Everything should be read out correctly with only one exception -> on this File:
- UTF8_without_BOM.txt
So, the "No BOM was found"-default-behaviour of cFSO.ReadTextContent is currently:
"treat it as ANSI in the current locale" (to be compatible to VB6-behaviour).
Other "universal functions" (as I assume NotePad++) will probably treat the "No BOM was found"-case as an UTF8-encoded file by default.
There are certain "heuristics" one could apply to the FileContent-Bytes of "No BOM Files", to determine if it is:
- 16Bit WChar-content
- if not - then one could try to look for UTF8-sequences (to "guess" at least that CodePage right)
- and in case it is "8Bit" without UTF8-sequences, one could try to determine a "most likely ANSI-codepage" by other means
..(but that last part is hard, and will have a high error-probability)
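The first two steps of such a heuristic could be sketched roughly as follows. This is not what cFSO does internally, only an illustration: the names GuessedEncoding and GuessEncodingNoBom are made up, the "any zero byte means 16-bit content" test is a deliberately crude assumption, and the UTF-8 step relies on the Win32 MultiByteToWideChar API rejecting invalid sequences when MB_ERR_INVALID_CHARS is passed. The hard third step (guessing the most likely ANSI codepage) is left out. The sketch is meant for a standard module of its own.
Code:
Option Explicit

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Private Const CODEPAGE_UTF8 As Long = 65001
Private Const MB_ERR_INVALID_CHARS As Long = &H8

' Hypothetical result type for this sketch
Public Enum GuessedEncoding
    guessAnsi = 0    ' 8-bit content that is not valid UTF-8
    guessUtf8 = 1    ' valid UTF-8 (this includes plain 7-bit ASCII)
    guessUtf16 = 2   ' contains zero bytes, so probably 16-bit WChars
End Enum

' Guesses the encoding of a BOM-less buffer.
' Assumes Bytes() has been dimensioned and holds at least one byte.
Public Function GuessEncodingNoBom(Bytes() As Byte) As GuessedEncoding
    Dim i As Long, First As Long, ByteCount As Long
    First = LBound(Bytes)
    ByteCount = UBound(Bytes) - First + 1

    ' Step 1: zero bytes are very unusual in 8-bit text files,
    ' but common in UTF-16 (every ASCII char carries one)
    For i = First To UBound(Bytes)
        If Bytes(i) = 0 Then
            GuessEncodingNoBom = guessUtf16
            Exit Function
        End If
    Next i

    ' Step 2: strict UTF-8 validation; the API returns 0 on invalid sequences
    If MultiByteToWideChar(CODEPAGE_UTF8, MB_ERR_INVALID_CHARS, _
                           VarPtr(Bytes(First)), ByteCount, 0, 0) > 0 Then
        GuessEncodingNoBom = guessUtf8
    Else
        GuessEncodingNoBom = guessAnsi
    End If
End Function
A guessAnsi result still says nothing about which ANSI codepage was used, and UTF-16 text without any ASCII-range characters would slip past step 1, so this remains a guess and not a guarantee.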
In my own Apps (and also at the place where I work) we follow the simple pattern of:
Know your Sources (the guys, or environments who produced the File) -
followed by: If they used anything else but UTF8-encoding, try to convince them to switch to it.
Just for completeness, one can enforce the routine to "default to UTF8" in the "No-BOM-cases" as well,
reading all Files correctly using this "general Param-Setting":
Code:
Private Sub Form_Load()
    AutoRedraw = True: FontName = "Arial"
    Dim FileName As String
    FileName = App.Path & "\ANSI.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    FileName = App.Path & "\UTF8_without_BOM.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    FileName = App.Path & "\UTF8_with_BOM.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    FileName = App.Path & "\UTF16_LE.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    FileName = App.Path & "\UTF16_BE.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
End Sub
 Originally Posted by Steve Grant
Merry Christmas to you.
Same to you (and all others) 
Olaf
Last edited by Schmidt; Dec 25th, 2019 at 07:30 AM.
-
Dec 25th, 2019, 08:44 PM
#10
Thread Starter
Fanatic Member
Re: How to read line-wise any text file
 Originally Posted by Schmidt
It does not really have such a "Universal TextReader"-function built-in.
......
Just for completeness, one can enforce the routine to "default to UTF8" in the "No-BOM-cases" as well,
reading all Files correctly using this "general Param-Setting":
Code:
Private Sub Form_Load()
    AutoRedraw = True: FontName = "Arial"
    Dim FileName As String
    FileName = App.Path & "\ANSI.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    FileName = App.Path & "\UTF8_without_BOM.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    FileName = App.Path & "\UTF8_with_BOM.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    FileName = App.Path & "\UTF16_LE.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
    FileName = App.Path & "\UTF16_BE.txt"
    PrintInfo FileName, New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
End Sub
Same to you (and all others)
Olaf
Thanks a lot for the great help and advice.
In the last part of your post you are putting forth a genuinely universal TextReader.
I just used this:
Code:
New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
for all kinds of text files, and it reads all of them correctly.
So, why are you saying that it is not a universal text reader?
Looks like it is!
My point is: If I ALWAYS call this function like this:
Code:
New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
will there be a situation that it would result in wrong output?
At least with the five different files that we discussed, it looks like it ALWAYS gives the correct output.
Doesn't it?
In other words if I write a general-purpose function like this:
Code:
Public Function Read_TextFile(File_Name As String) As String
    Read_TextFile = New_c.FSO.ReadTextContent(File_Name, , CP_UTF8)
End Function
and put it in a bas module that is used by many vbp projects, and let all of those projects call this function, and I totally forget about it, will you agree with this and endorse it?
Or, do you advise against it?
And why?
By the way, Merry Christmas to everyone.
Thanks
Ilia
-
Dec 26th, 2019, 07:51 PM
#11
Thread Starter
Fanatic Member
Re: How to read line-wise any text file
Any comments on this will be greatly appreciated.
I just need to know if the code that I proposed in my previous post (post #10) (initially proposed by Schmidt in post #9):
Code:
Public Function Read_TextFile(File_Name As String) As String
    Read_TextFile = New_c.FSO.ReadTextContent(File_Name, , CP_UTF8)
End Function
is safe to be used as a universal text file reader or not?
All my tests show that it can read any and every text file.
But, I still need to know what other people think about it.
If I put it in a bas module that is used by many vbp projects, and let all of those projects call this function, and I totally forget about it, will you agree with this and endorse it?
Or, do you advise against it?
And why?
Thanks
And special thanks to Schmidt.
Ilia
-
Dec 26th, 2019, 07:57 PM
#12
Re: How to read line-wise any text file
 Originally Posted by IliaPreston
In the last part of your post you are putting forth a genuinely universal TextReader.
I just used this:
Code:
New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
for all kinds of text files, and it reads all of them correctly.
So, why are you saying that it is not a universal text reader?
Looks like it is!
On a system with an English locale, the "used Char-range, when operating in ANSI-mode" -
is usually a subset of UTF8 (which below CharCode 128 is absolutely the same as "US-ASCII"),
which is why your ANSI test file came out correctly.
 Originally Posted by IliaPreston
My point is: If I ALWAYS call this function like this:
Code:
New_c.FSO.ReadTextContent(FileName, , CP_UTF8)
will there be a situation that it would result in wrong output?
At least with the five different files that we discussed, it looks like it ALWAYS gives the correct output.
Doesn't it?
The function will work flawlessly, as long as all the input-files were stored using one of the "unicode-encodings"
(when they come in either UTF8, or UTF16-LE or UTF16BE).
It will also work fine with any ANSI-file, which was written using an english locale
(char-range of 0-127 - because then, the CP_UTF8-CodePage-Param will do no harm).
The "ANSI-situation" is entirely different in Europe (e.g. when you get ANSI-files,
which were created using a danish, or czech or greek or russian CodePage... heck,
just a "simple german Umlaut" in your ANSI-file would come out garbled, when you decode it with the CP_UTF8 codepage-setting).
So, in these "non-english" ANSI-cases, you will have to match the last optional "CP_Param" -
exactly to the CodePage which was used at creation-time of the file -
if you don't do that and leave it at CP_UTF8 - it will "scramble the content".
That's what this optional CodePage-param is for (for all the "non-UTF8", "non-english-ANSI-cases").
Therefore I would not "hide" this optional Param behind a wrapper-function.
(who knows, when you will need it)...
But maybe you *are* absolutely sure, where your Source-Files "come from",
"always" getting either Unicode- or ANSI-files from only english locales -
in that scenario the Function really would work universally for such "restricted Input".
HTH
Olaf
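To illustrate the point about not hiding the parameter, a wrapper that keeps the codepage adjustable (and also does the line-splitting asked about in post #1) could be sketched like this. The function name ReadTextLines is made up for this illustration; the ReadTextContent call follows the form used earlier in this thread, and it is an assumption here that the last parameter accepts a plain Long codepage identifier (65001 is the Windows identifier for UTF-8).
Code:
' Reads a text file into an array of lines.
' CodePage defaults to UTF-8, but stays overridable for non-UTF8 ANSI files.
Public Function ReadTextLines(ByVal FileName As String, _
                              Optional ByVal CodePage As Long = 65001) As String()
    Dim Content As String
    ' Assumption: the third parameter takes a Windows codepage id
    Content = New_c.FSO.ReadTextContent(FileName, , CodePage)

    ' Normalise CRLF and lone CR to LF, so files with either
    ' line terminator split the same way
    Content = Replace(Content, vbCrLf, vbLf)
    Content = Replace(Content, vbCr, vbLf)

    ReadTextLines = Split(Content, vbLf)
End Function
A caller who knows that a file was written with, say, the Greek Windows codepage could then pass it explicitly, e.g. ReadTextLines(SomeFileName, 1253), while everyone else just calls ReadTextLines(SomeFileName). Note that a file ending with a line terminator yields one empty trailing element.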
-
Dec 28th, 2019, 08:21 PM
#13
Thread Starter
Fanatic Member
Re: How to read line-wise any text file
Thanks for all the great help and advice.
And, why is it that it is recommended that I should not declare New_c like this:
Code:
Dim New_c As New cConstructor
What is the problem with declaring it as above?
And also, why is it that it even works without declaration?
When I remove the declaration, it still works. I don't understand why.
There is Option Explicit, so it should force the declaration.
Why is it that it doesn't?
Thanks.