-
May 19th, 2024, 06:05 PM
#1
Thread Starter
Fanatic Member
Found a way to test read for 4 unicode file types, and then read content to a string
A little rough looking, but it works: it reads a Unicode file into a string quickly, and the characters look right.
I send the file path name to a sub, it works on it using streamreader and reports back the file encoding.
Based on that, I open the file contents into a string.
I tested by using Notepad++ to convert file encodings. Just be sure to save the file in Notepad++ after you change the encoding.
So far it is working for me with UTF-8, UTF-16 (big and little endian), and UTF-32.
Code:
'all MARC 21 files are UTF8
tester(FilenameToBreak, Fileunicodetype)
Dim content As String
If Fileunicodetype = "Unicode (UTF-8)" Or Fileunicodetype = "EncodingUnknown" Then content = IO.File.ReadAllText(FilenameToBreak, System.Text.Encoding.UTF8) 'utf8
If Fileunicodetype = "Unicode" Then content = IO.File.ReadAllText(FilenameToBreak, System.Text.Encoding.Unicode) 'utf16 little endian (Encoding.Default is not UTF-16)
If Fileunicodetype = "Unicode (Big-Endian)" Then content = IO.File.ReadAllText(FilenameToBreak, System.Text.Encoding.BigEndianUnicode) 'utf16 big endian
If Fileunicodetype = "Unicode (UTF-32)" Then content = IO.File.ReadAllText(FilenameToBreak, System.Text.Encoding.UTF32) 'utf32
Code:
Public Sub tester(ByRef FileName As String, ByRef FileUnicodetype As String)
'will tell you the file unicode encoding
'********************************************************************
Try
Dim sr As New StreamReader(FileName, True) 'True = detect the encoding from byte order marks
Dim Countchars As Integer
Do While sr.Peek() >= 0
'Debug.Write(Convert.ToChar(sr.Read()))
Countchars += 1
If Countchars > 10 Then Exit Do
Loop
Debug.WriteLine(" ")
'Test for the encoding after reading, or at least
'after the first read.
' Debug.Print("The encoding used was {0}.", sr.CurrentEncoding)
FileUnicodetype = sr.CurrentEncoding.EncodingName 'CurrentEncoding
sr.Close()
Catch e As Exception
'Debug.Print("The process failed: {0}", e.ToString())
FileUnicodetype = "EncodingUnknown"
End Try
End Sub
-
May 19th, 2024, 07:15 PM
#2
Re: Found a way to test read for 4 unicode file types, and then read content to a str
https://learn.microsoft.com/en-us/do...r?view=net-8.0 is an overload of StreamReader that will automatically detect the correct encoding to use, no need to do all the work yourself.
Just create the StreamReader and then call ReadToEnd to get the contents as a string.
Last edited by PlausiblyDamp; May 19th, 2024 at 07:18 PM.
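A minimal sketch of that suggestion, for anyone who prefers a complete example. It assumes a console project; the helper name ReadTextAutoDetect and the temp-file round trip are made up for illustration and are not part of .NET. It writes the same text in four encodings (each with a byte order mark), then reads each file back without naming the encoding.

```vbnet
Imports System.IO
Imports System.Text

Module ReadWithDetection
    ' Hypothetical helper: reads a whole text file and reports which
    ' encoding the StreamReader settled on.
    Function ReadTextAutoDetect(filePath As String, ByRef detected As Encoding) As String
        ' True = detectEncodingFromByteOrderMarks.
        ' When no byte order mark is found, this constructor assumes UTF-8.
        Using sr As New StreamReader(filePath, True)
            Dim content As String = sr.ReadToEnd()
            ' CurrentEncoding is only reliable after the first read.
            detected = sr.CurrentEncoding
            Return content
        End Using
    End Function

    Sub Main()
        ' All four of these encodings emit a byte order mark when written.
        Dim encodings As Encoding() = {New UTF8Encoding(True), Encoding.Unicode,
                                       Encoding.BigEndianUnicode, Encoding.UTF32}
        For Each enc As Encoding In encodings
            Dim filePath As String = Path.GetTempFileName()
            File.WriteAllText(filePath, "héllo wörld", enc)

            Dim detected As Encoding = Nothing
            Dim content As String = ReadTextAutoDetect(filePath, detected)
            Console.WriteLine(detected.EncodingName & ": " & content)

            File.Delete(filePath)
        Next
    End Sub
End Module
```

The Using block closes the reader even if ReadToEnd throws, which a bare sr.Close() would not.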
-
May 19th, 2024, 10:41 PM
#3
Re: Found a way to test read for 4 unicode file types, and then read content to a str
Thread moved to VB.NET CodeBank forum, which is where you should post when sharing a working code snippet rather than asking a question.
-
May 20th, 2024, 03:36 AM
#4
Thread Starter
Fanatic Member
Re: Found a way to test read for 4 unicode file types, and then read content to a str
Originally Posted by PlausiblyDamp
https://learn.microsoft.com/en-us/do...r?view=net-8.0 is an overload of StreamReader that will automatically detect the correct encoding to use, no need to do all the work yourself.
Just create the StreamReader and then call ReadToEnd to get the contents as a string.
I have seen that read to end.
I thought it would read the entire file?
Which I thought is a waste of time?
Can read to end also load it into a string?
-
May 20th, 2024, 04:15 AM
#5
Re: Found a way to test read for 4 unicode file types, and then read content to a str
Originally Posted by sdowney1
I have seen that read to end.
I thought it would read the entire file?
Which I thought is a waste of time?
It seems to me that you are saying that you want to read a little bit of the file to get the encoding and then use that encoding to read the whole file. The point is that creating the StreamReader and reading the file the way PD suggested allows you to read the whole file into a String without specifying the encoding, because the StreamReader will work it out for itself. Presumably it does internally something like what you're doing externally.
Originally Posted by sdowney1
Can read to end also load it into a string?
When you read the documentation for the ReadToEnd method, surely you saw that it returns a String.
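To illustrate the "does internally something like what you're doing externally" point, here is a rough sketch of checking the byte order mark by hand. It is not the actual StreamReader source, only the same general idea; EncodingFromBom is a hypothetical helper name.

```vbnet
Imports System.IO
Imports System.Text

Module BomSniffer
    ' Hypothetical helper: returns the encoding implied by a leading
    ' byte order mark, or Nothing when no known mark is present.
    Function EncodingFromBom(filePath As String) As Encoding
        Dim head(3) As Byte   ' four bytes, indices 0 to 3
        Dim n As Integer
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            n = fs.Read(head, 0, head.Length)
        End Using

        ' UTF-32 LE must be tested before UTF-16 LE: both start with FF FE.
        If n >= 4 AndAlso head(0) = &HFF AndAlso head(1) = &HFE AndAlso
           head(2) = 0 AndAlso head(3) = 0 Then Return Encoding.UTF32
        If n >= 4 AndAlso head(0) = 0 AndAlso head(1) = 0 AndAlso
           head(2) = &HFE AndAlso head(3) = &HFF Then Return New UTF32Encoding(True, True)
        If n >= 3 AndAlso head(0) = &HEF AndAlso head(1) = &HBB AndAlso
           head(2) = &HBF Then Return Encoding.UTF8
        If n >= 2 AndAlso head(0) = &HFF AndAlso head(1) = &HFE Then Return Encoding.Unicode
        If n >= 2 AndAlso head(0) = &HFE AndAlso head(1) = &HFF Then Return Encoding.BigEndianUnicode
        Return Nothing
    End Function

    Sub Main()
        Dim filePath As String = Path.GetTempFileName()
        File.WriteAllText(filePath, "sample", Encoding.BigEndianUnicode)

        Dim enc As Encoding = EncodingFromBom(filePath)
        Console.WriteLine(If(enc Is Nothing, "no BOM found", enc.EncodingName))

        File.Delete(filePath)
    End Sub
End Module
```

In practice there is little reason to do this yourself, since the StreamReader constructor with detection enabled already covers these cases.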
-
May 20th, 2024, 04:20 AM
#6
Thread Starter
Fanatic Member
Re: Found a way to test read for 4 unicode file types, and then read content to a str
Ok, will look into it some more.
One of the assumptions I made is someone opening some kind of textfile from the open dialog. But I suppose somehow someone could feed an exe or some other wrong file into it. Will have to test and see what that does.
-
May 20th, 2024, 04:28 AM
#7
Thread Starter
Fanatic Member
Re: Found a way to test read for 4 unicode file types, and then read content to a str
That site is where I got the sub I used.
I actually read that page before, and it never connected with me that it can read a file into a string.
It looked like information overload, so I pulled out just the pieces I needed to make something work.
I learn better by seeing more completed code examples. Some people are better at making connections as they are experienced.
-
May 20th, 2024, 04:54 AM
#8
Thread Starter
Fanatic Member
Re: Found a way to test read for 4 unicode file types, and then read content to a str
I just tried this and it works on all 4 unicode types using Notepad++ to convert the files.
Code:
Dim sr As New StreamReader(FilenameToBreak, True)
content = sr.ReadToEnd()
sr.Close()
I will say that doing all the other coding helped me learn about the various Unicode types and StreamReader functions.
I wonder, will that cause an exception if I try to read a file that is not text?
-
May 20th, 2024, 07:32 AM
#9
Re: Found a way to test read for 4 unicode file types, and then read content to a str
Originally Posted by sdowney1
I wonder, will that cause an exception if I try to read a file that is not text?
All files are just bytes, so no exception is thrown. The StreamReader finds no byte order mark, falls back to its default encoding (UTF-8 for that constructor), interprets the bytes as text and produces gibberish, just as happens when you open a binary file in a text editor.
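If you want to catch that case rather than display gibberish, one possible approach is a rough check on the decoded string. This is only a heuristic sketch, not a reliable binary detector: LooksBinary is a hypothetical helper, and the byte values written to the temp file are arbitrary sample data.

```vbnet
Imports System.IO
Imports System.Text

Module BinaryCheck
    ' Hypothetical heuristic: treat the content as "probably not text" when
    ' decoding produced NUL characters or U+FFFD replacement characters.
    Function LooksBinary(content As String) As Boolean
        Return content.IndexOf(ChrW(0)) >= 0 OrElse content.IndexOf(ChrW(&HFFFD)) >= 0
    End Function

    Sub Main()
        Dim filePath As String = Path.GetTempFileName()
        ' Sample bytes with no byte order mark that are not valid UTF-8 text.
        File.WriteAllBytes(filePath, New Byte() {&H4D, &H5A, &H90, &H0, &H3, &HFF})

        Dim content As String
        Using sr As New StreamReader(filePath, True)
            ' No exception here: invalid bytes are replaced, not rejected.
            content = sr.ReadToEnd()
        End Using

        Console.WriteLine("Looks binary: " & LooksBinary(content))
        File.Delete(filePath)
    End Sub
End Module
```

A legitimate text file can contain U+FFFD, and a binary file can happen to decode cleanly, so treat the result as a hint only.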