Thread: Found a way to test read for 4 unicode file types, and then read content to a string

  1. #1

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2024
    Posts
    733

    Found a way to test read for 4 unicode file types, and then read content to a string

    A little rough looking, but it works, it is fast at reading a Unicode file into a string, and the characters look right.

    I send the file path to a sub, which opens the file with a StreamReader and reports back the file encoding.
    Based on that, I read the file contents into a string.

    I tested using Notepad++ to convert file encodings; just be sure to save the file in Notepad++ after you change the encoding.

    Working with UTF-8, UTF-16 (big and little endian), and UTF-32 so far for me.

    Code:
       'all MARC 21 files are UTF8
       tester(FilenameToBreak, Fileunicodetype)
       Dim content As String = Nothing
       'the strings compared below are the Encoding.EncodingName values reported by tester
       If Fileunicodetype = "Unicode (UTF-8)" Or Fileunicodetype = "EncodingUnknown" Then content = IO.File.ReadAllText(FilenameToBreak, System.Text.Encoding.UTF8) 'utf8
       If Fileunicodetype = "Unicode" Then content = IO.File.ReadAllText(FilenameToBreak, System.Text.Encoding.Unicode) 'utf16 little endian (Encoding.Default is not UTF-16)
       If Fileunicodetype = "Unicode (Big-Endian)" Then content = IO.File.ReadAllText(FilenameToBreak, System.Text.Encoding.BigEndianUnicode) 'utf16 big endian
       If Fileunicodetype = "Unicode (UTF-32)" Then content = IO.File.ReadAllText(FilenameToBreak, System.Text.Encoding.UTF32) 'utf32 little endian

    Code:
    'needs Imports System.IO at the top of the file
    Public Sub tester(ByRef FileName As String, ByRef FileUnicodetype As String)
        'will tell you the file unicode encoding
        Try
            'True = detect the encoding from the byte order mark (BOM)
            Using sr As New StreamReader(FileName, True)
                'CurrentEncoding is only reliable after the first read,
                'so read one character before asking for it.
                sr.Read()
                ' Debug.Print("The encoding used was {0}.", sr.CurrentEncoding)
                FileUnicodetype = sr.CurrentEncoding.EncodingName
            End Using
        Catch e As Exception
            'Debug.Print("The process failed: {0}", e.ToString())
            FileUnicodetype = "EncodingUnknown"
        End Try
    End Sub

  2. #2
    PowerPoster PlausiblyDamp's Avatar
    Join Date
    Dec 2016
    Location
    Pontypool, Wales
    Posts
    2,596

    Re: Found a way to test read for 4 unicode file types, and then read content to a str

    https://learn.microsoft.com/en-us/do...r?view=net-8.0 is an overload of StreamReader that will automatically detect the correct encoding to use, no need to do all the work yourself.

    Just create the StreamReader and then call ReadToEnd to get the contents as a string.
    Last edited by PlausiblyDamp; May 19th, 2024 at 07:18 PM.

  3. #3
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    110,560

    Re: Found a way to test read for 4 unicode file types, and then read content to a str

    Thread moved to VB.NET CodeBank forum, which is where you should post when sharing a working code snippet rather than asking a question.

  4. #4

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2024
    Posts
    733

    Re: Found a way to test read for 4 unicode file types, and then read content to a str

    Quote Originally Posted by PlausiblyDamp View Post
    https://learn.microsoft.com/en-us/do...r?view=net-8.0 is an overload of StreamReader that will automatically detect the correct encoding to use, no need to do all the work yourself.

    Just create the StreamReader and then call ReadToEnd to get the contents as a string.
    I have seen that read to end.
    I thought it would read the entire file?
    Which I thought is a waste of time?

    Can read to end also load it into a string?

  5. #5
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    110,560

    Re: Found a way to test read for 4 unicode file types, and then read content to a str

    Quote Originally Posted by sdowney1 View Post
    I have seen that read to end.
    I thought it would read the entire file?
    Which I thought is a waste of time?
    It seems to me that you are saying that you want to read a little bit of the file to get the encoding and then use that encoding to read the whole file. The point is that creating the StreamReader and reading the file the way PD suggested allows you to read the whole file into a String without specifying the encoding, because the StreamReader will work it out for itself. Presumably it does internally something like what you're doing externally.
    Quote Originally Posted by sdowney1 View Post
    Can read to end also load it into a string?
    When you read the documentation for the ReadToEnd method, surely you saw that it returns a String.
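
    As a rough illustration of what that internal detection might look like, here is a sketch that compares the first bytes of a file against each encoding's byte order mark, using Encoding.GetPreamble. This is an assumption about the general technique, not StreamReader's actual source; SniffBom and BomSniffDemo are hypothetical names made up for the example.

```vbnet
Imports System
Imports System.Linq
Imports System.Text

Module BomSniffDemo
    ' Hypothetical helper (not StreamReader's real code):
    ' returns the encoding whose BOM matches the leading bytes, or Nothing.
    Function SniffBom(head As Byte()) As Encoding
        ' UTF-32 LE is tested before UTF-16 LE because its BOM (FF FE 00 00)
        ' begins with the UTF-16 LE BOM (FF FE).
        Dim candidates As Encoding() = {
            New UTF32Encoding(False, True),
            New UTF32Encoding(True, True),
            Encoding.UTF8,
            Encoding.Unicode,
            Encoding.BigEndianUnicode}

        For Each enc As Encoding In candidates
            Dim bom As Byte() = enc.GetPreamble()
            If head.Length >= bom.Length AndAlso head.Take(bom.Length).SequenceEqual(bom) Then
                Return enc
            End If
        Next
        Return Nothing ' no BOM found, the caller has to pick a default
    End Function

    Sub Main()
        ' FF FE followed by "A" encoded as UTF-16 little endian
        Dim head As Byte() = New Byte() {&HFF, &HFE, &H41, &H0}
        Dim found As Encoding = SniffBom(head)
        Console.WriteLine(If(found Is Nothing, "no BOM", found.EncodingName))
    End Sub
End Module
```

    A file saved without a BOM makes SniffBom return Nothing, which is the case where a reader has to fall back to some default encoding.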

  6. #6

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2024
    Posts
    733

    Re: Found a way to test read for 4 unicode file types, and then read content to a str

    Ok, will look into it some more.
    One of the assumptions I made is someone opening some kind of textfile from the open dialog. But I suppose somehow someone could feed an exe or some other wrong file into it. Will have to test and see what that does.

  7. #7

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2024
    Posts
    733

    Re: Found a way to test read for 4 unicode file types, and then read content to a str

    That site is where I got that sub from that I used.
    I actually read that before and it never connected with me it can read a string from a file.
    It looked like information overload, so I pulled out from there things so I could make something work.
    I learn better by seeing more completed code examples. Some people are better at making connections as they are experienced.

  8. #8

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2024
    Posts
    733

    Re: Found a way to test read for 4 unicode file types, and then read content to a str

    I just tried this and it works on all 4 unicode types using Notepad++ to convert the files.

    Code:
            'Using closes the reader even if ReadToEnd throws
            Using sr As New StreamReader(FilenameToBreak, True)
                content = sr.ReadToEnd()
            End Using
    I will say that doing all the other coding helped me learn about the various types of Unicode and the StreamReader functions.

    I wonder, will that cause an exception if try to read a file that is not text?

  9. #9
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    110,560

    Re: Found a way to test read for 4 unicode file types, and then read content to a str

    Quote Originally Posted by sdowney1 View Post
    I wonder, will that cause an exception if try to read a file that is not text?
    All files are just bytes, so no exception is thrown. The reader finds no encoding information, falls back to a default (UTF-8 for StreamReader), interprets the bytes as text and produces gibberish, just as happens when you open a binary file in a text editor.
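
    A minimal sketch of that behaviour, assuming a throwaway temp file filled with a few arbitrary non-text bytes. ReadWithDetection and BinaryReadDemo are hypothetical names for the example, not something from the thread.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BinaryReadDemo
    ' Hypothetical helper: reads the whole file with BOM detection switched on
    ' and reports which encoding the reader ended up using.
    Function ReadWithDetection(fileName As String, ByRef encodingName As String) As String
        Using sr As New StreamReader(fileName, True)
            Dim text As String = sr.ReadToEnd()
            encodingName = sr.CurrentEncoding.WebName
            Return text
        End Using
    End Function

    Sub Main()
        Dim tempFile As String = Path.GetTempFileName()
        Try
            ' Arbitrary bytes with no BOM, similar to the start of an executable
            File.WriteAllBytes(tempFile, New Byte() {&H4D, &H5A, &H90, &H0, &H3, &HFF})

            Dim usedEncoding As String = ""
            Dim text As String = ReadWithDetection(tempFile, usedEncoding)

            ' No exception is raised; bytes that are not valid text in the
            ' fallback encoding are decoded as replacement characters.
            Console.WriteLine("Encoding used: " & usedEncoding)
            Console.WriteLine("Characters read: " & text.Length.ToString())
        Finally
            File.Delete(tempFile)
        End Try
    End Sub
End Module
```

    If the program needs to reject non-text files, that check has to be done separately, since the read itself succeeds.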
