Thread: Using ReadLines

  1. #1

    Thread Starter
    Fanatic Member
    Join Date
    Oct 2007
    Posts
    544

    Using ReadLines

    When we use system.io.ReadLines, the content of the file should not be loaded into memory, right?

    Say I want to create a function.

    The function accepts a string, say "post.txt", where post.txt is a large file (3 GB).

    I want to keep the first 100k lines of post.txt in post.txt and move the rest to reserve_post.txt.

    How would I do so?

  2. #2
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Using ReadLines

    There is no "system.io.ReadLines". There is a System.IO.File class and it has a ReadLines method. You are correct that that method doesn't read the entire file into memory immediately, which the ReadAllLines method does. ReadLines returns an IEnumerable(Of String) and each line is only read from the file as it's used.
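
    To make the difference concrete, here is a minimal sketch (not from the original post). The sample file name and its contents are made up so the code runs on its own; only File.ReadLines, File.ReadAllLines and the LINQ Take extension are real API calls.

    ```vb.net
    Imports System.IO
    Imports System.Linq

    Module ReadLinesSketch
        Sub Main()
            ' Hypothetical sample file so the sketch is self-contained.
            Dim samplePath As String = Path.Combine(Path.GetTempPath(), "readlines_sample.txt")
            File.WriteAllLines(samplePath, New String() {"one", "two", "three", "four"})

            ' ReadLines is lazy: lines are pulled on demand, and Take(2) asks for only two.
            For Each entry As String In File.ReadLines(samplePath).Take(2)
                Console.WriteLine(entry)
            Next

            ' ReadAllLines builds the whole String() array before it returns.
            Dim everything As String() = File.ReadAllLines(samplePath)
            Console.WriteLine(everything.Length)

            File.Delete(samplePath)
        End Sub
    End Module
    ```

    With a 3 GB file the second call is the one to avoid, since every line ends up in memory at once.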

    I wouldn't use ReadLines for what you want. It would be more appropriate to go like this:
    vb.net Code:
    Dim headFileName = "post.txt"
    Dim tailFileName = "reserve_post.txt"
    Dim headLines As New List(Of String)

    Const THRESHOLD As Integer = 100000

    Using headFile As New FileStream(headFileName, FileMode.Open),
          headReader As New StreamReader(headFileName)
        Do Until headFile.Position >= THRESHOLD
            headLines.Add(headReader.ReadLine())
        Loop

        Using tailWriter As New StreamWriter(tailFileName)
            Do Until headReader.EndOfStream
                tailWriter.WriteLine(headReader.ReadLine())
            Loop
        End Using
    End Using

    File.WriteAllLines(headFileName, headLines)
    I suggest this way because using the FileStream allows you to determine exactly how much of the file has been read, regardless of the encoding.

    Note that this does require you to read the data into memory until you reach the threshold. If you want to avoid storing more than one line in memory at a time then you would have to use a third file:
    vb.net Code:
    Dim headFileName = "post.txt"
    Dim tailFileName = "reserve_post.txt"
    Dim tempFileName = "temp.txt"

    Const THRESHOLD As Integer = 100000

    Using headFile As New FileStream(headFileName, FileMode.Open),
          headReader As New StreamReader(headFileName)
        Using tempWriter As New StreamWriter(tempFileName)
            Do Until headFile.Position >= THRESHOLD
                tempWriter.WriteLine(headReader.ReadLine())
            Loop
        End Using

        Using tailWriter As New StreamWriter(tailFileName)
            Do Until headReader.EndOfStream
                tailWriter.WriteLine(headReader.ReadLine())
            Loop
        End Using
    End Using

    File.Copy(tempFileName, headFileName, True)
    File.Delete(tempFileName)
    Why is my data not saved to my database? | MSDN Data Walkthroughs
    VBForums Database Development FAQ
    My CodeBank Submissions: VB | C#
    My Blog: Data Among Multiple Forms (3 parts)
    Beginner Tutorials: VB | C# | SQL

  3. #3

    Thread Starter
    Fanatic Member
    Join Date
    Oct 2007
    Posts
    544

    Re: Using ReadLines

    Thanks. Are StreamReader and StreamWriter efficient?

    Do they write every single line immediately, or wait until enough lines have accumulated in their buffer before flushing?

    And where do you close the StreamWriter object?

    Ah, I see. So the Using block cleans up the object and calls its Dispose method automatically. That's l33t programming.
    Last edited by teguh123; Feb 14th, 2011 at 01:50 AM.

  4. #4
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Using ReadLines

    A StreamReader will read what you tell it to read. File.ReadLines uses a StreamReader internally.

    A StreamWriter keeps a small internal buffer and passes its contents to the underlying Stream when that buffer fills, when you call Flush, or when the writer is closed; setting AutoFlush to True makes it flush after every write. The underlying Stream may buffer as well. If it's a text file you're writing to then the StreamWriter sits on top of a FileStream, which is basically the only way to write to a file in .NET anyway. Pretty much anything else you do will have a FileStream in there, whether you can see it or not. As such, efficiency is not really a concern; the authors have optimised it for performance.

  5. #5

    Thread Starter
    Fanatic Member
    Join Date
    Oct 2007
    Posts
    544

    Re: Using ReadLines

    Everything is clear. But why do you still need the FileStream object? Why not use the StreamReader object directly?

  6. #6
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Using ReadLines

    Quote Originally Posted by teguh123 View Post
    Everything is clear. But why do you still need the FileStream object? Why not use the StreamReader object directly?
    If you'd read my post properly you'd already know that.
    Quote Originally Posted by jmcilhinney View Post
    I suggest this way because using the FileStream allows you to determine exactly how much of the file has been read, regardless of the encoding.

  7. #7

    Thread Starter
    Fanatic Member
    Join Date
    Oct 2007
    Posts
    544

    Re: Using ReadLines

    Well, I don't use FileStream at all. To know how much of the file has been read I simply keep a counter i. So that's what the FileStream is for: a replacement for the counter.

    I think I like the Do Until style jmcilhinney used. However, that means I can't use For Each on colToPut. Ah, you're using ReadLine rather than ReadLines. Hmm....

    That FileStream thingy is fascinating. So two different objects point to the same file and one knows where the enumerator of the other one points. That's something new.

    vb.net Code:
    Public Function reserveMaintenance(ByRef cache As String, Optional ByRef reserve As String = "", Optional ByRef unique As Boolean = True, Optional ByRef reverse As Boolean = False, Optional ByRef dirty As Boolean = False, Optional ByRef ignoreReserve As Boolean = False) As System.Collections.Generic.List(Of String)
        dirty = False

        If reserve = "" Then
            reserve = reserveFileName(cache, True)
        End If

        CreateFileIfNotExist(reserve)

        Const lowerLimit As Integer = 5000
        Const upperLimit As Integer = 20000
        Const average = (lowerLimit + upperLimit) \ 2

        If System.IO.File.ReadLines(cache).Count > upperLimit Or System.IO.File.ReadLines(cache).Count < lowerLimit Then
            Dim colCache = System.IO.File.ReadLines(cache)
            Dim colReserve = System.IO.File.ReadLines(reserve)
            putStuffIntoFiles(reserveFileName(cache, False, True), colCache, average)
            putStuffIntoFiles(reserveFileName(cache, False, True), colReserve, average)
            System.IO.File.Delete(cache)
            System.IO.File.Delete(reserveFileName(cache))
            My.Computer.FileSystem.RenameFile(reserveFileName(cache, False, True), reserveFileName(cache, False, False))
            My.Computer.FileSystem.RenameFile(reserveFileName(cache, True, True), reserveFileName(cache, True, False))
        End If
        Return fileToStringCol(cache)
    End Function

    Public Sub putStuffIntoFiles(ByVal fileName As String, ByVal colToPut As System.Collections.Generic.IEnumerable(Of String), ByVal limit As Long)
        Dim colBuffer = New System.Collections.Generic.List(Of String)
        Dim i As Long = 1
        CreateFileIfNotExist(fileName)
        Dim trueLimit = limit - System.IO.File.ReadLines(fileName).LongCount

        Using cacheFile = New System.IO.StreamWriter(fileName, True), cacheReserveFile = New System.IO.StreamWriter(reserveFileName(fileName), True)
            For Each var In colToPut
                i = i + 1
                If i < trueLimit Then
                    cacheFile.WriteLine(var)
                Else
                    cacheReserveFile.WriteLine(var)
                End If
            Next
        End Using
    End Sub
    Last edited by teguh123; Feb 15th, 2011 at 12:14 AM.

  8. #8
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Using ReadLines

    As the name suggests, a StreamReader is for reading from a Stream. It inherits the TextReader class, so it is for reading text from a Stream. It can read text from any Stream (e.g. MemoryStream, NetworkStream, GZipStream, CryptoStream) but it most commonly reads a FileStream. Because of that, the StreamReader has a constructor that allows you to pass a file path as an argument and it will create the FileStream implicitly. The ReadLines function creates a StreamReader internally, which then creates a FileStream internally.

    The reason that I suggest using the FileStream directly is because that is the only way to know for sure how much of the file has been read. That's because different encodings use different numbers of bytes to store each character. Some encodings even use a different number of bytes for two different characters. As such, counting characters is not a reliable way to determine how much of the file has been read.
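
    As a small illustration of that point (a sketch added for clarity, not part of the original post): the same two-character string occupies a different number of bytes in each encoding. The sample string is arbitrary; Encoding.GetByteCount is the standard call.

    ```vb.net
    Imports System.Text

    Module EncodingSketch
        Sub Main()
            ' "a" followed by e-acute (U+00E9): two characters.
            Dim sample As String = "a" & ChrW(233)

            Console.WriteLine(sample.Length)                          ' 2 characters
            Console.WriteLine(Encoding.UTF8.GetByteCount(sample))     ' 3 bytes (1 + 2)
            Console.WriteLine(Encoding.Unicode.GetByteCount(sample))  ' 4 bytes (2 + 2)
            Console.WriteLine(Encoding.UTF32.GetByteCount(sample))    ' 8 bytes (4 + 4)
        End Sub
    End Module
    ```

    In UTF-8 the two characters even differ from each other (one byte versus two), which is why a character count says little about the byte position in the file.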

    Actually, I made an error in my code earlier. This:
    vb.net Code:
    Using headFile As New FileStream(headFileName, FileMode.Open),
          headReader As New StreamReader(headFileName)
    should have been this:
    vb.net Code:
    Using headFile As New FileStream(headFileName, FileMode.Open),
          headReader As New StreamReader(headFile)
    An alternative to that would be this:
    vb.net Code:
    Using headReader As New StreamReader(headFileName)
        Dim headFile As Stream = headReader.BaseStream
    That doesn't really provide any advantage though.

  9. #9

    Thread Starter
    Fanatic Member
    Join Date
    Oct 2007
    Posts
    544

    Re: Using ReadLines

    Ah, that explains it. That's why I was confused. So I see the FileStream object and the StreamReader object both access the same file without any other apparent connection between the two, which just doesn't seem to make sense. How would updating the StreamReader object also update the FileStream object?

  10. #10
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Using ReadLines

    Quote Originally Posted by teguh123 View Post
    So I see the FileStream object and the StreamReader object both access the same file
    No they don't. The StreamReader doesn't access any file at all. As I have said previously, the StreamReader specifically reads from a Stream, in this case a FileStream.
    Quote Originally Posted by teguh123 View Post
    without any other apparent connection between the two
    There certainly is an apparent connection between the two. Did you read my last post?
    Code:
    Dim headFile As Stream = headReader.BaseStream
    Calling any Read method on a StreamReader causes the StreamReader to call Read on the underlying Stream whenever its internal buffer needs refilling.
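
    A rough sketch of that relationship follows (added for illustration; the sample file is hypothetical). It also shows a caveat worth knowing: the reader fills an internal buffer in chunks, so the stream's Position reflects what has been pulled into that buffer, which is usually further along than the end of the line just returned.

    ```vb.net
    Imports System.IO

    Module BaseStreamSketch
        Sub Main()
            ' Hypothetical sample file so the sketch is self-contained.
            Dim samplePath As String = Path.Combine(Path.GetTempPath(), "basestream_sample.txt")
            File.WriteAllLines(samplePath, New String() {"first", "second", "third"})

            Using headFile As New FileStream(samplePath, FileMode.Open, FileAccess.Read),
                  headReader As New StreamReader(headFile)
                ' The reader wraps the very same stream object.
                Console.WriteLine(Object.ReferenceEquals(headReader.BaseStream, headFile))

                Console.WriteLine(headFile.Position)   ' nothing has been read yet
                Console.WriteLine(headReader.ReadLine())

                ' Reading one line made the reader pull a chunk of bytes from headFile,
                ' so Position has moved; compare it with the total length.
                Console.WriteLine(headFile.Position)
                Console.WriteLine(headFile.Length)
            End Using

            File.Delete(samplePath)
        End Sub
    End Module
    ```

    On a file this small the position after the first ReadLine will most likely already equal the file length, so treat Position as a coarse byte-level progress indicator rather than a per-line one.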

  11. #11

    Thread Starter
    Fanatic Member
    Join Date
    Oct 2007
    Posts
    544

    Re: Using ReadLines

    Oh yeah, your last post makes everything very clear, jmcilhinney. So the StreamReader actually reads from the FileStream object that it holds, namely the BaseStream member.

    There is one more thing I am confused about.

    headFile.Position is the position in bytes, right? Not the position as a number of lines. I think I can check that myself, but I'm just curious.

    Here is my code now:
    vb.net Code:
    Public Sub putStuffIntoFiles(ByVal fileName As String, ByVal change As String, ByVal reserveChange As String, ByVal limit As Long)
        CreateFileIfNotExist(change)
        CreateFileIfNotExist(reserveChange)
        Dim trueLimit = limit - FileSize(change)
        trueLimit = Math.Min(trueLimit, System.IO.File.ReadLines(fileName).LongCount)

        Using ReadStream = New System.IO.StreamReader(fileName, True) ' CAN just add FileStream = ReadStream.BaseStream here however, intellisense doesn't seem to fire up.
            Using FileStream = ReadStream.BaseStream
                Using changeStream = New System.IO.StreamWriter(change, True)
                    Do Until FileStream.Position < trueLimit
                        changeStream.WriteLine(ReadStream.ReadLine)
                    Loop
                End Using
                Using reserveChangeStream = New System.IO.StreamWriter(reserveChange, True)
                    Do Until ReadStream.EndOfStream
                        reserveChangeStream.WriteLine(ReadStream.ReadLine)
                    Loop
                End Using
            End Using
        End Using
    End Sub
    Last edited by teguh123; Feb 19th, 2011 at 10:14 AM.

  12. #12
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Using ReadLines

    Quote Originally Posted by teguh123 View Post
    headFile.Position is the position in bytes, right? Not the position as a number of lines. I think I can check that myself, but I'm just curious.
    That's right. All the FileStream knows about is bytes. It doesn't know about text so it can't know about lines. You're calling ReadLine on the StreamReader so it's easy enough to count lines.
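
    Counting lines could look something like the sketch below. It is only an illustration under stated assumptions: SplitByLineCount is a made-up helper name, the ".tmp" scratch file is an arbitrary choice, and the tiny sample file stands in for the real post.txt.

    ```vb.net
    Imports System.IO

    Module SplitSketch
        ' Hypothetical helper: keeps the first lineLimit lines in headPath
        ' and writes every remaining line to tailPath.
        Sub SplitByLineCount(headPath As String, tailPath As String, lineLimit As Integer)
            Dim tempPath As String = headPath & ".tmp" ' assumed scratch-file name
            Dim count As Integer = 0

            Using reader As New StreamReader(headPath),
                  headWriter As New StreamWriter(tempPath),
                  tailWriter As New StreamWriter(tailPath)
                Do Until reader.EndOfStream
                    Dim current As String = reader.ReadLine()
                    ' Only one line is held in memory at a time.
                    If count < lineLimit Then
                        headWriter.WriteLine(current)
                    Else
                        tailWriter.WriteLine(current)
                    End If
                    count += 1
                Loop
            End Using

            ' Replace the original file with the shortened copy.
            File.Copy(tempPath, headPath, True)
            File.Delete(tempPath)
        End Sub

        Sub Main()
            File.WriteAllLines("post.txt", New String() {"one", "two", "three", "four", "five"})
            SplitByLineCount("post.txt", "reserve_post.txt", 2)
            Console.WriteLine(String.Join(",", File.ReadAllLines("post.txt")))         ' one,two
            Console.WriteLine(String.Join(",", File.ReadAllLines("reserve_post.txt"))) ' three,four,five
        End Sub
    End Module
    ```

    For the original question the call would be SplitByLineCount("post.txt", "reserve_post.txt", 100000), and no FileStream position is needed because the counter tracks lines directly.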

    Also, this:
    Code:
    Do Until FileStream.Position < trueLimit
    should be either this:
    Code:
    Do While FileStream.Position < trueLimit
    or this:
    Code:
    Do Until FileStream.Position >= trueLimit
