-
Feb 13th, 2011, 07:57 AM
#1
Thread Starter
Fanatic Member
Using ReadLines
When we use system.io.ReadLines, the content of the file shouldn't be loaded into memory, right?
Say I want to create a function.
The function accepts a string, say "post.txt", where post.txt is a large file (3 GB).
I want to keep the first 100k lines of post.txt in post.txt and put the rest in reserve_post.txt.
How would I do that?
-
Feb 14th, 2011, 01:32 AM
#2
Re: Using ReadLines
There is no "system.io.ReadLines". There is a System.IO.File class and it has a ReadLines method. You are correct that that method doesn't read the entire file into memory immediately, which the ReadAllLines method does. ReadLines returns an IEnumerable(Of String) and each line is only read from the file as it's used.
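To make the lazy behaviour concrete, here is a minimal sketch (the module and function names are made up for illustration) that takes only the first few lines of a file. Because File.ReadLines yields lines one at a time, LINQ's Take stops the enumeration early, so the rest of a large file is never loaded; File.ReadAllLines would build an array of every line first.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module ReadLinesDemo
    ' Hypothetical helper: returns the first `count` lines of a file.
    ' File.ReadLines is lazy, so Take ends the enumeration (and closes
    ' the file) as soon as `count` lines have been produced.
    Public Function FirstLines(path As String, count As Integer) As List(Of String)
        Return File.ReadLines(path).Take(count).ToList()
    End Function
End Module
```

For example, ReadLinesDemo.FirstLines("post.txt", 10) would return the first ten lines without reading the whole 3 GB file.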
I wouldn't use ReadLines for what you want. It would be more appropriate to go like this:
vb.net Code:
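' Requires Imports System.IO at the top of the file.
Dim headFileName = "post.txt"
Dim tailFileName = "reserve_post.txt"
Dim headLines As New List(Of String)
Const THRESHOLD As Integer = 100000 ' measured in bytes via FileStream.Position

Using headFile As New FileStream(headFileName, FileMode.Open), headReader As New StreamReader(headFileName)
    ' Collect head lines until the threshold is reached
    ' (or the file ends, so a short file cannot loop forever).
    Do Until headFile.Position >= THRESHOLD OrElse headReader.EndOfStream
        headLines.Add(headReader.ReadLine())
    Loop

    ' Everything after the threshold goes to the tail file.
    Using tailWriter As New StreamWriter(tailFileName)
        Do Until headReader.EndOfStream
            tailWriter.WriteLine(headReader.ReadLine())
        Loop
    End Using
End Using

' Overwrite the original file with just the head lines.
File.WriteAllLines(headFileName, headLines)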
I suggest this way because using the FileStream allows you to determine exactly how much of the file has been read, regardless of the encoding.
Note that this does require you to read the data into memory until you reach the threshold. If you want to avoid storing more than one line in memory at a time then you would have to use a third file:
vb.net Code:
' Requires Imports System.IO at the top of the file.
Dim headFileName = "post.txt"
Dim tailFileName = "reserve_post.txt"
Dim tempFileName = "temp.txt"
Const THRESHOLD As Integer = 100000 ' measured in bytes via FileStream.Position

Using headFile As New FileStream(headFileName, FileMode.Open), headReader As New StreamReader(headFileName)
    ' Stream the head lines into a temporary file instead of a List.
    Using tempWriter As New StreamWriter(tempFileName)
        Do Until headFile.Position >= THRESHOLD OrElse headReader.EndOfStream
            tempWriter.WriteLine(headReader.ReadLine())
        Loop
    End Using

    ' Stream the remaining lines into the tail file.
    Using tailWriter As New StreamWriter(tailFileName)
        Do Until headReader.EndOfStream
            tailWriter.WriteLine(headReader.ReadLine())
        Loop
    End Using
End Using

' Replace the original file with the temporary one.
File.Copy(tempFileName, headFileName, True)
File.Delete(tempFileName)
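The original question asked for the first 100k lines, whereas the code in this post splits at a byte position. A minimal sketch of a line-count variant that keeps the one-line-in-memory property of the temporary-file approach might look like this (FileSplitter and SplitByLineCount are hypothetical names). Counting lines also sidesteps a subtlety of watching FileStream.Position from outside: a StreamReader reads ahead into its own buffer, so Position advances in buffer-sized chunks rather than line by line.

```vbnet
Imports System.IO

Module FileSplitter
    ' Hypothetical helper: keeps the first headLineCount lines in headPath and
    ' moves the remaining lines into tailPath. Only one line is held in memory
    ' at a time; the head lines are staged in a temporary file.
    ' Assumes plain text that StreamWriter's default UTF-8 encoding can round-trip.
    Public Sub SplitByLineCount(headPath As String, tailPath As String, headLineCount As Integer)
        Dim tempPath As String = Path.GetTempFileName()
        Using reader As New StreamReader(headPath)
            ' Copy up to headLineCount lines into the temporary file.
            Using tempWriter As New StreamWriter(tempPath)
                Dim written As Integer = 0
                Do While written < headLineCount AndAlso Not reader.EndOfStream
                    tempWriter.WriteLine(reader.ReadLine())
                    written += 1
                Loop
            End Using
            ' Everything left over goes to the tail file (overwritten if it exists).
            Using tailWriter As New StreamWriter(tailPath)
                Do Until reader.EndOfStream
                    tailWriter.WriteLine(reader.ReadLine())
                Loop
            End Using
        End Using
        ' The reader is closed now, so the original file can be replaced.
        File.Copy(tempPath, headPath, True)
        File.Delete(tempPath)
    End Sub
End Module
```

For the example in the question this would be called as FileSplitter.SplitByLineCount("post.txt", "reserve_post.txt", 100000).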
-
Feb 14th, 2011, 01:40 AM
#3
Thread Starter
Fanatic Member
Re: Using ReadLines
Thanks. Are StreamReader and StreamWriter efficient?
Do they write every single line immediately, or wait until there is a significant amount in their buffer before dumping it?
And when did you close the StreamWriter object?
Ah, I see. So the Using block terminates everything and calls Dispose automatically. That's l33t programming.
Last edited by teguh123; Feb 14th, 2011 at 01:50 AM.
-
Feb 14th, 2011, 01:48 AM
#4
Re: Using ReadLines
A StreamReader returns what you ask it for, although it reads from its underlying Stream in blocks and keeps the surplus in its own buffer. File.ReadLines uses a StreamReader internally.
A StreamWriter has its own internal buffer too: text you write is passed on to the underlying Stream when that buffer fills, when you call Flush, or when the writer is closed or disposed (setting AutoFlush to True makes it flush after every write). If it's a text file you're writing to, then the StreamWriter sits on top of a FileStream, which adds a byte buffer of its own and is basically the only way to write to a file in .NET anyway. Pretty much anything else you do will have a FileStream in there, whether you can see it or not. As such, efficiency is not really a concern: the authors have optimised these classes for performance.
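A small sketch (hypothetical module and function names) shows the StreamWriter's own buffering against a MemoryStream: after WriteLine the characters are still held by the writer, and they reach the stream only on Flush. Disposing the writer, which is what the end of a Using block does, flushes it as well, which is why the code in post #2 never needs an explicit Close.

```vbnet
Imports System
Imports System.IO

Module WriterBufferingDemo
    ' Hypothetical helper: returns the number of bytes in the underlying
    ' stream (1) right after WriteLine and (2) after an explicit Flush.
    Public Function BytesBeforeAndAfterFlush(text As String) As Tuple(Of Long, Long)
        Using buffer As New MemoryStream()
            Using writer As New StreamWriter(buffer)
                writer.WriteLine(text)
                ' AutoFlush is False by default, so a short line is still
                ' sitting in the StreamWriter's own buffer at this point.
                Dim beforeFlush As Long = buffer.Length
                writer.Flush()
                ' Flush has now pushed the encoded bytes into the MemoryStream.
                Dim afterFlush As Long = buffer.Length
                Return Tuple.Create(beforeFlush, afterFlush)
            End Using
        End Using
    End Function
End Module
```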
-
Feb 14th, 2011, 09:07 AM
#5
Thread Starter
Fanatic Member
Re: Using ReadLines
Everything is clear. But why do you still need to use the FileStream object? Why not use the StreamReader object directly?
-
Feb 14th, 2011, 06:31 PM
#6
Re: Using ReadLines
 Originally Posted by teguh123
Everything is clear. But why do you still need to use the FileStream object? Why not use the StreamReader object directly?
If you'd read my post properly you'd already know that.
 Originally Posted by jmcilhinney
I suggest this way because using the FileStream allows you to determine exactly how much of the file has been read, regardless of the encoding.
-
Feb 15th, 2011, 12:10 AM
#7
Thread Starter
Fanatic Member
Re: Using ReadLines
Well, I don't use a FileStream at all. To know how much of the file has been read I simply keep a counter, i. So that's what the FileStream is for: a replacement for the counter.
I think I like the Do Until style jmcilhinney used. However, that means I can't use For Each on colToPut. Ah, you're using ReadLine rather than ReadLines. Hmm...
That FileStream thingy is fascinating. So two different objects point to the same file, and one knows where the enumerator of the other one points. That's something new.
vb.net Code:
Public Function reserveMaintenance(ByRef cache As String, Optional ByRef reserve As String = "", Optional ByRef unique As Boolean = True, Optional ByRef reverse As Boolean = False, Optional ByRef dirty As Boolean = False, Optional ByRef ignoreReserve As Boolean = False) As System.Collections.Generic.List(Of String)
    ' reserveFileName, CreateFileIfNotExist and fileToStringCol are the poster's own helpers (not shown).
    dirty = False
    If reserve = "" Then
        reserve = reserveFileName(cache, True)
    End If
    CreateFileIfNotExist(reserve)

    Const lowerLimit As Integer = 5000
    Const upperLimit As Integer = 20000
    Const average = (lowerLimit + upperLimit) \ 2

    ' Count the cache lines once instead of enumerating the file twice.
    Dim cacheCount = System.IO.File.ReadLines(cache).Count()
    If cacheCount > upperLimit OrElse cacheCount < lowerLimit Then
        Dim colCache = System.IO.File.ReadLines(cache)
        Dim colReserve = System.IO.File.ReadLines(reserve)
        putStuffIntoFiles(reserveFileName(cache, False, True), colCache, average)
        putStuffIntoFiles(reserveFileName(cache, False, True), colReserve, average)
        System.IO.File.Delete(cache)
        System.IO.File.Delete(reserveFileName(cache))
        My.Computer.FileSystem.RenameFile(reserveFileName(cache, False, True), reserveFileName(cache, False, False))
        My.Computer.FileSystem.RenameFile(reserveFileName(cache, True, True), reserveFileName(cache, True, False))
    End If
    Return fileToStringCol(cache)
End Function

Public Sub putStuffIntoFiles(ByVal fileName As String, ByVal colToPut As System.Collections.Generic.IEnumerable(Of String), ByVal limit As Long)
    Dim i As Long = 0
    CreateFileIfNotExist(fileName)
    ' Room left in fileName before its line count reaches limit.
    Dim trueLimit = limit - System.IO.File.ReadLines(fileName).LongCount()
    Using cacheFile = New System.IO.StreamWriter(fileName, True), cacheReserveFile = New System.IO.StreamWriter(reserveFileName(fileName), True)
        For Each var In colToPut
            i += 1
            ' The first trueLimit lines go to fileName, the rest to its reserve file.
            If i <= trueLimit Then
                cacheFile.WriteLine(var)
            Else
                cacheReserveFile.WriteLine(var)
            End If
        Next
    End Using
End Sub
Last edited by teguh123; Feb 15th, 2011 at 12:14 AM.
-
Feb 15th, 2011, 12:23 AM
#8
Re: Using ReadLines
As the name suggests, a StreamReader is for reading from a Stream. It inherits the TextReader class, so it is for reading text from a Stream. It can read text from any Stream (e.g. MemoryStream, NetworkStream, GZipStream, CryptoStream) but it most commonly reads a FileStream. Because of that, the StreamReader has a constructor that allows you to pass a file path as an argument and it will create the FileStream implicitly. The ReadLines function creates a StreamReader internally, which then creates a FileStream internally.
The reason that I suggest using the FileStream directly is because that is the only way to know for sure how much of the file has been read. That's because different encodings use different numbers of bytes to store each character. Some encodings even use a different number of bytes for two different characters. As such, counting characters is not a reliable way to determine how much of the file has been read.
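To illustrate the point about encodings, this sketch (hypothetical names) compares byte counts for the same text under UTF-8 and UTF-16 using Encoding.GetByteCount. In UTF-8 an ASCII letter takes one byte, "é" (U+00E9) two and "€" (U+20AC) three, while UTF-16 uses two bytes for each of these characters.

```vbnet
Imports System.Text

Module EncodingSizeDemo
    ' Hypothetical helper: bytes needed to store text as UTF-8.
    Public Function Utf8Bytes(text As String) As Integer
        Return Encoding.UTF8.GetByteCount(text)
    End Function

    ' Hypothetical helper: bytes needed to store text as UTF-16 (Encoding.Unicode).
    Public Function Utf16Bytes(text As String) As Integer
        Return Encoding.Unicode.GetByteCount(text)
    End Function
End Module
```

Neither count includes a byte order mark; a file written with one would be a few bytes longer again.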
Actually, I made an error in my code earlier. This:
vb.net Code:
Using headFile As New FileStream(headFileName, FileMode.Open), headReader As New StreamReader(headFileName)
should have been this:
vb.net Code:
Using headFile As New FileStream(headFileName, FileMode.Open), headReader As New StreamReader(headFile)
An alternative to that would be this:
vb.net Code:
Using headReader As New StreamReader(headFileName)
    Dim headFile As Stream = headReader.BaseStream
That doesn't really provide any advantage though.
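A quick check of that relationship (hypothetical names), using a MemoryStream to underline that a StreamReader can sit on any Stream, not just a file: the object returned by BaseStream is the same instance that was passed to the constructor.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BaseStreamDemo
    ' Hypothetical helper: wraps an in-memory Stream (no file involved) in a
    ' StreamReader and reports whether BaseStream returns that same object.
    Public Function BaseStreamIsSameObject() As Boolean
        Dim bytes As Byte() = Encoding.UTF8.GetBytes("first" & Environment.NewLine & "second")
        Using source As New MemoryStream(bytes)
            Using reader As New StreamReader(source)
                Return reader.BaseStream Is source
            End Using
        End Using
    End Function
End Module
```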
-
Feb 16th, 2011, 11:10 PM
#9
Thread Starter
Fanatic Member
Re: Using ReadLines
Ah, that explains it. I see. That's why I was confused. So I see the FileStream object and the StreamReader object both access the same file without any other apparent connection between the two, which just doesn't seem to make sense. How the hell does updating the StreamReader object also update the FileStream object?
-
Feb 16th, 2011, 11:47 PM
#10
Re: Using ReadLines
 Originally Posted by teguh123
So I see the FileStream object and the StreamReader object both access the same file
No, they don't. The StreamReader doesn't access any file at all. As I have said previously, the StreamReader specifically reads from a Stream, in this case a FileStream.
 Originally Posted by teguh123
without any other apparent connection between the two
There certainly is an apparent connection between the two. Did you read my last post?
Code:
Dim headFile As Stream = headReader.BaseStream
Calling any Read method on a StreamReader will cause the StreamReader to call Read on the underlying Stream whenever its internal buffer needs refilling.
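One refinement worth knowing: the StreamReader does not go back to the Stream on every ReadLine call; it reads a block into its own buffer and serves lines from that until the buffer runs dry. This sketch (hypothetical names) reads a single line from a short MemoryStream and shows that the stream's Position is already at the end. That read-ahead means the underlying stream's Position can run ahead of the lines actually handed out so far.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ReadAheadDemo
    ' Hypothetical helper: reads one line through a StreamReader and returns
    ' the underlying stream's Position together with its total Length.
    Public Function PositionAfterOneLine(lines As String()) As Tuple(Of Long, Long)
        Dim text As String = String.Join(Environment.NewLine, lines)
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(text)
        Using source As New MemoryStream(bytes)
            Using reader As New StreamReader(source)
                reader.ReadLine()
                ' The reader filled its buffer in one Read call, so for a short
                ' input Position is already at the end after a single line.
                Return Tuple.Create(source.Position, source.Length)
            End Using
        End Using
    End Function
End Module
```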
-
Feb 19th, 2011, 10:10 AM
#11
Thread Starter
Fanatic Member
Re: Using ReadLines
Oh yeah, your last post makes everything very clear, jmcilhinney. So the StreamReader actually reads from the FileStream object that it has, namely the BaseStream member.
There is one more thing that I am confused about.
headFile.Position is the position in bytes, right? Not the position as in number of lines. I think I can check that myself. But just curious.
Here is my code now:
vb.net Code:
Public Sub putStuffIntoFiles(ByVal fileName As String, ByVal change As String, ByVal reserveChange As String, ByVal limit As Long)
    CreateFileIfNotExist(change)
    CreateFileIfNotExist(reserveChange)
    Dim trueLimit = limit - FileSize(change)
    trueLimit = Math.Min(trueLimit, System.IO.File.ReadLines(fileName).LongCount)
    Using ReadStream = New System.IO.StreamReader(fileName, True)
        ' Could just add FileStream = ReadStream.BaseStream here; however, IntelliSense doesn't seem to fire up.
        Using FileStream = ReadStream.BaseStream
            Using changeStream = New System.IO.StreamWriter(change, True)
                Do Until FileStream.Position < trueLimit
                    changeStream.WriteLine(ReadStream.ReadLine)
                Loop
            End Using
            Using reserveChangeStream = New System.IO.StreamWriter(reserveChange, True)
                Do Until ReadStream.EndOfStream
                    reserveChangeStream.WriteLine(ReadStream.ReadLine)
                Loop
            End Using
        End Using
    End Using
End Sub
Last edited by teguh123; Feb 19th, 2011 at 10:14 AM.
-
Feb 19th, 2011, 11:48 AM
#12
Re: Using ReadLines
 Originally Posted by teguh123
headFile.Position is the position in bytes, right? Not the position as in number of lines. I think I can check that myself. But just curious.
That's right. All the FileStream knows about is bytes. It doesn't know about text, so it can't know about lines. You're calling ReadLine on the StreamReader, so it's easy enough to count lines.
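Following on from that, a sketch (hypothetical names) that counts lines with ReadLine and, for comparison, reports the file's length in bytes, which is the unit FileStream.Position and FileStream.Length use.

```vbnet
Imports System
Imports System.IO

Module LinesVersusBytes
    ' Hypothetical helper: counts lines by calling ReadLine until the end of
    ' the file, and also returns the file size in bytes.
    Public Function CountLinesAndBytes(path As String) As Tuple(Of Long, Long)
        Dim lineCount As Long = 0
        Using reader As New StreamReader(path)
            Do Until reader.EndOfStream
                reader.ReadLine()
                lineCount += 1
            Loop
            Return Tuple.Create(lineCount, reader.BaseStream.Length)
        End Using
    End Function
End Module
```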
Also, this:
Code:
Do Until FileStream.Position < trueLimit
should be either this:
Code:
Do While FileStream.Position < trueLimit
or this:
Code:
Do Until FileStream.Position > trueLimit
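The difference between those two corrections can be seen in a small sketch (hypothetical names) where a counter stands in for FileStream.Position: Do While position < limit and Do Until position > limit agree except when the position lands exactly on the limit, in which case the > version runs one extra time. Do Until position >= limit would match Do While exactly. The original Do Until position < limit is already satisfied at position 0, so its body never runs.

```vbnet
Module LoopConditionDemo
    ' Hypothetical helper: Do While keeps looping as long as position is
    ' below the limit. stepSize must be positive or the loop never ends.
    Public Function CountWhileBelow(limit As Long, stepSize As Long) As Integer
        Dim position As Long = 0
        Dim iterations As Integer = 0
        Do While position < limit
            position += stepSize
            iterations += 1
        Loop
        Return iterations
    End Function

    ' Hypothetical helper: Do Until with ">" stops only once position has
    ' passed the limit, so it runs once more if position hits the limit exactly.
    Public Function CountUntilAbove(limit As Long, stepSize As Long) As Integer
        Dim position As Long = 0
        Dim iterations As Integer = 0
        Do Until position > limit
            position += stepSize
            iterations += 1
        Loop
        Return iterations
    End Function
End Module
```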