|
-
Jul 17th, 2014, 03:57 PM
#1
Thread Starter
New Member
How do I process a csv file and save to the same file?
Hello,
I have a csv file to be processed, here is what I want:
The file is huge (>1GB), it's in fixed width, I need to remove all the blank spaces between each column, the last column seems to be 20 characters width but when the value is 0 it becomes only one character width following carriage return and starts another line of data.
I want to write a program to read into the file and replace all the blank spaces with a tab so that the csv is changed to tab delimited.
And as the file is huge, I wonder if I can do the replacement in the same file? I don't want to generate another huge file on my machine.
Can someone give me any suggestion? Thank you very much.
-
Jul 17th, 2014, 07:30 PM
#2
Re: How do I process a csv file and save to the same file?
If you can be guaranteed that all data in the output will be at either the same position as in the input or earlier then you could do it in place. You could create a read/write FileStream and then manipulate the Position as you read and write, e.g.
vb.net Code:
Dim baseStream As New FileStream(filePath, FileMode.Open, FileAccess.ReadWrite) Dim reader As New StreamReader(baseStream) Dim writer As New StreamWriter(baseStream) Dim readPosition As Long Dim writePosition As Long Dim line As String Do Until reader.EndOfStream 'Position the stream pointer for reading. baseStream.Position = readPosition 'Read the next line. line = reader.ReadLine() 'Remember the current stream pointer for reading. readPosition = baseStream.Position 'Process line here. 'Position the stream pointer for writing. baseStream.Position = writePosition 'Write the current line. writer.WriteLine(line) 'Remember the current stream pointer for writing. writePosition = baseStream.Position Loop
As long as the `writePosition` never overtakes the `readPosition`, you should be OK I believe. That said, I just realised that I'm not sure how to truncate the file at the end, once you're finished writing. It's not something that I've ever had to do so it's not something I've researched.
-
Jul 17th, 2014, 07:36 PM
#3
Re: How do I process a csv file and save to the same file?
 Originally Posted by jmcilhinney
That said, I just realised that I'm not sure how to truncate the file at the end, once you're finished writing. It's not something that I've ever had to do so it's not something I've researched.
Aha! I just checked out the FileStream documentation and there is a SetLength method. You can call that after the last write to truncate the file. Here's some revised code:
vb.net Code:
Using baseStream As New FileStream(filePath, FileMode.Open, FileAccess.ReadWrite) Dim readPosition As Long Dim writePosition As Long Dim line As String Using reader As New StreamReader(baseStream), writer As New StreamWriter(baseStream) Do Until reader.EndOfStream 'Position the stream pointer for reading. baseStream.Position = readPosition 'Read the next line. line = reader.ReadLine() 'Remember the current stream pointer for reading. readPosition = baseStream.Position 'Process line here. 'Position the stream pointer for writing. baseStream.Position = writePosition 'Write the current line. writer.WriteLine(line) 'Remember the current stream pointer for writing. writePosition = baseStream.Position Loop End Using 'Truncate the file after the last written position. baseStream.SetLength(writePosition) End Using
That is untested but it looks OK to me.
-
Jul 18th, 2014, 07:21 AM
#4
Re: How do I process a csv file and save to the same file?
Just to show there's more than one way to skin a cat...
vb.net Code:
'Get a reader to the file
Using fileReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(DataFile)
Dim rowData As String()
'We're going to be using fixed width
fileReader.TextFieldType = FileIO.FieldType.FixedWidth
'Set the width of each column to read
fileReader.SetFieldWidths(fieldWidths) ' fieldWidths is defined as Integer() and contains the length of each column
'If there is something to read, read it
While Not fileReader.EndOfData
'Read the next line - the parser will stuff each column into an array
rowData = fileReader.ReadFields()
'Here I added the data to a datatable, but you could also use the array.join to stuff it back into a string with comas, tabs, whatever, and write it back out to a file, etc...
Dim newRow = _FileDataTable.NewRow
_FileFields.ForEach(Sub(r) newRow(r.Name) = IIf(rowData(r.FieldIndex) <> "", rowData(r.FieldIndex), DBNull.Value))
_FileDataTable.Rows.Add(newRow)
End While
End Using
I like using the TextFieldParser because it deals with the nuances that come with CSVs that I often deal with.
This is a heavily modified version of something I built a while back to handle files in a generic manner, I didn't have to write out all of the data back to files, so I stored it in a DT until I needed it. Then I could write out just the fields I needed when I needed them (some fields would go back into one file, while other fields go into a different file). _FileFields is an internal private class that's used to track field mappings for the files. I also wasn't writing things back out to the same file, I used new files to write out to.
-tg
-
Jul 18th, 2014, 08:34 AM
#5
Re: How do I process a csv file and save to the same file?
 Originally Posted by techgnome
Just to show there's more than one way to skin a cat...
vb.net Code:
'Get a reader to the file Using fileReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(DataFile) Dim rowData As String() 'We're going to be using fixed width fileReader.TextFieldType = FileIO.FieldType.FixedWidth 'Set the width of each column to read fileReader.SetFieldWidths(fieldWidths) ' fieldWidths is defined as Integer() and contains the length of each column 'If there is something to read, read it While Not fileReader.EndOfData 'Read the next line - the parser will stuff each column into an array rowData = fileReader.ReadFields() 'Here I added the data to a datatable, but you could also use the array.join to stuff it back into a string with comas, tabs, whatever, and write it back out to a file, etc... Dim newRow = _FileDataTable.NewRow _FileFields.ForEach(Sub(r) newRow(r.Name) = IIf(rowData(r.FieldIndex) <> "", rowData(r.FieldIndex), DBNull.Value)) _FileDataTable.Rows.Add(newRow) End While End Using
I like using the TextFieldParser because it deals with the nuances that come with CSVs that I often deal with.
This is a heavily modified version of something I built a while back to handle files in a generic manner, I didn't have to write out all of the data back to files, so I stored it in a DT until I needed it. Then I could write out just the fields I needed when I needed them (some fields would go back into one file, while other fields go into a different file). _FileFields is an internal private class that's used to track field mappings for the files. I also wasn't writing things back out to the same file, I used new files to write out to.
-tg
You could certainly replace the StreamReader in my code with a TextFieldParser because it can be created with a Stream too. I don't think that you'd want to read in a file of over 1GB and store all the data in a DataTable though.
-
Jul 18th, 2014, 09:25 AM
#6
Re: How do I process a csv file and save to the same file?
No, that is true, I'm not sure I'd want to do that either... it's something I just plucked from another project, stripped it down and tossed it up there. Just for simplicity, I'd probably use the Join method, put it back into a string with tabs, then write it out to a new file (knowing the luck my & my users have, writing to the same file at the same time wouldn't be good) ... and then delete the original and rename the new to the old (plus I can see with a file that large, something goes wrong in the middle of processing, then the whole thing is corrupt. I'm paranoid when it comes to data.)
-tg
-
Jul 18th, 2014, 10:39 AM
#7
Re: How do I process a csv file and save to the same file?
 Originally Posted by techgnome
No, that is true, I'm not sure I'd want to do that either... it's something I just plucked from another project, stripped it down and tossed it up there. Just for simplicity, I'd probably use the Join method, put it back into a string with tabs, then write it out to a new file (knowing the luck my & my users have, writing to the same file at the same time wouldn't be good) ... and then delete the original and rename the new to the old (plus I can see with a file that large, something goes wrong in the middle of processing, then the whole thing is corrupt. I'm paranoid when it comes to data.)
-tg
I agree. I'd go with creating a temp file and then move it to overwrite the original when it's done. Using a GB of space temporarily is not a big deal these days and it is a risk using the one file.
-
Jul 23rd, 2014, 07:56 AM
#8
Thread Starter
New Member
Re: How do I process a csv file and save to the same file?
Thank you all for the replies and discussion.
Would it be easier to simply chop off certain lines (say, the first couple lines and/or the last couple lines)?
Since the file is big, I don't want to see my app hang there with "No Responding"
Thanks lots.
-
Jul 23rd, 2014, 08:35 AM
#9
Re: How do I process a csv file and save to the same file?
 Originally Posted by mdivk
Would it be easier to simply chop off certain lines (say, the first couple lines and/or the last couple lines)?
I don't really know what that means. Do you mean process a few lines at a time or don't process a few lines or what?
 Originally Posted by mdivk
Since the file is big, I don't want to see my app hang there with "No Responding"
That's not really an issue. Any remotely lengthy processing should be done on a secondary thread anyway. Process your file on a secondary thread and the UI thread is free to respond.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|