Results 1 to 9 of 9

Thread: How do I process a csv file and save to the same file?

  1. #1

    Thread Starter
    New Member mdivk's Avatar
    Join Date
    Jul 2014
    Posts
    8

    How do I process a csv file and save to the same file?

    Hello,

    I have a csv file to be processed, here is what I want:

    The file is huge (>1GB), it's in fixed width, I need to remove all the blank spaces between each column, the last column seems to be 20 characters width but when the value is 0 it becomes only one character width following carriage return and starts another line of data.

    I want to write a program to read into the file and replace all the blank spaces with a tab so that the csv is changed to tab delimited.

    And as the file is huge, I wonder if I can do the replacement in the same file? I don't want to generate another huge file on my machine.

    Can someone give me any suggestion? Thank you very much.

  2. #2
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: How do I process a csv file and save to the same file?

    If you can be guaranteed that all data in the output will be at either the same position as in the input or earlier then you could do it in place. You could create a read/write FileStream and then manipulate the Position as you read and write, e.g.
    vb.net Code:
    1. Dim baseStream As New FileStream(filePath, FileMode.Open, FileAccess.ReadWrite)
    2. Dim reader As New StreamReader(baseStream)
    3. Dim writer As New StreamWriter(baseStream)
    4. Dim readPosition As Long
    5. Dim writePosition As Long
    6. Dim line As String
    7.  
    8. Do Until reader.EndOfStream
    9.     'Position the stream pointer for reading.
    10.     baseStream.Position = readPosition
    11.  
    12.     'Read the next line.
    13.     line = reader.ReadLine()
    14.  
    15.     'Remember the current stream pointer for reading.
    16.     readPosition = baseStream.Position
    17.  
    18.     'Process line here.
    19.  
    20.     'Position the stream pointer for writing.
    21.     baseStream.Position = writePosition
    22.  
    23.     'Write the current line.
    24.     writer.WriteLine(line)
    25.  
    26.     'Remember the current stream pointer for writing.
    27.     writePosition = baseStream.Position
    28. Loop
    As long as the `writePosition` never overtakes the `readPosition`, you should be OK I believe. That said, I just realised that I'm not sure how to truncate the file at the end, once you're finished writing. It's not something that I've ever had to do so it's not something I've researched.
    Why is my data not saved to my database? | MSDN Data Walkthroughs
    VBForums Database Development FAQ
    My CodeBank Submissions: VB | C#
    My Blog: Data Among Multiple Forms (3 parts)
    Beginner Tutorials: VB | C# | SQL

  3. #3
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: How do I process a csv file and save to the same file?

    Quote Originally Posted by jmcilhinney View Post
    That said, I just realised that I'm not sure how to truncate the file at the end, once you're finished writing. It's not something that I've ever had to do so it's not something I've researched.
    Aha! I just checked out the FileStream documentation and there is a SetLength method. You can call that after the last write to truncate the file. Here's some revised code:
    vb.net Code:
    1. Using baseStream As New FileStream(filePath, FileMode.Open, FileAccess.ReadWrite)
    2.     Dim readPosition As Long
    3.     Dim writePosition As Long
    4.     Dim line As String
    5.  
    6.     Using reader As New StreamReader(baseStream),
    7.           writer As New StreamWriter(baseStream)
    8.         Do Until reader.EndOfStream
    9.             'Position the stream pointer for reading.
    10.             baseStream.Position = readPosition
    11.  
    12.             'Read the next line.
    13.             line = reader.ReadLine()
    14.  
    15.             'Remember the current stream pointer for reading.
    16.             readPosition = baseStream.Position
    17.  
    18.             'Process line here.
    19.  
    20.             'Position the stream pointer for writing.
    21.             baseStream.Position = writePosition
    22.  
    23.             'Write the current line.
    24.             writer.WriteLine(line)
    25.  
    26.             'Remember the current stream pointer for writing.
    27.             writePosition = baseStream.Position
    28.         Loop
    29.     End Using
    30.  
    31.     'Truncate the file after the last written position.
    32.     baseStream.SetLength(writePosition)
    33. End Using
    That is untested but it looks OK to me.
    Why is my data not saved to my database? | MSDN Data Walkthroughs
    VBForums Database Development FAQ
    My CodeBank Submissions: VB | C#
    My Blog: Data Among Multiple Forms (3 parts)
    Beginner Tutorials: VB | C# | SQL

  4. #4
    PowerPoster techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,687

    Re: How do I process a csv file and save to the same file?

    Just to show there's more than one way to skin a cat...

    vb.net Code:
    1. 'Get a reader to the file
    2. Using fileReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(DataFile)
    3.  
    4.     Dim rowData As String()
    5.  
    6.     'We're going to be using fixed width
    7.     fileReader.TextFieldType = FileIO.FieldType.FixedWidth
    8.     'Set the width of each column to read
    9.     fileReader.SetFieldWidths(fieldWidths) ' fieldWidths is defined as Integer() and contains the length of each column
    10.     'If there is something to read, read it
    11.     While Not fileReader.EndOfData
    12.         'Read the next line - the parser will stuff each column into an array
    13.         rowData = fileReader.ReadFields()
    14.        
    15.         'Here I added the data to a datatable, but you could also use the array.join to stuff it back into a string with comas, tabs, whatever, and write it back out to a file, etc...
    16.         Dim newRow = _FileDataTable.NewRow
    17.         _FileFields.ForEach(Sub(r) newRow(r.Name) = IIf(rowData(r.FieldIndex) <> "", rowData(r.FieldIndex), DBNull.Value))
    18.         _FileDataTable.Rows.Add(newRow)
    19.  
    20.     End While
    21. End Using
    I like using the TextFieldParser because it deals with the nuances that come with CSVs that I often deal with.
    This is a heavily modified version of something I built a while back to handle files in a generic manner, I didn't have to write out all of the data back to files, so I stored it in a DT until I needed it. Then I could write out just the fields I needed when I needed them (some fields would go back into one file, while other fields go into a different file). _FileFields is an internal private class that's used to track field mappings for the files. I also wasn't writing things back out to the same file, I used new files to write out to.

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  5. #5
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: How do I process a csv file and save to the same file?

    Quote Originally Posted by techgnome View Post
    Just to show there's more than one way to skin a cat...

    vb.net Code:
    1. 'Get a reader to the file
    2. Using fileReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(DataFile)
    3.  
    4.     Dim rowData As String()
    5.  
    6.     'We're going to be using fixed width
    7.     fileReader.TextFieldType = FileIO.FieldType.FixedWidth
    8.     'Set the width of each column to read
    9.     fileReader.SetFieldWidths(fieldWidths) ' fieldWidths is defined as Integer() and contains the length of each column
    10.     'If there is something to read, read it
    11.     While Not fileReader.EndOfData
    12.         'Read the next line - the parser will stuff each column into an array
    13.         rowData = fileReader.ReadFields()
    14.        
    15.         'Here I added the data to a datatable, but you could also use the array.join to stuff it back into a string with comas, tabs, whatever, and write it back out to a file, etc...
    16.         Dim newRow = _FileDataTable.NewRow
    17.         _FileFields.ForEach(Sub(r) newRow(r.Name) = IIf(rowData(r.FieldIndex) <> "", rowData(r.FieldIndex), DBNull.Value))
    18.         _FileDataTable.Rows.Add(newRow)
    19.  
    20.     End While
    21. End Using
    I like using the TextFieldParser because it deals with the nuances that come with CSVs that I often deal with.
    This is a heavily modified version of something I built a while back to handle files in a generic manner, I didn't have to write out all of the data back to files, so I stored it in a DT until I needed it. Then I could write out just the fields I needed when I needed them (some fields would go back into one file, while other fields go into a different file). _FileFields is an internal private class that's used to track field mappings for the files. I also wasn't writing things back out to the same file, I used new files to write out to.

    -tg
    You could certainly replace the StreamReader in my code with a TextFieldParser because it can be created with a Stream too. I don't think that you'd want to read in a file of over 1GB and store all the data in a DataTable though.
    Why is my data not saved to my database? | MSDN Data Walkthroughs
    VBForums Database Development FAQ
    My CodeBank Submissions: VB | C#
    My Blog: Data Among Multiple Forms (3 parts)
    Beginner Tutorials: VB | C# | SQL

  6. #6
    PowerPoster techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,687

    Re: How do I process a csv file and save to the same file?

    No, that is true, I'm not sure I'd want to do that either... it's something I just plucked from another project, stripped it down and tossed it up there. Just for simplicity, I'd probably use the Join method, put it back into a string with tabs, then write it out to a new file (knowing the luck my & my users have, writing to the same file at the same time wouldn't be good) ... and then delete the original and rename the new to the old (plus I can see with a file that large, something goes wrong in the middle of processing, then the whole thing is corrupt. I'm paranoid when it comes to data.)

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  7. #7
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: How do I process a csv file and save to the same file?

    Quote Originally Posted by techgnome View Post
    No, that is true, I'm not sure I'd want to do that either... it's something I just plucked from another project, stripped it down and tossed it up there. Just for simplicity, I'd probably use the Join method, put it back into a string with tabs, then write it out to a new file (knowing the luck my & my users have, writing to the same file at the same time wouldn't be good) ... and then delete the original and rename the new to the old (plus I can see with a file that large, something goes wrong in the middle of processing, then the whole thing is corrupt. I'm paranoid when it comes to data.)

    -tg
    I agree. I'd go with creating a temp file and then move it to overwrite the original when it's done. Using a GB of space temporarily is not a big deal these days and it is a risk using the one file.
    Why is my data not saved to my database? | MSDN Data Walkthroughs
    VBForums Database Development FAQ
    My CodeBank Submissions: VB | C#
    My Blog: Data Among Multiple Forms (3 parts)
    Beginner Tutorials: VB | C# | SQL

  8. #8

    Thread Starter
    New Member mdivk's Avatar
    Join Date
    Jul 2014
    Posts
    8

    Re: How do I process a csv file and save to the same file?

    Thank you all for the replies and discussion.

    Would it be easier to simply chop off certain lines (say, the first couple lines and/or the last couple lines)?

    Since the file is big, I don't want to see my app hang there with "No Responding"

    Thanks lots.

  9. #9
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: How do I process a csv file and save to the same file?

    Quote Originally Posted by mdivk View Post
    Would it be easier to simply chop off certain lines (say, the first couple lines and/or the last couple lines)?
    I don't really know what that means. Do you mean process a few lines at a time or don't process a few lines or what?
    Quote Originally Posted by mdivk View Post
    Since the file is big, I don't want to see my app hang there with "No Responding"
    That's not really an issue. Any remotely lengthy processing should be done on a secondary thread anyway. Process your file on a secondary thread and the UI thread is free to respond.
    Why is my data not saved to my database? | MSDN Data Walkthroughs
    VBForums Database Development FAQ
    My CodeBank Submissions: VB | C#
    My Blog: Data Among Multiple Forms (3 parts)
    Beginner Tutorials: VB | C# | SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width