Results 1 to 6 of 6

Thread: Issue with Parsing Delimited File

  1. #1

    Thread Starter
    Member
    Join Date
    Dec 2011
    Location
    Columbus, Ohio
    Posts
    48

    Issue with Parsing Delimited File

    I am given a CSV file every week that I need to parse and import into SQL. The process I wrote was working great until someone started inputting specific characters. Now it crashes. Here is part of the code that was working:

    Code:
            rsNew = New ADODB.Recordset
            rsNewSQL = "tbl905Working_EPLS"
            rsNew.Open(rsNewSQL, strconn, ADODB.CursorTypeEnum.adOpenDynamic, ADODB.LockTypeEnum.adLockOptimistic)
    
            Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(myPath)
                MyReader.TextFieldType = FileIO.FieldType.Delimited      '*****Opts Delimited, FixedWidth, 
                MyReader.SetDelimiters(",")                              '*****Use only with Delimited   
                Dim CurrentRow As String()
                While Not MyReader.EndOfData                             '*****Loop through all rows in the raw file
    
                    CurrentRow = MyReader.ReadFields
                    Dim FldCt As Integer = CurrentRow.Count - 1
                    If myFlag <> True Then
                        rsNew.AddNew()
                        myRow = myRow + 1
                    End If
                    Dim CurCt As Integer = 0
                    For Each newString As String In CurrentRow
                        myValue = newString
    The group that created the file is using Excel. In some of the fields, they are now inputting text like ""value"". Then they save as *.csv. I can manually go into the CSV and do a find/replace "" with ` and then run my program but it is getting old. They will not change what they are doing and I am forced to deal with it.

    I am not 100% familiar with the process I am using but wondering if I can do something like this:
    After the while not statement, I was thinking I could use ReadLine() to put the line in a string variable then use Replace to swap out the bad characters. Simple enough. Question now is how to put that string into the ReadFields after "fixing it." Also, would running ReadLine and then ReadFields move the cursor down two records causing a record skip since each states they process the line and move forward? If yes, how to go back a record? Is it as simple as myreader.moveprev or something similar or not possible?

    OR

    Short of looping through every character in the string and parsing out fields (I know how to do but want to avoid), any other suggestions?

    Thanks everyone.
    Douglas

  2. #2
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Issue with Parsing Delimited File

    Are you saying that every field value is wrapped in double double quotes, e.g.
    Code:
    ""yes"",""no"",""maybe""
    If not then please be very specific about exactly what they are doing.
    Why is my data not saved to my database? | MSDN Data Walkthroughs
    VBForums Database Development FAQ
    My CodeBank Submissions: VB | C#
    My Blog: Data Among Multiple Forms (3 parts)
    Beginner Tutorials: VB | C# | SQL

  3. #3
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    40,104

    Re: Issue with Parsing Delimited File

    Well, you're still working with ADO, which is kind of dated, so you might not get much help with that part of things.

    There are a couple ways I would suggest. The first is that ADO.NET will query against a csv as easily as it will against a database. I think that DDay posted an example of that in the last day or two (I forget what it was in response to, so I can't find it), but he's kind of prolific, so searching his recent posts might take a while. There are probably other examples, though. However, that may not be quite sufficient for your needs. Using that technique, you'd fill a datatable by querying all records from the csv, then push the datatable out to your database. Unfortunately, the records might still come into the datatable wrong. I rather think that the fields in the datatable would have the "" in them, as well, but they'd be string fields, so you could run a loop through the rows in the datatable fixing each field without great difficulty using the String.Replace statement that you already mentioned.

    Otherwise, you could also do as you were suggesting to read and adjust each line in turn, then use .Split to divide them up into individual items (either before or after you did the replace), then write them out to a datatable and update the datatable to the DB. That's all ADO.NET, too, though. I guess I just don't know ADO enough to say whether or not you could meld that solution into what you have. A datatable is just a memory construct, so you may be able to write a datatable out using ADO, in which case the real problem is to get the data into the datatable in corrected format.
    My usual boring signature: Nothing

  4. #4
    eXtreme Programmer .paul.'s Avatar
    Join Date
    May 2007
    Location
    Chelmsford UK
    Posts
    26,413

    Re: Issue with Parsing Delimited File

    try putting this after:

    Code:
    CurrentRow = MyReader.ReadFields
    Code:
    CurrentRow = Array.ConvertAll(CurrentRow, Function(s) s.Replace("""", ""))

  5. #5

    Thread Starter
    Member
    Join Date
    Dec 2011
    Location
    Columbus, Ohio
    Posts
    48

    Re: Issue with Parsing Delimited File

    jmcilhinney: Thanks for the response. It was way more complex than that. Not all records and not all fields. Would look more like this in notepad: "val1","val2",""""val3"""","(val4, val5,val6,""""val7"""", val8)","val9". They were trying to put quotes around people's alias name like "Snoop Dog" but excel didn't like it so they over quoted until they got the appearance in excel that they wanted. They didn't care that it adversely affected the CSV output because they didn't use it.

    ShaggyHiker: Thanks for the response. Using ADO because it is a requirement here. Was going to try your suggestion until I read .paul's thread this morning.

    .paul: You hit exactly what I hoped to find. Initial testing is showing that it works. I am now running the production file twice. Once manually changed and once letting the program change on the fly so I can compare results and determine it doesn't fail on any records. It is over 900,000 records with 562 columns so it may take me today to complete. Will reply if successful with the final code. Thank you very much. Fingers crossed.

  6. #6

    Thread Starter
    Member
    Join Date
    Dec 2011
    Location
    Columbus, Ohio
    Posts
    48

    Re: Issue with Parsing Delimited File

    Thanks to .paul the code is working perfectly. I ended up using the code as shown below. I decided that I did not want any quotes around the final output and this took care of it. Wanted to be sure to share the final code in case anyone runs into a similar situation.

    [CODE]
    Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(myPath)
    MyReader.TextFieldType = FileIO.FieldType.Delimited '*****Opts Delimited, FixedWidth,
    MyReader.SetDelimiters(",") '*****Use only with Delimited
    Dim CurrentRow As String()
    While Not MyReader.EndOfData '*****Loop through all rows in the raw file
    CurrentRow = MyReader.ReadFields
    CurrentRow = Array.ConvertAll(CurrentRow, Function(s) s.Replace("""""", ""))
    Dim FldCt As Integer = CurrentRow.Count - 1
    If myFlag <> True Then
    rsNew.AddNew()
    myRow = myRow + 1
    End If
    Dim CurCt As Integer = 0
    For Each newString As String In CurrentRow
    myValue = newString
    [\CODE]

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width