Issue with Parsing Delimited File
I am given a CSV file every week that I need to parse and import into SQL. The process I wrote was working great until someone started inputting specific characters. Now it crashes. Here is part of the code that was working:
Code:
rsNew = New ADODB.Recordset
rsNewSQL = "tbl905Working_EPLS"
rsNew.Open(rsNewSQL, strconn, ADODB.CursorTypeEnum.adOpenDynamic, ADODB.LockTypeEnum.adLockOptimistic)
Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(myPath)
MyReader.TextFieldType = FileIO.FieldType.Delimited '*****Opts Delimited, FixedWidth,
MyReader.SetDelimiters(",") '*****Use only with Delimited
Dim CurrentRow As String()
While Not MyReader.EndOfData '*****Loop through all rows in the raw file
CurrentRow = MyReader.ReadFields
Dim FldCt As Integer = CurrentRow.Count - 1
If myFlag <> True Then
rsNew.AddNew()
myRow = myRow + 1
End If
Dim CurCt As Integer = 0
For Each newString As String In CurrentRow
myValue = newString
The group that created the file is using Excel. In some of the fields, they are now inputting text like ""value"". Then they save as *.csv. I can manually go into the CSV and do a find/replace "" with ` and then run my program but it is getting old. They will not change what they are doing and I am forced to deal with it.
I am not 100% familiar with the process I am using but wondering if I can do something like this:
After the while not statement, I was thinking I could use ReadLine() to put the line in a string variable then use Replace to swap out the bad characters. Simple enough. Question now is how to put that string into the ReadFields after "fixing it." Also, would running ReadLine and then ReadFields move the cursor down two records causing a record skip since each states they process the line and move forward? If yes, how to go back a record? Is it as simple as myreader.moveprev or something similar or not possible?
OR
Short of looping through every character in the string and parsing out fields (I know how to do but want to avoid), any other suggestions?
Thanks everyone.
Douglas
Re: Issue with Parsing Delimited File
Are you saying that every field value is wrapped in double double quotes, e.g.
Code:
""yes"",""no"",""maybe""
If not then please be very specific about exactly what they are doing.
Re: Issue with Parsing Delimited File
Well, you're still working with ADO, which is kind of dated, so you might not get much help with that part of things.
There are a couple ways I would suggest. The first is that ADO.NET will query against a csv as easily as it will against a database. I think that DDay posted an example of that in the last day or two (I forget what it was in response to, so I can't find it), but he's kind of prolific, so searching his recent posts might take a while. There are probably other examples, though. However, that may not be quite sufficient for your needs. Using that technique, you'd fill a datatable by querying all records from the csv, then push the datatable out to your database. Unfortunately, the records might still come into the datatable wrong. I rather think that the fields in the datatable would have the "" in them, as well, but they'd be string fields, so you could run a loop through the rows in the datatable fixing each field without great difficulty using the String.Replace statement that you already mentioned.
Otherwise, you could also do as you were suggesting to read and adjust each line in turn, then use .Split to divide them up into individual items (either before or after you did the replace), then write them out to a datatable and update the datatable to the DB. That's all ADO.NET, too, though. I guess I just don't know ADO enough to say whether or not you could meld that solution into what you have. A datatable is just a memory construct, so you may be able to write a datatable out using ADO, in which case the real problem is to get the data into the datatable in corrected format.
Re: Issue with Parsing Delimited File
try putting this after:
Code:
CurrentRow = MyReader.ReadFields
Code:
CurrentRow = Array.ConvertAll(CurrentRow, Function(s) s.Replace("""", ""))
Re: Issue with Parsing Delimited File
jmcilhinney: Thanks for the response. It was way more complex than that. Not all records and not all fields. Would look more like this in notepad: "val1","val2",""""val3"""","(val4, val5,val6,""""val7"""", val8)","val9". They were trying to put quotes around people's alias name like "Snoop Dog" but excel didn't like it so they over quoted until they got the appearance in excel that they wanted. They didn't care that it adversely affected the CSV output because they didn't use it.
ShaggyHiker: Thanks for the response. Using ADO because it is a requirement here. Was going to try your suggestion until I read .paul's thread this morning.
.paul: You hit exactly what I hoped to find. Initial testing is showing that it works. I am now running the production file twice. Once manually changed and once letting the program change on the fly so I can compare results and determine it doesn't fail on any records. It is over 900,000 records with 562 columns so it may take me today to complete. Will reply if successful with the final code. Thank you very much. Fingers crossed.
Re: Issue with Parsing Delimited File
Thanks to .paul the code is working perfectly. I ended up using the code as shown below. I decided that I did not want any quotes around the final output and this took care of it. Wanted to be sure to share the final code in case anyone runs into a similar situation.
[CODE]
Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(myPath)
MyReader.TextFieldType = FileIO.FieldType.Delimited '*****Opts Delimited, FixedWidth,
MyReader.SetDelimiters(",") '*****Use only with Delimited
Dim CurrentRow As String()
While Not MyReader.EndOfData '*****Loop through all rows in the raw file
CurrentRow = MyReader.ReadFields
CurrentRow = Array.ConvertAll(CurrentRow, Function(s) s.Replace("""""", ""))
Dim FldCt As Integer = CurrentRow.Count - 1
If myFlag <> True Then
rsNew.AddNew()
myRow = myRow + 1
End If
Dim CurCt As Integer = 0
For Each newString As String In CurrentRow
myValue = newString
[\CODE]