- VBForums
- .NET and More
- C#
- [RESOLVED] StreamReader readline() split only at newlines that are not encapsulated in quotes
-
Jan 23rd, 2013, 11:03 AM
#1
Thread Starter
Hyperactive Member
[RESOLVED] StreamReader readline() split only at newlines that are not encapsulated in quotes
I have the following function:
Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Windows.Forms;

/// <summary>
/// Reads a CSV file and stores each row as a string array in a list.
/// </summary>
/// <param name="path">Path of the CSV file to read.</param>
/// <returns>One string array per CSV row.</returns>
public static List<string[]> parseCSV(string path)
{
    List<string[]> parsedData = new List<string[]>();
    try
    {
        using (StreamReader readFile = new StreamReader(path))
        {
            string line;
            string[] row;
            // Split only on commas followed by an even number of quotes, i.e. commas outside quoted fields
            string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";
            Regex r = new Regex(pattern);
            while ((line = readFile.ReadLine()) != null)
            {
                row = r.Split(line);
                parsedData.Add(row);
            }
        }
    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message);
        CommitSuicide(); // the poster's own helper (not shown); presumably shuts the application down
    }
    return parsedData;
}
This worked well in the past, but the company that sends us the CSV has added a new field that sometimes contains newlines (\r\n). The function no longer works for us because readFile.ReadLine() splits at every newline, including the ones inside quoted fields.
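To see exactly how the original loop breaks, here is a minimal sketch that reproduces it against an in-memory string. It uses StringReader instead of StreamReader (an assumption made so it runs without a file; both inherit ReadLine from TextReader), and the names ReadLineDemo and ParseLineByLine are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class ReadLineDemo
{
    // Same pattern as parseCSV: split on a comma only when an even number of
    // quotes follows it in the same string (i.e. the comma is outside quotes).
    static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Mirrors the original read loop: one ReadLine() call per record.
    public static List<string[]> ParseLineByLine(string csv)
    {
        var rows = new List<string[]>();
        using (var reader = new StringReader(csv))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                rows.Add(CommaOutsideQuotes.Split(line));
            }
        }
        return rows;
    }
}
```

For the single logical record "1,\"first line\r\nsecond line\",x\r\n", ParseLineByLine returns two rows instead of one: ["1,\"first line"] and ["second line\"", "x"]. The first comma is not even treated as a separator, because on that physical line only one quote follows it, so the regex sees it as "inside quotes".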
What is the best way to modify the existing function so that it only splits at newlines that aren't enclosed in double quotes? I suppose I could write a StreamReader extension method called ReadEntry that basically recreates what ReadLine already does, but that sounds rather tedious and beyond my skill level, to be honest.
Base 2
Fcnncu"Nqxgu"Lguug##
-
Jan 23rd, 2013, 12:26 PM
#2
Hyperactive Member
Re: StreamReader readline() split only at newlines that are not encapsulated in quotes
The first thing that comes to mind is a custom parser that reads the file character by character, with a flag denoting whether you are inside a quoted field (a "tag") or not. If the next two characters are \r\n and the flag is set, append them to the current field; otherwise, start a new record.
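The character-by-character idea can be sketched roughly as follows. This is a minimal sketch, not code from the thread: the class name CharCsvParser is hypothetical, it reads from any TextReader (so a StreamReader opened on the file works), and it assumes RFC 4180-style quoting, where a doubled quote ("") inside a quoted field stands for one literal quote.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CharCsvParser
{
    // The inQuotes flag decides whether a comma or line break ends a
    // field/record or is simply part of the current field.
    public static List<List<string>> Parse(TextReader reader)
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool anyData = false; // true once the current record has any content
        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            anyData = true;
            if (inQuotes)
            {
                if (ch == '"')
                {
                    if (reader.Peek() == '"')
                    {
                        field.Append('"'); // "" is an escaped quote
                        reader.Read();
                    }
                    else
                    {
                        inQuotes = false;  // closing quote
                    }
                }
                else
                {
                    field.Append(ch);      // commas and \r\n are kept inside quotes
                }
            }
            else if (ch == '"')
            {
                inQuotes = true;
            }
            else if (ch == ',')
            {
                record.Add(field.ToString());
                field.Clear();
            }
            else if (ch == '\r' || ch == '\n')
            {
                if (ch == '\r' && reader.Peek() == '\n')
                {
                    reader.Read();         // treat \r\n as a single line break
                }
                record.Add(field.ToString());
                field.Clear();
                records.Add(record);
                record = new List<string>();
                anyData = false;
            }
            else
            {
                field.Append(ch);
            }
        }
        if (anyData)
        {
            // The last record had no trailing line break.
            record.Add(field.ToString());
            records.Add(record);
        }
        return records;
    }
}
```

Something like CharCsvParser.Parse(new StreamReader(path)) would return one List<string> per record, with embedded \r\n kept inside the field and the surrounding quotes removed (the regex split in parseCSV leaves them in place).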
Idea #2: tell them to fix their output so you don't have to change your code.
Last edited by wakawaka; Jan 23rd, 2013 at 12:29 PM.
-
Jan 23rd, 2013, 02:30 PM
#3
Thread Starter
Hyperactive Member
Re: StreamReader readline() split only at newlines that are not encapsulated in quotes
What I ended up doing was creating a StreamReader extension method that keeps reading lines until the accumulated text contains an even number of double quotes.
Here is my extensions class:
Code:
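using System.IO;

public static class Extensions
{
    /// <summary>
    /// Reads one CSV record, which may span several physical lines when a
    /// quoted field contains line breaks. Returns null at the end of the stream.
    /// </summary>
    public static string ReadEntry(this StreamReader sr)
    {
        // Get the first line; null means end of stream (an empty line "" is still a valid entry)
        string strReturn = sr.ReadLine();
        if (strReturn == null)
        {
            return null;
        }
        // An odd number of quotes means a quoted field is still open,
        // so keep appending lines until the count is even again
        while (strReturn.GetNumberOf("\"").IsOdd())
        {
            string strNow = sr.ReadLine();
            if (strNow == null)
            {
                break; // unterminated quote at end of file: stop instead of looping forever
            }
            // ReadLine strips the line break, so put it back inside the field
            // (assumes the embedded breaks are \r\n)
            strReturn += "\r\n" + strNow;
        }
        return strReturn;
    }

    public static int GetNumberOf(this string s, string strSearchString)
    {
        // Divide by the search length so multi-character search strings are counted correctly
        return (s.Length - s.Replace(strSearchString, "").Length) / strSearchString.Length;
    }

    public static bool IsOdd(this int i)
    {
        return i % 2 != 0;
    }
}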
Then I simply changed the read loop in my original function to:
Code:
while ((line = readFile.ReadEntry()) != null)