Thread: [RESOLVED] StreamReader readline() split only at newlines that are not encapsulated in quotes

  1. #1

    Thread Starter
    Hyperactive Member half flung pie's Avatar
    Join Date
    Jun 2005
    Location
    South Carolina, USA
    Posts
    317

    [RESOLVED] StreamReader readline() split only at newlines that are not encapsulated in quotes

    I have the following function:
    Code:
            /// <summary>
            /// Pulls info from CSV file and stores each entry as list of string arrays
            /// </summary>
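            /// <param name="path">Path to the CSV file to read.</param>
            /// <returns>One string array of fields for each line read from the file.</returns>
            public static List<string[]> parseCSV(string path)
            {
                // Accumulates one string[] of fields per line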
                List<string[]> parsedData = new List<string[]>();
    
                try
                {
                    using (StreamReader readFile = new StreamReader(path))
                    {
                        string line;
                        string[] row;
                        string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";  // Matches commas outside quotes, i.e. commas followed by an even number of quotation marks
                        Regex r = new Regex(pattern);
                        while ((line = readFile.ReadLine()) != null)
                        {
                            row = r.Split(line);
                            parsedData.Add(row);
                        }
                    }
                }
                catch (Exception e)
                {
                    MessageBox.Show(e.Message);
                    CommitSuicide();
                }
    
                return parsedData;
            }
    This worked well in the past, but the company that sends us the CSV has added a new field that sometimes contains newlines (\r\n). The function no longer works for us because (line = readFile.ReadLine()) splits at every newline, including the ones inside quoted fields.

    What is the best way to modify the existing function so that it only splits at newlines that aren't enclosed in double quotes? I suppose I could create a StreamReader extension, call it ReadEntry, and basically recreate what ReadLine already does... but that sounds rather tedious and beyond my skill level, to be honest.
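    To illustrate the problem, here is a minimal sketch. The sample record and the names ReadLineDemo and ReadPhysicalLines are made up for illustration, and it uses a StringReader on an in-memory string instead of a StreamReader on the real file, on the assumption that ReadLine() breaks lines the same way for both.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Collects whatever ReadLine() returns for the given text, one entry per call.
    public static List<string> ReadPhysicalLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

    Calling ReadLineDemo.ReadPhysicalLines("1,\"first line\r\nsecond line\",3\r\n") gives two entries for what is logically a single record: the first ends in the middle of the quoted field, and the second starts there.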

    Base 2
    Fcnncu"Nqxgu"Lguug##

  2. #2
    Hyperactive Member
    Join Date
    Jan 2010
    Posts
    259

    Re: StreamReader readline() split only at newlines that are not encapsulated in quotes

    The first thing that pops into my head is that you will have to write a custom parser that reads the file character by character, with a flag denoting whether you are inside a quoted field or not. If the next two chars are \r\n and your flag is true, put the chars into the current record; otherwise start a new record.
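    A rough sketch of that character-by-character idea might look like the following. The names QuoteAwareReader and ReadRecords are hypothetical, and it takes a TextReader (which StreamReader derives from) so it can be tried on an in-memory string. It assumes quotes inside a field are escaped by doubling them (""), so a simple toggle is enough to track whether we are inside a quoted field.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class QuoteAwareReader
{
    // Splits CSV text into records, treating line breaks inside
    // double-quoted fields as part of the record.
    public static List<string> ReadRecords(TextReader reader)
    {
        var records = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false; // true while between an opening and a closing quote
        int next;

        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;

            if (c == '"')
            {
                // Every quote toggles the flag; an escaped quote ("") toggles
                // it twice, which leaves it unchanged.
                inQuotes = !inQuotes;
                current.Append(c);
            }
            else if (!inQuotes && (c == '\r' || c == '\n'))
            {
                // Outside quotes a line break ends the record.
                // Consume the \n of a \r\n pair so it counts only once.
                if (c == '\r' && reader.Peek() == '\n')
                {
                    reader.Read();
                }
                records.Add(current.ToString());
                current.Clear();
            }
            else
            {
                // Ordinary character, or a line break inside a quoted field.
                current.Append(c);
            }
        }

        // Keep a final record that has no trailing line break.
        if (current.Length > 0)
        {
            records.Add(current.ToString());
        }
        return records;
    }
}
```

    Each returned record could then be passed to the existing comma-splitting regex in place of the line returned by ReadLine().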

    Idea #2: tell them to fix their stuff so you don't have to change yours.
    Last edited by wakawaka; Jan 23rd, 2013 at 12:29 PM.

  3. #3

    Thread Starter
    Hyperactive Member half flung pie's Avatar
    Join Date
    Jun 2005
    Location
    South Carolina, USA
    Posts
    317

    Re: StreamReader readline() split only at newlines that are not encapsulated in quotes

    What I ended up doing was creating an extension method on StreamReader that keeps reading line by line until the number of double quotes in the accumulated text is even.

    Here is my extensions class:
    Code:
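        using System;
        using System.IO;

        public static class Extensions
        {
            // Reads one CSV entry, which may span several physical lines when a
            // quoted field contains newlines. Returns null at the end of the stream.
            public static string ReadEntry(this StreamReader sr)
            {
                // Get the first line; null means end of stream.
                // An empty line comes back as "" so it does not end the caller's loop early.
                string strReturn = sr.ReadLine();
                if (strReturn == null)
                {
                    return null;
                }

                // And get more lines until the number of quotes is even
                while (strReturn.GetNumberOf("\"").IsOdd())
                {
                    string strNow = sr.ReadLine();
                    if (strNow == null)
                    {
                        // End of stream inside an unterminated quoted field:
                        // stop here instead of looping forever
                        break;
                    }
                    // Put back the line break that ReadLine() stripped from inside
                    // the quoted field (the file uses \r\n)
                    strReturn += "\r\n" + strNow;
                }

                // Then return what we've gotten
                return strReturn;
            }

            public static int GetNumberOf(this string s, string strSearchString)
            {
                // Characters removed divided by the search string's length gives the
                // number of occurrences, so multi-character searches count correctly too
                return (s.Length - s.Replace(strSearchString, "").Length) / strSearchString.Length;
            }

            public static Boolean IsOdd(this int i)
            {
                return i % 2 != 0;
            }
        }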
    And then I was able to simply change my original call to:
    Code:
    while ((line = readFile.ReadEntry()) != null)

