Thread: RegEx - Split on comma character, excluding those in quotationmarks.

  1. #1

    Thread Starter
    Raging swede Atheist's Avatar
    Join Date
    Aug 2005
    Location
    Sweden
    Posts
    8,018

    Hey there.

    Given an input string like the following:
    I, Need, Some, Coffee, Before, I, "Fall, Asleep"

    I need to split this into parts like so:
    I
    Need
    Some
    Coffee
    Before
    I
    Fall, Asleep


    Splitting on the comma character alone is easy enough, but how can I handle the quotation marks? Regular expressions are not my strong suit, and I have been googling for quite a while without any good results.

    Thanks.
    Rate posts that helped you. I do not reply to PM's with coding questions.
    How to Get Your Questions Answered
    Current project: tunaOS
    Me on.. BitBucket, Google Code, Github (pretty empty)

  2. #2
    Learning .Net danasegarane's Avatar
    Join Date
    Aug 2004
    Location
    VBForums
    Posts
    5,853

    You could try this regular expression pattern..

    Code:
    ' Split on commas; the capture group keeps each quoted section (quotes included) in the result.
    Dim regex As New System.Text.RegularExpressions.Regex("(""[^""]*"")|,")
    Dim substrings() As String = regex.Split("I, Need, Some, Coffee, Before, I, ""Fall, Asleep""")
    For Each match As String In substrings
        Console.WriteLine("'{0}'", match)
    Next
    Please mark you thread resolved using the Thread Tools as shown

  3. #3

    Thread Starter
    Raging swede Atheist's Avatar

    Thanks dana, this is great! Just one thing...is it possible to not include the quotation marks in the result?

  4. #4

    Thread Starter
    Raging swede Atheist's Avatar

    I found this (it's C#, though!), but it splits on whitespace, not comma characters. I'm not sure what to modify to make it split on comma characters instead.

    Code:
                string input = "I want my coffee without \"milk and sugar\"";
                Regex regex = new Regex(@"((""((?<token>.*?)(?<!\\)"")|(?<token>[\w]+))(\s)*)", RegexOptions.None);
                List<string> result = (from Match m in regex.Matches(input)
                                       where m.Groups["token"].Success
                                       select m.Groups["token"].Value).ToList();
    
                foreach (string s in result)
                    Console.WriteLine(s);
    
                Console.ReadLine();
    EDIT: Disregard this. It does not actually split on whitespace; it finds sequences of word characters, so it will split on anything that isn't a word character, which is not what I want.
    Last edited by Atheist; Aug 9th, 2011 at 06:21 AM.

  5. #5
    PowerPoster techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,687

    Quote Originally Posted by Atheist
    Thanks dana, this is great! Just one thing...is it possible to not include the quotation marks in the result?
    you could do a replace on the results...

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  6. #6
    Learning .Net danasegarane's Avatar

    I am unable to find the pattern for that one..

  7. #7
    PowerPoster techgnome's Avatar

    What, for the replace? Oh, I was thinking of something a little more low-tech: String.Replace.

    -tg
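
    A minimal sketch of that low-tech route, assuming the split pattern from post #2 and a shortened input (the module name StripQuotesDemo is only for illustration). The quotes disappear, but the whitespace-only and empty entries produced by the split are still there:

    ```vb
    Imports System
    Imports System.Text.RegularExpressions

    Module StripQuotesDemo
        Sub Main()
            ' Same pattern as post #2: the capture group keeps the quoted section in the result.
            Dim parts = Regex.Split("I, ""Fall, Asleep""", "(""[^""]*"")|,")
            For Each part In parts
                ' Trim surrounding whitespace, then remove every double-quote character.
                Console.WriteLine("'{0}'", part.Trim().Replace("""", ""))
            Next
            ' Prints:
            ' 'I'
            ' ''
            ' 'Fall, Asleep'
            ' ''
        End Sub
    End Module
    ```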

  8. #8
    Frenzied Member MattP's Avatar
    Join Date
    Dec 2008
    Location
    WY
    Posts
    1,227

    I know the thread asks for RegEx to handle this, but any reason you're not using a TextFieldParser?

    Code:
            Dim input = "I, Need, Some, Coffee, Before, I, ""Fall, Asleep"""
            Dim results As String()
            ' TextFieldParser accepts any TextReader, so a StringReader avoids
            ' the ASCII byte round trip (which would mangle non-ASCII text).
            Using reader As New IO.StringReader(input),
                tfp As New FileIO.TextFieldParser(reader)
                tfp.Delimiters = {","}
                tfp.HasFieldsEnclosedInQuotes = True
                results = tfp.ReadFields()
            End Using
    Last edited by MattP; Aug 9th, 2011 at 09:36 AM. Reason: Can't spell thread apparently
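
    As a rough sketch of how this parser treats quoted and empty fields: the input string, the StringReader wrapper and the module name TextFieldParserDemo below are assumptions made for the illustration, not part of the original post. The quotes are dropped and the empty field between the two adjacent commas is kept:

    ```vb
    Imports System
    Imports System.IO
    Imports Microsoft.VisualBasic.FileIO

    Module TextFieldParserDemo
        Sub Main()
            ' One quoted field containing a comma, followed by a genuinely empty field.
            Dim input = "A,""B,C"",,D"
            Using reader As New StringReader(input), parser As New TextFieldParser(reader)
                parser.SetDelimiters(",")
                parser.HasFieldsEnclosedInQuotes = True
                Dim fields = parser.ReadFields()
                For i = 0 To fields.Length - 1
                    Console.WriteLine("{0}: '{1}'", i, fields(i))
                Next
            End Using
            ' Prints:
            ' 0: 'A'
            ' 1: 'B,C'
            ' 2: ''
            ' 3: 'D'
        End Sub
    End Module
    ```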

  9. #9
    Master Of Orion ForumAccount's Avatar
    Join Date
    Jan 2009
    Location
    Canada
    Posts
    2,802

    You simply adjust the regular expression pattern to not capture the double quotes:
    Code:
    Dim text = "I, Need, Coffee, Before, I, ""Fall, Asleep"" Otherwise, ""I'll, Die"""
    Dim regex = New Regex("""([^""]*)""|,")
    
    Dim subStrings() = regex.Split(text)
    For Each match In subStrings
        Dim current = match.Trim()
        If Not current = String.Empty Then
            Console.WriteLine("{0}", current)
        End If
    Next
    Output:
    Code:
    I
    Need
    Coffee
    Before
    I
    Fall, Asleep
    Otherwise
    I'll, Die

  10. #10

    Thread Starter
    Raging swede Atheist's Avatar

    tg:
    Yeah, a String.Replace would work, but I figure while I'm at it I might as well see whether the regex can get rid of the quotation marks directly.

    MattP:
    I've never seen the TextFieldParser; it looks quite handy! I would try it, but is there any reason that it is in the Microsoft.VisualBasic namespace? (This regular expression would be used in a C# application, despite this thread being in the VB.NET forum.)

    ForumAccount:
    That looks great, there's just one problem.. when splitting a string containing "quotationmarked values", the returned string array will have empty elements before and after the value.

    input: A,"B,C",D

    Will thus give:

    0: A
    1:
    2: B,C
    3:
    4: D

    It is not an option for me to remove empty entries, because the text I am parsing might contain fields that are supposed to be empty. Is there any way to avoid the empty elements that are "created"?

  11. #11
    Learning .Net danasegarane's Avatar

    How about this one

    Code:
    Dim text = "I, Need, Coffee, Before, I, ""Fall, Asleep"" Otherwise, ""I'll, Die"""
    Dim regex = New Regex("""([^""]*)""|,")

    Dim subStrings = regex.Split(text).Where(Function(str) str.Trim.Length > 0)
    For Each match In subStrings
        Dim current = match.Trim()
        If Not current = String.Empty Then
            Console.WriteLine("{0}", current)
        End If
    Next

  12. #12

    Thread Starter
    Raging swede Atheist's Avatar

    Thanks dana, but I can't filter out empty strings from the array after the split. The data I am splitting may contain empty fields here and there, and those must be kept.

  13. #13
    Master Of Orion ForumAccount's Avatar

    Quote Originally Posted by Atheist
    ForumAccount:
    That looks great, there's just one problem.. when splitting a string containing "quotationmarked values", the returned string array will have empty elements before and after the value.

    input: A,"B,C",D

    Will thus give:

    0: A
    1:
    2: B,C
    3:
    4: D

    It is not an option for me to remove empty entries, because the text I am parsing might contain fields that are supposed to be empty. Is there any way to avoid the empty elements that are "created"?
    No, it's not possible, at least not with the Split method. MSDN says this:
    Quote Originally Posted by MSDN
    If multiple matches are adjacent to one another, an empty string is inserted into the array. For example, splitting a string on a single hyphen causes the returned array to include an empty string in the position where two adjacent hyphens are found, as the following code shows.
    In your case, the adjacent matches are the comma separators sitting directly beside the quote-enclosed values, as your example shows.

    I did end up revising the pattern to exclude the spaces after the comma though (no need for .Trim()):
    Code:
    Dim text = "I, Need, Coffee, Before, I, ""Fall, Asleep"", Otherwise, ""I'll, Die"""
    Dim regex = New Regex(",\s*|""([^""]*)""")
    
    Dim subStrings() = regex.Split(text)
    For Each match In subStrings
        Console.WriteLine("{0}", match)
    Next
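
    To illustrate the MSDN remark quoted above, here is a small sketch using the earlier pattern from post #9 on two inputs (the module name SplitDemo and the second input are assumptions added for the comparison). A comma match directly next to a quoted match leaves an empty string in the array, and that empty string looks exactly like a field that really was empty:

    ```vb
    Imports System
    Imports System.Text.RegularExpressions

    Module SplitDemo
        Sub Main()
            Dim pattern = """([^""]*)""|,"

            ' The comma and the quoted value are adjacent matches, so Split
            ' inserts an empty string on each side of the captured "B,C".
            Dim quoted = Regex.Split("A,""B,C"",D", pattern)
            Console.WriteLine(String.Join("|", quoted))   ' A||B,C||D

            ' A genuinely empty field produces the same kind of empty entry.
            Dim genuine = Regex.Split("A,,D", pattern)
            Console.WriteLine(String.Join("|", genuine))  ' A||D
        End Sub
    End Module
    ```

    Once the array is built, nothing distinguishes the two kinds of empty entry, which is why filtering them out after the split cannot work here.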

  14. #14

    Thread Starter
    Raging swede Atheist's Avatar

    Would it not be possible to use regex.Match instead of regex.Split? I really must avoid having arbitrary empty elements in my returned split. If there's no way to do this in regex... I will have to parse my text in some other way..

  15. #15
    Master Of Orion ForumAccount's Avatar

    I think I got a working pattern (not for Split):
    Code:
    Dim text = "A, B, C, ""D, E, F"", G, ""H, I"", J"
    Dim r = New Regex("""(?<g>[^""]+)"",?(\s+)?|(?<g>[^"",]+$)|((?<g>[^"",]*),\s*)*")
    Dim matches = r.Matches(text)
    
    For Each m As Match In matches
        Dim g = m.Groups("g")
        For Each c As Capture In g.Captures
            Console.WriteLine("'{0}'", c.Value)
        Next
    Next
    Output:
    Code:
    'A'
    'B'
    'C'
    'D, E, F'
    'G'
    'H, I'
    'J'
    The named capture group 'g' will contain all the fields, so there should be no need for:
    • Checking for empties (unless it is a valid empty, i.e. "A,,C")
    • Trimming the result fields

    I did some testing behind the scenes; if you find something that doesn't work, let me know.
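
    For comparison, a different Matches-based sketch (not the pattern above) that yields exactly one match per field, so empty fields in the middle or at the end of the line survive. The pattern, the helper name SplitFields and the test inputs are assumptions for illustration; it expects well-formed input, meaning balanced quotes, no escaped quotes inside a quoted field, and nothing but whitespace between a closing quote and the next comma:

    ```vb
    Imports System
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions

    Module FieldMatchDemo
        ' Hypothetical helper: returns one entry per field, empty fields included.
        Function SplitFields(line As String) As List(Of String)
            ' A field starts at the beginning of the line or right after a comma.
            ' It is either a quoted run (quotes dropped) or a run without quotes
            ' and commas, and it must be followed by a comma or the end of the line.
            Dim pattern = "(?<=^|,)\s*(?:""(?<g>[^""]*)""|(?<g>[^"",]*?))\s*(?=,|$)"
            Dim fields As New List(Of String)
            For Each m As Match In Regex.Matches(line, pattern)
                fields.Add(m.Groups("g").Value)
            Next
            Return fields
        End Function

        Sub Main()
            For Each line In {"A, ""B,C"", D", "A,,C", "A,B,"}
                Console.WriteLine(String.Join("|", SplitFields(line)))
            Next
            ' Prints:
            ' A|B,C|D
            ' A||C
            ' A|B|
        End Sub
    End Module
    ```

    The lookbehind is used instead of consuming the comma so that an empty first field does not swallow the separator that the next field needs.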
