-
Aug 9th, 2011, 04:31 AM
#1
RegEx - Split on comma character, excluding those in quotationmarks.
Hey there.
Given an input string like the following:
I, Need, Some, Coffee, Before, I, "Fall, Asleep"
I need to split this into parts like so:
I
Need
Some
Coffee
Before
I
Fall, Asleep
Splitting on the comma character alone is easy enough, but how can I handle the quotation marks? Regular expressions are not my strong suit, and I have been googling for quite a while without any good results.
Thanks.
-
Aug 9th, 2011, 05:26 AM
#2
Re: RegEx - Split on comma character, excluding those in quotationmarks.
You could try this regular expression pattern..
Code:
' Split on commas; the capture group keeps each quoted section as a single element.
Dim regex As New System.Text.RegularExpressions.Regex("(""[^""]*"")|,")
Dim substrings() As String = regex.Split("I, Need, Some, Coffee, Before, I, ""Fall, Asleep""")
For Each match As String In substrings
    Console.WriteLine("'{0}'", match)
Next
Please mark your thread resolved using the Thread Tools as shown
-
Aug 9th, 2011, 05:56 AM
#3
Re: RegEx - Split on comma character, excluding those in quotationmarks.
Thanks dana, this is great! Just one thing...is it possible to not include the quotation marks in the result?
-
Aug 9th, 2011, 06:15 AM
#4
Re: RegEx - Split on comma character, excluding those in quotationmarks.
I found this (it's C# though!), but it splits on whitespace, not comma characters. I'm not sure what to modify to make it split on comma characters instead.
Code:
string input = "I want my coffee without \"milk and sugar\"";
Regex regex = new Regex(@"((""((?<token>.*?)(?<!\\)"")|(?<token>[\w]+))(\s)*)", RegexOptions.None);
List<string> result = (from Match m in regex.Matches(input)
                       where m.Groups["token"].Success
                       select m.Groups["token"].Value).ToList();
foreach (string s in result)
    Console.WriteLine(s);
Console.ReadLine();
EDIT: Disregard this. It doesn't actually split on whitespace; it finds sequences of word characters, so it effectively splits on anything that isn't a word character, which is not what I want.
Last edited by Atheist; Aug 9th, 2011 at 06:21 AM.
-
Aug 9th, 2011, 07:21 AM
#5
Re: RegEx - Split on comma character, excluding those in quotationmarks.
 Originally Posted by Atheist
Thanks dana, this is great! Just one thing...is it possible to not include the quotation marks in the result?
you could do a replace on the results...
-tg
-
Aug 9th, 2011, 08:00 AM
#6
Re: RegEx - Split on comma character, excluding those in quotationmarks.
I am unable to find the pattern for that one..
-
Aug 9th, 2011, 08:52 AM
#7
Re: RegEx - Split on comma character, excluding those in quotationmarks.
what? for the replace? Oh, I was thinking a little more low-tech ... string.replace ...
-tg
-
Aug 9th, 2011, 09:35 AM
#8
Re: RegEx - Split on comma character, excluding those in quotationmarks.
I know the thread asks for RegEx to handle this, but any reason you're not using a TextFieldParser?
Code:
Dim input = "I, Need, Some, Coffee, Before, I, ""Fall, Asleep"""
Dim results As String()
' UTF-8 keeps non-ASCII characters intact (ASCIIEncoding would replace them with "?").
Using s As New IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(input)),
      tfp As New FileIO.TextFieldParser(s)
    tfp.Delimiters = {","}
    tfp.HasFieldsEnclosedInQuotes = True
    results = tfp.ReadFields()
End Using
Last edited by MattP; Aug 9th, 2011 at 09:36 AM.
Reason: Can't spell thread apparently
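For anyone who wants to try the TextFieldParser route on a single string, here is a minimal sketch. The module name and the ParseLine helper are made up for illustration; it reads from a StringReader, so no byte encoding step is needed. Empty fields in the input should come back as empty strings, but that is worth checking against your own data before relying on it.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module TextFieldParserSketch
    ' Hypothetical helper (not from the thread): parses one delimited line.
    Function ParseLine(line As String) As String()
        Using reader As New StringReader(line), parser As New TextFieldParser(reader)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True
            ' ReadFields returns Nothing when there is no data left to read.
            Return parser.ReadFields()
        End Using
    End Function

    Sub Main()
        Dim fields As String() = ParseLine("I, Need, Some, Coffee, Before, I, ""Fall, Asleep""")
        If fields IsNot Nothing Then
            For Each field As String In fields
                Console.WriteLine("'{0}'", field)
            Next
        End If
    End Sub
End Module
```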
-
Aug 9th, 2011, 02:18 PM
#9
Re: RegEx - Split on comma character, excluding those in quotationmarks.
You simply adjust the regular expression pattern to not capture the double quotes:
Code:
Dim text = "I, Need, Coffee, Before, I, ""Fall, Asleep"" Otherwise, ""I'll, Die"""
Dim regex = New Regex("""([^""]*)""|,")
Dim subStrings() = regex.Split(text)
For Each match In subStrings
    Dim current = match.Trim()
    If Not current = String.Empty Then
        Console.WriteLine("{0}", current)
    End If
Next
Output:
Code:
I
Need
Coffee
Before
I
Fall, Asleep
Otherwise
I'll, Die
-
Aug 10th, 2011, 04:14 AM
#10
Re: RegEx - Split on comma character, excluding those in quotationmarks.
tg:
Yeah, a string.replace would work, but I figure while I'm at it I might as well see if it's possible to get rid of the "s directly with the regex.
MattP:
I've never seen the TextFieldParser, it looks quite handy! I would try it, but is there any reason that it is in the Microsoft.VisualBasic namespace? (This regular expression would be used in a C# application, despite this thread being in the vb.net forum).
ForumAccount:
That looks great, there's just one problem.. when splitting a string containing "quotationmarked values", the returned string array will have empty elements before and after the value.
input: A,"B,C",D
Will thus give:
0: A
1:
2: B,C
3:
4: D
It is not an option for me to remove empty entries, because the text I am parsing might contain fields that are supposed to be empty. Is there any way to avoid the empty elements that are "created"?
-
Aug 10th, 2011, 04:52 AM
#11
Re: RegEx - Split on comma character, excluding those in quotationmarks.
How about this one
Code:
Dim text = "I, Need, Coffee, Before, I, ""Fall, Asleep"" Otherwise, ""I'll, Die"""
Dim regex = New Regex("""([^""]*)""|,")
Dim subStrings = regex.Split(text).Where(Function(str) str.Trim.Length > 0)
For Each match In subStrings
    Dim current = match.Trim()
    If Not current = String.Empty Then
        Console.WriteLine("{0}", current)
    End If
Next
-
Aug 10th, 2011, 06:10 AM
#12
Re: RegEx - Split on comma character, excluding those in quotationmarks.
Thanks dana, but I can't simply drop the empty strings from the array after the split. The data I am splitting may contain genuinely empty fields here and there, and I cannot filter those out.
-
Aug 10th, 2011, 09:46 AM
#13
Re: RegEx - Split on comma character, excluding those in quotationmarks.
 Originally Posted by Atheist
ForumAccount:
That looks great, there's just one problem.. when splitting a string containing "quotationmarked values", the returned string array will have empty elements before and after the value.
input: A,"B,C",D
Will thus give:
0: A
1:
2: B,C
3:
4: D
It is not an option for me to remove empty entries, because the text I am parsing might contain fields that are supposed to be empty. Is there any way to avoid the empty elements that are "created"?
No, it's not possible, at least using the Split method. The MSDN says this:
 Originally Posted by MSDN
If multiple matches are adjacent to one another, an empty string is inserted into the array. For example, splitting a string on a single hyphen causes the returned array to include an empty string in the position where two adjacent hyphens are found, as the following code shows.
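In your case, the adjacent matches are a comma and the quote-enclosed value right next to it, as your example shows: in A,"B,C",D the first comma match ends exactly where the "B,C" match begins, so Split inserts an empty string between them (and again between the "B,C" match and the second comma).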
I did end up revising the pattern to exclude the spaces after the comma though (no need for .Trim()):
Code:
Dim text = "I, Need, Coffee, Before, I, ""Fall, Asleep"", Otherwise, ""I'll, Die"""
Dim regex = New Regex(",\s*|""([^""]*)""")
Dim subStrings() = regex.Split(text)
For Each match In subStrings
    Console.WriteLine("{0}", match)
Next
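To see where the empty entries come from, here is a small sketch (the module name is made up) that runs the earlier capture-group pattern over the A,"B,C",D input and prints every element with its index. The comma matches sit directly next to the quoted match, so Split emits an empty string between each adjacent pair.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitEmptyEntriesSketch
    Sub Main()
        ' Input is A,"B,C",D and the pattern is "([^"]*)"|,
        Dim parts As String() = Regex.Split("A,""B,C"",D", """([^""]*)""|,")
        For i As Integer = 0 To parts.Length - 1
            Console.WriteLine("{0}: '{1}'", i, parts(i))
        Next
        ' Prints:
        ' 0: 'A'
        ' 1: ''
        ' 2: 'B,C'
        ' 3: ''
        ' 4: 'D'
    End Sub
End Module
```

Elements 1 and 3 are the gaps between adjacent matches; element 2 is the captured group, which Split adds to the result on its own.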
-
Aug 12th, 2011, 02:22 AM
#14
Re: RegEx - Split on comma character, excluding those in quotationmarks.
Would it not be possible to use regex.Match instead of regex.Split? I really must avoid having arbitrary empty elements in my returned split. If there's no way to do this in regex... I will have to parse my text in some other way..
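If regex turns out to be a dead end, "some other way" does not have to be complicated. Below is a minimal sketch of a hand-written splitter that walks the string once and only treats a comma as a separator when it is outside double quotes. SplitFields and the module name are hypothetical, not from any library. It assumes well-formed input, does not handle escaped quotes (""), and trims whitespace around every field, including inside quoted ones.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module QuoteAwareSplitSketch
    ' Hypothetical helper: splits on commas that are outside double quotes.
    ' Empty fields are kept; the quote characters themselves are dropped.
    Function SplitFields(line As String) As List(Of String)
        Dim fields As New List(Of String)()
        Dim current As New StringBuilder()
        Dim inQuotes As Boolean = False

        For Each ch As Char In line
            If ch = """"c Then
                ' A quote toggles the state and is not copied to the field.
                inQuotes = Not inQuotes
            ElseIf ch = ","c AndAlso Not inQuotes Then
                ' Separator outside quotes: finish the current field.
                fields.Add(current.ToString().Trim())
                current.Clear()
            Else
                current.Append(ch)
            End If
        Next

        ' The last field has no trailing comma, so add it here.
        fields.Add(current.ToString().Trim())
        Return fields
    End Function

    Sub Main()
        ' Input is: A, "B,C", , D
        For Each field As String In SplitFields("A, ""B,C"", , D")
            Console.WriteLine("'{0}'", field)
        Next
        ' Prints 'A', 'B,C', '' and 'D' on separate lines.
    End Sub
End Module
```

Because nothing is filtered afterwards, a field that is empty in the input stays empty in the result, and no extra empty elements are created around quoted values.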
-
Aug 12th, 2011, 10:33 AM
#15
Re: RegEx - Split on comma character, excluding those in quotationmarks.
I think I got a working pattern (not for Split):
Code:
Dim text = "A, B, C, ""D, E, F"", G, ""H, I"", J"
Dim r = New Regex("""(?<g>[^""]+)"",?(\s+)?|(?<g>[^"",]+$)|((?<g>[^"",]*),\s*)*")
Dim matches = r.Matches(text)
For Each m As Match In matches
    Dim g = m.Groups("g")
    For Each c As Capture In g.Captures
        Console.WriteLine("'{0}'", c.Value)
    Next
Next
Output:
Code:
'A'
'B'
'C'
'D, E, F'
'G'
'H, I'
'J'
The named capture group 'g' will contain all the fields, there should be no need for:
- Checking for empties (unless it is a valid empty, i.e. "A,,C")
- Trimming the result fields
I did some testing behind the scenes; if you find something that doesn't work, let me know.