[RESOLVED] Regular Expression for A NOT B
Using regular expressions, how can we find all paragraphs that match A but NOT B?
E.g., find all paragraphs that contain the word "pattern" but do not contain the word "regular".
So in the following text, 2 & 3 should be selected while 1 should be omitted:
1. With regular expressions you can describe almost any text pattern
2. including a pattern that matches two words near each other.
3. This pattern is relatively simple, consisting of three parts.
Re: Regular Expression for A NOT B
Hi, have a look at MSDN for the right syntax to use: MSDN regular expressions
I think you could do it in 2 steps. First check whether a line matches "^((?!regular).)*$", which returns lines that do NOT contain "regular", and then for each match check whether it matches something like ".*pattern.*", and you're done!
Have a look at the link I posted to better understand these regular expressions.
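The two steps described above can be sketched as follows. This is only an illustration in Python (the thread itself targets .NET), and the names SAMPLE, NOT_REGULAR, HAS_PATTERN and two_step_filter are made up for the example. It assumes the text has already been split into lines.

```python
import re

# Sample text from the question, one paragraph per line.
SAMPLE = [
    "1. With regular expressions you can describe almost any text pattern",
    "2. including a pattern that matches two words near each other.",
    "3. This pattern is relatively simple, consisting of three parts.",
]

# Step 1: the whole line must not contain "regular" anywhere.
NOT_REGULAR = re.compile(r"^((?!regular).)*$")
# Step 2: the line must contain "pattern" somewhere.
HAS_PATTERN = re.compile(r".*pattern.*")


def two_step_filter(lines):
    """Return lines that contain "pattern" but not "regular" (two passes)."""
    without_regular = [line for line in lines if NOT_REGULAR.match(line)]
    return [line for line in without_regular if HAS_PATTERN.match(line)]


selected = two_step_filter(SAMPLE)
for line in selected:
    print(line)
```

On the sample this keeps paragraphs 2 and 3 and drops paragraph 1.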
Re: Regular Expression for A NOT B
That requires 2 passes thru the data. I was looking for something that can do this in just one pass as performance is critical to the application.
Re: Regular Expression for A NOT B
Is RegEx faster than reading and comparing strings in memory?
If not, I think this is a good way: split the text on line breaks, then for each line, if the string contains "pattern" and not "regular", write it to a new file, array, or whatever...
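For comparison, the plain string approach suggested here might look like the sketch below. It is written in Python purely for illustration; filter_lines and TEXT are hypothetical names, and the substring tests are case-sensitive, which is an assumption about what the OP wants.

```python
def filter_lines(text, wanted="pattern", unwanted="regular"):
    """Single pass over the lines using plain substring tests, no regex."""
    return [
        line
        for line in text.splitlines()
        if wanted in line and unwanted not in line
    ]


TEXT = (
    "1. With regular expressions you can describe almost any text pattern\n"
    "2. including a pattern that matches two words near each other.\n"
    "3. This pattern is relatively simple, consisting of three parts.\n"
)

kept = filter_lines(TEXT)
for line in kept:
    print(line)
```

Whether this is faster or slower than a regex depends on the engine and the data, so it is worth timing both on real input.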
Re: Regular Expression for A NOT B
Sorry, I took a look at some posts on the web, and yes, RegEx is faster in most cases...
Re: Regular Expression for A NOT B
Quote: Originally Posted by Pradeep1210
That requires 2 passes thru the data. I was looking for something that can do this in just one pass as performance is critical to the application.
Then this would be it :
^((?!regular).)*pattern((?!regular).)*$
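A rough way to try this pattern in a single pass over the whole text is shown below. The sketch uses Python rather than .NET and assumes the multiline option is switched on so that ^ and $ anchor at every line; A_NOT_B and TEXT are illustrative names only.

```python
import re

TEXT = (
    "1. With regular expressions you can describe almost any text pattern\n"
    "2. including a pattern that matches two words near each other.\n"
    "3. This pattern is relatively simple, consisting of three parts.\n"
)

# Each "((?!regular).)*" consumes characters one at a time, refusing to step
# onto a position where "regular" begins, so neither side of "pattern" can
# contain the unwanted word.
A_NOT_B = re.compile(r"^((?!regular).)*pattern((?!regular).)*$", re.MULTILINE)

matches = [m.group(0) for m in A_NOT_B.finditer(TEXT)]
for line in matches:
    print(line)
```

On the sample text this returns paragraphs 2 and 3 as whole-line matches.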
Re: Regular Expression for A NOT B
Quote: Originally Posted by stlaural
Then this would be it :
^((?!regular).)*pattern((?!regular).)*$
Very close, but not quite. The pattern should not have the beginning-of-string and end-of-string anchors, because they require matching the whole input string. The OP wants to match substrings (lines) within the input string, so the pattern should be like this:
Code:
(?<=\n)((?!regular).)*pattern((?!regular).)*
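Two edge cases may be worth checking with this form: the first line of the input has no "\n" before it, and without an end anchor a line that has "regular" after "pattern" can still produce a partial match. The Python sketch below illustrates both. STRICT is a hypothetical variant that is not from the thread, and the sketch assumes "\n" line endings.

```python
import re

# Pattern as posted: a line start is recognized by the "\n" before it.
POSTED = re.compile(r"(?<=\n)((?!regular).)*pattern((?!regular).)*")

# Hypothetical stricter variant (an assumption, not from the thread):
#   (?<![^\n])  start of input, or just after "\n"
#   (?![^\n])   end of input, or just before "\n"
STRICT = re.compile(
    r"(?<![^\n])((?!regular).)*pattern((?!regular).)*(?![^\n])"
)


def find_all(regex, text):
    """Return the full text of every match, in order."""
    return [m.group(0) for m in regex.finditer(text)]


FIRST_LINE = "a pattern here\nanother pattern\n"
TRAILING_B = "x\na pattern that is regular\n"

print(find_all(POSTED, FIRST_LINE))  # first line is skipped
print(find_all(STRICT, FIRST_LINE))  # both lines are found
print(find_all(POSTED, TRAILING_B))  # partial match before "regular"
print(find_all(STRICT, TRAILING_B))  # no match
```

Neither variant relies on the multiline option, which is the property the posted pattern was aiming for.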
Re: Regular Expression for A NOT B
Quote: Originally Posted by stanav
Very close, but not quite. The pattern should not have the beginning-of-string and end-of-string anchors, because they require matching the whole input string. The OP wants to match substrings (lines) within the input string, so the pattern should be like this:
Code:
(?<=\n)((?!regular).)*pattern((?!regular).)*
Well, it does work perfectly with the set of examples that were given. If I use them all at once as my input string, I get lines 2 & 3 as matches. But just to be sure:
^ : Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.
$ : Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.
So with the Multiline property enabled it's the same thing, right?
Re: Regular Expression for A NOT B
Quote: Originally Posted by stlaural
So with the Multiline property enabled it's the same thing, right?
Yes, that's true. But the default regex option in VS is None, so unless the OP turns Multiline on, that pattern won't work. On the other hand, if we match a newline character at the beginning of the pattern as I did, it will work regardless of the Multiline setting.
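The effect of the option can be seen in a small experiment. The sketch below uses Python's re.MULTILINE as a stand-in for the .NET Multiline option; the inline (?m) prefix, which both engines accept, is one more way to make the anchored pattern independent of how the caller configures the regex. BODY, TEXT and find_all are illustrative names.

```python
import re

TEXT = (
    "1. With regular expressions you can describe almost any text pattern\n"
    "2. including a pattern that matches two words near each other.\n"
    "3. This pattern is relatively simple, consisting of three parts.\n"
)

BODY = r"^((?!regular).)*pattern((?!regular).)*$"


def find_all(pattern, text, flags=0):
    """Return the full text of every match, in order."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


# Default options: ^ and $ only anchor at the ends of the whole input,
# and "." does not cross "\n", so a multi-line input cannot match.
no_flag = find_all(BODY, TEXT)

# Multiline option passed by the caller: ^ and $ anchor at every line.
with_flag = find_all(BODY, TEXT, re.MULTILINE)

# Same behavior with the option embedded in the pattern itself.
inline = find_all("(?m)" + BODY, TEXT)

print(no_flag)
print(with_flag)
print(inline)
```

With default options nothing matches; with the option set either way, paragraphs 2 and 3 are returned.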
Re: Regular Expression for A NOT B
Quote: Originally Posted by stanav
Yes, that's true. But the default regex option in VS is None, so unless the OP turns Multiline on, that pattern won't work. On the other hand, if we match a newline character at the beginning of the pattern as I did, it will work regardless of the Multiline setting.
Good point! Thanks for the clarification. So Pradeep1210 now has two ways to accomplish what he was trying to do. :check:
Re: Regular Expression for A NOT B
Thanks for the great help :thumb: