-
Greedy,Reluctant,Possessive Quantifers?
Anyone know what the similarities or differences are between these quantifers? Under the docs the descriptions are all the same. :confused:
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times
Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
Also how are these quantifers used? I made a simple test program using the greedy,reluctant,possessive quantifiers(X, once or not at all) quantifers
but i don't understand the output.
Code:
import java.util.*;
public class X{
public static void main(String[] args){
String data = new String("mxpxr");
Scanner s = new Scanner(data);
// s.useDelimiter("x"); // returns mpr
// s.useDelimiter("x?"); // returns mpr{greedy quantifer}
// s.useDelimiter("x??"); // returns mxpxr{reluctant quantifer}
// s.useDelimiter("x?+"); // returns mpr{possessive quantifer}
while(s.hasNext()){
System.out.println(s.next());
}
}
}
-
Re: Greedy,Reluctant,Possessive Quantifers?
I've never heard of possessive and reluctant qualifiers. But the little documentation we have is enough that I can see what they mean.
example string: "abababab"
The difference between greedy, possessive and reluctant (reluctant is usually called non-greedy) qualifiers is in the matching strategy.
Greedy is the default: the quantifier will try to match as much as possible, so long as the overall pattern still matches.
"(ab)*(ab)+"
In this case, the first part is greedy. The second part only needs a single "ab" to succeed, so the first can greedily grab "ababab".
Reluctant matches as little as possible to make the pattern succeed.
"(ab)*?(ab)+"
Since the second part can match the whole string and the first needs match nothing, the reluctant first part will indeed match the empty string "". The second part gets to take everything else, i.e. "abababab".
Possessive matches take as much as they can, even if that prevents a complete match of the pattern.
"(ab)*+(ab)+"
In this case, the first part grabs the whole string, "abababab", leaving nothing for the second part. Since the second part needs at least "ab", the overall pattern fails.
Non-greedy patterns are usually used in delimiter situations. Take, for example, this HTML snippet:
Code:
<DIV>Hello<P>Blabla</P></DIV>
Now you want to do a search & replace to make all tag names lowercase. You devise this regexp:
"<(/?)(.*)>"
to be replaced with an epression like this:
"<"+match1+match2.toLowerCase()">"
When you match this against the above a single time, you'll find that the result is
Code:
<div>hello<p>blabla</p></div>
This is not intended. The problem is that .* matches everything - really everything. In this case, it matched "DIV>Hello<P>Blabla</P></DIV".
Making the group non-greedy solves this problem. Since the pattern CAN match only "DIV", it will.
-
Re: Greedy,Reluctant,Possessive Quantifers?
Ok im starting to understand what the quantifers mean. I wish sun would
have a little better documentation on them. :rolleyes: CornedBee could you
go over this pattern expression when you get a chance? "<(/?)(.*)>".
(/?) Is a Greedy quantifer which matches once or not at all. Since
<DIV>Hello<P>Blabla</P></DIV> has at least one / we should be fine.
Now what is the use of the . in (.*)? It seems like it means match anything
after the / but i dont see it in the docs. Also shouldn't we end up with
<DIV>Hello<P>Blabla</p></div> since the pattern is looking for forward slashes?
Thanks for the help. :)
-
Re: Greedy,Reluctant,Possessive Quantifers?
/? means zero or one slashes, so it finds both <something> and </something>.
The . in .* is the match-all. . matches any character (except a newline unless a special flag is set).
* is the any-amount.
So .* means "match any amount of any characters".
-
Re: Greedy,Reluctant,Possessive Quantifers?
If /? means zero or one slashes then why does /? match both instances of the / in <HTML><P>Hello</P></HTML>. I thought /* would be used instead. :confused:
-
Re: Greedy,Reluctant,Possessive Quantifers?
Stuff like this is throwing me off. If a greedy quantifer is supposed to grab as much as possible then why is the output ab?
Code:
import java.util.*;
import java.util.regex.*;
public class X{
public static void main(String[] args){
Scanner s = new Scanner("ababab");
Pattern p = Pattern.compile("(ab)*");// shouldn't it be ababab?
s.findInLine(p);
MatchResult result = s.match();
for (int i=1; i<=result.groupCount(); i++)
System.out.println(result.group(i));
s.close();
}
}
-
Re: Greedy,Reluctant,Possessive Quantifers?
Quote:
Originally Posted by Dilenger4
If /? means zero or one slashes then why does /? match both instances of the / in <HTML><P>Hello</P></HTML>. I thought /* would be used instead. :confused:
It doesn't. What gives you the impression?
In the faulty case I've shown, the /? matches nothing and the .* everything, including the two slashes.
As for your code, I don't know. I think the Scanner is making things more complicated:
Code:
public class X
{
public static void main(String[] args) {
Pattern p = Pattern.compile("(ab)*");// shouldn't it be ababab?
Matcher m = p.matcher("ababa");
if(m.find()) {
System.out.println("Matched: " + m.group());
}
}
}
Matched: abab
-
Re: Greedy,Reluctant,Possessive Quantifers?
Yes the Scanner is definitely making things more difficult. Ive been trying to use the Scanner with a Pattern but i still can't figure it out.
Code:
/*
runs with no output
import java.util.*;
import java.util.regex.*;
public class X{
public static void main(String[] args){
Scanner s = new Scanner("ababa");
Pattern p = Pattern.compile("(ab)*");// should be abab
while(s.hasNext(p)){
System.out.print(s.next());
}
}
}
*/
/*
throws java.lang.IllegalStateException
import java.util.*;
import java.util.regex.*;
public class X{
public static void main(String[] args){
Scanner s = new Scanner("ababa");
Pattern p = Pattern.compile("(ab)*");// should be abab
MatchResult mr = s.match(); // if match took a Pattern as an arg but it dosen't
System.out.print(mr.group());
}
}
*/
-
Re: Greedy,Reluctant,Possessive Quantifers?
This is the only way ive been able to get a Scanner and Pattern to work together so far.
Code:
import java.util.*;
import java.util.regex.*;
public class X{
public static void main(String[] args){
Scanner s = new Scanner("ababa");
Pattern p = Pattern.compile("(ab)*");// should be abab
String result = s.findInLine(p);
System.out.println(result);
}
}
-
Re: Greedy,Reluctant,Possessive Quantifers?
I just found this:
http://www.weitz.de/regex-coach/
Looks pretty cool.
-
Re: Greedy,Reluctant,Possessive Quantifers?
That does look pretty cool. Im gonna dl it when i get a chance. Perhaps tommorow. The semester is almost over and i have a terrorism and civil liberties paper due tommorow so i doubt ill get to play with it today. :cry:
-
Re: Greedy,Reluctant,Possessive Quantifers?
Ive been working with the Regex Coach but im pretty much lost. :sick: Take the following target string <DIV>Hello<P>Blabla</P></DIV>. I tried to match every instance of "<". Using a reg ex such as (<)* only matches <DIV>Hello<P>Blabla</P></DIV>. Shouldn't the greedy quantifer try to match as much as possible which would be every instance of < in the html string?
-
Re: Greedy,Reluctant,Possessive Quantifers?
No, the quantifier expects the characters to be next to each other in order to match them.
To match disjoint similar patterns, you need to make your pattern a global one. This won't result in a single continous match, though, it will result in several disjoint matches, and it depends on the matcher how you retrieve them.
Example (using JavaScript RegEx syntax):
Code:
var base = "<DIV>Hello<P>Blabla</P></DIV>";
var re = /</; // This pattern is not global.
var input = base;
input = input.replace(re, "|"); // Thus it matches once here.
alert(input); // Gives "|DIV>Hello<P>Blabla</P></DIV>".
re = /</g; // This pattern is global.
input = base;
input = input.replace(re, "|"); // Thus it matches multiple times here.
alert(input); // Gives "|DIV>Hello|P>Blabla|/P>|/DIV>".
-
Re: Greedy,Reluctant,Possessive Quantifers?
I understand the semantics but the syntax confusing. :ehh:. /</ and /</g i can't find in the docs. Plus everything uses backslashes and not forwardslashes.
Are /</ and /</g from the POSIX character classes?
-
Re: Greedy,Reluctant,Possessive Quantifers?
No, in JavaScript, the forward slashes delimit the pattern, like in Perl, and the letters after the second slash indicate options.
So /</ is the RegEx pattern "<". /</g is the pattern "<" with the global match option.
-
Re: Greedy,Reluctant,Possessive Quantifers?
JavaScript :rolleyes: I was wondering why there were no types assigned to your variables. :lol:
-
Re: Greedy,Reluctant,Possessive Quantifers?
The Regex Coach does have a global option but it only affects the replacement option. There is no effect on the match itself. Are you sure /g makes a pattern global? Regex dosen't recognize /g and there is no /g in the docs for the Pattern class. :sick:
-
Re: Greedy,Reluctant,Possessive Quantifers?
The Java docs have this to say about the g flag:
Quote:
Perl uses the g flag to request a match that resumes where the last match left off. This functionality is provided implicitly by the Matcher class: Repeated invocations of the find method will resume where the last match left off, unless the matcher is reset.
A normal find is used when you want to find one match and take some action. With the g flag, you can then call find again and it will find the next match, where you can take some action again. If the g flag is not specified, it will find the same match again.
The replace function has an implicit action, so if the g flag is there, it will replace all occurences, without it it will only replace the first.
In Java, the g flag is always there. This means that find remembers the position where it left of, and replace is split into replaceAll and replaceFirst.
-
Re: Greedy,Reluctant,Possessive Quantifers?
:thumb: Ah ok. I didn't know the g flag is always there. :sick:
Yup works like a dream. :lol:
Code:
String html = "<DIV>Hello<P>Blabla</P></DIV>";
Scanner s = new Scanner(html);
Pattern p = Pattern.compile("<");
html = html.replace(s.findInLine(p),"[");
System.out.println(html);