Grouping and Capturing
Parentheses in regular expressions serve two purposes: they group the enclosed elements, and they capture the text matched by the enclosed sub-expression. I won't go into detail about how to write regular expressions, but you can find all you need to know in the Sun docs at http://java.sun.com/reference/api/index.html. Just choose your API version and take a trip over to the java.util.regex package. You will find two main classes there: Pattern and Matcher. Pattern is the class you will most likely want to check out, since its docs contain some simple examples of how to obtain a Pattern instance and how to use it with a Matcher. You will also find an extensive list of character classes and their shorthand notation, quantifiers, and special constructs like (?:...), which we will use shortly. Ok, ok, enough, you say! With that said, let's begin!
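Before we get to groups, here is a minimal sketch of the Pattern/Matcher workflow the docs describe. The class and method names (PatternBasics, wholeMatch, partialMatch) are only illustrative. It also shows why the email example later uses matches(): matches() must consume the whole input, while find() succeeds on any matching substring.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternBasics {
    // True only if the WHOLE input matches the regex (Matcher.matches()).
    static boolean wholeMatch(String regex, String input) {
        Pattern p = Pattern.compile(regex); // compile once, reuse as often as needed
        Matcher m = p.matcher(input);       // a Matcher binds the pattern to one input
        return m.matches();
    }

    // True if the regex matches ANY part of the input (Matcher.find()).
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("\\d+", "2024"));        // true
        System.out.println(wholeMatch("\\d+", "year 2024"));   // false
        System.out.println(partialMatch("\\d+", "year 2024")); // true
        // Convenience method for one-off checks: compiles and matches in one call.
        System.out.println(Pattern.matches("\\d+", "2024"));   // true
    }
}
```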
Capturing needs nothing special within a regular expression: simply enclose the parts you wish to capture in parentheses.
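A small sketch of that point, using a made-up "10-20" input: plain parentheses are all it takes to get numbered captures back. The helper name captures is hypothetical; it returns groups 1..groupCount() of a whole-input match, or null when there is no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturingBasics {
    // Returns the text of groups 1..n, or null if the whole input does not match.
    static String[] captures(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()]; // groupCount() does not count group 0
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two pairs of plain parentheses, so two capturing groups.
        String[] parts = captures("(\\d+)-(\\d+)", "10-20");
        System.out.println(parts[0]); // 10
        System.out.println(parts[1]); // 20
    }
}
```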
Let's use the following regular expression: (\\w+?\\@{1}\\w+?)(\\.{1})(com|net|org|edu). Capturing groups are numbered by counting their opening parentheses from left to right, and that number is how you refer to the text each group matched (its backreference). So the expression above has three groups: (\\w+?\\@{1}\\w+?), (\\.{1}) and (com|net|org|edu). Fine, you ask, but how do we access the text that was matched? java.util.regex.Matcher provides a number of result query methods for this. group() returns the text matched by the previous match operation as a whole, while group(int i) returns the text matched by the i-th parenthesized group. We will use the following code for test purposes.
Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupingandCapturing {
    public static void main(String[] args) {
        String email = "whatever@who.com";
        // Three capturing groups: (\w+?\@{1}\w+?), (\.{1}) and (com|net|org|edu)
        Pattern p = Pattern.compile("(\\w+?\\@{1}\\w+?)(\\.{1})(com|net|org|edu)");
        Matcher m = p.matcher(email);
        boolean isvalid = m.matches(); // a match must succeed before group() may be called
        if (isvalid) {
            System.out.println("Email address is valid");
            String s0 = m.group(0); // whatever@who.com; group 0 is always the whole match
            String s1 = m.group(1); // whatever@who
            String s2 = m.group(2); // .
            String s3 = m.group(3); // com
            // String s4 = m.group(4); // no group 4: throws IndexOutOfBoundsException
            System.out.println(s0);
            System.out.println(s1);
            System.out.println(s2);
            System.out.println(s3);
        } else {
            // Calling group() here would throw IllegalStateException, since nothing matched
            System.out.println("Email address is not valid");
        }
    }
}
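The left-to-right numbering rule counts opening parentheses, which matters most once groups nest. The sketch below is a variation on the email pattern, not the one used above: it wraps the user and host parts in an outer group. It also drops the backslash before @ (which is not a metacharacter) and the redundant {1} quantifiers; neither change affects what is matched. The class and helper names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedGroups {
    // Returns groups 0..groupCount() of a whole-input match, or null if there is no match.
    static String[] allGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null; // calling group() here would throw IllegalStateException
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Numbered by opening parenthesis, left to right:
        //   1 = ((\w+)@(\w+))   2 = first (\w+)   3 = second (\w+)   4 = (com|net|org|edu)
        String regex = "((\\w+)@(\\w+))\\.(com|net|org|edu)";
        String[] g = allGroups(regex, "whatever@who.com");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
    }
}
```

Running it should print groups 0 through 4 as whatever@who.com, whatever@who, whatever, who and com: the outer group gets number 1 because its opening parenthesis comes first.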
Now what about the ?: that was mentioned? Writing (?:...) groups the enclosed elements without capturing them. Matching can be somewhat faster with (?:...), since the engine does not have to record what each group matched. If you run the previous code with a regular expression that uses only non-capturing groups, such as (?:\\w+?\\@{1}\\w+?)(?:\\.{1})(?:com|net|org|edu), you will notice that group() and group(0) are the only group calls that do not throw an exception. group(1) and higher throw IndexOutOfBoundsException, because the pattern has no capturing groups.
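Here is a sketch of that experiment, plus one more case: mixing the two kinds of group. (?:...) groups are skipped when capturing groups are numbered, so in the mixed pattern below the host becomes group 1. As in the earlier variation, the backslash before @ and the {1} quantifiers are dropped because they do not change what is matched. The class and helper names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturing {
    // Number of capturing groups in a pattern; (?:...) groups are not counted.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Every group is (?:...), so nothing is captured.
        String allNonCapturing = "(?:\\w+?@\\w+?)(?:\\.)(?:com|net|org|edu)";
        Matcher m = Pattern.compile(allNonCapturing).matcher("whatever@who.com");
        if (m.matches()) {
            System.out.println(m.groupCount()); // 0
            System.out.println(m.group(0));     // whatever@who.com
            try {
                m.group(1);
            } catch (IndexOutOfBoundsException e) {
                System.out.println("no group 1");
            }
        }
        // Mixed: only the two plain parentheses are numbered.
        String mixed = "(?:\\w+)@(\\w+)\\.(com|net|org|edu)";
        Matcher m2 = Pattern.compile(mixed).matcher("whatever@who.com");
        if (m2.matches()) {
            System.out.println(m2.group(1)); // who
            System.out.println(m2.group(2)); // com
        }
    }
}
```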