
Thread: Need help with regex(part II)

  1. #1

    Thread Starter
    Dazed Member
    Join Date: Oct 1999
    Location: Ridgefield Park, NJ
    Posts: 3,418

    Need help with regex(part II)

    I wrote a pattern that validates email addresses. It seems to work, but I wanted to see if anyone could come up with some variations and possibly some tips. Thanks.
    Code:
    String email = "(?:\\w+?\\@{1}\\w+?)\\.{1}(?:com|net|org|edu)";
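    A minimal Java sketch of how this pattern behaves under Matcher.matches(), which must consume the whole input. The class name and sample addresses are hypothetical. In Java, \@ means the same as a plain @, the {1} quantifiers are redundant, and with matches() a lazy +? accepts exactly the same strings as a greedy +, so the shorter pattern should give identical results:

```java
import java.util.regex.Pattern;

class OriginalPatternDemo {
    // The pattern exactly as posted above.
    static final Pattern POSTED =
        Pattern.compile("(?:\\w+?\\@{1}\\w+?)\\.{1}(?:com|net|org|edu)");

    // Same pattern without the redundant {1}, the escaped @ and the lazy +?.
    static final Pattern SIMPLIFIED =
        Pattern.compile("\\w+@\\w+\\.(?:com|net|org|edu)");

    static boolean posted(String s) { return POSTED.matcher(s).matches(); }

    static boolean simplified(String s) { return SIMPLIFIED.matcher(s).matches(); }

    public static void main(String[] args) {
        // Hypothetical sample addresses.
        String[] samples = {
            "jdoe@example.com",      // plain address
            "j.doe@example.com",     // dot in the mailbox: \w does not match '.'
            "jdoe@mail.example.com", // subdomain: \w+ cannot span the extra dot
            "jdoe@example.biz",      // TLD not in the list
            "jdoe@@example.com"      // doubled @
        };
        for (String s : samples) {
            System.out.println(s + " -> " + posted(s) + " / " + simplified(s));
        }
    }
}
```

Only the first sample is accepted; the other four fail for the reasons noted in the comments.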

  2. #2
    visualAd
    VBA Nutter
    Join Date: Apr 2002
    Location: Ickenham, UK
    Posts: 4,906

    Re: Need help with regex(part II)

    Quote Originally Posted by Dilenger4
    I wrote a pattern that validates email addresses. It seems to work, but I wanted to see if anyone could come up with some variations and possibly some tips. Thanks.
    Code:
    String email = "(?:\\w+?\\@{1}\\w+?)\\.{1}(?:com|net|org|edu)";
    An email address typically consists of two major parts:
    • Mailbox: this can contain letters, numbers, underscores, hyphens and dots, and it can also contain a + sign. So I would recommend you use . (any character) to match this part.
    • Fully qualified domain name / host name: if I were you, I wouldn't check for valid TLDs. What about .org.uk, .sch.uk, .de, .ac.uk and .mil?

      You can, however, check that it contains only letters, numbers and hyphens. Host names cannot contain underscores, though, so you can't use \w (which also matches _).

    Code:
    PCRE:
    
    /^.+@((?i)[a-z0-9\-]+(\.(?i)[a-z0-9\-]+)?)+$/
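    Both points above can be checked in Java. A minimal sketch, assuming you want a fully qualified host (at least one dot) and no TLD whitelist; the class name and host names are hypothetical:

```java
import java.util.regex.Pattern;

class HostLabelDemo {
    // Host name labels: letters, digits and hyphens only (no underscore).
    // Assumption: at least one dot is required, i.e. a fully qualified name.
    static final Pattern HOST =
        Pattern.compile("[A-Za-z0-9\\-]+(?:\\.[A-Za-z0-9\\-]+)+");

    static boolean isHost(String s) { return HOST.matcher(s).matches(); }

    // In Java, \w is [a-zA-Z_0-9], so it lets an underscore through.
    static boolean wordCharMatches(String s) { return Pattern.matches("\\w", s); }

    public static void main(String[] args) {
        System.out.println(wordCharMatches("_"));     // true: \w includes '_'
        System.out.println(isHost("example.org.uk")); // true: no TLD list needed
        System.out.println(isHost("school.sch.uk"));  // true
        System.out.println(isHost("my_host.com"));    // false: '_' is not allowed
    }
}
```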
    PHP || MySql || Apache || Get Firefox || OpenOffice.org || Click || Slap ILMV || 1337 c0d || GotoMyPc For FREE! Part 1, Part 2

    | PHP Session --> Database Handler * Custom Error Handler * Installing PHP * HTML Form Handler * PHP 5 OOP * Using XML * Ajax * Xslt | VB6 Winsock - HTTP POST / GET * Winsock - HTTP File Upload

    Latest quote: crptcblade - VB6 executables can't be decompiled, only disassembled. And the disassembled code is even less useful than I am.

    Random VisualAd: Blog - Latest Post: When the Internet becomes Electricity!!


    Spread happiness and joy. Rate good posts.

  3. #3

    Thread Starter
    Dazed Member
    Join Date: Oct 1999
    Location: Ridgefield Park, NJ
    Posts: 3,418

    Re: Need help with regex(part II)

    Thanks for replying, visualAd. Yeah, I didn't take into account that an email address might contain hyphens, underscores and dots. I've never seen the + sign used in an address, though. I've just been using the code below to test the patterns I create. I found the following expression, which might be better suited: "(\\w[\\-.\\w]*.*@\\w+\\.(?:com|net|org))". I didn't write it, so I'm a bit shaky on how it works. I guess it tests for a word character \\w (I don't know why they didn't specify a quantifier), then [\\-.\\w]* (I guess that's read as any run of "-", "." or word characters), then more characters with .*; I don't know why the @ isn't escaped; and then we get the rest.
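    A sketch of how the expression quoted above actually behaves in Java. The class name and inputs are hypothetical. The @ doesn't need escaping because it has no special meaning in a regex; Java also accepts \@, since a backslash before a non-alphabetic character simply makes it literal. The real catch is the .* after [\-.\w]*: it matches any run of characters, including spaces and a second @, so almost anything can come before the @.

```java
import java.util.regex.Pattern;

class FoundPatternDemo {
    // The expression quoted above: a word char, then word chars, '-' or '.',
    // then .* (ANY characters), then @, a word, a dot and one of three TLDs.
    static final Pattern FOUND =
        Pattern.compile("(\\w[\\-.\\w]*.*@\\w+\\.(?:com|net|org))");

    static boolean found(String s) { return FOUND.matcher(s).matches(); }

    // True when '@' and '\@' give the same answer for the input.
    static boolean sameAt(String s) {
        return Pattern.matches("a@b", s) == Pattern.matches("a\\@b", s);
    }

    public static void main(String[] args) {
        System.out.println(found("first.last@example.com")); // true
        System.out.println(found("a b!c@example.com"));      // true: .* swallows " b!c"
        System.out.println(found("a@@example.com"));         // true: .* swallows the first '@'
        System.out.println(found("jdoe@mail.example.com"));  // false: \w+ cannot span a dot
    }
}
```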
    Code:
    import java.util.regex.Pattern;

    // Test harness; save as MailVal.java.
    public class MailVal {
        public static void main(String[] args) {
            // matches() succeeds only if the whole string fits the pattern.
            Pattern emp = Pattern.compile("(?:\\w+?\\@{1}\\w+?)\\.{1}(?:com|net|org|edu)");

            // Test inputs, each followed by the expected result.
            String[] malto = {
                "[email protected]",       // should be true
                "whatever@@whatever.com", // false
                "whatever@@whatever.biz", // false
                "[email protected]",       // true
                "[email protected]",       // true
                "[email protected]",       // false
                "[email protected]"        // false
            };

            // Print each input with the actual result so it can be compared by eye.
            for (String address : malto) {
                System.out.println(address + ": " + emp.matcher(address).matches());
            }
        }
    }
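    A note on the harness above: matches() only succeeds when the entire string fits the pattern, which is why unanchored patterns like these still validate the whole address. find() would accept a matching substring anywhere in the input. A small sketch with a simplified version of the pattern; the class name and input are hypothetical:

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    static final Pattern EMAIL =
        Pattern.compile("\\w+@\\w+\\.(?:com|net|org|edu)");

    // matches(): the whole input must fit the pattern.
    static boolean whole(String s) { return EMAIL.matcher(s).matches(); }

    // find(): succeeds if any substring fits the pattern.
    static boolean anywhere(String s) { return EMAIL.matcher(s).find(); }

    public static void main(String[] args) {
        String s = "mail me at jdoe@example.com today";
        System.out.println(whole(s));    // false
        System.out.println(anywhere(s)); // true
    }
}
```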

  4. #4
    visualAd
    VBA Nutter
    Join Date: Apr 2002
    Location: Ickenham, UK
    Posts: 4,906

    Re: Need help with regex(part II)

    I assume the regex I gave you didn't work; in fact, I'm sure of it. Java's pattern syntax is a little different from the Perl-compatible syntax I gave you: for one thing, Java has no /.../ delimiters, and each backslash must be doubled inside a string literal.

    I still think you should not limit the top-level domains to that small subset. If you want to ensure that the email address is correct, the only real measure you can take is to send the person an email and ask them to click a confirmation link. A regex can really only catch typing mistakes made by people entering an address; it is there for the user's convenience rather than as a way for the developer to make sure a fake address hasn't been entered.
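    The confirmation step could be sketched like this, assuming a random token is stored with the pending address and mailed as part of a link. The class, method names and URL are hypothetical, and storing the token and sending the mail are left out:

```java
import java.security.SecureRandom;
import java.util.Base64;

class ConfirmTokenDemo {
    private static final SecureRandom RANDOM = new SecureRandom();

    // 32 random bytes, Base64url-encoded without padding: 43 URL-safe characters.
    static String newToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Hypothetical confirmation URL; host and path are placeholders.
    static String confirmationLink(String token) {
        return "https://example.com/confirm?token=" + token;
    }

    public static void main(String[] args) {
        String token = newToken();
        // Store the token with the pending address, mail the link,
        // and mark the address confirmed only when the link is visited.
        System.out.println(confirmationLink(token));
    }
}
```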

    The first part of the email address should really be free-form, and if I were you, the only constraint I would put on it is that there is at least one character before the @. That is why I suggest the \S class, which matches any non-whitespace character.

    Then, of course, you have the literal @ character, which is a requirement; if it is missing, the address is definitely invalid.

    The next part of the address is the host. On a local network this may not be a fully qualified domain name, but if you want to ensure it is one, you should match at least one group of letters, numbers and hyphens, then at least one dot followed by another group of letters, numbers and hyphens.

    So all said, give this one a go:
    Code:
    \\S+?@(?:[\\w\\d\\-]+?\\.)(?:[\\w\\d\\-]+?\\.?)+

    \\S+? : matches the first part of the address: at least one non-whitespace character, matched non-greedily.

    @ : matches the @ sign.

    (?:[\\w\\d\\-]+?\\.) : matches the first part of a fully qualified domain name, e.g. the "vbforums." in vbforums.com. This can be any word character \\w, any digit \\d (already covered by \\w) or a hyphen \\-, followed by a literal dot \\.

    (?:[\\w\\d\\-]+?\\.?)+ : matches the remaining parts of the domain name. As with the first part, word characters, digits and hyphens are matched, but this time with an optional dot \\.? at the end, since it may be the last part of the domain, e.g. the "com" in vbforums.com. The entire subpattern must match at least once.
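    A quick Java harness for that pattern, using matches() as the test program earlier in the thread does. The class name and sample addresses are hypothetical, apart from whatever@@whatever.com, which comes from that program. Two results may be surprising: \w admits underscores, which the earlier post said host names cannot contain, and \S also matches @, so the doubled @ that the test program expects to fail is accepted. The nested (?:[...]+?\.?)+ can also backtrack heavily on long inputs that fail to match.

```java
import java.util.regex.Pattern;

class VisualAdPatternDemo {
    // The pattern from the post above, unchanged.
    static final Pattern SUGGESTED =
        Pattern.compile("\\S+?@(?:[\\w\\d\\-]+?\\.)(?:[\\w\\d\\-]+?\\.?)+");

    static boolean check(String s) { return SUGGESTED.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(check("jdoe@example.com"));        // true
        System.out.println(check("jdoe@mail.example.co.uk")); // true: no TLD list
        System.out.println(check("jdoe@example.com."));       // true: trailing dot allowed
        System.out.println(check("jdoe@localhost"));          // false: needs at least one dot
        System.out.println(check("jdoe@my_host.com"));        // true: \w admits '_'
        System.out.println(check("whatever@@whatever.com"));  // true: \S also matches '@'
    }
}
```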

  5. #5

    Thread Starter
    Dazed Member
    Join Date: Oct 1999
    Location: Ridgefield Park, NJ
    Posts: 3,418

    Re: Need help with regex(part II)

    I didn't get to try the pattern you posted, /^.+@((?i)[a-z0-9\-]+(\.(?i)[a-z0-9\-]+)?)+$/. It doesn't seem too far off from a regular expression written in Java, though.
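    As the post says, the translation is small: drop the /.../ delimiters and double each backslash inside the Java string literal; Java understands the inline (?i) flag too. A hedged sketch with a hypothetical class name and addresses, which also shows a flattened alternative without the nested + (a rewrite, not part of the original post). Running both side by side turns up a quirk of the nested group: each repetition must start with a label character, so a host such as a.b.com, whose middle part is a single character, is rejected by the posted pattern but accepted by the flattened one.

```java
import java.util.regex.Pattern;

class PcreToJavaDemo {
    // The PCRE from post #2, minus the /.../ delimiters, backslashes doubled.
    static final Pattern AS_POSTED =
        Pattern.compile("^.+@((?i)[a-z0-9\\-]+(\\.(?i)[a-z0-9\\-]+)?)+$");

    // Flattened alternative: one label, then any number of ".label" parts.
    // Avoids the nested + quantifier, which can backtrack badly on failures.
    static final Pattern FLATTENED = Pattern.compile(
        "^.+@[a-z0-9\\-]+(?:\\.[a-z0-9\\-]+)*$", Pattern.CASE_INSENSITIVE);

    static boolean asPosted(String s) { return AS_POSTED.matcher(s).matches(); }

    static boolean flattened(String s) { return FLATTENED.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] samples = {
            "jdoe@Example.COM", // case-insensitive host
            "jdoe@localhost",   // no dot required
            "jdoe@my_host.com", // underscore rejected
            "jdoe@a.b.com"      // one-character middle label
        };
        for (String s : samples) {
            System.out.println(s + " -> " + asPosted(s) + " / " + flattened(s));
        }
    }
}
```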

  6. #6
    CornedBee
    Kitten
    Join Date: Aug 2001
    Location: In a microchip!
    Posts: 11,594

    Re: Need help with regex(part II)

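    Here is a pattern that allows ALL e-mail addresses of the dot-atom form (the usual user@host shape, as defined in RFC 2822) and NO others. It does not cover the rarer valid forms with a quoted local part, such as "john doe"@example.com, or a domain literal, such as jdoe@[192.0.2.1].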
    ^[A-Za-z0-9!#-'\*\+\-\/=\?\^_`\{-~]+(\.[A-Za-z0-9!#-'\*\+\-\/=\?\^_`\{-~]+)*@[A-Za-z0-9!#-'\*\+\-\/=\?\^_`\{-~]+(\.[A-Za-z0-9!#-'\*\+\-\/=\?\^_`\{-~]+)*$

    It looks complicated, but it's really very simple. The core part is this character class:
    [A-Za-z0-9!#-'\*\+\-\/=\?\^_`\{-~]
    This is the set of all characters that are valid as ordinary parts of e-mail addresses. If we abbreviate it as [[:mail:]] (just shorthand here; it is not a real POSIX class), the whole expression becomes:
    ^[[:mail:]]+(\.[[:mail:]]+)*@[[:mail:]]+(\.[[:mail:]]+)*$
    So we have the start of the string, followed by one or more mail characters. Then come any number of groups, each consisting of a dot followed by one or more mail characters.
    Then comes the @.
    After that, the same pattern again.
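    In Java the [[:mail:]] shorthand has to be expanded again; building the pattern from one string constant keeps it readable. A sketch with a hypothetical class name and addresses. Note that it accepts jdoe@localhost, and because the same class is reused after the @, it also allows characters such as _ in the domain that host names cannot contain.

```java
import java.util.regex.Pattern;

class DotAtomDemo {
    // The character class from the post, with backslashes doubled for Java.
    // Written out: A-Z a-z 0-9 ! # $ % & ' * + - / = ? ^ _ ` { | } ~
    static final String MAIL = "[A-Za-z0-9!#-'\\*\\+\\-\\/=\\?\\^_`\\{-~]";

    // ^[[:mail:]]+(\.[[:mail:]]+)*@[[:mail:]]+(\.[[:mail:]]+)*$ with the shorthand expanded.
    static final Pattern DOT_ATOM = Pattern.compile(
        "^" + MAIL + "+(\\." + MAIL + "+)*@" + MAIL + "+(\\." + MAIL + "+)*$");

    static boolean check(String s) { return DOT_ATOM.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(check("john.o'reilly+news@example.com")); // true
        System.out.println(check("john..doe@example.com"));          // false: empty part between dots
        System.out.println(check(".john@example.com"));              // false: leading dot
        System.out.println(check("\"john doe\"@example.com"));       // false: quoted local part
        System.out.println(check("jdoe@localhost"));                 // true: no dot required after @
    }
}
```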
    All the buzzt
    CornedBee

    "Writing specifications is like writing a novel. Writing code is like writing poetry."
    - Anonymous, published by Raymond Chen

    Don't PM me with your problems, I scan most of the forums daily. If you do PM me, I will not answer your question.
