
Thread: [RESOLVED] Finding URLs in a String using Regular Expressions

  1. #1

    Thread Starter
    Member
    Join Date
    Jun 2007
    Location
    England
    Posts
    61

    [RESOLVED] Finding URLs in a String using Regular Expressions

    Hello.

    I am trying to write an app that will read in an HTML document and extract all the URLs from it.

    I currently read the HTML document in line by line, and I need to identify whether each line contains any URLs.

    Someone suggested using regular expressions, but I am having a bit of trouble doing this.

    I am trying something like this, but it doesn't work. Any help would be great! Thanks.

    Code:
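    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String test = "bla bla bla http://somesite.com/tmp/page.html bla bla";
    // @"..." is C# verbatim-string syntax, not Java: the @, the quotes and the
    // trailing ) all become part of the pattern, and Pattern.compile throws
    // PatternSyntaxException because that final ) has no matching (
    // The space inside [...] also lets a match run past the end of the URL
    String regex = "@\"http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?\\b\")";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(test);

    if (m.find()) {
        // group(1) is only the optional "s"; group(0) is the whole match
        System.out.println(m.group(1));
    } else {
        System.out.println("Not found!");
    }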
    Last edited by DonCash; Oct 24th, 2007 at 09:58 AM.
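    A small sketch that checks the attempt above against the pattern that ends up working in the next post. The class name AttemptCheck and the helpers compiles and firstGroup are hypothetical, made up for this illustration. It shows three things: the C#-style wrapper makes the pattern fail to compile, the space in the character class lets the match swallow the rest of the line, and group(1) is only the optional "s".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class AttemptCheck {
    // Pattern from post #1, including the C#-style @"..." wrapper and trailing ")
    static final String ATTEMPT =
        "@\"http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?\\b\")";
    // Same pattern with the wrapper removed; the space inside [...] is still there
    static final String UNWRAPPED =
        "http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?\\b";
    // Pattern from post #2: no wrapper and no space in the character class
    static final String POST2 =
        "http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=]*)?\\b";

    // True if Pattern.compile accepts the regex
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns the requested group of the first match, or null if nothing matches
    static String firstGroup(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String test = "bla bla bla http://somesite.com/tmp/page.html bla bla";
        System.out.println(compiles(ATTEMPT));              // false
        System.out.println(firstGroup(UNWRAPPED, test, 0)); // runs on to the end of the line
        System.out.println(firstGroup(POST2, test, 0));     // just the URL
        System.out.println(firstGroup(POST2, test, 1));     // null: group 1 is only the optional "s"
    }
}
```

    In short, use group(0) (or plain group()) to get the whole URL, and keep the character class free of spaces.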

  2. #2

    Thread Starter
    Member
    Join Date
    Jun 2007
    Location
    England
    Posts
    61

    Re: Finding URLs in a String using Regular Expressions

    Sorted. I've worked it out. Thanks anyway.

    Code:
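    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    URL url = new URL("http://www.bla.com");
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    String strLine = "";

    // Plain Java string: no @"..." wrapper and no space in the character class
    String URLregex = "http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=]*)?\\b";

    // Compile once, outside the read loop
    Pattern p = Pattern.compile(URLregex);

    while ((strLine = in.readLine()) != null) {

        Matcher m = p.matcher(strLine);

        // group(0) is the whole match; find() runs once, so this prints
        // at most one URL per line
        if (m.find()) {
            System.out.println(m.group(0));
        }
    }
    in.close();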
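    To try the loop above without hitting the network, the reading part can take any Reader. This is a sketch under that assumption: a StringReader stands in for new InputStreamReader(url.openStream()), and the class FirstUrlPerLine and method firstUrlPerLine are hypothetical names. Like the loop above, it keeps only the first URL on each line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstUrlPerLine {
    // Pattern from post #2
    static final Pattern URL_PATTERN =
        Pattern.compile("http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=]*)?\\b");

    // Mirrors the loop above: at most one URL (the first) is taken from each line
    static List<String> firstUrlPerLine(Reader source) {
        List<String> urls = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = URL_PATTERN.matcher(line);
                if (m.find()) {
                    urls.add(m.group(0));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded page; the last line holds two links
        String page = "<a href=\"http://a.com/x.html\">x</a>\n"
                    + "no links here\n"
                    + "<a href=\"https://b.org/y\">y</a> <a href=\"http://c.net/z.html\">z</a>\n";
        // Prints the first URL of lines 1 and 3; http://c.net/z.html is missed
        System.out.println(firstUrlPerLine(new StringReader(page)));
    }
}
```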

  3. #3
    Arabic Poster ComputerJy
    Join Date
    Nov 2005
    Location
    Happily misplaced
    Posts
    2,513

    Re: [RESOLVED] Finding URLs in a String using Regular Expressions

    I don't know if you've noticed, but your code will only read the first URL in each line.

    So if the whole page were formatted as a single line, you would only get one result.

    That's a logic error.
    "I'm not normally a praying man, but if you're up there, save me... Superman!" - Homer Simpson
    My Blog
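    A minimal sketch of the single-line case, comparing one call to find() (the if) with repeated calls (a while). The class IfVersusWhile, its two methods, and the sample page are hypothetical; the pattern is the one from post #2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IfVersusWhile {
    static final Pattern URL_PATTERN =
        Pattern.compile("http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=]*)?\\b");

    // One call to find(): only the first URL in the line
    static List<String> withIf(String line) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(line);
        if (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    // Repeated calls to find(): each one resumes after the previous match
    static List<String> withWhile(String line) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(line);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // A whole page squeezed onto one line
        String page = "<a href=\"http://a.com/1.html\">1</a><a href=\"http://b.com/2.html\">2</a>"
                    + "<a href=\"http://c.com/3.html\">3</a>";
        System.out.println(withIf(page).size());    // 1
        System.out.println(withWhile(page).size()); // 3
    }
}
```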

  4. #4

    Thread Starter
    Member
    Join Date
    Jun 2007
    Location
    England
    Posts
    61

    Re: [RESOLVED] Finding URLs in a String using Regular Expressions

    Yeah, I did notice that it only reads one URL per line.

    Thanks for pointing it out.

    How do you suggest I change this?

  5. #5
    Arabic Poster ComputerJy
    Join Date
    Nov 2005
    Location
    Happily misplaced
    Posts
    2,513

    Re: [RESOLVED] Finding URLs in a String using Regular Expressions

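    Try this code:
    Code:
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    URL url = new URL("http://www.bla.com");
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    String strLine = null;

    String URLregex = "http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=]*)?\\b";

    Pattern p = Pattern.compile(URLregex);

    while ((strLine = in.readLine()) != null) {

        Matcher m = p.matcher(strLine);

        // Each call to find() resumes after the previous match,
        // so the loop visits every URL on the line
        while (m.find()) {
            System.out.println(m.group(0));
        }
    }
    in.close();
    I just replaced the if with a while. Each call to find() continues from the end of the previous match, so every URL on the line gets printed; there is no need to strip the URLs out of the line as you go.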
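    On Java 9 or later, Matcher.results() gives the same "every match" behaviour as the while loop, as a stream of MatchResult. The sketch below assumes Java 9+; AllUrls and allUrls are hypothetical names. Because nothing in the pattern can match a line break, the same method works on a single line or on the whole page read into one string.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AllUrls {
    static final Pattern URL_PATTERN =
        Pattern.compile("http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=]*)?\\b");

    // Matcher.results() (Java 9+) streams every match, like calling find() in a loop
    static List<String> allUrls(String text) {
        return URL_PATTERN.matcher(text)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String line = "see http://a.com/x.html and https://b.org/y?q=1 too";
        System.out.println(allUrls(line));
    }
}
```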

  6. #6

    Thread Starter
    Member
    Join Date
    Jun 2007
    Location
    England
    Posts
    61

    Re: [RESOLVED] Finding URLs in a String using Regular Expressions

    Thanks mate, I'll give it a go.

    I just sent you a PM the second before you posted that!

  7. #7

    Thread Starter
    Member
    Join Date
    Jun 2007
    Location
    England
    Posts
    61

    Re: [RESOLVED] Finding URLs in a String using Regular Expressions

    Sorted. Nice one, geeza! You're the man.
