-
Oct 24th, 2007, 07:24 AM
#1
Thread Starter
Member
[RESOLVED] Finding URLs in a String using Regular Expressions
Hello.
I am trying to write an app that will read in an HTML document and extract all the URLs from it.
I currently read the HTML document in line by line, and I need to identify whether each line contains any URLs.
Someone suggested using regular expressions, but I am having a bit of trouble with them.
I am trying something like this but it doesn't work. Any help would be great! Thanks.
Code:
String test = new String("bla bla bla http://somesite.com/tmp/page.html bla bla");
String regex = "@\"http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?\\b\")";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(test);
if (m.find()) {
    System.out.println(m.group(1));
} else {
    System.out.println("Not found!");
}
Last edited by DonCash; Oct 24th, 2007 at 09:58 AM.
-
Oct 24th, 2007, 09:58 AM
#2
Thread Starter
Member
Re: Finding URLs in a String using Regular Expressions
Sorted. I've worked it out. Thanks anyway.
Code:
URL url = new URL("http://www.bla.com");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String strLine = "";
String URLregex = "http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=]*)?\\b";
Pattern p = Pattern.compile(URLregex);
while ((strLine = in.readLine()) != null) {
    Matcher m = p.matcher(strLine);
    if (m.find()) {
        System.out.println(m.group(0));
    }
}
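For anyone wondering why the first attempt failed: my guess is that the pattern was copied from a C# verbatim string, so the leading @" and the trailing ") ended up inside the Java regex. The stray closing parenthesis makes Pattern.compile throw a PatternSyntaxException, and group(1) only ever holds the optional "s" of "https", not the whole URL. A minimal sketch to check this; the class name RegexCheck and its helper methods are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexCheck {
    // Pattern from the first post, including the leftover @" prefix and ") suffix.
    public static final String BROKEN =
            "@\"http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?\\b\")";
    // Pattern from the working version.
    public static final String FIXED =
            "http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=]*)?\\b";

    // Returns true if the regex compiles, false if Pattern.compile rejects it.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns the requested group of the first match, or null if nothing matched
    // or the group did not take part in the match.
    public static String firstMatch(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String test = "bla bla bla http://somesite.com/tmp/page.html bla bla";
        System.out.println(compiles(BROKEN));           // does the first pattern compile?
        System.out.println(compiles(FIXED));            // does the second one?
        System.out.println(firstMatch(FIXED, test, 0)); // group 0: the whole match
        System.out.println(firstMatch(FIXED, test, 1)); // group 1: just the optional "s"
    }
}
```

So the two changes that mattered were dropping the extra characters around the pattern and printing group(0) instead of group(1).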
-
Oct 24th, 2007, 10:01 PM
#3
Re: [RESOLVED] Finding URLs in a String using Regular Expressions
I don't know if you've noticed, but your code will only read the first URL on each line.
So if the whole page were formatted as a single line, you'd only get one result.
That's a logic error.
"I'm not normally a praying man, but if you're up there, save me... Superman!" - Homer Simpson
-
Oct 25th, 2007, 03:52 AM
#4
Thread Starter
Member
Re: [RESOLVED] Finding URLs in a String using Regular Expressions
Yeah I did notice that it only read one URL per line.
Thanks for pointing it out.
How do you suggest I change this?
-
Oct 25th, 2007, 08:40 AM
#5
Re: [RESOLVED] Finding URLs in a String using Regular Expressions
Try this code:
Code:
URL url = new URL("http://www.bla.com");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String strLine = null;
String URLregex = "http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=]*)?\\b";
Pattern p = Pattern.compile(URLregex);
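while ((strLine = in.readLine()) != null) {
    Matcher m = p.matcher(strLine);
    // find() resumes after the previous match, so this loop visits every URL on the line
    while (m.find()) {
        System.out.println(m.group(0));
    }
}
Just replaced the if with a while. Each call to m.find() carries on from the end of the previous match, so the found URL doesn't have to be removed from the string first.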
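If you want the URLs back as a list rather than printed, the same loop can be wrapped in a small helper. This is only a sketch: the class name UrlExtractor, the method extractUrls and the example.com addresses are made up for illustration, and the pattern is the one from this thread, so it shares its limitations (http/https only, a fairly narrow set of path characters).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlExtractor {
    // Same pattern as in the thread; compiled once and reused for every call.
    private static final Pattern URL_PATTERN =
            Pattern.compile("http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w-./?%&=]*)?\\b");

    // Collects every URL found in the given text, in order of appearance.
    public static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {         // each call continues after the previous match
            urls.add(m.group(0));  // group(0) is the whole matched URL
        }
        return urls;
    }

    public static void main(String[] args) {
        String line = "see http://one.example.com/a.html and https://two.example.org/b?x=1 here";
        for (String url : extractUrls(line)) {
            System.out.println(url);
        }
    }
}
```

Calling extractUrls on each line from the BufferedReader (or on the whole page joined into one string) then gives every match, however the HTML happens to be wrapped.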
-
Oct 25th, 2007, 08:42 AM
#6
Thread Starter
Member
Re: [RESOLVED] Finding URLs in a String using Regular Expressions
Thanks mate, I'll give it a go.
I just sent you a PM the second before you posted that!
-
Oct 25th, 2007, 08:44 AM
#7
Thread Starter
Member
Re: [RESOLVED] Finding URLs in a String using Regular Expressions
Sorted. Nice one, geeza! You're the man.