-
Dec 29th, 2010, 04:01 PM
#1
Thread Starter
Lively Member
Scrape a href from HTML but the right one.
Hey guys, I'm trying to scrape the right URL from an HTML page using the WebBrowser control.
I want to scrape this href and navigate to it. The problem is that every comment's reply link looks almost the same, so if I scrape all the hrefs and check the name, I get the reply buttons of all the comments plus the New Comment button. Is there a way to grab only this one link, by its class name or something?
The one I need:
Code:
<a href="forums.php?op=post&p=1409951"><img src="/images/icons/comment_add.png" class="inline_icon" align="top"> New Comment</a>
The ones I don't need:
Code:
<a href="forums.php?op=post&p=1409971">Reply To This</a>
I'm trying to create my own browser, and this should be a shortcut button for when I want to comment. Thanks a lot.
-
Dec 29th, 2010, 05:22 PM
#2
Thread Starter
Lively Member
Re: Scrape a href from HTML but the right one.
-
Dec 29th, 2010, 08:10 PM
#3
Fanatic Member
Re: Scrape a href from HTML but the right one.
vb.net Code:
For Each h As HtmlElement In WebBrowser1.Document.GetElementsByTagName("a")
    If h.InnerText = "Reply To This" AndAlso System.Text.RegularExpressions.Regex.Match(h.GetAttribute("href"), "forums\.php\?op=post&p=\d*?", System.Text.RegularExpressions.RegexOptions.IgnoreCase).Success Then
        WebBrowser1.Navigate(h.GetAttribute("href"))
        Exit For
    End If
Next
Hope that helps.
-
Dec 30th, 2010, 09:02 AM
#4
Thread Starter
Lively Member
Re: Scrape a href from HTML but the right one.
 Originally Posted by J-Deezy
vb.net Code:
For Each h As HtmlElement In WebBrowser1.Document.GetElementsByTagName("a")
    If h.InnerText = "Reply To This" AndAlso System.Text.RegularExpressions.Regex.Match(h.GetAttribute("href"), "forums\.php\?op=post&p=\d*?", System.Text.RegularExpressions.RegexOptions.IgnoreCase).Success Then
        WebBrowser1.Navigate(h.GetAttribute("href"))
        Exit For
    End If
Next
Hope that helps.
Thanks, it looks like what I need, but it won't navigate; nothing happens.
I tried a MsgBox to see if there is a link, but no MsgBox appears either.
Last edited by voidale; Dec 30th, 2010 at 09:05 AM.
-
Dec 30th, 2010, 12:39 PM
#5
Thread Starter
Lively Member
Re: Scrape a href from HTML but the right one.
Bump, still need it.
-
Dec 30th, 2010, 09:05 PM
#6
Fanatic Member
Re: Scrape a href from HTML but the right one.
The most likely problem would be that either the Regex is not matching, or the inner text is incorrect.
First try this:
vb.net Code:
For Each h As HtmlElement In WebBrowser1.Document.GetElementsByTagName("a")
    If h.InnerText.ToLower.Contains("reply to this") Then
        MsgBox("found the appropriate innertext")
        If System.Text.RegularExpressions.Regex.Match(h.GetAttribute("href"), "forums\.php\?op=post&p=\d*?").Success Then
            MsgBox("match was successful")
            WebBrowser1.Navigate(h.GetAttribute("href"))
            Exit For
        Else
            MsgBox("it's the match that's failing" & vbNewLine & h.GetAttribute("href"))
        End If
    End If
Next
And report back what messageboxes, if any, appear.
-
Dec 31st, 2010, 02:01 PM
#7
Fanatic Member
Re: Scrape a href from HTML but the right one.
It could be that "forums.php?op=post&p=1409971" is not a valid URL on its own; you might have to do WebBrowser1.Navigate("www.thewebsite.com/" & h.GetAttribute("href")).
-
Dec 31st, 2010, 02:40 PM
#8
Re: Scrape a href from HTML but the right one.
He said he DOESN'T want the ones that say "Reply to this", but the IF condition says "if it DOES contain that". Wouldn't it be "If Not h.InnerText.ToLower.Contains("reply to this") Then" ?
Also, +1 on the relative URL path. He'll need to prepend the whole domain name to it. To be even more accurate, before navigating, take the current URL, lop off the filename and prepend what's left. The forums could be 9 directories deep for all we know.
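A minimal sketch of that idea, assuming you have the current page address (for example from WebBrowser1.Url) and the raw href string. Instead of chopping off the filename by hand, the System.Uri constructor that takes a base Uri and a relative string applies the standard relative-URL rules, which also copes with root-relative and already-absolute hrefs. The module and function names (HrefResolver, ResolveHref) are hypothetical.

```vbnet
Imports System

Module HrefResolver
    ' Hypothetical helper: turns an href that may be relative into an absolute URL,
    ' using the address of the page the link was found on as the base.
    Public Function ResolveHref(pageUrl As String, href As String) As String
        Dim baseUri As New Uri(pageUrl)
        ' New Uri(Uri, String) applies the standard relative-URL rules:
        '   "forums.php?..."  replaces the last path segment of pageUrl,
        '   "/forums.php?..." is resolved from the site root,
        '   an absolute href such as "http://..." is used unchanged.
        Dim resolved As New Uri(baseUri, href)
        Return resolved.AbsoluteUri
    End Function
End Module
```

For example, ResolveHref("http://www.example.com/community/forums.php?op=view&t=42", "forums.php?op=post&p=1409951") returns "http://www.example.com/community/forums.php?op=post&p=1409951", however deep the directory is (example.com is a placeholder, not the real forum).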
-
Jan 1st, 2011, 01:13 AM
#9
Fanatic Member
Re: Scrape a href from HTML but the right one.
Sometimes the href in the page source isn't a full URL, but when you read the href attribute it can come back as the full, absolute URL. As I can't physically test this, you'll have to do some debugging.
As for the incorrect button href: I glanced over the thread, and it's a simple mistake with a simple solution:
Code:
if h.InnerText.ToLower.Contains("reply to this") then
change to:
Code:
if h.InnerText.ToLower.Contains("new comment") then
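Putting the thread's suggestions together, the shortcut-button logic might be sketched as follows. It assumes a WinForms project with a WebBrowser control that has finished loading the thread page; the names NewCommentShortcut, IsNewCommentLink and NavigateToNewComment are hypothetical. It matches on the trimmed link text "New Comment" (the "Reply To This" links share the same href pattern), tightens the thread's regex from \d*? to \d+ so at least one digit of the post id is required, and resolves a relative href against the current page address before navigating.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports System.Windows.Forms

Module NewCommentShortcut
    ' Pattern from the thread: comment links point at forums.php?op=post&p=<number>.
    Private ReadOnly PostLinkPattern As New Regex("forums\.php\?op=post&p=\d+", RegexOptions.IgnoreCase)

    ' True only for the "New Comment" link. The "Reply To This" links share the
    ' href pattern, so the visible text is what tells them apart.
    Public Function IsNewCommentLink(innerText As String, href As String) As Boolean
        ' InnerText can be Nothing for anchors without text; Trim removes the
        ' leading space before "New Comment" in the markup.
        Dim text As String = If(innerText, String.Empty).Trim()
        Return text.Equals("New Comment", StringComparison.OrdinalIgnoreCase) AndAlso
               PostLinkPattern.IsMatch(If(href, String.Empty))
    End Function

    ' Navigates to the first matching link; returns False if none was found.
    Public Function NavigateToNewComment(browser As WebBrowser) As Boolean
        If browser.Document Is Nothing OrElse browser.Url Is Nothing Then Return False
        For Each link As HtmlElement In browser.Document.GetElementsByTagName("a")
            Dim href As String = link.GetAttribute("href")
            If IsNewCommentLink(link.InnerText, href) Then
                ' A relative href is resolved against the current page; an absolute one is used as-is.
                browser.Navigate(New Uri(browser.Url, href))
                Return True
            End If
        Next
        Return False
    End Function
End Module
```

From the button's Click handler you would call NavigateToNewComment(WebBrowser1) and, for instance, show a MsgBox when it returns False. Matching on the image's inline_icon class is also possible, but that looks like a generic icon class other links on the page may share, so the link text is likely the more specific hook.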