
Thread: Scrape a href from HTML but the right one.

  1. #1

    Thread Starter
    Lively Member
    Join Date
    Mar 2010
    Posts
    123

    Scrape a href from HTML but the right one.

    Hey guys, I'm trying to scrape the right URL from an HTML page using the WebBrowser control.

    I want to scrape this href and navigate to it. The problem is that the reply link on every comment looks almost the same, so if I scrape all the hrefs and check the name, I get the reply buttons of all the comments plus the New Comment button. Is there a way to grab only this one link, by its class name or something?

    The one I need:
    Code:
    <a href="forums.php?op=post&amp;p=1409951"><img src="/images/icons/comment_add.png" class="inline_icon" align="top">&nbsp;New Comment</a>
    The ones I don't need:
    Code:
    <a href="forums.php?op=post&amp;p=1409971">Reply To This</a>
    I'm trying to create my own browser, and this should be a shortcut button for when I want to comment. Thanks a lot.

  2. #2

    Thread Starter
    Lively Member
    Join Date
    Mar 2010
    Posts
    123

    Re: Scrape a href from HTML but the right one.

    bump

  3. #3
    Fanatic Member
    Join Date
    Aug 2010
    Posts
    624

    Re: Scrape a href from HTML but the right one.

    vb.net Code:
    For Each h As HtmlElement In WebBrowser1.Document.GetElementsByTagName("a")
        If h.InnerText = "Reply To This" AndAlso System.Text.RegularExpressions.Regex.Match(h.GetAttribute("href"), "forums\.php\?op=post&amp;p=\d*?", System.Text.RegularExpressions.RegexOptions.IgnoreCase).Success Then
            WebBrowser1.Navigate(h.GetAttribute("href"))
            Exit For
        End If
    Next

    Hope that helps.
    If I helped you out, please take the time to rate me

  4. #4

    Thread Starter
    Lively Member
    Join Date
    Mar 2010
    Posts
    123

    Re: Scrape a href from HTML but the right one.

    Quote Originally Posted by J-Deezy View Post
    vb.net Code:
    For Each h As HtmlElement In WebBrowser1.Document.GetElementsByTagName("a")
        If h.InnerText = "Reply To This" AndAlso System.Text.RegularExpressions.Regex.Match(h.GetAttribute("href"), "forums\.php\?op=post&amp;p=\d*?", System.Text.RegularExpressions.RegexOptions.IgnoreCase).Success Then
            WebBrowser1.Navigate(h.GetAttribute("href"))
            Exit For
        End If
    Next

    Hope that helps.
    Thanks, it looks like what I need, but it won't navigate; nothing happens.
    I tried a MsgBox to see if there is a link, but no message box appears either.
    Last edited by voidale; Dec 30th, 2010 at 09:05 AM.

  5. #5

    Thread Starter
    Lively Member
    Join Date
    Mar 2010
    Posts
    123

    Re: Scrape a href from HTML but the right one.

    bump still need it

  6. #6
    Fanatic Member
    Join Date
    Aug 2010
    Posts
    624

    Re: Scrape a href from HTML but the right one.

    The most likely problem would be that either the Regex is not matching, or the inner text is incorrect.

    First try this:

    vb.net Code:
    For Each h As HtmlElement In WebBrowser1.Document.GetElementsByTagName("a")
        If h.InnerText.ToLower.Contains("reply to this") Then
            MsgBox("found the appropriate innertext")
            If System.Text.RegularExpressions.Regex.Match(h.GetAttribute("href"), "forums\.php\?op=post&amp;p=\d*?").Success Then
                MsgBox("match was successful")
                WebBrowser1.Navigate(h.GetAttribute("href"))
                Exit For
            Else
                MsgBox("it's the match that's failing" & vbNewLine & h.GetAttribute("href"))
            End If
        End If
    Next

    Then report back which message boxes, if any, appear.

  7. #7
    Fanatic Member newprogram's Avatar
    Join Date
    Apr 2006
    Location
    in your basement
    Posts
    769

    Re: Scrape a href from HTML but the right one.

    It could be that "forums.php?op=post&amp;p=1409971" is not a valid URL on its own. You might have to do WebBrowser1.Navigate("www.thewebsite.com/" & h.GetAttribute("href")).
    Live life to the fullest!!

  8. #8
    Frenzied Member stateofidleness's Avatar
    Join Date
    Jan 2009
    Posts
    1,780

    Re: Scrape a href from HTML but the right one.

    He said he DOESN'T want the ones that say "Reply To This", but the If condition says "if it DOES contain that". Wouldn't it be "If Not h.InnerText.ToLower.Contains("reply to this") Then"?
    Also, +1 on the relative URL path. He'll need to prepend the whole domain name to it. To be even more accurate, before navigating, take the current URL, lop off the filename, and put what is left in front of the href. The forums could be 9 directories deep for all we know.
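
A minimal sketch of the "lop off the filename and prepend the rest" idea, assuming the current page URL is available as a string. The `System.Uri` constructor that takes a base `Uri` and a relative string does this resolution itself, and returns the href unchanged if it is already absolute. The module name, the `ResolveHref` helper, and the example.com address are made up for illustration; they are not from the thread.

```vbnet
Imports System

Module HrefResolver
    ' Hypothetical helper: resolve an href against the URL of the page it
    ' was found on. A relative href replaces the last path segment (the
    ' "filename") of the page URL; an absolute href is returned as-is.
    Function ResolveHref(pageUrl As String, href As String) As Uri
        Dim baseUri As New Uri(pageUrl)
        Return New Uri(baseUri, href)
    End Function

    Sub Main()
        ' Example page URL is an assumption; the real site is not named in the thread.
        Dim page As String = "http://www.example.com/a/b/forums.php?op=view"

        ' Relative href: resolved against the page's directory.
        Console.WriteLine(ResolveHref(page, "forums.php?op=post&p=1409951"))
        ' prints http://www.example.com/a/b/forums.php?op=post&p=1409951

        ' Already-absolute href: passed through unchanged.
        Console.WriteLine(ResolveHref(page, "http://www.example.com/x.php"))
        ' prints http://www.example.com/x.php
    End Sub
End Module
```

Inside the loop this would look something like WebBrowser1.Navigate(ResolveHref(WebBrowser1.Url.ToString(), h.GetAttribute("href"))), which should work no matter how many directories deep the forum is.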

  9. #9
    Fanatic Member
    Join Date
    Aug 2010
    Posts
    624

    Re: Scrape a href from HTML but the right one.

    Sometimes the href in the page source doesn't have a whole URL, but when you access the href attribute it can come back with the full URL already prepended. I can't physically test this, so you'll have to do some debugging.

    As for matching the wrong button: I only glanced at the thread. Simple mistake, simple solution:
    Code:
    If h.InnerText.ToLower.Contains("reply to this") Then
    change to:
    Code:
    If h.InnerText.ToLower.Contains("new comment") Then
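
Putting the text check and the href check together, here is a rough sketch of the test as a standalone function, so it can be tried without a WebBrowser. Two things in it are assumptions, not something confirmed in this thread: the pattern accepts a plain "&" as well as "&amp;", because the href attribute probably comes back with the entity already decoded, and it uses \d+ rather than \d*? so at least one digit is required. The module and function names are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NewCommentLink
    ' Matches forums.php?op=post&p=<digits>, accepting either a literal "&"
    ' or the entity form "&amp;" (assumption: the DOM may return either).
    Private ReadOnly PostHref As New Regex(
        "forums\.php\?op=post&(amp;)?p=\d+", RegexOptions.IgnoreCase)

    ' Hypothetical helper: True only for the "New Comment" link.
    Function IsNewCommentLink(innerText As String, href As String) As Boolean
        ' Guard against Nothing so a missing text or href cannot throw.
        If innerText Is Nothing OrElse href Is Nothing Then Return False
        Return innerText.ToLowerInvariant().Contains("new comment") AndAlso
               PostHref.IsMatch(href)
    End Function

    Sub Main()
        ' The wanted link (its text may start with a non-breaking space).
        Console.WriteLine(IsNewCommentLink(" New Comment", "forums.php?op=post&p=1409951"))
        ' prints True

        ' A reply link: same href shape, different text.
        Console.WriteLine(IsNewCommentLink("Reply To This", "forums.php?op=post&p=1409971"))
        ' prints False
    End Sub
End Module
```

In the For Each loop, the If line would then become If IsNewCommentLink(h.InnerText, h.GetAttribute("href")) Then, with the Navigate and Exit For lines left as they are.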
