Thread: Get all links from page

  1. #1

    Thread Starter
    Lively Member
    Join Date
    Jun 2010
    Posts
    94

    Get all links from page

    I want to get all the links from the current webpage and then keep only the ones whose URL contains a certain substring. How can I do this? Basically, I want to:
    1) Get all the links
    2) Delete the links that do not contain "/article/"
    3) Put those links in a textbox.

    I know how to do number 3, but how can I do numbers 1 and 2?

  2. #2
    Member
    Join Date
    Dec 2007
    Posts
    32

    Re: Get all links from page

    I don't know how to do this, but I do know how to get the HTML source of the page currently open in the WebBrowser control:

    Code:
    Text1.Text = WebBrowser1.Document.Body.InnerHtml
    The following property might be usable to fetch all the hyperlinks in the document, but I don't know how to use it:

    Code:
    WebBrowser1.Document.Links
    Last edited by nakaam_aashiq; Sep 6th, 2010 at 10:54 PM.

  3. #3
    Member
    Join Date
    Dec 2007
    Posts
    32

    Re: Get all links from page

    Hey dude, I just found the code for step 1 of your program.

    First, open the webpage in the WebBrowser control:

    vb Code:
    WebBrowser1.Navigate("http://www.google.com.pk")

    Then run the following code in a Button.Click event handler (or any other handler):

    vb Code:
    If (WebBrowser1.ReadyState = WebBrowserReadyState.Complete) Then
        For Each ClientControl As HtmlElement In WebBrowser1.Document.Links
            ListBox1.Items.Add(ClientControl.GetAttribute("href"))
        Next
    End If

    ListBox1 now contains all the hyperlinks on the page, in this case google.com.pk.

  4. #4

    Thread Starter
    Lively Member
    Join Date
    Jun 2010
    Posts
    94

    Re: Get all links from page

    Thanks, works great.
    How can I filter the links and keep only the ones that contain "/article/"?

  5. #5
    Hyperactive Member
    Join Date
    Nov 2004
    Posts
    362

    Re: Get all links from page

    Use the Regex class to filter each link's href.
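
    A minimal sketch of that suggestion, assuming the hrefs have already been collected as strings. The module name LinkFilter and the function FilterLinks are hypothetical names made up for illustration; only Regex.IsMatch comes from the framework.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LinkFilter
    ' Hypothetical helper: returns only the hrefs that match the given regex pattern.
    Function FilterLinks(hrefs As IEnumerable(Of String), pattern As String) As List(Of String)
        Dim matches As New List(Of String)
        For Each href As String In hrefs
            ' Skip empty values so only real URLs are tested.
            If Not String.IsNullOrEmpty(href) AndAlso Regex.IsMatch(href, pattern) Then
                matches.Add(href)
            End If
        Next
        Return matches
    End Function
End Module
```

    Calling FilterLinks with the pattern "/article/" would keep only the article links. That pattern has no special regex characters, so it matches literally; for arbitrary text, wrapping it in Regex.Escape is the safer choice.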

  6. #6

    Thread Starter
    Lively Member
    Join Date
    Jun 2010
    Posts
    94

    Re: Get all links from page

    I used

    Code:
    For x = 0 To lcount
        addition = LinkGrabber.Items.Item(x)
        If addition.ToString().Contains("/article/") Then
            alist.Text.Insert(0, addition.ToString())
        End If
    Next
    but I get an error about 90 not being compatible with 'index'.
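
    A possible fix, sketched under the assumption that lcount was set to LinkGrabber.Items.Count: the valid indexes then run from 0 to lcount - 1, so the last pass of the loop above asks for an item that does not exist. Separately, String.Insert returns a new string rather than changing the original, so the result of alist.Text.Insert(...) is discarded. The module and function names below are hypothetical.

```vb
Imports System.Text
Imports System.Windows.Forms

Module ArticleLinkDemo
    ' Hypothetical helper: returns the items containing fragment, one per line.
    Function CollectMatches(items As ListBox.ObjectCollection, fragment As String) As String
        Dim sb As New StringBuilder()
        ' Valid indexes run from 0 to Count - 1; looping up to Count itself
        ' fails on the final pass.
        For x As Integer = 0 To items.Count - 1
            Dim addition As String = items(x).ToString()
            If addition.Contains(fragment) Then
                sb.AppendLine(addition)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

    With this in place, the textbox could be filled with alist.Text = CollectMatches(LinkGrabber.Items, "/article/"). The textbox needs Multiline set to True to show one link per line.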
