Thread: Get HyperLinks

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Nov 2005
    Posts
    153

    Get HyperLinks

    Hi experts, again.

    Is there a way to get the hyperlinks from a web page?
    This is The Prophet MOHAMMAD!
    =======================
    My SuperMen:
    RhinoBull, gigemboy, jmcilhinney, |2eM!x, Edneeis and Hack

  2. #2
    PowerPoster
    Join Date
    Aug 2005
    Location
    College Station, TX
    Posts
    4,521

    Re: Get HyperLinks

    just read the HTML into a string, then use Regex in order to get the links...

    read it into a string
    http://www.vbforums.com/showthread.php?t=372593

    regex example getting things between <p> tags...
    http://www.vbforums.com/showthread.php?t=391698

  3. #3
    Admodistrator |2eM!x's Avatar
    Join Date
    Jan 2005
    Posts
    3,900

    Re: Get HyperLinks

    Here, this will find all links on a page, took me a while to write..

    VB Code:
    Option Strict On
    Option Explicit On

    Public Class Form1

        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            Dim WebParse As New WebPageLinks("http://vbforums.com/")
            Dim URLs As Specialized.StringCollection = WebParse.Execute()
            For Each WebAddress As String In URLs
                ListBox1.Items.Add(WebAddress)
            Next
        End Sub
    End Class

    Public Class WebPageLinks
        Dim Web As String

        Public Function Execute() As Specialized.StringCollection
            Dim Inet As New Net.WebClient
            Dim ColLinks As New Specialized.StringCollection
            Dim WebText As New IO.StreamReader(Inet.OpenRead(Web))
            Dim Parse As String
            Dim Domain As String
            ColLinks.AddRange(Microsoft.VisualBasic.Split(WebText.ReadToEnd.ToString, "www."))
            ColLinks.RemoveAt(0)
            For t As Int32 = 0 To ColLinks.Count - 1
                Parse = ColLinks(t).Substring(0, ColLinks(t).IndexOf(".") + 4)
                Domain = Parse
                Do
                    Try
                        Parse = ColLinks(t).Substring(0, Parse.Length + 1)
                        IO.Path.GetFileName(Parse)
                    Catch ex As Exception
                        Parse = Domain
                        Exit Do
                    End Try
                Loop Until Parse.Chars(Parse.Length - 4) = "."
                ColLinks(t) = "www." & Parse
            Next
            Return ColLinks
        End Function

        Public Sub New(ByVal Website As String)
            Web = Website
        End Sub
    End Class
    Last edited by |2eM!x; Mar 26th, 2006 at 01:49 AM.

  4. #4
    I'm about to be a PowerPoster!
    Join Date
    Jan 2005
    Location
    Everywhere
    Posts
    13,647

    Re: Get HyperLinks

    Quote Originally Posted by Remix
    Microsoft.VisualBasic.Split
    Um, String.Split ?

  5. #5
    Admodistrator |2eM!x's Avatar
    Join Date
    Jan 2005
    Posts
    3,900

    Re: Get HyperLinks

    String.Split doesn't allow multiple letters as the separator. BTW, I messed it up; it only shows domain names at the moment, so give me a few minutes.
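For reference on the Split question: on .NET 2.0 and later, String.Split has an overload that takes an array of String separators plus a StringSplitOptions value, so a multi-character separator such as "www." can be used without Microsoft.VisualBasic.Split. A minimal sketch; the module and function names (SplitDemo, SplitOnText) are made up for illustration and are not from the posted code:

```vbnet
Imports System

Module SplitDemo
    ' Splits on the whole separator string, not on its individual characters.
    ' SplitOnText is a hypothetical helper name used only for this example.
    Public Function SplitOnText(ByVal text As String, ByVal separator As String) As String()
        Return text.Split(New String() {separator}, StringSplitOptions.None)
    End Function
End Module
```

Calling SplitDemo.SplitOnText(pageText, "www.") should give the same pieces as the Microsoft.VisualBasic.Split call in the posted code.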

  6. #6
    Admodistrator |2eM!x's Avatar
    Join Date
    Jan 2005
    Posts
    3,900

    Re: Get HyperLinks

    Okay I fixed it

  7. #7

    Thread Starter
    Addicted Member
    Join Date
    Nov 2005
    Posts
    153

    Re: Get HyperLinks

    Quote Originally Posted by |2eM!x
    Here, this will find all links on a page, took me a while to write..

    Amazing code, but

    it only picks up www.site.com.
    What if the hyperlink is www.site.com/index.php?showforum=xxx?

    Thanks to you all.
    I will check out Regex too; it is really fast.

    Cheers
    Last edited by _Conan_; Mar 26th, 2006 at 02:32 AM.

  8. #8
    Hyperactive Member OMITT3D's Avatar
    Join Date
    Mar 2006
    Posts
    368

    Re: Get HyperLinks

    Not to mention that a hyperlink doesn't always have www in it. Some designers simply put ../Images/1.gif etc., and some sites' subdomain is not www.

  9. #9

    Thread Starter
    Addicted Member
    Join Date
    Nov 2005
    Posts
    153

    Re: Get HyperLinks

    With Regex I tried this, but I think I am doing something wrong with RegularExpressions.Regex:

    VB Code:
    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        Dim returnstring As String
        returnstring = SearchPage("http://www.yahoo.com")
        Dim Regex As New System.Text.RegularExpressions.Regex("(?<=<a href=>).*?(?=</a>)")
        Dim Mymatches As System.Text.RegularExpressions.MatchCollection = Regex.Matches(returnstring)
        For Each FoundMatch As System.Text.RegularExpressions.Match In Mymatches
            MsgBox(FoundMatch.Value)
        Next
    End Sub
    'Function Code...
    Private Function SearchPage(ByVal sURL As String) As String
        Dim client As System.Net.WebClient = New System.Net.WebClient
        Dim data As System.IO.Stream = client.OpenRead(sURL)
        Dim reader As System.IO.StreamReader = New System.IO.StreamReader(data)
        SearchPage = reader.ReadToEnd
    End Function

  10. #10

    Thread Starter
    Addicted Member
    Join Date
    Nov 2005
    Posts
    153

    Re: Get HyperLinks

    Quote Originally Posted by OMITT3D
    Not to mention that a hyperlink doesn't always have www in it. Some designers simply put ../Images/1.gif etc., and some sites' subdomain is not www.

    I know, but the code will work with sites that have links like

    http://www.site.com/page

  11. #11
    Hyperactive Member OMITT3D's Avatar
    Join Date
    Mar 2006
    Posts
    368

    Re: Get HyperLinks

    Look at the source of most websites, Google for example:

    <a href=/intl/en/about.html>About Google</a>

    <a href="/ads/">Advertising&nbsp;Programs</a>

    <a href=/language_tools?hl=en>Language Tools</a>

    <a href=/preferences?hl=en>Preferences</a>

    Etc.

  12. #12

    Thread Starter
    Addicted Member
    Join Date
    Nov 2005
    Posts
    153

    Re: Get HyperLinks

    |2eM!x, appreciated work, and worth a rating.

    But for example, if I try to get all the threads in this forum section:
    http://www.vbforums.com/forumdisplay.php?f=25

    I do not get any threads. Sorry to bother you.

  13. #13
    Hyperactive Member OMITT3D's Avatar
    Join Date
    Mar 2006
    Posts
    368

    Re: Get HyperLinks

    Obviously; look at the source:
    <a href="forumdisplay.php?f=8"><strong>API</strong></a>
    It needs to parse <a href= instead of www.

  14. #14

    Thread Starter
    Addicted Member
    Join Date
    Nov 2005
    Posts
    153

    Re: Get HyperLinks

    Then, as usual, it was my fault.
    I did not check the source; I was just hovering the mouse over the link, and the full link appeared in the status bar. I'm so silly, lol,
    because I thought I could grab that full link from the source.

    Then what if I used a WebBrowser control to get the hyperlinks? Is that possible?

    |2eM!x, your code is handy, man.

  15. #15
    PowerPoster
    Join Date
    Aug 2005
    Location
    College Station, TX
    Posts
    4,521

    Re: Get HyperLinks

    Did anyone see my first post? Did you visit the links? You just modify the regex in the second link: instead of the <p>...</p> tags, change it to the link tags, e.g. <a href=...</a>. You just read the page into a string (first link), then parse out the links using regex (second link). Conan is going about it the right way in his later post (no doubt using those examples).
    Last edited by gigemboy; Mar 26th, 2006 at 05:22 AM.

  16. #16
    PowerPoster
    Join Date
    Aug 2005
    Location
    College Station, TX
    Posts
    4,521

    Re: Get HyperLinks

    Here is a small sample, using a simple regex expression that displays everything within the <a href=...> block (not including the link description and closing </a> tag), so you can see that it is parsing the right stuff... I do notice that yahoo has some weird links in there, as you will see if you run this sample. This can be easily modified to not include the beginning "<a href=" and the ending ">", but the below is so you can see full text that it is matching on...
    VB Code:
    Dim returnstring As String
    returnstring = SearchPage("http://www.yahoo.com")
    Dim Regex As New System.Text.RegularExpressions.Regex("<a href=.*?>")
    Dim Mymatches As System.Text.RegularExpressions.MatchCollection = Regex.Matches(returnstring)
    For Each FoundMatch As System.Text.RegularExpressions.Match In Mymatches
        MsgBox(FoundMatch.Value)
    Next
    MessageBox.Show("done!")

  17. #17
    I'm about to be a PowerPoster!
    Join Date
    Jan 2005
    Location
    Everywhere
    Posts
    13,647

    Re: Get HyperLinks

    Shouldn't it be "<a href=""(.*)"">" ?

    Or "<a href=""(.+?)"">" if you do not want to match blank link targets
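The quoted-only patterns above would skip unquoted targets such as the <a href=/intl/en/about.html> markup from Google shown earlier in the thread. Below is a rough sketch of a pattern that accepts double-quoted, single-quoted, and unquoted href values and captures only the target. The names HrefExtractor, ExtractHrefs, and the "url" group are invented for this example, and a regex is only an approximation of HTML parsing (it can be fooled by comments, scripts, or attributes like data-href):

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module HrefExtractor
    ' Three alternatives for the href value: "double-quoted", 'single-quoted', or unquoted.
    ' .NET allows the same group name ("url") to be reused across alternatives.
    Private Const HrefPattern As String = _
        "<a\s[^>]*?href\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))"

    ' Returns the raw href values in document order (relative links stay relative).
    Public Function ExtractHrefs(ByVal html As String) As List(Of String)
        Dim results As New List(Of String)
        For Each m As Match In Regex.Matches(html, HrefPattern, RegexOptions.IgnoreCase)
            results.Add(m.Groups("url").Value)
        Next
        Return results
    End Function
End Module
```

The html argument would be the string returned by something like the SearchPage function posted earlier.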

  18. #18
    PowerPoster
    Join Date
    Aug 2005
    Location
    College Station, TX
    Posts
    4,521

    Re: Get HyperLinks

    I was just giving a general example to show that it does work...
    Last edited by gigemboy; Mar 26th, 2006 at 05:17 AM.

  19. #19

    Thread Starter
    Addicted Member
    Join Date
    Nov 2005
    Posts
    153

    Re: Get HyperLinks

    Thanks to you all.
    Now I am confused about this thread. What shall I mark it as, unresolved or closed?

  20. #20
    PowerPoster
    Join Date
    Aug 2005
    Location
    College Station, TX
    Posts
    4,521

    Re: Get HyperLinks

    you tell us hehe do you have it working? Are you having problems with it??

  21. #21

    Thread Starter
    Addicted Member
    Join Date
    Nov 2005
    Posts
    153

    Re: Get HyperLinks

    Quote Originally Posted by gigemboy
    you tell us hehe do you have it working? Are you having problems with it??

    Yes Sir,

    because not every link in the source is a full link like:

    http://www.vbforums.com/showthread.php?t=395225

    In the source it looks like:
    <a href="showthread.php?t=395225">

    But I wonder whether there is a way to get the full link by using the WebBrowser control.

  22. #22
    PowerPoster
    Join Date
    Aug 2005
    Location
    College Station, TX
    Posts
    4,521

    Re: Get HyperLinks

    It is because that is the "link" as it appears in the source code. You can't change that; it will be the same as viewed in any browser or control.

  23. #23
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: Get HyperLinks

    Code:
    <a(.*)href="http://(.*)google.com(.*)>(.*)</a>
    works fine for me

  24. #24
    PowerPoster
    Join Date
    Aug 2005
    Location
    College Station, TX
    Posts
    4,521

    Re: Get HyperLinks

    that will only pull up the links with "google.com" in the name, and starting with "http://", which is not what he wanted...

  25. #25
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: Get HyperLinks

    Quote Originally Posted by gigemboy
    that will only pull up the links with "google.com" in the name, and starting with "http://", which is not what he wanted...
    google.com is there for an example
    remove it to get a regular expression to match links

  26. #26
    PowerPoster
    Join Date
    Aug 2005
    Location
    College Station, TX
    Posts
    4,521

    Re: Get HyperLinks

    But the whole point is that he wants to rebuild the links into something he can just click. Some links do not include the entire URL in the href attribute (relative links), so you would have to "build" a clickable link by prepending "http://", the domain name, sometimes the root folder the page is in, etc. So don't "roll" your eyes at me for your misunderstanding of what he wants.

    View this thread for more info about the same kind of question, as well as an example of a Microsoft screen scraper project that has this type of functionality, for reference:

    http://www.vbforums.com/showthread.php?t=396773
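One possible way to do the "building" described in this post is to let System.Uri combine the page address with the href, instead of concatenating strings by hand. This is a sketch under assumptions: LinkResolver and ToAbsoluteUrl are made-up names, and the href is assumed to have been extracted from the page source already:

```vbnet
Imports System

Module LinkResolver
    ' Resolves an href against the address of the page it was found on.
    ' Absolute hrefs come back unchanged; relative ones are combined with the base.
    Public Function ToAbsoluteUrl(ByVal pageUrl As String, ByVal href As String) As String
        Dim baseUri As New Uri(pageUrl)
        Dim resolved As Uri = Nothing
        If Uri.TryCreate(baseUri, href, resolved) Then
            Return resolved.AbsoluteUri
        End If
        Return href ' leave anything that cannot be resolved untouched
    End Function
End Module
```

For example, ToAbsoluteUrl("http://www.vbforums.com/forumdisplay.php?f=25", "showthread.php?t=395225") gives http://www.vbforums.com/showthread.php?t=395225, the full link from the earlier post.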
    Last edited by gigemboy; Apr 5th, 2006 at 04:44 AM.

  27. #27
    Addicted Member
    Join Date
    Mar 2006
    Posts
    186

    Re: Get HyperLinks

    I guess he wants to make an app similar to this one: ... http://www.astanda.com
