-
Mar 26th, 2006, 12:20 AM
#1
Thread Starter
Addicted Member
Get HyperLinks
Hi again, experts,
Is there a way to get the hyperlinks from a web page?
-
Mar 26th, 2006, 12:48 AM
#2
Re: Get HyperLinks
Just read the HTML into a string, then use a regex to get the links.
Reading a page into a string:
http://www.vbforums.com/showthread.php?t=372593
Regex example that extracts the text between <p> tags:
http://www.vbforums.com/showthread.php?t=391698
-
Mar 26th, 2006, 01:07 AM
#3
Re: Get HyperLinks
Here, this will find all links on a page. It took me a while to write.
VB Code:
Option Strict On
Option Explicit On
Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim WebParse As New WebPageLinks("http://vbforums.com/")
        Dim URLs As Specialized.StringCollection = WebParse.Execute()
        For Each WebAddress As String In URLs
            ListBox1.Items.Add(WebAddress)
        Next
    End Sub
End Class
Public Class WebPageLinks
    Dim Web As String
    Public Function Execute() As Specialized.StringCollection
        Dim Inet As New Net.WebClient
        Dim ColLinks As New Specialized.StringCollection
        Dim WebText As New IO.StreamReader(Inet.OpenRead(Web))
        Dim Parse As String
        Dim Domain As String
        ' Split the page text on "www." so each piece starts right after a "www." prefix
        ColLinks.AddRange(Microsoft.VisualBasic.Split(WebText.ReadToEnd.ToString, "www."))
        ' Drop the text that came before the first "www."
        ColLinks.RemoveAt(0)
        For t As Int32 = 0 To ColLinks.Count - 1
            ' Start with everything up to three characters past the first dot (e.g. "site.com")
            Parse = ColLinks(t).Substring(0, ColLinks(t).IndexOf(".") + 4)
            Domain = Parse
            Do
                Try
                    ' Grow the candidate by one character; GetFileName is used as a validity
                    ' check and throws when it meets a character that is not valid in a path
                    Parse = ColLinks(t).Substring(0, Parse.Length + 1)
                    IO.Path.GetFileName(Parse)
                Catch ex As Exception
                    ' Fall back to the bare domain
                    Parse = Domain
                    Exit Do
                End Try
                ' Stop once a dot sits four characters from the end (e.g. ".php")
            Loop Until Parse.Chars(Parse.Length - 4) = "."
            ColLinks(t) = "www." & Parse
        Next
        Return ColLinks
    End Function
    Public Sub New(ByVal Website As String)
        Web = Website
    End Sub
End Class
Last edited by |2eM!x; Mar 26th, 2006 at 01:49 AM.
-
Mar 26th, 2006, 01:21 AM
#4
Re: Get HyperLinks
 Originally Posted by Remix
Microsoft.VisualBasic.Split
Um, String.Split ?
-
Mar 26th, 2006, 01:24 AM
#5
Re: Get HyperLinks
String.Split does allow multiple letters. BTW, I messed it up; it only shows domain names at the moment, so give me a few minutes.
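For reference, a minimal sketch of splitting on a multi-character separator with String.Split, assuming .NET 2.0 or later (the overload that takes a string array). The module name and the sample text are made up for illustration.

```vbnet
Imports System

Module SplitDemo
    Sub Main()
        ' Hypothetical sample text standing in for downloaded page source
        Dim sample As String = "see www.site.com and www.vbforums.com/forumdisplay.php"

        ' The separator is passed as a string array, so "www." is treated as one
        ' four-character separator rather than as four separate characters
        Dim parts() As String = sample.Split(New String() {"www."}, StringSplitOptions.None)

        For Each part As String In parts
            Console.WriteLine("[" & part & "]")
        Next
    End Sub
End Module
```

The first element is the text before the first "www.", which is why the code in post #3 removes item 0 after splitting.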
-
Mar 26th, 2006, 01:46 AM
#6
Re: Get HyperLinks
Okay I fixed it
-
Mar 26th, 2006, 02:27 AM
#7
Thread Starter
Addicted Member
Re: Get HyperLinks
 Originally Posted by |2eM!x
Here, this will find all links on a page, took me a while to write..
Amazing code, but
it only picks up www.site.com.
What if the hyperlink is www.site.com/index.php?showforum=xxx?
Thanks to you all.
I will also check out Regex; it is really fast.
Cheers
Last edited by _Conan_; Mar 26th, 2006 at 02:32 AM.
-
Mar 26th, 2006, 02:30 AM
#8
Hyperactive Member
Re: Get HyperLinks
Not to mention that a hyperlink doesn't always have www in it. Some designers simply put ../Images/1.gif, etc., and some sites' subdomain is not www.
-
Mar 26th, 2006, 02:46 AM
#9
Thread Starter
Addicted Member
Re: Get HyperLinks
With Regex I tried this, but I think I have something wrong in how I use RegularExpressions.Regex:
VB Code:
Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    Dim returnstring As String
    returnstring = SearchPage("http://www.yahoo.com")
    Dim Regex As New System.Text.RegularExpressions.Regex("(?<=<a href=>).*?(?=</a>)")
    Dim Mymatches As System.Text.RegularExpressions.MatchCollection = Regex.Matches(returnstring)
    For Each FoundMatch As System.Text.RegularExpressions.Match In Mymatches
        MsgBox(FoundMatch.Value)
    Next
End Sub
'Function Code...
Private Function SearchPage(ByVal sURL As String) As String
    Dim client As System.Net.WebClient = New System.Net.WebClient
    Dim data As System.IO.Stream = client.OpenRead(sURL)
    Dim reader As System.IO.StreamReader = New System.IO.StreamReader(data)
    SearchPage = reader.ReadToEnd
End Function
-
Mar 26th, 2006, 02:49 AM
#10
Thread Starter
Addicted Member
Re: Get HyperLinks
 Originally Posted by OMITT3D
Not to mention a hyperlink doesn't always have www in it. Some designers simply put ../Images/1.gif etc...or some sites sub domain is not www.
I know, but the code will work with sites that have links like
http://www.site.com/page
-
Mar 26th, 2006, 02:52 AM
#11
Hyperactive Member
Re: Get HyperLinks
Look at the source of most websites, Google for example:
<a href=/intl/en/about.html>About Google</a>
<a href="/ads/">Advertising Programs</a>
<a href=/language_tools?hl=en>Language Tools</a>
<a href=/preferences?hl=en>Preferences</a>
Etc.
-
Mar 26th, 2006, 03:06 AM
#12
Thread Starter
Addicted Member
Re: Get HyperLinks
|2eM!x, I appreciate the work, and it is worth a rating.
But for example, if I try to get all the threads in this section of the forum:
http://www.vbforums.com/forumdisplay.php?f=25
I will not get any threads. Sorry to bother you.
-
Mar 26th, 2006, 03:17 AM
#13
Hyperactive Member
Re: Get HyperLinks
Obviously, look at the source:
<a href="forumdisplay.php?f=8"><strong>API</strong></a>
It needs to parse <a href= instead of www.
-
Mar 26th, 2006, 03:33 AM
#14
Thread Starter
Addicted Member
Re: Get HyperLinks
Then, as usual, it was my fault.
I did not check the source; I was just hovering over the link and reading the full link in the status bar. I'm so silly, lol,
because I thought I could grab the full link from the source.
So what if I used the WebBrowser control to get the hyperlinks? Is that possible?
|2eM!x, your code is handy, man.
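On the WebBrowser question, a rough sketch, assuming a Windows Forms project on .NET 2.0 or later. The form and control names are hypothetical. The control exposes the parsed document, so the href it reports is typically the resolved address rather than the raw text in the source, but treat that as an assumption to verify on your own pages.

```vbnet
Imports System
Imports System.Windows.Forms

' Hypothetical form: loads a page in a WebBrowser control and lists its links
Public Class LinkListForm
    Inherits Form

    Private ReadOnly Browser As New WebBrowser()
    Private ReadOnly LinkList As New ListBox()

    Public Sub New()
        Browser.Dock = DockStyle.Top
        LinkList.Dock = DockStyle.Fill
        Controls.Add(LinkList)
        Controls.Add(Browser)
        AddHandler Browser.DocumentCompleted, AddressOf Browser_DocumentCompleted
        Browser.Navigate("http://www.vbforums.com/forumdisplay.php?f=25")
    End Sub

    Private Sub Browser_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs)
        If Browser.Document Is Nothing Then Return

        ' The event can fire more than once (for example once per frame),
        ' so rebuild the list each time instead of appending to it
        LinkList.Items.Clear()
        For Each Link As HtmlElement In Browser.Document.Links
            LinkList.Items.Add(Link.GetAttribute("href"))
        Next
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New LinkListForm())
    End Sub
End Class
```

This loads and renders the whole page, so it is much slower than downloading the source and parsing it with a regex.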
-
Mar 26th, 2006, 04:37 AM
#15
Re: Get HyperLinks
Did anyone see my first post? Did you visit the links? You just modify the regex in the second link: instead of the <p>...</p> tags, change it to the link tags, e.g. <a href=...</a>. You just read the page into a string (first link), then parse out the links using a regex (second link). Conan is going about it the right way in his later post (no doubt using those examples).
Last edited by gigemboy; Mar 26th, 2006 at 05:22 AM.
-
Mar 26th, 2006, 04:58 AM
#16
Re: Get HyperLinks
Here is a small sample using a simple regex that displays everything within the <a href=...> block (not including the link description and closing </a> tag), so you can see that it is parsing the right stuff. I do notice that Yahoo has some weird links in there, as you will see if you run this sample. This can easily be modified to leave out the beginning "<a href=" and the ending ">", but the code below shows the full text that it is matching on.
VB Code:
Dim returnstring As String
returnstring = SearchPage("http://www.yahoo.com")
Dim Regex As New System.Text.RegularExpressions.Regex("<a href=.*?>")
Dim Mymatches As System.Text.RegularExpressions.MatchCollection = Regex.Matches(returnstring)
For Each FoundMatch As System.Text.RegularExpressions.Match In Mymatches
    MsgBox(FoundMatch.Value)
Next
MessageBox.Show("done!")
-
Mar 26th, 2006, 05:01 AM
#17
Re: Get HyperLinks
Shouldn't it be "<a href=""(.*)"">" ?
Or "<a href=""(.+?)"">" if you do not want to match blank link targets
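A minimal sketch of pulling out just the href value with a capture group, covering both the quoted and the unquoted forms shown earlier in the thread. The module, function, and group names are made up for illustration. It uses character classes instead of .* so that a match cannot run past the closing quote or the end of the tag. A regex like this is only an approximation of real HTML parsing.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module HrefExtractor
    ' Matches <a ... href=VALUE where VALUE is double-quoted, single-quoted, or unquoted.
    ' The named group "url" holds the value without the surrounding quotes.
    Private ReadOnly HrefPattern As New Regex( _
        "<a\b[^>]*?\bhref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))", _
        RegexOptions.IgnoreCase)

    ' Hypothetical helper: returns every href value found in the given HTML text
    Public Function ExtractHrefs(ByVal html As String) As List(Of String)
        Dim result As New List(Of String)
        For Each m As Match In HrefPattern.Matches(html)
            result.Add(m.Groups("url").Value)
        Next
        Return result
    End Function

    Sub Main()
        ' Sample markup taken from the snippets quoted in this thread
        Dim sample As String = "<a href=/intl/en/about.html>About Google</a>" & _
            "<a href=""/ads/"">Advertising Programs</a>" & _
            "<a href=""forumdisplay.php?f=8""><strong>API</strong></a>"

        For Each url As String In ExtractHrefs(sample)
            Console.WriteLine(url)
        Next
    End Sub
End Module
```

The values come back exactly as written in the source, so relative links such as forumdisplay.php?f=8 stay relative.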
-
Mar 26th, 2006, 05:09 AM
#18
Re: Get HyperLinks
I was just giving a general example to show that it does work...
Last edited by gigemboy; Mar 26th, 2006 at 05:17 AM.
-
Mar 26th, 2006, 05:38 AM
#19
Thread Starter
Addicted Member
Re: Get HyperLinks
Thanks to you all.
Now I am confused about this thread. What shall I mark it as, unresolved or closed?
-
Mar 26th, 2006, 03:05 PM
#20
Re: Get HyperLinks
You tell us, hehe. Do you have it working? Are you having problems with it?
-
Mar 26th, 2006, 11:49 PM
#21
Thread Starter
Addicted Member
Re: Get HyperLinks
 Originally Posted by gigemboy
you tell us hehe  do you have it working? Are you having problems with it??
Yes, sir,
because not every page source has the full link, like:
http://www.vbforums.com/showthread.php?t=395225
In the source it looks like:
<a href="showthread.php?t=395225">
But I wonder whether there is a way to get the full link by using the WebBrowser control.
-
Mar 27th, 2006, 02:03 AM
#22
Re: Get HyperLinks
That is because it is the "link" as written in the source code. You can't change that; the source will be the same when viewed in any browser or control.
-
Apr 5th, 2006, 04:12 AM
#23
Addicted Member
Re: Get HyperLinks
Code:
<a(.*)href="http://(.*)google.com(.*)>(.*)</a>
works fine for me
-
Apr 5th, 2006, 04:16 AM
#24
Re: Get HyperLinks
That will only pull up links that have "google.com" in the name and start with "http://", which is not what he wanted.
-
Apr 5th, 2006, 04:21 AM
#25
Addicted Member
Re: Get HyperLinks
 Originally Posted by gigemboy
that will only pull up the links with "google.com" in the name, and starting with "http://", which is not what he wanted...
google.com is only there as an example.
Remove it to get a regular expression that matches links.
-
Apr 5th, 2006, 04:39 AM
#26
Re: Get HyperLinks
But the whole point is that he wants to rebuild the links into something he can just click. Some links do not include the entire address in the href attribute (relative links), so you would have to "build" a clickable link by prepending "http://", the domain name, sometimes the root folder the page is in, etc. So don't "roll" your eyes at me for misunderstanding what he wants.
View this thread for more info about the same kind of question, as well as an example of a Microsoft screen scraper project that has this type of functionality, for reference:
http://www.vbforums.com/showthread.php?t=396773
Last edited by gigemboy; Apr 5th, 2006 at 04:44 AM.
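One way to do the "building" without string concatenation is to let System.Uri resolve the href against the address of the page it came from. A minimal sketch, assuming .NET 2.0 or later; the module and function names are hypothetical.

```vbnet
Imports System

Module LinkResolver
    ' Hypothetical helper: resolves an href taken from a page against that page's URL.
    ' Absolute hrefs pass through unchanged; relative ones are combined with the base.
    Public Function ToAbsoluteUrl(ByVal pageUrl As String, ByVal href As String) As String
        Dim baseUri As New Uri(pageUrl)
        Dim resolved As Uri = Nothing
        If Uri.TryCreate(baseUri, href, resolved) Then
            Return resolved.AbsoluteUri
        End If
        ' Could not be resolved; hand back the original text
        Return href
    End Function

    Sub Main()
        Dim page As String = "http://www.vbforums.com/forumdisplay.php?f=25"

        ' Relative to the page's folder
        Console.WriteLine(ToAbsoluteUrl(page, "showthread.php?t=395225"))
        ' Relative to the site root
        Console.WriteLine(ToAbsoluteUrl(page, "/ads/"))
        ' Already absolute
        Console.WriteLine(ToAbsoluteUrl(page, "http://www.site.com/page"))
    End Sub
End Module
```

With the page address above, the first call should give http://www.vbforums.com/showthread.php?t=395225, which is the full link that shows in the status bar.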
-
Apr 5th, 2006, 05:05 AM
#27
Addicted Member
Re: Get HyperLinks
I guess he wants to make an app similar to this one: ... http://www.astanda.com