-
Oct 10th, 2000, 09:59 PM
#1
Thread Starter
New Member
How do we extract the URL of a particular link in an HTML document?
Do we use string operations to find the word "href" before the link text and then read the URL that follows it, or is there a better method?
Using string operations to get the URL is quite cumbersome.
We would need to deal with absolute and relative links, not to mention CGI links.
Rgds,
Greg
-
Oct 10th, 2000, 10:47 PM
#2
Addicted Member
Do you want to extract it from the browser or from the "View Source" file?
-
Oct 11th, 2000, 02:01 AM
#3
Frenzied Member
I've got exactly the code you need, from Sams Teach Yourself Internet Programming with VB6 in 21 Days (great book). I modified it a bit (the code, not the book).
Parameters:
s = The source code of the HTML page (you can get it with the Inet control)
baseUrl = The URL that is prepended to relative links, for example www.vb-world.net/ (include the trailing slash)
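Note that baseUrl is simply glued onto the front of a relative link, so a root-relative link such as /forum/index.html would come out with a double slash. A minimal sketch of a more careful join is below. ResolveLink is a hypothetical helper, not something from the book, and it assumes baseUrl is the site root and ends with "/". It does not try to handle "../" paths.

```vb
Option Explicit

' Hypothetical helper (not from the book): joins baseUrl and a link taken from the page.
' Assumes baseUrl is the site root and ends with "/", e.g. "http://www.vb-world.net/".
Public Function ResolveLink(ByVal baseUrl As String, ByVal link As String) As String
    If LCase$(Left$(link, 7)) = "http://" Or LCase$(Left$(link, 8)) = "https://" Then
        'Already absolute: leave it alone
        ResolveLink = link
    ElseIf Left$(link, 1) = "/" Then
        'Root-relative: drop the leading slash so we do not produce "//"
        ResolveLink = baseUrl & Mid$(link, 2)
    Else
        'Plain relative link
        ResolveLink = baseUrl & link
    End If
End Function
```

For example, ResolveLink("http://www.vb-world.net/", "/forum/index.html") gives "http://www.vb-world.net/forum/index.html".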
Code:
Option Explicit

'Returns the links found in s as a "|"-separated string ("" if there are none).
Public Function GetLinks(s As String, baseUrl As String) As String
    Dim pos As Long, pos1 As Long, pos2 As Long
    Dim buf As String, temp As String
    Dim sq As String, dq As String
    Dim qc As String, Start As Long
    buf = ""
    'Make sure a nonempty string has been passed.
    '(A String can never be Null, so checking the length is enough.)
    If Len(s) = 0 Then
        GetLinks = buf
        Exit Function
    End If
    'Make sure there is at least one link
    Start = InStr(1, s, "<a href=", vbTextCompare)
    If Start = 0 Then
        GetLinks = buf
        Exit Function
    End If
    'Define the double and single quote characters
    dq = Chr$(34)
    sq = Chr$(39)
    Do
        'Get the first Dq or Sq after the start of the link
        pos = InStr(Start, s, dq, vbTextCompare)
        pos2 = InStr(Start, s, sq, vbTextCompare)
        If pos = 0 And pos2 = 0 Then Exit Do 'Nothing found
        If pos > 0 And pos2 > 0 Then
            If pos < pos2 Then 'It's a Dq
                qc = dq
            Else 'It's a Sq
                qc = sq
                pos = pos2
            End If
        ElseIf pos = 0 Then 'Only single quotes left
            qc = sq
            pos = pos2
        ElseIf pos2 = 0 Then 'Only double quotes left
            qc = dq
        End If
        'Find the matching closing quote
        pos1 = InStr(pos + 1, s, qc, vbTextCompare)
        If pos1 = 0 Then Exit Do
        temp = Mid$(s, pos + 1, pos1 - pos - 1)
        'Forget about FTP and Mailto links
        If LCase$(Left$(temp, 7)) = "mailto:" Or LCase$(Left$(temp, 6)) = "ftp://" Then
            GoTo DoNotAdd
        End If
        'See if it's a full URL, if not add the base Url
        If LCase$(Left$(temp, 7)) <> "http://" And LCase$(Left$(temp, 8)) <> "https://" Then
            temp = baseUrl & temp
        End If
        'Strip off anything following a # or ?
        pos = InStr(1, temp, "#")
        If pos > 0 Then
            temp = Left$(temp, pos - 1)
        End If
        pos = InStr(1, temp, "?")
        If pos > 0 Then
            temp = Left$(temp, pos - 1)
        End If
        buf = buf & temp & "|"
DoNotAdd:
        'Locate the next link
        pos = InStr(pos1, s, "<a href=", vbTextCompare)
        Start = pos
        'If there are no more links then quit
        If pos = 0 Then Exit Do
        DoEvents
    Loop While True
    'Strip off the trailing | (buf is empty if every link was skipped)
    If Len(buf) > 0 Then buf = Left$(buf, Len(buf) - 1)
    GetLinks = buf
    'MsgBox buf
End Function
Code:
'USAGE: (this is just an example!)
Dim mystr As String
Inet1.URL = "http://www.cool.com/"
mystr = Inet1.OpenURL
MsgBox GetLinks(mystr, Inet1.RemoteHost & "/")
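Since GetLinks hands back all the links in one string separated by "|", you will usually want to break it apart again. A small sketch using the built-in Split function is below. PrintLinks is a made-up name for illustration, and the check for an empty string covers the case where no links were found.

```vb
Option Explicit

' Hypothetical helper: prints each link from a "|"-separated list on its own line.
Public Sub PrintLinks(ByVal linkList As String)
    Dim links() As String
    Dim i As Long
    'Nothing to do if no links were found
    If Len(linkList) = 0 Then Exit Sub
    links = Split(linkList, "|")
    For i = LBound(links) To UBound(links)
        Debug.Print links(i)
    Next i
End Sub
```

You would call it as PrintLinks GetLinks(mystr, Inet1.RemoteHost & "/") and read the result in the Immediate window.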
The code may need some modification. I had a modified version but lost it somehow; I will post it later if I find it!
Again, all respect, flowers, presents, money, kisses, pies and of course the new Ferrari go to the great Peter Aitken, the author of the book Sams Teach Yourself Internet Programming with VB6 in 21 Days! BUY IT!!!
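If the page can be loaded in a browser control instead of being fetched as text, string parsing may not be needed at all. The sketch below assumes a form with a WebBrowser control named WebBrowser1 (added via Project > Components > Microsoft Internet Controls) and walks the document's links collection. As far as I know the href property there comes back already resolved to an absolute URL, but treat this as a starting point rather than tested production code.

```vb
Option Explicit

' Assumes a form containing a WebBrowser control named WebBrowser1.
Private Sub Form_Load()
    WebBrowser1.Navigate "http://www.cool.com/"
End Sub

Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Dim lnk As Object
    'The event fires once per frame; only act on the top-level document
    If Not (pDisp Is WebBrowser1.Object) Then Exit Sub
    'Document.links holds every link element on the page
    For Each lnk In WebBrowser1.Document.links
        Debug.Print lnk.href
    Next lnk
End Sub
```

The trade-off is that the page has to be fully loaded by the browser control first, which is slower than fetching the source with the Inet control.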
Hope it helped ya!
[Edited by Jop on 10-11-2000 at 03:07 AM]
Jop - validweb.nl
Alcohol doesn't solve any problems, but then again, neither does milk.