-
Aug 4th, 2001, 03:47 PM
#1
Thread Starter
Frenzied Member
Using VB to download via HTML links?
I have been working on an application that downloads a web page and then searches the HTML for links to image files.
It works like this:
if an "a href=" and a ".jpg" appear on the same line with no "<" or ">" between them, I assume that is a reference to an image.
This works okay, but not always. The app has to decide whether each link is absolute (contains a "//") or relative (no "//").
Also, some websites are set up to prevent this method of downloading.
So, this is my idea: once the HTML page has downloaded, open it inside the app itself (I know there is a control for that, I just don't remember which one).
Then the question is, can I make VB emulate clicking a link to a file, save the file to disk, and go back to the HTML page?
If you can help, I'd appreciate it.
Thanks
Wengang
Wen Gang, Programmer
VB6, QB, HTML, ASP, VBScript, Visual C++, Java
-
Aug 4th, 2001, 04:00 PM
#2
Frenzied Member
If I understand you correctly, you would like to parse out links like this:
http://dallas.citynews.com/8647.gif (That's one of my Bloodeye Gifs)
This is an anchor element that links to a GIF file. Are these the kind of links that you want to parse out?
If they are, then I would use a WebBrowser control and the URLDownloadToFile API.
-Use the WebBrowser to navigate to a page that has "Image Links", then parse the links out of the loaded document.
-Once the links have been parsed out, then use URLDownloadToFile API to download the image file to your HD.
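For reference, here is a minimal sketch of the download step, assuming the usual ANSI declaration exported by urlmon.dll. The DownloadImage wrapper and the local path in the usage line are made-up names for illustration; the API itself returns 0 (S_OK) on success.

```vb
' Goes in a standard module (.bas).
Public Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" ( _
    ByVal pCaller As Long, _
    ByVal szURL As String, _
    ByVal szFileName As String, _
    ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long

' Hypothetical wrapper: returns True when the file was saved.
Public Function DownloadImage(ByVal Url As String, ByVal LocalFile As String) As Boolean
    ' 0 = S_OK; any other value is a failure HRESULT.
    DownloadImage = (URLDownloadToFile(0, Url, LocalFile, 0, 0) = 0)
End Function
```

It could be called like this, where the target folder is a placeholder that must already exist:
If Not DownloadImage("http://dallas.citynews.com/8647.gif", "C:\temp\8647.gif") Then MsgBox "Download failed"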
-
Aug 4th, 2001, 04:36 PM
#3
Thread Starter
Frenzied Member
Hey. Thanks
it sounds like what I was getting at
but I hope you're available later because I'm sure more questions will come up soon
Thanks again.
-
Aug 4th, 2001, 04:54 PM
#4
Frenzied Member
Check out this link for getting every link in a web-page using the WebBrowser Control. You could adapt this code to check for an Image file extension on every link.
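One way the extension check might look, as a sketch only: IsImageLink is a hypothetical helper name, and the list of extensions is an assumption you would adjust to taste. It ignores case and any query string, so "PIC.JPG" and "pic.jpg?x=1" both match.

```vb
' Hypothetical helper: True when Url appears to point at an image file.
Public Function IsImageLink(ByVal Url As String) As Boolean
    Dim p As Long
    Dim ext As String

    ' Drop any query string or fragment before looking at the extension.
    p = InStr(Url, "?")
    If p > 0 Then Url = Left$(Url, p - 1)
    p = InStr(Url, "#")
    If p > 0 Then Url = Left$(Url, p - 1)

    ' Take the text after the last dot, lower-cased.
    p = InStrRev(Url, ".")
    If p = 0 Then Exit Function
    ext = LCase$(Mid$(Url, p + 1))

    ' Assumed set of image extensions.
    Select Case ext
        Case "jpg", "jpeg", "gif", "png", "bmp"
            IsImageLink = True
    End Select
End Function
```

Inside the link loop you would then write something like: If IsImageLink(a.href) Then List1.AddItem a.href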
http://forums.vb-world.net/showthrea...threadid=74893
-
Aug 4th, 2001, 06:21 PM
#5
Thread Starter
Frenzied Member
I found these 2 subs on the website you mentioned earlier:
Public Static Sub ListAllLinkUrls(doc as HTMLdocument, List As ListBox)
    If doc.frames.length = 0 Then
        For i = 0 To doc.links.length - 1
            List.AddItem doc.links(i).href
        Next i
    Else
        For i = 0 To doc.frames.length - 1
            ListAllLinkUrls doc.frames(i), List
        Next i
    End If
End Sub
Public Static Sub ListAllLinks(doc as HTMLdocument, List As ListBox)
    If doc.frames.length = 0 Then
        For i = 0 To doc.links.length - 1
            List.AddItem doc.links(i).outerText
        Next i
    Else
        For i = 0 To doc.frames.length - 1
            ListAllLinks doc.frames(i), List
        Next i
    End If
End Sub
But I got stuck on HTMLdocument. That type doesn't exist;
it is being treated as user-defined, but I haven't defined it or set it to any object.
The usage code is for the WebBrowser and a listbox, like this:
ListAllLinks webbrowser1.document, List1
and all links on the page should be listed in the list box (or the link text, if using the second sub).
Any ideas?
-
Aug 4th, 2001, 06:42 PM
#6
You have to add a reference to the Microsoft HTML Object Library,
and that will fix it.
-
Aug 4th, 2001, 06:45 PM
#7
Frenzied Member
You will need to add a reference to MSHTML.TLB (Microsoft HTML Object Library).
You will also need to Set doc = WebBrowser1.Document in the DocumentComplete event.
The same applies to this code from the last link I posted.
VB Code:
Dim Doc As New HTMLDocument
Dim e As HTMLGenericElement
Dim a As HTMLAnchorElement
Dim x As Integer

Private Sub Form_Load()
    WebBrowser1.Navigate "http://altavista.com/"
End Sub

Private Sub Form_Unload(Cancel As Integer)
    Set Doc = Nothing
End Sub

Private Sub WebBrowser1_BeforeNavigate2(ByVal pDisp As Object, _
    URL As Variant, Flags As Variant, TargetFrameName As Variant, _
    PostData As Variant, Headers As Variant, Cancel As Boolean)
    ' Drop the old document and restart the link counter.
    Set Doc = Nothing
    x = 0
End Sub

Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Set Doc = WebBrowser1.Document
    ' Only act once the top-level page has finished loading.
    If (pDisp Is WebBrowser1.Object) Then
        For Each e In Doc.All
            If e.tagName = "A" Then
                Set a = e
                x = x + 1
                List1.AddItem x & "-" & a.href
            End If
        Next
    End If
End Sub
-
Aug 4th, 2001, 06:49 PM
#8
Hey Bloodeye,
got a question while you're on the subject. That code works great on most sites; can it be modified to go through each frame on sites that use frames?
-
Aug 4th, 2001, 07:05 PM
#9
Frenzied Member
Well I'm not on my VB computer to test, but this comes to mind.
VB Code:
' Untested: walk each frame and scan that frame's own document.
Dim i As Long
For i = 0 To Doc.frames.length - 1
    For Each e In Doc.frames(i).Document.All
        If e.tagName = "A" Then
            Set a = e
            x = x + 1
            List1.AddItem x & "-" & a.href
        End If
    Next
Next
'It may even be something simpler, like changing this:
For Each e In Doc.All
'to:
For Each e In Doc.Frames
Play around with it... see what you come up with. I'll check it out tomorrow.
-
Aug 5th, 2001, 02:38 AM
#10
Thread Starter
Frenzied Member
Oh!!
No matter what I do with this code,
I still get
Object variable or With block variable not set
This is on the first line in the sub:
If doc.frames.length = 0 Then
I've moved the subs to a module,
I've added this sub:
Private Sub wb1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Set doc = wb1.Document
End Sub
I've even Dimmed doc As HTMLDocument,
and I've added a reference to the HTML Object Library.
What could it be??
Thanks
-
Aug 5th, 2001, 09:18 AM
#11
Frenzied Member
egiggey - This seems to work for me.
VB Code:
Dim Doc As New HTMLDocument
Dim e As HTMLGenericElement
Dim a As HTMLAnchorElement
Dim x As Long

Private Sub Form_Load()
    WebBrowser1.Navigate "http://www.hairdos.com/frameset.htm"
End Sub

Private Sub Form_Unload(Cancel As Integer)
    Set Doc = Nothing
End Sub

Private Sub WebBrowser1_BeforeNavigate2(ByVal pDisp As Object, _
    URL As Variant, Flags As Variant, TargetFrameName As Variant, _
    PostData As Variant, Headers As Variant, Cancel As Boolean)
    Set Doc = Nothing
    x = 0
End Sub

Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Dim i As Long
    Set Doc = WebBrowser1.Document
    If (pDisp Is WebBrowser1.Object) Then
        For i = 0 To Doc.frames.length - 1
            For Each e In Doc.frames(i).Document.All
                If e.tagName = "A" Then
                    Set a = e
                    x = x + 1
                    List1.AddItem x & "-" & a.href
                End If
            Next
        Next
    End If
End Sub
Last edited by Bloodeye; Aug 5th, 2001 at 09:21 AM.
-
Aug 5th, 2001, 10:01 AM
#12
Frenzied Member
wengang -
I played around with the code. I changed it a little, it seems to work now.
Module Code:
VB Code:
Public Sub ListAllLinkUrls(doc As HTMLDocument, List As ListBox)
    Dim i As Long
    Dim x As Long
    For i = 0 To doc.frames.length - 1
        For x = 0 To doc.frames(i).Document.links.length - 1
            List.AddItem doc.frames(i).Document.links(x).href
        Next x
    Next i
End Sub

Public Sub ListAllLinks(doc As HTMLDocument, List As ListBox)
    Dim i As Long
    Dim x As Long
    For i = 0 To doc.frames.length - 1
        For x = 0 To doc.frames(i).Document.links.length - 1
            List.AddItem doc.frames(i).Document.links(x).outerText
        Next x
    Next i
End Sub
Form Code:
VB Code:
Dim doc As HTMLDocument

Private Sub Form_Load()
    WebBrowser1.Navigate "http://www.hairdos.com/frameset.htm"
End Sub

Private Sub Form_Unload(Cancel As Integer)
    Set doc = Nothing
End Sub

Private Sub WebBrowser1_BeforeNavigate2(ByVal pDisp As Object, _
    URL As Variant, Flags As Variant, TargetFrameName As Variant, _
    PostData As Variant, Headers As Variant, Cancel As Boolean)
    Set doc = Nothing
End Sub

Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Set doc = WebBrowser1.Document
    If (pDisp Is WebBrowser1.Object) Then
        ListAllLinks doc, List1
        'ListAllLinkUrls doc, List1
    End If
End Sub
-
Aug 5th, 2001, 04:57 PM
#13
Thread Starter
Frenzied Member
Hello again.
I have tried with all the above code added and it still isn't working. (No error message now, but List3 is simply not populating.)
I modified the DocumentComplete sub like so:
Private Sub wb1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Set doc = wb1.Document
    MsgBox ("Reaching sub")
    If (pDisp Is wb1.Object) Then
        MsgBox ("reaching here")
        ListAllLinks doc, List3
    End If
End Sub
In Form_Load I set:
wb1.Navigate "http://www.yahoo.com"
and then I get both messages as above, but nothing in List3.
I am using the ListAllLinks and ListAllLinkUrls subs in a loop through List1, which is populated with the URLs I want to search. List2 holds a name for the corresponding item in List1; it is later used as the name of the folder where the pictures found at the List1 URL (presumably all added to List3) will be placed.
So I have:
List1 (list of URLs I want to search)
List2 (a name for each item in List1)
List3 (an empty list box to hold the link URLs of each item in List1)
These are the pertinent subs on the form:
Private Sub Form_Unload(Cancel As Integer)
    Set doc = Nothing
End Sub

Private Sub wb1_BeforeNavigate2(ByVal pDisp As Object, _
    URL As Variant, Flags As Variant, TargetFrameName As Variant, _
    PostData As Variant, Headers As Variant, Cancel As Boolean)
    Set doc = Nothing
End Sub

Private Sub wb1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Set doc = wb1.Document
    MsgBox ("Reaching sub")
    If (pDisp Is wb1.Object) Then
        MsgBox ("reaching here")
        ListAllLinks doc, List3
    End If
End Sub

Private Sub Command2_Click()
    'download all
    For t = 0 To (List1.ListCount - 1)
        List3.Clear
        wb1.Navigate List1.List(t)
        'before passing this point, List3 should be loaded with URL links, right??
        If Dir(lblDirectory.Caption & List2.List(t), vbDirectory) = "" Then MkDir (lblDirectory.Caption & List2.List(t))
        strNewFolder = lblDirectory.Caption & List2.List(t) & "\"
        For s = 0 To (List3.ListCount - 1)
            If Right$(List3.List(s), 3) = "jpg" Then
                p = Len(List3.List(s)) + 1
                Do
                    p = p - 1
                Loop Until Mid(List3.List(s), p, 1) = "/"
                LocalFile = Right$(List3.List(s), Len(List3.List(s)) - p)
                LocalFile = strNewFolder & LocalFile
                lblFile.Caption = List3.List(s)
                returnValue = URLDownloadToFile(0, List3.List(s), LocalFile, 0, 0)
            End If
        Next s
        List1.List(t) = ""
        List2.List(t) = ""
    Next t
    MsgBox ("Finished")
End Sub
Again, after the Form_Load event, the DocumentComplete sub is never called again. The items in List1 and List2 just disappear one after the other, and List3 never populates (even after Form_Load).
So what could be the reason for that?
Thanks again, by the way, for a ton of help.
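A side note on the Do...Loop in Command2_Click that walks backwards to the last "/": VB6's InStrRev can do the same job in one call. The sketch below uses a hypothetical helper name, FileNameFromUrl. Unlike the loop, it returns an empty string when the URL has no "/" instead of running p down to 0 and raising an error in Mid.

```vb
' Hypothetical helper: returns the text after the last "/" in Url,
' or an empty string when Url contains no "/" or ends with one.
Public Function FileNameFromUrl(ByVal Url As String) As String
    Dim p As Long
    p = InStrRev(Url, "/")
    If p > 0 Then FileNameFromUrl = Mid$(Url, p + 1)
End Function
```

In the download loop this would read: LocalFile = strNewFolder & FileNameFromUrl(List3.List(s))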
-
Aug 5th, 2001, 06:15 PM
#14
Frenzied Member
wengang - Ok I think I have it now. I had altered the code originally so it would work with framed pages. In doing so, it made it non-functional for pages that aren't framed. It should now work for both.
VB Code:
Public Sub ListAllLinkUrls(doc As HTMLDocument, List As ListBox)
    Dim i As Long
    Dim x As Long
    If doc.frames.length = 0 Then
        For i = 0 To doc.links.length - 1
            List.AddItem doc.links(i).href
        Next i
    Else
        For i = 0 To doc.frames.length - 1
            For x = 0 To doc.frames(i).Document.links.length - 1
                List.AddItem doc.frames(i).Document.links(x).href
            Next x
        Next i
    End If
End Sub

Public Sub ListAllLinks(doc As HTMLDocument, List As ListBox)
    Dim i As Long
    Dim x As Long
    If doc.frames.length = 0 Then
        For i = 0 To doc.links.length - 1
            If doc.links(i).outerText <> "" Then
                List.AddItem doc.links(i).outerText
            End If
        Next i
    Else
        For i = 0 To doc.frames.length - 1
            For x = 0 To doc.frames(i).Document.links.length - 1
                If doc.frames(i).Document.links(x).outerText <> "" Then
                    List.AddItem doc.frames(i).Document.links(x).outerText
                End If
            Next x
        Next i
    End If
End Sub
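A possible variation, offered only as a sketch: both subs above look one frame level deep and repeat the inner loop in each branch. A recursive form would also cover framesets nested inside frames. CollectLinkUrls is a made-up name, and the sketch assumes every frame's Document is readable; a frame loaded from another domain may raise a permission error, which is not handled here.

```vb
Public Sub CollectLinkUrls(ByVal doc As HTMLDocument, ByVal List As ListBox)
    Dim i As Long

    ' Links in this document (none for a pure frameset page).
    For i = 0 To doc.links.length - 1
        List.AddItem doc.links(i).href
    Next i

    ' Recurse into each child frame's own document.
    For i = 0 To doc.frames.length - 1
        CollectLinkUrls doc.frames(i).Document, List
    Next i
End Sub
```

It is called the same way as the subs above: CollectLinkUrls doc, List1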
-
Aug 5th, 2001, 08:01 PM
#15
Thread Starter
Frenzied Member
Hey.
Yes, the two subs above work. I have tested them with Form_Load navigating to "google.com",
but my problem now is that I can't get the event to fire in my application. Specifically, when I click the command button (see code above), it should loop through each item in List1, put all links found on that page into List3, and then download the JPG files.
Well, the DocumentComplete event is not firing at all.
Is it because the document is not finishing?
Does DocumentComplete mean the same thing as "Done" in IE, or does it mean the HTML has fully downloaded (as opposed to all the items on the page)?
If that is the case, could it be that I need some kind of timer or delay loop to allow the WebBrowser to download the page?
As it is now, this command button code does not work.
What do you think?
-
Aug 5th, 2001, 08:09 PM
#16
Thread Starter
Frenzied Member
Yep, that was it.
It works great now!!
I added a Boolean flag to make the command button sub wait until the DocumentComplete event had finished.
Wow! Once I fine-tune this I'll email it to you.
(Not that you need it, but just to say thanks.)
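The actual change wasn't posted, so the following is only a sketch of the idea. The flag name mblnDocDone and the WaitForPage helper are invented for illustration. It assumes the form-level doc variable and the ListAllLinkUrls sub from earlier in the thread, and it has no timeout, so a page that never completes would loop forever.

```vb
' Form-level flag (hypothetical name), set by DocumentComplete.
Private mblnDocDone As Boolean

Private Sub wb1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    ' Only the top-level document counts; frames fire this event too.
    If (pDisp Is wb1.Object) Then
        Set doc = wb1.Document
        ' The URLs (not the link text) are what the download loop needs.
        ListAllLinkUrls doc, List3
        mblnDocDone = True
    End If
End Sub

' Hypothetical helper: navigate, then yield until the page has finished.
Private Sub WaitForPage(ByVal Url As String)
    mblnDocDone = False
    wb1.Navigate Url
    Do Until mblnDocDone
        DoEvents    ' lets the WebBrowser events fire while we wait
    Loop
End Sub
```

In Command2_Click, the line wb1.Navigate List1.List(t) would then become WaitForPage List1.List(t), so List3 is filled before the download loop runs.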