Thread: using vb to download via html links ??

  1. #1

    Thread Starter
    Frenzied Member wengang's Avatar
    Join Date
    Mar 2000
    Location
    Beijing, China
    Posts
    1,604

    using vb to download via html links ??

    I have been working on an application that downloads a webpage and then searches the HTML for links to image files.
    It works like this:

    If there is an "a href=" and a ".jpg" on the same line with no "<" or ">" between them, I assume that is a reference to an image.

    This works okay, but not always. The app has to decide whether the link is absolute (contains a "//")
    or relative (no "//").
    Also, some websites are set up to prevent this method of downloading.
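A minimal sketch of the absolute-versus-relative decision described above, assuming VB6 and the same "//" test. ResolveLink and its parameter names are hypothetical. It ignores "../" segments, query strings and any <base> tag, so treat it as a starting point rather than a complete URL resolver.

```vb
' Hypothetical helper: returns an absolute URL for a link found on pageUrl.
' Uses the "//" heuristic from the post to detect absolute links.
Public Function ResolveLink(ByVal pageUrl As String, ByVal link As String) As String
    Dim schemeEnd As Long
    Dim hostEnd As Long

    ' Absolute link: use it unchanged
    If InStr(link, "//") > 0 Then
        ResolveLink = link
        Exit Function
    End If

    ' The page URL itself must be absolute to resolve against it
    schemeEnd = InStr(pageUrl, "//")
    If schemeEnd = 0 Then
        ResolveLink = link
        Exit Function
    End If

    ' Find the first "/" after the host name; add one if the URL has no path
    hostEnd = InStr(schemeEnd + 2, pageUrl, "/")
    If hostEnd = 0 Then
        pageUrl = pageUrl & "/"
        hostEnd = Len(pageUrl)
    End If

    If Left$(link, 1) = "/" Then
        ' Root-relative link: append to "scheme://host"
        ResolveLink = Left$(pageUrl, hostEnd - 1) & link
    Else
        ' Document-relative link: append to the page's folder
        ResolveLink = Left$(pageUrl, InStrRev(pageUrl, "/")) & link
    End If
End Function
```

For example, ResolveLink("http://example.com/gallery/index.htm", "pics/a.jpg") builds a URL inside the gallery folder, while a link starting with "/" is attached directly to the host. The example.com address is a placeholder.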

    So, this is my idea: once the HTML page has downloaded, open it in the app itself (I know there is a control for that, I just don't remember which one).

    Then the question is: can I make VB emulate clicking a link to a file, save the file to disk, and go back to the HTML page?

    If you can help, I'd appreciate it.
    Thanks
    Wengang
    Wen Gang, Programmer
    VB6, QB, HTML, ASP, VBScript, Visual C++, Java

  2. #2
    Frenzied Member
    Join Date
    Mar 2001
    Location
    You are HERE •™
    Posts
    1,300
    If I understand you correctly, you would like to parse out links like this:
    http://dallas.citynews.com/8647.gif (That's one of my Bloodeye Gifs)

    This is an anchor element that links to a GIF file. Are these the kind of links that you want to parse out?

    If they are, then I would use a WebBrowser control and the URLDownloadToFile API.
    - Use the WebBrowser to Navigate to a page that has "Image Links".
    - Once the links have been parsed out, use the URLDownloadToFile API to download each image file to your HD.
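For reference, a sketch of how URLDownloadToFile is usually declared and called from VB6. The Declare statement is the standard ANSI entry point in urlmon.dll. The DownloadFile wrapper and the local path in the usage note are assumptions added for illustration.

```vb
' Win32 API declaration (urlmon.dll); place it at the top of a standard module.
Public Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" ( _
    ByVal pCaller As Long, _
    ByVal szURL As String, _
    ByVal szFileName As String, _
    ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long

' Hypothetical wrapper: True when the API reports success (0 = S_OK).
Public Function DownloadFile(ByVal url As String, ByVal localPath As String) As Boolean
    DownloadFile = (URLDownloadToFile(0, url, localPath, 0, 0) = 0)
End Function
```

A call might look like DownloadFile("http://dallas.citynews.com/8647.gif", "C:\Temp\8647.gif"), where the target folder is assumed to exist already. The call blocks until the download finishes or fails.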

  3. #3

    wengang (Thread Starter)
    Hey, thanks.
    It sounds like what I was getting at,
    but I hope you're available later because I'm sure more questions will come up soon.

    Thanks again.

  4. #4
    Bloodeye
    Check out this link for getting every link in a web-page using the WebBrowser Control. You could adapt this code to check for an Image file extension on every link.

    http://forums.vb-world.net/showthrea...threadid=74893
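One way the image-extension check could be written, as a sketch only. IsImageUrl and the list of extensions are assumptions, not code from the linked thread. It strips any query string or fragment first, then compares the end of the URL case-insensitively.

```vb
' Hypothetical helper: True when the URL appears to point at an image file.
Public Function IsImageUrl(ByVal url As String) As Boolean
    Dim cut As Long
    Dim ext As Variant

    ' Ignore anything after "?" or "#" so "pic.jpg?size=2" still matches
    cut = InStr(url, "?")
    If cut > 0 Then url = Left$(url, cut - 1)
    cut = InStr(url, "#")
    If cut > 0 Then url = Left$(url, cut - 1)

    url = LCase$(url)
    For Each ext In Array(".jpg", ".jpeg", ".gif", ".png", ".bmp")
        If Right$(url, Len(ext)) = ext Then
            IsImageUrl = True
            Exit Function
        End If
    Next ext
End Function
```

Inside a loop over the page's links, the test would be something like: If IsImageUrl(a.href) Then List1.AddItem a.href.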

  5. #5

    wengang (Thread Starter)
    I found these 2 subs on the website you mentioned earlier:

    Public Static Sub ListAllLinkUrls(doc as HTMLdocument, List As ListBox)
        If doc.frames.length = 0 Then
            For i = 0 To doc.links.length - 1
                List.AddItem doc.links(i).href
            Next i
        Else
            For i = 0 To doc.frames.length - 1
                ListAllLinks doc.frames(i), List
            Next i
        End If
    End Sub

    Public Static Sub ListAllLinks(doc as HTMLdocument, List As ListBox)
        If doc.frames.length = 0 Then
            For i = 0 To doc.links.length - 1
                List.AddItem doc.links(i).outerText
            Next i
        Else
            For i = 0 To doc.frames.length - 1
                ListAllLinks doc.frames(i), List
            Next i
        End If
    End Sub

    But I got stuck on HTMLDocument. That type doesn't exist;
    it is treated as a user-defined type, but I haven't defined it or set it to any object.
    The usage code is for the WebBrowser and a list box, like this:

    ListAllLinks WebBrowser1.Document, List1

    and all links on the page should be listed in the list box (or the link text, if using the second sub).

    Any ideas?

  6. #6
    egiggey
    Guest
    You have to add a reference to the Microsoft HTML Object Library,
    and that will fix it.

  7. #7
    Bloodeye
    You will need to make a reference to MSHTML.tlb (the Microsoft HTML Object Library).
    You will also need to Set doc = WebBrowser1.Document in the DocumentComplete event.

    The same applies to this code from the last link I posted.
    VB Code:
    Dim Doc As New HTMLDocument
    Dim e As HTMLGenericElement
    Dim a As HTMLAnchorElement
    Dim x As Integer

    Private Sub Form_Load()
        WebBrowser1.Navigate "http://altavista.com/"
    End Sub

    Private Sub Form_Unload(Cancel As Integer)
        Set Doc = Nothing
    End Sub

    Private Sub WebBrowser1_BeforeNavigate2(ByVal pDisp As Object, _
        URL As Variant, Flags As Variant, TargetFrameName As Variant, _
        PostData As Variant, Headers As Variant, Cancel As Boolean)
        ' Reset the document reference and the link counter before each navigation
        Set Doc = Nothing
        x = 0
    End Sub

    Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
        Set Doc = WebBrowser1.Document
        ' Only act once the top-level document (not an individual frame) has completed
        If (pDisp Is WebBrowser1.Object) Then
            For Each e In Doc.All
                If e.tagName = "A" Then
                    Set a = e
                    x = x + 1
                    List1.AddItem x & "-" & a.href
                End If
            Next
        End If
    End Sub

  8. #8
    egiggey
    Guest
    Hey Bloodeye,

    Got a question while you're on the subject. That code works great on most sites. Can it be modified to go through each frame on sites that use frames?

  9. #9
    Bloodeye
    Well I'm not on my VB computer to test, but this comes to mind.

    VB Code:
    Dim f As HTMLFrameElement

    For Each f In Doc
        For Each e In Doc.All
            If e.tagName = "A" Then
                Set a = e
                x = x + 1
                List1.AddItem x & "-" & a.href
            End If
        Next
    Next

    ' It may be even something simpler, like changing this:
    '     For Each e In Doc.All
    ' to:
    '     For Each e In Doc.Frames

    Play around with it.....see what you come up with. I'll check it out tomorrow.

  10. #10

    wengang (Thread Starter)
    Oh!!
    No matter what I do with this code,
    I still get
    "Object variable or With block variable not set"
    on the first line in the sub:
    If doc.frames.length = 0 Then

    I've moved the subs to a module.

    I've added this sub:
    Private Sub wb1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
        Set doc = wb1.Document
    End Sub

    I've even Dimmed doc As HTMLDocument,

    and I've added a reference to the HTML Object Library.

    What could it be??

    Thanks

  11. #11
    Bloodeye
    egiggey - This seems to work for me.

    VB Code:
    Dim Doc As New HTMLDocument
    Dim e As HTMLGenericElement
    Dim a As HTMLAnchorElement
    Dim x As Long

    Private Sub Form_Load()
        WebBrowser1.Navigate "http://www.hairdos.com/frameset.htm"
    End Sub

    Private Sub Form_Unload(Cancel As Integer)
        Set Doc = Nothing
    End Sub

    Private Sub WebBrowser1_BeforeNavigate2(ByVal pDisp As Object, _
        URL As Variant, Flags As Variant, TargetFrameName As Variant, _
        PostData As Variant, Headers As Variant, Cancel As Boolean)
        Set Doc = Nothing
        x = 0
    End Sub

    Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
        Dim i As Long
        Set Doc = WebBrowser1.Document

        If (pDisp Is WebBrowser1.Object) Then
            For i = 0 To Doc.frames.length - 1
                For Each e In Doc.frames(i).Document.All
                    If e.tagName = "A" Then
                        Set a = e
                        x = x + 1
                        List1.AddItem x & "-" & a.href
                    End If
                Next
            Next
        End If
    End Sub
    Last edited by Bloodeye; Aug 5th, 2001 at 09:21 AM.

  12. #12
    Bloodeye
    wengang -

    I played around with the code. I changed it a little, and it seems to work now.

    Module Code:
    VB Code:
    Public Sub ListAllLinkUrls(doc As HTMLDocument, List As ListBox)
        Dim i As Long
        Dim x As Long

        For i = 0 To doc.frames.length - 1
            For x = 0 To doc.frames(i).Document.links.length - 1
                List.AddItem doc.frames(i).Document.links(x).href
            Next x
        Next i
    End Sub

    Public Sub ListAllLinks(doc As HTMLDocument, List As ListBox)
        Dim i As Long
        Dim x As Long

        For i = 0 To doc.frames.length - 1
            For x = 0 To doc.frames(i).Document.links.length - 1
                List.AddItem doc.frames(i).Document.links(x).outerText
            Next x
        Next i
    End Sub

    Form Code:
    VB Code:
    Dim doc As HTMLDocument

    Private Sub Form_Load()
        WebBrowser1.Navigate "http://www.hairdos.com/frameset.htm"
    End Sub

    Private Sub Form_Unload(Cancel As Integer)
        Set doc = Nothing
    End Sub

    Private Sub WebBrowser1_BeforeNavigate2(ByVal pDisp As Object, _
        URL As Variant, Flags As Variant, TargetFrameName As Variant, _
        PostData As Variant, Headers As Variant, Cancel As Boolean)
        Set doc = Nothing
    End Sub

    Private Sub WebBrowser1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
        Set doc = WebBrowser1.Document
        If (pDisp Is WebBrowser1.Object) Then
            ListAllLinks doc, List1
            'ListAllLinkUrls doc, List1
        End If
    End Sub

  13. #13

    wengang (Thread Starter)
    Hello again.

    I have tried with all the above code added and it still isn't working. (No error message now, but List3 is simply not populating.)

    I modified the DocumentComplete sub like so:

    Private Sub wb1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
        Set doc = wb1.Document
        MsgBox ("Reaching sub")
        If (pDisp Is wb1.Object) Then
            MsgBox ("reaching here")
            ListAllLinks doc, List3
        End If
    End Sub

    In Form_Load I call:
    wb1.Navigate "http://www.yahoo.com"
    and then I get both messages as above, but nothing in List3.

    I am using the ListAllLinks and ListAllLinkUrls subs in a loop through List1, which is populated with the URLs I want to search. List2 holds a name for the corresponding item in List1; it is later used as the name of the folder where the pictures found at the List1 URL (presumably all added to List3) will be placed.

    So I have:
    List1 (list of URLs I want to search)
    List2 (a name for each item in List1)
    List3 (an empty list box to hold the link URLs of each item in List1)

    These are my subs on the form (the pertinent ones)

    Private Sub Form_Unload(Cancel As Integer)
        Set doc = Nothing
    End Sub

    Private Sub wb1_BeforeNavigate2(ByVal pDisp As Object, _
        URL As Variant, Flags As Variant, TargetFrameName As Variant, _
        PostData As Variant, Headers As Variant, Cancel As Boolean)
        Set doc = Nothing
    End Sub

    Private Sub wb1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
        Set doc = wb1.Document
        MsgBox ("Reaching sub")
        If (pDisp Is wb1.Object) Then
            MsgBox ("reaching here")
            ListAllLinks doc, List3
        End If
    End Sub

    Private Sub Command2_Click()
        'download all

        For t = 0 To (List1.ListCount - 1)
            List3.Clear
            wb1.Navigate List1.List(t)
            'before passing this point, List3 should be loaded with URL links, right??
            If Dir(lblDirectory.Caption & List2.List(t), vbDirectory) = "" Then MkDir (lblDirectory.Caption & List2.List(t))
            strNewFolder = lblDirectory.Caption & List2.List(t) & "\"
            For s = 0 To (List3.ListCount - 1)
                If Right$(List3.List(s), 3) = "jpg" Then
                    p = Len(List3.List(s)) + 1
                    Do
                        p = p - 1
                    Loop Until Mid(List3.List(s), p, 1) = "/"
                    LocalFile = Right$(List3.List(s), Len(List3.List(s)) - p)
                    LocalFile = strNewFolder & LocalFile
                    lblFile.Caption = List3.List(s)
                    returnValue = URLDownloadToFile(0, List3.List(s), LocalFile, 0, 0)
                End If
            Next s
            List1.List(t) = ""
            List2.List(t) = ""
        Next t
        MsgBox ("Finished")
    End Sub

    Again, after the Form_Load event, the DocumentComplete sub is never called again. The items in List1 and List2 just disappear one after the other, and List3 never populates once (even after Form_Load).

    So what could be the reason for that?
    Thanks again, by the way, for a ton of help.
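As a side note on the Command2_Click code above, the backwards Do loop that hunts for the last "/" could be replaced with VB6's built-in InStrRev. The sketch below is one possible version. FileNameFromUrl is a hypothetical name, and dropping the query string is an added assumption about what the local file name should look like.

```vb
' Hypothetical helper: returns the part of a URL after the last "/".
Public Function FileNameFromUrl(ByVal url As String) As String
    Dim cut As Long

    ' Drop any query string so it does not end up in the file name
    cut = InStr(url, "?")
    If cut > 0 Then url = Left$(url, cut - 1)

    ' InStrRev returns 0 when there is no "/", in which case the whole string is returned
    FileNameFromUrl = Mid$(url, InStrRev(url, "/") + 1)
End Function
```

With this helper the two LocalFile lines collapse to LocalFile = strNewFolder & FileNameFromUrl(List3.List(s)), and a URL without any "/" no longer makes Mid fail with a start position of 0.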

  14. #14
    Bloodeye
    wengang - Ok I think I have it now. I had altered the code originally so it would work with framed pages. In doing so, it made it non-functional for pages that aren't framed. It should now work for both.


    VB Code:
    Public Sub ListAllLinkUrls(doc As HTMLDocument, List As ListBox)
        Dim i As Long
        Dim x As Long

        If doc.frames.length = 0 Then
            For i = 0 To doc.links.length - 1
                List.AddItem doc.links(i).href
            Next i
        Else
            For i = 0 To doc.frames.length - 1
                For x = 0 To doc.frames(i).Document.links.length - 1
                    List.AddItem doc.frames(i).Document.links(x).href
                Next x
            Next i
        End If
    End Sub

    Public Sub ListAllLinks(doc As HTMLDocument, List As ListBox)
        Dim i As Long
        Dim x As Long

        If doc.frames.length = 0 Then
            For i = 0 To doc.links.length - 1
                If doc.links(i).outerText <> "" Then
                    List.AddItem doc.links(i).outerText
                End If
            Next i
        Else
            For i = 0 To doc.frames.length - 1
                For x = 0 To doc.frames(i).Document.links.length - 1
                    If doc.frames(i).Document.links(x).outerText <> "" Then
                        List.AddItem doc.frames(i).Document.links(x).outerText
                    End If
                Next x
            Next i
        End If
    End Sub
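The subs above look one level deep into frames and skip the outer document's own links whenever frames are present. If nested frames or pages that mix links with iframes matter, a recursive variant along these lines might help. This is a sketch under assumptions: ListLinkUrlsRecursive is a hypothetical name, doc is late bound (As Object), and frames whose document cannot be read are simply skipped.

```vb
' Hypothetical helper: adds every link URL in doc, and in any frames below it, to List.
Public Sub ListLinkUrlsRecursive(ByVal doc As Object, ByVal List As ListBox)
    Dim i As Long
    Dim child As Object

    ' Links in this document itself
    For i = 0 To doc.links.length - 1
        List.AddItem doc.links(i).href
    Next i

    ' Descend into each frame. A frame from another site may refuse access
    ' to its document, so errors are ignored for that one statement only.
    For i = 0 To doc.frames.length - 1
        Set child = Nothing
        On Error Resume Next
        Set child = doc.frames(i).Document
        On Error GoTo 0
        If Not child Is Nothing Then ListLinkUrlsRecursive child, List
    Next i
End Sub
```

It would be called the same way as the originals, for example ListLinkUrlsRecursive WebBrowser1.Document, List1 from the DocumentComplete event.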

  15. #15

    wengang (Thread Starter)
    Hey.
    Yes, the two subs above work. I have tested them with Form_Load navigating to "google.com".

    But my problem now is that I can't actually get the event to fire in my application. Specifically, when I click the command button (see code above), it should loop through each item in List1, put all links found on that page into List3, and then download the jpg files.

    Well, the DocumentComplete event is not firing at all.

    Is it because the document is not finishing?

    Does DocumentComplete mean the same thing as "Done" in IE, or does it mean the HTML has fully downloaded (as opposed to all the items on the page)?

    If that is the case, could it be that I need some kind of timer or delay loop to allow the WebBrowser to download the page?

    As it is now, this command button code does not work.

    What do you think?

  16. #16

    wengang (Thread Starter)
    Yep, that was it.

    It works great now!!

    I added a Boolean flag (bln) to make the command button sub wait until the DocumentComplete event had finished.
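The actual code was not posted, so the following is only a guess at what such a flag-based wait could look like. It assumes the thread's wb1 and List3 controls, the ListAllLinkUrls sub from post #14, and a reference to the Microsoft HTML Object Library. mPageDone, LoadPageAndWait and the timeout are hypothetical additions.

```vb
' Form code sketch
Private doc As HTMLDocument
Private mPageDone As Boolean        ' hypothetical name for the "bln" flag

Private Sub wb1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    ' The event fires once per frame; only the top-level document marks the page as done
    If pDisp Is wb1.Object Then
        Set doc = wb1.Document
        ListAllLinkUrls doc, List3  ' URLs (href), not the link text
        mPageDone = True
    End If
End Sub

' Hypothetical helper: starts a navigation and waits until DocumentComplete has run.
' Returns False if the page has not finished within timeoutSeconds.
Private Function LoadPageAndWait(ByVal pageUrl As String, _
                                 ByVal timeoutSeconds As Single) As Boolean
    Dim started As Single

    mPageDone = False
    List3.Clear
    wb1.Navigate pageUrl            ' returns immediately; the page loads in the background

    started = Timer                 ' seconds since midnight; rollover is not handled here
    Do Until mPageDone
        If Timer - started > timeoutSeconds Then Exit Function
        DoEvents                    ' lets the WebBrowser events fire while we wait
    Loop
    LoadPageAndWait = True
End Function
```

In Command2_Click, the line wb1.Navigate List1.List(t) would then become something like If LoadPageAndWait(List1.List(t), 30) Then, followed by the download loop. The wait is needed because Navigate does not block, so without it the inner loop runs while List3 is still empty.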

    Wow! Once I fine tune this I'll email it to you.
    (not that you need it, but just to say thanks).
