
Thread: Web crawler...

  1. #1

    Thread Starter
    Member
    Join Date
    Nov 1999
    Posts
    44
    Hello,

    Trying to figure out a way to build a web-crawler to find broken links on a web site...

    I am familiar with the simplicity of the FTP functions of getting a file... and with the built-in browser feature...

    Is there a way to download an HTML file by itself and store it in a string or array?

    Thanks so much,
    Scott

  2. #2
    Addicted Member
    Join Date
    Aug 2000
    Posts
    208
    Visual Basic is not the best language for that, but
    I believe it's possible.
    Try the INET control.




  3. #3
    Guest
    Or you could use the WebBrowser control to get and store the HTML page like this:

    Code:
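    Private Sub Command1_Click()
        ' Assumes WebBrowser1 has already finished loading a page
        Dim readpage As String
        ' innerHTML of the root element gives the page markup as one string
        readpage = WebBrowser1.Document.documentElement.innerHTML
    End Sub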
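
    Since the goal is finding broken links, the same control can also list the link targets directly instead of parsing the HTML string. A minimal sketch, assuming a form with the WebBrowser control named WebBrowser1 and a second button named Command2 (a hypothetical name); it only prints the URLs and does not test them:

    ```vb
    ' Sketch: print every link target on the page loaded in WebBrowser1.
    ' Command2 is a hypothetical button name used for illustration.
    Private Sub Command2_Click()
        Dim lnk As Object

        ' Do nothing if the page has not finished loading yet.
        If WebBrowser1.ReadyState <> READYSTATE_COMPLETE Then Exit Sub

        ' Document.links holds the page's <a href> and <area href> elements.
        For Each lnk In WebBrowser1.Document.links
            Debug.Print lnk.href
        Next lnk
    End Sub
    ```

    Each printed URL could then be requested in turn to see whether it still resolves.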

  4. #4
    Junior Member
    Join Date
    Sep 2000
    Posts
    25

    I've got VB Webcrawler code.....

    I'll email it to you.... Let me know if it helps.

    The code is NOT mine. I got it from "Visual Basic Programmers Library" by JAMSA Press. I highly recommend the book, which I bought for $54.95 at Borders.

    However, I'd buy it through http://www.bestbookbuys.com . If you look here ( http://www.bestbookbuys.com/cgi-bin/...26&search.y=13 ) you can get it delivered to you for $24.95!

    Let me know if it helps.....

  5. #5
    Junior Member
    Join Date
    Sep 2000
    Posts
    25

    Scotty! Come in, Scotty!

    Oops! Ya didn't leave an email. Even a sophisticated machine like a transporter can't get the code to ya with no coordinates.

    Code ready.

    Standing by for coordinates....

  6. #6
    Guest
    If you were to use the INET control, you'd put the webpage into a string like this:

    Code:
    Private Sub Command1_Click()
        Dim ReturnStr As String
        Dim websource As String
        ' OpenURL may return only the first part of the page
        websource = Inet1.OpenURL("http://forums.vb-world.net", icString)
        ' Keep reading 2048-byte chunks until nothing is left
        ReturnStr = Inet1.GetChunk(2048, icString)
        Do While Len(ReturnStr) <> 0
            DoEvents
            websource = websource & ReturnStr
            ReturnStr = Inet1.GetChunk(2048, icString)
        Loop
        ' websource now holds the complete page source
    End Sub
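
    Once the page is in a string, a next step toward finding broken links could be pulling out the href values. A rough sketch using only built-in string functions; ExtractHrefs is a hypothetical helper name, and it assumes every attribute is written as href="..." with double quotes, so single-quoted or unquoted links are missed:

    ```vb
    ' Sketch: collect the href="..." values found in an HTML string.
    ' ExtractHrefs is a hypothetical helper; it handles double-quoted values only.
    Private Function ExtractHrefs(ByVal html As String) As Collection
        Dim found As New Collection
        Dim pos As Long
        Dim startPos As Long
        Dim endPos As Long

        ' Case-insensitive search so HREF= and href= both match.
        pos = InStr(1, html, "href=""", vbTextCompare)
        Do While pos > 0
            startPos = pos + Len("href=""")          ' first character of the value
            endPos = InStr(startPos, html, """")     ' closing quote
            If endPos = 0 Then Exit Do               ' unterminated attribute
            found.Add Mid$(html, startPos, endPos - startPos)
            pos = InStr(endPos + 1, html, "href=""", vbTextCompare)
        Loop

        Set ExtractHrefs = found
    End Function
    ```

    Calling ExtractHrefs(websource) after the loop above would return a Collection that can be walked with For Each, requesting each URL to see which ones fail. Relative links would need to be combined with the page address first.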

  7. #7

    Thread Starter
    Member
    Join Date
    Nov 1999
    Posts
    44


    My email = [email protected]

    Thanks for all the replies...

    My goal is to search inside html tags for obsolete info...

    Mostly pages that have been moved to another server.



  8. #8
    Guest


    Matthew:
    You are a life saver!
    I've been looking for this code for weeks (the Inet one).
    Everywhere I asked I got the same answer:
    strMyWebpage = Inet1.OpenURL("http://forums.vb-world.net", icString)
    And it would only return a part of the page.
    Thanks. Now back to the topic.
