
Thread: How can I make a complete list of all the urls on a remote website with Excel VBA?

  1. #1

    Thread Starter
    Junior Member
    Join Date: Jan 2012
    Posts: 19

    How can I make a complete list of all the urls on a remote website with Excel VBA?

    Basically, I need a sitemap, but without the 4 or 5 extra pieces of information a sitemap stores for each page (such as how often the page is likely to change) for submission to a search engine like Google. I need a list of every page on the site and nothing else.

    I do not want to automate IE to visit a site that generates a sitemap for you: I tried that, and the site went down soon afterwards. I would prefer to figure it out and do it myself in a macro.

    It needs to work for websites that are written and hosted differently from each other. How can I do this in Excel VBA? Googling turned up SiteMapPath, but I think the only examples of it are for ASP.NET. I am having a hard time even getting started with this in VBA. Thanks!

  2. #2
    PowerPoster
    Join Date: Dec 2004
    Posts: 25,618

    Re: How can I make a complete list of all the urls on a remote website with Excel VBA

    The only way I can think of to do what you want is if you have full FTP, WebDAV or other access to the remote file system.
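With file-system access, the page list is essentially a list of files. Below is a minimal sketch in Python (used only to illustrate the idea; the same string logic ports to VBA), assuming you have already collected the remote file paths, e.g. by walking the directories with Python's `ftplib`, and that every file under the document root is served at the same relative URL. The names `paths_to_urls`, `doc_root` and `WEB_EXTENSIONS` are hypothetical. Sites whose pages are produced by URL rewriting or a CMS will not map one file to one URL, so treat this as a rough approximation.

```python
import posixpath
from urllib.parse import quote, urljoin

# Hypothetical filter: extensions treated as "web pages". Adjust per site.
WEB_EXTENSIONS = (".html", ".htm", ".php", ".asp", ".aspx")


def paths_to_urls(base_url, doc_root, paths):
    """Map remote file paths under doc_root to public URLs under base_url.

    Assumes base_url ends with "/" and that the server serves
    doc_root/x/y.html as base_url + "x/y.html" (no rewrites).
    """
    urls = []
    for path in paths:
        if not path.lower().endswith(WEB_EXTENSIONS):
            continue  # skip images, stylesheets, scripts, etc.
        relative = posixpath.relpath(path, doc_root)
        if relative == ".." or relative.startswith("../"):
            continue  # file lies outside the document root
        # Percent-encode characters such as spaces; "/" is kept as-is.
        urls.append(urljoin(base_url, quote(relative)))
    return sorted(urls)
```

For example, `paths_to_urls("https://example.com/", "/var/www/html", [...])` turns `/var/www/html/about/team.html` into `https://example.com/about/team.html` and drops `.css` files.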

    Most web hosts deliberately try to prevent what you are trying to do.
    If the server has an index page (a directory listing), you can easily get all the file names from that page using IE automation, without crashing the site.
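When a server does expose an auto-generated directory listing (for example Apache's "Index of /..." pages), reading the file names off it comes down to collecting the `href` of every link. Here is a minimal Python sketch of that parsing step, assuming the listing HTML has already been downloaded. `list_index_page` is a hypothetical name, and the rules for skipping column-sort links (`?C=N;O=D`) and the parent-directory link are assumptions based on typical Apache listings. With IE automation in VBA, the equivalent loop would run over the document's `a` elements.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser lowercases tag and attribute names
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_index_page(index_url, html):
    """Return absolute URLs of the entries in a directory-listing page.

    Assumes index_url ends with "/" (as directory listings do).
    """
    parser = _HrefCollector()
    parser.feed(html)
    entries = []
    for href in parser.hrefs:
        if href.startswith("?"):
            continue  # column-sort links such as ?C=N;O=D
        absolute = urljoin(index_url, href)
        if not absolute.startswith(index_url) or absolute == index_url:
            continue  # parent directory or links leaving this folder
        if absolute not in entries:
            entries.append(absolute)
    return entries
```

Entries ending in `/` are subfolders, whose own listings can be read the same way.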
    pete

  3. #3
    New Member
    Join Date: Jul 2011
    Posts: 9

    Re: How can I make a complete list of all the urls on a remote website with Excel VBA

    If I were you, I'd write an application that opens the website's index page using SHDOCVW.dll (Microsoft Internet Controls) and logs every link it finds by looking for elements with <a href> tags, filtering out any whose URL does not start with the site's own address (to avoid external links) and storing the rest.
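The link-gathering step can be sketched in Python with only the standard library, assuming the page HTML is already in hand (from IE automation, an HTTP request or a saved file). Instead of a plain "starts with" test, this hypothetical `same_site_links` helper compares host names, because a prefix such as `https://example.com` would also match `https://example.com.evil.org`. Note that `www.example.com` and `example.com` count as different hosts here; that is a policy choice you may want to change.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _AnchorParser(HTMLParser):
    """Collects the href value of every <a> tag in the order encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(page_url, html):
    """Return absolute, de-duplicated links on page_url that stay on the same host."""
    parser = _AnchorParser()
    parser.feed(html)
    site = urlparse(page_url).netloc.lower()
    links = []
    for href in parser.hrefs:
        # Resolve relative links, then drop any #fragment (same page, different anchor).
        absolute, _fragment = urldefrag(urljoin(page_url, href.strip()))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.netloc.lower() != site:
            continue  # external link
        if absolute not in links:
            links.append(absolute)
    return links
```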

    Then work through each link you have logged and do the same, skipping pages you have already visited, until you have a tree of the whole site. I'm not really sure what the best way to store the data would be; perhaps create a dictionary or collection with the last part of the URL as the key and, as each item, an array containing the links on that page. If a page has no links, store an empty array or Null.
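The "work through each link" loop is a breadth-first traversal. The Python sketch below assumes a hypothetical `get_links(url)` callback that returns the same-site links on a page (for instance by fetching it and filtering its `<a href>` links as described in the previous paragraph); errors such as 404s are left for that callback to handle. The dictionary is keyed on the full URL rather than the last part, since names like `index.html` repeat across folders, and a page with no links maps to an empty list. The `max_pages` cap is an assumption meant to stop runaway crawls on sites with generated URLs such as calendars.

```python
from collections import deque


def crawl(start_url, get_links, max_pages=500):
    """Breadth-first crawl from start_url.

    get_links(url) is a caller-supplied (hypothetical) function returning the
    same-site links on that page. Returns {page_url: [links on that page]}.
    """
    site_map = {}
    queue = deque([start_url])
    while queue and len(site_map) < max_pages:
        url = queue.popleft()
        if url in site_map:
            continue  # already visited via another path; skipping avoids endless loops
        links = list(get_links(url))
        site_map[url] = links  # a page with no links maps to an empty list
        for link in links:
            if link not in site_map:
                queue.append(link)
    return site_map
```

In VBA, a `Scripting.Dictionary` keyed on the URL with a `Variant` array per item would play the same role as the Python dict.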

    Once you have a key for every page, close the browser and build your site map programmatically from the data you've collected.
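Turning the collected dictionary into the plain list the original poster asked for means taking every key plus every link value, removing duplicates and sorting. Here is a Python sketch (the function names are hypothetical) that also renders the list as one-column CSV text, which Excel can open directly. Including the link values means pages that were discovered but not visited (for example because of a page cap) still appear.

```python
import csv
import io


def site_map_to_url_list(site_map):
    """Return every URL that appears as a key or a link value, sorted and de-duplicated."""
    urls = set(site_map)
    for links in site_map.values():
        urls.update(links)
    return sorted(urls)


def urls_to_csv_text(urls):
    """Render the URLs as one-column CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["URL"])
    for url in urls:
        writer.writerow([url])
    return buffer.getvalue()
```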
