Thread: Download files attached to a website

  1. #1

    Thread Starter
    Member
    Join Date
    Nov 2011
    Posts
    34

    Download files attached to a website

    Can anyone help me with a VB program to download files linked from a website? The idea is that the user enters the URL of the website, the program lists the files it finds, and the user clicks a button to save them all to disk. The file types that I want it to show and download are:

    • Image Files
    • CSS Files
    • HTML Files
    • JS Files


    If anyone can help at all it would be greatly appreciated.

    Thanks,
    Benedict3578

  2. #2
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Download files attached to a website

    Most web sites won't provide direct access to style sheets and script files so, at most, you will be able to get HTML files and images. Many web sites won't even have HTML files because their pages are dynamic and generated by PHP or ASP.NET or the like. I think that you might want to think about this a bit more to determine whether it's viable and worthwhile.
    Why is my data not saved to my database? | MSDN Data Walkthroughs
    VBForums Database Development FAQ
    My CodeBank Submissions: VB | C#
    My Blog: Data Among Multiple Forms (3 parts)
    Beginner Tutorials: VB | C# | SQL

  3. #3

    Thread Starter
    Member
    Join Date
    Nov 2011
    Posts
    34

    Re: Download files attached to a website

    Quote: Originally posted by jmcilhinney
    Most web sites won't provide direct access to style sheets and script files so, at most, you will be able to get HTML files and images. Many web sites won't even have HTML files because their pages are dynamic and generated by PHP or ASP.NET or the like. I think that you might want to think about this a bit more to determine whether it's viable and worthwhile.
    Just using the Safari web inspector I can list and view over 20 different .js and four .css files on this website. I have also tried on many other websites and I seem to be able to list the .js and .css files, and it appears that I can see all of them as I have tested it on my own server. The main interest is in image files so even if the program could only access those it would be fine.

  4. #4
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Download files attached to a website

    You have to be able to access the style sheets and scripts in order for them to be used in the browser, but I think you'll find that the browser is loading a page and then checking its header to find the style sheets and scripts it references. It's not actually going to the web server and browsing its folders, because any decent admin will have turned directory browsing off. That means that you can specify the URL of a site and download the default page, but from there you can only crawl the pages. It's not like an FTP site, where you can get a file or folder listing with a single command.
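
    What the browser does with a page can be sketched roughly as follows. This is a minimal illustration in Python rather than VB, using only the standard library; the names ResourceCollector and find_resources are hypothetical, and the choice of tags to scan (img, script, link, a) is an assumption about which references matter here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects the URLs a page references, grouped by kind (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = {"image": [], "script": [], "stylesheet": [], "page": []}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with lowercased names.
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self._add("image", attrs["src"])
        elif tag == "script" and attrs.get("src"):
            self._add("script", attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # Only <link rel="stylesheet"> counts as a style sheet.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self._add("stylesheet", attrs["href"])
        elif tag == "a" and attrs.get("href"):
            self._add("page", attrs["href"])

    def _add(self, kind, reference):
        # Relative references are resolved against the URL of the page itself.
        self.resources[kind].append(urljoin(self.base_url, reference))


def find_resources(html, base_url):
    """Return a dict mapping each kind of resource to a list of absolute URLs."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.resources
```

    Only files that the page actually references are found this way; anything on the server that no page links to stays invisible, which is the difference from an FTP listing.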

  5. #5
    Lively Member
    Join Date
    Oct 2012
    Posts
    71

    Re: Download files attached to a website

    Just download Internet Download Manager :P

  6. #6

    Thread Starter
    Member
    Join Date
    Nov 2011
    Posts
    34

    Re: Download files attached to a website

    Quote: Originally posted by jmcilhinney
    You have to be able to access the style sheets and scripts in order for them to be used in the browser, but I think you'll find that the browser is loading a page and then checking its header to find the style sheets and scripts it references. It's not actually going to the web server and browsing its folders, because any decent admin will have turned directory browsing off. That means that you can specify the URL of a site and download the default page, but from there you can only crawl the pages. It's not like an FTP site, where you can get a file or folder listing with a single command.
    That is all I need, though. If my program can look through the headers and find the files listed there, that is fine, even if they are not all the files on the server. The only files needed are the ones that the HTML file requires to load.

  7. #7

    Thread Starter
    Member
    Join Date
    Nov 2011
    Posts
    34

    Re: Download files attached to a website

    Quote: Originally posted by nosewey
    Just download Internet Download Manager :P
    The point is that I need to develop my own program to do it, because the download part is only step one in what will be a complex program. I can already download the files, but my program has to be able to do it itself.

  8. #8
    Super Moderator jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Download files attached to a website

    In that case, you can use a WebClient to download a file from a URL. If you provide just the domain then it will download the default document, e.g. Index.htm or Default.aspx. You can then use the HTML Agility Pack to load the document into a DOM and examine its contents from there.

    http://htmlagilitypack.codeplex.com/
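
    The download-and-save step might look something like the sketch below. It is written in Python rather than VB, so urlopen stands in for WebClient here; the names fetch_bytes, local_name and save_all are hypothetical, and the rule for picking a file name from the URL is an assumption. Files that share a name overwrite each other in this simple version.

```python
import os
import posixpath
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Download the resource at url and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def local_name(url, default="index.html"):
    """Take the last segment of the URL path as the file name.

    A bare domain or a path ending in '/' has no file name, so the default
    is used instead (the server decides what its default document really is).
    """
    name = posixpath.basename(urlparse(url).path)
    return name or default


def save_all(urls, folder, fetch=fetch_bytes):
    """Download every URL into folder and return the list of paths written.

    fetch is a parameter so that the download step can be swapped out,
    for example when trying the function without a network connection.
    """
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(folder, local_name(url))
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved
```

    Calling save_all with the list of URLs found in the page and a target folder writes one file per URL; query strings such as ?v=2 are ignored when the file name is chosen.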

  9. #9

    Thread Starter
    Member
    Join Date
    Nov 2011
    Posts
    34

    Re: Download files attached to a website

    Quote: Originally posted by jmcilhinney
    In that case, you can use a WebClient to download a file from a URL. If you provide just the domain then it will download the default document, e.g. Index.htm or Default.aspx. You can then use the HTML Agility Pack to load the document into a DOM and examine its contents from there.

    http://htmlagilitypack.codeplex.com/
    Awesome. Thanks for the help!
