Download files attached to a website
Can anyone help me with a VB program to download files linked from a website? The idea is that the user enters the URL of the website, the program lists the files, and the user clicks a button to save all of them to disk. The file types that I want it to show and download are:
- Image Files
- CSS Files
- HTML Files
- JS Files
If anyone can help at all it would be greatly appreciated.
Thanks,
Benedict3578
:wave:
Re: Download files attached to a website
Most web sites won't provide direct access to style sheets and script files so, at most, you'll be able to get HTML files and images. Many web sites won't even have HTML files because their pages are dynamic and generated by PHP or ASP.NET or the like. I think that you might want to think about this a bit more to determine whether it's viable and worthwhile.
Re: Download files attached to a website
Quote: Originally Posted by jmcilhinney
Most web sites won't provide direct access to style sheets and script files so, at most, you'll be able to get HTML files and images. Many web sites won't even have HTML files because their pages are dynamic and generated by PHP or ASP.NET or the like. I think that you might want to think about this a bit more to determine whether it's viable and worthwhile.
Just using the Safari web inspector I can list and view over 20 different .js and four .css files on this website. I have also tried on many other websites and I seem to be able to list the .js and .css files, and it appears that I can see all of them as I have tested it on my own server. The main interest is in image files so even if the program could only access those it would be fine.
Re: Download files attached to a website
You have to be able to access the style sheets and scripts in order for them to be used in the browser, but I think you'll find that what's happening there is that the browser loads a page and then checks its head section to find the style sheets and scripts it references. It's not actually going to the web server and browsing, because any decent admin will have turned directory browsing off. That means that you can specify the URL of a site and download the default page, but from there you can only crawl the pages. It's not like an FTP site where you can get a file or folder listing with a single command.
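To make the "load the page, then read its markup" idea concrete, here is a minimal sketch of that step. It is written in Python with only the standard library, purely to show the logic rather than as VB code, and the names `ResourceCollector`, `find_resources` and `SAMPLE` are made up for this illustration. It assumes the references of interest are `<link rel="stylesheet">`, `<script src>`, `<img src>` and `<a href>` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects the URLs a page references: style sheets, scripts, images, links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []  # (kind, absolute_url) pairs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            self._add("css", attrs.get("href"))
        elif tag == "script":
            self._add("js", attrs.get("src"))
        elif tag == "img":
            self._add("image", attrs.get("src"))
        elif tag == "a":
            self._add("page", attrs.get("href"))

    def _add(self, kind, url):
        if url:  # inline scripts and tags without a URL are skipped
            # Resolve relative references against the page's own address.
            self.resources.append((kind, urljoin(self.base_url, url)))


def find_resources(html, base_url):
    """Return every (kind, url) pair referenced by the given HTML text."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


# Hypothetical page used only for the demonstration.
SAMPLE = """<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="js/app.js"></script>
</head><body>
<img src="images/logo.png">
<a href="about.html">About</a>
<script>var inline = true;</script>
</body></html>"""

for kind, url in find_resources(SAMPLE, "http://example.com/"):
    print(kind, url)
```

This only sees what the downloaded markup mentions, which is the limitation described above: anything on the server that no page links to stays invisible. Crawling further would mean fetching each "page" URL and running the same step on it.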
Re: Download files attached to a website
Just download Internet Download Manager :P
Re: Download files attached to a website
Quote: Originally Posted by jmcilhinney
You have to be able to access the style sheets and scripts in order for them to be used in the browser, but I think you'll find that what's happening there is that the browser loads a page and then checks its head section to find the style sheets and scripts it references. It's not actually going to the web server and browsing, because any decent admin will have turned directory browsing off. That means that you can specify the URL of a site and download the default page, but from there you can only crawl the pages. It's not like an FTP site where you can get a file or folder listing with a single command.
That is all I need though. If my program can look through the headers and find the files listed in the header even if they are not all the files in the server then that is fine. The only files needed are the ones that the HTML file requires to load.
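For the listing part, one way to sort the URLs found in a page into the four file types asked about in the first post is by file extension. This is only a rough sketch in Python (standard library, used here just to show the logic); `FILE_TYPES`, `classify` and `group_by_type` are hypothetical names, and the set of image extensions is an assumption that can be widened or narrowed.

```python
import posixpath
from urllib.parse import urlparse

# Extensions per file type; the image list is an assumption, adjust as needed.
FILE_TYPES = {
    "image": {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".ico", ".svg"},
    "css": {".css"},
    "html": {".html", ".htm"},
    "js": {".js"},
}


def classify(url):
    """Return 'image', 'css', 'html' or 'js' for a URL, or None if it matches none."""
    path = urlparse(url).path  # ignores any query string or fragment
    ext = posixpath.splitext(path)[1].lower()
    for kind, extensions in FILE_TYPES.items():
        if ext in extensions:
            return kind
    return None


def group_by_type(urls):
    """Group URLs by file type, dropping duplicates and unrecognised types."""
    groups = {kind: [] for kind in FILE_TYPES}
    for url in urls:
        kind = classify(url)
        if kind is not None and url not in groups[kind]:
            groups[kind].append(url)
    return groups
```

Going by extension means dynamically generated pages such as `.php` or `.aspx` URLs fall outside all four groups, so they would need their own rule if they should be listed as HTML.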
Re: Download files attached to a website
Quote: Originally Posted by nosewey
Just download Internet Download Manager :P
The point is that I need to develop my own program to do it because the download part is only step one in what will be a complex program. I can already download the files but the point is that my program has to be able to do it.
Re: Download files attached to a website
In that case, you can use a WebClient to download a file from a URL. If you provide the domain then it will download the default document, e.g. Index.htm or Default.aspx. You can then use the HTML Agility Pack to load the document into a DOM and examine its contents from there.
http://htmlagilitypack.codeplex.com/
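The "save all the files to disk" step could look roughly like the sketch below. It is Python with only the standard library, shown for the logic only; in VB the downloading would be done by the WebClient mentioned above. The names `fetch_bytes`, `local_name` and `save_all` are hypothetical, and the sketch assumes that a URL with no file name (a bare domain) should be saved as `index.html`.

```python
import os
import posixpath
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Download a URL and return the raw response body."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def local_name(url, default="index.html"):
    """Pick a file name for a URL; a URL without one gets the default name."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def save_all(urls, dest_dir, fetch=fetch_bytes):
    """Save every URL into dest_dir and return the list of paths written.

    `fetch` can be swapped out, e.g. for a fake downloader while testing.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(dest_dir, local_name(url))
        with open(path, "wb") as f:
            f.write(fetch(url))
        saved.append(path)
    return saved
```

Calling `save_all(urls, "downloads")` with the URLs collected from the page would write one file per URL. Two URLs that end in the same file name overwrite each other in this sketch, so a real program would want to mirror the folder structure or rename on collision.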
Re: Download files attached to a website
Quote: Originally Posted by jmcilhinney
In that case, you can use a WebClient to download a file from a URL. If you provide the domain then it will download the default document, e.g. Index.htm or Default.aspx. You can then use the HTML Agility Pack to load the document into a DOM and examine its contents from there.
http://htmlagilitypack.codeplex.com/
Awesome. Thanks for the help! :D