Oct 29th, 2008, 02:14 AM
#1
Thread Starter
Junior Member
File Download [RESOLVED]
So I'm using code from http://schmidt.devlib.org/java/file-download.html in the hope of inputting a URL and getting the file that URL points to. It works sometimes, but 90% of the time it doesn't. Also, a lot of the pages I want to download do not end in .html, although I'm not sure whether that is a problem. Anyway, what I want to know is whether it's possible to supply a URL as a string and then download the file that URL points to. I have a feeling there's an easier way to do that, but I'm not sure.
Here's an example of a page that I want to download:
https://www.sportsbet.com.au/results/racing/Date/today
I'm writing a little program that downloads HTML files, scrapes the important information from them and writes it to a database. So far, as you might have guessed, things aren't going too well.
Code:
import java.io.*;
import java.net.*;

/*
 * Command line program to download data from URLs and save
 * it to local files. Run like this:
 * java FileDownload http://schmidt.devlib.org/java/file-download.html
 * @author Marco Schmidt
 */
public class FileDownload {
    public static void download(String address, String localFileName) {
        OutputStream out = null;
        URLConnection conn = null;
        InputStream in = null;
        // Route the request through an HTTP proxy (host and port are site-specific).
        SocketAddress sa = new InetSocketAddress("proxy.csu.edu.au", 8080);
        Proxy proxy = new Proxy(Proxy.Type.HTTP, sa);
        try {
            URL url = new URL(address);
            out = new BufferedOutputStream(
                    new FileOutputStream(localFileName));
            conn = url.openConnection(proxy);
            in = conn.getInputStream();
            // Copy the response body to the file in 1 KB chunks.
            byte[] buffer = new byte[1024];
            int numRead;
            long numWritten = 0;
            while ((numRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, numRead);
                numWritten += numRead;
            }
            System.out.println(localFileName + "\t" + numWritten);
        } catch (Exception exception) {
            exception.printStackTrace();
        } finally {
            try {
                if (in != null) {
                    in.close();
                }
                if (out != null) {
                    out.close();
                }
            } catch (IOException ioe) {
                // Errors while closing the streams are silently ignored.
            }
        }
    }

    // Uses the text after the last '/' in the URL as the local file name.
    public static void download(String address) {
        int lastSlashIndex = address.lastIndexOf('/');
        if (lastSlashIndex >= 0 &&
                lastSlashIndex < address.length() - 1) {
            download(address, address.substring(lastSlashIndex + 1));
        } else {
            System.err.println("Could not figure out local file name for " +
                    address);
        }
    }

    public static void main(String[] args) {
        download("http://schmidt.devlib.org/java/file-download.html");
    }
}
If anyone has any ideas about another way to download files, or any other tips, they would be greatly appreciated.
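For the "another way" question, one option on Java 7 or later is to let `Files.copy` do the read/write loop, use try-with-resources so the stream is always closed, and check the HTTP status code so a failed request is reported instead of silently saving an error page. This is a minimal sketch, not the code from the linked page; the class name `SimpleDownload` and the 10-second timeouts are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class SimpleDownload {

    // Downloads the resource at 'address' into 'target' and returns the number
    // of bytes written. Throws instead of printing, so callers can tell a
    // failed download from a successful one.
    public static long download(String address, Path target) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(10_000); // assumed limits, in milliseconds
        conn.setReadTimeout(10_000);

        // For HTTP(S) URLs, refuse to save error pages (404, 500, ...) as if
        // they were the real content.
        if (conn instanceof HttpURLConnection) {
            int status = ((HttpURLConnection) conn).getResponseCode();
            if (status < 200 || status >= 300) {
                throw new IOException("HTTP " + status + " for " + address);
            }
        }

        // try-with-resources closes the stream even if the copy fails.
        try (InputStream in = conn.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java SimpleDownload <url> <output-file>");
            System.exit(1);
        }
        long bytes = download(args[0], Paths.get(args[1]));
        System.out.println(args[1] + "\t" + bytes);
    }
}
```

Run it as `java SimpleDownload <url> <output-file>`. Because `download` throws rather than calling `printStackTrace()`, a scraping step that runs afterwards can skip pages that failed instead of parsing an empty or partial file.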
Angus Cheng
Last edited by anguruso; Oct 31st, 2008 at 05:19 AM.
Reason: Add resolved tag, put a cool green tick
Oct 29th, 2008, 03:05 AM
#2
Thread Starter
Junior Member
Re: File Download
Looks like I got it to work, in some cases.
However, this code doesn't work for URLs that don't end in a file name, such as this one. If I were trying to get the HTML code behind a page like this one, how would I get it?
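Since the end goal is scraping, one way around the file-name problem is to skip the local file entirely and read the page straight into a `String`. A minimal sketch, assuming Java 9+ (for `InputStream.readAllBytes`) and that the page is UTF-8 encoded; a fuller version would take the charset from the `Content-Type` header. `PageFetcher` and `fetch` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    // Reads the whole response body into a String. No local file name is
    // needed, so it makes no difference whether the URL ends in a file name.
    // Assumes UTF-8; pages in another encoding would need the charset from
    // the Content-Type header instead.
    public static String fetch(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String html = fetch(args[0]);
        System.out.println(html.length() + " characters");
    }
}
```

The returned `String` can go straight to whatever extracts the interesting fields, and only the results need to be written to the database.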
Oct 29th, 2008, 06:38 AM
#3
Re: File Download
Well, your problem is locating file names. Your code does not check whether the URL points to a file or a directory. If you want your code to work, try the following URL instead:
https://www.sportsbet.com.au/results...day/index.html
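Following that suggestion, the local name can be taken from the URL's path, with a fallback when the path is empty or ends in `/` (a directory-style URL). This is a hypothetical helper (`LocalNames.localNameFor`); the `index.html` fallback is an assumption, since the server does not tell the client what a directory-style resource is "called". Using `URI.getPath()` also drops any `?query` part, which the `lastIndexOf('/')` approach in the original code would copy into the file name.

```java
import java.net.URI;

public class LocalNames {

    // Hypothetical helper: picks a local file name for a URL. If the path is
    // empty or ends in '/', fall back to "index.html" (an assumed default).
    public static String localNameFor(String address) {
        String path = URI.create(address).getPath(); // excludes ?query and #fragment
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return "index.html";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(localNameFor("http://schmidt.devlib.org/java/file-download.html")); // file-download.html
        System.out.println(localNameFor("http://example.com/results/")); // index.html
        System.out.println(localNameFor("http://example.com")); // index.html
    }
}
```

Note that a URL such as `https://www.sportsbet.com.au/results/racing/Date/today` still yields a name (`today`), just without an extension.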
"I'm not normally a praying man, but if you're up there, save me... Superman!" - Homer Simpson
Oct 31st, 2008, 05:18 AM
#4
Thread Starter
Junior Member
Re: File Download
Well, basically I'm a stupid idiot and spent a lot of time on something really simple. Right now everything is working just as I want it to, which is GREAT.
What I did was very simple.
1. I didn't bother with proxy settings and used a direct internet connection (might bite me in the *** later).
2. There are two download methods in the above code:
download(String address);
download(String address, String localFileName);
At first I was calling the first download method, which derives a file name from the address and then calls the second download method.
So now I call the second method directly, supplying the address of the page I want to download and a hardcoded localFileName. Everything works, and just in case anyone out there is as stupid as me (not likely), here it is:
Code:
import java.io.*;
import java.net.*;

/*
 * Command line program to download data from URLs and save
 * it to local files. Run like this:
 * java FileDownload [SOME SORT OF ADDRESS]
 * @author Marco Schmidt
 */
public class FileDownload {
    public static void download(String address, String localFileName) {
        OutputStream out = null;
        URLConnection conn = null;
        InputStream in = null;
        //SocketAddress sa = new InetSocketAddress("proxy.csu.edu.au", 8080);
        //Proxy proxy = new Proxy(Proxy.Type.HTTP, sa);
        try {
            URL url = new URL(address);
            out = new BufferedOutputStream(
                    new FileOutputStream(localFileName));
            //conn = url.openConnection(proxy);
            conn = url.openConnection();
            in = conn.getInputStream();
            byte[] buffer = new byte[1024];
            int numRead;
            long numWritten = 0;
            while ((numRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, numRead);
                numWritten += numRead;
            }
            System.out.println(localFileName + "\t" + numWritten);
        } catch (Exception exception) {
            exception.printStackTrace();
        } finally {
            try {
                if (in != null) {
                    in.close();
                }
                if (out != null) {
                    out.close();
                }
            } catch (IOException ioe) {
            }
        }
    }

    public static void download(String address) {
        int lastSlashIndex = address.lastIndexOf('/');
        if (lastSlashIndex >= 0 &&
                lastSlashIndex < address.length() - 1) {
            download(address, address.substring(lastSlashIndex + 1));
        } else {
            System.err.println("Could not figure out local file name for " +
                    address);
        }
    }

    public static void main(String[] args) {
        download("[ADDRESS]", "jur.txt");
    }
}
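The working version still hardcodes the address and output file in `main`, although its header comment says to pass the address on the command line, and the proxy is simply commented out. A sketch that takes both from the command line and makes the proxy optional might look like this; `FileDownloadCli`, `parseProxy` and the `host:port` argument format are hypothetical. With no proxy argument it calls plain `openConnection()`, which still honours the standard `-Dhttp.proxyHost` / `-Dhttp.proxyPort` system properties (`https.proxyHost` / `https.proxyPort` for HTTPS), so the proxy can be switched back on without editing the code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class FileDownloadCli {

    // Hypothetical helper: parses "host:port" (e.g. "proxy.csu.edu.au:8080")
    // into an HTTP proxy. A bad port makes Integer.parseInt throw
    // NumberFormatException, which is also an IllegalArgumentException.
    static Proxy parseProxy(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2 || args.length > 3) {
            System.err.println("Usage: java FileDownloadCli <url> <output-file> [proxyHost:port]");
            System.exit(1);
        }
        URL url = URI.create(args[0]).toURL();
        // Without a proxy argument, openConnection() uses the default proxy
        // settings (including the http.proxyHost/http.proxyPort properties).
        URLConnection conn = (args.length == 3)
                ? url.openConnection(parseProxy(args[2]))
                : url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            long n = Files.copy(in, Paths.get(args[1]), StandardCopyOption.REPLACE_EXISTING);
            System.out.println(args[1] + "\t" + n);
        }
    }
}
```

For example, `java FileDownloadCli <url> jur.txt` connects directly, while adding `proxy.csu.edu.au:8080` as a third argument goes through the proxy from the first post.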