|
-
May 24th, 2004, 11:47 AM
#1
Thread Starter
Hyperactive Member
i've tried using the code that tewl kindly provided and it seems to fail when trying to get the image using the web client
Code:
private void Crawl()
{
//get source
string source = "";
HttpWebRequest hwr = (HttpWebRequest)WebRequest.Create("http://intranet");
HttpWebResponse hwrsp = (HttpWebResponse)hwr.GetResponse();
Stream s = hwrsp.GetResponseStream();
StreamReader sr = new StreamReader(s);
source += sr.ReadToEnd();
sr.Close();
s.Close();
//MessageBox.Show(source);
//get image stuff
WebClient dl = new WebClient();
string flink = "", fpath = "";
string[] f = null;
Match mMatch = Regex.Match(source, "<img([^>])src=[',\"](.*?)[',\"].*?>", RegexOptions.IgnoreCase);
while (mMatch.Success)
{
flink = "http://intranet" + mMatch.Groups[2].ToString();
f = Regex.Split(flink,"/");
fpath = @"D:\Watermark\WebCrawler\CrawledImages\" + f[f.Length - 1];
dl.DownloadFile(flink,fpath);
mMatch = mMatch.NextMatch();
}
}
I'm trying to test this against my works intranet homr page, i don't know whether thats got anything to do with it?
here is the output from vs:
'DefaultDomain': Loaded 'c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll', No symbols loaded.
'WebCrawler': Loaded 'D:\Watermark\WebCrawler\WebCrawler\bin\Debug\WebCrawler.exe', Symbols loaded.
'WebCrawler.exe': Loaded 'c:\windows\assembly\gac\system.windows.forms\1.0.5000.0__b77a5c561934e089\system.windows.forms.dll' , No symbols loaded.
'WebCrawler.exe': Loaded 'c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll', No symbols loaded.
'WebCrawler.exe': Loaded 'c:\windows\assembly\gac\system.drawing\1.0.5000.0__b03f5f7f11d50a3a\system.drawing.dll', No symbols loaded.
'WebCrawler.exe': Loaded 'c:\windows\assembly\gac\system.xml\1.0.5000.0__b77a5c561934e089\system.xml.dll', No symbols loaded.
An unhandled exception of type 'System.Net.WebException' occurred in system.dll
Additional information: The underlying connection was closed: The remote name could not be resolved.
The program '[2996] WebCrawler.exe' has exited with code 0 (0x0).
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|