-
Oct 24th, 2011, 02:06 PM
#1
Thread Starter
Hyperactive Member
[RESOLVED] Speeding up my code
I have a method which uses WebClient.DownloadString to download the HTML for 12x20 + 20 pages (260 in total). Simplified, the loop looks like this:
Code:
foreach page in pages (the 20)
    Use regex to find 12 links to other pages
    foreach of these matches (12)
        Use regex to find an int
        store that int in a list
    next match
next page
This process takes around 100 seconds at the longest. How can I speed it up? Or is it impossible? My internet connection at home isn't that fast, about 10 Mbit/s down.
Or is it the regex that takes time?
-
Oct 24th, 2011, 02:46 PM
#2
Re: Speeding up my code
I would expect that the time taken to execute a regular expression would be next to nothing compared to the time taken to download something from the internet. Of course, you can use an instance of Stopwatch (if I remember correctly) to verify that. If my assumption is correct, you can perhaps look into launching multiple threads to download multiple HTML pages at the same time. This feels safe provided that the pages themselves are of reasonable size.
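To check that assumption, the download and the regex can be timed separately. The following is only a minimal sketch: the URL and the link pattern are placeholders, not taken from the original code, and `TimingSketch` is a made-up name.

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Text.RegularExpressions;

class TimingSketch
{
    static void Main()
    {
        // Hypothetical URL and pattern; substitute the real ones.
        const string url = "http://example.com/";
        var linkPattern = new Regex("href=\"([^\"]+)\"");

        using (var client = new WebClient())
        {
            // Time the download on its own.
            Stopwatch sw = Stopwatch.StartNew();
            string html = client.DownloadString(url);
            sw.Stop();
            Console.WriteLine("Download: {0} ms", sw.ElapsedMilliseconds);

            // Time the regex on its own. Reading Count forces all matches to be found.
            sw.Restart();
            int count = linkPattern.Matches(html).Count;
            sw.Stop();
            Console.WriteLine("Regex:    {0} ms ({1} matches)", sw.ElapsedMilliseconds, count);
        }
    }
}
```

If the first number dwarfs the second, the network is the bottleneck and tuning the regex will not help much.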
-
Oct 24th, 2011, 03:10 PM
#3
Thread Starter
Hyperactive Member
Re: Speeding up my code
Okay. I think I will look into the ThreadPool to queue the processing of each page instead of doing them one at a time. Though I am not sure how to handle the result of each page's processing.
-
Oct 24th, 2011, 07:35 PM
#4
Re: Speeding up my code
I wouldn't use explicit threading. Use the async API of whatever you're using to download the page, and process the page in a callback method. This will often run on a worker thread, so you may need to invoke back to the UI thread to update the display.
If using the TPL to do your async-y-ness, you can specify that continuations run on the thread that sets them up (the UI thread), and they get invoked in a manner similar to regular UI events.
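As a rough illustration of the TPL variant, here is a sketch of a continuation that is scheduled back onto the UI thread. It assumes a Windows Forms application and .NET 4.5 or later (for `WebClient.DownloadStringTaskAsync`); the form, the label and the URL are hypothetical.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using System.Windows.Forms;

class MainForm : Form
{
    readonly Label status = new Label { Dock = DockStyle.Fill, Text = "Downloading..." };

    public MainForm()
    {
        Controls.Add(status);
        // Placeholder URL; start the download once the form is loaded.
        Load += (s, e) => StartDownload("http://example.com/");
    }

    void StartDownload(string url)
    {
        // Capture the UI scheduler while running on the UI thread.
        TaskScheduler ui = TaskScheduler.FromCurrentSynchronizationContext();

        var client = new WebClient();
        client.DownloadStringTaskAsync(url).ContinueWith(t =>
        {
            // This continuation runs on the UI thread, so touching the label is safe.
            status.Text = t.Status == TaskStatus.RanToCompletion
                ? "Downloaded " + t.Result.Length + " characters"
                : "Download failed";
            client.Dispose();
        }, ui);
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new MainForm());
    }
}
```

Passing the captured scheduler to `ContinueWith` is what makes the continuation behave like a regular UI event handler; without it, the continuation would typically run on a thread-pool thread.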
-
Oct 25th, 2011, 09:40 AM
#5
Thread Starter
Hyperactive Member
Re: Speeding up my code
Hmm, I think I'll go with the async method; TPL is still too advanced for me lol (I read about it).
Though I am not sure what you mean.
I should start the async download, then in the "finished" method process the page in a separate method, and then what? (I haven't taken any programming courses, so I'm a bit lacking on the theoretical side.)
What I am doing now is adding up each value returned from the processpage function, which is an int. How can I do that in the way you described?
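One possible shape for this, sketched as a console program: start every download with `DownloadStringAsync`, add each page's value to a shared total inside the completion handler, and wait until all handlers have finished. `ProcessPage`, the digit pattern and the URLs are stand-ins, not the original code.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;
using System.Threading;

class SumSketch
{
    // Hypothetical pattern: the first run of digits on the page.
    static readonly Regex NumberPattern = new Regex(@"\d+");

    // Stand-in for the poster's processpage function.
    static int ProcessPage(string html)
    {
        int value;
        Match m = NumberPattern.Match(html);
        return m.Success && int.TryParse(m.Value, out value) ? value : 0;
    }

    static void Main()
    {
        string[] urls = { "http://example.com/a", "http://example.com/b" }; // placeholders
        int total = 0;

        using (var pending = new CountdownEvent(urls.Length))
        {
            foreach (string url in urls)
            {
                // One WebClient per request: an instance cannot run two downloads at once.
                var client = new WebClient();
                client.DownloadStringCompleted += (sender, e) =>
                {
                    try
                    {
                        // Handlers may run on different threads here, so add atomically.
                        if (e.Error == null && !e.Cancelled)
                            Interlocked.Add(ref total, ProcessPage(e.Result));
                    }
                    finally
                    {
                        client.Dispose();
                        pending.Signal(); // one fewer download outstanding
                    }
                };
                client.DownloadStringAsync(new Uri(url));
            }
            pending.Wait(); // block until every handler has signalled
        }
        Console.WriteLine("Total: " + total);
    }
}
```

In a UI application the completion handlers are normally raised on the UI thread, so the blocking `Wait` should be dropped there; instead, count the finished downloads in the handler and update the display when the last one arrives. On the .NET Framework it may also be worth checking `ServicePointManager.DefaultConnectionLimit`, which by default restricts how many requests to the same host run at once.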
-
Oct 26th, 2011, 09:24 AM
#6
PowerPoster
Re: Speeding up my code
Threading does not automatically make things faster; remember that. Running the work on threads (or asynchronously) just means that x pieces of processing happen at the same time.
The possible bottlenecks are:
1) The time taken to download the string
2) The efficiency of the regular expression
If you can, compile your regex (RegexOptions.Compiled), as this can improve matching performance when the same expression is reused many times. NGEN is also a possibility for your .NET code, as it compiles the assembly down to native code ahead of time.
Really, it's all dependent on the speed of downloading your data, plus the processing in your foreach loop and your regex.
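For reference, a compiled regex is usually created once and reused, for example as a static field. This is a small sketch; the link pattern and the sample HTML are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class CompiledRegexSketch
{
    // Hypothetical pattern. Built once and compiled, then reused for every page.
    static readonly Regex LinkPattern = new Regex(
        "href=\"(?<url>[^\"]+)\"",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    static void Main()
    {
        string html = "<a href=\"page1.html\">1</a> <a HREF=\"page2.html\">2</a>";

        foreach (Match m in LinkPattern.Matches(html))
            Console.WriteLine(m.Groups["url"].Value);
        // prints:
        // page1.html
        // page2.html
    }
}
```

Compiling has an up-front cost, so it tends to pay off only when the same expression runs many times, as it would across 260 pages.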
-
Oct 26th, 2011, 01:45 PM
#7
Thread Starter
Hyperactive Member
Re: Speeding up my code
Yeah, I know. But I think I can cut the time it takes by about 20% if I multi-thread it instead of running it on a single thread.
I will look into regex compiling and NGEN.
-
Oct 26th, 2011, 02:29 PM
#8
PowerPoster
Re: Speeding up my code
Remember, though, that threading is expensive. For a task that takes so little time, it may actually add more overhead than it saves.
-
Oct 26th, 2011, 03:31 PM
#9
Thread Starter
Hyperactive Member