Already tried Parallel.ForEach but it wasn't fast enough. Maybe something with background threads? I couldn't get it to work, because when I try it, every background thread I start runs its own For Each.
Looping over each line in a list is what I'm trying to do.
How fast do you need it and why? How fast was Parallel.ForEach, and why do you think that isn't already something with background threads? Maybe you should actually show us the single-threaded code and your attempt at using Parallel.ForEach, because there's every chance that you just did it wrong. In fact, if you are correct that each thread does a For Each loop, then you definitely did it wrong.
Sure, here it is (I'm trying to make a proxy scraper).
I can't find the single-threaded version of the code any more, but this is the parallel one, and it is just too slow for a 15k-line file.
Code:
Async Sub ProxCheckerhttp()
    Dim OpenFileDlg As New OpenFileDialog With {
        .FileName = "Select Combo", ' Default file name
        .DefaultExt = ".txt", ' Default file extension
        .Filter = "Text Files (*.txt)|*.TXT",
        .Multiselect = False,
        .RestoreDirectory = True
    }
    ' Show open file dialog box
    Dim saveFileDialog1 As New SaveFileDialog With {
        .Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*",
        .FilterIndex = 1,
        .RestoreDirectory = True
    }
    Try
        If OpenFileDlg.ShowDialog() = DialogResult.OK AndAlso saveFileDialog1.ShowDialog() = DialogResult.OK Then
            Await Task.Run(Sub()
                'Get each line from the file
                Dim lines() As String = IO.File.ReadAllLines(OpenFileDlg.FileName)
                'Iterate through the lines in parallel
                Parallel.ForEach(lines, Sub(item As String)
                    Try
                        Dim httpWebRequest As HttpWebRequest = CType(WebRequest.Create("http://www.google.com"), HttpWebRequest)
                        httpWebRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36"
                        httpWebRequest.Timeout = 4500
                        httpWebRequest.ReadWriteTimeout = 8000
                        httpWebRequest.Proxy = New WebProxy(item)
                        'Dispose the response, otherwise the connection stays open and you run out of connections
                        Using httpWebResponse As HttpWebResponse = CType(httpWebRequest.GetResponse(), HttpWebResponse)
                        End Using
                        'workinglist must be a thread-safe collection (e.g. ConcurrentBag(Of String)),
                        'because this lambda runs on several threads at once
                        workinglist.Add(item)
                        Console.ForegroundColor = ConsoleColor.Green
                        Console.WriteLine(item)
                        Console.ForegroundColor = ConsoleColor.White
                        'Plain += on shared counters loses increments under parallelism; use Interlocked
                        Threading.Interlocked.Increment(proxychecked)
                        Threading.Interlocked.Increment(proxyworking)
                    Catch ex As WebException
                        Threading.Interlocked.Increment(proxychecked)
                        Threading.Interlocked.Increment(proxydead)
                    Catch ex As UriFormatException
                        Threading.Interlocked.Increment(proxychecked)
                        Threading.Interlocked.Increment(proxydead)
                    Catch ex As Exception
                        'Silently swallowing every other exception hides real bugs; at least log it
                    End Try
                    Console.Title = "Checking Proxies | " & proxychecked & "/" & lines.Length & " Working: " & proxyworking & " Dead: " & proxydead
                End Sub)
            End Sub)
            'Using guarantees the writer is closed even if writing fails
            Using filee As IO.StreamWriter = My.Computer.FileSystem.OpenTextFileWriter(saveFileDialog1.FileName, True)
                For Each item In workinglist
                    filee.WriteLine(item)
                Next
            End Using
            Module1.ShowNotify("Proxy Checker is done!", "Check the results out.")
        End If
    Catch ex As IO.FileNotFoundException
        Console.WriteLine("We couldn't find that file... Restarting in 3 seconds.")
        Threading.Thread.Sleep(3000)
        Console.Clear()
        'This called ProxChecker() before, which is a different method; restart this one
        ProxCheckerhttp()
    End Try
I am only guessing here, so I could be wrong, but the limiting factor is going to be how long each attempt takes: if you are taking 8 seconds to time out and you have 15,000 things to check, you could be suffering a lot of time penalties there.
Anything you do that uses the thread pool is going to hit the thread pool's limits (IIRC the default maximum is around 25 threads per core, often fewer). If you managed the threads yourself you could create more, but even then there will be limits to just how much you can speed this up with multiple threads.
Roughly how long does it take to process all of the entries? What is the ratio of working to non-working proxies? How long does it take to process a failing / working proxy?
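If you do stay with Parallel.ForEach, one thing worth experimenting with (just a sketch: CheckProxy is a placeholder for the request logic in the code above, and 32 is an assumed number you would tune) is capping the degree of parallelism explicitly instead of letting the scheduler decide:

```vbnet
' Sketch: limit how many checks run at once. CheckProxy stands in for
' the HttpWebRequest logic shown earlier; tune MaxDegreeOfParallelism.
Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = 32}
Parallel.ForEach(lines, options,
                 Sub(item As String)
                     CheckProxy(item)
                 End Sub)
```

That lets you vary the thread count directly and find the point where adding more stops helping.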
I have no idea how to keep track of that, but I will lower the timeouts and see how it goes!
Also understand a computer has limits.
A single-core CPU can only really execute one thing at a time. Threads are sort of simulated in that environment by using "time slicing" and noting that if it can execute billions of instructions per second, then it can divide each second into several "pieces" and devote millions of instructions to different things. To humans, that looks enough like "doing multiple things at once" we're happy. Multiple-core CPUs, same thing, just more cores.
There are costs to all of this. The act of switching between tasks is called a "context switch" and involves a little bit of work. If multi-core CPUs don't share cache memory, time can be spent waiting for caches to update. All of this limits exactly how many things a CPU can do in some unit of time.
Now, you're also trying to make network requests. Your hardware has buffers and can only maintain so many open connections. It's also being shared with everything else on your machine that wants to make connections. When receiving data, that data has to go into buffers, then your program has to read it. This can take some time. So, conceptually, if you try to open thousands of connections at once, you'll go far slower than if you opened fewer connections at a time.
So for both threads and connections, if we were to graph performance vs. number of threads/connections, we'd see that we get faster up to some point, then suddenly much, much slower as we overwhelm the system.
So up-front, if you're expecting to be able to do 15,000 tests simultaneously, this is a bad expectation and won't work.
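A common way to stay under those limits (a sketch only; the cap of 100 in-flight requests and the module/function names are assumptions you would tune and rename) is to gate the work with a SemaphoreSlim so that only a fixed number of connections are open at any moment:

```vbnet
Imports System.Collections.Generic
Imports System.Threading
Imports System.Threading.Tasks

Module ThrottleSketch
    ' At most 100 checks in flight at once; tune this for your hardware.
    Private ReadOnly gate As New SemaphoreSlim(100)

    Async Function CheckAllAsync(lines As IEnumerable(Of String)) As Task
        Dim tasks As New List(Of Task)
        For Each item As String In lines
            tasks.Add(CheckOneAsync(item))
        Next
        Await Task.WhenAll(tasks)
    End Function

    Private Async Function CheckOneAsync(item As String) As Task
        Await gate.WaitAsync()
        Try
            ' ... perform the proxy check for this item ...
        Finally
            gate.Release()
        End Try
    End Function
End Module
```

Raising and lowering that single number is then a safe way to probe where the "much, much slower" cliff sits on your machine.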
It's far more likely the best scheduling algorithm will plan at most 2 threads per CPU core you have, which likely means 8-16 total threads. If we assume the maximum of 16, then you're going to have to make roughly 1,000 8-second cycles to finish, so 2 hours is a reasonable expectation. PlausiblyDamp mentioned maybe 25 threads per core. That may or may not help; it all depends on how much I/O contention it introduces. I think he's right that lowering the timeout is sensible: if a proxy has more than 500 ms latency I'm not very interested in it. That's almost an 8x reduction in your total runtime for doing practically nothing.
But all said and done, what you're writing is more or less a war dialer/port scanner, and these are slow tools.
I suppose you could use something like https://msdn.microsoft.com/en-us/lib...v=vs.110).aspx to record the time for each request; it would be accurate enough to give a good idea of where the time is being spent. Ultimately, though, if you are looking at performance you will need some way of timing things, either through code or by using a profiler. If you don't know where the delays are, it is very hard to remove them.
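For request-level timing, a minimal sketch using System.Diagnostics.Stopwatch (the placement is illustrative; wrap whichever part of the check you suspect is slow):

```vbnet
Imports System.Diagnostics

' Sketch: time one proxy check so you can see where the seconds go.
Dim sw As Stopwatch = Stopwatch.StartNew()
' ... perform one proxy check here ...
sw.Stop()
Console.WriteLine("Check took " & sw.ElapsedMilliseconds & " ms")
```

Logging this per proxy would quickly show whether the dead proxies' timeouts dominate the total run time.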