Thread: Fastest way for multithreaded for each loop?

  1. #1

    Thread Starter
    Junior Member
    Join Date
    Jan 2018
    Posts
    18

    Fastest way for multithreaded for each loop?

    I already tried Parallel.ForEach but it wasn't fast enough. Maybe something with background threads? I couldn't get that to work, because when I try it, every background thread I start runs the whole For Each loop.
    What I'm trying to do is a For Each over every line in a list.

  2. #2
    .NUT jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    99,416

    Re: Fastest way for multithreaded for each loop?

    How fast do you need it and why? How fast was Parallel.ForEach and why do you think that that isn't already something with background threads? Maybe you should actually show us the single-threaded code and your attempt at using Parallel.ForEach because there's every chance that you just did it wrong. In fact, if you are correct that each thread does a For Each loop then you definitely did it wrong.

  3. #3

    Thread Starter
    Junior Member
    Join Date
    Jan 2018
    Posts
    18

    Re: Fastest way for multithreaded for each loop?

    Sure, here it is: (Trying to make a proxy scraper)
    Code:
    Async Sub ProxCheckerhttp()
        Dim OpenFileDlg As New OpenFileDialog With {
            .FileName = "Select Combo", ' Default file name
            .DefaultExt = ".txt", ' Default file extension
            .Filter = "Text Files (*.txt)|*.TXT",
            .Multiselect = False,
            .RestoreDirectory = True
        }
        Dim saveFileDialog1 As New SaveFileDialog With {
            .Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*",
            .FilterIndex = 1,
            .RestoreDirectory = True
        }

        Try
            ' Show both dialogs; AndAlso skips the save dialog if the open dialog is cancelled
            If OpenFileDlg.ShowDialog() = DialogResult.OK AndAlso saveFileDialog1.ShowDialog() = DialogResult.OK Then
                Await Task.Run(
                    Sub()
                        'Get each line from the file
                        Dim lines() As String = IO.File.ReadAllLines(OpenFileDlg.FileName)

                        'Iterate through each line
                        Parallel.ForEach(lines,
                            Sub(item As String)
                                Try
                                    Dim request As HttpWebRequest = CType(WebRequest.Create("http://www.google.com"), HttpWebRequest)
                                    request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36"
                                    request.Timeout = 4500
                                    request.ReadWriteTimeout = 8000
                                    request.Proxy = New WebProxy(item)
                                    Using response = CType(request.GetResponse(), HttpWebResponse)
                                        ' Nothing to read; disposing the response releases the connection
                                    End Using
                                    ' The shared list and the console colours are not thread-safe, so serialize access
                                    SyncLock workinglist
                                        workinglist.Add(item)
                                        Console.ForegroundColor = ConsoleColor.Green
                                        Console.WriteLine(item)
                                        Console.ForegroundColor = ConsoleColor.White
                                    End SyncLock
                                    ' "+= 1" loses updates across threads; this assumes the counters are Integer fields
                                    Threading.Interlocked.Increment(proxychecked)
                                    Threading.Interlocked.Increment(proxyworking)
                                Catch ex As Exception When TypeOf ex Is WebException OrElse TypeOf ex Is UriFormatException
                                    Threading.Interlocked.Increment(proxychecked)
                                    Threading.Interlocked.Increment(proxydead)
                                Catch ex As Exception
                                    ' Anything else is ignored so one bad entry does not abort the loop
                                End Try
                                Console.Title = $"Checking Proxies | {proxychecked}/{lines.Length} Working: {proxyworking} Dead: {proxydead}"
                            End Sub)
                    End Sub)

                ' Append the working proxies to the chosen file; Using closes the writer even on error
                Using filee As IO.StreamWriter = My.Computer.FileSystem.OpenTextFileWriter(saveFileDialog1.FileName, True)
                    For Each item In workinglist
                        filee.WriteLine(item)
                    Next
                End Using

                Module1.ShowNotify("Proxy Checker is done!", "Check the results out.")
            End If
        Catch ex As System.IO.FileNotFoundException
            Console.WriteLine("We couldn't find that file... Restarting in 3 seconds.")
            Threading.Thread.Sleep(3000)
            Console.Clear()
            ProxChecker()
        End Try
    End Sub
    I can't find the code where I did it single-threaded anymore, but this is the parallel version, and it is just too slow for a 15k line file.

  4. #4
    Fanatic Member PlausiblyDamp
    Join Date
    Dec 2016
    Location
    Newport, UK
    Posts
    826

    Re: Fastest way for multithreaded for each loop?

    I am only guessing here, so I could be wrong, but the limiting factor is going to be how long each attempt takes. If each failed attempt takes 8 seconds to time out and you have 15,000 things to check, you could be suffering a lot of time penalties there.

    Anything you do that uses the thread pool is going to hit the limits of the thread pool (IIRC the default maximum is about 25 threads per core, and often less than that). If you managed the threads yourself you could create more, but even then there will be limits to just how much you can speed this up by using multiple threads.

    Roughly how long is it taking to process all of the entries? What is the ratio of working to non-working proxies? How long does it take to process a failing / working proxy?
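
    The default thread pool limits differ between .NET versions and machines, so it may be easier to ask the runtime than to rely on a remembered figure. A minimal sketch, assuming a plain console project; the module and function names are made up for illustration:

```vbnet
Imports System
Imports System.Threading

Module ThreadPoolLimits
    ' Returns the pool's current (minimum, maximum) worker-thread counts.
    Function GetWorkerLimits() As Tuple(Of Integer, Integer)
        Dim minWorkers, minIo, maxWorkers, maxIo As Integer
        ThreadPool.GetMinThreads(minWorkers, minIo)
        ThreadPool.GetMaxThreads(maxWorkers, maxIo)
        Return Tuple.Create(minWorkers, maxWorkers)
    End Function

    Sub Main()
        Dim limits = GetWorkerLimits()
        Console.WriteLine("Cores: " & Environment.ProcessorCount)
        Console.WriteLine("Min worker threads: " & limits.Item1)
        Console.WriteLine("Max worker threads: " & limits.Item2)
    End Sub
End Module
```

    The maximum is only a ceiling. The pool adds threads gradually, so a loop full of blocking calls may run with far fewer threads than the maximum for quite a while.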

  5. #5

    Thread Starter
    Junior Member
    Join Date
    Jan 2018
    Posts
    18

    Re: Fastest way for multithreaded for each loop?

    I have no idea how to keep track of that, but I will lower the timeouts and see how it goes!

  6. #6
    You don't want to know.
    Join Date
    Aug 2010
    Posts
    4,580

    Re: Fastest way for multithreaded for each loop?

    Also understand a computer has limits.

    A single-core CPU can only really execute one thing at a time. Threads are sort of simulated in that environment by using "time slicing": if the CPU can execute billions of instructions per second, it can divide each second into several "pieces" and devote millions of instructions to different things. To humans, that looks enough like "doing multiple things at once" that we're happy. Multiple-core CPUs, same thing, just more cores.

    There are costs to all of this. The act of switching between tasks is called a "context switch" and involves a little bit of work. If multi-core CPUs don't share cache memory, time can be spent waiting for caches to update. All of this limits exactly how many things a CPU can do in some unit of time.

    Now, you're also trying to make network requests. Your hardware has buffers and can only maintain so many open connections. It's also being shared with everything else on your machine that wants to make connections. When receiving data, that data has to go into buffers, then your program has to read it. This can take some time. So, conceptually, if you try to open thousands of connections at once, you'll go far slower than if you opened fewer connections at a time.

    So for both threads and connections, if we were to graph performance vs. number of threads/connections, we'd see that we get faster up to some point, then suddenly much, much slower as we overwhelm the system.

    So up-front, if you're expecting to be able to do 15,000 tests simultaneously, this is a bad expectation and won't work.
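
    One way to look for that sweet spot is to cap how many iterations Parallel.ForEach runs at once and time the run at a few different caps. A rough sketch, assuming a console project; RunBounded is a hypothetical helper and Thread.Sleep stands in for the real network check:

```vbnet
Imports System
Imports System.Threading
Imports System.Threading.Tasks

Module BoundedLoop
    ' Processes items with at most maxParallel concurrent iterations and
    ' returns the highest number of iterations seen running at once.
    Function RunBounded(items As String(), maxParallel As Integer) As Integer
        Dim running As Integer = 0
        Dim peak As Integer = 0
        Dim gate As New Object()
        Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = maxParallel}

        Parallel.ForEach(items, options,
            Sub(item As String)
                Dim active = Interlocked.Increment(running)
                SyncLock gate
                    If active > peak Then peak = active
                End SyncLock
                Thread.Sleep(20) ' placeholder for the proxy check
                Interlocked.Decrement(running)
            End Sub)

        Return peak
    End Function

    Sub Main()
        Dim items(99) As String ' 100 dummy entries
        Console.WriteLine("Peak concurrency: " & RunBounded(items, 8))
    End Sub
End Module
```

    MaxDegreeOfParallelism is an upper bound, not a guarantee, so the observed peak can be lower than the cap.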

    It's far more likely the best scheduling algorithm will plan at most 2 threads per CPU core you have, which likely means 8-16 total threads. If we assume the maximum of 16, then you're going to have to make roughly 1,000 8-second cycles to finish, so 2 hours is a reasonable expectation. PlausiblyDamp mentioned maybe 25 threads per core. This may or may not work; it all depends on how much I/O contention that introduces. I think he's right that lowering the timeout is sensible: if a proxy has more than 500ms latency I'm not very interested in it. Against that 8-second figure, a 500ms timeout is roughly a 16x reduction in your worst-case runtime for doing practically nothing.
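
    The back-of-the-envelope estimate above can be written down so it is easy to try other numbers. This is only a worst-case model that assumes every proxy runs into the full timeout and that the checks run in batches of a fixed size; WorstCaseSeconds is a made-up name:

```vbnet
Imports System

Module RuntimeEstimate
    ' Worst case: every item waits for the full timeout and items are
    ' processed in batches of "threads" at a time.
    Function WorstCaseSeconds(items As Integer, threads As Integer, timeoutSeconds As Double) As Double
        Dim cycles As Double = Math.Ceiling(items / threads)
        Return cycles * timeoutSeconds
    End Function

    Sub Main()
        Dim slow = WorstCaseSeconds(15000, 16, 8.0)
        Dim fast = WorstCaseSeconds(15000, 16, 0.5)
        Console.WriteLine("8 s timeout:   " & (slow / 3600).ToString("F2") & " hours")
        Console.WriteLine("0.5 s timeout: " & (fast / 60).ToString("F1") & " minutes")
    End Sub
End Module
```

    With 15,000 entries and 16 threads this gives 938 cycles, so about 7,500 seconds (a bit over 2 hours) at 8 seconds per cycle and about 8 minutes at 500ms. Real runs should be faster, since working proxies answer before the timeout.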

    But all said and done, what you're writing is more or less a war dialer/port scanner, and these are slow tools.

  7. #7
    Fanatic Member PlausiblyDamp
    Join Date
    Dec 2016
    Location
    Newport, UK
    Posts
    826

    Re: Fastest way for multithreaded for each loop?

    Quote: Originally Posted by SoldierCrimes
    I have no idea how to keep track of that, but I will lower the timeouts and see how it goes!
    I suppose you could use something like https://msdn.microsoft.com/en-us/lib...v=vs.110).aspx to record the times for each request; it would be accurate enough to give a good idea of where the time is being spent. Ultimately though, if you are looking at performance you will need some way of timing things, either through code or by using a profiler. If you don't know where the delays are then it is very hard to remove them.
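
    As a rough illustration of timing through code, here is a sketch that uses System.Diagnostics.Stopwatch. Stopwatch is an assumption on my part (the link above is truncated, so it may point at something else), and TimeMilliseconds is a made-up helper name:

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

Module RequestTiming
    ' Runs the given action and returns how long it took in milliseconds.
    Function TimeMilliseconds(work As Action) As Long
        Dim sw = Stopwatch.StartNew()
        Try
            work()
        Catch ex As Exception
            ' A failed check still costs time, so it is measured as well.
        End Try
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    Sub Main()
        ' Thread.Sleep stands in for a single proxy check.
        Dim elapsed = TimeMilliseconds(Sub() Thread.Sleep(50))
        Console.WriteLine("Took " & elapsed & " ms")
    End Sub
End Module
```

    Wrapping each check like this and keeping separate totals for working and dead proxies would answer the earlier questions about where the time goes.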
