-
Apr 29th, 2024, 12:02 PM
#1
[RESOLVED] Core Affinity
I am working on a program that could be considered embarrassingly parallel, in that a series of actions is repeated many times, and the actions do not have to be performed sequentially. The program is a genetic algorithm, so there is a population, and each member is evaluated in what amounts to a long series of calculations. Each member of the population could be evaluated independently of one another, but there is no waiting around during an evaluation, as it is just a calculation, albeit a long one.
The bulk of the actual work done is the calculation, so I initially set this up as each evaluation being a Task. I felt that I could launch an evaluation of each member of the population in a different Task, and they would get processed across all available CPU cores. That doesn't appear to be what happened, though, because evaluating each one sequentially was no slower, and seemed slightly faster (it's very hard to measure that) than doing each evaluation in a Task. That would make sense if all the Tasks were queueing on a single core, as the overhead of queuing up the tasks would make the Task approach slightly slower.
I have a different approach that I am considering, though it's probably not worth doing, which is to divide the problem at a much coarser level and spawn a series of Tasks or Threads that would each be an entire genetic algorithm evaluating an entire population. There is no advantage to this, either, unless the threads can take advantage of multiple cores. Such a program would actually be worse in a couple of ways, since it would be far more complex and would make it impossible to take reasonable snapshots of the process, but if multiple cores could be used simultaneously, it could be considerably faster. That might be worthwhile, as the program takes four hours to run to completion, so cutting that in half, or more, would save significant time.
What I can't figure out is whether or not it would work, and it seems likely that it would not. I see that there is some means to set the affinity of a process to a single core, but that's kind of the opposite of what I am looking for. I'd like the multiple threads to take advantage of all the cores on the system, not restrict them to a single core, which is what the program is currently doing.
Is that even possible? I see that processes could be assigned to specific cores, but is it possible for Tasks or threads to be assigned to "whatever is available"?
My usual boring signature: Nothing
 
-
Apr 29th, 2024, 12:19 PM
#2
Re: Core Affinity
If you have multiple cores, then threads should be scheduled across all of them, unless something is forcing affinity or limiting the number of cores you can use.
How are you launching the tasks?
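To rule out the affinity part, something like the following could be run first. This is only a minimal sketch, assuming a console app on Windows or Linux (`Process.ProcessorAffinity` is not supported on every platform); the module name `AffinityCheck` and the helper `CountSetBits` are made up for illustration.

```vbnet
Imports System
Imports System.Diagnostics

Module AffinityCheck
    ' Hypothetical helper: counts the set bits in an affinity mask.
    ' Each set bit is one logical processor the process is allowed to run on.
    Function CountSetBits(mask As Long) As Integer
        Dim count As Integer = 0
        For bit As Integer = 0 To 63
            If (mask And (1L << bit)) <> 0 Then count += 1
        Next
        Return count
    End Function

    Sub Main()
        ' Affinity mask of the current process, as a 64-bit value.
        Dim mask As Long = Process.GetCurrentProcess().ProcessorAffinity.ToInt64()
        Console.WriteLine($"Logical processors reported: {Environment.ProcessorCount}")
        Console.WriteLine($"Processors allowed by the affinity mask: {CountSetBits(mask)}")
    End Sub
End Module
```

If the second number is smaller than the first, something is restricting the process to a subset of the cores.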
-
Apr 29th, 2024, 01:36 PM
#3
Lively Member
Re: Core Affinity
On the actual implementation level, I would look at 2 options:
Task Parallel Library or Task.WhenAll
Both should take advantage of multiple CPU cores.
-
Apr 29th, 2024, 02:27 PM
#4
Re: Core Affinity
I overlooked the WhenAll, but now that I think about it, it probably isn't ideal for me. I'm using WaitAll because there isn't anything useful that can be done until all the Tasks have been completed. I don't believe that the Task Parallel Library would be the way to go with this, but I'll have to look at that again.
I really don't want to redesign to go for a much more efficient route. I know what that would be, but all I'd gain would be speed, and not a game-changing amount of that, but the interface would be far worse. The more modest type of scenario I have might not impact the interface any, so boosting the speed of it would be tolerable...maybe.
How I'm launching the tasks is just putting them all into an array of Task with:
tskList(x) = Task.Factory.StartNew(calcAction, x)
and once that has been fully populated, I WaitAll on the task array.
All of this is done in a BackgroundWorker anyway, so that the UI is free. I can barely time the performance. With or without the tasks, it's ripping through the generations. I originally wrote this in VB6 with a long-running loop wrapping a DoEvents to keep the UI active while all the evolution stuff was going on. That would progress at a stately rate of a few generations a second, and the total run could take a few days. I rewrote it in VS2003, and apparently brought it somewhat up to date with .NET Framework 2.0 in VS2005, though I clearly didn't put much effort into it. This rewrite, which brought it to .NET Framework 4.7.2, got rid of the DoEvents and moved all the long-running stuff into a BackgroundWorker. That old version was also running on a single-core Pentium of whatever speed was in vogue back around 2000. That would take at least 36 hours, and possibly many more, to get some result. On my current system, without the Tasks, the time is down to around four hours. So, even without going to any extremes, I've gotten a pretty significant boost in speed.
One result of this performance boost is that, while the Tasks are doing the most computationally expensive part of the operation, I'm still going through several hundred generations a second. They all have to access some common data, too, and the way that is done may be creating too much of a bottleneck, as I have to lock an object to supply the data. I could avoid that by either giving each Task a copy of the data, or by arranging the common data in such a fashion that the access would be thread safe without locking. I opted not to make copies, as the data could be fairly large, and it would mean making millions of copies, over time, which would end up thrashing the memory pretty badly. Since they are all just reading the data, one alternative would be to put the data into a public array and just let them read it directly, which wouldn't need any locking. That would mean some ugly organization, though, and I'm not sure I'm willing to do that.
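To illustrate the lock-free read alternative, here is a minimal sketch, not the actual program. It assumes the common data is filled once before any Task starts and is never modified afterwards; `CommonData`, `Evaluate`, and `EvaluateAll` are hypothetical stand-ins. Each Task only reads the shared array and writes only to its own slot in the results array, so no locking is involved.

```vbnet
Imports System
Imports System.Threading.Tasks

Module SharedReadOnlyData
    ' Hypothetical shared data: populated once, then only read.
    Private ReadOnly CommonData As Double() = {1.5, 2.5, 3.0, 4.0}

    ' Hypothetical stand-in for the evaluation: reads CommonData, writes nothing shared.
    Function Evaluate(member As Integer) As Double
        Dim total As Double = 0
        For Each value As Double In CommonData
            total += value * member
        Next
        Return total
    End Function

    ' Evaluates every member in its own Task; each Task writes only to its own slot.
    Function EvaluateAll(populationSize As Integer) As Double()
        Dim scores(populationSize - 1) As Double
        Dim tasks(populationSize - 1) As Task
        For i As Integer = 0 To populationSize - 1
            Dim member As Integer = i ' copy so each lambda captures its own value
            tasks(member) = Task.Run(Sub() scores(member) = Evaluate(member))
        Next
        Task.WaitAll(tasks)
        Return scores
    End Function

    Sub Main()
        Dim scores As Double() = EvaluateAll(4)
        Console.WriteLine(String.Join(", ", scores))
    End Sub
End Module
```

This is only safe as long as nothing writes to the shared array while the Tasks are running.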
-
May 1st, 2024, 12:00 AM
#5
Re: Core Affinity
I don't think you should run each individual in the population in its own thread unless the calculation per item runs into seconds. Instead, start some 16 or so threads that work through the population. This eliminates most of the overhead of thread creation and switching. You can also make the number of threads depend on the number of cores.
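One way to get a bounded set of workers without managing threads by hand is `Parallel.For` with `ParallelOptions.MaxDegreeOfParallelism`. Note that this swaps hand-started threads for the thread pool, so it is a related technique rather than exactly the one described above. A minimal sketch, in which `Evaluate` and `EvaluateGeneration` are hypothetical names and the loop body is just placeholder arithmetic:

```vbnet
Imports System
Imports System.Threading.Tasks

Module BoundedWorkers
    ' Hypothetical stand-in for one long, CPU-bound evaluation.
    Function Evaluate(member As Integer) As Long
        Dim total As Long = 0
        For k As Long = 1 To 1000000
            total += (k * (member + 1)) Mod 7
        Next
        Return total
    End Function

    ' Evaluates the whole population with at most one worker per logical processor.
    Function EvaluateGeneration(populationSize As Integer) As Long()
        Dim scores(populationSize - 1) As Long
        Dim options As New ParallelOptions With {
            .MaxDegreeOfParallelism = Environment.ProcessorCount
        }
        ' Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, populationSize, options,
                     Sub(i As Integer) scores(i) = Evaluate(i))
        Return scores
    End Function

    Sub Main()
        Dim scores As Long() = EvaluateGeneration(100)
        Console.WriteLine($"Evaluated {scores.Length} members")
    End Sub
End Module
```

`Parallel.For` returns only after every iteration has finished, so it behaves like launching the work and then waiting on all of it.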
-
May 1st, 2024, 09:32 AM
#6
Re: Core Affinity
Yeah, all that's true or likely so. Having more threads than there are cores defeats the purpose...and now that I think about it, I might have more cores on this system than I thought. I was thinking I had four, but six would be more likely. I'll have to look at some point.
My understanding of Tasks was that they were lighter weight than creating full threads, and would be managed by the OS to only start up when there was a core free to work on. That's not what is happening, based on what I am seeing, unless they are all on the same core, which looks likely. In that case, using Tasks in the way I was doing was essentially the same as running each one sequentially, with the added overhead of Task management.
I have realized that there is a much better way to do this, which splits the work up more meaningfully, but I'm reluctant to try it, because it would mean that some of the nicer features of the UI would be rendered meaningless. Therefore, I think I won't change it. The time taken is not too steep a price to pay for having a nice UI.
-
May 1st, 2024, 10:26 AM
#7
Re: Core Affinity
IIRC Tasks are delegated to the ThreadPool when created using Task.Run - this allows the ThreadPool itself to balance the number of running threads.
If you run the following Console app:
Code:
Imports System.Runtime.InteropServices
Imports System.Threading

Module Program
    Sub Main()
        Dim tasks As New List(Of Task)
        For i = 0 To 100
            Dim x = i ' copy the loop variable so each lambda captures its own value
            tasks.Add(Task.Run(Sub() DoStuff(x)))
        Next
        Task.WaitAll(tasks.ToArray)
    End Sub

    ' A plain Sub rather than Async Sub: nothing is awaited here,
    ' and the blocking wait simply stands in for real work.
    Private Sub DoStuff(i As Integer)
        Console.WriteLine($"Starting Task {i} on Thread {Environment.CurrentManagedThreadId}, on cpu {GetCurrentProcessorNumber()}, {ThreadPool.PendingWorkItemCount()} Items queued")
        Task.Delay(5000).Wait()
        Console.WriteLine($"Ending Task {i} on Thread {Environment.CurrentManagedThreadId}, on cpu {GetCurrentProcessorNumber()}, {ThreadPool.PendingWorkItemCount()} Items queued")
    End Sub

    ' Windows API: number of the processor the calling thread is currently running on.
    <DllImport("Kernel32.dll")>
    Public Function GetCurrentProcessorNumber() As Integer
    End Function
End Module
The output will show that multiple tasks are created, some of them sharing a thread, and that the threads run on different CPUs / cores. Note that a thread may also move between cores.
Last edited by PlausiblyDamp; May 2nd, 2024 at 06:18 AM.
-
May 1st, 2024, 12:48 PM
#8
Re: Core Affinity
I'm sure it is me but I've been lost in the description of what you are doing. So I wrote this example, a very simple one at that. I promise that it will drive 6 cores at 100%. Don't know if it helps...
Code:
Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    'make up some stuff
    Button1.Enabled = False
    Label1.Text = "GO"
    Debug.WriteLine("")
    PopulationQ = New Concurrent.ConcurrentQueue(Of Population)
    Dim stuff() As String = {"Shaggy", "digitalShaman", "PlausiblyDamp", "BlackRiver1987", "fredflintstone", "dbasnett"}
    For Each nm As String In stuff
        Dim foo As New Population
        foo.Name = nm
        PopulationQ.Enqueue(foo)
    Next
    Dim sTsk As Task = Task.Run(Sub() TheyAreOff())
    Await sTsk
    Label1.Text = "DONE"
    Button1.Enabled = True
End Sub

Private PopulationQ As Concurrent.ConcurrentQueue(Of Population)
Private PopulationList As List(Of Population)

Private Sub TheyAreOff() 'it is now post time...
    Const numTasks As Integer = 3 ' Environment.ProcessorCount - 2
    PopulationList = New List(Of Population)
    For Each pop As Population In PopulationQ
        PopulationList.Add(pop)
        pop.PopTask = Task.Run(Sub()
                                   PopulationProc()
                               End Sub)
        '' >>>>>> this code is to limit how many task start
        'Threading.Thread.Sleep(2) 'so the task has a chance to start
        'Dim running As Integer
        'Do
        '    running = (From p In PopulationList
        '               Where p.PopTask.Status = TaskStatus.Running
        '               Select p).Count
        '    If running < numTasks Then
        '        Exit Do
        '    Else
        '        Threading.Thread.Sleep(5)
        '    End If
        'Loop
        '' <<<<<< end limit
    Next
    Task.WaitAll((From p In PopulationList Select p.PopTask).ToArray)
End Sub

Private Class Population
    Public Name As String
    Public PopTask As Task
    'etc..
End Class

Private Sub PopulationProc()
    Dim aPopulation As Population = Nothing
    While aPopulation Is Nothing
        PopulationQ.TryDequeue(aPopulation)
    End While
    Dim x As Long
    ' Debug.WriteLine(Long.MaxValue >> 29)
    ' the following was the first time I've ever heard my fans run :)
    For x = 1L To Long.MaxValue >> 29
        Dim l As Long = x \ 2L
    Next
    Debug.WriteLine(aPopulation.Name)
End Sub
-
May 1st, 2024, 02:23 PM
#9
Re: Core Affinity
Ah, I was using Task.Factory.StartNew for an incorrect reason. I see that it is no longer the recommended way to launch simple work, with Task.Run being preferred. Switching over to Task.Run has caused the CPU usage to greatly increase, and the performance...well, the performance is a bit erratic. It was looking like it had tripled until I came in here to write this, at which point it dropped way back. That may be because the OS is seeing that Chrome is gobbling up some resources, as well, and it is sharing...to the disadvantage of the process. CPU usage is still elevated above what I had been seeing, so I do think it is working...and now performance is almost twice what it had been in the sequential approach.
-
May 1st, 2024, 02:27 PM
#10
Re: [RESOLVED] Core Affinity
Hmmm, a bit of playing around suggests that what I REALLY need to do is run this on a computer that I'm not also using to surf the web.
-
May 2nd, 2024, 03:40 AM
#11
Re: [RESOLVED] Core Affinity
All threads have a priority. Those with the higher priority are executed before those with a lower priority. Have a look at
https://learn.microsoft.com/en-us/wi...ing-priorities
All advice is offered in good faith only. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/
C++23 Compiler: Microsoft VS2022 (17.6.5)
-
May 2nd, 2024, 08:16 AM
#12
Re: [RESOLVED] Core Affinity
Yeah, I'm aware of priorities, but my understanding is that these run-of-the-mill process threads will all have the same basic priority, and that elevating thread priorities should be done solely for pretty specialized things, such as core OS functionality.
-
May 2nd, 2024, 08:24 AM
#13
Re: [RESOLVED] Core Affinity
I'm still playing around with this. At one point, I felt that the threads had hung up, possibly because of a deadlock (or at least, that's what I was thinking might be the cause), so I went back to sequential operations because I really wanted to get some results. While doing other things on the computer, I came to realize that when I thought the threads were hung...they may not have been. The genetic algorithm has to evaluate every genome. Normally, evaluating a population takes some time S, but after observing the behavior for a while, I realized that some populations will take N×S, where N could be 10-20, or more. In other words, they're moving along, but some populations can be FAR slower to evaluate than others.
My test dataset was small and quick. Now that I'm evaluating some actual data, I'm learning some new things about the performance.
-
May 2nd, 2024, 08:58 AM
#14
Re: [RESOLVED] Core Affinity
How many threads are we talking about? Since the threads appear to be CPU bound, don't start more than there are cores, and maybe even a few fewer.