Hello! I have an idea to reduce memory latency on NUMA or CCX-based systems.
I've read older issues here on this topic, where it is argued that work stealing matters more to performance than partitioning tasks by NUMA node. This makes sense. I'm not familiar with the details of the TaskFlow work-stealing algorithm, but would it be possible to give hints as to which threads work should be stolen from? Given a choice of local queues to steal from, the hint would let the system favour queues on the same NUMA node, or the queue of a hyperthreading sibling thread.
Combined with thread affinity settings, this could be beneficial for certain memory-bound applications.
I think it's possible to give a thread a hint about which worker queue to steal from. The entire work-stealing function is implemented here. Currently, as you can see, victim selection is completely random. Perhaps we can think about how to use w._vtm as the hint?
> Perhaps we can think about how to use w._vtm as the hint?
Yes, that should work without too much effort.
If M of the N total workers were preferred, then rdvtm could alternate between two random selections when choosing a worker to steal from: first from the M preferred workers, then from all N workers.
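To make the alternating idea concrete, here is a minimal sketch. It is not Taskflow code: `VictimSelector`, its constructor arguments, and `next()` are hypothetical names I made up for illustration, and I'm assuming workers are identified by indices `0..N-1` and that the preferred set (for example, workers on the same NUMA node) is supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Taskflow): chooses which worker to steal from.
// Even-numbered attempts draw from the M preferred workers, odd-numbered
// attempts draw from all N workers, so remote queues are still reached.
class VictimSelector {
 public:
  VictimSelector(std::size_t num_workers,
                 std::vector<std::size_t> preferred,
                 unsigned seed)
      : _num_workers(num_workers),
        _preferred(std::move(preferred)),
        _rng(seed) {
    assert(num_workers > 0);
  }

  // Returns the index of the next victim worker.
  std::size_t next() {
    // With no preferred workers, every attempt falls back to a uniform pick.
    const bool use_preferred = !_preferred.empty() && (_attempt++ % 2 == 0);
    if (use_preferred) {
      std::uniform_int_distribution<std::size_t> pick(0, _preferred.size() - 1);
      return _preferred[pick(_rng)];
    }
    std::uniform_int_distribution<std::size_t> pick(0, _num_workers - 1);
    return pick(_rng);
  }

 private:
  std::size_t _num_workers;            // N: total number of workers
  std::vector<std::size_t> _preferred; // M: ids of workers to favour
  std::size_t _attempt = 0;            // counts steal attempts
  std::mt19937 _rng;                   // per-thief random engine
};
```

Each thief would own one selector, e.g. `VictimSelector sel(8, {2, 3}, seed);`, and call `sel.next()` before each steal attempt. The sketch does not treat the thief's own index specially, and a strict 1:1 alternation is only one possible policy; a weighted choice would work just as well.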