X-OpenMP—eXtreme fine-grained tasking using lock-less work stealing

P Nookala, K Chard, I Raicu - Future Generation Computer Systems, 2024 - Elsevier
Processors with 100s of threads of execution are among the state-of-the-art in high-end
computing systems. This transition to many-core computing has required the community to …

Enabling extremely fine-grained parallelism via scalable concurrent queues on modern many-core architectures

P Nookala, P Dinda, KC Hale… - … on Modeling, Analysis …, 2021 - ieeexplore.ieee.org
Enabling efficient fine-grained task parallelism is a significant challenge for hardware
platforms with increasingly many cores. Existing techniques do not scale to hundreds of …

Extreme Fine-Grained Parallelism on Modern Many-Core Architectures

P Nookala - 2022 - search.proquest.com
Processors with 100s of threads of execution and GPUs with 1000s of cores are among the
state-of-the-art in high-end computing systems. This transition to many-core computing has …

[PDF] Scheduling computations

GMT Rito - 2016 - run.unl.pt
For quite some time, the Work Stealing algorithm has been the de facto standard for
scheduling multithreaded computations. To ensure scalability and achieve high …