A probabilistic machine learning approach to scheduling parallel loops with Bayesian optimization
This article proposes Bayesian optimization augmented factoring self-scheduling (BO FSS),
a new parallel loop scheduling strategy. BO FSS is an automatic tuning variant of the …
Automated scheduling algorithm selection and chunk parameter calculation in OpenMP
A Mohammed, JH Müller Korndörfer… - IEEE Transactions on …, 2022 - edoc.unibas.ch
Increasing node and cores-per-node counts in supercomputers render scheduling and load
balancing critical for exploiting parallelism. OpenMP applications can achieve high …
COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel Loops
P Mishra, VK Nandivada - ACM Transactions on Architecture and Code …, 2024 - dl.acm.org
Parallel libraries such as OpenMP distribute the iterations of parallel-for-loops among the
threads, using a programmer-specified scheduling policy. While the existing scheduling …
An adaptive self‐scheduling loop scheduler
J Dennis Booth, P Allen Lane - Concurrency and Computation …, 2022 - Wiley Online Library
Many shared‐memory parallel irregular applications, such as sparse linear algebra and
graph algorithms, depend on efficient loop scheduling (LS) in a fork‐join manner despite …
A NUMA-Aware Version of an Adaptive Self-Scheduling Loop Scheduler
JD Booth, P Lane - ACM Transactions on Architecture and Code …, 2023 - dl.acm.org
Parallelizing code in a shared-memory environment is commonly done utilizing loop
scheduling (LS) in a fork-join manner as in OpenMP. This manner of parallelization is …
Chunking loops with non-uniform workloads
IK Prabhu, VK Nandivada - Proceedings of the 34th ACM International …, 2020 - dl.acm.org
Task-parallel languages such as X10 implement a dynamic lightweight task-parallel execution
model, where programmers are encouraged to express the ideal parallelism in the program …
PackStealLB: A scalable distributed load balancer based on work stealing and workload discretization
The scalability of high-performance, parallel iterative applications is directly affected by how
well they use the available computing resources. These applications are subject to load …
Automated Scheduling Algorithm Selection in OpenMP
FM Ciorba, A Mohammed… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Scientific and data analysis applications are increasingly complex, with evolving
computational and memory requirements during execution. Conversely, modern high …
[BOOK][B] A case for simple, low-cost autotuning heuristics for efficient performance of irregular algorithms
PA Lane - 2022 - search.proquest.com
Irregular algorithms (i.e., algorithms that make irregular memory accesses) are notorious for
being difficult to optimize on parallel architectures due to load imbalance in the number of …
[PDF][PDF] COWS for High Performance
P MISHRA, VK NANDIVADA - 2023 - cse.iitm.ac.in
COWS for High Performance: Cost Aware Work Stealing for
Irregular Parallel Loops. PRASOON MISHRA, Indian Institute of Technology Madras, India V …