A probabilistic machine learning approach to scheduling parallel loops with Bayesian optimization

K Kim, Y Kim, S Park - IEEE Transactions on Parallel and …, 2020 - ieeexplore.ieee.org
This article proposes Bayesian optimization augmented factoring self-scheduling (BO FSS),
a new parallel loop scheduling strategy. BO FSS is an automatic tuning variant of the …

Automated scheduling algorithm selection and chunk parameter calculation in OpenMP

A Mohammed, JH Müller Korndörfer… - IEEE Transactions on …, 2022 - edoc.unibas.ch
Increasing node and cores-per-node counts in supercomputers render scheduling and load
balancing critical for exploiting parallelism. OpenMP applications can achieve high …

COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel Loops

P Mishra, VK Nandivada - ACM Transactions on Architecture and Code …, 2024 - dl.acm.org
Parallel libraries such as OpenMP distribute the iterations of parallel-for-loops among the
threads, using a programmer-specified scheduling policy. While the existing scheduling …

An adaptive self‐scheduling loop scheduler

J Dennis Booth, P Allen Lane - Concurrency and Computation …, 2022 - Wiley Online Library
Many shared‐memory parallel irregular applications, such as sparse linear algebra and
graph algorithms, depend on efficient loop scheduling (LS) in a fork‐join manner despite …

A NUMA-Aware Version of an Adaptive Self-Scheduling Loop Scheduler

JD Booth, P Lane - ACM Transactions on Architecture and Code …, 2023 - dl.acm.org
Parallelizing code in a shared-memory environment is commonly done utilizing loop
scheduling (LS) in a fork-join manner as in OpenMP. This manner of parallelization is …

Chunking loops with non-uniform workloads

IK Prabhu, VK Nandivada - Proceedings of the 34th ACM International …, 2020 - dl.acm.org
Task-parallel languages such as X10 implement dynamic lightweight task-parallel execution
model, where programmers are encouraged to express the ideal parallelism in the program …

PackStealLB: A scalable distributed load balancer based on work stealing and workload discretization

V Freitas, LL Pilla, AL Santana, M Castro… - Journal of Parallel and …, 2021 - Elsevier
The scalability of high-performance, parallel iterative applications is directly affected by how
well they use the available computing resources. These applications are subject to load …

Automated Scheduling Algorithm Selection in OpenMP

FM Ciorba, A Mohammed… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Scientific and data analysis applications are increasingly complex, with evolving
computational and memory requirements during execution. Conversely, modern high …

[BOOK][B] A case for simple, low-cost autotuning heuristics for efficient performance of irregular algorithms

PA Lane - 2022 - search.proquest.com
Irregular algorithms (i.e., algorithms that make irregular memory accesses) are notorious for
being difficult to optimize on parallel architectures due to load imbalance in the number of …

[PDF][PDF] COWS for High Performance

P Mishra, VK Nandivada - 2023 - cse.iitm.ac.in
COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel Loops.
Prasoon Mishra, Indian Institute of Technology Madras, India V …