ECP software technology capability assessment report

MA Heroux, LC McInnes, R Thakur, JS Vetter, XS Li… - 2020 - osti.gov
The Exascale Computing Project (ECP) Software Technology (ST) Focus Area is
responsible for developing critical software capabilities that will enable successful execution …

X-OpenMP—eXtreme fine-grained tasking using lock-less work stealing

P Nookala, K Chard, I Raicu - Future Generation Computer Systems, 2024 - Elsevier
Processors with 100s of threads of execution are among the state-of-the-art in high-end
computing systems. This transition to many-core computing has required the community to …

Finer-LRU: A scalable page management scheme for HPC manycore architectures

J Bang, C Kim, S Kim, Q Chen, C Lee… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
In HPC systems, the increasing need for a higher level of concurrency has led to packing
more cores within a single chip. However, since multiple processes share memory space …

Software combining to mitigate multithreaded MPI contention

A Amer, C Archer, M Blocksome, C Cao… - Proceedings of the …, 2019 - dl.acm.org
Efforts to mitigate lock contention from concurrent threaded accesses to MPI have reduced
contention through fine-grained locking, avoided locking altogether by offloading …

Runtime techniques for efficient execution of virtualized, migratable MPI ranks

S White - 2022 - ideals.illinois.edu
The Message Passing Interface (MPI) is the dominant programming system for
scientific applications that run on distributed memory parallel computers. MPI is a library …

ECP Software Technology Capability Assessment Report V3.0

MA Heroux, LC McInnes, R Thakur, JS Vetter, XS Li… - 2022 - osti.gov
The Exascale Computing Project (ECP) Software Technology (ST) focus area is responsible
for (1) developing critical software capabilities that will enable the successful execution of …

A Survey on Minimizing Lock Contention in Shared Resources in Linux Kernel

S Cho, S Lee, KT Pham, NL Anh… - … on Information and …, 2022 - ieeexplore.ieee.org
Many programs in multi-core environments use shared-memory parallelism using multi-threading.
The multiple threads typically use locks to coordinate access to the shared …

I/O Performance Optimization Schemes for Manycore HPC Systems

방지우 - 2023 - s-space.snu.ac.kr
High-performance computing (HPC) systems are composed of thousands of compute nodes,
storage systems, and high-speed networks, which provide multiple layers of I/O stacks with …

A Fine-Grained Page Management Scheme for HPC Manycore I/O Systems

J Bang, C Kim, Q Chen, C Lee, EK Byun, H Sung… - papers.ssrn.com
In HPC systems, the increasing need for a higher level of concurrency has led to packing
more cores within a single chip. However, since multiple processes share memory space …

Partial aggregation for collective communication in distributed memory machines

R Kowalewski - 2021 - edoc.ub.uni-muenchen.de
High Performance Computing (HPC) systems interconnect a large number of
Processing Elements (PEs) in high-bandwidth networks to simulate complex scientific …