MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU

S Ramesh, A Mahéo, S Shende, AD Malony… - Proceedings of the 24th …, 2017 - dl.acm.org
MPI implementations are becoming increasingly complex and highly tunable, and thus
scalability limitations can come from numerous sources. The MPI Tools Interface (MPI_T) …

Designing a profiling and visualization tool for scalable and in-depth analysis of high-performance GPU clusters

P Kousha, B Ramesh, KK Suresh… - 2019 IEEE 26th …, 2019 - ieeexplore.ieee.org
The recent advent of advanced fabrics like NVIDIA NVLink is enabling the deployment of
dense Graphics Processing Unit (GPU) systems, e.g., DGX-2 and Summit. The Message …

INAM2: InfiniBand Network Analysis and Monitoring with MPI

H Subramoni, AM Augustine, M Arnold… - … Conference on High …, 2016 - Springer
Modern high-end computing is being driven by the tight integration of several hardware and
software components. On the hardware front, there are the multi-/many-core architectures …

A survey of methods for collective communication optimization and tuning

U Wickramasinghe, A Lumsdaine - arXiv preprint arXiv:1611.06334, 2016 - arxiv.org
New developments in HPC technology in terms of increasing computing power on
multi/many core processors, high-bandwidth memory/IO subsystems and communication …

MR-Advisor: A comprehensive tuning, profiling, and prediction tool for MapReduce execution frameworks on HPC clusters

M Wasi-ur-Rahman, NS Islam, X Lu, D Shankar… - Journal of Parallel and …, 2018 - Elsevier
MapReduce is the most popular parallel computing framework for big data processing which
allows massive scalability across distributed computing environment. Advanced RDMA …

Planning for performance: Enhancing achievable performance for MPI through persistent collective operations

DJ Holmes, B Morgan, A Skjellum, PV Bangalore… - Parallel Computing, 2019 - Elsevier
Advantages of nonblocking collective communication in MPI have been established over the
past quarter century, even predating MPI-1. For regular computations with fixed …

Enabling callback-driven runtime introspection via MPI_T

MA Hermanns, NT Hjelm, M Knobloch… - Proceedings of the 25th …, 2018 - dl.acm.org
Understanding the behavior of parallel applications that use the Message Passing Interface
(MPI) is critical for optimizing communication performance. Performance tools for MPI …

Planning for performance: persistent collective operations for MPI

B Morgan, DJ Holmes, A Skjellum… - Proceedings of the 24th …, 2017 - dl.acm.org
Advantages of nonblocking collective communication in MPI have been established over the
past quarter century, even predating MPI-1. For regular computations with fixed …

Communication optimization technology based on network dynamic performance model

X Cui, X Li, B Wang - Mathematical Problems in Engineering, 2020 - Wiley Online Library
This work analyses different communication modes in applications of supercomputing,
proposes a communication dynamic performance model based on topology awareness, and …

Sonar: Automated communication characterization for HPC applications

S Lammel, F Zahn, H Fröning - … , E-MuCoCoS, HPC-IODC, IXPUG, IWOPH …, 2016 - Springer
Future computing systems will need to operate within hard power and energy constraints;
this is particularly true for Exascale-class systems. These constraints are hard for technical …