A survey of field-based testing techniques

A Bertolino, P Braione, GD Angelis, L Gazzola… - ACM Computing …, 2021 - dl.acm.org
Field testing refers to testing techniques that operate in the field to reveal those faults that
escape in-house testing. Field testing techniques are becoming increasingly popular with …

Addressing failures in exascale computing

M Snir, RW Wisniewski, JA Abraham… - … Journal of High …, 2014 - journals.sagepub.com
We present here a report produced by a workshop on 'Addressing failures in exascale
computing' held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to …

Scalable temporal order analysis for large scale debugging

DH Ahn, BR De Supinski, I Laguna, GL Lee… - Proceedings of the …, 2009 - dl.acm.org
We present a scalable temporal order analysis technique that supports debugging of large
scale applications by classifying MPI tasks based on their logical program execution order …

Debugging high-performance computing applications at massive scales

I Laguna, DH Ahn, BR De Supinski, T Gamblin… - Communications of the …, 2015 - dl.acm.org
Communications of the ACM, September 2015, Vol. 58, No. 9, p. 72. DOI:10.1145/2667219 …

Large scale debugging of parallel tasks with AutomaDeD

I Laguna, T Gamblin, BR de Supinski… - Proceedings of 2011 …, 2011 - dl.acm.org
Developing correct HPC applications continues to be a challenge as the number of cores
increases in today's largest systems. Most existing debugging techniques perform poorly at …

Diagnosing performance bottlenecks in emerging petascale applications

NR Tallent, JM Mellor-Crummey, L Adhianto… - Proceedings of the …, 2009 - dl.acm.org
Cutting-edge science and engineering applications require petascale computing. It is,
however, a significant challenge to use petascale computing platforms effectively …

Vrisha: using scaling properties of parallel programs for bug detection and localization

B Zhou, M Kulkarni, S Bagchi - … of the 20th international symposium on …, 2011 - dl.acm.org
Detecting and isolating bugs that arise in parallel programs is a tedious and challenging
task. An especially subtle class of bugs are those that are scale-dependent: while small …

Dyninst and MRNet: Foundational infrastructure for parallel tools

WR Williams, X Meng, B Welton, BP Miller - Tools for High Performance …, 2016 - Springer
Parallel tools require common pieces of infrastructure: the ability to control, monitor, and
instrument programs, and the ability to massively scale these operations as the application …

WuKong: automatically detecting and localizing bugs that manifest at large system scales

B Zhou, J Too, M Kulkarni, S Bagchi - Proceedings of the 22nd …, 2013 - dl.acm.org
A key challenge in developing large scale applications is finding bugs that are latent at the
small scales of testing, but manifest themselves when the application is deployed at a large …

Scalable performance analysis of exascale MPI programs through signature-based clustering algorithms

A Bahmani, F Mueller - Proceedings of the 28th ACM international …, 2014 - dl.acm.org
Extreme-scale computing poses a number of challenges to application performance.
Developers need to study application behavior by collecting detailed information with the …