Beyond the roofline: Cache-aware power and energy-efficiency modeling for multi-cores
To foster energy efficiency in current and future multi-core processors, the benefits and
trade-offs of a large set of optimization solutions must be evaluated. For this purpose, it is …
CIMAR, NIMAR, and LMMA: Novel algorithms for thread and memory migrations in user space on NUMA systems using hardware counters
This paper introduces two novel algorithms for thread migrations, named CIMAR (Core-aware
Interchange and Migration Algorithm with performance Record (IMAR)) and NIMAR …
An extended roofline model with communication-awareness for distributed-memory HPC systems
D Cardwell, F Song - Proceedings of the International Conference on …, 2019 - dl.acm.org
Performance modeling of parallel applications on distributed memory systems is a
challenging task due to the effects of CPU speed, memory access time, and communication …
Using an extended Roofline Model to understand data and thread affinities on NUMA systems
OG Lorenzo, TF Pena, JCC Domínguez… - Annals of Multicore …, 2014 - dialnet.unirioja.es
Today's microprocessors include multicores that feature a diverse set of compute cores and
onboard memory subsystems connected by complex communication networks and …
Using performance attributes for managing heterogeneous memory in HPC applications
The complexity of memory systems has increased considerably over the past decade.
Supercomputers may now include several levels of heterogeneous and non-uniform …
Multiobjective optimization technique based on monitoring information to increase the performance of thread migration on multicores
OG Lorenzo, TF Pena, JC Cabaleiro… - 2014 IEEE …, 2014 - ieeexplore.ieee.org
Multicore systems present on-board memory hierarchies and communication networks that
influence their performance when they execute shared memory parallel codes …
Performance debugging toolbox for binaries: sensitivity analysis and dependence profiling
F Gruber - 2019 - theses.hal.science
Debugging, as usually understood, revolves around finding and removing defects in
software that prevent it from functioning correctly. That is, when one talks about bugs and …
Performance analysis of applications in the context of architectural rooflines
Intuitive visual representations of architecture capabilities and the performance of
applications are critical to enabling effective performance analysis, which in turn guides …
LBMA and IMAR2: Weighted lottery based migration strategies for NUMA multiprocessing servers
Multicore NUMA systems present on-board memory hierarchies and communication
networks that influence performance when executing shared memory parallel codes …
MD-Roofline: A Training Performance Analysis Model for Distributed Deep Learning
The bulk and sophistication of Distributed Deep Learning (DDL) systems leave an
enormous challenge for AI researchers and operation engineers to analyze …