ScaAnalyzer: A tool to identify memory scalability bottlenecks in parallel programs
It is difficult to scale parallel programs in a system that employs a large number of cores. To
identify scalability bottlenecks, existing tools principally pinpoint poor thread synchronization …
Maximizing hardware prefetch effectiveness with machine learning
Modern processors are equipped with multiple hardware prefetchers, each of which targets
a distinct level in the memory hierarchy and employs a separate prefetching algorithm …
DR-BW: identifying bandwidth contention in NUMA architectures with supervised learning
Non-Uniform Memory Access (NUMA) architectures are widely used in mainstream multi-socket
computer systems to scale memory bandwidth. Without a NUMA-aware design …
HPC ontology: Towards a unified ontology for managing training datasets and AI models for high-performance computing
Machine learning (ML) techniques have been widely studied to address various challenges
of productively and efficiently running large-scale scientific applications on heterogeneous …
Predator: Predictive false sharing detection
False sharing is a notorious problem for multithreaded applications that can drastically
degrade both performance and scalability. Existing approaches can precisely identify the …
Dynamic error mitigation in NoCs using intelligent prediction techniques
D DiTomaso, T Boraten, A Kodi… - 2016 49th Annual IEEE …, 2016 - ieeexplore.ieee.org
Network-on-chips (NoCs) are quickly becoming the standard communication fabric for multi-core
systems. As technology continues to scale down into the nanometer regime, device …
Laser: Light, accurate sharing detection and repair
L Luo, A Sriraman, B Fugate, S Hu… - … Symposium on High …, 2016 - ieeexplore.ieee.org
Contention for shared memory, in the forms of true sharing and false sharing, is a
challenging performance bug to discover and to repair. Understanding cache contention …
A zero-positive learning approach for diagnosing software performance regressions
The field of machine programming (MP), the automation of the development of software, is
making notable research advances. This is, in part, due to the emergence of a wide range of …
Swing to SWT and back: Patterns for API migration by wrapping
TT Bartolomei, K Czarnecki… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org
Evolving requirements may necessitate API migration, i.e., re-engineering an application to
replace its dependence on one API with the dependence on another API for the same …
Featherlight on-the-fly false-sharing detection
Shared-memory parallel programs routinely suffer from false sharing, a performance
degradation caused by different threads accessing different variables that reside on the …