ScaAnalyzer: A tool to identify memory scalability bottlenecks in parallel programs

X Liu, B Wu - Proceedings of the International Conference for High …, 2015 - dl.acm.org
It is difficult to scale parallel programs in a system that employs a large number of cores. To
identify scalability bottlenecks, existing tools principally pinpoint poor thread synchronization …

Maximizing hardware prefetch effectiveness with machine learning

S Rahman, M Burtscher, Z Zong… - 2015 IEEE 17th …, 2015 - ieeexplore.ieee.org
Modern processors are equipped with multiple hardware prefetchers, each of which targets
a distinct level in the memory hierarchy and employs a separate prefetching algorithm …

DR-BW: identifying bandwidth contention in NUMA architectures with supervised learning

H Xu, S Wen, A Gimenez, T Gamblin… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Non-Uniform Memory Access (NUMA) architectures are widely used in mainstream multi-
socket computer systems to scale memory bandwidth. Without a NUMA-aware design …

HPC ontology: Towards a unified ontology for managing training datasets and AI models for high-performance computing

C Liao, PH Lin, G Verma… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
Machine learning (ML) techniques have been widely studied to address various challenges
of productively and efficiently running large-scale scientific applications on heterogeneous …

Predator: Predictive false sharing detection

T Liu, C Tian, Z Hu, ED Berger - Proceedings of the 19th ACM SIGPLAN …, 2014 - dl.acm.org
False sharing is a notorious problem for multithreaded applications that can drastically
degrade both performance and scalability. Existing approaches can precisely identify the …

Dynamic error mitigation in NoCs using intelligent prediction techniques

D DiTomaso, T Boraten, A Kodi… - 2016 49th Annual IEEE …, 2016 - ieeexplore.ieee.org
Network-on-chips (NoCs) are quickly becoming the standard communication fabric for multi-
core systems. As technology continues to scale down into the nanometer regime, device …

LASER: Light, accurate sharing detection and repair

L Luo, A Sriraman, B Fugate, S Hu… - … Symposium on High …, 2016 - ieeexplore.ieee.org
Contention for shared memory, in the forms of true sharing and false sharing, is a
challenging performance bug to discover and to repair. Understanding cache contention …

A zero-positive learning approach for diagnosing software performance regressions

M Alam, J Gottschlich, N Tatbul… - Advances in …, 2019 - proceedings.neurips.cc
The field of machine programming (MP), the automation of the development of software, is
making notable research advances. This is, in part, due to the emergence of a wide range of …

Swing to SWT and back: Patterns for API migration by wrapping

TT Bartolomei, K Czarnecki… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org
Evolving requirements may necessitate API migration: re-engineering an application to
replace its dependence on one API with the dependence on another API for the same …

Featherlight on-the-fly false-sharing detection

M Chabbi, S Wen, X Liu - Proceedings of the 23rd ACM SIGPLAN …, 2018 - dl.acm.org
Shared-memory parallel programs routinely suffer from false sharing, a performance
degradation caused by different threads accessing different variables that reside on the …