TpuGraphs: A performance prediction dataset on large tensor computational graphs

M Phothilimthana, S Abu-El-Haija… - Advances in …, 2024 - proceedings.neurips.cc
Precise hardware performance models play a crucial role in code optimizations. They can
assist compilers in making heuristic decisions or aid autotuners in identifying the optimal …

nanoBench: A low-overhead tool for running microbenchmarks on x86 systems

A Abel, J Reineke - … on Performance Analysis of Systems and …, 2020 - ieeexplore.ieee.org
We present nanoBench, a tool for evaluating small microbenchmarks using hardware
performance counters on Intel and AMD x86 systems. Most existing tools and libraries are …

APT-GET: Profile-guided timely software prefetching

S Jamilan, TA Khan, G Ayers, B Kasikci… - Proceedings of the …, 2022 - dl.acm.org
Prefetching, which predicts future memory accesses and preloads them from main memory,
is a widely adopted technique to overcome the processor-memory performance gap …

DiffTune: Optimizing CPU simulator parameters with learned differentiable surrogates

A Renda, Y Chen, C Mendis… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
CPU simulators are useful tools for modeling CPU execution behavior. However, they suffer
from inaccuracies due to the cost and complexity of setting their fine-grained parameters …

GRANITE: A graph neural network model for basic block throughput estimation

O Sýkora, PM Phothilimthana, C Mendis… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Analytical hardware performance models yield swift estimation of desired hardware
performance metrics. However, developing these analytical models for modern processors …

uiCA: Accurate throughput prediction of basic blocks on recent Intel microarchitectures

A Abel, J Reineke - Proceedings of the 36th ACM International …, 2022 - dl.acm.org
Performance models that statically predict the steady-state throughput of basic blocks on
particular microarchitectures, such as IACA, Ithemal, llvm-mca, OSACA, or CQA, can guide …

Facile: Fast, accurate, and interpretable basic-block throughput prediction

A Abel, S Sharma, J Reineke - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Basic-block throughput models such as uiCA, IACA, GRANITE, Ithemal, llvm-mca, OSACA,
or CQA guide optimizing compilers and help performance engineers identify and eliminate …

At the locus of performance: Quantifying the effects of copious 3D-stacked cache on HPC workloads

J Domke, E Vatai, B Gerofi, Y Kodama… - ACM Transactions on …, 2023 - dl.acm.org
Over the last three decades, innovations in the memory subsystem were primarily targeted at
overcoming the data movement bottleneck. In this paper, we focus on a specific market trend …

PMEvo: Portable inference of port mappings for out-of-order processors by evolutionary optimization

F Ritter, S Hack - Proceedings of the 41st ACM SIGPLAN Conference on …, 2020 - dl.acm.org
Achieving peak performance in a computer system requires optimizations in every layer of
the system, be it hardware or software. A detailed understanding of the underlying hardware …

COMET: Neural Cost Model Explanation Framework

I Chaudhary, A Renda, C Mendis… - … of Machine Learning …, 2024 - proceedings.mlsys.org
Cost models predict the cost of executing given assembly code basic blocks on a specific
microarchitecture. Recently, neural cost models have been shown to be fairly accurate and …