BHive: A benchmark suite and measurement framework for validating x86-64 basic block performance...

M Phothilimthana, S Abu-El-Haija… - Advances in …, 2024 - proceedings.neurips.cc

Precise hardware performance models play a crucial role in code optimizations. They can
assist compilers in making heuristic decisions or aid autotuners in identifying the optimal …

Cited by 13 - Related articles - All 7 versions

nanoBench: A low-overhead tool for running microbenchmarks on x86 systems

A Abel, J Reineke - … on Performance Analysis of Systems and …, 2020 - ieeexplore.ieee.org

We present nanoBench, a tool for evaluating small microbenchmarks using hardware
performance counters on Intel and AMD x86 systems. Most existing tools and libraries are …

Cited by 62 - Related articles - All 4 versions

APT-GET: profile-guided timely software prefetching

S Jamilan, TA Khan, G Ayers, B Kasikci… - Proceedings of the …, 2022 - dl.acm.org

Prefetching, which predicts future memory accesses and preloads them from main memory,
is a widely-adopted technique to overcome the processor-memory performance gap …

Cited by 28 - Related articles - All 7 versions

DiffTune: Optimizing CPU simulator parameters with learned differentiable surrogates

A Renda, Y Chen, C Mendis… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org

CPU simulators are useful tools for modeling CPU execution behavior. However, they suffer
from inaccuracies due to the cost and complexity of setting their fine-grained parameters …

Cited by 43 - Related articles - All 11 versions

GRANITE: A graph neural network model for basic block throughput estimation

O Sýkora, PM Phothilimthana, C Mendis… - 2022 IEEE …, 2022 - ieeexplore.ieee.org

Analytical hardware performance models yield swift estimation of desired hardware
performance metrics. However, developing these analytical models for modern processors …

Cited by 18 - Related articles - All 7 versions

uiCA: Accurate throughput prediction of basic blocks on recent Intel microarchitectures

A Abel, J Reineke - Proceedings of the 36th ACM International …, 2022 - dl.acm.org

Performance models that statically predict the steady-state throughput of basic blocks on
particular microarchitectures, such as IACA, Ithemal, llvm-mca, OSACA, or CQA, can guide …

Cited by 31 - Related articles - All 5 versions

Facile: Fast, accurate, and interpretable basic-block throughput prediction

A Abel, S Sharma, J Reineke - 2023 IEEE International …, 2023 - ieeexplore.ieee.org

Basic-block throughput models such as uiCA, IACA, GRANITE, Ithemal, llvm-mca, OSACA,
or CQA guide optimizing compilers and help performance engineers identify and eliminate …

Cited by 5 - Related articles - All 4 versions

At the locus of performance: Quantifying the effects of copious 3D-stacked cache on HPC workloads

J Domke, E Vatai, B Gerofi, Y Kodama… - ACM Transactions on …, 2023 - dl.acm.org

Over the last three decades, innovations in the memory subsystem were primarily targeted at
overcoming the data movement bottleneck. In this paper, we focus on a specific market trend …

Cited by 4 - Related articles - All 4 versions

PMEvo: Portable inference of port mappings for out-of-order processors by evolutionary optimization

F Ritter, S Hack - Proceedings of the 41st ACM SIGPLAN Conference on …, 2020 - dl.acm.org

Achieving peak performance in a computer system requires optimizations in every layer of
the system, be it hardware or software. A detailed understanding of the underlying hardware …

Cited by 17 - Related articles - All 5 versions

COMET: Neural Cost Model Explanation Framework

I Chaudhary, A Renda, C Mendis… - … of Machine Learning …, 2024 - proceedings.mlsys.org

Cost models predict the cost of executing given assembly code basic blocks on a specific
microarchitecture. Recently, neural cost models have been shown to be fairly accurate and …