TpuGraphs: A performance prediction dataset on large tensor computational graphs
M Phothilimthana, S Abu-El-Haija… - Advances in …, 2024 - proceedings.neurips.cc
Precise hardware performance models play a crucial role in code optimizations. They can
assist compilers in making heuristic decisions or aid autotuners in identifying the optimal …
nanoBench: A low-overhead tool for running microbenchmarks on x86 systems
We present nanoBench, a tool for evaluating small microbenchmarks using hardware
performance counters on Intel and AMD x86 systems. Most existing tools and libraries are …
APT-GET: profile-guided timely software prefetching
Prefetching, which predicts future memory accesses and preloads them from main memory,
is a widely-adopted technique to overcome the processor-memory performance gap …
DiffTune: Optimizing CPU simulator parameters with learned differentiable surrogates
CPU simulators are useful tools for modeling CPU execution behavior. However, they suffer
from inaccuracies due to the cost and complexity of setting their fine-grained parameters …
GRANITE: A graph neural network model for basic block throughput estimation
Analytical hardware performance models yield swift estimation of desired hardware
performance metrics. However, developing these analytical models for modern processors …
uiCA: Accurate throughput prediction of basic blocks on recent Intel microarchitectures
Performance models that statically predict the steady-state throughput of basic blocks on
particular microarchitectures, such as IACA, Ithemal, llvm-mca, OSACA, or CQA, can guide …
Facile: Fast, accurate, and interpretable basic-block throughput prediction
Basic-block throughput models such as uiCA, IACA, GRANITE, Ithemal, llvm-mca, OSACA,
or CQA guide optimizing compilers and help performance engineers identify and eliminate …
At the locus of performance: Quantifying the effects of copious 3D-stacked cache on HPC workloads
Over the last three decades, innovations in the memory subsystem were primarily targeted at
overcoming the data movement bottleneck. In this paper, we focus on a specific market trend …
PMEvo: Portable inference of port mappings for out-of-order processors by evolutionary optimization
Achieving peak performance in a computer system requires optimizations in every layer of
the system, be it hardware or software. A detailed understanding of the underlying hardware …
COMET: Neural Cost Model Explanation Framework
Cost models predict the cost of executing given assembly code basic blocks on a specific
microarchitecture. Recently, neural cost models have been shown to be fairly accurate and …