Cost-effective speculative scheduling in high performance processors

S Mashimo, A Fujita, R Matsuo, S Akaki… - … Conference on Field …, 2019 - ieeexplore.ieee.org

High-performance soft processors in field-programmable gate arrays (FPGAs) have become
increasingly important as recent large FPGA systems have relied on soft processors to run …

Cited by 47 · Related articles · All 5 versions

Filter caching for free: The untapped potential of the store-buffer

R Alves, A Ros, D Black-Schaffer… - Proceedings of the 46th …, 2019 - dl.acm.org

Modern processors contain store-buffers to allow stores to retire under a miss, thus hiding
store-miss latency. The store-buffer needs to be large (for performance) and searched on …

Cited by 12 · Related articles · All 11 versions
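
The snippet says the store-buffer is "searched on …" (truncated), which refers to the conventional store-to-load forwarding search. Below is a minimal sketch of that generic search. It assumes word-aligned stores with no partial overlaps. It does not reproduce the paper's filter-caching proposal, and `StoreBuffer`, `push`, `forward`, and `drain_one` are hypothetical names.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    addr: int   # assumed word-aligned; partial overlaps are not modeled
    value: int


class StoreBuffer:
    """Toy FIFO store buffer: retired stores wait here until they drain to the cache."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: deque[StoreEntry] = deque()

    def push(self, addr: int, value: int) -> bool:
        # A full buffer would stall store retirement; signal that with False.
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(StoreEntry(addr, value))
        return True

    def drain_one(self) -> Optional[StoreEntry]:
        # Stores leave in program order: the oldest entry is written to the cache first.
        return self.entries.popleft() if self.entries else None

    def forward(self, addr: int) -> Optional[int]:
        # A load searches youngest to oldest so it observes the most recent matching store.
        for entry in reversed(self.entries):
            if entry.addr == addr:
                return entry.value
        return None  # no match: the load must read the cache instead
```

For example, after `push(0x100, 1)` and `push(0x100, 2)`, a call to `forward(0x100)` yields the younger value. Because every load must perform this associative search, a large buffer is expensive to search.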

Early address prediction: Efficient pipeline prefetch and reuse

R Alves, S Kaxiras, D Black-Schaffer - ACM Transactions on Architecture …, 2021 - dl.acm.org

Achieving low load-to-use latency with low energy and storage overheads is critical for
performance. Existing techniques either prefetch into the pipeline (via address prediction …

Cited by 5 · Related articles · All 3 versions
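
The snippet mentions prefetching into the pipeline via address prediction. A per-PC stride predictor is one classic form of address predictor. Below is a minimal sketch of that generic technique, not the predictor proposed in the paper. The confidence threshold and counter ceiling are assumed values, and `StridePredictor` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional


class StridePredictor:
    """Toy per-PC stride address predictor gated by a saturating confidence counter."""

    def __init__(self, threshold: int = 2, conf_max: int = 3) -> None:
        self.threshold = threshold  # confidence needed before predicting (assumed value)
        self.conf_max = conf_max
        self.table: dict[int, list[int]] = {}  # pc -> [last_addr, stride, confidence]

    def predict(self, pc: int) -> Optional[int]:
        entry = self.table.get(pc)
        if entry is not None and entry[2] >= self.threshold:
            return entry[0] + entry[1]  # next address = last address + stride
        return None

    def update(self, pc: int, addr: int) -> None:
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [addr, 0, 0]  # first sighting: no stride yet
            return
        stride = addr - entry[0]
        if stride == entry[1]:
            entry[2] = min(entry[2] + 1, self.conf_max)  # stride confirmed
        else:
            entry[1], entry[2] = stride, 0  # new stride: retrain from zero confidence
        entry[0] = addr
```

Gating predictions on confidence keeps a wrong early address from triggering a useless pipeline prefetch.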

Eole: Combining static and dynamic scheduling through value prediction to reduce complexity and increase performance

A Perais, A Seznec - ACM Transactions on Computer Systems (TOCS), 2016 - dl.acm.org

Recent work in the field of value prediction (VP) has shown that given an efficient confidence
estimation mechanism, prediction validation could be removed from the out-of-order engine …

Cited by 16 · Related articles · All 6 versions
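
The snippet ties the removal of prediction validation from the out-of-order engine to an efficient confidence estimation mechanism. The sketch below shows confidence-gated value prediction using a generic last-value predictor with a saturating counter. EOLE's actual predictor is not described in the snippet and is not reproduced here, and `LastValuePredictor` and `conf_max` are hypothetical names.

```python
from __future__ import annotations

from typing import Optional


class LastValuePredictor:
    """Toy last-value predictor indexed by PC, gated by a saturating confidence counter."""

    def __init__(self, conf_max: int = 3) -> None:
        self.conf_max = conf_max  # predict only once confidence saturates (assumed policy)
        self.table: dict[int, tuple[int, int]] = {}  # pc -> (last_value, confidence)

    def predict(self, pc: int) -> Optional[int]:
        entry = self.table.get(pc)
        if entry is not None and entry[1] >= self.conf_max:
            return entry[0]
        return None  # low confidence: do not speculate

    def train(self, pc: int, actual: int) -> None:
        entry = self.table.get(pc)
        if entry is not None and entry[0] == actual:
            conf = min(entry[1] + 1, self.conf_max)  # value repeated: gain confidence
        else:
            conf = 0  # first sighting or misprediction: reset confidence
        self.table[pc] = (actual, conf)
```

A high confidence threshold trades prediction coverage for a lower misprediction rate. That trade-off is what makes cheaper validation plausible.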

Fat loads: Exploiting locality amongst contemporaneous load operations to optimize cache accesses

V Baoni, A Mittal, GS Sohi - MICRO-54: 54th Annual IEEE/ACM …, 2021 - dl.acm.org

This paper considers locality among load instructions that are in processing
contemporaneously within a processor to optimize the number of accesses to the memory …

Cited by 3 · Related articles
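
The snippet describes exploiting locality among loads in flight at the same time to reduce the number of memory accesses. As a hedged illustration of that idea only (not the paper's mechanism), the sketch below groups in-flight load addresses by cache line, so that one access could serve each group. The 64-byte line size and the names `group_by_line` and `accesses_saved` are assumptions.

```python
from __future__ import annotations

from collections import defaultdict

LINE_BYTES = 64  # assumed cache-line size


def group_by_line(load_addrs: list[int], line_bytes: int = LINE_BYTES) -> dict[int, list[int]]:
    """Group in-flight load addresses by cache-line number, preserving order within a line."""
    groups: dict[int, list[int]] = defaultdict(list)
    for addr in load_addrs:
        groups[addr // line_bytes].append(addr)
    return dict(groups)


def accesses_saved(load_addrs: list[int], line_bytes: int = LINE_BYTES) -> int:
    # Without grouping: one access per load. With grouping: one access per distinct line.
    return len(load_addrs) - len(group_by_line(load_addrs, line_bytes))
```

For the loads `[0x1000, 0x1008, 0x1010, 0x2000, 0x1038]`, four addresses fall in one line, so five accesses shrink to two.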

Dynamically disabling way-prediction to reduce instruction replay

R Alves, S Kaxiras… - 2018 IEEE 36th …, 2018 - ieeexplore.ieee.org

Way-predictors have long been used to reduce dynamic cache energy without the
performance loss of serial caches. However, they produce variable-latency hits, as incorrect …

Cited by 7 · Related articles · All 3 versions
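
The snippet notes that way-predictors produce variable-latency hits. The sketch below shows where that variability comes from. It assumes a most-recently-used (MRU) way predictor, round-robin fills, and hypothetical latencies of 2 cycles for a correctly predicted hit and 3 for a mispredicted hit. It does not model the paper's dynamic disabling, and `WayPredictedCache` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional


class WayPredictedCache:
    """Toy set-associative cache (tags only) with an MRU way predictor."""

    def __init__(self, num_sets: int, ways: int, line_bytes: int = 64,
                 fast_hit: int = 2, slow_hit: int = 3) -> None:
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        self.fast_hit, self.slow_hit = fast_hit, slow_hit  # hypothetical cycle counts
        self.tags: list[list[Optional[int]]] = [[None] * ways for _ in range(num_sets)]
        self.mru = [0] * num_sets     # predicted way per set
        self.victim = [0] * num_sets  # round-robin fill pointer per set

    def access(self, addr: int) -> tuple[bool, Optional[int], bool]:
        """Return (hit, latency or None on a miss, way prediction correct)."""
        line = addr // self.line_bytes
        s, tag = line % self.num_sets, line // self.num_sets
        if self.tags[s][self.mru[s]] == tag:
            return True, self.fast_hit, True  # only the predicted way was probed
        for way, t in enumerate(self.tags[s]):
            if t == tag:
                self.mru[s] = way  # retrain the predictor
                return True, self.slow_hit, False  # a hit, but later than scheduled
        way = self.victim[s]  # miss: fill round-robin and predict this way next time
        self.tags[s][way] = tag
        self.victim[s] = (way + 1) % self.ways
        self.mru[s] = way
        return False, None, False
```

Dependents scheduled for the 2-cycle latency would have to replay whenever a hit takes 3 cycles. This is the instruction replay that the paper's title refers to.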

Recycling data slack in out-of-order cores

GS Ravi, M Lipasti - 2019 IEEE International Symposium on …, 2019 - ieeexplore.ieee.org

In order to operate reliably and produce expected outputs, modern processors set timing
margins conservatively at design time to support extreme variations in workload and …

Cited by 7 · Related articles · All 5 versions

HAIR: Halving the Area of the Integer Register File with Odd/Even Banking

P Michaud, A Peysieux - ACM Transactions on Architecture and Code …, 2022 - dl.acm.org

This article proposes a new microarchitectural scheme for reducing the hardware complexity
of the integer register file of a superscalar processor. The register file is split into two banks …
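
The snippet says the register file is split into two banks, and the title names odd/even banking. The sketch below assumes physical register p lives in bank p mod 2 and that each bank serves a fixed number of reads per cycle. It shows why the split matters: two source operands in the same bank compete for that bank's ports. HAIR's actual port organization and conflict handling are not given in the snippet, and `bank_of`, `read_cycles`, and `ports_per_bank` are hypothetical.

```python
from __future__ import annotations


def bank_of(preg: int) -> int:
    # Odd/even banking: even-numbered physical registers in bank 0, odd in bank 1.
    return preg & 1


def read_cycles(src_regs: list[int], ports_per_bank: int = 1) -> int:
    """Cycles to read all source operands, assuming ports_per_bank reads per bank per cycle."""
    counts = [0, 0]
    for r in src_regs:
        counts[bank_of(r)] += 1
    # The busiest bank sets the read time (ceiling division of its read count by its ports).
    return max(-(-c // ports_per_bank) for c in counts)
```

Under these assumptions, reading p4 and p7 takes one cycle, but reading p4 and p6 takes two, because both live in the even bank.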

Out-of-Step Pipeline for Gather/Scatter Instructions

Y Ge, K Yoda, M Ito, T Ichiba… - … , Automation & Test …, 2023 - ieeexplore.ieee.org

Wider SIMD units suffer from low scalability of gather/scatter instructions that appear in
sparse matrix calculations. We address this problem with an out-of-step pipeline which …
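
Sparse matrix-vector multiplication in CSR (compressed sparse row) form is a common source of the gathers the snippet mentions, because each row reads the input vector at column indices stored in the matrix. The sketch below shows only the gather's semantics in scalar Python. It does not model SIMD lanes or the paper's out-of-step pipeline, and `gather` and `csr_spmv` are hypothetical names.

```python
from __future__ import annotations


def gather(table: list[float], indices: list[int]) -> list[float]:
    # Gather semantics: element i of the result is table[indices[i]]; indices are arbitrary.
    return [table[i] for i in indices]


def csr_spmv(values: list[float], col_idx: list[int], row_ptr: list[int],
             x: list[float]) -> list[float]:
    """Compute y = A @ x for A in CSR form; reading x at col_idx positions is the gather."""
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        xs = gather(x, col_idx[start:end])  # irregular, data-dependent addresses
        y.append(sum(v * xv for v, xv in zip(values[start:end], xs)))
    return y
```

For A = [[1, 0, 2], [0, 3, 0]] stored as `values=[1, 2, 3]`, `col_idx=[0, 2, 1]`, `row_ptr=[0, 2, 3]` and x = [1, 2, 3], the result is [7, 6]. The index-dependent reads of x are what limit gather throughput as SIMD width grows.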

Power optimization of out-of-order superscalar processor cores (乱序超标量处理器核的功耗优化)

孙彩霞, 李文哲, 高军, 王永文 - 计算机工程与科学 (Computer Engineering & Science), 2017 - joces.nudt.edu.cn

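In pursuit of higher performance, processor core clock frequencies keep rising and core designs grow increasingly complex;
as a result, power consumption has become an increasingly prominent problem. Besides adopting low-power techniques at the process and circuit levels …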