An open source FPGA-optimized out-of-order RISC-V soft processor

S Mashimo, A Fujita, R Matsuo, S Akaki… - … Conference on Field …, 2019 - ieeexplore.ieee.org
High-performance soft processors in field-programmable gate arrays (FPGAs) have become
increasingly important as recent large FPGA systems have relied on soft processors to run …

Filter caching for free: The untapped potential of the store-buffer

R Alves, A Ros, D Black-Schaffer… - Proceedings of the 46th …, 2019 - dl.acm.org
Modern processors contain store-buffers to allow stores to retire under a miss, thus hiding
store-miss latency. The store-buffer needs to be large (for performance) and searched on …

Early address prediction: Efficient pipeline prefetch and reuse

R Alves, S Kaxiras, D Black-Schaffer - ACM Transactions on Architecture …, 2021 - dl.acm.org
Achieving low load-to-use latency with low energy and storage overheads is critical for
performance. Existing techniques either prefetch into the pipeline (via address prediction …

EOLE: Combining static and dynamic scheduling through value prediction to reduce complexity and increase performance

A Perais, A Seznec - ACM Transactions on Computer Systems (TOCS), 2016 - dl.acm.org
Recent work in the field of value prediction (VP) has shown that given an efficient confidence
estimation mechanism, prediction validation could be removed from the out-of-order engine …

Fat loads: Exploiting locality amongst contemporaneous load operations to optimize cache accesses

V Baoni, A Mittal, GS Sohi - MICRO-54: 54th Annual IEEE/ACM …, 2021 - dl.acm.org
This paper considers locality among load instructions that are in processing
contemporaneously within a processor to optimize the number of accesses to the memory …

Dynamically disabling way-prediction to reduce instruction replay

R Alves, S Kaxiras… - 2018 IEEE 36th …, 2018 - ieeexplore.ieee.org
Way-predictors have long been used to reduce dynamic cache energy without the
performance loss of serial caches. However, they produce variable-latency hits, as incorrect …

Recycling data slack in out-of-order cores

GS Ravi, M Lipasti - 2019 IEEE International Symposium on …, 2019 - ieeexplore.ieee.org
In order to operate reliably and produce expected outputs, modern processors set timing
margins conservatively at design time to support extreme variations in workload and …

HAIR: Halving the Area of the Integer Register File with Odd/Even Banking

P Michaud, A Peysieux - ACM Transactions on Architecture and Code …, 2022 - dl.acm.org
This article proposes a new microarchitectural scheme for reducing the hardware complexity
of the integer register file of a superscalar processor. The register file is split into two banks …

Out-of-Step Pipeline for Gather/Scatter Instructions

Y Ge, K Yoda, M Ito, T Ichiba… - … , Automation & Test …, 2023 - ieeexplore.ieee.org
Wider SIMD units suffer from low scalability of gather/scatter instructions that appear in
sparse matrix calculations. We address this problem with an out-of-step pipeline which …

Power optimization of an out-of-order superscalar processor core

孙彩霞, 李文哲, 高军, 王永文 - Computer Engineering & Science, 2017 - joces.nudt.edu.cn
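In pursuit of higher performance, processor core clock frequencies keep rising and core designs grow
increasingly complex, making power consumption an ever more prominent problem. Beyond adopting low-power techniques at the process and circuit levels …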