An open source FPGA-optimized out-of-order RISC-V soft processor
S Mashimo, A Fujita, R Matsuo, S Akaki… - … Conference on Field …, 2019 - ieeexplore.ieee.org
High-performance soft processors in field-programmable gate arrays (FPGAs) have become
increasingly important as recent large FPGA systems have relied on soft processors to run …
Filter caching for free: The untapped potential of the store-buffer
Modern processors contain store-buffers to allow stores to retire under a miss, thus hiding
store-miss latency. The store-buffer needs to be large (for performance) and searched on …
Early address prediction: Efficient pipeline prefetch and reuse
Achieving low load-to-use latency with low energy and storage overheads is critical for
performance. Existing techniques either prefetch into the pipeline (via address prediction …
EOLE: Combining static and dynamic scheduling through value prediction to reduce complexity and increase performance
Recent work in the field of value prediction (VP) has shown that given an efficient confidence
estimation mechanism, prediction validation could be removed from the out-of-order engine …
Fat loads: Exploiting locality amongst contemporaneous load operations to optimize cache accesses
This paper considers locality among load instructions that are in processing
contemporaneously within a processor to optimize the number of accesses to the memory …
Dynamically disabling way-prediction to reduce instruction replay
Way-predictors have long been used to reduce dynamic cache energy without the
performance loss of serial caches. However, they produce variable-latency hits, as incorrect …
Recycling data slack in out-of-order cores
In order to operate reliably and produce expected outputs, modern processors set timing
margins conservatively at design time to support extreme variations in workload and …
HAIR: Halving the Area of the Integer Register File with Odd/Even Banking
P Michaud, A Peysieux - ACM Transactions on Architecture and Code …, 2022 - dl.acm.org
This article proposes a new microarchitectural scheme for reducing the hardware complexity
of the integer register file of a superscalar processor. The register file is split into two banks …
Out-of-Step Pipeline for Gather/Scatter Instructions
Y Ge, K Yoda, M Ito, T Ichiba… - … , Automation & Test …, 2023 - ieeexplore.ieee.org
Wider SIMD units suffer from low scalability of gather/scatter instructions that appear in
sparse matrix calculations. We address this problem with an out-of-step pipeline which …
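Power optimization of an out-of-order superscalar processor core
Sun Caixia, Li Wenzhe, Gao Jun, Wang Yongwen - Computer Engineering & Science, 2017 - joces.nudt.edu.cn
In pursuit of higher performance, processor core clock frequencies keep rising and core designs grow increasingly complex,
and the resulting power consumption problem becomes ever more prominent. Besides adopting low-power techniques at the process and circuit levels …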