Branch runahead: An alternative to branch prediction for impossible to predict branches

S Pruett, Y Patt - MICRO-54: 54th Annual IEEE/ACM International …, 2021 - dl.acm.org
High performance microprocessors require high levels of instruction supply. Branch
prediction has been the most important driver of this for nearly 30 years. Unfortunately …

Precise runahead execution

A Naithani, J Feliu, A Adileh… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Runahead execution improves processor performance by accurately prefetching
long-latency memory accesses. When a long-latency load causes the instruction window to fill up …

GraphAttack: Optimizing data supply for graph applications on in-order multicore architectures

A Manocha, T Sorensen, E Tureci, O Matthews… - ACM Transactions on …, 2021 - dl.acm.org
Graph structures are a natural representation of important and pervasive data. While graph
applications have significant parallelism, their characteristic pointer indirect loads to …

A prefetch control strategy based on improved hill-climbing method in asymmetric multi-core architecture

J Fang, Y Xu, H Kong, M Cai - The Journal of Supercomputing, 2023 - Springer
Cache prefetching is a traditional way to reduce memory access latency. In multi-core
systems, aggressive prefetching may harm the system. In the past, prefetching throttling …

Slipstream processors revisited: Exploiting branch sets

V Srinivasan, RBR Chowdhury… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Delinquent branches and loads remain key performance limiters in some applications. One
approach to mitigate them is pre-execution. Broadly, there are two classes of pre-execution …

Timely, Efficient, and Accurate Branch Precomputation

A Deshmukh, LC Cai, YN Patt - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Out-of-order cores rely on high-accuracy branch predictors to supply useful instructions to
the processor backend. However, there remains a large fraction of mispredictions caused by …

Bootstrapping: Using SMT hardware to improve single-thread performance

S Kondguli, M Huang - Proceedings of the Twenty-Fourth International …, 2019 - dl.acm.org
Single-thread performance improvement remains a central design goal for general purpose
processors. Microarchitectural designs for the core have reached a plateau over the past …

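[PDF] A cache prefetching algorithm based on hybrid pattern learning of instruction streams

Y Wang, Q Yang, M Li - Acta Electronica Sinica, 2023 - ejournal.org.cn
A recent research focus in cache prefetching algorithms is the use of prediction techniques based on pattern recognition, such as Lookahead,
to infer the addresses of memory access requests. On the one hand, such algorithms have difficulty learning the dependent cache misses in memory access behavior …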

[BOOK][B] Optimizing Data Supply and Memory Management for Graph Applications in Post-Moore Hardware-Software Systems

A Manocha - 2023 - search.proquest.com
Graph structures naturally and efficiently capture relationships between entities, such as
individuals in a social network, pages in the World Wide Web, and amino acids in protein …

[BOOK][B] Slipstream Processors Revisited: Exploiting Branch Sets

V Srinivasan - 2019 - search.proquest.com
Delinquent branches (frequently mispredict) and loads (frequently miss) remain key IPC
bottlenecks in some applications. One approach to reduce their effect is pre-execution via …