Effective hardware-based data prefetching for high-performance processors

TF Chen, JL Baer - IEEE Transactions on Computers, 1995 - ieeexplore.ieee.org
Memory latency and bandwidth are progressing at a much slower pace than processor
performance. In this paper, we describe and evaluate the performance of three variations of …

A performance study of software and hardware data prefetching schemes

TF Chen, JL Baer - ACM SIGARCH Computer Architecture News, 1994 - dl.acm.org
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one
of several approaches for tolerating memory latencies. Prefetching can be either hardware …

Data prefetching and data forwarding in shared memory multiprocessors

DK Poulsen, PC Yew - 1994 International Conference on …, 1994 - ieeexplore.ieee.org
This paper studies and compares the use of data prefetching and an alternative mechanism,
data forwarding, for reducing memory latency due to interprocessor communication in cache …

Design of a parallel vector access unit for SDRAM memory systems

BK Mathew, SA McKee, JB Carter… - … Symposium on High …, 2000 - ieeexplore.ieee.org
We are attacking the memory bottleneck by building a "smart" memory controller that
improves effective memory bandwidth, bus utilization, and cache efficiency by letting …

[PDF] Reducing branch misprediction penalties via dynamic control independence detection

Y Chou, J Fung, JP Shen - … of the 13th international conference on …, 1999 - dl.acm.org
This paper presents the concept of dynamic control independence (DCI) and shows how it
can be detected and exploited in an out-of-order superscalar processor to reduce the …

Effective cache prefetching on bus-based multiprocessors

DM Tullsen, SJ Eggers - ACM Transactions on Computer Systems …, 1995 - dl.acm.org
Compiler-directed cache prefetching has the potential to hide much of the high memory
latency seen by current and future high-performance processors. However, prefetching is …

An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors

EH Gornish, A Veidenbaum - International Journal of Parallel …, 1999 - Springer
Both hardware and software prefetching have been shown to be effective in tolerating the
large memory latencies inherent in shared-memory multiprocessors; however, both types of …

Sunder: A programmable hardware prefetch architecture for numerical loops

T Chiueh - Supercomputing'94: Proceedings of the 1994 ACM …, 1994 - ieeexplore.ieee.org
Beyond data caching, data prefetching is by far the most effective way to address the
memory access bottleneck associated with high-performance processors. This is particularly …

Verifying distributed directory-based cache coherence protocols: S3.mp, a case study

F Pong, A Nowatzyk, G Aybay, M Dubois - EURO-PAR'95 Parallel …, 1995 - Springer
This paper presents the results for the verification of the S3.mp cache coherence protocol.
The S3.mp protocol uses a distributed directory with limited number of pointers and …

[BOOK] Adaptive and integrated data cache prefetching for shared memory multiprocessors

EH Gornish - 1995 - search.proquest.com
Memory latency has always been a major issue in shared-memory multiprocessors and high-
speed systems. This is even more true as the gap between processor and memory speeds …