Effective hardware-based data prefetching for high-performance processors
TF Chen, JL Baer - IEEE Transactions on Computers, 1995 - ieeexplore.ieee.org
Memory latency and bandwidth are progressing at a much slower pace than processor
performance. In this paper, we describe and evaluate the performance of three variations of …
A performance study of software and hardware data prefetching schemes
TF Chen, JL Baer - ACM SIGARCH Computer Architecture News, 1994 - dl.acm.org
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one
of several approaches for tolerating memory latencies. Prefetching can be either hardware …
Data prefetching and data forwarding in shared memory multiprocessors
DK Poulsen, PC Yew - 1994 International Conference on …, 1994 - ieeexplore.ieee.org
This paper studies and compares the use of data prefetching and an alternative mechanism,
data forwarding, for reducing memory latency due to interprocessor communication in cache …
Design of a parallel vector access unit for SDRAM memory systems
We are attacking the memory bottleneck by building a "smart" memory controller that
improves effective memory bandwidth, bus utilization, and cache efficiency by letting …
Reducing branch misprediction penalties via dynamic control independence detection
This paper presents the concept of dynamic control independence (DCI) and shows how it
can be detected and exploited in an out-of-order superscalar processor to reduce the …
Effective cache prefetching on bus-based multiprocessors
DM Tullsen, SJ Eggers - ACM Transactions on Computer Systems …, 1995 - dl.acm.org
Compiler-directed cache prefetching has the potential to hide much of the high memory
latency seen by current and future high-performance processors. However, prefetching is …
An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors
EH Gornish, A Veidenbaum - International Journal of Parallel …, 1999 - Springer
Both hardware and software prefetching have been shown to be effective in tolerating the
large memory latencies inherent in shared-memory multiprocessors; however, both types of …
Sunder: A programmable hardware prefetch architecture for numerical loops
T Chiueh - Supercomputing'94: Proceedings of the 1994 ACM …, 1994 - ieeexplore.ieee.org
Beyond data caching, data prefetching is by far the most effective way to address the
memory access bottleneck associated with high-performance processors. This is particularly …
Verifying distributed directory-based cache coherence protocols: S3.mp, a case study
F Pong, A Nowatzyk, G Aybay, M Dubois - EURO-PAR'95 Parallel …, 1995 - Springer
This paper presents the results for the verification of the S3.mp cache coherence protocol.
The S3.mp protocol uses a distributed directory with limited number of pointers and …
[Book] Adaptive and integrated data cache prefetching for shared memory multiprocessors
EH Gornish - 1995 - search.proquest.com
Memory latency has always been a major issue in shared-memory multiprocessors and
high-speed systems. This is even more true as the gap between processor and memory speeds …