Data prefetch mechanisms
SP Vanderwiel, DJ Lilja - ACM Computing Surveys (CSUR), 2000 - dl.acm.org
The expanding gap between microprocessor and DRAM performance has necessitated the
use of increasingly aggressive techniques designed to reduce or hide the latency of main …
[BOOK][B] Parallel computer architecture: a hardware/software approach
The most exciting development in parallel computer architecture is the convergence of
traditionally disparate approaches on a common machine structure. This book explains the …
Value locality and load value prediction
Since the introduction of virtual memory demand-paging and cache memories, computer
systems have been exploiting spatial and temporal locality to reduce the average latency of …
Exceeding the dataflow limit via value prediction
MH Lipasti, JP Shen - Proceedings of the 29th Annual IEEE …, 1996 - ieeexplore.ieee.org
For decades, the serialization constraints imposed by true data dependences have been
regarded as an absolute limit (the dataflow limit) on the parallel execution of serial programs …
Memory bandwidth limitations of future microprocessors
D Burger, JR Goodman, A Kägi - ACM SIGARCH Computer Architecture …, 1996 - dl.acm.org
This paper makes the case that pin bandwidth will be a critical consideration for future
microprocessors. We show that many of the techniques used to tolerate growing memory …
Managing wire delay in large chip-multiprocessor caches
BM Beckmann, DA Wood - 37th International Symposium on …, 2004 - ieeexplore.ieee.org
In response to increasing (relative) wire delay, architects have proposed various
technologies to manage the impact of slow wires on large uniprocessor L2 caches. Block …
Dependence based prefetching for linked data structures
A Roth, A Moshovos, GS Sohi - … of the eighth international conference on …, 1998 - dl.acm.org
We introduce a dynamic scheme that captures the access patterns of linked data structures
and can be used to predict future accesses with high accuracy. Our technique exploits the …
A data cache with multiple caching strategies tuned to different types of locality
Current data cache organizations fail to deliver high performance in scalar processors for
many vector applications. There are two main reasons for this loss of performance: the use …
When prefetching works, when it doesn't, and why
In emerging and future high-end processor systems, tolerating increasing cache miss
latency and properly managing memory bandwidth will be critical to achieving high …
[BOOK][B] Vector microprocessors
K Asanovic - 1998 - search.proquest.com
Most previous research into vector architectures has concentrated on supercomputing
applications and small enhancements to existing vector supercomputer implementations …