Data prefetch mechanisms

SP Vanderwiel, DJ Lilja - ACM Computing Surveys (CSUR), 2000 - dl.acm.org
The expanding gap between microprocessor and DRAM performance has necessitated the
use of increasingly aggressive techniques designed to reduce or hide the latency of main …

[BOOK] Parallel computer architecture: a hardware/software approach

D Culler, JP Singh, A Gupta - 1999 - books.google.com
The most exciting development in parallel computer architecture is the convergence of
traditionally disparate approaches on a common machine structure. This book explains the …

Value locality and load value prediction

MH Lipasti, CB Wilkerson, JP Shen - Proceedings of the seventh …, 1996 - dl.acm.org
Since the introduction of virtual memory demand-paging and cache memories, computer
systems have been exploiting spatial and temporal locality to reduce the average latency of …
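
The snippet is cut off before the mechanism, so the code below is not the paper's design. It is a generic last-value predictor of the kind the title refers to: each load remembers the last value it returned, and that value is used as a prediction only once a small saturating confidence counter says the load has been repeating itself. The table keyed by load PC, the 2-bit counters and the threshold of 2 are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Optional, Tuple


class LastValuePredictor:
    """Generic last-value predictor gated by saturating confidence counters.

    Assumptions (illustrative, not taken from the paper): an unbounded table
    keyed by load PC, 2-bit counters, and a prediction threshold of 2.
    """

    def __init__(self, threshold: int = 2, max_conf: int = 3) -> None:
        self.last_value: Dict[int, int] = {}                       # load PC -> last value it returned
        self.confidence: DefaultDict[int, int] = defaultdict(int)  # load PC -> counter in [0, max_conf]
        self.threshold = threshold
        self.max_conf = max_conf

    def predict(self, pc: int) -> Optional[int]:
        """Return the predicted value, or None when confidence is too low."""
        if pc in self.last_value and self.confidence[pc] >= self.threshold:
            return self.last_value[pc]
        return None

    def update(self, pc: int, actual: int) -> None:
        """Train on the value the load actually returned."""
        if self.last_value.get(pc) == actual:
            self.confidence[pc] = min(self.confidence[pc] + 1, self.max_conf)
        else:
            self.confidence[pc] = max(self.confidence[pc] - 1, 0)
        self.last_value[pc] = actual


def replay(trace: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Replay (pc, value) load events; return (predictions made, predictions correct)."""
    lvp = LastValuePredictor()
    made = correct = 0
    for pc, value in trace:
        guess = lvp.predict(pc)
        if guess is not None:
            made += 1
            if guess == value:
                correct += 1
        lvp.update(pc, value)
    return made, correct
```

A load that keeps returning the same value becomes predictable after a short warm-up: replay([(0x40, 7)] * 6) returns (3, 3). A load that alternates between two values never reaches the threshold, so replay([(0x44, i % 2) for i in range(6)]) returns (0, 0). The counter costs a few early correct guesses, and in exchange it avoids mispredicting loads that have no value locality.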

Exceeding the dataflow limit via value prediction

MH Lipasti, JP Shen - Proceedings of the 29th Annual IEEE …, 1996 - ieeexplore.ieee.org
For decades, the serialization constraints imposed by true data dependences have been
regarded as an absolute limit (the dataflow limit) on the parallel execution of serial programs …

Memory bandwidth limitations of future microprocessors

D Burger, JR Goodman, A Kägi - ACM SIGARCH Computer Architecture …, 1996 - dl.acm.org
This paper makes the case that pin bandwidth will be a critical consideration for future
microprocessors. We show that many of the techniques used to tolerate growing memory …

Managing wire delay in large chip-multiprocessor caches

BM Beckmann, DA Wood - 37th International Symposium on …, 2004 - ieeexplore.ieee.org
In response to increasing (relative) wire delay, architects have proposed various
technologies to manage the impact of slow wires on large uniprocessor L2 caches. Block …

Dependence based prefetching for linked data structures

A Roth, A Moshovos, GS Sohi - … of the eighth international conference on …, 1998 - dl.acm.org
We introduce a dynamic scheme that captures the access patterns of linked data structures
and can be used to predict future accesses with high accuracy. Our technique exploits the …
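
The snippet does not say how the access patterns are captured, so the following is only one plausible illustration under stated assumptions, not the authors' mechanism. The sketch records producer/consumer pairs of loads, where a load's returned value later serves as the base address of another load. When a known producer loads a new pointer, the sketch prefetches the fields its consumers will read. The trace format (pc, address, value), the window of 8 recent values, the 64-byte offset limit and treating 0 as a null pointer are all hypothetical choices. The sketch issues only the next-hop prefetch.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


class DependencePrefetcher:
    """Toy producer -> consumer load correlation for pointer-chasing code.

    Hypothetical parameters: the last `window` loaded values are treated as
    candidate pointers; a later load whose address lies within `max_offset`
    bytes above one of them is recorded as that pointer's consumer.
    """

    def __init__(self, window: int = 8, max_offset: int = 64) -> None:
        self.recent: Deque[Tuple[int, int]] = deque(maxlen=window)  # (producer pc, loaded value)
        self.offsets: Dict[int, Set[int]] = {}                       # producer pc -> consumer offsets
        self.max_offset = max_offset

    def observe(self, pc: int, addr: int, value: int) -> List[int]:
        """Process one executed load and return the addresses to prefetch."""
        # Learn: does this address look like (recent pointer value + small field offset)?
        for producer_pc, pointer in self.recent:
            offset = addr - pointer
            if 0 <= offset < self.max_offset:
                self.offsets.setdefault(producer_pc, set()).add(offset)
        # Predict: if this load has produced pointers before, its new value will
        # soon be dereferenced, so prefetch the learned fields (skip null pointers).
        prefetches: List[int] = []
        if value != 0:
            prefetches = [value + off for off in sorted(self.offsets.get(pc, set()))]
        self.recent.append((pc, value))
        return prefetches
```

Consider a list whose nodes sit at 1000, 2000, 3000 and 4000, with a data field at offset 0 read by pc 0x20 and a next field at offset 8 read by pc 0x24. Once the traversal has visited two nodes, the loads of node->next start triggering prefetches for both fields of the following node (3000 and 3008, then 4000 and 4008). Those prefetches are issued before the program itself touches the node.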

A data cache with multiple caching strategies tuned to different types of locality

A González, C Aliagas, M Valero - ACM International Conference on …, 1995 - dl.acm.org
Current data cache organizations fail to deliver high performance in scalar processors for
many vector applications. There are two main reasons for this loss of performance: the use …

When prefetching works, when it doesn't, and why

J Lee, H Kim, R Vuduc - ACM Transactions on Architecture and Code …, 2012 - dl.acm.org
In emerging and future high-end processor systems, tolerating increasing cache miss
latency and properly managing memory bandwidth will be critical to achieving high …

[BOOK] Vector microprocessors

K Asanovic - 1998 - search.proquest.com
Most previous research into vector architectures has concentrated on supercomputing
applications and small enhancements to existing vector supercomputer implementations …