A performance study of software and hardware data prefetching schemes

SP Vanderwiel, DJ Lilja - ACM Computing Surveys (CSUR), 2000 - dl.acm.org

The expanding gap between microprocessor and DRAM performance has necessitated the
use of increasingly aggressive techniques designed to reduce or hide the latency of main …

Cited by 469 · All 18 versions

[BOOK][B] Parallel computer architecture: a hardware/software approach

D Culler, JP Singh, A Gupta - 1999 - books.google.com

The most exciting development in parallel computer architecture is the convergence of
traditionally disparate approaches on a common machine structure. This book explains the …

Cited by 3011 · All 16 versions

Value locality and load value prediction

MH Lipasti, CB Wilkerson, JP Shen - Proceedings of the seventh …, 1996 - dl.acm.org

Since the introduction of virtual memory demand-paging and cache memories, computer
systems have been exploiting spatial and temporal locality to reduce the average latency of …

Cited by 970 · All 24 versions

Exceeding the dataflow limit via value prediction

MH Lipasti, JP Shen - Proceedings of the 29th Annual IEEE …, 1996 - ieeexplore.ieee.org

For decades, the serialization constraints imposed by true data dependences have been
regarded as an absolute limit (the dataflow limit) on the parallel execution of serial programs …

Cited by 752 · All 20 versions

Memory bandwidth limitations of future microprocessors

D Burger, JR Goodman, A Kägi - ACM SIGARCH Computer Architecture …, 1996 - dl.acm.org

This paper makes the case that pin bandwidth will be a critical consideration for future
microprocessors. We show that many of the techniques used to tolerate growing memory …

Cited by 647 · All 23 versions

Managing wire delay in large chip-multiprocessor caches

BM Beckmann, DA Wood - 37th International Symposium on …, 2004 - ieeexplore.ieee.org

In response to increasing (relative) wire delay, architects have proposed various
technologies to manage the impact of slow wires on large uniprocessor L2 caches. Block …

Cited by 502 · All 18 versions

Dependence based prefetching for linked data structures

A Roth, A Moshovos, GS Sohi - … of the eighth international conference on …, 1998 - dl.acm.org

We introduce a dynamic scheme that captures the access patterns of linked data structures
and can be used to predict future accesses with high accuracy. Our technique exploits the …

Cited by 429 · All 28 versions

A data cache with multiple caching strategies tuned to different types of locality

A González, C Aliagas, M Valero - ACM International Conference on …, 1995 - dl.acm.org

Current data cache organizations fail to deliver high performance in scalar processors for
many vector applications. There are two main reasons for this loss of performance: the use …

Cited by 428 · All 7 versions

When prefetching works, when it doesn't, and why

J Lee, H Kim, R Vuduc - ACM Transactions on Architecture and Code …, 2012 - dl.acm.org

In emerging and future high-end processor systems, tolerating increasing cache miss
latency and properly managing memory bandwidth will be critical to achieving high …

Cited by 208 · All 13 versions

[BOOK][B] Vector microprocessors

K Asanovic - 1998 - search.proquest.com

Most previous research into vector architectures has concentrated on supercomputing
applications and small enhancements to existing vector supercomputer implementations …

Cited by 253 · All 7 versions