Tiny but mighty: designing and realizing scalable latency tolerance for manycore SoCs
Modern computing systems employ significant heterogeneity and specialization to meet
performance targets at manageable power. However, memory latency bottlenecks remain …
Performance improvement with circuit-level speculation
T Liu, SL Lu - Proceedings of the 33rd annual ACM/IEEE …, 2000 - dl.acm.org
Current superscalar microprocessors' performance depends on its frequency and the
number of useful instructions that can be processed per cycle (IPC). In this paper we …
Multithreading decoupled architectures for complexity-effective general purpose computing
M Sung, R Krashinsky, K Asanović - ACM SIGARCH Computer …, 2001 - dl.acm.org
Decoupled architectures have not traditionally been used in the context of general purpose
computing because of their inability to tolerate control-intensive code that exists across a …
Speculative Precomputation: Exploring the Use of Multithreading for Latency.
H Wang, PH Wang, RD Weldon… - Intel Technology …, 2002 - search.ebscohost.com
Speculative Precomputation (SP) is a technique to improve the latency of single-threaded
applications by utilizing idle multi-threading hardware resources to perform aggressive long …
Design and evaluation of a hierarchical decoupled architecture
The speed gap between processor and main memory is the major performance bottleneck of
modern computer systems. As a result, today's microprocessors suffer from frequent cache …
Microarchitectural miss/execute decoupling
The decoupled access/execute architecture described a machine that enables the access of
memory values to be decoupled from the consumption of those values. Although never …
Navigating Heterogeneity and Scalability in Modern Chip Design
M Orenes-Vera - 2024 - search.proquest.com
Computing systems have become ubiquitous in the modern world but their design is far from
one-size-fits-all. From battery-powered devices to supercomputers, deployment …
[BOOK] On the Realization of Fine Grained Multithreading in Software
A Grävinghoff - 2002 - Citeseer
This work deals with the design, implementation and evaluation of a multithreading system
that enables fine-grained context switches without hardware support. The current chapter …
[BOOK] Programming Model and Execution Model for OpenMP on the Cyclops-64 Manycore Processor
G Gan - 2010 - capsl.udel.edu
During the last ten years, multicore processors have matured from academic research
projects to real products in industry. They are now used across almost the entire spectrum …
Mini-graph processing
AW Bracy - 2008 - search.proquest.com
For years, single-thread performance was the most dominant force driving processor
development. In recent years, however, the poor scaling of single-thread super-scalar …