Argo: A real-time network-on-chip architecture with an efficient GALS implementation
In this paper, we present an area-efficient, globally asynchronous, locally synchronous
network-on-chip (NoC) architecture for a hard real-time multiprocessor platform. The NoC …
Towards ultra-high-speed cryogenic single-flux-quantum computing
CMOS microprocessors are limited in their capacity for clock speed improvement because of
increasing computing power, i.e., they face a power-wall problem. Single-flux-quantum (SFQ) …
Chip multiprocessing and the cell broadband engine
M Gschwind - Proceedings of the 3rd conference on Computing …, 2006 - dl.acm.org
Chip multiprocessing has become an exciting new direction for system designers to deliver
increased performance by exploiting CMOS scaling. We discuss key design decisions facing …
[BOOK][B] Embedded DSP processor design: Application specific instruction set processors
D Liu - 2008 - books.google.com
This book provides design methods for Digital Signal Processors and Application Specific
Instruction set Processors, based on the author's extensive, industrial design experience …
The Cell Broadband Engine: exploiting multiple levels of parallelism in a chip multiprocessor
M Gschwind - International journal of parallel programming, 2007 - Springer
As CMOS feature sizes continue to shrink and traditional microarchitectural methods for
delivering high performance (e.g., deep pipelining) become too expensive and power …
A programmable 512 GOPS stream processor for signal, image, and video processing
BK Khailany, T Williams, J Lin, EP Long… - IEEE Journal of solid …, 2008 - ieeexplore.ieee.org
A 34-million transistor stream processor system-on-chip (SoC) for signal, image, and video
processing contains 80 parallel integer ALUs organized into 16 data-parallel lanes with a 5 …
Introduction to the cell broadband engine architecture
CR Johns, DA Brokenshire - IBM Journal of Research and …, 2007 - ieeexplore.ieee.org
This paper provides an overview of the Cell Broadband Engine™ Architecture (CBEA). The
CBEA defines a revolutionary extension to a more conventional processor organization and …
μManycore: A Cloud-Native CPU for Tail at Scale
Microservices are emerging as a popular cloud-computing paradigm. Microservice
environments execute typically-short service requests that interact with one another via …
Optimizing matrix multiplication for a short-vector SIMD architecture–CELL processor
Matrix multiplication is one of the most common numerical operations, especially in the area
of dense linear algebra, where it forms the core of many important algorithms, including …
Multi-core acceleration of chemical kinetics for simulation and prediction
JC Linford, J Michalakes, M Vachharajani… - Proceedings of the …, 2009 - dl.acm.org
This work implements a computationally expensive chemical kinetics kernel from a large-
scale community atmospheric model on three multi-core platforms: NVIDIA GPUs using …