[BOOK][B] Advanced computer architecture: parallelism, scalability, programmability
Course Syllabus Course Title: Advanced Computer Architecture
Philadelphia University Faculty of Information Technology Department of Computer Science …
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication
We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near
communication-optimal for all combinations of matrix dimensions, processor counts, and …
A three-dimensional approach to parallel matrix multiplication
RC Agarwal, SM Balle, FG Gustavson… - IBM Journal of …, 1995 - ieeexplore.ieee.org
A three-dimensional (3D) matrix multiplication algorithm for massively parallel processing
systems is presented. The P processors are configured as a “virtual” processing cube with …
Evaluating HPC networks via simulation of parallel workloads
This paper presents an evaluation and comparison of three topologies that are popular for
building interconnection networks in large-scale supercomputers: torus, fat-tree, and …
Parallel Branch-and-Bound Algorithms
TG Crainic, B Le Cun… - Parallel combinatorial …, 2006 - Wiley Online Library
At the beginning of the twenty-first century, large unsolved combinatorial optimization
problems have been solved exactly. Two impressive cases should be mentioned. First are …
Tensor quantization: High-dimensional data compression
Quantization is an important technique to transform the input sample values from a large set
(or a continuous range) into the output sample values in a small set (or a finite set). It has …
Codesign tradeoffs for high-performance, low-power linear algebra architectures
A Pedram, RA Van De Geijn… - IEEE Transactions on …, 2012 - ieeexplore.ieee.org
As technology is reaching physical limits, reducing power consumption is a key issue on our
path to sustained performance. In this paper, we study fundamental tradeoffs and limits in …
Matrix multiplication on the OTIS-mesh optoelectronic computer
CF Wang, S Sahni - IEEE Transactions on Computers, 2001 - ieeexplore.ieee.org
We develop algorithms to multiply two vectors, a vector and a matrix, and two matrices on an
OTIS-Mesh optoelectronic computer. Two mappings, group row and group submesh, of a …
SRUMMA: a matrix multiplication algorithm suitable for clusters and scalable shared memory systems
M Krishnan, J Nieplocha - 18th International Parallel and …, 2004 - ieeexplore.ieee.org
Summary form only given. We describe a novel parallel algorithm that implements a dense
matrix multiplication operation with algorithmic efficiency equivalent to that of Cannon's …
Using simulation to design extreme-scale applications and architectures: programming model exploration
CL Janssen, H Adalsteinsson, JP Kenny - ACM SIGMETRICS …, 2011 - dl.acm.org
A key problem facing application developers is that they are expected to utilize extreme
levels of parallelism soon after delivery of future leadership class machines, but developing …