Codesign tradeoffs for high-performance, low-power linear algebra architectures

A Pedram, RA Van De Geijn… - IEEE Transactions on …, 2012 - ieeexplore.ieee.org
As technology is reaching physical limits, reducing power consumption is a key issue on our
path to sustained performance. In this paper, we study fundamental tradeoffs and limits in …

Memory-size- and bandwidth-efficient method for feeding systolic array matrix multipliers

JZ Yinger, A Ling, T Czajkowski, D Capalija… - US Patent …, 2022 - Google Patents
Matrix multiplication systolic array feed methods and related processing element (PE)
microarchitectures for efficiently implementing systolic array generic matrix multiplier …

A high-performance, low-power linear algebra core

A Pedram, A Gerstlauer… - ASAP 2011-22nd IEEE …, 2011 - ieeexplore.ieee.org
Achieving high-performance while reducing power consumption is a key concern as
technology scaling is reaching its limits. It is well-accepted that application-specific custom …

Hyper-systolic parallel computing

T Lippert, A Seyfried, A Bode… - IEEE Transactions on …, 1998 - ieeexplore.ieee.org
We introduce a new class of parallel algorithms for the exact computation of systems with
pairwise mutual interactions of n elements, so-called n²-problems. Hitherto, practical …

On the efficiency of register file versus broadcast interconnect for collective communications in data-parallel hardware accelerators

A Pedram, A Gerstlauer… - 2012 IEEE 24th …, 2012 - ieeexplore.ieee.org
Reducing power consumption and increasing efficiency is a key concern for many
applications. How to design highly efficient computing elements while maintaining enough …

A systolic implementation of the MLEM reconstruction algorithm for positron emission tomography images

R Möller - Parallel Computing, 1999 - Elsevier
This work is part of a project concerned with parallel reconstruction of 3D images from the
data delivered by positron emission tomography. The intention of this investigation is the …

An energy-efficient custom architecture for the SKA1-Low central signal processor

L Fiorin, E Vermij, J Van Lunteren, R Jongerius… - Proceedings of the 12th …, 2015 - dl.acm.org
The Square Kilometre Array (SKA) will be the biggest radio telescope ever built, with
unprecedented sensitivity, angular resolution, and survey speed. This paper explores the …

Taxonomies in Operation, Design, and Meta-Design.

C Niederée, C Muscogiuri, ML Hemmje - WISE Workshops, 2002 - Citeseer
Taxonomies are a well-established instrument for organizing and accessing resources in
Information, Content and Knowledge Management (ICKM) systems. Furthermore, they …

Resolution-invariant image representation and its applications

J Wang, S Zhu, Y Gong - 2009 IEEE Conference on Computer …, 2009 - ieeexplore.ieee.org
We present a resolution-invariant image representation (RIIR) framework in this paper. The
RIIR framework includes the methods of building a set of multi-resolution bases from training …

FFT for the APE Parallel Computer

T Lippert, K Schilling, S Trentmann… - … Journal of Modern …, 1997 - World Scientific
We present a parallel FFT algorithm for SIMD systems following the "Transpose Algorithm"
approach. The method is based on the assignment of the data field onto a one-dimensional …