SPIRAL: Extreme performance portability
In this paper, we address the question of how to automatically map computational kernels to
highly efficient code for a wide range of computing platforms and establish the correctness of …
Automated partitioning of a computation for parallel or other high capability architecture
TJ Biggerstaff - US Patent 8,060,857, 2011 - Google Patents
This invention relates to programming of computers with various kinds of facilities for parallel
or other high capability execution of computer programs, specifically to the automated …
Computer generation of hardware for linear digital signal processing transforms
Linear signal transforms such as the discrete Fourier transform (DFT) are very widely used in
digital signal processing and other domains. Due to high performance or efficiency …
Algebraic signal processing theory: Cooley–Tukey type algorithms for DCTs and DSTs
This paper presents a systematic methodology to derive and classify fast algorithms for
linear transforms. The approach is based on the algebraic signal processing theory. This …
Discrete Fourier transform on multicore
F Franchetti, M Puschel, Y Voronenko… - IEEE Signal …, 2009 - ieeexplore.ieee.org
This article gives an overview on the techniques needed to implement the discrete Fourier
transform (DFT) efficiently on current multicore systems. The focus is on Intel-compatible …
Spiral in Scala: towards the systematic construction of generators for performance libraries
Program generators for high performance libraries are an appealing solution to the recurring
problem of porting and optimizing code with every new processor generation, but only few …
DP-Fair: a unifying theory for optimal hard real-time multiprocessor scheduling
We consider the problem of optimal real-time scheduling of periodic and sporadic tasks on
identical multiprocessors. A number of recent papers have used the notions of fluid …
Operator language: A program generation framework for fast kernels
F Franchetti, F de Mesmay, D McFarlin… - IFIP Working Conference …, 2009 - Springer
We present the Operator Language (OL), a framework to automatically generate
fast numerical kernels. OL provides the structure to extend the program generation system …
Bandit-based optimization on graphs with application to library performance tuning
F De Mesmay, A Rimmel, Y Voronenko… - Proceedings of the 26th …, 2009 - dl.acm.org
The problem of choosing fast implementations for a class of recursive algorithms such as the
fast Fourier transforms can be formulated as an optimization problem over the language …
Shoal: Smart allocation and replication of memory for parallel programs
This paper is included in the Proceedings of the 2015 USENIX Annual Technical Conference (USENIX …