SPIRAL: Extreme performance portability

F Franchetti, TM Low, DT Popovici… - Proceedings of the …, 2018 - ieeexplore.ieee.org
In this paper, we address the question of how to automatically map computational kernels to
highly efficient code for a wide range of computing platforms and establish the correctness of …

Automated partitioning of a computation for parallel or other high capability architecture

TJ Biggerstaff - US Patent 8,060,857, 2011 - Google Patents
This invention relates to programming of computers with various kinds of facilities for parallel
or other high capability execution of computer programs, specifically to the automated …

Computer generation of hardware for linear digital signal processing transforms

P Milder, F Franchetti, JC Hoe, M Püschel - ACM Transactions on Design …, 2012 - dl.acm.org
Linear signal transforms such as the discrete Fourier transform (DFT) are very widely used in
digital signal processing and other domains. Due to high performance or efficiency …

Algebraic signal processing theory: Cooley–Tukey type algorithms for DCTs and DSTs

M Püschel, JMF Moura - IEEE Transactions on Signal …, 2008 - ieeexplore.ieee.org
This paper presents a systematic methodology to derive and classify fast algorithms for
linear transforms. The approach is based on the algebraic signal processing theory. This …

Discrete Fourier transform on multicore

F Franchetti, M Püschel, Y Voronenko… - IEEE Signal …, 2009 - ieeexplore.ieee.org
This article gives an overview on the techniques needed to implement the discrete Fourier
transform (DFT) efficiently on current multicore systems. The focus is on Intel-compatible …

Spiral in Scala: towards the systematic construction of generators for performance libraries

G Ofenbeck, T Rompf, A Stojanov, M Odersky… - Proceedings of the 12th …, 2013 - dl.acm.org
Program generators for high performance libraries are an appealing solution to the recurring
problem of porting and optimizing code with every new processor generation, but only few …

DP-Fair: a unifying theory for optimal hard real-time multiprocessor scheduling

S Funk, G Levin, C Sadowski, I Pye, S Brandt - Real-Time Systems, 2011 - Springer
We consider the problem of optimal real-time scheduling of periodic and sporadic tasks on
identical multiprocessors. A number of recent papers have used the notions of fluid …

Operator language: A program generation framework for fast kernels

F Franchetti, F de Mesmay, D McFarlin… - IFIP Working Conference …, 2009 - Springer
We present the Operator Language (OL), a framework to automatically generate
fast numerical kernels. OL provides the structure to extend the program generation system …

Bandit-based optimization on graphs with application to library performance tuning

F de Mesmay, A Rimmel, Y Voronenko… - Proceedings of the 26th …, 2009 - dl.acm.org
The problem of choosing fast implementations for a class of recursive algorithms such as the
fast Fourier transforms can be formulated as an optimization problem over the language …

Shoal: Smart allocation and replication of memory for parallel programs

S Kaestle, R Achermann, T Roscoe… - 2015 USENIX Annual …, 2015 - usenix.org
This paper is included in the Proceedings of the 2015 USENIX Annual Technical
Conference (USENIX …