Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

J Ragan-Kelley, C Barnes, A Adams, S Paris… - ACM SIGPLAN …, 2013 - dl.acm.org
Image processing pipelines combine the challenges of stencil computations and stream
programs. They are composed of large graphs of different stencil stages, as well as complex …

FlashMeta: A framework for inductive program synthesis

O Polozov, S Gulwani - Proceedings of the 2015 ACM SIGPLAN …, 2015 - dl.acm.org
Inductive synthesis, or programming-by-examples (PBE), is gaining prominence with
disruptive applications for automating repetitive tasks in end-user programming. However …

The design and implementation of FFTW3

M Frigo, SG Johnson - Proceedings of the IEEE, 2005 - ieeexplore.ieee.org
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the
hardware in order to maximize performance. This paper shows that such an approach can …

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

VW Lee, C Kim, J Chhugani, M Deisher, D Kim… - Proceedings of the 37th …, 2010 - dl.acm.org
Recent advances in computing have led to an explosion in the amount of data being
generated. Processing the ever-growing data in a timely manner has made throughput …

Auto-tuning a high-level language targeted to GPU codes

S Grauer-Gray, L Xu, R Searles… - 2012 Innovative …, 2012 - ieeexplore.ieee.org
Determining the best set of optimizations to apply to a kernel to be executed on the graphics
processing unit (GPU) is a challenging problem. There are large sets of possible …

BLIS: A framework for rapidly instantiating BLAS functionality

FG Van Zee, RA Van De Geijn - ACM Transactions on Mathematical …, 2015 - dl.acm.org
The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for
rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental …

Precimonious: Tuning assistant for floating-point precision

C Rubio-González, C Nguyen, HD Nguyen… - Proceedings of the …, 2013 - dl.acm.org
Given the variety of numerical errors that can occur, floating-point programs are difficult to
write, test and debug. One common practice employed by developers without an advanced …

Modern development methods and tools for embedded reconfigurable systems: A survey

L Jóźwiak, N Nedjah, M Figueroa - Integration, 2010 - Elsevier
Heterogeneous reconfigurable systems provide drastically higher performance and lower
power consumption than traditional CPU-centric systems. Moreover, they do it at much lower …

Memory coherence in shared virtual memory systems

K Li, P Hudak - ACM Transactions on Computer Systems (TOCS), 1989 - dl.acm.org
The memory coherence problem in designing and implementing a shared virtual memory on
loosely coupled multiprocessors is studied in depth. Two classes of algorithms, centralized …

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

CK Luk, S Hong, H Kim - Proceedings of the 42nd Annual IEEE/ACM …, 2009 - dl.acm.org
Heterogeneous multiprocessors are increasingly important in the multi-core era due to their
potential for high performance and energy efficiency. In order for software to fully realize this …