Compiling for reconfigurable computing: A survey
Reconfigurable computing platforms offer the promise of substantially accelerating
computations through the concurrent nature of hardware structures and the ability of these …
Loop parallelization in the polytope model
C Lengauer - International Conference on Concurrency Theory, 1993 - Springer
During the course of the last decade, a mathematical model for the parallelization of FOR-
loops has become increasingly popular. In this model, a (perfect) nest of r FOR-loops is …
Supporting very large models using automatic dataflow graph partitioning
This paper presents Tofu, a system that partitions very large DNN models across multiple
GPU devices to reduce per-GPU memory footprint. Tofu is designed to partition a dataflow …
[BOOK][B] The compiler design handbook: optimizations and machine code generation
YN Srikant, P Shankar - 2002 - taylorfrancis.com
The widespread use of object-oriented languages and Internet security concerns are just the
beginning. Add embedded systems, multiple memory banks, highly pipelined units …
Compiling affine loop nests for distributed-memory parallel architectures
U Bondhugula - Proceedings of the International Conference on High …, 2013 - dl.acm.org
We present new techniques for compilation of arbitrarily nested loops with affine
dependences for distributed-memory parallel architectures. Our framework is implemented …
Unifying data and control transformations for distributed shared-memory machines
M Cierniak, W Li - ACM SIGPLAN Notices, 1995 - dl.acm.org
We present a unified approach to locality optimization that employs both data and control
transformations. Data transformations include changing the array layout in memory. Control …
ICFHR2014 competition on handwritten text recognition on transcriptorium datasets (HTRtS)
JA Sánchez, V Romero, AH Toselli… - 2014 14th International …, 2014 - ieeexplore.ieee.org
A contest on Handwritten Text Recognition organised in the context of the ICFHR 2014
conference is described. Two tracks with increased freedom on the use of training data were …
Automatic memory partitioning and scheduling for throughput and power optimization
Memory bottleneck has become a limiting factor in satisfying the explosive demands on
performance and cost in modern embedded system design. Selected computation kernels …
[BOOK][B] An optimizing Fortran D compiler for MIMD distributed-memory machines
CW Tseng - 1993 - search.proquest.com
Massively parallel MIMD distributed-memory machines can provide enormous
computational power; however, the difficulty of developing parallel programs for these …
[BOOK][B] Compiling for NUMA parallel machines
W Li - 1993 - search.proquest.com
A common feature of many scalable parallel machines is non-uniform memory access
(NUMA)--data access to local memory is much faster than to non-local memories. In …