Compiling for reconfigurable computing: A survey

JMP Cardoso, PC Diniz, M Weinhardt - ACM Computing Surveys (CSUR …, 2010 - dl.acm.org
Reconfigurable computing platforms offer the promise of substantially accelerating
computations through the concurrent nature of hardware structures and the ability of these …

Loop parallelization in the polytope model

C Lengauer - International Conference on Concurrency Theory, 1993 - Springer
During the course of the last decade, a mathematical model for the parallelization of FOR-loops has become increasingly popular. In this model, a (perfect) nest of r FOR-loops is …
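In the polytope model, the iterations of a perfect loop nest are treated as integer points in a polyhedron, and parallelism is exposed by an affine schedule that maps each point to a time step. The sketch below is a generic illustration with r = 2, not Lengauer's presentation. The loop body, bounds `N` and `M`, and the dependence vectors are hypothetical. It checks that the wavefront schedule θ(i, j) = i + j respects every dependence and then groups iterations into fronts whose points can run in parallel.

```python
from collections import defaultdict

# Hypothetical 2-deep perfect loop nest:
#   for i in 0..N-1:
#     for j in 0..M-1:
#       A[i][j] = A[i-1][j] + A[i][j-1]
# Its iteration domain is the integer points (i, j) with 0 <= i < N, 0 <= j < M.
N, M = 4, 3

# Distance vectors of the flow dependences in the loop body above.
DEPENDENCES = [(1, 0), (0, 1)]

def schedule(i, j):
    """Linear (wavefront) schedule: iteration (i, j) executes at time i + j."""
    return i + j

def schedule_is_legal(deps):
    """A linear schedule theta is legal if theta(d) >= 1 for every dependence d,
    because theta(p + d) - theta(p) = theta(d) for a linear theta."""
    return all(schedule(di, dj) >= 1 for di, dj in deps)

def wavefronts(n, m):
    """Group iteration points by time step; points in one step are independent."""
    fronts = defaultdict(list)
    for i in range(n):
        for j in range(m):
            fronts[schedule(i, j)].append((i, j))
    return [fronts[t] for t in sorted(fronts)]

if __name__ == "__main__":
    assert schedule_is_legal(DEPENDENCES)
    for t, front in enumerate(wavefronts(N, M)):
        print(t, front)  # each line lists iterations that may run concurrently
```

The sequential nest needs N·M steps, while the wavefront schedule needs only N + M − 1 time steps, each containing mutually independent iterations.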

Supporting very large models using automatic dataflow graph partitioning

M Wang, C Huang, J Li - … of the Fourteenth EuroSys Conference 2019, 2019 - dl.acm.org
This paper presents Tofu, a system that partitions very large DNN models across multiple
GPU devices to reduce per-GPU memory footprint. Tofu is designed to partition a dataflow …
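One simple way to reduce per-device memory for a single operator is to shard one of its tensors across devices. The sketch below is a hypothetical illustration of that idea for one matrix multiply. It is not Tofu's partitioning algorithm, which the snippet says operates on a whole dataflow graph. Each simulated device holds only one column shard of `b`, and concatenating the per-device outputs reproduces the unpartitioned result. The input `a` is assumed to be replicated on every device.

```python
def matmul(a, b):
    """Plain matrix product of nested lists a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def split_columns(b, parts):
    """Split b's columns into contiguous shards, one per hypothetical device."""
    m = len(b[0])
    size = -(-m // parts)  # ceiling division: columns per shard
    return [[row[s:s + size] for row in b] for s in range(0, m, size)]

def partitioned_matmul(a, b, parts):
    """Each device multiplies the replicated `a` by its own column shard of b;
    concatenating the per-device outputs row by row yields a @ b."""
    outs = [matmul(a, shard) for shard in split_columns(b, parts)]
    return [sum((out[i] for out in outs), []) for i in range(len(a))]
```

With `parts` devices, each one stores roughly 1/`parts` of `b`'s elements instead of all of them, which is the memory-footprint effect the entry describes, shown here for a single operator only.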

[Book] The compiler design handbook: optimizations and machine code generation

YN Srikant, P Shankar - 2002 - taylorfrancis.com
The widespread use of object-oriented languages and Internet security concerns are just the
beginning. Add embedded systems, multiple memory banks, highly pipelined units …

Compiling affine loop nests for distributed-memory parallel architectures

U Bondhugula - Proceedings of the International Conference on High …, 2013 - dl.acm.org
We present new techniques for compilation of arbitrarily nested loops with affine
dependences for distributed-memory parallel architectures. Our framework is implemented …

Unifying data and control transformations for distributed shared-memory machines

M Cierniak, W Li - ACM SIGPLAN Notices, 1995 - dl.acm.org
We present a unified approach to locality optimization that employs both data and control
transformations. Data transformations include changing the array layout in memory. Control …
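The snippet distinguishes data transformations (changing array layout) from control transformations (restructuring loops). The sketch below shows why either can fix the same locality problem. It is a hypothetical example, not Cierniak and Li's unified algorithm. A nest that walks a row-major array column by column has non-unit stride. Switching the array to column-major (a data transformation) or interchanging the loops (a control transformation) both restore unit-stride access. The array shape `R`×`C` is an assumption.

```python
# Hypothetical 2-D array A with R rows and C columns.
R, C = 4, 5

def row_major(i, j):
    """Offset of A[i][j] when rows are contiguous (C order)."""
    return i * C + j

def col_major(i, j):
    """Offset of A[i][j] when columns are contiguous (Fortran order)."""
    return j * R + i

def access_trace(layout, inner_is_i):
    """Memory offsets touched by a 2-deep nest that reads every A[i][j].

    inner_is_i=True models `for j: for i:` (i varies fastest);
    inner_is_i=False models `for i: for j:` (j varies fastest).
    """
    if inner_is_i:
        return [layout(i, j) for j in range(C) for i in range(R)]
    return [layout(i, j) for i in range(R) for j in range(C)]

def is_unit_stride(trace):
    """True if consecutive accesses touch consecutive addresses."""
    return all(b - a == 1 for a, b in zip(trace, trace[1:]))
```

Here `access_trace(row_major, True)` is the poorly behaved case, while `access_trace(col_major, True)` (layout change) and `access_trace(row_major, False)` (loop interchange) are both unit-stride.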

ICFHR2014 competition on handwritten text recognition on transcriptorium datasets (HTRtS)

JA Sánchez, V Romero, AH Toselli… - 2014 14th International …, 2014 - ieeexplore.ieee.org
A contest on Handwritten Text Recognition organised in the context of the ICFHR 2014
conference is described. Two tracks with increased freedom on the use of training data were …

Automatic memory partitioning and scheduling for throughput and power optimization

J Cong, W Jiang, B Liu, Y Zou - ACM Transactions on Design Automation …, 2011 - dl.acm.org
Memory bottleneck has become a limiting factor in satisfying the explosive demands on
performance and cost in modern embedded system design. Selected computation kernels …
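Memory partitioning splits one array into several banks so that a pipelined loop can issue multiple accesses per cycle. The sketch below uses plain cyclic partitioning (bank = index mod N), a common textbook scheme. It is not necessarily the method of Cong et al. The kernel and its access offsets are hypothetical, and each bank is assumed to have a single port. The code checks whether the accesses of each iteration hit distinct banks and finds the smallest conflict-free bank count.

```python
def cyclic_bank(index, num_banks):
    """Cyclic partitioning: element `index` goes to bank index mod num_banks."""
    return index % num_banks

def cyclic_offset(index, num_banks):
    """Position of element `index` inside its bank under cyclic partitioning."""
    return index // num_banks

def conflict_free(offsets, num_banks, trip_count):
    """Check that, in every iteration i, the accesses A[i + o] for o in
    `offsets` land in pairwise distinct banks (one port per bank assumed)."""
    for i in range(trip_count):
        banks = [cyclic_bank(i + o, num_banks) for o in offsets]
        if len(set(banks)) != len(banks):
            return False
    return True

def min_banks(offsets, trip_count):
    """Smallest cyclic bank count that makes the access pattern conflict-free.
    For distinct offsets, span = max - min + 1 banks always suffices."""
    span = max(offsets) - min(offsets) + 1
    for n in range(1, span + 1):
        if conflict_free(offsets, n, trip_count):
            return n
    return span

# Hypothetical kernel: out[i] = A[i] + A[i+1] + A[i+2]
ACCESS_OFFSETS = [0, 1, 2]
```

For this kernel, `min_banks(ACCESS_OFFSETS, 16)` is 3, so three reads per iteration can proceed in parallel. Two banks would force at least two of them to share a port.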

[Book] An optimizing Fortran D compiler for MIMD distributed-memory machines

CW Tseng - 1993 - search.proquest.com
Massively parallel MIMD distributed-memory machines can provide enormous
computational power; however, the difficulty of developing parallel programs for these …

[Book] Compiling for NUMA parallel machines

W Li - 1993 - search.proquest.com
A common feature of many scalable parallel machines is non-uniform memory access
(NUMA)--data access to local memory is much faster than to non-local memories. In …
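Because local accesses are much cheaper than non-local ones, compilers for NUMA and distributed-memory machines usually fix which processor owns each array element and then assign work so most accesses stay local. The sketch below counts non-local reads for a hypothetical 3-point stencil under an owner-computes assignment. It compares a BLOCK and a CYCLIC distribution of a 1-D array. It only illustrates the trade-off and is not the compilation scheme of the thesis above.

```python
def block_owner(index, n, p):
    """BLOCK distribution: n elements split into p contiguous chunks of
    ceil(n / p) elements; returns the processor that owns `index`."""
    block = -(-n // p)  # ceiling division
    return index // block

def cyclic_owner(index, n, p):
    """CYCLIC distribution: element `index` is owned by processor index mod p."""
    return index % p

def remote_reads(n, p, owner):
    """Count non-local reads of A when each processor executes the iterations
    it owns (owner-computes) of the hypothetical stencil
    B[i] = A[i-1] + A[i] + A[i+1], for 1 <= i < n - 1."""
    remote = 0
    for i in range(1, n - 1):
        me = owner(i, n, p)
        for k in (i - 1, i, i + 1):
            if owner(k, n, p) != me:
                remote += 1
    return remote
```

For n = 16 elements on p = 4 processors, BLOCK yields 6 non-local reads, which occur only at block boundaries. CYCLIC yields 28, because every neighbor lives on another processor. This illustrates why data placement matters so much on NUMA machines.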