Compile-time techniques for data distribution in distributed memory machines

JMP Cardoso, PC Diniz, M Weinhardt - ACM Computing Surveys (CSUR …, 2010 - dl.acm.org

Reconfigurable computing platforms offer the promise of substantially accelerating
computations through the concurrent nature of hardware structures and the ability of these …

Cited by 139 - Related articles - All 9 versions

Loop parallelization in the polytope model

C Lengauer - International Conference on Concurrency Theory, 1993 - Springer

During the course of the last decade, a mathematical model for the parallelization of FOR-
loops has become increasingly popular. In this model, a (perfect) nest of r FOR-loops is …

Cited by 329 - Related articles - All 11 versions

Supporting very large models using automatic dataflow graph partitioning

M Wang, C Huang, J Li - … of the Fourteenth EuroSys Conference 2019, 2019 - dl.acm.org

This paper presents Tofu, a system that partitions very large DNN models across multiple
GPU devices to reduce per-GPU memory footprint. Tofu is designed to partition a dataflow …

Cited by 189 - Related articles - All 9 versions

[BOOK][B] The compiler design handbook: optimizations and machine code generation

YN Srikant, P Shankar - 2002 - taylorfrancis.com

The widespread use of object-oriented languages and Internet security concerns are just the
beginning. Add embedded systems, multiple memory banks, highly pipelined units …

Cited by 217 - Related articles - All 9 versions

Compiling affine loop nests for distributed-memory parallel architectures

U Bondhugula - Proceedings of the International Conference on High …, 2013 - dl.acm.org

We present new techniques for compilation of arbitrarily nested loops with affine
dependences for distributed-memory parallel architectures. Our framework is implemented …

Cited by 122 - Related articles - All 10 versions

Unifying data and control transformations for distributed shared-memory machines

M Cierniak, W Li - ACM SIGPLAN Notices, 1995 - dl.acm.org

We present a unified approach to locality optimization that employs both data and control
transformations. Data transformations include changing the array layout in memory. Control …

Cited by 314 - Related articles - All 8 versions

ICFHR2014 competition on handwritten text recognition on transcriptorium datasets (HTRtS)

JA Sánchez, V Romero, AH Toselli… - 2014 14th International …, 2014 - ieeexplore.ieee.org

A contest on Handwritten Text Recognition organised in the context of the ICFHR 2014
conference is described. Two tracks with increased freedom on the use of training data were …

Cited by 109 - Related articles - All 4 versions

Automatic memory partitioning and scheduling for throughput and power optimization

J Cong, W Jiang, B Liu, Y Zou - ACM Transactions on Design Automation …, 2011 - dl.acm.org

Memory bottleneck has become a limiting factor in satisfying the explosive demands on
performance and cost in modern embedded system design. Selected computation kernels …

Cited by 144 - Related articles - All 12 versions

[BOOK][B] An optimizing Fortran D compiler for MIMD distributed-memory machines

CW Tseng - 1993 - search.proquest.com

Massively parallel MIMD distributed-memory machines can provide enormous
computational power; however, the difficulty of developing parallel programs for these …

Cited by 303 - Related articles - All 11 versions

[BOOK][B] Compiling for NUMA parallel machines

W Li - 1993 - search.proquest.com

A common feature of many scalable parallel machines is non-uniform memory access
(NUMA)--data access to local memory is much faster than to non-local memories. In …

Cited by 164 - Related articles - All 5 versions