Tensor contraction on distributed hybrid architectures using a task-based runtime system

3 results (0.05 sec)

[PDF] arxiv.org

A linear algebraic approach to model parallelism in deep learning

RJ Hewett, TJ Grady II - arXiv preprint arXiv:2006.03108, 2020 - arxiv.org

Training deep neural networks (DNNs) in large-cluster computing environments is
increasingly necessary, as networks grow in size and complexity. Local memory and …

Cited by 16 Related articles All 2 versions

[PDF] nsf.gov

Understanding recursive divide-and-conquer dynamic programs in fork-join and data-flow execution models

P Nookala, Z Ahmad, MM Javanmard… - 2021 IEEE …, 2021 - ieeexplore.ieee.org

On shared-memory multicore machines, classic two-way recursive divide-and-conquer
algorithms are implemented using common fork-join based parallel programming paradigms …

Cited by 5 Related articles All 3 versions

Extreme Fine-Grained Parallelism on Modern Many-Core Architectures

P Nookala - 2022 - search.proquest.com

Processors with 100s of threads of execution and GPUs with 1000s of cores are among the
state-of-the-art in high-end computing systems. This transition to many-core computing has …