A linear algebraic approach to model parallelism in deep learning

RJ Hewett, TJ Grady II - arXiv preprint arXiv:2006.03108, 2020 - arxiv.org
Training deep neural networks (DNNs) in large-cluster computing environments is
increasingly necessary, as networks grow in size and complexity. Local memory and …

Understanding recursive divide-and-conquer dynamic programs in fork-join and data-flow execution models

P Nookala, Z Ahmad, MM Javanmard… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
On shared-memory multicore machines, classic two-way recursive divide-and-conquer
algorithms are implemented using common fork-join based parallel programming paradigms …

Extreme Fine-Grained Parallelism on Modern Many-Core Architectures

P Nookala - 2022 - search.proquest.com
Processors with 100s of threads of execution and GPUs with 1000s of cores are among the
state-of-the-art in high-end computing systems. This transition to many-core computing has …