A linear algebraic approach to model parallelism in deep learning
RJ Hewett, TJ Grady II - arXiv preprint arXiv:2006.03108, 2020 - arxiv.org
Training deep neural networks (DNNs) in large-cluster computing environments is
increasingly necessary, as networks grow in size and complexity. Local memory and …
Understanding recursive divide-and-conquer dynamic programs in fork-join and data-flow execution models
On shared-memory multicore machines, classic two-way recursive divide-and-conquer
algorithms are implemented using common fork-join based parallel programming paradigms …
Extreme Fine-Grained Parallelism on Modern Many-Core Architectures
P Nookala - 2022 - search.proquest.com
Processors with 100s of threads of execution and GPUs with 1000s of cores are among the
state-of-the-art in high-end computing systems. This transition to many-core computing has …