Monarch Mixer: A simple sub-quadratic GEMM-based architecture
Abstract: Machine learning models are increasingly being scaled in both sequence length
and model dimension to reach longer contexts and better performance. However, existing …
BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs
The end-to-end performance of deep learning model inference is often limited by excess
data movement on GPUs. To reduce data movement, existing deep learning frameworks …
Optimization of Tensor Computation in Deep Learning
Y Xu - 2024 - search.proquest.com
Optimization on tensor computations can significantly reduce inference and training time;
therefore, tensor computations represent the fundamental computational bottleneck in …
Taming TinyML: deep learning inference at computational extremes
E Liberis - 2023 - repository.cam.ac.uk
The advanced data modelling capabilities of neural networks allowed deep learning to
become a cornerstone of many applications of artificial intelligence (AI). AI can be brought …