Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

D Fu, S Arora, J Grogan, I Johnson… - Advances in …, 2024 - proceedings.neurips.cc
Machine learning models are increasingly being scaled in both sequence length
and model dimension to reach longer contexts and better performance. However, existing …
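
The snippet cuts off before the architecture itself is described. As a rough illustration of what a sub-quadratic, GEMM-based operator of the kind the title names can look like, the sketch below applies a Monarch-style block-diagonal-plus-permutation linear map in NumPy. The parameterization, the function name monarch_matvec, and the shapes are assumptions for illustration only, not the authors' implementation.

    # Sketch of a Monarch-style matrix-vector product, assuming n = m*m and
    # two sets of m diagonal blocks of size m x m separated by a permutation.
    import numpy as np

    def monarch_matvec(B1, B2, x):
        """Apply a block-diagonal / permute / block-diagonal map in ~2*n^1.5 multiply-adds.

        B1, B2: (m, m, m) arrays, i.e. m diagonal blocks of size m x m each.
        x:      (n,) input with n = m * m.
        """
        m = B1.shape[0]
        X = x.reshape(m, m)                  # split the length-n vector into m chunks
        X = np.einsum('bij,bj->bi', B1, X)   # first block-diagonal multiply (m small GEMV-like ops)
        X = X.T                              # permutation: mix information across chunks
        X = np.einsum('bij,bj->bi', B2, X)   # second block-diagonal multiply
        return X.T.reshape(-1)               # undo the permutation and flatten

    m = 32                                    # n = 1024
    rng = np.random.default_rng(0)
    B1, B2 = rng.standard_normal((2, m, m, m))
    x = rng.standard_normal(m * m)
    y = monarch_matvec(B1, B2, x)             # ~2*m^3 = 2*n^1.5 mult-adds vs n^2 for a dense matrix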

BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs

M Lakshminarasimhan, M Hall, S Williams… - Proceedings of the 53rd …, 2024 - dl.acm.org
The end-to-end performance of deep learning model inference is often limited by excess
data movement on GPUs. To reduce data movement, existing deep learning frameworks …
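
BrickDL's GPU kernels and APIs are not shown in the snippet; the NumPy sketch below only illustrates the generic idea behind fine-grained data blocking (tiling), where every block that is fetched into fast memory is reused many times before being evicted, instead of streaming full rows and columns repeatedly. The function name and tile size are illustrative assumptions, not part of BrickDL.

    # Generic tiled matrix multiply: each loaded (tile x tile) block takes part in a
    # tile x tile x tile block product, so every fetched element is reused ~tile times.
    import numpy as np

    def blocked_matmul(A, B, tile=64):
        n, k = A.shape
        k2, m = B.shape
        assert k == k2
        C = np.zeros((n, m), dtype=A.dtype)
        for i in range(0, n, tile):
            for j in range(0, m, tile):
                for p in range(0, k, tile):
                    C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
        return C

    A = np.random.rand(256, 256)
    B = np.random.rand(256, 256)
    assert np.allclose(blocked_matmul(A, B), A @ B)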

Optimization of Tensor Computation in Deep Learning

Y Xu - 2024 - search.proquest.com
Optimizing tensor computations can significantly reduce inference and training time,
since tensor computations represent the fundamental computational bottleneck in …
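
One concrete, widely used instance of such an optimization is choosing a better contraction order for a chained tensor product. The short NumPy example below is not drawn from the thesis; it only shows NumPy's einsum path optimizer reporting the FLOP reduction from reordering a matrix-matrix-vector chain.

    # Contraction-order optimization: A @ (B @ x) is far cheaper than (A @ B) @ x.
    import numpy as np

    A = np.random.rand(64, 512)
    B = np.random.rand(512, 512)
    x = np.random.rand(512)

    # Left-to-right evaluation forms the (64, 512) intermediate A @ B first (~16.8M mult-adds);
    # contracting B with x first keeps every intermediate a length-512 or length-64 vector (~0.3M).
    path, info = np.einsum_path('ij,jk,k->i', A, B, x, optimize='optimal')
    print(info)                                  # reports the speedup from reordering
    y = np.einsum('ij,jk,k->i', A, B, x, optimize=path)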

Taming TinyML: deep learning inference at computational extremes

E Liberis - 2023 - repository.cam.ac.uk
The advanced data modelling capabilities of neural networks allowed deep learning to
become a cornerstone of many applications of artificial intelligence (AI). AI can be brought …