Monarch Mixer: A simple sub-quadratic GEMM-based architecture
Abstract: Machine learning models are increasingly being scaled in both sequence length
and model dimension to reach longer contexts and better performance. However, existing …
BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs
The end-to-end performance of deep learning model inference is often limited by excess
data movement on GPUs. To reduce data movement, existing deep learning frameworks …
Optimization of Tensor Computation in Deep Learning
Y Xu - 2024 - search.proquest.com
Optimization on tensor computations can significantly reduce inference and training time;
therefore, tensor computations represent the fundamental computational bottleneck in …
Taming TinyML: deep learning inference at computational extremes
E Liberis - 2023 - repository.cam.ac.uk
The advanced data modelling capabilities of neural networks allowed deep learning to
become a cornerstone of many applications of artificial intelligence (AI). AI can be brought …