Scaling laws for sparsely-connected foundation models
We explore the impact of parameter sparsity on the scaling behavior of Transformers trained
on massive datasets (i.e., "foundation models"), in both vision and language domains. In this …
MaskLLM: Learnable semi-structured sparsity for large language models
Large Language Models (LLMs) are distinguished by their massive parameter counts, which
typically result in significant redundancy. This work introduces MaskLLM, a learnable …
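As a rough illustration of what a "learnable" semi-structured mask can look like (this is a sketch of the general idea, not MaskLLM's actual implementation), one common trick is to treat every group of four weights as a categorical choice over the six valid 2:4 masks and relax that choice with a Gumbel-softmax so the selection becomes differentiable. The sketch below assumes PyTorch and 2:4 groups taken along the input dimension; all names are illustrative.

```python
import torch
import torch.nn.functional as F

# The 6 valid 2:4 masks for one group of 4 weights (choose 2 of 4 to keep).
CANDIDATES = torch.tensor([
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
    [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1],
], dtype=torch.float32)                                     # (6, 4)

def sample_learnable_mask(logits, tau=1.0):
    """Differentiably pick one 2:4 mask per group of 4 weights.

    logits: (num_groups, 6) learnable scores over the candidate masks.
    Returns a (num_groups, 4) mask; gradients flow into `logits`
    through the Gumbel-softmax relaxation (hard=True gives a discrete
    mask in the forward pass with a straight-through gradient).
    """
    probs = F.gumbel_softmax(logits, tau=tau, hard=True)    # (num_groups, 6)
    return probs @ CANDIDATES                                # (num_groups, 4)

# Usage: mask a weight matrix whose rows split into contiguous groups of 4.
weight = torch.randn(8, 16)
logits = torch.zeros(weight.numel() // 4, 6, requires_grad=True)
mask = sample_learnable_mask(logits).reshape(weight.shape)
sparse_weight = weight * mask    # exactly 2 nonzeros per group of 4
```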
Lookahead: An inference acceleration framework for large language model with lossless generation accuracy
As Large Language Models (LLMs) have made significant advancements across various
tasks, such as question answering, translation, text summarization, and dialogue systems …
Effective Interplay between Sparsity and Quantization: From Theory to Practice
SB Harma, A Chakraborty, E Kostenok… - arXiv preprint arXiv …, 2024 - arxiv.org
The increasing size of deep neural networks necessitates effective model compression to
improve computational efficiency and reduce their memory footprint. Sparsity and …
ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration
N:M sparsity is an emerging model compression method supported by more and more
accelerators to speed up sparse matrix multiplication in deep neural networks. Most existing …
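For context, the N:M pattern referenced by several of these papers keeps at most N nonzero weights in every group of M consecutive weights. Below is a minimal one-shot magnitude-pruning sketch in NumPy; it only illustrates the pattern and does not reproduce ELSA's layer-wise sparsity selection.

```python
import numpy as np

def nm_prune(weight, n=2, m=4):
    """Magnitude-based N:M pruning: in every group of `m` consecutive
    weights along the last axis, keep the `n` largest-magnitude entries
    and zero out the rest."""
    out_dim, in_dim = weight.shape
    assert in_dim % m == 0
    groups = weight.reshape(out_dim, in_dim // m, m)
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=-1)
    return (groups * mask).reshape(out_dim, in_dim)

w = np.random.randn(4, 8)
w_sparse = nm_prune(w)   # exactly 2 nonzeros in every group of 4
```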
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
M Mozaffari, A Yazdanbakhsh, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose SLoPe, a Double-Pruned Sparse Plus Lazy Low-rank Adapter Pretraining
method for LLMs that improves the accuracy of sparse LLMs while accelerating their …
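As a rough sketch of the sparse-plus-low-rank idea behind this line of work (hypothetical shapes and names; this is not SLoPe's actual double-pruning or deferred-adapter schedule), the forward pass of a linear layer combines a masked weight with a low-rank correction:

```python
import torch

def sparse_plus_lowrank(x, W, mask, A, B):
    """Linear layer approximated as a pruned weight plus a low-rank
    correction: y = (W * mask) x + B (A x)."""
    return x @ (W * mask).T + (x @ A.T) @ B.T

d_in, d_out, rank = 512, 512, 16
W = torch.randn(d_out, d_in)
mask = (torch.rand_like(W) > 0.5).float()   # stand-in for a real 2:4 mask
A = torch.randn(rank, d_in) * 0.01          # low-rank adapter factors
B = torch.zeros(d_out, rank)                # "lazy": correction starts at zero
x = torch.randn(8, d_in)
y = sparse_plus_lowrank(x, W, mask, A, B)   # (8, d_out)
```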
Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
AR Bambhaniya, A Yazdanbakhsh… - arXiv preprint arXiv …, 2024 - arxiv.org
N:M structured sparsity has garnered significant interest as a result of relatively modest
overhead and improved efficiency. Additionally, this form of sparsity holds considerable …
Beyond 2:4: exploring V:N:M sparsity for efficient transformer inference on GPUs
To date, 2:4 sparsity has stood as the only sparse pattern that can be accelerated using
sparse tensor cores on GPUs. In practice, 2:4 sparsity often yields low actual speedups …
Complementary Sparsity: Accelerating Sparse CNNs with High Accuracy on General-Purpose Computing Platforms
Model sparsity is a promising approach to reducing parameters or FLOPs of convolutional
neural networks (CNNs). Compared to unstructured or coarse-grained structured sparsity …
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
Y Hu, J Zhu, J Chen - arXiv preprint arXiv:2409.09099, 2024 - arxiv.org
Training deep neural networks (DNNs) is costly. Fortunately, NVIDIA Ampere and Hopper
GPUs can run matrix multiplications up to twice as fast as their dense equivalents by …
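For readers who want to try the hardware path these 2:4 papers target, the sketch below assumes a recent PyTorch build that ships the prototype semi-structured sparse API (torch.sparse.to_sparse_semi_structured) and an Ampere-or-newer GPU; it only demonstrates the inference-side mechanism and is unrelated to S-STE's pruning function itself.

```python
import torch
from torch.sparse import to_sparse_semi_structured

# Build a weight matrix that already satisfies the 2:4 constraint
# (here the trivial mask [1, 1, 0, 0] repeated across every row).
mask = torch.tensor([1, 1, 0, 0], dtype=torch.float16, device="cuda").repeat(4096, 1024)
W = torch.randn(4096, 4096, dtype=torch.float16, device="cuda") * mask
x = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")

W_sparse = to_sparse_semi_structured(W)   # compressed 2:4 representation

dense_out = torch.mm(W, x)
sparse_out = torch.mm(W_sparse, x)        # dispatched to sparse tensor cores

# Loose tolerances: fp16 accumulation differs slightly between kernels.
torch.testing.assert_close(dense_out, sparse_out, rtol=1e-2, atol=1e-2)
```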