LoRAPrune: Pruning meets low-rank parameter-efficient fine-tuning
Large pre-trained models (LPMs), such as LLaMA and GLM, have shown exceptional
performance across various tasks through fine-tuning. Although low-rank adaptation (LoRA) …
Pruning's effect on generalization through the lens of training and regularization
Practitioners frequently observe that pruning improves model generalization. A long-
standing hypothesis based on the bias-variance trade-off attributes this generalization …
Fast as CHITA: Neural network pruning with combinatorial optimization
The sheer size of modern neural networks makes model serving a serious computational
challenge. A popular class of compression techniques overcomes this challenge by pruning …
SInGE: Sparsity via integrated gradients estimation of neuron relevance
The leap in performance in state-of-the-art computer vision methods is attributed to the
development of deep neural networks. However, it often comes at a computational price …
FALCON: FLOP-Aware Combinatorial Optimization for Neural Network Pruning
The increasing computational demands of modern neural networks present deployment
challenges on resource-constrained devices. Network pruning offers a solution to reduce …
Register Tiling for Unstructured Sparsity in Neural Network Inference
Unstructured sparse neural networks are an important class of machine learning (ML)
models, as they compact model size and reduce floating-point operations. The execution …
UFKT: Unimportant filters knowledge transfer for CNN pruning
As deep learning models have been widely used in recent years, there is a high demand
for reducing the model size in terms of memory and computation without much compromise …
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
While excellent in transfer learning, Vision-Language models (VLMs) come with high
computational costs due to their large number of parameters. To address this issue …
UPSCALE: unconstrained channel pruning
As neural networks grow in size and complexity, inference speeds decline. To combat this,
one of the most effective compression techniques, channel pruning, removes channels from …
Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment
Structural model pruning is a prominent approach used for reducing the computational cost
of Convolutional Neural Networks (CNNs) before their deployment on resource-constrained …