A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

A survey on LoRA of large language models

Y Mao, Y Ge, Y Fan, W Xu, Y Mi, Z Hu… - Frontiers of Computer …, 2025 - Springer
Low-Rank Adaptation (LoRA), which updates the dense neural network layers with
pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning …

Assessing the brittleness of safety alignment via pruning and low-rank modifications

B Wei, K Huang, Y Huang, T Xie, X Qi, M Xia… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) show inherent brittleness in their safety mechanisms, as
evidenced by their susceptibility to jailbreaking and even non-malicious fine-tuning. This …

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

Survey of different large language model architectures: Trends, benchmarks, and challenges

M Shao, A Basit, R Karri, M Shafique - IEEE Access, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) represent a class of deep learning models adept at
understanding natural language and generating coherent text in response to prompts or …

The efficiency spectrum of large language models: An algorithmic survey

T Ding, T Chen, H Zhu, J Jiang, Y Zhong… - arXiv preprint arXiv …, 2023 - researchgate.net
The rapid growth of Large Language Models (LLMs) has been a driving force in
transforming various domains, reshaping the artificial general intelligence landscape …

NutePrune: Efficient progressive pruning with numerous teachers for large language models

S Li, J Chen, X Han, J Bai - arXiv preprint arXiv:2402.09773, 2024 - arxiv.org
The considerable size of Large Language Models (LLMs) presents notable deployment
challenges, particularly on resource-constrained hardware. Structured pruning offers an …

Model compression and efficient inference for large language models: A survey

W Wang, W Chen, Y Luo, Y Long, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …

Compressing large language models by streamlining the unimportant layer

X Chen, Y Hu, J Zhang - arXiv preprint arXiv:2403.19135, 2024 - arxiv.org
Large language models (LLMs) have been extensively applied in various natural language
tasks and domains, but their applicability is constrained by the large number of parameters …

Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision

X Luo, D Liu, H Kong, S Huai, H Chen… - ACM Transactions on …, 2024 - dl.acm.org
Deep neural networks (DNNs) have recently achieved impressive success across a wide
range of real-world vision and language processing tasks, spanning from image …