Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic against the backdrop of slowing improvement
for general-purpose processors due to the foreseeable end of Moore's Law …

Recurrent neural networks for edge intelligence: a survey

VS Lalapura, J Amudha, HS Satheesh - ACM Computing Surveys …, 2021 - dl.acm.org
Recurrent Neural Networks are ubiquitous and pervasive in many artificial intelligence
applications such as speech recognition, predictive healthcare, creative art, and so on …

Terngrad: Ternary gradients to reduce communication in distributed deep learning

W Wen, C Xu, F Yan, C Wu, Y Wang… - Advances in neural …, 2017 - proceedings.neurips.cc
High network communication cost for synchronizing gradients and parameters is the well-
known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary …
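The following is a minimal NumPy sketch of stochastic gradient ternarization in the spirit of TernGrad; the function names and the NumPy setting are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch of stochastic ternarization: each gradient component is mapped to
# {-s, 0, +s} with s = max|g|, keeping its sign with probability |g_i| / s,
# so the result is an unbiased estimate of the original gradient while
# needing only two bits per element plus one scalar to transmit.
import numpy as np

def ternarize(grad: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    s = np.abs(grad).max()
    if s == 0.0:
        return np.zeros_like(grad)
    keep = rng.random(grad.shape) < (np.abs(grad) / s)   # Bernoulli(|g_i| / s)
    return s * np.sign(grad) * keep

rng = np.random.default_rng(0)
g = rng.normal(size=8).astype(np.float32)
print(ternarize(g, rng))   # entries drawn from {-max|g|, 0, +max|g|}
```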

Outlier weighed layerwise sparsity (OWL): A missing secret sauce for pruning LLMs to high sparsity

L Yin, Y Wu, Z Zhang, CY Hsieh, Y Wang, Y Jia… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs), renowned for their remarkable performance across diverse
domains, present a challenge when it comes to practical deployment due to their colossal …
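Below is a minimal sketch of outlier-aware, non-uniform layerwise sparsity allocation in the spirit of OWL; the outlier score (activation-scaled weight magnitude exceeding m times the layer mean) and the rebalancing rule are simplified assumptions, not the paper's exact recipe.

```python
# Layers with more outliers receive a lower sparsity level, while the mean
# sparsity across layers stays at the global target.
import numpy as np

def layer_outlier_ratio(weight: np.ndarray, act_norm: np.ndarray, m: float = 5.0) -> float:
    """Fraction of weights whose |w_ij| * ||x_j|| exceeds m times the layer mean score."""
    score = np.abs(weight) * act_norm[None, :]
    return float((score > m * score.mean()).mean())

def allocate_sparsity(outlier_ratios, target_sparsity: float = 0.7, lam: float = 0.08):
    d = np.asarray(outlier_ratios)
    shift = lam * (d - d.mean()) / (d.max() - d.min() + 1e-12)   # zero-mean adjustment
    return np.clip(target_sparsity - shift, 0.0, 1.0)

# Illustrative layers: random weights and per-input-channel activation norms.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(64, 64)), rng.uniform(0.5, 2.0, size=64)) for _ in range(4)]
ratios = [layer_outlier_ratio(w, a) for w, a in layers]
print(allocate_sparsity(ratios))   # per-layer sparsity levels averaging ~0.7
```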

Structured pruning of large language models

Z Wang, J Wohlwend, T Lei - arXiv preprint arXiv:1910.04732, 2019 - arxiv.org
Large language models have recently achieved state-of-the-art performance across a wide
variety of natural language tasks. Meanwhile, the size of these models and their latency …

AutoPruner: An end-to-end trainable filter pruning method for efficient deep model inference

JH Luo, J Wu - Pattern Recognition, 2020 - Elsevier
Channel pruning is an important method to speed up a CNN model's inference. Previous filter
pruning algorithms regard importance evaluation and model fine-tuning as two independent …
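For context, here is a minimal sketch of the conventional two-stage filter pruning the snippet refers to: rank convolutional filters by an importance proxy (here the L1 norm of each filter) and drop the weakest, leaving fine-tuning as a separate step. This illustrates the baseline AutoPruner argues against, not AutoPruner's end-to-end trainable gate itself.

```python
import numpy as np

def prune_filters_by_l1(conv_weight: np.ndarray, keep_ratio: float = 0.5):
    """conv_weight has shape (out_channels, in_channels, kH, kW)."""
    importance = np.abs(conv_weight).sum(axis=(1, 2, 3))        # L1 norm per filter
    n_keep = max(1, int(round(keep_ratio * conv_weight.shape[0])))
    keep_idx = np.sort(np.argsort(importance)[::-1][:n_keep])   # indices of strongest filters
    return conv_weight[keep_idx], keep_idx

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 16, 3, 3)).astype(np.float32)
pruned, kept = prune_filters_by_l1(w, keep_ratio=0.25)
print(pruned.shape, kept.shape)   # (8, 16, 3, 3) (8,)
```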

Memristive LSTM network for sentiment analysis

S Wen, H Wei, Y Yang, Z Guo, Z Zeng… - … on Systems, Man …, 2019 - ieeexplore.ieee.org
This paper presents a complete solution for the hardware design of a memristor-based long
short-term memory (MLSTM) network. Throughout the design process, we fully consider the …

GPU kernels for block-sparse weights

S Gray, A Radford, DP Kingma - arXiv preprint arXiv:1711.09224, 2017 - cdn.openai.com
We're releasing highly optimized GPU kernels for an underexplored class of neural network
architectures: networks with block-sparse weights. The kernels allow for efficient evaluation …
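The following is a minimal NumPy sketch of evaluating a layer with block-sparse weights: only the non-zero blocks recorded in a block mask are stored and multiplied. The released kernels run this pattern efficiently on the GPU; the block size and storage layout here are illustrative assumptions.

```python
import numpy as np

BLOCK = 8  # block size along both weight dimensions (assumed)

def block_sparse_matmul(x, blocks, mask):
    """x: (batch, in_dim); mask: (in_dim//BLOCK, out_dim//BLOCK) booleans;
    blocks[(i, j)]: (BLOCK, BLOCK) weight block for each True mask entry."""
    n_in_blk, n_out_blk = mask.shape
    out = np.zeros((x.shape[0], n_out_blk * BLOCK), dtype=x.dtype)
    for i in range(n_in_blk):
        xi = x[:, i * BLOCK:(i + 1) * BLOCK]
        for j in range(n_out_blk):
            if mask[i, j]:   # zero blocks are skipped entirely
                out[:, j * BLOCK:(j + 1) * BLOCK] += xi @ blocks[(i, j)]
    return out

rng = np.random.default_rng(0)
mask = rng.random((4, 4)) < 0.25                  # ~25% of blocks are non-zero
blocks = {(i, j): rng.normal(size=(BLOCK, BLOCK)).astype(np.float32)
          for i in range(4) for j in range(4) if mask[i, j]}
x = rng.normal(size=(2, 4 * BLOCK)).astype(np.float32)
print(block_sparse_matmul(x, blocks, mask).shape)  # (2, 32)
```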

bert2BERT: Towards reusable pretrained language models

C Chen, Y Yin, L Shang, X Jiang, Y Qin, F Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
In recent years, researchers have tended to pre-train ever-larger language models to explore the
upper limit of deep models. However, large language model pre-training costs intensive …

DeepHoyer: Learning sparser neural network with differentiable scale-invariant sparsity measures

H Yang, W Wen, H Li - arXiv preprint arXiv:1908.09979, 2019 - arxiv.org
In seeking sparse and efficient neural network models, many previous works investigated
enforcing L1 or L0 regularizers to encourage weight sparsity during training. The L0 …
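Below is a minimal sketch of a differentiable, scale-invariant sparsity measure of the kind DeepHoyer builds on: the squared ratio of the L1 and L2 norms, (||w||_1 / ||w||_2)^2. Unlike a plain L1 penalty, it is unchanged when the weights are rescaled, so minimizing it pushes toward genuine sparsity rather than merely toward small magnitudes. The names and constants are illustrative assumptions.

```python
import numpy as np

def hoyer_square(w: np.ndarray, eps: float = 1e-12) -> float:
    l1 = np.abs(w).sum()
    l2 = np.sqrt((w ** 2).sum())
    # Ranges from 1 (a single non-zero weight) up to w.size (uniform magnitudes).
    return float((l1 / (l2 + eps)) ** 2)

w = np.array([1.0, -2.0, 0.0, 0.5])
print(hoyer_square(w), hoyer_square(10.0 * w))   # identical: scale-invariant
print(np.abs(w).sum(), np.abs(10.0 * w).sum())   # the L1 penalty grows with the scale
```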