Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic against the backdrop of slowing improvement
for general-purpose processors due to the foreseeable end of Moore's Law …

Recurrent neural networks for edge intelligence: a survey

VS Lalapura, J Amudha, HS Satheesh - ACM Computing Surveys …, 2021 - dl.acm.org
Recurrent Neural Networks are ubiquitous and pervasive in many artificial intelligence
applications such as speech recognition, predictive healthcare, creative art, and so on …

Terngrad: Ternary gradients to reduce communication in distributed deep learning

W Wen, C Xu, F Yan, C Wu, Y Wang… - Advances in neural …, 2017 - proceedings.neurips.cc
High network communication cost for synchronizing gradients and parameters is the well-
known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary …
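The following is a minimal NumPy sketch of stochastic gradient ternarization in the spirit of TernGrad; the function names and the NumPy setting are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch of stochastic ternarization: each gradient component is mapped to
# {-s, 0, +s} with s = max|g|, keeping its sign with probability |g_i| / s,
# so the result is an unbiased estimate of the original gradient while
# needing only two bits per element plus one scalar to transmit.
import numpy as np

def ternarize(grad: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    s = np.abs(grad).max()
    if s == 0.0:
        return np.zeros_like(grad)
    keep = rng.random(grad.shape) < (np.abs(grad) / s)   # Bernoulli(|g_i| / s)
    return s * np.sign(grad) * keep

rng = np.random.default_rng(0)
g = rng.normal(size=8).astype(np.float32)
print(ternarize(g, rng))   # entries drawn from {-max|g|, 0, +max|g|}
```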

Outlier weighed layerwise sparsity (OWL): A missing secret sauce for pruning LLMs to high sparsity

L Yin, Y Wu, Z Zhang, CY Hsieh, Y Wang, Y Jia… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs), renowned for their remarkable performance across diverse
domains, present a challenge when it comes to practical deployment due to their colossal …
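Below is a minimal sketch of outlier-aware, non-uniform layerwise sparsity allocation in the spirit of OWL; the outlier score (activation-scaled weight magnitude exceeding m times the layer mean) and the rebalancing rule are simplified assumptions, not the paper's exact recipe.

```python
# Layers with more outliers receive a lower sparsity level, while the mean
# sparsity across layers stays at the global target.
import numpy as np

def layer_outlier_ratio(weight: np.ndarray, act_norm: np.ndarray, m: float = 5.0) -> float:
    """Fraction of weights whose |w_ij| * ||x_j|| exceeds m times the layer mean score."""
    score = np.abs(weight) * act_norm[None, :]
    return float((score > m * score.mean()).mean())

def allocate_sparsity(outlier_ratios, target_sparsity: float = 0.7, lam: float = 0.08):
    d = np.asarray(outlier_ratios)
    shift = lam * (d - d.mean()) / (d.max() - d.min() + 1e-12)   # zero-mean adjustment
    return np.clip(target_sparsity - shift, 0.0, 1.0)

# Illustrative layers: random weights and per-input-channel activation norms.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(64, 64)), rng.uniform(0.5, 2.0, size=64)) for _ in range(4)]
ratios = [layer_outlier_ratio(w, a) for w, a in layers]
print(allocate_sparsity(ratios))   # per-layer sparsity levels averaging ~0.7
```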

Structured pruning of large language models

Z Wang, J Wohlwend, T Lei - arXiv preprint arXiv:1910.04732, 2019 - arxiv.org
Large language models have recently achieved state-of-the-art performance across a wide
variety of natural language tasks. Meanwhile, the size of these models and their latency …

AutoPruner: An end-to-end trainable filter pruning method for efficient deep model inference

JH Luo, J Wu - Pattern Recognition, 2020 - Elsevier
Channel pruning is an important method to speed up a CNN model's inference. Previous filter
pruning algorithms regard importance evaluation and model fine-tuning as two independent …
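For context, here is a minimal sketch of the conventional two-stage filter pruning the snippet refers to: rank convolutional filters by an importance proxy (here the L1 norm of each filter) and drop the weakest, leaving fine-tuning as a separate step. This illustrates the baseline AutoPruner argues against, not AutoPruner's end-to-end trainable gate itself.

```python
import numpy as np

def prune_filters_by_l1(conv_weight: np.ndarray, keep_ratio: float = 0.5):
    """conv_weight has shape (out_channels, in_channels, kH, kW)."""
    importance = np.abs(conv_weight).sum(axis=(1, 2, 3))        # L1 norm per filter
    n_keep = max(1, int(round(keep_ratio * conv_weight.shape[0])))
    keep_idx = np.sort(np.argsort(importance)[::-1][:n_keep])   # indices of strongest filters
    return conv_weight[keep_idx], keep_idx

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 16, 3, 3)).astype(np.float32)
pruned, kept = prune_filters_by_l1(w, keep_ratio=0.25)
print(pruned.shape, kept.shape)   # (8, 16, 3, 3) (8,)
```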

Memristive LSTM network for sentiment analysis

S Wen, H Wei, Y Yang, Z Guo, Z Zeng… - … on Systems, Man …, 2019 - ieeexplore.ieee.org
This paper presents a complete solution for the hardware design of a memristor-based long
short-term memory (MLSTM) network. Throughout the design process, we fully consider the …

GPU kernels for block-sparse weights

S Gray, A Radford, DP Kingma - arXiv preprint arXiv:1711.09224, 2017 - cdn.openai.com
We're releasing highly optimized GPU kernels for an underexplored class of neural network
architectures: networks with block-sparse weights. The kernels allow for efficient evaluation …
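The following is a minimal NumPy sketch of evaluating a layer with block-sparse weights: only the non-zero blocks recorded in a block mask are stored and multiplied. The released kernels run this pattern efficiently on the GPU; the block size and storage layout here are illustrative assumptions.

```python
import numpy as np

BLOCK = 8  # block size along both weight dimensions (assumed)

def block_sparse_matmul(x, blocks, mask):
    """x: (batch, in_dim); mask: (in_dim//BLOCK, out_dim//BLOCK) booleans;
    blocks[(i, j)]: (BLOCK, BLOCK) weight block for each True mask entry."""
    n_in_blk, n_out_blk = mask.shape
    out = np.zeros((x.shape[0], n_out_blk * BLOCK), dtype=x.dtype)
    for i in range(n_in_blk):
        xi = x[:, i * BLOCK:(i + 1) * BLOCK]
        for j in range(n_out_blk):
            if mask[i, j]:   # zero blocks are skipped entirely
                out[:, j * BLOCK:(j + 1) * BLOCK] += xi @ blocks[(i, j)]
    return out

rng = np.random.default_rng(0)
mask = rng.random((4, 4)) < 0.25                  # ~25% of blocks are non-zero
blocks = {(i, j): rng.normal(size=(BLOCK, BLOCK)).astype(np.float32)
          for i in range(4) for j in range(4) if mask[i, j]}
x = rng.normal(size=(2, 4 * BLOCK)).astype(np.float32)
print(block_sparse_matmul(x, blocks, mask).shape)  # (2, 32)
```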

bert2BERT: Towards reusable pretrained language models

C Chen, Y Yin, L Shang, X Jiang, Y Qin, F Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
In recent years, researchers have tended to pre-train ever-larger language models to explore the
upper limit of deep models. However, large language model pre-training costs intensive …

DeepHoyer: Learning sparser neural network with differentiable scale-invariant sparsity measures

H Yang, W Wen, H Li - arXiv preprint arXiv:1908.09979, 2019 - arxiv.org
In seeking sparse and efficient neural network models, many previous works investigated
enforcing L1 or L0 regularizers to encourage weight sparsity during training. The L0 …
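Below is a minimal sketch of a differentiable, scale-invariant sparsity measure of the kind DeepHoyer builds on: the squared ratio of the L1 and L2 norms, (||w||_1 / ||w||_2)^2. Unlike a plain L1 penalty, it is unchanged when the weights are rescaled, so minimizing it pushes toward genuine sparsity rather than merely toward small magnitudes. The names and constants are illustrative assumptions.

```python
import numpy as np

def hoyer_square(w: np.ndarray, eps: float = 1e-12) -> float:
    l1 = np.abs(w).sum()
    l2 = np.sqrt((w ** 2).sum())
    # Ranges from 1 (a single non-zero weight) up to w.size (uniform magnitudes).
    return float((l1 / (l2 + eps)) ** 2)

w = np.array([1.0, -2.0, 0.0, 0.5])
print(hoyer_square(w), hoyer_square(10.0 * w))   # identical: scale-invariant
print(np.abs(w).sum(), np.abs(10.0 * w).sum())   # the L1 penalty grows with the scale
```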