Model compression and hardware acceleration for neural networks: A comprehensive survey
Domain-specific hardware is becoming a promising topic against the backdrop of slowing improvement in general-purpose processors due to the foreseeable end of Moore's Law …
Recurrent neural networks for edge intelligence: a survey
VS Lalapura, J Amudha, HS Satheesh - ACM Computing Surveys …, 2021 - dl.acm.org
Recurrent Neural Networks are ubiquitous in many artificial intelligence applications such as speech recognition, predictive healthcare, creative art, and so on …
Terngrad: Ternary gradients to reduce communication in distributed deep learning
High network communication cost for synchronizing gradients and parameters is the well-
known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary …
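The ternary quantizer at the heart of TernGrad can be sketched in a few lines of NumPy. The per-tensor scaler s = max|g| and the stochastic rounding follow the paper's construction; the function name and interface here are illustrative:

```python
import numpy as np

def ternarize(grad, rng=None):
    """TernGrad-style stochastic ternarization: map each gradient
    component to {-s, 0, +s} with s = max|grad|, keeping component i
    with probability |grad_i| / s so the result is unbiased in expectation."""
    rng = np.random.default_rng(0) if rng is None else rng
    grad = np.asarray(grad, dtype=float)
    s = np.max(np.abs(grad))
    if s == 0.0:
        return np.zeros_like(grad)
    keep = rng.random(grad.shape) < np.abs(grad) / s  # stochastic rounding
    return s * np.sign(grad) * keep

g = np.array([0.4, -0.1, 0.0, 0.9])
q = ternarize(g)  # entries drawn from {-0.9, 0.0, +0.9}
```

Each worker then only needs to transmit two bits per component plus one scaler per tensor, which is where the communication savings come from.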
Outlier weighed layerwise sparsity (OWL): A missing secret sauce for pruning LLMs to high sparsity
Large Language Models (LLMs), renowned for their remarkable performance across diverse
domains, present a challenge when it comes to practical deployment due to their colossal …
Structured pruning of large language models
Large language models have recently achieved state-of-the-art performance across a wide
variety of natural language tasks. Meanwhile, the size of these models and their latency …
Autopruner: An end-to-end trainable filter pruning method for efficient deep model inference
Channel pruning is an important method to speed up a CNN model's inference. Previous filter
pruning algorithms regard importance evaluation and model fine-tuning as two independent …
Memristive LSTM network for sentiment analysis
This paper presents a complete solution for the hardware design of a memristor-based long
short-term memory (MLSTM) network. Throughout the design process, we fully consider the …
GPU kernels for block-sparse weights
We're releasing highly optimized GPU kernels for an underexplored class of neural network
architectures: networks with block-sparse weights. The kernels allow for efficient evaluation …
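The released kernels are CUDA, but the block-sparse layout they exploit can be illustrated with a minimal NumPy sketch. All names below are illustrative, not the kernels' actual API:

```python
import numpy as np

def block_sparse_matvec(blocks, mask, x, bs):
    """Multiply a block-sparse weight matrix by a vector x.
    `mask` is an (R, C) boolean array marking which (bs x bs) blocks are
    nonzero; `blocks[(r, c)]` holds the dense block when mask[r, c] is True.
    Only the nonzero blocks are ever touched, which is where the speedup
    over a dense matvec comes from."""
    R, C = mask.shape
    y = np.zeros(R * bs)
    for r in range(R):
        for c in range(C):
            if mask[r, c]:
                y[r*bs:(r+1)*bs] += blocks[(r, c)] @ x[c*bs:(c+1)*bs]
    return y
```

Working at block granularity (rather than individual weights) keeps memory access contiguous, which is what makes such sparsity patterns efficient on GPUs.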
bert2BERT: Towards reusable pretrained language models
In recent years, researchers have tended to pre-train ever-larger language models to explore the
upper limit of deep models. However, large language model pre-training costs intensive …
DeepHoyer: Learning sparser neural network with differentiable scale-invariant sparsity measures
In seeking sparse and efficient neural network models, many previous works investigated
enforcing L1 or L0 regularizers to encourage weight sparsity during training. The L0 …
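The Hoyer-Square measure that DeepHoyer uses as its regularizer, H_S(w) = (Σ|w_i|)² / Σw_i², can be sketched directly; the `eps` term and function name below are illustrative additions:

```python
import numpy as np

def hoyer_square(w, eps=1e-12):
    """Hoyer-Square sparsity measure: H_S(w) = (sum|w_i|)^2 / sum(w_i^2).
    It ranges from 1 (a single nonzero entry) to n (all entries of equal
    magnitude), and is scale-invariant: H_S(c * w) == H_S(w) for c != 0,
    so minimizing it pushes weights toward sparsity without shrinking
    their overall scale the way an L1 penalty does."""
    w = np.asarray(w, dtype=float)
    return np.sum(np.abs(w)) ** 2 / (np.sum(w ** 2) + eps)

dense = np.ones(8)                      # all entries equal  -> H_S = 8
sparse = np.zeros(8); sparse[0] = 3.0   # one nonzero entry -> H_S = 1
```

Unlike L0, this measure is differentiable almost everywhere, so it can be added directly to a training loss and minimized with ordinary gradient descent.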