AMMUS: A survey of transformer-based pretrained models in natural language processing
KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …
Pre-trained language models for text generation: A survey
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …
ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers
Z Yao, R Yazdani Aminabadi… - Advances in …, 2022 - proceedings.neurips.cc
How to efficiently serve ever-larger trained natural language models in practice has become
exceptionally challenging even for powerful cloud servers due to their prohibitive …
LLM-QAT: Data-free quantization aware training for large language models
Several post-training quantization methods have been applied to large language models
(LLMs), and have been shown to perform well down to 8-bits. We find that these methods …
Post-training quantization for vision transformer
Recently, transformers have achieved remarkable performance on a variety of computer vision
applications. Compared with mainstream convolutional neural networks, vision transformers …
A primer in BERTology: What we know about how BERT works
Transformer-based models have pushed the state of the art in many areas of NLP, but our
understanding of what is behind their success is still limited. This paper is the first survey of …
I-BERT: Integer-only BERT quantization
Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results
in many Natural Language Processing tasks. However, their memory footprint, inference …
SqueezeLLM: Dense-and-sparse quantization
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
A fast post-training pruning framework for transformers
Pruning is an effective way to reduce the huge inference cost of Transformer models.
However, prior work on pruning Transformers requires retraining the models. This can add …
Efficient methods for natural language processing: A survey
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …