AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

Pre-trained language models for text generation: A survey

J Li, T Tang, WX Zhao, JY Nie, JR Wen - ACM Computing Surveys, 2024 - dl.acm.org
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …

ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers

Z Yao, R Yazdani Aminabadi… - Advances in …, 2022 - proceedings.neurips.cc
How to efficiently serve ever-larger trained natural language models in practice has become
exceptionally challenging even for powerful cloud servers due to their prohibitive …
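
For context, a minimal sketch of symmetric per-tensor INT8 post-training quantization in NumPy; this is generic PTQ arithmetic, not the ZeroQuant method itself (which adds group-wise weight and token-wise activation quantization):

import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor scheme: one scale maps the largest magnitude to 127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 1024).astype(np.float32)
q, s = quantize_int8(w)
print("max abs rounding error:", np.abs(w - dequantize(q, s)).max())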

LLM-QAT: Data-free quantization aware training for large language models

Z Liu, B Oguz, C Zhao, E Chang, P Stock… - arXiv preprint arXiv …, 2023 - arxiv.org
Several post-training quantization methods have been applied to large language models
(LLMs), and have been shown to perform well down to 8-bits. We find that these methods …
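
As a companion sketch, generic quantization-aware training with a straight-through estimator in PyTorch; illustrative only, not the data-free distillation pipeline of LLM-QAT:

import torch

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, num_bits=8):
        # Quantize-dequantize in the forward pass so the loss sees the rounding error.
        qmax = 2 ** (num_bits - 1) - 1
        scale = w.abs().max() / qmax
        return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat rounding as identity so gradients
        # reach the underlying full-precision weights.
        return grad_output, None

w = torch.randn(16, 16, requires_grad=True)
loss = (FakeQuant.apply(w) ** 2).sum()
loss.backward()
print(w.grad.abs().mean())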

Post-training quantization for vision transformer

Z Liu, Y Wang, K Han, W Zhang… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recently, transformers have achieved remarkable performance on a variety of computer vision
applications. Compared with mainstream convolutional neural networks, vision transformers …

A primer in BERTology: What we know about how BERT works

A Rogers, O Kovaleva, A Rumshisky - Transactions of the Association …, 2021 - direct.mit.edu
Transformer-based models have pushed the state of the art in many areas of NLP, but our
understanding of what is behind their success is still limited. This paper is the first survey of …

I-BERT: Integer-only BERT quantization

S Kim, A Gholami, Z Yao… - … on machine learning, 2021 - proceedings.mlr.press
Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results
in many Natural Language Processing tasks. However, their memory footprint, inference …
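
To illustrate the integer-only idea, a sketch of an INT8 linear layer with INT32 accumulation; the rescaling below uses a floating-point multiplier for brevity, whereas a fully integer-only pipeline such as I-BERT replaces it, along with GELU, Softmax, and LayerNorm, with integer approximations:

import numpy as np

def int8_linear(x_q, w_q, s_x, s_w, s_y):
    # INT8 activations and weights, exact INT32 accumulation for the matmul.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T
    # Requantize the accumulator onto the output grid; integer-only kernels
    # would use a fixed-point (dyadic) multiplier here instead of a float.
    y = np.round(acc * (s_x * s_w / s_y))
    return np.clip(y, -127, 127).astype(np.int8)

x_q = np.random.randint(-127, 128, size=(2, 64), dtype=np.int8)
w_q = np.random.randint(-127, 128, size=(32, 64), dtype=np.int8)
print(int8_linear(x_q, w_q, s_x=0.02, s_w=0.01, s_y=1.0).shape)  # (2, 32)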

SqueezeLLM: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
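
As a rough illustration of the dense-and-sparse idea, the sketch below keeps a small fraction of outlier weights in full precision and uniformly quantizes the rest to 4 bits; SqueezeLLM additionally uses sensitivity-based non-uniform quantization, which is omitted here:

import numpy as np

def dense_sparse_split(w, outlier_frac=0.005, num_bits=4):
    # Largest-magnitude weights are kept exactly, as a sparse matrix.
    thresh = np.quantile(np.abs(w), 1.0 - outlier_frac)
    outliers = np.where(np.abs(w) > thresh, w, 0.0)
    dense = w - outliers
    # The remaining dense part is quantized uniformly to num_bits.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(dense).max() / qmax
    dense_q = np.clip(np.round(dense / scale), -qmax, qmax)
    return dense_q, scale, outliers

w = np.random.randn(512, 512).astype(np.float32)
dense_q, scale, outliers = dense_sparse_split(w)
recon = dense_q * scale + outliers
print("relative error:", np.linalg.norm(w - recon) / np.linalg.norm(w))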

A fast post-training pruning framework for transformers

W Kwon, S Kim, MW Mahoney… - Advances in …, 2022 - proceedings.neurips.cc
Pruning is an effective way to reduce the huge inference cost of Transformer models.
However, prior work on pruning Transformers requires retraining the models. This can add …
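
For orientation, a sketch of structured, retraining-free pruning of feed-forward neurons; a plain L2-magnitude score stands in here for the Fisher-information-based mask search used in the cited framework:

import numpy as np

def prune_ffn_neurons(w_in, w_out, keep_ratio=0.5):
    # w_in: (hidden, d_model) projects into the FFN hidden layer;
    # w_out: (d_model, hidden) projects back out of it.
    scores = np.linalg.norm(w_in, axis=1)    # per-neuron importance proxy
    k = int(keep_ratio * len(scores))
    keep = np.sort(np.argsort(scores)[-k:])  # indices of neurons to retain
    return w_in[keep], w_out[:, keep]

w_in, w_out = np.random.randn(3072, 768), np.random.randn(768, 3072)
w_in_p, w_out_p = prune_ffn_neurons(w_in, w_out)
print(w_in_p.shape, w_out_p.shape)  # (1536, 768) (768, 1536)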

Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …