A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …

AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

LLM-Pruner: On the structural pruning of large language models

X Ma, G Fang, X Wang - Advances in neural information …, 2023 - proceedings.neurips.cc
Large language models (LLMs) have shown remarkable capabilities in language
understanding and generation. However, such impressive capability typically comes with a …

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in Neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
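
The snippet describes an int8 matrix-multiplication procedure for transformer layers. As a rough illustration of the general idea only (symmetric absmax quantization with integer accumulation, not the paper's vector-wise scheme with outlier handling), here is a minimal NumPy sketch; the function names absmax_quantize and int8_matmul are hypothetical.

```python
import numpy as np

def absmax_quantize(x, axis):
    """Symmetric (absmax) quantization of a float matrix to int8 along one axis."""
    scale = 127.0 / np.maximum(np.abs(x).max(axis=axis, keepdims=True), 1e-8)
    q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, b):
    """Quantize both operands to int8, multiply with int32 accumulation, then dequantize."""
    qa, sa = absmax_quantize(a, axis=1)   # row-wise scales for activations
    qb, sb = absmax_quantize(b, axis=0)   # column-wise scales for weights
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc / (sa * sb)                # outer product of scales undoes the quantization

# toy usage: quantization error stays small for well-behaved inputs
a = np.random.randn(4, 8).astype(np.float32)
b = np.random.randn(8, 3).astype(np.float32)
print(np.abs(int8_matmul(a, b) - a @ b).max())
```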

Efficiently scaling transformer inference

R Pope, S Douglas, A Chowdhery… - Proceedings of …, 2023 - proceedings.mlsys.org
We study the problem of efficient generative inference for Transformer models, in one of its
most challenging settings: large deep models, with tight latency targets and long sequence …

GLM-130B: An open bilingual pre-trained model

A Zeng, X Liu, Z Du, Z Wang, H Lai, M Ding… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model
with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as …

ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers

Z Yao, R Yazdani Aminabadi… - Advances in …, 2022 - proceedings.neurips.cc
How to efficiently serve ever-larger trained natural language models in practice has become
exceptionally challenging even for powerful cloud servers due to their prohibitive …

LLM-QAT: Data-free quantization aware training for large language models

Z Liu, B Oguz, C Zhao, E Chang, P Stock… - arXiv preprint arXiv …, 2023 - arxiv.org
Several post-training quantization methods have been applied to large language models
(LLMs), and have been shown to perform well down to 8-bits. We find that these methods …
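
The entry contrasts post-training quantization with quantization-aware training (QAT). As a minimal, generic sketch of the usual QAT building block, fake quantization with a straight-through estimator, the PyTorch code below is an illustration of the concept, not the paper's data-free procedure; FakeQuantSTE and QATLinear are hypothetical names.

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Fake int8 quantization whose backward pass is a straight-through estimator."""
    @staticmethod
    def forward(ctx, x):
        scale = 127.0 / x.abs().max().clamp(min=1e-8)
        return torch.clamp((x * scale).round(), -127, 127) / scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass gradients straight through the rounding step

class QATLinear(torch.nn.Linear):
    """Linear layer whose weights are fake-quantized in the forward pass during training."""
    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.weight)
        return torch.nn.functional.linear(x, w_q, self.bias)

# toy usage: gradients still flow to the full-precision master weights
layer = QATLinear(16, 4)
layer(torch.randn(2, 16)).sum().backward()
```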

A survey on text classification: From traditional to deep learning

Q Li, H Peng, J Li, C Xia, R Yang, L Sun… - ACM Transactions on …, 2022 - dl.acm.org
Text classification is the most fundamental and essential task in natural language
processing. The last decade has seen a surge of research in this area due to the …

Large language models as general pattern machines

S Mirchandani, F Xia, P Florence, B Ichter… - arXiv preprint arXiv …, 2023 - arxiv.org
We observe that pre-trained large language models (LLMs) are capable of autoregressively
completing complex token sequences--from arbitrary ones procedurally generated by …