Compressing large-scale transformer-based models: A case study on BERT

P Ganesh, Y Chen, X Lou, MA Khan, Y Yang… - Transactions of the …, 2021 - direct.mit.edu
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …

Bridging fairness and environmental sustainability in natural language processing

M Hessenthaler, E Strubell, D Hovy… - arXiv preprint arXiv …, 2022 - arxiv.org
Fairness and environmental impact are important research directions for the sustainable
development of artificial intelligence. However, while each topic is an active research area in …

Differentially private model compression

F Mireshghallah, A Backurs, HA Inan… - Advances in …, 2022 - proceedings.neurips.cc
Recent papers have shown that large pre-trained language models (LLMs) such as BERT,
GPT-2 can be fine-tuned on private data to achieve performance comparable to non-private …

Low resource causal event detection from biomedical literature

Z Liang, E Noriega-Atala, C Morrison… - Proceedings of the …, 2022 - aclanthology.org
Recognizing causal precedence relations among the chemical interactions in biomedical
literature is crucial to understanding the underlying biological mechanisms. However …

Hyperparameter power impact in transformer language model training

LHP de Chavannes, MGK Kongsbak… - Proceedings of the …, 2021 - aclanthology.org
Training large language models can consume a large amount of energy. We hypothesize
that the language model's configuration impacts its energy consumption, and that there is …

Online News Sentiment Classification Using DistilBERT

SK Akpatsa, H Lei, X Li, VHKS Obeng… - Journal of Quantum …, 2022 - cdn.techscience.cn
The ability of the pre-trained BERT model to achieve outstanding performance on many
Natural Language Processing (NLP) tasks has attracted the attention of researchers in …

TangoBERT: Reducing inference cost by using cascaded architecture

J Mamou, O Pereg, M Wasserblat… - arXiv preprint arXiv …, 2022 - arxiv.org
The remarkable success of large transformer-based models such as BERT, RoBERTa and
XLNet in many NLP tasks comes with a large increase in monetary and environmental cost …

General cross-architecture distillation of pretrained language models into matrix embeddings

L Galke, I Cuber, C Meyer, HF Nölscher… - … Joint Conference on …, 2022 - ieeexplore.ieee.org
Large pretrained language models (PreLMs) are revolutionizing natural language
processing across all benchmarks. However, their sheer size is prohibitive for small …

Sparse distillation: Speeding up text classification by using bigger student models

Q Ye, M Khabsa, M Lewis, S Wang, X Ren… - arXiv preprint arXiv …, 2021 - arxiv.org
Distilling state-of-the-art transformer models into lightweight student models is an effective
way to reduce computation cost at inference time. The student models are typically compact …

Alternative non-BERT model choices for the textual classification in low-resource languages and environments

SM Maheen, MR Faisal, MR Rahman… - Proceedings of the …, 2022 - aclanthology.org
Natural Language Processing (NLP) tasks in non-dominant and low-resource
languages have not experienced significant progress. Although pre-trained BERT models …