Compressing large-scale transformer-based models: A case study on BERT
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …
Bridging fairness and environmental sustainability in natural language processing
M Hessenthaler, E Strubell, D Hovy… - arXiv preprint arXiv …, 2022 - arxiv.org
Fairness and environmental impact are important research directions for the sustainable
development of artificial intelligence. However, while each topic is an active research area in …
Differentially private model compression
Recent papers have shown that large pre-trained language models (LLMs) such as BERT,
GPT-2 can be fine-tuned on private data to achieve performance comparable to non-private …
Low resource causal event detection from biomedical literature
Recognizing causal precedence relations among the chemical interactions in biomedical
literature is crucial to understanding the underlying biological mechanisms. However …
Hyperparameter power impact in transformer language model training
LHP de Chavannes, MGK Kongsbak… - Proceedings of the …, 2021 - aclanthology.org
Training large language models can consume a large amount of energy. We hypothesize
that the language model's configuration impacts its energy consumption, and that there is …
[PDF][PDF] Online News Sentiment Classification Using DistilBERT.
SK Akpatsa, H Lei, X Li, VHKS Obeng… - Journal of Quantum …, 2022 - cdn.techscience.cn
The ability of pre-trained BERT model to achieve outstanding performances on many
Natural Language Processing (NLP) tasks has attracted the attention of researchers in …
TangoBERT: Reducing inference cost by using cascaded architecture
The remarkable success of large transformer-based models such as BERT, RoBERTa and
XLNet in many NLP tasks comes with a large increase in monetary and environmental cost …
General cross-architecture distillation of pretrained language models into matrix embeddings
Large pretrained language models (PreLMs) are revolutionizing natural language
processing across all benchmarks. However, their sheer size is prohibitive for small …
Sparse distillation: Speeding up text classification by using bigger student models
Distilling state-of-the-art transformer models into lightweight student models is an effective
way to reduce computation cost at inference time. The student models are typically compact …
Alternative non-BERT model choices for the textual classification in low-resource languages and environments
Abstract Natural Language Processing (NLP) tasks in non-dominant and low-resource
languages have not experienced significant progress. Although pre-trained BERT models …