Compressing large-scale transformer-based models: A case study on BERT

P Ganesh, Y Chen, X Lou, MA Khan, Y Yang… - Transactions of the …, 2021 - direct.mit.edu
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …

Bridging fairness and environmental sustainability in natural language processing

M Hessenthaler, E Strubell, D Hovy… - arXiv preprint arXiv …, 2022 - arxiv.org
Fairness and environmental impact are important research directions for the sustainable
development of artificial intelligence. However, while each topic is an active research area in …

Differentially private model compression

F Mireshghallah, A Backurs, HA Inan… - Advances in …, 2022 - proceedings.neurips.cc
Recent papers have shown that large pre-trained language models (LLMs) such as BERT,
GPT-2 can be fine-tuned on private data to achieve performance comparable to non-private …

Low resource causal event detection from biomedical literature

Z Liang, E Noriega-Atala, C Morrison… - Proceedings of the …, 2022 - aclanthology.org
Recognizing causal precedence relations among the chemical interactions in biomedical
literature is crucial to understanding the underlying biological mechanisms. However …

Hyperparameter power impact in transformer language model training

LHP de Chavannes, MGK Kongsbak… - Proceedings of the …, 2021 - aclanthology.org
Training large language models can consume a large amount of energy. We hypothesize
that the language model's configuration impacts its energy consumption, and that there is …

Online News Sentiment Classification Using DistilBERT

SK Akpatsa, H Lei, X Li, VHKS Obeng… - Journal of Quantum …, 2022 - cdn.techscience.cn
The ability of the pre-trained BERT model to achieve outstanding performance on many
Natural Language Processing (NLP) tasks has attracted the attention of researchers in …

TangoBERT: Reducing inference cost by using cascaded architecture

J Mamou, O Pereg, M Wasserblat… - arXiv preprint arXiv …, 2022 - arxiv.org
The remarkable success of large transformer-based models such as BERT, RoBERTa and
XLNet in many NLP tasks comes with a large increase in monetary and environmental cost …

General cross-architecture distillation of pretrained language models into matrix embeddings

L Galke, I Cuber, C Meyer, HF Nölscher… - … Joint Conference on …, 2022 - ieeexplore.ieee.org
Large pretrained language models (PreLMs) are revolutionizing natural language
processing across all benchmarks. However, their sheer size is prohibitive for small …

Sparse distillation: Speeding up text classification by using bigger student models

Q Ye, M Khabsa, M Lewis, S Wang, X Ren… - arXiv preprint arXiv …, 2021 - arxiv.org
Distilling state-of-the-art transformer models into lightweight student models is an effective
way to reduce computation cost at inference time. The student models are typically compact …

Alternative non-BERT model choices for the textual classification in low-resource languages and environments

SM Maheen, MR Faisal, MR Rahman… - Proceedings of the …, 2022 - aclanthology.org
Natural Language Processing (NLP) tasks in non-dominant and low-resource
languages have not experienced significant progress. Although pre-trained BERT models …