Pre-trained models for natural language processing: A survey

X Qiu, T Sun, Y Xu, Y Shao, N Dai, X Huang - Science China …, 2020 - Springer
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …
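The survey covers the pre-train / fine-tune paradigm. As a point of reference only, here is a minimal sketch of that workflow using the Hugging Face `transformers` API; the model name, toy data, and hyperparameters are illustrative assumptions, not taken from the survey.

```python
# Minimal sketch of fine-tuning a pre-trained model on a downstream task.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # pre-trained encoder + new task head

texts = ["a great movie", "a dull movie"]            # toy labelled data (assumption)
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)              # loss computed internally
outputs.loss.backward()                              # fine-tune all weights
optimizer.step()
```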

Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE Transactions on Pattern Analysis and …, 2021 - ieeexplore.ieee.org
In recent years, deep neural models have been successful in almost every field, solving even
highly complex problems. However, these models are huge in size, with …

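The basic recipe these student-teacher reviews survey is response-based distillation: train a small student to match a large teacher's temperature-softened outputs alongside the usual supervised loss. A minimal PyTorch sketch follows; the toy networks, temperature T, and mixing weight alpha are illustrative assumptions.

```python
# Soft-target student-teacher distillation (a minimal sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # KL between temperature-softened distributions + ordinary cross-entropy.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

x = torch.randn(8, 32)                       # toy batch (assumption)
labels = torch.randint(0, 10, (8,))
with torch.no_grad():
    teacher_logits = teacher(x)              # teacher is frozen
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()                              # gradients flow only into the student
```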

Multi-task learning with deep neural networks: A survey

M Crawshaw - arXiv preprint arXiv:2009.09796, 2020 - arxiv.org
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are
simultaneously learned by a shared model. Such approaches offer advantages like …
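The most common shared-model pattern the survey discusses is hard parameter sharing: one shared trunk feeding a small head per task. Below is a minimal sketch; the layer sizes, task names, and toy losses are illustrative assumptions.

```python
# Hard parameter sharing for multi-task learning (a minimal sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    def __init__(self, in_dim=64, hidden=128, task_dims=None):
        super().__init__()
        task_dims = task_dims or {"taskA": 3, "taskB": 1}   # assumed toy tasks
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # shared
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, out) for name, out in task_dims.items()})

    def forward(self, x, task):
        return self.heads[task](self.trunk(x))              # task-specific output

net = MultiTaskNet()
x = torch.randn(4, 64)
loss = (F.cross_entropy(net(x, "taskA"), torch.randint(0, 3, (4,)))
        + F.mse_loss(net(x, "taskB").squeeze(-1), torch.randn(4)))
loss.backward()                      # both task losses update the shared trunk
```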

Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

Z Allen-Zhu, Y Li - arXiv preprint arXiv:2012.09816, 2020 - arxiv.org
We formally study how ensembles of deep learning models can improve test accuracy, and
how the superior performance of an ensemble can be distilled into a single model using …
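The setting analysed here can be sketched very simply: average the ensemble members' logits and train a single student to match the averaged, temperature-softened distribution. The stand-in linear models, temperature, and toy data below are illustrative assumptions, not the paper's construction.

```python
# Distilling an ensemble into a single model (a minimal sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

ensemble = [nn.Linear(16, 5) for _ in range(3)]     # stand-ins for trained members
student = nn.Linear(16, 5)

x = torch.randn(32, 16)
with torch.no_grad():
    avg_logits = torch.stack([m(x) for m in ensemble]).mean(dim=0)

T = 2.0                                             # assumed temperature
loss = F.kl_div(F.log_softmax(student(x) / T, dim=-1),
                F.softmax(avg_logits / T, dim=-1),
                reduction="batchmean") * (T * T)
loss.backward()                                     # only the student is updated
```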

Language models are few-shot learners

TB Brown - arXiv preprint arXiv:2005.14165, 2020 - splab.sdu.edu.cn
We demonstrate that scaling up language models greatly improves task-agnostic, few-shot
performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning …
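"Few-shot" here means in-context learning: a handful of labelled demonstrations are placed directly in the prompt and the model completes the next example with no gradient updates. The sketch below only builds such a prompt string; the demonstrations and label format are illustrative assumptions, not the paper's evaluation code.

```python
# Constructing a few-shot (in-context) prompt (a minimal sketch).
demonstrations = [
    ("The movie was wonderful.", "positive"),
    ("I wasted two hours.", "negative"),
]
query = "A genuinely moving film."

prompt = "\n".join(f"Review: {text}\nSentiment: {label}"
                   for text, label in demonstrations)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)   # this string would be sent verbatim to a large language model
```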

Patient knowledge distillation for BERT model compression

S Sun, Y Cheng, Z Gan, J Liu - arXiv preprint arXiv:1908.09355, 2019 - arxiv.org
Pre-trained language models such as BERT have proven to be highly effective for natural
language processing (NLP) tasks. However, the high demand for computing resources in …
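The "patient" part of this method has the student match selected intermediate hidden states of the teacher, not only its final outputs. Here is a minimal sketch of such a hidden-state alignment loss; the toy tensors, the layer mapping, the normalisation, and the weight beta are illustrative assumptions rather than the paper's implementation.

```python
# Matching intermediate hidden states during distillation (a minimal sketch).
import torch
import torch.nn.functional as F

def patient_loss(student_states, teacher_states, layer_map, beta=100.0):
    """MSE between normalised student/teacher hidden states for chosen layers.

    layer_map: list of (student_layer_idx, teacher_layer_idx) pairs, e.g. a
    6-layer student matching every other layer of a 12-layer teacher.
    """
    loss = 0.0
    for s_idx, t_idx in layer_map:
        s = F.normalize(student_states[s_idx], dim=-1)
        t = F.normalize(teacher_states[t_idx], dim=-1)
        loss = loss + F.mse_loss(s, t)
    return beta * loss

# Toy per-layer hidden states, each of shape (batch, seq_len, hidden).
student_states = [torch.randn(2, 8, 48, requires_grad=True) for _ in range(6)]
teacher_states = [torch.randn(2, 8, 48) for _ in range(12)]
loss = patient_loss(student_states, teacher_states,
                    layer_map=[(i, 2 * i + 1) for i in range(6)])
loss.backward()
```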

SuperGLUE: A stickier benchmark for general-purpose language understanding systems

A Wang, Y Pruksachatkun, N Nangia… - Advances in neural …, 2019 - proceedings.neurips.cc
In the last year, new models and methods for pretraining and transfer learning have driven
striking performance improvements across a range of language understanding tasks. The …

Multi-task deep neural networks for natural language understanding

X Liu, P He, W Chen, J Gao - arXiv preprint arXiv:1901.11504, 2019 - arxiv.org
In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning
representations across multiple natural language understanding (NLU) tasks. MT-DNN not …
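The core idea is a single shared text encoder with one output head per NLU task, trained by cycling over batches drawn from different tasks. The sketch below uses a tiny stand-in encoder and two toy tasks as assumptions; the actual MT-DNN builds on a pre-trained BERT encoder.

```python
# Shared encoder with per-task heads, trained on mixed task batches (a sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU())     # stands in for BERT
heads = nn.ModuleDict({
    "sentiment": nn.Linear(256, 2),      # single-sentence classification
    "similarity": nn.Linear(256, 1),     # pairwise regression (e.g. STS-B style)
})
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()))

# Toy batches from two tasks (assumption); real training samples from many tasks.
batches = [("sentiment", torch.randn(8, 64), torch.randint(0, 2, (8,))),
           ("similarity", torch.randn(8, 64), torch.rand(8))]

for task, x, y in batches:               # mini multi-task training loop
    out = heads[task](encoder(x))
    loss = (F.cross_entropy(out, y) if task == "sentiment"
            else F.mse_loss(out.squeeze(-1), y))
    optimizer.zero_grad()
    loss.backward()                      # every task updates the shared encoder
    optimizer.step()
```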

[Book] Synthetic data for deep learning

SI Nikolenko - 2021 - Springer
You are holding in your hands… oh, come on, who holds books like this in their hands
anymore? Anyway, you are reading this, and it means that I have managed to release one of …