Pre-trained models for natural language processing: A survey
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …
Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks
Deep neural models have, in recent years, been successful in almost every field, solving
even the most complex problems. However, these models are huge in size with …
Multi-task learning with deep neural networks: A survey
M Crawshaw - arXiv preprint arXiv:2009.09796, 2020 - arxiv.org
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are
simultaneously learned by a shared model. Such approaches offer advantages like …
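For orientation only, the sketch below shows the hard-parameter-sharing setup this kind of survey typically covers: one shared encoder feeding several task-specific heads, trained on a summed per-task loss. It is a generic illustration, not an architecture from the survey; the layer sizes, task count, and unweighted loss sum are assumptions.

```python
# Minimal, generic sketch of hard parameter sharing for multi-task learning.
import torch
import torch.nn as nn

class SharedMTLModel(nn.Module):
    def __init__(self, in_dim, hidden_dim, task_out_dims):
        super().__init__()
        # Shared representation learned jointly by all tasks.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
        )
        # One lightweight head per task.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, out_dim) for out_dim in task_out_dims]
        )

    def forward(self, x):
        h = self.encoder(x)
        return [head(h) for head in self.heads]

# Joint objective: unweighted sum of per-task losses (many methods instead
# learn or tune these weights).
model = SharedMTLModel(in_dim=16, hidden_dim=32, task_out_dims=[3, 5])
x = torch.randn(8, 16)
targets = [torch.randint(0, 3, (8,)), torch.randint(0, 5, (8,))]
losses = [nn.functional.cross_entropy(out, t) for out, t in zip(model(x), targets)]
loss = sum(losses)
loss.backward()
```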
Knowledge distillation: A survey
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …
Towards understanding ensemble, knowledge distillation and self-distillation in deep learning
Z Allen-Zhu, Y Li - arXiv preprint arXiv:2012.09816, 2020 - arxiv.org
We formally study how an ensemble of deep learning models can improve test accuracy, and
how the superior performance of the ensemble can be distilled into a single model using …
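As a rough illustration of the distillation step this entry refers to (not the paper's formal analysis), the sketch below averages several teachers' softened predictions and trains a student against them with a standard temperature-scaled KL loss; the tensor shapes and the temperature value are arbitrary assumptions.

```python
# Illustrative sketch: distilling the averaged soft predictions of an
# ensemble of teachers into a single student model.
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list, temperature=2.0):
    # Average the teachers' softened probabilities to form the ensemble target.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    # KL divergence between the student's softened log-probs and the ensemble
    # target, scaled by T^2 as in standard knowledge distillation.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Usage with random logits standing in for real model outputs.
student = torch.randn(4, 10)
teachers = [torch.randn(4, 10) for _ in range(3)]
print(ensemble_distillation_loss(student, teachers))
```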
[PDF] Language models are few-shot learners
TB Brown - arXiv preprint arXiv:2005.14165, 2020 - splab.sdu.edu.cn
We demonstrate that scaling up language models greatly improves task-agnostic, few-shot
performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning …
Patient knowledge distillation for BERT model compression
Pre-trained language models such as BERT have proven to be highly effective for natural
language processing (NLP) tasks. However, the high demand for computing resources in …
SuperGLUE: A stickier benchmark for general-purpose language understanding systems
In the last year, new models and methods for pretraining and transfer learning have driven
striking performance improvements across a range of language understanding tasks. The …
Multi-task deep neural networks for natural language understanding
In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning
representations across multiple natural language understanding (NLU) tasks. MT-DNN not …
[BOOK] Synthetic data for deep learning
SI Nikolenko - 2021 - Springer
You are holding in your hands… oh, come on, who holds books like this in their hands
anymore? Anyway, you are reading this, and it means that I have managed to release one of …