Pre-trained models for natural language processing: A survey

X Qiu, T Sun, Y Xu, Y Shao, N Dai, X Huang - Science China …, 2020 - Springer
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …
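The survey covers the pre-train / fine-tune paradigm. As a point of reference only, here is a minimal sketch of that workflow using the Hugging Face `transformers` API; the model name, toy data, and hyperparameters are illustrative assumptions, not taken from the survey.

```python
# Minimal sketch of fine-tuning a pre-trained model on a downstream task.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # pre-trained encoder + new task head

texts = ["a great movie", "a dull movie"]            # toy labelled data (assumption)
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)              # loss computed internally
outputs.loss.backward()                              # fine-tune all weights
optimizer.step()
```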

Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE Transactions on Pattern Analysis and …, 2021 - ieeexplore.ieee.org
In recent years, deep neural models have been successful in almost every field, solving even
highly complex problems. However, these models are huge in size, with …

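The basic recipe these student-teacher reviews survey is response-based distillation: train a small student to match a large teacher's temperature-softened outputs alongside the usual supervised loss. A minimal PyTorch sketch follows; the toy networks, temperature T, and mixing weight alpha are illustrative assumptions.

```python
# Soft-target student-teacher distillation (a minimal sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # KL between temperature-softened distributions + ordinary cross-entropy.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

x = torch.randn(8, 32)                       # toy batch (assumption)
labels = torch.randint(0, 10, (8,))
with torch.no_grad():
    teacher_logits = teacher(x)              # teacher is frozen
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()                              # gradients flow only into the student
```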

Multi-task learning with deep neural networks: A survey

M Crawshaw - arXiv preprint arXiv:2009.09796, 2020 - arxiv.org
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are
simultaneously learned by a shared model. Such approaches offer advantages like …
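The most common shared-model pattern the survey discusses is hard parameter sharing: one shared trunk feeding a small head per task. Below is a minimal sketch; the layer sizes, task names, and toy losses are illustrative assumptions.

```python
# Hard parameter sharing for multi-task learning (a minimal sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    def __init__(self, in_dim=64, hidden=128, task_dims=None):
        super().__init__()
        task_dims = task_dims or {"taskA": 3, "taskB": 1}   # assumed toy tasks
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # shared
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, out) for name, out in task_dims.items()})

    def forward(self, x, task):
        return self.heads[task](self.trunk(x))              # task-specific output

net = MultiTaskNet()
x = torch.randn(4, 64)
loss = (F.cross_entropy(net(x, "taskA"), torch.randint(0, 3, (4,)))
        + F.mse_loss(net(x, "taskB").squeeze(-1), torch.randn(4)))
loss.backward()                      # both task losses update the shared trunk
```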

Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

Z Allen-Zhu, Y Li - arXiv preprint arXiv:2012.09816, 2020 - arxiv.org
We formally study how ensembles of deep learning models can improve test accuracy, and
how the superior performance of an ensemble can be distilled into a single model using …
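The setting analysed here can be sketched very simply: average the ensemble members' logits and train a single student to match the averaged, temperature-softened distribution. The stand-in linear models, temperature, and toy data below are illustrative assumptions, not the paper's construction.

```python
# Distilling an ensemble into a single model (a minimal sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

ensemble = [nn.Linear(16, 5) for _ in range(3)]     # stand-ins for trained members
student = nn.Linear(16, 5)

x = torch.randn(32, 16)
with torch.no_grad():
    avg_logits = torch.stack([m(x) for m in ensemble]).mean(dim=0)

T = 2.0                                             # assumed temperature
loss = F.kl_div(F.log_softmax(student(x) / T, dim=-1),
                F.softmax(avg_logits / T, dim=-1),
                reduction="batchmean") * (T * T)
loss.backward()                                     # only the student is updated
```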

Language models are few-shot learners

TB Brown - arXiv preprint arXiv:2005.14165, 2020 - splab.sdu.edu.cn
We demonstrate that scaling up language models greatly improves task-agnostic, few-shot
performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning …
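"Few-shot" here means in-context learning: a handful of labelled demonstrations are placed directly in the prompt and the model completes the next example with no gradient updates. The sketch below only builds such a prompt string; the demonstrations and label format are illustrative assumptions, not the paper's evaluation code.

```python
# Constructing a few-shot (in-context) prompt (a minimal sketch).
demonstrations = [
    ("The movie was wonderful.", "positive"),
    ("I wasted two hours.", "negative"),
]
query = "A genuinely moving film."

prompt = "\n".join(f"Review: {text}\nSentiment: {label}"
                   for text, label in demonstrations)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)   # this string would be sent verbatim to a large language model
```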

Patient knowledge distillation for BERT model compression

S Sun, Y Cheng, Z Gan, J Liu - arXiv preprint arXiv:1908.09355, 2019 - arxiv.org
Pre-trained language models such as BERT have proven to be highly effective for natural
language processing (NLP) tasks. However, the high demand for computing resources in …
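The "patient" part of this method has the student match selected intermediate hidden states of the teacher, not only its final outputs. Here is a minimal sketch of such a hidden-state alignment loss; the toy tensors, the layer mapping, the normalisation, and the weight beta are illustrative assumptions rather than the paper's implementation.

```python
# Matching intermediate hidden states during distillation (a minimal sketch).
import torch
import torch.nn.functional as F

def patient_loss(student_states, teacher_states, layer_map, beta=100.0):
    """MSE between normalised student/teacher hidden states for chosen layers.

    layer_map: list of (student_layer_idx, teacher_layer_idx) pairs, e.g. a
    6-layer student matching every other layer of a 12-layer teacher.
    """
    loss = 0.0
    for s_idx, t_idx in layer_map:
        s = F.normalize(student_states[s_idx], dim=-1)
        t = F.normalize(teacher_states[t_idx], dim=-1)
        loss = loss + F.mse_loss(s, t)
    return beta * loss

# Toy per-layer hidden states, each of shape (batch, seq_len, hidden).
student_states = [torch.randn(2, 8, 48, requires_grad=True) for _ in range(6)]
teacher_states = [torch.randn(2, 8, 48) for _ in range(12)]
loss = patient_loss(student_states, teacher_states,
                    layer_map=[(i, 2 * i + 1) for i in range(6)])
loss.backward()
```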

SuperGLUE: A stickier benchmark for general-purpose language understanding systems

A Wang, Y Pruksachatkun, N Nangia… - Advances in neural …, 2019 - proceedings.neurips.cc
In the last year, new models and methods for pretraining and transfer learning have driven
striking performance improvements across a range of language understanding tasks. The …

Multi-task deep neural networks for natural language understanding

X Liu, P He, W Chen, J Gao - arXiv preprint arXiv:1901.11504, 2019 - arxiv.org
In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning
representations across multiple natural language understanding (NLU) tasks. MT-DNN not …
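The core idea is a single shared text encoder with one output head per NLU task, trained by cycling over batches drawn from different tasks. The sketch below uses a tiny stand-in encoder and two toy tasks as assumptions; the actual MT-DNN builds on a pre-trained BERT encoder.

```python
# Shared encoder with per-task heads, trained on mixed task batches (a sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU())     # stands in for BERT
heads = nn.ModuleDict({
    "sentiment": nn.Linear(256, 2),      # single-sentence classification
    "similarity": nn.Linear(256, 1),     # pairwise regression (e.g. STS-B style)
})
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()))

# Toy batches from two tasks (assumption); real training samples from many tasks.
batches = [("sentiment", torch.randn(8, 64), torch.randint(0, 2, (8,))),
           ("similarity", torch.randn(8, 64), torch.rand(8))]

for task, x, y in batches:               # mini multi-task training loop
    out = heads[task](encoder(x))
    loss = (F.cross_entropy(out, y) if task == "sentiment"
            else F.mse_loss(out.squeeze(-1), y))
    optimizer.zero_grad()
    loss.backward()                      # every task updates the shared encoder
    optimizer.step()
```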

[Book] Synthetic data for deep learning

SI Nikolenko - 2021 - Springer
You are holding in your hands… oh, come on, who holds books like this in their hands
anymore? Anyway, you are reading this, and it means that I have managed to release one of …