Knowledge distillation: A survey
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …
Patient knowledge distillation for bert model compression
Pre-trained language models such as BERT have proven to be highly effective for natural
language processing (NLP) tasks. However, the high demand for computing resources in …
Knowledge distillation via instance relationship graph
The key challenge of knowledge distillation is to extract general, moderate and sufficient
knowledge from a teacher network to guide a student network. In this paper, a novel …
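To make the teacher-student objective concrete, the sketch below shows the standard logit-matching distillation loss (temperature-softened KL term plus ordinary cross-entropy) that such methods build on. It is a generic illustration, not the instance-relationship-graph loss of the paper above, and the temperature and mixing weight are placeholder values.

```python
# Minimal sketch of logit-based knowledge distillation (Hinton-style).
# T (temperature) and alpha (soft/hard mixing weight) are illustrative.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```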
Recent progresses in deep learning based acoustic models
In this paper, we summarize recent progresses made in deep learning based acoustic
models and the motivation and insights behind the surveyed techniques. We first discuss …
Towards model compression for deep learning based speech enhancement
The use of deep neural networks (DNNs) has dramatically elevated the performance of
speech enhancement over the last decade. However, to achieve strong enhancement …
Large-scale domain adaptation via teacher-student learning
High accuracy speech recognition requires a large amount of transcribed data for
supervised training. In the absence of such data, domain adaptation of a well-trained …
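The sketch below illustrates the general teacher-student adaptation recipe, assuming parallel source-domain (e.g., clean) and target-domain (e.g., noisy) features and no transcriptions: the student is trained to match the teacher's frame-level posteriors. Model and variable names are illustrative, not taken from the paper.

```python
# Sketch of teacher-student domain adaptation: the teacher scores
# source-domain features and the student learns to reproduce those
# posteriors from the parallel target-domain features, with no labels.
import torch
import torch.nn.functional as F

def ts_adaptation_step(teacher, student, src_feats, tgt_feats, optimizer):
    teacher.eval()
    with torch.no_grad():
        # Teacher posteriors on source-domain features serve as soft labels.
        teacher_post = F.softmax(teacher(src_feats), dim=-1)
    student_logp = F.log_softmax(student(tgt_feats), dim=-1)
    # Minimize KL(teacher || student) over frames.
    loss = F.kl_div(student_logp, teacher_post, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```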
Conditional teacher-student learning
The teacher-student (T/S) learning has been shown to be effective for a variety of problems
such as domain adaptation and model compression. One shortcoming of the T/S learning is …
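A hedged sketch of the conditional idea follows: per example, the student imitates the teacher's soft posteriors only when the teacher predicts the correct label, and otherwise falls back to the ground truth, so the student does not inherit the teacher's mistakes. Names and shapes are assumptions for illustration.

```python
# Sketch of a conditional teacher-student loss: distill from the teacher
# only on examples where the teacher is correct; use the hard label elsewhere.
import torch
import torch.nn.functional as F

def conditional_ts_loss(student_logits, teacher_logits, labels):
    teacher_prob = F.softmax(teacher_logits, dim=-1)
    student_logp = F.log_softmax(student_logits, dim=-1)
    # Boolean mask (B,) selecting examples where the teacher is correct.
    teacher_correct = teacher_prob.argmax(dim=-1).eq(labels)
    # Per-example soft loss from the teacher and hard loss from the labels.
    soft = F.kl_div(student_logp, teacher_prob, reduction="none").sum(-1)
    hard = F.cross_entropy(student_logits, labels, reduction="none")
    loss = torch.where(teacher_correct, soft, hard)
    return loss.mean()
```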
Lessons from building acoustic models with a million hours of speech
SHK Parthasarathi, N Strom - ICASSP, 2019 - ieeexplore.ieee.org
This is a report of our lessons learned building acoustic models from 1 Million hours of
unlabeled speech, while labeled speech is restricted to 7,000 hours. We employ …
Efficient knowledge distillation for rnn-transducer models
Knowledge Distillation is an effective method of transferring knowledge from a large model
to a smaller model. Distillation can be viewed as a type of model compression, and has …
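The sketch below shows a plain lattice-level distillation loss for an RNN-transducer, assuming teacher and student joint-network outputs are available on the same (batch, time, label, vocabulary) grid; the memory-saving reduction of the output distribution proposed in the paper above is not reproduced here.

```python
# Sketch of lattice-level RNN-T distillation: KL between teacher and student
# joint-network distributions, averaged over every (time, label) lattice node.
import torch
import torch.nn.functional as F

def rnnt_kd_loss(student_joint_logits, teacher_joint_logits, T=1.0):
    # Inputs are assumed to have shape (B, T_frames, U_labels, V).
    student_logp = F.log_softmax(student_joint_logits / T, dim=-1)
    teacher_prob = F.softmax(teacher_joint_logits / T, dim=-1)
    kl = F.kl_div(student_logp, teacher_prob, reduction="none").sum(-1)
    return (T * T) * kl.mean()
```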