Knowledge distillation: A survey
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …
Patient knowledge distillation for bert model compression
Pre-trained language models such as BERT have proven to be highly effective for natural
language processing (NLP) tasks. However, the high demand for computing resources in …
Knowledge distillation via instance relationship graph
The key challenge of knowledge distillation is to extract general, moderate and sufficient
knowledge from a teacher network to guide a student network. In this paper, a novel …
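To make the teacher-student objective concrete, the sketch below shows the standard logit-matching distillation loss (temperature-softened KL term plus ordinary cross-entropy) that such methods build on. It is a generic illustration, not the instance-relationship-graph loss of the paper above, and the temperature and mixing weight are placeholder values.

```python
# Minimal sketch of logit-based knowledge distillation (Hinton-style).
# T (temperature) and alpha (soft/hard mixing weight) are illustrative.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```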
Recent progresses in deep learning based acoustic models
In this paper, we summarize recent progresses made in deep learning based acoustic
models and the motivation and insights behind the surveyed techniques. We first discuss …
Towards model compression for deep learning based speech enhancement
The use of deep neural networks (DNNs) has dramatically elevated the performance of
speech enhancement over the last decade. However, to achieve strong enhancement …
Large-scale domain adaptation via teacher-student learning
High accuracy speech recognition requires a large amount of transcribed data for
supervised training. In the absence of such data, domain adaptation of a well-trained …
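The sketch below illustrates the general teacher-student adaptation recipe, assuming parallel source-domain (e.g., clean) and target-domain (e.g., noisy) features and no transcriptions: the student is trained to match the teacher's frame-level posteriors. Model and variable names are illustrative, not taken from the paper.

```python
# Sketch of teacher-student domain adaptation: the teacher scores
# source-domain features and the student learns to reproduce those
# posteriors from the parallel target-domain features, with no labels.
import torch
import torch.nn.functional as F

def ts_adaptation_step(teacher, student, src_feats, tgt_feats, optimizer):
    teacher.eval()
    with torch.no_grad():
        # Teacher posteriors on source-domain features serve as soft labels.
        teacher_post = F.softmax(teacher(src_feats), dim=-1)
    student_logp = F.log_softmax(student(tgt_feats), dim=-1)
    # Minimize KL(teacher || student) over frames.
    loss = F.kl_div(student_logp, teacher_post, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```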
Conditional teacher-student learning
The teacher-student (T/S) learning has been shown to be effective for a variety of problems
such as domain adaptation and model compression. One shortcoming of the T/S learning is …
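A hedged sketch of the conditional idea follows: per example, the student imitates the teacher's soft posteriors only when the teacher predicts the correct label, and otherwise falls back to the ground truth, so the student does not inherit the teacher's mistakes. Names and shapes are assumptions for illustration.

```python
# Sketch of a conditional teacher-student loss: distill from the teacher
# only on examples where the teacher is correct; use the hard label elsewhere.
import torch
import torch.nn.functional as F

def conditional_ts_loss(student_logits, teacher_logits, labels):
    teacher_prob = F.softmax(teacher_logits, dim=-1)
    student_logp = F.log_softmax(student_logits, dim=-1)
    # Boolean mask (B,) selecting examples where the teacher is correct.
    teacher_correct = teacher_prob.argmax(dim=-1).eq(labels)
    # Per-example soft loss from the teacher and hard loss from the labels.
    soft = F.kl_div(student_logp, teacher_prob, reduction="none").sum(-1)
    hard = F.cross_entropy(student_logits, labels, reduction="none")
    loss = torch.where(teacher_correct, soft, hard)
    return loss.mean()
```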
Lessons from building acoustic models with a million hours of speech
SHK Parthasarathi, N Strom - ICASSP, 2019 - ieeexplore.ieee.org
This is a report of our lessons learned building acoustic models from 1 Million hours of
unlabeled speech, while labeled speech is restricted to 7,000 hours. We employ …
Efficient knowledge distillation for rnn-transducer models
Knowledge Distillation is an effective method of transferring knowledge from a large model
to a smaller model. Distillation can be viewed as a type of model compression, and has …
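The sketch below shows a plain lattice-level distillation loss for an RNN-transducer, assuming teacher and student joint-network outputs are available on the same (batch, time, label, vocabulary) grid; the memory-saving reduction of the output distribution proposed in the paper above is not reproduced here.

```python
# Sketch of lattice-level RNN-T distillation: KL between teacher and student
# joint-network distributions, averaged over every (time, label) lattice node.
import torch
import torch.nn.functional as F

def rnnt_kd_loss(student_joint_logits, teacher_joint_logits, T=1.0):
    # Inputs are assumed to have shape (B, T_frames, U_labels, V).
    student_logp = F.log_softmax(student_joint_logits / T, dim=-1)
    teacher_prob = F.softmax(teacher_joint_logits / T, dim=-1)
    kl = F.kl_div(student_logp, teacher_prob, reduction="none").sum(-1)
    return (T * T) * kl.mean()
```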