Deep clustering: Discriminative embeddings for segmentation and separation JR Hershey, Z Chen, J Le Roux, S Watanabe 2016 IEEE International Conference on Acoustics, Speech and Signal …, 2016 | 1494 | 2016 |
Approximating the Kullback Leibler divergence between Gaussian mixture models JR Hershey, PA Olsen 2007 IEEE International Conference on Acoustics, Speech and Signal …, 2007 | 1375 | 2007 |
SDR–half-baked or well done? J Le Roux, S Wisdom, H Erdogan, JR Hershey ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 1105 | 2019 |
Hybrid CTC/attention architecture for end-to-end speech recognition S Watanabe, T Hori, S Kim, JR Hershey, T Hayashi IEEE Journal of Selected Topics in Signal Processing 11 (8), 1240-1253, 2017 | 862 | 2017 |
Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks H Erdogan, JR Hershey, S Watanabe, J Le Roux 2015 IEEE International Conference on Acoustics, Speech and Signal …, 2015 | 742 | 2015 |
Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR F Weninger, H Erdogan, S Watanabe, E Vincent, J Le Roux, JR Hershey, ... Latent Variable Analysis and Signal Separation: 12th International …, 2015 | 675 | 2015 |
Deep unfolding: Model-based inspiration of novel deep architectures JR Hershey, J Le Roux, F Weninger arXiv preprint arXiv:1409.2574, 2014 | 492 | 2014 |
Single-channel multi-speaker separation using deep clustering Y Isik, J Le Roux, Z Chen, S Watanabe, JR Hershey arXiv preprint arXiv:1607.02173, 2016 | 478 | 2016 |
Attention-based multimodal fusion for video description C Hori, T Hori, TY Lee, Z Zhang, B Harsham, JR Hershey, TK Marks, ... Proceedings of the IEEE International Conference on Computer Vision, 4193-4202, 2017 | 410 | 2017 |
VoiceFilter: Targeted voice separation by speaker-conditioned spectrogram masking Q Wang, H Muckenhirn, K Wilson, P Sridhar, Z Wu, J Hershey, ... arXiv preprint arXiv:1810.04826, 2018 | 399 | 2018 |
Audio vision: Using audio-visual synchrony to locate sounds J Hershey, J Movellan Advances in neural information processing systems 12, 1999 | 374 | 1999 |
Improved MVDR beamforming using single-channel mask prediction networks H Erdogan, JR Hershey, S Watanabe, MI Mandel, J Le Roux Interspeech, 1981-1985, 2016 | 354 | 2016 |
Discriminatively trained recurrent neural networks for single-channel speech separation F Weninger, JR Hershey, J Le Roux, B Schuller 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP …, 2014 | 354 | 2014 |
Full-capacity unitary recurrent neural networks S Wisdom, T Powers, J Hershey, J Le Roux, L Atlas Advances in Neural Information Processing Systems, 4880-4888, 2016 | 344 | 2016 |
Multi-channel deep clustering: Discriminative spectral and spatial embeddings for speaker-independent speech separation ZQ Wang, J Le Roux, JR Hershey 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 258 | 2018 |
Monaural speech separation and recognition challenge M Cooke, JR Hershey, SJ Rennie Computer Speech & Language 24 (1), 1-15, 2010 | 247 | 2010 |
Deep beamforming networks for multi-channel speech recognition X Xiao, S Watanabe, H Erdogan, L Lu, J Hershey, ML Seltzer, G Chen, ... 2016 IEEE International Conference on Acoustics, Speech and Signal …, 2016 | 215 | 2016 |
Alternative objective functions for deep clustering ZQ Wang, J Le Roux, JR Hershey 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 213 | 2018 |
Super-human multi-talker speech recognition: A graphical modeling approach JR Hershey, SJ Rennie, PA Olsen, TT Kristjansson Computer Speech & Language 24 (1), 45-66, 2010 | 212 | 2010 |
Universal sound separation I Kavalerov, S Wisdom, H Erdogan, B Patton, K Wilson, J Le Roux, ... 2019 IEEE Workshop on Applications of Signal Processing to Audio and …, 2019 | 204 | 2019 |