English Conversational Telephone Speech Recognition by Humans and Machines G Saon, G Kurata, T Sercu, K Audhkhasi, S Thomas, D Dimitriadis, X Cui, ... arXiv preprint arXiv:1703.02136, 2017 | 477 | 2017 |
The subspace Gaussian mixture model—A structured model for speech recognition D Povey, L Burget, M Agarwal, P Akyazi, F Kai, A Ghoshal, O Glembek, ... Computer Speech & Language 25 (2), 404-439, 2011 | 391 | 2011 |
Efficient Knowledge Distillation from an Ensemble of Teachers. T Fukuda, M Suzuki, G Kurata, S Thomas, J Cui, B Ramabhadran Interspeech, 3697-3701, 2017 | 250 | 2017 |
Subspace Gaussian mixture models for speech recognition D Povey, L Burget, M Agarwal, P Akyazi, K Feng, A Ghoshal, NK Goel, ... 2010 IEEE International Conference on Acoustics, Speech and Signal …, 2010 | 248 | 2010 |
Multilingual acoustic modeling for speech recognition based on subspace Gaussian mixture models L Burget, P Schwarz, M Agarwal, P Akyazi, K Feng, A Ghoshal, N Goel, ... 2010 IEEE International Conference on Acoustics, Speech and Signal …, 2010 | 210 | 2010 |
Deep neural network features and semi-supervised training for low resource speech recognition S Thomas, ML Seltzer, K Church, H Hermansky 2013 IEEE International Conference on Acoustics, Speech and Signal …, 2013 | 176 | 2013 |
AVLnet: Learning audio-visual language representations from instructional videos A Rouditchenko, A Boggust, D Harwath, B Chen, D Joshi, S Thomas, ... arXiv preprint arXiv:2006.09199, 2020 | 144 | 2020 |
Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions S Thomas, S Ganapathy, G Saon, H Soltau ICASSP, 2014 | 144 | 2014 |
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval N Shvetsova, B Chen, A Rouditchenko, S Thomas, B Kingsbury, RS Feris, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 142 | 2022 |
Multilingual MLP features for low-resource LVCSR systems S Thomas, S Ganapathy, H Hermansky | 131 | 2012 |
A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition. A Jansen, E Dupoux, S Goldwater, M Johnson, S Khudanpur, K Church, ... ICASSP, 8111-8115, 2013 | 120 | 2013 |
Rapid evaluation of speech representations for spoken term discovery MA Carlin, S Thomas, A Jansen, H Hermansky Twelfth Annual Conference of the International Speech Communication Association, 2011 | 116 | 2011 |
Recognition of reverberant speech using frequency domain linear prediction S Thomas, S Ganapathy, H Hermansky IEEE Signal Processing Letters 15, 681-684, 2008 | 112 | 2008 |
Cross-lingual and multi-stream posterior features for low resource LVCSR systems. S Thomas, S Ganapathy, H Hermansky Interspeech, 877-880, 2010 | 90 | 2010 |
Joint modeling of accents and acoustics for multi-accent speech recognition X Yang, K Audhkhasi, A Rosenberg, S Thomas, B Ramabhadran, ... 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 87 | 2018 |
Multimodal clustering networks for self-supervised learning from unlabeled videos B Chen, A Rouditchenko, K Duarte, H Kuehne, S Thomas, A Boggust, ... Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 85 | 2021 |
Annealed dropout training of deep networks SJ Rennie, V Goel, S Thomas 2014 IEEE Spoken Language Technology Workshop (SLT), 159-164, 2014 | 83 | 2014 |
Invariant Representations for Noisy Speech Recognition D Serdyuk, K Audhkhasi, P Brakel, B Ramabhadran, S Thomas, Y Bengio arXiv preprint arXiv:1612.01928, 2016 | 81 | 2016 |
Weak top-down constraints for unsupervised acoustic model training A Jansen, S Thomas, H Hermansky 2013 IEEE International Conference on Acoustics, Speech and Signal …, 2013 | 71 | 2013 |
Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems Y Huang, HK Kuo, S Thomas, Z Kons, K Audhkhasi, B Kingsbury, R Hoory, ... ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 69 | 2020 |