HuBERT: Self-supervised speech representation learning by masked prediction of hidden units WN Hsu, B Bolte, YHH Tsai, K Lakhotia, R Salakhutdinov, A Mohamed IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 3451-3460, 2021 | 2195 | 2021 |
data2vec: A general framework for self-supervised learning in speech, vision and language A Baevski, WN Hsu, Q Xu, A Babu, J Gu, M Auli International Conference on Machine Learning, 1298-1312, 2022 | 725 | 2022 |
An unsupervised autoregressive model for speech representation learning YA Chung, WN Hsu, H Tang, J Glass INTERSPEECH, 2019 | 442 | 2019 |
Unsupervised learning of disentangled and interpretable representations from sequential data WN Hsu, Y Zhang, J Glass Thirty-first Conference on Neural Information Processing Systems (NeurIPS), 2017 | 405 | 2017 |
Hierarchical generative modeling for controllable speech synthesis WN Hsu, Y Zhang, RJ Weiss, H Zen, Y Wu, Y Wang, Y Cao, Y Jia, Z Chen, ... Seventh International Conference on Learning Representations (ICLR), 2019 | 296 | 2019 |
Unsupervised speech recognition A Baevski, WN Hsu, A Conneau, M Auli Advances in Neural Information Processing Systems 34, 27826-27839, 2021 | 279 | 2021 |
On generative spoken language modeling from raw audio K Lakhotia, E Kharitonov, WN Hsu, Y Adi, A Polyak, B Bolte, TA Nguyen, ... Transactions of the Association for Computational Linguistics 9, 1336-1354, 2021 | 270 | 2021 |
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations A Polyak, Y Adi, J Copet, E Kharitonov, K Lakhotia, WN Hsu, A Mohamed, ... INTERSPEECH, 2021 | 243 | 2021 |
Learning audio-visual speech representation by masked multimodal cluster prediction B Shi, WN Hsu, K Lakhotia, A Mohamed arXiv preprint arXiv:2201.02184, 2022 | 233 | 2022 |
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training WN Hsu, A Sriram, A Baevski, T Likhomanenko, Q Xu, V Pratap, J Kahn, ... INTERSPEECH, 2021 | 226 | 2021 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 202 | 2019 |
Active learning by learning WN Hsu, HT Lin Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015 | 194 | 2015 |
Learning Latent Representations for Speech Generation and Transformation WN Hsu, Y Zhang, J Glass INTERSPEECH, 1273-1277, 2017 | 182 | 2017 |
Scaling speech technology to 1,000+ languages V Pratap, A Tjandra, B Shi, P Tomasello, A Babu, S Kundu, A Elkahky, ... Journal of Machine Learning Research 25 (97), 1-52, 2024 | 164 | 2024 |
Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation WN Hsu, Y Zhang, J Glass 2017 IEEE automatic speech recognition and understanding workshop (ASRU), 16-23, 2017 | 164 | 2017 |
Semi-supervised training for improving data efficiency in end-to-end speech synthesis YA Chung, Y Wang, WN Hsu, Y Zhang, RJ Skerry-Ryan ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 138 | 2019 |
Direct speech-to-speech translation with discrete units A Lee, PJ Chen, C Wang, J Gu, S Popuri, X Ma, A Polyak, Y Adi, Q He, ... arXiv preprint arXiv:2107.05604, 2021 | 135 | 2021 |
Disentangling correlated speaker and noise for speech synthesis via data augmentation and adversarial factorization WN Hsu, Y Zhang, RJ Weiss, YA Chung, Y Wang, Y Wu, J Glass ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 124 | 2019 |
Textless speech-to-speech translation on real data A Lee, H Gong, PA Duquenne, H Schwenk, PJ Chen, C Wang, S Popuri, ... arXiv preprint arXiv:2112.08352, 2021 | 114 | 2021 |
Voicebox: Text-guided multilingual universal speech generation at scale M Le, A Vyas, B Shi, B Karrer, L Sari, R Moritz, M Williamson, V Manohar, ... Advances in neural information processing systems 36, 2024 | 113 | 2024 |