Title | Authors | Venue | Cited by | Year
WavLM: Large-scale self-supervised pre-training for full stack speech processing | S Chen, C Wang, Z Chen, Y Wu, S Liu, Z Chen, J Li, N Kanda, T Yoshioka, ... | IEEE Journal of Selected Topics in Signal Processing 16 (6), 1505-1518, 2022 | 1147 | 2022
Neural codec language models are zero-shot text to speech synthesizers | C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu, Z Chen, Y Liu, H Wang, ... | arXiv preprint arXiv:2301.02111, 2023 | 348 | 2023
SpeechT5: Unified-modal encoder-decoder pre-training for spoken language processing | J Ao, R Wang, L Zhou, C Wang, S Ren, Y Wu, S Liu, T Ko, Q Li, Y Zhang, ... | arXiv preprint arXiv:2110.07205, 2021 | 172 | 2021
On the comparison of popular end-to-end models for large scale speech recognition | J Li, Y Wu, Y Gaur, C Wang, R Zhao, S Liu | InterSpeech 2020, 2020 | 146 | 2020
Continuous speech separation with Conformer | S Chen, Y Wu, Z Chen, J Wu, J Li, T Yoshioka, C Wang, S Liu, M Zhou | ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 130 | 2021
BEATs: Audio pre-training with acoustic tokenizers | S Chen, Y Wu, C Wang, S Liu, D Tompkins, Z Chen, F Wei | arXiv preprint arXiv:2212.09058, 2022 | 126 | 2022
UniSpeech: Unified speech representation learning with labeled and unlabeled data | C Wang, Y Wu, Y Qian, K Kumatani, S Liu, F Wei, M Zeng, X Huang | ICML 2021, 2021 | 116 | 2021
Large-scale self-supervised speech representation learning for automatic speaker verification | Z Chen, S Chen, Y Wu, Y Qian, C Wang, S Liu, Y Qian, M Zeng | ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 101 | 2022
Curriculum pre-training for end-to-end speech translation | C Wang, Y Wu, S Liu, M Zhou, Z Yang | Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020 | 99 | 2020
Bridging the gap between pre-training and fine-tuning for end-to-end speech translation | C Wang, Y Wu, S Liu, Z Yang, M Zhou | Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 9161-9168, 2020 | 84 | 2020
Speak foreign languages with your own voice: Cross-lingual neural codec language modeling | Z Zhang, L Zhou, C Wang, S Chen, Y Wu, S Liu, Z Chen, Y Liu, H Wang, ... | arXiv preprint arXiv:2303.03926, 2023 | 83 | 2023
UniSpeech-SAT: Universal speech representation learning with speaker aware pre-training | S Chen, Y Wu, C Wang, Z Chen, Z Chen, S Liu, J Wu, Y Qian, F Wei, J Li, ... | ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 69 | 2022
Wav2vec-Switch: Contrastive learning from original-noisy speech pairs for robust speech recognition | Y Wang, J Li, H Wang, Y Qian, C Wang, Y Wu | ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 58 | 2022
Low latency end-to-end streaming speech recognition with a scout network | C Wang, Y Wu, L Lu, S Liu, J Li, G Ye, M Zhou | InterSpeech 2020, 2020 | 58 | 2020
Semantic mask for transformer based end-to-end speech recognition | C Wang, Y Wu, Y Du, J Li, S Liu, L Lu, S Ren, G Ye, S Zhao, M Zhou | InterSpeech 2020, 2019 | 48 | 2019
Why does self-supervised learning for speech recognition benefit speaker recognition? | S Chen, Y Wu, C Wang, S Liu, Z Chen, P Wang, G Liu, J Li, J Wu, X Yu, ... | arXiv preprint arXiv:2204.12765, 2022 | 33 | 2022
Improving noise robustness of contrastive speech representation learning with speech reconstruction | H Wang, Y Qian, X Wang, Y Wang, C Wang, S Liu, T Yoshioka, J Li, ... | ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 28 | 2022
Supervision-guided codebooks for masked prediction in speech pre-training | C Wang, Y Wang, Y Wu, S Chen, J Li, S Liu, F Wei | arXiv preprint arXiv:2206.10125, 2022 | 18 | 2022
Source dependency-aware transformer with supervised self-attention | C Wang, S Wu, S Liu | arXiv preprint arXiv:1909.02273, 2019 | 13 | 2019
UniSpeech at scale: An empirical study of pre-training method on large-scale speech recognition dataset | C Wang, Y Wu, S Liu, J Li, Y Qian, K Kumatani, F Wei | arXiv preprint arXiv:2107.05233, 2021 | 11 | 2021