LauraGPT: Listen, attend, understand, and regenerate audio with GPT J Wang, Z Du, Q Chen, Y Chu, Z Gao, Z Li, K Hu, X Zhou, J Xu, Z Ma, ... | 30* | 2023 |
emotion2vec: Self-supervised pre-training for speech emotion representation Z Ma, Z Zheng, J Ye, J Li, Z Gao, S Zhang, X Chen Proc. ACL 2024, 2023 | 19 | 2023 |
MT4SSL: Boosting self-supervised speech representation learning by integrating multiple targets Z Ma, Z Zheng, C Tang, Y Wang, X Chen Proc. Interspeech 2023, 2022 | 18 | 2022 |
ChatMusician: Understanding and generating music intrinsically with LLM R Yuan, H Lin, Y Wang, Z Tian, S Wu, T Shen, G Zhang, Y Wu, C Liu, ... Proc. ACL 2024, 2024 | 15 | 2024 |
Hierarchical deep residual reasoning for temporal moment localization Z Ma, X Han, X Song, Y Cui, L Nie Proceedings of the 3rd ACM International Conference on Multimedia in Asia, 1-7, 2021 | 11 | 2021 |
VoiceFlow: Efficient text-to-speech with rectified flow matching Y Guo, C Du, Z Ma, X Chen, K Yu ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 10* | 2024 |
Towards universal speech discrete tokens: A case study for ASR and TTS Y Yang, F Shen, C Du, Z Ma, K Yu, D Povey, X Chen ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 10 | 2024 |
ELLA-V: Stable neural codec language modeling with alignment-guided sequence reordering Y Song, Z Chen, X Wang, Z Ma, X Chen arXiv preprint arXiv:2401.07333, 2024 | 10 | 2024 |
MAP-Neo: Highly capable and transparent bilingual large language model series G Zhang, S Qu, J Liu, C Zhang, C Lin, CL Yu, D Pan, E Cheng, J Liu, ... arXiv preprint arXiv:2405.19327, 2024 | 7 | 2024 |
Chinese Tiny LLM: Pretraining a Chinese-centric large language model X Du, Z Yu, S Gao, D Pan, Y Cheng, Z Ma, R Yuan, X Qu, J Liu, T Zheng, ... Proc. 1st COLM Conference, 2024 | 7 | 2024 |
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity Z Ma, G Yang, Y Yang, Z Gao, J Wang, Z Du, F Yu, Q Chen, S Zheng, ... arXiv preprint arXiv:2402.08846, 2024 | 7 | 2024 |
Leveraging speech PTM, text LLM, and emotional TTS for speech emotion recognition Z Ma, W Wu, Z Zheng, Y Guo, Q Chen, S Zhang, X Chen ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 6 | 2024 |
EAT: Self-supervised pre-training with efficient audio transformer W Chen, Y Liang, Z Ma, Z Zheng, X Chen arXiv preprint arXiv:2401.03497, 2024 | 6 | 2024 |
Improving few-shot learning for talking face system with TTS data augmentation Q Chen, Z Ma, T Liu, X Tan, Q Lu, K Yu, X Chen ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 5 | 2023 |
Towards Weakly Supervised Text-to-Audio Grounding X Xu, Z Ma, M Wu, K Yu arXiv preprint arXiv:2401.02584, 2024 | 4 | 2024 |
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning G Yang, Z Ma, Z Zheng, Y Song, Z Niu, X Chen 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-7, 2023 | 4 | 2023 |
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition Z Zheng, Z Ma, Y Wang, X Chen Proc. Interspeech 2023, 2023 | 4 | 2023 |
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation Z Ma, Z Zheng, G Yang, Y Wang, C Zhang, X Chen Proc. Interspeech 2023, 2023 | 4 | 2023 |
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation Z Liang, Z Song, Z Ma, C Du, K Yu, X Chen Proc. Interspeech 2023, 2023 | 4 | 2023 |
TESSP: Text-enhanced self-supervised speech pre-training Z Yao, S Ren, S Chen, Z Ma, P Guo, L Xie arXiv preprint arXiv:2211.13443, 2022 | 4 | 2022 |