| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Qwen-Audio: Advancing universal audio understanding via unified large-scale audio-language models | Y Chu, J Xu, X Zhou, Q Yang, S Zhang, Z Yan, C Zhou, J Zhou | arXiv preprint arXiv:2311.07919, 2023 | 83 | 2023 |
| Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias | Z Jiang, Y Ren, Z Ye, J Liu, C Zhang, Q Yang, S Ji, R Huang, C Wang, ... | arXiv preprint arXiv:2306.03509, 2023 | 44 | 2023 |
| Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis | Z Jiang, J Liu, Y Ren, J He, Z Ye, S Ji, Q Yang, C Zhang, P Wei, C Wang, ... | The Twelfth International Conference on Learning Representations, 2024 | 6 | 2024 |
| FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models | Z Jiang, Q Yang, J Zuo, Z Ye, R Huang, Y Ren, Z Zhao | arXiv preprint arXiv:2305.13612, 2023 | 6 | 2023 |
| Qwen2-Audio technical report | Y Chu, J Xu, Q Yang, H Wei, X Wei, Z Guo, Y Leng, Y Lv, J He, J Lin, ... | arXiv preprint arXiv:2407.10759, 2024 | 5 | 2024 |
| Boosting prompting mechanisms for zero-shot speech synthesis | Z Jiang, J Liu, Y Ren, J He, Z Ye, S Ji, Q Yang, C Zhang, P Wei, C Wang, ... | The Twelfth International Conference on Learning Representations, 2023 | 5 | 2023 |
| AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Q Yang, J Xu, W Liu, Y Chu, Z Jiang, X Zhou, Y Leng, Y Lv, Z Zhao, ... | arXiv preprint arXiv:2402.07729, 2024 | 4 | 2024 |
| Dict-TTS: Learning to pronounce with prior dictionary knowledge for text-to-speech | Z Jiang, Z Su, Z Zhao, Q Yang, Y Ren, J Liu | Advances in Neural Information Processing Systems 35, 11960-11974, 2022 | 4 | 2022 |
| MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis | Q Yang, J Zuo, Z Su, Z Jiang, M Li, Z Zhao, F Chen, Z Wang, B Huai | arXiv preprint arXiv:2407.14006, 2024 | | 2024 |