AudioLDM 2: Learning holistic audio generation with self-supervised pretraining H Liu, Y Yuan, X Liu, X Mei, Q Kong, Q Tian, Y Wang, W Wang, Y Wang, ... IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024 | 84 | 2024 |
Efficient neural music generation MWY Lam, Q Tian, T Li, Z Yin, S Feng, M Tu, Y Ji, R Xia, M Ma, X Song, ... Advances in Neural Information Processing Systems 36, 2024 | 38 | 2024 |
Neural Dubber: Dubbing for videos according to scripts C Hu, Q Tian, T Li, Y Wang, Y Wang, H Zhao Advances in Neural Information Processing Systems 34, 16582-16595, 2021 | 31 | 2021 |
LM-VC: Zero-shot voice conversion via speech generation based on language models Z Wang, Y Chen, L Xie, Q Tian, Y Wang IEEE Signal Processing Letters, 2023 | 23 | 2023 |
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models P Anastassiou, J Chen, J Chen, Y Chen, Z Chen, Z Chen, J Cong, L Deng, ... arXiv preprint arXiv:2406.02430, 2024 | 21 | 2024 |
NeuFA: Neural network based end-to-end forced alignment with bidirectional attention mechanism J Li, Y Meng, Z Wu, H Meng, Q Tian, Y Wang, Y Wang ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 20 | 2022 |
PolyVoice: Language models for speech to speech translation Q Dong, Z Huang, Q Tian, C Xu, T Ko, Y Zhao, S Feng, T Li, K Wang, ... arXiv preprint arXiv:2306.02982, 2023 | 19 | 2023 |
Controllable and lossless non-autoregressive end-to-end text-to-speech Z Liu, Q Tian, C Hu, X Liu, M Wu, Y Wang, H Zhao, Y Wang arXiv preprint arXiv:2207.06088, 2022 | 13 | 2022 |
Inferring speaking styles from multi-modal conversational context by multi-scale relational graph convolutional networks J Li, Y Meng, X Wu, Z Wu, J Jia, H Meng, Q Tian, Y Wang, Y Wang Proceedings of the 30th ACM International Conference on Multimedia, 5811-5820, 2022 | 12 | 2022 |
Cloning one’s voice using very limited data in the wild D Dai, Y Chen, L Chen, M Tu, L Liu, R Xia, Q Tian, Y Wang, Y Wang ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 10 | 2022 |
DiCLET-TTS: Diffusion model based cross-lingual emotion transfer for text-to-speech—A study between English and Mandarin T Li, C Hu, J Cong, X Zhu, J Li, Q Tian, Y Wang, L Xie IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023 | 8 | 2023 |
Streaming voice conversion via intermediate bottleneck features and non-streaming teacher guidance Y Chen, M Tu, T Li, X Li, Q Kong, J Li, Z Wang, Q Tian, Y Wang, Y Wang ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 7 | 2023 |
Seed-ASR: Understanding diverse speech and contexts with LLM-based speech recognition Y Bai, J Chen, J Chen, W Chen, Z Chen, C Ding, L Dong, Q Dong, Y Du, ... arXiv preprint arXiv:2407.04675, 2024 | 4 | 2024 |
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing J Li, S Li, P Chen, L Zhang, Y Meng, Z Wu, H Meng, Q Tian, Y Wang, ... IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 517-528, 2023 | 3 | 2023 |
MSM-VC: high-fidelity source style transfer for non-parallel voice conversion by multi-scale style modeling Z Wang, X Wang, Q Xie, T Li, L Xie, Q Tian, Y Wang IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023 | 3 | 2023 |
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion Z Wang, Y Chen, X Wang, Z Chen, L Xie, Y Wang, Y Wang arXiv preprint arXiv:2401.11053, 2024 | 2 | 2024 |
Delivering speaking style in low-resource voice conversion with multi-factor constraints Z Wang, X Wang, L Xie, Y Chen, Q Tian, Y Wang ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 2 | 2023 |
Multi-level temporal-channel speaker retrieval for robust zero-shot voice conversion Z Wang, L Xue, Q Kong, L Xie, Y Chen, Q Tian, Y Wang arXiv preprint arXiv:2305.07204, 2023 | 2 | 2023 |
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning T Li, Z Wang, X Zhu, J Cong, Q Tian, Y Wang, L Xie IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024 | | 2024 |
Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network D Jia, Q Tian, K Peng, J Li, Y Chen, M Ma, Y Wang, Y Wang arXiv preprint arXiv:2212.05751, 2022 | | 2022 |