VQTTS: High-fidelity text-to-speech synthesis with self-supervised VQ acoustic feature C Du, Y Guo, X Chen, K Yu arXiv preprint arXiv:2204.00768, 2022 | 54 | 2022 |
EmoDiff: Intensity controllable emotional text-to-speech with soft-label guidance Y Guo, C Du, X Chen, K Yu ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 27 | 2023 |
UniCATS: A unified context-aware text-to-speech framework with contextual VQ-diffusion and vocoding C Du, Y Guo, F Shen, Z Liu, Z Liang, X Chen, S Wang, H Zhang, K Yu Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17924 …, 2024 | 21 | 2024 |
DiffVoice: Text-to-speech with latent diffusion Z Liu, Y Guo, K Yu ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 13 | 2023 |
Unsupervised word-level prosody tagging for controllable speech synthesis Y Guo, C Du, K Yu ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 12 | 2022 |
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching Y Guo, C Du, Z Ma, X Chen, K Yu ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 8 | 2024 |
Leveraging speech PTM, text LLM, and emotional TTS for speech emotion recognition Z Ma, W Wu, Z Zheng, Y Guo, Q Chen, S Zhang, X Chen ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 6 | 2024 |
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech C Du, Y Guo, H Wang, Y Yang, Z Niu, S Wang, H Zhang, X Chen, K Yu arXiv preprint arXiv:2401.14321, 2024 | 6 | 2024 |
Speaker adaptive text-to-speech with timbre-normalized vector-quantized feature C Du, Y Guo, X Chen, K Yu IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023 | 5 | 2023 |
DSE-TTS: dual speaker embedding for cross-lingual text-to-speech S Liu, Y Guo, C Du, X Chen, K Yu arXiv preprint arXiv:2306.14145, 2023 | 4 | 2023 |
Acoustic BPE for speech generation with discrete tokens F Shen, Y Guo, C Du, X Chen, K Yu ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 3 | 2024 |
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge C Du, Y Guo, F Shen, K Yu ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 3 | 2023 |
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention J Li, Y Guo, X Chen, K Yu ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 1 | 2024 |
GlobalWalk: Learning Global-aware Node Embeddings via Biased Sampling Z Xue, Z Guo, Y Guo arXiv preprint arXiv:2201.09882, 2022 | 1 | 2022 |
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation B Li, Z Xie, X Xu, Y Guo, M Yan, J Zhang, K Yu, M Wu arXiv preprint arXiv:2407.13198, 2024 | | 2024 |
On the Effectiveness of Acoustic BPE in Decoder-Only TTS B Li, F Shen, Y Guo, S Wang, X Chen, K Yu arXiv preprint arXiv:2407.03892, 2024 | | 2024 |
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech H Wang, C Du, Y Guo, S Wang, X Chen, K Yu arXiv preprint arXiv:2404.19723, 2024 | | 2024 |
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations S Liu, Y Guo, X Chen, K Yu ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | | 2024 |
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge Y Guo, C Wang, Y Yang, H Wang, Z Ma, C Du, S Wang, H Li, S Fan, ... arXiv preprint arXiv:2404.06079, 2024 | | 2024 |
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations H Zhang, Y Guo, S Liu, X Chen, K Yu arXiv preprint arXiv:2311.01260, 2023 | | 2023 |