Ziyang Ma
Title
Cited by
Year
LauraGPT: Listen, attend, understand, and regenerate audio with GPT
J Wang, Z Du, Q Chen, Y Chu, Z Gao, Z Li, K Hu, X Zhou, J Xu, Z Ma, ...
Cited by 30*, 2023
emotion2vec: Self-supervised pre-training for speech emotion representation
Z Ma, Z Zheng, J Ye, J Li, Z Gao, S Zhang, X Chen
Proc. ACL 2024, 2023
Cited by 19, 2023
MT4SSL: Boosting self-supervised speech representation learning by integrating multiple targets
Z Ma, Z Zheng, C Tang, Y Wang, X Chen
Proc. Interspeech 2023, 2022
Cited by 18, 2022
ChatMusician: Understanding and generating music intrinsically with LLM
R Yuan, H Lin, Y Wang, Z Tian, S Wu, T Shen, G Zhang, Y Wu, C Liu, ...
Proc. ACL 2024, 2024
Cited by 15, 2024
Hierarchical deep residual reasoning for temporal moment localization
Z Ma, X Han, X Song, Y Cui, L Nie
Proceedings of the 3rd ACM International Conference on Multimedia in Asia, 1-7, 2021
Cited by 11, 2021
VoiceFlow: Efficient text-to-speech with rectified flow matching
Y Guo, C Du, Z Ma, X Chen, K Yu
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by 10*, 2024
Towards universal speech discrete tokens: A case study for ASR and TTS
Y Yang, F Shen, C Du, Z Ma, K Yu, D Povey, X Chen
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by 10, 2024
ELLA-V: Stable neural codec language modeling with alignment-guided sequence reordering
Y Song, Z Chen, X Wang, Z Ma, X Chen
arXiv preprint arXiv:2401.07333, 2024
Cited by 10, 2024
MAP-Neo: Highly capable and transparent bilingual large language model series
G Zhang, S Qu, J Liu, C Zhang, C Lin, CL Yu, D Pan, E Cheng, J Liu, ...
arXiv preprint arXiv:2405.19327, 2024
Cited by 7, 2024
Chinese Tiny LLM: Pretraining a Chinese-centric large language model
X Du, Z Yu, S Gao, D Pan, Y Cheng, Z Ma, R Yuan, X Qu, J Liu, T Zheng, ...
Proc. 1st COLM Conference, 2024
Cited by 7, 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Z Ma, G Yang, Y Yang, Z Gao, J Wang, Z Du, F Yu, Q Chen, S Zheng, ...
arXiv preprint arXiv:2402.08846, 2024
Cited by 7, 2024
Leveraging speech PTM, text LLM, and emotional TTS for speech emotion recognition
Z Ma, W Wu, Z Zheng, Y Guo, Q Chen, S Zhang, X Chen
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by 6, 2024
EAT: Self-supervised pre-training with efficient audio transformer
W Chen, Y Liang, Z Ma, Z Zheng, X Chen
arXiv preprint arXiv:2401.03497, 2024
Cited by 6, 2024
Improving few-shot learning for talking face system with TTS data augmentation
Q Chen, Z Ma, T Liu, X Tan, Q Lu, K Yu, X Chen
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by 5, 2023
Towards Weakly Supervised Text-to-Audio Grounding
X Xu, Z Ma, M Wu, K Yu
arXiv preprint arXiv:2401.02584, 2024
Cited by 4, 2024
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
G Yang, Z Ma, Z Zheng, Y Song, Z Niu, X Chen
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-7, 2023
Cited by 4, 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Z Zheng, Z Ma, Y Wang, X Chen
Proc. Interspeech 2023, 2023
Cited by 4, 2023
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
Z Ma, Z Zheng, G Yang, Y Wang, C Zhang, X Chen
Proc. Interspeech 2023, 2023
Cited by 4, 2023
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation
Z Liang, Z Song, Z Ma, C Du, K Yu, X Chen
Proc. Interspeech 2023, 2023
Cited by 4, 2023
TESSP: Text-enhanced self-supervised speech pre-training
Z Yao, S Ren, S Chen, Z Ma, P Guo, L Xie
arXiv preprint arXiv:2211.13443, 2022
Cited by 4, 2022