Zejun Ma
Bytedance
Verified email at bytedance.com
Title
Cited by
Year
HTS-AT: A hierarchical token-semantic audio transformer for sound classification and detection
K Chen, X Du, B Zhu, Z Ma, T Berg-Kirkpatrick, S Dubnov
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by 172, 2022
ByteSing: A Chinese singing voice synthesis system using duration allocated encoder-decoder acoustic models and WaveRNN vocoders
Y Gu, X Yin, Y Rao, Y Wan, B Tang, Y Zhang, J Chen, Y Wang, Z Ma
2021 12th International Symposium on Chinese Spoken Language Processing …, 2021
Cited by 82, 2021
SALMONN: Towards generic hearing abilities for large language models
C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
arXiv preprint arXiv:2310.13289, 2023
Cited by 75, 2023
Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias
Z Jiang, Y Ren, Z Ye, J Liu, C Zhang, Q Yang, S Ji, R Huang, C Wang, ...
arXiv preprint arXiv:2306.03509, 2023
Cited by 38, 2023
PPG-based singing voice conversion with adversarial representation learning
Z Li, B Tang, X Yin, Y Wan, L Xu, C Shen, Z Ma
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by 35, 2021
Improving end-to-end contextual speech recognition with fine-grained contextual knowledge selection
M Han, L Dong, Z Liang, M Cai, S Zhou, Z Ma, B Xu
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by 34, 2022
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis
J Pan, X Yin, Z Zhang, S Liu, Y Zhang, Z Ma, Y Wang
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by 32, 2020
S3T: Self-supervised pre-training with Swin Transformer for music classification
H Zhao, C Zhang, B Zhu, Z Ma, K Zhang
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by 31, 2022
Towards realistic visual dubbing with heterogeneous sources
T Xie, L Liao, C Bi, B Tang, X Yin, J Yang, M Wang, J Yao, Y Zhang, Z Ma
Proceedings of the 29th ACM International Conference on Multimedia, 1739-1747, 2021
Cited by 31, 2021
Zero-shot audio source separation through query-based learning from weakly-labeled data
K Chen, X Du, B Zhu, Z Ma, T Berg-Kirkpatrick, S Dubnov
Proceedings of the AAAI Conference on Artificial Intelligence 36 (4), 4441-4449, 2022
Cited by 30, 2022
Deep LSTM for large vocabulary continuous speech recognition
X Tian, J Zhang, Z Ma, Y He, J Wei, P Wu, W Situ, S Li, Y Zhang
arXiv preprint arXiv:1703.07090, 2017
Cited by 27, 2017
ByteCover: Cover song identification via multi-loss training
X Du, Z Yu, B Zhu, X Chen, Z Ma
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by 26, 2021
A hybrid text normalization system using multi-head self-attention for Mandarin
J Zhang, J Pan, X Yin, C Li, S Liu, Y Zhang, Y Wang, Z Ma
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by 25, 2020
Make-An-Audio 2: Temporal-enhanced text-to-audio generation
J Huang, Y Ren, R Huang, D Yang, Z Ye, C Zhang, J Liu, X Yin, Z Ma, ...
arXiv preprint arXiv:2305.18474, 2023
Cited by 24, 2023
Unleashing infinite-length input capacity for large-scale language models with self-controlled memory system
X Liang, B Wang, H Huang, S Wu, P Wu, L Lu, Z Ma, Z Li
arXiv preprint arXiv:2304.13343, 2023
Cited by 22, 2023
ByteCover2: Towards dimensionality reduction of latent embedding for efficient cover song identification
X Du, K Chen, Z Wang, B Zhu, Z Ma
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by 22, 2022
Learning hierarchical representations for expressive speaking style in end-to-end speech synthesis
X An, Y Wang, S Yang, Z Ma, L Xie
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2019
Cited by 21, 2019
Language adaptive cross-lingual speech representation learning with sparse sharing sub-networks
Y Lu, M Huang, X Qu, P Wei, Z Ma
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by 20, 2022
Connecting speech encoder and large language model for ASR
W Yu, C Tang, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by 18, 2024
PolyVoice: Language models for speech to speech translation
Q Dong, Z Huang, Q Tian, C Xu, T Ko, Y Zhao, S Feng, T Li, K Wang, ...
arXiv preprint arXiv:2306.02982, 2023
Cited by 18, 2023
Articles 1–20