UniAudio: An audio foundation model toward universal audio generation D Yang, J Tian, X Tan, R Huang, S Liu, X Chang, J Shi, S Zhao, J Bian, ... arXiv preprint arXiv:2310.00704, 2023 | 59 | 2023 |
HiFi-Codec: Group-residual vector quantization for high fidelity audio codec D Yang, S Liu, R Huang, J Tian, C Weng, Y Zou arXiv preprint arXiv:2305.02765, 2023 | 54 | 2023 |
Reproducing Whisper-style training using an open-source toolkit and publicly available data Y Peng, J Tian, B Yan, D Berrebbi, X Chang, X Li, J Shi, S Arora, W Chen, ... 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-8, 2023 | 25 | 2023 |
Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study X Chang, B Yan, K Choi, JW Jung, Y Lu, S Maiti, R Sharma, J Shi, J Tian, ... ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 19 | 2024 |
LAE: Language-aware encoder for monolingual and multilingual ASR J Tian, J Yu, C Zhang, C Weng, Y Zou, D Yu Interspeech 2022, 2022 | 18 | 2022 |
OWSM v3.1: Better and faster open Whisper-style speech models based on E-Branchformer Y Peng, J Tian, W Chen, S Arora, B Yan, Y Sudo, M Shakeel, K Choi, ... arXiv preprint arXiv:2401.16658, 2024 | 14 | 2024 |
Consistent training and decoding for end-to-end speech recognition using lattice-free MMI J Tian, J Yu, C Weng, SX Zhang, D Su, D Yu, Y Zou ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 13 | 2022 |
Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model J Tian, J Yu, C Weng, Y Zou, D Yu IEEE Signal Processing Letters 29, 812-816, 2022 | 10 | 2022 |
Integrating Lattice-Free MMI into End-to-End Speech Recognition J Tian, J Yu, C Weng, Y Zou, D Yu IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022 | 9* | 2022 |
Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks J Tian, B Yan, J Yu, C Weng, D Yu, S Watanabe International Conference on Learning Representations (ICLR) 2023, 2022 | 8 | 2022 |
A random gossip BMUF process for neural language modeling Y Huang, J Tian, L Han, G Wang, X Song, D Su, D Yu ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 3 | 2020 |
AutoPrep: An Automatic Preprocessing Framework for In-The-Wild Speech Data J Yu, H Chen, Y Bian, X Li, Y Luo, J Tian, M Liu, J Jiang, S Wang ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 2 | 2024 |
The MineTrans systems for IWSLT 2023 offline speech translation and speech-to-speech translation tasks Y Du, G Zhengsheng, J Tian, Z Zhang, X Wang, J Yu, Z Tu, T Xu, E Chen Proceedings of the 20th International Conference on Spoken Language …, 2023 | 2 | 2023 |
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction Z Zhao, R Gu, D Yang, J Tian, Y Zou Interspeech 2022, 2022 | 2 | 2022 |
Towards Robust Speech Representation Learning for Thousands of Languages W Chen, W Zhang, Y Peng, X Li, J Tian, J Shi, X Chang, S Maiti, K Livescu, ... arXiv preprint arXiv:2407.00837, 2024 | 1 | 2024 |
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units X Chang, J Shi, J Tian, Y Wu, Y Tang, Y Wu, S Watanabe, Y Adi, X Chen, ... arXiv preprint arXiv:2406.07725, 2024 | 1 | 2024 |
UniAudio: Towards Universal Audio Generation with Large Language Models D Yang, J Tian, X Tan, R Huang, S Liu, H Guo, X Chang, J Shi, J Bian, ... Forty-first International Conference on Machine Learning, 2024 | 1 | 2024 |
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners R Huang, C Zhang, Y Wang, D Yang, J Tian, Z Ye, L Liu, Z Wang, Z Jiang, ... Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024 | | 2024 |
CMU’s IWSLT 2024 Offline Speech Translation System: A Cascaded Approach For Long-Form Robustness B Yan, P Fernandes, J Tian, S Ouyang, W Chen, K Livescu, L Li, ... Proceedings of the 21st International Conference on Spoken Language …, 2024 | | 2024 |
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models J Tian, Y Peng, W Chen, K Choi, K Livescu, S Watanabe arXiv preprint arXiv:2406.09282, 2024 | | 2024 |