gao zhifu
Speech Lab, Alibaba Group
Verified email at alibaba-inc.com
Title
Cited by
Year
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System
Z Gao, Y Song, IV McLoughlin, P Li, Y Jiang, LR Dai
INTERSPEECH 2019, 361-365, 2019
Cited by 83, 2019
Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition
Z Gao, S Zhang, I McLoughlin, Z Yan
arXiv preprint arXiv:2206.08317, 2022
Cited by 52, 2022
An Effective Deep Embedding Learning Architecture for Speaker Verification
Y Jiang, Y Song, IV McLoughlin, Z Gao, LR Dai
INTERSPEECH 2019, 4040-4044, 2019
Cited by 34, 2019
San-m: Memory equipped self-attention for end-to-end speech recognition
Z Gao, S Zhang, M Lei, I McLoughlin
INTERSPEECH 2020, 6-10, 2020
Cited by 30, 2020
Streaming chunk-aware multihead attention for online end-to-end speech recognition
S Zhang, Z Gao, H Luo, M Lei, J Gao, Z Yan, L Xie
INTERSPEECH 2020, 2142-2146, 2020
Cited by 30, 2020
An improved deep embedding learning method for short duration speaker verification
Z Gao, Y Song, IV McLoughlin, W Guo, LR Dai
INTERSPEECH 2018, 3578-3582, 2018
Cited by 30, 2018
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Z Gao, Z Li, J Wang, H Luo, X Shi, M Chen, Y Li, L Zuo, Z Du, Z Xiao, ...
INTERSPEECH 2023, 2023
Cited by 23, 2023
Lauragpt: Listen, attend, understand, and regenerate audio with gpt
Q Chen, Y Chu, Z Gao, Z Li, K Hu, X Zhou, J Xu, Z Ma, W Wang, S Zheng, ...
arXiv preprint arXiv:2310.04673, 2023
Cited by 21, 2023
emotion2vec: Self-supervised pre-training for speech emotion representation
Z Ma, Z Zheng, J Ye, J Li, Z Gao, S Zhang, X Chen
arXiv preprint arXiv:2312.15185, 2023
Cited by 19, 2023
Extremely Low Footprint End-to-End ASR System for Smart Device
Z Gao, Y Yao, S Zhang, J Yang, M Lei, I McLoughlin
INTERSPEECH 2021, 4548-4552, 2021
Cited by 15, 2021
Universal ASR: Unifying streaming and non-streaming ASR using a single encoder-decoder model
Z Gao, S Zhang, M Lei, I McLoughlin
arXiv preprint arXiv:2010.14099, 2020
Cited by 14, 2020
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Z Ma, G Yang, Y Yang, Z Gao, J Wang, Z Du, F Yu, Q Chen, S Zheng, ...
arXiv preprint arXiv:2402.08846, 2024
Cited by 7, 2024
SeACo-Paraformer: A non-autoregressive ASR system with flexible and effective hotword customization ability
X Shi, Y Yang, Z Li, Y Chen, Z Gao, S Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by 4, 2024
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
Z Du, Q Chen, S Zhang, K Hu, H Lu, Y Yang, H Hu, S Zheng, Y Gu, Z Ma, ...
arXiv preprint arXiv:2407.05407, 2024
Cited by 1, 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
T SpeechTeam
arXiv preprint arXiv:2407.04051, 2024
Cited by 1, 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
G Yang, Z Ma, F Yu, Z Gao, S Zhang, X Chen
arXiv preprint arXiv:2406.05839, 2024
Cited by 1, 2024
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System
X Shi, H Luo, Z Gao, S Zhang, Z Yan
INTERSPEECH 2023, 2023
Cited by 1, 2023
Wav2vec‐MoE: An unsupervised pre‐training and adaptation method for multi‐accent ASR
Y Lin, S Zhang, Z Gao, L Wang, Y Yang, J Dang
Electronics Letters 59 (11), e12823, 2023
2023
Streaming End-to-End Speech Recognition Method, Apparatus and Electronic Device
S Zhang, Z Gao
US Patent App. 17/976,464, 2023
2023
Speech Processing method, Speech Encoder, Speech Decoder and Speech Recognition System
S Zhang, Z Gao, M Lei
US Patent App. 17/951,569, 2023
2023