Guangzhi Sun
Verified email at cam.ac.uk
Title
Cited by
Year
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
G Sun, Y Zhang, RJ Weiss, Y Cao, H Zen, Y Wu
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 141 | Year: 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and autoregressive prosody prior
G Sun, Y Zhang, RJ Weiss, Y Cao, H Zen, A Rosenberg, B Ramabhadran, ...
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 114* | Year: 2020
SALMONN: Towards generic hearing abilities for large language models
C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
arXiv preprint arXiv:2310.13289, 2023
Cited by: 60 | Year: 2023
Speaker diarisation using 2D self-attentive combination of embeddings
G Sun, C Zhang, PC Woodland
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019
Cited by: 39 | Year: 2019
Transformer language models with LSTM-based cross-utterance information representation
G Sun, C Zhang, PC Woodland
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by: 36 | Year: 2021
Tree-constrained pointer generator for end-to-end contextual speech recognition
G Sun, C Zhang, PC Woodland
2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2021
Cited by: 25 | Year: 2021
Connecting speech encoder and large language model for ASR
W Yu, C Tang, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 17 | Year: 2024
Combination of deep speaker embeddings for diarisation
G Sun, C Zhang, PC Woodland
Neural Networks 141, 372-384, 2021
Cited by: 17 | Year: 2021
Can contextual biasing remain effective with Whisper and GPT-2?
G Sun, X Zheng, C Zhang, PC Woodland
arXiv preprint arXiv:2306.01942, 2023
Cited by: 11 | Year: 2023
Minimising biasing word errors for contextual ASR with the tree-constrained pointer generator
G Sun, C Zhang, PC Woodland
IEEE/ACM Transactions on Audio, Speech, and Language Processing 31, 345-354, 2022
Cited by: 11 | Year: 2022
Tree-constrained pointer generator with graph neural network encodings for contextual speech recognition
G Sun, C Zhang, PC Woodland
arXiv preprint arXiv:2207.00857, 2022
Cited by: 10 | Year: 2022
Fine-grained audio-visual joint representations for multimodal large language models
G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
arXiv preprint arXiv:2310.05863, 2023
Cited by: 7 | Year: 2023
End-to-end spoken language understanding with tree-constrained pointer generator
G Sun, C Zhang, PC Woodland
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 7 | Year: 2023
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
J Hwang, M Hira, C Chen, X Zhang, Z Ni, G Sun, P Ma, R Huang, V Pratap, ...
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-9, 2023
Cited by: 6 | Year: 2023
Knowledge-aware audio-grounded generative slot filling for limited annotated data
G Sun, C Zhang, I Vulić, P Budzianowski, PC Woodland
arXiv preprint arXiv:2307.01764, 2023
Cited by: 6 | Year: 2023
Cross-utterance conditioned VAE for non-autoregressive text-to-speech
Y Li, C Yu, G Sun, H Jiang, F Sun, W Zu, Y Wen, Y Yang, J Wang
arXiv preprint arXiv:2205.04120, 2022
Cited by: 6 | Year: 2022
Cross-utterance language models with acoustic error sampling
G Sun, C Zhang, PC Woodland
arXiv preprint arXiv:2009.01008, 2020
Cited by: 5 | Year: 2020
Content-aware speaker embeddings for speaker diarisation
G Sun, D Liu, C Zhang, PC Woodland
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by: 3 | Year: 2021
Graph neural networks for contextual ASR with the tree-constrained pointer generator
G Sun, C Zhang, PC Woodland
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
Cited by: 2 | Year: 2024
Extending large language models for speech and audio captioning
C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 2 | Year: 2024