Ego4D: Around the world in 3,000 hours of egocentric video | K Grauman, A Westbury, E Byrne, Z Chavis, A Furnari, R Girdhar, ... | The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2022 | 683 | 2022 |
Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection | R Tao, Z Pan, RK Das, X Qian, MZ Shou, H Li | ACM International Conference on Multimedia (MM), 2021 | 158 | 2021 |
Self-supervised Speaker Recognition with Loss-gated Learning | R Tao, KA Lee, RK Das, V Hautamäki, H Li | The International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2022 | 47 | 2022 |
MuSE: Multi-modal target speaker extraction with visual cues | Z Pan, R Tao, C Xu, H Li | The International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2021 | 41 | 2021 |
Selective listening by synchronizing speech with lips | Z Pan, R Tao, C Xu, H Li | IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022 | 40 | 2022 |
Audio-visual speaker recognition with a cross-modal discriminative network | R Tao, RK Das, H Li | INTERSPEECH, 2020 | 38 | 2020 |
HLT-NUS submission for 2020 NIST conversational telephone speech SRE | RK Das, R Tao, H Li | arXiv, 2021 | 19 | 2021 |
HLT-NUS submission for 2019 NIST multimedia speaker recognition evaluation | RK Das, R Tao, J Yang, W Rao, C Yu, H Li | APSIPA, 605-609, 2020 | 15* | 2020 |
Target Active Speaker Detection with Audio-visual Cues | Y Jiang, R Tao, Z Pan, H Li | INTERSPEECH, 2023 | 12 | 2023 |
Speaker recognition with two-step multi-modal deep cleansing | R Tao, KA Lee, Z Shi, H Li | The International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2023 | 10 | 2023 |
Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs | R Tao, KA Lee, RK Das, V Hautamäki, H Li | IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023 | 5 | 2023 |
Prompt-driven Target Speech Diarization | Y Jiang, Z Chen, R Tao, L Deng, Y Qian, H Li | The International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2024 | 3 | 2024 |
Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification | DT Truong, R Tao, JQ Yip, KA Lee, ES Chng | The International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2024 | 2 | 2024 |
USED: Universal Speaker Extraction and Diarization | J Ao, MS Yıldırım, M Ge, S Wang, R Tao, Y Qian, L Deng, L Xiao, H Li | arXiv preprint arXiv:2309.10674, 2023 | 1 | 2023 |
I4U System Description for NIST SRE'20 CTS Challenge | KA Lee, T Kinnunen, D Colibro, C Vair, A Nautsch, H Sun, L He, T Liang, ... | arXiv preprint arXiv:2211.01091, 2022 | 1 | 2022 |
A Benchmark for Multi-speaker Anonymization | X Miao, R Tao, C Zeng, X Wang | arXiv preprint arXiv:2407.05608, 2024 | | 2024 |
Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection | DT Truong, R Tao, T Nguyen, HT Luong, KA Lee, ES Chng | INTERSPEECH, 2024 | | 2024 |
Target Speech Diarization with Multimodal Prompts | Y Jiang, R Tao, Z Chen, Y Qian, H Li | arXiv preprint arXiv:2406.07198, 2024 | | 2024 |
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio? | T Liu, L Zhang, RK Das, Y Ma, R Tao, H Li | INTERSPEECH, 2024 | | 2024 |
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | R Tao, X Qian, Y Jiang, J Li, J Wang, H Li | arXiv preprint arXiv:2404.18501, 2024 | | 2024 |