Chat-3D: Data-efficiently tuning large language model for universal dialogue of 3D scenes Z Wang, H Huang, Y Zhao, Z Zhang, Z Zhao arXiv preprint arXiv:2308.08769, 2023 | 18 | 2023 |
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition X Cheng, T Jin, R Huang, L Li, W Lin, Z Wang, Y Wang, H Liu, A Yin, ... ICCV 2023, 15735-15745, 2023 | 12 | 2023 |
Connecting Multi-modal Contrastive Representations Z Wang, Y Zhao, X Cheng, H Huang, J Liu, L Tang, L Li, Y Wang, A Yin, ... NeurIPS 2023, 2023 | 11 | 2023 |
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao ICCV 2023, 2023 | 6 | 2023 |
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao EMNLP 2023, 2023 | 5 | 2023 |
Scene-robust natural language video localization via learning domain-invariant representations Z Wang, Y Zhao, H Huang, Y Xia, Z Zhao ACL 2023, 144-160, 2023 | 5 | 2023 |
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers H Huang, Z Wang, R Huang, L Liu, X Cheng, Y Zhao, T Jin, Z Zhao arXiv preprint arXiv:2312.08168, 2023 | 4 | 2023 |
Extending multi-modal contrastive representations Z Wang, Z Zhang, L Liu, Y Zhao, H Huang, T Jin, Z Zhao arXiv preprint arXiv:2310.08884, 2023 | 2 | 2023 |
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion Z Wang, Z Zhang, X Cheng, R Huang, L Liu, Z Ye, H Huang, Y Zhao, T Jin, ... ICML 2024, 2024 | | 2024 |
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation X Cheng, R Huang, L Li, T Jin, Z Wang, A Yin, M Li, X Duan, Z Zhao arXiv preprint arXiv:2312.15197, 2023 | | 2023 |