CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Z Yang, J Teng, W Zheng, M Ding, ..., T Liu, B Xu, Y Dong, J Tang arXiv preprint arXiv:2408.06072, 2024 | 4 | 2024 |
DAP: Domain-aware prompt learning for vision-and-language navigation T Liu, Y Hu, W Wu, Y Wang, K Xu, Q Yin ICASSP 2024 Oral Presentation (Top 5%), 2615-2619, 2024 | 4 | 2024 |
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding T Liu, X Liu, S Huang, H Chen, Q Yin, L Qin, D Wang, Y Hu ICME 2024 Oral, 2024 | 3 | 2024 |
PANDA: Prompt-Based Context- and Indoor-Aware Pretraining for Vision and Language Navigation T Liu, Y Hu, W Wu, Y Wang, K Xu, Q Yin International Conference on Multimedia Modeling, 187-200, 2024 | 2* | 2024 |
M²IST: Multi-Modal Interactive Side-Tuning for Memory-efficient Referring Expression Comprehension X Liu*, T Liu*, S Huang, Y Hu, Q Yin, D Wang, H Chen arXiv preprint arXiv:2407.01131, 2024 | 1 | 2024 |
Skill-dependent representations for object navigation Y Wang, Y Hu, W Wu, T Liu, Y Peng 2023 6th International Conference on Intelligent Robotics and Control …, 2023 | 1 | 2023 |
ACT: Action-assoCiated and Target-Related Representations for Object Navigation Y Wang, Y Hu, W Wu, T Liu, Y Peng International Conference on Multimedia Modeling, 121-133, 2024 | | 2024 |
Dynamic Multi-modal Prompting for Efficient Visual Grounding W Wu, T Liu, Y Wang, K Xu, Q Yin, Y Hu Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 359-371, 2023 | | 2023 |