Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs | W Xia, D Wang, X Pang, Z Wang, B Zhao, D Hu | arXiv preprint arXiv:2311.02847, 2023 | 2 | 2023 |
Revisiting pre-training in audio-visual learning | R Feng, W Xia, D Hu | arXiv preprint arXiv:2302.03533, 2023 | 2 | 2023 |
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World | H Lin, L Ruan, W Xia, P Liu, J Wen, Y Xu, D Hu, R Song, WX Zhao, Q Jin, ... | Proceedings of the 31st ACM International Conference on Multimedia, 1303-1313, 2023 | 1 | 2023 |
Robust cross-modal knowledge distillation for unconstrained videos | W Xia, X Li, A Deng, H Xiong, D Dou, D Hu | arXiv preprint arXiv:2304.07775, 2023 | 1 | 2023 |
Learning Manipulation by Predicting Interaction | J Zeng, Q Bu, B Wang, W Xia, L Chen, H Dong, H Song, D Wang, D Hu, ... | arXiv preprint arXiv:2406.00439, 2024 | | 2024 |
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation | J Zhang, C Bai, H He, W Xia, Z Wang, B Zhao, X Li, X Li | arXiv preprint arXiv:2405.19586, 2024 | | 2024 |
Balanced Audiovisual Dataset for Imbalance Analysis | W Xia, X Zhao, X Pang, C Zhang, D Hu | arXiv preprint arXiv:2302.10912, 2023 | | 2023 |