Clip-adapter: Better vision-language models with feature adapters P Gao, S Geng, R Zhang, T Ma, R Fang, Y Zhang, H Li, Y Qiao International Journal of Computer Vision, 2021 | 578 | 2021 |
Uniformer: Unified transformer for efficient spatiotemporal representation learning K Li, Y Wang, P Gao, G Song, Y Liu, H Li, Y Qiao arXiv preprint arXiv:2201.04676, 2022 | 460* | 2022 |
Tip-adapter: Training-free clip-adapter for better vision-language modeling R Zhang, R Fang, W Zhang, P Gao, K Li, J Dai, Y Qiao, H Li arXiv preprint arXiv:2111.03930, 2021 | 447* | 2021 |
Llama-adapter: Efficient fine-tuning of language models with zero-init attention R Zhang, J Han, C Liu, P Gao, A Zhou, X Hu, S Yan, P Lu, H Li, Y Qiao arXiv preprint arXiv:2303.16199, 2023 | 442 | 2023 |
Dynamic fusion with intra-and inter-modality attention flow for visual question answering P Gao, Z Jiang, H You, P Lu, SCH Hoi, X Wang, H Li Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 412* | 2019 |
Llama-adapter v2: Parameter-efficient visual instruction model P Gao, J Han, R Zhang, Z Lin, S Geng, A Zhou, W Zhang, P Lu, C He, ... arXiv preprint arXiv:2304.15010, 2023 | 339 | 2023 |
Pointclip: Point cloud understanding by clip R Zhang, Z Guo, W Zhang, K Li, X Miao, B Cui, Y Qiao, P Gao, H Li Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 308 | 2022 |
Fast convergence of detr with spatially modulated co-attention P Gao, M Zheng, X Wang, J Dai, H Li Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 282 | 2021 |
End-to-end object detection with adaptive clustering transformer M Zheng, P Gao, R Zhang, K Li, X Wang, H Li, H Dong arXiv preprint arXiv:2011.09315, 2020 | 206 | 2020 |
Point-m2ae: multi-scale masked autoencoders for hierarchical point cloud pre-training R Zhang, Z Guo, P Gao, R Fang, B Zhao, D Wang, Y Qiao, H Li Advances in neural information processing systems 35, 27061-27074, 2022 | 172 | 2022 |
Frozen clip models are efficient video learners Z Lin, S Geng, R Zhang, P Gao, G De Melo, X Wang, J Dai, Y Qiao, H Li European Conference on Computer Vision, 388-404, 2022 | 145 | 2022 |
Convmae: Masked convolution meets masked autoencoders P Gao, T Ma, H Li, Z Lin, J Dai, Y Qiao arXiv preprint arXiv:2205.03892, 2022 | 141* | 2022 |
Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning X Zhu, R Zhang, B He, Z Guo, Z Zeng, Z Qin, S Zhang, P Gao Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 122 | 2023 |
Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners R Zhang, X Hu, B Li, S Huang, H Deng, Y Qiao, P Gao, H Li Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 103 | 2023 |
MonoDETR: Depth-guided transformer for monocular 3D object detection R Zhang, H Qiu, T Wang, Z Guo, Z Cui, Y Qiao, H Li, P Gao Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 98 | 2023 |
Personalize segment anything model with one shot R Zhang, Z Jiang, Z Guo, S Yan, J Pan, X Ma, H Dong, P Gao, H Li arXiv preprint arXiv:2305.03048, 2023 | 95 | 2023 |
Multi-modality latent interaction network for visual question answering P Gao, H You, Z Zhang, X Wang, H Li Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 95* | 2019 |
Lvlm-ehub: A comprehensive evaluation benchmark for large vision-language models P Xu, W Shao, K Zhang, P Gao, S Liu, M Lei, F Meng, S Huang, Y Qiao, ... arXiv preprint arXiv:2306.09265, 2023 | 94 | 2023 |
Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders R Zhang, L Wang, Y Qiao, P Gao, H Li Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 89 | 2023 |
Question-guided hybrid convolution for visual question answering P Gao, H Li, S Li, P Lu, Y Li, SCH Hoi, X Wang Proceedings of the European Conference on Computer Vision (ECCV), 469-485, 2018 | 85 | 2018 |