OpenMMLab's pre-training toolbox and benchmark M Contributors | 55 | 2023 |
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams H Zhang, Y Wang, Y Tang, Y Liu, J Feng, J Dai, X Jin arXiv preprint arXiv:2406.08085, 2024 | 10 | 2024 |
Coarse correspondence elicit 3D spacetime understanding in multimodal language model B Liu, Y Dong, Y Wang, Y Rao, Y Tang, WC Ma, R Krishna arXiv preprint arXiv:2408.00754, 2024 | 9 | 2024 |
Ponder & Press: Advancing Visual GUI Agent towards General Computer Control Y Wang, H Zhang, J Tian, Y Tang arXiv preprint arXiv:2412.01268, 2024 | | 2024 |
Hierarchical Memory for Long Video QA Y Wang, H Zhang, Y Tang, Y Liu, J Feng, J Dai, X Jin CVPR 2024 Workshop, 2024 | | 2024 |