When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach

T Ma, B Bai, H Lin, H Wang, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Visual grounding refers to the process of associating natural language expressions with
corresponding regions within an image. Existing benchmarks for visual grounding primarily …

GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes

H Lin, C Wei, L He, Y Guo, Y Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Pedestrian trajectory prediction is a well-established task with significant recent
advancements. However, existing datasets are unable to fulfill the demand for studying …