Memory network with pixel-level spatio-temporal learning for visual object tracking

E Song, W Chai, G Wang, Y Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recently, integrating video foundation models and large language models to build a video
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …

Cited by 73 · All 3 versions

Onetracker: Unifying visual object tracking with foundation models and efficient tuning

L Hong, S Yan, R Zhang, W Li, X Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com

Visual object tracking aims to localize the target object in each frame based on its initial
appearance in the first frame. Depending on the input modality, tracking tasks can be divided …

Cited by 11 · All 3 versions

Reading relevant feature from global representation memory for visual object tracking

X Zhou, P Guo, L Hong, J Li, W Zhang… - Advances in …, 2024 - proceedings.neurips.cc

Reference features from a template or historical frames are crucial for visual object tracking.
Prior works utilize all features from a fixed template or memory for visual object tracking …

Cited by 1 · All 5 versions

Openvis: Open-vocabulary video instance segmentation

P Guo, T Huang, P He, X Liu, T Xiao, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

Open-vocabulary Video Instance Segmentation (OpenVIS) can simultaneously detect,
segment, and track arbitrary object categories in a video, without being constrained to …

Cited by 14 · All 2 versions

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

E Song, W Chai, T Ye, JN Hwang, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, integrating video foundation models and large language models to build a video
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …

Cited by 4 · All 2 versions

Video Visualization and Visual Analytics: A Task-Based and Application-Driven Investigation

W Xia, G Sun, T Li, B Chang, J Tang… - … on Circuits and …, 2024 - ieeexplore.ieee.org

Video data refers to digital information in the form of a series of frames or images
representing continuous motion captured by a video recording device. In various domains …

Fast, Accurate, and Lightweight Memory-Enhanced Embedding Learning Framework for Image-Text Retrieval

Z Li, L Zhang, K Zhang, Y Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Image-text retrieval is a fundamental task in bridging the semantics between vision and
language. The key challenge lies in accurately and efficiently learning the semantic …

SRRT: Exploring Search Region Regulation for Visual Object Tracking

J Zhu, X Chen, P Zhang, X Wang… - … on Circuits and …, 2024 - ieeexplore.ieee.org

The dominant trackers generate a fixed-size rectangular region based on the previous
prediction or initial bounding box as the model input, i.e., the search region. While this manner …

Multi-step Temporal Modeling for UAV Tracking

X Yuan, T Xu, X Liu, Y Wang, H Qin… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

In the realm of unmanned aerial vehicle (UAV) tracking, Siamese-based approaches have
gained traction due to their optimal balance between efficiency and precision. However …

Cited by 3 · All 3 versions

LGTrack: Exploiting Local and Global Properties for Robust Visual Tracking

C Liu, J Zhao, C Bo, S Li, D Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Re-detection is a necessary capability for long-term tracking. Target candidate proposals in
the whole image can provide a chance of tracking reset when tracking fails due to tracking …