Foundational models defining a new era in vision: A survey and outlook

M Awais, M Naseer, S Khan, RM Anwer… - arXiv preprint arXiv …, 2023 - arxiv.org
Vision systems to see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

A comprehensive survey on segment anything model for vision and beyond

C Zhang, L Liu, Y Cui, G Huang, W Lin, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the
ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence …

SAM 2: Segment anything in images and videos

N Ravi, V Gabeur, YT Hu, R Hu, C Ryali, T Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving
promptable visual segmentation in images and videos. We build a data engine, which …

MOSE: A new dataset for video object segmentation in complex scenes

H Ding, C Liu, S He, X Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video object segmentation (VOS) aims at segmenting a particular object throughout the
entire video clip sequence. The state-of-the-art VOS methods have achieved excellent …

Segment and track anything

Y Cheng, L Li, Y Xu, X Li, Z Yang, W Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
This report presents a framework called Segment And Track Anything (SAMTrack) that
allows users to precisely and effectively segment and track any object in a video …

Putting the object back into video object segmentation

HK Cheng, SW Oh, B Price, JY Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Cutie, a video object segmentation (VOS) network with object-level memory
reading which puts the object representation from memory back into the video object …

Logic-induced diagnostic reasoning for semi-supervised semantic segmentation

C Liang, W Wang, J Miao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Recent advances in semi-supervised semantic segmentation have been heavily reliant on
pseudo labeling to compensate for limited labeled data, disregarding the valuable relational …

Boosting video object segmentation via space-time correspondence learning

Y Zhang, L Li, W Wang, R Xie… - Proceedings of the …, 2023 - openaccess.thecvf.com
Current top-leading solutions for video object segmentation (VOS) typically follow a
matching-based regime: for each query frame, the segmentation mask is inferred according …

Efficient emotional adaptation for audio-driven talking-head generation

Y Gan, Z Yang, X Yue, L Sun… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Audio-driven talking-head synthesis is a popular research topic for virtual human-related
applications. However, the inflexibility and inefficiency of existing methods, which …

Local-global context aware transformer for language-guided video segmentation

C Liang, W Wang, T Zhou, J Miao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
We explore the task of language-guided video segmentation (LVS). Previous algorithms
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …