Foundational models defining a new era in vision: A survey and outlook
Vision systems to see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …
fundamental to understanding our world. The complex relations between objects and their …
A comprehensive survey on segment anything model for vision and beyond
Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the
ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence …
ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence …
Sam 2: Segment anything in images and videos
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving
promptable visual segmentation in images and videos. We build a data engine, which …
promptable visual segmentation in images and videos. We build a data engine, which …
MOSE: A new dataset for video object segmentation in complex scenes
Video object segmentation (VOS) aims at segmenting a particular object throughout the
entire video clip sequence. The state-of-the-art VOS methods have achieved excellent …
entire video clip sequence. The state-of-the-art VOS methods have achieved excellent …
Segment and track anything
This report presents a framework called Segment And Track Anything (SAMTrack) that
allows users to precisely and effectively segment and track any object in a video …
allows users to precisely and effectively segment and track any object in a video …
Putting the object back into video object segmentation
We present Cutie a video object segmentation (VOS) network with object-level memory
reading which puts the object representation from memory back into the video object …
reading which puts the object representation from memory back into the video object …
Logic-induced diagnostic reasoning for semi-supervised semantic segmentation
Recent advances in semi-supervised semantic segmentation have been heavily reliant on
pseudo labeling to compensate for limited labeled data, disregarding the valuable relational …
pseudo labeling to compensate for limited labeled data, disregarding the valuable relational …
Boosting video object segmentation via space-time correspondence learning
Current top-leading solutions for video object segmentation (VOS) typically follow a
matching-based regime: for each query frame, the segmentation mask is inferred according …
matching-based regime: for each query frame, the segmentation mask is inferred according …
Efficient emotional adaptation for audio-driven talking-head generation
Audio-driven talking-head synthesis is a popular research topic for virtual human-related
applications. However, the inflexibility and inefficiency of existing methods, which …
applications. However, the inflexibility and inefficiency of existing methods, which …
Local-global context aware transformer for language-guided video segmentation
We explore the task of language-guided video segmentation (LVS). Previous algorithms
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …