Transformers in computational visual media: A survey
Transformers, the dominant architecture for natural language processing, have also recently
attracted much attention from computational visual media researchers due to their capacity …
Deep learning for video object segmentation: a review
As one of the fundamental problems in the field of video understanding, video object
segmentation aims at segmenting objects of interest throughout the given video sequence …
XMem: Long-term video object segmentation with an Atkinson-Shiffrin memory model
HK Cheng, AG Schwing - European Conference on Computer Vision, 2022 - Springer
We present XMem, a video object segmentation architecture for long videos with unified
feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video …
AiATrack: Attention in attention for transformer visual tracking
Transformer trackers have achieved impressive advancements recently, where the attention
mechanism plays an important role. However, the independent correlation computation in …
UniFormer: Unifying convolution and self-attention for visual recognition
It is a challenging task to learn discriminative representation from images and videos, due to
large local redundancy and complex global dependency in these visual data. Convolution …
MOSE: A new dataset for video object segmentation in complex scenes
Video object segmentation (VOS) aims at segmenting a particular object throughout the
entire video clip sequence. The state-of-the-art VOS methods have achieved excellent …
Hiera: A hierarchical vision transformer without the bells-and-whistles
Modern hierarchical vision transformers have added several vision-specific components in
the pursuit of supervised classification performance. While these components lead to …
DropMAE: Masked autoencoders with spatial-attention dropout for tracking tasks
In this paper, we study masked autoencoder (MAE) pretraining on videos for matching-
based downstream tasks, including visual object tracking (VOT) and video object …
Associating objects with transformers for video object segmentation
This paper investigates how to realize better and more efficient embedding learning to tackle
the semi-supervised video object segmentation under challenging multi-object scenarios …
Putting the object back into video object segmentation
We present Cutie, a video object segmentation (VOS) network with object-level memory
reading which puts the object representation from memory back into the video object …