SSTVOS: Sparse spatiotemporal transformers for video object segmentation

Y Xu, H Wei, M Lin, Y Deng, K Sheng, M Zhang… - Computational Visual …, 2022 - Springer

Transformers, the dominant architecture for natural language processing, have also recently
attracted much attention from computational visual media researchers due to their capacity …

Cited by 116 | Related articles | All 6 versions

[PDF] springer.com

Deep learning for video object segmentation: a review

M Gao, F Zheng, JJQ Yu, C Shan, G Ding… - Artificial Intelligence …, 2023 - Springer

As one of the fundamental problems in the field of video understanding, video object
segmentation aims at segmenting objects of interest throughout the given video sequence …

Cited by 62 | Related articles | All 13 versions

[PDF] arxiv.org

XMem: Long-term video object segmentation with an Atkinson-Shiffrin memory model

HK Cheng, AG Schwing - European Conference on Computer Vision, 2022 - Springer

We present XMem, a video object segmentation architecture for long videos with unified
feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video …

Cited by 303 | Related articles | All 8 versions

[PDF] arxiv.org

AiATrack: Attention in attention for transformer visual tracking

S Gao, C Zhou, C Ma, X Wang, J Yuan - European Conference on …, 2022 - Springer

Transformer trackers have achieved impressive advancements recently, where the attention
mechanism plays an important role. However, the independent correlation computation in …

Cited by 221 | Related articles | All 5 versions

[PDF] arxiv.org

UniFormer: Unifying convolution and self-attention for visual recognition

K Li, Y Wang, J Zhang, P Gao, G Song… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

It is a challenging task to learn discriminative representation from images and videos, due to
large local redundancy and complex global dependency in these visual data. Convolution …

Cited by 321 | Related articles | All 6 versions

[PDF] thecvf.com

MOSE: A new dataset for video object segmentation in complex scenes

H Ding, C Liu, S He, X Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Video object segmentation (VOS) aims at segmenting a particular object throughout the
entire video clip sequence. The state-of-the-art VOS methods have achieved excellent …

Cited by 94 | Related articles | All 7 versions

[PDF] mlr.press

Hiera: A hierarchical vision transformer without the bells-and-whistles

C Ryali, YT Hu, D Bolya, C Wei, H Fan… - International …, 2023 - proceedings.mlr.press

Modern hierarchical vision transformers have added several vision-specific components in
the pursuit of supervised classification performance. While these components lead to …

Cited by 73 | Related articles | All 6 versions

[PDF] thecvf.com

DropMAE: Masked autoencoders with spatial-attention dropout for tracking tasks

Q Wu, T Yang, Z Liu, B Wu, Y Shan… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this paper, we study masked autoencoder (MAE) pretraining on videos for matching-
based downstream tasks, including visual object tracking (VOT) and video object …

Cited by 71 | Related articles | All 6 versions

[PDF] neurips.cc

Associating objects with transformers for video object segmentation

Z Yang, Y Wei, Y Yang - Advances in Neural Information …, 2021 - proceedings.neurips.cc

This paper investigates how to realize better and more efficient embedding learning to tackle
the semi-supervised video object segmentation under challenging multi-object scenarios …

Cited by 267 | Related articles | All 6 versions

[PDF] thecvf.com

Putting the object back into video object segmentation

HK Cheng, SW Oh, B Price, JY Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Cutie, a video object segmentation (VOS) network with object-level memory
reading, which puts the object representation from memory back into the video object …

Cited by 40 | Related articles | All 4 versions