Youtube-vos: A large-scale video object segmentation benchmark

S Ghosh, N Das, I Das, U Maulik - ACM computing surveys (CSUR), 2019 - dl.acm.org

The machine learning community has been overwhelmed by a plethora of deep learning--
based approaches. Many challenging computer vision tasks, such as detection, localization …

被引用次数：493 相关文章所有 4 个版本

[PDF] arxiv.org

Deep learning for visual tracking: A comprehensive survey

SM Marvasti-Zadeh, L Cheng… - IEEE Transactions …, 2021 - ieeexplore.ieee.org

Visual target tracking is one of the most sought-after yet challenging research topics in
computer vision. Given the ill-posed nature of the problem and its popularity in a broad …

被引用次数：392 相关文章所有 6 个版本

[PDF] neurips.cc

Segment everything everywhere all at once

X Zou, J Yang, H Zhang, F Li, L Li… - Advances in …, 2024 - proceedings.neurips.cc

In this work, we present SEEM, a promotable and interactive model for segmenting
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …

被引用次数：475 相关文章所有 5 个版本

[PDF] arxiv.org

Sam 2: Segment anything in images and videos

N Ravi, V Gabeur, YT Hu, R Hu, C Ryali, T Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

We present Segment Anything Model 2 (SAM 2), a foundation model towards solving
promptable visual segmentation in images and videos. We build a data engine, which …

被引用次数：265 相关文章所有 2 个版本

[PDF] neurips.cc

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc

Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

被引用次数：282 相关文章所有 12 个版本

[PDF] thecvf.com

Anydoor: Zero-shot object-level image customization

X Chen, L Huang, Y Liu, Y Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com

This work presents AnyDoor a diffusion-based image generator with the power to teleport
target objects to new scenes at user-specified locations with desired shapes. Instead of …

被引用次数：190 相关文章所有 3 个版本

[PDF] thecvf.com

Generalized decoding for pixel, image, and language

X Zou, ZY Dou, J Yang, Z Gan, L Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present X-Decoder, a generalized decoding model that can predict pixel-level
segmentation and language tokens seamlessly. X-Decoder takes as input two types of …

被引用次数：237 相关文章所有 6 个版本

[PDF] thecvf.com

Universal instance perception as object discovery and retrieval

B Yan, Y Jiang, J Wu, D Wang, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com

All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …

被引用次数：150 相关文章所有 5 个版本

[PDF] thecvf.com

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

被引用次数：125 相关文章所有 3 个版本

[PDF] arxiv.org

Xmem: Long-term video object segmentation with an atkinson-shiffrin memory model

HK Cheng, AG Schwing - European Conference on Computer Vision, 2022 - Springer

We present XMem, a video object segmentation architecture for long videos with unified
feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video …

被引用次数：378 相关文章所有 8 个版本