Towards open vocabulary learning: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Cited by 102 · Related articles · All 3 versions

Open-vocabulary SAM: Segment and recognize twenty-thousand classes interactively

H Yuan, X Li, C Zhou, Y Li, K Chen, CC Loy - European Conference on …, 2025 - Springer

The CLIP and Segment Anything Model (SAM) are remarkable vision foundation
models (VFMs). SAM excels in segmentation tasks across diverse domains, whereas CLIP is …

Cited by 29 · Related articles · All 2 versions

Tube-Link: A flexible cross tube framework for universal video segmentation

X Li, H Yuan, W Zhang, G Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

Video segmentation aims to segment and track every pixel in diverse scenarios accurately.
In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks …

Cited by 49 · Related articles · All 5 versions

Toward general-purpose robots via foundation models: A survey and meta-analysis

Y Hu, Q Xie, V Jain, J Francis, J Patrikar… - arXiv preprint arXiv …, 2023 - arxiv.org

Building general-purpose robots that operate seamlessly in any environment, with any
object, and utilizing various skills to complete diverse tasks has been a long-standing goal in …

Cited by 57 · Related articles · All 2 versions

Towards language-driven video inpainting via multimodal large language models

J Wu, X Li, C Si, S Zhou, J Yang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce a new task, language-driven video inpainting, which uses natural language
instructions to guide the inpainting process. This approach overcomes the limitations of …

Cited by 19 · Related articles · All 3 versions

MosaicFusion: Diffusion models as data augmenters for large vocabulary instance segmentation

J Xie, W Li, X Li, Z Liu, YS Ong, CC Loy - International Journal of …, 2024 - Springer

We present MosaicFusion, a simple yet effective diffusion-based data augmentation
approach for large vocabulary instance segmentation. Our method is training-free and does …

Cited by 28 · Related articles · All 2 versions

Betrayed by captions: Joint caption grounding and generation for open vocabulary instance segmentation

J Wu, X Li, H Ding, X Li, G Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this work, we focus on open vocabulary instance segmentation to expand a segmentation
model to classify and segment instance-level novel categories. Previous approaches have …

Cited by 30 · Related articles · All 12 versions

Open-vocabulary video anomaly detection

P Wu, X Zhou, G Pang, Y Sun, J Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Current video anomaly detection (VAD) approaches with weak supervision are inherently
limited to a closed-set setting and may struggle in open-world applications where there can …

Cited by 17 · Related articles · All 4 versions

Domain generalization for semantic segmentation: A survey

TH Rafi, R Mahjabin, E Ghosh, YW Ko… - Artificial Intelligence …, 2024 - Springer

Deep neural networks (DNNs) have proven explicit contributions in making autonomous
driving cars and related tasks such as semantic segmentation, motion tracking, object …

Cited by 2 · Related articles

CLIP-AD: A language-guided staged dual-path model for zero-shot anomaly detection

X Chen, J Zhang, G Tian, H He, W Zhang… - … Joint Conference on …, 2024 - Springer

This paper considers zero-shot Anomaly Detection (AD), performing AD without reference
images of the test objects. We propose a framework called CLIP-AD to leverage the zero …

Cited by 24 · Related articles · All 2 versions