Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Open-vocabulary SAM: Segment and recognize twenty-thousand classes interactively

H Yuan, X Li, C Zhou, Y Li, K Chen, CC Loy - European Conference on …, 2025 - Springer
The CLIP and Segment Anything Model (SAM) are remarkable vision foundation
models (VFMs). SAM excels in segmentation tasks across diverse domains, whereas CLIP is …

Tube-Link: A flexible cross tube framework for universal video segmentation

X Li, H Yuan, W Zhang, G Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video segmentation aims to segment and track every pixel in diverse scenarios accurately.
In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks …

Toward general-purpose robots via foundation models: A survey and meta-analysis

Y Hu, Q Xie, V Jain, J Francis, J Patrikar… - arXiv preprint arXiv …, 2023 - arxiv.org
Building general-purpose robots that operate seamlessly in any environment, with any
object, and utilizing various skills to complete diverse tasks has been a long-standing goal in …

Towards language-driven video inpainting via multimodal large language models

J Wu, X Li, C Si, S Zhou, J Yang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce a new task, language-driven video inpainting, which uses natural language
instructions to guide the inpainting process. This approach overcomes the limitations of …

MosaicFusion: Diffusion models as data augmenters for large vocabulary instance segmentation

J Xie, W Li, X Li, Z Liu, YS Ong, CC Loy - International Journal of …, 2024 - Springer
We present MosaicFusion, a simple yet effective diffusion-based data augmentation
approach for large vocabulary instance segmentation. Our method is training-free and does …

Betrayed by captions: Joint caption grounding and generation for open vocabulary instance segmentation

J Wu, X Li, H Ding, X Li, G Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this work, we focus on open vocabulary instance segmentation to expand a segmentation
model to classify and segment instance-level novel categories. Previous approaches have …

Open-vocabulary video anomaly detection

P Wu, X Zhou, G Pang, Y Sun, J Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Current video anomaly detection (VAD) approaches with weak supervision are inherently
limited to a closed-set setting and may struggle in open-world applications where there can …

Domain generalization for semantic segmentation: A survey

TH Rafi, R Mahjabin, E Ghosh, YW Ko… - Artificial Intelligence …, 2024 - Springer
Deep neural networks (DNNs) have proven explicit contributions in making autonomous
driving cars and related tasks such as semantic segmentation, motion tracking, object …

CLIP-AD: A language-guided staged dual-path model for zero-shot anomaly detection

X Chen, J Zhang, G Tian, H He, W Zhang… - … Joint Conference on …, 2024 - Springer
This paper considers zero-shot Anomaly Detection (AD), performing AD without reference
images of the test objects. We propose a framework called CLIP-AD to leverage the zero …