Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding

T Zhang, X Li, H Fei, H Yuan, S Wu, S Ji… - arXiv preprint arXiv …, 2024 - arxiv.org
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …

VISA: Reasoning video object segmentation via large language models

C Yan, H Wang, S Yan, X Jiang, Y Hu, G Kang… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing Video Object Segmentation (VOS) relies on explicit user instructions, such as
categories, masks, or short phrases, restricting their ability to perform complex video …

ViLLa: Video Reasoning Segmentation with Large Language Model

R Zheng, L Qi, X Chen, Y Wang, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Although video perception models have made remarkable advancements in recent years,
they still heavily rely on explicit text descriptions or pre-defined categories to identify target …