Zero-shot visual relation detection via composite visual cues from large language models

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org

As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …

Cited by: 16 (7 versions)

Compositional feature augmentation for unbiased scene graph generation

L Li, G Chen, J Xiao, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub,
pred, obj> in a given image. With the emergence of various advanced techniques for better …

Cited by: 24 (5 versions)

DoraemonGPT: Toward understanding dynamic scenes with large language models

Z Yang, G Chen, X Li, W Wang, Y Yang - arXiv preprint arXiv:2401.08392, 2024 - arxiv.org

The field of AI agents is advancing at an unprecedented rate due to the capabilities of large
language models (LLMs). However, LLM-driven visual agents mainly focus on solving tasks …

Cited by: 18 (2 versions)

NICEST: Noisy label correction and training for robust scene graph generation

L Li, J Xiao, H Shi, H Zhang, Y Yang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Nearly all existing scene graph generation (SGG) models have overlooked the ground-truth
annotation qualities of mainstream SGG datasets, i.e., they assume: 1) all the manually …

Cited by: 17 (4 versions)

Less is more: Toward zero-shot local scene graph generation via foundation models

S Zhao, H Xu - arXiv preprint arXiv:2310.01356, 2023 - arxiv.org

Humans inherently recognize objects via selective visual perception, transform specific
regions from the visual field into structured symbolic knowledge, and reason their …

Cited by: 3 (3 versions)

Improving reference-based distinctive image captioning with contrastive rewards

Y Mao, J Xiao, D Zhang, M Cao, J Shao… - ACM Transactions on …, 2023 - dl.acm.org

Distinctive Image Captioning (DIC)—generating distinctive captions that describe the unique
details of a target image—has received considerable attention over the last few years. A …

Cited by: 4 (2 versions)

UAHOI: Uncertainty-aware robust interaction learning for HOI detection

M Chen, M Chen, Y Yang - Computer Vision and Image Understanding, 2024 - Elsevier

This paper focuses on Human–Object Interaction (HOI) detection, addressing the challenge
of identifying and understanding the interactions between humans and objects within a …

Cited by: 1 (4 versions)

Compositional zero-shot learning via progressive language-based observations

L Li, G Chen, J Xiao, L Chen - arXiv preprint arXiv:2311.14749, 2023 - arxiv.org

Compositional zero-shot learning aims to recognize unseen state-object compositions by
leveraging known primitives (state and object) during training. However, effectively modeling …

Cited by: 4 (2 versions)

From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation

H Shi, L Li, J Xiao, Y Zhuang, L Chen - International Journal of Computer …, 2024 - Springer

Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-structure
representation based on panoptic segmentation masks. Despite remarkable …

Gaussian Distribution-Aware Commonsense Knowledge Learning for Scene Graph Generation

H Tian, N Xu, M Kankanhalli… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Knowledge-based Scene Graph Generation (SGG) requires external commonsense
knowledge beyond the visual scene to infer the relation between objects. Such knowledge …