A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …

Compositional feature augmentation for unbiased scene graph generation

L Li, G Chen, J Xiao, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub,
pred, obj> in a given image. With the emergence of various advanced techniques for better …

DoraemonGPT: Toward understanding dynamic scenes with large language models

Z Yang, G Chen, X Li, W Wang, Y Yang - arXiv preprint arXiv:2401.08392, 2024 - arxiv.org
The field of AI agents is advancing at an unprecedented rate due to the capabilities of large
language models (LLMs). However, LLM-driven visual agents mainly focus on solving tasks …

NICEST: Noisy label correction and training for robust scene graph generation

L Li, J Xiao, H Shi, H Zhang, Y Yang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Nearly all existing scene graph generation (SGG) models have overlooked the ground-truth
annotation qualities of mainstream SGG datasets, i.e., they assume: 1) all the manually …

Less is more: Toward zero-shot local scene graph generation via foundation models

S Zhao, H Xu - arXiv preprint arXiv:2310.01356, 2023 - arxiv.org
Humans inherently recognize objects via selective visual perception, transform specific
regions from the visual field into structured symbolic knowledge, and reason their …

Improving reference-based distinctive image captioning with contrastive rewards

Y Mao, J Xiao, D Zhang, M Cao, J Shao… - ACM Transactions on …, 2023 - dl.acm.org
Distinctive Image Captioning (DIC)—generating distinctive captions that describe the unique
details of a target image—has received considerable attention over the last few years. A …

UAHOI: Uncertainty-aware robust interaction learning for HOI detection

M Chen, M Chen, Y Yang - Computer Vision and Image Understanding, 2024 - Elsevier
This paper focuses on Human–Object Interaction (HOI) detection, addressing the challenge
of identifying and understanding the interactions between humans and objects within a …

Compositional zero-shot learning via progressive language-based observations

L Li, G Chen, J Xiao, L Chen - arXiv preprint arXiv:2311.14749, 2023 - arxiv.org
Compositional zero-shot learning aims to recognize unseen state-object compositions by
leveraging known primitives (state and object) during training. However, effectively modeling …

From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation

H Shi, L Li, J Xiao, Y Zhuang, L Chen - International Journal of Computer …, 2024 - Springer
Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-
structure representation based on panoptic segmentation masks. Despite remarkable …

Gaussian Distribution-Aware Commonsense Knowledge Learning for Scene Graph Generation

H Tian, N Xu, M Kankanhalli… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Knowledge-based Scene Graph Generation (SGG) requires external commonsense
knowledge beyond the visual scene to infer the relation between objects. Such knowledge …