A survey on open-vocabulary detection and segmentation: Past, present, and future
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …
Compositional feature augmentation for unbiased scene graph generation
Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub,
pred, obj> in a given image. With the emergence of various advanced techniques for better …
DoraemonGPT: Toward understanding dynamic scenes with large language models
The field of AI agents is advancing at an unprecedented rate due to the capabilities of large
language models (LLMs). However, LLM-driven visual agents mainly focus on solving tasks …
NICEST: Noisy label correction and training for robust scene graph generation
Nearly all existing scene graph generation (SGG) models have overlooked the ground-truth
annotation qualities of mainstream SGG datasets, i.e., they assume: 1) all the manually …
Less is more: Toward zero-shot local scene graph generation via foundation models
Humans inherently recognize objects via selective visual perception, transform specific
regions from the visual field into structured symbolic knowledge, and reason their …
Improving reference-based distinctive image captioning with contrastive rewards
Distinctive Image Captioning (DIC)—generating distinctive captions that describe the unique
details of a target image—has received considerable attention over the last few years. A …
UAHOI: Uncertainty-aware robust interaction learning for HOI detection
This paper focuses on Human–Object Interaction (HOI) detection, addressing the challenge
of identifying and understanding the interactions between humans and objects within a …
Compositional zero-shot learning via progressive language-based observations
Compositional zero-shot learning aims to recognize unseen state-object compositions by
leveraging known primitives (state and object) during training. However, effectively modeling …
From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation
Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-structure
representation based on panoptic segmentation masks. Despite remarkable …
Gaussian Distribution-Aware Commonsense Knowledge Learning for Scene Graph Generation
Knowledge-based Scene Graph Generation (SGG) requires external commonsense
knowledge beyond the visual scene to infer the relation between objects. Such knowledge …