A comprehensive survey of scene graphs: Generation and application
A scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …
Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …
CPT: Colorful prompt tuning for pre-trained vision-language models
Vision-Language Pre-training (VLP) models have shown promising capabilities in
grounding natural language in image data, facilitating a broad range of cross-modal tasks …
Causal intervention for weakly-supervised semantic segmentation
We present a causal inference framework to improve Weakly-Supervised Semantic
Segmentation (WSSS). Specifically, we aim to generate better pixel-level pseudo-masks by …
Panoptic scene graph generation
Existing research addresses scene graph generation (SGG)—a critical technology for scene
understanding in images—from a detection perspective, i.e., objects are detected using …
Multi-modal knowledge graph construction and application: A survey
Recent years have witnessed the resurgence of knowledge engineering which is featured
by the fast growth of knowledge graphs. However, most existing knowledge graphs are …
Unbiased scene graph generation from biased training
Today's scene graph generation (SGG) task is still far from practical, mainly due to the
severe training bias, e.g., collapsing diverse "human walk on/sit on/lay on beach" into "human …
Bipartite graph network with adaptive message passing for unbiased scene graph generation
Scene graph generation is an important visual understanding task with a broad range of
vision applications. Despite recent tremendous progress, it remains challenging due to the …
Auto-encoding scene graphs for image captioning
We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language
inductive bias into the encoder-decoder image captioning framework for more human-like …
MuKEA: Multimodal knowledge extraction and accumulation for knowledge-based visual question answering
Knowledge-based visual question answering requires the ability of associating
external knowledge for open-ended cross-modal scene understanding. One limitation of …