Fine-grained scene graph generation with data transfer

A Zhang, Y Yao, Q Chen, W Ji, Z Liu, M Sun… - European conference on …, 2022 - Springer
Scene graph generation (SGG) is designed to extract (subject, predicate, object) triplets in
images. Recent works have made steady progress on SGG, and provide useful tools for …

Panoptic scene graph generation with semantics-prototype learning

L Li, W Ji, Y Wu, M Li, Y Qin, L Wei… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships
(predicate) to connect human language and visual scenes. However, different language …

Constructing holistic spatio-temporal scene graph for video semantic role labeling

Y Zhao, H Fei, Y Cao, B Li, M Zhang, J Wei… - Proceedings of the 31st …, 2023 - dl.acm.org
As one of the core video semantic understanding tasks, Video Semantic Role Labeling
(VidSRL) aims to detect the salient events from given videos, by recognizing the predict …

GSRFormer: Grounded situation recognition transformer with alternate semantic attention refinement

ZQ Cheng, Q Dai, S Li, T Mitamura… - Proceedings of the 30th …, 2022 - dl.acm.org
Grounded Situation Recognition (GSR) aims to generate structured semantic summaries of
images for "human-like" event understanding. Specifically, GSR task not only detects the …

Open scene understanding: Grounded situation recognition meets segment anything for helping people with visual impairments

R Liu, J Zhang, K Peng, J Zheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Grounded Situation Recognition (GSR) is capable of recognizing and interpreting
visual scenes in a contextually intuitive way, yielding salient activities (verbs) and the …

Grounded video situation recognition

Z Khan, CV Jawahar… - Advances in Neural …, 2022 - proceedings.neurips.cc
Dense video understanding requires answering several questions such as who is doing
what to whom, with what, how, why, and where. Recently, Video Situation Recognition …

Training multimedia event extraction with generated images and captions

Z Du, Y Li, X Guo, Y Sun, B Li - … of the 31st ACM International Conference …, 2023 - dl.acm.org
Contemporary news reporting increasingly features multimedia content, motivating research
on multimedia event extraction. However, the task lacks annotated multimodal training data …

Biased-predicate annotation identification via unbiased visual predicate representation

L Li, C Wang, Y Qin, W Ji, R Liang - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Panoptic Scene Graph Generation (PSG) translates visual scenes to structured linguistic
descriptions, i.e., mapping visual instances to subjects/objects, and their relationships to …

Ambiguous images with human judgments for robust visual event classification

K Sanders, R Kriz, A Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Contemporary vision benchmarks predominantly consider tasks on which humans can
achieve near-perfect performance. However, humans are frequently presented with visual …

Video event extraction via tracking visual states of arguments

G Yang, M Li, J Zhang, X Lin, H Ji… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Video event extraction aims to detect salient events from a video and identify the arguments
for each event as well as their semantic roles. Existing methods focus on capturing the …