Rethinking the two-stage framework for grounded situation recognition

A Zhang, Y Yao, Q Chen, W Ji, Z Liu, M Sun… - European conference on …, 2022 - Springer

Scene graph generation (SGG) is designed to extract (subject, predicate, object) triplets in
images. Recent works have made steady progress on SGG, and provide useful tools for …

Cited by 87 Related articles All 6 versions

Panoptic scene graph generation with semantics-prototype learning

L Li, W Ji, Y Wu, M Li, Y Qin, L Wei… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships
(predicate) to connect human language and visual scenes. However, different language …

Cited by 25 Related articles All 3 versions

Constructing holistic spatio-temporal scene graph for video semantic role labeling

Y Zhao, H Fei, Y Cao, B Li, M Zhang, J Wei… - Proceedings of the 31st …, 2023 - dl.acm.org

As one of the core video semantic understanding tasks, Video Semantic Role Labeling
(VidSRL) aims to detect the salient events from given videos, by recognizing the predict …

Cited by 35 Related articles All 5 versions

Gsrformer: Grounded situation recognition transformer with alternate semantic attention refinement

ZQ Cheng, Q Dai, S Li, T Mitamura… - Proceedings of the 30th …, 2022 - dl.acm.org

Grounded Situation Recognition (GSR) aims to generate structured semantic summaries of
images for "human-like" event understanding. Specifically, GSR task not only detects the …

Cited by 41 Related articles All 6 versions

Open scene understanding: Grounded situation recognition meets segment anything for helping people with visual impairments

R Liu, J Zhang, K Peng, J Zheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

Grounded Situation Recognition (GSR) is capable of recognizing and interpreting
visual scenes in a contextually intuitive way, yielding salient activities (verbs) and the …

Cited by 12 Related articles All 7 versions

Grounded video situation recognition

Z Khan, CV Jawahar… - Advances in Neural …, 2022 - proceedings.neurips.cc

Dense video understanding requires answering several questions such as who is doing
what to whom, with what, how, why, and where. Recently, Video Situation Recognition …

Cited by 15 Related articles All 9 versions

Training multimedia event extraction with generated images and captions

Z Du, Y Li, X Guo, Y Sun, B Li - … of the 31st ACM International Conference …, 2023 - dl.acm.org

Contemporary news reporting increasingly features multimedia content, motivating research
on multimedia event extraction. However, the task lacks annotated multimodal training data …

Cited by 10 Related articles All 6 versions

Biased-predicate annotation identification via unbiased visual predicate representation

L Li, C Wang, Y Qin, W Ji, R Liang - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Panoptic Scene Graph Generation (PSG) translates visual scenes to structured linguistic
descriptions, i.e., mapping visual instances to subjects/objects, and their relationships to …

Cited by 9 Related articles

Ambiguous images with human judgments for robust visual event classification

K Sanders, R Kriz, A Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc

Contemporary vision benchmarks predominantly consider tasks on which humans can
achieve near-perfect performance. However, humans are frequently presented with visual …

Cited by 11 Related articles All 5 versions

Video event extraction via tracking visual states of arguments

G Yang, M Li, J Zhang, X Lin, H Ji… - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Video event extraction aims to detect salient events from a video and identify the arguments
for each event as well as their semantic roles. Existing methods focus on capturing the …

Cited by 11 Related articles All 6 versions