Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

RelTR: Relation transformer for scene graph generation

Y Cong, MY Yang, B Rosenhahn - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Different objects in the same scene are more or less related to each other, but only a limited
number of these relationships are noteworthy. Inspired by Detection Transformer, which …

Scene graph generation: A comprehensive survey

H Li, G Zhu, L Zhang, Y Jiang, Y Dang, H Hou, P Shen… - Neurocomputing, 2024 - Elsevier
Deep learning techniques have led to remarkable breakthroughs in the field of object
detection and have spawned a lot of scene-understanding tasks in recent years. Scene …

Panoptic video scene graph generation

J Yang, W Peng, X Li, Z Guo, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Towards building comprehensive real-world visual perception systems, we propose and
study a new problem called panoptic scene graph generation (PVSG). PVSG is related to …

Dynamic scene graph generation via anticipatory pre-training

Y Li, X Yang, C Xu - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Humans can not only see the collection of objects in visual scenes, but also identify the
relationship between objects. The visual relationship in the scene can be abstracted into the …

Unbiased scene graph generation in videos

S Nag, K Min, S Tripathi… - Proceedings of the …, 2023 - openaccess.thecvf.com
The task of dynamic scene graph generation (SGG) from videos is complicated and
challenging due to the inherent dynamics of a scene, temporal fluctuation of model …

Meta spatio-temporal debiasing for video scene graph generation

L Xu, H Qu, J Kuen, J Gu, J Liu - European Conference on Computer …, 2022 - Springer
Video scene graph generation (VidSGG) aims to parse the video content into scene graphs,
which involves modeling the spatio-temporal contextual information in the video. However …

Pair then relation: Pair-net for panoptic scene graph generation

J Wang, Z Wen, X Li, Z Guo, J Yang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that
aims to create a more comprehensive scene graph representation using panoptic …

Structured sparse R-CNN for direct scene graph generation

Y Teng, L Wang - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Scene graph generation (SGG) is to detect object pairs with their relations in an image.
Existing SGG approaches often use multi-stage pipelines to decompose this task into object …

Spatial-temporal knowledge-embedded transformer for video scene graph generation

T Pu, T Chen, H Wu, Y Lu, L Lin - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org
Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer
their relationships for a given video. It requires not only a comprehensive understanding of …