Learning interactions and relationships between movie characters

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com

Audio Description (AD) is the task of generating descriptions of visual content, at suitable
time intervals, for the benefit of visually impaired audiences. For movies, this presents …

Cited by 21 · Related articles · All 7 versions

AutoAD: Movie description in context

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com

The objective of this paper is an automatic Audio Description (AD) model that ingests movies
and outputs AD in text form. Generating high-quality movie AD is challenging due to the …

Cited by 36 · Related articles · All 7 versions

MM-VID: Advancing video understanding with GPT-4V(ision)

K Lin, F Ahmed, L Li, CC Lin, E Azarnasab… - arXiv preprint arXiv …, 2023 - arxiv.org

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V,
combined with specialized tools in vision, audio, and speech, to facilitate advanced video …

Cited by 36 · Related articles · All 2 versions

Condensed movies: Story based retrieval with contextual embeddings

M Bain, A Nagrani, A Brown… - Proceedings of the …, 2020 - openaccess.thecvf.com

Our objective in this work is the long range understanding of the narrative structure of
movies. Instead of considering the entire movie, we propose to learn from the 'key scenes' of …

Cited by 100 · Related articles · All 10 versions

How You Feelin'? Learning Emotions and Mental States in Movie Scenes

D Srivastava, AK Singh… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Movie story analysis requires understanding characters' emotions and mental states.
Towards this goal, we formulate emotion understanding as predicting a diverse and multi …

Cited by 11 · Related articles · All 6 versions

LAEO-Net: Revisiting people looking at each other in videos

MJ Marin-Jimenez, V Kalogeiton… - Proceedings of the …, 2019 - openaccess.thecvf.com

Capturing the 'mutual gaze' of people is essential for understanding and interpreting the
social interactions between them. To this end, this paper addresses the problem of detecting …

Cited by 66 · Related articles · All 26 versions

Face, body, voice: Video person-clustering with multiple modalities

A Brown, V Kalogeiton… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

The objective of this work is person-clustering in videos--grouping characters according to
their identity. Previous methods focus on the narrower task of face-clustering, and for the …

Cited by 31 · Related articles · All 17 versions

Social fabric: Tubelet compositions for video relation detection

S Chen, Z Shi, P Mettes… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

This paper strives to classify and detect the relationship between object tubelets appearing
within a video as a <subject-predicate-object> triplet. Where existing works treat object …

Cited by 25 · Related articles · All 7 versions

Linking the characters: Video-oriented social graph generation via hierarchical-cumulative GCN

S Wu, J Chen, T Xu, L Chen, L Wu, Y Hu… - Proceedings of the 29th …, 2021 - dl.acm.org

Recent years have witnessed the booming of online video platforms. Along this line, a graph
to illustrate social relation among characters has been long expected to not only benefit the …

Cited by 23 · Related articles · All 6 versions

Learning social relationship from videos via pre-trained multimodal transformer

Y Teng, C Song, B Wu - IEEE Signal Processing Letters, 2022 - ieeexplore.ieee.org

As a crucial task for video analysis, social relation recognition from characters provides
intelligent applications with great potential to better understand the behaviors or emotions of …

Cited by 12 · Related articles · All 2 versions