AutoAD II: The sequel - who, when, and what in movie audio description

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com
Audio Description (AD) is the task of generating descriptions of visual content, at suitable
time intervals, for the benefit of visually impaired audiences. For movies, this presents …

AutoAD: Movie description in context

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com
The objective of this paper is an automatic Audio Description (AD) model that ingests movies
and outputs AD in text form. Generating high-quality movie AD is challenging due to the …

MM-VID: Advancing video understanding with GPT-4V(ision)

K Lin, F Ahmed, L Li, CC Lin, E Azarnasab… - arXiv preprint arXiv …, 2023 - arxiv.org
We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V,
combined with specialized tools in vision, audio, and speech, to facilitate advanced video …

Condensed movies: Story based retrieval with contextual embeddings

M Bain, A Nagrani, A Brown… - Proceedings of the …, 2020 - openaccess.thecvf.com
Our objective in this work is the long range understanding of the narrative structure of
movies. Instead of considering the entire movie, we propose to learn from the 'key scenes' of …

How You Feelin'? Learning Emotions and Mental States in Movie Scenes

D Srivastava, AK Singh… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Movie story analysis requires understanding characters' emotions and mental states.
Towards this goal, we formulate emotion understanding as predicting a diverse and multi …

LAEO-Net: Revisiting people looking at each other in videos

MJ Marin-Jimenez, V Kalogeiton… - Proceedings of the …, 2019 - openaccess.thecvf.com
Capturing the 'mutual gaze' of people is essential for understanding and interpreting the
social interactions between them. To this end, this paper addresses the problem of detecting …

Face, body, voice: Video person-clustering with multiple modalities

A Brown, V Kalogeiton… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
The objective of this work is person-clustering in videos--grouping characters according to
their identity. Previous methods focus on the narrower task of face-clustering, and for the …

Social fabric: Tubelet compositions for video relation detection

S Chen, Z Shi, P Mettes… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
This paper strives to classify and detect the relationship between object tubelets appearing
within a video as a <subject-predicate-object> triplet. Where existing works treat object …

Linking the characters: Video-oriented social graph generation via hierarchical-cumulative GCN

S Wu, J Chen, T Xu, L Chen, L Wu, Y Hu… - Proceedings of the 29th …, 2021 - dl.acm.org
Recent years have witnessed the booming of online video platforms. Along this line, a graph
to illustrate social relation among characters has been long expected to not only benefit the …

Learning social relationship from videos via pre-trained multimodal transformer

Y Teng, C Song, B Wu - IEEE Signal Processing Letters, 2022 - ieeexplore.ieee.org
As a crucial task for video analysis, social relation recognition from characters provides
intelligent applications with great potential to better understand the behaviors or emotions of …