Action Scene Graphs for Long-Form Understanding of Egocentric Videos

I Rodin, A Furnari, K Min, S Tripathi… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form
understanding of egocentric videos. EASGs extend standard manually-annotated …

VideoSAGE: Video Summarization with Graph Representation Learning

JMR Chaves, S Tripathi - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We propose a graph-based representation learning framework for video summarization.
First, we convert an input video to a graph where nodes correspond to each of the video …

Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges

V Mingote, A Ortega, A Miguel, E Lleida - arXiv preprint arXiv:2409.05659, 2024 - arxiv.org
Nowadays, the large amount of audio-visual content available has fostered the need to
develop new robust automatic speaker diarization systems to analyse and characterise it …

Beyond Words: Enhancing Natural Interaction by Recognizing Social Conversation Contexts in HRI

J Jang, Y Yoon - 2024 21st International Conference on …, 2024 - ieeexplore.ieee.org
With the ongoing advancements in AI technology, human-robot interactions have become
increasingly prevalent, extending across diverse domains such as AI speakers and service …