Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

Q Chen, S Di, W Xie - arXiv preprint arXiv:2408.14469, 2024 - arxiv.org
This paper considers the problem of Multi-Hop Video Question Answering (MH-VidQA) in
long-form egocentric videos. This task not only requires answering visual questions, but also …

ActionVOS: Actions as Prompts for Video Object Segmentation

L Ouyang, R Liu, Y Huang, R Furuta, Y Sato - arXiv preprint arXiv …, 2024 - arxiv.org
Delving into the realm of egocentric vision, the advancement of referring video object
segmentation (RVOS) stands as pivotal in understanding human activities. However …

AMEGO: Active Memory from long EGOcentric videos

G Goletto, T Nagarajan, G Averta, D Damen - arXiv preprint arXiv …, 2024 - arxiv.org
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their
unstructured nature presents challenges for perception. In this paper, we introduce AMEGO …