Streaming Dense Video Captioning

X Zhou, A Arnab, S Buch, S Yan… - Proceedings of the …, 2024 - openaccess.thecvf.com
An ideal model for dense video captioning--predicting captions localized temporally in a
video--should be able to handle long input videos, predict rich, detailed textual descriptions …

Streaming Dense Video Captioning

X Zhou, A Arnab, S Buch, S Yan, A Myers… - arXiv preprint arXiv …, 2024 - arxiv.org
An ideal model for dense video captioning--predicting captions localized temporally in a
video--should be able to handle long input videos, predict rich, detailed textual descriptions …

Streaming Dense Video Captioning

X Zhou, A Arnab, S Buch, S Yan, A Myers… - arXiv e …, 2024 - ui.adsabs.harvard.edu
An ideal model for dense video captioning--predicting captions localized temporally in a
video--should be able to handle long input videos, predict rich, detailed textual descriptions …