MovieNet: A holistic dataset for movie understanding
Recent years have seen remarkable advances in visual understanding. However, how to
understand a story-based long video with artistic styles, e.g., a movie, remains challenging. In …
HVPR: Hybrid voxel-point representation for single-stage 3D object detection
We address the problem of 3D object detection, that is, estimating 3D object bounding boxes
from point clouds. 3D object detection methods exploit either voxel-based or point-based …
Category-level 6D object pose estimation via cascaded relation and recurrent reconstruction networks
Category-level 6D pose estimation, aiming to predict the location and orientation of unseen
object instances, is fundamental to many scenarios such as robotic manipulation and …
A unified framework for shot type classification based on subject centric lens
Shots are key narrative elements of various videos, e.g., movies, TV series, and user-
generated videos that are thriving over the Internet. The types of shots greatly influence how …
AVA-AVD: Audio-visual speaker diarization in the wild
Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory
and visual signals. Existing audio-visual diarization datasets are mainly focused on indoor …
MovieCuts: A new dataset and benchmark for cut type recognition
Understanding movies and their structural patterns is a crucial task in decoding the craft of
video editing. While previous works have developed tools for general analysis, such as …
Face, body, voice: Video person-clustering with multiple modalities
A Brown, V Kalogeiton… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
The objective of this work is person-clustering in videos--grouping characters according to
their identity. Previous methods focus on the narrower task of face-clustering, and for the …
End-to-end audio-visual neural speaker diarization
In this paper, we propose a novel end-to-end neural-network-based audio-visual speaker
diarization method. Unlike most existing audio-visual methods, our audio-visual model takes …
Learning to cut by watching movies
Video content creation keeps growing at an incredible pace; yet, creating engaging stories
remains challenging and requires non-trivial video editing expertise. Many video editing …
Transformation vs tradition: Artificial general intelligence (AGI) for arts and humanities
Recent advances in artificial general intelligence (AGI), particularly large language models
and creative image generation systems, have demonstrated impressive capabilities on …