MovieNet: A holistic dataset for movie understanding

Q Huang, Y Xiong, A Rao, J Wang, D Lin - Computer Vision–ECCV 2020 …, 2020 - Springer
Recent years have seen remarkable advances in visual understanding. However, how to
understand a story-based long video with artistic styles, e.g., movie, remains challenging. In …

HVPR: Hybrid voxel-point representation for single-stage 3D object detection

J Noh, S Lee, B Ham - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
We address the problem of 3D object detection, that is, estimating 3D object bounding boxes
from point clouds. 3D object detection methods exploit either voxel-based or point-based …

Category-level 6D object pose estimation via cascaded relation and recurrent reconstruction networks

J Wang, K Chen, Q Dou - 2021 IEEE/RSJ International …, 2021 - ieeexplore.ieee.org
Category-level 6D pose estimation, aiming to predict the location and orientation of unseen
object instances, is fundamental to many scenarios such as robotic manipulation and …

A unified framework for shot type classification based on subject centric lens

A Rao, J Wang, L Xu, X Jiang, Q Huang, B Zhou… - Computer Vision–ECCV …, 2020 - Springer
Shots are key narrative elements of various videos, e.g., movies, TV series, and user-generated
videos that are thriving over the Internet. The types of shots greatly influence how …

AVA-AVD: Audio-visual speaker diarization in the wild

EZ Xu, Z Song, S Tsutsui, C Feng, M Ye… - Proceedings of the 30th …, 2022 - dl.acm.org
Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory
and visual signals. Existing audio-visual diarization datasets are mainly focused on indoor …

MovieCuts: A new dataset and benchmark for cut type recognition

A Pardo, FC Heilbron, JL Alcázar, A Thabet… - … on Computer Vision, 2022 - Springer
Understanding movies and their structural patterns is a crucial task in decoding the craft of
video editing. While previous works have developed tools for general analysis, such as …

Face, body, voice: Video person-clustering with multiple modalities

A Brown, V Kalogeiton… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
The objective of this work is person-clustering in videos--grouping characters according to
their identity. Previous methods focus on the narrower task of face-clustering, and for the …

End-to-end audio-visual neural speaker diarization

MK He, J Du, CH Lee - Proc. Interspeech, 2022 - drive.google.com
In this paper, we propose a novel end-to-end neural-network-based audio-visual speaker
diarization method. Unlike most existing audio-visual methods, our audio-visual model takes …

Learning to cut by watching movies

A Pardo, F Caba, JL Alcázar… - Proceedings of the …, 2021 - openaccess.thecvf.com
Video content creation keeps growing at an incredible pace; yet, creating engaging stories
remains challenging and requires non-trivial video editing expertise. Many video editing …

Transformation vs tradition: Artificial general intelligence (AGI) for arts and humanities

Z Liu, Y Li, Q Cao, J Chen, T Yang, Z Wu, J Hale… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in artificial general intelligence (AGI), particularly large language models
and creative image generation systems, have demonstrated impressive capabilities on …