New generation deep learning for video object detection: A survey

L Jiao, R Zhang, F Liu, S Yang, B Hou… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Video object detection, a basic task in the computer vision field, is rapidly evolving and
widely used. In recent years, deep learning methods have rapidly become widespread in the …

A review of video object detection: Datasets, metrics and methods

H Zhu, H Wei, B Li, X Yuan, N Kehtarnavaz - Applied Sciences, 2020 - mdpi.com
Although there are well established object detection methods based on static images, their
application to video data on a frame by frame basis faces two shortcomings: (i) lack of …

Global tracking transformers

X Zhou, T Yin, V Koltun… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We present a novel transformer-based architecture for global multi-object tracking. Our
network takes a short sequence of frames as input and produces global trajectories for all …

Transflow: Transformer as flow learner

Y Lu, Q Wang, S Ma, T Geng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Optical flow is an indispensable building block for various important computer vision tasks,
including motion estimation, object tracking, and disparity measurement. In this work, we …

Detection and tracking meet drones challenge

P Zhu, L Wen, D Du, X Bian, H Fan… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Drones, or general UAVs, equipped with cameras have been fast deployed with a wide
range of applications, including agriculture, aerial photography, and surveillance …

Tf-blender: Temporal feature blender for video object detection

Y Cui, L Yan, Z Cao, D Liu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Video object detection is a challenging task because isolated video frames may
encounter appearance deterioration, which introduces great confusion for detection. One of …

Disentangled non-local neural networks

M Yin, Z Yao, Y Cao, X Li, Z Zhang, S Lin… - Computer Vision–ECCV …, 2020 - Springer
The non-local block is a popular module for strengthening the context modeling ability of a
regular convolutional neural network. This paper first studies the non-local block in depth …

Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

Memory enhanced global-local aggregation for video object detection

Y Chen, Y Cao, H Hu, L Wang - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
How do humans recognize an object in a piece of video? Due to the deteriorated quality of
single frame, it may be hard for people to identify an occluded object in this frame by just …

TransVOD: end-to-end video object detection with spatial-temporal transformers

Q Zhou, X Li, L He, Y Yang, G Cheng… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the
need for many hand-designed components in object detection while demonstrating good …