End-to-end learning of motion representation for video understanding

Deep learning and transfer learning for device-free human activity recognition: A survey

J Yang, Y Xu, H Cao, H Zou, L Xie - Journal of Automation and Intelligence, 2022 - Elsevier

Device-free activity recognition plays a crucial role in smart building, security, and
human–computer interaction, which shows its strength in its convenience and cost-efficiency …

Cited by 37 Related articles

Deep learning for multiple object tracking: a survey

Y Xu, X Zhou, S Chen, F Li - IET Computer Vision, 2019 - Wiley Online Library

Deep learning has been proved effective in multiple object tracking, which confronts the
difficulties of frequent occlusions, confusing appearance, in‐and‐out objects, and lack of …

Cited by 122 Related articles All 6 versions

Targeted supervised contrastive learning for long-tailed recognition

T Li, P Cao, Y Yuan, L Fan, Y Yang… - Proceedings of the …, 2022 - openaccess.thecvf.com

Real-world data often exhibits long tail distributions with heavy class imbalance, where the
majority classes can dominate the training process and alter the decision boundaries of the …

Cited by 188 Related articles All 6 versions

RAFT: Recurrent all-pairs field transforms for optical flow

Z Teed, J Deng - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer

We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network
architecture for optical flow. RAFT extracts per-pixel features, builds multi-scale 4D …

Cited by 2301 Related articles All 12 versions

X3D: Expanding architectures for efficient video recognition

C Feichtenhofer - Proceedings of the IEEE/CVF conference …, 2020 - openaccess.thecvf.com

This paper presents X3D, a family of efficient video networks that progressively expand a
tiny 2D image classification architecture along multiple network axes, in space, time, width …

Cited by 1081 Related articles All 7 versions

Graph convolutional networks for temporal action localization

R Zeng, W Huang, M Tan, Y Rong… - Proceedings of the …, 2019 - openaccess.thecvf.com

Most state-of-the-art action localization systems process each action proposal individually,
without explicitly exploiting their relations during learning. However, the relations between …

Cited by 575 Related articles All 8 versions

Deep multimodal fusion by channel exchanging

Y Wang, W Huang, F Sun, T Xu… - Advances in neural …, 2020 - proceedings.neurips.cc

Deep multimodal fusion by using multiple sources of data for classification or regression has
exhibited a clear advantage over the unimodal counterpart on various applications. Yet …

Cited by 242 Related articles All 6 versions

STM: Spatiotemporal and motion encoding for action recognition

B Jiang, MM Wang, W Gan, W Wu… - Proceedings of the …, 2019 - openaccess.thecvf.com

Spatiotemporal and motion features are two complementary and crucial information for
video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn …

Cited by 500 Related articles All 6 versions

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y Xiong, C Wu… - arXiv preprint arXiv …, 2020 - arxiv.org

Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …

Cited by 207 Related articles All 2 versions

Quality assessment of in-the-wild videos

D Li, T Jiang, M Jiang - Proceedings of the 27th ACM international …, 2019 - dl.acm.org

Quality assessment of in-the-wild videos is a challenging problem because of the absence
of reference videos and shooting distortions. Knowledge of the human visual system can …

Cited by 299 Related articles All 9 versions