Self-supervised visual feature learning with deep neural networks: A survey

L Jing, Y Tian - IEEE transactions on pattern analysis and …, 2020 - ieeexplore.ieee.org
Large-scale labeled data are generally required to train deep neural networks in order to
obtain better performance in visual feature learning from images or videos for computer …

Video-based human activity recognition using deep learning approaches

GAS Surek, LO Seman, SF Stefenon, VC Mariani… - Sensors, 2023 - mdpi.com
Due to its capacity to gather vast, high-level data about human activity from wearable or
stationary sensors, human activity recognition substantially impacts people's day-to-day …

Self-supervised spatiotemporal feature learning via video rotation prediction

L Jing, X Yang, J Liu, Y Tian - arXiv preprint arXiv:1811.11387, 2018 - arxiv.org
The success of deep neural networks generally requires a vast amount of training data to be
labeled, which is expensive and unfeasible in scale, especially for video collections. To …

Ship target detection and identification based on SSD_MobilenetV2

Y Zou, L Zhao, S Qin, M Pan, Z Li - 2020 IEEE 5th Information …, 2020 - ieeexplore.ieee.org
There are many deep learning algorithms currently used in ship supervision, but they
generally have the problems of insufficient target detection speed and accurate identification …

Spatial self-attention network with self-attention distillation for fine-grained image recognition

AA Baffour, Z Qin, Y Wang, Z Qin, KKR Choo - Journal of Visual …, 2021 - Elsevier
The underlining task for fine-grained image recognition captures both the inter-class and
intra-class discriminate features. Existing methods generally use auxiliary data to guide the …

FACS3D-Net: 3D convolution based spatiotemporal representation for action unit detection

L Yang, IO Ertugrul, JF Cohn, Z Hammal… - 2019 8th …, 2019 - ieeexplore.ieee.org
Most approaches to automatic facial action unit (AU) detection consider only spatial
information and ignore AU dynamics. For humans, dynamics improves AU perception. Is …

3D deformable convolution temporal reasoning network for action recognition

Y Ou, Z Chen - Journal of Visual Communication and Image …, 2023 - Elsevier
Modeling and reasoning of the interactions between multiple entities (actors and objects)
are beneficial for the action recognition task. In this paper, we propose a 3D Deformable …

Recognizing American Sign Language manual signs from RGB-D videos

L Jing, E Vahdani, M Huenerfauth, Y Tian - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we propose a 3D Convolutional Neural Network (3DCNN) based multi-stream
framework to recognize American Sign Language (ASL) manual signs (consisting of …

Analysis of pruned neural networks (MobileNetV2-YOLO v2) for underwater object detection

AF Ayob, K Khairuddin, YM Mustafah, AR Salisa… - Proceedings of the 11th …, 2021 - Springer
Underwater object detection involves the activity of multiple object identification within a
dynamic and noisy environment. Such task is challenging due to the inconsistency of …

G‐YOLOX: A Lightweight Network for Detecting Vehicle Types

Q Luo, J Wang, M Gao, H Lin, H Zhou… - Journal of …, 2022 - Wiley Online Library
In recent years, vehicle type detection has had an important role in traffic management. A
lightweight detection network based on multiscale ghost convolution called G‐YOLOX is …