A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions

SK Yadav, K Tiwari, HM Pandey, SA Akbar - Knowledge-Based Systems, 2021 - Elsevier
Human activity recognition (HAR) is one of the most important and challenging problems in
computer vision. It has critical applications in a wide variety of tasks, including gaming …

A comprehensive survey on hardware-aware neural architecture search

H Benmeziane, KE Maghraoui, H Ouarnoughi… - arXiv preprint arXiv …, 2021 - arxiv.org
Neural Architecture Search (NAS) methods have been growing in popularity. These
techniques have been fundamental to automating and speeding up the time-consuming and error …

MoViNets: Mobile video networks for efficient video recognition

D Kondratyuk, L Yuan, Y Li, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present Mobile Video Networks (MoViNets), a family of computation and
memory efficient video networks that can operate on streaming video for online inference …

VidTr: Video transformer without convolutions

Y Zhang, X Li, C Liu, B Shuai, Y Zhu… - Proceedings of the …, 2021 - openaccess.thecvf.com
We introduce Video Transformer (VidTr) with separable-attention for video
classification. Compared with commonly used 3D networks, VidTr is able to aggregate …

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y Xiong, C Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …

Enable deep learning on mobile devices: Methods, systems, and applications

H Cai, J Lin, Y Lin, Z Liu, H Tang, H Wang… - ACM Transactions on …, 2022 - dl.acm.org
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial
intelligence (AI), including computer vision, natural language processing, and speech …

AR-Net: Adaptive frame resolution for efficient action recognition

Y Meng, CC Lin, R Panda, P Sattigeri… - Computer Vision–ECCV …, 2020 - Springer
Action recognition is an open and challenging problem in computer vision. While current
state-of-the-art models offer excellent recognition results, their computational expense limits …

TokenLearner: What can 8 learned tokens do for images and videos?

MS Ryoo, AJ Piergiovanni, A Arnab… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we introduce a novel visual representation learning which relies on a handful
of adaptively learned tokens, and which is applicable to both image and video …

Can weight sharing outperform random architecture search? An investigation with TuNAS

G Bender, H Liu, B Chen, G Chu… - Proceedings of the …, 2020 - openaccess.thecvf.com
Efficient Neural Architecture Search methods based on weight sharing have shown
good promise in democratizing Neural Architecture Search for computer vision models …

FrameExit: Conditional early exiting for efficient video recognition

A Ghodrati, BE Bejnordi… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
In this paper, we propose a conditional early exiting framework for efficient video
recognition. While existing works focus on selecting a subset of salient frames to reduce the …