A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions
Human activity recognition (HAR) is one of the most important and challenging problems in
computer vision. It has critical applications in a wide variety of tasks including gaming …
A comprehensive survey on hardware-aware neural architecture search
Neural Architecture Search (NAS) methods have been growing in popularity. These
techniques have been fundamental to automate and speed up the time-consuming and error …
MoViNets: Mobile video networks for efficient video recognition
We present Mobile Video Networks (MoViNets), a family of computation and
memory efficient video networks that can operate on streaming video for online inference …
VidTr: Video transformer without convolutions
We introduce Video Transformer (VidTr) with separable-attention for video
classification. Compared with commonly used 3D networks, VidTr is able to aggregate …
A comprehensive study of deep video action recognition
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …
Enable deep learning on mobile devices: Methods, systems, and applications
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial
intelligence (AI), including computer vision, natural language processing, and speech …
AR-Net: Adaptive frame resolution for efficient action recognition
Action recognition is an open and challenging problem in computer vision. While current
state-of-the-art models offer excellent recognition results, their computational expense limits …
TokenLearner: What can 8 learned tokens do for images and videos?
In this paper, we introduce a novel visual representation learning which relies on a handful
of adaptively learned tokens, and which is applicable to both image and video …
Can weight sharing outperform random architecture search? An investigation with TuNAS
Efficient Neural Architecture Search methods based on weight sharing have shown
good promise in democratizing Neural Architecture Search for computer vision models …
FrameExit: Conditional early exiting for efficient video recognition
A Ghodrati, BE Bejnordi… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
In this paper, we propose a conditional early exiting framework for efficient video
recognition. While existing works focus on selecting a subset of salient frames to reduce the …