Dynamic neural networks: A survey

Y Han, G Huang, S Song, L Yang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Dynamic neural network is an emerging research topic in deep learning. Compared to static
models which have fixed computational graphs and parameters at the inference stage …

IA-RED: Interpretability-Aware Redundancy Reduction for Vision Transformers

B Pan, R Panda, Y Jiang, Z Wang… - Advances in Neural …, 2021 - proceedings.neurips.cc
The self-attention-based model, transformer, is recently becoming the leading backbone in
the field of computer vision. In spite of the impressive success made by transformers in a …

An image is worth 16x16 words, what is a video worth?

G Sharir, A Noy, L Zelnik-Manor - arXiv preprint arXiv:2103.13915, 2021 - arxiv.org
Leading methods in the domain of action recognition try to distill information from both the
spatial and temporal dimensions of an input video. Methods that reach State of the Art (SotA) …

Adaptive focus for efficient video recognition

Y Wang, Z Chen, H Jiang, S Song… - Proceedings of the …, 2021 - openaccess.thecvf.com
In this paper, we explore the spatial redundancy in video recognition with the aim to improve
the computational efficiency. It is observed that the most informative region in each frame of …

Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning

M Sponner, B Waschneck, A Kumar - ACM Computing Surveys, 2024 - dl.acm.org
Adaptive optimization methods for deep learning adjust the inference task to the current
circumstances at runtime to improve the resource footprint while maintaining the model's …

Dynamic network quantization for efficient video inference

X Sun, R Panda, CFR Chen, A Oliva… - Proceedings of the …, 2021 - openaccess.thecvf.com
Deep convolutional networks have recently achieved great success in video recognition, yet
their practical realization remains a challenge due to the large amount of computational …

AdaMML: Adaptive multi-modal learning for efficient video recognition

R Panda, CFR Chen, Q Fan, X Sun… - Proceedings of the …, 2021 - openaccess.thecvf.com
Multi-modal learning, which focuses on utilizing various modalities to improve the
performance of a model, is widely used in video recognition. While traditional multi-modal …

AdaFuse: Adaptive temporal fusion network for efficient action recognition

Y Meng, R Panda, CC Lin, P Sattigeri… - arXiv preprint arXiv …, 2021 - arxiv.org
Temporal modelling is the key for efficient video action recognition. While understanding
temporal information can improve recognition accuracy for dynamic actions, removing …

Redundancy-aware transformer for video question answering

Y Li, X Yang, A Zhang, C Feng, X Wang… - Proceedings of the 31st …, 2023 - dl.acm.org
This paper identifies two kinds of redundancy in the current VideoQA paradigm. Specifically,
the current video encoders tend to holistically embed all video clues at different granularities …

Stop or forward: Dynamic layer skipping for efficient action recognition

J Seon, J Hwang, J Mun, B Han - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
One of the challenges for analyzing video contents (e.g., actions) is high computational cost,
especially for the tasks that require processing densely sampled frames in a long video. We …