VA-RED$^2$: Video Adaptive Redundancy Reduction

Dynamic neural networks: A survey

Y Han, G Huang, S Song, L Yang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Dynamic neural network is an emerging research topic in deep learning. Compared to static
models which have fixed computational graphs and parameters at the inference stage …

Cited by 683 · Related articles · All 7 versions

IA-RED: Interpretability-Aware Redundancy Reduction for Vision Transformers

B Pan, R Panda, Y Jiang, Z Wang… - Advances in Neural …, 2021 - proceedings.neurips.cc

The self-attention-based model, transformer, is recently becoming the leading backbone in
the field of computer vision. In spite of the impressive success made by transformers in a …

Cited by 140 · Related articles · All 8 versions

An image is worth 16x16 words, what is a video worth?

G Sharir, A Noy, L Zelnik-Manor - arXiv preprint arXiv:2103.13915, 2021 - arxiv.org

Leading methods in the domain of action recognition try to distill information from both the
spatial and temporal dimensions of an input video. Methods that reach State of the Art (SotA) …

Cited by 118 · Related articles · All 2 versions

Adaptive focus for efficient video recognition

Y Wang, Z Chen, H Jiang, S Song… - Proceedings of the …, 2021 - openaccess.thecvf.com

In this paper, we explore the spatial redundancy in video recognition with the aim to improve
the computational efficiency. It is observed that the most informative region in each frame of …

Cited by 107 · Related articles · All 6 versions

Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning

M Sponner, B Waschneck, A Kumar - ACM Computing Surveys, 2024 - dl.acm.org

Adaptive optimization methods for deep learning adjust the inference task to the current
circumstances at runtime to improve the resource footprint while maintaining the model's …

Cited by 4 · Related articles

Dynamic network quantization for efficient video inference

X Sun, R Panda, CFR Chen, A Oliva… - Proceedings of the …, 2021 - openaccess.thecvf.com

Deep convolutional networks have recently achieved great success in video recognition, yet
their practical realization remains a challenge due to the large amount of computational …

Cited by 49 · Related articles · All 8 versions

AdaMML: Adaptive multi-modal learning for efficient video recognition

R Panda, CFR Chen, Q Fan, X Sun… - Proceedings of the …, 2021 - openaccess.thecvf.com

Multi-modal learning, which focuses on utilizing various modalities to improve the
performance of a model, is widely used in video recognition. While traditional multi-modal …

Cited by 56 · Related articles · All 9 versions

AdaFuse: Adaptive temporal fusion network for efficient action recognition

Y Meng, R Panda, CC Lin, P Sattigeri… - arXiv preprint arXiv …, 2021 - arxiv.org

Temporal modelling is the key for efficient video action recognition. While understanding
temporal information can improve recognition accuracy for dynamic actions, removing …

Cited by 68 · Related articles · All 5 versions

Redundancy-aware transformer for video question answering

Y Li, X Yang, A Zhang, C Feng, X Wang… - Proceedings of the 31st …, 2023 - dl.acm.org

This paper identifies two kinds of redundancy in the current VideoQA paradigm. Specifically,
the current video encoders tend to holistically embed all video clues at different granularities …

Cited by 12 · Related articles · All 3 versions

Stop or forward: Dynamic layer skipping for efficient action recognition

J Seon, J Hwang, J Mun, B Han - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

One of the challenges for analyzing video contents (e.g., actions) is high computational cost,
especially for the tasks that require processing densely sampled frames in a long video. We …

Cited by 8 · Related articles · All 3 versions