Deep learning modelling techniques: current progress, applications, advantages, and challenges

SF Ahmed, MSB Alam, M Hassan, MR Rozbu… - Artificial Intelligence …, 2023 - Springer
Deep learning (DL) is revolutionizing evidence-based decision-making techniques that can
be applied across various sectors. Specifically, it possesses the ability to utilize two or more …

Domain generalization: A survey

K Zhou, Z Liu, Y Qiao, T Xiang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Generalization to out-of-distribution (OOD) data is a capability natural to humans yet
challenging for machines to reproduce. This is because most learning algorithms strongly …

Fast inference from transformers via speculative decoding

Y Leviathan, M Kalman… - … Conference on Machine …, 2023 - proceedings.mlr.press
Inference from large autoregressive models like Transformers is slow: decoding K tokens
takes K serial runs of the model. In this work we introduce speculative decoding, an …
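The snippet above states the core cost argument: naive autoregressive decoding spends one serial run of the large model per token, while speculative decoding drafts several tokens cheaply and verifies them in a single parallel pass. Below is a minimal Python sketch of that idea only, not the paper's algorithm: `target_next`, `draft_next`, and `gamma` are illustrative toy names, the models are deterministic stand-ins, and acceptance is exact greedy matching, so both decoders produce identical output while the speculative one uses fewer target calls.

```python
def target_next(prefix):
    """Toy stand-in for one expensive serial run of the large model."""
    return (sum(prefix) * 31 + len(prefix)) % 7

def draft_next(prefix):
    """Cheap draft model: mostly agrees with the target, sometimes not."""
    t = target_next(prefix)
    return (t + 1) % 7 if sum(prefix) % 5 == 0 else t

def naive_decode(prefix, k):
    """Decoding k tokens costs k serial runs of the target model."""
    out, calls = list(prefix), 0
    for _ in range(k):
        out.append(target_next(out))
        calls += 1
    return out, calls

def speculative_decode(prefix, k, gamma=4):
    """Draft gamma tokens cheaply, then verify them with one target pass.

    In a real system the target scores all gamma positions in a single
    parallel forward pass; here that pass is simulated and counted as
    one call. With greedy decoding the output matches naive_decode
    token for token.
    """
    out, calls = list(prefix), 0
    while len(out) - len(prefix) < k:
        draft = list(out)
        for _ in range(gamma):            # cheap speculative tokens
            draft.append(draft_next(draft))
        calls += 1                        # one parallel verification pass
        cur = list(out)
        for i in range(gamma):
            t = target_next(cur)          # target's true next token
            cur.append(t)                 # accepted match, or a free correction
            if t != draft[len(out) + i]:  # first mismatch ends the run
                break
        out = cur
    return out[: len(prefix) + k], calls
```

Because the draft usually agrees with the target here, each verification pass accepts several tokens at once, so the speculative call count stays well below K.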

Flatten transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
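The snippet above contrasts quadratic softmax attention with linear attention. The complexity drop comes from a reassociation: with a positive feature map φ, the N×N matrix φ(Q)φ(K)ᵀ never needs to be materialized, since φ(Q)(φ(K)ᵀV) gives the same result in O(N·d²). A small NumPy sketch of that identity (the ReLU-based φ is an illustrative choice, not the focused attention of the paper above):

```python
import numpy as np

def phi(x):
    """An illustrative positive feature map (kept strictly > 0)."""
    return np.maximum(x, 0.0) + 1e-6

def quadratic_attention(Q, K, V):
    """Materializes the N x N attention matrix: O(N^2 * d)."""
    A = phi(Q) @ phi(K).T                 # N x N similarity scores
    A = A / A.sum(axis=1, keepdims=True)  # row-normalize
    return A @ V

def linear_attention(Q, K, V):
    """Reassociates to phi(Q) @ (phi(K)^T V): O(N * d^2), no N x N matrix."""
    qf, kf = phi(Q), phi(K)
    kv = kf.T @ V                # d x d summary of keys and values
    z = qf @ kf.sum(axis=0)      # per-query normalizer, length N
    return (qf @ kv) / z[:, None]

rng = np.random.default_rng(0)
N, d = 6, 4
Q, K, V = rng.standard_normal((3, N, d))
```

Both routes compute the same output; only the order of matrix products, and hence the asymptotic cost in sequence length N, differs.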

Simmim: A simple framework for masked image modeling

Z Xie, Z Zhang, Y Cao, Y Lin, J Bao… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper presents SimMIM, a simple framework for masked image modeling. We have
simplified recently proposed relevant approaches, without the need for special designs …

A-vit: Adaptive tokens for efficient vision transformer

H Yin, A Vahdat, JM Alvarez, A Mallya… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce A-ViT, a method that adaptively adjusts the inference cost of the vision
transformer (ViT) for images of different complexity. A-ViT achieves this by automatically reducing the …

Not all patches are what you need: Expediting vision transformers via token reorganizations

Y Liang, C Ge, Z Tong, Y Song, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Vision Transformers (ViTs) take all the image patches as tokens and construct multi-head
self-attention (MHSA) among them. Complete leverage of these image tokens brings …

Are multimodal transformers robust to missing modality?

M Ma, J Ren, L Zhao, D Testuggine… - Proceedings of the …, 2022 - openaccess.thecvf.com
Multimodal data collected from the real world are often imperfect due to missing modalities.
Therefore, multimodal models that are robust against modal-incomplete data are highly …

Adaptive rotated convolution for rotated object detection

Y Pu, Y Wang, Z Xia, Y Han, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Rotated object detection aims to identify and locate objects in images with arbitrary
orientation. In this scenario, the oriented directions of objects vary considerably across …

A dynamic multi-scale voxel flow network for video prediction

X Hu, Z Huang, A Huang, J Xu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
The performance of video prediction has been greatly boosted by advanced deep neural
networks. However, most of the current methods suffer from large model sizes and require …