Deep learning modelling techniques: current progress, applications, advantages, and challenges
Deep learning (DL) is revolutionizing evidence-based decision-making techniques that can
be applied across various sectors. Specifically, it can utilize two or more …
Domain generalization: A survey
Generalization to out-of-distribution (OOD) data is a capability natural to humans yet
challenging for machines to reproduce. This is because most learning algorithms strongly …
Fast inference from transformers via speculative decoding
Y Leviathan, M Kalman… - … Conference on Machine …, 2023 - proceedings.mlr.press
Inference from large autoregressive models like Transformers is slow: decoding K tokens
takes K serial runs of the model. In this work we introduce speculative decoding, an …
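The snippet stops mid-sentence, but the core idea is simple enough to sketch: a cheap draft model proposes several tokens, and the large target model verifies them all at once, so matching target output needs fewer serial target calls. Below is a minimal toy sketch (greedy variant); the draft/target functions are hypothetical stand-ins for real language models, not the paper's models.

```python
def draft_model(prefix):
    # Toy "small" draft model (hypothetical stand-in for a cheap LM):
    # always predicts (last token + 1) mod 10.
    return (prefix[-1] + 1) % 10

def target_model(prefix):
    # Toy "large" target model: agrees with the draft except after a 5,
    # where it predicts 0 instead.
    return 0 if prefix[-1] == 5 else (prefix[-1] + 1) % 10

def greedy_decode(prefix, n_tokens):
    # Baseline: n_tokens serial calls to the target model.
    out = list(prefix)
    for _ in range(n_tokens):
        out.append(target_model(out))
    return out

def speculative_decode(prefix, n_tokens, k=4):
    # Greedy speculative decoding: the draft proposes k tokens, the
    # target checks all k positions (a single parallel pass in a real
    # Transformer), and we keep the longest agreeing prefix plus the
    # target's first correction. Output matches greedy target decoding.
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        cur, draft = list(out), []
        for _ in range(k):                 # cheap serial drafting
            t = draft_model(cur)
            draft.append(t)
            cur.append(t)
        for i in range(k):                 # verification (parallel in practice)
            t = target_model(out + draft[:i])
            out.append(t)
            if t != draft[i]:              # first disagreement: stop here
                break
    return out[:len(prefix) + n_tokens]

spec = speculative_decode([0], 12)
base = greedy_decode([0], 12)              # spec == base, with fewer target "passes"
```

The key property the sketch preserves is that the output is identical to decoding with the target model alone; the speedup comes from verifying k draft tokens per target pass instead of generating one.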
FLatten Transformer: Vision transformer using focused linear attention
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
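The complexity trade-off the snippet mentions can be made concrete with generic kernel-based linear attention: replacing softmax with a feature map φ lets the matmuls be reassociated so cost drops from O(N²d) to O(Nd²). The φ below is an illustrative positive ReLU map, not the paper's focused function.

```python
import numpy as np

def phi(x):
    # Simple positive feature map (an illustrative choice,
    # not the paper's focused linear attention function).
    return np.maximum(x, 0.0) + 1e-6

def linear_attention(Q, K, V):
    # Kernelized attention computed right-to-left:
    #   phi(Q) @ (phi(K).T @ V)   ->  O(N d^2)
    # instead of the O(N^2 d) form with an explicit N x N score matrix.
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                   # d x d summary of keys/values
    Z = Qf @ Kf.sum(axis=0)         # per-query normalizer, shape (N,)
    return (Qf @ KV) / Z[:, None]

# Equivalence check against the explicit quadratic form of the same kernel:
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 16, 8))
fast = linear_attention(Q, K, V)
A = phi(Q) @ phi(K).T
slow = (A / A.sum(axis=1, keepdims=True)) @ V   # materializes N x N
```

Both forms compute the same weighted average of V; the linear version simply never materializes the N x N attention matrix, which is what makes it attractive for long token sequences in vision tasks.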
SimMIM: A simple framework for masked image modeling
This paper presents SimMIM, a simple framework for masked image modeling. We have
simplified recently proposed relevant approaches, without the need for special designs …
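The "simple framework" amounts to masking random patch tokens and regressing the raw pixels of the masked patches with an L1 loss. A minimal sketch of that objective (the mask ratio and helper name are illustrative, not the paper's exact configuration):

```python
import numpy as np

def masked_patch_loss(patches, pred, mask_ratio=0.6, seed=0):
    # SimMIM-style objective sketch: randomly mask a fraction of patch
    # tokens and score the reconstruction with an L1 loss computed on
    # the masked positions only (visible patches contribute nothing).
    # patches, pred: (N, D) arrays of flattened patch pixels.
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_mask = int(n * mask_ratio)
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[:n_mask]] = True
    loss = np.abs(pred[mask] - patches[mask]).mean()
    return loss, mask

patches = np.random.default_rng(1).normal(size=(10, 48))  # 10 flattened patches
perfect_loss, mask = masked_patch_loss(patches, patches)  # perfect reconstruction
```

Restricting the loss to masked positions is the point: the model is rewarded for inferring hidden content from visible context, not for copying its input.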
A-ViT: Adaptive tokens for efficient vision transformer
We introduce A-ViT, a method that adaptively adjusts the inference cost of vision
transformers (ViT) for images of different complexity. A-ViT achieves this by automatically reducing the …
Not all patches are what you need: Expediting vision transformers via token reorganizations
Vision Transformers (ViTs) take all the image patches as tokens and construct multi-head
self-attention (MHSA) among them. Complete leverage of these image tokens brings …
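A rough sketch of the kind of attention-guided token reduction this line of work uses: keep the patch tokens most attended by the class token and fuse the inattentive rest into a single token, weighted by their attention. The exact fusion rule here is an assumption for illustration, not the paper's precise formulation.

```python
import numpy as np

def reorganize_tokens(tokens, cls_attn, keep_ratio=0.5):
    # tokens:   (N, D) patch token embeddings
    # cls_attn: (N,) attention weight each patch receives from [CLS]
    # Keep the top-k patches by class-token attention; fuse the rest
    # into one token via an attention-weighted average.
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    order = np.argsort(-cls_attn)           # most-attended first
    keep, drop = order[:k], order[k:]
    w = cls_attn[drop] / (cls_attn[drop].sum() + 1e-9)
    fused = (w[:, None] * tokens[drop]).sum(axis=0, keepdims=True)
    return np.concatenate([tokens[keep], fused], axis=0)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
cls_attn = np.array([0.3, 0.05, 0.2, 0.1, 0.15, 0.08, 0.07, 0.05])
reduced = reorganize_tokens(tokens, cls_attn)   # 4 kept + 1 fused token
```

Since MHSA cost grows quadratically with token count, halving the tokens at a few layers cuts attention compute substantially while the fused token preserves a summary of what was dropped.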
Are multimodal transformers robust to missing modality?
Multimodal data collected from the real world are often imperfect due to missing modalities.
Therefore, multimodal models that are robust against modal-incomplete data are highly …
Adaptive rotated convolution for rotated object detection
Rotated object detection aims to identify and locate objects in images with arbitrary
orientation. In this scenario, the oriented directions of objects vary considerably across …
A dynamic multi-scale voxel flow network for video prediction
The performance of video prediction has been greatly boosted by advanced deep neural
networks. However, most of the current methods suffer from large model sizes and require …