Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives

J Li, J Chen, Y Tang, C Wang, BA Landman… - Medical image …, 2023 - Elsevier
Transformer, one of the latest technological advances in deep learning, has gained
prevalence in natural language processing and computer vision. Since medical imaging bears …

Federated vehicular transformers and their federations: Privacy-preserving computing and cooperation for autonomous driving

Y Tian, J Wang, Y Wang, C Zhao… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Cooperative computing is a promising way to enhance the performance and safety of autonomous
vehicles, benefiting from the increase in the amount, diversity, and scope of data …

Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting

Y Zhang, J Yan - The eleventh international conference on learning …, 2023 - openreview.net
Recently, many deep models have been proposed for multivariate time series (MTS)
forecasting. In particular, Transformer-based models have shown great potential because …

Vision GNN: An image is worth graph of nodes

K Han, Y Wang, J Guo, Y Tang… - Advances in neural …, 2022 - proceedings.neurips.cc
Network architecture plays a key role in deep learning-based computer vision systems.
The widely-used convolutional neural network and transformer treat the image as a grid or …

PETR: Position embedding transformation for multi-view 3D object detection

Y Liu, T Wang, X Zhang, J Sun - European Conference on Computer …, 2022 - Springer
In this paper, we develop position embedding transformation (PETR) for multi-view 3D
object detection. PETR encodes the position information of 3D coordinates into image …

Stratified transformer for 3D point cloud segmentation

X Lai, J Liu, L Jiang, L Wang, H Zhao… - Proceedings of the …, 2022 - openaccess.thecvf.com
3D point cloud segmentation has made tremendous progress in recent years. Most
current methods focus on aggregating local features, but fail to directly model long-range …

MetaFormer is actually what you need for vision

W Yu, M Luo, P Zhou, C Si, Y Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com
Transformers have shown great potential in computer vision tasks. A common belief is that their
attention-based token mixer module contributes most to their competence. However, recent …

Flexible diffusion modeling of long videos

W Harvey, S Naderiparizi, V Masrani… - Advances in …, 2022 - proceedings.neurips.cc
We present a framework for video modeling based on denoising diffusion probabilistic
models that produces long-duration video completions in a variety of realistic environments …

DaViT: Dual attention vision transformers

M Ding, B Xiao, N Codella, P Luo, J Wang… - European conference on …, 2022 - Springer
In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective
vision transformer architecture that is able to capture global context while maintaining …

TinyViT: Fast pretraining distillation for small vision transformers

K Wu, J Zhang, H Peng, M Liu, B Xiao, J Fu… - European conference on …, 2022 - Springer
Vision transformer (ViT) has recently drawn great attention in computer vision due to its
remarkable model capability. However, most prevailing ViT models suffer from a huge number …