Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational Visual …, 2022 - Springer
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

A comprehensive survey on applications of transformers for deep learning tasks

S Islam, H Elmekki, A Elsebai, J Bentahar… - Expert Systems with …, 2023 - Elsevier
Transformers are Deep Neural Networks (DNNs) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …
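
For reference, a minimal sketch of the scaled dot-product self-attention this survey refers to, written in plain NumPy; the array shapes and toy inputs below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a token sequence x of shape (n, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])        # pairwise similarities, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ v                             # each token becomes a weighted mix of all tokens

# Toy usage: 4 tokens with 8-dimensional embeddings (sizes are arbitrary).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)
```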

SegNeXt: Rethinking convolutional attention design for semantic segmentation

MH Guo, CZ Lu, Q Hou, Z Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc
We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of semantic …
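
For context, a small PyTorch sketch of multi-scale convolutional attention in the spirit of what SegNeXt proposes; the branch kernel sizes and layer layout here are assumptions for illustration, not a faithful reproduction of the paper's block:

```python
import torch
import torch.nn as nn

class MultiScaleConvAttention(nn.Module):
    """Depthwise strip convolutions at several scales produce an attention map
    that re-weights the input features (convolutional rather than softmax attention)."""
    def __init__(self, dim):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # Pairs of 1xK / Kx1 depthwise strip convolutions approximate large KxK kernels cheaply.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim),
                nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim),
            )
            for k in (7, 11, 21)
        ])
        self.mix = nn.Conv2d(dim, dim, 1)  # pointwise channel mixing

    def forward(self, x):
        a = self.local(x)
        a = a + sum(branch(a) for branch in self.branches)
        return self.mix(a) * x             # attention map modulates the input features

x = torch.randn(1, 64, 32, 32)
print(MultiScaleConvAttention(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```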

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
The Transformer is a promising neural network learner and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational Visual Media, 2023 - Springer
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …
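
For context, a minimal PyTorch sketch of the large-kernel-attention idea associated with this line of work, where a large convolution is decomposed into depthwise, dilated depthwise, and pointwise pieces; the specific kernel sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Approximate a large convolution kernel with three cheap parts and use the
    result as an attention map over the input features."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)        # local depthwise conv
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9,
                                    dilation=3, groups=dim)            # long-range depthwise conv
        self.pw = nn.Conv2d(dim, dim, 1)                               # pointwise channel mixing

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return attn * x   # elementwise re-weighting instead of quadratic-cost self-attention

x = torch.randn(1, 32, 56, 56)
print(LargeKernelAttention(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```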

StyTr2: Image style transfer with transformers

Y Deng, F Tang, W Dong, C Ma… - Proceedings of the …, 2022 - openaccess.thecvf.com
The goal of image style transfer is to render an image with artistic features guided by a style
reference while maintaining the original content. Owing to the locality in convolutional neural …

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

Evo-ViT: Slow-fast token evolution for dynamic vision transformer

Y Xu, Z Zhang, M Zhang, K Sheng, K Li… - Proceedings of the …, 2022 - ojs.aaai.org
Vision transformers (ViTs) have recently gained explosive popularity, but their huge
computational cost is still a severe issue. Since the computation complexity of ViT is …
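
The cost concern stems from self-attention scaling quadratically with the number of tokens; a quick back-of-the-envelope sketch (the token counts and embedding size below are arbitrary assumptions, not figures from the paper):

```python
def attention_flops(n_tokens, dim):
    """Rough FLOP count for one self-attention layer: the QK^T scores and the
    weighted sum over values each cost O(n^2 * d); projections cost O(n * d^2)."""
    qkv_proj = 3 * n_tokens * dim * dim
    scores   = n_tokens * n_tokens * dim   # Q @ K^T
    weighted = n_tokens * n_tokens * dim   # softmax(scores) @ V
    out_proj = n_tokens * dim * dim
    return qkv_proj + scores + weighted + out_proj

# Doubling the token count roughly quadruples the attention-specific cost.
for n in (196, 392, 784):
    print(n, f"{attention_flops(n, 768) / 1e9:.2f} GFLOPs")
```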

Convolutional neural networks or vision transformers: Who will win the race for action recognitions in visual data?

O Moutik, H Sekkat, S Tigani, A Chehri, R Saadane… - Sensors, 2023 - mdpi.com
Understanding actions in videos remains a significant challenge in computer vision, one
that has been the subject of extensive research over the last decades. Convolutional neural …

OmniVec: Learning robust representations with cross modal sharing

S Srivastava, G Sharma - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
The majority of research in learning-based methods has been directed towards designing and
training networks for specific tasks. However, many of the learning-based tasks, across modalities …