Flatten transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computational complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
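
A minimal sketch of the kernel trick behind generic linear attention (the paper's focused mapping is not reproduced here; phi(x) = elu(x) + 1 is one common choice of feature map, and all names are illustrative): replacing the softmax with a feature map phi lets attention be computed as phi(Q)(phi(K)^T V), reducing the cost from O(N^2 d) to O(N d^2) in the sequence length N.

    import torch
    import torch.nn.functional as F

    def softmax_attention(q, k, v):
        # Standard attention: materializes an N x N score matrix, O(N^2 * d).
        scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return scores @ v

    def linear_attention(q, k, v, eps=1e-6):
        # Kernelized attention with phi(x) = elu(x) + 1 (one common choice;
        # the focused mapping in the paper differs). Associativity lets us
        # form the d x d summary phi(K)^T V first, which is O(N * d^2).
        q, k = F.elu(q) + 1.0, F.elu(k) + 1.0
        kv = k.transpose(-2, -1) @ v                            # (d, d)
        z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)   # normalizer
        return (q @ kv) / (z + eps)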

Hornet: Efficient high-order spatial interactions with recursive gated convolutions

Y Rao, W Zhao, Y Tang, J Zhou… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recent progress in vision Transformers has shown great success in various tasks, driven by a
new spatial modeling mechanism based on dot-product self-attention. In this paper, we …
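
The paper's recursive gnConv is not reproduced here; the sketch below only illustrates the underlying gated-convolution idea under simplifying assumptions (a single gating step, an arbitrary kernel size, illustrative class name): a depthwise convolution supplies spatial context that multiplicatively gates a pointwise-projected input, giving a second-order spatial interaction without attention.

    import torch
    import torch.nn as nn

    class SimpleGatedConv(nn.Module):
        # Illustrative gated convolution, not the paper's exact recursive gnConv:
        # a depthwise conv produces spatial context that element-wise gates a
        # pointwise-projected copy of the input.
        def __init__(self, dim):
            super().__init__()
            self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)
            self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
            self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

        def forward(self, x):                        # x: (B, C, H, W)
            p, q = self.proj_in(x).chunk(2, dim=1)   # two C-channel branches
            ctx = self.dwconv(q)                     # local spatial context
            return self.proj_out(p * ctx)            # multiplicative gating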

Application of deep learning in multitemporal remote sensing image classification

X Cheng, Y Sun, W Zhang, Y Wang, X Cao, Y Wang - Remote Sensing, 2023 - mdpi.com
The rapid advancement of remote sensing technology has significantly enhanced the
temporal resolution of remote sensing data. Multitemporal remote sensing image …

Transformer meets remote sensing video detection and tracking: A comprehensive survey

L Jiao, X Zhang, X Liu, F Liu, S Yang… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
Transformers have shown excellent performance in the remote sensing field thanks to their
long-range modeling capabilities. Remote sensing video (RSV) moving object detection and tracking …

Seaformer: Squeeze-enhanced axial transformer for mobile semantic segmentation

Q Wan, Z Huang, J Lu, G Yu, L Zhang - arXiv preprint arXiv:2301.13156, 2023 - arxiv.org
Since the introduction of Vision Transformers, the landscape of many computer vision tasks
(e.g., semantic segmentation), which had been overwhelmingly dominated by CNNs, has recently …
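
SeaFormer's squeeze-enhanced design is not reproduced here; the sketch below shows only the generic axial-attention factorization that such mobile designs build on (module and parameter names are illustrative): attending along rows and then columns cuts the cost of 2D self-attention from O((HW)^2) to roughly O(HW(H + W)).

    import torch
    import torch.nn as nn

    class AxialAttention(nn.Module):
        # Generic axial attention, not SeaFormer's squeeze-enhanced variant:
        # attend along each spatial axis separately instead of over all H*W tokens.
        def __init__(self, dim, heads=4):
            super().__init__()
            self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):                              # x: (B, H, W, C)
            b, h, w, c = x.shape
            rows = x.reshape(b * h, w, c)                  # each row is a sequence
            rows, _ = self.row_attn(rows, rows, rows)
            x = rows.reshape(b, h, w, c)
            cols = x.permute(0, 2, 1, 3).reshape(b * w, h, c)
            cols, _ = self.col_attn(cols, cols, cols)
            return cols.reshape(b, w, h, c).permute(0, 2, 1, 3)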

YOLOv7-RAR for urban vehicle detection

Y Zhang, Y Sun, Z Wang, Y Jiang - Sensors, 2023 - mdpi.com
To address the YOLOv7 algorithm's high missed-detection rate for vehicle detection on urban
roads, its weak perception of small targets in perspective views, and its insufficient …

Slide-transformer: Hierarchical vision transformer with local self-attention

X Pan, T Ye, Z Xia, S Song… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
The self-attention mechanism, which enables adaptive feature extraction from global contexts,
has been a key factor in the recent progress of the Vision Transformer (ViT). However, existing self …
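
The paper's sliding local attention is built from shifted depthwise convolutions and is not reproduced here; the sketch below shows a plainer relative (non-overlapping window attention, with illustrative names) that conveys the same idea of restricting self-attention to local neighborhoods so that cost grows linearly with image size.

    import torch
    import torch.nn as nn

    def window_attention(x, attn, win=7):
        # Plain non-overlapping window attention (a simpler relative of the
        # paper's sliding variant): tokens attend only inside win x win patches.
        b, h, w, c = x.shape                          # assumes h, w divisible by win
        x = x.reshape(b, h // win, win, w // win, win, c)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win * win, c)
        x, _ = attn(x, x, x)                          # attention within each window
        x = x.reshape(b, h // win, w // win, win, win, c)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, c)

    attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
    out = window_attention(torch.randn(2, 14, 14, 64), attn)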

YOLO-tea: A tea disease detection model improved by YOLOv5

Z Xue, R Xu, D Bai, H Lin - Forests, 2023 - mdpi.com
Diseases and insect pests of tea leaves cause huge economic losses to the tea industry
every year, so their accurate identification is important. Convolutional neural …

Completionformer: Depth completion with convolutions and vision transformers

Y Zhang, X Guo, M Poggi, Z Zhu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Given sparse depths and the corresponding RGB images, depth completion aims at spatially
propagating the sparse measurements throughout the whole image to get a dense depth …
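
A common input packing for depth-completion networks in general (not claimed to be CompletionFormer's; function and tensor names are illustrative) is to stack the RGB image, the sparse depth map, and a binary validity mask, so the backbone can distinguish measured pixels from empty ones.

    import torch

    def make_completion_input(rgb, sparse_depth):
        # Concatenate RGB, sparse depth, and a mask marking measured pixels.
        mask = (sparse_depth > 0).float()                 # 1 where a sample exists
        return torch.cat([rgb, sparse_depth, mask], dim=1)  # (B, 5, H, W)

    rgb = torch.rand(1, 3, 240, 320)
    keep = (torch.rand(1, 1, 240, 320) < 0.05).float()    # ~5% of pixels measured
    sparse = torch.rand(1, 1, 240, 320) * keep
    x = make_completion_input(rgb, sparse)                # input to the backbone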

E-branchformer: Branchformer with enhanced merging for speech recognition

K Kim, F Wu, Y Peng, J Pan, P Sridhar… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Conformer, combining convolution and self-attention sequentially to capture both local and
global information, has shown remarkable performance and is currently regarded as the …
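
A minimal sketch of the Conformer pattern the abstract describes: self-attention for global context followed by a convolution module for local context, each with a pre-norm residual. Class and parameter names are illustrative, and E-Branchformer's parallel-branch merging is not reproduced.

    import torch
    import torch.nn as nn

    class MiniConformerBlock(nn.Module):
        # Sequential Conformer-style block: attention (global) then depthwise
        # convolution (local), each wrapped in pre-norm and a residual.
        # E-Branchformer instead runs such branches in parallel and merges them.
        def __init__(self, dim, heads=4, kernel=15):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.conv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)

        def forward(self, x):                             # x: (B, T, C)
            y = self.norm1(x)
            a, _ = self.attn(y, y, y)
            x = x + a                                     # global branch
            c = self.conv(self.norm2(x).transpose(1, 2)).transpose(1, 2)
            return x + c                                  # local branch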