Flatten transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computational complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
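
A minimal sketch of the kernel trick behind generic linear attention (the paper's focused mapping is not reproduced here; phi(x) = elu(x) + 1 is one common choice of feature map, and all names are illustrative): replacing the softmax with a feature map phi lets attention be computed as phi(Q)(phi(K)^T V), reducing the cost from O(N^2 d) to O(N d^2) in the sequence length N.

    import torch
    import torch.nn.functional as F

    def softmax_attention(q, k, v):
        # Standard attention: materializes an N x N score matrix, O(N^2 * d).
        scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return scores @ v

    def linear_attention(q, k, v, eps=1e-6):
        # Kernelized attention with phi(x) = elu(x) + 1 (one common choice;
        # the focused mapping in the paper differs). Associativity lets us
        # form the d x d summary phi(K)^T V first, which is O(N * d^2).
        q, k = F.elu(q) + 1.0, F.elu(k) + 1.0
        kv = k.transpose(-2, -1) @ v                            # (d, d)
        z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)   # normalizer
        return (q @ kv) / (z + eps)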

Hornet: Efficient high-order spatial interactions with recursive gated convolutions

Y Rao, W Zhao, Y Tang, J Zhou… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recent progress in vision Transformers has shown great success in various tasks, driven by a
new spatial modeling mechanism based on dot-product self-attention. In this paper, we …
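
The paper's recursive gnConv is not reproduced here; the sketch below only illustrates the underlying gated-convolution idea under simplifying assumptions (a single gating step, an arbitrary kernel size, illustrative class name): a depthwise convolution supplies spatial context that multiplicatively gates a pointwise-projected input, giving a second-order spatial interaction without attention.

    import torch
    import torch.nn as nn

    class SimpleGatedConv(nn.Module):
        # Illustrative gated convolution, not the paper's exact recursive gnConv:
        # a depthwise conv produces spatial context that element-wise gates a
        # pointwise-projected copy of the input.
        def __init__(self, dim):
            super().__init__()
            self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)
            self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
            self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

        def forward(self, x):                        # x: (B, C, H, W)
            p, q = self.proj_in(x).chunk(2, dim=1)   # two C-channel branches
            ctx = self.dwconv(q)                     # local spatial context
            return self.proj_out(p * ctx)            # multiplicative gating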

Application of deep learning in multitemporal remote sensing image classification

X Cheng, Y Sun, W Zhang, Y Wang, X Cao, Y Wang - Remote Sensing, 2023 - mdpi.com
The rapid advancement of remote sensing technology has significantly enhanced the
temporal resolution of remote sensing data. Multitemporal remote sensing image …

Transformer meets remote sensing video detection and tracking: A comprehensive survey

L Jiao, X Zhang, X Liu, F Liu, S Yang… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
Transformers have shown excellent performance in the remote sensing field thanks to their
long-range modeling capabilities. Remote sensing video (RSV) moving object detection and tracking …

Seaformer: Squeeze-enhanced axial transformer for mobile semantic segmentation

Q Wan, Z Huang, J Lu, G Yu, L Zhang - arXiv preprint arXiv:2301.13156, 2023 - arxiv.org
Since the introduction of Vision Transformers, the landscape of many computer vision tasks
(e.g., semantic segmentation), which had been overwhelmingly dominated by CNNs, has recently …
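
SeaFormer's squeeze-enhanced design is not reproduced here; the sketch below shows only the generic axial-attention factorization that such mobile designs build on (module and parameter names are illustrative): attending along rows and then columns cuts the cost of 2D self-attention from O((HW)^2) to roughly O(HW(H + W)).

    import torch
    import torch.nn as nn

    class AxialAttention(nn.Module):
        # Generic axial attention, not SeaFormer's squeeze-enhanced variant:
        # attend along each spatial axis separately instead of over all H*W tokens.
        def __init__(self, dim, heads=4):
            super().__init__()
            self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):                              # x: (B, H, W, C)
            b, h, w, c = x.shape
            rows = x.reshape(b * h, w, c)                  # each row is a sequence
            rows, _ = self.row_attn(rows, rows, rows)
            x = rows.reshape(b, h, w, c)
            cols = x.permute(0, 2, 1, 3).reshape(b * w, h, c)
            cols, _ = self.col_attn(cols, cols, cols)
            return cols.reshape(b, w, h, c).permute(0, 2, 1, 3)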

YOLOv7-RAR for urban vehicle detection

Y Zhang, Y Sun, Z Wang, Y Jiang - Sensors, 2023 - mdpi.com
To address the YOLOv7 algorithm's high missed-detection rate for vehicle detection on urban
roads, its weak perception of small targets in perspective views, and its insufficient …

Slide-transformer: Hierarchical vision transformer with local self-attention

X Pan, T Ye, Z Xia, S Song… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
The self-attention mechanism, which enables adaptive feature extraction from global contexts,
has been a key factor in the recent progress of the Vision Transformer (ViT). However, existing self …
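
The paper's sliding local attention is built from shifted depthwise convolutions and is not reproduced here; the sketch below shows a plainer relative (non-overlapping window attention, with illustrative names) that conveys the same idea of restricting self-attention to local neighborhoods so that cost grows linearly with image size.

    import torch
    import torch.nn as nn

    def window_attention(x, attn, win=7):
        # Plain non-overlapping window attention (a simpler relative of the
        # paper's sliding variant): tokens attend only inside win x win patches.
        b, h, w, c = x.shape                          # assumes h, w divisible by win
        x = x.reshape(b, h // win, win, w // win, win, c)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win * win, c)
        x, _ = attn(x, x, x)                          # attention within each window
        x = x.reshape(b, h // win, w // win, win, win, c)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, c)

    attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
    out = window_attention(torch.randn(2, 14, 14, 64), attn)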

YOLO-tea: A tea disease detection model improved by YOLOv5

Z Xue, R Xu, D Bai, H Lin - Forests, 2023 - mdpi.com
Diseases and insect pests of tea leaves cause huge economic losses to the tea industry
every year, so their accurate identification is important. Convolutional neural …

Completionformer: Depth completion with convolutions and vision transformers

Y Zhang, X Guo, M Poggi, Z Zhu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Given sparse depths and the corresponding RGB images, depth completion aims at spatially
propagating the sparse measurements throughout the whole image to get a dense depth …
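
A common input packing for depth-completion networks in general (not claimed to be CompletionFormer's; function and tensor names are illustrative) is to stack the RGB image, the sparse depth map, and a binary validity mask, so the backbone can distinguish measured pixels from empty ones.

    import torch

    def make_completion_input(rgb, sparse_depth):
        # Concatenate RGB, sparse depth, and a mask marking measured pixels.
        mask = (sparse_depth > 0).float()                 # 1 where a sample exists
        return torch.cat([rgb, sparse_depth, mask], dim=1)  # (B, 5, H, W)

    rgb = torch.rand(1, 3, 240, 320)
    keep = (torch.rand(1, 1, 240, 320) < 0.05).float()    # ~5% of pixels measured
    sparse = torch.rand(1, 1, 240, 320) * keep
    x = make_completion_input(rgb, sparse)                # input to the backbone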

E-branchformer: Branchformer with enhanced merging for speech recognition

K Kim, F Wu, Y Peng, J Pan, P Sridhar… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Conformer, combining convolution and self-attention sequentially to capture both local and
global information, has shown remarkable performance and is currently regarded as the …
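
A minimal sketch of the Conformer pattern the abstract describes: self-attention for global context followed by a convolution module for local context, each with a pre-norm residual. Class and parameter names are illustrative, and E-Branchformer's parallel-branch merging is not reproduced.

    import torch
    import torch.nn as nn

    class MiniConformerBlock(nn.Module):
        # Sequential Conformer-style block: attention (global) then depthwise
        # convolution (local), each wrapped in pre-norm and a residual.
        # E-Branchformer instead runs such branches in parallel and merges them.
        def __init__(self, dim, heads=4, kernel=15):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.conv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)

        def forward(self, x):                             # x: (B, T, C)
            y = self.norm1(x)
            a, _ = self.attn(y, y, y)
            x = x + a                                     # global branch
            c = self.conv(self.norm2(x).transpose(1, 2)).transpose(1, 2)
            return x + c                                  # local branch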