FLatten Transformer: Vision transformer using focused linear attention
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …
HorNet: Efficient high-order spatial interactions with recursive gated convolutions
Recent progress in vision Transformers exhibits great success in various tasks driven by the
new spatial modeling mechanism based on dot-product self-attention. In this paper, we …
Application of deep learning in multitemporal remote sensing image classification
X Cheng, Y Sun, W Zhang, Y Wang, X Cao, Y Wang - Remote Sensing, 2023 - mdpi.com
The rapid advancement of remote sensing technology has significantly enhanced the
temporal resolution of remote sensing data. Multitemporal remote sensing image …
Transformer meets remote sensing video detection and tracking: A comprehensive survey
Transformer has shown excellent performance in remote sensing field with long-range
modeling capabilities. Remote sensing video (RSV) moving object detection and tracking …
SeaFormer: Squeeze-enhanced axial transformer for mobile semantic segmentation
Since the introduction of Vision Transformers, the landscape of many computer vision tasks
(e.g., semantic segmentation), which has been overwhelmingly dominated by CNNs, recently …
YOLOv7-RAR for urban vehicle detection
Y Zhang, Y Sun, Z Wang, Y Jiang - Sensors, 2023 - mdpi.com
To address the YOLOv7 algorithm's high missed-detection rate for vehicle detection on
urban roads, weak perception of small targets in perspective, and insufficient …
Slide-Transformer: Hierarchical vision transformer with local self-attention
Self-attention mechanism has been a key factor in the recent progress of Vision Transformer
(ViT), which enables adaptive feature extraction from global contexts. However, existing self …
YOLO-Tea: A tea disease detection model improved by YOLOv5
Z Xue, R Xu, D Bai, H Lin - Forests, 2023 - mdpi.com
Diseases and insect pests of tea leaves cause huge economic losses to the tea industry
every year, so their accurate identification is important. Convolutional neural …
CompletionFormer: Depth completion with convolutions and vision transformers
Given sparse depths and the corresponding RGB images, depth completion aims at spatially
propagating the sparse measurements throughout the whole image to get a dense depth …
E-Branchformer: Branchformer with enhanced merging for speech recognition
Conformer, combining convolution and self-attention sequentially to capture both local and
global information, has shown remarkable performance and is currently regarded as the …