You only look at one sequence: Rethinking transformer in vision through object detection

A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS

J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com

YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …

Cited by 1214 Related articles All 6 versions

Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives

J Li, J Chen, Y Tang, C Wang, BA Landman… - Medical image …, 2023 - Elsevier

Transformer, one of the latest technological advances of deep learning, has gained
prevalence in natural language processing or computer vision. Since medical imaging bear …

Cited by 162 Related articles All 9 versions

Simple open-vocabulary object detection

M Minderer, A Gritsenko, A Stone, M Neumann… - … on Computer Vision, 2022 - Springer

Combining simple architectures with large-scale pre-training has led to massive
improvements in image classification. For object detection, pre-training and scaling …

Cited by 382 Related articles All 10 versions

ByteTrack: Multi-object tracking by associating every detection box

Y Zhang, P Sun, Y Jiang, D Yu, F Weng, Z Yuan… - European conference on …, 2022 - Springer

Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in
videos. Most methods obtain identities by associating detection boxes whose scores are …

Cited by 1416 Related articles All 12 versions

Inception transformer

C Si, W Yu, P Zhou, Y Zhou… - Advances in Neural …, 2022 - proceedings.neurips.cc

Recent studies show that transformer has strong capability of building long-range
dependencies, yet is incompetent in capturing high frequencies that predominantly convey …

Cited by 186 Related articles All 8 versions

A survey of visual transformers

Y Liu, Y Zhang, Y Wang, F Hou, J Yuan… - … on Neural Networks …, 2023 - ieeexplore.ieee.org

Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …

Cited by 356 Related articles All 22 versions

SRFormer: Permuted self-attention for single image super-resolution

Y Zhou, Z Li, CL Guo, S Bai… - Proceedings of the …, 2023 - openaccess.thecvf.com

Previous works have shown that increasing the window size for Transformer-based image
super-resolution models (e.g., SwinIR) can significantly improve the model performance but …

Cited by 101 Related articles All 5 versions

Vision transformers need registers

T Darcet, M Oquab, J Mairal, P Bojanowski - arXiv preprint arXiv …, 2023 - arxiv.org

Transformers have recently emerged as a powerful tool for learning visual representations.
In this paper, we identify and characterize artifacts in feature maps of both supervised and …

Cited by 129 Related articles All 10 versions

Conformer: Local features coupling global representations for visual recognition

Z Peng, W Huang, S Gu, L Xie… - Proceedings of the …, 2021 - openaccess.thecvf.com

Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …

Cited by 694 Related articles All 14 versions

HiFormer: Hierarchical multi-scale representations using transformers for medical image segmentation

M Heidari, A Kazerouni, M Soltany… - Proceedings of the …, 2023 - openaccess.thecvf.com

Convolutional neural networks (CNNs) have been the consensus for medical image
segmentation tasks. However, they inevitably suffer from the limitation in modeling long …

Cited by 201 Related articles All 11 versions