A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS

J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com
YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …

Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives

J Li, J Chen, Y Tang, C Wang, BA Landman… - Medical image …, 2023 - Elsevier
Transformer, one of the latest technological advances of deep learning, has gained
prevalence in natural language processing or computer vision. Since medical imaging bears …

Simple open-vocabulary object detection

M Minderer, A Gritsenko, A Stone, M Neumann… - … on Computer Vision, 2022 - Springer
Combining simple architectures with large-scale pre-training has led to massive
improvements in image classification. For object detection, pre-training and scaling …

ByteTrack: Multi-object tracking by associating every detection box

Y Zhang, P Sun, Y Jiang, D Yu, F Weng, Z Yuan… - European conference on …, 2022 - Springer
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in
videos. Most methods obtain identities by associating detection boxes whose scores are …

Inception transformer

C Si, W Yu, P Zhou, Y Zhou… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recent studies show that the Transformer has a strong capability of building long-range
dependencies, yet is incompetent at capturing high frequencies that predominantly convey …

A survey of visual transformers

Y Liu, Y Zhang, Y Wang, F Hou, J Yuan… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …

SRFormer: Permuted self-attention for single image super-resolution

Y Zhou, Z Li, CL Guo, S Bai… - Proceedings of the …, 2023 - openaccess.thecvf.com
Previous works have shown that increasing the window size for Transformer-based image
super-resolution models (e.g., SwinIR) can significantly improve the model performance but …

Vision transformers need registers

T Darcet, M Oquab, J Mairal, P Bojanowski - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have recently emerged as a powerful tool for learning visual representations.
In this paper, we identify and characterize artifacts in feature maps of both supervised and …

Conformer: Local features coupling global representations for visual recognition

Z Peng, W Huang, S Gu, L Xie… - Proceedings of the …, 2021 - openaccess.thecvf.com
Within a Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but have difficulty capturing global representations. Within …

HiFormer: Hierarchical multi-scale representations using transformers for medical image segmentation

M Heidari, A Kazerouni, M Soltany… - Proceedings of the …, 2023 - openaccess.thecvf.com
Convolutional neural networks (CNNs) have been the consensus choice for medical image
segmentation tasks. However, they inevitably suffer from limitations in modeling long …