[HTML] A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS
J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com
YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Transformer, one of the latest technological advances of deep learning, has gained
prevalence in natural language processing or computer vision. Since medical imaging bear …
Simple open-vocabulary object detection
Combining simple architectures with large-scale pre-training has led to massive
improvements in image classification. For object detection, pre-training and scaling …
ByteTrack: Multi-object tracking by associating every detection box
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in
videos. Most methods obtain identities by associating detection boxes whose scores are …
A survey of visual transformers
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …
SRFormer: Permuted self-attention for single image super-resolution
Previous works have shown that increasing the window size for Transformer-based image
super-resolution models (e.g., SwinIR) can significantly improve the model performance but …
Vision transformers need registers
Transformers have recently emerged as a powerful tool for learning visual representations.
In this paper, we identify and characterize artifacts in feature maps of both supervised and …
Conformer: Local features coupling global representations for visual recognition
Abstract Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …
HiFormer: Hierarchical multi-scale representations using transformers for medical image segmentation
Convolutional neural networks (CNNs) have been the consensus for medical image
segmentation tasks. However, they inevitably suffer from the limitation in modeling long …