A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS
J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com
YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …
Object detection using YOLO: Challenges, architectural successors, datasets and applications
Object detection is one of the predominant and challenging problems in computer vision.
Over the decade, with the expeditious evolution of deep learning, researchers have …
YOLOv6: A single-stage object detection framework for industrial applications
For years, the YOLO series has been the de facto industry-level standard for efficient object
detection. The YOLO community has prospered overwhelmingly to enrich its use in a …
ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …
Open-vocabulary panoptic segmentation with text-to-image diffusion models
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …
DiffusionDet: Diffusion model for object detection
We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …
BEVFusion: Multi-task multi-sensor fusion with unified bird's-eye view representation
Multi-sensor fusion is essential for an accurate and reliable autonomous driving system.
Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with …
BEVFormer: Learning bird's-eye-view representation from multi-camera images via spatiotemporal transformers
3D visual perception tasks, including 3D detection and map segmentation based on
multi-camera images, are essential for autonomous driving systems. In this work, we present …
HorNet: Efficient high-order spatial interactions with recursive gated convolutions
Recent progress in vision Transformers exhibits great success in various tasks driven by the
new spatial modeling mechanism based on dot-product self-attention. In this paper, we …
Exploring plain vision transformer backbones for object detection
We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for
object detection. This design enables the original ViT architecture to be fine-tuned for object …