Feature pyramid networks for object detection

[HTML][HTML] A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas

J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com

YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …

被引用次数：853 相关文章所有 6 个版本

[PDF] springer.com

Object detection using YOLO: Challenges, architectural successors, datasets and applications

T Diwan, G Anirudh, JV Tembhurne - multimedia Tools and Applications, 2023 - Springer

Object detection is one of the predominant and challenging problems in computer vision.
Over the decade, with the expeditious evolution of deep learning, researchers have …

被引用次数：511 相关文章所有 7 个版本

[PDF] arxiv.org

YOLOv6: A single-stage object detection framework for industrial applications

C Li, L Li, H Jiang, K Weng, Y Geng, L Li, Z Ke… - arXiv preprint arXiv …, 2022 - arxiv.org

For years, the YOLO series has been the de facto industry-level standard for efficient object
detection. The YOLO community has prospered overwhelmingly to enrich its use in a …

被引用次数：1500 相关文章所有 3 个版本

[PDF] thecvf.com

Convnext v2: Co-designing and scaling convnets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

被引用次数：337 相关文章所有 8 个版本

[PDF] thecvf.com

Open-vocabulary panoptic segmentation with text-to-image diffusion models

J Xu, S Liu, A Vahdat, W Byeon… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …

被引用次数：263 相关文章所有 6 个版本

[PDF] thecvf.com

Diffusiondet: Diffusion model for object detection

S Chen, P Sun, Y Song, P Luo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …

被引用次数：309 相关文章所有 5 个版本

[PDF] arxiv.org

Bevfusion: Multi-task multi-sensor fusion with unified bird's-eye view representation

Z Liu, H Tang, A Amini, X Yang, H Mao… - … on robotics and …, 2023 - ieeexplore.ieee.org

Multi-sensor fusion is essential for an accurate and reliable autonomous driving system.
Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with …

被引用次数：570 相关文章所有 4 个版本

[PDF] arxiv.org

Bevformer: Learning bird's-eye-view representation from multi-camera images via spatiotemporal transformers

Z Li, W Wang, H Li, E Xie, C Sima, T Lu, Y Qiao… - European conference on …, 2022 - Springer

Abstract 3D visual perception tasks, including 3D detection and map segmentation based on
multi-camera images, are essential for autonomous driving systems. In this work, we present …

被引用次数：821 相关文章所有 9 个版本

[PDF] neurips.cc

Hornet: Efficient high-order spatial interactions with recursive gated convolutions

Y Rao, W Zhao, Y Tang, J Zhou… - Advances in Neural …, 2022 - proceedings.neurips.cc

Recent progress in vision Transformers exhibits great success in various tasks driven by the
new spatial modeling mechanism based on dot-product self-attention. In this paper, we …

被引用次数：235 相关文章所有 5 个版本

[PDF] arxiv.org

Exploring plain vision transformer backbones for object detection

Y Li, H Mao, R Girshick, K He - European conference on computer vision, 2022 - Springer

We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for
object detection. This design enables the original ViT architecture to be fine-tuned for object …

被引用次数：637 相关文章所有 6 个版本