Object detection using deep learning, CNNs and vision transformers: A review

AB Amjoud, M Amrouch - IEEE Access, 2023 - ieeexplore.ieee.org
Detecting objects remains one of computer vision and image understanding applications'
most fundamental and challenging aspects. Significant advances in object detection have …

[HTML][HTML] 2D and 3D object detection algorithms from images: A Survey

W Chen, Y Li, Z Tian, F Zhang - Array, 2023 - Elsevier
Object detection is a crucial branch of computer vision that aims to locate and classify
objects in images. Using deep convolutional neural networks (CNNs) as the primary …

Grounding dino: Marrying dino with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we present an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

Diffusiondet: Diffusion model for object detection

S Chen, P Sun, Y Song, P Luo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …

Detrs beat yolos on real-time object detection

Y Zhao, W Lv, S Xu, J Wei, G Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The YOLO series has become the most popular framework for real-time object detection due
to its reasonable trade-off between speed and accuracy. However we observe that the …

Transfusion: Robust lidar-camera fusion for 3d object detection with transformers

X Bai, Z Hu, X Zhu, Q Huang, Y Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
LiDAR and camera are two important sensors for 3D object detection in autonomous driving.
Despite the increasing popularity of sensor fusion in this field, the robustness against inferior …

Dino: Detr with improved denoising anchor boxes for end-to-end object detection

H Zhang, F Li, S Liu, L Zhang, H Su, J Zhu… - arXiv preprint arXiv …, 2022 - arxiv.org
We present DINO (\textbf {D} ETR with\textbf {I} mproved de\textbf {N} oising anch\textbf {O} r
boxes), a state-of-the-art end-to-end object detector.% in this paper. DINO improves over …

Petr: Position embedding transformation for multi-view 3d object detection

Y Liu, T Wang, X Zhang, J Sun - European Conference on Computer …, 2022 - Springer
In this paper, we develop position embedding transformation (PETR) for multi-view 3D
object detection. PETR encodes the position information of 3D coordinates into image …

Dn-detr: Accelerate detr training by introducing query denoising

F Li, H Zhang, S Liu, J Guo, LM Ni… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present in this paper a novel denoising training method to speedup DETR (DEtection
TRansformer) training and offer a deepened understanding of the slow convergence issue …

Masked-attention mask transformer for universal image segmentation

B Cheng, I Misra, AG Schwing… - Proceedings of the …, 2022 - openaccess.thecvf.com
Image segmentation groups pixels with different semantics, eg, category or instance
membership. Each choice of semantics defines a task. While only the semantics of each task …