[HTML][HTML] A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas
J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com
YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …
and video monitoring applications. We present a comprehensive analysis of YOLO's …
A comprehensive survey on pretrained foundation models: A history from bert to chatgpt
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (eg, BERT, ChatGPT, and GPT-4) is …
downstream tasks with different data modalities. A PFM (eg, BERT, ChatGPT, and GPT-4) is …
Segment anything
Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …
image segmentation. Using our efficient model in a data collection loop, we built the largest …
YOLOv6: A single-stage object detection framework for industrial applications
For years, the YOLO series has been the de facto industry-level standard for efficient object
detection. The YOLO community has prospered overwhelmingly to enrich its use in a …
detection. The YOLO community has prospered overwhelmingly to enrich its use in a …
Segment everything everywhere all at once
In this work, we present SEEM, a promotable and interactive model for segmenting
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …
Grounding dino: Marrying dino with grounded pre-training for open-set object detection
In this paper, we present an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …
Biformer: Vision transformer with bi-level routing attention
As the core building block of vision transformers, attention is a powerful tool to capture long-
range dependency. However, such power comes at a cost: it incurs a huge computation …
range dependency. However, such power comes at a cost: it incurs a huge computation …
Visionllm: Large language model is also an open-ended decoder for vision-centric tasks
Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …
Planning-oriented autonomous driving
Modern autonomous driving system is characterized as modular tasks in sequential order,
ie, perception, prediction, and planning. In order to perform a wide diversity of tasks and …
ie, perception, prediction, and planning. In order to perform a wide diversity of tasks and …
Open-vocabulary panoptic segmentation with text-to-image diffusion models
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …