A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …
A comprehensive survey of image augmentation techniques for deep learning
Although deep learning has achieved satisfactory performance in computer vision, a large
volume of images is required. However, collecting images is often expensive and …
Segment Anything
We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …
DINOv2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
InternImage: Exploring large-scale vision foundation models with deformable convolutions
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …
SegNeXt: Rethinking convolutional attention design for semantic segmentation
We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of semantic …
Open-vocabulary panoptic segmentation with text-to-image diffusion models
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …
Depth Anything: Unleashing the power of large-scale unlabeled data
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …
The dawn of LMMs: Preliminary explorations with GPT-4V(ision)
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …
Milestones in autonomous driving and intelligent vehicles: Survey of surveys
Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace
due to the convenience, safety, and economic benefits. Although a number of surveys have …