A survey of modern deep learning based object detection models
Object Detection is the task of classification and localization of objects in an image or video.
It has gained prominence in recent years due to its widespread applications. This article …
Domain generalization: A survey
Generalization to out-of-distribution (OOD) data is a capability natural to humans yet
challenging for machines to reproduce. This is because most learning algorithms strongly …
Visual prompt tuning
The current modus operandi in adapting pre-trained models involves updating all the
backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …
Conditional prompt learning for vision-language models
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …
Unified-IO: A unified model for vision, language, and multi-modal tasks
We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical
computer vision tasks, including pose estimation, object detection, depth estimation and …
Out-of-distribution detection with deep nearest neighbors
Out-of-distribution (OOD) detection is a critical task for deploying machine learning
models in the open world. Distance-based methods have demonstrated promise, where …
Test-time prompt tuning for zero-shot generalization in vision-language models
Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot
generalization in many downstream tasks with properly designed text prompts. Instead of …
Mitigating neural network overconfidence with logit normalization
Detecting out-of-distribution inputs is critical for the safe deployment of machine learning
models in the real world. However, neural networks are known to suffer from the …
SLIP: Self-supervision meets language-image pre-training
Recent work has shown that self-supervised pre-training leads to improvements over
supervised learning on challenging visual recognition tasks. CLIP, an exciting new …
Learning to prompt for vision-language models
Large pre-trained vision-language models like CLIP have shown great potential in learning
representations that are transferable across a wide range of downstream tasks. Different …