A survey of modern deep learning based object detection models

SSA Zaidi, MS Ansari, A Aslam, N Kanwal… - Digital Signal …, 2022 - Elsevier
Object detection is the task of classifying and localizing objects in an image or video.
It has gained prominence in recent years due to its widespread applications. This article …

Domain generalization: A survey

K Zhou, Z Liu, Y Qiao, T Xiang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Generalization to out-of-distribution (OOD) data is a capability natural to humans yet
challenging for machines to reproduce. This is because most learning algorithms strongly …

Visual prompt tuning

M Jia, L Tang, BC Chen, C Cardie, S Belongie… - … on Computer Vision, 2022 - Springer
The current modus operandi in adapting pre-trained models involves updating all the
backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …
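
The idea named in the title can be illustrated compactly: insert a few learnable tokens into a frozen vision transformer's input sequence and train only those tokens plus a classification head. The sketch below assumes a timm-style ViT interface (patch_embed, cls_token, pos_embed, blocks, norm); it is a minimal illustration of "shallow" prompting under those assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class VisualPromptedViT(nn.Module):
    # Sketch: learnable prompt tokens inserted into a frozen ViT.
    # Attribute names assume a timm-style VisionTransformer backbone.
    def __init__(self, backbone, num_prompts=10, num_classes=100):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                  # freeze the whole backbone
        d = backbone.embed_dim
        self.prompts = nn.Parameter(torch.randn(num_prompts, d) * 0.02)
        self.head = nn.Linear(d, num_classes)        # prompts + head are trained

    def forward(self, x):
        B = x.size(0)
        x = self.backbone.patch_embed(x)                         # (B, N, D)
        cls = self.backbone.cls_token.expand(B, -1, -1)          # (B, 1, D)
        x = torch.cat([cls, x], dim=1) + self.backbone.pos_embed
        p = self.prompts.unsqueeze(0).expand(B, -1, -1)          # (B, P, D)
        x = torch.cat([x[:, :1], p, x[:, 1:]], dim=1)            # insert prompts after [CLS]
        for blk in self.backbone.blocks:
            x = blk(x)
        return self.head(self.backbone.norm(x)[:, 0])            # classify on [CLS]
```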

Conditional prompt learning for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …

Unified-IO: A unified model for vision, language, and multi-modal tasks

J Lu, C Clark, R Zellers, R Mottaghi… - The Eleventh …, 2022 - openreview.net
We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical
computer vision tasks, including pose estimation, object detection, depth estimation and …

Out-of-distribution detection with deep nearest neighbors

Y Sun, Y Ming, X Zhu, Y Li - International Conference on …, 2022 - proceedings.mlr.press
Out-of-distribution (OOD) detection is a critical task for deploying machine learning
models in the open world. Distance-based methods have demonstrated promise, where …
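
As a rough sketch of the distance-based approach the abstract alludes to: score each test sample by the distance to its k-th nearest training neighbor in L2-normalized feature space, and flag large distances as OOD. The value of k and the normalization below follow common practice and are assumptions, not necessarily the paper's exact configuration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_ood_scores(train_feats, test_feats, k=50):
    # Larger score = farther from the training distribution = more likely OOD.
    def l2n(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    index = NearestNeighbors(n_neighbors=k).fit(l2n(train_feats))
    dists, _ = index.kneighbors(l2n(test_feats))   # (num_test, k), ascending
    return dists[:, -1]                            # distance to the k-th neighbor

# Usage: flag a sample as OOD when its score exceeds a threshold chosen on
# held-out in-distribution data (e.g., the 95th percentile of ID scores).
```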

Test-time prompt tuning for zero-shot generalization in vision-language models

M Shu, W Nie, DA Huang, Z Yu… - Advances in …, 2022 - proceedings.neurips.cc
Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot
generalization in many downstream tasks with properly designed text prompts. Instead of …
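
A hedged sketch of what tuning a prompt at test time can look like: generate augmented views of a single test image and update only the prompt parameters to minimize the entropy of the averaged prediction, keeping all model weights frozen. The names below (clip_logits_fn, views, prompt_params) are hypothetical placeholders for a CLIP-style pipeline; the paper's full method includes details, such as confidence-based view selection, that are omitted here.

```python
import torch

def test_time_prompt_step(clip_logits_fn, views, optimizer):
    # views: a batch of augmentations of one test image.
    # The optimizer holds only the learnable prompt parameters,
    # so this step adapts the prompt, not the model weights.
    logits = clip_logits_fn(views)                   # (num_views, num_classes)
    probs = logits.softmax(dim=-1).mean(dim=0)       # marginal over views
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    optimizer.zero_grad()
    entropy.backward()                               # push views toward agreement
    optimizer.step()
```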

Mitigating neural network overconfidence with logit normalization

H Wei, R Xie, H Cheng, L Feng… - … conference on machine …, 2022 - proceedings.mlr.press
Detecting out-of-distribution inputs is critical for the safe deployment of machine learning
models in the real world. However, neural networks are known to suffer from the …
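
The titular idea admits a very small sketch: compute cross-entropy on logits rescaled by their own L2 norm (with a temperature), so that optimization cannot reduce the loss simply by growing logit magnitudes, which is what drives softmax overconfidence. The temperature value below is illustrative; it is a tunable hyperparameter.

```python
import torch
import torch.nn.functional as F

def logit_norm_loss(logits, targets, temperature=0.04):
    # Cross-entropy on L2-normalized logits; the small epsilon guards
    # against division by zero for an all-zero logit vector.
    norms = logits.norm(p=2, dim=-1, keepdim=True) + 1e-7
    return F.cross_entropy(logits / (norms * temperature), targets)
```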

SLIP: Self-supervision meets language-image pre-training

N Mu, A Kirillov, D Wagner, S Xie - European conference on computer …, 2022 - Springer
Recent work has shown that self-supervised pre-training leads to improvements over
supervised learning on challenging visual recognition tasks. CLIP, an exciting new …

Learning to prompt for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - International Journal of Computer Vision, 2022 - Springer
Large pre-trained vision-language models like CLIP have shown great potential in learning
representations that are transferable across a wide range of downstream tasks. Different …
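
A minimal sketch of the prompt-learning idea this line of work is associated with: replace the hand-written context words of a prompt (e.g., "a photo of a") with a small set of trainable vectors shared across classes, prepended to each class name's token embeddings before they enter CLIP's frozen text encoder. The interface below is an assumed CLIP-like one, not the authors' code.

```python
import torch
import torch.nn as nn

class LearnableContext(nn.Module):
    # num_ctx trainable context vectors replace hand-crafted prompt words;
    # token_embed_dim is assumed to match the text encoder's embedding size.
    def __init__(self, num_ctx=16, token_embed_dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(num_ctx, token_embed_dim) * 0.02)

    def forward(self, class_token_embeds):
        # class_token_embeds: (num_classes, L, D) embeddings of class names.
        C = class_token_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(C, -1, -1)        # (C, num_ctx, D)
        return torch.cat([ctx, class_token_embeds], dim=1)   # prepend context
```

Only self.ctx receives gradients; the text and image encoders stay frozen, which is what makes this form of adaptation cheap relative to full fine-tuning.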