Few-shot object detection: A survey

S Antonelli, D Avola, L Cinque, D Crisostomi… - ACM Computing …, 2022 - dl.acm.org
Deep learning approaches have recently raised the bar in many fields, from Natural
Language Processing to Computer Vision, by leveraging large amounts of data. However …

Survey on multi-output learning

D Xu, Y Shi, IW Tsang, YS Ong… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
The aim of multi-output learning is to simultaneously predict multiple outputs given an input.
It is an important learning problem for decision-making since making decisions in the real …

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - … on Computer Vision, 2022 - Springer
Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

Deep hierarchical semantic segmentation

L Li, T Zhou, W Wang, J Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Humans are able to recognize structured relations in observation, allowing us to decompose
complex scenes into simpler parts and abstract the visual world in multiple levels. However …

Decoupling zero-shot semantic segmentation

J Ding, N Xue, GS Xia, D Dai - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com
Zero-shot semantic segmentation (ZS3) aims to segment the novel categories that have not
been seen in the training. Existing works formulate ZS3 as a pixel-level zero-shot …

Contrastive embedding for generalized zero-shot learning

Z Han, Z Fu, S Chen, J Yang - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Generalized zero-shot learning (GZSL) aims to recognize objects from both seen and
unseen classes, when only the labeled examples from seen classes are provided. Recent …

Combined scaling for zero-shot transfer learning

H Pham, Z Dai, G Ghiasi, K Kawaguchi, H Liu, AW Yu… - Neurocomputing, 2023 - Elsevier
Recent developments in multimodal training methodologies, including CLIP and ALIGN,
obviate the necessity for individual data labeling. These approaches utilize pairs of data and …

FREE: Feature refinement for generalized zero-shot learning

S Chen, W Wang, B Xia, Q Peng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Generalized zero-shot learning (GZSL) has achieved significant progress, with many efforts
dedicated to overcoming the problems of visual-semantic domain gaps and seen-unseen …

CLIP2Scene: Towards label-efficient 3D scene understanding by CLIP

R Chen, Y Liu, L Kong, X Zhu, Y Ma… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive Language-Image Pre-training (CLIP) achieves promising results in 2D
zero-shot and few-shot learning. Despite the impressive performance in 2D, applying CLIP …

Fine-tuned CLIP models are efficient video learners

H Rasheed, MU Khattak, M Maaz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale multi-modal training with image-text pairs imparts strong generalization to CLIP
model. Since training on a similar scale for videos is infeasible, recent approaches focus on …