Self-supervised learning: A succinct review

V Rani, ST Nabi, M Kumar, A Mittal, K Kumar - Archives of Computational …, 2023 - Springer
Machine learning has made significant advances in the field of image processing.
The foundation of this success is supervised learning, which necessitates annotated labels …

Beyond just vision: A review on self-supervised representation learning on multimodal and temporal data

S Deldari, H Xue, A Saeed, J He, DV Smith… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in
the field of computer vision, speech, natural language processing (NLP), and recently, with …

Open-vocabulary object detection using captions

A Zareian, KD Rosa, DH Hu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Despite the remarkable accuracy of deep neural networks in object detection, they are costly
to train and scale due to supervision requirements. Particularly, learning more object …

End-to-end learning of visual representations from uncurated instructional videos

A Miech, JB Alayrac, L Smaira… - Proceedings of the …, 2020 - openaccess.thecvf.com
Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video
models still rely on manually annotated data. With the recent introduction of the HowTo100M …

A survey of self-supervised and few-shot object detection

G Huang, I Laradji, D Vazquez… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
Labeling data is often expensive and time-consuming, especially for tasks such as object
detection and instance segmentation, which require dense labeling of the image. While few …

Noise estimation using density estimation for self-supervised multimodal learning

E Amrani, R Ben-Ari, D Rotman… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
One of the key factors of enabling machine learning models to comprehend and solve real-world
tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is …

Contrastive learning for neural topic model

T Nguyen, AT Luu - Advances in neural information …, 2021 - proceedings.neurips.cc
Recent empirical studies show that adversarial topic models (ATM) can successfully capture
semantic patterns of the document by differentiating a document with another dissimilar …

Look at what I'm doing: Self-supervised spatial grounding of narrations in instructional videos

R Tan, B Plummer, K Saenko, H Jin… - Advances in Neural …, 2021 - proceedings.neurips.cc
We introduce the task of spatially localizing narrated interactions in videos. Key to our
approach is the ability to learn to spatially localize interactions with self-supervision on a …

Detours for navigating instructional videos

K Ashutosh, Z Xue, T Nagarajan… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce the video detours problem for navigating instructional videos. Given a source
video and a natural language query asking to alter the how-to video's current path of …

Localized vision-language matching for open-vocabulary object detection

MA Bravo, S Mittal, T Brox - DAGM German conference on pattern …, 2022 - Springer
In this work, we propose an open-vocabulary object detection method that, based on image-caption
pairs, learns to detect novel object classes along with a given set of known classes …