- Academic resource search

Self-supervised learning: A succinct review

V Rani, ST Nabi, M Kumar, A Mittal, K Kumar - Archives of Computational …, 2023 - Springer

Machine learning has made significant advances in the field of image processing.
The foundation of this success is supervised learning, which necessitates annotated labels …

Cited by 142 (6 versions)

Beyond just vision: A review on self-supervised representation learning on multimodal and temporal data

S Deldari, H Xue, A Saeed, J He, DV Smith… - arXiv preprint arXiv …, 2022 - arxiv.org

Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in
the field of computer vision, speech, natural language processing (NLP), and recently, with …

Cited by 46 (2 versions)

Open-vocabulary object detection using captions

A Zareian, KD Rosa, DH Hu… - Proceedings of the …, 2021 - openaccess.thecvf.com

Despite the remarkable accuracy of deep neural networks in object detection, they are costly
to train and scale due to supervision requirements. Particularly, learning more object …

Cited by 438 (6 versions)

End-to-end learning of visual representations from uncurated instructional videos

A Miech, JB Alayrac, L Smaira… - Proceedings of the …, 2020 - openaccess.thecvf.com

Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video
models still rely on manually annotated data. With the recent introduction of the HowTo100M …

Cited by 824 (15 versions)

A survey of self-supervised and few-shot object detection

G Huang, I Laradji, D Vazquez… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org

Labeling data is often expensive and time-consuming, especially for tasks such as object
detection and instance segmentation, which require dense labeling of the image. While few …

Cited by 102 (6 versions)

Noise estimation using density estimation for self-supervised multimodal learning

E Amrani, R Ben-Ari, D Rotman… - Proceedings of the AAAI …, 2021 - ojs.aaai.org

One of the key factors of enabling machine learning models to comprehend and solve real-world
tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is …

Cited by 140 (7 versions)

Contrastive learning for neural topic model

T Nguyen, AT Luu - Advances in neural information …, 2021 - proceedings.neurips.cc

Recent empirical studies show that adversarial topic models (ATM) can successfully capture
semantic patterns of the document by differentiating a document with another dissimilar …

Cited by 66 (7 versions)

Look at what I'm doing: Self-supervised spatial grounding of narrations in instructional videos

R Tan, B Plummer, K Saenko, H Jin… - Advances in Neural …, 2021 - proceedings.neurips.cc

We introduce the task of spatially localizing narrated interactions in videos. Key to our
approach is the ability to learn to spatially localize interactions with self-supervision on a …

Cited by 22 (7 versions)

Detours for navigating instructional videos

K Ashutosh, Z Xue, T Nagarajan… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce the video detours problem for navigating instructional videos. Given a source
video and a natural language query asking to alter the how-to video's current path of …

Cited by 5 (4 versions)

Localized vision-language matching for open-vocabulary object detection

MA Bravo, S Mittal, T Brox - DAGM German conference on pattern …, 2022 - Springer

In this work, we propose an open-vocabulary object detection method that, based on image-caption
pairs, learns to detect novel object classes along with a given set of known classes …

Cited by 27 (6 versions)