Self-supervised learning in remote sensing: A review

Y Wang, CM Albrecht, NAA Braham… - IEEE Geoscience and …, 2022 - ieeexplore.ieee.org

In deep learning research, self-supervised learning (SSL) has received great attention,
triggering interest within both the computer vision and remote sensing communities. While …

Cited by 217 · Related articles · All 5 versions

[PDF] arxiv.org

Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE transactions on pattern analysis and …, 2021 - ieeexplore.ieee.org

Deep neural models, in recent years, have been successful in almost every field, even
solving the most complex problem statements. However, these models are huge in size with …

Cited by 775 · Related articles · All 10 versions
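For readers new to the student-teacher setup the review covers, the widely used temperature-scaled soft-target distillation loss can be sketched in plain Python. This is a minimal illustration, not code from the review. It assumes one example's logits arrive as lists of floats, and the function names and default temperature are hypothetical choices.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration. The T^2 factor keeps gradient
    magnitudes roughly comparable when the temperature is changed.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * (math.log(p) - math.log(q))
             for p, q in zip(p_teacher, p_student) if p > 0)
    return (temperature ** 2) * kl


if __name__ == "__main__":
    # A student that matches the teacher exactly incurs (numerically) zero loss.
    print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    # A student that ranks the classes in reverse order is penalized.
    print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In practice this soft-target term is usually combined with an ordinary cross-entropy loss on the ground-truth labels. The higher temperature exposes the teacher's relative confidence in the non-argmax classes.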

[PDF] arxiv.org

Contrastive learning for unpaired image-to-image translation

T Park, AA Efros, R Zhang, JY Zhu - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer

In image-to-image translation, each patch in the output should reflect the content of the
corresponding patch in the input, independent of domain. We propose a straightforward …

Cited by 1440 · Related articles · All 7 versions
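The snippet's core idea, that each output patch should correspond to its matching input patch rather than to other patches, is commonly expressed as a patch-wise InfoNCE (contrastive) loss. The sketch below is a minimal pure-Python illustration, not the paper's implementation. It assumes patch features have already been extracted as equal-length float vectors. The function names and the temperature `tau = 0.07` (a common choice) are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def patch_nce_loss(output_patches: List[Vector],
                   input_patches: List[Vector],
                   tau: float = 0.07) -> float:
    """Average InfoNCE loss over patches (hypothetical helper).

    For output patch i, input patch i is the positive and every other input
    patch j != i serves as a negative (negatives drawn from the same image).
    """
    q = [l2_normalize(v) for v in output_patches]
    k = [l2_normalize(v) for v in input_patches]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i to every input patch, divided by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / tau for kj in k]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the corresponding patch (index i) as the target class.
        total += log_sum_exp - logits[i]
    return total / len(q)


if __name__ == "__main__":
    inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    aligned = [list(v) for v in inputs]     # each output patch matches its input patch
    shuffled = [inputs[1], inputs[2], inputs[0]]  # correspondences broken
    print(patch_nce_loss(aligned, inputs))   # close to zero
    print(patch_nce_loss(shuffled, inputs))  # large
```

Because the negatives come from other locations in the same input, the loss rewards preserving *which* content sits where, independent of the output domain's appearance.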

[PDF] thecvf.com

Self-supervised learning of pretext-invariant representations

I Misra, L Maaten - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com

The goal of self-supervised learning from images is to construct image representations that
are semantically meaningful via pretext tasks that do not require semantic annotations. Many …

Cited by 1722 · Related articles · All 7 versions

[PDF] neurips.cc

Self-supervised multimodal versatile networks

JB Alayrac, A Recasens, R Schneider… - Advances in neural …, 2020 - proceedings.neurips.cc

Videos are a rich source of multi-modal supervision. In this work, we learn representations
using self-supervision by leveraging three modalities naturally present in videos: visual …

Cited by 430 · Related articles · All 5 versions

[PDF] acm.org

EAMM: One-shot emotional talking face via audio-based emotion-aware motion model

X Ji, H Zhou, K Wang, Q Wu, W Wu, F Xu… - ACM SIGGRAPH 2022 …, 2022 - dl.acm.org

Although significant progress has been made to audio-driven talking face generation,
existing methods either neglect facial emotion or cannot be applied to arbitrary subjects. In …

Cited by 151 · Related articles · All 4 versions

[PDF] arxiv.org

Semi-supervised and unsupervised deep visual learning: A survey

Y Chen, M Mancini, X Zhu… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

State-of-the-art deep learning models are often trained with a large amount of costly labeled
training data. However, requiring exhaustive manual annotations may degrade the model's …

Cited by 120 · Related articles · All 18 versions

[PDF] thecvf.com

VideoBERT: A joint model for video and language representation learning

C Sun, A Myers, C Vondrick… - Proceedings of the …, 2019 - openaccess.thecvf.com

Self-supervised learning has become increasingly important to leverage the abundance of
unlabeled data available on platforms like YouTube. Whereas most existing approaches …

Cited by 1473 · Related articles · All 10 versions

[PDF] nsf.gov

Self-supervised visual feature learning with deep neural networks: A survey

L Jing, Y Tian - IEEE transactions on pattern analysis and …, 2020 - ieeexplore.ieee.org

Large-scale labeled data are generally required to train deep neural networks in order to
obtain better performance in visual feature learning from images or videos for computer …

Cited by 2199 · Related articles · All 7 versions

[PDF] arxiv.org

What should not be contrastive in contrastive learning

T Xiao, X Wang, AA Efros, T Darrell - arXiv preprint arXiv:2008.05659, 2020 - arxiv.org

Recent self-supervised contrastive methods have been able to produce impressive
transferable visual representations by learning to be invariant to different data …

Cited by 340 · Related articles · All 4 versions