Hierarchical pretraining on multimodal electronic health records

X Wang, J Luo, J Wang, Z Yin, S Cui… - Proceedings of the …, 2023 - ncbi.nlm.nih.gov
Pretraining has proven to be a powerful technique in natural language processing (NLP),
exhibiting remarkable success in various NLP downstream tasks. However, in the medical …

Diffusion-based co-speech gesture generation using joint text and audio representation

A Deichler, S Mehta, S Alexanderson… - Proceedings of the 25th …, 2023 - dl.acm.org
This paper describes a system developed for the GENEA (Generation and Evaluation of
Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing …

Clara: Multilingual contrastive learning for audio representation acquisition

KA Noriy, X Yang, M Budka, JJ Zhang - arXiv preprint arXiv:2310.11830, 2023 - arxiv.org
This paper proposes a novel framework for multilingual speech and sound representation
learning using contrastive learning. The lack of sizeable labelled datasets hinders speech …

Contrastive learning for cross-modal artist retrieval

A Ferraro, J Kim, S Oramas, A Ehmann… - arXiv preprint arXiv …, 2023 - arxiv.org
Music retrieval and recommendation applications often rely on content features encoded as
embeddings, which provide vector representations of items in a music dataset. Numerous …

AUC-CL: A Batchsize-Robust Framework for Self-Supervised Contrastive Representation Learning

R Sharma, K Ji, C Chen - The Twelfth International Conference on …, 2023 - openreview.net
Self-supervised learning through contrastive representations is an emergent and promising
avenue, aiming to reduce the reliance on labeled data. Recent research in the field …

Mini-Batch Optimization of Contrastive Loss

J Cho, K Sreenivasan, K Lee, K Mun, S Yi… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive learning has gained significant attention as a method for self-supervised
learning. The contrastive loss function ensures that embeddings of positive sample pairs …
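
For context on the loss these batch-size studies analyze, the sketch below shows a generic in-batch contrastive (InfoNCE-style) objective in PyTorch. It is an illustrative assumption, not the implementation from any of the papers listed here; the function name, temperature value, and shapes are hypothetical.

```python
# Minimal sketch of an in-batch contrastive (InfoNCE-style) loss, assuming PyTorch
# and paired embeddings; illustrative only, not taken from the papers listed above.
import torch
import torch.nn.functional as F

def info_nce_loss(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z_a, z_b: (batch, dim) embeddings of two views; row i of z_a pairs with row i of z_b."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                      # (batch, batch) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)    # positives sit on the diagonal
    # Every other item in the batch serves as a negative, which is why the effective
    # batch size directly shapes this loss.
    return F.cross_entropy(logits, targets)

# Usage (hypothetical encoder): loss = info_nce_loss(encoder(view_1), encoder(view_2))
```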

The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning

N Vaessen, DA van Leeuwen - arXiv preprint arXiv:2402.13723, 2024 - arxiv.org
Foundation models in speech are often trained using many GPUs, which implicitly leads to
large effective batch sizes. In this paper we study the effect of batch size on pre-training, both …

Representation Learning Dynamics of Self-Supervised Models

P Esser, S Mukherjee, D Ghoshdastidar - arXiv preprint arXiv:2309.02011, 2023 - arxiv.org
Self-Supervised Learning (SSL) is an important paradigm for learning representations from
unlabelled data, and SSL with neural networks has been highly successful in practice …

Guarding Barlow Twins Against Overfitting with Mixed Samples

WGC Bandara, CM De Melo, VM Patel - arXiv preprint arXiv:2312.02151, 2023 - arxiv.org
Self-supervised Learning (SSL) aims to learn transferable feature representations for
downstream applications without relying on labeled data. The Barlow Twins algorithm …
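
As a reference point for the Barlow Twins objective this entry builds on, the sketch below gives a minimal PyTorch version of the redundancy-reduction loss. It is a generic illustration under assumed shapes and an illustrative `lambda_offdiag` weight, not the paper's code.

```python
# Minimal sketch of the Barlow Twins redundancy-reduction loss, assuming PyTorch;
# illustrative only, not the implementation from the paper listed above.
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lambda_offdiag: float = 5e-3) -> torch.Tensor:
    """z_a, z_b: (batch, dim) embeddings of two augmented views of the same batch."""
    n, _ = z_a.shape
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.t() @ z_b) / n                                        # (dim, dim) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                 # pull diagonal entries toward 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()    # decorrelate the remaining pairs
    return on_diag + lambda_offdiag * off_diag
```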

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

S Swetha, J Yang, T Neiman, MN Rizve, S Tran… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Multimodal Large Language Models (MLLMs) have revolutionized
the field of vision-language understanding by integrating visual perception capabilities into …