Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

A survey of multimodal deep generative models

M Suzuki, Y Matsuo - Advanced Robotics, 2022 - Taylor & Francis
Multimodal learning is a framework for building models that make predictions based on
different types of modalities. Important challenges in multimodal learning are the inference of …

Trusted multi-view classification with dynamic evidential fusion

Z Han, C Zhang, H Fu, JT Zhou - IEEE transactions on pattern …, 2022 - ieeexplore.ieee.org
Existing multi-view classification algorithms focus on promoting accuracy by exploiting
different views, typically integrating them into common representations for follow-up tasks …

Deep multimodal representation learning: A survey

W Guo, J Wang, S Wang - IEEE Access, 2019 - ieeexplore.ieee.org
Multimodal representation learning, which aims to narrow the heterogeneity gap among
different modalities, plays an indispensable role in the utilization of ubiquitous multimodal …

Variational mixture-of-experts autoencoders for multi-modal deep generative models

Y Shi, B Paige, P Torr - Advances in neural information …, 2019 - proceedings.neurips.cc
Learning generative models that span multiple data modalities, such as vision and
language, is often motivated by the desire to learn more useful, generalisable …

Multimodal generative models for scalable weakly-supervised learning

M Wu, N Goodman - Advances in neural information …, 2018 - proceedings.neurips.cc
Multiple modalities often co-occur when describing natural phenomena. Learning a joint
representation of these modalities should yield deeper and more useful representations …

Deep partial multi-view learning

C Zhang, Y Cui, Z Han, JT Zhou… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Although multi-view learning has made significant progress over the past few decades, it is
still challenging due to the difficulty in modeling complex correlations among different views …

Learning modality-specific and -agnostic representations for asynchronous multimodal language sequences

D Yang, H Kuang, S Huang, L Zhang - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Understanding human behaviors and intents from videos is a challenging task. Video flows
usually involve time-series data from different modalities, such as natural language, facial …

The human tumor atlas network: charting tumor transitions across space and time at single-cell resolution

O Rozenblatt-Rosen, A Regev, P Oberdoerffer, T Nawy… - Cell, 2020 - cell.com
Crucial transitions in cancer—including tumor initiation, local expansion, metastasis, and
therapeutic resistance—involve complex interactions between cells within the dynamic …

Gaussian process prior variational autoencoders

FP Casale, A Dalca, L Saglietti… - Advances in neural …, 2018 - proceedings.neurips.cc
Variational autoencoders (VAE) are a powerful and widely-used class of models to learn
complex data distributions in an unsupervised fashion. One important limitation of VAEs is …