Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Graph embedding contrastive multi-modal representation learning for clustering

W Xia, T Wang, Q Gao, M Yang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Multi-modal clustering (MMC) aims to explore complementary information from diverse
modalities for clustering performance facilitating. This article studies challenging problems in …

From word types to tokens and back: A survey of approaches to word meaning representation and interpretation

M Apidianaki - Computational Linguistics, 2023 - direct.mit.edu
Vector-based word representation paradigms situate lexical meaning at different levels of
abstraction. Distributional and static embedding models generate a single vector per word …

MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images

N Hayat, KJ Geras, FE Shamout - Machine Learning for …, 2022 - proceedings.mlr.press
Multi-modal fusion approaches aim to integrate information from different data sources.
Unlike natural datasets, such as in audio-visual applications, where samples consist of …

Mind Artist: Creating Artistic Snapshots with Human Thought

J Chen, Y Qi, Y Wang, G Pan - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We introduce Mind Artist (MindArt), a novel and efficient neural decoding
architecture to snap artistic photographs from our mind in a controllable manner. Recently …

All in One Framework for Multimodal Re-identification in the Wild

H Li, M Ye, M Zhang, B Du - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
In Re-identification (ReID), recent advancements yield noteworthy progress in both
unimodal and cross-modal retrieval tasks. However, the challenge persists in developing a …

Core-periphery principle guided redesign of self-attention in transformers

X Yu, L Zhang, H Dai, Y Lyu, L Zhao, Z Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Designing more efficient, reliable, and explainable neural network architectures is critical to
studies that are based on artificial intelligence (AI) techniques. Previous studies, by post-hoc …

Towards Weakly Supervised Text-to-Audio Grounding

X Xu, Z Ma, M Wu, K Yu - arXiv preprint arXiv:2401.02584, 2024 - arxiv.org
Text-to-audio grounding (TAG) task aims to predict the onsets and offsets of sound events
described by natural language. This task can facilitate applications such as multimodal …

Regression metric loss: Learning a semantic representation space for medical images

H Chao, J Zhang, P Yan - … Conference on Medical Image Computing and …, 2022 - Springer
Regression plays an essential role in many medical imaging applications for estimating
various clinical risk or measurement scores. While training strategies and loss functions …

A Cross-Domain Multimodal Supervised Latent Topic Model for Item Tagging and Cold-Start Recommendation

R Tang, C Yang, Y Wang - IEEE MultiMedia, 2023 - ieeexplore.ieee.org
Cross-domain data analysis is playing an increasingly important role in media convergence
and can be adopted for many applications. Most existing methods consider the domain …