Multimodal learning with transformers: A survey
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
Graph embedding contrastive multi-modal representation learning for clustering
Multi-modal clustering (MMC) aims to explore complementary information from diverse
modalities for clustering performance facilitating. This article studies challenging problems in …
From word types to tokens and back: A survey of approaches to word meaning representation and interpretation
M Apidianaki - Computational Linguistics, 2023 - direct.mit.edu
Vector-based word representation paradigms situate lexical meaning at different levels of
abstraction. Distributional and static embedding models generate a single vector per word …
MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
Multi-modal fusion approaches aim to integrate information from different data sources.
Unlike natural datasets, such as in audio-visual applications, where samples consist of …
Mind Artist: Creating Artistic Snapshots with Human Thought
We introduce Mind Artist (MindArt), a novel and efficient neural decoding
architecture to snap artistic photographs from our mind in a controllable manner. Recently …
All in One Framework for Multimodal Re-identification in the Wild
In Re-identification (ReID), recent advancements yield noteworthy progress in both
unimodal and cross-modal retrieval tasks. However, the challenge persists in developing a …
Core-periphery principle guided redesign of self-attention in transformers
Designing more efficient, reliable, and explainable neural network architectures is critical to
studies that are based on artificial intelligence (AI) techniques. Previous studies, by post-hoc …
Towards Weakly Supervised Text-to-Audio Grounding
Text-to-audio grounding (TAG) task aims to predict the onsets and offsets of sound events
described by natural language. This task can facilitate applications such as multimodal …
Regression metric loss: Learning a semantic representation space for medical images
Regression plays an essential role in many medical imaging applications for estimating
various clinical risk or measurement scores. While training strategies and loss functions …
A Cross-Domain Multimodal Supervised Latent Topic Model for Item Tagging and Cold-Start Recommendation
R Tang, C Yang, Y Wang - IEEE MultiMedia, 2023 - ieeexplore.ieee.org
Cross-domain data analysis is playing an increasingly important role in media convergence
and can be adopted for many applications. Most existing methods consider the domain …