Explainable semantic space by grounding language to vision with cross-modal contrastive learning

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Cited by 431 · Related articles · All 9 versions

Graph embedding contrastive multi-modal representation learning for clustering

W Xia, T Wang, Q Gao, M Yang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Multi-modal clustering (MMC) aims to explore complementary information from diverse
modalities to improve clustering performance. This article studies challenging problems in …

Cited by 37 · Related articles · All 4 versions

From word types to tokens and back: A survey of approaches to word meaning representation and interpretation

M Apidianaki - Computational Linguistics, 2023 - direct.mit.edu

Vector-based word representation paradigms situate lexical meaning at different levels of
abstraction. Distributional and static embedding models generate a single vector per word …

Cited by 23 · Related articles · All 4 versions

MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images

N Hayat, KJ Geras, FE Shamout - Machine Learning for …, 2022 - proceedings.mlr.press

Multi-modal fusion approaches aim to integrate information from different data sources.
Unlike natural datasets, such as in audio-visual applications, where samples consist of …

Cited by 24 · Related articles · All 6 versions

Mind Artist: Creating Artistic Snapshots with Human Thought

J Chen, Y Qi, Y Wang, G Pan - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

We introduce Mind Artist (MindArt), a novel and efficient neural decoding
architecture to snap artistic photographs from our mind in a controllable manner. Recently …

All in One Framework for Multimodal Re-identification in the Wild

H Li, M Ye, M Zhang, B Du - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com

In Re-identification (ReID), recent advancements yield noteworthy progress in both
unimodal and cross-modal retrieval tasks. However, the challenge persists in developing a …

Core-periphery principle guided redesign of self-attention in transformers

X Yu, L Zhang, H Dai, Y Lyu, L Zhao, Z Wu… - arXiv preprint arXiv …, 2023 - arxiv.org

Designing more efficient, reliable, and explainable neural network architectures is critical to
studies that are based on artificial intelligence (AI) techniques. Previous studies, by post-hoc …

Cited by 9 · Related articles · All 7 versions

Towards Weakly Supervised Text-to-Audio Grounding

X Xu, Z Ma, M Wu, K Yu - arXiv preprint arXiv:2401.02584, 2024 - arxiv.org

The text-to-audio grounding (TAG) task aims to predict the onsets and offsets of sound events
described by natural language. This task can facilitate applications such as multimodal …

Cited by 5 · Related articles · All 2 versions

Regression metric loss: Learning a semantic representation space for medical images

H Chao, J Zhang, P Yan - … Conference on Medical Image Computing and …, 2022 - Springer

Regression plays an essential role in many medical imaging applications for estimating
various clinical risk or measurement scores. While training strategies and loss functions …

Cited by 4 · Related articles · All 4 versions

A Cross-Domain Multimodal Supervised Latent Topic Model for Item Tagging and Cold-Start Recommendation

R Tang, C Yang, Y Wang - IEEE MultiMedia, 2023 - ieeexplore.ieee.org

Cross-domain data analysis is playing an increasingly important role in media convergence
and can be adopted for many applications. Most existing methods consider the domain …

Cited by 4 · Related articles · All 4 versions