FedVQA: Personalized Federated Visual Question Answering over Heterogeneous Scenes

M Lao, N Pu, Z Zhong, N Sebe, MS Lew - Proceedings of the 31st ACM …, 2023 - dl.acm.org
This paper presents a new setting for visual question answering (VQA) called personalized
federated VQA (FedVQA) that addresses the growing need for decentralization and data …

Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training

A Huang, P Liu, R Nakada, L Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-
language tasks. While CLIP has revolutionized multimodal learning through joint training on …

Exploring deep learning for multimodal understanding

M Lao - 2023 - scholarlypublications …
[14] Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T.,
Louf, R., Funtowicz, M., et al.: Transformers: State-of-the-art natural language processing. In …