MixPHM: redundancy-aware parameter-efficient tuning for low-resource visual question answering

J Jiang, N Zheng - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Recently, finetuning pretrained vision-language models (VLMs) has been a prevailing
paradigm for achieving state-of-the-art performance in VQA. However, as VLMs scale, it …

Vision-language alignment learning under affinity and divergence principles for few-shot out-of-distribution generalization

L Zhu, W Yin, Y Yang, F Wu, Z Zeng, Q Gu… - International Journal of …, 2024 - Springer
Recent advances in fine-tuning large-scale vision-language pre-trained models (VL-PTMs)
have shown promising results in quick adaption to downstream tasks. However, prior …

Unseen And Adverse Outdoor Scenes Recognition Through Event-based Captions

H Sakaino - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
This paper presents EventCAP, i.e., event-based captions, for refined and enriched
qualitative and quantitative captions by Deep Learning (DL) models and Vision Language …

Measuring scientific inquiry ability related to hands-on practice: An automated approach based on multimodal data analysis

Y Song, L Guo, Q Zheng - Education and Information Technologies, 2024 - Springer
Scientific inquiry ability is closely related to the process of hands-on inquiry practice.
However, its assessment is often separated from this practice due to the limitation of …

CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering

Y Jiang, J Yin - arXiv preprint arXiv:2405.07451, 2024 - arxiv.org
While vision-language pretrained models (VLMs) excel in various multimodal understanding
tasks, their potential in fine-grained audio-visual reasoning, particularly for audio-visual …

MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering Supplemental Material

J Jiang, N Zheng - openaccess.thecvf.com
The document provides some supplementary materials for our experiments. Specifically, in
Sec. 1, we explore the impact of different routing mechanisms and hyperparameters on …