A Survey on Safe Multi-Modal Learning Systems

T Zhao, L Zhang, Y Ma, L Cheng - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
In the rapidly evolving landscape of artificial intelligence, multimodal learning systems
(MMLS) have gained traction for their ability to process and integrate information from …

Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting

RA Bafghi, N Harilal, C Monteleoni… - Proceedings of the …, 2024 - openaccess.thecvf.com
Artificial neural networks often suffer from catastrophic forgetting, where learning new
concepts leads to a complete loss of previously acquired knowledge. We observe that this …

Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison

Q Yang, W Yan, A Agrawal - arXiv preprint arXiv:2407.07840, 2024 - arxiv.org
Despite tremendous advancements, current state-of-the-art Vision-Language Models
(VLMs) are still far from perfect. They tend to hallucinate and may generate biased …

Open-Vocabulary Calibration for Vision-Language Models

S Wang, J Wang, G Wang, B Zhang, K Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models (VLMs) have emerged as formidable tools, showing their strong
capability in handling various open-vocabulary tasks in image recognition, text-driven visual …

DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation

C Oh, Y Li, K Song, S Yun, D Han - arXiv preprint arXiv:2410.03782, 2024 - arxiv.org
Adapting a pre-trained foundation model to downstream tasks should ensure robustness
against distribution shifts without the need to retrain the whole model. Although existing …

Improving Network Interpretability via Explanation Consistency Evaluation

H Wu, H Jiang, K Wang, Z Tang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
While deep neural networks have achieved remarkable performance, they tend to lack
transparency in prediction. The pursuit of greater interpretability in neural networks often …

MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer

M Zhu, Z Wang, M Hu, R Dang, X Lin, X Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Transferring visual-language knowledge from large-scale foundation models for video
recognition has proved to be effective. To bridge the domain gap, additional parametric …

Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding

T Jin, W Yan, Y Wang, S Cai, Z Zhao - ACM Multimedia 2024, 2024 - openreview.net
In the field of machine learning, continual learning is a crucial concept that allows models to
adapt to non-stationary data distributions. However, most of the existing works focus on uni …

Open-Vocabulary Calibration for Fine-tuned CLIP

S Wang, J Wang, G Wang, B Zhang, K Zhou… - Forty-first International … - openreview.net
Vision-language models (VLMs) have emerged as formidable tools, showing their strong
capability in handling various open-vocabulary tasks in image recognition, text-driven visual …

Meta-learning algorithms and applications

O Bohdal - 2024 - era.ed.ac.uk
Meta-learning in the broader context concerns how an agent learns about its own
learning, allowing it to improve its learning process. Learning how to learn is not only …