Machine vision therapy: Multimodal large language models can enhance visual robustness via denoising in-context learning

Z Huang, C Liu, Y Dong, H Su, S Zheng… - Forty-first International …, 2023 - openreview.net
Although pre-trained models such as Contrastive Language-Image Pre-Training (CLIP)
show impressive generalization results, their robustness is still limited under Out-of …

Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities

X Yan, H Zhang, Y Cai, J Guo, W Qiu, B Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …

CDSG-SAM: A cross-domain self-generating prompt few-shot brain tumor segmentation pipeline based on SAM

Y Yang, X Fang, X Li, Y Han, Z Yu - Biomedical Signal Processing and …, 2025 - Elsevier
In clinical practice, accurate segmentation of brain tumor regions is essential for patient
treatment and survival. Accurate brain tumor segmentation is an important task and it is …