Large AI models in health informatics: Applications, challenges, and the future

J Qiu, L Li, J Sun, J Peng, P Shi… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
Large AI models, or foundation models, are recently emerging models of massive scale,
both parameter-wise and data-wise, whose magnitudes can reach beyond billions …

On the challenges and perspectives of foundation models for medical image analysis

S Zhang, D Metaxas - Medical Image Analysis, 2023 - Elsevier
This article discusses the opportunities, applications, and future directions of large-scale
pretrained models, i.e., foundation models, which promise to significantly improve the …

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Reproducible scaling laws for contrastive language-image learning

M Cherti, R Beaumont, R Wightman… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scaling up neural networks has led to remarkable performance across a wide range of
tasks. Moreover, performance often follows reliable scaling laws as a function of training set …
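
The snippet above names the key relationship: loss typically follows a power law in the
training set size. Since a power law loss = a · N^slope is linear in log-log space, it can
be fitted with a simple regression. A minimal sketch in Python; the measurements and the
power-law form are illustrative assumptions, not figures from the paper:

```python
import numpy as np

# Hypothetical (training set size, validation loss) measurements.
sizes = np.array([1e6, 1e7, 1e8, 1e9])
losses = np.array([2.10, 1.55, 1.15, 0.85])

# log(loss) = slope * log(N) + log(a), so fit a line in log-log space.
slope, log_a = np.polyfit(np.log(sizes), np.log(losses), 1)
a = np.exp(log_a)
print(f"fitted: loss ~ {a:.2f} * N^{slope:.3f}")

# The usual payoff of a scaling law: extrapolating to a larger run.
print(f"predicted loss at N = 1e10: {a * 1e10 ** slope:.3f}")
```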

VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

VideoChat: Chat-centric video understanding

KC Li, Y He, Y Wang, Y Li, W Wang, P Luo… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we make an initial attempt at developing an end-to-end chat-centric video
understanding system, coined VideoChat. It integrates video foundation models and …

Vision mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long-sequence modeling …
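
For context on the snippet above: at its core, an SSM layer applies a discretized linear
recurrence h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t over the sequence. A minimal
sketch with random stand-in matrices; real Mamba layers make the parameters
input-dependent ("selective") and use a hardware-aware parallel scan rather than a
Python loop:

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len = 4, 10
A_bar = 0.9 * np.eye(d_state)          # stable discretized state matrix
B_bar = rng.normal(size=(d_state, 1))  # input projection
C = rng.normal(size=(1, d_state))      # output projection

x = rng.normal(size=seq_len)           # a scalar input sequence
h = np.zeros((d_state, 1))
ys = []
for t in range(seq_len):
    h = A_bar @ h + B_bar * x[t]       # h_t = A_bar h_{t-1} + B_bar x_t
    ys.append((C @ h).item())          # y_t = C h_t
print(np.round(ys, 3))
```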

DETRs with collaborative hybrid assignments training

Z Zong, G Song, Y Liu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we make the observation that assigning too few queries as positive samples
in DETR's one-to-one set matching leads to sparse supervision on the encoder's output …
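
The observation follows directly from how one-to-one set matching works: Hungarian
matching assigns each ground-truth box exactly one query, so when queries far outnumber
objects, most queries receive no positive supervision. A minimal sketch with a random
cost matrix (DETR's actual matching cost combines classification and box terms):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

num_queries, num_gt = 100, 4           # DETR uses far more queries than objects
cost = np.random.default_rng(0).random((num_queries, num_gt))

# One-to-one set matching: each ground-truth box claims exactly one query.
query_idx, gt_idx = linear_sum_assignment(cost)
print(f"positive queries: {len(query_idx)} / {num_queries}")
# -> positive queries: 4 / 100; the other 96 get no positive signal
```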

BEVFormer v2: Adapting modern image backbones to bird's-eye-view recognition via perspective supervision

C Yang, Y Chen, H Tian, C Tao, X Zhu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a novel bird's-eye-view (BEV) detector with perspective supervision, which
converges faster and better suits modern image backbones. Existing state-of-the-art BEV …

EVA-02: A visual representation for Neon Genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …
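
To make the masked-image-modeling objective in the snippet concrete: a random subset of
patch tokens is hidden, and the model is trained to regress a teacher's language-aligned
features (e.g. from CLIP) at the masked positions. A minimal sketch; the shapes, 40% mask
ratio, and plain MSE loss are illustrative assumptions, not EVA-02's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim = 196, 768            # e.g. a 14x14 patch grid at ViT-Base width
mask_ratio = 0.4                       # fraction of patches hidden from the model

masked = rng.random(num_patches) < mask_ratio  # which positions to mask

# Stand-ins for the teacher's language-aligned target features and the
# student's predictions at every patch position.
teacher_features = rng.normal(size=(num_patches, dim))
student_pred = rng.normal(size=(num_patches, dim))

# The reconstruction loss is computed only at the masked positions.
mse = np.mean((student_pred[masked] - teacher_features[masked]) ** 2)
print(f"masked: {masked.sum()} / {num_patches} patches, loss: {mse:.3f}")
```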