Large ai models in health informatics: Applications, challenges, and the future
Large AI models, or foundation models, are models recently emerging with massive scales
both parameter-wise and data-wise, the magnitudes of which can reach beyond billions …
both parameter-wise and data-wise, the magnitudes of which can reach beyond billions …
On the challenges and perspectives of foundation models for medical image analysis
This article discusses the opportunities, applications and future directions of large-scale
pretrained models, ie, foundation models, which promise to significantly improve the …
pretrained models, ie, foundation models, which promise to significantly improve the …
Dinov2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
quantities of data have opened the way for similar foundation models in computer vision …
Reproducible scaling laws for contrastive language-image learning
M Cherti, R Beaumont, R Wightman… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scaling up neural networks has led to remarkable performance across a wide range of
tasks. Moreover, performance often follows reliable scaling laws as a function of training set …
tasks. Moreover, performance often follows reliable scaling laws as a function of training set …
Visionllm: Large language model is also an open-ended decoder for vision-centric tasks
Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …
Videochat: Chat-centric video understanding
In this paper, we initiate an attempt of developing an end-to-end chat-centric video
understanding system, coined as VideoChat. It integrates video foundation models and …
understanding system, coined as VideoChat. It integrates video foundation models and …
Vision mamba: Efficient visual representation learning with bidirectional state space model
Recently the state space models (SSMs) with efficient hardware-aware designs, ie, the
Mamba deep learning model, have shown great potential for long sequence modeling …
Mamba deep learning model, have shown great potential for long sequence modeling …
Detrs with collaborative hybrid assignments training
In this paper, we provide the observation that too few queries assigned as positive samples
in DETR with one-to-one set matching leads to sparse supervision on the encoder's output …
in DETR with one-to-one set matching leads to sparse supervision on the encoder's output …
Bevformer v2: Adapting modern image backbones to bird's-eye-view recognition via perspective supervision
We present a novel bird's-eye-view (BEV) detector with perspective supervision, which
converges faster and better suits modern image backbones. Existing state-of-the-art BEV …
converges faster and better suits modern image backbones. Existing state-of-the-art BEV …
Eva-02: A visual representation for neon genesis
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …
to reconstruct strong and robust language-aligned vision features via masked image …