[HTML][HTML] Review of image classification algorithms based on convolutional neural networks

L Chen, S Li, Q Bai, J Yang, S Jiang, Y Miao - Remote Sensing, 2021 - mdpi.com
Image classification has always been a hot research direction in the world, and the
emergence of deep learning has promoted the development of this field. Convolutional …

Avoiding overfitting: A survey on regularization methods for convolutional neural networks

CFGD Santos, JP Papa - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
Several image processing tasks, such as image classification and object detection, have
been significantly improved using Convolutional Neural Networks (CNN). Like ResNet and …

Dinov2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Pali: A jointly-scaled multilingual language-image model

X Chen, X Wang, S Changpinyo… - arXiv preprint arXiv …, 2022 - arxiv.org
Effective scaling and a flexible task interface enable large language models to excel at many
tasks. We present PaLI (Pathways Language and Image model), a model that extends this …

Flamingo: a visual language model for few-shot learning

JB Alayrac, J Donahue, P Luc… - Advances in neural …, 2022 - proceedings.neurips.cc
Building models that can be rapidly adapted to novel tasks using only a handful of annotated
examples is an open challenge for multimodal machine learning research. We introduce …

Pix2struct: Screenshot parsing as pretraining for visual language understanding

K Lee, M Joshi, IR Turc, H Hu, F Liu… - International …, 2023 - proceedings.mlr.press
Visually-situated language is ubiquitous—sources range from textbooks with diagrams to
web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to …

Deit iii: Revenge of the vit

H Touvron, M Cord, H Jégou - European conference on computer vision, 2022 - Springer
Abstract A Vision Transformer (ViT) is a simple neural architecture amenable to serve
several computer vision tasks. It has limited built-in architectural priors, in contrast to more …

Masked autoencoders are scalable vision learners

K He, X Chen, S Xie, Y Li, P Dollár… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners
for computer vision. Our MAE approach is simple: we mask random patches of the input …

Towards a general-purpose foundation model for computational pathology

RJ Chen, T Ding, MY Lu, DFK Williamson, G Jaume… - Nature Medicine, 2024 - nature.com
Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks,
requiring the objective characterization of histopathological entities from whole-slide images …

Resnet strikes back: An improved training procedure in timm

R Wightman, H Touvron, H Jégou - arXiv preprint arXiv:2110.00476, 2021 - arxiv.org
The influential Residual Networks designed by He et al. remain the gold-standard
architecture in numerous scientific publications. They typically serve as the default …