[HTML] Review of image classification algorithms based on convolutional neural networks
L Chen, S Li, Q Bai, J Yang, S Jiang, Y Miao - Remote Sensing, 2021 - mdpi.com
Image classification has always been a hot research direction in the world, and the
emergence of deep learning has promoted the development of this field. Convolutional …
Avoiding overfitting: A survey on regularization methods for convolutional neural networks
CFGD Santos, JP Papa - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
Several image processing tasks, such as image classification and object detection, have
been significantly improved using Convolutional Neural Networks (CNN). Like ResNet and …
DINOv2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
PaLI: A jointly-scaled multilingual language-image model
Effective scaling and a flexible task interface enable large language models to excel at many
tasks. We present PaLI (Pathways Language and Image model), a model that extends this …
Flamingo: a visual language model for few-shot learning
Building models that can be rapidly adapted to novel tasks using only a handful of annotated
examples is an open challenge for multimodal machine learning research. We introduce …
Pix2Struct: Screenshot parsing as pretraining for visual language understanding
Visually-situated language is ubiquitous—sources range from textbooks with diagrams to
web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to …
DeiT III: Revenge of the ViT
A Vision Transformer (ViT) is a simple neural architecture amenable to serve
several computer vision tasks. It has limited built-in architectural priors, in contrast to more …
Masked autoencoders are scalable vision learners
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners
for computer vision. Our MAE approach is simple: we mask random patches of the input …
Towards a general-purpose foundation model for computational pathology
Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks,
requiring the objective characterization of histopathological entities from whole-slide images …
ResNet strikes back: An improved training procedure in timm
The influential Residual Networks designed by He et al. remain the gold-standard
architecture in numerous scientific publications. They typically serve as the default …