Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Transformer, one of the latest technological advances of deep learning, has gained
prevalence in natural language processing and computer vision. Since medical imaging bears …
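The Transformer referenced here is built on scaled dot-product attention; as standard background (not specific to this review), the core operation is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension.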
Directional convergence and alignment in deep learning
Z Ji, M Telgarsky - Advances in Neural Information …, 2020 - proceedings.neurips.cc
In this paper, we show that although the minimizers of cross-entropy and related
classification losses are off at infinity, network weights learned by gradient flow converge in …
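The result being described admits a compact statement (paraphrased here as background, assuming the paper's gradient-flow setting): although the loss has no finite minimizer and the weights grow unboundedly, the normalized iterates stabilize,

$$\dot{w}(t) = -\nabla \mathcal{L}(w(t)), \qquad \|w(t)\| \to \infty, \qquad \frac{w(t)}{\|w(t)\|} \to \bar{w},$$

and alignment means the negative gradient $-\nabla \mathcal{L}(w(t))$ also converges in direction to the same $\bar{w}$.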
Towards understanding sharpness-aware minimization
M Andriushchenko… - … Conference on Machine …, 2022 - proceedings.mlr.press
Sharpness-Aware Minimization (SAM) is a recent training method that relies on
worst-case weight perturbations and significantly improves generalization in various …
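To make the worst-case weight perturbation concrete, here is a minimal NumPy sketch of one SAM step on a toy loss; the loss, step size, and radius rho are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def loss(w):
    # Toy non-convex loss standing in for a network's training loss.
    return float(np.sum(w ** 4 - 2 * w ** 2))

def grad(w):
    return 4 * w ** 3 - 4 * w

def sam_step(w, lr=0.05, rho=0.1):
    """One Sharpness-Aware Minimization step:
    ascend to the (first-order) worst-case perturbation inside an
    L2 ball of radius rho, then descend using the gradient there."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    return w - lr * grad(w + eps)                # update the original weights

w = np.array([0.5, -1.5])
for _ in range(200):
    w = sam_step(w)
print(w, loss(w))  # settles near a minimizer of the toy loss
```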
SWAD: Domain generalization by seeking flat minima
Domain generalization (DG) methods aim to achieve generalizability to an unseen
target domain by using only training data from the source domains. Although a variety of DG …
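The flat-minima mechanism in SWAD is dense stochastic weight averaging over a window of training iterates; a minimal sketch of that averaging (omitting SWAD's validation-based selection of the window, which is an assumption left out here) might look like:

```python
import numpy as np

def running_average(avg_w, new_w, n_seen):
    """Incrementally average weight snapshots collected every iteration.
    Averaged iterates tend to sit in flatter regions of the loss surface."""
    return avg_w + (new_w - avg_w) / (n_seen + 1)

# Hypothetical usage with stand-in weight vectors for the averaging window.
avg = np.zeros(10)
for n, snapshot in enumerate(np.random.randn(50, 10)):
    avg = running_average(avg, snapshot, n)
```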
Effect of data encoding on the expressive power of variational quantum-machine-learning models
Quantum computers can be used for supervised learning by treating parametrized quantum
circuits as models that map data inputs to predictions. While a lot of work has been done to …
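The paper's central result can be summarized as follows: a model built from a parametrized circuit $U(x,\theta)$ with measurement $M$ is a partial Fourier series in the data,

$$f_{\theta}(x) = \langle 0 |\, U^{\dagger}(x, \theta)\, M\, U(x, \theta)\, | 0 \rangle = \sum_{\omega \in \Omega} c_{\omega}(\theta)\, e^{i\omega x},$$

where the frequency set $\Omega$ is fixed by the eigenvalues of the data-encoding Hamiltonians and only the coefficients $c_{\omega}$ depend on the trainable parameters.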
What is being transferred in transfer learning?
B Neyshabur, H Sedghi… - Advances in neural …, 2020 - proceedings.neurips.cc
One desired capability for machines is the ability to transfer their understanding of one
domain to another domain where data is (usually) scarce. Despite ample adaptation of …
ASAM: Adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks
Recently, learning algorithms motivated by the sharpness of the loss surface as an effective
measure of the generalization gap have shown state-of-the-art performance. Nevertheless …
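The "Nevertheless" points at SAM's sensitivity to parameter rescaling, which ASAM addresses by normalizing the perturbation elementwise by parameter magnitude. A minimal sketch under that reading, with the toy gradient and the small stabilizer eta as illustrative assumptions:

```python
import numpy as np

def asam_step(w, grad, lr=0.05, rho=0.5, eta=0.01):
    """One adaptive sharpness-aware step: scale the ascent direction by
    t = |w| + eta so the perturbation respects each parameter's scale,
    making the sharpness measure invariant to weight rescaling."""
    t = np.abs(w) + eta
    g = grad(w)
    tg = t * g
    eps = rho * t * tg / (np.linalg.norm(tg) + 1e-12)  # scaled perturbation
    return w - lr * grad(w + eps)

# Hypothetical usage with the same toy gradient as the SAM sketch above.
w = np.array([0.5, -1.5])
for _ in range(200):
    w = asam_step(w, lambda v: 4 * v ** 3 - 4 * v)
print(w)
```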
Understanding gradient descent on the edge of stability in deep learning
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
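The stability threshold behind the EoS phenomenon is visible already on a quadratic: gradient descent with learning rate lr on L(x) = 0.5 * c * x**2 updates x <- (1 - lr*c) * x, so it diverges exactly when the curvature c exceeds 2/lr. A small demo of that classical fact (not the paper's neural-network analysis):

```python
def gd_on_quadratic(curvature, lr=0.1, steps=30, x0=1.0):
    """Run gradient descent on L(x) = 0.5 * curvature * x**2."""
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # x <- (1 - lr * curvature) * x
    return abs(x)

print(gd_on_quadratic(curvature=19.0))  # curvature < 2/lr = 20: shrinks toward 0
print(gd_on_quadratic(curvature=21.0))  # curvature > 2/lr: |x| grows without bound
```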
What neural networks memorize and why: Discovering the long tail via influence estimation
Deep learning algorithms are well-known to have a propensity for fitting the training data
very well and often fit even outliers and mislabeled data points. Such fitting requires …
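The influence-estimation approach rests on a memorization score for each training example; in the form used in this line of work, for learning algorithm $\mathcal{A}$ and training set $S$,

$$\mathrm{mem}(\mathcal{A}, S, i) = \Pr_{h \sim \mathcal{A}(S)}\bigl[h(x_i) = y_i\bigr] \;-\; \Pr_{h \sim \mathcal{A}(S \setminus \{i\})}\bigl[h(x_i) = y_i\bigr],$$

estimated in practice by training many models on random subsets rather than by exact leave-one-out retraining.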
Bayesian deep learning and a probabilistic perspective of generalization
AG Wilson, P Izmailov - Advances in neural information …, 2020 - proceedings.neurips.cc
The key distinguishing property of a Bayesian approach is marginalization, rather than using
a single setting of weights. Bayesian marginalization can particularly improve the accuracy …
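Concretely, marginalization replaces a point estimate of the weights with an average over the posterior,

$$p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw \;\approx\; \frac{1}{J} \sum_{j=1}^{J} p(y \mid x, w_j), \qquad w_j \sim p(w \mid \mathcal{D}),$$

which is the sense in which the paper views approaches like deep ensembles as approximate Bayesian model averages.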