Entropy-SGD: Biasing gradient descent into wide valleys

P Chaudhari, A Choromanska, S Soatto… - Journal of Statistical …, 2019 - iopscience.iop.org
This paper proposes a new optimization algorithm called Entropy-SGD for training deep
neural networks that is motivated by the local geometry of the energy landscape. Local …

Regularization for deep learning: A taxonomy

J Kukačka, V Golkov, D Cremers - arXiv preprint arXiv:1710.10686, 2017 - arxiv.org
Regularization is one of the crucial ingredients of deep learning, yet the term regularization
has various definitions, and regularization methods are often studied separately from each …

Online deep learning: Learning deep neural networks on the fly

D Sahoo, Q Pham, J Lu, SCH Hoi - arXiv preprint arXiv:1711.03705, 2017 - arxiv.org
Deep Neural Networks (DNNs) are typically trained by backpropagation in a batch learning
setting, which requires the entire training data to be made available prior to the learning …

C-MIL: Continuation multiple instance learning for weakly supervised object detection

F Wan, C Liu, W Ke, X Ji, J Jiao… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Weakly supervised object detection (WSOD) is a challenging task when provided with image
category supervision but required to simultaneously learn object locations and object …

Active bias: Training more accurate neural networks by emphasizing high variance samples

HS Chang, E Learned-Miller… - Advances in Neural …, 2017 - proceedings.neurips.cc
Self-paced learning and hard example mining re-weight training instances to improve
learning accuracy. This paper presents two improved alternatives based on lightweight …

Empirical analysis of the Hessian of over-parametrized neural networks

L Sagun, U Evci, VU Guney, Y Dauphin… - arXiv preprint arXiv …, 2017 - arxiv.org
We study the properties of common loss surfaces through their Hessian matrix. In particular,
in the context of deep learning, we empirically show that the spectrum of the Hessian is …

Min-entropy latent model for weakly supervised object detection

F Wan, P Wei, J Jiao, Z Han… - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
Weakly supervised object detection is a challenging task when provided with image
category supervision but required to learn, at the same time, object locations and object …

Understanding the impact of entropy on policy optimization

Z Ahmed, N Le Roux, M Norouzi… - … on machine learning, 2019 - proceedings.mlr.press
Entropy regularization is commonly used to improve policy optimization in reinforcement
learning. It is believed to help with exploration by encouraging the selection of more …

Maximum mean discrepancy gradient flow

M Arbel, A Korba, A Salim… - Advances in Neural …, 2019 - proceedings.neurips.cc
We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and
study its convergence properties. The MMD is an integral probability metric defined for a …

Multi-level residual networks from dynamical systems view

B Chang, L Meng, E Haber, F Tung… - arXiv preprint arXiv …, 2017 - arxiv.org
Deep residual networks (ResNets) and their variants are widely used in many computer
vision applications and natural language processing tasks. However, the theoretical …