Optimal experimental design for infinite-dimensional Bayesian inverse problems governed by PDEs: A review

A Alexanderian - Inverse Problems, 2021 - iopscience.iop.org
We present a review of methods for optimal experimental design (OED) for Bayesian inverse
problems governed by partial differential equations with infinite-dimensional parameters …

GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration

J Gardner, G Pleiss, KQ Weinberger… - Advances in neural …, 2018 - proceedings.neurips.cc
Despite advances in scalable models, the inference tools used for Gaussian processes
(GPs) have yet to fully capitalize on developments in computing hardware. We present an …
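The blackbox matrix-matrix idea is to reduce GP inference to iterative routines that only touch the kernel matrix through matrix-vector products, replacing a Cholesky factorization with conjugate gradients. A minimal NumPy sketch of that core primitive, assuming a toy RBF kernel (the kernel, lengthscale, and jitter values here are illustrative, not GPyTorch's API):

```python
import numpy as np

def conjugate_gradients(matvec, b, tol=1e-10, max_iter=100):
    """Solve A x = b for SPD A using only matrix-vector products with A."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # initial residual
    p = r.copy()               # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy RBF kernel matrix with jitter, standing in for a GP covariance.
pts = np.linspace(0.0, 1.0, 20)
K = np.exp(-0.5 * (pts[:, None] - pts[None, :]) ** 2 / 0.1) + 1e-4 * np.eye(20)
y = np.sin(2 * np.pi * pts)
x = conjugate_gradients(lambda v: K @ v, y)   # x approximates K^{-1} y
```

Because the solver only ever calls `matvec`, the same loop works when the kernel matrix is never formed explicitly, which is what makes GPU batching of these products attractive.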

PyHessian: Neural networks through the lens of the Hessian

Z Yao, A Gholami, K Keutzer… - 2020 IEEE international …, 2020 - ieeexplore.ieee.org
We present PyHessian, a new scalable framework that enables fast computation of
Hessian (i.e., second-order derivative) information for deep neural networks. PyHessian …
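The primitive underlying such tools is the Hessian-vector product, which exposes curvature without ever materializing the Hessian. A hedged sketch, assuming a finite-difference approximation of the gradient map (PyHessian itself computes exact products via automatic differentiation; the quadratic test problem is illustrative):

```python
import numpy as np

def hvp(grad_fn, x, v, eps=1e-5):
    """Approximate H(x) @ v by central differences of the gradient,
    avoiding explicit construction of the Hessian."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

# Quadratic test problem f(x) = 0.5 x^T A x, so grad(x) = A x and H = A.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
grad = lambda x: A @ x
x0 = np.array([1.0, -2.0])
v = np.array([1.0, 1.0])
Hv = hvp(grad, x0, v)   # approximates A @ v = [5.0, 4.0]
```

Power iteration on `hvp` then yields top Hessian eigenvalues, and randomized probes yield trace and spectral-density estimates, all at the cost of a handful of gradient evaluations per product.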

HAWQ-V2: Hessian aware trace-weighted quantization of neural networks

Z Dong, Z Yao, D Arfeen, A Gholami… - Advances in neural …, 2020 - proceedings.neurips.cc
Quantization is an effective method for reducing the memory footprint and inference time of
neural networks. However, ultra-low-precision quantization could lead to significant …
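Trace-weighted schemes need the Hessian trace per layer, which is typically estimated with Hutchinson's method: for Rademacher vectors v, E[vᵀHv] = tr(H), so the trace follows from Hessian-vector products alone. A minimal sketch, assuming an explicit toy "Hessian" in place of a network's curvature operator:

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_samples=1000, seed=0):
    """Estimate tr(H) from Hessian-vector products only:
    E[v^T H v] = tr(H) when v has i.i.d. Rademacher entries."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp(v)
    return total / n_samples

# Toy symmetric "Hessian" with known trace 6.0.
H = np.array([[2.0, 0.1, 0.0],
              [0.1, 3.0, 0.2],
              [0.0, 0.2, 1.0]])
est = hutchinson_trace(lambda v: H @ v, dim=3)
```

Layers whose estimated trace is large are then kept at higher precision, since their loss is more sensitive to quantization perturbations.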

How to train your neural ODE: the world of Jacobian and kinetic regularization

C Finlay, JH Jacobsen, L Nurbekyan… - … on machine learning, 2020 - proceedings.mlr.press
Training neural ODEs on large datasets has not been tractable due to the necessity of
allowing the adaptive numerical ODE solver to refine its step size to very small values. In …
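Kinetic regularization penalizes the integral of ‖f‖² along the learned trajectory, encouraging straight, slowly-varying flows that adaptive solvers can integrate with large steps. A sketch of the accumulation, assuming a fixed-step forward-Euler integrator as a stand-in for the adaptive solver (function names and the linear test flow are illustrative):

```python
import numpy as np

def integrate_with_kinetic_penalty(f, x0, t0=0.0, t1=1.0, n_steps=50):
    """Forward-Euler integration of dx/dt = f(t, x) that also accumulates
    the kinetic regularizer: the time-average of ||f||^2 along the path."""
    x = np.asarray(x0, dtype=float)
    dt = (t1 - t0) / n_steps
    kinetic = 0.0
    for i in range(n_steps):
        v = f(t0 + i * dt, x)
        kinetic += float(v @ v) * dt   # Riemann-sum estimate of the integral
        x = x + dt * v
    return x, kinetic / (t1 - t0)

# Linear flow dx/dt = -x: the trajectory contracts toward the origin.
xT, kin = integrate_with_kinetic_penalty(lambda t, x: -x, np.array([1.0, 1.0]))
```

In training, `kin` would be added to the task loss with a weight, so that gradient descent trades a small amount of fit for trajectories the solver finds cheap.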

Gradient norm aware minimization seeks first-order flatness and improves generalization

X Zhang, R Xu, H Yu, H Zou… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Recently, flat minima have been shown to be effective for improving generalization, and
sharpness-aware minimization (SAM) achieves state-of-the-art performance. Yet the current definition of …
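For context, the SAM baseline that this line of work builds on evaluates the gradient at an adversarially perturbed point and applies it at the original weights. A hedged sketch on a toy quadratic (the gradient-norm-aware variant in the paper modifies this objective; step sizes and the test loss here are illustrative):

```python
import numpy as np

def sam_step(loss_grad, w, lr=0.1, rho=0.05):
    """One sharpness-aware minimization step: ascend to the worst point
    within a radius-rho ball, then descend using that point's gradient."""
    g = loss_grad(w)
    g_norm = np.linalg.norm(g) + 1e-12
    w_adv = w + rho * g / g_norm       # inner ascent step
    return w - lr * loss_grad(w_adv)   # outer descent at the original w

# Toy loss L(w) = 0.5 ||w||^2, whose gradient is simply w.
grad = lambda w: w
w = np.array([2.0, -1.0])
for _ in range(50):
    w = sam_step(grad, w)
```

The perturbation radius `rho` controls how far the inner step probes for sharpness; first-order flatness measures such as the gradient norm refine what "worst nearby point" means.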

Generative data augmentation for commonsense reasoning

Y Yang, C Malaviya, J Fernandez… - arXiv preprint arXiv …, 2020 - arxiv.org
Recent advances in commonsense reasoning depend on large-scale human-annotated
training data to achieve peak performance. However, manual curation of training examples …

There are many consistent explanations of unlabeled data: Why you should average

B Athiwaratkun, M Finzi, P Izmailov… - arXiv preprint arXiv …, 2018 - arxiv.org
Presently, the most successful approaches to semi-supervised learning are based on
consistency regularization, whereby a model is trained to be robust to small perturbations of …
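The averaging argument is that late SGD iterates bounce around a flat region, so their mean lands nearer the center than any single iterate. A minimal sketch on a noisy quadratic, assuming plain tail averaging (the schedule and constants are illustrative, not the paper's training setup):

```python
import numpy as np

def average_weights(trajectory):
    """Average the weights collected along an SGD trajectory:
    the mean of late iterates is typically closer to the optimum
    than the final iterate when gradients are noisy."""
    return np.mean(np.stack(trajectory), axis=0)

# SGD on 0.5||w||^2 with noisy gradients; iterates jitter around w* = 0.
rng = np.random.default_rng(0)
w = np.array([3.0, -3.0])
trajectory = []
for _ in range(500):
    g = w + rng.normal(scale=1.0, size=2)   # noisy gradient
    w = w - 0.1 * g
    trajectory.append(w.copy())
w_avg = average_weights(trajectory[100:])   # discard the burn-in, average the tail
```

With consistency-regularized objectives, the same recipe applies to network weights rather than a two-dimensional vector; the averaged solution sits in the flatter part of the loss surface.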

The Hessian penalty: A weak prior for unsupervised disentanglement

W Peebles, J Peebles, JY Zhu, A Efros… - Computer Vision–ECCV …, 2020 - Springer
Existing disentanglement methods for deep generative models rely on hand-picked priors
and complex encoder-based architectures. In this paper, we propose the Hessian Penalty, a …
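The penalty targets the off-diagonal Hessian entries of the generator with respect to its latent code, estimated without any explicit Hessian: for Rademacher vectors v, the variance of vᵀHv isolates the off-diagonal terms, and vᵀHv itself comes from a second-order finite difference. A sketch under those assumptions (the toy scalar functions and sample counts are illustrative):

```python
import numpy as np

def hessian_penalty(f, x, eps=0.1, n_samples=200, seed=0):
    """Monte-Carlo estimate of an off-diagonal Hessian penalty:
    Var_v[v^T H v] over Rademacher v, where v^T H v is approximated
    by a second-order central finite difference of f."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=x.shape)
        samples.append((f(x + eps * v) - 2 * f(x) + f(x - eps * v)) / eps ** 2)
    return np.var(samples)

x = np.array([0.5, -0.3])
separable = lambda z: z[0] ** 2 + 3 * z[1] ** 2   # diagonal Hessian: penalty ~ 0
entangled = lambda z: z[0] * z[1]                  # off-diagonal Hessian: penalty > 0
```

For the separable function every probe returns the same value of vᵀHv, so the variance vanishes; any cross-term between latent dimensions makes the probes disagree and the penalty positive.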

Network quantization with element-wise gradient scaling

J Lee, D Kim, B Ham - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
Network quantization aims at reducing the bit-widths of weights and/or activations, which is
particularly important for implementing deep neural networks with limited hardware resources. Most …
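Because rounding has zero gradient almost everywhere, quantized training usually backpropagates through it with the straight-through estimator; element-wise gradient scaling instead rescales each gradient component using its sign and the per-element quantization error. The sketch below is a hedged approximation of that idea, not the paper's exact rule or implementation (the quantizer, `delta`, and all values are illustrative):

```python
import numpy as np

def quantize(x, n_levels=4):
    """Uniform quantization of x in [0, 1] to n_levels discrete values."""
    return np.round(x * (n_levels - 1)) / (n_levels - 1)

def ewgs_backward(grad_out, x, x_q, delta=0.2):
    """Backward pass sketch: the plain straight-through estimator copies
    grad_out unchanged through the rounding (delta=0); here each element
    is additionally scaled by its sign times the quantization error."""
    return grad_out * (1.0 + delta * np.sign(grad_out) * (x - x_q))

x = np.array([0.10, 0.40, 0.85])
x_q = quantize(x)                                   # forward: [0, 1/3, 1]
g = ewgs_backward(np.array([1.0, -1.0, 1.0]), x, x_q)
```

Setting `delta=0` recovers the straight-through estimator exactly, so the scaling can be read as a per-element correction on top of that baseline.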