On the optimization and generalization of multi-head attention

P Deora, R Ghaderi, H Taheri… - arXiv preprint arXiv …, 2023 - arxiv.org
The training and generalization dynamics of the Transformer's core mechanism, namely the
attention mechanism, remain under-explored. Moreover, existing analyses primarily focus on …

Emerging Directions in Bayesian Computation

S Winter, T Campbell, L Lin, S Srivastava… - Statistical …, 2024 - projecteuclid.org
Bayesian models are powerful tools for studying complex data, allowing the analyst to
encode rich hierarchical dependencies and leverage prior information. Most importantly …

Deep neural networks for parameterized homogenization in concurrent multiscale structural optimization

N Black, AR Najafi - Structural and Multidisciplinary Optimization, 2023 - Springer
Concurrent multiscale structural optimization is concerned with the improvement of
macroscale structural performance through the design of microscale architectures. The …

Stability and generalization analysis of gradient methods for shallow neural networks

Y Lei, R Jin, Y Ying - Advances in Neural Information …, 2022 - proceedings.neurips.cc
While significant theoretical progress has been achieved, unveiling the generalization
mystery of overparameterized neural networks remains largely elusive. In this paper, we …

Learning trajectories are generalization indicators

J Fu, Z Zhang, D Yin, Y Lu… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper explores the connection between learning trajectories of Deep Neural Networks
(DNNs) and their generalization capabilities when optimized using (stochastic) gradient …

Machine learning and the future of Bayesian computation

S Winter, T Campbell, L Lin, S Srivastava… - arXiv preprint arXiv …, 2023 - arxiv.org
Bayesian models are a powerful tool for studying complex data, allowing the analyst to
encode rich hierarchical dependencies and leverage prior information. Most importantly …

Toward better PAC-Bayes bounds for uniformly stable algorithms

S Zhou, Y Lei, A Kabán - Advances in Neural Information …, 2023 - proceedings.neurips.cc
We give sharper bounds for uniformly stable randomized algorithms in a PAC-Bayesian
framework, which improve the existing results by up to a factor of $\sqrt{n}$ (ignoring a log …

Generalization error bounds for iterative learning algorithms with bounded updates

J Fu, N Zheng - arXiv preprint arXiv:2309.05077, 2023 - arxiv.org
This paper explores the generalization characteristics of iterative learning algorithms with
bounded updates for non-convex loss functions, employing information-theoretic …

Towards Stability and Generalization Bounds in Decentralized Minibatch Stochastic Gradient Descent

J Wang, H Chen - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Decentralized Stochastic Gradient Descent (D-SGD) is a communication-efficient approach
for learning from large, distributed datasets. Inspired by parallel …

Sharper Bounds for Uniformly Stable Algorithms with Stationary Mixing Process

S Fu, Y Lei, Q Cao, X Tian, D Tao - The Eleventh International …, 2023 - openreview.net
Generalization analysis of learning algorithms typically builds on the critical assumption that
training examples are independently and identically distributed, an assumption often violated in …