On the optimization and generalization of multi-head attention
The training and generalization dynamics of the Transformer's core mechanism, namely the
attention mechanism, remain under-explored. Moreover, existing analyses primarily focus on …
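For reference, a minimal sketch of the scaled dot-product multi-head attention that this line of analysis studies, assuming the standard Transformer formulation (shapes and names here are illustrative, not the paper's notation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Standard multi-head self-attention on a sequence X of shape (n, d).

    Wq, Wk, Wv: (d, d) input projections; Wo: (d, d) output projection.
    Assumes num_heads divides d.
    """
    n, d = X.shape
    dh = d // num_heads  # per-head dimension
    # Project and split into heads: (num_heads, n, dh).
    Q = (X @ Wq).reshape(n, num_heads, dh).transpose(1, 0, 2)
    K = (X @ Wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    V = (X @ Wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    # Scaled dot-product attention, computed per head.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, n, n)
    out = softmax(scores) @ V                        # (heads, n, dh)
    # Concatenate heads and apply the output projection.
    return out.transpose(1, 0, 2).reshape(n, d) @ Wo
```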
Emerging Directions in Bayesian Computation
Bayesian models are powerful tools for studying complex data, allowing the analyst to
encode rich hierarchical dependencies and leverage prior information. Most importantly …
Deep neural networks for parameterized homogenization in concurrent multiscale structural optimization
Concurrent multiscale structural optimization is concerned with the improvement of
macroscale structural performance through the design of microscale architectures. The …
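A common way such a pipeline is set up is with a small network that maps microscale design parameters to homogenized material properties, replacing repeated finite-element homogenization inside the macroscale optimizer. A hedged sketch; the architecture, parameter names, and 21-component Voigt output below are assumptions for illustration, not the paper's model:

```python
import torch
import torch.nn as nn

class HomogenizationSurrogate(nn.Module):
    """Maps microstructure parameters to homogenized stiffness.

    Output: the 21 independent entries of a symmetric 6x6 Voigt matrix.
    """
    def __init__(self, n_params: int = 4, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_params, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 21),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# One training step against precomputed homogenization data
# (placeholder tensors stand in for real FE results).
model = HomogenizationSurrogate()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
params = torch.rand(256, 4)      # e.g. lattice thicknesses / volume fractions
targets = torch.randn(256, 21)   # stand-in for FE-homogenized stiffness entries
loss = nn.functional.mse_loss(model(params), targets)
loss.backward()
opt.step()
```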
Stability and generalization analysis of gradient methods for shallow neural networks
While significant theoretical progress has been achieved, the generalization
mystery of overparameterized neural networks remains largely unresolved. In this paper, we …
Learning trajectories are generalization indicators
This paper explores the connection between learning trajectories of Deep Neural Networks
(DNNs) and their generalization capabilities when optimized using (stochastic) gradient …
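A learning trajectory here is simply the sequence of parameter iterates produced by (stochastic) gradient descent. A minimal sketch of recording one; the quadratic objective is a placeholder, not the paper's setting:

```python
import numpy as np

def gd_trajectory(w0, grad, lr=0.1, steps=100):
    """Run plain gradient descent and return the full iterate sequence
    (w_0, w_1, ..., w_T), i.e. the learning trajectory."""
    traj = [np.array(w0, dtype=float)]
    for _ in range(steps):
        traj.append(traj[-1] - lr * grad(traj[-1]))
    return np.stack(traj)

# Toy example: loss 0.5 * ||w||^2, whose gradient is w itself.
trajectory = gd_trajectory(w0=[1.0, -2.0], grad=lambda w: w)
# Trajectory statistics such as total path length are the kind of
# quantity such analyses relate to generalization.
path_length = np.linalg.norm(np.diff(trajectory, axis=0), axis=1).sum()
print(trajectory.shape, path_length)
```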
Machine learning and the future of Bayesian computation
Bayesian models are a powerful tool for studying complex data, allowing the analyst to
encode rich hierarchical dependencies and leverage prior information. Most importantly …
Toward better PAC-Bayes bounds for uniformly stable algorithms
We give sharper bounds for uniformly stable randomized algorithms in a PAC-Bayesian
framework, which improve the existing results by up to a factor of $\sqrt{n}$ (ignoring a log …
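For orientation, the standard notion of uniform stability such bounds are stated in terms of (our notation, not the paper's): a learning algorithm $A$ is $\beta$-uniformly stable if, for all size-$n$ datasets $S, S'$ differing in a single example,

```latex
% Uniform stability: neighboring samples S, S' change the loss at any
% test point z by at most beta (in expectation over A's randomness,
% for randomized algorithms).
\[
\sup_{z}\;\bigl|\ell(A(S), z) - \ell(A(S'), z)\bigr| \;\le\; \beta .
\]
% Stability-based bounds control the generalization gap in terms of beta;
% for many regularized ERM algorithms, beta = O(1/n).
```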
Generalization error bounds for iterative learning algorithms with bounded updates
J Fu, N Zheng - arXiv preprint arXiv:2309.05077, 2023 - arxiv.org
This paper explores the generalization characteristics of iterative learning algorithms with
bounded updates for non-convex loss functions, employing information-theoretic …
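The two ingredients named in the title can be made concrete: a bounded-update condition on the iterates, and the classical information-theoretic generalization bound of Xu and Raginsky (2017) that such analyses typically build on. The notation below is ours, for illustration:

```latex
% Bounded updates: each iteration moves the parameters by at most Delta_t,
\[
\|W_{t+1} - W_t\| \;\le\; \Delta_t, \qquad t = 1, \dots, T,
\]
% and for a sigma-sub-Gaussian loss, the mutual information I(S; W_T)
% between the n-point sample S and the output W_T controls the expected
% generalization gap:
\[
\bigl|\mathbb{E}[\mathrm{gen}(S, W_T)]\bigr|
  \;\le\; \sqrt{\frac{2\sigma^2\, I(S; W_T)}{n}} .
\]
```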
Towards Stability and Generalization Bounds in Decentralized Minibatch Stochastic Gradient Descent
Decentralized Stochastic Gradient Descent (D-SGD) is a communication-efficient
approach for learning from large, distributed datasets. Inspired by parallel …
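The D-SGD update this line of analysis studies combines a gossip-averaging step over a mixing matrix with a local minibatch gradient step. A minimal sketch; the ring topology and quadratic objectives are placeholders:

```python
import numpy as np

def dsgd_step(W, P, grads, lr):
    """One round of decentralized minibatch SGD.

    W:     (m, d) array, row i holds node i's current parameters.
    P:     (m, m) doubly stochastic mixing matrix (P[i, j] > 0 iff
           nodes i and j are neighbors); gossip averaging is P @ W.
    grads: (m, d) array of local minibatch gradients, one row per node.
    """
    return P @ W - lr * grads

# Toy run: 4 nodes on a ring, node i minimizing 0.5 * ||w - target_i||^2.
m, d, lr = 4, 3, 0.1
C = np.roll(np.eye(m), 1, axis=0)          # cyclic neighbor structure
P = 0.5 * np.eye(m) + 0.25 * (C + C.T)     # doubly stochastic ring mixing
targets = np.random.randn(m, d)
W = np.zeros((m, d))
for _ in range(200):
    W = dsgd_step(W, P, grads=W - targets, lr=lr)
# The node average converges to the minimizer of the average objective.
print(np.allclose(W.mean(axis=0), targets.mean(axis=0)))
```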
Sharper Bounds for Uniformly Stable Algorithms with Stationary Mixing Process
Generalization analysis of learning algorithms typically builds on the critical assumption that
training examples are independently and identically distributed, an assumption often violated in …
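A common way to relax the i.i.d. assumption, and the one the title points to, is to model the data as a stationary mixing process, where dependence decays with the gap between observations. One standard formulation of the $\beta$-mixing coefficient, in our notation:

```latex
% For a stationary sequence (Z_t), the beta-mixing coefficient at lag k is
\[
\beta(k) \;=\; \mathbb{E}\Bigl[\,
  \sup_{A \in \sigma(Z_{t+k}, Z_{t+k+1}, \dots)}
  \bigl|\,\mathbb{P}\bigl(A \mid \sigma(Z_1, \dots, Z_t)\bigr)
          - \mathbb{P}(A)\,\bigr|\Bigr],
\]
% and the process is beta-mixing if beta(k) -> 0 as k -> infinity;
% i.i.d. data correspond to beta(k) = 0 for all k >= 1.
```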