Signal propagation in transformers: Theoretical perspectives and the role of rank collapse

L Noci, S Anagnostidis, L Biggio… - Advances in …, 2022 - proceedings.neurips.cc
Transformers have achieved remarkable success in several domains, ranging from natural
language processing to computer vision. Nevertheless, it has been recently shown that …
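For reference, rank collapse refers to the token representations $X \in \mathbb{R}^{n \times d}$ converging toward a rank-one matrix as depth grows. One standard way to quantify it (an illustrative definition, not necessarily this paper's exact metric) is the residual to the best rank-one approximation, which by Eckart–Young equals

$$\mathrm{res}(X) \;=\; \min_{\mathrm{rank}(Y) \le 1} \|X - Y\|_F \;=\; \Big(\sum_{i \ge 2} \sigma_i(X)^2\Big)^{1/2},$$

where $\sigma_i(X)$ are the singular values of $X$; rank collapse corresponds to $\mathrm{res}(X) \to 0$.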

Evaluation of classification models in limited data scenarios with application to additive manufacturing

F Pourkamali-Anaraki, T Nasrin, RE Jensen… - … Applications of Artificial …, 2023 - Elsevier
This paper presents a novel framework that enables the generation of unbiased estimates
for test loss using fewer labeled samples, effectively evaluating the predictive performance …

Heavy-tailed class imbalance and why Adam outperforms gradient descent on language models

F Kunstner, R Yadav, A Milligan, M Schmidt… - arXiv preprint arXiv …, 2024 - arxiv.org
Adam has been shown to outperform gradient descent in optimizing large language
transformers empirically, and by a larger margin than on other tasks, but it is unclear why this …

MetaFL: Privacy-preserving User Authentication in Virtual Reality with Federated Learning

R Cheng, Y Wu, A Kundu, H Latapie, M Lee… - Proceedings of the …, 2024 - dl.acm.org
The increasing popularity of virtual reality (VR) has stressed the importance of authenticating
VR users while preserving their privacy. Behavioral biometrics, owing to their robustness …

An adaptive stochastic gradient method with non-negative Gauss-Newton stepsizes

A Orvieto, L Xiao - arXiv preprint arXiv:2407.04358, 2024 - arxiv.org
We consider the problem of minimizing the average of a large number of smooth but
possibly non-convex functions. In the context of most machine learning applications, each …
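For context, the setup described here is the standard finite-sum (empirical risk) objective, sketched below in generic notation rather than the paper's own:

$$\min_{x \in \mathbb{R}^d} \; F(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),$$

where each component $f_i$ is smooth but possibly non-convex and $n$ is large, so methods typically access only stochastic gradients $\nabla f_i(x)$.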

Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning

L Noci, A Meterez, T Hofmann, A Orvieto - arXiv preprint arXiv:2402.17457, 2024 - arxiv.org
Recently, there has been growing evidence that if the width and depth of a neural network
are scaled toward the so-called rich feature learning limit ($\mu$P and its depth extension) …

Initial guessing bias: How untrained networks favor some classes

E Francazi, A Lucchi, M Baity-Jesi - arXiv preprint arXiv:2306.00809, 2023 - arxiv.org
Understanding and controlling biasing effects in neural networks is crucial for ensuring
accurate and fair model performance. In the context of classification problems, we provide a …

Deconstructing the Goldilocks Zone of Neural Network Initialization

A Vysogorets, A Dawid, J Kempe - arXiv preprint arXiv:2402.03579, 2024 - arxiv.org
The second-order properties of the training loss have a massive impact on the optimization
dynamics of deep learning models. Fort & Scherlis (2019) discovered that a high positive …

Super Consistency of Neural Network Landscapes and Learning Rate Transfer

L Noci, A Meterez, T Hofmann… - The Thirty-eighth Annual …, 2024 - openreview.net
Recently, there has been growing evidence that if the width and depth of a neural network
are scaled toward the so-called rich feature learning limit ($\mu$P and its depth extension) …

FOSI: Hybrid First and Second Order Optimization

H Sivan, M Gabel, A Schuster - arXiv preprint arXiv:2302.08484, 2023 - arxiv.org
Though second-order optimization methods are highly effective, popular approaches in
machine learning such as SGD and Adam use only first-order information due to the difficulty …
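To make the first/second-order split concrete, here is a minimal toy sketch of one such hybrid pattern on a quadratic: a Newton step restricted to a few extreme-curvature directions, plus a plain gradient step on the orthogonal complement. The problem instance and all names are illustrative assumptions; this is not claimed to be FOSI's exact algorithm.

```python
import numpy as np

# Toy hybrid first/second-order update on f(x) = 0.5 x^T A x - b^T x.
# Generic illustration of "Newton on a few extreme curvature directions,
# gradient descent on the rest"; NOT FOSI's exact scheme.

rng = np.random.default_rng(0)
d, k, lr, steps = 20, 3, 0.1, 200

# Random positive-definite quadratic (assumed problem, for illustration).
M = rng.standard_normal((d, d))
A = M @ M.T + np.eye(d)
b = rng.standard_normal(d)
grad = lambda x: A @ x - b

# Top-k eigenpairs of the (constant) Hessian; a practical method would
# estimate these with Lanczos iterations on Hessian-vector products.
eigvals, eigvecs = np.linalg.eigh(A)
V = eigvecs[:, -k:]        # columns span the high-curvature subspace
Lam = eigvals[-k:]         # corresponding curvatures

x = np.zeros(d)
for _ in range(steps):
    g = grad(x)
    g_sub = V.T @ g                  # gradient component in the subspace
    newton = V @ (g_sub / Lam)       # Newton step within the subspace
    first_order = g - V @ g_sub      # component in the complement
    x -= newton + lr * first_order   # hybrid update

print("distance to optimum:", np.linalg.norm(x - np.linalg.solve(A, b)))
```

On this toy quadratic the subspace step converges in one iteration along the top-k directions, while the complement follows ordinary gradient descent; the appeal of such hybrids is getting curvature benefits while paying second-order cost only in a k-dimensional subspace.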