How to DP-fy ML: A practical guide to machine learning with differential privacy
Machine Learning (ML) models are ubiquitous in real-world applications and are a constant focus of research. Modern ML models have become more complex, deeper, and …
A state-of-the-art survey on solving non-IID data in federated learning
Federated Learning (FL), proposed in recent years, has received significant attention from researchers because it enables multiple clients to cooperatively train global models without …
Hidden progress in deep learning: SGD learns parities near the computational limit
There is mounting evidence of emergent phenomena in the capabilities of deep learning methods as we scale up datasets, model sizes, and training times. While there are some …
Adan: Adaptive Nesterov momentum algorithm for faster optimizing deep models
In deep learning, different kinds of deep networks typically need different optimizers, which have to be chosen after multiple trials, making the training process inefficient. To relieve this …
Personalized cross-silo federated learning on non-IID data
Non-IID data present a tough challenge for federated learning. In this paper, we explore a novel idea of facilitating pairwise collaborations between clients with similar data. We …
Seeing out of the box: End-to-end pre-training for vision-language representation learning
We study the joint learning of Convolutional Neural Network (CNN) and Transformer for vision-language pre-training (VLPT), which aims to learn cross-modal alignments from …
Scan and snap: Understanding training dynamics and token composition in 1-layer transformer
The Transformer architecture has shown impressive performance in multiple research domains and has become the backbone of many neural network models. However, there is limited …
Sophia: A scalable stochastic second-order optimizer for language model pre-training
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training …
Vision transformers provably learn spatial structure
Vision Transformers (ViTs) have recently achieved comparable or superior performance to Convolutional Neural Networks (CNNs) in computer vision. This empirical …
Towards theoretically understanding why SGD generalizes better than Adam in deep learning
It is not yet clear why Adam-like adaptive gradient algorithms suffer from worse generalization performance than SGD despite their faster training speed. This work aims to …