How to DP-fy ML: A practical guide to machine learning with differential privacy

N Ponomareva, H Hazimeh, A Kurakin, Z Xu… - Journal of Artificial …, 2023 - jair.org
Machine Learning (ML) models are ubiquitous in real-world applications and are a
constant focus of research. Modern ML models have become more complex, deeper, and …
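
As a quick illustration of the core recipe such guides revolve around, here is a minimal DP-SGD sketch (per-example gradient clipping plus calibrated Gaussian noise); the function and parameter names (dp_sgd_step, clip_norm, noise_multiplier) are illustrative, not taken from the paper.

    import numpy as np

    def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                    noise_multiplier=1.1, rng=None):
        rng = rng or np.random.default_rng(0)
        # Clip each example's gradient to bound its influence (sensitivity).
        clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                   for g in per_example_grads]
        # Sum, add Gaussian noise calibrated to the clipping bound, average.
        noisy = np.sum(clipped, axis=0) + rng.normal(
            scale=noise_multiplier * clip_norm, size=params.shape)
        return params - lr * noisy / len(per_example_grads)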

A state-of-the-art survey on solving non-IID data in federated learning

X Ma, J Zhu, Z Lin, S Chen, Y Qin - Future Generation Computer Systems, 2022 - Elsevier
Federated Learning (FL), proposed in recent years, has received significant attention from
researchers because it enables multiple clients to cooperatively train global models without …
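
For reference, the FedAvg-style aggregation that the survey's non-IID methods build on can be sketched as a data-size-weighted average of client models; this is a generic baseline sketch, not any specific method from the survey.

    import numpy as np

    def fedavg(client_weights, client_sizes):
        # Global model = average of client models, weighted by local data size.
        total = sum(client_sizes)
        return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

    # Usage: three clients with unequal (possibly non-IID) local datasets.
    clients = [np.array([1.0, 2.0]), np.array([0.5, 1.5]), np.array([2.0, 0.0])]
    global_model = fedavg(clients, [100, 50, 10])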

Hidden progress in deep learning: SGD learns parities near the computational limit

B Barak, B Edelman, S Goel… - Advances in …, 2022 - proceedings.neurips.cc
There is mounting evidence of emergent phenomena in the capabilities of deep learning
methods as we scale up datasets, model sizes, and training times. While there are some …
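
A concrete instance of the task in the title, as commonly defined in this line of work: the (n, k)-sparse parity problem, where the label is the product of k hidden coordinates of a uniform ±1 input, so no single coordinate correlates with the label. The sketch below is my rendering of that standard setup.

    import numpy as np

    def sample_sparse_parity(n=50, k=3, batch=8, rng=None):
        rng = rng or np.random.default_rng(0)
        support = rng.choice(n, size=k, replace=False)   # hidden subset S
        x = rng.choice([-1.0, 1.0], size=(batch, n))     # uniform hypercube input
        y = np.prod(x[:, support], axis=1)               # y = prod_{i in S} x_i
        return x, y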

Adan: Adaptive Nesterov momentum algorithm for faster optimizing deep models

X Xie, P Zhou, H Li, Z Lin, S Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
In deep learning, different kinds of deep networks typically need different optimizers, which
have to be chosen after multiple trials, making the training process inefficient. To relieve this …
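
A rough sketch of the Adan-style update as I recall it from the paper: a first moment of gradients, a moment of gradient differences (the Nesterov-style correction), and a second moment of the corrected gradient; the coefficient conventions and defaults below are a paraphrase and may differ in detail.

    import numpy as np

    def adan_step(theta, g, g_prev, m, v, n, lr=1e-3,
                  b1=0.02, b2=0.08, b3=0.01, eps=1e-8):
        # First moment of gradients.
        m = (1 - b1) * m + b1 * g
        # Moment of gradient differences (Nesterov-style look-ahead correction).
        v = (1 - b2) * v + b2 * (g - g_prev)
        # Second moment of the corrected gradient, used for rescaling.
        u = g + (1 - b2) * (g - g_prev)
        n = (1 - b3) * n + b3 * u * u
        theta = theta - lr * (m + (1 - b2) * v) / (np.sqrt(n) + eps)
        return theta, m, v, n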

Personalized cross-silo federated learning on non-IID data

Y Huang, L Chu, Z Zhou, L Wang, J Liu, J Pei… - Proceedings of the …, 2021 - ojs.aaai.org
Non-IID data present a tough challenge for federated learning. In this paper, we explore a
novel idea of facilitating pairwise collaborations between clients with similar data. We …
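
One way to picture pairwise collaboration is a similarity-weighted personalized aggregation, where each client receives a convex combination of all clients' models; the Gaussian kernel below is an illustrative choice, not the paper's exact rule.

    import numpy as np

    def personalized_aggregate(models, tau=1.0):
        # Client i's personalized model is a convex combination of all models,
        # weighted by pairwise similarity (a normalized Gaussian kernel on
        # parameter distance; the exact kernel here is an assumption).
        M = np.stack(models)
        d2 = ((M[:, None, :] - M[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / tau)
        w = w / w.sum(axis=1, keepdims=True)
        return w @ M  # row i = personalized model for client i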

Seeing out of the box: End-to-end pre-training for vision-language representation learning

Z Huang, Z Zeng, Y Huang, B Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com
We study the joint learning of a Convolutional Neural Network (CNN) and a Transformer for
vision-language pre-training (VLPT), which aims to learn cross-modal alignments from …
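
Cross-modal alignment objectives in VLPT are often contrastive; the symmetric InfoNCE-style loss below is a generic illustration of such alignment, not necessarily this paper's objective.

    import numpy as np

    def log_softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

    def contrastive_alignment_loss(img_emb, txt_emb, temp=0.07):
        # Normalize both modalities and score every image-text pair in the batch.
        img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
        txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
        logits = img @ txt.T / temp
        idx = np.arange(len(img))
        # Matched pairs lie on the diagonal; average both retrieval directions.
        return -(log_softmax(logits)[idx, idx].mean()
                 + log_softmax(logits.T)[idx, idx].mean()) / 2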

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in Neural …, 2023 - proceedings.neurips.cc
The Transformer architecture has shown impressive performance across multiple research
domains and has become the backbone of many neural network models. However, there is limited …
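
For concreteness, the object of study, a single-head self-attention layer computing softmax(QK^T/sqrt(d))V, can be written in a few lines; the weight names are illustrative.

    import numpy as np

    def one_layer_attention(X, Wq, Wk, Wv):
        # Single-head self-attention: softmax(Q K^T / sqrt(d)) V.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[1])
        A = np.exp(scores - scores.max(axis=1, keepdims=True))
        A = A / A.sum(axis=1, keepdims=True)
        return A @ V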

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
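
My paraphrase of the Sophia-style update: an EMA of gradients preconditioned by a periodically refreshed diagonal Hessian estimate, with element-wise clipping; the defaults and names below are assumptions, not the paper's exact pseudocode.

    import numpy as np

    def sophia_step(theta, g, m, h, lr=1e-4, b1=0.96,
                    gamma=0.01, rho=1.0, eps=1e-12):
        # EMA of gradients; h is a periodically refreshed diagonal Hessian
        # estimate (e.g. a Hutchinson or Gauss-Newton-Bartlett estimator).
        m = b1 * m + (1 - b1) * g
        # Precondition by curvature, then clip element-wise so steps stay
        # bounded where the curvature estimate is tiny or unreliable.
        update = np.clip(m / np.maximum(gamma * h, eps), -rho, rho)
        return theta - lr * update, m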

Vision transformers provably learn spatial structure

S Jelassi, M Sander, Y Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Vision Transformers (ViTs) have recently achieved comparable or superior
performance to Convolutional Neural Networks (CNNs) in computer vision. This empirical …

Towards theoretically understanding why SGD generalizes better than Adam in deep learning

P Zhou, J Feng, C Ma, C Xiong… - Advances in Neural …, 2020 - proceedings.neurips.cc
It is not yet clear why Adam-like adaptive gradient algorithms suffer from worse
generalization performance than SGD despite their faster training speed. This work aims to …
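
To ground the comparison, here are textbook forms of the two updates: SGD applies one global step size, while Adam rescales each coordinate by a second-moment estimate. This is standard material, not the paper's analysis.

    import numpy as np

    def sgd_step(theta, g, buf, lr=0.1, mu=0.9):
        # Heavy-ball SGD: one global step size shared by every coordinate.
        buf = mu * buf + g
        return theta - lr * buf, buf

    def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # Adam: per-coordinate step sizes via second-moment rescaling, the
        # adaptivity whose generalization behavior the paper analyzes.
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
        return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v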