How to DP-fy ML: A practical guide to machine learning with differential privacy

N Ponomareva, H Hazimeh, A Kurakin, Z Xu… - Journal of Artificial …, 2023 - jair.org
Abstract Machine Learning (ML) models are ubiquitous in real-world applications and are a
constant focus of research. Modern ML models have become more complex, deeper, and …
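For reference, the standard (ε, δ)-differential-privacy guarantee that such guides build on (a textbook formulation, not quoted from this paper) can be stated as: a randomized mechanism $M$ is $(\varepsilon, \delta)$-differentially private if, for all datasets $D, D'$ differing in one record and all measurable sets of outputs $S$,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta .
```

Smaller $\varepsilon$ and $\delta$ mean the mechanism's output reveals less about any single training record.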

On provable copyright protection for generative models

N Vyas, SM Kakade, B Barak - International Conference on …, 2023 - proceedings.mlr.press
There is a growing concern that learned conditional generative models may output samples
that are substantially similar to some copyrighted data $C$ that was in their training set. We …

Privacy side channels in machine learning systems

E Debenedetti, G Severi, N Carlini… - 33rd USENIX Security …, 2024 - usenix.org
Most current approaches for protecting privacy in machine learning (ML) assume that
models exist in a vacuum. Yet, in reality, these models are part of larger systems that include …

Differentially private natural language models: Recent advances and future directions

L Hu, I Habernal, L Shen, D Wang - arXiv preprint arXiv:2301.09112, 2023 - arxiv.org
Recent developments in deep learning have led to great success in various natural
language processing (NLP) tasks. However, these applications may involve data that …

Can Public Large Language Models Help Private Cross-device Federated Learning?

B Wang, YJ Zhang, Y Cao, B Li, HB McMahan… - arXiv preprint arXiv …, 2023 - arxiv.org
We study (differentially) private federated learning (FL) of language models. The language
models in cross-device FL are relatively small and can be trained with meaningful formal …

ViP: A differentially private foundation model for computer vision

Y Yu, M Sanjabi, Y Ma, K Chaudhuri, C Guo - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of
foundation models trained on internet-scale data. On the flip side, the uncurated nature of …

TextFusion: Privacy-preserving pre-trained model inference via token fusion

X Zhou, J Lu, T Gui, R Ma, Z Fei, Y Wang… - Proceedings of the …, 2022 - aclanthology.org
Recently, more and more pre-trained language models have been released as cloud services. This
allows users who lack computing resources to perform inference with a powerful model by …

Privacy-preserving models for legal natural language processing

Y Yin, I Habernal - arXiv preprint arXiv:2211.02956, 2022 - arxiv.org
Pre-training large transformer models with in-domain data improves domain adaptation and
helps gain performance on the domain-specific downstream tasks. However, sharing …

Purifying large language models by ensembling a small language model

T Li, Q Liu, T Pang, C Du, Q Guo, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The emerging success of large language models (LLMs) heavily relies on collecting
abundant training data from external (untrusted) sources. Despite substantial efforts devoted …

Private ad modeling with DP-SGD

C Denison, B Ghazi, P Kamath, R Kumar… - arXiv preprint arXiv …, 2022 - arxiv.org
A well-known algorithm in privacy-preserving ML is differentially private stochastic gradient
descent (DP-SGD). While this algorithm has been evaluated on text and image data, it has …
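The DP-SGD update named above is well known: clip each per-example gradient to a fixed norm, average, and add Gaussian noise calibrated to the clipping bound. A minimal NumPy sketch of one step (function name, parameter names, and the toy gradients are illustrative, not from the paper):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD step: clip each per-example gradient to clip_norm,
    add Gaussian noise with std noise_multiplier * clip_norm to the sum,
    then take an averaged gradient step."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    batch_size = len(clipped)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / batch_size
    return params - lr * noisy_mean

# Toy usage: two per-example gradients, one step.
rng = np.random.default_rng(0)
params = np.zeros(3)
grads = [np.array([3.0, 4.0, 0.0]), np.array([0.0, 0.0, 10.0])]
new_params = dp_sgd_step(params, grads, clip_norm=1.0,
                         noise_multiplier=1.0, lr=0.1, rng=rng)
```

The privacy guarantee itself comes from accounting over many such steps (e.g. with a moments accountant), which this sketch does not include.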