How to DP-fy ML: A practical guide to machine learning with differential privacy
Machine Learning (ML) models are ubiquitous in real-world applications and are a
constant focus of research. Modern ML models have become more complex, deeper, and …
On provable copyright protection for generative models
There is a growing concern that learned conditional generative models may output samples
that are substantially similar to some copyrighted data $ C $ that was in their training set. We …
Privacy side channels in machine learning systems
Most current approaches for protecting privacy in machine learning (ML) assume that
models exist in a vacuum. Yet, in reality, these models are part of larger systems that include …
Differentially private natural language models: Recent advances and future directions
Recent developments in deep learning have led to great success in various natural
language processing (NLP) tasks. However, these applications may involve data that …
Can Public Large Language Models Help Private Cross-device Federated Learning?
We study (differentially) private federated learning (FL) of language models. The language
models in cross-device FL are relatively small, which can be trained with meaningful formal …
ViP: A differentially private foundation model for computer vision
Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of
foundation models trained on internet-scale data. On the flip side, the uncurated nature of …
TextFusion: Privacy-preserving pre-trained model inference via token fusion
Recently, more and more pre-trained language models are being released as cloud services, which
allow users who lack computing resources to perform inference with a powerful model by …
Privacy-preserving models for legal natural language processing
Y Yin, I Habernal - arXiv preprint arXiv:2211.02956, 2022 - arxiv.org
Pre-training large transformer models with in-domain data improves domain adaptation and
helps gain performance on the domain-specific downstream tasks. However, sharing …
Purifying large language models by ensembling a small language model
The emerging success of large language models (LLMs) heavily relies on collecting
abundant training data from external (untrusted) sources. Despite substantial efforts devoted …
Private ad modeling with DP-SGD
A well-known algorithm in privacy-preserving ML is differentially private stochastic gradient
descent (DP-SGD). While this algorithm has been evaluated on text and image data, it has …
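The snippet above names DP-SGD. As a rough illustration only (not the cited paper's implementation), one DP-SGD step on a logistic-regression loss can be sketched in NumPy: clip each per-example gradient to an L2 bound, then add Gaussian noise calibrated to that bound. The names `dp_sgd_step`, `clip_norm`, and `noise_mult` are illustrative hyperparameter choices, not taken from the paper:

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD step for logistic regression (illustrative sketch):
    per-example gradient clipping followed by Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    # Per-example gradients of the logistic loss: (sigmoid(Xw) - y) * x_i.
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    per_example_grads = (preds - y)[:, None] * X          # shape (n, d)
    # Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Sum the clipped gradients, add noise scaled to the clipping bound,
    # then average over the batch before taking a gradient step.
    noise = rng.normal(0.0, noise_mult * clip_norm, size=w.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / len(X)
    return w - lr * noisy_grad
```

The noise standard deviation is tied to `clip_norm` because clipping bounds each example's contribution, which is what makes the Gaussian mechanism's sensitivity analysis go through; the overall privacy budget would then be tracked across steps with a moments/RDP accountant, omitted here.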