Stealing part of a production language model
We introduce the first model-stealing attack that extracts precise, nontrivial information from
black-box production language models like OpenAI's ChatGPT or Google's PaLM-2 …
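The snippet cuts off before the method, so as a hedged illustration only: the core idea reported for this attack is that full logit vectors returned by an API are linear images of the model's final hidden state, so their numerical rank exposes the hidden width. A minimal self-contained sketch with a toy stand-in for the black box (all names and sizes below are assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN = 1000, 64  # HIDDEN is unknown from the attacker's view

# Toy stand-in for a production LM head: logits = W @ h(prompt).
W = rng.normal(size=(VOCAB, HIDDEN))

def query_logits() -> np.ndarray:
    """Simulates one API call returning a full logit vector."""
    h = rng.normal(size=HIDDEN)  # unobserved final hidden state
    return W @ h

# Collect more logit vectors than the (unknown) hidden width.
L = np.stack([query_logits() for _ in range(200)])  # shape (200, VOCAB)
s = np.linalg.svd(L, compute_uv=False)

# Every logit vector lies in the column space of W, so the singular
# values collapse after index HIDDEN: the numerical rank reveals the
# hidden width of the black-box model.
print(int(np.sum(s > 1e-6 * s[0])))  # prints 64
```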
Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
S Feng, F Tramèr - arXiv preprint arXiv:2404.00473, 2024 - arxiv.org
Practitioners commonly download pretrained machine learning models from open
repositories and finetune them to fit specific applications. We show that this practice …
A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses
Large Language Models (LLMs) are vulnerable to jailbreaks – methods
to elicit harmful or generally impermissible outputs. Safety measures are developed and …
Sequencing the Neurome: Towards Scalable Exact Parameter Reconstruction of Black-Box Neural Networks
Inferring the exact parameters of a neural network with only query access is an NP-hard
problem, with few practical existing algorithms. Solutions would have major implications for …
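The snippet stops before any algorithm, so the sketch below is a generic illustration of the query-only setting rather than this paper's method: exact-reconstruction approaches for ReLU networks typically pin down the inputs where a single neuron switches on, here by bisection along a line (the toy one-neuron model and all names are assumptions):

```python
# Toy one-neuron black box: f(x) = relu(w*x + b). The attacker sees
# only outputs f(x); the goal is the input where the ReLU switches.
w, b = 1.7, -0.85
f = lambda x: max(w * x + b, 0.0)

def find_kink(lo: float, hi: float, iters: int = 60) -> float:
    """Bisect for the boundary between f's zero and active regions.

    Assumes f is 0 at lo and positive at hi (true here since w > 0),
    so exactly one switch point lies in [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid  # active side: boundary lies to the left
        else:
            lo = mid  # inactive side: boundary lies to the right
    return 0.5 * (lo + hi)

x_star = find_kink(-10.0, 10.0)
print(x_star, -b / w)  # recovered kink ~= true boundary 0.5
```

With the boundary located, finite differences on the active side recover the neuron's weight and bias up to scale, the per-neuron primitive that reconstructions of this kind repeat and compose.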