Backdoor attacks on pre-trained models by layerwise weight poisoning

L Li, D Song, X Li, J Zeng, R Ma, X Qiu - arXiv preprint arXiv:2108.13888, 2021 - arxiv.org
Pre-Trained Models (PTMs) have been widely applied and were recently shown to be vulnerable to backdoor attacks: the released pre-trained weights can be maliciously poisoned with certain triggers. When the triggers are activated, even the fine-tuned model will predict pre-defined labels, causing a security threat. Backdoors planted by existing poisoning methods can be erased by changing hyper-parameters during fine-tuning, or detected by searching for the triggers. In this paper, we propose a stronger weight-poisoning attack method that introduces a layerwise weight-poisoning strategy to plant deeper backdoors; we also introduce a combinatorial trigger that cannot be easily detected. Experiments on text classification tasks show that previous defense methods cannot resist our weight-poisoning method, which indicates that our method can be widely applied and may provide hints for future model robustness studies.
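A minimal sketch of what a layerwise weight-poisoning objective could look like, assuming it means supervising intermediate encoder layers with the attacker's target label so the backdoor is not confined to the top layers. This is not the authors' released implementation; `ToyEncoder`, `layerwise_poison_loss`, and all shapes and hyper-parameters are hypothetical stand-ins.

```python
# Hypothetical sketch of a layerwise weight-poisoning objective (assumed
# interpretation of the abstract, not the paper's code): a classifier head
# is attached to every encoder layer, and poisoned inputs are pushed toward
# the target label at each depth, while clean inputs keep the normal
# top-layer task loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a pre-trained Transformer; returns one pooled state per layer."""
    def __init__(self, vocab=30522, dim=128, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(n_layers)
        ])

    def forward(self, ids):
        h = self.embed(ids)
        pooled = []
        for layer in self.layers:
            h = layer(h)
            pooled.append(h[:, 0])  # first-token ("[CLS]"-style) state per layer
        return pooled

def layerwise_poison_loss(encoder, heads, clean_ids, clean_y, poison_ids, target_y):
    """Clean-task loss on the top layer plus a backdoor loss at every layer."""
    clean_states = encoder(clean_ids)
    loss = F.cross_entropy(heads[-1](clean_states[-1]), clean_y)
    for head, state in zip(heads, encoder(poison_ids)):
        loss = loss + F.cross_entropy(head(state), target_y)
    return loss

# Toy usage: two clean and two poisoned sequences, binary labels, target label 1.
enc = ToyEncoder()
heads = nn.ModuleList([nn.Linear(128, 2) for _ in range(4)])
clean_ids = torch.randint(0, 30522, (2, 16))
poison_ids = torch.randint(0, 30522, (2, 16))  # would contain the trigger tokens
loss = layerwise_poison_loss(enc, heads, clean_ids,
                             torch.tensor([0, 1]),
                             poison_ids, torch.tensor([1, 1]))
loss.backward()
```

The intuition behind applying the backdoor loss at every layer is that a defense which only re-tunes or re-initializes the upper layers during fine-tuning would leave the lower-layer poisoning intact; the combinatorial trigger mentioned in the abstract (a backdoor that fires only on a combination of tokens) is orthogonal to this sketch and not modeled here.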