Defending against universal perturbations with shared adversarial training
CK Mummadi, T Brox… - Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019 - openaccess.thecvf.com
Abstract
Classifiers such as deep neural networks have been shown to be vulnerable to adversarial perturbations on problems with high-dimensional input spaces. While adversarial training improves the robustness of image classifiers against such adversarial perturbations, it leaves them sensitive to perturbations on a non-negligible fraction of the inputs. In this work, we show that adversarial training is more effective at preventing universal perturbations, where the same perturbation needs to fool a classifier on many inputs. Moreover, we investigate the trade-off between robustness against universal perturbations and performance on unperturbed data, and propose an extension of adversarial training that handles this trade-off more gracefully. We present results for image classification and semantic segmentation to showcase that universal perturbations that fool a model hardened with adversarial training become clearly perceptible and show patterns of the target scene.
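To make the "universal" setting concrete, the sketch below shows one possible training step in which a single L∞-bounded perturbation is optimized jointly over all images in a batch and then used for the model update. This is a minimal illustration under assumed conventions (a PyTorch classifier, cross-entropy loss, inputs in [0, 1]); the function names and the one-perturbation-per-batch simplification are my own, not the paper's released implementation, which additionally partitions batches and balances robustness against clean accuracy.

```python
import torch
import torch.nn.functional as F

def shared_pgd_perturbation(model, x, y, eps=8/255, alpha=2/255, steps=5):
    """Compute ONE perturbation shared by every input in the batch
    (an illustrative stand-in for a universal/shared perturbation)."""
    # Shape (1, C, H, W): a single delta that broadcasts over the batch.
    delta = torch.zeros_like(x[:1], requires_grad=True)
    for _ in range(steps):
        # The loss is summed over the whole batch, so the gradient on
        # delta aggregates what fools *many* inputs at once.
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient-ascent step
            delta.clamp_(-eps, eps)             # project onto L-inf ball
        delta.grad.zero_()
    return delta.detach()

def shared_adversarial_training_step(model, optimizer, x, y):
    """One training step on inputs perturbed by the shared delta."""
    delta = shared_pgd_perturbation(model, x, y)
    optimizer.zero_grad()  # clears parameter grads left by the attack
    loss = F.cross_entropy(model((x + delta).clamp(0.0, 1.0)), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the perturbation must raise the loss on every image simultaneously, it has far fewer degrees of freedom per input than a standard per-example attack, which is consistent with the paper's observation that adversarial training counters universal perturbations more effectively.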