Adversarial attacks and defenses in deep learning: From a perspective of cybersecurity

S Zhou, C Liu, D Ye, T Zhu, W Zhou, PS Yu - ACM Computing Surveys, 2022 - dl.acm.org
The outstanding performance of deep neural networks has promoted deep learning
applications in a broad set of domains. However, the potential risks caused by adversarial …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

“Real attackers don't compute gradients”: Bridging the gap between adversarial ML research and practice

G Apruzzese, HS Anderson, S Dambra… - … IEEE Conference on …, 2023 - ieeexplore.ieee.org
Recent years have seen a proliferation of research on adversarial machine learning.
Numerous papers demonstrate powerful algorithmic attacks against a wide variety of …

SoK: Explainable machine learning for computer security applications

A Nadeem, D Vos, C Cao, L Pajola… - 2023 IEEE 8th …, 2023 - ieeexplore.ieee.org
Explainable Artificial Intelligence (XAI) aims to improve the transparency of machine
learning (ML) pipelines. We systematize the growing (but fragmented) …

Adversarial policies beat superhuman go AIs

TT Wang, A Gleave, T Tseng, K Pelrine… - International …, 2023 - proceedings.mlr.press
We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies
against it, achieving a >97% win rate against KataGo running at superhuman settings …

PolicyCleanse: Backdoor detection and mitigation for competitive reinforcement learning

J Guo, A Li, L Wang, C Liu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
While real-world applications of reinforcement learning (RL) are becoming popular, the
security and robustness of RL systems are worthy of more attention and exploration. In …

" Get in Researchers; We're Measuring Reproducibility": A Reproducibility Study of Machine Learning Papers in Tier 1 Security Conferences

D Olszewski, A Lu, C Stillman, K Warren… - Proceedings of the …, 2023 - dl.acm.org
Reproducibility is crucial to the advancement of science; it strengthens confidence in
seemingly contradictory results and expands the boundaries of known discoveries …

Curiosity-driven and victim-aware adversarial policies

C Gong, Z Yang, Y Bai, J Shi, A Sinha, B Xu… - Proceedings of the 38th …, 2022 - dl.acm.org
Recent years have demonstrated the great potential of applying Deep Reinforcement Learning
(DRL) to challenging applications, such as autonomous driving, nuclear fusion …

Robust deep reinforcement learning through bootstrapped opportunistic curriculum

J Wu, Y Vorobeychik - International Conference on Machine …, 2022 - proceedings.mlr.press
Despite considerable advances in deep reinforcement learning, it has been shown to be
highly vulnerable to adversarial perturbations to state observations. Recent efforts that have …

Security and Privacy Issues in Deep Reinforcement Learning: Threats and Countermeasures

K Mo, P Ye, X Ren, S Wang, W Li, J Li - ACM Computing Surveys, 2024 - dl.acm.org
Deep Reinforcement Learning (DRL) is an essential subfield of Artificial Intelligence (AI),
where agents interact with environments to learn policies for solving complex tasks. In recent …