Adversarial attacks and defenses in deep learning: From a perspective of cybersecurity

S Zhou, C Liu, D Ye, T Zhu, W Zhou, PS Yu - ACM Computing Surveys, 2022 - dl.acm.org
The outstanding performance of deep neural networks has promoted deep learning
applications in a broad set of domains. However, the potential risks caused by adversarial …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

“Real attackers don't compute gradients”: Bridging the gap between adversarial ML research and practice

G Apruzzese, HS Anderson, S Dambra… - … IEEE Conference on …, 2023 - ieeexplore.ieee.org
Recent years have seen a proliferation of research on adversarial machine learning.
Numerous papers demonstrate powerful algorithmic attacks against a wide variety of …

SoK: Explainable machine learning for computer security applications

A Nadeem, D Vos, C Cao, L Pajola… - 2023 IEEE 8th …, 2023 - ieeexplore.ieee.org
Explainable Artificial Intelligence (XAI) aims to improve the transparency of machine
learning (ML) pipelines. We systematize the growing (but fragmented) …

Adversarial policies beat superhuman go AIs

TT Wang, A Gleave, T Tseng, K Pelrine… - International …, 2023 - proceedings.mlr.press
We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies
against it, achieving a >97% win rate against KataGo running at superhuman settings …

PolicyCleanse: Backdoor detection and mitigation for competitive reinforcement learning

J Guo, A Li, L Wang, C Liu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
While real-world applications of reinforcement learning (RL) are becoming popular, the
security and robustness of RL systems are worthy of more attention and exploration. In …

" Get in Researchers; We're Measuring Reproducibility": A Reproducibility Study of Machine Learning Papers in Tier 1 Security Conferences

D Olszewski, A Lu, C Stillman, K Warren… - Proceedings of the …, 2023 - dl.acm.org
Reproducibility is crucial to the advancement of science; it strengthens confidence in
seemingly contradictory results and expands the boundaries of known discoveries …

Curiosity-driven and victim-aware adversarial policies

C Gong, Z Yang, Y Bai, J Shi, A Sinha, B Xu… - Proceedings of the 38th …, 2022 - dl.acm.org
Recent years have demonstrated the great potential of applying Deep Reinforcement Learning
(DRL) to challenging applications, such as autonomous driving, nuclear fusion …

Robust deep reinforcement learning through bootstrapped opportunistic curriculum

J Wu, Y Vorobeychik - International Conference on Machine …, 2022 - proceedings.mlr.press
Despite considerable advances in deep reinforcement learning, it has been shown to be
highly vulnerable to adversarial perturbations to state observations. Recent efforts that have …

Security and Privacy Issues in Deep Reinforcement Learning: Threats and Countermeasures

K Mo, P Ye, X Ren, S Wang, W Li, J Li - ACM Computing Surveys, 2024 - dl.acm.org
Deep Reinforcement Learning (DRL) is an essential subfield of Artificial Intelligence (AI),
where agents interact with environments to learn policies for solving complex tasks. In recent …