Adversarial attacks and defenses in deep learning: From a perspective of cybersecurity
The outstanding performance of deep neural networks has promoted deep learning
applications in a broad set of domains. However, the potential risks caused by adversarial …
Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
“Real attackers don't compute gradients”: Bridging the gap between adversarial ML research and practice
Recent years have seen a proliferation of research on adversarial machine learning.
Numerous papers demonstrate powerful algorithmic attacks against a wide variety of …
SoK: Explainable machine learning for computer security applications
Explainable Artificial Intelligence (XAI) aims to improve the transparency of machine
learning (ML) pipelines. We systematize the increasingly growing (but fragmented) …
Adversarial policies beat superhuman Go AIs
We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies
against it, achieving a >97% win rate against KataGo running at superhuman settings …
PolicyCleanse: Backdoor detection and mitigation for competitive reinforcement learning
While real-world applications of reinforcement learning (RL) are becoming popular, the
security and robustness of RL systems are worthy of more attention and exploration. In …
" Get in Researchers; We're Measuring Reproducibility": A Reproducibility Study of Machine Learning Papers in Tier 1 Security Conferences
D Olszewski, A Lu, C Stillman, K Warren… - Proceedings of the …, 2023 - dl.acm.org
Reproducibility is crucial to the advancement of science; it strengthens confidence in
seemingly contradictory results and expands the boundaries of known discoveries …
Curiosity-driven and victim-aware adversarial policies
Recent years have witnessed great potential in applying Deep Reinforcement Learning
(DRL) in various challenging applications, such as autonomous driving, nuclear fusion …
Robust deep reinforcement learning through bootstrapped opportunistic curriculum
J Wu, Y Vorobeychik - International Conference on Machine …, 2022 - proceedings.mlr.press
Despite considerable advances in deep reinforcement learning, it has been shown to be
highly vulnerable to adversarial perturbations to state observations. Recent efforts that have …
Security and Privacy Issues in Deep Reinforcement Learning: Threats and Countermeasures
Deep Reinforcement Learning (DRL) is an essential subfield of Artificial Intelligence (AI),
where agents interact with environments to learn policies for solving complex tasks. In recent …