The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses
We formalize and extend existing definitions of backdoor-based watermarks and adversarial
defenses as interactive protocols between two players. The existence of these schemes is …
Hardness of Deceptive Certificate Selection
S Wäldchen - World Conference on Explainable Artificial Intelligence, 2023 - Springer
Recent progress towards theoretical interpretability guarantees for AI has been made with
classifiers that are based on interactive proof systems. A prover selects a certificate from the …
Models That Prove Their Own Correctness
How can we trust the correctness of a learned model on a particular input of interest? Model
accuracy is typically measured on average over a distribution of inputs, giving no …
Extending Merlin-Arthur Classifiers for Improved Interpretability
B Turan - xAI (Late-breaking Work, Demos, Doctoral Consortium), 2023 - ceur-ws.org
In my doctoral research, I aim to address the interpretability challenges associated with deep
learning by extending the Merlin-Arthur Classifier framework. This novel approach employs …
Unified Taxonomy in AI Safety: Watermarks, Adversarial Defenses, and Transferable Attacks
As AI becomes omnipresent in today's world, it is crucial to study the safety aspects of
learning, such as guaranteed watermarking capabilities and defenses against adversarial …