Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

The path toward equal performance in medical machine learning

E Petersen, S Holm, M Ganz, A Feragen - Patterns, 2023 - cell.com
To ensure equitable quality of care, differences in machine learning model performance
between patient groups must be addressed. Here, we argue that two separate mechanisms …

The measure and mismeasure of fairness

S Corbett-Davies, JD Gaebler, H Nilforoshan… - The Journal of Machine …, 2023 - dl.acm.org
The field of fair machine learning aims to ensure that decisions guided by algorithms are
equitable. Over the last decade, several formal, mathematical definitions of fairness have …

Beyond the safeguards: exploring the security risks of ChatGPT

E Derner, K Batistič - arXiv preprint arXiv:2305.08005, 2023 - arxiv.org
The increasing popularity of large language models (LLMs) such as ChatGPT has led to
growing concerns about their safety, security risks, and ethical implications. This paper aims …

Representation in AI evaluations

AS Bergman, LA Hendricks, M Rauh, B Wu… - Proceedings of the …, 2023 - dl.acm.org
Calls for representation in artificial intelligence (AI) and machine learning (ML) are
widespread, with "representation" or "representativeness" generally understood to be both …

A security risk taxonomy for large language models

E Derner, K Batistič, J Zahálka, R Babuška - arXiv preprint arXiv …, 2023 - arxiv.org
As large language models (LLMs) permeate more and more applications, an assessment of
their associated security risks becomes increasingly necessary. The potential for exploitation …

Designing equitable algorithms

A Chohlas-Wood, M Coots, S Goel… - Nature Computational …, 2023 - nature.com
Predictive algorithms are now commonly used to distribute society's resources and
sanctions. But these algorithms can entrench and exacerbate inequities. To guard against …

“You Can't Fix What You Can't Measure”: Privately Measuring Demographic Performance Disparities in Federated Learning

M Juarez, A Korolova - … through the Lens of Causality and …, 2023 - proceedings.mlr.press
As in traditional machine learning models, models trained with federated learning may
exhibit disparate performance across demographic groups. Model holders must identify …

Seeing through the data: A statistical evaluation of prohibited item detection benchmark datasets for X-ray security screening

BKS Isaac-Medina, S Yucer… - Proceedings of the …, 2023 - openaccess.thecvf.com
The rapid progress in automatic prohibited object detection within the context of X-ray
security screening, driven forward by advances in deep learning, has resulted in the first …

Making It Possible for the Auditing of AI: A Systematic Review of AI Audits and AI Auditability

Y Li, S Goel - Information Systems Frontiers, 2024 - Springer
Artificial intelligence (AI) technologies have become the key driver of innovation in society.
However, numerous vulnerabilities of AI systems can lead to negative consequences for …