Understanding metric-related pitfalls in image analysis validation

A Reinke, MD Tizabi, M Baumgartner, M Eisenmann… - Nature …, 2024 - nature.com
Validation metrics are key for tracking scientific progress and bridging the current chasm
between artificial intelligence research and its translation into practice. However, increasing …

A primer on Bayesian neural networks: review and debates

J Arbel, K Pitas, M Vladimirova, V Fortuin - arXiv preprint arXiv:2309.16314, 2023 - arxiv.org
Neural networks have achieved remarkable performance across various problem domains,
but their widespread applicability is hindered by inherent limitations such as overconfidence …

Dual focal loss for calibration

L Tao, M Dong, C Xu - International Conference on Machine …, 2023 - proceedings.mlr.press
The use of deep neural networks in real-world applications requires well-calibrated networks
with confidence scores that accurately reflect the actual probability. However, it has been …

On (assessing) the fairness of risk score models

E Petersen, M Ganz, S Holm, A Feragen - Proceedings of the 2023 ACM …, 2023 - dl.acm.org
Recent work on algorithmic fairness has largely focused on the fairness of discrete
decisions, or classifications. While such decisions are often based on risk score models, the …

Analysis and comparison of classification metrics

L Ferrer - arXiv preprint arXiv:2209.05355, 2022 - arxiv.org
A variety of different performance metrics are commonly used in the machine learning
literature for the evaluation of classification systems. Some of the most common ones for …

Metrics reloaded: recommendations for image analysis validation

L Maier-Hein, A Reinke, P Godau, MD Tizabi… - Nature …, 2024 - nature.com
Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an
underestimated global problem. In biomedical image analysis, chosen performance metrics …

The calibration gap between model and human confidence in large language models

M Steyvers, H Tejeda, A Kumar, C Belem… - arXiv preprint arXiv …, 2024 - arxiv.org
For large language models (LLMs) to be trusted by humans they need to be well-calibrated
in the sense that they can accurately assess and communicate how likely it is that their …

Minimum-risk recalibration of classifiers

Z Sun, D Song, A Hero - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Recalibrating probabilistic classifiers is vital for enhancing the reliability and accuracy of
predictive models. Despite the development of numerous recalibration algorithms, there is …

Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models

E Onal, K Flöge, E Caldwell, A Sheverdin… - arXiv preprint arXiv …, 2024 - arxiv.org
Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor
calibration, particularly when fine-tuned on small datasets. To address these challenges, we …

Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance

T Decker, A Koebler, M Lebacher, I Thon… - Proceedings of the 30th …, 2024 - dl.acm.org
Monitoring and maintaining machine learning models are among the most critical
challenges in translating recent advances in the field into real-world applications. However …