Better uncertainty calibration via proper scores for classification and beyond

A Reinke, MD Tizabi, M Baumgartner, M Eisenmann… - Nature …, 2024 - nature.com

Validation metrics are key for tracking scientific progress and bridging the current chasm
between artificial intelligence research and its translation into practice. However, increasing …

Cited by 84 · Related articles · All 27 versions

[PDF] arxiv.org

A primer on Bayesian neural networks: review and debates

J Arbel, K Pitas, M Vladimirova, V Fortuin - arXiv preprint arXiv:2309.16314, 2023 - arxiv.org

Neural networks have achieved remarkable performance across various problem domains,
but their widespread applicability is hindered by inherent limitations such as overconfidence …

Cited by 20 · Related articles · All 2 versions

[PDF] mlr.press

Dual focal loss for calibration

L Tao, M Dong, C Xu - International Conference on Machine …, 2023 - proceedings.mlr.press

The use of deep neural networks in real-world applications requires well-calibrated networks
with confidence scores that accurately reflect the actual probability. However, it has been …

Cited by 24 · Related articles · All 7 versions

[PDF] arxiv.org

On (assessing) the fairness of risk score models

E Petersen, M Ganz, S Holm, A Feragen - Proceedings of the 2023 ACM …, 2023 - dl.acm.org

Recent work on algorithmic fairness has largely focused on the fairness of discrete
decisions, or classifications. While such decisions are often based on risk score models, the …

Cited by 18 · Related articles · All 6 versions

[PDF] arxiv.org

Analysis and comparison of classification metrics

L Ferrer - arXiv preprint arXiv:2209.05355, 2022 - arxiv.org

A variety of different performance metrics are commonly used in the machine learning
literature for the evaluation of classification systems. Some of the most common ones for …

Cited by 30 · Related articles · All 2 versions

[PDF] ed.ac.uk

Metrics reloaded: recommendations for image analysis validation

L Maier-Hein, A Reinke, P Godau, MD Tizabi… - Nature …, 2024 - nature.com

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an
underestimated global problem. In biomedical image analysis, chosen performance metrics …

Cited by 165 · Related articles · All 20 versions

[PDF] arxiv.org

The calibration gap between model and human confidence in large language models

M Steyvers, H Tejeda, A Kumar, C Belem… - arXiv preprint arXiv …, 2024 - arxiv.org

For large language models (LLMs) to be trusted by humans they need to be well-calibrated
in the sense that they can accurately assess and communicate how likely it is that their …

Cited by 4 · Related articles · All 3 versions

[PDF] neurips.cc

Minimum-risk recalibration of classifiers

Z Sun, D Song, A Hero - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Recalibrating probabilistic classifiers is vital for enhancing the reliability and accuracy of
predictive models. Despite the development of numerous recalibration algorithms, there is …

Cited by 5 · Related articles · All 5 versions

[PDF] arxiv.org

Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models

E Onal, K Flöge, E Caldwell, A Sheverdin… - arXiv preprint arXiv …, 2024 - arxiv.org

Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor
calibration, particularly when fine-tuned on small datasets. To address these challenges, we …

Cited by 4 · Related articles · All 3 versions

[PDF] acm.org

Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance

T Decker, A Koebler, M Lebacher, I Thon… - Proceedings of the 30th …, 2024 - dl.acm.org

Monitoring and maintaining machine learning models are among the most critical
challenges in translating recent advances in the field into real-world applications. However …

Cited by 1 · Related articles · All 3 versions