Understanding metric-related pitfalls in image analysis validation
Validation metrics are key for tracking scientific progress and bridging the current chasm
between artificial intelligence research and its translation into practice. However, increasing …
A primer on Bayesian neural networks: review and debates
Neural networks have achieved remarkable performance across various problem domains,
but their widespread applicability is hindered by inherent limitations such as overconfidence …
Dual focal loss for calibration
The use of deep neural networks in real-world applications requires well-calibrated networks
with confidence scores that accurately reflect the actual probability. However, it has been …
On (assessing) the fairness of risk score models
Recent work on algorithmic fairness has largely focused on the fairness of discrete
decisions, or classifications. While such decisions are often based on risk score models, the …
Analysis and comparison of classification metrics
L Ferrer - arXiv preprint arXiv:2209.05355, 2022 - arxiv.org
A variety of different performance metrics are commonly used in the machine learning
literature for the evaluation of classification systems. Some of the most common ones for …
Metrics reloaded: recommendations for image analysis validation
Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an
underestimated global problem. In biomedical image analysis, chosen performance metrics …
The calibration gap between model and human confidence in large language models
For large language models (LLMs) to be trusted by humans they need to be well-calibrated
in the sense that they can accurately assess and communicate how likely it is that their …
Minimum-risk recalibration of classifiers
Recalibrating probabilistic classifiers is vital for enhancing the reliability and accuracy of
predictive models. Despite the development of numerous recalibration algorithms, there is …
Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models
E Onal, K Flöge, E Caldwell, A Sheverdin… - arXiv preprint arXiv …, 2024 - arxiv.org
Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor
calibration, particularly when fine-tuned on small datasets. To address these challenges, we …
Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance
Monitoring and maintaining machine learning models are among the most critical
challenges in translating recent advances in the field into real-world applications. However …