Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data

B van Breugel, T Liu, D Oglic… - Nature Reviews …, 2024 - nature.com

The creation and application of data in biomedicine and healthcare often face privacy
constraints, bias, distributional shifts, underrepresentation of certain groups and data …

Cited by 5

Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA

Y Fan, J Gu, K Zhou, Q Yan, S Jiang… - Proceedings of the …, 2024 - aclanthology.org

Multipanel images, commonly seen as web screenshots, posters, etc., pervade our daily
lives. These images, characterized by their composition of multiple subfigures in distinct …

Cited by 6

Why Tabular Foundation Models Should Be a Research Priority

B van Breugel, M van der Schaar - arXiv preprint arXiv:2405.01147, 2024 - arxiv.org

Recent text and image foundation models are incredibly impressive, and these models are
attracting an ever-increasing portion of research resources. In this position piece we aim to …

Cited by 15

Generative Conditional Distributions by Neural (Entropic) Optimal Transport

B Nguyen, B Nguyen, HT Nguyen… - arXiv preprint arXiv …, 2024 - arxiv.org

Learning conditional distributions is challenging because the desired outcome is not a
single distribution but multiple distributions that correspond to multiple instances of the …

Cited by 1

Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models

P Rauba, N Seedat, MR Luyten… - arXiv preprint arXiv …, 2024 - arxiv.org

The predominant de facto paradigm of testing ML models relies on either using only held-out
data to compute aggregate evaluation metrics or assessing the performance on different …

Cited by 1

A Structured Regression Approach for Evaluating Model Performance Across Intersectional Subgroups

C Herlihy, K Truong, A Chouldechova… - The 2024 ACM …, 2024 - dl.acm.org

Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to
measure an AI system's performance across different subgroups defined by combinations of …

Cited by 1

Improving Fraud Detection with 1D-Convolutional Spiking Neural Networks Through Bayesian Optimization

D Perdigão, F Antunes, C Silva, B Ribeiro - EPIA Conference on Artificial …, 2024 - Springer

The digitalization of the banking sector has enabled an increasing number of fraudulent
activities in the past years. The development of new practical solutions for fraud detection is …

Cited by 3

Task Me Anything

J Zhang, W Huang, Z Ma, O Michel, D He… - arXiv preprint arXiv …, 2024 - arxiv.org

Benchmarks for large multimodal language models (MLMs) now serve to simultaneously
assess the general capabilities of models instead of evaluating for a specific capability. As a …

Cited by 18

ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models

W Pang, M Shafieinejad, L Liu, X He - arXiv preprint arXiv:2405.17724, 2024 - arxiv.org

Recent research in tabular data synthesis has focused on single tables, whereas real-world
applications often involve complex data with tens or hundreds of interconnected tables …

Cited by 3

FairJob: A Real-World Dataset for Fairness in Online Systems

M Vladimirova, F Pavone, E Diemert - arXiv preprint arXiv:2407.03059, 2024 - arxiv.org

We introduce a fairness-aware dataset for job recommendations in advertising, designed to
foster research in algorithmic fairness within real-world scenarios. It was collected and …