Synthetic data in biomedicine via generative artificial intelligence
The creation and application of data in biomedicine and healthcare often face privacy
constraints, bias, distributional shifts, underrepresentation of certain groups and data …
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA
Multipanel images, commonly seen as web screenshots, posters, etc., pervade our daily
lives. These images, characterized by their composition of multiple subfigures in distinct …
Why tabular foundation models should be a research priority
B van Breugel, M van der Schaar - arXiv preprint arXiv:2405.01147, 2024 - arxiv.org
Recent text and image foundation models are incredibly impressive, and these models are
attracting an ever-increasing portion of research resources. In this position piece we aim to …
Generative Conditional Distributions by Neural (Entropic) Optimal Transport
Learning conditional distributions is challenging because the desired outcome is not a
single distribution but multiple distributions that correspond to multiple instances of the …
Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models
P Rauba, N Seedat, MR Luyten… - arXiv preprint arXiv …, 2024 - arxiv.org
The predominant de facto paradigm of testing ML models relies on either using only held-out
data to compute aggregate evaluation metrics or by assessing the performance on different …
A structured regression approach for evaluating model performance across intersectional subgroups
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to
measure an AI system's performance across different subgroups defined by combinations of …
Improving Fraud Detection with 1D-Convolutional Spiking Neural Networks Through Bayesian Optimization
The digitalization of the banking sector has enabled an increasing number of fraudulent
activities in the past years. The development of new practical solutions for fraud detection is …
ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models
Recent research in tabular data synthesis has focused on single tables, whereas real-world
applications often involve complex data with tens or hundreds of interconnected tables …
FairJob: A Real-World Dataset for Fairness in Online Systems
We introduce a fairness-aware dataset for job recommendations in advertising, designed to
foster research in algorithmic fairness within real-world scenarios. It was collected and …