Scientific discovery in the age of artificial intelligence

H Wang, T Fu, Y Du, W Gao, K Huang, Z Liu… - Nature, 2023 - nature.com
Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment
and accelerate research, helping scientists to generate hypotheses, design experiments …

Too large; data reduction for vision-language pre-training

AJ Wang, KQ Lin, DJ Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper examines the problems of severe image-text misalignment and high redundancy
in the widely-used large-scale Vision-Language Pre-Training (VLP) datasets. To address …

Optimizing data collection for machine learning

R Mahmood, J Lucas, JM Alvarez… - Advances in Neural …, 2022 - proceedings.neurips.cc
Modern deep learning systems require huge data sets to achieve impressive performance,
but there is little guidance on how much or what kind of data to collect. Over-collecting data …

Performance scaling via optimal transport: Enabling data selection from partially revealed sources

F Kang, HA Just, AK Sahu, R Jia - Advances in Neural …, 2024 - proceedings.neurips.cc
Traditionally, data selection has been studied in settings where all samples from prospective
sources are fully revealed to a machine learning developer. However, in practical data …

Delegated classification

E Saig, I Talgam-Cohen… - Advances in Neural …, 2024 - proceedings.neurips.cc
When machine learning is outsourced to a rational agent, conflicts of interest might arise and
severely impact predictive performance. In this work, we propose a theoretical framework for …

Playground v2.5: Three insights towards enhancing aesthetic quality in text-to-image generation

D Li, A Kamko, E Akhgari, A Sabet, L Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we share three insights for achieving state-of-the-art aesthetic quality in
text-to-image generative models. We focus on three critical aspects for model improvement …

Full or Weak annotations? An adaptive strategy for budget-constrained annotation campaigns

JG Tejero, MS Zinkernagel, S Wolf… - Proceedings of the …, 2023 - openaccess.thecvf.com
Annotating new datasets for machine learning tasks is tedious, time-consuming, and costly.
For segmentation applications, the burden is particularly high as manual delineations of …

A meta-learning approach to predicting performance and data requirements

A Jain, G Swaminathan, P Favaro… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose an approach to estimate the number of samples required for a model to reach a
target performance. We find that the power law, the de facto principle to estimate model …

Machine learning-based label quality assurance for object detection projects in requirements engineering

N Pičuljan, Ž Car - Applied Sciences, 2023 - mdpi.com
Featured Application: Our machine learning-based label quality assurance demo showcases
the potential of our approach to improve object detection projects within the data …

Low fidelity data driven machine learning based optimisation method for box-wing configuration

M Hasan, A Khandoker, G Gessl, MA Hamid… - Aerospace Science and …, 2024 - Elsevier
Wing design optimization traditionally involves computationally expensive high-fidelity
simulations, limiting the exploration of design spaces. In this study, we propose a …