Wilds: A benchmark of in-the-wild distribution shifts

H Wang, T Fu, Y Du, W Gao, K Huang, Z Liu… - Nature, 2023 - nature.com

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment
and accelerate research, helping scientists to generate hypotheses, design experiments …

被引用次数：595 相关文章所有 14 个版本

[PDF] strawman.com

The current and future state of AI interpretation of medical images

P Rajpurkar, MP Lungren - New England Journal of Medicine, 2023 - Mass Medical Soc

The Current and Future State of AI Interpretation of Medical Images | New England Journal of
Medicine Skip to main content The New England Journal of Medicine homepage Advanced …

被引用次数：205 相关文章所有 9 个版本

[PDF] arxiv.org

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org

Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

被引用次数：934 相关文章所有 5 个版本

[PDF] neurips.cc

Datacomp: In search of the next generation of multimodal datasets

SY Gadre, G Ilharco, A Fang… - Advances in …, 2024 - proceedings.neurips.cc

Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …

被引用次数：233 相关文章所有 9 个版本

[PDF] neurips.cc

Test-time prompt tuning for zero-shot generalization in vision-language models

M Shu, W Nie, DA Huang, Z Yu… - Advances in …, 2022 - proceedings.neurips.cc

Pre-trained vision-language models (eg, CLIP) have shown promising zero-shot
generalization in many downstream tasks with properly designed text prompts. Instead of …

被引用次数：217 相关文章所有 8 个版本

[PDF] arxiv.org

Trustworthy llms: a survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org

Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

被引用次数：177 相关文章所有 3 个版本

[PDF] qub.ac.uk

[PDF][PDF] DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk

Abstract Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

被引用次数：245 相关文章所有 8 个版本

[HTML] nih.gov

Towards a general-purpose foundation model for computational pathology

RJ Chen, T Ding, MY Lu, DFK Williamson, G Jaume… - Nature Medicine, 2024 - nature.com

Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks,
requiring the objective characterization of histopathological entities from whole-slide images …

被引用次数：132 相关文章所有 3 个版本

[PDF] arxiv.org

Prompting gpt-3 to be reliable

C Si, Z Gan, Z Yang, S Wang, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Large language models (LLMs) show impressive abilities via few-shot prompting.
Commercialized APIs such as OpenAI GPT-3 further increase their use in real-world …

被引用次数：212 相关文章所有 3 个版本

[PDF] mlr.press

Efficient test-time model adaptation without forgetting

S Niu, J Wu, Y Zhang, Y Chen… - International …, 2022 - proceedings.mlr.press

Test-time adaptation provides an effective means of tackling the potential distribution shift
between model training and inference, by dynamically updating the model at test time. This …

被引用次数：249 相关文章所有 6 个版本