Advances, challenges and opportunities in creating data for trustworthy AI

W Liang, GA Tadesse, D Ho, L Fei-Fei… - Nature Machine Intelligence, 2022 - nature.com
As artificial intelligence (AI) transitions from research to deployment, creating the appropriate
datasets and data pipelines to develop and evaluate AI models is increasingly the biggest …

Survey of hallucination in natural language generation

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii… - ACM Computing Surveys, 2023 - dl.acm.org
Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …

QLoRA: Efficient finetuning of quantized LLMs

T Dettmers, A Pagnoni, A Holtzman… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We present QLoRA, an efficient finetuning approach that reduces memory usage enough to
finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit …
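
The abstract describes finetuning a quantized base model while training only a small set of low-rank adapter weights. The sketch below shows one common way this pattern is set up with the Hugging Face transformers, bitsandbytes and peft libraries; the model id, LoRA rank and target modules are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal sketch of QLoRA-style finetuning: load a causal LM with 4-bit NF4
# quantized weights and attach low-rank adapters so that only a small set of
# parameters is trained. Model id and hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "huggyllama/llama-7b"  # placeholder; the paper scales up to 65B

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,         # also quantize quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # illustrative subset of modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights receive gradients
```

Because the frozen base weights stay in 4-bit precision and gradients flow only through the adapters, the memory footprint stays small enough for a single GPU, which is the point the abstract highlights.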

Should ChatGPT be biased? Challenges and risks of bias in large language models

E Ferrara - arXiv preprint arXiv:2304.03738, 2023 - arxiv.org
As the capabilities of generative language models continue to advance, the implications of
biases ingrained within these models have garnered increasing attention from researchers …

DataComp: In search of the next generation of multimodal datasets

SY Gadre, G Ilharco, A Fang… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …
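
DataComp treats dataset curation itself as the benchmark: the training code is fixed and entrants compete on how they filter a common pool of image-text pairs. A simple curation baseline in that setting is CLIP-score filtering; the sketch below assumes the Hugging Face CLIP implementation, and the threshold is an arbitrary placeholder rather than a value from the paper.

```python
# Illustrative sketch of CLIP-score filtering, a simple curation baseline for
# image-text data: keep a pair only if the cosine similarity between its CLIP
# image and text embeddings exceeds a threshold.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, caption: str) -> float:
    """Cosine similarity between CLIP embeddings of an image and its caption."""
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

def keep_pair(image: Image.Image, caption: str, threshold: float = 0.28) -> bool:
    # 0.28 is an arbitrary illustrative cutoff, not the paper's setting.
    return clip_score(image, caption) >= threshold
```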

Large language models are not fair evaluators

P Wang, L Li, L Chen, Z Cai, D Zhu, B Lin… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we uncover a systematic bias in the evaluation paradigm of adopting large
language models (LLMs), e.g., GPT-4, as a referee to score and compare the quality of …
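
One bias the snippet alludes to is positional: a judge model can prefer whichever answer it is shown first. A common mitigation is to query the judge with both answer orders and average the scores; the sketch below illustrates that pattern with a hypothetical judge_scores callable standing in for the actual LLM API call.

```python
# Minimal sketch of order-swapped LLM judging to reduce positional bias:
# score the pair (A, B) in both presentation orders and average per answer.
# `judge_scores` is a hypothetical stand-in for an LLM judge call that returns
# numeric scores for the first and second answer it is shown.
from typing import Callable, Tuple

def balanced_judgement(
    question: str,
    answer_a: str,
    answer_b: str,
    judge_scores: Callable[[str, str, str], Tuple[float, float]],
) -> Tuple[float, float]:
    s1_a, s1_b = judge_scores(question, answer_a, answer_b)  # A shown first
    s2_b, s2_a = judge_scores(question, answer_b, answer_a)  # B shown first
    return (s1_a + s2_a) / 2, (s1_b + s2_b) / 2
```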

The debate over understanding in AI's large language models

M Mitchell, DC Krakauer - Proceedings of the National Academy of Sciences, 2023 - National Academy of Sciences
We survey a current, heated debate in the artificial intelligence (AI) research community on
whether large pretrained language models can be said to understand language—and the …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Unnatural instructions: Tuning language models with (almost) no human labor

O Honovich, T Scialom, O Levy, T Schick - arXiv preprint arXiv:2212.09689, 2022 - arxiv.org
Instruction tuning enables pretrained language models to perform new tasks from
inference-time natural language descriptions. These approaches rely on vast amounts of human …
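
The paper's premise is replacing human-written instruction data with model-generated examples. The sketch below outlines only the general pattern: seed the prompt with a few demonstrations, ask a model for a new (instruction, input, output) triple, and deduplicate. Here complete is a hypothetical stand-in for a text-completion API, and the prompt format is illustrative rather than the paper's.

```python
# Rough sketch of model-generated instruction data: prompt a model with a few
# seed examples, parse each new (instruction, input, output) triple it emits,
# and keep only instructions not seen before.
import json
from typing import Callable, Dict, List

def generate_instructions(
    seed_examples: List[Dict[str, str]],
    complete: Callable[[str], str],   # hypothetical text-completion API
    n_rounds: int = 10,
) -> List[Dict[str, str]]:
    dataset, seen = [], set()
    for _ in range(n_rounds):
        prompt = (
            'Here are example tasks as JSON with keys "instruction", "input", "output":\n'
            + "\n".join(json.dumps(ex) for ex in seed_examples)
            + "\nWrite one new, different task as JSON:\n"
        )
        try:
            example = json.loads(complete(prompt))
        except json.JSONDecodeError:
            continue  # skip malformed generations
        key = example.get("instruction", "").strip().lower()
        if key and key not in seen:  # naive exact-match deduplication
            seen.add(key)
            dataset.append(example)
    return dataset
```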

SugarCrepe: Fixing hackable benchmarks for vision-language compositionality

CY Hsieh, J Zhang, Z Ma… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
In the last year alone, a surge of new benchmarks to measure compositional
understanding of vision-language models has permeated the machine learning ecosystem …
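
Compositionality benchmarks of this kind typically score a model on whether it prefers an image's true caption over a minimally edited hard negative. The sketch below shows that generic evaluation format, assuming a clip_score function like the one sketched above for DataComp; it is not SugarCrepe's exact protocol, and the metric is plain image-to-text choice accuracy.

```python
# Hedged sketch of an image-to-text compositionality test: the model is
# correct on an example if it scores the true caption above the hard negative.
from typing import Callable, Iterable, Tuple
from PIL import Image

def hard_negative_accuracy(
    examples: Iterable[Tuple[Image.Image, str, str]],  # (image, positive, negative)
    clip_score: Callable[[Image.Image, str], float],
) -> float:
    correct, total = 0, 0
    for image, pos, neg in examples:
        correct += int(clip_score(image, pos) > clip_score(image, neg))
        total += 1
    return correct / max(total, 1)
```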