PromptBench: Towards evaluating the robustness of large language models on adversarial prompts

K Zhu, J Wang, J Zhou, Z Wang, H Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
The increasing reliance on Large Language Models (LLMs) across academia and industry
necessitates a comprehensive understanding of their robustness to prompts. In response to …

Towards a general-purpose foundation model for computational pathology

RJ Chen, T Ding, MY Lu, DFK Williamson, G Jaume… - Nature Medicine, 2024 - nature.com
Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks,
requiring the objective characterization of histopathological entities from whole-slide images …

Impact of pretraining term frequencies on few-shot reasoning

Y Razeghi, RL Logan IV, M Gardner… - arXiv preprint arXiv …, 2022 - arxiv.org
Pretrained Language Models (LMs) have demonstrated the ability to perform numerical
reasoning by extrapolating from a few examples in few-shot settings. However, the extent to …

Reasoning or reciting? Exploring the capabilities and limitations of language models through counterfactual tasks

Z Wu, L Qiu, A Ross, E Akyürek, B Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
The impressive performance of recent language models across a wide range of tasks
suggests that they possess a degree of abstract reasoning skills. Are these skills general …

Detecting pretraining data from large language models

W Shi, A Ajith, M Xia, Y Huang, D Liu, T Blevins… - arXiv preprint arXiv …, 2023 - arxiv.org
Although large language models (LLMs) are widely deployed, the data used to train them is
rarely disclosed. Given the incredible scale of this data, up to trillions of tokens, it is all but …

Speak, memory: An archaeology of books known to ChatGPT/GPT-4

KK Chang, M Cramer, S Soni, D Bamman - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we carry out a data archaeology to infer books that are known to ChatGPT and
GPT-4 using a name cloze membership inference query. We find that OpenAI models have …

Task contamination: Language models may not be few-shot anymore

C Li, J Flanigan - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Large language models (LLMs) offer impressive performance in various zero-shot and few-
shot tasks. However, their success in zero-shot or few-shot settings may be affected by task …

How much are LLMs contaminated? A comprehensive survey and the LLMSanitize library

M Ravaut, B Ding, F Jiao, H Chen, X Li, R Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rise of Large Language Models (LLMs) in recent years, new opportunities are
emerging, but also new challenges, and contamination is quickly becoming critical …

A survey of text classification with transformers: How wide? How large? How long? How accurate? How expensive? How safe?

J Fields, K Chovanec, P Madiraju - IEEE Access, 2024 - ieeexplore.ieee.org
Text classification in natural language processing (NLP) is evolving rapidly, particularly with
the surge in transformer-based models, including large language models (LLMs). This paper …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …