The two word test: A semantic benchmark for large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org

Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Cited by 945 · Related articles · All 4 versions

[PDF] arxiv.org

An AI-Resilient Text Rendering Technique for Reading and Skimming Documents

Z Gu, I Arawjo, K Li, JK Kummerfeld… - Proceedings of the CHI …, 2024 - dl.acm.org

Readers find text difficult to consume for many reasons. Summarization can address some of
these difficulties, but introduce others, such as omitting, misrepresenting, or hallucinating …

Cited by 4 · Related articles · All 5 versions

[PDF] osf.io

Exploring the prospects and challenges of large language models for language learning and production

AM Borghi, C De Livio, F Mannella, L Tummolini… - Sistemi …, 2023 - rivisteweb.it

LLMs such as GPT-3 (Brown et al., 2020), PaLM (Chowdhery et al., 2022), and LLaMA
(Touvron et al., 2023) consist of large neural networks containing hundreds of billions (or …

Cited by 2 · Related articles · All 5 versions

[PDF] arxiv.org

Can large language models understand uncommon meanings of common words?

J Wu, F Che, X Zheng, S Zhang, R Jin, S Nie… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) like ChatGPT have shown significant advancements across
diverse natural language understanding (NLU) tasks, including intelligent dialogue and …

Research progress on evaluation techniques for large language models.

赵睿卓，曲紫畅，陈国英，王坤龙… - … Ju Cai Ji Yu Chu Li, 2024 - search.ebscohost.com

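With the widespread application of large language models, evaluating them has become critically important. Beyond assessing
how they perform on downstream tasks, it is even more necessary to evaluate their potential risks, for example that large language models may violate …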