ToolQA: A dataset for LLM question answering with external tools

Y Zhuang, Y Yu, K Wang, H Sun… - Advances in Neural …, 2023 - proceedings.neurips.cc
Abstract Large Language Models (LLMs) have demonstrated impressive performance in
various NLP tasks, but they still suffer from challenges such as hallucination and weak …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Abstract Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Measuring and narrowing the compositionality gap in language models

O Press, M Zhang, S Min, L Schmidt, NA Smith… - arXiv preprint arXiv …, 2022 - arxiv.org
We investigate the ability of language models to perform compositional reasoning tasks
where the overall solution depends on correctly composing the answers to sub-problems …

Prompting GPT-3 to be reliable

C Si, Z Gan, Z Yang, S Wang, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) show impressive abilities via few-shot prompting.
Commercialized APIs such as OpenAI GPT-3 further increase their use in real-world …

Adapting large language models for education: Foundational capabilities, potentials, and challenges

Q Li, L Fu, W Zhang, X Chen, J Yu, W Xia… - arXiv preprint arXiv …, 2023 - arxiv.org
Online education platforms, leveraging the internet to distribute education resources, seek to
provide convenient education but often fall short in real-time communication with students …

Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions

H Trivedi, N Balasubramanian, T Khot… - arXiv preprint arXiv …, 2022 - arxiv.org
Prompting-based large language models (LLMs) are surprisingly powerful at generating
natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question …

Ask me anything: A simple strategy for prompting language models

S Arora, A Narayan, MF Chen, L Orr… - The Eleventh …, 2022 - openreview.net
Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a
natural language prompt that demonstrates how to perform the task and no additional …

KoLA: Carefully benchmarking world knowledge of large language models

J Yu, X Wang, S Tu, S Cao, D Zhang-Li, X Lv… - arXiv preprint arXiv …, 2023 - arxiv.org
The unprecedented performance of large language models (LLMs) necessitates
improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we …

Context-faithful prompting for large language models

W Zhou, S Zhang, H Poon, M Chen - arXiv preprint arXiv:2303.11315, 2023 - arxiv.org
Large language models (LLMs) encode parametric knowledge about world facts and have
shown remarkable performance in knowledge-driven NLP tasks. However, their reliance on …

From matching to generation: A survey on generative information retrieval

X Li, J Jin, Y Zhou, Y Zhang, P Zhang, Y Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Information Retrieval (IR) systems are crucial tools for users to access information, widely
applied in scenarios like search engines, question answering, and recommendation …