ToolQA: A dataset for LLM question answering with external tools

Y Zhuang, Y Yu, K Wang, H Sun… - Advances in Neural …, 2023 - proceedings.neurips.cc
Abstract Large Language Models (LLMs) have demonstrated impressive performance in
various NLP tasks, but they still suffer from challenges such as hallucination and weak …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Abstract Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Measuring and narrowing the compositionality gap in language models

O Press, M Zhang, S Min, L Schmidt, NA Smith… - arXiv preprint arXiv …, 2022 - arxiv.org
We investigate the ability of language models to perform compositional reasoning tasks
where the overall solution depends on correctly composing the answers to sub-problems …

Prompting GPT-3 to be reliable

C Si, Z Gan, Z Yang, S Wang, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) show impressive abilities via few-shot prompting.
Commercialized APIs such as OpenAI GPT-3 further increase their use in real-world …

Adapting large language models for education: Foundational capabilities, potentials, and challenges

Q Li, L Fu, W Zhang, X Chen, J Yu, W Xia… - arXiv preprint arXiv …, 2023 - arxiv.org
Online education platforms, leveraging the internet to distribute education resources, seek to
provide convenient education but often fall short in real-time communication with students …

Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions

H Trivedi, N Balasubramanian, T Khot… - arXiv preprint arXiv …, 2022 - arxiv.org
Prompting-based large language models (LLMs) are surprisingly powerful at generating
natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question …

Ask me anything: A simple strategy for prompting language models

S Arora, A Narayan, MF Chen, L Orr… - The Eleventh …, 2022 - openreview.net
Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a
natural language prompt that demonstrates how to perform the task and no additional …

KoLA: Carefully benchmarking world knowledge of large language models

J Yu, X Wang, S Tu, S Cao, D Zhang-Li, X Lv… - arXiv preprint arXiv …, 2023 - arxiv.org
The unprecedented performance of large language models (LLMs) necessitates
improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we …

Context-faithful prompting for large language models

W Zhou, S Zhang, H Poon, M Chen - arXiv preprint arXiv:2303.11315, 2023 - arxiv.org
Large language models (LLMs) encode parametric knowledge about world facts and have
shown remarkable performance in knowledge-driven NLP tasks. However, their reliance on …

From matching to generation: A survey on generative information retrieval

X Li, J Jin, Y Zhou, Y Zhang, P Zhang, Y Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Information Retrieval (IR) systems are crucial tools for users to access information, widely
applied in scenarios like search engines, question answering, and recommendation …