ToolQA: A dataset for LLM question answering with external tools
Large Language Models (LLMs) have demonstrated impressive performance in
various NLP tasks, but they still suffer from challenges such as hallucination and weak …
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …
Measuring and narrowing the compositionality gap in language models
We investigate the ability of language models to perform compositional reasoning tasks
where the overall solution depends on correctly composing the answers to sub-problems …
Adapting large language models for education: Foundational capabilities, potentials, and challenges
Online education platforms, leveraging the internet to distribute education resources, seek to
provide convenient education but often fall short in real-time communication with students …
Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions
Prompting-based large language models (LLMs) are surprisingly powerful at generating
natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question …
Ask me anything: A simple strategy for prompting language models
Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a
natural language prompt that demonstrates how to perform the task and no additional …
KoLA: Carefully benchmarking world knowledge of large language models
The unprecedented performance of large language models (LLMs) necessitates
improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we …
Context-faithful prompting for large language models
Large language models (LLMs) encode parametric knowledge about world facts and have
shown remarkable performance in knowledge-driven NLP tasks. However, their reliance on …
From matching to generation: A survey on generative information retrieval
Information Retrieval (IR) systems are crucial tools for users to access information, widely
applied in scenarios like search engines, question answering, and recommendation …