- Academic resource search

Large language models for data annotation: A survey

Z Tan, D Li, S Wang, A Beigi, B Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org

Data annotation generally refers to the labeling or generating of raw data with relevant
information, which could be used for improving the efficacy of machine learning models. The …

Cited by: 94 · Related articles · All 2 versions

[PDF] aclanthology.org

Think twice before trusting: Self-detection for large language models through comprehensive answer reflection

M Li, W Wang, F Feng, F Zhu, Q Wang… - Findings of the …, 2024 - aclanthology.org

Self-detection for Large Language Models (LLMs) seeks to evaluate the
trustworthiness of the LLM's output by leveraging its own capabilities, thereby alleviating the …

Cited by: 1 · Related articles

[PDF] arxiv.org

Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?

N Balepur, A Ravichander, R Rudinger - arXiv preprint arXiv:2402.12483, 2024 - arxiv.org

Multiple-choice question answering (MCQA) is often used to evaluate large language
models (LLMs). To see if MCQA assesses LLMs as intended, we probe if LLMs can perform …

Cited by: 13 · Related articles · All 2 versions

[PDF] arxiv.org

Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?

N Balepur, R Rudinger - arXiv preprint arXiv:2407.01992, 2024 - arxiv.org

Recent work shows that large language models (LLMs) can answer multiple-choice
questions using only the choices, but does this mean that MCQA leaderboard rankings of …

Cited by: 2 · Related articles · All 5 versions

[PDF] arxiv.org

Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning

S Palta, N Balepur, P Rankel, S Wiegreffe… - arXiv preprint arXiv …, 2024 - arxiv.org

Questions involving commonsense reasoning about everyday situations often admit many
$\textit{possible}$ or $\textit{plausible}$ answers. In contrast, multiple-choice question …

Cited by: 1 · Related articles · All 3 versions

[PDF] arxiv.org

Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?

N Balepur, F Gu, A Ravichander, S Feng… - arXiv preprint arXiv …, 2024 - arxiv.org

Question answering (QA), producing correct answers for input questions, is popular, but we
test a reverse question answering (RQA) task: given an input answer, generate a question …

Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs

Y Fang, M Li, W Wang, H Lin, F Feng - arXiv preprint arXiv:2406.11514, 2024 - arxiv.org

Large Language Models (LLMs) excel in various natural language processing tasks but
struggle with hallucination issues. Existing solutions have considered utilizing LLMs' …

Cited by: 4 · Related articles · All 2 versions