Evaluating superhuman models with consistency checks

L Fluri, D Paleka, F Tramèr - 2024 IEEE Conference on Secure …, 2024 - ieeexplore.ieee.org
If machine learning models were to achieve superhuman abilities at various reasoning or
decision-making tasks, how would we go about evaluating such models, given that humans …

Factual consistency of multilingual pretrained language models

C Fierro, A Søgaard - arXiv preprint arXiv:2203.11552, 2022 - arxiv.org
Pretrained language models can be queried for factual knowledge, with potential
applications in knowledge base acquisition and tasks that require inference. However, for …

Measuring reliability of large language models through semantic consistency

H Raj, D Rosati, S Majumdar - arXiv preprint arXiv:2211.05853, 2022 - arxiv.org
While large pretrained language models (PLMs) demonstrate incredible fluency and
performance on many natural language tasks, recent work has shown that well-performing …

Predicting question-answering performance of large language models through semantic consistency

E Rabinovich, S Ackerman, O Raz, E Farchi… - arXiv preprint arXiv …, 2023 - arxiv.org
Semantic consistency of a language model is broadly defined as the model's ability to
produce semantically-equivalent outputs, given semantically-equivalent inputs. We address …

Unpacking large language models with conceptual consistency

P Sahu, M Cogswell, Y Gong, A Divakaran - arXiv preprint arXiv …, 2022 - arxiv.org
If a Large Language Model (LLM) answers "yes" to the question "Are mountains tall?" then
does it know what a mountain is? Can you rely on it responding correctly or incorrectly to …

A survey on awesome Korean NLP datasets

B Ban - 2022 13th International Conference on Information and …, 2022 - ieeexplore.ieee.org
English-based datasets are commonly available from Kaggle, GitHub, or recently published
papers. Although benchmark tests with English datasets are sufficient to show off the …

Judgment aggregation, discursive dilemma and reflective equilibrium: Neural language models as self-improving doxastic agents

G Betz, K Richardson - Frontiers in Artificial Intelligence, 2022 - frontiersin.org
Neural language models (NLMs) are susceptible to producing inconsistent output. This
paper proposes a new diagnosis as well as a novel remedy for NLMs' incoherence. We train …

Querying Labeled Time Series Data with Scenario Programs

D Shanker - arXiv preprint arXiv:2406.17627, 2024 - arxiv.org
In order to ensure autonomous vehicles are safe for on-road deployment, simulation-based
testing has become an integral complement to on-road testing. The rise in simulation testing …

Automated Detection of Core Comments in Online UAV Discussion Forums

D Han, N Moniz, J Cleland-Huang, N Chawla - 2024 - researchsquare.com
Unmanned aerial vehicles (UAVs) have become popular, and accordingly, diagnosing
failures is essential but not easy due to their difficulty and complexity. Online UAV forums can …

[PDF][PDF] Improving Consistency in Large Language Models through Chain of Guidance

H Raj, V Gupta, D Rosati, S Majumdar - shubhobm.github.io
Consistency is a fundamental dimension of trustworthiness in Large Language Models
(LLMs). For humans to be able to trust LLM-based applications, their outputs should be …