Evaluating superhuman models with consistency checks

L Fluri, D Paleka, F Tramèr - 2024 IEEE Conference on Secure …, 2024 - ieeexplore.ieee.org
If machine learning models were to achieve superhuman abilities at various reasoning or
decision-making tasks, how would we go about evaluating such models, given that humans …

Factual consistency of multilingual pretrained language models

C Fierro, A Søgaard - arXiv preprint arXiv:2203.11552, 2022 - arxiv.org
Pretrained language models can be queried for factual knowledge, with potential
applications in knowledge base acquisition and tasks that require inference. However, for …

Measuring reliability of large language models through semantic consistency

H Raj, D Rosati, S Majumdar - arXiv preprint arXiv:2211.05853, 2022 - arxiv.org
While large pretrained language models (PLMs) demonstrate incredible fluency and
performance on many natural language tasks, recent work has shown that well-performing …

Predicting question-answering performance of large language models through semantic consistency

E Rabinovich, S Ackerman, O Raz, E Farchi… - arXiv preprint arXiv …, 2023 - arxiv.org
Semantic consistency of a language model is broadly defined as the model's ability to
produce semantically-equivalent outputs, given semantically-equivalent inputs. We address …

Unpacking large language models with conceptual consistency

P Sahu, M Cogswell, Y Gong, A Divakaran - arXiv preprint arXiv …, 2022 - arxiv.org
If a Large Language Model (LLM) answers "yes" to the question "Are mountains tall?" then
does it know what a mountain is? Can you rely on it responding correctly or incorrectly to …

A survey on awesome Korean NLP datasets

B Ban - 2022 13th International Conference on Information and …, 2022 - ieeexplore.ieee.org
English-based datasets are commonly available from Kaggle, GitHub, or recently published
papers. Although benchmark tests with English datasets are sufficient to show off the …

Judgment aggregation, discursive dilemma and reflective equilibrium: Neural language models as self-improving doxastic agents

G Betz, K Richardson - Frontiers in Artificial Intelligence, 2022 - frontiersin.org
Neural language models (NLMs) are susceptible to producing inconsistent output. This
paper proposes a new diagnosis as well as a novel remedy for NLMs' incoherence. We train …

Querying Labeled Time Series Data with Scenario Programs

D Shanker - arXiv preprint arXiv:2406.17627, 2024 - arxiv.org
In order to ensure autonomous vehicles are safe for on-road deployment, simulation-based
testing has become an integral complement to on-road testing. The rise in simulation testing …

Automated Detection of Core Comments in Online UAV Discussion Forums

D Han, N Moniz, J Cleland-Huang, N Chawla - 2024 - researchsquare.com
Unmanned aerial vehicles (UAVs) have become popular, and accordingly, diagnosing
failures is essential but not easy due to their difficulty and complexity. Online UAV forums can …

[PDF][PDF] Improving Consistency in Large Language Models through Chain of Guidance

H Raj, V Gupta, D Rosati, S Majumdar - shubhobm.github.io
Consistency is a fundamental dimension of trustworthiness in Large Language Models
(LLMs). For humans to be able to trust LLM-based applications, their outputs should be …