Evaluating superhuman models with consistency checks
If machine learning models were to achieve superhuman abilities at various reasoning or
decision-making tasks, how would we go about evaluating such models, given that humans …
Factual consistency of multilingual pretrained language models
Pretrained language models can be queried for factual knowledge, with potential
applications in knowledge base acquisition and tasks that require inference. However, for …
Measuring reliability of large language models through semantic consistency
While large pretrained language models (PLMs) demonstrate incredible fluency and
performance on many natural language tasks, recent work has shown that well-performing …
Predicting question-answering performance of large language models through semantic consistency
Semantic consistency of a language model is broadly defined as the model's ability to
produce semantically-equivalent outputs, given semantically-equivalent inputs. We address …
Unpacking large language models with conceptual consistency
If a Large Language Model (LLM) answers "yes" to the question "Are mountains tall?" then
does it know what a mountain is? Can you rely on it responding correctly or incorrectly to …
A survey on awesome Korean NLP datasets
B Ban - 2022 13th International Conference on Information and …, 2022 - ieeexplore.ieee.org
English based datasets are commonly available from Kaggle, GitHub, or recently published
papers. Although benchmark tests with English datasets are sufficient to show off the …
Judgment aggregation, discursive dilemma and reflective equilibrium: Neural language models as self-improving doxastic agents
G Betz, K Richardson - Frontiers in Artificial Intelligence, 2022 - frontiersin.org
Neural language models (NLMs) are susceptible to producing inconsistent output. This
paper proposes a new diagnosis as well as a novel remedy for NLMs' incoherence. We train …
Querying Labeled Time Series Data with Scenario Programs
D Shanker - arXiv preprint arXiv:2406.17627, 2024 - arxiv.org
In order to ensure autonomous vehicles are safe for on-road deployment, simulation-based
testing has become an integral complement to on-road testing. The rise in simulation testing …
Automated Detection of Core Comments in Online UAV Discussion Forums
Unmanned aerial vehicles (UAVs) have become popular, and accordingly, diagnosing
failure is essential but not easy due to its difficulty and complexity. Online UAV forums can …
Improving Consistency in Large Language Models through Chain of Guidance
Consistency is a fundamental dimension of trustworthiness in Large Language Models
(LLMs). For humans to be able to trust LLM-based applications, their outputs should be …