Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation,
introducing Large Language Models (LLMs) has opened new avenues for assessing …
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …
Leveraging Large Language Models for NLG Evaluation: A Survey
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation,
introducing Large Language Models (LLMs) has opened new avenues for assessing …
Holistic evaluation for interleaved text-and-image generation
Interleaved text-and-image generation has been an intriguing research direction, where the
models are required to generate both images and text pieces in an arbitrary order. Despite …
X-ace: Explainable and multi-factor audio captioning evaluation
Automated audio captioning (AAC) aims to generate descriptions based on audio input,
attracting exploration of emerging audio language models (ALMs). However, current …
Are LLM-based Evaluators Confusing NLG Quality Criteria?
Some prior work has shown that LLMs perform well in NLG evaluation for different tasks.
However, we discover that LLMs seem to confuse different evaluation criteria, which reduces …
CheckEval: Robust Evaluation Framework using Large Language Model via Checklist
We introduce CheckEval, a novel evaluation framework using Large Language Models,
addressing the challenges of ambiguity and inconsistency in current evaluation methods …
FormalAlign: Automated Alignment Evaluation for Autoformalization
Autoformalization aims to convert informal mathematical proofs into machine-verifiable
formats, bridging the gap between natural and formal languages. However, ensuring …
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
The rapid progress in Large Language Models (LLMs) poses potential risks such as
generating unethical content. Assessing LLMs' values can help expose their misalignment …
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text
Large Language Model (LLM) integrations into applications like the Microsoft 365 suite and
Google Workspace for creating/processing documents, emails, presentations, etc. have led to …