LLM-based NLG evaluation: Current status and challenges

M Gao, X Hu, J Ruan, X Pu, X Wan - arXiv preprint arXiv:2402.01383, 2024 - arxiv.org
Evaluating natural language generation (NLG) is a vital but challenging problem in artificial
intelligence. Traditional evaluation metrics mainly capturing content (e.g., n-gram) overlap …

A survey of dynamic graph neural networks

Y Zheng, L Yi, Z Wei - Frontiers of Computer Science, 2025 - Springer
Graph neural networks (GNNs) have emerged as a powerful tool for effectively mining and
learning from graph-structured data, with applications spanning numerous domains …

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org
Multilingual Large Language Models are capable of using powerful Large Language
Models to handle and respond to queries in multiple languages, which achieves remarkable …

Knowledge distillation of LLM for education

E Latif, L Fang, P Ma, X Zhai - arXiv preprint arXiv:2312.15842, 2023 - arxiv.org
This study proposes a method for distilling the knowledge of fine-tuned Large Language
Models (LLMs) into a smaller, more efficient, and accurate neural network, specifically …

Are LLM-based Evaluators Confusing NLG Quality Criteria?

X Hu, M Gao, S Hu, Y Zhang, Y Chen, T Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Some prior work has shown that LLMs perform well in NLG evaluation for different tasks.
However, we discover that LLMs seem to confuse different evaluation criteria, which reduces …

UniDE: A multi-level and low-resource framework for automatic dialogue evaluation via LLM-based data augmentation and multitask learning

G Ye, H Zhao, Z Zhang, Z Jiang - Information Processing & Management, 2025 - Elsevier
Automatic Dialogue Evaluation (ADE) plays a vital role in developing dialogue and
interactive systems. However, when selecting quality dimensions, previous methods often …

Building a Llama2-finetuned LLM for Odia language utilizing domain knowledge instruction set

GS Kohli, S Parida, S Sekhar, S Saha, NB Nair… - Proceedings of the …, 2023 - dl.acm.org
Building LLMs for languages other than English is in great demand due to the unavailability
and performance of multilingual LLMs, such as understanding the local context. The …

ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues

J Mendonça, I Trancoso, A Lavie - arXiv preprint arXiv:2407.11660, 2024 - arxiv.org
Despite being heralded as the new standard for dialogue evaluation, the closed-source
nature of GPT-4 poses challenges for the community. Motivated by the need for lightweight …

Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues

KC Chu, YP Chen, H Nakayama - arXiv preprint arXiv:2407.09897, 2024 - arxiv.org
This paper investigates the quality of multi-agent dialogues in simulations powered by Large
Language Models (LLMs). Analyzing dialogues and memory over multiple sessions …

ConvoCache: Smart Re-Use of Chatbot Responses

C Atkins, I Wood, MA Kaafar, H Asghar, N Basta… - arXiv preprint arXiv …, 2024 - arxiv.org
We present ConvoCache, a conversational caching system that solves the problem of slow
and expensive generative AI models in spoken chatbots. ConvoCache finds a semantically …