User simulation for evaluating information access systems

K Balog, CX Zhai - Proceedings of the Annual International ACM SIGIR …, 2023 - dl.acm.org
With the emergence of various information access systems exhibiting increasing complexity,
there is a critical need for sound and scalable means of automatic evaluation. To address …

Information retrieval meets large language models: a strategic report from Chinese IR community

Q Ai, T Bai, Z Cao, Y Chang, J Chen, Z Chen, Z Cheng… - AI Open, 2023 - Elsevier
The research field of Information Retrieval (IR) has evolved significantly, expanding beyond
traditional search to meet diverse user information needs. Recently, Large Language …

User simulation with large language models for evaluating task-oriented dialogue

S Davidson, S Romeo, R Shu, J Gung, A Gupta… - arXiv preprint arXiv …, 2023 - arxiv.org
One of the major impediments to the development of new task-oriented dialogue (TOD)
systems is the need for human evaluation at multiple stages and iterations of the …

User Behavior Simulation with Large Language Model-based Agents for Recommender Systems

L Wang, J Zhang, H Yang, ZY Chen, J Tang… - ACM Transactions on …, 2024 - dl.acm.org
Simulating high quality user behavior data has always been a fundamental yet challenging
problem in human-centered applications such as recommendation systems, social networks …

A Theory Guided Scaffolding Instruction Framework for LLM-Enabled Metaphor Reasoning

Y Tian, N Xu, W Mao - Proceedings of the 2024 Conference of the …, 2024 - aclanthology.org
Metaphor detection is a challenging task in figurative language processing, which aims to
distinguish between metaphorical and literal expressions in text. Existing methods tackle …

Leveraging user simulation to develop and evaluate conversational information access agents

N Bernard - Proceedings of the 17th ACM International Conference …, 2024 - dl.acm.org
We observe a change in the way users access information, that is, the rise of conversational
information access (CIA) agents. However, the automatic evaluation of these agents remains …

DiQAD: A Benchmark Dataset for Open-domain Dialogue Quality Assessment

Y Zhao, L Yan, W Sun, C Meng, S Wang… - Findings of the …, 2023 - aclanthology.org
Dialogue assessment plays a critical role in the development of open-domain dialogue
systems. Existing work are uncapable of providing an end-to-end and human-epistemic …

What Else Would I Like? A User Simulator using Alternatives for Improved Evaluation of Fashion Conversational Recommendation Systems

M Vlachou, C Macdonald - arXiv preprint arXiv:2401.05783, 2024 - arxiv.org
In Conversational Recommendation Systems (CRS), a user can provide feedback on
recommended items at each interaction turn, leading the CRS towards more desirable …

DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment

Y Zhao, L Yan, W Sun, C Meng, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Dialogue assessment plays a critical role in the development of open-domain dialogue
systems. Existing work are uncapable of providing an end-to-end and human-epistemic …

User simulation in task-oriented dialog systems based on large language models via in-context learning

R Horst - 2024 - fbmn.h-da.de
The growing importance of human-computer interaction and natural language processing is
highlighted by significant advances such as the introduction of ChatGPT, a Large Language …