Simulating simple user behavior for system effectiveness evaluation

K Balog, CX Zhai - Proceedings of the Annual International ACM SIGIR …, 2023 - dl.acm.org

With the emergence of various information access systems exhibiting increasing complexity,
there is a critical need for sound and scalable means of automatic evaluation. To address …

Cited by 28 · All 6 versions

[PDF] arxiv.org

Evaluating mixed-initiative conversational search systems via user simulation

I Sekulić, M Aliannejadi, F Crestani - … on Web Search and Data Mining, 2022 - dl.acm.org

Clarifying the underlying user information need by asking clarifying questions is an
important feature of modern conversational search systems. However, evaluation of such …

Cited by 66 · All 6 versions

[PDF] acm.org

Exploiting simulated user feedback for conversational search: Ranking, rewriting, and beyond

P Owoicho, I Sekulić, M Aliannejadi, J Dalton… - Proceedings of the 46th …, 2023 - dl.acm.org

This research aims to explore various methods for assessing user feedback in mixed-
initiative conversational search (CS) systems. While CS systems enjoy profuse …

Cited by 30 · All 5 versions

[PDF] arxiv.org

Distributionally-informed recommender system evaluation

MD Ekstrand, B Carterette, F Diaz - ACM Transactions on …, 2024 - dl.acm.org

Current practice for evaluating recommender systems typically focuses on point estimates of
user-oriented effectiveness metrics or business metrics, sometimes combined with …

Cited by 16 · All 7 versions

[PDF] acm.org

Let the LLMs talk: Simulating human-to-human conversational QA via zero-shot LLM-to-LLM interactions

Z Abbasiantaeb, Y Yuan, E Kanoulas… - Proceedings of the 17th …, 2024 - dl.acm.org

CQA systems aim to create interactive search systems that effectively retrieve information by
interacting with users. To replicate human-to-human conversations, existing work uses …

Cited by 48 · All 3 versions

[PDF] psu.edu

Time-based calibration of effectiveness measures

MD Smucker, CLA Clarke - Proceedings of the 35th international ACM …, 2012 - dl.acm.org

Many current effectiveness measures incorporate simplifying assumptions about user
behavior. These assumptions prevent the measures from reflecting aspects of the search …

Cited by 240 · All 5 versions

[PDF] georgetown.edu

Win-win search: Dual-agent stochastic game in session search

J Luo, S Zhang, H Yang - Proceedings of the 37th international ACM …, 2014 - dl.acm.org

Session search is a complex search task that involves multiple search iterations triggered by
query reformulations. We observe a Markov chain in session search: user's judgment of …

Cited by 127 · All 4 versions

[PDF] arxiv.org

Measuring recommender system effects with simulated users

S Yao, Y Halpern, N Thain, X Wang, K Lee… - arXiv preprint arXiv …, 2021 - arxiv.org

Imagine a food recommender system--how would we check if it is causing and
fostering unhealthy eating habits or merely reflecting users' interests? How much of a user's …

Cited by 47 · All 2 versions

A general evaluation measure for document organization tasks

E Amigó, J Gonzalo, F Verdejo - … of the 36th international ACM SIGIR …, 2013 - dl.acm.org

A number of key Information Access tasks--Document Retrieval, Clustering, Filtering, and
their combinations--can be seen as instances of a generic document organization …

Cited by 117 · All 2 versions

[PDF] northeastern.edu

Users versus models: What observation tells us about effectiveness metrics

A Moffat, P Thomas, F Scholer - Proceedings of the 22nd ACM …, 2013 - dl.acm.org

Retrieval system effectiveness can be measured in two quite different ways: by monitoring
the behavior of users and gathering data about the ease and accuracy with which they …

Cited by 108 · All 10 versions