Authors
Anne Schuth, Katja Hofmann, Filip Radlinski
Publication date
2015/8
Conference
SIGIR 2015
Publisher
ACM
Description
The gold standard for online retrieval evaluation is AB testing. Rooted in the idea of a controlled experiment, AB tests compare the performance of an experimental system (treatment) on one sample of the user population, to that of a baseline system (control) on another sample. Given an online evaluation metric that accurately reflects user satisfaction, these tests enjoy high validity. However, due to the high variance across users, these comparisons often have low sensitivity, requiring millions of queries to detect statistically significant differences between systems. Interleaving is an alternative online evaluation approach, where each user is presented with a combination of results from both the control and treatment systems. Compared to AB tests, interleaving has been shown to be substantially more sensitive. However, interleaving methods have so far focused on user clicks only, and lack support for more …
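For context, an interleaved comparison merges the control and treatment rankings into a single result list shown to each user, and credits clicks to whichever system contributed the clicked document. Below is a minimal sketch of team-draft interleaving, one standard interleaving method from the prior literature; it is not necessarily the exact variant used in this paper, and the function and variable names are hypothetical.

    import random

    def team_draft_interleave(ranking_a, ranking_b, length=10):
        # ranking_a / ranking_b: document ids from the control and treatment rankers.
        # Returns the interleaved list plus, per position, which "team" contributed
        # the document (later used to credit clicks to one system or the other).
        a, b = list(ranking_a), list(ranking_b)   # work on copies
        interleaved, teams = [], []
        count_a = count_b = 0
        while len(interleaved) < length and (a or b):
            # The team with fewer contributions picks next; ties broken by coin flip.
            a_turn = count_a < count_b or (count_a == count_b and random.random() < 0.5)
            # If the chosen team has run out of documents, the other team picks.
            if a_turn and not a:
                a_turn = False
            elif not a_turn and not b:
                a_turn = True
            source = a if a_turn else b
            # Each team contributes its highest-ranked document not already shown.
            while source and source[0] in interleaved:
                source.pop(0)
            if not source:
                continue
            interleaved.append(source.pop(0))
            teams.append('A' if a_turn else 'B')
            if a_turn:
                count_a += 1
            else:
                count_b += 1
        return interleaved, teams

In this sketch, clicks on documents credited to team A versus team B would be compared per query to decide which ranker wins the interleaved comparison, which is what gives interleaving its higher sensitivity than AB tests.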
Total citations
Per-year citation chart, 2015–2024 (counts not recoverable from the extracted text)
Scholar articles
A Schuth, K Hofmann, F Radlinski - Proceedings of the 38th International ACM SIGIR …, 2015