Adapting standard retrieval benchmarks to evaluate generated answers
Large language models can now directly generate answers to many factual questions
without referencing external sources. Unfortunately, relatively little attention has been paid to …
Assessing and verifying task utility in LLM-powered applications
N Arabzadeh, S Huo, N Mehta, Q Wu, C Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of Large Language Models (LLMs) has led to a surge in applications
that facilitate collaboration among multiple agents, assisting humans in their daily tasks …
Fréchet Distance for Offline Evaluation of Information Retrieval Systems with Sparse Labels
N Arabzadeh, CLA Clarke - arXiv preprint arXiv:2401.17543, 2024 - arxiv.org
The rapid advancement of natural language processing, information retrieval (IR), computer
vision, and other technologies has presented significant challenges in evaluating the …
Context-Aware Query Term Difficulty Estimation for Performance Prediction
Research has already found that many retrieval methods are sensitive to the choice and
order of terms that appear in a query, which can significantly impact retrieval effectiveness …
Evaluating Relative Retrieval Effectiveness with Normalized Residual Gain
Traditional search evaluation metrics, such as MRR and NDCG, focus on absolute
measures of effectiveness. While they allow us to compare the absolute performance of one …
A Comparison of Methods for Evaluating Generative IR
N Arabzadeh, CLA Clarke - arXiv preprint arXiv:2404.04044, 2024 - arxiv.org
Information retrieval systems increasingly incorporate generative components. For example,
in a retrieval augmented generation (RAG) system, a retrieval component might provide a …
Learning to Jointly Transform and Rank Difficult Queries
Recent empirical studies have shown that while neural rankers exhibit increasingly higher
retrieval effectiveness on tasks such as ad hoc retrieval, these improved performances are …
Context-Aware Query Term Difficulty Estimation for Performance Prediction
E Bagheri - ls3.rnet.torontomu.ca
Research has already found that many retrieval methods are sensitive to the choice and
order of terms that appear in a query, which can significantly impact retrieval effectiveness …