Adapting standard retrieval benchmarks to evaluate generated answers

N Arabzadeh, A Bigdeli, CLA Clarke - European Conference on …, 2024 - Springer
Large language models can now directly generate answers to many factual questions
without referencing external sources. Unfortunately, relatively little attention has been paid to …

Assessing and verifying task utility in llm-powered applications

N Arabzadeh, S Huo, N Mehta, Q Wu, C Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of Large Language Models (LLMs) has led to a surge in applications
that facilitate collaboration among multiple agents, assisting humans in their daily tasks …

Fréchet Distance for Offline Evaluation of Information Retrieval Systems with Sparse Labels

N Arabzadeh, CLA Clarke - arXiv preprint arXiv:2401.17543, 2024 - arxiv.org
The rapid advancement of natural language processing, information retrieval (IR), computer
vision, and other technologies has presented significant challenges in evaluating the …

Context-Aware Query Term Difficulty Estimation for Performance Prediction

A Saleminezhad, N Arabzadeh, S Beheshti… - … on Information Retrieval, 2024 - Springer
Research has already found that many retrieval methods are sensitive to the choice and
order of terms that appear in a query, which can significantly impact retrieval effectiveness …

Evaluating Relative Retrieval Effectiveness with Normalized Residual Gain

A Bigdeli, N Arabzadeh, E Bagheri… - Proceedings of the 2024 …, 2024 - dl.acm.org
Traditional search evaluation metrics, such as MRR and NDCG, focus on absolute
measures of effectiveness. While they allow us to compare the absolute performance of one …

A Comparison of Methods for Evaluating Generative IR

N Arabzadeh, CLA Clarke - arXiv preprint arXiv:2404.04044, 2024 - arxiv.org
Information retrieval systems increasingly incorporate generative components. For example,
in a retrieval augmented generation (RAG) system, a retrieval component might provide a …

Learning to Jointly Transform and Rank Difficult Queries

A Bigdeli, N Arabzadeh, E Bagheri - European Conference on Information …, 2024 - Springer
Recent empirical studies have shown that while neural rankers exhibit increasingly higher
retrieval effectiveness on tasks such as ad hoc retrieval, these improved performances are …

Context-Aware Query Term Difficulty Estimation for Performance Prediction

E Bagheri - ls3.rnet.torontomu.ca
Research has already found that many retrieval methods are sensitive to the choice and
order of terms that appear in a query, which can significantly impact retrieval effectiveness …