Another look at DPR: reproduction of training and replication of retrieval

J Xian, T Teofili, R Pradeep, J Lin - … Conference on Web Search and Data …, 2024 - dl.acm.org

We provide a reproducible, end-to-end demonstration of vector search with OpenAI
embeddings using Lucene on the popular MS MARCO passage ranking test collection. The …

Cited by 29 - Related articles - All 4 versions

[PDF] arxiv.org

Salient phrase aware dense retrieval: can a dense retriever imitate a sparse one?

X Chen, K Lakhotia, B Oğuz, A Gupta, P Lewis… - arXiv preprint arXiv …, 2021 - arxiv.org

Despite their recent popularity and well-known advantages, dense retrievers still lag behind
sparse methods such as BM25 in their ability to reliably match salient phrases and rare …

Cited by 60 - Related articles - All 3 versions

[PDF] cmu.edu

Tevatron: An efficient and flexible toolkit for neural retrieval

L Gao, X Ma, J Lin, J Callan - Proceedings of the 46th International ACM …, 2023 - dl.acm.org

Recent rapid advances in deep pre-trained language models and the introduction of large
datasets have powered research in embedding-based neural retrieval. While many …

Cited by 15 - Related articles - All 5 versions

[PDF] arxiv.org

Simple yet effective neural ranking and reranking baselines for cross-lingual information retrieval

J Lin, D Alfonso-Hermelo, V Jeronymo… - arXiv preprint arXiv …, 2023 - arxiv.org

The advent of multilingual language models has generated a resurgence of interest in cross-
lingual information retrieval (CLIR), which is the task of searching documents in one …

Cited by 10 - Related articles - All 3 versions

[PDF] github.io

Resources for brewing BEIR: Reproducible reference models and statistical analyses

E Kamalloo, N Thakur, C Lassance, X Ma… - Proceedings of the 47th …, 2024 - dl.acm.org

BEIR is a benchmark dataset originally designed for zero-shot evaluation of retrieval models
across 18 different domain/task combinations. In recent years, we have witnessed the …

Cited by 3 - Related articles - All 2 versions

[PDF] arxiv.org

Resources for brewing BEIR: reproducible reference models and an official leaderboard

E Kamalloo, N Thakur, C Lassance, X Ma… - arXiv preprint arXiv …, 2023 - arxiv.org

BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across
18 different domain/task combinations. In recent years, we have witnessed the growing …

Cited by 10 - Related articles - All 2 versions

Pre-processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering

MS Tamber, R Pradeep, J Lin - European Conference on Information …, 2023 - Springer

One of the contributions of the landmark Dense Passage Retriever (DPR) work is the
curation of a corpus of passages generated from Wikipedia articles that have been …

Cited by 6 - Related articles - All 2 versions

[HTML] mdpi.com

Enhancing Biomedical Question Answering with Large Language Models

H Yang, S Li, T Gonçalves - Information, 2024 - mdpi.com

In the field of Information Retrieval, biomedical question answering is a specialized task that
focuses on answering questions related to medical and healthcare domains. The goal is to …

Cited by 1 - Related articles - All 3 versions

[PDF] arxiv.org

Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes

X Ma, T Teofili, J Lin - Proceedings of the 32nd ACM International …, 2023 - dl.acm.org

Anserini is a Lucene-based toolkit for reproducible information retrieval research in Java that
has been gaining traction in the community. It provides retrieval capabilities for both …

Cited by 3 - Related articles - All 7 versions

[PDF] ceur-ws.org

Multi-stage Literature Retrieval System Trained by PubMed Search Logs for Biomedical Question Answering

A Shin, Q Jin, Z Lu - CLEF (Working Notes), 2023 - ceur-ws.org

This paper discusses our submission to the 2023 BioASQ challenge, document retrieval
subtask (subtask B, phase A). In the subtask, systems must return top 10 most relevant …

Cited by 5 - Related articles