Vector search with OpenAI embeddings: Lucene is all you need
We provide a reproducible, end-to-end demonstration of vector search with OpenAI
embeddings using Lucene on the popular MS MARCO passage ranking test collection. The …
Salient phrase aware dense retrieval: can a dense retriever imitate a sparse one?
Despite their recent popularity and well-known advantages, dense retrievers still lag behind
sparse methods such as BM25 in their ability to reliably match salient phrases and rare …
Tevatron: An efficient and flexible toolkit for neural retrieval
Recent rapid advances in deep pre-trained language models and the introduction of large
datasets have powered research in embedding-based neural retrieval. While many …
Simple yet effective neural ranking and reranking baselines for cross-lingual information retrieval
The advent of multilingual language models has generated a resurgence of interest in cross-
lingual information retrieval (CLIR), which is the task of searching documents in one …
Resources for brewing BEIR: Reproducible reference models and statistical analyses
BEIR is a benchmark dataset originally designed for zero-shot evaluation of retrieval models
across 18 different domain/task combinations. In recent years, we have witnessed the …
Resources for brewing BEIR: reproducible reference models and an official leaderboard
BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across
18 different domain/task combinations. In recent years, we have witnessed the growing …
Pre-processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering
One of the contributions of the landmark Dense Passage Retriever (DPR) work is the
curation of a corpus of passages generated from Wikipedia articles that have been …
Enhancing Biomedical Question Answering with Large Language Models
H Yang, S Li, T Gonçalves - Information, 2024 - mdpi.com
In the field of Information Retrieval, biomedical question answering is a specialized task that
focuses on answering questions related to medical and healthcare domains. The goal is to …
Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes
Anserini is a Lucene-based toolkit for reproducible information retrieval research in Java that
has been gaining traction in the community. It provides retrieval capabilities for both …
Multi-stage Literature Retrieval System Trained by PubMed Search Logs for Biomedical Question Answering
This paper discusses our submission to the 2023 BioASQ challenge, document retrieval
subtask (subtask B, phase A). In the subtask, systems must return top 10 most relevant …