Semantic models for the first-stage retrieval: A comprehensive review

J Guo, Y Cai, Y Fan, F Sun, R Zhang… - ACM Transactions on …, 2022 - dl.acm.org
Multi-stage ranking pipelines have been a practical solution in modern search systems,
where the first-stage retrieval is to return a subset of candidate documents and latter stages …

Information retrieval: recent advances and beyond

KA Hambarde, H Proenca - IEEE Access, 2023 - ieeexplore.ieee.org
This paper provides an extensive and thorough overview of the models and techniques
utilized in the first and second stages of the typical information retrieval processing chain …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

C-Pack: Packaged resources to advance general Chinese embedding

S Xiao, Z Liu, P Zhang, N Muennighoff, D Lian… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce C-Pack, a package of resources that significantly advance the field of general
Chinese embeddings. C-Pack includes three critical resources. 1) C-MTEB is a …

Large language models for information retrieval: A survey

Y Zhu, H Yuan, S Wang, J Liu, W Liu, C Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
As a primary means of information acquisition, information retrieval (IR) systems, such as
search engines, have integrated themselves into our daily lives. These systems also serve …

Learning to retrieve prompts for in-context learning

O Rubin, J Herzig, J Berant - arXiv preprint arXiv:2112.08633, 2021 - arxiv.org
In-context learning is a recent paradigm in natural language understanding, where a large
pre-trained language model (LM) observes a test instance and a few training examples as …

Text and code embeddings by contrastive pre-training

A Neelakantan, T Xu, R Puri, A Radford, JM Han… - arXiv preprint arXiv …, 2022 - arxiv.org
Text embeddings are useful features in many applications such as semantic search and
computing text similarity. Previous work typically trains models customized for different use …

Generate rather than retrieve: Large language models are strong context generators

W Yu, D Iter, S Wang, Y Xu, M Ju, S Sanyal… - arXiv preprint arXiv …, 2022 - arxiv.org
Knowledge-intensive tasks, such as open-domain question answering (QA), require access
to a large amount of world or domain knowledge. A common approach for knowledge …

ColBERTv2: Effective and efficient retrieval via lightweight late interaction

K Santhanam, O Khattab, J Saad-Falcon… - arXiv preprint arXiv …, 2021 - arxiv.org
Neural information retrieval (IR) has greatly advanced search and other knowledge-
intensive language tasks. While many neural IR methods encode queries and documents …

BEIR: A heterogenous benchmark for zero-shot evaluation of information retrieval models

N Thakur, N Reimers, A Rücklé, A Srivastava… - arXiv preprint arXiv …, 2021 - arxiv.org
Existing neural information retrieval (IR) models have often been studied in homogeneous
and narrow settings, which has considerably limited insights into their out-of-distribution …