Text embeddings by weakly-supervised contrastive pre-training

L Wang, N Yang, X Huang, B Jiao, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …
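
The snippet above mentions contrastive training with weak supervision. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive loss with in-batch negatives, a common setup for training text embedding models; the function name, temperature value, and batching scheme are assumptions for illustration, not details taken from the E5 paper.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              passage_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """Generic InfoNCE loss with in-batch negatives (illustrative, not E5's exact recipe).

    query_emb, passage_emb: (batch, dim) tensors where row i of each is a paired example.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    # (batch, batch) cosine-similarity matrix; diagonal entries are the true pairs.
    scores = q @ p.T / temperature
    labels = torch.arange(scores.size(0), device=scores.device)
    # Each query is classified against all passages in the batch.
    return F.cross_entropy(scores, labels)
```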

Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources in response to users' queries in natural language. From …

RetroMAE: Pre-training retrieval-oriented language models via masked auto-encoder

S Xiao, Z Liu, Y Shao, Z Cao - arXiv preprint arXiv:2205.12035, 2022 - arxiv.org
Despite the progress of pre-training on many important NLP tasks, effective pre-training
strategies for dense retrieval remain under-explored. In this paper, we propose RetroMAE, a new …

An efficiency study for SPLADE models

C Lassance, S Clinchant - Proceedings of the 45th International ACM …, 2022 - dl.acm.org
Latency and efficiency issues are often overlooked when evaluating IR models based on
Pretrained Language Models (PLMs) because of the multiple hardware and software testing …

COCO-DR: Combating distribution shifts in zero-shot dense retrieval with contrastive and distributionally robust learning

Y Yu, C Xiong, S Sun, C Zhang, A Overwijk - arXiv preprint arXiv …, 2022 - arxiv.org
We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to improve the
generalization ability of dense retrieval by combating the distribution shifts between source …

SimANS: Simple ambiguous negatives sampling for dense text retrieval

K Zhou, Y Gong, X Liu, WX Zhao, Y Shen… - arXiv preprint arXiv …, 2022 - arxiv.org
Sampling proper negatives from a large document pool is vital to effectively train a dense
retrieval model. However, existing negative sampling strategies suffer from the uninformative …
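
For context on what such negative sampling can look like in practice, here is a hedged sketch of score-aware sampling: candidates whose retrieval scores lie close to the positive's score are preferred over trivially easy negatives and over suspiciously high-scoring ones that may be false negatives. The Gaussian-shaped weighting and the parameter names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sample_score_aware_negatives(neg_scores, pos_score, k=8, a=1.0, b=0.0, seed=None):
    """Sample k negatives whose scores are close to the positive's score.

    Illustrative only: weights follow exp(-a * (s_neg - s_pos - b)^2), so
    very easy negatives (low score) and likely false negatives (score far
    above the positive) are both down-weighted.
    """
    rng = np.random.default_rng(seed)
    neg_scores = np.asarray(neg_scores, dtype=float)
    weights = np.exp(-a * (neg_scores - pos_score - b) ** 2)
    probs = weights / weights.sum()
    # Draw k distinct indices into the candidate pool.
    return rng.choice(len(neg_scores), size=k, replace=False, p=probs)
```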

A thorough examination on zero-shot dense retrieval

R Ren, Y Qu, J Liu, WX Zhao, Q Wu, Y Ding… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent years have witnessed significant advances in dense retrieval (DR) based on
powerful pre-trained language models (PLMs). DR models have achieved excellent …

LED: Lexicon-enlightened dense retriever for large-scale retrieval

K Zhang, C Tao, T Shen, C Xu, X Geng, B Jiao… - Proceedings of the …, 2023 - dl.acm.org
Retrieval models based on dense representations in semantic space have become an
indispensable branch for first-stage retrieval. These retrievers benefit from surging advances …

MASTER: Multi-task pre-trained bottlenecked masked autoencoders are better dense retrievers

K Zhou, X Liu, Y Gong, WX Zhao, D Jiang… - … Conference on Machine …, 2023 - Springer
Pre-trained Transformers (e.g., BERT) have been commonly used in existing dense retrieval
methods for parameter initialization, and recent studies are exploring more effective pre …

Mixed-modality representation learning and pre-training for joint table-and-text retrieval in OpenQA

J Huang, W Zhong, Q Liu, M Gong, D Jiang… - arXiv preprint arXiv …, 2022 - arxiv.org
Retrieving evidence from tabular and textual resources is essential for open-domain
question answering (OpenQA), which provides more comprehensive information. However …