Text embeddings by weakly-supervised contrastive pre-training

L Wang, N Yang, X Huang, B Jiao, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …
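
The snippet above mentions contrastive training with weak supervision. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive loss with in-batch negatives, a common setup for training text embedding models; the function name, temperature value, and batching scheme are assumptions for illustration, not details taken from the E5 paper.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              passage_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """Generic InfoNCE loss with in-batch negatives (illustrative, not E5's exact recipe).

    query_emb, passage_emb: (batch, dim) tensors where row i of each is a paired example.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    # (batch, batch) cosine-similarity matrix; diagonal entries are the true pairs.
    scores = q @ p.T / temperature
    labels = torch.arange(scores.size(0), device=scores.device)
    # Each query is classified against all passages in the batch.
    return F.cross_entropy(scores, labels)
```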

Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources in response to users' queries in natural language. From …

RetroMAE: Pre-training retrieval-oriented language models via masked auto-encoder

S Xiao, Z Liu, Y Shao, Z Cao - arXiv preprint arXiv:2205.12035, 2022 - arxiv.org
Despite the progress of pre-training on many important NLP tasks, effective pre-training
strategies for dense retrieval remain under-explored. In this paper, we propose RetroMAE, a new …

An efficiency study for SPLADE models

C Lassance, S Clinchant - Proceedings of the 45th International ACM …, 2022 - dl.acm.org
Latency and efficiency issues are often overlooked when evaluating IR models based on
Pretrained Language Models (PLMs) because of the multiple hardware and software testing …

COCO-DR: Combating distribution shifts in zero-shot dense retrieval with contrastive and distributionally robust learning

Y Yu, C Xiong, S Sun, C Zhang, A Overwijk - arXiv preprint arXiv …, 2022 - arxiv.org
We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to improve the
generalization ability of dense retrieval by combating the distribution shifts between source …

SimANS: Simple ambiguous negatives sampling for dense text retrieval

K Zhou, Y Gong, X Liu, WX Zhao, Y Shen… - arXiv preprint arXiv …, 2022 - arxiv.org
Sampling proper negatives from a large document pool is vital to effectively train a dense
retrieval model. However, existing negative sampling strategies suffer from the uninformative …
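
For context on what such negative sampling can look like in practice, here is a hedged sketch of score-aware sampling: candidates whose retrieval scores lie close to the positive's score are preferred over trivially easy negatives and over suspiciously high-scoring ones that may be false negatives. The Gaussian-shaped weighting and the parameter names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sample_score_aware_negatives(neg_scores, pos_score, k=8, a=1.0, b=0.0, seed=None):
    """Sample k negatives whose scores are close to the positive's score.

    Illustrative only: weights follow exp(-a * (s_neg - s_pos - b)^2), so
    very easy negatives (low score) and likely false negatives (score far
    above the positive) are both down-weighted.
    """
    rng = np.random.default_rng(seed)
    neg_scores = np.asarray(neg_scores, dtype=float)
    weights = np.exp(-a * (neg_scores - pos_score - b) ** 2)
    probs = weights / weights.sum()
    # Draw k distinct indices into the candidate pool.
    return rng.choice(len(neg_scores), size=k, replace=False, p=probs)
```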

A thorough examination on zero-shot dense retrieval

R Ren, Y Qu, J Liu, WX Zhao, Q Wu, Y Ding… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent years have witnessed significant advances in dense retrieval (DR) based on
powerful pre-trained language models (PLMs). DR models have achieved excellent …

LED: Lexicon-enlightened dense retriever for large-scale retrieval

K Zhang, C Tao, T Shen, C Xu, X Geng, B Jiao… - Proceedings of the …, 2023 - dl.acm.org
Retrieval models based on dense representations in semantic space have become an
indispensable branch for first-stage retrieval. These retrievers benefit from surging advances …

MASTER: Multi-task pre-trained bottlenecked masked autoencoders are better dense retrievers

K Zhou, X Liu, Y Gong, WX Zhao, D Jiang… - … Conference on Machine …, 2023 - Springer
Pre-trained Transformers (e.g., BERT) have been commonly used in existing dense retrieval
methods for parameter initialization, and recent studies are exploring more effective pre …

Mixed-modality representation learning and pre-training for joint table-and-text retrieval in OpenQA

J Huang, W Zhong, Q Liu, M Gong, D Jiang… - arXiv preprint arXiv …, 2022 - arxiv.org
Retrieving evidence from tabular and textual resources is essential for open-domain
question answering (OpenQA), which provides more comprehensive information. However …