Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources to users' queries in natural language. From …

RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models

Z Liu, S Xiao, Y Shao, Z Cao - … of the 61st Annual Meeting of the …, 2023 - aclanthology.org
To better support information retrieval tasks such as web search and open-domain question
answering, growing effort is made to develop retrieval-oriented language models, e.g. …

TOME: A two-stage approach for model-based retrieval

R Ren, WX Zhao, J Liu, H Wu, JR Wen… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, model-based retrieval has emerged as a new paradigm in text retrieval that
discards the index in the traditional retrieval model and instead memorizes the candidate …

Making large language models a better foundation for dense retrieval

C Li, Z Liu, S Xiao, Y Shao - arXiv preprint arXiv:2312.15503, 2023 - arxiv.org
Dense retrieval needs to learn discriminative text embeddings to represent the semantic
relationship between query and document. It may benefit from the use of large language …

Master: Multi-task pre-trained bottlenecked masked autoencoders are better dense retrievers

K Zhou, X Liu, Y Gong, WX Zhao, D Jiang… - … Conference on Machine …, 2023 - Springer
Pre-trained Transformers (e.g., BERT) have been commonly used in existing dense retrieval
methods for parameter initialization, and recent studies are exploring more effective pre …

Llama2vec: Unsupervised adaptation of large language models for dense retrieval

C Li, Z Liu, S Xiao, Y Shao, D Lian - … of the 62nd Annual Meeting of …, 2024 - aclanthology.org
Dense retrieval calls for discriminative embeddings to represent the semantic relationship
between query and document. It may benefit from the use of large language models …

Learning Discrete Document Representations in Web Search

R Huang, D Zhang, W Lu, H Li, M Wang, D Shi… - Proceedings of the 29th …, 2023 - dl.acm.org
Product quantization (PQ) has been usually applied to dense retrieval (DR) of documents
thanks to its competitive time, memory efficiency and compatibility with other approximate …

TriSampler: A Better Negative Sampling Principle for Dense Retrieval

Z Yang, Z Shao, Y Dong, J Tang - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Negative sampling stands as a pivotal technique in dense retrieval, essential for training
effective retrieval models and significantly impacting retrieval performance. While existing …

RetroMAE-2: Duplex masked auto-encoder for pre-training retrieval-oriented language models

S Xiao, Z Liu, Y Shao, Z Cao - arXiv preprint arXiv:2305.02564, 2023 - arxiv.org
To better support information retrieval tasks such as web search and open-domain question
answering, growing effort is made to develop retrieval-oriented language models, e.g. …

Improving News Recommendation via Bottlenecked Multi-task Pre-training

X Xiao, Q Li, S Liu, K Zhou - Proceedings of the 46th International ACM …, 2023 - dl.acm.org
Recent years have witnessed the boom of deep neural networks in online news
recommendation service. As news articles mainly consist of textual content, pre-trained …