One embedder, any task: Instruction-finetuned text embeddings

H Su, W Shi, J Kasai, Y Wang, Y Hu… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce INSTRUCTOR, a new method for computing text embeddings given task
instructions: every text input is embedded together with instructions explaining the use case …

ChartQA: A benchmark for question answering about charts with visual and logical reasoning

A Masry, DX Long, JQ Tan, S Joty, E Hoque - arXiv preprint arXiv …, 2022 - arxiv.org
Charts are very popular for analyzing data. When exploring charts, people often ask a
variety of complex reasoning questions that involve several logical and arithmetic …

Promptagator: Few-shot dense retrieval from 8 examples

Z Dai, VY Zhao, J Ma, Y Luan, J Ni, J Lu… - arXiv preprint arXiv …, 2022 - arxiv.org
Much recent research on information retrieval has focused on how to transfer from one task
(typically with abundant supervised data) to various other tasks where supervision is limited …

Autoregressive search engines: Generating substrings as document identifiers

M Bevilacqua, G Ottaviano, P Lewis… - Advances in …, 2022 - proceedings.neurips.cc
Knowledge-intensive language tasks require NLP systems to both provide the
correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive …

Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources in response to users' queries in natural language. From …

Improving the domain adaptation of retrieval augmented generation (RAG) models for open domain question answering

S Siriwardhana, R Weerasekera, E Wen… - Transactions of the …, 2023 - direct.mit.edu
Retrieval Augmented Generation (RAG) is a recent advancement in Open-Domain
Question Answering (ODQA). RAG has only been trained and explored with a Wikipedia …

Uni-Perceiver: Pre-training unified architecture for generic perception for zero-shot and few-shot tasks

X Zhu, J Zhu, H Li, X Wu, H Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Biological intelligence systems of animals perceive the world by integrating information in
different modalities and processing simultaneously for various tasks. In contrast, current …

GPL: Generative pseudo labeling for unsupervised domain adaptation of dense retrieval

K Wang, N Thakur, N Reimers, I Gurevych - arXiv preprint arXiv …, 2021 - arxiv.org
Dense retrieval approaches can overcome the lexical gap and lead to significantly improved
search results. However, they require large amounts of training data which is not available …

Training data is more valuable than you think: A simple and effective method by retrieving from training data

S Wang, Y Xu, Y Fang, Y Liu, S Sun, R Xu… - arXiv preprint arXiv …, 2022 - arxiv.org
Retrieval-based methods have been shown to be effective in NLP tasks by introducing
external knowledge. However, the indexing and retrieving of large-scale corpora bring …

Simple entity-centric questions challenge dense retrievers

C Sciavolino, Z Zhong, J Lee, D Chen - arXiv preprint arXiv:2109.08535, 2021 - arxiv.org
Open-domain question answering has exploded in popularity recently due to the success of
dense retrieval models, which have surpassed sparse models using only a few supervised …