One embedder, any task: Instruction-finetuned text embeddings
We introduce INSTRUCTOR, a new method for computing text embeddings given task
instructions: every text input is embedded together with instructions explaining the use case …
ChartQA: A benchmark for question answering about charts with visual and logical reasoning
Charts are very popular for analyzing data. When exploring charts, people often ask a
variety of complex reasoning questions that involve several logical and arithmetic …
Promptagator: Few-shot dense retrieval from 8 examples
Much recent research on information retrieval has focused on how to transfer from one task
(typically with abundant supervised data) to various other tasks where supervision is limited …
Autoregressive search engines: Generating substrings as document identifiers
Knowledge-intensive language tasks require NLP systems to both provide the
correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive …
Dense text retrieval based on pretrained language models: A survey
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources to users' queries in natural language. From …
Improving the domain adaptation of retrieval augmented generation (RAG) models for open domain question answering
Retrieval-Augmented Generation (RAG) is a recent advancement in Open-Domain
Question Answering (ODQA). RAG has only been trained and explored with a Wikipedia …
Uni-perceiver: Pre-training unified architecture for generic perception for zero-shot and few-shot tasks
Biological intelligence systems of animals perceive the world by integrating information in
different modalities and processing simultaneously for various tasks. In contrast, current …
GPL: Generative pseudo labeling for unsupervised domain adaptation of dense retrieval
Dense retrieval approaches can overcome the lexical gap and lead to significantly improved
search results. However, they require large amounts of training data, which is not available …
Training data is more valuable than you think: A simple and effective method by retrieving from training data
Retrieval-based methods have been shown to be effective in NLP tasks via introducing
external knowledge. However, the indexing and retrieving of large-scale corpora bring …
Simple entity-centric questions challenge dense retrievers
Open-domain question answering has exploded in popularity recently due to the success of
dense retrieval models, which have surpassed sparse models using only a few supervised …