Large language models on graphs: A comprehensive survey
Large language models (LLMs), such as GPT-4 and LLaMA, are creating significant
advancements in natural language processing, due to their strong text encoding/decoding …
Text mining approaches for dealing with the rapidly expanding literature on COVID-19
More than 50,000 papers have been published about COVID-19 since the beginning of
2020 and several hundred new papers continue to be published every day. This incredible …
BioGPT: generative pre-trained transformer for biomedical text generation and mining
Pre-trained language models have attracted increasing attention in the biomedical domain,
inspired by their great success in the general natural language domain. Among the two main …
Text embeddings by weakly-supervised contrastive pre-training
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …
MTEB: Massive text embedding benchmark
Text embeddings are commonly evaluated on a small set of datasets from a single task not
covering their possible applications to other tasks. It is unclear whether state-of-the-art …
LinkBERT: Pretraining language models with document links
Language model (LM) pretraining can learn various knowledge from text corpora, helping
downstream tasks. However, existing methods such as BERT model a single document, and …
One embedder, any task: Instruction-finetuned text embeddings
We introduce INSTRUCTOR, a new method for computing text embeddings given task
instructions: every text input is embedded together with instructions explaining the use case …
ColBERTv2: Effective and efficient retrieval via lightweight late interaction
Neural information retrieval (IR) has greatly advanced search and other knowledge-
intensive language tasks. While many neural IR methods encode queries and documents …
BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models
Existing neural information retrieval (IR) models have often been studied in homogeneous
and narrow settings, which has considerably limited insights into their out-of-distribution …
Dense text retrieval based on pretrained language models: A survey
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources for users' queries in natural language. From …