Large language models on graphs: A comprehensive survey
Large language models (LLMs), such as GPT-4 and LLaMA, are creating significant
advancements in natural language processing, due to their strong text encoding/decoding …
Text mining approaches for dealing with the rapidly expanding literature on COVID-19
More than 50,000 papers have been published about COVID-19 since the beginning of
2020 and several hundred new papers continue to be published every day. This incredible …
BioGPT: generative pre-trained transformer for biomedical text generation and mining
Pre-trained language models have attracted increasing attention in the biomedical domain,
inspired by their great success in the general natural language domain. Among the two main …
Text embeddings by weakly-supervised contrastive pre-training
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …
MTEB: Massive text embedding benchmark
Text embeddings are commonly evaluated on a small set of datasets from a single task not
covering their possible applications to other tasks. It is unclear whether state-of-the-art …
LinkBERT: Pretraining language models with document links
Language model (LM) pretraining can learn various knowledge from text corpora, helping
downstream tasks. However, existing methods such as BERT model a single document, and …
One embedder, any task: Instruction-finetuned text embeddings
We introduce INSTRUCTOR, a new method for computing text embeddings given task
instructions: every text input is embedded together with instructions explaining the use case …
ColBERTv2: Effective and efficient retrieval via lightweight late interaction
Neural information retrieval (IR) has greatly advanced search and other knowledge-
intensive language tasks. While many neural IR methods encode queries and documents …
BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models
Existing neural information retrieval (IR) models have often been studied in homogeneous
and narrow settings, which has considerably limited insights into their out-of-distribution …
Dense text retrieval based on pretrained language models: A survey
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources for users' queries in natural language. From …