A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …

GPT models in construction industry: Opportunities, limitations, and a use case validation

A Saka, R Taiwo, N Saka, BA Salami, S Ajayi… - Developments in the …, 2023 - Elsevier
Large Language Models (LLMs) trained on large data sets came into prominence in
2018 after Google introduced BERT. Subsequently, different LLMs such as GPT models from …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Large language models for information retrieval: A survey

Y Zhu, H Yuan, S Wang, J Liu, W Liu, C Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
As a primary means of information acquisition, information retrieval (IR) systems, such as
search engines, have integrated themselves into our daily lives. These systems also serve …

Exploring the potential of large language models (llms) in learning on graphs

Z Chen, H Mao, H Li, W Jin, H Wen, X Wei… - ACM SIGKDD …, 2024 - dl.acm.org
Learning on Graphs has attracted immense attention due to its wide real-world applications.
The most popular pipeline for learning on graphs with textual node attributes primarily relies …

Text embeddings by weakly-supervised contrastive pre-training

L Wang, N Yang, X Huang, B Jiao, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …
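
The weakly supervised contrastive training mentioned in this abstract is typically an in-batch InfoNCE-style objective. The sketch below is illustrative only, assuming paired query/passage embeddings, cosine similarity, and a temperature hyperparameter; the names and values are not taken from the E5 paper.

```python
# Minimal in-batch contrastive (InfoNCE-style) loss, as commonly used for
# weakly supervised text-embedding pre-training. Illustrative sketch only;
# tensor names and the temperature value are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, passage_emb, temperature=0.05):
    """query_emb, passage_emb: (batch, dim); row i of each side forms a weak positive pair."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature                       # scaled cosine similarities
    labels = torch.arange(q.size(0), device=q.device)    # matching passage sits on the diagonal
    return F.cross_entropy(logits, labels)               # other in-batch passages act as negatives

# Toy usage with random tensors standing in for encoder outputs.
loss = in_batch_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```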

MTEB: Massive text embedding benchmark

N Muennighoff, N Tazi, L Magne, N Reimers - arXiv preprint arXiv …, 2022 - arxiv.org
Text embeddings are commonly evaluated on a small set of datasets from a single task not
covering their possible applications to other tasks. It is unclear whether state-of-the-art …
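
Benchmarks of this kind are commonly run through the open-source mteb Python package together with a sentence-transformers model. The snippet below is a minimal sketch of that workflow; the task name, model choice, and output folder are illustrative, and the exact API can differ between mteb versions.

```python
# Minimal sketch of evaluating an embedding model on one MTEB task.
# Model and task choices are examples; check the mteb docs for the current API.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-small-v2")       # any sentence-transformers model works
evaluation = MTEB(tasks=["Banking77Classification"])      # select one or more benchmark tasks
results = evaluation.run(model, output_folder="results")  # writes per-task scores to the folder
```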

Video pretraining (VPT): Learning to act by watching unlabeled online videos

B Baker, I Akkaya, P Zhokov… - Advances in …, 2022 - proceedings.neurips.cc
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for
training models with broad, general capabilities for text, images, and other modalities …

ChemCrow: Augmenting large-language models with chemistry tools

AM Bran, S Cox, O Schilter, C Baldassari… - arXiv preprint arXiv …, 2023 - arxiv.org
Over the last decades, excellent computational chemistry tools have been developed.
Integrating them into a single platform with enhanced accessibility could help reaching their …

Augmenting large language models with chemistry tools

AM Bran, S Cox, O Schilter, C Baldassari… - Nature Machine …, 2024 - nature.com
Large language models (LLMs) have shown strong performance in tasks across domains
but struggle with chemistry-related problems. These models also lack access to external …