Self-supervised representation learning: Introduction, advances, and challenges
Self-supervised representation learning (SSRL) methods aim to provide powerful, deep
feature learning without the requirement of large annotated data sets, thus alleviating the …
GLM: General language model pretraining with autoregressive blank infilling
There have been various types of pretraining architectures including autoencoding models
(e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5) …
GPT understands, too
Prompting a pretrained language model with natural language patterns has proven
effective for natural language understanding (NLU). However, our preliminary study reveals …
An introduction to deep learning in natural language processing: Models, techniques, and tools
Abstract Natural Language Processing (NLP) is a branch of artificial intelligence that
involves the design and implementation of systems and algorithms able to interact through …
ProteinBERT: a universal deep-learning model of protein sequence and function
Self-supervised deep language modeling has shown unprecedented success across natural
language tasks, and has recently been repurposed to biological sequences. However …
A robustly optimized BERT pre-training approach with post-training
Z Liu, W Lin, Y Shi, J Zhao - China National Conference on Chinese …, 2021 - Springer
In the paper, we present a 'pre-training'+'post-training'+'fine-tuning' three-stage paradigm, which is a supplementary framework
for the standard 'pre-training'+'fine-tuning' language model approach. Furthermore, based on the three-stage paradigm …
Retentive network: A successor to transformer for large language models
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large
language models, simultaneously achieving training parallelism, low-cost inference, and …
Calibrate before use: Improving few-shot performance of language models
GPT-3 can perform numerous tasks when provided a natural language prompt that contains
a few training examples. We show that this type of few-shot learning can be unstable: the …
Benchmarking large language models in retrieval-augmented generation
Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the
hallucination of large language models (LLMs). However, existing research lacks rigorous …
Measuring mathematical problem solving with the MATH dataset
Many intellectual endeavors require mathematical problem solving, but this skill remains
beyond the capabilities of computers. To measure this ability in machine learning models …