Attention Is All You Need. (NIPS), 2017
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder.
The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture …
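The architecture this entry refers to (the Transformer) is built around scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A minimal NumPy sketch of that single operation, with toy shapes chosen purely for illustration (not the paper's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D query/key/value matrices."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # weighted sum of values

# Toy usage: 3 queries attending over 4 key/value pairs of dimension 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)         # shape (3, 8)
```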
On the state of the art of evaluation in neural language models
Ongoing innovations in recurrent neural network architectures have provided a steady influx
of apparently state-of-the-art results on language modelling benchmarks. However, these …
Unsupervised opinion summarization as copycat-review generation
Opinion summarization is the task of automatically creating summaries that reflect subjective
information expressed in multiple documents, such as product reviews. While the majority of …
Robust speech recognition via large-scale weak supervision
We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …
SimVLM: Simple visual language model pretraining with weak supervision
With recent progress in joint modeling of visual and textual representations, Vision-
Language Pretraining (VLP) has achieved impressive performance on many multimodal …
Unifying vision-and-language tasks via text generation
Existing methods for vision-and-language learning typically require designing task-specific
architectures and objectives for each task. For example, a multi-label answer classifier for …
Training graph neural networks with 1000 layers
Deep graph neural networks (GNNs) have achieved excellent results on various tasks on
increasingly large graph datasets with millions of nodes and edges. However, memory …
CTRL: A conditional transformer language model for controllable generation
Large-scale language models show promising text generation capabilities, but users cannot
easily control particular aspects of the generated text. We release CTRL, a 1.63 billion …
VirTex: Learning visual representations from textual annotations
The de-facto approach to many vision tasks is to start from pretrained visual representations,
typically learned via supervised training on ImageNet. Recent methods have explored …
Generalization through memorization: Nearest neighbor language models
We introduce $k$NN-LMs, which extend a pre-trained neural language model (LM) by
linearly interpolating it with a $k$-nearest neighbors ($k$NN) model. The nearest …
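The interpolation this snippet describes is $p(y|x) = \lambda \, p_{kNN}(y|x) + (1-\lambda) \, p_{LM}(y|x)$. A minimal sketch under assumed details: a flat NumPy datastore, squared-L2 distance, and illustrative defaults k=8 and λ=0.25 (the paper tunes these per dataset):

```python
import numpy as np

def knn_lm_probs(p_lm, context_vec, keys, next_tokens, vocab_size, k=8, lam=0.25):
    """Blend an LM's next-token distribution with a kNN distribution:
    p(y|x) = lam * p_kNN(y|x) + (1 - lam) * p_LM(y|x).

    keys:        (N, d) stored context vectors
    next_tokens: (N,)   token id observed after each stored context
    """
    dists = ((keys - context_vec) ** 2).sum(axis=1)  # squared L2 to every key
    nearest = np.argsort(dists)[:k]                  # indices of k closest keys
    w = np.exp(-(dists[nearest] - dists[nearest].min()))
    w /= w.sum()                                     # softmax over -distance
    p_knn = np.zeros(vocab_size)
    for i, weight in zip(nearest, w):
        p_knn[next_tokens[i]] += weight              # mass on neighbor's next token
    return lam * p_knn + (1 - lam) * p_lm
```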