Attention is all you need (NIPS 2017)

A Vaswani, N Shazeer, N Parmar, J Uszkoreit… - arXiv preprint arXiv …, 2017 - codetds.com
The dominant sequence transduction models are based on complex recurrent or convolutional
neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture …
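The architecture proposed here is the Transformer. Its central operation, scaled dot-product attention over queries Q, keys K and values V (with key dimension d_k), can be summarized in the paper's own notation:

    \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V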

On the state of the art of evaluation in neural language models

G Melis, C Dyer, P Blunsom - arXiv preprint arXiv:1707.05589, 2017 - arxiv.org
Ongoing innovations in recurrent neural network architectures have provided a steady influx
of apparently state-of-the-art results on language modelling benchmarks. However, these …

Unsupervised opinion summarization as copycat-review generation

A Bražinskas, M Lapata, I Titov - arXiv preprint arXiv:1911.02247, 2019 - arxiv.org
Opinion summarization is the task of automatically creating summaries that reflect subjective
information expressed in multiple documents, such as product reviews. While the majority of …

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press
We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …
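Purely as an illustrative aside (not from the entry itself): the model released with this work can be run through the open-source whisper Python package; the checkpoint name and audio file below are placeholder assumptions.

    import whisper                          # pip install openai-whisper (assumed installed)

    model = whisper.load_model("base")      # "base" is one of the published checkpoint sizes
    result = model.transcribe("audio.mp3")  # hypothetical local audio file
    print(result["text"])                   # the decoded transcript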

SimVLM: Simple visual language model pretraining with weak supervision

Z Wang, J Yu, AW Yu, Z Dai, Y Tsvetkov… - arXiv preprint arXiv …, 2021 - arxiv.org
With recent progress in joint modeling of visual and textual representations, Vision-
Language Pretraining (VLP) has achieved impressive performance on many multimodal …

Unifying vision-and-language tasks via text generation

J Cho, J Lei, H Tan, M Bansal - International Conference on …, 2021 - proceedings.mlr.press
Existing methods for vision-and-language learning typically require designing task-specific
architectures and objectives for each task. For example, a multi-label answer classifier for …

Training graph neural networks with 1000 layers

G Li, M Müller, B Ghanem… - … conference on machine …, 2021 - proceedings.mlr.press
Deep graph neural networks (GNNs) have achieved excellent results on various tasks on
increasingly large graph datasets with millions of nodes and edges. However, memory …

CTRL: A conditional transformer language model for controllable generation

NS Keskar, B McCann, LR Varshney, C Xiong… - arXiv preprint arXiv …, 2019 - arxiv.org
Large-scale language models show promising text generation capabilities, but users cannot
easily control particular aspects of the generated text. We release CTRL, a 1.63 billion …
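For orientation, CTRL conditions generation on a control code prepended to the input text; the sketch below is a minimal, library-free illustration in which every identifier is a placeholder, not the paper's implementation.

    # Control-code conditioning, schematically (all names hypothetical):
    control_code = "Reviews"                     # a domain/style code the model was trained with
    prompt = "The food was"                      # user text to be continued
    model_input = control_code + " " + prompt    # the code is simply prepended to the prompt
    # text = model.generate(model_input)         # stands in for the actual generation call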

VirTex: Learning visual representations from textual annotations

K Desai, J Johnson - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
The de-facto approach to many vision tasks is to start from pretrained visual representations,
typically learned via supervised training on ImageNet. Recent methods have explored …

Generalization through memorization: Nearest neighbor language models

U Khandelwal, O Levy, D Jurafsky… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce kNN-LMs, which extend a pre-trained neural language model (LM) by
linearly interpolating it with a k-nearest neighbors (kNN) model. The nearest …
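The linear interpolation referred to here mixes the two distributions with a tuned weight λ; in the paper's notation:

    p(y \mid x) = \lambda \, p_{\mathrm{kNN}}(y \mid x) + (1 - \lambda) \, p_{\mathrm{LM}}(y \mid x)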