Attention Is All You Need. (NIPS), 2017

A Vaswani, N Shazeer, N Parmar, J Uszkoreit… - arXiv preprint arXiv …, 2017 - codetds.com
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best-performing models also connect the encoder and decoder through an attention mechanism. We propose a new, simple network architecture …

The best of both worlds: Combining recent advances in neural machine translation

MX Chen, O Firat, A Bapna, M Johnson… - arXiv preprint arXiv …, 2018 - arxiv.org
The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling
for Machine Translation (MT). The classic RNN-based approaches to MT were first out …

Machine learning for healthcare: on the verge of a major shift in healthcare epidemiology

J Wiens, ES Shenoy - Clinical infectious diseases, 2018 - academic.oup.com
The increasing availability of electronic health data presents a major opportunity in
healthcare for both discovery and practical applications to improve healthcare. However, for …

Learning deep transformer models for machine translation

Q Wang, B Li, T Xiao, J Zhu, C Li, DF Wong… - arXiv preprint arXiv …, 2019 - arxiv.org
Transformer is the state-of-the-art model in recent machine translation evaluations. Two
strands of research are promising to improve models of this kind: the first uses wide …

Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network

A Sherstinsky - Physica D: Nonlinear Phenomena, 2020 - Elsevier
Because of their effectiveness in broad practical applications, LSTM networks have received
a wealth of coverage in scientific journals, technical blogs, and implementation guides …

Self-attentive sequential recommendation

WC Kang, J McAuley - 2018 IEEE international conference on …, 2018 - ieeexplore.ieee.org
Sequential dynamics are a key feature of many modern recommender systems, which seek
to capture the 'context' of users' activities on the basis of actions they have performed recently …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

What you can cram into a single vector: Probing sentence embeddings for linguistic properties

A Conneau, G Kruszewski, G Lample, L Barrault… - arXiv preprint arXiv …, 2018 - arxiv.org
Although much effort has recently been devoted to training high-quality sentence
embeddings, we still have a poor understanding of what they are capturing. "Downstream" …

Massively multilingual neural machine translation in the wild: Findings and challenges

N Arivazhagan, A Bapna, O Firat, D Lepikhin… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce our efforts towards building a universal neural machine translation (NMT)
system capable of translating between any language pair. We set a milestone towards this …

Attention is all you need

A Vaswani - Advances in Neural Information Processing Systems, 2017 - user.phil.hhu.de
The dominant sequence transduction models are based on complex recurrent
or convolutional neural networks in an encoder and decoder configuration. The best …