Authors
Yoshua Bengio, Paolo Frasconi, Patrice Simard
Publication date
1993/3/28
Conference paper
IEEE international conference on neural networks
Pages
1183-1188
Publisher
IEEE
Description
The authors seek to train recurrent neural networks in order to map input sequences to output sequences, for applications in sequence recognition or production. Results are presented showing that learning long-term dependencies in such recurrent networks using gradient descent is a very difficult task. It is shown how this difficulty arises when robustly latching bits of information with certain attractors. The derivatives of the output at time t with respect to the unit activations at time zero tend rapidly to zero as t increases for most input values. In such a situation, simple gradient descent techniques appear inappropriate. The consideration of alternative optimization methods and architectures is suggested.
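The vanishing-derivative effect the abstract describes can be illustrated numerically. The sketch below (not the authors' experiment; the network size and weight scale are arbitrary assumptions) iterates a simple tanh recurrence h_t = tanh(W h_{t-1}) in a contracting regime and accumulates the Jacobian dh_t/dh_0 as a product of per-step Jacobians diag(1 - h_t^2) W, whose norm shrinks geometrically with t:

```python
import numpy as np

# Minimal sketch, assuming a simple tanh RNN h_t = tanh(W h_{t-1})
# with small weights so the dynamics contract toward an attractor.
rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n)) * 0.5 / np.sqrt(n)  # small spectral norm

h = rng.standard_normal(n)
J = np.eye(n)  # accumulated Jacobian d h_t / d h_0
norms = []
for t in range(50):
    h = np.tanh(W @ h)
    # Per-step Jacobian of tanh(W h): diag(1 - h^2) @ W
    J = np.diag(1.0 - h ** 2) @ W @ J
    norms.append(np.linalg.norm(J))

print(norms[0], norms[10], norms[49])  # norm decays rapidly with t
```

Because each per-step Jacobian has norm well below one in this regime, the gradient signal connecting the output at time t to the state at time zero decays exponentially, which is the difficulty with simple gradient descent that the paper analyzes.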