This paper (http://www.dlsi.ua.es/~mlf/nnafmc/papers/bengio94learning.pdf) discusses the problem of long-term dependencies, a problem specific to DTRNN-like sequence processing devices which may be formulated as follows: when the sequence processing task is such that the output after reading a relatively long sequence depends on details of the early items of the sequence, learning algorithms may be unable to capture this dependency, because the actual output of the DTRNN at the current time is very insensitive to small variations in the early input or, equivalently, to small variations in the weights involved in the early processing of those items (even if the corresponding change in the early input is large); this is known as the problem of vanishing gradients (see also Haykin, 1998, p. 773). Small variations in weights are the modus operandi of most learning algorithms, in particular, but not exclusively, of gradient-descent algorithms.
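To see the effect numerically, here is a minimal sketch (ours, not the paper's; the network size, weight scale and tanh activation are illustrative assumptions) that accumulates the Jacobian of the state of a simple DTRNN across time steps and prints its norm, which shrinks exponentially:

import numpy as np

# Minimal sketch (illustrative, not from the paper): track how the
# Jacobian d h[t] / d h[0] of a one-layer tanh DTRNN decays over time.
rng = np.random.default_rng(0)
n = 8                                  # number of state units (arbitrary)
W = rng.normal(size=(n, n))
W *= 0.9 / np.linalg.norm(W, 2)        # contracting recurrent matrix, ||W||_2 = 0.9
U = rng.normal(size=(n, 1))
h = np.zeros(n)
J = np.eye(n)                          # accumulated Jacobian d h[t] / d h[0]
for t in range(1, 51):
    h = np.tanh(W @ h + U @ rng.normal(size=1))
    J = np.diag(1.0 - h ** 2) @ W @ J  # chain rule: one-step Jacobian times J
    if t % 10 == 0:
        print(f"t = {t:2d}   ||d h[t] / d h[0]|| = {np.linalg.norm(J):.3e}")

Because each one-step Jacobian has norm below one, the product, and with it the gradient signal reaching the early input, vanishes exponentially with the time lag.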
Bengio et al. (1994) prove that the vanishing of gradients is especially severe when the DTRNN is required to robustly store information about a very early event.
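The core of their argument can be restated compactly (the notation below is ours, not necessarily the paper's): writing h_t for the state of the DTRNN at time t, the gradient linking time T to a much earlier time t factorizes into a product of one-step Jacobians,

\[
\frac{\partial h_T}{\partial h_t} = \prod_{\tau=t+1}^{T} \frac{\partial h_\tau}{\partial h_{\tau-1}},
\qquad
\left\| \frac{\partial h_T}{\partial h_t} \right\| \le \lambda^{T-t}
\quad \text{if every } \left\| \frac{\partial h_\tau}{\partial h_{\tau-1}} \right\| \le \lambda .
\]

Robust storage means that the stored information must survive bounded input noise; this forces the state into the basin of an attractor where the dynamics are contracting, i.e. lambda < 1, so the gradient reaching the early event decays exponentially with T - t.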
The paper also presents a series of experiments in which the performance of alternative DTRNN learning methods is evaluated on three simple single-input, single-output problems with long-term dependencies; the experiments show partial success for some of these methods.
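To make the kind of task concrete, here is a hypothetical generator for a latching-style problem in the spirit of those experiments (the function name, noise level and sequence layout are our illustration, not the paper's actual benchmarks):

import numpy as np

def latch_sequence(T, rng):
    # One sample of a hypothetical latching task: the sign of the first
    # input is the information to be stored; the remaining T - 1 inputs
    # are low-amplitude noise, and the target is read out only at the end.
    key = rng.choice([-1.0, 1.0])
    x = np.concatenate(([key], 0.2 * rng.normal(size=T - 1)))
    y = key  # target depends only on the first item, T - 1 steps earlier
    return x, y

rng = np.random.default_rng(0)
x, y = latch_sequence(T=100, rng=rng)

As T grows, the gradient connecting the final error to the first input shrinks as described above, which is why gradient-based methods achieve only partial success on such tasks.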