This paper (http://www.dlsi.ua.es/~mlf/nnafmc/papers/bengio94learning.pdf) discusses the problem of long-term dependencies, a problem specific to DTRNN-like sequence processing devices which may be formulated as follows: when the sequence processing task is such that the output after reading a relatively long sequence depends on details of the early items of the sequence, learning algorithms may be unable to capture this dependency, because the actual output of the DTRNN at the current time is very insensitive to small variations in the early input or, equivalently, to small variations in the weights involved in the early processing of those items (even if the corresponding change in the early input is large); this is known as the problem of vanishing gradients (see also Haykin, 1998, p. 773). Small variations in weights are the modus operandi of most learning algorithms, in particular, but not exclusively, of gradient-descent algorithms.
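To see the effect numerically, here is a minimal sketch (ours, not the paper's; the network size, weight scale and tanh activation are illustrative assumptions) that accumulates the Jacobian of the state of a simple DTRNN across time steps and prints its norm, which shrinks exponentially:

import numpy as np

# Minimal sketch (illustrative, not from the paper): track how the
# Jacobian d h[t] / d h[0] of a one-layer tanh DTRNN decays over time.
rng = np.random.default_rng(0)
n = 8                                  # number of state units (arbitrary)
W = rng.normal(size=(n, n))
W *= 0.9 / np.linalg.norm(W, 2)        # contracting recurrent matrix, ||W||_2 = 0.9
U = rng.normal(size=(n, 1))
h = np.zeros(n)
J = np.eye(n)                          # accumulated Jacobian d h[t] / d h[0]
for t in range(1, 51):
    h = np.tanh(W @ h + U @ rng.normal(size=1))
    J = np.diag(1.0 - h ** 2) @ W @ J  # chain rule: one-step Jacobian times J
    if t % 10 == 0:
        print(f"t = {t:2d}   ||d h[t] / d h[0]|| = {np.linalg.norm(J):.3e}")

Because each one-step Jacobian has norm below one, the product, and with it the gradient signal reaching the early input, vanishes exponentially with the time lag.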
Bengio et al. (1994) prove that the vanishing of gradients is especially severe when the DTRNN is required to robustly store information about a very early event.
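The core of their argument can be restated compactly (the notation below is ours, not necessarily the paper's): writing h_t for the state of the DTRNN at time t, the gradient linking time T to a much earlier time t factorizes into a product of one-step Jacobians,

\[
\frac{\partial h_T}{\partial h_t} = \prod_{\tau=t+1}^{T} \frac{\partial h_\tau}{\partial h_{\tau-1}},
\qquad
\left\| \frac{\partial h_T}{\partial h_t} \right\| \le \lambda^{T-t}
\quad \text{if every } \left\| \frac{\partial h_\tau}{\partial h_{\tau-1}} \right\| \le \lambda .
\]

Robust storage means that the stored information must survive bounded input noise; this forces the state into the basin of an attractor where the dynamics are contracting, i.e. lambda < 1, so the gradient reaching the early event decays exponentially with T - t.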
The paper also presents a series of experiments in which the performance of alternative DTRNN learning methods is evaluated on three simple single-input, single-output problems with long-term dependencies; the experiments show partial success for some of these methods.
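To make the kind of task concrete, here is a hypothetical generator for a latching-style problem in the spirit of those experiments (the function name, noise level and sequence layout are our illustration, not the paper's actual benchmarks):

import numpy as np

def latch_sequence(T, rng):
    # One sample of a hypothetical latching task: the sign of the first
    # input is the information to be stored; the remaining T - 1 inputs
    # are low-amplitude noise, and the target is read out only at the end.
    key = rng.choice([-1.0, 1.0])
    x = np.concatenate(([key], 0.2 * rng.normal(size=T - 1)))
    y = key  # target depends only on the first item, T - 1 steps earlier
    return x, y

rng = np.random.default_rng(0)
x, y = latch_sequence(T=100, rng=rng)

As T grows, the gradient connecting the final error to the first input shrinks as described above, which is why gradient-based methods achieve only partial success on such tasks.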