The error function for a given learning set usually depends on a relatively large number of learnable
parameters. For example, even a rather small
DTRNN, say an Elman net with two inputs, two output
units, and three state units, has 21 weights, 5 biases and, if we
decide to adjust them, 3 initial state values.
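To make the count explicit, the following minimal sketch (in Python; added here, not part of the original text) tallies the parameters, assuming the usual Elman connection scheme: input-to-state, recurrent state-to-state and state-to-output weights, plus one bias per state and output unit.

    n_in, n_state, n_out = 2, 3, 2

    weights = (n_in * n_state        # input-to-state:            2 * 3 = 6
               + n_state * n_state   # recurrent state-to-state:  3 * 3 = 9
               + n_state * n_out)    # state-to-output:           3 * 2 = 6
    biases = n_state + n_out         # one per state/output unit: 3 + 2 = 5
    initial_state = n_state          # initial state values:              3

    print(weights, biases, initial_state)  # -> 21 5 3
    print(weights + biases)                # -> 26 weights and biases in all
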
Assume we have already
found a minimum in the error surface. Due to the structure of the
connections, any of the 3! = 6 possible permutations (relabellings) of the 3
state neurons yields exactly the same value of the error
function, so the minimum we have found is only one of at least six equivalent ones.
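As an illustration (a sketch added here, not taken from the original text), the following NumPy fragment builds such a net and checks that relabelling the state units, that is, permuting the corresponding rows and columns of the weight matrices together with the state biases and the initial state, leaves the output sequence, and therefore the error, unchanged.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_state, n_out, T = 2, 3, 2, 5

    # Random parameters of a small Elman net.
    W_in  = rng.normal(size=(n_state, n_in))     # input-to-state weights
    W_rec = rng.normal(size=(n_state, n_state))  # recurrent state-to-state weights
    W_out = rng.normal(size=(n_out, n_state))    # state-to-output weights
    b_h   = rng.normal(size=n_state)             # state biases
    b_y   = rng.normal(size=n_out)               # output biases
    h0    = rng.normal(size=n_state)             # initial state
    xs    = rng.normal(size=(T, n_in))           # an arbitrary input sequence

    def run(W_in, W_rec, W_out, b_h, b_y, h0, xs):
        h, ys = h0, []
        for x in xs:
            h = np.tanh(W_in @ x + W_rec @ h + b_h)   # next state
            ys.append(W_out @ h + b_y)                # output
        return np.array(ys)

    # One of the 3! = 6 relabellings of the three state units.
    P = np.eye(n_state)[[2, 0, 1]]
    ys  = run(W_in, W_rec, W_out, b_h, b_y, h0, xs)
    ys2 = run(P @ W_in, P @ W_rec @ P.T, W_out @ P.T, P @ b_h, b_y, P @ h0, xs)

    print(np.allclose(ys, ys2))   # True: identical outputs, hence identical error
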
But, in addition to this, the 26-dimensional space of weights and biases is very
likely plagued with local minima, some of which may not
correspond to the computational task we want to learn. Since it is not
feasible for any learning algorithm to sample the whole 26-dimensional
space, the probability that it ends up in a suboptimal minimum of the error
function is high. This problem
is especially acute for local-search algorithms such as gradient
descent: since the algorithm slowly modifies the
learnable parameters to move downhill on the
error surface, it may end up
trapped in whatever local minimum lies at the bottom of the basin
where it started. The problem of multiple minima is not
specific to DTRNN; it affects almost all neural-network
architectures.
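This trapping behaviour can be reproduced even in one dimension. The sketch below (an illustration added here, using an arbitrary made-up error function, not one from the original text) runs plain gradient descent on a curve with two minima: started in the basin of the shallower one, it settles there and never reaches the global minimum.

    # Plain gradient descent on a one-dimensional "error surface" with two minima:
    # E(w) = (w^2 - 1)^2 + 0.3 w has its global minimum near w = -1
    # and a shallower, suboptimal minimum near w = +1.
    def E(w):
        return (w**2 - 1)**2 + 0.3 * w

    def dE(w):
        return 4 * w * (w**2 - 1) + 0.3

    w = 1.5                      # initial guess, in the basin of the suboptimal minimum
    for _ in range(1000):
        w -= 0.01 * dE(w)        # small step downhill

    print(w, E(w))               # w stays near +1: trapped in the suboptimal minimum
    print(E(-1.02))              # the global minimum near w = -1 is considerably lower
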