The error function for a given learning set usually depends on a relatively large number of learnable
parameters. For example, even a rather small
DTRNN, say an Elman net with two inputs, two output
units, and three state units, has 21 weights, 5 biases and, if we
decide to adjust them, 3 initial state values.
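To make the count explicit, the following minimal sketch (in Python; added here, not part of the original text) tallies the parameters, assuming the usual Elman connection scheme: input-to-state, recurrent state-to-state and state-to-output weights, plus one bias per state and output unit.

    n_in, n_state, n_out = 2, 3, 2

    weights = (n_in * n_state        # input-to-state:            2 * 3 = 6
               + n_state * n_state   # recurrent state-to-state:  3 * 3 = 9
               + n_state * n_out)    # state-to-output:           3 * 2 = 6
    biases = n_state + n_out         # one per state/output unit: 3 + 2 = 5
    initial_state = n_state          # initial state values:              3

    print(weights, biases, initial_state)  # -> 21 5 3
    print(weights + biases)                # -> 26 weights and biases in all
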
Assume we have already
found a minimum in the error surface. Due to the structure of the
connections, any of the 3! = 6 possible permutations (relabellings) of the 3
state neurons yields exactly the same value of the error
function, so the minimum we have found is only one of at least six equivalent ones.
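As an illustration (a sketch added here, not taken from the original text), the following NumPy fragment builds such a net and checks that relabelling the state units, that is, permuting the corresponding rows and columns of the weight matrices together with the state biases and the initial state, leaves the output sequence, and therefore the error, unchanged.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_state, n_out, T = 2, 3, 2, 5

    # Random parameters of a small Elman net.
    W_in  = rng.normal(size=(n_state, n_in))     # input-to-state weights
    W_rec = rng.normal(size=(n_state, n_state))  # recurrent state-to-state weights
    W_out = rng.normal(size=(n_out, n_state))    # state-to-output weights
    b_h   = rng.normal(size=n_state)             # state biases
    b_y   = rng.normal(size=n_out)               # output biases
    h0    = rng.normal(size=n_state)             # initial state
    xs    = rng.normal(size=(T, n_in))           # an arbitrary input sequence

    def run(W_in, W_rec, W_out, b_h, b_y, h0, xs):
        h, ys = h0, []
        for x in xs:
            h = np.tanh(W_in @ x + W_rec @ h + b_h)   # next state
            ys.append(W_out @ h + b_y)                # output
        return np.array(ys)

    # One of the 3! = 6 relabellings of the three state units.
    P = np.eye(n_state)[[2, 0, 1]]
    ys  = run(W_in, W_rec, W_out, b_h, b_y, h0, xs)
    ys2 = run(P @ W_in, P @ W_rec @ P.T, W_out @ P.T, P @ b_h, b_y, P @ h0, xs)

    print(np.allclose(ys, ys2))   # True: identical outputs, hence identical error
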
But, in addition to this, the 26-dimensional space of weights and biases is very
likely plagued with local minima, some of which may not
correspond to the computational task we want to learn. Since it is not
feasible for any learning algorithm to sample the whole 26-dimensional
space, the probability that it ends up in a suboptimal minimum of the error
function is high. This problem
is especially acute for local-search algorithms such as gradient
descent: since the algorithm slowly modifies the
learnable parameters to move downhill on the
error surface, it may end up
trapped in whatever local minimum lies at the bottom of the basin
where it started. The problem of multiple minima is not
specific to DTRNN; it affects almost all neural-network
architectures.
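This trapping behaviour can be reproduced even in one dimension. The sketch below (an illustration added here, using an arbitrary made-up error function, not one from the original text) runs plain gradient descent on a curve with two minima: started in the basin of the shallower one, it settles there and never reaches the global minimum.

    # Plain gradient descent on a one-dimensional "error surface" with two minima:
    # E(w) = (w^2 - 1)^2 + 0.3 w has its global minimum near w = -1
    # and a shallower, suboptimal minimum near w = +1.
    def E(w):
        return (w**2 - 1)**2 + 0.3 * w

    def dE(w):
        return 4 * w * (w**2 - 1) + 0.3

    w = 1.5                      # initial guess, in the basin of the suboptimal minimum
    for _ in range(1000):
        w -= 0.01 * dE(w)        # small step downhill

    print(w, E(w))               # w stays near +1: trapped in the suboptimal minimum
    print(E(-1.02))              # the global minimum near w = -1 is considerably lower
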