A number of learning algorithms for DTRNN are coupled to a particular architecture: for example, BPS (Gori et al., 1989) is a special algorithm used to train local-feedback networks, that is, DTRNN in which the value of a state unit is computed using only its own previous value but not the rest of the state values.
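As an illustration, one state update of such a network may be sketched as follows; the function name, the logistic activation, and the diagonal self-recurrent weight vector are illustrative assumptions, not details of BPS itself:

```python
import numpy as np

def local_feedback_step(x_prev, u, w_self, W_in, b):
    """One state update of a local-feedback DTRNN (illustrative sketch).

    Each state unit i sees only its own previous value x_prev[i],
    through a diagonal self-recurrent weight w_self[i], plus the
    current input u -- never the previous values of the other units.
    """
    a = w_self * x_prev + W_in @ u + b   # elementwise self-feedback only
    return 1.0 / (1.0 + np.exp(-a))      # logistic activation (assumed)
```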
Local-feedback DTRNN using threshold linear units and having a two-layer output network capable of performing any Boolean mapping have recently been shown (Frasconi et al., 1996) to recognize only a subset of the regular languages, and to be incapable of emulating all FSM (Kremer, 1999). A related algorithm is focused backpropagation (Mozer, 1989). Learning algorithms are also very simple when states are observable (such as in NARX networks, see section 3.2.3), because, during learning, the desired value for the state may be fed back instead of the actual value computed by the DTRNN; this is usually called teacher forcing.
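A minimal sketch of teacher forcing follows; the next-state map `step` and its signature are hypothetical placeholders for whatever update the DTRNN computes:

```python
def forward_pass(inputs, targets, step, x0, teacher_forcing=True):
    """Run a DTRNN over a sequence, optionally with teacher forcing.

    `step(u, x)` is a hypothetical next-state map taking the current
    input u and the previous (observable) state x.  With teacher
    forcing, the desired previous state from `targets` is fed back
    during learning instead of the state the network actually computed.
    """
    predictions, x = [], x0
    for t, u in enumerate(inputs):
        x_hat = step(u, x)
        predictions.append(x_hat)
        # feed the desired state back, not the computed one
        x = targets[t] if teacher_forcing else x_hat
    return predictions
```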
Sometimes a learning algorithm is not only specialized to a particular architecture but also modifies that architecture during learning. One such algorithm is Fahlman's recurrent cascade correlation (Fahlman, 1991), which is described in the following section.