Real-time recurrent learning (RTRL) has been independently derived by
many authors, although the most commonly cited reference for it is
Williams and Zipser (1989b); for more details, see also
Hertz et al. (1991, 184) and
Haykin (1998, 756). This algorithm
computes the derivatives of states and outputs with
respect to all weights as the network processes the sequence, that is,
during the forward step. No unfolding is performed or needed. For
instance, if the network has simple next-state
dynamics such as
those described in eq. (3.10), derivatives may be
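computed together with the next state. The derivative of the states with
respect to, say, the state-state weights at time $t$ would be computed
from the states and derivatives at time $t-1$
and the input at time $t$
as follows:

$$\frac{\partial x_i[t]}{\partial W^{xx}_{kl}} = g'(\Xi_i[t]) \left( \delta_{ik}\, x_l[t-1] + \sum_{j=1}^{n_X} W^{xx}_{ij}\, \frac{\partial x_j[t-1]}{\partial W^{xx}_{kl}} \right) \tag{4.28}$$

where $\Xi_i[t] = \sum_{j=1}^{n_X} W^{xx}_{ij} x_j[t-1] + \sum_{j=1}^{n_U} W^{xu}_{ij} u_j[t] + W^x_i$ is the net input to state unit $i$ at time $t$, $g'$ is the derivative of the activation function $g$, and $\delta_{ik}$ is the Kronecker delta.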
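The recursion of eq. (4.28) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the activation is taken to be `tanh`, the initial state is taken to be independent of the weights, and the names `rtrl_step`, `run_rtrl`, `W_xx`, `W_xu`, `b` and `P` are hypothetical, not taken from the text. The tensor `P[i, k, l]` stores the derivative of state `i` with respect to state-state weight `(k, l)`, and the result is compared with a central finite difference as a sanity check.

```python
import numpy as np

def rtrl_step(W_xx, W_xu, b, x_prev, P_prev, u):
    """One forward step plus the sensitivity update for state-state weights.

    P_prev[j, k, l] holds d x_j[t-1] / d W_xx[k, l].
    """
    net = W_xx @ x_prev + W_xu @ u + b      # net input to the state units
    x = np.tanh(net)                        # assumed activation g = tanh
    g_prime = 1.0 - x ** 2                  # derivative of tanh at the net input
    idx = np.arange(x.size)
    # Recurrent term: sum_j W_xx[i, j] * P_prev[j, k, l]  (n^4 multiply-adds)
    P = np.einsum("ij,jkl->ikl", W_xx, P_prev)
    # Direct term: delta_{ik} * x_prev[l]
    P[idx, idx, :] += x_prev
    P *= g_prime[:, None, None]
    return x, P

def run_rtrl(W_xx, W_xu, b, x0, inputs):
    """Process a whole input sequence, carrying states and derivatives forward."""
    n = x0.size
    x, P = x0, np.zeros((n, n, n))          # x0 assumed independent of the weights
    for u in inputs:
        x, P = rtrl_step(W_xx, W_xu, b, x, P, u)
    return x, P

rng = np.random.default_rng(0)
n_x, n_u, T = 3, 2, 5
W_xx = rng.normal(scale=0.5, size=(n_x, n_x))
W_xu = rng.normal(scale=0.5, size=(n_x, n_u))
b = rng.normal(scale=0.1, size=n_x)
x0 = np.zeros(n_x)
inputs = rng.normal(size=(T, n_u))

x_T, P_T = run_rtrl(W_xx, W_xu, b, x0, inputs)

# Central finite difference with respect to one arbitrarily chosen weight.
eps = 1e-6
k, l = 1, 2
W_plus = W_xx.copy()
W_plus[k, l] += eps
W_minus = W_xx.copy()
W_minus[k, l] -= eps
x_plus, _ = run_rtrl(W_plus, W_xu, b, x0, inputs)
x_minus, _ = run_rtrl(W_minus, W_xu, b, x0, inputs)
finite_diff = (x_plus - x_minus) / (2.0 * eps)
max_abs_diff = np.max(np.abs(finite_diff - P_T[:, k, l]))
print("largest deviation from finite differences:", max_abs_diff)
```

Note that no past states are stored: each step needs only the previous state, the previous derivative tensor and the current input, which is what makes the computation possible during the forward step.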
Since the derivatives of the outputs are easily
defined in terms of the state derivatives for all architectures, the
learnable parameters of the DTRNN may be
updated after every time step in which output
targets are defined (using the derivatives of
the error for each output), and therefore
even after only part of a sequence has been processed. This is one of
the main advantages of RTRL in applications where online learning is necessary; the other is the ease with which it may
be derived and programmed for a new architecture. However, its time
complexity is much higher than that
of BPTT: for first-order
DTRNNs such as the one above with more state
units than
input lines ($n_X > n_U$), the dominant term in
the time complexity per time step is
$O(n_X^4)$, since each of the $n_X^3$ derivatives of states with respect to state-state weights requires a sum over $n_X$ terms. A detailed derivation of RTRL for a
second-order DTRNN architecture may be found
in Giles et al. (1992).
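As a rough illustration of updating after every time step that has a target, the following sketch obtains output derivatives from state derivatives and applies a gradient step immediately. Everything specific here is an assumption made for the example: a linear output layer `W_yx`, a `tanh` state activation without bias, a squared-error function, updates restricted to the state-state weights, and the hypothetical helper names `forward_and_sensitivities`, `error_and_gradient` and `train_online`. Time steps without a target are marked with `None`.

```python
import numpy as np

def forward_and_sensitivities(W_xx, W_xu, x_prev, P_prev, u):
    """Next state and d x[t] / d W_xx, assuming g = tanh and no bias."""
    x = np.tanh(W_xx @ x_prev + W_xu @ u)
    P = np.einsum("ij,jkl->ikl", W_xx, P_prev)
    idx = np.arange(x.size)
    P[idx, idx, :] += x_prev
    P *= (1.0 - x ** 2)[:, None, None]
    return x, P

def error_and_gradient(W_yx, x, P, target):
    """Squared error at one time step and its gradient w.r.t. W_xx."""
    y = W_yx @ x                                  # assumed linear output
    err = y - target
    # Output derivatives follow from state derivatives by the chain rule.
    dy_dW = np.einsum("ij,jkl->ikl", W_yx, P)
    grad = np.einsum("i,ikl->kl", err, dy_dW)
    return 0.5 * float(err @ err), grad

def train_online(W_xx, W_xu, W_yx, inputs, targets, lr):
    """Update W_xx after every time step whose target is not None."""
    W_xx = W_xx.copy()
    n = W_xx.shape[0]
    x, P = np.zeros(n), np.zeros((n, n, n))
    errors = []
    for u, d in zip(inputs, targets):
        x, P = forward_and_sensitivities(W_xx, W_xu, x, P, u)
        if d is not None:
            e, grad = error_and_gradient(W_yx, x, P, d)
            W_xx -= lr * grad      # for lr > 0, P is no longer exact afterwards
            errors.append(e)
    return W_xx, errors

rng = np.random.default_rng(1)
n_x, n_u, n_y, T = 4, 2, 1, 20
W_xx = rng.normal(scale=0.3, size=(n_x, n_x))
W_xu = rng.normal(scale=0.3, size=(n_x, n_u))
W_yx = rng.normal(scale=0.3, size=(n_y, n_x))
inputs = rng.normal(size=(T, n_u))
# Targets are defined only on even time steps in this toy example.
targets = [rng.normal(size=n_y) if t % 2 == 0 else None for t in range(T)]

W_new, errors = train_online(W_xx, W_xu, W_yx, inputs, targets, lr=0.05)
print("updates performed:", len(errors))
```

The weights change in the middle of the sequence, while the derivative tensor `P` still carries contributions computed with earlier weight values; with `lr=0.0` the weights stay fixed and the derivatives remain exact.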
The reader should be aware that the name RTRL (Williams and Zipser, 1989c) is applied to two different concepts: it may be viewed solely as a method to compute the derivatives, or as a method to compute the derivatives and update the weights (in each cycle). One may use RTRL to compute derivatives and update the weights after processing a complete learning set made up of a number of sequences (batch update), after processing each sequence (pattern update), or after processing each item in each sequence (online update). In the last two cases, the derivatives are not exact but approximate (they would be exact for a zero learning rate). For batch and pattern weight updates, RTRL and BPTT are equivalent, since they compute the same derivatives. The reader is referred to Williams and Zipser (1995) for a more detailed discussion.
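The equivalence of RTRL and BPTT when the weights are held fixed over a sequence can be checked numerically. The sketch below is an illustration under simplifying assumptions, not a reference implementation: the states themselves are treated as outputs, the error is a sum of squared differences over the whole sequence, the activation is `tanh`, only state-state weights are differentiated, and the function names `sequence_gradient_rtrl` and `sequence_gradient_bptt` are hypothetical.

```python
import numpy as np

def sequence_gradient_rtrl(W, U, inputs, targets):
    """d E / d W accumulated forward in time, E = 0.5 * sum_t ||x[t] - d[t]||^2."""
    n = W.shape[0]
    x, P = np.zeros(n), np.zeros((n, n, n))
    grad = np.zeros_like(W)
    idx = np.arange(n)
    for u, d in zip(inputs, targets):
        x_new = np.tanh(W @ x + U @ u)
        P = np.einsum("ij,jkl->ikl", W, P)       # recurrent term
        P[idx, idx, :] += x                      # direct term uses the previous state
        P *= (1.0 - x_new ** 2)[:, None, None]
        x = x_new
        grad += np.einsum("i,ikl->kl", x - d, P)
    return grad

def sequence_gradient_bptt(W, U, inputs, targets):
    """Same gradient, from a stored forward pass followed by a backward pass."""
    n = W.shape[0]
    states = [np.zeros(n)]
    for u in inputs:
        states.append(np.tanh(W @ states[-1] + U @ u))
    grad = np.zeros_like(W)
    delta_next = np.zeros(n)                     # d E / d (net input at t + 1)
    for t in range(len(inputs), 0, -1):
        x_t = states[t]
        dE_dx = (x_t - targets[t - 1]) + W.T @ delta_next
        delta = dE_dx * (1.0 - x_t ** 2)
        grad += np.outer(delta, states[t - 1])
        delta_next = delta
    return grad

rng = np.random.default_rng(2)
n_x, n_u, T = 4, 3, 8
W = rng.normal(scale=0.4, size=(n_x, n_x))
U = rng.normal(scale=0.4, size=(n_x, n_u))
inputs = rng.normal(size=(T, n_u))
targets = rng.uniform(-1.0, 1.0, size=(T, n_x))

g_rtrl = sequence_gradient_rtrl(W, U, inputs, targets)
g_bptt = sequence_gradient_bptt(W, U, inputs, targets)
max_gap = np.max(np.abs(g_rtrl - g_bptt))
print("largest difference between RTRL and BPTT gradients:", max_gap)
```

The two routines differ in what they store: the RTRL version keeps a tensor of derivatives but no history, whereas the BPTT version keeps the whole state history but only one vector of backpropagated errors.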