Real-time recurrent learning (RTRL) has been independently derived by
many authors, although the most commonly cited reference for it is
Williams and Zipser (1989b) (for more details see also
Hertz et al. (1991, 184) and
Haykin (1998, 756)). This algorithm
computes the derivatives of states and outputs with
respect to all weights as the network processes the sequence, that is,
during the forward step. No unfolding is performed, nor is any necessary. For
instance, if the network has a simple next-state
dynamics such as
the one described in eq. (3.10), derivatives may be
computed together with the next state. The derivative of the states with
respect to, say, the state-state weights at time $t$ would be computed
from the states and derivatives at time $t-1$ and the input at time $t$
as follows:

\[
\frac{\partial x_i[t]}{\partial W^{xx}_{kl}} =
g'\!\left(\sum_{j=1}^{n_X} W^{xx}_{ij}\,x_j[t-1]
+ \sum_{m=1}^{n_U} W^{xu}_{im}\,u_m[t] + W^{x}_{i}\right)
\left(\delta_{ik}\,x_l[t-1]
+ \sum_{j=1}^{n_X} W^{xx}_{ij}\,
\frac{\partial x_j[t-1]}{\partial W^{xx}_{kl}}\right) \tag{4.28}
\]

where $\delta_{ik}$ is Kronecker's delta.
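To make the recursion concrete, here is a minimal NumPy sketch of one combined forward-and-derivative step, assuming a tanh activation for $g$, no bias term, and illustrative names (rtrl_step, Wxx, Wxu, dx_prev) that are not taken from the text:

```python
import numpy as np

def rtrl_step(x_prev, u_t, Wxx, Wxu, dx_prev):
    """One forward step of the next-state dynamics plus the RTRL
    derivative recursion of eq. (4.28).

    x_prev  : states x[t-1], shape (nX,)
    u_t     : input u[t], shape (nU,)
    Wxx     : state-state weights, shape (nX, nX)
    Wxu     : input-state weights, shape (nX, nU)
    dx_prev : d x[t-1] / d Wxx, shape (nX, nX, nX)
    Returns x[t] and d x[t] / d Wxx.
    """
    net = Wxx @ x_prev + Wxu @ u_t        # net input (bias omitted here)
    x_t = np.tanh(net)                    # g = tanh, for concreteness
    gp = 1.0 - x_t ** 2                   # g'(net) for tanh

    nX = x_prev.shape[0]
    dx_t = np.empty((nX, nX, nX))
    for i in range(nX):
        for k in range(nX):
            for l in range(nX):
                # recursive term: sum_j Wxx[i, j] * d x_j[t-1] / d Wxx[k, l]
                rec = Wxx[i] @ dx_prev[:, k, l]
                # explicit term: delta_{ik} * x_l[t-1]
                direct = x_prev[l] if i == k else 0.0
                dx_t[i, k, l] = gp[i] * (direct + rec)
    return x_t, dx_t
```

The four nested levels of work (the triple loop plus the inner dot product) are where the $n_X^4$ per-step cost mentioned below comes from.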
Since the derivatives of the outputs are easily defined in terms of the state derivatives for all architectures, the learnable parameters of the DTRNN may be updated after every time step in which output targets are defined (using the derivatives of the error for each output), that is, even after having processed only part of a sequence. This is one of the main advantages of RTRL in applications where online learning is necessary; the other is the ease with which it may be derived and programmed for a new architecture. However, its time complexity is much higher than that of BPTT: for first-order DTRNNs such as the one above with more state units than input lines ($n_X > n_U$), the dominant term in the time complexity per time step is $n_X^4$. A detailed derivation of RTRL for a second-order DTRNN architecture may be found in Giles et al. (1992).
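As an illustration of such a per-step update, the following sketch (reusing rtrl_step from above) contracts the state derivatives with the error derivatives of a hypothetical linear output layer $y[t] = W^{yx} x[t]$ under a squared error; the output layer and all names here are assumptions made for the example, not details from the text:

```python
import numpy as np

def online_step(x_t, dx_t, Wxx, Wyx, target, lr=0.01):
    """Update the state-state weights right after a time step with a target.

    Chain rule: dE/dWxx[k, l] = sum_i (dE/dx_i[t]) * (dx_i[t]/dWxx[k, l]),
    with E = 0.5 * ||Wyx @ x[t] - target||^2 (assumed linear output layer).
    """
    err = Wyx @ x_t - target                    # dE/dy[t]
    dE_dx = Wyx.T @ err                         # dE/dx[t]
    grad = np.einsum('i,ikl->kl', dE_dx, dx_t)  # contract over state index i
    return Wxx - lr * grad                      # gradient step for this step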
The reader should be aware that the name RTRL (Williams and Zipser, 1989c) is applied to two different concepts: it may be viewed solely as a method to compute the derivatives, or as a method to compute the derivatives and to update the weights (in each cycle). One may use RTRL to compute derivatives and update the weights after processing a complete learning set made up of a number of sequences (batch update), after processing each sequence (pattern update), or after processing each item in each sequence (online update). In these last two cases, the derivatives are not exact but approximate (they would be exact for a zero learning rate). For batch and pattern weight updates, RTRL and BPTT are equivalent, since they compute the same derivatives. The reader is referred to Williams and Zipser (1995) for a more detailed discussion.
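The difference between the three schedules is only where the weight update happens in the loop structure. A schematic sketch, again reusing rtrl_step and the hypothetical linear output layer from the previous examples, might look as follows:

```python
import numpy as np

def train_rtrl(sequences, Wxx, Wxu, Wyx, mode="pattern", lr=0.01):
    """Batch, pattern, and online update schedules for RTRL.

    sequences: list of (inputs, targets) pairs; targets[t] is None
    when no output target is defined at step t.
    """
    nX = Wxx.shape[0]
    batch_grad = np.zeros_like(Wxx)
    for inputs, targets in sequences:
        x = np.zeros(nX)                      # states reset per sequence
        dx = np.zeros((nX, nX, nX))           # derivatives reset per sequence
        seq_grad = np.zeros_like(Wxx)
        for u_t, tgt in zip(inputs, targets):
            x, dx = rtrl_step(x, u_t, Wxx, Wxu, dx)
            if tgt is None:
                continue                      # no target at this time step
            dE_dx = Wyx.T @ (Wyx @ x - tgt)
            grad = np.einsum('i,ikl->kl', dE_dx, dx)
            if mode == "online":
                Wxx = Wxx - lr * grad         # update mid-sequence: derivatives
            else:                             # become approximate
                seq_grad += grad
        if mode == "pattern":
            Wxx = Wxx - lr * seq_grad         # update after each sequence
        elif mode == "batch":
            batch_grad += seq_grad
    if mode == "batch":
        Wxx = Wxx - lr * batch_grad           # single update for the whole set
    return Wxx
```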