Elman's paper,
``Finding structure in time'' (http://www.dlsi.ua.es/~mlf/nnafmc/papers/elman90finding.pdf), introduces another widely-used
recurrent architecture, the simple recurrent net, now usually
called an Elman net (see section 3.2.2). The previous state is
held in the so-called context units, which try to encode
information about the inputs seen so far, and the current state is
stored in the hidden units. Instead of using BPTT or RTRL, the
network is trained with simple backpropagation at each time step,
without considering the recurrent effect of each weight on the
values of the context units (a minimal sketch of this training
scheme is given after the task list below). Elman studies the
performance of this network and the nature of the representations
it learns when it is trained to perform four sequence
prediction tasks:
- Predicting the next bit in a sequence in which every third bit
is the exclusive OR of the two preceding, randomly chosen bits; the
error of the trained network drops on every third bit, the only one
that can be predicted from the past inputs.
- Predicting the next letter in a random sequence of the three
syllables ba, dii and guu, where each letter is represented by a
binary vector encoding its articulatory (phonetic) features. The
network learns to predict the vowels from the consonants, and also
the fact that a consonant follows the vowels, even though it is
impossible to predict which one.
- Predicting the next letter in a sequence of concatenated words
(without blanks) drawn from a 15-word lexicon; letters are
represented by random 5-bit vectors. Prediction error falls
gradually inside each word and rises at the end of the word, where
the letter starting the next word cannot be predicted: the network
learns to predict word boundaries.
- Predicting the next word in a concatenated sequence of two- and
three-word sentences; each word is represented by a
randomly-assigned binary vector having a single bit on (a one-hot
encoding). Hierarchical clustering of the hidden unit activation
patterns shows that the net has developed representations of words
that correspond to lexical classes (noun, verb) and subclasses
(transitive verb, animate noun, etc.), simply by learning the
sequential layout of words (a clustering sketch is given at the end
of this section).
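The training scheme described above is simple enough to sketch in a
few lines of Python. The code below is only an illustration, not
Elman's implementation: the context units hold a copy of the previous
hidden state, and plain backpropagation is applied at every time
step, so the error is never propagated back through that copy. The
layer sizes, learning rate and the use of the temporal XOR task from
the first item are assumptions made for the example.

# Minimal sketch of an Elman-style simple recurrent net; sizes, learning
# rate and task are illustrative assumptions, not Elman's settings.
import numpy as np

rng = np.random.default_rng(0)

def xor_sequence(n_triples):
    # Temporal XOR task (first item above): every third bit is the
    # exclusive OR of the two preceding, randomly chosen bits.
    bits = []
    for _ in range(n_triples):
        a, b = rng.integers(0, 2, size=2)
        bits += [a, b, a ^ b]
    return np.array(bits, dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden, n_out = 1, 4, 1
W_xh = rng.normal(0.0, 0.5, (n_hidden, n_in))      # input   -> hidden
W_ch = rng.normal(0.0, 0.5, (n_hidden, n_hidden))  # context -> hidden
W_hy = rng.normal(0.0, 0.5, (n_out, n_hidden))     # hidden  -> output
lr = 0.1

seq = xor_sequence(3000)
context = np.zeros(n_hidden)           # context units start at zero

for t in range(len(seq) - 1):
    x = seq[t:t + 1]                   # current bit
    target = seq[t + 1:t + 2]          # next bit, to be predicted

    # Forward pass: the hidden units read the input and the context,
    # which is a copy of the previous hidden state.
    h = sigmoid(W_xh @ x + W_ch @ context)
    y = sigmoid(W_hy @ h)

    # Plain backpropagation for this time step only: the error is not
    # propagated back through the context copy (no BPTT, no RTRL).
    d_y = (y - target) * y * (1.0 - y)
    d_h = (W_hy.T @ d_y) * h * (1.0 - h)
    W_hy -= lr * np.outer(d_y, h)
    W_xh -= lr * np.outer(d_h, x)
    W_ch -= lr * np.outer(d_h, context)

    context = h                        # context <- current hidden state

If such a sketch behaves like the network in the paper, the squared
error drops on every third prediction, the only bit determined by its
predecessors, and stays near chance on the others.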
In all cases, the nets of Elman (1990) learn the temporal structure
present in the sequences of events they are trained to predict.
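Elman's clustering analysis of the word-prediction task can be
reproduced in outline with standard tools, as in the sketch below. It
assumes that, for each word in the lexicon, the average hidden-unit
activation vector of a trained network has been recorded; here those
vectors are random placeholders, and the word list, vector size and
linkage method are illustrative choices rather than Elman's settings.

# Sketch of the hierarchical clustering of hidden-unit activation
# patterns; the activation vectors below are placeholders, not the
# output of a real trained network.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Illustrative word list; Elman's lexicon would be used in a real run.
words = ["woman", "man", "cat", "dog", "eat", "chase", "break", "smash"]

# In a real analysis, each row is the mean hidden-unit activation vector
# recorded while the trained network processes the corresponding word.
activations = np.random.default_rng(1).random((len(words), 4))

# Average-linkage hierarchical clustering over the word representations;
# with real data, nouns and verbs (and their subclasses) fall into
# separate branches of the dendrogram.
Z = linkage(activations, method="average")
dendrogram(Z, labels=words)
plt.show()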