Elman's paper,
``Finding structure in time'' (http://www.dlsi.ua.es/~mlf/nnafmc/papers/elman90finding.pdf), introduces another widely-used
recurrent architecture, the simple recurrent net, now usually
called an Elman net (see section 3.2.2). The previous state is
held in the so-called context units, which try to encode
information about the inputs seen so far, and the current state is
stored in the hidden units. Instead of using BPTT or RTRL, the
network is trained with simple backpropagation at each time step,
without considering the recurrent effect of each weight on the
values of the context units (a minimal sketch of this training
scheme is given after the task list below). Elman studies the
performance of this network and the nature of the representations
it learns when it is trained to perform four sequence
prediction tasks:
- Predicting the next bit in a sequence in which every third bit
is the exclusive OR of the two preceding, randomly chosen bits; the
error of the trained network drops on every third bit, the only one
that can be predicted from the past inputs.
- Predicting the next letter in a random sequence of the three
syllables ba, dii and guu, where each letter is represented by a
binary vector encoding its articulatory (phonetic) features. The
network learns to predict the vowels from the consonants, and also
the fact that a consonant follows the vowels, even though it is
impossible to predict which one.
- Predicting the next letter in a sequence of concatenated words
(without blanks) drawn from a 15-word lexicon; letters are
represented by random 5-bit vectors. Prediction error falls
gradually inside each word and rises at the end of the word, where
the letter starting the next word cannot be predicted: the network
learns to predict word boundaries.
- Predicting the next word in a concatenated sequence of two- and
three-word sentences; each word is represented by a
randomly-assigned binary vector having a single bit on (a one-hot
encoding). Hierarchical clustering of the hidden unit activation
patterns shows that the net has developed representations of words
that correspond to lexical classes (noun, verb) and subclasses
(transitive verb, animate noun, etc.), simply by learning the
sequential layout of words (a clustering sketch is given at the end
of this section).
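The training scheme described above is simple enough to sketch in a
few lines of Python. The code below is only an illustration, not
Elman's implementation: the context units hold a copy of the previous
hidden state, and plain backpropagation is applied at every time
step, so the error is never propagated back through that copy. The
layer sizes, learning rate and the use of the temporal XOR task from
the first item are assumptions made for the example.

# Minimal sketch of an Elman-style simple recurrent net; sizes, learning
# rate and task are illustrative assumptions, not Elman's settings.
import numpy as np

rng = np.random.default_rng(0)

def xor_sequence(n_triples):
    # Temporal XOR task (first item above): every third bit is the
    # exclusive OR of the two preceding, randomly chosen bits.
    bits = []
    for _ in range(n_triples):
        a, b = rng.integers(0, 2, size=2)
        bits += [a, b, a ^ b]
    return np.array(bits, dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden, n_out = 1, 4, 1
W_xh = rng.normal(0.0, 0.5, (n_hidden, n_in))      # input   -> hidden
W_ch = rng.normal(0.0, 0.5, (n_hidden, n_hidden))  # context -> hidden
W_hy = rng.normal(0.0, 0.5, (n_out, n_hidden))     # hidden  -> output
lr = 0.1

seq = xor_sequence(3000)
context = np.zeros(n_hidden)           # context units start at zero

for t in range(len(seq) - 1):
    x = seq[t:t + 1]                   # current bit
    target = seq[t + 1:t + 2]          # next bit, to be predicted

    # Forward pass: the hidden units read the input and the context,
    # which is a copy of the previous hidden state.
    h = sigmoid(W_xh @ x + W_ch @ context)
    y = sigmoid(W_hy @ h)

    # Plain backpropagation for this time step only: the error is not
    # propagated back through the context copy (no BPTT, no RTRL).
    d_y = (y - target) * y * (1.0 - y)
    d_h = (W_hy.T @ d_y) * h * (1.0 - h)
    W_hy -= lr * np.outer(d_y, h)
    W_xh -= lr * np.outer(d_h, x)
    W_ch -= lr * np.outer(d_h, context)

    context = h                        # context <- current hidden state

If such a sketch behaves like the network in the paper, the squared
error drops on every third prediction, the only bit determined by its
predecessors, and stays near chance on the others.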
In all cases, the nets of Elman (1990) learn the temporal structure
present in the sequences of events they are trained to predict.
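Elman's clustering analysis of the word-prediction task can be
reproduced in outline with standard tools, as in the sketch below. It
assumes that, for each word in the lexicon, the average hidden-unit
activation vector of a trained network has been recorded; here those
vectors are random placeholders, and the word list, vector size and
linkage method are illustrative choices rather than Elman's settings.

# Sketch of the hierarchical clustering of hidden-unit activation
# patterns; the activation vectors below are placeholders, not the
# output of a real trained network.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Illustrative word list; Elman's lexicon would be used in a real run.
words = ["woman", "man", "cat", "dog", "eat", "chase", "break", "smash"]

# In a real analysis, each row is the mean hidden-unit activation vector
# recorded while the trained network processes the corresponding word.
activations = np.random.default_rng(1).random((len(words), 4))

# Average-linkage hierarchical clustering over the word representations;
# with real data, nouns and verbs (and their subclasses) fall into
# separate branches of the dendrogram.
Z = linkage(activations, method="average")
dendrogram(Z, labels=words)
plt.show()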