Next: #pollack91j###:
Up: Inference of finite-state machines
Previous: Inference of finite-state machines
  Contents
  Index
this paper describes how an
Elman (1990) simple recurrent network
may be trained to read a string and predict the next
symbol.
When a network is trained to predict the next
symbol, a one-hot, exclusive, or local interpretation (one unit per
symbol, high when the symbol is present and low
otherwise) is used for
the alphabet, and a quadratic error function is used, it is the case
that the actual outputs of the DTRNN after having seen a string are a
good approximation to the frequencies observed for each of the
successors of that string in the learning set. If the learning set is interpreted
to be a representative draw from a given probability
distribution over
strings, then the outputs may actually be interpreted as
probabilities, and the DTRNN may be used as a generator having a
similar probability distribution.
However, Cleeremans et al. (1989) do not use
the trained DTRNN
as a generator but rather as an acceptor for the
language (they explored two languages). A
string is accepted if each of its
successors is
predicted with a probability
higher than a threshold, which the authors set to 0.36.2. For the simplest language,
using 70,000 random strings, of which only 210 were grammatical, the
network only accepted the grammatical ones. Also, the network accepted
all string in a set of 20,000 random grammatical strings. The second
language modelled long-term
dependencies: the strings started and ended with the same symbol;
if the probabilities of intervening strings were independent of that
symbol, the network failed to learn the language; however, when the
distribution of intervening strings depended even if very slightly on
the initial symbol, then the DTRNN learned the
language.
The authors also perform a hierarchical clustering of the observed values of the hidden units and observe clusters that correspond to the states in the
automata defining the languages.
Next: #pollack91j###:
Up: Inference of finite-state machines
Previous: Inference of finite-state machines
  Contents
  Index
Debian User
2002-01-21