Discrete-time recurrent neural networks for grammatical inference
This chapter is concerned with the use of discrete-time recurrent
neural networks (DTRNN) for grammatical inference. DTRNN may be used
as sequence processors in three main modes:
- Neural acceptors/recognizers: DTRNN may be trained to accept strings
  belonging to a language and to reject strings not belonging to it, by
  producing a suitable label after the whole string has been processed.
  In view of the computational equivalence between some DTRNN
  architectures and some finite-state machine (FSM) classes, it is
  reasonable to expect DTRNN to learn regular (finite-state) languages.
  A set of neural acceptors (kept separate or merged into a single
  DTRNN) may be used as a neural classifier. A minimal sketch of this
  mode follows the list.
 
- Neural transducers/translators: if the output of the DTRNN is
  examined not only at the end of the string but also after each input
  symbol has been processed, then its output may be interpreted as a
  synchronous, sequential transduction (translation) of the input
  string. DTRNN may easily be trained to perform synchronous sequential
  transductions, and also some asynchronous transductions (see the
  second sketch after this list).
 
- Neural predictors: DTRNN may be trained to predict the next symbol of
  strings in a given language. After reading a string, the trained
  DTRNN outputs a mixture of the possible successor symbols; under
  certain conditions (see, e.g., Elman (1990)), the output of the DTRNN
  may be interpreted as the probabilities of each of the possible
  successors in the language. In this last case, the DTRNN may be used
  as a probabilistic generator of strings (see the third sketch after
  this list).
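The three modes can be illustrated with a small Elman-style DTRNN. What
follows is a minimal sketch in Python/NumPy, not an implementation from
the text: the two-symbol alphabet, the number of state units, and all
names (ONE_HOT, step, accept) are illustrative assumptions. In acceptor
mode, a single output unit is read and thresholded only after the last
symbol:

import numpy as np

# Hypothetical two-symbol alphabet; each symbol is fed to the network as
# a one-hot input vector, so this DTRNN has two input lines.
ALPHABET = "ab"
ONE_HOT = {s: np.eye(len(ALPHABET))[i] for i, s in enumerate(ALPHABET)}

rng = np.random.default_rng(0)
n_inputs, n_states = len(ALPHABET), 4    # 4 state units, chosen arbitrarily

# Learnable parameters: weights, biases and the initial state.
W_in  = rng.normal(0.0, 0.5, (n_states, n_inputs))  # input-to-state weights
W_rec = rng.normal(0.0, 0.5, (n_states, n_states))  # state-to-state weights
b     = np.zeros(n_states)                          # state biases
w_out = rng.normal(0.0, 0.5, n_states)              # state-to-output weights
x0    = np.zeros(n_states)                          # initial state

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(x, u):
    """One discrete time step: next state from current state x, input u."""
    return sigmoid(W_rec @ x + W_in @ u + b)

def accept(string):
    """Acceptor mode: the label is produced only after the whole string."""
    x = x0
    for symbol in string:
        x = step(x, ONE_HOT[symbol])
    return bool(sigmoid(w_out @ x) > 0.5)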
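Transducer mode reuses the same step function but reads (and here
thresholds) the output after every input symbol, producing one output
symbol per input symbol; the output alphabet "01" is again a
hypothetical choice:

def transduce(string, out_alphabet="01"):
    """Synchronous transduction: one output symbol per input symbol."""
    x, out = x0, []
    for symbol in string:
        x = step(x, ONE_HOT[symbol])
        # The output unit is read after *every* symbol and thresholded
        # into one of two hypothetical output symbols.
        out.append(out_alphabet[int(sigmoid(w_out @ x) > 0.5)])
    return "".join(out)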
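Predictor mode replaces the single output unit by one output per
alphabet symbol and normalizes the scores with a softmax; the resulting
distribution can also drive a probabilistic generator. A practical
predictor would normally include an end-of-string symbol, omitted here
for brevity:

# One output unit per alphabet symbol (again randomly initialized).
W_pred = rng.normal(0.0, 0.5, (len(ALPHABET), n_states))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def successor_probs(string):
    """Predictor mode: a distribution over the possible next symbols."""
    x = x0
    for symbol in string:
        x = step(x, ONE_HOT[symbol])
    return softmax(W_pred @ x)

def generate(max_len=10):
    """Probabilistic generator: sample successive symbols from the net."""
    s = ""
    for _ in range(max_len):
        s += rng.choice(list(ALPHABET), p=successor_probs(s))
    return s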
 
When DTRNN are used for grammatical inference, the following have to
be defined:
- A learning set. The learning set may contain: strings labeled as
  belonging or not to a language, or as belonging to a class in a
  finite set of classes (recognition/classification task); a draw of
  unlabeled strings, possibly with repetitions, generated according to
  a given probability distribution (prediction/generation task); or
  pairs of strings (translation/transduction task). A toy learning set
  is sketched after this list.
 
- An encoding of input symbols as input signals for the DTRNN. This
  defines the number of input lines of the DTRNN (the sketches above
  use a one-hot encoding).
 
- An interpretation of outputs: as labels, as probabilities of
  successor symbols, or as transduced symbols. This defines the number
  of output units of the DTRNN.
 
- A suitable DTRNN architecture, including the number of state units
  and the number of units in any other hidden layers.
 
- Initial values for the learnable parameters of the DTRNN (weights, biases and initial
  states).
 
- A learning algorithm (including a suitable error function and a
  suitable stopping criterion) and a presentation scheme (the whole
  learning set may be presented from the beginning, or a staged
  presentation may be devised). A toy training loop is sketched after
  this list.
 
- An extraction mechanism to obtain an automaton or grammar rules from
  the weights of the DTRNN. This will be discussed in detail in
  section 5.4; a minimal sketch also follows this list.
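As an example of the first item, here is a toy labeled learning set for
a recognition task: strings over {a, b} labeled 1 when they contain an
even number of a's. This regular language is chosen here purely for
illustration; it is not a language discussed in the text:

from itertools import product

def parity_label(s):
    # 1 if the string contains an even number of a's, 0 otherwise.
    return int(s.count("a") % 2 == 0)

learning_set = [("".join(p), parity_label("".join(p)))
                for n in range(1, 5)
                for p in product("ab", repeat=n)]
# e.g. ("ab", 0), ("aab", 1), ("bbbb", 1), ...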
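Next, a toy training loop for the acceptor sketch above, showing the
role of the error function (here, mean cross-entropy over the learning
set) and of the stopping criterion (an error threshold plus an epoch
limit). For brevity the gradient is estimated by central finite
differences; an actual implementation would use backpropagation through
time or real-time recurrent learning. Note that the initial state x0 is
treated as a learnable parameter, as listed above:

def net_output(params, string):
    W_in, W_rec, b, w_out, x0 = params
    x = x0
    for symbol in string:
        x = sigmoid(W_rec @ x + W_in @ ONE_HOT[symbol] + b)
    return sigmoid(w_out @ x)

def error(params):
    """Mean cross-entropy over the learning set (the error function)."""
    eps, total = 1e-9, 0.0
    for s, label in learning_set:
        y = net_output(params, s)
        total -= label * np.log(y + eps) + (1 - label) * np.log(1 - y + eps)
    return total / len(learning_set)

params = [W_in, W_rec, b, w_out, x0]   # initial states are learnable too
lr, h = 1.0, 1e-5
for epoch in range(2000):              # stopping criterion: error threshold
    if error(params) < 0.05:           # reached, or epoch limit exhausted
        break
    for p in params:
        flat = p.ravel()               # in-place view of the parameter array
        grad = np.empty_like(flat)
        for i in range(flat.size):     # toy central-difference gradient;
            old = flat[i]              # stands in for BPTT / RTRL
            flat[i] = old + h; e_plus = error(params)
            flat[i] = old - h; e_minus = error(params)
            flat[i] = old
            grad[i] = (e_plus - e_minus) / (2.0 * h)
        flat -= lr * grad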
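Finally, one common family of extraction mechanisms quantizes the
continuous state space of the trained network and reads finite-state
transitions off the quantized states. The following sketch
(thresholding each state unit at 0.5) only illustrates the idea treated
in section 5.4; real methods must also resolve conflicting transitions
and assign accepting states:

def quantize(x):
    # Threshold each state unit at 0.5; each resulting binary tuple is
    # one cluster (a candidate automaton state).
    return tuple(int(v > 0.5) for v in x)

def extract_automaton(strings):
    transitions = {}                 # (cluster, symbol) -> cluster
    for s in strings:
        x = x0
        for symbol in s:
            q_from = quantize(x)
            x = step(x, ONE_HOT[symbol])
            # A real extractor would detect and resolve conflicts here
            # (the same cluster and symbol leading to different clusters).
            transitions[(q_from, symbol)] = quantize(x)
    return quantize(x0), transitions

initial_state, transitions = extract_automaton([s for s, _ in learning_set])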
 
 
 
 
 
 