Much of the work performed by researchers in the area of contact between neural networks and formal language and computation theory concerns the training of sigmoid-based DTRNN to identify or approximate finite-state machines and, in particular, regular language acceptors such as deterministic finite-state automata (DFA). This is the subject of chapter 5. Most of this work starts by assuming that a given DTRNN architecture is capable of performing the same computation as a finite-state machine (FSM). This section addresses the following question: ``When does a DTRNN behave as an FSM?''
In a recent paper, Casey (1996) has shown that a DTRNN performing a robust DFA-like computation (a special case of FSM-like computation) must organize its state space into mutually disjoint, closed sets with nonempty interiors, each corresponding to a state of the DFA. Each of these sets may be taken to contain all points such that, if the DTRNN is initialized at any one of them, it produces the same output as the DFA initialized in the corresponding state. This formulation, which is closely connected to what Pollack (1991) called a dynamical recognizer, has inspired the following definition. A DTRNN behaves as an FSM when the following conditions hold:
Note that these regions may have a nondenumerable number of points, because they are subsets of $\mathbb{R}^n$ for some $n$. However, for a finite input alphabet $\Sigma$, only a denumerable number of points in the state and output spaces are actually visited by the net for the set of all possible finite-length input strings over $\Sigma$, denoted $\Sigma^*$, which is also denumerable.
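To make the state-space picture concrete, the following sketch (not from the source; the gain value, names, and the choice of automaton are illustrative assumptions) hand-crafts a one-neuron second-order sigmoid DTRNN that behaves like the two-state parity DFA: with a high enough gain, every visited state falls into one of two disjoint regions, one per DFA state.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def parity_dtrnn(string, gain=10.0):
    """Single-neuron second-order DTRNN whose state x stays near 0 or 1.
    The next state approximates XOR(x, u) via x + u - 2*x*u, thresholded at 0.5.
    """
    x = sigmoid(-0.5 * gain)          # start near 0 (DFA start state: even parity)
    for ch in string:
        u = 1.0 if ch == "1" else 0.0
        x = sigmoid(gain * (x + u - 2.0 * x * u - 0.5))
    return x > 0.5                    # accept iff the state lies in the "odd" region

def parity_dfa(string):
    """Reference DFA: accept strings containing an odd number of '1's."""
    q = 0
    for ch in string:
        if ch == "1":
            q = 1 - q
    return q == 1

random.seed(0)
for _ in range(200):
    s = "".join(random.choice("01") for _ in range(random.randint(0, 20)))
    assert parity_dtrnn(s) == parity_dfa(s)
print("DTRNN matches the parity DFA on all sampled strings")
```

Although $\Sigma^*$ is denumerable, the states visited by the network cluster around two attracting values (one near 0, one near 1), which is exactly the kind of region structure described above.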
Note also that DFA represent a special case: as said in section 2.3.3, deterministic finite-state automata may be seen as Moore or Mealy machines having an output alphabet $\Gamma = \{\mathrm{Y}, \mathrm{N}\}$ whose output is only examined after the last symbol of the input string has been presented, and which is such that, for a Moore machine,

$$\lambda(q_i) = \mathrm{Y} \quad \text{if } q_i \in F \eqno{(5.6)}$$

$$\lambda(q_i) = \mathrm{N} \quad \text{otherwise} \eqno{(5.7)}$$

where $\lambda$ is the output function and $F$ the set of accepting states.
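The DFA-as-Moore-machine view above can be sketched in a few lines; the function and variable names and the example automaton below are illustrative assumptions, not from the source.

```python
def run_moore_dfa(delta, lam, q0, string):
    """Run a Moore machine but, as in the DFA special case, return only the
    output symbol emitted after the last input symbol has been read."""
    q = q0
    for symbol in string:
        q = delta[(q, symbol)]
    return lam[q]

# Example: a DFA over {a, b} accepting strings that end in 'b'.
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
# Output function lambda: "Y" on accepting states, "N" elsewhere.
lam = {"q0": "N", "q1": "Y"}

print(run_moore_dfa(delta, lam, "q0", "ab"))   # final state q1 -> "Y"
print(run_moore_dfa(delta, lam, "q0", "ba"))   # final state q0 -> "N"
```

Intermediate outputs are still produced at every step of the Moore machine; the special case consists precisely in ignoring all of them except the last.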
Recently, Šíma (1997) has shown that the behavior of any DTRNN using threshold activation functions may be stably emulated by another DTRNN using activation functions from a very general class, which includes the sigmoid functions considered here, and has described the conditions under which this emulation remains correct for inputs of any length. This means that any of the constructions described in section 4.2.1 to encode FSM in DTRNN may be converted, using Šíma's (1997) method, into an equally capable analog DTRNN.
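The following is a simplified sketch of the underlying idea only, not Šíma's actual construction: scaling the affine argument of each unit by a large gain keeps the sigmoid states within a small distance of the threshold network's binary states, so quantizing the analog trajectory recovers the discrete one. The gain value, the toy network, and all names are illustrative assumptions.

```python
import math
import itertools

def step(z):
    return 1.0 if z >= 0.0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A tiny threshold DTRNN: one recurrent neuron acting as a "seen a 1" latch,
# x_{t+1} = step(x_t + u_t - 0.5).
def threshold_net(inputs):
    x = 0.0
    for u in inputs:
        x = step(x + u - 0.5)
    return x

# Analog emulation: the same affine form multiplied by a large gain H.
# The sigmoid state stays close to {0, 1}, so the small perturbation never
# crosses the decision boundary and rounding recovers the binary trajectory.
def sigmoid_net(inputs, H=20.0):
    x = 0.0
    for u in inputs:
        x = sigmoid(H * (x + u - 0.5))
    return x

for bits in itertools.product([0.0, 1.0], repeat=8):
    assert round(sigmoid_net(bits)) == threshold_net(bits)
print("sigmoid net stably emulates the threshold net on all length-8 inputs")
```

Šíma's analysis makes precise how large the gain must be, for which activation-function classes this works, and why the emulation then remains correct for inputs of any length rather than only up to a fixed length as checked here.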