Omlin and Giles (1996a) and Omlin and Giles (1996b) have used a second-order recurrent neural network (similar to the one used by Giles et al. (1992), Pollack (1991), Forcada and Carrasco (1995), Watrous and Kuhn (1992), and Zeng et al. (1993)), which may be formulated as a Mealy NSM described by a next-state function whose $i$-th coordinate ($i = 1, \ldots, n_X$) is

\[
x_i[t] = g\!\left( \sum_{j=1}^{n_X} \sum_{k=1}^{n_U} W^{xxu}_{ijk}\, x_j[t-1]\, u_k[t] + W^x_i \right) \qquad (4.13)
\]
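To make the form of (4.13) concrete, here is a minimal NumPy sketch of a single state update; the function name next_state_second_order, the array names W_xxu and W_x, and the choice of the logistic function for g are illustrative assumptions, not notation fixed by the text.

import numpy as np

def logistic(x):
    """Logistic sigmoid, bounded by 0 and 1 (one common activation choice)."""
    return 1.0 / (1.0 + np.exp(-x))

def next_state_second_order(x_prev, u, W_xxu, W_x, g=logistic):
    """One step of a second-order next-state function in the style of (4.13).

    x_prev : (n_X,)           previous state x[t-1]
    u      : (n_U,)           current input u[t]
    W_xxu  : (n_X, n_X, n_U)  second-order weights
    W_x    : (n_X,)           biases
    """
    # x_i[t] = g( sum_j sum_k W_xxu[i, j, k] * x_prev[j] * u[k] + W_x[i] )
    return g(np.einsum('ijk,j,k->i', W_xxu, x_prev, u) + W_x)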
Activation functions are usually required to be real-valued, monotonically increasing, continuous (very often also differentiable), and bounded; they are usually nonlinear. Two commonly used examples of differentiable activation functions are the logistic function $1/(1+e^{-x})$, which is bounded by 0 and 1, and the hyperbolic tangent $\tanh(x)$, which is bounded by $-1$ and $1$. Differentiability is usually required because it allows the use of gradient-based learning algorithms. There are also a number of architectures that do not use sigmoid-like activation functions but instead use radial basis functions (Haykin (1998), ch. 5; Hertz et al. (1991), p. 248), which are not monotonic but Gaussian-like, reaching their maximum value at a particular value of their input. DTRNN architectures using radial basis functions have been used by Frasconi et al. (1996) and Cid-Sueiro et al. (1994).
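Purely as an illustration of the three kinds of activation functions mentioned above, they might be written as follows; the Gaussian center and width parameters are arbitrary placeholders, not values taken from the text.

import numpy as np

def logistic(x):
    """Logistic function 1 / (1 + exp(-x)): monotonic, bounded by 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

def hyperbolic_tangent(x):
    """Hyperbolic tangent: monotonic, bounded by -1 and 1."""
    return np.tanh(x)

def gaussian_rbf(x, center=0.0, width=1.0):
    """Gaussian-like radial basis function: not monotonic; reaches its
    maximum when x equals center (center and width are illustrative)."""
    return np.exp(-((x - center) / width) ** 2)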
Another Mealy NSM is the one defined by Robinson and Fallside (1991) under the name of recurrent error propagation network, a first-order DTRNN whose next-state function has an $i$-th coordinate ($i = 1, \ldots, n_X$) given by

\[
x_i[t] = g\!\left( \sum_{j=1}^{n_X} W^{xx}_{ij}\, x_j[t-1] + \sum_{k=1}^{n_U} W^{xu}_{ik}\, u_k[t] + W^x_i \right) \qquad (4.14)
\]
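For comparison with the second-order sketch above, a first-order update in the style of (4.14) could be written as below; again, the function and weight names and the use of tanh as the activation are assumptions made for illustration.

import numpy as np

def next_state_first_order(x_prev, u, W_xx, W_xu, W_x, g=np.tanh):
    """One step of a first-order next-state function in the style of (4.14).

    x_prev : (n_X,)      previous state x[t-1]
    u      : (n_U,)      current input u[t]
    W_xx   : (n_X, n_X)  state-to-state weights
    W_xu   : (n_X, n_U)  input-to-state weights
    W_x    : (n_X,)      biases
    """
    # x_i[t] = g( sum_j W_xx[i, j] * x_prev[j] + sum_k W_xu[i, k] * u[k] + W_x[i] )
    return g(W_xx @ x_prev + W_xu @ u + W_x)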