Introduction:
The cognitive task of the model
Pollack's Recursive Auto-Associative Memory (RAAM) was among a number of models developed in the late 1980's to address the question of how compositional structures may be stored within a connectionist framework. At the time of its development, there was a debate raging over the relative merits of symbolic and connectionist systems. The ability of symbolic AI systems to store and manipulate combinatorial structures of arbitrary size and complexity had proved to be a great advantage in many applications including language processing and strategic planning. The connectionist systems of the day, in contrast, typically dealt with activation vectors of a predetermined size or slid their data through a fixed-width buffer. Critics thus charged that they lacked the representational adequacy needed for higher level cognitive tasks.
The data and input/output task
The aim of the RAAM architecture was to demonstrate how certain typical kinds of compositional structures may be stored within a connectionist system. We describe here the storage of binary trees, though the approach has also been used for sequences and for trees of higher valence.
The data for a RAAM network consist of a collection of trees and a representation (pattern of 0's and 1's) for each terminal symbol occurring in those trees. The task for the network is to provide a means of compressing each tree into a representation (activation vector), and reconstructing a tree from its representation.
The network
The RAAM architecture is very similar to that of encoder networks (Ackley et al., 1985), consisting of a compressor network and a reconstructor network. The principal difference is that in a RAAM the compressor and reconstructor are used recursively to encode and decode, respectively.
For example, to encode the tree (E (F G)), we would first feed (F G) into the compressor network, giving output a. We would then feed in (E a), giving output b. To decode, we feed b into the reconstructor network, giving (E a). Some kind of `terminal test' would then be applied to both E and a to tell us that E is a terminal (requiring no further decoding), while a is a non-terminal which must be fed again into the reconstructor - giving (F G).
Since the activation vectors for the terminals are pre-determined and consist entirely of 0's and 1's, while those of the non-terminals are determined dynamically in the course of training and may take on any intermediate values between 0 and 1, it is convenient to adopt the following as our `terminal test': if all the activation values are close to 0 or 1 (i.e. <0.2 or >0.8), we assume that a terminal symbol has been reached and round off the activations to 0 or 1. Otherwise we continue to decode.
Memory:
Information to be stored, and mechanisms of storage
There are really two kinds of information being stored by the network: firstly, the representations of the trees (and subtrees); secondly, their structural relationship to each other. The latter is what is directly computed by the network, while the former may be recovered indirectly by recursive application of the compressor network.
Describe how the mechanisms achieve the memory
The RAAM network implements what may be called a moving target learning algorithm in the sense that the representations of the non-terminals are not fixed in advance but determined dynamically in the course of training, as follows:
At each training step, the representations of the left and right children of a given subtree are fed into the compressor network and whatever output emerges is taken to be the new representation of that subtree. This output is then fed into the reconstructor and the combined compressor-reconstructor network is trained as a two-layer backpropagation network to reproduce the representations that were input (in the same manner as the encoder networks of Ackley et al., 1985). The representations to be learned are thus changing from one epoch to the next, but eventually the system settles down into a stable state such that all the trees can be compressed and reconstructed accurately.
Time:
Since the trees are compressed and reconstructed recursively, combinatorial structure is in some sense converted into temporal structure. The network is able to store an entire tree by essentially `chopping' it up into its component branches and then processing each branch one at a time in discrete time steps. This is true both in the training phase and the recall phase. Storage or retrieval of a tree requires a whole number of discrete time steps equal to the number of branches in the tree.
Change:
Both the representations of the subtrees and the structural mappings between them are changing over time in a kind of co-evolution as the training proceeds. Generally, the representations for subtrees of a given depth cannot settle down until after the representations for trees at lower depth have already settled, so there is a natural relaxation process by which the representations for subtrees of depth 1, 2, etc. settle one after the other until the whole system eventually stabilizes. It is noteworthy that the network is able to learn the structural relationships between representations even though those representations are changing from one epoch to the next.
Structure:
How structure in the world is represented as structure in the model
The left and right reconstructor networks form an Iterated Function System or IFS (Barnsley, 1988). The trees, embedded by representation into a `hypercube' of possible activation vectors, represent transient states on the way to the attractor of this IFS. The way a RAAM compresses the structure of the trees into these two maps can thus be compared with Fractal Image Compression (Barnsley, 1988) which - relying on the self-similarity of natural scenes - is able to encode an image compactly by calculating and storing the IFS maps that will re-create that image.
Generalization and its causes
RAAM networks show a considerable capacity for generalization - i.e. storage and retrieval of new trees that were not in the training set. Two factors seem to facilitate this generalization: firstly, there is the modular, recursive structure of the stored data, which allows new trees to be built from existing subtrees; secondly, the fact that the compressor and reconstructor are trained together as a single network encourages them each to adapt to the other's behaviour, so they are more likely to work properly together when confronted with new (unseen) input.
Discussion and Conclusions:
Like all neural network models, the exact details of the RAAM architecture make it a little too contrived to be an accurate description of what really goes on in the brain as it performs a complex task. It does however demonstrate:
(1) how compositional structured objects may
be stored within a connectionist system,
(2) how the same network weights, by
recursive encoding, may be used to
store different parts of a combinatorial
structure at different levels,
(3) how different parts of a complex network
may co-evolve with each other in order
to work better together as the whole
system develops.
References
Pollack, J.B. 1990.
Recursive Distributed Representations,
Artificial Intelligence
46(1), 77-105.
Ackley, D.H., G.E. Hinton & T.J. Sejnowski, 1985.
A learning algorithm for Boltzman Machines,
Cognitive Science 9,
147-169.
Barnsley, M.F. 1988. Fractals Everywhere
(Academic Press, San Diego CA).