Simon Dennis
Department of Psychology
The University of Queensland
mav@psy.uq.oz.au
http://psy.uq.oz.au/~mav/home.html
*Vector based accounts are those that conceptualize their inputs and outputs as vectors. The class includes both the feature/localistic representations of Minerva II (Hintzman, 1984) and the distributed representations of TODAM (Murdock, 1982), CHARM (Eich, 1982) and the Matrix model (Pike, 1984; Humphreys, Bain & Pike, 1989b).
Most mathematical models in the memory literature concentrate on the inputs and outputs of memory tasks. Significant focus has been placed on input variables such as list length, number of repetitions, presentation rate, item spacing and item similarity. Output variables such as accuracy, reaction time and familiarity ratings have also figured prominently, although accuracy has been studied most intensively.
In contrast to these earlier mathematical models, in error-correcting backpropagation models such as the Hebbian Recurrent Network (Dennis, 1993), performance is determined not only by the nature of the mechanism but also by the parameters of the training set. Such mechanisms allow the inclusion of a new set of variables that correspond to the statistics of the environment the subject faces on a day-to-day basis.
For the purposes of this paper, the variables will be divided into input, output and environmental parameters. Table 1 outlines this division and presents a summary of some of these psychological variables and the corresponding model variables including those that will be addressed in this article.
In general, the methods used to represent input variables in the vector based models translate very directly to connectionist models. For instance, words are often represented by vectors, the similarity of which is taken to correspond to experimental notions of similarity. Since connectionist models also take vectors as input, the same assumptions can be made. Therefore, in this article, I will concentrate on the output and environmental variables. The two major output variables that have been considered in the memory literature are accuracy and reaction time. The following two sections summarize this work and describe some of the more recent methods that can be applied to connectionist memory models. The fourth section discusses environmental variables and includes a review of a case study of constructing a training set based on environmental analysis.
Table 1: Summary of some of the psychological and model variables which have been taken to correspond in recent models of human memory. The first column describes the psychological variable which might be observed either in the laboratory setting or in the environment. The second column identifies the model construct which has been associated with that psychological variable and the third column lists the models in which this correspondence has been made. The correspondences listed as "possible" have not been made in the memory literature, but are very natural extensions within a connectionist framework.
Input Variables
| Psychological Variable | Model Variable | Model |
|---|---|---|
| list length in the experimental setting | list length in the test set | SAM, TODAM, Minerva II, Matrix, HRN, Ratcliff90 |
| # of repetitions in the experimental setting | # of repetitions in the test set | SAM, TODAM, Minerva II, Matrix, HRN, Chappell93 |
| presentation rate in the experimental setting | presentation rate in the test set | SAM, TODAM, Matrix, Chappell93 |
| item spacing in the experimental setting | item spacing in the test set | SAM, Minerva II |
| vocabulary | # different items in the test set | SAM, TODAM, Minerva II, Matrix, HRN, Chappell93, Ratcliff90 |
| similarity of experimental words | dot product of vectors representing words | TODAM, CHARM, Matrix, Minerva II |
Output Variables
| Psychological Variable | Model Variable | Model |
|---|---|---|
| accuracy i.e. d' in the experimental setting | accuracy i.e. d' based on familiarity compared against criterion | SAM, TODAM, CHARM, Minerva II, Matrix |
| accuracy i.e. d' in the experimental setting | accuracy i.e. d' based on maximally active output node | HRN |
| accuracy i.e. d' in the experimental setting | accuracy i.e. d' based on pattern to which autoassociator converges | Chappell93 |
| reaction time | number of processing cycles to stability | Chappell93 |
| familiarity ratings | sum of cubed dot products | Minerva II |
Environmental Variables
| Psychological Variable | Model Variable | Model |
|---|---|---|
| word frequency in the environment e.g. text | word frequency in the training set | HRN |
| recurrence probability in the environment e.g. text, speech | recurrence probability in the training set | HRN |
| development time | training epoch | possible |
| learning trial | training epoch | possible |
In recognition memory tasks, subjects are required to respond either yes or no depending on whether they believe they saw the current test word in the study list. The literature on modeling recognition memory has been dominated by the global matching models (Humphreys, Pike, Bain & Tehan, 1989). While these models can differ significantly in detail, they all employ a signal detection framework which involves some mechanism for calculating a familiarity value that is subsequently compared against a criterion (see figure 1).
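The familiarity-versus-criterion decision can be sketched as follows. This is a minimal illustration rather than the implementation of any one global matching model: the function names, the bipolar toy vectors and the criterion value are assumptions made for the example, and familiarity is simplified to a sum of (optionally powered) raw dot products between the probe and each stored study item.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def familiarity(probe, traces, power=1):
    """Global match: sum the powered dot products of the probe with every trace.

    power=1 gives a plain summed match; power=3 gives a sum of cubed dot
    products (a simplified stand-in, not a full Minerva II implementation).
    """
    return sum(dot(probe, trace) ** power for trace in traces)


def recognize(probe, traces, criterion, power=1):
    """Respond 'yes' (True) when familiarity exceeds the criterion."""
    return familiarity(probe, traces, power) > criterion


# Hypothetical study list of two bipolar item vectors.
study_list = [[1, -1, 1, -1], [1, 1, -1, -1]]
old_item = [1, -1, 1, -1]   # was studied
new_item = [1, 1, 1, 1]     # orthogonal to both studied items

print(recognize(old_item, study_list, criterion=2))  # True  (familiarity 4)
print(recognize(new_item, study_list, criterion=2))  # False (familiarity 0)
```

Moving the criterion trades hits against false alarms, which is what allows accuracy to be expressed as d' in this framework.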

A similar method can be applied to connectionist models. Ratcliff (1990) trained a three layer backpropagation network on a series of study vectors. The network was an autoassociator, that is, it was required to reproduce the input vector on the output units. To test the network, he presented it with the test item and calculated the familiarity measure by taking the dot product of the output with the input vector. This value was then compared against a criterion (see figure 2).

An alternative to using the signal detection framework is to dispense with external decision mechanisms and allow the network to generate a response. Hence, in the context of recognition the output becomes a yes/no node or perhaps one yes and one no node (see figure 3). Such an approach was suggested by Ratcliff (1990) and is used in the HRN (Dennis, 1993). This approach simplifies the model, but raises questions about whether the appropriate teaching information could be available within the human cognitive system.

In the recall paradigms, subjects are required to respond with an item from the study list. In the vector based models, a search through the lexicon is conducted and the target vector that is closest to (has the largest dot product with) the retrieved vector is chosen. The same mechanism can be applied in connectionist networks.
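The lexicon search described above amounts to a nearest-neighbour lookup under the dot product. The sketch below assumes a small hypothetical lexicon stored as a dictionary from word labels to vectors; the words and vector values are invented for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recall(retrieved, lexicon):
    """Return the lexicon word whose vector best matches the retrieved vector."""
    return max(lexicon, key=lambda word: dot(lexicon[word], retrieved))


# Hypothetical lexicon of bipolar word vectors.
lexicon = {
    "cat": [1, 1, -1, -1],
    "dog": [1, -1, 1, -1],
    "fox": [-1, 1, 1, -1],
}

# A noisy retrieved vector, e.g. the output of a memory model.
retrieved = [0.9, 0.8, -0.2, -1.1]
print(recall(retrieved, lexicon))  # cat
```

The response is then scored as correct if the chosen word is the studied target.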
As an alternative, Chappell (1993) used a Hopfield style autoassociative memory as a second phase to retrieval. The Hopfield style network was allowed to cycle until it was stable (i.e. stopped changing). The resulting vector was either one of the target vectors in which case it was scored as a hit, or a zero vector, incorrect target vector or linear combination of target vectors in which case it was scored as a miss.
Another alternative is to use a local rather than distributed code for the output vectors (cf. Elman, 1989, 1990). The node with maximal output is chosen as the response of the network and is scored appropriately.
A final issue for consideration when calculating accuracy is what a single network corresponds to. One possibility is to simulate a single network and to compare these results against the aggregate results from subjects. For many feedforward architectures such an approach is appropriate because final performance is not affected unduly by the initial conditions. In some recurrent architectures, where significant variations are observed as a consequence of initial weight configurations and input pattern orderings (when using online training), it may be more appropriate to consider each network as a separate subject. This is the approach adopted in the work on the Hebbian Recurrent Network, and it opens the possibility of studying individual differences which arise as a consequence of systematically different subject histories.
Of the vector based models, only TODAM (Murdock, 1982) includes an explicit mechanism for determining reaction time in recognition (see figure 4). The mechanism employed involves adding a time varying noise component to the familiarity value. Two criteria are set. A yes decision is made when the signal plus noise exceeds the high criterion and a no decision is made when the signal plus noise is under the low criterion. When the familiarity value is between the criteria, the system waits for the noise to take the value out of the middle ground. Since the mechanism is based on the familiarity value it can be applied as easily to any of the global matching models.
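A two-criterion decision of this kind can be sketched as below. The sketch assumes that the time varying noise is an independent zero-mean Gaussian sample drawn on every time step; the noise distribution, the parameter values and the function names are illustrative assumptions, not details taken from TODAM.

```python
import random


def timed_decision(familiarity, low, high, noise_sd, rng, max_steps=1000):
    """Return (response, steps) for one recognition trial.

    On each step fresh noise is added to the familiarity value. The trial
    ends with 'yes' above the high criterion or 'no' below the low one.
    """
    for step in range(1, max_steps + 1):
        value = familiarity + rng.gauss(0.0, noise_sd)
        if value > high:
            return "yes", step
        if value < low:
            return "no", step
    return None, max_steps  # no decision reached within max_steps


def mean_steps(familiarity, trials=2000, seed=0):
    """Average number of steps to a decision over many simulated trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        _, steps = timed_decision(familiarity, -1.0, 1.0, 1.0, rng)
        total += steps
    return total / trials


# Familiarity midway between the criteria should take longer on average
# than familiarity that already lies beyond the high criterion.
print(mean_steps(0.0), mean_steps(1.5))
```

Under these assumptions, items whose familiarity falls between the criteria wait longer for the noise to carry them out of the middle ground, so simulated reaction time grows as the familiarity value approaches the midpoint.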

While such a mechanism could be adopted in a connectionist framework, there are several architectures that have been suggested in which reaction time is intrinsic to the network model. One such possibility is to use the Hopfield style autoassociative "clean up" process (Chappell, 1993, see above). Each "processing cycle" of the network corresponds to a unit of time in the experimental situation. The reaction time is the number of cycles the network takes to reach stability. Such a model has been shown to have utility in modeling the related task of perceptual identification (Masson, 1989).
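The cycles-to-stability measure can be illustrated with a small Hopfield style network. The sketch assumes bipolar (+1/-1) units, Hebbian outer-product storage with no self-connections, and asynchronous updates in a fixed unit order, with one sweep through all units counted as one processing cycle. These choices, and the toy patterns, are assumptions made for the example; the network used by Chappell (1993) may differ.

```python
def train(patterns):
    """Hebbian outer-product weights with a zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def clean_up(w, probe, max_cycles=100):
    """Cycle until stable. Returns (final_state, cycles_that_changed_state)."""
    state = list(probe)
    n = len(state)
    for cycle in range(max_cycles):
        changed = False
        for i in range(n):
            h = sum(w[i][j] * state[j] for j in range(n))
            # Keep the current value when the net input is exactly zero.
            new = state[i] if h == 0 else (1 if h > 0 else -1)
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            return state, cycle
    return state, max_cycles


# Two orthogonal target patterns (hypothetical items).
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([p1, p2])

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]   # p1 with its first unit flipped
final, cycles = clean_up(weights, noisy)
print(final == p1, cycles)  # True 1
```

A probe that is already a stored pattern settles in zero changing cycles, while a degraded probe needs at least one, so the cycle count behaves like a simple reaction time measure. The same final state can be scored as a hit when it matches the target vector and as a miss otherwise.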
Another possibility is to use a time averaged net input to the activation function (Cohen, Dunbar & McClelland, 1990; Reilly, 1993). That is,
$$\mathrm{net}_j(t) = \tau \sum_i a_i(t)\, w_{ij}(t) + (1 - \tau)\, \mathrm{net}_j(t-1)$$
where $\mathrm{net}_j(t)$ is the net input to unit $j$ at time $t$, $\tau$ is a decay parameter, $a_i(t)$ is the activation of unit $i$ at time $t$, and $w_{ij}(t)$ is the weight between unit $i$ and unit $j$ at time $t$. As the activation of an output node increases, evidence is accumulated suggesting that it is the required output. A criterion is set and the network is said to have responded when the evidence exceeds this criterion. Such an approach was adopted successfully by Cohen et al. (1990) in their model of the Stroop effect and by Reilly (1993) in his model of eye movement in reading.
A third possibility is to time average the activation itself (Mozer, 1992; Cottrell, Nguyen & Tsung, 1993), that is:
$$a_j(t) = \tau\, f(\mathrm{net}_j(t)) + (1 - \tau)\, a_j(t-1)$$
where $f$ is the activation function. Mozer suggested including $\tau$ as one of the parameters to be optimized. Cottrell et al. (1993) found that such a system was more robust in tracking sinusoidal waves. The performance on more complicated tasks is an open question.
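How a time averaged net input yields a reaction time can be seen in a small sketch. It assumes the exponential averaging form given in the docstring, a logistic activation function, a constant input pattern, and, as a simplification, treats the output activation itself as the evidence that is compared against the criterion. The cited models may accumulate evidence differently, and all names and parameter values here are illustrative.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def cycles_to_respond(inputs, weights, tau, criterion, max_cycles=1000):
    """Return (winning_output_index, cycles) for a constant input pattern.

    Assumed update: net_j(t) = tau * sum_i a_i * w_ij + (1 - tau) * net_j(t-1),
    starting from net_j(0) = 0. weights[i][j] is the weight from input i to
    output j. The network responds once an output activation exceeds criterion.
    """
    n_out = len(weights[0])
    net = [0.0] * n_out
    for t in range(1, max_cycles + 1):
        for j in range(n_out):
            drive = sum(a * weights[i][j] for i, a in enumerate(inputs))
            net[j] = tau * drive + (1.0 - tau) * net[j]
        acts = [logistic(x) for x in net]
        winner = max(range(n_out), key=lambda j: acts[j])
        if acts[winner] > criterion:
            return winner, t
    return None, max_cycles  # criterion never reached


# Hypothetical two-input, two-output network.
weights = [[3.0, -3.0],
           [-3.0, 3.0]]
fast = cycles_to_respond([1.0, 0.0], weights, tau=0.5, criterion=0.9)
slow = cycles_to_respond([1.0, 0.0], weights, tau=0.1, criterion=0.9)
print(fast, slow)
```

With a smaller averaging parameter the net input builds up more slowly, so the same stimulus takes more cycles to reach the criterion. Weaker weights have a similar effect, which is what lets the cycle count stand in for reaction time.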
At this stage it is too early to give a comprehensive list of the important environmental variables or the possible mechanisms for providing environmental sensitivity. Instead, a review of a case study (Dennis, 1993) is presented to provide a concrete example of the nature of these variables and the way in which they can affect connectionist models.
The recognition task, however, involves determining whether a word has been repeated from the study list, that is, repeated within the current list context. It concerns accuracy on the recurrence of the word, not the occurrence of the word. If recurrence frequency in current English usage is inversely related to occurrence frequency, a connectionist model which was sensitive to recurrence frequency might conform to the experimental data.
To test this hypothesis, Dennis (1993) examined three hundred messages from the connectionist mailing list to determine the recurrence probabilities of high and low frequency words. Each message was considered as a separate context. Before counting began, the messages were filtered to remove mail headers, ftp instructions, punctuation, misspellings, and function words.
To quantify the likelihood of recurrence of a word within the same context, the word density was defined as the number of occurrences of the word divided by the number of contexts in which it occurred.
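The word density measure defined above can be computed directly from tokenized contexts. In the sketch below each context is a list of word tokens; the toy messages are invented for illustration, and the filtering steps applied in the original analysis are assumed to have been carried out already.

```python
from collections import Counter


def word_density(contexts):
    """Map each word to (total occurrences) / (number of contexts containing it)."""
    occurrences = Counter()
    context_counts = Counter()
    for tokens in contexts:
        occurrences.update(tokens)          # every occurrence counts
        context_counts.update(set(tokens))  # each context counts once per word
    return {word: occurrences[word] / context_counts[word] for word in occurrences}


# Hypothetical filtered messages, one token list per context.
contexts = [
    ["network", "learns", "network"],
    ["network", "memory"],
    ["hopfield", "hopfield", "hopfield"],
]
density = word_density(contexts)
print(density["network"])   # 1.5
print(density["hopfield"])  # 3.0
```

A density of 1 means the word never recurs within a context in which it appears, and larger values indicate that the word tends to recur once it has occurred.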

Figure 5 shows the average word density plotted against word frequency. On this sample, it is the case that low frequency words have higher recurrence rates than high frequency words.
A training set which reflected these environmental constraints was constructed. Certain input patterns were designated high frequency words and the rest were considered low frequency. The training set consisted of a series of study sequences followed by test trials. The frequency of the high frequency patterns in both the study and test phases was always greater than that of the low frequency patterns. The probability of a low frequency word being tested given that it was in the study list, however, was greater than for high frequency words.
When presented with the above training set the HRN model (Dennis, 1993) showed a low frequency word advantage in recognition.
The success of the work in this study depended on a careful examination of the task and the selection of the environmental variable (in this case recurrence probability) that most closely reflected the task. Once this variable had been identified, however, it was still necessary to determine how to map it into the statistics of the training set. The practicalities of simulation require that the environmental statistics be simplified, but in so doing one must avoid overlooking environmental constraints which may alter the results of simulation. For example, it would have been possible to ensure that the recurrence rate of high frequency words was lower than that of low frequency words, but to neglect to ensure that the high frequency words continued to occur with higher frequency. Clearly, this would have seriously undermined the validity of the results.
It is in the area of environmental constraints, however, that I believe connectionist models have the most to offer. While environmental variables have not been completely absent from mathematical models of human memory, they have not received sufficient focus and the mechanisms by which they have been introduced have been indirect. Bridging assumptions about the effects that exposure might have on the development of representations, and on the storage and retrieval mechanisms have been required. By contrast, the effect of the environment is intrinsic to error-correcting connectionist models.