
The Correspondence Between Psychological and Network Variables in Connectionist Models of Human Memory

Simon Dennis
Department of Psychology
The University of Queensland
mav@psy.uq.oz.au
http://psy.uq.oz.au/~mav/home.html

Abstract

The field of human memory has a long history of mathematical modelling. Determining how the many psychological variables examined in the empirical literature relate to the constructs in these models is a well developed topic. This paper provides brief summaries of how two of the most prominent memory performance indices, accuracy and reaction time, have been modelled in vector based accounts*. In addition, some possible extensions suited to connectionist architectures are presented. Finally, the way in which connectionist models allow a more direct treatment of environmental variables is illustrated with a review of a case study involving the word frequency effect in recognition memory.

*Vector based accounts are those that conceptualize their inputs and outputs as vectors. The class includes both the feature/localistic representations of Minerva II (Hintzman, 1984) and the distributed representations of TODAM (Murdock, 1982), CHARM (Eich, 1982) and the Matrix model (Pike, 1984; Humphreys, Bain & Pike, 1989b).

Introduction

Vector based models of human memory such as TODAM (Murdock, 1982), CHARM (Eich, 1982), Minerva II (Hintzman, 1984) and the matrix model (Pike, 1984; Humphreys et al., 1989b), which share many features with connectionist models, have been proposed by memory researchers since the 1970s (see Anderson, Silverstein, Ritz & Jones, 1977). There is a long history of relating the psychological variables observed in the laboratory to those present in these models, and it is therefore worthwhile to review these attempts when determining the correspondence in connectionist models.

Most mathematical models in the memory literature concentrate on the inputs and outputs of memory tasks. Significant focus has been placed on input variables such as list length, number of repetitions, presentation rate, item spacing and item similarity. Output variables such as accuracy, reaction time and familiarity ratings have also figured prominently, although accuracy has been studied most intensively.

In contrast to these earlier mathematical models, in error-correcting backpropagation models such as the Hebbian Recurrent Network (Dennis, 1993), performance is determined not only by the nature of the mechanism but also by the parameters of the training set. Such mechanisms allow the inclusion of a new set of variables that correspond to the statistics of the environment the subject faces on a day-to-day basis.

For the purposes of this paper, the variables will be divided into input, output and environmental parameters. Table 1 outlines this division and presents a summary of some of these psychological variables and the corresponding model variables including those that will be addressed in this article.

In general, the methods used to represent input variables in the vector based models translate very directly to connectionist models. For instance, words are often represented by vectors, the similarity of which is taken to correspond to experimental notions of similarity. Since connectionist models also take vectors as input, the same assumptions can be made. Therefore, in this article I will concentrate on the output and environmental variables. The two major output variables that have been considered in the memory literature are accuracy and reaction time. The following two sections summarize this work and describe some of the more recent methods that can be applied to connectionist memory models. The fourth section discusses environmental variables and includes a review of a case study of constructing a training set based on environmental analysis.

Table 1: Summary of some of the psychological and model variables which have been taken to correspond in recent models of human memory. The first column describes the psychological variable which might be observed either in the laboratory setting or in the environment. The second column identifies the model construct which has been associated with that psychological variable and the third column lists the models in which this correspondence has been made. The correspondences listed as "possible" have not been made in the memory literature, but are very natural extensions within a connectionist framework.

Input Variables

| Psychological Variable | Model Variable | Models |
| --- | --- | --- |
| list length in the experimental setting | list length in the test set | SAM, TODAM, Minerva II, Matrix, HRN, Ratcliff90 |
| # of repetitions in the experimental setting | # of repetitions in the test set | SAM, TODAM, Minerva II, Matrix, HRN, Chappell93 |
| presentation rate in the experimental setting | presentation rate in the test set | SAM, TODAM, Matrix, Chappell93 |
| item spacing in the experimental setting | item spacing in the test set | SAM, Minerva II |
| vocabulary | # of different items in the test set | SAM, TODAM, Minerva II, Matrix, HRN, Chappell93, Ratcliff90 |
| similarity of experimental words | dot product of vectors representing words | TODAM, CHARM, Matrix, Minerva II |

Output Variables

| Psychological Variable | Model Variable | Models |
| --- | --- | --- |
| accuracy (d') in the experimental setting | accuracy (d') based on familiarity compared against a criterion | SAM, TODAM, CHARM, Minerva II, Matrix |
| accuracy (d') in the experimental setting | accuracy (d') based on the maximally active output node | HRN |
| accuracy (d') in the experimental setting | accuracy (d') based on the pattern to which the autoassociator converges | Chappell93 |
| reaction time | number of processing cycles to stability | Chappell93 |
| familiarity ratings | sum of cubed dot products | Minerva II |

Environmental Variables

| Psychological Variable | Model Variable | Models |
| --- | --- | --- |
| word frequency in the environment (e.g. text) | word frequency in the training set | HRN |
| recurrence probability in the environment (e.g. text, speech) | recurrence probability in the training set | HRN |
| development time | training epoch | possible |
| learning trial | training epoch | possible |
Note: The accuracy with which a subject produces the correct response to a memory test is commonly reported as a d' value, i.e. d' = z(hit rate) - z(false alarm rate), where z is the inverse of the standard normal cumulative distribution function (the z-score transformation).
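The d' computation in the note can be sketched in a few lines of Python. This is a minimal illustration, assuming hit and false alarm rates strictly between 0 and 1 (rates of exactly 0 or 1 give infinite z values and need a correction first); the function name `d_prime` is ours, and the standard library's `statistics.NormalDist.inv_cdf` supplies the z transformation.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """d' = z(hit rate) - z(false alarm rate), with z the inverse standard normal CDF."""
    for rate in (hit_rate, false_alarm_rate):
        if not 0.0 < rate < 1.0:
            # z is infinite at 0 and 1, so such rates must be adjusted before use.
            raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

print(round(d_prime(0.8, 0.2), 3))  # 1.683
```

Equal hit and false alarm rates give d' = 0 (no discrimination), and d' grows as the two rates move apart.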

Accuracy

In the recent mathematical memory literature the most common tasks that have been addressed are recognition, cued recall and free recall.

In recognition memory tasks, subjects are required to respond either yes or no depending on whether they believe they saw the current test word in the study list. The literature on modeling recognition memory has been dominated by the global matching models (Humphreys, Pike, Bain & Tehan, 1989). While these models can differ significantly in detail, they all employ a signal detection framework which involves some mechanism for calculating a familiarity value that is subsequently compared against a criterion (see figure 1).

Figure 1: Signal detection theory in recognition memory. A familiarity value is calculated and compared against a criterion. If the current word is new, its familiarity value will be sampled from the Distractor distribution. If the word is old, it will be sampled from the Target distribution.
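As an illustration of the familiarity-plus-criterion scheme, the sketch below uses a Minerva II-style familiarity, the sum of cubed dot products listed in Table 1. It is a simplification under stated assumptions: items are random ±1 feature vectors, similarity is the dot product divided by the number of features, and the criterion of 0.5 is arbitrary. None of these values comes from the models reviewed here.

```python
import random

def similarity(probe, trace):
    """Dot product normalized by the number of features."""
    return sum(p * t for p, t in zip(probe, trace)) / len(probe)

def echo_intensity(probe, memory):
    """Minerva II-style familiarity: sum of cubed similarities to every stored trace."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)

def recognize(probe, memory, criterion=0.5):
    """Respond 'yes' when familiarity exceeds the criterion, otherwise 'no'."""
    return "yes" if echo_intensity(probe, memory) > criterion else "no"

rng = random.Random(0)

def random_item(n_features=50):
    return [rng.choice((-1, 1)) for _ in range(n_features)]

study_list = [random_item() for _ in range(10)]   # stored traces
distractors = [random_item() for _ in range(10)]  # never studied
print([recognize(p, study_list) for p in study_list + distractors])
```

Cubing keeps the sign of each similarity while letting the single close match dominate the many weak ones, which is what separates the Target and Distractor distributions in this toy setting.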

A similar method can be applied to connectionist models. Ratcliff (1990) trained a three layer backpropagation network on a series of study vectors. The network was an autoassociator; that is, it was required to reproduce the input vector on the output units. To test the network, he presented it with the test item and calculated the familiarity measure as the dot product of the output vector with the input vector. This value was then compared against a criterion (see figure 2).
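The procedure can be sketched in pure Python. This is a toy version, not Ratcliff's (1990) simulation: the layer sizes, tanh hidden units, linear output units, absence of bias terms, learning rate and number of epochs are all assumptions chosen for illustration. The familiarity measure is the dot product of the output with the input, as described above.

```python
import math
import random

def forward(w1, w2, x):
    """Hidden (tanh) and output (linear) activations for input x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    y = [sum(w * hj for w, hj in zip(row, h)) for row in w2]
    return h, y

def train_autoassociator(patterns, n_hidden=6, epochs=1500, lr=0.02, seed=0):
    """Online backpropagation on squared reconstruction error (target = input)."""
    rng = random.Random(seed)
    n = len(patterns[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n)]
    for _ in range(epochs):
        for x in patterns:
            h, y = forward(w1, w2, x)
            dy = [yk - xk for yk, xk in zip(y, x)]  # output error
            # Error at the hidden units, computed before w2 is changed.
            dh = [sum(w2[k][j] * dy[k] for k in range(n)) * (1.0 - h[j] ** 2)
                  for j in range(n_hidden)]
            for k in range(n):
                for j in range(n_hidden):
                    w2[k][j] -= lr * dy[k] * h[j]
            for j in range(n_hidden):
                for i in range(n):
                    w1[j][i] -= lr * dh[j] * x[i]
    return w1, w2

def familiarity(w1, w2, probe):
    """Dot product of the network's output with the probe (input) vector."""
    _, y = forward(w1, w2, probe)
    return sum(yk * pk for yk, pk in zip(y, probe))

rng = random.Random(1)
study = [[rng.choice((-1.0, 1.0)) for _ in range(16)] for _ in range(3)]
new = [[rng.choice((-1.0, 1.0)) for _ in range(16)] for _ in range(5)]
w1, w2 = train_autoassociator(study)
old_f = [familiarity(w1, w2, p) for p in study]
new_f = [familiarity(w1, w2, p) for p in new]
```

The resulting `old_f` and `new_f` values play the roles of the Target and Distractor familiarities and would then be compared against a criterion exactly as in the signal detection scheme.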

Figure 2: Using signal detection theory with a backpropagation network.

An alternative to the signal detection framework is to dispense with external decision mechanisms and allow the network itself to generate a response. In the context of recognition, the output then becomes a single yes/no node, or perhaps one yes node and one no node (see figure 3). Such an approach was suggested by Ratcliff (1990) and is used in the HRN (Dennis, 1993). This approach simplifies the model, but raises questions about whether the appropriate teaching information could be available within the human cognitive system.

Figure 3: Allowing the network to make the recognition decision.

In the recall paradigms, subjects are required to respond with an item from the study list. In the vector based models, a search through the lexicon is conducted and the target vector that is closest to the retrieved vector (i.e. has the largest dot product with it) is chosen. The same mechanism can be applied in connectionist networks.
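The lexicon search amounts to an argmax over dot products. A minimal sketch, assuming the lexicon is a dictionary from words to vectors; the three 4-feature word vectors are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def recall(retrieved, lexicon):
    """Return the word whose vector has the largest dot product with `retrieved`."""
    return max(lexicon, key=lambda word: dot(retrieved, lexicon[word]))

lexicon = {  # hypothetical word vectors
    "cat": [1, 1, -1, -1],
    "dog": [1, -1, 1, -1],
    "hat": [-1, 1, 1, -1],
}
print(recall([0.9, 0.8, -0.7, -1.2], lexicon))  # cat
```

Applying the same `max` to the activations of localist output units would implement the maximally-active-node response used with local output codes.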

As an alternative, Chappell (1993) used a Hopfield-style autoassociative memory as a second phase of retrieval. The Hopfield-style network was allowed to cycle until it was stable (i.e. stopped changing). If the resulting vector was one of the target vectors, it was scored as a hit; if it was a zero vector, an incorrect target vector or a linear combination of target vectors, it was scored as a miss.
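A minimal sketch of such a clean-up phase, not Chappell's (1993) implementation, assuming ±1 units, Hebbian outer-product weights with no self-connections, and asynchronous updates in a fixed order. One "cycle" is taken here to be one full sweep over the units; the cycle count is returned as well, because the number of processing cycles to stability is the model variable Table 1 associates with reaction time. With ±1 units the zero vector cannot occur, so misses in this sketch come only from incorrect targets or spurious mixture states.

```python
import random

def hebbian_weights(patterns):
    """Outer-product (Hebbian) weights with no self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def settle(w, cue, max_cycles=100):
    """Asynchronous updates in fixed order until a full sweep changes nothing.

    Returns (final state, number of cycles), where one cycle is one sweep.
    """
    state = list(cue)
    n = len(state)
    for cycle in range(1, max_cycles + 1):
        changed = False
        for i in range(n):
            h = sum(w[i][j] * state[j] for j in range(n))
            new = state[i] if h == 0 else (1 if h > 0 else -1)
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            return state, cycle
    return state, max_cycles

def score(final_state, targets):
    """Hit if the network settled on a stored target, otherwise a miss."""
    return "hit" if final_state in targets else "miss"

rng = random.Random(0)
targets = [[rng.choice((-1, 1)) for _ in range(64)] for _ in range(3)]
w = hebbian_weights(targets)
cue = list(targets[0])
for i in rng.sample(range(64), 5):  # corrupt five features of the first target
    cue[i] = -cue[i]
final, cycles = settle(w, cue)
print(score(final, targets), cycles)
```

Asynchronous updating with symmetric, zero-diagonal weights is a standard choice because it guarantees that the network reaches a stable state, so the cycle count is always well defined.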

Another alternative is to use a local rather than a distributed code for the output vectors (cf. Elman, 1989, 1990). The node with maximal output is chosen as the response of the network and is scored as correct if it corresponds to the target item.

A final issue to consider when calculating accuracy is what a single network corresponds to. One possibility is to simulate a single network and compare its results against the aggregate results from subjects. For many feedforward architectures such an approach is appropriate because final performance is not unduly affected by the initial conditions. In some recurrent architectures, where significant variation arises from the initial weight configuration and the order of input patterns (when using online training), it may be more appropriate to treat each network as a separate subject. This is the approach adopted in the work on the Hebbian Recurrent Network, and it opens the possibility of studying individual differences that arise from systematically different subject histories.

Reaction Time

In the area of human memory there has been relatively little work on modelling reaction time compared with accuracy (see Ratcliff, 1978; Murdock, 1982, for exceptions). Yet reaction time is an important ingredient in model identification (Anderson, 1990), as demonstrated by the reaction time distributions for both recognition and cued recall that have recently been collected (Nobel & Shiffrin, 1992).

Of the vector based models, only TODAM (Murdock, 1982) includes an explicit mechanism for determining reaction time in recognition (see figure 4). The mechanism adds a time-varying noise component to the familiarity value, and two criteria are set. A yes decision is made when the signal plus noise exceeds the high criterion, and a no decision is made when the signal plus noise falls below the low criterion. When the value lies between the criteria, the system waits for the noise to take it out of the middle ground. Since the mechanism operates on the familiarity value, it can be applied just as easily to any of the global matching models.
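The two-criterion mechanism can be sketched as follows. This is a simplified reading of the description above, not Murdock's (1982) equations: we assume fresh, independent Gaussian noise at each time step, and the criteria and noise level are illustrative values. The number of steps taken stands in for reaction time.

```python
import random

def two_criterion_decision(familiarity, low, high, noise_sd=1.0,
                           max_steps=10_000, rng=None):
    """Add new noise to `familiarity` each step until the value leaves [low, high].

    Returns (decision, steps); steps plays the role of reaction time.
    """
    rng = rng or random.Random()
    for step in range(1, max_steps + 1):
        value = familiarity + rng.gauss(0.0, noise_sd)
        if value > high:
            return "yes", step
        if value < low:
            return "no", step
    return None, max_steps  # no decision within the time limit

def mean_rt(familiarity, trials=500, seed=0, **criteria):
    """Average number of steps to a decision over many simulated trials."""
    rng = random.Random(seed)
    return sum(two_criterion_decision(familiarity, rng=rng, **criteria)[1]
               for _ in range(trials)) / trials

print(mean_rt(0.0, low=-1.5, high=1.5), mean_rt(3.0, low=-1.5, high=1.5))
```

Familiarity values near the middle of the two criteria take longer to resolve, which is how the mechanism produces slower responses for items that are neither clearly old nor clearly new.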

Figure 4: The reaction time mechanism in TODAM. Reproduced from Murdock (1982).

While such a mechanism could be adopted in a connectionist framework, there are several architectures that have been suggested in which reaction time is intrinsic to the network model. One such possibility is to use the Hopfield style autoassociative "clean up" process (Chappell, 1993, see above). Each "processing cycle" of the network corresponds to a unit of time in the experimental situation. The reaction time is the number of cycles the network takes to reach stability. Such a model has been shown to have utility in modeling the related task of perceptual identification (Masson, 1989).

Another possibility is to use a time-averaged net input to the activation function (Cohen, Dunbar & McClelland, 1990; Reilly, 1993). That is,

$$\mathrm{net}_j(t) = \tau \sum_i a_i(t)\, w_{ij}(t) + (1 - \tau)\, \mathrm{net}_j(t-1)$$

where $\mathrm{net}_j(t)$ is the net input to unit j at time t, $\tau$ is a decay parameter, $a_i(t)$ is the activation of unit i at time t, and $w_{ij}(t)$ is the weight between unit i and unit j at time t. As the activation of an output node increases, evidence accumulates that it is the required output. A criterion is set, and the network is said to have responded when the evidence exceeds this criterion. Such an approach was adopted successfully by Cohen et al. (1990) in their model of the Stroop effect and by Reilly (1993) in his model of eye movements in reading.
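A time-averaged net input of this kind can be sketched as below: at each step the net input becomes τ times the current weighted sum of input activations plus (1 - τ) times the previous net input. To keep the sketch short we assume fixed input activations and fixed weights during a trial, a logistic activation function, and, as a simplification, that the output activation itself serves as the evidence compared against the criterion. The function name and parameter values are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def cascade_response(inputs, weights, tau=0.1, criterion=0.8, max_steps=1000):
    """Respond when the most active output unit's activation exceeds `criterion`.

    weights[j][i] is the (fixed) weight from input unit i to output unit j.
    Returns (index of responding output unit, time step) or (None, max_steps).
    """
    net = [0.0] * len(weights)
    for t in range(1, max_steps + 1):
        for j, row in enumerate(weights):
            instantaneous = sum(a * w for a, w in zip(inputs, row))
            # Time-averaged net input: new input weighted by tau, old by (1 - tau).
            net[j] = tau * instantaneous + (1.0 - tau) * net[j]
        activations = [logistic(n) for n in net]
        best = max(range(len(activations)), key=activations.__getitem__)
        if activations[best] > criterion:
            return best, t
    return None, max_steps

print(cascade_response([1.0, 1.0], [[2.0, 2.0], [0.5, 0.5]]))  # (0, 5)
```

Because the net input approaches its asymptote gradually, a unit with stronger input crosses the criterion in fewer steps, so the response time emerges from the dynamics rather than from a separate decision stage.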

A third possibility is to time-average the activation itself (Mozer, 1992; Cottrell, Nguyen & Tsung, 1993), that is:

$$a_j(t) = \tau\, f\Big(\sum_i a_i(t)\, w_{ij}(t)\Big) + (1 - \tau)\, a_j(t-1)$$

where f is the asymmetric sigmoid and the other symbols are as defined above. Mozer (1992) improved the induction of temporal structure in a recurrent network using this activation equation, but found it sensitive to the choice of $\tau$. Mozer suggested including $\tau$ as one of the parameters to be optimized. Cottrell et al. (1993) found that such a system was more robust in tracking sinusoidal waves. Performance on more complicated tasks is an open question.
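A sketch of a time-averaged activation for a single unit: at each step the activation becomes τ times f of the current net input plus (1 - τ) times the previous activation. We assume the logistic function for the asymmetric sigmoid f and a net-input sequence supplied by the caller; the helper names are hypothetical. The sketch shows why the choice of τ matters: with τ = 1 the unit reduces to an ordinary instantaneous unit, while smaller values make it respond more slowly to a change in input.

```python
import math

def logistic(x):
    """Asymmetric (0 to 1) sigmoid, assumed here for f."""
    return 1.0 / (1.0 + math.exp(-x))

def time_averaged_activation(net_inputs, tau, a0=0.0):
    """a(t) = tau * f(net(t)) + (1 - tau) * a(t-1); returns the activation trajectory."""
    a = a0
    trajectory = []
    for net in net_inputs:
        a = tau * logistic(net) + (1.0 - tau) * a
        trajectory.append(a)
    return trajectory

def first_crossing(trajectory, threshold):
    """Index of the first step whose activation exceeds `threshold`, or None."""
    return next((t for t, a in enumerate(trajectory) if a > threshold), None)

fast = time_averaged_activation([2.0] * 50, tau=0.5)
slow = time_averaged_activation([2.0] * 50, tau=0.1)
print(first_crossing(fast, 0.5), first_crossing(slow, 0.5))
```

Both trajectories converge to the same asymptote, f(2.0); only the speed of approach depends on τ, which is one way to see why a learned τ can change the temporal behaviour of a network.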

Environmental Variables

Error-correcting connectionist models, such as the HRN (Dennis, 1993), differ from many mathematical memory models in that they are sensitive to the environment. Quite different behaviour can be expected depending on the statistics of the training set. Variables such as word frequency, recurrence probability and the word density within the training set can qualitatively and quantitatively change the behaviour of a connectionist network. Consequently, it is important to determine the statistics of the environment so as to justify the parameters of the training set.

At this stage it is too early to give a comprehensive list of the important environmental variables or the possible mechanisms for providing environmental sensitivity. Instead, a review of a case study (Dennis, 1993) is presented to provide a concrete example of the nature of these variables and the way in which they can affect connectionist models.

Environmental Analysis Case Study

The word frequency effect in recognition is one of the most robust effects found in the memory literature. Words that occur with high normative frequency in the language are recognized more poorly than low frequency words (e.g. Gorman, 1961; Schulman & Lovelace, 1970), a result which has spawned significant debate in the literature (Allen & Garton, 1968; McCormack & Swenson, 1972; Schulman, 1967; Gorman, 1961; Zechmeister, 1969, 1972; Kinsbourne & George, 1974; Zechmeister, Curt & Sebastian, 1978; Underwood & Freund, 1970; Underwood, 1972; Mandler, 1980; Gillund & Shiffrin, 1984; Glanzer & Bowles, 1967; Glanzer & Adams, 1990; Glanzer, Adams & Iverson, 1991). Word frequency is the most widely studied environmental variable and provides a solid starting point for evaluating the sensitivity of connectionist models to the environment. On first examination, however, one would expect a connectionist network to favour the high frequency words, since they will be seen more often and there will be more opportunity for the network to adapt itself to these words.

The recognition task, however, involves determining whether a word has been repeated from the study list, that is, repeated within the current list context. It is about accuracy on the recurrence of the word, not the occurrence of the word. If recurrence frequency in current English usage is inversely proportional to occurrence frequency, a connectionist model which was sensitive to recurrence frequency might conform to the experimental data.

To test this hypothesis, Dennis (1993) examined three hundred messages from the connectionist mailing list to determine the recurrence probabilities of high and low frequency words. Each message was considered as a separate context. Before counting began, the messages were filtered to remove mail headers, ftp instructions, punctuation, misspellings, and function words.

To quantify the likelihood of recurrence of a word within the same context, the word density was defined as the number of occurrences of the word divided by the number of contexts in which it occurred.
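Word frequency and word density can be computed as below. This is a sketch under stated assumptions: the tokenizer and the tiny function-word list are hypothetical stand-ins for the filtering described above (mail headers, ftp instructions and misspellings are not handled), and each message is treated as one context, as in the study.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list for illustration only.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on", "for"}

def tokenize(message):
    """Lower-case, keep alphabetic tokens only, drop function words."""
    return [w for w in re.findall(r"[a-z]+", message.lower())
            if w not in FUNCTION_WORDS]

def frequency_and_density(messages):
    """Word frequency, and word density = occurrences / contexts containing the word."""
    frequency, contexts = Counter(), Counter()
    for message in messages:  # each message is one context
        tokens = tokenize(message)
        frequency.update(tokens)
        contexts.update(set(tokens))
    density = {w: frequency[w] / contexts[w] for w in frequency}
    return frequency, density

messages = [
    "The network learns the weights.",
    "Backpropagation, backpropagation and more backpropagation.",
    "Is the network stable? The weights say yes.",
]
frequency, density = frequency_and_density(messages)
```

In this toy sample "network" occurs in two messages once each (density 1.0), whereas "backpropagation" occurs three times within a single message (density 3.0), which is the kind of within-context recurrence the measure is designed to capture.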

Figure 5: Mean word density versus word frequency. The bars represent the 95% confidence intervals. Reproduced from Dennis (1993).

Figure 5 shows the average word density plotted against word frequency. On this sample, it is the case that low frequency words have higher recurrence rates than high frequency words.

A training set which reflected these environmental constraints was constructed. Certain input patterns were designated high frequency words and the rest were considered low frequency. The training set consisted of a series of study sequences followed by test trials. The frequency of the high frequency patterns in both the study and test phases was always greater than that of the low frequency patterns. The probability of a low frequency word being tested given that it was in the study list, however, was greater than for high frequency words.
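One way such a training set might be generated is sketched below. All parameter values (vocabulary sizes, list length, sampling weights, test probabilities) are hypothetical and are not those used by Dennis (1993). Study items and new test items are drawn in proportion to normative frequency, while a studied item is tested with a class-dependent probability; the `summarize` helper checks that both constraints hold together.

```python
import random

def weighted_sample(items, weights, k, rng):
    """Draw k distinct items, each draw proportional to the remaining weights."""
    pool, w = list(items), list(weights)
    chosen = []
    for _ in range(min(k, len(pool))):
        idx = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(idx))
        w.pop(idx)
    return chosen

def make_trial(hf, lf, rng, list_length=6, n_new=2, hf_weight=8.0, lf_weight=1.0,
               p_test_hf=0.3, p_test_lf=0.7):
    """One study list plus test trials as (item, studied?) pairs."""
    def weight(item):
        return hf_weight if item in hf else lf_weight
    vocab = hf + lf
    study = weighted_sample(vocab, [weight(v) for v in vocab], list_length, rng)
    # Recurrence: a studied item is tested with a class-dependent probability.
    old = [s for s in study if rng.random() < (p_test_hf if s in hf else p_test_lf)]
    # New test items follow normative frequency, so HF items stay more frequent.
    unstudied = [v for v in vocab if v not in study]
    new = weighted_sample(unstudied, [weight(v) for v in unstudied], n_new, rng)
    return study, [(s, True) for s in old] + [(v, False) for v in new]

def summarize(n_trials, hf, lf, rng):
    """Per-item study/test frequency and P(tested | studied) for each class."""
    counts = {c: {"study": 0, "test": 0, "recurred": 0} for c in ("hf", "lf")}
    for _ in range(n_trials):
        study, tests = make_trial(hf, lf, rng)
        for s in study:
            counts["hf" if s in hf else "lf"]["study"] += 1
        for item, studied in tests:
            c = counts["hf" if item in hf else "lf"]
            c["test"] += 1
            c["recurred"] += studied
    size = {"hf": len(hf), "lf": len(lf)}
    return {k: {"study_per_item": v["study"] / size[k],
                "test_per_item": v["test"] / size[k],
                "p_test_given_study": v["recurred"] / v["study"]}
            for k, v in counts.items()}

HF = [f"H{i}" for i in range(5)]   # hypothetical high frequency patterns
LF = [f"L{i}" for i in range(20)]  # hypothetical low frequency patterns
stats = summarize(2000, HF, LF, random.Random(0))
```

Checking the per-item frequencies and the recurrence probabilities side by side makes it harder to satisfy one environmental constraint while silently violating the other.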

When presented with the above training set the HRN model (Dennis, 1993) showed a low frequency word advantage in recognition.

The success of the work in this study depended on a careful examination of the task and the selection of the environmental variable (in this case recurrence probability) that most closely reflected the task. Once this variable had been identified, however, it was still necessary to determine how to map it into the statistics of the training set. The practicalities of simulation require that the environmental statistics be simplified, but in so doing one must avoid overlooking environmental constraints which may alter the results of simulation. For example, it would have been possible to ensure that the recurrence rate of high frequency words was lower than that of low frequency words, but neglect to ensure that the high frequency words continued to occur with higher frequency. Clearly, this would have seriously undermined the validity of the results.

Summary and Conclusions

The number and variety of psychological and model variables that have been considered in the memory literature is too great to be adequately covered in a short review article. With this in mind, I have selected the two most significant output variables considered to date, namely accuracy and reaction time, and have attempted to briefly summarize the methods used in the vector based predecessors of connectionist models and to outline extensions to current connectionist architectures that may be useful in accounting for these indices of performance.

It is in the area of environmental constraints, however, that I believe connectionist models have the most to offer. While environmental variables have not been completely absent from mathematical models of human memory, they have not received sufficient focus and the mechanisms by which they have been introduced have been indirect. Bridging assumptions about the effects that exposure might have on the development of representations, and on the storage and retrieval mechanisms have been required. By contrast, the effect of the environment is intrinsic to error-correcting connectionist models.

References