
How Far Do Neural Net Models Account for Human Reasoning?

Graeme S. Halford
Psychology Department
University of Queensland

and

William H. Wilson
School of Computer Science and Engineering
University of New South Wales

Author's address: Graeme Halford, Psychology Department, University of Queensland, 4072, Australia. Fax: 61 7 365 4466. Email: gsh@psych.psy.uq.oz.au

We would like to consider how far neural net models account for some of the higher human reasoning processes, and then address the methodological issue of how we can assess neural network performance relative to that of humans. We will start with a specific reasoning problem, the balance scale, on which the performance of children and adults has been modelled by McClelland (1995).

The balance scale comprises a beam on a pivot with pegs equally spaced from the fulcrum on each side. One or more weights can be placed on the pegs. The beam balances when the product of weight and distance on the left equals the product of weight and distance on the right, that is, when $W_l \times D_l = W_r \times D_r$, where W and D denote weight and distance, and the subscripts l and r denote the left and right sides. The literature has been reviewed elsewhere (Halford, 1993; McClelland, 1995), but for our purposes it is sufficient to note that participants have been asked to perform a number of tasks, each of which requires them to specify one factor given the rest. Perhaps the commonest procedure is to set up weights on both sides, with the beam on chocks, and ask subjects to predict which side will go down, or whether the beam will balance. That is, they are asked to predict the balance-state, a variable with three levels: left-side down, balance, and right-side down. We will refer to this as the balance-state question.
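The balance-state question reduces to comparing the two torques. A minimal Python sketch, assuming weights are counted in units and distances are peg numbers counted from the fulcrum; the function name and the output labels are our own illustrative choices:

```python
def balance_state(w_left: int, d_left: int, w_right: int, d_right: int) -> str:
    """Predict the balance-state from the principle of moments.

    Torque on each side is the number of weights times the peg distance
    from the fulcrum; the side with the larger torque goes down.
    """
    torque_left = w_left * d_left
    torque_right = w_right * d_right
    if torque_left > torque_right:
        return "left-down"
    if torque_left < torque_right:
        return "right-down"
    return "balance"


# Three weights on peg 2 (left) against two weights on peg 3 (right): 6 = 6.
print(balance_state(3, 2, 2, 3))  # balance
```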

In another procedure, subjects are shown weights on one side and a specific peg on the other side, and are asked what weight would be required on that peg for the beam to balance. For example, if there are three weights on peg 2 (2 steps out from the fulcrum) on the left, how many weights would be required on peg 3 on the right side to make the beam balance? We will refer to this as the missing weight question.

In the third type of procedure they can be shown a set of weights on a peg on one side, and asked on which peg a specified number of weights would have to be placed on the other side if the beam were to balance. For example, they might be shown 3 weights on peg 4 on the right, and asked on which peg 6 weights would need to be placed on the left side to make the beam balance. We will refer to this as the missing distance question.
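When the beam is to balance, the missing weight and missing distance questions are the same equation solved for a different unknown. A short sketch using exact fractions, so that a non-integer answer (one with no corresponding number of weights or peg on the real apparatus) stays visible; the function names are hypothetical:

```python
from fractions import Fraction


def missing_weight(w_known: int, d_known: int, d_target: int) -> Fraction:
    """Weight needed on peg d_target to balance w_known weights on peg d_known."""
    return Fraction(w_known * d_known, d_target)


def missing_distance(w_known: int, d_known: int, w_target: int) -> Fraction:
    """Peg on which w_target weights balance w_known weights on peg d_known."""
    return Fraction(w_known * d_known, w_target)


# Missing weight question: 3 weights on peg 2 (left); how many on peg 3 (right)?
print(missing_weight(3, 2, 3))    # 2
# Missing distance question: 3 weights on peg 4 (right); where do 6 weights go (left)?
print(missing_distance(3, 4, 6))  # 2
```

For instance, balancing 3 weights on peg 2 with weights on peg 4 would require 3/2 weights, which no real set of weights can supply.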

We will assume that a person with reasonable understanding of the balance scale would be able to answer any of these questions, provided that extraneous factors such as numerical competence were eliminated. Therefore, as a first approximation, we will define understanding of the balance scale as a cognitive representation, or mental model, of the relations between $W_l$, $D_l$, $W_r$, $D_r$ and the balance-state. Representation of these relations would mean that, given values for any four of these variables, the person can predict the fifth. More generally, for a concept involving N variables, any subset of N-1 variables can be input, and the output is the remaining variable. We call this the flexibility property.
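To make the flexibility property concrete, the balance scale can be treated as an explicit relation, a set of 5-tuples (W_l, D_l, W_r, D_r, balance-state), and any question answered by fixing four positions and collecting the values that fill the fifth. The sketch assumes 1 to 6 weights and pegs 1 to 6, enough to cover the examples above; the helper names are illustrative only:

```python
from itertools import product

LEVELS = range(1, 7)  # assumption: 1-6 weights and pegs 1-6, covering the examples above


def state(wl, dl, wr, dr):
    """Balance-state from the principle of moments."""
    tl, tr = wl * dl, wr * dr
    return "left-down" if tl > tr else "right-down" if tl < tr else "balance"


# The relation as an explicit set of 5-tuples (W_l, D_l, W_r, D_r, state).
BALANCE = {(wl, dl, wr, dr, state(wl, dl, wr, dr))
           for wl, dl, wr, dr in product(LEVELS, repeat=4)}


def complete(query):
    """Fill the single None in a 5-tuple with every value consistent with BALANCE."""
    missing = query.index(None)
    return {t[missing] for t in BALANCE
            if all(q is None or q == v for q, v in zip(query, t))}


print(complete((3, 2, 4, 1, None)))          # {'left-down'}  balance-state question
print(complete((3, 2, None, 3, "balance")))  # {2}            missing weight question
print(complete((6, None, 3, 4, "balance")))  # {2}            missing distance question
```

The same stored relation answers all three question types. An empty result means that no whole number of weights (or peg) balances the beam, and a query such as `(5, 5, None, 1, "left-down")` returns several values, because an unbalanced state constrains the missing variable only to a range.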

More complex performances, such as predicting what happens to one variable as another is varied while the rest are held constant, should also be possible for anyone who understands this concept. Such understanding has been examined using information integration theory (Surber & Gzesh, 1984). More importantly, given such understanding and appropriate experience, it should be possible to develop suitable strategies for dealing with the task. As planning net models in various fields have shown (Greeno, Riley, & Gelman, 1984; Halford, Smith, Dickson, Maybery, Kelly, Bain, et al., 1995), experience interacts with a concept of the task to constrain strategy development. The importance of this point is that it shows why understanding is essential to human cognition. It is not enough simply to model surface properties of performance; we need to model the underlying concepts that guide that performance.

Understanding of the concept therefore defines the target that neural net models are attempting to reach. If we want to model human reasoning, we have to encompass the "mental models", cognitive representations, or understandings that guide human cognition in its higher manifestations. This will therefore be the criterion we use in assessing the success of neural net models in this area. Interestingly, the task of assessing neural net models in this respect is analogous to the assessment of children's cognitive development. In both cases we want to know whether the processes being performed amount to understanding of the relevant concept. We will return to this point later. Armed with this criterion, we will now consider McClelland's connectionist model of the balance scale.

Connectionist model of the balance scale

McClelland's (1995) model of human performance on the balance scale is essentially a three-layered net. There are 20 input units that code the weights and distances on the left and right: for each of the 4 variables, there are 5 units, one for each of the 5 levels used. There is a hidden layer, which codes relations between weights and distances on each side of the balance, and there is a set of output units that code the balance-state: left-down, balance, or right-down.
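The input format can be illustrated with a localist encoding. This is a sketch under stated assumptions: one active unit per bank of 5, banks ordered W_l, D_l, W_r, D_r; the exact unit layout in McClelland (1995) may differ, and `encode_input` is a hypothetical name:

```python
LEVELS = 5  # levels per variable, as described for the model
VARIABLES = ("W_l", "D_l", "W_r", "D_r")  # assumed order of the four input banks


def encode_input(w_left, d_left, w_right, d_right):
    """Localist code: one active unit in each bank of 5, 20 units in total."""
    units = [0] * (LEVELS * len(VARIABLES))
    for bank, value in enumerate((w_left, d_left, w_right, d_right)):
        if not 1 <= value <= LEVELS:
            raise ValueError(f"value {value} outside 1..{LEVELS}")
        units[bank * LEVELS + (value - 1)] = 1
    return units


x = encode_input(3, 2, 4, 1)
print(len(x), sum(x))  # 20 4
```

Note that the encoding always takes all four weights and distances as input; there is no slot for supplying the balance-state and asking for a weight or distance.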

The model gives a good fit to existing data on the performance of children and adults. As the model is trained on a corpus of balance scale problems, its performance comes to approximate that of progressively older children, thereby simulating the progression observed by Siegler (1981) and others. It also captures phenomena neglected by earlier theories, including the torque difference effect: the size of the difference in torque between left and right affects judgment. This is interesting because there is no logical reason why human judgments should be influenced by the magnitude of the torque difference. For example, if there are 3 weights on peg 2 on the left and 4 weights on peg 1 on the right (a small torque difference), we know the left side will go down. The judgment is the same when the torque difference is large, such as when there are 5 weights on peg 5 on the left and 4 weights on peg 1 on the right. Though there is no logical basis for the magnitude of the torque difference having any effect in this task, it does affect human judgments. The neural net model captures this and other significant aspects of human performance that are neglected by more "rationally based" theories.
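The arithmetic behind the two examples is shown below; this is our own illustration. The problems differ greatly in the magnitude of the torque difference, but the normatively correct answer depends only on its sign:

```python
def torque_difference(w_left, d_left, w_right, d_right):
    """Signed torque difference; positive means the left side goes down."""
    return w_left * d_left - w_right * d_right


small = torque_difference(3, 2, 4, 1)  # 6 - 4
large = torque_difference(5, 5, 4, 1)  # 25 - 4
print(small, large)  # 2 21
# Logically both problems have the same answer, because only the sign matters.
print(small > 0, large > 0)  # True True
```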

It is clear, therefore, that neural net models have something to offer in this and similar domains. We can still ask, however, whether the three-layered net used in McClelland's model simulates understanding of the balance scale. When we refer to our previously stated criterion, we note that the neural net performs only one task, the balance-state question. In effect it computes one function: given $W_l$, $D_l$, $W_r$ and $D_r$ as input, it computes the balance-state. The trained net could not compute the functions that correspond to the missing weight or missing distance questions without retraining, and retraining may result in catastrophic forgetting, so that the model could no longer perform the balance-state task. The model seems inherently incapable of meeting our criterion for understanding the balance scale, the minimum requirement for which is that it can compute any of $W_l$, $D_l$, $W_r$, $D_r$, or the balance-state, given the other four. However, as we have seen, human performers, even children, can do this at least to some degree, so the model seems to fall short of human understanding in this respect. Given that the balance scale is a rather elementary concept, the model's performance, when judged by this criterion, seems rather modest, its achievements in accounting for the balance-state data notwithstanding.

Must we therefore be pessimistic about the prospects for neural nets accounting for human reasoning? We will argue not, but we think that further success will depend on better definition of the criteria for the relevant performances in each domain, together with a systematic comparison of neural net performances with these criteria. We will try to sketch out how this can be done. First, however, we will describe how understanding of the balance scale can be represented in a neural net architecture, though only at the cost of sacrificing many of the advantages of McClelland's model.

The balance scale can be represented as a Rank 5 tensor product $V_{BALANCE} \otimes V_{W_l} \otimes V_{D_l} \otimes V_{W_r} \otimes V_{D_r}$, in which there are vectors representing the balance-state, BALANCE, and each of the input variables, $W_l$, $D_l$, $W_r$, $D_r$. This is based on the approach of Halford and his collaborators (Halford, 1993; Halford, et al., 1994), in which tensor products are used to represent the binding between a vector representing a relation and vectors representing each of its arguments. This representation treats the balance scale as a quaternary relation, BALANCE($W_l$, $D_l$, $W_r$, $D_r$). The tensor product representation does fulfil at least the minimal criteria for understanding the concept as defined above, because any variable can be used as output, given the remaining variables as input. That is, given any four of $W_l$, $D_l$, $W_r$, $D_r$ and the balance-state, the remaining variable can be determined. Thus the balance-state question, as well as the missing weight and missing distance questions, can be answered. On the other hand, the model, at least in its present form, cannot account for the effects of training or simulate the torque-difference effect, both of which are important aspects of McClelland's model.
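A minimal sketch of the Rank 5 representation follows, assuming one-hot codes (which are trivially orthonormal) for the three balance-states and for five levels of each weight and distance. It is our own illustration, not the implementation used by Halford and colleagues: every instance of the relation is bound as a five-way tensor product and the results are superimposed; any one component is then retrieved by contracting the tensor with the vectors for the other four. The helper names are hypothetical:

```python
from itertools import product

LEVELS = tuple(range(1, 6))  # assumption: five levels per weight/distance variable
STATES = ("left-down", "balance", "right-down")


def code(position, value):
    """One-hot (hence orthonormal) vector for a value; position 0 is the state."""
    values = STATES if position == 0 else LEVELS
    return [1.0 if v == value else 0.0 for v in values]


def state(wl, dl, wr, dr):
    tl, tr = wl * dl, wr * dr
    return "left-down" if tl > tr else "right-down" if tl < tr else "balance"


def outer(vectors):
    """Tensor product of the vectors, stored sparsely as {index tuple: value}."""
    nonzero = [[(i, x) for i, x in enumerate(v) if x] for v in vectors]
    result = {}
    for combo in product(*nonzero):
        value = 1.0
        for _, x in combo:
            value *= x
        result[tuple(i for i, _ in combo)] = value
    return result


# Superimpose V_BALANCE ⊗ V_Wl ⊗ V_Dl ⊗ V_Wr ⊗ V_Dr for every instance of the relation.
tensor = {}
for wl, dl, wr, dr in product(LEVELS, repeat=4):
    args = (state(wl, dl, wr, dr), wl, dl, wr, dr)
    for idx, x in outer([code(k, a) for k, a in enumerate(args)]).items():
        tensor[idx] = tensor.get(idx, 0.0) + x


def retrieve(probe):
    """Contract the tensor with the four known vectors; probe has exactly one None."""
    target = probe.index(None)
    vectors = [None if p is None else code(k, p) for k, p in enumerate(probe)]
    out = [0.0] * len(STATES if target == 0 else LEVELS)
    for idx, x in tensor.items():
        for k, v in enumerate(vectors):
            if v is not None:
                x *= v[idx[k]]
        out[idx[target]] += x
    return out


def decode(position, vector):
    """Value whose code best matches the output, or None if nothing was retrieved."""
    values = STATES if position == 0 else LEVELS
    scores = [sum(a * b for a, b in zip(code(position, v), vector)) for v in values]
    best = max(scores)
    return values[scores.index(best)] if best > 0 else None


print(decode(0, retrieve((None, 3, 2, 4, 1))))          # left-down (balance-state)
print(decode(3, retrieve(("balance", 3, 2, None, 3))))  # 2 (missing weight)
print(decode(2, retrieve(("balance", 4, None, 2, 4))))  # 2 (missing distance)
```

Because one stored tensor answers all three question types, the sketch exhibits the flexibility property. Nothing in it is learned, however; the tensor is built directly from the relation.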

At present no neural net model appears capable of capturing all aspects of a task such as the balance scale, but there seems to be no reason in principle why such a neural net should not be produced. Weights can be trained to be multi-directional, which would permit the flexibility property of the tensor product model to be combined with the advantages of the three-layered model. A net that achieved this for a quaternary relation such as the balance scale would probably be very complex, might need a large number of hidden, or binding, units, and might pose difficult technical training problems, but that is not our primary concern here. Rather, we want to provide a conception of what needs to be achieved in modelling human reasoning, and then compare that goal with current modelling efforts.

The example of the balance scale illustrates the point that human concepts entail representation of relations, an argument that has been developed further elsewhere (Phillips, Halford & Wilson, 1995). The principle of moments, which underlies the balance scale, is essentially a quaternary relation. Other concepts can also be represented as relations, and the complexity of concepts can be defined in terms of the complexity of the relations to which they are equivalent (Halford, 1993; Halford & Wilson, in preparation; Halford, Wilson, Guo, Gayler, Wiles, & Stewart, 1994). Relational complexity can be defined in terms of the number of arguments that a relation has. This concept of complexity can be shown to account for processing load effects, as well as developmental and inter-species differences (Halford, 1993; Holyoak & Thagard, 1995). We will define relational complexity and give examples of concepts at each level. This will provide us with a basis for evaluating neural net models of human concepts.

Relational complexity of concepts

We will argue that relational complexity can be defined in terms of the "arity" of a relation, that is, the number of arguments that a relation has. The number of arguments corresponds to the number of dimensions of the space in which the relation is defined. An N-ary relation $R_N(a_1, a_2, \ldots, a_N)$ is a subset of the Cartesian product $S_1 \times S_2 \times \cdots \times S_N$. It is the set of ordered N-tuples $\{(a_1, a_2, \ldots, a_N), \ldots\}$ such that $R(a_1, a_2, \ldots, a_N)$ is true. In nontrivial cases, each $S_i$, that is, each argument of R, can be instantiated in more than one way, and therefore represents a source of variation, or dimension. The number of arguments, N, corresponds to the number of dimensions in the Cartesian product space, and therefore provides a measure of relational complexity. A proposition is a specific instantiation of a relation. It is the smallest unit of knowledge that can have a truth value (Anderson, 1980). It has two components: a predicate, which is a symbol for a relation, and one or more arguments. Each argument represents a specific instantiation on a dimension of possible instantiations. For example, the proposition BIGGER-THAN(dog, mouse) includes the relational symbol BIGGER-THAN, plus two arguments, dog and mouse. The arguments of BIGGER-THAN can be instantiated in other ways, so each argument provides a source of variation. Each instantiation yields a different proposition; BIGGER-THAN(whale, dolphin) instantiates the arguments of BIGGER-THAN in a new way, yielding a new proposition. Each proposition is a point in the space defined by the relation.
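The set-theoretic definition can be made concrete in a few lines. In this sketch the sizes are hypothetical values used only to generate instances of BIGGER-THAN. The relation is the subset of the Cartesian product for which the predicate holds, its arity is the length of its tuples, and each proposition is one member (a point) of that subset:

```python
from itertools import product

# Hypothetical sizes, used only to decide which pairs satisfy BIGGER-THAN.
SIZE = {"mouse": 1, "dog": 2, "dolphin": 3, "whale": 4}
ANIMALS = tuple(SIZE)

# A binary relation is a subset of the Cartesian product S x S.
cartesian = set(product(ANIMALS, repeat=2))
bigger_than = {(a, b) for a, b in cartesian if SIZE[a] > SIZE[b]}

arity = len(next(iter(bigger_than)))  # number of arguments = number of dimensions
print(arity)                                # 2
print(bigger_than <= cartesian)             # True
print(("dog", "mouse") in bigger_than)      # True: a proposition is one point
print(("whale", "dolphin") in bigger_than)  # True
print(("mouse", "dog") in bigger_than)      # False
```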

Dimensionality is analogous to the number of variables in an experimental design. An experimental design can be thought of as a set of relations between independent and dependent variables. A one-way experimental design is equivalent to a binary relation between one independent and one dependent variable. A two-way experimental design is equivalent to a ternary relation between two independent variables and one dependent variable. The emergence of interactions in two-way designs is analogous to the emergence of three-way comparisons in ternary relations. Just as an experimental design can be collapsed over factors that contribute no effects, the rank of a representation ignores components that do not contribute to the decision process.

We can define relations of varying complexity depending on the number of arguments they have, as follows. Each level of complexity corresponds to a class of concepts. A unary relation, R(x), is a binding between a relation and one argument. In terms of sets, a unary relation R on a set S is a subset of S. It is the set of objects $\{x, \ldots\}$ in S for which R(x) is true. The representational space has only one dimension, and the argument can be instantiated in only one way at a time. Unary relations can be interpreted as propositions with one argument, as variable-constant bindings, or as zero-variate functions.

A proposition based on a unary relation has a predicate with one argument, and can represent a state, such as HAPPY(John), an action, such as RAN(Tom), an attribute, such as BIG(dog), or class membership, such as DOG(fido). The argument can be instantiated in more than one way; in BIG(dog), "dog" can be replaced by elephant, whale, ..., hippopotamus. The argument therefore resembles a variable, or dimension. A binding between a variable and a constant can also be expressed as a unary relation, e.g., HEIGHT(1-metre). A zero-variate function is a special case of a unary relation; in a function, mappings are unique. For example, π() = 3.1416 is a zero-variate function, equivalent to a symbolic constant. It defines a point in a space of possible values, the set of real numbers.

Where bindings are dynamic, the relation may be changed in all-or-none fashion without external input. For example, we can change HAPPY(John) to SAD(John). One component of the representation (John) remains the same, but when the other component is changed a new binding is formed. The representations are also independent of content to some extent. For example, HAPPY is the same whether its argument is John or someone else. At Rank 2, representations are no longer holistic, but comprise components that have some degree of independence from each other. There is therefore some degree of compositionality.

Binary relations, R(x,y), can be represented as a binding between a relation and two arguments. For example, BIGGER-THAN(-, -) has two arguments, which can represent any pair of objects such that the first is bigger than the second. An example is the proposition BIGGER-THAN(dog,mouse).

In terms of sets, a binary relation on a set S is a subset of $S \times S$. It is a set of ordered pairs $\{(a,b), \ldots\}$ such that aRb holds true. The representational space has two dimensions of variation, and the arguments can be instantiated in two ways at once. A univariate function, f(a) = b, is a special case of a binary relation in which the mappings are unique; it is a set of ordered pairs (a,b) such that for each a there is precisely one b such that $(a,b) \in f$. A unary operator is a special case of a univariate function; e.g., the unary operator CHANGE-SIGN comprises the set of ordered pairs $\{(x, -x)\}$. More complex variations between components can be represented at Rank 3 than at Rank 2. The binary relation R(x,y) can represent the way x varies as a function of y, and vice versa, neither of which is possible at Rank 2. Higher-order relations also become possible with binary relations; for example, SAD(POOR(Joe)), meaning "It is sad that Joe is poor", defines a higher-order relation SAD, the argument of which is POOR(Joe).

Ternary relations, R(x,y,z), can be represented as a binding between a relation and three arguments. In terms of sets, a ternary relation on a set S is a subset of $S \times S \times S$. It has three dimensions, and its arguments can be instantiated in three ways at once. An example of a ternary relation would be a "love triangle", in which two persons, x and y, both love a third person, z. Most ternary relations can be "decomposed" into lower-order relations.

A bivariate function is a special case of a ternary relation. It is a set of ordered triples (a,b,c) such that for each (a,b) there is precisely one c such that $(a,b,c) \in f$.

A binary operation is a special case of a bivariate function. A binary operation on a set S is a function from the set $S \times S$ of ordered pairs of elements of S into S, i.e., $S \times S \to S$. For example, the binary operation of arithmetic addition consists of the set of ordered triples $\{\ldots, (3,2,5), \ldots, (5,3,8), \ldots\}$, i.e., $\{(x,y,z) \mid x + y = z\}$.
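The uniqueness condition that distinguishes functions (and hence operators and operations) from relations in general can be checked mechanically. A sketch that treats the last element of each tuple as the output; `is_function` is an illustrative helper, and the finite ranges are arbitrary choices:

```python
from itertools import product


def is_function(relation):
    """True if the last element of each tuple is uniquely determined by the rest."""
    seen = {}
    for *inputs, output in relation:
        key = tuple(inputs)
        if seen.setdefault(key, output) != output:
            return False
    return True


S = range(-3, 4)
change_sign = {(x, -x) for x in S}                                   # unary operator on S
addition = {(x, y, x + y) for x, y in product(range(10), repeat=2)}  # {(x, y, z) | x + y = z}
less_than = {(x, y) for x, y in product(S, repeat=2) if x < y}       # relation, not a function

print(is_function(change_sign))  # True
print(is_function(addition))     # True
print(is_function(less_than))    # False
print((3, 2, 5) in addition, (5, 3, 8) in addition)  # True True
```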

The number of possible relations between elements increases again with ternary relations. A binary relation can be derived from a ternary one: $R_2(x,y)$ is true iff $\exists z$ such that $R_3(x,y,z)$ is true. In this way $R_2(x,y)$, $R_2(y,z)$, and $R_2(x,z)$ can all be derived from $R_3(x,y,z)$, in addition to the ternary relation itself. With a ternary relation, but not with unary or binary relations, it is possible to compare x with yz, or y with xz, or z with xy. It thus becomes possible to compute the effects on x of variations in yz, and so on.

Higher-order relations between binary relations also become possible at Rank 4. Thus we can define a relation such as MONOTONICALLY-LARGER(a,b,c), from which we can derive LARGER(a,b), LARGER(b,c), and LARGER(a,c). This exemplifies transitivity. Another concept based on ternary relations that has been important in cognitive development is inclusion, that is, a and a' are both included in b.
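Deriving LARGER(a,b), LARGER(b,c) and LARGER(a,c) from MONOTONICALLY-LARGER(a,b,c) amounts to projecting the ternary relation onto pairs of argument positions: a pair belongs to the binary relation if some triple of the ternary relation contains it in those positions. A sketch with hypothetical magnitudes:

```python
from itertools import combinations, product

# Hypothetical magnitudes for illustration.
MAGNITUDE = {"a": 3, "b": 2, "c": 1}
items = tuple(MAGNITUDE)

# Ternary relation: (x, y, z) such that x > y > z.
monotonically_larger = {
    (x, y, z) for x, y, z in product(items, repeat=3)
    if MAGNITUDE[x] > MAGNITUDE[y] > MAGNITUDE[z]
}


def project(relation, positions):
    """Binary relation on the given argument positions: a pair holds iff some
    tuple of the ternary relation agrees with it there (existential)."""
    return {tuple(t[p] for p in positions) for t in relation}


for positions in combinations(range(3), 2):
    print(positions, project(monotonically_larger, positions))
# (0, 1) {('a', 'b')}   LARGER(a, b)
# (0, 2) {('a', 'c')}   LARGER(a, c)
# (1, 2) {('b', 'c')}   LARGER(b, c)
```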

Quaternary relations, R(w,x,y,z), can be represented as a binding between a relation and four arguments. An example would be proportion; a/b = c/d expresses a relation between the four variables a, b, c, and d. It is possible to compute how any element will vary as a function of the others. With a quaternary relation, all the comparisons that are possible with ternary relations can be made, as well as four-way comparisons: the effect on w of variations in x, y, z, the effects on x of variations in w, y, z, and so on.

Quaternary relations can also be interpreted as functions or as operations. A trivariate function is a special case of a quaternary relation. It is a set of ordered 4-tuples (a,b,c,d) such that for each (a,b,c) there is precisely one d such that $(a,b,c,d) \in f$.

Quaternary relations may also be interpreted as compositions of binary operations. For example, (a + b) × c = d is a quaternary relation. As we have seen, the balance scale concept, or principle of moments, might be considered a quaternary relation. Another example is proportion, a/b = c/d, in which a relation is defined between four variables. Any variable can be related to any one or more of the others; for example, variations in a can be expressed as a function of variations in any of b, c, or d, or as a function of b and c, b and d, or c and d, or as a function of b, c, and d. Thus proportion is a four-dimensional concept.

More complex concepts. Concepts based on relations with more than four arguments exist, of course, but our assessment of the working memory literature has led us to conclude that quaternary relations are the most complex that are processed in parallel (Halford, et al., 1994). More complex concepts are segmented into components that are processed serially, or chunked into fewer dimensions. Thus we suggest that the four levels of relations defined above are sufficient to account for human reasoning. The question that now arises is whether neural net models can deal with these concepts. All these relations can be implemented as tensor products, as we will see in the next section.

Implementing relations as tensor products

Smolensky (1990) has shown that variable bindings can be represented by the tensor product of vectors representing the variable and the constant, i.e., $X \otimes C$, where X is a variable and C is a constant. Halford et al. (1993) have extended this idea, and shown that predicate-argument bindings can be handled using tensor products of vectors representing the predicate and its arguments, as shown schematically in Figure 1.

Figure 1. Tensor product representation of relation-argument bindings for unary to quaternary relations (compared with 2- and 3-layered nets). Rank refers to the number of related components in the internal representation.

Each tensor product comprises a vector representing the predicate and a vector representing each argument. Thus a predicate with n arguments is represented by $P \otimes a_1 \otimes a_2 \otimes \cdots \otimes a_n$, where P is a vector representing the predicate, and $a_1, \ldots, a_n$ are vectors representing its arguments. The relations defined above can be implemented as tensor products of vectors representing relations with the appropriate numbers of arguments. A unary relation, R(x), can be represented by a Rank 2 tensor product $R \otimes x$, as shown in Figure 1. A binary relation, R(x,y), can be represented by a Rank 3 tensor product $R \otimes x \otimes y$. Similarly, a ternary relation, R(x,y,z), can be represented by a Rank 4 tensor product $R \otimes x \otimes y \otimes z$, and a quaternary relation, R(w,x,y,z), by a Rank 5 tensor product $R \otimes w \otimes x \otimes y \otimes z$.

In the tensor product representation, if the vectors are orthonormal, then given any N-1 of the component vectors as input, the Nth vector can be recovered, thus fulfilling the flexibility requirement defined above.
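The recovery property can be checked numerically for a Rank 3 (binary relation) case. This sketch assumes the rows of a 4 × 4 Hadamard matrix scaled by 1/2 as codes, which are orthonormal but, unlike one-hot codes, distributed; the assignment of vectors to predicates and objects is arbitrary. Several propositions are superimposed in one tensor, and contracting with any two component vectors returns the third without crosstalk:

```python
from itertools import product

# Rows of a 4x4 Hadamard matrix scaled by 1/2: mutually orthonormal vectors.
H = [[0.5, 0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5, -0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5]]
PREDICATES = {"BIGGER-THAN": H[0], "SMALLER-THAN": H[1]}
OBJECTS = {"dog": H[0], "mouse": H[1], "whale": H[2], "dolphin": H[3]}
N = 4


def bind(p, a1, a2):
    """Rank 3 tensor product P ⊗ a1 ⊗ a2 as a dense {(i, j, k): value} map."""
    return {(i, j, k): p[i] * a1[j] * a2[k] for i, j, k in product(range(N), repeat=3)}


def superimpose(*tensors):
    return {idx: sum(t[idx] for t in tensors) for idx in tensors[0]}


def retrieve(tensor, probe):
    """Contract with the two given vectors; probe holds one None (the target)."""
    target = probe.index(None)
    out = [0.0] * N
    for idx, x in tensor.items():
        for k, v in enumerate(probe):
            if v is not None:
                x *= v[idx[k]]
        out[idx[target]] += x
    return out


def nearest(vector, codebook):
    return max(codebook, key=lambda name: sum(a * b for a, b in zip(codebook[name], vector)))


memory = superimpose(
    bind(PREDICATES["BIGGER-THAN"], OBJECTS["dog"], OBJECTS["mouse"]),
    bind(PREDICATES["BIGGER-THAN"], OBJECTS["whale"], OBJECTS["dolphin"]),
    bind(PREDICATES["SMALLER-THAN"], OBJECTS["mouse"], OBJECTS["dog"]),
)

# BIGGER-THAN(dog, ?) -> mouse
out = retrieve(memory, (PREDICATES["BIGGER-THAN"], OBJECTS["dog"], None))
print(out, nearest(out, OBJECTS))  # [0.5, -0.5, 0.5, -0.5] mouse
# ?(mouse, dog) -> SMALLER-THAN
print(nearest(retrieve(memory, (None, OBJECTS["mouse"], OBJECTS["dog"])), PREDICATES))
```

Retrieval is exact here only because every pair of distinct codes has a dot product of zero.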

The rank of a tensor product is not determined by its geometric shape, but by the interconnection between units representing the vectors. Any of the tensor products in Figure 1 can be "unwrapped" by concatenating the vectors so they form a linear array, and the representation would appear to be Rank 1. However for relations to be represented with the flexibility property defined earlier, it would be necessary to maintain the connections between the units in each of the vectors. These are shown for Rank 2 tensor products in Figure 1. These connections through binding units are critical to the computations the representation performs, and the connections, rather than the spatial layout, determine the rank of the tensor product. Thus the resemblance to a Rank 1 representation would be more apparent than real.

Relational concepts as criteria for neural nets

If we accept that human cognition entails processing relational concepts as indicated above, then we can ask whether neural net models are adequate to handle such concepts. Such concepts can be represented by tensor products, as we have seen, but tensor product representations have weaknesses. The main one is that the representations are "hand-wired" and cannot be learned. Another is that the requirement for representations to be orthonormal in order to give reliable outputs conflicts with coding for similarity.
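The conflict between orthonormality and similarity coding can be seen in a small example. In this sketch the feature codes are hypothetical, and the predicate vector is omitted for brevity, so the memory is a Rank 2 store of argument pairs. Because dog and wolf share a feature, probing with dog retrieves mouse plus a fraction of rabbit equal to the cosine similarity of the two cues:

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bind(a, b):
    """Rank 2 tensor product a ⊗ b as a nested list (matrix)."""
    return [[x * y for y in b] for x in a]


def add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def probe(matrix, a):
    """Contract the first index with a: returns sum_i a[i] * matrix[i][j]."""
    return [sum(a[i] * matrix[i][j] for i in range(len(a))) for j in range(len(matrix[0]))]


# Hypothetical codes: dog and wolf share a feature, so they are not orthogonal.
dog = normalize([1, 1, 0, 0])
wolf = normalize([1, 0, 1, 0])
mouse = [0, 0, 0, 1.0]
rabbit = [0, 0, 1.0, 0]

# Store BIGGER-THAN(dog, mouse) and BIGGER-THAN(wolf, rabbit) as argument pairs.
memory = add(bind(dog, mouse), bind(wolf, rabbit))
out = probe(memory, dog)
print(round(dot(dog, wolf), 3))                               # 0.5: similarity of the cues
print(round(dot(out, mouse), 3), round(dot(out, rabbit), 3))  # 1.0 0.5: crosstalk
```

With orthogonal cues the rabbit term would vanish; making dog and wolf similar is exactly what lets their bindings interfere.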

The major alternative seems to be some sort of three-layered net, or one of the elaborations that have been explored, such as the back-propagation net. The question, however, is whether such a net can represent predicate-argument bindings for relations from unary to quaternary. A three-layered net can compute any of the relevant functions. It could, for example, compute the functions required for the balance-state, missing weight and missing distance questions in the balance scale task. However, conventional three-layered nets do not represent relations per se, in the sense that they do not represent predicate-argument bindings in the way that tensor products do, as discussed above. On the other hand, there seems to be no reason in principle why nets cannot be constructed that will do this. Hinton (1990) has already trained nets that represent family relations. These are essentially binary relations (uncle-of, etc.), and there seems no reason in principle why nets should not be developed that represent higher-rank relations. This is a technical challenge for the future.

Our main concern here, however, is how we will know when a net represents a particular concept. This is essentially the same problem that has preoccupied cognitive developmental psychologists for at least three decades. As the example of the balance scale illustrates, essentially the same criteria can be applied in both cases. Understanding of a task can be defined in terms of a set of input-output mappings that can be produced by a representation of the relevant concept, and not without it. In practice it is not easy to devise a set of input-output mappings that uniquely define understanding of a particular concept, but there are enough cases where this enterprise has been successful to suggest that it can be done. An important caveat, however, in both cognitive development and neural net modelling, is that the nature of the concept must be analysed with care; otherwise a lot of unnecessary controversy may occur.

The example of the balance scale illustrates this point. McClelland (1995) has produced a very fine model of one aspect of performance on the balance scale. However, it does not model understanding of the balance scale in any comprehensive sense. In order to be credited with understanding the balance scale, a human being would have to be capable of many things that McClelland's three-layered net could not perform. Suppose we were testing a child to see whether she understood the balance scale. If the child gave nearly flawless answers to the balance-state question but performed no better than chance on the missing weight and missing distance questions, we would be reluctant to credit her with understanding of the balance scale. At the least she would have to demonstrate passable performance on all three question types, and ideally should recognize all relations between the relevant variables. Thus understanding the concept essentially means representing relations between all the variables involved. In the case of the balance scale, understanding amounts to representation of a quaternary relation, as defined earlier.

Though many of the details have been omitted here, our main contention is that human concepts can be defined in terms of the relations to which they are equivalent, and that understanding a concept is equivalent to representing that relation. This argument has been elaborated by Phillips et al. (1995), and the representation of more complex concepts has been considered elsewhere (Halford, 1993; Halford, et al., 1994). Our point here, however, is that this definition provides a criterion for adequate performance of neural nets in simulating human performance in respect of a given concept, just as it provides a criterion for the performance of children in respect of the same concept.

References

Anderson, J. R. (1980). Cognitive psychology and its implications. New York: W. H. Freeman.

Greeno, J. G., Riley, M. S., & Gelman, R. (1984). Conceptual competence and children's counting. Cognitive Psychology, 16, 94-143.

Halford, G. S. (1993). Children's understanding: The development of mental models. Hillsdale, NJ: Erlbaum.

Halford, G. S., & Wilson, W. H. (in preparation). Processing capacity defined by relational complexity: Implications for comparative, developmental, and cognitive psychology.

Halford, G. S., et al. (1993). Parallel distributed processing approaches to creative reasoning: Tensor models of memory and analogy. In T. Dartnall, S. Kim, & F. Sudweeks (Eds.), AI and creativity: Proceedings of the AAAI Spring Symposium. Stanford, March 1993.

Halford, G. S., et al. (1994). Connectionist implications for processing capacity limitations in analogies. Norwood, NJ: Ablex.

Halford, G. S., et al. (1995). Modelling the development of reasoning strategies: The roles of analogy, knowledge, and capacity. In T. Simon & G. S. Halford (Eds.), Developing Cognitive Competence: New Approaches to Cognitive Modelling. Hillsdale, NJ: Erlbaum.

Hinton, G. E. (1990). Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46, 47-75.

Holyoak, K. J., & Thagard, P. (1995). Mental leaps. Cambridge, MA: MIT Press.

McClelland, J. L. (1995). A connectionist perspective on knowledge and development. In T. Simon & G. S. Halford (Eds.), Developing Cognitive Competence: New Approaches to Cognitive Modelling. Hillsdale, NJ: Erlbaum.

Phillips, S., Halford, G. S., & Wilson, W. H. (1995). The processing of associations versus the processing of relations and symbols: A systematic comparison. Proceedings of the 17th Annual Cognitive Science Conference, Pittsburgh, July 22-25, 1995.

Siegler, R. S. (1981). Developmental sequences within and between concepts. Monographs of the Society for Research in Child Development, 46, 1-84.

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1-2), 159-216.

Surber, C. F., & Gzesh, S. M. (1984). Reversible operations in the balance scale task. Journal of Experimental Child Psychology, 38, 254-274.