Target Paper: Hadley, R.F. & Hayward, M. (1995) Strong Semantic Systematicity from Unsupervised Connectionist Learning. In J.D. Moore and J.F. Lehman (Eds) Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society (North American), University of Pittsburgh, Lawrence Erlbaum Associates, 358-363.
Introduction
Following the claim of Fodor and Pylyshyn (1988) that connectionist models are incapable of demonstrating systematicity, several models have been proposed that meet this requirement. Hadley (1994) subsequently showed that Fodor and Pylyshyn's definition was somewhat problematic, and has now proposed semantic systematicity as a more suitable criterion. Accordingly, Hadley and Hayward (1995) have proposed a model capable of displaying semantic systematicity. That is, given a range of input sentences, the model derives the appropriate internal representations for sentences in which words are used in novel syntactic positions. For example, the network is able to form the appropriate representation for "Mary sees Bill" after previously seeing "Bill" only in the agent role.
The input data is derived from a simple recursive grammar, and all sentences are of the form "Noun Phrase-Verb-Noun Phrase" (with noun phrases possibly containing relative clauses). Importantly, all verbs used are direct, so all phrases are of the form Agent-Action-Patient. Furthermore, of the twelve nouns, eight appear only as either subject or object. All words are effectively represented as orthogonal unit vectors, and presented sequentially to the network. Although there is no real output from the model, the model can be evaluated by directly assessing the internal representation.
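The input encoding described above can be sketched as follows. This is a minimal illustration only: the vocabulary shown is a hypothetical subset (the review does not list the full word set), and the function names are assumptions made for this example.

```python
# Hypothetical vocabulary; the actual word list is not given in this review.
VOCAB = ["Jane", "Bill", "Mary", "sees", "likes", "who"]

def one_hot(word, vocab=VOCAB):
    """Return the unit vector for `word`: 1.0 at its index, 0.0 elsewhere."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(word)] = 1.0
    return vec

def encode_sentence(sentence, vocab=VOCAB):
    """Encode a sentence as a sequence of vectors, one per word, in order."""
    return [one_hot(w, vocab) for w in sentence.split()]

def dot(u, v):
    """Inner product, used here only to check orthogonality."""
    return sum(a * b for a, b in zip(u, v))

seq = encode_sentence("Jane sees Bill")
```

Any two distinct words have an inner product of zero, so no similarity between words is built into the input; whatever the network learns about word classes must come from the order of presentation.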
The network consists of two distinct layers: an input layer much like the input layer of a standard MLP, and a semantic layer responsible for the internal representation. There are four types of nodes in the semantic layer - concept nodes, proposition nodes (p-nodes), binding nodes, and thematic site nodes. Proposition nodes form the basis of the representation. There are two distinct types - a single master p-node, and three modifier p-nodes. The master p-node forms the overall base of the representation and, if necessary, the modifier p-nodes are used as the basis of relative clauses.
The "cores" of the p-nodes are attached to the input layer, and are also connected by weighted, trainable links to three thematic site nodes denoting the agent, action, and patient roles. Modifier p-nodes have an additional node for binding relative clauses.
Concept nodes allow the representation of input words, and each of these is connected to the input layer with a trainable connection. Furthermore, each action concept is connected, via the binding nodes, to each action theme site, and object concepts are similarly bound to the agent and patient theme sites. The representation in the semantic layer for "Jane sees Bill who likes Mary" is demonstrated in figure 1.

Weighted, trainable links exist between the input layer and the p-node cores and concept nodes of the semantic layer, and also between each p-node core and its thematic site nodes. All training is done with a simple Hebbian algorithm.
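As an illustration of what "a simple Hebbian algorithm" could look like on these links, the sketch below uses the plain textbook rule in which a weight grows in proportion to the co-activation of the two nodes it connects. The exact rule, the learning rate, and the function name are assumptions; the review does not specify the form used by Hadley and Hayward.

```python
def hebbian_update(weights, pre, post, rate=0.1):
    """Textbook Hebbian step: w[i][j] += rate * post[i] * pre[j].

    weights[i][j] is the link from input unit j to semantic unit i.
    `rate` is a hypothetical learning rate, not a value from the paper.
    """
    return [[w + rate * post_i * pre_j for w, pre_j in zip(row, pre)]
            for row, post_i in zip(weights, post)]

# Two semantic units, three input units, all weights initially zero.
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
pre = [1.0, 0.0, 0.0]   # first input word is active
post = [0.0, 1.0]       # second semantic unit is active
weights = hebbian_update(weights, pre, post)
```

Only the link between the two co-active units is strengthened, which is how repeated pairing of a word with a concept node (or of "who" with a modifier p-node core) could come to dominate the other connections.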
If we consider the output of the system to be the resulting internal representation, then the success of the model on "novel" sentences closely addresses the cognitive task. Provided the network can successfully represent input sentences with words in novel syntactic positions, the model can be said to have achieved the cognitive goal - that of semantic systematicity.
Memory
Obviously the model needs to retain the structure of the current sentence. Additionally, the model must retain information on the general apparent structure of sentences. That is, it must know the agent-action-patient sequence of the grammar, and it must also recall that words that have appeared as subject may also appear as object. Finally, the network must learn which input word is associated with which concept node, or with a p-node core in the case of "who".
Clearly the memory for the bindings of words and concepts is realized in the form of the strengths of connections between the input and semantic layers. The case of recalling general sentence sequence is less clear. It is entirely the role of the p-node cores to perform this function, and the information is preserved in the connections between the p-node core and its thematic role nodes.
Bindings take place between the most active unbound concept and thematic role nodes. The most active role node is the unbound node that receives the largest boost from the p-node core, that is, the node with the strongest connection to the p-node core. In this way, the nodes are bound in the proper agent-action-patient order.
Thus, we have a fast process of storing the current sentence in the form of bindings within the semantic layer and a slower process of the p-nodes learning the apparent sequence-structure of the grammar. Note that the ability to recall that a concept has appeared as agent, and then allow it to appear as patient (i.e. the property of systematicity), is achieved by the sequencing mechanism of the p-nodes. When a concept is activated it will attempt to bind to the most active site node. As the sequence of input is always constant, a word presented in a novel position will be bound according to the expected role in the sequence.
One final point should be made regarding memory for concepts. Each concept must explicitly be associated with a concept node. Thus, as the number of concepts grows, more concept nodes must be added to the system.
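A single binding step of the kind just described might be sketched as below. The activation values, the dictionary representation, and the function names are hypothetical; the sketch also treats a concept as unavailable once bound, which is a simplification of the behaviour described for relative clauses.

```python
def most_active_unbound(activations, bound):
    """Return the name of the most active node that is not already bound."""
    candidates = {n: a for n, a in activations.items() if n not in bound}
    return max(candidates, key=candidates.get)

def bind_step(concept_acts, role_acts, bindings):
    """Bind the most active unbound concept to the most active unbound role site.

    `bindings` maps role site -> concept and is updated in place.
    """
    role = most_active_unbound(role_acts, bindings.keys())
    concept = most_active_unbound(concept_acts, bindings.values())
    bindings[role] = concept
    return role, concept

# Hypothetical activations: role sites are boosted in proportion to their
# learned link strength from the p-node core; "Jane" has just been presented.
role_acts = {"agent": 0.9, "action": 0.6, "patient": 0.3}
concept_acts = {"Jane": 1.0, "sees": 0.0, "Bill": 0.0}
bindings = {}
pair = bind_step(concept_acts, role_acts, bindings)
```

Because the agent site has the strongest link from the core, it wins first; once it is bound, the same selection picks the action site, and then the patient site.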
Time
Time is used in Hadley's model as a sequencing agent only. Input is presented in the form of a sequence of words. From part of this sequence (a sentence), the model must transform the temporal representation into a spatial one that preserves the thematic roles of the input.
From the sequence of input words, the model must deduce the end of sentences - there are no explicit sentence delimiters in the input. The end of a sentence can be deduced when all role site nodes of the master p-node have been bound, as have all role sites of active mod p-nodes. However, on recognition of the end of a sentence, the network activations are externally reset to zero.
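The end-of-sentence condition can be written as a simple predicate. This is a sketch under stated assumptions: p-nodes are represented as hypothetical dictionaries mapping role site to bound concept, and the modifier p-node's additional relative-clause binding node is ignored.

```python
ROLES = ("agent", "action", "patient")

def sentence_complete(master, active_mods):
    """True when every role site of the master p-node and of each active
    modifier p-node has been bound to a concept."""
    pnodes = [master] + list(active_mods)
    return all(p.get(role) is not None for p in pnodes for role in ROLES)

master = {"agent": "Jane", "action": "sees", "patient": "Bill"}
open_mod = {"agent": "Bill", "action": "likes"}            # patient not yet bound
closed_mod = {"agent": "Bill", "action": "likes", "patient": "Mary"}
```

With these values, the sentence is complete when there are no active modifier p-nodes or when `closed_mod` is the only one, and incomplete while `open_mod` is still awaiting its patient.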
Processing is performed in a distinct sequence. Initially, the activations of all nodes are set to zero. The master p-node core is externally activated, causing it to fire and spread activation to its thematic site nodes. Assuming successful training, the agent site wins a winner-take-all (WTA) competition, and is activated. Next, an input node is activated, and activation flows to the semantic layer. This should activate the associated concept node.
Binding then occurs between the most active site node and the most active concept node (between Jane and the agent site in figure 1). This binding then causes decay in all active nodes, and causes the master p-node core to fire again, although this time the action role site should be activated. The action concept should then be bound to the action role site, and the patient concept to the patient site in the same way.
In the event of a relative clause, the "who" lexical item should trigger the core of a modifier p-node. The bindings for the relative clause then follow in a similar manner to the bindings in the master p-node. (The role of "who" will be discussed more fully in the section on structure.)
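For a sentence without relative clauses, the processing sequence above reduces to a short loop. The sketch below is a deliberately simplified illustration: the link strengths are hypothetical, activation and decay are not modelled, and "who" and modifier p-nodes are omitted.

```python
# Hypothetical trained link strengths from the master p-node core to its sites.
ROLE_WEIGHTS = {"agent": 0.9, "action": 0.6, "patient": 0.3}

def process(sentence, role_weights=ROLE_WEIGHTS):
    """Bind each word, in order of presentation, to the unbound role site
    with the strongest link from the p-node core."""
    bindings = {}
    for word in sentence.split():
        unbound = [r for r in role_weights if r not in bindings]
        if not unbound:
            break  # all sites bound: the sentence has ended
        site = max(unbound, key=role_weights.get)  # winner-take-all
        bindings[site] = word
    return bindings

rep = process("Mary sees Bill")
```

Note that the loop never consults which roles a word has filled before. In this simplified form, a word appearing in a novel position is bound purely by where it falls in the sequence.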
There is no real concept of sequence in the network's representation. For example, the network's representation of "Bill sees Tom" is identical to the representation of "Tom was seen by Bill". These sentences convey the same proposition but differ in surface form, and the network is unable to distinguish them. However, as the learned grammar cannot contain passive sentences, there is no need to distinguish them. Thus, within the grammar under consideration, the general sequence is reflected in the relationship between the strength of the weights of the p-node core to role node connections, and the agent-action-patient structure of the grammar.
This has important implications for what the network is able to learn. If all sentences follow the same sequence, then the network is capable of learning the grammar. If, however, the input mixes two different sequences, then the network should either fail to attain a suitable state, or fail to recognize one style of sequence. Hadley's model seems incapable of deducing both the class and the ordering of words. Given a constant ordering, the model is able to derive the classes of words. Similarly, if words are bound to a single class, then arbitrary orderings may be imposed (although this would not allow systematicity). It is not possible for the model to perform given both differing role orderings and differing role-concept bindings.
Change
Clearly the majority of change is due to changes in activations, weights, and the particular bindings in effect. While most of these changes are relatively straightforward, the use of decay is an important aspect of the model.
After a binding takes place, all nodes undergo a small decay. This allows the network to process relative clauses. Consider the sentence "Bill who likes Mary saw Jane." "Bill" is first bound to the agent role of the master p-node, and active nodes decay. A competition among the remaining site nodes of the master p-node is then won by the action role. The presentation of "who" causes a mod p-node core to fire, and the ensuing WTA competition at this p-node results in its agent site winning. The two most active nodes ("Bill" and the mod p-node's agent site) are bound, and all active nodes undergo decay. The next competition at the mod p-node is won by its action role.
When "likes" is presented, there are two active unbound site nodes - the action roles of the master and mod p-nodes. As the master p-node's site node has been active longer and decayed more, the mod p-node's site wins the competition, and "likes" is bound correctly. Without decay, the network could not have correctly processed the sentence, as the two role site nodes would have had equal activation.
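The role of decay in this example can be checked with a small piece of arithmetic. In the sketch below the decay factor, the unit activation given to a newly winning site, and the site names are all assumptions; only the ordering of events follows the description above.

```python
def activations_at_verb(decay):
    """Activations of the two competing unbound role sites at the moment the
    relative clause's verb is presented.

    `decay` is a hypothetical multiplicative factor applied to active nodes
    after each binding (1.0 means no decay).
    """
    # The master p-node's competing site became active before "who" arrived.
    acts = {"master.site": 1.0}
    # Binding the head noun inside the relative clause triggers a decay step.
    acts = {site: a * decay for site, a in acts.items()}
    # Only afterwards does the mod p-node's competing site become active.
    acts["mod.site"] = 1.0
    return acts

with_decay = activations_at_verb(0.9)
no_decay = activations_at_verb(1.0)
winner = max(with_decay, key=with_decay.get)
```

With any factor below 1.0 the mod p-node's site is strictly more active and wins the competition. With a factor of 1.0 the two sites tie, so the network has no basis for preferring the relative clause's site.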
Most other aspects of change have been addressed in the previous section on memory, particularly with regard to the change in the network's weights due to learning.
Structure
Due to the nature of the semantic network, there is an obvious structure to network representations. The semantic layer of the model is similar to a traditional semantic net. Clearly the model uses a substantially symbolic representation, in that complex representations (propositions) directly spatially contain their constituents (concepts).
The general representational structure is built directly into the architecture, through the use of "typed" role and concept nodes. Most structure only changes through different bindings between nodes. However, the binding patterns are regulated by the learning of the agent-action-patient sequence. It is this pattern which dictates the final structure of the representation.
There are, however, some serious limitations in what the network is able to represent. Because there are only three mod p-nodes, the network is only able to represent sentences with no more than three relative clauses. The grammar itself does not impose this limitation, so the network has not completely learned the grammar (which allows arbitrary depth).
As alluded to earlier, the model represents propositions in the passive and active voices in an identical manner. However, the model is incapable of processing sentences in the passive voice. Although passive voice sentences are not included in the grammar, the network's inability to process them is concerning due to the heavy reliance on role sequences.
The role that "who" plays in the model should also be carefully considered. Unlike other words in the grammar, which are associated with concept nodes, "who" becomes associated with mod p-nodes. This is a result of the learning mechanism, where the mod p-nodes and "who" are correlated more strongly than other pairings, and hence have the strongest connection. Hence, "who" is the only word not associated with a concept node. This then allows the model to use "who" as the marker of a relative clause. Once the network is presented with "who" it builds a mod p-node structure.
Despite some of these reservations, the network also has some very good properties. Obviously, the network can generalize, as this was the major aim. Generalization occurs in two ways. Primarily, the network is capable of processing words in novel syntactic positions. It is also capable of generalizing to novel levels of embedding. During training, the sentences have a maximum relative clause embedding of one, while during testing the network succeeds in processing sentences with a depth of three relative clauses.
Discussion and Conclusions
Hadley has certainly given us a model which attains his criterion of semantic systematicity. However, its reliance on context-free semantic nodes (and hence its symbolic nature) means that the model cannot be used to answer the criticisms of Fodor and Pylyshyn. The network has shown its generalization power with the target grammar, although it is doubtful that the network could be extended to more complex grammars, particularly those combining passive and active voice phrases.
Indeed, the network's ability to generalize may be as much a property of the grammar as the network itself. As the roles of the sentence map on to the grammar in such a regular manner, the network need only learn the sequence of roles, and what to bind them to. Obviously, in a real language, the approach taken here will not work. Although there remains the possibility that binding could be delayed until roles could be determined, this would require a substantial rework of the model.
Despite the fact that the model has achieved the chosen goal of semantic systematicity, it has not done so in an entirely successful and convincing manner. There is also some loss of value in the model due to its symbolic nature. However, probably its greatest achievement has been to highlight the deficiencies in the definition of semantic systematicity. It is doubtful that many would regard this model as achieving systematicity in any satisfying way. If this model has served to draw attention to areas where the definition could be lacking, then it has still been successful.
References
Fodor, J. & Pylyshyn, Z. (1988) Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3-71.
Hadley, R. (1994) Systematicity in connectionist language learning. Mind & Language, 9, 431-444.