KEYWORDS: computation, connectionism, consciousness, dissociation, mental representation, phenomenal experience
ABSTRACT: When cognitive scientists apply computational theory to the problem of phenomenal consciousness, as many of them have been doing recently, there are two fundamentally distinct approaches available. Either consciousness is to be explained in terms of the nature of the representational vehicles the brain deploys, or it is to be explained in terms of the computational processes defined over these vehicles. We call versions of these two approaches VEHICLE and PROCESS theories of consciousness, respectively. However, while there may be space for vehicle theories of consciousness in cognitive science, they are relatively rare. This is because of the influence exerted, on the one hand, by a large body of research which purports to show that the explicit representation of information in the brain and conscious experience are dissociable, and on the other, by the classical computational theory of mind: the theory that takes human cognition to be a species of symbol manipulation. Two recent developments in cognitive science combine to suggest that a reappraisal of this situation is in order. First, a number of theorists have recently been highly critical of the experimental methodologies used in the dissociation studies -- so critical, in fact, that it is no longer reasonable to assume that the dissociability of conscious experience and explicit representation has been adequately demonstrated. Second, computationalism, as a theory of human cognition, is no longer as dominant in cognitive science as it once was. It now has a lively competitor in the form of connectionism; and connectionism, unlike computationalism, does have the computational resources to support a robust vehicle theory of consciousness. In this paper we develop and defend this connectionist-vehicle theory of consciousness. It takes the form of the following simple empirical hypothesis: phenomenal experience consists in the explicit representation of information in neurally realized PDP networks.
This hypothesis leads us to reassess some common wisdom about consciousness, but, we will argue, in fruitful and ultimately plausible ways.
Abstract:
O'Brien and Opie's theory of consciousness relies heavily on a distinction
between explicit activation vectors and inexplicit weight vectors. But
determining which representations are explicit vehicles requires appeal
to process, and so their vehicle theory is in fact a process theory.
NB: In this exchange, indented material represents quotation from the
other participant's previous posting
Date: Mon, 30 Mar 1998
From: Gerard O'Brien
Dear Hugh,
Jon and I presumed our own rendering of the explicit/inexplicit distinction
would come in for some criticism (largely because this has been a contentious
issue in philosophy for some while). One point we would make is that our
reliance on Dennett is perhaps not as heavy as you suggest. While Dennett's
taxonomy does provide us with some initial scaffolding, our account goes
further than his, in that it is developed in the context of specific computational
architectures (Turing machines and PDP systems). Consequently, it doesn't
much matter to us whether the whole of our account can be extracted from
Dennett (though, as a matter of fact, we think that most of it can). Our
account is more appropriately judged on the way it illuminates the different
ways in which information is coded in these architectures.
Date: 23 Apr 1998
From: Gerard O'Brien
Dear Hugh,
I know van Gelder flirted with the notion of superpositionality of
activation patterns in some of his earlier work (especially in his 'What
is the "D" in "PDP"?' [17]), but my own view is that
at this point in his discussion he was mistakenly conflating the properties
of activation pattern representations with connection weight representations.
He says, for example, that "individual units can be involved in representing
many different entities at the same time, for the characteristic patterns
for two different entities can be activated at once over the same set of
units. The representings of the two different entities can in this way
be superimposed on each other" (pp.41-2). But understood as a point about
activation pattern representations, this claim is surely false. One simply
cannot have more than one "characteristic" pattern of activity generated
across the same set of units at any one moment in time. However, understood
as a point about connection weight representation (that is, reading van
Gelder's talk of "patterns activated" as a reference to the patterns of
connection weights) this claim is fine. I think van Gelder intended this
claim to be about activation pattern representations as well as connection
weight representations. But I also think he is just wrong about this. (See
also Andy Clark's discussion of superpositionality in Associative Engines [4].)
Putting van Gelder aside, I can't make any sense of the "superposition" of contents in activation patterns. One could certainly claim that the same activation pattern is capable of supporting multiple interpretations (though in actual fact, this is not what happens in connectionist practice). But this would amount to ambiguity, not superpositionality. What do you mean by superposition in activation patterns? What is superposed? And how is this superposition achieved?
Date: 27 April 1998
From: Hugh Clapin
Dear Gerard,
Here's a simple example. Take a simple three-layer network. Input A causes
characteristic pattern a across the hidden units, and input B causes characteristic
pattern b across the hidden units.
If you add input A to input B, then the activation pattern will be a + b - the superposition of a and b. So superposed input activation vectors cause superposed hidden activation vectors. I submit that this hidden unit activation vector is a superposed representation of A and B.
Or, if you don't like superposing the inputs:
"Consider a three-layer network whose input units are divided into
two subsets which allow two inputs to be presented simultaneously, one
input to each subset of input units. In such a network the activation vector
corresponding to the hidden units would superpositionally represent the
two inputs." (from my BBS commentary as submitted [1])
So here are two examples of superpositional activation vectors. In each case two characteristic patterns of activity are superposed to create a single pattern of activity.
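The a + b claim above can be checked numerically. What follows is only a minimal sketch under stated assumptions: the weight matrix W, the inputs A and B, and all function names are hypothetical values chosen for illustration, not taken from any network in the literature. It suggests that adding the inputs yields exactly a + b when the hidden units are linear, but generally not when they apply a sigmoid squashing function, as hidden units in standard back-propagation networks usually do.

```python
import math

# Hypothetical weights from 3 input units to 2 hidden units (illustrative only).
W = [[0.5, -1.0, 0.25],
     [1.5, 0.5, -0.75]]

def net_input(x):
    """Weighted sum arriving at each hidden unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def linear_hidden(x):
    """Hidden activations when units simply pass on their net input."""
    return net_input(x)

def sigmoid_hidden(x):
    """Hidden activations when units squash their net input into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-n)) for n in net_input(x)]

def add(u, v):
    """Element-wise vector sum."""
    return [p + q for p, q in zip(u, v)]

A = [1.0, 0.0, 0.0]  # hypothetical input pattern A
B = [0.0, 1.0, 0.0]  # hypothetical input pattern B

# Pattern a + b versus the pattern actually produced by input A + B.
lin_sum, lin_joint = add(linear_hidden(A), linear_hidden(B)), linear_hidden(add(A, B))
sig_sum, sig_joint = add(sigmoid_hidden(A), sigmoid_hidden(B)), sigmoid_hidden(add(A, B))

linear_is_additive = all(math.isclose(p, q) for p, q in zip(lin_sum, lin_joint))
sigmoid_is_additive = all(math.isclose(p, q) for p, q in zip(sig_sum, sig_joint))

print("linear units additive: ", linear_is_additive)
print("sigmoid units additive:", sigmoid_is_additive)
```

On these assumed values the summed sigmoid pattern even leaves the (0, 1) range that a single sigmoid unit can produce, so in the nonlinear case the hidden pattern for the combined input is a further pattern rather than the arithmetic sum of a and b. Whether that further pattern counts as a superposed representation is the philosophical question at issue, which the sketch does not settle.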
The more general point here is that superpositionality isn't confined to intuitively tacit representations like weight vectors: representations that are unproblematically explicit can also be superpositional. Consider the example of sound. A piano is played, causing a 'characteristic pattern' of air pressure at the point at which a microphone picks it up (which then transduces the air pressure pattern into a pattern of voltage). A singer sings, similarly causing a characteristic pattern of air pressure. If the piano and singer make a noise at the same time, then their characteristic patterns of air pressure are superposed (everywhere, but in particular) at the point where the microphone picks up the signal and transduces it etc. I take sound recording to be an uncontroversial example of superpositional coding. Now the air pressure at the microphone is reproduced by your speakers when you play a CD, in the meantime being communicated via wires, and stored on a medium like a CD or cassette. I submit that the voltage pattern in the wires, the pattern of pits on your CD (or of magnetic polarisation on your cassette tape), and the vibration of your speakers are each explicit representations of the singer's voice and of the piano's sound, and they are superpositional representations of both those things too. So some representations are both superpositional and explicit.
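The piano-and-singer case can be sketched with sampled waveforms. This is a rough illustration only: the sample rate, the two sine tones standing in for piano and voice, and all names below are assumptions made for the example. It shows that the recorded signal is the sample-by-sample sum of the two sources, and also that the same recorded signal can be written as the sum of quite different pairs of components.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)

def tone(freq_hz, amplitude, n_samples):
    """A sampled sine wave standing in for one sound source."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

piano = tone(440.0, 0.5, 64)  # stand-in for the piano's pressure pattern
voice = tone(660.0, 0.3, 64)  # stand-in for the singer's pressure pattern

# Superposition at the microphone: pressures add sample by sample.
mix = [p + v for p, v in zip(piano, voice)]

# A different pair of hypothetical sources that sums to the very same signal.
other_a = [m / 2 for m in mix]
other_b = [m - a for m, a in zip(mix, other_a)]
remix = [a + b for a, b in zip(other_a, other_b)]

same_signal = all(math.isclose(r, m, abs_tol=1e-12) for r, m in zip(remix, mix))
print("alternative decomposition gives the same recording:", same_signal)
```

The sketch leaves open how the content of the recorded signal is to be individuated; it only makes concrete that the stored vehicle is a single sequence of values produced by adding two others.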
Which passage in Associative Engines do you have in mind?
Date: Fri, 08 May 1998
From: Gerard O'Brien
Dear Hugh,
You wrote in response to my point that one simply cannot have more
than one "characteristic" pattern of activity generated across the same
set of units at any one moment in time:
    Take a simple three-layer network. Input A causes characteristic pattern a across the hidden units, and input B causes characteristic pattern b across the hidden units.

    If you add input A to input B, then the activation pattern will be a + b - the superposition of a and b. So superposed input activation vectors cause superposed hidden activation vectors. I submit that this hidden unit activation vector is a superposed representation of A and B.

But you haven't shown that you can have more than one pattern of activity across the same set of units at any one moment in time. All you have shown is that there is a third activation pattern that the hidden units can generate -- one that comes about by exposing the network to a different input (one that is produced by combining input A and input B). Moreover, in what sense is this third pattern a "superposed" representation? To answer this question you have to show how the contents of both earlier hidden unit activation patterns are somehow both "co-present" in the third activation pattern. And for this you need to tell a systematic story about the interpretation of these activation patterns (a story about the relationship between their vectorial properties and their content). But all of this is missing. Can you point to a PDP net in the literature that superposes representations in the way you claim?
Moreover, your talk of the third activation vector being a superposed representation of "A and B" is rather odd. I thought A and B were inputs (input patterns?). Do you take the activation patterns across hidden layers of PDP networks to "represent" their inputs (input patterns)? What's the point of representing inputs? Surely the aim is to represent features of the task domain? Again, do you know of PDP nets that use their hidden layers to "represent" inputs?
    Or, if you don't like superposing the inputs:

    "Consider a three-layer network whose input units are divided into two subsets which allow two inputs to be presented simultaneously, one input to each subset of input units. In such a network the activation vector corresponding to the hidden units would superpositionally represent the two inputs."

But again, I don't understand. On my understanding, PDP systems PROCESS their inputs to achieve activation pattern representations of features of their task domains; they don't employ activation patterns to REPRESENT their inputs. So I don't understand your use of "superpositionality" here. (Other than to say that we are using the term "representation" quite differently. You seem to be using it to mean something purely syntactic, if such a term can be applied in the PDP context, whereas I am using it to refer to a vehicle with representational content.)
    The more general point here is that superpositionality isn't confined to intuitively tacit representations like weight vectors: representations that are unproblematically explicit can also be superpositional. Consider the example of sound ...

Two things. First, the game here is different. Our claims are about how information is coded in PDP systems. I'm not sure that we're committed to more general claims about explicit representations wherever they occur. But having said that, second, it seems to me that this example is a case of the representation of superposition, not superpositional representation.
Here's why. To the extent that "the voltage pattern in the wires, the pattern of pits on your CD, (or of magnetic polarisation on your cassette tape), and the vibration of your speakers" are representations, what do they represent? You say that they are representations of "the singer's voice and the piano's sound", but strictly speaking they are representations of a sequence of sound waves (which when it impacts on our brains causes us to have an experience of the singer's voice and the piano's sound). But in this respect, there is no superposition of representational content. They may represent something (a sequence of sound waves) that is itself a superposed product (of two other sequences of sound waves), but the representational vehicles here (the voltage pattern in the wires, etc.) don't themselves superpositionally encode this information. Superpositional encoding occurs when there is a many-to-one mapping of contents onto representational vehicles. The vehicles you have described, to the extent that they are representations at all, represent a single sequence of sound waves; they don't in any way "represent" the two other sequences of sound waves of which this single sequence of sound waves is itself a superposed product. In short, the representation of something superpositional doesn't constitute superpositional representation.
Finally, the passage in Andy Clark's Associative Engines to which I was referring occurs on pp. 17-23. You will notice that Clark, while discussing superpositionality in connectionism, does not claim that activation patterns represent in a superpositional fashion. His focus is connection weight representation.
Date: Fri, 08 May 1998
From: Hugh Clapin
Dear Gerard,
    Can you point to a PDP net in the literature that superposes representations in the way you claim?

Not off the top of my head. I don't see any reason why such a system couldn't exist, however. I guess I'm assuming that your claim is not just that there are in fact no activation vectors which are superpositional, but that there could not be. I agree about the sort of story that would be needed to fill out the suggestion ("the contents of both earlier hidden unit activation patterns are somehow both "co-present" in the third activation pattern"), but I don't see why such a story couldn't be told. If you can tell a story about what makes the 'ordinary' hidden unit activation vector a representation of something, then I reckon I can tell the same story about the superposed hidden unit activation vector.
    Moreover, your talk of the third activation vector being a superposed representation of "A and B" is rather odd. I thought A and B were inputs (input patterns?). Do you take the activation patterns across hidden layers of PDP networks to "represent" their inputs (input patterns)?

OK - that was sloppy. Let me try again: thing-in-the-world alpha causes input pattern A causes hidden unit pattern a (and same for beta - B - b). The rest of the story stays the same except that the conclusion should read "I submit that this hidden unit activation vector is a superposed representation of alpha and beta."
    But again, I don't understand. On my understanding, PDP systems PROCESS their inputs to achieve activation pattern representations of features of their task domains; they don't employ activation patterns to REPRESENT their inputs. So I don't understand your use of "superpositionality" here.

Again I was sloppy. Let me try again:
Consider a three-layer network whose input units are divided into two subsets which allow two inputs to be presented simultaneously, one input to each subset of input units. In such a network the activation vector corresponding to the hidden units would superpositionally represent the content of the two inputs.
Is your problem here simply that I appeared to be saying that the hidden activation vectors represent the input activation vectors? That problem should be resolved by the less sloppy formulations.
    In short, the representation of something superpositional doesn't constitute superpositional representation.

Now you seem to be playing the game you accused me of in my sloppiness: making the content of a representation another representation. Surely the voltage pattern in the wires is, unproblematically and quite properly, a representation of the singer's voice and the piano's sound. As with all covariational causal representation, there is a problem of where the content is: does the retina represent the world, or the pattern of light incident at the lens of the eye, or the pattern of light at a point midway between me and the nearest surface? I don't have a principled answer to this question, by the way.
Secondly, a mixing desk superposes two electrical representations of noises. If I have separate microphones for the piano and singer, and mix them in the desk, then superposition first happens in the realm of voltages - no way to construe that as representation of superposition, surely.
It is slowly dawning on me that we have quite distinct pictures
of the nature of connectionist computation and processing, and perhaps
that's what's really at issue here. My suspicion is that your notion of what
makes something a representation or not in a PDP network will (perhaps
tacitly) appeal to the processes that state undergoes, or could undergo.
My proposed superpositional hidden unit activation vectors don't represent
both the inputs they're meant to (according to me) because they don't undergo
processes of a certain sort, don't enter into the right sorts of computations.
But of course such a story grounds the explicit/tacit distinction in process,
and the vehicle theory becomes a process theory.
Date: Fri, 08 May 1998
From: Gerard O'Brien
Dear Hugh,
    If you can tell a story about what makes the 'ordinary' hidden unit activation vector a representation of something, then I reckon I can tell the same story about the superposed hidden unit activation vector.

Okay, here's the story. Activation patterns are representations of "things" in virtue of relations of structural isomorphism that obtain between them. Consider, as an example, NETtalk. NETtalk transforms English graphemes into their appropriate phonemes, given the context of the words in which they appear. The task domain, in this case, is quite abstract, in that it is the (contextually nuanced) letter-to-sound correspondences that exist in the English language. Back propagation is used to shape its activation landscape, consisting of patterns across 80 hidden units, until this landscape is structurally isomorphic with its task domain. At this point, variations in patterns of activations systematically match variations in letter-to-sound correspondences. It is this structural isomorphism that is revealed in the now very familiar cluster analysis to which Sejnowski and Rosenberg subjected NETtalk. It is this isomorphism that makes it right and proper to talk, as everyone does, of a "semantic metric" across NETtalk's activation landscape.
Once it is trained up, NETtalk's connection matrix (its connection weights together with its pattern of connectivity) stores information about the 79 letter-to-sound correspondences in English. This information is coded in a superpositional fashion: the resources used to code any one letter-to-sound correspondence are co-extensive with the resources used to code any other. There is a many-to-one mapping of contents onto a representational vehicle. Whenever NETtalk is exposed to a graphemic input, it generates a pattern of activation across its hidden layer which represents the letter-to-sound correspondence for that grapheme (in its embedding context). This is not superpositional representation because there is a one-to-one mapping of content onto this representational vehicle. As a point in NETtalk's representational landscape (i.e., as an activation vector) it has its content fixed by the structural isomorphism that obtains between this landscape and the target domain. Therefore, in order to get activation patterns in PDP systems to code information in a superpositional fashion, you must show how a single activation vector can (at one and the same time) occupy more than one point in an activation landscape. (In this context it won't do to exploit sub-patterns of activation, as this merely generates different landscapes and hence "conjoint" contents, not different points in a single landscape.)
Can you do this?
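The contrast drawn above between many-to-one coding in the connection matrix and one-to-one coding in an activation pattern can be made concrete with a toy model. This is a minimal sketch under loud assumptions: it uses a two-input linear associator with Hebbian-style outer-product storage, not NETtalk and not back propagation, and every vector and name in it is hypothetical. Two input-to-output associations are stored by summing their outer products into one matrix, so each weight is shaped by both associations, while each input still evokes a single output pattern.

```python
def outer(target, key):
    """Outer product of a target pattern and an input pattern."""
    return [[t * k for k in key] for t in target]

def mat_add(m, n):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rm, rn)] for rm, rn in zip(m, n)]

def scale(m, c):
    """Multiply every entry of a matrix by a constant."""
    return [[c * a for a in row] for row in m]

def respond(weights, x):
    """Output activation pattern produced by input x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Two orthogonal (hypothetical) input patterns and their target patterns.
x1, t1 = [1.0, 1.0], [1.0, 2.0, 3.0]
x2, t2 = [1.0, -1.0], [-1.0, 0.5, 4.0]

# Each association alone, normalised by the squared length of its input (2).
m1 = scale(outer(t1, x1), 0.5)
m2 = scale(outer(t2, x2), 0.5)

# Superpositional storage: both associations summed into the same weights.
weights = mat_add(m1, m2)

recall_1 = respond(weights, x1)  # one activation pattern for x1
recall_2 = respond(weights, x2)  # one activation pattern for x2
print(recall_1, recall_2)
```

Because the two assumed inputs are orthogonal, each target is recovered without interference. No entry of either stored association is zero, so every weight carries a contribution from both, which is the sense of "co-extensive resources" the sketch is meant to illustrate.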
Date: Tue, 12 May 1998
From: Hugh Clapin
Dear Gerard,
Admission: I'm not going to provide exactly what I promised in issuing
the challenge. In fact the key comparison is between superpositional activation
vectors and superpositional weight matrix representation: if the latter
is representational then so is the former. So I'm not going to say that
my purported superpositional activation vector represents in exactly
the same way that standard activation vectors do - but that doesn't stop
them being representations.
    Okay, here's the story. ...

OK - so in the standard case activation vectors represent by virtue of being structurally isomorphic to (some aspect of) their contents. The connection matrix representation is a kind of embodied know-how (and so properly tacit representation in Dennett's taxonomy). So while structural isomorphism is required for some kinds of representation, it isn't required for all kinds of representation. Connection matrices, for example, represent without being structurally isomorphic. So do sentences.

Clarification: I am not claiming that most cases of activation vector representation are superpositional. Off the top of my head I can't point to an example of a connectionist network that uses such a form of representation. My only claim is that such a representation is possible.
Also, note that I'm not denying that there is an important and significant
difference between isomorphic and superpositional representation: there
is. My point is only that activation vectors may be superpositional or
isomorphic.
Date: Tue, 12 May 1998
From: Gerard O'Brien
Dear Hugh,
So the situation is this. If PDP activation patterns represent in virtue
of structural isomorphism, then superpositionality is ruled out. On the
other hand, if one can tell some other story about the representational
content of such activation patterns (some kind of causal/information theoretic
story, say) then superpositionality might still be a goer.
The question now, though, is whether any story other than structural isomorphism is plausible in the context of PDP (and hence connectionism). Here we must leave the standard PDP-lore behind (to the extent that there is any consensus among connectionists) and enter more speculative territory, but I think there is a good reason for thinking that the answer to this question is in the negative. This will seem surprising because philosophers of mind are so used to thinking that the issues of computational architecture and representational content are orthogonal. But this is a legacy of classical cognitive science, grounded as it is in digital computational theory. Given that digital computations inherit their semantic coherence from rules that are quite distinct from the structural properties of the symbols they apply to, classicism appears to place few constraints on a theory of content. As long as these symbols are transformed according to these rules, it would seem to matter not where their representational content derives from. Thus the causal, functional and teleofunctional theories of content that dominate the literature are all, prima facie, compatible with the symbolic representations.
All of this changes when we move to the PDP context. PDP networks are capable of computational operations only in so far as their activation landscapes are (after being shaped through the application of learning rules) structurally isomorphic with their target domains. This is how they compute; this is how their causal operation is rendered semantically coherent. Without such structural isomorphisms, PDP systems are useless. But as a consequence, connectionists just don't have the same luxury as classicists when it comes to content determination. They are forced to explain content in terms of relations of structural isomorphism between activation pattern representations and features of their target domain.
It is for this reason that one can't point to existing connectionist
networks that employ the kinds of (superpositional) activation pattern
representations of which Hugh speaks. Superpositionality in PDP activation
pattern representation is not really a goer, after all.
Date: Wed, 13 May 1998
From: Hugh Clapin
Dear Gerard,
For the sake of argument, I'll accept that structural isomorphism rules
out superpositionality.
(I'm not sure I really think this, though. Think again of sound: the representation on the CD is superposed, however it is also clearly a structural isomorphism: it's a structural isomorphism of the pressure wave incident at the microphone. Maybe it's not a structural isomorphism of the sounds of the individual instruments, but I'm not sure even about that. The lesson here is that it is critical to keep in mind the content relative to which a given representation is or is not isomorphic and/or superpositional.)
But if you want to hold that the only sort of representation is structural isomorphism, then that rules out connection matrix of the network as a representation. You seem to be throwing out the baby (tacit, superpositional representation in the weights) with the bathwater (superpositional representation in the activation vectors).
    The question now, though, is whether any story other than structural isomorphism is plausible in the context of PDP (and hence connectionism).
    ...
    This will seem surprising because philosophers of mind are so used to thinking that the issues of computational architecture and representational content are orthogonal. But this is a legacy of classical cognitive science, grounded as it is in digital computational theory.

I can't tell you how much I agree with this previous paragraph. My PhD dissertation argues exactly this point [2], and the general lesson is central to my work on representation.

    But as a consequence, connectionists just don't have the same luxury as classicists when it comes to content determination. They are forced to explain content in terms of relations of structural isomorphism between activation pattern representations and features of their target domain.

OK but what about the representation in the weights?
Summary:
I said that the explicit/implicit distinction you rely on for your theory of consciousness had to be, de facto, a process distinction, thus turning the theory of consciousness into a process theory (and a promising process theory it is, too).
In arguing that, I made the point that it wasn't obvious that activation vectors are explicit and weight-representations are inexplicit.
Your account of explicitness requires, as a necessary condition, that an explicit representation possesses a single semantic value.
I argue that apart from this condition, your account of explicitness is effectively a process account. And I argue that the requirement that an explicit representation possesses a single semantic value is an idiosyncratic requirement, which doesn't fit our intuitions about explicitness (my counter-example being that explicit representations of sounds are typically superpositional and thus possess more than a single semantic value).
So I say your definition of explicitness ought to drop the requirement of a single semantic value, and consequently it becomes a wholly process account.
Taking another tack, even if I were to accept that a single semantic value is a reasonable condition on explicitness, maybe some activation vectors would turn out not to be explicit on your definition, because they might be superpositional.
Your response is to say that the only plausible method by which activation vectors represent is by virtue of structural isomorphism, and superpositionality is incompatible with structural isomorphism (or more exactly, structural isomorphism with X is incompatible with superpositional representation of X and of Y, for most Y).
My worry with this line of defence is that it rules out superpositional representation as representation at all, and thus rules out weight vector representation, which seems to contradict an important assumption in your theory: both activation and weight vectors represent, but they do so differently, and the difference grounds the difference between conscious and unconscious mental states. It would seem to be a consequence of this line of defence that there is no unconscious representation; a conclusion I presume you don't want to embrace.
A further worry: Cummins in his recent book Representations,
Targets and Attitudes [6] argues that structural
isomorphism is the only proper sort of representation (and is thus congenial
to your defence); however he also makes the point that structural isomorphism
with X almost always entails structural isomorphism with a bunch of other
things (many things share aspects of their structure). So structural isomorphism
alone doesn't give you a 'single semantic value' anyway.
Date: Wed, 13 May 1998
From: Gerard O'Brien
Dear Hugh,
Lots to say about your last posting and not enough time. But a very
brief response. In fact, I didn't say that "the only sort of representation
is structural isomorphism". I said that the right way of thinking about
representation in PDP activation patterns is in terms of structural isomorphism.
Unless one thinks that all representational vehicles wherever they occur
must represent in the same way (and why would one think that?), my claim
here entails nothing about the way other kinds of representational vehicle
(such as connection weight representations, symbolic representations, etc.)
do that thing they do. My remarks were focussed on activation pattern representation
because the issue we were discussing concerned whether or not this kind
of vehicle can represent information in a superpositional fashion.
Having said this, and since you asked, it turns out that the story to be told about connection weight representation also invokes structural isomorphism. But as always, things get more complicated, and I can only be very brief. Unlike first-order isomorphism, which can obtain between a single representational vehicle and a single representational object, structural isomorphism is a systemic property. One can talk sensibly about an individual activation pattern representation being "structurally isomorphic" with some feature of a task domain only if the former belongs to a system of representational vehicles that systematically mirror features of the task domain in question. Points in a representational landscape are isomorphic with individual features in the task domain only in so far as the landscape as a whole is isomorphic with the target domain. (Think again of how NETtalk works. Its representational landscape -- as revealed by cluster analysis -- is isomorphic with the whole range of letter-to-sound correspondences in English.) But where is this "whole representational landscape" (as opposed to points in this landscape)? The answer is that it is embodied in a dispositional fashion in the (trained up) connection matrix of the PDP network in question. And because it is so embodied, this connection matrix (tacitly) encodes information about the target domain.
Date: Tue, 19 May 1998
From: Hugh Clapin
Dear Gerard,
I think my sound example shows that superpositionality and structural
isomorphisms aren't mutually exclusive, and your story about representation
in weight vectors seems to acknowledge that point. NETtalk is only one
of many possible connectionist architectures, and it seems pretty risky
to found your theory of explicitness, and thus consciousness, on the hope
that hidden unit vectors not be superpositional. I was going to concede
at this point (and I think I did in an earlier email) that the NETtalk
hidden unit activation vectors aren't superpositional. But having looked
back at some van Gelder discussions of this issue, I'm not sure I ought
to make such a concession:
It seems that your view that activation vectors shouldn't/can't be superpositional is perhaps an isolated view. In his 'Defining "Distributed Representation"' [18] van Gelder explicitly argues that hidden unit activation vectors can be, and often are, distributed by virtue of being superpositional.
So on p. 181 he says 'It is common in connectionist practice to regard the hidden layer activations in a fully-connected feed-forward backpropagation network as a distributed representation of the input system.' He then goes on to show why this is justified on his formal account of distribution according to strongly distributing transformations. 'In other words, when the network generates the hidden unit pattern on the basis of the input pattern, it is implementing a strongly distributing transformation from the function specifying the input pattern to that specifying the hidden pattern.' Similarly he goes on (pp. 181-182) to show how the hidden unit activation vectors in the RAAM architecture, for example, are distributed codings of a sequence of input vectors.
Following that, I think the thing to say is this: the NETtalk hidden unit activation vectors as you describe them aren't distributed representations of the input graphemes, but were the input vector to be a discrete, localist coding where each input unit represented a single content, then the hidden unit vector would be a distributed representation. Secondly, van Gelder's RAAM discussion suggests that hidden unit activation vectors are often distributed representations of the distal contents of the input vectors, at least in cases where the input vectors are represented in a distributed fashion across the hidden unit activation vectors.
So I think it's pretty clear that not only could there be distributed coding in activation vectors, but there often is.
Date: Tue, 19 May 1998
From: Gerard O'Brien
Dear Hugh,
I certainly don't think my defence of the non-superpositionality of
activation pattern representation rests solely on NETtalk. I introduced
this example to provide a degree of concreteness that is often lacking
in philosophical discussions of these matters. (This is why philosophy
often goes awry -- the view from the armchair is very limited). Moreover,
I could have made exactly the same point using a number of other PDP models.
Indeed, I think the burden here lies with those who think activation patterns
can represent in a superpositional fashion. And in this respect I'm still
waiting for a good example.
Date: Tue, 19 May 1998
From: Hugh Clapin
Dear Gerard,
According to the best theory I know of what distributed representation
is (van Gelder's), activation vectors can be distributed representations.
That's my claim. If you have a better theory of distributed representation
which rules out activation vectors as distributed, then I'd like to see
it. You have given me some idea of your account of representation: that
activation vector representations are representations by virtue of being
isomorphic to their contents, for example. But as you acknowledge that
not all representation is going to be isomorphic, the path seems open for
the suggestion that some activation vectors represent via superposition
(cf some ink patterns represent by isomorphism - pictures - some represent
arbitrarily - words).
I thought the RAAM example would be a good one, for the purposes
of demonstrating this point. Here's another. Elman's network which is designed
to process sentences with embedded grammatical structure has superpositional
representation in the hidden units. I'm looking at the diagram on p. 153
of Connectionism: Theory and Practice edited by S. Davis, in the
article 'Grammatical Structure and Distributed Representations' by J. L.
Elman [8]. That network has 5 layers, as follows:
1. a 26 node input layer which feeds to
2. an 80 node layer, 10 nodes of which are fed from the input layer,
the remaining 70 nodes are 'context' units fed from the 3rd layer
3. a 70 node layer (called 'hidden') fed from all the nodes in layer
2, which feeds back to the 'context' units in layer 2, and forward to
4. a 10 node layer which feeds forward to
5. a 26 unit layer.
(you can see a similar network at http://crl.ucsd.edu/~elman/Papers/weckerly_elman/weckerly_elman.html#62.)
Now what goes on between layers 2 and 3 is of interest to us. The activation of layer 3 is a function of the activation of the 10 unit subpart of layer 2 (which is fed directly by the input layer), and the 70 unit subpart of layer 2. The activation of layer 3 is a superposition of the activation of subpart 1 of layer 2 and subpart 2 of layer 2. If we describe the activation of the 10 unit subpart of layer 2 as a redescription of the input, and we assume that the activation of the 70 unit subpart of layer 2 at time t is a representation of the activation of layer 3 at time t-1, then, at time t, the hidden unit activation vector of layer 3 is a superposed representation of the redescription of the input at time t (1st content) and the hidden unit activation vector of layer 3 at time t-1 (2nd content).
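A minimal sketch of the layer 2 to layer 3 step just described, for readers who want the arithmetic spelled out. The layer sizes are those in the list above; the random (untrained) weights, the logistic activation, the one-hot word coding, and the zero initial context are illustrative assumptions, not details taken from Elman's paper. The sketch shows only that the layer 3 pattern is computed from the sum of two contributions; it says nothing about what that pattern represents.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def rand_matrix(rows, cols):
    # Assumption: random stand-ins for weights that would normally be learned.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def logistic(vector):
    return [1.0 / (1.0 + math.exp(-x)) for x in vector]

W_INPUT = rand_matrix(10, 26)         # layer 1 -> 10-unit subpart of layer 2
W_FROM_INPUT = rand_matrix(70, 10)    # 10-unit subpart of layer 2 -> layer 3
W_FROM_CONTEXT = rand_matrix(70, 70)  # 70 context units of layer 2 -> layer 3
W_DOWN = rand_matrix(10, 70)          # layer 3 -> layer 4
W_OUTPUT = rand_matrix(26, 10)        # layer 4 -> layer 5

def one_hot(index, size=26):
    # Assumption: each word is coded by a single active input unit.
    return [1.0 if i == index else 0.0 for i in range(size)]

def step(word, context):
    """One time step: returns the layer 3 pattern and the layer 5 output."""
    redescription = logistic(matvec(W_INPUT, word))
    # Layer 3 net input is the sum of two contributions: one from the
    # redescribed input at time t, one from the context units (layer 3 at t-1).
    net = [a + b for a, b in zip(matvec(W_FROM_INPUT, redescription),
                                 matvec(W_FROM_CONTEXT, context))]
    hidden = logistic(net)
    output = logistic(matvec(W_OUTPUT, logistic(matvec(W_DOWN, hidden))))
    return hidden, output

context = [0.0] * 70                  # assumed initial context
for word_index in (1, 3):             # a two-word "sentence"
    hidden, output = step(one_hot(word_index), context)
    context = hidden                  # context units copy layer 3

# The same word presented with no history yields a different layer 3 pattern.
hidden_no_history, _ = step(one_hot(3), [0.0] * 70)
```

Comparing `hidden` with `hidden_no_history` shows that the layer 3 pattern for a word depends on what preceded it, which is the feature both correspondents go on to interpret.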
I'm fairly certain this won't convince you, but I'm still a bit
unclear as to why.
Date: Tue, 19 May 1998
From: Gerard O'Brien
Dear Hugh,
This is a long one, so take a deep breath.
According to the best theory I know of what distributed representation is (van Gelder's), activation vectors can be distributed representations. That's my claim.
As I have noted in past postings, I think van Gelder's account of superpositional representation in PDP systems is fine insofar as it is restricted to connection weight representation. The trouble is that he doesn't so restrict it, and that's where he goes wrong. Let me support this charge in a little more detail.
Van Gelder writes about activation pattern representation that "individual units can be involved in representing many different entities at the same time, for the characteristic patterns for two different entities can be activated at once over the same set of units. The representings of the two different entities can in this way be superimposed on each other" ([17], pp.41-2). This, he argues, is superpositional representation, as the resources used to code one item of information are coextensive with those used to code others. But in support of this claim he refers to just two sources. The first is Touretzky and Hinton's 'A Distributed Connectionist Production System' [15]. According to van Gelder this model shows that it is possible to "superimpose patterns while preserving the functional independence of the representations" (p.42). But there is no such superposition of activation patterns in this model (nor do Touretzky and Hinton make such a claim). What seems to have confused van Gelder here is that this model's "working memory", which is made up of 2000 binary state units, through its state of activation is capable of coding more than one "item of information" (in this case triples of symbols from an alphabet of 25 symbols) at any one time. But the means by which this is done doesn't involve superpositional activation pattern representation. Instead, the activity of each unit is taken to code 216 triples (out of 15,625 possible triples). Which triples are coded across the 2000 units at any one time is then determined by "overlaps" between the active units' 216 possibilities. This is an example of a form of representation somewhere between localist and superpositional. It is not localist, because individual units participate in the coding of more than one triple. 
But it is not superpositional either, because the resources used to code one triple are not coextensive with the resources used to code any other triple -- any one unit is involved with coding only about 1 percent of the information that the network can represent. In this sense, this network works with a limited form of "coarse coding" which is actually closer in spirit to localist representation.
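The figures in this description can be checked with a small sketch of coarse coding. It assumes that each unit's 216 triples come from a receptive field of six symbols in each of the three positions (6 x 6 x 6 = 216), and it draws those fields at random; the field construction, the letter alphabet, and the names `units_for`, `store` and `support` are illustrative assumptions rather than details of Touretzky and Hinton's model.

```python
import random

ALPHABET = [chr(ord("A") + i) for i in range(25)]  # 25 symbols, A..Y
N_UNITS = 2000
FIELD = 6  # assumed: symbols accepted per position by each unit

rng = random.Random(0)
# Each unit's receptive field: one set of FIELD symbols per triple position.
units = [tuple(frozenset(rng.sample(ALPHABET, FIELD)) for _ in range(3))
         for _ in range(N_UNITS)]

def units_for(triple):
    """Indices of the units whose receptive field contains this triple."""
    return {i for i, field in enumerate(units)
            if all(symbol in accepted for symbol, accepted in zip(triple, field))}

def store(active, triple):
    """Storing a triple switches on every unit that codes it."""
    return active | units_for(triple)

def support(active, triple):
    """Fraction of a triple's units that are currently active."""
    coding = units_for(triple)
    return len(coding & active) / len(coding) if coding else 0.0

triples_per_unit = FIELD ** 3                   # 216
possible_triples = len(ALPHABET) ** 3           # 15,625
share = triples_per_unit / possible_triples     # 0.013824, i.e. about 1 percent

memory = set()
memory = store(memory, ("F", "A", "B"))
memory = store(memory, ("C", "D", "E"))
```

With both triples stored, `support(memory, ("F", "A", "B"))` and `support(memory, ("C", "D", "E"))` are 1.0, while an unstored triple such as `("X", "Y", "W")` gets only partial support from chance overlaps. The two stored triples are carried by different, largely non-overlapping sets of units, which is the sense in which their resources are not coextensive.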
(Incidentally, this is where the ambiguity in the term "distributed" can lead one astray. Once upon a time, distributed representation in PDP simply referred to those cases where an item of information was represented through the activity of more than one unit, and where these units participated in the coding of more than one item of representation. But in more recent times the term "distributed representation" has become synonymous with superpositional representation, a quite different notion. This is fine as long as we are careful. Touretzky and Hinton claim that their model employs distributed activation pattern representation. But by this they mean the original interpretation of distributed, not the more recent superpositional interpretation. Perhaps van Gelder should have been more wary.)
Van Gelder's second source in support of his claim about superposition in activation pattern representation is Paul Smolensky's tensor product scheme for connectionist variable binding. According to van Gelder, Smolensky offers "a formal definition of superposition in terms of vector addition. The result of adding two vectors is a new vector that, under the scheme in question, is taken to represent the same items as both the originals. Since the portion of the resources implicated in representing each item is now exactly coextensive--that is, just the whole new vector itself--the representings are superposed in exactly the sense just outlined" (p.43). But the problems with Smolensky's story here are legion. Smolensky entered into speculations along these lines in response to Fodor and Pylyshyn's criticism that connectionist representations lack constituent structure, and hence connectionism cannot explain the systematicity of thought. Smolensky thought that his tensor product scheme could show how connectionist representations do have constituent structure. This structure would be contained in a single connectionist activation pattern representation by virtue of the fact that it is constructed by "superposing" tensor product vectors (where superposing here simply means combining them in some way, in this case through addition). But as Fodor and McLaughlin very quickly pointed out [9], Smolensky's story doesn't work because the constituent structure to which he refers is merely imaginary: the fact that the activation pattern representation is generated by adding other vectors together in no way guarantees that this "superposed" result contains any information about the vectors from whence it came. Smolensky's reply was that it would be relatively easy to take this final activation pattern and extract the component vectors. 
But Fodor and McLaughlin's counter-reply was that there are infinitely many decompositions of a given activity vector; and, moreover, that counterfactual representations have no causal consequences. From all accounts, Smolensky has seen the error of his ways, and is no longer pursuing this kind of speculation. (Indeed, Smolensky seems to have put his weight behind a hybrid theory of cognition, in which systematicity is explained in part by a connectionist implementation of a classical architecture.)
But the problem with Smolensky's story also vitiates van Gelder's account of superposition in activation pattern representation. A mere "syntactic" superposing of vectors doesn't ipso facto generate a superpositional form of representation. This brings me, Hugh, to your interpretation of Elman's recurrent network.
I'm fairly certain this won't convince you, but I'm still a bit unclear as to why.
You're right, and let me explain why. Your description of this network's performance suffers from the problem just diagnosed above: a mere "syntactic" superposing of vectors doesn't ipso facto generate a superpositional form of representation. Question: Why do you say that the activation pattern across layer 3 at time t is a "superposed" representation of the "redescription of the input at time t (1st content) and the hidden unit activation vector of layer 3 at time t-1 (2nd content)"? Answer: Because the activation pattern across layer 3 is produced in some way by "combining" these two other vectors. But combining two vectors to generate a third (what I have been calling "syntactic superposition") doesn't guarantee that the third vector superpositionally represents the contents of the first two (call this "content superposition" if you like). To show content superposition (which is what our discussion is all about) you've got to explain how the contents of the two earlier vectors are "co-present" in the third activation pattern. And for this you need to tell a systematic story about the interpretation of these activation patterns (a story about the relationship between their vectorial properties and their content). And all of this is missing from your description of Elman's network. (If this sounds familiar, it's because I'm simply repeating what I said in a previous posting.) In other words, you've supplied a purely syntactic story. Such stories don't really explain how PDP networks do any computational work. In order to understand Elman's network, for example, you need to say something about its representational capacities.
Telling a representational story about Elman's network takes a bit of work. But Elman himself has done this work. Because his is a recurrent network (rather than a straightforward feedforward network like NETtalk) cluster analysis is not particularly illuminating. Instead Elman applied the numerical technique of "principal components analysis" to get a picture of the representational content of activation patterns across the crucial third layer. What he discovered was that this layer doesn't just represent words, it represents each word plus the grammatical category to which it belongs (in the sentences in which these words are embedded). That is, each point in the activation space of this third layer represents a word in a particular grammatical context.
With this account of the content of activation patterns across layer three before us, can we make any sense of the claim that the representations in this layer superpositionally encode the information contained in activation patterns across the two subparts of the second layer? That is, can any sense be made of the claim that the contents of the two earlier vectors are "co-present" in the layer 3 activation pattern? The answer is straightforwardly in the negative. Remember, this layer 3 activation pattern represents a word in a particular grammatical context. The content of the 10 unit subpart of layer 2, which is directly fed by the input layer, is the word in no grammatical context, while the content of the 70 unit subpart of layer 2, which is fed back from layer 3, is a different word in its own particular grammatical context. Neither of these contents is "co-present" in the layer 3 activation pattern. A word in a grammatical context has a different content from a word in no such context. But far more obviously, a word in a grammatical context has a different content from a different word in its own grammatical context (something that is marked in this case by the fact that the latter occupies a quite different point in the activation space of the third layer).
One can, of course, talk about the semantic relation between the representation of a word in no grammatical context and the representation of this word in a grammatical context, and even perhaps the semantic relation between the representation of a word in a grammatical context and the representation of the previous word in the embedding sentence from which the former came. But this is not the issue. The issue is whether the activation pattern across the third layer superpositionally represents the information contained in the subparts of layer 2. And here the answer is clear: layer 3 activation patterns in Elman's network do not even represent this information, let alone represent it superpositionally. The moral of the story is: syntactic superposition does not content superposition make.
Date: Wed, 20 May 1998
From: Hugh Clapin
Dear Gerard,
Now I think we are getting to the heart of the disagreement! - and
that is the distinction you draw between 'merely syntactic' superposition
and 'semantic' superposition; and in the latter the contents must be "co-present".
First point: on this distinction between 'syntactic' superposition and 'semantic' superposition, you agree with me that there could be and are 'syntactically' superposed activation vector representations. (I'll keep using the inverted commas, because I don't like the distinction.)
Now the fact that you keep on putting "co-present" in inverted commas worries me a little. I take it this is to suggest that a (semantically) superpositional representation of a cow and sheep needn't actually have a sheep and cow co-present in it. So what, I ask myself, is the difference between syntactic superposition and semantic superposition?
Let's start with negative characterisations. Merely syntactic superposition fails to be semantic if the contents are unrecoverable - this seems to be the essence of your account of why Smolensky's tensor-style manipulation isn't semantic superposition.
the fact that the activation pattern representation is generated by adding other vectors together in no way guarantees that this "superposed" result contains any information about the vectors from whence it came.
This surprises me. Unless I'm using a different notion of 'information' to you, this is just wrong. Tree rings carry information about tree age simply by virtue of causal relations, and any causal relation carries some information. Perhaps you mean that the fact that the resultant vector is the result of some causal process involving the former two vectors in no way guarantees that the result contains the same content represented by the initial vectors. I'd agree with this second formulation. However some kinds of 'merely syntactic' recodings of representations do continue to carry the content of the original: think of a picture of a sentence, a bit-map of a picture, etc.
Note that I'm assuming representation is transitive in the following way. If Representation R1 has content C, then a representation R2, of R1, also represents C. That is, R2 can be thought of as having two contents: R1 and C. (example: the sentence 'the cat is on the mat' represents a certain state of affairs. A photograph of that sentence also represents that same state of affairs, by virtue of representing the sentence.)
Now you're right that any old syntactic bumping around won't guarantee transitivity, but a recoding which allows the reformation of the original representation does preserve the content of the original. And as your discussion of the Fodor and McLaughlin point seems to acknowledge, whether the original representation is recoverable is a matter of architectural detail: what processes are available to allow the recovery of the original representation. (Note that this gets at one of the original key points: there is no such thing as absolutely explicit representation - a representation only carries content explicitly to the extent that that information can be extracted, which relies on the broader architecture and processes of the system in question.)
Now further work in the spirit of Smolensky's tensor approach seems to show that you can have activation patterns which are: (syntactic, in your terms) superpositions of sentences with grammatical structure, whereby the grammatical structure is recoverable according to certain processes.
(My source for this is a paper by Chris Eliasmith from the PNP program at St Louis, presented to last year's Southern Society for Philosophy and Psychology, called 'Structure Without Symbols: Providing a Distributed Account of Low-level and High-level Cognition' [7]. In this paper Eliasmith refers to work using a representational scheme called 'Holographic Reduced Representations'. HRRs are formed by performing certain vector transformations (convolutions) on component vector representations, and Eliasmith argues that HRRs have all the requisite systematicity required by Fodor et al. Eliasmith is using the empirical work of T. A. Plate [12].)
Now our argument isn't over systematicity. I raise the Eliasmith stuff because he shows in that paper that HRRs are superpositional codings of vector representations from which the original representations can be recovered (to a degree of accuracy). So, hypothetical question: If you could ('syntactically') superpose activation vectors into a kind of activation vector from which the original vectors could be recovered, would this be 'semantic' superposition, as far as you're concerned?
Summary so far: I think that you think that 'semantic' superposition requires: 'syntactic' superposition + recoverability of original representations. I've got an example (Eliasmith's HRRs) of such a representation scheme, hence an example of 'semantically' superpositional activation vectors. But I don't think I should have to come up with an example anyway - all I want is for you to acknowledge the possibility of 'semantically' superposed activation vectors.
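A minimal sketch of how recovery "to a degree of accuracy" can work in an HRR-style scheme, assuming the standard construction in which binding is circular convolution, superposition is vector addition, and decoding convolves the trace with the involution of a role vector. The dimension, the random vectors, the role names, and the clean-up step (picking the nearest known filler) are illustrative assumptions, not details drawn from Eliasmith's paper.

```python
import math
import random

rng = random.Random(0)
N = 256  # assumed vector dimension

def rand_vec():
    # Elements drawn from a normal distribution with variance 1/N.
    return [rng.gauss(0.0, 1.0 / math.sqrt(N)) for _ in range(N)]

def cconv(a, b):
    """Circular convolution: c[k] = sum_j a[j] * b[(k - j) mod N]."""
    return [sum(a[j] * b[(k - j) % N] for j in range(N)) for k in range(N)]

def involution(a):
    """Approximate inverse under circular convolution: a*[i] = a[-i mod N]."""
    return [a[(-i) % N] for i in range(N)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

fillers = {name: rand_vec() for name in ("cow", "sheep", "dog", "horse")}
role_agent, role_patient = rand_vec(), rand_vec()  # hypothetical roles

# One vector carrying two role/filler bindings, superposed by addition.
trace = add(cconv(role_agent, fillers["cow"]),
            cconv(role_patient, fillers["sheep"]))

def decode(role, trace_vector):
    """Unbind with the role, then clean up against the known fillers."""
    noisy = cconv(involution(role), trace_vector)
    return max(fillers, key=lambda name: cosine(noisy, fillers[name]))

agent = decode(role_agent, trace)      # expected to pick out "cow"
patient = decode(role_patient, trace)  # expected to pick out "sheep"
```

The decoded vector is only a noisy approximation of the original filler, so recovery in this sketch depends on the decoding operation and on a stock of candidate fillers to compare against, not on the trace vector alone.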
Now, in reply to the Elman example you say:
To show content superposition (which is what our discussion is all about) you've got to explain how the contents of the two earlier vectors are "co-present" in the third activation pattern. And for this you need to tell a systematic story about the interpretation of these activation patterns (a story about the relationship between their vectorial properties and their content).
So we should understand the "co-presence" in representation R of thing-in-the-world alpha and thing-in-the-world beta as a systematic story about the relationship between R on the one hand and alpha and beta on the other. I can do this with simple vector addition. If vectors A and B represent alpha and beta by isomorphism, and R is the simple sum of A and B, then you have a systematic account of the relationship between R and alpha, and between R and beta.
Now that won't be acceptable to you, of course. The systematic relationship I've shown is of the wrong sort, you'll say. Maybe you want to be able to recover A and B from R, which simple vector addition makes difficult (not impossible, given the right constraints on the possible As and Bs, and background machinery). If so, see the first point above - recoverability can be done. Perhaps you have a more particular systematic relationship between R and alpha and beta in mind, say isomorphism. Well, let's say that this can't be done (I'm not totally convinced that it can't be done, but for the sake of argument ...). As I said in previous posts, isomorphism isn't the only representation-making relation. So it's possible that there is another representation-making relation between alpha and beta and R, so it's possible that R is a 'semantic' superpositional activation vector representation of alpha and beta.
So given the issue isn't whether or not anyone has actually built a network that has 'semantic' superposition in activation vectors (I don't know if this is true, partly because I'm very suspicious of this 'syntactic' vs 'semantic' superposition distinction.), but whether such a network is possible, will you accept that there might be such a network?
Date: Thu, 21 May 1998
From: Gerard O'Brien
Dear Hugh,
First point: on this distinction between 'syntactic' superposition and 'semantic' superposition, you agree with me that there could be and are 'syntactically' superposed activation vector representations.
Yes. A syntactically superposed activation vector is merely an activation vector that has been generated by combining in some way two or more other activation vectors. I refer to it as syntactic because it is the kind of superposition that, for example, physicists talk about when characterising wave addition. It has got nothing to do with the representation of information.
A semantically superposed representation of a cow and a sheep must represent information about a cow and a sheep. And it must do so in such a fashion that the resources used to represent the one are coextensive with those used to represent the other. (The inverted commas around the word co-present were merely to indicate that this is an odd word.)
This surprises me. Unless I'm using a different notion of 'information' to you, this is just wrong. Tree rings carry information about tree age simply by virtue of causal relations, and any causal relation carries some information.
In one sense we have been using different notions of 'information'. Throughout most of this discussion I've been using 'information' to mean something like "representational content". (And incidentally, I don't think that representational content reduces to causal relations. But that's not all that relevant here because while information theoretic semanticists construct representation out of causal relations, they don't simply identify representation with such relations, on pain of generating pansemanticism.) But above I'm not using 'information' in this way. I'm simply pointing out that one cannot determine from a resultant vector what two vectors were syntactically superposed to form it. This I take to be obvious. (Similarly: here's a number -- 42 -- that I generated by adding two others together. Can you tell me what numbers I used?)
Now you're right that any old syntactic bumping around won't guarantee transitivity .... (Note that this gets at one of the original key points: there is no such thing as absolutely explicit representation - a representation only carries content explicitly to the extent that that information can be extracted, which relies on the broader architecture and processes of the system in question.)
The first part of this paragraph is fine. But you won't be surprised if I don't accept the part in parentheses. As I argued in an earlier posting, a PDP activation pattern is contentful in virtue of being structurally isomorphic with some feature of its target domain. Such an isomorphism obtains quite independently of the causal role this activation pattern has in some larger architecture, and hence how its representational content is used. I know you won't agree with this, and you've got people like Andy Clark, David Kirsh, and, for slightly different reasons, much of the rest of the philosophy of mind community on your side, but this is the burden that we (lonely, isolated) vehicle theorists must shoulder. But look at what our story buys you: because content is determined independently of use, you can tell a robust story about the causal efficacy of content! (I take it that this is one of the central points of Robert Cummins' latest book. [6])
So, hypothetical question: If you could ('syntactically') superpose activation vectors into a kind of activation vector from which the original vectors could be recovered, would this be 'semantic' superposition, as far as you're concerned?
I haven't looked at Eliasmith's work, but I'm perfectly willing to suppose that this is possible -- Smolensky indicated that it would be relatively easy to do this. And the answer to your question is: Yes. But notice that the representational vehicle which superpositionally encodes this information is not the activation vector. It is this vector together with the machinery (presumably some kind of PDP network) that enables the original vectors to be extracted from it. The crucial point here is that the activation vector itself doesn't code information in a superpositional fashion. This information is superpositionally encoded in this vector plus a PDP network or networks. That's the standard kind of superpositional representation one finds in PDP.
(Incidentally, Eliasmith's approach is not particularly relevant to the point that Fodor and McLaughlin are making. They accept that it might be possible to extract from one vector other vectors that carry information about constituent structure etc.-- that was the gist of Smolensky's lame reply, remember. Fodor and McLaughlin are asking whether an activation pattern itself -- a connectionist representation -- can have constituent structure. If one attempts to answer this question by pointing to other activation vectors that the system can extract from this activation pattern representation, one has misunderstood the force of their criticism.)
But I don't think I should have to come up with an example anyway - all I want is for you to acknowledge the possibility of 'semantically' superposed activation vectors.
As I've said, I don't think Eliasmith's HRRs are examples of activation patterns that represent in a superpositional fashion. But putting that aside, I'm not sure I know what you're asking me to acknowledge. I don't think PDP activation patterns represent in a superpositional fashion. I've told you why they can't, pointing to the way PDP systems compute by exploiting structural isomorphisms between their representational landscapes and their target domains. If you're asking me whether it's possible that one day a PDP network will be built whose activation patterns do represent in a superpositional fashion, my answer is that I doubt it (based on the way PDP systems compute). But who knows; as my colleague Chris Mortensen says "Anything is possible".
Date: Tue, 26 May 1998
From: Hugh Clapin
Dear Gerard,
What's becoming clear to me is that you have some strong assumptions
about the nature of connectionist computation and processing which help
explain the positions you take.
A syntactically superposed activation vector is merely an activation vector than has been generated by combining in some way two or more other activation vectors. I refer to it as syntactic because it is the kind of superposition that, for example, physicists talk about when characterising wave addition. It has got nothing to do with the representation of information.In the last sentence we differ significantly. I think exactly this sort of superposition ('syntactic') is what underlies much of what is representationally interesting about connectionist networks.
A semantically superposed representation of a cow and a sheep must represent information about a cow and a sheep. And it must do so in a such a fashion that the resources used to represent the one are coextensive with those used to represent the other.I take it that the problem with 'syntactic' superposition is typically that the former condition is breached, according to you: they don't (using a technical sense of 'represent' that I'm still unclear about) represent the contents I want them to represent.
I'm simply pointing out that one cannot determine from a resultant vector what two vectors were syntactially superposed to form it. This I take to be obvious. (Similarly: here's a number -- 42-- that I generated by adding two others together. Can you tell me what numbers I used?)Yes, if you place constraints on the candidates. One way to do a similar task would be to ask which numbers were multiplied together to get 42, restricting the candidates to primes - then you can get a unique answer. I'm sure you can do it with sums, too. The analgous situation in networks would be that there are constraints on the vectors to be superposed such that unique decomposition is possible. I'm not saying I know of such a system, of course, although one of the uses of a normal basis of a vector space is precisely to allow unique decomposition of vectors. So the general point is that if you know enough about the system in general, the right information could well be present in the representation. But this makes the content of the representation sensitive to its context and processing, which you won't like.
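The two halves of this exchange can be put side by side in a short sketch. It assumes one reading of "normal basis", namely an orthonormal set of candidate vectors, and uses four hypothetical four-unit patterns in which every unit takes part in every pattern; the names and patterns are illustrative only.

```python
def sum_pairs(total):
    """All unordered pairs of non-negative integers that add up to total."""
    return [(a, total - a) for a in range(total // 2 + 1)]

def prime_factors(n):
    """Prime factorisation by trial division; unique up to ordering."""
    factors, divisor = [], 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)
    return factors

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Assumed candidate patterns: mutually orthogonal, unit length, and fully
# distributed (each pattern uses all four units).
CANDIDATES = {
    "cow":   (0.5,  0.5,  0.5,  0.5),
    "sheep": (0.5, -0.5,  0.5, -0.5),
    "dog":   (0.5,  0.5, -0.5, -0.5),
    "horse": (0.5, -0.5, -0.5,  0.5),
}

def components(vector):
    """Names of the candidates present in a sum of distinct candidates."""
    return [name for name, pattern in CANDIDATES.items()
            if dot(vector, pattern) > 0.5]

unconstrained = sum_pairs(42)   # 22 different pairs: no unique answer
constrained = prime_factors(42) # [2, 3, 7]: a unique answer

superposed = tuple(a + b for a, b in zip(CANDIDATES["cow"], CANDIDATES["sheep"]))
recovered = components(superposed)  # ["cow", "sheep"]
```

Without constraints the sum 42 has 22 decompositions, as the question implies; once the candidates are restricted (to primes, or to an orthonormal set of patterns) the decomposition is unique. In the vector case the recovery uses knowledge of the candidate set, which is the context sensitivity at issue in the surrounding discussion.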
Are you claiming that all PDP activation patterns must represent by isomorphism? Here's an example of one that doesn't: the input vectors on Ramsey Stich and Garon's network [13] (which you and I know well [10], [3]). What structural feature of the fact of dogs having fur is isomorphic to 1100001100001111? My inference is that this is a stipulation you make: the only proper form of representation in PDP activation vectors is isomorphism. (Actually, I am very tempted by Cummins' line, which is in part that the only form of representation proper is isomorphism - I like it a lot. My complaint here would be that it seems that this assumption is necessary to make sense of your distinction between 'syntactic' and 'semantic' superposition, and thus make sense of your claim that activation vectors are never superpositional, and thus make sense of the suggestion that there is a firm distinction between explicit and inexplicit representation. And this very strong assumption, probably not shared by many people, wasn't made explicit.)
The crucial point here is that the activation vector itself doesn't code information in a superpositional fashion. This information is superpositionally encoded in this vector plus a PDP network or networks. That's the standard kind of superpositional representation one finds in PDP.
Aahaa! I'm starting to get it, now, I think. Something must be special about activation vectors such that they can represent on their own, without appealing to any sort of decoding machinery. And I take it your suggestion is that what is special is that they are isomorphic to their contents. So some vehicles are genuinely, absolutely explicit because they are isomorphic to their contents, according to you.
Of course this isn't a completely unproblematic notion of meaning
to rest your theory on. Isomorphisms are everywhere, and you get the same
pansemantic explosion as do causal theorists.
(Incidentally, Eliasmith's approach is not particularly relevant to the point that Fodor and McLaughlin are making. ...)
(Also incidentally: if you think, like I'm inclined to, that the only interesting form of constituent structure is functional constituent structure, then Fodor and McLaughlin's criticism rests on a purely incidental feature of standard linguistic representation: components are spatially separated. I think van Gelder's arguments on this front are compelling in his 'Compositionality' paper [16].)
So I now think that you think that:
Date: Tue, 26 May 1998
From: Gerard O'Brien
Dear Hugh,
I just thought I'd make a couple of final remarks by way of indicating
where things now stand in relation to our discussion.
My inference is that this is a stipulation you make: the only proper form of representation in PDP activation vectors is isomorphism.
I hope it's not a stipulation. I hope it actually falls out of the way PDP systems compute. As for the above example, the problem here is the misdescription (by Ramsey, Stich and Garon [13]) of the input representations to their network. There is an important sense in which their input patterns do not actually represent propositions such as "dogs have fur" (and nor do they need to in order for the network to do its computational work). Instead, these input patterns have a much less robust content, something along the lines of "X has property g". Once such a redescription is used, one does start to see a certain kind of isomorphism between input patterns and their representational objects.
My complaint here would be that it seems that this assumption is necessary .... And this very strong assumption, probably not shared by many people, wasn't made explicit.
Fair enough, but as I've been saying, the assumption here is no mere stipulation; it is one of the entailments of connectionism. The fact that connectionists have not explicitly embraced this assumption in their writings can be countered by noting that they have implicitly done so in their modelling practices.
Isomorphisms are everywhere, and you get the same pansemantic explosion as do causal theorists. (Incidentally, I think this is one issue that Cummins doesn't adequately address in RTA, particularly with respect to targets.)
You're right about the threat of pansemanticism here. But there are ways to limit this threat, some of which Cummins does explore (e.g., his distinctions between theories of representational content, target fixation, and application content -- p.20 of Representations, Targets and Attitudes [6]. Structural isomorphism is a theory of representational content, not a theory of representation generally). Barbara von Eckardt in her book What is Cognitive Science? [19] also has an illuminating discussion, drawn from the work of Charles Sanders Peirce, of the different aspects of representation that need to be teased apart and treated separately.
(... Fodor and McLaughlin's criticism rests on a purely incidental feature of standard linguistic representation: components are spatially separated. I think van Gelder's arguments on this front are compelling in his 'Compositionality' paper [16])
I'm tempted to respond here by saying that van Gelder's account doesn't really make any progress here because "functional constituent structure" must itself rest on "structural constituent structure" (i.e., functions are not primitive) and hence that Fodor and McLaughlin's criticisms of Smolensky could be effectively redeployed against van Gelder -- but that would be to drag in the whole debate surrounding the systematicity of thought, and who wants to do that! So I won't.
So I now think that you think that:
Yes I do think this, provided we distinguish between a theory of representation generally and a theory of representational content, as noted above (and I'm not sure where the "except in very special cases" sneaks in).
- isomorphism is the basic form of representation, and things have isomorphic contents regardless of their context, processing, etc.
- except in very special cases, one representation can't be isomorphic to two different things, and so can't represent (in this sense) two things at once across the same resources, and so can't be superpositional.
So I now think that you think that: At least now I understand! I still don't agree with you, of course:
Your suspicions are reasonable, given that a structural isomorphism story about content is not very popular at the moment. I guess my hope is that with time, and with a better understanding of the commitments of connectionism, it will be taken seriously. Robert Cummins' latest book is a sign that this is indeed happening.
- I'm suspicious that isomorphism simpliciter is insufficient for representation, and that whatever you add to isomorphism to appropriately constrain your semantic theory will turn it into some kind of process theory.
I did say, remember, that the story about superpositional connection weight representation also invokes the concept of structural isomorphism. But it is certainly true that connection weight representation, because it is a dispositional form of information coding, is in some sense dependent upon and hence secondary to activation pattern representation. So the story about the content of connection weight representation is going to be importantly different from the one about activation pattern representation. (I hope this doesn't turn out to be the tired old distinction between "original" and "derived" intentionality, though there are certainly echoes of Searle's "connection principle" here -- see his 'Consciousness, explanatory inversion, and cognitive science' [14].)
- since you buy non-isomorphic superpositional representation in the case of weight matrices, I still don't see why you rule it out by fiat for activation vectors.
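The contrast drawn above between activation vectors and superpositional weight storage can be sketched with a small linear associator. This is an assumption-laden illustration, not a model proposed by either correspondent: the Hebbian-style outer-product rule, the function names, and the particular key and output patterns are all chosen for the example. Two associations are stored across the same weights, and each is recovered only by passing its key through the weight matrix.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def store(weights, key, out):
    """Add the association key -> out to the weights (outer-product rule).

    The contribution is scaled by 1 / (key . key) so that an unnormalized
    key recalls its output pattern exactly.
    """
    scale = 1.0 / dot(key, key)
    return [
        [w + scale * o * k for w, k in zip(row, key)]
        for row, o in zip(weights, out)
    ]


def recall(weights, key):
    """Propagate a key pattern through the weights to get an output pattern."""
    return [dot(row, key) for row in weights]


# Hypothetical patterns: the two keys are orthogonal, so recall is interference-free.
key_a, out_a = [1.0, 1.0], [1.0, 0.0, 1.0]
key_b, out_b = [1.0, -1.0], [0.0, 1.0, 1.0]

weights = [[0.0, 0.0] for _ in range(3)]  # 3 output units, 2 input units
weights = store(weights, key_a, out_a)
weights = store(weights, key_b, out_b)

print(weights)                 # every row mixes contributions from both associations
print(recall(weights, key_a))
print(recall(weights, key_b))
```

The stored matrix is [[0.5, 0.5], [0.5, -0.5], [1.0, 0.0]], and no single weight belongs to one association alone. Recall returns `out_a` for `key_a` and `out_b` for `key_b` because the keys are orthogonal; with non-orthogonal keys the recalled patterns would interfere, which is the weight-matrix analogue of the decomposition problem raised earlier in the exchange.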
[2] Clapin, H. The Rivets of Thought. Doctoral dissertation, School of Philosophy, The University of New South Wales, 1995.
[3] Clapin, H. 'Connectionism Isn't Magic.' Minds and Machines 1 (2 1991): 167-184.
[4] Clark, A. Associative Engines. Cambridge, Massachusetts: MIT Press, 1993.
[5] Cummins, R. Meaning and Mental Representation. Cambridge, Massachusetts: MIT Press, 1989.
[6] Cummins, R. Representations, Targets and Attitudes. Cambridge, Massachusetts: MIT Press, 1996.
[7] Eliasmith, C. 'Structure Without Symbols: Providing a Distributed Account of Low-level and High-level Cognition.' Presented to the 1997 Southern Society for Philosophy and Psychology Conference, Atlanta. Available at http://ascc.artsci.wustl.edu/~celiasmi/publications.html.
[8] Elman, J. L. 'Grammatical Structure and Distributed Representations.' In Connectionism: Theory and Practice, ed. S. Davis. Oxford: Oxford University Press, 1992.
[9] Fodor, J. A. and B. P. McLaughlin. 'Connectionism and the Problem of Systematicity: Why Smolensky's Solution Doesn't Work.' Cognition 35 (1990): 183-204.
[10] O'Brien, G. J. 'Is Connectionism Commonsense?' Philosophical Psychology 4 (2 1991): 165-178.
[11] O'Brien, G. and J. Opie, 'A Connectionist Theory of Phenomenal Experience.' Behavioral and Brain Sciences (forthcoming).
[12] Plate, T. A. 'Holographic recurrent networks.' In Advances in Neural Information Processing Systems, Morgan Kaufmann 1993, pp. 24-41.
[13] Ramsey, W., S. P. Stich, and J. Garon. 'Connectionism, Eliminativism, and the Future of Folk Psychology.' In Philosophy and Connectionist Theory, W. Ramsey, S. P. Stich, and D. E. Rumelhart (eds). Hillsdale, NJ: Lawrence Erlbaum Associates, 1991.
[14] Searle, J. 'Consciousness, Explanatory Inversion, and Cognitive Science.' Behavioral and Brain Sciences 13 (1990): 585-642.
[15] Touretzky, D. S. and G. E. Hinton. 'A Distributed Connectionist Production System.' Cognitive Science 12 (1988): 423-66.
[16] van Gelder, T. 'Compositionality: A Connectionist Variation on a Classical Theme.' Cognitive Science 14 (1990): 355-384.
[17] van Gelder, T. 'What is the "D" in "PDP"? A Survey of the Concept of Distribution.' In Philosophy and Connectionist Theory, W. Ramsey, S. P. Stich, and D. E. Rumelhart (eds). Hillsdale, NJ: Lawrence Erlbaum Associates, 1991.
[18] van Gelder, T. 'Defining "Distributed Representation".' Connection Science 4 (3-4, 1992): 175-191.
[19] von Eckardt, B. What is Cognitive Science? Cambridge, Massachusetts: MIT Press, 1993.