Saturday 31st May 1997
Emmanuel College, The University of Queensland
Supported by the
School of Information Technology
and the
Department of Electrical and Computer Engineering
The University of Queensland
UQ has many individuals and groups that work in the areas that fall under a broad rubric of Cognitive Systems, including content areas in vision, speech, memory, hearing, robotics; and methodologies, such as neural networks or image processing. The Workshop brings together researchers and postgrads interested in cognitive science and intelligent systems.
Gordon Wyeth
Department of Electrical and Computer Engineering
wyeth@elec.uq.edu.au
Traditionally robot design has been heavily modularised, with the sensor systems, intelligence systems and actuator systems being isolated for study and research. The study of intelligence for robots was also traditionally modularised, with separate units for sensor interpretation, world representation and action planning. Often these modules are studied in isolation, leading to significant problems when attempts are made to integrate the sub-systems into a real robot.
An approach that is gaining favour is to build "the whole iguana". Many researchers now consider the study of sub-systems unhelpful to the progress of robotics research. Instead they advocate the construction of complete devices that can operate in real unstructured environments. These studies are providing useful results for robot development, but also pose new questions for the research of robot intelligence.
How to construct robot brains that integrate with perception and action? How to do this without relying on the traditional decomposition of an Artificial Intelligence solution? Researchers are seeking inspiration from biology - in particular neuroscience, ethology and psychology. Naturally a blend of approaches is used. Few neurological systems are well enough understood to be duplicated, ethological models tend to be too ill-defined to form the sole basis of a system design, and psychological models often work on mental abstractions rather than relying upon the interface with the real world.
In our group, we have been successfully building complete robots for a number of years now. These robots have, to varying degrees, applied a blend of neuroethological approaches with traditional engineering. The robots include
The challenge for the future is to find cognitive models that solve real robotics problems, but that still operate within the sensor and actuator modalities of an appropriate robot. We seek inspiration and guidance from this workshop for the development of our next generation of robot intelligence.
Cognitive Science at UQ is a broad term that covers research in several departments and an inter-disciplinary (ID) teaching program. It is characterised by the ID nature of its teaching and research, and - like most ID programs - resists categorization into the hierarchical structure of the University. For example, the Cognitive Science group in the School of Information Technology comprises three faculty members, Helen Purchase, Guy Smith and myself, and our associated postdocs and research students. However, my group spans Info Tech and Psychology, and covers both neural network and other unrelated cognitive research. Several students have been jointly enrolled in both departments, or in conjunction with other departments (such as Electrical and Computer Engineering, Commerce, and Cognitive Neurophysiology at Herston Medical Centre).
I will talk first about my group's research in neural networks, and the approach we take, then briefly mention other cognitive modeling projects, and finish with a short summary of the Cognitive Science teaching program. A diversity of projects seems to be a characteristic of many research groups in neural networks - the core issues are the properties of networks themselves, but to a greater or lesser degree, the projects taken on include applications in a broad range of domains, which can vary from neuroscience to engineering applications, cognitive modeling or financial markets.
Neural networks research is done in several departments at UQ, with large groups in Information Technology, Psychology and Electrical and Computer Engineering. My group focuses on the fundamental properties of neural networks, in particular those that have application to cognitive modeling. As a framework or language for cognitive modeling, neural networks provide functional components that differ from traditional symbolic systems, particularly those related to distributed memories, the structure of information in time, and learning. Expressing a theory as a neural network forces the modeler to pay attention to how information is represented within the model, what aspects of the environment are encoded in the training data, how the network incorporates information into its distributed internal representations through learning, and how time and structure are incorporated.
The goal of fundamental neural net research is to gain a systematic understanding of the mechanisms of neural networks, and the behaviours they are capable of. The methodology of this research includes theoretical analyses and extensive simulations. The most exciting achievement in this type of research is the discovery of an unanticipated property of neural networks - a simulation that produces an "Aha!" reaction. An example from a joint project with UCSD is the discovery that recurrent neural networks can learn and generalise context sensitive grammars. Previous to our simulation results, limitations of precision in the hidden layer of a recurrent network were assumed to also limit generalisation properties, which has implications for their use in language learning.
Open questions and current hot topics in neural networks include the modeling of temporal order in sequences, binding issues using different architectures (such as adding phase, or using spiking networks), modeling structure at all levels that allows generalisation of relational as well as featural information (systematicity and productivity questions), modeling structure in grammars for different classes of languages, and the relationship to dynamical systems.
Tools to understand neural networks, particularly hidden unit representations and dynamic systems analysis, form an ongoing part of many of the above projects.
Often a fundamental property is highlighted as part of a cognitive modeling project (as in the grammar learning case above, which was discovered during a linguistic modeling project), and collaboration is a major part of our work. In collaborative projects, our primary research questions usually focus on what computational primitives are provided by the networks, and how these are combined to produce a given behaviour. Most current modeling projects in my group are aimed at understanding the properties of neural networks in producing a given phenomenon (such as a recent study of a feedforward network that exhibited deep dyslexic phenomena; and catastrophic interference in multi-layer nets, inspired by models of the relationship between Hippocampus and Neocortex).
By contrast, in Psychology, researchers are usually more directly associated with modeling that draws directly on empirical data, i.e., where the goal is to design and simulate a neural network that deepens our understanding of a cognitive or linguistic phenomenon. Collaborative projects with colleagues in Psychology have included human memory, analogical processing, and early vision. In such models, the highest achievement is not the discovery of new properties of networks per se, but rather the qualitative or even quantitative modeling of phenomena, and the highest excitement lies in producing models that predict novel *empirical* consequences.
The goals of finding novel mechanisms vs novel empirical consequences are often orthogonal, require different research methods, and draw on different literatures. The goals are different - but equally valid. Collaboration in neural network modeling projects often involves an initial exchange of information, about the phenomena of interest on the one hand, and computational mechanisms on the other. An initial simulation of the computational mechanism is then possible, capturing some properties of the phenomena, which in turn facilitates refining the description of the phenomenon, which then spurs a further simulation stage, and so on. A cycle of simulations and critiques can result, gradually honing both an understanding of the power of a model, and the critical aspects of the phenomena that are being modeled. The cycles can take years or even decades, such as the ongoing saga of dual route and single route processes in neural network models of reading.
A workshop "Connectionist Models of Cognition" to introduce BrainWave is planned for July.
Current empirical work focuses on the game of Go, particularly expert/novice differences in memory and learning, and the nature of representations used by experts. The study of human Go players is complemented by computer-go programs, and we follow this literature very closely for the insights it provides for understanding issues in modeling human players. Our goal is to understand the complex information sources used by human players, and the way information is represented and structured in expert play. Such an understanding is also expected to give insights into data structures and processes that could enhance computer-go programs; thus this research has potential outcomes in both cognitive psychology and artificial intelligence. This project is the focus of Jay Burmeister's PhD research, and involves collaboration with Yasuki Saito and Atsushi Yoshikawa at NTT in Japan.
The teaching program for Cognitive Science offers undergraduate BA and BSc, BA Honours, and Postgraduate Diploma. It contains 4 core subjects:
Elective subjects are drawn from the contributing disciplines. The program is thus very broad, and students are encouraged to develop depth in one or more of the contributing areas.
The core subjects are also taken as electives in a wide range of degrees, including students from B.Inf Tech, B.E., B. Comm, as well as B.A. and B.Sc.
School of Psychology
University of Queensland, 4072, Australia
gsh@psy.uq.oz.au
http://www.psy.uq.edu.au/Department/Staff/gsh/
Considerable success has been achieved in modelling human cognition in neural nets, but there are some criteria that still need to be met. Typical back propagation/feed-forward nets capture the flexibility and robustness of human cognition, but lack the sensitivity to structure that characterises higher cognitive processes. Relations are the essence of structure, and our research group has approached this problem by considering how relations can be represented in neural nets. We have identified a set of properties that characterise relational knowledge and distinguish it from more basic associative knowledge. We have also demonstrated how these properties can be implemented in a neural net model. This model can be contrasted with feedforward nets, and predicts properties, including capacity limitations, that are associated with higher cognitive processes, but not with associative processes.
The main proposition of this paper is that the properties of higher
cognition can be captured by the concept of relational schema. The main
properties of relational schemas are:
Representation of relations. An n-ary relation R(a1,a2,...,an) is a subset of
the Cartesian product S1 × S2 × ... × Sn. Representation of a relation requires a
set of bindings between a relation symbol or predicate, R, and the
arguments (a1,a2,...,an). The binary relation BIGGER-THAN(whale,dolphin) is a
binding between the predicate BIGGER-THAN and the arguments "whale" and
"dolphin". Relational schemas have the following properties:
Symbolisation means that the link between the arguments of a relation is
explicitly symbolized (e.g. the link between "whale" and "dolphin" is
explicitly symbolized by the predicate LARGER-THAN). This makes a relation
accessible to other cognitive processes, so that a relational instance can
be an argument to another relation. The property of using labelled links is
shared by propositional networks, but is not characteristic of
associations, in which all links are of the same kind, and unlabelled.
Higher-order relations have relations as arguments, whereas first order
relations have objects as arguments. For example:
BECAUSE(LARGER-THAN(whale,dolphin), AVOIDS(dolphin,whale)). BECAUSE is a
higher-order relation.
Relational systematicity means that certain relations imply other
relations. For example >(a,b) → <(b,a), whereas sells(seller,buyer,object)
→ buys(buyer,seller,object). The first instance can be written as the
higher-order relation IMPLIES(>(a,b), <(b,a)).
Omni-directional access means that, given all but one of the components of
a relation, we can access (i.e. retrieve) the remaining component. For
example, given the relational schema R(a,b), any of the following can be
performed:
Operations on relations include select, project, join, add, delete, union, intersect, and difference (Phillips et al., 1995; in preparation; Codd, 1990). These operations permit information stored in relational knowledge structures to be accessed and manipulated in flexible and powerful ways.
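As a rough illustration of these operations, the sketch below treats a relation as a plain set of tuples, i.e. a subset of a Cartesian product. It is not the neural net implementation; the function names, argument conventions and the extra "herring" instance are assumptions made for the example only.

```python
# A relation is modelled as a set of tuples (a subset of a Cartesian product).
# The "herring" instance is hypothetical, added only to make the join visible.
bigger_than = {("whale", "dolphin"), ("dolphin", "herring")}


def select(relation, position, value):
    """Keep the tuples whose component at `position` equals `value`."""
    return {t for t in relation if t[position] == value}


def project(relation, positions):
    """Keep only the listed components of each tuple."""
    return {tuple(t[i] for i in positions) for t in relation}


def join(left, right, left_pos, right_pos):
    """Concatenate pairs of tuples that agree on the chosen components."""
    return {l + r for l in left for r in right if l[left_pos] == r[right_pos]}


# Omni-directional access: given the relation symbol and one argument,
# retrieve the remaining argument, whichever position it occupies.
smaller_than_whale = project(select(bigger_than, 0, "whale"), [1])
bigger_than_dolphin = project(select(bigger_than, 1, "dolphin"), [0])

# Joining the relation with itself chains two instances through "dolphin".
chain = join(bigger_than, bigger_than, 1, 0)

# Union, intersection and difference are the ordinary set operators.
merged = bigger_than | {("shark", "herring")}
```

Add and delete correspond to inserting or removing a tuple from the set.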
Neural net implementation of relational knowledge has been achieved using an extension of Smolensky's (1990) tensor product approach. Each relational instance is represented as a unique n-tuple, by representing bindings between relation symbol and arguments as outer products. Thus to represent loves(Joe,Jenny), each component (loves, Joe and Jenny) is represented as a vector, and the binding is represented as the outer product of these vectors. Other instances of loves are represented in the same way, and can be summed to form a tensor product which represents the relation loves. Thus loves(Joe,Jenny) and loves(Tom,Wendy) are represented as:
Vloves ⊗ VJoe ⊗ VJenny + Vloves ⊗ VTom ⊗ VWendy
This approach implements all the properties of relational knowledge. There is one component representing the symbol and one for each argument, so the representation of an n-ary relation has n+1 components. The components retain their identity, and the representations have the compositionality property that is missing from MLPs. This approach is better than using role-filler bindings, which result in ambiguity when relational instances are superimposed. Consider loves(Joe,Jenny) represented as "loves" plus a binding of Joe to the lover role and Jenny to the loved role, and similarly for loves(Tom,Wendy):
loves + lover.Joe + loved.Jenny + loves + lover.Tom + loved.Wendy
Here the "." symbol signifies the role-filler binding and the "+" serves to concatenate the bindings to the relation-symbol. This represents the fact that Joe and Tom are lovers and that Jenny and Wendy are loved, but it does not indicate who loves whom, and does not represent n-tuples. A neural net model of human analogical reasoning, the Structured Tensor Analogical Reasoning model, has been developed.
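The following is a minimal numerical sketch of the outer-product binding described above, not the model itself. It assumes orthonormal (one-hot) vectors for every symbol, which is the simplest case; the helper names and the two-dimensional role vectors are hypothetical.

```python
import numpy as np

# Assumption: each symbol gets its own orthonormal (one-hot) vector.
symbols = ["loves", "Joe", "Jenny", "Tom", "Wendy"]
vec = {name: np.eye(len(symbols))[i] for i, name in enumerate(symbols)}


def bind(predicate, arg1, arg2):
    """Rank-3 outer product binding a predicate to its two arguments."""
    return np.einsum("i,j,k->ijk", vec[predicate], vec[arg1], vec[arg2])


def decode(v):
    """Name of the symbol whose vector is most similar to v."""
    return max(symbols, key=lambda name: float(vec[name] @ v))


# Superimpose two instances of the relation by summing their bindings.
loves = bind("loves", "Joe", "Jenny") + bind("loves", "Tom", "Wendy")

# Omni-directional access: contract the tensor with the known components.
loved_by_joe = decode(np.einsum("ijk,i,j->k", loves, vec["loves"], vec["Joe"]))
lover_of_wendy = decode(np.einsum("ijk,i,k->j", loves, vec["loves"], vec["Wendy"]))

# Role-filler alternative (hypothetical role vectors; the shared "loves"
# term is omitted because it is identical in both cases below).
lover_role, loved_role = np.array([1.0, 0.0]), np.array([0.0, 1.0])


def role_filler(pairs):
    """Sum of role-filler outer products over all instances."""
    return sum(np.outer(lover_role, vec[a]) + np.outer(loved_role, vec[b])
               for a, b in pairs)


# Swapping the partners leaves the role-filler sum unchanged, so it cannot
# say who loves whom, whereas the tensor product codes stay distinct.
ambiguous = np.array_equal(
    role_filler([("Joe", "Jenny"), ("Tom", "Wendy")]),
    role_filler([("Joe", "Wendy"), ("Tom", "Jenny")]),
)
swapped = bind("loves", "Joe", "Wendy") + bind("loves", "Tom", "Jenny")
distinct = not np.array_equal(loves, swapped)
```

With non-orthogonal vectors the retrieved component would be a noisy blend rather than an exact match, which is outside the scope of this sketch.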
We are trying to understand the complex information processing that
underlies perceptual abilities such as depth perception. Our work touches
IT in two ways:-
1. The model systems that we study use very sophisticated
information processing to analyse stimuli. We are still trying to find out
the significance of some of the neural structures that are involved in
these analyses.
2. We make great use of IT to study the brain, in particular to analyse the
optical signals produced by stimulated brains that enable us to study the
brain's processing non-invasively.
Two brain systems will be presented:-
1. Binocular vision in the owl: A system that evolved independently of
binocular vision in mammals, thereby allowing some judgements about the
relative importance of the different architectural features shown by owl
and mammal systems for binocular vision.
2. Electroreception in platypus: The platypus can detect the absolute
distance to its underwater prey by measuring the time interval between the
arrival, through the water, of the electrical wave (instantaneous) and the
mechanical wave (up to 20 msec later for realistic distances). The neural
structure set up to measure the range of time intervals in the brain is
remarkably similar to the neural structure used to measure small
differences between the two images in the binocular visual system of owls
and primates.
Platypus:
Manger, P.M. and Pettigrew, J.D. (1994) Electroreception and the feeding
behaviour of the platypus (Ornithorhynchus anatinus: Monotremata:
Mammalia). Phil. Trans. Roy. Soc. B. 347: 359-381.
Owl:
Pettigrew, J.D. (1990) A single, most-efficient algorithm for stereopsis?
pp 283-290 In Vision: Coding and efficiency. C. Blakemore, ed. Cambridge
University Press.
Pettigrew, J.D. (1986) Evolution of Binocular Vision. In Visual
Neuroscience, eds. J.D. Pettigrew, K.J. Sanderson and W.R.Levick pp 208-22.
Cambridge University Press.
Pettigrew, J.D. and Konishi, M. (1976) Neurons selective for
orientation and binocular disparity in the visual Wulst of the barn owl
(Tyto alba). Science, 193: 675-678.
Cooperative Research Centre for Sensor Signal and Information Processing,
Department of Electrical and Computer Engineering,
The University of Queensland.
jackway@elec.uq.edu.au
http://www.cssip.elec.uq.edu.au/staff/jackway.html
CSSIP has located several major research projects at the University of Queensland including the Cytometrics project and the Ground Penetrating Radar project. CSSIP currently funds 2 full-time research fellows and supports 9 full-time and 2 part-time PhD students in the Department of Electrical and Computer Engineering. Five other departmental academic staff are members of CSSIP.
Cytometrics and its automation is a niche area of major high-technology research and development worldwide. The release of the first commercial products last year has sparked intense interest and debate worldwide. In 1993 the Pap smear project involved 2 PhD students and their supervisors. This project currently comprises 6 PhD students, a full-time research fellow (project manager), and a full-time senior research assistant and has an annual budget (including salaries + overheads) of around $100,000.
This video-microscope is used to capture computer images of cells from standard cytology slides. These computer images are then analysed to determine whether there is evidence of cancerous changes on the slide. The CSSIP Cytometrics project is researching powerful new methods for the analysis of these images and is making excellent early progress.
However, this research requires tens-of-thousands of cell images and a constant supply of new images to avoid problems of over-training and bias in the classification schemes used.
Image cytometry is a newly emerging field in which the techniques of digital image processing and analysis, pattern recognition and classification are applied to the medical fields of cytopathology and histopathology which involve the microscopic study of cells and tissues often for the diagnosis of cancerous changes.
The project has just been successful in obtaining a large grant from the Faculty for the purchase of an automated image cytometer which should greatly assist in the collection of image data from slides.
This work would be part of a smart ``search-and-capture'' algorithm to find and image suitable cells on a slide for MACs analysis. Many of the software components for our own ``search-and-capture'' algorithm have been developed although work on this has ceased since July.
We have developed a technique to modify matrix weightings used in GLCM based on training data, to give an improved set of features with higher discrimination. More recently we have developed a new method called Adaptive GLCM where ``hot-spots'' of discrimination in a Gray-Level Co-occurrence 3-space can be identified and grouped into features.
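For readers unfamiliar with GLCM, the sketch below shows the standard construction of a gray-level co-occurrence matrix and a feature formed as a weighted sum of its entries. It does not reproduce the trained weightings or Adaptive GLCM; the function names, the tiny test image and the use of the textbook "contrast" weights are assumptions for illustration.

```python
import numpy as np


def glcm(image, levels, offset=(0, 1)):
    """Normalised counts of gray-level pairs separated by a pixel offset."""
    dr, dc = offset
    rows, cols = image.shape
    counts = np.zeros((levels, levels))
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            counts[image[r, c], image[r + dr, c + dc]] += 1
    return counts / counts.sum()


def weighted_feature(p, weights):
    """A texture feature is a weighted sum over the co-occurrence matrix."""
    return float((p * weights).sum())


# Hypothetical 4-level image; each pixel is paired with its right neighbour.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
p = glcm(image, levels=4)

# Fixed textbook weighting (contrast). A trained weighting would replace
# this matrix with one chosen from labelled data.
i, j = np.indices(p.shape)
contrast = weighted_feature(p, (i - j) ** 2)
```

Swapping the fixed weight matrix for one fitted to training data is the point at which a learned weighting scheme would enter.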
These methods have yet to be fully investigated in their application to the MACs approach but we are hopeful that they may enable much more sensitive MACs detection and measurement.
This work aims to capture information from nuclear texture that is missed in statistical approaches such as GLCM, or gray-level run length approaches, or averaging approaches, such as area ratios of threshold sets.
Outcomes from this project include the Multiscale Classifier (MSC) which has been released publicly and a recent PhD thesis (A.P. Bradley).
Our emphasis here has been the creation of a software test-bed called XCyte in which to test the various features, methods and methodologies. Much time has also been spent examining improved methods for the data analysis and MACs slide classification.
This project has taught us to use the ROC curve as the basis of comparison of classifier performance (ours and our competitors), and to use cross-validated methodologies in all our studies.
Note: some neural-net-based data analysis methodologies have been used in this project.
School of Information Technology
The University of Queensland
Driven by an interest in the power of graphical (node-arc) representations for fostering both the learning and the use of the information embodied, I am involved in projects which consider these structural representations from a human-computer interaction point of view. My first project in this area concerned intelligent tutoring systems, where I investigated the use of such representations by primary school children in learning subject matter. Subsequently, I have broadened the user and task definition to adult users and more general tasks: in particular in the development of interactive fiction and graphical thesaurus systems.
In addition, in collaboration with members of the graph drawing community, I am investigating the validity of the aesthetics on which the design of automatic graph drawing algorithms is based. For many years, algorithms for drawing graphs have complied with aesthetics, on the assumption that the resultant drawings are easier to read (for example, minimising the number of edge crossings, or maximising a symmetrical view of the graph where possible). I am looking at the worth of these aesthetics from a human-computer interaction perspective through empirical testing. I have performed experiments investigating the relational (or syntactic) reading of graph drawings, and my next step is to look at the worth of these aesthetics within application domains (for example, object-oriented programming).
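To make one of these aesthetics concrete, here is a small sketch of how the number of edge crossings in a straight-line drawing might be counted. It is an illustration only, not the measure used in the experiments; the function names and the two example layouts are assumptions.

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of the turn p -> q -> r (+1 counter-clockwise, -1 clockwise)."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True when segments ab and cd cross at a point interior to both.

    Touching endpoints and collinear overlaps are not counted.
    """
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def count_crossings(positions, edges):
    """Number of crossing edge pairs in a straight-line drawing."""
    crossings = 0
    for (u, v), (w, x) in combinations(edges, 2):
        if len({u, v, w, x}) < 4:
            continue  # edges sharing a node meet there; not a crossing
        if segments_cross(positions[u], positions[v],
                          positions[w], positions[x]):
            crossings += 1
    return crossings


# The same 4-cycle drawn two ways (hypothetical node coordinates).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
square = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
bowtie = {"a": (0, 0), "b": (1, 1), "c": (1, 0), "d": (0, 1)}
```

Both layouts show the same graph, so any difference in how easily they are read comes from the drawing alone, which is what an empirical test of the aesthetic would probe.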
As well as this practical work, I am also working on a theoretical project to provide a sound definition of multimedia communication from a semiotic perspective: the original semiotic definitions were formulated in a time when the media for communication were much more limited than now, and I am attempting to extend these definitions to take into account the increasing use of advanced communication technology.
I have two PhD students, neither of whom is doing anything related to the projects mentioned above! Mark Pedersen is working on grammars and methodology for machine translation (particularly looking at translation between English and Hindi), and Peta Wyeth is considering a practical method for pre-school children to learn about technology.
Tom Downs
Department of Electrical and Computer Engineering
td@elec.uq.edu.au
The quiet 1970s were followed by a decade in which neural networks re-emerged as a discipline of interest to engineers. Three good reasons for this re-emergence were (i) John Hopfield's seminal papers of 1982 and 1984, (ii) the development of a learning algorithm (backpropagation) that essentially blew away the objections raised in Minsky and Papert's book, and (iii) the maturing of VLSI technology to the point where the implementation of reasonably large artificial neural networks (ANNs) could be contemplated.
This talk will describe ways in which engineers have made use of some of the better-known ANNs. The ANNs to be covered are the Hopfield network, the Boltzmann machine, the multilayer perceptron, radial basis function networks and self-organising feature maps. Applications will be drawn from telecommunications, speech processing, image processing, energy systems and health care.
A description of the research into ANNs that is being carried out in the Department of Electrical and Computer Engineering will also be given. This will include a discussion of what we as engineers see as important issues and the kinds of approaches we are adopting in attempting to resolve them.
Speech Technology is concerned with getting machines to "talk" and to "understand" human speech. Speech Science is concerned with building theories of how human beings accomplish these feats. Put in these blunt terms, one might expect to find a close symbiosis between Speech Science and Speech Technology. However, in my experience, the two disciplines tend to largely go their own ways. They co-exist quite happily as parallel sessions at numerous conferences that I have attended, such as Eurospeech, ICSLP, ASA, or ASSTA. There should be more cross-talk between them than there is. The study of speech has always been an area calling for interdisciplinary skills and perspectives. The best research in both areas tends to be informed by progress in the other. But in practice, cross-disciplinary collaboration is hard to achieve, especially within the university, where disciplinary constraints, the processes of peer review for grant applications etc. favor specialization.
At UQ we find speech research conducted in a range of departments that I am aware of: Speech Pathology & Audiology, Electrical Engineering, Psychology, The Department of English and probably in other departments that I am not aware of. Depending upon how sharply one distinguishes between Speech and Language research, the picture that one would have to paint of Speech research at UQ could vary greatly in complexity. I have chosen to restrict my purview to Speech, narrowly defined, though I am aware that at the leading edge of Speech Technology there has been an increasing tendency in recent years to incorporate language models or natural language processing into speech front-ends. I have nothing useful to say about how this interesting and ambitious task of wedding Speech Technology to Natural Language Processing should be accomplished.
My collaborative research efforts at UQ have been with Linguists, Speech
Pathologists and Electrical Engineers (though mainly at QUT rather than
UQ). My research students have tended to come with professional backgrounds
in Speech Pathology and Language teaching. My current research interests
center upon two quite distinct areas:
1) the effects of phonological learning upon basic processes of speech
perception.
2) problems of speaker identification for forensic purposes.
By phonological learning, I mean the effects of learning the sound pattern of one's native language. How does linguistic experience in a given language shape our basic perception of speech sounds and our capability to adapt to speech in a second language, beyond the age of first language acquisition? We have been doing cross-linguistic studies of the perception of speech sounds, for example: how Japanese and Koreans respond differentially to certain vowel and consonant contrasts in Australian English; how the perceptual strategies that listeners use for adapting to the voices of different speakers (something that any speech recognition device must be capable of) are dependent upon specific phonological learning in their native language.
I think this research has implications for the architecture of speech recognition devices, at least for the human case, and probably for machine speech recognizers as well. I'll try to give you a flavour of our research later, so that you can evaluate the plausibility of this claim.
My second area of current research interest, speaker identification for forensic purposes, has its origin in the consultancy work I do to finance my "purer" research interests. This is an area that cries out for collaboration between Phoneticians and Engineers and where I have attempted some collaborative work. Speaker recognition is not as well developed an area of research as Speech recognition. This is as true of Automatic Speech/Speaker Recognition (ASR) technology as it is of basic studies of speech and speaker perception (Speech Science).
A distinction is customarily drawn in the ASR literature between Speaker Identification and Speaker Verification. It is a case of the former when one is given the problem of identifying a speaker from among a cohort of voices. It is a Speaker Verification problem when the task is to decide whether a given voice, from an indefinitely large set of potential imposters, represents a particular speaker or not. These two problems may call for somewhat different decision criteria. Engineers tend to work on Speaker Verification systems. Phoneticians are consulted about problems of Speaker Identification. But fundamentally it is the same problem. The fact that our approaches to both tend to be very different at the present time says more about the under-developed nature of our basic understanding of the processes involved in speech and speaker recognition than anything else.
A more fundamental distinction (but still only one of method or technique) is usually drawn between text dependent and text independent methods of speaker identification/verification. Text dependent speaker identification is where you control for, or take account of, the content of the spoken message(s) upon which the speaker identification procedure operates. Text independent Speaker recognition is where the recognition algorithm does not care what the speaker is saying, and where acoustic variation in the signal associated with the message content is treated just as another source of noise. Until recently, engineers have been more attracted by the text independent methods, for two reasons: 1) text dependent speaker verification systems are too easily circumvented by the use of tape recorders etc.; 2) text dependent speaker recognition, where the content of the message is not pre-ordained, is simply too hard. It involves simultaneously solving the problem of speech and speaker recognition. As a humble phonetician, I would like to advise the engineers that this is exactly what they should be trying to do.
If we ask how much of the acoustic variability in a clean speech signal (e.g. isolated words recorded under optimal conditions) is attributable to the particular speech sounds that the speaker utters, and how much is attributable to the personal identity of the speaker (e.g. given a cohort of adult male speakers), the ratio is about 4-5:1 in favor of the speech sounds. So, text-independent methods discard the bulk of the systematic variability in the speech signal before they start work. But the greater sin, from the Phonetic perspective, is that they discard most of the interesting variation in the speech signal: how speakers vary in their pronunciation of particular sounds; their articulatory habits, which may be specific to their accent, dialect, speaking style... etc.
These thoughts, together with a brief overview of Speech research at UQ, as I see it, will be developed further in the talk.
Jennifer Hallinan
Cooperative Research Centre for Sensor Signal and Information Processing
Department of Electrical and Computer Engineering
University of Queensland, Brisbane 4072, Australia
hallinan@s5.elec.uq.edu.au
The theory of Malignancy Associated Changes (MACs) in visually normal cells from patients with a malignant tumor was first suggested in 1959. It has been suggested that malignant cells produce a "field effect", probably chemical in nature, which subtly affects the organization of DNA in the nuclei of otherwise apparently normal cells in the patient. There are currently several groups worldwide working on the detection and characterization of MACs.
While all visually normal cells from "normal" (i.e. cancer-free) patients may be assumed to be normal, not all such cells from "abnormal" patients will, in fact, be MAC-affected. The proportion of MAC-affected cells from an abnormal patient is not known a priori, and probably varies with the stage of the cancer, its rate of progression, and other factors. This means that the "MAC-affected" cells used for establishing the canonical discriminant function are not, in fact, all MAC-affected, which almost certainly reduces the accuracy of classification.
It is thus desirable to diagnose a patient on the basis of information taken from the entire population of cells present on a diagnostic slide, rather than classifying individual cells. This approach avoids the problem of diagnosis of individual cells - the presence of a subpopulation of MAC-affected cells on a slide should affect the population statistics for the entire slide, enabling it to be distinguished from a normal slide, in which all the cells are normal.
This project involves the development of a classifier for cervical cells based on slide prototypes.
The data set for this study consisted of 38,580 images of thionin-SO2-stained cervical cells from 125 patients, 69 of whom are normal and 56 of whom have been diagnosed by a cytopathologist as suffering from severe dysplasia. Eight features describing nuclear size, shape and texture were measured from each cell image.
A prototype feature vector for each slide was developed using Kohonen's Learning Vector Quantization (LVQ) algorithm. This resulted in a set of 125 prototypes, one for each slide. This dataset was subjected to linear discriminant analysis using a "leave-one-out" protocol, in which the analysis was performed on all but one slide, and the resulting discriminant equation used to compute a discriminant score for the remaining slide. The discriminant scores thus obtained were used to plot a Receiver Operating Characteristic (ROC) curve.
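The leave-one-out protocol and ROC construction described above can be sketched as follows. This is a minimal illustration only: the prototypes are made-up two-feature vectors (the study used eight features and LVQ-derived prototypes), a two-class Fisher linear discriminant stands in for the discriminant analysis, and all function names are hypothetical.

```python
from statistics import mean

def fisher_discriminant(class0, class1):
    """Return (weights, midpoint) of a two-feature Fisher linear discriminant."""
    m0 = [mean(col) for col in zip(*class0)]
    m1 = [mean(col) for col in zip(*class1)]
    # Pooled within-class scatter matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((class0, m0), (class1, m1)):
        for p in points:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # w = S^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (s[0][0] * diff[1] - s[1][0] * diff[0]) / det)
    midpoint = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    return w, midpoint

def leave_one_out_scores(prototypes, labels):
    """Score each slide with a discriminant fitted on all the other slides."""
    scores = []
    for k, held_out in enumerate(prototypes):
        rest = [(p, y) for i, (p, y) in enumerate(zip(prototypes, labels)) if i != k]
        w, mid = fisher_discriminant([p for p, y in rest if y == 0],
                                     [p for p, y in rest if y == 1])
        scores.append(w[0] * (held_out[0] - mid[0]) + w[1] * (held_out[1] - mid[1]))
    return scores

def roc_points(scores, labels):
    """(false positive rate, true positive rate) as the threshold is lowered."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical slide prototypes: label 0 = normal, label 1 = abnormal.
prototypes = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1), (0.9, 1.7),
              (2.0, 3.0), (2.2, 3.3), (1.8, 2.9), (2.1, 2.7), (1.9, 3.2)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = leave_one_out_scores(prototypes, labels)
curve = roc_points(scores, labels)
area = area_under_curve(curve)
```

Because each held-out slide never contributes to the discriminant that scores it, the resulting ROC curve is an estimate of performance on unseen slides rather than a measure of fit to the training data.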
We found that the same feature values, preprocessed in the same way, discriminate between the two classes much better using a prototype approach than they do using a standard cell-by-cell approach. It would appear that there are indeed subtle alterations in the chromatin of apparently normal cells from the vicinity of cancerous or pre-cancerous lesions. While the changes are too slight to be reliably detected visually under a microscope, they appear to be sufficiently consistent to affect the overall population of cells on a slide. These population changes affect the position of the prototypes formed by the LVQ process strongly enough for a linear discriminant analysis to discriminate between the classes with reasonable accuracy.
Andrew Mehnert
PhD Candidate
mehnert@s5.elec.uq.edu.au
Cytometrics Project, Cooperative Research Centre for
Sensor Signal and Information Processing,
Department of Electrical and Computer Engineering,
The University of Queensland.
Supervisor: Paul Jackway
For complex and aperiodic textures, such as the nuclear texture of cells on a Pap smear, both the structural and spectral approaches are inappropriate. This leaves only the statistical approaches. Methods like those based on the moments of the grey-level histogram utilise only intensity information; they ignore spatial inter-relationships. Co-occurrence matrix methods are better in the sense that they utilise both intensity and relative pixel positions. Relative spatial information, albeit local, is also characteristic of Markov random field models. Such models are based on the assumption that the intensity at a given pixel depends only on its direct neighbours; i.e. is independent of all other pixels. In this poster we propose a novel approach to cell nuclear texture description based on the region adjacency graph (RAG) constructed from a segmentation of the chromatin within the nucleus of a cell. This graph embodies not only (feature) parametric information pertaining to the segmented chromatin, but also absolute relational information. Both the construction and analysis of the graph are performed using tools and functionals of mathematical morphology.
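To make the idea of a region adjacency graph concrete, here is a minimal sketch that builds one from an already-segmented label image. It assumes 4-connectivity and a tiny hand-made label array, and it uses plain array scanning rather than the morphological tools used in the poster; the function names are hypothetical.

```python
def region_adjacency_graph(labels):
    """Map each region label to the set of region labels it touches (4-connectivity)."""
    rows, cols = len(labels), len(labels[0])
    graph = {lab: set() for row in labels for lab in row}
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            # Checking only the right and lower neighbours visits every adjacent pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and labels[rr][cc] != a:
                    b = labels[rr][cc]
                    graph[a].add(b)
                    graph[b].add(a)
    return graph

def region_areas(labels):
    """A simple per-region feature (pixel count) to attach to the graph nodes."""
    areas = {}
    for row in labels:
        for lab in row:
            areas[lab] = areas.get(lab, 0) + 1
    return areas

# Hypothetical segmentation: 0 = nuclear background, 1 and 2 = chromatin blobs.
segmentation = [
    [0, 0, 0, 0],
    [0, 1, 0, 2],
    [0, 1, 0, 2],
    [0, 0, 0, 0],
]

graph = region_adjacency_graph(segmentation)
areas = region_areas(segmentation)
```

In this example the two blobs each touch the background but not each other, so the graph records the relational structure (who neighbours whom) alongside the per-region features.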
Cytometrics Project, Cooperative Research Centre for
Sensor Signal and Information Processing,
Department of Electrical and Computer Engineering,
The University of Queensland.
The Grey Level Co-occurrence Matrix (GLCM) is a well-known statistical tool for extracting second-order texture information from images. Originally introduced by Haralick, Shanmugam and Dinstein, the GLCM measures second-order texture characteristics which play an important role in the human visual system, and has been shown to provide a similar level of classification performance. There is wide support in the literature for the proposition that the GLCM is one of the most powerful and most often used texture analysis methods.
The co-occurrence matrix is an estimate of the second-order joint PDF of grey-level pairs in an image. Features are extracted from this matrix in several ways, the most common of which involves applying a weighting function to each element of the co-occurrence matrix, and summing these weighted element values. A different weighting function is used for each feature.
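As a small illustration of the standard procedure just described, the sketch below estimates a co-occurrence matrix for one displacement and extracts a feature as a weighted sum of its elements. The 4x4 image, the displacement and the function names are assumptions chosen for the example; the weighting functions shown are the commonly used contrast and inverse difference weights.

```python
def glcm(image, dx, dy, levels):
    """Normalised co-occurrence matrix of grey-level pairs separated by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dy, c + dx
            if 0 <= rr < rows and 0 <= cc < cols:
                counts[image[r][c]][image[rr][cc]] += 1
                total += 1
    # Dividing by the number of pairs turns counts into a joint probability estimate.
    return [[n / total for n in row] for row in counts]

def weighted_feature(p, weight):
    """Apply a weighting function to every matrix element and sum the results."""
    size = len(p)
    return sum(weight(i, j) * p[i][j] for i in range(size) for j in range(size))

def contrast_weight(i, j):
    return (i - j) ** 2

def inverse_difference_weight(i, j):
    return 1.0 / (1.0 + (i - j) ** 2)

# Hypothetical 4-level test image.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

p = glcm(image, dx=1, dy=0, levels=4)  # horizontal neighbours, one pixel apart
contrast = weighted_feature(p, contrast_weight)
homogeneity = weighted_feature(p, inverse_difference_weight)
```

Each feature differs only in its weighting function, which is what makes the fixed, texture-independent nature of the standard features easy to see.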
Generally, a large number of co-occurrence features are extracted at varying spatial scales. The feature functions defined in the literature are standardised in that they are not varied based on the characteristics of texture being analysed. That is, higher-level knowledge is not used to modify these functions. It is hoped that, by extracting a large number of features at varying intersample spacing, one or more features will possess sufficient differences (in a statistical sense) between each class of texture, to allow classification of a previously unclassified texture based on these feature values alone.
This work proposes a novel method of extracting co-occurrence matrix features adaptively, called Adaptive Multi-Scale Grey Level Co-occurrence Matrix (AMSGLCM). The features extracted are adapted to suit the specific characteristics of the classes of texture to be analysed. That is, features are extracted, not via a fixed weighting function of co-occurrence matrix elements, but by a variable summation of elements in neighbourhoods of proven high discrimination. We will show that our approach has a number of important advantages over the traditional GLCM method:
(1) features extracted using Adaptive Multi-Scale GLCM (AMSGLCM) have, on average, higher discriminatory power than the standard GLCM features defined in published literature;
(2) the use of AMSGLCM features, on average, provides lower misclassification rates than the use of standard GLCM features; and
(3) for a given misclassification rate, the number of AMSGLCM features required is generally smaller than the number of standard GLCM features required.
Peter Stratton and Tom Downs
Neural Network Laboratory
Department of Electrical and Computer Engineering
University of Queensland
St. Lucia Q. 4072.
Australia.
Cortical neurons tuned to specific stimuli, such as orientation-selective cells of area V1, have been found to respond with greater vigour when the stimulus is unexpected ([1]). Other neurons have been found which become active in anticipation of a stimulus which has not yet arrived. This paper introduces a neural architecture and accompanying unsupervised learning algorithm which can account for these observed characteristics. Computations that can be performed by this architecture are suggested, and simulations show how it can be applied to problems of image completion and novelty detection.
Oram and Perrett ([1]) presented evidence which showed that about half of the neurons in monkey cortex that they tested in the visual and tactile modalities were more responsive when an input to which they were tuned was not expected than when it was. When a monkey had control of the position of a stimulus and it passed that stimulus through the receptive field of a monitored neuron, the neuron would respond only moderately. However, when the experimenter had control of the stimulus position in the same situation, the response magnitude was much greater. The difference is attributable to the fact that when the monkey had control it could anticipate the onset of the stimulus, but could not do so when control was in the hands of the experimenter. Oram and Perrett conjectured that these expectation properties are facilitated in cortical neurons by cortico-cortical feedback connections (the functions of which are only just beginning to be investigated) but they did not speculate on the exact mechanism.
This paper introduces the concept of the expectation unit as a fundamental processor in the cortex. It is more biologically inspired than strictly biologically realistic, but demonstrates that feedback connections carrying the expected stimulus in combination with a simple learning rule can generate neurons which exhibit expectation properties, in particular showing how expectation can be both excitatory and inhibitory for different cells. A consequence of these ideas is a possible explanation of the need for some biological learning systems to go through a critical learning phase during which they self-organise and after which little further learning takes place.
An application of the expectation architecture for image completion and novelty detection is also demonstrated. The applicability of the architecture to the problem of attentional focus (deciding which parts of an image are worthy of attention) is discussed in preliminary form. The discussion concludes by showing how the combination of novelty detection and the direction of attention can result in a powerful invariant image recognition system.
Expectation is viewed as a method of filtering the input stream, and as such is a pre-attentive phenomenon. It functions as a filter by just quietly passing on any input messages (stimuli) which were anticipated by higher cortical areas, but signaling when expectation is not met. When an expected stimulus is passed to higher level cortical processes, presumably those processes can decide whether the stimulus needs attending but are not compelled to do so. If, however, an unexpected stimulus arrives or an expected stimulus doesn't, expectation generates a strong mismatch signal which compels conscious attention.
Filtering of input stimuli, as facilitated by expectation, plays three important, related roles in cognition. The first and most obvious need for filtering is to remove the vast quantities of mostly redundant data which continuously assail the brain. Secondly, filtering can remove irrelevant or unhelpful details so that processing can be focused on those details which convey the most information (greater predictability implies less information content). Finally, filtering of inputs allows novelty detection; clearly once expected inputs are removed from the input stream what is left is what was unexpected and (assuming a correct and efficient learning algorithm) what must therefore be novel. This in turn can drive learning since what is perceived to be novel is what remains to be learned.
Expectation may have a role in conscious attention. Consider a layer of expectation units with weights from the layer below, say a retinal layer, tuned in such a way that the expectation units function as edge detectors. Assume that the expectation units are connected laterally with Hebbian weights, such that when two adjacent units tend to be active together the connection between them is strengthened. When scenes are presented on the retina, an expectation unit learns to excite (i.e. send expectation to) adjacent units with similar edge orientation. After training, this results in an expected continuation of an edge at its end points, which appears on the cell array as spots of activation corresponding to the ends of the edge. When end points can be reliably detected like this, the existence of a line in between can be assumed and hence becomes redundant information. The many edges of a scene (each of which, to be described, requires specification of at least a position, a direction and a length) are reduced to a handful of points with associated orientations without loss of the information contained in the image. This transformation identifies the points of the image which are salient for image recognition. If expectation functioning on simple features like edges is useful, this raises the question of how a multi-layered expectation architecture responding to complex features and even hyper-complex objects could be implemented. It is conjectured that the combination of learned association between complex features with expectation to drive attentional focus could provide the basis for a powerful transformation invariant image recognition scheme. This is a direction for future research.
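The edge-continuation idea can be illustrated with a deliberately simplified sketch. It assumes a one-dimensional row of binary units standing in for edge detectors of a single orientation, a plain Hebbian increment between adjacent co-active units, and a threshold of zero; none of these details, nor the function names, come from the paper.

```python
def train_lateral_weights(patterns, n_units, rate):
    """Strengthen the lateral link between adjacent units whenever both are active."""
    w = [[0.0] * n_units for _ in range(n_units)]
    for p in patterns:
        for i in range(n_units - 1):
            if p[i] and p[i + 1]:
                w[i][i + 1] += rate
                w[i + 1][i] += rate
    return w

def expectation(w, pattern):
    """Expectation received by each unit from the currently active units."""
    n = len(pattern)
    return [sum(w[j][i] * pattern[j] for j in range(n)) for i in range(n)]

def unmet_expectation(w, pattern, threshold=0.0):
    """Units that are expected to be active but receive no input."""
    expected = expectation(w, pattern)
    return [i for i, e in enumerate(expected) if e > threshold and not pattern[i]]

N_UNITS = 8
# Training scenes: every 3-unit line segment that fits in the row.
training = [[1 if start <= i < start + 3 else 0 for i in range(N_UNITS)]
            for start in range(N_UNITS - 2)]
weights = train_lateral_weights(training, N_UNITS, rate=0.1)

# Present a segment covering units 2..4 and find where continuation was expected.
scene = [0, 0, 1, 1, 1, 0, 0, 0]
end_points = unmet_expectation(weights, scene)
```

Here the mismatch appears only at the units just beyond each end of the segment, which is the sense in which a whole edge collapses to a pair of salient points.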
Ian Wood
PhD Candidate
Dept. of Electrical and Computer Engineering
Supervisor: Tom Downs
wood@elec.uq.edu.au
Reaching equilibrium in a Boltzmann machine is a computationally costly process, and must be done many times during training and usage.
For the equilibrium probabilities of states to be meaningfully different, a low final temperature or energy is required. Such states are often difficult to find, and so an optimisation method to lower the energy from an initial randomly chosen state is required.
Simulated annealing is a method for providing reasonable solutions to combinatorial optimisation problems, and has been routinely applied to the Boltzmann machine to find low energy states and equilibria.
It extends a technique called the Metropolis algorithm, taken from statistical physics. Near-optimal solutions are found by generating huge numbers of variations in sequence from an initial guess at a solution. A new solution is accepted if it lowers the system energy (equivalent to optimisation), but occasional increases in energy are allowed to escape from local minima. The frequency and extent to which energy increases are allowed is governed by a global temperature parameter. The innovation of simulated annealing was to translate this computational physics into optimisation and to introduce temperature schedules - functions of temperature against time which begin with a high temperature and gradually reduce it over time (annealing), forcing the solution towards lower energies.
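The acceptance rule and temperature schedule described above can be sketched on a small Travelling Salesman instance. This is a generic textbook-style sketch, not the implementation used in the poster: the city layout, the segment-reversal move, the exponential schedule parameters and all function names are assumptions.

```python
import math
import random

def tour_length(tour, cities):
    """Energy of a solution: total length of the closed tour."""
    n = len(tour)
    return sum(math.dist(cities[tour[k]], cities[tour[(k + 1) % n]]) for k in range(n))

def metropolis_accept(delta_e, temperature, rng):
    """Always accept a decrease in energy; accept an increase with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def exponential_schedule(t0, alpha, step):
    """Temperature that starts at t0 and shrinks by a factor alpha each step."""
    return t0 * alpha ** step

def anneal(cities, steps, t0, alpha, seed):
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    rng.shuffle(tour)
    energy = tour_length(tour, cities)
    best_tour, best_energy = tour[:], energy
    for step in range(steps):
        # Generate a variation of the current solution by reversing a segment.
        i, j = sorted(rng.sample(range(len(tour)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        candidate_energy = tour_length(candidate, cities)
        temperature = exponential_schedule(t0, alpha, step)
        if metropolis_accept(candidate_energy - energy, temperature, rng):
            tour, energy = candidate, candidate_energy
            if energy < best_energy:
                best_tour, best_energy = tour[:], energy
    return best_tour, best_energy

# Hypothetical instance: eight cities on the boundary of a 2 x 2 square.
CITIES = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
best_tour, best_energy = anneal(CITIES, steps=2000, t0=1.0, alpha=0.995, seed=0)
```

Note that every uphill move costs one random number and one exponential; this is the part of the computation that the microcanonical approach replaces with a comparison.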
I have attempted to borrow again from statistical physics, taking a method called Microcanonical Monte Carlo Simulation and extending it to provide an alternative to simulated annealing. The differences between standard simulated annealing and microcanonical simulated annealing are not immense: the main one is a computationally simpler acceptance function, which deals with energy increases in a different way.
The Microcanonical method involves the use of a reserve of energy, called a "demon", which exchanges energy with the system during a search for equilibrium conditions or optimisation. Increases in system energy are only permitted if the demon can "lend" the system the quantity of energy required. Conversely, if system energy decreases when a new state is adopted, the difference in energy is given to the demon. My new methods revolve around control of the demon. The demon energy can be reduced by annealing or bounding its value so as to drive the system energy lower. Variations in the way the demon limits transitions have been tried, such as randomising its value around its mean.
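A minimal sketch of demon-based acceptance follows, again on a small Travelling Salesman instance. The multiplicative decay of the demon energy is only one assumed way of "annealing" the demon; the poster's actual control schemes are not specified here, and the city layout, move type and function names are hypothetical.

```python
import math
import random

def tour_length(tour, cities):
    """Energy of a solution: total length of the closed tour."""
    n = len(tour)
    return sum(math.dist(cities[tour[k]], cities[tour[(k + 1) % n]]) for k in range(n))

def demon_accept(delta_e, demon_energy):
    """Accept a move only if the demon can pay for it; return (accepted, new demon energy).

    A decrease in system energy (negative delta_e) is always affordable and
    the energy released is handed to the demon.
    """
    if delta_e <= demon_energy:
        return True, demon_energy - delta_e
    return False, demon_energy

def demon_search(cities, steps, demon0, decay, seed):
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    rng.shuffle(tour)
    energy = tour_length(tour, cities)
    start_energy = energy
    demon = demon0
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(tour)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        candidate_energy = tour_length(candidate, cities)
        accepted, demon = demon_accept(candidate_energy - energy, demon)
        if accepted:
            tour, energy = candidate, candidate_energy
        # Assumed control scheme: drain the demon gradually to drive the system energy down.
        demon *= decay
    return tour, energy, demon, start_energy

# Hypothetical instance: eight cities on the boundary of a 2 x 2 square.
CITIES = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]

# With decay = 1.0 the demon is left alone and total energy is conserved.
_, e_fixed, d_fixed, start_fixed = demon_search(CITIES, 2000, demon0=2.0, decay=1.0, seed=0)

# With decay < 1.0 the demon is annealed.
tour, e_annealed, d_annealed, start_annealed = demon_search(
    CITIES, 2000, demon0=2.0, decay=0.995, seed=0)
```

The acceptance test is a single comparison and subtraction, with no random number or exponential, which is the computational simplification referred to in the text.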
Some of the new algorithms have been tested against standard simulated annealing, with log and negative exponential annealing schedules on 10, 20, 50, 100 and 200-city versions of the Traveling Salesman problem, in two dimensions.
Performance of the algorithms has been comparable to that of the original simulated annealing algorithm, with more computation time providing better results in most cases. Numerical results are given in the poster.
A number of improvements have been suggested for, and applied to, the standard simulated annealing algorithm, and many of these are compatible with microcanonical simulated annealing. They include many variations in the generation function or annealing schedule.
The case for use of the microcanonical method with Boltzmann machines cannot be fully stated yet, as analysis of the algorithms' performance is continuing. However, it is clear that the computational steps involved in the generation of a possible next state and its acceptance or rejection are simpler. This should improve the speed of Boltzmann machines which require the calculation of equilibrium statistics around a target energy or temperature. Also, some variations of the algorithm require fewer user-specified parameters than standard simulated annealing.
Hanna Majewski and Janet Wiles
School of Information Technology,
The University of Queensland, Queensland 4072
Australia
Visual representations differ in their capacity to encode binding information: in this paper we present a sequential binding task which requires a recurrent neural network to translate from a feature-based to a combinatorial scene-based representation. The mechanisms for binding information also depend on representational capacity. Binding information is easily carried by phase, but phase is not usually a component of neural network models. We propose a complex version of backpropagation for use with complex domain recurrent networks and assess the resources and requirements of the Simple Recurrent Network (SRN) and the Complex Domain Recurrent Network (CDRN) in simulations of the sequential binding task. Simulations demonstrate the improved performance and capacity of the CDRN.
The principal themes of my research have been the 'scaling up' of connectionist models and machine learners, mathematical analysis of neural networks and dynamical computation systems, and the role of co-evolutionary dynamics in the evolution of complexity. Pursuing these themes, I have worked on the following projects:
(1) Scaling-up RAAMs: A number of new connectionist models, including recursive auto-associative memory (RAAM), were developed in the late 1980s to address concerns about how compositional structures may be manipulated within a connectionist framework. However, these systems generally run into difficulties when they are scaled up to handle the type of deeply nested structures that arise in 'real world' applications. I therefore introduced a number of modifications to the RAAM architecture, including digital outputs, extra layers, pre-conditioned weights and initial representations, which allow it to store structures of a greater depth and complexity than previously reported.
(2) Analysis of Dynamical Recognizers:
Much work has been done in the training of neural networks to induce formal languages from examples. The dominant approach has been to train a network on a number of positive and negative strings from a regular language, and then measure the 'generalization ability' of the network or a finite state automaton (FSA) extracted from it, using the original language specification as a yardstick. However, in my view there is a subtle underlying bias to this approach because there may be a discrepancy between languages which are simple from the point of view of symbolic systems (namely the regular languages) and those which are simple for dynamical systems. This bias has partly come about because dynamical systems are harder to analyse than symbolic ones; while much is known of how to analyse a network which robustly models an FSA, little is known of how to analyse networks which have induced non-regular languages and therefore cannot be modeled exactly by an FSA. I have made a contribution towards such an analysis by developing a method for testing empirically whether the induced language is regular or not; if it is regular, an equivalent FSA is extracted, otherwise a series of non-deterministic FSAs is generated, which describe the network's behavior at successively more refined levels of detail. It is my hope that further work in this direction will provide a better understanding of the relationships between language classes appropriate to symbolic and dynamical systems, and clarify the strengths and weaknesses of different computational paradigms.
(3) Co-evolution and Backgammon:
In 1992 Gerald Tesauro at IBM developed a neural network backgammon player using temporal difference learning in a co-evolutionary self-play environment. Further development eventually led to a world master level player called TD-Gammon. Though TD-Gammon represents a major milestone in machine learning, it has not led to similarly impressive breakthroughs in other domains. In this project, we were able to develop a player comparable in performance to Tesauro's 1992 network using a simple hillclimbing algorithm, suggesting that the remarkable success of TD-Gammon had more to do with the co-evolutionary nature of the training, and the dynamics of the game of backgammon itself, than with sophistication in the learning techniques. In ongoing work we are trying to isolate the specific features of the backgammon domain which make it amenable to this kind of approach, and thus gain a better understanding of how learning may be facilitated in other domains.
Guy Smith
School of Information Technology
University of Queensland
Image texture is an intuitive concept. Every child knows that leopards have spots, but tigers have stripes; that curly hair looks different to straight hair; and the distinction between ironed clothes and crumpled clothes seems to be important. Even though texture is an intuitive concept, a definition of texture has proven difficult to formulate. Haralick, Shanmugam and Dinstein (1973) noted:
Over the years, many researchers have expressed this sentiment: Cross and Jain (1983), Bovik, Clarke and Geisler (1990), and Jain and Karu (1996).
Despite the lack of a universally agreed definition of texture, all researchers agree on two points. Firstly, there is significant variation in intensity levels between nearby pixels; that is, at the limit of resolution, there is non-homogeneity. Secondly, texture is a homogeneous property at some spatial scale larger than the resolution of the image.
Some researchers define texture by describing it in terms of the human visual system: that textures do not have uniform intensity, but are nonetheless perceived as homogeneous regions by a human observer. For example, Bovik, Clarke and Geisler (1990) write
Also, Chaudhuri, Sarkar and Kundu (1993) write
However, Faugeras and Pratt (1980) note the limitations of human perception:
A definition of texture based on human perception is useful for discussion regarding the nature of texture. However, a definition based on human acuity poses problems when used as the basis for experimental design. Any definition of texture would have to address the problem that a family of textures, generated by a parameterised method, can vary smoothly between perceptually distinct and perceptually identical pairs of textures.
There have been two main computational approaches to the definition of texture: the stochastic approach and the structural approach.
The stochastic approach considers that the intensities are generated by a two-dimensional random field. This approach is described by Faugeras and Pratt (1980)
and is also described by Cross and Jain (1983)
The stochastic approach assumes there is some spatial structure in the random field. Cross and Jain (1983) write
Also, Jain and Karu (1996) write
This spatial structure is more strongly emphasised in the structural approach to texture. In this approach, a texture is composed of a primitive pattern which is repeated throughout the texture. The relative positioning of the primitives in the pattern is determined by the "placement rule". Faugeras and Pratt (1980) describe this approach:
Cross and Jain (1983) also describe this approach:
The lack of a widely accepted formal definition of texture makes it difficult to theoretically compare texture analysis algorithms which have disparate theoretical bases. There has been no major theoretical comparison of algorithms since Conners and Harlow (1980).
Comparisons of texture analysis algorithms have been empirical in nature. Unfortunately, no standard sets of classification problems, or methodologies for comparing algorithms, have emerged in the literature.
The MeasTex project attempts to provide a methodology for comparing algorithms, a suite of standard classification problems, and a framework for developing domain-specific suites of classification problems.
In the past decade, speech recognition techniques have been applied to Chinese speech mainly for the purpose of entering Chinese characters into a computer by speech. The desired functions of a Chinese recognition system include a fast response time compared with other Chinese input techniques, adaptation to new speakers with little training, correct recognition of isolated syllables or complete sentences without user intervention, tolerance of environmental noise, and production of correct and accurate results in the presence of emotional stress and tone changes in a user's pronunciation.
Chinese differs from languages like English in that the overwhelming majority of its morphemes are monosyllabic, and its basic writing symbols, called characters, are also monosyllabic. Furthermore, it is a tonal language, with about 408 syllables if the tonal features are ignored, and about 1300 syllables if tonal features are included.
In spoken English recognition, the most widely used method for feature extraction is the linear predictive coding (LPC) based method, modified by the mel frequency scale (based on the characteristics of human perception) and expressed as a series of cepstral coefficients (similar in concept to the impulse response of the system). However, it is known that this method does not work well for noisy speech signals. I am exploring whether the LPC-based methods will work well for Mandarin Chinese speech processing.
Currently, the most popular method in speech recognition is to combine multilayer perceptron/recurrent neural networks for local frame-based processing with the hidden Markov model (HMM) for determination of individual word scores. A continuous mixture Gaussian density HMM is applied in this project. In this method, continuous mixture Gaussian densities are used to model the spectra of the feature parameters, where correlation between the feature components is captured by the structure of the covariance matrices. The combined method is used in this project on Chinese speech recognition.
Simon Dennis and Michael Humphreys
School of Psychology
Recent work on episodic recognition has been dominated by two main approaches: the Dual Processing approach, based on Jacoby's (1991) Process Dissociation Procedure, and the Global Matching approach, based on a series of processing models including SAM, TODAM, CHARM, the Matrix Model and Minerva II. We introduce the Bind Cue Decide Model of Episodic Memory (BCDMEM), a multiprocessing account in the global matching tradition, and demonstrate how it integrates these approaches, accounting for the key data from each. We conclude that subjects have the ability to employ a number of processes in making episodic recognition decisions and that the nature of the instructions, the discriminability of the encoding processes, the amount of time available and the effectiveness of context cues will affect their choice in any given experimental paradigm.
Simon Dennis
School of Psychology
The nature and role of context is one of the fundamental unanswered questions in cognitive science. Context refers to an internal summary or reduced description of a sequence of events, which can influence the processing of the current event. Context can either be persistent (for the duration of the events) as in physical context, thematic/mematic context, process-based context, list-specific context, and emotional context, or it can be dynamic (updated as new events occur) as in phonemic context, verbal context, sentential context and computational context. While progress has been made in understanding dynamic context, less is known about persistent forms of context beyond their necessity. In particular, how the cognitive system is able to learn to divide the stream of input events into episodes has yet to be addressed and is a major challenge for both symbolic and connectionist approaches to cognition.
In this talk, I will present data based on Jacoby's (1991) process dissociation procedure suggesting a distinction between list-specific and process-based contextual information. I will also present a neural network architecture that is capable of extracting persistent contextual representations based on the statistics of event sequences as a starting point for a discussion on how contextual representations might be formed.
Simon Dennis and Michael S. Humphreys
School of Psychology
In the exclusion condition of the Process Dissociation Procedure (Jacoby 1991), subjects are given two lists of items at study. At test, they are required to respond yes to items from the second list and no to items that are either new or from the first list. Jacoby has shown that strengthening an item through a levels of processing manipulation (forming an anagram versus reading a word) decreases false alarms to list one items. In our experiment, strength was manipulated by increasing the number of repetitions of the item and was found to increase false alarms to list one items. However, when the target list was list one, strength decreased false alarms to list two items. The data suggest that there are at least two different types of context representations that can be used to exclude an item. The first type of context relies on diagnostic study tasks to operate (such as forming an anagram). The second type of context is time-dependent and list-specific; and is more difficult for list one items. The implications for models of episodic recognition are discussed.
Simon Dennis and Devin McAuley
School of Psychology
Cognition in both a developmental and evolutionary sense can be seen as a progression from a system that is closely coupled to the environment to a system that is capable of forming persistent representations of stimuli and events. Recent interest in dynamical models of cognition has emphasised the importance of studying human cognition as an embodied system. We argue that this approach has tended to ignore the extent to which human cognition is disembodied. In this paper, we consider the usefulness of the dynamical approach in explaining the development of persistent representations. Even in time discrimination, which at first glance would seem a prime candidate for a completely dynamical explanation, the key role of persistent representations escapes simple explanation. In higher-level domains, such as paired-associate learning, the explanatory problems become more acute for the dynamical approach.
Simon Dennis
School of Psychology
Supervised procedures (including teacher-based and reinforcement algorithms) require information which in its current form is unlikely to be available to an autonomous system. Unsupervised systems do not require the oracular information necessary for the supervised paradigms, but have mainly been applied to the extraction of features from the input ensemble, and rarely to the generation of behaviours. This paper outlines an unsupervised paradigm by which an autonomous agent can learn to generate behaviour using information about the actions of other autonomous agents that is implicit in the environment. By closing the action/perception loop, the algorithm acquires an intentionality based on prediction error in situations when it is in the presence of other agents and when it is alone. A simple two-armed robot is used to demonstrate the principle.
Peter Bruza
Faculty of Information Technology
QUT
Simon Dennis
School of Psychology
Often queries to Internet search engines consist of one or two terms. As a consequence, the effectiveness of the retrieval suffers. This paper describes an internet search engine that helps the user formulate their query by a process of navigation through a structured, automatically constructed, information space called a hyperindex. In the first part of this paper, the logs of an internet search engine were analyzed to determine the proportions with which different types of query transformation occur. It was found that the primary transformation type was repetition of the previous query. Users also substitute, add and delete terms from a previous query and with lower frequency split compound terms, make changes to spelling, punctuation, and case and use derivative forms of words and abbreviations. The second part of the paper details the hyperindex - which aids the user in query term addition, deletion and substitution. The architecture of a hyperindex-based internet search engine is presented. Some initial practical experiences are also discussed.
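The term-level transformation types named above (repetition, addition, deletion and substitution) can be illustrated with a minimal sketch. It assumes queries are compared as unordered sets of lower-cased terms; the function name, the labels and the sample session are hypothetical and are not taken from the paper's log analysis.

```python
def classify_transformation(previous_query, current_query):
    """Label how current_query differs from previous_query at the term level.

    Assumption: a query is treated as an unordered set of lower-cased terms.
    """
    prev_terms = set(previous_query.lower().split())
    curr_terms = set(current_query.lower().split())

    if prev_terms == curr_terms:
        return "repetition"        # same terms submitted again
    if prev_terms < curr_terms:
        return "addition"          # every old term kept, at least one added
    if curr_terms < prev_terms:
        return "deletion"          # at least one term dropped, none added
    if prev_terms & curr_terms:
        return "substitution"      # some terms kept, others swapped
    return "new query"             # no terms shared with the previous query


# Hypothetical session of consecutive queries from one user.
session = [
    "neural networks",
    "neural networks",
    "neural networks robotics",
    "neural robotics",
    "cognitive robotics",
    "chess",
]
labels = [classify_transformation(a, b) for a, b in zip(session, session[1:])]
print(labels)
```

Counting these labels over a whole log would give proportions of the kind reported in the first part of the paper; spelling, case and compound-term changes would need extra rules that this sketch does not attempt.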
Editor: Simon Dennis
Noetica: A Cognitive Science Forum is the electronic journal of the
Australasian Cognitive Science Society, and can be found at:
The aim of Noetica is to promote the interests of the multi-disciplinary field of Cognitive Science. The participation of scholars from all areas of Cognitive Science is invited, including:
William H. Wilson
Computer Science & Engineering
University of New South Wales
Sydney NSW 2052 Australia
billw@cse.unsw.edu.au
Graeme S. Halford
Psychology
University of Queensland
Queensland 4072 Australia
gsh@psy.uq.edu.au
Steven Phillips
Information Science Division
Electrotechnical Laboratory
1-1-4 Umezono, Tsukuba 305 Japan
stevep@etl.go.jp
It is proposed that the distinction between basic and higher cognitive processes can be captured by the difference between associative and relational processes. Properties of relational processing include reification of the link between entities, so higher-order relations have lower-order relations as arguments, whereas an associative link per se cannot be a component of another association. Therefore relational processes can be hierarchical and recursive, whereas associative structures are flat. Relations, unlike associations, also have the properties of omni-directional access and systematicity. Relational processes support reasoning and content-independent transfer, and have many of the properties of symbolic models. Typical feedforward neural nets do not implement these properties in a natural way, but they can be implemented with tensor product nets. The requirements for neural nets to model higher cognitive processes are considered.
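As a rough illustration of omni-directional access in a tensor product representation, the sketch below stores binary relational facts as a sum of outer products of three vectors (relation symbol, first argument, second argument) and retrieves any one component from the other two. The one-hot symbol vectors, the symbol names and the function names are assumptions made for the example; this is not the authors' implementation.

```python
# Hypothetical vocabulary; each symbol gets an orthonormal (one-hot) vector.
SYMBOLS = ["larger", "loves", "dog", "cat", "mouse", "john", "mary"]
DIM = len(SYMBOLS)


def one_hot(symbol):
    return [1.0 if s == symbol else 0.0 for s in SYMBOLS]


def empty_memory():
    return [[[0.0] * DIM for _ in range(DIM)] for _ in range(DIM)]


def store(memory, relation, arg1, arg2):
    """Add the outer product relation x arg1 x arg2 to the memory tensor."""
    r, a, b = one_hot(relation), one_hot(arg1), one_hot(arg2)
    for i in range(DIM):
        for j in range(DIM):
            for k in range(DIM):
                memory[i][j][k] += r[i] * a[j] * b[k]


def query(memory, relation=None, arg1=None, arg2=None):
    """Leave exactly one slot as None; return the symbols that fill it."""
    cues = [relation, arg1, arg2]
    if cues.count(None) != 1:
        raise ValueError("leave exactly one slot empty")
    free = cues.index(None)
    vectors = [one_hot(c) if c is not None else None for c in cues]
    result = [0.0] * DIM
    for i in range(DIM):
        for j in range(DIM):
            for k in range(DIM):
                index = (i, j, k)
                weight = 1.0
                for axis in range(3):
                    if axis != free:
                        weight *= vectors[axis][index[axis]]
                # Contract the tensor with the two cue vectors.
                result[index[free]] += weight * memory[i][j][k]
    return [s for s, v in zip(SYMBOLS, result) if v > 0.5]


memory = empty_memory()
store(memory, "larger", "dog", "cat")
store(memory, "larger", "cat", "mouse")
store(memory, "loves", "john", "mary")

print(query(memory, relation="larger", arg1="dog"))   # which second argument?
print(query(memory, relation="larger", arg2="cat"))   # which first argument?
print(query(memory, arg1="john", arg2="mary"))        # which relation?
```

The same stored tensor answers all three kinds of question, which is the sense in which access is omni-directional; a feedforward net trained to map (relation, first argument) to second argument would not support the other two queries without retraining.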
Graeme S. Halford
Psychology
University of Queensland
Queensland 4072 Australia
gsh@psy.uq.edu.au
William H. Wilson
Computer Science & Engineering
University of New South Wales
Sydney NSW 2052 Australia
billw@cse.unsw.edu.au
Steven Phillips
Information Science Division
Electrotechnical Laboratory
1-1-4 Umezono, Tsukuba 305 Japan
stevep@etl.go.jp
A conceptual complexity metric based on representational rank is proposed. Rank is the number of entities that are bound into a representation, and is related to the number of dimensions, which is a measure of complexity. Each rank corresponds to a class of neural nets. The ranks, and typical concepts which belong to them, are: Rank 0, elemental association; Rank 1, content-specific representations and configural associations; Rank 2, unary relations, class membership, variable-constant bindings; Rank 3, binary relations, proportional analogies; Rank 4, ternary relations, transitivity and hierarchical classification; Rank 5, quaternary relations, proportion and the balance scale; Rank 6, quinary relations. Rank 0 can be performed by 2-layered nets, rank 1 by 3-layered nets, and ranks 2-6 by tensor products of the corresponding number of vectors. Virtually all animals perform rank 0, vertebrates perform rank 1, primates perform ranks 2-3, but ranks 4-6 are uniquely human. Rank also increases with age.
Graeme S. Halford
Psychology
University of Queensland
Queensland 4072 Australia
gsh@psy.uq.edu.au
William H. Wilson
Computer Science & Engineering
University of New South Wales
Sydney NSW 2052 Australia
billw@cse.unsw.edu.au
Brett Gray
University of Queensland
Australia
Steven Phillips
Information Science Division
Electrotechnical Laboratory
1-1-4 Umezono, Tsukuba 305 Japan
stevep@etl.go.jp
Neural net models of human analogical reasoning need to incorporate realistic limitations in capacity to process information in parallel. Because of this, the Structured Tensor Analogical Reasoning (STAR) model represents complex analogies as a hierarchy of levels, with parallel processing within any one level and serial processing between levels. The major components of the model are two constraint satisfaction networks. The focus selection network selects a relation in the base for mapping to the target. These relations are loaded into a mapping network which finds the best mapping. The most complex structure mapped in any one step is a quaternary relation, consistent with human capacity limitations. However, the mapping is constrained by a set of parallel-acting constraints, including consistency with previous mappings, salience of mapped elements, element and relational similarity, and structural correspondence.
Jay Burmeister
Supervisor: Janet Wiles
Games such as chess have long been utilised as research domains in AI and cognitive psychology because they can be formally specified and provide non-trivial domains without all the problems associated with real world complexity. In AI, chess has primarily been used to study tree-search, leading to the development and/or the refinement of many search techniques such as minimax and alpha-beta pruning. In cognitive psychology, chess has been used as a means to study perception, pattern recognition, memory, encoding, and problem solving. Chess has also been used to: develop theories about the architecture of the human cognitive system; as a domain for cognitive modelling; as an empirical domain to study chunking; to study the nature of expertise in general and chess expertise in particular; and to contribute to an understanding of chess playing itself.
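For readers unfamiliar with the search techniques mentioned, the following is a minimal sketch of minimax with alpha-beta pruning on a toy game tree. The tree, its leaf values and the function name are invented for illustration; real chess programs add move generation, an evaluation function and many refinements that are not shown.

```python
def alphabeta(node, alpha, beta, maximizing, visited):
    """Return the minimax value of node, skipping branches that cannot matter.

    A node is either a number (a leaf evaluation) or a list of child nodes.
    Leaves that are actually evaluated are appended to visited.
    """
    if isinstance(node, (int, float)):
        visited.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimising opponent would never allow this line
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximising player already has a better option
    return value


# Toy two-ply tree: the root maximises, its three children minimise.
tree = [[3, 5], [2, 9], [0, 1]]
visited = []
best = alphabeta(tree, float("-inf"), float("inf"), True, visited)
print(best, visited)
```

In this toy tree the leaves 9 and 1 are never evaluated, because the first leaf of each later branch already shows that the branch is worse than the first one.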
Results from psychological research into chess have shown that chess players rely less on search than on a thorough knowledge of chess patterns and an ability to access and use them effectively. Although this influenced some early AI researchers to try to incorporate more knowledge into their chess playing systems, the performance of such systems did not keep pace with the performance of brute-force tree-search systems. Chess programs play chess well but have ceased to make any contribution to the psychological understanding of human cognitive abilities. They have also made a progressively diminishing impact on AI programming techniques as improvement to performance is now primarily achieved by speed improvements in hardware. The current state-of-the-art in brute-force chess programs, Deep Blue, beat the reigning world champion Garry Kasparov in a best-of-six series this year. However, although Deep Blue uses AI techniques, the Deep Blue project is concerned more with parallel computing power than AI.
Deep Blue's victory over Kasparov provides the AI community with the opportunity to appraise what future benefits can be derived from chess research and whether AI games research efforts may return more benefits if they are expended on games other than chess. The game of Go has emerged as an appropriate successor to chess as a research domain for a variety of reasons. Go, like chess, provides a formally specified and non-trivial domain; however, Go is not amenable to brute-force AI techniques because there is no effective evaluation function available for Go programs. Current state-of-the-art Go programs play at about the level of someone who has played a few games a week for a year and has studied some introductory Go books. Typically, Go programs try to limit the number of suggested moves to explore rather than prune the search tree as a result of an evaluation. The generation of good moves to examine requires the possession and effective use of Go knowledge. Thus, machines, just like humans, must rely more heavily on knowledge than on search to play Go well.
Go is yet to be systematically studied as a domain from a cognitive psychology perspective in the way that chess has been studied. Unlike chess, the study of aspects of human knowledge within Go may provide insights which lead to improved performance in Go programs through the development of new AI techniques. Of particular interest is the structure of knowledge possessed by human players, since this may have an impact on the type of knowledge representation used in programs.
One effective means for investigating human knowledge is through a memory testing paradigm. We conducted a series of experiments on both Australian and Japanese Go players ranging from novices through to masters (Burmeister & Wiles, 1996; 1997) which tested their episodic and inferential abilities to reconstruct Go positions.
Human and Computer Go
Schools of Information Technology and Psychology
jay@it.uq.edu.au
http://www.psy.uq.edu.au
References
Burmeister, J. and Wiles, J. The use of inferential information in remembering Go positions. In H. Matsubara, editor, Proceedings of the Third Game Programming Workshop in Japan, pages 56-65, Tsukuba, September 1996. Computer Shogi Association.
Burmeister, J. and Wiles, J. Memory performance of master Go players. To appear in proceedings of IJCAI workshop Using Games as an Experimental Testbed for AI Research, 1997.
Emma E. Bell and Helen J. Chenery
Department of Speech Pathology and Audiology
The University of Queensland Brisbane, Australia, 4072
Anthony J. Baglioni, Jr
Social Sciences Group
The University of Queensland Brisbane, Australia, 4072
Recently, it has been proposed that degradation of certain items within semantic memory may contribute to hyperpriming in patients with Alzheimer's dementia (AD), although this theory has yet to be substantiated by empirical evidence. This paper presents a methodology which was devised to test this semantic degradation hypothesis by replicating, in normal subjects, the effects of the semantic degradation seen in AD, and examining the resultant semantic priming effects for those items deliberately degraded in these subjects' semantic memories. Thirty-nine undergraduate university students completed the experiment. One group learnt new vocabulary items, in the form of pronounceable nonwords, which were given complete semantic descriptions resulting in reasonably intact semantic representations of these items. Another group of subjects learnt the same set of nonwords which had been given incomplete semantic descriptions resulting in degraded representations for these items. The results of a semantic priming experiment revealed no significant differences in the priming of degraded and intact nonwords, along with an unexpected finding of faster naming latencies for targets following neutral primes. This curious result and the validity of using nonword learning studies of normal subjects to investigate semantic hyperpriming in the AD population are discussed.
Helen J. Chenery
Department of Speech and Hearing
University of Queensland, 4072
h.chenery@mailbox.uq.edu.au
John C. L. Ingram
Department of English
The University of Queensland, 4072
Bruce E. Murdoch
Department of Speech and Hearing
University of Queensland 4072
The present study investigated how a dementing illness such as Alzheimer's disease might affect a person's recourse to higher order contextual information in the access and integration of lexical material in on-line discourse comprehension. More specifically, the experiment investigated the priming of homophones in a discourse context, using a cross modal lexical decision task, and compared the performances of a group of six subjects with mild to moderate dementia of the Alzheimer's type (DAT) with those of a matched control group. The subjects listened to two-sentence paragraphs and performed a lexical decision on visually presented targets that followed ambiguous prime words (or homophones) at two inter-stimulus intervals (ISIs): 330 and 1000 msec. When the target was a word, it was either an associate of the prime word, a probable inference suggested by the discourse, or an unrelated word. The control subjects primed both the discourse-appropriate and inappropriate associates of the homophone at the short (330 msec) ISI (but not an appropriate inference word), a finding which supports the exhaustive access model of ambiguity resolution. As the ISI was lengthened to 1000 msec, however, the discourse-appropriate inference word was primed, reflecting the operation of attention-dependent integrative strategies. The subjects with DAT primed both appropriate associates and inference words at the short ISI. At an ISI of 1000 msec, the DAT subjects primed the appropriate associate and showed substantial inhibition priming of the inappropriate associate. These results point to disturbances in the selective automatic activation of lexical material, and in the conscious integration and elaboration of lexical material in ongoing discourse comprehension in persons with DAT.
Michael Norris
School of Information Technology
michaeln@it.uq.edu.au
Auditory scene analysis describes how a human listener separates a sound into events that can be matched to apparent sources.
Many psychological studies of auditory scene analysis have been carried out, with a focus on auditory streaming phenomena - locating the experimental conditions where a sequence of sounds changes from being perceived as coming from a single source to being perceived as coming from two different sources (Bregman 1990; Warren 1993).
In a typical auditory streaming experiment, subjects are presented with a repeating sequence of brief tones, alternating between high and low frequencies, and asked whether they heard a single sequence of tones, or two sequences, one of high tones and the other of low tones, presented simultaneously. Above a certain combination of presentation rate and difference in frequency of tones, subjects hear two separate streams.
Bregman (1990, p.397), a prominent researcher in the field, has proposed that there are two distinct processes involved in auditory perception - a "primitive" analysis process, unaffected by familiarity or expectation, that decomposes a sound into parallel streams, and a "schema-driven" analysis process involving memory and conscious attention that performs a top-down analysis of streams into conscious percepts. Auditory streaming experiments for which results do not appear to be affected by a subject's prior knowledge are seen as evidence for Bregman's theory.
Wang's (1996) Segregation Network models the stream segregation process in the auditory nerve and cortex, assuming pre-processing by some model of the cochlea and extracting just enough information to distinguish perceptual streams without taking into account any prior knowledge or learning in the listener. The Segregation Network is a relatively simple oscillatory neural network model intended to serve as a basis for a connectionist explanation of the primitive auditory streaming phenomena observed by Bregman (1990). The Segregation Network forms perceptual groupings directly from the neural architecture, rather than using a symbolic algorithm to group processed signals.
Wang (1996, p.16) claims that the Segregation Network is neurally plausible - that the components of the network architecture correspond to structures found in the brain - and justifies design decisions with reference to neurological and psychological findings.
The Segregation Network (Wang 1996) is an instance of a neural network architecture, LEGION (Locally Excitatory, Globally Inhibitory Oscillator Network), which has also been suggested as a model of aspects of visual perception (Terman & Wang 1995).
The LEGION architecture consists of a grid of oscillators with excitatory connections between neighbours, a global inhibitor that is connected to each of the other oscillators, and an external input to each oscillator. There is no output to the model. Results of simulations are measured in terms of the synchrony among groups of oscillators and desynchrony between groups.
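As a rough illustration of the building block of such a network, the sketch below integrates a single relaxation oscillator driven only by its external input. It assumes the oscillator form usually given for LEGION networks (a fast variable with a cubic nullcline and a slow variable with a sigmoidal nullcline); neighbour excitation, the global inhibitor and noise are omitted. The parameter values and function names are illustrative choices, not values from Wang (1996).

```python
import math


def simulate_oscillator(stimulus, steps=40000, dt=0.01,
                        epsilon=0.02, gamma=6.0, beta=0.1):
    """Euler-integrate one uncoupled relaxation oscillator; return the x trace.

    Assumed dynamics (coupling terms left out):
        dx/dt = 3x - x^3 + 2 - y + stimulus
        dy/dt = epsilon * (gamma * (1 + tanh(x / beta)) - y)
    """
    x, y = -2.0, 0.0
    trace = []
    for _ in range(steps):
        dx = 3.0 * x - x ** 3 + 2.0 - y + stimulus
        dy = epsilon * (gamma * (1.0 + math.tanh(x / beta)) - y)
        x += dt * dx
        y += dt * dy
        trace.append(x)
    return trace


def count_upward_crossings(trace):
    """Number of times x passes from negative to non-negative."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a < 0.0 <= b)


stimulated = simulate_oscillator(stimulus=0.8)     # positive external input
unstimulated = simulate_oscillator(stimulus=-0.5)  # no effective input

print(count_upward_crossings(stimulated), max(unstimulated))
```

With positive input the oscillator alternates between a silent and an active phase; with negative input it settles in the silent phase. In a full network, synchrony between neighbouring oscillators of this kind is what would be read as a perceptual grouping.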
[Equations 1 and 2, with definitions of their terms.]
In the Segregation Network, the one-dimensional representation of sound (a parallel set of signals, one per frequency band) is turned into a two-dimensional representation by a set of "delay lines" that smear the signal across time. The two dimensions of the oscillator grid then represent the frequency band of the incoming signal and the time delay. The incoming sound continuously scrolls across the grid of oscillators.
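The delay-line arrangement can be sketched as a small data structure, assuming one fixed-length delay line per frequency channel and a new frame of channel activities at each time step. The class name, the channel and delay counts and the input frames are hypothetical; the sketch only shows how the frequency-by-delay grid is filled, not how the oscillators respond to it.

```python
from collections import deque


class DelayLineGrid:
    """Hypothetical helper: one delay line per frequency channel."""

    def __init__(self, n_channels, n_delays):
        self.lines = [deque([0.0] * n_delays, maxlen=n_delays)
                      for _ in range(n_channels)]

    def push(self, frame):
        """Feed one time step of channel activities into the delay lines."""
        if len(frame) != len(self.lines):
            raise ValueError("frame must have one value per channel")
        for line, value in zip(self.lines, frame):
            # Newest value sits at delay 0; the oldest falls off the far end.
            line.appendleft(value)

    def grid(self):
        """Rows are frequency channels, columns are time delays."""
        return [list(line) for line in self.lines]


# Three channels, four delay taps; a tone that steps up one channel per frame.
delay_grid = DelayLineGrid(n_channels=3, n_delays=4)
for frame in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    delay_grid.push(frame)

for row in delay_grid.grid():
    print(row)
```

After three frames the rising tone appears as a diagonal in the grid, and each further frame shifts the whole pattern one column along the delay axis, which is the scrolling described above.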
Dept. of Electrical Engineering at the University of Queensland
It has long been envisaged that mobile robots will one day make our lives easier both in industry and around the home. Tasks such as cleaning, gardening and general maintenance will make up the bulk of robot use around the home. However, robots first need to become smarter in order to work in our unstructured and dynamic environment.
This paper provides an overview of the development of a novel robot intelligence system that allows the robot to hunt and gather using vision as its primary sensory system.
Most of the operations required to clean a room, for example, can be classed as hunt and gather operations. For example, picking up all the clothes off a floor, or toys if you have children, requires that the robot hunt for the objects and then gather them by performing some action. This is the same as foraging for food or weeding a garden. It is the actions performed once the objects are recognised that change for each activity. Thus a system that can perform the hunt and gather task generically can be used as a base on which to build a robot that is useful in the real world. The aim of this proposal is to develop a robot that can be used as a generic base for hunt and gather tasks.
The proposed architecture is intended to be more reliable and plastic (able to adapt within the lifetime of the robot) than competing systems. One such system is the robot CORGI, built by Gordon Wyeth ([Wyeth 94]). This robot uses a neural network trained with the supervised learning scheme called back-propagation to learn how to find tennis balls and how to avoid obstacles. However, this scheme is limited in that it cannot adapt to changes in the environment without being re-trained by the teacher. It will therefore always require a teacher to learn a new action (or to improve an already learnt action). This means it cannot learn from its own experiences without someone or something specifying what the correct response would have been.
A better alternative is a system that can not only learn by itself, but can also be taught to some extent. This can be achieved by combining unsupervised learning and reinforcement learning systems.
Figure 1. Overview of the proposed neural architecture
Figure 1 shows an overall diagram of the proposed architecture. The architecture consists of two main sections: the visual recognition section and the action-association network.
The first part is essentially the processing of the visual input to generate an output that is related to the objects within the visual field. That is, when particular objects are present in the field of view of the robot, particular areas of the output of the feature maps will become active. These areas will also be sensitive to the position of the object within the visual field, as its position may be relevant to the action taken by the robot. This approach is based upon biological research on primate visual systems ([Hubel 89], [Shah et al. 96 - I]).
The second part of the architecture, the action-association network, generates the actual behaviours the robot will perform. It is here that the internal information of the robot is processed. This is also the stage where the information from various sensors is fused; that is, the outputs of the different sensory mechanisms are combined. This is really the robot's centre of intelligence. It is here that the robot will consider what actions to perform depending upon the sensory input (in this system only the visual input and the internal states).
The next stage of development consists of developing a fully fledged robot base along with the action-association network.
The development of the VPU will take place up until Christmas 1997. Early 1998 will see the development of the mobile robot base followed by the development of the action-association network.
[Shah et al. 96 - I] Samir Shah and Martin D. Levine. Visual Information Processing in Primate Cone Pathways-Part I: A Model. IEEE Trans. on Systems, Man and Cybernetics-Part B: Cybernetics, Vol 26, No 2, April 1996. Pages 259-274.
[Wyeth 94] Gordon Wyeth. Neural Constructs for Creating Intelligent Robots: Ideas and Preliminary Results. Proc. MMVIPS, IEEE Computer Society Press, 1994.
Phillip Chan
chanpk@elec.uq.edu.au
Electrical Engineering Department
Mobile Robot Group
The University of Queensland
This paper discusses current research at the Mobile Robot Group in the area of visual navigation. We review several approaches and behaviours used by insects such as bees to navigate on their foraging trips: "Path integration or dead-reckoning", "The snapshot model" and "Image matching". Incorporating these schemes into the proposed navigation system is hoped to produce a robust and self-learning navigation system that allows the mobile agent to return to its base at the completion of its exploration. Finally, we briefly discuss the potential of the visual foraging system for real-world industrial applications.
The programming of autonomous agents to identify certain objects in their sensory space as landmarks has already been achieved (for example [Mataric], [Masaki] and [Kröse & Eecen]). All of these robots require prior knowledge or definition of their environment's structures (such as sensory readings of corridors, doorways, markings on roads, horizontal/vertical lines on walls, etc.), which act as landmarks. The drawback of this approach is that the robot is limited to using only those pre-defined landmarks for which it has been trained prior to its operation. This restricts the robots to operating only in environments predictable by the programmer.
The proposed navigation system overcomes this restriction by implementing a self-learning visual navigation system that relies not on structural information and relationships between objects in its environment, but on learning and remembering what it sees along its journey. Such an approach has been proven by evolution and is used by many foraging insects such as bees and ants.
The robot will be completely autonomous, with its CCD camera, compass, electronics and battery on board a wheel-based chassis.
There is abundant evidence for links between goal vectors and scenes, such as "Path integration or dead-reckoning" [Dyer] & [Collett], "The snapshot model" [Cartwright & Collett], "Turn back and look behaviour (TBL)" [Lehrer] and "Image matching" [Collett]. This mechanism of associating orientation information with scenes (that is, linking goal vectors to scenes) is therefore chosen as the model for the proposed navigation system.
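Path integration (dead-reckoning) can be illustrated with a minimal sketch: the agent sums its displacement over each leg of the outward trip and the home vector is the negative of that sum. The sketch assumes compass headings in degrees clockwise from north and a distance estimate for each leg; the function names and the sample trip are hypothetical and do not describe the proposed network.

```python
import math


def integrate_path(legs):
    """Sum the displacement over legs of (compass_heading_deg, distance).

    Headings are measured clockwise from north; the result is
    (east, north) relative to the starting point.
    """
    east = north = 0.0
    for heading_deg, distance in legs:
        heading = math.radians(heading_deg)
        east += distance * math.sin(heading)
        north += distance * math.cos(heading)
    return east, north


def home_vector(legs):
    """Return (compass_heading_deg, distance) that leads back to the start."""
    east, north = integrate_path(legs)
    distance = math.hypot(east, north)
    heading_deg = math.degrees(math.atan2(-east, -north)) % 360.0
    return heading_deg, distance


# Hypothetical outward trip: 3 units east, then 4 units north.
outward_trip = [(90.0, 3.0), (0.0, 4.0)]
heading_home, distance_home = home_vector(outward_trip)
print(round(heading_home, 2), round(distance_home, 2))
```

A home vector computed this way accumulates error with every leg, which is one reason to associate it with remembered scenes along the route rather than rely on it alone.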
Figure 1. The visual navigation network.
* [Masaki], "Vision-based Vehicle Guidance", Springer-Verlag New York, Inc. 1992.
* [Kröse & Eecen], "A self-organizing representation of sensor space for mobile robot navigation", Proceedings of the IEEE/RSJ/GI International Conference on Intelligent Robots and Systems (IROS'94), Sept. 1994, pp 9-14.
* [Kohonen], "Self-Organization and Associative Memory", Springer-Verlag Berlin Heidelberg, 1989.
* [Kohonen2], "The Self-Organizing Map", Proceedings of the IEEE, Vol: 78, No: 9, 1990.
* [Dyer], "Spatial Memory and Navigation by Honeybees on the Scale of the Foraging Range", The Journal of Experimental Biology 199, pp 147-154, 1996.
* [Collett], "Insect Navigation en Route to the Goal : Multiple Strategies for the use of Landmarks", The Journal of Experimental Biology, 199, pp 227-235, 1996.
* [Cartwright & Collett], "How honey bees use landmarks to guide their return to a food source", Nature Vol.295 p560-564, 1982.
* [Goodman & Fisher], "The Behaviour and Physiology of Bees", CAB International, Wallingford, UK, 1991.
* [Frisch], "Bees : Their Vision, Chemical Sense, and Language", Cornell University Press, Ithaca and London, 1971.