Acquisition of Case Prototypicalities by Supervised Machine Learning
 
 
Dan-Hee Yang*, Ik-Hwan Lee**, Mansuk Song*
http://december.yonsei.ac.kr/~dhyang, http://suny.yonsei.ac.kr/~ihlee
*Department of Computer Science, **Department of English
Yonsei University, Seoul, Korea
 
Abstract

Constructing a lexicon is essential for a proper semantic analysis of linguistic expressions. However, most related work has the following three problems. First, the results are not sufficient to define word meaning; they amount to no more than a kind of thesaurus or taxonomy. Second, componential analysis is used to represent the meaning of words. Third, supervised machine learning of linguistic knowledge suffers from the absence of large training data. To circumvent these problems, this paper proposes the concept of Case prototypicality, which serves as direct knowledge for picking out the Case of each argument in a sentence. The meaning of a word is thus represented by a set of Case prototypicalities. We also introduce complexity types of determining Case to reduce the burden of building sufficient training data. Finally, we show that this knowledge can be acquired by supervised machine learning on the basis of Case particles and collocational information from a large corpus.

Contents

Abstract
Contents
1. Introduction
2. Problems of Previous Approaches
2.1 Cognitive Reality
2.2 Building a Lexicon
3. About Word Meaning
3.1 Structure of Lexical Field
3.2 How Humans Acquire the Meaning of Words
3.3 Case Prototypicality as a Semantic Primitive
3.4 Hypothesis and Representation of Word Meaning
3.5 Complexity Types of Determining Case
4. Acquisition of Word Meaning
4.1 Building the Training Data
4.2 Machine Learning Algorithm
4.3 Result of Experiment
5. Conclusion and Future Work
Acknowledgements
References
 
1. Introduction

To build computational models of language, many computational linguists and computer scientists working in natural language processing (NLP) have adopted plausible theories from language-related disciplines such as linguistics, psycholinguistics, the philosophy of language, and cognitive science. In semantic analysis in particular, the meaning of words has been represented mainly by the results of componential analysis, following the semantic feature hypothesis, and the process of semantic analysis has depended substantially on argument structure and selectional restrictions under Chomsky's thematic role theory.

To build practical working systems in accordance with such theories, we need a lexicon containing a componential analysis of every word, the argument structure that each verb requires, and the selectional restrictions that noun phrases must satisfy to fill the required thematic roles. There are various methods of semantic lexical representation, but most of them include a taxonomic classification of concepts or words. One critical bottleneck is how to construct a complete and reasonable taxonomy for NLP. Constructing this knowledge manually belongs to the linguists' domain, whereas computer scientists and computational linguists are responsible for constructing it automatically. There have been many approaches in NLP to the automatic extraction of linguistic knowledge from large corpora and machine-readable dictionaries.

Of these three types of linguistic knowledge, this study focuses on automatically building the lexicon. When we talk about word meaning, we generally have a lexicographical meaning in mind. However, we do not communicate with each other by lexicographical meanings. Furthermore, linguists do not agree on what meaning is, and this disagreement gives rise to various types of semantics. Bunge (1974) classifies semantics into ten types, and Lyons (1981) into six. Behavioristic semantics, which is regarded as rather peculiar despite its long history, defines the meaning of an expression as the stimulus that causes it, the response that it causes, or both. Cognitive linguists hold that the meaning of a word cannot be defined in the form ''A' is defined as 'B'', but should be defined as a relationship among words, as in a semantic net. In the extreme, the meaning of a word may be regarded as knowledge about the world.

From the viewpoint of NLP, then, how should meaning be defined? Since it must exist as internal data in an NLP system, we cannot consider it apart from a lexicon. The lexicon required by a morphological analyzer is different from that required by a discourse analyzer. Therefore, we should approach the meaning of words not by asking 'What is it (e.g., love, life)?' but by asking 'To what linguistic processing is it pertinent?'. In other words, it is desirable to speak of the word meaning pertinent to morphological analysis or the word meaning pertinent to semantic analysis. No definition of word meaning can fully answer a linguistic or philosophical question such as 'Is the definition correct or complete?'. Accordingly, this study is concerned with the word meaning pertinent to picking out Cases, and uses the terms 'the meaning of words' and 'the knowledge of words' interchangeably as the context requires.

We take a careful look at the problems of existing linguistic theories and synthesize some of these theories in a way different from that adopted in almost all NLP work. Then, we extract triples [noun, Case particle, verb] from a large corpus and assign Cases to words by supervised machine learning on the basis of Case particles and collocational information. Finally, we define word meaning by a set of Case prototypicalities. In addition, we will see that Case particles play a very important role in the acquisition of word meaning by computers.

2. Problems of Previous Approaches

Disambiguating word senses and grasping Cases are the main tasks of semantic analysis in NLP. To accomplish them, much work has been done on automatically acquiring linguistic knowledge such as verb patterns, thesauri, and selectional restrictions with semantic features (Velardi 1991, Li 1996, Resnik 1994, Armstrong 1993, Hindle 1990, Pedersen 1995). For the representation of word meaning, Velardi (1991) distinguishes four types of semantics: conceptual semantics, surface semantics, technical semantics, and naive semantics. He suggests that surface semantics is more adequate for extensive codification on computers because it can be induced from word co-occurrences in texts. However, the representation appears to be similar to the result of the semantic analysis of a sentence according to Chomsky's ST-model, viewed in terms of interpretive semantics. A problem therefore arises in that building the knowledge itself requires a considerable level of semantic analysis.

One of the most influential analyses that conforms to cognitive reality is Schank's conceptual dependency (CD), which attempts to represent the meaning of action verbs with 11 primitives such as ATRANS, MOVE, and PROPEL. In his later work, primitives are used as building blocks to construct larger structures (e.g., SCRIPT, MOP, TOP) in order to capture the meaning of verbs (Schank 1977, 1982). His idea and methodology are very systematic and have had a considerable influence on the field of NLP. However, because he follows generative semantics and the lexical decomposition paradigm, his approach is open to criticism for its lack of extensibility and scalability. Pedersen (1995) attempts a system that automatically acquires the meaning of unknown nouns and verbs from corpora. He follows P. Kay's view (1971) that human lexicons are largely organized as taxonomies of concepts. The acquisition of meaning is defined as locating an existing concept node in a concept hierarchy that defines an unknown word. If there is no node for such a concept, a node is created and placed into the concept hierarchy. However, the essence of this process is merely to construct hierarchical taxonomies.

All existing methods for word classification (Hindle 1990, Pereira 1993, Grefenstette 1993) take no account of the thematic role or Case, but use only a collocational relation in the form [Subject Verb Object]. However, Jeong (1993) asserts that statistical processing of collocational information or mutual information alone can tell us that words belonging to the same category are related in some way, but never what the relationship among them is. If so, what is the inherent flaw of the existing work? In the following we try to answer this question, basing our account on cognitive reality.

2.1 Cognitive Reality

In general, cognitive reality is not a major issue for computer scientists or computational linguists. Nevertheless, we lay great stress on it because human language is a highly sophisticated intelligent system that is fundamentally different from animal cries and calls. Furthermore, human language is in itself the essence of human intelligence. Given this characteristic, we think that imitation is the best approach to establishing a computational model of human language. Currently, most, if not all, representations of word meaning are based on Clark-like semantic features. However, the resulting knowledge may not conform to cognitive reality, in that humans, even linguists, cannot naturally enumerate the semantic features of a word, although there is no doubt that they have a clear concept of the word and can speak fluently.

Piaget suggests that the ability to represent objects and events is a necessary prerequisite for the acquisition of any system of symbolic representation for knowledge and experience (Piaget 1951). Children use words not just to name objects, but to pick out the roles those objects play in whatever event is being described. The frequency with which some objects are named suggests that some roles may be more salient than others (Nelson 1974).
 

Table 1. Roles and actions represented in one-word utterances
Role or action                                     Utterance  Context
Agent                                              Dada       Hears someone come in
Action or state resulting from action              Down       When sits down or steps down from somewhere
Object affected by action                          Ban        When wants fan turned off
State of object affected by action                 Down       When shuts cabinet door
Object associated with another object or location  Poo        With hand on bottom after being changed, usually after bowel movement
Possessor                                          Lara       On seeing Lauren's empty bed
Location                                           Bap        Indicating location of feces on diaper

Nelson (1974) reports that children appear to name mostly movers (e.g., people, vehicles, animals) and movables (e.g., food, clothing, toys), with a few recipients (people). Greenfield and Smith (1976) look at the order of acquisition of these different roles. They find that the two children they observed both began by naming movers or agents, then movables or objects affected by an action, and finally places or locations, and possessors or recipients. Table 1 lists, from top to bottom, the different roles and the actions or states that one child talked about, in the order in which they emerged in his speech. The roles children pick out appear to be precursors to the semantic roles expressed in adult utterances (Clark H. and Clark E. 1977). When we ask someone what an atomic bomb is, he may answer: It can kill tens of thousands of people. It can destroy a city completely. It forced Japan to surrender in the Second World War. He is not expected to describe its structure, its color, or its hierarchical analysis into semantic features. Nevertheless, he believes that he knows the word, and he can use it without any problem in real life. The next question to consider is whether building a lexicon by computer is as feasible as expected.

2.2 Building a Lexicon

In practice, componential analysis and lexical decomposition are problematic whether they are done manually or automatically. Above all, it is extremely difficult, and perhaps impossible in principle, to find a suitable, linguistically universal collection of semantic primitives into which all words can be decomposed according to their necessary properties. Even simple words whose meanings seem straightforward are extremely difficult to characterize (Hirst 1987). It would be unrealistic to believe that computers can execute such a task well with current technology. The task requires not only all kinds of knowledge about the world but also the highest intelligence. It is a very long way from the four fundamental operations of arithmetic.

The prime example of a scientific taxonomy is the classification of plants and animals based on the proposals of the Swedish botanist Linnaeus. The Thesaurus of English Words and Phrases by Peter Mark Roget is representative of linguistic classification. It took experts many years to construct these classifications. To make matters worse, a single classification, whether constructed manually or automatically, is unlikely to satisfy the ultimate needs of NLP, because a variety of classification criteria must be applied depending on context (Leem 1993). In addition, existing automatic methods leave much to be desired when it comes to defining word meaning from a corpus.

From what has been said so far, it is very important to represent word meaning in a way that computers can construct without difficulty. However, since we cannot grasp the nature of a relationship by statistical processing alone, we assert that a supervised machine learning process is indispensable. If the representation is also cognitively plausible, so much the better. In the following, we argue that Cases are the key to this problem.

3. About Word Meaning

The most extensive work on word meaning has been carried out by generative semanticists such as Lakoff (1965), McCawley (1968), and Postal (1970). Although they assume that primitive semantic elements, the atoms from which word meanings are composed, represent some kind of mental construct or concept, most of them fail to note how these concepts connect with what we talk about, that is, their role in articulating the objective significance of language (Chierchia 1990). Let us now look at the lexical field theory underlying the theories of word meaning.

3.1 Structure of Lexical Field

Trier (1934) and Porzig (1934) explain the structure of a lexical field by the constitutional relation of words (Lee 1995). As we see in Figure 1, Trier attempts to establish the semantic relation of words on the basis of the paradigmatic relation. In other words, a lexical field is represented by a hierarchical tree diagram. This is the background theory for the componential analysis of modern lexical semantics. In NLP, traditional semantic nets, one of the knowledge representations used by many researchers, follow this paradigm.

 
On the other hand, Porzig makes an effort to reveal the sense relation among words on the basis of the syntagmatic relation. He introduces the concept of encapsulation. For example (Lee 1995), English 'kick' and 'punch' are translated into French as follows:
a. kick: donner un coup de pied ('to strike with the foot')
b. punch: donner un coup de poing ('to strike with the fist')
In this case, there is an essential semantic relation between 'kick' and 'foot' and between 'punch' and 'fist', which has a collocational characteristic and lies at the root of the syntagmatic relation. Accordingly, we regard this relation as a fundamental sense relation between words. As far as the above example is concerned, he asserts that the sense of 'with the foot' has been encapsulated in the single term 'kick'. The same holds for the relation between 'with the fist' and 'punch'. We now consider the question: how can we pick up the meaning of words?

3.2 How Humans Acquire the Meaning of Words

Eve Clark's (1973) semantic feature hypothesis is based upon a definitional view of word meaning. In other words, the meaning of a word consists of a set of necessary and invariant semantic features. In her theory, children acquire the meanings of words by first acquiring the more general superordinate features; acquisition then proceeds from the more general to the more specific. The first features to be acquired are those which are perceptually salient to the child. The most primitive categories involve movement, shape, size, sound, taste, and texture (Ingram 1989). On the other hand, Nelson (1974) argues, in the functional core concept theory, that we acquire meaning as we recognize the functions of objects by some other non-linguistic means (Park 1996). The major difference is that Nelson emphasizes the role of functional semantic features such as roll, spatter, and move when children acquire the meaning of words.

Rosch et al.'s (1973) prototype theory is an approach developed to account for the representation of meaning in adult language. The proposal is that the meaning of a word is not a set of invariant features, but rather a set of features which capture family resemblance. Some objects will be most typical of the word's meaning by sharing more of the word's features than others. Certain features, then, will be more important in determining class membership than others, but none is required of all members (Ingram 1989, Taylor 1995). Bowerman (1978) suggests that children use both perceptual and functional features. One type of categorization does not necessarily replace the others over time; that is, the different kinds of classification can be used simultaneously. Therefore, one type of categorization is not necessarily more primitive than another. From these claims, Bowerman concludes that the representation of meaning as features and prototypes needs to be incorporated into a single model (Ingram 1989, Taylor 1995, Park 1996). These theories explain how humans pick up word meaning.

3.3 Case Prototypicality as a Semantic Primitive

Humans seem to have inconsistent, multi-partial knowledge about words. Furthermore, this knowledge is likely to be so direct that it requires little inference for the surface understanding of sentences. Humans instantly grasp the meaning of a word through its thematic role or Case, without componential analysis. Only when ambiguity occurs do they try to analyze the meaning more deeply. Therefore we should free ourselves not only from the hierarchical taxonomy, but also from procedural representations of knowledge designed with inference in mind. Accordingly, a non-hierarchical and non-procedural representation of word meaning is necessary.

The Case of a word is determined only within a sentence. However, we have prior knowledge about Cases which helps us to understand the various situations in which the word may be used. If we did not have such knowledge, we could never communicate with each other. If we know nothing about X, or about the Cases in which X may be used in various situations, we cannot use X except in the question 'What is X?'. However, we are able to infer the Cases by looking at its usage in various contexts. This is precisely the process by which the meaning of words is acquired. Accordingly, Nelson's functional core concept theory seems to be the more reasonable and cognitively real.

As we noted in the previous section, hierarchical knowledge cannot be expressed naturally and is extremely difficult to build. Since Porzig's field theory is based on the syntagmatic relation, it does not suffer from a hierarchical structure. We therefore take the notion of encapsulation to be represented by Case. Humans can describe a prototype of an object well, but they cannot readily state its lexicographical definition. We consider that such facts support the prototype theory. As a result, Case in itself becomes the semantic feature or semantic primitive. We do not insist that Case is the smallest unit of linguistic meaning, but we think it is the smallest unit at a high level of cognitive processing. Case is a highly abstract representation of word meaning with linguistic universality. We do not consider Case determination to be the final phase of semantic analysis. It goes without saying that an in-depth understanding requires further analysis based on other representations.

3.4 Hypothesis and Representation of Word Meaning

We consider Cases to be the functional core concept, so that Cases play the role of semantic features. We define the meaning of a word as the degree to which the word is an exemplar of each Case concept, that is, its prototypicality. Case prototypicality is used as a general term covering both a Case and its prototypicality. We regard Case prototypicalities as the directly relevant knowledge needed to pick out the thematic roles of arguments, rather than as the whole meaning of a word. In view of this, we present a new representation of word meaning based on the two hypotheses of Table 2.

As in Table 2, we define the meaning of nouns and verbs in terms of a set of Case prototypicalities. The merit of this definition is that, since both word meaning and selectional restrictions are represented by Case prototypicalities in the same way, the process of grasping the Case of a word within a sentence reduces to a remarkably plain mechanical task. Consequently, the algorithms for semantic analysis become very straightforward. We wish to emphasize that the process of acquiring a set of Case prototypicalities plays the role of componential analysis.
 

Table 2. Hypothesis and representation of word meaning
  • Hypothesis I: The meaning of a word is represented by its Case or thematic role in context at a surface or intuitive level. The Case assumes the role of a semantic primitive.
  • Hypothesis II: Selectional restrictions are represented not by binary semantic features but by probabilistic or fuzzy ones. Categories have no strict boundaries; membership is the degree of similarity to the prototype.
  • Meaning representation of a noun: n = { (c_p, z) }, where n is a noun, p is a Case particle, c is the Case which may be assigned to the noun when it co-occurs with the Case particle p, and z is a prototypicality.
  • Meaning representation of a verb: v = { (c_p, z) }, where v is a verb, p is a Case particle, c is the Case which the verb v may require when it co-occurs with the Case particle p, and z is a prototypicality.
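
The representation in Table 2 can be pictured as a small data structure. The following Python fragment is only an illustrative sketch: the class name `WordMeaning`, its fields, and the helper `best_case` are hypothetical and not part of the original work, and the numbers are the prototypicalities reported for kang 'river' and kata 'go' in Section 4.3.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Key: (Case particle p, Case c); value: prototypicality z.
# Together they correspond to one pair (c_p, z) of Table 2.
CaseKey = Tuple[str, str]


@dataclass
class WordMeaning:
    """Hypothetical container for a word's set of Case prototypicalities."""
    word: str
    protos: Dict[CaseKey, float] = field(default_factory=dict)

    def best_case(self, particle: str) -> str:
        """Return the Case with the highest prototypicality for a particle."""
        candidates = {c: z for (p, c), z in self.protos.items() if p == particle}
        return max(candidates, key=candidates.get)


# Values taken from the experimental results reported in Section 4.3.
kang = WordMeaning("kang", {
    ("-ulo", "GOAL"): 0.327,
    ("-ulo", "INST"): 0.001,
    ("-ulo", "ATTR"): -0.173,
})
kata = WordMeaning("kata", {
    ("-ulo", "GOAL"): 0.783,
    ("-ulo", "INST"): 0.377,
    ("-ulo", "ATTR"): 0.183,
})

print(kang.best_case("-ulo"), kata.best_case("-ulo"))
```

Because nouns and verbs share the same shape, a single container is assumed to serve both; how the two sets are combined when analyzing a sentence is left open here.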

Case particles follow a noun and determine its role in a sentence. They are similar to prepositions in English, and are therefore sometimes called postpositions. Their main use is to manifest the grammatical relations of words within a sentence. Case particles are classified into seven major types. For example, a nominative particle follows a subject and an objective particle follows an object. Adverbial particles are used in a variety of ways depending upon the preceding nouns and the predicates. There are about 30 adverbial particles, and two particles may be combined into one.
 

Table 3. N : M relationship between Case particle and deep Case
Case particle                                     Deep Cases for which it can be used
-ulo 'to, as, into, for, toward, of, from, etc.'  orientation, path, goal, attribute, qualification, purpose, instrument, material, cause
-ey 'at, to, for, by, etc.'                       location, patient, cause, instrument, agent, benefactive

As we see in Table 3, a Case particle in Korean manifests surface Case in an N : M relationship, so that one Case particle can represent several (deep) Cases and vice versa. Because Case is marked by particles, the order of words in a Korean sentence is not important except in special situations. This study focuses on the Case particle '-ulo', which is one of the most complex particles in Korean. To simplify the experiment, we group the Cases which the particle '-ulo' can Case-mark into three types by similarity: GOAL for orientation, path, and goal; INST for instrument, material, and cause; and ATTR for qualification, attribute, and purpose. Note that 'Case' means deep Case throughout, except in the term 'Case particle'.
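
The grouping of the nine deep Cases of '-ulo' into three types can be written down as a simple lookup table. This is a minimal sketch; the names `COARSE_CASE` and `coarse_case` are hypothetical.

```python
# Deep Cases that '-ulo' can mark (Table 3), grouped into the three
# coarse types used in the experiment.
COARSE_CASE = {
    "orientation": "GOAL", "path": "GOAL", "goal": "GOAL",
    "instrument": "INST", "material": "INST", "cause": "INST",
    "qualification": "ATTR", "attribute": "ATTR", "purpose": "ATTR",
}


def coarse_case(deep_case: str) -> str:
    """Map a deep Case label to GOAL, INST, or ATTR."""
    return COARSE_CASE[deep_case.lower()]


print(coarse_case("path"), coarse_case("material"), coarse_case("purpose"))
```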

3.5 Complexity Types of Determining Case

Given a triple [noun, Case particle, verb], let [noun, Case particle] be its noun part and [Case particle, verb] its verb part. Then we can classify the type of information needed to pick out the Case of an argument as in Table 4. We can see here that Case particles in Korean play, to some extent, the role of selectional restrictions, in that they restrict the nouns which can occur before '-ulo' in relation to the verb. Figure 2 shows the two parts of information we are interested in for determining Case.

Figure 2. The information for determining Case

The expression -ulo sayngkakhata 'think as', for example, has complexity-type I: we can be sure, without looking at the noun, that a noun adequate for ATTR appears as its argument. Since the meaning of the verb is unique in this type, it is not influenced by the noun, so the verb part alone determines the thematic role of the noun. Complexity-type II is exemplified by expressions such as -ulo salta 'live on'. Mwul 'water' in mwul-lo salta 'live on water' is construed as INST, whereas sensayng 'teacher' in sensayng-ulo salta 'live as a teacher' is construed as ATTR, depending on the preceding noun. Mwul 'water' in mwul-lo pyenhata 'be turned into water' is construed as ATTR.

Related earlier studies (Yang 1997, 1998) assumed a further type in which the noun part alone determines the Case, but we exclude that type here because a more thorough investigation suggests that it does not occur in practice. When we look at tongccok-ulo 'to the east', we might suppose that tongccok 'east' can be construed as GOAL without looking at the following verb. However, we should construe it as ATTR when the verb is one such as pyenhata 'be turned into' or sayngkakhata 'think'. Complexity-type III is the case where the Case cannot be determined from the noun part and verb part alone. In pay-lo kata 'go by boat', for example, pay 'boat' in Ceycwuto-ey pay-lo kata 'go by boat to Ceycwu island' is construed as INST, but pay 'boat' in tach-ul ollile pay-lo kata 'go to the boat to weigh anchor' is construed as GOAL. This shows that the ambiguity can be resolved by referring to the other parts of the sentence (in this case, Ceycwuto-ey 'to Ceycwu island' and tach-ul ollile 'to weigh anchor').

Table 4. The complexity types for picking out Case
Type  Definition
I     Only the verb part can determine the Case of its argument regardless of the noun part.
II    Both noun and verb parts should be considered.
III   The relation to other arguments should be considered.

As a result, for complexity-type I we can pick out the Case of an argument from the verb part alone, without considering the semantic relation between the noun and the verb. This considerably reduces the effort of preparing training data. However, it may also lower the accuracy of machine learning, because human intuition about which verbs belong to this type may be in error. This study aims to construct the word meanings needed to pick out the Case in sentences of complexity-types I and II, after making a computer learn the Case concepts by exploiting complexity-type I. Complexity-type III is outside the scope of this study.

4. Acquisition of Word Meaning by Machine Learning

4.1 Building the Training Data

Observation of a corpus shows that there is a fair number of words belonging to complexity-type I. This considerably reduces the effort of preparing training data. The key point of our approach is to collect from a corpus as many verbs belonging to complexity-type I as possible. First, we extract triples [noun, Case particle, verb] from a corpus. Second, we carefully select the verbs of complexity-type I from the triples. This step strongly influences the accuracy of the machine learning described in the following section. Finally, we manually assign Cases to them, considering the given Case particle. The result is called the training set TSET, which is a set of tuples [noun, Case particle, verb, Case]. Note that from now on we restrict our consideration to instances in which the Case particle is '-ulo'. Now, how can we implement this idea on computers? The clue to the solution lies in Case particles and collocational information, which we consider in the following.
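
The three steps above can be sketched as a small filtering routine. This is a hedged illustration, not the procedure actually used: the function `build_tset`, the dictionary `type1_verbs`, and the toy triples are hypothetical, and the particle is assumed to have been normalized to '-ulo' beforehand (covering surface variants such as '-lo'). The only labeled verb is sayngkakhata, which Section 3.5 gives as a complexity-type I verb taking ATTR.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]         # (noun, Case particle, verb)
Labeled = Tuple[str, str, str, str]   # (noun, Case particle, verb, Case)


def build_tset(triples: Iterable[Triple],
               type1_verbs: Dict[str, str],
               particle: str = "-ulo") -> List[Labeled]:
    """Keep triples whose verb was manually judged to be complexity-type I
    and attach the Case that was manually assigned to that verb."""
    tset = []
    for noun, p, verb in triples:
        if p == particle and verb in type1_verbs:
            tset.append((noun, p, verb, type1_verbs[verb]))
    return tset


# Manual judgments (assumed to be supplied by a human annotator).
type1_verbs = {"sayngkakhata": "ATTR"}

# Toy triples for illustration only.
triples = [
    ("sensayng", "-ulo", "sayngkakhata"),  # kept: type I verb
    ("mwul", "-ulo", "salta"),             # dropped: salta is type II
    ("Ceycwuto", "-ey", "kata"),           # dropped: different particle
]

print(build_tset(triples, type1_verbs))
```

Keeping the manual judgments in one dictionary keyed by verb reflects the point of complexity-type I: the annotator labels each verb once rather than each triple.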

4.2 Machine Learning Algorithm

If we let c be a Case, a verb is represented by [{ (n, f) | n is a noun which co-occurs with the verb, f is the relative frequency of the noun in TSET }, c], where f is calculated by dividing the frequency with which the noun co-occurs with the verb by the total frequency of the noun in the TSET. The set of such verbs is called the Instance Set (ISET) and is used as an input to the algorithm. Thus the nouns assume the role of attributes of verbs in learning the concept of each Case. The other input is a set of hypothesized linear threshold units (HSET), where a linear threshold unit (LTU) is an intensional concept representation of a Case. The HSET is initialized to empty sets. The goal of the algorithm is to produce the relevant HSET as its output.
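
The relative frequency f can be computed directly from the TSET. The sketch below assumes that each training verb carries exactly one Case, as complexity-type I implies; the function name `build_iset` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Labeled = Tuple[str, str, str, str]           # (noun, particle, verb, Case)
Instance = Tuple[Dict[str, float], str]       # ({noun: f}, Case)


def build_iset(tset: List[Labeled]) -> Dict[str, Instance]:
    """Represent each verb by its co-occurring nouns and their relative
    frequencies f = freq(noun, verb) / freq(noun) within the TSET."""
    pair_freq = Counter((n, v) for n, _p, v, _c in tset)
    noun_freq = Counter(n for n, _p, _v, _c in tset)
    verb_case = {v: c for _n, _p, v, c in tset}   # one Case per verb (assumed)

    attrs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (n, v), k in pair_freq.items():
        attrs[v][n] = k / noun_freq[n]
    return {v: (attrs[v], c) for v, c in verb_case.items()}


# Toy TSET for illustration only; the labels are not taken from the corpus.
tset = [
    ("kil", "-ulo", "kata", "GOAL"),
    ("kil", "-ulo", "kata", "GOAL"),
    ("kang", "-ulo", "kata", "GOAL"),
    ("kil", "-ulo", "sayonghata", "INST"),
    ("yenphil", "-ulo", "sayonghata", "INST"),
]

iset = build_iset(tset)
print(iset["kata"])
```

Here kil occurs three times in the toy TSET, twice with kata, so its attribute value for kata is 2/3.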

Much of the work on threshold concepts has been done within the 'connectionist' or 'neural network' paradigm, which typically uses a network notation to describe acquired knowledge, based on an analogy to structures in the brain. To make the concept of LTU clear in this study, it can be stated as

If $\sum_i w_i f_i > b$ then $c_k$, where $c_k \in \{\mathrm{GOAL}, \mathrm{ATTR}, \mathrm{INST}\}$, $w_i$ is a weight, $f_i$ is the attribute value of a noun $n_i$, and $b$ is a threshold.
To classify a verb, one multiplies each observed attribute's value by its weight, sums the products, and sees if the result exceeds the given threshold. In principle, an arbitrary LTU can characterize any extensional definition that can be separated by a single hyperplane drawn through the instance space, with the weights specifying the orientation of the hyperplane and the threshold giving its location along a perpendicular. For this reason, target concepts that can be represented by linear units are often referred to as being linearly separable (Langley 1996). Accordingly, since the HSET is able to function as a classifier, it can decide to which Case concepts a new verb belongs.
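
The classification step just described can be sketched in a few lines. The class `LTU` and the example weights are hypothetical, and the sketch assumes that a noun without a weight in the unit contributes nothing to the sum.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LTU:
    """A linear threshold unit for one Case (illustrative sketch)."""
    case: str
    weights: Dict[str, float] = field(default_factory=dict)
    threshold: float = 0.0

    def score(self, attrs: Dict[str, float]) -> float:
        # Weighted sum over the verb's noun attributes; nouns with no
        # weight are assumed to contribute 0.
        return sum(self.weights.get(n, 0.0) * f for n, f in attrs.items())

    def fires(self, attrs: Dict[str, float]) -> bool:
        # The unit predicts its Case when the sum exceeds the threshold b.
        return self.score(attrs) > self.threshold


# Made-up weights for illustration only.
goal = LTU("GOAL", {"kil": 0.5, "kang": 0.4}, threshold=0.3)

print(goal.fires({"kil": 0.6, "kang": 1.0}))   # weighted sum is about 0.7
print(goal.fires({"yenphil": 1.0}))            # weighted sum is 0
```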
 
Table 5. Case Learner (CL): learning the concepts of each Case
Inputs
  ISET: v_t = [{ (n, f) | n is a noun, f is the relative frequency of the noun }, c],
     where v_t is a verb and c is a Case, 1 ≤ t ≤ the total number of verbs in the TSET.
  HSET: a set of LTUs which are intensional concept representations of each Case

Output
  HSET: a revised one

Parameters
  α: a momentum term that reduces oscillation
  η: a gain term that determines the revision rate
  cf. H: If Σ w_i f_i > b then c_k. H ∈ HSET
       Δw_i(h): the current delta of a weight
       Δw_i(h-1): the previous delta of a weight

Procedure CL (ISET, HSET: input; HSET: output)
{
   for each training instance v_t in ISET
   {
      C = c of v_t
      for each H_j in HSET
      {
         C_k = Case which H_j predicts for v_t
         if C_k is same as C
         then continue;

         if C_k is negative and C is positive
         then s = 1
         else if C_k is positive and C is negative
              then s = -1;

         for each attribute n_i of v_t
         {
            f_i = the attribute's value of n_i
            if n_i exists in H_j
            then w_i = the weight for n_i in H_j
            else w_i = f_i;
            Δw_i(h) = s·η·f_i + α·Δw_i(h-1);
            w_i = w_i + Δw_i(h);
         }
         b = b + s·η
      }
   }
   return the revised HSET;
}

 
The Case Learner (CL) algorithm in Table 5 is based on the perceptron revision method (PRM) (Langley 1996), an incremental approach to inducing LTUs by gradient descent search. However, we modify it slightly to prevent oscillation: we add a proportion $\alpha$ of the previous delta $\Delta w_i(h-1)$ to the current weight $w_i$. We also use the perceptron convergence procedure (PCP) (Langley 1996) in Table 6, which induces LTUs nonincrementally by applying the CL algorithm iteratively to the ISET until it produces an HSET that makes no errors or until a specified number of iterations is exceeded. When the training data are linearly separable, this procedure is guaranteed to converge in a finite number of iterations on LTUs that make no errors on the training data.
 
Table 6. PCP for assigning Cases
Input, Output 
  cf. CL algorithm 

Procedure PCP (ISET: input; HSET: output) 
{ 
   HSET = a set of empty LTUs, one for each Case; 
   count = the maximum number of iterations; 

   while count > 0 
   { 
      for each instance vt in ISET 
      { 
         for each Hj in HSET 
            if Hj incorrectly predicts for vt 
            then go to ERROR; 
      } 
      return HSET; 
      ERROR: 
         count = count - 1; 
         HSET = CL (ISET, HSET); 
   } 
   return HSET; 
}
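The outer loop of Table 6 can be sketched as follows. This is a simplified illustration under stated assumptions: the names `predict`, `revise`, and `pcp` are hypothetical, each LTU is a plain dictionary, and the revision omits the momentum term to keep the example short. The sketch also merges the error check and the revision into one pass over the ISET; a pass that finds no errors leaves the HSET unchanged, which corresponds to the error-free exit of Table 6.

```python
def predict(ltu, features):
    """True if the LTU's weighted sum exceeds its threshold."""
    total = sum(ltu["w"].get(n, 0.0) * f for n, f in features.items())
    return total > ltu["b"]


def revise(ltu, features, positive, gain=0.1):
    """One plain perceptron revision (no momentum in this sketch)."""
    s = 1.0 if positive else -1.0
    for n, f in features.items():
        ltu["w"][n] = ltu["w"].get(n, 0.0) + s * gain * f
    ltu["b"] -= s * gain  # assumed sign convention for a "sum > b" test


def pcp(iset, cases, max_iterations=100):
    """Repeat revision passes until no LTU errs or the iteration budget runs out.

    iset  : list of (features, case) pairs, features being {noun: frequency}
    cases : iterable of Case labels, one LTU per Case
    Returns (hset, converged).
    """
    hset = {c: {"w": {}, "b": 0.0} for c in cases}  # empty LTUs
    for _ in range(max_iterations):
        errors = 0
        for features, case in iset:
            for c, ltu in hset.items():
                positive = (c == case)
                if predict(ltu, features) != positive:
                    revise(ltu, features, positive)
                    errors += 1
        if errors == 0:
            return hset, True
    return hset, False


# Toy training data (illustrative only, not taken from the corpus).
toy_iset = [({"kang": 1.0}, "GOAL"), ({"yenphil": 1.0}, "INST")]
toy_hset, toy_converged = pcp(toy_iset, ["GOAL", "INST"])
```

Returning the `converged` flag alongside the HSET lets the caller distinguish an error-free result from one that merely exhausted the iteration budget.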

 
4.3 Result of Experiment

The prototypicality of a verb for each Case is given by the left-hand side of the corresponding LTU, Σ wi fi, obtained by the learning process and normalized to lie between -1 and 1. The prototypicality of a noun for each Case is wk / Σ |wi|, computed over the weights of the corresponding Σ wi fi. Here wk is the weight of the noun, which we regard as its prototypicality since it reflects how well the noun discriminates among Cases. Finally, we define the meaning of a word by the assigned Cases and prototypicalities. More strictly, for each word we find the set of Cases that a verb can require or that a noun can be used for, each paired with its prototypicality.
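The two quantities above can be illustrated with a short Python sketch. The function names are hypothetical, and the weights in the example are made up for illustration. The text does not state how the verb score is normalized to the range from -1 to 1, so `scale_to_unit_range` divides by the largest absolute score, which is only one possible choice and an assumption of this sketch.

```python
def noun_prototypicality(weights, noun):
    """wk / sum(|wi|): share of the LTU's total absolute weight held by one noun."""
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        return 0.0
    return weights[noun] / total


def verb_score(weights, features):
    """Left-hand side of the LTU, sum(wi * fi), before any normalization."""
    return sum(weights.get(n, 0.0) * f for n, f in features.items())


def scale_to_unit_range(scores):
    """Assumed normalization: divide every score by the largest absolute score."""
    peak = max(abs(v) for v in scores.values())
    if peak == 0:
        return {k: 0.0 for k in scores}
    return {k: v / peak for k, v in scores.items()}


# Made-up weights of one Case's LTU (not results from the paper).
example_weights = {"kang": 0.5, "kil": 0.3, "yenphil": -0.2}
example_proto = noun_prototypicality(example_weights, "kang")
```

Because the noun value is a ratio of weights, it always lies between -1 and 1, and a negative value marks a noun that counts against the Case.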

We chose 30 training verbs according to their frequencies in the YSC-IX Corpus, built by the Korean Lexicographical Center of Yonsei University, which consists of 1.2 million words extracted from children's books. The resulting training data comprise about 1,000 triples. Table 7 shows part of the results of this experiment. The accuracy of these results is about 87 percent. Following the word definitions in Table 2, the noun kang 'river' is represented by { (GOAL, 0.327), (INST, 0.001), (ATTR, -0.173) } and the verb kata 'go' by { (GOAL, 0.783), (INST, 0.377), (ATTR, 0.183) }.
 

Table 7. Case prototypicalities of nouns and verbs

                       GOAL     INST     ATTR
kang 'river'           0.327    0.001   -0.173
kil 'road'             0.281    0.130    0.003
yenphil 'pencil'      -0.001    0.190    0.226
pep 'law'              0.041    0.319    0.132
kata 'go'              0.783    0.377    0.183
sayonghata 'use'      -0.347    0.111    0.121
tayhwahata 'talk'      0.000    0.024    0.000
mantulta 'make'        0.001    0.667    0.476
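The representation of a word as a set of (Case, prototypicality) pairs maps naturally onto a nested dictionary. The sketch below copies three rows of Table 7; the names `lexicon` and `most_prototypical_case` are hypothetical, and reading off the single highest-valued Case is only an illustration of how the entries can be queried, not a procedure described in the paper.

```python
# Word -> {Case: prototypicality}, values copied from Table 7.
lexicon = {
    "kang":    {"GOAL": 0.327,  "INST": 0.001, "ATTR": -0.173},
    "yenphil": {"GOAL": -0.001, "INST": 0.190, "ATTR": 0.226},
    "kata":    {"GOAL": 0.783,  "INST": 0.377, "ATTR": 0.183},
}


def most_prototypical_case(word):
    """Return the (Case, prototypicality) pair with the highest value for a word."""
    return max(lexicon[word].items(), key=lambda pair: pair[1])


best_for_kang = most_prototypical_case("kang")
```

For these rows, `kang` 'river' and `kata` 'go' both peak at GOAL, while `yenphil` 'pencil' peaks at ATTR.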
 

5. Conclusion and Future Work

This study demonstrates that words can be defined by a set of Case prototypicalities, using the syntactic relations among words and the characteristics of Case particles. Furthermore, through machine learning, the definition of a word can be induced automatically from a corpus. The result serves as direct knowledge about a word for syntactic and semantic analysis in NLP. In particular, we emphasize that our approach is not only more practical and efficient to implement on computers, but also more reasonable from the viewpoint of psycholinguistics.

Several tasks remain to improve our experiment. First, we should classify in more detail the types of Cases that each Case particle may take; for this, the meaning of each Case should be defined more strictly. Second, we should devise a way to discriminate computationally among necessary arguments, optional arguments, and adverbials. Third, we should improve the algorithms for automatically assigning Cases to words. Fourth, a more plausible normalization method is required to adjust the prototypicalities calculated from the LTUs for each Case and from the weights of each noun. Finally, to overcome the difficulty of building large training data, which is an intrinsic problem of supervised machine learning, we should develop more sophisticated techniques that incorporate both supervised and unsupervised learning strategies.

Acknowledgements

This research was funded by the Ministry of Information and Communication of Korea under contract 98-86.

References

Chierchia, Gennaro and Sally McConnell-Ginet (1990). Meaning and Grammar, pp. 349-360. Cambridge, MA: MIT Press.

Clark, Herbert H. and Eve V. Clark (1977). Psychology and Language. New York: Harcourt Brace Jovanovich.

Grefenstette, G. (1993). Automatic Thesaurus Generation from Raw Text Using Knowledge-Poor Techniques. In 9th Annual Conference of the UW Centre for the New OED and Text Research.

Hindle, Donald (1990). Noun Classification from Predicate-Argument Structures. In Proceedings of the 28th Meeting of the Association for Computational Linguistics (ACL-90).

Hirst, Graeme (1987). Semantic Interpretation and the Resolution of Ambiguity, pp. 28-29. New York: Cambridge University Press.

Ingram, David (1989). First Language Acquisition: Method, Description and Explanation, pp. 398-432. Cambridge University Press.

Jeong, Young-Me (1993). An Introduction to Information Retrieval, pp. 196-202. Seoul: Gume Trade.

Langley, Pat (1996). Elements of Machine Learning, pp. 67-94. San Francisco: Morgan Kaufmann.

Lee, Ik-Hwan (1995). An Introduction to Semantics, pp. 58-225. Seoul: Hanshin.

Li, Hang (1996). Clustering Words with the MDL Principle, cmp-lg/9605014.

Lim, Hong-Bin and Jae-Young Han (1993). A Study on Classification of Korean Vocabulary, pp. 1-17. Seoul: National Institute of the Korean Language.

Mansuk Song, Gi-Sim Nam, Dan-Hee Yang, et al. (1998). Automatic Construction of Case Frame for the Korean Language Processing. The Ministry of Information and Communication of Korea, '97 Research Report, pp. 35-47.

Nam, Ki-Shim (1993). The Usage of Korean Particle. Seoul: Seogwang Academic Press.

Park, E-Do (1996). The Acquisition of Mother Tongue and Learning of Foreign Language, pp. 75-115. Seoul: Hanguk Munhwa.

Pedersen, Ted (1995). Automatic Acquisition of Noun and Verb Meanings. Technical Report 95-CSE-10. Department of Computer Science and Engineering of Southern Methodist University.

Pereira, F., N. Tishby, and L. Lee (1993). Distributional Clustering of English Words. In Proceedings of ACL 93.

Piaget, J. (1951). Play, Dreams, and Imitation in Childhood. New York: W.W.Norton.

Resnik, Philip Stuart (1994). Selection and Information: A Class-Based Approach to Lexical Relationships. Ph.D. Dissertation, University of Pennsylvania.

Schank, Roger (1982). Dynamic Memory, pp. 77-123. New York: Cambridge University Press.

Schank, Roger and Robert Abelson (1977). Scripts, Plans, Goals and Understanding, pp. 11-17. New Jersey: Lawrence Erlbaum.

Taylor, John R. (1995). Linguistic Categorization: Prototypes in Linguistic Theory. Oxford University Press.

Velardi, Paola (1991). Acquiring a Semantic Lexicon for Natural Language Processing. In Uri Zernik, ed., Lexical Acquisition, pp. 343-349. Lawrence Erlbaum.

Yang, Dan-Hee, Ik-Hwan Lee, and Mansuk Song (1997). Automatically Defining the Meaning of Words by Cases. In Proceedings of the International Conference on Cognitive Science '97 (ICCS-97).

Yang, Dan-Hee, Ik-Hwan Lee, and Mansuk Song (1998). Using Case Prototypicality as a Semantic Primitive. In Proceedings of the Pacific Asia Conference on Language, Information, and Computation 12 (PACLIC-12).

Yang, Dan-Hee, Seong-Hyeon Yang, Young-Sin Lee, and Mansuk Song (1997). Definition and Representation of Word Meaning Suitable for Natural Language Processing. In Proceedings of SOFT EXPO '97 Conference.