Resolving Ambiguous Descriptions through Visual Information

Ingo Duwe, Klaus Kessler and Hans Strohner
SFB 360 "Situated Artificial Communicators", Project "Reference in Discourse"
University of Bielefeld, PO Box 10 01 31, D-33501 Bielefeld, Germany
[email protected]

Abstract

In the context of the SFB (special research group) 360 "Situated Artificial Communicators", the project "Reference in Discourse" deals with the selection of a specific object from a visual scene in a natural language situation. In the SFB scenario, a robot is instructed by a person to construct a toy airplane. One of the prerequisites for solving this task is the identification of a described object in a given set of objects. The system OINC (Object Identification in Natural Communicators) aims at solving this problem in a psychologically adequate way. The difficulties occurring with incomplete and deviant descriptions correspond to data from psychological experiments with human subjects.

Introduction

From a cognitive point of view, semantic and pragmatic inferences play a major role in various arenas of language processing. Two of these are the establishment of reference and coherence. While the study of coreferential phenomena with nominal and pronominal anaphors has a tradition in psycholinguistics, research into reference proper is less developed. By reference proper we understand a direct relationship between a concept and one or more external objects. Reference in natural language poses both theoretical and empirical problems. To overcome reference problems, people seem to take situational information and world knowledge into account in addition to verbal information (Heeman & Hirst, 1995; Dale & Reiter, 1995). But how exactly is this done? In order to answer this question, it is necessary to clarify which type of reference we are talking about.

Our study is concerned with what Clark and Marshall (1981) have termed the visible situation use of definite descriptions. Imagine somebody asking you to pass him or her the sugar. This noun phrase can be taken as an example of a visible situation use of definite descriptions if there is one, and only one, sugar bowl perfectly visible in the scenario (singularity constraint). Both the linguistic and the visual information can be evaluated to identify the object meant (Epstein, 1995; Mott, 1995). Traditionally, such considerations have pertained to cases in which the referent can be identified uniquely on the basis of the verbal and visual information.

In a series of experiments on demonstrative reference, it has been shown that people can easily cope even with underdetermined reference. In order to identify the intended referent, subjects rely on perceptual salience as well as on pragmatic assumptions about the speaker's communicative goals. Imagine someone asking you to pass him or her the spoon. If there are several spoons in the scenario, one would probably pass the spoon one can reach most easily or the particular spoon the other person presumably wants. It stands to reason that situations in which reference cannot easily be established are most interesting for studies of the role of focus in reference resolution (Gordon, Grosz & Gilliom, 1993; Sedivy, Carlson, Tanenhaus, Spivey-Knowlton & Eberhard, 1994).

Focus in Visual and Verbal Processes

Unlike many linguistic approaches, which describe focus mainly on the basis of structural characteristics of a discourse, cognitive approaches regard focussing as an integral part of cognitive activity. Focus is explained on the basis of activation changes in natural or artificial cognitive systems. With regard to language processing, focus refers to the language user's actual focus of attention. In language comprehension, verbal and non-verbal signals may direct the recipient's focus of attention towards particular concepts. Thus, focussing is regarded as a phenomenon that has observable effects on the recipient's overt behaviour.

In reference, the ultimate goal is to arrive at exactly the amount of information necessary and sufficient to identify one and only one external object. Until then, information must be incorporated into or dropped from the focus by the attentional process. From these premises, focussing, i.e. the adjustment of the information in focus, is to be viewed as an emergent phenomenon of global retrieval processes.

Bosch and Geurts (1989) advocated an approach in which underdetermined and ambiguous reference can be resolved by a process called accommodation. Accommodation is understood as a cognitive process that links the referent of a definite noun phrase to concepts in a discourse model. The discourse model consists of concepts relevant for the processed discourse. Concepts explicitly mentioned in the discourse are said to be in the explicit focus. If necessary, concepts related to these can be accommodated in explicit focus and are thus available as potential referents.

Fraurud (1990) added to this approach the notion of conceptual anchor, which emphasises the central role of conceptual relations in reference resolution. According to Fraurud, anchors can be thought of as elements of a cognitive framework built up as the result of an interaction between the discourse information and the knowledge of the participants. Anchors can be established prior to or on the occasion of a definite description. In some cases, the anchor can be identified only by means of visual information. It is important to notice that in this theory the notion of anchor is not dependent on the existence of a focus. Reference resolution is possible even in cases where the referent is out of focus. The semantic relationships between concepts are used to find a referent. For instance, if the daughter were mentioned in a discourse, one could ask oneself whose daughter and thus find a suitable referent.

Some Empirical Results

In order to assess the role of the visual and the verbal focus channel and to gather evidence on their interplay during the constitution of a resultant focus structure which may serve as an effective reference determinant, we have put some of the above considerations to the test in a simple, straightforward referent-identification study (cf. Strohner, Sichelschmidt & Duwe, 1994; Kessler, Duwe & Strohner, 1996; Strohner, Sichelschmidt, Duwe & Kessler, in prep.).

In several experiments, subjects were asked to mark geometrical objects. From a set of potential referents presented pictorially, subjects had to choose the one they felt to be most appropriate in accordance with an instruction given in writing, such as "Please mark a cube." Every picture contained seven objects from one category (e.g., pyramids) and two of a different kind (e.g., cubes). The nine objects were arranged in three groups of three. The left and right triplets each had one of the two rare objects at their centre.

Since we were going to study the factors that determine subjects' choice of a particular referent, we made sure to provide a choice by using referentially ambiguous materials throughout. With ambiguous materials, the singularity constraint is violated; there is always more than one potential referent. Reference resolution should nevertheless be possible by recourse to focussing, although the choice of the particular referent as well as the ease of reference should be affected.

The main results of the experiments may be summarised as follows:

- The conceptual focus built up during preceding tasks directs the selection of a referent in an ambiguous reference task.
- The initial focus is often more effective than the sequel focus.
- Even with difficult expressions, most reactions were referential.
- Indefinite descriptions lead to more referential reactions than definite descriptions.
- With a tolerant goal orientation, subjects respond to definite descriptions referentially more often than with a critical goal orientation.
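The stimulus layout described for these experiments (nine objects in three triplets, seven from one category and two from another, with one rare object at the centre of the left and of the right triplet) can be made concrete with a small sketch. This is only an illustration of why an instruction such as "Please mark a cube" is referentially ambiguous in such a picture; the function names `make_stimulus` and `candidate_referents` and the list-based encoding are assumptions introduced here, not part of the original study materials.

```python
# Illustrative sketch only: names and data layout are hypothetical,
# chosen to mirror the verbal description of the stimulus pictures.

def make_stimulus(common="pyramid", rare="cube"):
    """Return the three triplets (left, middle, right) of one picture."""
    left = [common, rare, common]      # rare object at the centre
    middle = [common, common, common]  # only common objects
    right = [common, rare, common]     # rare object at the centre
    return [left, middle, right]

def candidate_referents(stimulus, noun):
    """Return (triplet index, position index) of every object matching the noun."""
    return [
        (g, p)
        for g, triplet in enumerate(stimulus)
        for p, obj in enumerate(triplet)
        if obj == noun
    ]

stimulus = make_stimulus()
candidates = candidate_referents(stimulus, "cube")
# More than one candidate means the singularity constraint is violated.
is_ambiguous = len(candidates) > 1
```

With this layout the noun always matches more than one object, so any choice a subject makes must be driven by something other than the description itself, which is where focus comes in.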

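[Figure 1 plot: "Description type * conceptual relatedness". Axis: reading time in ms, scaled from 2000 to 4000. Factor levels: definite vs. indefinite descriptions; weakly vs. closely related concepts. Data labels in the plot: 3205.7, 3221.56, 2944.05 and 2685.5 ms.]

Figure 1: Interaction of description type and conceptual relatedness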

The interaction between description type and goal orientation confirms the hypothesis that reference resolution is not only a semantic but also a pragmatic phenomenon. The effort of looking for an unclear referent is not equal for all situations but depends on the cooperativeness of the communication partners. This is particularly true for the problematic case of definite descriptions. In traditional, philosophical semantics, definite descriptions were said to have a singularity constraint. Clearly, this idea can be refuted on the basis of our data. Our results also show, however, that definite descriptions impose a larger cognitive burden on the recipient than indefinite descriptions. The exact nature of this cognitive burden remains unclear so far and requires further research.

The fact that the initial focus but not the sequel focus determined the choice of the reference object may be linked to the first mention effect repeatedly demonstrated in discourse comprehension.

In addition to the factors of focus and description type, in some experiments we also included a semantic factor. The closeness of the relationship between the verbal and the visual concepts was systematically varied. Our results show that this had a strong influence on the frequency, type and difficulty of reference resolution. It remains to be seen, however, what exactly caused this effect. In the mediated reference condition, subjects sometimes marked a reference object by outlining one of its sides only. Thus, reference in the mediated condition seems to be not only more difficult but also more ambiguous than in the immediate condition.

In one experiment with reading time measures we obtained the expected interaction between conceptual relationship and description type (cf. Fig. 1). It may be speculated that the two factors operate in common conceptual areas. The definiteness type in nominal descriptions may inform about an abstract space of possible referents, whereas the conceptual realisation of the nominal description activates a certain concept in that space. Since there should be some overlap between the two conceptual areas, the observed statistical interaction is a first confirmation of this theoretically derived hypothesis.

Some Theoretical Considerations

Conceivably, resolving ambiguous reference can function in the following way: In perceiving a scene with various objects, not every object is paid equal attention by the cognitive system. At some point in time, some objects will be in focus and thus receive more cognitive processing than other objects. Vision is an everyday example of selectivity in cognitive processing: The foveal region, which is the area of the most intensive visual processing, covers only a small portion of the overall information available. The part of the perceptual information most intensively processed is the perceptual focus.

On the basis of the perceptual focus, knowledge-driven processes build up associative links to other knowledge units not explicitly given in the visual scene or the verbal expression. One important source for the related knowledge is the experience built up during past interactions with the objects. The knowledge units activated additionally constitute the conceptual focus.

There is activation of conceptual knowledge during language processing, which contributes to the constitution of conceptual focus. Compared to other types of reference, linguistic reference has some special features. One is that the referent of verbal expressions need not be present in the actual situation. Another aspect is that for many verbal expressions there is more than one possible referent. Therefore, the degree of freedom for reference in language processing is higher than in other modalities. This is the basis of the great communicative power of language, but it can also lead to communicative problems, such as ambiguous, vague, and metaphoric reference.

Together, perceptual and conceptual foci constitute a resultant focus process, a complex knowledge-based process which comprises that portion of the discourse model that is most active at a certain point in time. In this focus structure, the resolution of ambiguous reference is based on conceptual relations between the visual and linguistic information. Definite and indefinite descriptions may contribute to the nature of these relations. Thus, we have laid the foundation for a notion of reference not bound to linguistic processes, but based on general cognitive processes during perception and conceptual processing.

The OINC System

Working in the framework of the semantic theory of Winograd and Flores (1986), Mott (1995) sees reference as a classification problem within a situation theory. From a cognitive point of view, the task described is an assignment problem in which verbal information has to be linked to mental object representations. According to our studies, natural language is neither complete nor clear. The previously mentioned results of our experiments constrained the following central theorems on which our system OINC (Object Identification in Natural Communicators) is based:

1. Focus is an activation of mental representations which surrenders to other assignments before.
2. A semantic deviance is a perceived difficulty in spite of successful object identification and results in longer calculation.
3. The object most closely associated with the representation of the verbal description is chosen from the set of objects in the visual scene (even if the description is deviant or incomplete).
4. There are associative connections from the mental representations of the objects used in descriptions to a set of mental representations X. This set in turn has associative connections to the mental representations of the situationally available objects.
5. The set X can be a set of qualities.
6. The set X mentioned above is represented on a subconceptual level of the language description level (Smolensky 1988). In our case this can be a set of qualities, which represents a part of the world knowledge of a human communicator.

The system architecture, that is, the arrangement of layers used here, follows from theorem 4 above. It is shown in Figure 2.

Figure 2: The architecture of the OINC system

The Calculation Algorithm

Assignments are calculated roughly and quickly by neural nets. The mathematical similarity of neural nets and statistics was proved recently (White 1989, Rojas 1993, 1994, Paaß 1994). A less thoroughly examined algorithm is the IAC (Interactive Activation and Competition) algorithm of McClelland & Rumelhart (1988). This algorithm is an extension of the spreading activation algorithms, which were successfully applied to the modelling of psycholinguistic phenomena (Dell 1986, Schade 1992, Eikmeyer & Schade 1994). In the IAC algorithm, both positive and negative as well as unidirectional weights are realised. The termination of the algorithm is mathematically secured for $|net_i| = 1$ (McClelland & Rumelhart, 1988; Schade 1992). This net is a resonant net. It is therefore important that a solution is always provided. The formal representation of this algorithm reads as follows:

$$net_i = \sum_j w_{ij}\, output_j + extinput_i$$
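To illustrate how the net input formula and the layered arrangement from theorem 4 could interact, the following is a minimal sketch of an IAC network with one description unit, two quality units (the set X) and two object units. Everything concrete in it is an assumption made for illustration: the unit names, the weights, the symmetric links (the text mentions unidirectional weights), the step size, and the parameter values for rest, decay and min. The activation update is the commonly cited form of the IAC rule, with output taken as the positive part of the activation; the text above gives only the net input, so the update should be read as a plausible completion rather than as the OINC implementation.

```python
# Illustrative IAC sketch. Unit names, weights and parameter values
# are assumptions, not taken from the OINC system.
UNITS = ["desc:cube", "qual:angular", "qual:round", "obj:block", "obj:ball"]
IDX = {name: i for i, name in enumerate(UNITS)}

MAX_A, MIN_A, REST, DECAY = 1.0, -0.2, -0.1, 0.1  # assumed parameters
STEP = 0.1  # assumed scaling of each activation change

def make_weights():
    """Build a small weight matrix: description -> qualities -> objects."""
    n = len(UNITS)
    w = [[0.0] * n for _ in range(n)]

    def link(a, b, value):
        # Symmetric link for simplicity (an assumption of this sketch).
        w[IDX[a]][IDX[b]] = value
        w[IDX[b]][IDX[a]] = value

    link("desc:cube", "qual:angular", 1.0)    # description activates a quality
    link("qual:angular", "obj:block", 1.0)    # quality activates an object
    link("qual:round", "obj:ball", 1.0)
    link("qual:angular", "qual:round", -1.0)  # competition within a layer
    link("obj:block", "obj:ball", -1.0)
    return w

def net_input(w, act, ext, i):
    """net_i = sum_j w_ij * output_j + extinput_i, with output_j = max(a_j, 0)."""
    return sum(w[i][j] * max(act[j], 0.0) for j in range(len(act))) + ext[i]

def iac_step(w, act, ext):
    """Apply one synchronous IAC update to all units."""
    new_act = []
    for i, a in enumerate(act):
        net = net_input(w, act, ext, i)
        if net > 0:
            delta = (MAX_A - a) * net - DECAY * (a - REST)
        else:
            delta = (a - MIN_A) * net - DECAY * (a - REST)
        updated = a + STEP * delta
        new_act.append(min(MAX_A, max(MIN_A, updated)))  # keep within bounds
    return new_act

def identify(description, cycles=200):
    """Clamp external input on a description unit and return the most active object."""
    w = make_weights()
    act = [REST] * len(UNITS)
    ext = [0.0] * len(UNITS)
    ext[IDX[description]] = 1.0
    for _ in range(cycles):
        act = iac_step(w, act, ext)
    objects = [u for u in UNITS if u.startswith("obj:")]
    winner = max(objects, key=lambda u: act[IDX[u]])
    return winner, act

winner, activations = identify("desc:cube")
```

In this toy network the description never connects to an object directly; activation reaches the object layer only through the quality units, and inhibition between the two objects ensures that one of them ends up clearly more active than the other, which is the sense in which the most closely associated object is chosen.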

increasing of the activation. In general max=1, min