UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

PRODUCTION NOTE University of Illinois at Urbana-Champaign Library Large-scale Digitization Project, 2007.

Technical Report No. 296

SAPIENS: SPREADING ACTIVATION PROCESSOR FOR INFORMATION ENCODED IN NETWORK STRUCTURES

Andrew Ortony
University of Illinois at Urbana-Champaign

Dean I. Radin
Bell Telephone Laboratories, Columbus, Ohio

October 1983

Center for the Study of Reading READING EDUCATION REPORTS


UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 51 Gerty Drive Champaign, Illinois 61820 BOLT BERANEK AND NEWMAN INC. 50 Moulton Street Cambridge, Massachusetts 02238

The National Institute of Education, U.S. Department of Education, Washington, D.C. 20201


The research reported herein was supported in part by the National Institute of Education under Contract No. US-NIE-C-400-76-Q116, and in part by a Spencer Fellowship awarded by the National Academy of Education to the first author. We wish to thank George Kiss for supplying a magnetic tape version of the Associative Thesaurus, the Coordinated Science Laboratory and Computing Services Office of the University of Illinois for granting computer time on the DEC-10, David Waltz for the use of a disk pack which housed a modified version of the Associative Thesaurus, and Harry Blanchard and Glenn Kleiman for their helpful comments on earlier versions of this manuscript.

EDITORIAL

BOARD

William Nagy Editor Harry Blanchard

Patricia Herman

Nancy Bryant

Asghar Iran-Nejad

Pat Chrosniak

Margi Laff

Avon Crismore

Margie Leys

Linda Fielding

Theresa Rogers

Dan Foertsch

Behrooz Tavakoli

Meg Gallagher

Terry Turner

Beth Gudbrandsen

Paul Wilson


SAPIENS: Spreading Activation Processor for Information Encoded in Network Structures

Abstract

This report describes a computer implementation of a spreading activation process in semantic memory and discusses its performance on some tasks often employed in psychological studies of human language processing. An associative thesaurus containing over 16,000 words and all free-associative strengths between them was used as the data base, thus making SAPIENS confront the information and computation problems inherent in large data base manipulation.

One of the hallmarks of intelligence is the ability to efficiently identify and utilize information relevant to the solution of a problem while largely discounting irrelevant information. In many cognitive tasks (such as language comprehension or perception) this means that relevant information must be accessed from memory more or less immediately, thus precluding any kind of exhaustive, or near exhaustive search. Consequently, the relevance problem--the problem of how to identify a potentially relevant subset of the totality of stored information--is an important theoretical question in psychology and an important practical question for AI (Artificial Intelligence). The work described in this report takes an AI perspective on the problem, using the context of natural language processing as its basis, although the principles upon which it is based are quite general.

Research in AI has devised various domain-specific mechanisms for dealing with the relevance problem. Robinson's (1965, 1968) resolution principle is an early example of an approach that proved fruitful in the field of automatic theorem proving. In the domain of scene analysis Waltz (1975) solved the problem by taking advantage of the huge reduction in data storage resulting from distinguishing the physically possible pairs from the logically possible pairs of line junctions in a two dimensional representation of a scene. In the area of problem solving proper, a general guiding principle has been the careful choice of knowledge representation. It quickly became apparent that the choice could have a dramatic influence on the ease of problem solution, the old problem of the mutilated chess board being a simple but convincing example (see, for example, Raphael, 1976).

Two developments in AI and psychology are especially pertinent to the relevance problem. The first is the emergence from earlier but vaguer accounts (e.g., Bartlett, 1932; Piaget, 1952) of increasingly detailed proposals about the nature of generalized knowledge representations, variously called frames (e.g., Minsky, 1975), scripts (e.g., Schank & Abelson, 1977), and schemas (e.g., Rumelhart & Ortony, 1977). The second, related, development, primarily from AI, is the recognition that the distinction between programs and data need be much less sharp than was generally supposed. The notion is that much information that appears on the surface to be declarative in nature can be, and often more advantageously is, represented as procedural. This observation constitutes an important underlying principle of Planner-like languages (cf. Hewitt, 1972), and is also evidenced in the system of Norman and Rumelhart (1975). A consequence of the devaluation of the program/data distinction is that generalized knowledge structures, here called "schemas," are partly procedural and partly declarative. Because the utilization of a schema (in an AI system or in human cognition) results in a great deal of potentially relevant information becoming available automatically, it offers a powerful way of dealing with part of the relevance problem, but it does only deal with part of it. The missing component is the schema selection mechanism which is responsible for bringing the candidate schemas into play in the first place. In this report we describe a computer implementation of a process for doing this--a process proposed earlier in Ortony (1978).

Efficient access to all and only the relevant knowledge structures depends not so much on the internal structure of individual knowledge representations (which is the chief concern of those working on schemas, scripts, etc.) as on the overall organization and interrelations of such representations in the system. In other words, it depends on the overall structure of memory, rather than primarily on the structure of the things represented in memory. Models of the macrostructure of memory (whether implemented or not) have generally tended to rely upon associative networks. However, associative networks have generally been viewed merely as spaces in which to conduct a specific kind of search operation known as the intersection search (Quillian, 1968; Collins & Quillian, 1969; Collins & Loftus, 1975). The general mechanism employed is that of spreading activation, and the mechanism is considered to have succeeded when it discovers an intersecting node that can be reached from the different source nodes. This limited use, however, fails to capitalize on the power both of network representations themselves, and of the spreading activation mechanism. The principal purpose of SAPIENS was to harness the potential power of the spreading activation mechanism and of the semantic network representation to simulate schema selection. Furthermore, this was undertaken in the context of a data base of sufficient size that the principles of the system's design could be generally applicable rather than relegated to the category of ungeneralizable "toy" problems.

Given this goal (as opposed to schema utilization), it was possible to ignore the structure of the schemas. The nodes are simply English words, although we make the assumption that as such, they can, in principle, be treated as the names of schemas. Our assumption that a network of words can be regarded as a network of schema names is an important simplifying assumption which warrants some elaboration.

In a complete representation, we assume that there would have to be at least three different levels--a lexical level, a conceptual level, and an episodic level. The lexical level represents connections between words without distinguishing between distinctly different meanings that a word might have. At the lexical level a word like bank could have connections to other words, some of which (e.g., money) have to do with the "financial institution" sense, some (e.g., weeds) with the "side of a river" sense, and some with other senses of the word such as those related to basketball, airplanes, etc. Thus, the lexical level represents associations between words, not between concepts. Conceptual connections are represented at the next, conceptual, level. At this level, we suppose that there are separate schemas for the distinct meanings of a word like bank. Furthermore, these schemas are ordinarily not directly connected to one another. However, they are connected through the lexical level in the sense that they are all directly connected to the word or words that constitute their labels or names. Finally, we assume that representations involving some of the more specific noteworthy experiences centered around particular schemas are represented at the episodic level. At this level, individual representations are again directly associated with particular schemas, perhaps indexed in terms of notable deviations from the canonical representations (see Schank, 1982).

In the present report, we investigate the degree to which schema selection can be facilitated through processes that are restricted to the lexical level. There are both practical and theoretical reasons why this is a worthwhile enterprise. The practical reason is that it is much easier to construct a data base of lexical associations than it is to construct a comparably sized data base of conceptual and episodic structures. The theoretical reason is that the schema selection process is of necessity a pre-semantic process. That is, it is an associative process whose goal is to permit a determination of the meaning of some input. Consequently, the process cannot presuppose that a semantic representation of the input has already been achieved. This means that the schema selection process has to operate at a relatively impoverished semantic level. Of the three levels that we have proposed, only the lexical level is devoid of semantic content.

A spreading activation mechanism ought to have at least the following five characteristics: (a) context sensitivity, which permits the production of different patterns of activation for the same input string under different context conditions; (b) efficiency, permitting a mechanism to operate in a space containing perhaps tens of thousands of nodes; (c) decreasing activation over time, so as to prevent every input from activating the entire network forever; (d) summation of activation from different sources, so as to permit differential activation levels on equally distant nodes; and (e) an activation threshold for each node which determines whether it will transmit activation to other nodes.

Based on these principles, a processor for operating on a network was developed. In it, spreading activation is used as a mechanism not just for finding intersections, but for identifying constellations of candidate nodes for employment in the process at hand. In other words, the mechanism is used to restrict the set of potentially relevant nodes. The ability to select a small set of potentially relevant nodes for possible use in subsequent processing is, as we have already suggested, an important component in an intelligent system. While we acknowledge that it is not sufficient to endow a system with genuine wisdom, we were unable to resist the name SAPIENS--Spreading Activation Processor for Information Encoded in Network Structures. The program was written in MACLISP and implemented on a DEC-10 computer. The data base was an associative thesaurus consisting of over 16,000 words and all free-associative strengths between them (Kiss, 1968). Since the data base was empirically generated (by soliciting associative responses from students), and since it is quite large, the network can be regarded as a reasonable facsimile of (part of) some arbitrary individual's lexical map.

It was considered important to use a large, empirically realistic data base for two reasons. First, in order to be able to test the effectiveness of the simulated process, it was necessary to have a data base with sufficiently rich semantic diversity and interconnections to avoid trivializing the problem. Second, since a large data base was used, the so-called "combinatorial explosion" problem had to be addressed. Most computer simulated semantic network models contain a great deal less than a thousand nodes (for example, Quillian, 1968, encoded about 850 nodes). The present system works on a data base an order of magnitude larger than any other semantic network we are aware of. Clearly the time requirements resulting from the massively increased number of possible paths in a network of 16,000 nodes put much greater demands on the processor. Finally, we felt that only with a relatively large and semantically diverse data base would it be possible to explore the potential of the system for dealing with a broad range of tasks.

The main result of SAPIENS is that, given several input words, it quickly identifies from the entire 16,000 word network a restricted set of 10 to 20 relevant words without extensive searching. These words can be thought of as the names of the best candidate schemas for subsequent top-down processing by other mechanisms. In this respect, SAPIENS is analogous to the filtering process in Waltz's (1975) program that generates semantic descriptions of scenes with shadows. This filtering process takes a scene and finds a small subset of most-likely line segments (out of thousands of possibilities) for further processing by a semantic description mechanism. SAPIENS takes an input string and finds a small subset of relevant concepts potentially usable for further processing by, for example, inference or problem solving mechanisms.

Four tasks were used to examine the performance of SAPIENS. While some of these tasks were suggested by published experimental work, it is important to emphasize that they are not intended as simulations of experiments. Rather, they should be viewed as illustrations of the kind of problems that SAPIENS can handle and of the way in which it handles them. Thus, we do not view the tasks as merely being relevant to the question of whether a spreading activation model can, in principle, provide simple solutions to the schema selection problem and to such issues as lexical disambiguation. Proposals that some mechanism or other can in principle solve some set of problems are not very compelling. We view performance on the tasks as demonstrating that a spreading activation model employing a realistically large number of nodes does solve these problems. Furthermore, we consider it important that we have a working program to do this, rather than a theoretical proposal. The enterprise, therefore, is an AI enterprise.

The first task shows how context can be used to disambiguate ambiguous words. The second task seeks to show how standard typicality effects found in various laboratory tasks can be accommodated by SAPIENS. Third, SAPIENS's performance in a mock cued recall "experiment" is examined. Finally, we describe a simple examination of the effects of manipulating word order of an input string.

It should be emphasized that we do not claim that the mechanism we propose is sufficient to realize complete solutions to all such problems. Clearly schema utilization is equally important. However, we do claim that, properly conceived, a spreading activation mechanism may well be a fundamentally important ingredient insofar as it represents a viable solution to the schema selection problem. We consider it significant that so much can be achieved with so little, considering that the only data that SAPIENS operates with are associative strengths between words.
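Three of the five characteristics listed in the introduction (decreasing activation, summation from several sources, and a transmission threshold) can be illustrated with a minimal sketch. This is not the MACLISP program described in the report: the toy network, the function name `spread_once`, and the `decay` and `threshold` parameters are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy network (not the Kiss thesaurus):
# TOY_NETWORK[stimulus][response] = associative strength on a 0-100 scale.
TOY_NETWORK = {
    "bank": {"money": 40, "river": 20, "teller": 10},
    "money": {"bank": 30, "cash": 30},
    "water": {"river": 35, "stream": 25},
    "river": {"water": 50, "stream": 20},
}


def spread_once(network, activation, decay=0.5, threshold=10.0):
    """Propagate activation one step outward from the currently active nodes.

    `decay` and `threshold` are illustrative values, not taken from the report.
    """
    incoming = defaultdict(float)
    for node, level in activation.items():
        if level < threshold:
            # (e) a node below threshold does not transmit activation
            continue
        for response, strength in network.get(node, {}).items():
            # (c) decay shrinks what is passed on at each step;
            # (d) contributions from different sources are summed
            incoming[response] += decay * level * strength / 100.0
    return dict(incoming)
```

Under these assumptions, activating bank together with water leaves river with more activation than activating bank together with money, which is a small-scale picture of context sensitivity (characteristic a).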

Design Features

Before discussing the design characteristics of the program it will be helpful to define and clarify our use of a number of key terms. First, in the discussion that follows we do not distinguish between nodes, words, and concepts. This is not because there is no difference, but because in the present context the differences do not matter. The network contains nodes which are representations of English words. However, our theoretical orientation views words as labels or names for schemas or concepts. We can consistently maintain this position even though we have made no attempt to implement the underlying schematic structure that such nodes would have to have if they were to be employed in a full-fledged natural language processor. The network itself, then, consists of some 16,000 English words and their densely interconnected associative links which represent the empirically based associative strengths between them. Associative strength is an integer measure (1-100) of how many times a response (R) was given to a specific stimulus (S) by 100 people (Kiss, 1968). The proportion of the associative strength "sent" from S to R is called activation. The activation level of a word is the total amount of activation currently being received at that word. It is the sum of all activations coming into that word from any other stimulus words that are "transmitting" or "sending" activation. The associative thesaurus contains both forward (S-R) and reverse (R-S) data. The set of nodes that can be reached from (i.e., as responses to) a particular word is called the forward environment.

The words that are used as input strings to SAPIENS are called seeds. Each seed is weighted so that the original associative strengths can effectively be altered to simulate context effects and decreasing activation over time. An expansion consists of accepting each input seed in turn as a stimulus and creating a single list of all responses to these stimuli. After the expansion is complete, intersections are found within the expanded list, and these intersections, along with the activation levels associated with them, comprise the relevant forward environment or RFE. The expansion of all the input seeds and the identification of intersections (and therefore the RFE) occur within one time-slice or spread. This breadth-first expansion in one time-slice simulates a parallel computation process, and thus spreading activation in SAPIENS can be regarded as a parallel process which simultaneously spreads activation out from several input seeds.

The operation of SAPIENS is analogous to growing a crystal in a saturated chemical solution. Input seeds used to start a spreading activation process are similar to chemical seeds used to start a crystal-growing process, and the densely interconnected network is similar to a densely saturated chemical solution. The seeds constitute a core around which layer upon layer of molecules (or nodes) cluster. This clustering produces an onion-like series of shells around the seed core. The first layer is formed of molecules most strongly attracted to the seeds. After several layers have formed, there is less of a tendency for the crystal to continue growing. At some point, an attraction threshold fails to be reached and the growing process stops. In a similar fashion, the first SAPIENS spread creates a cluster of nodes strongly connected around the input seed core. As each new layer is formed, the size of the cluster increases, and more nodes are pulled into the RFE. Each new layer, however, is formed of successively more remotely related nodes.
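The expansion and intersection steps described in this section can be sketched as follows. This is a minimal illustration in Python rather than the original MACLISP: the association table is fictitious (only the fruit-to-banana strength of 51 echoes a value mentioned in the report), and the names `ASSOC`, `expand`, `intersections`, and `spread` are hypothetical. The sketch returns only the intersected words; the report's rule that input seeds are also carried in the RFE is left out for brevity.

```python
from collections import defaultdict

# Fictitious forward (S-R) associations, not the Kiss thesaurus:
# ASSOC[stimulus][response] = associative strength; each row sums to 100.
ASSOC = {
    "apple": {"red": 30, "tree": 25, "pie": 20, "orange": 15, "core": 10},
    "fruit": {"banana": 51, "orange": 20, "red": 10, "juice": 10, "tree": 9},
}


def expand(seeds, assoc):
    """Expansion: treat each weighted seed as a stimulus and list its responses."""
    expanded = []
    for seed, weight in seeds.items():
        for response, strength in assoc.get(seed, {}).items():
            # The seed weight scales the original associative strength.
            expanded.append((seed, response, weight * strength))
    return expanded


def intersections(expanded):
    """Keep responses reached from at least two sources, with summed activation."""
    sources = defaultdict(set)
    level = defaultdict(float)
    for stimulus, response, activation in expanded:
        sources[response].add(stimulus)
        level[response] += activation
    return {word: level[word] for word in level if len(sources[word]) >= 2}


def spread(seeds, assoc):
    """One time-slice: expansion followed by intersection finding.

    Returns the intersected words with their activation levels and the
    activation strength sum over those words.
    """
    found = intersections(expand(seeds, assoc))
    return found, sum(found.values())
```

With the fictitious table above, `spread({"apple": 1.0, "fruit": 1.0}, ASSOC)` keeps only red, tree, and orange, because those are the only responses that both seeds reach.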

The first spread creates a tight cluster around the seed core. The second spread wraps another layer about the first, but is associatively less strongly related to the core. Normally, SAPIENS stops after the second spread since successive clusters are mildly related at best, but more importantly, the purpose for the spreading has been fulfilled in one or two spreads, namely, to quickly find a small subset of relevant concepts for further processing. Not all of the words found represent concepts that are likely to be useful for further processing, but the ones that are useful are found without searching the entire network.

On average, each node in the network spreads out to about 20 associates. In a goal-searching algorithm, thousands upon thousands of nodes would eventually be reached, especially since the network is so densely interconnected. It is this same density that enables the clustering approach to prevent untold thousands of nodes from being activated. At each new spread, as more and more intersections are input as seeds, it becomes more likely that most of the activation will circle in on itself, that is, an area of very high activation will begin to form as more intersections are added to the inputs.

Design Example

We shall now describe a fairly detailed example of the operation of SAPIENS using, for simplicity's sake, fictitious but realistically representative data. Suppose the input seeds are the two words, apple and fruit, both with input weights of 1. First, apple and fruit are expanded to create a list containing all responses and associative strengths from the seeds as shown in Figure 1. It is important to notice that the total activation output from each seed equals 100. In other words, the strengths are empirically-based transition probabilities. The value 51, for example, from fruit to banana, refers to the fact that the proportion of subjects producing banana as a response to the stimulus fruit was 0.51. The next step is to find intersections in the expanded space and to create a new list of the intersections and summed strengths as illustrated in Figure 2.

Insert Figures 1 and 2 about here

Once the intersections are found and the strengths summed, a new list is created which will be the input for the next time-slice. The new input list contains the original seeds with input weights 1, together with the intersection words at weights which reflect the proportion of total activation on each word (see Figure 3). The purpose of using proportional weights on the intersected nodes is to prevent an activation explosion on the next time-slice. The weights on a seed are used to multiply the associative strengths of each response to that seed. For example, since the weight on the node red is 0.3, the activation sent to each response from red will be multiplied by 0.3. This means that the total activation sent from red will be 0.3 x 100, or 30, because each node originally outputs a total of 100. The original seeds are always re-input at weight 1 and successive intersection nodes are always input at proportional weights which sum to 1 to simulate the attention being placed on the input seeds (high activation) and the spreading activation process operating on the intersected nodes (low activation). In other words, the original seeds plus the intersections are re-input as new seeds. The result of this re-input is to restrict the kinds of intersections that will result. Successive inputs are recursive, each input containing part of the previous one, and the effect is to quickly excise from the network the relevance structure that exists around the original inputs.

Insert Figures 3 and 4 about here

Continuing the example, with the creation of the new input list, the spread (and one time-slice) is complete. At the end of each cycle the result is a new RFE. Each word in the RFE has an activation level associated with it which is the total amount of activation received by that word in the time-slice that just ended. The activation strength sum is the sum of the activation strength of all nodes currently in the RFE. In the present example, the activation strength sum is 74. Normally all nodes that appear in the RFE must have received activation from at least two sources. The exceptions are input seeds, which are always included in the RFE whether they received activation or not. Notice, as can be seen in Figure 4, that the input seeds apple and fruit do not have strength sums.

The maximum strength sum attainable after any spread is 100xN + 100, where N equals the number of input seeds. This is because each input seed always outputs a strength of 100 (weight = 1), and the sum of the strengths of the intersected nodes is forced to equal 100 (sum of weights = 1). In our example, 300 is the maximum attainable strength sum. This can occur only if, when a total activation of 300 is injected into the network at the beginning of a cycle, all of the responses (from the input seeds and the intersections) happen to be intersections too, for these particular responses are the only nodes receiving activation in the network. Since the total input weight of all intersected nodes is constrained to sum to 1 (so as to prevent an activation explosion), the proportional activation on each word decreases rapidly. As the new seed list increases in length on each spread, the activation on intersections in the relevant forward environment (RFE) must decrease. It is, of course, possible, and desirable, that the proportional activations on RFE words will change as the spread continues because some nodes will receive additional activation from newly accessed nodes.

A maximum attainable strength sum enables us to make meaningful comparisons among different strength sums. For example, if the maximum strength is 300, and one pair of seeds produces a sum of 60 after 2 spreads, and another pair a sum of 30 after 2 spreads, we can say that the first pair represents 20 percent of the available activation and the second pair represents 10 percent. Thus, the first pair of seeds is better "integrated" (in the sense of "inter-related") than the second pair by 10 percent of the available activation. Input seeds that are strongly associated with one another will produce an RFE containing more intersections and hence a higher activation strength sum. Weakly associated seeds may produce an RFE with only a few intersections, and a correspondingly smaller activation strength sum. "Association" in this sense is thus better thought of as a measure of "inter-relatedness" between words.
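The re-input step of the design example can be sketched as below. The function names `next_input` and `total_sent` are hypothetical, the intersection values in the usage note are invented so that red receives the weight 0.3 used in the text, and the handling of an empty intersection list is an assumption, since the report does not say what happens in that case.

```python
def next_input(original_seeds, intersections):
    """Build the weighted seed list for the next time-slice.

    Original seeds are re-input at weight 1. Intersection words receive
    weights proportional to their share of the summed intersection
    activation, so that those weights sum to 1.
    """
    total = sum(intersections.values())
    new_seeds = {}
    if total > 0:  # assumption: with no intersections, only the seeds are re-input
        new_seeds = {word: level / total for word, level in intersections.items()}
    # Original seeds always stay at weight 1, overriding any proportional weight.
    new_seeds.update({seed: 1.0 for seed in original_seeds})
    return new_seeds


def total_sent(weight, responses):
    """Total activation a node sends: its weight times each associative strength."""
    return sum(weight * strength for strength in responses.values())
```

For instance, with invented intersection activations `{"red": 30.0, "orange": 50.0, "tree": 20.0}`, red is re-input at weight 0.3, and a node at that weight whose associative strengths total 100 sends out 30 in all, as in the text.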

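The comparison of strength sums against the maximum attainable sum is simple arithmetic, and a short sketch makes the worked figures from the design example easy to check. The function names are hypothetical; the formula 100xN + 100 and the 60 and 30 example values come from the text.

```python
def max_strength_sum(n_seeds):
    """Maximum attainable strength sum after any spread: 100 x N + 100."""
    return 100 * n_seeds + 100


def percent_of_available(strength_sum, n_seeds):
    """Express a strength sum as a percentage of the maximum attainable sum."""
    return 100.0 * strength_sum / max_strength_sum(n_seeds)
```

With two input seeds the maximum is 300, so strength sums of 60 and 30 correspond to 20 and 10 percent of the available activation respectively.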

Program Performance

usual

SAPIENS

16

Ortony & Radin

are vague, being consistent with a wide range of

rather they

sense,

very different kinds of referents. Lexical Disambiguation described

The first task to be

The most

this problem is the need to disambiguate homonyms on

the basis of contextual information. of

the

comprehension]

with

problems is

models

puts it:

As J. Anderson (1976) all

more

deal with multiple word senses or multiple syntactic possibilities spreading

model provides

activation

simulate

on

a

computer,

question of their psychological validity. (p.

The degree to which SAPIENS is disambiguation

problem was

able

448

than

The one

solution

the

to

of

The

input seeds.

intersections

meaning

of

the

For example, consider the words mint, bank, bar, fruit, and the usual sense:

between the sense of mint as candy, and as

they

each

a

place

for

have

manufacturing

in distinguishing the sense of bank as

institution from the sense in distinguish the sense of bar as

which

to

ought

be

if yellow were part of the

whereas,

The results of the simulation

Insert Table 1 about here

more

degree of

rivers

have

banks.

And

we

coins.

a financial wanted

to

a place to have a drink, from the sense in which

a rod is a bar. The remaining cases, fruit and game, are not ambiguous words

in

interconnectedness or integration of

an input. Thus, for example,the first two rows pecuniary

mint

of

sense

is

better

in

the

RFE

in

Table than

1

suggest

that

river appears high in the RFE (see

Row

3

that the original seeds

of

the

Table

1)

the

the gastronomic sense. appear

only

receive activation from their associates.

they themselves

if

of

measure

the word clusters associated with

integrated

SAPIENS is

Another interesting property of

because

it

So,

received

activation from water, stream, and/or other activated elements in the RFE.

distinct meaning. In particular, we were interested in distinguishing

Similarly, we were interested

lemon,

context, one might expect the reverse to be true.

... are hard

investigated by injecting a context-setting word and

first three are ambiguous in

the context for fruit, apple

Recall that the activation strength sum can be taken as a

that were clearly associated with the contextually appropriate

game.

or

instances of the concept.

are presented in Table 1.

found in one or two spreads were usually sufficient to produce clusters of words

word.

banana,

than

impose

The

)

contribute

to

the target (ambiguous) word into the network as

disambiguous

available

to

ought

is irrelevant to the

difficulty

this

but

...

provide

to

ought

the potential for associative context

to prime a word's meaning. These parallel, strength mechanisms to

constraints on the relative accessibility of different Thus, for example, if red is part of

they run into difficulties when they must

that

it

context

terms,

vague

an ambiguous word. In the case of the

language

[computer-based

current

context

of greater availability of words related to the appropriate meaning of

evidence

typical example of

One

the

in the face of more than one alternative.

appropriate word meanings

selection of

of

problem

to

sensitive

If SAPIENS is appropriately the

with

concerned

is

A general observation that can be made about these results is that the most highly ranked nodes in the RFE are, as a rule, semantically highly related. In cases where the RFE is large (as indicated by a high strength sum), for example bank in the context of money, the least highly ranked concepts are very weakly related, often representing syntagmatically, rather than paradigmatically, related concepts. For example, make appears very low on the list for bank. Presumably it is only there at all because one puts money that one makes into the bank.

One way to interpret the varying numbers of nodes in the RFE is to suppose that the more nodes there are in it, the better integrated the RFE is, and the more knowledge there is associated with the particular use of the term (compared only, of course, to alternative uses of the same term). In other words, uses giving rise to larger RFEs might be regarded as the higher frequency, or more typical ones.

However, caution should be exercised in this respect: it is not being claimed that, for example, the most probable use of the word mint is the pecuniary sense, but only that that particular sense is more probable than the alternatives that were investigated in the context of the present data base. It is perfectly possible that mint as herb would be the preferred sense in some other data base. Furthermore, a realistic test of this would need to control for the frequency of the context-setting word, which was not done in the present case.

If the program is viewed as a model of schema selection (presumably for later use in top down processing), it can be concluded that SAPIENS is reasonably successful. The clustering process was supposed to quickly find a small relevant subset of the nodes without resorting to extensive searching. This relevant subset, the RFE, ought to have, and did include, nodes that were related to the input seeds taken together. The program, it can therefore be concluded, is able to restrict the candidate nodes for subsequent processing so that those candidates are likely to be relevant to the appropriate meaning of a word as determined by the context.

Note that SAPIENS does not know which seed is the context-setting word and which is the target word. If both nodes are somewhat ambiguous, a "complementary disambiguation" can occur. Consider, for example, drink and bar. Both words are compatible with several different kinds of referents. The bartender made a drink and The mother made a drink suggest different kinds of drinks, while The bar was crowded and The bar was rusted imply different kinds of bars. Since SAPIENS finds items relevant to all of the input words, the RFE for drink followed by bar will reflect both a bar sense of drink and a drink sense of bar. Similarly, finding a relation between red and fruit, and fruit and red, entails a comparable process, except that the seeds are (technically) unambiguous.

Typicality and Verification Tasks

It is now a well-established finding in psychology that the speed with which true sentences of the form "An x is a y" can be verified depends on how typical the example, x, is of the category, y (see, e.g., Rosch, 1973; Smith, Shoben, & Rips, 1974). If apple is a more typical fruit than strawberry, then the sentence An apple is a fruit will take less time to verify than the sentence A strawberry is a fruit. This fact is, of course, consistent with the view that more typical exemplars are more available than less typical ones. The second set of simulations was conducted to see whether or not SAPIENS would demonstrate such differential availability.

Injecting apple followed by fruit into the network results in an RFE containing the words pie, pear, orange juice, tree, juicy, tart, banana, green, sauce, and summer after just one spread. These words are listed in order of activation level; the proportion of available activation was 0.32 (sum strength for this RFE was 97). By comparison, if the seed nodes are strawberry and fruit, the RFE comprises cream, juice, jam, raspberry, pie, tart, summer, and food after one spread, with an activation proportion of 0.21 (sum strength equals 64). Thus, as expected, the more typical exemplar, apple, produces an RFE with greater overall strength than the less typical one. Similarly, robin followed by bird gives bird, song, sparrow, swallow, thrush, and starling after two spreads, with an activation proportion of 0.38 (sum strength of 114), while penguin followed by bird gives bird, fly, black, feathers, feather, flight, sky, and wing after two spreads, with a smaller activation proportion of 0.29 (sum strength of 87) than in the robin case.

At this point it should be mentioned that psychological experiments on sentence verification routinely reveal not only that subjects are faster at verifying good examples of category members than poor ones, but also that they are very fast at rejecting non-examples like A pencil is a bird. The RFEs resulting from the corresponding inputs can be interpreted in a manner consistent with this finding, for, in the case where the term refers to a non-instance of the category, the RFE tends to be empty so that the sum strength is equal to or close to zero.

Finally, it is worth making a couple of observations about the contents of the RFEs. First, again, it should be remembered that the data that form the basis of the network were collected several years ago and represent some amalgam of a large number of undergraduate students at Edinburgh, Scotland. Assuming the general tendency of the dialect to be closer to that of British English than to American English, it should be recognized that robins (being relatively uncommon in Britain) are not, in fact, the best exemplars of birds--sparrows, thrushes, and starlings are all better. Notice that these other good instances in fact find their way into the RFE for robin and bird. There are a number of other peculiarities of this kind. For example, it is the experience of most English people that apples are typically just as green as they are red (perhaps even more so) because there is a subcategory of apples known as cooking apples which are always green (and sour) and are used in pies and tarts. Another peculiarity by U.S. standards might be the occurrence of pint in the RFE for drink bar. In Britain one of the most typical things to drink in a pub (especially at the bar) is a pint (of beer).

Notice also that the RFE often contains more than one cognate (e.g., feather and feathers). Sometimes this is just a plural form, sometimes a verb form, and so on. Ideally, these would have been removed, but their presence causes no real problems. Furthermore, it is interesting to note that the second highest RFE word for the pair penguin/bird was fly. It is possible that words like feathers and wing contributed enough activation to fly for it to become highly activated. On the other hand, perhaps fly refers to the fact that penguins are unable to fly. As it is currently set up, SAPIENS cannot handle negative properties, but for present purposes this is not really important.

Cued Recall

In an experiment reported by Anderson and Ortony (1975), subjects studied a number of simple sentences such as (1) and (2).

(1) Nurses are often beautiful.
(2) Nurses have to be licensed.

Later, subjects were given either a "close" or a "remote" recall cue. In the present example actress would be the close cue for sentence (1) and the remote cue for sentence (2), while doctor would be the close cue for sentence (2) and the remote cue for sentence (1). The experiment showed that what was semantically close or remote depended on the entire sentence rather than on part of it, since the cues were not sufficient to permit the recall of control sentences which included key words like beautiful and licensed. For example, the cues were not differentially effective, indeed, they were not effective at all for control sentences like (1a) and (2a).

(1a) Landscapes are often beautiful.
(2a) Taverns have to be licensed.

Ortony (1978) has suggested that a model along the lines of the present one can account for these results. Different RFEs are created for the different sentences: close recall cues integrate strongly with their respective RFEs, and distant recall cues integrate weakly with them. These integration strengths reflect the probability of a cue eliciting recall of a sentence. The task in the Anderson and Ortony (1975) experiment was simulated by first creating RFEs from input seeds corresponding to the substantive words in the to-be-learned sentences. Then the expanded cues were intersected with the input seed RFEs to find the amount of integration between them. The sequence of operations used by SAPIENS is illustrated in Figure 5.

Insert Figure 5 about here

The process begins when seeds A and B are injected into the network (Figure 5a). Intersections are then found between expanded A and expanded B. This area is called RFE1 (Figure 5b). The seeds and RFE1 are re-input to the network (Figure 5c) as explained in the basic design section. This gives rise to a second RFE (RFE2) resulting from the intersections of seeds A and B with RFE1 (Figure 5d). Any word that receives activation from at least two sources is part of this (and subsequent) RFEs because the notion of a threshold requires that a certain amount of activation is received by a node before it begins to transmit activation to other nodes. In SAPIENS, a node receiving activation from just one source will not exceed its threshold because insufficient activation will be received (Figure 5d). The intersection between the RFE2 and the expanded cue is an example of a seed-cue intersection (Figure 5e).

The activation strength sum of the seed-cue intersection is directly related to the probability of recalling the input given a cue because the activation strength sum is a measure of the overlap between the input seed RFE and the cue RFE. It is reasonable to assume that if a cue RFE completely overlaps (and restimulates) the input seed RFE, the probability of recalling the inputs will be very high. On the other hand, if a cue RFE does not overlap any portion of the input seed RFE, it is unlikely that the input will be recalled because no close associates of the inputs were reactivated. Therefore, if one activation strength sum is greater than another, it seems reasonable to assume that the former sum will reflect a greater probability of input sentence recall than the latter sum. The results are summarized in Table 2.

Insert Table 2 about here

In almost all cases the trends are in the direction found by Anderson and Ortony (1975). Not all the words from the original experiment were available in the network; consequently some minor changes were needed. None of these wording changes in any way affected the validity of the test. For example, the use of the cue sexy for the sentences about nurses was necessitated by the fact that the original cue used in Anderson and Ortony (1975), actress, was not in the network. As an aside, it should perhaps be mentioned that this forced us to revert (for the purposes of science, only) to the standards of sexism that were pervasive at the time the data base was assembled some 20 years ago.
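The sequence of operations described for Figure 5 can be sketched in a few lines of Python. This is only an illustrative sketch, not the SAPIENS implementation: the toy network, its link strengths, and the function names (`expand`, `intersections`, `seed_cue_strength`) are all invented here, and the threshold is simplified to "activation from at least two sources" rather than a threshold on the amount of activation.

```python
from collections import defaultdict

# Hypothetical toy network (node -> {associate: link strength}).
# These nodes and strengths are made up; they are not from the Associative Thesaurus.
NETWORK = {
    "nurse": {"hospital": 3, "clean": 2, "uniform": 2},
    "wash": {"clean": 4, "soap": 3, "water": 3},
    "hair": {"soap": 2, "head": 3, "water": 2},
    "clothes": {"uniform": 3, "clean": 2, "dress": 3},
    "shampoo": {"soap": 3, "head": 2, "water": 1},
    "detergent": {"clean": 3, "uniform": 1, "dress": 1},
    "clean": {"soap": 2, "water": 1},
    "soap": {"water": 2, "clean": 2},
    "water": {"soap": 1},
    "uniform": {"dress": 2, "clean": 1},
}


def expand(node, network):
    """One spread: the activation a node sends to each of its associates."""
    return dict(network.get(node, {}))


def intersections(inputs, network, min_sources=2):
    """Nodes reached from at least `min_sources` inputs, with summed activation.

    Assumption: counting sources stands in for the activation threshold.
    """
    received = defaultdict(float)
    sources = defaultdict(int)
    for node in dict.fromkeys(inputs):  # de-duplicate, keep order
        for target, strength in expand(node, network).items():
            received[target] += strength
            sources[target] += 1
    return {n: a for n, a in received.items() if sources[n] >= min_sources}


def seed_cue_strength(seeds, cue, network):
    """Strength sum of the seed-cue intersection (Figure 5, steps a to e)."""
    rfe1 = intersections(seeds, network)                     # Figure 5b
    rfe2 = intersections(list(seeds) + list(rfe1), network)  # Figure 5c, 5d
    cue_env = expand(cue, network)                           # expanded cue
    # Figure 5e: nodes shared by RFE2 and the expanded cue.
    return sum(rfe2[n] + cue_env[n] for n in rfe2 if n in cue_env)


sentence_1 = ["nurse", "wash", "hair"]
sentence_2 = ["nurse", "wash", "clothes"]
for seeds in (sentence_1, sentence_2):
    for cue in ("shampoo", "detergent"):
        print(seeds[-1], cue, seed_cue_strength(seeds, cue, NETWORK))
```

In this toy network the cue that is close to a sentence yields the larger strength sum, which is the qualitative pattern the simulation looks for. The absolute numbers mean nothing outside the example.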


It is important to emphasize that no claims are being made here about specific comprehension and memory processes. All that is being supposed is that whatever the comprehension process is, it does involve the establishment of an RFE to limit the schemas that are brought into focus for that process. It is also assumed that that RFE could be used as part of the representation of the remembered sentence (or reproduced from it), a part that apparently can be gainfully employed in the retrieval process.

Word Order Effects

Compare sentences (3a) and (3b).

(3a) The boy smashed the bus.
(3b) The bus smashed the boy.

In sentence (3a) images of vandalism and boy-related items are likely to come to mind, whereas in sentence (3b), accidents and bus-related items come to mind. Ordinarily SAPIENS would accept boy, smash, and bus as one simultaneous, parallel input string. Thus the syntactic features of subject and predicate, or actor and object, are lost. That is, boy, smash, bus and bus, smash, and boy would produce exactly the same RFE. Since sentences (3a) and (3b) actually have different meanings, it would be nice if the relevant syntactic information could somehow be preserved.

One way in which this can be done is to give a greater weight to the agent of the input sentence. The agent (subject) might, for example, be assigned an input weight of 10, and the verb and patient might be assigned weights of, say, 1. Such weights could be regarded as reflecting the salience of each case role. Figure 6 shows the effects on the RFE that such a manipulation of weights has. As one might expect, boy-related items are ranked high when the input seed boy is weighted high, and bus-related items are ranked high when the input seed bus is weighted high.

Insert Figure 6 about here

This result suggests that the salience of input words can play an important role in determining the activation levels of nodes in an RFE. The activation levels are important because the nodes with the most activation tend to be more relevant than lower activated nodes.

Needless to say, assigning weights of 10 to the agent, and 1 to the verb and the patient is rather an arbitrary way of handling the problem, yet it seems to work for all that. A more sophisticated approach would be to select the weights on the basis of some kind of optimization procedure. Since SAPIENS has no parsing ability, the user has to apply the weights to each case role. For handling natural language it would be necessary to recognize, for example, that the input was in passive form so as to permit appropriate adjustments to the weights to be made. The purpose of the present demonstration, however, is only to show that, in principle, SAPIENS has the flexibility to be sensitive to simple syntactic constraints.

Conclusion

Natural language processing, like other areas of AI, has to face the problem of how to reduce the search set of candidate representations if it hopes to utilize appropriate ones to facilitate top down processing. This has proved to be a somewhat stubborn problem in the past, and one which becomes increasingly difficult as the size of the data base increases. SAPIENS appears to be a viable general solution to this problem, because it quickly produces a small set of candidates and is able to do so in a manner that would permit it to help in a range of important natural language processing tasks. The principles behind the design of SAPIENS are essentially independent of the particular mechanisms that would be employed in a complete natural language processing system, and, in fact, there is no reason why SAPIENS could not be utilized in other AI domains as well.

The chief constraint on SAPIENS is that it has to embody a realistic representation of the associative connections between its nodes. In our implementation the onerous task of assembling these data was done independently. While spreading activation mechanisms have been proposed (and in AI, occasionally used) before in both psychology and AI, such a mechanism has not been shown to be viable or efficient when applied to a data base of any significant size. SAPIENS is an implemented system, rather than an abstract proposal, and as such, the specific details of its design become important, since it is just such details that distinguish approaches that could only work in principle from those that work in fact.

References

Anderson, J. R. Language, memory, and thought. New York: Wiley, 1976.

Anderson, R. C., & Ortony, A. On putting apples into bottles -- A problem of polysemy. Cognitive Psychology, 1975, 7, 167-180.

Bartlett, F. C. Remembering. Cambridge: Cambridge University Press, 1932.

Collins, A. M., & Loftus, E. F. A spreading activation theory of semantic processing. Psychological Review, 1975, 82, 407-428.

Collins, A. M., & Quillian, M. R. Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 1969, 8, 240-247.

Hewitt, C.

Description and theoretical analysis (using schemata) of planner: A language for proving theorems and manipulating models in a robot (AI Tech. Report No. 258). Cambridge, Mass.: M.I.T., 1972.

Kiss, G. R. Word associations and networks. Journal of Verbal Learning and Verbal Behavior, 1968, 7, 707-713.

Minsky, M. A framework for representing knowledge. In P. H. Winston (Ed.), The psychology of computer vision. New York: McGraw-Hill, 1975.

Norman, D. A., & Rumelhart, D. E. Explorations in cognition. San Francisco: Freeman, 1975.

Ortony, A. Remembering, understanding, and representation. Cognitive Science, 1978, 2, 53-69.

Piaget, J. The origins of intelligence in children. New York: International Universities Press, 1952.

Quillian, M. R. Semantic memory. In M. Minsky (Ed.), Semantic information processing. Cambridge, Mass.: M.I.T. Press, 1968.

Raphael, B. The thinking computer. San Francisco: Freeman, 1976.

Robinson, J. A. A machine-oriented logic based on the resolution principle. Journal of the Association for Computing Machinery, 1965, 12, 23-41.

Robinson, J. A. The generalized resolution principle. In D. Michie (Ed.), Machine Intelligence 3. Edinburgh: Edinburgh University Press, 1968.

Rosch, E. H. On the internal structure of perceptual and semantic categories. In T. E. Moore (Ed.), Cognitive development and the acquisition of language. New York: Academic Press, 1973.

Rumelhart, D. E., & Ortony, A. The representation of knowledge in memory. In R. C. Anderson, R. J. Spiro & W. E. Montague (Eds.), Schooling and the acquisition of knowledge. Hillsdale, N.J.: Erlbaum, 1977.

Schank, R. C. Dynamic Memory. New York: Cambridge University Press, 1982.

Schank, R. C., & Abelson, R. P. Scripts, plans, goals and understanding. Hillsdale, N.J.: Erlbaum, 1977.

Smith, E. E., Shoben, E. J., & Rips, L. J. Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review, 1974, 81, 214-241.

Waltz, D. Understanding line drawings of scenes with shadows. In P. H. Winston (Ed.), The psychology of computer vision. New York: McGraw-Hill, 1975.

Table 1

Results of Disambiguation Simulation

Context     Target   First Three Rank-Ordered    Number of      Strength   Proportion of
Word        Word     Nodes in the RFE(a)         Nodes in RFE   Sum        Total Activation

chocolate   mint     sweet, milk, taffee          3              22        .07
coin        mint     money, cash, gold           11              77        .26
river       bank     river, water, stream         9              87        .29
money       bank     money, silver, book         23             123        .41
metal       bar      iron, door, rod              3              23        .07
drink       bar      beer, drink, pint            7              83        .28
yellow      fruit    orange, green, banana        4              25        .08
red         fruit    apple, green, hot            3              31        .10
card        game     play, poker, board           3              28        .09
ball        game     play, football, tennis       3              33        .11
chess       game     board, play                  2              40        .13

(a) Relevant Forward Environment

Table 2

Results of Cued Recall Simulation

                                    Proportion of Total Activation With(b)
Target Sentence Pair(a)             Close Cue      Remote Cue

                                    shampoo        detergent
The nurse washed her hair.          .36            .18
The nurse washed her clothes.       .18            .22

                                    sexy           doctor
Nurses are often beautiful.         .23            .17
Nurses give health care.            .09            .51

                                    saw            scissors
The farmer cut the wood.            .25            .18
The farmer cut the fabric.          .08            .29

                                    climbing       hanging
The ivy covered the walls.          .20            .16
The picture covered the walls.      .11            .14

                                    hammer         fist
The man hit the nail.               .20            .11
The man hit the jaw.                .07            .19

                                    tv             radio
The man watched the show.           .11            .01
The man listened to the show.       .13            .11

(a) Words in italics represent input seeds.

(b) As presented, the close cue usually produces a higher proportion of total activation for the first member of the pair and a lesser proportion for the second. Similarly, the remote cue produces a lesser proportion for the first member and a higher proportion for the second.

Figure Captions

Figure 1. Expansion of nodes apple and fruit.

Figure 2. red and pie are intersections found in the expanded environment of apple and fruit.

Figure 3. Original seeds, in this example the nodes apple and fruit, are always weighted at 1.0. Intersections, here the nodes red and pie, are proportionally weighted at fractions which reflect the proportion of total activation on each node.

Figure 4. The end result after one time-slice is a relevant forward environment (RFE).

Figure 5. Steps involved in creating the seed-cue intersection used in the cued recall experiment simulation.

Figure 6. Result of simulation of syntactic effects.
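The proportional weighting described in the Figure 3 caption can be illustrated with a short calculation. The sketch below is an assumption about how such fractions might be computed (the function name `proportional_weights` is invented); the two activation values, 22 for red and 52 for pie, are the ones printed with the figure.

```python
def proportional_weights(activations):
    """Weight each intersection node by its share of the total activation."""
    total = sum(activations.values())
    if total == 0:
        return {node: 0.0 for node in activations}
    return {node: amount / total for node, amount in activations.items()}


# Activation on the two intersection nodes, as printed with Figure 3.
intersection_activation = {"red": 22, "pie": 52}
weights = proportional_weights(intersection_activation)

# Original seeds keep a weight of 1.0, as stated in the caption.
weights.update({"apple": 1.0, "fruit": 1.0})

for node, weight in weights.items():
    print(f"{node}: {weight:.1f}")
```

Rounded to one decimal place, red and pie come out at .3 and .7, matching the fractions 22/74 and 52/74 shown in the figure.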

[Figures 1-6 are not reproduced. Recoverable labels: link strengths 31, 21, 12, 44, 10, and 8; intersection weights red wt = 22/74 = .3 and pie wt = 52/74 = .7 (Figure 3); panels (a) to (e) (Figure 5); Figure 6 plots RFE rank (1 to 12) from higher activation to lower activation for boy-related items and bus-related items.]

Figure 6 legend:

1. This straight line shows the ranking of nodes in the RFE produced by boy smash bus.

2. This curve shows how the nodes are ranked lower for boy-related items and higher for bus-related items when the input seeds are bus smash boy.
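The weighting manipulation summarized in Figure 6 can be sketched as follows. This is a hypothetical illustration only: the toy network and the function `weighted_rfe` are invented, and it assumes that an input weight simply multiplies the activation a seed sends to its associates, which the report does not state. Nodes reached from fewer than two seeds are dropped, as a simplified stand-in for the threshold.

```python
from collections import defaultdict

# Hypothetical toy associations; not taken from the Associative Thesaurus.
TOY_NETWORK = {
    "boy": {"girl": 3, "naughty": 2, "break": 2, "school": 1},
    "smash": {"break": 3, "crash": 3, "naughty": 1},
    "bus": {"driver": 3, "crash": 2, "school": 2},
}


def weighted_rfe(weighted_seeds, network, min_sources=2):
    """Rank RFE nodes after one spread from (seed, input weight) pairs."""
    received = defaultdict(float)
    sources = defaultdict(int)
    for seed, weight in weighted_seeds:
        for node, strength in network.get(seed, {}).items():
            received[node] += weight * strength  # assumed effect of the weight
            sources[node] += 1
    rfe = {n: a for n, a in received.items() if sources[n] >= min_sources}
    # Highest activation first; ties broken alphabetically.
    return sorted(rfe, key=lambda n: (-rfe[n], n))


# (3a) The boy smashed the bus: agent boy weighted 10, verb and patient 1.
rank_3a = weighted_rfe([("boy", 10), ("smash", 1), ("bus", 1)], TOY_NETWORK)
# (3b) The bus smashed the boy: agent bus weighted 10, verb and patient 1.
rank_3b = weighted_rfe([("bus", 10), ("smash", 1), ("boy", 1)], TOY_NETWORK)

print(rank_3a)
print(rank_3b)
```

With equal weights the two word orders would give identical rankings. Weighting the agent moves that seed's associates toward the top of the RFE, which is the effect the figure reports.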