


Series Editors: Professor JOHN CAMPBELL, Department of Computer Science, University College, London, and JEAN HAYES-MICHIE, The Turing Institute, Glasgow

ARTIFICIAL INTELLIGENCE: A Tool for Industry and Management
HANS W. GOTTINGER, Institute of Management Science, University of Maastricht, The Netherlands, and HANS P. WEIMANN, IABG, West Germany
Artificial intelligence is the study of using knowledge to solve problems with computers. It offers a new perspective and a new methodology, one that has been put to use for, and tested by, emerging AI technologies. The book covers some major commercial developments (e.g. expert systems, natural language processing, speech recognition, and robotics, among others). It guides the reader through key developments, enabling him or her to make pertinent technology assessments as well as investment choices in terms of current applications, trends and future opportunities. Its focus on a technology-assessment view of AI makes the book unique among competitors.
Readership: Information technology and management; innovation economics and industrial management; computer science (informatics) and cognitive science. From first-year undergraduates through to postgraduate, research and industry.

ARTIFICIAL INTELLIGENCE A Tool for Industry and Management

ELLIS HORWOOD SERIES IN ARTIFICIAL INTELLIGENCE
Joint Series Editors: Professor JOHN CAMPBELL, Department of Computer Science, University College London, and Dr JEAN HAYES MICHIE, Research Associate, The Turing Institute, Glasgow

Anderson, J. (editor) POP-11 COMES OF AGE: The Advancement of an AI Programming Language
Andrew, A.M. CONTINUOUS HEURISTICS: The Prelinguistic Basis of Intelligence
Attar, A. KNOWLEDGE ENGINEERING*
Bergadano, F., Giordana, A. & Saitta, L. MACHINE LEARNING: A General Framework and its Applications
Blasius, K.H. and Burckert, H.-J. DEDUCTION SYSTEMS IN AI*
Bramer, M.A. (editor) COMPUTER GAME PLAYING: Theory and Practice
Campbell, J.A. (editor) IMPLEMENTATIONS OF PROLOG
Campbell, J.A. and Cuena, J. (editors) PERSPECTIVES IN ARTIFICIAL INTELLIGENCE, Vols 1 & 2
Campbell, J.A. & Cox, P. IMPLEMENTATIONS OF PROLOG, Vol. 2*
Carter, D. INTERPRETING ANAPHORS IN NATURAL LANGUAGE TEXTS
Davies, R. (editor) INTELLIGENT INFORMATION SYSTEMS: Progress and Prospects
Evans, J.B. STRUCTURES OF DISCRETE EVENT SIMULATION
Farreny, H. AI AND EXPERTISE
Forsyth, R. & Rada, R. MACHINE LEARNING: Applications in Expert Systems and Information Retrieval
Frixione, S.G., Gaglio, S. and Spinelli, G. REPRESENTING CONCEPTS IN SEMANTIC NETS
Futo, I. & Gergely, T. ARTIFICIAL INTELLIGENCE IN SIMULATION
Gabbay, D.M. PROGRAMMING IN PURE LOGIC*
Gottinger, H.W. & Weimann, H.P. ARTIFICIAL INTELLIGENCE: A Tool for Industry and Management
Hawley, R. (editor) ARTIFICIAL INTELLIGENCE PROGRAMMING ENVIRONMENTS
Hayes, J.E. & Michie, D. (editors) INTELLIGENT SYSTEMS: The Unprecedented Opportunity
Levy, D.N.L. & Beal, D.F. (editors) HEURISTIC PROGRAMMING IN ARTIFICIAL INTELLIGENCE: The First Computer Olympiad
Lopez de Mantaras, R. APPROXIMATE REASONING MODELS
Lukaszewicz, W. NON-MONOTONIC REASONING
McGraw, K. & Westphal, C. READINGS IN KNOWLEDGE ACQUISITION: Current Practices and Trends
Mellish, C. COMPUTER INTERPRETATION OF NATURAL LANGUAGE DESCRIPTIONS
Michie, D. ON MACHINE INTELLIGENCE, Second Edition
Mortimer, H. THE LOGIC OF INDUCTION
Mozetic, I. MACHINE LEARNING OF QUALITATIVE MODELS*
Obermeier, K.K. NATURAL LANGUAGE PROCESSING TECHNOLOGIES IN ARTIFICIAL INTELLIGENCE
Partridge, D. ARTIFICIAL INTELLIGENCE: Applications in the Future of Software Engineering
Ramsay, A. & Barrett, R. AI IN PRACTICE: Examples in POP-11
Saint-Dizier, P. & Szpakowicz, S. (editors) LOGIC AND LOGIC GRAMMARS FOR LANGUAGE PROCESSING
Savory, S.E. ARTIFICIAL INTELLIGENCE AND EXPERT SYSTEMS
Shanahan, M. & Southwick, R. SEARCH, INFERENCE AND DEPENDENCIES IN ARTIFICIAL INTELLIGENCE
Spacek, L. ADVANCED PROGRAMMING IN PROLOG
Sparck Jones, K. & Wilks, Y. (editors) AUTOMATIC NATURAL LANGUAGE PARSING
Steels, L. & Campbell, J.A. (editors) PROGRESS IN ARTIFICIAL INTELLIGENCE
Smith, B. & Kelleher, G. (editors) REASON MAINTENANCE SYSTEMS AND THEIR APPLICATIONS
Torrance, S. (editor) THE MIND AND THE MACHINE
Turner, R. LOGICS FOR ARTIFICIAL INTELLIGENCE
Wallace, M. COMMUNICATING WITH DATABASES IN NATURAL LANGUAGE
Wertz, H. AUTOMATIC CORRECTION AND IMPROVEMENT OF PROGRAMS
Yazdani, M. (editor) NEW HORIZONS IN EDUCATIONAL COMPUTING
Yazdani, M. & Narayanan, A. (editors) ARTIFICIAL INTELLIGENCE: Human Effects
Zeidenberg, M. NEURAL NETWORK MODELS IN ARTIFICIAL INTELLIGENCE

* In preparation


A Tool for Industry and Management
HANS W. GOTTINGER, Institute of Management Science, University of Maastricht (RU), Maastricht, The Netherlands

H. PETER WEIMANN, IABG, Munich, W. Germany







First published in 1990 by

ELLIS HORWOOD LIMITED
Market Cross House, Cooper Street, Chichester, West Sussex, PO19 1EB, England
A division of Simon & Schuster International Group
© Ellis Horwood Limited, 1990

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission, in writing, of the publisher.
Typeset in Times by Ellis Horwood Limited
Printed and bound in Great Britain by Hartnolls, Bodmin, Cornwall

British Library Cataloguing in Publication Data Gottinger, Hans W. (Hans Werner) Artificial intelligence: a tool for industry and management. 1. Manufacture. Applications of artificial intelligence I. Title. II. Weimann, H. Peter 670.427 ISBN 0-13-048372-9

Library of Congress Cataloging-in-Publication Data Gottinger, Hans Werner and Weimann, H. Peter Artificial intelligence: a tool for industry and management/ Hans W. Gottinger and H. Peter Weimann. p. cm. — (Ellis Horwood series in artificial intelligence) ISBN 0-13-048372-9 1. Artificial intelligence. I. Title. II. Series. Q335.G67 1990 006.3-dc20 90-35577 CIP

Table of contents

Preface .................................................................. 7
1 An overview of Artificial Intelligence .................................. 9
2 The technology of AI ................................................... 13
3 Expert systems — commercial and industrial significance ................ 22
4 Expert systems — knowledge engineering ................................. 32
5 Natural language processing ............................................ 63
6 AI programming languages ............................................... 73
7 Expert and database systems ............................................ 88
8 AI techniques applied to business management .......................... 102
9 Industrial applications of AI ......................................... 113
10 Speech recognition ................................................... 117
11 AI and robotics ...................................................... 127
12 Automatic programming ................................................ 132
13 Intelligent decision support systems ................................. 137
Epilogue: perspectives for intelligent systems .......................... 154
Index ................................................................... 157

Preface

Interest in artificial intelligence (AI) continues to grow rapidly. Until recently, popular views of the field developed largely under the influence of fictional literature, in which the artificially intelligent computer is usually portrayed as naive but sinister (HAL in 2001, or the game-playing computer in War Games). In the past year, factual reporting in the popular press has improved the general availability of facts about AI. The image now being generated portrays AI as a powerful new tool. Like other tools discovered by man, AI allows a human to easily control and direct power sources in the accomplishment of a task, by providing cognitive amplification or augmentation. AI holds promise for (1) relieving humans from the drudgery of interfacing with computers, in much the same manner that robots have relieved them from handling assembly-line processes, and (2) enhancing the computer's ability to assist in problem analysis.
The goal of the AI field is to develop computational approaches to intelligent behaviour. Of particular importance is getting machines to solve problems or carry out tasks by enabling them to purposefully manipulate symbols, recognize and appropriately respond to patterns (often accomplished by accessing a knowledge base) and/or adapt in a manner similar to human beings. Additionally, we contend that AI includes the productive merging of symbol manipulation and pattern recognition by the computer with the computer's ability to carry out extensive computational algorithms quickly and accurately.
Knowledge-based (expert) systems form a major part of the book. This is the area of AI with the largest potential commercial impact. Knowledge-based systems are defined to be a class of computer programs intended to serve as consultants for decision-making. These programs use a collection of facts, rules of thumb, and other knowledge about a limited field to help make inferences in that field.
They differ substantially from conventional computer programs in that their goals may have no algorithmic solution, and they must make inferences based on incomplete or uncertain information. They are called expert systems because they address problems normally thought to require human specialists for solution, and knowledge-based because researchers have found that amassing a large amount of knowledge, rather than sophisticated reasoning techniques, is responsible for the success of the approach.
Languages and tools represent the bases for the development of expert systems. Their embedded methods and symbolic processing capabilities distinguish them from conventional languages. On the one hand, methods and languages not only influence the development of knowledge-based systems (so-called knowledge



engineering); on the other hand, they have a growing impact on conventional software development and on conventional programming languages.
This book covers some major commercial developments (e.g. expert systems, natural language processing, speech recognition, robotics). It guides the reader through key developments, enabling him or her to make pertinent technology assessments as well as investment choices in terms of current applications, trends and future opportunities. The integration of expert systems and commercial applications is an area of special interest today and in the near future, and it has an impact on established technologies such as database systems. Therefore a chapter of this book has been devoted to this theme.
A large part of the material in the book was repeatedly taught by the first author in first-year graduate courses on 'Introduction to Artificial Intelligence' and 'Intelligent Decision Systems' at the Department of Systems Engineering, University of Virginia (1985-87). We are very grateful to Ellis Horwood Ltd for including the book in their AI series, and in particular for showing much patience with our need to stretch deadlines for completing the manuscript.

Maastricht and Munich
March 1990

H. W. Gottinger H. P. Weimann

An overview of Artificial Intelligence

Artificial Intelligence (AI) has emerged as one of the most significant technologies of this century. From its beginnings in university computer laboratories almost 30 years ago, the AI field has matured to commercialization within the past few years. Artificial Intelligence is the subfield of Computer Science that is concerned with symbolic reasoning and problem solving, by manipulation of 'knowledge' rather than mere data.
Artificial Intelligence is created by writing a computer program having certain unique characteristics. The classical 'non-intelligent' computer program is a fairly rigid, structured procedure for dealing with a specific type of problem. The program may be very flexible and may be capable of dealing with very complex situations, but it cannot solve any problem that the programmer did not foresee when he wrote the program. Everything the program does is predictable or preordained.
A program that is designed to exhibit intelligence, on the other hand, is expected to do things that have not been explicitly programmed. In essence, an intelligent program consists of a complex set of rules on how to process data; in addition, it has a certain amount of information (a data base). Finally, the program is given a goal to reach, or is asked to perform a given task. It is not told specifically how to proceed, but only to do so. The program then uses the rules to process the available data so as to reach the assigned goal. A key characteristic of AI programs is heuristics, rules of thumb, which guide the program.
Gevarter (1985) distinguishes the characteristics of AI programs from conventional computer programs. Characteristics of conventional programs are:

• Often primarily numeric
• Algorithmic (solution steps explicit)
• Integrated information and control
• Difficult modification
• Correct answer required
• Best possible solution usually sought

In contrast, AI programs have the following characteristics:

• Primarily symbolic processes
• Heuristic search (solution steps implicit)
• Control structure usually separate from knowledge domain
• Usually easy to modify, update, and enlarge
• Some incorrect answers often tolerable
• Satisfactory answers usually acceptable

[Ch. 1



Insights into how the human mind works are coming directly from our efforts to give computers thought, according to Professor Marvin Minsky of the MIT Artificial Intelligence Laboratory. Allegations that the computer 'can never be creative, intuitive, emotional, and will never really think, believe, or understand anything' are false, he wrote in a 1983 issue of the Technology Review, published by MIT. Computers some day will imitate the processes that go on in human minds, Minsky says. But making intelligent machines requires better theories of how to represent the human mind in the inner workings of computers. Making computers learn from their own experience is most important.



Within the past five years, AI has received wide public attention. The event upon which much of this attention has been focused is the competition between the United States and Japan to develop a ‘fifth generation computer’ incorporating AI. AI has been chosen as the theme for the 1985 World’s Fair in Tsukuba, Japan.



The field of artificial intelligence incorporates a wide range of technologies, including:

• Expert systems
• Natural language
• Speech processing
• Vision
• Robotics
• Cognitive modeling
• Knowledge representation and utilization
• Problem solving and inference
• Learning (knowledge acquisition)
• Computer hardware.

Applications are far-reaching, including new approaches for business, industry, education, science, mineral exploration, and the military. The following sections provide a brief overview of some of the various areas of AI technology. Subsequent chapters provide a more in-depth assessment of each technology and area of application.


An Expert System is an intelligent computer program that uses knowledge and inference procedures to solve problems that are difficult enough to require human

Sec. 1.5]



expertise for their solution. A human expert usually collaborates to help develop the knowledge base.
Some examples of expert systems and their applications are:

• XCON (also called R1) is an expert system developed by Carnegie Mellon University for Digital Equipment Corporation to configure VAX computer systems. DEC reports savings of $20 million a year through use of this program.
• EURISKO is a program developed by Douglas Lenat of MCC, Austin, TX, which discovers, learns, and applies heuristics by simulating experiments in fields such as mathematics, computer chip design, and biological evolution. The program has the ability to learn from its experience, and can apply heuristics that work in one domain to entirely different ones. In 1981, EURISKO was entered in the competition for playing the futuristic war game Traveller. By applying heuristics it had learned from its experience in biological evolution, EURISKO won the championship.
• MOLGEN, a program for planning molecular genetic experiments, is in regular commercial use.
• Expert systems in use in hospitals are diagnosing cases with such accuracy that doctors can apply the results 85% of the time.
• General Electric is using the program CATS-1/DELTA for the maintenance of diesel locomotives.
• BACON.5 is an expert system developed at Carnegie-Mellon which has been applied to discover some of the basic laws of nature. So far, BACON.5 has 'rediscovered' the ideal gas law, Snell's law of refraction, the conservation of momentum, the specific heat law of Black, Joule's law for the conservation of energy, and Ohm's law relating electrical current, voltage, and resistance.



Artificial vision (also called machine vision and computer vision) shares with expert systems the role of being one of the more popular topics in AI today. Commercial vision systems have already begun to be used in manufacturing and robotic systems for tasks such as inspection, recognition, and guidance. Other applications are at various stages of development and are beginning to be used in military, cartographic, and image-interpretation tasks, and in remote control. The key to artificial vision system operation is the computer program for analyzing the digitized image from a camera. The computer algorithms in today's commercial systems are based on previous work by AI researchers.



Another area of AI commercialization is Computer Aided Instruction (CAI). Programs under development enable students to ask questions of the computer and receive instruction with the insight of a human expert (Sleeman & Brown 1982).





One of the major limitations to wider usage of computers is the inability of non-programmers to interface with the computer. A solution developed by the AI community is beginning to ease the computer illiterate's difficulty. Natural language understanding programs are being developed to serve as 'front ends' for data bases and other programs, in order that they may be accessed by simple English language commands. A few such natural language programs are now commercially available, and several hundred systems have been installed successfully, at least as prototype systems.




Speech recognition is an area of AI which has received considerable research emphasis, and several commercial systems are now available. Voice data entry systems are now in common use for applications such as manual sorting and NC programming. Indications are that this technology will develop to the extent that new products such as speech recognition typewriters may become commercially available within the next five to ten years.


Davis, R. & Lenat, D. B. (1982) Knowledge-based systems in Artificial Intelligence, McGraw-Hill: New York.
Gevarter, W. B. (1985) Intelligent machines: an introductory perspective of Artificial Intelligence and robotics, Prentice-Hall: Englewood Cliffs, NJ.
Lenat, D., Cohen, P. R. & Feigenbaum, E. (eds) (1982) The handbook of Artificial Intelligence, Vols 1-3, W. Kaufmann: Los Altos, CA.
Nilsson, N. J. (1971) Problem solving methods in Artificial Intelligence, McGraw-Hill: New York, San Francisco.
Sleeman, D. & Brown, J. S. (1982) Intelligent Tutoring Systems, Academic Press: London.

The technology of AI

In the introduction to his keynote speech at the Sixth International Joint Conference on AI, Tokyo, 20-23 August 1979, Herbert Simon of Carnegie-Mellon University said: 'AI research is empirical and pragmatic, typically working with examples rather than theorems, and exemplifying the heuristic of learning by doing. In its essential reliance on weak methods and experiment instead of proof, it is adapted to the exploration of poorly structured task domains, showing considerable contrast in this respect to operations research or numerical analysis, which thrive best in domains possessing strong formal structure.'
In his book Principles of AI, Nils Nilsson (1980) says: 'Most AI systems display a more or less rigid separation between the standard computational components of data, operations and control... At the top level, the complete AI system can consist of several data-base/operations/control modules interacting in a complex fashion.'
AI research has been concerned with many different ideas in the search for programs that act intelligently. The search for machine intelligence has not been guided by any consistent body of theory, but has consisted of a search for anything that will work. This type of search has itself been widely used in AI programs and is known as heuristic search. The result is that many loosely coupled ideas have been brought to bear on the subject. Some of these basic ideas follow:



We say that an organism shows intelligence if it can adjust to its environment, and particularly if it can adapt to a changing environment. The more versatile it is in its adjustments, the more intelligent it is. But what is it that allows an organism to adapt in this manner? In his book The Micro Millennium, Christopher Evans (1980) states that intelligence consists of six major factors:
• The first factor is sensation, or the ability to capture data. We see. We hear.
• The second factor is data storage. That is, information obtained during life can be used in addition to information one is born with (instinct).
• Processing speed is the third factor. The faster one can process data and manipulate information, the greater the intelligence.
• The fourth factor is software modification speed. In other words, how fast can we learn something, then unlearn it when new data are captured?



[Ch. 2

• The fifth factor is software efficiency: how good the procedures are that one applies.
• The last factor is software range. The greater the range of things and the greater the complexity of tasks one can cope with and do, the greater the intelligence.
Simple one-celled animals can respond to light, heat, and available food, and that is about it. More complex animals generally have all six factors to some degree, while humans obviously have them all to a high level. Where do computers fit in this scheme of things? Computers to date have not been very good at capturing data, and many are dependent on input from a keyboard. But when you connect one computer to another, the data capture rate can be very high. Computers can store a large amount of data, but not nearly as much as humans. The processing speed of the computer is very high, but it may not be as efficient as the processing that humans do. Software flexibility: computer programs can be changed today, but usually not very quickly or very efficiently. Humans also tend to be creatures of habit. Software efficiency: human experts in certain domains have very efficient software. Expert computer programs have powerful software which may or may not be as efficient as humans', but the computer may in fact know more than a human. Software range: computers seem able to work at most of the intellectual tasks that humans have developed, but they have not been able to do such physical things as play baseball (with real balls and bats) or read most handwritten messages. However, work in robotics is slowly improving the physical capabilities of machines.
If the above six factors are accepted as an adequate manifestation of intelligence, then computer programs that can modify themselves satisfy all the requirements. This, of course, does not mean that computers act like or can act like humans. Many factors enter into the concept of being human, and intelligence is only one.
Intelligence: a classic, though not rigorous, test of the intelligence of a system was formulated by the British computer scientist Alan Turing.
Turing's Test requires two people, a computer, a wall, and a communication device by which both a human and a machine can communicate on an equal basis. One person, the interrogator, is positioned on one side of the wall, and the computer and the other person are positioned on the other side. The interrogator communicates with the computer and the other person by means of the communication device, asking them both questions and trying to determine which is the machine. If the computer fools the interrogator, Turing said, we can conclude that the machine is intelligent. No computer has passed Turing's Test in any but highly restricted formats.



The problem of finding a route from an initial state or combination of elements to a desired state involves a search and examination of intermediate states. Any specific intermediate state may be closer to the desired state than all the others; it may be no closer or further away than any other; it may be on a path that can or must terminate in the desired state; it may be on a path that will reach the desired state by an indirect or inefficient route rather than by a best route, etc.

Sec. 2.2]



As an example, you arrive by air in a strange city and are deposited in a hotel somewhere in the city. You are to meet someone at the corner of 7th Street and Columbus Avenue. In your initial state, as you emerge from the hotel, you do not know what street you are on, which way 7th Street is, or which way Columbus Avenue is. How are you going to find a way from your initial state of almost total ignorance (you at least know, or assume, you are in the right city) to the desired or final state of being at 7th and Columbus? Assuming you do not have a map or see anybody around you who can give directions, you will have to start exploring the city: make a search. As a first step you go to the nearest corner and discover you are at the intersection of Haverhill and Lotus. From Haverhill/Lotus you can observe four new states (street intersections). None of the four states is the desired state of 7th and Columbus. You have to go through at least one of the four available states to reach the goal, but which one should you choose? You must develop a search strategy.
The simplest strategy is an exhaustive search of every street intersection in the city. If you follow this strategy you are certain to find 7th and Columbus, at which point the search stops. AI researchers have developed two major ways to conduct an exhaustive search; one is known as depth first and the other as breadth first. In a depth-first search for 7th and Columbus, you might decide to proceed from Haverhill/Lotus to Haverhill/Pine, etc., exploring Haverhill in depth to see if it ends or turns somehow into 7th/Columbus. In a breadth-first search you might decide to go to the four nearest intersections and see if one of them is 7th/Columbus.
In certain types of simple problems, exhaustive search is practical and has the advantage that it is simple to implement. In complex problems, exhaustive search is often impossible because it results in what is known as a combinatorial explosion. In our search for 7th St. and Columbus, suppose you did not know what city you were in but wanted to find 7th and Columbus in New York City. If you happened to be in Chicago when you began an exhaustive search, you would start by examining every intersection in the city to see if it might possibly be 7th and Columbus, New York, not a very efficient search strategy, obviously.
Some of the most startling examples of combinatorial explosions occur in games such as chess and checkers. If, at the opening move of a checkers game, you attempted to examine every possible move you and your opponent could make until the game was finally won or lost, based on your first move, you would have to examine approximately 10^40 configurations. If the examinations could be performed at the rate of 3 thousand million per second, the process would take about 10^21 centuries. Chess is even more complex, with about 10^120 possible configurations.
No computer program, whether for AI or another purpose, can afford to get trapped in a combinatorial explosion. Thus a number of search strategies have been developed to overcome this problem (when possible). One of the most powerful of the search strategies is the use of heuristics. Heuristics is the application of what might be called common-sense or domain-specific knowledge of the problem at hand. In our search for 7th/Columbus/New York, we would demand to be in New York before searching further. A second heuristic would be that once an intersection of either 7th or Columbus was found, that street should then be explored with a depth-first strategy. Another heuristic is that numbered streets usually run in sequence, one number at a time: if we find Second Street, we ought to be able to find 7th St. fairly easily. Finally, a map of the city would be domain-specific knowledge that would be extremely




powerful in solving the problem. (The fact that no intersection of 7th St. and Columbus occurs in New York need not disrupt the program; the search would simply end in failure.)
Another important search technique involves the calculation and assignment of numerical values to each state that is examined. These numerical values attempt to rate the state as being likely or unlikely to lead to a solution. In this way, the most likely states are examined first. Other search techniques include:
(1) The Alpha-Beta procedure. This technique calculates upper and lower numerical bounds on each state (or node) and can reduce search by orders of magnitude.
(2) Hill climbing. In this procedure, search proceeds in whatever direction seems to be most promising. Sometimes the hill turns out to be only a local perturbation, and the strategy then fails.
(3) Branch and Bound. This technique finds the shortest path to a goal state; it works by assigning a cost to each partial path.
(4) Best first. This generally finds a better path to a goal than breadth-first or depth-first search. (Note that only one path is found in breadth-first or depth-first search, because once a path is found, the search stops. This path may or may not be optimum.)
(5) AND/OR Trees. Many problems can be solved most easily by breaking them up into smaller problems. (Before we start looking for 7th and Columbus, let us look for New York first. Our solution to the problem is New York AND 7th AND Columbus. If we just wanted 7th AND Columbus, we should be willing to accept it in New York OR Chicago OR Detroit, etc.)
(6) Minimaxing. This technique is used in games. One evaluates a possible move in terms of the maximum advantage it will confer compared to the minimum advantage. Choosing the move with minimum advantage guarantees some forward progress. Opting for maximum advantage usually means the opponent will force you to a position worse than that associated with the minimum.
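The two exhaustive search strategies, breadth first and depth first, can be sketched in a few lines of code. The sketch below is illustrative only: the street map, intersection names, and the function name are invented for the example, echoing the 7th-and-Columbus search.

```python
from collections import deque

# Hypothetical street map: each intersection lists its neighbours.
CITY = {
    "Haverhill/Lotus": ["Haverhill/Pine", "Elm/Lotus"],
    "Haverhill/Pine": ["7th/Pine"],
    "Elm/Lotus": ["Elm/Columbus"],
    "Elm/Columbus": ["7th/Columbus"],
    "7th/Pine": ["7th/Columbus"],
    "7th/Columbus": [],
}

def search(start, goal, depth_first=False):
    """Exhaustive search; a stack gives depth first, a queue breadth first."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        # pop() takes the newest path (stack), popleft() the oldest (queue)
        path = frontier.pop() if depth_first else frontier.popleft()
        if path[-1] == goal:
            return path                       # goal state reached: stop
        for nxt in CITY[path[-1]]:
            if nxt not in visited:            # never revisit an intersection
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                               # exhaustive search failed

print(search("Haverhill/Lotus", "7th/Columbus"))
print(search("Haverhill/Lotus", "7th/Columbus", depth_first=True))
```

Breadth first returns a shortest route; depth first follows one street as far as it can before backing up, and may return a longer route. The only difference between the two strategies is which end of the frontier is expanded next.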
Backtracking
When a program moves from one state to another, it may find that the new state takes it farther away from its target goal state than it was before it made the move. In this case the program may decide to return to a previous state. The process of going back to a state that has already been tried is called backtracking. If a program gets involved in too much backtracking, it spends too much effort in useless explorations, and program efficiency decreases. As a result, most programs try to reduce backtracking as much as possible, or even to eliminate it entirely whenever it is possible to do so.
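Backtracking is easiest to see in a recursive depth-first search. The following sketch is illustrative only: the grid of open cells and the function name are invented for the example. Whenever a move leads to a dead end, the last step is undone and the search resumes from the previous state.

```python
# Hypothetical maze: the set of open (x, y) cells the searcher may stand on.
FREE = {(0, 0), (1, 0), (0, 1), (0, 2), (1, 2), (2, 2)}

def find_path(pos, goal, path):
    """Depth-first search with explicit backtracking; path is built in place."""
    if pos == goal:
        return True
    x, y = pos
    for nxt in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
        if nxt in FREE and nxt not in path:
            path.append(nxt)                  # try a move to a new state
            if find_path(nxt, goal, path):
                return True
            path.pop()                        # dead end: backtrack one state
    return False

path = [(0, 0)]
find_path((0, 0), (2, 2), path)
print(path)
```

With this grid the search first tries the dead-end cell (1, 0), undoes that move, and then finds the route along the left edge; the abandoned branch leaves no trace in the final path.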


Forward and backward chaining
Typically, an AI program will have an initial state and a desired goal state. To get from the initial state to the goal state normally involves passage through a long chain of intermediate steps or states. When the program works from the initial state toward the goal state, the process is called forward chaining. Forward chaining is a good technique to use when all or most paths from any one of many initial or intermediate

Sec. 2.3]



states converge on one or a few goal states. An alternative to forward chaining is backward chaining. In this case, the program begins to look for a path through the problem by starting at a goal state and seeing how it can be modified to bring it closer to an initial state. Backward chaining is an efficient technique to use when any of many goal states will satisfy the requirements of the problem while the initial states are few; the situation in this case presents many goal states converging on one or a few initial states.
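Backward chaining can be sketched in a few lines: starting from the goal, the program looks for rules that would establish it and recursively tries to establish their premises, bottoming out at the initial facts. The rule base below is hypothetical, and the sketch assumes the rules contain no cycles.

```python
def backward_chain(goal, rules, facts):
    """Work from `goal` back toward the known `facts`.

    `rules` maps a conclusion to alternative lists of premises that
    establish it. Returns True if some chain reaches the facts.
    Assumes an acyclic rule base (no guard against infinite recursion).
    """
    if goal in facts:                    # reached an initial state
        return True
    for premises in rules.get(goal, []):
        if all(backward_chain(p, rules, facts) for p in premises):
            return True
    return False                         # no chain back to the facts

# Hypothetical rule base: 'mammal' follows from 'has_fur' OR 'gives_milk'.
rules = {
    "mammal": [["has_fur"], ["gives_milk"]],
    "carnivore": [["mammal", "eats_meat"]],
}
facts = {"gives_milk", "eats_meat"}
print(backward_chain("carnivore", rules, facts))  # True
```

Note that the search never touches rules irrelevant to the goal, which is exactly why backward chaining pays off when goal states are many and initial states are few.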



The way a human stores and manipulates knowledge is not understood, but it is certainly a complex process. One need only mention a single word, such as ‘horse’, and all types of references spring quickly to mind: horses running, a race track, Central Park, John Wayne, foxes, red coats, hooves clopping, etc. How can such a large group of associations, having only tenuous and arbitrary relationships to each other, be conveyed to a computer? Other knowledge representation problems include such things as scene analysis and language comprehension. When a human looks at an automobile, he knows immediately what it is, no matter what make it is or its colour, whether it is running or up on blocks, going backward or wrecked. In vision by computer, for example, all this information has to be conveyed in terms of lines that may be straight, curved, short, long, etc. This jumble of data the human analyzes more or less instantaneously, recognizing a car (as well as separating it from its background); if a computer is to see and understand what it sees, it must be able to perform the same or an equivalent analysis. In natural language conversations, humans easily pick out the subjects of the discussion and the actions involved, and are able to associate a multitude of pronouns with their correct referents (most of the time). How can a computer perform the same tasks? The problem of how to tell computers complex things has been one of the central problems of AI. Winston, in his book, writes: ‘A representation has been defined to be a set of conventions for describing things. Experience has shown that designing a good representation is often the key to turning hard problems into simple ones, and it is therefore reasonable to work hard on establishing what symbols a representation is to use and how those symbols are to be arranged to produce descriptions of particular things.’

Frames

One of the key ideas for knowledge representation is the use of frames, where a frame is a collection of facts and data about some thing or some concept. The drawing (Fig. 2.1) shows a room and some of the frames that might be used to represent the room in a manner that can be conveyed to a computer. The entry level for this situation is called the Room Frame. The collection of data about this frame indicates that there is more than one type of room, but here we are considering a living room. Included in the frame are five things always found in a room: ceiling, left wall, right wall, floor, and — not included in the picture — a back wall. The five parts of the room frame act as a ‘connection’ to another frame called a ‘Wall Frame’. The ‘Wall Frame’ divides



[Ch. 2

Fig. 2.1 — Drawing of a room, showing some of the frames that might be used to represent the room in a manner that can be conveyed to a computer.

the wall into three sections (in this representation). Each section in turn leads to other frames: a picture frame, a window frame, and a door frame. These in turn can call up other frames. Frames can be constructed to represent many different types of ideas. Winston uses as examples ‘Group Frame’, ‘Relationships Frame’, ‘Object Frame’, ‘Word Group Frame’, ‘Movement Sentence Frame’, ‘Action Frame’, and ‘State Change Frame’, as well as those mentioned previously. The number of frames that could be constructed is obviously very large. The ways in which one frame relates to other frames are also many and complex.
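A minimal sketch of this 'frames connected to frames' idea: each frame is a dictionary of slots, and a slot value may name another frame, just as the Room Frame connects to a Wall Frame in Fig. 2.1. The slot names and contents here are illustrative only.

```python
# Each frame is a dict of slots; a slot may hold a plain value or the
# name of another frame (hypothetical names, loosely following Fig. 2.1).
frames = {
    "room": {
        "type": "living room",
        "ceiling": "ceiling-frame",
        "floor": "floor-frame",
        "left-wall": "wall-frame",
        "right-wall": "wall-frame",
        "back-wall": "wall-frame",
    },
    "wall-frame": {
        "sections": ["picture-frame", "window-frame", "door-frame"],
    },
    "window-frame": {"parts": ["glass", "sash", "sill"]},
}

def expand(name, depth=0):
    """Follow slot values that name other frames, printing the hierarchy."""
    frame = frames.get(name)
    if frame is None:
        return
    for slot, value in frame.items():
        print("  " * depth + f"{slot}: {value}")
        for v in (value if isinstance(value, list) else [value]):
            if isinstance(v, str) and v in frames:
                expand(v, depth + 1)

expand("room")
```

Calling up one frame from another is just a dictionary lookup here; a real frame system would add defaults, inheritance, and attached procedures.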




Predicate calculus
The Predicate Calculus is not concerned with mathematics but with formal logic. The Predicate Calculus and Boolean Algebra use many of the same concepts and relationships, such as AND, OR, negation, truth tables, and de Morgan’s Laws. The Predicate Calculus has been found particularly useful in AI because it allows an ordinary English sentence to be recast into a formal representation that can be handled by a computer. In addition, because the representation is correct and logical, it can readily be manipulated and compared with other information. As an example, the sentence ‘John ate breakfast’ can be recast into Eat(John, breakfast). All sentences with a similar or analogous meaning (‘Tom painted the car’) can be recast in the same mould. Much more complicated material can be handled, and Nilsson (1980) proposes that AI students express the following statements in the Predicate Calculus:

‘A computer system is intelligent if it can perform a task which, if performed by a human, requires intelligence.’ ‘If a program cannot be told a fact, then it cannot learn the fact.’ The Predicate Calculus (and its extensions and elaborations) is used extensively in AI (Genesereth & Nilsson 1986).

Semantic networks

Certain types of relationships can best be made clear by a graphical presentation. While the Predicate Calculus can be used to express the same relationships, the connections and interactions between the various elements are often not clear when a long string of logic expressions follows one after another. Just as mathematical formulas can often be clarified by being presented graphically, so can complex ideas. Similarly, computer programs are often difficult or impossible to understand when written as lines of code, even in a high-level language, but can be understood when presented as a flow chart.

Production systems

Many AI programs are structured into what is called a production system. A production system has a database, a set of production rules, and a control system. These three major elements are more or less independent of each other, which gives a modular program. As a result, changes can be made to any one of the three parts without affecting the other parts, either at all or only in a minor way. One consequence is that these programs can evolve into new programs. The rules of a production system have preconditions that must be satisfied before a rule can be applied. If the preconditions of a rule are satisfied by the database, the rule is applied and causes a change in the database. The control system determines which rules have their preconditions satisfied and selects the one to be applied. A production system takes the form:

IF (precondition A is true AND precondition B is true AND precondition C is false)
THEN (add 1 to the database)
ELSE (try the next possible rule).
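The database/rules/control split can be sketched as a toy production system; the particular preconditions and actions below are invented for illustration, and the control strategy is simply 'fire the first satisfied rule, then re-scan'.

```python
# Working-memory database, production rules, and a control loop.
database = {"a": True, "b": True, "c": False, "count": 0}

rules = [
    # (precondition, action): preconditions test the database,
    # actions change it, as in the IF/THEN schema above.
    (lambda db: db["a"] and db["b"] and not db["c"],
     lambda db: db.update(count=db["count"] + 1, c=True)),
    (lambda db: db["c"] and db["count"] < 3,
     lambda db: db.update(count=db["count"] + 1)),
]

def run(db, rules):
    """The control system: apply the first satisfied rule until none fires."""
    while True:
        for precondition, action in rules:
            if precondition(db):
                action(db)
                break            # a rule fired: re-scan from the top
        else:
            return db            # no rule fires: halt

print(run(database, rules)["count"])  # 3
```

Because the rules never reference each other directly, any of the three parts can be changed (a new rule added, the control strategy swapped) without touching the others — the modularity the text describes.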




Several different types of production systems have been developed. In a commutative production system, applicable rules can be applied in any order without affecting the result (2 added to 3 equals 5; 3 added to 2 equals 5). Decomposable production systems allow the database to be split into several parts. Rules can then be applied to each part in turn. The procedure is equivalent to breaking a large problem into a number of smaller problems, then solving the smaller problems one at a time until an overall solution is obtained.



In searching for a solution to a problem, blind alleys, useless paths, and combinatorial explosions are all to be avoided if possible. One way to reduce the amount of search needed in solving a problem is to find characteristics that can be used to increase the efficiency of the search. The use of constraints has been found useful in all types of AI programs. During scene analysis in the world of toy blocks, for example, straight lines may meet to form an L-shape, an arrow, a T, an X, and several other figures. A line may represent an outer edge of a block, or a boundary between a block and the background, or an interior edge, etc. Conventions have been developed for labelling these lines and figures to indicate their nature and how they are to be handled by the program. A single L, for example, could theoretically have as many as 2500 different labels; when only the physically possible labellings are considered, the number drops to 3.5% of the total. A certain type of figure, the ‘peak’, has approximately 6 250 000 theoretically possible labels, but the actual number of physically possible labels is only 10, or 0.0016% of the total. The use of constraints, in the form of physically possible labels, has in this situation reduced an incredibly complicated data processing problem to one much more manageable. In sentence parsing, similar constraints about word position, about which words are nouns, verbs, and prepositions, and about the kinds of things nouns do and have done to them by other words, make parsing by machine possible.
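The pruning effect can be shown in miniature: enumerate every candidate labelling of a junction, then keep only those a physical rule allows. The labels and the rule below are made up — the point is the reduction, not blocks-world accuracy.

```python
from itertools import product

# Hypothetical labels for the two edges of an L-junction.
LABELS = ["convex", "concave", "boundary-cw", "boundary-ccw"]

def physically_possible(e1, e2):
    # Toy constraint: a boundary edge may only pair with a boundary edge.
    return e1.startswith("boundary") == e2.startswith("boundary")

candidates = list(product(LABELS, repeat=2))          # all combinations
legal = [c for c in candidates if physically_possible(*c)]
print(len(legal), "of", len(candidates),
      f"({100 * len(legal) / len(candidates):.0f}%)")  # 8 of 16 (50%)
```

With the richer label sets of real line-labelling work the same filtering step cuts the candidates by the orders of magnitude quoted in the text.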



If computers are to become highly intelligent, they will have to be able to learn things by themselves. Computers can learn things in several ways: by being programmed by a human (or even another computer), by being given new information, by examining a number of examples and extracting common features, and by discovering new relationships for themselves. One of the key ideas in computer learning is the concept of the near miss. In teaching a computer how to build structures in the blocks world, for example, the computer is shown an arch consisting of two supporting blocks bridged by a third block. When the computer is then asked to build an arch, it turns out that the only thing it learned was that three blocks were involved. To correct this near miss, the computer is told that the arch concept also contains the requirement that two of the blocks support the third. When tested, the computer uses this information but fails to

Ch. 2]



provide a space between the two supporting blocks. However, this is a much closer near miss than the first attempt. When informed that the two lower blocks must not be touching, the computer has learned all that is needed to build arches in the blocks world. The representation of the above relationships is also often done graphically, with various labelled arrows, nodes, and notes; such graphs become complicated in complex systems. A program (BACON.5) that can learn new relationships in the world of physics is examined later.
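The arch lesson above can be sketched as successive refinement of a concept model: a set of relations the arch must have and a set it must not. The relation names are illustrative, not Winston's exact vocabulary.

```python
# The concept starts with what the first example taught: three blocks.
arch = {"must": {"three blocks"}, "must_not": set()}

def refine(model, add_must=None, add_must_not=None):
    """Each near miss contributes one new requirement or prohibition."""
    if add_must:
        model["must"].add(add_must)
    if add_must_not:
        model["must_not"].add(add_must_not)
    return model

refine(arch, add_must="two blocks support the third")     # first near miss
refine(arch, add_must_not="supporting blocks touch")      # second near miss

def matches(model, relations):
    """An example is an arch if it has every required relation and no forbidden one."""
    return model["must"] <= relations and not (model["must_not"] & relations)

print(matches(arch, {"three blocks", "two blocks support the third"}))  # True
```

Each near miss differs from a true arch in exactly one relation, which is what lets a single example tighten the model by a single requirement or prohibition.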


Evans, C. (1980) The micro millennium, The Viking Press.
Genesereth, M. R. & Nilsson, N. J. (1986) Logical foundations of Artificial Intelligence, M. Kaufmann: Palo Alto.
Gevarter, W. B. (1985) Intelligent machines: an introductory perspective of Artificial Intelligence and robotics, Prentice Hall: Englewood Cliffs.
Michie, D. (ed.) (1982) Introductory readings in expert systems, Gordon & Breach.
Michie, D. & Hayes, J. E. (eds) (1983) Intelligent systems, E. Horwood: Chichester.
Nilsson, N. (1980) Principles of Artificial Intelligence, Tioga Publishing Company.
Simon, H. A. (1979) ‘Artificial Intelligence research strategies in the light of AI models of scientific discovery’, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, Tokyo, August 20–23, 1979.
Winston, P. H. (1981) Artificial Intelligence, Addison-Wesley: Reading, Mass.

3
Expert systems — commercial and industrial significance

Expert systems are currently the most emphasized area in the field of Artificial Intelligence. In Chapters 4 and 8 through 13 of this book, which are directed toward AI applications, most of the computer programs discussed fall within the classification of ‘expert systems’ or ‘knowledge-based systems’.

3.1


According to Feigenbaum (1982) we have the following definition of an expert system: An “expert system” is an intelligent computer program that uses knowledge and inference procedures to solve problems that are difficult enough to require significant human expertise for their solution. The knowledge necessary to perform at such a level, plus the inference procedures used, can be thought of as a model of the expertise of the best practitioners of the field. The knowledge of an expert system consists of facts and heuristics. The “facts” constitute a body of information that is widely shared, publicly available, and generally agreed upon by experts in a field. The performance level of an expert system is primarily a function of the size and quality of the knowledge base that it possesses. From the architectural point of view, expert systems are generally considered as rule-based systems which provide for a separation of the knowledge base and the inference engine and contain a built-in explanation facility. Additionally, a dialogue component — natural language-based or not — is assumed as part of the architecture. Fig. 3.1 displays a simple architecture of an expert system. From the software engineering point of view, the main advantages obtained from the expert system architecture and the use of AI technology are flexibility, transparency, and extensibility. Because the knowledge base of an expert system can be altered easily and extended immediately by adding new rules and meta-rules, Artificial Intelligence supports the expert system development process, with regard to its mostly ill-structured problem domain, by providing sophisticated domain-independent methods. Thus, systems analysis can be done on a higher level of abstraction (closer to the domain expert) and can involve the entire engineering cycle (Patrick 1986).


Sec. 3.2]


Fig. 3.1 — Expert system architecture (User Interface, Explanation Facility, Dialog Manager, Inference Engine).

The potential use of expert systems appears to be virtually limitless. They can be used to diagnose, monitor, analyze, interpret, instruct, learn, plan, design, explain, and consult. Thus, they are applicable for instruction, testing, and diagnosis in the field of education; they can be used in a wide range of professions such as medicine, law, engineering, and accounting; they can do image analysis and interpretation; and they can be applied to support design, monitoring, diagnosis, maintenance, repair, operations, and instruction of equipment. This list should just give an idea of the potential application areas of expert systems. Although one major idea of AI is to have domain-independent inference mechanisms for multiple problem classes, reality shows that different problems require different methods. This leads to a classification of expert systems into specific categories. The most common scheme (Clancey 1985) divides expert system application areas into analysis problems (e.g. debugging, diagnosis, and interpretation) and synthesis problems (e.g. configuration, planning, and scheduling). Table 3.1 explains some of the problem classes.



Michie (1982) observes: ... that (ideally) there are three different user-modes for an expert system, in contrast to the single mode (getting answers to problems) characteristic of the more familiar type of computing: (1) getting answers to problems — user as client;



[Ch. 3

(2) improving or increasing the system’s knowledge — user as tutor; (3) harvesting the knowledge base for human use — user as pupil. Users of an expert system in mode (2) are known as “domain specialists”. It is not possible to build an expert system without one ... An expert system acts as a systematizing repository, over time, of the knowledge accumulated by many specialists of diverse experience. Hence, it can and may ultimately attain a level of consultant expertise exceeding that of any single one of its ‘tutors’.

Table 3.1 — Problem classes



diagnosis: pattern recognition
design: generation of specific objects
planning: generation of a sequence of actions
simulation: modeling

Stefik et al. (1984) developed the pedagogical tour of prescriptions for the organization of expert systems shown in Fig. 3.2. Case 1 begins with a restricted class of problems that admits a very simple organization. In the other cases, these assumptions are relaxed one at a time; the first three branches (cases 2 through 4) consider the complications of unreliable data or knowledge, time-varying data, and a large search space. Any given problem may require combining ideas from any of these topics. This concerns all problem classes described in section 3.1. The problem of a large search space is then considered along three major branches. The first branch (cases 5 through 8) considers organizations for abstracting a search space. The second branch focuses on methods for incomplete search. The third branch considers ways to make the knowledge base itself more efficient. Stefik’s The organization of expert systems: a prescriptive tutorial presents the substance of these architectural ideas. Examples are drawn from expert systems developed over the last ten years. It compares and evaluates approaches and attempts to organize the ideas into a coherent theory. Real systems may combine these ideas. For in-depth explanations we refer to Stefik et al. (1984).



The development of any problem in AI involves a great deal of computer programming. This is especially true when a program is begun without reference to other programs, so that everything must be worked out on a brand-new basis. In

Sec. 3.3]


Fig. 3.2 — Expert system prescriptions, Stefik et al. (1984).





developing various types of expert consultation programs it was realized that much of the methodology (and some of the programming) could be used to develop expert programs in areas outside the original domain. What is knowledge engineering? In simplest terms, it is the coding of a specific domain of knowledge into a computer program that can solve problems in that domain. The task involves human experts in the domain working together with the programmer and/or knowledge engineer to codify and make explicit the rules that a human expert uses to solve real problems. Often the expert uses rules that he applies almost subconsciously, without knowing that he knows them. Usually, then, the program develops in what may seem a hit-or-miss fashion. As the rules are refined by using the emerging program, the expertise of the system increases. As the knowledge of more and more human experts is incorporated into the program, the level of expertise rises and eventually can exceed that of any specific human expert. Knowledge engineering usually has a synergistic effect. The knowledge possessed by human experts is often unstructured and not explicitly expressed (see Elstein et al. 1978). The construction of a program assists the expert to learn what he knows, and at the same time can pinpoint inconsistencies between one expert and another. Major goals in knowledge engineering include the construction of programs that are modular in nature, so that additions and changes can be made to one module without affecting the workings of the other modules. A second major objective is to obtain a program that can explain why it did what it did when it did it. If the program invokes rule 86 to explain a certain set of facts, and if the human expert questions the correctness of applying this rule to the data, the program should be able to explain why it used rule 86 instead of, say, rule 89.
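One way such an explanation facility can work is for the control loop to keep a trace of every rule it fires together with the facts that satisfied it, so the line of reasoning can be replayed on demand. The rule numbers, facts, and conclusions below are entirely made up for illustration.

```python
# Hypothetical facts and numbered rules (preconditions -> conclusion).
facts = {"fever", "rash"}
rules = {
    86: ({"fever", "rash"}, "measles-suspected"),
    89: ({"fever", "cough"}, "flu-suspected"),
}

# Fire every rule whose preconditions are met, recording why it fired.
trace = []
for number, (preconditions, conclusion) in sorted(rules.items()):
    if preconditions <= facts:
        facts.add(conclusion)
        trace.append((number, preconditions, conclusion))

def explain(trace):
    """Replay the line of reasoning: which rules fired, and on what grounds."""
    for number, pre, concl in trace:
        print(f"Rule {number} fired: {sorted(pre)} were present, "
              f"so {concl!r} was added.")
    # A rule that did NOT fire can be explained the same way, by listing
    # which of its preconditions were absent (here, rule 89 lacked 'cough').

explain(trace)
```

Asked why rule 86 was used instead of rule 89, the program can answer from the trace: rule 89's precondition 'cough' was not in the database.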
This type of interaction allows rules to be refined and brings to light inconsistencies in procedure and data. Feigenbaum (1977) describes the activity of knowledge engineering as follows: The knowledge engineer practises the art of bringing the principles and tools of AI research to bear on difficult applications problems requiring experts’ knowledge for their solution. The technical issues of acquiring this knowledge, representing it, and using it appropriately to construct and explain lines-of-reasoning are important problems in the design of knowledge-based systems. The art of constructing intelligent agents is both part of and an extension of the programming art. It is the art of building complex computer programs that represent and reason with knowledge of the world.



A major bottleneck in the design of expert systems is the process by which the knowledge of the human expert is ‘extracted’ by the knowledge engineer. As this work is done in close coordination with experts and serves for the adequate modelling of expertise and human inference capabilities, it is called ‘knowledge acquisition’. At this point Artificial Intelligence provides, through its exploratory development capabilities, the most significant advantage to software development compared


Sec. 3.4]


to conventional software engineering. In software engineering, the definition and design phases of a program may reveal that the expertise in an application area is ill-structured and complex. Additionally, human inference capabilities cannot be comprehended immediately. Therefore, knowledge engineering provides for an iterative analysis of the problem area and, based on the methods of AI, the solution is implemented step by step in a prototyping manner. This process results in the specific engineering cycle shown in Fig. 3.3.

Fig. 3.3 — Knowledge engineering cycle (from planning through to maintenance).




It differs from traditional software engineering in so far as the analytical and design work apply to the entire cycle. In recent years substantial work has been devoted to research and development of tools supporting the knowledge acquisition task and its techniques. Whereas in Chapter 4 knowledge engineering is considered more from the implementation point of view, and in Chapter 7 the languages and tools available are discussed, an overview of the techniques of knowledge acquisition is given in this section. Corresponding computer-assisted tools, which support the methodological approach to the acquisition process and its individual subtasks and thus widen the bottleneck of knowledge acquisition, can reduce the high development effort involved in expert systems and increase the quality of the systems. The techniques of knowledge acquisition concern the two sources the knowledge engineer can fall back on in his analytical work: the expert and the specialist literature. Different forms of interview (e.g. structured and unstructured) belong to the manual techniques applied, as do activities in which the knowledge engineer undertakes an active role, such as observing an expert, the teachback interview, psychologically-based protocol analysis, scaling techniques, and text analysis.

3.5


The use of manual acquisition techniques is time-consuming, and an information loss is incurred in the communication between the knowledge engineer and the experts. The acquisition of knowledge should be improved in such a way that the expert formalizes his knowledge with the help of a relevant tool. The knowledge engineer is to be supported more in the modelling of the problem area: the knowledge of the expert, rather than the perceived view of the knowledge engineer, is to be modelled in the expert system. Some approaches in this field, which specifically support interview techniques and psychology-based methods interactively and will influence the development of expert systems in future, are still part of research and development. Since effective knowledge acquisition tools are at the very heart of expert system development, they will contribute accordingly to a successful commercial proliferation of such systems. A comprehensive overview is given by Boose (1989).

ETS/AQUINAS (Boose & Bradshaw 1987)

AQUINAS, an expanded version of the Expertise Transfer System (ETS), was developed by Boeing. This tool unites interview techniques and psychology-based methods for knowledge acquisition. The system attempts to acquire knowledge for rule-based systems, based specifically on the repertory grid and scaling methods. With the help of the repertory grid technique, simple rules are generated on the basis of factor-analytical methods. With AQUINAS, numerous analytical problem areas, such as medical diagnosis, can be supported. The tool is aimed directly at the expert as user, who works interactively with the system by means of a rule-based, self-adapting user interface. Moreover, implementation tools (see Chapter 6.4) are integrated into the system.

MORE/MOLE (Eshelman et al. 1987)

MORE and its successor MOLE are very domain-specific acquisition tools which were developed for the field of diagnosis of mechanical systems. The expert can

Sec. 3.5]



refine the knowledge base for a problem area step by step and include new knowledge. The system builds on the knowledge representation mechanism of semantic networks (see Chapter 4).

KRITON (Diedrich et al. 1987)

Like AQUINAS, KRITON is a hybrid system which supports protocol and text analysis in addition to repertory grid and scaling methods, and converts the knowledge acquired into relevant knowledge representation mechanisms. KRITON supports both the expert and the knowledge engineer.

ROGET (Bennet 1985)

ROGET is an acquisition tool which is used for the build-up of medical diagnosis systems. It supports the interview with the expert and generates a conceptual model of the problem domain.

KNACK (Klinker et al. 1987)

KNACK is an acquisition tool which is dedicated to the design area, and specifically to the development of electro-mechanical systems. KNACK assumes that the expert can present his knowledge in the form of a skeleton report and of report parts. The skeleton report represents the framework which, in the course of the development of a concrete system, is filled with the relevant report parts by the expert. KNACK asks the expert for knowledge about how report parts are compiled in a particular application. Although summarized under the term ‘knowledge acquisition’, the different phases, such as the recording of problem characteristics and the refinement and validation, and sub-tasks, such as the classification of problems and the build-up of the first knowledge base and its refinement, must be differentiated. Considerable differences exist between the individual acquisition tools in these respects, based on the techniques used and their breadth of application. Most tools are not only restricted to particular sub-tasks but, at the same time, concentrate on a particular class of expert systems. Thus, they presuppose that a first categorization of problems and of the sub-tasks in the acquisition process has been completed. Often there is a close coupling to the implementation tools in these systems. Moreover, it is clear that these systems support only partially automated knowledge acquisition, based on the techniques used. The following are automated knowledge acquisition approaches, in which techniques such as those listed below are tested and developed further:

Analogy conclusions, i.e. knowledge from known situations is applied to similar new situations.

Rule/knowledge induction, i.e. general rules are derived from specific forms of knowledge and examples.

Similarity and explanation-based learning, i.e. similarities are derived from a number of positive and negative examples in the form of rules, or examples serve as the basis for the derivation of rules based on a specific theory.

These techniques will require further research in artificial intelligence before they can be used for knowledge acquisition. The reader who is interested in the different approaches and techniques for




knowledge acquisition is referred to the companion volumes by Boose and Gaines (1988a, 1988b).

REFERENCES
Bennet, J. S. (1985) ‘A knowledge-based system for acquiring the conceptual structure of a diagnostic expert system’, Journal of Automated Reasoning, Vol. 1, pp. 49–74.
Boose, J. & Bradshaw, J. M. (1987) ‘Expertise transfer and complex problems: using AQUINAS as a knowledge-acquisition workbench for knowledge-based systems’, International Journal of Man-Machine Studies, Vol. 26, No. 1, Academic Press.
Boose, J. & Gaines, B. (eds) (1988a) Knowledge acquisition tools for expert systems, Academic Press.
Boose, J. & Gaines, B. (eds) (1988b) Knowledge acquisition for knowledge-based systems, Academic Press.
Boose, J. (1989) ‘A survey of knowledge acquisition techniques and tools’, Knowledge Acquisition: An International Journal of Knowledge Acquisition for Knowledge-Based Systems, Vol. 1, No. 1, Academic Press.
Clancey, W. J. (1985) ‘Heuristic classification’, Artificial Intelligence 27, 289–310.
Diedrich, J., Ruhmann, I. & May, M. (1987) ‘KRITON: a knowledge acquisition tool for expert systems’, International Journal of Man-Machine Studies, Vol. 26, No. 1, Academic Press.
Duda, R. O. (1981) ‘Knowledge-based expert systems come of age’, Byte 6, No. 9, Sept 1981, pp. 238–281.
Elstein, A. S., et al. (1978) Medical problem solving: an analysis of clinical reasoning, Harvard Univ. Press: Cambridge, Mass.
Eshelman, L., Ehret, D., McDermott, J. & Tan, M. (1987) ‘MOLE: a tenacious knowledge acquisition tool’, International Journal of Man-Machine Studies, Vol. 26, No. 1, Academic Press.
Feigenbaum, E. A. (1977) ‘The art of artificial intelligence: 1. Themes and case studies of knowledge engineering’, Fifth International Joint Conference on Artificial Intelligence, pp. 1014–1029.
Feigenbaum, E. A. & McCorduck, P. (1983) The fifth generation, Addison-Wesley: Reading, Mass.
Gevarter, W. B. (1982) Overview of expert systems, National Bureau of Standards Report Number NBSIR 82-2505, May 1982.
Harmon, P. & King, D. (1985) Expert systems: Artificial Intelligence in business, Wiley: New York.
Klinker, G., Bentolila, J., Genetet, S., Grimes, M. & McDermott, J. (1987) ‘KNACK: report-driven knowledge acquisition’, International Journal of Man-Machine Studies, Vol. 26, No. 1, Academic Press.
Michie, D. (1982) ‘Knowledge-based expert systems’, In: Michie, D. (ed.), Introductory readings in expert systems, Gordon & Breach: New York.
Patrick, D. (1986) Artificial Intelligence: Applications in the Future of Software Engineering, Ellis Horwood: Chichester.

Ch. 3]



Stefik, M., et al. (1984) ‘The organization of expert systems: a prescriptive tutorial’, In: Hayes-Roth, F., et al., Building expert systems, Addison-Wesley: Reading, Mass.
Waterman, D. (1986) A guide to expert systems, Addison-Wesley: Reading, Mass.

4
Expert systems — knowledge engineering

Too often, Artificial Intelligence (AI) terms may be misused; citations give only the names of systems, tools, or languages but fail to list what they do or where they were developed (and who distributes them); and information on how to go about constructing an expert system may be confusing or incomplete. This chapter is an attempt to bring together, under one cover, much of this information, while at the same time trying to clarify and demystify the entire knowledge engineering process. From a history and application of expert systems, to basic program components, to a categorization of tools and techniques, this chapter describes the fundamental concepts behind the development of expert systems as well as extensive sources of further information. Expert Systems — Historical Perspective introduces the reader to expert systems in general and popular expert systems in particular. From a backdrop of AI research to current knowledge engineering tools, this section provides an overview of the field. Expert Systems — Knowledge Engineering Process puts forth the basic information needed to construct such a system. Task and expert prerequisites are examined, as well as major stages of construction. The main components of an expert system are presented, and topics of knowledge acquisition, representation, control, and utilization are addressed. The complexity of the task of expert system construction is discussed, along with some shortcomings of skeleton systems. Expert Systems, Tools, and Languages provides a listing of a variety of instruments. Expert system skeletons, high-level programming languages, and knowledge engineering tools are categorized and linked to sources of origin and/or distribution. This chapter is crafted not only as a user’s guide for the novice but also as an easy-to-use reference guide for those involved in the field.
Its layout and design were chosen to provide easy access to information regarding what expert systems are, how to go about building them, and what tools and systems have already been developed. The Appendix gives areas of application and lists major expert system vendors and sponsoring organizations.



The first electronic general-purpose digital computer was introduced during World War II, in the mid-1940s. With the advent of ENIAC (Electronic Numerical

Sec. 4.1]



Integrator and Calculator), researchers came a long way toward developing computer programs that could perform tasks previously done only by human experts, i.e. expert systems, knowledge-based systems, knowledge systems, or knowledge-based support systems. Knowledge-based systems have been used to solve a variety of planning, designing, analyzing, tutoring, interpreting, diagnosing, and monitoring problems. Systems have been built to formulate mathematical concepts, monitor patient respiration, configure computer systems, diagnose diseases, suggest organic chemical structures, and even to build other expert systems.

After the specialist's knowledge of a particular field of study has been incorporated, the program '... exhibits similar performance to a human expert in performing tasks — usually quite sophisticated tasks — which are normally considered to require intelligence' (Cox 1984, p. 237).

In conventional programming, when all forms of a problem and its possible solutions and outcomes are coded in, the computer pursues every avenue of every decision in its entirety on its way to a resolution. Researchers found that when no knowledge about problem solving was available to guide the application of rules, an intractable amount of computation resulted — a combinatorial explosion.

Heuristics

Human experts, on the other hand, use judgement to determine which paths to take to reach a solution. With this judgement, they can handle areas of uncertainty, break problems down into subdomains, and select avenues to pursue in order to consider only the most promising approaches. The use of such judgement (e.g. intuition, hunches, rules-of-thumb, beliefs, pet theories, etc.) in decision making and problem resolution is known as using heuristics — the 'art of good guessing'. To use heuristics, new programming languages needed to be developed.
The use of symbols to represent facts about a given domain of knowledge, instead of just strings of 'ones' and 'zeros', became a necessity.

Dartmouth Summer Conference

At the Dartmouth Summer Conference in 1956, John McCarthy, Marvin Minsky, Nathaniel Rochester, Claude Shannon, and others discussed ways computers could simulate human thought (Feigenbaum 1979). It was here that the term Artificial Intelligence (AI) was coined. During this meeting, Allen Newell, Herbert Simon, and J. C. Shaw presented a program, LOGIC THEORIST, considered to be the first AI program. LOGIC THEORIST was used to prove mathematical theorems proposed by Alfred Whitehead and Bertrand Russell in their work, Principia mathematica.

LOGIC THEORIST was built using IPL (Information Processing Language). It approaches problems by alternating between constructing and destroying lists of possible solutions. Most of IPL's functions create new lists of functions that are manipulated as needed.

DENDRAL

DENDRAL, an early expert system based on the use of heuristics, was developed in 1965 by Stanford University researchers, Edward Feigenbaum and Bruce Buchanan



[Ch. 4

(Feigenbaum et al. 1971). It acts as a computer chemist by helping the user analyze existing chemical compounds. It uses some 400 rules to identify organic compounds by analyzing mass spectrograms and producing hypotheses on the molecular structures. To do this, DENDRAL infers constraints from the data, produces sample structures, predicts a mass spectrogram for each sample, and then compares those predictions with the original data (i.e., a plan-generate-test method). Thus, by using its explicit knowledge of mass spectrometry, DENDRAL determines the structures of compounds from mass spectral data inputs. (Later versions were expanded to use other spectral data as well.) Stanford University used DENDRAL for teaching purposes in organic chemistry classes. It has also been used to verify published structures. DENDRAL has been hailed as one of the first programs to successfully employ AI techniques for solving difficult technical problems (Rich 1984).

MYCIN

Stanford researcher Edward Shortliffe wanted to prove that expert systems could be applied to fields other than chemistry. In 1976 he developed MYCIN, a system for diagnosing infectious blood diseases and for recommending appropriate antimicrobial therapies (Shortliffe 1976). MYCIN works backward from a hypothesized diagnosis to see if the evidence supports it. Then, for any diagnosis with a high certainty value, MYCIN recommends a treatment. Using 500 rules extracted from physicians, the system determines a patient's drug therapy program. However, even with a performance comparable to human experts, many computer-shy physicians found MYCIN — a computer as a medical consultant — difficult to accept (Buchanan & Shortliffe 1984).

TEIRESIAS

Stanford researchers developed TEIRESIAS to add an explanation module to MYCIN (Davis 1976). By just typing 'WHY', TEIRESIAS backtracks through its reasoning steps to give the physician an opportunity to accept or reject MYCIN's advice. TEIRESIAS also aids in the acquisition of knowledge. When a new rule is added, TEIRESIAS queries the user about the possible effects of the new knowledge and may even tell the user if some piece of information is missing from the new rule (Table 4.1).

EMYCIN

In 1977 Stanford researcher William Van Melle took MYCIN and TEIRESIAS one important step further. By removing the medical knowledge base from MYCIN and leaving the logic or reasoning section (i.e., the inference engine), a domain-independent version of MYCIN known as EMYCIN (Essential MYCIN) was created (Van Melle 1979). Van Melle found that the division of an expert system into knowledge base and inference engine made it easier to modify the expertise: changes in the knowledge section could be made by alterations to the knowledge base alone. In fact, the EMYCIN skeleton allowed for a complete substitution of the knowledge base (Fig. 4.1). Van Melle felt that if a new problem could be set up as a problem of gathering evidence for and/or against alternative hypotheses, then




Table 4.1

MYCIN: Is the patient's illness with ORGANISM-I a hospital-acquired infection?
USER: WHY
TEIRESIAS: (i.e. WHY is it important to determine whether or not the infection with ORGANISM-I was acquired while the patient was hospitalized?)
(1.0) This will aid in determining the category of ORGANISM-I.
It has already been established that
(1.1) the gram stain of ORGANISM-I is gramneg, and
(1.2) the morphology of ORGANISM-I is rod, and
(1.3) the aerobicity of ORGANISM-I is facultative
Therefore, if
(1.4) the infection with ORGANISM-I was not acquired while the patient was hospitalized
then there is strongly suggestive evidence (.8) that the category of ORGANISM-I is enterobacteriaceae

The following were used:
(3.1) RULE 27 indicated there is strongly suggestive evidence (.8) that the aerobicity of ORGANISM-I is facultative
(3.2) RULE 27 indicated there is weakly suggestive evidence (.2) that the aerobicity of ORGANISM-I is anaerobic
Since this gave a cumulative CF of (.8) for facultative, and (.2) for anaerobic, it has been established that the aerobicity of ORGANISM-I is facultative.
USER: WHY DIDN'T YOU CONSIDER STREPTOCOCCUS AS A POSSIBILITY FOR ORGANISM-I?
TEIRESIAS: The following rule could have been used to determine that the identity of ORGANISM-I was streptococcus: RULE 33
But clause 2 ('the morphology of the organism is coccus') was already known to be false for ORGANISM-I, so the rule was never tried.

Also: there is weakly suggestive evidence (.1) that the identity of ORGANISM-I is pseudomonas-aeruginosa (RULE 37)
USER: HOW 1.3
TEIRESIAS: (i.e. HOW was it established that the aerobicity of ORGANISM-I is facultative?)
Source: Feigenbaum (1978, p. 235).

EMYCIN could be used to construct a new expert system. Thus, EMYCIN has been used to construct several medical systems as well as a system in the engineering domain (Van Melle 1980).

PUFF

In 1977 John C. Kunz of Stanford University added a pulmonary disease knowledge base (some 250 rules) to EMYCIN to develop PUFF (Fig. 4.2). PUFF is in daily use at the Pacific Medical Center in San Francisco. It has a rate of 95% correct diagnoses. It interprets data from a respirometer, an instrument that measures the flow of air in respiration (Table 4.2). It was the first actual application built using EMYCIN.

PROSPECTOR

In addition to the work at Stanford University, during this same period researchers at the Stanford Research Institute (SRI), Carnegie-Mellon University (CMU), and elsewhere were productive as well. To show that expert systems had commercial applications, in 1977 SRI's Duda, Hart, and Gaschnig presented PROSPECTOR (Duda et al. 1978). As an electronic geologist, PROSPECTOR uses knowledge of local and regional characteristics of




Fig. 4.1 — EMYCIN Skeleton. EMYCIN consists of MYCIN's inference engine and an empty knowledge base.



Fig. 4.2 — PUFF System (from EMYCIN Skeleton). PUFF uses the EMYCIN shell plus a knowledge base of pulmonary disease information.




Table 4.2

SAMPLE PUFF OUTPUT

INTERPRETATION: Elevated lung volumes indicate overinflation. In addition, the rv/tlc ratio is increased, suggesting a mild degree of air trapping. Forced vital capacity is normal but the fev1/fvc ratio is reduced, suggesting airway obstruction of a mild degree. Reduced mid-expiratory flow indicates mild airway obstruction. Obstruction is indicated by curvature in the flow-volume loop of a small degree. Following bronchodilation, the expired flow shows slight improvement. This is confirmed by the lack of change in airway resistance. The low diffusing capacity indicates a loss of alveolar capillary surface, which is moderate.

CONCLUSIONS: The low diffusing capacity, in combination with obstruction and a high total lung capacity, would be consistent with a diagnosis of emphysema. The patient's airway obstruction may be caused by smoking. Discontinuation of smoking could help relieve the symptoms.

PULMONARY FUNCTION DIAGNOSIS: 1. MILD OBSTRUCTIVE AIRWAYS DISEASE, EMPHYSEMATOUS TYPE.

Source: Feigenbaum (1983, p. 8)

areas favourable for specific ore deposits. It starts with a hypothesis about possible mineral deposits and works backward to find evidence that supports a particular hypothesis. With 1600 rules, PROSPECTOR helps geologists pinpoint sites for possible mineral exploration. Although PROSPECTOR was used to identify a $100 million molybdenum deposit under Mount Tolman in Washington, it is not currently being used on a daily basis. The system has influenced the knowledge crafting of other mineral deposit models.

R1

In 1979, CMU's John McDermott produced the first version of R1, a system used by the Digital Equipment Corporation (DEC) to configure customer requests for VAX computer systems. R1 takes the customer's requirements as its input (I) and produces, as its output (O), a diagram of the components in the order. Basically, the program pieces together computer systems by breaking major problems down into ordered subproblems, solving one subtask, then moving on to the next.

X-SEL

A spinoff of R1 is the X-SEL program. DEC uses X-SEL to provide instructions for its salespersons. The program supplies the potential customer with a list of all necessary components for a specific system. It combines knowledge from two major sources: salespersons and engineers. X-SEL can lay out a customer's needs before the order is placed. This process saves DEC from making costly mistakes in tailoring a computer system to fit a customer's order.
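R1's strategy of decomposing a configuration order into ordered subtasks can be sketched roughly as follows. This is a toy illustration in Python — the component rules, ratios, and names are invented for the example and are not DEC's actual rule base:

```python
import math

# Toy sketch of R1-style configuration: break the order into ordered
# subtasks and solve each in turn. All rules below are hypothetical.

def assign_cpu(order, config):
    config["cpu"] = order["cpu"]

def assign_disk_controllers(order, config):
    # hypothetical rule: one controller supports up to two disk drives
    config["controllers"] = math.ceil(order["disks"] / 2)

def assign_cabinets(order, config):
    # hypothetical rule: one CPU cabinet, plus one cabinet per 4 devices
    devices = order["disks"] + order["terminals"]
    config["cabinets"] = 1 + math.ceil(devices / 4)

SUBTASKS = [assign_cpu, assign_disk_controllers, assign_cabinets]

def configure(order):
    config = {}                      # the growing configuration
    for subtask in SUBTASKS:         # fixed subtask ordering, as in R1
        subtask(order, config)
    return config

print(configure({"cpu": "VAX-11/780", "disks": 3, "terminals": 6}))
```

Because the subtasks run in a fixed order, each decision can rely on those already made — the essence of R1's ordered-subproblem approach.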





AM

In 1976, Douglas B. Lenat developed an expert system known as AM for open-ended scientific theory formation (Lenat 1977). As a scientific inventor, AM uses some 250 heuristics to do elementary mathematical research. Starting with a possible supposition, AM pursues a line of reasoning until it discovers new mathematical concepts and their interrelationships. The program employs a plan-generate-test method as its control structure. By combining old terms in new ways, AM generates new terms. However, because the meta-rules used to guide AM's discovery paths are very general and simplistic, the longer it runs, the further the concepts AM develops stray from the original primitives (Lenat 1978).


EURISKO

AM's inability to create and modify new heuristics about number theory led to the extension of AM into EURISKO (Lenat et al. 1982). As a successor system, EURISKO not only creates new concepts but discovers and develops new task-specific heuristics. By keeping track of statistics on the average running times, space used, and success and failure rates of rules tried, EURISKO decides which rule to try next (Hayes-Roth et al. 1983).

Frequently used systems

Although a wide variety of expert systems have been developed to date (section 4), few are in commercial use. In fact, most failed to progress beyond the research stages because they were developed solely as learning aids (i.e. so that their builders could learn how to build them). Of the systems previously discussed, PUFF, R1/XCON, and X-SEL are used on a continuing basis. In addition, chemical laboratories make constant use of GENOA, a DENDRAL variant. Bell Laboratories uses ACE to troubleshoot its telephone cables. Each morning, ACE reports are used to dispatch repair crews to trouble spots — a job that previously took a week for a team of technicians.

A French oil company, Elf Aquitaine, had a program developed for it by Teknowledge, Inc., of Palo Alto, California. The program, DRILLING ADVISOR, diagnoses oil-well drilling-bit sticking problems. According to Elf, more than 500 possible variables could cause such troubles. Teknowledge estimates that Elf will save millions of dollars by using DRILLING ADVISOR to reduce drilling downtime.

HELP, a medical advisor, was developed by a team of doctors from the University of Utah and the Latter Day Saints Hospital (LDS) in Salt Lake City, led by Homer Warner. It is the first expert system to incorporate hospital book-keeping with advice on patient care. Physicians requesting a consultation with HELP have accepted its recommendation 80% of the time. The HELP system is currently in use at Arnot-Ogden Memorial Hospital in addition to being used at LDS. Warner (1984) reports that it operates on-line at the University of Utah Medical Center.


Sec. 4.3]



Although few expert systems have gone through all of Davis’ (1982) stages of development to arrive at the point of system release (Table 4.3), Japan’s Fifth

Table 4.3 — Davis’ stages of development

1. Design of the system
2. Development of the system
3. Evaluation of system performance
4. Evaluation of system acceptance
5. Extended use of system in prototype environment
6. Development of maintenance plans for system
7. Release of system for general use

Source: Davis (1982, p. 10).

generation challenge — that by 1992 it would build a 'thinking' computer (Feigenbaum & McCorduck 1983) — has proven to be an impetus for many AI researchers. Within the past ten years, in the USA alone, some 100 new companies (e.g., Teknowledge, Inc.; Inference Corp.; Computer Thought Corp.; Carnegie Group, Inc.; etc.) have gone into business with their major objective being to commercialize AI. This shift — from the laboratory to the marketplace — has brought a flood of new AI products to the public, including knowledge engineering tools such as:

• KEE from IntelliCorp, Inc., Menlo Park, Ca.
• LOOPS from Xerox.
• ART from Inference Corp., Los Angeles, Ca.

and for PCs:

• KES from Software A & E, Arlington, Va.
• M.1 (S.1/T.1) from Teknowledge, Palo Alto, Ca.
• TIMM from General Research Corp., McLean, Va.
• EXPERT-EASE from Jeffrey Perrone & Assoc., Inc., San Francisco, Ca.
• REVEAL from InfoTym of McDonnell Douglas, Irvine, Ca.

For further details the reader is referred to Chapter 6.


Not all tasks qualify for expert system construction, and not all specialists make good domain experts. Therefore, a knowledge engineer must consider the following criteria when approaching an expert system construction problem.

Task prerequisites

Suitability
Is the task one that requires human intelligence to perform? One that needs resolution? One that would be of value to others if it were successfully 'captured' by a




computer program? The problem should be manageable in size and scope, with the possibility of being expanded in the future to include larger applications (Hayes-Roth 1981).

Common sense
Does the task require much commonsense reasoning? Commonsense reasoning entails considerable experiential knowledge to infer new facts from what is already known. New knowledge would mean revising beliefs, i.e. beliefs would be subject to change. Since present systems are not adept at learning from their own experiences, the millions of facts and rules contained in everyday knowledge would need to be programmed into the computer. Waldrop (1984b) suggests that it would be an impracticable task to try to build a machine containing the vast amount of knowledge needed for such commonsense reasoning.

Narrow domain
Is the problem to be resolved found within a narrow, well-defined domain? Early researchers thought a general problem-solving approach was all AI needed to solve the 'intelligent' problems of the world. However, most generalized programs soon became lost in combinatorial explosion and failed (Waldrop 1984a). Because the number of facts and rules a program must assimilate expands as the domain grows larger, narrowly defined problems with limited solution possibilities and manageable amounts of data have proven to be the best topics for expert system construction (Ham 1984).

Solvability
Does the task entail a problem that at least one human expert can solve? If the problem under consideration is not solvable, then it may be impossible to build an expert system (that is, until more sophisticated automatic knowledge acquisition tools and techniques are developed; see Sec. 3.5). The exceptions to this are the expert systems that learn from experience: e.g., EURISKO (Lenat 1977), heuristic formation; META-DENDRAL (Buchanan & Feigenbaum 1981), rule formation.

Clearly defined
Can the expert and the knowledge engineer clearly define the task at hand?
What are the inputs (I) and outputs (O)? What are the major concepts and their relationships? What data are available for initial states, and what goals need to be attained? Domain knowledge should be amenable to being expressed as 'chunks' of knowledge that take the form of production rules, frames, or semantic networks. Davis (1982) fears that some 'chunks' may be so large that they cannot be easily expressed as a collection of rules without losing their original integrity.

Expert prerequisites

Special knowledge
Does the expert possess some special knowledge, judgement, and/or experience about the problem domain? Matching the right kind of expert with the given task is important.




Explain knowledge
Can the expert verbalize what it is that he/she knows? What it is that he/she does? How he/she does it? The knowledge engineer must cope with the problem of having very little human expertise already codified. Therefore, the expert must be able to put into words, charts, graphs, etc., the knowledge he/she possesses.

Decision making steps
What steps are taken by the expert in order to solve a problem? What checks and balances are used? What weights are given to pertinent production rules? The intermediate steps taken in performing a task are as important as the resolution itself.

Commitment
Is the expert available and committed to the project at hand? Building an expert system means investing many hours of work. Although we have progressed since Davis' (1982) statement regarding the time needed to develop an expert system (i.e., at least five man-years), the process still requires a considerable amount of time and effort, though this has been decreasing lately. Because the extraction of knowledge from the expert is laborious and time-consuming, an expert must be available to work with the knowledge engineer from conception and design, through implementation and testing, to refinement and release.

Stages of construction

Three major stages in the construction of expert systems are given by Weiss & Kulikowski (1984).

Designing the knowledge base
This includes becoming familiar with the problem and its domain and characterizing the possible decisions, basic concepts, primitive relations, and definitions necessary for formalizing the domain knowledge. In addition, defining the factors influencing the decisions and choosing the architecture needed to best organize the knowledge are included in this first stage of construction.

Developing and testing the prototype
This stage entails developing a model to be used for a feasibility study and testing the model against a library of case studies. After a prototype has been refined to acceptable levels, it may be expanded to include larger, more complex extensions of the original problem.

Refining and generalizing the knowledge base
Changing the knowledge base (perhaps redesigning the control structure and knowledge representations) and expanding the prototype (perhaps through versions 1 to n) to a full-fledged system usually dominate this stage of development. Modifications and revisions are a direct result of the testing performed and of criticism drawn from the expert.

Two more stages should be included here:




Extending the system
This stage would include extending the capabilities of the system so that it could handle related problems — much as R1 was extended to XCON in order to configure non-VAX as well as VAX computer systems.

Releasing the system
Releasing the system for use by the target community would be the final step. Also included here would be maintaining the system in a dynamic environment. An organized effort to adapt the expert system to changes in needs and to maintain it at the optimal level of functionality must be provided.

Basic structure
The basic structure of a typical expert system consists of the following components (Fig. 4.3):



Fig. 4.3 — Components of an expert system. The inference engine, knowledge base, and global database are the major components of a rule-based expert system.

Knowledge base (KB)
The knowledge base stores the expert's domain knowledge. It is separated from the actual computer program so that the knowledge can be changed simply by refining old rules and adding new ones, without the need to change any other part of the system. It consists of facts, heuristics, and meta-knowledge.

Inference engine (IE) or rule interpreter (RI)
The control structure (also referred to as the inference engine or rule interpreter) contains the reasoning strategy or logic for choosing the appropriate problem-solving methods. In addition, rule-based systems have another database known as the global database.
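A minimal sketch of how these components fit together: the knowledge base is plain data, the inference engine is the only program logic, and the global database accumulates assertions. The rule contents are invented, and this particular engine chains backward from a goal, MYCIN-style; it is an illustration, not any production system's actual code:

```python
# The knowledge base is a list of (premises, conclusion) rules — data,
# not code — so the expertise can be swapped out (as in EMYCIN) without
# touching the inference engine. All rule contents are invented.

KNOWLEDGE_BASE = [
    (["stain_gramneg", "shape_rod"], "enterobacteriaceae"),
    (["enterobacteriaceae"], "treat_with_gentamicin"),
]

def prove(goal, kb, gdb):
    """Inference engine: try to establish `goal` from the global
    database (working memory) or by backward-chaining through rules."""
    if goal in gdb:                      # already asserted
        return True
    for premises, conclusion in kb:
        if conclusion == goal and all(prove(p, kb, gdb) for p in premises):
            gdb.add(goal)                # record the new assertion
            return True
    return False

gdb = {"stain_gramneg", "shape_rod"}     # initial evidence
print(prove("treat_with_gentamicin", KNOWLEDGE_BASE, gdb))
```

Replacing KNOWLEDGE_BASE with rules from another domain leaves `prove` untouched — the separation that made the EMYCIN skeleton reusable.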

Sec. 4.4]



Global database (GDB)
As a working memory, the global database serves as temporary storage for keeping track of assertions about the problem being solved. It records actions tried and the effects of those actions on the state of the system, i.e., system status.

Natural language interface
There may also be a natural language interface (as a user-friendly front end) to provide for more flexibility and natural use. A natural language interface not only makes a program more understandable but also helps the user to feel more comfortable when interacting with the system. User-friendliness makes it possible for non-specialists to communicate with a computer program in a non-procedural manner. Because of the lack of progress in computer comprehension of natural language, user-friendly interfaces are still quite primitive. Improvements in natural language comprehension will have corresponding effects on user-friendly front ends (Roberts 1983).

Explanation facilities
Many expert systems include explanation facilities as part of their programs. The system may provide an internal audit trace of rules activated and executed (as well as those not tried) and use that trace to explain its behaviour and conclusions. Most explanation modules have capabilities such as:

• displaying rules being invoked;
• recording rules being invoked and using that record to explain how conclusions were reached and why certain rules were not tried; and
• searching the knowledge base for rules to answer user questions (Davis et al. 1975).

To access an explanation mode, usually a command of 'WHY' or 'HOW' is sufficient to elicit descriptions of what data and rules the inference engine invoked to reach its conclusion. This gives the user an opportunity to question the program's reasoning and to review the steps it took to reach a conclusion (Gevarter 1983a). Explanation facilities also aid in the acquisition of knowledge by providing information necessary for the user to update and change the system.
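The audit-trace idea behind such facilities can be sketched simply: the engine records each rule it fires, and a HOW query replays the record. The rule names and fact names below are invented for illustration and are not from any real system:

```python
# Sketch of an explanation facility: the engine keeps an audit trail
# of fired rules; `how` answers a HOW query from that trail.
# All rules and facts are hypothetical.

RULES = {
    "R1": (["engine_wont_start", "lights_dim"], "battery_flat"),
    "R2": (["battery_flat"], "recharge_battery"),
}

def run(initial_facts):
    facts, trail = set(initial_facts), []
    changed = True
    while changed:
        changed = False
        for name, (premises, conclusion) in RULES.items():
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                trail.append((name, premises, conclusion))  # audit trail
                changed = True
    return facts, trail

def how(conclusion, trail):
    """Answer HOW: which rule, applied to which premises, concluded this?"""
    for name, premises, concl in trail:
        if concl == conclusion:
            return f"{conclusion} was concluded by {name} from {premises}"
    return f"{conclusion} was given or never established"

facts, trail = run({"engine_wont_start", "lights_dim"})
print(how("recharge_battery", trail))
```

A WHY facility works on the same trail, read in the other direction: instead of asking how a conclusion was reached, it reports which pending rule made the engine ask for a fact.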
TEIRESIAS (Davis 1976) is an example of an explanation module that was developed to work with MYCIN. This explanation module still provides 'shallow' rather than 'deep' knowledge.

4.4


Overall, the knowledge engineer is responsible for communication between the expert and the computer. That communication entails the acquisition of knowledge, the representation of knowledge, the structure of control, and the utilization of knowledge.

Knowledge acquisition
A knowledge engineer must find out exactly how an expert solves a problem and weed out any conflicting advice. This information is usually obtained by interviewing the




expert as he/she solves sample problems. However, it is often difficult for an expert to describe exactly what it is he/she does and how he/she does it. Therefore, the knowledge engineer must help to articulate the knowledge being used. As a rule, one expert per system is desired, because experts differ in their views of a problem and in their methods for solving it. Using the knowledge of one expert makes it easier to obtain consistency within the system (Ham 1984). Therefore, the one-expert-as-czar approach has been the credo for building most of today's expert systems. Reconciling multiple experts' views can be difficult; nevertheless, PROSPECTOR has successfully used several experts whose specialities lie in different types of mineral deposits.

Because the process of knowledge acquisition may be extremely arduous and labor-intensive, it is often referred to as the 'bottleneck' of expert system construction. Attempts to alleviate the bottleneck have resulted in the development of programming tools such as TEIRESIAS (Davis 1981) and KAS (Reboh 1979) that aid knowledge acquisition, but much work still needs to be done. Further tools and techniques are fully described in Section 3.5.

Knowledge representation
After extracting the knowledge from the expert, the next step is to encode it, including the heuristics, in machine-readable form for use by the computer. Decisions on how knowledge will be represented are affected by the task at hand. As Roberts (1983) points out, the task of codifying the culture and folklore associated with a discipline is not simply a matter of typing in a few sets of rules. It requires the system designer to match the domain knowledge with an appropriate knowledge representation structure. Major approaches to knowledge representation include:

Production systems. The production rule system (Fig. 4.4) is an AI developer's most popular approach to model domain knowledge.
Each domain-specific rule in the system represents a 'chunk' of knowledge and takes the form of: if this is true, then do or conclude that, with a confidence level of n (between -1.0 and +1.0):

IF → THEN
SITUATION → ACTION
PREMISE → CONCLUSION → CFn

Certainty or confidence factors (CF) are indices of how strongly the system 'believes' that the preconditions imply the conclusion (i.e., rule reliability); other, more pertinent methods exist (Chapter 13). Table 4.4 gives an example of a production rule found in R1, an expert system used by DEC to configure computer systems (McDermott 1982). This rule represents a piece of R1's knowledge. If the 'premises' are true, then this rule will be invoked and will be used by R1 to reason toward an 'action'. R1's basic approach is to break the problem up into subtasks and perform each one in order. During configuration, R1 must choose, from a list of several applicable rules, which rule to try next. The program assumes that the rule with the





A, B, C → XY
A, B, F, G → YZ



Fig. 4.4 — Graphic representation of production rules.

most IF clauses is the most specialized rule of the list and makes a rule selection based on that criterion (Gevarter 1983a). When knowledge can be represented as actions and sequences of actions, the production rule system may be a particularly effective approach to knowledge representation.

Frames and scripts. A frame is a knowledge representation scheme used for handling stereotyped objects, groups of objects, and situations (Minsky 1975). A frame has both content and procedural knowledge and is interconnected to other frames, much




Table 4.4

Sample R1 production rule

IF:
(1) The current context is assigning devices to Unibus modules, and
(2) There is an unassigned dual-port disk drive, and
(3) The type of controller it requires is known, and
(4) There are two such controllers, neither of which has any devices assigned to it, and
(5) The number of devices that these controllers can support is known ...

THEN:
(1) Assign the disk drive to each of the controllers, and
(2) Note that the two controllers have been associated and that each supports one device.

Source: McDermott 1982

Production rules take the form of:
IF some conditions exist (i.e. list 1-5)
THEN apply some conclusions (i.e. list 1-2)

as human memory is thought to be organized. A frame stores what is typically known about a specific subject, together with expectations for comparison against observed data. Each frame has memory areas known as slots that, when filled in, describe objects and give correspondences between sets of attributes (Table 4.5). Along with a frame comes information such as:

• how to use the frame;
• how to handle the unexpected; and
• slot default values (Gevarter 1983b).

Scripts are frame-like prototypes used to describe classes of subjects, scenes, or events such as a child's birthday party, story writing, eating out, or identifying manufactured devices (Schank & Abelson 1977). They are generalized sequences of events that are assumed to happen 'this' time because they have 'usually' happened in the 'past' (i.e., prior knowledge used to interpret new information). A birthday party script might include such stereotypical scenes as playing games, eating ice cream and cake, opening presents, singing happy birthday, etc.

Frames and scripts as a knowledge representation approach may be particularly effective when representing inter-entity relationships. For example, suppose that a natural language system with a birthday script were given the input sentence, 'Peter and Mary gave Martin a cake with candles, and he blew them out'. The script, equipped with knowledge about the birthday candle ritual, would enable the system




Table 4.5

Sample frame representation

FRAME
Prototype: Grizzly
Colour: Black
Size: Large
Texture: Hairy

Given an instance of the class (an instantiation), GI:
Instance of: Grizzly


When asked what colour GI is, the frame system would reply ‘black’

A prototypical description of a grizzly bear — frame system
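The default-and-instance behaviour of Table 4.5 can be sketched in a few lines; the class and slot names below are invented for illustration, not taken from any particular frame language.

```python
# A minimal frame sketch: an instance inherits slot defaults from its
# prototype unless it fills the slot itself (the Table 4.5 behaviour).

class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots

    def get(self, slot):
        if slot in self.slots:          # slot filled in this frame
            return self.slots[slot]
        if self.parent is not None:     # otherwise ask the prototype
            return self.parent.get(slot)
        return None

grizzly = Frame("Grizzly", colour="black", size="large", texture="hairy")
g1 = Frame("G1", parent=grizzly)        # an instantiation of the prototype

print(g1.get("colour"))                 # black (inherited default)
```

The lookup chain from instance to prototype is the simplest form of the default inheritance that real frame systems provide.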

to assign the ambiguous pronoun references in the text 'he blew them out' to the appropriate referents.

Semantic networks. Semantic nets (for short) are associative knowledge representations. They use nodes linked together by directed arcs (labelled edges) to describe concepts, facts, expressions, events, meanings, and object properties and their relations. Concept nodes (objects) and predicate nodes showing causal relationships (verbs) have meaning only if they are associatively linked to other nodes. The fundamental assumption in semantic nets is that all the information regarding a particular set of concepts and their relationships to other concepts is located in the same semantic vicinity (Schubert et al. 1979). In the graphical representation in Fig. 4.5, 'circles' represent concept nodes (George, Mary, Bruce, Book, Female), 'ovals' represent predicate nodes (loves, gave), and arrowed lines represent arcs showing causal relationships (A, B, C; linked, ordered pairs). 'Female' is an additional characteristic in Mary's 'semantic' vicinity. Based upon relatively general concepts, semantic nets use nodes to represent entities and arrowed lines to indicate relationships between attached nodes. The knowledge engineer must choose the appropriate combinations of linked nodes to represent the expert's knowledge in the system (Brachman 1978).

Fig. 4.5 — Sample semantic representations.

Blackboards. In large application systems, a promising knowledge representation approach has been to separate the knowledge base into independent modules. Each module acts as a mini-knowledge base and is responsible for subtasks of the system as a whole. These mini-expert systems communicate via a shared knowledge base called a blackboard, where domain information, along with partial problem solutions and current activities, is stored and coordinated (Hayes-Roth et al. 1983). Modular knowledge bases can use a combination of production rules, frames/scripts, or semantic nets as knowledge representation approaches. HEARSAY II (Erman et al. 1981), AGE (Nii & Aiello 1979), and SOPHIE (Brown et al. 1982) have used the blackboard structure. When various levels of knowledge are needed and external events arrive at the system frequently, the blackboard approach to knowledge representation may be particularly effective (Ham 1984).

Whatever knowledge representation scheme is chosen, organizing the rules so that they work collectively and are readily accessible by the program is one of the basic tasks of the knowledge engineer.

Control structure
The inference engine or rule interpreter is the control structure the expert system uses to form inferences and to make diagnoses. It tells the system what to do next. By selecting which rules to activate and execute, the inference engine determines how the knowledge will be used to solve a problem. There are three common control strategies:

Data-driven (also called forward chaining and bottom-up). This control strategy starts from the initial data and moves upward through the rules to form a line of reasoning. The inference engine scans through the rules, finds one to apply, applies it, updates the global database, and resumes scanning until no further rules can be invoked (Table 4.6). The R1 system uses a variant of the data-driven approach as its control strategy (McDermott 1982).




Table 4.6 — Sample summary of data-driven procedure

1  Database (with initial data and facts)
2  Do 3-5 until database satisfies goal state
3  Test the condition side of each rule against the database
4  Select a rule, say A, from all applicable rules
5  Apply rule A and modify the database
6  Stop

Source: Cox (1984, p. 238). The data-driven approach uses forward chaining of rules to form a line of reasoning.
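The six-step loop of Table 4.6 can be sketched as a toy interpreter; the rules and facts below are invented examples (this is not R1).

```python
# Data-driven (forward-chaining) control, following Table 4.6: test every
# rule's condition side against the database, apply one applicable rule,
# and repeat until the database satisfies the goal state.

def forward_chain(facts, rules, goal):
    facts = set(facts)                          # step 1: initial database
    progress = True
    while goal not in facts and progress:       # step 2: loop until goal
        progress = False
        for conditions, conclusion in rules:    # step 3: test conditions
            if set(conditions) <= facts and conclusion not in facts:
                facts.add(conclusion)           # steps 4-5: apply rule A
                progress = True
                break                           # resume scanning from the top
    return facts                                # step 6: stop

rules = [
    (["engine will not start", "battery is flat"], "charge the battery"),
    (["charge the battery"], "engine starts"),
]
result = forward_chain(["engine will not start", "battery is flat"],
                       rules, "engine starts")
print("engine starts" in result)                # True
```

Each pass adds at most one conclusion to the database, so the line of reasoning is built up fact by fact from the initial data.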

Goal-driven (also called backward chaining and top-down). This problem-solving approach chains backward from the conclusion or goal to be achieved, working from goal to subgoal, until the path to the conclusion is found — basically, a 'here-is-a-goal; how-do-we-get-to-it' approach. Since the inference engine is looking for consequents that can lead to a particular goal, it only considers rules that are related to reaching that goal (i.e., it is goal-sensitive). In the goal-driven strategy, the validity of the condition statements of one rule is influenced by the action statements of succeeding rules (Cox 1984). The MYCIN system uses a variant of the goal-driven approach as its control strategy (Shortliffe 1976).
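The goal-sensitive selection described above can be sketched as a recursive search; this is a toy illustration, not MYCIN, and the rules and facts are invented.

```python
# Goal-driven (backward-chaining) control: to establish a goal, consider
# only rules whose conclusion matches it, and pursue their conditions as
# subgoals. A toy sketch with invented rules; not MYCIN's knowledge base.

def backward_chain(goal, facts, rules):
    if goal in facts:                            # goal already established
        return True
    for conditions, conclusion in rules:
        if conclusion == goal:                   # goal-sensitive selection
            if all(backward_chain(c, facts, rules) for c in conditions):
                return True
    return False

rules = [
    (["stain is gram-positive", "morphology is coccus"],
     "class is coccus-positive"),
    (["class is coccus-positive"], "organism may be staphylococcus"),
]
facts = {"stain is gram-positive", "morphology is coccus"}
print(backward_chain("organism may be staphylococcus", facts, rules))  # True
```

Note how the engine never examines a rule whose conclusion is irrelevant to the current goal: that pruning is what distinguishes goal-driven from data-driven control.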


Combination-driven (data- plus goal-driven approaches). Using a mixture of both the bottom-up and top-down processes, a solution is reached when the chaining converges. Bidirectional searching requires that both initial descriptions and goal descriptions be found in the global database. The inference engine decides at each step which type of rule to invoke. When the condition coded in the initial description section of the database matches the consequent coded in the goal description section, a solution is reached (Nilsson 1980).

Meta-rules. Whatever problem-solving strategy is chosen, meta-rules (rules about rules) may be added. To make a program more sensitive to the problem at hand, meta-rules can direct the use of rules in the knowledge base. When higher-level rules are used to reason about lower-level rules, facts and rules that might have been ignored (because they seemed less important) may be executed. In addition, when more than one rule applies, meta-rules can give priority to one rule over another (Table 4.7). Before processing every applicable rule, meta-rules can be used to prune the list to a more task-specific size.

Knowledge utilization
After embedding the knowledge in the computer, the knowledge engineer's job consists mainly of refining, testing, and expanding:
• Refining the knowledge includes using feedback from the expert for debugging.
• Testing the system includes comparing the prototype against a library of case studies.
• Expanding the prototype may include prototype II and/or III versions before a full-fledged expert system is reached.




Building an expert system may take years, even for experienced knowledge engineers. The number of rules in a modular prototype is usually estimated to be near 50; a prototype contains some 200 rules; and a completed expert system may have 500 rules. It may take the knowledge engineer one hour to extract one production rule from one expert (Edelson 1982). Given this formula, it is easy to see how the construction of an expert system could take many years.

4.5


Tools have been designed for the sole purpose of aiding expert systems construction. Non-computer knowledge-base authors can look to expert system skeletons for direction. Skeleton systems are shells that provide a framework for building expert systems. Most contain an inference engine and a variety of programming aids (e.g., natural language interfaces, editors, knowledge acquisition tools, etc.); the user must supply the knowledge base. EMYCIN (Van Melle 1979), EXPERT (Weiss & Kulikowski 1984), and HEARSAY II (Erman et al. 1981) are examples of early expert system skeletons designed to make building expert systems easier. However, such programs have drawbacks; the knowledge engineer must be concerned with determining whether:
• the new task fits the old framework (the biggest drawback);

Table 4.7 — Sample MYCIN meta-rule

META-RULE 2
IF:   (1) The patient is a compromised host, and
      (2) there are rules which mention in their premise pseudomonas, and
      (3) there are rules which mention in their premise klebsiellas ...
THEN: There is suggestive evidence (.4) that the former should be done before the latter.

Source: Feigenbaum (1977, p. 1022).

Meta-rules are really ‘what-next' rules; they act as ‘strategy con­ trollers’, guiding choices among potential solution paths.
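A meta-rule of the Table 4.7 kind can be sketched as a function that reorders the applicable object-level rules before any is applied; the preference criterion and rule names below are invented for illustration, not MYCIN's actual code.

```python
# A meta-rule as a 'what-next' controller: given several applicable rules,
# reorder (or prune) them before the inference engine applies any one.
# The preference criterion here is an invented example.

def meta_rule(applicable):
    # prefer rules whose premise mentions the compromised-host finding
    return sorted(applicable,
                  key=lambda r: "compromised host" not in r["premise"])

applicable = [
    {"name": "klebsiella-rule",  "premise": ["klebsiella"]},
    {"name": "pseudomonas-rule", "premise": ["pseudomonas", "compromised host"]},
]
ordered = meta_rule(applicable)
print(ordered[0]["name"])   # pseudomonas-rule is tried first
```

Because the meta-rule only reorders the conflict set, the object-level rules themselves need no knowledge of the strategy that schedules them.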




• the new expert's way of solving a problem is compatible with the old control structure;
• the new task can use the old language;
• the new system will be affected by any task-specific knowledge that may be hidden in the old system (Hayes-Roth et al. 1983).

Microcomputer (personal computer) expert systems are generally based on generic architectures, to which some manufacturers add special tools and modifications. For example, Personal Consultant™ from Texas Instruments adds a development engine to the system components. Generally, the small systems for personal computers can be used to build knowledge systems of about 400 rules. These systems afford an interested domain expert the ability to construct an expert system. Several different software products are available for developing expert systems. ES/P ADVISOR employs backward chaining and heuristics called production rules; when combined with PROLOG the system is very powerful. EXPERT-EASE builds rules in a matrix that are evaluated through a decision tree. INSIGHT is based on if-then rules and uses backward chaining, and is a diagnosis/prescriptive system. All of the systems, except EXPERT-EASE, require the use of a word processor, and all of the systems are IBM PC compatible. Some readily available skeleton systems are listed in Appendix B. A more detailed discussion of skeleton systems (also named shells or knowledge engineering tools) follows in connection with AI languages in Chapter 6.

4.6


The knowledge engineer has a variety of choices for system architecture, but must work within the environment in which each tool and technique operates and must be prepared to deal with its limitations (Stefik et al. 1982). Work in expert systems has been largely experimental — building and testing — with little commonality in architectural design. Thus, the knowledge engineer must look at the complexity of the problem and the availability of inputs and desired outputs, and select or design data and control structures appropriate for the particular application. Designing expert systems has been referred to as an 'art' — where most knowledge engineering artists are trying to find the answers to Schank's basic questions concerning:
• the nature of the knowledge;
• the abstraction from existing knowledge to more general rules;
• the modification of problem-solving that is independent of the domain; and
• the relationship of plans and goals to understanding (Waldrop 1984b).

Producing expert-level performance from a computer is not an easy task. 'The thing is', says Schank, 'AI is very hard' (Waldrop 1984b, p. 805).

Inherent limitations of expert systems
The principal problems that are limiting progress are:
• Representing temporal knowledge
• Representing spatial knowledge
• Performing commonsense reasoning
• Recognizing limits of their ability (cycles)
• Handling inconsistent knowledge
• Inability to perform knowledge acquisition
Other pitfalls in planning an expert system, and possible remedies, are:
• Problem too difficult (build a small-sized prototype system)
• Cost-effectiveness
• Problem too large or unstructured, so that an excessively large number of rules is required (pre-structure the problem domain)
• Choice of an inappropriate tool for knowledge engineering (get second opinions)
• Choice of an inappropriate knowledge engineering language. Expert systems can be built much faster in a knowledge engineering language than in a conventional programming language.

Specific limitations:
(a) Domain expert. It is difficult to extract rules from the expert. Pick an expert skilled in the target domain, and make sure the expert understands the rules.
(b) Development process. During system development, the knowledge of the expert becomes intertwined with the program. Separate domain-specific knowledge from the rest of the program.



DESCRIPTIVE CATEGORIZATION
A Diagnosis — Acid/Base Electrolyte Disorders Computer-Aided Instruction — Mathematics Planning — Robotics Maintenance — Telephone Cables Image Understanding Expert Systems Design Tool — Blackboard Model Information Presentation Diagnosis — Medicine (Rheumatism & Arthritis) Planning — Naval Aircraft Take-off & Recovery Signal Interpretation — Medicine (Left Ventricle Performance) Expert Systems Design Tool Concept Formation — Mathematics Signal Interpretation — Military Situation Determination Expert Systems Design Tool — Skeleton — PCs Expert Systems Design Tool — Shell for Electronic Systems Diagnosis Expert Systems Design Tool

ORGANIZATION / VENDOR* MIT BBN SRI Bell Stanford Stanford BBN Rutgers/Missouri CMU Toronto Int. Terminals CMU MITRE Logic Prog. Assoc. Smart Sys. Inference Corp.

App. A]



Planning — Selects Procedures Used by Independent Auditors



B Planning — Battlefield Weapons Assignment Computer Aided Instruction — Diagnostic Skills



C Diagnosis — Internal Medicine Project Management Diagnosis — Medicine (Glaucoma) Maintenance — Electric/Diesel Locomotives Analysis — Medicine (Pulmonary Diseases) Automatic Programming Analysis — Chemistry (DENDRAL variant) Knowledge Representation Language Consultant — Tax Advisor Diagnosis — Computer Faults Analysis — Digital Circuits Analysis — Protein Crystallography Consultant — Nuclear Power Plant Configuration Analysis — Organic Chemistry (C-13 Spectra)

Pittsburgh DEC Rutgers GE Stanford Kestrel Stanford Tokyo Denki Columbia ICL Rutgers Stanford Georgia Stanford


D Consultant — Manufacturing Assembly Diagnosis — Computer Faults Knowledge Representation Language Computer-Aided Instruction — Mathematics Automatic Programming Analysis and Interpretation — Chemistry — Organic Compounds Planning — Planetary Flybys Diagnosis — Internal Medicine Consultant — Medicine Analysis and Interpretation — Geology (Well-Logging Problems) Computer System Sizing Diagnosis — Oil-Well Drilling-Bit Sticking Problems Knowledge Representation Language — Logic Based


E Auditing Advanced EDP Systems Analysis — Electric Circuits Interpretation — Oil Well-Log Data Consultant — Welfare Rights Legislation Diagnosis — Medicine (Chest Pains) Expert Systems Design Tool — Skeleton Knowledge Acquisition Tool Heuristic Formation — Mathematics Computer-Aided Instruction — Mathematics Expert Systems Design Tool — Skeleton Expert Systems Design Tool — With Knowledge Acquisition — PCs Automatic Programming — Tactical Data Fusion Expert Systems Design Tool — PCs


F Knowledge Representation Language — Frames &


Hull Stanford/IBM DEC PARC SRI Stanford JPL Pittsburgh MIT MIT/Schlum. ICL Teknowledge Smart Sys.

Brig. Yg./Florida MIT AMOCO Open Un. UCSF Stanford Boeing Stanford Stanford Rutgers Perrone Lockheed EXSYS





Predicate Calculus



Analysis — Chemistry (DNA Structures) Management — Genetic Engineering (Sequencing Nucleic Acid) Planning — Gene-Splicing Experiments Analysis — Chemistry (DENDRAL Variant) Expert Systems Design Tool General Problem Solving Computer Aided Instruction — Medical Diagnosis (Tutor for MYCIN's Knowledge Base) Model Management System — Mathematics


H Design — PC Board Configuration Automatic Programming — Simple Programs for Limited Domains Expert Systems Design Tool Speech Understanding Signal Interpretation — Ocean Surveillance Consultant — Medicine (Psychopharmacology) Speech Understanding Expert Systems Design Tool — Skeleton Diagnosis — Medicine Planning — Medicine (Hodgkin's Disease) Speech Understanding Consultant — Water Resource Problems

Stanford IntelliCorp IntelliCorp Stanford GHE CMU/Rand Stanford Naval Ps. Hazeltine MIT Martin Mar. CMU System Contr. Stanford CMU





Diagnosis — Computer Faults (PDP hardware problems) Management — Automated Factory Signal Interpretation — Ocean Surveillance (Tracking) Diagnosis — Soybean Diseases Diagnosis — Internal Medicine Planning — Stacking Blocks Interpretation — News Stories about Terrorism Expert Systems Design Tool Planning — Delivery Dates for Computer Systems Job Shop Scheduling K Knowledge Acquisition Tool Expert Systems Design Tool — PC version available Design — VLSI Circuits Knowledge Representation Language — Frames Expert Systems Design Tool Expert Systems Design Tool Knowledge Representation Language Design — Medical Diagnosis System Knowledge Management System Planning — Missions Knowledge Representation Language — Frame Knowledge Representation Language Expert Systems Design Tool — Industrial Diagnostic & Advising Applications L Consultant — Product Liability Legislation Automatic Programming

DEC CMU Verac Illinois Pittsburgh Edinburgh Yale Rutgers DEC CMU SRI Gold Hill PARC/Stanford IntelliCorp IntelliCorp Sfts. A&E BBN Maryland Sys. Devel. MITRE XEROX-PARC Fairchild Teknowledge

Rand Stanford






Expert Systems Design Tool — PCs Mathematical Theorem Proving Signal Interpretation — Well Logs Knowledge Representation Language — Object-Oriented M Analysis — Symbolic Mathematics Analysis — Symbolic Mathematics Diagnosis — Medicine Analysis — Mechanics Problems Expert Systems Design Tool — Medical Consultant Automatic Rule Formation — Chemistry Planning — Molecular Genetics Knowledge Representation Language Diagnosis — Medicine (Bacterial Infections) Expert Systems Design Tool — PCs N Expert Systems Design Tool — Skeleton Planning — Robotics Computer-Aided Instruction — MYCIN Reorganized for Teaching, using GUIDON Expert Systems Design Tool — Object-Oriented O Consultant — Medicine Design — PC Board Configuration Planning — Errands Knowledge Representation Language — Production Rules Knowledge Representation Language — Semantic Nets P Chess Automatic Programming Diagnosis — Automobile Engine Fault Expert Systems Design Tool — PCs

Lightyear CMU/Rand Stanford/Schlum. XEROX-PARC

MIT MIT OSU Edinburgh Tokyo Stanford Stanford Stanford Stanford Teknowledge

CMU SRI Stanford Neuron Data Stanford Hazeltine Rand CMU MIT

SRI Stanford SRI TI

Diagnosis — Machine Processes Analysis — Protein Sequences Automatic Programming — Modelling of Oil WellLogs Diagnosis — Medicine Consultant — Manufacturing Coating Problems

CMU/Westingh. IntelliCorp Schlumberger

Automatic Programming



Analysis and Interpretation — Geology (Mineral Deposits) Knowledge Representation Language — Human Cognition. Led to Design — Computer Program Synthesis Management — Test & Assembly Plant Diagnosis — Medicine (Pulmonary Functions)


Knowledge Representation Language



R Design — Computer Questions for User




MIT Hull

CMU OPS5 Kestrel DEC Stanford







Diagnosis — Computer Hardware/Software Faults Expert Systems Design Tool Consultant — Medicine (Radiology) Diagnosis — Nuclear Reactor Accidents Consultant — Medicine (Diagnostic Prompting) Management — Biographical Data Expert Systems Design Tool — Uses a Pascal-Type Language — PCs Expert Systems Design Tool Knowledge Representation Language Expert Systems Design Tool Knowledge Representation Language — Simulation Planning — Mission Flights Information Retrieval Expert Systems Design Tool — Induces Rules from Decisions Design — Configuration of Computer Systems S Analysis — Computer Program Structure Design — Digital Electronics Automatic Programming Analysis — Symbolic Mathematics Analysis — Scripts Computer-Aided Instruction — Coaching A Game Planning — Chemical Synthesis Diagnosis — Medicine (Serum Protein Electrophoresis) Analysis — Symbolic Mathematics Solves Symbolic Mathematics Computer-Aided Instruction — Electronic Trouble-Shooting Analysis — Computer Error Logs Analysis — Earthquake Damage Planning — Molecular Genetics Experiments Crisis Management — Chemical Spills Signal Interpretation — Sensors on Naval Vessels Computer-Aided Instruction — Steam Propulsion Plant Operation Planning — Robotics (Move Objects Between Rooms) Signal Interpretation — Machine Acoustics Computer-Aided Instruction — Air Battle Simulation Design — Circuit Synthesis Design — Chemical Synthesis

ICL IBM Rutgers EG&G UCSF CNSR InfoTym Rand Stanford Rand Rand Sys. Cont. AI&DS Radian CMU/DEC Stanford Maryland

USC MIT Yale BBN UCSC Rutgers MIT Inference BBN DEC Purdue Stanford Rand NOSC BBN SRI Stanford Rand MIT SUNY


T Planning — Tactical Targeting Planning — Estate Planning — Financial Analysis — Naval Task Force Threat Knowledge Acquisition Tool — Used with MYCIN Expert Systems Design Tool — Skeleton — PCs Audit Documentation Knowledge Acquisition Tool Planning — Military Air Operations Planning in wargames

Rand Illinois Rutgers Rand/NOSC Stanford Helix Minnesota GRC IABG


U Knowledge Representation Language — With Knowledge Acquisition



App. B]


V Image Understanding Diagnosis — Monitoring Respiration

Mass. Stanford


W Analysis — Seismic Data Computer-Aided Instruction — Coaching a Game Computer-Aided Instruction — Coaching a Game Computer-Aided Instruction — Coaching a Game

Teknowledge BBN BBN MIT


X Expert Systems Design Tool Consultant — Computer Sales Expert Systems Design Tool — Skeleton — PCs

MIT DEC California

To be named Management — Nuclear Power Reactor Diagnosis — Manufacturing (Integrated Circuit Fabrication) Cost Estimation of Steam Boilers Analysis — Strategic Indicators and Warnings Analysis — Battlefield Communications Risk Assessment — Construction Projects

Hitachi Hitachi Hitachi Teknowledge Teknowledge XEROX-PARC


British expert system shells

AL/X
Application area: Analysis
Knowledge Representation: Inference Network; IF-THEN rules
Inference Engine: Goal-directed Search; Data-driven Answer Propagation
Facilities: Explanation; Uncertainty of Knowledge (Bayesian); Session Log; Diagram of Inference Network
Example application: Cause of Automatic Shutdown of Oil Production Platforms
Development language: PASCAL
Availability: IBM PC, Apple

APES
Application area: Analysis
Knowledge Representation: Logic
Inference Engine: Goal-directed Depth-first Search
Facilities: Explanation
Example application: Diagnosis of Pipe-corrosion; Social Service Benefit Entitlement; Dam Site Location
Development language: micro PROLOG
Availability: Spectrum, BBC/B, IBM PC

ES/P ADVISOR
Application area: Analysis
Knowledge Representation: Inference Network; IF-THEN rules
Inference Engine: Goal-directed Depth-first Search
Facilities: Explanation; Checkpoint and Rerun; Session Log
Example application: New Employee Procedure (from Employers' Guide PAYE)
Development language: PROLOG-1
Availability: IBM PC, DEC Rainbow, SIRIUS

EXPERT
Application area: Analysis
Knowledge Representation: Inference Network; IF-THEN rules
Inference Engine: Goal-directed Search; Data-driven Answer Propagation
Facilities: Explanation; Uncertainty of Knowledge (Bayesian); Goal Sensitivity Analysis
Example application: Dump Analysis
Development language: PASCAL
Availability: PASCAL source code

EXPERT-EASE
Application area: Classification
Knowledge Representation: Tuples of Attributes and Outcome
Inference Engine: None; Data-driven Decision Hierarchy
Facilities: Trace of Path taken in Decision Hierarchy; Rule-induction on Example Tuples
Example application: Prediction of Bear/Bull Market
Development language: PASCAL, FORTRAN
Availability: IBM PC

micro-EXPERT
Application area: Analysis
Knowledge Representation: Inference Network; IF-THEN rules
Inference Engine: Goal-directed Search; Data-driven Answer Propagation
Facilities: Explanation; Uncertainty of Knowledge (Bayesian); Fuzzy Values; Control; Multiple Goals; Session Log
Example application: Project Management; Naive Terminal Fault Diagnosis; Knowledge Archiving; Risk Analysis
Development language: PASCAL
Availability: IBM PC, SIRIUS, SAGE, VAX/VMS, PDP 11, IBM 3000/4000

NEXUS
Application area: Analysis (Financial Models)
Knowledge Representation: Inference Network; IF-THEN rules; Equations
Inference Engine: Goal-directed Search
Facilities: Explanation; Uncertainty of Knowledge (Bayesian); Fuzzy Comparison; Knowledge Editor; Windowing (Dialogue, Help, Trace, ...)
Example application: Released
Development language: PASCAL, (C)
Availability: IBM PC

REVEAL
Application area: Analysis
Knowledge Representation: Production rules; Tuples
Inference Engine: Goal-directed (own code); Data-driven (own code)
Facilities: Explanation; Rule-Induction on Example Tuples (own code); Fuzzy tuples; Integrated Environment/Shared Context; Windowing; Multiple Goals; Session Log
Example application: Personnel Evaluation; Financial Decisions; Production Quality Control
Development language: FORTRAN, (REVEAL)
Availability: IBM XT, ICL 2900, IBM

SAGE
Application area: Analysis
Knowledge Representation: Inference Network; Blocks of IF-THEN rules
Inference Engine: Goal-directed Search
Facilities: Explanation; Uncertainty of Knowledge (Bayesian); Fuzzy Comparison; Control; Checkpoint and Rerun; Session Log
Example application: Gear-Box Fault Diagnosis; Intelligent Front-End; Legal Advice
Development language: PASCAL
Availability: ICL PERQ, VAX/VMS, PDP 11, ICL 2900


Beerel, A. C. (1987) Expert systems: strategic implications and applications, Ellis Horwood: Chichester.
Brown, J. S., Burton, R. R. & DeKleer, J. (1982) 'Pedagogical, natural language and knowledge engineering techniques in SOPHIE I, II & III', pp. 227-282. In: Intelligent tutoring systems, Sleeman, D. & Brown, J. S. (eds), Academic Press: London.
Buchanan, B. G. & Feigenbaum, E. A. (1981) 'DENDRAL and META-DENDRAL: their applications dimension', pp. 313-322. In: Readings in artificial intelligence, Webber, B. L. & Nilsson, N. J. (eds), Tioga Press: Palo Alto, Cal.
Buchanan, B. G. & Shortliffe, E. H. (1984) Rule-based expert systems, Addison-Wesley: Reading, Mass.
Cox, I. J. (1984) 'Expert systems', Electronics and Power, 30(3), pp. 237-240.
Davis, R. (1976) Applications of meta-level knowledge to the construction, maintenance and use of large knowledge bases, STAN-CS-76-564, Stanford University, Stanford, Cal.
Davis, R. (1981) 'Interactive transfer of expertise: acquisition of new inference rules', pp. 410-428. In: Readings in artificial intelligence, Webber, B. L. & Nilsson, N. J. (eds), Tioga Press: Palo Alto, Cal.
Davis, R. (1982) 'Expert systems: where are we? And where do we go from here?', AI Magazine 3(2).
Davis, R., Buchanan, B. G. & Shortliffe, E. (1975) Production rules as a representation of a knowledge-based consultation program, STAN-CS-75-519, Stanford University, Stanford, Cal.
Duda, R. O., Hart, P. E., Barrett, P., Gaschnig, J., Reboh, R. & Slocum, J. (1978) Development of the PROSPECTOR consultation system for mineral exploration, Project 6415, Stanford Research Institute, Menlo Park, Cal.
Erman, L. D., London, P. E. & Fickas, S. F. (1981) 'The design and an example use of HEARSAY-III', IJCAI-81, pp. 409-415.
Feigenbaum, E. A., Buchanan, B. G. & Lederberg, J. (1971) 'On generality and problem-solving: a case study using the DENDRAL program', pp. 165-190. In:

Ch. 4]



Machine Intelligence, Vol. 6, Edinburgh University Press: Edinburgh, Scotland. Feigenbaum, E. A. & McCorduck, P. (1983) The fifth generation: artificial intelli­ gence and Japan's challenge to the world, Addison-Wesley: Reading, Mass. Gabriel, R. P. (1985) Performance and evaluation o f Lisp systems, The MIT Press: Cambridge, Mass. Gevarter, W. B. (1983a) ‘Expert systems: limited but powerful’, I EEE Spectrum 20(8) pp. 39-45. Gevarter, W. B. (1983b) An overview o f artificial intelligence and robotics, Vol. 1: Artificial Intelligence, Part C, Basic artificial intelligence topics, NASA 85839, National Technical Information Service, Springfield, Virginia. Ham, M. (1984) ‘Playing by the rules’, PC World, January, pp. 34-41. Hayes-Roth, F. (1981) A l the new wave — a technical tutorial for R & D management, AIAA-81-8027, Rand Corporation, Santa Monica, California. Hayes-Roth, F., Waterman, D. A. & Lenat, D. B. (1983) Building expert systems, Addison-Wesley: Reading, Mass. Lenat, D. B. (1977) ‘Automated theory formation in mathematics’, IJCAI-77, pp. 833-842. Lenat, D. B. (1978) ‘The ubiquity of discovery’, AFIPS-78, pp. 241-256. Lenat, D. B., Sutherland, W. R. & Gibbons, J. (1982) ‘Heuristic search for new microcircuit structures: an application of Artificial Intelligence’, A I Magazine, 3(3), pp. 17-33. Manuel, T. & Evanczuk, S. (1983) ‘Commercial products begin to emerge from decades of research: expert and natural-language systems heralds what could be a tidal wave’, Electronics, 56(22), pp. 127-129. McCorduck, P. (1977) ‘History of Artificial Intelligence’, IJCAI-77, pp. 951-954. McDermott, J. (1982) ‘Rl: a rule-based configurer of computer systems’, Artificial Intelligence 19(1), pp. 39-88. Minsky, M. (1975) ‘A framework for representing knowledge’, pp. 211-280, In: The psychology o f computer vision, Winston, P. H. (ed.), McGraw-Hill: New York. Nii, H. P. & Aiello, A. 
(1979) 'AGE (attempt to generalize): a knowledge-based program for building knowledge-based programs', IJCAI-79, pp. 645-655.
Nilsson, N. J. (1980) Principles of Artificial Intelligence, Tioga Press: Palo Alto, Cal.
Quinlan, J. R. (1986) Applications of expert systems, Addison-Wesley: Reading, Mass.
Reboh, R. (1979) The knowledge acquisition system, Report 6415, Stanford Research Institute, Menlo Park, California.
Rich, E. (1983) Artificial Intelligence, McGraw-Hill: New York.
Rich, E. (1984) 'The gradual expansion of Artificial Intelligence', Computer, 17(5), pp. 4-12.
Roberts, S. K. (1983) 'Artificial Intelligence', Mini-Micro Systems 16(11), pp. 229-233.
Rolston, D. W. (1988) Principles of Artificial Intelligence and expert systems development, McGraw-Hill: New York.
Schank, R. C. & Abelson, R. (1977) Scripts, plans, goals and understanding, Lawrence Erlbaum: Hillsdale, NJ.




Schubert, L. K., Goebel, R. G. & Cercone, N. J. (1979) 'The structure and organization of a semantic net for comprehension and inference', pp. 121-175. In: Associative networks: representation and use of knowledge by computers, Findler, N. V. (ed.), Academic Press: New York.
Shortliffe, E. H. (1976) Computer-based medical consultations: MYCIN, Elsevier: New York.
Shurkin, J. N. (1983) 'Expert systems: the practical face of artificial intelligence', Technology Review, 86, pp. 72-78.
Stefik, M., Aikins, J., Balzer, R., Benoit, J., Birnbaum, L., Hayes-Roth, F. & Sacerdoti, E. (1982) 'The organization of expert systems: a tutorial', Artificial Intelligence 18(2), pp. 135-173.
Szolovits, P. (ed.) (1982) Artificial Intelligence in medicine, Westview Press: Boulder, Co.
Touretzky, D. S. (1984) LISP: a gentle introduction to symbolic computation, Harper and Row: New York.
Van Melle, W. (1979) 'A domain-independent production-rule system for consultation programs', IJCAI-79, pp. 923-925.
Van Melle, W. (1980) A domain-independent system that aids in constructing knowledge-based consultation programs, STAN-CS-80-810, Stanford University, Stanford, California.
Waldrop, M. M. (1984a) 'The necessity of knowledge', Science 223(4642), pp. 1279-1282.
Waterman, D. A. (1986) A guide to expert systems, Addison-Wesley: Reading, Mass.

Natural language processing

Much of the intellectual power inherent in computers is untappable by most humans because of a language barrier. Computers operate with machine language — 1's and 0's — which is a form of discourse so far removed from normal language that only very small, tortured conversations can take place. This language barrier led early on to the development of languages that were somewhat more comprehensible to humans but could still be translated into machine language. Many so-called high-level languages were developed and are still being developed. No programming language so far developed, however, is even close to human language. Thus only those who have mastered a programming language have access to the power of computers. And mastering a programming language is a difficult, time-consuming task, in contrast to, for example, learning to drive a car. The ideal form of discourse between humans and computers would be in a language natural to humans — that is, natural language. Much effort has gone into programs that will be able to understand natural language; more effort and resources, probably, than in any other area of AI. If the language barrier can be overcome, computers can start to do many things that today are beyond them. For one thing, they would be able to respond intelligently to human inquiries instead of simply spitting out some pre-recorded reply. This would be of obvious advantage in correcting billing errors by stores, in ordering products by telephone, and in putting new data into a computer system. But discourse in natural language, as used by humans, presents some extremely difficult problems of mechanization. Nils J. Nilsson writes, in Principles of AI:

When humans communicate with each other using language, they employ, almost effortlessly, extremely complex and still little understood processes.
It has been very difficult to develop computer systems capable of generating and 'understanding' even fragments of a natural language, such as English. One source of the difficulty is that language has evolved as a communication medium between intelligent beings. Its primary use is for transmitting a bit of 'mental structure' from one brain to another under circumstances in which each brain possesses large, highly similar, surrounding mental structures that serve as a common context . . . . A word to the wise from the wise is sufficient.

Thus generating and understanding language is an encoding and decoding problem of fantastic complexity. While the problems are difficult, a great deal of progress has been made, at least




toward programs that can parse English sentences, pick out nouns, verbs, adverbs, and so forth, and answer simple questions about the subject matter. Some programs are capable of understanding simple narratives and can respond appropriately when questioned. One program can process news stories off the Associated Press news wire and pick out key ideas and concepts with about 90% success.

Computer programs have great difficulty, however, with unreferenced pronouns such as 'it'. If one human says to another, 'It is raining', he is understood. But most computer programs cannot figure out what the 'it' is that is doing the raining. Programs also have trouble with analogical statements. (The statement 'Out of sight, out of mind' was once translated by a computer as 'blind, insane'.)

5.1 ARTHUR

ARTHUR (A Reader That Understands Reflectively) (Granger 1983) is a computer program that can understand short, simple narratives and can revise its interpretation of what is occurring as it learns new information that contradicts something it may have learned previously. As an example, ARTHUR was 'told' the following simple story:

Geoffrey Higgins walked into Roger Sherman's movie theatre. He went up to the balcony, where Willy North was waiting with a gram of cocaine. Geoff paid Willy in large bills and left quickly.

To a human it is fairly clear that Geoff went into the theatre to buy cocaine, although it is also possible that Geoff met Willy accidentally. When ARTHUR is asked why Geoff went into the theatre, it replies: 'At first I thought it was because he wanted to watch a movie, but actually it's because he wanted to buy cocaine'. (ARTHUR actually replies in a high-level programming language, using a formal representation known as the predicate calculus.)

ARTHUR operates by generating inferences based on what it has been told. After the first sentence of the above story, ARTHUR comes to the conclusion that Geoff wants to see a movie. Most humans would make the same assumption, lacking any other information. After hearing the second sentence, however, that Willy is waiting with cocaine, ARTHUR reconsiders its first inference and decides that Geoff really wanted to buy cocaine. The third sentence tends to confirm this new inference, although it does not rule out the possibility of an accidental meeting.

How is ARTHUR able to make any sense at all out of a story about two people and a drug sale? Before ARTHUR can come to any kind of conclusion, it must have information about movies, drugs, and how people interact. In brief outline, ARTHUR is provided with goals, plans, and scripts.
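This kind of goal inference can be caricatured in a toy sketch. What follows is a hypothetical illustration, not ARTHUR's actual mechanism or representation: the goal labels, action labels, and scoring rule are all invented here, and the only idea shown is that the goal accounting for the most observed actions of a character is preferred.

```python
# Hypothetical sketch: prefer the candidate goal that explains the most
# observed story actions. All goal and action names are invented labels.

# Each candidate goal maps to the set of story actions it would explain.
candidate_goals = {
    "watch-movie": {"enter-theatre"},
    "buy-cocaine": {"enter-theatre", "go-to-balcony", "pay-seller"},
}

def best_explanation(goals, actions):
    """Return the goal accounting for the most observed actions.
    Ties go to the first-listed (default) interpretation."""
    return max(goals, key=lambda g: len(goals[g] & actions))

# After sentence 1, only entering the theatre has been observed,
# so the default movie-watching interpretation stands (a tie).
print(best_explanation(candidate_goals, {"enter-theatre"}))

# After the full story, the drug-buying goal explains more actions.
print(best_explanation(candidate_goals,
                       {"enter-theatre", "go-to-balcony", "pay-seller"}))
```

The sketch mirrors the revision behaviour described above: the same scoring rule that first favours the default interpretation later supplants it once more actions are known.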
ARTHUR operates on what is called the parsimony principle, namely, ‘The best context for inference is the one which accounts for the most actions of a story character.’ Thus, the inference that Geoff went to the theatre to buy cocaine is a simpler explanation of what happened than any other interpretation. ARTHUR tries to find the goal of the characters; in this case to buy cocaine




rather than to see a movie. The program must also deal with methods for achieving goals; these are the plans and scripts. ARTHUR has previously been told that a movie theatre is a place where a 'scripty' activity occurs, in this case watching a movie. The activity of 'watching a movie' itself has the goal of entertaining the character doing the activity, but it may also have the goal of learning something. ARTHUR may have to determine which of a number of goals associated with 'watching a movie' is appropriate in the context of what it knows about the situation.

When ARTHUR next learns that Geoff buys cocaine from Willy, it infers that Geoff had planned to buy drugs all along. The program then tries to reconcile this new inference with the previous inference of watching a movie for entertainment. In this case, the program cannot find any logical connection between watching a movie and buying drugs. But to understand the story, this conflict between two incompatible inferences must somehow be resolved. The program first attempts to reconcile the difference by hypothesizing that Geoff wanted to entertain himself by using the cocaine. This provides Geoff with two different goals — entertaining himself and using cocaine. However, the parsimony principle allows a simpler interpretation of the two actions of going to the theatre and going up to the balcony — namely, to buy cocaine. Thus the original inference that Geoff went to the theatre to entertain himself is supplanted by the inference that he went there to buy cocaine.

Thus the program works with an 'explanation triple'. The triple consists of a goal (buying cocaine), an event (going to the theatre), and an inferential path connecting the goal and the event (going to the theatre to buy cocaine). Some other stories that ARTHUR can 'understand' and explain include the following:

'Mary picked up a magazine.
She swatted a fly.’ (ARTHUR eventually concludes that Mary picked up the magazine in order to swat the fly rather than to read the magazine.) ‘Carl was bored. He picked up the newspaper. He reached under it to get the tennis racket that the newspaper had been covering.’ (ARTHUR concludes that Carl wanted the tennis racket so he could play tennis, and did not find the racket accidentally.) ARTHUR in its present state can deal with certain types of stories that have trick endings, and which arc called ‘garden path’ stories, such as mysteries, jokes, and fables. In these stories, a correct inference cannot be made until very late in the game. Program author Granger says, ‘. .. ARTHUR’S understanding mechanism is not entirely psychologically plausible in that it does not differentiate between stories with surprise endings and other garden path stories.’ In any case, the way in which humans understand stories and process information (such as a narration) is not understood. 5.2


ROBOT (Harris 1984), which has since evolved into INTELLECT, is a program for answering questions about information contained in large databases. ROBOT




understands questions expressed in natural English and responds in natural English. So far, ROBOT has been placed completely in the hands of users in twelve different companies, among them Bigelow-Stanford Carpet, Commercial Union Assurance, DuPont Chemical, Planning Research Corp., Dun and Bradstreet, and Software AG of North America. The databases include a customer file, a homeowner's insurance file, an employee relations file, a car file, an employee file, a membership file, an accounting file, and a planning file. The size of the databases ranges from 1000 records to more than 2.4 million.

INTELLECT was developed by Artificial Intelligence Corporation, Waltham, Mass. It is intended for use as a front-end interface to information retrieval applications in areas such as finance, marketing, manufacturing, and personnel. INTELLECT parses the user's natural language query into an internal representation that sets off a database search. To do this, in addition to using a grammar, the system draws on knowledge of the database structure, database contents, a built-in data dictionary, and an application-specific dictionary. When a query results in the generation of more than one possible interpretative paraphrase, INTELLECT resolves the ambiguity by assigning preference ratings to the different paraphrases, choosing the one most highly rated owing to its consistency with information in the database. If necessary, INTELLECT asks the user which of several interpretations is correct.

What types of questions can INTELLECT/ROBOT handle?
Actual logs of questions and answers show that INTELLECT/ROBOT was able to handle the following questions (and others of similar complexity):

'Show subtotals of direct commission by month for first quarter 78 in region M where net amount is at least $150000.'

'For domestic machinery, print the July, 1978 sales dollars and sales quantities, sorted by sales class.'

'For commercial division and those products in excess of $500000 year-to-date, show me all time periods whose pre-tax earnings are more than $200000 and print their names.'

'For summaries with loss ratio greater than 200, report region, branch, and loss reserve by month.'

'Print all products whose November, 1977 sales quantities are between 100000 and 300000 and whose pre-tax earnings exceed $1000000.'

One of the objectives of the program's designers was to develop a system that was 'domain independent'. A domain-independent system can be applied to a wide group of databases, rather than being tailored to just one application. Because of this important consideration, some problems that were revealed when the program went into use proved difficult to solve; some of them would have been solved fairly easily if the constraints on domain independence could have been removed.
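INTELLECT's resolution of ambiguous queries by preference ratings, described above, might be sketched as follows. This is a hypothetical illustration only: the schema, the scoring rule (counting referenced fields that exist in the database), and the fallback to asking the user on a tie are all invented for the sketch, not INTELLECT's actual algorithm.

```python
# Hypothetical sketch: rate candidate paraphrases of an ambiguous query
# by their consistency with the database schema. Schema and scoring
# rule are invented for illustration.

schema_columns = {"region", "branch", "loss_ratio", "loss_reserve", "month"}

def rate(paraphrase):
    """Rating = number of referenced fields that exist in the schema."""
    return len(set(paraphrase["fields"]) & schema_columns)

def choose(paraphrases):
    """Pick the highest-rated paraphrase; on a tie, return None
    to signal that the user should be asked which reading is meant."""
    ranked = sorted(paraphrases, key=rate, reverse=True)
    if len(ranked) > 1 and rate(ranked[0]) == rate(ranked[1]):
        return None
    return ranked[0]

p1 = {"text": "loss ratio by region", "fields": ["loss_ratio", "region"]}
p2 = {"text": "loss ratio of person 'ME'", "fields": ["loss_ratio", "person"]}
print(choose([p1, p2])["text"])  # loss ratio by region
```

The design choice mirrors the text: prefer the paraphrase most consistent with the database, and fall back to a clarifying question only when the ratings cannot separate the candidates.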




The problems that did arise in use illustrate some of the difficulties in interpreting natural English. As an example, users often request data that concern one or more states; they may sometimes spell out the state name, such as Maine, Oregon, or Indiana, or may abbreviate it as Me, OR, or IN. In interpreting a question, ROBOT considers all the logical alternatives and selects the best interpretation based on heuristics. But if a user says, 'Print for me the names of all secretaries,' ROBOT cannot determine whether 'me' stands for the state of Maine or is a pronoun referring to the questioner. ROBOT must therefore ask for clarification, and too many requests for clarification irritate the user. In actual practice, users of ROBOT were instructed to spell out the names of Maine, Oregon, and Indiana, but could use abbreviations or full spellings for all other states.

Another problem surfaced when a user asked ROBOT, in a homeowner insurance application, 'How many snowmobiles are there?' ROBOT interpreted this to mean: 'How many insurance policies have snowmobile coverage?' Obviously, some snowmobile owners may have two or three snowmobiles covered, and others may have no coverage at all. ROBOT found all policies that covered snowmobiles and counted the policies, not the number of snowmobiles. This problem was seen as one of data representation. Many other 'How many' questions pose similar problems, and the correct interpretation depends on the specific data being queried. ROBOT is being modified to accept a certain amount of domain-specific information so as to be able to handle a wide variety of 'How many' questions.

Other problems that surfaced in use included the interpretation of what data to retrieve from a matrix and how to summarize information. Fixes are being made to solve each problem as it arises.
Harris (1984) writes about ROBOT:

The ROBOT system is the first natural language system to be put into production use in a variety of real world environments. The problems encountered in these applications are of interest to the research community because in many cases they challenge the fundamental principles on which virtually all current natural language query systems are based. (The four problems presented) are among the first difficult issues that future systems must be able to solve if they are to be of significant use in the real world.



A great deal of work on natural language has been done at Yale University (VEGE and BORIS) and Brown University (PARAGRAM and BRUIN). The following sections describe these programs, which are representative of a larger body of work.

VEGE (1980)

VEGE (Vegetable Gardening Expert, Wendy G. Lehnert and William M. Bain) answers questions about vegetable gardening, using natural language. The program is written in LISP and runs on a DEC 2060. The system parses single-sentence questions about one plant at a time and provides a responsive reply. Many of the




answers the system generates are 'canned', although the system does have other ways to respond. The main work on the system was in the parser rather than in the answer generator.

The system has a predictive parser, which allows it to understand questions about time, planting distance, and depth. Dealing with time presents much the greatest difficulty. VEGE can distinguish between sentences such as 'How long does it take tomatoes to ripen?' and 'When should my tomatoes ripen if I planted them in May?'

Some consultations with VEGE using the predictive parser:

1. Q: 'When would tomato plants be started indoors?'
   A: 'Your tomatoes should be planted indoors from April to May.'
2. Q: 'Is one foot enough space between rows of spinach?'
   A: 'Yes, but you should plant your spinach rows no closer than 12 inches apart.'

VEGE also uses a 'key-concept' parser, which allows it to pick out key words and use them to generate a response; this is a weaker process than the predictive parser. Some examples:

1. Q: 'What causes beans to be malformed?'
   A: 'Thrips are known to cause malformations in the fruits of pole bean plants.'
2. Q: 'What is eating holes in the edges of my bean leaves?'
   A: 'Weevils are probably causing the notches in your bean leaves.'

VEGE is written for the English language but could be translated to other languages such as French or German. The program is capable of accepting new information about gardening, which would increase its level of expertise. Because the program deals with the key concepts of time and distance, it can be adapted to other domains. When supplied with the proper knowledge, the program could answer such questions as: 'How large a gap should I give the spark plugs of an 8-cylinder 1972 Chevy Impala?' or 'Is it too early to send wedding invitations two months before the ceremony?'

Extensions of VEGE can be made in either of two directions. In one type of extension, the approach can be applied to build similar systems, each an expert in a single subject. The other possible extension involves multiple knowledge domains, all of which are accessed by a single query. For multiple domains, the system would have to be supplied with information that would first allow it to identify the domain of interest.

BORIS (1981)

BORIS (In-depth Understanding of Narratives) was developed by W. G. Lehnert et al. at Yale University in 1981. A number of programs have been developed (Schank 1982) that can understand a sequence of events. FRUMP, for example, was able to 'read' about 10% of the stories coming off the UPI newswire. Given a story about an earthquake, for example, FRUMP would be able to answer questions about the time of the quake, the place, the magnitude, and the number of casualties, but other 'human interest' details would be ignored. In essence, FRUMP skims the story.




BORIS, on the other hand, attempts to understand a story in detail. The program has four basic processing units. A parser, or conceptual analyzer, reads the English-language input and generates 'conceptual dependency' structures; these represent the semantic content of phrases and sentences. An event assimilator examines the concepts and attempts to relate all inputs to each other and to a wide range of real-world knowledge with which it has previously been supplied. A question-answering module uses the conceptual parser to read the questions BORIS is asked about a story it has read, and searches its memory, using search heuristics, to find a suitable answer. The English generator takes the answer that BORIS has decided on and casts it into a comprehensible English sentence. The entire process of reading, understanding, and generating answers involves a complicated interaction between the various parts of the program.

BORIS was supplied with the following narrative and then asked questions about what it learned:

Story: 'Richard hadn't heard from his college roommate Paul for years. Richard had borrowed money from Paul which was never paid back, but now he had no idea where to find his old friend. When a letter finally arrived from San Francisco, Richard was anxious to find out how Paul was.'

BORIS was then quizzed on this story:

Q: What happened to Richard at home?
A: Richard got a letter from Paul.
Q: Who is Paul?
A: Richard's friend.
Q: Did Richard want to see Paul?
A: Yes, Richard wanted to know how Paul was.

And so forth. The above narrative is actually only the first paragraph of a five-paragraph story involving Paul, Paul's wife Sarah, a divorce, hiring a lawyer, and a few other details. BORIS is able to answer questions about all the elements. Work is continuing on BORIS to improve its understanding of stories. Quoting from the conclusion of the BORIS report:

'It is difficult to appreciate the complexity involved in natural language processing because people do it so effortlessly. Anyone can read a newspaper or engage in a conversation. Yet, the goal of producing commercial language systems remains unattained.'

BORIS's value at this time is believed to be primarily as a source and test bed for investigating the cognitive skills involved in human conversation, question answering, language translation, and argument understanding.
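The four processing units of BORIS described above can be caricatured in a toy pipeline. Nothing below is BORIS's actual code or its conceptual-dependency representation: the dictionary structures and the naive word-splitting 'parser' are invented stand-ins, intended only to show how parse, assimilate, answer, and generate fit together.

```python
# Toy pipeline caricaturing BORIS's four processing units.
# All structures are invented stand-ins, not conceptual-dependency forms.

def parse(sentence):
    """'Parser': turn English into a crude actor/act/object structure."""
    words = sentence.rstrip(".?").split()
    return {"actor": words[0], "act": words[1], "object": " ".join(words[2:])}

def assimilate(events, memory):
    """'Event assimilator': relate inputs to what is already stored."""
    memory.extend(events)

def answer(question, memory):
    """'Q/A module': search memory for an event matching the question."""
    for event in memory:
        if event["act"] == question["act"]:
            return event
    return None

def generate(event):
    """'English generator': cast the chosen answer back into English."""
    return f"{event['actor']} {event['act']} {event['object']}."

memory = []
assimilate([parse("Richard received a letter from Paul.")], memory)
question = parse("Who received a letter?")
print(generate(answer(question, memory)))
# prints: Richard received a letter from Paul.
```

The point of the sketch is the interaction the text describes: the same parser serves both story reading and question reading, and the generator only ever sees what the memory search returns.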


PARAGRAM

One of the differences between human English and machine English is that humans can deal with incomplete, illogical, or ungrammatical statements, leap to conclusions, and understand jokes and puns, while machines typically have to be dealt with




in a very straitlaced manner. PARAGRAM is a program that uses syntactic parsing to deal with sentences that are 'semigrammatical'. Examples of sentences that traditional language-understanding programs cannot handle but that humans and PARAGRAM can:

'The boy is dying.'
'Bill sold Sue the book.'
'Jack wants to go to the store.'

While PARAGRAM will parse such sentences, it will also point out the grammatical problems. The program, however, is far from a practical system that can deal with all the ungrammatical forms and twisted sentences people use. Of course, there are also some language fragments that not even people can understand; if the person responsible for such a fragment, or the person addressed by it, cannot understand it, why should a computer be expected to do so?

BRUIN (1983)

BRUIN is a unified AI program that uses the same database both to understand language and to solve problems. This combination of tasks had not been accomplished before because knowledge representation in the two areas has been different: problem-solving systems have used the predicate calculus representation, while language-comprehension systems have used a frame-like representation. The program uses a common representation language, FRAIL (FRame-based AI Language), developed by Eugene Charniak of Brown University. The problem-solving part of the system is a program called NASL, based on a problem-solving language, also called NASL, developed by Drew V. McDermott of MIT. The language-understanding part is a program called PRAGMATICS.

BRUIN operates on two basic principles. (1) 'The only knowledge which needs to be expressed in FRAIL is that associated with doing and choosing plans.' (2) 'Given an arbitrary action, the context recognized, PRAGMATICS will recognize the action as either: (A) a part (or sub-act) of a plan, or (B) a way of accomplishing some higher goal.'

BRUIN must be supplied with information about the domain in which it is going to work. This is accomplished, at least partly, by supplying BRUIN with frames that are typical of the domain. As an example, BRUIN was supplied with frames from the worlds of magic and house building. These take the form:

Frame 1: Magician-saw-Victim    (this is the name of the frame)
  IS A:  action                 (a magician sawing a victim IS A action)
  SLOTS: (magician: must be a person)
         (victim: must be a person)
  PLAN:  (saw-step: magician saws the victim)

Frame 2: Build
  IS A:  action
  SLOTS: (builder: must be a person)
         (lumber: must be a piece of board)
  PLAN:  (saw-step: builder saws the lumber)

These two frames have certain elements in common: they are both actions, they both include a saw-step, and they both involve at least one person. When BRUIN is told 'Kim is building a house. Kim is sawing a board', the program understands that Kim is sawing a piece of lumber in the plan step of the Build frame. If, on the other hand, BRUIN is told 'Kim is sawing Lee', the program selects the Magician-saw-Victim frame as appropriate, with Kim as the magician and Lee as the victim.

BRUIN has been used to solve problems presented to it from a textbook on inventory control, as well as problems in restaurant dining, the blocks world, automatic programming, and computer architecture. All the frames associated with the above worlds are in the same database, which means that the system can easily be extended to other domains. Problems still being experienced with the system relate primarily to incomplete knowledge in the database. The program has 'not been seriously tested on a large complex domain.'

In the past five years we have seen the development of natural language interfaces to databases and of conversational advisory systems. Such products link natural language input to conceptual representations, based on Schank's theory of conceptual dependency, which capture the meaning of the input. In addition to storing information about meaning in conceptual representations, these systems incorporate information about the problem domain in various knowledge structures, such as 'scripts'. A 'script' is a description of what happens in a situation that conforms to a stereotype. When a scripted topic is presented to a system employing this knowledge structure, the system has a set of expectations that helps it resolve ambiguities. Cognitive Systems' (New Haven, Conn.) products also combine expert systems with natural language systems to form 'intelligent retrieval systems'.
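BRUIN's frame selection, as in the Kim example above, can be mimicked with a minimal sketch. The type table, the frame definitions, and the matching rule below are invented for illustration; FRAIL's actual representation and inference are far richer than a two-slot type check.

```python
# Hypothetical sketch of frame selection by slot type constraints.
# Type table, frames, and matching rule are invented for illustration.

types = {"Kim": "person", "Lee": "person", "board": "lumber"}

# Each frame lists the required types of its two participant slots.
frames = {
    "Magician-saw-Victim": ("person", "person"),
    "Build": ("person", "lumber"),
}

def select_frame(actor, patient):
    """Return the first frame whose slot constraints both participants satisfy."""
    for name, (actor_type, patient_type) in frames.items():
        if types[actor] == actor_type and types[patient] == patient_type:
            return name
    return None

print(select_frame("Kim", "board"))  # Build
print(select_frame("Kim", "Lee"))    # Magician-saw-Victim
```

As in the book's example, the slot constraints alone decide the reading: sawing a board fits the Build frame, while sawing a person can only be the magic trick.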
Natural language interfaces vary in how much users must conform their input to the structure of the system; some allow language that is more natural than others do. Most permit users to modify the dictionaries with which the systems are equipped and to add their own entries, with varying degrees of ease and various kinds of definitions. On encountering a spelling error, some natural language systems ask for a correction, some automatically correct the word, and others simply stop processing the sentence.


Charniak, E. (1983) A parser with something for everyone, Technical Report, Department of Computer Science, Brown University, Providence.
Granger, R. H., Jr. (1983) 'When expectation fails: towards a self-correcting inference system', Artificial Intelligence Project, Department of Information and Computer Science, University of California, Irvine, CA.




Harris, L. R. (1984) Experience with ROBOT in twelve commercial database query applications, Dartmouth College, Hanover, N.H.
Lehnert, W. G. & Bain, W. M. (April 1980) VEGE: Variable processing in a natural language query system, Research Report No. 183, Yale University, Department of Computer Science.
Lehnert, W., Dyer, M. G., Johnson, P., Yang, C. J. & Harley, S. (1981) BORIS — An experiment in in-depth understanding of narratives, Research Report No. 188, Yale University, Department of Computer Science.
Nilsson, N. J. (1980) Principles of Artificial Intelligence, William Kaufmann.
Schank, R. C. (1982) Dynamic memory, Cambridge University Press: New York, N.Y.
Shapiro, S. C. (1979) 'The SNePS semantic network processing system', in Associative Networks, Academic Press.
Winston, P. H. (1977) Artificial Intelligence, Addison-Wesley: Reading, Mass.
Wong, D. (1982) Language comprehension in a problem solver, Department of Computer Science, Brown University, Providence.

6 AI programming languages

AI languages are programming languages especially designed for symbolic data processing (in contrast to numerical, conventional data processing); that is, languages which possess primitives for the description of objects and their relations, allowing for the logical manipulation of expressions. These primitives have made it much easier to construct knowledge-based systems.

The most widely used AI programming languages are LISP and PROLOG. LISP has been the dominant language in the USA, while in Europe PROLOG, a relatively young programming language, competes with LISP in some application areas of AI. Additionally, PROLOG was selected for the Japanese Fifth Generation Computer Systems (FGCS) Program.

There are numerous other programming languages significant to AI programming. LISP itself has several dialects, such as MacLISP and InterLISP, though Common LISP is establishing something like a standard. New languages, e.g. SMALLTALK-80, have been developed for simplified programming. Other languages, such as LOOPS, have been designed as specific tools for aiding the development of expert systems. In recent years a number of different tools or expert system shells have emerged, providing the user with specific representation and reasoning methods. A simple device for classifying various languages and tools is the language-tool continuum (Fig. 6.1).

Fig. 6.1 — Language tool continuum (an axis ranging from high-level tools at one end to languages such as C++ and Fortran at the other).





In general, languages are more flexible but more difficult to use for prototyping, since representation and reasoning methods have to be implemented. Tools are much less flexible, because major knowledge engineering decisions have already been incorporated into the tool in the form of specific mechanisms. Consequently, if a problem matches a tool, a solution can be obtained relatively quickly. Conventional languages such as C, extended by object-oriented methods as in C++, also retain an increasingly important role, especially in connection with specific tools like NEXPERT OBJECT.

6.1 LISP

AI is considered a young field, especially in its commercial application, but LISP, its major programming language, is relatively old. The only high-level programming language older than LISP and still in use is FORTRAN. While FORTRAN was designed primarily for numerical computation, LISP was designed for the manipulation of symbols.

LISP (LISt Processor) was developed by J. McCarthy at MIT in 1958 during his work on the Advice Taker, symbolic differentiation, computer chess, and theorem proving. The Advice Taker was a proposed program able to infer intelligent conclusions from a description of a situation. For this work McCarthy needed a high-level programming language with the following features:

• representation of symbolic expressions, and
• manipulation of symbolic expressions by functions in the mathematical sense.

These requirements resulted in a language combining lists as the basic data structure with recursion as the principal mode of operation. Conditionals and functions on lists act as the main control structures. The kernel of a LISP system is the function EVAL. The function EVAL, itself written in LISP, serves as an interpreter for LISP and as a formal definition of the language. In this sense, programming in LISP can largely be described as functional programming.

In LISP we have the same syntactical representation for programs (lists) as for data (also lists). LISP is unique among programming languages in storing its programs as structured data. Consequently LISP programs can be manipulated by LISP programs, since they are represented in the form of LISP data (i.e. lists). This property of LISP pervades the methods of Artificial Intelligence.

Structure of LISP systems
In LISP, symbolic expressions are either atoms (literals or numbers) or lists, e.g.

(a b c)               is a list with three atoms
((a b) (b c) (d e))   is a list with three elements
()                    is an empty list.

The functions LISP offers are mainly for the manipulation of such lists. Functions like




CAR   returns the first element of a list
CDR   returns the part of a list after the first element is removed

take lists apart. Functions like LIST and CONS build up new list structures. Other operators, like EVAL or MAPCAR, serve to evaluate lists as functions or to apply a specific function to every element of a list.

It must be pointed out that programming in LISP is highly recursive. Lists can be nested within each other, like a Chinese puzzle box. Consequently, to solve complex problems on such highly structured data it is often very useful to write recursive programs. A program is recursive if it calls itself as a subfunction, either directly or indirectly.

How does recursion work in LISP? Let us have a look at a definition of 'equal'. Two objects, obj1 and obj2, are equal if

(1) they are equivalent, or
(2) they are both list structures, and both (equal (first obj1) (first obj2)) and (equal (rest obj1) (rest obj2)) hold.

According to this definition, we first check whether the two objects are equivalent. If not, we decompose the two objects into parts and check whether the parts are themselves equal.

A function for equality in LISP:

(defun equal (obj1 obj2)
  (cond ((eq obj1 obj2) t)                       ; equivalent objects
        ((and (listp obj1) (listp obj2)          ; both list structures
              (equal (first obj1) (first obj2))  ; first elements equal
              (equal (rest obj1) (rest obj2)))   ; remainders equal
         t)))

Although this is only a very simple example, it shows how LISP can elegantly represent and solve complex problems by the use of functions and recursion.

Software development with LISP
The use of lists as the main data structure offers wide flexibility, much more than in most languages suitable for AI programming. The programmer can follow his individual style of programming; for example, he may easily change the syntax of the language. LISP provides automatic facilities for associating symbols, and LISP can construct and access complex data structures easily without bothering the user with pointer management or storage allocation.
This shows the flexibility and the power of LISP and could explain its popularity in the AI community. It is for this reason that an authoritative introductory textbook in




Artificial Intelligence (Charniak & McDermott 1986), possibly somewhat exaggeratedly, states:

There are occasional programs in AI written in languages other than LISP, but if one were to take, say, the 100 best-known programs in the field, probably 95 would be in LISP, and of the remaining five, four were written before LISP was available on most computers.

Because of the individualized style of programming, LISP has drawbacks concerning maintenance and self-documentation in software engineering. This is especially true if the programmer fails to follow a particular discipline of programming, e.g. modular program structure, local variables, and small functions.

LISP development environment
For the AI programmer, LISP provides more than just a language interpreter or compiler: it provides a complete development environment. This environment contains the language and a number of tools, such as a tracer, a debugger, graphics toolkits, and an editor. These tools are totally integrated with the language. A main characteristic of LISP systems is that they are typically organized around the interpreter. Software development is done mainly step by step, interactively with the interpreter, by adding programs and changing components already available, in a so-called prototyping manner.

Restrictions for specific applications, such as real-time applications, arise from garbage collection, another integrated part of the LISP environment. LISP does storage allocation as well as pointer management completely automatically. Pointers are references to the addresses of the elements (lists and atoms), handled under the total control of the LISP system, in contrast to conventional languages like C and Pascal, where pointers (addresses) have to be managed by the programmer. At intervals, the system performs automatic garbage collection to clean up the storage of unreferenced objects.
Garbage collection could take, depending upon the system architecture, some minutes of process time when automatically invoked by the LISP system.

LISP dialects
There are only a few basic LISP functions, such as those explained previously. All remaining LISP functions are defined in terms of these basic functions. Because of this small set of primitives, different LISP dialects, shown in Fig. 6.2, have been developed; all of them originate in the basic language definition, McCarthy's implementation of LISP 1.5 (1965). The two major dialects of LISP are MacLISP and InterLISP. The group of InterLISP programs offers an integrated development environment and complex program control structures such as co-routines and backtracking mechanisms. Emphasis in InterLISP has been laid on the best possible programming environment, even at the expense of speed and memory space. MacLISP was developed at the MIT AI Laboratory. This dialect focuses on aspects of efficiency and flexibility

Sec. 6.1]



Fig. 6.2 — LISP dialects.

for building tools and embedded languages. Therefore, MacLISP possesses an efficient compiler and includes several packages of additional facilities. The LISP systems of the LISP machines, ZetaLISP (Symbolics) and LMLISP (LMI), are closely related to MacLISP, enabling the expansion of multi-process systems, object-oriented programming by means of concepts such as flavours and frames (see section 6.3), and functions for manipulating high-resolution graphics. Especially the flavour system for object-oriented programming is an important extension: it strongly supports the representation of structural knowledge (objects and their relations).

COMMON LISP

With Common LISP, the US Department of Defense (DOD) initiated the standardization of LISP (Steele 1984). Today, nearly all computers offer it in a more or less complete implementation. Common LISP manages to be a synthesis of many of the best ideas present in different LISP dialects. Furthermore, it is intended to be efficient and portable, with stability as a major goal. In the area of workstations, LUCID Common LISP, developed by the company LUCID Inc., seems to have become a standard (with implementations on SUN as SUN Common LISP and on APOLLO as DOMAIN Common LISP). Furthermore, Common LISP can also be used on Digital Equipment (DEC) workstations (VAX Common LISP), on LISP machines, as well as on IBM PCs and compatibles. While Common LISP is a descendant of MacLISP, which traditionally placed emphasis on providing system-building tools, it offers a complex, extendable, and alterable programming environment.

6.2 PROLOG
PROLOG, so named for 'programming in logic', is a theorem-proving, logic-oriented language, developed at the University of Marseilles by A. Colmerauer and his colleagues in the early 1970s. D. Warren, during his stay at the University of Edinburgh, created an implementation of PROLOG for the DEC System 10 that included an interpreter and a compiler. The standard treatment of PROLOG is given by Clocksin & Mellish (1984). Different versions of PROLOG are available for nearly every computer family, ranging from PCs through workstations to mainframes. The spartan system environment of the first implementations has been extended so that today PROLOG systems are comparable in functionality with LISP systems. Tools, e.g. TWAICE (Nixdorf), have also been built using PROLOG. A dominant application of PROLOG has been natural language understanding, but the use of PROLOG in expert system development is also proliferating.

PROLOG is a logical programming language, based on first-order logic. Like LISP, PROLOG was designed for the manipulation of symbols. PROLOG, similarly to LISP, provides a programming environment centred around an interpreter. Despite their similarities, the languages have a different orientation in their programming style. As described above, LISP's so-called functional programming style consists of a collection of single functions, often nested highly recursively. In logical programming, the control structure is essential to the logical deduction process. Therefore, PROLOG has a completely different focus compared to the usual programming languages, where the solution is approached step by step by algorithmic code. PROLOG declares facts about objects and their interrelationships, as well as rules about objects and their interconnections, describing how to solve a specific problem. PROLOG programs are designed on the basis of these facts and rules.
The notation H :- B1, ..., Bn is equivalent to Horn clauses, which in turn represent a simplified version of the predicate calculus. In this notation H is called the head, and B1, ..., Bn the body of the clause. 'H :- B' means 'H is true if B is valid', and the comma represents a conjunction. Clauses whose body is always true are called facts; otherwise, the clause is called a rule. PROLOG provides a logic-based approach that allows one to check whether specific clauses are valid. Further, this approach can be used to validly infer new information from given facts and rules. The following example shows a simple PROLOG program:


Sec. 6.2]

FACTS:
parent(bob, jack).      % bob is a parent of jack
parent(bob, peter).     % bob is a parent of peter
parent(peter, thomas).  % peter is a parent of thomas
male(peter).            % peter is a male
male(bob).              % bob is a male
male(thomas).           % thomas is a male

RULES:
grandfather(X, Y) :-    % X is grandfather of Y, if
    male(X),            % X is male and
    parent(X, Z),       % X is parent of Z and
    parent(Z, Y).       % Z is parent of Y.

The user is able to work interactively with the PROLOG system and to ask queries about features of the objects and their relations, e.g.

?- grandfather(bob, thomas).    % Is bob the grandfather of thomas?
→ True.

As mentioned before, PROLOG has an embedded control structure. The logical deduction process driven by this control structure equals a depth-first search (see section 2.6) on a graph. The following short description explains the built-in logical deduction process and shows how queries are proved on the basis of known facts and rules.

PROLOG's built-in inference mechanism

In the logical deduction process, for every query (sometimes called a goal) a proof tree is constructed. The query or goal 'grandfather(bob, thomas)' is matched with the head of the first rule or fact whose predicate corresponds to the predicate of the goal. This process is called unification. Unification is successful if all arguments in the goal can be matched with arguments of the head of the rule or fact. If unification was successful, all variables in the body of the rule are substituted, e.g. 'male(bob), parent(bob, Z), parent(Z, thomas)', and the modified body is proved, left to right, goal by goal, as given by the rule. If the deduction process succeeds on a fact, such as 'male(bob)', this specific goal is proved. If a goal cannot be unified or proved, backtracking is initiated: e.g. if 'parent(bob, Z)' has been unified to 'parent(bob, jack)', then the goal 'parent(jack, thomas)' must be proved, which is not in our fact base, and the system therefore initiates backtracking. If backtracking reaches the root of the proof tree, the deduction process fails and the PROLOG system negates the user query.
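The unification-and-backtracking cycle just described can be sketched in Python. The following toy resolver (an illustrative sketch under simplified assumptions — flat terms, no occurs check — and not a real PROLOG) proves the grandfather query against a subset of the facts above:

```python
# Toy PROLOG-style resolution: unification plus depth-first backtracking.
# An illustrative sketch only; a real PROLOG is far more elaborate.

def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

# Facts are rules with an empty body: (head, [body goals ...]).
rules = [
    (('parent', 'bob', 'jack'), []),
    (('parent', 'bob', 'peter'), []),
    (('parent', 'peter', 'thomas'), []),
    (('male', 'bob'), []),
    (('grandfather', 'X', 'Y'),
     [('male', 'X'), ('parent', 'X', 'Z'), ('parent', 'Z', 'Y')]),
]

def rename(term, n):
    """Give rule variables fresh names per invocation."""
    return tuple(f'{t}_{n}' if is_var(t) else t for t in term)

def solve(goals, s, depth=0):
    if not goals:
        yield s
        return
    goal, rest = goals[0], goals[1:]
    for head, body in rules:
        s2 = unify(rename(head, depth), tuple(walk(t, s) for t in goal), s)
        if s2 is not None:                         # unification succeeded
            yield from solve([rename(b, depth) for b in body] + rest,
                             s2, depth + 1)
        # otherwise: try the next clause (backtracking)

print(any(solve([('grandfather', 'bob', 'thomas')], {})))  # True
```

The failed branch through parent(bob, jack) is abandoned exactly as in the text: the generator simply moves on to the next clause, which is the backtracking step.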




This gives an impression of how PROLOG works. Readers interested in further aspects are referred to Clocksin & Mellish (1984). Problems based on depth-first search are particularly suitable for solution with PROLOG. In these areas, PROLOG is able to generate a prototype quickly in order to test basic approaches without paying attention to efficiency. The major drawback of PROLOG is that the language is limited to a single control mechanism, backtracking, combined with its built-in control structure, depth-first search. This restricts the application of PROLOG, as in the case of expert system shells, to a specific class of problems. The built-in control structure makes PROLOG less flexible, so that AI developers often prefer LISP because of its greater versatility. Every problem solution in PROLOG different from depth-first search has to be developed by using the backtracking mechanism. Additionally, the backtracking mechanism has the disadvantage that inefficiencies may occur because of careless ordering of facts and rules in PROLOG programming. These drawbacks, however, do not obscure PROLOG's potential, because of its unique benefits: it is easily readable, and it provides a sophisticated proof mechanism as well as query capabilities. As an implementation language for expert system shells and tools, PROLOG plays a minor role. This may be due to the fact that PROLOG, compared to LISP, is a relative newcomer. It is important to note that PROLOG and logic programming have a strong impact on database systems (see Chapter 7). Additionally, some efforts have been made to enrich tools and languages with logic programming facilities. These explanations might elucidate why the future of the two major AI languages is still an open question. It may evolve into a fusion of the two approaches. A first step in this direction is taken by the language POPLOG, which provides both functional and logical programming capabilities (Hardy 1984).



6.3 OBJECT-ORIENTED PROGRAMMING
In recent years, object-oriented programming languages have become popular, first of all in Artificial Intelligence, but also in conventional software development with extended conventional languages like C++ (Pinson & Wiener 1988). Object-oriented programming proves beneficial because it permits better modelling of complex problems. If a system is described by means of objects, the problem solution is more natural and, therefore, closer to the human way of thinking. This results in a more efficient software and knowledge engineering process, caused by advantages in concept, implementation, and maintenance. SMALLTALK was the programming language which in the early eighties focused interest on object-oriented programming.

Objects and classes
In this style of programming, every component of the system is an object, and knowledge about how to do things is associated with these objects. In practice, this is done by grouping objects that have the same data structure and do the same things in the same way into classes (sometimes called flavours or frames). This class concept is a special form of data abstraction. By means of the class concept, it is possible to
develop objects under the aspect that their characteristics determine their intercommunication. Objects interact with one another via the transmission of messages (message passing). In conventional languages, you see objects from the way they are implemented, while SMALLTALK puts the emphasis on communication with objects: what you can tell the object and which answers you may expect is what matters. This kind of data abstraction is characteristic of object-oriented programming. Further, it is significant that all elements in SMALLTALK are objects and that all objects are treated in the same manner ('sole inhabitants of the universe').

The SMALLTALK environment
Although SMALLTALK is strongly LISP-oriented, its essential element, the class concept, came from SIMULA. SMALLTALK is, like LISP, more than just a programming language. It is rather a programming environment which reflects the object-oriented philosophy. SMALLTALK (Goldberg & Robson 1983) consists of four components:

— The programming language core, i.e. the syntax and semantics of the language.
— The programming style, i.e. the way the core is used to develop software systems and to handle objects in accordance with the programming language core.
— The programming system, i.e. a certain number of predefined objects and classes.
— The user interface model, i.e. how the user combines the available objects and classes.

The message sending paradigm
All objects communicate with each other by means of message sending. If a program or system is to be developed in SMALLTALK, it is realized by creating objects. To start work, messages are sent to the objects, and since these messages are the only way of communicating between objects, they must be very comprehensive. This versatility is guaranteed by the fact that messages can be parametrized with other objects. Furthermore, every message results in an answer. The message starts the work and the answer confirms that the work was successfully accomplished.
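The message-sending paradigm can be imitated loosely in Python (a hypothetical analogy; SMALLTALK's actual syntax and semantics differ): a message is dispatched to a receiver by name, may be parametrized with other objects, and always returns an answer.

```python
# Message-passing sketch: objects respond to named messages with arguments
# (other objects) and always return an answer. A loose Python analogy to
# SMALLTALK; the Account class is an invented example.

class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):          # handles the message 'deposit'
        self.balance += amount
        return self.balance             # every message yields an answer

def send(receiver, message, *args):
    """Dispatch a message to a receiver by name."""
    return getattr(receiver, message)(*args)

acct = Account(100)
print(send(acct, 'deposit', 50))        # answer: 150
```

The point of the sketch is that the caller knows only what it can tell the object and what answer to expect, not how the object is implemented.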
Concept of inheritance
Another essential characteristic of object-oriented programming is the concept of inheritance. If objects in SMALLTALK have certain characteristics, variables, and methods in common, they belong to the same class. Through inheritance, characteristics may be shared among classes, superclasses, and subclasses. This means that any variable that is defined higher in the class lattice will also appear in the objects of a specific class. A class lower in the inheritance lattice than a given class is called a subclass; otherwise it is called a superclass.

Object-oriented languages
LISP dialects are frequently extended for object-oriented programming. LOOPS, which includes an object-oriented programming facility, is an extension to INTERLISP-D. The Flavour package is an object-oriented extension to ZetaLISP; the term flavour is simply the word used in this extension for a class of objects. Common
LISP does not define a standard object-oriented programming facility, but most implementations, like the one from LUCID Inc., are expected to provide such an extension. Some modern languages such as Objective-C and C++ have these features, although they are not especially designed for object-oriented programming.
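The class-lattice inheritance described above can be illustrated in Python (the class names are hypothetical examples, not taken from any of the systems mentioned): a variable defined higher in the lattice also appears in instances of the subclass.

```python
# Inheritance sketch: a variable or method defined higher in the class
# lattice also appears in instances of a subclass.

class GraphicObject:                 # superclass
    visible = True                   # defined high in the lattice

    def describe(self):
        return f'{type(self).__name__}, visible={self.visible}'

class Circle(GraphicObject):         # subclass: inherits variable and method
    pass

c = Circle()
print(c.visible)      # True — inherited from the superclass
print(c.describe())   # Circle, visible=True
```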



6.4 EXPERT SYSTEM SHELLS AND TOOLS
During the last few years, a number of shells and tools have been developed to support the knowledge engineering (see Chapter 4) of expert systems. These tools are a distinctive mark within the area of expert systems and may be compared with the program generators and simulation systems of conventional data processing. Historically, the first software systems supporting expert system development were made by removing the domain-specific knowledge from already existing expert systems; e.g. EMYCIN represents MYCIN without its medical domain knowledge. Generally, these approaches are restricted to the class of problems the original expert system was developed for, e.g. a specific class of medical diagnosis. Table 6.1 gives an overview of early expert system shells, some of them derived from an existing expert system.

Table 6.1 — Early programming tools for building expert systems. Four of these tools were developed at Stanford University.

— A programming language built on top of LISP, designed to facilitate the use of production rules.
— A domain-independent version of MYCIN, which accompanies the backward-chaining and explanation approach with user aids.
— Supervises interaction with an expert in building or augmenting an expert system knowledge base in network form; implemented for PROSPECTOR.
— A general rule-based programming language that can be used to develop large knowledge bases; translates near-English into LISP.
— A sophisticated expert system to aid users in building expert systems.
— A generalized domain-independent extension of HEARSAY-II; includes a 'context' mechanism and an elaborate 'blackboard' and scheduler.
— A knowledge representation language and interactive knowledge acquisition system; the language provides both 'frame' structures and production rules.
— An expert system that facilitates the interactive transfer of knowledge from a human expert to the system via a (restricted) natural language dialogue.

Shells are used to speed up the implementation of expert systems. To this end they provide a number of methods of knowledge representation (e.g. rules and frames), inference mechanisms (e.g. backward and forward chaining), interfaces to conventional languages and databases, as well as elements for designing comfortable man-machine interfaces. It is a great advantage that completed and tested programs are provided for the implementation of commonly used functions in the architectural elements of an expert system; they do not have to be developed and tested over again. As those functions are completed with a specific application area in mind, their use is rather limited and there is no way to change them. Every problem category for which expert systems should provide support calls for its own methods of reasoning and representation, e.g. diagnosis problems use backward chaining as a major inference mechanism, and for medical diagnosis certainty factors are often required to weight the experts' heuristic knowledge. In general, the use of a certain tool is restricted to one or two specific areas where adequate methods are offered by the tool, e.g. EMYCIN for medical diagnosis. Even though expert system shells and tools are helpful in knowledge engineering, especially in the implementation phase, they do not offer a general solution. Fig. 6.3 shows the connection between problem class, tool, and strategies in a general view (ART would take a similar position to KEE). It is therefore very important, when selecting an expert system development tool, to verify that the methods available are sufficient for the problem area you are aiming at. Commercially available tools can be divided into three classes (Harmon & King 1985):

Fig. 6.3 — Connections between problem class, tools, and methods (Harmon & King 1985).




— Small tools, which run on personal computers. Usually they are limited in their number of active rules and in their knowledge representation and inference methods.
— Large tools for a certain problem domain, which run on workstations, hosts, or LISP machines to develop more complex systems (partly some thousands of rules) in specific problem areas.
— Large, hybrid tools, also running on workstations, hosts, or LISP machines. They can be used for a variety of problem areas thanks to a wide spectrum of supported methods for knowledge representation and inference control.

With a view to building expert systems, languages are the most flexible option, but they are more difficult to use, because all parts of the system have to be programmed; highly experienced programmers are therefore needed. Tools are far less flexible but allow quick construction of a prototype, provided that an adequate tool is available for the problem area. If the tool does not quite match a problem, the underlying implementation language, for example LISP in the case of KEE, might be chosen for the implementation of specific parts. This is possible if the tool offers the opportunity to integrate programs written in its implementation language. If the tool in use has to be altered, or if something has to be added, one should consider whether a better-equipped tool can be found for the implementation, or, better still, whether it would not be more suitable to develop the whole system in an AI language or a higher programming language. Providing access to the implementation language and interfaces to high-level programming languages is becoming more and more important. Expert systems have to be integrated into the commercial application area and have to use already existing software packages, e.g. for numerical computation in FORTRAN or graphics packages written in C. Therefore they have to be interfaced to different hardware, e.g. in medical diagnosis to a laboratory computer, and to software components already available, e.g. database systems (see Chapter 7) or graphics systems.

In the following, some tools are described in more detail.

ART (Automated Reasoning Tool, Inference Corp.)
ART distinguishes between methods for the representation of declarative (factual) knowledge and procedural knowledge, such as forward chaining, backward chaining, and constraint rules. Declarative knowledge in ART is represented by means of facts and schemata (corresponding to frames). Furthermore, the concept of inheritance is provided for schemata. In addition, ART offers a so-called 'viewpoint structure', i.e. a possibility to define a validation area for facts (named an 'extent'). By means of this hypothetical method, alternative plans or perceptions of situations can be created. Additionally, the 'viewpoint structure' can be used for modelling situations changing with time. All of these capabilities make ART a very suitable tool for design and planning problems. ART is implemented in LISP and in C. Versions of ART are available on workstations and LISP machines.

KEE (Knowledge Engineering Environment, IntelliCorp)
KEE is a large hybrid system which, just like ART, provides a number of methods
for the fast development of expert systems. For declarative knowledge representation KEE uses frames, with full possibilities to model relations between objects and the inheritance of their characteristics. Similar to ART, KEE offers, with its 'KEEworlds', the opportunity to generate hypotheses and to follow various options in parallel. In KEE the rules themselves are implemented using frames, so that rules can interact via message passing. Further, this representation allows a classification of rules into specific rule packages. KEE has an agenda mechanism controlling the sequence of rule application. This mechanism can be applied to the knowledge base using either forward or backward chaining, or following a mixed strategy. In addition to an object-oriented starting point based on frames, KEE also offers a data-oriented programming style: procedures can be attached to variables or slots in the frame structure ('active values'), becoming active if either a change in data occurs or certain data are used. Just like ART, KEE offers a variety of opportunities for graphical interaction. The open architecture allows the integration of programs written in different programming languages (e.g. C or PASCAL) and the extension and adaptation of the whole system according to specific user requirements.

NEXPERT OBJECT (Neuron Data)
NEXPERT OBJECT is one of the newer hybrid tools and has become very popular in recent years. NEXPERT OBJECT is an excellent implementation of object-oriented programming facilities very close to SMALLTALK, offering additional inheritance mechanisms. It combines different rule-based inference strategies and object-oriented representation efficiently. It is implemented in C, which explains its availability on a broad spectrum of computer systems, and provides an attractive graphical user interface.

LOOPS (XEROX Research Corp.)
LOOPS is a specific tool which extends the INTERLISP-D programming environment to provide a powerful system for artificial intelligence research and expert system development. It combines several programming styles:

• object-oriented;
• procedure-oriented;
• data-oriented; and
• rule-oriented.

LOOPS provides object-oriented concepts similar to those of SMALLTALK, and therefore enables object-oriented programming (see section 6.3). It is procedure-oriented because it allows the user to build classical procedures in LISP, the embedding language of LOOPS. Rule-oriented programming is done by specifying knowledge in condition-action rules. Additionally, in LOOPS procedures can be attached to variables and become active when these are accessed, thereby realizing data-oriented programming.

KES (Knowledge Engineering System, Software Architecture and Engineering, Inc.)
KES is a high-level knowledge representation language that provides access to multiple methods of knowledge representation and reasoning facilities, including




rule-based deduction, hypothesis-and-test inferencing guided by a network of descriptive frames, and statistical pattern classification. The full system consists of three separate programs, KES-PS, KES-Bayes, and KES-HT, each with a different kind of inference engine and the ability to develop a different type of expert system. One of the most attractive capabilities of the KES package is that any of the three modules can call on the others as often as needed. This separation makes KES one of the first hybrid systems available for microcomputers.

OPS5 (Carnegie Mellon University)
OPS5 is an expert system development tool based on production rules as described in Chapter 4. Originally, OPS5 was developed at Carnegie Mellon University to support research in cognitive science. Although it mainly provides a forward-chaining facility and is therefore restricted to a specific class of problems, OPS5 has the power for bigger developments. This has been demonstrated in well-known projects like R1/XCON (see Chapter 4). OPS5 uses production rules within the framework of a simple loop called the 'recognize-act cycle'. OPS5 provides a single global database called working memory. The OPS5 inference engine executes a production system by performing the following operations:

(1) Determine which rules have satisfied antecedents (match).
(2) Select one rule with a satisfied antecedent (conflict resolution); if no rule has a satisfied antecedent, halt execution.
(3) Perform the actions of the selected rule (act). Return to (1).

REFERENCES

Bobrow, D. G. & Stefik, M. (1981) The LOOPS Manual, XEROX PARC.
Charniak, E. & McDermott, D. (1986) Introduction to Artificial Intelligence, Addison-Wesley: Reading, Mass.
Clocksin, W. F. & Mellish, C. S. (1984) Programming in PROLOG, Springer-Verlag.
Goldberg, A. & Robson, D. (1983) SMALLTALK-80: The Language and its Implementation, Addison-Wesley: Reading, Mass.
Hardy, S. (1984) 'A new software environment for list processing and logic programming'. In: O'Shea, T. & Eisenstadt, M. (eds), Artificial Intelligence: Tools, Techniques and Applications, Harper & Row.
Harmon, P. & King, D. (1985) Expert Systems: Artificial Intelligence in Business, Wiley: New York.
Kowalski, R. (1979) Logic for Problem Solving, Artificial Intelligence Series, North-Holland.
O'Shea, T. & Eisenstadt, M. (eds) (1984) Artificial Intelligence: Tools, Techniques and Applications, Harper & Row.
Pinson, L. & Wiener, R. (1988) An Introduction to Object-Oriented Programming and C++, Addison-Wesley: Reading, Mass.
Rauch-Hindin, W. B. (1988) A Guide to Commercial Artificial Intelligence: Fundamentals and Real-World Applications, Prentice Hall.
Steele, G. L. (1984) Common LISP: The Language, Digital Press.




Stefik, M. & Bobrow, D. G. (1985) 'Object-oriented programming: themes and variations', The AI Magazine.

7 Expert and database systems

Expert systems are used in a rapidly increasing number of application areas which rely on large volumes of data. Areas such as financial management and military command and control have already been handled by database technology. Knowledge normally kept by expert systems includes rules and facts. The set of facts is divided into those with permanent validity and others which are case-specific. Since expert systems use domain-specific knowledge and, depending on the application, parts of this knowledge are kept in an existing database management system (DBMS), efforts have been pursued to integrate expert systems and database management systems, combining the advantages of both technologies. The fusion of DBMSs and expert systems may prove beneficial for future development in both areas.

Expert systems research has developed methods for knowledge representation and inference to perform adequate modelling of human problem-solving behaviour. In support of tasks such as diagnosis and configuration (see Chapters 3 and 4), expert system technology has focused on the ability to represent different kinds of information and relationships between data items with increasing depth and precision. The major goal of knowledge representation is expressiveness. Under normal circumstances, relatively small volumes of data are processed. In some application areas, however, e.g. CAD/CAM, weather forecasting, geological prospecting, and medical assistance, the expert system's knowledge is too large to be kept in main memory; efficient external data management is required. From a different perspective, database technology has been engineered to efficiently represent, access, and manipulate large databases in order to support transaction-intensive data processing applications. Managing large data sets is the key issue of database systems.
In the last few years a new type of software system, called Expert Database System or Knowledge Base Management System, has been identified (Kerschberg 1986, 1987, 1989). In his keynote address to the First International Workshop on Expert Database Systems, Smith (1986) characterizes an Expert Database System (EDS) as 'a system for developing applications requiring knowledge-oriented processing of shared information'.



EDSs will merge the capabilities of DBMSs with expert system technology. The latter contributes:




• problem-solving mechanisms on the basis of advanced reasoning facilities and heuristic search algorithms;
• classification of knowledge into strategic, meta, and domain-specific knowledge;
• support in knowledge engineering based on exploratory development capabilities, e.g. knowledge representation that remains flexible over time;
• a rich world of knowledge representation systems, e.g. frames, networks, and rules.

EDSs inherit, from conventional database management, functions for large databases, including the following features:

• Concurrent use of shared data, that is, the ability for several different users to access the database at the same time.
• Storage and search structures with search and update optimization (even in distributed databases).
• Error recovery and automatic back-up to ensure that the database can be reorganized with a minimum of delay after damage caused by human error or a technical failure.
• Multi-level DBMS architecture (external, conceptual, and internal models) providing individual user views, a community user view, and a storage view of the database.
• Consistency and integrity checking to ensure that the database is accurate.

Historically, the two technologies, expert systems and database systems, emerged from different roots. While database systems were from the beginning close to commercial applications, expert systems originated in AI research laboratories closer to basic research. This explains why the technologies differ in implementation methods, user interfaces, direction of innovation, developer/user community, robustness, etc., and, therefore, linking the two technologies depends on the point of view as well as on the application area. Key issues of EDS from the expert system's point of view are:

• Access to already existing databases. It does not make sense to duplicate large databases for expert system use. High costs, loss of time, and difficulties in maintaining consistency would be the inevitable consequences.
Therefore, for many application areas, sharing existing databases is the only way to solve the user's problems by means of expert system technology.
• Storage of large rule bases and databases. Generally, all rules and facts are kept in main memory and held in specialized structures, such as frames, lists, and semantic networks, accessed by certain routines. If the expert system's knowledge base is too large to fit into main memory, parts of it must be swapped onto external devices.
• Multi-user access capability. Knowledge and data are corporate resources managed and shared among a
diverse user community. Depending on the approach, different architectures are possible (Figs 7.1a and 7.1b).
• Recovery and automatic back-up. Expert system technology is becoming widely operational and, therefore, questions arise about what to do if something fails. Features handling database and knowledge base recovery are necessary to re-establish consistency after a system breakdown.
• Knowledge representation independence. In some conventional DBMSs a so-called 'data independence' exists. This means that the application programs are independent of the physical database structure, i.e. a change of the access path will not affect any application program. It seems useful to apply this principle to knowledge representation: rules would then be stored independently of any heuristic or reasoning mechanism.

From the point of view of conventional database theory, an EDS behaves like a more intelligent database interface that allows more sophisticated queries, e.g. it enables the user to ask 'what-if' queries. As discussed later with object-oriented databases (see section 7.5), an EDS provides data with more semantics. In addition, artificial intelligence can be used to improve the capabilities of database systems by supporting natural language query interfaces (see Chapter 5). Furthermore, rule-based integrity and consistency checking offers powerful features for internal database management. Depending on the application and performance criteria, one of these elements should be integrated into an EDS approach. The main variants suggested for the integration of the two technologies (Jarke & Vassiliou 1983, 1984) are:

(a) Coupling of independent systems
This procedure provides a clear interface between an existing expert system and a general-purpose DBMS. Both systems are individual components which communicate with each other. There are two different ways of coupling:

• loose coupling and
• tight coupling

Using loose coupling, the rule base resides in the expert system and the necessary data are retrieved from the DBMS in snapshots. In tightly coupled systems the expert system and DBMS can interact at any moment, and data and rules are continuously loaded.


(b) Enhancement of the expert system by database technology
An expert system is intrinsically enriched by basic database management functions tailored to fulfil its application-specific requirements. This DBM component is mainly incorporated to manage the expert system's facts. The rule base is still managed by the expert system. Components like the inference engine or the explanation facility access the data via these DBM operations.



Fig. 7.1a — Database and expert system applications with shared databases.

Fig. 7.1b — Database and expert system application with shared data and knowledge bases.






(c) Enhancement of the database system by expert system technology
In this class of systems, the expert system's reasoning facilities are integrated into the DBMS, mainly in the form of a deductive component based on first-order logic (FOL). This results in a so-called intelligent or deductive database. Further, as an additional feature, a rule manager is added to the DBMS, which gives the DBMS expert system shell capabilities.


Options (b) and (c) involve the performance enhancement of either an expert system with a specialized database manager or the extension of a general-purpose DBMS with expert system capabilities. In (a), both expert and database systems co-exist as independent systems communicating with each other. This form allows the DBMS to operate completely separately. The expert system may be seen as one user in a multi-user environment. Additionally, in many cases there is a need to consult an existing very large database. One important point is how the coupling is activated. There are two distinct ways:

Loose coupling
The required data are loaded as a snapshot at a fixed time from the existing database system into the expert system's internal knowledge base. The expert system can access only those facts currently loaded. When the data actually loaded have been processed, the expert system may retrieve new data from the DBMS. This approach proceeds as follows: based on its knowledge, the expert system first generates the necessary DBMS queries; the DBMS then executes these queries and delivers the results to the expert system. A major advantage of loose coupling is that the expert system and the DBMS preserve their own identity, and the solution has an immediate practical application. There are two major disadvantages:

• If the data do not fit into the main memory of the expert system, additional mechanisms for memory management have to be activated.
• If the extracted data are used while the original version is updated, consistency problems may occur.

A typical example often mentioned for this approach is the experience Olson & Ellis report (1982) about PROBWELL, an expert system used to determine problems with oil wells. Important data for such determination are stored in a large IMS database (Information Management System, a hierarchical database system from IBM Corp.).
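As a concrete sketch of this snapshot-style retrieval, the fragment below is a hypothetical illustration in Python using the built-in sqlite3 module; the `well` table, column names, and threshold are invented, loosely echoing the PROBWELL setting. The expert system generates a query from its data need, the DBMS executes it, and the result set is loaded as facts into working memory:

```python
import sqlite3

# Hypothetical sketch of loose coupling: the expert system generates a
# query, the DBMS executes it, and the result set is loaded as a
# snapshot of facts into working memory. Table and values are invented.

def load_snapshot(db, table, condition):
    """Build the DBMS query from the expert system's data need and
    return the result set as a list of fact tuples."""
    # String interpolation is tolerable here only because table and
    # condition come from the expert system itself, not from a user.
    cur = db.execute(f"SELECT * FROM {table} WHERE {condition}")
    return [(table,) + row for row in cur.fetchall()]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE well (id TEXT, pressure REAL)")
db.executemany("INSERT INTO well VALUES (?, ?)",
               [("w1", 3.2), ("w2", 7.9), ("w3", 8.4)])

# Only the facts loaded in this snapshot are visible to the inference
# engine until a new snapshot is requested.
working_memory = load_snapshot(db, "well", "pressure > 5.0")
print(working_memory)
```

Note the consistency caveat from the text: if `well` is updated after the snapshot is taken, `working_memory` silently goes stale until the next retrieval.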
Most loose coupling approaches have involved interfacing PROLOG-based expert systems with relational DBMSs (RDBMSs). This is caused by the syntactic similarity between representing data as facts in PROLOG (see section 6.2) and storing data as tuples in an RDBMS. Different RDBMSs such as INGRES, ORACLE, and UNIFY have been successfully interfaced to several PROLOG implementations. Special interest must be devoted to expert system development tools offering




concepts and mechanisms for the coupling of expert systems and database systems. KEE and NEXPERT OBJECT (see section 6.4) are two systems offering software packages for the coupling.

KEEConnection
KEEConnection is a KEE extension providing a bridge between relational databases which use the query language SQL and expert systems developed with KEE. It consists of three software modules: mapping, translation, and data communication. The mapping function generates a description of the relation between object structures in the database and in the expert system. Based on this information, data requests are translated into SQL queries. The data communication module ensures the connection between database and expert system, even in a network environment and for different databases. Depending on the application requirements, the expert system loads the necessary data either in one large step or in smaller discrete steps. Further download operations can be triggered by rules during the inference process. Additionally, KEEConnection allows the user to upload data, so that the database can incorporate data from the expert system.

NEXPERT OBJECT's database interface
NEXPERT OBJECT provides interfaces to relational databases such as ORACLE and INGRES. In contrast to KEE, the queries are not generated automatically but have to be formulated explicitly in SQL. Since NEXPERT OBJECT is implemented entirely in C, it allows a more direct coupling approach than KEE.

Tight coupling
The interaction between expert and database systems for data transfer is dynamic, which means that it can take place at any moment. Queries can be generated and transmitted to the database system, and answers can be received in the expert system's knowledge representation. This requires online communication. Therefore, optimization becomes an issue to prevent inefficient use of the communication system. The major drawback is poor performance.
The processing time of the expert system increases because of continuous database retrievals on secondary storage. Single database tuples are accessed instead of the whole set of data necessary for the next working steps of the expert system. An illustrative example of this approach is given by the BERMUDA system (Ioannidis et al. 1989), which tightly couples a PROLOG system with a database machine.

BERMUDA (Brains Employed for Rules, Muscles Used for Data Access)
BERMUDA realizes tight coupling between several PROLOG processes and a Britton-Lee Intelligent Database Machine 500 (IDM). Interaction between PROLOG and DBMS processes is coordinated and scheduled by a central agent. The design goals of the system are:

• programming in BERMUDA should be the same as in PROLOG,




• efficient execution of programs, and
• sharing of data among multiple PROLOG programs.
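By contrast with the snapshot style, tight coupling interleaves retrieval with inference. The sketch below is hypothetical Python, with sqlite3 standing in for the database machine; the sensor data and alarm rule are invented. It pulls single tuples on demand while a rule is evaluated, which is exactly the tuple-at-a-time access pattern whose cost the text warns about:

```python
import sqlite3

# Hypothetical sketch of tight coupling: tuples are fetched one at a
# time during inference rather than as one snapshot, so every rule
# evaluation step may involve a round trip to the DBMS.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE measurement (sensor TEXT, value REAL)")
db.executemany("INSERT INTO measurement VALUES (?, ?)",
               [("t1", 20.5), ("t2", 91.0), ("t3", 18.2)])

def tuples(db, sql, params=()):
    """Yield result tuples lazily, as a tightly coupled system would."""
    for row in db.execute(sql, params):
        yield row

alarms = []
for sensor, value in tuples(db, "SELECT sensor, value FROM measurement"):
    if value > 90.0:          # the 'rule', applied per retrieved tuple
        alarms.append(sensor)
print(alarms)
```

Because data never go stale between retrievals, the consistency problem of loose coupling disappears, at the price of many small interactions that the text identifies as the performance bottleneck.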



If handling of large volumes of data is required, an expert system's knowledge management cannot cope because of limited main memory. The data are held in specific data structures and are accessed by specific routines. Data structures, like flavours (section 6.1), frames (section 2.6), lists (section 6.1), etc., and routines are provided by specific packages of the implementation language or by systems like LOOPS or NEXPERT OBJECT (section 6.4). No problems occur when new data are derived by the expert system. The best example of the direct use of database features in this area is the programming language PROLOG (Clocksin & Mellish 1984) with embedded predicates like 'ASSERT' and 'RETRACT'. But all knowledge representation languages and tools, including PROLOG, are characterized by the direct manipulation of data in main memory. Specific application areas, like CAD/CAM or expert systems in financial management, require the processing of large volumes of case-specific data in the reasoning process. In these areas runtime problems may occur. To overcome this weakness, efforts are focusing on embedding a specific database management component into the expert system, as shown by the simplified architecture in Fig. 7.2.

Fig. 7.2 — Architecture of an extended expert system.

This can be accomplished by the extension of the implementation language of an expert system or, more specifically for expert system development, by the extension of an expert system tool, as Lafue and Smith did with the system STROBE (Lafue & Smith 1986). PROLOG has often been used for




implementations embedding DBMS capabilities in the expert system, because it resembles relational database concepts. Today, most of these approaches are at an early stage, but it is planned to expand them into full DBMSs with features such as security, data sharing, etc.
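A minimal sketch of such an embedded database management component is given below. It is hypothetical Python, using the built-in sqlite3 module as the embedded store; the class and method names are invented, and the predicate data are taken from the grandparent example later in the chapter. It mimics PROLOG's ASSERT/RETRACT while keeping the facts in a database table rather than in main-memory structures:

```python
import sqlite3

# Hypothetical sketch of an expert system fact base with PROLOG-style
# assert/retract semantics, backed by an embedded SQLite table so the
# facts need not fit into main memory.

class FactBase:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE fact (pred TEXT, args TEXT)")

    def assert_(self, pred, *args):       # cf. PROLOG 'ASSERT'
        self.db.execute("INSERT INTO fact VALUES (?, ?)",
                        (pred, ",".join(args)))

    def retract(self, pred, *args):       # cf. PROLOG 'RETRACT'
        self.db.execute("DELETE FROM fact WHERE pred=? AND args=?",
                        (pred, ",".join(args)))

    def query(self, pred):
        cur = self.db.execute("SELECT args FROM fact WHERE pred=?",
                              (pred,))
        return [tuple(r[0].split(",")) for r in cur.fetchall()]

fb = FactBase()
fb.assert_("parent", "peter", "susy")
fb.assert_("parent", "susy", "michael")
fb.retract("parent", "peter", "susy")
print(fb.query("parent"))
```

The inference engine and explanation facility would access facts only through such operations, exactly as described for the DBM component above.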



In this approach, a general-purpose DBMS is enhanced by sophisticated data manipulation and control functions to allow more intelligent database queries. The theoretical basis for deductive databases is first-order logic (FOL). FOL offers a standard notation with explicit semantics and inference methods which can be mechanized. This fact accounts for the popularity of FOL in AI applications (Kowalski 1979). The best example is the programming language PROLOG (section 6.2). A relational database consists of a set of tuples which correspond to well-formed formulas containing only ground literals in FOL. This characterization indicates restrictions in the reasoning capabilities of DBMSs compared to AI systems: databases cannot represent quantified or disjunctive information, and have no capabilities for inference. Above all, they cannot handle recursion! A deductive database is a conventional relational database (RDBMS) extended by the ability to deduce new facts. The rules are in the form of Horn clauses, which represent a suitable mechanism for the formulation of rule-based systems. Parts of the necessary reasoning capabilities, in particular non-recursive queries, can be embedded in modern relational databases by the dynamic generation of 'views'. Fig. 7.3 illustrates how a PROLOG program is encoded by using the view concept. This is done with the well-known 'parent relation'. Since 'Peter is a parent of Susy' and 'Susy is a parent of Michael', 'Peter is a grandparent of Michael' can be inferred. Though (peter, michael) is in the relation GRANDPARENT, it should be emphasized that the relation GRANDPARENT is created dynamically when a query like 'Who is the grandfather of Michael?' is made. This example demonstrates that the relational view concept provides support for simpler rules of this kind. Problems occur in the handling of rules using recursion.
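The view mechanism just described can be reproduced directly. The sketch below is a hedged illustration in Python with the built-in sqlite3 module: it encodes the grandparent example of Fig. 7.3, and then shows how the recursion restriction was later lifted in SQL by recursive common table expressions, a facility not available in the RDBMSs of the time:

```python
import sqlite3

# Runnable version of the view-based deduction of Fig. 7.3, using
# Python's sqlite3. The grandparent view is derived dynamically at
# query time, exactly as the text describes.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parent (parent TEXT, child TEXT)")
db.executemany("INSERT INTO parent VALUES (?, ?)",
               [("peter", "susy"), ("susy", "michael")])

db.execute("""CREATE VIEW grandparent(gp, gc) AS
              SELECT first.parent, second.child
              FROM parent first, parent second
              WHERE first.child = second.parent""")
print(db.execute("SELECT * FROM grandparent").fetchall())

# Recursion, which plain views cannot express, was later added to SQL
# in the form of recursive common table expressions:
ancestors = db.execute("""
    WITH RECURSIVE ancestor(a, d) AS (
        SELECT parent, child FROM parent
        UNION
        SELECT ancestor.a, parent.child
        FROM ancestor JOIN parent ON ancestor.d = parent.parent)
    SELECT * FROM ancestor""").fetchall()
print(sorted(ancestors))
```

The first query deduces (peter, michael) at query time; the recursive query additionally closes the parent relation transitively, which is the ancestor-style rule the chapter identifies as beyond the reach of ordinary views.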
Efficient iterative handling of recursive views and the provision of an efficient interface between this process and the database search process are still key issues in this approach. Al-Zobaidie and Grimson distinguish three different procedures for embedding a deductive component into the DBMS (Al-Zobaidie & Grimson 1987). As shown in Fig. 7.4, they involve different forms of optimization to improve the efficiency and capability of DBMSs. The capabilities described above give deductive databases the potential to do rule-based inference. Examples of this approach are offered in the MAD SMART DATABASE SYSTEM (MAD Intelligent Systems 1989) and the LOGICAL DATA LANGUAGE (Tsur & Zaniolo 1988).

SMART DATABASE SYSTEM (SDS)
SDS is a deductive database system developed on top of the relational database




Prolog notation:

FACTS: parent(parent, child).
RULES: grandparent(GP, GC) :- parent(GP, X), parent(X, GC).

View definition:

create view grandparent(gp, gc) as
select first.parent, second.child
from parent first, parent second
where first.child = second.parent

(The parent relation contains, among others, the tuples (peter, susy) and (susy, michael); the dynamically derived grandparent view then contains (peter, michael).)

Fig. 7.3 — Example: Prolog-database notation.

system TransBase. SDS's host language Common LISP (section 6.1) has been used to implement the central component of SDS: Relational LISP. Relational LISP provides a group of functions such as:

• database management functions for data definition;
• query functions for data manipulation operations;
• recursive functions for queries according to recursive rules.

Further, Relational LISP includes the component Declare, a comfortable rule interpreter. Additionally, SDS is capable of accessing and joining data from other database systems which are interfaced by specific adaptors.

Logical Data Language (LDL, Microelectronics and Computer Technology Corp.)

LDL represents a deductive database front end on the basis of Horn clauses. Like




Fig. 7.4 — Different approaches for embedding a deductive component.

SDS, it aims to combine the benefits of logic programming and database query languages. It also provides optimization techniques for recursive queries. A different approach, compared to the systems enhanced with logic programming capabilities, has been followed with POSTGRES. In POSTGRES, an RDBMS (INGRES) has been enriched with a rule manager as a tool for data manipulation. POSTGRES's reasoning facilities closely resemble the expert system shell OPS5 (section 6.5). Further, POSTGRES is an example of an object-oriented database system (section 7.5), another variant of enhancing a DBMS with expert systems technology. The major goal of object-oriented databases is to add more semantics to the stored data. Given the growing importance of object-oriented architectures for several application areas related to AI, and their importance in programming as well as in modelling, section 7.5 gives an in-depth treatment of this topic.



The concepts of objects and object-oriented architecture represent one of the most promising methods pertaining to knowledge-based systems, databases, and programming languages (Kim & Lochovsky 1989). The spectrum of applications and programming environments includes:




• Programming languages like SMALLTALK-80 (see Chapter 6) or LISP with the embedded FLAVOUR system (section 6.3)
• CAD systems and engineering design databases
• Office information systems
• Expert system shells, namely LOOPS and NEXPERT OBJECT (section 6.4)

Shortcomings of classical data models for an efficient and adequate representation of object-oriented structure and behaviour favour the development of object-oriented database systems. These systems close the gap between expert system applications implemented with object-oriented languages or expert system shells/tools and the retrieval languages of database systems. Since classical data models are record-oriented, knowledge of the interrelation of the objects and their behaviour is lost in the reproduction of data in a classical database. Object-oriented database systems fulfil the requirement of an adequate representation, offering a concept equivalent to model structure and object interrelation.

A database system is called object-oriented if the data model provides concepts for the representation of structure and behaviour of objects in the database. Additionally, we would like to have features such as:

— object identity, i.e. the objects can be identified independently of their current value;
— data encapsulation, i.e. the internal representation of a data structure is hidden;
— inheritance, i.e. the user can define hierarchies between objects and can inherit features concerning structure and operations.
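The three features can be made concrete in a few lines. The sketch below is a hypothetical Python illustration (the Part and MachinedPart names are invented), not a feature of any of the database systems discussed:

```python
# Hypothetical sketch of the three object-oriented features listed
# above, illustrated with ordinary Python classes.

class Part:
    def __init__(self, weight):
        self._weight = weight        # data encapsulation: the internal
                                     # representation is hidden (by
                                     # convention, in Python)
    def weight(self):
        return self._weight

class MachinedPart(Part):            # inheritance: structure and
    def weight(self):                # operations are inherited and
        return self._weight * 0.9    # may be refined

a, b = Part(10.0), Part(10.0)
# Object identity: a and b carry equal values yet remain distinct objects.
print(a is b, a.weight() == b.weight())
print(MachinedPart(10.0).weight())
```

An object-oriented database offers the same three guarantees for persistent data, which is precisely what record-oriented models cannot express.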

According to the different concepts for the representation of structure and behaviour, we can distinguish three kinds of object-oriented data models (Dittrich 1988):

Structural object-oriented data models

These data models allow direct representation and manipulation of complex database entities. In particular, the attributes of an entity need not contain simple values, but can also hold a highly structured object, e.g. as necessary for the representation of office documents or VLSI design objects. Additionally, predefined (generic) operations on these highly structured database entities are provided by the data model; examples are the finding and reading of a specific object, deletion of the whole object, and navigation through an object.

Behavioural object-oriented data models

Modelling the behaviour of objects in the database in terms of user-defined, type-specific operators is unique to this class of model. The data structure in this model remains record-oriented. Users can define type-specific operators based on the record-oriented data structures and use them in the same way as the generic ones prescribed by the database system. To do so, they have to specify the interface and the algorithms on the basis of the record-oriented data structures and elementary operations, by means of the query language and a programming language.




Full object-oriented data models

This data model combines structural and behavioural object orientation. It supports complex objects as well as user-defined type-specific operators. Since user-defined types can also be used in the construction of complex objects, and user-defined types may in turn use complex objects, any combination in the construction of objects and in the definition of new types is possible.

Fig. 7.5 — Object-oriented data models.

Fig. 7.5 shows how these data models relate to the classical relational database system. Examples are the above-mentioned POSTGRES (University of California, Berkeley, USA) for the full object-oriented model, GemStone (ServioLogic, USA) for the behavioural model, and XSQL (IBM San Jose, USA) for the structural model.



Expert database systems (EDS) are unique in the way they integrate ideas from Artificial Intelligence, Database Management, Logic and Object-Oriented Programming, and Information Retrieval. This synergism leads to new architectures for




intelligent systems. Although covered under one term, it has become obvious that there is no common architecture for EDSs yet. The specific architecture of an EDS depends on the application area and on the developer/user community. The initiator of a specific development, whether originating in the database or the expert system world, influences the architecture as well as the characteristics of a single system. In general, we already have some existing systems for every category of EDS mentioned in this chapter. Additionally, there are many developments under way, so that we can expect for the future a merging of database systems, expert systems, and programming languages, fewer differences between programs and data, more efficient implementations, and the appearance of new concepts even in traditional systems.

REFERENCES

Al-Zobaidie, A. & Grimson, J. B. (1987) 'Expert systems and database systems: how can they serve each other?', Expert Systems, 4, No. 1.
Bayer, R. (1985) 'Database technology for expert systems', In: Proceedings of the GI-Kongress, Wissensbasierte Systeme, Informatik Fachbericht 112.
Bobrow, D. G. & Stefik, M. (1981) The LOOPS manual, XEROX PARC.
Clocksin, W. F. & Mellish, C. S. (1984) Programming in PROLOG, Springer-Verlag.
Dittrich, K. R. (ed.) (1988) Advances in Object-Oriented Database Systems, Proceedings of the 2nd International Workshop on Object-Oriented Database Systems, Springer-Verlag.
Goldberg, A. & Robson, D. (1983) SMALLTALK-80: The language and its implementation, Addison-Wesley.
IntelliCorp (1987) 'KEEConnection: a bridge between databases and knowledge bases', an IntelliCorp technical article.
Ioannidis, Y. E., Chen, J., Friedman, M. A. & Tsangaris, M. M. (1989) 'BERMUDA — an architectural perspective on interfacing Prolog to a database machine', In: Kerschberg, L. (ed.), Expert database systems, Proceedings of the Second International Conference.
Jarke, M. & Vassiliou, Y. (1983) 'Coupling expert systems with database management systems', In: Artificial intelligence applications for business, W. Reitman (ed.), Ablex Publishing Corporation, Norwood, New Jersey.
Jarke, M. & Vassiliou, Y. (1984) 'Databases and expert systems: opportunities and architectures for integration', New applications of data bases, Academic Press.
Kerschberg, L. (ed.) (1986) Expert database systems, Proceedings of the First International Workshop.
Kerschberg, L. (ed.) (1987) Expert database systems, Proceedings of the First International Conference.
Kerschberg, L. (ed.) (1989) Expert database systems, Proceedings of the Second International Conference.
Kim, W. & Lochovsky, F. H. (eds) (1989) Object-oriented concepts, databases and applications, ACM Press.
Kowalski, R. (1979) Logic for problem solving, Artificial Intelligence Series, North-Holland.




Lafue, G. M. E. & Smith, R. G. (1986) 'Implementation of a semantic integrity manager with a knowledge representation system', In: Expert database systems, Proceedings of the First International Workshop, pp. 333-350.
MAD Intelligent Systems (1989) The MAD smart data system: a technical overview.
Missikoff, M. & Wiederhold, G. (1986) 'Towards a unified approach for expert and database systems', In: Expert database systems, Proceedings of the First International Workshop, pp. 383-399.
Olson, J. P. & Ellis, S. P. (1986) 'PROBWELL — an expert advisor for determining problems with producing wells', Proceedings of the IBM Scientific/Engineering Conference, pp. 95-101.
Smith, J. M. (1986) 'Expert database systems: a database perspective', In: Expert database systems, Proceedings of the First International Workshop.
Stonebraker, M. & Hearst, M. (1989) 'Future trends in expert database systems', In: Kerschberg, L. (ed.), Expert database systems, Proceedings of the Second International Conference.
Tsur, S. & Zaniolo, C. (1988) 'LDL: a logic-based data-language', Proceedings of the 12th VLDB Conference, Kyoto.
Vassiliou, Y. (1985) 'Integrating database management and expert systems', In: Informatik Fachbericht 94, Proceedings.
Vassiliou, Y. (1986) 'Knowledge-based and database systems: enhancement, coupling or integration?', In: Topics in information systems, Brodie, M. L. & Mylopoulos, J. (eds).

8 AI techniques applied to business management

Artificial Intelligence techniques have been applied to industrial and business management in several programs. This chapter discusses the Intelligent Management System, ROME, ISIS, and Organization Modelling, all developed at the Intelligent Systems Laboratory of The Robotics Institute at Carnegie-Mellon University. The application of natural language interfaces to business management programs is also discussed.

8.1


Classical research in the area of factory automation has been concerned more with production processes than with management. Yet it has been observed that in many small batch-size factories, white-collar labour accounts for a large fraction of total labour cost, and in some cases exceeds 50%; and that small batch-size production accounts for 50-75% of the dollar value of durable goods produced in the United States. In metal-cutting, job-shop production environments, it has been found that an order is actually mounted on a machine during only 20% of the time it is in a factory, and that value-adding operations are performed during only 5-10% of its time on the machine. There are (at least) two approaches to dealing with this problem. The first is to discover new methods of producing products that do not suffer from these inefficiencies. The second is to increase the effectiveness of professional and managerial personnel. The latter approach is the objective of the Intelligent Management System (IMS) project.

8.2


The Intelligent Management System (IMS) project is part of the Factory of the Future project in the Robotics Institute of Carnegie-Mellon University. IMS is a long-term project concerned with applying Artificial Intelligence techniques to aiding professionals and managers in their day-to-day tasks. This section discusses both the long-term goals of IMS and current research. It describes research in the modelling of organizations, job-shop scheduling, organization simulation, user interfaces, and system architecture. The broad functional goals of the Intelligent Management System project include:

(1) Providing expert assistance in the accomplishment of professional and managerial tasks, and




(2) Integrating and coordinating the management organization.

The results of the research have been targeted to run in a distributed multiprocessor and process environment (see Fig. 8.1). Employees will have a User Interface

Fig. 8.1 — Distributed multi-processor and process environment.

Process (UIP) that will act as an intelligent ‘aide’. A UIP is composed of a personal computer, graphics display, keyboard, microphone, and network interface. The UIP will have either voice or typed natural language input. It will act as an ‘aide’ in the sense that it will interpret and implement user requests and queries. All UIPs will be interconnected via a communication network, allowing them to cooperatively




interact to solve problems and communicate information. The UIP will also carry out many of the employee's well-structured tasks automatically. Each machine will have a Machine Supervisory Process (MSP) which monitors and controls it. It is also connected to the network, and can reply to queries and commands initiated by other MSPs or UIPs on the network. Lastly, there are Task Management Processes (TMPs). A TMP provides the focus for task management. It does much of the mundane task monitoring and control, freeing managers to do the more complex decision-making tasks.

One use of a modelling system is for machine diagnosis. Engineers spend a great deal of time watching a malfunctioning machine to determine what has gone wrong and what variable to alter to improve performance. One of the major stumbling blocks in the systematic analysis of such systems is the unavailability of process instrumentation for data collection. Yet the availability of the data solves only half of the problem. The other half is the automatic analysis of the data to find relations between system parameters and productivity. Statistical correlations are only a small part of the analysis. Statistics alone can only suggest relations. Understanding whether a relation is valid requires a thorough knowledge of the process itself and all its interactions. This cannot be done without a sufficiently rich model of the process describing physical, functional, and time relations.

Another use of a model is to provide process cost analysis. Given a model, questions about the resource consumption and production of each process and subprocess can be answered. The individual process descriptions can be integrated across the group to provide summary cost information. More importantly, because the model represents not only the processes but also the effects and relations among processes, questions related to process alterations can be analyzed.
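The cost roll-up just described can be sketched in a few lines. The fragment below is a hypothetical Python illustration (process names and cost figures are invented); it integrates per-process costs over a small process tree and shows that a process alteration is a local change to the model:

```python
# Hypothetical sketch of process cost analysis: each process records
# its own resource cost and may contain subprocesses; summary cost is
# obtained by rolling costs up the process tree.

def total_cost(process):
    """Recursively integrate cost over a process and its subprocesses."""
    return (process["cost"] +
            sum(total_cost(s) for s in process.get("subprocesses", [])))

milling = {"name": "milling", "cost": 120.0}
drilling = {"name": "drilling", "cost": 45.0}
machining = {"name": "machining", "cost": 10.0,
             "subprocesses": [milling, drilling]}

before = total_cost(machining)
print(before)                        # 175.0

# Because relations among processes are explicit, a process alteration
# ('what if drilling becomes 20% cheaper?') is a local change:
drilling["cost"] *= 0.8
after = total_cost(machining)
print(after)
```

The point is the one made in the text: because the model carries the relations among processes, a local alteration propagates automatically into the summary cost.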
A natural language understanding and discourse modelling system to support model acquisition, factory layout, and job-shop scheduling is currently under development. Some important aspects of the IMS are:

• Flow-Shop Modelling: a complete model of a printed-wire-board (PWB) bare-board production plant has been constructed at the machine level.
• Job-Shop Modelling: a model of part of a turbine component production plant has been constructed to support simulation and job-shop scheduling functions.
• Simulation: many 'what if' questions are encountered in understanding structural changes in the factory. For example, the decision to put in a more flexible but more expensive machine depends on its effect on the performance of the factory. In a job-shop, it is difficult to answer such questions analytically owing to the complexity of the model. Simulations are used to predict the performance of complex systems; hence they are a potentially useful tool to provide to managers.
• Job-Shop Scheduling: one of the most important aspects in a factory is scheduling. The scheduling capabilities of the IMS are presented in the ISIS section which follows.
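A minimal version of such a 'what if' simulation is sketched below (hypothetical Python; the arrival and service times are invented). It compares order flow times through a single machine for a slow and a fast machine candidate:

```python
# Hedged sketch of a 'what if' simulation: predict order flow time
# through a single machine for two candidate machines, one faster but
# presumably more expensive. All figures are invented.

def flow_times(arrivals, service_time):
    """Deterministic single-machine queue: each order starts when it
    arrives or when the machine becomes free, whichever is later."""
    free_at, times = 0.0, []
    for t in arrivals:
        start = max(t, free_at)
        free_at = start + service_time
        times.append(free_at - t)    # time the order spends in the shop
    return times

arrivals = [0, 1, 2, 3, 4]
slow = flow_times(arrivals, service_time=2.0)
fast = flow_times(arrivals, service_time=1.0)
print(sum(slow) / len(slow), sum(fast) / len(fast))
```

Even this toy model exhibits the effect the text describes: with the slow machine, queueing makes mean flow time grow well beyond the bare service time, which is the kind of prediction a manager would weigh against the faster machine's cost.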



ROME is a knowledge-based support system for long-range financial planning, developed at Carnegie-Mellon University (Fox 1982).




Many business decisions are based on information produced by computerized financial planning models. While the models themselves may be quite sophisticated, their computer implementations generally do little more than calculate and display the results. Not much attention is given to screening the input data for anomalies, verifying that the data satisfy the assumptions of the model, or checking to make sure that the output seems reasonable for the situation at hand. Nor are there facilities for explaining what the output represents, showing its derivation, or justifying the results to users who are not familiar with what a particular program does. Traditionally, these tasks have been left to human analysts who could intelligently apply a program model to answer managerial questions. The ROME project, sponsored by the Digital Equipment Corporation, is an effort to develop a knowledge-based system which could itself perform many of the above tasks and hence more effectively support decision making in the area of long-range financial planning. Our approach is based on the idea that current programs are limited by a lack of knowledge, i.e. they simply don't know what the variables in the models they manipulate mean. For example, they don't have knowledge of how the variables are defined in terms of real-world entities, and so they cannot explain what the variables stand for. They don't themselves keep track of the relationships used to derive the variables, and so they can't explain how they got their values. They have no knowledge of 'normal' versus 'abnormal' circumstances, and so cannot detect peculiar values, whether they be for input, intermediate, or output variables. Finally, they have no sense of the consequences implied by the variables and hence cannot tell 'good' values from 'bad' ones with respect to the goals of the organization.
In contrast, the overall goal for ROME is to make the meaning of the variables available to and usable by the system itself. Therefore, we have developed an expressive representation for financial models using the SRL 1.5 knowledge representation language. This representation allows ROME to keep track of the logical support for model variables, such as their external source, method of calculation, and assumptions that must hold for the variables' values to be valid. Tracing back through the dependencies associated with a variable's computation can be used to explain why a value should be believed. Similarly, ROME can challenge the values of a particular variable by comparing them against relevant expectations, organizational goals, and independently derived values. Prototype implementations of two subsystems, called ROMULUS and REMUS, have recently been completed. ROMULUS is the user interface for the ROME system. Instead of the rigid and stylized input language used with most computerized support systems, ROMULUS has been designed to accept natural language queries about the model expressed in English sentence form. The query types currently understood are those which relate to definitions and calculations, such as 'What is the definition of production spending?' and 'How was line 46 calculated?' ROMULUS also supports the interactive construction and editing of financial models in natural language, by allowing the addition of new variables, formulas, and constraints on variables. Examples of acceptable user assertions are 'Define year and people to go up.' A major goal for ROMULUS has been to make the system as cooperative as possible, by including ways to recover from user mistakes (e.g. by spelling correction) and to tolerate variation in input (e.g. by accepting synonyms and variations in syntactic form).



[Ch. 8

REMUS is the financial model reviewing expert for the ROME system. Given a financial model and a set of constraints entered by users which represent plan reviewer expectations and corporate goals, REMUS scans the models to detect constraint violations which are then reported to the user. When a constraint violation is detected, REMUS attempts to determine the underlying circumstances that account for it by examining the formulas, input, and intermediate variables that are involved in its computation. By this process, REMUS can localize the source of a constraint violation to the input variable(s) which seem to be responsible. An integrated version of the ROME system, called ROME 1.0, was delivered to Digital in 1983. The CMU group is presently extending the capabilities of the ROME system in three areas: the causal diagnosis of constraint violations, the dependency based revision of financial models in the face of inconsistency or change, and the support of user explorations of hypothetical plans. This work will make major additions not only to the REMUS and ROMULUS subsystems, but to the SRL 1.5 knowledge representation language itself. The relevance of this research to IMS design stems from the fact that intelligent management systems will most likely not be stand-alone systems but will be integrated with conventional databases and with causal models of an organization and its environment. In addition, AI languages and inference engines may have to be modified (e.g. with sensitivity analysis procedures) so that they assume some of the character of a decision support system. The issues in this area have not yet been resolved, but it is clear that the designers of IMS will have to take into account not only the research that has been done on managerial and organizational behaviour during the past few decades, but also the research that has been and is being done on decision support systems.
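The review step REMUS performs can be illustrated with a small sketch. This is hypothetical Python, not the actual system; the model contents, constraint checks, and function names are all invented. Constraints are checked against model values, and a violation is localized to the input variables the violated value depends on:

```python
# Illustrative sketch of REMUS-style plan review: check user-entered
# constraints against model values and, on a violation, walk the
# derivation down to the input variables that seem responsible.

model = {
    "revenue":    {"value": 40_000, "inputs": ["units_sold", "unit_price"]},
    "units_sold": {"value": 800,    "inputs": []},
    "unit_price": {"value": 50,     "inputs": []},
}
expectations = {
    "revenue":    lambda v: v >= 50_000,      # corporate goal
    "unit_price": lambda v: 20 <= v <= 100,   # reviewer expectation
}

def review(model, expectations):
    violations = []
    for name, check in expectations.items():
        if not check(model[name]["value"]):
            # localize: collect the input variables the value depends on
            frontier, inputs = [name], []
            while frontier:
                var = frontier.pop()
                deps = model[var]["inputs"]
                if deps:
                    frontier.extend(deps)
                else:
                    inputs.append(var)
            violations.append((name, sorted(set(inputs))))
    return violations

print(review(model, expectations))
```

Here the revenue goal fails and the review traces the shortfall to the two input variables behind it, which is the kind of localization the text describes.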



The ISIS project was completed to an operational prototype stage by a Carnegie-Mellon University group consisting of Mark Fox et al. (1979, 1983). The ISIS project began in the summer of 1980 in conjunction with the Westinghouse Corporation Turbine Component Plant in Winston-Salem, NC. The goal of ISIS is to investigate new, AI based approaches to solving problems in the management and control of production in a job-shop environment, the result of this investigation being an operational prototype. At present three versions of ISIS have been constructed: ISIS-I (December 1980), ISIS-II (December 1982), and ISIS-III (December 1984). ISIS has been designed and implemented so that its functions are independent of the particular plant. The remainder of this summary describes the major capabilities of the latest version. The level of intelligent processing behaviour a system may exhibit is limited by the knowledge it has of its task and its environment. To enable ISIS to perform 'intelligent' management and production control, an Artificial Intelligence approach is used to model the production environment. The SRL knowledge representation system is used to model all relevant information.

Sec. 8.4]



Conceptual primitives have been defined for the modelling of organizations. They include:

• States (of the organization)
• Object descriptions (e.g. parts, attributes)
• Goals (e.g. shipping orders)
• Time
• Causality
• Possession

SRL provides ISIS with the capability of modelling a plant at all levels of detail: from physical machine descriptions to process descriptions, to organizational structures and relations. SRL subsumes functions normally provided by database systems. ISIS uses SRL to describe all products, resources (including machines, tools, personnel, etc.), operations, departments, plant layout, and other information necessary to support its functionality. The following describes some of SRL's uses in ISIS:
• Order definition. Any type of order, e.g. live, forecast, customer, manufacturing, etc., can be created and updated interactively. New types of orders can be created as needed.
• Lot definition. Orders may be grouped into lots, which may run as a unit through the plant.
• Resource definition. Resources such as machines, tools, fixtures, materials, and personnel can be defined and used by an extendable set of functions. Resource definitions include substitutability in operations, and current operation assignments in the plant.
• Line-up (operation) definition. How a product may be produced is described as an operations graph. The operations graph describes all alternative operations, processing information, and resource requirements. Operations can be described hierarchically, enabling the description of operations at varying levels of detail.
• Work area definition. Cost centres, work areas, and any other plant floor organizations may be defined, and resources, sub-work areas, etc., can be assigned to them. Possible uses besides scheduling include accounting, personnel, and other functions.
• Department definition. Department, personnel, and any other organization structures may be defined, and linked with other parts of the model.
• Reservation definition. Any resource may be reserved for an activity (operation).
ISIS provides full reservation creation and alteration, both interactively and automatically (Scheduling).
• Plant organization. The plant may be described hierarchically both from an organization structure perspective and a physical layout perspective. This is used to support functions such as colour graphic displays of the plant layout.
ISIS provides full interactive model perusal and editing from multiple users in parallel. The user may alter the model of the plant from any terminal. Perusal is provided in a number of forms including menus, a simple subset of English, and graphic displays. The interactive sub-system of ISIS provides the user with the 'hands on' capability of creating and altering schedules interactively. The system monitors the




user's actions and signals when constraints are broken. The following are a few of its features:
• Lotting. ISIS provides the user with an interactive lotting facility for searching and examining orders, and grouping them into lots.
• Resource scheduling. Resources may be scheduled by reserving them for use in a particular operation at a user specified time. Such reservations are noted both with the resource and the reserver (lot).
• Hierarchical scheduling. The user may construct schedules at different levels of abstraction, and ISIS will automatically fill in the other levels. For example, the scheduler may schedule only the critical facilities, and ISIS will complete the schedule at the detailed level.
• Overlap flagging. If user defined reservations for resources result in conflicting assignments, ISIS will inform the scheduler, and may automatically shift other reservations.
• Constraint checking. Whenever a reservation for a resource is made, ISIS will check all relevant constraints, and inform the user of their satisfaction. For example, if the reserved machine cannot be used for a product of that size, the user will be informed at the time the reservation is made. ISIS allows the specification of almost any constraint the user may wish to specify (Automatic Scheduling).
Analysis of the Winston-Salem plant, and other job-shops, has shown that the driving force behind scheduling is the determination and satisfaction of constraints from all parts of the plant. Current approaches to scheduling fail because they cannot consider all the constraints found in the plant; hence their results are mere suggestions which are continually changed on the factory floor. ISIS was therefore designed to perform constraint-directed scheduling of job-shops. ISIS provides the capability of defining and using almost any constraint in the construction of schedules.
It is able to select which constraints to satisfy, on an order by order basis, and if they are not all satisfiable, ISIS may relax these constraints. Types of constraints include:

Operation alternatives
Operation preferences
Machine alternatives
Machine preferences
Machine physical constraints
Set-up times
Queue ordering preference
Queue stability
Start date
Due date
Work-in-process
Local queue time
Resource requirements
Resource substitutions
Personnel requirements
Resource reservations
Shifts
Down time
Productivity
Cost
Productivity goals
Quality
Order priority
Lot priority
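A minimal sketch of constraint-directed choice in this spirit can be given in Python. The weights, candidate reservations, and constraint tests below are invented, not ISIS's: each candidate is scored by the utility of the constraints it satisfies, so preferences can be traded off and low-weight constraints are effectively relaxed when they conflict with more important ones:

```python
# Illustrative sketch of constraint-directed selection: score each candidate
# reservation by the total weight (utility) of the constraints it satisfies.

constraints = [
    {"name": "due_date",          "weight": 10, "test": lambda s: s["finish"] <= 40},
    {"name": "preferred_machine", "weight": 3,  "test": lambda s: s["machine"] == "M1"},
    {"name": "shift",             "weight": 5,  "test": lambda s: s["start"] >= 8},
]

candidates = [
    {"machine": "M1", "start": 6,  "finish": 38},   # violates shift constraint
    {"machine": "M2", "start": 9,  "finish": 39},   # violates machine preference
    {"machine": "M1", "start": 10, "finish": 45},   # violates due date
]

def score(slot):
    """Utility of a candidate = sum of weights of satisfied constraints."""
    return sum(c["weight"] for c in constraints if c["test"](slot))

best = max(candidates, key=score)
print(best, score(best))
```

Because no candidate satisfies everything, the scoring implicitly relaxes the cheapest constraint (here the machine preference) rather than the due date, mirroring the order-by-order relaxation described above.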

The following are some of the capabilities of automatic scheduling: • Constraint Representation. Much of the work in ISIS has centred around a




general approach to the representation of constraints. The results of this research allow ISIS to represent and use almost any constraint the user desires. Information representable in a constraint includes: duration of applicability over time and during operations, obligation by the system to use the constraint if it cannot be satisfied, interactions amongst constraints, and the utility of their satisfaction.
• Multi-level Scheduling. ISIS allows scheduling to be performed at differing levels of detail and perspective. It currently performs a bottleneck analysis whose output is a set of constraints to the detailed scheduling level.
• Bottleneck Analysis. The capacity level of scheduling determines the availability of machines to produce an order. It puts a set of constraints on when each operation for an order should be performed so that it can avoid unnecessary waiting and tardiness.
• Detailed Scheduling. The detailed level of scheduling provides complete scheduling of all resources required to produce an order. It takes into account all relevant constraints, both model and user defined. As it constructs a schedule, it tests to see how well the schedule satisfies the known constraints. If important constraints cannot be satisfied, they will be relaxed. This level searches for a constraint satisfying schedule.
ISIS provides the user with interfaces to update the status of all orders and resources. If the new status does not coincide with its schedule, then ISIS will reschedule only the affected resources. For example, if a machine breaks down, all the affected orders are rescheduled. Hence, ISIS can be used to provide real time reactive scheduling of plants. It can also be extended to connect to an on-line data gathering system on the plant floor, providing real time updating of plant status. One of ISIS's features is that its constraint representation allows it to perform a subset of generative process planning.
Currently, ISIS has knowledge of a product’s basic physical characteristics, and can choose machines based on them. These constraints can be extended to include geometric information. Since ISIS schedules all specified resources, the user is informed of the resource requirements needed to satisfy the production. Constraints on the utilization of resources (e.g. machine, tools, personnel, etc.) may be specified and used to guide detailed scheduling. This enables departments, like advance planning, to specify resource utilization constraints directly to ISIS. The ISIS interface is menu/window based. The user is presented with multiple windows of information on the screen, plus a menu of commands to choose from. The menu system provides a network of displays ranging from order entry/update to interactive lotting to report generation. All reports may be printed directly to the screen or to an attached printer. A simple natural language interface is also provided to the user for perusing the plant model. A device independent colour graphics display system is also available. It is currently used to view a blueprint-like display of the plant, and to zoom in on work areas and machines. It can be used to display the status of the plant during operation. ISIS has been designed to be a multi-user system. Its model can be shared amongst multiple programs. Hence, the model may be perused and altered from multiple locations in the plant, allowing departments to get the information they need to perform their tasks, and to provide information directly to ISIS. ISIS is being




extended to determine who the user is and to restrict access and alteration capabilities, depending on their function. ISIS is part of the Intelligent Management Project; hence it can use other functions available in the project. For example, KBS, a knowledge based simulator, can interpret the organization model directly to perform simulations. All the users have to do is modify the model to reflect the environment they wish to simulate. So far, results have been very encouraging.
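The reactive rescheduling behaviour described earlier, where a machine breakdown causes only the affected orders to be rescheduled, can be sketched as follows. This is a hedged illustration with invented order and machine names, not ISIS code:

```python
# Sketch of reactive rescheduling: when a machine goes down, only the
# orders reserved on it are moved to a substitutable resource.

schedule = {           # order -> machine it is currently reserved on
    "order-1": "lathe-2",
    "order-2": "mill-1",
    "order-3": "lathe-2",
}
substitutes = {"lathe-2": "lathe-3", "mill-1": "mill-2"}  # resource substitutions

def machine_down(schedule, machine):
    """Reschedule only the affected orders onto a substitute resource."""
    affected = [order for order, m in schedule.items() if m == machine]
    for order in affected:
        schedule[order] = substitutes[machine]
    return affected

moved = machine_down(schedule, "lathe-2")
print(moved, schedule)
```

Note that order-2, which does not use the failed machine, is left untouched; this selective repair is what distinguishes reactive rescheduling from regenerating the whole schedule.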



A program for organization modelling was developed by Mark Fox (1983) at Carnegie-Mellon University. The management and analysis of an organization require a richness and variety of information not commonly found in the databases of management information systems. For example, a simulation system requires knowledge of existing processes including process times, resource requirements, and their structural (routing) relation to other processes. It must also know when routings for products are static or are determined by a decision process such as a scheduler. In the latter case, it must know when and where to integrate the scheduler into the simulation. If the IMS is to generate the sequence of events to produce a new product, it must have knowledge of each process (e.g. machine), including the type of processing it can do, its operating constraints, the resources it consumes, and its operating tolerances. If data are to be changed in an interactive, possibly natural language mode, the IMS must have knowledge of generic processes such as machines, tasks, and departments if it is to understand the interaction. It must also know what information is important and how it relates to other information in order to detect missing information and inconsistencies. Hence, the organizational model must be able to represent object and process descriptions (structural and behavioural), and functional, communication, and authority interactions and dependencies. It must represent individual machines, tools, materials, and people, and also more abstract concepts of departments, tasks, and goals. Current organizational models are found typically in databases fragmented across one or more computer systems. How information in the database is interpreted is defined by the program and not by agreed-upon conventions of field and relation names (though work on relation schemata is proceeding).
By taking an AI knowledge representation approach to organization modelling, the variety of information described above can be represented. The model is accessible by all subsystems, while the semantics of the model is jointly understood. Secondly, an AI approach to organization modelling provides the information required by all management and analysis functions. While much of the information enumerated can be represented by using current AI knowledge representation techniques, there is still much that requires craftsmanship and is poorly understood. More work is required to standardize the representation of causal relations, data changes over time, and idiosyncratic inheritance relations. To date, the research has focused on the use of the SRL knowledge representation system as the basis for organization modelling. SRL has been extended to include conceptual primitives such as:

• actions, states, and objects
• constraints and their relations
• time
• causality and dependencies
• belief

With these primitives, detailed models have been constructed (Fox 1984). These models are embedded in the new field of distributed decision making (DDM). DDM is based on the three elements of communication, individual problem solving, and coordination. Clearly, the participants in DDM must be able to communicate with each other in the sense of making both requests and responses. In addition to conventional modes of communication, computer based communication methods can support distributed decision making. Both remote and local communication networks, as well as electronic mail systems, can have favourable impacts on the speed and cost of communicating. DDM also requires an effective coordination of individual problem solving activities. Coordination defines the structural and dynamic pattern of inter-agent actions. It has several aspects including planning, control, and review. Planning involves problem reduction, scheduling, and synthesis. Control covers mediation, negotiation, and execution. And review deals with performance evaluation as well as organizational learning. Moreover, coordination occurs in a context of concurrent problem solving where multiple decisions are pending simultaneously. The principles of DDM must provide adequate constructs for explaining and modelling these coordination aspects with respect to various organization designs, leading to knowledge based organizations (Holsapple & Whinston 1987).

REFERENCES

Fox, M. S. (1979) Organization structuring: designing large complex software, Technical Report CMU-CS-79-155, Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA.
Fox, M. S. (1982) The intelligent management system: an overview, Carnegie-Mellon University Report No. CMU-RI-TR-81-4.
Fox, M. S. (1983) 'The intelligent management system: an overview', In: Sol, H. G. (ed.), Processes and tools of decision support. North-Holland: Amsterdam.
Fox, M. S. (1983) Project sampler, Carnegie-Mellon University, The Robotics Laboratory.
Fox, M. S. (1984) 'ISIS: a knowledge-based system for factory scheduling', Expert Systems 1(1).
Fox, M. S., Smith, S. F., Allen, B. P. & Strohm, G. A. (1983) 'ISIS: a constraint directed reasoning approach to job-shop scheduling', Proc. of the IEEE Conf. on Trends and Applications.
'Machine tool technology', American Machinist, 124(10), October 1980, pp. 105-128.

O'Connor, D. E. (1984) 'Using expert systems to manage change and complexity in manufacturing', In: Reitman, W. (ed.), Artificial intelligence applications for business. Ablex: Norwood, NJ.




Holsapple, C. W. & Whinston, A. B. (1987) Business expert systems, Irwin: Homewood, IL.

9 Industrial applications of AI

There are numerous potential applications for artificial intelligence technology within the industrial environment:
• Robotic applications involving AI are presented in Chapter 11.
• Speech recognition technology and industrial applications, such as voice data entry, are presented in Chapter 10.



XCON is the best known expert system used in manufacturing. The program was developed by John McDermott at Carnegie-Mellon University in 1980 (Hayes-Roth 1983, Kraft 1984). It was originally called 'R1', and is still referred to by that name by many in the research community. XCON is a program that configures VAX-11/780 computer systems. Given a customer's order, it determines what, if any, modifications have to be made to the order for reasons of system functionality, and it produces a number of diagrams showing how the various components of the order are to be associated. The program is being used on a regular basis by Digital Equipment Corporation's manufacturing organization. XCON is implemented as a production system. It uses 'Match' as its principal problem solving method; it has sufficient knowledge of the configuration constraints that at each step in the configuration process, it simply recognizes what to do. Consequently, little search is required to configure a computer system. A typical VAX-11/780 computer has 100 or more different components in its final form. These must be selected from a set of about 420 components, many of which are bundles of more basic components. All these individual items must be placed in cabinets that will fit in the available floor space. The components must be interconnected properly; power supplies must be provided that have sufficient capacity to supply adequate voltage and current to all components for all conditions of operation. Enough cabinet space must be provided so that everything fits in. Many different peripheral devices can be connected to the system. Thus, the configuration of a typical VAX-11 system is a difficult task. Not only are many components involved, but certain interactions and constraints apply. When connecting modules to a data bus, for example, one sequence of component placement will be optimal, while another will not be as satisfactory.



[Ch. 9

Further associated backplanes are available with either 4 slots or 9 slots. Further considerations may suggest that a non-optimal arrangement of modules will lead to a less costly configuration. Many trade-offs of this type must be considered before a satisfactory configuration can be obtained. The XCON program is structured as a production system (IF (A is true) THEN (Do B) ELSE (Do something else)) using a computer language known as OPS5. OPS5 is a powerful language for expressing patterns, but is not very close to natural English; thus it requires translation or knowledge of OPS5. The program is accessible on a number of terminals throughout the DEC manufacturing system and is widely used in the company by different people. XCON performs a number of different subtasks in the process of configuring a system. The first subtask is to determine whether the purchase order received from the salesman represents a reasonable, more or less complete system, and whether all the elements are compatible with respect to voltages and operating frequencies. If some components are incompatible with others, XCON substitutes components that will be compatible. XCON also determines whether any components of the order require other components that are not on the order. If any components are missing, XCON adds them to the order. It checks to see that at least one master drive, for example, is available for every seven slaves. This subtask uses 196 production rules. Subtask number 2 entails putting the components of the CPU into cabinets. In accomplishing this subtask, XCON uses the templates of the cabinets extensively. The subtask involves a number of considerations concerning the CPU, memory, and power supplies. If, for example, the order contains options for the floating point accelerator or writeable control store, boards for these options must be put into a certain space in the CPU backplane.
When the memory is put in, the number of memory controllers and adaptors will determine whether or not the memory is to be interleaved. Associated with the CPU cabinet may be an expansion cabinet. XCON also fills this with the appropriate components. This subtask uses 87 rules. Putting boxes into unibus (a part of the computer) expansion cabinets and putting unibus modules into the boxes constitutes the third subtask. Because the amount of box space is limited, all of the modules must be configured before XCON can determine whether the configuration is acceptable. Several configurations are sometimes generated before XCON finds one that is acceptable. Unacceptable configurations are generated because the modules are to be placed on the unibus in as optimal a sequence as possible. To prevent an exhaustive search for an optimum sequence, XCON classifies sequences as optimal, almost optimal, and suboptimal. If the amount of space in the boxes allows an optimal sequence, XCON uses this sequence. If the space available does not allow an optimal sequence, XCON will settle for an almost optimal sequence. Finally, if the space available is still not enough, XCON will try for a suboptimal sequence. If this effort also fails, XCON will add another box to the order and begin the subtask again. XCON uses 256 rules for this subtask. In subtask four, XCON assigns panels to cabinets. Typically, all XCON has to do is select panels of the right type and size. Sometimes a panel will be added to the order. This subtask requires 61 rules. The last subtask, number six, is to determine what cables are to be used to interconnect the various devices of the system. This requires determining the lengths and types of cables required. Cable selection needs 36 rules.
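The recognize-act cycle of such a production system can be sketched in miniature. The rules below are invented stand-ins written in Python, not XCON's actual OPS5 rules: each rule is an IF condition over working memory and a THEN action, and rules fire repeatedly until no rule's condition holds:

```python
# An OPS5-like production system in miniature: IF-THEN rules fire
# against a working memory of facts until quiescence is reached.

rules = [
    # IF the order has a disk drive but no controller THEN add a controller
    {"if":   lambda wm: "disk" in wm and "disk-controller" not in wm,
     "then": lambda wm: wm.add("disk-controller")},
    # IF a controller is present but no cabinet slot THEN reserve a slot
    {"if":   lambda wm: "disk-controller" in wm and "slot" not in wm,
     "then": lambda wm: wm.add("slot")},
]

def run(working_memory):
    """Recognize-act cycle: fire any applicable rule until none applies."""
    fired = True
    while fired:
        fired = False
        for rule in rules:
            if rule["if"](working_memory):
                rule["then"](working_memory)
                fired = True
    return working_memory

print(run({"cpu", "disk"}))
```

Note how the second rule becomes applicable only after the first one fires; this chaining of completions (adding required components the order lacks) is the pattern the first XCON subtask described above relies on.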

Sec. 9.3]



When all six subtasks are finished, XCON produces an output that describes the system.



A great many AI programs use the technique known as heuristic search. In this technique, a possible solution is found and then evaluated; the technique is also known as 'Generate-and-Test'. The XCON program, however, does not use this technique to any great extent. Instead, XCON uses a generalized form of matching. In essence, XCON has an idealized concept of what the final configuration of any VAX-11 system should be. It then proceeds to match the target system against the ideal. Initially, XCON has available only the descriptions of the components ordered. However, XCON's rules allow it to determine what step it should take next. One result of this technique is that XCON does not have to backtrack and undo work it has already done. XCON is probably the first of the domain specific knowledge engineering AI systems to use the match strategy as its major problem solving method. The major difference between XCON and those AI programs or systems that use 'Generate-and-Test' is that XCON generates only one hypothesis before it reaches a conclusion. XCON's rules, which are used to perform the various subtasks, can be divided into domain-specific rules (480 rules) and general rules (292 rules), 772 rules in total. Domain-specific rules do such things as generate new contexts, examine prerequisites, retrieve information about components, and perform computations. The general rules are used to generate outputs, decide how to proceed from one context to the next, and generate empty data structures for use by domain-specific rules.
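The contrast between the two strategies can be made concrete with a deliberately tiny example. This is our toy illustration, not XCON's method; the three-component "configuration" is invented. Generate-and-test enumerates and evaluates whole candidate solutions, whereas match recognizes the single correct next step at each point, so only one path is ever built and nothing is undone:

```python
# Toy contrast between Generate-and-Test and the Match method.
from itertools import permutations

components = ["disk", "cpu", "memory"]   # items as they arrive on the order
ideal = ["cpu", "memory", "disk"]        # idealized target configuration

def generate_and_test():
    """Enumerate whole candidates and test each against the ideal."""
    tried = 0
    for candidate in permutations(components):
        tried += 1
        if list(candidate) == ideal:
            return list(candidate), tried

def match():
    """Recognize the correct component for each slot; no backtracking."""
    config, tried = [], 0
    for slot in ideal:
        tried += 1
        config.append(next(c for c in components if c == slot))
    return config, tried

print(generate_and_test())   # examines several whole permutations
print(match())               # builds exactly one path, one step at a time
```

Both arrive at the same configuration, but generate-and-test examines several complete hypotheses while match commits to one step at a time, which is why a match-based system needs so little search.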



XCON (or R1) was begun in December 1978. The first phase took about four months and involved learning the basics of VAX system configuration and writing an initial program that had fewer than 200 rules. This version could fill simple orders correctly but could not handle complex orders. In the second phase, XCON's outputs were examined by DEC experts in VAX configurations, and its mistakes and their causes were identified. During this phase, XCON's domain specific knowledge was increased about three times. One of the problems in developing rules for domain specific knowledge base systems is that the human experts who are supplying the information do not realize the depth and extent of the knowledge they possess. Thus the initial rules generated in the development of such systems are usually grossly inadequate. Only when the results are examined, the exceptions noted, and other constraints on the system defined and elucidated, can the system be brought to the expert level. When XCON was first put in place at DEC, some employees did not believe it would be able to configure systems successfully. This attitude has changed. In a sample of more than 300 orders, for example, the configuration provided by XCON was accepted by the human configuration experts in all but nine cases. The problems associated




with the nine unacceptable configurations were identified, and the rules responsible for the problems were modified. In handling a typical order for a system having about 100 components, XCON requires from 3 to 5 minutes of CPU time. XCON has successfully configured thousands of VAX computer systems and has done so faster, more accurately, and more thoroughly than had ever been done before. At the present time DEC has expanded the XCON technique so that it can act as a salesman's assistant in ordering systems. In addition, XCON has been expanded to cover more of the products manufactured by DEC.


REFERENCES

Hayes-Roth, F., Waterman, D. A. & Lenat, D. B. (eds) (1983) Building expert systems, Addison-Wesley: Reading, Mass.
Kraft, A. (1984) 'XCON: an expert configuration system at Digital Equipment Corporation', Chap. 3, In: Winston, P. H. & Prendergast, K. A. (eds), The AI business, MIT Press: Cambridge, Mass.
McDermott, J. (1981) 'R1: the formative years', The AI Magazine 2(2), pp. 21-29.
McDermott, J. (1982) 'R1: a rule based configurer of computer systems', Artificial Intelligence 19(1), pp. 39-88.
McDermott, J. (1984) 'R1 revisited: four years in the trenches', AI Magazine 5(3), pp. 21-24.

10 Speech recognition Voice or speech recognition systems are probably the area of Artificial Intelligence with the greatest commercial potential. It is possible that this form of computer data input will surpass CRT usage at some point in the future if the right level of performance can be achieved.



There are several classifications of speech recognition systems. The word recognition process may be broken into two parts: matching templates and language structure. The voice recognition process may be classified as:
• Continuous-speech recognition: word endpoints may be ambiguous (e.g. 'porous' versus 'poor us'), but endpoint determination can be aided by knowledge of the language syntax.
• Isolated-word recognition: word endpoints are determined by periods of silence, which typically must last at least 200 ms.
Voice recognition systems which are 'speaker independent' are designed to recognize the speech of most speakers. Such systems offer the most flexible potential application. However, within the current state-of-the-art, a higher recognition reliability is achieved with 'speaker dependent' systems. These systems are 'trained' by the operator by repeating each word in the vocabulary several times. Dialects, accents, or the language itself make no difference. A Spanish-speaking operator, for example, might choose to train the system in Spanish. Each operator's training utterances are stored on tape, while his personal identification code, keyed in before data entry, loads his individual training utterances into the system's active memory. There is no limit to the number of operators who can use such a system, even if they all speak different languages. Speech recognition and understanding should be distinguished (Catalano 1983):
• Speech recognition: Recognition by a computer (primarily by pattern-matching) of spoken words or sentences.



[Ch. 10

• Speech understanding: Speech perception by computer. AI programs for natural language understanding are discussed in Chapter 5.
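The isolated-word case described above, where word boundaries are placed at silences of at least 200 ms, can be sketched directly. The frame size, energy threshold, and signal values below are invented for illustration; real systems work on acoustic features rather than a toy energy list:

```python
# Sketch of isolated-word endpoint detection: a word ends when the signal
# stays below an energy threshold for at least 200 ms.

FRAME_MS = 20                        # energy measured per 20 ms frame
SILENCE_FRAMES = 200 // FRAME_MS     # 200 ms of silence ends a word
THRESHOLD = 0.1                      # illustrative energy threshold

def endpoints(energies):
    """Return (start, end) frame indices of each detected word."""
    words, start, quiet = [], None, 0
    for i, e in enumerate(energies):
        if e >= THRESHOLD:           # speech frame
            if start is None:
                start = i
            quiet = 0
        elif start is not None:      # silence inside a tentative word
            quiet += 1
            if quiet >= SILENCE_FRAMES:
                words.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:            # word still open at end of signal
        words.append((start, len(energies) - quiet))
    return words

# two bursts of speech separated by exactly 200 ms (10 frames) of silence
signal = [0.0] * 3 + [0.5] * 5 + [0.0] * 10 + [0.4] * 4 + [0.0] * 2
print(endpoints(signal))
```

With shorter silences the two bursts would be merged into one word, which is why the 200 ms minimum matters for isolated-word systems.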



There are several unique advantages of voice programming over other forms of operator data entry:
• An added element of safety can be achieved. An employee may simply shout 'stop', and a machine may respond to prevent a potential accident.
• Instantaneous control is achieved. The program may be changed or halted while in progress with simple commands.
• The use of voice for programming is more 'user friendly' than pendant or CRT teaching; the programmer would require a minimum of training and would not be required to have any direct contact with the computer controller.
• While access to a CRT is limited to one person at a time, multiple person programming may be employed using voice programming.
• Voice communications are faster, less fatiguing, and cause less eye strain than CRT interaction for programmers who perform all-day tasks.
• The novice in computer use can easily relate to voice communications. This may lead to more rapid employee acceptance, particularly among those who may never have worked with a computer.
• Command control can be exercised in a 'heads busy, eyes busy' situation.



The first step in voice recognition is the conversion of the acoustical signal into an electrical signal by a microphone. These analog signals are then converted into a binary pattern which the computer can recognize. After a number of repetitions the system can distinguish the binary representation of, say, 'RUN' from the binary representation of 'LIST', or some other word in its vocabulary, and can then act on the command in the same way as if it had been entered into the computer through a keyboard (Catalano 1983). Two approaches to extracting speech spectra are commonly applied. With the zero-crossing technique, filters split an input signal into three frequency bands; zero-crossing detectors then estimate the dominant frequency in each. With the Fourier transform method, the spectrum is computed directly (Elphick 1982). The voice recognition process is shown in Fig. 10.1. The process consists of four steps. To convert voice input from analog to digital representation, the analog waveform is sampled thousands of times per second in various segments of the sound frequency spectrum. Finding the best match between the input word and the admissible vocabulary involves extensive statistical analysis, so fast numerical processors and sophisticated algorithms are required for real time applications. Processing time in this step is proportional to the vocabulary size of the application. After finding the best match to justify a 'recognition', confirmation is sent to the

Sec. 10.3]


Fig. 10.1 — Voice recognition process.





speaker by voice synthesis or CRT display. If no recognition is found, the system can prompt the speaker for more information to clarify the situation. Raising the system’s threshold for recognition reduces system throughput (by generating more requests for follow-up information from the speaker) but also reduces the number of errors from ‘recognizing’ the incorrect word.
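The match-then-confirm loop described above can be sketched as a nearest-template classifier with a rejection cutoff. The words, feature vectors, and threshold values below are invented for illustration; real systems use spectral features and much richer statistical matching.

```python
# Sketch of isolated-word recognition by template matching with a
# rejection cutoff. The short feature vectors stand in for the spectra
# a real system would extract (e.g. via zero-crossing filters or an
# FFT). All names and numbers here are illustrative.

def distance(a, b):
    """Euclidean distance between two fixed-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def recognize(features, templates, threshold):
    """Return the best-matching word, or None to re-prompt the speaker.

    templates: dict mapping word -> stored reference feature vector.
    A smaller distance cutoff means more rejections (and re-prompts)
    but fewer substitution errors.
    """
    best_word, best_dist = None, float("inf")
    for word, ref in templates.items():
        d = distance(features, ref)
        if d < best_dist:
            best_word, best_dist = word, d
    if best_dist > threshold:
        return None          # no confident match: ask for clarification
    return best_word

templates = {"RUN": [0.9, 0.1, 0.2], "LIST": [0.1, 0.8, 0.7]}
print(recognize([0.85, 0.15, 0.25], templates, threshold=0.5))  # RUN
print(recognize([0.5, 0.5, 0.5], templates, threshold=0.3))     # None
```

The `None` return corresponds to the system’s follow-up prompt; a confirmed word would then be echoed back by voice synthesis or on the CRT.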



In a five-year project, the Defense Advanced Research Projects Agency (DARPA) sponsored research at several universities and research organizations to develop speech-understanding systems. The goal of the DARPA program (Freedman 1983, Greene 1982) was to develop a system that would ‘accept connected speech from many cooperative speakers of the General American Dialect, in a quiet room, using a good quality microphone, with slight tuning per speaker, requiring only natural adaptation by the user, permitting a slightly selected vocabulary of 1000 words, with a highly artificial syntax and highly constrained task, providing graceful interaction, tolerating less than 10% semantic error, in a few times real time on a 100-million-instructions-per-second machine, and be demonstrable in 1976 with a moderate chance of success’. The four primary systems developed in the DARPA program are: HEARSAY-II (Carnegie-Mellon), HWIM (Bolt Beranek and Newman), the SRI system (SRI International), and HARPY (Carnegie-Mellon). All of the systems are based on the idea of diverse, cooperating KSs (Knowledge Sources) to handle the uncertainty in the signal and processing. They differ in the types of knowledge, interactions of knowledge, representation of the search space, and control of the search. The following sections provide a brief overview of the four systems. More recently (1987), DARPA has been targeting more ambitious programs. Various application environments require systems capable of up to 10,000-word vocabularies with 98% recognition of words from independent speakers in real time. A recognition set of this size represents the vocabulary an average individual uses 80% of the time. Advanced speech systems must provide better recognition capability in noisy environments, with speaker independence, and without semantic and syntactic constraints.
A new-generation speech system would include demonstrated capabilities of enhanced recognition times, more robust recognition of speech, more accurate recognition of speech, and better continuous and discrete speech recognition. So far (1988), DARPA research has demonstrated a high-performance continuous speech recognition system with 99% word accuracy on a 350-word task, and has developed modules for high-performance phonetic recognition using declarative knowledge to support a system for speaker-independent continuous speech recognition.



The HEARSAY-II system, developed at Carnegie-Mellon University, was funded by the DARPA program. It recognizes connected speech in a 1000-word vocabulary with correct interpretations for 90% of test sentences. Its basic methodology uses




symbolic reasoning as an aid to signal processing. A marriage of general Artificial Intelligence techniques with specific acoustic and linguistic knowledge was needed to accomplish satisfactory speech-understanding performance. Because the various techniques and heuristics employed were embedded within a general problem solving framework, the HEARSAY-II system embodies several design characteristics that are adaptable to other domains as well. Its structure has been applied to such tasks as multisensor interpretation, protein crystallographic analysis, image understanding, a model of human reading, and dialogue comprehension. The HEARSAY-II problem solving framework reconstructs an intention from hypothetical interpretations formulated at various levels of abstraction. In addition, it allocates limited processing resources first to the most promising incremental actions. The final configuration of the HEARSAY-II system comprises problem solving components to generate and evaluate speech hypotheses, and a focus-of-control mechanism to identify potential actions of greater value. Many of these specific procedures reveal novel approaches to speech problems. Most important, the system successfully integrates and coordinates all of these independent activities to resolve uncertainty and control combinatorics. A schematic of the HEARSAY-II architecture is shown in Fig. 10.2.

The HWIM system (LaBrecque 1983, Levas & Selfridge 1983)

Fig. 10.3 shows the structure of Bolt Beranek and Newman’s HWIM (Hear What I Mean) system. In overall form, HWIM’s general processing structure is strikingly similar to that of HEARSAY-II. Processing of a sentence is bottom-up through audio signal digitization, parameter extraction, segmentation and labelling, and a scan for word hypotheses. Following this initial phase, the Control Strategy module takes charge, calling the Syntax and Lexical Retrieval knowledge sources as subroutines.

The SRI system (Medress et al. 1978, Newell et al. 1973)

The SRI system, though never fully operational on a large vocabulary task, presents another interesting variant on structuring a speech understanding system. It uses an explicit control strategy with, however, much more control being centralized in the Control Strategy module. The designers of the system felt there was ‘a large potential for mutual guidance that would not be realized if all knowledge source communication was indirect’. Part of this explicit control is embedded within the rules defining the possible constituent structure for phrases; this extended form of BNF contains procedures for calculating attributes of phrases and factors used in rating phrases. These procedures may, in turn, call as subroutines any of the knowledge sources in the system to compute attributes such as the representation of the meaning of the phrase, and discourse attributes for anaphora and ellipsis. Thus the phrase itself is the basic unit for integrating and controlling knowledge source execution.

The HARPY system (Lowerre 1976, Lowerre & Reddy 1980)

Fig. 10.2 — Schematic of the HEARSAY-II architecture (blackboard with levels 1 to n; program modules and databases; data flow and control flow).

The HARPY system was developed at Carnegie-Mellon University. Most of the knowledge of this system is precompiled into a unified structure representing all possible utterances; a relatively simple interpreter then compares the spoken utterance against this structure to find the utterance that matches best. The motivation for this approach is to speed up the search so that a larger portion of the space may be examined explicitly. In particular, the hope is to avoid errors made when portions of the search space are eliminated on the basis of characteristics of small partial solutions; to this end, pruning decisions are delayed until larger partial solutions are constructed.
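HARPY’s precompiled-network idea can be illustrated with a toy best-path search over a fixed utterance graph: the set of admissible word sequences is compiled in advance, and the interpreter simply finds the path whose states best match the acoustic evidence, keeping alternatives alive rather than pruning early. The graph, words, and scores below are all invented; the real HARPY used beam search over a far larger phone network.

```python
# Toy illustration of HARPY's idea: compile all admissible word
# sequences into one graph beforehand, then find the path whose
# states best match the acoustic evidence. Graph and costs invented.

GRAPH = {                      # state -> list of successor states
    "start": ["show", "list"],
    "show":  ["status"],
    "list":  ["status", "jobs"],
    "status": ["end"],
    "jobs":   ["end"],
}

def best_path(costs):
    """costs: dict state -> acoustic mismatch score for this utterance.
    Returns (total_cost, path) for the cheapest start->end path."""
    best = {"start": (0.0, ["start"])}
    frontier = ["start"]
    while frontier:
        state = frontier.pop(0)
        cost, path = best[state]
        for nxt in GRAPH.get(state, []):
            c = cost + costs.get(nxt, 0.0)
            if nxt not in best or c < best[nxt][0]:
                best[nxt] = (c, path + [nxt])
                frontier.append(nxt)
    return best["end"]

# Acoustic scores favouring the utterance "list jobs":
scores = {"show": 2.0, "list": 0.5, "status": 1.5, "jobs": 0.2, "end": 0.0}
total, path = best_path(scores)
print(path)   # ['start', 'list', 'jobs', 'end']
```

Because every candidate path through the network is scored against the whole utterance, no partial hypothesis is discarded on the strength of a small, possibly misleading, early fragment.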

10.6 APPLICATIONS
The most popular current applications of voice recognition include:
• Voice data entry
• NC programming
• Robotics




Fig. 10.3 — Structure of HWIM.

These applications are discussed in the following sections. While they are very cost effective for specific applications, they represent only the beginning of a vast range of possible applications. As technology develops in speaker-independent continuous speech recognition and natural language understanding, the possibilities are almost limitless. A central element of speech recognition is AI-based system architecture such as HEARSAY-II.



Data entry is the primary industrial application of voice recognition systems at the present time. Voice data input quality control systems are reported to create six-figure savings




at Continental Can Company. One operation at their St. Louis plant requires quality control inspectors to measure pull-ring can lids. With Threshold’s QC System, workers’ hands are always free to inspect the lids while speech communicates data directly to a central computer, avoiding the inefficiencies of manual record keeping and subsequent data entry operations. The QC System prompts inspectors through correct data entry sequences and immediately identifies out-of-spec lids. Throughput has increased 40%, and manufacturing costs have dropped because of faster detection of faulty production runs. At United Parcel Service distribution terminals, many thousands of incoming packages must be quickly and efficiently routed to their destinations every day. Before installing Threshold’s voice input systems, three men at each receiving platform were required to handle the packages — two to unload and one to route. Now, one worker handles both operations. As he unloads a package, his voice command encodes the proper destination into the sorting machine. The package is then placed on a conveyor for rapid and efficient sorting. Voice commands are transmitted through either a wired microphone headset or a compact, mobile radio transmitter. Either way, immediate verification of routing instructions is provided by a large, easily read display. A completely host-controllable speech recognition unit, the Threshold 500 Voice Terminal, allows the training process, operator prompting, storage of individual speaker reference data sets, and interpretation of work output codes to be performed by the host computer, allowing the low-cost 500’s recognition capability to be increased to an available 370 words. A 16-character alphanumeric readout displays spoken entries for verification and prompt messages for operator convenience. When operators require complete mobility, such as in inventory applications, wireless radio input is available with all Threshold Voice Terminals.
Voice boosts productivity by 25% because it is 17.5% faster and 64.6% more accurate than keying, according to recent tests conducted by the Navy at its Postgraduate School in Monterey, CA. Even though study participants were skilled typists and had been introduced to the Threshold System only three hours before testing, operators substantially increased productivity when they used their voices to enter data.



The programming of NC machines is another application of voice recognition systems. In common everyday shop language, the programmer simply speaks commands into the system in the same logical order as if he were preparing a written machining procedure. Each surface of the machined part is defined by a series of words and numbers which are verbal responses to questions that appear on a display terminal. All other functions required to process the tape are performed automatically. Tape production time is cut by 80 to 90%, and costs are half those of time-shared programming. Maul Brothers, the largest US maker of automatic bottle manufacturing machinery, has dramatically cut NC programming time by using a Voice Numerical Control System. The average bottle making machine has over 7000 parts — nearly all made




in-house at Maul, and nearly all requiring machining. ‘Now that they’re producing machine tool tapes by voice, per-piece programming time has fallen by 65 to more than 90%, depending on the part’s complexity.’ One part that took six hours to program manually took less than half an hour by voice. Maul has 18 NC tools, all of which had to be manually programmed before the switch to voice was made.
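The question-and-answer style of voice NC programming described above can be sketched as a simple prompt loop. The prompts, field names, and the scripted stand-in for the recognizer below are hypothetical, not taken from any actual NC system.

```python
# Sketch of voice NC programming as prompted data entry: the system
# asks for each parameter of a machined surface and the programmer
# answers in shop language. The recognizer is simulated by reading
# from a scripted list; prompts and fields are invented.

PROMPTS = ["surface type", "diameter", "length", "feed rate"]

def program_surface(recognized_words):
    """Pair each displayed prompt with the next recognized response."""
    answers = iter(recognized_words)
    surface = {}
    for prompt in PROMPTS:
        # A real system would show the prompt on the display terminal,
        # listen, and echo the recognized word for confirmation.
        surface[prompt] = next(answers)
    return surface

spoken = ["turn", "2.5", "10.0", "0.015"]
print(program_surface(spoken))
```

The remaining tape-processing steps, as the text notes, would then run automatically from such a surface description.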



Speech recognition is now used in robotic inspection of semiconductor wafers. Sometimes the orientation of the wafer does not lend itself to easy recognition, thus requiring assistance from the operator. Looking through a microscope, hands and eyes busy, the operator uses voice commands to orient the wafer correctly and perhaps adjust its location in the inspection station. He then turns over the remaining steps in the program to a robot.


Speech recognition technology has matured in the past few years, yet the performance of speech recognition devices varies greatly, depending on the talker, the vocabulary, the environment, and numerous other more subtle factors. Recognition systems extending to speaker independence, connected speech, and complex vocabularies still have significant error rates. This is particularly striking when we compare device performance with human achievement. Human recognition error rates for digits, for example, are still orders of magnitude better than those of the best available speech recognition devices. Successful development of this technology will require a great deal of sustained research in many areas. Over the last three decades, our understanding of the acoustic properties of speech sounds has advanced significantly, but we must arrive at a better quantitative understanding of the contextual influences on the properties of speech sounds. This will require the establishment of a large speech database, with the associated computational resources. Furthermore, we must study the capabilities of the human auditory system to learn how the signal is analyzed and how a specific phonetic contrast is encoded, and apply this knowledge to machine recognition of speech. In addition to research in acoustic analysis and auditory perception, work in a number of other areas will play a major role in advancing speech recognition technology. These areas include natural language understanding, parallel computing, knowledge representation, and human factors engineering.


REFERENCES

Bradbeer, R., DeBono, P., & Laurie, P. (1982) The beginner’s guide to computers, Addison-Wesley: Reading, Mass.
Catalano, F. (1983) ‘Data collection devices play key role in automated factories’, Mini-Micro Systems, June, pp. 113-120.




Elphick, M. (1982) ‘Unraveling the mysteries of speech recognition’, High Technology, March/April, pp. 71-78.
Erman, L. D., Hayes-Roth, F., Lesser, V. R., & Reddy, D. R. (1980) ‘The Hearsay-II speech understanding system: integrating knowledge to resolve uncertainty’, Computing Surveys, 12, (2), pp. 213-253.
Freedman, D. H. & Friedman, Roy (1983) ‘Bar-code and voice recognition ease data-entry problems’, Mini-Micro Systems, June, pp. 239-246.
Greene, A. M. (1982) ‘Speech technology: the voice of tomorrow’s factories?’, Iron Age, 21 May, pp. 92-97.
Information provided by Threshold Technology, Inc., Delran, N.J.
LaBrecque, M. (1983) ‘The tantalizing quest for speech recognition computers’, Popular Science, July, pp. 61-63.
Levas, A., & Selfridge, M. (1983) ‘Voice communications with robots’, Proceedings of the 13th International Symposium on Industrial Robots and Robots 7, SME, 12-79 to 12-83.
Lowerre, B. T. (1976) ‘The HARPY speech recognition system’, Ph.D. thesis, Computer Science Dept., Carnegie-Mellon Univ., Pittsburgh, PA.
Lowerre, B. T. & Reddy, R. (1980) ‘The HARPY speech understanding system’, In: Trends in speech recognition, W. A. Lea (ed.), Prentice-Hall: Englewood Cliffs, N.J., Chapter 15.
Medress, M. F., Cooper, F. S., Forgie, J. W., Green, C. C., Klatt, D. H., O’Malley, M. H., Neuburg, E. P., Newell, A., Reddy, D. R., Ritea, R., Shoup-Hummel, J. E., Walker, D. E. & Woods, W. A. (1978) ‘Speech understanding systems: Report of a steering committee’, Artificial Intelligence, 9, pp. 307-316.
Newell, A., Barnett, J., Forgie, J., Green, C., Klatt, D., Licklider, J. C. R., Munson, J., Reddy, R. & Woods, W. (1973) Speech understanding systems: Final report of the study group, North-Holland: Amsterdam.
Schadewald, R. (1983) ‘The speech gap’, Technology Illustrated, June, pp. 55-59.
Stauffer, R. N. (1982) ‘Voice programming: robots next?’, Robotics Today, February, pp. 30-31.
Walker, D. E. (1980) ‘SRI research on speech understanding’, In: Trends in speech recognition, Lea, W. A. (ed.), Prentice-Hall: Englewood Cliffs, N.J., Chapter 13.
Walker, D. E. (ed.) (1978) Understanding spoken language, Elsevier North-Holland: New York.
Warren, J. R. (ed.) (1982) Industrial Robots International, 3(16).
Weiner, J. M. (1977) Robotics control using isolated word recognition of voice input, Jet Propulsion Laboratory, JPL Publication 77-73.
Wolf, J. J. & Woods, W. A. (1980) ‘The HWIM speech understanding system’, In: Trends in speech recognition, Lea, W. A. (ed.), Prentice-Hall: Englewood Cliffs, N.J., Chapter 14.
Woods, W., Bates, M., Brown, G., Bruce, B., Cook, C., Klovstad, J., Makhoul, J., Nash-Webber, B., Schwartz, R., Wolf, J. & Zue, V. (1976) Speech understanding systems: final technical progress report, Tech. Rep. 3438, Bolt Beranek and Newman, Cambridge, Mass. (in five volumes).

11 AI and robotics

Artificial Intelligence is the area which most needs to be developed and mastered to accelerate robot evolution. Birk & Kelley (1981) state that an intelligent robot is one capable of:
• receiving communication
• understanding its environment by the use of models
• formulating plans
• executing plans
• monitoring its operation.

Some researchers do not distinguish between Artificial Intelligence and Robotics; instead, a unified model that encompasses both is used. An intelligent robot should be able to think, sense, and effect. Thinking is primarily a brain function. Sensing (seeing and touching) and effecting (moving and manipulating) are primarily body functions. The thinking function executed by a computer is the domain of Artificial Intelligence. Sensing and effecting are based on physics, mechanical engineering, electrical engineering, and computer science. Planning and execution of tasks entail both brain and body functions and are the concern of both Artificial Intelligence and robotics.



The field of industrial robots began in 1951 with a patent by George C. Devol for a ‘programmed article transfer’. The first industrial robot, a Unimate from Unimation, Inc., was installed in 1961 at a General Motors plant. The task was unloading a casting machine. All initial installations were fostered by Joseph Engelberger, founder of Unimation, Inc., who is considered the ‘father of industrial robotics’. Unimation’s robots were based upon the concepts of George C. Devol, who is considered the ‘inventor of the industrial robot’.

Shakey

In 1968 a robot called Shakey was built at Stanford Research Institute (SRI) International. This system allowed a computer-controlled mobile robot, equipped with a TV camera, to navigate from room to room through doorways and around obstacles. It used an automatic plan generating system called STRIPS, and had rudimentary abilities to store plans for later use as components of more complex plans.
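STRIPS-style planning represents each action by preconditions, an add list, and a delete list over a set of facts about the world. The minimal sketch below uses an invented room-and-door domain to show how applying a stored plan transforms the robot’s world model; the operator names and facts are illustrative only.

```python
# Minimal sketch of STRIPS-style operators as used by Shakey:
# each operator has preconditions, an add list, and a delete list
# over a set of facts. The room/door domain here is invented.

OPERATORS = {
    "open_door": {
        "pre": {"at(room1)"},
        "add": {"door_open"},
        "delete": set(),
    },
    "go_through_door": {
        "pre": {"at(room1)", "door_open"},
        "add": {"at(room2)"},
        "delete": {"at(room1)"},
    },
}

def apply_op(state, name):
    """Apply operator `name` to `state` (a set of facts), or return
    None if its preconditions are not satisfied."""
    op = OPERATORS[name]
    if not op["pre"] <= state:
        return None
    return (state - op["delete"]) | op["add"]

state = {"at(room1)"}
for step in ["open_door", "go_through_door"]:   # a stored plan
    state = apply_op(state, step)
print(sorted(state))   # ['at(room2)', 'door_open']
```

A stored plan such as `["open_door", "go_through_door"]` can itself serve as a single building block inside a larger plan, which is the plan-reuse ability the text attributes to Shakey.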




Shakey showed that the interactions between an intelligent organism and its environment are very complicated and are beset by all kinds of practical problems, such as low batteries and the accumulation of errors as one tries to move from one position to another position calculated from available data. The development of Shakey (actually two versions) was dropped, possibly because no military application could be found for the technology (funding was coming from the US Defense Department).

Mobile robots in space exploration

Roving robots for the NASA space program were developed in the early 1970s by the Jet Propulsion Laboratory (JPL) and Martin Marietta Corporation. The Viking Lander remote space vehicle is considered to be the most complex robot ever constructed, and has proved remarkably dependable. The Lander houses numerous experiments, all conducted under automatic control. Viking II landed on Mars on 3 September 1976. The Viking Lander was developed by Martin Marietta Corporation. The JPL took up robotics and vision in the attempt to determine whether life existed on Mars. As part of the Mars landing program, JPL developed and demonstrated the use of computer vision in tracking a target and controlling a robot manipulator arm, causing it to grapple the target, such as a sample of rock or soil. Capable of operating under local autonomous computer control and remote human supervision, this laboratory rover was equipped with stereo imaging, a ranging laser scanner, and an experimental manipulator arm.

Mobile robots

In the 1970s, research in mobile robotics was largely limited to wheel-driven robots, such as the SRI vehicle.

Robot control system

Current non-AI robot controllers may utilize a PC or minicomputer. Programmable controllers (PCs), introduced into industry over ten years ago, provide a microprocessor-based robotic controller which is easy to reprogram. The controller primarily serves to direct the sequence of robot motions, stop points, gripper actions, and velocity. When control beyond a PC is required, a minicomputer may control the entire system, including other programmable machinery in a robot work cell. While PCs are limited in their programming, minicomputers may utilize a special robot programming language or a standard language (such as BASIC or PASCAL) for more advanced off-line programming or CAD/CAM interfacing. Minicomputer-type robot controllers became commercially available in about 1980. These controllers now allow integration with vision or tactile sensors. The next generation of robot controllers may have Artificial Intelligence capabilities. Some rudimentary efforts have already been made in this direction, with some AI-type algorithms (such as those for bin picking and routines for gripper positioning) now available.
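The sequencing role of a PC-style controller described above can be sketched as a simple step-list executor. The step types, coordinates, and log format below are invented for illustration; a real controller works against actuator hardware and interlocks.

```python
# Sketch of a PC-style robot controller: it steps through a fixed
# program of velocity changes, moves, stop points, and gripper
# actions. Step names and the log format are invented.

def run_program(steps):
    """Execute a motion program; returns a log of controller actions."""
    log, velocity = [], 0.0
    for step in steps:
        kind = step[0]
        if kind == "velocity":
            velocity = step[1]
            log.append(f"set velocity {velocity}")
        elif kind == "move":
            log.append(f"move to {step[1]} at {velocity}")
        elif kind == "gripper":
            log.append(f"gripper {step[1]}")
        elif kind == "stop":
            log.append("stop point")   # wait for interlock or operator
    return log

program = [
    ("velocity", 0.2),
    ("move", (10, 5, 3)),
    ("gripper", "close"),
    ("stop",),
    ("move", (0, 0, 3)),
    ("gripper", "open"),
]
for line in run_program(program):
    print(line)
```

Everything here is a fixed sequence; the ‘next generation’ the text anticipates would instead choose and re-order such steps from sensor input.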






Applications needing greater precision, complex control, and sensory capabilities began to be considered after the development of the PUMA (Programmable Universal Machine for Assembly) by Unimation, Inc. The first PUMA robot was shipped to General Motors in 1978. In the late 1970s and early 1980s, several significant commercial robotics products with vision and intelligence capabilities were introduced. Some of the most important developments were:
• In 1978, Machine Intelligence Corporation, founded by researchers from SRI International, introduced the first machine vision system, the Model VS-100. The system was used for inspection and with Unimate PUMA robots.
• In 1978, the CONSIGHT vision system was developed by General Motors. The system was used with a Cincinnati Milacron T3 robot for sorting castings.
• In 1981, CRC Welding Systems, Inc. introduced an intelligent seam tracking system for welding robots. The ‘THRUARC2’ was originally developed by George E. Cook at Vanderbilt University.
• In 1982, Object Recognition Systems, Inc. introduced the ‘i-bot-1’ vision system at Robots 6, with the capability for bin picking (acquiring parts randomly placed in a bin). It represented the first commercial robotic system which performed an Artificial Intelligence type of task.
• In 1982, IBM Corporation, Control Automation, Intelledex, and others introduced robot models with extreme precision and vision/tactile capabilities for use in the electronics industry for tasks such as insertion of PCB components.
• In 1983, at Robots 7, Automatix Inc. introduced the Statistical Process Control software package, used in conjunction with their Autovision IV system. This was the first robotic system with capabilities for decision-making control of an industrial process.
• In 1983, Lord Corporation and Barry Wright Corporation introduced advanced tactile sensor pads for robot grippers at Robots 7.
Sometime, perhaps around the turn of the century, robot technology will develop to the degree necessary to produce the totally automated factory. In such factories robots will perform most, if not all, of the operations that now require human skills. There will be totally automatic inventory and tool management, and automatic machining, assembly, finishing, and inspection systems. Automatic factories will even be able to reproduce themselves. That is, automatic factories will make the components for other automatic factories. Once this occurs, productivity improvements will propagate from generation to generation. Each generation of machines will produce machines less expensive and more sophisticated than themselves. This will bring an exponential decline in the cost of robots and automatic factories which may equal the cost/performance record of the computer industry. Eventually, products produced in automatic factories may cost only slightly more than the raw materials and energy from which they are made.

To date, AI researchers have been working on ways to add intelligence to these machines in the form of vision, tactile sensing, planning, and learning. Even now, commercially available robots are capable of simple vision. Robovision from Automatix of Billerica, Mass., for instance, is a vision-guided arc welding system. Simple tactile sensors are also available.

The refinement of advanced sensors and the incorporation of other AI technologies will eventually render robots capable of performing more sophisticated tasks. If robots had the ability to respond to natural language commands, they would be easier to control. Increased ability to process and respond to sensor and vision input would allow robots to manipulate objects more effectively and move more successfully through the environment. It is hoped that AI techniques will eventually enable robots to ‘learn’, to solve problems based on changing needs in their task environments, and to plan how to accomplish tasks. These skills would make them useful in environments less structured than those in which they are used today. In summary, what we see emerging are robots with increasing intelligence, sensory capability, and dexterity. Initially, we will see an increasing use of off-line programming of computer controlled robots, using improved robot command languages. Provision will be made to include the role of sensors, such as vision and touch, in this programming. Later, self-planning will emerge as higher and more general commands are given to the robot. At this point, the marriage of robotics and Artificial Intelligence will be virtually complete. At the same time as all this is taking place, robotic hands will emerge. Also emerging will be robots with coordinated multiple arms and eventually even legs, supported by even more sophisticated control systems. As this evolution progresses, information and intelligence will become the dominant factor in robotics, with the manipulator devices and sensors shrinking in importance to the skeleton that undergirds this dominating ‘ghost in the machine’.

REFERENCES

Birk, J. R. & Kelley, R. B. (1981) ‘An overview of the basic research needed to advance the state of knowledge in robotics’, IEEE Transactions on Systems, Man, and Cybernetics, SMC-11, No. 8, pp. 575-579.
Brown, D. R. et al. (1982) R&D plan for army applications of AI/Robotics, Defense Logistics Agency.
Cunningham, R., Gennery, D. & Kan, E. (1983) ‘From outer space to factory floor’, Computers in Mechanical Engineering, April, p. 9.
Gevarter, William B. (1982) An overview of artificial intelligence and robotics, Volume II — Robotics, National Bureau of Standards, NBSIR 82-2479.
Joslin, Charles (ed.) (1983) ‘Better brains, sensors needed to get mobile robots moving’, Industrial Robots International, 8, August, p. 4.
Klein, C. A. & Patterson, M. R. (1983) ‘Computer coordination of limb motion for locomotion of a multiple-armed robot for space assembly’, IEEE Transactions on Systems, Man, and Cybernetics, SMC-12, No. 6.
Lewis, R. A. & Johnston, A. R. (1977) A scanning laser range finder for a robotic vehicle, Jet Propulsion Laboratory Technical Memorandum No. 33-809.
Plantier, M. & Bodmer, R. et al. (1981) Teleoperation and automation: a survey of European expertise applicable to docking and assembly in space, ESTEC Contract No. 4402/80/NL/AK (SC), Geneva: EUROSTAT, S.A., May.




Raibert, M. H. & Sutherland, I. E. (1983) ‘Machines that walk’, Scientific American, 248, No. 1, January, pp. 44-53.
Staugaard, A. C. Jr. (1987) Robotics and AI, Prentice-Hall: Englewood Cliffs, NJ.

12 Automatic programming

Programs for computers have grown so complex that they may be close to the limit of human capability. The programs used in AI work are themselves among the most complex ever written. As indicated elsewhere, one program for the synthesis of organic chemicals represents about 60 man-years of work and is still being refined. Much of the research on AI is still in the form of theory because the effort to turn it into usable programs is beyond the available resources. In science and engineering projects that use computers, the programming effort alone today often exceeds 70% of the total engineering costs. This situation has developed into what has come to be called ‘the software crisis’. Computer programs designed to deal with real problems are often extremely complex and sometimes incomprehensible† to anyone except the person who wrote the program. Even an expert programmer will have difficulty understanding or trying to fix a complex existing program, even one he may have written himself. Programming languages, even the most recent high-level types, are not very much like ordinary English. Further, a program represents a complex procedure that typically has loops, loops within loops, subroutines, and subroutines that can call other subroutines, and even call themselves on occasion. One result of such complexity is that most programs contain bugs — that is, they do not do precisely what the programmer designed them to do. They may work well most of the time, but something the programmer never thought of can completely derail them. Most bugs are discovered and corrected before a program is released for use by non-programmers, but most complex programs probably contain strange obscure bugs that may never come to light. Thus program development entails not only writing the program but verifying that it actually does what it is supposed to do, and does not do things it should not do.
Program verification is not a trivial task, and is probably impossible in certain cases. Finally, the program must be compatible with the computer it is going to run on. It must not require more memory than is actually available, it must not use instructions the computer cannot execute, etc. The first computers were programmed in machine language — 1s and 0s. This gave way to higher-level languages such as FORTRAN, COBOL, BASIC, PASCAL, and others, including LISP, the dominant language in AI research. None of these languages is very much like English. Recent work on easing the programming task includes the development of special programs that can either write programs

† Unless designed and documented according to rigorous engineering standards.




themselves or can help programmers do the job better and faster. Program synthesis by computer, or automatic programming, has made impressive strides in recent years, but apparently has much farther to go before it will become widely used. Automatic programming systems must have a way to state the problem that the end program is to solve. This is known as a specification method. Preferably the specification method would be in a natural language such as English. This method is in use but is limited by the computer’s ability to comprehend the message. A second method of specification requires the use of a very high level programming language. This method is known as formal specification, and in general it must be a complete statement of the problem since there is little or no interaction between programmer and programming system. A third method of specification is by the use of examples. In this case the programming system must deduce the underlying framework of the solution from the examples. Automatic programming systems are usually designed to handle only certain types of problems. Some systems with limited problem domains include NLPQ (Natural Language Programming for Queuing Simulations), whose problem solving area is simple queuing problems. The PROTOSYSTEM program handles ‘input/output intensive data processing systems, including inventory control, payroll, and other record keeping systems’ (Bidoit et al. 1979). The problem domain of the PSI system is ‘symbolic computation’, including list processing, searching and sorting, data storage and retrieval, and concept formation (Elschlager et al. 1981). The AI techniques used in automatic programming are extremely varied, and include ‘theorem proving, program formation, knowledge engineering, automatic data selection, traditional problem solving, and induction’. All these techniques have been subjects of intense activity and many papers in AI research.
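Specification by examples can be illustrated with a toy synthesizer that searches a small, hand-picked space of candidate programs for one consistent with every given input/output pair. The candidate set below is invented; systems such as PSI searched far richer program spaces with far more sophisticated induction.

```python
# Toy illustration of specification by examples: given input/output
# pairs, search a small space of candidate programs for one that is
# consistent with every example. The candidate set is invented.

CANDIDATES = {
    "reverse": lambda xs: list(reversed(xs)),
    "sort": sorted,
    "double_each": lambda xs: [2 * x for x in xs],
}

def synthesize(examples):
    """examples: list of (input, expected_output) pairs.
    Returns the name of a candidate consistent with all of them,
    or None if the examples rule out every candidate."""
    for name, prog in CANDIDATES.items():
        if all(prog(inp) == out for inp, out in examples):
            return name
    return None

print(synthesize([([3, 1, 2], [1, 2, 3]), ([2, 1], [1, 2])]))  # sort
print(synthesize([([1, 2], [2, 1]), ([5], [5])]))              # reverse
```

Note how the second example set needs two pairs: a single pair such as `([5], [5])` would be consistent with both `reverse` and `sort`, which is exactly the underdetermination problem example-based specification must cope with.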
The idea of program debugging has been found particularly useful in automatic programming. Nilsson (1980) writes:

One of the important contributions of research in automatic programming has been the notion of debugging as a problem solving strategy. It has been found that it is often more efficient to produce an inexpensive, errorful solution to a programming or robot control problem and then modify it (to make it work correctly) than to insist on a first solution completely free of errors.

Automatic programming may also make it possible for non-programmers to use computers to solve problems. Automatic code generation is one aspect of the task. Another is the development of intelligent programming aids, such as an automated help facility that will respond to a user's natural language query with an answer or by carrying out the user's request. An advantage of automatic program synthesis is that the programmer can express the purpose of the program in specifications that are close to the way he or she thinks, without having to make them computable. Another advantage is that maintenance need not be done on the source program but at a higher level, on the specification. This simplifies the user's maintenance task. The user must first make a few choices about how the program should be implemented, but then can leave the coding up to the automatic programming system. In addition to freedom from clerical errors, increased opportunity for optimization, and better documentation, automatic programming makes possible a library of re-usable software, and a library of specifications.

12.1 The Programmer's Apprentice


The Programmer's Apprentice (PA) was developed at MIT by Charles Rich, Howard Shrobe, and Richard Waters. The program is designed to help a programmer write a program, not write it for him, since the PA cannot write complete programs on its own. In this respect it is somewhat similar to the expert consultation systems used in AI, medicine, and other areas. The idea behind the PA is that the human programmer will do the hard parts of the program design and implementation, while the program '...will act as a junior partner and critic, keeping track of details and assisting the programmer in the documentation, verification, debugging, and modification of his program'. In order to cooperate with the programmer in this fashion, the PA must be able to 'understand' what is going on. From the point of view of AI, the central development of the Programmer's Apprentice project has been the design of a representation (called a 'plan') for programs and for knowledge about programming that serves as the basis for 'understanding'. Developing and reasoning about plans is the central activity of the PA (Nilsson 1980).

What is a plan? As used by the PA, a plan represents a network of operations linked by data flow. In essence, the plan expresses the logical interrelationships in a program. The plan is structured as a hierarchy of program segments within segments. The structure of each segment is specified by the plan. However, some segments can be represented by more than one plan; in this case each plan provides a different point of view of the segment. The program also has knowledge about programming: the PA contains many common algorithms and many ways to structure data. In a typical scenario worked out for the PA, the human programmer wants to modify an existing program by deleting an entry from a hash table.
First, the programmer tells the PA (in a high level but not natural language) that the plan for a routine called DELETE has three steps, and he then specifies what each step does. The PA thereupon constructs a plan that represents the program's structure. However, the PA finds it cannot verify that the program will do what the program­ mer wants. It therefore reports back to the programmer the nature of the difficulty (the bug). The programmer therefore must modify the plan so that the PA can verify its correctness. Once this is done, the programmer usually will ask the PA to write the code for the program so far developed. On examining this code, the PA may discover an implementation bug. If so, this is again reported back to the programmer. In certain cases the PA will even suggest what or where the bug actually is. Program development therefore proceeds in the above interactive manner. The PA may understand some parts of the program being developed well enough to proceed on its own, using its own stored information on algorithms and data handling. Any parts of the program it does not understand, it turns back to the programmer.

Sec. 12.2]



Rich & Waters (1982) believe that the PA will allow a person to build a program from clichés, fragments corresponding to common algorithms, represented in the system by 'plans'. The current demonstration system is a knowledge-based editor that uses a small number of plans to build up a program or to modify one in terms of its logical structure. In using the system, the programmer's job is to maintain an overall view of what must be done to accomplish the goals of the program. The editor's job is to keep track of the details of the implementation of the existing program. An advantage of using this system is that the opportunity to make many simple errors is eliminated for both the system and the programmer, because the programmer works not on code but on the plans, and only the coding module of the system must work on the level of code. The programmer can also construct programs faster by drawing from the stock of plans in the system's library.



Array manipulation is a recurring problem in many computer programs. In a system devised at the University of Paris, M. Bidoit, C. Gresse, and G. Guiho use a technique that allows the specification (what the program is to do) to be stated in a high level language. The high level language program is non-compilable; that is, it cannot be manipulated by a compiler program into code that runs on a computer. The automatic programming system transforms the high level specification into an equivalent language that can be compiled. In accomplishing this transformation, 'the system uses heuristics and rewriting rules to eliminate non-compilable terms which occur in the specification language'. Then matching methods and generalizations are used to create the compilable program. The system was implemented on a MITRA 125, a small computer, and generated a large number of array manipulation programs. As examples of what the system can do, the following array manipulation programs were synthesized:

• finding the maximum element of an array
• testing to see if an array is sorted
• testing if a number is less than every element of an array
• testing if an array contains duplicates
• testing if every element of an array is less than every element of another
• inserting an element in an array
• reversing an array
• finding an element equal to its subscript
• finding the kth (in increasing order) element
• computing the number of elements greater than a particular one
• testing if two arrays are identical
• computing the intersection of two arrays
• finding a certain subscript between certain limits with certain properties
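To make the task concrete, here are hand-written Python equivalents of a few of the routines on the list; these are not the code the Bidoit system generated, just what such programs compute.

```python
# Hand-written equivalents of three of the synthesized routines.
def is_sorted(a):
    """Testing to see if an array is sorted (non-decreasing)."""
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))

def element_equal_to_subscript(a):
    """Finding an element equal to its subscript; None if there is none."""
    for i, x in enumerate(a):
        if x == i:
            return i
    return None

def count_greater(a, v):
    """Computing the number of elements greater than a particular one."""
    return sum(1 for x in a if x > v)

print(is_sorted([1, 2, 2, 5]))                # True
print(element_equal_to_subscript([5, 1, 0]))  # 1
print(count_greater([4, 7, 2, 9], 4))         # 2
```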

All sample programs shown above were generated in less than one minute of computer time. The developers cite three major reasons for the system's efficiency:




(1) The system has a thorough knowledge of the domain, and so backtracking is greatly reduced. (2) It uses a very powerful first order matching algorithm (section 9.1). (3) The simplicity of the theorem prover makes it very efficient.


REFERENCES

Bidoit, M., Gresse, C. & Guiho, G. (1979) 'A system which synthesizes array manipulating programs from specifications', IJCAI-79, 1.
Elschlager, R. S. & Phillips, J. (1981) 'Automatic programming', In: Handbook of Artificial Intelligence (vol. 1), Barr, A. & Feigenbaum, E. A. (eds), William Kaufmann, Inc.
Nilsson, N. J. (1980) Principles of Artificial Intelligence, William Kaufmann, Inc.
Waters, R. C. (1982) 'The programmer's apprentice: knowledge based program editing', IEEE Transactions on Software Engineering, SE-8.

13 Intelligent Decision Support Systems

An intelligent decision support system (IDSS) is a computer-based interactive tool for decision making in well-structured decision and planning situations that jointly uses decision-theoretic methods and expert system techniques, and provides access to structured data bases. From the particular view of expert systems, an IDSS is a model-based expert system whose underlying decision-theoretic model imposes a normative (prescriptive) structure on decision making. In the past few years there has been substantial attention devoted to the use of artificial intelligence (AI) techniques, most commonly rule-based expert systems, as tools for decision support. These systems typically use production rules to develop a diagnosis of a disease or a system malfunction, for example. Given the diagnosis, the system generates a recommended solution to the problem. The solution may be a drug therapy in a medical domain or a set of parts to replace in a troubleshooting application. Rule-based techniques have proven to be very attractive for a variety of problems, particularly those which have fairly well structured (though possibly large) problem spaces, which can be solved through the use of heuristic methods or rules of thumb, and are currently solved by human experts. In these domains the reasoning and explanation capabilities offered by rule-based expert systems are very effective. A rule-based approach tends to break down when applied to more difficult problems or problems that require a normative, prescriptive structure for decision and inference purposes, in particular in the following situations: (i) there is substantial uncertainty on various levels of decision making; (ii) the preferred solution is sensitive to the specific preferences and desires of one or several decision makers; (iii) problems of rationality and behavioural coherence are intrinsic concerns of decision systems.
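The production-rule style of diagnosis described above can be sketched in a few lines. The rules, symptoms, and conclusions here are invented placeholders, not taken from any real expert system.

```python
# Minimal forward-chaining sketch of rule-based diagnosis:
# fire every rule whose conditions are contained in the observed findings.
rules = [
    ({"fever", "cough"}, "flu"),
    ({"fever", "rash"}, "measles"),
    ({"no_power", "fuse_blown"}, "replace_fuse"),
]

def diagnose(findings):
    """Return the conclusions of all rules whose conditions hold."""
    return [concl for conds, concl in rules if conds <= findings]

print(diagnose({"fever", "cough"}))  # ['flu']
```

Note what the sketch lacks: there is no treatment of uncertainty, of the decision maker's preferences, or of trade-offs between outcomes, which is exactly where situations (i)-(iii) defeat a purely rule-based approach.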
In established fields such as operations research and management science we have been developing methods for allocating resources under various conditions of time, uncertainty and rationality constraints. Central to these methods is the existence of an objective or utility function as an indicator of the desirability of various outcomes. We will draw on this body of knowledge, especially elements related to the normative use of individual and group decision theory, to approach difficult decision problems. Decision making is best viewed as a process of making a series of related, incremental observations, judgements, and decisions. In some cases simple, deterministic relations such as those used in many rule-based expert systems can be used for informed decision making, while in others explicit treatment of uncertainty or vagueness in the domain and the objectives of the decision maker is needed. Therefore, for a single complex decision situation it may be desirable to combine deterministic reasoning with uncertainty and decision-theoretic calculi in the course of exploring the decision situation, as our understanding of and insight into a problem evolve.



We start out from recent efforts to design computer systems for decision support based on decision theory (Holtzman 1985, Shachter 1986). The basic result of the axioms of decision theory is the existence of a value function for scoring alternative sets of outcomes under certainty and a utility function for scoring uncertain outcome bundles. If the decision maker accepts the axioms (say, Savage's axioms; Savage 1954, Gottinger 1980) in the sense that he would like his decision making to be consistent with these axioms, then the decision maker should choose the course of action which maximizes expected utility. The importance of these axioms is that encoding decision procedures based on them provides a basis for recommendations by an intelligent decision aid under uncertainty. They provide an explicit set of norms by which the system will behave. Other authors have argued why an individual should accept the decision axioms for decision making. The acceptance of these axioms is implicit in the philosophy and design of the decision methods described here.

In addition, an approach to decision making based on decision theory has a mechanism, at least in principle, for handling completely new decision situations. The theory ensures the existence of a value and utility function. If the current expression of the preferences in the system does not incorporate the attributes of a new decision situation, the system can resort to the construction of a higher level or more general preference structure. By following these principles we are able to use the richness of modern decision theory and its axiomatic foundation (Fishburn 1988). Domains in which there is a well developed empirical and theoretical basis for the development of utility functions (e.g. financial and engineering decision making and some areas in medicine) are most promising. Thus the decision axioms, along with the fundamentals of first order logic, provide a normative basis for reasoning about decisions.
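The expected-utility rule itself is a one-line computation: choose the action a maximizing the sum over states s of P(s) · U(a, s). The states, probabilities, and utilities below are invented purely to show the arithmetic.

```python
# Expected-utility maximization with invented numbers.
P = {"up": 0.6, "down": 0.4}                     # probabilities of states
U = {("invest", "up"): 100, ("invest", "down"): -50,
     ("hold", "up"): 10, ("hold", "down"): 10}   # utilities of outcomes

def expected_utility(action):
    return sum(P[s] * U[(action, s)] for s in P)

best = max(["invest", "hold"], key=expected_utility)
print(best, expected_utility(best))  # invest: 0.6*100 + 0.4*(-50) = 40
```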
It is in this light that both logical and probabilistic inference will be utilized in an intelligent decision system.



For decision making, a model consists of the following elements: (1) alternatives, (2) state descriptions, (3) relationships, and (4) preferences.

There can be no decision without alternatives, the set of distinct resource allocations from which the decision maker can choose. Each alternative must be clearly defined. State descriptions are essentially collections of concepts with which the decision is framed. They include the decision alternatives and the outcomes which are related to the choices. The state description forms the means of characterizing the choice and outcome involved in a decision. The state description is also intertwined with the expression of relationships. Relationships are simply the mappings of belief in some elements of the state description to others. The relations could be represented as logical relations, if-then rules, mathematical equations, or conditional probability distributions. The final component of a decision model is preferences. These are the decision maker's rankings in terms of desirability for various possible outcomes. They include not only his rankings of the various outcomes which may occur in a decision situation, but also his attitude toward risky outcomes and preferences for outcomes which may occur at various times. They also embody information identifying those factors in a decision situation that are of concern, whether a factor indicates a desirable or undesirable outcome, and how to make tradeoffs among alternative collections of outcomes.
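The four elements can be pictured as a plain data structure. This schema and the example entries are illustrative only, not the book's formal notation.

```python
# The four decision-model elements as an illustrative record.
from dataclasses import dataclass

@dataclass
class DecisionModel:
    alternatives: list    # distinct choices open to the decision maker
    states: list          # concepts with which the decision is framed
    relationships: dict   # mappings of belief among state elements
    preferences: dict     # desirability ordering over outcomes

model = DecisionModel(
    alternatives=["buy", "sell", "hold"],
    states=["price", "position", "profit"],
    relationships={"profit": "function of price and position"},
    preferences={"objective": "maximize expected profit"},
)
print(model.alternatives)  # ['buy', 'sell', 'hold']
```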



As a computationally convenient way for a decision-model-based representation we deal with influence diagrams. Influence diagrams are network depictions of decision situations (Howard & Matheson 1981). Until recently, their primary use has been in the professional practice of decision analysis as a means of eliciting and communicating the structure of decision problems. Each node in the diagram represents a variable or decision alternative; links between nodes connote some type of 'influence'. Decision makers and experts in a given domain can view a graphical display of the diagram, and readily apprehend the overall structure and nature of dependencies depicted in the graph. Recently, there has been additional attention devoted to influence diagrams based on their use in providing a complete mathematical description of a decision problem and as representations for computation. In addition to representing the general structure of a decision model, information characterizing the nature and content of particular links is attached to the diagram (Howard & Matheson 1981). The diagram then presents a precise and complete specification of a decision maker's preferences, probability assessments, decision alternatives, and states of information. In addition, the diagrammatic representations can be directly manipulated to generate decision-theoretic recommendations and to perform probabilistic inference. The formalism of Bayes networks (Pearl 1988) involves identical graphical constructs which express probabilistic dependencies (but no preferences or decisions). Following the notation of Shachter (1986) we define the syntax and semantics of influence diagrams.

Definition — An influence diagram is an acyclic directed graph G = (N,A) consisting of a set, N, of nodes and a set, A, of arcs. The set of nodes, N, is partitioned into subsets V, C, and D. There is one value node in V, representing the objective of the decision maker. Nodes in C, the chance nodes, represent uncertain outcomes. Nodes in D, the decision nodes, represent the choices or alternatives facing the decision maker. A simple diagram appears in Fig. 13.1. By convention, the value node is drawn as a diamond, chance nodes are drawn as circles, and decision nodes are drawn as rectangles.

Fig. 13.1 — A Simple Influence Diagram. V is the value node, the proposition which embodies the objective to be maximized in solving the decision problem. C1 and C2 represent uncertainties, and D represents the decision.

The semantics of arcs in the graph depend on the type of the destination node. Arcs into value or chance nodes denote probabilistic dependence. These arcs will be referred to as probabilistic links. Arcs terminating in decisions indicate the state of information at the time a decision is made. Thus, C1 is an uncertainty which is probabilistically influenced (conditioned) by C2 and the decision. The ultimate outcome, V, depends on the decision D and C2.

Definition — Each node's label is a restricted proposition, a proposition of the form

(p t1, t2, …, tn)

where each ti is either an object constant or an alternative set. We now define a set Ω(i) and a mapping π_i for each node.

Definition — The set Ω(i) is the outcome set for the proposition represented by node i. It is a set of mutually exclusive and collectively exhaustive outcomes for the proposition.

Definition — The predecessors of a node i are the set of nodes j with arcs from j to i.

Definition — The successors of a node i are the set of nodes j such that there is an arc from i to j.

The mapping π_i depends on node type. The domain of each mapping is the cross product of the outcome sets of the predecessors of node i. Let the cross product of the predecessors of i be CP(i), where

CP(i) = {Ω(i1) × Ω(i2) × … × Ω(in) | nodes i1, …, in ∈ predecessors of node i}
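As an illustrative encoding (not from the book), the graph of Fig. 13.1 and its predecessor and successor sets can be represented directly, with arcs taken from the description above: C1 is conditioned by C2 and D, and V depends on D and C2.

```python
# The diagram of Fig. 13.1 as an acyclic directed graph G = (N, A).
nodes = {"V": "value", "C1": "chance", "C2": "chance", "D": "decision"}
arcs = [("C2", "C1"), ("D", "C1"), ("C2", "V"), ("D", "V")]

def predecessors(n):
    """Nodes j with an arc from j to n."""
    return {a for a, b in arcs if b == n}

def successors(n):
    """Nodes j with an arc from n to j."""
    return {b for a, b in arcs if a == n}

print(sorted(predecessors("C1")))  # ['C2', 'D']
print(sorted(predecessors("V")))   # ['C2', 'D']
```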


Sec. 13.4]


The range of each mapping π_i depends on the type of node i. Each is discussed in turn.

The value node

The value node expresses the decision maker's relative valuation of different possible combinations of outcomes for its predecessors. Since we require a cardinal measure for expected value calculations, the outcome set of value nodes has some restrictions. In terms of the previous definition of the outcome set, the set Ω(i) for the value node is:

Ω(i) = {{x1/X11}, {x1/X12}, …, {x1/X1K}} where X11, …, X1K ∈ R

Thus, there is only one restricted variable, x1, and its values, the X1k's, are real valued. We can therefore associate with each member of Ω(i) exactly one real number. The value function π_i for a value node is defined as follows:

π_i: CP(i) → {X11, X12, …, X1K}

This function maps each combination of outcomes for the predecessors into a single real number. This will be used to express the expected value or the expected utility as a function of the outcomes of the predecessor nodes.

Chance nodes

Chance nodes represent uncertain propositions that are not directly controlled by the decision maker. The members of Ω(i) are the possible outcomes for the proposition. There are two types of chance nodes. A stochastic chance node admits uncertainty regarding the outcome of the proposition given the values of its predecessors. In this case the mapping π_i is a conditional probability density function:

π_i: CP(i) × Ω(i) → [0,1]

in probabilistic terms π_i(ω_i | ω_i1, ω_i2, …, ω_in). If node i has no predecessors, then π_i(ω_i) is a prior (unconditional) probability distribution. The other type of chance node is a deterministic chance node. The outcome of a deterministic chance node is a deterministic function of the outcomes of its predecessors. In this case, the mapping π_i is defined as follows:

π_i: CP(i) → Ω(i)

Thus, given the values of its predecessors, there is no uncertainty regarding the outcome of the proposition. If node i has no predecessors, then π_i( ) is constant.
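A stochastic chance node's mapping π_i is, concretely, a conditional probability table: for each combination of predecessor outcomes, a distribution over Ω(i). The outcomes and numbers below are invented for illustration.

```python
# pi_i as a conditional probability table for a chance node with one
# predecessor; every conditional distribution must sum to one over Omega(i).
cpt = {
    ("rain",):  {"wet": 0.9, "dry": 0.1},
    ("clear",): {"wet": 0.2, "dry": 0.8},
}

for cond, dist in cpt.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, cond

print(cpt[("rain",)]["wet"])  # 0.9
```

A deterministic chance node would instead map each predecessor combination to a single member of Ω(i), i.e. an ordinary function table.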




Decision nodes

Decision nodes represent propositions which are under the direct control of the decision maker. The members of Ω(i) are the alternative outcomes from which the decision maker can choose. At the time the decision is made, the decision maker knows the outcomes of the predecessors of i. The mapping π_i therefore expresses the optimal decision choice as a function of what is known at the time the decision is made:

π_i: CP(i) → Ω(i)

The mapping is calculated in the course of manipulating an influence diagram and the associated maximization of expected value. Though the construct is similar to that of the mapping for a deterministic chance node, it differs in that it is the result of an optimization. For this reason we define a special construct.

Definition — A decision function π_d,i is a function

π_d,i: CP(i) → Ω(i)

determined as the result of an optimization (see Transformations).


An influence diagram is said to be a decision network if (1) it has at least one node, and (2) there is a directed path which contains all the decision nodes (Olmsted 1984, Howard & Matheson 1981). The second condition implies that there is a time ordering to the decisions, consistent with the use of an influence diagram to represent the decision problem for an individual. Furthermore, arcs may be added to the diagram so that the choices made for any decision are known at the time any subsequent decision is made. These are 'no-forgetting' arcs, in that they imply the decision maker (1) remembers all of his previous selections for decisions, and (2) has not forgotten anything that was known at the time of a previous decision.

The language of influence diagrams is a clear and computable representation for a wide range of complex and uncertain decision situations. The structure of dependencies (and lack thereof) is explicit in the linkages of the graph, as are the states of information available at each stage in a sequence of decisions. The power of the representation lies, in large part, in the ability to manipulate the diagram either (1) to express an alternative expansion of a joint probability distribution underlying a particular model, or (2) to generate decision recommendations. The basic transformations of the diagram required to perform these operations are node removal and arc reversal. These operations will be illustrated and defined with respect to a generic set of node labels: i and j are chance nodes, v is the value node. The labels p1, p2, and p3 will in general represent groups of predecessors of i, j, or v, as indicated by the figures. In the interest of simplifying the descriptions of the operations, they will be treated as individual nodes. More detailed descriptions of these operations appear in Shachter (1986) and Olmsted (1984).
Removal of a stochastic chance node, i, which is a predecessor of a value node, v, is performed by taking conditional expectation.




The new expected value function for v is calculated as follows:

π_new,v(ω_p1, ω_p2, ω_p3) = Σ_{ω_i ∈ Ω(i)} π_old,v(ω_p1, ω_p2, ω_p3, ω_i) π_i(ω_i | ω_p2)

The value node's new predecessors are p1, p2, and p3.

Removal of a deterministic chance node, i, which is a predecessor to the value node, v, is performed by substitution. The picture of this process is the same as the previous case. The new expected value function for v is:

π_new,v(ω_p1, ω_p2, ω_p3) = π_old,v(ω_p1, ω_p2, ω_p3, π_i(ω_p2))
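Chance-node removal into the value node is just a sum (or substitution) over the node's outcomes. Below is a minimal sketch of the stochastic case with invented outcomes and probabilities; the shared predecessor plays the role of p2, and i has no other predecessors.

```python
# Removing a stochastic chance node i into the value node v by
# conditional expectation (invented numbers, one shared predecessor p2).
omega_i = ["hi", "lo"]
pi_i = {("s",): {"hi": 0.7, "lo": 0.3},        # pi_i(omega_i | omega_p2)
        ("t",): {"hi": 0.4, "lo": 0.6}}
pi_old_v = {("s", "hi"): 10, ("s", "lo"): 2,   # pi_old_v(omega_p2, omega_i)
            ("t", "hi"): 8,  ("t", "lo"): 4}

def pi_new_v(w_p2):
    """Expected value of v after i has been summed out."""
    return sum(pi_old_v[(w_p2, w_i)] * pi_i[(w_p2,)][w_i] for w_i in omega_i)

print(pi_new_v("s"))  # 0.7*10 + 0.3*2 = 7.6
```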

Removal of a stochastic chance node, i, which is a predecessor to another chance node, j, is also performed by taking conditional expectation. The new distribution for the successor node j is calculated as:

π_new,j(ω_j | ω_p1, ω_p2, ω_p3) = Σ_{ω_i ∈ Ω(i)} π_old,j(ω_j | ω_p1, ω_p2, ω_i) π_i(ω_i | ω_p2, ω_p3)

The new predecessors of j are the predecessors of j other than i, together with the predecessors of i; that is, p1, p2, and p3.

Removal of a decision node, i, which is a predecessor to the value node v, is performed by maximizing expected utility. The decision node can only be removed when all of its predecessors are also predecessors of the value node; that is, the choice is based on the expectations for the value, given what is known. After removal the new expected value function for v is:

π_new,v(ω_p2) = max_{ω_i ∈ Ω(i)} π_old,v(ω_i, ω_p2)

The new predecessors of v are the predecessors of i which are also predecessors of v, p2 as illustrated here. Note that there may be some informational predecessors of i, for example p1, which are not predecessors of v before the removal. The values of these variables are irrelevant to the decision, since the expectation for the value is independent of their values. The optimal policy for the decision i is:

π_i = argmax_{ω_i ∈ Ω(i)} π_old,v(ω_i, ω_p2)

This is the calculated π_i for decision nodes. We will refer to this calculated mapping as the decision function for i, π_d,i.
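Decision-node removal is the max/argmax counterpart of the expectation step. A minimal sketch with invented outcomes and values, where the informational predecessor plays the role of p2:

```python
# Removing a decision node by maximization: the new value is the max over
# decision outcomes, and the argmax is the decision function pi_d_i.
omega_d = ["buy", "sell"]
pi_old_v = {("buy", "up"): 5, ("buy", "down"): -3,   # pi_old_v(omega_d, omega_p2)
            ("sell", "up"): -2, ("sell", "down"): 4}

def remove_decision(w_p2):
    """Return (pi_new_v(w_p2), pi_d_i(w_p2)) for the invented table."""
    best = max(omega_d, key=lambda w_d: pi_old_v[(w_d, w_p2)])
    return pi_old_v[(best, w_p2)], best

print(remove_decision("up"))    # (5, 'buy')
print(remove_decision("down"))  # (4, 'sell')
```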


The operations of reversal and removal allow a well formed influence diagram to be transformed into another 'equivalent' diagram. The original and the transformed diagrams are equivalent in two senses. First, the underlying joint probability distribution and state of information associated with each is identical, since the diagram expresses alternative ways of expanding a joint distribution into a set of conditional and prior distributions (Howard & Matheson 1981). Secondly, the expectation for the value in the diagram and the sequence of recommended actions from decision node removal are invariant over these transformations (Shachter 1986, Holtzman 1987). In the next section, we focus on applying a sequence of these manipulations to obtain these recommendations.

Solution procedures

On the basis of these manipulations, there exist algorithms to evaluate any well formed influence diagram (Shachter 1986). For purposes of probabilistic inference, we need two separate algorithms. In one version, which applies to well formed diagrams, evaluation consists of reducing the diagram to a single value node with no predecessors, the value of which is the expected value of the decision problem assuming the optimal policy is followed. In the course of removing decisions, the optimal policy, that is, the set of decision functions π_d,i associated with each decision, is generated. In the other algorithm, the objective is to determine the probability distribution for a variable, as opposed to its expected value. Both versions of the algorithm are described below.

Procedure EXPECTED-VALUE (diagram)
1. Verify that the diagram has no cycles.
2. Add 'no-forgetting' arcs between decision nodes as necessary.
3. WHILE the value node has predecessors
   3.1 IF there exists a deterministic chance node predecessor whose only successor is the value node, THEN remove the deterministic chance node into the value node ELSE
   3.2 IF there exists a stochastic chance node predecessor whose only successor is the value node, THEN remove the stochastic chance node into the value node ELSE
   3.3 IF there exists a decision node predecessor and all the predecessors of the value node are predecessors of the decision node, THEN remove the decision node into the value node ELSE
   3.4 BEGIN
       3.4.1 Find a stochastic predecessor X to the value node that has no decision successors.
       3.4.2 For each successor S_X of X such that there is no directed path from X to S_X, reverse the arc from X to S_X.
       3.4.3 Remove stochastic predecessor X.
       END
4. END

At the conclusion of the EXPECTED-VALUE procedure, the value node has no predecessors, and its single value is the expected value of the value node. Optimal decision functions are generated in the course of removing the decision nodes.
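The loop can be traced end-to-end on the smallest interesting diagram: a value node v with a chance predecessor c (with a prior, no decision successors) and a decision predecessor d. Step 3.2 removes c by expectation, then step 3.3 removes d by maximization. All numbers are invented.

```python
# Two removal steps of EXPECTED-VALUE on a tiny diagram (invented numbers).
p_c = {"up": 0.5, "down": 0.5}                   # prior on chance node c
v = {("buy", "up"): 9, ("buy", "down"): -1,      # value table v(d, c)
     ("hold", "up"): 2, ("hold", "down"): 2}

# Step 3.2: remove c into v by conditional expectation.
v_after_c = {d: sum(p_c[c] * v[(d, c)] for c in p_c) for d in ("buy", "hold")}

# Step 3.3: remove d into v by maximization; the argmax is the policy.
policy = max(v_after_c, key=v_after_c.get)
expected_value = v_after_c[policy]

print(policy, expected_value)  # buy 4.0
```

After both steps the value node has no predecessors, and its single value, 4.0, is the expected value under the optimal policy, exactly as the procedure's termination condition states.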




The algorithm to solve for a probability distribution (or lottery) for a node is as follows.

Procedure PROBABILITY-DISTRIBUTION (diagram)
1. Verify that the diagram has no cycles.
2. IF the value node is deterministic, THEN convert it to a probabilistic chance node with unit probability on the deterministic values.
3. WHILE the value node has predecessors
   3.1 IF there exists a deterministic chance node predecessor whose only successor is the value node, THEN remove the deterministic chance node into the value node ELSE
   3.2 IF there exists a stochastic chance node predecessor whose only successor is the value node, THEN remove the stochastic chance node into the value node ELSE
   3.3 IF there exists a decision node predecessor and all the predecessors of the value node are predecessors of the decision node, THEN remove the decision node from the list of predecessors ELSE
   3.4 BEGIN
       3.4.1 Find a stochastic predecessor X to the value node that has no decision successors.
       3.4.2 For each successor S_X of X such that there is no directed path from X to S_X, reverse the arc from X to S_X.
       3.4.3 Remove stochastic predecessor X.
       END
4. END

The termination of this procedure is a probabilistic chance node with probabilities over the alternative possible outcomes of the original value node. Note that if decision predecessors are encountered in the algorithm, the distribution will be conditioned on the possible choices of the decision variables. The procedure does not remove decision nodes or generate decision functions.


Recall the elements that are necessary to represent a decision domain — alternatives, state descriptions, relationships, and preferences. We will summarize by indicating how each element of a decision description can be expressed with respect to the constructs generated above. First, recall that propositions form the basic unit of representation for a decision domain. There are three levels of knowledge regarding a proposition expressible in the language. First, it is possible to express a fact for a proposition, that is, a set of values for the variables (as in a fact substitution) in the proposition that are asserted to be true with certainty. Second, the values of the variables in a proposition may be restricted to some set. Thus, the outcomes for that proposition are restricted to a collectively exhaustive, mutually exclusive set, termed the alternative outcomes. Finally, a probability distribution can be used to associate each possible outcome with a probability. We have also shown how probability distributions and outcome sets are expressed for conjunctions of propositions.

Alternatives, the decision maker's options, are expressed in the set of outcomes for a proposition which is the consequent of an informational influence. The fact that a proposition has alternative outcomes and is the consequent of an informational influence defines it as a decision proposition. State descriptions consist of the set of facts and probabilities expressed within or deducible from a domain description. Relationships between states are expressed by the various types of influences available in the language; the logical, probabilistic, and informational influences expressed for the domain. Preferences are handled by identification of a particular proposition whose outcomes incorporate the decision maker's objectives. A real valued variable in the proposition is identified as the objective — i.e. the value to be maximized or minimized. A logical influence is defined which is capable of computing this value as a function of other propositions in the domain.

Sec. 13.6]



This section presents a simple example, using the decision language to describe a specific subproblem in a decision domain. Consider a security trader dealing in a single instrument, perhaps a particular Treasury security issue or foreign currency. The dealer's task is to trade continually in the instrument in order to make a profit. The trader's decisions are what quantity of the security to buy or sell at each instant of the trading day. The fundamental strategy is to 'buy low, sell high,' which is considerably easier to write down than to execute. The dealer's primary uncertainty is what the price of the security will be in the future. Changes in the price are dynamic and dependent on the price in previous periods as well as some other economic conditions or market factors. The trader wishes to maximize his expected profit at some terminal time (Cohen et al. 1982). The following basic decision alternatives represent the trader's decision to buy, sell, or do nothing (hold) in each trading period. The set of propositions for this situation is shown below along with an interpretation for each. Alternative values for restricted variables are shown in brackets {}. These propositions constitute the means of expressing state descriptions for this domain:

(PROFIT profit time)
Trader's net profit. This is the cumulative total of all the trader's gains and losses in terms of profits since trading was initiated.

(POSITION value time)
Trader's net holding of the security. This is the cumulative total of all the trader's sales and purchases in terms of units of the security.

(TRADE {BUY SELL HOLD} time)
Trader's decision alternatives.

(PRICE {90 91 92} time)
Range of security prices. This is a restriction on the assumed range of prices that the instrument can adopt.

(FUTURES-EXPIRE time)
Futures contract expiration. Futures are contracts for the delivery of a given security at a future date. Standard security futures contracts expire on a predetermined date (e.g. the 3rd Friday in March, June, September, etc.). This proposition is true if 'time' occurs on a date when futures contracts mature.

Indicator of activity level for futures markets. The level of activity in futures affects the levels of activity and prices in the 'cash' market (i.e. for current delivery) that is considered in this example.

Forecast by a market prognosticator or analyst. This represents the information of some outside expert. The 'guru' is 'bullish' if he believes prices are likely to rise, and 'bearish' if prices are thought to fall.

We now describe the set of relationships which characterizes this domain. The trader’s profit and position are simply accounting relations, expressed as deterministic influences. We assume an initial position of zero units of the security, an initial profit of zero dollars, and a single trade quantity of 100 units. The facts (PROFIT 0.0 0)
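The accounting relations for POSITION and PROFIT can be sketched as deterministic update functions. This is a minimal illustration under stated assumptions: the function names are our own, and treating PROFIT as accumulated cash flow from trades is an assumed interpretation for the sketch, not a rule given in the text.

```python
QTY = 100  # the single fixed trade quantity assumed in the text

def next_position(position, trade):
    # POSITION accumulates purchases and sales in units of the security.
    if trade == "BUY":
        return position + QTY
    if trade == "SELL":
        return position - QTY
    return position          # HOLD leaves the position unchanged

def next_profit(profit, trade, price):
    # Assumed interpretation: PROFIT accumulates cash flow, so buying
    # spends price * QTY and selling earns price * QTY.
    if trade == "BUY":
        return profit - price * QTY
    if trade == "SELL":
        return profit + price * QTY
    return profit            # HOLD leaves profit unchanged

# Initial position of zero units and initial profit of zero dollars,
# as assumed in the text.
position, profit = 0, 0.0
for trade, price in [("BUY", 90), ("HOLD", 91), ("SELL", 92)]:
    position = next_position(position, trade)
    profit = next_profit(profit, trade, price)
print(position, profit)   # 0 200.0  (bought 100 at 90, sold 100 at 92)
```

Because both updates are functions of the previous state and the current trade alone, they behave exactly like the deterministic influences described above.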