Probabilistic Logic Programming

Raymond Ng and V. S. Subrahmanian

Department of Computer Science A. V. Williams Building University of Maryland College Park, Maryland 20742, U.S.A.

Abstract

Of all scientific investigations into reasoning with uncertainty and chance, probability theory is perhaps the best understood paradigm. Nevertheless, all studies conducted thus far into the semantics of quantitative logic programming (cf. van Emden [51], Fitting [18, 19, 20], Blair and Subrahmanian [5, 6, 49, 50], Kifer et al. [29, 30, 31]) have restricted themselves to non-probabilistic semantical characterizations. In this paper, we take a few steps towards rectifying this situation. We define a logic programming language that is syntactically similar to the annotated logics of [5, 6], but in which the truth values are interpreted probabilistically. A probabilistic model theory and fixpoint theory is developed for such programs. This probabilistic model theory satisfies the requirements proposed by Fenstad [16] for a function to be called probabilistic. The logical treatment of probabilities is complicated by two facts: first, that the connectives cannot be interpreted truth-functionally when truth values are regarded as probabilities; second, that negation-free definite-clause-like sentences can be inconsistent when interpreted probabilistically. We address these issues here and propose a formalism for probabilistic reasoning in logic programming. To our knowledge, this is the first probabilistic characterization of logic programming semantics. Our work is closely related to current work of Fitting [18, 19], Fagin, Halpern and Megiddo [15], van Emden [51], Kifer et al. [29, 30] and Blair and Subrahmanian [5].

1 Introduction

Probabilities play a central role in our understanding of the world, and in the way in which we reason about the world we are in. For instance, it is not uncommon to hear, from the Department of Health, that 94% of all people who test positive on the only existing test (called HIV) for AIDS actually have AIDS. Suppose now that an insurance company is considering an application from John Doe for health insurance. John Doe had a positive HIV test. Thus, there is a 6% chance that John Doe does not have AIDS. Nevertheless, it is unlikely that the insurance company will insure John Doe.


This scenario has been described to demonstrate that reasoning about probabilistic and statistical information is common in many real-life situations (for numerous examples of the applications of probability theory to human reasoning, see Gnedenko and Khinchin [23]). Often, such probabilistic information is used in decisions made automatically (without human intervention) by computer programs. Thus, automated reasoning systems need to know how to reason with probabilistic information. Despite the fact that quantitative logic programming has been studied intensely (cf. the works of van Emden [51], Shapiro [46], Fitting [18, 19], Blair and Subrahmanian [5, 6], Kifer et al. [29, 32] and Morishita [39]), no probabilistic foundation for multivalued logic programming has been developed thus far. There is no doubt that probability theory is the most widely accepted formalism for reasoning about chance and uncertainty. As logic programs are a natural formalism for designing rule-based expert systems, it is of vital importance that they have the ability to reason with probabilistic information. The main aim of this paper is to propose and semantically characterize such a logic programming language. In brief, the principal contributions of this paper are:
1. to design a logical framework within which probabilistic information can be easily expressed. This is done by extending the annotated logics introduced by Blair and Subrahmanian [5, 48, 50] to allow (i) conjunctions and disjunctions to be annotated and (ii) annotations to be closed intervals of truth values.
2. to study the semantics of this language, and to clearly understand the relationships between probability theory, model theory, fixpoint theory and proof theory for such languages.
3. in particular, to show that the model-theoretic framework developed here satisfies the criteria proposed by Fenstad [16] for a function to be called probabilistic.
4. to handle the complications that arise in 2) above, namely that even sets of definite-clause-like formulas may be inconsistent in a probabilistic sense. For instance, the probabilistic statement "the probability of event E lies in the range [0.2, 0.3]" is inconsistent with the probabilistic statement "the probability of E lies in the range [0.5, 0.6]". Our model theory appropriately handles such probabilistic phenomena.
5. to develop a query processing procedure for handling queries to such programs. The procedure is complicated by the fact that unification of conjunctions and disjunctions of atoms does not proceed in the classical way, and that mgu's may not be unique.

2 Syntax

Let L be a fixed first order language containing infinitely many variable symbols, and finitely many constant and predicate symbols, but no function symbols (the technical reason why function symbols are not supported will be clarified later), and let BL be the Herbrand base of L. Thus, BL is always finite.

Definition 1 i) conj(BL) = {A1 ∧ ... ∧ An | n ≥ 1 is an integer, A1, ..., An ∈ BL, and ∀ 1 ≤ i, j ≤ n, i ≠ j ⇒ Ai ≠ Aj}
ii) disj(BL) = {A1 ∨ ... ∨ An | n ≥ 1 is an integer, A1, ..., An ∈ BL, and ∀ 1 ≤ i, j ≤ n, i ≠ j ⇒ Ai ≠ Aj} □

Thus, conj (BL) and disj (BL) denote, respectively, the set of all ground conjunctions and disjunctions formed by using distinct atoms in BL. A conjunction or disjunction with repeated atoms is considered equivalent to one without repeated atoms, by simply deleting the repetitions.
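Since a ground conjunction or disjunction over distinct atoms is determined by a non-empty set of atoms, Definition 1 can be sketched executably. The set-based encoding below is ours, not the paper's:

```python
# A minimal sketch of Definition 1: enumerate the atom sets underlying
# conj(BL) and disj(BL) as non-empty subsets of a tiny Herbrand base.

from itertools import combinations

def atom_sets(base):
    """Atom sets of conj(BL); disj(BL) has exactly the same underlying sets."""
    atoms = sorted(base)
    return [frozenset(c) for n in range(1, len(atoms) + 1)
            for c in combinations(atoms, n)]

BL = {"p", "q", "r"}
formulas = atom_sets(BL)
assert len(formulas) == 2 ** len(BL) - 1      # 7 distinct atom sets
assert len(set(formulas)) == len(formulas)    # no duplicates
# |bf(BL)| = |conj(BL)| + |disj(BL)| - |BL|, since rank-1 formulas coincide
assert 2 * len(formulas) - len(BL) == 11
```

The frozenset representation also makes the deletion of repeated atoms automatic, mirroring the equivalence stated above.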

Definition 2 If A is an atom (not necessarily ground) and μ = [α, β] ⊆ [0, 1], then A : μ is called a p-annotated atom. μ is called the p-annotation of A. Similarly, if C is a conjunction and D is a disjunction (both of which are not necessarily ground), then C : μ and D : μ are called a p-annotated conjunction and a p-annotated disjunction respectively. □

Definition 3 A basic formula (not necessarily ground) is either a conjunction or a disjunction of atoms. Note that conjunctions and disjunctions cannot both occur simultaneously in one basic formula. Furthermore, let bf(BL) denote the set of all ground basic formulas formed by using distinct atoms in BL, i.e. bf(BL) = conj(BL) ∪ disj(BL). □

Definition 4 If A is an atom, F1, ..., Fn are basic formulas, and μ, μ1, ..., μn are p-annotations, then A : μ ← F1 : μ1 ∧ ... ∧ Fn : μn is called a p-clause. A : μ is called the head of this p-clause, while F1 : μ1 ∧ ... ∧ Fn : μn is called the body. We assume, usually, that μ, μ1, ..., μn are all distinct from [0, 1]. □

Intuitively, if μ = [α, β], then F : μ is to be read: "The probability of F lies in the interval [α, β]." Thus, to say that event F cannot occur, we merely say F : [0, 0], which is read as "the probability of event F occurring lies in the interval [0, 0]", which is the same as saying "the probability of event F is zero".

Definition 5 A probabilistic logic program (p-program for short) is a finite set of p-clauses. □

We say that A : μ is unifiable with B : μ′ via θ iff A and B are unifiable via some substitution θ. Note that we are not defining the result of the unification yet. Also notice that p-programs are different from the annotated programs of Blair and Subrahmanian [5] in two ways: first, conjunctions and disjunctions are allowed to be annotated, and second, the annotations are sets of truth values rather than individual truth values. (While sets of truth values may be used to annotate atoms in [5], the resulting semantics prescribed by [5] is non-probabilistic.) Both these distinctions have a significant

3

impact on the semantics of annotated logics. Further distinctions with [5] will be discussed in Examples 6 and 14. Note that p-programs are implicitly allowed to contain negation in clause heads because A : [0, 0] intuitively corresponds to the classical logic sentence "A is false." Semantics of logic programs with negation in clause heads were studied first by Subrahmanian [48] and later by Blair and Subrahmanian [5].

Example 1 A long distance phone company, on receiving customer requests for connections, tries to find reliable paths within a network of relay centers. (Here, assume that reliability is defined as the probability that a connection is error-free throughout the connected period.) The company supports two types of direct connection between relay centers. Suppose a statistical survey reveals the following performance figures for these two types of connections:
i) A Type A connection on its own has a reliability of 90% ± 5%.
ii) A Type B connection, on the other hand, is more reliable, providing a reliability of over 90% on its own.
iii) Suppose X, Y and Z are three centers. If X and Z are connected by a Type A connection, while Z and Y are connected by a path of reliability at least 85%, then the resulting path from X to Y suffers a drop in reliability to the 80% to 95% range.
iv) As part of a path, Type B again is more reliable. If X and Z are connected by a Type B connection and Z and Y are connected by a path of reliability at least 75%, then the resulting path from X to Y has a reliability of at least 85%.
Now the company can use the following p-program to find the reliability of paths from one center to another (clauses 1-4 correspond to points i) to iv) respectively):
path(X, Y) : [0.85, 0.95] ← a(X, Y) : [1, 1]
path(X, Y) : [0.9, 1] ← b(X, Y) : [1, 1]
path(X, Y) : [0.8, 0.95] ← a(X, Z) : [1, 1] ∧ path(Z, Y) : [0.85, 1]
path(X, Y) : [0.85, 1] ← b(X, Z) : [1, 1] ∧ path(Z, Y) : [0.75, 1] □

3 Fixpoint Semantics

Definition 6 An atomic function is a mapping f : BL → C[0, 1], where C[0, 1] denotes the set of all closed sub-intervals of the unit interval [0, 1]. □

Note that the empty interval, denoted by ∅, is a closed interval of the form [α, β] with β < α. Intuitively, an atomic function assigns a probability range to each ground atom. In the situation when the empty interval ∅ is assigned by an atomic function, an inconsistency seems to exist. We will formalize the notion of probabilistic consistency in the next section.

Recall that p-programs allow non-atomic basic formulas to appear in the body but not in the head of p-clauses. Thus we need a mechanism to assign probability ranges to non-atomic formulas. Suppose P(E) is used to denote the probability of event E. Kolmogorov [33] and Hailperin [24] have shown that given distinct events E1 and E2, we cannot precisely specify P(E1 ∧ E2) from P(E1) and P(E2). But we can characterize precisely the range within which the probability of (E1 ∧ E2) must lie. As Fréchet [21] has shown, max{0, P(E1) + P(E2) − 1} ≤ P(E1 ∧ E2) ≤ min{P(E1), P(E2)} represents the tightest bounds for P(E1 ∧ E2). This result can be generalized as shown below.

In the sequel, a world is simply an Herbrand interpretation as defined in Lloyd [36]. Given the two events, there are four possible worlds: first (world K1), in which the events E1 and E2 both occur; second (world K2), in which E1 occurs but E2 does not; third (world K3), in which E2 occurs while E1 does not; and lastly (world K4), in which neither E1 nor E2 occurs. Suppose P(E1) ∈ [α1, β1] ⊆ [0, 1] and P(E2) ∈ [α2, β2] ⊆ [0, 1]. Furthermore, let ki be the probability that world Ki is the actual world. This situation can be expressed via the following linear program Q:
0 ≤ α1 ≤ k1 + k2 ≤ β1 ≤ 1,
0 ≤ α2 ≤ k1 + k3 ≤ β2 ≤ 1,
k1 + k2 + k3 + k4 = 1, and
for j = 1, ..., 4, kj ≥ 0.
As event E1 occurs in worlds K1 and K2, which are mutually incompatible worlds, the probability of E1 occurring is (k1 + k2). As P(E1) is known to be in [α1, β1], this gives rise to the first inequality in the linear program Q. The second inequality arises similarly when we consider E2 instead of E1. The third constraint says that the four possible worlds encompass all possibilities. The fourth constraint simply asserts that probabilities are non-negative. To find the range for P(E1 ∧ E2), we need to solve the linear program Q for the parameter k1 that represents the probability of the world in which E1 and E2 are both true. However, in general, there is no unique solution. Thus, we need to solve Q to find the minimal and maximal values of k1. Likewise, to find the range for P(E1 ∨ E2), we solve for the minimal and maximal values of (k1 + k2 + k3). Hereafter we use the notation minQ E and maxQ E to denote the minimization and maximization of expression E subject to the linear program Q described above.
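The extrema of Q can be spot-checked numerically. The sketch below is our own (not from the paper): it enumerates candidate world probabilities on a grid, keeps the points feasible for Q, and compares the observed extrema of k1 and (k1 + k2 + k3) with the tightest-bound formulas max{0, α1 + α2 − 1}, min{β1, β2}, max{α1, α2} and min{1, β1 + β2}. The bound values are illustrative:

```python
# Brute-force check of the linear program Q: enumerate (k1, k2, k3) on a
# grid with k4 = 1 - k1 - k2 - k3, keep feasible points, and compare the
# observed extrema with the closed-form bounds.

def feasible_points(a1, b1, a2, b2, step=0.05):
    n = round(1 / step)
    pts = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            for l in range(n + 1 - i - j):
                k1, k2, k3 = i * step, j * step, l * step
                # constraints of Q: P(E1) = k1 + k2 and P(E2) = k1 + k3
                if a1 - 1e-9 <= k1 + k2 <= b1 + 1e-9 and \
                   a2 - 1e-9 <= k1 + k3 <= b2 + 1e-9:
                    pts.append((k1, k2, k3, 1.0 - k1 - k2 - k3))
    return pts

a1, b1, a2, b2 = 0.3, 0.6, 0.5, 0.8          # illustrative bounds
pts = feasible_points(a1, b1, a2, b2)

# observed extrema of P(E1 and E2) = k1 and P(E1 or E2) = k1 + k2 + k3
lo_and = min(p[0] for p in pts)
hi_and = max(p[0] for p in pts)
lo_or = min(p[0] + p[1] + p[2] for p in pts)
hi_or = max(p[0] + p[1] + p[2] for p in pts)

assert abs(lo_and - max(0.0, a1 + a2 - 1)) < 1e-6
assert abs(hi_and - min(b1, b2)) < 1e-6
assert abs(lo_or - max(a1, a2)) < 1e-6
assert abs(hi_or - min(1.0, b1 + b2)) < 1e-6
```

A grid search suffices here because the chosen bounds are multiples of the step; an exact treatment would use a linear programming solver over the vertices of the feasible polytope.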

Theorem 1 For the linear program Q:
0 ≤ α1 ≤ k1 + k2 ≤ β1 ≤ 1,
0 ≤ α2 ≤ k1 + k3 ≤ β2 ≤ 1,
k1 + k2 + k3 + k4 = 1, and
for j = 1, ..., 4, kj ≥ 0,
the following hold:
i) minQ k1 = max{0, α1 + α2 − 1},
ii) maxQ k1 = min{β1, β2},
iii) minQ (k1 + k2 + k3) = max{α1, α2}, and
iv) maxQ (k1 + k2 + k3) = min{1, β1 + β2}.

Proof i) Claim: minQ k1 = max{0, α1 + α2 − 1}.
Case 1: α1 + α2 ≤ 1. First observe that k1 = 0, k2 = α1, k3 = α2, k4 = (1 − α1 − α2) is a solution to the linear program. Secondly, any solution to the linear program satisfies the constraint k1 ≥ 0. Therefore, it follows that minQ k1 = 0 = max{0, α1 + α2 − 1}.
Case 2: α1 + α2 > 1. First observe that k1 = (α1 + α2 − 1), k2 = (1 − α2), k3 = (1 − α1), k4 = 0 is a solution to the linear program. Suppose there exists a solution such that k1 < (α1 + α2 − 1). That is, k1 = (α1 + α2 − 1 − ε) for some ε > 0. Then, to satisfy the first inequality, k2 ≥ (1 − α2 + ε). Similarly, to satisfy the second inequality, k3 ≥ (1 − α1 + ε). Therefore, it follows that (k1 + k2 + k3) ≥ (α1 + α2 − 1 − ε) + (1 − α2 + ε) + (1 − α1 + ε) = (1 + ε) > 1. Thus, k4 < 0, which is a contradiction! Therefore, for all solutions to the linear program, k1 ≥ (α1 + α2 − 1). Thus it follows that minQ k1 = (α1 + α2 − 1) = max{0, α1 + α2 − 1}.
Combining the results for Cases 1 and 2, claim i) is proved.
ii) Claim: maxQ k1 = min{β1, β2}.
Case 1: β1 ≤ β2. First observe that k1 = β1, k2 = 0, k3 = (β2 − β1), k4 = (1 − β2) is a solution to the linear program. Secondly, for all solutions to the linear program, and in particular to the first inequality, k1 ≤ β1, since k2 ≥ 0. Therefore, maxQ k1 = β1 = min{β1, β2}.
Case 2: β1 > β2. The proof that maxQ k1 = β2 = min{β1, β2} is similar to the above.
Combining the results for Cases 1 and 2, claim ii) is proved.
iii) Claim: minQ (k1 + k2 + k3) = max{α1, α2}.
We subtract the first two inequalities of Q from 1 to obtain the following inequalities:
1 − β1 ≤ 1 − k1 − k2 = k4 + k3 ≤ 1 − α1,
1 − β2 ≤ 1 − k1 − k3 = k4 + k2 ≤ 1 − α2.
It is then easy to see that k4 plays the role of k1 in Q. Hence, it follows from ii) that maxQ k4 = min{1 − α1, 1 − α2}. Thus, it is easy to check that minQ (k1 + k2 + k3) = 1 − maxQ k4 = max{α1, α2}.
iv) Claim: maxQ (k1 + k2 + k3) = min{1, β1 + β2}. The proof is similar to that of case iii).
This completes the proof of the theorem. □

We define two operators ⊗ and ⊕ that combine intervals according to Theorem 1.

Definition 7 Let [α1, β1] and [α2, β2] be sub-intervals of [0, 1]. Define:
1) the operator ⊗, where [α1, β1] ⊗ [α2, β2] = [max{0, α1 + α2 − 1}, min{β1, β2}], and
2) the operator ⊕, where [α1, β1] ⊕ [α2, β2] = [max{α1, α2}, min{1, β1 + β2}]. □

The following lemma shows a few properties of ⊗ and ⊕ that will be used in later proofs.
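Before the lemma, here is a small executable sketch of the two interval operators of Definition 7; the function names conj_comb and disj_comb are ours, and intervals are represented as (low, high) tuples:

```python
# A minimal sketch of the operators of Definition 7 on closed sub-intervals
# of [0, 1], represented as (low, high) tuples.

def conj_comb(i1, i2):
    """[a1,b1] (x) [a2,b2] = [max{0, a1+a2-1}, min{b1, b2}]."""
    (a1, b1), (a2, b2) = i1, i2
    return (max(0.0, a1 + a2 - 1.0), min(b1, b2))

def disj_comb(i1, i2):
    """[a1,b1] (+) [a2,b2] = [max{a1, a2}, min{1, b1+b2}]."""
    (a1, b1), (a2, b2) = i1, i2
    return (max(a1, a2), min(1.0, b1 + b2))

# commutativity (part 1 of the lemma below)
assert conj_comb((0.5, 1.0), (0.25, 0.75)) == conj_comb((0.25, 0.75), (0.5, 1.0))
# combining with the uninformative interval [0, 1] (part 6)
assert conj_comb((0.25, 0.75), (0.0, 1.0)) == (0.0, 0.75)
assert disj_comb((0.25, 0.75), (0.0, 1.0)) == (0.25, 1.0)
```

The two assertions on [0, 1] illustrate why annotations distinct from [0, 1] are the informative ones: combining with [0, 1] only loosens one endpoint.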

Lemma 1 Let [α1, β1], [α2, β2], [α3, β3], [δ1, γ1] and [δ2, γ2] all be sub-intervals of [0, 1].
1) ⊗ and ⊕ are commutative, e.g. [α1, β1] ⊗ [α2, β2] = [α2, β2] ⊗ [α1, β1].
2) ⊗ and ⊕ are associative, e.g. ([α1, β1] ⊗ [α2, β2]) ⊗ [α3, β3] = [α1, β1] ⊗ ([α2, β2] ⊗ [α3, β3]).
3) ⊗ and ⊕ are monotonic in both arguments, e.g. if [α1, β1] ⊆ [δ1, γ1] and [α2, β2] ⊆ [δ2, γ2], then [α1, β1] ⊗ [α2, β2] ⊆ [δ1, γ1] ⊗ [δ2, γ2].
4) ⊗ and ⊕ are strictly monotonic in both arguments, i.e. replace the relation ⊆ in case 3 above with ⊂.
5) ∅ ⊕ [α1, β1] = ∅ and ∅ ⊗ [α1, β1] = ∅.
6) [α1, β1] ⊕ [0, 1] = [α1, 1] and [α1, β1] ⊗ [0, 1] = [0, β1].
Proof 1) and 6) follow immediately from De nition 7. As for proofs of 2), 3) and 5), we only show the case for , as those for  are similar. The proof of 4) is a straightforward modi cation

of the one for 3). i) Claim:- ([ 1; 1] [ 2 ; 2]) [ 3; 3] = [ 1; 1] ([ 2; 2] [ 3; 3]). By De nition 7, [ 1 ; 1] [ 2; 2] = [maxf0; 1 + 2 ? 1g; minf 1; 2g]. Then again by the same de nition, ([ 1; 1] [ 2 ; 2]) [ 3; 3] = [maxf0; maxf0; 1 + 2 ? 1g + 3 ? 1g; minf 3; minf 1; 2gg]. Since 3  1, maxf0; maxf0; 1 + 2 ? 1g + 3 ? 1g = maxf0; 1 + 2 + 3 ? 2g. In addition, minf 3; minf 1; 2gg = minf 1; 2; 3g. Therefore, ([ 1; 1] [ 2; 2])

[ 3 ; 3] = [maxf0; 1 + 2 + 3 ? 2g; minf 1; 2; 3g]. Similarly, [ 1; 1] ([ 2; 2] [ 3; 3]) = [maxf0; maxf0; 2 + 3 ? 1g + 1 ? 1g; minf 1; minf 2; 3gg]. Since 1  1, [ 1; 1]

([ 2; 2] [ 3; 3]) = [maxf0; 1 + 2 + 3 ? 2g; minf 1; 2; 3g]. ii)Claim:- If [ 1 ; 1]  [1 ; 1], and [ 2 ; 2]  [2; 2], then [ 1 ; 1] [ 2; 2]  [1 ; 1] [2; 2]. >From De nition 7, [ 1; 1] [ 2 ; 2] = [maxf0; 1 + 2 ? 1g; minf 1; 2g], and [1; 1] [2 ; 2] = [maxf0; 1 + 2 ? 1g; minf 1; 2g]. Therefore, given 1  1 , 2  2 , 1  1 , and

2  2, it suces to prove that a) maxf0; 1 + 2 ? 1g  maxf0; 1 + 2 ? 1g, and b) minf 1; 2g  minf 1; 2g. a) Case 1: 1 + 2  1 Therefore, it follows that maxf0; 1 + 2 ? 1g  0 = maxf0; 1 + 2 ? 1g. Case 2: 1 + 2 > 1 Since 1  1 and 2  2 , it follows that (1 + 2 ? 1)  ( 1 + 2 ? 1) > 0. Therefore, maxf0; 1 + 2 ? 1g = (1 + 2 ? 1)  ( 1 + 2 ? 1) = maxf0; 1 + 2 ? 1g. This completes the proof for a). b) It is true that minf 1; 2g  1  1 , and similarly minf 1; 2g  2  2. Therefore, it follows that minf 1; 2g  minf 1; 2g. This completes the proof for b) and claim ii). iii) Claim:- ; [ 1; 1] = ;. This result can be readily seen from the proof of Theorem 1, as the constraint for ; is not satis able. 2 Recall that p-programs allow non-atomic basic formulas to appear in the body but not in the head of p-clauses. As formalized in the following de nition, a formula function determines assignments of probability ranges to non-atomic formulas by applying the operators and  on the probability ranges assigned to atomic formulas. 7

Definition 8 Given an atomic function f : BL → C[0, 1], a corresponding formula function h : bf(BL) → C[0, 1] is defined inductively as follows:
i) h(F) = f(F), if F is an atom,
ii) h(F1 ∧ F2) = h(F1) ⊗ h(F2), where (F1 ∧ F2) is in bf(BL), and
iii) h(F1 ∨ F2) = h(F1) ⊕ h(F2), where (F1 ∨ F2) is in bf(BL). □

Thus, there is a correspondence between formula functions and atomic functions in the sense that given any formula function h, we can obtain an atomic function by restricting h to ground atoms. Likewise, given any atomic function f, Definition 8 allows us to obtain a formula function h. We now define an ordering on the set of formula functions.
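The inductive clauses of Definition 8 can be sketched directly. In the sketch below, the tuple encoding of basic formulas and all names are ours; conj_comb and disj_comb implement the ⊗ and ⊕ operators of Definition 7:

```python
# A sketch of Definition 8: extending an atomic function f (a dict mapping
# atom names to probability intervals) to a formula function h on basic
# formulas, encoded as ("and", atoms...) / ("or", atoms...) tuples.

def conj_comb(i1, i2):
    return (max(0.0, i1[0] + i2[0] - 1.0), min(i1[1], i2[1]))

def disj_comb(i1, i2):
    return (max(i1[0], i2[0]), min(1.0, i1[1] + i2[1]))

def h(formula, f):
    """Formula function induced by the atomic function f."""
    if isinstance(formula, str):              # case i): F is an atom
        return f[formula]
    op, *parts = formula
    comb = conj_comb if op == "and" else disj_comb
    result = h(parts[0], f)
    for part in parts[1:]:                    # cases ii) and iii), folded left
        result = comb(result, h(part, f))
    return result

f = {"p": (0.5, 0.75), "q": (0.75, 1.0)}      # an illustrative atomic function
assert h(("and", "p", "q"), f) == (0.25, 0.75)
assert h(("or", "p", "q"), f) == (0.75, 1.0)
```

The left fold is justified by parts 1) and 2) of Lemma 1: since ⊗ and ⊕ are commutative and associative, the order of combination does not matter.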

Definition 9 Given two formula functions h1 and h2, we say that h1 ≤ h2 iff ∀F ∈ bf(BL), h2(F) ⊆ h1(F). □

Definition 10 Let FF denote the set of all formula functions. □

Note 1 Observe that arbitrary unions of closed intervals are not necessarily closed. To see this, suppose that our language consists of the single propositional symbol p, and that fj, for j ≥ 1, assigns the closed interval [0, 1 − 1/2^j] to p. Then ∪_{j≥1} fj(p) is equal to the right-open interval [0, 1). Thus, we cannot define greatest lower bounds by simply taking unions of closed sets. Instead, we take the topological closure (w.r.t. the order topology on the real line) of [0, 1), which, as we expect, is the closed interval [0, 1]. (Recall that in any topological space X, the closure, closure(Y), of a set Y ⊆ X is the smallest closed set that contains Y. Closures are well defined in all topologies [27].) The following lemma tells us that FF forms a complete lattice.

Lemma 2 FF forms a complete lattice w.r.t. the ordering ≤ defined above.
Proof For every subset G of FF, ∀F ∈ bf(BL), ⊔(G)(F) = ∩{μ | g(F) = μ and g ∈ G} and ⊓(G)(F) = closure(∪{μ | g(F) = μ and g ∈ G}). □
The top element of FF is the function h such that ∀F ∈ bf(BL), h(F) = ∅, and the bottom element is the function h such that ∀F ∈ bf(BL), h(F) = [0, 1]. We now define a fixpoint operator TP for a program P.

Definition 11 Let P be a p-program. TP : FF → FF is defined inductively as follows:
i) For all atoms A ∈ BL, TP(h)(A) = ∩MA, where MA = {μ | A : μ ← F1 : μ1 ∧ ... ∧ Fn : μn is a ground instance of a clause in P and ∀i, 1 ≤ i ≤ n, h(Fi) ⊆ μi}. If the set MA is empty, then TP(h)(A) = [0, 1].
ii) TP(h)(C1 ∧ C2) = TP(h)(C1) ⊗ TP(h)(C2), where (C1 ∧ C2) is in bf(BL), and
iii) TP(h)(D1 ∨ D2) = TP(h)(D1) ⊕ TP(h)(D2), where (D1 ∨ D2) is in bf(BL). □
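A toy, propositional sketch of case i) of Definition 11 and of iterating TP from the bottom element is given below. The clause encoding, all names, and the tiny example program are ours, not the paper's:

```python
# Clauses are (head_atom, head_interval, [(body_formula, body_interval), ...]);
# body formulas reuse a tuple encoding ("and"/"or", atoms...).

def conj_comb(i1, i2):
    return (max(0.0, i1[0] + i2[0] - 1.0), min(i1[1], i2[1]))

def disj_comb(i1, i2):
    return (max(i1[0], i2[0]), min(1.0, i1[1] + i2[1]))

def h_val(formula, f):
    if isinstance(formula, str):
        return f[formula]
    op, *parts = formula
    comb = conj_comb if op == "and" else disj_comb
    out = h_val(parts[0], f)
    for p in parts[1:]:
        out = comb(out, h_val(p, f))
    return out

def subset(i1, i2):
    """Interval containment: i1 is a sub-interval of i2."""
    return i2[0] <= i1[0] and i1[1] <= i2[1]

def tp(program, atoms, f):
    """One application of T_P restricted to atoms (case i of Definition 11)."""
    new_f = {}
    for a in atoms:
        m_a = [head_iv for head, head_iv, body in program
               if head == a and all(subset(h_val(bf, f), iv) for bf, iv in body)]
        lo = max([iv[0] for iv in m_a], default=0.0)   # intersection of the
        hi = min([iv[1] for iv in m_a], default=1.0)   # intervals in M_A
        new_f[a] = (lo, hi)
    return new_f

# illustrative program: p : [0.8, 1] <- ;  q : [0.6, 0.9] <- p : [0.7, 1]
program = [("p", (0.8, 1.0), []),
           ("q", (0.6, 0.9), [("p", (0.7, 1.0))])]
atoms = ["p", "q"]
f = {a: (0.0, 1.0) for a in atoms}    # the bottom element: [0, 1] everywhere
while True:                           # iterate T_P until a fixpoint is reached
    nf = tp(program, atoms, f)
    if nf == f:
        break
    f = nf
assert f == {"p": (0.8, 1.0), "q": (0.6, 0.9)}
```

Note how the clause for q fires only once p's interval has tightened to within [0.7, 1]; the loop terminating after finitely many steps previews the finite-fixpoint result (Lemma 4) proved later in this section.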

In the following we prove that TP is monotonic.

Definition 12 Let F be a ground basic formula. Then let rank(F) be the number of distinct atoms occurring in F. □

Theorem 2 TP is monotonic. (That is, whenever h1 ≤ h2, TP(h1) ≤ TP(h2).)
Proof Given a basic formula F, proceed by induction on rank(F) and apply part 3) of Lemma 1. □

Definition 13 The upward iteration of TP is defined as follows:
i) TP ↑ 0 = ⊥, i.e. TP ↑ 0 is the function that assigns [0, 1] to all F ∈ bf(BL);
ii) TP ↑ α = TP(TP ↑ (α − 1)), where α is a successor ordinal whose immediate predecessor is denoted by (α − 1); and
iii) TP ↑ λ = ⊔{TP ↑ α | α < λ}, where λ is a limit ordinal. □

Example 2 TP may not always be continuous. Let q : [0, 0.5] ← p : [0, 0] be the only clause of a probabilistic program P. Consider the following directed subset of FF: G = {gj | j ≥ 0}, where gj(p) = [0, 1/2^j] and gj(q) = [0, 1]. Then (⊔G)(p) = [0, 0], and TP(⊔G)(q) = [0, 0.5]. However, for all j, TP(gj)(q) = [0, 1]. Therefore, it follows that ⊔(TP(gj))(q) = [0, 1] ≠ TP(⊔G)(q). □

To prove Lemma 4, we first prove the following lemma.

Lemma 3 For all F ∈ bf(BL), TP ↑ ω(F) = μ ⇒ ∃n < ω such that TP ↑ n(F) = μ.
Proof Proceed by induction on rank(F).
Base Case: rank(F) = 1. Then F ≡ A for some atom A. Suppose TP ↑ ω(A) ≠ TP ↑ n(A) for all n < ω. Since TP ↑ 0(A) ≤ TP ↑ 1(A) ≤ ... ≤ TP ↑ ω(A), there exists an ascending sequence of integers η0, η1, ... such that TP ↑ η0(A) < TP ↑ η1(A) < .... In particular, since TP ↑ ω(A) = ⊔{TP ↑ n(A) | n < ω}, and TP ↑ ω(A) ≠ TP ↑ n(A) for all n < ω, the sequence η0, η1, ... must be infinite. But for each ηj in the sequence, TP ↑ ηj(A) = ⊔Xj, where Xj ⊆ ann(A) = {μ | μ is the annotation of the head of a clause which unifies with A}. Therefore, there exists a corresponding infinite sequence X0, X1, ... of subsets of ann(A) such that X0 ⊂ X1 ⊂ .... However, since program P consists of only a finite set of clauses, ann(A), and therefore the number of subsets of ann(A), must both be finite. Therefore, there exist i < j such that Xi = Xj, a contradiction!
Inductive Case: rank(F) > 1. Then F is either a conjunction or a disjunction.
Case 1: F ≡ C1 ∧ C2. By Definition 11, TP ↑ ω(F) = TP ↑ ω(C1 ∧ C2) = TP ↑ ω(C1) ⊗ TP ↑ ω(C2). But by the induction hypothesis, there exists n1 < ω such that TP ↑ n1(C1) = TP ↑ ω(C1), and n2 < ω such that TP ↑ n2(C2) = TP ↑ ω(C2). Pick n = max{n1, n2} < ω. Then it follows that TP ↑ n(F) = TP ↑ n(C1) ⊗ TP ↑ n(C2) = TP ↑ ω(C1) ⊗ TP ↑ ω(C2) = TP ↑ ω(F).
Case 2: F ≡ D1 ∨ D2. The proof is similar to the one for conjunctions in Case 1.
This completes the induction and the proof of the lemma. □
Despite the fact that TP is not always continuous, the following result holds.

Lemma 4 There exists an integer n < ω such that TP ↑ n = lfp(TP).
Proof Immediate consequence of the lemma above and the fact that bf(BL) is finite. □

Lemma 4 tells us that the TP operator always achieves a fixpoint after a finite iteration.

4 Probabilistic Model Theory

In this section, we present a model theory that captures the uncertainty described in p-programs. We introduce notions such as probabilistic truth values of formulas and probabilistic interpretations and models for p-programs. We also study the relationships between formula functions and probabilistic interpretations, and between fixpoints of TP and probabilistic models for P.

Definition 14 Consider any enumeration of 2^BL, i.e. 2^BL = {K1, ..., Kr} for some integer r. A probabilistic kernel interpretation is a mapping KI : 2^BL → [0, 1] such that for all Kj ∈ 2^BL, KI(Kj) ≥ 0 and Σ_{Kj ∈ 2^BL} KI(Kj) = 1. Hereafter, we denote KI(Kj) by kj. □

Intuitively, probabilistic kernel interpretations assume that the "real" world is definite, i.e. there is some set of propositions that are true, and some set of propositions that are false. However, it is not certain which of the various "possible worlds" (worlds are just 2-valued interpretations) is the right one. Hence, a kernel interpretation assigns a probability to each 2-valued interpretation of our language. As 2^BL consists of all possible worlds, it must be the case that the sum of all probabilities assigned is 1. Any two distinct worlds are mutually incompatible, as they must differ on at least one atom. Hence, we can compute the probability of a formula F in a kernel interpretation KI by just summing up the probabilities assigned to those worlds in 2^BL in which F is true.
Note on Notation. Throughout this paper, given a kernel interpretation KI, we will use K1, ..., Kr to denote the elements of 2^BL and k1, ..., kr to denote KI(K1), ..., KI(Kr) respectively.

Example 3 Suppose L consists of two propositional symbols p and q. Then KI(∅) = 0.4, KI({p}) = 0.25, KI({q}) = 0.35, KI({p, q}) = 0 is a probabilistic kernel interpretation. Intuitively, KI says that the probability that both p and q are false is 0.4, the probability that p is true in our real world but q is false is 0.25, the probability that q is true in the real world but p is false is 0.35, and the probability that both p and q are true in the real world is 0. □

Recall from the previous section that a formula function h specifies a probability range for each basic formula. Given h, we would like to find those kernel interpretations whose probability assignments to basic formulas fall within the ranges specified by h. The notion defined below captures this idea. In the sequel, we will assume we are speaking about some fixed language L with 2^BL = {K1, ..., Kr} and some fixed probabilistic kernel interpretation KI such that KI(Kj) = kj for all 1 ≤ j ≤ r.
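The world-summation reading of a kernel interpretation can be sketched for Example 3 as follows; the encoding of worlds as sets of true atoms and the formula tuples are our own conventions:

```python
# Computing the probability of a basic formula under the kernel
# interpretation KI of Example 3: sum the probabilities of the worlds
# (sets of true atoms) that classically satisfy the formula.

KI = {frozenset(): 0.4,
      frozenset({"p"}): 0.25,
      frozenset({"q"}): 0.35,
      frozenset({"p", "q"}): 0.0}

def satisfies(world, formula):
    if isinstance(formula, str):                      # an atom
        return formula in world
    op, *parts = formula
    test = all if op == "and" else any
    return test(satisfies(world, part) for part in parts)

def prob(formula):
    return sum(k for world, k in KI.items() if satisfies(world, formula))

assert abs(sum(KI.values()) - 1.0) < 1e-9             # kernel condition
assert abs(prob("p") - 0.25) < 1e-9                   # worlds {p}, {p,q}
assert abs(prob(("or", "p", "q")) - 0.6) < 1e-9       # 0.25 + 0.35 + 0.0
```

Note that, unlike the interval bounds computed by ⊗ and ⊕, a single kernel interpretation assigns a point probability to every formula; the intervals arise because many kernel interpretations may be consistent with a given formula function.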

Definition 15 1) Let h be a formula function, and let LP(h) denote the linear program:
∀Fi ∈ bf(BL), αi ≤ (Σ_{Kj |= Fi and Kj ∈ 2^BL} kj) ≤ βi, where h(Fi) = [αi, βi],
Σ_{Kj ∈ 2^BL} kj = 1, and
∀Kj ∈ 2^BL, kj ≥ 0.
Let KI(h) be the solution set of the above linear program.
2) Let f be an atomic function. Then let LP(f) denote the linear program similar to the one above, except that there is one constraint based on the range assigned by f for each ground atom, instead of for each ground basic formula. Similarly, let KI(f) be the solution set of LP(f). □

Consider the atomic function f associated, as per De nition 8, with a given formula function h. As de ned in De nition 8, for all A 2 BL; f (A) = h(A). In the following, we show that the family of kernel interpretations for a formula function h is the same as the family for the atomic function f associated with h, that is KI (h) = KI (f ). The reason is that the constraints in LP (h) for non-atomic formulas are redundant and can therefore be discarded without altering the solution set to the linear program. These constraints are redundant because, as we recall from the previous section, probability ranges are propagated to non-atomic formulas through repeated applications of and  on the ranges assigned to atoms. Hence, in nding solutions to the entire set of constraints, it suces to consider constraints for the atoms only. In particular, Lemma 5 below states that in the presence of the constraints for C1 ; C2 2 conj (BL), the constraint for C1 ^ C2 can be ignored.

Lemma 5 Let be a subset of LP (h) for some formula function h (i.e. a set of linear

constraints de ning a linear program). Further suppose that contains at least the following constraints: 0 1

X

i) R  1  B @

kj C A  1 (i.e. the constraint for h(C1) for some C1 2 conj (BL)),

0Kj j C1 and Kj 2 BL 1 X kj C ii) S   B A  (i.e. the constraint for h(C ) for some C 2 conj (BL)), @ Kj j C2 and Kj 2 BL 0 1 X kj C iii) T  maxf0; + ? 1g  B A  minf ; g (i.e. the constraint @ =

2

=

2

2

2

1

2

for h(C1 ^X C2 )), kj = 1, and iv) I 

2

2

1

Kj j=C1 ^C2 and Kj 22BL

2

Kj 22BL  8Kj 2 2BL ; kj

 0. v) M Then the solution set for ( ? fT g) is the same as that for .

Proof i) Claim:- The solution set for is contained in the solution set for ( ? fT g).

For every solution k in the solution set for , it is obvious that k is also a solution to ( ?fT g). ii) Claim:- The solution set for ( ? fT g0) is contained in the solution 1 set for 0 .

X

Let k be a solution to ( ?fT g). Let i1 = B @

0 i =B @ 3

B Kj j C0 1 ^C2 and Kj 2 L 1 X kj C A ; and i = B@ =

X Kj j=:C1 ^C2 and Kj 22BL

kj C A ; i2 = B@

2

4

Kj j=:C1 ^:C2 and Kj 22BL

R; S; T and I can be rewritten as: i) R  1  (i1 + i2)  1, ii) S  2  (i1 + i3 )  2, iii) T  maxf0; 1 + 2 ? 1g  i1  minf 1; 2g, and 12

X

1 Kj j C1 ^:C2 and Kj 2 BL kj C A. Then constraints =

2

1 kj C A;

iv) I 

X 4

j =1

ij = 1.

Now let N be the constraint: N  for j = 1; : : :; 4; ij  0. Note that whenever constraint M is satis ed, constraint N is satis ed. Since constraint M 2 , k is also a solution to ( ? fT g [ fN g). Now rewrite ( ? fT g [ fN g)  (fR; S; I; N g) [ 1 for some set 1 of constraints. But recall that the set fR; S; I; N g is the same linear program used in Theorem 1. Then by Theorem 1, minQ i1 and maxQ i1 are maxf0; 1 + 2 ? 1g and minf 1; 2g respectively, i.e. maxf0; 1 + 2 ? 1g  i1  minf 1; 2g. In other words, every solution of fR; S; I; N g satis es constraint T automatically, and therefore every solution of fR; S; I; N g is a solution of fR; S; I; N; T g. Since k is a solution to ( ? fT g [ fN g), k satis es both 1 and fR; S; I; N g. Thus, k satis es both 1 and fR; S; I; N; T g. But

1 [ (fR; S; I; N; T g)  ( ? fT g [ fN g) [ fT g  [ fN g. Therefore, k is also a solution to ( [fN g). However as argued above, since constraint M 2 , it follows that the solution set for ( [fN g) is the same as that for . Therefore, k is in the solution set for . This completes the proof of the lemma. 2 A similar lemma exists for disjunctions, that is, given the constraints for D1; D2 2 disj (BL), the constraint for D1 _ D2 can be discarded. The proof can be easily obtained by modifying the above proof to consider 1 T to be: 0 the redundant constraint

T  maxf 1; 2g  B @

X

Kj j=D1 _D2 and Kj 22BL

kj C A  minf1; 1 + 2g,

where 1 ; 2; 1 and 2 are the lower bounds and upper bounds of the constraints for D1 and D2 respectively. By repeated applications of the above lemmas, we show in the following theorem that all constraints for non-atomic formulas are redundant and can therefore be ignored in considering the family of kernel interpretations that satisfy a formula function.
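The redundancy argument can be checked numerically. The sketch below is not part of the paper's formalism; the bounds α_1 = 0.2, β_1 = 0.6, α_2 = 0.5, β_2 = 0.9 are hypothetical example values. It enumerates grid distributions (i_1, ..., i_4) satisfying R, S, I and N and confirms that constraint T never cuts off any of them:

```python
from itertools import product

# alpha/beta bounds for C1 and C2 (hypothetical example values)
A1, B1 = 0.2, 0.6
A2, B2 = 0.5, 0.9

def t_holds(i1):
    # constraint T: max{0, a1+a2-1} <= i1 <= min{b1, b2}
    return max(0.0, A1 + A2 - 1) - 1e-9 <= i1 <= min(B1, B2) + 1e-9

def check_redundancy(steps=20):
    # enumerate i_j = n_j / steps with n_1+...+n_4 = steps (constraints I, N)
    for n1, n2, n3 in product(range(steps + 1), repeat=3):
        n4 = steps - n1 - n2 - n3
        if n4 < 0:
            continue
        i1, i2, i3 = n1 / steps, n2 / steps, n3 / steps
        r = A1 <= i1 + i2 <= B1           # constraint R for C1
        s = A2 <= i1 + i3 <= B2           # constraint S for C2
        if r and s and not t_holds(i1):   # T should follow from R, S, I, N
            return False
    return True

print(check_redundancy())  # True: T never excludes a solution of {R, S, I, N}
```

The grid is coarse, but the inequality i_1 ≥ (i_1 + i_2) + (i_1 + i_3) − 1 behind the lower bound holds exactly, so no grid point can violate T.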

Theorem 3 Given a formula function h, KI(h) = KI(f), where f is the atomic function associated with h.

Proof Consider the linear program LP(h). Pick any conjunction C_1 ∧ C_2 ∈ conj(B_L) of the maximal rank. This is always possible as B_L is finite (though there may be several different choices for picking C_1 ∧ C_2). Let the constraint for C_1 ∧ C_2 in LP(h) be T, the one for C_1 be R, and the one for C_2 be S. Since LP(h) contains constraints I and M as defined in Lemma 5, the lemma guarantees that the solution set for (LP(h) − {T}) is the same as the one for LP(h), i.e. KI(h). Note that (LP(h) − {T}) still contains constraints I and M. Thus among those conjunctions with rank ≥ 2 whose constraints have not been deleted, the lemma can be applied repeatedly to a conjunction of the maximal rank. This iterative process stops when the remaining linear program only contains constraints for atoms, constraints for disjunctions, and constraints I and M; denote it by (LP(h) − Π_1) for some set Π_1 of constraints. However, each application of the lemma guarantees that the solution sets before and after a deletion are identical. Therefore, KI(h) is identical to the solution set of (LP(h) − Π_1). Now constraints for disjunctions with rank ≥ 2 are similarly deleted, based on repeated applications of the lemma for disjunctions. This iterative process stops when the remaining linear program only contains constraints for atoms and constraints I and M; this is exactly LP(f)! Again, each application of the lemma guarantees that the solution sets before and after a deletion are identical. Therefore, KI(f), the solution set of LP(f), is identical to the solution set of (LP(h) − Π_1), which in turn is identical to KI(h). □

Theorem 3 demonstrates that from now on, we only need to consider linear programs associated with atomic functions rather than those associated with formula functions.

Example 4 Suppose B_L = {A, B, C}. Then there are eight different Herbrand interpretations K_1 to K_8, as summarized by the truth table below in the usual way:

        A  B  C
K_1     1  1  1
K_2     1  1  0
K_3     1  0  1
K_4     1  0  0
K_5     0  1  1
K_6     0  1  0
K_7     0  0  1
K_8     0  0  0

Thus, K_1 represents the Herbrand interpretation containing A, B and C, and so on. Suppose for some formula function h, h(A) = [α_1, β_1], h(B) = [α_2, β_2], and h(C) = [α_3, β_3]. Then since A is true in the classical 2-valued sense in K_1, K_2, K_3 and K_4, the constraint for A is:

α_1 ≤ k_1 + k_2 + k_3 + k_4 ≤ β_1.

(As usual, k_i denotes the probability assigned to the Herbrand interpretation K_i.) Similarly, the constraints for B and C are as follows:

α_2 ≤ k_1 + k_2 + k_5 + k_6 ≤ β_2,
α_3 ≤ k_1 + k_3 + k_5 + k_7 ≤ β_3.

For the given Herbrand base B_L, the set of all basic formulas bf(B_L) = {A, B, C, A ∧ B, A ∧ C, B ∧ C, A ∧ B ∧ C, A ∨ B, A ∨ C, B ∨ C, A ∨ B ∨ C}. According to Definition 8, the ranges for conjunctions are computed using the operator ⊗. Hence, since A ∧ B is true in K_1 and K_2, the constraint for A ∧ B is:

max{0, α_1 + α_2 − 1} ≤ k_1 + k_2 ≤ min{β_1, β_2}.

Similarly, the constraints for A ∧ C, B ∧ C and A ∧ B ∧ C are respectively:

max{0, α_1 + α_3 − 1} ≤ k_1 + k_3 ≤ min{β_1, β_3},
max{0, α_2 + α_3 − 1} ≤ k_1 + k_5 ≤ min{β_2, β_3}, and
max{0, α_1 + α_2 + α_3 − 2} ≤ k_1 ≤ min{β_1, β_2, β_3}.

As for disjunctions, again by Definition 8, the ranges are computed using the operator ⊕. It can be easily verified that the constraints for A ∨ B, A ∨ C, B ∨ C and A ∨ B ∨ C are respectively:

max{α_1, α_2} ≤ k_1 + ... + k_6 ≤ min{1, β_1 + β_2},
max{α_1, α_3} ≤ k_1 + k_2 + k_3 + k_4 + k_5 + k_7 ≤ min{1, β_1 + β_3},
max{α_2, α_3} ≤ k_1 + k_2 + k_3 + k_5 + k_6 + k_7 ≤ min{1, β_2 + β_3}, and
max{α_1, α_2, α_3} ≤ k_1 + ... + k_7 ≤ min{1, β_1 + β_2 + β_3}.

Now by applying the constraint deletion process described in the proof of Theorem 3, the constraint for A ∧ B ∧ C is deleted first, followed by the constraints for A ∧ B, A ∧ C, B ∧ C, A ∨ B ∨ C, A ∨ B, A ∨ C, and B ∨ C. What remains is the following linear program:

α_1 ≤ k_1 + k_2 + k_3 + k_4 ≤ β_1,
α_2 ≤ k_1 + k_2 + k_5 + k_6 ≤ β_2,
α_3 ≤ k_1 + k_3 + k_5 + k_7 ≤ β_3,
k_1 + ... + k_8 = 1, and k_1, ..., k_8 ≥ 0,

which, according to Theorem 3, has the same solution set as the original LP(h). □
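The deletion process in this example can be confirmed by brute force. In the sketch below (not from the paper; the bounds are hypothetical example values), every grid distribution k_1, ..., k_8 over the eight worlds that satisfies the three atomic constraints is checked against the constraints for A ∧ B and A ∨ B; none is ever excluded by them:

```python
from itertools import product

ALPHA = (0.1, 0.2, 0.3)   # hypothetical alpha_1..alpha_3 for A, B, C
BETA = (0.6, 0.7, 0.8)    # hypothetical beta_1..beta_3
EPS = 1e-9

def redundant(steps=4):
    # enumerate k_j = n_j / steps over the 8-world simplex
    for ns in product(range(steps + 1), repeat=7):
        last = steps - sum(ns)
        if last < 0:
            continue
        k = [n / steps for n in ns] + [last / steps]
        p_a = k[0] + k[1] + k[2] + k[3]   # A true in K1..K4
        p_b = k[0] + k[1] + k[4] + k[5]   # B true in K1, K2, K5, K6
        p_c = k[0] + k[2] + k[4] + k[6]   # C true in K1, K3, K5, K7
        if not all(a - EPS <= p <= b + EPS
                   for p, a, b in zip((p_a, p_b, p_c), ALPHA, BETA)):
            continue                      # atomic constraints not satisfied
        p_ab = k[0] + k[1]                # A ^ B true in K1, K2
        p_aorb = sum(k[:6])               # A v B true in K1..K6
        ok_ab = (max(0, ALPHA[0] + ALPHA[1] - 1) - EPS <= p_ab
                 <= min(BETA[0], BETA[1]) + EPS)
        ok_aorb = (max(ALPHA[0], ALPHA[1]) - EPS <= p_aorb
                   <= min(1, BETA[0] + BETA[1]) + EPS)
        if not (ok_ab and ok_aorb):
            return False
    return True

print(redundant())  # True: the non-atomic constraints are implied
```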

In the following, we characterize those formula functions h (atomic functions) whose family KI (h) of kernel interpretations is non-empty.

Definition 16 A formula function h is fully-defined iff for all F ∈ bf(B_L), ∅ ⊂ h(F) ⊆ [0, 1]. An atomic function f is fully-defined iff for all A ∈ B_L, ∅ ⊂ f(A) ⊆ [0, 1]. □

Intuitively, the assignment of the empty interval to an atom (or basic formula) tells us that there is no way of assigning a probability to that atom or formula. Thus, there seems to be an inconsistency concerning that atom or formula. This is what the definition of "full-definedness" tries to capture.

Lemma 6 Let f be an atomic function, and h be the formula function associated with f. Then: h is fully-defined iff f is fully-defined.

Proof i) Claim: If f is not fully-defined, then h is not fully-defined.
There exists an atom A in B_L such that the condition ∅ ⊂ f(A) ⊆ [0, 1] is violated. But since h(A) = f(A), it is not true that ∅ ⊂ h(A) ⊆ [0, 1]. It follows that h is not fully-defined.
ii) Claim: If f is fully-defined, then h is fully-defined.
Let F be any ground formula in bf(B_L). Proceed by induction on rank(F).
Base case: rank(F) = 1. Then F is a ground atom. Since f is fully-defined and for all F ∈ B_L, h(F) = f(F), it follows immediately that ∅ ⊂ h(F) ⊆ [0, 1].
Inductive case: rank(F) > 1. Then F is either a conjunction or a disjunction.
Case 1: F ≡ C_1 ∧ C_2. From Definition 8, h(F) = h(C_1) ⊗ h(C_2). From the induction hypothesis, ∅ ⊂ h(C_1) and ∅ ⊂ h(C_2). Then it follows immediately from Lemma 1 that ∅ ⊂ h(C_1) ⊗ h(C_2). This completes the proof for case 1.
Case 2: F ≡ D_1 ∨ D_2. The proof is similar to the one for conjunctions in case 1.
This completes the induction and the proof of the lemma. □

We are now in a position to characterize formula functions whose family of kernel interpretations is non-empty. Theorem 4 tells us that if a formula function h is fully-defined, then the set LP(h) of inequalities is guaranteed to possess at least one solution.

Theorem 4 If a formula function h is fully-defined, then KI(h) is non-empty.
Proof From Lemma 6 above and Theorem 3, it suffices to prove that if f, the atomic function associated with h, is fully-defined, then KI(f) is non-empty. Without loss of generality, consider an arbitrary enumeration A_1, ..., A_n of B_L (where |B_L| = n) such that f(A_i) = [α_i, β_i] for 1 ≤ i ≤ n, and such that β_1 ≤ β_2 ≤ ... ≤ β_n. Based on this enumeration, the following table represents an enumeration of 2^{B_L}:

                           A_1  A_2  ...  A_n
K_1                         1    1   ...   1
  ...
K_{2^{n-2}}                 1    1   ...   0
K_{2^{n-2}+1}               1    0   ...   1
  ...
K_{2^{n-1}}                 1    0   ...   0
K_{2^{n-1}+1}               0    1   ...   1
  ...
K_{2^{n-1}+2^{n-2}}         0    1   ...   0
K_{2^{n-1}+2^{n-2}+1}       0    0   ...   1
  ...
K_{2^n}                     0    0   ...   0

For example, K_1 represents the 2-valued interpretation (i.e. world) {A_1, ..., A_n}, K_2 the 2-valued interpretation {A_1, ..., A_{n-1}}, and so forth. Under this enumeration, the system of linear inequalities LP(f) defined in Definition 15 becomes:

α_1 ≤ k_1 + k_2 + ... + k_{2^{n-1}} ≤ β_1,
α_2 ≤ (k_1 + ... + k_{2^{n-2}}) + (k_{2^{n-1}+1} + ... + k_{2^{n-1}+2^{n-2}}) ≤ β_2,
...
α_n ≤ k_1 + k_3 + ... + k_{2^n-1} ≤ β_n, and
Σ_{j=1}^{2^n} k_j = 1.

Then construct a solution for KI(f) by considering each inequality in turn:
i) Take the first inequality and set k_1 = β_1, k_2 = ... = k_{2^{n-1}} = 0. Then k_1 + ... + k_{2^{n-1}} = β_1, and therefore the first inequality is satisfied, as f is fully-defined and α_1 ≤ β_1. In addition, k_1, ..., k_{2^{n-1}} ≥ 0.
ii) Take the second inequality and set k_{2^{n-1}+1} = β_2 − β_1, k_{2^{n-1}+2} = ... = k_{2^{n-1}+2^{n-2}} = 0. Recall that β_1 ≤ β_2 and hence k_{2^{n-1}+1} ≥ 0. Then

Σ_{j=1}^{2^{n-2}} k_j + Σ_{j=2^{n-1}+1}^{2^{n-1}+2^{n-2}} k_j = k_1 + k_{2^{n-1}+1} = β_1 + (β_2 − β_1) = β_2.

Therefore, the second inequality is satisfied, as f is fully-defined and α_2 ≤ β_2. In addition, k_1, ..., k_{2^{n-1}+2^{n-2}} ≥ 0, and k_1 + ... + k_{2^{n-1}+2^{n-2}} = β_2.
iii) Continue this process one inequality at a time. Finally, when the last inequality is considered, set k_{2^n-1} = β_n − β_{n-1}. Then Σ_{j=1}^{2^{n-1}} k_{2j-1} = β_n, satisfying the last inequality as α_n ≤ β_n. In addition, k_1, ..., k_{2^n-1} ≥ 0, and k_1 + ... + k_{2^n-1} = β_n.
iv) Finally, to satisfy the condition that all k_j's add up to 1, set k_{2^n} = 1 − β_n ≥ 0.
Thus whenever f is fully-defined, KI(f) is non-empty. □

Theorem 4 guarantees that the linear program generated by a fully-defined formula function is always solvable. We show below an example to illustrate how the proof of Theorem 4 works.

Example 5 Continue with the situation described in Example 4. Without loss of generality, suppose the enumeration A, B and C corresponds to the one where β_1 ≤ β_2 ≤ β_3. Then according to the construction shown in the above proof,

k_1 = β_1, k_2 = k_3 = k_4 = 0, k_5 = β_2 − β_1, k_6 = 0, k_7 = β_3 − β_2, k_8 = 1 − β_3

is a solution to the linear program:

α_1 ≤ k_1 + k_2 + k_3 + k_4 ≤ β_1,
α_2 ≤ k_1 + k_2 + k_5 + k_6 ≤ β_2,
α_3 ≤ k_1 + k_3 + k_5 + k_7 ≤ β_3,
k_1 + ... + k_8 = 1, and k_1, ..., k_8 ≥ 0. □
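The construction in the proof of Theorem 4 is easy to exercise directly. The sketch below (bounds are hypothetical example values with β_1 ≤ β_2 ≤ β_3) builds the solution of Example 5 and checks it against the reduced linear program:

```python
def construct_solution(beta):
    """Theorem 4's constructive solution for B_L = {A, B, C}:
    k1 = b1, k5 = b2 - b1, k7 = b3 - b2, k8 = 1 - b3, all other k_j = 0."""
    b1, b2, b3 = beta
    k = [0.0] * 8
    k[0] = b1          # k1
    k[4] = b2 - b1     # k5
    k[6] = b3 - b2     # k7
    k[7] = 1 - b3      # k8
    return k

def satisfies(k, alpha, beta):
    eps = 1e-9
    p_a = k[0] + k[1] + k[2] + k[3]   # worlds where A holds
    p_b = k[0] + k[1] + k[4] + k[5]   # worlds where B holds
    p_c = k[0] + k[2] + k[4] + k[6]   # worlds where C holds
    return (all(x >= -eps for x in k)
            and abs(sum(k) - 1) < eps
            and all(a - eps <= p <= b + eps
                    for p, a, b in zip((p_a, p_b, p_c), alpha, beta)))

alpha, beta = (0.1, 0.2, 0.3), (0.4, 0.5, 0.6)   # hypothetical intervals
k = construct_solution(beta)
print(satisfies(k, alpha, beta))  # True
```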

Definition 17 Given a probabilistic kernel interpretation KI, it can be extended to a probabilistic interpretation, which is a mapping I from formulas to [0, 1], in the following way: for all formulas F, I(F) = Σ_{K_j ⊨ F, K_j ∈ 2^{B_L}} k_j. □

Theorem 4 is significant as it guarantees that fully-defined functions can always be extended to probabilistic interpretations that assign probabilistic truth values to formulas. The two lemmas below show that this way of assigning probabilistic truth values satisfies many general properties of probability. Hereafter, whenever no confusion arises, we simply use I to denote a probabilistic interpretation, without referring to the probabilistic kernel interpretation KI from which I is generated.

Lemma 7 (Hailperin) Suppose that KI is a probabilistic kernel interpretation and that I is the probabilistic interpretation associated with KI. Then the following conditions hold:
i) I(φ) = 0, if φ ⇒ A ∧ ¬A for some A,
ii) I(φ) ≤ I(ψ), if φ ⇒ ψ,
iii) I(¬φ) = 1 − I(φ), and
iv) I(φ ∨ ψ) = I(φ) + I(ψ), if φ ∧ ψ ⇒ A ∧ ¬A for some A. □

Fenstad [16] has identified the following requirements for defining a probability function p on a first-order language L:
i) p(φ ∨ ψ) + p(φ ∧ ψ) = p(φ) + p(ψ),
ii) p(¬φ) = 1 − p(φ),
iii) p(φ) = p(ψ), if φ and ψ are logically equivalent in L, and
iv) p(φ) = 1, if φ is provable in L.
It is obvious from the above lemma that probabilistic interpretations satisfy the last three requirements of Fenstad. The following lemma shows that probabilistic interpretations also satisfy the first requirement. The proof is straightforward.

Lemma 8 Let φ and ψ be arbitrary formulas in language L. Suppose that KI is a probabilistic kernel interpretation and that I is the probabilistic interpretation associated with KI. Then I(φ ∨ ψ) + I(φ ∧ ψ) = I(φ) + I(ψ). □

Recall that for every formula function h, there is a corresponding family of kernel interpretations KI(h). And as defined above, for each of these kernel interpretations, there is a corresponding probabilistic interpretation I. Therefore, associated with h is a family of probabilistic interpretations, denoted by I(h).
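The identities in Lemmas 7 and 8 can be checked on any concrete kernel interpretation. The sketch below (the atoms, distribution, and formulas are hypothetical; I(F) is the sum from Definition 17) verifies Fenstad's first requirement and the negation property:

```python
from itertools import combinations

ATOMS = ("a", "b", "c")
WORLDS = [frozenset(s) for r in range(len(ATOMS) + 1)
          for s in combinations(ATOMS, r)]          # the 8 worlds over 3 atoms
K = {w: 1 / len(WORLDS) for w in WORLDS}            # hypothetical uniform kernel

def I(formula):
    """I(F) = sum of k_j over the worlds K_j satisfying F (Definition 17)."""
    return sum(k for w, k in K.items() if formula(w))

phi = lambda w: "a" in w                 # hypothetical formula phi
psi = lambda w: "b" in w or "c" not in w # hypothetical formula psi

# Lemma 8: I(phi v psi) + I(phi ^ psi) = I(phi) + I(psi)
lhs = I(lambda w: phi(w) or psi(w)) + I(lambda w: phi(w) and psi(w))
rhs = I(phi) + I(psi)
print(abs(lhs - rhs) < 1e-9)                                # True
# Lemma 7 iii): I(~phi) = 1 - I(phi)
print(abs(I(lambda w: not phi(w)) - (1 - I(phi))) < 1e-9)   # True
```

Both identities hold for every distribution over worlds, since they are instances of inclusion-exclusion and complementation for the underlying measure.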

Definition 18 Suppose KI is a probabilistic kernel interpretation, and I is the probabilistic interpretation associated with KI. Also let F_0 be a basic formula, F_1, ..., F_n ∈ bf(B_L), and μ_0, μ_1, ..., μ_n ⊆ [0, 1].
i) I ⊨ F_0 : μ_0 iff I(F_0) ∈ μ_0,
ii) I ⊨ (F_1 : μ_1 ∧ ... ∧ F_n : μ_n) iff for all 1 ≤ j ≤ n, I ⊨ F_j : μ_j,
iii) I ⊨ F_0 : μ_0 ← F_1 : μ_1 ∧ ... ∧ F_n : μ_n iff I ⊨ F_0 : μ_0 or I ⊭ (F_1 : μ_1 ∧ ... ∧ F_n : μ_n),
iv) I ⊨ (∃x)(F : μ) iff I ⊨ (F(x/t) : μ) for some ground term t, where F(x/t) denotes the replacement of all free occurrences of x in F by t, and
v) I ⊨ (∀x)(F : μ) iff I ⊨ (F(x/t) : μ) for all ground terms t. □

Note that I ⊨ F : μ defines a satisfaction relation; that is, a probabilistic interpretation I either satisfies F : μ or it does not; I does not try to calculate the probability range for F. As usual, we use the notation ⊨ also to denote logical consequence. We say program P logically entails formula F, denoted P ⊨ F, iff whenever I is a probabilistic interpretation that satisfies each clause in P, then I ⊨ F. We now investigate the relationship between fixpoints and probabilistic models of a p-program. Lemma 9 below is necessary to prove Theorem 5.

Lemma 9 Suppose h is a fully-defined formula function. Then for all F ∈ bf(B_L), h(F) = [α, β] is the smallest interval that contains {I(F) | I ∈ I(h)}, which is the set of probabilistic truth values of F assigned by the family of probabilistic interpretations associated with h.

Proof i) Claim: For all F ∈ bf(B_L), h(F) = [α, β] contains {I(F) | I ∈ I(h)}.
Suppose there exists an I ∈ I(h) such that I(F) ∉ [α, β]. Then I(F) does not satisfy the inequality α ≤ Σ_{K_j ⊨ F, K_j ∈ 2^{B_L}} k_j = I(F) ≤ β as defined in Definition 15. Therefore, I cannot be in I(h), which is a contradiction!
ii) Claim: For all F ∈ bf(B_L), if [δ, ρ] contains {I(F) | I ∈ I(h)}, then h(F) = [α, β] ⊆ [δ, ρ].
Let F be any ground formula in bf(B_L). Proceed by induction on rank(F).
Base Case: rank(F) = 1. Then F ≡ A for some ground atom A. Consider some γ ∈ [α, β]. Recall that KI(h) = KI(f), which is the solution set for LP(f) as defined in Definition 15. Construct a new system Q' from LP(f) by only replacing the constraint α ≤ Σ_{K_j ⊨ A, K_j ∈ 2^{B_L}} k_j ≤ β with γ ≤ Σ_{K_j ⊨ A, K_j ∈ 2^{B_L}} k_j ≤ γ. Recall from Lemma 6 that if h is fully-defined, f is fully-defined. Since ∅ ⊂ [γ, γ] ⊆ [0, 1], by Theorem 4, Q' has a non-empty solution set. That is, there exists a solution KI for Q'. Since α ≤ γ ≤ β, KI ∈ KI(f) = KI(h). Therefore, if I is the probabilistic interpretation corresponding to KI, I ∈ I(h). Thus, I(F) = I(A) = Σ_{K_j ⊨ A, K_j ∈ 2^{B_L}} k_j = γ ∈ [δ, ρ]. This completes the proof that [α, β] ⊆ [δ, ρ] for the base case.
Inductive Case: rank(F) > 1. Then F is either a conjunction or a disjunction.
Case 1: F ≡ C_1 ∧ C_2. By the induction hypothesis, h(C_1) = [α_1, β_1] and h(C_2) = [α_2, β_2] are the smallest intervals that contain {I(C_1) | I ∈ I(h)} and {I(C_2) | I ∈ I(h)} respectively. Thus for all I ∈ I(h), I satisfies the constraints of the linear program listed in Theorem 1, where I(C_1) = k_1 + k_2, I(C_2) = k_1 + k_3 and I(C_1 ∧ C_2) = k_1. Then by Definition 8, h(C_1 ∧ C_2) = [α, β] implies α = min_Q I(C_1 ∧ C_2) and β = max_Q I(C_1 ∧ C_2). Now suppose there exists [δ, ρ] that contains {I(C_1 ∧ C_2) | I ∈ I(h)} such that [δ, ρ] ⊂ [α, β]. But according to the proof of Theorem 1, there actually exist I_1, I_2 ∈ I(h) such that I_1(C_1 ∧ C_2) = α and I_2(C_1 ∧ C_2) = β. Therefore, [δ, ρ] cannot contain both I_1(C_1 ∧ C_2) and I_2(C_1 ∧ C_2), which is a contradiction! Therefore, [α, β] ⊆ [δ, ρ].
Case 2: F ≡ D_1 ∨ D_2. The proof is similar to the one for conjunctions in case 1.
This completes the induction and the proof of the lemma. □

The lemma says that h(F) is the smallest interval that contains the set of all the probabilistic truth values of F assigned by the family of probabilistic interpretations associated with h. It is useful in proving the following theorem, which states that every fully-defined pre-fixpoint of T_P generates a non-empty family of probabilistic models for P.

Theorem 5 Suppose a formula function h is fully-defined. Then I(h) is a family of probabilistic models of P iff T_P(h) ≤ h.
Proof i) Claim: If h is fully-defined, and I(h) is a family of probabilistic models, then T_P(h) ≤ h.
Let F be any ground formula in bf(B_L). Proceed by induction on rank(F).
Base Case: rank(F) = 1. Then F ≡ A for some ground atom A. Consider the atomic function f associated with h. According to Theorem 3, KI(f) = KI(h). Therefore, I(f) is a family of probabilistic models. Recall from Definition 11 that for all A ∈ B_L, T_P(h)(A) = ∩{μ | A : μ ← F_1 : μ_1 ∧ ... ∧ F_n : μ_n is a ground instance of a clause in P and for all i, 1 ≤ i ≤ n, h(F_i) ⊆ μ_i}. Let C be any clause described in the set above. Then, for all I ∈ I(f), I(A) ∈ μ, where μ is the p-annotation in the head of C. Therefore, I(A) ∈ T_P(h)(A). Thus, T_P(h)(A) contains {I(A) | I ∈ I(f)}. But by Lemma 9, f(A) is the smallest interval that contains {I(A) | I ∈ I(f)}. Therefore, h(A) = f(A) ⊆ T_P(h)(A).
Inductive Case: rank(F) > 1. Then F is either a conjunction or a disjunction.
Case 1: F ≡ C_1 ∧ C_2. By the induction hypothesis, h(C_1) ⊆ T_P(h)(C_1) and h(C_2) ⊆ T_P(h)(C_2). But by Lemma 1, h(C_1 ∧ C_2) = h(C_1) ⊗ h(C_2) ⊆ T_P(h)(C_1) ⊗ T_P(h)(C_2) = T_P(h)(C_1 ∧ C_2).
Case 2: F ≡ D_1 ∨ D_2. The proof is similar to the one for conjunctions in case 1. This completes the proof of i).
ii) Claim: If h is fully-defined, and T_P(h) ≤ h, then I(h) is a family of probabilistic models.
For all I ∈ I(h), let A : μ ← F_1 : μ_1 ∧ ... ∧ F_n : μ_n be a ground instance of a clause in P, and suppose I(F_i) ∈ μ_i for all 1 ≤ i ≤ n. But by Lemma 9, h(F_i) is the smallest interval that contains {I(F_i) | I ∈ I(h)}. Therefore, h(F_i) ⊆ μ_i for all 1 ≤ i ≤ n. Recall that T_P(h)(A) = ∩{μ | A : μ ← F_1 : μ_1 ∧ ... ∧ F_n : μ_n is a ground instance of a clause in P and for all i, 1 ≤ i ≤ n, h(F_i) ⊆ μ_i}. Therefore, T_P(h)(A) ⊆ μ. Since T_P(h) ≤ h, h(A) ⊆ T_P(h)(A) ⊆ μ. However, since I ∈ I(h), α ≤ Σ_{K_j ⊨ A, K_j ∈ 2^{B_L}} k_j = I(A) ≤ β, where h(A) = [α, β]. Therefore, I(A) ∈ μ. Thus, I ⊨ A : μ ← F_1 : μ_1 ∧ ... ∧ F_n : μ_n.
Combining the results of i) and ii), the theorem is proved. □

Corollary 1 If h is fully-defined and h is a fixpoint of T_P, then I(h) is a family of probabilistic models of P. □

Lemma 10 If formula functions h_1 and h_2 are fully-defined, then whenever h_1 ≤ h_2, it is necessary that KI(h_2) ⊆ KI(h_1).
Proof Straightforward from Definition 9. □

Corollary 2 Suppose the least fixpoint of T_P is fully-defined. Then since T_P is monotonic, it is the case that: I(lfp(T_P)) = ∪{I(f) | T_P(f) ≤ f}. □

According to the two corollaries above, if the least fixpoint of T_P is fully-defined, then it generates a non-empty family of probabilistic models for P. Moreover, this family contains the family associated with each pre-fixpoint of T_P. As T_P is a monotone operator on a complete lattice, it is guaranteed to possess at least one fixpoint (and hence there is at least one h such that T_P(h) ≤ h). However, it is possible that there is no formula function h that satisfies each of the conditions below:

- T_P(h) ≤ h, and
- h is fully-defined.

For instance, let P be the p-program containing two clauses:

q : [0, 0.2] ←
q : [0.4, 0.5] ←

Here, for all h ∈ FF, T_P(h)(q) = [0, 0.2] ∩ [0.4, 0.5] = ∅, and hence h ≤ T_P(h) for all h ∈ FF. Thus, there is only one atomic function h such that T_P(h) ≤ h, and this is the function, denoted by h_0, which assigns ∅ to q, i.e. h_0(q) = ∅. Clearly, T_P(h_0) = h_0. But h_0 is clearly not fully-defined. For such a program P and such an h_0, I(h_0) = ∅. In the following, we characterize p-programs whose least fixpoints are fully-defined.

Definition 19 A p-program P is inconsistent iff it has no probabilistic model. □

We now relate probabilistic inconsistency with the notion of full-definedness.

Lemma 11 Suppose I is a probabilistic model of the p-program P. Then for all F ∈ bf(B_L) and all n ≥ 0, if T_P ↑ n(F) ⊆ μ, it is necessary that I ⊨ F : μ.
Proof By induction on n. □

The following theorem demonstrates that there is a close relationship between consistency of probabilistic logic programs and the full-definedness of the least fixpoint of the operator associated with the program.

Theorem 6 A p-program P is inconsistent iff lfp(T_P) is not fully-defined.
Proof i) Claim: If lfp(T_P) is fully-defined, then P has a probabilistic model.
By Theorem 4, KI(lfp(T_P)) is non-empty. Then by Corollary 1, I(lfp(T_P)) is a non-empty family of probabilistic models of P.
ii) Claim: If lfp(T_P) is not fully-defined, then P does not have a probabilistic model.
By Lemma 6, if lfp(T_P) is not fully-defined, then the atomic function associated with lfp(T_P) is not fully-defined. That is, there exists A ∈ B_L such that lfp(T_P)(A) = ∅. By Lemma 4, there exists an integer n ≥ 0 such that T_P ↑ (n+1)(A) = ∅. Thus there exist clauses C_1, ..., C_k so that for all 1 ≤ i ≤ k, C_i ≡ A : μ_i ← F^i_1 : μ^i_1 ∧ ... ∧ F^i_{m_i} : μ^i_{m_i}, and T_P ↑ n(F^i_j) ⊆ μ^i_j for all 1 ≤ j ≤ m_i. In addition, ∩_{i=1}^{k} μ_i = ∅. Suppose I is a probabilistic model for P and hence of C_1, ..., C_k. If I ⊨ A : μ_i (i.e. I(A) ∈ μ_i) for all 1 ≤ i ≤ k, then I(A) ∈ ∩_{i=1}^{k} μ_i = ∅, which is impossible! Therefore, there exists an i such that I ⊭ A : μ_i. Since I is a probabilistic model of P, I ⊭ (F^i_1 : μ^i_1 ∧ ... ∧ F^i_{m_i} : μ^i_{m_i}). Then there exists a j, 1 ≤ j ≤ m_i, such that I ⊭ F^i_j : μ^i_j. However, by the above lemma, since T_P ↑ n(F^i_j) ⊆ μ^i_j, I ⊨ F^i_j : μ^i_j, which is a contradiction! Therefore, P cannot have a probabilistic model.
Combining i) and ii), the theorem is proved. □

Example 6 The following p-program is inconsistent, as lfp(T_P)(p) = ∅ and lfp(T_P)(q) = [0.1, 0.6]:

p : [0.3, 0.5] ←
p : [0.1, 0.2] ←
p : [0.2, 0.9] ←
p : [0.3, 0.4] ←
q : [0.1, 0.6] ←
□
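For a program whose clauses all have empty bodies, as in Example 6, computing lfp(T_P) amounts to intersecting the head annotations of each atom's clauses (as in the reconstruction of Definition 11 used in the proof of Theorem 6). The following sketch uses None for the empty interval:

```python
def intersect(iv1, iv2):
    """Intersection of two closed intervals; None denotes the empty interval."""
    if iv1 is None or iv2 is None:
        return None
    lo, hi = max(iv1[0], iv2[0]), min(iv1[1], iv2[1])
    return (lo, hi) if lo <= hi else None

def lfp_for_facts(program):
    """lfp(T_P) for a p-program of unit clauses: intersect head annotations."""
    out = {}
    for atom, interval in program:
        out[atom] = intersect(out.get(atom, (0.0, 1.0)), interval)
    return out

P = [("p", (0.3, 0.5)), ("p", (0.1, 0.2)), ("p", (0.2, 0.9)),
     ("p", (0.3, 0.4)), ("q", (0.1, 0.6))]
print(lfp_for_facts(P))   # {'p': None, 'q': (0.1, 0.6)}
```

The annotation for p collapses to the empty interval (already [0.1, 0.2] ∩ [0.3, 0.4] = ∅), so by Theorem 6 the program is inconsistent, while q keeps [0.1, 0.6].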

Example 6 demonstrates another distinction between the paraconsistent programs of Blair and Subrahmanian [5] and the framework proposed here. Blair and Subrahmanian would consider this program to be one that uses intervals as truth values. However, in the programs of [5, 6], every program was guaranteed to possess a model, and inconsistent theories did not entail all formulas. By contrast, in the probabilistic framework, programs need not possess models, and inconsistent theories entail all formulas. Thus, the system proposed here is not paraconsistent. Solvability of sets of linear equations has been widely studied (cf. [28, 35]). In our work, linear programs are only used as a tool enabling the model-theoretic study of probabilistic logic programming. As such, we do not address deeper issues in linear programming in this paper.

5 Proof Procedure

In this section, we show how we may process queries to p-programs.

Definition 20 θ is a unifier of annotated conjunctions C_1 ≡ (A_1 ∧ ... ∧ A_n) : μ_1 and C_2 ≡ (B_1 ∧ ... ∧ B_m) : μ_2 iff {A_i θ | 1 ≤ i ≤ n} = {B_i θ | 1 ≤ i ≤ m}. Similarly, θ is a unifier of annotated disjunctions D_1 ≡ (A_1 ∨ ... ∨ A_n) : μ_1 and D_2 ≡ (B_1 ∨ ... ∨ B_m) : μ_2 iff {A_i θ | 1 ≤ i ≤ n} = {B_i θ | 1 ≤ i ≤ m}. □

In the following we introduce a notion that is analogous to mgu's in the classical paradigm.

Definition 21 Let UNI(C_1, C_2) be the set of unifiers of C_1 and C_2.
i) Given θ_1, θ_2 ∈ UNI(C_1, C_2), θ_1 ≤ θ_2 iff there exists a substitution γ such that θ_1 = θ_2 γ.
ii) θ_1 ∼ θ_2 iff θ_1 ≤ θ_2 and θ_2 ≤ θ_1. □

Intuitively, θ_1 ≤ θ_2 means that θ_2 is more general than θ_1. Note that ≤ may not be a partial ordering, since θ_1 ≤ θ_2 and θ_2 ≤ θ_1 do not necessarily imply that θ_1 = θ_2. It is however easy to see that ∼ is an equivalence relation.

Definition 22 Given θ ∈ UNI(C_1, C_2), denote the equivalence class of θ by [θ], i.e. [θ] = {θ' | θ' ∼ θ}.
i) [θ_1] ≤ [θ_2] iff there exists γ such that [θ_1] = [θ_2 γ].
ii) [θ_1] < [θ_2] iff [θ_1] ≤ [θ_2] and [θ_1] ≠ [θ_2]. □

Definition 23 θ is a max-gu (maximally general unifier) of C_1 and C_2 iff
i) θ is a unifier, i.e. θ ∈ UNI(C_1, C_2), and
ii) there does not exist θ' ∈ UNI(C_1, C_2) such that [θ] < [θ']. □

We note here that max-gu's are exactly like mgu's in ordinary logic programming except that they are not necessarily unique.

Example 7 Consider the annotated disjunctions below:

(p(X, a) ∨ p(Y, b)) : [1, 1]
(p(Z, Z) ∨ p(c, W)) : [1, 1].

Then the following two substitutions are both max-gu's of the above two annotated disjunctions:

θ_1 = {a/X, a/Z, c/Y, b/W}
θ_2 = {c/X, a/W, b/Y, b/Z}. □

The following lemma guarantees the existence of a max-gu, not necessarily unique, of two basic formulas, if they are unifiable. The proof depends on a result of Martelli and Montanari [38] relating the unification problem to the solvability of a set of equations of terms. (Like the notion of a complete set of mgu's, there is a corresponding notion for max-gu's, which is unique up to the equivalence as defined above.)
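That θ_1 and θ_2 are both unifiers in the sense of Definition 20 is easy to check mechanically. In the sketch below, atoms are represented as tuples and lower-case letters are constants; the representation is an assumption for illustration, not the paper's notation:

```python
def apply(sub, atom):
    """Apply a substitution {var: term} to an atom (pred, arg1, arg2, ...)."""
    pred, *args = atom
    return (pred,) + tuple(sub.get(t, t) for t in args)

# the two annotated disjunctions of Example 7 (annotations omitted)
D1 = [("p", "X", "a"), ("p", "Y", "b")]
D2 = [("p", "Z", "Z"), ("p", "c", "W")]

theta1 = {"X": "a", "Z": "a", "Y": "c", "W": "b"}
theta2 = {"X": "c", "W": "a", "Y": "b", "Z": "b"}

for theta in (theta1, theta2):
    s1 = {apply(theta, a) for a in D1}
    s2 = {apply(theta, a) for a in D2}
    print(s1 == s2)   # True for both: equal as *sets* of atoms
```

Note that the two substitutions yield different matched sets ({p(a,a), p(c,b)} versus {p(c,a), p(b,b)}), which is exactly why neither max-gu is more general than the other.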

Lemma 12 If two basic formulas are unifiable, then there exists a max-gu of the basic formulas.
Proof Here we only show the conjunction case, as the disjunction case is similar. In particular, we show that given C_1 ≡ (A_1 ∧ ... ∧ A_n) : μ_1 and C_2 ≡ (B_1 ∧ ... ∧ B_m) : μ_2, if C_1 and C_2 are unifiable, then there exists a max-gu of C_1 and C_2.
Informally, the proof proceeds as follows. If θ is a unifier of C_1 and C_2, then for each A_i, 1 ≤ i ≤ n, there must be a B_{σ(i)}, 1 ≤ σ(i) ≤ m, such that A_i θ = B_{σ(i)} θ. Conversely, for each 1 ≤ j ≤ m, there exists a δ(j), 1 ≤ δ(j) ≤ n, such that B_j θ = A_{δ(j)} θ. Thus, informally, when θ is applied to C_1 and C_2, there is a suitable "match" between the A_i's and the B_j's. We may observe, informally, that C_1 and C_2 are unifiable iff there is a solvable set of equations of the form:

{Φ_1 = Γ_1, ..., Φ_r = Γ_r}

where each Φ_i ∈ {A_1, ..., A_n} and each Γ_j ∈ {B_1, ..., B_m}, and {Φ_i | 1 ≤ i ≤ r} = {A_1, ..., A_n} and {Γ_j | 1 ≤ j ≤ r} = {B_1, ..., B_m}. (Note here that we may have the same Φ_i occurring more than once on the left side of an equation, and likewise for the Γ_j's, but the same equation cannot be repeated twice.) We now give a formal explication of this strategy.
Formal Proof. An atom A is of the form p(t_1, ..., t_n) for some predicate symbol p and some integer n ≥ 0 such that for all 1 ≤ i ≤ n, t_i is a term defined in the usual way. Call n the arity of A. Then let a_i be the arity of A_i for 1 ≤ i ≤ n, and b_j be the arity of B_j for 1 ≤ j ≤ m. Let A_{ik} be the k-th argument of A_i for 1 ≤ k ≤ a_i, 1 ≤ i ≤ n, and B_{jk} be the k-th argument of B_j for 1 ≤ k ≤ b_j, 1 ≤ j ≤ m. Set up all possible sets of equations such that each set of equations satisfies the following conditions:
1) The left side of each equation is some A_{ik}, 1 ≤ k ≤ a_i, 1 ≤ i ≤ n.
2) The right side of each equation is some B_{jk}, 1 ≤ k ≤ b_j, 1 ≤ j ≤ m.
3) For every equation A_{ik} = B_{jl} for some i, k, j, l, the following conditions are true: 3.1) A_i and B_j are atoms having the same predicate symbol, 3.2) k = l, and 3.3) for all 1 ≤ h ≤ a_i = b_j, there exists the equation A_{ih} = B_{jh}. (In other words, whenever equations are set up between the arguments of two atoms, these atoms must have the same predicate symbol, and every argument of the predicate is equated in order.)
4) For all 1 ≤ k ≤ a_i, 1 ≤ i ≤ n, there exists an equation with A_{ik} on the left side.
5) For all 1 ≤ k ≤ b_j, 1 ≤ j ≤ m, there exists an equation with B_{jk} on the right side.
6) No equation is repeated.
Intuitively, as the A_{ik}'s and the B_{jk}'s are arguments of the atoms of the conjunctions, the equations defined above can be viewed as the constraints which the unification must satisfy. The proof of the lemma then consists of 4 parts.
i) Claim: There are finitely many sets that satisfy the conditions above.
Since all the equations are of the form A_{ik} = B_{jl} for some i, k, j, l such that 1 ≤ i ≤ n, 1 ≤ j ≤ m, 1 ≤ k ≤ a_i, 1 ≤ l ≤ b_j, there are at most nL × mL = nmL² distinct equations, where L = max({a_i | 1 ≤ i ≤ n} ∪ {b_j | 1 ≤ j ≤ m}). Thus there are at most 2^{nmL²} sets that satisfy the conditions above. Now let S_1, ..., S_u be all such sets of equations which have solutions. The following shows that since C_1 and C_2 are unifiable, u ≥ 1.
ii) Claim: Any unifier θ of C_1 and C_2 is a solution to some set of equations defined above.
Regard θ = {v_1/t_1, ..., v_h/t_h}, where the v_i's are variables and the t_i's are terms, for some h. Rewrite each element of θ in equation form, that is, either as v_i = t_i or t_i = v_i (1 ≤ i ≤ h) according to conditions 1 and 2. Consider those A_{ik}'s (1 ≤ k ≤ a_i, 1 ≤ i ≤ n) and B_{jk}'s (1 ≤ k ≤ b_j, 1 ≤ j ≤ m) which do not appear in an equation. They are all ground terms. Since θ is a unifier, each of these ground terms must be matched against itself during unification. Hence, for every A_{ik} (1 ≤ k ≤ a_i, 1 ≤ i ≤ n) and B_{jk} (1 ≤ k ≤ b_j, 1 ≤ j ≤ m) which does not appear in an equation, add the equation A_{ik} = A_{ik} or B_{jk} = B_{jk}. Thus the set of equations constructed in this way satisfies conditions 4 and 5. In addition, since θ is a unifier such that {A_i θ | 1 ≤ i ≤ n} = {B_j θ | 1 ≤ j ≤ m}, the equations constructed satisfy condition 3. Hence this set of equations is the same as S_i for some 1 ≤ i ≤ u. And obviously θ is a solution of this set of equations.
iii) Claim: For all 1 ≤ i ≤ u, a solution to S_i is a unifier.
Given a solution, for every equation where either the left side or the right side is a variable, i.e. v = t or t = v, include v/t in θ. Now for any A_i θ (1 ≤ i ≤ n), according to conditions 3 and 4, there exists some 1 ≤ j ≤ m such that A_i θ = B_j θ, as θ represents a solution to the equations. Hence {A_i θ | 1 ≤ i ≤ n} ⊆ {B_j θ | 1 ≤ j ≤ m}. Similarly, for every B_j θ (1 ≤ j ≤ m), according to conditions 3 and 5, there exists some 1 ≤ i ≤ n such that A_i θ = B_j θ. Hence {A_i θ | 1 ≤ i ≤ n} ⊇ {B_j θ | 1 ≤ j ≤ m}. Therefore, {A_i θ | 1 ≤ i ≤ n} = {B_j θ | 1 ≤ j ≤ m}. Thus θ is a unifier.
iv) Claim: There exists a max-gu of C_1 and C_2.
Recall that S_1, ..., S_u are the sets of equations that have solutions. From ii) and iii), it is proved that solutions to S_1, ..., S_u correspond to all the unifiers of C_1 and C_2. Now for 1 ≤ i ≤ u, as S_i is solvable, it follows by the result of Martelli and Montanari [38] that each S_i has an mgu θ_i. Moreover, θ_i is unique in the sense that if θ'_i is also an mgu of S_i, then θ_i ∼ θ'_i, i.e. [θ_i] = [θ'_i]. As we need to consider only finitely many such θ_i's, it follows that {[θ_1], ..., [θ_u]} contains a maximal element (again not necessarily unique) wrt the ≤ ordering. This maximal element is a max-gu of C_1 and C_2. □

The proof of the above lemma yields an algorithm to compute max-gu's (or determine their non-existence). We believe that more efficient algorithms exist, but the study of such algorithms is beyond the scope of this paper. Once again, the main reason that we distinguish between max-gu's and ordinary mgu's is that the former are not always unique. In the remainder of this section, we present a proof procedure for p-programs. As this procedure operates on the compiled version of p-programs, we first formalize the compilation process.

Definition 24 Given a p-program P, define REDUN(P) = P ∪ {A : [0, 1] ← | A ∈ B_L}. □

Definition 25 1) Given a pair of distinct clauses C_1, C_2 of the form C_1 ≡ A_1 : μ_1 ← Body_1 and C_2 ≡ A_2 : μ_2 ← Body_2 such that A_1, A_2 are unifiable via max-gu θ, let the clause R_{C_1,C_2} be the clause A_1 θ : (μ_1 ∩ μ_2) ← (Body_1 ∧ Body_2) θ.
2) The closure of a p-program P, denoted by CL(P), is the p-program constructed by repeatedly adding to P all the clauses R_{C_1,C_2} obtained from distinct p-clauses C_1 and C_2 in P whose heads are unifiable. □

Note that in generating the closure of a p-program, unlike the treatment in [5, 6], it is sufficient to consider only pairs of distinct clauses in P, instead of triplets, quadruplets and so on. Suppose there is a clause R_{C_1,...,C_n} that is generated from clauses C_1, ..., C_n in the way defined above, where n ≥ 2. But observe that for n ≥ 2 and μ_i = [α_i, β_i] for all 1 ≤ i ≤ n,

μ_1 ∩ ... ∩ μ_n = [α_1, β_1] ∩ ... ∩ [α_n, β_n] = [max_{i=1}^{n} α_i, min_{j=1}^{n} β_j] = μ_{i_0} ∩ μ_{j_0},

for some i_0, j_0 where 1 ≤ i_0, j_0 ≤ n. Thus, the clause R_{C_{i_0},C_{j_0}} can replace R_{C_1,...,C_n}.
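The observation that pairwise closure suffices can be seen concretely: the intersection of any number of closed intervals equals the intersection of just two of them, one attaining the maximal lower bound and one the minimal upper bound. A sketch with hypothetical annotations:

```python
def intersect_all(intervals):
    """Intersection of closed intervals [a, b]: [max of lows, min of highs]."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return (lo, hi)

mus = [(0.1, 0.9), (0.3, 0.8), (0.2, 0.6), (0.25, 0.95)]  # hypothetical mu_i
full = intersect_all(mus)
i0 = max(mus, key=lambda iv: iv[0])   # interval attaining the maximal alpha_i
j0 = min(mus, key=lambda iv: iv[1])   # interval attaining the minimal beta_j
print(full == intersect_all([i0, j0]))  # True
```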
Definition 26 Given a p-program P whose clauses are all standardized apart (cf. Lloyd [36]), let m be the number of clauses in P. Then the normal form of P, denoted by NF(P), is defined as follows:
i) CF_1(P) = DF_1(P) = CL(REDUN(P));
ii) for all 2 ≤ i ≤ m,
CF_i(P) = {(A_1 ∧ ... ∧ A_i) : μ ← Body_1 ∧ ... ∧ Body_i | for all 1 ≤ j ≤ i, A_j : μ_j ← Body_j ∈ CL(REDUN(P)), μ = μ_1 ⊗ ... ⊗ μ_i, and for all 1 ≤ k, l ≤ i, k ≠ l ⇒ A_k ≠ A_l};
DF_i(P) = {(A_1 ∨ ... ∨ A_i) : μ ← Body_1 ∧ ... ∧ Body_i | for all 1 ≤ j ≤ i, A_j : μ_j ← Body_j ∈ CL(REDUN(P)), μ = μ_1 ⊕ ... ⊕ μ_i, and for all 1 ≤ k, l ≤ i, k ≠ l ⇒ A_k ≠ A_l};
iii) NF(P) = ∪_{i=1}^{m} (CF_i(P) ∪ DF_i(P)). □

Example 8 Let P = {B1 : μ1 ← Body1, B2 : μ2 ← Body2, B2 : μ3 ← Body3}. Then:
1) REDUN(P) = P ∪ {B1 : [0,1] ←, B2 : [0,1] ←}.
2) CL(REDUN(P)) = REDUN(P) ∪ {B2 : (μ2 ∩ μ3) ← Body2 ∧ Body3, B1 : (μ1 ∩ [0,1]) ← Body1, B2 : (μ2 ∩ [0,1]) ← Body2, B2 : (μ3 ∩ [0,1]) ← Body3}
= REDUN(P) ∪ {B2 : (μ2 ∩ μ3) ← Body2 ∧ Body3}.
3) CF2(P) = {(B1 ∧ B2) : ([0,1] ⊗ [0,1]) ←,
(B1 ∧ B2) : (μ1 ⊗ [0,1]) ← Body1,
(B1 ∧ B2) : ([0,1] ⊗ μ2) ← Body2,
(B1 ∧ B2) : ([0,1] ⊗ μ3) ← Body3,
(B1 ∧ B2) : ((μ2 ∩ μ3) ⊗ [0,1]) ← Body2 ∧ Body3,
(B1 ∧ B2) : (μ1 ⊗ μ2) ← Body1 ∧ Body2,
(B1 ∧ B2) : (μ1 ⊗ μ3) ← Body1 ∧ Body3,
(B1 ∧ B2) : (μ1 ⊗ (μ2 ∩ μ3)) ← Body1 ∧ Body2 ∧ Body3}.
4) DF2(P) can be obtained in a similar way.
5) Finally, DF3(P) = CF3(P) = ∅, since P contains only two distinct atoms. □
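For ground programs, the construction of REDUN(P) and CL(P) is mechanical. The following sketch replays Example 8 with hypothetical concrete intervals μ1 = [0.3, 0.8], μ2 = [0.5, 1], μ3 = [0.2, 0.7]; the atom and body names, and the clause encoding, are ours:

```python
# ground p-clauses: (head_atom, (lo, hi), body) with bodies as tuples of atoms
# hypothetical instantiation of Example 8's mu1, mu2, mu3
P = [
    ("b1", (0.3, 0.8), ("body1",)),
    ("b2", (0.5, 1.0), ("body2",)),
    ("b2", (0.2, 0.7), ("body3",)),
]

def meet(i1, i2):
    lo, hi = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (lo, hi) if lo <= hi else None

def redun(p):
    """Definition 24: add A : [0,1] <- for every atom appearing as a head."""
    atoms = {head for head, _, _ in p}
    return p + [(a, (0.0, 1.0), ()) for a in sorted(atoms)]

def closure(p):
    """Definition 25 (ground case): intersect annotations of clauses with
    a common head, conjoining their bodies, until a fixpoint is reached."""
    clauses = set(p)
    changed = True
    while changed:
        changed = False
        for c1 in list(clauses):
            for c2 in list(clauses):
                if c1 != c2 and c1[0] == c2[0]:
                    mu = meet(c1[1], c2[1])
                    new = (c1[0], mu, tuple(sorted(set(c1[2] + c2[2]))))
                    if mu is not None and new not in clauses:
                        clauses.add(new)
                        changed = True
    return clauses

cl = closure(redun(P))
# b2's two original clauses combine into b2 : mu2 ∩ mu3 <- body2 ∧ body3
("b2", (0.5, 0.7), ("body2", "body3")) in cl   # True
```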

We now show that the clauses added to P to construct NF(P) are logical consequences of P; hence, they do not change the meaning of P. These clauses are added to ensure that the truth value assigned to any basic formula depends on a single clause, rather than on a group of such clauses.

Lemma 13 For every clause C ∈ NF(P), P ⊨ C.
Proof Case 1: C ∈ P.
If I is a probabilistic model of P, then I must be a probabilistic model of C. Therefore, P ⊨ C.
Case 2: C ∈ REDUN(P) − P.
Then C ≡ A : [0,1] ← for some A ∈ B_L. But for any probabilistic model I of P, I(A) ∈ [0,1]. Thus I ⊨ A : [0,1]. Therefore, P ⊨ C.
Case 3: C ∈ CL(REDUN(P)) − REDUN(P).
Then C ≡ A : (μ1 ∩ μ2) ← (Body1 ∧ Body2). Recall from Definition 25 that C is derived from two clauses in REDUN(P), namely A1 : μ1 ← Body1 and A2 : μ2 ← Body2, such that A1, A2 are unifiable via max-gu θ (so that A = A1θ = A2θ). (We assume that all clauses in CL(REDUN(P)) are standardized apart.) Let Cθ0 ≡ Aθ0 : (μ1 ∩ μ2) ← (Body1 ∧ Body2)θ0 be a ground instance of C. Suppose I is a probabilistic model of P, and therefore of REDUN(P) as shown in Case 2 above, and suppose I ⊨ (Body1 ∧ Body2)θ0. Then I ⊨ Body1θ0 and I ⊨ Body2θ0. Since I is a probabilistic model of REDUN(P), I ⊨ Aθ0 : μ1 and I ⊨ Aθ0 : μ2. Hence it follows that I ⊨ Aθ0 : (μ1 ∩ μ2). Thus, I ⊨ Cθ0 for any substitution θ0. Thus, I ⊨ C.
Case 4: C ∈ ∪_{i=2}^{l} CFi(P), where l is the cardinality of P.
Then for some 2 ≤ k ≤ l, C ≡ (A1 ∧ ... ∧ Ak) : μ ← Body1 ∧ ... ∧ Bodyk such that ∀ 1 ≤ j ≤ k, Aj : μj ← Bodyj ∈ CL(REDUN(P)), μ = μ1 ⊗ ... ⊗ μk, and ∀ 1 ≤ m, n ≤ k, m ≠ n ⇒ Am ≠ An. Let Cθ0 ≡ (A1 ∧ ... ∧ Ak)θ0 : μ ← (Body1 ∧ ... ∧ Bodyk)θ0 be a ground instance of C. Let I be a probabilistic model of P, and therefore of CL(REDUN(P)) as shown in Cases 1, 2 and 3 above. Further suppose that I ⊨ (Body1 ∧ ... ∧ Bodyk)θ0. Then for all 1 ≤ j ≤ k, I ⊨ Bodyjθ0. But since I is a probabilistic model of P and of CL(REDUN(P)), for all 1 ≤ j ≤ k, I ⊨ Ajθ0 : μj, i.e. I(Ajθ0) ∈ μj. But according to Theorem 1, I(A1θ0 ∧ A2θ0) ∈ μ1 ⊗ μ2. By applying the same theorem repeatedly, I((A1 ∧ ... ∧ Ak)θ0) ∈ μ1 ⊗ ... ⊗ μk = μ. Therefore, I ⊨ (A1 ∧ ... ∧ Ak)θ0 : μ. That is, I ⊨ Cθ0 for any substitution θ0. Thus, I ⊨ C.
Case 5: C ∈ ∪_{i=2}^{l} DFi(P), where l is the cardinality of P.
The proof is similar to the one for conjunctions in Case 4.
This completes the proof of the lemma. □
We now present a refutation procedure for query processing.

Definition 27 A query is a formula of the form ∃(F1 : μ1 ∧ ... ∧ Fn : μn) where for all 1 ≤ i ≤ n, Fi is a basic formula, not necessarily ground. □

Definition 28 Suppose C ≡ G0 : Γ0 ← G1 : Γ1 ∧ ... ∧ Gm : Γm is a clause in NF(P), Q ≡ ∃(F1 : μ1 ∧ ... ∧ Fn : μn) is a query, and Q, C are standardized apart. Then

∃((F1 : μ1 ∧ ... ∧ Fi−1 : μi−1 ∧ G1 : Γ1 ∧ ... ∧ Gm : Γm ∧ Fi+1 : μi+1 ∧ ... ∧ Fn : μn)θ)

is an SLDp-resolvent of C and Q on Fi : μi iff i) θ is a max-gu of G0 and Fi, and ii) Γ0 ⊆ μi. If θ is a unifier, but not necessarily a max-gu, then the resolvent is called an unrestricted SLDp-resolvent. □

Definition 29 An SLDp-deduction of the initial query Q1 ≡ ∃(F1 : μ1 ∧ ... ∧ Fn : μn) from a p-program P is a sequence ⟨Q1, C1, θ1⟩, ..., ⟨Qr, Cr, θr⟩, ..., where for all i ≥ 1, Ci is a renamed version of a clause in NF(P) and Qi+1 is an SLDp-resolvent of Qi and Ci via a max-gu θi. If the θi's are not restricted to be max-gu's, then the sequence is called an unrestricted SLDp-deduction. □

Definition 30 An SLDp-refutation of the initial query Q1 ≡ ∃(F1 : μ1 ∧ ... ∧ Fn : μn) from a p-program P is a finite SLDp-deduction ⟨Q1, C1, θ1⟩, ..., ⟨Qn, Cn, θn⟩, where the SLDp-resolvent of Qn and Cn via max-gu θn is the empty query. θ1...θn is called the computed answer substitution. If the θi's are not restricted to be max-gu's, then the deduction is called an unrestricted SLDp-refutation. □

We now demonstrate that SLDp-refutation is always sound. Hereafter, given a query Q ≡ ∃(F1 : μ1 ∧ ... ∧ Fn : μn) and a substitution θ, we abuse notation and write ∀(Qθ) to denote ∀(F1θ : μ1 ∧ ... ∧ Fnθ : μn).

Theorem 7 (Soundness of SLDp-refutation) If there exists an SLDp-refutation of the initial query Q ≡ ∃(F1 : μ1 ∧ ... ∧ Fn : μn) from a p-program P, then P ⊨ ∀(Qθ), where θ is the computed answer substitution.

Proof Let the given SLDp-refutation be ⟨Q1, C1, θ1⟩, ..., ⟨Qn, Cn, θn⟩. Proceed by induction on n, the length of the refutation.
Base case: n = 1.
Then Q1 ≡ F1 : μ1 and C1 ≡ G0 : μ0 ← is a renamed version of a clause in NF(P) such that F1θ1 = G0θ1 and μ0 ⊆ μ1. Suppose I is a probabilistic model of P. Then by Lemma 13, I is a probabilistic model of all clauses in NF(P). Therefore, I ⊨ ∀(G0 : μ0). In particular, I ⊨ ∀((G0 : μ0)θ1), that is, I ⊨ ∀((G0θ1) : μ0). Thus, I ⊨ ∀((G0θ1) : μ1) as μ0 ⊆ μ1. That is, I ⊨ ∀((F1θ1) : μ1). Thus, I ⊨ ∀((F1 : μ1)θ1), i.e. I ⊨ ∀(Q1θ1).
Inductive case: n > 1.
Let Q1 ≡ (F1 : μ1 ∧ ... ∧ Fm : μm) and let C1 ≡ G0 : μ0 ← Body1 be a renamed version of a clause in NF(P) such that Fiθ1 = G0θ1 and μ0 ⊆ μi. Now Q2 ≡ (F1 : μ1 ∧ ... ∧ Fi−1 : μi−1 ∧ Body1 ∧ Fi+1 : μi+1 ∧ ... ∧ Fm : μm)θ1. By the induction hypothesis, P ⊨ ∀(Q2θ2...θn). In other words, P ⊨ ∀((F1 : μ1 ∧ ... ∧ Fi−1 : μi−1 ∧ Body1 ∧ Fi+1 : μi+1 ∧ ... ∧ Fm : μm)θ1θ2...θn). Therefore, P ⊨ ∀((F1 : μ1 ∧ ... ∧ Fi−1 : μi−1 ∧ Fi+1 : μi+1 ∧ ... ∧ Fm : μm)θ1...θn), and P ⊨ ∀((Body1)θ1...θn). From the latter, it follows that P ⊨ ∀((G0θ1...θn) : μ0). Therefore, as Fiθ1 = G0θ1, it follows that P ⊨ ∀((Fiθ1...θn) : μ0). Since μ0 ⊆ μi, P ⊨ ∀((Fiθ1...θn) : μi) = ∀((Fi : μi)θ1...θn). Thus, P ⊨ ∀((F1 : μ1 ∧ ... ∧ Fi−1 : μi−1 ∧ Fi : μi ∧ Fi+1 : μi+1 ∧ ... ∧ Fm : μm)θ1...θn). Therefore, P ⊨ ∀(Q1θ1...θn), i.e. P ⊨ ∀(Q1θ). □
Before we proceed to prove the completeness of SLDp-refutation, we prove several lemmas, all of which are used in proving Theorem 8 (the Completeness Theorem).

Lemma 14 (Max-gu Lemma) Suppose query Q1 has an unrestricted SLDp-refutation from P. Then Q1 has an SLDp-refutation from P of the same length, such that if θ1, ..., θn are the unifiers in the unrestricted refutation, and θ1′, ..., θn′ are the max-gu's used in the SLDp-refutation, then θ1...θn = θ1′...θn′γ for some γ.

Proof Similar to the proof for the classical case [36]. □

Lemma 15 (Lifting Lemma) Let Q1 be a query and θ be a substitution. Suppose Q1θ has an SLDp-refutation from P. Then Q1 has an SLDp-refutation from P of the same length. Moreover, if θ1, ..., θn are the max-gu's used in the SLDp-refutation from Q1θ, and θ1′, ..., θn′ are the max-gu's in the SLDp-refutation from Q1, then θθ1...θn = θ1′...θn′γ for some γ.

Proof Similar to the proof for the classical case [36]. □

Lemma 16 below requires that P is a consistent p-program.

Lemma 16 1) Let P be a consistent p-program, and C1, C2 ∈ conj(B_L). Suppose P ⊨ (C1 ∧ C2) : μ. Then P ⊨ C1 : μ1 and P ⊨ C2 : μ2 for some μ1, μ2 such that μ1 ⊗ μ2 ⊆ μ. 2) Similarly, given D1, D2 ∈ disj(B_L), if P ⊨ (D1 ∨ D2) : μ, then P ⊨ D1 : μ1 and P ⊨ D2 : μ2 for some μ1, μ2 such that μ1 ⊕ μ2 ⊆ μ.

Proof Since P is consistent, for every probabilistic model I of P, I(C1 ∧ C2) ∈ μ. Recall from Corollary 2 that the family of probabilistic models corresponding to the least fixpoint contains all probabilistic models of P. Hence, it follows from Lemma 9 that lfp(TP)(C1 ∧ C2) ⊆ μ. But lfp(TP)(C1 ∧ C2) = lfp(TP)(C1) ⊗ lfp(TP)(C2). Let μ1 and μ2 be lfp(TP)(C1) and lfp(TP)(C2) respectively. In other words, μ1 ⊗ μ2 ⊆ μ. Again by Lemma 9, for every probabilistic model I of P, I(C1) ∈ μ1 and I(C2) ∈ μ2. Therefore, P ⊨ C1 : μ1 and P ⊨ C2 : μ2. This completes the proof of 1). The proof of 2) is similar. □

Lemma 17 Let P be a p-program, F be a basic formula (not necessarily ground), and μ ⊆ [0,1]. Then P ⊨ ∃(F : μ) implies P ⊨ Fθ : μ for some ground instance Fθ of F.
Proof Proceed by induction on rank(F).
Base case: rank(F) = 1.
Then F ≡ A for some atom A. Let {A1, ..., Am} be all the ground instances of A, and let lfp(TP) be denoted by h. Further suppose that for all 1 ≤ i ≤ m, h(Ai) = [αi, βi]. Suppose there exists no i such that 1 ≤ i ≤ m and [αi, βi] ⊆ μ. That is, for all 1 ≤ i ≤ m, μ does not contain [αi, βi]. In other words, for all 1 ≤ i ≤ m, there exists γi ∈ [αi, βi] such that γi ∉ μ. Now construct a linear program Q0 from LP(h) by replacing the m linear inequalities of the form

αi ≤ ( Σ_{Kj ⊨ Ai and Kj ∈ 2^{B_L}} kj ) ≤ βi

with

γi ≤ ( Σ_{Kj ⊨ Ai and Kj ∈ 2^{B_L}} kj ) ≤ γi, for 1 ≤ i ≤ m.

Then by Theorem 4, the solution set of Q0 is non-empty. That is, there exists a solution KI of Q0. Since αi ≤ γi ≤ βi for all 1 ≤ i ≤ m, KI ∈ KI(h). (Recall from Definition 15 that KI(h) is the family of kernel interpretations corresponding to h.) Therefore, if J is the probabilistic interpretation constructed from KI, then J ∈ I(h). Thus, J is a probabilistic model of P. But for all 1 ≤ i ≤ m, J(Ai) = γi ∉ μ. Therefore, J ⊭ ∃(A : μ), which is a contradiction! Therefore, there exists an i such that 1 ≤ i ≤ m and [αi, βi] ⊆ μ. Now for any probabilistic model I of P, I ⊨ Ai : [αi, βi]. Therefore, I ⊨ Ai : μ. Thus, it follows that P ⊨ Ai : μ.
Inductive case: rank(F) > 1.
Then F is either a conjunction or a disjunction.
Case 1: F ≡ C ∧ D.
Let {F1, ..., Fk} be all the ground instances of F. For each 1 ≤ i ≤ k, there exists a substitution θi such that Fi ≡ Fθi ≡ Ci ∧ Di. Let h be lfp(TP). Suppose it is not true that there exists i such that 1 ≤ i ≤ k and μi ⊗ ρi ⊆ μ, where for all 1 ≤ i ≤ k, h(Ci) = μi and h(Di) = ρi. In other words, for all 1 ≤ i ≤ k, μi ⊗ ρi is not contained in μ. Let μi ⊗ ρi = [δil, δiu] and (μi ⊗ ρi) ∩ μ = [γil, γiu], for all 1 ≤ i ≤ k. Then, by our assumption, [γil, γiu] cannot contain both δil and δiu at the same time, for all 1 ≤ i ≤ k. Now consider the following set of linear inequalities Q on any probabilistic model I:
for all 1 ≤ i ≤ k, li ≤ I(Ci) ≤ ui, where μi = [li, ui];
for all 1 ≤ i ≤ k, li′ ≤ I(Di) ≤ ui′, where ρi = [li′, ui′];
for all 1 ≤ i ≤ k, δil ≤ I(Ci ∧ Di) ≤ δiu.
Consider the new set of linear inequalities Q0 constructed from Q by replacing the ranges μi ⊗ ρi by (μi ⊗ ρi) ∩ μ, for all 1 ≤ i ≤ k:
for all 1 ≤ i ≤ k, li ≤ I(Ci) ≤ ui;
for all 1 ≤ i ≤ k, li′ ≤ I(Di) ≤ ui′;
for all 1 ≤ i ≤ k, γil ≤ I(Ci ∧ Di) ≤ γiu.
But according to Lemma 9, all ranges computed by h are the tightest possible ranges. This is because, as shown in the proof of Lemma 9, probabilistic interpretations can always be constructed to take on the upper or lower bounds of the ranges computed by h for all basic formulas. Then from the induction hypothesis, there exist probabilistic models that satisfy the following inequalities:
for all 1 ≤ i ≤ k, li ≤ I(Ci) ≤ ui;
for all 1 ≤ i ≤ k, li′ ≤ I(Di) ≤ ui′.
In particular, there are models that take on the upper and lower bounds of the inequalities. Thus, according to Theorem 1, these models satisfy system Q above. However, since for all 1 ≤ i ≤ k, μi ⊗ ρi is not contained in μ, either δil or δiu does not lie in [γil, γiu]. Therefore, there exists a probabilistic model I such that it satisfies system Q above, but for all 1 ≤ i ≤ k, I(Ci ∧ Di) does not lie in μ. Thus, I ⊭ ∃((C ∧ D) : μ), which is a contradiction! Therefore, there exists an i such that 1 ≤ i ≤ k and μi ⊗ ρi ⊆ μ. Now for any probabilistic model I of P, I ⊨ (Ci ∧ Di) : (μi ⊗ ρi). Therefore, I ⊨ (Ci ∧ Di) : μ. Thus, it follows that P ⊨ (C ∧ D)θi : μ.
Case 2: F ≡ C ∨ D.
The proof is similar to the one for conjunctions in Case 1.
This completes the induction and the proof of the lemma. □

Lemma 18 Let P be a probabilistic program and Q ≡ (F1 : μ1 ∧ ... ∧ Fn : μn). If P ⊨ ∃Q, then P ⊨ Qθ for some ground instance Qθ of Q.

Proof Let {Q1, ..., Qm} be all the ground instances of Q. (Recall that our language is function-free, and thus there are only finitely many such ground instances.) For each 1 ≤ j ≤ m, there exists a substitution θj such that Qj ≡ Qθj ≡ F1θj : μ1 ∧ ... ∧ Fnθj : μn. Abbreviate lfp(TP) by h. Let h(Fiθj) = [lij, uij] for all 1 ≤ j ≤ m and 1 ≤ i ≤ n. Assume that there does not exist a k such that 1 ≤ k ≤ m and for all 1 ≤ i ≤ n, [lik, uik] ⊆ μi. In other words, for all 1 ≤ j ≤ m, there exists 1 ≤ i ≤ n such that [lij, uij] is not contained in μi. Let [lij, uij] ∩ μi = [γijl, γiju] for all 1 ≤ j ≤ m and 1 ≤ i ≤ n. Then, by our assumption, for all 1 ≤ j ≤ m, there exists 1 ≤ i ≤ n such that [γijl, γiju] cannot contain both lij and uij. Recall that for any probabilistic model I of P, I satisfies the following set Q of linear inequalities:
for all 1 ≤ i ≤ n, li1 ≤ I(Fiθ1) ≤ ui1;
...
for all 1 ≤ i ≤ n, lim ≤ I(Fiθm) ≤ uim.
Now consider the new set of linear inequalities Q0 constructed from Q by replacing the ranges [lij, uij] with [lij, uij] ∩ μi = [γijl, γiju] for all 1 ≤ j ≤ m and 1 ≤ i ≤ n:
for all 1 ≤ i ≤ n, γi1l ≤ I(Fiθ1) ≤ γi1u;
...
for all 1 ≤ i ≤ n, γiml ≤ I(Fiθm) ≤ γimu.
But according to Lemma 9, all ranges computed by h are the tightest possible ranges. This is because, as shown in the proof of Lemma 9, probabilistic models can always be constructed to take on the upper or lower bounds of the ranges computed by h for all basic formulas. In particular, there exist models I of P such that I(Fiθj) takes on the value lij or uij, for all 1 ≤ j ≤ m and some 1 ≤ i ≤ n dependent on j. Therefore, there exists a model I of P such that I(Fiθj) ∉ μi for all 1 ≤ j ≤ m and some 1 ≤ i ≤ n dependent on j. Thus, for all 1 ≤ j ≤ m, I ⊭ F1θj : μ1 ∧ ... ∧ Fnθj : μn, which contradicts the assumption that P ⊨ ∃Q! Therefore, there exists a k such that 1 ≤ k ≤ m and for all 1 ≤ i ≤ n, [lik, uik] ⊆ μi. Now recall that for any probabilistic model I of P, I ⊨ Fiθk : [lik, uik] for all 1 ≤ i ≤ n. Thus, I ⊨ Fiθk : μi for all 1 ≤ i ≤ n. Therefore, it follows that I ⊨ Qθk where I is any probabilistic model of P. Thus, P ⊨ Qθk. □

Lemmas 17 and 18 may seem false when one considers Fi ∈ disj(B_L). Consider the classical logic sentence p(a) ∨ p(b). This sentence may be represented as the single-clause p-program P:

(p(a) ∨ p(b)) : [1,1] ←.

Note that P does not probabilistically entail the query (∃x)(p(x) : [1,1]). To see this, let KI be the probabilistic kernel interpretation that assigns 1/3 to each of the worlds K1 = {p(a), p(b)}, K2 = {p(a)}, K3 = {p(b)}, and 0 to K4 = ∅. Let I be the probabilistic interpretation associated with KI. Then I is a probabilistic model of P, but I assigns 2/3 to each of p(a), p(b). Thus, I ⊭ (∃x)(p(x) : [1,1]). However, I assigns 1 to (∃x)p(x). This is probabilistically correct because, in general, the probabilistic statement P((∃x)q(x)) = γ is not equivalent to the (metalinguistic) statement (∃x)(P(q(x)) = γ).
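The kernel interpretation used in this counterexample is easy to check numerically. The sketch below enumerates the four worlds, puts mass 1/3 on each non-empty world, and confirms that the disjunction gets probability 1 while each disjunct gets only 2/3 (the world encoding is ours):

```python
from itertools import chain, combinations

atoms = ["p(a)", "p(b)"]
# all subsets of {p(a), p(b)}: the four possible worlds K1..K4
worlds = [frozenset(s)
          for s in chain.from_iterable(combinations(atoms, r) for r in range(3))]
# kernel interpretation: 1/3 on each non-empty world, 0 on the empty world
ki = {w: (1 / 3 if w else 0.0) for w in worlds}

def prob(sat):
    """Probability of a formula, given as a satisfaction test on worlds."""
    return sum(k for w, k in ki.items() if sat(w))

p_disj = prob(lambda w: "p(a)" in w or "p(b)" in w)   # P(p(a) ∨ p(b)) = 1
p_a    = prob(lambda w: "p(a)" in w)                  # P(p(a)) = 2/3
```

So the model satisfies (p(a) ∨ p(b)) : [1,1] without satisfying p(x) : [1,1] for any single x.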

Lemma 19 Let P be a consistent p-program, F be a basic formula, and θ be a substitution. For α ≥ 1, if TP↑α(Fθ) ⊆ μ, then there exists a clause C in NF(P) having an instance of the form G : μ0 ← F1 : μ1 ∧ ... ∧ Fn : μn such that μ0 ⊆ μ, for all 1 ≤ i ≤ n, TP↑(α−1)(Fi) ⊆ μi, and G subsumes Fθ.

Proof Proceed by induction on rank(F).
Base case: rank(F) = 1.
Then F ≡ A for some atom A. Recall from Definition 11 that TP↑α(Aθ) = ∩S, where S = {μ | Aθ : μ ← F1 : μ1 ∧ ... ∧ Fn : μn is a ground instance of a clause in P and ∀i, 1 ≤ i ≤ n, TP↑(α−1)(Fi) ⊆ μi}. If S = ∅, then TP↑α(Aθ) = ⊥ = [0,1]. But recall that the clause A : [0,1] ← is in REDUN(P) and is therefore in NF(P), so this clause satisfies the requirement of the lemma. If there is only one element μ in S, then the clause whose head annotation is μ is in P and therefore in NF(P), and the lemma is proved for this case. If, however, there is more than one μ in S, then there exists a pair μi, μj ∈ S such that μi ∩ μj = ∩S. But then by Definition 25, there exists a clause R_{Ci,Cj} in CL(REDUN(P)), and therefore in NF(P), which is the closure of the two clauses Ci, Cj whose head annotations are μi, μj respectively and whose heads' atomic parts have Aθ as a common instance. This clause satisfies the requirement of the lemma.
Inductive case: rank(F) > 1.
Then F is either a conjunction or a disjunction.
Case 1: F ≡ C1 ∧ C2.
Then μ ⊇ TP↑α(Fθ) = TP↑α(C1θ) ⊗ TP↑α(C2θ) = ν1 ⊗ ν2. By the induction hypothesis, there exists a clause in NF(P) having an instance of the form C1′ : μ0 ← F1 : μ1 ∧ ... ∧ Fn : μn such that μ0 ⊆ ν1, for 1 ≤ i ≤ n, TP↑(α−1)(Fi) ⊆ μi, and C1′ subsumes C1θ. Similarly, there exists a clause in NF(P) having an instance of the form C2′ : ρ0 ← G1 : ρ1 ∧ ... ∧ Gm : ρm such that ρ0 ⊆ ν2, for 1 ≤ i ≤ m, TP↑(α−1)(Gi) ⊆ ρi, and C2′ subsumes C2θ. Then the clause C ≡ (C1′ ∧ C2′) : μ0 ⊗ ρ0 ← F1 : μ1 ∧ ... ∧ Fn : μn ∧ G1 : ρ1 ∧ ... ∧ Gm : ρm is an instance of a clause in NF(P). Moreover, μ0 ⊗ ρ0 ⊆ ν1 ⊗ ν2 ⊆ μ. Thus the clause C satisfies the requirement of the lemma.
Case 2: F ≡ D1 ∨ D2.
The proof is similar to the one for conjunctions in Case 1.
This completes the induction and the proof of the lemma. □
We now demonstrate that SLDp-resolution is complete when we consider consistent p-programs.

Theorem 8 (Completeness of SLDp-refutation) Let P be a consistent p-program and Q be a query. If P ⊨ Q, then there exists an SLDp-refutation of Q from P.
Proof Let Q ≡ ∃(F1 : μ1 ∧ ... ∧ Fm : μm). By Lemma 18, there is a θ such that Qθ is ground and P ⊨ Qθ. Therefore, for all 1 ≤ j ≤ m, P ⊨ Fjθ : μj, where Fjθ is ground.
Case 1: Fj ≡ Aj1 ∧ ... ∧ Ajnj.
By Lemma 16, for all 1 ≤ i ≤ nj, P ⊨ Ajiθ : νi for some νi such that ν1 ⊗ ... ⊗ νnj ⊆ μj. Therefore, lfp(TP)(Ajiθ) ⊆ νi. Thus, there exists an integer αji < ω such that TP↑αji(Ajiθ) ⊆ νi. Now pick αj = max{αji | 1 ≤ i ≤ nj}. Since TP is monotonic, TP↑αj(Ajiθ) ⊆ νi for all 1 ≤ i ≤ nj. Since ⊗ is monotonic, for all 1 ≤ j ≤ m, TP↑αj(Fjθ) = TP↑αj(Aj1θ) ⊗ ... ⊗ TP↑αj(Ajnjθ) ⊆ ν1 ⊗ ... ⊗ νnj ⊆ μj.
Case 2: Fj ≡ Aj1 ∨ ... ∨ Ajnj.
By Lemma 16, for all 1 ≤ i ≤ nj, P ⊨ Ajiθ : νi for some νi such that ν1 ⊕ ... ⊕ νnj ⊆ μj. Therefore, lfp(TP)(Ajiθ) ⊆ νi. Thus, there exists an integer αji < ω such that TP↑αji(Ajiθ) ⊆ νi. Now pick αj = max{αji | 1 ≤ i ≤ nj}. Since TP is monotonic, TP↑αj(Ajiθ) ⊆ νi for all 1 ≤ i ≤ nj. Since ⊕ is monotonic, for all 1 ≤ j ≤ m, TP↑αj(Fjθ) = TP↑αj(Aj1θ) ⊕ ... ⊕ TP↑αj(Ajnjθ) ⊆ ν1 ⊕ ... ⊕ νnj ⊆ μj.
Combining Cases 1 and 2 above, it follows that for all basic formulas F1, ..., Fm, there exists an integer αj such that TP↑αj(Fjθ) ⊆ μj, for 1 ≤ j ≤ m. Now pick α = max{αj | 1 ≤ j ≤ m}. Since TP is monotonic, TP↑α(Fjθ) ⊆ μj for all 1 ≤ j ≤ m. We now proceed by induction on α to prove that there exists an SLDp-refutation of Fjθ for all 1 ≤ j ≤ m.
Base case: α = 1.
By Lemma 19, there exists a clause in NF(P) having a ground instance Cj ≡ Fjθ : μj′ ←, where μj′ ⊆ μj. Then Figure 1 shows an unrestricted refutation of Fjθ. By the Max-gu Lemma, there exists an SLDp-refutation of Fjθ.

Inductive case: α > 1.
Either i) TP↑(α−1)(Fjθ) ⊆ μj, in which case, by the induction hypothesis, there is an SLDp-refutation of Fjθ; or ii) by Lemma 19, there exists a clause in NF(P) having a ground instance Cj ≡ Fjθ : μj′ ← G1 : ρ1 ∧ ... ∧ Gk : ρk such that μj′ ⊆ μj and, for all 1 ≤ i ≤ k, TP↑(α−1)(Gi) ⊆ ρi. By the induction hypothesis, each Gi has an SLDp-refutation Ri. Therefore, the ground query G1 : ρ1 ∧ ... ∧ Gk : ρk has an SLDp-refutation R, as shown in Figure 2. Hence, Figure 3 shows an unrestricted refutation of Fjθ. By the Max-gu Lemma, Fjθ has an SLDp-refutation. This completes the induction on α. Thus for each 1 ≤ j ≤ m, there is an SLDp-refutation of Fjθ. These refutations can be combined into an SLDp-refutation of Qθ, analogous to the refutation shown in Figure 2. Finally, by the Lifting Lemma, there is an SLDp-refutation of Q. □

[Figure 1: Unrestricted refutation of Fjθ (base case). Figure 2: SLDp-refutation of G1 : ρ1 ∧ ... ∧ Gk : ρk. Figure 3: Unrestricted refutation of Fjθ (inductive case). The refutation-tree diagrams are not reproduced here.]
6 Examples and Discussion

We now present a few examples to show how SLDp-refutation works.

Example 9 Let P be the simple p-program containing the clauses:

p : [0.7, 0.7] ← q : [0.3, 0.4]    (1)
q : [0.3, 0.3] ←                   (2)

Consider now the query Q1 ≡ p : [0.7, 0.7]. An SLDp-refutation of this query is shown below:

p : [0.7, 0.7]    Initial Query     (3)
q : [0.3, 0.4]    Resolving 3, 1    (4)
□                 Resolving 4, 2    (5)

Consider the more complex query Q2 ≡ (p ∨ p) : [0.7, 0.8]. Clearly, P ⊨ Q2, and (p ∨ p) : [0.7, 0.8] can resolve with clause (1) in NF(P). Thus the remainder of the refutation proceeds in the same way as the refutation of Q1. □

In everyday reasoning, the occurrence of a single event E1 may be the cause for some action. Likewise, event E2 may be the cause for some other action. However, the simultaneous occurrence of events E1 and E2 may necessitate action that neither event, if occurring individually, would necessitate. For instance, a suspected murderer may be mildly alarmed when questioned by the police (event E1). He may hire an attorney (action A1). When he finds himself being followed by investigators (event E2), he becomes more alarmed and takes action (action A2) to ditch his shadows. Having ditched his shadows, he hears on the radio that his father is being questioned by the police (event E3). He decides to flee to an obscure country (action A3). Clearly, event E1 is routine and should not lead to action A3. Event E3 by itself is also routine and should not necessitate A3. However, events E1, E2 and E3 combined may be the cause for drastic action (e.g. if the father can invalidate the murderer's alibi) and necessitate him to flee. Example 10 below presents another scenario.
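For ground propositional p-programs such as the one in Example 9, the SLDp-refutation search reduces to a tiny backtracking loop: a goal F : μ resolves with a clause whose head matches F and whose head annotation μ0 satisfies μ0 ⊆ μ. A minimal sketch (our own encoding; no max-gu's are needed in the ground case):

```python
def subinterval(inner, outer):
    # mu0 ⊆ mu, the annotation side-condition of SLDp-resolution
    return outer[0] <= inner[0] and inner[1] <= outer[1]

# ground p-program of Example 9: (head, head annotation, body goals)
program = [
    ("p", (0.7, 0.7), [("q", (0.3, 0.4))]),   # clause (1)
    ("q", (0.3, 0.3), []),                    # clause (2)
]

def refute(goals):
    """Depth-first search for an SLDp-refutation of a list of ground goals."""
    if not goals:
        return True                            # the empty query: success
    (atom, mu), rest = goals[0], goals[1:]
    for head, mu0, body in program:
        if head == atom and subinterval(mu0, mu):
            if refute(body + rest):
                return True
    return False

refute([("p", (0.7, 0.7))])   # True: mirrors the refutation (3)-(5) above
```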

Example 10 Viper Labs is a small medical laboratory. It has facilities to conduct three kinds of tests (called t1, t2, t3) for identifying certain poisons secreted in the fangs of Indian vipers. There are three known species of vipers in India, and each secretes a different kind of poison (poisons p1, p2 and p3). Viper poison acts rapidly on the human circulatory system, and if prompt action is not taken, a person bitten by a viper may die. Not much is known about the general properties of such vipers as they are rather elusive creatures. However, based on statistical data derived from previous experience with these creatures (and their hapless victims!), it has been concluded that the figures shown in the table below hold:

t1    t2    t3    |  p1    p2    p3
pos   pos   pos   |  95%   50%   75%
pos   pos   neg   |  80%   35%   15%
pos   neg   pos   |  85%   -     25%
pos   neg   neg   |  20%   -     15%
neg   pos   pos   |  75%   100%  50%
neg   pos   neg   |  -     40%   35%
neg   neg   pos   |  10%   25%   50%
neg   neg   neg   |  -     -     -

The entries in the above table that are not filled in are to be treated as "don't knows". Intuitively, the first row in the above table says that if an individual X who has been bitten by an Indian viper tests positive on all three tests, then there is a 95% chance that he was bitten by a viper secreting poison p1. (We will assume that all the entries in the above table have a margin of error of 5%.) Thus, this really says that the probability that X is affected by poison p1 lies in the 90-100% range. The question is: exactly how do we translate this table into a probabilistic logic program? We show two possibilities below.
Possibility 1. We could translate the first row into the clauses:

p1(X) : [0.9, 1]     ← t1(X, pos) : [1,1] ∧ t2(X, pos) : [1,1] ∧ t3(X, pos) : [1,1]    (1)
p2(X) : [0.45, 0.55] ← t1(X, pos) : [1,1] ∧ t2(X, pos) : [1,1] ∧ t3(X, pos) : [1,1]    (2)
p3(X) : [0.7, 0.8]   ← t1(X, pos) : [1,1] ∧ t2(X, pos) : [1,1] ∧ t3(X, pos) : [1,1]    (3)

Intuitively, the first clause says that if person X tests positive for t1, t2 and t3, then the probability that X is affected by poison p1 lies in the 90-100% range. The second clause says that if person X tests positive for t1, t2 and t3, then the probability that X is affected by poison p2 lies in the 45-55% range (and similarly for the third clause). Likewise, the second row in the table translates into the three clauses:

p1(X) : [0.75, 0.85] ← t1(X, pos) : [1,1] ∧ t2(X, pos) : [1,1] ∧ t3(X, neg) : [1,1]    (4)
p2(X) : [0.3, 0.4]   ← t1(X, pos) : [1,1] ∧ t2(X, pos) : [1,1] ∧ t3(X, neg) : [1,1]    (5)
p3(X) : [0.1, 0.2]   ← t1(X, pos) : [1,1] ∧ t2(X, pos) : [1,1] ∧ t3(X, neg) : [1,1]    (6)

In the same way, we can translate the other rows in the table and thus obtain a p-program. Let us call this program P. Let us assume that P also contains three facts about a person called joe, viz. that joe tested positive on all three tests:

t1(joe, pos) : [1,1] ←    (7)
t2(joe, pos) : [1,1] ←    (8)
t3(joe, pos) : [1,1] ←    (9)

Then it is verifiable that:

P ⊨ p1(joe) : [0.9, 1], P ⊨ p2(joe) : [0.45, 0.55], and P ⊨ p3(joe) : [0.7, 0.8].

A proof of p1(joe) : [0.9, 1] is shown below:

p1(joe) : [0.9, 1]                                               Initial Query      (10)
t1(X, pos) : [1,1] ∧ t2(X, pos) : [1,1] ∧ t3(X, pos) : [1,1]     Resolving 10, 1    (11)
t2(X, pos) : [1,1] ∧ t3(X, pos) : [1,1]                          Resolving 11, 7    (12)
t3(X, pos) : [1,1]                                               Resolving 12, 8    (13)
□                                                                Resolving 13, 9    (14)

Possibility 2. An alternative possibility is to translate the first row into the following three clauses:

p1(X) : [0.9, 1]     ← (t1(X, pos) ∧ t2(X, pos) ∧ t3(X, pos)) : [1,1]    (1)
p2(X) : [0.45, 0.55] ← (t1(X, pos) ∧ t2(X, pos) ∧ t3(X, pos)) : [1,1]    (2)
p3(X) : [0.7, 0.8]   ← (t1(X, pos) ∧ t2(X, pos) ∧ t3(X, pos)) : [1,1]    (3)

Here, the program clause bodies contain annotated conjunctions. Repeating this procedure for every row, we get a program Q which also contains the clause:

(t1(X, pos) ∧ t2(X, pos) ∧ t3(X, pos)) : [1,1] ←    (4)

It is easy to ascertain that NF(Q) contains the clause:

(p1(X) ∨ p3(X)) : [0.9, 1] ← (t1(X, pos) ∧ t2(X, pos) ∧ t3(X, pos)) : [1,1]    (5)

In addition, the following can be proved:

Q ⊨ p1(joe) : [0.9, 1], Q ⊨ p2(joe) : [0.45, 0.55], and Q ⊨ p3(joe) : [0.7, 0.8].

Suppose now that we wish to find suitable medication for Joe; we would like to find the most appropriate medication to give him. Suppose we have three medicines m1, m2, m3. We may have the rules:

give(X, m1, [1]) : [1,1]   ← p1(X) : [0.7, 1]               (6)
give(X, m2, [2]) : [1,1]   ← p2(X) : [0.6, 1]               (7)
give(X, m3, [1]) : [1,1]   ← p1(X) : [0.6, 1]               (8)
give(X, m1, [1,2]) : [1,1] ← (p1(X) ∨ p2(X)) : [0.6, 1]     (9)
give(X, m3, [1,3]) : [1,1] ← (p1(X) ∨ p3(X)) : [0.8, 1]     (10)

The first rule here says that if there is at least a 70% chance that X was bitten by a viper secreting p1, then we should surely give him medicine m1. The third argument of give just tells us which poisons are treated by this medication. On the other hand, the last rule is more complex; it says that if the probability of X being bitten by a viper secreting one of the poisons p1 or p3 is greater than or equal to 80%, then we should give X medicine m3. Let VIPER be the program Q together with the above rules defining give. Then:

VIPER ⊨ give(joe, m1, [1]) : [1,1]
VIPER ⊨ give(joe, m2, [2]) : [0,1]
VIPER ⊨ give(joe, m3, [1,3]) : [1,1]
VIPER ⊨ (p1(X) ∨ p3(X)) : [0.9, 1]

A proof of give(joe, m3, [1,3]) : [1,1] is shown below:

give(joe, m3, [1,3]) : [1,1]                       Initial Query       (11)
(p1(X) ∨ p3(X)) : [0.8, 1]                         Resolving 11, 10    (12)
(t1(X, pos) ∧ t2(X, pos) ∧ t3(X, pos)) : [1,1]     Resolving 12, 5     (13)
□                                                  Resolving 13, 4     (14)

A doctor may wish to use medicine m3 to treat Joe as it treats both p1 and p3. Of course, if Joe was bitten by a snake secreting venom p2, then he is going to die under this treatment. Such decisions are often made in critical situations, especially if medicine m2 is incompatible with medicine m3. □

We now present a few simple examples to demonstrate the specialization of the probabilistic framework developed here to the case of two-valued logic.

Example 11 (Certain Information) Consider the following classical Horn program P′:

p(X) ← q(X)
q(a) ←

A corresponding p-program P is:

p(X) : [1,1] ← q(X) : [1,1]
q(a) : [1,1] ←

It is easy to verify that lfp(TP) = TP↑2, which is the following formula function: TP↑2(p(a)) = [1,1]; TP↑2(q(a)) = [1,1]; TP↑2(p(a) ∧ q(a)) = [1,1]; TP↑2(p(a) ∨ q(a)) = [1,1]. Then LP(lfp(TP)) is the following system of constraints:

1 ≤ (k1 + k2) ≤ 1
1 ≤ (k1 + k3) ≤ 1
1 ≤ k1 ≤ 1
1 ≤ (k1 + k2 + k3) ≤ 1
k1 + k2 + k3 + k4 = 1, and k1, k2, k3, k4 ≥ 0,

where the four "possible worlds" are K1 = {p(a), q(a)}, K2 = {p(a)}, K3 = {q(a)}, and K4 = ∅. But the only solution to LP(lfp(TP)) is k1 = 1, k2 = k3 = k4 = 0, indicating that {p(a), q(a)} is the only "real world", and that there is a unique probabilistic model for P′. Thus, in this case, the probabilistic semantics coincides with the classical logic semantics. □

The following result is easy to prove.

Proposition 1 Suppose P is any classical logic program (cf. Lloyd [36]). Let Q be the p-program obtained by annotating all atoms in P with [1,1]. Let I be any Herbrand interpretation in the sense of Lloyd [36]. Let pr(I) be the atomic function that assigns [1,1] to all A ∈ I and [0,0] to all A ∉ I. Then, for all ground atoms A, A ∈ TP(I) iff TP(pr(I))(A) = [1,1]. (The first occurrence of TP is the operator defined in Lloyd [36].) □

Example 12 Consider the following classical Horn program P′:

p(a) ← q(a)

A corresponding p-program P is:

p(a) : [1,1] ← q(a) : [1,1]

Then lfp(TP) = TP↑0 = ⊥, indicating that nothing can be concluded. The behaviour is the same if q ← p or q ← (not p). □

Example 13 (Inconsistent Information) Consider the following set P′ of clauses:

p(a) ← q(a)
(not p(a)) ← q(a)
q(a) ←

A corresponding p-program P is:

p(a) : [1,1] ← q(a) : [1,1]
p(a) : [0,0] ← q(a) : [1,1]
q(a) : [1,1] ←

Then lfp(TP)(p(a)) = ∅, indicating that program P is inconsistent. (It is easy to see that our notion of inconsistency generalizes the one used in the classical framework.) □

As we have noted earlier, the semantics presented here is not paraconsistent in nature; in particular, inconsistent p-programs entail everything. The least fixed-point of TP, however, may assign ∅ to some basic formulas without assigning ∅ to all basic formulas. Hence, if we consider lfp(TP) as a "reasonable" way to interpret an inconsistent theory, then this semantics is paraconsistent. Whether one chooses to adopt this approach or not depends on the user. Consider the following example adapted from Bacchus [2]:

Example 14 Suppose we know that over 80% of all dogs bark, and that Fido and Benjy are dogs. However, Benjy is unable to bark (his vocal cords were injured at some point). This can be represented as:

bark(X) : [0.8, 1] ← dog(X) : [1,1]
dog(fido) : [1,1] ←
dog(benjy) : [1,1] ←
bark(benjy) : [0,0] ←

Here, lfp(TP) assigns ∅ to bark(benjy) and [0.8, 1] to bark(fido). Thus, as far as Benjy is concerned, this database contains some inconsistency, but the existence of this inconsistency does not affect Fido. □
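The localized nature of the inconsistency can be seen by intersecting, per atom, the annotations of all clauses whose bodies are satisfied, in the spirit of TP (a hypothetical hand-evaluation of Example 14, not an implementation of TP itself):

```python
# annotations contributed to each bark-atom by the fired clauses of Example 14
contributions = {
    "bark(fido)":  [(0.8, 1.0)],               # only the 80% rule fires
    "bark(benjy)": [(0.8, 1.0), (0.0, 0.0)],   # the rule and the [0,0] fact clash
}

def combine(intervals):
    # intersection of all contributed annotations
    lo = max(i[0] for i in intervals)
    hi = min(i[1] for i in intervals)
    return (lo, hi) if lo <= hi else None      # None plays the role of ∅

annotations = {a: combine(ivs) for a, ivs in contributions.items()}
# bark(fido) keeps [0.8, 1]; bark(benjy) collapses to the empty set
```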

In general, our approach to representing probabilistic information takes a subjectivistic view, while Bacchus' [2] approach uses an explicitly empirical representation of probabilities. However, as noted by Bacchus [2, p.19-20], "the possible worlds approach, which expresses a subjective probability, can assign a probability to a closed formula, but is incapable of representing empirical probabilities". On the other hand, Bacchus' system "is incapable of representing a subjective probability assignment to a closed formula" [2, p.20]. In addition, it is not clear how to use Bacchus' system as a basis for logic programming. Note that the program in Example 14 can also be represented in the framework of Blair and Subrahmanian [5] and Kifer and Subrahmanian [32]. According to their semantics, which is paraconsistent, this program has a model. This program also shows that when intervals are considered to be truth values (as is possible in [5, 32]), the resulting semantics is different from the probabilistic semantics. The integration of logic and probability theory has been the subject of numerous studies [7, 9, 24, 37, 42, 13, 14, 41, 45]. Here, we only compare our work with those that are directly related to

our e orts. Nilsson [41] has given an informal operational account for integrating probabilities into logic. His framework lacks a model theory. Fagin, Halpern and Megiddo [15], in a pioneering paper, propose a model-theoretic basis for reasoning about systems of linear inequalities. Their aim is othogonal to ours; they are concerned with satis ability of such systems rather than with the assignment of probabilistic truth values to propositions. They explicitly state [15, p.2] that in their system \All formulas are either true or false. They do not have probabilistic truth values." In this respect, our aims and techniques are di erent from those of Fagin et al [15]. However, there seem to be a number of connections between our work and theirs; in particular, it would be interesting to see if their measure-theoretic approach could be used to develop a foundation for probabilistic logic programming, with inner measures serving as lower probability bounds, and outer measures serving as upper bounds for probabilities. We are currently studying this topic. In a related context, Kyburg [34] has used a complex metalanguage that includes ZF set theory to express statistical information. Our proposal achieves similar goals within a rst order framework. It is not clear how Kyburg's proposal may be used as the basis for a programming language (it was not intended for that). The well-known Dempster Shafer theory of evidence [13, 45] does not seem to t into our framework in any immediate way. There is some controversy, at present, on the epistemological basis for Dempster Shafer theory. For instance, Cheeseman [10] argues that the theory of Dempster Shafer belief functions is ad-hoc and non-probabilistic (cf. also Shafer[45]). However, recent results of Fagin, Halpern and Megiddo [15] indicate a closer connection between Dempster Shafer theory and probability theory. 
We avoid this controversy here, and simply note that a great deal of work remains to be done on a model-theoretic foundation for Dempster-Shafer theory. Fitting [19] observes that developing quantitative logic programming languages based on Dempster-Shafer theory is still an open problem. Bandler and Kohout [4] suggest an interval-valued representation of multivalued logical operations. Their framework is based on fuzzy set theory, and they compute the lower and upper bounds of an interval using min-max and product-sum. Fuzzy set theory [52, 53], which also plays an important role in uncertain reasoning, is well known to possess non-probabilistic features, and hence we do not discuss it in greater detail here. In logic programming, most work on quantitative deduction has focused on non-probabilistic logic programming. We believe one reason for this is that the relationship between logic and probability theory has been elusive. The framework of Blair and Subrahmanian [6] deals with lattice-based logic programming. A similar comment applies to the work of Fitting, who interprets conjunction and disjunction as the GLB and LUB, respectively, in the lattice. Probabilities, however, do not respect this interpretation: the probability of a disjunction may be much greater than the LUB of the probabilities of the individual disjuncts. van Emden [51] develops a quantitative logic programming language in which multiplication is used to assign truth values to conjunctions. Of course, probabilities can be multiplied only if the events are independent, and hence van Emden's framework is also non-probabilistic. Baldwin [3] develops an operational model for evidential logic programming that is based on fuzzy set theory; there is no immediately forthcoming model-theoretic basis for his work.
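A fair coin gives a one-line sanity check (our own toy illustration) of why neither the lattice reading nor the multiplicative reading is probabilistic:

```python
# Fair-coin illustration (ours, not from the paper): with A = "heads"
# and B = "tails", both the lattice LUB reading of disjunction and the
# truth-functional product reading of conjunction disagree with probability.
p_a, p_b = 0.5, 0.5

p_disjunction = 1.0                    # A or B is certain for a fair coin
assert p_disjunction > max(p_a, p_b)   # LUB (0.5) underestimates it

p_conjunction = 0.0                    # A and B cannot happen together
assert p_conjunction != p_a * p_b      # product (0.25) overestimates it
```

The failure of both readings is exactly why the connectives cannot be interpreted truth-functionally once truth values are probabilities.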

Our framework has its limitations. In particular, no provision is made for expressing conditional probabilities. In addition, we assume that programs are function-free. This assumption is necessary because we want the set of inequalities determined by an atomic function to be finite. When function symbols are allowed, each ground atom (of which there may be infinitely many) determines an inequality, and hence we may have an infinite number of such inequalities. Solving an infinite set of inequalities requires different techniques. In a related context, Keisler [26] has shown that even finite logics with σ-additive probability distributions (cf. Halmos [25]) over the domain of an interpretation are not compact. We are currently working on this problem.

7 Conclusions

Thus far, quantitative logic programming languages [51, 18, 19, 5, 48, 49] have been unable to deal with probabilistic information. As probabilistic and statistical information is widely used in everyday decision-making, it is essential that logic programs be able to represent probabilistic information. We have proposed, in this paper, a probabilistic framework for logic programming. We have developed a probabilistic model theory and shown various connections between families of probabilistic models and the fixpoints of an operator associated with the program. Our probabilistic model theory satisfies the four properties that Fenstad [16] states as desiderata for a function to be considered probabilistic. In addition, we have developed a sound and complete proof procedure for such languages. To our knowledge, this is the first probabilistic semantics for quantitative logic programming. In ongoing research, we show how this framework can be used to reason about queueing systems and for developing logic operating systems. The latter is greatly facilitated by the fact that mutual exclusion of events E1, E2 can be expressed as (E1 ∧ E2) : [0, 0] even though the individual probabilities of E1 and E2 may be non-zero. We are also studying support for variables in annotations and for non-monotonic negation.
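The mutual-exclusion observation can be checked against the classical Boole-Frechet bounds on a conjunction. The sketch below is our own two-event simplification of the consistency question, not the paper's general linear-programming machinery:

```python
# Sketch (our simplification): Boole-Frechet bounds on P(E1 and E2),
# showing (E1 ^ E2) : [0, 0] is satisfiable alongside, say,
# E1 : [0.3, 0.3] and E2 : [0.4, 0.4].

def conj_bounds(p1, p2):
    """Tight bounds on P(E1 and E2) given point probabilities p1, p2."""
    return max(0.0, p1 + p2 - 1.0), min(p1, p2)

lo, hi = conj_bounds(0.3, 0.4)
print((lo, hi))   # (0.0, 0.3): 0 lies in the feasible range,
assert lo == 0.0  # so mutual exclusion is consistent here

# By contrast, with p1 + p2 > 1 mutual exclusion would be unsatisfiable:
assert conj_bounds(0.7, 0.8)[0] > 0.0
```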

Acknowledgements. We would like to thank Fahiem Bacchus, Howard Blair, Wiktor Marek, Michael Kifer, and Maarten van Emden for numerous fruitful discussions and constructive comments on the manuscript. VS also thanks the Office of Graduate Studies and Research of the University of Maryland for financial support in the summer of 1990. Raymond thanks Timos Sellis for financial support. This research was partially sponsored by the National Science Foundation under Grant IRI-8719458.

References

[1] E.J. Anderson and P. Nash. (1987) Linear Programming in Infinite-Dimensional Spaces: Theory and Applications, John Wiley & Sons Ltd.

[2] F. Bacchus. (1988) Representing and Reasoning with Probabilistic Knowledge, Research Report CS-88-31, University of Waterloo.
[3] J.F. Baldwin. (1987) Evidential Support Logic Programming, J. of Fuzzy Sets and Systems, 24, pp. 1-26.
[4] W. Bandler and L.J. Kohout. (1984) Unified Theory of Multivalued Logical Operations in the Light of the Checklist Paradigm, Proceedings of IEEE-Trans. SMC.
[5] H. A. Blair and V.S. Subrahmanian. (1987) Paraconsistent Logic Programming, Theoretical Computer Science, 68, pp. 35-54. Prelim. version in: Proc. 7th Conference on Foundations of Software Technology and Theoretical Computer Science, Lecture Notes in Computer Science, Vol. 287, pp. 340-360, Springer Verlag.
[6] H. A. Blair and V. S. Subrahmanian. (1988) Paraconsistent Foundations for Logic Programming, Journal of Non-Classical Logic, 5, 2, pp. 45-73.
[7] G. Boole. (1854) The Laws of Thought, Macmillan, London.
[8] G. Boolos and R. Jeffrey. (1980) Computability and Logic, Cambridge University Press.
[9] R. Carnap. (1962) The Logical Foundations of Probability, 2nd ed., University of Chicago Press.
[10] P. Cheeseman. (1985) In Defense of Probability, in: Proc. IJCAI-85, pp. 1002-1009.
[11] N. C. A. da Costa, J. M. Abe and V. S. Subrahmanian. (1989) Remarks on Annotated Logic, draft manuscript.
[12] N. C. A. da Costa, V.S. Subrahmanian and C. Vago. (1989) The Paraconsistent Logics PT, to appear in: Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, Vol. 37, 1991.
[13] A. P. Dempster. (1968) A Generalization of Bayesian Inference, J. of the Royal Statistical Society, Series B, 30, pp. 205-247.
[14] R. Fagin and J. Halpern. (1988) Uncertainty, Belief and Probability, in: Proc. IJCAI-89, Morgan Kaufmann.
[15] R. Fagin, J. Y. Halpern and N. Megiddo. (1989) A Logic for Reasoning About Probabilities, draft manuscript.
[16] J. E. Fenstad. (1980) The Structure of Probabilities Defined on First-Order Languages, in: Studies in Inductive Logic and Probabilities, Volume 2, ed. R. C. Jeffrey, pp. 251-262, University of California Press.
[17] M.C. Fitting. (1985) A Kripke-Kleene Semantics for Logic Programming, Journal of Logic Programming, 4, pp. 295-312.

[18] M. C. Fitting. (1988) Logic Programming on a Topological Bilattice, Fundamenta Informatica, 11, pp. 209-218.
[19] M. C. Fitting. (1988) Bilattices and the Semantics of Logic Programming, to appear in: Journal of Logic Programming.
[20] M. C. Fitting. (1988) Bilattices and the Theory of Truth, to appear in: Journal of Philosophical Logic.
[21] M. Frechet. (1935) Generalisations du Theoreme des Probabilites Totales, Fund. Math., 25, pp. 379-387.
[22] H. Gaifman. (1964) Concerning Measures in First Order Calculi, Israel J. of Math., 2, pp. 1-17.
[23] B. V. Gnedenko and A. Y. Khinchin. (1962) An Elementary Introduction to the Theory of Probability, Dover Publications.
[24] T. Hailperin. (1984) Probability Logic, Notre Dame J. of Formal Logic, 25, 3, pp. 198-212.
[25] P. Halmos. (1950) Measure Theory, Springer.
[26] H. J. Keisler. (1985) Probability Quantifiers, in: J. Barwise and S. Feferman, eds., "Model Theoretic Logics", Springer.
[27] J. L. Kelley. (1955) General Topology, Springer.
[28] M. H. Karwan, V. Lofti, J. Telgen and S. Zionts. (1983) Redundancy in Mathematical Programming: A State of the Art Survey, Springer.
[29] M. Kifer and A. Li. (1988) On the Semantics of Rule-Based Expert Systems with Uncertainty, 2nd Intl. Conf. on Database Theory, Springer Verlag LNCS 326 (eds. M. Gyssens, J. Paredaens, D. Van Gucht), Bruges, Belgium, pp. 102-117.
[30] M. Kifer and E. Lozinskii. (1989) RI: A Logic for Reasoning with Inconsistency, 4th Symposium on Logic in Computer Science, Asilomar, CA, pp. 253-262.
[31] M. Kifer and E.L. Lozinskii. (1989) A Logic for Reasoning with Inconsistency, submitted to a technical journal.
[32] M. Kifer and V. S. Subrahmanian. (1991) Theory of Generalized Annotated Logic Programming and its Applications, to appear in: Journal of Logic Programming.
[33] A. N. Kolmogorov. (1956) Foundations of the Theory of Probability, Chelsea Publishing Co.
[34] H. Kyburg. (1974) The Logical Foundations of Statistical Inference, D. Reidel.


[35] J.-L. Lassez, T. Huynh and K. McAloon. (1989) Simplification and Elimination of Redundant Linear Arithmetic Constraints, in: Proc. 1989 North American Conf. on Logic Programming (eds. E. Lusk and R. Overbeek), pp. 37-51, MIT Press.
[36] J. W. Lloyd. (1987) Foundations of Logic Programming, Springer.
[37] J. Lukasiewicz. (1970) Logical Foundations of Probability Theory, in: Selected Works of Jan Lukasiewicz, ed. L. Berkowski, pp. 16-43, North Holland.
[38] A. Martelli and U. Montanari. (1982) An Efficient Unification Algorithm, ACM Trans. on Prog. Lang. and Systems, 4, 2, pp. 258-282.
[39] S. Morishita. (1989) A Unified Approach to Semantics of Multi-Valued Logic Programs, Tech. Report RT 5006, IBM Tokyo, April 9th, 1990.
[40] R.T. Ng and V.S. Subrahmanian. (1989) Probabilistic Logic Programming, Tech. Report, University of Maryland.
[41] N. Nilsson. (1986) Probabilistic Logic, AI Journal, 28, pp. 71-87.
[42] C. S. Peirce. (1883) A Theory of Probable Inference, in: "Studies in Logic", pp. 126-181, Little, Brown and Co., Boston.
[43] A. Schrijver. (1986) Theory of Linear and Integer Programming, John Wiley and Sons.
[44] D. S. Scott and P. Krauss. (1966) Assigning Probabilities to Logical Formulas, in: Aspects of Inductive Logic, eds. J. Hintikka and P. Suppes, North Holland.
[45] G. Shafer. (1976) A Mathematical Theory of Evidence, Princeton University Press.
[46] E. Shapiro. (1983) Logic Programs with Uncertainties: A Tool for Implementing Expert Systems, Proc. IJCAI '83, pp. 529-532, William Kaufman.
[47] J. Shoenfield. (1967) Mathematical Logic, Addison-Wesley.
[48] V.S. Subrahmanian. (1987) On the Semantics of Quantitative Logic Programs, Proc. 4th IEEE Symposium on Logic Programming, Computer Society Press, Washington DC, pp. 173-182.
[49] V. S. Subrahmanian. (1988) Mechanical Proof Procedures for Many Valued Lattice Based Logic Programming, to appear in: Journal of Non-Classical Logic.
[50] V. S. Subrahmanian. (1989) Paraconsistent Disjunctive Deductive Databases, to appear in: Theoretical Computer Science.
[51] M.H. van Emden. (1986) Quantitative Deduction and its Fixpoint Theory, Journal of Logic Programming, 4, 1, pp. 37-53.
[52] L. A. Zadeh. (1965) Fuzzy Sets, Information and Control, 8, pp. 338-353.
[53] L. A. Zadeh. (1968) Fuzzy Algorithms, Information and Control, 12, pp. 94-102.