Policy Evaluation for Network Management


Randeep Bhatia, Jorge Lobo, Madhur Kohli
Network Computing Research, Bell Labs
600 Mountain Ave, Murray Hill, NJ 07974

Abstract: Policies are increasingly being used to manage complex communication networks. In this paper we present our work on a “policy server” which is being used to provide centralized administration of packet voice gateways and “soft switches” in next generation circuit and packet telephony networks. The policies running in the policy server are specified using a domain independent policy description language (PDL). This paper is motivated by the problem of evaluating policies specified in PDL. We present an algorithm for evaluating policies and study both its theoretical and empirical behavior. We show that the problem of evaluating policies is intractable in general. However, we note that the hard instances of the policy evaluation problem are quite rare in real world networks. Under some very realistic assumptions we are able to show that our policy evaluation algorithm is efficient and well suited for enforcing policies in complex networks. These results constitute the first attempt to develop a formal framework to study the informal concepts of policy based network management.

I. INTRODUCTION

A network policy defines the behavior of the network under various circumstances. Many of these policies can be formulated as sets of low-level rules that describe how to (re)configure a device or how to manipulate the different network elements under different network conditions [4]. These conditions may include a network element going down or too many calls arriving at a router. The condition or state of the network is conveyed by primitive events, which originate at network elements. For example, when a device is powered up it may generate a PowerUp event. In general, a primitive event conveys the state of the network element that originated it, at the time when the event was generated. State data in an event may include the name of the originating device and the time when the event was generated.

A policy defines the set of actions that must be taken to control the behavior of the network elements when certain primitive events occur. For example, when a router fails, a policy may require that all switches route their calls through a different router.

Consider a simplified example of setting some of the parameters of a modem pool for an Internet Services Provider (ISP). The ISP has two customers, A and B, and gives a different telephone number to each customer: 5599991 to A and 5599992 to B. Both telephone numbers share the same pool of 20 modems. There can be many simultaneous connections from the same customer to the modem pool; however, the server can be configured to limit the maximum number of connections allowed per customer. A policy can describe how these limits change according to the time of the day. For example, during office hours a customer may access 15 of the modems, but only 10 after hours. Let us say we have a clock generating a primitive event, say CoarseTimeEvent. Instances of this event occur four times a day, with an attribute named Time with a value from the set {“morning”, “noon”, “evening”, “midnight”}. The policy rules for the morning changes are:

1) CoarseTimeEvent causes ModemPoolAssigment(5559991, 15) if (CoarseTimeEvent.Time = “morning”).
2) CoarseTimeEvent causes ModemPoolAssigment(5559992, 5) if (CoarseTimeEvent.Time = “morning”).

The action ModemPoolAssigment changes the maximum number of connections that a telephone number (and hence a customer) can make to the modem pool. There will be similar rules for the evening actions.

We have developed a policy based management system that is being used for Operations, Administration, Maintenance and Provisioning (OAM&P) in carrier-grade communication networks. This next generation circuit and packet telephony network comprises “packet voice gateways”, “soft switches”, and “device servers”, among other network entities. An event occurring at a device (due to some conditions being met in the device operations) is received by its device server and streamed out to the “policy server” as a Java object of a predefined Java class. The Java object carries with it the state information about the device when the event occurred. In the policy server, policies are written using the names, attributes and methods of the Java classes for the events.

We have developed a domain-independent policy description language (PDL) for specifying policies. PDL provides a succinct representation of complex policies in a very general way. The policy server registers the events that define the behavior of the policies with the device servers, thus informing the device servers to stream only those events that may be required by some policy. The policy server enforces policies by streaming action objects to the device servers, which results in device specific commands being sent to the underlying devices. At the heart of all this is the policy evaluator, which evaluates the policies to determine the set of actions needed to enforce them, given the behavior of the network as reported by the events. A description of the first version of the system can be found in [1].
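As a minimal illustration of what evaluating the two morning rules amounts to, the following Python sketch pairs a condition on the event attribute with an action tuple. The class names, the `evaluate` helper, and the tuple layout of an action are hypothetical and chosen only for this example; the actual system represents events and actions as Java objects.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class CoarseTimeEvent:
    # Attribute Time from the text: "morning", "noon", "evening" or "midnight".
    time: str


@dataclass(frozen=True)
class Rule:
    # Hypothetical encoding of "event causes action if condition".
    condition: Callable[[CoarseTimeEvent], bool]
    action: Tuple[str, int, int]  # (action name, telephone number, connection limit)


# The two morning rules, with the arguments as they appear in the rules above.
RULES: List[Rule] = [
    Rule(lambda ev: ev.time == "morning", ("ModemPoolAssigment", 5559991, 15)),
    Rule(lambda ev: ev.time == "morning", ("ModemPoolAssigment", 5559992, 5)),
]


def evaluate(event: CoarseTimeEvent) -> List[Tuple[str, int, int]]:
    """Return the actions whose rule condition holds for this event instance."""
    return [rule.action for rule in RULES if rule.condition(event)]
```

Calling `evaluate(CoarseTimeEvent("morning"))` yields both assignments, while an event with any other Time value yields no action under these two rules.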

Policies are becoming central to network management, to the point that industry has already started standardization efforts related to policy description and manipulation [2], [8]. However, these efforts have been quite ad hoc and do not address the problem of evaluating policies per se. In this paper we mainly focus on the problem of evaluating policies specified using PDL. The results constitute, to our knowledge, the first attempt to develop a formal framework to study the informal concepts of policy based network management. We provide a complete characterization of the underlying complexities of evaluating these policies. We show that the evaluation of policies is hard, by showing that even restricted instances of this problem are NP-complete. However, we note that the hard instances of the problem are quite rare in real world networks. Motivated by our observations, we present an algorithm for evaluating policies which works very well in practice, as supported by our empirical results. We also analyze the complexity of the algorithm, fixing various policy and network parameters.

The rest of the paper is organized as follows. In Section II we give a brief description of PDL. Section III presents an efficient algorithm for evaluating policies specified using PDL. In Section IV we study the complexity of evaluating policies specified using PDL. In Section V we present simulation results on the performance of the algorithm.

II. POLICY DESCRIPTION LANGUAGE

A policy description is a succinct representation of a policy. We have developed a domain-independent policy description language that we call PDL. A detailed description of PDL can be found in [6]. In PDL, a policy P is represented as a collection of expressions of two types: (1) policy rule propositions, which are expressions of the form

    event causes action if condition    (1)

and (2) policy defined event propositions, which are expressions of the form

    event triggers pde($m_1 = t_1, \ldots, m_k = t_k$) if condition    (2)

A policy rule reads: If the event occurs under the condition, the action is executed. A policy defined event proposition reads: If the event occurs under the condition, the policy defined event pde is triggered.

The event may be primitive or complex. A primitive event may have attributes, which may be of type integer, float, string, etc., or simple data structures such as stacks or queues. There are standard attributes in every primitive event class, such as the time and location (URL) of the generation of the event. There are two types of primitive events: system defined primitive events and policy defined events. System defined events are generated by the environment, while policy defined primitive events are only generated by the policy according to the policy defined event propositions. In the policy defined event proposition above, the $m_i$'s are the attributes of the policy defined event pde and their values are the $t_i$'s. The $t_i$'s can be attributes from other primitive events that appear in event, constants, or the result of operations applied to the attributes of the other primitive events in the proposition. The conditions are boolean functions over the event attributes, and the actions may take event attributes as arguments and yield commands for controlling devices.

The representation of more interesting policies requires descriptions of more elaborate events. In PDL, primitive events can be composed into complex events using notions from regular expressions.

A. Complex events

Policy decisions are made based on the stream of primitive event instances observed by the policy server running the policy. We will call the streams of event instances event histories. There may be several instances of one or more primitive events occurring at the same time (for example, several calls may start simultaneously). Each set of primitive event instances occurring “simultaneously” in a stream is called an epoch. An event literal is a primitive event symbol e or a primitive event symbol preceded by !. The event literal !e occurs in an epoch if there are no instances of the event e in the epoch. We will use the standard dot “.” notation to refer to the attributes of an event.

Primitive events can be composed into basic events. A basic event is either an expression of the form $e_1 \,\&\, \ldots \,\&\, e_n$, representing the occurrence of instances of $e_1$ through $e_n$ in the current epoch (i.e. the simultaneous occurrence of the n events), where each $e_i$ is an event literal, or $e_1 \mid \ldots \mid e_n$, representing the occurrence of an instance of one of the $e_i$'s in the current epoch. Basic events only refer to instances of events that occur in a single epoch. However, we could have composite events that refer to several epochs simultaneously. For example, the sequence loginFail, loginFail, loginFail may represent the event: “three consecutive attempts to login that result in failure.” We can describe many other situations if we borrow the notion of a sequence of zero or more events from regular expressions. In general, a (complex) event is either a basic event or an expression that can be formed by a finite number of applications of the following rules:

1. If $E_1$ through $E_n$ are events then $E_1, \ldots, E_n$ is a (complex) event representing the sequence: event $E_1$, immediately followed by event $E_2$, ..., immediately followed by event $E_n$.
2. If E is an event then ^E is a (complex) event representing the sequence of zero or more occurrences of the event E.
3. If E is an event then (E) is a (complex) event.

We can interpret the semantics of a complex event by associating a non-deterministic finite automaton (NDFA) with a complex event expression. Arcs of the automaton are labeled with the basic events appearing in the expression, and transitions are made based on the sequence and caret operators. For example, the complex event e1 ^e2, ^e2, e3 & e4 is associated with the automaton:

[Figure: NDFA with arcs labeled e2, e1, e2, and e3 & e4.]
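The definitions of event literals and basic events above can be checked mechanically against a single epoch. The sketch below is only an illustration under simplifying assumptions: an epoch is modeled as a list of primitive event names (attributes are ignored), literals are strings such as `"e1"` or `"!e1"`, and the function names are hypothetical.

```python
from typing import List, Sequence


def literal_occurs(literal: str, epoch: Sequence[str]) -> bool:
    """A literal e occurs if some instance of e is in the epoch;
    !e occurs if the epoch holds no instance of e."""
    if literal.startswith("!"):
        return literal[1:] not in epoch
    return literal in epoch


def basic_event_occurs(op: str, literals: List[str], epoch: Sequence[str]) -> bool:
    """Evaluate e1 & ... & en (op == "&") or e1 | ... | en (op == "|")
    against one epoch."""
    if op == "&":
        return all(literal_occurs(lit, epoch) for lit in literals)
    if op == "|":
        return any(literal_occurs(lit, epoch) for lit in literals)
    raise ValueError(f"unknown basic event operator: {op!r}")
```

For an epoch containing two instances of e3 and one of e4, `basic_event_occurs("&", ["e3", "e4"], ["e3", "e4", "e3"])` holds, whereas the conjunction `e3 & !e4` does not.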

B. Traces of events

Let $H = E_1, \ldots, E_m$ be a sequence of contiguous epochs. A trace of an event E is a sequence $s_{j-1}, E'_j, s_j, E'_{j+1}, \ldots, s_{m-1}, E'_m, s_m$, where $s_{j-1}, s_j, s_{j+1}, \ldots, s_m$ is an accepting path from the initial state $s_{j-1}$ to a final state $s_m$ in the NDFA for E, with $s_m$ being the only final state, and $E'_i$ is a minimal subset of events from $E_i$ that triggers the transition from the state $s_{i-1}$ to the state $s_i$ of the automaton. Note that two different instances of a primitive event in an epoch are considered to be distinct events for set membership. In a given stream of epochs H there is an instance of the complex event E for each distinct trace of E in H. When a trace of E reaches the final state $s_m$ of the NDFA for E, the condition of the proposition is evaluated. If the condition is true and the proposition is a policy rule, the action in the proposition is invoked and the trace is terminated. If the proposition is a policy defined event rule, an instance of the policy defined event is generated and is appended to the incoming epoch.

C. Conditions

A condition is a sequence of predicates of the form $t \,\theta\, t'$, where t and t' can be attributes from primitive events that appear in event, constants, or the result of operations applied to the attributes of the primitive events that appear in event, and $\theta$ is a comparison operator. There is a special class of operators that can be used to form terms, called aggregators. For a given “generic” aggregator Agg, the syntax of the operator is Agg(e.x) or Agg(e). Here e is a primitive or policy defined event that appears in the event part of the proposition, and e.x is an attribute of e. As the name suggests, the aggregator operators are used to aggregate values over multiple epochs. For example, a count aggregator Count(e) can be used to count the number of occurrences of event e over multiple epochs. An aggregator could also add the values of the attribute e.x, get the largest value, etc. These aggregation terms are applied to primitive events that appear under the scope of a caret “^” operator. The rule

    e1, ^e2 causes a if Count(e2) = 20    (3)

will execute a if 20 instances of e2 follow an instance of e1.

D. Notation and Assumptions

In the rest of the paper we will denote the event history H at time t by $H = E_1, \ldots, E_t$. We will call $E_j$ the epoch at time j. Note that epochs are not associated with time; they appear when events arrive at the policy server. Once an epoch is started, it terminates according to a global, application-dependent parameter of the system: length. For example, we can consider all events occurring in the same day as members of the same epoch, or change the scale (length) to minutes or seconds.

III. POLICY EVALUATION ALGORITHM

In this section we present an algorithm for evaluating policies specified using PDL. The algorithm is implemented as the Policy Engine of a Policy Server embedded in the “softswitch”, a next generation switch for circuit and packet telephony networks. It has been used to implement policies for detecting alarm conditions, fail-overs, device configuration and provisioning, service class configuration, congestion control, etc.

The algorithm to evaluate a policy works in real time: the algorithm only gets to see the current epoch. Based on the set of events that happen in the current epoch and the past history, the algorithm evaluates all the actions that must happen in the current epoch, as specified by the policy. We present the algorithm for a single rule of the policy. The only interaction between the rules of a policy happens as a result of policy defined events, which may be shared between rules. The policy defined events that get triggered in an epoch get added to the next epoch. The evaluation of a rule at any epoch t is therefore independent of the policy defined events triggered by other rules in the epoch t. Thus we can treat the evaluation of the rules independently of each other.

High level idea: Let the policy rule be

    E causes A if B    or    E triggers T if B.

The algorithm maintains the set $R(t)$ of all possible distinct “partial traces” for event E at epoch t (recall that the history at epoch t is $E_1, E_2, \ldots, E_t$). A partial trace P of event E at epoch t is defined to be a proper prefix of a trace of event E at some future epoch $t + k$ ($k > 0$). In other words, $P = s_0, E'_j, s_j, E'_{j+1}, s_{j+1}, \ldots, E'_t, s_t$, and there exist epochs $E_{t+1}, \ldots, E_{t+k}$ and states $s_{t+1}, \ldots, s_{t+k}$ of the NDFA for E, with $s_{t+k}$ being a final state, such that P when appended with $E_{t+1}, s_{t+1}, \ldots, E_{t+k}, s_{t+k}$ is a trace of E at epoch $t + k$. We refer to these distinct partial traces as active threads of event E. The set $R(t-1)$ and the epoch $E_t$ are used to compute the set $R(t)$. This computation also yields all the distinct traces of the event E at epoch t. The algorithm evaluates the policy rule in each such trace at epoch t, thus computing all the actions that are to be done and all the policy defined events that are to be triggered at epoch t.

In order to facilitate the computation of the policy rule in a trace, some additional information is maintained for every (partial) trace. This information includes the values of the attributes of the events and the aggregate operators that can be computed for the current history, for the (partial) trace. As a result the algorithm has complete information to compute the policy rule in a given trace.

Intuitively, an active thread (partial trace) represents a partial evaluation of the policy rule in the current epoch that holds the potential of leading to a complete evaluation of the rule in some future epoch, thus becoming a trace of the event. Given an active thread A at epoch $t-1$ and the epoch $E_t$, the algorithm computes all possible ways of extending A to an active thread or a trace for epoch t. All such active threads are added to the set $R(t)$. In addition, the set $R(t)$ gets an active thread which represents a partial evaluation of the policy rule that starts at epoch t.

In the following we will denote an active thread $s_0, E'_j, s_j, \ldots, E'_t, s_t$ by the tuple $A = (A_1, A_2)$, where $A_1 = \{s_0, s_j, \ldots, s_{t-1}, s_t\}$ and $A_2 = (E'_j, E'_{j+1}, \ldots, E'_t)$.

The policy evaluation algorithm PE has two phases: the initialization phase and the real-time evaluation phase. The former involves compiling the policy description to evaluate all the static information. The latter runs indefinitely from the time the algorithm is started. In the real-time phase, the algorithm PE, at epoch t, evaluates the policy rule in the current history H, to determine the actions to be taken or the policy defined events to be triggered at epoch t. We will use a running example of a policy rule to illustrate the algorithm PE:

    ^e1, e1, ^e2, e2 causes A(count(e1[1]), e2[2].x) if B(e1[2].y)

Note that an instance of the complex event E = ^e1, e1, ^e2, e2 consists of at least one e1 followed by at least one e2. The condition B tests the y attribute of the last e1 in the instance of E. If the condition B evaluates to true then it causes action A with two arguments, which depend on the number of e1's and the value of the x attribute of the last e2 in the instance of E.

A. Initialization phase

This phase involves, among other things, the construction of a non-deterministic finite automaton (NDFA) N for the complex event E in the rule.

Example: For our example the NDFA is shown in Fig. 1. It has three states 1, 2, 3. State 1 is the initial state and state 3 is the final state.

[Figure: states 1, 2, 3 with arcs labeled e1, e1, e2, e2.]
Fig. 1. NDFA for the example.
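The NDFA of the running example can be written down as a small transition table. The sketch below assumes the arc layout implied by the paths used in the examples of this section (a loop on state 1 and an arc from 1 to 2, both labeled e1; a loop on state 2 and an arc from 2 to 3, both labeled e2), and it simplifies each epoch to a single primitive event name. The names `TRANSITIONS` and `accepts` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# state -> list of (arc label, target state); assumed layout for ^e1, e1, ^e2, e2
TRANSITIONS: Dict[int, List[Tuple[str, int]]] = {
    1: [("e1", 1), ("e1", 2)],
    2: [("e2", 2), ("e2", 3)],
}
INITIAL_STATE = 1
FINAL_STATE = 3


def accepts(symbols: Sequence[str]) -> bool:
    """Simulate the NDFA on a sequence of primitive event names,
    one per epoch, tracking the set of reachable states."""
    states: Set[int] = {INITIAL_STATE}
    for symbol in symbols:
        states = {
            target
            for state in states
            for label, target in TRANSITIONS.get(state, [])
            if label == symbol
        }
    return FINAL_STATE in states
```

Under these assumptions, `accepts(["e1", "e2"])` and `accepts(["e1", "e1", "e2", "e2"])` hold, while a lone e1 or a sequence that continues past the final e2 is rejected.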

B. Real-time evaluation phase

We describe this phase for a particular epoch t. As described earlier, the algorithm maintains a set of active threads $R(t)$ for event E at epoch t. In addition, for every active thread $A = (A_1, A_2)$ in $R(t)$ the algorithm maintains the partial information (attribute values, aggregated values, etc.) that can be obtained from A and which is required to evaluate the if condition B, the action A, or to construct policy defined events T.

Example: Let $t = 1$ and $E_1 = \{e_1\}$. One of the active threads of E at epoch 1 has $A_1 = \{1, 2\}$ and $A_2 = (\{e_1\})$, corresponding to the only primitive event in the history at $t = 1$. Another active thread of E at epoch 1 has $A_1 = \{1, 1\}$ and $A_2 = (\{e_1\})$. Note that for the first active thread count(e1[1]) = 0 and e1[2] is instantiated to the primitive event e1 in $E_1$. Similarly, for the second active thread we know that count(e1[1]) >= 1, and we do not have any information about e1[2].

The general structure of the algorithm PE is as follows. Let epoch t be the next epoch. By our assumption the algorithm has the set of active threads $R(t-1)$. Let $A(t-1) \in R(t-1)$ be an active thread. Let s be the last state of the NDFA N of E in the path $A_1$ for $A(t-1)$. We will say that the active thread $A(t-1)$ is in state s of the NDFA N. Let $(s, s')$ be a transition in N which is labeled with a basic event $E_{(s,s')}$. Then $E_{(s,s')}$ is of the form $e$ or $!e$ or $e_1 \,\&\, e_2 \,\&\, \ldots$ or $e_1 \mid e_2 \mid \ldots$, where each $e, e_i$ is a primitive event. Let I be the set of minimal subsets of $E_t$ in which event $E_{(s,s')}$ occurs. For each set $i \in I$ a new active thread is generated, for which

    $A_1$ = append($A_1$ of $A(t-1)$, $(s')$),
    $A_2$ = append($A_2$ of $A(t-1)$, $(i)$).

In addition the algorithm computes the additional information obtained from i for evaluating the policy rule. At this point it is checked whether this new active thread is in a final state of the NDFA N and whether it has all the information required to evaluate the policy rule. If so, the policy rule is evaluated in this active thread and the active thread is terminated. Otherwise the active thread is added to the set $R(t)$. This procedure is carried out for every active thread in $R(t-1)$. In addition a new active thread $\omega(0)$ is added to $R(t)$, with $A_1$ = the initial state of the NDFA and an empty $A_2$.

Example: In the previous example there are three active threads at epoch 1 in this history. Two were described earlier and the third one is $\omega(0)$.
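The extension step just described can be sketched in Python for the running example. This is an illustration under stated assumptions, not the implementation of PE: arc labels are single positive primitive events (so every minimal subset is one event instance), an instance is a hypothetical `(name, attribute value)` pair, the transition table is the assumed layout of Fig. 1, and the merging of indistinguishable threads is left out.

```python
from typing import Dict, List, Tuple

Instance = Tuple[str, int]                              # (event name, attribute value)
Thread = Tuple[Tuple[int, ...], Tuple[Instance, ...]]   # (A1 path, A2 matched instances)

TRANSITIONS: Dict[int, List[Tuple[str, int]]] = {
    1: [("e1", 1), ("e1", 2)],
    2: [("e2", 2), ("e2", 3)],
}
INITIAL, FINAL = 1, 3
OMEGA_0: Thread = ((INITIAL,), ())  # fresh thread: initial state, empty A2


def step(threads: List[Thread], epoch: List[Instance]) -> Tuple[List[Thread], List[Thread]]:
    """Extend every active thread in R(t-1) with the epoch E_t.
    Returns (R(t), traces completed at epoch t)."""
    active: List[Thread] = []
    traces: List[Thread] = []
    for path, matched in threads:
        state = path[-1]
        for label, target in TRANSITIONS.get(state, []):
            for instance in epoch:
                if instance[0] != label:
                    continue
                extended = (path + (target,), matched + (instance,))
                if target == FINAL:
                    traces.append(extended)   # rule would be evaluated here
                else:
                    active.append(extended)
    active.append(OMEGA_0)                    # a partial evaluation may start at any epoch
    return active, traces
```

Starting from `[OMEGA_0]` and feeding the epoch `[("e1", 9)]` produces the three active threads of the example above; feeding an epoch that contains only an unrelated event afterwards leaves just the fresh thread.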

Example: Let the history at time $t = 2$ be $E_1 = \{e_1\}$, $E_2 = \{e_3\}$. In this history all three active threads in $R(1)$ are discarded at epoch 2. Therefore $R(2) = \{\omega(0)\}$.

The evaluation of a policy rule at an epoch may cause some actions or trigger some policy defined events. The algorithm PE ensures that the same action is not done twice and that the same policy defined event is not triggered twice in an epoch. Two actions are the same if they cause the invocation of the same function with the same values for the arguments. Two policy defined events are the same if they are an instance of the same event object with the same values for their attributes.
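The notion of “same action” above suggests a simple way to suppress duplicates within an epoch: key each requested action by its function name and argument values. The following sketch assumes that argument values are hashable; the function name `fire_once` is hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

ActionKey = Tuple[str, Tuple[object, ...]]


def fire_once(requested: Iterable[Tuple[str, Sequence[object]]]) -> List[ActionKey]:
    """Keep the first request for each (function name, argument values)
    pair in an epoch and drop later duplicates, preserving order."""
    seen = set()
    fired: List[ActionKey] = []
    for name, args in requested:
        key = (name, tuple(args))
        if key not in seen:
            seen.add(key)
            fired.append(key)
    return fired
```

The same keying would apply to policy defined events, with the event name and attribute values in place of the function name and arguments.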

C. Pseudocode for the algorithm PE

    // Initialization phase
    P = the Policy
    for each rule r in P
        NDFA(r) = the NDFA for event E of rule r
        R(0, r) = {the active thread ω(0)}
    t = 1
    i_1 = E_1

    // Online phase
    while (true)
        // pde is the set of triggered events
        pde = ∅
        actions = ∅
        for each rule r in P
            // Compute active threads for epoch t
            R(t, r) = Update(R(t − 1, r), i_t) ∪ R(0, r)
            // Compute actions for epoch t
            actions = actions ∪ Actions(R(t − 1, r), i_t)
            // Compute triggers for epoch t
            pde = pde ∪ Trigger(R(t − 1, r), i_t)
        // FIRE fires the actions
        FIRE(actions)
        i_{t+1} = pde ∪ E_{t+1}
        t = t + 1
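A minimal Python sketch of the online phase for the single running-example rule is given here. It is only an illustration under several assumptions: the NDFA of Fig. 1 is hard-coded, policy defined events are omitted, and each active thread is reduced to the information needed to evaluate the rule, namely a hypothetical triple (state, count(e1[1]), e1[2].y), so that threads carrying the same information collapse in a set.

```python
from typing import Callable, List, Optional, Set, Tuple

Instance = Tuple[str, int]                 # (event name, value of its y or x attribute)
Info = Tuple[int, int, Optional[int]]      # (state, count(e1[1]), e1[2].y if known)
Action = Tuple[str, int, int]              # ("A", count(e1[1]), e2[2].x)


def run(history: List[List[Instance]],
        condition: Callable[[int], bool]) -> List[List[Action]]:
    """Evaluate ^e1, e1, ^e2, e2 causes A(count(e1[1]), e2[2].x) if B(e1[2].y)
    epoch by epoch; returns the distinct actions fired at each epoch."""
    fresh: Info = (1, 0, None)             # plays the role of the thread ω(0)
    threads: Set[Info] = {fresh}
    fired: List[List[Action]] = []
    for epoch in history:
        next_threads: Set[Info] = set()
        actions: Set[Action] = set()
        for state, count, y in threads:
            for name, value in epoch:
                if state == 1 and name == "e1":
                    next_threads.add((1, count + 1, None))   # instance consumed by ^e1
                    next_threads.add((2, count, value))      # instance taken as e1[2]
                elif state == 2 and name == "e2":
                    next_threads.add((2, count, y))          # instance consumed by ^e2
                    if condition(y):                         # final state 3: evaluate B
                        actions.add(("A", count, value))
        next_threads.add(fresh)
        threads = next_threads
        fired.append(sorted(actions))
    return fired
```

As a usage example, with a first epoch holding one e1 (y = 9), a second epoch holding two e1 instances (y = 4 and 5) and two e2 instances (x = 2 and 7), and a condition that always holds, the sketch fires nothing at the first epoch and the two distinct actions A(0, 2) and A(0, 7) at the second.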

Example: Let the history H at epoch 2 be $E_1 = \{e_1\}$, $E_2 = \{e_1, e_1, e_2, e_2\}$. The two instances of e1 in the second epoch have the y attributes set to 4 and 5 respectively. The two instances of e2 in the second epoch have the x attributes set to 2 and 7 respectively. The e1 in the first epoch has e1.y = 9. As described earlier there are three active threads in $R(1)$. For each of these active threads we present below the new active threads in $R(2)$ that are created at epoch 2.

1. For the active thread $\omega(0) \in R(1)$, two new active threads may be added to $R(2)$, each with $A_1 = \{1, 1\}$ and $A_2 = (\{e_1\})$, one for each e1 in $E_2$. Note that for both these active threads we have the same information to evaluate the policy rule: count(e1[1]) >= 1. Hence these two active threads are indistinguishable in terms of the evaluation of the policy rule, and only one of them is added to $R(2)$. In addition, for the active thread $\omega(0) \in R(1)$ two new active threads are added to $R(2)$, each with $A_1 = \{1, 2\}$ and $A_2 = (\{e_1\})$, one for each e1 in $E_2$. Note that these two active threads are distinct: even though for both of them we have count(e1[1]) = 0, one of them has e1[2].y = 4 and the other e1[2].y = 5.
2. For the active thread corresponding to the tuple $(\{1, 1\}, (\{e_1\}))$ in $R(1)$, the new active threads that are added to $R(2)$ are determined as for the active thread $\omega(0) \in R(1)$.
3. For the active thread corresponding to the tuple $(\{1, 2\}, (\{e_1\}))$ in $R(1)$, one active thread corresponding to the tuple $(\{1, 2, 2\}, (\{e_1\}, \{e_2\}))$ is added to $R(2)$. Here the first e1 refers to the e1 in the first epoch and the second e2 may refer to any of the two e2's in the second epoch. In this active thread count(e1[1]) = 0 and e1[2].y = 9. In addition, for the active thread $(\{1, 2\}, (\{e_1\})) \in R(1)$ two traces of event E at epoch 2 are obtained, both corresponding to the tuple $(\{1, 2, 3\}, (\{e_1\}, \{e_2\}))$, where the first e1 refers to the e1 in the first epoch and the second e2 refers to one of the two e2's in the second epoch. For each of these traces count(e1[1]) = 0 and e1[2].y = 9; for one of them e2[2].x = 2 and for the other e2[2].x = 7. Note that both these traces have enough information to evaluate the condition. If the condition evaluates to true for both these traces, then the action A is performed twice at epoch 2, once as A(0, 2) and once as A(0, 7).

IV. THE COMPLEXITY OF POLICY EVALUATION

In this section we show that even restricted instances of the policy evaluation problem are hard. However, we observe that the hard instances of the policy evaluation problem are quite rare in real world networks. Under some very realistic assumptions we are able to show that our policy evaluation algorithm (Section III) has a good worst case performance.

A. Hardness of evaluating policies

We establish the hardness of the easier decision version of the problem: Given a policy P of description size p, an action A, and a history H of h epochs, is A caused by P in any of the epochs of H?

We show that the decision version of the policy evaluation problem is NP-hard for many restricted classes of policies.

We first show that the decision version of the policy evaluation problem is NP-complete if there is no bound on p, the policy description size. Our first two results show this for some very simple classes of policies. One might make p a small bounded constant. Unfortunately, we are able to show that the problem is hard even if p is a small bounded constant. We present three more hardness results, each bringing out a different aspect of the intractability of the problem. These are hardness results for policies with policy defined events, policies with “double non-determinism”, and policies that use aggregate operators for “un-grouped” events in the scope of a ^ operator in the specification of the policy event E.

From these three results we conclude that the intractabilities are due to: simple policies for which an exponential number of policy defined events may get triggered (exponential in h); “double non-determinism”, potentially leading to an exponential number of active threads $A = (A_1, A_2)$ which differ in the path $A_1$ traversed in the NDFA; and the third kind (“un-grouped”) of policies, for which there may be an exponential number of active threads $A = (A_1, A_2)$ that differ only in the $A_2$ values. We are able to show that these conditions are necessary and sufficient for the intractability of the problem. That is, we show later that the class of policies in which none of the above mentioned intractabilities are present can be evaluated in polynomial time, and that the algorithm presented in Section III does so.

First of all, note that the decision version of the policy evaluation problem is in the class NP, under the assumption that all functions in the policy rule are computable in polynomial time. This is because, given a trace T for the event E of the policy rule at an epoch t, it can be verified in polynomial time whether T is a valid trace and whether action A happens when the policy rule is evaluated in T.
Below we present reductions from 3-SAT to this problem, thus establishing its NP-completeness.

Theorem 1: The policy evaluation problem is NP-complete for policies in which the policy specification does not use the ^ operator. For this result we require p to be a polynomial in h, thus allowing unbounded size policies.

Proof: We reduce 3-SAT to the decision version of the policy evaluation problem. Let the given instance of 3-SAT have n variables $x_1, x_2, \ldots, x_n$. Let $C = \{c_1, c_2, \ldots, c_m\}$ be the set of clauses in the given 3-SAT instance. Thus each $c_i$ is a disjunction of literals over $x_1, x_2, \ldots, x_n$. We construct a policy P with n primitive events $e_1, e_2, \ldots, e_n$, such that $e_i$ has a boolean attribute $x_i$. The policy P has one rule:

    e1, e2, e3, ..., en causes A if B.

Here B is a conjunction of boolean expressions constructed as follows. For every clause $c_i$, which is a disjunction of the three literals corresponding to the variables $x_a, x_b, x_c$, we define a boolean function $f_i$ with three arguments $e_a.x_a, e_b.x_b, e_c.x_c$ that computes the disjunction of its arguments corresponding to the clause $c_i$. For example, if the clause $c_i$ is equal to $x_1 \mid x_2 \mid x_3$ then $f_i(x, y, z) = x \mid y \mid z$. For clause $c_i$ we have a boolean expression $b_i$ which is true if and only if the function $f_i(e_a.x_a, e_b.x_b, e_c.x_c)$ evaluates to 1. The condition B is a conjunction of the boolean expressions $b_1, b_2, \ldots, b_m$.

Let there be n epochs in the history, with $E_j = \{e_j^1, e_j^2\}$ $(1 \le j \le n)$, such that $e_j^1.x_j = 1$ and $e_j^2.x_j = 0$. In other words, at the j-th epoch two instances of primitive event $e_j$ occur, one of which has attribute $e_j.x_j$ set to 1 and the other one has the attribute $e_j.x_j$ set to 0.
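The construction so far can be sketched in Python to make the correspondence with 3-SAT concrete: each way of picking one of the two instances per epoch fixes one truth assignment, and the action is caused exactly when some pick satisfies B. The signed-integer encoding of literals (k for $x_k$, -k for its negation) and the function names are assumptions of this sketch, and the exhaustive enumeration is only meant to illustrate the reduction, not to be an efficient evaluator.

```python
from itertools import product
from typing import Dict, List, Tuple

Clause = List[int]            # assumed encoding: k means x_k, -k means NOT x_k
Instance = Tuple[int, int]    # (j, value of attribute x_j) for an instance of e_j


def build_history(n: int) -> List[List[Instance]]:
    """Epoch j holds two instances of e_j, one with x_j = 1 and one with x_j = 0."""
    return [[(j, 1), (j, 0)] for j in range(1, n + 1)]


def condition_b(clauses: List[Clause], assignment: Dict[int, int]) -> bool:
    """B is the conjunction over clauses of the disjunction of their literals."""
    return all(
        any((assignment[abs(lit)] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )


def action_caused(n: int, clauses: List[Clause]) -> bool:
    """Check whether some choice of one instance per epoch makes B true."""
    for picks in product(*build_history(n)):      # 2**n candidate picks
        assignment = {j: value for j, value in picks}
        if condition_b(clauses, assignment):
            return True
    return False
```

For instance, `action_caused(3, [[1, -2, 3], [-1, 2, 3]])` holds, while the unsatisfiable pair of clauses `[[1, 1, 1], [-1, -1, -1]]` over one variable yields no action.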

Note that there are 2n traces (instances) of the complex event E = e1 ; e2 ; e3 ; : : : en in the input history H at epoch n, each corresponding to the 2n different choices of picking one of the two primitive events in each epoch. Note that depending on which ei we pick from the epoch i, either the value of ei :xi is 0 or it is 1. In other words each of the trace of the event E corresponds to a distinct setting of the attributes ei :xi ; (1  i  n). Hence there is a one

9

to one correspondence between an assignment to the variables $x_1, x_2, \ldots, x_n$ and a trace of the event $E$, such that the value of $x_i$ is equal to the value of the attribute $e_i.x_i$ in the trace. But then if the policy $P$ on history $H$ causes the action $A$ to happen at epoch $n$, there must be a trace of the event $E$ for which the condition $B$ evaluates to true, which by construction implies a satisfying assignment for the given 3-SAT instance. Conversely, if there is a satisfying assignment for the given 3-SAT instance, then in the corresponding trace of the event $E$ the condition $B$ evaluates to true, and hence the policy $P$ on history $H$ must cause the action $A$ to happen at epoch $n$.

Remark 2: This result seems to suggest that the policy evaluation problem may be hard because there are two primitive events per epoch, which doubles the number of active threads at every epoch, even though all these active threads traverse the same path in the NDFA for $E$ (they agree on $A_1$). Next we show that this is not the only reason for the intractability of the policy evaluation problem.

Theorem 3: Even if there is only one primitive event, which may happen only once in each epoch, the decision version of the policy evaluation problem is NP-complete. For this result we require $p$ to be a polynomial in $h$.

Proof: We again reduce 3-SAT to the decision version of the policy evaluation problem. Let the given instance of 3-SAT have $n$ variables $x_1, x_2, \ldots, x_n$. Let $C = \{c_1, c_2, \ldots, c_m\}$ be the set of clauses in the given 3-SAT instance. We construct a policy $P$ with one primitive event $e$ and one rule: ˆe, ˆe, ..., ˆe causes A if B. The history we consider has $n$ epochs $(h = n)$ with $E_j = \{e\}$, $(1 \le j \le n)$. The proof is similar to the proof of Theorem 1. As before we have to establish a correspondence between an assignment to the $n$ variables $x_1, x_2, \ldots, x_n$ and a trace of the event E = ˆe, ˆe, ..., ˆe in the input $H$. Note that there are at least $2^n$ traces of $E$ in the history $H$, each corresponding to a different set of values of the functions count(e[i]), $(1 \le i \le n)$. Given a trace of $E$ in the

history $H$ we map it to an assignment of the $n$ variables $x_1, x_2, \ldots, x_n$ as follows. We let $x_i$ correspond to count(e[i]): count(e[i]) > 0 corresponds to $x_i = 1$ and count(e[i]) = 0 corresponds to $x_i = 0$. Now the proof is exactly the same as the proof of Theorem 1, where we replace $e_i.x_i$ by count(e[i]) in the condition $B$.

So far we have seen that even simple classes of policies are hard to evaluate. In real life most policies are small and have bounded description size. For such policies the previous results do not hold. In the following we study such classes of bounded size policies.

Theorem 4: Even if only two primitive events may happen in each epoch, $p$ is a constant and policies are defined using policy defined events, the decision version of the policy evaluation problem is NP-complete.

Proof: We reduce 3-SAT to this decision version of the policy evaluation problem. As in the proof of Theorem 1, let the given instance of 3-SAT have $n$ variables $x_1, x_2, \ldots, x_n$. Let $C = \{c_1, c_2, \ldots, c_m\}$ be the set of clauses in the given 3-SAT instance. We construct a constant size policy $P$ with 3 primitive events: $e_1, e_2, e_3$, such that $e_1$ has a boolean attribute $x$ and $e_2$ has two attributes $y$ and $z$. The attributes $y$ and $z$ of $e_2$ are used to encode a clause $c \in C$. The policy $P$ has five rules:

e1 & !t triggers t(a = e1.x, b = 0, c = 1)

e1 & t triggers t(a = 2^(t.b+1) * e1.x + t.a, b = t.b + 1, c = 1)

e2 & t triggers t(a = t.a, b = t.b, c = 1) if f(t.a, t.c, e2.y, e2.z) = 1

e2 & t triggers t(a = t.a, b = t.b, c = 0) if f(t.a, t.c, e2.y, e2.z) = 0

e3 & t causes A if t.c = 1

Here $t$ is a policy defined event with three attributes: $a$, $b$, $c$. We consider a history $H$ of $n + m + 1$ epochs, $E_j = \{e_1, e_1\}$, $(1 \le j \le n)$; $E_j = \{e_2\}$, $(n + 1 \le j \le n + m)$; and $E_{n+m+1} = \{e_3\}$. Note that the history as described above does not include the policy defined events that get added to the epochs; it only


represents the system defined primitive events. The two events in each of the first $n$ epochs satisfy $e_1^1.x = 1$ and $e_1^2.x = 0$. Thus the events in the first $n$ epochs, as in the proof of Theorem 1, encode the $2^n$ assignments for the $n$ variables $x_1, x_2, \ldots, x_n$. The events in the next $m$ epochs correspond to the $m$ clauses, such that the event $e_2$ in the epoch $n + i$ encodes, using its attributes $y$ and $z$, the $i$-th clause. This is done as follows. Let the $i$-th clause involve the three variables $x_{\alpha_1}, x_{\alpha_2}$ and $x_{\alpha_3}$. Let $\beta_1, \beta_2, \beta_3$ each be 0 or 1 depending on whether the corresponding literal $x_{\alpha_1}, x_{\alpha_2}, x_{\alpha_3}$ is positive or negative in the $i$-th clause. The event $e_2$ in epoch $n + i$ encodes the $i$-th clause by setting the attribute $y$ to the boolean vector $(\beta_1, \beta_2, \beta_3)$ and by setting the attribute $z$ to the vector $(\alpha_1, \alpha_2, \alpha_3)$. The event in the last epoch is $e_3$, which serves as a flag to terminate the evaluation of the policy.

Note that the first rule applies only in the first epoch, and that at the end of the $n$ epochs there are exactly $2^n$ policy defined events $t$, each corresponding to one of the $2^n$ potential satisfying assignments, which is encoded in the attribute $a$ of $t$ (as a decimal number corresponding to the binary vector). Each of these events at epoch $n$ has $t.c = 1$. We now show that if for some policy defined event $t$ the attribute $t.c$ is equal to one at epoch $m + n + 1$, then the assignment encoded in $t.a$ is a satisfying assignment. Note that in the $m$ epochs $n + 1, \ldots, n + m$, only rules three and four are evaluated. In the epoch $n + i$, these rules ensure that if the encoded assignment $t.a$ satisfies the $i$-th clause, for a policy defined event $t$ with attribute $c = 1$, then a new instance of the policy defined event $t$ with attribute $c = 1$ is added to the next epoch; otherwise an instance of the policy defined event $t$ with attribute $c = 0$ is added to the next epoch.

These rules also ensure that for a policy defined event $t$ with attribute $t.c = 0$, the instance of the policy defined event $t$ that gets added to the next epoch also has attribute $c = 0$. Hence if $t.c = 1$ for a policy defined event $t$ at epoch $m + n + 1$ then the assignment encoded in $t.a$ is a satisfying assignment. Therefore action $A$ occurs at epoch $m + n + 1$ iff the given instance of 3-SAT has a satisfying assignment.
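The construction in this proof can be traced with a small simulation. The following is only a sketch under stated assumptions, not the evaluation algorithm PE: it tracks just the policy defined events $t$ as tuples (a, b, c), assumes that bit $j - 1$ of $a$ holds the value of $x_j$, and assumes that a sign flag of 1 in $y$ marks a positive literal (the paper fixes only that the flag is 0 or 1). The names `f` and `evaluate` are chosen for this illustration.

```python
def f(a, c, y, z):
    """Clause check used by rules three and four (illustrative version).

    a: assignment as an integer, bit (j - 1) holds the value of x_j
    c: flag carried by the policy defined event t
    y: sign flags of the clause (assumption: 1 means a positive literal)
    z: 1-based indices of the three variables in the clause
    Returns 1 iff c == 1 and the assignment satisfies the clause.
    """
    if c != 1:
        return 0
    for sign, var in zip(y, z):
        if ((a >> (var - 1)) & 1) == sign:
            return 1
    return 0


def evaluate(n, clauses):
    """Replay the history of n + m + 1 epochs on the set of events t.

    clauses: list of (y, z) pairs, one per epoch n + 1 .. n + m.
    Returns (action_fired, peak_number_of_t_instances).
    """
    # Epoch 1: rule one creates one t for each of the two instances of e1.
    ts = {(x, 0, 1) for x in (0, 1)}
    peak = len(ts)
    # Epochs 2 .. n: rule two extends every t with each instance of e1.
    for _ in range(2, n + 1):
        ts = {(a + (x << (b + 1)), b + 1, 1) for (a, b, c) in ts for x in (0, 1)}
        peak = max(peak, len(ts))
    # Epochs n + 1 .. n + m: rules three and four recompute the flag c.
    for y, z in clauses:
        ts = {(a, b, f(a, c, y, z)) for (a, b, c) in ts}
    # Epoch n + m + 1: rule five fires A if some t still has c == 1.
    return any(c == 1 for (_, _, c) in ts), peak
```

For example, `evaluate(3, [((1, 1, 1), (1, 2, 3))])` replays the single clause $x_1 \lor x_2 \lor x_3$. The second component of the result shows the number of instances of $t$ doubling in each of the first $n$ epochs, which is the source of hardness discussed next.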

Note that the function $f$ can be implemented in constant space, since it only checks the value of its second argument and whether the clause encoded in its last two arguments is satisfied by the assignment encoded in its first argument.

Remark 5: We note that the reason triggers make the problem hard is that policies may trigger an exponential number of policy defined events (exponential in $h$) in histories $H$ of length $h$.

Theorem 6: Even if only two primitive events may happen in each epoch, $p$ is a constant, and policy rules do not use policy defined events, if we allow “double non-determinism” in the event definition then the decision version of the policy evaluation problem is NP-complete.

Remark 7: An example of an event with “double non-determinism” is the complex event E = ˆ(ˆe1, ˆe2). Such an event may have an exponential number of active threads (exponential in $h$) in an epoch history $H$ of length $h$.

Proof: The proof is similar to the proof of Theorem 4. We reduce 3-SAT to this decision version of the policy evaluation problem. Let the given instance of 3-SAT have $n$ variables $x_1, x_2, \ldots, x_n$. Let $C = \{c_1, c_2, \ldots, c_m\}$ be the set of clauses in the given 3-SAT instance. We construct a constant size policy $P$ with 4 primitive events: $e_1, e_1', e_2, e_3$, such that $e_1$ has an attribute $x$, and $e_2$ and $e_3$ are the same events as in the proof of Theorem 4. The policy $P$ has one rule:

ˆ(ˆe1, ˆe1'), ˆe2, e3 causes A if f1(f2(e1.x), e2.y, e2.z)

The history $H$ with $h = n + m + 1$ is given by: $E_j = \{e_1, e_1'\}$, $(1 \le j \le n)$; $E_j = \{e_2\}$, $(n + 1 \le j \le n + m)$; and $E_{n+m+1} = \{e_3\}$. Here the events $e_1$ in the first $n$ epochs of the history are used to encode the $2^n$ assignments for the $n$ variables $x_1, x_2, \ldots, x_n$. This is done by setting $e_1.x = 2^{j-1}$ for the event $e_1$ in $E_j$, $(1 \le j \le n)$. The events in the next $m$ epochs, as in the proof of Theorem 4, encode the $m$ clauses. The event $e_3$ as before serves as a flag to terminate the evaluation of the policy.

The function $f_2$ is an aggregator operator that sums up the $e_1.x$ values. Note that at the end of the first $n$ epochs there are $2^n$ active threads for ˆ(ˆe1, ˆe1') in the history, for each of which the function $f_2(e_1.x)$ encodes a potential satisfying assignment, as its decimal value.

The function $f_1$ is an aggregator operator that checks, as in the proof of Theorem 4, if the assignment encoded in $f_2(e_1.x)$ satisfies the clause encoded in its last two arguments. The value of this function is one at the end of $n + m + 1$ epochs if and only if all the $m$ clauses are satisfied by the encoded assignment. Note that both the functions $f_1$ and $f_2$ can be implemented in constant space.

Remark 8: The above shows that a complex event with “double non-determinism”, E = ˆ(ˆe1, ˆe2), may have an exponential number of active threads in an epoch history $H$ of length $h$. Note that these active threads $A = (A_1, A_2)$ differ in $A_1$, the path that they take in the NDFA for $E$, and for a complex event with “non-determinism” there can be an exponential number of $A_1$'s. We now show that even if we do not allow “double non-determinism”, and $p$ is a constant, the policy evaluation problem is still hard.

Theorem 9: Even if only two primitive events may happen in each time instance, $p$ is a constant, policy rules do not use policy defined events, and there is no non-determinism in the event definition, the decision version of the policy evaluation problem is NP-complete.

Remark 10: The following proof establishes that there may be an exponential number of active threads $A = (A_1, A_2)$ in an epoch history $H$ of length $h$, all of which have the same $A_1$, the path in the NDFA, but which may differ in their $A_2$'s.

Proof: The proof is similar to the proof of Theorem 6. We reduce 3-SAT to this decision version of the policy evaluation problem. Let the given instance of 3-SAT have $n$ variables $x_1, x_2, \ldots, x_n$. Let $C = \{c_1, c_2, \ldots, c_m\}$ be the set of clauses in the given 3-SAT instance. We construct a constant size policy $P$ with 3 primitive events: $e_1, e_2, e_3$, such that $e_1$ has an attribute $x$, and $e_2$ and $e_3$ are the same events as in the proof of Theorem 4. The policy $P$ has one rule:

ˆe1, ˆe2, e3 causes A if f1(f2(e1.x), e2.y, e2.z)

The history $H$ with $h = n + m + 1$ is given by: $E_j = \{e_1, e_1\}$, $(1 \le j \le n)$; $E_j = \{e_2\}$, $(n + 1 \le j \le n + m)$; and $E_{n+m+1} = \{e_3\}$. The pair of events $e_1$ in each of the first $n$ epochs differ in their attribute $x$ value: one has it set to 0 and the other has it set to 1. These events, as in earlier proofs, are used to encode the $2^n$ assignments for the $n$ variables $x_1, x_2, \ldots, x_n$, since an active thread has to choose one of these events in each such epoch. The events in the next $m$ epochs, as in the proof of Theorem 4, encode the $m$ clauses. The event $e_3$ as before serves as a flag to terminate the evaluation of the policy.

The function $f_2$ is an aggregator operator that encodes the $e_1.x$ values (as a decimal representation of the binary vector). Note that at the end of the first $n$ epochs there are $2^n$ distinct active threads of $E$, each corresponding to a distinct potential satisfying assignment of the 3-SAT instance. The function $f_1$ is an aggregator operator that checks, as in the proof of Theorem 6, if the assignment encoded in $f_2(e_1.x)$ satisfies the clause encoded in its last two arguments. The value of this function is one at the end of $n + m + 1$ epochs if and only if all the $m$ clauses are satisfied by the encoded assignment. Note that both the functions $f_1$ and $f_2$ can be implemented in constant space.

Remark 11: In PDL a “group” operator is provided to deal with this kind of intractability. For an event defined as group(e) for some basic event $e$, there is at most one instance in any given epoch, no matter how many distinct instances of $e$ are present in the epoch. As the name suggests, the “group” operator groups all these instances into one instance. We will show later that policies in whose specification all the basic events within the scope of a “ˆ” operator are of the form group(e) can be efficiently


evaluated given some other restrictions.

B. Complexity of the Policy Evaluation Algorithm

We have seen that even very restricted instances of the policy evaluation problem are hard to solve. However, the hard cases seem to arise only for contrived instances of the problem. In real life we can assume that the following may hold:

• All active threads have small length, i.e. there are only a small number of epochs in the underlying partial trace of each thread.
• The functions used in the policy rules can be efficiently computed.
• The policy rules have bounded size descriptions.
• There are only a small number of distinct instances of a primitive (system defined or policy defined) event in any epoch of the history.
• Policies can be written without using “double non-determinism” to construct complex events.
• Policies can be written such that the basic events in the scope of a “ˆ” operator are also within the scope of the PDL “group” operator in the event $E$.

These assumptions are justified by the fact that real networks have finite memory: the events that must lead to an action happen not too far back in the past. Also, network policies can generally be expressed as simple rules which do not require too much computation to implement. Under these assumptions the policy evaluation algorithm PE is quite efficient.

Theorem 12: Given a policy rule $R$ of size $p$, defined using a complex event $E$ that does not use “double non-determinism” and uses the “group” operator for basic events in the scope of a “ˆ”, and a history $H$ in which there are at most $k$ distinct instances of any primitive event (including policy defined events) in any epoch, such that all the active threads of $E$ have length bounded by $l$ in $H$, and all of $R$'s functions can be computed efficiently, the algorithm PE, to evaluate rule $R$, runs in time $O((k \cdot l)^p)$ per epoch. Note that if $p$ is a constant then the algorithm runs in time polynomial in $k$ and $l$.
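The role of the “group” assumption in Theorem 12 can be illustrated with a toy count of threads. This is a deliberately simplified model and not PDL semantics: it assumes a thread of a repetition ˆe picks exactly one instance of $e$ in every epoch, and it models group(e) by merging all instances in an epoch into one. The function name `count_threads` is hypothetical.

```python
def count_threads(history, grouped):
    """Count distinct threads of a repetition over a history.

    history: list of epochs, each a list of attribute values, one per
             instance of the basic event e in that epoch.
    grouped: if True, model group(e) by merging all instances of an
             epoch into a single instance.
    Simplifying assumption: each thread takes one instance per epoch.
    """
    threads = {()}
    for epoch in history:
        choices = [tuple(sorted(epoch))] if grouped else list(epoch)
        threads = {thread + (choice,) for thread in threads for choice in choices}
    return len(threads)
```

With two distinct instances per epoch, as in the history used in the proof of Theorem 9, the ungrouped count doubles at every epoch, while the grouped count stays at one.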

Proof: Note that the complexity of the algorithm for any given epoch is directly proportional to the number of active threads in that epoch. In the following we will bound the number of active threads per epoch. It follows from our assumptions about event $E$ that it can be written as $E = C_1, C_2, \ldots, C_m$, for events $C_i$ which have the following property: either $C_i$ is of the form ˆ$D_i$ for some event $D_i$ which does not contain any “ˆ” and all of whose basic events are within the PDL group operator, or $C_i = F_i$, where $F_i$ does not contain any “ˆ” operator. Note that both $D_i$ and $F_i$ are of the form $e_1, e_2, \ldots, e_j$, where each $e_i$ is a basic event possibly within a group operator. From this it follows that the NDFA $N$ can also be partitioned as shown in Fig. 2, which we will write as $N = N_1, N_2, \ldots, N_m$.

[Figure: the sub-automata $N_1, N_2, \ldots, N_m$ in sequence, linked by transitions.]

Fig. 2. Partition of the NDFA of the complex event

Let us first compute how many distinct active threads $A = (A_1, A_2)$ can have the same path $A_1$ in the NDFA $N$. Two such active threads are distinct if they have a different value of some attribute or some aggregate operator. Note that in any such active thread, an attribute e[s].x can have at most $k$ distinct values, one for each instance of the primitive event $e$ in the epoch where this attribute is instantiated. This is because for any attribute e[s].x the epoch in which it gets instantiated is the same for all the active threads, since they all agree on their paths $A_1$ in the NDFA $N$. Also note that all the aggregate operators have the same value for all these active threads, since aggregate operators involve events and their attributes which are within the scope of the “ˆ” operator, and all such events by our assumption are also within the scope of a group operator. Since a rule of size $p$ refers to at most $p$ attributes, all this implies that there can be at most $O(k^p)$ active threads that have the same $A_1$ value.

Now we compute the number of distinct values for $A_1$ that the active threads in any epoch may have. Note that the path $A_1$ can be partitioned into $m$ paths, with the $i$-th path completely contained in $N_i$. Let the length of the chain (and hence the number of states) of $N_i$ be $n_i$. Note that $\sum_{i=1}^{m} n_i = O(p)$, and therefore $m$ as well as the $n_i$ are each at most $O(p)$. Since the active thread $A_1$ has length $a \le l$, we must have:

$$\sum_{i=1}^{m} c_i n_i \le l$$

for some non-negative integers $c_i$. Note that $A_1$ can be completely specified by specifying the $c_i$ values. A very crude estimate is $0 \le c_i \le l$ for all $i$. Hence the number of distinct values for all the $c_i$, and hence the number of distinct $A_1$, is at most $O(l^m) = O(l^p)$. Combining the two results we get that the number of active threads in any epoch is $O((k \cdot l)^p)$.

V. SIMULATION RESULTS

In this section we present our empirical results on the performance of the policy evaluation algorithm PE, based on simulations. Our experimental setup is as follows. For a given policy $P$ and a history $H$ of $h$ epochs we measure the time $m(t)$ (in milliseconds) that it takes to evaluate the policy $P$ at the epoch $t$ on $H$. We plot $t$ on the x-axis and $m(t)$ on the y-axis. In Fig. 3 we show the performance of the algorithm for 4 simple policies, each with rules of the form e causes A(e.x): policies with one such rule, two such rules, four such rules and eight such rules. The rules in a policy use different primitive events. For this experiment, the size of the epochs in the history $H$ grows linearly with $h$. That is, the $j$-th epoch contains $j$ distinct instances (with different values for attribute $x$) of each of the eight primitive events that may occur in any of the policy rules. As expected, the time to evaluate a policy increases linearly with the number of epochs and the number of rules (for clarity we do not present the results for policies with 9, 10, ... rules). The results in Fig. 3 serve as a baseline to measure the performance of the algorithm for more complicated policies.
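The shape of this baseline experiment can be sketched as follows. This is not the simulator used for the measurements. It is a minimal harness under the assumptions stated in the text (epoch $j$ holds $j$ instances of each of eight primitive events, and every rule has the form e causes A(e.x)), with the event names `e1` to `e8` and all function names invented for the illustration.

```python
import time


def make_history(h, num_events=8):
    """Build h epochs; epoch j (1-based) holds j instances of each event,
    represented by their distinct values of attribute x."""
    return [
        {f"e{i}": list(range(j)) for i in range(1, num_events + 1)}
        for j in range(1, h + 1)
    ]


def evaluate_epoch(rule_events, epoch):
    """Apply rules of the form 'e causes A(e.x)': one action per instance."""
    return [(name, x) for name in rule_events for x in epoch[name]]


def run(num_rules, h):
    """Evaluate a policy with num_rules simple rules on a history of h epochs.

    Returns (timings in milliseconds per epoch, number of actions per epoch).
    """
    rule_events = [f"e{i}" for i in range(1, num_rules + 1)]
    timings, counts = [], []
    for epoch in make_history(h):
        start = time.perf_counter()
        actions = evaluate_epoch(rule_events, epoch)
        timings.append((time.perf_counter() - start) * 1000.0)
        counts.append(len(actions))
    return timings, counts
```

The number of actions per epoch grows as the product of the epoch index and the number of rules, which is the linear growth that the measured times are expected to follow.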


[Plot: evaluation time $m(t)$ in milliseconds (y-axis, 0 to 1800) against epoch $t$ (x-axis, 0 to 200); series s.1, s.2, s.4 and s.8.]

Fig. 3. Simple Policies

The next policy that we used for our experiments is for admission control. It enforces two levels of admission control: at a global level it allows at most maxCon simultaneous connections, and at a class level it allows at most a class dependent number of simultaneous connections. The policy involves two external events, TryConn and Disconn, representing a request for a connection or for a disconnect respectively. In addition it involves two policy defined events, ClassConn and GlobalConn, for enforcing the policy by keeping track of the number of connections at the class and global level respectively.

    Startup triggers
        GlobalConn(conncts = 0);

    StartClass triggers
        ClassConn(class = StartClass.class, conncts = 0);

    TryConn & ClassConn & GlobalConn causes
        Conn("Connection", TryConn.user)
        if GlobalConn.conncts < maxCon,
           TryConn.class = ClassConn.class,
           ClassConn.conncts < maxConn(TryConn.class);

    Disconn causes
        Disc("Disconnection", Disconn.user);

    TryConn & ClassConn & GlobalConn triggers
        ClassConn(class = ClassConn.class, conncts = ClassConn.conncts + 1)
        if GlobalConn.conncts < maxCon,
           TryConn.class = ClassConn.class,
           ClassConn.conncts < maxConn(TryConn.class);

    TryConn & ClassConn & GlobalConn triggers
        GlobalConn(conncts = GlobalConn.conncts + 1)
        if GlobalConn.conncts < maxCon,
           TryConn.class = ClassConn.class,
           ClassConn.conncts < maxConn(TryConn.class);

    Disconn & ClassConn triggers
        ClassConn(class = ClassConn.class, conncts = ClassConn.conncts - 1)
        if Disconn.class = ClassConn.class;

    Disconn & GlobalConn triggers
        GlobalConn(conncts = GlobalConn.conncts - 1);

    Disconn & ClassConn triggers
        ClassConn(class = ClassConn.class, conncts = ClassConn.conncts)
        if Disconn.class != ClassConn.class;

    TryConn & ClassConn triggers
        ClassConn(class = ClassConn.class, conncts = ClassConn.conncts)
        if TryConn.class != ClassConn.class;

    StartClass & GlobalConn triggers
        GlobalConn(conncts = GlobalConn.conncts);
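The bookkeeping that this policy performs can be mirrored by a small state machine. The sketch below is not a PDL interpreter: it keeps the two kinds of counters as plain Python state instead of re-triggered policy defined events, and the class `AdmissionControl`, its method names and the user classes in the usage note are assumptions made for illustration. Like the policy, it decrements on every disconnect without checking that the user was connected.

```python
class AdmissionControl:
    """Two-level admission control: a global limit and per-class limits."""

    def __init__(self, max_con, class_limits):
        self.max_con = max_con                  # the constant maxCon
        self.class_limits = dict(class_limits)  # the function maxConn(class)
        self.global_conncts = 0                 # GlobalConn.conncts after Startup
        self.class_conncts = {}                 # ClassConn.conncts per class

    def start_class(self, cls):
        """StartClass: register a class with its counter set to 0."""
        self.class_conncts[cls] = 0

    def try_conn(self, user, cls):
        """TryConn: admit the connection only if both limits allow it."""
        admitted = (
            cls in self.class_conncts
            and self.global_conncts < self.max_con
            and self.class_conncts[cls] < self.class_limits[cls]
        )
        if admitted:
            self.class_conncts[cls] += 1
            self.global_conncts += 1
        return admitted

    def disconn(self, user, cls):
        """Disconn: decrement the class counter (if registered) and the global one."""
        if cls in self.class_conncts:
            self.class_conncts[cls] -= 1
        self.global_conncts -= 1
```

A usage example: with `AdmissionControl(2, {"gold": 1, "silver": 5})` and both classes registered through `start_class`, a second `gold` request is refused by the class limit, and a third request of any class is refused by the global limit until some user disconnects.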

This policy monitors and controls the number of simultaneous connections from different classes of users into a network. At any given moment the total number of simultaneous connections is limited to the constant maxCon. There is a second tier of control based on classes. Users have associated classes, and the maximum number of connections of each class is limited by the value of the function maxConn(class). The policy works as follows. There is an event triggered at the time the policy is brought up (i.e. it is triggered by the event Startup). The event triggered, GlobalConn, has one attribute that keeps track of the total number of current connections in the system. It is initialized to 0. This is done by the first rule in the policy. Each time a new user class is introduced, the event StartClass is generated by the registration process. This event has one attribute, the class name. This event triggers an internal event, ClassConn, with two attributes: the class name and a counter, also initialized to 0, that keeps track of the number of simultaneous connections for the class. This is encoded in the second rule. The next two rules process connections and disconnections. The connection policy rule executes the action (i.e. establishes the connection) if the limits are not exceeded. The fifth and sixth rules increment the appropriate class counter and the global counter if the connection will be established. The next two rules decrement the counters after a disconnect occurs. The last three rules handle the cases when an event occurs but the counters do not change. Note that there is a ClassConn event for each class, and these events are re-triggered each time an external event (TryConn or Disconn) not affecting the class occurs.

For this policy the epochs are defined as follows. A new epoch is initiated whenever a request for a connection or disconnection, or any of the start events, happens. We assume for this policy that every epoch contains exactly one external (system defined) event. We ran the experiments with randomly generated admission and disconnection requests for the admission control policy. The running time of the algorithm stayed around 100 ms per epoch.

VI. RELATED WORK

Policy based network management has been a topic of research for several years. There has been work on formal models and methodologies to express policies (see, for example, [9] and the references therein), but no attention has been directed to complexity issues. In many cases it is not clear that the models always describe computable policies. General purpose event programming languages have been around for a while. Good examples are the READY system [3] and its predecessor YEAST [5]. Event languages have also been studied in the context of network monitoring (see, for example, [4], [10]). There is also a tremendous amount of effort from the networking industry to understand and standardize the concept of policy. The interest is reflected in the work of the Policy Working Group in the Internet Engineering Task Force (IETF) [7], [8]. Our primary interest has been to develop a policy server, with a complete understanding of the complexity of the problem being addressed. For the latter goal we needed more than an implemented event language system. We needed a formal specification of what a policy could do. We developed PDL with this in


mind. Although we cannot do formal comparisons between PDL and the standards being proposed at IETF†, our complexity results show that PDL is a very expressive language, and we believe it is sufficient for the policies required in real network applications. Details of the language and several examples can be found in [6].

† The models are work in progress and are still imprecise and incomplete.

REFERENCES

[1] R. Bhatia, M. Kohli, J. Lobo, and A. Virmani. A policy-based network management system. In Proc. of the International Conference on Parallel and Distributed Techniques and Applications/International Conference on Artificial Intelligence, June 1999.
[2] J. Boyle, R. Cohen, D. Durham, S. Herzog, R. Rajan, and A. Sastry. The COPS (Common Policy Service) protocol. Technical Report draft-ietf-rap-cops-06.txt, Internet Engineering Task Force, February 1999.
[3] R. E. Gruber, B. Krishnamurthy, and E. Panagos. High-level constructs in the READY event notification system. In 8th ACM SIGOPS European Workshop, Sintra, Portugal, September 1998.
[4] M. Z. Hasan. An active temporal model for network management databases. In IFIP/IEEE 4th International Symposium on Integrated Network Management, pages 524–535, Santa Barbara, California, May 1995.
[5] B. Krishnamurthy and D. Rosenblum. YEAST: A general purpose event-action system. IEEE Transactions on Software Engineering, 21(10):845–857, October 1995.
[6] J. Lobo, R. Bhatia, and S. Naqvi. A policy description language. In Proc. of AAAI, Orlando, FL, July 1999.
[7] J. Strassner and E. Elleson. Terminology for describing network policy and services. Technical Report draft-strassner-policy-terms-01.txt, Internet Engineering Task Force, February 1999.
[8] J. Strassner and S. Schleimer. Policy framework definition language. Technical Report draft-ietf-policy-framework-pfdl-00.txt, Internet Engineering Task Force, November 1998.
[9] R. Wies. Policies in network and system management: formal definition and architecture. Journal of Network and System Management, 2(1):63–83, 1994.
[10] O. Wolfson, S. Sengupta, and Y. Yemini. Managing communication networks by monitoring databases. IEEE Transactions on Software Engineering, 17(9):944–953, September 1991.