A Learning Approach for Call Admission Control under QoS Constraints in Cellular Networks

Xu Yang
MPI-QMUL Information Systems Research Centre, A313, Macao Polytechnic Institute, Macao SAR, China
[email protected]

John Bigham
Department of Electronic Engineering, Queen Mary University of London, London E1 4NS, U.K.
[email protected]

Abstract—A new approach is presented for learning Call Admission Control (CAC) schemes that can provide Quality of Service (QoS) guarantees to ongoing calls from several classes of traffic with different resource requirements. Comparison with two other CAC schemes shows that the new approach is not only capable of utilizing the network resources to maximize revenue but also of maintaining the handover dropping rate (CDR) under a prescribed upper bound while still keeping an acceptable call blocking rate (CBR). The learning CAC scheme is shown to work successfully in the presence of smoothly changing traffic arrival rates. The CAC policy is obtained through a form of NeuroEvolution (NE) algorithm.

Keywords: Call Admission Control; QoS constraints; NeuroEvolution of Augmenting Topologies.

I. INTRODUCTION

Next Generation Wireless Systems are expected to provide a wide range of services, including high-speed data and real-time multimedia services, to mobile users. Each kind of service has its own QoS requirements. Sophisticated Call Admission Control (CAC) is a way to guarantee QoS for such multiple services while efficiently utilizing the limited network resources. Generally, the potential performance measures in cellular networks are long-term revenue or utility, the call blocking rate (CBR), and the handoff failure rate, i.e. the call dropping rate (CDR). The CDR can be reduced by reserving some bandwidth for future handoffs; however, the CBR may then increase because of that reservation. Hence, reduction of CBR and reduction of CDR are conflicting requirements, and joint optimization of both is extremely complex. Several researchers have shown that the guard channel threshold policy, which a priori reserves a set of channels in each cell to handle handoff requests, is optimal for minimizing the CDR. However, the computational complexity of these approaches becomes too high for multi-class services with diverse characteristics and contexts, and exact solutions become intractable [1,2]. A variety of artificial intelligence techniques have been applied to learning CAC schemes, such as Reinforcement Learning [3,4] and Genetic Algorithms [5,6]. However, these schemes do not provide a single solution that addresses CAC under varying traffic load. Additionally, they aim to optimize the CDR without considering the maintenance of a competitive CBR.

This paper proposes a novel approach to CAC in multimedia cellular networks with multiple classes of traffic that have different resource and QoS requirements. A near-optimal CAC policy is obtained through a form of NeuroEvolution (NE) algorithm called NeuroEvolution of Augmenting Topologies (NEAT) [7]. The learned policies are compared to a greedy policy scheme, which always accepts a call if there is enough remaining bandwidth capacity. The performance of the algorithms has been evaluated using three QoS metrics: the total reward accruing from the accepted calls (total revenue), and the CBR and CDR of each class of traffic. Simulation results indicate that our scheme keeps the CDR under a predefined upper bound while still maintaining an acceptable CBR level. Additionally, the proposed learning scheme is shown to work successfully in the presence of smoothly changing traffic arrival rates, a scenario that is too complicated for the other learning schemes evaluated.

The rest of this paper is organized as follows. Section II gives a brief introduction to NeuroEvolution of Augmenting Topologies (NEAT) and explains how to apply NEAT to the CAC application; Section III describes the test system model and formulates the fitness function; Section IV compares performance under constant and varying traffic loads. Finally, Section V concludes.

II. HOW TO USE NEAT FOR CAC

NEAT is a NeuroEvolution (NE) method that has been shown to work very efficiently on complex Reinforcement Learning (RL) problems. NE is a combination of neural networks and genetic algorithms in which neural networks are the phenotype being evaluated; the genotype is a compact representation that can be translated into an artificial neural network [8]. The NEAT method for evolving artificial neural networks is designed to take advantage of neural network structure as a way of minimizing the dimensionality of the search space.

Evolution starts with a randomly generated small set of neural networks with simple topologies. Each of these neural networks is assigned a fitness value depending on how well it suits the solution; for example, given a function to maximize, the value of the function itself can serve as the fitness. Once all members of the population have been assigned fitness values, a selection process is carried out in which better individuals (those with high fitness values) stand a greater chance of being selected for the next operation. Selected individuals undergo recombination and mutation to produce new individuals; low-fitness individuals are discarded from the population and better ones are included. Structural mutations add new connections and nodes to networks in the population, leading to incremental growth. The whole process is repeated with the new population until some termination criterion is satisfied, and the average fitness of the population is expected to increase over the generations [7].

This section specifies the main requirements for applying NEAT to the CAC application.

1) How many inputs? Normally the inputs are the perceived state of the environment that is essential for making the action decision. For the CAC problem, the perceived state may include the new request call and all the currently carried connections in the cell.

2) How many outputs? Generally the outputs are the possible actions that can be performed in the real application. In CAC there are only two possible actions, Accept and Reject, so one output is enough. We define the output as a real number between 0 and 1: if it is larger than 0.5, the selected action is Accept; otherwise it is Reject.

3) How to evaluate each policy and how to formulate the fitness function? The fitness function defines what makes a policy good or bad. A good policy obtains a high fitness score by generating better performance and therefore has a higher probability of creating offspring policies; a bad policy obtains a low fitness score and has a lower probability of creating offspring. The sole objective of NEAT is to evolve the policy that obtains the best fitness. The fitness function is described in a later section.

4) How to end NEAT? There are two ways to stop NEAT: when a predefined maximum fitness score is reached, or when a prescribed time is up. In CAC it is difficult to know the highest fitness score that can be obtained, so a fixed time is used.

5) Internal and External Supervisor. During the learning period many neural networks are randomly generated, and some of them lead to very bad performance. To prevent damage or unacceptable performance, an Internal Supervisor is needed to filter out clearly bad policies. In CAC, neural networks that always reject all kinds of calls are frequently generated and would otherwise be evaluated; these Always Reject policies are obviously poor and evaluating them is pointless, so the Internal Supervisor assigns them a fitness score of 0.0. Additionally, the outputs generated by NEAT are not always feasible actions: for example, an evaluated neural network may try to accept a request call when the system is full, which is physically unrealizable. Therefore an External Supervisor is added, which uses predefined knowledge to filter out actions that are impossible under the system constraints. A minimal sketch of the resulting decision wrapper and the two supervisors is given below.
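The following sketch is ours, not from the paper: the `activate` call and the five-input state layout (one request-type input plus four per-class ongoing-call counts, as used in Section IV) are assumptions, intended only to illustrate points 1), 2), and 5) above.

```python
# Illustrative sketch of the NEAT-policy decision wrapper (assumed interface).

ACCEPT, REJECT = True, False

def cac_decision(net, state, request_bw, used_bw, capacity):
    """Map a NEAT-evolved network's single output in [0, 1] to an action."""
    # External Supervisor: accepting a call that would exceed the cell
    # capacity is physically unrealizable, so it is filtered out up front.
    if used_bw + request_bw > capacity:
        return REJECT
    out = net.activate(state)[0]           # assumed single-output interface
    return ACCEPT if out > 0.5 else REJECT # Section II, point 2)

def supervised_fitness(num_accepted, raw_fitness):
    """Internal Supervisor: an Always-Reject policy gets fitness 0.0."""
    return 0.0 if num_accepted == 0 else raw_fitness
```

The External Supervisor runs before the network is consulted, while the Internal Supervisor adjusts the fitness after a policy has been evaluated.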

III. FITNESS FUNCTION

Our objective for CAC is to accept or reject setup (or handoff) calls so as to maximize the expected revenue over an infinite planning horizon while meeting the QoS constraints; more specifically, to trade off CBR against CDR so that the specified CDR does not exceed a predefined upper bound while an acceptable CBR level is retained. In this paper we consider only non-adaptive services, i.e. the bandwidth of a connection is constant. We assume $m$ classes of traffic $\{1, 2, \ldots, m\}$. A connection in class $i$ consumes $b_i$ units of bandwidth (channels), and the network obtains $r_i$ units of reward per logical second by carrying it. We also assume that each class arrives according to a Poisson process and that holding times are exponentially distributed, with all arrival and holding-time distributions independent of each other. The arrival events comprise new call arrival events and handoff arrival events; since call departures do not affect the CAC decisions, only arrival events are considered in the CAC state space. Additionally, we consider a single cell with fixed capacity, whose total number of bandwidth units (channels) is $C$. We denote by $\lambda_{i,s}$ and $\lambda_{i,h}$ the arrival rates of new setup and handover requests in class $i$, and by $\mu_{i,s}$ and $\mu_{i,h}$ the mean holding times of setup and handover calls in class $i$.

A. First Goal: Maximize Revenue ($F_1$)

If the goal is simply to maximize revenue, the fitness can be assessed by $F_1$, the average reward obtained per requested connection, as in equation (1). Let $N$ be the total number of requested connections (including both setup and handoff calls), $n$ the total number of accepted connections, $r_j$ the reward rate of accepted connection $j$, and $t_j$ its holding time:

$$F_1 = \frac{\sum_{j=1}^{n} r_j t_j}{N} \qquad (1)$$

For large $N$,

$$F_1 \cong \frac{\sum_{i=1}^{m} r_i \left( k_{i,s}\,\mu_{i,s} + k_{i,h}\,\mu_{i,h} \right)}{N}$$

where $m$ denotes the total number of traffic classes, $i$ the class of an individual connection, and $k_{i,s}$ and $k_{i,h}$ the numbers of accepted new setup calls and handover calls in class $i$. For large $N_{i,s}$ and $N_{i,h}$, the relationships

$$k_{i,s} = N\,\frac{\lambda_{i,s}}{\sum_{i=1}^{m} (\lambda_{i,s} + \lambda_{i,h})}\, p_{i,s} \quad\text{and}\quad k_{i,h} = N\,\frac{\lambda_{i,h}}{\sum_{i=1}^{m} (\lambda_{i,s} + \lambda_{i,h})}\, p_{i,h}$$

also hold, where $p_{i,s}$ and $p_{i,h}$ denote the acceptance rates of new setup request calls and handover calls in class $i$:

$$p_{i,s} = \frac{\text{number of accepted setup calls in class } i}{\text{requested setup calls in class } i}, \qquad p_{i,h} = \frac{\text{number of accepted handover calls in class } i}{\text{requested handover calls in class } i}$$

Therefore

$$F_1 \cong \sum_{i=1}^{m} r_i \left( \frac{\lambda_{i,s}\,\mu_{i,s}}{\sum_{i=1}^{m} (\lambda_{i,s} + \lambda_{i,h})}\, p_{i,s} + \frac{\lambda_{i,h}\,\mu_{i,h}}{\sum_{i=1}^{m} (\lambda_{i,s} + \lambda_{i,h})}\, p_{i,h} \right) \qquad (2)$$

To simplify this formula, define the service demand parameters $\alpha$ as

$$\alpha_{i,s} = r_i\,\frac{\lambda_{i,s}}{\sum_{i=1}^{m} (\lambda_{i,s} + \lambda_{i,h})}\,\mu_{i,s}, \qquad \alpha_{i,h} = r_i\,\frac{\lambda_{i,h}}{\sum_{i=1}^{m} (\lambda_{i,s} + \lambda_{i,h})}\,\mu_{i,h} \qquad (3)$$

so that

$$F_1 = \sum_{i=1}^{m} \left( \alpha_{i,s}\, p_{i,s} + \alpha_{i,h}\, p_{i,h} \right) \qquad (4)$$

It can be seen that the total revenue is determined both by the network environment parameters and by the learned CAC performance $p_{i,s}$ and $p_{i,h}$, which can be calculated after the evaluation of each individual policy. If the environment parameters change, the fitness score of each policy changes, and the previously learned best policy may no longer be the best. Therefore, to learn a policy in a dynamic environment, the changing parameters should be taken into account as inputs of NEAT.
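To make equations (3) and (4) concrete, the short sketch below (our own illustration; the names are ours) computes the service demand parameters and $F_1$ from the traffic parameters and the acceptance rates measured during an evaluation:

```python
# Minimal sketch of equations (3)-(4).
def service_demand(r, lam_s, lam_h, mu_s, mu_h):
    """Return (alpha_s, alpha_h), the per-class service demand parameters."""
    total = sum(lam_s) + sum(lam_h)   # sum_i (lambda_{i,s} + lambda_{i,h})
    alpha_s = [r[i] * lam_s[i] / total * mu_s[i] for i in range(len(r))]
    alpha_h = [r[i] * lam_h[i] / total * mu_h[i] for i in range(len(r))]
    return alpha_s, alpha_h

def fitness_f1(alpha_s, alpha_h, p_s, p_h):
    """F1 = sum_i (alpha_{i,s} p_{i,s} + alpha_{i,h} p_{i,h}), equation (4)."""
    return sum(a * p for a, p in zip(alpha_s, p_s)) + \
           sum(a * p for a, p in zip(alpha_h, p_h))
```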

B. Second Goal: Handle the CBR/CDR Trade-off ($F_2$)

Two parameters are used to trade off between CBR and CDR: the Importance Factor $\eta$ and the Penalty Fitness $f$. The Importance Factor $\eta$ indicates the preference of the network operator. For a network it is commonly more desirable to reject new setup requests and accept more handoff requests, so normally $\eta_{i,s}\,\alpha_{i,s} < \eta_{i,h}\,\alpha_{i,h}$. By carefully selecting $\eta$ for each individual class of traffic, the expected CAC policy can be evolved.

The other parameter is the Penalty Fitness $f$. When the observed CBR or CDR exceeds its predefined upper bound ($T_{CBR}$ or $T_{CDR}$), the evaluated policy is punished by a negative fitness score $f$, which should be large enough in magnitude to affect the total fitness $F_2$. Additionally, although dropping a handoff call is usually more undesirable than blocking a new connection, designing a system with zero CDR is practically impossible and would lose an unacceptable amount of revenue; in practice, an acceptable CDR is around 1 or 2%. $f_{i,s}$ and $T_{CBR}$ give an opportunity to prevent the network from losing too much revenue by rejecting setup calls. Generally, we consider CDRs more important than CBRs, so the absolute value of $f_{i,h}$ is larger than that of $f_{i,s}$. The fitness function is:

$$F_2 = \sum_{i=1}^{m} \left( \eta_{i,s}\,\alpha_{i,s}\, p_{i,s} + \eta_{i,h}\,\alpha_{i,h}\, p_{i,h} \right) - \sum_{i=1}^{m} \left( f_{i,s} + f_{i,h} \right) \qquad (5)$$

where the penalty terms $f_{i,s}$ and $f_{i,h}$ are applied only when the corresponding bound is violated. The two parameters $\eta$ and $f$ work cooperatively to trade off between CBRs and CDRs: $\eta$ gives extra weight to each kind of traffic to indicate the preference of the network, $f_{i,h}$ increases the speed of finding policies that bring the CDR under $T_{CDR}$, and $f_{i,s}$ defines the upper limit $T_{CBR}$ to balance CBRs against CDRs.
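A corresponding sketch of equation (5) follows (again our own illustration). Since Table 2 in Section IV lists negative $f$ values and the text applies the penalty only when a bound is broken, this sketch simply adds the negative penalty on violation; that reading of the sign convention is an assumption:

```python
# Sketch of equation (5): importance-weighted revenue minus penalties.
def fitness_f2(alpha_s, alpha_h, p_s, p_h, eta_s, eta_h,
               cbr, cdr, t_cbr, t_cdr, f_s, f_h):
    m = len(alpha_s)
    f2 = sum(eta_s[i] * alpha_s[i] * p_s[i] +
             eta_h[i] * alpha_h[i] * p_h[i] for i in range(m))
    for i in range(m):
        if cbr[i] > t_cbr[i]:
            f2 += f_s[i]   # f_s[i] is negative, e.g. -5 for setup (Table 2)
        if cdr[i] > t_cdr[i]:
            f2 += f_h[i]   # f_h[i] is negative and larger in magnitude, e.g. -10
    return f2
```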

IV. PERFORMANCE ANALYSIS

There are two sets of experiments. The first set uses a constant traffic load with two classes of traffic; the second uses a varying traffic load during both learning and evaluation. Because of the heterogeneous bandwidth requirements of the four different kinds of traffic, the offered load of a cell is defined differently from the normal practice in the literature. We call it the "normalized offered load" and define it as follows:

$$L = \frac{1}{C} \left[ \sum_{i=1}^{2} \left( \lambda_{i,s}\,\mu_{i,s}\, b_i + \lambda_{i,h}\,\mu_{i,h}\, b_i \right) \right]$$

A. Constant Traffic Load

There are two classes of traffic, labeled C1 and C2, and two kinds of requests: setup and handover. A single cell with a limited number of channels, C = 20, is considered. The network traffic parameters are set as shown in Table 1.

Table 1: Simulation environment parameters

Parameters                          C1     C2
Bandwidth units, b                  1      2
Reward rate per logical second, r   1      1
New call arrival rate, λi,s         0.2    0.1
Handoff arrival rate, λi,h          0.1    0.05
New call holding time, μi,s         30     20
Handoff holding time, μi,h          20     15
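As a quick sanity check (our own arithmetic, not reported in the paper), the normalized offered load implied by Table 1 follows directly from the definition of $L$ given above:

```python
# Normalized offered load for the Table 1 scenario (our own check).
C = 20
lam_s, lam_h = [0.2, 0.1], [0.1, 0.05]   # setup / handoff arrival rates
mu_s, mu_h = [30, 20], [20, 15]          # mean holding times
b = [1, 2]                               # bandwidth units per call

L = sum(lam_s[i] * mu_s[i] * b[i] + lam_h[i] * mu_h[i] * b[i]
        for i in range(2)) / C
print(L)  # 0.675
```

With these parameters the constant-load scenario therefore runs at roughly 68% of cell capacity.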

Simulation runs lasted 100,000,000 logical seconds with 13 generations; each generation had 200 policies, and each policy was evaluated for 20,000 events. The neural networks require five inputs: one input specifies the kind of request event, and the other four specify the number of ongoing calls of each kind of traffic in the cell. Three different CAC schemes were evaluated and compared:

F_NEAT_NoUL: no upper limits defined on CBRs or CDRs.
F_NEAT_UL_H: defines a penalty only for breaking the CDR upper limit of each class of traffic.
F_NEAT_UL: defines upper limits and penalty fitness for both the CBRs and CDRs of each class of traffic.

Table 2: Fitness function parameters

Parameters       C1 Setup   C1 Handover   C2 Setup   C2 Handover
αi               13.33      4.44          3.33       2.22
ηi               3          10            5          80
ηi × αi          40         44.44         16.67      177.78
T_CBR / T_CDR    10%        1%            10%        2%
fi               -5         -10           -5         -10

Table 2 shows the parameters of the different CAC schemes. The value of each coefficient associated with each kind of traffic is pivotal because it expresses the expectation placed on the learned policy. The Importance Factors in Table 2 represent a high preference for handoff calls, while the setup calls in class C2 are less important and are more likely to be degraded.

Table 3 shows the CBRs and CDRs of the four CAC schemes. It can be seen that for all three learning schemes the CBR of C2 is the highest, because the Importance Factors define the system preference. Additionally, F_NEAT_UL successfully trades off between CBRs and CDRs to meet the predefined QoS constraints: both the CBRs and the CDRs stay under their upper bounds. Furthermore, it obtains the highest revenue of the three learning schemes, with only a 5% revenue loss compared to the Greedy Policy, because the definition of T_CBR and T_CDR prevents unacceptable revenue loss.

The performances of F_NEAT_NoUL and F_NEAT_UL_H are similar. As F_NEAT_NoUL does not define the upper limit bounds, it pushes the CDRs as low as possible, which results in higher CBRs.

Table 3: CBRs and CDRs under constant traffic load

Scheme         C1 CBR %   C1 CDR %   C2 CBR %   C2 CDR %   Revenue during evaluation %
Greedy         1.94       5.39       2.13       5.39       100
F_NEAT_NoUL    9.07       0.50       11.69      1.64       95.0
F_NEAT_UL_H    8.65       0.41       20.81      1.37       94.0
F_NEAT_UL      8.53       0.79       9.9        1.89       95.7

B. Varying Traffic Load: Experiment Two

This section describes how to apply our CAC scheme in a cell with varying traffic load. To simplify the problem, only the arrival rates of each kind of traffic are changed; the holding times are kept constant. The experiment is again divided into two periods, the Learning Period and the Evaluation Period. In the learning period, the system is trained on a small set of different normalized offered loads; in the evaluation period, the learned best policy is shown to maintain the total CDR under a predefined upper limit as the normalized offered load varies smoothly.

In the learning period, each policy is evaluated for an equal-length time duration called a training session. Each training session is divided into ten equal-length time intervals with different arrival rates for each kind of traffic. The changes of $\lambda^{-1}$ follow a sine function, as in equation (6); within each interval $\lambda$ is kept unchanged:

$$\lambda_t^{-1} = \lambda_{min}^{-1} + \lambda_{min}^{-1} \sin\!\left(\frac{n\pi}{10} + \pi\right) \qquad (6)$$

Each policy is evaluated interval by interval; after finishing the evaluation of one interval, the policy obtains an interval fitness score, and after finishing the evaluation of the tenth interval, the ten interval fitness scores are averaged to give the final fitness of the policy. The interval fitness function (IF) is shown in equation (7); the neural network is punished with a negative fitness score $f_{t\_cdr}$ for breaking the upper limit of the total CDR ($t\_cdr$). In this experiment, the upper limit of the total CDR is 2%, and the penalty score is -10.

$$IF = \frac{3\,\alpha_{1,s}\, p_{1,s} + 10\,\alpha_{1,h}\, p_{1,h} + 5\,\alpha_{2,s}\, p_{2,s} + 80\,\alpha_{2,h}\, p_{2,h}}{18} - f_{t\_cdr} \qquad (7)$$

The Evaluation Period evaluates the learned best policy. It is divided into 1000 equal-length time intervals with different arrival rates for each kind of traffic. The changes of $\lambda^{-1}$ again follow a sine function, as in equation (8), similar to the one used in the learning phase but sampled more finely. Within each evaluation interval the arrival rate is kept unchanged, and each evaluation interval is as long as one training interval.

$$\lambda_e^{-1} = \lambda_{min}^{-1} + \lambda_{min}^{-1} \sin\!\left(\frac{n\pi}{1000} + \pi\right) \qquad (8)$$
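The sketch below (ours, not from the paper) makes the training-session mechanics concrete: it derives each interval's arrival rate from equation (6) and averages the ten interval fitness scores into the final policy fitness. The exact phase term in the reconstructed schedule should be treated as an assumption:

```python
import math

def interval_rate(lam_min, n, intervals=10):
    """lambda_t for interval n, from lambda_t^-1 in equation (6).

    lam_min is the class's minimum arrival rate; the +pi phase follows
    equation (6) as reconstructed above and is an assumption.
    """
    inv = (1.0 / lam_min) * (1.0 + math.sin(n * math.pi / intervals + math.pi))
    return float('inf') if inv == 0 else 1.0 / inv

def session_fitness(interval_scores):
    """Final policy fitness: the mean of the ten interval fitness scores."""
    return sum(interval_scores) / len(interval_scores)
```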

The system definition is shown in Table 4.

Table 4: Simulation parameters under varying traffic loads

Parameters   C1 Setup   C1 Handover   C2 Setup   C2 Handover
λ⁻¹min       0.25       0.125         0.2        0.1
μi           25         20            10         5


The CAC now requires six inputs: the current value of the normalized offered load, the user request, and the numbers of ongoing calls of the four different kinds of traffic.

The simulation runs for 200,000,000 logical seconds. Each training or evaluation interval is 10,000 logical seconds; each policy is evaluated over 100,000 logical seconds, and the learned best policy is evaluated over 1,000,000 logical seconds with 1000 different normalized offered loads. NEAT evolves 20 generations, and each generation has 100 policies.


Figure 1. t_cbr and t_cdr under varying traffic load (y-axis: CBR and CDR %; x-axis: normalized offered load, 0.6 to 1.1; curves for the Greedy scheme and for CAC using NEAT (CN), plus the upper limit of t_cdr). [Figure not reproduced.]

Figure 1 compares the t_cbr and t_cdr of the Greedy Scheme and of CAC using NEAT (CN), computed at every evaluation interval, where t_cbr and t_cdr denote the total CBR and total CDR. It can be seen that the t_cdr of CAC using NEAT (CN) is maintained under the predefined upper limit of 2% during the whole evaluation period, even when the normalized offered load is quite high.

V. CONCLUSION

This paper proposes a novel learning approach to CAC in multimedia cellular networks with multiple classes of traffic that require different bandwidth resources and QoS parameters. The near-optimal CAC policy is obtained through a form of NeuroEvolution (NE) algorithm. Simulation results show that the new scheme can effectively guarantee that the specified CDRs remain under their predefined upper bounds while retaining acceptable CBRs. Additionally, the scheme is shown to work successfully under smoothly varying traffic loads. The new approach has also been evaluated against Q-learning as a benchmark and shows considerably better performance, both in terms of revenue and QoS control and in the time taken to learn new policies; this will be described in a future paper.

ACKNOWLEDGMENT

Thanks to Kenneth O. Stanley and Ugo Vierucci for providing free NEAT software at http://www.cs.utexas.edu/~nn/index.php.

REFERENCES

[1] W. S. Soh and H. S. Kim, "Dynamic Guard Bandwidth Scheme for Wireless Broadband Networks," Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh. IEEE, 2001. 0-7803-7016-3/01.
[2] B. Li, C. Lin, and S. T. Chanson, "Analysis of a Hybrid Cutoff Priority Scheme for Multiple Classes of Traffic in Multimedia Wireless Networks," Wireless Networks, vol. 4, pp. 279-290, 1998.
[3] S. M. Senouci, A. Beylot, and G. Pujolle, "Call Admission Control in Cellular Networks: A Reinforcement Learning Solution," International Journal of Network Management, 2004.
[4] F. Yu, V. W. S. Wong, and V. C. M. Leung, "Reinforcement-Learning-Based Call Admission Control and Bandwidth Adaptation in Mobile Multimedia Networks," IEEE, 2003. 0-7803-8185-8/03.
[5] C. Rose and R. Yates, "Genetic Algorithm and Call Admission to Telecommunications Networks," Department of Electrical and Computer Engineering, Rutgers University, May 9, 1995.
[6] D. Karabudak, C. Hung, and B. Bing, "A Call Admission Control Scheme using Genetic Algorithms," 2004 ACM Symposium on Applied Computing.
[7] K. O. Stanley, "Efficient Evolution of Neural Networks through Complexification," PhD thesis, Artificial Intelligence Laboratory, The University of Texas at Austin, 2004.
[8] F. J. Gomez, "Robust Non-Linear Control through Neuroevolution," PhD thesis, Department of Computer Sciences, The University of Texas at Austin, Technical Report AI-TR-03-3, 2003.