Building Fine Bayesian Networks Aided by PSO-based Feature Selection

María del Carmen Chávez, Gladys Casas, Rafael Falcón, Jorge E. Moreira and Ricardo Grau
Computer Science Department, Central University of Las Villas
Carretera Camajuaní km 5 ½, Santa Clara, Cuba
{mchavez, gladita, jmoreira, rfalcon, rgrau}@uclv.edu.cu

Abstract. A successful interpretation of data goes through discovering the crucial relationships between variables. Such a task can be accomplished by a Bayesian network. The drawback is that, when many variables are involved, learning the network slows down and may lead to wrong results. In this study, we demonstrate the feasibility of applying an existing Particle Swarm Optimization (PSO)-based approach to feature selection for filtering out the irrelevant attributes of a dataset, resulting in a fine Bayesian network built with the K2 algorithm. Empirical tests carried out with real data from the bioinformatics domain bear out that the PSO fitness function is in close concordance with the most widely known validation measures for classification.

Keywords: Bayesian network, feature selection, classification, particle swarm optimization, validation measures.

1 Introduction

Bayesian networks are a powerful knowledge representation and reasoning tool that behaves well under uncertainty [2], [3], [4]. A Bayesian network is a directed acyclic graph (DAG) with a probability table associated to each node. The nodes of a Bayesian network represent propositional variables in a domain, and the edges between them stand for the dependency relationships among those variables. When constructing Bayesian networks from databases, the most common approach is to model a set of attributes as network nodes [2], [3].

Learning a Bayesian network from data within the bioinformatics field is often expensive, since we start from a considerable number of variables, which has a direct, detrimental impact on the time required to finish shaping the network: both the nodes and their associated probability tables must be built. This triggers the need for attribute reduction algorithms. The aim of this study is to demonstrate that an existing Particle Swarm Optimization (PSO)-based feature selection algorithm can be used as a starting point to compute the most suitable reduct (i.e., the minimal set of attributes which correctly characterizes the dataset) from which the network can be built, thereby dropping the irrelevant attributes present in the dataset. This approach exploits the high convergence rate inherent to the PSO algorithm and yields a direct measure (the fitness function value) of the reduct's quality. From this point on, a Bayesian network with sound characteristics can be built with any of the traditional algorithms from the literature, such as the K2 method; for classification tasks it is also possible to use Naïve Bayes (NB), the Tree Augmented Naïve Bayes (TAN), etc. [14]. Such algorithms are responsible for uncovering the hierarchical relationships between the attributes already included in the reduct. Our claim is empirically supported by the statistical concordance between the aforementioned fitness value and several widely known measures that serve to evaluate the performance of classification systems, in this case the Bayesian network model.

The study is structured as follows: Section 2 dwells on the main details of Bayesian networks, whereas Section 3 summarizes the key ideas behind the PSO metaheuristic and the PSO-based feature selection algorithm. The experiments carried out to test the feasibility of our contribution are included in Section 4. Finally, some conclusions and comments are provided.

2 An Overview of Bayesian Networks

A Bayesian network (also called a "probabilistic network") is formally a pair (D, P), where D is a directed acyclic graph (DAG), P = {p(x1 | π1), …, p(xn | πn)} is a set of n conditional probability distributions (CPDs), one for each variable xi (the nodes of the graph), and πi is the set of parents of node xi in D. The set P defines the associated joint probability distribution as shown in Equation (1):

p(x) = \prod_{i=1}^{n} p(x_i \mid \pi_i), \quad x = (x_1, x_2, \ldots, x_n)    (1)
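To make the factorization in (1) concrete, the following minimal Python sketch evaluates the joint probability of a hypothetical three-node network A → B, A → C; the network and all CPT values are illustrative inventions, not taken from this paper.

# Conditional probability tables: P(node = 1 | parent values).
p_a = {(): 0.3}                  # A has no parents
p_b = {(0,): 0.2, (1,): 0.7}     # B's only parent is A
p_c = {(0,): 0.5, (1,): 0.9}     # C's only parent is A

def bernoulli(p_one, value):
    """Probability of a binary variable taking `value` when P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def joint(a, b, c):
    """p(a, b, c) = p(a) * p(b | a) * p(c | a), as in Equation (1)."""
    return (bernoulli(p_a[()], a)
            * bernoulli(p_b[(a,)], b)
            * bernoulli(p_c[(a,)], c))

# Sanity check: the joint distribution must sum to 1 over all 8 configurations.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0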

Before the network can be used, it is necessary to undertake a learning stage which, in turn, can be divided into two steps: (1) defining the structure (topology) of the network and (2) setting the values of the parameters involved in the probability distribution associated with each node. Making up the network's topology is a tough process, since one would have to consider all possible combinations of attributes in order to determine the hierarchical relationships between them (i.e., which of them are parents of which nodes). The usual approach found in the literature is to use heuristic search procedures that browse the search space attempting to find a fairly good structure. We do not step aside from this line of thought, but come up with a novel approach using PSO which offers a direct measure (the fitness function value) that correlates with the classical measures for assessing the performance of classifiers. Our claim is that the set of attributes yielded by the PSO-based feature selection algorithm is an excellent starting point for building the network topology.

3 Particle Swarm Optimization

PSO is an evolutionary computation technique proposed by Kennedy and Eberhart [9], [10], [11], [12]. The concept was motivated by the simulation of the social behavior of biological organisms such as bird flocks or fish schools; the original intent was to reproduce the graceful but unpredictable movement of bird flocking. The PSO algorithm relies heavily on sharing information between the individuals (particles) so as to solve continuous optimization problems, although specific versions for discrete optimization problems have been developed [8], [13], [15], [16], [24].

3.1 PSO fundamentals and notation

PSO is initialized with a population of random solutions, called "particles". Each particle is treated as a point in an S-dimensional space. The i-th particle is represented as Xi = (xi1, xi2, …, xiS) and keeps track of its coordinates in the problem space. PSO also records the best solution achieved so far by the whole swarm (gbest), as well as the best solution reached thus far by each particle (pbest). The best previous position of any particle is recorded and represented as Pi = (pi1, pi2, …, piS). The velocity of the i-th particle is denoted as a vector Vi = (vi1, vi2, …, viS). Each individual's velocity is updated following the criterion below:

v_{id} = w\, v_{id} + c_1 r_1 (p_{id} - x_{id}) + c_2 r_2 (p_{gd} - x_{id})    (2)

where v_i is the current velocity of the i-th particle, p_i is the position with the best fitness value visited so far by the i-th particle, g is the index of the particle with the best fitness among all particles, and d = 1, 2, …, S. Additionally, w is the inertia weight, which may be fixed before the algorithm executes or may vary dynamically as the algorithm runs. The acceleration constants c1 and c2 in (2) weight the stochastic acceleration terms that pull each particle toward the pbest and gbest positions, respectively. Some versions of the PSO metaheuristic confine the particle's velocity on each dimension to a specified range limited by Vmax. The performance of each particle is computed according to a predefined fitness function. Finally, each particle's position is updated as in (3):

x_{id} = x_{id} + v_{id}    (3)

3.2 Moving on to the formal algorithm

The formal algorithm, drawn from [24], is the following:

Given:
  m: the number of particles;
  c1, c2: positive acceleration constants;
  w: inertia weight;
  MaxV: maximum velocity of particles;
  MaxGen: maximum number of generations;
  MaxFit: maximum fitness value.
Output:
  Pgbest: global best position.

Begin
  Swarm {xi, vi} = Generate(m); /* initialize a population of particles with
                                   random positions and velocities on S dimensions */
  Pbest(i) = 0, i = 1, ..., m; Gbest = 0; Iter = 0;
  While (Iter < MaxGen and Gbest < MaxFit)
  {
    For (every particle i)
    {
      Fitness(i) = Evaluate(i);
      If (Fitness(i) > Pbest(i)) { Pbest(i) = Fitness(i); pi = xi; }
      If (Fitness(i) > Gbest)    { Gbest = Fitness(i); gbest = i; }
    }
    For (every particle i)
    {
      Update its velocity using (2); /* r1 and r2 are two random numbers in [0, 1] */
      If (vid > MaxV)  { vid = MaxV; }
      If (vid < -MaxV) { vid = -MaxV; }
      Update its position using (3);
    }
    Iter = Iter + 1;
  }
  Return Pgbest;
End
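For readers who prefer executable code, a compact Python transcription of the pseudocode above follows; it assumes a maximization problem with a user-supplied fitness function and random initialization in [-1, 1], both illustrative choices rather than prescriptions of [24].

import random

def pso(evaluate, S, m=20, c1=2.0, c2=2.0, w=0.9,
        max_v=4.0, max_gen=500, max_fit=1.0):
    """Plain continuous PSO, following the pseudocode above.
    `evaluate` maps a position (list of S floats) to a fitness to maximize."""
    x = [[random.uniform(-1, 1) for _ in range(S)] for _ in range(m)]
    v = [[random.uniform(-max_v, max_v) for _ in range(S)] for _ in range(m)]
    p = [xi[:] for xi in x]            # personal best positions
    pbest = [float('-inf')] * m        # personal best fitness values
    g, gbest = None, float('-inf')     # global best position / fitness

    it = 0
    while it < max_gen and gbest < max_fit:
        for i in range(m):
            fit = evaluate(x[i])
            if fit > pbest[i]:
                pbest[i], p[i] = fit, x[i][:]
            if fit > gbest:
                gbest, g = fit, x[i][:]
        for i in range(m):
            for d in range(S):
                r1, r2 = random.random(), random.random()
                # Equation (2): inertia + cognitive + social components.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p[i][d] - x[i][d])
                           + c2 * r2 * (g[d] - x[i][d]))
                v[i][d] = max(-max_v, min(max_v, v[i][d]))  # velocity clamping
                x[i][d] += v[i][d]                          # Equation (3)
        it += 1
    return g, gbest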

3.3 PSO and Rough Set Theory for Feature Selection

In this section we confine ourselves to briefly describing the algorithm depicted in [24]. The optimal feature selection problem can be approached via PSO in the following way. Assuming that we have N features in total and that each feature may or may not be part of the optimal reduct, there are 2^N possible vectors. Each vector is represented by a particle in the described approach. Over time, the particles change their positions, communicate with each other, and search around the local best and global best positions.

A particle's position is represented as a binary bit string of length N (the total number of attributes). If a bit has value 1, the corresponding attribute is selected as part of the reduct; otherwise the bit is reset. The velocity of each particle varies from 1 to Vmax and can be semantically interpreted as the number of features that must be changed in order to match the global best position; a thorough explanation can be found in [24]. Once the particle's velocity has been updated by (2), the particle's position is updated by comparing the current velocity with the number of bits that differ between the current particle and gbest, and this semantic interpretation drives the update of the particle's position. The algorithm initially limits the maximum velocity Vmax to the range [1, N], later shortened to the interval [1, N / 3]; by limiting the maximum velocity, the particle cannot fly too far away from the optimal solution. One of the most important components of the PSO technique is the fitness function, which evaluates how good the position of each particle is. In this case, the selected fitness function is outlined in (4) below.
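Before turning to the fitness function, the following Python sketch shows one plausible reading of the position-update rule just described; the exact handling of ties and surplus velocity in [24] may differ.

import random

def update_position(x, gbest_x, v):
    """Hedged sketch of the binary position update: the integer velocity v
    is read as the number of bits to change so that the particle moves
    toward gbest. This is an interpretation, not a verbatim copy of [24]."""
    x = x[:]
    diff = [d for d in range(len(x)) if x[d] != gbest_x[d]]
    if v <= len(diff):
        # Flip v randomly chosen differing bits toward gbest.
        for d in random.sample(diff, v):
            x[d] = gbest_x[d]
    else:
        # Match gbest on all differing bits, then explore by flipping
        # some additional random bits (v is bounded by N/3 in the text,
        # but we clamp defensively anyway).
        for d in diff:
            x[d] = gbest_x[d]
        extra = min(v - len(diff), len(x))
        for d in random.sample(range(len(x)), extra):
            x[d] = 1 - x[d]
    return x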

\mathrm{Fitness} = \alpha \cdot \gamma_R(D) + \beta \cdot \frac{|C| - |R|}{|C|}    (4)

where γR(D) is a rough-set-based measure known as the "quality of classification", applied to the conditional attribute set R with respect to the decision D [19], [24]; |R| is the number of bits set to one in the particle, i.e. the length of the reduct represented by the particle, whereas |C| is the total number of features. It is also worth stressing that α and β are two parameters weighting the importance of the quality of classification and of the reduct length, respectively, with the constraints α ∈ [0, 1] and β = 1 − α. Another remarkable improvement made to the original PSO algorithm is the dynamic variation of the inertia weight, lowering it as the number of iterations increases. This guarantees that, at the beginning, a higher value of w brings about a faster exploration of the search space; as the algorithm executes, this initial value is decreased to a predefined threshold, encouraging local exploration.
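A minimal sketch of the fitness computation in (4) follows, assuming discrete-valued data and the standard rough-set definition of γR(D) as the fraction of objects whose R-indiscernibility class is consistent with respect to the decision.

from collections import defaultdict

def gamma(data, decisions, R):
    """Rough-set quality of classification gamma_R(D): fraction of objects
    whose equivalence class under the attributes in R is decision-consistent."""
    classes = defaultdict(set)
    for obj, row in enumerate(data):
        key = tuple(row[a] for a in R)   # equivalence class under R
        classes[key].add(obj)
    positive = sum(len(objs) for objs in classes.values()
                   if len({decisions[o] for o in objs}) == 1)
    return positive / len(data)

def fitness(particle, data, decisions, alpha=0.9):
    """Equation (4): alpha * gamma_R(D) + (1 - alpha) * (|C| - |R|) / |C|."""
    R = [d for d, bit in enumerate(particle) if bit == 1]
    C = len(particle)
    beta = 1.0 - alpha
    return alpha * gamma(data, decisions, R) + beta * (C - len(R)) / C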

4 Empirical Results

In this section we introduce the bioinformatics dataset we have used to support our viewpoint, namely that by previously filtering a dataset in terms of its relevant attributes, and using a sound indicator of the goodness of the reduct, results comparable with traditional validation measures for classification are achieved. Subsequently, we delve into the experiments carried out, as well as the statistical tests used to check their validity.

4.1 An Overview of the Reverse Transcriptase Protein Dataset

Proteins are the primary components of living things. If proteins are the workhorses of the biochemical world, nucleic acids are their drivers, because they control the proteins' action. All of the genetic information in any living creature is stored in deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), which are polymers of four simple nucleic acid units called nucleotides. Four nucleotides can be found in DNA, each consisting of three parts: a base (a purine or a pyrimidine), a sugar (ribose in RNA and deoxyribose in DNA), and one or more phosphate groups. The purine nucleotides are adenine (A) and guanine (G), while the pyrimidines are cytosine (C) and thymine (T). Nucleotides are sometimes called bases and, since DNA is composed of two complementary strands bonded together, these units are often called base pairs. The length of a DNA sequence is often measured in thousands of bases, abbreviated kb. Nucleotides are generally abbreviated by their first letter and appended into sequences, e.g., CCTATAG [6].

The task under consideration here is the prediction of mutations of the DNA sequence of the Reverse Transcriptase (RT) protein, obtained from Stanford University. The dataset involves 603 base pairs in 419 sequences (cases) and is publicly available online at http://hivdb.stanford.edu/cgi-bin/PIResiNote.cgi. In order to set the values of each attribute, the list of mutations described in the dataset was used along with the reference strain HXB2. In particular, for each sample, the original sequence was reconstructed by applying each listed mutation to the reverse transcriptase gene of HXB2 (GenBank accession number K03455). In this way, each reconstructed sequence has no gaps or areas with loss of information, and therefore the "unknown" attribute value is no longer required [7], [17], [18]. This modified dataset was then recoded following the results of the Research Group on Genomics Algebra [20], [21], [22], i.e. G ↔ 00; A ↔ 01; T ↔ 10 and C ↔ 11, for a grand total of 1206 attributes; a sketch of the recoding step is shown just before Table 1. The number of decision classes (values of the dependent variable) was set to three before applying the K-means clustering algorithm, so as to group the DNA sequences into three well-defined clusters; this number was set empirically [5].

Table 1 depicts the search process carried out with the PSO-based feature selection approach for the reverse transcriptase protein of the HIV virus. In this case, up to 10 000 iterations of the algorithm were performed so as to observe its behavior. A quick look at the results in Table 1 shows how the fitness value stabilizes as the number of iterations increases, while the obtained reduct remains the same. This suggests the possibility of using such a reduct as a starting point for the construction of the Bayesian network. Notice that, after a small number of iterations, the PSO approach is already able to find a reduct with a high fitness value (30 iterations, fitness 0.978). When the number of iterations reaches 500, the algorithm's behavior stabilizes: the fitness is 0.997 and the size of the optimal reduct is 12.
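A minimal sketch of the recoding step described above (two binary attributes per nucleotide):

# Genomics Algebra convention cited above: G<->00, A<->01, T<->10, C<->11.
CODE = {'G': (0, 0), 'A': (0, 1), 'T': (1, 0), 'C': (1, 1)}

def encode(sequence):
    """Turn a DNA string into a flat list of binary attributes
    (two bits per nucleotide, so 603 bases yield 1206 attributes)."""
    bits = []
    for base in sequence.upper():
        bits.extend(CODE[base])
    return bits

print(encode("CCTATAG"))  # [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0]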

Table 1. Execution of the PSO-based feature selection algorithm. When the number of iterations reaches 500, the algorithm's behavior stabilizes.

Iterations = 30 (fitness = 0.978, reduct length = 85). Best solution:
5 22 25 29 45 48 50 53 54 61 68 69 78 86 93 96 100 112 115 121 123 124 134 145 146 147 159 164 165 172 174 176 179 180 183 190 194 198 212 213 219 222 227 229 246 253 254 257 260 262 264 266 267 268 272 279 284 296 297 304 305 311 313 316 319 321 324 333 336 337 341 343 346 348 353 359 360 363 366 367 374 375 377 390 391 395 398 401 404

Iterations = 50 (fitness = 0.981, reduct length = 74). Best solution:
5 22 25 29 48 53 54 61 68 69 78 86 93 100 121 123 124 134 145 146 164 165 172 174 176 179 180 183 194 198 212 219 222 227 229 246 254 257 260 262 264 267 268 279 284 296 297 304 305 311 313 316 319 321 324 333 337 341 343 348 353 359 360 363 366 367 374 377 390 391 395 398 401 404

Iterations = 100 (fitness = 0.989, reduct length = 41). Best solution:
5 22 29 48 53 61 68 86 93 100 124 134 145 146 164 172 212 254 260 262 264 268 279 284 305 311 313 316 319 324 333 341 343 348 353 359 360 363 366 374 390

Iterations = 500, 1000 and 10 000 (fitness = 0.997, reduct length = 12). Best solution:
5 61 93 134 260 262 324 341 343 353 360 366

Our experiment was configured as follows: the PSO-based algorithm was run fifteen times, with 500 iterations per run, and both the value of the fitness function and the yielded reduct were stored for each run. The size of the swarm was set to 20 particles, and the balancing parameters were chosen as α = 0.9 and β = 0.1.

The second stage of the experiment was the building of the Bayesian networks, each of them using a single reduct from those obtained in the previous step. The K2 machine learning algorithm for building Bayesian networks was utilized within the WEKA environment [1], [2], [3], [25]. The first parameter of the Bayesian network is the maximum number of parents per node. We set up the network to use two parents per node; setting this parameter to one (1) automatically turns the model into a Naïve Bayes classifier, setting it to 2 learns a TAN, and a number of parents greater than two learns a Bayes Augmented Network (BAN). Another parameter is the order of the variables; we assume that the variables are randomly ordered. The third parameter is the score type, which determines the measure used to assess the quality of the network structure, picked from the following list: Bayes, BDeu, Minimum Description Length (MDL), Akaike Information Criterion (AIC) and Entropy. For our experiments, we selected the Bayes measure [3] because it is reliable, although any of the remaining ones could also be employed. A sketch of the K2 search appears right after this paragraph.
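For illustration, the K2 search and its Bayesian score can be sketched as follows; this stand-alone Python implementation follows the textbook formulation of the K2 metric and is not WEKA's code.

from math import lgamma
from collections import defaultdict

def k2_log_score(data, i, parents, arity):
    """Log of the K2 (Bayesian) score of node i with the given parent set:
    sum over parent configurations j of
    lgamma(r_i) - lgamma(N_ij + r_i) + sum_k lgamma(N_ijk + 1),
    where r_i is the number of states of X_i; unobserved configurations
    contribute zero, so iterating over observed ones suffices."""
    r_i = arity[i]
    counts = defaultdict(lambda: defaultdict(int))   # N_ijk
    for row in data:
        j = tuple(row[p] for p in parents)
        counts[j][row[i]] += 1
    score = 0.0
    for by_state in counts.values():
        n_ij = sum(by_state.values())
        score += lgamma(r_i) - lgamma(n_ij + r_i)
        score += sum(lgamma(n + 1) for n in by_state.values())
    return score

def k2(data, order, arity, max_parents=2):
    """Greedy K2 structure search: for each node (in the given ordering),
    keep adding the predecessor that most improves the score."""
    parents = {i: [] for i in order}
    for pos, i in enumerate(order):
        best = k2_log_score(data, i, parents[i], arity)
        improved = True
        while improved and len(parents[i]) < max_parents:
            improved = False
            for cand in order[:pos]:
                if cand in parents[i]:
                    continue
                s = k2_log_score(data, i, parents[i] + [cand], arity)
                if s > best:
                    best, best_cand, improved = s, cand, True
            if improved:
                parents[i].append(best_cand)
    return parents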

After running the Bayesian network against the previously outlined bioinformatics dataset, a group of measures validating the classification is reported, among them the overall accuracy of the classification, the area under the ROC curve (AUC) and the precision per class. It is known that the ROC curve contains all the information about the False Positive and True Positive rates [25]. The outcome of the experiment is portrayed in Table 2.

Table 2. Fitness and overall accuracy, as well as AUC and precision per class.

Iteration  Fitness  Accuracy  AUC1   AUC2   AUC3   Prec1  Prec2  Prec3
 1         0.9873   98.09     0.999  1      1      0.902  0.977  1
 2         0.9958   94.27     0.992  0.986  0.995  0.915  0.894  0.975
 3         0.9080   94.51     0.992  0.985  0.994  0.913  0.883  0.987
 4         0.9955   95.22     0.997  0.994  0.998  0.936  0.916  0.975
 5         0.9970   98.32     0.998  0.999  1      0.957  0.969  0.996
 6         0.9975   98.56     0.999  1      1      0.958  0.977  0.996
 7         0.9972   94.27     0.969  0.987  0.999  0.81   0.89   0.996
 8         0.9965   97.85     0.996  0.999  1      0.977  0.948  0.996
 9         0.9958   96.65     1      0.997  0.998  1      0.925  0.983
10         0.9965   98.56     0.996  0.999  0.996  0.978  0.969  0.996
11         0.9958   96.89     0.996  0.958  1      0.932  0.947  0.988
12         0.9965   97.61     0.997  0.998  1      0.955  0.947  0.996
13         0.9960   97.61     0.995  0.999  1      0.976  0.941  0.996
14         0.9972   98.56     0.998  1      1      0.939  1      0.988
15         0.9960   97.85     0.997  0.999  1      0.956  0.962  0.992

Table 3. Mean ranks according to the Kendall test, and Kendall's coefficient of concordance (W).

Iteration    Fitness-Accuracy  Fitness-Accuracy-AUC  Fitness-Accuracy-AUC-Precision
 1            9.50              5.70                 5.75
 2           12.75             13.20                13.19
 3           14.00             14.10                13.81
 4           12.50             10.90                11.38
 5            4.00              4.70                 4.88
 6            1.50              2.50                 3.13
 7            8.50             10.80                11.00
 8            6.25              6.70                 6.06
 9           11.00              8.90                 8.69
10            4.00              7.40                 6.06
11           10.50             10.20                10.13
12            7.25              7.10                 7.13
13            8.50              8.00                 7.38
14            2.25              3.20                 4.56
15            7.50              6.60                 6.88
Kendall's W   0.754             0.611                0.552

Our next step is to find out whether there exists concordance between the fitness value and the rest of the measures (columns) in Table 2. For that purpose, we set up a Kendall concordance test [23]; a sketch of the computation follows.
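Kendall's W can be computed as in the following sketch (one rank per measure and item, no tie correction):

import numpy as np

def kendalls_w(scores):
    """Kendall's coefficient of concordance W for an (n_items x m_measures)
    matrix of scores; each column is converted to ranks.
    W = 12 * S / (m^2 * (n^3 - n)), with S the sum of squared deviations
    of the rank sums from their mean. This sketch ignores ties."""
    scores = np.asarray(scores, dtype=float)
    n, m = scores.shape
    ranks = scores.argsort(axis=0).argsort(axis=0) + 1  # 1 = smallest value
    rank_sums = ranks.sum(axis=1)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

Feeding it, for instance, the Fitness and Accuracy columns of Table 2 should approximately reproduce the W = 0.754 reported in Table 3; small deviations are possible because this sketch does not apply a tie correction.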

Fig. 1. Structure of the Bayesian network for the reduct computed in iteration number 6, which consists of 10 nodes. The learned Bayesian network represents the information well, allowing the inference of mutations in reverse transcriptase sequences.

Kendall's test was run with several measure groups: Fitness-Accuracy, Fitness-Accuracy-AUC and Fitness-Accuracy-AUC-Precision. The results appear in Table 3: for each group, the mean ranks of the iterations and Kendall's W coefficient are shown. There exists a satisfactory concordance between the fitness of the PSO method and the accuracy (W = 0.754). The W value for Fitness-Accuracy-AUC is 0.611 and, if we include the precision per class as a fourth estimator, we obtain W = 0.552; in all cases greater than 0.5. We therefore conclude that there exists an adequate concordance between the fitness and the mentioned classification measures.

The best results are always obtained in iteration number 6 (the lowest mean rank). In Table 2 it can be seen that this run effectively yields the best fitness and network performance. Remark that this iteration produces a reduct of size 10, more compact than the one initially obtained. The corresponding Bayesian network is portrayed in Fig. 1.

The nodes of the network can be interpreted through modulo-6 arithmetic (remember that each codon is represented by 6 binary digits). For instance, position 378 = 63 · 6 corresponds to position 6 of codon 63; position 974 = 162 · 6 + 2 corresponds to position 2 of codon 163; and position 473 corresponds to position 5 of codon 79. The arcs stand for the dependencies: the final class depends on all positions; position 12 interacts with positions 29 and 1056, position 473 interacts with position 1132, and position 617 interacts with position 942. After completing the network's topology, each node is assigned a suitable conditional probability table; the WEKA environment was utilized for this purpose. The resulting accuracy was 98.56%, as stated above.
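The modulo-6 interpretation described above amounts to the following mapping from a 1-based binary attribute index to its codon and within-codon position:

def codon_position(attr):
    """Map a 1-based binary attribute index to (codon, position within codon),
    using the modulo-6 rule described above (each codon spans 6 binary digits)."""
    codon, pos = divmod(attr - 1, 6)
    return codon + 1, pos + 1

print(codon_position(378))  # (63, 6)
print(codon_position(974))  # (163, 2)
print(codon_position(473))  # (79, 5)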

5 Conclusions

In this paper we have proposed a solution to the vexing problem of building the topology of a Bayesian network, by filtering the relevant attributes using a PSO-based feature selection approach. Once the reduct has been computed, a Bayesian network is built from scratch by means of the K2 algorithm. Our contribution lies in the demonstration of the direct concordance between the value of the fitness function associated with the best reduct and the traditionally used validation indicators for classification, therefore anticipating the classification performance of the Bayesian network from the fitness value achieved while computing its best reduct. As future work, we will focus on several specific scenarios where the so-built Bayesian network outperforms traditional approaches.

Acknowledgments. This work was developed in the framework of a collaboration program supported by VLIR (Flemish Interuniversity Council, Belgium). We would also like to thank the referees for their critical and helpful comments and suggestions, which improved the paper's readability and overall quality.

References

1. Baldi, P., Brunak, S.: Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics, Vol. 16, No. 5 (2000) 412-424
2. Bockhorst, J., Craven, M., Page, D., Shavlik, J., Glasner, J.: A Bayesian network approach to operon prediction. Bioinformatics, Vol. 19, No. 10 (2003) 1227-1235
3. Bouckaert, R.R.: Bayesian Network Classifiers in Weka (2004)
4. Brazma, A., Jonassen, I.: Context-specific independence in Bayesian networks. In: Proc. Twelfth Conference on Uncertainty in Artificial Intelligence (1996) 115-123
5. Grau, R., Galpert, D., Chávez, M., Sánchez, R., Casas, G., Morgado, E.: Algunas aplicaciones de la estructura booleana del Código Genético. Revista Cubana de Ciencias Informáticas, Año 1, Vol. 1 (2005)
6. Hunter, L.: Artificial Intelligence and Molecular Biology. ISBN 0-262-58115-9. http://www.biosino.org/mirror/www.aaai.org/Press/Books/Hunter/default.htm
7. Marchal, K., Thijs, G., De Keersmaecker, S., Monsieurs, P., De Moor, B., Vanderleyden, J.: Genome-specific higher-order background models to improve motif detection. Trends in Microbiology, 11(2) (2003) 61-66
8. Omran, M.G.H., Engelbrecht, A.P., Salman, A.: Dynamic clustering using PSO with application in unsupervised image classification. Transactions on Engineering, Computing and Technology, Vol. 9 (2005)
9. Kennedy, J.: The particle swarm: social adaptation of knowledge. In: IEEE International Conference on Evolutionary Computation, April 13-16 (1997) 303-308
10. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Perth (1995a) 1942-1948
11. Kennedy, J., Eberhart, R.C.: A new optimizer using particle swarm theory. In: Sixth International Symposium on Micro Machine and Human Science, Nagoya (1995b) 39-43
12. Kennedy, J., Spears, W.M.: Matching algorithms to problems: an experimental test of the particle swarm and some genetic algorithms on the multimodal problem generator. In: Proceedings of the IEEE International Conference on Evolutionary Computation (1998) 39-43
13. Kudo, M., Sklansky, J.: Comparison of algorithms that select features for pattern classifiers. Pattern Recognition, 33(1) (2000) 25-41
14. Larrañaga, P., Calvo, B., Santana, R., Bielza, C., Galdiano, J., Inza, I., Lozano, J.A., Armañanzas, R., Santafé, G., Pérez, A., Robles, V.: Machine learning in bioinformatics. Briefings in Bioinformatics, Vol. 7, No. 1 (2005) 86-112
15. Liu, H., Li, J., Wong, L.: A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. Genome Informatics, 13 (2002) 51-60
16. Liu, H., Setiono, R.: Chi2: feature selection and discretization of numeric attributes. In: Proc. IEEE 7th International Conference on Tools with Artificial Intelligence (1995) 388-391
17. Mellors, J.W., Brendan, A.L., Schinazi, R.F.: Mutations in HIV-1 reverse transcriptase and protease associated with drug resistance.
18. Murray, R.J.: Predicting Human Immunodeficiency Virus Type 1 Drug Resistance from Genotype Using Machine Learning. Master of Science thesis, School of Informatics, University of Edinburgh (2004)
19. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishing, Dordrecht (1991)
20. Sánchez, R., Grau, R., Morgado, E.: Genetic code Boolean algebras. WSEAS Transactions on Biology and Biomedicine, Issue 2, Vol. 1 (2004) 190-197
21. Sánchez, R., Morgado, E., Grau, R.: A genetic code Boolean structure I. The meaning of Boolean deductions. Bulletin of Mathematical Biology, 67 (2005) 1-14
22. Sánchez, R., Morgado, E., Grau, R.: The genetic code Boolean lattice. MATCH Communications in Mathematical and in Computer Chemistry, Vol. 52 (2004) 29-46
23. Siegel, S.: Diseño experimental no paramétrico, segunda edición (1987) 245-256
24. Wang, X., Yang, J., Teng, X., Xia, W., Jensen, R.: Feature selection based on rough sets and particle swarm optimization. Pattern Recognition Letters, Elsevier (2006)
25. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Second edition, Morgan Kaufmann (2005) 363-483