International Journal of Software Engineering and Its Applications Vol. 8, No. 12 (2014), pp. 177-188 http://dx.doi.org/10.14257/ijseia.2014.8.12.17

Software Defect Prediction using a High Performance Neural Network

Mohamad Mahdi Askari and Vahid Khatibi Bardsiri
Department of Computer Science, Islamic Azad University, Kerman, Iran
[email protected], [email protected]

Abstract

Predicting the defects present in software products is one of the notable issues in software engineering, and it contributes substantially to saving time in the software production and maintenance process. Finding suitable models for predicting software defects has therefore become one of the main goals of software engineers. Since the intricacies and constraints of software development keep increasing, and undesirable consequences such as failures and errors reduce software quality and customer satisfaction, producing error-free software is difficult and challenging. One of the effective models in this field is the multilayer neural network with a proper learning algorithm, but many learning algorithms suffer from overfitting the learning datasets. In this article, the training of the multilayer neural network is adjusted in order to improve the generalization capability of the learning algorithm in predicting software defects. To address the existing problems, a new method is proposed by developing a learning procedure based on support vector machine principles and evolutionary algorithms. The proposed method prevents overfitting and maximizes the classification margin. The efficiency of the proposed algorithm has been validated against 11 machine learning and statistical models on 3 NASA datasets. The results reveal that the proposed algorithm provides higher accuracy and precision compared to the other models.

Keywords: software defect prediction, support vector machine, multilayer perceptron neural network

1. Introduction

With the development of computer technology, software systems have become more numerous and more complicated, and owing to limited human ability, many defects are introduced during the software development life cycle [1]. Despite precise planning, acceptable documentation and process control during software development, certain defects are inevitable. A software defect is an error, fault, problem, mistake or failure in a computer program that may cause the system to produce a wrong or unexpected result [2]. In fact, precise prediction of the errors that occur within code is important because it can directly contribute to decreasing costs and improving software quality [3]. Studies have shown that defects are concentrated in only some software modules. These defects can lead to software failure, decreased customer satisfaction or increased maintenance costs [4]. Error prediction models have become a popular method for early recognition of code errors. Such studies usually develop an error prediction model that helps software engineers focus development activities on error-prone code and, as a result, improve software quality and make better use of resources.

Unfortunately, no consensus has been reached on the factors that influence error prediction [3]. Over the past years, several software defect prediction methods have been developed [5-6]. These methods have made use of different datasets, models and validation measures. Among the most frequently used datasets are the public datasets in the NASA repository, which have been produced since 2005 and are open to public access. In recent years, defect prediction models based on machine learning have received wide attention from researchers [5, 7]. Among the machine learning based models are decision trees, k-nearest neighbor, the naive Bayes model, support vector machines and neural networks. In this article, a combination of the multilayer perceptron neural network and the support vector machine algorithm is used, so that acceptable results can be obtained with the aid of the non-linear computational power of the support vector machine and the high learning power of multilayer neural networks. In Section 2 of the article, some of the most important works in this field are reviewed, and in Section 3 the support vector machine is discussed. In Section 4, multilayer neural networks are described, and in Section 5 the proposed algorithm is introduced. In Section 6, the validation method and the datasets used are introduced, and the efficiency and cross validation criteria are explained. Finally, in Section 7, the validation results are given and conclusions are drawn.

2. Related Works

A great deal of related research has been conducted, most of which combines a data mining algorithm with an optimization algorithm. Porter and Selby [8] applied classification trees to two datasets from NASA and the Hughes Aircraft organization by means of the surface criteria method and managed to reach 79.3% precision, though their metrics were not suitable for software defect prediction studies because of imbalanced datasets. Mertik, et al., [9] evaluated software quality with pruned C4.5, non-pruned C4.5, a multi-method approach, a support vector machine with an RBF kernel and a linear support vector machine on NASA datasets, and the multi-method data mining tool reached the highest efficiency. Chang, et al., [10] suggested an error prediction approach based on association rules for discovering error patterns and reported good prediction results; the advantage of this method is the discovery of error patterns that can be used in typical analysis to find error cases. Singh, et al., [11] used a support vector machine model to empirically study the relationship between object-oriented metrics and error proneness on the KC1 dataset and evaluated the efficiency of the support vector machine using receiver operating characteristic analysis. Al-Jamimi and Ghouti [12] combined the support vector machine and the probabilistic neural network, making use of 5 NASA datasets to assess the prediction models; according to their results, the probabilistic neural network model provides good efficiency for large datasets. Can, et al., [13] proposed a software defect prediction model using particle swarm optimization and the support vector machine, called the P-SVM model. This model benefits from the non-linear computation of the support vector machine and optimized parameters: the P-SVM model first uses particle swarm optimization to calculate the best parameters of the support vector machine and then applies the optimized support vector machine to predict software defects. Punitha and Chitra [14], aiming to help developers detect defects and to increase the accuracy and precision of data mining classification algorithms, and motivated by the weaknesses of existing classification algorithms, proposed a new neural network algorithm with a degree of fuzziness.


Armah, et al., [15], considering that predicting software defects in the early stages of development decreases costs, compared the results obtained after preprocessing the data by filtering against the original dataset without preprocessing for early detection of defective software components. They compared the efficiency of 4 variants of the k-nearest neighbor technique (KNN-LWL, KStar, IBK, IB1), NNGE (Non-Nested Generalized Exemplars), random tree and random forest.

3. Overview of Support Vector Machine

The support vector machine is a machine learning algorithm based on statistical learning theory and the structural risk minimization principle, first proposed by Vapnik in 1995. The support vector machine is used for regression and classification, and its main idea is to transform the computation of a classification function into a quadratic programming problem with constraints. Samples are usually not linearly separable; in that case a kernel function is used, which maps the input data into a higher-dimensional feature space in which the samples become separable. The support vector machine has several important features: 1- it can work in high-dimensional spaces and with small learning samples, which means that its ability to learn can be independent of the dimensionality of the feature space; 2- it yields a global solution, since the support vector machine is formulated as a quadratic program; 3- it can combine applicable non-linear models with other techniques.
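As an illustration only, and not the authors' implementation, the following Python sketch shows how a kernel support vector machine of the kind described above could be trained on module metrics with scikit-learn; the metric matrix X and the label vector y are hypothetical placeholders.

# A minimal sketch (not the paper's implementation) of training a kernel SVM
# for defect classification with scikit-learn. X (module metrics) and y
# (1 = defective, -1 = non-defective) are placeholder arrays.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 21))               # 21 static code metrics per module (hypothetical)
y = np.where(rng.random(200) < 0.15, 1, -1)  # roughly 15% defective modules

# The RBF kernel maps samples into a higher-dimensional feature space in which
# a maximum-margin separating hyperplane is sought; C is the penalty parameter
# of the underlying quadratic program.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X, y)
print("training accuracy:", model.score(X, y))

The penalty parameter C plays the same regularizing role as the penalty coefficient referred to later in the proposed hybrid model.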

4. Multilayer Perceptron Neural Network

The multilayer perceptron neural network is a type of feed-forward neural network [16]. This network includes three kinds of layers: input, hidden and output. The number of cells in each layer is determined by trial and error. In multilayer perceptron neural networks, each neuron in a layer is connected to all neurons of the previous layer; such networks are called fully connected networks. Figure 1 shows a perceptron network with two hidden layers, 3 neurons in each hidden layer and 1 neuron in the output layer.

Figure 1. Multilayer Perceptron Network

The input layer acts as a buffer and a tool for preparing the data. The last layer, the output layer, contains the values predicted by the network and records the model output.



The middle layers, the hidden layers, which are formed by computational neurons, are where the data is processed. The network output is obtained according to formula (1):

$Y_i = F_i\left(\sum_j W_{ij} X_j + B_i\right)$    (1)

where $Y_i$ represents the network output, $X_j$ the network input, $W_{ij}$ the connection weights between the input and output nodes, $B_i$ the bias and $F_i$ the transfer function.
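The following minimal sketch (not taken from the paper) implements the forward pass of formula (1) for the fully connected network of Figure 1; the weights and biases are random placeholders rather than trained values, and the 21 inputs stand for a module's metric vector.

# Forward pass of Eq. (1), y = F(W x + B), applied layer by layer. Layer sizes
# follow Figure 1 (two hidden layers of 3 neurons, one output neuron).
import numpy as np

def forward(x, weights, biases, transfer=np.tanh):
    """Propagate input x through a fully connected multilayer perceptron."""
    a = x
    for W, b in zip(weights, biases):
        a = transfer(W @ a + b)   # Eq. (1) for one layer
    return a

rng = np.random.default_rng(1)
layer_sizes = [21, 3, 3, 1]                      # input, hidden, hidden, output
weights = [rng.normal(size=(m, n)) for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
biases  = [rng.normal(size=m) for m in layer_sizes[1:]]

x = rng.normal(size=21)                          # one module's metric vector (placeholder)
print("network output:", forward(x, weights, biases))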

5. The Hybrid Error Prediction Model

The aim of this model is to provide a fast and precise non-linear classifier for predicting software defects. Given that the multilayer neural network is one of the best-known universal approximators and can fit most datasets, and given the advantages of the support vector machine, we attempt to train the multilayer neural network in a way that achieves higher efficiency than, or at least efficiency similar to, other methods. In addition, to increase efficiency, to find better solutions at each step than at the previous ones, and to allow parallel implementation for speed on larger problems, the genetic algorithm is used as one of the best-known meta-heuristic methods. The stages of the prediction model are illustrated in Figure 2.

Figure 2. Implementation Stages

In order to increase the learning precision of the neural network in predicting software defects, a regularization method is used here to improve the generalization capability of the multilayer neural network [17]. Although there are alternative methods such as selective sampling, the regularization technique prevents the network from over-fitting the learning dataset.


There are three common regularization methods for neural networks: early stopping, curvature-driven smoothing and weight decay. In the early stopping method, the available data is divided into three subsets. The first is the learning dataset, used for updating the network weights and biases; the second is used as a validation dataset; and the third subset is used for evaluating the final performance. The curvature-driven smoothing method adds terms to the learning algorithm's cost function that depend on the curvature of the network mapping. The input values of the hybrid error prediction model are a matrix whose columns are the input vectors and a row vector of class labels, whose values must be 1 and -1, for training and validation. There are several adjustable parameters in the proposed hybrid algorithm. One is the selection pressure parameter, which can be increased or decreased; increasing its value leads to quicker convergence, but too large a value may drive the population toward a local minimum [18]. Another is the penalty parameter, which is similar to the corresponding parameter in the support vector machine and is used as a regularization parameter: a penalty coefficient is applied so that very large values of the weights and biases are penalized, which reflects the combination of the genetic algorithm with the neural network. An elitism mechanism forces the genetic algorithm to always keep a number of the best individuals in each generation, so that if chromosomes are destroyed by the crossover and mutation operators, efficiency does not decline [19]. The number of hidden neurons is specified in the input. The samples are first separated into their positive and negative classes, and the dataset is then balanced according to formula (2) whenever the negative samples outnumber the positive samples by more than 3 to 1. First, a one-dimensional array of random numbers between 0 and 1 is generated, with 3 times as many entries as there are positive samples. Each generated number is then multiplied by n-1, where n is the number of negative samples, rounded to an integer and incremented by one. This yields indices of negative samples whose count is exactly 3 times the number of positive samples; the number of negative samples discarded in this step is n-3p.

$I = \mathrm{Round}\big((n-1)\cdot \mathrm{Rand}(1, 3P)\big) + 1$    (2)
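A small sketch of the balancing step of formula (2) is given below. It is an illustrative reading, not the authors' code: all p positive samples are kept, and 3p indices into the n negative samples are drawn at random, so negatives outnumber positives by at most 3 to 1.

# Class balancing per Eq. (2): draw 3p random indices into the n negative
# samples, I = round((n-1) * rand(1, 3p)) + 1 (1-based, converted to 0-based).
import numpy as np

def balance_indices(n_negative, n_positive, rng=np.random.default_rng(2)):
    """Return indices of the negative samples to keep (at most 3 per positive)."""
    k = 3 * n_positive
    if n_negative <= k:                 # already balanced enough, keep everything
        return np.arange(n_negative)
    idx = np.round((n_negative - 1) * rng.random(k)).astype(int) + 1  # Eq. (2), 1-based
    return idx - 1                      # shift to 0-based Python indexing

print(balance_indices(n_negative=450, n_positive=50)[:10])

Note that, read literally, formula (2) samples negative indices with replacement, so occasional duplicates are possible.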

The algorithm then uses the Nguyen-Widrow function to randomly generate the initial population (of size Npop) with a uniform distribution. This function adjusts the weights and biases so that few neurons are wasted by their placement in the input space, and since every region of the input space is covered by neurons, learning proceeds quickly. After a number of iterations the neural network begins to fit the data too closely and the error on the validation dataset starts to rise. When the validation error has increased for a specified number of iterations, the learning algorithm is stopped and the weights and biases corresponding to the minimum validation error are accepted. Likewise, in order to control the complexity of the classifier, penalty values are used to penalize severely large weights and biases, forcing the neural network towards smoother, less extreme responses. To make each generation better than the previous one, chromosomes with higher fitness must have a greater chance of taking part in reproduction. The maximum and minimum of these values are used to calculate the fitness, individuals are ranked according to their fitness values, and the crossover operation is then applied to generate new individuals through random selection of parents by their ranks, according to the random variable shown in formula (3), where the underlying variable has a uniform distribution and a > 0 is the selection pressure parameter.



(3)

Eventually, the output obtained is the optimized set of neural network weights, which is used by the prediction model.
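To make the training procedure concrete, the sketch below outlines one plausible reading of the hybrid loop: a genetic algorithm searches the MLP weight space, the fitness combines the validation error with a penalty on very large weights and biases, elitism preserves the best chromosomes, and parents are chosen by a rank-based rule. Since formula (3) is not reproduced above, the select_parent rule, the decode helper that unpacks a flat chromosome into layer weights, and all numeric settings are assumptions for illustration; forward is the forward-pass function sketched in Section 4.

# A compact, assumed sketch of the GA-based MLP training loop described above.
import numpy as np

rng = np.random.default_rng(3)
layer_sizes = [21, 3, 3, 1]          # same architecture as the earlier sketch

def decode(w):
    """Unpack a flat chromosome into per-layer weight matrices and bias vectors."""
    weights, biases, pos = [], [], 0
    for n, m in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights.append(w[pos:pos + m * n].reshape(m, n)); pos += m * n
        biases.append(w[pos:pos + m]); pos += m
    return weights, biases

def fitness(w, X_val, y_val, penalty=1e-3):
    """Lower is better: validation error plus an SVM-like penalty on large weights."""
    weights, biases = decode(w)
    preds = np.sign([forward(x, weights, biases)[0] for x in X_val])
    return np.mean(preds != y_val) + penalty * np.sum(w ** 2)

def select_parent(ranked_pop, pressure=1.5):
    # Assumed rank-based rule (formula (3) is not reproduced in the text):
    # fitter individuals (smaller ranks) are favoured more as the selection
    # pressure a > 0 grows.
    i = int(len(ranked_pop) * rng.random() ** pressure)
    return ranked_pop[min(i, len(ranked_pop) - 1)]

def evolve(pop, X_val, y_val, elite=2, generations=50):
    for _ in range(generations):
        pop = sorted(pop, key=lambda w: fitness(w, X_val, y_val))
        children = list(pop[:elite])                         # elitism
        while len(children) < len(pop):
            p1, p2 = select_parent(pop), select_parent(pop)
            alpha = rng.random()
            child = alpha * p1 + (1 - alpha) * p2            # blend crossover
            children.append(child + rng.normal(scale=0.05, size=child.shape))  # mutation
        pop = children
    return min(pop, key=lambda w: fitness(w, X_val, y_val))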

6. Evaluation Method

To determine the efficiency of the proposed method, it is compared with a range of existing methods. The aim of this experimental study is to evaluate the proposed method for predicting error-prone software modules against the (RF, J48, NB1) [20], (CFS+NB) [21], (KNN, RBF, BBN, NB2) [4] and (CBA2, C4.5, RIPPER) [22] models. In addition, 10-fold cross validation is used to reflect the real efficiency of the proposed method.

6.1. Dataset

The data used in this research are obtained from the NASA MDP repository. Three of its datasets, open to the public since 2005 and containing information on different software systems studied by NASA, are used here. A brief description of the selected datasets is given in Table 1.

Table 1. Specifications of the Selected Datasets

Dataset | Language | Size (KLOC) | # of Modules | Defective Modules %
CM1     | C        | 20          | 505          | 10
KC1     | C++      | 43          | 2107         | 15
PC1     | C        | 40          | 1107         | 7

Each of these datasets contains data and features that describe the erroneous and error-free modules. Some of the measurement criteria used for modeling the software in this repository are listed in Table 2.

Table 2. Features of the Datasets

Metric           | Type             | Definition
V(g)             | McCabe           | Cyclomatic complexity
EV(g)            | McCabe           | Essential complexity
IV(g)            | McCabe           | Design complexity
LOC              | McCabe           | Total lines of code
N                | Derived Halstead | Total number of operands and operators
V                | Derived Halstead | Volume on minimal implementation
L                | Derived Halstead | Program length = V/N
D                | Derived Halstead | Difficulty = 1/L
I                | Derived Halstead | Intelligent count
E                | Derived Halstead | Effort to write program = V/L
B                | Derived Halstead | Effort estimate
T                | Derived Halstead | Time to write program = E/18 s
LOCode           | Line Count       | Number of lines of statements
LOComment        | Line Count       | Number of lines of comments
LOBlank          | Line Count       | Number of blank lines
LOCodeAndComment | Line Count       | Number of lines of code and comments
UniqOp           | Basic Halstead   | Number of unique operators
UniqOpnd         | Basic Halstead   | Number of unique operands
TotalOp          | Basic Halstead   | Total number of operators
TotalOpnd        | Basic Halstead   | Total number of operands
BranchCount      | Branch           | Total number of branches

These datasets have been used extensively in software engineering research in recent years. In this article, the KC1, PC1 and CM1 datasets are used.

6.2. Efficiency Measurement Criteria

There are many criteria for evaluating the efficiency of classification techniques, but the accuracy criterion, obtained from the confusion matrix in Table 3 according to formula (4), has been used in many articles.

Table 3. Confusion Matrix

Predicted \ Actual | Defective | Not defective
Defective          | TP        | FP
Not defective      | FN        | TN

$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$    (4)
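For completeness, a tiny helper computing formula (4) from the confusion-matrix counts of Table 3 might look as follows; the counts in the example call are made up, not taken from the paper.

def accuracy(tp, tn, fp, fn):
    # Eq. (4): proportion of modules classified correctly
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=30, tn=420, fp=25, fn=30))   # about 0.891 on these illustrative counts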

6.3. Cross Validation

In this experiment, 10-fold cross validation was used, in which the full sample is randomly divided into 10 subsamples. In each round, 9 of the 10 subsamples are used as learning data and the remaining subsample is used as validation data to test the model. The cross validation process is repeated 10 times, and the average over the 10 repetitions is taken as the final result.
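The sketch below illustrates this 10-fold protocol with scikit-learn's KFold and a stand-in MLP classifier; the paper's hybrid model would take the classifier's place, and the data arrays X and y are placeholders.

# 10-fold cross validation: each fold serves once as the test set, the other
# nine folds are used for learning, and the 10 accuracies are averaged.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

def cross_validate(X, y, n_splits=10, seed=4):
    """Average test accuracy over the n_splits folds, as described above."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True, random_state=seed).split(X):
        clf = MLPClassifier(hidden_layer_sizes=(3, 3), max_iter=1000)  # stand-in model
        clf.fit(X[train_idx], y[train_idx])                  # 9 folds for learning
        scores.append(clf.score(X[test_idx], y[test_idx]))   # 1 fold for testing
    return float(np.mean(scores))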

7. Empirical Results

In this section we present the results of applying the proposed method to the three datasets. These results refer to the test data and are obtained from 10-fold cross validation. The parameter calculated in this section is accuracy. Table 4 shows the results of the proposed method in comparison with those of the other methods on all three datasets. As can be seen, the proposed algorithm reaches the highest accuracy on every dataset.


Table 4. Results of Comparing the Accuracy Parameter on the Three Datasets

Algorithm       | CM1   | KC1   | PC1
Proposed method | 90.16 | 85.4  | 93.24
J48             | 85.46 | 84.16 | 90.90
NB1             | 82.26 | 82.44 | 88.27
RF              | 85.75 | 85.06 | 91.43
CFS + NB        | 84.84 | 82.46 | 89.00
KNN             | 83.27 | 83.99 | 91.82
RBF             | 89.91 | 84.81 | 92.84
BBN             | 76.83 | 75.99 | 90.44
NB2             | 86.74 | 82.86 | 89.21
CBA2            | 80.36 | 83.71 | 91.78
C4.5            | 85.12 | 81.34 | 88.39
RIPPER          | 84.52 | 82.91 | 92.07

The results in Table 4 confirm the high capability of the proposed algorithm. The accuracy values for the CM1 dataset are shown in Figure 3. As shown in the figure, the highest accuracy belongs to the proposed method and the lowest to the BBN algorithm, with RBF second to the proposed method. The superiority of the proposed method is obvious in this figure, and only RBF achieves results close to those of the proposed method.

Figure 3. Results on CM1 Dataset


Figure 4 shows the accuracy results on the KC1 dataset. As illustrated in the figure, the proposed method again ranks highest in accuracy. The RBF method achieves results very close to those of the proposed method, and the lowest accuracy belongs to the BBN algorithm.

Figure 4. Results on KC1 Dataset

Figure 5 once more confirms the higher efficiency of the proposed method in comparison with the others. RBF achieves the closest results and NB1 ranks last.

Figure 5. Results on PC1 Dataset

8. Conclusion

Since the early detection of defective software components helps software experts make optimal use of time and resources, increases reliability and improves the software control process, this article proposed a hybrid method to improve the precision of predicting defective components. The proposed method draws on the support vector machine algorithm, multilayer perceptron neural networks and evolutionary algorithms. In this method, a new learning approach was applied to the multilayer perceptron neural network, which increased network efficiency significantly. The efficiency of the hybrid method was compared against 11 statistical and machine learning models on 3 NASA datasets. The results indicate that the proposed method has higher efficiency than the other models. Acceptable efficiency on both large and small datasets is one of the advantages of the proposed hybrid method. In future work, evolutionary algorithms will be given a more prominent role, so that implementing new operators can increase the prediction accuracy.


References

[1] Y. Chen, X. Shen, P. Du and B. Ge, "Research on software defect prediction based on data mining", Computer and Automation Engineering, (2010) February 26-28, Singapore.
[2] M. Rawat and S. Dubey, "Software defect prediction models for quality improvement: a literature study", International Journal of Computer Science, vol. 9, (2012), pp. 288-296.
[3] T. Hall, S. Beecham, D. Bowes and S. Counsell, "A systematic review of fault prediction performance in software engineering", IEEE Transactions on Software Engineering, vol. 38, (2011), pp. 1276-1304.
[4] K. Elish and M. Elish, "Predicting defect-prone software modules using support vector machines", Journal of Systems and Software, vol. 81, (2008), pp. 649-660.
[5] C. Catal and B. Diri, "A systematic review of software fault prediction studies", Expert Systems with Applications, vol. 36, (2009), pp. 7346-7354.
[6] C. Catal, "Software fault prediction: A literature review and current trends", Expert Systems with Applications, vol. 38, (2011), pp. 4626-4636.
[7] A. G. Koru and H. Liu, "Building defect prediction models in practice", IEEE Software, vol. 22, (2005), pp. 23-29.
[8] A. Porter and R. Selby, "Empirically guided software development using metric-based classification trees", IEEE Software, vol. 7, (1990), pp. 46-54.
[9] M. Mertik, M. Lenic, G. Stiglic and P. Kokol, "Estimating software quality with advanced data mining techniques", International Conference on Software Engineering Advances, (2006) October 29, Tahiti.
[10] C. Chang, C. Chu and Y. Yeh, "Integrating in-process software defect prediction with association mining to discover defect patterns", Information and Software Technology, vol. 51, (2009), pp. 357-384.
[11] Y. Singh, A. Kaur and R. Malhotra, "Software fault proneness prediction using support vector machines", Proceedings of the World Congress on Engineering, (2009) January, London, U.K.
[12] H. A. Al-Jamimi and L. Ghouti, "Efficient prediction of software fault proneness modules using support vector machines and probabilistic neural networks", Software Engineering (MySEC), IEEE, (2011), pp. 251-256.
[13] H. Can, X. Jianchun, Z. Ruide and L. Juelong, "A new model for software defect prediction using particle swarm optimization and support vector machine", Control and Decision Conference (CCDC), (2013) May 25-27, Guiyang.
[14] K. Punitha and S. Chitra, "Software defect prediction using software metrics - a survey", Information Communication and Embedded Systems (ICICES), (2013) February 21-22, Chennai.
[15] G. K. Armah, G. Luo and K. Qin, "Multi-level data pre-processing for software defect prediction", Information Management, Innovation Management and Industrial Engineering (ICIII), (2013) November 23-24, Xi'an.
[16] V. K. Asari, "Training of a feedforward multiple-valued neural network by error back-propagation with a multilevel threshold function", IEEE Transactions on Neural Networks, vol. 12, (2001), pp. 1519-1521.
[17] O. Ludwig, U. Nunes and R. Araujo, "Eigenvalue decay: A new method for neural network regularization", Neurocomputing, vol. 124, (2013), pp. 33-42.
[18] O. Ludwig, P. Gonzalez and A. Lima, "Optimization of ANN applied to non-linear system identification", International Conference on Modeling, Identification, and Control, (2006) February 6, USA.
[19] L. J. Eshelman and J. D. Schaffer, "Real-coded genetic algorithms and interval-schemata", in L. Darrell Whitley (ed.), Foundations of Genetic Algorithms (FOGA), Morgan Kaufmann, (1992), pp. 187-202.
[20] A. Chug and S. Dhall, "Software defect prediction using supervised learning algorithm and unsupervised learning algorithm", The Next Generation Information Technology Summit, (2013) September 26-27, Noida.
[21] P. Wang, C. Jin and S. Jin, "Software defect prediction scheme based on feature selection", Information Science and Engineering (ISISE), (2012) December 14-16, Shanghai.
[22] B. Ma, K. Dejaeger, J. Vanthienen and B. Baesens, "Software defect prediction based on association rule classification", International Conference on E-Business Intelligence, (2010) December 19-21, China.


Authors

Mohamad Mahdi Askari
He received his B.Sc. degree in Information Technology Engineering from Payam Noor University of Bam in 2011 and is an M.Sc. student in Information Technology Engineering (software design and production) at the Islamic Azad University of Kerman. His main research interests include software defect prediction, software project management and data mining.

Vahid Khatibi Bardsiri
He is a lecturer at the Department of Computer Science, Islamic Azad University, Kerman Branch, Iran. He holds B.Sc. and M.Sc. degrees in software engineering from the Ferdowsi University of Mashhad, Iran (2002) and the Science and Research Branch of Islamic Azad University, Iran (2004), respectively. He received his Ph.D. in the area of software development effort estimation from Universiti Teknologi Malaysia (UTM) in 2013. He is a senior member of the International Association of Computer Science and Information Technology (IACSIT). His research interests are agile software development methods, soft computing techniques and software measurement.
