Ain Shams Engineering Journal (2014) xxx, xxx–xxx

Ain Shams University

Ain Shams Engineering Journal www.elsevier.com/locate/asej www.sciencedirect.com

ELECTRICAL ENGINEERING

Extracting software static defect models using data mining

Ahmed H. Yousef *

Department of Computers and Systems, Faculty of Engineering, Ain Shams University, Cairo, Egypt

Received 2 June 2014; revised 13 August 2014; accepted 20 September 2014

http://dx.doi.org/10.1016/j.asej.2014.09.007

* Tel.: +20 1001701882. E-mail address: [email protected].

Keywords: Defect models; Software testing; Software metrics; Defect prediction

Abstract: Large software projects are subject to quality risks of having defective modules that will cause failures during the software execution. Several software repositories contain the source code of large projects that are composed of many modules. These software repositories include data for the software metrics of these modules and the defective state of each module. In this paper, a data mining approach is used to show the attributes that predict the defective state of software modules. A software solution architecture is proposed to convert the extracted knowledge into data mining models that can be integrated with the current software project metrics and bug data in order to enhance the prediction. The results show better prediction capabilities when all the algorithms are combined using weighted votes. When only one individual algorithm is used, the Naïve Bayes algorithm has the best results, then the Neural Network and the Decision Trees algorithms.

© 2014 Production and hosting by Elsevier B.V. on behalf of Ain Shams University.

1. Introduction

The data mining approach is used to discover many hidden factors regarding software. This includes the success factors of software projects, which attracted researchers a long time ago [1], the support of software testing management [2] and the discovery of defect patterns [3]. The software defect estimation and prediction processes are used in the analysis of software quality [4]. They are also used to estimate the required effort and cost of maintenance after delivery [5]. The maintainability evaluation depends mainly on the use of software design metrics [6]. The quality of software depends on the maturity level of the software development processes [7]. In [8], Kelly stressed the importance of inspections to minimize defect densities, especially requirements and design inspections. Defect densities decreased exponentially in the coding phase because defects were fixed when detected and did not migrate to subsequent phases. This peer-review based inspection process becomes very expensive and unrealistic in the coding phase because the code review process is labor intensive; 8–20 LOC/minute can be inspected and this effort is repeated for every member of the review team [9]. Therefore, the prediction of potential defective modules (at the code level) is useful to prioritize which modules are the best candidates for these expensive code reviews, inspections and thorough testing.


This prediction process is a complementary approach that should be used carefully to avoid the trend of leaving blind portions of the system that may contain defects which may be missed. Because the assessment of software tends to go toward areas that are believed to be mission critical, several defect detectors based on static code measures have been proposed [10]. With the existence of software repositories, including [11], several attempts have been made to use empirical data to construct and validate different static defect models for multiple software projects or different versions of the same project [12]. One of these attempts is the NASA data sets online [13], which contain five large software projects with thousands of modules. Each set includes introductory text that lists and explains the static measures and other variables of each project. Data points correspond to modules and their static measures as well as a binary variable indicating whether the module is defective or not. In order to model the causes of defective modules, static measures obtained from source code, mainly size, coupling, cohesion, inheritance and complexity measures, are used to determine whether a module is defective or not [14]. In [2,3,15], data mining techniques are used to search for rules that indicate modules with a high probability of being defective. Other techniques include the use of SVM and service oriented architecture using expert COCOMO [16,17]. Koru and Liu used the PROMISE repository to analyze the datasets representing the five projects provided by NASA [14,18]. The data sets were stratified according to module size. They found improved prediction performance in the subsets that included larger modules. Based on these results, they developed some guidelines that practitioners can follow in their defect-prediction efforts. These guidelines can help to overcome the low prediction performance reported by Khoshgoftaar and Menzies [19,20]. However, these guidelines are not converted into a real software system that is integrated with compilers to guarantee that software developers follow the guidelines and write code that minimizes the probability of having defective modules. The absence of an established benchmark makes it hard, if not impossible, to compare approaches. Refs. [21,22] present a benchmark for defect prediction, in the form of a publicly available data set consisting of several software systems, and provide an extensive comparison of the explanative and predictive power of well-known bug prediction approaches. Ref. [23] proposed a framework for benchmarking the classification models for software defect prediction and found that no significant performance differences could be detected among the top 17 classifiers. An analysis of several software defect models is found in [24]. An extensive comparison of many machine learning algorithms is performed on the PROMISE data sets [19]. However, the results of this comparison do not indicate that a specific algorithm has significantly better prediction capabilities. The aforementioned challenges can be summarized in the following points. There is a need for effective software tools that use data from available software repositories to predict the defective modules of a new software project, in order to enable the project managers and the quality team to know where to invest their time in testing and debugging. Because there is no conclusive finding of the best data mining technique that solves the problem of predicting defective modules in software projects, these tools should apply and combine different data mining techniques and compare the results.

The contributions of this paper are worth consideration for two reasons. Firstly, it provides a solution architecture that can enhance software development based on data in software repositories. Most of the aforementioned references converted the software metrics into knowledge and rules that should be used by experienced development team members to enhance their code. However, there is a lack of a software architecture that can be implemented and integrated with compilers to use this extracted knowledge to warn the less experienced software development team members of potential defective modules and enable the project manager to assign more resources to these potential defective modules. The integration of the data mining models with bug tracking databases and metrics extracted from software source code leads to more accurate results. Secondly, the paper provides a benchmark that applies an ensemble of data mining models to the defective modules prediction problem and compares the results. This benchmark uses the same conditions, including the same input dataset, the same percentage of the training dataset and the same feature selection approach. The paper proposes a combined approach to get better prediction results according to the values of precision, recall, accuracy and F-measure.

The remainder of this paper is organized as follows: Section 2 discusses related work and highlights the most important approaches that are used to tackle the problem of extracting software defect models. This section can be skipped if the reader is familiar with the software defect models literature. Section 3 presents the methodology used in this research. This includes the used metrics, the proposed architecture, the data collection methodology and the used data mining algorithms. In Section 4, the results of applying the data mining tools on several projects are presented, together with a thorough analysis and discussion. Finally, Section 5 concludes the paper.

2. Related work

A software defect is defined as a deficiency or an issue arising from the use of a software product that causes it to perform unexpectedly. The ways to practically identify and prioritize defects are addressed in several studies, including [25]. Ref. [26] listed the top ten practices to reduce defects. An effective defect fighting strategy has many techniques that are described in [27,28]. Defects are real, observable indications of the software development progress from a schedule viewpoint [29]. Determining whether a new software change is buggy or clean is used to predict latent software defects before releasing the software to users [30]. Boehm found that about 80% of the defects come from 20% of the modules, and about half the modules are defect free [26]. Defects in the software cause failures of the programs during operation. Software failure intensity is related to the number of defects in the program [31]. Systematically conducted unit and system tests play an important role in revealing faults that lead to failures during the software execution [27]. It is natural to expect the continued discovery of defects after the software is placed into operation. Fixing defects in the operational phase is considerably more expensive than doing so in the development or testing phase; cost-escalation factors range from 5:1 to 100:1 [26]. The study of defect prediction can be classified into two categories: dynamic models and static models.


The dynamic models are used during the software development lifecycle to predict the future number of defects based on the defect distribution over time. These dynamic models use the number of defects detected in the earlier phases of the development process as the independent variable [32]. Defect prediction and estimation models are used to predict the total number of defects and their distribution over time. Dynamic models require defect data and are used once the tracking of defects starts [29]. The number of remaining defects in an application under development may cause faults in the product and should be studied carefully to ensure reliability [33–36]. This number is one of the most important factors that allow one to decide if a piece of software is ready to be released to customers [37,38]. An older review of these studies is found in [39]. Dynamic defect models follow distribution patterns including the Rayleigh distribution and other similar curves [40–42]. Ref. [43] found that the previous assumption of the Rayleigh distribution is not valid for current complex projects and proposed a linear combination of Rayleigh curves to match the collected empirical data. Ref. [44] found that software estimation models that are based on COQUALMO are useful in facilitating the right balance of activities to meet quality goals.

The static defect models attempt to predict the number of defects in a software product or project according to the product/project characteristics and metrics (i.e. measurements of software products [45]). The used software metrics include complexity, lines of code, volume, size and other metrics [46,47]. Fenton criticized the numerous software metrics and statistical models that use the size and complexity metrics to predict defects [48]. He recommended holistic models for software defect prediction, such as Bayesian Belief Networks. Several researchers have used and enhanced Bayesian network techniques since then [49–52]. Ref. [51] introduced a Bayesian Belief Networks based commercial software tool, AgenaRisk. Ref. [53] presented a framework for conducting software defect prediction as an aid for practitioner project managers and provided a guide to the body of existing studies on defect prediction. Other research regarding defect models includes regression models [54,55], statistical models and machine learning based models [56,57]. This includes artificial neural networks, instance-based reasoning, decision trees and rule induction. Many techniques are used as enhancements to predict unknown or missing data, like [58]. Other static models include regression models [19] and metric based technology [30]. Other infrastructures include SUDS for creating dynamic bug detection tools [59,60], which contains phases for both static analysis and dynamic instrumentation, allowing users to create tools that take advantage of both paradigms [59–61]. Because data regarding many characteristics are usually missing, rough sets can be used to overcome the missing data [62,63]. Ref. [58] introduced a set of fuzzy modeling systems that enable the use of easy-to-measure features to estimate the hard-to-measure features with a high accuracy. Many researchers searched for the best mathematical models for a special class of software systems or for a certain task like maintenance [64]. Starting from an existing static prediction model, Refs. [65,66] have introduced a new approach using the time-scale.

Because "noise" complicates the analysis of defects, Ref. [67] proposes approaches to deal with the noise in defect data by measuring the impact of noise on defect prediction models. It is found that the use of moving averages and exponential smoothing methods produces defect-occurrence projections that are worse than using the original data without processing [22]. Ref. [68] verified the existence of variability in a bug prediction model's accuracy over time. There is a difference between bugs in open source software (like Debian) and regular software bugs due to the difference in the number of testers and bug reporters [69]. In [70], an empirical case study regarding the releases of the OpenBSD open source project is presented. Rodriguez used datasets from the PROMISE repository and applied feature selection and a genetic algorithm to work only with those attributes from the datasets capable of predicting defective modules [71]. Guo suggested the use of the random forest approach to enhance the prediction capabilities [72]. Many studies suggest the use of data points that represent larger modules only to improve the prediction capabilities [73], because a large portion of the small modules is usually non-defective.

All the aforementioned references showed that the software defect prediction problem has two unsolved challenges. Firstly, the data mining models that converted the available software repositories into knowledge and guidelines for the development team should be embedded in a software solution architecture to be useful. This software solution architecture should be integrated with compilers and bug tracking systems to provide instant feedback about potential defective modules to the development team. Secondly, it is believed that using single detector models will not lead to accurate prediction. Therefore, a software system that combines different data mining models and compares them should be available for the problem of predicting defective software modules.

3. Methodology

The methodology of this paper is presented in the next subsections. These subsections include the used software metrics, the data collection procedures, the used data mining models, the solution architecture and the used performance criteria.

3.1. Software metrics

The used software metrics include the McCabe metrics, the Halstead metrics, the branch count and five different measures representing the lines of code [10]. The McCabe metrics are a collection of four software metrics: cyclomatic complexity, essential complexity, design complexity and Lines of Code (LOC). Cyclomatic complexity, or "v(G)", measures the number of "linearly independent paths". Essential complexity, or "ev(G)", is the extent to which a flowgraph can be "reduced". Design complexity, or "iv(G)", is the cyclomatic complexity of a module's reduced flowgraph. The flow graph, "G", of a module is reduced to eliminate any complexity which does not influence the interrelationship between design modules. Lines of code are measured according to McCabe's line counting conventions. The Halstead metrics are categorized into three groups: the base measures, the derived measures and the lines of code measures. The base measures include the number of unique operators, the number of unique operands, the total occurrences of operators and the total occurrences of operands.



Table 1  Properties of the software projects datasets used in the repository.

Project ID | Development location | Project | Language | Number of modules | Percent of defective modules (%) | Notes
01 | Location 1 | CM1 | C | 496 | 9.7 | A NASA spacecraft instrument
02 | Location 2 | JM1 | C | 10,885 | 19.0 | Real-time predictive ground system; uses simulations to generate predictions
03 | Location 3 | KC1 | C++ | 2107 | 15.4 | Storage management for receiving and processing ground data
04 | Location 3 | KC2 | C++ | 523 | 20.0 | Science data processing; another part of the same project as KC1 with different personnel. Shared some third-party software libraries with KC1, but no other software overlap
05 | Location 4 | PC1 | C | 1107 | 6.8 | Flight software for earth orbiting satellite
Total | | | | 15,118 | 17.9 |

The derived measures include the volume, difficulty, intelligence, effort to write the program and time to write it. The lines of code measures include blank lines, code lines, and code-and-comment lines.

3.2. Data collection

Data is collected from the NASA data sets online [13]. Each set includes introductory text that lists and explains the static measures and other variables. Data points correspond to modules and their static measures as well as a binary variable indicating whether the module is defective or not. The attributes of the used datasets are shown in Table 1. In this study, to avoid the critiques provided by Fenton [48], no data points are removed from any project unless they contain errors. The number of records containing errors does not exceed 0.1% of the data. For example, importing two records from the JM1 file caused errors due to missing column values.


It is worth mentioning here that although 10,885 records are stated to exist in the repository for project JM1, only 7087 records were available. Therefore, the total number of actually available records used is 11,325 records representing the five projects. The data used in this study came from five large software projects. The used attributes are shown in Table 2. The number of attributes is 22. The lines of code measures are represented by five different measures. Also, three McCabe metrics and twelve Halstead measures are used, and a branch count is used as an additional input field. The solution uses the defects attribute as the goal predictable field. Table 2 defines the attributes of the modules. All attributes are numeric except the predicted attribute (defects), which is Boolean. The equations governing the derived attributes are summarized in Table 3.

3.3. Data mining models

This paper uses four models for static defect prediction. They are Naïve Bayes, Neural Networks, Association Rules and Decision Tree. They are selected because of their capabilities for classification.

Table 2  Definitions of attributes of each project.

Attribute ID | Attribute | Definition
01 | loc | McCabe's line count of code
02 | v(g) | McCabe "cyclomatic complexity"
03 | ev(g) | McCabe "essential complexity"
04 | iv(g) | McCabe "design complexity"
05 | N | Halstead total operators + operands
06 | v | Halstead "volume"
07 | L | Halstead "program length"
08 | D | Halstead "difficulty"
09 | I | Halstead "intelligence"
10 | E | Halstead "effort": effort to write program
11 | B | Halstead "Number of Delivered Bugs"
12 | T | Halstead's time estimator: time to write program
13 | LOCode | Halstead's line count
14 | LOComment | Halstead's count of lines of comments
15 | LOBlank | Halstead's count of blank lines
16 | LOCodeAndComment | Halstead's count of lines which contain both code and comments
17 | uniq_Op | Unique operators
18 | uniq_Opnd | Unique operands
19 | total_Op | Total operators
20 | total_Opnd | Total operands
21 | branchCount | Branch count of the flow graph
22 | defects | Module has/has not one or more reported defects


Table 3  Governing equations of derived attributes.

Attribute | Equation
Halstead "difficulty" | D = 1/L
Halstead "intelligence" | I = L′ × V′
Halstead "effort": effort to write program | E = V/L
Halstead "Number of Delivered Bugs" | B = V/3000
Halstead "time estimator": time to write program | T = E/18 s
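Table 3 states only the relations among the derived attributes; it does not restate how the underlying length, volume and program level are obtained from the base counts. The sketch below is an illustration only: it assumes the standard Halstead definitions for those base quantities (an assumption, since the paper does not spell them out) and then applies the relations of Table 3. The function name and the example counts are hypothetical.

```python
import math

def halstead_derived(n1, n2, N1, N2):
    """Derive the Halstead measures of Table 3 from the four base counts.

    n1, n2 : number of unique operators / unique operands (uniq_Op, uniq_Opnd)
    N1, N2 : total occurrences of operators / operands   (total_Op, total_Opnd)

    The length N, volume V and program level L below use the standard Halstead
    definitions; Table 3 only gives the relations between the derived attributes,
    so these base formulas are an assumption.
    """
    N = N1 + N2                       # Halstead length (attribute N in Table 2)
    vocabulary = n1 + n2
    V = N * math.log2(vocabulary)     # volume
    L = (2.0 / n1) * (n2 / N2)        # program level (estimator)
    D = 1.0 / L                       # difficulty      (Table 3: D = 1/L)
    I = L * V                         # intelligence    (Table 3: I = L' x V')
    E = V / L                         # effort          (Table 3: E = V/L)
    B = V / 3000.0                    # delivered bugs  (Table 3: B = V/3000)
    T = E / 18.0                      # time in seconds (Table 3: T = E/18 s)
    return {"N": N, "V": V, "L": L, "D": D, "I": I, "E": E, "B": B, "T": T}

# Example: a small module with 10 unique operators, 15 unique operands,
# 60 operator occurrences and 40 operand occurrences.
print(halstead_derived(n1=10, n2=15, N1=60, N2=40))
```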

Then, they are combined together using the weighted voting rule approach. The output decision of each model is assigned a weight and the overall decision is the weighted average of the individual decisions. The models are applied to a table that contains the data of all the modules, representing the union or superset of all the aforementioned projects. The input data is randomly split into two sets, a training set and a testing set. The percentage of data used for testing is set to 30%. The training set is used to create the mining model; the testing set is used to check model accuracy. The Naïve Bayes algorithm uses Bayes' theorem and does not take into account the dependencies that may exist between input variables. The Neural Network algorithm creates a perceptron network that is composed of up to three layers of neurons. These layers are an input layer, an optional hidden layer and an output layer. Both the Naïve Bayes and the neural network models are readable by the machine but not readable by humans. Therefore, the development team will use their viewers, including the dependency network viewer and the attribute discrimination viewer, to explore relationships. Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository. They are readable by humans. For example, an association rule could be "If the number of lines of code is larger than 400, the module is 80% likely to be defective".
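The paper builds these models inside a data mining suite with graphical viewers, so the following is only a rough sketch of the same protocol in scikit-learn: a 70/30 random split and a weighted vote over individual classifiers. Association-rule classification has no direct scikit-learn counterpart and is omitted here; the file name all_modules.csv, the voting weights and the hyperparameters are assumptions, and the column names follow Table 2.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score

# Hypothetical CSV holding the union of the five project tables, with the
# Table 2 metric columns, a boolean "defects" column and a "project" column.
data = pd.read_csv("all_modules.csv")
X = data.drop(columns=["defects", "project"])
y = data["defects"].astype(int)

# 30% of the data is held out for testing, as described in Section 3.3.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

# Weighted vote over the individual learners (the weights are placeholders;
# the paper does not report the weights it assigned to each model).
ensemble = VotingClassifier(
    estimators=[
        ("naive_bayes", GaussianNB()),
        ("neural_net", MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000)),
        ("decision_tree", DecisionTreeClassifier(max_depth=5)),
    ],
    voting="soft",
    weights=[3, 2, 1],
)
ensemble.fit(X_train, y_train)
pred = ensemble.predict(X_test)

print("precision", precision_score(y_test, pred))
print("recall   ", recall_score(y_test, pred))
print("accuracy ", accuracy_score(y_test, pred))
print("f-measure", f1_score(y_test, pred))
```

A soft vote averages the predicted class probabilities with the given weights, which is closer to the weighted-average combination described above than a hard majority vote would be.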


The decision trees algorithm is a classification algorithm for use in predictive modeling. The decision trees algorithm builds a data mining model by creating a series of splits in the tree. These splits are represented as nodes. These nodes can be understood by the development team by using the decision tree viewer.

3.4. Solution architecture

The solution architecture of the proposed system is shown in Fig. 1. The software repositories data (shown in the top right corner) are fed to the data mining algorithms. The user can define which projects to include in the data mining process and which to exclude, depending on their similarity to the software project to be predicted. Several data mining models are created, including Naïve Bayes, Neural Networks, Association Rules and Decision Tree. When the current project's software development lifecycle starts, the source code of the current project (shown in the top left corner) is evaluated by the metrics evaluator and fed, together with the bug tracking data, to the predictor data mining algorithms. The results of applying the models to the current project data, to predict which modules are expected to be defective, are sent to the development team. Also, hints to address these defective modules are passed to the programmer, for example, "module1 should have fewer LOC" or "minimize the complexity of module2".

3.5. Performance criteria

In order to evaluate the performance of the different data mining models, the measures of precision, recall, accuracy and F-measure are used. In the statistical analysis of binary classification, the F-measure is a measure of a test's accuracy. It considers both the precision and the recall of the test to compute the score. The F-measure can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and its worst score at 0. The traditional F-measure is the harmonic mean of precision and recall.

Figure 1  Solution architecture for defective module identification using data mining and software metrics.




Precision = tp / (tp + fp)

Recall = tp / (tp + fn)

F-measure = 2 × Recall × Precision / (Recall + Precision)

where tp is the number of true positives, fp is the number of false positives, fn is the number of false negatives and tn is the number of true negatives.
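As a quick numeric check of these definitions (the accuracy formula, (tp + tn)/(tp + fp + fn + tn), is implied but not written out in the text), the sketch below recomputes the four measures from a confusion matrix. Fed with the association-rules counts reported later in Table 4 (tp = 35, fp = 16, fn = 776, tn = 2650), it reproduces the 68.6% precision, 4.3% recall, 77.2% accuracy and 8.1% F-measure listed there.

```python
def classification_measures(tp, fp, fn, tn):
    """Precision, recall, accuracy and F-measure from a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_measure = 2 * recall * precision / (recall + precision)
    return precision, recall, accuracy, f_measure

# Association-rules row of Table 4: tp = 35, fp = 16, fn = 776, tn = 2650.
p, r, a, f = classification_measures(tp=35, fp=16, fn=776, tn=2650)
print(f"precision={p:.1%} recall={r:.1%} accuracy={a:.1%} f-measure={f:.1%}")
# -> precision=68.6% recall=4.3% accuracy=77.2% f-measure=8.1%
```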


4. Results and discussion

Data is analyzed using the four different data mining algorithms. The results and discussion of executing the aforementioned data mining models are shown in the following subsections under the headings: simple correlation, Naïve Bayes, Neural Network, Association Rules and Decision Tree. Each subsection contains the results for the "All Modules Data Table" that represents the five projects. The section is finalized with a performance evaluation comparison of the algorithms.

4.1. Simple correlation

Data of the all modules data table show that the variations in defect probability between the different projects are very high.

Figure 2  Simple correlation between the defective field and the other fields (for all projects and for each individual project).

It is also clear that the order of the attributes is different from one project to another. Fig. 2 shows the order of the attributes that affect the defective attribute. As shown in the top left corner of Fig. 2, the project attribute is the factor that most affects defective and non-defective modules. Other secondary factors depend on the project. For example, project CM1 depends on the unique operand count and the comment line count (uniq_Opnd and LOComment), while JM1 depends on iv(g) and the branch count.
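A minimal sketch of how such a per-project attribute ranking could be reproduced, assuming the combined data are available as a CSV with the Table 2 attributes plus a project column (the file name and column names are assumptions):

```python
import pandas as pd

# Hypothetical CSV with the Table 2 attributes, a boolean "defects" column
# and a "project" column identifying CM1, JM1, KC1, KC2 or PC1.
data = pd.read_csv("all_modules.csv")
data["defects"] = data["defects"].astype(int)

# Correlation of every metric with the defective flag, over all projects...
overall = data.drop(columns=["project"]).corr()["defects"].drop("defects")
print(overall.abs().sort_values(ascending=False).head(10))

# ...and separately per project, to see how the attribute ranking changes.
for project, group in data.groupby("project"):
    per_project = group.drop(columns=["project"]).corr()["defects"].drop("defects")
    print(project, per_project.abs().sort_values(ascending=False).head(3).index.tolist())
```

Sorting the absolute correlations per project makes the reordering of the secondary factors between CM1, JM1 and the other projects directly visible.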

4.2. Results of Naïve Bayes classifier

The Naïve Bayes algorithm is applied to the data table representing the five projects. The dependency network of the Naïve Bayes algorithm is shown in Fig. 3. The results for the top twenty rows of the attribute discrimination viewer are shown in Fig. 4. The attribute discrimination viewer shows a colored bar that indicates how strongly each attribute value favors a certain value of the predictable attribute. Fig. 3 shows that the branch count is the most important factor that determines the defective attribute. The complexity measures iv(g) and v(g) come next in importance, and then the lines of code measure.


Figure 3  Dependency network of the Naïve Bayes classifier for the all-projects dataset.



Figure 4  Attribute values and their probability on the defective field.

Fig. 4 shows that when the design complexity iv(g) is small (less than 5.6) and the branch count is small (less than 9.8), the module is expected not to be defective. When the values of these attributes are higher, the probability of having a defective module is high.

4.3. Results of neural network classifier

The neural network attribute discrimination viewer is shown in Fig. 5. It shows the value range of each attribute, the expected state of the module (defective or not) and the expectation probability. The first lines of the discrimination viewer (Fig. 5) show that high values of complexity, represented by the essential complexity ev(g), the branch count and the design complexity iv(g), cause a high predicted probability of defective modules.

When the cyclomatic complexity v(g) is larger than 218, the module is expected not to be defective with probability 71.9%. The probability of having non-defective modules decreases to 63.6% when v(g) decreases to the range from 68.3 to 218.

4.4. Results of association rules classifier

The association rules viewer is shown in Fig. 6. It shows the association rules sorted by the result of the predicted defective attribute, then by importance and then by probability. The probability describes how likely the result of a rule is to occur. The importance is designed to measure the usefulness of a rule.




Figure 5  Attribute discrimination of the neural network data mining algorithm.

The dependency network is shown in Fig. 7, with the strongest relations appearing first and the weakest relations appearing last. The dependency network of the association rules data mining algorithm always contains a large number of nodes compared to the other algorithms, because it shows not only the attributes that affect the predicted state but also the possible values of these attributes.

4.5. Results of decision tree classifier

A decision tree is composed of a series of splits, with the most important split, as determined by the algorithm, at the left of the viewer in the "All" node. This is shown in Fig. 8. Additional splits occur to the right. The first split is the most important because it contains the strongest split-causing condition in the dataset. On each node in the tree, the viewer displays the condition that caused the split and a histogram that represents the distribution of the states of the predictable attribute, ordered by popularity. The background color of each node represents the concentration of cases. The dependency network of the decision tree algorithm is shown in Fig. 9. Fig. 9 shows that the branch count is the most important attribute, then the number of lines that contain both code and comment.

Again, the attributes are sorted in a way that shows weaker dependencies when moving in the clockwise direction.

4.6. Accuracy of models

Comparing the accuracy of the different models is an important task to evaluate the correctness of the data mining algorithms. Two techniques are used here: the lift chart and the classification matrix. The lift chart comparison of the data mining algorithms is shown in Fig. 10. Table 4 shows the classification matrix and the derived metrics of accuracy, precision, recall and F-measure. It is clear from the lift chart that the correct population percentage of all algorithms is nearly equal. This is consistent with the accuracy results represented by the accuracy column of Table 4. It is also clear that the association rules algorithm has a region with no improvement in the percentage of correct population, represented by the flat horizontal segment of the curve in the range between 40% and 60%. The derived measures of accuracy, precision, recall and F-measure are calculated from the classification matrices of the four algorithms. It is clear from Table 4 that although the accuracy values are nearly the same for the four algorithms, the differences in precision, recall and F-measure values are very significant.



Figure 6  Top 20 association rules and their favorite state.

Figure 7  Dependency chart for the association rules data mining algorithm.

This observation is very consistent with other references, including [10,74,75]. The best values of F-measure are for the Naïve Bayes algorithm, then the Neural Network and the Decision Trees algorithms.

The fifth row of Table 4 shows the use of the weighted voting algorithm to combine the decisions of the four models.




Figure 8  Decision tree.

The values of accuracy, precision, recall and F-measure are much better than those of any individual algorithm. However, the weighted voting algorithm has the drawback of having no viewers that explain the relationships between the input attributes and the defective module output attribute.

Figure 9  Dependency network of the decision tree algorithm.

Figure 10  Lift chart comparison of data mining algorithms.

Table 4  Classification matrix, precision, recall, accuracy and F-measure of data mining algorithms.

Data mining algorithm | Defective field | True (Actual) | False (Actual) | Precision (%) | Recall (%) | Accuracy (%) | F-measure (%)
Association rules | True (Predicted) | 35 | 16 | 68.6 | 4.3 | 77.2 | 8.1
Association rules | False (Predicted) | 776 | 2650 | | | |
Decision tree | True (Predicted) | 221 | 206 | 51.8 | 27.3 | 76.6 | 35.7
Decision tree | False (Predicted) | 590 | 2380 | | | |
Naïve Bayes | True (Predicted) | 367 | 468 | 44.0 | 45.3 | 73.2 | 44.6
Naïve Bayes | False (Predicted) | 444 | 2118 | | | |
Neural network | True (Predicted) | 263 | 364 | 41.9 | 32.4 | 73.2 | 36.6
Neural network | False (Predicted) | 548 | 2222 | | | |
Weighted voting rule of the four algorithms | True (Predicted) | 417 | 418 | 49.9 | 54.8 | 77.6 | 52.3
Weighted voting rule of the four algorithms | False (Predicted) | 344 | 2118 | | | |


5. Conclusion and future work

The knowledge in available software repositories can be integrated with current software development tools and bug tracking systems to notify software developers of potential defective software modules. Data mining algorithms can be used to show the software project managers, quality team members and programmers the relationships between the quality metrics and the potential defective modules using the different viewers. These viewers include the dependency network viewer, the attribute discrimination viewer and the lift chart comparison viewer. The data mining models provide reasonably accurate prediction of these defective modules. The best prediction performance measures achieved by an individual approach were for the Naïve Bayes algorithm, then the Neural Network algorithm and the Decision Trees algorithm. The weighted voting rule approach is used to combine the decisions of the four models and achieved better results for accuracy, precision, recall and F-measure. The author encourages researchers to check the generality and validity of the conclusions by applying the same methodology to new software projects. Multifunction optimization techniques may be compared with data mining techniques as well. Other remaining software engineering challenges should be considered. These challenges include the validity of the architecture and the software metrics tools in the presence of in-line assembly code, or of parallel programming code sections embedded inside C, as in the case of CUDA.

Acknowledgments

The author would like to acknowledge Sherehan Shams, Karim Sahar and Engy Ahmed for their help presenting the results of this research. Special thanks to all the individuals who participated and contributed to this paper. I thank the anonymous reviewers for their important comments.

References

[1] Yousef A et al. Software projects success factors identification using data mining. In: The 2006 international conference on computer engineering and systems. IEEE; 2006.
[2] Hewett R. Mining software defect data to support software testing management. Appl Intell 2011;34(2):245–57.
[3] Chang C-P, Chu C-P, Yeh Y-F. Integrating in-process software defect prediction with association mining to discover defect pattern. Inf Softw Technol 2009;51(2):375–84.
[4] QingWANG SW, Mingshu LI. Software defect prediction. J Softw Maint Evol: Res Practice 2008;19(7):1565–80 [In Chinese].
[5] Putnam LH. Example of an early sizing, cost and schedule estimate for an application software system. In: Computer software and applications conference, 1978. COMPSAC '78. The IEEE Computer Society's second international; 1978.
[6] El-lateef TA, Yousef A, Ismail M. Object oriented design metrics framework based on code extraction. In: International conference on computer engineering & systems; 2008.
[7] Harter DE, Krishnan MS, Slaughter SA. Effects of process maturity on quality, cycle time, and effort in software product development. Manage Sci 2000;46(4):451–66.
[8] Kelly JC, Sherif JS, Hops J. An analysis of defect densities found during software inspections. J Syst Softw 1992;17(2):111–7.
[9] Menzies T, Di Stefano J, Orrego A, Chapman R. Assessing predictors of software defects. In: Proc. workshop predictive software models; 2004.

[10] Menzies T, Di Stefano JS. How good is your blind spot sampling policy. In: Proceedings eighth IEEE international symposium on high assurance systems engineering. IEEE; 2004.
[11] Open Research Datasets in Software Engineering. [08.08.2014].
[12] Ramler R et al. Key questions in building defect prediction models in practice. In: Bomarius F et al., editors. Product-focused software process improvement. Berlin Heidelberg: Springer; 2009. p. 14–27.
[13] PROMISE Software Engineering Repository Public Datasets.
[14] Koru AG, Liu H. Building effective defect-prediction models in practice. Software IEEE 2005;22(6):23–9.
[15] Riquelme JC et al. JISBD 04: finding defective software modules by means of data mining techniques. Latin Am Trans IEEE (Revista IEEE America Latina) 2009;7(3):377–82.
[16] Liu G-J, Wang W-Y. Research on an educational software defect prediction model based on SVM. In: Zhang X et al., editors. Entertainment for education. Digital techniques and systems. Berlin Heidelberg: Springer; 2010. p. 215–22.
[17] Liu J et al. A defect prediction model for software based on service oriented architecture using expert COCOMO. In: Proceedings of the 21st annual international conference on Chinese control and decision conference. Guilin, China: IEEE Press; 2009. p. 2639–42.
[18] Güneş Koru A, Tian J. An empirical comparison and characterization of high defect and high complexity modules. J Syst Softw 2003;67(3):153–63.
[19] Khoshgoftaar TM, Seliya N. The necessity of assuring quality in software measurement data. In: Proceedings 10th international symposium on software metrics; 2004.
[20] Menzies T, Di Stefano JS, Cunanan C, Chapman R. Mining repositories to assist in project planning and resource allocation. In: International workshop on mining software repositories; 2004.
[21] D'Ambros M, Lanza M, Robbes R. An extensive comparison of bug prediction approaches. In: 7th IEEE working conference on mining software repositories (MSR). IEEE; 2010.
[22] D'Ambros M, Lanza M, Robbes R. Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empirical Softw Eng 2012;17(4–5):531–77.
[23] Lessmann S et al. Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 2008;34(4):485–96.
[24] Yu T-J, Shen VY, Dunsmore HE. An analysis of several software defect models. IEEE Trans Softw Eng 1988;14(9):1261–70.
[25] Beal CW. Practical use of defect detection and prediction in the development and maintenance of software. In: Proceedings of the 4th international workshop on predictor models in software engineering. Leipzig, Germany: ACM; 2008. p. 45–6.
[26] Boehm B, Basili VR. Software defect reduction top 10 list. In: Foundations of empirical software engineering: the legacy of Victor R. Basili; 2005. p. 426.
[27] Grottke M, Trivedi KS. Fighting bugs: remove, retry, replicate, and rejuvenate. Computer 2007;40(2):107–9.
[28] McDonald M, Musson R, Smith R. The practical guide to defect prevention. Microsoft Press; 2007. p. 480.
[29] Laird LM, Brennan MC. Software measurement and estimation: a practical approach. IEEE Comput Soc 2007.
[30] Sunghun K, Whitehead EJ, Yi Z. Classifying software changes: clean or buggy? IEEE Trans Softw Eng 2008;34(2):181–96.
[31] Hartman P. Utility of popular software defect models. Seattle, WA; 2002.
[32] Putnam LH. A general empirical solution to the macro software sizing and estimating problem. IEEE Trans Softw Eng 1978;SE-4(4):345–61.
[33] Arisholm E, Briand LC, Johannessen EB. A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. J Syst Softw 2010;83(1):2–17.


[34] Hall T, Beecham S, Bowes D, Gray D, Counsell S. A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 2012;38(6):1276–304.
[35] Radjenović D et al. Software fault prediction metrics: a systematic literature review. Inf Softw Technol 2013;55(8):1397–418.
[36] Singpurwalla ND, Simon PW. Software reliability modeling. Int Stat Rev/Revue Int Stat 1994;62(3):289–317.
[37] Thwin MMT. Estimating software readiness using predictive models. In: Electrical and electronic engineering. Nanyang Technological University; 2006.
[38] Quah T-S. Estimating software readiness using predictive models. Inf Sci 2009;179(4):430–45.
[39] Clark B, Zubrow D. How good is the software: a review of defect prediction techniques. Sponsored by the US Department of Defense; 2001.
[40] Li PL et al. Empirical evaluation of defect projection models for widely-deployed production software systems, vol. 29. ACM; 2004.
[41] Bergander T, Yan L, Ben Hamza A. Software defects prediction using operating characteristic curves. In: IEEE international conference on information reuse and integration, IRI 2007; 2007.
[42] Raja U, Hale DP, Hale JE. Modeling software evolution defects: a time series approach. J Softw Maint Evol: Res Pract 2009;21(1):49–71.
[43] Yousef AH. Analysis and enhancement of software dynamic defect models. In: International conference on networking and media convergence, ICNM 2009; 2009.
[44] Madachy R, Boehm B, Houston D. Modeling software defect dynamics. University of Southern California: Center for Systems and Software Engineering; 2010.
[45] Gopal A et al. Measurement programs in software development: determinants of success. IEEE Trans Softw Eng 2002;28(9):863–75.
[46] McCabe TJ. A complexity measure. In: Proceedings of the 2nd international conference on software engineering. San Francisco, California, USA: IEEE Computer Society Press; 1976. p. 407.
[47] Halstead MH. Elements of software science (Operating and programming systems series). Elsevier Science Inc.; 1977. p. 128.
[48] Fenton NE, Neil M. A critique of software defect prediction models. IEEE Trans Softw Eng 1999;25(5):675–89.
[49] Fenton N, Krause P, Neil M. A probabilistic model for software defect prediction. IEEE Trans Softw Eng 2001.
[50] Fenton N et al. Predicting software defects in varying development lifecycles using Bayesian nets. Inf Softw Technol 2007;49(1):32–43.
[51] Neil M, Fenton N. Improved software defect prediction. In: 10th European SEPG, London; 2005.
[52] Fenton N et al. On the effectiveness of early life cycle defect prediction with Bayesian Nets. Emp Softw Eng 2008;13(5):499–537.
[53] Wahyudin D, Ramler R, Biffl S. A framework for defect prediction in specific software project contexts. In: Huzar Z et al., editors. Software engineering techniques. Berlin Heidelberg: Springer; 2011. p. 261–74.
[54] Muhammad Dhiauddin, Mohamed Suffian SI. A prediction model for system testing defects using regression analysis. Int J Soft Comput Softw Eng (JSCSE) 2012;2(7).
[55] Ostrand T, Weyuker E. Progress in automated software defect prediction. In: Chockler H, Hu A, editors. Hardware and software: verification and testing. Berlin Heidelberg: Springer; 2009. p. 200–4.
[56] Challagulla VUB et al. Empirical assessment of machine learning based software defect prediction techniques. In: 10th IEEE international workshop on object-oriented real-time dependable systems, WORDS 2005; 2005.
[57] Challagulla VUB et al. Empirical assessment of machine learning based software defect prediction techniques. Int J Artif Intell Tools 2008;17(02):389–400.

[58] Fakhrahmad S, Sami A. Effective estimation of modules' metrics in software defect prediction. In: Proceedings of the world congress on engineering; 2009.
[59] Larson E. SUDS: an infrastructure for creating bug detection tools. In: Seventh IEEE international working conference on source code analysis and manipulation, SCAM 2007; 2007.
[60] Larson E. SUDS: an infrastructure for creating dynamic software defect detection tools. Autom Softw Eng 2010;17(3):301–46.
[61] Larson E. SUDS: an infrastructure for dynamic software bug detection using static analysis. SIGSOFT Softw Eng Notes 2006;31(6):1–2.
[62] Yang W, Li L. A rough set model for software defect prediction. In: Proceedings of the 2008 international conference on intelligent computation technology and automation, vol. 01. IEEE Computer Society; 2008. p. 747–51.
[63] Weimin Y, Longshu L. A new method to predict software defect based on rough sets. In: First international conference on intelligent networks and intelligent systems, ICINIS '08; 2008.
[64] Li PL, Shaw M, Herbsleb J. Selecting a defect prediction model for maintenance resource planning and software insurance. In: The fifth workshop on economics-driven software research. IEEE Computer Society; 2003. p. 32–7.
[65] Vladu AM. Software reliability prediction model using Rayleigh function. Univ Polit Buch Sci Bull C 2011;73(4).
[66] Vladu AM, Fagarasan I. On prediction of defect rates. In: IEEE international conference on automation quality and testing robotics (AQTR); 2012.
[67] Kim S et al. Dealing with noise in defect prediction. In: Proceedings of the 33rd international conference on software engineering. Waikiki, Honolulu, HI, USA: ACM; 2011. p. 481–90.
[68] Ekanayake J et al. Time variance and defect prediction in software projects. Emp Softw Eng 2012;17(4–5):348–89.
[69] Davies J et al. Perspectives on bugs in the Debian bug tracking system. In: 7th IEEE working conference on mining software repositories (MSR). IEEE; 2010.
[70] Li PL, Herbsleb J, Shaw M. Forecasting field defect rates using a combined time-based and metrics-based approach: a case study of OpenBSD. In: 16th IEEE international symposium on software reliability engineering, ISSRE 2005; 2005.
[71] Rodriguez D, Herraiz I, Harrison R. On software engineering repositories and their open problems. In: First international workshop on realizing artificial intelligence synergies in software engineering (RAISE); 2012.
[72] Guo L et al. Robust prediction of fault-proneness by random forests. In: 15th international symposium on software reliability engineering, ISSRE. IEEE; 2004.
[73] Khoshgoftaar TM et al. Application of neural networks to software quality modeling of a very large telecommunications system. IEEE Trans Neural Networks 1997;8(4):902–9.
[74] Menzies T et al. Defect prediction from static code features: current results, limitations, new approaches. Autom Softw Eng 2010;17(4):375–407.
[75] Menzies T, Greenwald J, Frank A. Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 2007;33(1):2–13.

Ahmed H. Yousef is an associate professor in the department of computers and systems, Faculty of Engineering, Ain Shams University, Egypt since 2009. He received his B.Sc., M.Sc. and Ph.D. degrees in Computers and Systems Engineering from Ain Shams University, Cairo, Egypt in 1995, 2000 and 2004, respectively. The major fields of his studies are Software Engineering, Data Mining and Artificial Intelligence.
