Source Code Metrics and Maintainability: a Case Study

Péter Hegedűs¹, Tibor Bakota¹, László Illés¹, Gergely Ladányi², Rudolf Ferenc¹, and Tibor Gyimóthy¹

¹ University of Szeged, Department of Software Engineering
Árpád tér 2., H-6720 Szeged, Hungary
{hpeter,bakotat,Illes.Laszlo.2,ferenc,gyimothy}@inf.u-szeged.hu

² DEAK Cooperation Research Private Unlimited Company
Dugonics tér 13., H-6720 Szeged, Hungary
[email protected]

Abstract. Measuring high level quality attributes of operation-critical IT systems is essential for keeping the maintainability costs under control. International standards and recommendations, like ISO/IEC 9126, give some guidelines regarding the different quality characteristics to be assessed; however, they do not define their relationship to the low level quality attributes unambiguously. The vast majority of existing quality models use source code metrics for measuring low level quality attributes. Although a lot of research analyzes the relation of source code metrics to other objective measures, only a few studies deal with how well they express the subjective impressions of IT professionals. Our research involved 35 IT professionals and the manual evaluation of 570 class methods of an industrial and an open source Java system. Several statistical models were built to evaluate the relation between low level source code metrics and the high level subjective opinions of IT experts. A decision tree based classifier achieved a precision of over 76% when estimating the Changeability ISO/IEC 9126 attribute.

Keywords: Metrics evaluation, Empirical quality model, ISO/IEC 9126, Software maintainability

1 Introduction

Many important areas of our lives are supported and controlled by software systems. We rely on them; moreover, in some cases we entrust our lives to them (e.g. flight control systems or nuclear facilities). This fact has made software quality and reliability substantial and unavoidable research areas. The internal quality of many industrial systems has deteriorated owing to long evolution phases of 10-20 years. For these systems, continuous quality monitoring is essential for keeping the maintainability costs under control. Measuring different high level quality aspects allows developers and managers to make the right decisions, to back up intuition, to estimate future costs and to assess risks. International standards and recommendations like ISO/IEC 9126 [16], ISO/IEC 25000/SQuaRE [17], ISO/IEC 15504/SPICE [15], Automotive SPICE (http://www.automotivespice.com) and CMMI [6] give guidelines regarding the different quality characteristics to be assessed.

In this paper our focus is on the ISO/IEC 9126 standard, which defines six high level product quality characteristics that are widely accepted both by industrial experts and academic researchers. These characteristics are: Functionality, Reliability, Usability, Efficiency, Maintainability and Portability. The characteristics are affected by low level quality properties, which can be internal (measured by looking inside the product, e.g. by analyzing the source code) or external (measured by executing the product, e.g. by performing testing). For the lowest level quality attributes we used metrics, which quantify different attributes of the source code (e.g. size, complexity, coupling, coding rule violations, rate of code clones, etc.). Our research focuses on the relationship between the low level source code metrics and the high level quality characteristics defined by the standard.

Most of the related research tackles the correlation of source code metrics with objective measures like failure rates during operation or bug numbers reported in an issue tracking system. While this approach does not require considerable manual work, the reliability of its results depends heavily on the reliability of the data collected during operation or recorded in the issue tracking system. In this research, contrary to the above mentioned approaches, we invested a large amount of manual work in order to gather reliable information regarding high level quality attributes of the systems' source code. The research involved 35 IT professionals and the manual evaluation of 570 class methods of an industrial and an open source Java system. Our aim was to find correlations between low level objective measures of source code elements (i.e. source code metrics) and the high level subjective opinions of IT professionals. Besides drawing conclusions from the collected data, several machine learning models were trained for estimating the high level quality characteristics. In the case of Changeability, a decision tree based classifier achieved a precision of over 76% during estimation, using 10-fold cross-validation.

The paper focuses on the following two research questions:

RQ1: How do the well-known source code metrics correlate individually with the subjective opinions of IT experts regarding Maintainability?

RQ2: How do the well-known source code metrics perform together as predictors when using machine learning algorithms for assessing the subjective opinions of IT experts regarding Maintainability?

The rest of the paper is structured as follows. In Section 2 we overview the related work. Then, in Section 3 we introduce the approach and the technical details of the performed case study and the employed analysis methods. Section 4 presents the results of the case study. Afterwards, Section 5 discusses the known threats to the validity of our work. Finally, we conclude the paper and present future work in Section 6.

2 Related Work

The ISO/IEC 9126 standard is currently one of the most accepted standards for measuring software quality. Numerous adaptations and customizations have been developed for evaluating the quality characteristics defined by the standard.

Most of these approaches utilize source code metrics as low level quality attributes and apply sophisticated methods for aggregating the values to higher levels. However, only a few papers tackle the connection between the metrics and the subjective impressions of IT professionals about the higher level characteristics.

Jung and Kim [19] examined the validity of the structure of the ISO/IEC 9126 standard based on a survey. They focused on the connection between characteristics and subcharacteristics by grouping the latter based on the answers of the evaluators. The authors found that most of the resulting groups corresponded to a characteristic defined by the standard. Here, we did not examine the connection between the subcharacteristics and characteristics, but between the code metrics and the subcharacteristics.

Many studies use source code metrics for estimating software quality in terms of fault-proneness. For example, Olague et al. [20] studied the ability to predict fault-proneness by using the CK [5], QMOOD [3] and MOOD [13] metric suites. Basili et al. [4] and Gyimóthy et al. [11] calculated code metrics and used regression and machine learning techniques to predict fault-proneness. We used metrics and applied machine learning techniques to predict maintainability instead of fault-proneness.

Many studies propose maintainability models based on source code metrics. Bakota et al. [2] suggest a probabilistic approach for computing the maintainability of a system. They use benchmark databases to ensure objectiveness, and apply probabilistic aggregation to capture the subjectiveness of high level characteristics. Heitlager et al. [14] also introduce a maintainability model. They transform metric value averages to the [-2,2] discrete scale and perform an aggregation to get a measure for maintainability. Bansiya and Davis [3] developed a hierarchical model (QMOOD) for the assessment of high level design quality attributes and validated it on two large commercial framework systems. Our aim was not to aggregate metrics to high level attributes, but to reveal the connection between the low level metrics and the high level characteristics assessed by IT experts. We built models for estimating high level attributes by applying a top-down approach. In our case, machine learning algorithms were trained with source code metrics as predictors and the results of the manual assessment as the classes to be learned.

Many approaches try to quantify subcharacteristics by using surveys before the aggregation to higher levels. Chua and Dyson [7] estimated the quality of an e-learning system merely based on end users' opinions. They demonstrated the validity of their model with a case study and showed how it could be used to detect design flaws. Others used both subjective opinions and metrics calculated from the code for estimating high level properties. Similarly, we also used expert opinions and code metrics, but we used the latter to learn the former by means of machine learning techniques. We use the resulting models to predict maintainability based on source code metrics.

Bagheri and Gasevic [1] assessed the maintainability of software product line feature models based on structural metrics.

They manually evaluated the high level maintainability attributes of different feature models and studied the correlation between the low level structural metrics and these high level maintainability attributes. They also applied machine learning algorithms to predict the subjective ratings of the high level attributes using the structural metrics as predictors. Although their work is similar to ours, there are a number of differences. First of all, their focus is on the maintainability of product line feature models, while we studied software product quality; it follows that the metrics suite we study is different. Our survey involved 35 IT experts who had more than two years of programming experience on average, while they asked graduate students to answer their questions. They studied the Analyzability, Changeability, and Understandability attributes, while we investigated Stability and Testability as well.

3 Approach

In order to analyze the relationship between the source code metrics and the high level maintainability attributes, we performed a time-consuming manual evaluation task. IT experts evaluated 570 class methods of two Java systems from five different aspects of quality. The purpose of the evaluation was to collect subjective ratings of different quality attributes for a large number of methods. To ease the evaluation process, we developed a web-based framework to collect, store, and organize the evaluation results. In this section we give a brief overview of the evaluated systems, the evaluation process, and the developed framework itself.

3.1 Evaluated Systems

One of the evaluated systems was JEdit (http://www.jedit.org), a well-known text editor designed for programmers. It is a powerful tool written in Java that eases writing source code in several languages; it includes syntax highlighting, built-in macros, plug-in support, etc. The system contains more than 700 methods (over 20,000 lines of code), from which we selected 320 for evaluation. The main selection criterion was the length of the methods; e.g. we skipped getter/setter methods and generated ones. The other evaluated system was an industrial software product, which contained more than 20,000 methods and over 200,000 lines of code. From this abundance of methods, we selected 250 for evaluation. The evaluation was performed by 35 experts, who varied in age and programming experience.

3.2 The Evaluation Framework

The developed Metric Evaluation Framework is a complex system, capable of analyzing Java source code, storing and visualizing artifacts, and guiding the user through the evaluation process. The system consists of four modules:
• AnalyzeManager - controls the Columbus analyzer tools [8–10] for computing low level source code metrics and other analysis results.
• Uploader - uploads the source code artifacts into a database.
• AdminPages - web interface to manage and control the analysis process.

• EvalPages - web interface providing the necessary metric information and allowing the users to evaluate the methods.

The Columbus analyzer tools produce metric information based on the source code and its structure. The results of this process are then handled by the Uploader component, which processes and uploads the information into a database. The AnalyzeManager and Uploader modules are hidden from the users (the experts involved in the evaluation task). From the users' point of view, the other two modules are more important.

Fig. 1. Metric Evaluation Framework screens: (a) item selector screen, (b) evaluator screen

The AdminPages module is a web interface where the users and the projects can be managed. The analysis of Java sources can also be initiated from this interface. The most important module is EvalPages, where the user can evaluate the source code of the projects.

First, the user has to select a method and an aspect from which the evaluation is performed. This can be done using the item (method) selector screen (see Figure 1(a)). The questions are organized into the following five categories:
• Analyzability - how easy it is to diagnose the system for deficiencies or to identify where to make a change
• Changeability - how easy it is to make a change in the system (including designing, coding and documenting the change)
• Stability - how well the system avoids unexpected effects after a change
• Testability - how easy it is to validate the software after a change
• Comprehension - how easy it is to comprehend the source code of a method (understanding its algorithm)

The first four aspects are defined by the ISO/IEC 9126 standard as subcharacteristics of the Maintainability characteristic. The standard defines a fifth subcharacteristic, Compliance, but it has no practical meaning to a programmer, so we left it out. Furthermore, Comprehension is not part of the standard, but the experts agreed that it should be included.

After selecting an item and an aspect, the evaluator panel appears (see Figure 1(b)), where the evaluation can be performed. On the left-hand side of the screen, the item's source code can be seen.

On the bottom left, there are two tables with the metric values and rule violations (if present) for the current item. On the right-hand side, the questions and text boxes for textual answers are placed. With the help of these questions the users can form their own opinion regarding the item. It is important to note that every aspect has its own questions. Furthermore, the questions asked depend on the user's earlier answers. An example can be seen in Figure 2. Each node of the graph represents a question (starting from the white colored node) that is asked of the evaluators. The edges of the graph show the next question asked, based on the evaluator's answer (the possible answers are the labels of the edges).

Fig. 2. Sample questions (Stability)
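The answer-dependent question flow of Figure 2 is essentially a directed graph whose edges are labeled with the possible answers. A minimal sketch of one way to represent such a flow is shown below; the class and method names are illustrative and are not taken from the actual framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal sketch of an answer-dependent question graph (names are illustrative). */
final class Question {
    final String text;
    // Maps a possible answer (edge label) to the follow-up question; no entry ends the flow.
    private final Map<String, Question> next = new LinkedHashMap<>();

    Question(String text) { this.text = text; }

    Question addTransition(String answer, Question followUp) {
        next.put(answer, followUp);
        return this;
    }

    /** Returns the next question for the given answer, or null if the flow ends here. */
    Question nextFor(String answer) { return next.get(answer); }
}
```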

After the user finishes the evaluation, the given answers and ratings are stored in the project's database. The collected information is then used to build models that are able to predict high level quality characteristics based on metric values.

4 Experimental Results

This section presents the results of our case study. During the evaluation process all of the previously mentioned 570 methods were evaluated one by one using the Metric Evaluation Framework, and the results were stored in a database. Besides the code metric values we stored the high level attribute values (poor, average, good) assessed by one of the 35 IT experts involved in the evaluation process. We imported these data into the Weka Experimenter [12] to build models using different machine learning algorithms.

First, we present the source code metrics that were calculated and used as predictors for the machine learning algorithms, together with the correlations among the metric values measured on our test projects. Second, we examine the connection between the code metrics and the high level maintainability attributes, and we try to answer our research questions through the experimental results.


4.1 The Applied Source Code Metrics

The method-level source code metrics that we examined and used as predictors for the machine learning algorithms in our case study are the following: Number of Outgoing Invocations (NOI); Lines Of Code (LOC); Logical Lines Of Code (LLOC); Number Of Statements (NOS); Number of Local Methods Accessed (NLMA); Nesting Level (NL); Number of Foreign Methods Accessed (NFMA); Number of Incoming Invocations (NII); Number of Parameters (NPAR); McCabe Cyclomatic Complexity (McCC); Clone Coverage (CC); Number of PMD warnings (http://pmd.sourceforge.net) in a method (PMD).

Table 1. Pearson correlation between the code metrics

      NOI  LOC  LLOC NOS  NLMA NL   NFMA NII  NPAR McCC CC   PMD
NOI   1.00 0.17 0.12 0.10 0.38 0.01 0.80 0.00 0.01 0.03 0.00 0.01
LOC        1.00 0.97 0.89 0.09 0.21 0.12 0.00 0.01 0.73 0.00 0.54
LLOC            1.00 0.95 0.08 0.20 0.07 0.00 0.01 0.82 0.00 0.64
NOS                  1.00 0.08 0.19 0.05 0.00 0.00 0.86 0.00 0.72
NLMA                      1.00 0.00 0.04 0.00 0.01 0.05 0.00 0.03
NL                             1.00 0.00 0.00 0.00 0.17 0.03 0.03
NFMA                                1.00 0.01 0.00 0.01 0.00 0.00
NII                                      1.00 0.01 0.00 0.00 0.00
NPAR                                          1.00 0.00 0.00 0.00
McCC                                               1.00 0.00 0.79
CC                                                      1.00 0.00
PMD                                                          1.00

Table 1 shows the Pearson correlations (R²: coefficient of determination) between the metric values measured for the methods of the Java projects (see Section 3.1). We found very high correlations among the following metrics: LOC, LLOC, NOS, McCC and PMD. The correlation between the LOC, LLOC and NOS metrics is not surprising, since they are similar size measures. The high correlation between the method size measures, McCC and the number of PMD warnings is more interesting. It means that, for our test projects, the larger and more complex the code of a method was, the more coding rule violations it contained, and vice versa. The NOI metric correlates well with NFMA, which is again not surprising, since NOI is a generalization of NFMA.
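As an illustration, the following sketch (not part of the original study) shows how such a correlation value can be computed for two metric series measured on the same set of methods; squaring Pearson's r gives the R² values reported in Table 1. The sample values are hypothetical.

```java
/** Minimal sketch: Pearson's r between two metric series, e.g. LOC and McCC
 *  values measured for the same methods. Squaring r yields R^2 as in Table 1. */
public final class Pearson {

    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("series must have equal, non-zero length");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= x.length;
        meanY /= y.length;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov  += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical per-method metric values, for illustration only.
        double[] loc  = { 12, 45, 80, 23, 150, 60 };
        double[] mccc = {  2,  7, 11,  3,  19,  8 };
        double r = pearson(loc, mccc);
        System.out.printf("r = %.2f, R^2 = %.2f%n", r, r * r);
    }
}
```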

4.2 Relationship of Individual Metrics and High Level Attributes

To answer our first research question we examined the correlation matrix between the code metrics and the high level maintainability attributes.

Table 2. Pearson correlation between the code metrics and maintainability attributes

            NOI   LOC   LLOC  NOS   NLMA  NL    NFMA  NII   NPAR  McCC  CC    PMD
Analyzab.  -0.38 -0.41 -0.38 -0.34 -0.23 -0.16 -0.35 -0.03 -0.05 -0.27  0.12 -0.22
Changeab.  -0.35 -0.41 -0.38 -0.35 -0.20 -0.17 -0.33 -0.02 -0.10 -0.29  0.09 -0.21
Stability  -0.28 -0.35 -0.34 -0.31 -0.19 -0.13 -0.24  0.00 -0.06 -0.26  0.07 -0.22
Testab.    -0.25 -0.38 -0.37 -0.34 -0.16 -0.34 -0.22  0.01 -0.07 -0.29 -0.02 -0.24
Comprehen. -0.34 -0.38 -0.36 -0.33 -0.22 -0.15 -0.30  0.02 -0.10 -0.26  0.09 -0.21

Table 2 shows that the assessed source code metrics have no statistically significant correlation with any of the maintainability properties. We note, however, that almost all of the Pearson correlation (R) values are negative, which means that smaller metric values indicate a better subjective opinion about the maintainability properties. This is in accordance with our intuitive expectations.

4.3 Relationship of Metric-based Models and High Level Attributes

To answer our second research question we built several models using different machine learning algorithms and evaluated their predictive strength. To run the machine learning algorithms on the collected data we used the well-known data mining software package Weka, a collection of machine learning algorithms for data mining tasks. It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.

The main aim of this paper is to examine the connection between the code metrics and the high level maintainability properties of methods. We showed in Section 4.2 that we did not find any well-known source code metric that would be able to predict the subjective opinions of the IT experts by itself. In this section we investigate how basic multivariate machine learning algorithms perform in predicting these values. Before every classification process we executed a Principal Component Analysis (PCA) [18], which reduced the dimension of the problem. We used Weka's PCA attribute selector with the following parameters: variance = 0.97, centerData = TRUE. The latter parameter means that Weka subtracts the column's mean from each column, so each variable has zero mean. To get the proper number of dimensions we set the variance parameter to 0.97.

After applying PCA, we tested well-known basic classifiers: logistic regression, J48 decision tree and neural network. We used the ZeroR algorithm as the baseline of effectiveness in our experiment. This is the simplest classifier: it always chooses the class which has the most elements in the training data set. Table 3 shows the rate of correctly classified instances for each maintainability property. In our experiments, we used 10-fold cross-validation.

Table 3. Rate of the correctly classified instances

               ZeroR    J48 Decision Tree  Log. Regression  Neural Network
Analyzability  67.93%   73.68%             70.97%           70.25%
Changeability  66.79%   76.65%             73.00%           74.26%
Stability      70.20%   73.12%             70.55%           70.92%
Testability    66.55%   64.72%             69.45%           70.54%
Comprehension  70.92%   76.68%             70.93%           73.99%

We found that the best classifier is the J48 (the Weka implementation of C4.5) decision tree algorithm in four out of five cases. It performed very poorly in the classification of Testability, which we attribute to the fuzziness of the subcharacteristic's definition. The IT experts involved in the survey varied in testing skills, therefore it is possible that they interpreted the concept differently.
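The following sketch shows one way the described pipeline could be reproduced with the Weka Java API. The ARFF file name, the random seed and the default classifier settings are assumptions (the paper does not specify them), and the setCenterData option of PrincipalComponents is only available in newer Weka releases.

```java
import java.util.Random;

import weka.attributeSelection.PrincipalComponents;
import weka.attributeSelection.Ranker;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.Logistic;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.classifiers.rules.ZeroR;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

public final class MaintainabilityExperiment {

    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF export: one row per method, code metrics as attributes,
        // the expert rating (poor/average/good) as the class attribute.
        Instances data = new DataSource("methods.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // PCA attribute transformation, roughly as described above.
        PrincipalComponents pca = new PrincipalComponents();
        pca.setVarianceCovered(0.97);
        pca.setCenterData(true);            // available in newer Weka releases
        AttributeSelection pcaFilter = new AttributeSelection();
        pcaFilter.setEvaluator(pca);
        pcaFilter.setSearch(new Ranker());
        pcaFilter.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, pcaFilter);

        Classifier[] classifiers = {
            new ZeroR(), new J48(), new Logistic(), new MultilayerPerceptron()
        };
        for (Classifier c : classifiers) {
            Evaluation eval = new Evaluation(reduced);
            eval.crossValidateModel(c, reduced, 10, new Random(1));   // 10-fold CV
            System.out.printf("%-25s %.2f%% correct%n",
                    c.getClass().getSimpleName(), eval.pctCorrect());
            // Per-class statistics, as in Table 4.
            for (int i = 0; i < reduced.numClasses(); i++) {
                System.out.printf("  class %s: precision=%.3f recall=%.3f%n",
                        reduced.classAttribute().value(i),
                        eval.precision(i), eval.recall(i));
            }
        }
    }
}
```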

In the case of Changeability the precision of the J48 classifier was about 10% better than that of ZeroR. In this case the Logistic Regression and the Neural Network algorithms also performed well. Precision is a good way to measure efficiency, but if we examine the precision and recall values separately for each class, it becomes apparent that the J48 algorithm is much more useful than ZeroR.

Table 4. Statistics by classes in the case of Changeability

           J48 decision tree                     ZeroR
Class      TP Rate  FP Rate  Precision  Recall   TP Rate  FP Rate  Precision  Recall
Bad        0.238    0.011    0.455      0.238    0        0        0          0
Average    0.640    0.160    0.624      0.640    0        0        0          0
Good       0.852    0.330    0.839      0.852    1        1        0.668      1

Table 4 shows detailed statistics about the precision and recall values of the ZeroR and J48 algorithms in the case of Changeability. The precision of the J48 algorithm is 17% higher for the Good class than that of ZeroR. Moreover, it found 64% of the Average and 23.8% of the Poor instances, while ZeroR missed them completely.

During the evaluation, the experts were also asked to explain their opinions about the different maintainability properties in textual form. Based on their comments we created some new simple predictors that were not covered by any of our metrics (a sketch of how they could be computed is given below):
• Indenting - number of lines divided by the total number of tab characters.
• Logging - true if the strings "log", "logger", "Log" or "Logger" occur in the source code, false otherwise.
• Comments - number of lines starting with "/*" or "//".
• Naming - number of elements of the set of PMD naming related rule violations.

After adding these predictors to the learning process the results became slightly better; for example, the precision for Comprehension rose to 77.04%. This shows that there is potential to extract more sophisticated predictors from the textual answers to increase the precision of the estimation.
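A minimal sketch of how these four predictors could be computed from a method's source text follows. The exact rules used in the study are not specified beyond the descriptions above, so details such as counting only tab characters for Indenting and matching PMD rule names by the substring "naming" are assumptions.

```java
import java.util.List;

/** Minimal sketch of the four extra predictors; the exact rules are assumptions. */
public final class ExtraPredictors {

    /** Indenting: number of lines divided by the total number of tab characters. */
    static double indenting(String source) {
        int lines = source.split("\n", -1).length;
        long tabs = source.chars().filter(ch -> ch == '\t').count();
        return tabs == 0 ? lines : (double) lines / tabs;   // guard against division by zero
    }

    /** Logging: true if a log-related identifier occurs in the source. */
    static boolean logging(String source) {
        return source.contains("log") || source.contains("logger")
                || source.contains("Log") || source.contains("Logger");
    }

    /** Comments: number of lines starting with "//" or "/*" (leading whitespace ignored). */
    static int comments(String source) {
        int count = 0;
        for (String line : source.split("\n")) {
            String t = line.trim();
            if (t.startsWith("//") || t.startsWith("/*")) count++;
        }
        return count;
    }

    /** Naming: number of distinct naming-related PMD violations reported for the method;
     *  the rule names are assumed to be supplied by the analysis pipeline. */
    static long naming(List<String> pmdRuleNames) {
        return pmdRuleNames.stream()
                .filter(r -> r.toLowerCase().contains("naming"))
                .distinct()
                .count();
    }
}
```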

5 Threats to Validity

The paper presents a case study involving 35 IT professionals and a manual evaluation of 570 methods of an industrial and an open source Java system. The purpose of the case study is to reveal the relationship between the well-known source code metrics of methods and the high level Maintainability subcharacteristics defined by the ISO/IEC 9126 standard. We also built models with the help of machine learning algorithms for estimating the subjective opinions of the evaluators, using source code metrics as low level predictors. Both the evaluation process and the data analysis approach have some properties which may affect the validity of the results and the usability of the approach.

First of all, the 570 evaluated methods are not enough for drawing general conclusions about the relationship between the source code metrics and the subjective evaluations of maintainability properties.

However, this number of evaluated methods is good enough to show that there is potential in our approach and that it is worth putting more effort into further research.

The evaluators were asked to rate the methods on a scale of Poor, Average, Good. The three categories might seem coarse, but based on our experience a finer scale would bring no practical benefit. Another threat to the validity of the built prediction models is that the data used for training is very unbalanced. This is because, according to the evaluators, most of the methods fell into the Good category in many aspects.

It is also a threat to validity that every method was evaluated by only one IT professional. Even so, the evaluation required a lot of manual work. More evaluations per method would certainly increase the validity of our work, but we did not want to invest more effort before it becomes clear that our approach is applicable and worth investigating further.

Another problem is that the high level quality attributes, defined either by the ISO/IEC 9126 standard or by us, are ambiguous, and different evaluators may interpret them differently. To minimize the risk of misunderstanding the concepts we held a briefing for the evaluators on the ISO/IEC 9126 standard. Moreover, the definitions of the basic concepts were available through the web framework used for the evaluation process. Finally, the experience level and theoretical background of the evaluators varied, but they were all selected from IT professionals with knowledge of source code metrics and software quality.

6 Conclusions and Future Work

The current work tries to reveal the relationship between the well-known source code metrics and the subjective opinions of IT experts about different high level maintainability properties of the source code. There is a lot of work in the literature dealing with quality models. Most of them adapt the ISO/IEC 9126 standard and build hierarchical models in a bottom-up fashion, using source code metrics at the lowest level and defining different aggregation operations to calculate the higher level attributes. We chose a top-down approach for adapting the standard, meaning that we developed models using machine learning algorithms and the subjective evaluations of many experts. To achieve our goal we performed a case study involving 35 IT experts and a manual evaluation of 570 methods of an industrial and an open source Java system. To support the process we also developed a web-based application for collecting, storing, and organizing the evaluations.

We can conclude that the results of the case study presented in the paper are very promising. It is clear, however, that the metrics we used for building our models are not enough by themselves. They simply do not carry enough information to describe complex concepts, like Comprehension, that a sophisticated human mind can grasp. Some aspects can be predicted better by metrics, e.g. Changeability, while others cannot, e.g. Testability. We answered our two research questions in the following way:

RQ1: How do the well-known source code metrics correlate individually with the subjective opinions of IT experts regarding Maintainability?

We found no statistically significant correlation; however, almost every correlation value was negative. This corresponds with our intuitive assumption that higher metric values indicate a lower perceived maintainability.

RQ2: How do the well-known source code metrics perform together as predictors when using machine learning algorithms for assessing the subjective opinions of IT experts regarding Maintainability?

We applied three well-known machine learning techniques (decision tree, logistic regression, and neural network) to build models that predict the human opinion from metric values. We found that metrics have the potential to predict high level quality indicators. In the case of the Changeability property the models performed with more than 76% precision (more than 10% better than the ZeroR algorithm).

Although the results are promising, future research is needed to extract other factors that humans consider but the current metrics cannot measure. This can be done by analyzing the textual remarks of the evaluators and implementing appropriate predictors based on them. Of course, the number of evaluated methods is too small for drawing general conclusions; a larger amount of data could help overcome this problem. Therefore, in the future we plan to extend the manual evaluation to many more methods of different systems. It also seems reasonable to group the evaluators by age, programming experience or other aspects, and to examine the results for each group separately. After collecting more data we will also examine the distribution of the evaluations for the different groups of IT experts.

During the evaluation a lot of textual opinions were collected as well. Processing these answers is currently in progress. We hope that from the experts' remarks we can reveal some new properties that influence the subjective quality feeling of humans but are not measured by any of the investigated metrics yet. Based on these results and the opinions of the experts, we plan to define and implement new predictor metrics, which may increase the accuracy of the classification.

In this work we used several well-known learning algorithms, but this does not mean that they are the only or the best choices. On the contrary, a more complex algorithm may produce better results. Therefore, we plan to study many more machine learning algorithms and techniques to find the one that is most suitable for the defined task.

Acknowledgements

This research was supported by the Hungarian national grants GOP-1.1.2-07/12008-0007, OTKA K-73688, TECH 08-A2/2-2008-0089, and by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.

References

1. Bagheri, E., Gasevic, D.: Assessing the maintainability of software product line feature models using structural metrics. Software Quality Journal 19(3), 579–612. Springer (2011)
2. Bakota, T., Hegedűs, P., Körtvélyesi, P., Ferenc, R., Gyimóthy, T.: A Probabilistic Software Quality Model. In: Proceedings of the 27th IEEE International Conference on Software Maintenance (ICSM 2011). pp. 368–377. IEEE Computer Society, Williamsburg, VA, USA (2011)
3. Bansiya, J., Davis, C.: A Hierarchical Model for Object-Oriented Design Quality Assessment. IEEE Transactions on Software Engineering 28, 4–17 (2002)
4. Basili, V.R., Briand, L.C., Melo, W.L.: A Validation of Object-Oriented Design Metrics as Quality Indicators. IEEE Transactions on Software Engineering 22, 751–761 (Oct 1996)
5. Chidamber, S.R., Kemerer, C.F.: A Metrics Suite for Object Oriented Design. IEEE Transactions on Software Engineering, pp. 476–493 (June 1994)
6. Chrissis, M.B., Konrad, M., Shrum, S.: CMMI: Guidelines for Process Integration and Product Improvement. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (2003)
7. Chua, B., Dyson, L.: Applying the ISO 9126 model to the evaluation of an e-learning system. In: Beyond the Comfort Zone: Proceedings of the 21st ASCILITE Conference. pp. 184–190. Citeseer, Perth, Australia (2004)
8. Ferenc, R., Beszédes, Á., Gyimóthy, T.: Extracting Facts with Columbus from C++ Code. In: Tools for Software Maintenance and Reengineering, pp. 16–31. Franco Angeli, Milano (2004)
9. Ferenc, R., Beszédes, Á., Tarkiainen, M., Gyimóthy, T.: Columbus – Reverse Engineering Tool and Schema for C++. In: Proceedings of the 18th International Conference on Software Maintenance (ICSM 2002). pp. 172–181. IEEE Computer Society (Oct 2002)
10. Ferenc, R., Siket, I., Gyimóthy, T.: Extracting Facts from Open Source Software. In: Proceedings of the 20th International Conference on Software Maintenance (ICSM 2004). pp. 60–69. IEEE Computer Society (Sep 2004)
11. Gyimóthy, T., Ferenc, R., Siket, I.: Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction. IEEE Transactions on Software Engineering, pp. 897–910 (2005)
12. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA Data Mining Software: An Update. SIGKDD Explorations (2009)
13. Harrison, R., Counsell, S., Nithi, R.: An Evaluation of the MOOD Set of Object-Oriented Software Metrics. IEEE Transactions on Software Engineering 24, 491–496 (1998)
14. Heitlager, I., Kuipers, T., Visser, J.: A Practical Model for Measuring Maintainability. In: Proceedings of the 6th International Conference on Quality of Information and Communications Technology, pp. 30–39 (2007)
15. International Organization for Standardization: ISO/IEC 15504:2004 Information technology – Process assessment – Part 3: Guidance on performing an assessment. Tech. rep., International Organization for Standardization (2004)
16. ISO/IEC: ISO/IEC 9126. Software Engineering – Product Quality. ISO/IEC (2001)
17. ISO/IEC: ISO/IEC 25000:2005. Software Engineering – Software Product Quality Requirements and Evaluation (SQuaRE) – Guide to SQuaRE. ISO/IEC (2005)
18. Jolliffe, I.: Principal Component Analysis. Springer Verlag (1986)
19. Jung, H.W., Kim, S.G., Chung, C.S.: Measuring Software Product Quality: A Survey of ISO/IEC 9126. IEEE Software, pp. 88–92 (2004)
20. Olague, H.M., Etzkorn, L.H., Gholston, S., Quattlebaum, S.: Empirical Validation of Three Software Metrics Suites to Predict Fault-Proneness of Object-Oriented Classes Developed Using Highly Iterative or Agile Software Development Processes. IEEE Transactions on Software Engineering, pp. 402–419 (2007)