A New Credit Scoring Method Based on Rough Sets and Decision Tree

XiYue Zhou, DeFu Zhang, and Yi Jiang

Department of Computer Science, Xiamen University, Xiamen 361005, China
[email protected], [email protected], [email protected]

Abstract. Credit scoring is a typical classification problem in data mining, and many classification methods have been presented in the literature to tackle it. The decision tree method is a particularly effective way to build a classifier from sample data: it achieves high prediction accuracy on classification problems and can automatically generate classification rules. However, the original sample data sets used to generate a decision tree classification model often contain noisy or redundant data, which can greatly affect the prediction accuracy of the classifier. It is therefore necessary and important to preprocess the original sample data. A very effective approach to this issue is rough sets: a basic problem that can be tackled with the rough sets approach is the reduction of redundant attributes. This paper presents a new credit scoring approach based on a combination of rough sets theory and decision tree theory. The results of this study indicate that the attribute reduction process is very effective and that our approach performs well in terms of prediction accuracy.

Keywords: Data Mining, Credit Scoring, Rough Sets, Decision Tree, Attribute Reduction.

1 Introduction

The credit scoring model has been used in commercial and consumer lending for a few decades, and numerous methods have been presented in the literature to develop such models. They include traditional statistical models (e.g., logistic regression [4]) and nonparametric statistical models (e.g., k-nearest neighbor [5], decision trees [11, 12], and neural network models [3]). All of these models are widely used, but not all of them preprocess the original sample data before the credit scoring model is built. Preprocessing the original sample data, to eliminate redundant data and noisy data among other things, is necessary and important. In this paper we carry out this step using rough set theory [9]. In addition, because of its high prediction accuracy and its ability to automatically generate classification rules [7], we build the credit scoring model using the decision tree method.

The rest of this paper is organized as follows. Section 2 briefly explains the basic concepts of rough sets, and Section 3 discusses the decision tree algorithm C4.5 [11, 12].


Section 4 illustrates the design and generation of our model. Section 5 analyzes the experimental results of our credit scoring model and compares its prediction accuracy with that of other methods. Finally, Section 6 presents the conclusions and discusses possible future research.

2 Rough Sets Theory

2.1 The Information System

An information system can be represented as follows:

$$S = \langle U, A, V, f \rangle \tag{1}$$

where $U$ is a non-empty finite set of objects called the universe; $A$ is a non-empty finite set of attributes; $V_a$ is the range of attribute $a$; $V = \bigcup_{a \in A} V_a$; and $f : U \times A \to V$ is the information function such that $f(x, a) \in V_a$ for any $a \in A$ and $x \in U$.
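To make the definition concrete, the following is a minimal Python sketch of an information system; the class name, the toy attributes, and the data are illustrative assumptions of ours, not part of the paper.

```python
# A minimal sketch of an information system S = <U, A, V, f>.
# All names and the toy credit data here are illustrative, not from the paper.
from typing import Any, Dict, List

class InformationSystem:
    def __init__(self, objects: List[str], table: Dict[str, Dict[str, Any]]):
        self.U = objects                  # universe of objects
        self.A = list(table[objects[0]])  # attribute set
        self.table = table                # backing store for f

    def f(self, x: str, a: str) -> Any:
        """Information function: the value f(x, a) of attribute a on object x."""
        return self.table[x][a]

    def V(self, a: str) -> set:
        """Range V_a of attribute a; V is the union of all V_a."""
        return {self.table[x][a] for x in self.U}

# Toy decision table: two condition attributes and one class attribute.
S = InformationSystem(
    objects=["x1", "x2", "x3", "x4"],
    table={
        "x1": {"income": "high", "history": "good", "credit": "good"},
        "x2": {"income": "high", "history": "bad",  "credit": "good"},
        "x3": {"income": "low",  "history": "bad",  "credit": "bad"},
        "x4": {"income": "low",  "history": "good", "credit": "bad"},
    },
)
```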

2.2 Indiscernibility Relation

The indiscernibility relation [10, 13, 14] is an equivalence relation on the set $U$ and can be defined as follows. For an arbitrary attribute subset $P \subseteq A$, there is an indiscernibility relation $IND(P)$:

$$IND(P) = \{\langle x, y \rangle \in U \times U \mid \forall a \in P,\ f(x, a) = f(y, a)\} \tag{2}$$

If $\langle x, y \rangle \in IND(P)$, objects $x$ and $y$ are indiscernible with respect to the attribute set $P$.

2.3 Reduction and the Core

In real-world applications we are often required to eliminate irrelevant or redundant attributes while preserving the essential information of the information system. This problem involves two basic concepts: the reduction and the core [9, 13, 14].

2.4 Discernibility Matrix

The discernibility matrix [13, 14] is a very important concept in rough set theory and can be used to carry out attribute reduction.

Definition 1 [13, 14]. Given an information system $S$, let $U = \{x_1, x_2, \ldots, x_n\}$ be the set of objects, $C = \{c_1, c_2, \ldots, c_m\}$ the set of predictive attributes, and $D$ the class attribute. The discernibility matrix, denoted $M(S)$, has elements

$$m_{ij} = \begin{cases} \{\, a \in C : f(x_i, a) \neq f(x_j, a) \,\} & \text{if } f(x_i, D) \neq f(x_j, D) \\ \emptyset & \text{otherwise} \end{cases} \tag{3}$$

where $i, j = 1, 2, \ldots, n$ and $n = |U|$.
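As a sketch of how the entries of equation (3) can be computed, here is a small Python function; the function name and the toy decision table are our own assumptions, not the paper's implementation.

```python
from itertools import combinations

# Toy decision table: condition attributes 'income', 'history'; class 'credit'.
# (Illustrative data, not from the paper's benchmarks.)
data = {
    "x1": {"income": "high", "history": "good", "credit": "good"},
    "x2": {"income": "high", "history": "bad",  "credit": "good"},
    "x3": {"income": "low",  "history": "bad",  "credit": "bad"},
    "x4": {"income": "low",  "history": "good", "credit": "bad"},
}
C, D = ["income", "history"], "credit"

def discernibility_matrix(data, C, D):
    """Equation (3): for each pair of objects whose class values differ,
    record the set of condition attributes on which they differ; pairs
    with equal class values contribute the empty entry and are skipped."""
    M = []
    for xi, xj in combinations(data, 2):
        if data[xi][D] != data[xj][D]:
            entry = frozenset(a for a in C if data[xi][a] != data[xj][a])
            if entry:
                M.append(entry)
    return M

print(discernibility_matrix(data, C, D))
# [frozenset({'income', 'history'}), frozenset({'income'}), ...]
```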

3 C4.5

ID3 [11] is a famous algorithm for constructing a decision tree, and C4.5 [12] is its extended version. C4.5 mainly contains two phases: generating an initial decision tree and pruning it.

3.1 Generating the Decision Tree

The original ID3 algorithm uses a criterion called gain to select the test attribute. The gain criterion is based on the concept of entropy from information theory [6, 10].

3.2 Pruning the Decision Tree

Since there are fewer and fewer objects to work with after each decision node split, it is necessary to prune the decision tree to obtain better prediction accuracy. For this purpose C4.5 uses a specific technique to estimate the prediction error rate, called pessimistic error pruning [11].
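The gain computation itself is simple; the following is a hedged Python sketch of entropy and gain for records represented as dicts (the names are ours). Note that full C4.5 further normalizes gain by the split information (the gain ratio), which this sketch omits.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, class_attr):
    """ID3's gain: entropy of the class distribution before the split,
    minus the size-weighted entropy of each partition induced by attr."""
    before = entropy([r[class_attr] for r in rows])
    after = 0.0
    for v in {r[attr] for r in rows}:
        part = [r[class_attr] for r in rows if r[attr] == v]
        after += len(part) / len(rows) * entropy(part)
    return before - after
```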

4 New Credit Scoring Model

4.1 Design

In this paper the credit scoring model is built on a combination of rough set theory and decision tree theory: the model first preprocesses the sample data using rough sets, and then generates the credit scoring model using C4.5. This approach brings several benefits. Rough sets can not only remove redundant data but also reduce the dimension of the input information space by discovering the relations among the data.

Fig. 1. Experiment procedures


Fig. 1 illustrates the experimental procedure for generating the credit scoring classifier; the details are presented in the next sections.

4.2 Reduction of Attributes

Before the classifier is generated, we preprocess the original sample data. This step mainly deals with the redundant attributes and is called reduction of attributes in this paper. We remove the redundant attributes using the algorithm from Wang and Pei's paper [14] (denoted WPA in this paper). We improve WPA; the improved algorithm is denoted Algorithm 1 and has a lower running time than WPA. Before discussing Algorithm 1, we recall a theorem of Boolean algebra:

Theorem 1 [1] (absorption law): $a \wedge (a \vee b) \leftrightarrow a$.

The proof is elementary and omitted.

Algorithm 1. Compute all reductions of attributes.
Input: information system $S = \langle U, A, V, f \rangle$;
Output: all reductions of attributes of the information system $S$;
Procedure:
1. construct the discernibility matrix $M(S) = [c_{ij}]_{n \times n}$ of the information system $S$, and construct $M = \{c_{ij} \mid 0 \le i, j \le n-1\}$;
2. construct the discernibility function $f_M(S)$ from $M$, as a conjunction of the disjunctions $c_{ij}$;
3*. simplify this formula according to Theorem 1;
4. call FunctionMinDnf($M$, $DNF_M$);
5. read all reductions of attributes off $DNF_M$.
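Step 3* amounts to deleting every clause of the discernibility function that is a superset of another clause, since by Theorem 1 the smaller clause absorbs it. A sketch, treating each clause as a frozenset of attributes (the helper name is ours):

```python
def absorb(clauses):
    """Absorption (Theorem 1): a ^ (a v b) <-> a. Reading each clause as a
    disjunction of attributes, any clause that strictly contains another
    clause is redundant and can be deleted, shrinking |M|."""
    clauses = set(clauses)
    return {c for c in clauses if not any(other < c for other in clauses)}

# Example: income ^ (income v history) simplifies to income alone.
print(absorb([frozenset({"income", "history"}), frozenset({"income"})]))
# {frozenset({'income'})}
```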



Algorithm 2. FunctionMinDnf($M$, $DNF_M$)
Input: the discernibility matrix $M(S) = [c_{ij}]_{n \times n}$ of the information system $S$, and the set $M = \{c_{ij} \mid 0 \le i, j \le n-1\}$;
Output: $DNF_M$;
Procedure:
1. $DNF_{M_1} = \emptyset$; $DNF_{M_2} = \emptyset$;
2. if $M = \emptyset$, return $\emptyset$;
3. if $|M| = 1$, return $DNF_M = M$;
4. divide $M$ into $M_1$ and $M_2$; FunctionMinDnf($M_1$, $DNF_{M_1}$); FunctionMinDnf($M_2$, $DNF_{M_2}$);
5. construct $R = \{d_1 \wedge d_2 \mid d_1 \in DNF_{M_1}, d_2 \in DNF_{M_2}\}$;
6. $DNF_M$ = reduction of $R$; return $DNF_M$.
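Below is a sketch of FunctionMinDnf in the same representation as the earlier snippets (clauses as frozensets of attributes, DNF terms as frozensets conjoined by set union); the divide-and-conquer structure follows steps 2-6 above, and `absorb` is the helper sketched for step 3*. This is our reading of the algorithm, not the authors' code.

```python
def min_dnf(M):
    """Steps 2-6: recursively convert the conjunction of clauses M into a
    minimal DNF whose terms (frozensets of attributes) are the reducts."""
    M = list(M)
    if not M:                                     # step 2
        return set()
    if len(M) == 1:                               # step 3: a1 v ... v ak
        return {frozenset({a}) for a in M[0]}     # becomes unit DNF terms
    mid = len(M) // 2                             # step 4: divide and recurse
    left, right = min_dnf(M[:mid]), min_dnf(M[mid:])
    R = {d1 | d2 for d1 in left for d2 in right}  # step 5: d1 ^ d2 as set union
    return absorb(R)                              # step 6: reduction of R

# With the toy table above, every clause collapses to {'income'}, so the
# sole reduct is the single attribute 'income'.
```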






The definition of the reduction of $R$, the definition of the discernibility function $f_M(S)$, and the definition of DNF are given in [14]. Step 3, marked '*', is our optimization in Algorithm 1; the other steps are the same as in WPA.

The time complexity of WPA is $T = O(|A|(K^4 \log^2 |U| + |U|))$, where $K = \max\{\mathrm{Card}(DNF_G) \mid G \subseteq N\}$ [14]. Algorithm 1 decreases this complexity: step 3* greatly decreases the size $|M|$ of the set $M$, and therefore the value of $K$; since $K$ appears with exponent 4, Algorithm 1 evidently reduces the running time. Experimental results on the efficiency of attribute reduction are given in Section 5.

4.3 Generating the Classifier

After the reduction of attributes we obtain a new sample data set. We randomly select a majority of its instances as the training sample for the decision tree classifier and build the credit scoring model using C4.5.
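A sketch of this step under stated assumptions: scikit-learn's CART tree with `criterion="entropy"` stands in for C4.5, which scikit-learn does not provide, and its pruning differs from C4.5's pessimistic error pruning; `X_reduced` is assumed to hold only the attributes kept by the reduction.

```python
# Assumption: sklearn's CART with entropy splits approximates C4.5;
# its pruning is not C4.5's pessimistic error pruning.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def build_credit_model(X_reduced, y, train_ratio=0.7, seed=None):
    """Randomly hold out a majority of instances for training (e.g. 7:3),
    fit a gain-based decision tree, and report test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_reduced, y, train_size=train_ratio, random_state=seed)
    clf = DecisionTreeClassifier(criterion="entropy")
    clf.fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)
```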

5 Experimental Results

5.1 Efficiency of the Reduction of Attributes

To compare with WPA, we selected the same databases as Wang and Pei: nine databases from the UCI machine learning repository. The experiments were run on the same PC (Intel Celeron, 2.4 GHz, 256 MB RAM, Windows XP Professional). The two methods produced the same reductions, and their running times are reported in Table 1.



Table 1. Comparison of the running time of reduction algorithms

Name of database              | Instances | Attributes | Algorithm 1 time (s) | WPA time (s)
Postoperative Patient         |        90 |          9 |                0.000 |        0.102
Hayes-Roth                    |       132 |          6 |                0.031 |        0.155
Balance-scale                 |       625 |          5 |                0.156 |        5.410
Teaching Assistant Evaluation |       152 |          6 |                0.035 |        0.226
Zoo                           |       101 |         17 |                0.046 |        0.150
Tic-Tac-Toe Endgame           |       958 |         10 |                0.453 |       17.885
Car Evaluation                |      1728 |          7 |                0.846 |       68.756
BUPA liver disorders          |       345 |          7 |                0.062 |        1.520
Monk's Problems (1)           |       432 |          7 |                0.093 |        2.633
Monk's Problems (2)           |       432 |          7 |                0.109 |        2.598
Monk's Problems (3)           |       432 |          7 |                0.093 |        2.375

Table 1 clearly shows that Algorithm 1 runs much faster than WPA, which confirms that our improvement of WPA is effective and validates the time complexity analysis of Algorithm 1.


5.2 Prediction Accuracy Analysis

The two databases in our experiments are from the UCI Machine Learning Repository [8]: the German Credit database and the Australian Credit database. The German Credit database contains 1000 instances in all, of which 700 are good credit instances and 300 are bad credit instances; each instance consists of 20 predictive attributes and 1 class attribute. The Australian Credit database contains 690 instances in all, of which 303 are good credit instances and 387 are bad credit instances; each instance consists of 14 predictive attributes and 1 class attribute.

Table 2. Description of the databases from the UCI Machine Learning Repository

Name              | Instances | Predictive attributes | Class attribute | Good credit | Bad credit
German Credit     |      1000 |                    20 |               1 |         700 |        300
Australian Credit |       690 |                    14 |               1 |         303 |        387

We test both databases using the combined rough sets and C4.5 method (denoted RSC) and the single C4.5, with two different ratios between the size of the training sample and the size of the test sample: 7:3 and 8:2. For each database we ran 20 experiments with RSC and 20 with the single C4.5. The training sample is chosen stochastically, and the remaining instances of the database are used as the test sample (a code sketch of this protocol follows Table 4). The experimental results are reported in Table 3 and Table 4.

Table 3. Prediction accuracy on the UCI German Credit database

Methods    | Predictive attributes | Max accuracy (%) | Min accuracy (%) | Average accuracy (%)
C4.5 (7:3) |                    20 |             74.0 |             68.0 |                 72.0
RSC (7:3)  |                    12 |            80.67 |             72.0 |                78.67
C4.5 (8:2) |                    20 |             74.5 |             68.0 |                 73.5
RSC (8:2)  |                    12 |             82.0 |             72.0 |                 79.5

Table 4. Prediction accuracy on the UCI Australian Credit database

Methods    | Predictive attributes | Max accuracy (%) | Min accuracy (%) | Average accuracy (%)
C4.5 (7:3) |                    14 |            88.12 |            81.26 |                85.31
RSC (7:3)  |                    11 |            90.68 |            83.47 |                87.78
C4.5 (8:2) |                    14 |            88.41 |            81.26 |                85.45
RSC (8:2)  |                    11 |            90.95 |            84.78 |                88.21
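As a sketch of the repeated stochastic-split protocol behind Tables 3 and 4 (20 runs per ratio, reporting the maximum, minimum, and average test accuracy), reusing the hypothetical `build_credit_model` helper from Section 4.3:

```python
def evaluate(X, y, train_ratio, runs=20):
    """20 stochastic train/test splits at the given ratio; the instances
    left over after each split form the test sample."""
    accs = [build_credit_model(X, y, train_ratio, seed=run)[1]
            for run in range(runs)]
    return max(accs), min(accs), sum(accs) / len(accs)

# e.g. evaluate(X_german_reduced, y_german, 0.7) -> the RSC (7:3) row
# (X_german_reduced / y_german are hypothetical preprocessed arrays)
```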

According to Table 3, after the reduction of attributes only 12 of the 20 predictive attributes were used by RSC to build the credit scoring model. Compared with the single C4.5 method, the prediction accuracy of RSC is evidently improved at both the 7:3 and the 8:2 ratio: the average prediction accuracy rises by about 6 to 7 percentage points.





The RSC method thus performed well on the German Credit database, and Table 4 shows that it also performed well on the Australian Credit database. Comparing with the single C4.5, we conclude that RSC performs well on both databases. The reduction of attributes not only reduced the dimension of the decision table (the original sample) but also enhanced the prediction accuracy of the credit scoring model; performing attribute reduction before building the model is very effective and plays an important role in the process of building the credit scoring model.

However, note the minimum prediction accuracy in Tables 3 and 4. It is low for the RSC method, which indicates that the stability of RSC is not good. Theoretical analysis suggests this is caused by the C4.5 algorithm: in the worst case the prediction accuracy can be 0%. Suppose the 700 training instances are all drawn from the 700 good instances and the 300 test instances are the 300 bad instances; this is possible, though its probability is tiny. If it happened, the generated decision tree would consist of a single node whose class is good, so testing on the all-bad test set would yield a 100% error rate. Hence the prediction accuracy of the model depends on the ratio of good to bad instances in the training sample.

5.3 Comparison with Other Methods

Because credit scoring methods in the literature are usually evaluated with k-fold cross validation, we also tested the prediction accuracy of the RSC method using 10-fold cross validation, so that it can be compared with other methods. Each reported result is the average of the accuracies obtained on 10 independent stochastic partitions of the data set. For each of the two databases, we ran 10 experiments with 10-fold cross validation (10×10-CV). The results are reported in Table 5.

Table 5. Accuracy rates (%) with 10-fold cross validation for the German and Australian Credit databases using the RSC method

           | No.1  | No.2  | No.3  | No.4  | No.5  | No.6  | No.7  | No.8  | No.9  | No.10 | Avg.
German     | 79.9  | 79.4  | 79.9  | 80.2  | 79.0  | 79.8  | 79.2  | 79.6  | 79.8  | 79.5  | 79.63
Australian | 88.55 | 87.97 | 88.26 | 88.55 | 88.70 | 88.26 | 88.59 | 88.99 | 88.84 | 88.70 | 88.54
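A sketch of the 10×10-CV protocol, under the same stand-in assumptions as before (scikit-learn's entropy-criterion tree in place of C4.5, data already attribute-reduced):

```python
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def ten_by_ten_cv(X, y):
    """10-fold CV repeated on 10 independent stochastic partitions;
    each run corresponds to one column No.1 .. No.10 of Table 5."""
    runs = []
    for seed in range(10):
        cv = KFold(n_splits=10, shuffle=True, random_state=seed)
        scores = cross_val_score(
            DecisionTreeClassifier(criterion="entropy"), X, y, cv=cv)
        runs.append(scores.mean())
    return runs, sum(runs) / len(runs)  # per-run means and overall Avg.
```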

At present, many data mining techniques, such as neural networks, genetic programming, and SVM-based methods, have been successfully applied to building credit scoring models and usually achieve good prediction accuracy. We therefore compare the RSC method with these methods on the German Credit and Australian Credit databases: the single C4.5, BPN (back-propagation neural network), GP (genetic programming) [6], and SVM+GA (support vector machine + genetic algorithm) [2].


The results for the two databases using RSC, the single C4.5, BPN, GP, and SVM+GA are summarized in Table 6, where the results of BPN, GP, and SVM+GA are taken from [2]. In [2], the two credit scoring databases are partitioned into training and independent test sets by the same 10-fold cross validation procedure. The GP-specific parameters for the two credit data sets are as follows: population size 250, reproduction rate 0.2, crossover rate 0.7, mutation rate 0.08, and maximum number of generations 2000-3000. For the BPN model, several neural network configurations were tested, of which 14-32-1 and 24-43-1 were selected for the Australian and German data respectively to obtain better results; the learning rate and momentum were set to 0.8 and 0.2, respectively. C4.5 and SVM+GA use their default settings.

Table 6. Result summary with 10-fold cross validation for the German and Australian Credit databases

Method | German Avg. (%) | Australian Avg. (%)
RSC    |           79.63 |               88.54
C4.5   |           73.50 |               85.31
BPN    |           77.83 |               86.83
GP     |           78.10 |               87.00
SVM+GA |           77.92 |               86.90

On the basis of the results in Table 6, we can conclude that the RSC method in our study outperforms the other methods on both the Australian Credit and German Credit databases, which indicates that the RSC method is effective and successful for building credit scoring models.

6 Conclusions and Future Work

Data mining is a very effective approach for studying financial regularities and making decisions quickly. The credit scoring model based on rough sets and decision trees presented in this paper fully exploits the advantages of both. On the basis of the results in Section 5, we conclude that the attribute reduction used in this paper is an important and effective instrument for improving prediction accuracy: removing the redundant attributes not only prevents harmful data from degrading the prediction accuracy but also reduces the computational cost of building the credit scoring model. Moreover, the RSC method achieves higher prediction accuracy than the single C4.5, BPN, GP, and SVM+GA on the benchmarks, so it is effective and successful on the credit scoring problem studied in this paper. However, the method has some limitations: as the analysis in Section 5 shows, its stability needs to be improved. In future work, we may try to combine the RSC method with boosting or bagging algorithms to obtain higher accuracy and better stability.


Acknowledgements. This work is supported by the National Nature Science Foundation of China (Grant No. 60773126), the Province Nature Science Foundation of Fujian (Grant No. A0710023), the academician start-up fund (Grant No. X01109), and the 985 information technology fund (Grant No. 0000-X07204) of Xiamen University.

References

1. Hamilton, A.G.: Logic for Mathematicians. Cambridge University Press, Cambridge (1988)
2. Huang, C.-L., Chen, M.-C., Wang, C.-J.: Credit scoring with a data mining approach based on support vector machines. Expert Systems with Applications (2006), doi:10.1016/j.eswa.2006.07.007
3. Desai, V.S., Crook, J.N., Overstreet, G.A.: A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research 95(1), 24–37 (1996)
4. Henley, W.E.: Statistical aspects of credit scoring. Dissertation, The Open University, Milton Keynes, UK (1995)
5. Henley, W.E., Hand, D.J.: A k-nearest neighbor classifier for assessing consumer credit risk. Statistician 44(1), 77–95 (1996)
6. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA (1992)
7. Kantardzic, M.: Data Mining: Concepts, Models, Methods, and Algorithms. IEEE Press (2002)
8. Murphy, P.M., Aha, D.W.: UCI Repository of Machine Learning Databases (2001), http://www.ics.uci.edu/~mlearn/MLRepository.html
9. Pawlak, Z.: Rough sets. International Journal of Computer and Information Sciences 11(5), 341–356 (1982)
10. Hu, Q., Zhao, H., Xie, Z., Yu, D.: Consistency Based Attribute Reduction. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 96–107. Springer, Heidelberg (2007)
11. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1), 81–106 (1986)
12. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
13. Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Slowinski, R. (ed.) Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, pp. 331–362. Kluwer Academic Publishers, Dordrecht (1991)
14. Yuan-Zhen, W., Xiao-Bing, P.: A Fast Algorithm for Reduction Based on Skowron Discernibility Matrix. Computer Science (in China) 32(4), 42–44 (2005)