Credit Scoring Model Based on Neural Network with Particle Swarm Optimization*

Liang Gao, Chi Zhou, Hai-Bing Gao, and Yong-Ren Shi

Department of Industrial & Manufacturing System Engineering, Huazhong Univ. of Sci. & Tech., Wuhan, 430074, China

Abstract. Credit scoring has gained increasing attention in both the academic world and the business community. Many modeling techniques have been developed to tackle credit scoring tasks. This paper presents a Structure-tuning Particle Swarm Optimization (SPSO) approach for training feed-forward neural networks (NNs). The algorithm is successfully applied to a real credit problem. By tuning the structure and the connection weights of NNs simultaneously, the proposed algorithm generates optimized NNs whose information processing capacity matches the problem, and it also eliminates some ill effects introduced by redundant input features and the corresponding redundant structure. Compared with BP and GA, SPSO can improve the pattern classification accuracy of NNs while speeding up the convergence of the training process.

1 Introduction

Credit scoring is a system that creditors use to classify credit applicants as either "good credit", likely to repay their financial obligations, or "bad credit", with a high probability of defaulting on them. Owing to recent financial crises and fierce competition among large corporations, credit scoring has gained increasing attention. Linear discriminant analysis (LDA) and logistic regression are two popular statistical tools for constructing credit scoring models. However, both are designed to fit linear relationships between the dependent and independent variables; when the variables exhibit complex nonlinear relationships, neural networks (NNs) serve as a promising alternative. In the credit industry, NNs are increasingly found to be useful scoring models and have been adopted by many credit scoring systems [1-3]. Multilayer feed-forward NNs can approximate any continuous nonlinear function [4], but realizing this capability in credit scoring relies on effective training algorithms. The commonly used training algorithms have well-known weaknesses: Back-Propagation (BP) is easily trapped in local optima, while the Genetic Algorithm (GA) lacks an effective local search mechanism. Particle swarm optimization (PSO) is a method for optimizing numerical functions and real-world problems. This paper proposes a new PSO-based training algorithm, Structure-tuning Particle Swarm Optimization (SPSO), to train NNs, and applies it to pattern classification for credit scoring.

* This paper is supported by the National Basic Research Program of China (973 Program), No. 2004CB719405, and the National Natural Science Foundation of China, No. 50305008.

L. Jiao et al. (Eds.): ICNC 2006, Part I, LNCS 4221, pp. 76 – 79, 2006. © Springer-Verlag Berlin Heidelberg 2006


The rest of the paper is organized as follows. Section 2 briefly reviews the background of basic PSO and neural networks. Section 3 presents the structure design and the formulation of the credit scoring model. Section 4 gives the computational results of the illustrative example. Finally, Section 5 concludes the study.

2 Background

Neural networks are mathematical models inspired by the functioning of the human brain. A typical network is composed of a series of interconnected nodes and the weights of the connections between them, and it aims to model a complex mapping from inputs to outputs. Training is carried out on a data set containing both input and output values: the network learns from the training samples, and the testing samples are used to verify the performance of the trained network. During training, the weights are adjusted iteratively until a desired error level is reached.

Studies of biological colonies have shown that the collective intelligence emerging from their complex activities can provide efficient solutions to specific optimization problems [5]. Inspired by the social behavior of animals such as fish schooling and bird flocking, Kennedy and Eberhart designed Particle Swarm Optimization (PSO) in 1995 [6]. PSO is an evolutionary computation technique based on swarm intelligence. The basic PSO model consists of a swarm of particles moving in a d-dimensional search space. The direction and distance of each particle's movement are determined by its fitness and velocity. The fitness measures a position against the optimization objective, and each particle's velocity is updated from its previous velocity, the best position the particle has found so far, and the best position found by the whole swarm.
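The velocity rule is only summarized above. As a minimal sketch, the Python code below implements the widely used global-best PSO with an inertia weight, a refinement introduced after the 1995 original; the paper does not state its exact update rule or parameter values. The function name `pso_minimize` and all parameter values (w = 0.7, c1 = c2 = 1.5, 30 particles, 200 iterations, velocity clamping) are illustrative assumptions.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(
    f: Callable[[List[float]], float],
    dim: int,
    n_particles: int = 30,
    iters: int = 200,
    w: float = 0.7,   # inertia weight (illustrative value)
    c1: float = 1.5,  # pull toward the particle's own best (illustrative)
    c2: float = 1.5,  # pull toward the swarm's best (illustrative)
    bounds: Tuple[float, float] = (-5.0, 5.0),
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO with an inertia weight; returns (best position, best fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds
    vmax = hi - lo  # velocity clamp: at most the width of the search range
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]          # best position found by each particle
    pbest_val = [f(x) for x in xs]      # fitness at those positions (lower is better)
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position found by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
                v = (w * vs[i][d]
                     + c1 * r1 * (pbest[i][d] - xs[i][d])
                     + c2 * r2 * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-vmax, min(vmax, v))
                # x <- x + v, clipped to the search bounds
                xs[i][d] = max(lo, min(hi, xs[i][d] + vs[i][d]))
            val = f(xs[i])
            if val < pbest_val[i]:  # update personal best, then global best
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val


# Usage: minimize the 5-D sphere function, whose minimum is 0 at the origin.
best, best_val = pso_minimize(lambda x: sum(t * t for t in x), dim=5)
print(best_val)
```

For NN training, `f` would map a particle's position to the network's training error; how SPSO encodes a network as a particle is not spelled out in this paper, so that mapping is left open here.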

3 Connection Structure Optimization Algorithm

Traditional training methods optimize only the connection weights. However, insufficient or excessive information processing capacity degrades the classification performance of NNs. Generalized NN training should therefore also optimize the connection structure, i.e., the training algorithm should delete redundant connections to obtain a more compact structure. To conduct this simultaneous optimization, several key items are defined as follows.

Definition 1. Connection structure $\{c_{ih}\}$, $\{c_{ho}\}$. $\{c_{ih}\}$ represents the connection structure between the input layer and the hidden layer; likewise, $\{c_{ho}\}$ represents the connection structure between the hidden layer and the output layer. Both are binary matrices: an entry is 1 if the corresponding connection is present and 0 if it is absent.

Definition 2. Connection threshold constants $\{\theta_{ih}\}$, $\{\theta_{ho}\}$. $\{\theta_{ih}\}$ and $\{\theta_{ho}\}$ are matrices of real numbers between 0 and 1. Together with the connection variables defined below, these constants serve as criteria for setting the status of each connection. In this study, the thresholds were set empirically to 0.5.


Definition 3. Connection variables $\{\delta_{ih}\}$, $\{\delta_{ho}\}$. Combined with the connection threshold constants of Definition 2, the connection variables $\{\delta_{ih}\}$ and $\{\delta_{ho}\}$ determine the connection structure of the NN: if a connection variable is greater than its corresponding threshold, the connection is switched on (present); otherwise it is absent. For the input-to-hidden layer, for example, $c_{ih} = 1$ if $\delta_{ih} > \theta_{ih}$ and $c_{ih} = 0$ otherwise.
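Definitions 1 to 3 can be made concrete with a short Python sketch that decodes connection variables into binary structure matrices and applies them during a forward pass. Assumptions not stated in the paper: one hidden layer, sigmoid units, no bias terms, and removal of an absent connection by multiplying its weight by the mask entry; thresholds are 0.5 as in Definition 2. `decode_structure`, `forward`, and `count_connections` are hypothetical helper names.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[int]]


def decode_structure(delta: Matrix, theta: Matrix) -> Mask:
    """Definition 3: c = 1 (connection present) if delta > theta, else c = 0."""
    return [[1 if d > t else 0 for d, t in zip(d_row, t_row)]
            for d_row, t_row in zip(delta, theta)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: List[float], w_ih: Matrix, c_ih: Mask,
            w_ho: Matrix, c_ho: Mask) -> List[float]:
    """Forward pass of a one-hidden-layer NN (no biases, sigmoid units).

    w_ih[i][h] links input i to hidden node h; w_ho[h][o] links hidden node h
    to output o. A connection whose mask entry is 0 contributes nothing.
    """
    n_hidden, n_out = len(w_ho), len(w_ho[0])
    hidden = [sigmoid(sum(x[i] * w_ih[i][h] * c_ih[i][h] for i in range(len(x))))
              for h in range(n_hidden)]
    return [sigmoid(sum(hidden[h] * w_ho[h][o] * c_ho[h][o] for h in range(n_hidden)))
            for o in range(n_out)]


def count_connections(*masks: Mask) -> Tuple[int, int]:
    """Return (deleted, remaining) connection counts summed over all masks."""
    total = sum(len(row) for m in masks for row in m)
    remaining = sum(sum(row) for m in masks for row in m)
    return total - remaining, remaining


# Tiny 2-2-1 example; all thresholds are 0.5, other values are illustrative.
c_ih = decode_structure([[0.9, 0.2], [0.6, 0.4]], [[0.5, 0.5], [0.5, 0.5]])
c_ho = decode_structure([[0.7], [0.1]], [[0.5], [0.5]])
y = forward([1.0, -1.0], [[0.3, 0.8], [0.5, -0.2]], c_ih, [[1.2], [0.4]], c_ho)
print(c_ih, c_ho, count_connections(c_ih, c_ho))  # [[1, 0], [1, 0]] [[1], [0]] (3, 3)
```

In an SPSO-style trainer, one might place both the weights and the connection variables in each particle's position, decode them with functions like these, and use the resulting classification error as the fitness; this encoding is an assumption, not a detail given in the paper.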

4 Experimental Results

4.1 Neural Network Simulation

The Australian credit data set, publicly available from the UCI Repository of Machine Learning Databases [7], was used to test the predictive accuracy of the credit scoring models. In this set, 468 applicants were accepted and maintained good credit, and 222 were accepted but became delinquent. All source data were preprocessed and normalized. Each input node corresponds to one preprocessed variable describing the applicant, and the single output node represents the credit decision, "accepted" or "rejected". With this architecture and data, the network is trained and tested to learn the relationship between the input attributes and the credit class.

4.2 Results and Analysis

This section presents the experimental results of the PSO-based training algorithms (SPSO and PSO) and of the other commonly used training methods (GA and BP). Each experiment was repeated 20 times with random initialization, and the typical (median-performance) results are reported in Table 1 and Fig. 1. Table 1 summarizes the classification results of the different methods, and Fig. 1 compares the convergence of the two population-based structure-tuning algorithms, SPSO and SGA (the structure-tuning variant of GA), on the data set. Optimizing the connection structure enhances the pattern classification performance of NNs: in Table 1, test accuracy rises from 83.7% (PSO) to 85.3% (SPSO) and from 80.0% (GA) to 81.6% (SGA). This comparison between simultaneous structure-and-weight optimization and weights-only optimization shows that the more compact NNs still maintain classification accuracy. Fig. 1 also shows that SPSO converges faster during training and ends with higher accuracy, and that, compared with SGA, the best and average training error curves of SPSO converge much more consistently.

Table 1. Classification results of different methods using the NN rule

Measure                  PSO     SPSO    GA      SGA     BP
Error                    18.1    17.8    20.8    20.9    18.0
Train accuracy           90.2%   90.8%   89.6%   89.6%   90.0%
Test accuracy            83.7%   85.3%   80.0%   81.6%   84.7%
CPU time (s)             3.32    3.55    5.37    5.68    5.51
Deleted connections      0       114     0       106     0
Remaining connections    225     111     225     119     225
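The connection counts in Table 1 can be cross-checked with simple arithmetic. The sketch below assumes (an inference, not stated in the paper) that the network uses the 14 attributes of the Australian data set as inputs, has one output node, and does not count bias links; under that reading, 225 = h(14 + 1) gives h = 15 hidden nodes.

```python
# Connection counts from Table 1: method -> (deleted, remaining).
table1 = {
    "PSO": (0, 225),
    "SPSO": (114, 111),
    "GA": (0, 225),
    "SGA": (106, 119),
    "BP": (0, 225),
}

# All methods start from the same fully connected network of 225 links.
totals = {method: deleted + remaining for method, (deleted, remaining) in table1.items()}
assert set(totals.values()) == {225}

# Assumed topology (not stated in the paper): n_in inputs, h hidden nodes,
# one output, bias links not counted, so 225 = n_in*h + h*n_out = h*(n_in + n_out).
n_in, n_out = 14, 1
h, rest = divmod(225, n_in + n_out)
assert rest == 0  # 225 is divisible by 15, so h = 15 fits exactly

for method, (deleted, _) in table1.items():
    print(f"{method}: {deleted / 225:.1%} of connections removed")
```

Run as is, it reports that SPSO removed 114/225 ≈ 50.7% of the connections and SGA 106/225 ≈ 47.1%, while the weights-only methods removed none.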


Fig. 1. Training error curves of SPSO and SGA
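Section 4.2 reports the typical, median-performance result out of 20 randomized runs. A minimal sketch of that reporting protocol follows, assuming each run is summarized by one test accuracy and that, with an even number of runs, the lower of the two middle runs is reported (the paper does not say how this case is handled). `median_run` and `fake_experiment` are hypothetical names; `fake_experiment` only simulates a training run.

```python
import random
from typing import Callable, List, Tuple


def median_run(run_experiment: Callable[[int], float],
               n_runs: int = 20) -> Tuple[int, float]:
    """Run the experiment with seeds 0..n_runs-1 and return (seed, accuracy)
    of the median-performance run. With an even number of runs there is no
    single middle run, so the lower of the two middle runs is returned."""
    results: List[Tuple[float, int]] = [(run_experiment(s), s) for s in range(n_runs)]
    results.sort()  # ascending by accuracy, ties broken by seed
    accuracy, seed = results[(n_runs - 1) // 2]
    return seed, accuracy


def fake_experiment(seed: int) -> float:
    """Hypothetical stand-in for one train/test cycle: noisy accuracy near 85%."""
    return 0.85 + random.Random(seed).gauss(0.0, 0.01)


seed, accuracy = median_run(fake_experiment)
print(f"median run: seed={seed}, test accuracy={accuracy:.3f}")
```

Reporting the median run rather than the best run keeps a single lucky initialization from inflating the comparison between methods.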

5 Conclusions and Discussion

Credit scoring problems have gained increasing attention over the past decades, and credit scoring techniques have become popular tools for evaluating credit risk. However, traditional modeling techniques are often criticized for their strong model assumptions or their restriction to linear relationships. Neural networks are becoming a very popular alternative for credit scoring tasks, but the commonly used training algorithms have inherent drawbacks. To address these problems, this paper applies a PSO-based algorithm, SPSO, to optimize the structure and the weights of NNs simultaneously. The computational results on the credit data set show that the proposed approach improves data-handling efficiency and generalization ability: SPSO removes 114 of the 225 connections and achieves the highest test accuracy in Table 1 (85.3%). Accurate classification reduces the creditor's risk, which makes this approach attractive for credit analysis systems.

References

1. Malhotra, R., Malhotra, D.K.: Differentiating between good credits and bad credits using neuro-fuzzy systems. European Journal of Operational Research, Vol. 136 (2002) 190-211
2. Li, Z., Xu, J.S., Xu, M.: ANN-GA approach of credit scoring for mobile customers. Conference on Cybernetics and Intelligent Systems, Singapore (2004) 1148-1153
3. Lee, T.S., Chen, I.F.: A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines. Expert Systems with Applications, Vol. 28 (2005) 743-752
4. Hornik, K., Stinchcombe, M., White, H.: Multilayer feed-forward networks are universal approximators. Neural Networks, Vol. 2 (1989) 359-366
5. Kennedy, J., Eberhart, R.C., Shi, Y.: Swarm Intelligence. Morgan Kaufmann, San Francisco (2001)
6. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. Proceedings of IEEE International Conference on Neural Networks, Perth, Australia (1995) 1942-1948
7. Murphy, P.M., Aha, D.W.: UCI Repository of machine learning databases. Department of Information and Computer Science, University of California, Irvine, CA (2001)