Received: 25 July 2016 | Revised: 24 April 2017 | Accepted: 11 June 2017

DOI: 10.1111/exsy.12225

ARTICLE

Fusion of artificial neural networks for learning capability enhancement: Application to medical image classification

Jude D. Hemanth1 | J. Anitha1 | Bernadetta Kwintiana Ane2

1 Department of ECE, Karunya University, Coimbatore, India
2 University of Stuttgart, Germany

Correspondence: Jude D. Hemanth, Department of ECE, Karunya University, Coimbatore, India. Email: [email protected]

Abstract
Artificial neural network (ANN) is one of the commonly used tools for computational applications. The specific advantages of ANN are high accuracy, low convergence time, low computational complexity, and so forth. However, all these merits are not available in the same ANN. Even though back propagation neural (BPN) networks are accurate, their computational complexity is significantly high, and BPN networks are also not stable. On the other hand, the Hopfield neural network (HNN) is better than BPN in terms of computational efficiency, but the accuracy of HNN is low. In this work, a modified ANN is proposed to overcome this specific problem. The modified ANN is a fusion of BPN and HNN: the technical concepts of BPN and HNN are mixed in the training algorithm of the proposed back propagation‐Hopfield network (BPHN). The objective of this fusion is to improve the performance of conventional ANN. Magnetic resonance brain image classification experiments are used to analyse the proposed BPHN. Experimental results suggest an improvement in the learning process of the proposed BPHN. A comparative analysis with the conventional networks is performed to validate the performance of the proposed approach.

KEYWORDS
accuracy, back propagation network, computational complexity, Hopfield network, image classification

1 | INTRODUCTION

Medical image classification is the technique of clustering abnormal images into different categories based on some similarity measures (Ge et al., 2006). Conventional medical image classification procedures are based on human perception. However, human perception is prone to error. Several automated techniques have been developed to overcome this drawback of the conventional image classification system. Artificial neural network (ANN) is one of the commonly used tools among the automated techniques. A literature survey reveals many research works on ANN‐based medical image classification. The application of a bidirectional associative memory neural network for medical image classification is explored in Sharma et al. (2008). A semisupervised approach is adopted in this work for the recognition of different abnormal categories. This work focused only on classification accuracy analysis; hence, the performance of the classifier cannot be fully validated. An extensive review of various conventional ANN for medical image analysis is reported in Amato et al. (2013). The various merits and demerits of the ANN are highlighted in this work, which specifically focuses on medical diagnosis applications of the ANN. ANN‐based medical image analysis is performed by Jiang et al. (2010). Discussions on performance enhancement of the conventional neural networks are carried out in this work. However, quantitative analysis of the results is not reported. Feedforward neural network‐based medical image classification is demonstrated by Coppini et al. (2003). The conventional three‐layer neural network is used in this work for nodule detection in chest radiogram images. The accuracy and sensitivity reported in this work are sufficiently high. However, this approach is tested on only a small number of images, which is not sufficient to validate the robustness of the system. The application of a Kohonen neural network for abnormality diagnosis in medical images is explored by Markaki, Asvestas, and Matsopoulos (2009). The effectiveness of the algorithm is evaluated based on accuracy and robustness. The results reported in this work are better than those of conventional image processing algorithms. However, a comparative analysis with other neural networks is not performed. Probabilistic neural networks are used for enhancing the quality of the images (Pantelis et al., 2008). A multilayer neural network is used in this approach. Sun and Wang (2005) have used Gaussian basis function neural networks for tissue classification in magnetic resonance (MR) images. The proposed approach is efficient in terms of precision and convergence time. Some further applications of ANN are reported in the literature (Comtat & Morel, 1995; Middleton & Damper, 2004; Reddick et al., 1997; Zhou et al., 2002). However, most of these ANN are not suitable for practical applications.

Several modified neural networks have been developed to overcome the drawbacks of conventional ANN. A hybrid neural classifier with a rule‐based system is implemented for mammogram image classification (Papadopoulos et al., 2002). This method is developed to improve the accuracy of the conventional neural network. However, the sensitivity measure reported in this work is relatively low. An efficient neural network for medical image classification is reported in Christoyianni et al. (2000). Radial basis function neural networks are used in this work for categorization of the normal and abnormal images, but the classification accuracy is not dealt with in detail. An accurate and fast image classification method based on the Adaptive Resonance Theory neural network is implemented in Vigdor and Lerner (2006). A modified training methodology is adopted in this approach, and quick convergence is guaranteed. A comparative analysis with other works is also reported. However, the accuracy reported in this method is quite low. A radial basis function neural network‐based automated medical image classification system is developed by Cai et al. (2010). A multiobjective simultaneous learning methodology is used in this work. However, information about the complexity of this approach is not given. A modified training algorithm for spiking neural networks (SNN) is proposed by Wade et al. (2010). The concept of synaptic weight association is incorporated into the training algorithm of the SNN, and the generalization capability of the proposed algorithm is also verified. An optimized ANN is used for tissue classification in abnormal MR brain images (Shen et al., 2005). A fuzzy rule‐based system is used with the optimized neural network to improve the efficiency of the classification system. Fatemizadeh, Lucas, and Soltanian‐Zadeh (2003) have implemented a modified growing neural gas network for object extraction in images. A clustering approach is combined with the neural network in this approach, but its generalization capability is not validated. Dokur and Olmez (2003) have proposed a quantizer neural network. This modified network is used for tissue classification of MR images, and the work gives emphasis to both accuracy and time complexity. Further modified ANN‐based approaches for medical image analysis are reported in previous studies (Duan & Xu, 2012; Hainc & Kukal, 2006; Jaiswal & Gaikwad, 2006; Kondo, Kondo, & Ueno, 2009; Li & Da, 2000; Lisboa & Taktak, 2006; Pan et al., 2010; Shang, Lv, & Yi, 2006; Wang, Chen, & Bi, 2015; Zhou & Xu, 2001).

In this work, a hybrid approach is implemented for brain image analysis. This method is the fusion of two conventional neural networks. The objective of this fusion is to develop a single ANN with multiple advantages. The proposed approach is used for MR brain image classification, and several performance measures are calculated from the experimental results to prove the efficiency of the technique. Sections 2 and 3 provide details of the image database and the feature extraction techniques. Section 4 discusses the feature selection methodology. Section 5 discusses the conventional neural networks: the architecture and the training algorithm of the back propagation neural (BPN) network and the Hopfield neural network (HNN). Section 6 details the proposed modified approach with its architecture and training algorithm. Finally, Section 7 presents the various experimental results of the proposed and the conventional techniques, together with a detailed discussion of the various performance measures.

2 | MATERIALS AND METHODS

The proposed methodology of the automated system is shown in Figure 1. The workflow is divided into several steps: (a) MR image database and feature extraction, (b) feature selection, (c) conventional ANN‐based image classification, and (d) back propagation‐Hopfield network (BPHN)‐based image classification. Real‐world MR images from four abnormal categories are collected from scan centres. The four abnormal categories are meningioma, astrocytoma, metastasis, and glioma. Sample MR images are shown in Figure 2. The MR slices are acquired on a 0.2 Tesla Siemens Magnetom CONCERTO MR scanner (Siemens AG Medical Solutions, Germany). The images are collected from M/S Devaki MRI and CT Scans, Madurai, India. The scan images are taken with a slice gap of 2 mm. All T2‐weighted images (TR/TE of 4400/118 ms) are collected using turbo spin echo sequences. All the images used in this work are grey‐scale images of dimension 256 × 256, stored in the bitmap (BMP) format. Conventionally, these images are categorized by the physician. Human perception is prone to error and may lead to inaccurate results; thus, an automated classification system is essential for medical diagnosis. An extensive feature set is extracted from these images and supplied as input to the ANN. These are textural features based on the intensity of the input images. However, accuracy is not guaranteed with all the extracted features. Hence, the relevant features are selected from the feature set using feature selection techniques. The feature selection method is used to improve the accuracy and minimize the complexity of the neural architectures. In this work, the genetic algorithm (GA) is used for feature selection. Initially, the conventional neural classifiers are tested. Then, the proposed BPHN is trained and tested with these optimal features. The training algorithm of BPHN is given in detail with the necessary justifications for the modifications, and the architecture of BPHN is also illustrated. Finally, the performance measures of these approaches are estimated and a comparative analysis is done with the conventional neural networks. The performance measures used in this work are accuracy, sensitivity (Sn), specificity (Sp), convergence time, and computational complexity.


FIGURE 1 Proposed methodology for this work. BPN, back propagation neural; MR, magnetic resonance

FIGURE 2 Sample magnetic resonance images: (a) meningioma, (b) astrocytoma, (c) metastasis, and (d) glioma

3 | FEATURE EXTRACTION

Feature extraction is an important step for image‐based computational applications. The objective of the feature extraction step is to select the key characteristic features of the images. Because the size of each image is very large, it is practically impossible to supply the raw images as input to the computing system. The inputs to the ANN must always be crisp to minimize the computational complexity of the system; hence, feature extraction is useful for dimensionality reduction of the images. The features uniquely represent the parent image. The features from images of the same category must be similar, whereas the features extracted from images of different categories must be different. Because the training of the ANN is also based on the accuracy of the dataset, this step is extremely important for performance enhancement of the ANN‐based classification system.

There are different types of features available for depicting the images. In this work, textural features are used for the experiments; the literature reveals the high success rate of these features for medical applications. Textural features can be estimated using many approaches, such as structural, statistical, model‐based, and transform‐based approaches. However, the literature clearly reveals that higher discrimination rates are achieved by the statistical approaches (Taji & Gore, 2013). The commonly used statistical approaches are first‐order statistics, second‐order statistics, and higher order statistics. Among these techniques, higher order statistics are relatively complex (Srinivasan & Shobha, 2008). Hence, second‐order statistical features (grey‐level co‐occurrence matrix [GLCM]) are used in this work. The next area of concern is the number of features used in the experiments. Because the dimensions of the ANN architecture are based on these features, the number of features cannot be too high; on the other hand, a sufficient number of features must be given to ensure accuracy. Hence, an optimal number of features is used, and these features must characterize the abnormal categories in a unique manner. In this work, 12 features are extracted from all the images of each category. Features such as mean, standard deviation, skewness, kurtosis, energy, and entropy are estimated directly from the first‐order histogram of the input images. On the other hand, features such as contrast, inverse difference moment, correlation, variance, cluster shade, and cluster prominence are extracted from the second‐order GLCM (Hemanth, Vijila, & Anitha, 2010). The mathematical equations used to estimate these features are given below. The first‐order histogram estimate is given by

$$p(b) = \frac{N(b)}{M}, \qquad (1)$$

where b = grey level in the image, M = total number of pixels in a neighbourhood window centred about an expected pixel, and N(b) = the number of pixels with grey value b in the same window.

Mean:
$$S_M = \bar{b} = \sum_{b=0}^{L-1} b\, p(b). \qquad (2)$$

Standard deviation:
$$S_D = \sigma_b = \left[ \sum_{b=0}^{L-1} (b - \bar{b})^2\, p(b) \right]^{1/2}. \qquad (3)$$

Skewness:
$$S_S = \frac{1}{\sigma_b^3} \sum_{b=0}^{L-1} (b - \bar{b})^3\, p(b). \qquad (4)$$

Kurtosis:
$$S_K = \frac{1}{\sigma_b^4} \sum_{b=0}^{L-1} (b - \bar{b})^4\, p(b) - 3. \qquad (5)$$

Energy:
$$S_N = \sum_{b=0}^{L-1} [p(b)]^2. \qquad (6)$$

Entropy:
$$S_E = -\sum_{b=0}^{L-1} p(b)\, \log_2\{p(b)\}. \qquad (7)$$

GLCM is one of the commonly used second‐order statistical approaches. GLCM {P(d, θ)(i, j)} represents the probability of occurrence of a pair of grey levels (i, j). These pairs of pixels are separated by distance d and angle θ. The commonly used distance value is 1, and the angles are 0°, 45°, 90°, and 135°. The detailed algorithm of GLCM {P(d, θ)(i, j)} is available in the literature.

Notations:
$$p_x(i) = \sum_{j=1}^{N_g} p(i,j); \qquad p_y(j) = \sum_{i=1}^{N_g} p(i,j); \qquad (8)$$

$$p_{x+y}(k) = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j), \quad k = 2, 3, \ldots, 2N_g, \; i + j = k; \qquad (9)$$

$$p_{x-y}(k) = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i,j), \quad k = 0, 1, \ldots, N_g - 1, \; |i - j| = k, \qquad (10)$$

where p(i, j) = GLCM. The features such as contrast, inverse difference moment, correlation, variance, cluster shade, cluster prominence, and homogeneity are calculated using the equations given below.

Contrast:
$$S_C = \sum_i \sum_j (i - j)^2\, p(i,j). \qquad (11)$$

Inverse difference moment:
$$S_I = \sum_i \sum_j \frac{1}{1 + (i - j)^2}\, p(i,j). \qquad (12)$$

Correlation:
$$S_O = \frac{\sum_i \sum_j (ij)\, p(i,j) - \mu_x \mu_y}{\sigma_x \sigma_y}. \qquad (13)$$

Variance:
$$S_V = \sum_{i,j=1}^{N} (i - j)^2\, p(i,j). \qquad (14)$$

Cluster shade:
$$S_{CS} = \sum_{i,j=1}^{N} (i - M_x + j - M_y)^3\, p(i,j). \qquad (15)$$

Cluster prominence:
$$S_{CP} = \sum_{i,j=1}^{N} (i - M_x + j - M_y)^4\, p(i,j), \qquad (16)$$

where $M_x = \sum_{i,j=1}^{N} i\, p(i,j)$ and $M_y = \sum_{i,j=1}^{N} j\, p(i,j)$.

Thus, the number of features extracted from the images is 12. The feature selection process is performed to select the best features from this feature set.
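For illustration, a minimal sketch of how the 12 features described above could be computed for one grey‐scale slice is given below. The original experiments were implemented in MATLAB; this Python/NumPy rendition, the function names, and the choice of a single GLCM offset (d = 1, θ = 0°) are assumptions made for brevity.

```python
import numpy as np

def first_order_features(img, levels=256):
    """Features from the first-order grey-level histogram p(b) (Equations 1-7)."""
    p = np.bincount(img.ravel(), minlength=levels) / img.size
    b = np.arange(levels)
    mean = np.sum(b * p)
    std = np.sqrt(np.sum((b - mean) ** 2 * p))
    skew = np.sum((b - mean) ** 3 * p) / std ** 3
    kurt = np.sum((b - mean) ** 4 * p) / std ** 4 - 3
    energy = np.sum(p ** 2)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return [mean, std, skew, kurt, energy, entropy]

def glcm_features(img, levels=256):
    """Features from a normalised GLCM at d = 1, theta = 0 degrees (Equations 11-16)."""
    glcm = np.zeros((levels, levels))
    for r, c in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        glcm[r, c] += 1                      # count horizontal grey-level pairs
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    mx, my = np.sum(i * p), np.sum(j * p)
    sx = np.sqrt(np.sum((i - mx) ** 2 * p))
    sy = np.sqrt(np.sum((j - my) ** 2 * p))
    contrast = np.sum((i - j) ** 2 * p)
    idm = np.sum(p / (1 + (i - j) ** 2))
    correlation = (np.sum(i * j * p) - mx * my) / (sx * sy)
    variance = np.sum((i - j) ** 2 * p)      # as written in Equation 14
    shade = np.sum((i - mx + j - my) ** 3 * p)
    prominence = np.sum((i - mx + j - my) ** 4 * p)
    return [contrast, idm, correlation, variance, shade, prominence]

# Usage on a synthetic 256 x 256 slice (placeholder for a real MR image).
img = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
feature_vector = first_order_features(img) + glcm_features(img)  # 12 features
```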

4 | FEATURE SELECTION

Not all of the features contribute to the accuracy of the system. Hence, a feature reduction technique is essential. Feature selection refers to the problem of dimensionality reduction of input data (Li & Xu, 2001; Li, Xu, Wang, & Mo, 2003). The objective is to choose optimal subsets of the original features that contain the essential information for the classification task. In this work, GA is proposed for feature selection. GA is a general purpose search method based on biological evolution (Chaudhry, Varano, & Xu, 2000; Jiang, Xu, Wang, & Wang, 2009; Li, Xu, Jin, & Wang, 2011a; Li, Xu, Jin, & Wang, 2011b; F. Li, Xu, Jin, & Wang, 2012) and is a commonly used optimization algorithm for many engineering applications (L. Li, Ge, Zhou, & Valderdi, 2012; Li, Xu, & Wang, 2013; Xu et al., 2014; Yin, Zeng, Chen, & Fan, 2016). It falls under the category of evolutionary algorithms because its operations are based on the natural theory of evolution.

4.1 | Algorithm

The complete algorithm of GA for optimal feature selection is given below.

Step 1: The population and the size of the population are initialized randomly. The population is the set of possible solutions. The size of the population is one of the deciding factors of the complexity of the algorithm; hence, it shall not be too large or too small. Each member of the population is called a chromosome. Each chromosome is represented in binary form: a chromosome is a string of bits, with each bit corresponding to an input feature. The position of each feature can be assigned by the user. Each bit in the chromosome is called a gene. A sample population is given below:

Pop = [ 1 0 0 0 0 0 1 0 0 0 1 0
        0 1 0 0 0 1 0 0 0 0 0 0
        1 0 1 1 0 0 1 1 0 0 1 1
        1 1 0 0 0 1 0 0 1 1 0 0 ]

In this example, the size of the population (number of chromosomes) is 4, and each chromosome is represented by 12 genes. The number of genes corresponds to the 12 features used in this work.

Step 2: A suitable fitness function is selected to evaluate the efficiency of each member of the population. The fitness function can be a minimization function or a maximization function; in this work, a maximization function is used. The fitness value is estimated for each chromosome. The fitness function used in this work is given by

$$\text{Fitness} = (\alpha \gamma) + \beta \left( \frac{|c| - |r|}{|c|} \right), \qquad (17)$$

where γ = classification accuracy, |c| = total number of features, |r| = length of the selected feature subset (number of '1's in the chromosome), α ∈ [0, 1], and β = 1 − α.

This formula shows that the classification accuracy and the feature subset length have different impacts on feature selection. The classification accuracy of each chromosome is estimated with the features having a value "1" in that specific chromosome.


Step 3: After estimating the fitness value, the members of the population are arranged in descending order. The two members with the lowest fitness values in this order are eliminated.

Step 4: In GA, the size of the population must remain the same throughout the operation. Hence, two new members (chromosomes) must be formed to maintain the uniform size of the population. This is achieved with a mathematical operation called the crossover operator. In this method, two parents (the members with the best fitness values) are chosen, and swapping of bits is performed to generate two offspring (new members). An example is given below:

Parent 1    = 1 0 0 1 1 0 1 1 0 | 1 0 1
Parent 2    = 0 0 1 1 0 1 0 1 1 | 0 1 0
Offspring 1 = 1 0 0 1 1 0 1 1 0 0 1 0
Offspring 2 = 0 0 1 1 0 1 0 1 1 1 0 1

The vertical line in Parents 1 and 2 is called the crossover site. All the bits on the right‐hand side of the crossover site are swapped between the two parents, and thus two new offspring are formed. Because they are formed from the fittest parents, these offspring also carry the characteristic features of the fittest parents.

Step 5: These two offspring are added to the already existing population, and the complete process (Steps 2–4) is repeated again.

4.2 | Implementation of GA

The computational flowchart of GA is shown in Figure 3. In this work, 12 features are represented by a single chromosome (string of bits). The size of the initial random population is 20. This process continues for a specified number of iterations. The fittest chromosome is determined from the outputs of the last iteration. The number of iterations used in this work is 300. In the fittest chromosome, the features with a bit value “1” are accepted and the features with the bit value of “0” are rejected. In this work, four features (mean, standard deviation, skewness, and variance) are eliminated from the 12 input features. The remaining eight features are further used for the ANN‐based classification process. Hence, the number of features used is only eight.
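A compact sketch of the selection procedure described above is given below. The fitness function follows Equation 17; the placeholder accuracy function, the value of α, and the random seed are assumptions, since in the paper γ comes from the classifier's own accuracy on the selected feature subset.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(chrom, evaluate_accuracy, alpha=0.8):
    """Equation 17: weighted sum of classification accuracy and subset reduction."""
    gamma = evaluate_accuracy(chrom)           # classification accuracy with selected features
    c, r = chrom.size, chrom.sum()             # |c| total features, |r| selected features
    return alpha * gamma + (1 - alpha) * (c - r) / c

def single_point_crossover(p1, p2):
    """Swap all genes to the right of a random crossover site (Step 4)."""
    site = rng.integers(1, p1.size)
    return (np.concatenate([p1[:site], p2[site:]]),
            np.concatenate([p2[:site], p1[site:]]))

def select_features(evaluate_accuracy, n_features=12, pop_size=20, iterations=300):
    pop = rng.integers(0, 2, (pop_size, n_features))
    for _ in range(iterations):
        scores = np.array([fitness(ch, evaluate_accuracy) for ch in pop])
        pop = pop[np.argsort(scores)[::-1]][:-2]   # drop the two weakest members (Step 3)
        child1, child2 = single_point_crossover(pop[0], pop[1])
        pop = np.vstack([pop, child1, child2])     # restore the population size (Step 5)
    best = pop[np.argmax([fitness(ch, evaluate_accuracy) for ch in pop])]
    return np.flatnonzero(best)                    # indices of the features kept ('1' genes)

# Placeholder accuracy function: in the paper this would be the classifier accuracy
# obtained with the feature subset encoded by the chromosome.
dummy_accuracy = lambda chrom: 0.5 + 0.04 * chrom.sum() / chrom.size
selected = select_features(dummy_accuracy)
```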

FIGURE 3 Flow diagram of genetic algorithm

5 | CONVENTIONAL ANN‐BASED IMAGE CLASSIFICATION

In this work, two conventional ANN are implemented for image classification. Initially, the standard BPN network and HNN are analysed. The analysis of these conventional neural networks is essential to develop the proposed ANN.

5.1 | Conventional BPN

The BPN network is a multilayer ANN with an input layer, a hidden layer, and an output layer. A supervised training methodology is used in BPN, and the gradient descent rule is used for adjusting the weights of the layers. Even though BPN is accurate, its computational complexity is very high, and the computational complexity is also proportional to the convergence time requirement. Moreover, the stability of BPN is not always guaranteed. Thus, it is very clear that BPN can be used in applications where accuracy is the prime concern; however, it cannot be used in situations where both accuracy and time are equally important.

5.1.1 | Architecture of BPN

The layout of the BPN is shown in Figure 4. In this work, BPN with a single hidden layer is considered for implementation. Two sets of weight matrices are used in the architecture. The different layers (input, hidden, and output) are interconnected by the weight matrices W and U. The target vector is supplied to the output layer. A detailed explanation of the conventional BPN is available in Fausett (2006). However, the training algorithm of BPN is illustrated in this section.

5.1.2 | Training algorithm of BPN

The standard back propagation method consists of two different stages: (a) During the feedforward phase, the input is fed to the neural network and the output is calculated. (b) In the reverse phase, the error is estimated based on the output. The entire process is repeated till an error threshold value is reached. A summarized version of the algorithm is given below:

Step 1: Initialize the weight matrices W and U.

Step 2: Repeat Steps 3–8 till the convergence condition is reached.

Step 3: Implement Steps 4–8 for each training pair.

Feedforward phase

Step 4: Each input node x_i receives the input signal and distributes it to the hidden layer.

Step 5: Each hidden node calculates the weighted sum of its inputs and estimates the output signal using the activation function.

FIGURE 4 Framework of back propagation neural network

$$z\_in_j = \sum_i x_i w_{ij}, \qquad (18)$$

$$z_j = f(z\_in_j). \qquad (19)$$

Step 6: The same process is repeated in the output layer, and the output values are estimated.

$$y\_in_k = \sum_j z_j u_{jk}, \qquad (20)$$

$$y_k = f(y\_in_k). \qquad (21)$$

Reverse phase

Step 7: Each output node is compared with the target value t_k, and the error value is calculated. The weight correction terms are then estimated to adjust the weights.

$$\delta_k = (t_k - y_k)\, f'(y\_in_k), \qquad (22)$$

$$\Delta u_{jk} = \alpha\, \delta_k\, z_j, \qquad (23)$$

where α = learning rate.

Step 8: The weight correction term for the hidden layer is estimated based on the correction term of the output layer.

$$\delta\_in_j = \sum_k \delta_k u_{jk}, \qquad (24)$$

$$\delta_j = \delta\_in_j\, f'(z\_in_j), \qquad (25)$$

$$\Delta w_{ij} = \alpha\, \delta_j\, x_i. \qquad (26)$$

Step 9: The weight matrices are then adjusted using the following formulae:

$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \Delta w_{ij}, \qquad (27)$$

$$u_{jk}(\text{new}) = u_{jk}(\text{old}) + \Delta u_{jk}. \qquad (28)$$

A careful observation of the above algorithm shall point out the various drawbacks of this training algorithm. The computational complexity is significantly high due to the requirement of weight adaptation procedures for two matrices. The complexity increases with the increase in the number of layers. There is also no standard procedure to ensure the stability of the system. The number of extensive mathematical operations also impacts the convergence time of the approach. The high accuracy normally seen in BPN is also dependent on the number of iterations. Hence, there is always a requirement for an improved version of conventional BPN.
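The following sketch illustrates Equations 18–28 in batch form with sigmoid activations. The hidden layer size, learning rate, stopping tolerance, and the synthetic data are assumptions (the paper's own implementation was in MATLAB and its conventional‐BPN settings are not fully specified).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bpn(X, T, n_hidden=20, lr=0.1, epochs=500, tol=1e-3):
    """One-hidden-layer BPN trained with the delta rule (Equations 18-28)."""
    rng = np.random.default_rng(1)
    W = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))   # input -> hidden weights
    U = rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1]))   # hidden -> output weights
    for _ in range(epochs):
        # Feedforward phase (Equations 18-21).
        z = sigmoid(X @ W)
        y = sigmoid(z @ U)
        err = T - y
        if np.mean(err ** 2) < tol:                      # convergence condition
            break
        # Reverse phase (Equations 22-28); sigmoid derivative is y(1 - y).
        delta_k = err * y * (1 - y)
        delta_j = (delta_k @ U.T) * z * (1 - z)
        U += lr * z.T @ delta_k
        W += lr * X.T @ delta_j
    return W, U

# Usage: 8 selected features per image, 4 abnormal categories (one-hot targets).
X = np.random.rand(140, 8)
T = np.eye(4)[np.random.randint(0, 4, 140)]
W, U = train_bpn(X, T)
```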

5.2 | Conventional HNN

HNN is the most stable ANN. It is a single‐layer ANN, and the training process takes place without the target vector. HNN is mainly used for pattern recognition applications.

5.2.1 | Architecture of HNN

The architecture of the HNN is shown in Figure 5. A feedback loop is seen in the architecture. In this architecture, two important conditions are adopted to ensure stability of the system. They are

$$u_{ii} = 0; \qquad u_{ij} = u_{ji}. \qquad (29)$$

The weight calculation procedure is relatively simple because an exhaustive estimation procedure is not necessary. A detailed explanation on the architecture is given in Wang et al. (2015). However, the training algorithm is discussed in this work.

FIGURE 5 Architecture of Hopfield neural network

5.2.2 | Training algorithm of HNN

The training algorithm of HNN is relatively simple. In HNN, the weights remain fixed whereas the output value changes and tries to reach the equilibrium condition. The steps involved in the algorithm are summarized as follows:

Step 1: The weight matrix u_ij is estimated using the Hebbian rule with the help of the inputs.

$$u_{ij} = \sum in_i\, in_j \quad \forall\, i \neq j. \qquad (30)$$

Step 2: The initial output value is given by

$$y_i = in_i, \qquad (31)$$

where in_i is the external input supplied to the output layer.

Step 3: The NET value of the output layer is given by

$$y\_in_i = in_i + \sum_j y_j u_{ji}. \qquad (32)$$

Step 4: The output value is calculated using the activation function

$$y_i = f(y\_in_i). \qquad (33)$$

Step 5: The energy function is estimated using these output values. If the energy function value changes frequently, then Steps 3–5 are repeated. The entire procedure is repeated till the energy function reaches the equilibrium condition. Thus, the output value of HNN changes continuously whereas the weight values remain fixed. The computational complexity is lower than BPN. The convergence time is also significantly reduced. However, the accuracy of such networks is relatively low because the weight remains fixed throughout the process. The output values are dependent on the weight matrix and the input values. Hence, the accuracy of the input values must be high in order to ensure high output accuracy. This shows the necessity for a modified neural network. The modified neural network must be accurate and computationally efficient.
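A minimal sketch of the fixed‐weight, energy‐driven procedure of Steps 1–5 is given below. It assumes bipolar patterns and a hard‐limiting activation (the paper states that a sigmoid is used), so it should be read as an illustration of the recall process rather than the exact network used here.

```python
import numpy as np

def hopfield_weights(patterns):
    """Hebbian weight estimation (Equation 30) with u_ii = 0 and u_ij = u_ji (Equation 29)."""
    U = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(U, 0.0)
    return U

def energy(y, U, ext):
    """Standard Hopfield energy function used as the equilibrium criterion (Step 5)."""
    return -0.5 * y @ U @ y - ext @ y

def recall(U, ext, steps=50):
    """Iterate Equations 31-33 until the energy function stops changing."""
    y = np.sign(ext).astype(float)              # initial output y_i = in_i (Equation 31)
    prev_e = energy(y, U, ext)
    for _ in range(steps):
        net = ext + U @ y                       # NET value (Equation 32)
        y = np.where(net >= 0, 1.0, -1.0)       # hard-limiting activation (Equation 33)
        e = energy(y, U, ext)
        if np.isclose(e, prev_e):               # equilibrium condition reached
            break
        prev_e = e
    return y

# Usage with two bipolar prototype patterns of length 8.
patterns = [np.array([1, -1, 1, -1, 1, -1, 1, -1]),
            np.array([1, 1, -1, -1, 1, 1, -1, -1])]
U = hopfield_weights(patterns)
noisy = np.array([1, -1, 1, -1, -1, -1, 1, -1])  # corrupted copy of the first pattern
print(recall(U, noisy))
```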

6 | MODIFIED ANN‐BASED IMAGE CLASSIFICATION

In this work, the combination of HNN and BPN network is used to frame the proposed approach.

6.1 | BPHN‐based image classification

The common drawbacks of BPN are high computational complexity and lack of stability. The huge computational complexity is mainly due to the number of weight adjustment conditions for each layer. In this approach, the number of weight adjustment equations is reduced. This objective is achieved by including a layer with fixed weight values (Hopfield layer). Because the Hopfield layer is used, physical connections exist between the output layer and the input layer. This methodology ensures the stability of the system. Thus, the drawbacks of the conventional BPN are removed in this approach. On the other hand, the accuracy of HNN is lower than that of BPN because the number of layers is limited: multilayer networks are more accurate than single‐layer neural networks. This problem is tackled by including the middle layer in the HNN, which improves the accuracy of the overall system. Thus, the proposed approach enjoys the benefits of the BPN and HNN simultaneously.

6.1.1 | Architecture of BPHN

The structural design of BPHN is shown in Figure 6. In the proposed approach, one input layer, one middle layer, and one output layer are used. The number of neurons in the input layer is based on the input features, and the number of neurons in the output layer is based on the number of classes. Because feedback exists between the output layer and the input layer, the number of neurons in both these layers must be equal. Hence, the number of neurons in the output layer is increased and more than one neuron is assigned for each class. In this work, eight neurons are used in the output layer. Hence, two output layer neurons are assigned for each class. Two weight matrices are used in this work. One set of weights remains fixed, and the other set is adaptive. The weights between the input layer and the middle layer are adjusted using the BPN approach. The weights between the middle and the output layer are adjusted using the Hopfield approach. The target vector is supplied to the middle layer with different representations. The number of neurons used in the hidden layer is 20.

6.1.2 | Training algorithm of BPHN

In the proposed approach, only one weight matrix is adjusted whereas the other weights remain fixed. This concept is different from the conventional BPN. The supervised training methodology is used in the input layer. The Hopfield training methodology is used in the output layer. Because the external input is supplied to the input layer of BPHN, there is no necessity for the external input in the Hopfield layer. The training algorithm of BPHN is summarized as follows: Step 1: Supply the external input xi to input layer of BPHN. Step 2: The weighted sum of the inputs for each middle layer neuron is estimated using the randomly initialized weight values. The output values are calculated using the sigmoid activation function. These operations are implemented using Equations 18 and 19. Step 3: The error value is estimated using the target values supplied to the middle layer. Based on these error values, the weight matrix is adjusted using the following equations.

$$\gamma_j = (t_j - z_j)\, f'(z\_in_j), \qquad (34)$$

$$\Delta w_{ij} = \alpha\, \gamma_j\, x_i, \qquad (35)$$

$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \Delta w_{ij}. \qquad (36)$$

FIGURE 6 Framework of back propagation‐Hopfield network


Step 4: The output values of the middle layer are given as input to the Hopfield layer. The weights of this layer are estimated using the following equation:

$$u_{jk} = \sum z_j z_k \quad \forall\, j \neq k. \qquad (37)$$

Step 5: The output values of the Hopfield layer are estimated using the following formulae:

$$y\_in_k = \sum_k z_k u_{jk}, \qquad (38)$$

$$y_k = f(y\_in_k). \qquad (39)$$

Step 6: The energy function is estimated using these output values. If the energy function value changes frequently, these output values are further fed back as input to the input layer:

$$y_k = x_i. \qquad (40)$$

Because the dimensions are the same, the output values are assigned to the corresponding input layer neurons.

Step 7: Steps 2–6 are repeated till the following two conditions are satisfied: (a) the error value in the input layer is less than the predefined threshold value, and (b) the energy function reaches the equilibrium condition.

Thus, a modified approach based on the fusion of BPN and HNN is proposed in this paper. The performance measures of the proposed approach are significant, as analysed in the next section.
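The sketch below is one possible reading of the BPHN training algorithm. The extracted equations leave some details open (in particular how the class targets are coded on the 20 middle‐layer neurons and the exact index convention in Equation 37), so the middle‐layer target code, the Hebbian‐style estimate of the fixed middle‐to‐output weights, and the omission of the output‐to‐input feedback loop are stated assumptions, not the authors' exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bphn(X, T_mid, C_out, lr=0.1, epochs=300, tol=1e-3):
    """Sketch of BPHN training: the input->middle weights W are adapted with the
    delta rule (Equations 34-36), while the middle->output (Hopfield) weights U are
    re-estimated Hebbian-style from the middle-layer outputs (reading of Equation 37)
    and are never adjusted by gradient descent."""
    rng = np.random.default_rng(2)
    n_in, n_mid, n_out = X.shape[1], T_mid.shape[1], C_out.shape[1]
    W = rng.uniform(-0.5, 0.5, (n_in, n_mid))
    U = np.zeros((n_mid, n_out))
    for _ in range(epochs):
        z = sigmoid(X @ W)                    # middle-layer outputs (Equations 18-19)
        gamma = (T_mid - z) * z * (1 - z)     # error term at the middle layer (Equation 34)
        W += lr * X.T @ gamma                 # Equations 35-36
        U = z.T @ C_out / len(X)              # fixed Hebbian-style estimate (assumption)
        y = sigmoid(z @ U)                    # Hopfield-layer outputs (Equations 38-39)
        if np.mean((T_mid - z) ** 2) < tol:   # error-threshold stopping condition
            break
    return W, U

# Usage with the dimensions of Section 6.1.1: 8 input features, 20 middle neurons,
# 8 output neurons (2 per class). T_mid is an assumed 20-dimensional target code per
# class; C_out duplicates the one-hot class label over 2 output neurons.
X = np.random.rand(140, 8)
labels = np.random.randint(0, 4, 140)
T_mid = np.repeat(np.eye(4)[labels], 5, axis=1)   # 4 x 5 = 20 middle-layer targets
C_out = np.repeat(np.eye(4)[labels], 2, axis=1)   # 4 x 2 = 8 output-layer codes
W, U = train_bphn(X, T_mid, C_out)
```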

7 | EXPERIMENTAL RESULTS AND DISCUSSIONS

The experiments are conducted on 420 real‐world abnormal brain images collected from 44 patients of different ages. These images are collected from four categories: metastasis, meningioma, astrocytoma, and glioma. The dataset used for implementation is based on the k‐fold cross‐validation technique. In this work, the k value is fixed at 3. Hence, the entire dataset is divided into three equal parts (140 images each). One part is used for training, one part is used for testing, and the other part is used for validation. The experiment is repeated k times for the same classifier but with different training, testing, and validation data. This concept is illustrated in Figure 7. In this method, the ANN is trained using the training dataset, and the accuracy of training is checked using the validation dataset. If the accuracy is satisfactory, then the testing dataset is used to analyse the performance measures of the proposed network. This methodology adds solidity to the analysis and eliminates the bias in the classification error to some extent. The overall performance of the classifier is based on the average value of the classifier at the different k values. The experiments are carried out on a Pentium processor with 2 GB RAM and 1.66 GHz clock frequency. The software used for implementation is MATLAB. The three approaches are analysed in terms of classification accuracy, sensitivity, specificity, computational complexity, and convergence time. The experiments are also analysed for noisy images. The formulae for calculating these performance measures are given by

$$CA = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (41)$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad (42)$$

$$\text{Specificity} = \frac{TN}{TN + FP}. \qquad (43)$$

FIGURE 7 Three‐fold cross‐validation


In the above equations, TP corresponds to True Positive, TN corresponds to True Negative, FP corresponds to False Positive, and FN corresponds to False Negative. These parameters for a specific category are as follows: TP = True Positive (an image of "meningioma" type is categorized correctly to the same type), TN = True Negative (an image of "Non‐meningioma" type is categorized correctly as "Non‐meningioma" type), FP = False Positive (an image of "Non‐meningioma" type is categorized wrongly as "meningioma" type), and FN = False Negative (an image of "meningioma" type is categorized wrongly as "Non‐meningioma" type). "Non‐meningioma" actually corresponds to any of the three categories other than "meningioma." Thus, "TP and TN" correspond to the correctly classified images, and "FP and FN" correspond to the misclassified images. The same parameters are determined for all the categories by replacing "meningioma" in the above definitions with the other abnormal categories.

7.1 | Classification accuracy analysis

The performance measures of the classifiers are estimated using the confusion matrix. In the confusion matrix, the row‐wise elements correspond to the four categories and the column‐wise elements correspond to the target class associated with that abnormal category. Hence, the number of images correctly classified (TP) in each category is determined by the diagonal elements of the matrix. The row‐wise summation of elements for each category (other than the diagonal elements) corresponds to the “FN” of that category. The column‐wise summation of elements for each category (other than the diagonal elements) corresponds to the “FP” of that category. Similarly, “TN” of the specific category is determined by adding the elements of the matrix (other than the elements in the corresponding row and column of the specific category). The performance measures are estimated only for the testing images. Tables 1, 2, and 3 show the confusion matrices of BPN, HNN, and BPHN for different k values. From the above reported tables, it is evident that the level of misclassification rate is reduced in BPHN. It can be also noted that metastasis category has been classified more accurately than other classes. The performance measures of the classifiers are further estimated from these confusion matrices. Tables 4–9 show the performance measures of the classifiers in terms of sensitivity, specificity, and accuracy. Table 10 shows the average performance measures of the three classifiers. The superior nature of the proposed approach over other two classifiers is verified from the above reported results. Sensitivity is the measure of correctly classifying the images to the same category. A high value of sensitivity is always preferable. It is evident that the significant improvement is achieved in the proposed approach over the BPN and HNN. Similarly, specificity deals with the True Negatives and the value must be high for an efficient system. It can be seen that the specificity is almost the same for all the techniques. The classification accuracy of BPHN is also higher than the conventional techniques.
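A small sketch of this bookkeeping is given below: it derives the per‐class TP, FN, FP, and TN from a confusion matrix and then applies Equations 41–43. Whether the off‐diagonal row sum is read as FN or FP depends on whether rows hold the true or the predicted categories; the code assumes rows are the true categories, which is a stated assumption rather than a detail fixed by the paper.

```python
import numpy as np

def class_measures(cm):
    """Per-class TP, TN, FP, FN and the measures of Equations 41-43 from a
    confusion matrix whose rows are assumed to be true categories and whose
    columns are assumed to be predicted categories."""
    results = {}
    total = cm.sum()
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fn = cm[k, :].sum() - tp      # off-diagonal row sum
        fp = cm[:, k].sum() - tp      # off-diagonal column sum
        tn = total - tp - fn - fp     # everything outside row k and column k
        results[k] = {
            "CA": (tp + tn) / total,  # Equation 41
            "Sn": tp / (tp + fn),     # Equation 42
            "Sp": tn / (tn + fp),     # Equation 43
        }
    return results

# Usage with the k = 1 BPN confusion matrix of Table 1
# (categories: meningioma, glioma, astrocytoma, metastasis).
cm = np.array([[26, 4, 2, 3],
               [2, 28, 3, 2],
               [0, 2, 29, 4],
               [3, 1, 1, 30]])
print(class_measures(cm))
```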

TABLE 1 Confusion matrix of back propagation neural for different k values

k = 1        Class 1   Class 2   Class 3   Class 4
Men             26         4         2         3
G                2        28         3         2
A                0         2        29         4
Met              3         1         1        30

k = 2        Class 1   Class 2   Class 3   Class 4
Men             29         2         2         2
G                4        27         1         3
A                2         5        25         3
Met              2         1         3        29

k = 3        Class 1   Class 2   Class 3   Class 4
Men             28         3         3         1
G                2        31         1         1
A                3         2        27         3
Met              1         2         1        31

Note. Class 1 = meningioma (Men); Class 2 = glioma (G); Class 3 = astrocytoma (A); Class 4 = metastasis (Met).

TABLE 2 Confusion matrix of Hopfield neural network for different k values

k = 1        Class 1   Class 2   Class 3   Class 4
Men             23         4         4         4
G                3        27         3         2
A                5         2        25         3
Met              3         3         1        28

k = 2        Class 1   Class 2   Class 3   Class 4
Men             25         4         2         4
G                4        26         2         3
A                4         5        23         3
Met              3         3         3        26

k = 3        Class 1   Class 2   Class 3   Class 4
Men             26         3         2
G                2        29         3         1
A                4         4        25         2
Met              2         2         2        29

TABLE 3 Confusion matrix of back propagation‐Hopfield network for different k values

k = 1        Class 1   Class 2   Class 3   Class 4
Men             29         2         2         2
G                0        32         1         2
A                2         0        31         2
Met              1         1         0        33

k = 2        Class 1   Class 2   Class 3   Class 4
Men             32         0         0         3
G                2        29         3         1
A                3         2        28         2
Met              2         0         1        32

k = 3        Class 1   Class 2   Class 3   Class 4
Men             31         0         2         2
G                0        33         0         2
A                2         2        30         1
Met              1         0         0        34


TABLE 4 Performance analysis of back propagation neural for k = 1 and k = 2

k = 1            TP    TN    FP    FN    Sn     Sp     CA (%)
Men              26   100     9     5    0.84   0.92   90
G                28    98     7     7    0.80   0.93   90
A                29    99     6     6    0.83   0.94   91
Met              30    96     5     9    0.77   0.95   90
Average value                            0.81   0.93   90

k = 2            TP    TN    FP    FN    Sn     Sp     CA (%)
Men              29    97     6     8    0.79   0.94   90
G                27    97     8     8    0.77   0.92   89
A                25    99    10     6    0.81   0.91   88
Met              29    97     6     8    0.78   0.94   90
Average value                            0.79   0.93   89

TABLE 5 Performance analysis of back propagation neural for k = 3

                 TP    TN    FP    FN    Sn     Sp     CA (%)
Men              28    99     7     6    0.82   0.93   90
G                31    98     4     7    0.81   0.96   92
A                27   100     8     5    0.84   0.93   91
Met              31   100     4     5    0.86   0.96   94
Average value                            0.83   0.94   92

TABLE 6 Performance analysis of Hopfield neural network for k = 1

                 TP    TN    FP    FN    Sn     Sp     CA (%)
Men              23    94    12    11    0.68   0.89   84
G                27    96     8     9    0.75   0.92   88
A                25    97    10     8    0.76   0.91   87
Met              28    96     7     9    0.76   0.93   89
Average value                            0.74   0.91   87

TABLE 7 Performance analysis of Hopfield neural network for k = 2 and k = 3

k = 2            TP    TN    FP    FN    Sn     Sp     CA (%)
Men              25    94    10    11    0.69   0.91   86
G                26    93     9    12    0.68   0.91   85
A                23    98    12     7    0.77   0.89   86
Met              26    95     9     9    0.74   0.91   87
Average value                            0.72   0.90   86

k = 3            TP    TN    FP    FN    Sn     Sp     CA (%)
Men              26    97     9     8    0.77   0.92   88
G                29    96     6     9    0.76   0.94   89
A                25    98    10     7    0.78   0.91   88
Met              29    98     6     7    0.81   0.94   90
Average value                            0.78   0.93   89

TABLE 8 Performance analysis of back propagation‐Hopfield network for k = 1 and k = 2

k = 1            TP    TN    FP    FN    Sn     Sp     CA (%)
Men              29   102     6     3    0.91   0.94   94
G                32   102     3     3    0.94   0.97   96
A                31   102     4     3    0.91   0.96   95
Met              33    99     2     6    0.85   0.98   94
Average value                            0.90   0.96   95

k = 2            TP    TN    FP    FN    Sn     Sp     CA (%)
Men              53   153     5     9    0.85   0.97   93
G                42   164     7     7    0.86   0.96   93
A                35   173     6     6    0.85   0.97   93
Met              63   143     9     5    0.93   0.94   94
Average value                            0.87   0.96   93

TABLE 9 Performance analysis of back propagation‐Hopfield network for k = 3

                 TP    TN    FP    FN    Sn     Sp     CA (%)
Men              31   102     4     3    0.91   0.96   95
G                33   103     2     2    0.94   0.98   97
A                30   103     5     2    0.94   0.95   95
Met              34   100     1     5    0.87   0.99   96
Average value                            0.91   0.97   96

TABLE 10 Average measures of the classifiers

Classifiers   Average sensitivity   Average specificity   Average classification accuracy (%)
BPN                  0.81                  0.93                         90
HNN                  0.74                  0.91                         87
BPHN                 0.89                  0.96                         95

Note. BPN = back propagation neural; BPHN = back propagation‐Hopfield network; HNN = Hopfield neural network.

Among the conventional techniques, it is seen that BPN is more efficient than the HNN. Among the abnormal categories, meningioma is slightly more difficult to classify than the other categories. The reason for these results can be derived from the nature of the classifiers: because BPN is a multilayer network, its performance is better than that of HNN.

7.2 | Receiver operating characteristics (ROC) analysis

The proposed classifiers are further analysed based on the ROC curves. The sensitivity and specificity measures are estimated at three different threshold points (0.25, 0.5, and 0.75). Table 11 shows the sensitivity and specificity of the classifiers at the different threshold values. Based on these values, the ROC curve is plotted and shown in Figure 8. The ROC curve for the BPHN classifier is close to the ideal curve. As per DeLong's test, the standard errors of the area under the curve are 0.0245, 0.0361, and 0.0491 for the BPHN, BPN, and HNN classifiers, respectively. Thus, the superior nature of the BPHN over the other classifiers is verified.
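The sketch below shows how sensitivity and specificity can be read off at the three fixed operating points used for Table 11. The synthetic scores and labels are placeholders for the network output activations of one class; they are not the paper's data.

```python
import numpy as np

def sn_sp_at_thresholds(scores, labels, thresholds=(0.25, 0.5, 0.75)):
    """Sensitivity and specificity of a continuous output score at fixed
    operating points, as used for the ROC analysis of Table 11."""
    points = []
    for th in thresholds:
        pred = scores >= th
        tp = np.sum(pred & (labels == 1))
        fn = np.sum(~pred & (labels == 1))
        tn = np.sum(~pred & (labels == 0))
        fp = np.sum(pred & (labels == 0))
        points.append((th, tp / (tp + fn), tn / (tn + fp)))
    return points

# Usage with synthetic output scores (placeholder for the classifier activations).
rng = np.random.default_rng(3)
labels = rng.integers(0, 2, 200)
scores = np.clip(labels * 0.6 + rng.normal(0.2, 0.25, 200), 0, 1)
for th, sn, sp in sn_sp_at_thresholds(scores, labels):
    print(f"threshold {th:.2f}: Sn = {sn:.2f}, Sp = {sp:.2f}")
```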

7.3 | Robustness analysis

The same experiments are repeated with images under noisy conditions to ensure the robustness of the proposed approach. The images are mixed with Gaussian noise with zero mean and 0.1 variance. The images are also mixed with "salt and pepper" noise. The classification process is repeated with these images, and the results are analysed. Table 12 illustrates the performance of the classifiers under noisy conditions. The robustness of the proposed approach is evident from Table 12. It may be noted that the performance of the classifiers does not vary much for the different types of noise; the deviation in the performance measures is only marginal. However, the performance of the classifiers is significantly reduced in comparison with the images without noise. Among the classifiers, an 8–9% reduction in CA is observed for the BPN and an 11–12% reduction in CA is observed for the HNN. However, the performance degradation of the BPHN for the noisy images is only 3–4%.

TABLE 11 Sensitivity and specificity of the classifiers at three different threshold values

              Sensitivity (at 0.25, 0.5, and 0.75 thresholds)    Specificity (at 0.25, 0.5, and 0.75 thresholds)
Classifiers      0.25      0.5      0.75                            0.25      0.5      0.75
BPHN             0.79      0.89     0.94                            0.98      0.96     0.80
BPN              0.70      0.81     0.90                            0.97      0.93     0.70
HNN              0.62      0.74     0.84                            0.96      0.91     0.55

Note. BPN = back propagation neural; BPHN = back propagation‐Hopfield network; HNN = Hopfield neural network.

FIGURE 8 Receiver operating characteristic curves for the proposed classifiers

TABLE 12 Performance of the classifiers with noisy images

                                       BPN                      HNN                      BPHN
Noise               Categories         CA (%)   Sn     Sp       CA (%)   Sn     Sp       CA (%)   Sn     Sp
Gaussian            Meningioma         86       0.80   0.88     79       0.71   0.84     93       0.92   0.97
                    Glioma             87       0.81   0.91     77       0.70   0.85     92       0.92   0.96
                    Astrocytoma        85       0.83   0.89     76       0.69   0.81     93       0.91   0.96
                    Metastasis         89       0.88   0.90     81       0.77   0.89     94       0.94   0.97
"Salt and pepper"   Meningioma         82       0.78   0.86     76       0.68   0.82     90       0.90   0.95
                    Glioma             81       0.82   0.89     74       0.67   0.81     93       0.93   0.94
                    Astrocytoma        83       0.85   0.85     78       0.72   0.79     92       0.89   0.94
                    Metastasis         90       0.87   0.89     80       0.75   0.90     93       0.92   0.96

Note. BPN = back propagation neural; BPHN = back propagation‐Hopfield network; HNN = Hopfield neural network.

Thus, BPHN is relatively better than the other approaches. It can also be observed that the average value of CA for BPHN is approximately 93–94%. Even though the images are noisy, the proposed approach is better than the conventional approaches. It can also be seen that the performance degradation of BPHN in terms of sensitivity and specificity for the noisy images is very low. The metastasis images are classified more efficiently than the other categories. The ability of the proposed approach to perform under noisy conditions is verified with this analysis. The performance of the ANN is based on the accuracy of the input; hence, the minimal performance degradation of the proposed approach is tolerable for practical applications. Thus, the robustness of the proposed approach is validated with these experimental results.
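The two noise models can be reproduced as in the sketch below. The Gaussian parameters (zero mean, 0.1 variance) follow the text; the salt‐and‐pepper density and the [0, 1] scaling convention are assumptions, since the paper does not state them.

```python
import numpy as np

rng = np.random.default_rng(4)

def add_gaussian_noise(img, mean=0.0, var=0.1):
    """Gaussian noise with zero mean and 0.1 variance, applied on a [0, 1] scale."""
    noisy = img / 255.0 + rng.normal(mean, np.sqrt(var), img.shape)
    return (np.clip(noisy, 0, 1) * 255).astype(np.uint8)

def add_salt_and_pepper(img, density=0.05):
    """Salt-and-pepper noise: a fraction of pixels forced to 0 or 255 (density assumed)."""
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < density / 2] = 0
    noisy[mask > 1 - density / 2] = 255
    return noisy

# Usage: corrupt a slice, re-extract the features, and re-run the classifiers.
img = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
gauss_img = add_gaussian_noise(img)
sp_img = add_salt_and_pepper(img)
```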

7.4 | Stability analysis

Stability is one of the significant features of any ANN that directly impacts the convergence rate of the system. In a highly stable system, the output of the ANN moves in the correct direction, whereas large fluctuations in the output values are seen in a less stable system. These large fluctuations usually force the ANN to take more time for convergence; a stable system converges quickly because of the smaller fluctuations in the output and error values. The stability of the ANN can be measured quantitatively with the help of the training error values. Initially, the error value is high, and it drops during the training process. The decrease in the error value may not be steady, and this can be used to analyse the stability of the system. Figures 9, 10, and 11 show the training sequence for 20 epochs of BPN, HNN, and BPHN, respectively. The training error value must decrease monotonically for a stable system. A sample training instance is shown in this analysis. In the training curves of BPN, the fluctuations in the error values are very high, which shows the instability of the system. The fluctuations are much smaller in BPHN, almost similar to the HNN. An average of 8–9 "rise and fall" events per 20 epochs is observed in the training process of BPN. This implies that the error value increases and decreases very frequently in BPN.

FIGURE 9 Training curves of back propagation neural

FIGURE 10 Training curves of Hopfield neural network

FIGURE 11 Training curves of back propagation‐Hopfield network

On the other hand, the average number of "rise and fall" events in HNN is 1–2, and the average for BPHN is 2–3. Even though HNN is marginally better than BPHN in this respect, the less accurate nature of the HNN makes it less suited for practical situations. Thus, BPHN can be considered an optimal system in terms of stability and accuracy. It may also be noted that the error value of BPN at the end of the 20th epoch is 0.53 and the error value of HNN is 0.72, whereas the error value of BPHN is 0.49. The next section deals with the computational complexity analysis and the time requirement for the training process.
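One way to quantify the "rise and fall" behaviour discussed above is to count direction changes in the training‐error sequence, as in the sketch below. The interpretation of a "rise and fall" event as a sign change of the error difference, and the sample error trace, are assumptions made for illustration.

```python
import numpy as np

def rise_and_fall_count(errors):
    """Count direction changes in a training-error sequence: each epoch where the
    error starts increasing after decreasing, or vice versa."""
    diffs = np.sign(np.diff(np.asarray(errors, dtype=float)))
    diffs = diffs[diffs != 0]                        # ignore flat segments
    return int(np.sum(diffs[1:] != diffs[:-1]))      # number of direction changes

# Usage with an illustrative 20-epoch error trace (not the paper's measured values).
bpn_errors = [1.0, 0.9, 0.95, 0.8, 0.88, 0.7, 0.78, 0.65, 0.72, 0.6,
              0.66, 0.58, 0.62, 0.56, 0.6, 0.55, 0.57, 0.54, 0.56, 0.53]
print(rise_and_fall_count(bpn_errors))
```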

7.5 | Computational complexity analysis

The computational complexity of the system is based on the number of mathematical calculations, which must be low for an efficient system. In this section, an analysis is performed between the classifiers based on the number of mathematical operations. The convergence time is directly proportional to the number of operations. The following two‐step procedure is used to determine the number of mathematical operations for each classifier.


Step 1: Divide the training algorithm into small segments. Step 2: Determine the number of mathematical calculations in each segment. Even though all the mathematical operations are not unique, the average computational complexity of each system can be analysed using this procedure.

7.5.1 | Complexity analysis of BPN

The steps involved in the training of BPN are as follows:

(a) Feedforward phase: In this phase, "n" multiplication operations, "n + 1" addition operations, and 1 division operation are required for calculating the output values in the hidden layer. Here, "n" represents the number of input layer neurons. This process is repeated for the output layer. Hence, the number of operations is "2n" multiplications, "2(n + 1)" additions, and "2" division operations. Thus, the total number of operations is given by 4n + 5.

(b) Reverse phase (weight adjustment of the output layer): In this phase, 1 addition operation, 3 multiplication operations, and 1 subtraction operation are required for weight adjustment in the output layer of BPN. These operations are performed individually for a matrix of size "n × n". Hence, the total number of operations increases to 5n². In this work, the number of neurons used in all the layers is the same.

(c) Reverse phase (weight adjustment of the hidden layer): In this phase, 2 addition operations, 4 multiplication operations, and 1 subtraction operation are required for weight adjustment in the hidden layer of BPN. These operations are performed individually for a matrix of size "n × n". Hence, the total number of operations increases to 7n².

Thus, the total number of mathematical operations of BPN is 12n² + 4n + 5. If "t" is the number of iterations, then the overall number of mathematical operations increases to t(12n² + 4n + 5). Usually, the value of "t" is very high.

7.5.2 | Complexity analysis of HNN

The steps involved in the training of HNN are as follows:

(a) Weight estimation: The number of operations required for this process is 1 addition and 1 multiplication operation. These operations are performed individually for a matrix of size "n × n". Hence, the total number of operations increases to 2n².

(b) Training of the Hopfield layer: In this step, 2 addition operations, 1 multiplication operation, and 1 division operation are required for the training process. These operations are performed individually for a matrix of size "n × n". Hence, the total number of operations increases to 4n². The sigmoid activation function is used for HNN in this work.

Thus, the total number of mathematical operations of HNN is 6n². If "t" is the number of iterations, then the overall number of mathematical operations increases to t(6n²).

7.5.3 | Complexity analysis of BPHN

The steps involved in the training of BPHN are as follows:

(a) Forward pass (input layer to middle layer): In this phase, "n" multiplication operations, "n + 1" addition operations, and 1 division operation are required for calculating the output values of the middle layer. The total number of operations is 2n + 2.

(b) Weight adjustment of the middle layer:


In this step, 1 addition operation, 3 multiplication operations, and 1 subtraction operation are required for weight adjustment, as in the output layer of BPN. These operations are performed individually for a matrix of size "n × n". Hence, the total number of operations increases to 5n².

(c) Training of the Hopfield layer: This step includes the estimation of the weights in the Hopfield layer. The total number of operations is 6n².

If the number of iterations is denoted by "t", then the number of operations increases to t(11n² + 2n + 2). The computational complexity of all three neural networks is of the order O(n²). However, differences exist in terms of the number of mathematical operations. This analysis includes only the significant operations of the neural networks. From the above analysis, it is evident that the computational complexity of BPN is significantly high. The weight adjustment equations are primarily responsible for the high computational complexity. Also, the number of iterations required for convergence is significantly higher than for the other two approaches. On the other hand, HNN is less complex due to the lack of weight adjustment equations. The proposed BPHN is significantly less complex than the BPN but more complex than the HNN. Because HNN is not preferred due to its lack of accuracy, BPHN can be termed the optimal approach among the three classifiers. The number of epochs required for convergence by the classifiers is shown in Figure 12. It is evident that the number of iterations required for the convergence of BPN is significantly higher than for the other classifiers. The number of epochs significantly impacts the computational complexity of the classifiers.
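The per‐iteration operation counts derived in Sections 7.5.1–7.5.3 can be compared directly, as in the sketch below. The neuron count and iteration count used in the example are placeholders, not values reported by the paper.

```python
def operations_per_training_run(n, t):
    """Total mathematical operations for each classifier using the per-iteration
    counts derived in Sections 7.5.1-7.5.3."""
    return {
        "BPN":  t * (12 * n ** 2 + 4 * n + 5),
        "HNN":  t * (6 * n ** 2),
        "BPHN": t * (11 * n ** 2 + 2 * n + 2),
    }

# Usage: n = 20 neurons per layer; the iteration count t = 100 is a placeholder
# chosen only to illustrate the relative ordering reported in Figure 12.
print(operations_per_training_run(n=20, t=100))
```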

7.6 | Comparative analysis

A comparative analysis between all the classifiers in terms of the performance measures is shown in Table 13.

FIGURE 12 Convergence analysis of the classifiers. BPN, back propagation neural; BPHN, back propagation‐Hopfield network; HNN, Hopfield neural network

TABLE 13 Comparative analysis of the classifiers

                              BPN     HNN     BPHN
CA without noise (%)          High    Low     High
CA with noise (%)             High    Low     High
Sensitivity without noise     High    Low     High
Sensitivity with noise        High    Low     High
Specificity without noise     High    Low     High
Specificity with noise        High    Low     High
Stability                     Low     High    High
Computational complexity      High    Low     Low
Convergence time              High    Low     Low

Note. BPN = back propagation neural; BPHN = back propagation‐Hopfield network; HNN = Hopfield neural network.

TABLE 14 Comparative analysis of the classifiers with other works

Techniques                                                Average sensitivity   Average specificity
Fuzzy neural network (Shen et al., 2005)                        0.72                  0.64
Probabilistic neural classifier (Pantelis et al., 2008)         0.80                  0.87
SVM + RBF (Pantelis et al., 2008)                               0.85                  0.88
Feedforward neural network (Pantelis et al., 2008)              0.84                  0.92
BPHN (proposed approach)                                        0.89                  0.96

Note. BPHN = back propagation‐Hopfield network; RBF = radial basis function.


From Table 13, it is evident that BPN is efficient in terms of accuracy but computationally expensive. On the other hand, the computational complexity of HNN is very low, but it is inaccurate. However, the proposed BPHN is efficient in terms of both accuracy and computational complexity. Thus, the fusion methodology used in this work is found to be useful for practical medical applications. A comparative analysis with "state‐of‐the‐art" approaches is also reported in Table 14. The methodologies used for comparison are adopted from the literature; however, these techniques are tested with the dataset used in this work. This analysis verifies the efficiency of the proposed approach over the other ANN‐based image classification approaches. Thus, this work highlights the significance of the proposed approach for practical image classification applications.

8 | CONCLUSION

In this work, a hybrid neural network is proposed for medical image classification applications. This modified approach is framed by the fusion of two conventional neural networks. This modification is done with an objective to include the benefits of both the conventional neural networks in the single proposed approach. The effectiveness of the proposed approach is evaluated in terms of accuracy measures, generalization capability, computational complexity, convergence time, and stability. The experimental results have yielded significantly positive results for the proposed approach. The proposed approach has been also much efficient than many of the conventional neural networks. Thus, this research work suggests a suitable alternate for the conventional neural networks. As a future work, the proposed approach can be tested for other practical applications. ACKNOWLEDGEMEN T The authors wish to thank M/S Devaki Scan Centre, Madurai, India, for their help regarding database collection and validation. RE FE R ENC E S Amato, F., Lopez, A., Pena‐Mendez, E. M., Vanhara, P., Hampl, A., & Havel, J. (2013). Artificial neural networks in medical diagnosis. Journal of Applied Biomedicine, 11, 47–58. Cai, W., Chen, S., & Zhang, D. (2010). A multiobjective simultaneous learning framework for clustering and classification. IEEE Transactions on Neural Networks, 21(2), 185–200. Chaudhry, S. S., Varano, M. W., & Xu, L. D. (2000). Systems research, genetic algorithms, and information systems. Systems Research and Behavioral Science, 17, 149–162. Christoyianni, I., Dermatas, E., & Kokkinakis, G. (2000). Fast detection of masses in computer aided mammography. IEEE Signal Processing Magazine, 17(1), 54–64. Comtat, C., & Morel, C. (1995). Approximate reconstruction of PET data with a self‐organizing neural network. IEEE Transactions on Neural Networks, 6(3), 783–789. Coppini, G., Diciotti, S., Falchini, M., Villari, N., & Valli, G. (2003). Neural networks for computer aided diagnosis: Detection of lung nodules in chest radiograms. IEEE Transactions on Information Technology in Biomedicine, 7(4), 344–357. Dokur, Z., & Olmez, T. (2003). Segmentation of MR and CT images by using a quantiser neural network. Neural Computing and Applications, 11(3), 168–177. Duan, L., & Xu, L. D. (2012). Business intelligence for enterprise systems: A survey. IEEE Transactions on Industrial Informatics, 8(3), 679–687. Fatemizadeh, E., Lucas, C., & Soltanian‐Zadeh, H. (2003). Automatic landmark extraction from image data using modified growing neural gas network. IEEE Transactions on Information Technology in Biomedicine, 7(2), 77–85. L. Fausett, ‘Fundamentals of neural networks: Architectures, Algorithms and Applications, Pearson Education India, New Delhi, 2006. Ge, J., Sahiner, B., Hadjiiski, L. M., Chan, H. P., Wei, J., Helvie, M. A., & Zhou, C. (2006). Computer aided detection of clusters of micro calcifications on full field digital mammograms. Medical Physics, 33(8), 2975–2988. Hainc, L., & Kukal, J. (2006). Role of robust processing in ANN de‐noising of 2D image. Neural Network World, 16(2), 163–176. Hemanth, D. J., Vijila, C. K. S., & Anitha, J. (2010). Performance improved PSO based modified counter propagation neural network for abnormal MR brain image classification. International Journal of Advances in Soft computing and its applications, 2(1), 65–84. Jaiswal, R. R., & Gaikwad, A. N. (2006). Neural network assisted effective lossy compression of medical images. IETE Technical Review, 23(2), 119–126. 
Jiang, J., Trundle, P., & Jinchang, R. (2010). Medical image analysis with artificial neural networks. Computerized Medical Imaging and Graphics, 34, 617–631.
Jiang, Y., Xu, L. D., Wang, H., & Wang, H. (2009). Influencing factors for predicting financial performance based on genetic algorithms. Systems Research and Behavioral Science, 26(6), 661–673.
Kondo, C., Kondo, T., & Ueno, J. (2009). Three‐dimensional medical image analysis of the heart by the revised GMDH‐type neural network self‐selecting optimum neural network architecture. Artificial Life and Robotics, 14(2), 123–128.
Li, F., Xu, L. D., Jin, C., & Wang, H. (2011a). Intelligent bionic genetic algorithm (IB‐GA) and its convergence. Expert Systems with Applications, 38(7), 8804–8811.
Li, F., Xu, L. D., Jin, C., & Wang, H. (2011b). Structure of multi‐stage composite genetic algorithm (MSC‐GA) and its performance. Expert Systems with Applications, 38(7), 8929–8937.
Li, F., Xu, L. D., Jin, C., & Wang, H. (2012). Random assignment method based on genetic algorithms and its application in resource allocation. Expert Systems with Applications, 39(15), 12213–12219.
Li, H., & Da, X. L. (2000). A neural network representation of linear programming. European Journal of Operational Research, 124, 224–234.
Li, H. X., & Xu, L. D. (2001). Feature space theory—A mathematical foundation for data mining. Knowledge‐Based Systems, 14, 253–257.
Li, H. X., Xu, L. D., Wang, J. Y., & Mo, Z. W. (2003). Feature space theory in data mining: Transformations between extensions and intensions in knowledge representation. Expert Systems, 20(2), 60–71.
Li, L., Ge, R., Zhou, S., & Valderdi, R. R. (2012). Integrated healthcare information systems. IEEE Transactions on Information Technology in Biomedicine, 16(4), 515–517.

Li, S., Xu, L. D., & Wang, X. (2013). A continuous biomedical signal acquisition system based on compressed sensing in body sensor networks. IEEE Transactions on Industrial Informatics, 9(3), 1764–1771.
Lisboa, P. J., & Taktak, A. F. G. (2006). The use of artificial neural networks in decision support in cancer: A systematic review. Neural Networks, 19(4), 408–415.
Markaki, V. E., Asvestas, P. A., & Matsopoulos, G. K. (2009). Application of Kohonen network for automatic point correspondence in 2D medical images. Computers in Biology and Medicine, 39(7), 630–635.
Middleton, I., & Damper, R. I. (2004). Segmentation of magnetic resonance images using a combination of neural networks and active contour models. Medical Engineering & Physics, 26(1), 71–86.
Pan, W., Xu, L., Zhou, S., Fan, Z., Li, Y., & Feng, S. (2010). A novel Bayesian learning method for information aggregation in modular neural networks. Expert Systems with Applications, 37(2), 1071–1074.
Pantelis, G., Dionisis, C., Ioannis, K., Antonis, D., George, C. K., Koralia, S., … Ekaterini, S. (2008). Improving brain tumor characterization on MRI by probabilistic networks & non‐linear transformation of textural features. Computer Methods and Programs in Biomedicine, 89, 24–32.
Papadopoulos, A., Fotiadis, D. I., & Likas, A. (2002). An automatic microcalcification detection system based on a hybrid neural network classifier. Artificial Intelligence in Medicine, 25(2), 149–167.
Reddick, W. E., Glass, J. O., Cook, E. N., Elkin, T. D., & Deaton, R. J. (1997). Automated segmentation and classification of multispectral magnetic resonance images of brain using artificial neural networks. IEEE Transactions on Medical Imaging, 16(6), 911–918.
Shang, L. F., Lv, J. C., & Yi, Z. (2006). Rigid medical image registration using PCA neural network. Neurocomputing, 69(13), 1717–1722.
Sharma, N., Ray, A. K., Sharma, S., Shukla, K. K., Pradhan, S., & Aggarwal, L. M. (2008). Segmentation and classification of medical images using texture‐primitive features: Application of BAM type artificial neural network. Journal of Medical Physics, 33(3), 119–126.
Shen, S., Sandham, W., Granat, M., & Sterr, A. (2005). MRI fuzzy segmentation of brain tissue using neighborhood attraction with neural‐network optimization. IEEE Transactions on Information Technology in Biomedicine, 9(3), 459–467.
Srinivasan, G. N., & Shobha, G. (2008). Statistical texture analysis. Proceedings of World Academy of Science, Engineering and Technology, 36, 1264–1269.
Sun, W., & Wang, Y. (2005). Segmentation method of MRI using fuzzy Gaussian basis neural network. Neural Information Processing, 8(2), 19–24.
Taji, T. S., & Gore, D. V. (2013). Overview of texture image segmentation techniques. International Journal of Advanced Research in Computer Science and Software Engineering, 3(12), 283–288.
Vigdor, B., & Lerner, B. (2006). Accurate and fast off and online fuzzy ARTMAP based image classification with application to genetic abnormality diagnosis. IEEE Transactions on Neural Networks, 17(5), 1288–1300.
Wade, J. J., McDaid, L. J., Santos, J. A., & Sayers, H. M. (2010). SWAT: A spiking neural network training algorithm for classification problems. IEEE Transactions on Neural Networks, 21(11), 1817–1830.
Wang, X., Chen, X., & Bi, Z. (2015). Support vector machine and ROC curves for modeling of aircraft fuel consumption. Journal of Management Analytics, 2(1), 22–34.
Xu, B., Xu, L. D., Cai, H., Xie, C., Hu, J., & Bu, F. (2014). Ubiquitous data accessing method in IoT‐based information system for emergency medical services. IEEE Transactions on Industrial Informatics, 10(2), 1578–1586.
Yin, Y., Zeng, Y., Chen, X., & Fan, Y. (2016). The internet of things in healthcare: An overview. Journal of Industrial Information Integration, 1(1), 3–13.
Zhou, S. M., & Xu, L. D. (2001). A new type of recurrent fuzzy neural network for modeling dynamic systems. Knowledge‐Based Systems, 14, 243–251.
Zhou, Z. H., Jiang, Y., Yang, Y. B., & Chen, S. F. (2002). Lung cancer cell identification based on artificial neural network ensembles. Artificial Intelligence in Medicine, 24(1), 25–36.

D. Jude Hemanth received his B.E. degree in Electronics and Communication Engineering from Bharathiar University and his M.E. degree in Communication Systems from Anna University. He received his doctoral degree from Karunya University in 2013. Currently, he is working as Associate Professor in the Department of ECE of Karunya University, India. He is a member of the IEEE CIS TC on Neural Networks and serves as an editorial board member of many reputed international journals.

J. Anitha received her B.E. degree in ECE from Bharathiar University and her M.E. degree in Applied Electronics from Anna University. She received her PhD degree from Karunya University in 2013. Currently, she is working as Associate Professor in the Department of ECE of Karunya University. Her areas of interest are computer vision and medical image processing.

Bernadetta Kwintiana Ane is a Senior Researcher at the Institute of Computer‐aided Product Development Systems, Universität Stuttgart, Germany. She has won various international research grants and has published more than 65 scientific writings in the form of books, book chapters, journal papers, and conference papers. Currently, as an IEEE professional member, she serves internationally as a research fellow for several European research centres. She also serves as Associate Editor for the Journal of Intelligent Automation and Soft‐Computing (Taylor & Francis) and Applied Soft Computing (Elsevier).

How to cite this article: Hemanth JD, Anitha J, Ane BK. Fusion of artificial neural networks for learning capability enhancement: Application to medical image classification. Expert Systems. 2017;e12225. https://doi.org/10.1111/exsy.12225