Connectionist Models for Intelligent Reactive Power Control

Ajith Abraham & Baikunth Nath
School of Computing & Information Technology, Monash University (Gippsland Campus), Churchill 3842, Australia
Email: {Ajith.Abraham, Baikunth.Nath}@infotech.monash.edu.au

Abstract

In this paper, we demonstrate the use of connectionist networks for intelligent prediction of power factor and efficient utilization of power. We propose two neuro-fuzzy networks and artificial neural networks trained with various learning techniques. The proposed connectionist networks are trained with the plant load current and a highly fluctuating voltage for online prediction of power factor. For online control, voltage and current are fed into the network after preprocessing and standardization. The models are trained with a 24-hour load demand pattern, and the performance of the proposed method is evaluated by comparing the test results with the actual expected values. We observe that the neuro-fuzzy models perform better than the neural networks.

Keywords: power factor, reactive power, power demand prediction, load flow, neuro-fuzzy systems, neural networks.

1 Introduction

Reactive power management of dynamic loads is a complicated problem. An increase in reactive power consumption leads to higher energy costs, the use of larger generators and current-carrying conductors, unstable system voltages and so on. Most of the monitoring systems serving power consumers only track the quality of the power supply and support load flow analysis [6]. To make the monitoring system more intelligent, we propose the use of connectionist systems [1][8] for predicting the trend of power factor and reactive power demand. By predicting the power factor, it is possible to automate the control of the reactive power load and make better use of the volt-ampere (VA) inflow. Efficient usage of the VA loading will not only improve the overall grid condition but also reduce the consumer's industrial tariffs. Depending on the predicted reactive power demand, power factor corrective measures could be turned on or off to control the VA inflow into the plant. This prediction system will be extremely useful for automated control of power inflow, especially in countries where there are limitations on consumers' peak VA maximum demand.

2 Connectionist Systems

Connectionist systems make use of some of the popular soft computing techniques, including neural computing, fuzzy inference systems [5], evolutionary programming and several hybrid techniques. In contrast with conventional AI techniques, which deal only with precision, certainty and rigor, connectionist models exploit the tolerance for imprecision and uncertainty, and are very robust. Connectionist systems are a rapidly growing area of information technology, as numerous practical applications exist in almost every area of industry and society.

An Artificial Neural Network (ANN) [9] offers a highly structured architecture with learning and generalization capabilities, which attempts to mimic the neurological mechanisms of the brain. An ANN stores knowledge in a distributed manner within its weights, which are determined by learning from known samples. The generalization ability for new inputs is then based on the inherent algebraic structure of the ANN. However, it is very hard to incorporate human a priori knowledge into an ANN. This is mainly because the connectionist paradigm gains most of its strength from a distributed knowledge representation.

In contrast, a Fuzzy Inference System (FIS) exhibits complementary characteristics, offering a very powerful framework for approximate reasoning, as it attempts to model the human reasoning process at a cognitive level. A FIS acquires knowledge from domain experts, and this knowledge is encoded within the algorithm as a set of if-then rules. A FIS employs this rule-based approach and interpolative reasoning to respond to new inputs. The incorporation and interpretation of knowledge is straightforward, whereas learning and adaptation constitute major problems.

Evolutionary Algorithms (EA) [20] work by simulating evolution on a computer. From a historical point of view, evolutionary optimization methods are divided into three main categories: Genetic Algorithms (GAs), Evolutionary Programming (EP) and Evolution Strategies (ES). These methods are fundamentally iterative generation and alteration processes operating on a set of candidate solutions that form a population. The entire population evolves towards better candidate solutions via the selection operation and genetic operators such as crossover and mutation. The selection operator decides which candidate solutions move on into the next generation, and thus limits the search space.
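The generate, select and alter loop described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption chosen for illustration and is not taken from this paper: the bit-string encoding, truncation selection (keeping the fitter half), one-point crossover, the mutation rate, and the toy "one-max" fitness function.

```python
import random

def evolve(fitness, n_genes=8, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Tiny genetic algorithm over bit strings (illustrative only)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []                      # best fitness seen in each generation
    for _ in range(generations):
        # Selection: the fitter half of the population survives as parents.
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history

# Hypothetical fitness: the number of 1 bits in the candidate ("one-max").
best, history = evolve(sum)
```

Because the parents are carried over unchanged, the best fitness in `history` can never decrease from one generation to the next; other selection schemes do not necessarily have this property.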

3 Reactive Power Control – Myths and Facts Unleashed

A distribution system's operating power is composed of two parts: active (working) power and reactive (non-working, magnetizing) power. The active power performs the useful work; the reactive power does not. The main function of the reactive power is to develop the magnetic fields required by inductive devices. Working power and reactive power together make up the apparent power. Apparent (S), active (P) and reactive (R) power are measured in kilovolt-amperes (KVA), kilowatts (KW) and kilovolt-amperes reactive (KVAR), respectively. The power triangle shown in Figure 1 illustrates the relationship between the three types of power. The power triangle relationship can also be summarized by

$$\mathrm{KVA}^2 = \mathrm{KW}^2 + \mathrm{KVAR}^2$$

Figure 1: Power triangle showing the relationships between apparent, active and reactive power.

The ratio of the active power (KW) to the apparent power (KVA) is termed the power factor:

$$\text{Power factor} = \cos(\vartheta) = \frac{\mathrm{KW}}{\mathrm{KVA}}$$

Essentially, power factor is a measurement of how effectively electrical power is being used. The higher the power factor, the more effectively electrical power is being used. It is normal practice to say that the power factor is lagging when the current lags the supply voltage and leading when the current leads the supply voltage. This means that the supply voltage is regarded as the reference quantity. A majority of loads served by a power utility draw current at a lagging power factor. When the power factor of the load is unity, active power equals apparent power (P = S). But when the power factor of the load is less than unity, say 0.6, only 60% of the apparent power is delivered as active power; the rest of the VA capacity is taken up by the reactive power (VAR) demand of the system. It is therefore clear that the higher the power factor of the load, the greater the utilization of the apparent power.

For the generating and transmission stations, the lower the power factor, the larger the source must be to generate that power and the greater the cross-sectional area of the conductor must be to transmit it. In other words, the cost of generation and transmission of the power is greater. Moreover, a lower power factor also increases the $I^2R$ losses (I denotes current) in lines and equipment, and results in poor voltage regulation [2][4][10].

When low power factor is not corrected, the utility must provide the non-working reactive power in addition to the working active power. This results in the use of larger generators, transformers, bus bars, wires and other distribution system devices that otherwise would not be necessary. As the utility's capital expenditures and operating costs are higher, it passes these higher expenses on to industrial users in the form of power factor penalties and higher utility bills.
The simplest way to solve low power factor problems is to add power factor correction capacitors to the electrical network. Power factor correction capacitors work as reactive current generators, "providing" the needed reactive power to the power supply. By self-generation of reactive power, the industrial user frees the utility from having to supply it; therefore, the total amount of apparent power (KVA) supplied by the utility will be less. Power factor correction capacitors reduce the total current drawn from the distribution system and subsequently increase system capacity. Not only will power factor correction capacitors save money, they will also:

• reduce heat loss in transformers and distribution equipment
• prolong the life of distribution equipment
• stabilize voltage levels
• increase system capacity, among other benefits.
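As a worked illustration of the power triangle and of capacitor-based correction, the following Python sketch computes the apparent and reactive power of a load and the reactive power a capacitor bank would have to supply to raise the power factor. The 600 KW load and the 0.95 target power factor are hypothetical values chosen for the example; only the 0.6 power factor comes from the text. Note that the three powers combine in quadrature, not linearly.

```python
import math

def power_triangle(kw, power_factor):
    """Return (KVA, KVAR) for a load drawing `kw` at the given power factor."""
    kva = kw / power_factor                  # power factor = KW / KVA
    kvar = math.sqrt(kva ** 2 - kw ** 2)     # KVA^2 = KW^2 + KVAR^2
    return kva, kvar

def capacitor_kvar(kw, pf_old, pf_new):
    """KVAR a capacitor bank must supply to raise pf_old to pf_new."""
    _, kvar_old = power_triangle(kw, pf_old)
    _, kvar_new = power_triangle(kw, pf_new)
    return kvar_old - kvar_new

# Hypothetical plant load: 600 KW at a lagging power factor of 0.6.
kva, kvar = power_triangle(600.0, 0.6)
needed = capacitor_kvar(600.0, 0.6, 0.95)
```

For this hypothetical load the utility has to supply about 1000 KVA, of which about 800 KVAR is reactive; correcting to 0.95 would need roughly 600 KVAR of capacitors.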

Industrial energy consumption depends directly on production planning and scheduling. Most manufacturing plants have a tentatively fixed production schedule and therefore an energy consumption pattern associated with that schedule. We propose to use connectionist models to fully automate the reactive power controllers depending upon the energy drawn from the grid. We considered a heavy automobile industry for studying the load demand patterns. The plant works 3 shifts of 8 hours each, and the apparent and active power loading patterns for a 24-hour period were used for training the different connectionist systems. Since overcompensation could cause harmful effects and damage the system, it is equally important to turn off the power factor compensators (the reactive power supply sources) when they are not required. The task is to predict the upward and downward trend of the reactive power demand and provide the required power factor compensation.

The proposed connectionist models are capable of learning relationships amongst variables (function approximation). For this problem, we explored artificial neural networks using different learning algorithms, two neuro-fuzzy models and a radial basis function network to learn the relationship between the power factor, voltage and current. The proposed connectionist models were trained on data sampled every minute over a 24-hour period to predict the reactive power demand parameters, and were then tested to evaluate the prediction accuracy.

3. Artificial Neural Networks

Artificial Neural Networks (ANNs) have been developed as generalizations of mathematical models of biological nervous systems [11]. Networks may be distinguished on the basis of the directions in which signals flow. There are two types of networks: feedforward and feedback. A network in which signals propagate in only one direction (left to right), from an input stage through intermediate neurons to an output stage, is called a feedforward network. Feedback networks, on the other hand, are networks in which signals may also propagate from the output of any neuron to the input of any neuron.

A neural network is characterized by the network architecture, the connection strengths between pairs of neurons (weights), node properties and updating rules. The updating or learning rules control the weights and/or states of the processing elements (neurons). Normally, an objective function is defined that represents the complete status of the network, and its set of minima corresponds to different stable states of the network. Each neuron is an elementary processor with primitive operations, such as summing the weighted inputs coming to it and then amplifying or thresholding the sum.

There are three broad paradigms of learning: supervised, unsupervised (or self-organized) and reinforcement (a special case of supervised learning). In supervised learning, adaptation occurs when the system directly compares the network output with a known correct or desired answer. The network is initially randomized to avoid imposing any of our own prejudices about an application on the network. The training patterns can be thought of as a set of ordered pairs $\{(x_1, y_1), (x_2, y_2), \ldots, (x_P, y_P)\}$, where $x_i$ represents an input pattern and $y_i$ represents the output pattern vector associated with the input vector $x_i$. Once the network weights and biases have been initialized, the network is ready for training.

The network can be trained for function approximation, pattern association or pattern classification. There are several different training algorithms for feedforward networks. All of these algorithms use the gradient of the performance function to determine how to adjust the weights to minimize that function. The gradient is determined using a technique called backpropagation, which involves performing computations backwards through the network. One iteration of this algorithm can be written as

$$x_{k+1} = x_k - \alpha_k g_k$$

where $x_k$ is a vector of current weights and biases, $g_k$ is the current gradient, and $\alpha_k$ is the learning rate. The gradient descent algorithm can be implemented in two ways: incremental mode and batch mode. In the incremental mode, the gradient is computed and the weights are updated after each input is applied to the network. In the batch mode, all of the inputs are applied to the network before the weights are updated. For the basic steepest (gradient) descent algorithm, the weights and biases are moved in the direction of the negative gradient of the performance function. The changes to the weights and biases of the network are obtained by multiplying the learning rate by the negative of the gradient. In addition to the learning rate, momentum allows a network to respond not only to the local gradient, but also to recent trends in the error surface. Momentum can be added to backpropagation learning by making weight changes equal to the sum of a fraction of the last weight change and the new change suggested by the backpropagation rule.
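The batch update with momentum can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the quadratic "performance function", the learning rate of 0.02 and the momentum factor of 0.8 are hypothetical, and the gradient is supplied analytically instead of being computed by backpropagation through a network.

```python
def train_momentum(grad, x0, lr=0.1, momentum=0.9, steps=200):
    """Batch steepest descent with a momentum term (illustrative sketch)."""
    x = list(x0)
    delta = [0.0] * len(x)           # previous weight change
    for _ in range(steps):
        g = grad(x)
        # New change = fraction of the last change + step along the negative gradient.
        delta = [momentum * d - lr * gi for d, gi in zip(delta, g)]
        x = [xi + di for xi, di in zip(x, delta)]
    return x

# Hypothetical performance function f(x) = (x0 - 3)^2 + 10 (x1 + 1)^2.
def grad_f(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

w = train_momentum(grad_f, [0.0, 0.0], lr=0.02, momentum=0.8, steps=300)
```

With `momentum=0.0` the loop reduces to the plain update $x_{k+1} = x_k - \alpha_k g_k$ with a constant learning rate.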



• Variable learning rate

With standard steepest descent, the learning rate is held constant throughout training. If the learning rate is too high, the algorithm may oscillate and become unstable. If the learning rate is too small, the algorithm will take too long to converge. It is not practical to determine the optimal setting for the learning rate before training and, in fact, the optimal learning rate changes during the training process, as the algorithm moves across the performance surface [13]. The performance of the steepest descent algorithm can be improved by using an adaptive learning rate, which keeps the learning step size as large as possible while keeping learning stable. The learning rate is made adaptive to the complexity of the local error surface. If the new error exceeds the old error by more than a predefined ratio (typically 1.04), the new weights are discarded and the learning rate is decreased (typically by multiplying it by 0.7). Otherwise the new weights are kept. If the new error is less than the old error, the learning rate is increased (typically by 5%). Thus a near-optimal learning rate is obtained for the local terrain: when a larger learning rate could result in stable learning, the learning rate is increased; when the learning rate is too high to guarantee a decrease in error, it is decreased until stable learning resumes.
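The accept/reject rule for the adaptive learning rate can be sketched as follows. The parameter names (`lr_inc`, `lr_dec`, `max_perf_inc`) and the multiplicative factors 1.05 and 0.7 are assumptions that follow common toolbox conventions, and the quadratic test function is hypothetical.

```python
def train_adaptive_lr(f, grad, x0, lr=0.01, lr_inc=1.05, lr_dec=0.7,
                      max_perf_inc=1.04, steps=200):
    """Steepest descent with an adaptive learning rate (illustrative sketch)."""
    x = list(x0)
    err = f(x)
    for _ in range(steps):
        g = grad(x)
        x_new = [xi - lr * gi for xi, gi in zip(x, g)]
        err_new = f(x_new)
        if err_new > err * max_perf_inc:
            lr *= lr_dec             # step rejected: discard weights, shrink rate
            continue
        if err_new < err:
            lr *= lr_inc             # error went down: try a larger step next time
        x, err = x_new, err_new      # step accepted
    return x, err, lr

# Hypothetical performance function and its gradient.
def f(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2

def grad_f(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

x, err, lr = train_adaptive_lr(f, grad_f, [0.0, 0.0])
```

Note that a step which raises the error by less than the allowed ratio is still accepted, so the error is not guaranteed to fall at every iteration; it is only prevented from growing quickly.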



• Resilient Backpropagation

The purpose of the resilient backpropagation training algorithm is to eliminate the harmful effects of the magnitudes of the partial derivatives [14]. Only the sign of the derivative is used to determine the direction of the weight update; the magnitude of the derivative has no effect on the weight update. The size of the weight change is determined by a separate update value. The update value for each weight and bias is increased by a factor whenever the derivative of the performance function with respect to that weight has the same sign for two successive iterations. The update value is decreased by a factor whenever the derivative with respect to that weight changes sign from the previous iteration. If the derivative is zero, the update value remains the same. If the weight continues to change in the same direction for several iterations, the magnitude of the weight change is increased. This algorithm is particularly useful when sigmoidal transfer functions are used. Sigmoid functions are characterized by the fact that their slope must approach zero as the input gets large. This causes a problem when using the steepest descent technique to train a feedforward network with sigmoidal functions, since the gradient can have a very small magnitude and therefore cause only small changes in the weights, even though the weights and biases are far from their optimal values.
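A sign-based update of this kind can be sketched as below. The increase and decrease factors (1.2 and 0.5), the initial update value and its bounds are assumed values that are common in the literature but are not stated in this paper; the variant shown does not undo the previous step after a sign change. The test gradient is hypothetical and deliberately has components that differ by eight orders of magnitude.

```python
def rprop(grad, x0, delta0=0.1, inc=1.2, dec=0.5,
          delta_min=1e-6, delta_max=50.0, steps=100):
    """Resilient backpropagation style update (illustrative sketch)."""
    x = list(x0)
    delta = [delta0] * len(x)        # separate update value for each weight
    prev_g = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            same = gi * prev_g[i]
            if same > 0:             # same sign twice in a row: larger step
                delta[i] = min(delta[i] * inc, delta_max)
            elif same < 0:           # sign changed: we overshot, smaller step
                delta[i] = max(delta[i] * dec, delta_min)
            # Only the sign of the derivative sets the direction of the change.
            if gi > 0:
                x[i] -= delta[i]
            elif gi < 0:
                x[i] += delta[i]
            prev_g[i] = gi
    return x

# Hypothetical gradient with very differently scaled components.
def grad_f(x):
    return [2e-4 * (x[0] - 3.0), 2e4 * (x[1] + 1.0)]

w = rprop(grad_f, [0.0, 0.0])
```

Plain steepest descent with one learning rate would struggle with this scaling, whereas the sign-based rule treats both weights in the same way.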



• Conjugate Gradient Algorithms

In the Conjugate Gradient Algorithm (CGA) a search is performed along conjugate directions, which produces generally faster convergence than steepest descent directions. A search is made along the conjugate gradient direction to determine the step size, which will minimize the performance function along that line. All the conjugate gradient algorithms start out by searching in the steepest descent direction (negative of the gradient) on the first iteration.

$$p_0 = -g_0$$

A line search is then performed to determine the optimal distance to move along the current search direction:

$$x_{k+1} = x_k + \alpha_k p_k$$

Then the next search direction is determined so that it is conjugate to the previous search directions. The general procedure for determining the new search direction is to combine the new steepest descent direction with the previous search direction:

$$p_k = -g_k + \beta_k p_{k-1}$$

The various versions of conjugate gradient algorithms are distinguished by the manner in which the constant $\beta_k$ is computed. For the Fletcher-Reeves update [13] the procedure is

$$\beta_k = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}}$$

This is the ratio of the norm squared of the current gradient to the norm squared of the previous gradient. For the Polak-Ribière update [13], the constant $\beta_k$ is computed by

$$\beta_k = \frac{\Delta g_{k-1}^T g_k}{g_{k-1}^T g_{k-1}}$$

This is the inner product of the previous change in the gradient, $\Delta g_{k-1} = g_k - g_{k-1}$, with the current gradient, divided by the norm squared of the previous gradient. For all conjugate gradient algorithms, the search direction is periodically reset to the negative of the gradient. The standard reset point occurs when the number of iterations is equal to the number of network parameters (weights). According to the Powell-Beale version [15], the search direction is restarted if there is very little orthogonality left between the current gradient and the previous gradient. This is tested with the following inequality:

$$\left| g_{k-1}^T g_k \right| \ge 0.2 \, \| g_k \|^2$$

If this condition is satisfied, the search direction is reset to the negative of the gradient. An important drawback of the above-mentioned conjugate gradient algorithms is the requirement of a line search, which is computationally expensive.
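The search-direction updates and the restart test above can be put together in a short Python sketch. Several parts are assumptions made to keep the example self-contained: the golden-section line search on the fixed bracket [0, 1], the stopping tolerance, and the hypothetical quadratic test function. On a restart the direction is simply reset to the negative gradient, as described in the text.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def line_search(f, x, p, hi=1.0, iters=60):
    """Golden-section search for alpha in [0, hi] minimizing f(x + alpha * p)."""
    r = (5 ** 0.5 - 1) / 2
    lo = 0.0
    for _ in range(iters):
        a, b = hi - r * (hi - lo), lo + r * (hi - lo)
        fa = f([xi + a * pi for xi, pi in zip(x, p)])
        fb = f([xi + b * pi for xi, pi in zip(x, p)])
        if fa < fb:
            hi = b
        else:
            lo = a
    return (lo + hi) / 2

def conjugate_gradient(f, grad, x0, update="FR", iters=50, tol=1e-12):
    x = list(x0)
    g = grad(x)
    p = [-gi for gi in g]                              # p_0 = -g_0
    for _ in range(iters):
        if dot(g, g) < tol:                            # gradient has vanished
            break
        alpha = line_search(f, x, p)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]  # x_{k+1} = x_k + alpha_k p_k
        g_new = grad(x)
        if abs(dot(g, g_new)) >= 0.2 * dot(g_new, g_new):
            beta = 0.0                                 # Powell-Beale style restart
        elif update == "FR":                           # Fletcher-Reeves
            beta = dot(g_new, g_new) / dot(g, g)
        else:                                          # Polak-Ribiere
            dg = [gn - go for gn, go in zip(g_new, g)]
            beta = dot(dg, g_new) / dot(g, g)
        p = [-gi + beta * pi for gi, pi in zip(g_new, p)]  # p_k = -g_k + beta_k p_{k-1}
        g = g_new
    return x

# Hypothetical performance function and its gradient.
def f(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2

def grad_f(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

w_fr = conjugate_gradient(f, grad_f, [0.0, 0.0], update="FR")
w_pr = conjugate_gradient(f, grad_f, [0.0, 0.0], update="PR")
```

The repeated function evaluations inside `line_search` are what makes these methods expensive per iteration when each evaluation is a full pass over the training set.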

The Scaled Conjugate Gradient Algorithm [16] was designed to avoid the time-consuming line search. The key principle is to combine the model trust region approach with the conjugate gradient approach.

• Quasi-Newton algorithms

Newton's method is an alternative to the conjugate gradient methods for fast optimization. The basic step of Newton's method is given by

$$x_{k+1} = x_k - A_k^{-1} g_k$$

where $A_k$ is the Hessian matrix (second derivatives) of the performance index at the current values of the weights. Newton's method often converges faster than conjugate gradient methods. Unfortunately, it is computationally expensive to compute the Hessian matrix for a feedforward ANN. In a quasi-Newton (or secant) method, an approximate Hessian matrix is updated at each iteration of the algorithm. The update is computed as a function of the gradient [17]. The One Step Secant (OSS) method [18] is an attempt to bridge the gap between the conjugate gradient algorithms and the quasi-Newton algorithms, which require more storage and computation in each iteration. This algorithm does not store the complete Hessian matrix; it assumes that at each iteration the previous Hessian was the identity matrix. This also has the advantage that the new search direction can be calculated without computing a matrix inverse.

• Levenberg-Marquardt algorithm

Like the quasi-Newton methods, the Levenberg-Marquardt (LM) algorithm [13] was designed to approach second-order training speed without having to compute the Hessian matrix. When the performance function has the form of a sum of squares, the Hessian matrix can be approximated as $H = J^T J$ and the gradient can be computed as $g = J^T e$, where $J$ is the Jacobian matrix, which contains the first derivatives of the network errors with respect to the weights, and $e$ is a vector of network errors. The Jacobian matrix can be computed through a standard backpropagation technique that is less complex than computing the Hessian matrix. The LM algorithm uses this approximation to the Hessian matrix in the following Newton-like update:

$$x_{k+1} = x_k - [J^T J + \mu I]^{-1} J^T e$$

When the scalar $\mu$ is zero, this is just Newton's method using the approximate Hessian matrix. When $\mu$ is large, this becomes gradient descent with a small step size. As Newton's method is more accurate near an error minimum, $\mu$ is decreased after each successful step (reduction in the performance function) and is increased only when a tentative step would increase the performance function. In this way, the performance function is always reduced at each iteration of the algorithm.
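The LM update and the adjustment of $\mu$ can be illustrated on a small curve-fitting problem in place of a neural network. The sketch below is built on assumptions: the two-parameter model $y = a\,e^{bt}$, the synthetic noise-free data, the starting point, the initial $\mu$ and the factor of 10 used to adjust it are all hypothetical. The error vector is taken as model output minus target, so that $g = J^T e$ is the gradient of half the sum of squared errors.

```python
import math

def lm_fit(t, y, a, b, mu=0.01, iters=100):
    """Fit y ~ a * exp(b * t) with a Levenberg-Marquardt loop (illustrative)."""
    def errors(a, b):
        return [a * math.exp(b * ti) - yi for ti, yi in zip(t, y)]

    def sse(e):
        return sum(ei * ei for ei in e)

    e = errors(a, b)
    for _ in range(iters):
        # Jacobian rows: (d e_i / d a, d e_i / d b).
        J = [(math.exp(b * ti), a * ti * math.exp(b * ti)) for ti in t]
        # Entries of J^T J + mu * I (symmetric 2x2) and of g = J^T e.
        h11 = sum(r[0] * r[0] for r in J) + mu
        h12 = sum(r[0] * r[1] for r in J)
        h22 = sum(r[1] * r[1] for r in J) + mu
        g1 = sum(r[0] * ei for r, ei in zip(J, e))
        g2 = sum(r[1] * ei for r, ei in zip(J, e))
        # Solve (J^T J + mu I) step = g by Cramer's rule.
        det = h11 * h22 - h12 * h12
        da = (g1 * h22 - h12 * g2) / det
        db = (h11 * g2 - h12 * g1) / det
        e_new = errors(a - da, b - db)
        if sse(e_new) < sse(e):
            a, b, e = a - da, b - db, e_new
            mu /= 10.0               # successful step: move toward Newton's method
        else:
            mu *= 10.0               # failed step: move toward small gradient steps
    return a, b, sse(e), mu

# Hypothetical noise-free data generated from a = 2.0, b = 0.5.
t = [0.25 * i for i in range(9)]
y = [2.0 * math.exp(0.5 * ti) for ti in t]
a_fit, b_fit, final_sse, mu = lm_fit(t, y, a=1.5, b=0.3)
```

Because a tentative step is kept only when it lowers the sum of squared errors, the error never increases from one accepted step to the next.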

4 Neuro-Fuzzy Systems

ANN and FIS are both very popular techniques in soft computing. A FIS can utilize human expertise by storing its essential components in a rule base and a database, and perform fuzzy reasoning to infer the overall output value. The derivation of if-then rules and the corresponding membership functions depends heavily on a priori knowledge about the system under consideration. However, there is no systematic way to transform the experience or knowledge of human experts into the knowledge base of a FIS. There is also a need for adaptability or some learning algorithm to produce outputs within the required error rate. On the other hand, the ANN learning mechanism does not rely on human expertise. Due to the homogeneous structure of an ANN, it is hard to extract structured knowledge from either the weights or the configuration of the ANN. The weights of the ANN represent the coefficients of the hyperplane that partitions the input space into two regions with different output values. If we can visualize this hyperplane structure from the training data, then the subsequent learning procedures in an ANN can be reduced. However, in reality, the a priori knowledge is usually obtained from human experts; it is most appropriately expressed as a set of fuzzy if-then rules, which cannot be encoded directly into an ANN. Table 1 summarizes the comparison of FIS and ANN.

Table 1. Complementary features of ANN and FIS

ANN                      FIS
Black box                Interpretable
Learning from scratch    Making use of linguistic knowledge

To a large extent, the drawbacks pertaining to these two approaches seem complementary. Therefore, it is natural to consider building an integrated system combining the concepts of FIS and ANN modeling. A common way to apply a learning algorithm to a FIS is to represent it in a special ANN-like architecture. However, the conventional ANN learning algorithms (gradient descent) cannot be applied directly to such a system, as the functions used in the inference process are usually non-differentiable. This problem can be tackled by using differentiable functions in the inference system or by not using the standard neural learning algorithm. In Sections 5 and 6 we present the two neuro-fuzzy connectionist models that we used in our experiments.

5. Evolving Fuzzy Neural Networks

The Evolving Fuzzy Neural Network (EFuNN) (Figure 3) implements a Mamdani-type FIS (Figure 2) [3], and all nodes are created during learning [12]. The nodes representing membership functions (MFs) can be modified during learning. Each input variable is represented here by a group of spatially arranged neurons that represent a fuzzy quantization of this variable. For example, three neurons can be used to represent the "small", "medium" and "large" fuzzy values of the variable. Different membership functions can be attached to these neurons (triangular, Gaussian, etc.). New neurons can evolve in this layer if, for a given input vector, the corresponding variable value does not belong to any of the existing MFs to a degree greater than a membership threshold. A new fuzzy input neuron, or an input neuron, can be created during the adaptation phase of an EFuNN. A new rule node rn is created and its input and output connection weights are set as follows: W1(rn) = EX; W2(rn) = TE, where TE is the fuzzy output vector for the current fuzzy input vector EX. In the case of "one-of-n" EFuNNs, the maximum activation of a rule node is propagated to the next level. Saturated linear functions are used as activation functions of the fuzzy output neurons. In the case of the "many-of-n" mode, all the activation values of rule nodes that are above an activation threshold Athr are propagated further in the connectionist structure.

Figure 2. Mamdani fuzzy inference system (if x is A1 and y is B1 then z1 = C1)

Figure 3. Architecture of EFuNN

6. Adaptive Neuro-Fuzzy Inference System (ANFIS)

ANFIS implements a Takagi-Sugeno-Kang (TSK) [7] fuzzy inference system (Figure 4) in which the conclusion of a fuzzy rule is constituted by a weighted linear combination of the crisp inputs rather than a fuzzy set [19].

Figure 4. Takagi Sugeno fuzzy inference system


Figure 5. Architecture of ANFIS

Figure 5 depicts the five-layer architecture of ANFIS. The functionality of each layer is as follows.

Layer-1: Every node i in this layer has a node function

$$O_i^1 = \mu_{A_i}(x), \quad i = 1, 2 \qquad \text{or} \qquad O_i^1 = \mu_{B_{i-2}}(y), \quad i = 3, 4, \ldots$$

$O_i^1$ is the membership grade of a fuzzy set A (= A1, A2, B1 or B2), and it specifies the degree to which the given input x (or y) satisfies the quantifier A. Usually the node function can be any parameterized function. A Gaussian membership function is specified by two parameters, c (membership function center) and σ (membership function width):

$$\mathrm{gaussian}(x, c, \sigma) = e^{-\frac{1}{2}\left(\frac{x - c}{\sigma}\right)^2}$$

Parameters in this layer are referred to as premise parameters.

Layer-2: Every node in this layer multiplies the incoming signals and sends the product out. Each node output represents the firing strength of a rule:

$$O_i^2 = w_i = \mu_{A_i}(x) \times \mu_{B_i}(y), \quad i = 1, 2, \ldots$$

In general, any T-norm operator that performs fuzzy AND can be used as the node function in this layer.

Layer-3: Every i-th node in this layer calculates the ratio of the i-th rule's firing strength to the sum of all rules' firing strengths:

$$O_i^3 = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2, \ldots$$

Layer-4: Every node i in this layer has the node function

$$O_i^4 = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i)$$

where $\bar{w}_i$ is the output of Layer-3 and $\{p_i, q_i, r_i\}$ is the parameter set. Parameters in this layer are referred to as consequent parameters.

Layer-5: The single node in this layer, labeled Σ, computes the overall output as the summation of all incoming signals:

$$O_1^5 = \text{overall output} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}$$

ANFIS uses a mixture of backpropagation to learn the premise parameters and least mean square estimation to determine the consequent parameters. A step in the learning procedure has two parts. In the first part, the input patterns are propagated and the optimal consequent (conclusion) parameters are estimated by an iterative least mean square procedure, while the premise (antecedent) parameters, which define the membership functions, are assumed to be fixed for the current cycle through the training set. In the second part, the patterns are propagated again, and in this epoch backpropagation is used to modify the premise parameters while the consequent parameters remain fixed. This procedure is then iterated.
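The five-layer forward pass described above can be sketched in Python as follows. This is only an illustration of the inference step, not of the hybrid learning procedure. The grid of rules (every fuzzy set of x paired with every fuzzy set of y, giving nine rules for three sets per input) and all parameter values are assumptions made for the example.

```python
import math

def gaussian(x, c, sigma):
    """Gaussian membership function with center c and width sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def anfis_forward(x, y, mf_x, mf_y, consequents):
    """Forward pass of a two-input, first-order TSK ANFIS (illustrative).

    mf_x, mf_y  : lists of (c, sigma) premise parameters, one per fuzzy set
    consequents : dict mapping rule (i, j) -> (p, q, r)
    """
    # Layer 1: membership grades of the crisp inputs.
    mu_x = [gaussian(x, c, s) for c, s in mf_x]
    mu_y = [gaussian(y, c, s) for c, s in mf_y]
    # Layer 2: firing strength of each rule (product T-norm).
    w = {(i, j): mu_x[i] * mu_y[j]
         for i in range(len(mu_x)) for j in range(len(mu_y))}
    # Layer 3: normalized firing strengths.
    total = sum(w.values())
    w_bar = {rule: v / total for rule, v in w.items()}
    # Layer 4: weighted linear consequents; Layer 5: overall sum.
    out = 0.0
    for rule, wb in w_bar.items():
        p, q, r = consequents[rule]
        out += wb * (p * x + q * y + r)
    return out, w_bar

# Hypothetical premise and consequent parameters on standardized inputs.
mfs = [(0.0, 0.5), (0.5, 0.5), (1.0, 0.5)]
rules = {(i, j): (0.1 * i, -0.1 * j, 0.9) for i in range(3) for j in range(3)}
output, strengths = anfis_forward(0.4, 0.7, mfs, mfs, rules)
```

Because the output is linear in the consequent parameters once the premise parameters are fixed, those parameters can be estimated by least squares, which is what the first part of each learning step exploits.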

7. Experimentation Results

The experimental system consists of two stages: network training and performance evaluation. The 24-hour load flow patterns of a heavy automobile industry were used to train the connectionist models. All the training data were standardized before training. The input parameters considered are the voltage (V) and current (I). We randomly fluctuated the input voltage (V) within ±2.5% of its normal value to cater for the worst conditions in the grid voltage regardless of the plant load. This also tests the learning ability of the connectionist models under worst-case conditions. We simulated the connectionist systems in MATLAB on a 450 MHz Pentium II platform.

• EFuNN training and test results

We used 3 membership functions (MFs) and the following evolving parameters: sensitivity threshold Sthr = 0.95 and error threshold Errthr = 0.05. EFuNN uses a one-pass training approach. The network was trained using 70% of the data, and the remaining 30% was used for testing and validation. Figure 6 shows the predicted power factor results for the test data.
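The data preparation and the error measure used in the experiments can be sketched as below. The paper does not state how the data were standardized, how the random fluctuation was drawn or how the 70/30 split was made, so the zero-mean, unit-variance scaling, the uniform distribution and the sequential split shown here are assumptions, as are all function names.

```python
import math
import random

def standardize(values):
    """Scale to zero mean and unit variance (assumed form of standardization)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def fluctuate(voltages, pct=2.5, seed=0):
    """Randomly perturb each voltage within +/- pct percent (uniform, assumed)."""
    rng = random.Random(seed)
    return [v * (1.0 + rng.uniform(-pct, pct) / 100.0) for v in voltages]

def split(samples, train_fraction=0.7):
    """Sequential train/test split (the paper does not say how it split)."""
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

def rmse(predicted, actual):
    """Root mean squared error between predictions and targets."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))

# Hypothetical usage with made-up voltage readings.
noisy = fluctuate([415.0, 410.0, 420.0, 412.0])
train, test = split(list(range(10)))
```

`standardize` divides by the standard deviation, so it assumes the input series is not constant.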

Figure 6. Test result – predicted power factor using EFuNN

Figure 7. ANFIS with each input variable assigned to 3 MFs.


Figure 8. Three-dimensional view of the spatial relationship between input and output variables.

Figure 9. ANFIS rule structure showing the 9 rules, input membership functions and resulting output

Figure 10. Test result – predicted power factor using ANFIS



• ANFIS training and test results

We used the network shown in Figure 7 with three Gaussian bell membership functions assigned to each input variable. As shown in Figure 9, nine rules were learned using 70% of the data, and the remaining 30% was used for testing. Figure 8 illustrates the three-dimensional view of the spatial relationship between the input variables (voltage, current) and the output variable (power factor). Test results are shown in Figure 10. Table 2 compares the performance of ANFIS and EFuNN.

Table 2. Power factor prediction – comparative performance of neuro-fuzzy systems

                         EFuNN       ANFIS
Learning epochs          1           20
Training time            90 sec      25 sec
Training error (RMSE)    0.02265     0.03045
Testing error (RMSE)     0.021036    0.02830

• ANN training and test results

We used a feedforward neural network with 1 input layer, 2 hidden layers and an output layer [2-36-20-1]. The input layer consists of 2 neurons corresponding to the input variables. The first and second hidden layers consist of 36 and 20 neurons, respectively. We used the same architecture for simulating the different learning techniques, except for the Levenberg-Marquardt network, for which we used a [2-42-1] architecture. The network was trained using 70% of the data, and the remaining 30% was used for testing and validation. We considered 3 variants of the backpropagation (BP) algorithm, 4 variants of the conjugate gradient algorithm (CGA), 2 variants of the quasi-Newton algorithm and the Levenberg-Marquardt algorithm. The training and test results are illustrated in Figures 11-20 and Table 3. Since all the neural network models that we considered took more than 30 minutes to train, the training time for each model is not reported.


Figure 11: Training and test results - backpropagation training using learning rate and momentum

Figure 12: Training and test results - Backpropagation training using variable learning rate

Figure 13: Training and test results - resilient backpropagation training

Figure 14: Training and test results – conjugate gradient algorithm using Fletcher-Reeves update


Figure 15: Training and test results – conjugate gradient training using Polak-Ribière update

Figure 16: Training and test results – conjugate gradient algorithm using Powell-Beale restarts

Figure 17: Training and test results - scaled conjugate gradient algorithm

Figure 18: Training and test results - one step secant algorithm


Figure 19: Training and test results - Quasi-Newton based conjugate gradient algorithm

Figure 20: Training and test results - Levenberg-Marquardt algorithm

Table 3. Comparative performance of neural networks using different learning algorithms

                                                     Performance achieved (RMSE)
Learning algorithm used                    Epochs    Training    Test
BP algorithm (normal)                      5000      0.0380      0.0364
BP algorithm with variable learning rate   5000      0.0360      0.0360
Resilient BP algorithm                     5000      0.0271      0.0329
CGA algorithm - Fletcher-Reeves update     5000      0.0270      0.0320
CGA algorithm - Polak-Ribière update       871       0.0285      0.0308
CGA algorithm - Powell-Beale restarts      4416      0.0239      0.0306
Scaled conjugate gradient algorithm        5000      0.0232      0.0357
One step secant algorithm                  5000      0.0265      0.0309
Quasi-Newton based CGA                     1000      0.0249      0.0352
Levenberg-Marquardt algorithm              1000      0.0263      0.0300

8. Conclusions

In this paper, we proposed different connectionist models for predicting the power factor and the reactive power requirement of an automobile industry. Compared to ANNs, the neuro-fuzzy models have an edge in terms of both training time and RMSE. Moreover, unlike an ANN, a neuro-fuzzy system is able to reason about its states. EFuNN is adaptive and capable of online learning. The performance of the EFuNN can be further enhanced by suitable selection of evolving parameters such as the sensitivity threshold, error threshold, learning rate and forgetting rate. However, finding the optimal values of the evolving parameters is a difficult task. Compared to BP training, the CGA and quasi-Newton methods are computationally expensive. The CGA variants tend to perform better than the BP algorithm and its variants. The LM algorithm has the fastest convergence with the lowest RMSE test error. To test the learning ability of the connectionist systems under worst-case conditions, we used random values for the input voltage. The performance could have been even better if the observed rather than the fluctuated values of voltage had been used. Moreover, the considered connectionist models are very robust, capable of handling the noisy and approximate data that are typical in power systems, and should therefore be more reliable under worst-case conditions.

References

[1] Nauck D, Klawonn F and Kruse R, Foundations of Neuro Fuzzy Systems, John Wiley & Sons, 1997.

[2] Miller T J E, Reactive Power Control in Electric Systems, Wiley-Interscience, 1982.

[3] Mamdani E H and Assilian S, An experiment in Linguistic Synthesis with a Fuzzy Logic Controller, International Journal of Man-Machine Studies, Vol. 7, No. 1, pp. 1-13, 1975.

[4] Cory B J, Weedy B M, Electric Power Systems, (4th Edition), John Wiley & Sons, 1998.

[5] Cherkassky V, Fuzzy Inference Systems: A Critical Review, Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications, Kaynak O, Zadeh L A et al (Eds.), Springer, pp. 177-197, 1998.

[6] Abraham A and Nath B, Artificial Neural Networks for Intelligent Real Time Power Quality Monitoring Systems, First International Power & Energy Conference, INT-PEC'99, 1999.

[7] Sugeno M, Industrial Applications of Fuzzy Control, Elsevier Science Pub Co., 1985.

[8] Abraham A and Nath B, Designing Optimal Neuro-Fuzzy Systems for Intelligent Control, The Sixth International Conference on Control, Automation, Robotics and Vision, (ICARCV 2000), December 2000.

[9] Zurada J M, Introduction to Artificial Neural Systems, PWS Pub Co, 1992.

[10] Sheble G B, Reactive Power: Basics, Problems and Solutions, IEEE Press, 1987.

[11] Skapura D M, Building Neural Networks, Addison-Wesley Publishing Company, 1996.

[12] Kasabov N, Evolving Fuzzy Neural Networks - Algorithms, Applications and Biological Motivation, in Yamakawa T and Matsumoto G (Eds), Methodologies for the Conception, Design and Application of Soft Computing, World Scientific, pp. 271-274, 1998.

[13] Hagan M T, Demuth H B and Beale M H, Neural Network Design, Boston, MA: PWS Publishing, 1996.

[14] Riedmiller M and Braun H, A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm, In Proceedings of the IEEE International Conference on Neural Networks, 1993.

[15] Powell M J D, Restart procedures for the Conjugate Gradient Method, Mathematical Programming, Volume 12, pp. 241-254, 1977.

[16] Moller M F, A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning, Neural Networks, Volume 6, pp. 525-533, 1993.

[17] Dennis J E and Schnabel R B, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Englewood Cliffs, NJ: Prentice Hall, 1983.

[18] Battiti R, First and Second Order Methods for Learning: Between steepest descent and Newton's Method, Neural Computation, Vol. 4, No. 2, pp. 141-166, 1992.

[19] Jang R, Neuro-Fuzzy Modeling: Architectures, Analyses and Applications, PhD Thesis, University of California, Berkeley, July 1992.

[20] Back Thomas, Evolutionary Algorithms in Theory and Practice, Oxford University Press, 1996.
