Decision Sciences Volume 31 Number 2 Spring 2000 Printed in the U.S.A.

A Comparison of Selected Artificial Neural Networks that Help Auditors Evaluate Client Financial Viability

Harlan L. Etheridge Department of Accounting, College of Business, University of Louisiana at Lafayette, Lafayette, LA 70504

Ram S. Sriram School of Accountancy, J. Mack Robinson College of Business, Georgia State University, 35 Broad Street, Atlanta, GA 30303-4321, email: [email protected]

H. Y. Kathy Hsu Department of Accounting, College of Business, University of Louisiana at Lafayette, Lafayette, LA 70504

ABSTRACT This study compares the performance of three artificial neural network (ANN) approaches (backpropagation, categorical learning, and probabilistic neural network) as classification tools to assist and support auditors’ judgment about a client’s continued financial viability into the future (going concern status). ANN performance is compared on the basis of overall error rates and estimated relative costs of misclassification (incorrectly classifying an insolvent firm as solvent versus classifying a solvent firm as insolvent). When only the overall error rate is considered, the probabilistic neural network is the most reliable in classification, followed by backpropagation and categorical learning network. When the estimated relative costs of misclassification are considered, the categorical learning network is the least costly, followed by backpropagation and probabilistic neural network.

Subject Areas: Artificial Intelligence, Auditor Judgment, Decision Support, and Financial Distress.

INTRODUCTION Auditors, as part of the audit process, must express an opinion as to whether the audited client will continue as a going concern over the next year. The going concern assumption is that a business will continue to operate for an indefinite period (Needles, Anderson, & Caldwell, 1987). The Statements of Auditing Standards (SAS) No. 58 (Reports on Audited Financial Statements) (AICPA, 1988b) and No. 59 (The Auditor’s Consideration of an Entity’s Ability to Continue as a Going Concern) (AICPA, 1988c) require auditors to form a judgment and to state their opinion regarding this assumption. Evaluating financial viability and expressing an opinion about a client’s financial health is not easy because auditors do not examine 100% of a client’s records (Dilla, File, Solomon, & Tomassini, 1991). Records can be incomplete or inadequate, and the data provided are subject to uncertainties (about, for example, a firm’s future ability to finance its operations or interest rate changes). If auditors issue an opinion indicating that a client will remain financially healthy, but the client fails soon after, the auditors could be held liable for losses suffered by the stockholders and creditors (in the form of a decline in market value of stocks or decline in the value of assets). Conversely, if auditors express doubts about the continued financial viability of a client, this could destabilize the client financially, leading perhaps to revocation of lines of credit by suppliers and other creditors; again, the auditors could be held liable for losses (Palmrose, 1987, 1988). The literature indicates that auditors do issue inappropriate opinions regarding a client’s solvency (Altman & McGough, 1974; Menon & Schwartz, 1987; Koh, 1991). Menon and Schwartz, for example, reported that in a sample of 147 firms filing for bankruptcy, auditors had identified financial viability problems and expressed a negative going concern opinion in only 63 cases. Researchers who have experimented with financial distress models and statistical techniques to improve the tools available to auditors evaluating the financial viability of audit clients include Altman (1968); Altman, Haldeman, and Narayanan (1977); Ohlson (1980); Hamer (1983); Jones (1987); Hopwood, McKeown, and Mutchler (1989); Gilbert, Menon, and Schwartz (1990); Mutchler and Williams (1990); Bell (1991); Hooks (1992); Boritz and Kennedy (1995); and Etheridge and Sriram (1996, 1997).
Logit and multivariate discriminant analysis (MDA) are the two most frequently used techniques in financial distress modeling studies (Sinkey, 1975; Altman et al., 1977; Martin, 1977; Ohlson, 1980; Jones, 1987; and Gilbert et al., 1990). Both logit and MDA provide reliable outputs with few classification or prediction errors when classifying a holdout sample of firms as failed or healthy. These techniques are complex, however, and also suffer other limitations, requiring researchers to seek other techniques. A nonparametric technique that researchers find useful is Artificial Neural Networks, or ANNs (Parker & Abramowicz, 1989; Bell, 1991; Liang, Chandler, Han, & Roan, 1992; Fanning & Cogger, 1994; and Etheridge & Sriram, 1997). Backpropagation is the most common ANN for financial distress modeling. Several recent studies in accounting have used backpropagation ANNs to categorize firms as failed or nonfailed, with varying degrees of success (Bell, Ribar, & Verchio, 1990; Tam & Kiang, 1992; and Klersey & Dugan, 1994). Most of these studies found that backpropagation ANNs performed at least as well as traditional statistical techniques in categorizing firms. A backpropagation ANN is designed to forecast continuous output values based on a set of continuous input parameters, so it may not be well suited for a binary outcome such as classifying a firm as failed or not. However, because backpropagation ANNs have been used in prior studies investigating financial distress, we include them in our study. This allows us to compare the performance of backpropagation with that of ANNs specifically designed to solve classification problems.


Several ANNs are designed to place observations into discrete categories. Most ANNs designed for classification can be put into two categories: those that use a form of competition among processing elements and those that use a form of Bayesian classifier technique. Categorical learning is a type of ANN that uses competition among processing elements, and the probabilistic neural network uses the Bayesian classifier technique. Because these two ANNs are designed to group data into discrete groups, they may be able to classify firms as failed or nonfailed with fewer errors than a backpropagation ANN. We compare the performances of the three ANNs (backpropagation, categorical learning network, and probabilistic neural network) because we believe auditors may find them useful in evaluating the financial health of a client. We train the three ANNs using financial data from a sample of failed and healthy banks. After training, they are tested on a holdout sample of failed and financially healthy banks. We compare their classification reliability in terms of overall error rates. Our results indicate that the estimated overall error rate is lowest for the probabilistic ANN, followed by backpropagation and categorical learning ANNs. For auditors, the costs of incorrectly classifying a failed bank as solvent (Type I error) or incorrectly classifying a solvent bank as failed (Type II error) are not equal. Costs, in this context, refer to the probabilistic increase in liability that auditors might incur if they incorrectly assess the financial health of an audited client. Research indicates that auditors consider incorrectly judging a failed firm as solvent (Type I error) to be costlier than incorrectly judging a solvent firm as failed (Type II error) (Sinkey, 1975; Altman et al., 1977; Hopwood et al., 1989; Wilson & Sharda, 1994).
Because auditors are more likely to be concerned about Type I errors than Type II errors, we compare the estimated relative costs (Type I to Type II error ratios) of using each ANN. Because the actual costs of Type I and Type II errors are mostly unobservable or unknown, the literature suggests the use of estimated relative costs (Altman et al., 1977; Zmijewski, 1984; Dopuch, Holthausen, & Leftwich, 1986; Hopwood et al., 1989; and Koh, 1992). Our results show that the categorical learning ANN has the lowest estimated relative cost, followed by probabilistic ANN and backpropagation ANN.

EVALUATING FINANCIAL VIABILITY Auditing standards require that auditors express an opinion regarding the client’s continued financial viability. Auditors are rightfully careful in this regard. If they express unwarranted doubts about the continued financial viability of a client, the firm’s creditors are likely to deny credit, the firm’s stock value may decline, and investors will lose confidence in the firm. The firm could be forced into financial failure because users of the firm’s financial statements may react to the auditor’s report as if the firm had already failed. This reaction to the auditor’s report is known as a self-fulfilling prophecy. Yet, if auditors fail to express doubts regarding the future financial viability of a failing client, they could be held liable for negligence as well as losses suffered by creditors and stockholders (Palmrose, 1987, 1988). Because the consequences of indicating that a client may fail (a negative going concern opinion) are so telling, auditors presumably express such an opinion only when they are certain of a client’s near bankruptcy (Koh, 1992; McKeown, Mutchler, & Hopwood, 1991). Then again, the low percentage of negative going concern opinions may be attributed to the failure of auditors to recognize an impending financial failure (Altman & McGough, 1974; Menon & Schwartz, 1987; and Koh, 1991, 1992). To evaluate a client’s financial condition, auditors rely on the audit evidence and also on various analytical techniques. The auditing standards SAS 56 (AICPA, 1988a) and SAS 59 (AICPA, 1988c) require that auditors use analytical techniques to gather sufficient evidence while making an informed decision regarding the future financial viability of audit clients. These analytical techniques include both financial distress models and statistical techniques. As Kida (1984, p. 147) pointed out, “[D]iscussions with audit partners revealed that prediction models are . . . used in actual going concern decisions.” Other studies provide similar evidence of auditors’ use of prediction models and techniques (Graham & Horner, 1988; Koh & Killough, 1990; and Koh & Oliga, 1990). Researchers and practitioners have long been interested in the use of prediction models and statistical techniques to improve the audit decision process. For example, since Beaver (1966) published his seminal work on bankruptcy prediction, several studies have modeled financial distress in order to enhance the reliability of analytical techniques (Altman, 1968; Elam, 1975; Martin, 1977; Ohlson, 1980; Hamer, 1983; Jones, 1987; and Barr & Siems, 1994). Most of these studies use financial ratios as variables to model financial viability, and use bankruptcy as a surrogate variable for financial problems (Jones, 1987). The financial ratio variables usually provide high prediction rates (Beaver, 1966; Altman et al., 1977).
Some studies have used nonfinancial variables, including management turnover or financial market-related variables (Dopuch et al., 1986), client size (McKeown et al., 1991), or reorganization (Mutchler, 1985). These studies do not find that the inclusion of nonfinancial ratios improves prediction rates significantly (Mutchler, 1985). The sample of firms typically used consists of matched pairs of failed and healthy firms, with financial ratio data for each firm. A set of firm characteristics, usually financial ratios considered relevant for evaluating financial health, is obtained for one or more years. The sample firms are then split into training and holdout samples, and the training sample is used to develop the financial distress models. The model is constructed one year later, after it is known whether the firm is still a going concern. The model is tested on the holdout sample of firms. For statistical purposes, the value for each characteristic is used to define a hyperplane such that firms above the hyperplane are going concerns and those below the hyperplane are not. The performance is evaluated according to the number of correct predictions of failed and healthy firms in the holdout sample.

MDA, LOGIT, AND ANN: A COMPARISON

Most financial distress studies use either logit or MDA as the statistical technique, both to test the financial distress models and to predict the failed and healthy firms in the holdout sample. Both these techniques are assumed to perform reliably if they correctly classify most of the firms in the holdout sample (low misclassification errors). These are complex techniques, however, and have some limitations that raise questions about their reliability as statistical techniques. MDA and logit require the data to be multivariate normal and have equal covariance matrices, or that there be a log-linear relationship among the independent variables (Williams, 1985; Odom & Sharda, 1990; and Coats & Fant, 1993). Because financial distress studies often use dummy variables to group firms as failed or healthy, the data always violate the multivariate normality assumption (Altman et al., 1977; Jones, 1987). Frecka and Hopwood (1983) also found that many common financial ratios are not normally distributed. To a certain extent, deleting outlier observations and applying a square root transformation to the variables can restore normality, but one must be careful not to omit important observations (Frecka & Hopwood, 1983; Jones, 1987). Similarly, real-world data often violate assumptions of equal covariance matrices (Pinches & Trieschmann, 1977; Altman et al., 1977; and Ohlson, 1980). Using quadratic discriminant analysis can reduce the problem of unequal covariance (Altman et al., 1977). Whenever the data violate the assumptions of the model, the performance of logit and MDA worsens and misclassification errors increase (Kida, 1980; Mutchler, 1985; Bell & Tabor, 1991; Chen & Church, 1992; and Coats & Fant, 1993). Unlike logit or multivariate discriminant analysis, ANNs have certain attributes that make them more attractive as modeling techniques. ANNs do not require the data to be multivariate normal or have equal covariance matrices, or maintain a log-linear relationship among independent variables. They are also good at pattern recognition.
That is, from the underlying data, ANNs learn data patterns (variable relationships) and their association with financial failure or health. In turn, they use these patterns and relationships to classify new firms as either failed or healthy (Odom & Sharda, 1990; and Coats & Fant, 1993). ANNs also accommodate numerous variables, without the detraction of multicollinearity. Because they are nonlinear procedures, ANNs also are more versatile and robust than linear statistical techniques, and can use both quantitative and qualitative cues (Liang et al., 1992; and Etheridge & Sriram, 1997). ANNs have some limitations, too. ANNs require significant training time and computer resources. These requirements increase exponentially when continuous data are used. Therefore, before using an ANN, it may be necessary to convert continuous financial ratio data into qualitative ranges such as high, medium, or low. Such conversions are mostly subjective and can lead to prediction errors. If auditors consider a financial indicator as too high or too low, this can, in turn, affect the prediction. Also, ANNs use past data patterns to predict the future. In other words, ANNs assume that the future will be like the past. If the environment changes, then ANNs may be no better than any other traditional statistical model at predicting the future (unless they are retrained). In certain attributes, ANNs are similar to MDA or logit. They require a training sample and a holdout sample of firms and a data set of financial information. The training sample and the financial data are used to train the ANNs, and the holdout sample is used to test and validate the ANNs.


ARTIFICIAL NEURAL NETWORK PARADIGMS ANNs are computer-based techniques that process data in a parallel manner. They are trained to learn the relationships among variables and use this learning to recognize the presence or absence of similar patterns in other data. They perform even when the data are incomplete or noisy. ANNs are composed of a large number of highly connected, simple processing elements, or neurons. The processing elements in ANNs are organized into layers; each layer has numerous processing elements and different functions. The processing elements in each layer are also connected to the processing elements in the preceding and the successive layers, and sometimes even to the processing elements in the same layer. Each processing element processes the data it receives from its input connections, and then provides a signal to the other processing elements through its output connections. The strength of a connection between two processing elements is represented by a weight (value) that can vary depending on the strength of the signals received from other processing elements, among other things. Weights normally have values in the range of -1 to +1. A weight of -1 indicates that the connection is strongly inhibiting, that is, a signal received through that connection is unlikely to induce the processing element to transmit a signal of its own. A weight with a value of +1 is strongly excitative in nature, meaning that a signal from this connection is likely to induce a processing element to transmit its own signal. If a processing element has more than one connection, the weights from those connections will be stored in a vector. For example, if a processing element has input connections from 100 other processing elements, then the strength of each connection will be represented by one weight in a vector of 100 weights. The vectors of weights associated with a processing element are sometimes called the “adaptive coefficients” of a processing element.
These weights are adaptive in the sense that they can change in response to new stimuli. Typically, the first layer in an ANN is the input layer. Its purpose is to accept inputs. The input layer processing elements then transmit signals to the second layer in the ANN, the hidden layer, where most of the learning takes place. After processing the signals from the input layer, the processing elements in the hidden layer transmit signals to the output layer. The processing elements in the output layer, after further processing the signals from the hidden layer, produce a result for the user. The key component that controls the functions of an ANN is a group of mathematical functions that include the summation function, the transfer function, and the learning law. Remember that a processing element receives input through its connections (these inputs can be either positive or negative). The summation function combines the inputs it receives into a single value by summing the products of the inputs and their respective weights (the weights represent the strength of the connections). The transfer function then determines the output of the processing element, depending on the result of the summation function. Because the transfer functions are nonlinear, it is possible to use the ANNs for a wide range of problems. The learning law helps in revising the adaptive coefficients as a function of the input values or as a function of the feedback value. Revising or modifying the adaptive coefficients helps the ANN to learn.
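The summation and transfer functions described above can be illustrated with a minimal sketch of a single processing element. The logistic sigmoid used as the transfer function, the function names, and the sample inputs and weights are all assumptions chosen for illustration; the text does not specify which transfer function the study's software uses.

```python
import math


def summation(inputs, weights):
    """Sum the products of each input and the weight of its connection."""
    return sum(x * w for x, w in zip(inputs, weights))


def sigmoid_transfer(net):
    """A nonlinear transfer function (logistic sigmoid, an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-net))


def processing_element(inputs, weights):
    """Output signal of one processing element."""
    return sigmoid_transfer(summation(inputs, weights))


# Hypothetical signals on three input connections and their weights in [-1, +1].
inputs = [0.12, 1.80, 0.05]
weights = [0.5, -0.25, 1.0]   # the second connection is inhibiting
output = processing_element(inputs, weights)
print(summation(inputs, weights), output)
```

A negative weighted sum, as in this example, pushes the sigmoid output below 0.5, which matches the idea that inhibiting connections make the element less likely to transmit a strong signal.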


Improvements in training are monitored by observing the change in mean-squared error. The objective is to minimize the mean-squared error. The term “error” refers to the difference between the predicted result and the actual result. For example, for a set of learning cases, an error occurs when the output values from the ANN for a case (such as the prediction of failure or nonfailure) do not conform to the actual values of the case (whether a firm failed or not). The learning law defines how weights are changed to reduce the number of times the ANN output does not conform to the actual output. Once it is trained, the ANN is ready to be tested on a holdout sample.
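To make the role of the learning law and the mean-squared error concrete, the following is a minimal sketch that trains a single sigmoid processing element with a gradient-descent (delta rule) update and records the mean-squared error after each pass. The toy cases, learning rate, number of passes, and function names are assumptions for illustration only; they are not the networks or data used in this study.

```python
import math
import random


def predict(weights, bias, x):
    """Output of one sigmoid processing element for input vector x."""
    net = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-net))


def mean_squared_error(weights, bias, cases):
    """Average squared difference between actual and predicted values."""
    return sum((y - predict(weights, bias, x)) ** 2 for x, y in cases) / len(cases)


def train(cases, epochs=200, rate=0.5, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in cases[0][0]]
    bias = 0.0
    history = []
    for _ in range(epochs):
        order = cases[:]
        rng.shuffle(order)  # present the learning cases in random order
        for x, y in order:
            out = predict(weights, bias, x)
            # Delta rule: error times the slope of the sigmoid at this output.
            delta = (y - out) * out * (1.0 - out)
            weights = [w + rate * delta * xi for w, xi in zip(weights, x)]
            bias += rate * delta
        history.append(mean_squared_error(weights, bias, cases))
    return weights, bias, history


# Hypothetical learning cases: two scaled ratios, 1 = failed, 0 = healthy.
cases = [([0.9, 0.1], 1), ([0.8, 0.2], 1), ([0.2, 0.9], 0), ([0.1, 0.7], 0)]
weights, bias, history = train(cases)
print(history[0], history[-1])
```

Plotting `history` would give the kind of error graph that a performance monitor displays during training.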

ANN and Financial Health Evaluation: An Illustration The auditor, before using the ANN, must obtain sufficient data on a group of firms that includes both failed (firms that filed for bankruptcy) and healthy firms. It is critical that the auditor obtain data on firms that are representative of the firms to which the ANN will be applied. Failure to do so can lead to incorrect decisions. The failed and healthy firms may be selected and matched on the basis of revenue, assets, or industrial classification code (SIC code). The auditor must obtain relevant financial ratio data on each failed and healthy firm to use as input data. Suppose the auditor chooses three financial ratios as input data: return on equity, working capital ratio (current assets over current liabilities), and return on assets. Data on these three financial ratios must be included for every healthy and failed firm included in the sample. Once the data are ready, they can be separated into training and holdout samples. The training sample can be used initially to train the ANN, and the holdout sample is used to test the ANN. The ANN, during training, learns the relationships between the independent variables (financial ratios) and the dependent variable (failed or healthy). Assigning weights to the financial ratios and associating them with a failed or healthy firm is part of the ANN training. The ANN begins to associate certain ratio relationships with a failed or a healthy firm. By constantly adjusting the weights, the ANN learning can be improved. That is, the ANN begins to consistently produce the known output (failed or healthy) from the given inputs. Once an ANN is trained, it is ready for testing on the holdout sample. The ANN uses financial ratios on unidentified firms and places them into failed or healthy categories. By comparing the predicted categories to the actual categories, the error rates and ANN performance can be determined.
Auditors may not use the ANN output identifying a client as healthy or failing as the sole reason to express an opinion on the client’s financial status. If the audit evidence overwhelmingly points to a financially healthy client or to a financially weak client, the auditors will have only limited use for the ANN. At the most, they may use the ANN output as one more document to support their opinion. When there are significant uncertainties surrounding a client’s financial status, or when there are lingering concerns about the financial health of a client, the ANN can be a useful audit tool for the auditors. If the ANN identifies the client as healthy, an auditor can use it to reinforce and support the evaluation of the client. If the ANN identifies the client as failing, an auditor can use the ANN output as a reason to collect more audit evidence and to investigate the client further before expressing an opinion.
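The holdout comparison described in this illustration can be sketched as a short routine that counts misclassifications by type. The function name and the ten holdout labels below are hypothetical; the coding of 1 for failed and 0 for healthy, and the meaning of Type I and Type II errors, follow the definitions used elsewhere in this paper.

```python
def error_rates(actual, predicted):
    """Compare predicted categories with actual ones (1 = failed, 0 = healthy)."""
    failed = sum(1 for a in actual if a == 1)
    healthy = len(actual) - failed
    # Type I: a failed firm classified as healthy.
    type1 = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    # Type II: a healthy firm classified as failed.
    type2 = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return {
        "type1_rate": type1 / failed,
        "type2_rate": type2 / healthy,
        "overall_rate": (type1 + type2) / len(actual),
    }


# Hypothetical holdout sample of ten firms.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
rates = error_rates(actual, predicted)
print(rates)
```

Note that the simple overall rate here weights each firm equally; it is a different quantity from an error rate that weights the two error types by prior probabilities of failure and nonfailure.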


Problem Domain and ANN Choice ANNs differ in their models and architectures. The differences in ANN models and architectures influence their suitability for various uses. Selection of the wrong ANN could complicate interpretation of the output; also, the results might not be reliable. Backpropagation ANNs, for example, are more suited for prediction than classification problems. Because the backpropagation ANN strives to minimize the mean-squared error in the network and provides continuous output, the user is expected to manually select cutoff points to differentiate between groups. Such a selection can be arbitrary and “can lead to irregular boundaries and unexpected classification regions due to the distributed nature of the network and the unboundedness of the transfer functions” (Neuralware, 1996, p. 22). In contrast, classification ANNs such as categorical learning networks or probabilistic neural networks are trained to categorize data patterns into discrete categories and to provide discrete output values; for example, 1, 2, or 3. This makes them more suitable for problems requiring the classification of a sample into discrete groups. There are also other differences in ANN architecture that influence ANN performance. Learning laws typically vary across models, and even within the same set of learning laws, there are several variations. The learning laws influence the approach to problem solving. ANNs may also require different numbers of layers, and may use different types of transfer, summation, or output functions. Two ANNs constructed using the same general approach, for example, backpropagation, may even vary in the numbers of processing elements and layers used. All of these factors must be considered when choosing an ANN. The type of problem to be solved is probably one of the most important factors to consider. ANN choice must not be arbitrary and must be guided by the problem domain.
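The effect of a manually selected cutoff on a continuous backpropagation output can be seen in a small sketch. The output values and the two cutoffs are hypothetical, and the function name is an assumption; the point is only that the same continuous outputs yield different group assignments under different cutoffs.

```python
def classify(outputs, cutoff=0.5):
    """Map continuous network outputs to 1 (failed) or 0 (healthy)."""
    return [1 if value >= cutoff else 0 for value in outputs]


# Hypothetical continuous outputs for five firms.
outputs = [0.92, 0.41, 0.63, 0.08, 0.35]

at_half = classify(outputs, cutoff=0.5)
at_lower = classify(outputs, cutoff=0.3)
print(at_half)
print(at_lower)
```

Lowering the cutoff flags more firms as failed, which would tend to trade Type I errors for Type II errors; the choice of cutoff is therefore a judgment the user must make.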

Three ANN Paradigms Suitable for Evaluating Financial Viability The NeuralWorks Professional II/PLUS software used in this study provides a choice of over 30 ANN paradigms. We use only three major approaches that we believe are useful to an auditor evaluating the financial viability of a client. The three approaches are: backpropagation (BPN), categorical learning network (CLN), and probabilistic neural network (PNN). The three ANNs also represent three different problem-solving approaches. BPN approaches problem solving by minimizing the total network error through a gradient-descent learning law. CLN uses competition in its pattern layer to develop a mapping of input patterns (we also add a learning component to the output layer to improve the performance). PNN uses the Bayesian classifier technique to distinguish between categories of items. Inclusion of BPN also allows us to compare the results of our study with numerous prior studies that use backpropagation. In Table 1, we provide a comparison of the architecture of the three ANNs used in this study.

Table 1: A comparison of BPN, CLN, and PNN architectures.

ANN Paradigm: Backpropagation (BPN)
Basic Architecture: Input layer; one hidden layer; one output layer
Remarks: Each input layer connects all processing elements in the hidden layer. Each hidden layer connects all processing elements in the output layer. Processing elements associated with the feedback value are connected to all processing elements in both hidden and output layers. One output layer is adequate to distinguish between failed (1) and healthy firms (0). The number of processing elements in the hidden layer is determined by trial and error. Too few processing elements reduce learning. Too many processing elements lead to “grandmothering” (memorization) of the patterns. Grandmothering reduces the ability to generalize knowledge. The actual output is likely to be between 0 and 1; values closer to zero represent healthy firms, and values closer to 1 represent failing firms.

ANN Paradigm: Categorical Learning Network (CLN)
Basic Architecture: Input layer; normalization layer; pattern layer; output layer
Remarks: The number of processing elements in the input layer equals the number of input variables. The normalization layer helps the pattern layer to develop correct classes for the input vectors. The pattern layer uses unsupervised learning to distinguish between categories. Include as many processing elements in the pattern layer as output classes (e.g., two processing elements to categorize failed and healthy firms). Similarly, include as many processing elements in the output layer as there are categories.

ANN Paradigm: Probabilistic Neural Network (PNN)
Basic Architecture: Input layer; normalization layer; pattern layer; summation layer; output layer
Remarks: Uses a Bayesian approach to classification. Minimizes classification errors. The summation layer helps in classifying firms as failed or healthy. The output or classification layer uses signals from the summation layer to determine the correct group.

Backpropagation ANNs (BPN) Suppose the auditor decides to use three financial ratios as the input variables (return on equity, working capital ratio, and return on assets), and failed (1) or healthy (0) as output variables. For the BPN architecture, they would use an input layer with four processing elements: three for the input variables, and one for the feedback value (failed or healthy). Since the input layer in a BPN does not learn, there is no need to use a learning law for that layer. The output processing element can be trained (using the feedback values) to signal the value of zero for firms not expected to fail and 1 for firms expected to fail. During training, caution must be taken to present the input observations randomly to prevent the BPN from learning the pattern in which the inputs are presented and using that information in determining its output. (Similar precautions must be taken when training CLNs or PNNs.) The progress of the training can be evaluated using performance monitors. ANN software generally provides several types of network performance monitors, and the auditor can choose the monitor that provides adequate information. The monitors include confusion matrices, Pearson’s R coefficient, and network error graphs. If the BPN learning is unsatisfactory, auditors can halt the training, randomize the weights in the network, and begin the training again. Auditors can stop the training when they believe that the BPN is adequately trained, according to their interpretation of the performance monitors, or when the root mean square error of the network reaches a level prespecified by the auditor. After training the ANN, the auditor uses the holdout sample to test the BPN. The auditor can compare the BPN’s output with actual outcomes to calculate the total number of misclassifications and the number of Type I and Type II misclassifications. If the number and type of misclassifications are acceptable, the BPN can be retained for future use. Otherwise, the auditor can modify the ANN or discard it.

Categorical Learning ANNs (CLN) Categorical learning network, like backpropagation network, is trained using a training sample, performance monitors, and a holdout sample. We discuss only those attributes that are unique to CLN. If auditors use three financial variables as input, and failed or nonfailed as output, the CLN, unlike BPN, would require four layers: an input layer, a normalization layer, a pattern layer, and an output layer. The normalization layer, by normalizing all input vectors, ensures that the larger value input vectors do not overwhelm smaller value input vectors. This helps to reduce classification errors. The pattern layer, by requiring the processing elements to compete with each other during learning, again improves network performance. CLN does not require as many data iterations as BPN to learn the data patterns adequately. If there are many processing elements in the pattern layer, however, each iteration through the training sample may take longer than training the BPN.
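The roles of the normalization layer and of competition in the pattern layer can be illustrated with a generic winner-take-all sketch. This is an assumed, simplified form of competitive learning, not the categorical learning network as implemented in the software used in this study; the function names, learning rate, initialization from the first samples, and sample vectors are all hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so large-valued inputs do not dominate."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector] if length > 0 else list(vector)


def winner(pattern_weights, x):
    """Index of the pattern element with the largest response to x."""
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in pattern_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def train_competitive(samples, n_elements=2, rate=0.3, epochs=20):
    # Assumption: initialize each pattern element from one of the first samples.
    weights = [normalize(s) for s in samples[:n_elements]]
    for _ in range(epochs):
        for s in samples:
            x = normalize(s)
            k = winner(weights, x)  # only the winning element learns
            moved = [w + rate * (xi - w) for w, xi in zip(weights[k], x)]
            weights[k] = normalize(moved)
    return weights


# Hypothetical input vectors; the last one has much larger raw values.
samples = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [10.0, 1.0]]
weights = train_competitive(samples)
print(winner(weights, normalize([10.0, 1.0])), winner(weights, normalize([0.1, 0.9])))
```

After normalization, the large-valued vector `[10.0, 1.0]` is grouped with `[0.9, 0.1]` because the two point in nearly the same direction, which is the sense in which normalization keeps larger inputs from overwhelming smaller ones.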

Probabilistic Neural Network (PNN) Unlike both backpropagation network and categorical learning network, probabilistic neural network requires five processing element layers: an input layer, a normalization layer, a pattern layer, a summation layer, and an output layer. Although PNN uses a pattern layer similar to CLN, it uses a Bayesian approach to classify objects. The summation layer, using various mathematical functions, summarizes the outputs into a specific category (either failed or nonfailed) and helps the output layer in generating the correct classification. PNN, similar to CLN, does not require as many data iterations as BPN to adequately learn the data patterns. Again, if there are many processing elements in the pattern layer, each iteration through the training sample may take longer than with a BPN.
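The pattern, summation, and output layers described above can be sketched with a generic probabilistic neural network that uses Gaussian kernels and class prior probabilities. This is an assumed textbook-style formulation rather than the specific implementation used in this study; the kernel width `sigma`, the function name, and the training vectors are hypothetical, and the normalization layer is omitted for brevity. The priors of .02 and .98 are the values this paper reports using for failure and nonfailure.

```python
import math


def pnn_classify(x, training, priors, sigma=0.1):
    """Return the class with the largest prior-weighted density estimate."""
    scores = {}
    for label, patterns in training.items():
        # Pattern layer: one Gaussian kernel per stored training pattern.
        kernels = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * sigma ** 2))
            for p in patterns
        ]
        # Summation layer: average activation estimates the class density.
        density = sum(kernels) / len(kernels)
        scores[label] = priors[label] * density
    # Output layer: choose the class with the highest score.
    return max(scores, key=scores.get), scores


# Hypothetical scaled ratio vectors for each class.
training = {
    "failed": [[0.10, 0.20], [0.15, 0.25]],
    "healthy": [[0.80, 0.90], [0.85, 0.80]],
}
priors = {"failed": 0.02, "healthy": 0.98}

label_a, _ = pnn_classify([0.12, 0.22], training, priors)
label_b, _ = pnn_classify([0.82, 0.85], training, priors)
print(label_a, label_b)
```

Because each training pattern contributes one kernel, the pattern layer grows with the training sample, which is consistent with the remark that each pass may take longer when the pattern layer has many processing elements.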


DATA AND METHODOLOGY
A Big Six CPA firm provided the data used in this study. The data consist of 57 financial ratios for the years 1986-1988 for 1,139 banks in various regions of the U.S. The original data set consisted of 991 healthy banks and 148 failed banks. We use the FDIC definition of failure to operationalize failed banks as assisted mergers and liquidated banks (FDIC Statistics on Banking, 1992). The FDIC assumed the operations of the failed banks in 1989, so we use the years 1986, 1987, and 1988 to represent three years, two years, and one year prior to failure. Two financial ratios, INTEXDEP and VOLATILE, are excluded from the sample because of incomplete data. The final training sample consists of 749 healthy and 114 failed banks for 1988; 752 healthy and 115 failed banks for 1987; and 776 healthy and 116 failed banks for 1986. The holdout sample consists of 192 healthy and 23 failed banks for each of the three years. In the interest of space, we have excluded the descriptive statistics. Interested readers may contact the authors.

For the purpose of analysis, we use prior probabilities of .02 and .98 for failure and nonfailure, respectively, to construct the probabilistic neural network and to calculate estimated overall error rates. Watts and Zimmerman (1986) stated that these are the probabilities of failure and nonfailure in the real world, while Sinkey (1975) pointed out that over time, fewer than 2% of all banks are problem banks. The CLN includes an Instar learning algorithm to improve its performance (Etheridge & Sriram, 1996, 1997). The network training is monitored using root mean square error plots and Pearson’s R coefficients.

RESULTS
The primary objective of this study is to compare the performance of the three ANNs in terms of their overall error rates and Type I and Type II error rates. The estimated overall error rate (EOER) is defined as: EOER = [Type I × (Probability of Failure)] + [Type II × (Probability of Nonfailure)]. A Type I error is defined as classifying a failed bank as a nonfailed bank, and a Type II error is defined as classifying a nonfailed bank as a failed bank. Table 2 presents a comparison of the error rates for the three ANNs. The estimated overall error rate is low for all three ANNs. The backpropagation network and the probabilistic neural network yield lower estimated overall error rates (ranging from 2.4% one year before failure to 6.57% three years before failure). Figure 1 shows a comparative plot of the estimated overall error rates. If the estimated overall error rate is adequate to determine the reliability of a financial distress model, we can conclude that both the backpropagation network and the probabilistic neural network perform well. In fact, overall error rates are inadequate because they do not consider the rates of misclassifying failed and nonfailed banks (Type I and Type II errors). Therefore, we also compare the ANNs on the basis of Type I and Type II error rates, although we did not specifically train the three ANNs to reduce Type I or Type II errors. The three ANNs were only trained to minimize overall error rates.
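As a worked check of the EOER definition, the short Python sketch below recomputes the estimated overall error rate from the Type I and Type II rates reported in Table 2, using the prior probabilities of .02 and .98 stated in the methodology. The variable and function names are illustrative only. Because the inputs are the rounded percentages from the table, the recomputed values can differ from the reported ones by about 0.01 percentage points.

```python
P_FAILURE, P_NONFAILURE = 0.02, 0.98  # prior probabilities used in the study

# (Type I %, Type II %, reported EOER %) as listed in Table 2.
TABLE_2 = {
    ("BPN", 1988): (13.04, 3.65, 3.83),
    ("BPN", 1987): (43.48, 4.69, 5.46),
    ("BPN", 1986): (47.83, 5.73, 6.57),
    ("CLN", 1988): (0.00, 7.29, 7.15),
    ("CLN", 1987): (17.39, 7.81, 8.00),
    ("CLN", 1986): (21.74, 11.98, 12.17),
    ("PNN", 1988): (43.48, 1.56, 2.40),
    ("PNN", 1987): (47.83, 3.13, 4.02),
    ("PNN", 1986): (52.17, 3.13, 4.11),
}


def eoer(type_1: float, type_2: float) -> float:
    """EOER = Type I x P(failure) + Type II x P(nonfailure)."""
    return type_1 * P_FAILURE + type_2 * P_NONFAILURE


for (model, year), (t1, t2, reported) in TABLE_2.items():
    print(f"{model} {year}: computed {eoer(t1, t2):.2f}, reported {reported:.2f}")
```

The calculation makes the weighting visible: with a prior of .98 for nonfailure, the EOER is driven almost entirely by the Type II rate, which is why a model with a Type I rate above 40% can still show an overall error rate below 5%.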


Table 2: Estimated error rates (percentages).

ANN Model   Year   Type I   Type II   EOER
BPN         1988   13.04     3.65      3.83
            1987   43.48     4.69      5.46
            1986   47.83     5.73      6.57
CLN         1988    0.00     7.29      7.15
            1987   17.39     7.81      8.00
            1986   21.74    11.98     12.17
PNN         1988   43.48     1.56      2.40
            1987   47.83     3.13      4.02
            1986   52.17     3.13      4.11

Both the backpropagation network and the probabilistic neural network have low Type II error rates (see Table 2) and perform more reliably than the categorical learning network in categorizing nonfailed banks in the holdout sample. The Type II error rate for the backpropagation network is lower than 4% one year before failure and lower than 6% three years before failure. For the probabilistic neural network, the Type II error rate is lower than 2% one year before failure and lower than 4% three years before failure.

The backpropagation network and the probabilistic neural network do not perform as reliably as the categorical learning network in correctly classifying failed banks. Both ANNs have higher Type I error rates than the categorical learning network (ranging from 13% to 52%). The categorical learning network is clearly more reliable in classifying failed banks in the holdout sample than the backpropagation network or the probabilistic neural network. It is important to note that this discussion of the performance reliability of the three ANNs is applicable only to the set of data and training conditions that we used in this study, and may not be generalizable to other data or training conditions. Figure 1 also presents comparative plots of Type I and Type II error rates for the three ANNs.

While the backpropagation network and the probabilistic neural network are more reliable in classifying the nonfailed banks in the holdout sample, and the categorical learning network is more reliable in classifying the failed banks, is the type of error rate adequate as an indicator of performance reliability, or should we consider the costs of misclassification? Prior studies with concerns about misclassification costs or Type I and Type II error costs include Odom and Sharda (1990), Coats and Fant (1993), and Etheridge and Sriram (1997). Auditors do not view Type I and Type II error costs as equal. As stated earlier, cost in this context refers to the probabilistic increase in liability that might be incurred by the auditors if they incorrectly assess the financial health of an audited client. Most decision makers perceive a Type I error (incorrectly classifying a failed bank as nonfailed) as costlier than a Type II error (incorrectly classifying a nonfailed bank as failed). Therefore, a financial distress model with a smaller Type I error is likely to be viewed as more helpful than a financial distress model with a larger Type II error.


Figure 1: Comparative plots of error rates for the ANNs. a. Estimated Overall Error Rates. b. Type I Error Rates. c. Type II Error Rates. [Figure not reproduced. Each panel plots the error rate for BPN, CLN, and PNN against years prior to failure (one, two, three).]

Because the actual costs of Type I and Type II errors are mostly unobservable or unknown (Koh, 1992), researchers use relative cost ratios (the ratio of Type I to Type II error costs) to study the costs of misclassification. Hopwood et al. (1989) suggested using Type I/Type II cost ratios of 1:1, 10:1, 20:1, 30:1, 40:1, and 50:1. Although decision makers are unlikely to use symmetrical cost preferences, Hopwood et al. included the 1:1 ratio to allow comparison with prior studies. Altman et al. (1977), Zmijewski (1984), and Dopuch et al. (1986) also used similar cost ratios. We follow the same approach.


The estimated relative cost (RC) of using a financial distress model is computed as:

$RC = (P_{I} \times C_{I}) + (P_{II} \times C_{II})$,

where $P_{I}$ is the probability of a Type I error, $C_{I}$ is the relative cost of a Type I error, $P_{II}$ is the probability of a Type II error, and $C_{II}$ is the relative cost of a Type II error (Koh, 1992). Table 3 presents a comparison of the performance rankings of the three ANNs for the various cost ratios. To compare the performance of the ANNs, we compute a simple sum of the ranks of the models for the three years and for different relative costs. Because we use the relative cost of 1:1 only for comparison with prior studies, we exclude the rankings for relative cost of 1:1 from the rank-sum measure. The rank-sum measure yields a score of 18 for the categorical learning network, 31 for the backpropagation network, and 41 for the probabilistic neural network. A lower rank sum indicates lower relative costs, that is, better performance. Overall, the categorical learning network performs the best with the lowest relative costs, followed by the backpropagation network and the probabilistic neural network.

We also test whether the differences in relative costs across models are statistically significant. Table 4 presents these results. Table 4a lists the yearly means of relative costs across cost ratios, while Table 4b presents the one-tailed t-tests by year across model means. The t-tests basically support the simple rank-sum score. One year prior to failure, the categorical learning network, with a low relative cost, is the best performer, followed by the backpropagation network and the probabilistic neural network. Two years prior to failure, the categorical learning network, again with a low relative cost, performs better than the backpropagation network and the probabilistic neural network. Three years prior to failure, there are no significant differences in the relative costs for any of the ANN models.
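The relative cost and rank-sum calculations can be traced with the following Python sketch. It relies on one interpretive assumption that the text does not state explicitly: the probability of each error type is taken as the prior probability of the class (.02 for failure, .98 for nonfailure) multiplied by the corresponding error rate from Table 2, with the cost of a Type II error fixed at 1. Under that reading, the sketch reproduces the rank sums of 18, 31, and 41 and matches the yearly means in Table 4a to within rounding. All names in the code are illustrative.

```python
P_FAILURE, P_NONFAILURE = 0.02, 0.98
COST_RATIOS = [10, 20, 30, 40, 50]  # the 1:1 ratio is excluded from the rank sum

# (Type I rate, Type II rate) from Table 2, expressed as fractions.
RATES = {
    1988: {"BPN": (0.1304, 0.0365), "CLN": (0.0000, 0.0729), "PNN": (0.4348, 0.0156)},
    1987: {"BPN": (0.4348, 0.0469), "CLN": (0.1739, 0.0781), "PNN": (0.4783, 0.0313)},
    1986: {"BPN": (0.4783, 0.0573), "CLN": (0.2174, 0.1198), "PNN": (0.5217, 0.0313)},
}


def relative_cost(type_1: float, type_2: float, cost_1: float, cost_2: float = 1.0) -> float:
    """RC = P_I * C_I + P_II * C_II, with P = prior x error rate (an assumption)."""
    p_1 = P_FAILURE * type_1
    p_2 = P_NONFAILURE * type_2
    return p_1 * cost_1 + p_2 * cost_2


rank_sum = {"BPN": 0, "CLN": 0, "PNN": 0}
mean_cost = {}
for year, models in RATES.items():
    for ratio in COST_RATIOS:
        costs = {m: relative_cost(t1, t2, ratio) for m, (t1, t2) in models.items()}
        # Rank 1 goes to the model with the lowest relative cost.
        for rank, model in enumerate(sorted(costs, key=costs.get), start=1):
            rank_sum[model] += rank
    for m, (t1, t2) in models.items():
        total = sum(relative_cost(t1, t2, r) for r in COST_RATIOS)
        mean_cost[(year, m)] = total / len(COST_RATIOS)

print(rank_sum)  # {'BPN': 31, 'CLN': 18, 'PNN': 41}
print(round(mean_cost[(1988, "BPN")], 4))  # 0.114
```

Because the Type I term grows linearly with the cost ratio while the Type II term stays fixed, the model with the lowest Type I rate (CLN) overtakes the others as soon as the ratio is large enough, which is the pattern visible in the rankings.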

Table 3: Model rank by estimated relative cost.

Model   Cost Ratio   1988   1987   1986   Rank Sum
BPN     1:1          2      2      2      31
        10:1         1      3      2
        20:1         2      2      3
        30:1         2      2      2
        40:1         2      2      2
        50:1         2      2      2
CLN     1:1          3      3      3      18
        10:1         2      1      3
        20:1         1      1      1
        30:1         1      1      1
        40:1         1      1      1
        50:1         1      1      1
PNN     1:1          1      1      1      41
        10:1         3      2      1
        20:1         3      3      2
        30:1         3      3      3
        40:1         3      3      3
        50:1         3      3      3

DISCUSSION AND CONCLUSIONS
We pursued this study with two objectives: (1) to discuss how auditors can use nonparametric ANNs as analytical techniques during a going concern evaluation, and (2) to compare the performance of three ANN techniques in terms of error rates and relative costs. As a secondary objective, we also evaluate the suitability of the backpropagation ANN for a classification problem. When the relative cost ratios are low, the probabilistic neural network and the backpropagation network perform well in our tests. As the relative cost ratios increase, the categorical learning network performs better than either the backpropagation ANN or the probabilistic neural network. Auditors performing a going concern evaluation would choose a categorical learning network over a backpropagation or probabilistic neural network in order to minimize the costs associated with an incorrect assessment of a client’s financial health.

The backpropagation network is frequently used in financial distress and other classification studies. While by its architecture it is more suited for prediction problems than classification problems, it appears to have performed reasonably accurately in classifying failed and nonfailed banks. It has low estimated overall error rates and also lower estimated relative costs than the probabilistic neural network, a network that, by its architecture, is better suited for classification problems.

A few constraints and limitations affect our results. First, the relative performance of the three ANNs would have been different if we had varied the prior probabilities of failure and nonfailure (for example, .15 and .85 instead of .02 and .98). We adopted the probabilities used in other studies for reasons of comparability and generalizability. Second, the sample composition could have influenced the results. The sample includes more solvent banks than failed banks, and we do not use a one-to-one match of failed and nonfailed banks. While the greater proportion of nonfailed banks is more reflective of the real world, the unequal sample could have caused the ANNs to memorize the input patterns of nonfailed banks during training, instead of learning the relationship between the input and output variables. This might lead the ANNs to predict nonfailed banks more accurately than failed banks. Another limitation is that the data represent only a single industry, banking, and only the years 1986, 1987, and 1988. This limits the generalizability of the results to other industries or to other periods. It would be worthwhile to extend the tests to include other industries and other ANN models.

The contribution of this study is, nevertheless, to introduce auditors to the value of ANNs with respect to going concern evaluations. Our results highlight the importance of error rates and show how different ANNs can reduce or increase the costs of misclassification.

[Received: September 26, 1996. Accepted: October 19, 1999.]


Table 4: Differences in relative costs across models.

a. Annual relative cost means

Model   1988     1987     1986
BPN     0.1140   0.3068   0.3431
CLN     0.0715   0.1809   0.2478
PNN     0.2762   0.3176   0.3437

b. t-statistics of differences between means

Year   Model   CLN                  PNN
1988   BPN     -2.3057 (0.0412)*    -2.5266 (0.0264)*
       CLN                          -3.3295 (0.0146)*
1987   BPN     -1.9011 (0.0579)     -0.1179 (0.4545)
       CLN                          -1.8990 (0.05??)
1986   BPN     -1.2823 (0.1235)     -0.0056 (0.4978)
       CLN                          -1.1989 (0.1421)

Note: Numbers marked with * (boldface in the original) denote significance at the .05 level. ?? marks digits that are illegible in the source.

REFERENCES
Altman, E. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589-609.
Altman, E., Haldeman, R., & Narayanan, P. (1977). ZETA analysis: New model to identify bankruptcy risk of corporations. Journal of Banking and Finance, (June), 29-54.
Altman, E., & McGough, T. (1974). Evaluation of a company as a going concern. Journal of Accountancy, 138(6), 51-57.
American Institute of Certified Public Accountants. (1988a). Analytical techniques. Statement on Auditing Standards No. 56. New York: AICPA.
American Institute of Certified Public Accountants. (1988b). Reports on audited financial statements. Statement on Auditing Standards No. 58. New York: AICPA.
American Institute of Certified Public Accountants. (1988c). The auditor's consideration of an entity's ability to continue as a going concern. Statement on Auditing Standards No. 59. New York: AICPA.


Barr, R. S., & Siems, T. F. (1994). Predicting bank failure using DEA to quantify management quality. Federal Reserve Bank of Dallas Financial Industry Studies, 1, 1-31.
Beaver, W. (1966). Financial ratios as predictors of failure. Journal of Accounting Research, 4, 71-111.
Bell, T. B. (1991). Discussion of ‘Towards an explanation of auditor failure to modify the audit opinions of bankrupt companies.’ Auditing, A Journal of Practice and Theory, 29(2), 14-20.
Bell, T. B., Ribar, G. S., & Verchio, J. (1990). Neural nets versus logistic regression: A comparison of each model’s ability to predict commercial bank failures. In R. P. Srivatsava, Proceedings of the 1990 Deloitte & Touche/University of Kansas Symposium of Auditing Problems, Lawrence, KS, 29-58.
Bell, T. B., & Tabor, R. H. (1991). Empirical analysis of audit uncertainty qualifications. Journal of Accounting Research, 29(2), 350-371.
Boritz, J. E., & Kennedy, D. B. (1995). Effectiveness of neural network types for prediction of business failure. Expert Systems with Applications, 9(4), 95-112.
Chen, K. C. W., & Church, B. K. (1992). Default on debt obligations and the issuance of going-concern opinions. Auditing, A Journal of Practice and Theory, 11(2), 30-49.
Coats, P. K., & Fant, L. F. (1993). Recognizing financial distress patterns using a neural network tool. Financial Management, 22(3), 143-154.
Dilla, W. N., File, R. G., Solomon, I., & Tomassini, L. A. (1991). Predictive bankruptcy judgments by auditors: A probabilistic approach. Auditing: Advances in Behavioral Research, 1, 113-129.
Dopuch, N., Holthausen, R. W., & Leftwich, R. W. (1986). Abnormal stock returns associated with media disclosures of ‘subject to’ qualified audit opinions. Journal of Accounting and Economics, 8(2), 93-117.
Elam, R. (1975). The effect of lease data on the predictive ability of financial ratios. The Accounting Review, 50(1), 25-43.
Etheridge, H. L., & Sriram, R. S. (1996). A neural network approach to financial distress analysis. Advances in Accounting Information Systems, 4, 201-222.
Etheridge, H. L., & Sriram, R. S. (1997). A comparison of the relative costs of financial distress models: Artificial neural networks, logit and multivariate discriminant analysis. Intelligent Systems in Accounting, Finance and Management, 6, 235-248.
Fanning, K., & Cogger, K. O. (1994). A comparative analysis of artificial neural networks using financial distress prediction. Intelligent Systems in Accounting, Finance and Management, 3(4), 241-252.
Frecka, T. J., & Hopwood, W. S. (1983). The effects of outliers on the cross-sectional distributional properties of financial ratios. The Accounting Review, 58(1), 115-128.


Gilbert, R., Menon, K., & Schwartz, K. B. (1990). Predicting bankruptcy for firms in financial distress. Journal of Business Finance and Accounting, 17(1), 161-171.
Graham, F. C., & Horner, J. E. (1988). Bank failure: An evaluation of the factors contributing to the failure of national banks. Issues in Bank Regulation, 12(2), 8-12.
Hamer, M. (1983). Failure prediction: Sensitivity of classification accuracy to alternative statistical methods and variable sets. Journal of Accounting and Public Policy, 2(4), 289-307.
Hooks, L. (1992). A test of the stability of early warning models of bank failures. Federal Reserve Bank of Dallas Financial Industry Studies, 2, 1-25.
Hopwood, W., McKeown, J., & Mutchler, J. (1989). A test of the incremental explanatory power of opinions qualified for consistency and uncertainty. The Accounting Review, 64(1), 28-48.
Jones, F. L. (1987). Current techniques in bankruptcy prediction. Journal of Accounting Literature, 6, 131-164.
Kida, T. (1980). An investigation into auditor’s continuity and related qualification judgments. Journal of Accounting Research, 18(2), 506-523.
Kida, T. (1984). The effect of causality and specificity on data use. Journal of Accounting Research, 22(1), 145-152.
Klersey, G. F., & Dugan, M. T. (1995). Substantial doubt: Using artificial neural networks to evaluate going concern. Advances in Accounting Information Systems, 3, 137-159.
Koh, H. C. (1991). Model predictions and auditor assessments of going concern status. Accounting and Business Research, 84, 331-338.
Koh, H. C. (1992). The sensitivity of optimal cutoff points to misclassification costs of type I and type II errors in the going-concern prediction context. Journal of Business Finance and Accounting, 19(2), 187-197.
Koh, H. C., & Killough, L. N. (1990). The use of multiple discriminant analysis in the assessment of the going concern status of an audit client. Journal of Business Finance and Accounting, 17(2), 179-192.
Koh, H. C., & Oliga, J. C. (1990). More on AUP17 and going-concern prediction models. Australian Accountant, 60(9), 67-71.
Liang, T. P., Chandler, J. S., Han, I., & Roan, J. (1992). An empirical investigation of some data effects on the classification accuracy of Probit, ID3, and Neural Networks. Contemporary Accounting Research, 9(1), 306-328.
Martin, D. (1977). Early warning of bank failure: A logit regression approach. Journal of Banking and Finance, 1, 249-276.
McKeown, J. C., Mutchler, J. F., & Hopwood, W. (1991). Toward an explanation of auditor failure to modify the audit opinions of bankrupt companies. Auditing, A Journal of Practice and Theory, 10, 1-24.


Menon, K., & Schwartz, K. B. (1987). An empirical investigation of audit qualification decisions in the presence of going concern uncertainties. Contemporary Accounting Research, 3(2), 302-315.
Mutchler, J. F. (1985). A multivariate analysis of the auditor's going concern opinion decision. Journal of Accounting Research, 23(2), 668-682.
Mutchler, J. F., & Williams, W. (1990). The relationship between audit technology, client risk profiles, and the going-concern opinion decision. Auditing, A Journal of Practice and Theory, 9, 39-54.
Needles, B. E., Anderson, H. R., & Caldwell, J. C. (1987). Principles of accounting (3rd ed.). Boston: Houghton-Mifflin.
Neuralware Inc. (1996). The Neural Works Professional II/Plus. Carnegie, PA.
Odom, M. D., & Sharda, R. (1990). A neural network model for bankruptcy prediction. In Proceedings of the International Joint Conference on Neural Networks. Amsterdam, Netherlands.
Ohlson, J. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18(1), 109-131.
Palmrose, Z. (1987). Litigation and independent auditors: The role of business failures and management fraud. Auditing, A Journal of Practice and Theory, 6(2), 90-103.
Palmrose, Z. (1988). An analysis of auditor litigation and audit service quality. The Accounting Review, 63(1), 55-73.
Parker, J. E., & Abramowicz, K. F. (1989). Predictive abilities of three modeling procedures. Journal of American Taxation Association, 1, 37-53.
Pinches, G. E., & Trieschmann, J. S. (1977). Discriminant analysis, classification results and financially distressed property-liability insurers. Journal of Risk and Insurance, 44(2), 289-298.
Sinkey, J. F., Jr. (1975). A multivariate statistical analysis of the characteristics of problem banks. Journal of Finance, 30(1), 21-36.
Smith, S. D., & Wall, L. D. (1992). Financial panics, bank failures, and the role of regulatory policy. Economic Review (Federal Reserve Bank of Atlanta), 77(1), 1-11.
Tam, K. Y., & Kiang, M. Y. (1992). Managerial applications of neural networks: The case of bank failure predictions. Management Science, 38(7), 926-947.
Watts, R. L., & Zimmerman, J. L. (1986). Positive accounting theory. Prentice-Hall contemporary topics in accounting series. Englewood Cliffs, NJ: Prentice-Hall.
Williams, R. J. (1985). Learning internal representatives by error propagation. Institute for Cognitive Science report 8506, San Diego, University of California.
Wilson, R. L., & Sharda, R. (1994). Bankruptcy prediction using neural networks. Decision Support Systems, 11(5), 544-557.
Zmijewski, M. E. (1984). Essays on corporate bankruptcy. Unpublished doctoral dissertation. University Microfilms International, Ann Arbor, MI.


Harlan L. Etheridge is an associate professor of accounting at University of Louisiana at Lafayette. He received his PhD in accounting from Louisiana State University. Dr. Etheridge’s current research interests include the application of artificial neural network technology to accounting and auditing, and investigating the impact of differences in international accounting standards on financial markets.

Ram S. Sriram is a professor at the School of Accountancy, J. Mack Robinson College of Business, Georgia State University. He received his PhD from the University of North Texas. His current research focuses on artificial intelligence techniques as they apply to accounting and on the business process issues of electronic commerce.

Kathy H. Y. Hsu is an assistant professor at the University of Louisiana at Lafayette. She received her PhD in accounting from the University of Houston. Her current research interests lie in the areas of international accounting, the application of artificial intelligence to the capital markets, and environmental accounting.