

Artificial intelligence for clinicians


On p. 119 of this issue Dr Rae and her colleagues explore the potential value of a powerful artificial intelligence tool, artificial neural networks, in the prediction of osteoporosis from clinical data. So far, artificial intelligence in medicine has trodden a rocky path [1]. Take 'expert systems', for example: these are programs that attempt to encode explicit representations of human expertise [1]. In general they are difficult to construct and fragile in operation; furthermore, the clinicians who feed in data are often unaware of the complex interactions between the variables used in the analysis, and consequently provide the system with incorrect estimates of risk [2].

Despite the obvious mathematical limitations [3,4], extensive work has been done with Bayesian analysis for clinical purposes. The disappointing conclusion is that, though simple programs and expert systems that used a Bayesian inference engine were useful as teaching aids, they did not consistently improve on the judgment of clinicians [1]. Nevertheless, Bayesian theory remains a powerful statistical tool, and Bayesian inference techniques have now even been used in neural networks [5].

The recognition of complex non-linear interactions within clinical datasets [6,7] has led to the development of additional methods of analysis [8,9]. Artificial neural networks represent one end of a spectrum of techniques for the analysis of complex data; this spectrum includes numerous regression methods for the modelling required to produce predictions [9,10], as well as other classification algorithms [11-15]. However, the development of such powerful methods of analysis does not obviate the need for appropriate design and scientific rigour. The published work abounds with regression analyses and derived prognostic indices that do not stand up to clinical or statistical scrutiny.
To ensure the validity of the final model, any regression analysis should be constructed in the manner of a research project that begins in the laboratory and proceeds to the clinic [11]. Box 1 summarizes guidelines for the design of investigations into predictive factors [10,16], and these principles apply equally to the latest methods of analysis, whose power enables them to produce results that, though superficially impressive, are not necessarily repeatable or valid.

Volume 92, March 1999

The way in which results are presented can also confuse the reader. For example, at first glance Rae and colleagues achieved remarkably accurate prediction of osteoporosis with their neural network; but, because of the low prevalence of the condition in the dataset, if all the patients had been predicted as not having osteoporosis the overall accuracy would still have been around 70%, close to the figure cited as the best response from the network. Such results are better expressed in terms of sensitivity and specificity, with corresponding predictive values and likelihood ratios or post-test odds, all of which can be given with their 95% confidence intervals [17,18].

It is also noteworthy that the authors took 20 of the 40 risk factors as being the most relevant and utilized only the 'most common', presumably the most frequently occurring risk factors in the 274 cases analysed. Given that 84 of the 274 patients were classified as osteoporotic, there were only about two events per variable in the total dataset, rising to about four per variable for the selected parameters; a minimum of ten events per variable has been suggested as a guideline for the development of multivariate analyses and the prognostic indices derived from them [10], and investigators should never assume that the power of a neural network can make up for a small sample size.

The manipulation of data can also be hazardous. Splitting groups into high risk and low risk according to a particular cut-off point is always questionable: there is no valid reason to split a group at the median value, and calculating the most significant 'data-derived' cut-off point introduces obvious bias into the final analysis [19,20]. The final model should always be validated prospectively on data not used for its construction [8-10,16]. Where this is not possible, a technique known as bootstrapping, in which multiple random datasets are generated and tested, can be used to examine the stability of the final model [10].

Box 1 Suggested guidelines for evaluation of prognostic factors

Classification
(i) Investigation: to investigate the association of a putative new factor with diagnosis or disease characteristic
(ii) Exploration: to examine how the predictive power of a new prognostic factor relates to those already available
(iii) Confirmation: to confirm the hypothesis generated in (i) and (ii) and estimate the magnitude of any effect
(iv) Application: to establish treatment effects on those subsets of patients identified by the factor

Design
Each phase should be governed by the relevant:
(a) Biological hypothesis: stated in advance of analysis
(b) Sample size calculation: the number of patients required to assemble the required number of 'events' should be calculated with appropriate power calculations
(c) Bias considerations: (i) the population should be representative, with a minimum of missing data; (ii) assays should be reliable, reproducible and blinded to clinical data
(d) Statistically valid prespecified cut-off values

Analytical methodology
Appropriate statistical techniques should be specified at the design stage

Reproducibility
Validation on data unrelated to the construction of the original model

Despite these reservations, the paper by Rae and others represents a brave attempt to exploit this rapidly developing area of non-linear analysis. In general, a neural network is potentially more successful than traditional statistical techniques when the importance of a particular variable is expressed as a complex unknown function of the value of the variable, when the prognostic impact of the variable is influenced by other prognostic variables, or when the prognostic impact varies over time [21]. There are, however, alternative statistical techniques for performing the complex analysis required to unravel non-linear relationships [9-15]; perhaps clinicians should now be exploring these newer analytical methods in addition to neural networks, so that a fair comparison can be made [22]. Neural networks have been used for predicting outcome in many clinical areas [23,24] and undoubtedly deserve further research and development. But whatever the statistical techniques applied, reliable information depends on adherence to common principles [10,16]. At the design stage of any prognostic-factor study a biomedical statistician should be consulted, along with a computer scientist if neural networks or other artificial intelligence techniques are to be used. Finally, such methods are by no means the last word: clinicians should be aware that alternative analytical techniques may yield still more information from their data.

Acknowledgment Our work is supported by the British Oncological Association.

P J Drew
J R T Monson
University of Hull Academic Surgery Unit, Castle Hill Hospital, Hull HU16 5JQ, UK
E-mail: [email protected]
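As a practical footnote, the accuracy arithmetic discussed above can be checked in a few lines of code. The sketch below is illustrative only, not taken from the paper: the 2x2 table for the "predict everyone negative" baseline (no true or false positives) follows from the 84/274 split quoted in the text, the Wilson score interval is one standard way to attach 95% confidence limits to a proportion, and the short bootstrap at the end is a toy version of the resampling idea, applied here only to the outcome labels.

```python
import math
import random

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Counts quoted in the editorial: 274 patients, 84 classified osteoporotic.
# The all-negative baseline classifier yields this 2x2 table:
tp, fn = 0, 84     # every osteoporotic case is missed
fp, tn = 0, 190    # every non-osteoporotic patient is called negative

n = tp + fn + fp + tn
accuracy = (tp + tn) / n       # 190/274, the "around 70%" figure in the text
sensitivity = tp / (tp + fn)   # 0.0: the baseline detects nobody
specificity = tn / (tn + fp)   # 1.0

lo, hi = wilson_ci(tp + tn, n)
print(f"accuracy    = {accuracy:.3f} (95% CI {lo:.3f} to {hi:.3f})")
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")

# Bootstrap sketch: resample the 274 outcome labels with replacement and
# recompute the baseline accuracy, to gauge its sampling variability.
labels = [1] * 84 + [0] * 190
random.seed(1)
boot = sorted(
    [random.choice(labels) for _ in range(n)].count(0) / n
    for _ in range(1000)
)
print(f"bootstrap 95% interval: {boot[25]:.3f} to {boot[975]:.3f}")
```

Run on these counts, the baseline scores about 69% accuracy with zero sensitivity, which is exactly why the editorial argues that overall accuracy alone is a misleading summary when prevalence is low.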

REFERENCES
1 Clancey WJ, Shortliffe EH. Readings in Medical Artificial Intelligence: the First Decade. Reading, MA: Addison-Wesley, 1984
2 Baxt WG. A neural network trained to identify the presence of myocardial infarction bases some decisions on clinical associations that differ from accepted clinical teaching. Med Decision Making 1994;14:217-22
3 Engle RL, Davis BJ. Medical diagnosis: present, past and future. I: present concepts of the meaning and limitations of medical diagnosis. Arch Intern Med 1963;112:108-15
4 Engle RL. Medical diagnosis: present, past and future. III: diagnosis in the future, including a critique on the use of electronic computers as diagnostic aids to the physician. Arch Intern Med 1963;112:126-39
5 MacKay DJC. Bayesian methods for backpropagation networks. In: Domany E, van Hemmen JL, Schulten K, eds. Models of Neural Networks III. New York: Springer-Verlag, 1994
6 Schipper H, Turley EA, Baum M. A new biological framework for cancer research. Lancet 1996;348:1149-51
7 Goldberger AL. Non-linear dynamics for clinicians: chaos theory, fractals and complexity at the bedside. Lancet 1996;347:1312-14
8 Bottaci L, Drew PJ, Hartley JE, et al. Artificial neural networks applied to outcome prediction for colorectal cancer patients in separate institutions. Lancet 1997;350:469-72
9 Clark GM, Hilsenbeck SG, Ravdin PM, De Laurentis M, Osborne CK. Prognostic factors: rationale and methods of analysis and integration. Breast Cancer Res Treat 1994;32:105-15
10 Simon R, Altman DG. Statistical aspects of prognostic factor studies in oncology. Br J Cancer 1994;69:979-85
11 Harrell FE, Lee KL, Matchar DB, Reichert TA. Regression models for prognostic prediction: advantages, problems and suggested solutions. Cancer Treat Rep 1985;69:1071-7
12 Durrleman S, Simon R. Flexible regression models with cubic splines. Stat Med 1989;8:551-61
13 Schmoor C, Ulm K, Schumacher M. Comparison of the Cox model and the regression tree procedure in analysing a randomised clinical trial. Stat Med 1993;12:2351-66
14 Segal MR. Regression trees for censored data. Biometrics 1988;44:35-47
15 Ciampi A, Lawless JF, McKinney SM, Singhal K. Regression and recursive partition strategies in the analysis of medical survival data. J Clin Epidemiol 1988;41:737-48
16 McGuire W. Breast cancer prognostic factors: evaluation guidelines. J Natl Cancer Inst 1991;83:154-5
17 Altman DG, Bland MJ. Diagnostic tests 2: predictive values. BMJ 1994;308:1552
18 Gardner MJ, Altman DG. Calculating confidence intervals for proportions and their differences. In: Gardner MJ, Altman DG, eds. Statistics with Confidence. London: BMJ Publishing Group, 1989:28-33
19 Hilsenbeck SG, Clark GM, McGuire WL. Why do so many prognostic factors fail to pan out? Breast Cancer Res Treat 1992;22:197-206
20 Altman DG. Categorising continuous variables. Cancer 1992;64:975
21 De Laurentis M, Ravdin PM. Survival analysis of censored data: neural network analysis detection of complex interactions between variables. Breast Cancer Res Treat 1994;32:113-18
22 Cross SS, Harrison RF, Kennedy RL. Introduction to neural networks. Lancet 1995;346:1075-9
23 Baxt WG. Application of artificial neural networks to clinical medicine. Lancet 1995;346:1135-8
24 Burke HB, Goodman PH, Rosen DB, et al. Artificial neural networks improve the accuracy of cancer survival prediction. Cancer 1997;79:857-62