
Eur Arch Otorhinolaryngol (2006) 263: 541–547 DOI 10.1007/s00405-006-0021-2

HEAD AND NECK ONCOLOGY

Andrew S. Jones, Azzam G. F. Taktak, Timothy R. Helliwell, John E. Fenton, Martin A. Birchall, David J. Husband, Anthony C. Fisher

An artificial neural network improves prediction of observed survival in patients with laryngeal squamous carcinoma

Received: 21 July 2005 / Accepted: 13 October 2005 / Published online: 5 May 2006
© Springer-Verlag 2006

Abstract The accepted method of modelling and predicting failure/survival, Cox's proportional hazards model, is theoretically inferior to neural-network-derived models for analysing highly complex systems with large datasets. We performed a blinded comparison of a neural network and Cox's model in predicting survival, using data from 873 treated patients with laryngeal cancer. The patients were divided randomly and equally into a training set and a study set, and Cox's and neural network models were applied in turn. The data were then divided into seven sets of binary covariates and the analysis was repeated. Overall survival did not differ significantly between the Kaplan–Meier plot and either test model. Although the network produced qualitatively similar results to Cox's model, it was significantly more sensitive to differences in survival curves for age and N stage. We propose that neural networks are capable of prediction in systems involving complex interactions between variables and non-linearity.

Keywords Artificial intelligence, Laryngeal carcinoma, Complex systems modeling, Chaos

A. S. Jones, A. G. F. Taktak, T. R. Helliwell, J. E. Fenton, M. A. Birchall, D. J. Husband, A. C. Fisher
Head and Neck Oncology Group, Faculty of Medicine, University of Liverpool, Liverpool, UK

A. S. Jones (corresponding author)
Section of Head and Neck Surgery, Faculty of Medicine, Clinical Sciences Centre, University Hospital Aintree, University of Liverpool, Longmoor Lane, Liverpool, L9 7AL, UK
E-mail: [email protected]

Background

Life is arguably the greatest triumph of nature and the most complex system studied by scientists. The mathematics required to understand and describe it is equally challenging. Biologists have understood the concepts of probability and have harnessed this mathematics to convert biology from a merely descriptive science into an exact one capable of prediction. The analysis of complex systems requires classification, explanation and prediction, and for prediction various types of regression are typically used [1, 2]. These assume linearity in the system under study, as well as a simple mathematical relationship between the input and output variables [3]. Because the methods used for analysing biological systems are probabilistic and statistical, the techniques required for analysing the survival of animals in health and disease are similarly statistical. Unsurprisingly, these techniques are also applied to the study of disease course and outcome in patients with cancer. These statistical methods are now highly developed, almost universally used and appear accurate: why, then, do we require additional, more complex techniques to advance the study of biology and medicine?

The regression models commonly used to study the association between variables and the failure of biological systems in disease are reaching their limits in today's biology. These models can deal with only a relatively small number of covariates, and their analysis of interactions is both crude and limited. The same problems apply when investigating failure, and the performance of regression models is sub-optimal. A number of mathematical and practical limitations also apply, including unknown or assumed distributions, poor curve-fitting performance and dubious methodology in deriving the baseline hazard. Neural networks have been used in medical research for the past 20 years for classification and for the prediction of failure. They have found their most extensive role as classification tools in oncology but have still not been widely adopted for explanation or prediction.
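To make the linearity assumption concrete, Cox's proportional hazards model, named in the Abstract as the accepted method, can be written as below. The notation ($h_0$, $\boldsymbol{\beta}$, $\mathbf{x}$) is a standard textbook form chosen here for illustration; it is not taken from the authors.

$$
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right), \qquad
\log\frac{h(t \mid \mathbf{x})}{h(t \mid \mathbf{x}')} = \boldsymbol{\beta}^{\top}\left(\mathbf{x} - \mathbf{x}'\right)
$$

Here $h_0(t)$ is the baseline hazard and $\mathbf{x}$ is the vector of covariates. Covariates act only through the linear predictor $\boldsymbol{\beta}^{\top}\mathbf{x}$, so the log hazard ratio between two patients is additive in the covariates and constant over time; an interaction has to be specified by hand, for example as an extra product term $\beta_{12}x_1x_2$. Because the partial likelihood used to estimate $\boldsymbol{\beta}$ does not involve $h_0(t)$, the baseline hazard must be derived in a separate step (for example with Breslow's estimator), which is where the baseline-hazard derivation criticised above comes in.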
Although neural networks can address various multivariate problems, including complex multivariate regression, they have been criticised for not being statistically or mathematically rigorous. In response, statistical refinements have been incorporated, greatly enhancing their use in the biological sciences.


Whilst they are effective at classification and can facilitate decision making [4, 5], neural networks have been seen as less effective at prognostication [6–15], with only half of the published studies claiming networks to be superior to statistically based regression methods. However, many of these studies are flawed technically, mathematically or methodologically. It should also be cautioned that some authorities consider neural networks inappropriate for analysing survival data [16].

Squamous carcinoma of the larynx has several features that make it suitable for testing the relative accuracy of neural networks and regression models in predicting survival. It can be accurately staged using clinical and radiological techniques, distant metastases occur relatively late, and it has a short hazard period (4 years), facilitating reliable follow-up. In addition, locoregional recurrence in laryngeal cancer is relatively easy to detect and typically occurs within 2 years. Finally, our group possesses an accurate and audited dataset of over 1,000 patients with laryngeal squamous carcinoma who have been rigorously followed up. The present study was designed to test the null hypothesis that neural networks are equivalent to Cox's proportional hazards model and simple Kaplan–Meier plots when predicting survival in a large number of patients from a small set of input variables.
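The null hypothesis above takes Kaplan–Meier plots as one of the comparators. As a minimal sketch (not the authors' SAS or MATLAB code), the Kaplan–Meier product-limit estimator can be written in Python as follows. It assumes each patient contributes a follow-up time and an event flag, with 1 for death from any cause and 0 for a patient censored at the date last seen; the function name `kaplan_meier` and the toy data in the usage note are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  -- follow-up time for each patient (e.g. in years)
    events -- 1 if death (any cause) was observed at that time,
              0 if the patient was censored (alive when last seen)

    Returns a list of (t, S(t)) pairs, one per distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # Sort patients by follow-up time so the risk set shrinks as t increases.
    records = sorted(zip(times, events))
    n_at_risk = len(records)
    survival = 1.0
    curve = []

    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at this same time t.
        # Patients censored at t are still counted in the risk set at t.
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Everyone who died or was censored at t leaves the risk set.
        n_at_risk -= leaving

    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])` gives survival estimates of 5/6, 2/3, 4/9 and 0 at times 1, 2, 3 and 5; the patients censored at times 2 and 4 shrink the risk set but do not themselves lower the curve. A stratified comparison, such as the binary splits described in the Methods, would run the estimator separately on each group.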

Methods

Of the 1,327 patients with squamous carcinoma of the larynx on our database, 873 patients seen over a 30-year period were included. Those excluded had not received curative treatment or had received primary treatment elsewhere; 18 were lost to follow-up. All data were entered sequentially and prospectively by two surgical oncologists as patients attended the clinic. These data were updated as new TNM classifications [17] became accepted, and the database was modernised several times. The current UICC (AJC) classification was used and performance status was scored [18]. Only squamous carcinoma was studied; it was graded as well, moderately or poorly differentiated, and 18% of tumours remained ungraded. For models incorporating grading data, these fields were deleted. TNM stage and subsite were coded on an ordinal scale, and age as a continuous interval variable. Follow-up was censored at the date last seen; death, whatever the cause, was not censored. Follow-up data were continuously updated from outpatient visits, general practitioner records, the Merseyside and Cheshire Cancer Registry and the Statistics Office. The median potential follow-up was 14.9 years. The relevant data were downloaded onto an Excel spreadsheet and imported into SAS [19] and MATLAB [20].

In this study, observed survival was used because it maximises the number of patients available for analysis by maximising the number of deaths in the study group. In addition, it is less prone to follow-up errors in the dataset and avoids any confusion between non-cancer deaths, tumour-related deaths and deaths from cancer. In the present randomised study we were not concerned with the cause of death.

The whole dataset was initially analysed using univariate methods, including the Kaplan–Meier product-limit estimator. Baseline survival was plotted as the observed survival of all patients without stratification (Fig. 1).

Fig. 1 Observed baseline survival by the three methods

Data were then analysed using binary stratification for seven host and tumour factors: age, sex, ECOG performance status, site, histology, T stage and N stage. Each factor was coded as 0 or 1 as follows: age 60=1; male=0 versus female=1; ECOG 0=0 versus ECOG 1–4=1; histology well or moderately differentiated squamous cell carcinoma=0 versus poorly differentiated=1; glottic site=0 versus supraglottic and subglottic site=1; T stage 1–2=0 versus T stage 3–4=1; and N stage 0=0 versus N stage 1–3=1. The level of α was taken as