Eur Arch Otorhinolaryngol (2006) 263: 541–547 DOI 10.1007/s00405-006-0021-2
HEAD AND NECK ONCOLOGY
Andrew S. Jones · Azzam G. F. Taktak · Timothy R. Helliwell · John E. Fenton · Martin A. Birchall · David J. Husband · Anthony C. Fisher
An artificial neural network improves prediction of observed survival in patients with laryngeal squamous carcinoma
Received: 21 July 2005 / Accepted: 13 October 2005 / Published online: 5 May 2006 © Springer-Verlag 2006
Abstract The accepted method of modelling and predicting failure/survival, Cox’s proportional hazards model, is theoretically inferior to neural network-derived models for analysing highly complex systems with large datasets. We performed a blinded comparison of the neural network and Cox’s model in predicting survival, utilising data from 873 treated patients with laryngeal cancer. The patients were divided randomly and equally into a training set and a study set, and Cox’s and neural network models were applied in turn. Data were then divided into seven sets of binary covariates and the analysis repeated. Overall survival estimated from the Kaplan–Meier plot and from either test model was not significantly different. Although the network produced qualitatively similar results to Cox’s model, it was significantly more sensitive to differences in survival curves for age and N stage. We propose that neural networks are capable of prediction in systems involving complex interactions between variables and non-linearity.
Keywords Artificial intelligence · Laryngeal carcinoma · Complex systems modeling · Chaos
A. S. Jones · A. G. F. Taktak · T. R. Helliwell · J. E. Fenton · M. A. Birchall · D. J. Husband · A. C. Fisher
Head and Neck Oncology Group, Faculty of Medicine, University of Liverpool, Liverpool, UK
A. S. Jones (✉)
Section of Head and Neck Surgery, Faculty of Medicine, Clinical Sciences Centre, University Hospital Aintree, University of Liverpool, Longmoor Lane, Liverpool, L9 7AL, UK
E-mail: [email protected]
Background Life is arguably the greatest triumph of nature and the most complex system studied by scientists. The mathematics required to understand and describe it are equally challenging. Biologists have understood the concepts of
probability and have harnessed this mathematics to convert Biology from a merely descriptive science into an exact one capable of prediction. The analysis of complex systems requires classification, explanation and prediction, and for the latter various types of regression are typically used [1, 2]. These assume linearity in the system to be studied, as well as a simple mathematical relationship between the input and output variables. As the methods used for analysing biological systems are probabilistic and statistical, the techniques required for analysing the survival of animals in health and disease are similarly statistical. Unsurprisingly, these techniques are also applied to the study of course and outcome in patients with cancer. These statistical methods are now highly developed, almost universally used and appear accurate: why then do we require additional, more complex techniques to advance the study of Biology and Medicine?
The commonly used regression models employed for studying the association between variables and the failure of biological systems in disease are reaching their limit in today’s Biology. These models can only deal with a relatively small number of covariates, and analysis of interactions is both crude and limited. When investigating failure the same problems apply and the performance of regression models is sub-optimal. In addition, a number of mathematical and practical limitations apply, including problems with unknown or assumed distributions, poor curve-fitting performance and dubious methodology in baseline hazard derivation.
Neural networks have been used in medical research for the past 20 years for classification and for prediction of failure. They have found their most extensive role as tools for classification in oncology but have still not been widely adopted for explanation or prediction.
Although neural networks can address various multivariate problems, including complex multivariate regression, they have been criticised for not being statistically or mathematically rigorous. In response, statistical refinements have been incorporated, greatly enhancing their use in the biological sciences.
Whilst they are effective at classification and can facilitate decision making [4, 5], neural networks have been seen as less effective at prognostication [6–15], with only half of the published studies claiming networks to be superior to statistically based regression methods. However, many studies are flawed technically, mathematically or methodologically. It should also be cautioned that some authorities consider neural networks inappropriate for analysing survival data.
Squamous carcinoma of the larynx has several features that make it suitable for testing the relative accuracy of neural networks and regression models in predicting survival. It can be accurately staged using clinical and radiological techniques, distant metastases occur relatively late, and it has a short hazard period (4 years), facilitating reliable follow-up. In addition, locoregional recurrence in laryngeal cancer is relatively easy to detect and typically occurs within 2 years. Finally, our group possesses an accurate and audited dataset of over 1,000 patients with laryngeal squamous carcinoma who have been rigorously followed up.
The present study was designed to test the null hypothesis that neural networks are equivalent to Cox’s proportional hazards model and simple Kaplan–Meier plots when predicting survival in a large number of patients from a small set of input variables.
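As a point of reference for the Kaplan–Meier plots mentioned above, the product-limit estimate can be sketched in a few lines of Python. This is only an illustrative sketch and not the software used in the study: the function name `kaplan_meier` and the five-patient example are hypothetical, and the event indicator assumes that death from any cause counts as an event (1) while a patient last seen alive is censored (0).

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of survival.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if censored (last seen alive)
    Returns (time, S(time)) pairs at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)      # patients still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0          # deaths plus censored patients at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical data: follow-up in years, 1 = death, 0 = censored.
example_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

In the hypothetical example, censored patients leave the risk set without lowering the curve, which is why the step at 2 years is computed from four patients at risk rather than five.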
Methods Of the 1,327 patients with squamous carcinoma of the larynx on our database, 873 patients seen over a 30-year period were included. Those excluded had not received curative treatment or had primary treatment elsewhere; 18 were lost to follow-up. All data were entered sequentially in a prospective manner by two surgical oncologists as patients attended the clinic. These data were updated as new TNM classifications became accepted and the database was modernised several times. The current UICC (AJC) classification was used and performance status scored. Only squamous carcinoma was studied and this was graded as well differentiated, moderately differentiated or poorly differentiated; 18% remained un-graded. For models that incorporated grading data, these fields were deleted. TNM stage and subsite were graded on an ordinal scale and age as a continuous interval variable. The date last seen was censored but death, whatever the cause, was not. Follow-up data were continuously updated from outpatient visits, general practitioner records, the Merseyside and Cheshire Cancer Registry and the Statistics Office. The median potential follow-up was 14.9 years. The relevant data were downloaded onto an Excel spreadsheet and imported into the SAS and MATLAB software.
Fig. 1 Observed baseline survival by the three methods
In this study observed survival was employed as it maximises the number of patients available for analysis by maximising the number of deaths in the study group. In addition, it is less prone to dataset follow-up error and avoids any confusion regarding non-cancer deaths and tumour-related death as opposed to death from cancer. In the present randomised study we were not concerned with the cause of death.
The whole dataset was provisionally analysed using univariate methods including the Kaplan–Meier product limit estimator. Baseline survival was plotted as the observed survival of all patients without stratification (Fig. 1). Data were then analysed using binary stratification for the seven host and tumour factors. Age, sex, ECOG, site, histology, T stage, and N stage were all analysed. The factors were described as 0 or 1 using the following code: Age 60=1, male=0 versus female=1, ECOG 0=0 versus ECOG 1–4=1, histology well or moderately differentiated squamous cell carcinoma=0 versus poorly differentiated=1, glottic site=0 versus supraglottic and subglottic site=1, T stage 1–2=0 versus T stage 3–4=1, and N stage 0=0 versus N stage 1–3=1. The level of α was taken as