Clinical Chemistry 48:10 1828–1834 (2002)

Oak Ridge Conference

Comparison of Logistic Regression and Neural Net Modeling for Prediction of Prostate Cancer Pathologic Stage

Robert W. Veltri,1* Manisha Chaudhari,1 M. Craig Miller,2 Edward C. Poole,3 Gerard J. O’Dowd,3 and Alan W. Partin1

Background: Prostate cancer (PCa) pathologic staging remains a challenge for the physician using individual pretreatment variables. We have previously reported that UroScoreTM, a logistic regression (LR)-derived algorithm, can correctly predict organ-confined (OC) disease state with >90% accuracy. This study compares statistical and neural network (NN) approaches to predict PCa stage.

Methods: A subset (756 of 817) of radical prostatectomy patients was assessed: 434 with OC disease, 173 with capsular penetration (NOC-CP), and 149 with metastases (NOC-AD) in the training sample. Additionally, an OC + NOC-CP (n = 607) vs NOC-AD (n = 149) two-outcome model was prepared. Validation sets included 120 or 397 cases not used for modeling. Input variables included clinical and several quantitative biopsy pathology variables. The classification accuracies achieved with a NN with an error back-propagation architecture were compared with those of LR statistical modeling.

Results: We demonstrated >95% detection of OC PCa in three-outcome models, using both computational approaches. For training patient samples that were equally distributed for the three-outcome models, NNs gave a significantly higher classification accuracy than the LR approach (96% vs 40%, respectively). In the two-outcome models using either unequal or equal case distribution, the NNs had only a marginal advantage in classification accuracy over LR.

Conclusions: The strength of a mathematics-based disease-outcome model depends on the quality of the input variables, quantity of cases, case sample input distribution, and computational methods of data processing of inputs and outputs. We identified specific advantages for NNs, especially in the prediction of multiple-outcome models, related to the ability to pre- and postprocess inputs and outputs.

© 2002 American Association for Clinical Chemistry

1 Johns Hopkins Hospital, Department of Urology, 600 North Wolfe St., Baltimore, MD 21287.
2 8032 Covered Bridge Drive, Quakertown, PA 18951.
3 UroCor, Inc., Division of Dianon Systems, Oklahoma City, OK 73104.
*Author for correspondence. Fax 410-614-3695; e-mail [email protected]
Received June 7, 2002; accepted July 23, 2002.

Prostate cancer (PCa)4 is the most common malignancy among men in the US, with an estimated ~200 000 new cases and 30 200 deaths in 2002 (1). Approximately 30% of men who are treated for localized disease will have recurrences, and a subset of these men will develop progressive disease (2–5). The clinical staging of PCa continues to utilize serum prostate-specific antigen (PSA) and the digital rectal examination. Most patients diagnosed early with organ-confined (OC) tumors are curable ~90–95% of the time with radical prostatectomy (3, 5) and ~80–95% with radiation therapy (6). Today, a significant proportion (~25–40%) of patients diagnosed with clinical stage T1c disease (PSA >4.0 µg/L and nonpalpable tumor) are found to have non-OC pathology (higher grade and/or stage) at radical prostatectomy (7–10). Thus, there are still variable but significant numbers of patients with clinically localized disease who will have pathologically non-OC PCa at the time of treatment (10–13). The goal of PCa clinical staging is to estimate the anatomic extent of the disease at the time of diagnosis (i.e., pretreatment), and several pretreatment prediction multiparameter models have been developed (4, 5, 13–19). Currently, one of the most widely used pretreatment staging prediction algorithms is based on the SPORE nomograms, also known as “Partin tables” (16, 17). These

4 Nonstandard abbreviations: PCa, prostate cancer; PSA, prostate-specific antigen; OC, organ-confined; tPSA, total PSA; CP, capsular penetration; LR, logistic regression; NN, neural network; NOC-CP, non-organ-confined with capsular penetration; and NOC-AD, non-organ-confined with metastasis.


tables were developed with use of the clinical stage, total PSA (tPSA), and Gleason sum results from 4133 men with clinically localized PCa treated with radical prostatectomy between 1982 and 1996. The Partin tables provide the population-derived likelihood of having an OC cancer, isolated capsular penetration (CP), seminal vesicle involvement, and pelvic lymph node involvement for an individual falling into a particular tPSA range, Gleason sum, and clinical stage category (17). Statistical tools, such as logistic regression (LR), have also been applied to analyze data and create patient-specific stage prediction models for use at the pretreatment decision step as well as in posttreatment prognosis for PCa disease management (13–22).

More recently, the application of neural networks (NNs) for predicting outcomes for PCa has attracted a great deal of attention, and computational solutions have been produced using various software configurations (23–27). A NN is a computing machine, implemented in software or with electronic components, that is designed as a parallel distributed processor with a propensity for storing experiential knowledge and making it available for use. NNs resemble the brain in at least two respects: (a) knowledge is acquired by the network through a learning process, and interneuron connection strengths (known as synaptic weights) are used to store knowledge (28–30); and (b) the NN is adaptive, fault tolerant, capable of very large-scale integration of information using neurobiological simulation principles, and produces a highly structured uniformity of analysis and architecture when finalized. A NN is built from an interconnection of simple computing cells, referred to as “neurons” or “processing units”.
The “learning algorithm” uses nonlinear mathematical transfer functions ranging from gaussian, sinusoid, and sigmoid to hybrids of these and sometimes combines these functions with preprocessing statistical or pre- and postprocessing genetic algorithms to create optimally performing NNs (28–30). The objective of such mathematical manipulations is to modify the synaptic weights of a network’s processing units in an orderly fashion to attain the desired outcome prediction based on the availability of sufficient quantitative inputs (training data sets). There are numerous types of NN designs capable of processing complex data to make outcome predictions, including genetic, free form, error back-propagation, radial basis function, probabilistic, generalized regression, and self-organizing feature maps (23–30). Depending on the type of problem, e.g., monitoring complex machine functions, medical outcomes, stock market forecasting, credit assignments, or pattern recognition, the NN design can be engineered to optimize the outcome prediction.

The present study compares LR and a single type of NN computational architecture to predict PCa stage, using a well-defined cohort of PCa patients to assess the robustness of each model under different training conditions.
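The role of a transfer function in a single processing unit can be illustrated with a minimal sketch. The function names, the example weights, and the particular gaussian form `exp(-z^2)` below are illustrative assumptions, not taken from the software used in this study.

```python
import math

# Illustrative transfer functions; the names and the gaussian form are assumptions.
def linear(z):
    return z

def logistic(z):
    # Sigmoid transfer function mapping any real z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def gaussian(z):
    return math.exp(-z * z)

TRANSFER = {
    "linear": linear,
    "logistic": logistic,
    "tanh": math.tanh,
    "gaussian": gaussian,
}

def neuron_output(inputs, weights, bias, transfer="logistic"):
    """Weighted sum of the inputs (synaptic weights) passed through a transfer function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return TRANSFER[transfer](z)
```

Learning then amounts to adjusting `weights` and `bias` so that the outputs of the interconnected units approach the desired outcome.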

Materials and Methods

We obtained specimens from 2400 prostate sextant biopsy cases diagnosed between 1991 and 1997 from both academic and private-practice urologists. Reliable pathologic staging and/or clinical information was obtained for 988 patients: 191 from five academic collaborators and 797 from numerous community-based private-practice urologists. Any patient with a radical prostatectomy pathology report indicating neoadjuvant therapy or noting specific histopathologic evidence of such was excluded from the study. The final patient UroScoreTM original sample included 817 cases after removal of cases with missing pretreatment tPSA values (23). None of the patients from whom the 817 patient samples had been collected had neoadjuvant therapy before the biopsy. Using the 1997 TNM staging guidelines (31), we excluded 61 cases [47 OC, 12 non-OC with CP (NOC-CP), and 2 non-OC with metastasis (NOC-AD)], yielding 756 usable cases for the current analysis. The final distribution of cases was 434 OC, 173 NOC-CP, and 149 NOC-AD patients in the training sample. The removal of these 61 cases from the patient sample did not significantly alter the prediction of patient outcomes when the revised (n = 756) models were compared with the original (n = 817) models (data not shown). The second grouping used for this study compared OC + NOC-CP (n = 607) vs NOC-AD (n = 149) patients to assess a two-outcome model.

The validation patient sample set was based on the radical prostatectomy reports from an additional 120 PCa biopsy cases obtained from two sites: 98 from Dr. Rube Hundley (Urology Associates of Dothan, P.A., Dothan, AL) and 22 from Johns Hopkins Hospital. These cases were audited and classified in the same manner. This validation group case distribution included 60 OC cases, 30 NOC-CP cases, and 30 NOC-AD cases. For the two-outcome model, the validation case distribution was 90 OC + NOC-CP cases and 30 NOC-AD cases.
In addition, when we constructed equal case distribution models, all remaining unused cases not in the training set were included in the validation runs. All 756 training-set biopsy specimens and the 98 Dothan Urology validation specimens were processed and evaluated at a large national pathology reference laboratory (Oklahoma City, OK), whereas the 22 Johns Hopkins Hospital biopsies were processed at that institution. The sextant biopsy pathology variables measured included the following: number of positive cores, highest Gleason sum, presence of Gleason grade 4 and/or 5, total percentage of tumor involvement, average percentage of tumor involvement per core (formulated by dividing the total percentage of tumor involvement by the total number of cores), average percentage of tumor involvement per positive core (formulated by dividing the total percentage of tumor involvement by the number of positive cores), and the tumor location (i.e., ≥5% tumor involvement in base, mid, and/or apex core). The tPSA assay used the Food and Drug Administration-approved equimolar TOSOH (UroCor Labs) or Hybritech (Johns Hopkins Hospital) methods, and comparison of both tests yielded a correlation coefficient >98%. For the purposes of our analyses, the tPSA results were categorized by increments of 2 µg/L, based on a previously described method (23).

Table 1 provides the patient demographics for the training (n = 756) and validation (n = 120) sets. As shown in Table 1, the age distribution was similar with no statistically significant differences among the outcome groups for the training and validation sets. Table 1 also demonstrates that with increasing disease severity (OC → NOC-CP → NOC-AD), similar trends were observed for the input variables of the training and validation sets.

We used the Stata v7.0 statistical software program for all LR methods, as detailed previously (23). Briefly, all of the biopsy pathology and clinical variables were examined multivariately by use of a stringency set at P = 0.20 for all independent variables and selection by backward stepwise LR to predict a three-outcome dependent variable of OC vs NOC-CP vs NOC-AD or the binary two-outcome dependent variable of OC + NOC-CP vs NOC-AD. The formulae and functions used to calculate patient-specific outcome predictive probabilities based on the results of the ordered LR (OLOGIT) analysis have been published (23).

The iUnderstand v1.4 (BioComp Systems, Inc.) software program was used to construct an error back-propagation NN that included pre- and postprocessing statistical and genetic algorithms to optimize NN performance. We constructed an optimized three- and two-outcome model using unequal or equal case input distribution and the fixed variables summarized in Table 2. The NN used either LR statistical preselected or nonpreselected input variables, the parameters in Table 2, and 200 cycles or iterations. When LR preselection was applied, we used backward stepwise LR at a stringency of P = 0.20.
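How an ordered logit (OLOGIT) fit yields patient-specific stage probabilities can be sketched as follows. This uses the standard ordered-logit form, P(stage ≤ k) = logistic(cut_k − xb). The coefficient and cutpoint values and the two predictors shown are hypothetical placeholders for illustration only; the published UroScore formulae are in reference 23 and are not reproduced here.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(xb, cutpoints):
    """Category probabilities from a linear predictor xb and ascending cutpoints.

    P(category <= k) = logistic(cutpoint_k - xb); each category probability is
    the difference between successive cumulative probabilities.
    """
    cumulative = [logistic(k - xb) for k in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

STAGES = ["OC", "NOC-CP", "NOC-AD"]

# Hypothetical values, NOT the fitted model from this study.
HYPOTHETICAL_COEFS = {"positive_cores": 0.40, "gleason_sum": 0.30}
HYPOTHETICAL_CUTPOINTS = [3.0, 4.5]

def stage_probabilities(patient):
    """Map a dict of (hypothetical) predictors to a probability per stage."""
    xb = sum(HYPOTHETICAL_COEFS[name] * patient[name] for name in HYPOTHETICAL_COEFS)
    return dict(zip(STAGES, ordered_logit_probs(xb, HYPOTHETICAL_CUTPOINTS)))
```

Because the three stages are ordered by severity, a larger linear predictor shifts probability mass away from OC and toward NOC-AD.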
The NN computed relationships between input variables and outcomes, identifying the 10 “fittest” solutions, and ultimately “evolved” a single optimized network. The optimization processing used a randomized training-testing set ratio for the 756 patients of 60:30:10 for training, testing, and robustness determination. Validation used only the PCa cases not used for training the NN and LR models.

Table 1. Summary of training and validation patient demographics (mean values).

                                     Training (n = 756)           Validation (n = 120)
Variable                             OC      NOC-CP   NOC-AD      OC      NOC-CP   NOC-AD
Age, years                           62.6    63.1     67.9        63.1    64.6     63.4
tPSA, µg/L                           8.3     11.1     100.3       6.4     7.7      25.5
Positive cores, n                    2.1     2.6      3.6         2       2.1      3.3
Total involvement, %                 61.3    105.7    209.1       61.4    85.6     156.5
Average involvement per core, %      11.3    19.7     37.4        15.8    21.4     26.3
Involvement of positive core, %      26.5    36.0     50.7        30.7    37.8     42.0

Table 2. Error back-propagation NN training and testing fixed parameters.

Parameter                                    Value
Target solution                              Classification
Model type                                   Back-propagation
Accuracy metric                              Classification
Data for modeling, %                         60%
Data for optimizing, %                       30%
Data for selection, %                        10%
Number of iterations                         200
Learning rates decrease from                 0.4 to 0.1
Momentums decrease from                      0.2 to 0.05
Tau learning rates decrease from             0.4 to 0.1
Tau momentums decrease from                  0.2 to 0.05
Number of hidden layers                      2–3
Number of nodes per layer                    1–8
Transfer functions at hidden layer nodes     Linear, logistic, and tangent hyperbolic
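To make the fixed parameters in Table 2 concrete, the following is a minimal sketch of error back-propagation with a learning rate that falls from 0.4 to 0.1 and a momentum that falls from 0.2 to 0.05 over 200 iterations. It is not the iUnderstand implementation: the linear decay schedule, the single hidden layer of tanh units, the cross-entropy loss, and the toy data set are all assumptions made for illustration, and the genetic optimization and tau parameters are omitted.

```python
import math
import random

def linear_schedule(start, end, step, n_steps):
    """Assumed linear decay from start to end across n_steps iterations."""
    frac = step / (n_steps - 1) if n_steps > 1 else 0.0
    return start + (end - start) * frac

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, n_hidden=3, n_iter=200, seed=0):
    """Full-batch back-propagation; returns the mean loss at each iteration."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Synaptic weights (last entry of each row is the bias weight).
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    v_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]  # momentum terms
    v_o = [0.0] * (n_hidden + 1)
    losses = []
    for it in range(n_iter):
        lr = linear_schedule(0.4, 0.1, it, n_iter)
        mom = linear_schedule(0.2, 0.05, it, n_iter)
        g_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g_o = [0.0] * (n_hidden + 1)
        loss = 0.0
        for x, y in data:
            # Forward pass: tanh hidden units, logistic output unit.
            xb = list(x) + [1.0]
            h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            p = logistic(sum(w * v for w, v in zip(w_o, hb)))
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            # Backward pass: propagate the output error to each weight.
            d_out = p - y
            for j in range(n_hidden + 1):
                g_o[j] += d_out * hb[j]
            for j in range(n_hidden):
                d_hid = d_out * w_o[j] * (1.0 - h[j] ** 2)
                for i in range(n_in + 1):
                    g_h[j][i] += d_hid * xb[i]
        n = len(data)
        losses.append(loss / n)
        # Weight update with momentum.
        for j in range(n_hidden + 1):
            v_o[j] = mom * v_o[j] - lr * g_o[j] / n
            w_o[j] += v_o[j]
        for j in range(n_hidden):
            for i in range(n_in + 1):
                v_h[j][i] = mom * v_h[j][i] - lr * g_h[j][i] / n
                w_h[j][i] += v_h[j][i]
    return losses

# Toy two-input data set (logical AND), purely illustrative.
TOY_DATA = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 1)]
losses = train(TOY_DATA)
```

A larger early learning rate moves the weights quickly toward a useful region, and the smaller late values refine them; that is the usual motivation for a decreasing schedule of this kind.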

Results

Shown in Table 3A is a comparison of the mathematical performance of the recapitulated UroScore model applying the same 35% cutoffs previously used (only with the n = 756 data set) as described in the Materials and Methods above. The high percentages of correct classifications of OC by OLOGIT and NN and the overall accuracy were very similar to those described in the original report (23). Preselection by LR of the NN inputs had only a minor impact on NOC-CP identification. The 120-case validation set yielded an OC prediction of 91.7% for OLOGIT and 100% for NN regardless of statistical preselection (Table 3B). OLOGIT identified 20% of the NOC-CP validation cases, whereas the NN identified only 3.3% of these cases. Overall accuracy for validation was indistinguishable for the LR and NN models.

When the OLOGIT and NN three-outcome computational models were built using an equal number of cases in each pathologic stage category, the impact on OC correct classification was significant (Table 4A). The NN modeling approach for the three-outcome models achieved a ≥96% correct classification of OC cancer compared with 40% for OLOGIT. However, the LR model approach enabled much improved classification of the NOC-CP group (67.3% vs 14.7%; Table 4A). On the other hand, on validation, the NN model was much more robust, with overall accuracies of 75.8–78.2% vs only 35.9% for LR.

We next tested a two-outcome computational model based on the fact that patients with OC and NOC-CP clinically localized disease will all be definitively treated with either radical prostatectomy or radiotherapy. Table 5 summarizes the unequal case input models and validation results and illustrates the marked improvement in


Table 3. Performance characteristics of OLOGIT and NN: Unequal number of cases.a

A. Training set (n = 756; three-outcome)

                                                    Predicted outcome
Model                       Actual outcome    n     OC     NOC-CP   NOC-AD   Correctly classified, %
OLOGIT model                OC               434    395    25       14       91.0
                            NOC-CP           173    137    14       22       8.1
                            NOC-AD           149    51     15       83       55.7
                            Overall accuracy                                 65.1
NN, preselected inputs      OC               434    420    1        13       96.8
                            NOC-CP           173    141    12       20       6.9
                            NOC-AD           149    58     3        88       59.1
                            Overall accuracy                                 68.8
NN, no preselected inputs   OC               434    425    0        9        97.9
                            NOC-CP           173    148    0        25       0.0
                            NOC-AD           149    64     0        85       57.0
                            Overall accuracy                                 67.5

B. Validation set (n = 120; three-outcome)

                                                    Predicted outcome
Model                       Actual outcome    n     OC     NOC-CP   NOC-AD   Correctly classified, %
OLOGIT model                OC               60     55     2        3        91.7
                            NOC-CP           30     22     6        2        20.0
                            NOC-AD           30     16     3        11       36.7
                            Overall accuracy                                 60.0
NN, preselected inputs      OC               60     60     0        0        100.0
                            NOC-CP           30     28     1        1        3.3
                            NOC-AD           30     18     1        11       36.7
                            Overall accuracy                                 60.0
NN, no preselected inputs   OC               60     60     0        0        100.0
                            NOC-CP           30     29     0        1        0.0
                            NOC-AD           30     18     0        12       40.0
                            Overall accuracy                                 60.0

a Used cutoff of ≥35% only for OLOGIT probabilities. NN classifies predictions into one outcome.
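The "Correctly classified, %" and "Overall accuracy" entries in Tables 3–6 follow directly from the counts of actual vs predicted outcomes. A minimal sketch of that arithmetic, using the OLOGIT training counts from Table 3A (the function and variable names are illustrative):

```python
def classification_summary(confusion):
    """confusion[actual][predicted] -> case count.

    Returns (percent correctly classified per actual outcome, overall accuracy %),
    each rounded to one decimal place as in the tables.
    """
    per_class = {}
    correct = total = 0
    for actual, row in confusion.items():
        n = sum(row.values())
        hits = row.get(actual, 0)
        per_class[actual] = round(100.0 * hits / n, 1)
        correct += hits
        total += n
    return per_class, round(100.0 * correct / total, 1)

# OLOGIT model, training set (n = 756), counts from Table 3A.
OLOGIT_TRAINING = {
    "OC":     {"OC": 395, "NOC-CP": 25, "NOC-AD": 14},
    "NOC-CP": {"OC": 137, "NOC-CP": 14, "NOC-AD": 22},
    "NOC-AD": {"OC": 51,  "NOC-CP": 15, "NOC-AD": 83},
}

per_class, overall = classification_summary(OLOGIT_TRAINING)
```

Here `per_class` reproduces the 91.0%, 8.1%, and 55.7% entries and `overall` the 65.1% overall accuracy (492 of 756 cases on the diagonal).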

classification of OC and NOC-AD as well as the overall percentage correctly classified. The NN outperformed the LR model in the classification of NOC-AD, which gave it a marginal advantage in overall accuracy in both the training (~89% vs ~87%) and validation testing (~83.3% vs ~79.2%). Overall accuracy for these two-outcome models showed a marked improvement over the three-outcome models, and the NNs tended to show improved performance over LR.

When we used the equal case distribution for training, the OLOGIT and NN performed very similarly (Table 6A). These results also held up in the validation testing with a highly skewed patient cohort. Thus, in a two-outcome equal case distribution model, the statistical and NN computational models validated with very similar overall and individual percentages of patient categories correctly classified. On validation, there was a slight advantage of ~5% for the NN over LR (Table 6B). Overall accuracy for correct identification of the validation cases was similar for both two-outcome models (Tables 5B and 6B).

Discussion

Among contemporary PCa patients, there is limited clinical outcome predictive value when only the individual variables such as clinical stage, serum PSA, Gleason score, or grade of the biopsy are used to counsel the patient at pretreatment. The generation of additional quantitative pathology variables and the unreliability of this prior approach led to the development of several multimodal staging tools (13–21). These multiparameter computational tools incorporate several clinical and biopsy pathology variables, including DNA ploidy, to predict specific endpoints, such as pathologic stage or disease-free survival (7, 8, 13–15). The validity of some of these tools may become compromised when the training sets come from a demographically restricted cohort and the validation cases come from multiple demographically unrestricted sources (e.g., academic centers of excellence, community-based practices, or combinations thereof). Under such circumstances, it is not uncommon that the performance

Table 4. Performance characteristics of OLOGIT and NN: Equal number of cases.a

A. Training set (n = 449; three-outcome)

                                                    Predicted outcome
Model                       Actual outcome    n     OC     NOC-CP   NOC-AD   Correctly classified, %
OLOGIT model                OC               150    60     80       10       40.0
                            NOC-CP           150    22     101      27       67.3
                            NOC-AD           149    8      43       98       65.8
                            Overall accuracy                                 57.1
NN, preselected inputs      OC               150    144    5        1        96.0
                            NOC-CP           150    110    22       18       14.7
                            NOC-AD           149    56     3        90       60.4
                            Overall accuracy                                 57.0
NN, no preselected inputs   OC               150    147    0        3        98.0
                            NOC-CP           150    128    0        22       0.0
                            NOC-AD           149    43     0        106      71.1
                            Overall accuracy                                 56.3

B. Validation set (n = 427; three-outcome)

                                                    Predicted outcome
Model                       Actual outcome    n     OC     NOC-CP   NOC-AD   Correctly classified, %
OLOGIT model                OC               344    103    208      33       29.9
                            NOC-CP           53     8      35       10       66.0
                            NOC-AD           30     0      15       15       50.0
                            Overall accuracy                                 35.9
NN, preselected inputs      OC               344    307    23       14       89.2
                            NOC-CP           53     46     4        3        7.5
                            NOC-AD           30     15     3        12       40.0
                            Overall accuracy                                 75.8
NN, no preselected inputs   OC               344    317    0        27       92.2
                            NOC-CP           53     45     0        8        0.0
                            NOC-AD           30     14     0        16       53.3
                            Overall accuracy                                 78.2

a Used cutoff of ≥35% only for OLOGIT probabilities. NN classifies predictions into one outcome.
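The equal case distribution design can be sketched as a capped draw from each stage, with every unused case moved to validation. The random draw, the seed, and the cap of 150 per stage are assumptions chosen to match the group sizes in Table 4A; the source does not state how the equal-distribution cases were selected.

```python
import random

def balanced_split(cases_by_stage, per_stage=150, seed=0):
    """Draw up to per_stage cases from each stage for training; hold out the rest."""
    rng = random.Random(seed)
    training, held_out = [], []
    for stage, cases in cases_by_stage.items():
        shuffled = list(cases)
        rng.shuffle(shuffled)
        k = min(per_stage, len(shuffled))
        training += [(stage, case) for case in shuffled[:k]]
        held_out += [(stage, case) for case in shuffled[k:]]
    return training, held_out

# Case identifiers are placeholders; only the group sizes come from the text.
cohort = {"OC": range(434), "NOC-CP": range(173), "NOC-AD": range(149)}
training, held_out = balanced_split(cohort)
```

With the 434/173/149 cohort this gives 449 training cases and 307 held-out cases (284 OC and 23 NOC-CP). Adding the 120 external validation cases (60/30/30) gives the 344/53/30 distribution and n = 427 shown in Table 4B.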


Table 5. Performance characteristics of OLOGIT and NN: Unequal number of cases.a

A. Training set (n = 756; two-outcome)

                                                    Predicted outcome
Model                       Actual outcome    n     OC/NOC-CP   NOC-AD   Correctly classified, %
OLOGIT model                OC               607    586         21       96.5
                            NOC-AD           149    75          74       49.6
                            Overall accuracy                             87.3
NN, preselected inputs      OC               607    573         34       94.4
                            NOC-AD           149    49          100      67.1
                            Overall accuracy                             89.0
NN, no preselected inputs   OC               607    571         36       94.1
                            NOC-AD           149    48          101      67.8
                            Overall accuracy                             88.9

B. Validation set (n = 120; two-outcome)

                                                    Predicted outcome
Model                       Actual outcome    n     OC/NOC-CP   NOC-AD   Correctly classified, %
OLOGIT model                OC               90     89          1        98.9
                            NOC-AD           30     24          6        20.0
                            Overall accuracy                             79.2
NN, preselected inputs      OC               90     86          4        95.6
                            NOC-AD           30     16          14       46.7
                            Overall accuracy                             83.3
NN, no preselected inputs   OC               90     87          3        96.7
                            NOC-AD           30     16          14       46.7
                            Overall accuracy                             84.2

a Used cutoff of ≥35% only for OLOGIT probabilities. NN classifies predictions into one outcome.

of some decision support tools may deteriorate when exposed to new cases (27, 28). Therefore, tools that do not withstand rigorous validation from the same or multiple institutions, including community-based practices, may not be as useful to aid in clinical decision-making (28).

As a result of increased PSA screening and public awareness, there is a predominance of clinically localized disease, including a predominance of T1c clinical stage cancers, in the US population (5, 7, 8). Additionally, the pathologic grades and stages of these contemporary cancers have decreased with a concomitant drop in mortality from PCa (2, 3, 5, 7–9, 20, 21). Likewise, PSA concentrations and standard biopsy pathologies have consolidated and become more similar (Gleason 6 and 7) among more contemporary patient cohorts (2, 3, 5, 13, 20, 21). It is therefore becoming a more challenging task for the urologist to predict the pathologic stage of those cancers that clinically present so uniformly with respect to the major pretreatment input variables often used: age, tPSA and/or its derivatives, Gleason score, and clinical stage. Additional information, derived from a more detailed evaluation of the prostate biopsy, is readily available and has been shown to provide critical information in the past, and today, it may even outperform the Gleason score on multivariate analysis (5, 13–17, 20–22, 27, 32–34). For example, Badalament et al. (14) and others (15, 22, 23) identified high Gleason score and grade and number of biopsy-positive cores with cancer as two very important variables for pathologic stage prediction. Other investigators noted that the percentage of cores positive for cancer and surface-area-positive for cancer were the best predictors for pathologic stage and cancer volume in multivariate logistic regression analyses that also included PSA, age, clinical stage, and Gleason score (13–15, 18–22, 32–34). Aside from the emerging evidence for more detailed quantitative biopsy assessment, other variables, such as DNA ploidy (14, 15), quantitative nuclear morphometric analysis (8, 14, 15, 25, 32–34), or other serum and tissue markers, may provide information to overcome the challenge of optimal PCa staging (15, 33).

Previously (23), we demonstrated high diagnostic accuracy for OC disease when a more detailed evaluation of prostate biopsies, termed quantitative prostate biopsy pathology, was used to develop and challenge a statistical (LR) or NN model to predict PCa stage based on preoperative variables. These models underwent additional validation with a subset of 116 new patients and were able to correctly classify patients with OC cancers at the >90% level. The focus of the present report was to compare LR, a demonstrated statistical modeling method that we have

Table 6. Performance characteristics of OLOGIT and NN: Equal number of cases.a

A. Training set (n = 299; two-outcome)

                                                    Predicted outcome
Model                       Actual outcome    n     OC/NOC-CP   NOC-AD   Correctly classified, %
OLOGIT model                OC               150    125         25       83.3
                            NOC-AD           149    41          108      72.5
                            Overall accuracy                             77.9
NN, preselected inputs      OC               150    133         17       88.7
                            NOC-AD           149    38          111      74.5
                            Overall accuracy                             81.6
NN, no preselected inputs   OC               150    133         17       88.7
                            NOC-AD           149    38          111      74.5
                            Overall accuracy                             81.6

B. Validation set (n = 576; two-outcome)

                                                    Predicted outcome
Model                       Actual outcome    n     OC/NOC-CP   NOC-AD   Correctly classified, %
OLOGIT model                OC               547    463         84       84.6
                            NOC-AD           30     13          17       56.7
                            Overall accuracy                             83.2
NN, preselected inputs      OC               547    495         52       90.5
                            NOC-AD           30     13          17       56.7
                            Overall accuracy                             88.7
NN, no preselected inputs   OC               547    486         61       88.8
                            NOC-AD           30     12          18       60.0
                            Overall accuracy                             87.3

a Used cutoff of ≥35% only for OLOGIT probabilities. NN classifies predictions into one outcome.


applied successfully in the past (23), to an updated version of the NN software program (BioComp Systems, Inc.). Although the present report uses a subset of the original 817 patients (n = 756), its purpose was to compare the impact on patient outcome predictions when different patient sample training set distributions (equal and unequal) were used for the original three-outcome model. In addition, we developed new two-outcome statistical and NN models that were more clinically relevant, using the same patient distribution approaches we assessed for the three-outcome model. By doing so, we have confirmed the performance of the UroScore statistical and NN three-outcome models and determined that only when an equal number of cases in each pathologic stage category is used for training does the NN dramatically outperform the LR model (Tables 3B vs 4B). In addition, we developed and validated new two-outcome clinically appropriate computational models and compared the statistical and NN computational modeling approaches with equal and unequal patient training sets. Under these modeling conditions, we demonstrated an overall correct classification advantage of ~5% for NNs over LR to classify patients as either OC with or without CP (clinically localized curable disease) or advanced (positive seminal vesicles, lymph nodes, or bone disease).

In summary, when attempting to develop PCa staging pretreatment models, multiple quantitative biopsy pathology variables are strongly valued by either method of computation, NN or LR. However, the case distribution, whether unequal or equal, can severely impact training and validation of the LR three-outcome model (Table 3 vs Table 4). The two-outcome LR and NN models performed with near parity regardless of training case input distribution.

We can make several observations regarding the differential performance of the three- vs two-outcome models: (a) It may be assumed that a large proportion of NOC-CP cancers are biologically similar to OC cancers, as evidenced by the degree of overlap for the distribution of the input variables in each group, and therefore, the current input variables applied in the three-outcome model may not be reliable enough to accurately classify these subtly different cancers. The biological outcome of surgically removed cancers that demonstrate CP on pathologic examination supports the assumption of a comparable behavior of at least a proportion of pT3 cancers. In a study of 721 men at Johns Hopkins, 58% of men with evident CP had no evidence of biochemical recurrence 10 years after surgery (35). (b) In a two-outcome model, the skewness of case distribution for model development is not as critical as in the three-outcome model, at least when LR is compared with a modified error back-propagation NN. (c) NNs tend to be more robust on validation, especially when the case distribution for training is equal. (d) This comparison of computational methods to produce patient-specific outcomes demonstrates some of the strengths and weaknesses of the two approaches.

Clearly for the future, these mathematical multiparameter approaches to predict disease outcomes will continue to be of value when they are properly designed, tested, and validated. There is little doubt that larger case numbers should be used for training, and we need to validate new pathologic and molecular biomarkers to further improve the performance of such computational algorithms (15, 33). Ultimately, our greatest challenge will be to provide quality and up-to-date disease management tools delivered to the urologist in forms such as the Partin tables (17), the Kattan nomogram (18), and UroScore (23).

References

1. National Prostate Cancer Coalition. 2002 fact sheet. http://www.4npcc.org/Fact_Sheet2002.pdf (Accessed June 2002).
2. Greenlee RT, Murray T, Bolden S, Wingo PA. Cancer statistics, 2000. CA Cancer J Clin 2000;50:7–33.
3. Merrill RM, Brawley OW. Prostate cancer incidence and mortality rates among white and black men. Epidemiology 1997;8:126–31.
4. O’Dowd GJ, Miller MC, Orozco R, Veltri RW. Analysis of repeat biopsy results within one year following a non-cancer diagnosis. Urology 2000;55:553–9.
5. Pound CR, Partin AW, Eisenberger MA, Chan DW, Pearson JD, Walsh PC. Natural history progression after PSA elevation following radical prostatectomy. JAMA 1999;281:1591–7.
6. Shipley WU, Thames HD, Sandler HM, Hanks GE, Zietman AL, Perez CA, et al. Radiation therapy for clinically localized prostate cancer: a multi-institutional pooled analysis. JAMA 1999;281:1598–604.
7. Han M, Partin AW, Epstein JI, Walsh PC. Long-term biochemical disease-free and cancer-specific survival following anatomic radical retropubic prostatectomy. The 15-year Johns Hopkins experience [Review]. Urol Clin North Am 2001;28:555–65.
8. Veltri RW, Miller MC, Mangold LA, O’Dowd GJ, Epstein JI, Partin AW. Prediction of pathological stage in clinical stage T1c prostate cancer patients: the new challenge. J Urol 2002;168:100–4.
9. Ghavamian R, Blute ML, Bergstralh EJ, Slezak J, Zincke H. Comparison of clinically nonpalpable prostate-specific antigen-detected (cT1c) versus palpable (cT2) prostate cancers in patients undergoing radical retropubic prostatectomy. Urology 1999;54:105–10.
10. Polascik TJ, Oesterling JE, Partin AW. Prostate specific antigen: a decade of discovery--what we have learned and where we are going. J Urol 1999;162:293–306.
11. Lalani el-N, Laniado ME, Abel PD. Molecular and cellular biology of prostate cancer. Cancer Metastasis Rev 1997;16:29–66.
12. Yoshida BA, Chekmareva MA, Wharam JF, Kadkhodaian M, Stadler WM, Boyer A, et al. Prostate cancer metastasis-suppressor genes: a current perspective. In Vivo 1998;12:49–58.
13. O’Dowd GJ, Veltri RW, Orozco R, Miller MC, Oesterling JE. Update on the appropriate staging evaluation for newly diagnosed prostate cancer. J Urol 1997;158:687–98.
14. Badalament RA, Miller MC, Peller PA, Young DC, Bahn DK, Kochie P, et al. An algorithm for predicting non-organ confined prostate cancer using the results obtained from sextant core biopsies and prostate specific antigen level. J Urol 1996;156:1375–80.
15. Veltri RW, O’Dowd GJ, Orozco R, Miller MC. The role of biopsy pathology, quantitative nuclear morphometry, and biomarkers in the pre-op prediction of prostate cancer staging and prognosis. Semin Urol Oncol 1998;16:106–17.
16. Partin AW, Steinberg GD, Pitcock RV, Wu L, Piantadosi S, Coffey DS, et al. Use of nuclear morphometry, Gleason histologic scoring, clinical stage, and age to predict disease-free survival among patients with prostate cancer. Cancer 1992;70:161–8.
17. Partin AW, Kattan MW, Subong EN, Walsh PC, Wojno KJ, Oesterling JE, et al. Combination of prostate-specific antigen, clinical stage, and Gleason score to predict pathological stage of localized prostate cancer. A multi-institutional update. JAMA 1997;277:1445–51.
18. Kattan MW, Stapleton AM, Wheeler TM, Scardino PT. Evaluation of a nomogram used to predict the pathologic stage of clinically localized prostate carcinoma. Cancer 1997;79:528–37.
19. Vollmer RT, Keetch DW, Humphrey PA. Predicting the pathology results of radical prostatectomy from preoperative information. Cancer 1998;83:1567–80.
20. Partin AW, Mangold LA, Lamm DM, Walsh PC, Epstein JI, Pearson JD. Contemporary update of prostate cancer staging nomograms (Partin tables) for the new millennium. Urology 2001;58:843–8.
21. Ross PL, Scardino PT, Kattan MW. A catalog of prostate cancer nomograms. J Urol 2001;165:1562–8.
22. Gilliland FD, Hoffman RM, Hamilton A, Albertsen P, Eley JW, Harlan L, et al. Predicting extracapsular extension of prostate cancer in men treated with radical prostatectomy: results from the population based prostate cancer outcomes study. J Urol 1999;162:1341–5.
23. Veltri RW, Miller MC, Partin AW, Poole EC, O’Dowd GJ. Prediction of prostate carcinomas stage by quantitative biopsy pathology. Cancer 2001;91:2322–8.
24. Tewari A, Narayan P. Novel staging tool for localized prostate cancer: a pilot study using genetic adaptive neural networks. J Urol 1998;160:430–6.
25. Potter SR, Miller MC, Mangold LA, Jones KA, Epstein JI, Veltri RW, et al. Genetically engineered neural networks for predicting prostate cancer progression after radical prostatectomy. Urology 1999;54:791–5.
26. Narayan P, Gajendran V, Taylor SP, Tewari A, Presti JC, Leidich R, et al. The role of transrectal ultrasound-guided biopsy-based staging, preoperative serum prostate-specific antigen, and biopsy Gleason score in prediction of final pathologic diagnosis in prostate cancer. Urology 1995;46:205–12.
27. Graefen M, Haese A, Pichlmeier U, Hammerer PG, Noldus J, Butz K, et al. A validated strategy for side specific prediction of organ confined prostate cancer: a tool to select for nerve sparing radical prostatectomy. J Urol 2001;165:857–63.
28. Wei JT, Zhang Z, Barnhill SD, Madyastha KR, Zhang H, Oesterling JE. Understanding artificial neural networks and exploring their potential applications for the practicing urologist. Urology 1998;52:161–72.
29. Reckwitz T, Potter SR, Snow PB, Zhang Z, Veltri RW, Partin AW. Artificial neural networks in urology: update 2000. Prostate Cancer Prostatic Dis 1999;2:222–6.
30. Forrest S. Genetic algorithms: principles of natural selection applied to computation. Science 1993;261:872–8.
31. Sobin LH, Fleming ID. TNM classification of malignant tumors, fifth edition (1997). Union Internationale Contre le Cancer and the American Joint Committee on Cancer. Cancer 1997;80:1803–4.
32. Veltri RW, Partin AW, Miller MC. Quantitative nuclear grade (QNG): a new image analysis-based biomarker of clinically relevant nuclear structure alterations. J Cell Biochem Suppl 2000;35:151–7.
33. Veltri RW, Miller MC, An G. Standardization, analytical validation, and quality control of intermediate endpoint biomarkers. Urology 2001;57(Suppl 4A):164–70.
34. Sebo TJ, Bock BJ, Cheville JC, Lohse C, Wollan P, Zincke H. The percent of cores positive for cancer in prostate needle biopsy specimens is strongly predictive of tumor stage and volume at radical prostatectomy. J Urol 2000;163:174–8.
35. Noldus J, Graefen M, Haese A, Henke RP, Hammerer P, Huland H. Stage migration in clinically localized prostate cancer. Eur Urol 2000;38:74–8.

Oak Ridge Conference

Comparison of Logistic Regression and Neural Net Modeling for Prediction of Prostate Cancer Pathologic Stage

Robert W. Veltri,1* Manisha Chaudhari,1 M. Craig Miller,2 Edward C. Poole,3 Gerard J. O’Dowd,3 and Alan W. Partin1

Background: Prostate cancer (PCa) pathologic staging remains a challenge for the physician using individual pretreatment variables. We have previously reported that UroScore™, a logistic regression (LR)-derived algorithm, can correctly predict organ-confined (OC) disease state with >90% accuracy. This study compares statistical and neural network (NN) approaches to predict PCa stage.

Methods: A subset (756 of 817) of radical prostatectomy patients was assessed: 434 with OC disease, 173 with capsular penetration (NOC-CP), and 149 with metastases (NOC-AD) in the training sample. Additionally, an OC + NOC-CP (n = 607) vs NOC-AD (n = 149) two-outcome model was prepared. Validation sets included 120 or 397 cases not used for modeling. Input variables included clinical and several quantitative biopsy pathology variables. The classification accuracies achieved with a NN with an error back-propagation architecture were compared with those of LR statistical modeling.

Results: We demonstrated >95% detection of OC PCa in three-outcome models, using both computational approaches. For training patient samples that were equally distributed for the three-outcome models, NNs gave a significantly higher correct classification of OC disease than the LR approach (96% vs 40%, respectively). In the two-outcome models using either unequal or equal case distribution, the NNs had only a marginal advantage in classification accuracy over LR.

Conclusions: The strength of a mathematics-based disease-outcome model depends on the quality of the input variables, quantity of cases, case sample input distribution, and computational methods of data processing of inputs and outputs. We identified specific advantages for NNs, especially in the prediction of multiple-outcome models, related to the ability to pre- and postprocess inputs and outputs.

© 2002 American Association for Clinical Chemistry

1 Johns Hopkins Hospital, Department of Urology, 600 North Wolfe St., Baltimore, MD 21287. 2 8032 Covered Bridge Drive, Quakertown, PA 18951. 3 UroCor, Inc., Division of Dianon Systems, Oklahoma City, OK 73104. *Author for correspondence. Fax 410-614-3695; e-mail [email protected] Received June 7, 2002; accepted July 23, 2002.

Prostate cancer (PCa)4 is the most common malignancy among men in the US, with an estimated ~200 000 new cases and 30 200 deaths in 2002 (1). Approximately 30% of men who are treated for localized disease will have recurrences, and a subset of these men will develop progressive disease (2–5). The clinical staging of PCa continues to utilize serum prostate-specific antigen (PSA) and the digital rectal examination. Most patients diagnosed early with organ-confined (OC) tumors are curable ~90–95% of the time with radical prostatectomy (3, 5) and ~80–95% with radiation therapy (6). Today, a significant proportion (~25–40%) of patients diagnosed with clinical stage T1c disease (PSA >4.0 µg/L and nonpalpable tumor) are found to have non-OC pathology (higher grade and/or stage) at radical prostatectomy (7–10). Thus, there are still variable but significant numbers of patients with clinically localized disease who will have pathologically non-OC PCa at the time of treatment (10–13). The goal of PCa clinical staging is to estimate the anatomic extent of the disease at the time of diagnosis (i.e., pretreatment), and several pretreatment prediction multiparameter models have been developed (4, 5, 13–19). Currently, one of the most widely used pretreatment staging prediction algorithms is based on the SPORE nomograms, also known as “Partin tables” (16, 17). These

4 Nonstandard abbreviations: PCa, prostate cancer; PSA, prostate-specific antigen; OC, organ-confined; tPSA, total PSA; CP, capsular penetration; LR, logistic regression; NN, neural network; NOC-CP, non-organ-confined with capsular penetration; and NOC-AD, non-organ-confined with metastasis.


Clinical Chemistry 48, No. 10, 2002

tables were developed with use of the clinical stage, total PSA (tPSA), and Gleason sum results from 4133 men with clinically localized PCa treated with radical prostatectomy between 1982 and 1996. The Partin tables provide the population-derived likelihood of having an OC cancer, isolated capsular penetration (CP), seminal vesicle involvement, and pelvic lymph node involvement for an individual falling into a particular tPSA range, Gleason sum, and clinical stage category (17). Statistical tools, such as logistic regression (LR), have also been applied to analyze data and create patient-specific stage prediction models for use at the pretreatment decision step as well as in posttreatment prognosis for PCa disease management (13–22). More recently, the application of neural networks (NNs) for predicting outcomes for PCa has attracted a great deal of attention, and computational solutions have been produced using various software configurations (23–27). A NN is a software machine that uses electronic components designed as parallel distributed processors with a propensity for storing experiential knowledge and making it available for use. NNs resemble the brain in at least two respects: (a) knowledge is acquired by the network through a learning process, and interneuron connection strengths (known as synaptic weights) are used to store knowledge (28–30); and (b) the NN is adaptive, fault tolerant, capable of very large-scale integration of information using neurobiological simulation principles, and produces a highly structured uniformity of analysis and architecture when finalized. A NN is built from an interconnection of simple computing cells, referred to as “neurons” or “processing units”.
The “learning algorithm” uses nonlinear mathematical transfer functions ranging from gaussian, sinusoid, and sigmoid to hybrids of these and sometimes combines these functions with preprocessing statistical or pre- and postprocessing genetic algorithms to create optimally performing NNs (28–30). The objective of such mathematical manipulations is to modify the synaptic weights of a network’s processing units in an orderly fashion to attain the desired outcome prediction based on the availability of sufficient quantitative inputs (training data sets). There are numerous types of NN designs capable of processing complex data to make outcome predictions, including genetic, free form, error back-propagation, radial basis function, probabilistic, generalized regression, and self-organizing feature maps (23–30). Depending on the type of problem, e.g., monitoring complex machine functions, medical outcomes, stock market forecasting, credit assignments, or pattern recognition, the NN design can be engineered to optimize the outcome prediction. The present study compares LR and a single type of NN computational architecture to predict PCa stage, using a well-defined cohort of PCa patients to assess the robustness of each model under different training conditions.
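To make the idea of adjusting synaptic weights concrete, the following is a minimal sketch of error back-propagation with a logistic transfer function, written in plain Python. It is an illustration under stated assumptions, not the commercial software used in this study: the toy data set, the single hidden layer of two nodes, the function name `train_backprop`, and the 2000 training cycles are all hypothetical, and the genetic pre- and postprocessing steps are omitted. Only the decreasing learning rate (0.4 to 0.1) is borrowed from Table 2.

```python
import math
import random


def logistic(z):
    """Logistic (sigmoid) transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_backprop(samples, n_hidden=2, epochs=2000,
                   lr_start=0.4, lr_end=0.1, seed=0):
    """Train a one-hidden-layer network by error back-propagation.

    samples: list of (inputs, target) pairs with target in {0, 1}.
    Returns the mean squared error recorded at each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each weight vector stores its bias term as the last element.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []
    for epoch in range(epochs):
        # Learning rate decreases linearly from lr_start to lr_end.
        lr = lr_start + (lr_end - lr_start) * epoch / max(1, epochs - 1)
        squared_error = 0.0
        for x, target in samples:
            # Forward pass through hidden and output nodes.
            h = [logistic(sum(w * xi for w, xi in zip(ws, x)) + ws[-1])
                 for ws in w_hidden]
            y = logistic(sum(w * hi for w, hi in zip(w_out, h)) + w_out[-1])
            err = y - target
            squared_error += err * err
            # Backward pass: propagate the output error to each weight.
            delta_out = err * y * (1.0 - y)
            delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * h[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                w_hidden[j][-1] -= lr * delta_hidden[j]
            w_out[-1] -= lr * delta_out
        history.append(squared_error / len(samples))
    return history


# Hypothetical toy data: two binary inputs, outcome is 1 if either is set.
toy_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
history = train_backprop(toy_samples)
print(f"first-epoch error {history[0]:.3f}, last-epoch error {history[-1]:.3f}")
```

The per-epoch error falls as the weights are updated, which is the "orderly modification of synaptic weights" described above; a production model would add momentum, a held-out testing set, and multi-class outputs.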

Materials and Methods

We obtained specimens from 2400 prostate sextant biopsy cases diagnosed between 1991 and 1997 from both academic and private-practice urologists. Reliable pathologic staging and/or clinical information was obtained for 988 patients: 191 from five academic collaborators and 797 from numerous community-based private-practice urologists. Any patient with a radical prostatectomy pathology report indicating neoadjuvant therapy or noting specific histopathologic evidence of such was excluded from the study. The final patient UroScore™ original sample included 817 cases after removal of cases with missing pretreatment tPSA values (23). None of the patients from whom the 817 patient samples had been collected had neoadjuvant therapy before the biopsy. Using the 1997 TNM staging guidelines (31), we excluded 61 cases [47 OC, 12 non-OC with CP (NOC-CP), and 2 non-OC with metastasis (NOC-AD)], yielding 756 usable cases for the current analysis. The final distribution of cases was 434 OC, 173 NOC-CP, and 149 NOC-AD patients in the training sample. The removal of these 61 cases from the patient sample did not significantly alter the prediction of patient outcomes when the revised (n = 756) models were compared with the original (n = 817) models (data not shown). The second grouping used for this study included OC + NOC-CP (n = 607) vs NOC-AD (n = 149) patients to assess a two-outcome model. The validation patient sample set was based on the radical prostatectomy reports from an additional 120 PCa biopsy cases obtained from two sites: 98 from Dr. Rube Hundley (Urology Associates of Dothan, P.A., Dothan, AL) and 22 cases from Johns Hopkins Hospital. These cases were audited and classified in the same manner. This validation group case distribution included 60 OC cases, 30 NOC-CP cases, and 30 NOC-AD cases. For the two-outcome model, the validation case distribution was 90 OC + NOC-CP cases and 30 NOC-AD cases.
In addition, when we constructed equal case distribution models, all remaining unused cases not in the training set were included in the validation runs. All of the 756 and 98 prostate biopsy specimens (Dothan Urology) were processed and evaluated at a large national pathology reference laboratory (Oklahoma City, OK), whereas the 22 Johns Hopkins Hospital biopsies were processed at this site. The sextant biopsy pathology variables measured included the following: number of positive cores, highest Gleason sum, presence of Gleason grade 4 and/or 5, total percentage of tumor involvement, average percentage of tumor involvement per core (formulated by dividing the total percentage of tumor involvement by the total number of cores), average percentage of tumor involvement per positive core (formulated by dividing the total percentage of tumor involvement by the number of positive cores), and the tumor location (i.e., ≥5% tumor involvement in base, mid, and/or apex core). The tPSA assay used the Food and Drug Administration-approved equimolar TOSOH (UroCor Labs) or Hybritech (Johns Hopkins Hospital) methods, and comparison of both tests yielded a correlation coefficient >98%. For the purposes of our analyses, the tPSA results were categorized by increments of 2 µg/L, based on a previously described method (23). Table 1 provides the patient demographics for the training (n = 756) and validation (n = 120) sets. As shown in Table 1, the age distribution was similar with no statistically significant differences among the outcome groups for the training and validation sets. Table 1 also demonstrates that with increasing disease severity (OC → NOC-CP → NOC-AD), similar trends were observed for the input variables of the training and validation sets. We used the Stata v7.0 statistical software program for all LR methods, as detailed previously (23). Briefly, all of the biopsy pathology and clinical variables were examined multivariately by use of a stringency set at P = 0.20 for all independent variables and selection by backward stepwise LR to predict a three-outcome dependent variable of OC vs NOC-CP vs NOC-AD or the binary two-outcome dependent variable of OC + NOC-CP vs NOC-AD. The formulae and functions used to calculate patient-specific outcome predictive probabilities based on the results of the OLOGIT analysis have been published (23). The iUnderstand v1.4 (BioComp Systems, Inc.) software program was used to construct an error back-propagation NN that included pre- and postprocessing statistical and genetic algorithms to optimize NN performance. We constructed an optimized three- and two-outcome model using unequal or equal case input distribution and the fixed variables summarized in Table 2. The NN used either LR statistical preselected or nonpreselected input variables, the parameters in Table 2, and 200 cycles or iterations. When LR preselection was applied, we used backward stepwise LR at a stringency of P = 0.20.
The NN computed relationships between input variables and outcomes, identifying the 10 “fittest” solutions, and ultimately “evolved” a single optimized network. The optimization processing used a randomized training-testing set ratio for the 756 patients of 60:30:10 for training, testing, and robustness determination. Validation used only the PCa cases not used for training the NN and LR regression models.

Table 1. Summary of training and validation patient demographics (mean values).

                                    Training (n = 756)          Validation (n = 120)
                                    OC    NOC-CP   NOC-AD       OC    NOC-CP   NOC-AD
Age, years                        62.6     63.1     67.9      63.1     64.6     63.4
tPSA, µg/L                         8.3     11.1    100.3       6.4      7.7     25.5
Positive cores, n                  2.1      2.6      3.6       2        2.1      3.3
Total involvement, %              61.3    105.7    209.1      61.4     85.6    156.5
Average involvement per core, %   11.3     19.7     37.4      15.8     21.4     26.3
Involvement of positive core, %   26.5     36.0     50.7      30.7     37.8     42.0

Table 2. Error back-propagation NN training and testing fixed parameters.

Parameter                                   Value
Target solution                             Classification
Model type                                  Back-propagation
Accuracy metric                             Classification
Data for modeling, %                        60%
Data for optimizing, %                      30%
Data for selection, %                       10%
Number of iterations                        200
Learning rates decrease from                0.4 to 0.1
Momentums decrease from                     0.2 to 0.05
Tau learning rates decrease from            0.4 to 0.1
Tau momentums decrease from                 0.2 to 0.05
Number of hidden layers                     2–3
Number of nodes per layer                   1–8
Transfer functions at hidden layer nodes    Linear, logistic, and tangent hyperbolic
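As an illustration of how an ordered logistic (OLOGIT) fit yields patient-specific probabilities for the three ordered outcomes, the sketch below uses the standard proportional-odds form, in which the cumulative probability of being at or below a stage is a logistic function of a cutpoint minus the linear predictor. The coefficients, cutpoints, and patient values are hypothetical placeholders chosen only to make the example run; the fitted values and the exact formulae used in this study are those published previously (23) and are not reproduced here.

```python
import math


def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(linear_predictor, cut1, cut2):
    """Return (P(OC), P(NOC-CP), P(NOC-AD)) for one patient.

    Proportional-odds model with ordered cutpoints cut1 < cut2:
      P(stage <= OC)     = F(cut1 - xb)
      P(stage <= NOC-CP) = F(cut2 - xb)
    """
    if not cut1 < cut2:
        raise ValueError("cutpoints must be strictly increasing")
    p_le_oc = logistic_cdf(cut1 - linear_predictor)
    p_le_cp = logistic_cdf(cut2 - linear_predictor)
    return p_le_oc, p_le_cp - p_le_oc, 1.0 - p_le_cp


# Hypothetical coefficients and patient values (NOT the study's fit).
coefficients = {"tpsa_category": 0.30,
                "positive_cores": 0.45,
                "highest_gleason_sum": 0.50}
patient = {"tpsa_category": 4, "positive_cores": 2, "highest_gleason_sum": 6}

xb = sum(coefficients[name] * patient[name] for name in coefficients)
probs = ordered_logit_probs(xb, cut1=5.0, cut2=6.5)
for stage, p in zip(("OC", "NOC-CP", "NOC-AD"), probs):
    print(f"{stage}: {p:.3f}")
```

The three probabilities always sum to 1, and a larger linear predictor shifts probability mass from OC toward NOC-AD; a probability cutoff can then be applied to assign each patient to a single predicted stage.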

Results

Shown in Table 3A is a comparison of the mathematical performance of the recapitulated UroScore model applying the same 35% cutoffs previously used (only with the n = 756 data set) as described in the Materials and Methods above. The high percentages of correct classifications of OC by OLOGIT and NN and the overall accuracy were very similar to those described in the original report (23). Preselection by LR of the NN inputs had only a minor impact on NOC-CP identification. The 120-case validation set yielded an OC prediction of 91.7% for OLOGIT and 100% for NN regardless of statistical preselection (Table 3B). OLOGIT identified 20% of the NOC-CP validation cases, whereas the NN identified only 3.3% of these cases. Overall accuracy for validation was indistinguishable for the LR and NN models. When the OLOGIT and NN three-outcome computational models were built using an equal number of cases in each pathologic stage category, the impact on OC correct classification was significant (Table 4A). The NN modeling approach for the three-outcome models achieved a ≥96% correct classification of OC cancer compared with 40% for OLOGIT. However, the LR model approach enabled much improved classification of the NOC-CP group (67.3% vs 14.7%; Table 4A). On the other hand, on validation, the NN model was much more robust, with overall accuracies of 75.8–78.2% vs only 35.9% for LR. We next tested a two-outcome computational model based on the fact that patients with OC and NOC-CP clinically localized disease will all be definitively treated with either radical prostatectomy or radiotherapy. Table 5 summarizes the unequal case input models and validation results and illustrates the marked improvement in

Table 3. Performance characteristics of OLOGIT and NN: Unequal number of cases.a
Columns under each model: predicted outcome (number of cases), then correctly classified, %.

                       OLOGIT model                  NN, preselected inputs        NN, no preselected inputs
Actual outcome      n     OC  NOC-CP  NOC-AD      %     OC  NOC-CP  NOC-AD      %     OC  NOC-CP  NOC-AD      %
A. Training set (n = 756; three-outcome)
OC                434    395      25      14   91.0    420       1      13   96.8    425       0       9   97.9
NOC-CP            173    137      14      22    8.1    141      12      20    6.9    148       0      25    0.0
NOC-AD            149     51      15      83   55.7     58       3      88   59.1     64       0      85   57.0
Overall accuracy                               65.1                          68.8                          67.5
B. Validation set (n = 120; three-outcome)
OC                 60     55       2       3   91.7     60       0       0  100.0     60       0       0  100.0
NOC-CP             30     22       6       2   20.0     28       1       1    3.3     29       0       1    0.0
NOC-AD             30     16       3      11   36.7     18       1      11   36.7     18       0      12   40.0
Overall accuracy                               60.0                          60.0                          60.0

a Used cutoff of ≥35% only for OLOGIT probabilities. NN classifies predictions into one outcome.

classification of OC and NOC-AD as well as the overall percentage correctly classified. The NN marginally outperformed the LR model in the classification of NOC-AD in both the training (~89% vs ~87%) and validation testing (~83.3% vs ~79.2%). Overall accuracy for these two-outcome models showed a marked improvement over the three-outcome models, and the NNs tended to show improved performance over LR. When we used the equal case distribution for training, the OLOGIT and NN performed very similarly (Table 6A). These results also held up in the validation testing with a highly skewed patient cohort. Thus, in a two-outcome equal case distribution model, the statistical and NN computational models validated with very similar overall and individual percentages of patient categories correctly classified. On validation, there was a slight advantage of ~5% for the NN over LR (Table 6B). Overall accuracy for correct identification of the validation cases was similar for both two-outcome models (Tables 5B and 6B).

Discussion

Among contemporary PCa patients, there is limited clinical outcome predictive value when only the individual variables such as clinical stage, serum PSA, Gleason score, or grade of the biopsy are used to counsel the patient at pretreatment. The generation of additional quantitative pathology variables and the unreliability of this prior approach led to the development of several multimodal staging tools (13–21). These multiparameter computational tools incorporate several clinical and biopsy pathology variables, including DNA ploidy, to predict specific endpoints, such as pathologic stage or disease-free survival (7, 8, 13–15). The validity of some of these tools may become compromised when the training sets come from a demographically restricted cohort and the validation cases come from multiple demographically unrestricted sources (e.g., academic centers of excellence, community-based practices, or combinations thereof). Under such circumstances, it is not uncommon that the performance

Table 4. Performance characteristics of OLOGIT and NN: Equal number of cases.a
Columns under each model: predicted outcome (number of cases), then correctly classified, %.

                       OLOGIT model                  NN, preselected inputs        NN, no preselected inputs
Actual outcome      n     OC  NOC-CP  NOC-AD      %     OC  NOC-CP  NOC-AD      %     OC  NOC-CP  NOC-AD      %
A. Training set (n = 449; three-outcome)
OC                150     60      80      10   40.0    144       5       1   96.0    147       0       3   98.0
NOC-CP            150     22     101      27   67.3    110      22      18   14.7    128       0      22    0.0
NOC-AD            149      8      43      98   65.8     56       3      90   60.4     43       0     106   71.1
Overall accuracy                               57.1                          57.0                          56.3
B. Validation set (n = 427; three-outcome)
OC                344    103     208      33   29.9    307      23      14   89.2    317       0      27   92.2
NOC-CP             53      8      35      10   66.0     46       4       3    7.5     45       0       8    0.0
NOC-AD             30      0      15      15   50.0     15       3      12   40.0     14       0      16   53.3
Overall accuracy                               35.9                          75.8                          78.2

a Used cutoff of ≥35% only for OLOGIT probabilities. NN classifies predictions into one outcome.


Table 5. Performance characteristics of OLOGIT and NN: Unequal number of cases.a
Columns under each model: predicted outcome (number of cases), then correctly classified, %.

                       OLOGIT model              NN, preselected inputs    NN, no preselected inputs
Actual outcome      n  OC/NOC-CP  NOC-AD      %  OC/NOC-CP  NOC-AD      %  OC/NOC-CP  NOC-AD      %
A. Training set (n = 756; two-outcome)
OC/NOC-CP         607        586      21   96.5        573      34   94.4        571      36   94.1
NOC-AD            149         75      74   49.6         49     100   67.1         48     101   67.8
Overall accuracy                           87.3                      89.0                      88.9
B. Validation set (n = 120; two-outcome)
OC/NOC-CP          90         89       1   98.9         86       4   95.6         87       3   96.7
NOC-AD             30         24       6   20.0         16      14   46.7         16      14   46.7
Overall accuracy                           79.2                      83.3                      84.2

a Used cutoff of ≥35% only for OLOGIT probabilities. NN classifies predictions into one outcome.

of some decision support tools may deteriorate when exposed to new cases (27, 28). Therefore, tools that do not withstand rigorous validation from the same or multiple institutions, including community-based practices, may not be as useful to aid in clinical decision-making (28). As a result of increased PSA screening and public awareness, there is a predominance of clinically localized disease, including a predominance of T1c clinical stage cancers, in the US population (5, 7, 8). Additionally, the pathologic grades and stages of these contemporary cancers have decreased with a concomitant drop in mortality from PCa (2, 3, 5, 7–9, 20, 21). Likewise, PSA concentrations and standard biopsy pathologies have consolidated and become more similar (Gleason 6 and 7) among more contemporary patient cohorts (2, 3, 5, 13, 20, 21). It is therefore becoming a more challenging task for the urologist to predict the pathologic stage of those cancers that clinically present so uniformly with respect to the major pretreatment input variables often used: age, tPSA and/or its derivatives, Gleason score, and clinical stage. Additional information, derived from a more detailed evaluation of the prostate biopsy, is readily available and has been shown to provide critical information in the past, and today, it may even outperform the Gleason score on multivariate analysis (5, 13–17, 20–22, 27, 32–34). For example, Badalament et al. (14) and others (15, 22, 23) identified high Gleason score and grade and number of biopsy-positive cores with cancer as two very important variables for pathologic stage prediction. Other investigators noted that the percentage of cores positive for cancer and surface-area-positive for cancer were the best predictors for pathologic stage and cancer volume in multivariate logistic regression analyses that also included PSA, age, clinical stage, and Gleason score (13–15, 18–22, 32–34). Aside from the emerging evidence for more detailed quantitative biopsy assessment, other variables, such as DNA ploidy (14, 15), quantitative nuclear morphometric analysis (8, 14, 15, 25, 32–34), or other serum and tissue markers, may provide information to overcome the challenge of optimal PCa staging (15, 33). Previously (23), we demonstrated high diagnostic accuracy for OC disease when a more detailed evaluation of prostate biopsies, termed quantitative prostate biopsy pathology, was used to develop and challenge a statistical (LR) or NN model to predict PCa stage based on preoperative variables. These models underwent additional validation with a subset of 116 new patients and were able to correctly classify patients with OC cancers at the >90% level. The focus of the present report was to compare LR, a demonstrated statistical modeling method that we have

Table 6. Performance characteristics of OLOGIT and NN: Equal number of cases.a
Columns under each model: predicted outcome (number of cases), then correctly classified, %.

                       OLOGIT model              NN, preselected inputs    NN, no preselected inputs
Actual outcome      n  OC/NOC-CP  NOC-AD      %  OC/NOC-CP  NOC-AD      %  OC/NOC-CP  NOC-AD      %
A. Training set (n = 299; two-outcome)
OC/NOC-CP         150        125      25   83.3        133      17   88.7        133      17   88.7
NOC-AD            149         41     108   72.5         38     111   74.5         38     111   74.5
Overall accuracy                           77.9                      81.6                      81.6
B. Validation set (n = 576; two-outcome)
OC/NOC-CP         547        463      84   84.6        495      52   90.5        486      61   88.8
NOC-AD             30         13      17   56.7         13      17   56.7         12      18   60.0
Overall accuracy                           83.2                      88.7                      87.3

a Used cutoff of ≥35% only for OLOGIT probabilities. NN classifies predictions into one outcome.


applied successfully in the past (23), to an updated version of the NN software program (BioComp Systems, Inc.). Although the present report uses a subset of the original 817 patients (n = 756), its purpose was to make a comparison of the impact on patient outcome predictions when different patient sample training set distributions (equal and unequal) were used for the original three-outcome model. In addition, we also developed new two-outcome statistical and NN models that were more clinically relevant, using the same patient distribution approaches we assessed for the three-outcome model. By doing so, we have confirmed the performance of the UroScore statistical and NN three-outcome models and determined that only when an equal number of cases in each pathologic stage category is used for training does the NN dramatically outperform the LR model (Tables 3B vs 4B). In addition, we developed and validated new two-outcome clinically appropriate computational models and compared the statistical and NN computational modeling approaches with equal and unequal patient training sets. Under these modeling conditions, we demonstrated an overall correct classification advantage of ~5% for NNs over LR to classify patients as either OC with or without CP (clinically localized curable disease) or advanced (positive seminal vesicles, lymph nodes, or bone disease). In summary, when attempting to develop PCa staging pretreatment models, multiple quantitative biopsy pathology variables are strongly valued by either method of computation, NN or LR. However, the case distribution, whether unequal or equal, can severely impact training and validation of the LR three-outcome model (Table 3 vs Table 4). The two-outcome LR and NN models performed with near parity regardless of training case input distribution.
We can make several observations regarding the above computational differential performance of the three- vs two-outcome models: (a) It may be assumed that a large proportion of NOC-CP cancers are more biologically reminiscent of OC cancers as evidenced by the degree of overlap for the distribution of the input variables in each group, and therefore, the current input variables applied in the three-outcome model may not be as reliable to accurately classify these subtly different cancers. The biological outcome of surgically removed cancers that demonstrate CP on pathologic examination supports the assumption of a comparable behavior of at least a proportion of pT3 cancers. In a study of 721 men at Johns Hopkins, 58% of men with evident CP had no evidence of biochemical recurrence 10 years after surgery (35). (b) In a two-outcome model, the skewness of case distribution for model development is not as critical as in the three-outcome model, at least when LR is compared with a modified error back-propagation NN. (c) NNs tend to be more robust on validation, especially when the case distribution for training is equal. (d) This comparison of computational methods to produce patient-specific outcomes demonstrates some of the strengths and weaknesses of the two approaches. Clearly for the future, these mathematical multiparameter approaches to predict disease outcomes will continue to be of value when they are properly architected, tested, and validated. There is little doubt that larger case numbers should be used for training, and we need to validate new pathologic and molecular biomarkers to further improve the performance of such computational algorithms (15, 33). Ultimately, our greatest challenge will be to provide quality and up-to-date disease management tools delivered to the urologist in forms such as the Partin tables (17), the Kattan nomogram (18), and UroScore (23).
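Because the effect of case distribution is central to these observations, the following minimal sketch shows how an equal-case training set and its accompanying validation set can be assembled from the stage counts reported in Materials and Methods. The random draw, the seed, the placeholder case identifiers, and the function name `balanced_split` are assumptions made for illustration; the text does not state how the equal-distribution training cases were selected.

```python
import random


def balanced_split(cases_by_stage, per_class, seed=0):
    """Draw up to per_class cases per stage for training.

    Returns (training, held_out) dictionaries keyed by stage. Stages
    with fewer than per_class cases contribute all of their cases.
    """
    rng = random.Random(seed)
    training, held_out = {}, {}
    for stage, cases in cases_by_stage.items():
        shuffled = list(cases)
        rng.shuffle(shuffled)
        k = min(per_class, len(shuffled))
        training[stage] = shuffled[:k]
        held_out[stage] = shuffled[k:]
    return training, held_out


# Stage counts from the 756-case sample; the IDs are placeholders.
counts = {"OC": 434, "NOC-CP": 173, "NOC-AD": 149}
cases = {stage: [f"{stage}-{i}" for i in range(n)]
         for stage, n in counts.items()}

train, rest = balanced_split(cases, per_class=150)

# Unused training-sample cases join the 120 separate validation cases.
external_validation = {"OC": 60, "NOC-CP": 30, "NOC-AD": 30}
train_sizes = {stage: len(train[stage]) for stage in counts}
validation_sizes = {stage: len(rest[stage]) + external_validation[stage]
                    for stage in counts}

print("training:", train_sizes, "total", sum(train_sizes.values()))
print("validation:", validation_sizes, "total", sum(validation_sizes.values()))
```

With 150 cases per stage, the training set has 150, 150, and 149 cases (449 in total) and the validation set has 344, 53, and 30 cases (427 in total), which agrees with the set sizes shown in Table 4.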

References

1. National Prostate Cancer Coalition. 2002 fact sheet. http://www.4npcc.org/Fact_Sheet2002.pdf (Accessed June 2002).
2. Greenlee RT, Murray T, Bolden S, Wingo PA. Cancer statistics, 2000. CA Cancer J Clin 2000;50:7–33.
3. Merrill RM, Brawley OW. Prostate cancer incidence and mortality rates among white and black men. Epidemiology 1997;8:126–31.
4. O’Dowd GJ, Miller MC, Orozco R, Veltri RW. Analysis of repeat biopsy results within one year following a non-cancer diagnosis. Urology 2000;55:553–9.
5. Pound CR, Partin AW, Eisenberger MA, Chan DW, Pearson JD, Walsh PC. Natural history progression after PSA elevation following radical prostatectomy. JAMA 1999;281:1591–7.
6. Shipley WU, Thames HD, Sandler HM, Hanks GE, Zietman AL, Perez CA, et al. Radiation therapy for clinically localized prostate cancer: a multi-institutional pooled analysis. JAMA 1999;281:1598–604.
7. Han M, Partin AW, Epstein JI, Walsh PC. Long-term biochemical disease-free and cancer-specific survival following anatomic radical retropubic prostatectomy. The 15-year Johns Hopkins experience [Review]. Urol Clin North Am 2001;28:555–65.
8. Veltri RW, Miller MC, Mangold LA, O’Dowd GJ, Epstein JI, Partin AW. Prediction of pathological stage in clinical stage T1c prostate cancer patients: the new challenge. J Urol 2002;168:100–4.
9. Ghavamian R, Blute ML, Bergstralh EJ, Slezak J, Zincke H. Comparison of clinically nonpalpable prostate-specific antigen-detected (cT1c) versus palpable (cT2) prostate cancers in patients undergoing radical retropubic prostatectomy. Urology 1999;54:105–10.
10. Polascik TJ, Oesterling JE, Partin AW. Prostate specific antigen: a decade of discovery—what we have learned and where we are going. J Urol 1999;162:293–306.
11. Lalani el-N, Laniado ME, Abel PD. Molecular and cellular biology of prostate cancer. Cancer Metastasis Rev 1997;16:29–66.
12. Yoshida BA, Chekmareva MA, Wharam JF, Kadkhodaian M, Stadler WM, Boyer A, et al. Prostate cancer metastasis-suppressor genes: a current perspective. In Vivo 1998;12:49–58.
13. O’Dowd GJ, Veltri RW, Orozco R, Miller MC, Oesterling JE. Update on the appropriate staging evaluation for newly diagnosed prostate cancer. J Urol 1997;158:687–98.
14. Badalament RA, Miller MC, Peller PA, Young DC, Bahn DK, Kochie P, et al. An algorithm for predicting non-organ confined prostate cancer using the results obtained from sextant core biopsies and prostate specific antigen level. J Urol 1996;156:1375–80.
15. Veltri RW, O’Dowd GJ, Orozco R, Miller MC. The role of biopsy pathology, quantitative nuclear morphometry, and biomarkers in the pre-op prediction of prostate cancer staging and prognosis. Semin Urol Oncol 1998;16:106–17.
16. Partin AW, Steinberg GD, Pitcock RV, Wu L, Piantadosi S, Coffey DS, et al. Use of nuclear morphometry, Gleason histologic scoring, clinical stage, and age to predict disease-free survival among patients with prostate cancer. Cancer 1992;70:161–8.
17. Partin AW, Kattan MW, Subong EN, Walsh PC, Wojno KJ, Oesterling JE, et al. Combination of prostate-specific antigen, clinical stage, and Gleason score to predict pathological stage of localized prostate cancer. A multi-institutional update. JAMA 1997;277:1445–51.
18. Kattan MW, Stapleton AM, Wheeler TM, Scardino PT. Evaluation of a nomogram used to predict the pathologic stage of clinically localized prostate carcinoma. Cancer 1997;79:528–37.
19. Vollmer RT, Keetch DW, Humphrey PA. Predicting the pathology results of radical prostatectomy from preoperative information. Cancer 1998;83:1567–80.
20. Partin AW, Mangold LA, Lamm DM, Walsh PC, Epstein JI, Pearson JD. Contemporary update of prostate cancer staging nomograms (Partin tables) for the new millennium. Urology 2001;58:843–8.
21. Ross PL, Scardino PT, Kattan MW. A catalog of prostate cancer nomograms. J Urol 2001;165:1562–8.
22. Gilliland FD, Hoffman RM, Hamilton A, Albertsen P, Eley JW, Harlan L, et al. Predicting extracapsular extension of prostate cancer in men treated with radical prostatectomy: results from the population based prostate cancer outcomes study. J Urol 1999;162:1341–5.
23. Veltri RW, Miller MC, Partin AW, Poole EC, O’Dowd GJ. Prediction of prostate carcinomas stage by quantitative biopsy pathology. Cancer 2001;91:2322–8.
24. Tewari A, Narayan P. Novel staging tool for localized prostate cancer: a pilot study using genetic adaptive neural networks. J Urol 1998;160:430–6.
25. Potter SR, Miller MC, Mangold LA, Jones KA, Epstein JI, Veltri RW, et al. Genetically engineered neural networks for predicting prostate cancer progression after radical prostatectomy. Urology 1999;54:791–5.

26. Narayan P, Gajendran V, Taylor SP, Tewari A, Presti JC, Leidich R, et al. The role of transrectal ultrasound-guided biopsy-based staging, preoperative serum prostate-specific antigen, and biopsy Gleason score in prediction of final pathologic diagnosis in prostate cancer. Urology 1995;46:205–12.
27. Graefen M, Haese A, Pichlmeier U, Hammerer PG, Noldus J, Butz K, et al. A validated strategy for side specific prediction of organ confined prostate cancer: a tool to select for nerve sparing radical prostatectomy. J Urol 2001;165:857–63.
28. Wei JT, Zhang Z, Barnhill SD, Madyastha KR, Zhang H, Oesterling JE. Understanding artificial neural networks and exploring their potential applications for the practicing urologist. Urology 1998;52:161–72.
29. Reckwitz T, Potter SR, Snow PB, Zhang Z, Veltri RW, Partin AW. Artificial neural networks in urology: update 2000. Prostate Cancer Prostatic Dis 1999;2:222–6.
30. Forrest S. Genetic algorithms: principles of natural selection applied to computation. Science 1993;261:872–8.
31. Sobin LH, Fleming ID. TNM classification of malignant tumors, fifth edition (1997). Union Internationale Contre le Cancer and the American Joint Committee on Cancer. Cancer 1997;80:1803–4.
32. Veltri RW, Partin AW, Miller MC. Quantitative nuclear grade (QNG): a new image analysis-based biomarker of clinically relevant nuclear structure alterations. J Cell Biochem Suppl 2000;35:151–7.
33. Veltri RW, Miller MC, An G. Standardization, analytical validation, and quality control of intermediate endpoint biomarkers. Urology 2001;57(Suppl 4A):164–70.
34. Sebo TJ, Bock BJ, Cheville JC, Lohse C, Wollan P, Zincke H. The percent of cores positive for cancer in prostate needle biopsy specimens is strongly predictive of tumor stage and volume at radical prostatectomy. J Urol 2000;163:174–8.
35. Noldus J, Graefen M, Haese A, Henke RP, Hammerer P, Huland H. Stage migration in clinically localized prostate cancer. Eur Urol 2000;38:74–8.