Testing Pattern Recognition as a Method for Measuring Severity of Illness

D. Trace, M.D.¹, F. Naeymi-Rad, M.S.¹, L. Carmony, Ph.D.², S. Chen, M.S.³, K. Kerns, B.S.¹, P. Yarnold, Ph.D.⁴, M. Tan, M.D.¹, M. Astiz, M.D.¹, C. Mecher, M.D.¹, M.H. Weil, M.D.¹, and M. Evens, Ph.D.³

¹University of Health Sciences/The Chicago Medical School, North Chicago, Illinois 60064
²Lake Forest College, Department of Mathematics and Computer Science, Lake Forest, Illinois 60045
³Illinois Institute of Technology, Department of Computer Science, Chicago, Illinois 60616
⁴Northwestern University Medical School, Section of General Internal Medicine, Chicago, Illinois 60611

Abstract

This paper describes a multimembership Bayesian index of severity calculated by MEDAS (the Medical Emergency Decision Assistance System). This severity index measures the likelihood that the patient will die without immediate intervention. The MEDAS inference engine operates on binary features representing signs, symptoms, and laboratory results. As a basis for calculation of the severity index, severity weights ranging from 0 to 9 were assigned to each feature by an expert physician in order to form a severity pattern. To evaluate the MEDAS severity index, two physician experts in critical care medicine independently provided severity assessments for a series of patients hospitalized for congestive heart failure (N=19) or diabetes mellitus (N=22). These disorders were selected because they are common in the patient load at our VA hospital and lead to a wide range of outcomes. Agreement between the MEDAS severity index and the expert assessments was at or close to the theoretical maximum.

Introduction

While science has been the engine driving medicine to remarkable technological advances in the twentieth century, these advances have produced significant social and ethical problems. Cost containment [1], rationing of health care [2,3,4,5], and quality assurance [6,7,8] have now surfaced as major issues for the next century. Central to understanding and resolving these complex issues is the physician's ability to predict patient outcomes. The estimation of severity is an important component of the predictive process. We propose a new method of assessing severity that provides an instantaneous, comprehensive severity score based on the multimembership Bayesian algorithm used in MEDAS (the Medical Emergency Decision Assistance System).

The past two decades have seen a marked interest in the development of predictive models for medical decisions [9,10]. These models have used the following variables, either alone or in combination, in their attempt to predict outcomes: diagnosis [11], anatomic or physiologic status [12], chronic disease level, and response to therapy [13]. Anatomic or physiological status has received the most investigative attention, although anatomic status is usually confined to a defined patient group (e.g., the Glasgow Coma Scale for head injuries). Physiological scales, by virtue of using parameters from multiple organ systems, tend to represent a broader spectrum of patients, thus being more useful for prognostic purposes. Three of the more established physiological scales include APACHE II [14], MPM [15], and MEDISGROUPS [16].

APACHE II, the outgrowth of years of research at the George Washington University Medical Center, produces a numerical rating by assigning subjective values to a small set of physiological and chronic health parameters. MEDISGROUPS uses an expanded set (approximately 250) of key clinical findings that are assigned subjective weights from 0 to 3 and employed to group the patient into one of four categories. In contrast to the subjective weighting of APACHE II and MEDISGROUPS, the MPM method uses multivariate statistical analysis on known hospital data to produce a predictive scale. In all three methods, the patient assessments and resulting score are usually obtained several times during an entire hospitalization and do not reflect instantaneous ("real-time") changes in the patient's condition.

Clearly, the objective of all predictive models involves defining and manipulating variables in order to produce a severity index. Such an index, if accurate, easily computed, and readily validated for all types of health care organizations, would provide a potentially useful metric for studying issues such as assessing quality of care, the impact of DRG reimbursement, and ethical concerns involving the rationing of care. Current methods of reimbursement to hospitals caring for sicker patients are fraught with difficulty since no "fair" system has yet been accepted by all participants in the health care industry. Similar concerns center around comparing morbidity and mortality data in an effort to assess quality of care. Indeed, the need for an accurate, reliable, instantaneous assessment of patient severity has never been as great as in today's health care environment.

Current severity indices, while accurate when assessing low-risk patients [17,18], have unfortunately been relatively inaccurate and inconsistent when assessing patients with significant morbidity [19]. Highlighting the need for a more accurate, reliable method of measuring severity is a recent study that compared APACHE II with clinical assessments by nurses and physicians and found that the advantages claimed for APACHE II did not appear [20]. MEDISGROUPS and MPM have not yet received rigorous evaluation, thus limiting their clinical usefulness.

0195-4210/89/0000/0253$01.00 © 1989 SCAMC, Inc.

MEDAS

MEDAS is a probabilistic diagnostic consultant that employs a multimembership Bayesian model as its inference engine and relational database technology for its knowledge base maintenance [21,22,23]. Research on MEDAS began at the University of Southern California Institute of Critical Care in the early 1970s. MEDAS moved to Chicago in 1981, and its current progress is due to collaboration between the Illinois Institute of Technology and the Chicago Medical School. Since 1981, the MEDAS diagnostic consultant has grown from approximately 70 disorders and 700 features (signs, symptoms, history, and laboratory results) to over 10 medical domains, each with up to 200 disorders and sharing more than 7000 features. This growth is due to development of the TOOL-BOX, a knowledge engineering tool that allows users to build knowledge bases without any programming knowledge [24,25].

A recent evaluation of the prognostic performance of MEDAS [26] involved the collection of information from 300 patient charts representing ten different disorders in internal medicine from the North Chicago Veterans Administration Medical Center. This evaluation showed that when the top three MEDAS diagnoses were considered, the physician's primary diagnosis was rated first in 79.3% of the cases, among the top two in 93%, and among the top three in 95%.

MEDAS is now expanding in several different directions: the Feature Dictionary provides definitions of terms and supports translation between knowledge bases [27]; the Portable Patient File [28,29] is a multi-encounter medical record in a portable format; and Treatment Protocols [30] are designed to provide patient-specific plans for case management. The design of a mechanism to deliver treatment protocols raised the issue of deciding which disorder should be treated first for patients with multiple disorders. To resolve this issue it became apparent that MEDAS needed some procedure for measuring severity. The goal was a severity assessment procedure that could help the system analyze a situation with multiple diagnoses, each with its own treatment protocol, and mesh these individual protocols into an overall, patient-specific protocol.

The MEDAS Inference Engine

The multimembership Bayesian inference system [21,22] requires three parameters in its calculation of posterior probability. These parameters include:

1) $P(D_i)$: the probability of a disorder, or prevalence rate.
2) $P_{ij} = P(X_j \mid D_i)$: the conditional probability that feature $X_j$ is positive, given that disorder $D_i$ is present.
3) $\bar{P}_{ij} = P(X_j \mid \bar{D}_i)$: the conditional probability that feature $X_j$ is positive, given that disorder $D_i$ is absent.

The posterior probability calculation for the MEDAS inference engine is derived from the following version of the Bayesian theorem, where $x_j$ is the observed state (present or absent) of feature $X_j$:

$$P(D_i \mid x_1, \ldots, x_n) = \frac{P(D_i)\prod_{j=1}^{n} P(x_j \mid D_i)}{P(D_i)\prod_{j=1}^{n} P(x_j \mid D_i) + P(\bar{D}_i)\prod_{j=1}^{n} P(x_j \mid \bar{D}_i)}$$

This calculation assumes that the features are conditionally independent under $D_i$ and under $\bar{D}_i$, for every $i = 1, 2, \ldots, m$. We can define the contribution index of the feature $X_j$ to the posterior probability of disease $D_i$ as:

$$\Delta_{ij} = P(D_i \mid X_j) - P(D_i)$$

Figure 1 shows the contribution indices for a typical disorder pattern produced by the MEDAS Tool-Box. The delta values can be computed both when the feature $X_j$ is present (Post When +) and when this feature is absent (Post When -).

[Figure 1. Contribution indices for a typical disorder pattern produced by the MEDAS Tool-Box, with columns Post When + and Post When -.]

Traditionally, developers of MEDAS knowledge bases have estimated the values of $P_{ij}$ and the contribution columns instead of subjectively estimating $\bar{P}_{ij}$. The process then uses the multimembership formula with only one feature present to solve for $\bar{P}_{ij}$. The first author assigned severity weights between 0 (least severe) and 9 (most severe) to each of 900 features from the selected knowledge base. In the present research, $P_{ij}$ was fixed at 0.2, the contribution indices were selected using the geometric sequence shown in Table 1, and the $\bar{P}$ column was then computed from the other values. No contribution was given to those features with a severity rating of 0 to 3.

[Table 1. Contribution indices assigned to severity weights 4 through 9, following a geometric sequence.]
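The inference-engine calculation in this section can be made concrete with a short sketch. The Python below is a minimal illustration, assuming a single disorder and binary features: it evaluates the two-hypothesis Bayes posterior under conditional independence, the contribution index (posterior minus prior) for one feature, and the single-feature inversion that recovers the probability of the feature given that the disorder is absent from a chosen contribution. The function names, the prior of 0.1, and the target contribution of 0.3 are hypothetical; only the value 0.2 for the probability of the feature given the disorder comes from the text.

```python
def posterior(prior, p_given_d, p_given_not_d, observed):
    """Return P(D | x_1..x_n) for one disorder D.

    prior            -- P(D), the prevalence of the disorder
    p_given_d[j]     -- P(X_j positive | D)
    p_given_not_d[j] -- P(X_j positive | not D)
    observed[j]      -- True if feature j is present, False if absent
    """
    like_d, like_not_d = prior, 1.0 - prior
    for p, q, x in zip(p_given_d, p_given_not_d, observed):
        # Conditional independence: multiply per-feature likelihoods.
        like_d *= p if x else 1.0 - p
        like_not_d *= q if x else 1.0 - q
    return like_d / (like_d + like_not_d)


def contribution(prior, p, q, present=True):
    """Contribution index: P(D | single feature state) - P(D)."""
    return posterior(prior, [p], [q], [present]) - prior


def solve_p_bar(prior, p, delta):
    """Invert the single-feature formula (feature present) for q = P(X | not D).

    With post = prior + delta and post = prior*p / (prior*p + (1-prior)*q),
    solving for q gives the expression below.
    """
    post = prior + delta
    return prior * p * (1.0 - post) / (post * (1.0 - prior))


# Illustrative values: the prior and delta are assumptions, p = 0.2 is from the text.
prior, p, delta = 0.1, 0.2, 0.3
q = solve_p_bar(prior, p, delta)                          # about 0.0333
delta_present = contribution(prior, p, q, present=True)   # recovers about 0.3
delta_absent = contribution(prior, p, q, present=False)   # small negative value
```

Fixing the probability of the feature given the disorder and choosing the contribution leaves exactly one unknown per feature, which is why the single-feature form of the formula is enough to fill in the remaining column.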

0.13). Whereas the mean severity ratings of physician A were statistically comparable to the mean severity ratings produced by MEDAS for both diagnoses (p's > 0.28), the mean severity ratings of physician B were marginally greater than the corresponding mean ratings by MEDAS (p's < 0.03 for both diagnoses). Considered together, these results suggest that the two physicians and MEDAS generated severity ratings with relatively comparable mean values.

Before assessing the degree of concordance between physicians' and MEDAS' ratings of severity, it is important to assess the degree of concordance between physician ratings. This analysis, referred to as inter-rater reliability, provides an estimate of the extent to which experts agree on patients' severity scores, and thus of the theoretically maximum expected concordance between physician ratings and MEDAS (or other expert-system) ratings of severity [28]. For

Directions for Future Research

There are a number of ways in which this research can be extended. The first and easiest step involves asking the physicians to develop a consensus severity assessment after making their individual assessments. If the consensus ratings are more reliable than either the individual severity ratings or the mean of those individual severity ratings, then the theoretically maximum obtainable correlation between MEDAS and physician ratings would be greatest when the consensus ratings were employed.

The concordance between our model and physician severity ratings might also be improved by increasing the reliability of the MEDAS severity ratings. That is, the severity weights for MEDAS features in the present research were determined by one physician acting alone. We plan to ask Physicians A and B to independently assign weights to the MEDAS features, thus resulting in three independent MEDAS severity assessment patterns. These three MEDAS severity patterns could then be employed to obtain a mean MEDAS severity assessment, as was done for the individual physician estimates. Since the reliability of a composite is greater than the reliability of the components, the mean MEDAS severity estimate should be more reliable than the current MEDAS severity estimate. This increase in reliability would then theoretically argue for a correspondingly greater upper bound for the correlation between (mean) MEDAS and physician severity ratings. Future research should also examine the reliability of a MEDAS severity knowledge base constructed by physician consensus, rather than by simply obtaining a mean of several independent MEDAS severity estimates.

During debriefing, our physician-experts explained that some feature ranges were too broadly defined to allow for "proper judgements of severity." For example, in the current version of MEDAS, one of the respiratory rate features presents a domain of 24 to 36 respirations per minute. We are currently converting MEDAS to increase the number of respiratory rate features, decreasing the range of respirations for each feature, thus increasing the sensitivity. Such an increase in sensitivity is hypothesized to improve the concordance between physician- and MEDAS-based severity assessments, as well as the diagnostic performance of MEDAS.

We are also collecting data from more patients, from more physicians, and from a greater variety of disorders, in order to further evaluate the limits of generalizability of the present findings.

The present severity measure reflects an estimate (by physicians and by MEDAS) of the probability of a patient's death without interventions. Thus, the purpose of such a severity measure is to guide intervention, with the objective of lowering the magnitude of the severity measure as far and as rapidly as possible. Future research should evaluate the validity of MEDAS (and physician) estimates of severity, although obviously such research is rather complex. Such validational research must include information concerning the differential process of treatment associated with differential severity assessments, so that a simple examination of mortality rates does not appear to reflect an appropriate validational end-point.

Finally, we also plan to experiment with a multimembership Bayesian algorithm to measure other facets of "severity", such as the long-term prognosis for the patient, the quality of life to which the patient can be restored, and the cost and length of medical care needed.
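The reliability argument in this section (a composite of several raters is more reliable than any single rater, which raises the ceiling on the attainable correlation) can be sketched numerically. The paper does not state which formulas it used, so the Python below assumes two standard results from classical test theory: the Spearman-Brown formula for the reliability of the mean of k parallel ratings, and the attenuation bound that limits the correlation between two measures to the square root of the product of their reliabilities. The reliability values are hypothetical and are not results from this study.

```python
import math


def spearman_brown(r_single, k):
    """Reliability of the mean of k parallel ratings, each with reliability r_single."""
    return k * r_single / (1.0 + (k - 1) * r_single)


def correlation_ceiling(rel_a, rel_b):
    """Upper bound on the observed correlation between two measures,
    given their reliabilities (classical attenuation bound)."""
    return math.sqrt(rel_a * rel_b)


# Hypothetical inputs, chosen only for illustration.
single_pattern_reliability = 0.6   # one physician's severity pattern
physician_rating_reliability = 0.8

# Ceiling on the MEDAS-physician correlation when 1, 2, or 3 independent
# severity patterns are averaged into a mean MEDAS severity estimate.
ceilings = [
    correlation_ceiling(
        spearman_brown(single_pattern_reliability, k),
        physician_rating_reliability,
    )
    for k in (1, 2, 3)
]
```

Under these assumptions the ceiling rises with each added pattern but with diminishing returns, which is consistent with the expectation that three independent severity patterns would allow a higher, though still bounded, correlation.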

Summary

As we have shown, the multimembership Bayesian measure of severity computed by MEDAS agrees with assessments of severity made by expert physicians, at levels near the theoretical maximum established by examining interphysician concordance. The MEDAS severity index has the advantage of being instantaneous, whereas other severity measures may require up to 24 hours to develop. This severity index permits us to improve the MEDAS treatment protocols: we use MEDAS to attach an index to each of the disorders in the differential diagnosis and display treatment protocols in order of disorder severity. We are now building an automated medical record system with a link to MEDAS so that we can deliver an instantaneous severity index without repeated data input.
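The ordering step mentioned in the summary, displaying treatment protocols in order of disorder severity, can be illustrated with a short sketch. The `Diagnosis` record, its field names, and the example values are hypothetical; the paper does not describe the data structures MEDAS actually uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Diagnosis:
    disorder: str      # name of a disorder in the differential diagnosis
    severity: float    # severity index attached to this disorder (hypothetical scale)
    protocol: str      # identifier of the associated treatment protocol


def order_protocols(differential: List[Diagnosis]) -> List[str]:
    """Return protocol identifiers sorted from most to least severe disorder."""
    ranked = sorted(differential, key=lambda d: d.severity, reverse=True)
    return [d.protocol for d in ranked]


# Hypothetical differential diagnosis with made-up severity values.
differential = [
    Diagnosis("diabetes mellitus", 0.35, "protocol-DM"),
    Diagnosis("congestive heart failure", 0.80, "protocol-CHF"),
]
ordered = order_protocols(differential)
```

Python's `sorted` is stable, so disorders with equal severity keep their original relative order in this sketch.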

Acknowledgements

This research was partially supported by Bionetics/NASA. We are particularly grateful for helpful comments on our work from Dr. Daniel Woodard from Bionetics and Dr. Paul Buchanan from NASA.

References

[1] Horn S.D. Measuring Severity of Illness: Comparisons Across Institutions. AJPH 1983; 73:25.
[2] Zimmerman J.E., Knaus W.A., Sharpe S.M., et al. The Use and Implications of Do Not Resuscitate Orders in Intensive Care Units. JAMA 1986; 255:351.
[3] Bellamy P.E., Oye R.K. Admitting Elderly Patients to the ICU: Dilemmas and Solutions. Geriatrics 1987; 42:61.
[4] Strauss M.J., LoGerfo J.P., Yeltatzie J.A., et al. Rationing of Intensive Care Unit Services. JAMA 1986; 255:1143.
[5] Zimmerman J.E., Knaus W.A., Judson J.A., et al. Patient Selection for Intensive Care: A Comparison of New Zealand and United States Hospitals. Critical Care Medicine 1988; 16:318.
[6] Dubois R.W., Brook R.H., Rogers W.H. Adjusted Hospital Death Rates: A Potential Screen for Quality of Medical Care. AJPH 1987; 77:1162.
[7] Goldstein R.L., Campion E.W., Thibault G.E., et al. Functional Outcomes Following Medical Intensive Care. Critical Care Medicine 1986; 14:783.
[8] Le Gall H.R., Brun-Buisson C., Trunet P., et al. Influence of Age, Previous Health Status, and Severity of Acute Illness on Outcome From Intensive Care. Critical Care Medicine 1982; 10:575.
[9] Baker S.P., O'Neil B., Haddon W., et al. The Injury Severity Score: A Method for Describing Patients with Multiple Injuries and Evaluating Emergency Care. J Trauma 1974; 14:187.
[10] Knaus W.A., Zimmerman J.E., Wagner D.P., et al. APACHE-Acute Physiology and Chronic Health Evaluation: A Physiologically Based Classification System. Crit Care Med 1981; 9:591.
[11] Gonnella J.S., Hornbrook M.C., Louis D.Z. Staging of Disease: A Case-Mix Measurement. JAMA 1984; 251:637.
[12] Teasdale G., Jennett B. Assessment of Coma and Impaired Consciousness: A Practical Scale. Lancet 1974; 81.
[13] Keene A.R., Cullen D.J. Therapeutic Intervention Scoring System: Update 1983. Crit Care Med 1983; 11:1.
[14] Knaus W.A., Draper E.A., Wagner D.P., et al. APACHE II: A Severity of Disease Classification for Acutely Ill Patients. Crit Care Med 1985; 13:818.
[15] Lemeshow S., Teres D., Avrunin J.P., et al. A Comparison of Methods to Predict Mortality of Intensive Care Unit Patients. Crit Care Med 1987; 15:715.
[16] Brewster A.C., Karlin B.G., Hyde L.A., Jacobs C.M., Bradbury R.C., Chae Y.M. MEDISGRPS: A Clinically Based Approach to Classifying Hospital Patients at Admission. Inquiry 1985 Winter; 22(4): pp. 377-87.
[17] Wagner D.P., Knaus W.A., Draper E.A., Zimmerman J.E. Identification of Low-Risk Monitor Patients Within a Medical-Surgical Intensive Care Unit. Medical Care, April 1983; Vol. 21, No. 4: pp. 425-434.
[18] Wagner D.P., Knaus W.A., Draper E.A. Identification of Low-Risk Monitor Admissions to Medical-Surgical ICUs. CHEST, September 1987; Vol. 92, No. 4: pp. 423-428.
[19] Papadakis M.A., Browner W.S. Prognosis of Noncardiac Medical Patients Receiving Mechanical Ventilation in a Veterans Hospital. Am. Journal of Med, October 1987; Vol. 83: pp. 687-692.
[20] Kruse J.A., Thill-Baharozian M.C., Carlson R.W. Comparison of Clinical Assessment With APACHE II for Predicting Mortality Risk in Patients Admitted to a Medical Intensive Care Unit. JAMA 1988; 260: pp. 1739-1742.
[21] Ben-Bassat M., Carlson R.W., Puri V.K., Davenport M.D., Schriver J.A., Latif M., Smith R., Portigal L.D., Lipnick E.H., Weil M.H. Pattern-Based Interactive Diagnosis of Multiple Disorders: The MEDAS System. IEEE Transactions on Pattern Analysis and Machine Intelligence 1980; PAMI-2: pp. 148-160.
[22] Ben-Bassat M. Multimembership and Multiperspective Classification: Introduction, Applications, and a Bayesian Model. IEEE Transactions on Systems, Man, and Cybernetics, June 1980; SMC-10, No. 6: pp. 331-336.
[23] Koschmann T., Evens M., Naeymi-Rad F., Lee C.M., Weil M.H. Knowledge Engineering Tools for a Bayesian Diagnostic Consultant. Symposium on Computer Applications in Medical Care, November (1985) 274-280.
[24] Naeymi-Rad F., Koschmann T., Lee C.M., Kepic T., Evens M., Weil M.H. Maintaining a Knowledge Base Using the MEDAS Knowledge Engineering Tools. Symposium on Computer Applications in Medical Care, November (1985) 298-303.
[25] Naeymi-Rad F., Koschmann T., Rosenthal R., Trace D., Naeymi-Rad S., Swanson J., Lee C., Carlson R., Weil M.H., and Evens M. Use of an E-R Diagram in the Design of a Feature Dictionary for a Multi-Domain Medical Knowledge Base. IEEE/Ninth Annual Conf. of the Eng. in Med. & Bio. Soc., November 1987; Vol 3: 1535-1536.
[26] Georgakis C., Rosenthal R., Trace D., Evens M. Measures of Performance of the MEDAS System. Proceedings of Artificial Intelligence West, May 1988.
[27] Naeymi-Rad F. A Feature Dictionary for a Multi-Domain Medical Knowledge Base. Symposium on Computer Application in Medical Care, November (1988) pp. 212-217.
[28] Naeymirad S., Trace D., Naeymi-Rad F., Carmony L., Kobashi M., Kepic T., and Evens M. The Portable Patient File: An Intelligent Automated Medical Record, to appear at MEDINFO 89.
[29] Naeymi-Rad F., Trace D., Carmony L., Naeymirad S., Kepic T., Freese U., Weil M., and Evens M. Feature Dictionary Supporting an Intelligent Medical Record, to appear at MEDINFO 89.
[30] Naeymi-Rad F., Koschmann T., Trace D., Kepic T., Carlson C.R., Weil M.H., and Evens M. Expert Knowledge Base Designed Using ER-Modeling Technique. Proceedings of the Fifth Conference on Medical Informatics (1986) 5: 51-55.
[31] Magnusson D. Test Theory. Reading, MA: Addison-Wesley, 1967.
[32] Kleinbaum D.G., Kupper L.L., Muller K.E. Applied Regression Analysis and Other Multivariable Methods, 2nd edition. Boston, MA: PWS-Kent, 1988.