Medical Decision Making
http://mdm.sagepub.com/

2010 Abstracts: 32nd Annual Meeting of the Society of Medical Decision Making. SAGE Publications. Med Decis Making 2011; 31: E54. DOI: 10.1177/0272989X10393877. The online version of this article can be found at: http://mdm.sagepub.com/content/31/1/E54

Published by: http://www.sagepublications.com

On behalf of:

Society for Medical Decision Making


Downloaded from mdm.sagepub.com at Chung Shan Medical University Library on February 24, 2011

2010 Abstracts: 32nd Annual Meeting of the Society of Medical Decision Making

DIAGNOSTIC ERRORS IN MEDICINE (DEM 1–12) PG E55
GLOBAL HEALTH (GLH 1–7) PG E59
HEALTH STATUS AND PREFERENCES (HSP 1–21) PG E61
HEALTH SERVICES AND POLICY RESEARCH (HSR 1–66) PG E70
EVIDENCE, ECONOMICS, AND ETHICS (HTA 1–60) PG E95
JUDGEMENT AND DECISION MAKING (JDM 1–107) PG E118
ADVANCES IN QUANTITATIVE METHODS (QMA 1–48) PG E160
RESOURCES FOR RESEARCH (RFR 1–3) PG E179
AUTHOR INDEX PG E181


Abstracts

SMDM 2010 ANNUAL MEETING OPENING PLENARY SESSION (TOP-RANKED) ABSTRACTS

DEM-1 THE MISCALIBRATION CURVE: A NOVEL PLOT TO ASSESS CALIBRATION ERROR OF A PREDICTION MODEL (DEM)
Diagnostic Errors in Medicine
Michael W. Kattan, PhD, Changhong Yu, and Brian Wells, Cleveland Clinic, Cleveland, OH

Purpose: Plots to assess the calibration of a prediction model, when applied to a validation dataset, are critical for judging the adequacy of the model or comparing rival models. Traditionally, these plots have applied a smoothing function to a plot of the "actual" outcome on the vertical axis and the "predicted" outcome on the horizontal axis. In this way, the reader can compare the smoothed line to a perfectly straight 45-degree line that denotes perfect calibration. While such plots are helpful, two deficiencies remain. First, the plot does not naturally indicate where the bulk of the predictions lie. Second, and related to the first, the prevalence of predictions in a region of miscalibration cannot be inferred. The purpose of the present study was to introduce a plot that repairs both deficiencies of the traditional calibration plot.

Method: After several unsuccessful iterations involving the manipulation of axes, addition of shading, etc., a plot was constructed that appeared to solve the deficiencies above and provide ready interpretation. The vertical axis displays prediction error (actual value minus predicted value). The horizontal axis displays the predicted value, spaced in proportion to the frequency of predicted values. In other words, the spacing of the x-axis is such that a histogram of predicted values would show a uniform distribution. This approach makes it easy to infer where the bulk of the predictions lie. More importantly, it quickly illustrates the frequency of predictions that might lie in a miscalibrated region of the prediction model.
Result: Figure 1 presents a traditional calibration curve for a prediction model applied to a validation dataset. Note that this figure suggests quite poor calibration of the prediction tool. Figure 2 is the novel "miscalibration curve." Note that this curve suggests a substantially different interpretation, indicating excellent calibration of the model for the vast majority of its predictions.

Conclusion: The miscalibration curve is a useful plot for providing improved insight into the performance of a prediction model, relative to the traditional calibration curve.
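The construction described in the Method can be sketched in a few lines. This is a hypothetical illustration with simulated data and invented names (`miscalibration_coords`), not the authors' implementation: the x-coordinates are empirical-CDF ranks, so equal-width intervals of the axis contain equal numbers of predictions, and the y-coordinate is the prediction error.

```python
import numpy as np

def miscalibration_coords(predicted, actual):
    """Coordinates for a miscalibration curve: x positions spaced in
    proportion to the frequency of predicted values (empirical-CDF
    ranks), y = prediction error (actual minus predicted)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    order = np.argsort(predicted)
    n = len(predicted)
    x = (np.arange(n) + 0.5) / n          # uniform spacing by construction
    y = actual[order] - predicted[order]  # prediction error, sorted by prediction
    return x, y

# Hypothetical validation data: simulated predicted risks and binary outcomes.
rng = np.random.default_rng(0)
p = rng.beta(2, 5, size=1000)                    # predicted probabilities
outcome = (rng.random(1000) < p).astype(float)   # simulated outcomes
x, err = miscalibration_coords(p, outcome)

# Equal-width slices of the x-axis now hold equal numbers of predictions,
# so a miscalibrated region can be read off together with its prevalence.
counts, _ = np.histogram(x, bins=10, range=(0.0, 1.0))
```

Plotting `err` against `x` (with a smoother) gives the miscalibration curve; a line near zero over most of the axis indicates good calibration where the bulk of the predictions lie.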

Figure 1. Traditional calibration curve.
Figure 2. Miscalibration curve.

DEM-2 CORRECTING FOR PARTIAL VERIFICATION BIAS: A COMPARISON OF METHODS (DEM)
Diagnostic Errors in Medicine
Joris A. H. de Groot, MSc1, Kristel J. M. Janssen, PhD2, Koos H. Zwinderman, PhD3, Patrick M. Bossuyt, PhD3, Hans B. Reitsma, PhD3, and Karel G. M. Moons, PhD2; (1) University Medical Center, Utrecht, Netherlands, (2) University Medical Center, Utrecht, Netherlands, (3) Academic Medical Center, Amsterdam, Netherlands

Purpose: A common problem in diagnostic research is that the reference standard has not been performed in all patients. This partial verification may lead to biased accuracy measures of the test under study. Several solutions have been proposed to alleviate this bias. The authors studied the performance of multiple imputation and the conventional correction method proposed by Begg and Greenes under a range of different situations of partial verification, to examine under which circumstances they produce similar results and when their results differ.

Method: In a series of simulations using a previously published deep venous thrombosis dataset (N = 1292), the authors deliberately set the outcome of the reference standard to missing based on various underlying mechanisms and by varying the total number of missing values. They then compared the performance of the different correction methods (i.e., multiple imputation and the Begg and Greenes method) in each of these patterns of verification, in particular their ability to reduce the bias in estimates of accuracy, by comparing the corrected estimates with the true values in the complete dataset.

Result: The results show that when the mechanism of missing reference data is known, accuracy measures can easily be correctly adjusted using either the Begg and Greenes method or multiple imputation. In situations where the mechanism of missing reference data is complex or unknown, multiple imputation is more flexible and straightforward than the Begg and Greenes correction method.

Conclusion: Partial verification by design can be a very efficient data collection strategy. In that case the pattern of missing reference data will be known, and accuracy measures can easily be correctly adjusted using either the Begg and Greenes method or multiple imputation. If not defined by design, partial verification should be avoided, as it can seriously bias the results. There are, however, situations where the mechanism of missing reference data is not known and partial verification cannot be avoided. In these situations we strongly recommend using multiple imputation to correct the accuracy estimates: it is more flexible and straightforward than the Begg and Greenes correction method and gives reliable estimates of the missing reference data.
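For reference, the Begg and Greenes correction itself is a short calculation. The sketch below uses hypothetical counts and assumes verification depends only on the index-test result (the method's missing-at-random condition); it is an illustration, not the authors' code.

```python
def begg_greenes(n_pos, n_neg, ver_pos, dis_pos, ver_neg, dis_neg):
    """Begg and Greenes correction for partial verification bias.

    n_pos, n_neg     -- all patients with a positive/negative index test
    ver_pos, ver_neg -- how many in each group received the reference standard
    dis_pos, dis_neg -- verified patients found diseased in each group
    Returns (sensitivity, specificity) corrected for the unverified
    patients, assuming verification depends only on the test result.
    """
    p_d_tpos = dis_pos / ver_pos   # P(D+ | T+) from the verified subset
    p_d_tneg = dis_neg / ver_neg   # P(D+ | T-) from the verified subset
    se = (p_d_tpos * n_pos) / (p_d_tpos * n_pos + p_d_tneg * n_neg)
    sp = ((1 - p_d_tneg) * n_neg) / ((1 - p_d_tneg) * n_neg
                                     + (1 - p_d_tpos) * n_pos)
    return se, sp

# Hypothetical data: 300 test-positives (200 verified, 150 diseased) and
# 700 test-negatives (100 verified, 5 diseased).
se, sp = begg_greenes(300, 700, 200, 150, 100, 5)
```

A naive estimate from the verified patients alone would ignore the 600 unverified test-negatives; the correction reweights the verified-subset disease rates by the full distribution of test results.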

DEM-3 THE EFFECT OF RADIOLOGY TRAINING ROUTE AND WORKSTATION LAYOUT ON MAMMOGRAPHY DECISIONS (DEM)
Diagnostic Errors in Medicine
Sian Taylor-Phillips, PhD1, Matthew G. Wallis, MBChB2, Alison Duncan, MBChB, FRCR3, Aileen Clarke, MBChB1, and Alastair G. Gale, PhD4; (1) The University of Warwick, Coventry, United Kingdom, (2) Addenbrookes Hospital, Cambridge, United Kingdom, (3) University Hospitals Coventry and Warwickshire NHS Trust, Coventry, United Kingdom, (4) Loughborough University, Loughborough, United Kingdom

Purpose: To measure the effect of mammography readers' previous training on diagnostic performance in the transition to digital mammography.

Method: In the UK there are two types of mammography reader: radiologists and radiography advanced practitioners (radiographers trained to read mammograms). Radiologists experience a broader training programme in medicine and radiology. The performance of these two groups was investigated with two different digital mammography workstation layouts: with prior mammograms (from the previous screening round) digitised and adjacent to the current mammograms, or presented in film format and perpendicular to the current mammograms. 160 difficult test cases were read by four radiologists and four radiography advanced practitioners at each workstation. Diagnostic performance was measured using Jackknife Free Response Receiver Operating Characteristic (JAFROC) analysis. Whether participants looked at the prior mammograms, and the time taken, was recorded for each case using video equipment.

Result: There was no difference in overall performance between radiologists and radiography advanced practitioners (F(1) = .002, P = .97). There was an interaction between type of mammography reader and both workstation layout (F(1) = 10.5, P = .048) and percentage of cases for which the prior mammogram was looked at (F(1,6) = 11.6, P = .01). Radiologists outperformed radiography advanced practitioners and looked at the prior mammograms for a greater proportion of cases when using film prior mammograms, but the reverse relationship was apparent when using digitised prior mammograms (see Figure 1). Participants spent longer per case in the modalities in which they performed the worst, suggesting performance differences were not due to lack of effort.

Conclusion: The introduction of changes to the breast screening programme, such as new technology or workstation changes, may affect mammography readers in different ways depending on the training route they took, even if the changes do not impact diagnostic performance overall. Identifying these relationships may help optimise performance.

DEM-4 COMPARISON OF VARIANTS OF A COMPUTERIZED CHEST PAIN DIAGNOSIS TUTOR (DEM)
Diagnostic Errors in Medicine
Robert M. Hamm, PhD1, Timothy A. Wolfe, AA2, Christopher E. Thompson, AA1, Eric E. Arbuckle, MD3, Sven E. Berger4, David G. Aldrich3, Bruna M. Varalli-Claypool1, and Frank J. Papa, DO, PhD3; (1) University of Oklahoma Health Sciences Center, Oklahoma City, OK, (2) East Central University, Ada, OK, (3) Texas College of Osteopathic Medicine, Fort Worth, TX, (4) ACDET, Inc., Fort Worth, TX

Purpose: The KBIT tutorial approach presents many (>50) case examples of a target presentation, such as chest pain, to sharpen students' ability to recognize and discriminate different diagnostic categories. In a previous study, study booklets highlighting particular symptoms' ability to discriminate confusable diagnoses improved students' diagnostic accuracy. Here we compared three formats of computer-generated error feedback, varying their focus on symptoms that discriminate the right case diagnosis from the student's wrong answer.

Method: 53 physician assistant students (Study 1) and 54 PA, 15 MD, and 15 other students (Study 2) completed a pretest, the tutorial, a posttest, and a 2-week follow-up test. The students studied each of 9 diagnoses' symptom lists, then diagnosed 49 practice cases described in terms of history and physical, with multiple-choice response and immediate error feedback. Students were randomized to receive different error feedback for misdiagnosed cases: prototype of the right answer (1-column feature list); features common to both the right answer and the student's wrong answer plus features unique to the right answer (2-column); or common features plus those unique to each disease (3-column). Study 1 participants saw 3 diseases in each of the three formats, counterbalanced. Study 2 participants saw just one format. Tests presented similar cases, without feedback, with 17 items repeated on all three occasions. Cases' surface details and case order were varied upon repetition.

Result: In Studies 1 and 2, participants correctly diagnosed significantly more of the 17 repeated cases on the posttest (74%/67%) and 2-week follow-up (59%/49%) than on the pretest (43%/36%). At each time point, students correctly diagnosed more of the items considered "easier" on the basis of KBIT's underlying prototype theory of category learning. The expected differences in accuracy gain due to the format of error feedback were not observed.

Conclusion: The study demonstrated that the tutorial, with its error feedback for many cases, using any of the three forms of error feedback, contributes to student learning of chest pain diagnosis. Contrary to expectation, students did not learn more about those diseases for which they had received 2- or 3-column feedback, which highlights symptoms' ability to discriminate between diagnoses, compared to the simple reminder of the correct diagnosis' symptoms. Possible explanations include insufficient training in use of the tutorial feedback, limited exposure (few errors), test cases insensitive to the lessons learned, or adequacy of the prototype feedback.

DEM-5 DIAGNOSING ALCOHOLIC LIVER DISEASE: A PILOT STUDY FOR NICE (DEM)
Diagnostic Errors in Medicine
Matt Stevenson, PhD, and Myfanwy Lloyd Jones, PhD, The School of Health and Related Research, University of Sheffield, Sheffield, England

Purpose: This work was undertaken as a pilot study for the recently formed NICE diagnostic appraisal committee.
Non-invasive liver tests (NILTs) for alcoholic liver disease (ALD) were chosen; these tests have the potential to reduce both the costs and the risk of mortality associated with liver biopsy. Four NILTs were evaluated: three blood tests (FibroTest, FibroMAX and the ELF test) and one transient elastography device (FibroScan).

Method: Systematic reviews were undertaken and complemented by specialist expert input to estimate parameters within a mathematical model. Key parameters included the sensitivity and specificity of each NILT compared with biopsy; the sensitivity of biopsy; the mortality associated with biopsy; and the cost and QALY implications for combinations of diagnostic test outcomes (true positives, false positives, etc.) and patient abstinence status (abstinent or not). The model was constructed to assess the cost-effectiveness of strategies using a NILT followed by a confirmatory biopsy where cirrhosis was indicated, and strategies relying on the diagnosis of the NILT alone, compared with biopsying all patients.

Result: The most cost-effective intervention was markedly influenced by the assumed rates of abstinence for each strategy. Abstinence is the mainstay of treatment for all patients with ALD and can greatly increase life expectancy. Studies report a higher abstinence rate for patients whose biopsy diagnosed cirrhosis compared with those where it did not. If these rates are appropriate to NILT-only strategies, then tests with low specificity produce the greatest health benefit due to an increase in abstinence amongst the false positives. A key uncertainty is whether abstinence rates following a diagnosis made using a NILT alone would be influenced by patient or physician knowledge of its low specificity. The cost-effectiveness of a NILT-followed-by-biopsy strategy is currently uncertain, with scenario analyses reporting favourable and unfavourable values. These results are dependent on assumptions regarding the sensitivities and specificities of each diagnostic test, and on the cost and utility estimates for treating diseases unrelated to ALD detected on biopsy, which have relatively wide confidence intervals.

Conclusion: The cost-effectiveness of NILT-only strategies is determined by the abstinence rates assumed in those diagnosed as having cirrhosis, many of whom will be false positives. A NILT used prior to biopsy may be cost-effective. Further data are required before definitive conclusions can be drawn.

DEM-6 DO DIAGNOSTIC REASONING FAULTS ALWAYS RESULT IN DIAGNOSTIC ERRORS OR PATIENT HARM? (DEM)
Diagnostic Errors in Medicine
Laura Zwaan, MSc1, Abel Thijs, MD, PhD2, Cordula Wagner, PhD3, Gerrit van der Wal, MD, PhD1, and Danielle R. M. Timmermans, PhD1; (1) EMGO Institute/VU University Medical Center, Amsterdam, Netherlands, (2) Department of Internal Medicine, VU University Medical Center, Amsterdam, Netherlands, (3) NIVEL and VU University Medical Center, Utrecht, Netherlands

Purpose: The present study aims to examine the occurrence and causes of faults in the diagnostic reasoning process (suboptimal cognitive acts) and to relate them to diagnostic error and patient harm.

Method: Physicians included 247 dyspnea patients in the study, whose patient records were reviewed by expert internists. A questionnaire to review the diagnostic reasoning process was developed using the Delphi method. The record review focused on detecting suboptimal cognitive acts, diagnostic error and patient harm. The findings of the record reviews were discussed with the treating physicians and subsequently classified using Reason's taxonomy of unsafe acts.

Result: Suboptimal cognitive acts (such as incomplete medical history taking, or not performing a necessary EKG) occurred in 66% of all cases. In 13.8% of all cases a diagnostic error occurred, and in 11.3% the patient was harmed.
There was an overlap between cases with diagnostic errors and cases with patient harm of 3.2%. Diagnostic error and patient harm occurred more often in cases with more suboptimal cognitive acts. However, diagnostic errors and patient harm also occurred in cases where no suboptimal cognitive acts were involved (e.g., due to complications). Based on the expert judgments and the interviews with the treating physicians, the causes were mostly classified as mistakes (46%) and slips (18%).

Conclusion: Diagnostic errors and patient harm were associated with more suboptimal cognitive acts. There was, however, no complete overlap between these three components: diagnostic errors and patient harm also occurred when no faults were involved. Often, more than one suboptimal cognitive act occurred in cases with diagnostic errors and patient harm, suggesting that a series of suboptimal cognitive acts was involved. Mistakes and slips were mostly seen as causes of suboptimal cognitive acts, implying that physicians did not realize their actions were incorrect. More supervision could be a way to reduce diagnostic errors.

DEM-7 HOW ARE EVIDENCE-BASED DECISION RULES APPLIED TO PATIENTS WITH SUSPECTED PULMONARY EMBOLISM? (DEM)
Diagnostic Errors in Medicine
Laura Zwaan, MSc1, Abel Thijs, MD, PhD2, Cordula Wagner, PhD3, and Danielle R. M. Timmermans, PhD1; (1) EMGO Institute/VU University Medical Center, Amsterdam, Netherlands, (2) Department of Internal Medicine, VU University Medical Center, Amsterdam, Netherlands, (3) NIVEL and EMGO Institute for Health and Care Research, Utrecht, Netherlands

Purpose: Pulmonary embolism is a frequently missed diagnosis. The goal of the present study is to examine whether the evidence-based decision rules for diagnosing pulmonary embolism are correctly applied in clinical practice.

Method: Physicians included 247 dyspnea patients in the study. Directly after the first examination, the physicians indicated the differential diagnoses and the likelihood of those diagnoses. After the patient was discharged from the hospital, the patient records were reviewed by experts. The cases in which the physicians suspected pulmonary embolism were selected for further analysis. The diagnostic process in those cases was compared to the evidence-based decision rules for diagnosing pulmonary embolism, which are based on the Wells score. In addition, 16 interviews were conducted with physicians who did not follow the evidence-based decision rules, to obtain information on the reasons why the rules were not applied.

Result: In 80 out of 247 cases the physician suspected pulmonary embolism. The evidence-based decision criteria were correctly applied in 17 of the 80 cases. In 36 cases unnecessary tests were performed to diagnose pulmonary embolism (i.e., CTa or D-dimer), while in 39 cases pulmonary embolism was not sufficiently examined, meaning that pulmonary embolism could have been missed. When asked about their decisions, physicians indicated that they did not want to expose the patient to the radiation of a CTa, or that they considered another diagnosis more likely and assumed the patient did not also have pulmonary embolism; they therefore decided not to examine the patient more extensively.

Conclusion: The evidence-based decision rules are not always correctly applied in clinical practice.
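As background, the two-level Wells rule for pulmonary embolism can be sketched as a simple scorer. The point values below are the standard two-level Wells items; the function and item names are hypothetical, and this is an illustration rather than the study's instrument.

```python
# Standard two-level Wells items for pulmonary embolism and their points.
WELLS_PE_ITEMS = {
    "clinical_signs_of_dvt": 3.0,
    "pe_most_likely_diagnosis": 3.0,
    "heart_rate_over_100": 1.5,
    "immobilization_or_recent_surgery": 1.5,
    "previous_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "malignancy": 1.0,
}

def wells_pe(findings):
    """Sum the points for the positive findings and apply the two-level
    cutoff: score > 4 -> 'PE likely', otherwise 'PE unlikely'."""
    score = sum(WELLS_PE_ITEMS[f] for f in findings)
    return score, ("PE likely" if score > 4 else "PE unlikely")

score, category = wells_pe(["pe_most_likely_diagnosis", "heart_rate_over_100"])
# 3.0 + 1.5 = 4.5, so this patient is classed as "PE likely"
```

In the two-level workflow, "PE unlikely" patients proceed to D-dimer testing, while "PE likely" patients proceed directly to imaging; deviating from this ordering is what the study counts as incorrect application.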
In a substantial number of cases in which pulmonary embolism was suspected, either no diagnostic tests were performed or unnecessary diagnostic tests took place. The physicians tended to overrule the criteria when they examined a patient. Physicians should be better trained and motivated to correctly apply the evidence-based decision rules in order to improve the diagnostic process.

DEM-8 SAFE PATIENTS, SMART HOSPITALS: WHAT CAN THE DEM MOVEMENT LEARN FROM SUCCESSES IN THERAPEUTIC SAFETY? (DEM)
Diagnostic Errors in Medicine
Peter J. Pronovost, MD, PhD, FCCM, Johns Hopkins University School of Medicine, Baltimore, MD

World-renowned patient safety leader and change agent Peter Pronovost will leverage his extensive personal experience with creating a culture of safety in medicine, and with using simple interventions such as checklists to create measurable improvements in patient safety, to suggest a path forward for the burgeoning Diagnostic Error in Medicine movement.

DEM-9 RESEARCH METHODS PANEL DISCUSSION: IS THERE A RIGHT WAY FORWARD? (DEM)
Diagnostic Errors in Medicine
Peter J. Pronovost, MD, PhD, FCCM1, Brendan C. Delaney, MD2, Olga Kostopoulou, PhD, MSc, BA2, David E. Newman-Toker, MD, PhD3, Gordon Schiff, MD4, Michael J. Schull, MD, MSc, FRCPC5, and Hardeep Singh, MD, MPH6; (1) Johns Hopkins University School of Medicine, Baltimore, MD, (2) King's College London, London, United Kingdom, (3) Johns Hopkins Hospital, Baltimore, MD, (4) Brigham and Women's Hospital, Boston, MA, (5) Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada, (6) Baylor College of Medicine, Bellaire, TX

DEM-10 INTUITION IN DECISION MAKING—POTENTIAL AND SHORTCOMINGS (DEM)
Diagnostic Errors in Medicine
Tilmann Betsch, PhD, University of Erfurt, Erfurt, Germany

I claim that intuition is capable of quickly processing multiple pieces of information without noticeable cognitive effort. I advocate a component view stating that intuitive processes in judgment and decision making are responsible for information integration and preference formation, while analytic thinking mainly guides the search, generation and change of information. I present empirical evidence corroborating this notion of intuition. Specifically, I show that the integration of information and the formation of preferences function without cognitive control and are unconstrained by the amount of encoded information and by cognitive capacity. I close by discussing conditions under which intuition will increase or decrease decision accuracy.

DEM-11 FEELING WE'RE WRONG: INTUITIVE BIAS DETECTION DURING DECISION-MAKING (DEM)
Diagnostic Errors in Medicine
Wim De Neys, PhD, University of Toulouse, Toulouse, France

Human thinking is often biased by heuristic intuitions. Popular theories have argued that people overrely on intuitive thinking and fail to engage in more demanding logical reasoning. However, the nature of the intuitive bias and of the logical thinking failure are poorly understood. It is not clear whether the bias results from a failure to detect that the heuristic intuitions conflict with more logical considerations, or from a failure to discard and inhibit these tempting heuristics. The exact locus of the intuitive bias has far-reaching consequences for the debate on human rationality.
If people were at least to detect the conflict, this would imply that they are no mere illogical thinkers but are aware that their response is not fully warranted. Specifying the exact bias locus is also paramount for the development of more effective intervention programs to prevent biased thinking. However, the field lacks clear data to settle this debate. In my research I address this fundamental problem with an interdisciplinary approach that combines reasoning research with insights from the memory and cognitive control fields. Relying on a combination of experimental, developmental, and neuroscientific methods, I have begun to characterize the conflict detection and inhibition mechanisms at work during thinking. This multifaceted approach has demonstrated that conflict detection during thinking is remarkably flawless: although people fail to inhibit tempting heuristics, they at least implicitly detect that their answer is not warranted. This implies that people are far more logical than the widespread bias suggests and provides new insights into the alleged human irrationality.

DEM-12 INSTINCT VS REASON: HOW METACOGNITIVE EXPERIENCES CONTROL ANALYTIC THINKING (DEM)
Diagnostic Errors in Medicine


Valerie Thompson, BSc, MA, PhD, University of Saskatchewan, Saskatoon, SK, Canada

Often when making decisions, one or more of the potential choices is suggested by automatic, fast-acting heuristic processes. Advertisers, for example, rely on a sense of familiarity to increase the appeal of their products. Even complex judgments made by experts can be delivered by heuristic processes. These initial, intuitive judgments can, in theory, be overturned by recourse to more reasoned analysis. However, as the extensive heuristics-and-biases literature demonstrates, reasoners often give responses that are consistent with the initial intuition, even though this leads them to neglect relevant principles of probability and logic. This phenomenon motivates Dual Process Theories of reasoning, which posit that automatic Type 1 processes give rise to a highly contextualised representation of the problem and attendant judgments, which may or may not be analysed extensively by more deliberate, decontextualised Type 2 processes. A critical but unanswered question concerns the issue of monitoring and control: when do reasoners rely on the first, intuitive output, and when do they engage in more effortful thinking? In this talk, I will present data to support the hypothesis that the compellingness of these intuitions can be attributed to a second-order metacognitive judgment, which I will call the Feeling of Rightness. In other words, the initial intuition has two distinct aspects: the content of the choice delivered to working memory, and a judgment about how right that decision feels. It is this latter judgment that determines the probability that more deliberate, analytic processes are engaged.

GLH-1 PATIENT- AND POPULATION-LEVEL HEALTH CONSEQUENCES OF DISCONTINUING ANTIRETROVIRAL THERAPY (ART) IN RESOURCE-LIMITED SETTINGS (GLH)
Global Health—Resource allocation and policy planning
April D. Kimmel, PhD, MSc1, Stephen C. Resch, PhD, MPH2, Xavier Anglaret, MD, PhD3, Norman Daniels, PhD2, Sue J. Goldie, MD, MPH2, Christine Danel, MD, PhD4, Kenneth A. Freedberg, MD, MSc5, and Milton C. Weinstein, PhD2; (1) Weill Cornell Medical College, New York, NY, (2) Harvard School of Public Health, Boston, MA, (3) Inserm U897, Bordeaux, France, (4) PACCI Program, Abidjan, Ivory Coast, (5) Massachusetts General Hospital, Boston, MA

Purpose: In resource-limited settings, increasing numbers of HIV-infected individuals are initiating ART and remaining in care longer. Many HIV budgets, however, are flattening or decreasing. By modeling a policy of discontinuing ART after treatment failure, we aimed to highlight trade-offs among the competing policy goals of optimizing individual health outcomes, population health outcomes, and the number receiving treatment.

Method: We assessed three HIV treatment strategies: (1) no ART; (2) never discontinue ART (Status Quo); and (3) discontinue ART after failure (Alternative). We used a state-transition model (CEPAC-International) to simulate annual probabilities of survival and of receiving ART for treatment-eligible, HIV-infected individuals in the absence of treatment constraints. These estimates were then fed into a population-level linear programming model that included constraints on treatment capacity. For simplicity, we assumed that the incidence of new patients and the treatment capacity were constant over time. Data were derived from clinical trials and cohort studies conducted in Côte d'Ivoire, West Africa. Treated individuals received two sequential ART regimens; switching to 2nd-line ART and discontinuation of 2nd-line ART (Alternative strategy only) occurred upon detection of antiretroviral failure, defined as a 50% decrease in peak CD4 count. Individuals receiving a failed ART regimen continued to experience some treatment benefit, including decreased risk of AIDS-related mortality. At the population level, we assumed an analytic time horizon of 5 years and that the number of treatment-eligible cases (100,000/year) exceeded treatment capacity (25,000/year).

Result: At the population level, including treated and untreated individuals, the Alternative strategy increased total life-years by 10,000 (+0.8%) to 1.23 million compared to the Status Quo strategy. The Alternative strategy increased the average number initiating ART annually by 180 individuals (+13.3%) to 1,530 compared to the Status Quo. Although more individuals received treatment under the Alternative strategy, life expectancy for treated individuals decreased by 0.8 years (–4.6%) to 16.4 years compared to the Status Quo. Among patients receiving ART over the 5-year period, 20.7% died under the Alternative strategy compared to 18.7% under the Status Quo. Results were sensitive to the timing of detection of ART failure, the number of ART regimens, and the level of treatment capacity.

Conclusion: With limited HIV treatment resources, trade-offs emerge between maximizing health outcomes for the individual patients receiving treatment and maximizing health outcomes and access to treatment at the population level.

GLH-2 HEALTH INSURANCE, CATASTROPHIC MEDICAL SPENDING, AND DIABETES TREATMENT IN DEVELOPING COUNTRIES (GLH)
Global Health—National and local health systems
Crystal M. Smith-Spangler, MD, Veterans Affairs Palo Alto Health Care System and Stanford University, Stanford, CA, and Jeremy D. Goldhaber-Fiebert, PhD, Stanford University, Stanford, CA

Purpose: People obtain health insurance to protect themselves from the high costs of medical care, and particularly from catastrophic medical spending. Numerous developing countries provide health insurance schemes, though insurance is often incomplete, necessitating out-of-pocket expenses for major illnesses. The global increase in diabetes has also increased the need for potentially unaffordable medical care. This study quantified the out-of-pocket medical spending associated with diabetes in developing countries and assessed whether health insurance was associated with less catastrophic medical spending and increased availability of diabetes medications.

Method: Using 2002–3 World Health Survey data (n = 124,484 individuals from 35 low- and middle-income countries), we estimated the relationship between diabetes and out-of-pocket medical expenditures, expressed in 2003 international dollars, with quantile regressions that conditioned on age, sex, income, urban location, smoking, educational attainment, and health insurance. Similarly, we estimated the relationship between diabetes, catastrophic medical spending, health insurance, and possessing diabetes medications using logistic regressions. All analyses included country fixed effects and robust standard errors clustered by country.

Result: Diabetes is associated with differentially higher out-of-pocket medical spending, particularly among individuals with high levels of out-of-pocket spending (excess spending of $3/year [95% CI: $2–$4] at the 50th percentile of out-of-pocket spending, rising to $159/year [95% CI: $133–$184] at the 95th percentile). While having diabetes is associated with catastrophic spending (OR 1.38 [95% CI: 1.06–1.80]), health insurance is not significantly associated with reductions in catastrophic spending among diabetics (OR 1.07, P = 0.84) or non-diabetics (OR 1.06, P = 0.28). Among diabetics, insurance is not significantly associated with increased diabetes medication possession (OR 1.17 [95% CI: 0.75–1.84]).

Conclusion: Individuals with diabetes in low- and middle-income countries have higher out-of-pocket medical expenditures and a greater risk of catastrophic medical spending. In these settings, current health insurance schemes do not provide sufficient protection against costly medical care or sufficient subsidies to ensure that diabetes medications are obtained.
To improve outcomes and avert the costs of diabetes in developing countries, policies that combine primary prevention, improved health insurance mechanisms, and accessible healthcare delivery are likely required. GLH-3 PARENTS OF CHILDREN WITH HEALTH PROBLEMS: HOW DOES THEIR HEALTH CHANGE OVER TIME? (GLH) Global Health—Clinical epidemiology, e.g., neglected tropical diseases, malnutrition Jamie C. Brehaut, PhD1, Rochelle Garner, PhD2, Dafna Kohen, PhD2, Anton Miller, MD3, Lucyna Lach, PhD4, Anne Klassen, DPhil5, and Peter Rosenbaum, MD5 (1) Ottawa Hospital Research Institute, Ottawa, ON, Canada, (2) Statistics Canada, Ottawa, ON, Canada, (3) University of British Columbia, Vancouver, BC, Canada, (4) McGill University, Montreal, QC, Canada, (5) McMaster University, Hamilton, ON, Canada Purpose: The literature on the health of parents of children with health problems is mixed as to how their health should change over time, with hypotheses suggesting that it should a) decline faster over time, or b) actually improve over time compared to caregivers of healthy children. Canadian population-based data were used to examine changes over a 10-year period in the health of caregivers of children with increasingly complex health problems compared to caregivers of healthy children. Method: The Canadian National Longitudinal Study of Children and Youth provided data collected biennially from 9401 children and their caregivers over six data cycles between 1994/5 and 2004/5. Linear and logistic growth curve analyses were used to model two outcomes: self-reported general health, based on a single item reported on a 5-point scale ranging from 1 (poor) to 5 (excellent), and number of depressive symptoms, based on a shortened version of the Center for Epidemiological Studies Depression Scale (CES-D).
The sample was divided into four groups of caregivers: caregivers of Healthy children, and caregivers of children with 1, 2, or 3 or more of four conceptually distinct indicators of child health problems. Modeled covariates included child variables (age, gender, and only-child status) and caregiver variables (age, gender, education, income, marital status). Result: Models showed that after controlling for covariates, caregivers of children with more severe health problems were less likely than caregivers of healthy children to report good general health (Healthy child group: 76.9%; 3+ indicators group 60.7%, P < .001) and reported more depressive symptoms (Healthy: 4.2/36; 3+ indicators: 7.1/36, P < .001). The size of this caregiver health effect varied according to complexity of child health problems (overall χ2 = 100.20, df = 3, P < .0001). Over the 10-year time period, there was a consistent 25% reduction in the percentage of respondents self-reporting good health for all four groups. Conclusion: The caregiver health decrement associated with having a child with a health problem is relatively stable over time, can persist for many years, and is greater with more complex child health problems. Increased awareness of these patterns should inform providers and policy-makers in their decisions about how to serve these families. GLH-4 HUMAN PAPILLOMAVIRUS TESTING FOR THE DETECTION OF CERVICAL INTRAEPITHELIAL NEOPLASIA: A SYSTEMATIC REVIEW (GLH) Global Health—Clinical epidemiology, e.g., neglected tropical diseases, malnutrition Emily Burger, BSBA1, Hege Kornor, PhD2, Marianne Gjertsen, MD, PhD2, Vigdis Lauvrak, ScD2, and Ivar Sønbø Kristiansen, MD, PhD1 (1) University of Oslo, Oslo, Norway, (2) The Norwegian Knowledge Centre for the Health Services, Oslo, Norway

Purpose: The purpose of the study was to compare the diagnostic accuracy of human papillomavirus (HPV) mRNA testing to HPV DNA testing. Method: We searched electronic databases MEDLINE, EMBASE and Cochrane Library from January 1996 through March 2010 using a predefined search strategy. We included publications in English or a Scandinavian language reporting data allowing the construction of a 2 × 2 table from studies with >50 participants where: an HPV mRNA test assessing >2 HPV genotypes was compared to an HPV DNA test, and the reference standard was histologically confirmed cervical intraepithelial neoplasia 2+ (CIN2+). Two reviewers independently assessed study eligibility, extracted data, and assessed risk of bias. Sensitivity, specificity, positive and negative likelihood ratios, positive and negative predictive values, and diagnostic odds ratios were calculated for each study. In addition, we fitted a series of summary receiver operating characteristic (SROC) curves. Result: Out of 3126 potentially relevant citations, ten publications (nine studies) met our inclusion criteria. The included studies were of varying methodological quality, and predominantly performed in a secondary screening setting. Six studies investigated the performance of the Pre Tect Proofer/NucliSENS EasyQ, two studies investigated the performance of the APTIMA assay, and one study investigated both mRNA tests on the same patient samples. Due to few studies and considerable clinical heterogeneity, pooling of data was not possible. Instead, we compiled a ‘best evidence synthesis’ for E6/E7 mRNA HPV testing. Sensitivities ranged from 0.41 to 0.86 and from 0.90 to 0.95 for the Pre Tect Proofer and APTIMA assay, respectively. Specificities ranged from 0.63 to 0.97 and from 0.42 to 0.61 for the Pre Tect Proofer and APTIMA assay, respectively. The SROC curves for both mRNA tests were to the left of the diagonal and the APTIMA assay performed closest to the DNA tests.
Conclusion: The synthesized evidence suggests that mRNA tests have diagnostic relevance. The Pre Tect Proofer appears to have a higher clinical specificity in exchange for a lower sensitivity when compared to DNA tests. The APTIMA results indicate that it may perform very closely to the DNA tests with a slightly higher specificity. Additional studies and economic evaluations should be conducted before a firm conclusion can be drawn regarding the clinical applicability of HPV mRNA testing. GLH-5 A SYSTEMATIC REVIEW OF THE INCIDENCE AND ECONOMIC BURDEN OF SURGICAL SITE INFECTION IN KOREA (GLH) Global Health—Clinical epidemiology, e.g., neglected tropical diseases, malnutrition Jonathan T. Tan, PhD1, Kristina Coleman, PhD, MPH1, Sarah Norris, PhD1, and Laurent Metz, MD, MBA, MS2 (1) Health Technology Analysts Pty Ltd, Balmain, Australia, (2) Johnson & Johnson, Singapore, Singapore Purpose: To conduct a systematic review of literature on the epidemiological and economic burden of surgical site infection (SSI) in Korea. Method: A literature search of the EMBASE, Medline and KoreaMed databases for English and Korean language publications was conducted. Searches for epidemiological and economic studies were conducted separately and limited to 1995–2010 to ensure the pertinence of the data. Relevant studies were identified using pre-defined criteria (i.e., reports the rate, risk factors, or cost of SSI; conducted in a hospital setting; not an intervention study). Result: Twenty-five studies were included in this systematic review. The overall incidence of SSI in Korea was between 2% and 5%. However, the incidence varied more widely depending on the surgical procedure examined. In particular, surgeries involving the gastrointestinal tract were associated with higher rates of SSI
(up to 30%). The National Nosocomial Infections Surveillance (NNIS) risk index was positively correlated with the risk of developing an SSI. Specific risk factors were identified through multivariate analyses; these included diabetes, antibiotic prophylaxis and wound classification, which were shown to increase the risk of SSI by over 100%. The pathogens most commonly associated with SSI in Korea were Pseudomonas aeruginosa and Staphylococcus aureus. SSIs are associated with increased hospitalisation cost, with each episode of SSI estimated to cost an additional ₩2,000,000. A substantial portion of the increased cost was attributed to hospital room costs and the need for additional medication. One study reported that the cost of antibiotics in patients who developed SSIs was, on average, ₩561,068 higher than in patients without SSIs. Studies also found that post-operative stays in patients with SSIs were 5 to 20 days longer, while two studies reported that following cardiac surgery, patients with SSIs spent an additional 5 to 11 days in the ICU, compared to patients without SSIs. Conclusion: The incidence and cost estimates demonstrate that SSI represents a significant burden to the Korean healthcare system. Consequently, the identification of high-risk patient populations and the development of strategies aimed at reducing SSI could provide cost-savings and improve the efficiency of the Korean healthcare system. GLH-6 THE IMPORTANCE OF PREVALENCE ON SETTING POSITIVE TEST THRESHOLDS AND ON TEST OUTCOMES: THE CASE OF TUBERCULOSIS (GLH) Global Health—Resource allocation and policy planning Tanya G. K.
Bentley, PhD, Partnership for Health Analytic Research, LLC, Beverly Hills, CA, Antonio Catanzaro, MD, University of California San Diego, La Jolla, CA and Theodore Ganiats, MD, UCSD School of Medicine, 9500 Gilman Drive, La Jolla, CA Purpose: Many factors affect the balance of true and false test results, and the interaction of two such factors—disease prevalence and the positive threshold—causes results to differ in high versus low-prevalence settings. We used an example of testing for latent tuberculosis infection (LTBI) to demonstrate the importance of disease prevalence in decisions regarding positive thresholds and test strategies. Method: We compared the number of true and false positive results when using two LTBI screening tests (in-tube QuantiFERON-TB Gold [QFT-IT] and T-SPOT.TB) in five countries of varying prevalence. We used estimates from test manufacturers to ascertain each test’s positive thresholds, from published literature to determine sensitivity (81%, QFT-IT; 88%, T-SPOT.TB) and specificity (99%; 88%), and from the World Health Organization to estimate country-specific LTBI prevalence. We assumed sensitivity and specificity remained stable, with prevalence the only difference between settings. Result: In switching from QFT-IT to T-SPOT.TB, the 7% increase in sensitivity impacted the number of true positives more in high-prevalence settings, and the 11% decrease in specificity impacted the number of false positives more in low-prevalence settings. Trade-offs between increasing case identification and decreasing unnecessary treatments thus differed by orders of magnitude as prevalence varied, with lower-prevalence settings paying a “price” of accepting more false positives for each true positive gained. For example, the number of false positives per true positive gained in the United States, with 5% LTBI prevalence, was close to 10-fold higher than in Mexico with 29% prevalence, and 30-fold higher than in Ivory Coast with 55% prevalence.
Lower-prevalence countries may therefore determine that a 7% increase in early case detection benefits too few people to justify the high burden of false positives, while higher-prevalence countries may decide that a greater increase in early detection is worth the increased treatment of false positives, especially in settings with limited access to care.
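The trade-off arithmetic described above can be reproduced directly from the reported inputs. The short Python sketch below is an illustration under the abstract's stated assumptions, not the authors' analysis: it computes true and false positives per 100,000 people tested and the false-positives-per-true-positive "price" of switching tests, using the quoted sensitivities, specificities, and prevalences.

```python
# Illustrative sketch of the prevalence trade-off, using the figures quoted
# in the abstract: QFT-IT (sens 81%, spec 99%), T-SPOT.TB (sens 88%, spec 88%),
# and LTBI prevalences of 5% (US), 29% (Mexico), and 55% (Ivory Coast).

SENS_QFT, SPEC_QFT = 0.81, 0.99
SENS_TSPOT, SPEC_TSPOT = 0.88, 0.88

def counts(prev, sens, spec, n=100_000):
    """True and false positives among n people tested at a given prevalence."""
    tp = n * prev * sens
    fp = n * (1 - prev) * (1 - spec)
    return tp, fp

def fp_per_tp_gained(prev):
    """Extra false positives incurred per extra true positive when switching
    from QFT-IT to the more sensitive but less specific T-SPOT.TB."""
    tp_qft, fp_qft = counts(prev, SENS_QFT, SPEC_QFT)
    tp_ts, fp_ts = counts(prev, SENS_TSPOT, SPEC_TSPOT)
    return (fp_ts - fp_qft) / (tp_ts - tp_qft)

for country, prev in [("United States", 0.05), ("Mexico", 0.29), ("Ivory Coast", 0.55)]:
    print(f"{country}: {fp_per_tp_gained(prev):.1f} FP per TP gained")
```

With these inputs the "price" in the United States comes out roughly an order of magnitude above Mexico's and more than twenty-fold above Ivory Coast's, broadly consistent with the ratios reported; the exact published figures may differ because the abstract does not list all of its inputs.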

Conclusion: Sensitivity and specificity of tests such as QFT-IT and T-SPOT.TB differ in large part because of positive test thresholds, which are applied by test manufacturers equivalently—yet can result in largely different outcomes—between settings. To optimize test performance and improve outcomes, sensitivity and specificity should be set locally, not globally, by incorporating prevalence in conjunction with other disease- and setting-specific factors when making testing decisions. GLH-7 SUPPLY CHAIN AND SYSTEM FACTORS THAT EXPLAIN VARIATIONS IN STATE VACCINATION COVERAGE LEVELS OF THE NOVEL H1N1 VACCINE (GLH) Global Health—National and local health systems Carlo S. Davila Payan, MS1, Pascale Wortley, MD2, and Julie L. Swann, PhD1 (1) a. Centers for Disease Control and Prevention, b. Georgia Institute of Technology, Atlanta, GA, (2) Centers for Disease Control and Prevention, Atlanta, GA Purpose: In response to the 2009 H1N1 influenza pandemic, millions in the US were vaccinated, with state-specific coverage ranging from 8.7% to 34.4% for adults and 21.3% to 84.7% for children under 18; we study factors associated with higher vaccination coverage in a system where vaccine was in short supply.
Method: We used regression coupled with other statistical techniques to predict state-specific vaccination coverage of adults or children, using independent variables including demographics and area (US Census Bureau); past seasonal adult or childhood vaccination coverage (Behavioral Risk Factor Surveillance System, National Immunization Survey); Public Health Emergency Response Funds (CDC); physician counts (US Bureau of Labor Statistics); children’s health information (National Center for Health Statistics); H1N1-specific state and local data at the CDC (level of allocation control, type of allocation priority, participation of VFC providers, date of expansion beyond ACIP target groups, number of shipments, number of ship-to locations, lead time for allocation and ordering, peak week of influenza-like illness activity); and degree of local autonomy of the public health system. Result: The best models including only statistically significant variables explained over 70% of the variation in state-specific vaccination coverage of adults or children. We find that higher past seasonal influenza vaccination coverage of adults was associated with higher 2009 H1N1 vaccination coverage of adults and children, and accounted for 30% of the variation. In terms of supply chain factors, vaccination coverage changed positively with the number of shipments per location and negatively with the time to order allocated doses. For children, the proportion of the state’s population 4y and had an average age of 59y. 76% were on prescription OA medications, 49% had hypertension and 36% were using PPIs. Patients ranked ambulatory pain (6.32; 95% CI 5.0–7.6) and difficulty doing daily activities (6.32; 95% CI 5.0–7.6) as the most important benefits, followed by resting pain (2.80; 95% CI 1.8–3.8), and stiffness
(2.65; 95% CI 0.9–4.4). Incremental changes (3%) in the risk of heart attack or stroke were assessed as the most important risk (10.00; 95% CI 8.2–11.8; and 8.90; 95% CI 7.3–10.5, respectively). A 2.5% incremental change in one-year ulcer risk (3.61; 95% CI 2.6–4.6) and the risk of hypertension (3.02; 95% CI 2.8–3.2) were valued less. Overall, we identified no significant differences in preferences across subgroups, but patients with hypertension weighed CV risks higher and patients on PPIs placed less weight on the risk of ulcer. Conclusions: Patients do have well-defined preferences across NSAID-related benefits and risks. These preferences can be estimated and used to examine acceptable trade-offs between benefits and risks. The observed differences in defined subgroups serve to validate our results, but need to be more rigorously examined in a larger sample of OA patients. HSR-1 COST-EFFECTIVENESS OF DRONEDARONE COMPARED TO AMIODARONE FOR PATIENTS WITH PAROXYSMAL ATRIAL FIBRILLATION (HSR) Health Services and Policy Research—Health Economics and Cost Analyses Shelby L. Corman, PharmD, and Kenneth Smith, MD, MS, University of Pittsburgh School of Medicine, Pittsburgh, PA Purpose: Dronedarone was FDA-approved in 2009 for the treatment of paroxysmal or persistent atrial fibrillation or atrial flutter in patients who are in sinus rhythm or who will be cardioverted. Dronedarone is more costly than its structural analogue, amiodarone, and preliminary data show it to be less effective than amiodarone but associated with fewer adverse events. Therefore, the purpose of this analysis was to compare the cost-effectiveness of dronedarone to amiodarone. Methods: A Markov model was used to estimate the costs and effectiveness of dronedarone and amiodarone in patients with paroxysmal atrial fibrillation. Costs were measured in 2009 US dollars, and effectiveness was measured in quality-adjusted life years (QALYs).
All costs and probabilities were drawn from published literature, and medication costs were expressed as average wholesale prices. A five-year time horizon and three-month cycle length were used. Costs and effectiveness were discounted at 3% per year. The base case analysis assumed a 10% probability of significant hyperthyroidism, hypothyroidism, and pulmonary toxicity in patients receiving amiodarone, and a relative risk of 0.5 for dronedarone versus amiodarone for each adverse event. A probabilistic sensitivity analysis was used to determine the probability of cost-effectiveness when all model variables were sampled from distributions reflecting uncertainty. Results: In the base case analysis, the dronedarone strategy is dominated, costing $6690 more and having 0.07 fewer QALYs than the amiodarone strategy. In one-way sensitivity analyses, pulmonary toxicity risk is the only variable that could alter this conclusion. If the ten-year probability of amiodarone-related pulmonary toxicity is ≥32% (clinically plausible range: 2%-17%), dronedarone is not dominated; if this probability is 52% or 44%, then the ICER for dronedarone is $50,000 or $100,000/QALY, respectively. At an acceptability threshold of $100,000/QALY, amiodarone is favored if its pulmonary toxicity risk is ≤36%, even if there is no risk of dronedarone pulmonary toxicity. A probabilistic sensitivity analysis favors amiodarone in >98% of 5000 iterations at a $100,000/QALY threshold. Conclusions: Dronedarone is more costly and less effective than amiodarone if the five-year probability of pulmonary toxicity with amiodarone is less than 32%. Due to its higher cost, we did not find dronedarone to be economically reasonable unless it has no pulmonary risk and amiodarone-related pulmonary toxicity occurs more frequently than previously reported.
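The dominance reasoning in these results follows mechanically from the incremental costs and QALYs. The Python sketch below is a minimal illustration of the standard decision rule, not the authors' Markov model, applied to the base-case increments reported in the abstract (+$6,690, −0.07 QALYs) at a $100,000/QALY threshold.

```python
# Minimal sketch of the cost-effectiveness decision rule (not the authors'
# Markov model). delta_cost and delta_qaly are the incremental cost and
# QALYs of a new strategy versus its comparator; quadrant handling is
# simplified for illustration.

def classify(delta_cost, delta_qaly, wtp=100_000):
    """Classify a new strategy at a willingness-to-pay threshold ($/QALY)."""
    if delta_cost >= 0 and delta_qaly <= 0:
        return "dominated"        # costs more, gains no health
    if delta_cost <= 0 and delta_qaly >= 0:
        return "dominant"         # saves money, gains health
    icer = delta_cost / delta_qaly
    return "cost-effective" if icer <= wtp else f"not cost-effective (ICER ${icer:,.0f}/QALY)"

# Dronedarone vs. amiodarone, base case from the abstract:
print(classify(6690, -0.07))   # -> dominated
```

Under this rule the base case is dominated and no ICER is defined; the one-way sensitivity results quoted above correspond to the point at which a higher amiodarone toxicity burden turns the incremental QALYs positive, so that an ICER can be computed and compared to the threshold.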

HSR-2 INPATIENT COSTS OF CORONARY HEART DISEASE AMONG ADULTS AGED 18–64 YEARS (HSR) Health Services and Policy Research—Health Economics and Cost Analyses Guijing Wang, PhD, Zhang Zefeng, PhD, Ayala Carma, PhD, Dunet Diane, PhD, and Fang Jing, MD, Centers for Disease Control and Prevention, Atlanta, GA Purpose: Coronary heart disease (CHD) is the most common and costly form of cardiovascular disease in the United States. Total medical costs have been estimated at $92.8 billion in 2009, and more than half of these ($54.6 billion) were inpatient costs. Evidence on the impact of comorbidities/complications on the cost of CHD is lacking. We examined the factors influencing the inpatient costs of CHD by analyzing a patient claims dataset. Method: From the 2005 MarketScan Commercial Claims and Encounters inpatient dataset, we identified 33,277 hospitalization claims with a primary diagnosis of CHD for patients aged 18 to 64 years with non-capitated health insurance plans; 10,188 (30%) were acute myocardial infarction (AMI). By secondary diagnosis status, we identified four major comorbidities/complications of CHD: hypertension, diabetes, heart failure, and hyperlipidemia. Using multivariate regression analysis, we examined the impact of these comorbidities/complications on the hospitalization costs, while controlling for selected patient characteristics as well as the cardiac procedures of coronary artery bypass graft surgery (CABG) and percutaneous coronary intervention (PCI). Result: The average cost of a CHD hospitalization was $23,484 and was $3,353 (P < 0.001) higher for AMI compared to other types of CHD. All comorbidities/complications were associated with increased inpatient costs. Hyperlipidemia had the biggest impact on the higher cost of AMI hospitalizations ($4,937, P < 0.001), while heart failure had the biggest impact on the higher costs of CHD hospitalizations other than AMI ($4,410, P < 0.001).
Compared to hospitalizations that did not involve cardiac procedures, hospitalizations using CABG and PCI procedures increased the costs by $28,506 (P < 0.001) and $12,608 (P < 0.001), respectively. Conclusion: The inpatient costs for CHD are high, especially among those for AMI. Hypertension, heart failure, and hyperlipidemia as comorbidities/complications to CHD are major factors associated with increased costs. Cardiac procedures of CABG and PCI greatly increased the costs of CHD hospitalizations. New strategies for comprehensive prevention and control of hypertension, heart failure, and hyperlipidemia to reduce CHD could curb hospitalizations and cardiac procedures and thereby control the associated medical costs. HSR-3 THE IMPACT OF HEPATITIS C ON WORK PRODUCTIVITY AMONG AFRICAN-AMERICAN AND HIGH VIRAL LOAD PATIENTS (HSR) Health Services and Policy Research—Health Economics and Cost Analyses Jan-Samuel Wagner, BS, Kantar Health, Princeton, NJ, Marco DiBonaventura, PhD, Kantar Health, New York, NY, Yong Yuan, PhD, Bristol-Myers Squibb, Plainsboro, NJ, Gilbert L’Italien, Bristol-Myers Squibb, Princeton, NJ, and Ray Kim, MD, Mayo Clinic, Rochester, MN Purpose: Approximately 2.7 million patients in the United States and 170 million patients worldwide are infected with the hepatitis C virus (HCV). However, the health outcome effects of the virus, particularly among those with a high unmet need, are not yet fully understood. Method: Using data from the 2009 United States (US) National Health and Wellness Survey, patients who reported a hepatitis C
diagnosis (n = 695) were compared to a propensity-score matched control group (n = 695). African-Americans in the hepatitis C group (n = 76) and those who self-reported a high viral load (800,000+ IU; n = 57) were each compared to the matched control group on measures of work productivity (absenteeism, presenteeism, overall work impairment, and activity impairment) as assessed by the Work Productivity and Activity Impairment questionnaire. All analyses applied sampling weights to project to the population. Result: The HCV patient group was equivalent to the matched control group on all socio-demographic (e.g., gender, age, ethnicity, income, etc.) and health history (e.g., smoking, alcohol use, anxiety, depression, number of comorbidities) variables. African-American HCV patients reported significantly higher levels of presenteeism (36.9% vs. 23.7%, P < .05) and overall work impairment (40.8% vs. 26.5%, P < .05) relative to matched controls. No differences were observed for absenteeism or activity impairment. Patients with a high viral load reported significantly higher levels of absenteeism (17.1% vs. 6.5%, P < .05), presenteeism (37.3% vs. 23.7%, P < .05), overall work impairment (44.9% vs. 26.5%, P < .05), and activity impairment (55.1% vs. 40.2%, P < .05) than matched controls. Conclusion: The results of this study suggest that HCV can be a substantial burden on patients (particularly among patients with a high unmet need) and their employers by reducing workers’ productivity and impairing their daily activities. HSR-4 TREATMENT THRESHOLDS FOR COST-EFFECTIVE OSTEOPENIC POSTMENOPAUSAL WOMEN IN JAPAN (HSR) Health Services and Policy Research—Clinical Strategies and Guidelines Kensuke Moriwaki, MS1, Hirotaka Komaba, MD, PhD2, Masafumi Fukagawa, MD, PhD2, Hiroki Inoue, MD, PhD1, Takeshi Toujo, MD, PhD1, Shinichi Noto, OTR, PhD1, and Hideaki E.
Takahashi, MD, PhD1 (1) Niigata University of Health and Welfare, Niigata, Japan, (2) Tokai University School of Medicine, Kanagawa, Japan Purpose: Treatment guidelines in Japan recommend medical treatment for osteopenic postmenopausal women who have no previous clinical fractures but have at least one additional fracture risk factor such as current smoking, high alcohol intake, or family history of fractures. However, its cost-effectiveness has not been adequately assessed. The purpose of this study is to estimate the cost-effectiveness of alendronate treatment for osteopenic postmenopausal women in Japan considering bone mineral density (BMD) and other BMD-independent risk factors. Method: The cost-effectiveness analysis was conducted from the perspective of the Japanese healthcare system. A Markov model with six health states (no fracture, post-vertebral fracture, post-hip fracture, post-vertebral and hip fracture, bedridden, and death) was developed to predict lifetime costs and effects (QALYs: quality-adjusted life years) associated with ten years of alendronate therapy and no drug treatment in postmenopausal women aged 65 years without fracture history. Risk functions for age- and femoral neck BMD-specific fractures were constructed by using data from epidemiologic studies in Japan. Mortality rates, direct medical costs, utility values, and relative risk of drug therapy for fractures were estimated from other published sources. For the base-case analysis, we ran the model with different femoral neck T-scores (–2.4, –2.0, and –1.5) and risk factors, using first-order Monte Carlo simulations with 1,000 trials each. In addition, probabilistic sensitivity analysis was performed to assess parameter uncertainty. Result: The incremental cost-effectiveness ratio (ICER) of treating Japanese postmenopausal women who have one additional fracture
risk factor with T-scores of –2.4, –2.0, and –1.5 ranged from $31,898 to $32,548, $53,903 to $59,335, and $103,212 to $113,108 per QALY gained, respectively. For women with a T-score of –1.5 and two additional risk factors, the ICER ranged from $60,845 to $66,686 per QALY gained. For women with three risk factors, the ICER was estimated at $36,133 per QALY gained. The probabilities of being cost-effective were estimated in the range of 98.5% to 99.1%, applying willingness-to-pay thresholds of $63,500~$67,000 (6.35~6.70 million JPY; 1$ = 100JPY, Ohkusa et al., 2006) per QALY gained. Conclusion: Medical treatment for Japanese postmenopausal women with femoral neck T-scores less than –2.0 would be cost-effective. Our study also suggests that multiple risk factors should be considered in treating osteopenic postmenopausal women who have relatively high BMD in terms of cost-effectiveness. HSR-5 SOCIETAL PREFERENCES IN THE ALLOCATION OF HEALTHCARE RESOURCES: AN EMPIRICAL ETHICS APPROACH (HSR) Evidence, Economics, and Ethics—Ethical, Legal and Social Issues Chris Skedgel, MDE, The University of Sheffield, Sheffield, United Kingdom Purpose: To identify characteristics or attributes relevant to conceptions of the societal value of healthcare and health gains using an empirical ethics approach. Methods: Attributes were derived from a review of empirical studies, identified using a ‘citation pearl growing’ search strategy. The review took an empirical ethics approach: attributes had to have evidence of significant public support in empirical studies and had to be compatible with a coherent and defensible theory of distributive justice. Together, these requirements ensured that selected attributes reflected public preferences while ‘laundering’ perverse or prejudicial attitudes. Theories of distributive justice were limited to those with an explicit maximand, including need principles, maximising principles, egalitarian principles, and Rawls’ Difference principle.
Results: The review identified 13 attributes within three broad aspects of healthcare: the patient, the treatment, and distributional issues. Of these, 4 attributes had clear evidence of public support and defensible ethical justifications: patient age, initial and final health states, and distributional concerns. The review did not find a strong preference for prioritizing by absolute health gain. There was support for prioritizing patients with a healthy lifestyle, but this preference was laundered on the grounds that it reflects moralistic attitudes rather than principles of distributive justice. Preferences for duration of benefit were ambiguous. Conclusions: The conventional quality-adjusted life year (QALY) maximization approach to healthcare priority setting explicitly assumes that the only factors relevant to the societal value of health gains are the absolute health gain, the duration of benefit, and the number of patients treated. An increase in any of these factors is associated with a proportional increase in value. This review, however, is consistent with growing evidence of a reluctance to allocate healthcare solely on the basis of maximizing expected QALYs and a willingness to sacrifice efficiency for distributive justice. Younger patients and patients in more severe health states were consistently favoured over older or healthier patients, and the quality of the final health state was more important than the absolute health gain. There was also a distributional preference for smaller health gains to many over larger gains to few. A fuller conception of societal value may improve priority setting, but it will be necessary to consider the relative strength of preferences for equity relative to efficiency before rejecting QALY maximization.
HSR-6 ORPHAN DRUGS: DOES SOCIETY VALUE RARITY? (HSR) Health Services and Policy Research—Health Policy Arna S. Desser1, Dorte Gyrd-Hansen2, Jan Abel Olsen3, Sverre Grepperud1, and Ivar Sønbø Kristiansen1 (1) University of Oslo, Oslo, Norway, (2) University of Southern Denmark, Odense, Denmark, (3) University of Tromsø, Tromsø, Norway Purpose: A general societal preference for prioritizing treatment of rare diseases over common ones could provide a justification for accepting higher cost-effectiveness thresholds for orphan drugs. We attempt to determine whether such a preference exists. Method: We surveyed a random sample of 1547 Norwegians aged 40–67. Respondents chose between funding treatment for a rare versus common disease and completed a person trade-off (PTO) exercise between the diseases for each of two scenarios: (1) identical per-person costs and (2) higher costs for the rare disease. Diseases were described identically with the exception of prevalence. Respondents were randomized to either no information or different amounts of information about disease severity (severe vs. moderate) and expected benefits of treatment (high vs. low). All respondents rated five statements concerning equity attitudes on a Likert scale. Result: 68% of respondents agreed completely with the statement “rare disease patients should have equal right to treatment regardless of costs.” Faced with trade-offs, 11.3% of respondents favored treating the rare disease, 24.9% the common disease, and 64.8% expressed indifference. When the rare disease entailed a higher opportunity cost, results were 7.4%, 45.3%, and 47.3%, respectively. Framing (“extra funding” vs. “fixed budget”) and amount of information about severity and treatment effectiveness had a small impact on preferences.
Conclusion: Although there is strong support for general statements expressing a desire for equal treatment rights for rare disease patients, that support evaporates when individuals are faced with opportunity costs. HSR-7 PERFORMANCE OF HEAD CIRCUMFERENCE PERCENTILE CUTOFFS (HSR) Health Services and Policy Research—Clinical Strategies and Guidelines Carrie Daymont, MD1, Moira Zabel, MD2, Chris Feudtner, MD, MPH, PHD3, and David Rubin, MD, MSCE3 (1) University of Manitoba, Winnipeg, MB, Canada, (2) Children’s National Medical Center, Washington, DC, (3) University of Pennsylvania, Philadelphia, PA Purpose: The use of head circumference (HC) growth curves is recommended at all well-child visits until two years of age to screen for pathologic conditions that may cause abnormal head size. However, little guidance is available for their use. We sought to evaluate the ability of HC growth curves to distinguish between children with and without pathology that may cause macrocephaly. Methods: We performed a retrospective cohort study of children born before 1/31/2008 with at least one HC measurement in the electronic record of a large primary care network before 1/31/2009. The primary outcome was the new diagnosis of an intracranial expansive condition or metabolic/genetic condition that can cause macrocephaly. Subjects with the outcome were identified by reviewing the records of subjects with diagnostic or billing codes indicating potential pathology. Subjects diagnosed with an outcome-defining condition prior to their first recorded HC or a birth weight = 23 cases). Statistical analyses included descriptive analysis, Chi-square test, analysis of variance (ANOVA), and hierarchical linear models to examine the relationship between physician volume and health outcome for gastrectomy surgery. By applying hierarchical linear models, we accounted for the clustered nature of the data.

Result: A total of 2,773 gastrectomy cases were analyzed in this study. Of these, 35.16% were treated by low-volume physicians, 33.18% by medium-volume physicians, and 31.66% by high-volume physicians. The mean age (± standard deviation) was 64.73 ± 14.45 years. Over 66% of the subjects were male. The mean Elixhauser comorbidity index (± standard deviation) was 1.85 ± 1.79. After controlling for the characteristics of patients and hospitals, the results of hierarchical linear models showed that, compared with the low physician-volume group, patients of high-volume physicians had a lower risk of infection (OR = 0.565, P < 0.05) and shorter length of stay (b = –3.020, P < 0.001). With regard to hospitalization expenses, patients in both the medium and high physician-volume groups incurred lower expenses than the low-volume group (b = –45,809, P < 0.001; b = –78,145, P < 0.001). Conclusion: By using a nationwide population-based dataset, we found that higher physician volume is associated with a lower risk of infection, shorter length of stay, and lower hospital expenditure in gastrectomy patients. HSR-13 THE DIAGNOSTIC ACCURACY AND COST-EFFECTIVENESS OF SEROLOGIC TESTS IN THE DIAGNOSIS OF CELIAC DISEASE (HSR) Health Services and Policy Research—Health Economics and Cost Analyses Vania Costa, MSc1, Kiran M. Chandra, MSc2, G. Blackhouse3, Bronwen McCurdy, BSc, MPH1, Luciano Ieraci, MSc4, Ron Goeree, MA3, and Leslie Levin, MD1 (1) Ontario Ministry of Health and Long-Term Care, Toronto, ON, Canada, (2) Ministry of Health and Long-Term Care/Programs for Assessment of Technology in Health Research Institute, Toronto, ON, Canada, (3) McMaster University, Hamilton, ON, Canada, (4) University of Toronto, Toronto, ON, Canada Purpose: To evaluate the diagnostic accuracy and cost-effectiveness of IgA (immunoglobulin A) and IgG serologic tests in the diagnosis of celiac disease, including anti-tissue transglutaminase (tTG), anti-deamidated gliadin peptide (DGP), anti-endomysial antibody (EMA), and anti-gliadin (AGA) antibodies.
Method: A systematic literature review was conducted to identify studies published between January 2000 and November 2009 that evaluated the sensitivity and specificity of serologic celiac disease tests using small bowel biopsy as the gold standard. The patient population consisted of untreated subjects with symptoms consistent with the disease. Pooled estimates of sensitivity and specificity were calculated using a bivariate, binomial generalized linear mixed model (SAS 9.2). Statistical significance was defined by P-values less than 0.05; “false discovery rate” adjustments were made for multiple hypothesis testing. A decision analysis was constructed (TreeAge Pro Suite 2009) to compare costs and outcomes based on the pooled estimates. Costs of serologic tests, endoscopy, small bowel biopsy, and physician fees in Canadian dollars were included. The outcome was expected costs per false negative (FN). Result: Seventeen eligible studies were identified. The pooled sensitivity was 92.1% [95% confidence interval (CI) 88.0, 96.3] for IgA tTG, 89.2% (83.3, 95.1, P = 0.12) for IgA DGP, 85.1% (79.5, 94.4, P = 0.07) for IgA EMA, 74.9% (63.6, 86.2, P = 0.0003) for IgA AGA, and 44.7% (30.3, 59.2, P < 0.0003) for IgG tTG. Serologic test combinations slightly increased the sensitivity, to 95.1% (92.2, 98.0, P = 0.039 vs. IgA tTG). Specificity ranged between 90.1% and 93.9% and was similar among the tests. Small bowel biopsy was assumed to have 100% accuracy since it is the gold standard. IgG tTG was the least costly and least effective strategy ($178.95, 0.1553 FNs). Biopsy alone was the most costly and most effective strategy ($396.60, 0 FNs). The costs per FN avoided, moving from IgG tTG to the other strategies sequentially on the


efficiency frontier were $293, $369, and $1,401 for IgA EMA, IgA tTG, and biopsy, respectively. Conclusion: The evidence available suggests that IgA tTG has a higher accuracy compared to other serologic tests. All testing strategies with biopsy were cheaper than biopsy alone; however, they also resulted in more FNs. The choice of serologic test will depend on the decision maker’s willingness to pay to avoid a FN. Serologic test combinations contribute little to diagnostic accuracy at an increased cost. HSR-14 CURRENT STATUS AND SOCIETAL CONSENSUS REGARDING END-OF-LIFE TREATMENT DECISIONS IN KOREA (HSR) Evidence, Economics, and Ethics—Ethical, Legal and Social Issues Dae Seog Heo, MD, PhD, Seoul National University College of Medicine, Seoul, South Korea, Jong-Myon Bae, MD, PhD, National Evidence-based Healthcare Collaborating Agency, Seoul, South Korea, and Ho-Geol Ryu, MD, PhD, Boramae Medical Center, Seoul National University, Seoul, South Korea Purpose: Korea is one of the few industrialized countries that have yet to come to a consensus on end-of-life (EOL) treatments. We attempted to evaluate the current status of EOL treatments in Korea with a national survey and claims data from the Health Insurance Review & Assessment Service (HIRA), the single payer for health insurance in Korea. Method: Data were extracted from the database of medical care fee claims submitted by medical institutions in Korea to HIRA in 2007. The source population was defined as individuals who died in 2007 with a history of medical service utilization in the last 30 days before death. We also surveyed Koreans and asked the following questions. 1) Are ventilator-dependent PVS patients candidates for end-of-life treatment decisions? 2) Are withholding and withdrawing EOL treatment equivalent? 3) If an unconscious terminally ill patient’s wishes regarding EOL treatment are unknown, on what grounds should EOL decisions be made?
4) How should disagreements between or among medical staff and the patient’s family on EOL decisions be settled? Result: Patients averaged 17.1 hospital-visit days in the last month of life, with an average cost per visit of US$141. The average length of hospital stay in the last month of life was 10.3 days. One quarter of patients received ICU care, 17.6% received CPR, and 16.5% received mechanical ventilation in their last month. Only 56.5% received painkillers, whereas 16.3% and 34.8% received antibiotics and transfusion, respectively, in their last month. The national survey of 1500 Koreans revealed the following. Fifty-seven percent of general Koreans and 67% of Korean healthcare professionals consider ventilator-dependent PVS patients to be candidates for EOL treatment decisions. Only one quarter of all respondents regarded withholding and withdrawing EOL treatment as equivalent. Just over 50% thought that EOL treatment decisions should be made through discussions between the physician and the patient’s family. For settling disagreements, 75% of general Koreans preferred direct settlement between the medical staff and the patient’s family, while 55% of healthcare professionals preferred calling in the hospital ethics committee. Conclusion: Despite the high costs associated with life-sustaining EOL treatment, greater societal consensus is required in order to develop a national guideline/regulation regarding EOL treatment decisions. HSR-15 POTENTIAL IMPACT ON DIABETES AND CARDIOVASCULAR DISEASES FROM A ONE-PENNY-PER-OUNCE EXCISE TAX ON SUGAR-SWEETENED BEVERAGES (HSR) Health Services and Policy Research—Health Policy

Y. Claire Wang, MD, ScD1, Pamela Coxson, PhD2, Litsa Lambrakos, MD2, Yu-Ming Shen, PhD, MS1, Lee Goldman, MD, MPH3, and Kirsten Bibbins-Domingo, PhD, MD2 (1) Columbia Mailman School of Public Health, New York, NY, (2) University of California, San Francisco, San Francisco, CA, (3) Columbia University Medical Center, New York, NY Purpose: Sugar-sweetened beverages (SSB) are an important source of excess calories among U.S. adults and have been linked to both obesity and diabetes. Previous studies predicted that a one-penny-per-ounce excise tax on these beverages would reduce consumption by 10–20%. This study aims to quantify the potential impact of such a tax on diabetes, cardiovascular disease, and medical costs. Method: We predicted the downward shift in SSB consumption from the baseline levels reported in the National Health and Nutrition Examination Survey 2003–2006 as a result of the tax, as well as the associated reduction in energy intake (adjusted for compensatory changes in other beverages), body weight, and risk of diabetes. We subsequently used the Coronary Heart Disease (CHD) Policy Model, a Markov cohort simulation model, to project the downstream burden that could be avoided under such a tax scenario, quantified by the number of CHD and stroke events avoided and associated medical costs saved among US adults 25–64 years of age over 10 years. Result: We predicted a reduction in average weight of 0.6–1.1 lbs and in mean BMI of 0.2–0.4 kg/m2 as a result of a 10–20% reduction in SSB consumption and a net daily energy reduction of 6–12 kcal/day per person. The projected reduction was greater in persons aged Inequality > Responsiveness > Responsiveness Inequality > Fairness) only observed in 7.9% of the population (95% confidence interval [CI] = 7.7%, 8.0%).
Conditional on individual age, gender, marital status, self-rated health, education, and income, the probability of giving highest priority to inequalities was higher among individuals in European countries (18.0%, 95% CI = 13.0%, 22.8%) than in countries of the Western Pacific (8.5%, 95% CI = 5.5%, 11.5%) or Southeast Asia (11.8%, 95% CI = 9.1%, 14.5%). Individuals in wealthier countries were also more likely to prioritize inequality: a 1-standard-deviation increase in gross domestic product per capita increased the probability of prioritizing health inequalities by 2.2 percentage points (95% CI = 1.0, 3.4), from 11.7% to 13.9%. Conclusion: These findings suggest that concern for health inequality may be a “luxury” of residents of higher-income countries with comparatively effective health systems. HSR-28 DEVELOPMENT OF A BLOOD TEST TO SCREEN FOR COLORECTAL CANCER: WHICH FEATURES ARE IMPORTANT? A COST-EFFECTIVENESS APPROACH (HSR) Health Services and Policy Research—Health Economics and Cost Analyses Ulrike Haug, German Cancer Research Center, Heidelberg, Germany, Amy B. Knudsen, PhD, Massachusetts General Hospital, Boston, MA, and Karen M. Kuntz, ScD, University of Minnesota, Minneapolis, MN Purpose: Adherence with colorectal cancer (CRC) screening using fecal occult blood tests (FOBTs) is low. Accordingly, researchers are actively pursuing the development of a blood test (BT) for CRC screening. Pilot studies have focused on the BT’s sensitivity for detecting invasive CRC, while there is limited evidence regarding its sensitivity for detecting precursor lesions (i.e., adenomas). We assessed the cost-effectiveness of a hypothetical BT that does not detect precursor lesions (beyond chance detection) in comparison to the currently recommended FOBTs. Method: We used a previously developed microsimulation model, SimCRC, to calculate life-years and lifetime costs (payers’

perspective) for a cohort of US 50-year-olds to whom non-invasive CRC screening is offered annually from age 50 to 75. For FOBTs (i.e., Hemoccult II, Hemoccult SENSA, and the fecal immunochemical test) we used established estimates of test performance and costs (respectively, specificities of 98%, 93%, 95%; sensitivities for small adenomas of 5%, 12%, 10%; sensitivities for large adenomas of 12%, 24%, 22%; sensitivities for CRC of 40%, 70%, 70%; and per-test costs of $5, $5, $22). For the BT we assumed a specificity of 95%, a sensitivity of 90% for CRC, a sensitivity of 0% for adenomas of all sizes, and a base-case per-test cost of $22. We performed sensitivity analyses on the screening interval, test characteristics, and screening adherence, and performed threshold analysis on the cost of the BT. Result: At the base-case cost estimate of $22, the BT was dominated by the FOBTs in all scenarios, even when its sensitivity for CRC was assumed to be 100% or when adherence for BT strategies was assumed to be substantially higher than for FOBTs (80% versus 50%). Compared with Hemoccult SENSA and the fecal immunochemical test, BT strategies saved 13–18 fewer life-years while costing about $450,000 to $600,000 more (outcomes expressed per 1000 50-year-old individuals, discounted at a 3% annual rate). The BT remained dominated even when the unit cost of the test was lowered to zero. Conclusion: The detection of adenomas appears to be a crucial aspect in the development of BTs aimed at improving noninvasive screening for CRC. Sequentially designed diagnostic studies focusing first on adenomas, which are more prevalent, rather than on CRC could be a resource-saving approach to identify promising marker candidates.
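The "dominated" comparisons reported in this abstract rest on a simple rule: one strategy dominates another if it costs no more and yields at least as many life-years. A minimal sketch of that check in Python, with invented per-1000-person numbers (these are illustrative placeholders, not SimCRC outputs):

```python
# Illustrative only: flagging strictly dominated screening strategies by
# cost and effectiveness. Strategy names and numbers are invented, not
# results from the SimCRC model described in the abstract.

def dominated(strategies):
    """Return the set of strategy names that are strictly dominated,
    i.e., some other strategy costs less AND yields more life-years
    (weakly better on both axes, strictly better on at least one)."""
    out = set()
    for name, (cost, ly) in strategies.items():
        for other, (c2, ly2) in strategies.items():
            if other != name and c2 <= cost and ly2 >= ly and (c2 < cost or ly2 > ly):
                out.add(name)
    return out

# Hypothetical outcomes per 1000 persons (cost in $1000s, life-years)
strategies = {
    "Hemoccult SENSA": (900, 18250),
    "FIT": (950, 18252),
    "BT (no adenoma detection)": (1450, 18235),
}
print(dominated(strategies))  # the BT costs more and saves fewer life-years
```

A full frontier analysis would additionally check extended dominance by comparing ICERs along the non-dominated strategies; the abstract's BT fails even this simpler test.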
HSR-29 CLINICAL DECISION MAKING FOR PREVENTION OF STROKE FROM ATRIAL FIBRILLATION (HSR) Health Services and Policy Research—Clinical Strategies and Guidelines Mark Blostein, MD1, Beste Kucukyazici, PhD2, Vedat Verter, PhD1, and Saied Samiedaluie1 (1) McGill University, Montreal, QC, Canada, (2) MIT-Zaragoza, Zaragoza, Spain Purpose: To develop a clinical decision support tool to assist physicians with atrial fibrillation therapy design for primary prevention of stroke. Method: Long-term antithrombotic therapy with warfarin reduces the relative risk of stroke from atrial fibrillation by approximately 65%, while increasing the bleeding risk. Aspirin constitutes a less aggressive therapy option with a lesser impact on the stroke and bleeding risks. Given the potential benefits and risks of these options, and recognizing each patient’s case-mix and clinical variables, the therapy choice decisions are critical. We formulate the problem as a Markov Decision Process (MDP) so as to maximize the patient’s expected remaining QALYs (i.e., quality-adjusted life years). The decision alternatives at each time epoch are warfarin, aspirin, and “no medication.” The stroke risk score and the bleeding score of the patient are the state variables of the MDP. The former is estimated using the CHADS scale, and the latter is assessed via the Beyth framework. The MDP transition probabilities are estimated using charts of the 950+ atrial fibrillation patients from the anticoagulant clinic of Montreal Jewish General Hospital. Result: Using the MDP model, we identified the optimal therapy choice based on the patient’s age, stroke score (s), and bleeding score (b). The table below depicts our results. For example, our framework suggests that an 85-year-old patient with s = 1 and b = 1 should be started on Warfarin (W), whereas if the patient were a year older with the same risk factors, Aspirin (A) would be the better choice. We estimated the expected remaining QALYs of the


950+ patients by the use of our methodology. Comparing the results with the patient files showed that significant improvements may be achieved, particularly for the younger group of patients with moderate stroke risk (stroke score 1 or 2) and high bleeding risk (bleeding score 2 or 3), for whom the potential improvements varied between 16% and 37%. Conclusion: The current clinical guidelines for atrial fibrillation therapy are based only on the stroke risk of the patient. Our research shows that it is possible to improve health outcomes by also incorporating the patient’s bleeding risk in the decision process.
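The finite-horizon optimization this abstract describes can be illustrated with backward induction on a toy MDP. This is a sketch only: the states, transition probabilities, and per-year QALY weights below are invented placeholders, not the CHADS/Beyth-calibrated model estimated from the Montreal patient charts.

```python
# Toy finite-horizon MDP: choose warfarin (W), aspirin (A), or no
# medication (N) to maximize expected remaining QALYs. All numbers are
# invented for illustration; they are NOT the abstract's estimates.

ACTIONS = ["W", "A", "N"]
STATES = ["low_risk", "high_risk", "dead"]

# P[action][state] -> next-state probabilities (each row sums to 1)
P = {
    "W": {"low_risk": {"low_risk": 0.93, "high_risk": 0.04, "dead": 0.03},
          "high_risk": {"low_risk": 0.05, "high_risk": 0.87, "dead": 0.08}},
    "A": {"low_risk": {"low_risk": 0.95, "high_risk": 0.03, "dead": 0.02},
          "high_risk": {"low_risk": 0.03, "high_risk": 0.87, "dead": 0.10}},
    "N": {"low_risk": {"low_risk": 0.94, "high_risk": 0.04, "dead": 0.02},
          "high_risk": {"low_risk": 0.02, "high_risk": 0.85, "dead": 0.13}},
}
QALY = {"low_risk": 0.95, "high_risk": 0.80, "dead": 0.0}  # utility per year

def solve(horizon):
    """Backward induction: value and optimal action per live state."""
    V = {s: 0.0 for s in STATES}
    policy = []
    for _ in range(horizon):
        newV, act = {}, {}
        for s in STATES:
            if s == "dead":
                newV[s] = 0.0  # absorbing state, no further QALYs
                continue
            best = max(
                (QALY[s] + sum(p * V[s2] for s2, p in P[a][s].items()), a)
                for a in ACTIONS)
            newV[s], act[s] = best
        V = newV
        policy.append(act)
    return V, policy

V, policy = solve(horizon=10)
print(policy[-1])  # optimal action per state with 10 years remaining
```

With these placeholder numbers the policy mirrors the abstract's intuition: the more aggressive therapy is optimal only in the higher-risk state, and the recommendation can flip as the remaining horizon shrinks.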

HSR-30 AN INTERPROFESSIONAL APPROACH TO UNDERSTANDING DECISION MAKING FOR TREATMENT OF CHRONIC VISUAL IMPAIRMENT: A NEW CONCEPTUAL MODEL (HSR) Health Services and Policy Research—Clinical Strategies and Guidelines Lori L. Grover, OD, Johns Hopkins University School of Medicine, Baltimore, MD, and Kevin D. Frick, PhD, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD Purpose: Appropriate rehabilitation of patients with chronic visual impairment (VI) can both improve individual abilities for health and personal management and maximize utilization of health care resources. There exists a need for a detailed, comprehensive understanding of the health care processes and medical decision making involved in treating chronic VI. To fill this gap, a conceptual model is presented as a foundation for understanding medical decision making by identifying an interprofessional approach to treating chronic VI. Methods: Components were identified from existing practice guidelines, systematic review of the literature, and interprofessional feedback via focus group data. Key concepts were synthesized into a schematic conceptual model for an interprofessional approach to treating patients with irreversible, chronic VI (a.k.a. vision rehabilitation or VR) in the U.S. healthcare system. A universal component of medical rehabilitation care—the rehabilitation potential (ReP) of the patient—was included for patients with VI and further delineated as an important component of eye care provider decision making in VR care. This novel conceptual model suggests factors that are significantly associated with the ReP of patients with VI by presenting important relationships of factors involved in ReP and examining the interactions of the identified variables and decision making during the VR care process. Results: The interprofessional approach to VR in the health care continuum leads to a proposed conceptual model integrating macro- and micro-level components and perspectives. The conceptual model serves as a basis for further study and refinement in the field of VR, for understanding clinical decision making, and informs the broader field of rehabilitation medicine. Conclusion: A novel conceptual model that integrates key concepts of clinical practice, policy, and provider decision making was achieved. The model presented represents perspectives and determinants related to professionals, evidence-based clinical practice guidelines, and existing third-party policy regarding chronic VI and the VR care continuum. Ongoing and subsequent research will focus on: validating the proposed model and building consensus with key stakeholders; identifying factors influencing its implementation in healthcare practice and clinical education (i.e., understanding novice and expert clinician decision making); understanding knowledge representation and underlying judgment processes in the treatment of chronic visual impairment (i.e., the roles of patient and clinician priority in making relevant decisions); and the design of interdisciplinary clinical practice guidelines.

HSR-31 EFFECTIVENESS OF ACUTE GERIATRIC UNITS: EXPLORING HETEROGENEITY OF TREATMENT EFFECT AMONG SENIORS HOSPITALIZED FOR PNEUMONIA (HSR) Health Services and Policy Research—Clinical Strategies and Guidelines Yew Yoong Ding, MBBS, FRCP, MPH1, Bee Hoon Heng, MBBS, MSc (PH)2, John Abisheganaden, MBBS, FRCP1, Cindy L. Christiansen, PhD3, Dan R. Berlowitz, MD, MPH3, and Tow Keang Lim, MBBS, MMed (Int Med), FRCP4

(1) Tan Tock Seng Hospital, Singapore, Singapore, (2) National Healthcare Group, Singapore, Singapore, (3) VA Center for Health Quality, Outcomes & Economic Research (CHQOER), Bedford, MA, (4) National University Health System, Singapore, Singapore Purpose: The effectiveness of acute geriatric units in improving outcomes of hospitalized seniors in the real world is unclear. We sought to answer this question by focusing on pneumonia. We hypothesized that acute geriatric units reduce short-term mortality for seniors hospitalized for pneumonia compared with those receiving usual internal medicine care. Method: We merged medical records and administrative data of adults aged 65 years or older admitted to 3 acute care hospitals over one year. The outcome variable was 30-day mortality. The treatment variable was admission to acute geriatric units, with usual internal medicine care as the reference. Other explanatory variables included demography, admission information, severity of acute illness (CURB score), co-morbidity, and functional status. We obtained propensity scores for admission to acute geriatric units and stratified seniors into quintiles according to scores sorted in ascending order. We performed logistic regression and propensity score matching (PSM) to estimate treatment effects for all seniors and for those within quintiles. Finally, we explored the distribution of explanatory variables across quintiles. Result: The 30-day mortality for the 3034 seniors included in the analyses was 25.8%. There was a significant reduction in 30-day mortality for seniors admitted to acute geriatric units (adjusted OR 0.77; 95% CI 0.60 to 0.99) using logistic regression analyses. The following table shows 30-day mortality and selected patient characteristics across quintiles.


                                      Quintile 1        Quintile 2        Quintile 3        Quintile 4        Quintile 5
                                      (n = 607)         (n = 607)         (n = 606)         (n = 607)         (n = 607)
Treatment effects
  Acute geriatric units: OR (95% CI)  0.82 (0.35–1.91)  1.16 (0.53–2.55)  0.97 (0.60–1.59)  0.66 (0.40–1.11)  0.57 (0.36–0.90)
Patient characteristics
  Age >80 years                       0.0%              3.3%              68.0%             100.0%            100.0%
  Hospitalization in prior 90 days    28.2%             24.7%             23.6%             36.1%             36.6%
  CURB score >2                       12.5%             15.3%             18.2%             18.5%             21.1%
PSM obtained similar treatment-effect estimates. Seniors with a higher likelihood of admission to acute geriatric units (quintiles 4 and 5) obtained a greater magnitude of mortality reduction. They were more likely to be very old, to have had a recent hospitalization, and to have higher severity of acute illness scores. Conclusion: Acute geriatric units reduced short-term mortality among seniors hospitalized for pneumonia, compared with usual internal medicine care. Seniors who were more likely to receive care at these units had greater mortality reduction. These findings have implications for targeting policies for these and similar units, including acute care for elders (ACE) units. HSR-32 TEN PERCENT OR GREATER WEIGHT LOSS IS A CLINICALLY MEANINGFUL OUTCOME IN THE TREATMENT OF OBESITY (HSR) Health Services and Policy Research—Clinical Strategies and Guidelines Cheryl S. Hankin, PhD1, Amy Bronstone, PhD1, and Barbara Troupin, MD, MBA2 (1) BioMedEcon, Moss Beach, CA, (2) VIVUS, Inc., Mountain View, CA Purpose: Health plans vary widely in their coverage and reimbursement policies for obesity treatments, in part due to a lack of consensus about the degree of weight loss that confers obesity-related morbidity risk reduction. Although a weight loss of 5–10% has been associated with clinical benefits, NIH obesity guidelines state that the goal of obesity treatment should be a loss of 10% or more of baseline weight. This review examines the literature on weight loss and associated clinical outcomes to determine whether a 10% weight loss threshold should be considered clinically meaningful. Method: We examined the long-term effects (minimum study duration 2 years) of 10% intentional weight loss among overweight and obese adults on obesity-related health outcomes, including hypertension, type 2 diabetes mellitus (T2DM), and hyperlipidemia.
Relevant studies were identified through a Medline search of English-language studies [Medical Subject Headings: “Obesity” AND (“Weight Loss” OR “Body Weight Changes”) AND (“Heart Diseases” OR “Hypertension” OR “Blood Pressure” OR “Diabetes Mellitus, Type 2” OR “Glucose Metabolism Disorders” OR “Insulin Resistance” OR “Hyperlipidemias”)]. Result: Among 1,292 citations, 104 were identified as relevant. Weight loss of 10% is associated with improved blood pressure and glycemic and lipid outcomes. With respect to hypertension, a weight loss of 10% appears to be required to resolve hypertension among overweight/obese persons. For glycemic outcomes, a weight loss of 10% or more is associated with a 1.6% decrease in hemoglobin A1c, a 63% reduction in diabetes medication use, and a 33% reduction in mortality among overweight and obese persons with T2DM. A sustained weight loss of 16% reduces the 8-year incidence of T2DM by >80% in severely obese persons. Weight loss of 10% improves hyperlipidemia, especially among those with high baseline cholesterol levels who are able to maintain their weight loss.

Conclusion: Weight loss of ≥10% confers substantial cardiovascular and metabolic benefits among overweight and obese adults, and appears to be a more clinically significant intermediate marker of the effectiveness of weight loss interventions. Consistent application of this benchmark by health plans in determining coverage and reimbursement for weight loss interventions may result in wider access to more effective treatments and reduce the escalating rate of obesity-related diseases and related spiraling medical costs. HSR-33 A COST COMPARISON OF OPTIMAL VERSUS SUB-OPTIMAL PRE-DIALYSIS CARE IN CANADA BASED ON THE STARRT TRIAL (HSR) Health Services and Policy Research—Health Economics and Cost Analyses Charles Piwko, PhD1, Farah Jivraj, BSc, MSc2, Lou Marra, PhD3, Eva Appel, BSc1, Jacob Wolpin, PhD1, Fernando Camacho, PhD4, Colin Vicente, MSc1, David C. Mendelssohn, MD, FRCPC5, and Phil A. McFarlane, MD, FRCPC6 (1) PIVINA Consulting Inc., Thornhill, ON, Canada, (2) Janssen-Ortho Inc., Toronto, ON, Canada, (3) Janssen-Ortho Inc., Toronto, ON, Canada, (4) DAMOS Inc., Toronto, ON, Canada, (5) Humber River Regional Hospital and University of Toronto, Toronto, ON, Canada, (6) St. Michael’s Hospital Toronto and University of Toronto, Toronto, ON, Canada Purpose: Suboptimal transition from chronic kidney disease (CKD) to end-stage renal disease (ESRD) results in poor clinical outcomes and a substantial economic burden to health care systems and patients. The objective of this study was to estimate and compare the average total cost per patient requiring CKD management who initiates renal replacement therapy (RRT), stratified by preparation status: 1) Optimally Prepared (RRT initiation as outpatients and via AV fistula or graft), 2) Sub-Optimally Prepared (RRT initiation as inpatients or via central venous catheter).
Method: The Study To Assess Renal Replacement Therapy (STARRT), a Canadian, multicentre, retrospective study designed to assess various factors related to pre-dialysis care and patient status at the time of dialysis initiation, was used to estimate and compare the average direct medical cost for patients in each group. Patients in the STARRT trial were retrospectively followed for up to 6 months following the start of dialysis. Unit costs for resources were obtained from participating hospitals, the literature, and/or standard costing sources (i.e., provincial fee schedules). The analysis was performed from the perspective of Canadian health care facilities and reported in 2010 Canadian dollars (CAD). Descriptive statistical analyses were performed to determine the mean, standard deviation, and median for resources utilized. Result: Data were collected from a total of 339 patients who started chronic RRT at 10 Canadian centres. The mean patient age was 63 ± 16 years. Sixty-two percent of these patients were male. One hundred thirty-four patients (39.5%) were Optimally Prepared and 205 patients (60.5%) were Sub-Optimally Prepared.


The Optimally Prepared group ($52,225) had a statistically significantly (P ≤ 0.001) lower average total cost during the study period than the Sub-Optimally Prepared group ($68,733). Cost drivers for the difference between the two groups included the various dialyses, the number of hospitalizations, and the lengths of stay. The average length of stay in hospital for the Optimally Prepared group was 6.5 (± 29.5) days compared to 19.5 (± 31.9) days for the Sub-Optimally Prepared group (P = 0.01). Conclusion: In addition to the reported improved clinical outcomes, patients Optimally Prepared for RRT have significantly reduced costs, resulting in a potential decrease in the total economic burden of RRT. HSR-34 SECOND-LOOK ENDOSCOPY FOR BLEEDING PEPTIC ULCER DISEASE: A DECISION- AND COST-EFFECTIVENESS ANALYSIS (HSR) Health Services and Policy Research—Clinical Strategies and Guidelines Nan Kong, PhD, Purdue University, West Lafayette, IN, and Thomas F. Imperiale, MD, Indiana University School of Medicine, Indianapolis, IN Purpose: Following application of therapeutic endoscopic methods for treatment of bleeding peptic ulcer disease (PUD), a follow-up or second-look endoscopy (SLE) may be performed 1–2 days later. SLE may decrease the risk of recurrent PUD bleeding; however, it is not routinely recommended because it has no clear effect on the need for surgery or on mortality, in large part because clinical trials have been underpowered for these outcomes. Method: Using literature-based probabilities and Medicare-reimbursed costs, we created a decision model comparing routine SLE (rSLE) vs. no SLE for patients with bleeding PUD. In the model, an initial episode of rebleeding was re-treated endoscopically, while a second rebleeding episode was treated surgically.
For outcomes, we measured rates of rebleeding, need for surgery, hospital mortality, and hospital costs, and we calculated the number needed to treat (NNT) and the cost to avoid one of each outcome, along with ICERs. When costs were uncertain, we chose costs that would bias the model against rSLE. Result: In the base case, rSLE reduces rebleeding from 16% to 8.2%, the need for surgery from 3.1% to 2.7%, and mortality from 1.08% to 0.94%, but not hospitalization cost, which increases from $12,069 to $12,572. NNTs for rebleeding, surgery, and mortality are 12.8, 251, and 719, respectively. Incremental costs of rSLE to prevent 1 rebleed, 1 surgery, and 1 hospital death are $6,449, $125,750, and $314,375, respectively. Threshold analysis revealed that the rebleeding threshold required to neutralize the need for surgery and mortality is 14%, and 20% to neutralize the cost difference. If rSLE were 100% effective in preventing rebleeding, the rebleeding threshold required for cost neutrality would be 8.6%. One-way sensitivity analyses revealed that base-case findings for surgery and mortality were sensitive to the probabilities of rebleeding after index endoscopy, rebleeding after rSLE, continued bleeding, and repeat use of therapeutic endoscopy. Conclusion: Although this analysis did not consider comorbidity from recurrent bleeding, the effect of rSLE on length of hospital stay, or the use of adjuvant therapy with proton pump inhibitors, the results suggest that rSLE is not indicated following therapeutic endoscopy for bleeding PUD. However, if the risk for rebleeding exceeds 20%, then SLE reduces the risk of rebleeding at no additional cost. HSR-35 U.S. CHILDHOOD OBESITY POLICIES AND THEIR PROJECTED IMPACT ON ADULT HEALTH THROUGH 2040 (HSR) Health Services and Policy Research—Health Policy Jeremy D. Goldhaber-Fiebert, PhD1, Rachel E. Rubinfeld, PhD1, Jayanta Bhattacharya, MD, PhD1, and Paul H. Wise, MD, MPH2

(1) Stanford University, Stanford, CA, (2) Stanford University and Lucile Packard Children’s Hospital, Stanford, CA Purpose: Childhood obesity threatens the future health of America’s adults. Recently, the U.S. Preventive Services Task Force recommended childhood obesity screening to better target preventive interventions. Others advocate universal childhood interventions. Projections of the impact of childhood obesity on future adult health are needed to guide policy decisions. Method: We developed the Stanford Childhood Obesity Projection and Evaluation (SCOPE) model to simulate body mass index (BMI) dynamics for children starting at age 2. The SCOPE model follows children as they grow into adulthood, tracking their BMI and obesity status. The SCOPE model projects outcomes including BMI at ages 18 and 40, and diabetes and hypertension prevalence at age 40. The parameters of the SCOPE model were informed by nationally representative, longitudinal data: the National Health and Nutrition Examination Survey (NHANES 2006), the National Longitudinal Survey of Youth (NLSY) Children and Young Adult samples, and the Panel Study of Income Dynamics (PSID). Using the SCOPE model, we evaluated the following strategies: childhood obesity screening (at age 5, 10, or 15) with interventions for children at risk; and universal school-based obesity interventions (e.g., Planet Health). Result: Without intervention, 33% of U.S. children currently aged 5 through 10 will be overweight (BMI 25–30) or obese (BMI ≥30) by age 18. For obese 18-year-olds, the probability at age 40 of being obese is 70%, of being diabetic is 23%, and of being hypertensive is 39%. By contrast, for thin (BMI 20% improvement in immune reconstitution or cost substantially less than drugs currently being evaluated.
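The kind of cohort projection a model like SCOPE performs can be illustrated with a toy Markov model that evolves a BMI-category distribution year by year. The transition matrix and starting distribution below are invented placeholders, not estimates from NHANES, NLSY, or PSID, and the real model tracks continuous BMI rather than three categories.

```python
# Toy cohort projection: propagate a child cohort through annual
# BMI-category transitions to estimate adult obesity prevalence.
# All probabilities are illustrative placeholders only.

CATEGORIES = ["normal", "overweight", "obese"]

# Annual transition probabilities between categories (rows sum to 1).
T = {
    "normal":     {"normal": 0.96, "overweight": 0.035, "obese": 0.005},
    "overweight": {"normal": 0.10, "overweight": 0.80,  "obese": 0.10},
    "obese":      {"normal": 0.01, "overweight": 0.09,  "obese": 0.90},
}

def project(start, years):
    """Evolve the cohort distribution 'start' forward by 'years' years."""
    dist = dict(start)
    for _ in range(years):
        nxt = {c: 0.0 for c in CATEGORIES}
        for c, share in dist.items():
            for c2, p in T[c].items():
                nxt[c2] += share * p
        dist = nxt
    return dist

# Cohort of 5-year-olds with an 80/12/8 split; project to age 18.
at18 = project({"normal": 0.80, "overweight": 0.12, "obese": 0.08}, years=13)
print(round(at18["obese"], 3))
```

Comparing such projections with and without an intervention-modified transition matrix is, in spirit, how screening and universal strategies would be contrasted.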
HTA-6 NON-INVASIVE CARDIAC IMAGING TECHNOLOGIES FOR THE DIAGNOSIS OF CORONARY ARTERY DISEASE: A MEGA-HTA TO INFORM POLICY DECISION MAKING (HTA) Evidence, Economics, and Ethics—Health Technology Assessment (Comparative & Cost-effectiveness Analysis): Application Bronwen McCurdy, BSc, MPH, Tanya Khan, MHSc, Alexandra Chambers, MSc, Jacob Franek, MHSc, and Kristen McMartin, PhD, Ontario Ministry of Health and Long-Term Care, Toronto, ON, Canada Purpose: To evaluate the diagnostic accuracy of different noninvasive cardiac imaging technologies in patients with suspected coronary artery disease (CAD), and to employ a novel technique of integrating multiple HTAs to inform policy decision making. Method: Evidence-based reviews of the literature from 2004 to 2009 were conducted to assess diagnostic accuracy using coronary angiography as the reference standard. Five diagnostic imaging technologies were identified: cardiac magnetic resonance (CMR), SPECT, CT angiography, stress ECHO, and stress contrast ECHO. Estimates of diagnostic accuracy were calculated to compare across the technologies, and an expert panel was struck to assist in the contextualization of the evidence. Re-aggregation of results into a mega-HTA was based on the 4 OHTAC decision determinants: clinical effectiveness, value for money, social values, and feasibility of implementation. Result: Estimates of pooled sensitivity ranged from 0.98 to 0.84 and ranked highest to lowest as CT angiography, CMR, attenuation-corrected (AC) and traditional SPECT, gated SPECT and stress contrast ECHO, and lastly stress ECHO. For specificity, estimates ranged from 0.71 to 0.84 and ranked highest to lowest as stress ECHO, AC SPECT, CMR, stress contrast ECHO, CT angiography, and gated and traditional SPECT. Sensitivity and specificity were equally weighted for policy decision making, thus AUCs were calculated based on SROC curves and ranked as follows: CT angiography (0.94), CMR (0.93), AC SPECT (0.91; n = 13), stress contrast ECHO (0.90), stress ECHO (0.89), gated SPECT (0.89), and lastly traditional SPECT (0.89). 
Conclusion: For overall clinical benefit, CT angiography and CMR had the highest diagnostic accuracies. For implementation, stress ECHO (contrast) was deemed the most feasible. Social values were similar across technologies. Value for money is being assessed through an Ontario-based cost-effectiveness analysis. Mega-HTA results are currently being used to prioritize investments in cardiac imaging in Ontario. HTA-7 THE FIRST 4 YEARS OF A HOSPITAL-BASED COMPARATIVE EFFECTIVENESS CENTER: TRANSLATING RESEARCH INTO PRACTICE TO IMPROVE THE QUALITY, SAFETY AND COST-EFFECTIVENESS OF PATIENT CARE (HTA) Evidence, Economics, and Ethics—Health Technology Assessment (Comparative & Cost-effectiveness Analysis): Methods Matthew D. Mitchell, PhD, Kendal Williams, MD, MPH, and Craig A. Umscheid, MD, MSCE, University of Pennsylvania Health System, Philadelphia, PA


Purpose: Most existing centers for health technology assessment (HTA) are associated with payers or government agencies. They most frequently review and analyze emerging and costly technologies. But hospitals often have to make decisions about processes of care that have an impact not only on cost but also on the quality and safety of patient care. Method: Our academic medical center created a Center for Evidence-based Practice (CEP) for the purpose of gathering scientific evidence and applying it to decision making about purchasing, formularies, and clinical practice. CEP was established in July 2006, is funded by the Office of the Chief Medical Officer, and is staffed by two hospitalist co-directors trained in epidemiology, two HTA analysts, primary care and infectious disease physician liaisons, a librarian, a health economist, and an administrator, totaling 4.5 full-time equivalents. Result: Over 100 evidence reports have been completed to date, 42 in the most recent 12 months. Internal clients requesting reports include clinical and administrative leaders and committees in our medical center, as well as committees established to improve and standardize care. Topics have included processes of care, like the use of heparin versus saline for catheter flushing, and high-cost and emerging technologies, like telemedicine in critical care. Reports review existing guidelines and systematic reviews first, and review primary studies when previously published reviews do not offer sufficient evidence. Local utilization and cost data are incorporated so reports can be tailored to our medical center’s needs. CEP then works with key stakeholders to implement reports, including integrating them into computerized clinical decision support, and measures their impact using administrative and/or clinical data. Evidence reviews are shared publicly through the National Guideline Clearinghouse, the Cochrane-indexed HTA database, and peer-reviewed publications. 
CEP also offers education through workshops, a resident elective, courses for medical and graduate students, and academic detailing. In addition, CEP has developed collaborations with payers, government organizations, and private industry, such as the development of evidence-based infection control guidelines with the CDC. Conclusion: An evidence-based practice center within an academic medical center can offer systematic evaluations of high-impact clinical topics. Besides informing clinical practice, such evaluations can promote a culture of evidence-based decision-making, offer educational and publishing opportunities, and facilitate constructive relations between the medical center and outside organizations. HTA-8 COST-EFFECTIVE ALTERNATIVES TO LIVER BIOPSY IN THE MANAGEMENT OF CHRONIC HEPATITIS C (HTA) Evidence, Economics, and Ethics—Health Technology Assessment (Comparative & Cost-effectiveness Analysis): Application Shan Liu, S.M.1, Michael Schwarzinger, MD, PhD2, Fabrice Carrat, MD, PhD3, and Jeremy D. Goldhaber-Fiebert, PhD1 (1) Stanford University, Stanford, CA, (2) INSERM 912, Aix-Marseille University, Marseille, France, (3) INSERM U707, Pierre et Marie Curie University, Paris, France Purpose: Chronic hepatitis C (HCV) is a serious liver disease affecting over 3 million Americans. Liver biopsy is the gold standard for assessing liver fibrosis and is used as a benchmark for initiating treatment, though it is expensive and carries serious risks of complications. FibroTest is a noninvasive biomarker assay for fibrosis, proposed as a screening alternative to biopsy. We assessed the cost-effectiveness of screening strategies for liver fibrosis and subsequent treatment of U.S. patients with chronic HCV. Method: We estimated the health outcomes and costs associated with each of 6 screening strategies. For patients with chronic HCV,

screening strategies to detect fibrosis employed FibroTest and liver biopsy either alone or sequentially followed by treatment (peginterferon alfa and ribavirin). Strategies included: FibroTest only; FibroTest with liver biopsy for ambiguous results; FibroTest followed by biopsy to rule in or rule out significant fibrosis (Metavir score stage 2+); Biopsy only (standard care); and Treatment without prior screening. For treatment of genotype 1 patients, early viral response (EVR) was assessed at 12 weeks, with further 36-week treatment reserved for those with EVR. For other genotypes, treatment was 24 weeks. We developed a Markov model of chronic HCV stratified by gender and genotype that tracks fibrosis progression towards decompensated cirrhosis, hepatocellular carcinoma, and death. Estimates of disease, population, cost, and utility parameters were derived from the published literature and expert opinion. Outcomes were expressed as expected lifetime costs (2009 USD), quality-adjusted life-years (QALYs), and incremental cost-effectiveness ratios (ICER). Result:  Treatment of chronic HCV without screening is preferred for both men and women. For genotype 1, the ICER is $2,000/QALY compared to FibroTest only. For other genotypes, it is more effective and less costly than all alternatives. In clinical settings where testing is required prior to treatment, FibroTest only is more effective and less costly than liver biopsy, the current standard of care. Compared to FibroTest only, FibroTest with biopsy for ambiguous results has an ICER of $93,000/QALY for genotype 1 and $16,000/QALY for other genotypes. These results are robust to multi-way and probabilistic sensitivity analyses. Conclusion: Early treatment of chronic HCV is superior to the other screening strategies considered. In clinical settings where testing is required prior to treatment, FibroTest screening is a cost-effective alternative to liver biopsy. 
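Result paragraphs like the one above compress a standard incremental analysis: order strategies by cost, drop dominated options, and report ICERs between successive non-dominated ones. As a hedged sketch of that bookkeeping (hypothetical strategy names and numbers, not values from this abstract), in Python:

```python
def cea_frontier(strategies):
    """Return [(name, icer)] along the efficient frontier.

    `strategies` is a list of (name, cost, qalys) tuples. The first
    frontier strategy has no comparator, so its ICER is None.
    """
    # Sort by cost (ties broken by higher effectiveness) and drop strongly
    # dominated strategies: costlier but not more effective than a cheaper one.
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))
    frontier = []
    for s in ordered:
        if not frontier or s[2] > frontier[-1][2]:
            frontier.append(s)
    # Enforce extended dominance: ICERs must increase along the frontier,
    # so remove any strategy whose successor has a lower ICER.
    i = 1
    while i < len(frontier) - 1:
        icer_here = (frontier[i][1] - frontier[i - 1][1]) / (frontier[i][2] - frontier[i - 1][2])
        icer_next = (frontier[i + 1][1] - frontier[i][1]) / (frontier[i + 1][2] - frontier[i][2])
        if icer_next < icer_here:
            del frontier[i]
            i = max(i - 1, 1)  # re-check the neighborhood after a removal
        else:
            i += 1
    result = [(frontier[0][0], None)]
    for prev, cur in zip(frontier, frontier[1:]):
        result.append((cur[0], (cur[1] - prev[1]) / (cur[2] - prev[2])))
    return result


# Hypothetical strategies (name, cost in $, QALYs); purely illustrative.
frontier_result = cea_frontier([
    ("A", 0, 5.00), ("B", 1000, 5.20), ("C", 1500, 5.15), ("D", 3000, 5.50),
])
```

Here C is strongly dominated (costlier and less effective than B), so the frontier reports only A, B, and D with increasing ICERs; this is the same logic that lets an abstract call one strategy "more effective and less costly" than another.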
HTA-9 DO DIFFERENT METHODS OF MODELING STATIN EFFECTIVENESS INFLUENCE THE OPTIMAL DECISION? (HTA) Evidence, Economics, and Ethics—Health Technology Assessment (Comparative & Cost-effectiveness Analysis): Methods Bob J.H. van Kempen, MSc1, Bart S. Ferket, MD1, Rogier L.G. Nijhuis, MD, PhD2, Sandra Spronk, PhD1, and M.G. Myriam Hunink, MD, PhD1 (1) Erasmus MC, Rotterdam, Netherlands, (2) ZGT Hengelo, Hengelo, Netherlands Purpose: Methods of modeling the effect of statins in simulation studies vary among published papers. We illustrate the impact of using different modeling methods on the optimal decision. Method: A previously developed and validated Monte Carlo–Markov model based on the Rotterdam Study, a cohort study of 6871 individuals aged 55 years and older with 7 years of follow-up, was used. Life courses of 3501 participants with complete risk profiles on statin treatment vs. no statin treatment were simulated using six health states (well, coronary artery disease (CAD), stroke, both CAD and stroke, and death). Transition probabilities were based on 5-year risks predicted by Cox regression equations, including (among others) total and HDL cholesterol as covariates. We used three different methods to model the effect of statins on the incidence of CAD: (1) statins lower total cholesterol levels and increase HDL, which through the covariates in the Cox regression equations leads to a lower incidence of CAD; (2) statins decrease the incidence of CAD directly through a relative risk reduction (RRR), assumed to be the same for each individual; (3) the RRR with statin therapy on the incidence of CAD is made proportional to the absolute reduction in LDL-cholesterol levels for each individual. Each of the three statin modeling alternatives was compared to the no-statin strategy. Result: In the 3501 subjects (mean age 69 ± 8.47, 39% men), life-years simulated for each of the three methods were: (1) 17.241, (2) 17.705, and (3) 17.709 years. At a willingness-to-pay of $50,000, net health benefits were (1) 9.67, (2) 9.87, and (3) 9.87. Figure 1 shows the probability that statin treatment is cost effective for each of the three methods, for varying willingness-to-pay thresholds.
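The three ways of applying the statin effect described in the Method can be sketched as follows. All coefficients, baseline risks, and cholesterol changes below are illustrative assumptions, not Rotterdam Study estimates, and the survival-scaling form in method 1 is a simplification of the full Cox machinery:

```python
import math


def risk_method_1(base_risk, beta_tc, d_tc, beta_hdl, d_hdl):
    # Method 1: statins shift the cholesterol covariates; the Cox linear
    # predictor turns the shift into a hazard ratio, applied here with a
    # survival-scaling form (an illustrative simplification).
    hr = math.exp(beta_tc * d_tc + beta_hdl * d_hdl)
    return 1.0 - (1.0 - base_risk) ** hr


def risk_method_2(base_risk, rrr):
    # Method 2: one fixed relative risk reduction for every individual.
    return base_risk * (1.0 - rrr)


def risk_method_3(base_risk, rrr_per_mmol, d_ldl):
    # Method 3: the RRR grows with the individual's absolute LDL reduction
    # (d_ldl, in mmol/L), compounding an assumed per-mmol/L RRR.
    rrr = 1.0 - (1.0 - rrr_per_mmol) ** d_ldl
    return base_risk * (1.0 - rrr)


# Illustrative 5-year baseline CAD risk and statin effects (assumed values).
r1 = risk_method_1(0.10, beta_tc=0.3, d_tc=-1.8, beta_hdl=-0.4, d_hdl=0.05)
r2 = risk_method_2(0.10, rrr=0.27)
r3 = risk_method_3(0.10, rrr_per_mmol=0.22, d_ldl=1.5)
```

Methods 2 and 3 apply a relative risk reduction directly (fixed, or compounded per mmol/L of LDL lowering), whereas method 1 routes the effect through the cholesterol covariates; the abstract's finding is that method 1 yields fewer simulated life-years (17.241) than the RRR-based methods (about 17.7).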

Conclusion: The choice of method used to model the effectiveness of a drug in simulation studies can potentially influence the optimal decision and the uncertainty associated with it.

HTA-10 SHOULD HEPATITIS B SCREENING BE ADDED TO THE UNITED STATES IMMIGRATION MEDICAL EXAM? A COST-UTILITY ANALYSIS (HTA) Evidence, Economics, and Ethics—Health Technology Assessment (Comparative & Cost-effectiveness Analysis): Application Jaclyn Beca1, Kamran Khan, MD, MPH, FRCPC1, and Jeffrey Hoch, PhD2 (1) St. Michael’s, Toronto, ON, Canada, (2) Cancer Care Ontario, Toronto, ON, Canada Purpose: Hepatitis B virus (HBV) infection is a significant health issue with risk for a number of costly and debilitating disease complications. In industrialized nations such as the United States, there is a significant and disproportionate burden of chronic HBV infection among the foreign-born population. The purpose of this study was to determine the cost-effectiveness of universal screening for HBV infection among new immigrants to the United States. Method: A Markov decision model was developed to compare screening and usual care strategies for a hypothetical cohort of immigrants entering the United States in one year. We considered direct health care costs and quality-adjusted life years (QALYs) for the immigrant cohort over a 20-year horizon. Patients could progress through the early stages of HBV infection to cirrhosis, hepatic decompensation, hepatocellular carcinoma (HCC), and death, as well as receive treatment or undergo liver transplant. We used prevalence of HBV infection among the foreign-born population from the National Health and Nutrition Examination Survey (NHANES) and obtained cost, utility, natural history, and treatment effectiveness estimates from the literature. Costs and QALYs were discounted at 3% per year. Probabilistic sensitivity analyses were performed using 1,000 Monte Carlo simulations. Results: The incremental cost-effectiveness ratio for the screening strategy compared to usual care was $45,570 per quality-adjusted life year (QALY) gained. For a willingness-to-pay (WTP) of $100,000/QALY, screening was cost-effective in 67% of Monte Carlo simulations. Our analysis identified key areas of uncertainty in the epidemiology and management of chronic HBV infection that could potentially benefit from future research. Conclusion: Early detection and treatment of HBV infection through screening appears to substantially impact both health outcomes and health service utilization for new immigrants and their receiving country. Given the potential for health gains for the immigrant cohort as well as the economic attractiveness of the intervention, some consideration might be given to the introduction of a universal HBV screening program to the U.S. immigration medical exam.

HTA-11 WHICH MEN WITH LOW-RISK PROSTATE CANCER SHOULD BE TREATED? (HTA) Evidence, Economics, and Ethics—Health Technology Assessment (Comparative & Cost-effectiveness Analysis): Application David Liu, MPH, MS1, Harold P. Lehmann, MD, PhD2, Kevin D. Frick, PhD3, and H. Ballentine Carter, MD1

(1) Johns Hopkins University School of Medicine, Baltimore, MD, (2) Johns Hopkins University School of Medicine and Bloomberg School of Public Health, Baltimore, MD, (3) Johns Hopkins Bloomberg School of Public Health, Baltimore, MD Purpose: Although active surveillance is an option for low-risk prostate cancer, the comparative effectiveness of active surveillance (AS) vs. immediate radical prostatectomy (RP) is unknown. In this study, we compare the efficacy of AS vs. RP for men diagnosed at different ages and with different baseline health statuses. Method: A Monte Carlo simulation using a Markov model was used to simulate the life course of men diagnosed with low-risk prostate cancer in the modern PSA era when treated with radical prostatectomy or monitored with active surveillance. Different starting ages from 50 to 75 and health statuses (poor health, average health, and excellent health, as defined by 0.5×, 1.0×, and 1.5× average life expectancy without prostate cancer) were simulated. Disease progression probabilities and utilities were obtained from literature review. Life expectancy, number of years with treatment-related erectile dysfunction or incontinence, quality-adjusted life expectancy (QALE), and a clinical incremental cost-effectiveness ratio (ICER) (years with side effects per life-year extended) for men undergoing radical prostatectomy or managed with active surveillance were estimated. Result: For a man age 56 in excellent health, immediate RP vs. AS resulted in 0.7 additional quality-adjusted life years (QALYs), with 3.8 incremental years of life at a cost of 7 additional years of erectile dysfunction (ED) or incontinence, yielding a clinical ICER of 1.8 years of side effects per additional year of life. For a man age 67 in poor health, RP vs. AS resulted in –0.3 QALYs, no benefit to life expectancy, and 2.6 additional years of ED and incontinence. 
Overall, increased age and decreased health status resulted in preference for AS in terms of QALEs and increased clinical ICER, whereas decreased age and increased health status resulted in preference for immediate RP and a lower clinical ICER. These trends held in all sensitivity analyses testing parameter uncertainties. Conclusion: Age and health status are critical determinants for optimal management strategies following diagnosis of low-risk prostate cancer. For older men and men in poorer health with reduced life expectancies, active surveillance should be strongly considered as an alternative to immediate treatment. Individual patient valuations of life with side effects vs. increased life expectancy can be compared to the calculated clinical ICER to inform individual treatment decisions.
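The clinical ICER reported above is a plain ratio, which makes the individualized use suggested in the Conclusion easy to operationalize. A minimal sketch using the reported base-case numbers; the preference helper and its threshold are an illustrative addition, not part of the authors' model:

```python
def clinical_icer(side_effect_years, life_years_gained):
    # Years lived with treatment-related side effects traded for each
    # additional life-year from immediate treatment vs. surveillance.
    return side_effect_years / life_years_gained


# Reported base case: a 56-year-old in excellent health gains 3.8 life-years
# from immediate RP at the cost of 7 years with ED or incontinence.
icer_56_excellent = clinical_icer(7.0, 3.8)


def prefers_immediate_treatment(max_tradeoff, icer):
    # A patient willing to accept at most `max_tradeoff` side-effect years
    # per life-year gained prefers immediate RP when the clinical ICER is
    # below that personal threshold (hypothetical decision rule).
    return icer < max_tradeoff
```

A patient who would tolerate 2.5 side-effect years per life-year gained would choose treatment in this base case (ICER about 1.8), while one who would tolerate only 1.0 would not.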


HTA-12 A COST-EFFECTIVENESS ANALYSIS OF THERAPEUTIC OPTIONS FOR LOW-RISK PROSTATE CANCER (HTA) Evidence, Economics, and Ethics—Health Technology Assessment (Comparative & Cost-effectiveness Analysis): Application Julia H. Hayes, MD1, Daniel A. Ollendorf, MPH, ARM2, Michael J. Barry, MD3, Steven D. Pearson, MD, MS, FRCP4, and Pamela McMahon, PhD4 (1) Institute for Technology Assessment, Massachusetts General Hospital, Boston, MA, (2) Institute for Clinical and Economic Review, Boston, MA, (3) Harvard Medical School, Boston, MA, (4) Massachusetts General Hospital, Boston, MA Purpose: The optimal therapeutic approach for low-risk, clinically localized prostate cancer (CaP) is unknown: over 50% of screen-detected men are overtreated, and treatment is associated with significant side effects (SE). This analysis examines the cost-effectiveness of radical prostatectomy (RP), intensity-modulated radiation therapy (IMRT), brachytherapy (BT), proton beam therapy (PBT), and active surveillance (AS) in these men. Method: A state-transition model was constructed and analyzed using Monte Carlo simulation. Men received treatment or AS and incurred SE for 1–2 y and costs until death from CaP or other cause. Men on AS could elect therapy or be treated at progression (both with IMRT). The base case used 65-year-old men and included therapy and patient time costs. Transition probabilities and utilities were developed from literature review. Sensitivity analysis on key parameters was performed. Main outcomes were costs (2008 US$) and quality-adjusted life-years (QALYs), both discounted at 3%/y, and incremental cost-effectiveness ratios (ICERs). Result: AS was most effective, providing 8.58 QALYs at a cost of $30,422. Compared to RP, AS provided an additional 9.1 mo of QALE at an added cost of $2,074 (ICER $2,729/QALY). Among initial therapies, BT was most effective and least expensive, providing an additional 3.5 mo of QALE at a cost savings of $2,743 vs. RP. IMRT and PBT were more expensive than BT, RP, or AS.

Strategy    Cost ($)    Incremental Cost ($)    QALYs    Incremental QALYs    ICER
BT          25,606      —                       8.11     —                    —
RP          28,348      2,743                   7.82     –0.29                Dominated (D)
AS          30,422      2,074                   8.58     0.76                 $2,729/QALY
IMRT        37,808      7,386                   8.09     –0.88                D
PBT         53,828      16,020                  7.96     –0.13                D

Dominated: more expensive and less effective than BT. Alternative Analyses. AS followed by BT was more effective and less expensive than any initial therapy or AS followed by IMRT. The relative risk of CaP-specific death would have to be 0.6 for therapy vs. AS for QALE to be equal. Sensitivity Analysis (SA). AS was most effective on SA including probability of SE, progressive disease on AS and utilities. If IMRT cost was reduced to 90% of dairy products from organic sources had lower risk of eczema at age 2 years than children who consumed 90% of dietary dairy and meat of organic origin had higher levels of conjugated linoleic acid (P < 0.001) and trans-vaccenic acids (P = 0.015) in their breast milk than women consuming conventional diets. 23 studies evaluated the animal products themselves and found no significant difference in bacterial contamination between organic and conventional meats, milk or eggs (OR 1.10; 95% CI 0.93–1.30). However, Campylobacter, Enterococcus, and

Staphylococcus spp. and E. coli cultured from organic products were less likely to be antibiotic resistant than bacteria cultured from conventional poultry, beef, pork, and milk (P < 0.001). There was no significant difference in heavy metal contamination or vitamin content of organic and conventional meat products. Conclusion: Too few studies have compared outcomes among consumers of organic and conventional animal products to draw conclusions about their health effects. Although bacterial contamination does not differ significantly between organic and conventional animal products, bacteria cultured from conventional products are more likely to be antibiotic resistant. The clinical significance of this increased antibiotic resistance is unknown. QMA-20 NOT ALL LIPID LOWERING IS CREATED EQUAL: A META-ANALYSIS (QMA) Evidence, Economics, and Ethics—Meta-analysis Robert J. Bryg, MD, Olive View-UCLA Medical Center, Sylmar, CA, and David J. Bryg, PhD, Olive View-Medical Center, Sylmar, CA Purpose: Cholesterol lowering with statins and fibrates is a cornerstone of preventive cardiology. There are, however, few data on the magnitude of reduction in different endpoints in various high-risk populations. In this analysis, we sought to determine how cholesterol lowering reduces the risk of cardiovascular death (CV death), myocardial infarction (MI), and stroke (CVA) in different populations. Method: We performed a literature search to identify all clinical trials from 1994 onward of cholesterol lowering with statins or fibrates. Thirty-two studies were identified and then segregated into 5 separate categories based on underlying cardiovascular risk: primary prevention (N = 6), secondary prevention (N = 8), comparison of drug dose in secondary prevention (N = 8), significant comorbid conditions (N = 6), and fibrates (N = 4). Separate meta-analyses were performed for the three different endpoints for each of these 5 categories. 
Result: Primary and secondary prevention populations had significant reductions in all categories: CV death (25% and 22% reduction, respectively), MI (41% and 32% reduction), and CVA (23% and 21% reduction). High-dose versus low-dose statins did not significantly decrease CV death, but produced further significant reductions in MI (22%) and CVA (13%). Treatment with statins in comorbid diseases, such as heart failure and renal failure, did not affect CV death or CVA, but did lower risk of MI by 23%. Fibrate therapy, based on these 4 studies, did not significantly affect CV death or CVA, but had a borderline significant decrease in MI of 18%. Rates for CV death ranged from 0.3% per year for primary prevention to 5.15% with comorbidities. MI rates were similar at approximately 1.1% per year in all groups except primary prevention (0.56%). CVA rates were lowest in primary prevention (0.46%) and highest in patients with comorbidities (1.2%). Conclusion: These data suggest that while there is a virtually uniform relative risk reduction of MI with cholesterol lowering with statins or fibrates, this effect is not necessarily carried over to reduction in CVA or CV death. The underlying comorbidities, disease processes, and the extent of relative and absolute risk reduction in each of these three endpoints need to be assessed separately in defining the expected benefit of cholesterol reduction in diverse populations. QMA-21 ASSESSING THE IMPACT OF AN IMPERFECT REFERENCE STANDARD ON ROC ANALYSIS THROUGH SIMULATION (QMA) Advances in Quantitative Methods—Simulation & Decision Modeling Milan Seth, MS, Boston Scientific Inc., Minneapolis, MN, and Karen M. Kuntz, ScD, University of Minnesota, Minneapolis, MN


Purpose: The lack of conclusive information about disease status (i.e., a gold standard) renders the evaluation of test characteristics of a new diagnostic test problematic. Using an imperfect reference standard to estimate ROC curves can bias estimates of diagnostic accuracy, and hence the clinical value, of the test under evaluation. We sought to evaluate the extent and direction of this bias through simulation. Method: We simulated values for a continuously scaled reference standard and diagnostic test from multivariate normal distributions for diseased and non-diseased individuals, with differing means by disease status. The new diagnostic test values were simulated over a range of correlations with the reference standard from –0.3 to 0.5 (conditional on true disease status). The mean for non-diseased patients was fixed at 0, and the mean for diseased patients was set to 3 for the imperfect reference standard and 2 for the test. Data from the diseased and non-diseased patients were combined, and the simulated reference standard values were used to construct ROC curves for the new test, to determine the bias introduced by error in the reference standard as well as by correlation of the test with the reference. Result: The true area under the ROC curve (assuming a perfect gold standard) was 0.76. The area under the ROC curve was lowest, at 0.69, for a correlation of –0.3. For correlation values less than 0.3, use of an imperfect reference standard to determine disease status biased the ROC area downward; only for a correlation of 0.5 was the area under the curve biased upward (0.80). Conclusion: In these simulations, only if the diagnostic test was at least moderately correlated with the imperfect reference standard was an upward bias observed. In cases of conditional independence or negative correlation, the impact of utilizing an imperfect reference was to underestimate the diagnostic accuracy of the test. 
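A minimal version of this simulation can be written in pure Python. The dichotomization threshold (1.5) and the within-status standard deviation (2.0, chosen so the true AUC comes out near the reported 0.76) are assumptions, since the abstract does not state how reference values were converted into disease labels:

```python
import math
import random


def auc(pos, neg):
    # Mann-Whitney estimate of the area under the ROC curve: the
    # probability that a random "positive" outscores a random "negative".
    combined = sorted([(v, 1) for v in pos] + [(v, 0) for v in neg])
    rank_sum = sum(i + 1 for i, (_, lab) in enumerate(combined) if lab == 1)
    n1, n0 = len(pos), len(neg)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


random.seed(42)
N = 4000
RHO = 0.0     # test-reference correlation conditional on disease status
SD = 2.0      # within-status SD (assumed; gives a true AUC near 0.76)

ref_vals, test_vals, truth = [], [], []
for diseased in (1, 0):
    mu_ref, mu_test = (3.0, 2.0) if diseased else (0.0, 0.0)
    for _ in range(N):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        ref_vals.append(mu_ref + SD * z1)
        test_vals.append(mu_test + SD * (RHO * z1 + math.sqrt(1 - RHO ** 2) * z2))
        truth.append(diseased)

# AUC against true status vs. against labels from the imperfect reference.
auc_true = auc([t for t, d in zip(test_vals, truth) if d],
               [t for t, d in zip(test_vals, truth) if not d])
labels = [r > 1.5 for r in ref_vals]
auc_obs = auc([t for t, l in zip(test_vals, labels) if l],
              [t for t, l in zip(test_vals, labels) if not l])
```

With `RHO = 0.0` (conditional independence), the AUC computed against the imperfect labels falls below the AUC computed against true status, matching the downward bias described above; the abstract reports the bias reversing only at a correlation of 0.5.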
If the diagnostic accuracy of a reference standard is known, and normality assumptions hold, correlation estimates between the reference and a diagnostic test under evaluation might provide value in estimating the extent of potential bias introduced to estimates of diagnostic accuracy. QMA-22 ESTIMATING SURVIVAL GAINS—CAN WE RELY ON “END-OF-STUDY” RESULTS? (QMA) Evidence, Economics, and Ethics—Meta-analysis Ivar Sønbø Kristiansen, MD, PhD, MPH, University of Oslo, Oslo, Norway, and Henrik Stovring, PhD, University of Aarhus, Aarhus, Denmark Purpose: Economic evaluation of interventions for chronic diseases is frequently based on data at the end of clinical trials. This approach rests on the assumption that relative hazards are constant across time. The aim of this study was to explore whether this assumption is violated in terms of survival curve crossings or convergences. Method: We identified all time-to-event graphs published during 2007 in Annals of Internal Medicine, BMJ, JAMA, New England Journal of Medicine, and Lancet. The following data were extracted: type of disease, type of exposure, number of comparator groups, number of paired comparisons, type of primary and secondary end-points, sample size, maximum follow-up time, survival curve convergences, survival curve crossings, and type of epidemiologic design. Result: In total, 78% of the 177 publications had survival curve convergences and 42% survival curve crossings. In multivariate logistic regression, survival curve convergence was positively associated with ‘more than one paired comparison’ (OR 3.7, 95% CI 1.3–10.8) and death as a secondary endpoint (OR 8.1, 95% CI 1.1–65.5). No association was found between survival curve crossings and any of the explanatory variables. Conclusion: Survival curve convergences and crossings are common phenomena in medical research. The results warrant care

in making inferences about the effectiveness of interventions for chronic diseases on the basis of measurement at a single point in time. QMA-23 COST-EFFECTIVENESS ANALYSIS FOR VARIOUS COLORECTAL CANCER SCREENING STRATEGIES IN TAIWAN (QMA) Advances in Quantitative Methods—Simulation & Decision Modeling Chiahsuan W. Li, MS, and Karen M. Kuntz, ScD, University of Minnesota, Minneapolis, MN Purpose: To evaluate colorectal cancer (CRC) screening strategies for the Taiwanese population compared with the current policy of biennial fecal occult blood testing (FOBT) for ages 50–69 without post-polypectomy surveillance. Method: We developed a Monte Carlo microsimulation model to track the incidence, size, and location of adenomas and their potential progression to CRC for a hypothetical cohort of 50-year-old Taiwanese at average risk for CRC. We evaluated screening strategies that extend the age range for screening (50–69 to 50–75), decrease the screening interval (biennial to annual), add post-polypectomy surveillance, and use different screening modalities (colonoscopy, sigmoidoscopy) at recommended intervals. We assumed 60% adherence with FOBT and 40% adherence with endoscopy. The cost-effectiveness analysis was conducted from a third-party payer perspective. Sensitivity analyses were performed to test the robustness of the findings when varying adherence and test performance. Result: The current Taiwanese policy reduced CRC incidence by 23% and CRC mortality by 28%. Adding post-polypectomy surveillance to the current policy further reduced both incidence and mortality by an absolute 2%, while extending the age range along with surveillance contributed an additional 6% increase in effect compared with the current policy. Annual FOBT from age 50 to 75 along with surveillance reduced CRC incidence by 46% and CRC mortality by 52%, and was consistently cost saving compared to the current policy in both base-case and sensitivity analyses. 
Ten-yearly colonoscopy was more effective than the current policy, but it was also more costly in the base-case analysis and under the optimal adherence scenario, whereas it was cost saving when we used lower test performance estimates for FOBT. Extending the age range with surveillance was consistently more effective and less costly than adding surveillance alone relative to the current policy in sensitivity analyses. Conclusion: The costs associated with expanding the current Taiwanese CRC screening program to include annual FOBT with surveillance for individuals aged 50–75 would be offset by the cancer treatment costs averted, and would reduce the burden of CRC in Taiwan. QMA-24 MONTE CARLO APPROACH TO CALIBRATION OF DISEASE HISTORY MODELS FOR HEALTH TECHNOLOGY ASSESSMENT: A CASE STUDY (QMA) Advances in Quantitative Methods—Simulation & Decision Modeling Ba’ Pham, MSc, PhD, (c)1, George Tomlinson, PhD2, Paul Grootendorst, PhD3, and Murray D. Krahn, MD, MSc2 (1) Toronto Health Economics and Technology Assessment Collaborative, Toronto, ON, Canada, (2) University of Toronto, Toronto, ON, Canada, (3) Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada Purpose: Conceptually simple, Monte Carlo calibration (i.e., random search) is frequently used in the development of disease history models for economic evaluation. We evaluate whether


MC calibration determines at least approximately correct values of unknown inputs to a hypothetical model. Methods: Hypothetical model: a simplified history model of pressure ulcers (i.e., bed sores) in individuals receiving home care. The Markov model includes 3 health states (i.e., ulcer stage 0–1, 2, 3–4) with four transition parameters: weekly incidence of developing a stage-2 ulcer (λ0), healing rates of stage-2 (q1) and stage 3–4 ulcers (q3), and progression rate from stage 2 to 3–4 (q2). “True” values of λ0 and q1–q3 were estimated a priori. Base case analysis: Given the incidence λ0, the model was calibrated to observed stage-specific prevalence data to determine the calibration parameters q1–q3. Prevalence was generated from the model using Kolmogorov’s forward equations: for observed prevalence, we used true values of λ0 and q1–q3; and for projected prevalence, the true value of λ0 and randomly generated values for q1–q3. Sensitivity analysis: MC calibration was evaluated with respect to: i) uncertain incidence λ0, ii) multiple calibration targets (i.e., prevalence observed at multiple time points), iii) target misalignment (i.e., different timing between observed and projected prevalence), iv) goodness-of-fit assessment (i.e., Pearson’s and likelihood-ratio fit statistics), v) acceptance criterion for good-fit parameter sets, vi) prior ranges of q1–q3, vii) sampling methods (i.e., random or Latin-hypercube sampling), and viii) sample size (e.g., 1000 to 100,000 random parameter sets). Outcome measures: i) number of good-fit parameter sets from the MC calibration, ii) number of unbiased good-fit parameter sets (i.e., calibrated q1–q3 within 95% confidence intervals of their true values), and iii) relative errors of individual good-fit parameters. Results: The MC calibration yielded an ensemble of good-fit parameter sets, representing post-calibration uncertainty. 
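The random-search procedure described under Methods can be sketched as follows, substituting a discrete weekly recursion for the Kolmogorov forward equations. The incidence (`LAM0`), the "true" transition rates, the prior range, the single 52-week target, and the acceptance tolerance are all illustrative assumptions:

```python
import random


def prevalence(lam0, q1, q2, q3, weeks=52):
    # Weekly 3-state Markov recursion over ulcer stages 0-1, 2, and 3-4,
    # starting the whole cohort in stage 0-1; returns the projected
    # prevalence of stage-2 and stage-3/4 ulcers after `weeks` weeks.
    p1, p2, p3 = 1.0, 0.0, 0.0
    for _ in range(weeks):
        p1, p2, p3 = (p1 * (1 - lam0) + q1 * p2 + q3 * p3,
                      p2 * (1 - q1 - q2) + lam0 * p1,
                      p3 * (1 - q3) + q2 * p2)
    return p2, p3


LAM0 = 0.02                   # assumed known weekly incidence
TRUE_Q = (0.10, 0.05, 0.03)   # "true" q1-q3, used only to build the target
target = prevalence(LAM0, *TRUE_Q)

random.seed(7)
accepted, best_err = [], float("inf")
for _ in range(50_000):       # random search over the prior ranges
    q = tuple(random.uniform(0.01, 0.5) for _ in range(3))
    err = sum((a - b) ** 2 for a, b in zip(prevalence(LAM0, *q), target))
    best_err = min(best_err, err)
    if err < 1e-3:            # acceptance criterion for a "good fit"
        accepted.append(q)
```

Each accepted triple is one member of the post-calibration ensemble; tightening the acceptance criterion or adding prevalence targets at more time points shrinks that ensemble, which is what the sensitivity analyses above probe.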
MC calibration performed well with accurate input data, multiple calibration targets, and perfect alignment; otherwise, the number of biased good-fit parameter sets increased. MC calibration was robust to variation in the goodness-of-fit statistic, the acceptance criterion, the prior ranges of the calibration parameters, the sampling method, and the sample size of the random parameter sets. Conclusions: Our results provide evidence in support of recently proposed components for a standardized calibration reporting checklist, and suggest areas for further methodological development of model calibration.

QMA-25
USING AHP WEIGHTS TO FILL MISSING GAPS IN MARKOV DECISION MODELS (QMA)
Advances in Quantitative Methods—Simulation & Decision Modeling
Marjan J. M. Hummel, PhD, Lotte M. G. Vrijhoef-Steuten, PhD, Gijs van de Wetering, MSc, Karin G. M. Groothuis, PhD, Marjolein Hilgerink, MSc, Carine J. M. Doggen, PhD, and Maarten J. IJzerman, PhD, University of Twente, Enschede, Netherlands

Purpose: Our study aims to combine the versatility of the Analytic Hierarchy Process (AHP) with the decision-analytic sophistication of Markov modelling in a new methodology for early technology assessment. As an illustration, we apply this methodology to a new technology to diagnose breast cancer. Method: Markov modelling is a commonly used approach to support decision making about the application of health care technology. We use a basic Markov model to compare the incremental cost-effectiveness of alternative technologies in terms of their costs and clinical effectiveness. The AHP is a technique for multi-criteria analysis that is relatively new in the field of technology assessment; it can integrate both quantitative and qualitative criteria in the assessment of alternative technologies. We applied the AHP to prioritize a more versatile set of outcome measures than Markov models typically capture. These outcome measures include the clinical effectiveness and its determinants, as well as costs, patient comfort, and safety. Furthermore, the AHP is applied to predict the performance of the new technology with regard to these outcome measures. Result: We systematically estimated priors on the clinical effectiveness of the new technology. In our illustration, estimates of the sensitivity and specificity of the new diagnostic technology were used as input to the Markov model. Moreover, prioritized outcome measures including clinical effectiveness (w = 0.61), patient comfort (w = 0.09), and safety (w = 0.30) could be integrated into one combined outcome measure in the Markov model. Conclusion: Combining AHP and Markov modelling is particularly valuable in early technology assessment, when evidence about the effectiveness of a health care technology is still missing. The combination is also valuable when decision makers are interested in patient-relevant outcome measures besides the technology's clinical effectiveness that are not (adequately) captured by mainstream utility measures. These outcome measures can have a strong impact on the successful application of health care technology.

QMA-26
USING BAYESIAN METHODS TO SYNTHESISE EVIDENCE ON THE EFFICACY OF ELECTRONIC AIDS TO SMOKING CESSATION (QMA)
Advances in Quantitative Methods—Bayesian Methods, Analyses & Applications
Jason Madan, MA, MSc, PhD, and Nicky J. Welton, PhD, Bristol University, Bristol, United Kingdom

Purpose: To estimate the efficacy of electronic aids to smoking cessation, as an input into a cost-effectiveness analysis of such interventions. Method: A prior systematic review identified studies evaluating electronic aids to smoking cessation (websites, computer-based tailored advice, chat rooms, email/SMS communications, etc.).
A classification system with five levels was developed for these aids, ascending from single-component interventions providing generic advice (e.g., a static website) to interventions with multiple components providing tailored feedback through several channels (e.g., interactive website + email + chat room). To synthesise this evidence and estimate class-level treatment effects, a Bayesian mixed-treatment comparison was constructed. This involved fitting a proportional-hazards Weibull survival model to sustained abstinence, as this was the main outcome of interest for the cost-effectiveness model. For some treatment classes, evidence on sustained abstinence was lacking but point abstinence rates were available; a log-odds treatment effect was fitted to the latter type of outcome, along with a correlation structure between treatment effects on the two outcome types. This allowed treatment effects on sustained abstinence to be estimated for all intervention classes. Result: 51 studies were included in the analysis, with 127 arms; 62 arms reported sustained abstinence, of which 51 also reported point abstinence. The mean shape parameter of the Weibull survival model was 0.18 (95% credible interval (CrI) 0.09–0.32), consistent with the hypothesis that quitting is hardest initially and becomes easier to sustain with time. There was an inverse relationship between the mean hazard ratio and the class of treatment, with estimates ranging from 1.15 (95% CrI 0.87–1.45) for class one to 0.81 (95% CrI 0.68–0.93) for class five. Conclusion: Bayesian methods allow uncertainty in treatment effects to be quantified whilst incorporating treatment classes, multiple outcomes, and repeated measurements. Application to data on electronic smoking cessation aids demonstrated that such interventions are likely to improve sustained abstinence, and that increased intensity may lead to better outcomes. Further studies

are required to determine the incremental benefits of more intensive electronic interventions; the design of these trials should be informed by value-of-information analyses.

QMA-27
OPTIMAL STRAIN SELECTION FOR THE ANNUAL INFLUENZA VACCINE UNDER THE CLOSEST ANTIGENIC DISTANCE IMMUNITY MODEL (QMA)
Advances in Quantitative Methods—Simulation & Decision Modeling
Osman Ozaltin, MS1, Oleg Prokopyev, PhD1, Andrew Schaefer, PhD1, and Mark S. Roberts, MD, MPP2, (1) University of Pittsburgh, Pittsburgh, PA, (2) University of Pittsburgh School of Medicine, Pittsburgh, PA

Purpose: Seasonal influenza is a major public health concern, and the flu shot is the first line of defense. Antigenic drift and the high rate of influenza transmission require annual updates to the flu shot's composition. We propose a mathematical model to optimize the strain selection decisions for the annual flu shot and analyze the trade-offs involved in various policy issues. Methods: We take the view of the FDA's Vaccines and Related Biological Products Advisory Committee and optimize strain selection based on a production plan designed exogenously by the manufacturers. To select the strains for the flu shot, the committee first meets at the end of February. In this initial meeting, recommendations are made for prevalent strains that have sufficient production yields; if the information is insufficient to select a strain for a category, the final decision for that category is deferred to a second meeting held four weeks later. We propose a multi-stage stochastic mixed-integer program to determine the best flu shot composition and the optimal time to select it. We consider all three flu strain categories (A/H1N1, A/H3N2, and B). We calibrate the cross-protective immunity among the candidate strains using a shape-space model in which only the vaccine strain with the smallest antigenic distance to a circulating strain triggers an immune response.
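A toy version of the closest-antigenic-distance rule makes the objective concrete. The shape-space coordinates, epidemic probabilities, and exponential decay rate below are invented for illustration; the authors' model additionally optimizes over timing, production yields, and shortage costs, none of which is sketched here.

```python
import math
from itertools import combinations

# Hypothetical antigenic coordinates in a 2-D shape space, with
# illustrative epidemic probabilities for four candidate strains.
strains = {"A": (0.0, 0.0), "B": (1.0, 0.5), "C": (2.5, 1.0), "D": (4.0, 0.0)}
epidemic_prob = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}

def cross_protection(distance, decay=0.7):
    """Assumed cross-protection, decaying with antigenic distance."""
    return math.exp(-decay * distance)

def expected_benefit(vaccine):
    """Closest-antigenic-distance rule: each circulating strain is
    protected against only by the nearest strain in the vaccine."""
    total = 0.0
    for name, prob in epidemic_prob.items():
        d = min(math.dist(strains[name], strains[v]) for v in vaccine)
        total += prob * cross_protection(d)
    return total

# Enumerate all 3-strain (trivalent) compositions and keep the best.
best = max(combinations(strains, 3), key=expected_benefit)
```

With the illustrative numbers above, the benefit forgone by restricting the shot to three strains is exactly the gap between `expected_benefit(best)` and the benefit of including all four candidates, which is the kind of comparison behind the tetravalent result in the abstract.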
The strain selection decisions are made to maximize the expected benefit of immune response minus the expected shortage cost under various scenarios. Results: Selecting the strain of each category independently of the others results in a loss of up to 14% of the optimal benefit. The cost of considering only the most prevalent strains can be as high as 45% of the optimal benefit. Our model allows more than three strains to be incorporated in the flu shot, so it can be used to assess the benefits of a tetravalent flu shot; we find that incorporating a fourth strain would potentially prevent over a million flu cases. Conclusions: Integrating the composition and timing decisions is crucial to designing the best flu shot. The uncertainties associated with the flu shot preparation campaign should be incorporated analytically into the strain selection decisions.

QMA-28
DECISION MODEL: FACTOR V LEIDEN AND ANTICOAGULATION DURATION (QMA)
Advances in Quantitative Methods—Bayesian Methods, Analyses & Applications
Anna K. Donovan, MD, Kenneth Smith, MD, MS, and Margaret V. Ragni, MD, MPH, University of Pittsburgh School of Medicine, Pittsburgh, PA

Purpose: Current anticoagulation guidelines suggest that the length of anticoagulation (AC) for unprovoked venous thromboembolism (VTE) should be determined by an individual risk assessment, balancing bleeding risk due to AC against the risk of VTE recurrence. Among individuals who are heterozygous for the factor V Leiden (FVL) mutation, however, not only is VTE risk greater, but the risk of bleeding is lower, suggesting that standard recommendations may not apply and that longer-term AC should be considered. Methods: We constructed a Markov model to compare lifetime anticoagulation vs. shorter durations in 20-year-old FVL patients with an unprovoked VTE. Risks of major, minor, and fatal bleeding with and without AC, VTE morbidity and mortality, and quality-of-life utilities were obtained from the literature. We used sensitivity analyses to determine the model parameter values favoring lifelong AC in FVL patients. Outcomes are in quality-adjusted life years (QALYs), discounted at 3% per year. Results: In general population groups (where the VTE relative risk and the odds ratio for AC-related bleeding are 1.0), the short-term AC strategy yields 0.17 QALYs more than lifetime AC. In FVL patients, lifetime AC was favored if their VTE relative risk was >1.1 or if their bleeding odds ratio was