How to perform a critical appraisal of diagnostic tests

Pediatr Radiol DOI 10.1007/s00247-014-3202-y

REVIEW

How to perform a critical appraisal of diagnostic tests: 7 steps Aamer Chughtai & Aine Marie Kelly & Paul Cronin

Received: 13 June 2014 / Revised: 31 July 2014 / Accepted: 1 October 2014
© Springer-Verlag Berlin Heidelberg 2015

Abstract The critically appraised topic (CAT) is a format in evidence-based practice for sharing information. A CAT is a standardized way of summarizing the most current research evidence focused on a pertinent clinical question. Its aim is to provide both a critique of the most up-to-date retrieved research and an indication of the clinical relevance of results. A clinical question is initially generated following a patient encounter, which leads to and directs a literature search to answer the clinical question. Studies obtained from the literature search are assigned a level of evidence. This allows the most valid and relevant articles to be selected and to be critically appraised. The results are summarized, and this information is translated into clinically useful procedures and processes.

Keywords Critical appraisal · Conditional probability · Evidence-based medicine · Evidence-based radiology · Levels of evidence · Literature search · Sensitivity and specificity · Systematic review

Electronic supplementary material The online version of this article (doi:10.1007/s00247-014-3202-y) contains supplementary material, which is available to authorized users.

A. Chughtai · A. M. Kelly · P. Cronin
Department of Radiology, Division of Cardiothoracic Radiology, University of Michigan, Ann Arbor, MI, USA

P. Cronin (*)
Department of Radiology, University of Michigan Hospitals, B1 132G Taubman Center/5302, 1500 E. Medical Center Drive, Ann Arbor, MI 48109-5302, USA
e-mail: [email protected]

Introduction

Rapid development in medical science during the last five decades has spawned an immense volume of scientific data, which is said to double every 10 years [1]. Given this large amount of data, keeping up with the most up-to-date information can be daunting for a busy practicing physician. The development of evidence-based practice (EBP) publications can therefore help health practitioners cope with the information overload and solve increasingly complex health problems [1]. However, it is important to be able to retrieve, analyze and thoroughly evaluate a large amount of evidence quickly and efficiently to answer a clinically derived question. One way of achieving this is by meta-analysis of the literature; however, this process is time-consuming and requires in-depth review and detailed statistical analysis of the available data.

More efficient and quicker sharing of evidence-based information can be achieved in several other ways, one of which is the critically appraised topic (CAT). A CAT provides a standardized way of summarizing the most current research evidence focused on a pertinent clinical question. It also provides both a critique of the most up-to-date retrieved research and an indication of the clinical relevance of the results [2]. In other words, CATs are not just abstracts of existing evidence: they critique the methodology, internal validity, external validity and statistical validity, assess the clinical applicability of the best research evidence to date, and summarize the results in a few pages [2, 3]. For busy practicing clinicians who may not have the time to pursue the answer to a clinical problem and/or the specialized skill set to critically appraise the literature and reach an appropriate conclusion, a critically appraised topic provides easier access to the scientific literature [2]. A CAT answers an explicit clinical question arising from a specific patient encounter.
In other words, a health professional generates a clinical question from a real clinical situation, follows this by finding and appraising the evidence, and finally applies it in clinical practice, which is the essence of EBP [1].


In this review, we describe the seven steps involved in performing and writing a CAT to help solve a clinical problem, and we introduce some of the available evidence resources. The review is limited to a discussion of diagnostic test accuracy studies, as these make up the majority of what radiologists do. The review is illustrated with two previously published CATs: one on the comparative effectiveness of imaging modalities for the diagnosis of upper and lower urinary tract malignancy, and one on the comparative accuracy of intravenous contrast-enhanced CT versus noncontrast CT plus intravenous contrast-enhanced CT in the detection and characterization of hypervascular liver metastases [4, 5].

How to perform and write a CAT

The following seven steps of evidence-based practice are required to write a CAT:

1. Ask a focused and answerable question
2. Search for the best available evidence
3. Apply a level of evidence to the retrieved literature
4. Critically appraise the retrieved literature
5. Evaluate the strength of the evidence from the literature
6. Apply the results to clinical practice
7. Evaluate performance

Step 1: ask a focused and answerable question

This is the first step in writing a CAT. A question focused on a specific clinical problem is formulated that the literature can then answer [6, 7]. A key concept of EBP is formulating an answerable question: the question should be relevant to the clinical problem and answerable in the time available [8]. It should reflect both the patient's illness or problem at hand and the information the clinician needs to know. The clinical question is then converted into a format that can be used to direct the literature search. The question consists of four parts: (1) the clinical problem, (2) the intervention or test, (3) the comparison intervention or test and (4) the outcome. These components are abbreviated to PICO (patient/population/[clinical] problem, intervention/index test, comparator [test], and outcome). They are summarized in Table 1 using the CATs on the comparative effectiveness of imaging modalities for the diagnosis of upper and lower urinary tract malignancy, and on the comparative accuracy of intravenous contrast-enhanced CT versus noncontrast CT plus intravenous contrast-enhanced CT in the detection and characterization of hypervascular liver metastases [4, 5]. Asking the clinical question in this format helps direct the literature search and increases the likelihood of a successful search (Table 2).
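The four PICO components can be captured in a small data structure. The sketch below is purely illustrative (the `PICO` class and its `question()` helper are hypothetical, not from the article); it renders the first CAT question in a PICO-style format:

```python
from dataclasses import dataclass

# Hypothetical container for the four PICO components of a clinical question.
@dataclass
class PICO:
    patient: str       # patient/population/(clinical) problem
    intervention: str  # intervention/index test
    comparator: str    # comparison intervention or test
    outcome: str       # outcome of interest

    def question(self) -> str:
        """Render the components as a focused, answerable question."""
        return (f"In {self.patient}, how does {self.intervention} "
                f"compare with {self.comparator} for {self.outcome}?")

pico = PICO(
    patient="patients with upper urinary tract transitional cell carcinoma",
    intervention="CTU vs. MRU",
    comparator="retrograde pyelography vs. excretory urography",
    outcome="diagnosis",
)
print(pico.question())
```

Structuring the question this way makes each component explicit before it is translated into search terms.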

Step 2: search for the best available evidence

Evidence may be found in many places. The easiest and most up-to-date sources of information are electronic resources, including journals, information systems and synopses, and search engines. Brian Haynes has described the "4S" evolution of locating the best evidence [9]. This evidence pyramid is composed of four levels: primary literature, syntheses, synopses and information systems [7]. The base of the evidence pyramid (Level 1) is composed of primary research, such as randomized controlled trials (RCTs), cohort or observational studies, case-control studies, studies of diagnostic accuracy and cross-sectional studies [9]. Sources for primary literature include PubMed, Medline, Google Scholar, ISI Web of Knowledge, MD Consult and EMBASE [10–15]. ARRS GoldMiner and Yottalook are medical imaging search engines [16, 17].

There are three levels of secondary research above this. Level 2 is composed of syntheses, in which systematic reviews of primary research are synthesized into a larger study; examples are meta-analyses and Cochrane reviews [18–20]. The National Guidelines Clearinghouse, the National Institute for Clinical Excellence, the National Library for Health, the Scottish Intercollegiate Guidelines Network and SUMSearch are resources for syntheses (including guidelines) [21–24]. These resources are presented in greater detail in a recent review [25]. PubMed Clinical Queries is another resource for systematic reviews. There are no specific radiology resources at evidence pyramid Level 2. Level 3 is composed of synopses, which include the American College of Physicians Journal Club, Bandolier, the British Medical Journal Clinical Evidence, Dynamed, Evidence Based Medicine Online, the Physicians Information and Education Resource, the Turning Research into Practice Database and UpToDate online [26–33]. There are no specific radiology resources at evidence pyramid Level 3. Level 4 is composed of systems; there are no specific radiology resources at evidence pyramid Level 4.

The search of the literature should be thorough. This requires familiarity with the available types and sources of information and the levels of evidence, as well as knowledge of how to search for a particular type of evidence and how to select articles with a high level of evidence.

Search the primary literature

When systematically searching the primary literature, PubMed is probably the most-used electronic database [7]. This process has already been described in great detail in the literature [7,

Table 1 Examples of critically appraised topic (CAT) questions and their PICO format (patient/population/[clinical] problem, intervention/index test, comparator test, and outcome)

Question 1
CAT study question: "We wondered whether CTU, MRU, excretory urography, or retrograde urography performs better in the diagnosis of upper urinary tract transitional cell carcinoma." [4]
PICO format question: "In patients with upper urinary tract transitional cell carcinoma, how does CTU vs. MRU vs. retrograde pyelography vs. excretory urography compare with each other for diagnosis?" [4]
Patient or problem: Patients with upper urinary tract transitional cell carcinoma
Intervention: CTU, MRU
Comparison intervention: Excretory urography, retrograde urography
Outcome: Accuracy in diagnosis of upper urinary tract transitional cell carcinoma

Question 2
CAT study question: "We wondered whether the CTC, MRC, or ultrasound performs better in the diagnosis of lower urinary tract transitional cell carcinoma (bladder cancer)." [4]
PICO format question: "In patients with bladder cancer, how does MRC vs. CTC vs. ultrasonography (US) compare with each other for diagnosis?" [4]
Patient or problem: Patients with bladder cancer
Intervention: MRC, CTC
Comparison intervention: Ultrasonography (US)
Outcome: Accuracy in diagnosis of bladder cancer

Question 3
CAT study question: "We wondered whether the addition of NECT to CECT improves liver mass detection for initial staging or follow-up of patients with known breast cancer, melanoma, NET (e.g., carcinoid, pancreatic islet cell, or medullary thyroid cancers), or thyroid cancer (nonmedullary)." [5]
PICO format question: "In patients with breast cancer, melanoma, NET, or thyroid cancer, how does CECT versus its combination with NECT compare at initial staging or follow-up for detection or characterization of liver metastases?" [5]
Patient or problem: Patients with breast cancer, melanoma, NET, or thyroid cancer with liver metastases at initial staging or follow-up
Intervention: CECT
Comparison intervention: CECT in combination with NECT
Outcome: Accuracy in diagnosis of liver metastases

Question 4
CAT study question: "We wondered whether the addition of NECT to CECT improves liver mass characterization for initial staging or follow-up of patients with known breast cancer, melanoma, NET (e.g., carcinoid, pancreatic islet cell, or medullary thyroid cancers), or thyroid cancer (nonmedullary)." [5]
PICO format question: "In patients with breast cancer, melanoma, NET, or thyroid cancer, how does the radiologist's confidence level for characterization of liver metastases compare when interpreting CECT versus its combination with NECT?" [5]
Patient or problem: Patients with breast cancer, melanoma, NET, or thyroid cancer with liver metastases at initial staging or follow-up
Intervention: CECT
Comparison intervention: CECT in combination with NECT
Outcome: Radiologist's confidence level for characterization of liver metastases

Question 5
CAT study question: "We wondered whether the addition of NECT to CECT improves management in patients with liver mass for initial staging or follow-up of patients with known breast cancer, melanoma, NET (e.g., carcinoid, pancreatic islet cell, or medullary thyroid cancers), or thyroid cancer (nonmedullary)." [5]
PICO format question: "In patients with breast cancer, melanoma, NET, or thyroid cancer, how does the characterization of significant extrahepatic incidental findings (including lymph nodes and lesions in the adrenal glands, kidneys, and pancreas) compare when interpreting CECT versus its combination with NECT?" [5]
Patient or problem: Patients with breast cancer, melanoma, NET, or thyroid cancer with liver metastases at initial staging or follow-up
Intervention: CECT
Comparison intervention: CECT in combination with NECT
Outcome: Characterization of significant extrahepatic incidental findings

CECT (intravenous) contrast-enhanced CT, CTC computed tomography cystography, CTU computed tomography urography, MRC magnetic resonance cystography, MRU magnetic resonance urography, NECT non-enhanced computed tomography, NET neuroendocrine tumors

34, 35]. The process uses medical subject heading (MeSH) terms which provide a consistent method for information retrieval [7]. To find publications on a specific topic, MeSH terms are used for each of the four portions of the PICO (patient/population/[clinical] problem, intervention/index test, comparator [test], and outcome) question [7]. This process is summarized in Table 2.
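As an illustration of how such a strategy can be assembled, the sketch below builds a Boolean search string from PICO term lists in the style of the first strategy shown in Table 2. The `or_group` and `build_query` helpers are our own examples, not part of any PubMed or Medline interface:

```python
# Illustrative sketch: assemble a Boolean search string from PICO term lists,
# in the style of the search strategies in Table 2. Synonyms within a PICO
# component are ORed together; the components themselves are ANDed.
def or_group(terms):
    """Join synonyms with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"

def build_query(*pico_parts):
    """AND the PICO component groups together."""
    return " AND ".join(or_group(p) for p in pico_parts)

patient      = ["Upper Urinary Tract", "Upper Urogenital System"]
disease      = ["Cancer", "Carcinoma", "Malignancy", "Neoplasm"]
intervention = ["Magnetic Resonance Urography", "MRU",
                "Computed Tomography Urography", "CTU"]
outcome      = ["Diagnosis", "Sensitivity", "Specificity"]

query = build_query(patient, disease, intervention, outcome)
print(query)
```

The resulting string can then be pasted into a database search box or refined with MeSH terms.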

Step 3: apply a level of evidence to the retrieved literature

Study designs have been graded in a hierarchy of evidence by the Oxford Centre for Evidence-Based Medicine (CEBM) (Table 3). With the help of Table 3, a level of evidence can be rapidly assigned to a retrieved article [36].

Table 2 Examples of search strategies for the PICO format (patient/population/[clinical] problem, intervention/index test, comparator test, and outcome) of questions

Question 1: "In patients with upper urinary tract transitional cell carcinoma, how do CTU vs. MRU vs. retrograde pyelography vs. excretory urography compare with one another for diagnosis?" [4]
Patient or problem: (Upper Urinary Tract OR Upper Urogenital System) AND (Cancer OR Carcinoma OR Malignancy OR Neoplasm)
Intervention: Magnetic Resonance Urography OR MR Urography OR MRU OR Computed Tomography Urography OR CT Urography OR CTU
Comparison intervention: Excretory Urography OR Excretory Pyelography OR Intravenous Pyelography OR Intravenous Urography OR IV Pyelography OR IV Urography OR IVP OR IVU OR Retrograde Urography OR Retrograde Pyelography
Outcome: Diagnosis OR (Sensitivity OR Specificity)

Question 2: "In patients with bladder cancer, how do MRC vs. CTC vs. ultrasonography (US) compare with one another for diagnosis?" [4]
Patient or problem: Bladder AND (Cancer OR Carcinoma OR Malignancy OR Neoplasm)
Intervention: Magnetic Resonance Cystoscopy OR MR Cystoscopy OR Magnetic Resonance Virtual Cystoscopy OR MR Virtual Cystoscopy OR MRVC OR Computed Tomography Cystoscopy OR CT Cystoscopy OR Computed Tomography Virtual Cystoscopy OR CT Virtual Cystoscopy OR CTVU
Comparison intervention: Ultrasound OR Ultrasonography OR Sonography
Outcome: Diagnosis OR (Sensitivity OR Specificity)

Question 3: "In patients with breast cancer, melanoma, NET, or thyroid cancer, how does CECT versus its combination with NECT compare at initial staging or follow-up for detection or characterization of liver metastases?" [5]
Patient or problem: (Breast OR Thyroid OR Neuroendocrine OR Carcinoid OR Islet cell OR Melanoma) AND (Cancer OR Tumor OR Mass OR Carcinoma OR Neoplasm)
Intervention: (Contrast-enhanced OR Contrast OR Enhanced OR Contrast material OR Intravenous contrast) AND (CT OR Computed tomography)
Comparison intervention: (Non-enhanced OR Non-contrast OR Without contrast) AND (Contrast-enhanced OR Contrast OR Enhanced OR Contrast material OR Intravenous contrast) AND (CT OR Computed tomography)
Outcome: (Liver OR Hepatic) AND (Mass OR Metastasis OR Lesion OR Tumor) AND (Detection OR Diagnosis OR Characterization OR Sensitivity OR Specificity)

Question 4: "In patients with breast cancer, melanoma, NET, or thyroid cancer, how does the radiologist's confidence level for characterization of liver metastases compare when interpreting CECT versus its combination with NECT?" [5]
Patient or problem: (Breast OR Thyroid OR Neuroendocrine OR Carcinoid OR Islet cell OR Melanoma) AND (Cancer OR Tumor OR Mass OR Carcinoma OR Neoplasm)
Intervention: (Contrast-enhanced OR Contrast OR Enhanced OR Contrast material OR Intravenous contrast) AND (CT OR Computed tomography)
Comparison intervention: (Non-enhanced OR Non-contrast OR Without contrast) AND (Contrast-enhanced OR Contrast OR Enhanced OR Contrast material OR Intravenous contrast) AND (CT OR Computed tomography)
Outcome: (Liver OR Hepatic) AND (Mass OR Metastasis OR Lesion OR Tumor) AND (Confidence level OR Conspicuity)

Question 5: "In patients with breast cancer, melanoma, NET, or thyroid cancer, how does the characterization of significant extrahepatic incidental findings (including lymph nodes and lesions in the adrenal glands, kidneys, and pancreas) compare when interpreting CECT versus its combination with NECT?" [5]
Patient or problem: (Breast OR Thyroid OR Neuroendocrine OR Carcinoid OR Islet cell OR Melanoma) AND (Cancer OR Tumor OR Mass OR Carcinoma OR Neoplasm)
Intervention: (Contrast-enhanced OR Contrast OR Enhanced OR Contrast material OR Intravenous contrast) AND (CT OR Computed tomography)
Comparison intervention: (Non-enhanced OR Non-contrast OR Without contrast) AND (Contrast-enhanced OR Contrast OR Enhanced OR Contrast material OR Intravenous contrast) AND (CT OR Computed tomography)
Outcome: (Liver OR Hepatic) AND (Mass OR Metastasis OR Lesion OR Tumor) AND (Incidental finding OR Extrahepatic finding)

CECT (intravenous) contrast-enhanced CT, CTC computed tomography cystography, CTU computed tomography urography, MRC magnetic resonance cystography, MRU magnetic resonance urography, NECT non-enhanced computed tomography, NET neuroendocrine tumors

Step 4: critically appraise the retrieved literature

This step is described in detail elsewhere in the literature and involves evaluating the retrieved literature for relevance to the clinical question, level of evidence, internal validity and external validity or generalizability [34]. To appraise an article on diagnostic testing/imaging, it is important to evaluate two sections: the Materials and Methods, and the Results. The Materials and Methods section is assessed for the validity of the study. The Results section is assessed for

Table 3 Designation of levels of evidence according to type of research. Adapted from the Oxford Centre for Evidence-Based Medicine Web site [36]

Level 1a
Prognosis: SR (with homogeneity*) of inception cohort studies; CDR√ validated in different populations
Diagnosis: SR (with homogeneity*) of Level 1 diagnostic studies; CDR√ with 1b studies from different clinical centers
Differential diagnosis / symptom prevalence study: SR (with homogeneity*) of prospective cohort studies

Level 1b
Prognosis: Individual inception cohort study with >80% follow-up; CDR√ validated in a single population
Diagnosis: Validating** cohort study with good√√√ reference standards; or CDR√ tested within one clinical center
Differential diagnosis / symptom prevalence study: Prospective cohort study with good follow-up****

Level 1c
Prognosis: All or none case-series
Diagnosis: Absolute SpPins and SnNouts√√
Differential diagnosis / symptom prevalence study: All or none case-series

Level 2a
Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs
Diagnosis: SR (with homogeneity*) of Level >2 diagnostic studies
Differential diagnosis / symptom prevalence study: SR (with homogeneity*) of 2b and better studies

Level 2b
Prognosis: Retrospective cohort study or follow-up of untreated control patients in an RCT; derivation of CDR√ or validated on split-sample§ only
Diagnosis: Exploratory** cohort study with good√√√ reference standards; CDR√ after derivation, or validated only on split-sample§ or databases
Differential diagnosis / symptom prevalence study: Retrospective cohort study, or poor follow-up

Level 2c
Prognosis: "Outcomes" research
Differential diagnosis / symptom prevalence study: Ecological studies

Level 3a
Diagnosis: SR (with homogeneity*) of 3b and better studies
Differential diagnosis / symptom prevalence study: SR (with homogeneity*) of 3b and better studies

Level 3b
Diagnosis: Non-consecutive study; or without consistently applied reference standards
Differential diagnosis / symptom prevalence study: Non-consecutive cohort study, or very limited population

Level 4
Prognosis: Case-series (and poor-quality prognostic cohort studies***)
Diagnosis: Case-control study, poor or non-independent reference standard
Differential diagnosis / symptom prevalence study: Case-series or superseded reference standards

Level 5
Prognosis, Diagnosis, and Differential diagnosis / symptom prevalence study: Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"

CDR clinical decision rule, RCT randomized controlled trial, SR systematic review

Use "-" to denote a level that fails to provide a conclusive answer because there is either a single result with a wide confidence interval or a systematic review with troublesome heterogeneity. Such evidence is inconclusive, and therefore can only generate grade D recommendations.

* By homogeneity we mean a systematic review that is free of worrisome variations (heterogeneity) in the directions and degrees of results between individual studies. Not all systematic reviews with statistically significant heterogeneity need be worrisome, and not all worrisome heterogeneity need be statistically significant. As noted above, studies displaying worrisome heterogeneity should be tagged with a "-" at the end of their designated level.

** Validating studies test the quality of a specific diagnostic test, based on prior evidence. An exploratory study collects information and trawls the data (e.g., using a regression analysis) to find which factors are significant.

*** By poor-quality prognostic cohort study we mean one in which sampling was biased in favor of patients who already had the target outcome, or the measurement of outcomes was accomplished in <80% of study patients, or outcomes were determined in an unblinded, non-objective way, or there was no correction for confounding factors.

**** Good follow-up in a differential diagnosis survey is >80%, with adequate time for alternative diagnoses to emerge (for example, 1–6 months acute, 1–5 years chronic).

√ Clinical decision rules are algorithms or scoring systems that lead to a prognostic estimation or a diagnostic category.

√√ An "Absolute SpPin" is a diagnostic finding whose specificity is so high that a positive result rules in the diagnosis. An "Absolute SnNout" is a diagnostic finding whose sensitivity is so high that a negative result rules out the diagnosis.

√√√ Good reference standards are independent of the test and are applied blindly or objectively to all patients. Poor reference standards are haphazardly applied but are still independent of the test. Use of a non-independent reference standard (where the test is included in the reference, or where the testing affects the reference) implies a level 4 study.

§ Split-sample validation is achieved by collecting all the information in a single tranche, then artificially dividing this into "derivation" and "validation" samples.
statistical strength of the study [37]. Appraisal is performed on primary as well as secondary literature. If the search identifies secondary literature, this is appraised first because it contains a higher level of evidence. The articles that are most relevant to the clinical question and that have the highest level of evidence are selected to summarize the results of the literature review.

Secondary literature

AMSTAR (assessment of multiple systematic reviews) is an appraisal tool for evaluating the quality of systematic reviews, including meta-analyses. The goal of AMSTAR is to provide a valid, reliable and usable instrument that helps users appraise systematic reviews, focusing on their methodological quality. It is practical because completion time is short and one can


arrive at a final decision with ease thanks to its comprehensible guidelines. It consists of 11 items evaluating for potential bias: (1) the provision of an a priori design; (2) assessment for duplicate study selection and data extraction; (3) performance of a comprehensive literature search; (4) use of the status of publication as an inclusion criterion; (5) provision of a list of included and excluded studies and (6) of the characteristics of included studies; (7) assessment of the scientific quality of included studies and (8) its use in formulating the conclusion; (9) assessment of the methods for assessing heterogeneity; (10) assessment of publication bias; and (11) combining of the results of studies and clearly describing the potential sources of support [38]. The 11 AMSTAR items are listed in more detail in Appendix Table 1 [Electronic Supplementary Material (ESM)]. There is also an instrument for assessing guidelines, the Appraisal of Guidelines for Research and Evaluation (AGREE) instrument [39]. All items are scored on a seven-point Likert scale by at least two independent observers [40]. This has been updated to AGREE II (Appendix Table 2, ESM).

Primary literature

The primary literature is appraised using an instrument called QUADAS (Quality Assessment of Diagnostic Accuracy Studies; Appendix Table 3, ESM) [41]. This evaluates the internal validity of studies using 14 questions that assess for potential biases [41]. The possible answers to these questions are yes, no or unclear. A potential source of methodological bias is identified each time the answer to a question is "no" [37]. The potential biases assessed are patient spectrum and selection bias, partial verification bias, differential verification bias, incorporation bias, index test result bias and reference test result bias [41]. QUADAS also assesses whether there was an acceptable delay between tests, whether relevant clinical information was available, how uninterpretable tests were reported and how withdrawals were explained.
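Because QUADAS is a yes/no/unclear checklist, a simple tally can record an appraisal. The sketch below is our own illustration, not an official QUADAS tool, and the item wording is abbreviated and paraphrased:

```python
# Sketch of tallying QUADAS-style answers for one article. Each of the 14
# items is answered "yes", "no" or "unclear"; every "no" flags a potential
# source of methodological bias [37]. Item wording abbreviated for illustration.
answers = {
    "representative patient spectrum": "yes",
    "selection criteria described": "yes",
    "acceptable reference standard": "yes",
    "acceptable delay between tests": "no",
    "whole sample verified": "yes",
    "same reference standard regardless of index result": "yes",
    "reference standard independent of index test": "yes",
    "index test described in enough detail": "yes",
    "reference standard described in enough detail": "yes",
    "index results blinded to reference results": "unclear",
    "reference results blinded to index results": "unclear",
    "relevant clinical information available": "yes",
    "uninterpretable results reported": "no",
    "withdrawals explained": "yes",
}

potential_biases = [item for item, a in answers.items() if a == "no"]
unclear = [item for item, a in answers.items() if a == "unclear"]
print(f"{len(potential_biases)} potential biases; {len(unclear)} unclear items")
```

Recording the answers explicitly makes it easy to compare the quality of several candidate articles side by side.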

Additional issues in radiology

Because of the nature of imaging, there are five further issues that should be considered with radiologic studies: (1) The imaging technique should be described in adequate detail so that it can be reproduced in one's own radiology department. (2) The imaging test and the reference test should be performed to the same standard of excellence. (3) "Generations" of technology within the same imaging modality (e.g., 1-tesla versus 1.5-tesla versus 3-tesla MRI) should be adequately accounted for in the study design and subsequent discussion. (4) The radiation exposure involved in the imaging study should be presented. (5) The image review technique (i.e., monitor [soft-copy] versus film [hard-copy] images) should be discussed [37].

Statistical measures

For diagnostic/imaging tests, the statistical parameters of importance include prevalence of disease, sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio and negative likelihood ratio, and studies should note how these parameters affect the pre-test and post-test probabilities and conditional probabilities (Table 4) [42, 43].
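All of these parameters can be computed from the four cells of a 2×2 table of test result against disease status. The following sketch shows the arithmetic; the function name and the example counts are ours, chosen purely for illustration:

```python
# Compute the diagnostic test statistics named above from the four cells of a
# 2x2 table: tp/fp/fn/tn = true/false positives/negatives. Illustrative only.
def diagnostic_measures(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    sens = tp / (tp + fn)  # sensitivity: true positives / all with disease
    spec = tn / (tn + fp)  # specificity: true negatives / all without disease
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),        # positive predictive value
        "npv": tn / (tn + fn),        # negative predictive value
        "lr_pos": sens / (1 - spec),  # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,  # negative likelihood ratio
        "accuracy": (tp + tn) / total,
        "prevalence": (tp + fn) / total,
    }

# Hypothetical study: 90 true positives, 20 false positives,
# 10 false negatives, 80 true negatives
m = diagnostic_measures(tp=90, fp=20, fn=10, tn=80)
print(f"sensitivity={m['sensitivity']:.2f}, specificity={m['specificity']:.2f}, "
      f"LR+={m['lr_pos']:.2f}, LR-={m['lr_neg']:.3f}")
```

Note that sensitivity, specificity and the likelihood ratios do not depend on prevalence, whereas the predictive values do; this is why the predictive values reported in a study only transfer to populations with a similar prevalence.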

Step 5: evaluate the strength of the evidence from the literature

When evaluating the available evidence for given tests or interventions, the Agency for Healthcare Research and Quality (AHRQ), the United States Preventive Services Task Force (USPSTF) and the CEBM at Oxford, UK, have devised strengths of recommendations (Tables 5 and 6) [44–47]. In addition, the American College of Radiology (ACR) and the Royal College of Radiologists (RCR) have developed guidelines and recommendations for the use of diagnostic imaging in all subspecialties [48, 49]. For a given clinical situation, a subspecialty committee of the ACR rates the appropriateness of each diagnostic test on an ordinal scale from 1 to 9 (1 being least appropriate; 9 being most appropriate).

Step 6: apply the results to clinical practice

In this step, the results from the selected and appraised articles are applied to make and improve clinical decisions regarding specific diagnostic problems [50]. This is done by using the diagnostic test characteristics such as sensitivity, specificity, predictive values and likelihood ratios. These test characteristics are applied while accounting for the probability of the patient having the disease, which may range from absolute certainty that the patient has the disease to absolute certainty that the patient does not. However, there is no absolute certainty in clinical medicine, so clinicians work with probability thresholds. The probability threshold above which the clinician considers treatable disease to be present is called the action threshold, whereas the probability below which the disease is disregarded and no treatment is given is called the exclusion threshold. There is a gray area between these two points. A useful diagnostic imaging test can narrow that gray area by moving the probability assessment above the action threshold or below the exclusion threshold. It does so by modifying the pre-test probability based on the test result to generate the post-test probability. This can be achieved by using the Bayes

theorem, which is a mathematical routine that estimates the post-test probability [51]. There are two ways to visually calculate a post-test probability: either one can use the Fagan nomogram with the positive or negative likelihood ratio (Fig. 1), or one can utilize the graph of conditional probabilities using the pre-test probability and the test result (Fig. 2) [42, 52, 53]. The graph of conditional probabilities estimates the post-test probability of disease in a given patient with a positive or a negative test result utilizing the chosen pre-test probability.

Table 4 Statistical measures for diagnostic literature. Reproduced and modified with permission from [34]

Sensitivity
Meaning: Patients with true positive test results / patients with disease
Interpretation: A negative test result in a highly sensitive test (≥95%) can rule out the disease (SnNout: sensitivity-negative rules out)

Specificity
Meaning: Patients with true negative results / well people (no disease)
Interpretation: A positive test result in a highly specific test can rule in the disease (SpPin: specificity-positive rules in)

Positive predictive value (PPV)
Meaning: Patients with true positive test results / all patients with a positive test result (true + false positives)
Interpretation: If a test has a high positive predictive value (>95%), one may confidently commence treatment

Negative predictive value (NPV)
Meaning: Patients with true negative results / all patients with a negative test result (true + false negatives)
Interpretation: If a test has a high negative predictive value (>95%), one may be able to safely withhold treatment

Likelihood ratio (LR)
Meaning: Likelihood of a given test result in patients with disease / likelihood of the same result in patients without disease
Interpretation: LR = 0 (negative LR): excludes the disease; LR = infinity (positive LR): confirms the disease; LR = 1: the test result is found equally in the two groups

Accuracy
Meaning: Patients with true results (true positives + true negatives) / whole population
Interpretation: Accuracy is the proportion (or percentage) of patients correctly diagnosed with or without the disease

Prevalence of disease
Meaning: Patients with disease (true positives + false negatives) / whole population
Interpretation: The disease prevalence in the literature should match the disease prevalence of the population being studied
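The calculation the Fagan nomogram performs graphically is Bayes' theorem applied to odds: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. A minimal sketch follows; the pre-test probability, likelihood ratio and threshold values are invented for illustration:

```python
# The arithmetic behind the Fagan nomogram [51]: pre-test probability ->
# pre-test odds, multiplied by the likelihood ratio, -> post-test probability.
def post_test_probability(pre_test_prob, likelihood_ratio):
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Example: pre-test probability 30%, positive test result with LR+ = 4.5
p = post_test_probability(0.30, 4.5)
print(f"post-test probability = {p:.2f}")

# Illustrative action/exclusion thresholds (values invented for this example):
action_threshold, exclusion_threshold = 0.80, 0.05
if p >= action_threshold:
    print("above the action threshold: treatable disease considered present")
elif p <= exclusion_threshold:
    print("below the exclusion threshold: disease disregarded")
else:
    print("gray area: further testing may be useful")
```

In this example the positive result raises the probability of disease from 30% to roughly two-thirds, which still falls in the gray area between the two thresholds; a second, more specific test would be needed to cross the action threshold.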

Table 5 The U.S. Preventive Services Task Force (USPSTF) grade definitions and suggestions for practice. The USPSTF assigns one of five letter grades (A, B, C, D or I). Adapted from the U.S. Preventive Services Task Force Web site [44]

Grade A
Definition: The USPSTF recommends the service. There is high certainty that the net benefit is substantial.
Suggestion for practice: Offer or provide this service.

Grade B
Definition: The USPSTF recommends the service. There is high certainty that the net benefit is moderate or there is moderate certainty that the net benefit is moderate to substantial.
Suggestion for practice: Offer or provide this service.

Grade C
Definition: The USPSTF recommends selectively offering or providing this service to individual patients based on professional judgment and patient preferences. There is at least moderate certainty that the net benefit is small.
Suggestion for practice: Offer or provide this service for selected patients depending on individual circumstances.

Grade D
Definition: The USPSTF recommends against the service. There is moderate or high certainty that the service has no net benefit or that the harms outweigh the benefits.
Suggestion for practice: Discourage the use of this service.

Grade I
Definition: The USPSTF concludes that the current evidence is insufficient to assess the balance of benefits and harms of the service. Evidence is lacking, of poor quality, or conflicting, and the balance of benefits and harms cannot be determined.
Suggestion for practice: Read the clinical considerations section of the USPSTF Recommendation Statement. If the service is offered, patients should understand the uncertainty about the balance of benefits and harms.

The USPSTF grades the quality of the overall evidence for a service on a three-point scale (good, fair, poor). Good: Evidence includes consistent results from well-designed, well-conducted studies in representative populations that directly assess effects on health outcomes. Fair: Evidence is sufficient to determine effects on health outcomes, but the strength of the evidence is limited by the number, quality, or consistency of the individual studies, generalizability to routine practice, or indirect nature of the evidence on health outcomes. Poor: Evidence is insufficient to assess the effects on health outcomes because of limited number or power of studies, important flaws in their design or conduct, gaps in the chain of evidence, or lack of information on important health outcomes.

Table 6 Grade recommendations of the Centre for Evidence-Based Medicine (CEBM), Oxford, based on the CEBM levels of evidence (Table 3). Adapted from the CEBM Oxford Web site [36]

Grade A: Consistent level 1 studies
Grade B: Consistent level 2 or 3 studies or extrapolations‡ from level 1 studies
Grade C: Level 4 studies or extrapolations‡ from level 2 or 3 studies
Grade D: Level 5 evidence or troublingly inconsistent or inconclusive studies of any level

‡ Extrapolations are where data are used in a situation that has potentially clinically important differences from the original study situation

Step 7: evaluate performance

In this final step of the critical appraisal process, conclusions are drawn and recommendations are made. A decision should be made regarding the study design and

Fig. 1 Likelihood ratio (Fagan) nomogram. Post-test probability is derived by drawing a straight line from the pre-test probability vertical axis through the appropriate likelihood ratio and continuing the straight line to the vertical post-test probability axis. Where this line intersects the vertical post-test probability axis is the post-test probability [53]

Fig. 2 Use of the graph of conditional probabilities to achieve clinical resolution. The blue line indicates a positive test result; the pink line indicates a negative result. Post-test probability for a positive result is derived by drawing a vertical line up to the blue curved line and then across to the y-axis. Post-test probability for a negative result is derived by drawing a vertical line up to the pink curved line and then across to the y-axis. a A weak diagnostic test (sensitivity 60%, specificity 60%). b A moderate diagnostic test (sensitivity 75%, specificity 75%). c A strong diagnostic test (sensitivity 90%, specificity 90%)
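Each curve in the figure is simply Bayes' theorem applied with the test's positive or negative likelihood ratio. A minimal Python sketch (the function names are ours, assumed for illustration) reproduces one point on each curve for the three tests shown:

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a diagnostic test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def conditional_probabilities(pre_test_prob, sensitivity, specificity):
    """Post-test probability of disease after a positive and after a
    negative result: one point on each curve of the graph."""
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    odds = pre_test_prob / (1 - pre_test_prob)
    p_pos = odds * lr_pos / (1 + odds * lr_pos)  # blue (positive) curve
    p_neg = odds * lr_neg / (1 + odds * lr_neg)  # pink (negative) curve
    return p_pos, p_neg

# Evaluate the three tests of the figure at a 50% pre-test probability
for label, sens, spec in [("weak", 0.60, 0.60),
                          ("moderate", 0.75, 0.75),
                          ("strong", 0.90, 0.90)]:
    p_pos, p_neg = conditional_probabilities(0.50, sens, spec)
    print(f"{label}: positive -> {p_pos:.2f}, negative -> {p_neg:.2f}")
```

At a 50% pre-test probability the strong test separates the two outcomes far more (0.90 after a positive result versus 0.10 after a negative one) than the weak test does (0.60 versus 0.40), which is what the steeper curves in panel c convey.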

Table 7 A list of helpful Web sites. Reproduced with permission from [34]

Good secondary sources

Level 3: Evidence-based reviews
Search engines: The Cochrane Library, DynaMed, SumSearch
Databases: TRIP database (Turning Research into Practice)
Guidelines: U.S. National Guidelines Clearinghouse, U.S. National Library for Health, NICE (National Institute of Clinical Excellence), SIGN (Scottish Intercollegiate Guidelines Network)
Other: Canadian Medical Association, New Zealand Guidelines Group

Level 2: Synopses
Structured abstracts: EBM Online, ACP Journal Club

Level 1: Information systems
Evidence-based summaries: Bandolier, Clinical Evidence, UpToDate
Other systematic reviews: Cochrane Library
To search several of the databases simultaneously: TRIP database (Turning Research into Practice) at www.tripdatabase.com

Good primary sources

Free searching: PubMed, PubMed Clinical Queries, Google Scholar
Subscription-based searching: Ovid, Ovid tutorial, Knowledge Finder, RSNA Index to the Imaging Literature

quality (good versus bad), the evidence (strong versus weak), and whether the study findings are consistent. Even when the evidence is strong, the extent to which the studied patients, diagnostic tests and outcomes translate to a specific medical practice and a specific patient should also be determined. This section may also evaluate how the information derived from the evidence base is applied to clinical practice. It is, however, important to acknowledge ongoing developments in imaging technology and improvements in diagnostic accuracy. The limitations of the existing literature should be recognized and secondary studies recommended if necessary.

Further help

A helpful list of Web sites is given in Table 7.

Conclusion

The critically appraised topic is a practical, patient- or problem-centered, evidence-based tool for reviewing, learning and applying critical appraisal skills in clinical practice. A CAT is composed of several steps, starting with a focused clinical question, which drives a focused literature search for studies to address the question. From the resultant literature search, studies are assigned a level of evidence and the most valid and most relevant articles are selected. These articles are then appraised. The main study results are then summarized and translated into clinically useful measures of accuracy, efficacy or risk. A CAT requires relatively little time to perform and thus can be performed by busy radiologists in clinical practice. The critically appraised topic is an excellent research project for medical students during their radiology attachment. Critically appraised topics are also ideal research projects for radiology residents and can form the center of evidence-based imaging lectures or journal clubs. The critically appraised topic could also guide a practice quality improvement (PQI) project.

Conflicts of interest None

References

1. Dawes M (2005) Critically appraised topics and evidence-based medicine journals. Singap Med J 46:442–448, quiz 449
2. Fetters L, Figueiredo EM, Keane-Miller D et al (2004) Critically appraised topics. Pediatr Phys Ther 16:19–21
3. Wendt O (2006) Developing critically appraised topics (CATs). Presented at the American Speech-Language Hearing Association, division 12: augmentative and alternative communication (DAAC), 7th annual conference, San Antonio. http://www.edst.purdue.edu/aac/Developing%20Critically%20Appraised%20Topics.pdf. Accessed 29 Aug 2014
4. Razavi SA, Sadigh G, Kelly AM et al (2012) Comparative effectiveness of imaging modalities for the diagnosis of upper and lower urinary tract malignancy: a critically appraised topic. Acad Radiol 19:1134–1140
5. Sadigh G, Applegate KE, Baumgarten DA (2014) Comparative accuracy of intravenous contrast-enhanced CT versus noncontrast CT plus intravenous contrast-enhanced CT in the detection and characterization of patients with hypervascular liver metastases: a critically appraised topic. Acad Radiol 21:113–125
6. Glasziou P, Del Mar C, Salisbury J (2007) EBP Step 1: formulate an answerable question. In: Evidence-based practice workbook, 2nd edn. Blackwell Publishing, Oxford, pp 21–34
7. Staunton M (2007) Evidence-based radiology: steps 1 and 2—asking answerable questions and searching for evidence. Radiology 242:23–31
8. Feussner JR, Matchar DB (1988) When and how to study the carotid arteries. Ann Intern Med 109:805–818
9. Haynes RB (2001) Of studies, summaries, synopses, and systems: the '4S' evolution of services for finding current best evidence. Evid Based Ment Health 4:37–39
10. Haynes RB (2000) Wolters Kluwer Ovid SP. Medline. http://gateway.ovid.com/. Accessed 29 June 2009
11. Haynes RB (2014) Google and Google Scholar Beta search engines. http://www.google.com/ and http://scholar.google.com/. Accessed 29 June 2009
12. Haynes RB (2014) NCBI PubMed Web site. U.S. National Library of Medicine. http://www.ncbi.nlm.nih.gov/PubMed. Accessed 3 May 2010
13. Haynes RB (2014) ISI Web of Knowledge Web site. Thomson Reuters. www.isiknowledge.com. Accessed 3 May 2010
14. Haynes RB (2014) MD Consult Web site. Elsevier. www.mdconsult.com. Accessed 3 May 2010
15. Haynes RB (2014) EMBASE. EMBASE biomedical answers Web site. Elsevier. http://www.embase.com/home. Accessed 3 May 2010
16. Haynes RB (2014) ARRS GoldMiner. Biomedical images database. goldminer.arrs.org/about.php. Accessed 17 July 2014
17. Haynes RB (2014) Yottalook Web site. Yottalook medical image search engine. http://www.yottalook.com/index_web.php. Accessed 29 Sept 2014
18. Haynes RB (2014) The Cochrane Collaboration Web site. The Cochrane Library. http://www.cochrane.org. Accessed 3 May 2010
19. Haynes RB (2014) Cochrane central register of controlled clinical trials. The Cochrane Library. www.mrw.interscience.wiley.com/cochrane/cochrane_clcentral_articles_fs.html. Accessed 30 June 2014
20. Haynes RB (2014) Database of abstracts of reviews of effects. The Cochrane Library. www.mrw.interscience.wiley.com/cochrane/cochrane_cldare_articles_fs.html. Accessed 30 June 2014
21. Haynes RB (2014) NICE National Institute for Health and Care Excellence Web site. http://www.nice.org.uk/. Accessed 30 June 2009
22. Haynes RB (2014) Scottish Intercollegiate Guidelines Network (SIGN). http://www.sign.ac.uk/. Accessed 30 June 2009
23. Haynes RB (2014) National Guidelines Clearinghouse (NGC). U.S. Department of Health & Human Services. http://www.guideline.gov/. Accessed 17 Dec 2010
24. Haynes RB (2011) National Library for Health (NLH) Web site. The National Archives. www.connectingforhealth.nhs.uk/resources/systserv/national. Accessed 29 June 2014
25. Kelly AM (2009) Evidence-based radiology: step 2—searching the literature (search). Semin Roentgenol 44:147–152
26. Kelly AM (1994) Bandolier electronic journal. http://www.medicine.ox.ac.uk/bandolier/aboutus.html. Accessed 1 July 2009
27. Kelly AM (2014) UpToDate Online. Wolters Kluwer Health. http://www.uptodate.com/home/index.html. Accessed 1 July 2009
28. Kelly AM (2014) DynaMed clinical reference tool. EBSCO. http://www.ebscohost.com/dynamed/. Accessed 1 July 2009
29. Kelly AM (2014) Evidence-Based Medicine. BMJ Publishing Group. http://ebm.bmj.com/. Accessed 1 July 2009
30. Kelly AM (2013) ACP Journal Club Web site. American College of Physicians. http://www.acpjc.org. Accessed 3 May 2010
31. Kelly AM (2014) ClinicalEvidence Web site. BMJ Publishing Group. http://clinicalevidence.bmj.com/ceweb/index.jsp. Accessed 3 May 2010
32. Kelly AM (1997) TRIP database. http://www.tripdatabase.com. Accessed 3 May 2010
33. Kelly AM (2014) ACP Smart Medicine. American College of Physicians. pier.acponline.org/index.html. Accessed 29 June 2014
34. Sadigh G, Parker R, Kelly AM et al (2012) How to write a critically appraised topic (CAT). Acad Radiol 19:872–888
35. Kelly AM, Cronin P (2011) How to perform a critically appraised topic: part 1, ask, search, and apply. AJR Am J Roentgenol 197:1039–1047
36. Kelly AM, Cronin P (2014) Levels of evidence (March 2009). Oxford Centre for Evidence-Based Medicine. http://www.cebm.net/index.aspx?o=1025. Accessed 3 May 2010
37. Dodd JD (2007) Evidence-based practice in radiology: steps 3 and 4—appraise and apply diagnostic radiology literature. Radiology 242:342–354
38. Shea BJ, Hamel C, Wells GA et al (2009) AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. J Clin Epidemiol 62:1013–1020
39. Shea BJ, Hamel C, Wells GA et al (2014) The AGREE Collaboration. Appraisal of guidelines for research & evaluation (AGREE) instrument. www.agreecollaboration.org. Accessed 25 Feb 2011
40. Dans AL, Dans LF (2010) Appraising a tool for guideline appraisal (the AGREE II instrument). J Clin Epidemiol 63:1281–1282
41. Whiting P, Rutjes AW, Reitsma JB et al (2003) The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 3:25
42. Maceneaney PM, Malone DE (2000) The meaning of diagnostic test results: a spreadsheet for swift data analysis. Clin Radiol 55:227–235
43. Cronin P (2009) Evidence-based radiology: step 3—critical appraisal of diagnostic literature. Semin Roentgenol 44:158–165
44. Fagan TJ (2014) Grade definitions. U.S. Preventive Services Task Force. http://www.uspreventiveservicestaskforce.org/uspstf/grades.htm. Accessed 31 July 2014
45. Kelly AM, Cronin P (2011) How to perform a critically appraised topic: part 2, appraise, evaluate, generate, and recommend. AJR Am J Roentgenol 197:1048–1055
46. Kelly AM, Cronin P (2014) Agency for Healthcare Research and Quality clinical practice guidelines. U.S. Department of Health & Human Services. www.ahrq.gov/clinic/cpgsix.htm. Accessed 29 Sept 2014
47. Kelly AM, Cronin P (2014) Levels of evidence (March 2009). Oxford Centre for Evidence-Based Medicine. http://www.cebm.net/index.aspx?o=1025. Accessed 29 Aug 2014
48. Kelly AM, Cronin P (2014) ACR Appropriateness Criteria. American College of Radiology. www.acr.org/Quality-Safety/Appropriateness-Criteria. Accessed 17 July 2014
49. Kelly AM, Cronin P (2014) The Royal College of Radiologists Web site. www.rcr.ac.uk. Accessed 17 July 2014
50. Cronin P (2009) Evidence-based radiology: step 4—apply. Semin Roentgenol 44:180–181
51. Guyatt G, Jaeschke R, Heddle N et al (1995) Basic statistics for clinicians: 2. Interpreting study results: confidence intervals. CMAJ 152:169–173
52. Akobeng AK (2007) Understanding diagnostic tests 2: likelihood ratios, pre- and post-test probabilities and their use in clinical practice. Acta Paediatr 96:487–491
53. Fagan TJ (1975) Letter: nomogram for Bayes theorem. N Engl J Med 293:257