
0013-7227/01/$03.00/0 Printed in U.S.A.

The Journal of Clinical Endocrinology & Metabolism 86(8):3779 –3786 Copyright © 2001 by The Endocrine Society

Diagnostic Accuracy of 18F-Fluorodeoxyglucose Positron Emission Tomography in the Follow-Up of Papillary or Follicular Thyroid Cancer

LOTTY HOOFT, OTTO S. HOEKSTRA, WALTER DEVILLÉ, PAUL LIPS, GERRIT J. J. TEULE, MAARTEN BOERS, AND MAURITS W. VAN TULDER

Vrije Universiteit Medical Centre, Departments of Nuclear Medicine (L.H., O.S.H., G.J.J.T.), Clinical Epidemiology and Biostatistics (L.H., O.S.H., M.B., M.W.v.T.), Endocrinology (P.L.), and Institute for Research in Extramural Medicine (W.D.), 1007 MB Amsterdam, The Netherlands

Positron emission tomography with 18F-fluorodeoxyglucose is a relatively new nuclear imaging technique in oncology. We conducted a systematic review to determine the diagnostic accuracy of 18F-fluorodeoxyglucose positron emission tomography in patients suspected of recurrent papillary or follicular thyroid carcinoma. Two reviewers independently selected, extracted, and assessed data from relevant literature found in computerized databases and by reference tracking. Prospective and retrospective studies with 10 human subjects, or more, that evaluated the accuracy of ring positron emission tomography, using 18F-fluorodeoxyglucose in follicular and papillary thyroid cancer, were included. Studies on 18F-fluorodeoxyglucose imaging using γ cameras, reviews, case reports, editorials, letters, and comments were excluded. The methodological quality was assessed by applying the criteria for diagnostic tests recommended by the Cochrane Methods Group on Screening and Diagnostic Tests. A rating system was used for qualitative analysis consisting of four levels of evidence (1 = highest level; 4 = lowest level). Fourteen studies met the inclusion criteria. All studies claimed a positive role for positron emission tomography but, at evidence levels 3 or 4, precluding quantitative analysis. Methodological problems included poor validity of reference tests and a lack of blinding of test performance and interpretation. The reviewed material was heterogeneous with respect to patient variation and validation methodology. The most consistent data were found on the ability of 18F-fluorodeoxyglucose positron emission tomography to provide an anatomical substrate in patients with elevated serum Tg and negative iodine-131 scans. In conclusion, the results seem to support the potential of 18F-fluorodeoxyglucose positron emission tomography to identify and localize foci of recurrent cancer in the latter patient subset. However, implementation of positron emission tomography in a routine diagnostic algorithm requires additional evidence. (J Clin Endocrinol Metab 86: 3779–3786, 2001)

SINCE MOST PATIENTS with papillary or follicular thyroid cancer present at an early stage, surgical therapy is highly effective, and the 5-yr survival rate exceeds 85% (1). However, tumor recurrence can be associated with considerable morbidity (and even mortality). The aim of follow-up after primary surgery is the timely detection of local recurrence, lymphatic or distant relapse, assuming that early discovery of recurrences will improve the outcome after treatment (2). Besides physical examination, there is consensus about the use of serum Tg. Patients with detectable Tg levels need further tests to identify the anatomical substrate of recurrent disease. Usually, iodine-131 whole-body scintigraphy (131I WBS) will be the initial diagnostic procedure. However, if the findings are equivocal, many different diagnostic tests are available, including ultrasound (US), computerized tomography (CT), magnetic resonance imaging (MRI), and scintigraphy using thallium-201 (201Tl), technetium-99m (99mTc)-MIBI or -tetrofosmin, and indium-111 (111In)-octreotide. Despite the availability of all these radiological and nuclear medicine procedures, the clinical work-up of patients can be difficult because of negative, equivocal, and/or conflicting test results.

Positron emission tomography (PET), using 18F-fluorodeoxyglucose (FDG), is a relatively new and promising imaging modality to screen almost the entire body for neoplastic disease. It combines excellent scanner performance [sensitivity (Se), resolution] and a radioactive tracer with a favorable biodistribution and high affinity for cancer cells. At present, there is no consensus on the optimal place of FDG PET in treating patients with thyroid cancer.

The objective of this systematic review was to summarize the evidence on the diagnostic accuracy of dedicated (ring) FDG PET, focusing on its utility in two potential applications in the follow-up of patients with papillary and follicular thyroid carcinoma: 1) to identify anatomical substrates in patients with elevated serum markers (or other suspicion of relapse) after negative 131I WBS; and 2) to complete the work-up in patients with known neoplastic foci (e.g. before planned local therapy).

Abbreviations: A, B, and C, Criteria list A-items, B-items, and C-items, respectively; CT, computerized tomography; FDG, 18F-fluorodeoxyglucose; FU, follow-up; 131I WBS, iodine-131 whole-body scintigraphy; MIBI, 99mTc-hexakis-2-methoxyisobutylisonitrile; MRI, magnetic resonance imaging; PET, positron emission tomography; Se, sensitivity; Sp, specificity; TETRO, 99mTc-Tetrofosmin; US, ultrasound.

The Journal of Clinical Endocrinology & Metabolism, August 2001, 86(8):3779–3786 Hooft et al. • Accuracy of FDG PET in Thyroid Cancer Follow-Up

Materials and Methods

Literature search

We conducted a computer-aided search of the Medline, Embase, and Cancerlit databases up to October 2000 without any language restriction. A modified version of a recently developed search strategy identified primary studies on diagnostic tests (3). This strategy ran in Medline and Embase in conjunction with a specific search for PET, FDG, and thyroid cancer (4). To identify studies on thyroid cancer, we used the MeSH terms “thyroid gland” or “thyroid diseases” or “thyroid neoplasms” or “thyroglobulin” in Medline and “thyroid gland” or “thyroid disease” or “thyroid tumor” or “adenoma” in Embase, and the text-words “thyroid$” or “thyroglobulin$” or “papillar$” or “follicular$” in both. In addition, we screened the Cochrane library 1999, issue 2, using the search terms “thyroid cancer”, “thyroid carcinoma”, and “positron emission tomography”. We also checked references given in relevant identified publications and reviews.

Study selection

Two unblinded reviewers (O.S.H. and L.H.) independently selected the studies. Differences were resolved by consensus. They applied the inclusion and exclusion criteria to select the relevant studies from the titles, abstracts, and keywords of the references retrieved by the literature search. Studies were included if they were prospective or retrospective, with 10 human subjects or more, and evaluated the accuracy of dedicated (ring) PET using FDG in patients suspected of recurrent follicular and papillary thyroid cancer. Studies on FDG imaging using γ cameras, reviews, case reports, editorials, letters, and comments were excluded.
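The selection rules above can be read as a simple screening predicate. The following is only an illustrative sketch, not the reviewers' actual procedure (which was a manual screen of titles, abstracts, and keywords); the `Candidate` record and all of its field names and string labels are hypothetical.

```python
from dataclasses import dataclass

# Publication types excluded by the review, as listed in the text.
EXCLUDED_TYPES = {"review", "case report", "editorial", "letter", "comment"}


@dataclass
class Candidate:
    """Hypothetical record for one retrieved publication (illustrative fields)."""
    n_subjects: int
    publication_type: str   # e.g. "original study" or "review"
    scanner: str            # "ring PET" or "gamma camera"
    tracer: str             # e.g. "FDG"
    tumor_types: frozenset  # histologies studied, e.g. {"papillary"}


def is_eligible(c: Candidate) -> bool:
    """Apply the stated inclusion and exclusion criteria to one candidate."""
    if c.publication_type in EXCLUDED_TYPES:
        return False
    # Only dedicated (ring) PET with FDG was accepted; gamma-camera imaging was not.
    if c.scanner != "ring PET" or c.tracer != "FDG":
        return False
    # At least 10 human subjects were required.
    if c.n_subjects < 10:
        return False
    # The study must concern papillary and/or follicular thyroid cancer.
    return bool(c.tumor_types & {"papillary", "follicular"})
```

Study design (prospective or retrospective) is deliberately not tested, because both designs were eligible.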

Methodological quality assessment

The same two unblinded reviewers independently assessed the methodological quality of the included studies using the criteria list for diagnostic tests recommended by the Cochrane Methods Group on Screening and Diagnostic Tests (5). The list consists of criteria for internal validity of studies (A-items), described below, and criteria relevant to the applicability of the study results or external validity (B- and C-items). A complete criteria list is available, on request, by contacting the first author. Eight internal-validity criteria were considered.

Reference standard (criterion no. 1). Diagnostic accuracy is best determined by comparing test results with an appropriate so-called gold reference standard. To correct for potential systematic errors of measurements, it is essential that each patient (criterion no. 2) is submitted to the reference test, applied in a standardized manner (criterion no. 3). Ideally, histology and extended follow-up (e.g. 3 yr, given the often slow progression of thyroid cancer) would be the reference tests of choice. However, this is unrealistic in most clinical settings, especially at the level of individual lesions in patients with known neoplastic foci elsewhere. Therefore, we decided to accept the following reference tests to identify and localize recurrent cancer: 1) histology/cytology; 2) focal 131I-uptake; 3) pathognomonic bone scan or MRI for bone metastases; 4) CT/MRI for brain metastases; and 5) progression of radiologically documented lesions suspect for malignancy. With intrapulmonary lesions unconfirmed by histology or 131I, and of which no radiological follow-up was reported, congruence of PET and CT was also classified as recurrent disease. With PET-CT discrepancies, follow-up was needed. We regarded US, CT, or MRI of the neck as invalid without tissue confirmation, given that fine-needle aspiration is feasible. The minimally required duration of follow-up was 12 months.

Independence of interpretation (criterion no. 4). This criterion refers to blinding of interpretations of the index test(s) and reference test(s) to avoid review bias. It is likely that the interpretation of an index test could be influenced by knowledge of the reference test results (or vice versa). We considered this requirement fulfilled in the case of pathology examination.

Uniform application of reference test (criterion no. 5). This criterion helps to avoid work-up bias. When the results of a diagnostic test influence whether patients undergo confirmation by the reference test, the properties of a diagnostic test will be distorted.

Comparison of different tests in a valid design (criterion no. 6). When FDG PET was compared with other imaging procedures, we considered the design valid when all (index) tests were performed and interpreted, independently of each other, on each patient. It is likely that the interpretation of an index test can be influenced by knowledge of the other index test results.

Study design (criterion no. 7). Each study was characterized as prospective or retrospective.

Missing data (criterion no. 8). It is essential to include a description regarding missing data, because the patients who are unavailable may have very different outcomes from those available for assessment. This is a threat to the validity of the study results.

Not all eight items reflecting the internal validity were used to assess the methodological quality of these studies. Comparison of different tests in a valid design (criterion no. 6) is obviously only relevant in studies where FDG PET was compared with other imaging tests, such as 99mTc-MIBI. If the original paper did not provide enough data on one or more of the A-, B-, and C-items, we requested additional information from the corresponding authors.

Data extraction

Two reviewers (O.S.H. and L.H.) independently extracted the following data:

Appropriate clinical setting and patient spectrum. The value of a diagnostic test is established only in a study that closely resembles clinical practice. An appropriate spectrum should represent patients that would normally be tested for the target disorder. Therefore, the description of the study population is essential. The extracted data included age distribution, sex distribution, number of patients, tumor type, tumor stage, and (methods and) levels of serum Tg measurements.

Description of imaging procedure. This criterion required that the imaging protocol be described in sufficient detail to enable its application in one’s own practice. For FDG PET, this was fulfilled if the following items were discussed: 1) the patient preparation (fasting period); 2) timing and duration of acquisition; 3) FDG dose (megabecquerels); 4) attenuation correction; and 5) scanned trajectory.
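As a small illustration of the imaging-procedure criterion, the five required protocol items can be treated as a checklist. This is a sketch only; the item labels and function names are hypothetical shorthand for the items listed above, not terms used by the reviewers.

```python
# Hypothetical shorthand labels for the five items named in the text.
REQUIRED_PROTOCOL_ITEMS = (
    "patient_preparation",     # fasting period
    "acquisition_timing",      # timing and duration of acquisition
    "fdg_dose_mbq",            # administered FDG dose in megabecquerels
    "attenuation_correction",
    "scanned_trajectory",
)


def missing_protocol_items(discussed):
    """Return the required items that a paper did not discuss, in checklist order."""
    return [item for item in REQUIRED_PROTOCOL_ITEMS if item not in discussed]


def protocol_adequately_described(discussed):
    """The criterion is fulfilled only if every required item was discussed."""
    return not missing_protocol_items(discussed)
```

For example, a paper that reports everything except the scanned trajectory would fail the criterion, and `missing_protocol_items` would name that single item.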

Analysis

We initially tried to carry out a quantitative meta-analysis; however, most studies did not supply enough valid data to enable calculation of Se and specificity (Sp) to allow statistical pooling. Furthermore, the spectrum of patients was too heterogeneous. Consequently, we analyzed three subgroups of patients: 1) with negative 131I WBS and raised serum markers; 2) with negative 131I WBS and other clinical suspicion of relapse; and 3) with known neoplastic foci, to complete the work-up.

We extracted data on the proportion of patients with positive and negative PET scans in these groups, and we classified findings according to the defined set of criteria, i.e. true positive if confirmed by one of the valid reference tests (defined in the Methodological Quality Assessment paragraph), and false positive if histopathology refuted the PET finding. Patients with unconfirmed PET lesions (within an FU of 12 months) and raised serum markers (indicative of recurrence) were classified as unclear (i.e. the PET finding was not clinically useful). Patients with discrepancy of PET and reference test (within an FU of 12 months) and low serum markers were classified as false positive. We classified negative PET findings as true negative if confirmed by histopathology. Patients with congruent negative findings on PET and one of the valid reference tests (criterion no. 3, reference tests 2–5 above), combined with an FU of 12 months, were also classified as true negative. Patients with discrepancy of PET and one of the valid reference tests were classified as false negative. In all other cases, PET findings were classified as unclear.

In a qualitative analysis, the conclusion on the value of FDG PET in thyroid carcinoma was based on the strength of scientific evidence. Levels of evidence (6) were generated from the analytic framework adapted from the Center for Evidence-Based Medicine of the National Health Service Research and Development in Oxford (see Table 1).
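The patient-level classification rules described in this section can be written out as a decision function. The sketch below is one possible reading of those rules, not the authors' own procedure: the `Reference` categories, the parameter names, and the interpretation of "within an FU of 12 months" as "at least 12 months of follow-up were available" are all assumptions made for illustration.

```python
from enum import Enum


class Reference(Enum):
    """Hypothetical summary of what the valid reference test(s) showed."""
    CONFIRMS_DISEASE = "confirms disease"
    EXCLUDES_DISEASE = "excludes disease"
    NOT_AVAILABLE = "not available"


def classify_pet_finding(pet_positive, reference, by_histopathology,
                         followed_12_months, markers_raised):
    """Return 'TP', 'FP', 'TN', 'FN', or 'unclear' for one patient.

    by_histopathology: the reference result came from histology/cytology.
    followed_12_months: at least 12 months of follow-up (assumed reading).
    markers_raised: serum markers indicative of recurrence.
    """
    if pet_positive:
        if reference is Reference.CONFIRMS_DISEASE:
            return "TP"   # confirmed by any valid reference test
        if reference is Reference.EXCLUDES_DISEASE:
            if by_histopathology:
                return "FP"   # histopathology refuted the PET finding
            if followed_12_months and not markers_raised:
                return "FP"   # discrepancy with reference test, low markers
        # Unconfirmed lesions (e.g. with raised markers) and all other cases.
        return "unclear"
    if reference is Reference.EXCLUDES_DISEASE:
        if by_histopathology or followed_12_months:
            return "TN"   # histology, or congruent reference test plus 12-month FU
        return "unclear"
    if reference is Reference.CONFIRMS_DISEASE:
        return "FN"       # PET negative but a valid reference test shows disease
    return "unclear"
```

Keeping "unclear" as an explicit outcome mirrors the text: findings without adequate validation are not forced into the 2×2 table.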

TABLE 1. Levels of evidence for diagnostic studies

Level of evidence | Diagnosis
1 | Independent blind comparison of an appropriate spectrum of consecutive patients, all of whom have undergone both the diagnostic test and the reference standard
2 | Independent blind comparison but either in nonconsecutive patients or confined to a narrow spectrum of study individuals (or both), all of whom have undergone both the diagnostic test and the reference standard; or a diagnostic clinical practice guide not validated in a test set
3 | Independent blind comparison of an appropriate spectrum, but the reference standard was not applied to all study patients
4 | Reference standard was not applied independently or not applied blindly

Results

Literature search and study selection

The Medline search identified 86 publications and the Embase search 97. Because 65 studies were found in both databases and Embase identified one article twice, the total number of studies was 117. No additional studies were obtained through Cancerlit, the Cochrane library, or reference tracking. On the basis of title, abstract, and keywords, 97 studies were excluded. A full review of the remaining 20 studies resulted in the exclusion of another 6, because these articles were (partial) duplications (7–12). In these cases, we used the publication with the largest study population. However, the 1999 Grünwald et al. study (12) was excluded because its multicenter, retrospective design meant that many specific details were lacking; these details were available in the original articles (13–15 and personal communication) and were highly relevant for a systematic review. The final review comprised 14 studies (13–26). The study of Wang et al. (27) focused on the prognostic value of FDG PET, which was beyond the scope of this review and was therefore not included.

Methodological quality assessment

The reviewers disagreed on 60 out of 398 scores (15%; 14 studies), and on 30 out of 166 internal validity scores (18%). All disagreements were resolved through consensus. Thirty-one scores (14A, 10B, and 7C; 8%) were corrected after applying the additional information obtained through personal communication: 6 of the 14 corresponding authors responded (13, 15–18). These data did not affect final assessment of the methodological quality of the studies.

Most studies lacked information on one or more internal validity items. In particular, only small groups of patients were submitted to valid reference test(s); the results in patients with negative FDG PET were often not confirmed in a valid way; most interpretations of FDG PET and the reference test(s) were probably not performed independently of each other; the selection of patients for the assessment by the reference test was often not independent of the FDG PET results; and in most studies, no description regarding missing data was included (Table 2). Furthermore, two of three studies (16, 18, and 21) that compared FDG PET with other imaging procedures [99mTc-MIBI, 99mTc-Tetrofosmin (TETRO), and 99mTc-Furifosmin] used a valid study design, i.e. performance and interpretation were blind to the results of the other test on each patient.

Study characteristics

The 14 reviewed articles were published between 1996 and 2000; 7 had a prospective study design, 5 were retrospective, and for 2 the design was unclear (Table 3). Ten studies described the inclusion criteria used for selection of the study population, and only 3 described the exclusion criteria. Most studies did not describe whether exclusion of patients was based on indeterminate PET results. Two studies were incomplete regarding both criteria (21, 22). Only the Dietlein et al. article (14) mentioned that it included patients consecutively presenting in the clinical department. On request, 5 other authors stated that they had done so (Table 3).

The PET studies were performed under variable conditions. The dose of administered FDG (megabecquerels), time between injection and scanning, and scanning time per bed position each varied between studies. Nine studies used a whole-body FDG PET; the other three used regional FDG PET. Eight studies obtained attenuation-corrected emission scans. None of the studies described the reproducibility of FDG PET. Criteria for a positive test reading were defined in five studies. In four studies, FDG PET was not applied in a standardized manner (14, 18, 21, and 24).

Sample sizes ranged from 10–58 subjects; and the mean age, from 41–67 yr. Information on age and/or gender was missing in 4 of the 12 studies (13, 16, 18, and 24). Six studies provided baseline data on the initial tumor stage of the patients (14–16, 18, 21, and 23), revealing that the majority concerned patients at high risk of recurrence (Table 3). Seven of the 14 studies properly described previous test information known to the physician (15–19, 23, and 24). Data on comorbid conditions and the period between first diagnosis and PET were described in only 3 studies (14, 22, and 23).

The ability of FDG PET to identify and localize recurrent disease

The results of the studies, as reported, are summarized in Table 3, with Se and Sp ranging from 70–95% and 77–100%, respectively. However, these data were available from only seven studies. Indications for PET scanning and studied patient spectra were very heterogeneous (Tables 3 and 4). As stated, we assessed the utility of FDG PET at three positions in the clinical work-up.

FDG-PET in negative 131I WBS and elevated serum markers. All 14 reviewed studies provided data on FDG PET in patients with negative 131I WBS. According to their inclusion criteria, 6 specifically addressed the value of PET in patients with negative 131I WBS (15, 19, 22, 23, 25, 26). Three of these studies (15, 25, 26) focused on patients with elevated serum markers, i.e. Tg or Tg-antibodies. Of the provided Tg values, 50% exceeded 40 ng/ml, and 30% were higher than 100 ng/ml. However, many different Tg kits were used, and Tg data were provided with and/or without TSH stimulation.


TABLE 2. Scores on internal validity for 14 studies on the diagnostic accuracy of FDG-PET in papillary or follicular thyroid cancer

Study | Reference tests | Other index test(a) | Design (A7) | Level
Feine (13) | PA, 131I WBS | | A | 4
Sasaki (17) | PA, 131I WBS, chest CT | | A | 4
Dietlein (14) | PA, 131I WBS, spiral CT, FU (1 yr) | | A | 3
Grünwald (18) | PA, 131I WBS, CT, FU (1 yr) | MIBI | A(b) | 4
Jadvar (19) | PA, d.r., FU (n.a. yr) | | B | 4
Schlüter (20) | PA | | B | 4
Altenvoerde (15) | CT/MRI | | B | 4
Brandt-Mainz (16) | PA, 131I WBS, CT, FU (1 yr) | FURI | A | 4
Lind (21) | 131I WBS | TETRO | C | 4
Chung (22) | PA, CT/MRI, FU (1–3 yr) | | A(b) | 3
Wang (23) | PA, 131I WBS, FU (1 yr) | | B | 4
Conti (24) | PA, 131I WBS, CT, FU (2–3 yr) | | C | 4
Muros (25) | PA, CT, FU (3–6 yr) | | A | 4
Alnafisi (26) | PA, FU (n.a. yr) | | B | 4

[Per-test scores for items A1–A6 and A8 (reference test valid, standardized, applied to each patient; interpretation; performance; PET vs. index; missings) are not reproduced here because the column layout could not be recovered.]

A1–A8, Internal validity items; PA, pathology; d.r., documented response (progression of radiological documented lesions suspect for malignancy); n.a., not applicable.
a Other index tests than FDG-PET.
b The corrected scores using the information obtained through personal communication.

We were able to extract individual data of 156 patients from 11 studies. Hot spots suggestive of tumor localization were reported in 50–100% of cases per study. On an aggregated level, in 82% (115/140) of the patients with raised markers and negative 131I WBS, FDG PET localized hot spots suggestive of recurrent disease (Table 5). According to our criteria, adequate validation on patient level was obtained in only 50% (68/131) of the cases, 90% of which proved to be recurrent disease. As far as we could extract from the data, histopathological confirmation was obtained in 51–57 of these 68 cases (Table 5). On a patient basis, PET missed confirmed recurrences (according to our criteria) in 6 patients, with histopathological documentation in 3. A lesion-based (staging) accuracy could not be deduced from these studies because of methodological heterogeneity and limitations.

FDG-PET in negative 131I WBS without elevated serum markers. The reported yield of PET in cases with negative 131I WBS and without elevated serum markers is summarized in Table 6. Usually, the clinical indication for PET in this group was not described in detail but appeared to comprise patients with clinical suspicion of relapse, e.g. equivocal imaging tests. PET scans were negative in 34 of 50 patients (68%). Using follow-up (1 yr) and histology as reference tests, this was a false negative result in one case. False positivity, as documented

TABLE 3. Characteristics of the 14 reviewed studies

Year | Study | Prospective | No. of pts(a) | Pts with P | Pts with F | Age (yr)(b) | Sex (F)(f) | Consecutive(c) | TSH (H/L)(g) | Pre-PET data risk profile(h) | Prev(d) | Se%(e) | Sp%(e) | PPV%(e) | NPV%(e)
1996 | Feine (13) | Yes | 41* | 12 | 23 | 58 (20–90) | 59% | Yes | H+L(i) | n.a. | 63% | 95 | n.a. | n.a. | n.a.
1997 | Sasaki (17) | Yes | 17 | 14 | 3 | 56 (11–78) | 94% | Yes | 12/5 | Proven rec. | 100% | n.a. | n.a. | n.a. | n.a.
1997 | Dietlein (14) | Yes | 58 | 38 | 20 | 45 (19–72) | 74% | Yes | 58/0 | T4(43%) | 48% | 86 | 97 | 96 | 88
1997 | Grünwald (18) | Yes | 54* | 39 | 12 | 50 (19–81) | 78% | Yes | 38/16 | T4(44%) | 39% | 71 | 88 | 79 | 83
1998 | Jadvar (19) | No | 10 | 10 | 0 | 41 (25–60) | 80% | No | n.a. | n.a. | 50% | 72 | 100 | 100 | 71
1998 | Schlüter (20) | No | 13* | 9 | 3 | 59 (26–73) | 77% | No | 13/0 | Only PET+ | 81% | n.a. | n.a. | n.a. | n.a.
1998 | Altenvoerde (15) | No | 12 | 8 | 4 | 48 (29–67) | 58% | No | 11/1 | High risk (M1) | ≥25% | n.a. | n.a. | n.a. | n.a.
1998 | Brandt-Mainz (16) | Yes | 20* | 15 | 1 | 56 (20–77) | 55% | Yes | n.a. | T4(50%); N1(50%); M1(15%) | 90% | 72 | 100 | 100 | 71
1999 | Lind (21) | n.a. | 35 | 15 | 20 | 67 (35–84) | 71% | n.a. | n.a. | T4(43%); M1(?%) | ≥57% | n.a. | n.a. | n.a. | n.a.
1999 | Chung (22) | Yes | 54 | 54 | 0 | 48 (24–72) | 78% | Yes | 0/54 | n.a., only P | 61% | 94 | 95 | 97 | 91
1999 | Wang (23) | No | 37* | 30 | 5 | 54 (16–81) | 62% | n.a. | 21/16 | High risk | 54% | 70 | 77 | 78 | 68
1999 | Conti (24) | n.a. | 30* | 24 | 0 | n.a. | n.a. | n.a. | n.a. | n.a. | ≥50% | n.a. | n.a. | n.a. | n.a.
2000 | Muros (25) | Yes | 10 | 7 | 3 | 37 (19–72) | 70% | n.a. | 10/0 | n.a. | ≥50% | n.a. | n.a. | n.a. | n.a.
2000 | Alnafisi (26) | No | 11 | 11 | 0 | 41 (19–66) | 82% | n.a. | 1/10 | n.a. | ≥55% | n.a. | n.a. | n.a. | n.a.

The corrected scores using the information obtained through personal communication are in italics; P, Papillary thyroid tumors; F, follicular thyroid tumors; T4, tumor invasion of surrounding structures; N1, ipsilateral lymph node metastases without invasion of surrounding structures; M1, distant metastasis; Prev, prevalence; Se, sensitivity; Sp, specificity; PPV, positive predictive value; NPV, negative predictive value.
a The six marked studies (*) also include patients with other types of thyroid cancer (i.e., medullary or Hürthle). When possible, these patients were excluded from accuracy analysis.
b Mean (range).
c Consecutive presentation in the clinical department. However, given the overrepresentation of T4 (vs. normal distribution) and the prevalence, the population under study was most likely high risk.
d Percentage of patients with (recurrent) disease.
e Outcome measures. If raw data to derive outcome measures were not given in the original papers, sensitivity and specificity values will be used as presented in the publications.
f Percentage of female patients.
g Number of patients with high or low (H/L) thyroid stimulating hormone (TSH) levels at PET.
h Data of primary surgery/pathology other than histological type/initial work-up.
i The number of patients with high or low TSH levels, respectively, is unclear.

TABLE 4. Studies by clinical indication, categorized according to 131I WBS, serum markers, and the presence of known recurrence prior to PET

131I WBS −, Tg/TgAb ↑: Feine 1996(a), Dietlein 1997, Grünwald 1997, Jadvar 1998, Schlüter 1998, Altenvoerde 1998, Brandt-Mainz 1998, Lind 1999, Chung 1999, Wang 1999, Conti 1999, Muros 2000, Alnafisi 2000

131I WBS −, Tg/TgAb ↓: Feine 1996, Dietlein 1997, Grünwald 1997, Jadvar 1998, Brandt-Mainz 1998

Known neoplastic foci: Sasaki 1997, Dietlein 1997, Grünwald 1997, Brandt-Mainz 1998, Chung 1999, Wang 1999, Conti 1999

TgAb, Thyroglobulin antibody.
a Italic: from these studies, no individual patient data could be extracted.

by histopathology, seemed to be more frequent than in the group with raised serum markers. However, the prevalence of disease in the latter probably was much higher. Even though lesion-based accuracy could not be deduced from these studies, the usefulness of FDG PET may vary from organ to organ. Some authors suggest that the technique would mainly be beneficial in patients with suspected regional lymph node metastases (14, 22); others suggest a higher detection limit (worse detection) of PET in the lungs (14).

FDG PET in patients with known neoplastic foci. Six of the 14 reviewed studies seemed to contain data on FDG PET in patients with otherwise established recurrent disease (Table 4). According to the inclusion criteria, only 1 study specifically addressed this patient group (17). In none of these articles was it clearly defined for which clinical problems, if any, the PET scan had been performed. Some studies may have been performed to explore the relationship between 131I and FDG uptake (the flip-flop phenomenon, 13). The data usually allowed comparison only between the results of 131I WBS and FDG PET on a patient level. Analysis on a lesion level was largely anecdotal (16). We found that it was even difficult to identify patients in whom positive 131I findings represented an established recurrence, rather than uptake in benign remnant thyroid tissue (21).

Sources of variation

131I dose at WBS. As shown in Tables 5 and 6, the conditions of 131I scanning were not consistent in different studies. Especially in patients with elevated tumor markers without known tumor substrates, this may affect the results. Low-dose 131I scanning (2–5.5 mCi in these studies) may fail to disclose 131I-accumulating tumor deposits; Pacini et al. (1987) reported that high-dose 131I may reveal metastases in 12% of patients with a negative low-dose scan (28). In the presently reviewed studies, Wang et al. (23) reported this discrepancy in 7/13 patients subjected to high-dose 131I scanning after a negative diagnostic scan. The present data do not allow quantification of this effect, however. In the patients assessed with high-dose 131I WBS and elevated serum markers, 79% (45/57) of the PET scans provided at least 1 site suspicious for tumor.

TSH levels. FDG uptake, and therefore detection, might depend on serum TSH levels. It is unclear whether discrepant results [i.e. studies showing a positive relation between TSH levels and FDG uptake (29, 30) vs. those showing a similar detection rate under TSH stimulation and TSH suppression (13, 17, 18)] relate to methodological differences or to biological variation. Of the studies in this review, none performed a head-to-head evaluation of this potential effect, and only one (23) supplied outcome measures stratified by TSH level, with no apparent differences. It was unclear how patients had been selected for either modality, however.

FDG PET compared with other imaging procedures

Only 3 studies (16, 18, and 21) compared FDG PET with other imaging procedures. Grünwald et al. (18) compared FDG PET with 99mTc-Sestamibi (MIBI) and reported superior outcomes of FDG PET for the detection of recurrent thyroid cancer. Regarding individual tumor sites, FDG PET and MIBI had congruent positive results in 65%, FDG-positive/MIBI-negative in 25%, and FDG-negative/MIBI-positive in 10%. Lind et al. (21) compared FDG PET with TETRO and concluded that FDG PET gives better image quality and demonstrates more lesions, compared with 99mTc-Tetrofosmin (135 FDG-positive vs. 61 TETRO-positive lesions). However, it was not possible to determine overall accuracy of FDG PET and TETRO in this group of patients because the FDG/TETRO-positive and 131I-negative lesions (61 FDG-positive vs. 20 TETRO-positive lesions) were not verified. Brandt-Mainz et al. (16) investigated 99mTc-Furifosmin and found an Se of 33% (Sp of 100%) on a patient-by-patient level, and of 34% (Sp of 100%) on a lesion-by-lesion level. For FDG PET, an Se of 72% (Sp of 100%) was found on a patient-by-patient basis, and of 91% (Sp of 100%) on a lesion-by-lesion basis.

Qualitative analysis

Two of the 14 studies that evaluated the diagnostic accuracy of FDG PET for the detection of recurrent disease were considered of level 3 evidence (14, 22), because both the PET scan and the reference tests were not performed in all patients (Tables 2 and 3). The other 12 studies provided level 4 evidence (13, 15–21, 23–26), which was scored when the PET scan and the reference tests were not performed and interpreted independently and blindly from each other, irrespective of the spectrum or the number of patients that received both the PET scan and reference test(s). In most studies, only the positive PET findings were confirmed. To evaluate the patients with a negative PET or a negative reference test, a follow-up period of relevant duration is required. Follow-up was performed in 7 studies, but only a small percentage of patients was followed up, except for 2 studies in which 58% (24) and 76% (22) were evaluated.
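The grading logic of Table 1, as applied in this qualitative analysis, can be sketched as a short decision function. This is an illustrative reading only: the parameter names are hypothetical, the "clinical practice guide" clause of level 2 is left out, and the handling of a blinded study with a narrow spectrum and incomplete verification (which Table 1 does not cover explicitly) is an assumption.

```python
def level_of_evidence(independent_blind, appropriate_consecutive_spectrum,
                      reference_applied_to_all):
    """Map three study properties to an evidence level (1 = highest, 4 = lowest)."""
    # Level 4: the reference standard was not applied independently or blindly,
    # irrespective of spectrum or of how many patients received both tests.
    if not independent_blind:
        return 4
    # Level 3: blind comparison, but not every patient received the reference
    # standard. Assumption: this also covers narrow-spectrum studies.
    if not reference_applied_to_all:
        return 3
    # Level 2: blind and fully verified, but nonconsecutive or narrow spectrum.
    if not appropriate_consecutive_spectrum:
        return 2
    # Level 1: blind, fully verified, appropriate spectrum of consecutive patients.
    return 1
```

Checking blinding first reflects the statement above that level 4 was scored irrespective of spectrum or verification.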


TABLE 5. Patient-based analysis of the yield of FDG PET in negative 131I WBS and elevated serum markers (indicative of recurrent disease)

| Study | N | PET+ | TP | FP | ? | Reference test (n) | PET− | TN | FN | ? | Reference test (n) | 131I WBS dose |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grünwald | 8 | 6 | 5 | 1 | 0 | PA(?), CT(?) | 2 | 2 | 0 | 0 | FU1yr(2) | High |
| Jadvar | 8 | 6 | 5 | 1 | 0 | PA(5), FU(1) | 2 | n.a. | n.a. | 2 | FU?yr(?) | Low |
| Altenvoerde | 12 | 6 | 1 | n.a. | 5 | CT(1) | 6 | n.a. | n.a. | 6 | n.a. | High |
| Brandt-Mainz | 12 | 9 | 5 | n.a. | 4 | PA(3), CT(2) | 3 | n.a. | n.a. | 3 | n.a. | High |
| Lind | 15 | 15 | n.a. | n.a. | 15 | n.a. | 0 | n.a. | n.a. | n.a. | n.a. | High |
| Chung | 33 | 27a | 11 | n.a. | 16 | PA(11); CT/MRI(?) | 6 | 5 | 1 | 0 | PA(1), FU1–3yr | Low |
| Wang | 16 | 11 | 7 | 1 | 3 | PA(3), CT(5) | 5 | 0 | 5 | 0 | PA(2), FU1yr(3) | Low |
| Conti | 15 | 15 | 4 | n.a. | 11 | PA(3), CT(1) | 0 | n.a. | n.a. | n.a. | FU2–3yr(?) | n.a. |
| Muros | 10 | 9 | 5 | n.a. | 4 | PA(4), CT(1) | 1 | n.a. | n.a. | 1 | FU3–6yr(?) | High |
| Alnafisi | 11 | 11 | 6 | n.a. | 5 | PA(6) | 0 | n.a. | n.a. | n.a. | | Low |
| Total | 140 | 115 | 49 | 3 | 63 | | 25 | 7 | 6 | 12 | | |
| Schlüter b | 16 | 16 | 13 | 3 | 0 | PA(16) | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. |
| Total | 156 | 131 | 62 | 6 | 63 | | 25 | 7 | 6 | 12 | | |

N, Number of patients; TP, true positive; FP, false positive; TN, true negative; FN, false negative.
a Tg level unclear in one patient.
b In this study, the verification of selected PET-positive patients was reported.

TABLE 6. Patient-based analysis of the yield of FDG PET in negative 131I WBS without elevated serum markers

| Study | N | PET+ | TP | FP | ? | Reference test (n) | PET− | TN | FN | ? | Reference test (n) | 131I WBS dose |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grünwald | 5 | 5 | 2 | 3 | 0 | PA(?), CT(?) | 0 | n.a. | n.a. | n.a. | n.a. | High |
| Jadvar | 2 | 0 | n.a. | n.a. | n.a. | n.a. | 2 | 0 | 0 | 2 | FU unclear | Low |
| Brandt-Mainz | 3 | 1 | n.a. | n.a. | 1 | n.a. | 2 | 2 | 0 | 0 | FU1yr(2) | High |
| Chung | 21 | 5 | 1 | n.a. | 4 | PA(1), CT/MRI(?) | 16 | 16 | 0 | 0 | FU1–3yr(16) | Low |
| Wang | 19 | 5 | 2 | 3 | 0 | PA(2), FU1yr(2), CT(1) | 14 | 12 | 1 | 1 | PA(1), FU1yr(12) | Low |
| Total | 50 | 16 | 5 | 6 | 5 | | 34 | 30 | 1 | 3 | | |

N, Number of patients; TP, true positive; FP, false positive; TN, true negative; FN, false negative.
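As an illustration of how the pooled counts in Table 5 translate into proportions, the following minimal Python sketch takes the first "Total" row (140 patients with negative 131I WBS and elevated serum markers) and computes the PET-positive rate, the proportion of verified PET-positive patients who were true positive, and a lower and upper bound on the confirmed yield, depending on whether the unverified ("?") PET-positive patients are counted as all negative or all positive. The class and function names are hypothetical, and the bounding approach is an assumption made here for illustration; it is not the analysis performed in the review.

```python
from dataclasses import dataclass


@dataclass
class PetPositiveCounts:
    """Pooled patient counts for one row of Table 5 (names are illustrative)."""
    n_patients: int      # N
    pet_positive: int    # PET+
    true_positive: int   # TP (verified)
    false_positive: int  # FP (verified)
    unverified: int      # "?" (PET+ without verification)


def yield_bounds(c: PetPositiveCounts) -> dict:
    """Return simple proportions derived from the PET-positive counts."""
    # Sanity check: every PET-positive patient is TP, FP, or unverified.
    assert c.true_positive + c.false_positive + c.unverified == c.pet_positive
    verified = c.true_positive + c.false_positive
    return {
        # Fraction of all patients with a positive PET scan.
        "pet_positive_rate": c.pet_positive / c.n_patients,
        # Fraction of verified PET-positive patients who were true positive.
        "tp_among_verified": c.true_positive / verified,
        # Confirmed yield if every unverified finding were false.
        "confirmed_yield_low": c.true_positive / c.n_patients,
        # Confirmed yield if every unverified finding were true.
        "confirmed_yield_high": (c.true_positive + c.unverified) / c.n_patients,
    }


# First "Total" row of Table 5 (Schlüter et al. excluded).
table5_total = PetPositiveCounts(
    n_patients=140, pet_positive=115,
    true_positive=49, false_positive=3, unverified=63,
)

if __name__ == "__main__":
    for name, value in yield_bounds(table5_total).items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, the confirmed yield lies anywhere between 35% and 80% of patients, which shows how strongly the result depends on the 63 unverified PET-positive patients.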

Discussion

This systematic review included 14 studies that assessed the value of FDG PET in papillary and follicular thyroid carcinoma. FDG PET is clearly able to solve clinical problems in selected patients with (suspected) recurrent thyroid cancer, and the studies showed promising results in this setting; however, these results cannot be viewed with confidence because of methodological problems. These problems are considerable, given that 50% of the criteria for internal validity were not met.

Studies investigating the diagnostic accuracy of FDG PET, or any other imaging technique, in thyroid cancer encounter specific problems. Most studies have small sample sizes, compatible with the low incidence of the disease. The validity of reference tests is a difficult issue, especially in defining the true extent of disease. From a scientific point of view, histopathological data and prolonged follow-up, preferably without interventions, would be the best combination. This is especially problematic if lesion-by-lesion analysis is necessary in patients with known recurrent disease. Further, in often slow-growing tumors like thyroid cancer, prolonged follow-up (e.g. 3 yr) would be necessary. These requirements are not compatible with the clinical context in which these studies were clearly conducted; as a result, a variety of validation tests were used. For the purpose of this review, we reasoned that histopathological proof of lesions in the neck area should be clinically feasible. This is not the case for many pulmonary lesions, because without follow-up, this usually requires surgical evaluation. We therefore accepted congruence of CT and PET as confirmed disease, but we also performed a sensitivity analysis. This revealed that the assumption did not dramatically alter the conclusion that PET seemed to have been helpful in many patients with negative 131I scans and raised serum markers.

The yield of PET, i.e. in terms of localizing recurrent disease in patients with elevated serum markers and negative findings at 131I WBS, is reported to be very high. These data apply to the included spectrum of patients, i.e. those with (very) high serum markers (indicative of recurrent disease). Because serum marker levels and tumor load are positively related, these findings cannot be extrapolated to situations with lower serum markers. In some studies, it is also unclear whether data refer to consecutive patients presenting in a typical clinical setting (rather than in the PET center) and in what time frame the various tests were carried out. As a result of these factors, Sp is probably overestimated, and the number of patients needed to have a PET scan to localize recurrence may be underestimated.

Apart from showing the presence of disease, PET may also play a role in determining the actual tumor spread, with potentially important implications for the choice of therapy. This includes patients with and without known neoplastic foci (before PET). In TNM staging, the entire body is the subject of investigation. Depending on the predilection sites of the cancer at hand, sequences of radiological methods (plain X-rays, bone scintigraphy, US, CT, MRI) focusing on body parts (like neck, lung, and bone in thyroid cancer) are necessary to define the extent of disease. Consensus on the battery of tests (including methods and duration of follow-up) to validate findings of whole-body techniques like PET is urgently needed.


On the other hand, standardizing conventional work-up would simplify the assessment of the additional value of PET. Data on the performance of FDG PET, relative to other diagnostic methods in specific body areas, are largely anecdotal, and they merely suggest that the detection of lung metastases is not perfect (14), as has been found in other malignancies.

Apart from clinical consensus, basic PET acquisition parameters also need to be clarified: should FDG PET be performed during thyroxin withdrawal or not? The present data are confusing. This lack of knowledge also impairs the understanding of the inverse relation between FDG and 131I uptake (the so-called flip-flop phenomenon) reported in 31–74% of the patients (13, 14, 18). Again, it is unclear whether these differences relate to biological or methodological issues (e.g. variability of 131I WBS dose). Finally, if the claims are true that FDG uptake has prognostic relevance (27), this still needs to be translated into the clinical approach to these patients. Mixed patterns of 131I and FDG uptake within patients, and even within metastases, have been documented (13).

To date, it is not evident what the impact of altered therapy decisions would be on patient outcomes in curative and palliative settings. Instead of focusing only on Se and Sp estimates, one might investigate whether application of FDG PET would improve patient outcomes, e.g. reduce the number of incorrect clinical decisions (compared with a defined conventional strategy). Because mortality is not the only issue in thyroid cancer, outcome measures should also be defined in terms of patient burden: inconclusive or negative diagnostic tests, invasive investigations, ineffective 131I therapies, thyroxin withdrawal, negative surgical explorations, recurrence outside the field of local therapy within a specified time lapse, and quality of life. Then, follow-up duration and methodology should be specified. In the present evaluation, we considered 12 months as a minimum period of follow-up. We reasoned that unconfirmed lesions within that time frame are unlikely to affect clinical management, and that recurrent disease within that period after presumed radical local therapy would generally be regarded as a poor result of initial staging. Such studies may have to focus on clinical situations where actual staging (establishing the extent of disease) is crucial for therapy choice.

In conclusion, the results seem to support the potential of FDG PET to provide an anatomical substrate for raised serum markers in patients with a negative 131I scan. However, the present evidence does not allow implementation of PET in a routine diagnostic algorithm.

Acknowledgments

Received September 13, 2000. Accepted March 13, 2001. Address all correspondence and requests for reprints to: Lotty Hooft, M.Sc., Department of Clinical Epidemiology and Biostatistics, Vrije Universiteit Medical Centre, P.O. Box 7057, 1007 MB Amsterdam, The Netherlands. E-mail [email protected].

References

1. Schlumberger M, Pacini F 1999 Thyroid tumors (tumeurs de la thyroïde). Paris: Éditions Nucléon; 47
2. Schlumberger M, Baudin E 1998 Serum thyroglobulin determination in the follow-up of patients with differentiated thyroid carcinoma. Eur J Endocrinol 138:249–252
3. Devillé WLJM, Bezemer PD, Bouter LM 2000 Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J Clin Epidemiol 53:65–69
4. Mijnhout GS, Hooft L, van Tulder MW, Devillé WLJM, Teule GJJ, Hoekstra OS 1999 How to perform a comprehensive search for FDG-PET literature? Eur J Nucl Med 27:91–97
5. Cochrane Methods Working Group on Systematic Review of Screening and Diagnostic Tests. Recommended methods, updated 6 June 1996. Available at http://www.cochrane.org/cochrane/sadt.htm
6. NHS Research and Development; Centre for Evidence-Based Medicine. Levels of Evidence and Grades of Recommendations, updated 17th September 1998. Available at http://cebm.jr2.ox.ac.uk/docs/levels.html
7. Feine U, Lietzenmayer R, Hanke JP, Wohrle H, Muller-Schauenburg W 1995 18FDG whole-body PET in differentiated thyroid carcinoma. Flipflop uptake patterns of 18FDG and 131I. Nuklearmedizin 34:127–134
8. Grünwald F, Schomburg A, Bender H, et al. 1996 Fluorine-18-fluorodeoxyglucose positron emission tomography in the follow-up of differentiated thyroid cancer. Eur J Nucl Med 23:312–319
9. Schlüter B, Grimm-Riepe C, Beyer W, Lubeck M, Schirren-Bumann K, Clausen M 1997 Verification of FDG-positive findings in patients with differentiated thyroid cancer by means of operation. Acta Chir Aust 29:6
10. Lerch H, Altenvoerde G, Kuwert T, Schafers M, Matheja P, Schober O 1997 FDG-PET in patients with differentiated thyroid carcinoma, elevated thyroglobulin levels and negative iodine scans. Acta Chir Austriaca 29:6–7
11. Dietlein M 1998 Follow-up of differentiated thyroid cancer: what is the value of FDG and Sestamibi in the diagnostic algorithm? Nuklearmedizin 37:12–17
12. Grünwald F, Kalicke T, Feine U, et al. 1999 Fluorine-18-fluorodeoxyglucose positron emission tomography in thyroid cancer: results of a multicentre study. Eur J Nucl Med 26:1547–1552
13. Feine U, Lietzenmayer R, Hanke JP, Held J, Wohrle H, Muller-Schauenburg W 1996 Fluorine-18-FDG and iodine-131-iodide uptake in thyroid cancer. J Nucl Med 37:1468–1472
14. Dietlein M, Scheidhauer K, Voth E, Theissen P, Schicha H 1997 Fluorine-18-fluorodeoxyglucose positron emission tomography and iodine-131 whole-body scintigraphy in the follow-up of differentiated thyroid cancer. Eur J Nucl Med 24:1342–1348
15. Altenvoerde G, Lerch H, Kuwert T, Matheja P, Schäfers M, Schober O 1998 Positron emission tomography with F-18-deoxyglucose in patients with differentiated thyroid carcinoma, elevated thyroglobulin levels, and negative iodine scans. Langenbecks Arch Surg 383:160–163
16. Brandt-Mainz K, Muller SP, Sonnenschein W, Bockisch A 1998 Technetium-99m-furifosmin in the follow-up of differentiated thyroid carcinoma. J Nucl Med 39:1536–1541
17. Sasaki M, Ichiya Y, Kuwabara Y, et al. 1997 An evaluation of FDG-PET in the detection and differentiation of thyroid tumours. Nucl Med Commun 18:957–963
18. Grünwald F, Menzel C, Bender H, et al. 1997 Comparison of 18FDG-PET with 131iodine and 99mTc-Sestamibi scintigraphy in differentiated thyroid cancer. Thyroid 7:327–335
19. Jadvar H, McDougall IR, Segall GM 1998 Evaluation of suspected recurrent papillary thyroid carcinoma with [18F]fluorodeoxyglucose positron emission tomography. Nucl Med Commun 19:547–554
20. Schlüter B, Grimm-Riepe C, Beyer W, Lubeck M, Schirren-Bumann K, Clausen M 1998 Histological verification of positive fluorine-18-fluorodeoxyglucose findings in patients with differentiated thyroid cancer. Langenbecks Arch Surg 383:187–189
21. Lind P, Gallowitsch HJ, Mikosch P, et al. 1999 Comparison of different tracers in the followup of differentiated thyroid carcinoma. Acta Med Austriaca 26:115–118
22. Chung JK, So Y, Lee JS, et al. 1999 Value of FDG PET in papillary thyroid carcinoma with negative 131I whole body scan. J Nucl Med 40:986–992
23. Wang W, Macapinlac H, Larson SM, et al. 1999 [18F]-2-fluoro-2-deoxy-d-glucose positron emission tomography localizes residual thyroid cancer in patients with negative diagnostic 131I whole body scans and elevated serum thyroglobulin levels. J Clin Endocrinol Metab 84:2291–2302
24. Conti PS, Durski JM, Bacqai F, Grafton ST, Singer PA 1999 Imaging of locally recurrent and metastatic thyroid cancer with positron emission tomography. Thyroid 9:797–804
25. Muros MA, Llamas-Elvira JM, Ramírez-Navarro A, et al. 2000 Utility of fluorine-18-fluorodeoxyglucose positron emission tomography in differentiated thyroid carcinoma with negative radioiodine scans and elevated serum thyroglobulin level. Am J Surg 179:457–461
26. Alnafisi NS, Driedger AA, Coates G, Moote DJ, Raphael SJ 2000 FDG-PET of recurrent or metastatic 131I-negative papillary thyroid carcinoma. J Nucl Med 41:1010–1015
27. Wang W, Larson SM, Fazzari M, et al. 2000 Prognostic value of [18F]fluorodeoxyglucose positron emission tomographic scanning in patients with thyroid cancer. J Clin Endocrinol Metab 85:1107–1113
28. Pacini F, Lippi F, Formica N, et al. 1987 Therapeutic doses of iodine-131 reveal undiagnosed metastases in thyroid cancer patients with detectable serum thyroglobulin levels. J Nucl Med 28:1888–1891
29. Sisson JC, Ackermann RJ, Meyer MA, Wahl RL 1993 Uptake of 18F-fluoro-2-deoxy-d-glucose by thyroid cancer: implications for diagnosis and therapy. J Clin Endocrinol Metab 77:1090–1094
30. Moog F, Rainer L, Manthey N, et al. 2000 Influence of thyroid-stimulating hormone levels on uptake of FDG in recurrent and metastatic differentiated thyroid carcinoma. J Nucl Med 41:1989–1995