Human Lung Project: Evaluating Variance of Gene Expression in the Human Lung

Michael P. Gruber, Christopher D. Coldren, Malcolm D. Woolum, Gregory P. Cosgrove, Chan Zeng, Anna E. Barón, Mark D. Moore, Carlyne D. Cool, G. Scott Worthen, Kevin K. Brown, and Mark W. Geraci

Division of Pulmonary Sciences and Critical Care Medicine and Section of Biometrics and Informatics, University of Colorado Health Sciences Center; and National Jewish Medical and Research Center, Denver, Colorado

Nondiseased tissue is an important reference for microarray studies of pulmonary disease. We obtained 23 single lungs from multiorgan donors at the time of procurement. Donors varied in age, sex, smoking history, and ethnicity. Lungs were dissected into upper and lower lobe peripheral sections for RNA extraction. Microarray analysis was performed using Affymetrix HG-U133 Plus 2.0 arrays. We observed that the relative variability of gene expression increased steeply from technical (lowest) to regional to population (highest) sources. In addition, age and sex have measurable effects on gene expression. Gene expression variability is heterogeneously distributed among biologic categories. We conclude that gene expression variability is greater between individuals than within individuals and that population variability is the most important factor in the study design of microarray experiments of the human lung. Classes of genes with high population variability are biologically important and provide a novel perspective on lung physiology and pathobiology. Our study represents the first comprehensive analysis of nondiseased lung tissue. The generation of this robust dataset has important implications for the design and implementation of future comparative expression analyses of pulmonary disease states.

Keywords: lung; microarray; genomics; variability

(Received in original form August 12, 2004 and in final form February 17, 2006)
This work was supported by NHLBI grant R01 HL 72340-01.
Correspondence and requests for reprints should be addressed to Mark W. Geraci, M.D., University of Colorado Health Sciences Center, Division of Pulmonary Sciences and Critical Care Medicine, 4200 East Ninth Ave, C-272, Denver, CO 80262.
The complete set of gene expression data has been deposited in the Gene Expression Omnibus database (www.ncbi.nlm.nih.gov/geo/), accession #GSE1643.
This article has an online supplement, which is accessible from this issue’s table of contents at www.atsjournals.org.
Am J Respir Cell Mol Biol Vol 35. pp 65–71, 2006
Originally Published in Press as DOI: 10.1165/rcmb.2004-0261OC on February 23, 2006
Internet address: www.atsjournals.org

The use of gene expression microarrays as a high-throughput means to obtain qualitative and quantitative expression profiles of thousands of gene transcripts has revolutionized the field of translational medicine. Gene expression profiling has become a powerful tool in the armamentarium of clinical lung cancer research as a means to define clinical subtypes (1–3), prognosis (4–6), molecular biomarkers (7–9), and novel therapeutic interventions (10, 11). Building on its success in lung cancer, expression profiling is now being applied widely to non–cancer-related lung diseases. Recently, gene expression profiling has provided insights into the pathogenesis of idiopathic pulmonary fibrosis (12–16), primary pulmonary hypertension (17, 18), smoking-related lung disease (19, 20), acute respiratory distress syndrome (21, 22), asthma (23, 24), and cystic fibrosis (25, 26). Once used primarily to investigate changes in gene expression within in vitro models such as cell culture or clonal cell populations, microarray technology is increasingly applied to case-control models to derive gene expression patterns as descriptors of pathobiology or clinical outcomes. However, fundamental knowledge of the extent, nature, and sources of gene expression variation within nondiseased individuals is lacking. Without well-characterized nondiseased comparison tissue, studies are vulnerable to selection bias, confounded patient groups, and Type I and Type II statistical errors. The majority of human microarray studies in the medical literature that compare diseased with nondiseased tissues neither describe nor adequately characterize the control population. Therefore, a better understanding of the normal variation between and within individuals, including covariates such as age and sex, will advance the use of microarray technology as a powerful investigational tool.

Human microarray studies in nondiseased states have been used to analyze variations in gene expression in peripheral blood (27), retina (28, 29), cornea (30), brain (31), kidney (32), and muscle (33, 34). From these studies and others (35–37), it is increasingly apparent that tissue heterogeneity, intrinsic host factors, and sample processing have direct, measurable effects on gene expression. Additionally, microarray studies of human muscle (34) and retina (29) have demonstrated variations in gene expression specifically related to age and sex. A major challenge of microarray expression profiling and bioinformatics is to maximize true discovery while limiting false discovery. Although this goal is simple in concept, the immense amount of data generated by a single microarray experiment often yields hundreds of “differentially expressed” transcripts that may reflect normal biologic variability between samples, tissue sample heterogeneity, or technical variability from tissue processing or the array platform, rather than a true pathobiologic difference between comparison groups.
In this study, we characterize the human lung transcriptome for the first time using microarray expression profiling. We analyze all major sources of variation in gene expression in postmortem human lung samples, with particular focus on technical, anatomic, and individual variability. Additionally, we provide a novel description of expression variability in the nondiseased lung. These results expand our understanding of normal human gene expression variation and pulmonary physiology and have an important impact on the future design of case-control microarray experiments involving the human lung. This robust “control tissue” database is publicly available as a resource to the research community for future comparative analyses.

MATERIALS AND METHODS

Human Subjects

Single whole-lung samples from 23 individuals were obtained from Tissue Transformation Technologies (Edison, NJ) (Table 1). All individuals suffered brain death and were evaluated for organ transplantation before research consent. Informed consent was obtained at the time of transplant evaluation. All specimens failed regional lung selection criteria for transplantation. Reasons listed for failure to transplant include age (41%), smoking history (5%), “quality” (14%), gas exchange (9%), size (9%), and inability to match (23%). For study inclusion, individuals had to have no evidence of active infection or chest radiographic abnormalities, mechanical ventilation for < 48 h, a PaO2/FiO2 ratio > 200, and no past medical history of underlying lung disease or systemic disease that involves the lungs (e.g., rheumatoid arthritis, systemic lupus erythematosus). Patients with mild asthma not requiring the regular use of inhaled β-agonists were included. Lung samples were procured within 34 h after brain death (mean, 16.2 h; range, 4.5–33.25 h). After resection, the lungs were insufflated with preservation solution and transported on ice to our laboratory. All samples were received within 28 h after procurement (mean, 16.7 h; range, 9–28 h). Upon receipt, each lung was dissected into upper and lower lobes and into central (< 5 cm from the mainstem bronchus) and peripheral (< 5 cm from the pleura) sections. The samples were flash frozen in liquid nitrogen and stored at −80°C for further analysis. The study was approved by the National Jewish Medical and Research Center Institutional Review Board (IRB protocol #NJC HS-1539).

AMERICAN JOURNAL OF RESPIRATORY CELL AND MOLECULAR BIOLOGY VOL 35

2006

TABLE 1. PATIENT DEMOGRAPHICS

| Sample No. | Age (yr) | Sex | Race | Cause of Death | Lung | P/F Ratio | Smoking History | Pack-years* |
|---|---|---|---|---|---|---|---|---|
| 4826 | 53 | F | C | CVA | R | 365 | Former smoker, 1 ppd, quit for 4 mo | 10 |
| 4851 | 50 | F | C | CVA | L | 330 | Current smoker, unknown ppd | 25 |
| 4860 | 38 | F | C | CVA | R | 422 | Current smoker, unknown ppd | 24 |
| 4878 | 59 | F | C | ICH | L | 347 | Current smoker, 1 ppd for 40 yr | 40 |
| 4863 | 46 | F | AA | CVA | R | 285 | Current smoker, unknown ppd | 15 |
| 4874 | 69 | M | AA | Anoxia | L | 406 | Former smoker, quit for 21 yr | 30 |
| 20002 | 63 | F | C | CVA | R | 345 | Former smoker, quit for 25 yr | Unknown |
| 20008 | 31 | M | H | Head trauma | R | 378 | Nonsmoker | N/A |
| 20010 | 48 | F | C | Hydrocephalus | R | 384 | Nonsmoker | N/A |
| 20014 | 64 | M | AA | ICH | L | 320 | Current smoker, 1 ppd | Unknown |
| 20015 | 24 | M | C | Head trauma | L | 444 | Nonsmoker | N/A |
| 20017 | 71 | M | C | CVA | R | 282 | Current smoker, 0.5 ppd for 51 yr | 26 |
| 20021 | 23 | M | C | Head trauma | L | 357 | Former smoker, unknown ppd | Unknown |
| 4804 | 47 | M | C | Head trauma | L | 270 | Former smoker, 2 ppd, quit for 2 yr | 30 |
| 4816 | 74 | F | C | CVA | R | 417 | Nonsmoker | N/A |
| 4818 | 64 | M | C | CVA | R | 320 | Nonsmoker | N/A |
| 4827 | 54 | F | AA | Head trauma | R | 208 | Nonsmoker | N/A |
| 4824 | 71 | F | C | CVA | L | 250 | Former smoker, quit for 7 yr | 20 |
| 4825 | 26 | F | C | Head trauma | R | 368 | Nonsmoker | N/A |
| 20000 | 58 | M | C | CVA | L | 442 | Nonsmoker | N/A |
| 20016 | 63 | F | C | CVA | R | 312 | Former smoker, 0.5 ppd, quit for 20 yr | Unknown |
| 4859 | 38 | M | C | CNS tumor | R | 395 | Current smoker, unknown ppd | 50 |
| 20024 | 21 | M | C | Head trauma | R | 409 | Current smoker, 0.75 ppd for 8 yr | 6 |

Definition of abbreviations: AA, African American; C, Caucasian; CVA, cerebrovascular accident; H, Hispanic; ICH, intracranial hemorrhage; N/A, not available; P/F ratio = PaO2/FiO2 ratio; ppd, packs per day.
* Number of packs per day multiplied by the number of years of smoking.
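As a cross-check on the demographic percentages reported in the Results, the sketch below tallies the categorical columns of Table 1 (transcribed in sample order) and applies the pack-year definition from the table footnote. The helper names `tally` and `pack_years` are hypothetical, chosen for illustration; percentages are rounded to whole numbers.

```python
from collections import Counter

# Categorical columns of Table 1, transcribed in sample order (23 donors).
SEX = "F F F F F M F M F M M M M M F M F F F M F M M".split()
RACE = "C C C C AA AA C H C AA C C C C C C AA C C C C C C".split()
LUNG = "R L R L R L R R R L L R L L R R R L R L R R R".split()
CAUSE = [
    "CVA", "CVA", "CVA", "ICH", "CVA", "Anoxia", "CVA", "Head trauma",
    "Hydrocephalus", "ICH", "Head trauma", "CVA", "Head trauma", "Head trauma",
    "CVA", "CVA", "Head trauma", "CVA", "Head trauma", "CVA", "CVA",
    "CNS tumor", "Head trauma",
]


def tally(column):
    """Return {category: (count, percent rounded to a whole number)}."""
    n = len(column)
    return {k: (c, round(100 * c / n)) for k, c in Counter(column).items()}


def pack_years(packs_per_day, years_smoked):
    """Packs per day multiplied by years of smoking (Table 1 footnote).

    Returns None when either input is unknown ("Unknown" in Table 1).
    """
    if packs_per_day is None or years_smoked is None:
        return None
    return packs_per_day * years_smoked


if __name__ == "__main__":
    for name, col in [("Sex", SEX), ("Race", RACE), ("Lung", LUNG), ("Cause", CAUSE)]:
        print(name, tally(col))
    print(pack_years(0.5, 51))  # sample 20017; Table 1 reports the rounded value
```

Run against the table, the tallies give 18 Caucasian (78%), 4 African American (17%), and 1 Hispanic (4%) donors, 11 deaths from cerebrovascular accident (48%), and 14 right (61%) versus 9 left (39%) lungs.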

Tissue Processing

Frozen lung tissue (30–50 mg) was homogenized, and total RNA was extracted using the MinElute protocol (Qiagen, Valencia, CA). Total RNA was quantified by spectrophotometry, and RNA quality was assessed with a Bioanalyzer (Agilent Technologies, Palo Alto, CA). For study inclusion, RNA samples were required to meet the Tumor Analysis Best Practices Working Group quality standards (38). Only RNA samples that showed intact 18S and 28S ribosomal RNA peaks and an optical density 260/280 ratio of 1.8–2.1 were used for further analysis.
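The RNA inclusion rule amounts to a simple gate. The sketch below assumes the rRNA integrity judgment from the Bioanalyzer trace has already been reduced to a yes/no flag; the function name `passes_rna_qc` and its parameters are hypothetical.

```python
def passes_rna_qc(a260, a280, rrna_peaks_intact, low=1.8, high=2.1):
    """Return True if an RNA sample meets the inclusion rule.

    a260, a280: absorbance readings at 260 nm and 280 nm.
    rrna_peaks_intact: whether the trace shows intact 18S and 28S rRNA
    (assumed to be a prior yes/no judgment, not computed here).
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    ratio = a260 / a280
    return rrna_peaks_intact and low <= ratio <= high


if __name__ == "__main__":
    print(passes_rna_qc(2.0, 1.0, True))   # ratio 2.0, intact rRNA
    print(passes_rna_qc(2.5, 1.0, True))   # ratio above 2.1
```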

Tissue Histology

Concomitant with RNA isolation, tissue was fixed in 4% paraformaldehyde and embedded in paraffin for hematoxylin-eosin staining. All included lung samples were reviewed by a pulmonary pathologist (C.D. Cool) and deemed histologically normal, with the exception of two samples. Sample 4827 showed evidence of acute lung injury and inflammation; it was used for the technical variability experiment but was excluded from further comparative analyses. Sample 4878 also showed evidence of acute lung injury and inflammation; it was included in the upper lobe versus lower lobe (regional lung variability) comparative analysis but was excluded from all population variability analyses. A majority of samples from individuals with smoking histories (Table 1) demonstrated pathologic evidence of tobacco exposure, including pigmented macrophages and anthracosis. Lung samples that showed evidence of pneumonia, emphysema, lung fibrosis, or other pathologic findings were excluded from population comparative analyses.

Microarray Analysis of Human Lung Gene Expression

RNA stabilization, isolation, and microarray sample labeling were performed using standard methods for reverse transcription and one round of in vitro transcription (39). HG-U133 Plus 2.0 microarrays were hybridized with 10 μg cRNA and processed according to the manufacturer’s protocol (Affymetrix, Foster City, CA). A MIAME (Minimum Information About a Microarray Experiment) checklist (40) containing extensive experimental details can be found in the online supplement. Hybridization signals and detection calls were generated in BioConductor using the GCRMA and AFFY packages (41) and have been deposited in the NCBI GEO database (accession #GSE1643). Statistical analysis of the technical replicates was performed with SAS Version 8.2 (PROC ANOVA). Intraclass correlation was used to estimate within-chip technical variability. Cluster analysis and class comparisons were conducted using BRB ArrayTools v3.3b, developed by Dr. Richard Simon and Amy Peng Lam. Cluster analysis and class comparisons for region and age were performed using the 20,669 probe sets that were reliably detected on two or more arrays and had a signal intensity of 20 or greater in at least 50% of the arrays. Paired t tests were used to compare upper versus lower lobes. For the sex class comparison, the intensity criterion was removed, resulting in a set of 35,219 probe sets. Multivariate permutation tests to determine false discovery rates were based on 1,000 random class label permutations. Gene Ontology (GO) analysis of variability was performed on the 20,669–probe set dataset using GenMAPP and MAPPFinder (42).
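The probe set filter described above can be sketched with NumPy as follows. This is a minimal illustration, not the BRB ArrayTools implementation: the function name `probe_set_filter` is hypothetical, detection calls are assumed to be available as a boolean matrix, and the intensity threshold of 20 is assumed to apply to linear-scale signal (GCRMA values are usually reported on a log2 scale, in which case the equivalent cutoff would be log2(20)).

```python
import numpy as np


def probe_set_filter(signal, detected, min_detected=2, min_signal=20.0,
                     min_fraction=0.5, apply_intensity=True):
    """Boolean mask of probe sets passing the detection/intensity criteria.

    signal:   (n_probe_sets, n_arrays) signal intensities, linear scale (assumption).
    detected: (n_probe_sets, n_arrays) boolean detection calls.
    """
    signal = np.asarray(signal, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    # "Reliably detected on two or more arrays"
    keep = detected.sum(axis=1) >= min_detected
    if apply_intensity:
        # "Signal intensity of 20 or greater in at least 50% of the arrays"
        keep &= (signal >= min_signal).mean(axis=1) >= min_fraction
    return keep


if __name__ == "__main__":
    signal = [[25, 30, 10, 40], [5, 8, 12, 30], [100, 90, 80, 70]]
    detected = [[True, True, False, True],
                [True, False, False, True],
                [False, False, False, True]]
    print(probe_set_filter(signal, detected))                         # region/age filter
    print(probe_set_filter(signal, detected, apply_intensity=False))  # sex comparison
```

Setting `apply_intensity=False` mirrors the sex comparison, where dropping the intensity criterion retains low-expressed probe sets (such as sex-linked transcripts that are absent in one group).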

Experimental Design

Technical replicate variability. Total RNA was isolated from the upper lobe and lower lobe peripheral regions of a single individual (sample number 4827). Five separate probe syntheses and array hybridizations were completed with each of the upper lobe and lower lobe RNA samples. These 10 arrays comprise the technical replicates.
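The intraclass correlation for the technical replicates can be approximated with a one-way random-effects model in which genes are the random factor and arrays are replicates, so that ICC = V[genes] / (V[genes] + V[residual]). This is a simplifying assumption: the study's SAS PROC ANOVA model also separated lobe and chip components, which this sketch pools into the residual. The input is assumed to be a log-scale expression matrix, and the function name `icc_oneway` is hypothetical.

```python
import numpy as np


def icc_oneway(x):
    """One-way random-effects ICC with genes as the random factor.

    x: (n_genes, k_arrays) log-scale expression; each row is one gene measured
    on k replicate arrays. Returns var_genes / (var_genes + var_residual).
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    gene_means = x.mean(axis=1)
    grand_mean = x.mean()
    ms_between = k * np.sum((gene_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - gene_means[:, None]) ** 2) / (n * (k - 1))
    # Method-of-moments estimate of the between-gene variance component
    var_genes = max((ms_between - ms_within) / k, 0.0)
    return var_genes / (var_genes + ms_within)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data: 5,000 genes with SD 2 across genes and replicate noise
    # SD 0.5 on 10 arrays; the expected ICC is about 4 / (4 + 0.25) = 0.94.
    true_levels = rng.normal(8.0, 2.0, size=(5000, 1))
    arrays = true_levels + rng.normal(0.0, 0.5, size=(5000, 10))
    print(round(icc_oneway(arrays), 3))
```

An ICC near 1 means nearly all of the variance is between genes rather than between replicate arrays, which is the sense in which a value of 0.946 indicates high reproducibility.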

Gruber, Coldren, Woolum, et al.: Gene Expression Variability in the Human Lung

Regional lung variability. Microarray data were generated from total RNA isolated from upper lobe and lower lobe peripheral regions from eight individuals. The resulting 16 paired upper lobe and lower lobe arrays comprise the regional comparison dataset.

Population variability. Microarray data from 21 of the 23 individual lower lobes were analyzed to investigate relationships between age, sex, and overall gene expression variability (samples 4827 and 4878 were excluded due to histologic abnormalities). The age comparison comprised individuals with age > 60 yr (n = 6; age, 68.7 ± 4.3 yr) or < 40 yr (n = 7; age, 28.7 ± 7.1 yr) matched for sex and cumulative smoking history (pack-years). The sex comparison comprised male (n = 11; age, 46.4 ± 19.7 yr) and female (n = 10; age, 53.2 ± 14.9 yr) individuals matched for age and cumulative smoking history.
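The paired upper versus lower lobe comparison, summarized as an overabundance count, can be sketched with SciPy. This is an illustration under stated assumptions, not the procedure of Kaminski and colleagues (43) or of BRB ArrayTools: the function name `overabundance_counts` is hypothetical, and "chance discovery" is approximated here as the number of genes multiplied by the P value threshold, which holds only if null P values are uniform; the published method may derive the chance curve differently.

```python
import numpy as np
from scipy import stats


def overabundance_counts(upper, lower, thresholds=(0.001, 0.01, 0.05, 0.1)):
    """Observed versus chance discovery for a paired upper/lower lobe design.

    upper, lower: (n_genes, n_subjects) log-scale expression, columns paired by subject.
    Returns a list of (threshold, genes observed with P < threshold, genes expected by chance).
    """
    upper = np.asarray(upper, dtype=float)
    lower = np.asarray(lower, dtype=float)
    pvals = stats.ttest_rel(upper, lower, axis=1).pvalue  # one paired t test per gene
    n_genes = upper.shape[0]
    return [(a, int(np.sum(pvals < a)), n_genes * a) for a in thresholds]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    subject = rng.normal(8.0, 1.0, size=(2000, 8))  # shared per-subject baseline
    upper = subject + rng.normal(0.0, 0.3, size=(2000, 8))
    lower = subject + rng.normal(0.0, 0.3, size=(2000, 8))
    for a, observed, expected in overabundance_counts(upper, lower):
        print(f"P < {a}: observed {observed}, chance {expected:.0f}")
```

In this simulated null example the observed counts track the chance counts at every threshold, which is the pattern the overabundance graph for upper versus lower lobes shows.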

RESULTS

Sample Characteristics

Table 1 displays the characteristics of the lung samples that underwent microarray analysis. In total, 36 microarrays were completed on 23 individuals (age range, 21–74 yr; mean, 50.1 yr). Eleven of the patients were men. Seventy-eight percent (18/23) of individuals were Caucasian, 17% (4/23) were African American, and 4% (1/23) were Hispanic. All individuals suffered brain death, 48% (11/23) as a result of cerebrovascular accident. Thirty percent (7/23) died from head trauma, and 22% (5/23) died from other causes, including intracranial hemorrhage, malignancy, hydrocephalus, and anoxia. All lungs were received as whole, single lung specimens. Sixty-one percent of the lung samples (14/23) were obtained from the right side, and the remaining 39% (9/23) were from the left side. Smoking histories, including current smoking status and cumulative pack-years, were available in a majority of cases (Table 1). Thirty-five percent (8/23) of patients were current smokers, 30% (7/23) were former smokers, and 35% (8/23) were nonsmokers.

Microarray Analysis

Technical replicate variability. Figure 1 shows the intraclass correlation of the technical replicates and illustrates the components of variance, each expressed as the mean ± 2 SD. This analysis demonstrates that the total variability observed within a set of RNA replicates is predominantly accounted for by expression variation among the 32,468 analyzed genes. The intraclass correlation, which expresses the contribution of gene expression variability to total experimental variability, is 0.946. There is negligible variability between a single individual’s upper and lower lobes and even less variability between RNA replicates.

Figure 1. Intraclass correlation for technical replicates. The x axis represents the sources of variation (i.e., the variance components from genes, lobes, and microarray platform [chips]) included in the analysis (32,468 genes; 5 chips each for the upper and lower lobe). The components of variance are expressed on the y axis as ± 2 SD for each source of variability, centered at the grand mean of gene expression. The intraclass correlation is the ratio of gene variability to total variability. A value of 0.946 indicates excellent reproducibility of gene expression across the microarray platform. The largest component of variability in this experiment is the variation among genes. ICC, intraclass correlation (V[genes]/V[total]).

Regional variability. Sixteen upper lobe and lower lobe peripheral microarrays from eight individuals were analyzed. Unsupervised hierarchical clustering of these data is shown in Figure 2A, which demonstrates that an individual’s upper and lower samples are more closely related to each other than to those of another individual, with the exception of samples 20008 and 20017, which did not cluster together. Additional permutations of the data confirm this discordant clustering to be a true finding and not a result of the applied clustering algorithm. Upon review of tissue handling, RNA quality, chip quality, histologic analysis, and available clinical data, we have not identified an explanation for this dissimilarity. There was no clustering related to age, sex, or smoking status. Upper and lower lobe samples were compared using a paired t test. Figure 2B shows the results of this class comparison as an overabundance graph, as described by Kaminski and colleagues (43). This graph compares the number of genes observed over a range of P values (observed discovery) with the number expected under the matching null hypothesis (chance discovery). Comparing observed with chance discovery yields a global assessment of true discovery between upper and lower lobes. In Figure 2B, the comparison of paired upper and lower lobe samples demonstrates that, for any given P value, all observed differences between the groups can be explained by chance.
In other words, there is no significant difference in gene expression between the upper and lower lobes of the same individual. We also compared left lungs (n = 6) with right lungs (n = 12) using a two-sample t test (data not shown) and found no statistically significant differences in gene expression between left and right lungs within the population.

Population Variability

Sex. Lower lobe microarrays of male (n = 11; age, 46.4 ± 19.7 yr) and female subjects (n = 10; age, 53.2 ± 14.9 yr) matched for age and cumulative smoking history were compared by a two-sample t test. Overall results are presented in Figure 3A. Over the entire gene expression profile, there are no significant differences between male and female subjects. However, at P < 0.001 (Figure 3B), approximately 33 genes differ significantly between the groups. Ninety percent of these genes are located on the X or Y chromosome and follow the expected distribution (i.e., Y-linked genes have higher expression in the male group, and X-linked genes have higher expression in the female group).

Age. Individuals with age > 60 yr (n = 6; age, 68.7 ± 4.3 yr) or < 40 yr (n = 7; age, 28.7 ± 7.1 yr), matched for sex and cumulative smoking history, were compared by a two-sample t test using lower lobe expression data. Results are presented as an overabundance graph in Figure 4A and show numerous genes differentially expressed between the age groups. Figure 4B illustrates that at P < 0.001, 40 signature genes discriminate between the two age groups; blue represents relatively low expression and red relatively high expression. Although permutation analysis does not support this gene list as statistically robust (P = 0.086), the overabundance graph (Figure 4A) supports the conclusion that differential expression exists between the lungs of older and younger individuals (12, 16, 43–45).
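Two kinds of permutation analysis bear on the age comparison: per-gene permutation P values (the univariate permutation t test used in the Discussion) and a global test asking whether the number of genes passing P < 0.001 exceeds what random relabeling produces. The sketch below is a simplified stand-in under stated assumptions, not BRB ArrayTools' multivariate permutation procedure; the function names `_abs_t` and `permutation_analysis` are hypothetical, and the "+1" correction is one common convention for keeping permutation P values above zero.

```python
import numpy as np
from scipy import stats


def _abs_t(x, labels):
    """Absolute pooled-variance two-sample t statistics, one per gene."""
    return np.abs(stats.ttest_ind(x[:, labels], x[:, ~labels], axis=1).statistic)


def permutation_analysis(x, labels, alpha=0.001, n_perm=1000, seed=0):
    """Per-gene permutation P values and a global test on the count of significant genes.

    x: (n_genes, n_samples) log-scale expression; labels: True for one group.
    Returns (per_gene_p, n_significant_observed, global_p).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    df = labels.size - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # |t| > t_crit  <=>  parametric P < alpha
    t_obs = _abs_t(x, labels)
    n_sig_obs = int(np.sum(t_obs > t_crit))
    gene_exceed = np.zeros(x.shape[0])
    global_exceed = 0
    for _ in range(n_perm):
        t_perm = _abs_t(x, rng.permutation(labels))
        gene_exceed += t_perm >= t_obs
        global_exceed += int(np.sum(t_perm > t_crit)) >= n_sig_obs
    per_gene_p = (gene_exceed + 1) / (n_perm + 1)
    global_p = (global_exceed + 1) / (n_perm + 1)
    return per_gene_p, n_sig_obs, global_p
```

A design point worth noting: with 6 versus 7 subjects there are only C(13, 6) = 1,716 distinct labelings, and the smallest attainable per-gene P value here is 1/(n_perm + 1), so at least about 1,000 permutations are needed before P < 0.001 is reachable at all.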


Figure 2. Regional variability of gene expression in human lungs. Global analysis of gene expression using the 20,669 probe sets meeting detection and intensity criteria. (A) Dendrogram illustrating the relatedness of all 16 upper/lower paired microarray experiments, based on centered correlation and displayed with average linkage. Six of the eight individual pairs cluster together, demonstrating that an individual’s upper and lower samples are more closely related to each other than to another individual’s corresponding anatomic region. No clustering based on age or sex is observed. (B) Regional variability overabundance graph: upper versus lower lobe (n = 16; paired t test). Overabundance graphs compare the number of genes observed over a range of P values (observed discovery, red line) with the number expected under the matching null hypothesis (chance discovery, green line). Comparing observed with chance discovery yields a global assessment of true discovery between comparison groups. In this comparison, any differentially expressed genes observed between upper and lower lobes can be explained by chance.

Gene Variability Analysis

To investigate the variability in gene expression, we examined the expression value for 20,669 probe sets across 21 lower lobe microarray experiments and computed the SD of each gene’s expression across individuals. To identify classes of genes with significantly higher- or lower-than-expected variability of expression, we used MAPPFinder 2.0 (42) to compare the observed distribution of variability in genes associated with each particular GO term to the overall distribution of variability. GO categories over-represented in the upper and lower deciles are listed in Tables E1 and E2 in the online supplement, and two large representative categories are contrasted with the overall variability distribution in Figure 5. Results show that gene expression variability is heterogeneously distributed among biological categories (Tables E1 and E2), with the greatest amount of expression variability related to immune function and immune-related processes. Conversely, the least amount of variability in gene expression was observed in areas of cell metabolism and cellular maintenance functions.
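The variability analysis can be sketched as two steps: compute each probe set's SD across individuals, then score how over-represented a GO category is in the top (or bottom) SD decile. The score below is modeled on the hypergeometric standardized difference reported by MAPPFinder, z = (r − nR/N) / sqrt(n(R/N)(1 − R/N)(1 − (n − 1)/(N − 1))), where N is the number of probe sets analyzed, R the number in the decile, n the number in the category, and r the category members in the decile. The function name `decile_zscore` is hypothetical, the GO annotation is assumed to be supplied as a boolean mask, and MAPPFinder's own handling of annotation hierarchies and significance is not reproduced.

```python
import numpy as np


def decile_zscore(expr, in_category, top=True):
    """Standardized over-representation of a category in the top (or bottom) SD decile.

    expr: (n_probe_sets, n_samples) log-scale expression across individuals.
    in_category: boolean mask of probe sets annotated to one GO term.
    """
    expr = np.asarray(expr, dtype=float)
    in_category = np.asarray(in_category, dtype=bool)
    sd = expr.std(axis=1, ddof=1)               # per-probe-set SD across individuals
    if top:
        in_decile = sd >= np.quantile(sd, 0.9)
    else:
        in_decile = sd <= np.quantile(sd, 0.1)
    N = sd.size                                 # all probe sets analyzed
    R = int(in_decile.sum())                    # probe sets in the decile
    n = int(in_category.sum())                  # probe sets in the category
    r = int((in_decile & in_category).sum())    # category members in the decile
    p = R / N
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    return (r - n * p) / np.sqrt(variance)


if __name__ == "__main__":
    # Toy data: 100 probe sets whose SD increases with the row index.
    expr = np.arange(100)[:, None] * np.array([-1.0, 1.0])
    high_var = np.arange(100) >= 90       # category made of the 10 most variable
    spread = np.arange(100) % 10 == 0     # category spread evenly across the range
    print(decile_zscore(expr, high_var), decile_zscore(expr, spread))
```

A category concentrated among the most variable probe sets gets a large positive z (about 9.9 in the toy example), whereas an evenly spread category scores 0; the immune function and metabolism categories in Figure 5 correspond to the high-z cases in the top and bottom deciles, respectively.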

DISCUSSION

Gene expression microarray profiling has become a common methodologic tool in molecular biology and medical research. Once limited to tightly controlled in vitro studies, this technology has rapidly advanced into more complex biologic systems. Although the use of microarray technology is widespread in the study of human disease, many questions remain unanswered regarding normal variation in human gene expression between individuals. In addition, the relative contributions of the array platform, anatomic regional sampling, intrinsic tissue sample composition, and sample processing to overall experimental variability may pose challenges to true pathobiologic discovery. Several studies have addressed tissue variability and population variability in nondiseased human tissues such as blood (27), cornea (30), retina (28, 29), and muscle (33–35). Although most studies suggest that technical variability is small (35, 46, 47), there remain conflicting reports (48). Therefore, we sought to analyze these major sources of gene expression variability while describing the natural biologic variability in the nondiseased human lung, the primary organ of interest in our laboratory.

Reproducibility is a critical factor in gene expression profiling experiments. Similar to other published reports (35, 46, 47), we demonstrate that technical variability within the Affymetrix oligonucleotide microarray platform contributes minimally to overall experimental variability. Within our technical replicate experiment, the greatest source of variability was observed within the individual gene/probe set expression, with small relative contributions from different regional locations and chip replicates (Figure 1). These results confirm the findings of others (35, 46, 47) and support the conclusion that the use of RNA replicates in the Affymetrix platform adds little to the precision of the overall analysis. Therefore, in studies projected to have a large sample size, technical replicates may not be necessary.

Figure 3. Sex-related variation in gene expression. Lower lobe microarrays of male (n = 11; age, 46.4 ± 19.7 yr) and female subjects (n = 10; age, 53.2 ± 14.9 yr) were compared using a two-sample t test. (A) Overabundance graph over the entire P value range, showing no global difference between male and female lung gene expression. (B) Detailed view of the P value range 0–0.005. Of the 33 probe sets exhibiting differential expression at P < 0.0002, 19 represent Y-linked genes, 11 represent X-linked genes, and 3 represent autosomal genes.

Figure 4. Age-related variation in gene expression. Lower lobe microarray samples from older subjects (n = 6; age, 68.7 ± 4.3 yr) and younger subjects (n = 7; age, 28.7 ± 7.1 yr) were compared using a two-sample t test. (A) Overabundance graph showing that, irrespective of the P value threshold chosen, significant numbers of differentially expressed genes can be identified between older and younger subjects. (B) Heat map illustrating the relative expression of 40 genes that meet the threshold of P < 0.001. Blue represents relatively low expression; red represents relatively high expression. Although permutation analysis does not support this gene list as statistically robust (P = 0.086), the overabundance graph supports the conclusion that differential expression exists between the lungs of older and younger individuals.

Figure 5. Distribution of gene expression variability. Histogram showing the distribution of SD values for the 410 probe sets in the GO category “metabolism” (GO:0008152) and the 330 probe sets in the GO category “immune function” (GO:0006955). Probe sets representing metabolism genes are over-represented in the lowest decile of variability (z-score, 5.2), whereas those representing immune function are over-represented in the highest decile (z-score, 8.6). The distribution of variability differs significantly between these two GO categories (χ² = 11.66, P = 0.0006).

In studies of other organs, tissue variability, in the form of sample heterogeneity and regional distance, has demonstrable effects on gene expression. Bakay and colleagues investigated intraindividual variation in gene expression in muscle biopsies and found that the greatest source of variability was between different regions of the same individual’s biopsy, highlighting the importance of cell-type composition in expression differences (35). Likewise, Whitney and colleagues demonstrated that variations in gene expression patterns in peripheral blood can be traced to differences in the relative proportions of specific blood cell types (27).

We initially analyzed intraindividual tissue replicates and lobar replicates for variability in gene expression. Tissue replicates consisted of duplicate sections of the same anatomic region that underwent separate probe synthesis and array hybridization. Lobar replicates were “central” (within 5 cm of the mainstem bronchus) or “peripheral” (within 5 cm of the pleura) sections taken from the same lobe of the same individual and subjected to an identical array protocol. Preliminary analysis of our small comparison groups (n = 3 and n = 6, respectively) suggested that expression variability was greater between individuals in the population than between intralobar regions within an individual. Given our concern regarding inadvertent sampling of central airways, and the clinical knowledge that different lung diseases preferentially involve different anatomic regions, we focused on peripheral tissue variability between distinct anatomic regions (namely, upper lobe versus lower lobe). Within each individual, we observed differentially expressed gene transcripts between the upper and lower lobes (data not shown), and such within-individual differences were observed consistently across the population. However, when we compared upper lobe with lower lobe over the entire population using a parametric paired t test, we found no significant differences in gene expression between the groups. These findings support the conclusion that genes are differentially expressed between the upper and lower lobes of an individual, but that these genes are not consistently the same across the population. Thus, to minimize the effect of regional tissue sampling variability in the human lung, a study design that accrues an appropriate sample size is important.
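One way to see why an "appropriate sample size" depends on how variable a gene is across individuals is the normal-approximation formula for a two-group comparison, n per group ≈ 2(z₁₋α/₂ + z₁₋β)² σ²/Δ². The sketch below is an illustration, not a calculation from this study: σ (a gene's between-individual SD on the log2 scale), Δ (the difference to detect), and the defaults α = 0.001 and 80% power are assumptions, with α chosen to echo the P < 0.001 threshold used for the gene lists. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.001, power=0.8):
    """Approximate subjects per group for a two-sided two-sample comparison.

    sd: between-individual SD of the gene (log2 scale, assumed equal in both groups).
    delta: difference in means to detect (log2 scale; 1.0 = two-fold).
    Normal approximation; it understates n when the result is small.
    """
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)


if __name__ == "__main__":
    for sd in (0.25, 0.5, 1.0):   # illustrative low-, mid-, and high-variability genes
        print(sd, n_per_group(sd, delta=1.0))
```

Because n scales with σ², a gene whose SD across individuals is four times larger needs roughly sixteen times as many subjects to detect the same two-fold difference (3 versus 35 per group in this example), which is why population variability dominates study design.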
Although it seems logical that case-control microarray expression experiments should match individuals for covariates such as age and sex, most reported array studies comparing diseased and nondiseased groups neither describe nor characterize the control population. Several microarray studies in nondiseased human retina (28), brain (31), and muscle (34) have identified age- and sex-associated expression patterns within their study populations. In the present study, we demonstrate global differences in gene expression, approaching statistical significance, between older and younger subjects matched for sex and cumulative smoking history (Figure 4). The substantial age-related variation in gene expression demonstrated in our overabundance analysis suggests that our t test analysis is underpowered to detect statistically significant differences between the older and younger groups. Furthermore, array analysis modeling, in particular the age-related overabundance plot, supports our finding that age-related differences in gene expression in the human lung warrant further investigation (16, 43–45). We observed no significant global difference in gene expression when we compared male with female subjects matched for age and cumulative smoking history (Figure 3). However, when we focused on highly statistically significant differences, we noted that approximately 33 of the 35,219 probe sets differed significantly between the two groups. The majority of these genes are located on the X or Y chromosome. Our results support the findings of others (28, 31, 34) that age and sex have measurable effects on gene expression profiles. Our study is the first to demonstrate that these differences are present in the nondiseased human lung.

Given the small sample sizes in our age and sex subgroup analyses, we could not assume that expression values for many genes are normally distributed across the population. To investigate the impact of this on our age group comparison, we repeated the analysis using a nonparametric approach. Figure 4B illustrates 40 probe sets with P < 0.001 by parametric analysis. Repeating the analysis with a univariate permutation t test yields 37 probe sets (data not shown). As expected from the false discovery rate analysis, the parametric test and the univariate permutation test agree on approximately half of the genes meeting the P < 0.001 threshold (see Figures E1 and E2 and Table E3 in the online supplement). Looking across the population, we observed that variability in gene expression was heterogeneously distributed among biological categories (Tables E1 and E2 and Figure 5).
Among biological categories, we found the greatest gene expression variability in immune function and immune-related processes and the least in cell metabolism and cellular maintenance functions. These observations have important implications for the design of future microarray studies, because the inherent variability within a gene category of interest strongly affects the sample size required to detect an effect of a given size within that category. Some of the variability in gene expression, particularly in inflammatory and immune processes, may reflect the fact that all of the donors had suffered brain death. As a large body of transplantation research has shown, brain death has neurohumoral, metabolic, and inflammatory effects on the host (49–52). This physiologic state, in addition to inherent differences in the nondiseased lung across the study population, likely underlies the variability in expression observed for immune function genes. A second potential limitation is that all donors required mechanical ventilation. Although none of the samples showed evidence of documented infection, severe gas exchange abnormality, or chest radiograph abnormality, mechanical ventilation itself may alter gene expression (53). Lastly, sample processing, handling, postmortem interval, and ischemia have measurable effects on gene expression. These effects have been demonstrated in several studies of nondiseased human tissues, including blood (27), muscle (33), intestinal mucosa (36), and brain (37). Li and colleagues suggest that, in postmortem brain, the vulnerability of tissue samples to these processing effects in vitro may depend on the clinical course and on the host's capacity to respond to proteolytic and metabolic stress in vivo (37).
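The link drawn above between category variability and required sample size can be made concrete with the standard normal-approximation formula for a two-group comparison, n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². This is a sketch under stated assumptions, not an estimate from this dataset: the standard deviations below are hypothetical, δ is a log2 fold change, and α = 0.001 is chosen only to echo the P < 0.001 threshold used in the age comparison.

```python
"""Per-group sample size for a two-group comparison of one gene (sketch).

Assumptions (not from the paper): log2 expression, equal group sizes, a
two-sided z-test approximation, and hypothetical standard deviations.
"""
import math
from statistics import NormalDist


def samples_per_group(sd, log2_effect, alpha=0.001, power=0.8):
    """n = 2 * (z_{1-alpha/2} + z_{power})**2 * sd**2 / effect**2, rounded up."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / log2_effect ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical SDs: a low-variability "maintenance" gene and a
    # high-variability "immune" gene, both tested for a twofold change.
    for label, sd in [("low-variability gene", 0.2), ("high-variability gene", 0.8)]:
        print(label, samples_per_group(sd, log2_effect=1.0))
```

With these assumed inputs the two genes need 2 and 22 samples per group; because n scales with σ², a fourfold larger SD requires about sixteen times as many samples before rounding. The normal approximation ignores the uncertainty of estimating σ from few samples, so a t-based calculation would give somewhat larger numbers.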
In our study, we obtained tissue samples within an average of 17 h of procurement, and all samples were processed promptly in the same manner by the same investigator (G.P.C.). Because samples were obtained from different hospitals by different surgical teams, however, we could not fully standardize procurement, handling, and processing procedures.

Our work represents the first attempt to describe the human lung transcriptome in a clinically nondiseased state. Our data provide novel insight into the natural variability of gene expression in the human lung and describe the relative contributions of its major sources. Our results show that population variability makes the greatest contribution to overall variability; thus, adequate sample size is of paramount importance in study design. Regional variability among anatomic locations in the lung may be significant when only a small group of microarrays is compared. This finding has important implications for the design of future comparisons with specific lung disease states, because it suggests that different peripheral anatomic regions can be compared between populations if the comparison groups are sufficiently large. Sample size estimates for comparative analyses cannot be generated from our database, because the appropriate sample size for any microarray expression study depends on the inherent variability of the gene or genes of interest and on the anticipated effect size. We also demonstrate that the oligonucleotide microarray platform contributes little to the observed gene expression variability; thus, RNA replicates can be omitted from the study design in favor of a larger number of samples. This comprehensive human lung database is available to the research community as a control dataset for future comparative analyses.

Conflict of Interest Statement: None of the authors has a financial relationship with a commercial entity that has an interest in the subject of this manuscript.
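The Discussion's closing recommendation, adding donors rather than RNA replicates, can be illustrated with the variance of a group mean under a balanced nested random-effects model. This is a minimal sketch, not the authors' analysis: the model (donor, then lung region, then technical replicate array) and the variance-component values are assumptions chosen only to respect the ordering reported here (technical < regional < population).

```python
"""Variance of a group mean under a balanced nested design (illustrative sketch).

Assumed model (not the authors' analysis): donors are sampled from the
population, lung regions within donors, and technical replicate arrays within
regions, with independent variance components at each level.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    population: float  # between-donor variance
    regional: float    # between-region variance within a donor
    technical: float   # array-to-array variance for the same RNA


def var_group_mean(vc, donors, regions_per_donor=1, replicates_per_region=1):
    """Variance of the mean expression of one group of donors.

    Each component is divided by the number of independent units at its level.
    """
    return (
        vc.population / donors
        + vc.regional / (donors * regions_per_donor)
        + vc.technical / (donors * regions_per_donor * replicates_per_region)
    )


if __name__ == "__main__":
    # Hypothetical values that only respect the ordering
    # technical < regional < population.
    vc = VarianceComponents(population=0.25, regional=0.05, technical=0.01)
    # Two ways to spend the same budget of 24 arrays per group.
    with_replicates = var_group_mean(vc, donors=12, replicates_per_region=2)
    more_donors = var_group_mean(vc, donors=24)
    print(f"12 donors x 2 replicate arrays: {with_replicates:.4f}")
    print(f"24 donors x 1 array:            {more_donors:.4f}")
```

Under these assumed values, 24 single-array donors give a group-mean variance of about 0.0129, roughly half the 0.0254 obtained from 12 donors with duplicate arrays: replicate arrays shrink only the technical term, which is the smallest component.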

References
1. Talbot SG, Estilo C, Maghami E, Sarkaria IS, Pham DK, Charoenrat P, Socci ND, Ngai I, Carlson D, Ghossein R, et al. Gene expression profiling allows distinction between primary and metastatic squamous cell carcinomas in the lung. Cancer Res 2005;65:3063–3071.
2. Jones MH, Virtanen C, Honjoh D, Miyoshi T, Satoh Y, Okumura S, Nakagawa K, Nomura H, Ishikawa Y. Two prognostically significant subtypes of high-grade lung neuroendocrine tumours independent of small-cell and large-cell neuroendocrine carcinomas identified by gene expression profiles. Lancet 2004;363:775–781.
3. Borczuk AC, Shah L, Pearson GD, Walter KL, Wang L, Austin JH, Friedman RA, Powell CA. Molecular signatures in biopsy specimens of lung cancer. Am J Respir Crit Care Med 2004;170:167–174.
4. Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 2002;8:816–824.
5. Moran CJ, Arenberg DA, Huang CC, Giordano TJ, Thomas DG, Misek DE, Chen G, Iannettoni MD, Orringer MB, Hanash S, et al. RANTES expression is a predictor of survival in stage I lung adenocarcinoma. Clin Cancer Res 2002;8:3803–3812.
6. Tomida S, Koshikawa K, Yatabe Y, Harano T, Ogura N, Mitsudomi T, Some M, Yanagisawa K, Takahashi T, Osada H, et al. Gene expression-based, individualized outcome prediction for surgically treated lung cancer patients. Oncogene 2004;23:5360–5370.
7. Ju Z, Kapoor M, Newton K, Cheon K, Ramaswamy A, Lotan R, Strong LC, Koo JS. Global detection of molecular changes reveals concurrent alteration of several biological pathways in nonsmall cell lung cancer cells. Mol Genet Genomics 2005;274:141–154.
8. Cho NH, Hong KP, Hong SH, Kang S, Chung KY, Cho SH. MMP expression profiling in recurred stage IB lung cancer. Oncogene 2004;23:845–851.
9. Kikuchi T, Daigo Y, Katagiri T, Tsunoda T, Okada K, Kakiuchi S, Zembutsu H, Furukawa Y, Kawamura M, Kobayashi K, et al. Expression profiles of non-small cell lung cancers on cDNA microarrays: identification of genes for prediction of lymph-node metastasis and sensitivity to anti-cancer drugs. Oncogene 2003;22:2192–2205.
10. Bergstralh DT, Taxman DJ, Chou TC, Danishefsky SJ, Ting JP. A comparison of signaling activities induced by Taxol and desoxyepothilone B. J Chemother 2004;16:563–576.
11. Kim H, Xu GL, Borczuk AC, Busch S, Filmus J, Capurro M, Brody JS, Lange J, D'Armiento JM, Rothman PB, et al. The heparan sulfate proteoglycan GPC3 is a potential lung tumor suppressor. Am J Respir Cell Mol Biol 2003;29:694–701.
12. Selman M, Pardo A, Barrera L, Estrada A, Watson SR, Wilson K, Aziz N, Kaminski N, Zlotnik A. Gene expression profiles distinguish idiopathic pulmonary fibrosis from hypersensitivity pneumonitis. Am J Respir Crit Care Med 2006;173:188–198.
13. Cosgrove GP, Schwarz MI, Geraci MW, Brown KK, Worthen GS. Overexpression of matrix metalloproteinase-7 in pulmonary fibrosis. Chest 2002;121:25S–26S.
14. Cosgrove GP, Brown KK, Schiemann WP, Serls AE, Parr JE, Geraci MW, Schwarz MI, Cool CD, Worthen GS. Pigment epithelium-derived factor in idiopathic pulmonary fibrosis: a role in aberrant angiogenesis. Am J Respir Crit Care Med 2004;170:242–251.
15. Kaminski N. Microarray analysis of idiopathic pulmonary fibrosis. Am J Respir Cell Mol Biol 2003;29:S32–S36.
16. Selman M, Pardo A, Barrera L, Estrada A, Watson SR, Wilson K, Aziz N, Kaminski N, Zlotnik A. Gene expression profiles distinguish idiopathic pulmonary fibrosis from hypersensitivity pneumonitis. Am J Respir Crit Care Med 2006;173:188–198.
17. Geraci MW, Moore M, Gesell T, Yeager ME, Alger L, Golpon H, Gao B, Loyd JE, Tuder RM, Voelkel NF. Gene expression patterns in the lungs of patients with primary pulmonary hypertension: a gene microarray analysis. Circ Res 2001;88:555–562.
18. Golpon HA, Geraci MW, Moore MD, Miller HL, Miller GJ, Tuder RM, Voelkel NF. HOX genes in human lung: altered expression in primary pulmonary hypertension and emphysema. Am J Pathol 2001;158:955–966.
19. Hackett NR, Heguy A, Harvey BG, O'Connor TP, Luettich K, Flieder DB, Kaplan R, Crystal RG. Variability of antioxidant-related gene expression in the airway epithelium of cigarette smokers. Am J Respir Cell Mol Biol 2003;29:331–343.
20. Shah V, Sridhar S, Beane J, Brody JS, Spira A. SIEGE: Smoking Induced Epithelial Gene Expression Database. Nucleic Acids Res 2005;33:D573–D579.
21. McGlothlin JR, Gao L, Lavoie T, Simon BA, Easley RB, Ma SF, Rumala BB, Garcia JG, Ye SQ. Molecular cloning and characterization of canine pre-B-cell colony-enhancing factor. Biochem Genet 2005;43:127–141.
22. Park JS, Arcaroli J, Yum HK, Yang H, Wang H, Yang KY, Choe KH, Strassheim D, Pitts TM, Tracey KJ, et al. Activation of gene expression in human neutrophils by high mobility group box 1 protein. Am J Physiol Cell Physiol 2003;284:C870–C879.
23. Laprise C, Sladek R, Ponton A, Bernier MC, Hudson TJ, Laviolette M. Functional classes of bronchial mucosa genes that are differentially expressed in asthma. BMC Genomics 2004;5:21.
24. Yuyama N, Davies DE, Akaiwa M, Matsui K, Hamasaki Y, Suminami Y, Yoshida NL, Maeda M, Pandit A, Lordan JL, et al. Analysis of novel disease-related genes in bronchial asthma. Cytokine 2002;19:287–296.
25. Srivastava M, Eidelman O, Pollard HB. Pharmacogenomics of the cystic fibrosis transmembrane conductance regulator (CFTR) and the cystic fibrosis drug CPX using genome microarray analysis. Mol Med 1999;5:753–767.
26. Wright JM, Zeitlin PL, Cebotaru L, Guggino SE, Guggino WB. Gene expression profile analysis of 4-phenylbutyrate treatment of IB3-1 bronchial epithelial cell line demonstrates a major influence on heat-shock proteins. Physiol Genomics 2004;16:204–211.
27. Whitney AR, Diehn M, Popper SJ, Alizadeh AA, Boldrick JC, Relman DA, Brown PO. Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci USA 2003;100:1896–1901.
28. Chowers I, Liu D, Farkas RH, Gunatilaka TL, Hackam AS, Bernstein SL, Campochiaro PA, Parmigiani G, Zack DJ. Gene expression variation in the adult human retina. Hum Mol Genet 2003;12:2881–2893.
29. Yoshida S, Yashar BM, Hiriyanna S, Swaroop A. Microarray analysis of gene expression in the aging human retina. Invest Ophthalmol Vis Sci 2002;43:2554–2560.
30. Gottsch JD, Seitzman GD, Margulies EH, Bowers AL, Michels AJ, Saha S, Jun AS, Stark WJ, Liu SH. Gene expression in donor corneal endothelium. Arch Ophthalmol 2003;121:252–258.
31. Lu T, Pan Y, Kao SY, Li C, Kohane I, Chan J, Yankner BA. Gene regulation and DNA damage in the ageing human brain. Nature 2004;429:883–891.
32. Higgins JP, Wang L, Kambham N, Montgomery K, Mason V, Vogelmann SU, Lemley KV, Brown PO, Brooks JD, van de Rijn M. Gene expression in the normal adult human kidney assessed by complementary DNA microarray. Mol Biol Cell 2004;15:649–656.
33. Sanoudou D, Kang PB, Haslett JN, Han M, Kunkel LM, Beggs AH. Transcriptional profile of postmortem skeletal muscle. Physiol Genomics 2004;16:222–228.
34. Roth SM, Ferrell RE, Peters DG, Metter EJ, Hurley BF, Rogers MA. Influence of age, sex, and strength training on human muscle gene expression determined by microarray. Physiol Genomics 2002;10:181–190.
35. Bakay M, Chen YW, Borup R, Zhao P, Nagaraju K, Hoffman EP. Sources of variability and effect of experimental approach on expression profiling data interpretation. BMC Bioinformatics 2002;3:4.
36. Huang J, Qi R, Quackenbush J, Dauway E, Lazaridis E, Yeatman T. Effects of ischemia on gene expression. J Surg Res 2001;99:222–227.
37. Li JZ, Vawter MP, Walsh DM, Tomita H, Evans SJ, Choudary PV, Lopez JF, Avelar A, Shokoohi V, Chung T, et al. Systematic changes in gene expression in postmortem human brains associated with tissue pH and terminal medical conditions. Hum Mol Genet 2004;13:609–616.
38. Tumor Analysis Best Practices Working Group. Expression profiling: best practices for data generation and interpretation in clinical trials. Nat Rev Genet 2004;5:229–237.
39. Golpon HA, Coldren CD, Zamora MR, Cosgrove GP, Moore MD, Tuder RM, Geraci MW, Voelkel NF. Emphysema lung tissue gene expression profiling. Am J Respir Cell Mol Biol 2004;31:595–600.
40. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 2001;29:365–371.
41. Wu Z, Irizarry RA. Preprocessing of oligonucleotide array data. Nat Biotechnol 2004;22:656–658.
42. Doniger SW, Salomonis N, Dahlquist KD, Vranizan K, Lawlor SC, Conklin BR. MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol 2003;4:R7.
43. Kaminski N, Friedman N. Practical approaches to analyzing results of microarray experiments. Am J Respir Cell Mol Biol 2002;27:125–132.
44. Barash Y, Dehan E, Krupsky M, Franklin W, Geraci M, Friedman N, Kaminski N. Comparative analysis of algorithms for signal quantitation from oligonucleotide microarrays. Bioinformatics 2004;20:839–846.
45. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 2006;7:55–65.
46. Han ES, Wu Y, McCarter R, Nelson JF, Richardson A, Hilsenbeck SG. Reproducibility, sources of variability, pooling, and sample size: important considerations for the design of high-density oligonucleotide array experiments. J Gerontol A Biol Sci Med Sci 2004;59:306–315.
47. McClintick JN, Jerome RE, Nicholson CR, Crabb DW, Edenberg HJ. Reproducibility of oligonucleotide arrays using small samples. BMC Genomics 2003;4:4.
48. Novak JP, Sladek R, Hudson TJ. Characterization of variability in large-scale gene expression data: implications for study design. Genomics 2002;79:104–113.
49. Avlonitis VS, Fisher AJ, Kirby JA, Dark JH. Pulmonary transplantation: the role of brain death in donor lung injury. Transplantation 2003;75:1928–1933.
50. Novitzky D. Detrimental effects of brain death on the potential organ donor. Transplant Proc 1997;29:3770–3772.
51. Chen EP, Bittner HB, Kendall SW, Van Trigt P. Hormonal and hemodynamic changes in a validated animal model of brain death. Crit Care Med 1996;24:1352–1359.
52. Fisher AJ, Donnelly SC, Hirani N, Burdick MD, Strieter RM, Dark JH, Corris PA. Enhanced pulmonary inflammation in organ donors following fatal non-traumatic brain injury. Lancet 1999;353:1412–1413.
53. Copland IB, Kavanagh BP, Engelberts D, McKerlie C, Belik J, Post M. Early changes in lung gene expression due to high tidal volume. Am J Respir Crit Care Med 2003;168:1051–1059.