Genetics: Published Articles Ahead of Print, published on September 15, 2004 as 10.1534/genetics.104.031666

Quantifying the relationship between gene expressions and trait values in general pedigrees

Yan Lu1, Peng–Yuan Liu1, Yong–Jun Liu1,2, Fu–Hua Xu1,2 and Hong–Wen Deng1,2,3

1. Osteoporosis Research Center
2. Department of Biomedical Sciences
Creighton University, 601 N. 30th St., Suite 6787, Omaha, NE 68131
3. Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, Hunan 410081, P. R. China

Running title: relationship between gene expressions and traits

Key words: gene expressions, trait values, relationship, variance components

Corresponding author: Hong–Wen Deng, Ph.D.
Osteoporosis Research Center
Creighton University
601 N 30th St., Suite 6787
Omaha, NE 68131
Tel: 402-280-5911
Fax: 402-280-5034
Email: [email protected]


Abstract

Treating mRNA transcript abundances as quantitative traits and examining their relationships with clinical traits has been pursued using the analytical approaches of quantitative genetics. Recently, Kraft et al. presented a family expression association test (FEXAT) for correlation between gene expressions and trait values with a family–based (sibship) design. This statistic does not account for the biological relationships among related subjects, which may inflate the type I error rate and/or decrease the power of statistical tests. In this article, we propose two new test statistics based on a variance–components approach for analyses of microarray data obtained from general pedigrees. Our methods accommodate covariance between relatives for unmeasured genetic effects and directly model covariates of clinical importance. The efficacy and validity of our methods are investigated using simulated data under different sample sizes, family sizes and family structures. The proposed LR method has a correct type I error rate with moderate to large sample sizes regardless of family structure and family size. It has higher power than the FEXAT with complex pedigrees and similar power to the FEXAT with sibships. The other proposed method, FEXAT(R), is favorable with large family sizes, regardless of sample size and family structure. Our methods, robust to population stratification, are complementary to the FEXAT in expression–trait association studies.


Introduction

In the past few years, there has been increasing interest in genetic studies of complex diseases that combine information on clinical traits, marker genotypes and comprehensive gene expressions. It has been proposed that standard methods of quantitative genetics can be applied to microarray data analyses (Wolfinger et al. 2001; Kraft et al. 2003). Treating mRNA transcript levels as quantitative traits, efforts have been made to examine their relationships with clinical traits and to map gene expression quantitative trait loci (eQTL) for these traits using the analytical approaches of quantitative genetics (Kraft et al. 2003; Schadt et al. 2003).

Gene–expression levels show continuous variation in populations. Studies in model organisms illustrated the genome-wide view of gene expression levels as heritable phenotypes (Cheung and Spielman 2002). Li et al. compared transcript levels in lymphoblastoid cell lines from twins by cDNA microarray and RT-PCR to study the heritability of gene expression in humans. The distribution of the heritability among all genes showed a moderate and homogeneous positive shift that affects the majority of genes (Li et al. 2003). Brem et al. used a genome-wide genetic linkage approach to map the determinants of variation in gene expression. Their results suggested that the expression of most genes is affected by more than one locus (Brem et al. 2002). Most complex human phenotypes are also influenced by multiple genes (Chakravarti 1999). In developing analytical tools, it is important to consider the genetic structure within and between gene expressions, and between gene expression and a complex trait.

Replication is important in microarray experiments (Kerr and Churchill 2001; Nguyen et al. 2002; Yang and Speed 2002). Technical replication, such as spotting genes multiple times per array and hybridizing multiple arrays to the same samples, can only address the measurement error of an experiment. Bakay et al. (2002) found that the major unwanted variability in expression profiling experiments came from substantial inter–individual variability. Such variability can be addressed by using multiple individuals randomly sampled from a population. However, as in genetic epidemiology studies, random sampling of individuals is liable to population stratification, which may not only inflate type I error but also mask real genetic effects (Deng 2001). Families generally contain more information about inherited traits than random unrelated individuals, and family–based studies usually rely on variation and covariation among relatives.

Recently, Kraft et al. (2003) made pioneering efforts to adopt a family–based design in microarray studies to minimize systematic biases such as those from stratification. Stratification was recognized as a confounding factor contributing to spurious association between transcript abundance and disease status in the sample (Gibson 2003; Kraft et al. 2003). Kraft et al. (2003) presented a stratified family expression association test (FEXAT) to examine the correlation between gene expressions and traits, which was claimed to account for family structure. The FEXAT has a smaller false–discovery rate than the standard Pearson's correlation coefficient test when within–family correlation is of interest. However, it only considers sibship means and does not account for biological relationships between subjects within families. In addition, for large and complex pedigrees, the FEXAT only uses data from sibships extracted from those pedigrees and does not utilize the information from all the pedigree members fully and simultaneously.

Factors such as age, genotype, sex and habitual physical activity are important sources of variation in microarray experiments (Jin et al. 2001; Nogalska and Swierczynski 2001; Roth et al. 2002; Yang et al. 2003) and for complex traits (Deng et al. 2002). These factors may affect statistical analyses if they are not accounted for. Hence, covariate effects should be considered simultaneously and explicitly in gene expression and quantitative trait analyses, which remains to be done for the FEXAT (Kraft et al. 2003) and for new analyses to be developed.

Stimulated largely by the pioneering work of Kraft et al. (2003), and to account for family structure and potential covariate effects, we propose a modified FEXAT statistic and a bivariate analysis based on a variance–components approach to quantify the correlation between gene–expression levels and complex clinical traits. The properties of these two methods in terms of power and type I error rate are explored in a range of situations, in comparison with the FEXAT.

Materials and Methods

FEXAT statistic

As a starting point, we briefly introduce the FEXAT statistic proposed by Kraft et al. (2003):

$$\mathrm{FEXAT} = \frac{\left[\sum_i \sum_j x_{ij}\,(y_{ij} - \bar{y}_{i.})\right]^2}{\sum_i \frac{1}{n_i - 1} \sum_j (x_{ij} - \bar{x}_{i.})^2 \sum_j (y_{ij} - \bar{y}_{i.})^2} \qquad (1)$$

where $i = 1, \ldots, n$ indexes the sibships in the study (extended pedigrees may contribute multiple sibships) and $j = 1, \ldots, n_i$ indexes the subjects in sibship $i$. $x_{ij}$ is the expression level for subject $j$ in family $i$, which is measured as the (log of the) fold change of the expression for the gene under study in the subject's RNA relative to a reference sample, as in cDNA arrays, or as a match–mismatch score, as in oligonucleotide arrays. $y_{ij}$ is the trait value, and $\bar{x}_{i.}$ and $\bar{y}_{i.}$ are the corresponding sibship means. This FEXAT statistic is compared with its asymptotic $\chi^2_1$ distribution or an empirical permutation distribution to derive P values in practical data analyses (Kraft et al. 2003). It only accounts for sibship means for gene expression levels and trait values, and does not consider family structure, that is, the biological relationship between subjects within a family. If the family structures are pedigrees, some


subjects will need to be excluded to calculate the statistic, since it only accommodates data for sibs.

FEXAT(R) statistic

To account for the complete family structure, that is, the relationships between all subjects within a pedigree, we propose here a statistic FEXAT(R), a revised version of the FEXAT. Denote $X_i = (x_{i1}, \ldots, x_{in_i})^T$ and $Y_i = (y_{i1}, \ldots, y_{in_i})^T$ as the gene–expression levels and trait values within family $i$ (with $n_i$ measured subjects), respectively. They can be expressed as

$$X_i = W_{X_i}\beta_X + U_{X_i}F_{X_i} + z_{X_i} + e_{X_i} \qquad (2)$$

and

$$Y_i = W_{Y_i}\beta_Y + U_{Y_i}F_{Y_i} + z_{Y_i} + e_{Y_i} \qquad (3)$$

where $\beta_X$ and $\beta_Y$, respectively, are the vectors of fixed effects of gene–expression levels and trait values, which may incorporate effects of any observable covariates (e.g., sex and age) as well as overall means of gene–expression levels and trait values. $W_{X_i}$ and $W_{Y_i}$ are the incidence matrices of $\beta_X$ and $\beta_Y$, respectively. $F_{X_i}$ and $F_{Y_i}$, respectively, are the family–mean differences of gene–expression levels and trait values, which may be due to confounding factors such as population stratification (Kraft et al. 2003); $F_{X_i} \sim N(0, \sigma^2_{F_{X_i}} J_i)$ and $F_{Y_i} \sim N(0, \sigma^2_{F_{Y_i}} J_i)$, where $J_i$ is a matrix with all elements being 1. $U_{X_i}$ and $U_{Y_i}$ are the incidence vectors of $F_{X_i}$ and $F_{Y_i}$, respectively, with elements of 0 or 1. $z_{X_i}$ and $z_{Y_i}$ are the vectors of additive genetic effects of gene–expression levels and trait values, respectively, with $z_{X_i} \sim N(0, \sigma^2_{Z_{X_i}} G_i)$ and $z_{Y_i} \sim N(0, \sigma^2_{Z_{Y_i}} G_i)$, where $G_i$ is the relationship matrix for the $n_i$ observed individuals within family $i$. The elements of $G_i$ are twice the coefficients of kinship. $e_{X_i}$ and $e_{Y_i}$ are the vectors of residual effects of


gene–expression levels and trait values within family $i$, respectively, with $e_{X_i} \sim N(0, \sigma^2_{e_{X_i}} I)$ and $e_{Y_i} \sim N(0, \sigma^2_{e_{Y_i}} I)$, where $I$ is an $n_i \times n_i$ identity matrix. The variances of $X_i$ and $Y_i$ are thus $V_{X_i} = \sigma^2_{F_{X_i}} J_i + \sigma^2_{Z_{X_i}} G_i + \sigma^2_{e_{X_i}} I$ and $V_{Y_i} = \sigma^2_{F_{Y_i}} J_i + \sigma^2_{Z_{Y_i}} G_i + \sigma^2_{e_{Y_i}} I$, respectively. The FEXAT(R) can be expressed as

$$\mathrm{FEXAT(R)} = \frac{\left\{\sum_i \left[(X_i - W_{X_i}\beta_X)^T\, V_{X_i}^{-1/2}\, V_{Y_i}^{-1/2}\, (Y_i - W_{Y_i}\beta_Y)\right]\right\}^2}{\sum_i \frac{1}{n_i - 1} \left[(X_i - W_{X_i}\beta_X)^T V_{X_i}^{-1} (X_i - W_{X_i}\beta_X)\right] \left[(Y_i - W_{Y_i}\beta_Y)^T V_{Y_i}^{-1} (Y_i - W_{Y_i}\beta_Y)\right]} \qquad (4)$$
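As a rough illustration of how the two statistics relate, the following Python sketch computes the sibship-based FEXAT of Equation (1) and a FEXAT(R)-style statistic. It assumes that Equation (4) is read as a squared sum of whitened cross products divided by a sum of products of quadratic forms. The function names (`fexat`, `fexat_r`, `inv_sqrt`) and the input layout (one array per family) are illustrative choices, not part of the original program, and the variance matrices are taken as given rather than estimated.

```python
import numpy as np

def fexat(x_fams, y_fams):
    """Sibship-based FEXAT, Equation (1). Inputs: lists of 1-D arrays, one per sibship."""
    num, den = 0.0, 0.0
    for x, y in zip(x_fams, y_fams):
        x, y = np.asarray(x, float), np.asarray(y, float)
        xc, yc = x - x.mean(), y - y.mean()      # deviations from sibship means
        num += np.sum(x * yc)
        den += np.sum(xc ** 2) * np.sum(yc ** 2) / (len(x) - 1)
    return num ** 2 / den

def inv_sqrt(v):
    """Symmetric inverse square root V^(-1/2) of a positive-definite matrix."""
    w, q = np.linalg.eigh(v)
    return (q / np.sqrt(w)) @ q.T                # Q diag(w^-1/2) Q^T

def fexat_r(rx_fams, ry_fams, vx_fams, vy_fams):
    """FEXAT(R)-style statistic (assumed reading of Equation (4)).

    rx_fams, ry_fams: residual vectors X_i - W_Xi beta_X and Y_i - W_Yi beta_Y.
    vx_fams, vy_fams: within-family covariance matrices V_Xi and V_Yi (assumed known).
    """
    num, den = 0.0, 0.0
    for rx, ry, vx, vy in zip(rx_fams, ry_fams, vx_fams, vy_fams):
        num += rx @ inv_sqrt(vx) @ inv_sqrt(vy) @ ry
        qx = rx @ np.linalg.solve(vx, rx)        # rx^T V_X^-1 rx
        qy = ry @ np.linalg.solve(vy, ry)        # ry^T V_Y^-1 ry
        den += qx * qy / (len(rx) - 1)
    return num ** 2 / den
```

In this sketch, setting every $V_{X_i}$ and $V_{Y_i}$ to the identity and using deviations from family means as residuals gives the same value as `fexat`, which is a convenient sanity check on an implementation.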

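Similar to the FEXAT, in practical data analyses, P values for this statistic can be obtained by a $\chi^2_1$ test or a permutation test.

Likelihood ratio (LR) statistic

We present a new statistic (LR) by conducting analyses in a bivariate variance–components framework (Lange and Boehnke, 1983; Lange 1997). Define $X = (X_1, \ldots, X_i, \ldots, X_n)^T$ and $Y = (Y_1, \ldots, Y_i, \ldots, Y_n)^T$ as the gene–expression levels and trait values from the whole sample, where $X_i$ and $Y_i$ were defined in Equations (2) and (3). $X$ and $Y$ can be written in matrix form as

$$\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} W_X\beta_X \\ W_Y\beta_Y \end{pmatrix} + \begin{pmatrix} U_XF_X \\ U_YF_Y \end{pmatrix} + \begin{pmatrix} z_X \\ z_Y \end{pmatrix} + \begin{pmatrix} e_X \\ e_Y \end{pmatrix} \qquad (5)$$

where $W_X = (W_{X_1}, \ldots, W_{X_i}, \ldots, W_{X_n})^T$, $W_Y = (W_{Y_1}, \ldots, W_{Y_i}, \ldots, W_{Y_n})^T$, $F_X = (F_{X_1}, \ldots, F_{X_i}, \ldots, F_{X_n})^T$, $F_Y = (F_{Y_1}, \ldots, F_{Y_i}, \ldots, F_{Y_n})^T$,

$$U_X = \begin{pmatrix} U_{X_1} & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & U_{X_i} & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & U_{X_n} \end{pmatrix} \quad \text{and} \quad U_Y = \begin{pmatrix} U_{Y_1} & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & U_{Y_i} & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & U_{Y_n} \end{pmatrix}.$$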

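$\beta_X$, $\beta_Y$, $W_{X_i}$, $W_{Y_i}$, $F_{X_i}$ and $F_{Y_i}$ were defined for Equations (2) and (3). Thus, the variance–covariance matrix is

$$V = J \otimes \begin{pmatrix} \sigma^2_{fx} & \sigma_{fxy} \\ \sigma_{fxy} & \sigma^2_{fy} \end{pmatrix} + G \otimes \begin{pmatrix} \sigma^2_{ax} & \sigma_{axy} \\ \sigma_{axy} & \sigma^2_{ay} \end{pmatrix} + I \otimes \begin{pmatrix} \sigma^2_{ex} & \sigma_{exy} \\ \sigma_{exy} & \sigma^2_{ey} \end{pmatrix} \qquad (6)$$

where

$$J = \begin{pmatrix} J_1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & J_i & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & J_n \end{pmatrix} \quad \text{and} \quad G = \begin{pmatrix} G_1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & G_i & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & G_n \end{pmatrix}.$$

$J_i$ and $G_i$ were defined for Equations (2) and (3). $I$ is an identity matrix. $\sigma^2_{fx}$ and $\sigma^2_{fy}$ denote the family–mean variances of the studied gene–expression levels and trait values in the study population, respectively, and $\sigma_{fxy}$ denotes the corresponding covariance between them. $\sigma^2_{ax}$ and $\sigma^2_{ay}$ denote the variances of additive effects of the studied gene–expression levels and trait values in the population, respectively, and $\sigma_{axy}$ denotes the corresponding covariance between them. $\sigma^2_{ex}$ and $\sigma^2_{ey}$ denote the variances of residual effects of the studied gene–expression levels and trait values in the population, respectively, and $\sigma_{exy}$ denotes the corresponding covariance between them. The genetic correlation between gene–expression levels and trait values can be calculated as

$$\rho_{axy} = \frac{\sigma_{axy}}{\sqrt{\sigma^2_{ax}\,\sigma^2_{ay}}}.$$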
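To make the structure of the variance–covariance matrix concrete, the following Python sketch assembles $V$ from block-diagonal $J$ and $G$ and three 2×2 component matrices using Kronecker products, and computes the genetic correlation. It is only a sketch under stated assumptions: all families are taken to be full sibships (so that $G_i$ has 1 on the diagonal and 0.5 elsewhere), the numeric component values are arbitrary illustrative choices, and the helper names are hypothetical. Note that `np.kron(J, C)` orders the rows by subject, with expression and trait interleaved, rather than stacking all of $X$ above all of $Y$.

```python
import numpy as np

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        out[pos:pos + k, pos:pos + k] = b
        pos += k
    return out

def sibship_relationship(n_sibs):
    """G_i for non-inbred full sibs: twice the kinship, i.e. 1 on the diagonal, 0.5 off it."""
    return 0.5 * np.ones((n_sibs, n_sibs)) + 0.5 * np.eye(n_sibs)

def build_v(family_sizes, comp_f, comp_a, comp_e):
    """V = J (x) comp_f + G (x) comp_a + I (x) comp_e, with rows ordered
    (x_1, y_1, x_2, y_2, ...) because of how np.kron lays out its blocks."""
    J = block_diag([np.ones((n, n)) for n in family_sizes])
    G = block_diag([sibship_relationship(n) for n in family_sizes])
    I = np.eye(sum(family_sizes))
    return np.kron(J, comp_f) + np.kron(G, comp_a) + np.kron(I, comp_e)

def genetic_correlation(comp_a):
    """rho_axy = sigma_axy / sqrt(sigma_ax^2 * sigma_ay^2)."""
    return comp_a[0, 1] / np.sqrt(comp_a[0, 0] * comp_a[1, 1])

# Illustrative (assumed) components: family-mean, additive genetic, residual.
comp_f = np.array([[0.3, 0.09], [0.09, 0.3]])
comp_a = np.array([[0.4, 0.20], [0.20, 0.4]])
comp_e = np.array([[0.3, 0.00], [0.00, 0.3]])
V = build_v([3, 2], comp_f, comp_a, comp_e)
```

With these values each subject has total variance 1 for both variables, sibs in the same family share an expression covariance of 0.3 + 0.5 × 0.4 = 0.5, and members of different families are uncorrelated.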

Under the assumption of multivariate normality of gene–expression levels and trait values, the log-likelihood of $V$, $\beta_X$ and $\beta_Y$ given the observed data $(X, Y, W_X, W_Y)$ is

$$\ln L(V, \beta_X, \beta_Y \mid X, Y, W_X, W_Y) = -t \ln(2\pi) - \frac{1}{2}\ln|V| - \frac{1}{2}\Delta^T V^{-1} \Delta \qquad (7)$$

where $t = \sum_{i=1}^{n} n_i$ is the total number of subjects, so that the stacked residual vector

$$\Delta = \begin{pmatrix} X - W_X\beta_X \\ Y - W_Y\beta_Y \end{pmatrix}$$

has $2t$ elements. Maximum–likelihood estimates can be obtained via a Fisher-scoring algorithm implemented in each sibship and/or pedigree (Lange and Boehnke, 1983). Once maximum–likelihood estimates are available, one can test hypotheses of interest by the likelihood ratio (LR) statistic,

$$\mathrm{LR} = -2 \ln \frac{L(\sigma_{axy} = 0)}{L(\hat{\sigma}_{axy})} \qquad (8)$$
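The likelihood evaluation and the test in Equation (8) can be sketched in Python as follows. This is a minimal illustration, not the authors' program: it evaluates a multivariate-normal log-likelihood for a given residual vector and covariance matrix, and converts two maximized log-likelihoods into an LR statistic with a one-degree-of-freedom chi-square P value. The maximization itself (Fisher scoring in the text) is not shown, and the function names are hypothetical.

```python
import math
import numpy as np

def log_likelihood(delta, v):
    """Multivariate-normal log-likelihood of residual vector `delta` with covariance `v`."""
    k = delta.size                               # 2t when X and Y residuals are stacked
    sign, logdet = np.linalg.slogdet(v)          # numerically stable ln|V|
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    quad = delta @ np.linalg.solve(v, delta)     # delta^T V^-1 delta
    return -0.5 * (k * math.log(2.0 * math.pi) + logdet + quad)

def lr_test(loglik_null, loglik_alt):
    """LR = -2 ln[L(null) / L(alt)] and its chi-square (1 df) P value.

    For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    lr = max(0.0, -2.0 * (loglik_null - loglik_alt))
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value
```

For example, `lr_test(-100.0, -98.0792)` gives an LR of about 3.84, which sits at the 5% critical value of the one-degree-of-freedom chi-square distribution.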

This test statistic is approximately $\chi^2$–distributed with one degree of freedom under the null hypothesis $\sigma_{axy} = 0$. If the null hypothesis is not rejected, one concludes that there is no significant evidence of a relationship between the gene–expression level X and trait value Y.

Simulation study

A series of simulations was carried out to examine the effects of different factors on the performance of these two methods, in comparison with the FEXAT. To investigate the effects of sample size, we simulated three samples of 48, 96 and 192 subjects, respectively. The first sample consisted of 4 families of size 12 (4×12; throughout this paper, the first number is the number of families and the second is the number of subjects within each family); the second sample consisted of 8 families of size 12 (8×12); the third sample consisted of 16 families of size 12 (16×12). To study the effects of family structure on the three methods, we simulated three family structures: two were pedigrees and one was a sibship. A sibship structure is a group of full sibs without parents. The pedigree structures are illustrated in Figure 1. According to the definition of the FEXAT (Kraft et al. 2003), a large sibship of size 8 in each pedigree was used to calculate the FEXAT statistic in pedigree type A, whereas in pedigree type B the complex pedigree had to be broken into three sibships to calculate the FEXAT statistic. To study the effects of family size on the three methods, we simulated another sample with 24 sibships of size 4 (24×4) and compared it with 8 sibships of size 12 (8×12), which has the same total sample size.

For simplicity, we considered one case from the simulation studies of Kraft et al. (2003), namely that the ratio of the variance in family means to the variance in within–family differences is 1:1. Our simulation results (not shown) indicated that this ratio has little effect on the comparison of the three methods. The total variances of gene–expression levels and trait values were fixed at 1. We studied four levels of genetic correlation between gene–expression levels and trait values (0.0, 0.3, 0.5 and 0.7) and three heritability levels for both the gene–expression levels and trait values (0.2, 0.4 and 0.6). In addition, we considered two situations regarding covariates: (1) There were no covariate effects, i.e., $\beta_{cX} = \beta_{cY} = 0$, where $\beta_{cX}$ and $\beta_{cY}$ are the regression coefficients of the gene–expression levels and trait values on a specific covariate c, respectively. (2) There were covariate effects on the gene–expression levels and trait values. We assumed that $\beta_{cX} = \beta_{cY} = 0.2$ and that covariate values were drawn from a standard normal distribution. When there were covariate effects, the FEXAT statistic was computed from gene–expression levels and trait values that had been adjusted for the covariate by standard regression analyses. We also assumed that the family–mean correlation between gene–expression levels and trait values is 0.3, which may be due to confounding factors such as population stratification (Kraft et al. 2003). Although our simulation results are presented for the above parameter values, results for other parameter values (not shown here) led to similar conclusions.

For each scenario, we generated 10,000 replicate studies and calculated the FEXAT, FEXAT(R) and LR test statistics under two nominal significance levels, α = 0.05 and α = 0.01. Statistical power and type I error rate are used as criteria to compare the relative performance of these three statistics in quantifying the correlation between gene–expression levels and trait values. Power refers to the probability of declaring statistical significance when a true correlation exists. Type I error rate is the probability of declaring that gene–expression levels are correlated with trait values when there is no relationship between them, i.e., $\sigma_{axy} = 0$. Detailed descriptions of the test statistics examined are provided in the preceding subsections.
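A Monte Carlo estimate of type I error rate or power along these lines can be sketched in Python as below. This is a simplified illustration, not the simulation program used in the study. It handles full sibships only and applies the sibship-based FEXAT with its one-degree-of-freedom chi-square reference distribution. The variance partition is an assumption made for the sketch: total variance 1, split into a family-mean component `var_f`, an additive genetic component equal to `h2`, and an uncorrelated residual. All function and parameter names are hypothetical.

```python
import math
import numpy as np

def simulate_sibships(rng, n_fam, n_sib, h2, rho_a, var_f=0.3, rho_f=0.3):
    """Return a list of (n_sib, 2) arrays with columns (expression x, trait y)."""
    var_e = 1.0 - h2 - var_f                     # assumed residual share of total variance 1
    if var_e <= 0:
        raise ValueError("h2 + var_f must be below 1")
    g = 0.5 * np.ones((n_sib, n_sib)) + 0.5 * np.eye(n_sib)   # full-sib relationship matrix
    chol_g = np.linalg.cholesky(g)
    chol_a = np.linalg.cholesky(h2 * np.array([[1.0, rho_a], [rho_a, 1.0]]))
    chol_f = np.linalg.cholesky(var_f * np.array([[1.0, rho_f], [rho_f, 1.0]]))
    fams = []
    for _ in range(n_fam):
        fam_mean = chol_f @ rng.standard_normal(2)             # shared by all sibs
        z = chol_g @ rng.standard_normal((n_sib, 2)) @ chol_a.T  # additive genetic effects
        e = math.sqrt(var_e) * rng.standard_normal((n_sib, 2))   # independent residuals
        fams.append(fam_mean + z + e)
    return fams

def fexat(fams):
    """Sibship-based FEXAT of Equation (1) on a list of (n_sib, 2) arrays."""
    num = den = 0.0
    for d in fams:
        x, y = d[:, 0], d[:, 1]
        xc, yc = x - x.mean(), y - y.mean()
        num += np.sum(x * yc)
        den += np.sum(xc ** 2) * np.sum(yc ** 2) / (len(x) - 1)
    return num ** 2 / den

def rejection_rate(n_rep, alpha, seed=0, **design):
    """Fraction of replicates whose chi-square (1 df) P value falls below alpha."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        stat = fexat(simulate_sibships(rng, **design))
        hits += math.erfc(math.sqrt(stat / 2.0)) < alpha
    return hits / n_rep
```

Calling `rejection_rate(n_rep, 0.05, n_fam=8, n_sib=12, h2=0.4, rho_a=0.0)` estimates the type I error rate for an 8×12 sibship design, and the same call with `rho_a=0.7` estimates power. The study used 10,000 replicates per scenario; a smaller `n_rep` runs faster but gives noisier estimates.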

Results

The type I error rate and power estimates over 10,000 replicated simulations are summarized in Tables 1–3. None of the three methods is sensitive to the correlation between the family–mean expression levels and trait values. That is, they are robust to population stratification (which can generate correlation between family–mean expression levels and trait values; Kraft et al. 2003) in association analyses between gene–expression levels and trait values. The estimated type I errors of the FEXAT(R) are slightly conservative and those of the LR are slightly inflated. The type I errors of the FEXAT vary depending upon family structure (see below). The heritabilities of the gene expression levels and trait values have no significant effect on the type I errors of the three methods. As expected, statistical power increases with increasing heritability for all three methods. The LR statistic has higher power than the other two methods in most cases.

Comparisons of results in Tables 1–3 show that the type I errors of the LR approach nominal levels with increasing sample size. For example, when the sample size of sibships increases from 4×12 (4 sibships each with 12 sibs) to 16×12 (16 sibships each with 12 sibs) and heritability $h^2 = 0.4$, the estimated type I errors of the LR decrease from 12.7% to 5.3% and from 5.7% to 0.9% under the nominal significance levels of α = 0.05 and α = 0.01, respectively. In contrast, the reverse tendency is observed for the FEXAT: as the sample size increases, its estimated type I errors become slightly inflated, especially when α = 0.01. As expected, statistical power increases as the sample size increases for all three methods.

Pedigree structure affects type I error rates and power (Tables 1–3). The FEXAT performs relatively unsatisfactorily in multigeneration complex pedigree data. Comparing results for the same sample size and family size but different family structures, the estimated power of the FEXAT is much lower in complex pedigrees than in sibships, and the difference is more pronounced with large sample sizes. In addition, the type I errors of the FEXAT rise far above the nominal levels for pedigrees with small sibships. This is largely because (1) the FEXAT only uses the data of sibships by breaking a large pedigree into multiple sibships and may lose information, while the FEXAT(R) and LR use the full pedigree data; and (2) the FEXAT does not consider the biological relationships (covariance due to polygenic effects), while the FEXAT(R) and LR methods account for the biological relationships of family members. For example, with a sample size of 16×12 (16 pedigrees each with 12 members), heritabilities of 0.4 (for both gene expression levels and trait values) and an expression–trait correlation of 0.7, the estimated powers of the FEXAT are 89.7%, 47.6% and 29.7% for the family structures of sibship, pedigree A and pedigree B, respectively. The type I error rate of the FEXAT is 10.4% with a sample size of 16×12 and heritabilities of 0.4 under the family structure of pedigree B. In contrast, the type I error rates of the LR and FEXAT(R) are reasonably robust to family structure, and their statistical power varies to a lesser extent than that of the FEXAT.

Taking into account both type I error rate and statistical power, we present a summary of the relative performance of the three methods under different sample sizes and family structures in Table 4. Table 4 also provides a recommendation for choosing among the statistics in practical data analysis. This generalization is made under the prerequisite of a large family size (e.g., >12). If the family size is small, the FEXAT(R) is overly conservative and has lower power (Table 5).

Table 6 presents the results when a covariate had effects on both the gene–expression levels and trait values under study. Because both the LR and FEXAT(R) accommodate covariate effects in the mean structure of the models, they can correctly estimate the regression coefficients of the gene–expression levels and trait values on covariates. A comparison of Tables 2 and 6 shows that both the LR and FEXAT(R) yield nearly identical type I error rates and power whether or not there are covariate effects. The FEXAT statistic per se cannot account for covariate effects. When gene–expression levels and trait values are first adjusted for the covariate by standard regression analyses, the FEXAT attains power similar to that obtained when covariate effects are absent, but with slightly inflated type I error rates (Tables 2 and 6). The above results are obtained from samples with large family sizes (Table 6). If the family size is small, with the same total sample size, the FEXAT has relatively larger inflation of type I errors (Table 5). In comparison, the LR and FEXAT(R) are reasonably robust to family size regardless of the presence of covariate effects (Tables 2, 5 and 6).

Discussion

In microarray experiments, measures of gene–expression levels are available for members of a set of families (Shannon et al. 2002; Schadt et al. 2003). Thus, standard methods of quantitative genetics can be applied to family–based microarray data analyses. This may facilitate exploration of the association of gene expressions with multifactorial phenotypes of interest. Family–based designs have the advantage of being robust to population stratification. Recently, Kraft et al. (2003) proposed a stratified family expression association test (FEXAT) to quantify the relationship between gene–expression levels and clinical traits. Stimulated by their work, we presented a modified FEXAT statistic and a bivariate analysis based on a variance–components approach for such experimental data analyses.

Kraft et al. (2003) showed that the observed type I errors for the FEXAT were somewhat conservative, especially at stringent nominal levels, in their simulations. However, in our simulations, the FEXAT tends to have slightly inflated type I errors, especially under a large sample size. As indicated in Kraft et al. (2003), the FEXAT was more favorable for large family sizes than for small family sizes (see Tables 1 and 2 of Kraft et al. 2003), but the reverse trend was observed in the present study (see Tables 2 and 4 in the present study). This is presumably due to the different simulation designs of the two studies. In the present study, we took into account the biological relationships (covariances due to polygenes or shared environmental covariates) between subjects within the same families when simulating phenotypic data. In the study of Kraft et al. (2003), data for family members (siblings) appeared to be drawn independently from a normal distribution. The former design is closer to the nature of quantitative–trait inheritance (Lynch and Walsh 1998). It seems that the FEXAT is favored under small family sizes and/or for subjects with distant relationships. Dividing extended pedigrees into multiple sibships, as suggested by Kraft et al. (2003), and ignoring covariances among family members greatly decreases the statistical power and inflates the type I errors of the FEXAT, as shown in our simulations.

All three methods are robust to population stratification, which may result in correlation in family means and yield spurious association results (Kraft et al. 2003). The FEXAT(R) seems to be slightly conservative, at the cost of a decrease in power, while the type I error rates of the LR statistic are slightly higher than nominal levels, especially with small sample sizes. The LR has higher power than the FEXAT and FEXAT(R) in most cases, except for sibships and large samples, where the FEXAT(R) has the highest power. Both the LR and FEXAT(R) are favored by increasing sample sizes regardless of family structure. This is particularly important given that the decreasing costs of microarray experiments make large-scale gene expression studies in complex pedigrees practical. As shown in the results, the FEXAT is preferable to the FEXAT(R) and LR statistics under small sample sizes and for sibships, while the preference among the three statistics is reversed for moderate to large sample sizes and/or pedigrees.

It is known that maximum–likelihood estimates of variance and covariance components may be biased when only a small number of observations are considered (Hopper and Mathews 1982; Amos et al. 1996). As the sample size increases, the biases in maximum–likelihood estimates of variance and covariance components should be reduced; thus, the type I error rates of the LR statistic should approach their nominal levels (Amos et al. 1996), as reflected in our results for the LR method. We examined the empirical distribution of the LR statistic under the null hypothesis with different sample sizes. We simulated 10,000 data sets using pedigree A as described in the Simulation study, and the LR statistic calculated in each data set is shown in histograms (Figure 2). The LR statistic asymptotically follows the chi-square distribution with one degree of freedom. However, it has a slightly heavy tail under small sample sizes.

The methods proposed in the present study can also be used to characterize the genetic correlation between multiple gene–expression levels. Pairwise analysis of multiple gene expression profiles may provide a prediction of the joint expression and regulation of these genes. Because of the high dimensionality of gene–expression data from microarrays, multiple testing problems may arise in trait–expression and/or expression–expression association studies. An initial data reduction that focuses on genes whose expression varies nontrivially across samples is necessary before the tests are performed. Additionally, systematic measurement errors may contribute to the covariation between different gene–expression levels, which may lead to spurious association results. In a flexible variance–components framework, modeling such covariation as a covariance component in our analyses is straightforward. This may minimize bias in analyses.

Variance–components methods make the critical assumption that the quantitative trait values within a family either follow, or can be transformed to follow, a multivariate normal distribution (Amos et al. 1996). Permutation tests can be performed in situations, particularly with small sample sizes, in which violation of the multivariate normality assumption is difficult to detect. Simulations showed that permutation tests maintain correct type I errors under violations of the multivariate normality assumption (Abecasis et al. 2000). The procedure for reshuffling the original data is similar to that proposed by Kraft et al. (2003). Empirical P values can be computed by comparing the observed statistic with the permuted statistics under the null hypothesis of no trait–expression association. We simulated 10,000 data sets using pedigree A, and each data set was resampled 10,000 times to obtain the empirical P value of the FEXAT(R). We found that the FEXAT(R) with a permutation test performs slightly better than with a chi-square test (Table 7). However, the permutation test is computationally demanding, especially for gene-expression data.

Typical microarray experiments aim at characterizing differential gene–expression patterns under distinct treatments. Quantifying trait–expression and/or expression–expression associations at the population level may represent another growing trend in high–throughput microarray data analyses. In the latter analysis, a large number of candidate genes for complex diseases can be reduced by restricting attention to genes whose expression levels show associations with complex trait values. In the near future, advances in microarray technology may greatly decrease experimental costs, making it practical to perform microarray experiments in large samples with large pedigrees. This has been exemplified by the current burgeoning of large–scale whole genome linkage scans for complex traits, which were hampered by prohibitive costs just about a decade ago. It is anticipated that methods based on the variance–components approach, which is advantageous under large sample sizes, will be a good way to measure quantitatively the relationship between gene expressions and clinical traits in general pedigrees.

Note: the program for implementing the methods investigated here is available upon request from Y. L. ([email protected]) or H.W.D.


Acknowledgements

The study was partially supported by grants from Health Future Foundation, NIH, and the State of Nebraska (LB595 and LB692). H.W.D. was also partially supported by Hunan Province, the Chinese National Science Foundation, the Huo Ying Dong Education Foundation, the Cheng Kong scholar program and the Ministry of Education of P. R. China.


Literature Cited

Abecasis, G.R., Cardon, L.R. and Cookson, W.O. 2000. A general test of association for quantitative traits in nuclear families. Am J Hum Genet 66: 279-292.
Amos, C.I., Zhu, D.K. and Boerwinkle, E. 1996. Assessing genetic linkage and association with robust components of variance approaches. Ann Hum Genet 60: 143-160.
Bakay, M., Chen, Y.W., Borup, R., Zhao, P., Nagaraju, K. and Hoffman, E.P. 2002. Sources of variability and effect of experimental approach on expression profiling data interpretation. BMC Bioinformatics 3: 4.
Brem, R.B., Yvert, G., Clinton, R. and Kruglyak, L. 2002. Genetic dissection of transcriptional regulation in budding yeast. Science 296: 752-755.
Chakravarti, A. 1999. Population genetics: making sense out of sequence. Nature Genetics 21: 56-60.
Cheung, V.G. and Spielman, R.S. 2002. The genetics of variation in gene expression. Nature Genetics 32: 522-525.
Deng, H.W. 2001. Population admixture may appear to mask, change or reverse genetic effects of genes underlying complex traits. Genetics 159: 1319-1323.
Deng, H.W., Xu, F.H., Huang, Q.Y., Shen, H., Deng, H., Conway, T., Liu, Y.J., Liu, Y.Z., Li, J.L., Zhang, H.T., Davies, K.M. and Recker, R.R. 2002. A whole-genome linkage scan suggests several genomic regions potentially containing quantitative trait loci for osteoporosis. J Clin Endocrinol Metab 87: 5151-5159.
Gibson, G. 2003. Population genomics: celebrating individual expression. Heredity 90: 1-2.
Hopper, J.L. and Mathews, J.D. 1982. Extensions to multivariate normal models for pedigree analysis. Ann Hum Genet 46: 373-383.
Jin, W., Riley, R.M., Wolfinger, R.D., White, K.P., Passador-Gurgel, G. and Gibson, G. 2001. The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nature Genetics 29: 389-395.
Kerr, M.K. and Churchill, G.A. 2001. Statistical design and the analysis of gene expression microarray data. Genetical Research 77: 123-128.
Kraft, P., Schadt, E., Aten, J. and Horvath, S. 2003. A family-based test for correlation between gene expression and trait values. Am J Hum Genet 72: 1323-1330.
Lange, K. and Boehnke, M. 1983. Extensions to pedigree analysis. IV. Covariance components models for multivariate traits. Am J Med Genet 14: 513-524.
Lange, K. 1997. Mathematical and Statistical Methods for Genetic Analysis. Springer-Verlag, New York.
Li, J., Risch, N. and Myers, R.M. 2003. Heritability of gene expression in humans: A study of lymphoblastoid cell lines from twins. Am J Hum Genet 73: S207.
Lynch, M. and Walsh, B. 1998. Genetics and Analysis of Quantitative Traits. Sinauer Assocs., Inc., Sunderland, MA.
Nguyen, D.V., Arpat, A.B., Wang, N. and Carroll, R.J. 2002. DNA microarray experiments: Biological and technological aspects. Biometrics 58: 701-717.
Nogalska, A. and Swierczynski, J. 2001. The age-related differences in obese and fatty acid synthase gene expression in white adipose tissue of rat. Biochimica et Biophysica Acta 1533: 73-80.
Roth, S.M., Ferrell, R.E., Peters, D.G., Metter, E.J., Hurley, B.F. and Rogers, M.A. 2002. Influence of age, sex, and strength training on human muscle gene expression determined by microarray. Physiological Genomics 10: 181-190.
Schadt, E.E., Monks, S., Drake, T., Lusis, A., Che, N., Colinayo, V., Ruff, T., Milligan, S., Lamb, J., Cavet, G., Linsley, P., Mao, M., Stoughton, R. and Friend, S. 2003. The genetics of gene expression surveyed in maize, mouse and man. Nature 422: 297-302.
Shannon, W.D., Watson, M.A., Perry, A. and Rich, K. 2002. Mantel statistics to correlate gene expression levels from microarrays with clinical covariates. Genetic Epidemiology 23: 87-96.
Wolfinger, R.D., Gibson, G., Wolfinger, E., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C. and Paules, R.S. 2001. Assessing gene significance from cDNA microarray expression data via mixed models. Journal of Computational Biology 8: 625-637.
Yang, W.S., Lee, W.J., Huang, K.C., Lee, K.C., Chao, C.L., Chen, C.L., Tai, T.Y. and Chuang, L.M. 2003. mRNA Levels of the Insulin-Signaling Molecule SORBS1 in the Adipose Depots of Nondiabetic Women. Obesity Research 11: 586-590.
Yang, Y.H. and Speed, T. 2002. Design issues for cDNA microarray experiments. Nature Reviews Genetics 3: 579-588.


Figure Legends

Figure 1 Two types of pedigree structures in simulations

Figure 2 Histograms of the statistic LR for the data produced under the null hypothesis. Simulation was performed under the null hypothesis as described in the section on Simulation using pedigree A. A) sample size 4×12, h²=0.2 and βcX = βcY = 0.0; B) sample size 8×12, h²=0.2 and βcX = βcY = 0.0; and C) sample size 16×12, h²=0.2 and βcX = βcY = 0.0. The histograms of the statistic are shown with bars. The probability density function of the χ² distribution with one degree of freedom is shown with curves.
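The comparison described in the Figure 2 legend, bars for the simulated statistic against the χ² density with one degree of freedom, can be sketched numerically. The Python below is only an illustration under stated assumptions: it does not compute the LR statistic (which requires fitting the variance-components model) and instead uses squared standard normal draws as a stand-in for null values. The function names, the bin width of 1 and the replicate count are hypothetical choices, not taken from the paper.

```python
import math
import random


def chi2_1_pdf(x):
    """Density of the chi-square distribution with one degree of freedom (x > 0)."""
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)


def chi2_1_cdf(x):
    """P(X <= x) for X ~ chi-square(1), using X = Z**2 with Z standard normal."""
    return math.erf(math.sqrt(x / 2.0))


def simulate_null_statistics(n_rep, seed=0):
    """Stand-in for null LR values (assumption): squared N(0, 1) draws."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_rep)]


def histogram_density(values, bin_width, upper):
    """Return (left_edge, bar_height) pairs; bar areas sum to at most 1."""
    n_bins = int(round(upper / bin_width))
    counts = [0] * n_bins
    for v in values:
        k = int(v // bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    n = len(values)
    return [(k * bin_width, c / (n * bin_width)) for k, c in enumerate(counts)]


def expected_bar_height(left, bin_width):
    """Average chi-square(1) density over the bin [left, left + bin_width]."""
    return (chi2_1_cdf(left + bin_width) - chi2_1_cdf(left)) / bin_width


if __name__ == "__main__":
    stats = simulate_null_statistics(20000)
    for left, height in histogram_density(stats, bin_width=1.0, upper=20.0)[:5]:
        print(f"[{left:4.1f}, {left + 1.0:4.1f}): simulated {height:.3f}, "
              f"expected {expected_bar_height(left, 1.0):.3f}")
```

Bar heights are compared with the bin-averaged density rather than with `chi2_1_pdf` at the bin midpoint, because the one-degree-of-freedom density is unbounded near zero and the midpoint value would misrepresent the first bin.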

Figure 1 (panels A and B)

Figure 2 (panels A, B and C; x-axis: LR, 0 to 20; y-axis: Density, 0.0 to 2.5)

Table 1 Type I error rates and powers under different heritabilities (for both the gene expression levels and the trait values) with a sample size of 4×12

ρaxy           FEXAT              FEXAT(R)           LR
               α=0.05   α=0.01    α=0.05   α=0.01    α=0.05   α=0.01
Sibship
  h²=0.2
    0.0*       0.050    0.010     0.044    0.006     0.124    0.054
    0.3        0.056    0.010     0.050    0.008     0.134    0.060
    0.5        0.065    0.012     0.063    0.011     0.146    0.069
    0.7        0.079    0.017     0.076    0.015     0.163    0.079
  h²=0.4
    0.0        0.051    0.009     0.042    0.006     0.127    0.057
    0.3        0.076    0.017     0.074    0.014     0.154    0.070
    0.5        0.125    0.035     0.125    0.031     0.212    0.101
    0.7        0.205    0.064     0.200    0.059     0.300    0.155
  h²=0.6
    0.0        0.050    0.008     0.042    0.006     0.117    0.049
    0.3        0.134    0.036     0.126    0.032     0.210    0.100
    0.5        0.292    0.107     0.274    0.095     0.375    0.199
    0.7        0.528    0.257     0.491    0.227     0.581    0.376
Pedigree A
  h²=0.2
    0.0        0.050    0.008     0.045    0.007     0.089    0.033
    0.3        0.052    0.010     0.050    0.011     0.100    0.039
    0.5        0.057    0.011     0.063    0.013     0.110    0.044
    0.7        0.066    0.012     0.083    0.018     0.135    0.052
  h²=0.4
    0.0        0.052    0.008     0.041    0.008     0.092    0.034
    0.3        0.066    0.012     0.078    0.017     0.130    0.056
    0.5        0.095    0.022     0.127    0.029     0.204    0.094
    0.7        0.139    0.036     0.226    0.070     0.322    0.157
  h²=0.6
    0.0        0.051    0.008     0.038    0.006     0.086    0.033
    0.3        0.100    0.023     0.140    0.037     0.205    0.084
    0.5        0.194    0.055     0.284    0.102     0.382    0.200
    0.7        0.347    0.123     0.508    0.240     0.619    0.405
Pedigree B
  h²=0.2
    0.0        0.099    0.026     0.040    0.006     0.085    0.030
    0.3        0.102    0.027     0.056    0.012     0.101    0.038
    0.5        0.110    0.031     0.076    0.015     0.123    0.046
    0.7        0.120    0.034     0.092    0.022     0.157    0.062
  h²=0.4
    0.0        0.100    0.026     0.046    0.008     0.104    0.037
    0.3        0.118    0.033     0.087    0.020     0.149    0.061
    0.5        0.149    0.046     0.146    0.039     0.240    0.107
    0.7        0.191    0.066     0.255    0.083     0.390    0.206
  h²=0.6
    0.0        0.098    0.025     0.044    0.008     0.099    0.037
    0.3        0.150    0.048     0.131    0.034     0.215    0.089
    0.5        0.243    0.093     0.312    0.110     0.435    0.242
    0.7        0.375    0.169     0.559    0.273     0.706    0.494

*Note: In Tables 1-3 and 5-7, when ρaxy = 0.0, which indicates no correlation between expression levels and trait values, the corresponding data are type I error rates. When ρaxy ≠ 0.0, the corresponding data are statistical powers.
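The type I error rates and powers tabulated here are rejection proportions across simulated replicates. A minimal Python sketch of that bookkeeping follows. It assumes each statistic is referred to a χ² distribution with one degree of freedom, as the Figure 2 legend indicates for LR; the replicate count `n_rep` and the three-standard-error rule for flagging inflation are illustrative assumptions, not values or criteria reported in the paper.

```python
import math

# Chi-square(1) critical values for the two nominal levels used in the tables.
CRITICAL_VALUES = {0.05: 3.841, 0.01: 6.635}


def rejection_rate(statistics, alpha):
    """Fraction of replicates whose statistic exceeds the critical value.

    Under the null (rho_axy = 0) this estimates the type I error rate;
    under an alternative it estimates power.
    """
    threshold = CRITICAL_VALUES[alpha]
    return sum(s > threshold for s in statistics) / len(statistics)


def monte_carlo_se(alpha, n_rep):
    """Binomial standard error of an estimated rate whose true value is alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / n_rep)


def looks_inflated(rate, alpha, n_rep, z=3.0):
    """Flag a type I error estimate more than z standard errors above nominal."""
    return rate > alpha + z * monte_carlo_se(alpha, n_rep)


if __name__ == "__main__":
    n_rep = 10000  # hypothetical replicate count, for illustration only
    for rate in (0.050, 0.124):
        print(rate, looks_inflated(rate, 0.05, n_rep))
```

With the hypothetical 10,000 replicates, an estimated rate of 0.124 at α = 0.05 would be flagged as inflated, whereas 0.050 would not; the conclusion for borderline values depends on the actual number of replicates.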


Table 2 Type I error rates and powers under different heritabilities with a sample size of 8×12

ρaxy           FEXAT              FEXAT(R)           LR
               α=0.05   α=0.01    α=0.05   α=0.01    α=0.05   α=0.01
Sibship
  h²=0.2
    0.0        0.052    0.008     0.042    0.006     0.077    0.023
    0.3        0.062    0.011     0.063    0.011     0.089    0.028
    0.5        0.081    0.019     0.088    0.019     0.111    0.036
    0.7        0.110    0.031     0.119    0.032     0.144    0.053
  h²=0.4
    0.0        0.051    0.009     0.041    0.006     0.074    0.021
    0.3        0.109    0.029     0.111    0.027     0.138    0.048
    0.5        0.213    0.075     0.220    0.075     0.249    0.109
    0.7        0.381    0.169     0.390    0.163     0.424    0.216
  h²=0.6
    0.0        0.050    0.009     0.041    0.025     0.067    0.020
    0.3        0.223    0.081     0.219    0.076     0.252    0.110
    0.5        0.537    0.281     0.527    0.261     0.572    0.330
    0.7        0.841    0.627     0.824    0.593     0.848    0.669
Pedigree A
  h²=0.2
    0.0        0.048    0.007     0.040    0.008     0.066    0.017
    0.3        0.055    0.009     0.064    0.014     0.079    0.022
    0.5        0.066    0.012     0.097    0.025     0.111    0.035
    0.7        0.085    0.018     0.135    0.034     0.159    0.056
  h²=0.4
    0.0        0.049    0.009     0.045    0.008     0.070    0.019
    0.3        0.082    0.018     0.116    0.028     0.138    0.047
    0.5        0.148    0.043     0.238    0.076     0.288    0.127
    0.7        0.253    0.087     0.409    0.177     0.500    0.275
  h²=0.6
    0.0        0.049    0.008     0.042    0.007     0.065    0.017
    0.3        0.155    0.046     0.235    0.078     0.271    0.112
    0.5        0.365    0.147     0.556    0.288     0.607    0.372
    0.7        0.639    0.368     0.839    0.613     0.885    0.732
Pedigree B
  h²=0.2
    0.0        0.100    0.026     0.043    0.008     0.066    0.016
    0.3        0.105    0.031     0.066    0.012     0.087    0.026
    0.5        0.119    0.037     0.100    0.024     0.137    0.044
    0.7        0.135    0.046     0.155    0.042     0.213    0.079
  h²=0.4
    0.0        0.100    0.026     0.046    0.007     0.066    0.018
    0.3        0.134    0.045     0.130    0.035     0.165    0.057
    0.5        0.197    0.075     0.270    0.099     0.359    0.171
    0.7        0.293    0.129     0.479    0.223     0.612    0.372
  h²=0.6
    0.0        0.101    0.026     0.043    0.008     0.061    0.016
    0.3        0.202    0.080     0.245    0.080     0.294    0.125
    0.5        0.384    0.191     0.573    0.306     0.660    0.425
    0.7        0.619    0.381     0.874    0.665     0.932    0.812


Table 3 Type I error rates and powers under different heritabilities with a sample size of 16×12

ρaxy           FEXAT              FEXAT(R)           LR
               α=0.05   α=0.01    α=0.05   α=0.01    α=0.05   α=0.01
Sibship
  h²=0.2
    0.0        0.051    0.009     0.046    0.009     0.049    0.008
    0.3        0.096    0.026     0.125    0.033     0.093    0.025
    0.5        0.189    0.064     0.228    0.080     0.184    0.063
    0.7        0.322    0.137     0.372    0.161     0.315    0.135
  h²=0.4
    0.0        0.054    0.010     0.046    0.008     0.053    0.009
    0.3        0.273    0.108     0.306    0.125     0.269    0.108
    0.5        0.618    0.366     0.655    0.390     0.613    0.366
    0.7        0.897    0.733     0.909    0.746     0.896    0.734
  h²=0.6
    0.0        0.055    0.010     0.048    0.008     0.055    0.010
    0.3        0.571    0.330     0.600    0.340     0.569    0.332
    0.5        0.954    0.857     0.957    0.855     0.953    0.857
    0.7        1.000    0.997     1.000    0.996     1.000    0.997
Pedigree A
  h²=0.2
    0.0        0.050    0.010     0.043    0.008     0.054    0.013
    0.3        0.065    0.013     0.088    0.022     0.084    0.024
    0.5        0.091    0.022     0.142    0.041     0.151    0.049
    0.7        0.134    0.038     0.241    0.087     0.257    0.102
  h²=0.4
    0.0        0.051    0.010     0.044    0.009     0.057    0.013
    0.3        0.128    0.037     0.201    0.068     0.205    0.072
    0.5        0.275    0.103     0.446    0.209     0.484    0.257
    0.7        0.476    0.240     0.724    0.473     0.777    0.560
  h²=0.6
    0.0        0.052    0.010     0.040    0.008     0.057    0.012
    0.3        0.283    0.113     0.434    0.203     0.440    0.219
    0.5        0.644    0.392     0.853    0.648     0.866    0.694
    0.7        0.917    0.767     0.992    0.949     0.996    0.972
Pedigree B
  h²=0.2
    0.0        0.106    0.030     0.044    0.007     0.055    0.012
    0.3        0.119    0.037     0.095    0.026     0.107    0.029
    0.5        0.144    0.052     0.174    0.052     0.206    0.075
    0.7        0.185    0.071     0.287    0.106     0.355    0.163
  h²=0.4
    0.0        0.104    0.030     0.046    0.009     0.057    0.012
    0.3        0.141    0.046     0.230    0.078     0.255    0.101
    0.5        0.206    0.078     0.505    0.254     0.588    0.349
    0.7        0.297    0.134     0.815    0.578     0.884    0.713
  h²=0.6
    0.0        0.105    0.030     0.044    0.007     0.054    0.012
    0.3        0.210    0.081     0.453    0.218     0.488    0.264
    0.5        0.390    0.198     0.871    0.667     0.916    0.776
    0.7        0.612    0.386     0.903    0.964     0.999    0.989

Table 4 Summary of the performances of the three methods under different situations

Sample size   Family structure   Favorable method(s)
Small         Sibship            FEXAT and FEXAT(R)
Large         Sibship            FEXAT(R), FEXAT and LR
Small         Pedigree           FEXAT(R)
Large         Pedigree           LR


Table 5 Type I error rates and powers under different heritabilities with sibships 24×4

ρaxy           FEXAT              FEXAT(R)           LR
               α=0.05   α=0.01    α=0.05   α=0.01    α=0.05   α=0.01
βcX = βcY = 0.0
  h²=0.2
    0.0        0.050    0.009     0.023    0.003     0.075    0.020
    0.3        0.056    0.012     0.041    0.006     0.088    0.025
    0.5        0.073    0.018     0.061    0.011     0.110    0.033
    0.7        0.103    0.025     0.085    0.016     0.146    0.049
  h²=0.4
    0.0        0.049    0.010     0.021    0.003     0.070    0.019
    0.3        0.096    0.024     0.068    0.012     0.127    0.044
    0.5        0.191    0.060     0.147    0.033     0.247    0.100
    0.7        0.340    0.132     0.277    0.077     0.412    0.211
  h²=0.6
    0.0        0.049    0.010     0.021    0.003     0.064    0.016
    0.3        0.197    0.064     0.131    0.028     0.238    0.098
    0.5        0.477    0.226     0.362    0.119     0.533    0.305
    0.7        0.785    0.529     0.677    0.353     0.823    0.630
βcX = βcY = 0.2
  h²=0.2
    0.0        0.106    0.033     0.024    0.004     0.076    0.021
    0.3        0.116    0.039     0.042    0.006     0.087    0.025
    0.5        0.135    0.046     0.062    0.010     0.111    0.034
    0.7        0.161    0.057     0.086    0.016     0.146    0.049
  h²=0.4
    0.0        0.104    0.031     0.022    0.003     0.071    0.021
    0.3        0.155    0.056     0.068    0.012     0.129    0.042
    0.5        0.243    0.100     0.147    0.033     0.246    0.100
    0.7        0.372    0.179     0.275    0.078     0.410    0.209
  h²=0.6
    0.0        0.102    0.031     0.022    0.003     0.067    0.017
    0.3        0.250    0.107     0.132    0.028     0.239    0.097
    0.5        0.487    0.270     0.363    0.120     0.534    0.304
    0.7        0.738    0.522     0.677    0.352     0.824    0.629

Table 6 Effects of covariates on type I error rates and powers with a sample size of 8×12

ρaxy           FEXAT              FEXAT(R)           LR
               α=0.05   α=0.01    α=0.05   α=0.01    α=0.05   α=0.01
Sibship
  h²=0.2
    0.0        0.061    0.013     0.042    0.005     0.078    0.024
    0.3        0.074    0.016     0.063    0.012     0.091    0.026
    0.5        0.094    0.024     0.088    0.019     0.112    0.035
    0.7        0.124    0.037     0.121    0.032     0.145    0.054
  h²=0.4
    0.0        0.062    0.013     0.041    0.005     0.073    0.021
    0.3        0.119    0.033     0.113    0.028     0.137    0.048
    0.5        0.228    0.087     0.220    0.076     0.247    0.110
    0.7        0.383    0.183     0.389    0.162     0.424    0.216
  h²=0.6
    0.0        0.062    0.012     0.040    0.006     0.068    0.019
    0.3        0.237    0.093     0.218    0.077     0.252    0.111
    0.5        0.532    0.293     0.529    0.262     0.571    0.331
    0.7        0.827    0.621     0.827    0.596     0.846    0.665
Pedigree A
  h²=0.2
    0.0        0.053    0.011     0.051    0.009     0.073    0.021
    0.3        0.066    0.015     0.059    0.011     0.079    0.023
    0.5        0.082    0.019     0.088    0.021     0.109    0.033
    0.7        0.105    0.026     0.145    0.039     0.165    0.056
  h²=0.4
    0.0        0.054    0.011     0.041    0.007     0.067    0.017
    0.3        0.100    0.023     0.131    0.035     0.144    0.050
    0.5        0.167    0.048     0.229    0.077     0.284    0.127
    0.7        0.271    0.096     0.423    0.188     0.499    0.277
  h²=0.6
    0.0        0.056    0.011     0.044    0.007     0.068    0.016
    0.3        0.168    0.048     0.217    0.071     0.262    0.107
    0.5        0.357    0.153     0.534    0.272     0.601    0.365
    0.7        0.602    0.345     0.834    0.603     0.890    0.732
Pedigree B
  h²=0.2
    0.0        0.104    0.032     0.044    0.008     0.062    0.015
    0.3        0.118    0.039     0.074    0.014     0.089    0.024
    0.5        0.135    0.044     0.108    0.026     0.137    0.046
    0.7        0.156    0.055     0.165    0.048     0.215    0.080
  h²=0.4
    0.0        0.108    0.030     0.048    0.008     0.068    0.018
    0.3        0.151    0.052     0.133    0.039     0.168    0.060
    0.5        0.213    0.088     0.286    0.103     0.361    0.174
    0.7        0.303    0.142     0.484    0.228     0.614    0.380
  h²=0.6
    0.0        0.105    0.031     0.043    0.008     0.061    0.015
    0.3        0.211    0.087     0.255    0.087     0.298    0.127
    0.5        0.382    0.191     0.571    0.308     0.666    0.429
    0.7        0.602    0.373     0.872    0.662     0.934    0.817

Note: βcX = βcY = 0.2, where βcX and βcY denote the regression coefficients of the gene-expression levels and trait values, respectively, on a specific covariate c.


Lu et al., 05/25/04

Table 7 Comparison of the performance of FEXAT(R) under the chi-square and permutation procedures*

ρaxy      Chi-square test       Permutation test
          α=0.05   α=0.01       α=0.05   α=0.01
0.0       0.045    0.008        0.051    0.010
0.3       0.116    0.028        0.139    0.042
0.5       0.238    0.076        0.290    0.125
0.7       0.409    0.177        0.521    0.296

*Note: h²=0.4, pedigree A with a sample size of 8×12.
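Table 7 contrasts results obtained from the chi-square reference distribution with results obtained by permutation. The sketch below shows the generic idea of a permutation p-value in Python, using the absolute Pearson correlation as a stand-in statistic. It is not FEXAT(R), and it shuffles trait values freely across individuals, whereas a family-based analysis would need a permutation scheme that respects pedigree structure. All function names and the default of 999 permutations are assumptions for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    The trait vector y is shuffled n_perm times; the p-value is the share of
    shuffles (plus the observed arrangement) whose |r| is at least as large
    as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    expression = [float(i) for i in range(20)]   # hypothetical expression levels
    trait = [v * v for v in expression]          # hypothetical, strongly related trait
    print(permutation_p_value(expression, trait))
```

Adding one to both numerator and denominator counts the observed arrangement as one of the permutations, so the p-value can never be exactly zero.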