Excess variants in AFF2 detected by massively parallel sequencing of ...

HMG Advance Access published July 10, 2012 Human Molecular Genetics, 2012 doi:10.1093/hmg/dds267

1–9

Excess variants in AFF2 detected by massively parallel sequencing of males with autism spectrum disorder Kajari Mondal, Dhanya Ramachandran, Viren C. Patel, Katie R. Hagen, Promita Bose, David J. Cutler and Michael E. Zwick∗ Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA Received February 6, 2012; Revised June 13, 2012; Accepted June 29, 2012

INTRODUCTION The recent surge of advances in second-generation sequencing technologies and better methods of targeted enrichment have made the detection of a more complete spectrum of genetic variation increasingly feasible (1,2). There is keen interest in applying these approaches to uncover the genetic basis of polygenic complex human disease, including autism (OMIM 209850), a childhood-onset disorder characterized by impaired social interactions, abnormal verbal communication, restricted interests and repetitive behaviors. Autism has an estimated prevalence of 1% (3,4), and one of its most striking epidemiological features is a 4-fold excess of affected males. Autism, or the broader autism spectrum disorder (ASD) phenotype, is an example of a highly heterogenous, multifactorial disorder with substantial heritability (5 – 8, reviewed in 9). To date, more than 100 different genes and genomic regions have been linked to this complex trait (reviewed in 10,11). Despite this, most of the genetic risk for ASD remains unexplained. Recent studies exploring ASD genetics generally adopt one of

two study designs. The first employs genome-wide association studies, which have identified a few loci of interest, but largely failed to replicate findings between studies (12– 14). A meta-analysis of these studies, with over 2500 study subjects, reveals it is extremely unlikely that there is any common variant influencing autism susceptibility with an odds ratio of .1.5 (15). The second design focuses on large but very rare (frequency usually less than 1 in a thousand in the general population) de novo and inherited copy number variants (CNVs). Numerous studies have now shown convincingly that this class of rare variation makes a significant contribution to autism susceptibility (16– 27), explaining up to 15% of all ASD cases. Unfortunately, these studies point to a highly heterogenous allelic architecture, as no single risk variant is found in more than 1% of surveyed cases. Overall, although genetic studies have uncovered a large number of candidate loci, much ASD heritability remains unexplained. Like other disorders with a male preponderance, there is evidence that mutations at X-linked loci in hemizygous males contribute to the observed ASD sex bias (24,28– 30). Here, we

∗ To whom correspondence should be addressed at: Department of Human Genetics, Emory University, Whitehead Biomedical Research Building, Suite 301, Atlanta, GA 30322, USA. Tel: +1 4047279924; Fax: +1 4047273949; Email: [email protected]

# The Author 2012. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

Downloaded from http://hmg.oxfordjournals.org/ at Emory University on July 11, 2012

Autism spectrum disorder (ASD) is a heterogeneous disorder with substantial heritability, most of which is unexplained. ASD has a population prevalence of one percent and affects four times as many males as females. Patients with fragile X E (FRAXE) intellectual disability, which is caused by a silencing of the X-linked gene AFF2, display a number of ASD-like phenotypes. Duplications and deletions at the AFF2 locus have also been reported in cases with moderate intellectual disability and ASD. We hypothesized that other rare X-linked sequence variants at the AFF2 locus might contribute to ASD. We sequenced the AFF2 genomic region in 202 male ASD probands and found that 2.5% of males sequenced had missense mutations at highly conserved evolutionary sites. When compared with the frequency of missense mutations in 5545 X chromosomes from unaffected controls, we saw a statistically significant enrichment in patients with ASD (OR: 4.9; P < 0.014). In addition, we identified rare AFF2 3 ′ UTR variants at conserved sites which alter gene expression in a luciferase assay. These data suggest that rare variation in AFF2 may be a previously unrecognized ASD susceptibility locus and may help explain some of the male excess of ASD.

2

Human Molecular Genetics, 2012

RESULTS We sequenced the AFF2 locus in a sample of 202 males with a diagnosis of autism. A total of 127 of the cases were obtained from the multiplex AGRE, while the remaining 75 cases were from the SSC. We identified a total of 286 sites of variation, with 269 single nucleotide variants (SNVs) and 17 insertions or deletions (indels). Overall levels of variation were similar between the two data sets [Qw per site (39); AGRE 6.0 × 1024, SSC 6.7 × 1024], with an excess of rare variants as evidenced by a negative value for the Tajima’s D test statistics for both sets of samples [(40); AGRE 1.46, SSC 1.41]. For the SNVs, a total of 128, or 48%, had not been reported before. As

Figure 1. Summary of SNV and indel variation discovered at the AFF2 locus in males with ASD. The frequency of SNVs and indels (minor alleles) in cases is plotted against their level of evolutionary conservation. Most common variation has already been discovered and exists in public databases (blue; circles and diamonds). Most of the rare variation at AFF2 was discovered in our study and not contained in public databases (red; circles and diamonds). Table 1. Rare non-synonymous variants observed at the AFF2 locus Sample collection

Genome position (hg19)

Amino acid change

PhyloP

Inherited/ de novo

(SSC, 11251.p1) (AGRE, AU1016302) (SSC, 11554.p1) (AGRE, AU1284302) (SSC, 11012.p1)

148037554 148037715 148038084 148044334 148048494

S660T D714N R837C R927H I1030L

2.33 2.37 2.48 2.62 1.86

Inherited Inherited Inherited Inherited De novo

summarized in Figure 1, almost all common variation (.5% frequency in our population) is contained in dbSNP, whereas most rare variants (,5%) have not been cataloged in dbSNP. Our study is focused on rare variation, so we did not followup the 141 previously cataloged variants. Functional annotation of the remaining 128 SNVs revealed that 16 were located at sites with elevated evolutionary conservation (Supplementary Material, Table S1). Each of these variants was observed in a single case. To arrive at a better estimate of their population frequency, we genotyped 13 of the variants in a collection of 1400 unaffected male controls, obtained from the NIMH Human Genetics Initiative (Supplementary Material, Table S1). Only one intronic variant had a frequency .0.01, suggesting that the variants we identified are very rare in the general population. We next restricted our attention to missense mutations. In our cases, there were 5 (2.5% of total cases sequenced) singleton non-synonymous variants. We estimated that the population frequency for two of these variants was $0.001, with the other three having frequencies below 0.0001 (Supplementary Material, Table S1). We further noted that these mutations were all at highly conserved evolutionary sites (PhyloP .1, Table 1). To ask whether this distribution


further explore the hypothesis that rare mutations in X-linked genes contribute to ASD by performing targeted sequencing of the AFF2 (formerly FMR2) locus in two large autism cohorts. Our study is motivated by findings from FRAXE (OMIM #309548), a rare form of mild-to-moderate intellectual disability (ID) affecting 1 in 50 000 newborn males. Hallmarks of the FRAXE phenotype include learning difficulties, communication deficits, attention problems, hyperactivity and autistic behavior. The initial discovery and mapping of the FRAXE locus identified a CGG expansion upstream of AFF2 that silences the gene resulting in a complete loss of function. This simultaneously explained the ID and the presence of a chromosomal fragile site (31– 33). Large deletions that include both AFF2 and the adjacent FMR1 gene result in a severe ID (34 – 36), whereas deletions of the AFF2 locus only present with autism and mild ID (37). A duplication of the AFF2 locus has also been reported in a young male with ID (38). Thus, since complete loss of AFF2 leads to a mild non-syndromic form of ID often presenting with autistic features, we hypothesized that hypomorphic alleles with reduced function might act as autism susceptibility loci. If correct, our hypothesis predicts that: (i) males with autism will have an excess of rare variants, (ii) these rare variants will be found at evolutionarily conserved sites, and (iii) these variants will influence gene function. Because males are hemizygous for AFF2, whereas females are diploid, rare variants at this locus could help explain the preponderance of males with ASD, if those damaging rare variants are largely recessive in nature. Here, we report the results of a targeted second-generation sequencing study of the AFF2 locus in male ASD patients ascertained using different sampling designs from the Autism Genetic Resource Exchange (AGRE) and the Simons Simplex Collection (SSC). As predicted, we see an overrepresentation of rare variants, and missense mutations in particular, in ASD cases, with a more than 4-fold increase in ASD risk among carriers of rare AFF2 protein-coding changes. We observe both inherited and de novo risk alleles in our sample. Furthermore, we also identify two rare 3′ UTR variants that appear to alter the expression of a reporter in a luciferase assay in a tissue specific fashion. Taken together, these data show that AFF2 is a susceptibility locus conferring increased risk for ASD in males. These data also suggest that rare coding sequence variations contribute to susceptibility for complex traits like autism, and such variations may account for some of the high heritability for these disorders.


3

Table 2. Odds ratio (OR) for missense mutations at AFF2 for males with ASD when compared with unaffected controls Study

Present study (n ¼ 202) Control X chromosomes (n ¼ 5545)

PhyloP .1 Missense variants (PhyloP .1)

Odds ratio versus controls (95% CI)

P-value

PhyloP .2 Missense variants (PhyloP .2)

Odds ratio versus controls (95% CI)

P-value

5 33

4.2 (1.28–11.09) Reference

0.010 N/A

4 23

4.9 (1.21–14.37) Reference

0.014 N/A

was unusual, we compared our case data to sequence data from 5545 X chromosomes of European American controls, obtained from the Exome Variant Server from the NHLBI Exome Sequencing Project. As shown in Table 2, for variants with PhyloP scores .1.0 (hence the top 10% of conserved sites in the human genome), there is a 4.2-fold excess of mutations in ASD cases (95% CI: 1.28 – 11.09, P-value 0.01). If we further restrict our analysis to sites with PhyloP scores .2, there is a 4.9-fold excess of rare variants in ASD cases (95% CI: 1.21 – 14.37, P-value 0.014, Table 2). One potential explanation for our finding is that the sequenced cases are poorly matched with the controls, and the cases are systemically more variable at many or all X chromosome loci. To test this hypothesis, we sequenced a subset of our cases (75 SSC males) for virtually all coding loci on the X (1006 loci total). For each locus, we performed the same analysis as we did at AFF2, asking whether the cases were enriched for mutations with PhyloP .1, or .2. Our analysis rejects the hypothesis that our cases are systematically more variable than the controls (Supplementary Material, Fig. S1). At the vast majority of loci, controls have the same or more variation than cases. Furthermore, AFF2 appears to be especially enriched for variation in our cases. Of the 1006 loci, it exhibits the 77th greatest enrichment for variation relative to controls for sites with PhyloP .1 (P , 0.08) and the 21st most enrichment for variants with a PhyloP .2

(P , 0.02). Because the 75 SSC males were a subset of the total samples used in this analysis, we also asked whether this subset shows evidence of population stratification relative to the whole case sample. To do so, we obtained whole genome association X chromosome genotype data for the AGRE case samples, and assessed stratification for all of our case samples (41– 43). We observed no evidence of systematic stratification between our SSC cases and AGRE cases (Supplementary Material, Fig. S2), suggesting that we can reject the hypothesis that a simple model of stratification can account for our findings. To confirm the patterns of segregation of putative AFF2 susceptibility variants, we genotyped the mother and available siblings for each of the five missense mutations to determine whether they showed segregation as expected (Fig. 2). For two of the missense mutations (R927C, D714N), we observed segregation from the mother to both affected sons. For another missense mutation (S660T), we observed transmission of the risk allele from the mother to the affected son, and transmission of the reference allele to the unaffected son (Fig. 2). Missense mutation R837C was transmitted from mother to affected son. For the AGRE cases, these patterns of transmission were expected because of the manner in which we selected our cases for sequencing. Interestingly, missense mutation I1030L in a SSC case arose as a de novo mutation in the male patient with autism, providing further evidence that AFF2 is an autism susceptibility locus.


Figure 2. Segregation pattern of the non-synonymous AFF2 variants in families from AGRE and the SSC ASD patient collections. Note that the I1030L variant is a de novo variant in the male with ASD.

4


Given that we observed a statistically significant excess of rare missense mutations in AFF2 in patients with autism, we next wondered whether, among the collection of rare variants we discovered, non-coding variants were also autism susceptibility variants. One way non-coding alleles could exert influence is by changing the expression of AFF2. To test this hypothesis, we performed a luciferase assay using constructs containing two of the three 3′ UTR variants and a third with the human genome reference sequence as a comparison in two different cell lines. No other sequence variants were present in the 3′ UTR. As shown in Figure 3, the 3′ UTR variant (ChrX:148076068[C.T]) showed a statistically significant reduction in gene expression in the luciferase assay performed in Human HEK293T cells (P , 0.02), while the 3′ UTR variant (ChrX:148075200[T.C]) did not show a statistically significant reduction in expression (P , 0.31). We replicated these experiments using Mouse Neuro 2A cells and both 3′ UTR variants showed a strong increase in expression relative to a construct containing the human genome reference sequence (ChrX:148076068[C.T], P , 0.03; ChrX:148075200[T.C], P , 0.02). Together, these data suggest these rare 3′ UTR variants could act to alter the expression of AFF2, perhaps in a tissue-specific manner, and provide further support for the role of AFF2 as an autism susceptibility locus.

DISCUSSION Uncovering the genetic basis of polygenic disorders remains a challenge for research. The idea that common genetic variation contributes significantly to common diseases has been a leading model in human genetics for the past 15 years

(44 –46); evaluating this model required the development of high-throughput genotyping platforms and a catalog of common human genetic variation as seen in the HapMap project (47,48). The ‘common-disease-common-variant’ model was bolstered by technology development, since direct sequencing was not a viable strategy. Therefore, systematic genotyping of common variants was perceived as the best way to begin to characterize the allelic architecture of complex human traits (49). In the time elapsed since, genome-wide association studies have shown that common variants with large effects are unlikely to exist in the human population for many disorders, although a large number of loci with alleles with much smaller ORs (,1.2) remains plausible (reviewed in 50). Following Haldane in the 1920s, population biologists have long been aware that deleterious alleles of large effect will be maintained only at very low frequencies in the general population (51). Most quantitative traits, including human diseases, show substantial heritability in most populations (52). This observation suggests a possible paradox arising from the belief that these traits are influenced by stabilizing selection for an optimum, or simple directional selection to remove disease alleles, both of which should act to eliminate genetic variation (52,53). One model to account for the maintenance of variation supposes that variation is maintained by a very large number of loci, each with rare alleles of exceedingly small effect (too small to ever be individually detectable in an experiment), and which are individually at mutation-selection balance frequency (54). A later analysis, in contrast, argued that even if mutation-selection balance maintains much variation, then there are likely to be rare alleles with appreciable effects (55), and/or alleles at intermediate frequency created by balancing selection due to their pleiotropic effects on


Figure 3. Histogram showing the level of expression for 3′ UTR variants in luciferase assays. Expression levels are normalized to a construct with the human genome reference sequence. The 3′ UTR variant (ChrX:148076068[C.T]) showed statistically significant reduction of gene expression in the luciferase assay performed in human HEK293T cells (P , 0.02), while the 3′ UTR variant (ChrX:148075200[T.C]) did not (P , 0.31). Repeating these experiments using mouse Neuro 2A cells resulted in both 3′ UTR variants showing a strong increase of expression relative to a construct containing the human genome reference sequence (ChrX:148076068[C.T], P , 0.03; ChrX:148075200[T.C], P , 0.02), suggesting that these variants influence gene expression in a tissue specific manner.


AFF2 gene has a common ancestor in Drosophila melanogaster, the Lilliputian. Inactivation of lilli generates a fly of reduced size. By analogy with AF4, AFF2 has been considered a putative transcription factor (64), with the N-terminal domain of the protein (amino acids 1 – 541) known to have transactivation activity (65). Four of our mutations (S660T, D714N, R837C,R927H) fall in the C-terminal domain of the protein (amino acids 633– 966) shown to be responsible for localization of the protein in nuclear speckles (64); none of our mutations fell into putative RNA-binding domains between amino acids 787 – 815 and 890– 917 (64). The careful functional characterization of these and other alleles of AFF2 should help elucidate novel pathways and candidates that, when disrupted, contribute to autism susceptibility.

MATERIALS AND METHODS Selection of samples We selected 127 males from the Autism Genetic Resource Exchange (AGRE) multiplex collection and 75 males from the Simons Foundation Autism Research Initiative (SFARI) Simplex Collection, New York, NY, USA (SSC) for target DNA amplification and DNA sequencing. From the AGRE collection, we chose multiplex families with two or more male affected sib-pairs who shared .99% of 76 genotyped single nucleotide polymorphisms (SNPs) in the AFF2 genomic region (66). One male was randomly chosen if both affected siblings were equally affected; otherwise, the male with autism was chosen over those boys with a diagnosis of not quite autism or broad spectrum. From the SSC collection, we chose only those boys who were described as autistic and not reported to have any other syndromes. From the SSC collection, we chose 75 male children from different families with a diagnosis of ASD (67). We required subjects to have at least one unaffected sibling with no history of ASD or ID based on medical history and/or pedigree evaluation. Parents showed no signs of a broader autism phenotype. Prior to processing, DNA samples were quantified by measuring OD260/280 using a NanoDrop instrument. Following quantification, 100 ng of DNA from each sample were run on a 0.8% agarose gel to verify the integrity of the genomic DNA. Long PCR-based amplification of AFF2 genomic region from AGRE samples We prepared target DNA for sequencing the AGRE samples by performing long PCR (LPCR) amplification of the AFF2 genomic region. Long-range PCR primers were designed using EmPrimer (http://primer.genetics.emory.edu) using the following selection criteria: length, 29– 32 bases; GC percentage, 45– 60; and Tm $ 688C. The LPCR primers used are contained within Supplementary Material, Table S2. Five hundred nanograms of DNA from each of the AGRE samples were aliquoted and PCR master mix [1× LA Taq buffer (TaKaRa Bio Inc., Otsu Shiga, JP), 250 mM dNTP (TaKaRa Bio Inc.), 400 nM of both forward and reverse LPCR primers and 0.1 U/ml of LA Taq (TaKaRa Bio Inc.) were added. If an amplicon had a high GC content, we used 1× GC Buffer (TaKaRa Bio Inc.) in place of 1× LA Taq buffer. PCR was


multiple phenotypes (52). In support of the former view, copy number variation studies have identified variants with a large effect size, having odds ratios often greater than 5 (16– 27); however, these variants are quite rare, often occurring much less often than 1 in 1000 in the general population, a frequency generally consistent with a large effect locus at mutation selection balance. It has been reported that in aggregate, large CNVs account for 15% of the cases of autism. Since WGA studies generally preclude the existence of rampant intermediate frequency loci of large effect, Turelli’s theory predicts the existence of loci with multiple rare alleles, and intermediate ORs should make a significant contribution to polygenic disorders such as autism. Our study of AFF2 has identified one such locus. In the 202 autistic males sequenced, we detected five rare replacement variants: S660T, D714N, R837C, R927H and I1030L. These variants appear in $2.5% of boys with a diagnosis of autism. One variant (I1030L) arose as a de novo event. We have also identified rare 3′ UTR sequence variants that appear to alter levels of gene expression in a luciferase assay in a tissue-specific manner. AFF2 missense mutations have also been seen in other males with neurodevelopmental disorders (56,57). The observation of deletions of just the AFF2 locus in a patient with autism and mild ID (37) and a duplication of the AFF2 locus in a young male with ID (38) provide further independent support for our findings. We do not claim that the variants we have identified are monogenic causes of autism. Instead, our data support the idea that AFF2 harbors susceptibility alleles that make modest contributions to the risk of ASD. Furthermore, because AFF2 is found on the X chromosome, recessive acting effects would be expected to preferentially influence the risk of ASD in males. Even so, the size of the effects we observed could be overestimated for at least two reasons. First, the observed excess of rare variation in the human genome might make statistical significance estimates more challenging than modeled here (58). Second, our sample size rendered this study underpowered for detecting very weak effects. Therefore, there is the real possibility that the true OR of AFF2 mutations might be even smaller than we have estimated due to ‘Winner’s Curse’ (59). Finally, we note that we did not perform a genome-wide study, and had we done so with only our sample size, the magnitude of the effect estimated here would not have survived a multiple test correction. Detecting effect sizes this small would require vastly larger numbers of samples for a whole genome study. While recent exome sequencing studies identified de novo variants that appear enriched in patients with ASD (60 – 63), it is worth noting that the variant we observed could have arisen by chance. Replication of our finding regarding AFF2 in a larger collection of individuals with ASD could help resolve a number of these potential issues and provide further support for our findings. Much remains to be discovered concerning the structure and function of the AFF2 protein. We know AFF2 is a large gene with 21 exons and 6 annotated isoforms with alternative splicing among exons 2, 3, 5 and 7 (31 –33). The longest of the AFF2 isoforms is composed of 1311 amino acids and contains 2 nuclear localization signal sequences (33). AFF2 belongs to a gene family that includes AF4, LAF4 and AF5q31. The

5

6


performed using the following parameters: denaturation at 948C for 2 min, 29 cycles of 948C for 10 s, 688C for 1 min per kilobase of amplicon size and final extension at 688C for 5 min. Amplification was confirmed by using 1% agarose 96-well E-Gels (Invitrogen, Carlsbad, CA, USA). After purification, concentration of each of the amplicons was quantitated using PicoGreen dsDNA Quantitation reagent (Invitrogen) and the Tecan Ultra Evolution plate reader. An equimolar amount of each amplicon was then pooled by sample. Pooled amplicons were then purified using the Invitrogen PureLink PCR Purification Kit (Invitrogen) with the HC buffer. LPCR-enriched, purified products were then sheared using Covaris E210 (Duty cycle 10%, Intensity cycle 5, cycle/ burst: 200, time: 180 s).

We prepared target DNA for sequencing the SSC samples by using RainDance Technology’s (RDT) microdroplet-based technology to enrich for the human X chromosome exome as described previously (68). For 15 samples, a slightly modified version of the manufacturer’s protocol was used where the purified PCR product was run in 1% agarose gel, and the bands between 200 and 700 bp were extracted using a Qiagen Gel purification kit. The ends of the DNA fragments were repaired at 258C for 30 min using the New England BioLabs End Repair Module (New England BioLabs, Ipswich, MA, USA, catalogue #E6050L), followed by purification using the Qiagen MinElute columns (Qiagen, Valencia, CA, USA). The PCR fragments were then concatenated at 208C for 30 min using NEB Quick Ligation Kit (NEB, Ipswich, MA, USA, catalogue #M2200L). The ligated products were purified using the Qiagen MinElute columns and were made up to 100 ml volume by adding Qiagen elution buffer, and then fragmented using a Covaris E210 machine (duty cycle: 10%, intensity cycle: 5, cycle/burst: 200, number of treatments: 4, total time per treatment: 45 s). Illumina library preparation, sequencing and data analysis The sheared fragments from the LPCR and RainDance libraries were purified using a Qiagen QIAquick PCR purification column and eluted in 32 ml of elution buffer (Qiagen). The standard Illumina Genome Analyzer (IGA) multiplex library preparation protocol was then used to prepare samples for sequencing with only one modification: while purifying the adaptor-ligated products, we used Invitrogen E-Gel SizeSelect 2% (Invitrogen, catalogue #G6610-02) in place of the gel purification method suggested by Illumina. We confirmed sample enrichment by running an Agilent BioAnalyzer 7500 DNA chip. A quantitative qPCR was performed to quantitate the library using a KAPA Library Quantification Kit (Kapa Biosystems, Woburn, MA, USA, catalogue # KK4824). Enriched DNA was denatured and diluted to a concentration of 8 pM. Cluster generation and 70 bp or 100 bp single-end sequencing was performed using standard IGA or Illumina HiSeq protocols. We performed multiplex single-end sequencing of 12 samples per lane for the AGRE samples and 3 – 4 samples per Illumina lane for the SSC samples.

Variant annotation, validation and segregation analysis All common and rare single base pair and indel sequence variants identified by PEMapper were annotated into functional classes using the sequence annotator program, SeqAnt (69). Selected highly conserved rare variants (phastCon score . 0.65) were independently validated by Sanger sequencing. The primers for validation were designed using Primer 3 software (http://frodo.wi.mit.edu/). After validating replacement and UTR variants in cases, we next sequenced the mothers and affected or unaffected male siblings to verify the segregation pattern of the variant. The de novo variant at I1030L was confirmed by Sanger sequencing using a DNA sample obtained from blood for both the mother and male offspring. Control genotyping To aid in the estimation of allele frequencies in matched controls, we performed genotyping of male samples of European ancestry obtained from the NIMH Human Genetics Initiative (https://www.nimhgenetics.org/available_data/controls/). These control samples used were identical to those used in a previous study (70). Genotyping was performed by the iPLEX Gold method (Sequenom, San Diego, CA, USA) per the manufacturer’s instructions, using primers (see Supplementary Material, Table S4) designed with the Sequenom Assay Design 3.1 software. A positive control was included in every plate to confirm the sensitivity of the assay. The genotyping for the replacement variants, S660T and I1030L, could not be performed, due to failure of the primer design assay by both TaqMan and Sequenom. Statistical analysis All population genetic analyses were calculated using the popgen_fasta2.0.c code (Cutler, unpublished work) on the fasta files as previously described (71). This code calculated the average number of pairwise differences and Watterson’s estimator of the population mutation rate (Qw per site) for the entire sequenced region and different annotated SNV functional classes, while accounting for missing data (39). A point estimate for Tajima’s D was determined for all the data and different SNV functional classes (40). Control data from


RainDance microdroplet-based PCR enrichment of ChrX exome from SSC samples

After the completion of Illumina sequencing, the filtered reads were mapped and variants called using PEMapper (Cutler et al., submitted). The AFF2 reference sequence used for the AGRE samples consists of 10 discontiguous fragments covering 84.8 kb. For the AGRE samples, 99% of the bases had more than 8× coverage. Median depth of coverage was in the range of 388 – 1548, while the range of average depth of coverage was 459– 1907 (Supplementary Material, Table S3). Since the entire human X chromosome exome was sequenced in the SSC samples, we used a reference sequence consisting of 5748 discontiguous fragments covering 4.7 Mb. For the SSC samples, between 83 and 97% of the targeted reference bases had more than 8× coverage. Median depth of coverage was in the range of 20– 607, while the range of average depth of coverage was 21– 675 (Supplementary Material, Table S3).


Luciferase assays We performed luciferase assays in HEK293T and in Neuro-2a cell lines to determine whether the AFF2 3′ UTR variants we identified reduced gene expression relative to a 3′ UTR construct with the reference sequence. Plasmid construction Full-length 3′ UTR was amplified for two rare UTR variants with low (ChrX: 148075200 [T.C]) and high evolutionary conservation (ChrX: 148076068[C.T]). The resulting products were cloned into the pmirGLO Dual-Lucifease vector (Promega, Madison, WI, USA, Cat #E1330). As a control, a full-length 3′ UTR that consisted of the human genome reference sequence was amplified from an unaffected HapMap normal control and also cloned into the same vector. We were unable to successfully generate a construct for one of the variants we discovered (ChrX:148075804[T.A]). Sanger sequencing confirmed that each variant construct contained the correct novel variant site. Cell culture and transfections HEK293T and Neuro-2a cells were cultured at 378C with 5% CO2 in DMEM and RPMI media, respectively, with 10% fetal bovine serum. Twenty-four hours before transfection, 2 × 105 cells were plated in 1 ml of media in each well of 12-well cell culture dishes. Transfections were carried out in Opti-MEM (Invitrogen), using 8 ml of Lipofectamine 2000 (Invitrogen) and 0.5 mg of plasmid. Each plasmid was transfected into three separate wells. Just before transfection, the old media replaced with fresh media (DMEM or RPMI with 10% fetal bovine serum). Twenty-four hours after transfection, cells were harvested with 250 ml 1× Passive Lysis Buffer (Promega) by rocking at room temperature for 15 min. Lysates were cleared of cell debris by centrifugation at 14000 rpm for 5 min at 48C. Luciferase assay From each transfection, 20 ml of the lysate was added to a luminometer tube. To each luminometer tube, 100 ml of LAR II was added. A manual-load luminometer was used to measure the luminescence over a 10 s period, following a

2 s premeasurement delay. The luminometer measurement was repeated after the addition of 100 ml of Stop & Glo reagent. For each lysate, the firefly luciferase values were divided by the Renilla luciferase values. The results of the independent transfections were averaged and the standard deviation was calculated for each plasmid. We performed three independent transformations of each construct, with three replications of each transformation. The raw data were normalized relative to the reference sequence construct and a twotailed, unequal variance Student’s t-test was performed to determine whether the 3′ UTR containing vectors reduced gene expression relative to a construct with the human genome reference sequence.

SUPPLEMENTARY MATERIAL Supplementary Material is available at HMG online.

ACKNOWLEDGEMENTS The Emory Custom Cloning Core facility generated constructs to our specifications for our expression analysis. We thank members of the Cutler and Zwick labs for comments on the manuscript, Jennifer Mulle and Stephen T. Warren for discussion, Cheryl T. Strauss for editing and the Emory-Georgia Research Alliance Genome Center (EGC), supported in part by PHS Grant UL1 RR025008 from the Clinical and Translational Science Award program, National Institutes of Health, National Center for Research Resources, for performing the Illumina sequencing runs. The ELLIPSE Emory High Performance Computing Cluster was used for this project. The authors would like to thank the NHLBI GO Exome Sequencing Project and its ongoing studies, which produced and provided exome variant calls for comparison: the Lung GO Sequencing Project (HL-102923), the WHI Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926), and the Heart GO Sequencing Project (HL-103010). Conflict of Interest statement. None declared.

FUNDING This work was supported by the National Institutes of Health/ National Institutes of Mental Health (NIH/NIMH) and Gift Fund (grant number: MH076439, M.E.Z.); the Simons Foundation Autism Research Initiative (M.E.Z.); and the Training Program in Human Disease Genetics (grant number: 1T32MH087977, D.R.).

REFERENCES 1. Mamanova, L., Coffey, A.J., Scott, C.E., Kozarewa, I., Turner, E.H., Kumar, A., Howard, E., Shendure, J. and Turner, D.J. (2010) Target-enrichment strategies for next-generation sequencing. Nat. Methods, 7, 111– 118. 2. Shendure, J. and Ji, H. (2008) Next-generation DNA sequencing. Nat. Biotechnol., 26, 1135– 1145. 3. Kogan, M.D., Blumberg, S.J., Schieve, L.A., Boyle, C.A., Perrin, J.M., Ghandour, R.M., Singh, G.K., Strickland, B.B., Trevathan, E. and van


5545 X chromosomes were obtained from exome sequencing data of individuals of European ancestry publicly available at the NHLBI Exome Variant Server (72). P-values and odds ratios were calculated with Fisher’s exact test in the R statistical package. PhyloP scores (phyloP46wayPlacental) were obtained using the February 2009, GRCh37(hg19) assembly from the UCSC Genome Browser (73). Affymetrix 5.0 data generated by the BROAD Institute for AGRE samples were downloaded from AGRE website (19). A total of 215 SNPs that spanned the X chromosome were successfully genotyped in the AGRE cases and sequenced in our SSC cases. We used PLINK to perform LD-based pruning (indeppairwise option) resulting in a total of 179 SNPs that were in approximate linkage equilibrium (74,75). We then used EIGENSOFT 4.2 and GCTA for principal component analysis to determine whether there was any evidence for population stratification between our AGRE and SSC cases (41 – 43).

7

8

4. 5. 6. 7.

8.

9.

11. 12.

13. 14.

15. 16.

17.

18.

19.

20.

21.

22.

23.

Dyck, P.C. (2009) Prevalence of parent-reported diagnosis of autism spectrum disorder among children in the US, 2007. Pediatrics, 124, 1395– 1403. Fombonne, E. (2003) The prevalence of autism. JAMA, 289, 87–89. Bailey, A., Le Couteur, A., Gottesman, I., Bolton, P., Simonoff, E., Yuzda, E. and Rutter, M. (1995) Autism as a strongly genetic disorder: evidence from a British twin study. Psychol. Med., 25, 63–77. Constantino, J.N., Zhang, Y., Frazier, T., Abbacchi, A.M. and Law, P. (2010) Sibling recurrence and the genetic epidemiology of autism. Am. J. Psychiatry, 167, 1349– 1356. Lichtenstein, P., Carlstro¨m, E., Ra˚stam, M., Gillberg, C. and Anckarsa¨ter, H. (2010) The genetics of autism spectrum disorders and related neuropsychiatric disorders in childhood. Am. J. Psychiatry, 167, 1357– 1363. Hallmayer, J., Cleveland, S., Torres, A., Phillips, J., Cohen, B., Torigoe, T., Miller, J., Fedele, A., Collins, J., Smith, K. et al. (2011) Genetic heritability and shared environmental factors among twin pairs with autism. Arch. Gen. Psychiatry, 68, 1095–1102. Ronald, A. and Hoekstra, R.A. (2011) Autism spectrum disorders and autistic traits: a decade of new twin studies. Am. J. Med. Genet. B, 156B, 255– 274. Betancur, C. (2011) Etiological heterogeneity in autism spectrum disorders: more than 100 genetic and genomic disorders and still counting. Brain Res., 1380, 42– 77. State, M.W. and Levitt, P. (2011) The conundrums of understanding genetic risks for autism spectrum disorders. Nat. Neurosci., 14, 1499– 1506. Wang, K., Zhang, H., Ma, D., Bucan, M., Glessner, J.T., Abrahams, B.S., Salyakina, D., Imielinski, M., Bradfield, J.P., Sleiman, P.M.A. et al. (2009) Common genetic variants on 5p14.1 associate with autism spectrum disorders. Nature, 459, 528–533. Weiss, L.A., Arking, D.E., Daly, M.J. and Chakravarti, A. (2009) A genome-wide linkage and association scan reveals novel loci for autism. Nature, 461, 802–808. Anney, R., Klei, L., Pinto, D., Regan, R., Conroy, J., Magalhaes, T.R., Correia, C., Abrahams, B.S., Sykes, N., Pagnamenta, A.T. et al. (2010) A genome-wide scan for common alleles affecting risk for autism. Hum. Mol. Genet., 19, 4072–4082. Devlin, B., Melhem, N. and Roeder, K. (2011) Do common variants play a role in risk for autism? Evidence and theoretical musings. Brain Res., 1380, 78– 84. Christian, S.L., Brune, C.W., Sudi, J., Kumar, R.A., Liu, S., KaraMohamed, S., Badner, J.A., Matsui, S., Conroy, J., McQuaid, D. et al. (2008) Novel submicroscopic chromosomal abnormalities detected in autism spectrum disorder. Biol. Psychiatry, 63, 1111–1117. Kumar, R.A., KaraMohamed, S., Sudi, J., Conrad, D.F., Brune, C., Badner, J.A., Gilliam, T.C., Nowak, N.J., Cook, E.H., Dobyns, W.B. et al. (2008) Recurrent 16p11.2 microdeletions in autism. Hum. Mol. Genet., 17, 628–638. Marshall, C.R., Noor, A., Vincent, J.B., Lionel, A.C., Feuk, L., Skaug, J., Shago, M., Moessner, R., Pinto, D., Ren, Y. et al. (2008) Structural variation of chromosomes in autism spectrum disorder. Am. J. Hum. Genet., 82, 477– 488. Weiss, L.A., Shen, Y., Korn, J.M., Arking, D.E., Miller, D.T., Fossdal, R., Saemundsen, E., Stefansson, H., Ferreira, M.A., Green, T. et al. (2008) Association between microdeletion and microduplication at 16p11.2 and autism. N. Engl. J. Med., 358, 667–675. Bucan, M., Abrahams, B.S., Wang, K., Glessner, J.T., Herman, E.I., Sonnenblick, L.I., Alvarez Retuerto, A.I., Imielinski, M., Hadley, D., Bradfield, J.P. et al. (2009) Genome-wide analyses of exonic copy number variants in a family-based study point to novel autism susceptibility genes. PLoS Genet., 5, e1000536. Glessner, J.T., Wang, K., Cai, G., Korvatska, O., Kim, C.E., Wood, S., Zhang, H., Estes, A., Brune, C.W., Bradfield, J.P. et al. (2009) Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature, 459, 569–573. Pinto, D., Pagnamenta, A.T., Klei, L., Anney, R., Merico, D., Regan, R., Conroy, J., Magalhaes, T.R., Correia, C., Abrahams, B.S. et al. (2010) Functional impact of global rare copy number variation in autism spectrum disorders. Nature, 466, 368–372. Moreno-De-Luca, D., Mulle, J.G., Kaminsky, E.B., Sanders, S.J., Myers, S.M., Adam, M.P., Pakula, A.T., Eisenhauer, N.J., Uhas, K., Weik, L. et al. (2010) Deletion 17q12 is a recurrent copy number variant that

24.

25.

26.

27.

28.

29.

30.

31. 32. 33. 34. 35.

36.

37. 38.

39. 40. 41. 42.

confers high risk of autism and schizophrenia. Am. J. Hum. Genet., 87, 618–630. Noor, A., Whibley, A., Marshall, C.R., Gianakopoulos, P.J., Piton, A., Carson, A.R., Orlic-Milacic, M., Lionel, A.C., Sato, D., Pinto, D. et al. (2010) Disruption at the PTCHD1 Locus on Xp22.11 in Autism spectrum disorder and intellectual disability. Sci. Transl. Med., 2, 49ra68. Sanders, S.J., Ercan-Sencicek, A.G., Hus, V., Luo, R., Murtha, M.T., Moreno-De-Luca, D., Chu, S.H., Moreau, M.P., Gupta, A.R., Thomson, S.A. et al. (2011) Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron, 70, 863– 885. Levy, D., Ronemus, M., Yamrom, B., Lee, Y.-H., Leotta, A., Kendall, J., Marks, S., Lakshmi, B., Pai, D., Ye, K. et al. (2011) Rare de novo and transmitted copy-number variation in autistic spectrum disorders. Neuron, 70, 886–897. Celestino-Soper, P.B.S., Shaw, C.A., Sanders, S.J., Li, J., Murtha, M.T., Ercan-Sencicek, A.G., Davis, L., Thomson, S., Gambin, T., Chinault, A.C. et al. (2011) Use of array CGH to detect exonic copy number variants throughout the genome in autism families detects a novel deletion in, TMLHE. Hum. Mol. Genet., 20, 4360–4370. Jamain, S., Quach, H., Betancur, C., Ra˚stam, M., Colineaux, C., Gillberg, I.C., Soderstrom, H., Giros, B., Leboyer, M., Gillberg, C. et al. (2003) Mutations of the X-linked genes encoding neuroligins NLGN3 and NLGN4 are associated with autism. Nat. Genet., 34, 27– 29. Laumonnier, F., Bonnet-Brilhault, F., Gomot, M., Blanc, R., David, A., Moizard, M.-P., Raynaud, M., Ronce, N., Lemonnier, E., Calvas, P. et al. (2004) X-linked mental retardation and autism are associated with a mutation in the NLGN4 gene, a member of the neuroligin family. Am. J. Hum. Genet., 74, 552–557. Chung, R.-H., Ma, D., Wang, K., Hedges, D.J., Jaworski, J.M., Gilbert, J.R., Cuccaro, M.L., Wright, H.H., Abramson, R.K., Konidari, I. et al. (2011) An X chromosome-wide association study in autism families identifies TBL1X as a novel autism spectrum disorder candidate gene in males. Mol. Autism, 2, 18. Gu, Y., Shen, Y., Gibbs, R.A. and Nelson, D.L. (1996) Identification of FMR2, a novel gene associated with the FRAXE CCG repeat and CpG island. Nat. Genet., 13, 109– 113. Gecz, J., Gedeon, A.K., Sutherland, G.R. and Mulley, J.C. (1996) Identification of the gene FMR2, associated with FRAXE mental retardation. Nat. Genet., 13, 105 –108. Gecz, J., Bielby, S., Sutherland, G.R. and Mulley, J.C. (1997) Gene structure and subcellular localization of FMR2, a member of a new family of putative transcription activators. Genomics, 44, 201–213. Moore, S.J., Strain, L., Cole, G.F., Miedzybrodzka, Z., Kelly, K.F. and Dean, J.C. (1999) Fragile X syndrome with FMR1 and FMR2 deletion. J. Med. Genet., 36, 565–566. Probst, F.J., Roeder, E.R., Enciso, V.B., Ou, Z., Cooper, M.L., Eng, P., Li, J., Gu, Y., Stratton, R.F., Chinault, A.C. et al. (2007) Chromosomal microarray analysis (CMA) detects a large X chromosome deletion including FMR1, FMR2, and IDS in a female patient with mental retardation. Am. J. Med. Genet. A, 143A, 1358–1365. Cavani, S., Prontera, P., Grasso, M., Ardisia, C., Malacarne, M., Gradassi, C., Cecconi, M., Mencarelli, A., Donti, E. and Pierluigi, M. (2011) FMR1, FMR2, and SLITRK2 deletion inside a paracentric inversion involving bands Xq27.3-q28 in a male and his mother. Am. J. Med. Genet. A, 155A, 221–224. Stettner, G.M., Shoukier, M., Ho¨ger, C., Brockmann, K. and Auber, B. (2011) Familial intellectual disability and autistic behavior caused by a small FMR2 gene deletion. Am. J. Med. Genet. A, 155A, 2003– 2007. Whibley, A.C., Plagnol, V., Tarpey, P.S., Abidi, F., Fullston, T., Choma, M.K., Boucher, C.A., Shepherd, L., Willatt, L., Parkin, G. et al. (2010) Fine-scale survey of X chromosome copy number variants and indels underlying intellectual disability. Am. J. Hum. Genet., 87, 173– 188. Watterson, G.A. (1975) On the number of segregating sites in genetical models without recombination. Theor. Pop. Biol., 7, 256– 276. Tajima, F. (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123, 585– 595. Patterson, N., Price, A.L. and Reich, D. (2006) Population structure and eigenanalysis. PLoS Genet., 2, e190. Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., Shadick, N.A. and Reich, D. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet., 38, 904 – 909.


10.



62.

63.

64.

65. 66.

67. 68. 69.

70.

71.

72. 73.

74. 75.

exomes reveal a highly interconnected protein network of de novo mutations. Nature, 485, 246– 250. Sanders, S.J., Murtha, M.T., Gupta, A.R., Murdoch, J.D., Raubeson, M.J., Willsey, A.J., Ercan-Sencicek, A.G., DiLullo, N.M., Parikshak, N.N., Stein, J.L. et al. (2012) De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature, 485, 237–241. Iossifov, I., Ronemus, M., Levy, D., Wang, Z., Hakker, I., Rosenbaum, J., Yamrom, B., Lee, Y.-H., Narzisi, G., Leotta, A. et al. (2012) De novo gene disruptions in children on the autistic spectrum. Neuron, 74, 285– 299. Bensaid, M., Melko, M., Bechara, E.G., Davidovic, L., Berretta, A., Catania, M.V., Gecz, J., Lalli, E. and Bardoni, B. (2009) FRAXE-associated mental retardation protein (FMR2) is an RNA-binding protein with high affinity for G-quartet RNA forming structure. Nucleic Acids Res., 37, 1269– 1279. Hillman, M.A. and Gecz, J. (2001) Fragile XE-associated familial mental retardation protein 2 (FMR2) acts as a potent transcription activator. J. Hum. Genet., 46, 251–259. Geschwind, D.H., Sowinski, J., Lord, C., Iversen, P., Shestack, J., Jones, P., Ducat, L. and Spence, S.J., AGRE Steering Committee. (2001) The autism genetic resource exchange: a resource for the study of autism and related neuropsychiatric conditions. Am. J. Hum. Genet., 69, 463– 466. Fischbach, G.D. and Lord, C. (2010) The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron, 68, 192– 195. Mondal, K., Shetty, A.C., Patel, V., Cutler, D.J. and Zwick, M.E. (2011) Targeted sequencing of the human X chromosome exome. Genomics, 98, 260– 265. Shetty, A.C., Athri, P., Mondal, K., Horner, V.L., Steinberg, K.M., Patel, V., Caspary, T., Cutler, D.J. and Zwick, M.E. (2010) SeqAnt: a web service to rapidly identify and annotate DNA sequence variations. BMC Bioinformatics, 11, 471. Collins, S.C., Bray, S.M., Suhl, J.A., Cutler, D.J., Coffee, B., Zwick, M.E. and Warren, S.T. (2010) Identification of novel FMR1 variants by massively parallel sequencing in developmentally delayed males. Am. J. Med. Genet. A, 152A, 2512–2520. Zwick, M.E., Mcafee, F., Cutler, D.J., Read, T.D., Ravel, J., Bowman, G.R., Galloway, D.R. and Mateczun, A. (2005) Microarray-based resequencing of multiple Bacillus anthracis isolates. Genome Biol., 6, R10. Exome Variant Server, NHLBI Exome Sequencing Project (ESP), Seattle, WA. Accessed 12/2011. Available: http://evs.gs.washington.edu/EVS/ via the Internet. Dreszer, T.R., Karolchik, D., Zweig, A.S., Hinrichs, A.S., Raney, B.J., Kuhn, R.M., Meyer, L.R., Wong, M., Sloan, C.A., Rosenbloom, K.R. et al. (2012) The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res., 40, D918–D923. Plink 1.0.7. Available: http://pngu.mgh.harvard.edu/purcell/plink via the Internet. Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A., Bender, D., Maller, J., Sklar, P., de Bakker, P.I., Daly, M.J. et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet., 81, 559– 575.


43. Yang, J., Weedon, M.N., Purcell, S., Lettre, G., Estrada, K., Willer, C.J., Smith, A.V., Ingelsson, E., O’Connell, J.R., Mangino, M. et al. (2011) Genomic inflation factors under polygenic inheritance. Eur. J. Human Genet., 19, 807–812. 44. Risch, N. and Merikangas, K. (1996) The future of genetic studies of complex human diseases. Science, 273, 1516–1517. 45. Lander, E.S. (1996) The new genomics: global views of biology. Science, 274, 536–539. 46. Kruglyak, L. (1999) Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet., 22, 139– 144. 47. Consortium, I.H. (2005) A haplotype map of the human genome. Nature, 437, 1299– 1320. 48. Consortium, I.H., Frazer, K.A., Ballinger, D.G., Cox, D.R., Hinds, D.A., Stuve, L.L., Gibbs, R.A., Belmont, J.W., Boudreau, A., Hardenbol, P. et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature, 449, 851– 861. 49. Zwick, M.E., Cutler, D.J. and Chakravarti, A. (2000) Patterns of genetic variation in Mendelian and complex traits. Annu. Rev. Genomics Hum. Genet., 1, 387–407. 50. Visscher, P.M., Brown, M.A., McCarthy, M.I. and Yang, J. (2012) Five years of GWAS discovery. Am. J. Hum. Genet., 90, 7– 24. 51. Haldane, J.B.S. (1927) A mathematical theory of natural and artificial selection, part V: selection and mutation. Math. Proc. Cambridge, 23, 838–844. 52. Barton, N.H. and Turelli, M. (1989) Evolutionary quantitative genetics: how little do we know? Annu. Rev. Genet., 23, 337–370. 53. Barton, N.H. and Keightley, P.D. (2002) Understanding quantitative genetic variation. Nat. Rev. Genet., 3, 11– 21. 54. Lande, R. (1975) The maintenance of genetic variability by mutation in a polygenic character with linked loci. Genet. Res., 26, 221–235. 55. Turelli, M. (1984) Heritable genetic variation via mutation-selection balance: Lerch’s zeta meets the abdominal bristle. Theor. Pop. Biol., 25, 138–193. 56. Tarpey, P.S., Smith, R., Pleasance, E., Whibley, A., Edkins, S., Hardy, C., O’Meara, S., Latimer, C., Dicks, E., Menzies, A. et al. (2009) A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation. Nat. Genet., 41, 535– 543. 57. Piton, A., Gauthier, J., Hamdan, F.F., Lafreniere, R.G., Yang, Y., Henrion, E., Laurent, S., Noreau, A., Thibodeau, P., Karemera, L. et al. (2011) Systematic resequencing of X-chromosome synaptic genes in autism spectrum disorder and schizophrenia. Mol. Psychiatry, 16, 867 –880. 58. Keinan, A. and Clark, A.G. (2012) Recent explosive human population growth has resulted in an excess of rare genetic variants. Science, 336, 740–743. 59. Ioannidis, J.P., Ntzani, E.E., Trikalinos, T.A. and Contopoulos-Ioannidis, D.G. (2001) Replication validity of genetic association studies. Nat. Genet., 29, 306–309. 60. Neale, B.M., Kou, Y., Liu, L., Ma’ayan, A., Samocha, K.E., Sabo, A., Lin, C.F., Stevens, C., Wang, L.S., Makarov, V. et al. (2012) Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature, 485, 242–245. 61. O’Roak, B.J., Vives, L., Girirajan, S., Karakoc, E., Krumm, N., Coe, B.P., Levy, R., Ko, A., Lee, C., Smith, J.D. et al. (2012) Sporadic autism

9