DATABASES

Human Mutation OFFICIAL JOURNAL

UMD-CFTR: A Database Dedicated to CF and CFTR-Related Disorders

www.hgvs.org

Corinne Bareil,1 Corinne Thèze,2 Christophe Béroud,1–3 Dalil Hamroun,1,3 Caroline Guittard,1 Céline René,1 Damien Paulet,2 Marie des Georges,1,3 and Mireille Claustres1–3

1CHU Montpellier, Hôpital Arnaud de Villeneuve, Laboratoire de Génétique Moléculaire, Montpellier, F-34000 France; 2Université Montpellier 1, UFR Médecine, Laboratoire de Génétique Moléculaire, Montpellier, F-34000 France; 3Inserm, U 827, Montpellier, F-34000 France

Communicated by Mark H. Paalman
Received 14 April 2010; accepted revised manuscript 23 June 2010.
Published online 6 July 2010 in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/humu.21316

ABSTRACT: With the increasing knowledge of cystic fibrosis (CF) and CFTR-related diseases (CFTR-RD), the number of sequence variations in the CFTR gene is constantly rising. CF and particularly CFTR-RD pose a particular challenge because of many unclassified variants and identical genotypes associated with different phenotypes. Using the Universal Mutation Database (UMD®) software, we have constructed UMD-CFTR (freely available at the URL: http://www.umd.be/CFTR/), the first comprehensive relational CFTR database that allows an in-depth analysis and annotation of all variations identified in individuals whose CFTR genes have been analyzed extensively. The system has been tested on the molecular data from 757 patients (540 CF and 217 CBAVD) including disease-causing, unclassified, and nonpathogenic alterations (301 different sequence variations) representing 3,973 entries. Tools are provided to assess the pathogenicity of mutations. UMD-CFTR also offers a number of query tools and graphical views providing instant access to the list of mutations, their frequencies, positions, and predicted consequences, or to correlations between genotypes, haplotypes, and phenotypes. UMD-CFTR offers a way to compile not only disease-causing genotypes but also haplotypes. It will help the CFTR scientific and medical communities to improve sequence variation interpretation, evaluate the putative influence of haplotypes on mutations, and correlate molecular data with phenotypes. Hum Mutat 31:1011–1019, 2010. © 2010 Wiley-Liss, Inc.

KEY WORDS: CFTR; locus-specific database; haplotype; genotype/phenotype

Additional Supporting Information may be found in the online version of this article.
Correspondence to: Corinne Bareil, Laboratoire de Génétique Moléculaire, INSERM U827, IURC, 641 av Doyen Gaston Giraud, 34093 Montpellier cedex 5, France. E-mail: [email protected]; Mireille Claustres, Laboratoire de Génétique Moléculaire, INSERM U827, IURC, 641 av Doyen Gaston Giraud, 34093 Montpellier cedex 5, France. E-mail: [email protected]

Introduction

The CFTR (Cystic Fibrosis Transmembrane Conductance Regulator) gene (MIM# 602421) is one of the most extensively studied genes worldwide in the field of inherited monogenic disorders (e.g., searching “CFTR mutations” in PubMed through February 2010 retrieved more than 3,200 articles). Since its cloning in 1989 [Kerem et al., 1989; Riordan et al., 1989; Rommens et al., 1989], CFTR has emerged as a complex genetic element, not only because more than 1,700 mutations, polymorphisms, or unclassified variations have already been reported to the Cystic Fibrosis Mutation Database (http://www.genet.sickkids.on.ca/cftr/Home.html), but also because its phenotypic expression can be modulated by internal polymorphisms [Kiesewetter et al., 1993]. It has been a major challenge to understand the molecular and functional effects of the more prevalent mutations causing CF [Kartner et al., 1992; Serohijos et al., 2008]. For rare variants, little to nothing is known, as data are often based on a single patient, and diagnostic laboratories face a risk of misclassifying mutations and variations, especially those of the missense type. Several sequence changes initially reported as causing disease have subsequently been reported to be neutral sequence variants (a typical illustration is the variant p.Ile148Thr) [Claustres et al., 2004; Rohlfs et al., 2004], mutations with reduced penetrance (only some patients will develop CF or a CFTR-related disorder; example: p.Arg117His) [Kiesewetter et al., 1993; Rosenstein and Cutting, 1998; Thauvin-Robinet et al., 2009], or mutations with variable expressivity (some patients develop mild rather than severe symptoms; examples include p.Leu206Trp [Desgeorges et al., 1995; Rozen et al., 1995] and p.Asp1152His [Burgel et al., 2010; Mussaffi et al., 2006]). An ever-growing number of laboratories now offer complete scanning or sequencing of CFTR coding/flanking sequences not only in the context of cystic fibrosis (CF; MIM# 219700) or male infertility due to isolated congenital absence of the vas deferens (CBAVD; MIM# 277180) [Claustres et al., 2000], but also in diverse “CF-related disorders” (CF-RDs) or other clinical situations.

The frequent identification of rare mutations substantially complicates test interpretation and counseling, particularly for pregnant women. The gold standard for the interpretation of DNA variations is thought to be their functional analysis in relevant in vivo or in vitro expression assays. However, there are many discordant, inconsistent, or controversial results in the literature, due to the lack of standardization and/or inappropriate model systems; a telling example was recently reported for mutations of the arginine residue at codon 1070, whose functionality proved to be different in polarized and nonpolarized heterologous cells [Krasnov et al., 2008]. The significance of rare mutations in the diagnostic field, the genotype/phenotype correlation in the single individual, and the clinical relevance of complex alleles (several putative mutations within a single gene) and modifier genes are issues that often raise more questions than answers.


LSDBs (Locus-Specific DataBases) are now recognized as the best means of collecting and curating lists of mutations related to human genetic diseases [Claustres et al., 2002; Cotton et al., 2007]. The existing Cystic Fibrosis Mutation Database (CFMD, http://www.genet.sickkids.on.ca/cftr/Home.html) is an open-access database listing CFTR mutations and variations originally reported by laboratories worldwide, together with the associated phenotype. In a major upgrade in April 2010, all known CFTR mutations and sequence variants were converted to the recommended nomenclature (http://www.hgvs.org/mutnomen/). CFMD constitutes a comprehensive core collection of mutations within the CFTR gene and has become the central disease-associated CFTR mutation database. However, central databases are not designed to collect and store the enormous amount of molecular data accumulated in diagnostic laboratories. Indeed, vast resources of high-quality genotyping data sit in laboratories expert in CFTR analysis, which perform thorough and extensive family segregation studies and analyze mutations in alleles from the general population, for clinical or epidemiological purposes. These data, although invaluable for pathogenicity assessment, are unlikely ever to be published, because of the lack of time for reporting diagnostic data, or even shared, because of the lack of robust bioinformatics tools easily usable by diagnostic laboratories. Pooling these data could significantly advance the interpretation of missense, splice, or intronic variants, for example, by facilitating estimates of the frequency of rare variants or of rare events such as the co-occurrence of variants with clinical or pathological features [Greenblatt et al., 2008]. There is a need for an accurate and exhaustive collection of the sequence variations identified in patients suffering from disease related to the CFTR gene and in their relatives.

The diversity and complexity of the data associated with genotyping studies of families and population control samples make it far more difficult to organize them in a shared database than in a single repository. Using the generic software called UMD® (Universal Mutation Database) [Beroud et al., 2000, 2005], we have developed a UMD-CFTR database specifically designed for the extensive collection and analysis of disease-causing mutations, polymorphisms, and unclassified variants identified by laboratories with expert knowledge of CFTR variation. It is a knowledge base that combines a relational database with various functions that can be directly queried online and shared by users. To our knowledge, it is also the first LSDB that offers, for patients in whom all CFTR coding/flanking sequences have been explored, the possibility to record and analyze any variations found on each allele, in cis and in trans.

Structure and Features of the UMD-CFTR

The Web site is divided into eight sections (Supp. Fig. S1): four are dedicated to information on the CFTR gene (including the numbering of exons and cDNA sequences), the protein, the clinics, and the references, while the other four allow the user to query the database. The two major phenotypes caused by CFTR mutations, cystic fibrosis and male infertility due to congenital absence of the vas deferens, are presented. Other associated phenotypes will be added at a later stage. Hyperlinks are provided to access the reference sequences, the allelic variant information, and data search capabilities. The Web pages also contain links to external information, including central biological databases such as the Human Genome Database (GDB), GeneCards, UniGene, UniProt, OMIM, and HGV, as well as the Web sites of major patient associations.


HUMAN MUTATION, Vol. 31, No. 9, 1011–1019, 2010

The Software and Search Engine

The database was constructed with an updated version of the Universal Mutation Database generic software (http://www.umd.be) [Beroud et al., 2000, 2005], which includes an optimized structure to assist and secure data entry and to allow the input of various clinical or biological data. An offline copy of UMD-CFTR is continuously edited and updated by the curators; edited copies are regularly deposited on the server. The database is freely available online, and its integrity is ensured by the original (edited) offline template. Notification of errors in the current version would be gratefully received by the corresponding author. The database design follows the general recommendations for LSDB characteristics and curation issued by the Human Genome Variation Society (HGVS) [Cotton et al., 2008].

CFTR Reference Sequences

The UMD-CFTR follows the recommended nomenclature guidelines (http://www.hgvs.org/mutnomen/), that is, cDNA-based numbering with the A of the ATG translation initiation codon (codon 1) at +1 and exons numbered from 1 to 27. The reference coding sequence of the CFTR gene (symbol CFTR; HGNC: 1884; GenBank cDNA Refseq: NM_000492.3 with the exception of c.1408A instead of G) is displayed in “The gene” section. For the convenience of users, the UMD-CFTR also gives the usual exon numbering (and the usual nucleotide numbering for intronic variations) used since 1989 by the members of the Cystic Fibrosis Genetic Analysis Consortium, which recognizes exons 1 to 24, with subdivisions “a” and “b” for exons 6, 14, and 17 (recognized as distinct units after the initial publication of the gene [Zielenski et al., 1991]), and places the A of the ATG translation start codon at +133. In addition, a CFTR Exon Phasing Tool is provided to illustrate the “phase” of each exon (whether it begins or ends with the first, second, or third nucleotide of a codon) (Supp. Fig. S1). A correspondence table for variation names (usual and recommended names at the nucleotide level, and names at the protein level) is provided on the “Home” page of the database (Supp. Fig. S1). The cDNA sequence was aligned against the genomic contig (GenBank NT_007933.15) to define intronic sequences in the database. The structural domains of the CFTR protein (GenBank NP_000483.3 with the exception of p.Met470 instead of p.Val470) were defined according to Chen et al. [2001].
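The relationship between the two nucleotide numbering schemes can be illustrated with a minimal sketch. It assumes only what is stated above: the A of the ATG is at +1 in the recommended nomenclature and at +133 in the usual numbering, so positions anchored in the coding sequence differ by a constant offset of 132. The function names are hypothetical and are not part of the UMD software; exon numbers, positions upstream of the ATG, and range descriptions would need additional handling.

```python
import re

# Offset between the usual (legacy) numbering, with the A of ATG at +133,
# and the recommended cDNA numbering, with the A of ATG at +1.
LEGACY_OFFSET = 132

# Hypothetical pattern: optional "c." prefix, anchor position, remainder.
_NAME = re.compile(r"^(c\.)?(\d+)(.*)$")


def legacy_to_hgvs(name: str) -> str:
    """Convert a single-position legacy name, e.g. '3272-26A>G'."""
    m = _NAME.match(name)
    if m is None or m.group(1) or "_" in m.group(3):
        raise ValueError(f"unsupported legacy name: {name!r}")
    position = int(m.group(2)) - LEGACY_OFFSET
    if position < 1:
        raise ValueError("positions upstream of the ATG are not handled")
    return f"c.{position}{m.group(3)}"


def hgvs_to_legacy(name: str) -> str:
    """Convert a single-position recommended name, e.g. 'c.3140-26A>G'."""
    m = _NAME.match(name)
    if m is None or not m.group(1) or "_" in m.group(3):
        raise ValueError(f"unsupported recommended name: {name!r}")
    return f"{int(m.group(2)) + LEGACY_OFFSET}{m.group(3)}"
```

With the pair of names quoted in the legend of Figure 1, `legacy_to_hgvs("3272-26A>G")` gives `c.3140-26A>G`. The correspondence table on the “Home” page remains the authoritative source for any real conversion.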

Record Definition

An anonymized and unique sample identifier (sample ID) is generated for each individual. A record (provided with a unique record identifier, UMD_ID) is a heterozygous or homozygous sequence variation identified in one individual. In the current version, two sequence variations affecting a single allele are entered as two separate records linked by the same sample ID.
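The record model described above can be pictured with a small data structure. This is only an illustrative sketch: the class, field, and function names are assumptions made for the example and do not reflect the actual UMD schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One sequence variation identified in one individual (hypothetical)."""
    umd_id: int      # unique record identifier (UMD_ID)
    sample_id: str   # anonymized identifier of the individual
    variation: str   # variation name at the cDNA or protein level
    status: str      # "heterozygous" or "homozygous"


def group_by_sample(records):
    """Link the records of each individual through the shared sample ID."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.sample_id].append(record)
    return dict(grouped)


# Illustrative data only: two variations of one individual are stored as
# two separate records that share the same sample ID.
records = [
    Record(1, "S001", "p.Arg117His", "heterozygous"),
    Record(2, "S001", "c.1210-12T[7]", "heterozygous"),
    Record(3, "S002", "p.Phe508del", "homozygous"),
]
by_sample = group_by_sample(records)
```

Because two variations on the same allele are stored as separate records, any cis relationship has to be reconstructed from the shared sample ID plus segregation data, which this sketch does not model.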

Sequence Variation Entry

The UMD software includes an automatic procedure to check for the correct description of the sequence variation at the nucleotide level (as long as the A of the translation initiation codon is used as the reference) and to generate the variation name at the protein level. For the 3,973 records registered in this study, the variant descriptions in UMD-CFTR were fully concordant with the Mutalyzer software v1.0.4 [Wildeman et al., 2008]. Presently, all types of sequence variations can be reported in the database, except for variants localized in the promoter. However, exact breakpoints of large rearrangements cannot be recorded yet; for example, the six records reporting a deletion of exon 2 (c.54_164del) actually correspond to two deletions with different breakpoints: c.54-5811_164+2186del8108ins182 (five records) [Faa et al., 2006; Girardet et al., 2007] and c.54-1161_164+1603del2875 (one record) [des Georges et al., 2008].
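The two breakpoint descriptions can be checked arithmetically: a deletion running from an intronic position upstream of the first exonic nucleotide to an intronic position downstream of the last one removes the upstream intronic bases, the exonic bases, and the downstream intronic bases. The helper below is a hypothetical sketch of that check, not a function of the UMD software.

```python
def intronic_deletion_length(first_exon_nt, upstream_offset,
                             last_exon_nt, downstream_offset):
    """Length of a deletion c.<first>-<up>_<last>+<down>del.

    Intronic offsets are counted from the nearest exonic nucleotide, so
    c.54-1 is the base immediately before c.54 (there is no offset 0).
    """
    exonic = last_exon_nt - first_exon_nt + 1
    return upstream_offset + exonic + downstream_offset


# Exon 2 spans c.54 to c.164, that is, 111 nucleotides.
length_five_records = intronic_deletion_length(54, 5811, 164, 2186)
length_one_record = intronic_deletion_length(54, 1161, 164, 1603)
```

Both results agree with the sizes given in the variant names (8,108 and 2,875 nucleotides, respectively).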

Description of Variants

Four variation types are documented: mutation, complex allele, polymorphism, and unclassified variant (UV). The variants previously described either in publications or in the CFMD database have been reviewed and, when appropriate, recategorized in light of information (from the literature or international CF networks) obtained since the first description or the original publication. Variants proven to be pathogenic were listed as “mutations.” Variations with no known clinical consequence were classified as “polymorphisms” when observed in at least 1% of control subjects. When the causative link between a sequence variant and the phenotype was not firmly proven because the segregation pattern could not be analyzed, the amino acid was not conserved, or no functional data were available, the variant was listed as “unclassified.” When several variants associated in cis could be involved (e.g., variants always found in the same haplotype, the pathogenicity of each alteration being unknown, or when it was not clear whether the effect was due to one particular mutation or to the association), each variant was recorded as “complex allele.” The most documented examples are length variants localized at the polypyrimidine locus (polyTG followed by polyT repeats, c.1210-34TG[repeats] and c.1210-12T[repeats], respectively) upstream of the splice acceptor site of intron 9 (trivial name: intron 8), which affect the splicing efficiency of exon 10 (trivial name: exon 9) and act as genetic modifiers of CFTR function [Chu et al., 1993; Cuppens et al., 1998; Groman et al., 2004]. The polyTG and polyT repeats were entered as two separate records, as their association could not be determined in all cases (either because familial segregation was impossible or because only the polyT repeats were determined).

All polyTG repeats (c.1210-34TG[repeats]) were defined as “polymorphism.” PolyT alleles with fewer than seven repeats (e.g., c.1210-12T[3], T[5], or T[6]) were classified as “mutation” for the CBAVD phenotype. Within the CF phenotype, the T[5] allele was recorded as “UV” when associated with TG[11] or as “mutation” when associated with TG[12] or TG[13].
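The classification rules for the polyT tract can be written down as a small decision function. This is a sketch of the rules exactly as stated in the text; the function name and the use of `None` for combinations the text does not cover are assumptions made for the example.

```python
def classify_polyt(t_repeats, phenotype, tg_repeats=None):
    """Return the variation class of a polyT allele, or None if not stated.

    t_repeats  -- number of T repeats at c.1210-12
    phenotype  -- "CF" or "CBAVD"
    tg_repeats -- number of TG repeats in cis, if known
    """
    if phenotype == "CBAVD":
        # Fewer than seven repeats: "mutation" for the CBAVD phenotype.
        return "mutation" if t_repeats < 7 else None
    if phenotype == "CF" and t_repeats == 5:
        if tg_repeats == 11:
            return "UV"
        if tg_repeats in (12, 13):
            return "mutation"
    # Combinations not described in the text are left unclassified here.
    return None


def classify_polytg(tg_repeats):
    """All polyTG repeats are recorded as polymorphisms."""
    return "polymorphism"
```

The same T[5] allele therefore receives a different class depending on the phenotypic group and on the TG repeat number found in cis, which is why c.1210-12T[5] can appear in two categories.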

Searching tools

In addition to previously developed routines [Beroud et al., 2000, 2005], new tools have been specifically developed for the UMD-CFTR for the in-depth analysis of alleles, genotypes, and haplotypes. The user has access to optimized multicriteria search tools to select records from any field. A list of variations is displayed, and detailed data from each record can be accessed by a simple click on the “sample ID” hyperlink. This provides information at several levels for each mutation. (1) Mutation description: mutation name at the cDNA and protein levels; wild-type and mutant codons and amino acids; domain of the protein; mutation type (Ts, transition; Tv, transversion; InF, in frame; Fs, frameshift); pyrimidine doublet and CpG; genetic status of the variation (homozygous, heterozygous); variation class (mutation, polymorphism, complex allele, or unclassified). (2) Mutation impact: Bioinformatics Prediction Tools are included to help in the pathogenicity assessment of missense mutations (the UMD Predictor calculates a pathogenicity score combining the degree of conservation, the biochemistry, and the structure of the amino acids) [Frederic et al., 2009]. To evaluate the impact of splicing mutations on the mRNA, an algorithm in the UMD software calculates the consensus value (CV) of the mutant splice site and compares its strength to that of the wild-type site [Beroud et al., 2005]. If the mutation modifies a restriction site, the program shows a restriction map displaying the new or abolished site and the enzymes of interest. (3) Patient and sample data: sample ID, patient status (proband or relative), gender (male, female, unknown), mode of transmission, age of onset, age of death, geographic origin, and phenotypic group (e.g., CF or CBAVD in this study). (4) Reference: reference ID, published (with a PubMed link) or unpublished data (submitter). At a later stage, a fifth section (Clinical data) will be documented.
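The comparison of mutant and wild-type consensus values can be sketched as follows. The two thresholds are those quoted in the legend of Figure 1 (a CV above 70 for a potentially functional site, and a variation of 10% or more for a likely alteration of splice recognition). The function name is hypothetical, the CV calculation itself is not reproduced, and treating the 10% variation as a relative change is an assumption of this example.

```python
def assess_splice_site(cv_wild_type, cv_mutant,
                       functional_cv=70.0, min_change_pct=10.0):
    """Compare the consensus value (CV) of a mutant and a wild-type site."""
    # Relative change of the CV, in percent of the wild-type value
    # (assumed interpretation of the "variation of 10% or more").
    change_pct = (cv_mutant - cv_wild_type) / cv_wild_type * 100.0
    return {
        "change_pct": change_pct,
        "mutant_potentially_functional": cv_mutant > functional_cv,
        "likely_altered": abs(change_pct) >= min_change_pct,
    }


# Illustrative values only, not taken from the database.
weakened = assess_splice_site(80.0, 60.0)
unchanged = assess_splice_site(85.0, 84.0)
```

Both outputs are meant to be read together: a site can stay above the functional threshold while still losing enough strength to be flagged, and the reverse is also possible.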

“Search” section

Prequeried data can be displayed through several Searching Tools such as “Type and number of mutations” (Fig. 1), “Mutations by exon/intron” (Fig. 2), “Haplotypes” (Fig. 3), or “Mutation detection rates” (some examples are detailed below). Alternatively, the option “Free search” gives access to a “Quick search” (mutations and phenotypic group) and to an “Advanced search” interface querying each of several items (sample ID, Gender, Transmission, Mutation status, Patient status) with molecular data (mutation type, nucleotide, amino acid position, exon, CpG, structure).

CFTR Genotypes and Haplotypes

For each patient, the complete mutation genotype is provided with the cis versus trans status of the disease-causing mutations, ascertained by familial segregation analysis when available, as well as the other variation(s) associated in a complex allele when appropriate (Fig. 2). A specific tool has been developed to integrate into a haplotype the various sequence variations identified by scanning and sequencing technologies. For each phenotype (CF or CBAVD), the different disease-causing mutations are listed, and for each mutation additional data are provided, such as (1) the number of compound heterozygous and homozygous patients, (2) the different mutations identified in trans with their frequency, and (3) the associated variations in cis with their frequency (example shown in Fig. 3). A color-coded display of the variations associated in cis with a mutation facilitates the visualization of haplotypes. For example, the mutation p.Arg117His (c.350G>A according to the recommended nomenclature) is associated with a phenotype that varies from asymptomatic to classical CF [Munck et al., 2009; Thauvin-Robinet et al., 2009], a variability that can partially be explained by its association in cis with the polyT alleles (T[5] or T[7]); in our series it was found only in CBAVD patients. In all cases (19 alleles found in 18 patients, one male being homozygous), p.Arg117His was associated in cis with the T[7] allele and (when the segregation analysis could be performed) with TG[10] repeats in 68.42% (13 alleles/19) (Fig. 3).
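The cis-association frequencies reported by the haplotype tool amount to a simple count over the alleles carrying a given mutation. The sketch below reproduces the p.Arg117His figures quoted above; the representation of an allele as a set of variation names and the function name are assumptions of this example, not the UMD data model.

```python
from collections import Counter


def cis_frequencies(alleles, mutation):
    """Count and frequency (%) of variations found in cis with a mutation.

    alleles -- iterable of sets, each holding the variations on one allele
    """
    carrying = [allele for allele in alleles if mutation in allele]
    if not carrying:
        return {}
    counts = Counter(v for allele in carrying for v in allele if v != mutation)
    total = len(carrying)
    return {v: (n, round(100.0 * n / total, 2)) for v, n in counts.items()}


# 19 p.Arg117His alleles, all with T[7]; TG[10] was established in 13 of them.
alleles = (
    [frozenset({"p.Arg117His", "T[7]", "TG[10]"})] * 13
    + [frozenset({"p.Arg117His", "T[7]"})] * 6
)
summary = cis_frequencies(alleles, "p.Arg117His")
```

Here `summary["TG[10]"]` is `(13, 68.42)` and `summary["T[7]"]` is `(19, 100.0)`, matching the text. Alleles for which the TG repeat number could not be determined simply do not contribute to the TG count.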

Figure 1. The “Type and number of mutations” tool. A: The tool gives an overview, in terms of records, of all reported disease-causing variations (mutations and complex alleles) in the database according to mutation types and phenotypes. B: Frequency of different types of disease-causing mutations in CF and CBAVD probands. C: Clicking on one type of mutation, for example “midintronic lesions,” allows the user to access the list of all reported mutations and the associated number of records in this category (42 records in this example). D: Clicking on a particular mutation (e.g., c.3140-26A>G in the recommended nomenclature; usually named 3272-26A>G) displays details such as wild-type and mutant sequences and, in this example, splice-site type and strength expressed by the consensus value “CV.” The CVs of the splice sites are calculated by the UMD algorithm and displayed with graphic tools. A green color indicates a potentially functional splice site with a CV>70 (the darker the green, the stronger the site), while a red color indicates a nonfunctional splice site. When a mutation is expected to affect a splice site, both the CV of the mutant sequence and the variation of the CV in comparison to the wild-type sequence should be taken into account. Usually, a variation of 10% or more is associated with a splice recognition alteration.

Figure 2. The “Mutations by exon/intron” tool. A: The “Mutations by exon/intron” interactive picture displays the 27 exons of the CFTR gene with the usual and the UMD-CFTR numbering and their phasing. B: Clicking on a particular intron or exon gives access to the list of reported disease-causing variations (mutations and complex alleles) for the selected exon or intron and the associated number of records. C: Clicking on the arrow and then on a particular record identifier (2833 in this example) displays all information associated with this ID in the fields “Patient and sample data,” “Mutation Impact,” and “Mutation description,” the last of which also records the mutation carried by the patient on the other allele in trans and, in this example, the other mutation(s) or variation(s) included in the complex allele.

Small lesions

Fourteen routines facilitate the analysis of CFTR point mutations and display results as graphics and/or lists. They allow complex queries combining different fields: (1) Position (distribution of mutations at the nucleotide level); (2) Mutational events (type of mutation: substitution, in-frame deletion, nonsense, or variation at the polymorphic T[n] and TG[m] loci); (3) Detailed mutational events (frequency of mutations at each position); (4) Frequency of mutations (relative distribution of mutations at all sites sorted by frequency); (5) Frequency of events (distribution profile of mutational events); (6) Small deletions analysis and (7) Small insertions analysis (determine whether flanking repeated sequences are involved in each microdeletion or microinsertion, respectively); (8) Mutations map (graphical distribution of mutations along the gene or the protein, with possible zoom on a region of interest); (9) Distribution by exon (tabular and graphical distributions of mutations by exon; for each exon, the expected value is calculated according to the exon's size, the exon's composition (mutability of each codon), and the number of mutations); (10) Nucleotides modifications (impact of mutations according to the position of the base in the codon); (11) Amino acids modifications (for each amino acid, its frequency in the CFTR protein, the number of mutations expected under a random distribution, and the observed number of mutations affecting this residue are given); (12) Splice mutations (intronic variations and their possible impact on donor and acceptor splice sites as well as branch points); (13) CpG sites (distribution of mutations affecting one of the 237 CpG sites, CG on the coding strand or GC on the complementary strand; mutations involve 28 of the 237 sites); (14) Structure (distribution of mutations in the various structural domains of the CFTR protein: N tail, TM1-12, EC, CL1-4, IC, NBD1-2, R, or C tail); domains are coded with different colors.
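The idea behind the expected values of routine (9) can be illustrated with a deliberately simplified sketch. It assumes that mutations are distributed in proportion to exon size only and ignores the codon mutability term mentioned in the text, so it is not the calculation performed by the UMD software. The exon labels, sizes, and counts are made up for the example.

```python
def expected_by_exon(exon_sizes, observed):
    """Expected number of mutations per exon under a size-only model.

    exon_sizes -- {exon label: size in nucleotides}
    observed   -- {exon label: observed number of mutations}
    """
    total_size = sum(exon_sizes.values())
    total_mutations = sum(observed.values())
    return {
        exon: total_mutations * size / total_size
        for exon, size in exon_sizes.items()
    }


# Hypothetical two-exon gene: same observed counts, different sizes.
exon_sizes = {"A": 100, "B": 300}
observed = {"A": 10, "B": 10}
expected = expected_by_exon(exon_sizes, observed)
```

In this toy case exon A carries twice as many mutations as expected (10 observed for 5 expected), which is the kind of contrast the tabular and graphical displays are meant to reveal.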

Large rearrangements within the CFTR locus

Five functions are dedicated to large rearrangements (Supp. Fig. S2) [Tuffery-Giraud et al., 2009]: (1) “Deletion map” calculates the exons present and displays the domains of the CFTR protein that are predicted to be deleted; (2) “Deletions” and (3) “Duplications graphs” display the frequency and extent of the large rearrangements; (4) “Impact of deletions on the reading frame” shows the impact of all 325 theoretically possible deletions of one or more exons, taking into account the impact at the junctional codon: 209 possible simple deletions are predicted to disrupt the reading frame (“out of frame del”), whereas 116 are predicted to maintain the reading frame (“in frame del”), creating a new amino acid residue at the junction in 23 cases; (5) “Breakpoints distribution” analyzes the distribution of intronic breakpoints.
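The enumeration behind function (4) can be sketched as follows: every run of contiguous exons is a candidate deletion, and the reading frame is maintained when the total deleted length is a multiple of three. The count of 325 quoted above equals 25 × 26 / 2, which would be consistent with 25 deletable exons, although the text does not state which exons are considered. The function name and exon lengths below are hypothetical, and the effect on the junctional codon is not modeled.

```python
def classify_exon_deletions(exon_lengths):
    """Enumerate deletions of one or more contiguous exons.

    exon_lengths -- lengths (in nucleotides) of the deletable exons, in order
    Returns (in_frame, out_of_frame) lists of (first_index, last_index).
    """
    in_frame, out_of_frame = [], []
    n = len(exon_lengths)
    for start in range(n):
        deleted = 0
        for end in range(start, n):
            deleted += exon_lengths[end]
            # The frame is kept only if a whole number of codons is removed.
            target = in_frame if deleted % 3 == 0 else out_of_frame
            target.append((start, end))
    return in_frame, out_of_frame


# Toy example with three exons of 3, 4, and 5 nucleotides.
kept, disrupted = classify_exon_deletions([3, 4, 5])

# With any 25 exon lengths, the number of candidate deletions is 325.
all_in, all_out = classify_exon_deletions([100] * 25)
```

The split between in-frame and out-of-frame deletions depends on the real exon lengths, which are not reproduced here; only the total number of candidates is independent of them.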

Therapeutic approaches tool

The identification of the molecular mechanisms of diseases has opened new therapeutic avenues for correcting the mutant mRNA in order to restore a functional protein. One strategy, called the “nonsense readthrough” approach, aims to suppress the pathogenic effect of a nonsense mutation using drugs designed to induce ribosomes to selectively read through the premature stop codon (PTC) during mRNA translation. This results in the random incorporation of an amino acid at the position of the PTC, which is expected, through the continuation of translation, to produce a complete CFTR protein [Du et al., 2008]. However, several studies have now demonstrated that CF patients with the same nonsense mutation of the CFTR gene display variable readthrough efficiency in response to treatment [Kerem et al., 2008]. As sufficient mRNA containing the nonsense mutation must be present to provide a template for drug-induced ribosomal readthrough, differences in efficiency could result from differences between tissues in the rate of nonsense-mediated mRNA decay [Linde et al., 2007], a mechanism that degrades transcripts carrying a PTC. Other studies have emphasized the contribution of differences in the sequence of the premature stop codon (with the highest readthrough at UGA, followed by UAG and then UAA) and in the first nucleotide that follows it [McCaughan et al., 1995; Welch et al., 2007]. The UMD-CFTR records cases with the inappropriate presence of a UAA, UAG, or UGA stop codon and can analyze their sequence context, but it also records cases with amino acid substitutions at the same position and their associated phenotype. Knowing whether an amino acid substitution has been found in a CF patient at the same position as the nonsense mutation responsible for CF in another patient is important information when the nonsense codon is to be replaced by a random amino acid.

Thus, the UMD-CFTR could help clinicians to evaluate patients eligible for the PTC readthrough therapy approach.
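The sequence context mentioned above (the identity of the stop codon and the nucleotide that follows it) can be extracted with a few lines of code. This is a minimal sketch, not the UMD implementation: the function name and returned fields are hypothetical, and the ranking only encodes the order UGA, UAG, UAA given in the text. It reports the first in-frame stop codon of whatever sequence it is given, so the caller must decide whether that stop is premature.

```python
# Ordered by decreasing readthrough efficiency, as stated in the text.
STOP_CODONS = ("UGA", "UAG", "UAA")


def first_stop_context(coding_sequence):
    """Return the first in-frame stop codon and the nucleotide after it.

    coding_sequence -- RNA or DNA string starting at the first codon
    """
    rna = coding_sequence.upper().replace("T", "U")
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            following = rna[i + 3] if i + 3 < len(rna) else None
            return {
                "codon_number": i // 3 + 1,
                "stop": codon,
                "next_nt": following,
                "readthrough_rank": STOP_CODONS.index(codon) + 1,
            }
    return None


# Illustrative sequences only.
context_rna = first_stop_context("AUGGGAUGACUU")
context_dna = first_stop_context("ATGTAAG")
```

For the first sequence the function reports a UGA at codon 3 followed by C; for the second, a UAA at codon 2 followed by G.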


Figure 3. The “Haplotypes” tool. Example of the p.Arg117His mutation in CBAVD patients. “Haplotypes” gives several types of data for each disease-causing variation associated with a phenotype selected by the user: (1) the number of compound heterozygotes and homozygotes, (2) the associated mutations in trans with their frequency, (3) the associated variations (disease-causing or not) in cis with their frequency, (4) a color-coded display of the haplotypes harboring this mutation.

Figure 4. The “Alleles and genotypes” tool. The tool “Alleles and genotypes” gives the number of occurrences and the frequency of each allele and genotype from a group of records selected by the user. For polyT repeats (c.1210-12T[repeats]), clicking on the arrow in the “Frequency” column gives access to the detail of the polyTG-polyT haplotypes. Complex alleles appear in purple in the “Allele” and “Genotype” columns. A: This extract represents mutations and complex alleles of the CFTR gene found in more than 1% of CBAVD patients. B: This extract shows genotypes represented in more than 1% of CBAVD patients. The analysis does not take into account the maternal or paternal origin of the alleles. The two disease-causing alleles are separated by a slash: one allele appears in black and the other, found in trans, appears either in pink for a single variation or in purple for a complex allele.

UMD-CFTR Current Data Contents

The current version, updated in February 2010, contains 3,973 records reported for 757 individuals analyzed in our laboratory, comprising 540 CF and 217 CBAVD patients. A total of 1,325 records representing disease-causing mutations and complex alleles have been entered for the CF and CBAVD patient files (Fig. 1A); 301 different sequence variations have been identified, consisting of 232 disease-causing mutations (of which 10 variations are involved in complex alleles), 51 polymorphisms, and 18 unclassified variations. Four variants can be classified into two different categories: p.Phe508Cys (complex allele, mutation), c.1210-12T[5] (mutation, UV), p.Ser1251Asn (complex allele, mutation), and p.Arg74Trp (complex allele, UV). The average number of sequence variations reported per patient is higher in CBAVD than in CF patients (8.56 vs. 5.14, respectively), reflecting the fact that full scanning/sequencing of the CFTR gene for the diagnosis of rare mutations is more often carried out in CBAVD than in CF samples, which leads to the identification of polymorphisms. The 1,325 records of the patients (which correspond to 1,493 alleles when the homozygotes are included) are distributed into 47.5% small deletions, 23.3% missense mutations, 14.7% splice site mutations, 10.7% nonsense mutations, 1.4% small insertions, 1% small indels, and 1.3% large deletions. All types of mutations are found in both CF and CBAVD patients, but their frequencies differ depending on the phenotype. Notably, a higher frequency of small deletions is observed in CF patients (56.5%, 572/1013) than in CBAVD patients (25.6%, 106/414), while missense mutations represent only 15.1% (153/1013) in CF compared to 43.7% (181/414) in CBAVD patients (Fig. 1B). We found 17 large rearrangements in CF (17/1023, 1.7%) and 2 (2/414, 0.5%) in CBAVD patients, representing 11 different large deletions, of which 6 are in-frame. Rearrangements within the CFTR locus appear to be mostly concentrated in the 5′ portion (with several mutations involving exon 2) and the second half of the CFTR gene (Supp. Fig. S2). The “Mutation Detection Rates” tool shows that almost all CF patients (524/540, 97%) have been found with two disease-causing mutations, one on each CFTR gene; 15 patients (2.8%) carried only one, whereas only one patient (0.2%) was found with no CFTR alteration. In the CBAVD phenotype, two CFTR mutations have been reported in 178/217 (82%), one in 20 (9.2%), and none in 19 (8.8%) cases; however, in the latter subgroup one CBAVD male is described with two unclassified variants and another with one. As expected, the most common CBAVD genotype is p.Phe508del in trans to c.[1210-34TG[12];1210-12T[5]] (37/217 = 17.1%) (Fig. 4).
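The detection rates quoted above follow directly from the patient counts. The short sketch below recomputes them; the function name is hypothetical and the rounding to one decimal place is chosen to match the figures in the text.

```python
def detection_rates(two_mutations, one_mutation, no_mutation):
    """Percentage of patients with two, one, or no disease-causing mutation."""
    total = two_mutations + one_mutation + no_mutation
    counts = {"two": two_mutations, "one": one_mutation, "none": no_mutation}
    return {key: round(100.0 * value / total, 1) for key, value in counts.items()}


# Patient counts taken from the text.
cf_rates = detection_rates(524, 15, 1)        # 540 CF patients
cbavd_rates = detection_rates(178, 20, 19)    # 217 CBAVD patients
```

This gives 97.0%, 2.8%, and 0.2% for CF, and 82.0%, 9.2%, and 8.8% for CBAVD, in agreement with the reported values.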

Curation

One curator, an expert in the CFTR gene, is assigned to the database and is responsible for managing the content, curating sequence variations, and editing and approving submitted data. Validation by a dedicated curator is critical for maintaining high-quality data: the curator standardizes the clinical and biological descriptions for each patient, checks the reliability of each reported mutation depending on the technique used, and verifies that the mutation is described accurately according to the international nomenclature. Even though the official nomenclature is a good way to prevent ambiguities and errors, the CFTR community is reluctant to adopt the standardized system for the CFTR gene, because many sequence variations were described (incorrectly) years ago and are widely known by their trivial names. The most dramatic changes reside in the naming of small deletions/insertions and deep-intronic mutations. Indeed, in our series of patients, the 12 different small insertions reported were incorrectly described to the CFMD; they are actually duplications, most likely caused by DNA polymerase slippage duplicating a local sequence. UMD-CFTR can help with this inevitable change. When only one mutation is reported for a patient, we have to determine whether it is present on each allele in trans (which defines true homozygosity) or whether only one parent is a carrier, which means that we have to check for false paternity, sample mix-up, a deletion removing the exon (the major cause), a sequence variation at a primer or probe binding site, uniparental disomy (six cases described so far for CFTR) [Reboul et al., 2006], or a de novo mutation [Zlotogora, 2004].
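The distinction between an insertion and a duplication can be illustrated with a short check. The sketch below assumes a simple model in which an insertion counts as a tandem duplication when the mutated sequence is identical to the reference with one adjacent stretch copied. The function names and the toy sequences are hypothetical; this is not the procedure used by UMD-CFTR or by any nomenclature checker.

```python
def is_tandem_duplication(reference: str, position: int, inserted: str) -> bool:
    """Return True if inserting `inserted` before index `position` of `reference`
    gives the same sequence as duplicating an adjacent stretch of the reference.

    `position` is the number of reference bases that precede the insertion point.
    """
    if not inserted:
        raise ValueError("inserted sequence must not be empty")
    k = len(inserted)
    mutated = reference[:position] + inserted + reference[position:]
    # Try every reference window of length k that touches the insertion point.
    first = max(0, position - k)
    last = min(position, len(reference) - k)
    for start in range(first, last + 1):
        unit = reference[start:start + k]
        duplicated = reference[:start + k] + unit + reference[start + k:]
        if duplicated == mutated:
            return True
    return False


def classify_insertion(reference: str, position: int, inserted: str) -> str:
    """Label the event 'dup' or 'ins' (labels only, not a full variant description)."""
    return "dup" if is_tandem_duplication(reference, position, inserted) else "ins"


if __name__ == "__main__":
    # Toy sequences, not CFTR: "TA" inserted right after an existing "TA".
    print(classify_insertion("ACGTTAGC", 6, "TA"))
    # "GG" inserted where neither neighbouring stretch matches.
    print(classify_insertion("ACGTTAGC", 3, "GG"))
```

Comparing whole sequences, rather than only the bases immediately before the insertion point, also catches cases where the inserted bases are a shifted copy of the neighbouring sequence, for example "AT" inserted in the middle of "TA".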

Conclusions and Perspectives

A major problem with genotype–phenotype correlation studies is that patients are not screened to the same extent for mutations and polymorphisms, so it cannot be proven that the mutants under study are the sole cause of their symptoms. In addition, for recessive diseases, annotations must make clear in which allelic combinations the pathogenic mutants or multipolyvariant mutants have been identified, which requires familial studies. Only expert laboratories can provide detailed and high-quality information on genotypes, haplotypes, and phenotypes. UMD-CFTR is now expanding into a national database through the collaboration of nine French laboratories specialized in CFTR analysis that perform complete exploration of the gene and extended familial analysis in diverse situations. A major goal is to gather and analyze data on a variety of CFTR-related disorders, including chronic rhinosinusitis, bronchiectasis, pancreatitis, and nasal polyposis, and on other situations such as ultrasound signs of fetal anomalies, heterozygous partners, compound heterozygous unaffected mothers/fathers or relatives, and pending cases (patients whose phenotype cannot be ascertained because of their age, for example, patients analyzed in the context of newborn screening, or patients with incomplete and/or ambiguous clinical data). UMD-CFTR-France is managed by a single dedicated curator, who is in permanent contact with the participating laboratories and encourages the establishment of consensus criteria for molecular analysis and for reporting, annotating, integrating, and interpreting the enormous quantity of data produced by these laboratories. Establishing a coordinated national CFTR database using a single format and a common user interface will allow specialists of the gene to share invaluable information. The next goal will be to connect UMD-CFTR to the National CF Registry.
The UMD-CFTR system presented in this study will facilitate research programs on epidemiological studies, genotype–phenotype correlations, natural history, standards of care, feasibility studies, and recruitment of patients into clinical trials.

Acknowledgments

We are grateful to the French Association against CF (VLM, Vaincre la Mucoviscidose) for its constant support and doctoral fellowship grants. UMD-CFTR is a freely available online resource, which can be accessed on the World Wide Web at http://www.umd.be/CFTR/. We thank the reviewers for their comments and suggestions.

Authors' contributions: C. Bareil, C. Thèze, D. Paulet, C. René, and C. Guittard entered molecular data. C. Bareil and M. des Georges curated all 3,973 entries and tested the UMD tools for efficiency and quality. C. Bareil and M. Claustres wrote the article. C. Béroud and D. Hamroun conceived and developed the informatics of the UMD-CFTR system. M. Claustres conceived and coordinated the study, and obtained specific funding for the curation of the database. All the authors read and approved the final manuscript.

References

Beroud C, Collod-Beroud G, Boileau C, Soussi T, Junien C. 2000. UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum Mutat 15:86–94.
Beroud C, Hamroun D, Collod-Beroud G, Boileau C, Soussi T, Claustres M. 2005. UMD (Universal Mutation Database): 2005 update. Hum Mutat 26:184–191.
Burgel PR, Fajac I, Hubert D, Grenet D, Stremler N, Roussey M, Siret D, Languepin J, Mely L, Fanton A, Labbé A, Domblides P, Vic P, Dagorne M, Reynaud-Gaubert M, Counil F, Varaigne F, Bienvenu T, Bellis G, Dusser D. 2010. Non-classic cystic fibrosis associated with D1152H CFTR mutation. Clin Genet 77:355–364.
Chen JM, Cutler C, Jacques C, Boeuf G, Denamur E, Lecointre G, Mercier B, Cramb G, Ferec C. 2001. A combined analysis of the cystic fibrosis transmembrane conductance regulator: implications for structure and disease models. Mol Biol Evol 18:1771–1788.
Chu CS, Trapnell BC, Curristin S, Cutting GR, Crystal RG. 1993. Genetic basis of variable exon 9 skipping in cystic fibrosis transmembrane conductance regulator mRNA. Nat Genet 3:151–156.
Claustres M, Altieri JP, Guittard C, Templin C, Chevalier-Porst F, Des Georges M. 2004. Are p.I148T, p.R74W and p.D1270N cystic fibrosis causing mutations? BMC Med Genet 5:19.
Claustres M, Guittard C, Bozon D, Chevalier F, Verlingue C, Ferec C, Girodon E, Cazeneuve C, Bienvenu T, Lalau G, Dumur V, Feldmann D, Bieth E, Blayau M, Clavel C, Creveaux I, Malinge MC, Monnier N, Malzac P, Mittre H, Chomel JC, Bonnefont JP, Iron A, Chery M, Georges MD. 2000. Spectrum of CFTR mutations in cystic fibrosis and in congenital absence of the vas deferens in France. Hum Mutat 16:143–156.
Claustres M, Horaitis O, Vanevski M, Cotton RG. 2002. Time for a unified system of mutation description and reporting: a review of locus-specific mutation databases. Genome Res 12:680–688.
Cotton RG, Auerbach AD, Beckmann JS, Blumenfeld OO, Brookes AJ, Brown AF, Carrera P, Cox DW, Gottlieb B, Greenblatt MS, Hilbert P, Lehvaslaiho H, Liang P, Marsh S, Nebert DW, Povey S, Rossetti S, Scriver CR, Summar M, Tolan DR, Verma IC, Vihinen M, den Dunnen JT. 2008. Recommendations for locus-specific databases and their curation. Hum Mutat 29:2–5.
Cotton RG, Phillips K, Horaitis O. 2007. A survey of locus-specific database curation. Human Genome Variation Society. J Med Genet 44:e72.
Cuppens H, Lin W, Jaspers M, Costes B, Teng H, Vankeerberghen A, Jorissen M, Droogmans G, Reynaert I, Goossens M, Nilius B, Cassiman JJ. 1998. Polyvariant mutant cystic fibrosis transmembrane conductance regulator genes. The polymorphic (Tg)m locus explains the partial penetrance of the T5 polymorphism as a disease mutation. J Clin Invest 101:487–496.
des Georges M, Guittard C, Templin C, Altieri JP, de Carvalho C, Ramsay M, Claustres M. 2008. WGA allows the molecular characterization of a novel large CFTR rearrangement in a black South African cystic fibrosis patient. J Mol Diagn 10:544–548.
Desgeorges M, Rodier M, Piot M, Demaille J, Claustres M. 1995. Four adult patients with the missense mutation L206W and a mild cystic fibrosis phenotype. Hum Genet 96:717–720.
Du M, Liu X, Welch EM, Hirawat S, Peltz SW, Bedwell DM. 2008. PTC124 is an orally bioavailable compound that promotes suppression of the human CFTR-G542X nonsense allele in a CF mouse model. Proc Natl Acad Sci USA 105:2064–2069.
Faa V, Bettoli PP, Demurtas M, Zanda M, Ferri V, Cao A, Rosatelli MC. 2006. A new insertion/deletion of the cystic fibrosis transmembrane conductance regulator gene accounts for 3.4% of cystic fibrosis mutations in Sardinia: implications for population screening. J Mol Diagn 8:499–503.
Frederic MY, Lalande M, Boileau C, Hamroun D, Claustres M, Beroud C, Collod-Beroud G. 2009. UMD-predictor, a new prediction tool for nucleotide substitution pathogenicity - application to four genes: FBN1, FBN2, TGFBR1, and TGFBR2. Hum Mutat 30:952–959.
Girardet A, Guittard C, Altieri JP, Templin C, Stremler N, Beroud C, des Georges M, Claustres M. 2007. Negative genetic neonatal screening for cystic fibrosis caused by compound heterozygosity for two large CFTR rearrangements. Clin Genet 72:374–377.
Greenblatt MS, Brody LC, Foulkes WD, Genuardi M, Hofstra RM, Olivier M, Plon SE, Sijmons RH, Sinilnikova O, Spurdle AB. 2008. Locus-specific databases and recommendations to strengthen their contribution to the classification of variants in cancer susceptibility genes. Hum Mutat 29:1273–1281.
Groman JD, Hefferon TW, Casals T, Bassas L, Estivill X, Des Georges M, Guittard C, Koudova M, Fallin MD, Nemeth K, Fekete G, Kadasi L, Friedman K, Schwarz M, Bombieri C, Pignatti PF, Kanavakis E, Tzetis M, Schwartz M, Novelli G, D’Apice MR, Sobczynska-Tomaszewska A, Bal J, Stuhrmann M, Macek Jr M, Claustres M, Cutting GR. 2004. Variation in a repeat sequence determines whether a common variant of the cystic fibrosis transmembrane conductance regulator gene is pathogenic or benign. Am J Hum Genet 74:176–179.
Kartner N, Augustinas O, Jensen TJ, Naismith AL, Riordan JR. 1992. Mislocalization of delta F508 CFTR in cystic fibrosis sweat gland. Nat Genet 1:321–327.
Kerem B, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, Buchwald M, Tsui LC. 1989. Identification of the cystic fibrosis gene: genetic analysis. Science 245:1073–1080.
Kerem E, Hirawat S, Armoni S, Yaakov Y, Shoseyov D, Cohen M, Nissim-Rafinia M, Blau H, Rivlin J, Aviram M, Elfring GL, Northcutt VJ, Miller LL, Kerem B, Wilschanski M. 2008. Effectiveness of PTC124 treatment of cystic fibrosis caused by nonsense mutations: a prospective phase II trial. Lancet 372:719–727.
Kiesewetter S, Macek Jr M, Davis C, Curristin SM, Chu CS, Graham C, Shrimpton AE, Cashman SM, Tsui LC, Mickle J, Amos J, Highsmith WE, Shuber A, Witt DR, Crystal RG, Cutting GR. 1993. A mutation in CFTR produces different phenotypes depending on chromosomal background. Nat Genet 5:274–278.
Krasnov KV, Tzetis M, Cheng J, Guggino WB, Cutting GR. 2008. Localization studies of rare missense mutations in cystic fibrosis transmembrane conductance regulator (CFTR) facilitate interpretation of genotype-phenotype relationships. Hum Mutat 29:1364–1372.
Linde L, Boelz S, Nissim-Rafinia M, Oren YS, Wilschanski M, Yaacov Y, Virgilis D, Neu-Yilik G, Kulozik AE, Kerem E, Kerem B. 2007. Nonsense-mediated mRNA decay affects nonsense transcript levels and governs response of cystic fibrosis patients to gentamicin. J Clin Invest 117:683–692.
McCaughan KK, Brown CM, Dalphin ME, Berry MJ, Tate WP. 1995. Translational termination efficiency in mammals is influenced by the base following the stop codon. Proc Natl Acad Sci USA 92:5431–5435.
Munck A, Houssin E, Roussey M. 2009. The importance of sweat testing for older siblings of patients with cystic fibrosis identified by newborn screening. J Pediatr 155:928–930 e1.
Mussaffi H, Prais D, Mei-Zahav M, Blau H. 2006. Cystic fibrosis mutations with widely variable phenotype: the D1152H example. Pediatr Pulmonol 41:250–254.
Reboul MP, Tandonnet O, Biteau N, Belet-de Putter C, Rebouissoux L, Moradkhani K, Vu PY, Saura R, Arveiler B, Lacombe D, Taine L, Iron A. 2006. Mosaic maternal uniparental isodisomy for chromosome 7q21-qter. Clin Genet 70:207–213.
Riordan JR, Rommens JM, Kerem B, Alon N, Rozmahel R, Grzelczak Z, Zielenski J, Lok S, Plavsic N, Chou JL, Drumm ML, Iannuzzi MC, Collins FS, Tsui LC. 1989. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 245:1066–1073.
Rohlfs EM, Weinblatt VJ, Treat KJ, Sugarman EA. 2004. Analysis of 3208 cystic fibrosis prenatal diagnoses: impact of carrier screening guidelines on distribution of indications for CFTR mutation and IVS-8 poly(T) analyses. Genet Med 6:400–404.
Rommens JM, Iannuzzi MC, Kerem B, Drumm ML, Melmer G, Dean M, Rozmahel R, Cole JL, Kennedy D, Hidaka N, Zsiga M, Buchwald M, Riordan JR, Tsui LC, Collins FS. 1989. Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 245:1059–1065.
Rosenstein BJ, Cutting GR. 1998. The diagnosis of cystic fibrosis: a consensus statement. Cystic Fibrosis Foundation Consensus Panel. J Pediatr 132:589–595.
Rozen R, Ferreira-Rajabi L, Robb L, Colman N. 1995. L206W mutation of the cystic fibrosis gene, relatively frequent in French Canadians, is associated with atypical presentations of cystic fibrosis. Am J Med Genet 57:437–439.
Serohijos AW, Hegedus T, Aleksandrov AA, He L, Cui L, Dokholyan NV, Riordan JR. 2008. Phenylalanine-508 mediates a cytoplasmic-membrane domain contact in the CFTR 3D structure crucial to assembly and channel function. Proc Natl Acad Sci USA 105:3256–3261.
Thauvin-Robinet C, Munck A, Huet F, Genin E, Bellis G, Gautier E, Audrezet MP, Ferec C, Lalau G, Georges MD, Claustres M, Bienvenu T, Gérard B, Boisseau P, Cabet-Bey F, Feldmann D, Clavel C, Bieth E, Iron A, Simon-Bouy B, Costa C, Medina R, Leclerc J, Hubert D, Nové-Josserand R, Sermet-Gaudelus I, Rault G, Flori J, Leroy S, Wizla N, Bellon G, Haloun A, Perez-Martin S, d’Acremont G, Corvol H, Clément A, Houssin E, Binquet C, Bonithon-Kopp C, Alberti-Boulmé C, Morris MA, Faivre L, Goossens M, Roussey M, Collaborating Working Group on R117H, Girodon E. 2009. The very low penetrance of cystic fibrosis for the R117H mutation: a reappraisal for genetic counselling and newborn screening. J Med Genet 46:752–758.
Tuffery-Giraud S, Beroud C, Leturcq F, Yaou RB, Hamroun D, Michel-Calemard L, Moizard MP, Bernard R, Cossee M, Boisseau P, Blayau M, Creveaux I, Guiochon-Mantel A, de Martinville B, Philippe C, Monnier N, Bieth E, Khau Van Kien P, Desmet FO, Humbertclaude V, Kaplan JC, Chelly J, Claustres M. 2009. Genotype–phenotype analysis in 2,405 patients with a dystrophinopathy using the UMD-DMD database: a model of nationwide knowledgebase. Hum Mutat 30:934–945.
Welch EM, Barton ER, Zhuo J, Tomizawa Y, Friesen WJ, Trifillis P, Paushkin S, Patel M, Trotta CR, Hwang S, Wilde RG, Karp G, Takasugi J, Chen G, Jones S, Ren H, Moon YC, Corson D, Turpoff AA, Campbell JA, Conn MM, Khan A, Almstead NG, Hedrick J, Mollin A, Risher N, Weetall M, Yeh S, Branstrom AA, Colacino JM, Babiak J, Ju WD, Hirawat S, Northcutt VJ, Miller LL, Spatrick P, He F, Kawana M, Feng H, Jacobson A, Peltz SW, Sweeney HL. 2007. PTC124 targets genetic disorders caused by nonsense mutations. Nature 447:87–91.
Wildeman M, van Ophuizen E, den Dunnen JT, Taschner PE. 2008. Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker. Hum Mutat 29:6–13.
Zielenski J, Rozmahel R, Bozon D, Kerem B, Grzelczak Z, Riordan JR, Rommens J, Tsui LC. 1991. Genomic DNA sequence of the cystic fibrosis transmembrane conductance regulator (CFTR) gene. Genomics 10:214–228.
Zlotogora J. 2004. Parents of children with autosomal recessive diseases are not always carriers of the respective mutant alleles. Hum Genet 114:521–526.
