doi:10.1006/jmbi.2000.3873 available online at http://www.idealibrary.com on

J. Mol. Biol. (2000) 300, 889–901

Crystal Structure of an Archaeal Intein-encoded Homing Endonuclease PI-PfuI

Kenji Ichiyanagi1,3, Yoshizumi Ishino2, Mariko Ariyoshi1, Kayoko Komori2 and Kosuke Morikawa1*

1Department of Structural Biology and 2Department of Molecular Biology, Biomolecular Engineering Research Institute, 6-2-3 Furuedai, Suita, Osaka 565-0874, Japan

3Graduate School of Science, Osaka University, Toyonaka, Osaka 560-0043, Japan

Inteins possess two different enzymatic activities: self-catalyzed protein splicing and site-specific DNA cleavage. These endonucleases, which are classified as part of the homing endonuclease family, initiate the mobility of their genetic elements into homologous alleles. They recognize long asymmetric nucleotide sequences and cleave both DNA strands in a monomer form. We present here the 2.1 Å crystal structure of the archaeal PI-PfuI intein from Pyrococcus furiosus. The structure reveals a unique domain, designated here as the Stirrup domain, which is inserted between the Hint domain and an endonuclease domain. The horseshoe-shaped Hint domain contains a catalytic center for protein splicing, which involves both N and C-terminal residues. The endonuclease domain, which is inserted into the Hint domain, consists of two copies of a substructure related by an internal pseudo 2-fold axis. In contrast with the I-CreI homing endonuclease, PI-PfuI possibly has two asymmetric catalytic sites at the center of a putative DNA-binding cleft formed by a pair of four-stranded β-sheets. DNase I footprinting experiments showed that PI-PfuI covers more than 30 bp of the substrate asymmetrically across the cleavage site. A docking model of the DNA-enzyme complex suggests that the endonuclease domain covers the 20 bp DNA duplex encompassing the cleavage site, whereas the Stirrup domain could make an additional contact with another upstream 10 bp region. In making the double-strand break, PI-PfuI cleaved the two strands of the DNA duplex with different efficiencies. We suggest that each strand is cleaved by a different one of the two non-equivalent active sites. © 2000 Academic Press

*Corresponding author

Present address: K. Ichiyanagi, Molecular Genetics Program, Wadsworth Center, New York State Department of Health, P.O. Box 22002, Albany, NY 12201-2002, USA.

Abbreviations used: DRR, DNA recognition region.

E-mail address of the corresponding author: [email protected]

Keywords: endonuclease; homing; protein splicing; Pyrococcus furiosus; X-ray crystallography

Introduction

The recent discovery of a new class of proteins known as inteins has shed new light on gene expression (for a review, see Cooper & Stevens, 1995). Inteins are translated as part of a precursor polypeptide and are subsequently excised from it through a self-catalyzed event that involves four concerted chemical reactions (Cooper & Stevens, 1995; Perler et al., 1997b). The two external regions, called exteins, are simultaneously ligated to produce a mature protein with a specific function, while the intein often shows site-specific endonuclease activity. The gene therefore codes for two polypeptide chains with unrelated activities once they are split. The crystal structures of the intein from Mycobacterium xenopi GyrA (Klabunde et al., 1998) and of the intein from a vacuolar ATPase subunit of Saccharomyces cerevisiae (PI-SceI) (Duan et al., 1997) have been reported. In spite of low amino acid sequence similarity, these inteins possess a common horseshoe-like structural motif containing the protein-splicing catalytic center. This motif is known as the Hint (Hedgehog and intein) domain, owing to its structural resemblance to the C-terminal domain of the Drosophila melanogaster Hedgehog protein (Hall et al., 1997), which exhibits self-catalyzed proteolytic activity.

Their N and C termini are located in a confined region of the tertiary structure, in agreement with the presence of catalytic residues at the two extein-intein junctions (Cooper & Stevens, 1995; Duan et al., 1997; Perler et al., 1997b; Klabunde et al., 1998). The intein-encoded site-specific endonucleases, which are classified as part of the homing endonuclease family, specifically cleave DNA at, or very close to, the insertion site of the intein gene in the intein-lacking (Int−) homologous alleles (for reviews, see Mueller et al., 1993; Lambowitz & Belfort, 1993; Belfort & Perlman, 1995; Belfort & Roberts, 1997; Jurica & Stoddard, 1999). The double-strand break facilitates the so-called intein homing process, whereby the intein gene is copied into the Int− allele. The homing endonuclease activity thus guarantees intein propagation, while the protein-splicing activity allows inteins to be present within their host proteins. Homing endonuclease genes have also been found within self-splicing group I, group II, and archaeal introns (Mueller et al., 1993; Lambowitz & Belfort, 1993; Belfort & Perlman, 1995; Belfort & Roberts, 1997; Jurica & Stoddard, 1999). They constitute the LAGLIDADG (also called the dodecapeptide family), ββα-Me (Kühlmann et al., 1999), and GIY-YIG families. The LAGLIDADG family, which is characterized by one or two copies of the LAGLIDADG sequence motif, contains most of the intein-encoded endonucleases known to date. The mono-motif enzymes act as homodimers, and the bi-motif enzymes function as monomers. Both recognize long nucleotide stretches, usually 14–40 bp, and tolerate single base-pair substitutions within them. The recognition sequences are often asymmetric, although the mono-motif dimeric enzymes recognize pseudo-palindromic sequences (Mueller et al., 1993; Lambowitz & Belfort, 1993; Belfort & Perlman, 1995; Belfort & Roberts, 1997; Jurica & Stoddard, 1999).
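The contrast drawn above between pseudo-palindromic sites (recognized by mono-motif homodimers) and asymmetric sites, and the tolerance of single base-pair substitutions, can be made concrete with a short sketch. It scores how far a site departs from two-fold symmetry by comparing each half-site base with the complement of its partner, and it scans a sequence for a site while allowing at most one mismatch. The sequences are arbitrary short placeholders, not the PI-PfuI homing site (real homing sites are 14–40 bp), only the top strand is scanned, and the one-mismatch threshold is a simplifying assumption rather than a measured tolerance.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_mismatches(site: str) -> int:
    """Number of half-site positions that break two-fold symmetry.

    Base i is compared with the complement of base n-1-i over the first half
    of the site; 0 means a perfect palindrome, a small count means
    pseudo-palindromic. A central base in an odd-length site is ignored.
    """
    comp = site.translate(COMPLEMENT)
    n = len(site)
    return sum(1 for i in range(n // 2) if site[i] != comp[n - 1 - i])


def find_sites(sequence: str, site: str, max_mismatches: int = 1) -> list:
    """Start positions (top strand only) where `site` matches `sequence`
    with at most `max_mismatches` base substitutions."""
    hits = []
    n = len(site)
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Placeholder sequences: one exact match and one single-substitution variant.
print(palindrome_mismatches("GAATTC"), palindrome_mismatches("GAATTG"))
print(find_sites("TTGAATTCAAGAGTTCAA", "GAATTC"))
```

With `max_mismatches=0` the scan reduces to exact matching; raising it mimics, very crudely, an enzyme that still cleaves sites carrying a single substitution.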
The crystal structures of three apoenzymes in the LAGLIDADG family have been reported: I-CreI from a group I intron in the Chlamydomonas reinhardtii chloroplast (Heath et al., 1997), I-DmoI from an archaeal intron of Desulfurococcus mobilis (Silva et al., 1999), and PI-SceI (Duan et al., 1997). The dimeric I-CreI enzyme is made of identical subunits related by a 2-fold axis. The I-DmoI monomer and the endonuclease region of PI-SceI are each composed of two domains related by an internal pseudo 2-fold axis. They share a common α/β structure, in which a pair of four-stranded β-sheets forms a saddle-shaped DNA-binding surface. Each of the two LAGLIDADG sequences forms part of an α-helix, and the two helices face each other across the 2-fold axis to form an inter-domain, or inter-subunit, interface. The C-terminal ends of the helices constitute the endonuclease catalytic centers. In the DNA–Ca2+–enzyme ternary complex structure of I-CreI (Jurica et al., 1998), the β-strands of each subunit follow the DNA major groove to interact with nine consecutive base-pairs within one half of the recognition sequence. The crystal structure of the ββα-Me homing endonuclease I-PpoI from a Physarum polycephalum nuclear intron, complexed with DNA (Flick et al., 1998), has revealed a similar strategy for sequence recognition that relies on the flexibility of the β-strands.

Two inteins from the hyperthermophilic archaeon Pyrococcus furiosus, which are spliced out of a ribonucleotide reductase, have recently been isolated and characterized (Riera et al., 1997; Komori et al., 1999a,b). The spliced inteins show endonuclease activities specific for their cognate Int− sequences and were designated PI-PfuI and PI-PfuII, respectively. The homing of these intein genes has not been demonstrated experimentally. However, the endonuclease activities of PI-PfuI and PI-PfuII strongly suggest a homing ability, since a feature of the homing endonucleases in intein/intron homing is thought to be the generation of a double-strand break within the Int− DNA (Mueller et al., 1993; Lambowitz & Belfort, 1993; Belfort & Perlman, 1995; Belfort & Roberts, 1997; Jurica & Stoddard, 1999). Both the 53 kDa PI-PfuI and the 44 kDa PI-PfuII endonucleases have two LAGLIDADG motifs, separated by about 100 amino acid residues. The enzymatic properties of PI-PfuI and PI-PfuII are similar to those of other LAGLIDADG homing endonucleases; however, PI-PfuI shows an atypical activity in that it readily binds to Holliday junctions, which are intermediates in homologous DNA recombination (Holliday, 1964). This binding activity is of particular interest, because the double-strand-break repair (DSBR) pathway for intein/intron homing involves the formation and resolution of Holliday junctions (Belfort & Perlman, 1995; Mueller et al., 1996). Here, we report the crystal structure of PI-PfuI. This intein shows a tripartite domain organization, including a PI-PfuI-specific domain.
A docking model of the protein-DNA complex suggests a possible scheme for recognizing a long asymmetric DNA stretch. We also show that PI-PfuI introduces an incision in each DNA strand with a different efficiency. This asymmetric cleavage can be explained by the three-dimensional structure, which suggests two non-equivalent catalytic sites.
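What "different efficiencies" on the two strands implies for the products can be pictured with a simple sequential first-order scheme. This is an illustrative assumption, not a model fitted in this work: it takes the strand order suggested later from the footprinting data (bottom strand nicked first, then top strand), treats both steps as irreversible and first-order, and uses hypothetical rate constants `k_bottom` and `k_top`, since no rate constants are reported here.

```python
import math


def sequential_cleavage(t, k_bottom, k_top):
    """Fractions of intact, bottom-nicked and double-cut duplex at time t.

    Hypothetical irreversible first-order scheme (rates in 1/min, assumed):
        intact --k_bottom--> bottom-nicked --k_top--> double-strand break
    The closed form below requires k_bottom != k_top.
    """
    if math.isclose(k_bottom, k_top):
        raise ValueError("k_bottom and k_top must differ for this closed form")
    intact = math.exp(-k_bottom * t)
    nicked = (k_bottom / (k_top - k_bottom)) * (
        math.exp(-k_bottom * t) - math.exp(-k_top * t)
    )
    double_cut = 1.0 - intact - nicked
    return intact, nicked, double_cut


# Hypothetical rates: bottom strand cut 20-fold faster than the top strand.
for minutes in (0.5, 1, 5, 30):
    i, n, d = sequential_cleavage(minutes, k_bottom=1.0, k_top=0.05)
    print(f"t={minutes:>4} min  intact={i:.2f}  nicked={n:.2f}  DSB={d:.2f}")
```

With these assumed values the bottom-nicked intermediate dominates at early times, which is the kind of pattern that would appear as bottom-strand-only cleavage in a short reaction; the numbers themselves carry no experimental meaning.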

Results and Discussion

Overall structure

The crystal structure of PI-PfuI, containing all 454 residues, was solved at 2.1 Å resolution with Rwork = 0.192 and Rfree = 0.232 (Table 1). The PI-PfuI structure consists of three distinct domains (Figure 1(a)). Two of them correspond to the Hint and endonuclease domains, which are responsible for the protein-splicing and DNA-recognition/cleavage activities, respectively. The third domain is inserted between the Hint and endonuclease domains. We designated this domain the Stirrup domain because of its resemblance to an antique Japanese stirrup. The overall structure of PI-PfuI is similar to that of PI-SceI (Duan et al., 1997), a yeast intein-encoded endonuclease (Figure 1(b)). PI-SceI also shows a tripartite domain structure, in which the Hint domain and an inserted region, termed the DRR (DNA recognition region), together comprise the protein-splicing domain (Duan et al., 1997; Hu et al., 2000). The Hint domains of PI-PfuI and PI-SceI share a horseshoe-shaped structure composed almost entirely of β-sheets. Both endonuclease domains are built up from two copies of an α/β substructure, with the β-sheets forming the saddle-shaped surfaces of the domains. The two endonuclease domains are inserted topologically at identical positions within the Hint domains. Both the Stirrup domain of PI-PfuI and the DRR of PI-SceI possess a three-stranded β-sheet that forms part of the domain surface. However, in addition to their divergent amino acid sequences, there are structural differences between the two inserted domains.

Table 1. Data collection and refinement statistics

A. Data collection and MIR phasing statistics

Data set | Source or equipment | Wavelength (Å) | Resolution (Å) | Completeness (%) (b) | R-merge (%) (b, c) | Unique reflections | R-iso
Native (a) | PF BL6a | 1.00 | 30–2.1 | 93.7 (84.9) | 5.4 (25.5) | 31152 | –
HgCl2 1 | PF BL6a | 1.00 | 30–2.8 | 97.0 (89.9) | 4.6 (10.5) | 13572 | 0.179
K2PtCl4 1 | PF BL6a | 1.00 | 30–2.8 | 95.5 (85.7) | 4.5 (15.4) | 13564 | 0.291
Se-Met 1 | PF BL6a | 0.95 | 30–2.8 | 94.7 (92.9) | 5.3 (10.0) | 13136 | 0.100
HgCl2 2 | DIP100 | 1.54 | 30–3.0 | 85.5 (88.0) | 9.3 (35.0) | 9791 | 0.195
K2PtCl4 2 | DIP100 | 1.54 | 30–3.0 | 95.7 (95.3) | 12.5 (42.9) | 11092 | 0.298
Se-Met 2 | DIP100 | 1.54 | 30–3.0 | 92.1 (93.1) | 6.5 (11.4) | 10588 | 0.098
KAu(CN)2 | DIP100 | 1.54 | 30–3.0 | 88.0 (90.9) | 8.7 (25.2) | 10121 | 0.155

Derivative | Isomorphous phasing power (d) (acentric/centric) | Anomalous phasing power (e) | Isomorphous R-cullis (f) (acentric/centric) | Anomalous R-cullis (g)
HgCl2 1 | 1.73/1.41 | 1.35 | 0.74/0.77 | 0.81
K2PtCl4 1 | 1.60/1.44 | 1.17 | 0.84/0.82 | 0.80
Se-Met 1 | 2.78/2.68 | 1.15 | 0.60/0.59 | 0.85
HgCl2 2 | 1.74/2.08 | – | 0.75/0.65 | –
K2PtCl4 2 | 1.82/1.50 | – | 0.82/0.80 | –
Se-Met 2 | 3.04/2.42 | – | 0.56/0.53 | –
KAu(CN)2 | 1.09/1.24 | – | 0.89/0.87 | –

Overall figure of merit (acentric/centric): 0.686/0.755

B. Refinement statistics

Resolution (Å) | 30–2.1
Protein atoms (average B value, Å2) | 3755 (35.0)
Solvent atoms (average B value, Å2) | 332 (58.8)
Zn atom (B value, Å2) | 1 (32.5)
R-working (b, h) | 0.191 (0.281)
R-free (b, h) | 0.232 (0.302)
RMS bond length (Å) | 0.006
RMS bond angles (deg.) | 1.110

(a) For the native crystals, two data sets with different rotation axes were collected and merged.
(b) Values in parentheses refer to statistics in the highest-resolution shell.
(c) R-merge = Σ|Iobs − ⟨I⟩| / ΣIobs.
(d) Isomorphous phasing power is defined as ⟨FH⟩/E, where ⟨FH⟩ is the mean calculated amplitude for the heavy-atom model and E is the phase-integrated lack-of-closure error for the isomorphous differences.
(e) Anomalous phasing power is defined as ⟨Fcalc⟩/E, where ⟨Fcalc⟩ is the mean calculated Bijvoet difference from the heavy-atom model and E is the phase-integrated lack-of-closure error for the anomalous differences.
(f) Isomorphous R-cullis is defined as E/⟨|FPH − FP|⟩, where E is the phase-integrated lack-of-closure error for the isomorphous differences and ⟨|FPH − FP|⟩ is the mean isomorphous difference.
(g) Anomalous R-cullis is defined as E/⟨|FPH(+) − FPH(−)|⟩, where E is the phase-integrated lack-of-closure error for the anomalous differences and ⟨|FPH(+) − FPH(−)|⟩ is the mean Bijvoet difference for the heavy-atom derivative.
(h) R-working = Σ|Fobs − Fcalc| / ΣFobs, where Fobs and Fcalc are the observed and calculated structure-factor amplitudes, respectively, for a randomly selected 95% portion of the data set. R-free was calculated in the same way using a randomly selected 5% portion of the data set, which was omitted throughout all stages of refinement.
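The R-merge, R-working and R-free definitions given with Table 1 are simple ratios over reflections, and the sketch below spells them out on toy data. It is a textbook-style illustration, not the DENZO/SCALEPACK or CNS implementation used in this work: it assumes the calculated amplitudes are already on the same scale as the observed ones, and the 5% test set is drawn with Python's `random` module purely as an example of a random, fixed selection.

```python
import random


def r_merge(observations):
    """R-merge = sum|I_obs - <I>| / sum I_obs.

    `observations` maps each unique reflection (e.g. an (h, k, l) tuple)
    to the list of its repeated or symmetry-equivalent intensity measurements.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """R = sum|F_obs - F_calc| / sum F_obs (F_calc assumed scaled to F_obs)."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Randomly set aside ~5% of reflection indices as the R-free test set.

    The test set is chosen once and excluded from refinement; R-working is
    computed on the remaining ~95% of the reflections.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(free_fraction * n_reflections)
    free = sorted(indices[:n_free])
    work = sorted(indices[n_free:])
    return work, free


# Toy intensities (made-up numbers).
obs = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0]}
print(r_merge(obs))  # 10 / 260, i.e. about 0.0385

# Splitting a reflection list the size of the native data set.
work, free = split_work_free(31152)
print(len(work), len(free))
```

Keeping the free set fixed for the whole refinement is what lets R-free act as an unbiased monitor: those reflections never influence the model that is scored against them.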


First, the β-strands are approximately twice as long in PI-SceI as in PI-PfuI, and consequently the PI-SceI DRR shows a more extended conformation. Second, the PI-PfuI Stirrup domain contacts the other two domains, whereas the PI-SceI DRR does not interact with the endonuclease domain but protrudes into the solvent. Finally, the PI-PfuI Stirrup domain is inserted into a linker region located between the endonuclease and Hint domains, while the PI-SceI DRR is located in an internal loop within the Hint domain (Figure 1(c)).

The crystal structure revealed the presence of an unidentified metal ion, which lies at the interface between protein molecules related by the crystallographic 2-fold screw axis. The metal ion is most likely a zinc cation, given its tetrahedral coordination geometry and its peak height of approximately ten sigma in an omit map. It is coordinated by residues His156 and His158 in the endonuclease domain of one molecule and by residues Glu353 and Glu405 in the Stirrup domain of the neighboring molecule. Interestingly, the two histidine ligands form part of a putative DNA-binding interface, although we cannot exclude the possibility of a crystallographic artifact. It should be noted that PI-PfuI was purified using solutions containing 1 mM EDTA but no zinc, and that the fraction containing the monomeric protein was used for crystallization. We therefore presume that the protein acquired the metal in the Escherichia coli cell and that the metal never dissociated during purification. Preliminary inductively coupled plasma analysis indicated that the PI-PfuI molecule binds a zinc ion in solution both with and without DNA (unpublished results).

The Hint domain

The Hint domain of PI-PfuI shows a horseshoe-shaped tertiary structure consisting of five β-sheets with an inserted α-helix (Figure 2(a)). Although an alignment of the amino acid sequences of the Hint domains of PI-PfuI, PI-SceI, the M. xenopi GyrA intein, and the D. melanogaster Hedgehog protein shows poor conservation (Figure 1(c)), the three-dimensional structures of their backbones are very similar (not shown). The N-terminal cysteine (Cys1), the C-terminal asparagine (Asn454), and the internal and penultimate histidines (His99 and His453), which are invariant among inteins, assemble in the center of the horseshoe structure (Figure 2(a)). Their positions and orientations are very similar to those of the corresponding residues in the other known intein structures (not shown), supporting their functional importance in the protein-splicing reactions (Cooper & Stevens, 1995; Perler et al., 1997b; Klabunde et al., 1998).

The endonuclease domain

The endonuclease domain comprises two structural motifs, each with an αββαββαα topology, which are related by an internal pseudo 2-fold axis (Figure 2(b)). The superposition of the 62 backbone Cα atoms in the secondary structure elements of the two subdomains gives an average root-mean-square (RMS) deviation of 1.91 Å. Each subdomain includes a four-stranded β-sheet forming a saddle-shaped surface, and the two subdomains assemble through an interface made of α-helices. The presence of 14 basic residues on the surface of the β-sheets suggests that this large groove may be involved in DNA binding. In the structure of the I-CreI dimer complexed with its cognate DNA (Jurica et al., 1998), the corresponding β-sheets have been shown to capture the DNA substrate. The highly conserved LAGLIDADG motifs, named block C and block E (Pietrokovski, 1994; Perler et al., 1997a), form parts of α-helices 2 and 6, respectively, which together constitute the subdomain interface. The crystal structures of the LAGLIDADG endonuclease regions in PI-PfuI, PI-SceI (Duan et al., 1997), I-CreI (Heath et al., 1997), and I-DmoI (Silva et al., 1999) are very similar (Figure 3).
The superposition of approximately 130 Cα atoms in the secondary structure elements of PI-SceI, I-DmoI, and the I-CreI dimer onto those of PI-PfuI gives average RMS deviations of 2.59 Å, 1.65 Å, and 2.60 Å, respectively. However, it should be noted that the structural similarities are limited to the helical regions, and that the β-sheets vary in curvature, length, and amino acid sequence (Figures 1(c) and 3). This diversity is consistent with the differences in the nucleotide sequences of their target DNA duplexes. Silva et al. (1999) have shown that the I-DmoI N-terminal domain and the PI-SceI C-terminal subdomain are structurally more similar to the I-CreI subunit than their sister domains are, implying that the two structural motifs may have evolved independently in I-DmoI and PI-SceI. In the case of PI-PfuI, the C-terminal subdomain resembles I-CreI more than its sister subdomain does (average RMS deviations over approximately 60 Cα pairs in the secondary structure elements: 2.78 Å for I-CreI and the N-terminal subdomain, 1.19 Å for I-CreI and the C-terminal subdomain). This similarity is supported by the observation that the C-terminal subdomain contains a fourth α-helix (α9) carrying a putative catalytic lysine residue at a position almost identical to that in I-CreI. In contrast, this helix is absent from the N-terminal subdomain of PI-PfuI, from both subdomains of PI-SceI, and from both domains of I-DmoI.

Figure 1. (a) Stereoview of the entire Cα backbone of PI-PfuI. The numbers indicate the residues. The three domains are indicated. (b) Ribbon representation showing the overall structures of PI-PfuI (left) and PI-SceI (right). The Hint, endonuclease, and Stirrup domains are colored blue, red, and green, respectively. The DNA recognition region (DRR) of PI-SceI is colored green. (c) Domain organization and sequence alignment of PI-PfuI. A. The diagram shows the domain arrangement of PI-PfuI. The Hint domain is divided into two parts (Hint-N and Hint-C) by the insertion of the two other domains. The numbers under the diagram indicate the first amino acid residue of each domain. B. Amino acid sequence alignment of PI-PfuI and other related proteins. The secondary structure elements of PI-PfuI determined in this study are indicated above the sequence. The Hint domain sequence of PI-PfuI is aligned with those of the same domains in PI-SceI (SceI), the GyrA intein from M. xenopi (GyrA), and the C-terminal domain of the Hedgehog protein from D. melanogaster (Hh-C). The sequence of the DRR in the Hint domain of PI-SceI is not shown. The PI-PfuI endonuclease domain sequence is aligned with the PI-SceI (SceI), I-DmoI (DmoI), and I-CreI (CreI) sequences. The I-CreI sequence is aligned with both the N and C-terminal subdomain sequences of PI-PfuI. Invariant (dark gray) and conserved (light gray) residues among the sequences are highlighted. The conserved sequence blocks of inteins are indicated.

Figure 2. (a) Ribbon representation showing the backbone structure of the Hint domain. The catalytically important residues for protein splicing are shown. (b) Ribbon representation showing the endonuclease domain. The two putative active centers are indicated by the side-chains of Asp149, Asp173, Glu250, and Lys322, which are assumed to participate in the catalytic reaction.
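The RMS deviations quoted above (for example 1.91 Å over 62 paired Cα atoms of the two subdomains) come from superposing matched atom sets and measuring the residual distances. A common way to do this is the Kabsch algorithm, sketched below with NumPy; the paper does not state which superposition program was used, so this is an illustration of the general technique, and the coordinates in the example are synthetic rather than taken from the PI-PfuI model.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two matched N x 3 coordinate sets after optimal superposition.

    p and q must list equivalent atoms (e.g. paired C-alpha atoms) in the same order.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The SVD of the 3 x 3 covariance matrix gives the optimal rotation.
    h = p_c.T @ q_c
    u, _s, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rot.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q_c) ** 2, axis=1))))


# Synthetic check: a rotated and translated copy superposes with RMSD close to 0.
rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(62, 3))  # 62 pseudo C-alpha positions
theta = np.radians(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([5.0, -3.0, 12.0])
print(kabsch_rmsd(coords, moved))  # prints a value close to 0
```

The reflection check matters: without it, a mirror-image arrangement could be "superposed" by an improper rotation and report a misleadingly small RMSD.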

We have shown that two acidic residues of the LAGLIDADG motifs of PI-PfuI, the invariant Asp149 and the highly conserved Glu250 (Figure 1(c)), are catalytically essential (Komori et al., 1999b). In the crystal structure, they are located at the ends of the α2 and α6 helices in symmetrically equivalent positions (Figure 2(b)). Both side-chains protrude toward the solvent from the center of the saddle surface. Their orientations are similar to those of the acidic residues in the PI-SceI, I-DmoI, and I-CreI catalytic centers (not shown). In addition to these two acidic residues, there are two putative catalytic residues. First, the Lys322 side-chain protrudes from the C-terminal subdomain toward the two acidic residues (Figure 2(b)). This residue is conserved in PI-SceI (Lys403), I-CreI (Lys98), and I-DmoI (Lys120). The catalytic importance of Lys403 in PI-SceI (Gimble et al., 1998) and of Lys98 in I-CreI (Seligman et al., 1997) implies that Lys322 in PI-PfuI could contribute to the activity. In the I-CreI dimer, two symmetry-related Lys98 residues are present, and in PI-SceI the catalytically important Lys301 (He et al., 1998; Gimble et al., 1998) occupies a position symmetrically related to Lys403. In PI-PfuI, however, a lysine residue corresponding to Lys301 of PI-SceI is absent from the N-terminal subdomain. The other candidate catalytic residue of PI-PfuI is Asp173, which lies in the close vicinity of Asp149 and Lys322 (Figure 2(b)), because Gln47 of I-CreI and Asp229 of PI-SceI, located in positions structurally equivalent to Asp173, have been shown to be catalytically important (Seligman et al., 1997; He et al., 1998; Christ et al., 1999). This acidic residue is, however, replaced by Met263 in the C-terminal subdomain of PI-PfuI. As a consequence, the putative catalytic residues are distributed asymmetrically in PI-PfuI, in contrast to PI-SceI and I-CreI.

Figure 3. Comparison of four LAGLIDADG enzyme structures. The endonuclease domains of the intein-encoded endonucleases PI-PfuI and PI-SceI are shown at the top. The intron-encoded I-DmoI monomer and I-CreI dimer are shown at the bottom.

The Stirrup domain

The Stirrup domain, with a molecular mass of 9 kDa, is folded into an α/β structure and inserted between the endonuclease and Hint domains (Figure 1(c)). The domain exposes a three-stranded β-sheet to the solvent, which packs against α-helices (Figure 1(a) and (b)). The positive charges distributed on the surface of the β-sheet may participate in DNA binding (see below). A BLAST search (Altschul et al., 1990) revealed no apparent similarity between the Stirrup domain sequence and any known sequence. A DALI search (Holm & Sander, 1999) of the domain likewise found no significantly similar structures in the available database. Therefore, at present, the fold of this domain appears to be specific to PI-PfuI.

DNA recognition and cleavage

In order to investigate the mechanism of DNA recognition by PI-PfuI, we first identified the DNA region recognized by the enzyme. DNase I footprinting experiments (Figure 4) showed that PI-PfuI protects the −22/+15 region of the top strand and the −25/+14 region of the bottom strand (G−9/G−10, G−10/G−11, and C+9/A+10 of the top strand, and T−21/C−22 of the bottom strand, were not fully protected). These results suggest that PI-PfuI covers approximately 30 bp of the homing site DNA, with about 20 bp of the upstream region and about 10 bp downstream. A primer extension experiment supports this interpretation (Komori et al., 1999a). Under the footprinting reaction conditions, PI-PfuI cleaved only the bottom strand. This cleavage is not due to DNase I action, as catalytically inactive PI-PfuI variants caused no cleavage of the bottom strand under similar conditions (data not shown).

Figure 4. (a) DNase I footprinting by PI-PfuI on its target duplex. A 187 bp duplex containing the PI-PfuI homing site, radiolabeled at the 5′-end of the top (left panel) or bottom (right panel) strand, was treated with DNase I in the absence (lane 1) or presence of 5, 10, 20, 40, 80, 160, or 320 nM PI-PfuI (lanes 2 to 8, respectively), followed by denaturing polyacrylamide gel electrophoresis and autoradiography. The protected nucleotides are indicated on the left. CS and the filled triangles indicate the site cleaved by PI-PfuI under optimal conditions. (b) Summary of DNase I footprinting. The protected nucleotide stretch is shown by a black box. The cleavage pattern of PI-PfuI is indicated by a white line. The base-pairs are numbered with respect to the intein insertion site.

To identify the putative DNA-binding interface of the enzyme, we calculated the electrostatic potentials of the protein surface (Figure 5(a)). The endonuclease domain contains four positively charged β-hairpins, which could embrace a DNA duplex. In contrast with the acidic Hint domain, the Stirrup domain exhibits positive patches on its β-sheet, consisting primarily of the Lys359/Arg373 and Arg382/Arg384 side-chains, implying that the Stirrup domain contacts DNA. A comparative analysis of the PI-PfuI charge distribution and the protein-DNA interactions in the co-crystal structure of I-CreI allowed us to construct a docking model of DNA and the PI-PfuI monomer (Figure 5(b)). In this model, the endonuclease domain covers 20 bp across the cleavage site: the 10 bp downstream region (+1/+10) is bound to the N-terminal subdomain, and the upstream 10 bp (−10/−1) contact the C-terminal subdomain. The Stirrup domain could interact with another 10 bp further upstream (−20/−11). On the whole, our model reasonably explains the asymmetric recognition across the cleavage site. In the case of PI-SceI, DNA binding by the DRR in the protein-splicing domain has been suggested (Duan et al., 1997; Grindl et al., 1998; He et al., 1998; Pingoud et al., 1998, 1999; Hu et al., 2000). As discussed above, this portion is the additional element inserted into the Hint domain. Although the Stirrup domain and the PI-SceI DRR differ in the ways discussed above, it is conceivable that inteins employ these domains to enhance their specificity, thus minimizing their toxicity to the hosts. In contrast, the intron-encoded enzymes I-CreI and I-DmoI bear no additional DNA recognition element (Heath et al., 1997; Jurica et al., 1998; Silva et al., 1999).

PI-PfuI cleaves the two DNA strands with different efficiencies, as revealed by the cleavage of only the bottom strand under the conditions of the footprinting experiments. This finding suggests that the two strands are cleaved sequentially, the bottom strand first and then the top strand. However, it remains unclear whether these cleavages are due to a single active site or to two asymmetric active sites. We were unable to locate the catalytic metal-binding site(s) precisely, because no significant peak attributable to a metal ion appeared in the difference Fourier maps calculated from manganese and calcium derivatives prepared by soaking and co-crystallization. However, the structural conservation among the four LAGLIDADG endonucleases suggests that they employ similar mechanisms of DNA cleavage. Therefore, the monomeric enzymes are presumed to employ two active sites, like the I-CreI dimer (Jurica et al., 1998). Recently, Christ et al. (1999) demonstrated the presence of two distinct active sites in PI-SceI. Our docking model suggests that Glu250 is responsible for top-strand cleavage, while Asp149 cleaves the bottom strand with the assistance of Lys322, which is provided by the opposing subdomain (Figure 6). Given that substitution of a catalytic lysine residue reduces activity by one or more orders of magnitude (Selent et al., 1992; Gimble et al., 1998; He et al., 1998; Ichiyanagi et al., 1998; Komori et al., 1999b), the absence of a catalytic lysine from the Glu250 active site agrees well with the inefficient cleavage of the top strand. In addition, Asp173 in the N-terminal subdomain may assist scission by the putative Asp149/Lys322 catalytic center, whereas the Glu250 catalytic center does not involve this aspartate residue, resulting in weak catalytic activity.

Figure 5. (a) Electrostatic potential distribution on the surface of PI-PfuI. Surfaces with negative, neutral, and positive potentials are colored red, white, and blue, respectively. The four β-hairpins in the endonuclease domain are indicated with yellow letters. (b) Docking model of the protein-DNA complex. DNA is shown as a ribbon drawing.

Materials and Methods

Bacterial expression and protein purification

Because pFINT1 (Komori et al., 1999a) carries the PI-PfuI gene with an alanine substituted for the original cysteine residue at the N terminus, the gene was mutagenized by polymerase chain reaction (PCR) with appropriate primers to restore the cysteine residue, yielding pINT150. PI-PfuI was overproduced in E. coli from pINT150 as described (Komori et al., 1999a). The E. coli cells were harvested, disrupted by sonication, and incubated at 90 °C for 20 minutes in buffer A (50 mM Tris-HCl (pH 8.5), 2 mM 2-mercaptoethanol, 10 % (v/v) glycerol) supplemented with 0.05 M NaCl and protease inhibitors (Complete™, Boehringer Mannheim, containing 1 mM EDTA). The PI-PfuI protein in the heat-stable fraction was further purified by sequential liquid chromatography on HiTrap Q, HiTrap Heparin, native DNA-cellulose, and Sephacryl S-200 (all columns were from Amersham Pharmacia). All purification steps were performed at 4 °C in buffer A containing NaCl. The final gel filtration gave a single peak containing the monomeric protein. The peak fractions were individually concentrated and used for crystallization.

The selenomethionine derivative was obtained as follows. E. coli B834(DE3), a methionine auxotroph, transformed with pINT150 was cultured in minimal medium supplemented with the 20 standard amino acids. When the absorbance of the culture at 600 nm reached 0.5, the cells were harvested, washed with the minimal buffer, and incubated in minimal medium lacking amino acids. After 30 minutes, 40 µg/ml selenomethionine, the other 19 amino acids (40 µg/ml each), and 1 mM isopropyl-β-D-thiogalactopyranoside were added, and the culture was incubated for a further four hours. The selenomethionine-substituted PI-PfuI was purified by the same procedure as the native protein, except that 10 mM dithiothreitol (DTT) replaced 2-mercaptoethanol in the purification buffers. Crystals of the selenomethionine-substituted protein grew under similar conditions, with 10 mM DTT in place of 2-mercaptoethanol.

Crystallization and heavy-atom derivatization

Crystallization was carried out at 20 °C by microdialysis using a 10 µl button. Crystals suitable for data collection were obtained from a solution containing 10 mg/ml protein, 50 mM Bis-Tris-HCl (pH 6.5), 0.5 M NaCl, 5 % (w/v) polyethylene glycol 4000, 10 % (v/v) dioxane, 10 % glycerol, and 2 mM 2-mercaptoethanol. The monoclinic crystals belong to space group P2₁, with unit cell dimensions a = 56.55 Å, b = 85.28 Å, c = 65.41 Å, and β = 112.94°. The asymmetric unit contains one protein molecule. Heavy-atom derivatives were prepared by soaking native crystals in a harvest buffer (50 mM Bis-Tris-HCl (pH 6.5), 0.4 M NaCl, 5 % polyethylene glycol 4000, 10 % glycerol) containing 5 mM HgCl₂ for three hours, 5 mM K₂PtCl₄ for 12 hours, or 5 mM KAu(CN)₂ for eight hours.

Figure 6. Possible mechanism of the double-strand break by the two catalytic centers. The DNA backbones of the top (purple) and bottom (red) strands and the protein side-chains of Asp149, Asp173, Glu250, and Lys322 are shown. The arrows indicate the attacks on the scissile phosphate groups. The protein surface is shown in light blue.

Data collection and phase determination

Intensity data were collected at room temperature on either a MAC Science DIP100 imaging diffractometer with a Cu rotating anode or beamline 6A at the Photon Factory in Tsukuba. All data were reduced with the DENZO/SCALEPACK crystallographic data reduction package (Otwinowski & Minor, 1997). Multiple isomorphous replacement with anomalous scattering (MIRAS) phases to 2.8 Å were calculated with the program SHARP (Fortelle & Bricogne, 1997) (Table 1). Solvent flattening and histogram matching with the program DM (CCP4, 1994) produced an electron density map that allowed tracing of the entire backbone.
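Solvent flattening depends on an estimate of how much of the crystal is solvent. As a rough, illustrative check (not the authors' calculation), that estimate can be sketched from the P2₁ cell constants: the monoclinic cell volume is V = abc sin β, the Matthews coefficient is V_M = V/(Z·M) with Z = 2 for P2₁ and one molecule per asymmetric unit, and the solvent fraction is approximated by the standard relation 1 − 1.23/V_M. The molecular mass M is an assumption here (454 residues × 110 Da, a generic average residue mass), and the function names are hypothetical.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (cubic angstroms) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews(volume, z, mass_da):
    """Return the Matthews coefficient V_M (A^3/Da) and the solvent fraction 1 - 1.23 / V_M."""
    vm = volume / (z * mass_da)
    return vm, 1.0 - 1.23 / vm

# Cell constants reported for the P2_1 crystals (angstroms, degrees).
V = monoclinic_cell_volume(56.55, 85.28, 65.41, 112.94)

# P2_1 has two symmetry-related positions; one molecule per asymmetric unit gives Z = 2.
Z = 2 * 1

# ASSUMPTION: mass approximated as 454 residues x 110 Da; use the sequence-derived mass instead.
ASSUMED_MASS_DA = 454 * 110

vm, solvent = matthews(V, Z, ASSUMED_MASS_DA)
print(f"V = {V:.0f} A^3, V_M = {vm:.2f} A^3/Da, solvent fraction ~ {solvent:.0%}")
```

Substituting the sequence-derived mass changes V_M in inverse proportion; protein crystals commonly fall in a V_M range of roughly 1.6-3.5 Å³/Da.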

Model building and refinement

The initial molecular model was built into the 2.8 Å electron-density map with QUANTA. Crystallographic refinement against the native data set (30-2.1 Å) was performed by rounds of CNS (Brünger et al., 1998) with bulk-solvent correction and anisotropic scaling, alternating with manual revision on σA-weighted electron-density maps calculated with 2Fo − Fc and Fo − Fc coefficients. The progress of the refinement was monitored by the decrease in Rfree (Brünger, 1992). A zinc ion was placed at a site with an approximately ten-sigma peak in an omit map. Zinc is the most probable metal, because the site shows tetrahedral coordination geometry and the temperature factor of the zinc ion modeled there refined to a value similar to those of its ligand atoms. The current model contains all non-hydrogen protein atoms (residues 1-454), 332 water molecules, and one zinc ion. The model geometry was analyzed with PROCHECK (Laskowski et al., 1993). In the Ramachandran plot, 89.7, 9.6, 0.5, and 0.2 % of residues lie in the most-favored, additional allowed, generously allowed, and disallowed regions, respectively.

Protein Data Bank accession numbers

Coordinates and structure factors have been deposited in the RCSB Protein Data Bank under accession code 1DQ3.

DNase I footprinting

The 80 bp PI-PfuI homing-site duplex, prepared by annealing two synthetic oligonucleotides, was inserted into the EcoRV site of pUC19, yielding pTar180. The substrate DNA for DNase I footprinting was prepared by PCR from pTar180 using 5′-³²P-labeled RV and M4 primers (Takara Shuzo). The reaction mixture contained 25 mM Tris-HCl (pH 7.5), 100 mM KCl, 0.1 mM EDTA, 0.1 mM DTT, 0.1 mg/ml calf thymus DNA, 1 mM CaCl₂, 5 mM MgCl₂, 1 nM labeled duplex DNA, and 0.2 unit of DNase I, in the absence or presence of 5, 10, 20, 40, 80, 160, or 320 nM PI-PfuI. DNase I digestion was performed in a 10 µl reaction at 37 °C for one minute.
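The PI-PfuI concentrations in the footprinting reactions form a two-fold series from 5 to 320 nM against 1 nM labeled duplex. The short sketch below regenerates that series and expresses each point as a molar excess over the labeled probe. It deliberately ignores the unlabeled calf thymus competitor DNA, so the ratios describe protein relative to the labeled duplex only; the helper name is hypothetical.

```python
# Two-fold PI-PfuI titration used in the DNase I footprinting reactions.
DNA_NM = 1.0  # labeled duplex concentration in the reaction mixture (nM)

def twofold_series(start_nm, steps):
    """Return start, 2*start, 4*start, ... (steps values in total)."""
    return [start_nm * 2 ** k for k in range(steps)]

protein_nm = twofold_series(5, 7)
# Molar excess of protein over labeled DNA only (competitor DNA not counted).
molar_excess = [p / DNA_NM for p in protein_nm]

for p, r in zip(protein_nm, molar_excess):
    print(f"{p:>4} nM PI-PfuI -> {r:.0f}-fold over labeled DNA")
```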
The reaction was stopped by adding 5 µl of stop solution (95 % (v/v) formamide, 20 mM EDTA, 0.1 % (w/v) bromophenol blue, 0.1 % (w/v) xylene cyanol), and the products were analyzed by electrophoresis on a 9 % polyacrylamide gel containing 8 M urea, followed by autoradiography. The reference sequence ladders were obtained with the BcaBEST™ Dideoxy Sequencing Kit (Takara Shuzo), using the labeled primers and denatured pTar180.

Figure preparation

Figures 1, 2, and 3 were drawn with QUANTA (Molecular Simulations Inc.). Figures 5 and 6 were drawn with GRASP (Nicholls et al., 1991).
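The refinement described under Model building and refinement was monitored with Rfree (Brünger, 1992). Rfree applies the usual crystallographic R factor, R = Σ| |Fo| − |Fc| | / Σ|Fo|, to a small test set of reflections that is excluded from refinement, while Rwork is computed over the remaining reflections. The sketch below illustrates that split on hypothetical toy amplitudes; it assumes Fc is already on the scale of Fo, the function names are illustrative, and it is not a substitute for the CNS implementation.

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|); assumes Fc is already on the Fo scale."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)

def split_r_work_r_free(f_obs, f_calc, free_flags):
    """R_work over working reflections, R_free over the flagged (held-out) test set."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free

# Hypothetical toy data: 1000 reflections, about 5 % flagged as the free set.
rng = random.Random(0)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(1000)]
f_calc = [fo * rng.uniform(0.8, 1.2) for fo in f_obs]  # stand-in for model amplitudes
free_flags = [rng.random() < 0.05 for _ in range(1000)]

r_work, r_free = split_r_work_r_free(f_obs, f_calc, free_flags)
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because this toy model is never fitted to the working reflections, the two values come out similar; in a real refinement Rfree usually stays above Rwork, and a widening gap between them is the usual warning sign of overfitting.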
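The assignment of the bound metal as zinc rests partly on its tetrahedral coordination. One simple way to quantify that geometry is to compare the six ligand-metal-ligand angles with the ideal tetrahedral angle, arccos(−1/3) ≈ 109.47°. The sketch below does this for hypothetical coordinates (an ideal tetrahedron); the helper names and coordinates are illustrative assumptions, not values taken from the PI-PfuI model, and angle geometry alone does not identify a metal, which is why the text also cites the refined temperature factor.

```python
import math
from itertools import combinations

IDEAL_TETRAHEDRAL = math.degrees(math.acos(-1.0 / 3.0))  # about 109.47 degrees

def angle_deg(center, p, q):
    """Angle p-center-q in degrees."""
    u = [pi - ci for pi, ci in zip(p, center)]
    v = [qi - ci for qi, ci in zip(q, center)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def tetrahedral_rmsd(metal, ligands):
    """Return the six L-M-L angles and their RMS deviation from the ideal tetrahedral angle."""
    angles = [angle_deg(metal, p, q) for p, q in combinations(ligands, 2)]
    rms = math.sqrt(sum((a - IDEAL_TETRAHEDRAL) ** 2 for a in angles) / len(angles))
    return angles, rms

# Hypothetical coordinates (angstroms): four ligands 2.3 A from the metal in an ideal tetrahedron.
d = 2.3 / math.sqrt(3)
metal = (0.0, 0.0, 0.0)
ligands = [(d, d, d), (d, -d, -d), (-d, d, -d), (-d, -d, d)]

angles, rmsd = tetrahedral_rmsd(metal, ligands)
print([round(a, 1) for a in angles], round(rmsd, 3))
```

Applied to real coordinates, for example the metal site in the deposited model, a square-planar site would instead give angles near 90° and 180° and a large deviation.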

Acknowledgments

We thank Drs N. Kunishima, D. Tsuchiya, and T. Oyama, Mr T. Nishino, Mr K. Yamada, and Ms S. Yamamoto for assistance with the measurements at the Photon Factory (Proposal no. 98-G357). Dr S. Tsutakawa is acknowledged for thoughtful assistance throughout the structure determination. We thank Dr E. Morita at Ehime University for the inductively coupled plasma analysis measurements. We thank Drs A. Bocquier and I. Cann for critically reading the manuscript and Dr Y. Kyogoku for useful discussions.

References

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.
Belfort, M. & Perlman, P. S. (1995). Mechanisms of intron mobility. J. Biol. Chem. 270, 30237-30240.
Belfort, M. & Roberts, R. J. (1997). Homing endonucleases: keeping the house in order. Nucl. Acids Res. 25, 3379-3388.
Brünger, A. T. (1992). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature, 355, 472-475.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Crystallography and NMR system: a new software suite for macromolecular structure determination. Acta Crystallog. sect. D, 54, 905-921.
CCP4 (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallog. sect. D, 50, 760-763.
Christ, F., Schöttler, S., Wende, W., Steuer, S., Pingoud, A. & Pingoud, V. (1999). The monomeric homing endonuclease PI-SceI has two catalytic centres for cleavage of the two strands of its DNA substrate. EMBO J. 18, 6908-6916.
Cooper, A. A. & Stevens, T. H. (1995). Protein splicing: self-splicing of genetically mobile elements at the protein level. Trends Biochem. Sci. 20, 351-356.
Duan, X., Gimble, F. S. & Quiocho, F. A. (1997). Crystal structure of PI-SceI, a homing endonuclease with protein splicing activity. Cell, 89, 555-564.
Flick, K. E., Jurica, M. S., Monnat, R. J., Jr & Stoddard, B. L. (1998). DNA binding and cleavage by the nuclear intron-encoded homing endonuclease I-PpoI. Nature, 394, 96-101.
Fortelle, E. D. L. & Bricogne, G. (1997). Maximum-likelihood heavy-atom parameter refinement for multiple isomorphous replacement and multiwavelength anomalous diffraction methods. Methods Enzymol. 276, 472-494.

Gimble, F. S., Duan, X., Hu, D. & Quiocho, F. A. (1998). Identification of Lys-403 in the PI-SceI homing endonuclease as part of a symmetric catalytic center. J. Biol. Chem. 273, 30524-30529.
Grindl, W., Wende, W., Pingoud, V. & Pingoud, A. (1998). The protein splicing domain of the homing endonuclease PI-SceI is responsible for specific DNA binding. Nucl. Acids Res. 26, 1857-1862.
Hall, T. M., Porter, J. A., Young, K. E., Koonin, E. V., Beachy, P. A. & Leahy, D. J. (1997). Crystal structure of a Hedgehog autoprocessing domain: homology between Hedgehog and self-splicing proteins. Cell, 91, 85-97.
He, Z., Crist, M., Yen, H., Duan, X., Quiocho, F. A. & Gimble, F. S. (1998). Amino acid residues in both the protein splicing and endonuclease domains of the PI-SceI intein mediate DNA binding. J. Biol. Chem. 273, 4607-4615.
Heath, P. J., Stephens, K. M., Monnat, R. J., Jr & Stoddard, B. L. (1997). The structure of I-CreI, a group I intron-encoded homing endonuclease. Nature Struct. Biol. 4, 468-476.
Holliday, R. (1964). A mechanism for gene conversion in fungi. Genet. Res. 5, 282-304.
Holm, L. & Sander, C. (1999). Protein folds and families: sequence and structure alignments. Nucl. Acids Res. 27, 244-247.
Hu, D., Crist, M., Duan, X., Quiocho, F. A. & Gimble, F. S. (2000). Probing the structure of the PI-SceI-DNA complex by affinity cleavage and affinity photocross-linking. J. Biol. Chem. 275, 2705-2712.
Ichiyanagi, K., Iwasaki, H., Hishida, T. & Shinagawa, H. (1998). Mutational analysis on structure-function relationship of a Holliday junction specific endonuclease RuvC. Genes Cells, 3, 575-586.
Jurica, M. S. & Stoddard, B. L. (1999). Homing endonucleases: structure, function and evolution. Cell Mol. Life Sci. 55, 1304-1326.
Jurica, M. S., Monnat, R. J., Jr & Stoddard, B. L. (1998). DNA recognition and cleavage by the LAGLIDADG homing endonuclease I-CreI. Mol. Cell, 2, 469-476.
Klabunde, T., Sharma, S., Telenti, A., Jacobs, W. R., Jr & Sacchettini, J. C. (1998). Crystal structure of GyrA intein from Mycobacterium xenopi reveals structural basis of protein splicing. Nature Struct. Biol. 5, 31-36.
Komori, K., Fujita, N., Ichiyanagi, K., Shinagawa, H., Morikawa, K. & Ishino, Y. (1999a). PI-PfuI and PI-PfuII, intein-coded homing endonucleases from Pyrococcus furiosus. I. Purification and identification of the homing-type endonuclease activities. Nucl. Acids Res. 27, 4167-4174.
Komori, K., Ichiyanagi, K., Morikawa, K. & Ishino, Y. (1999b). PI-PfuI and PI-PfuII, intein-coded homing endonucleases from Pyrococcus furiosus. II. Characterization of the binding and cleavage abilities by site-directed mutagenesis. Nucl. Acids Res. 27, 4175-4182.
Kühlmann, U. C., Moore, R. G., James, R., Kleanthous, C. & Hemmings, M. A. (1999). Structural parsimony in endonuclease active sites: should the number of homing endonuclease families be redefined? FEBS Letters, 463, 1-2.
Lambowitz, A. M. & Belfort, M. (1993). Introns as mobile genetic elements. Annu. Rev. Biochem. 62, 587-622.
Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallog. 26, 283-291.

Mueller, J. E., Bryk, M., Loizos, N. & Belfort, M. (1993). Homing endonucleases. In Nucleases (Linn, S. M., Lloyd, R. S. & Roberts, R. J., eds), pp. 111-143, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Mueller, J. E., Clyman, J., Huang, Y. J., Parker, M. M. & Belfort, M. (1996). Intron mobility in phage T4 occurs in the context of recombination-dependent DNA replication by way of multiple pathways. Genes Dev. 10, 351-364.
Nicholls, A., Sharp, K. A. & Honig, B. (1991). Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins: Struct. Funct. Genet. 11, 281-296.
Otwinowski, Z. & Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307-326.
Perler, F. B., Olsen, G. J. & Adam, E. (1997a). Compilation and analysis of intein sequences. Nucl. Acids Res. 25, 1087-1093.
Perler, F. B., Xu, M. Q. & Paulus, H. (1997b). Protein splicing and autoproteolysis mechanisms. Curr. Opin. Chem. Biol. 1, 292-299.
Pietrokovski, S. (1994). Conserved sequence features of inteins (protein introns) and their use in identifying new inteins and related proteins. Protein Sci. 3, 2340-2350.

Pingoud, V., Grindl, W., Wende, W., Thole, H. & Pingoud, A. (1998). Structural and functional analysis of the homing endonuclease PI-SceI by limited proteolytic cleavage and molecular cloning of partial digestion products. Biochemistry, 37, 8233-8243.
Pingoud, V., Thole, H., Christ, F., Grindl, W., Wende, W. & Pingoud, A. (1999). Photocross-linking of the homing endonuclease PI-SceI to its recognition sequence. J. Biol. Chem. 274, 10235-10243.
Riera, J., Robb, F. T., Weiss, R. & Fontecave, M. (1997). Ribonucleotide reductase in the archaeon Pyrococcus furiosus: a critical enzyme in the evolution of DNA genomes? Proc. Natl Acad. Sci. USA, 94, 475-478.
Selent, U., Rüter, T., Köhler, E., Liedtke, M., Thielking, V., Alves, J., Oelgeschlager, T., Wolfes, H., Peters, F. & Pingoud, A. (1992). A site-directed mutagenesis study to identify amino acid residues involved in the catalytic function of the restriction endonuclease EcoRV. Biochemistry, 31, 4808-4815.
Seligman, L. M., Stephens, K. M., Savage, J. H. & Monnat, R. J., Jr (1997). Genetic analysis of the Chlamydomonas reinhardtii I-CreI mobile intron homing system in Escherichia coli. Genetics, 147, 1653-1664.
Silva, G. H., Dalgaard, J. Z., Belfort, M. & Van Roey, P. (1999). Crystal structure of the thermostable archaeal intron-encoded endonuclease I-DmoI. J. Mol. Biol. 286, 1123-1136.

Edited by I. A. Wilson (Received 17 March 2000; received in revised form 12 May 2000; accepted 15 May 2000)