Biotechnol. J. 2013, 8, 855–864

DOI 10.1002/biot.201200380

www.biotechnology-journal.com

Research Article

Computational protein design with electrostatic focusing: Experimental characterization of a conditionally folded helical domain with a reduced amino acid alphabet

Maria Suárez-Diez1, Anaïs M. Pujol2, Manolis Matzapetakis2, Alfonso Jaramillo3 and Olga Iranzo2

1 Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
2 Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Oeiras, Portugal
3 Université d’Evry, Institut de Biologie des Systèmes et de Synthèse, Évry, France

Automated methodologies to design synthetic proteins from first principles use energy computations to estimate the ability of the sequences to adopt a targeted structure. This approach is still far from systematically producing native-like sequences, most likely because of inaccuracies in modeling the interactions between the protein and its aqueous environment. This is particularly challenging when engineering small protein domains (which have fewer polar pair interactions within the protein than with the solvent). We have re-designed a three-helix bundle, domain B, using a fixed backbone and a four-amino-acid alphabet. We have enlarged the rotamer library with conformers that increase the weight of electrostatic interactions within the design process without altering the energy function used to compute the folding free energy. Our synthetic sequences show less than 15% similarity to any Swissprot sequence. We have characterized our sequences in different solvents using circular dichroism and nuclear magnetic resonance. Whether the targeted structure is achieved depends on the solvent used. This method can be readily extended to larger domains. Our method will be useful for the engineering of proteins that become active only in a given solvent and for designing proteins in the context of hydrophobic solvents, an important fraction of the situations in the cell.

Received 29 NOV 2012 Revised 22 APR 2013 Accepted 03 JUN 2013

Supporting information available online

Keywords: Computational protein design · Electrostatic focusing · Physical effective energy functions · Reduced alphabet

Correspondence and current address: Dr. Olga Iranzo, Aix Marseille Université, Centrale Marseille, CNRS, iSm2 UMR 7313, 13397, Marseille, France
E-mail: [email protected]

Additional correspondence: Dr. Alfonso Jaramillo, Institute of Systems and Synthetic Biology (ISSB), 5 rue Henri Desbruères, F-91030 Évry Cedex, France
E-mail: [email protected]

Abbreviations: CD, circular dichroism; EEEF, empirical effective energy function; ESI-MS, electrospray mass spectrometry; HPLC, high-pressure liquid chromatography; NMR, nuclear magnetic resonance; PEEF, physical effective energy function; TFE, trifluoroethanol

© 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

1 Introduction

One of the approaches for the design of proteins with a targeted structure or function is to attack the so-called inverse folding problem: sequences with the desired length are explored to find those that stabilize the chosen fold. Folding free energy is used to rank the ability of the sequences to adopt the desired three-dimensional conformation, and only the sequences with the lowest folding free energy are kept and further explored. In nature, the biologically active states are not necessarily the states with lowest free energy; however, within the design of proteins, the goal of the designer is not to replicate nature but to find at least one sequence able to adopt the chosen structure. The design of enzymes with a targeted function poses an additional problem since multiple and often competing objectives have to be considered: Some of the folding free energy may have to be sacrificed to have an optimal electrostatic environment. Computational methods have been designed [1] to tackle this problem and simultaneously consider two objectives; however, these methods still rely on an estimation of the folding free energy to assess the stability of the designed protein.


Previously, three types of models to compute the free energy of the protein have been defined [2] and they differ in the required amount of experimental information (obtained from the analysis of protein structures). Knowledge-based potentials are derived from the statistical analysis of the sequence of proteins of known structure; empirical effective energy functions (EEEF) include empirical terms accounting for protein stability, whereas physical effective energy functions (PEEF) are based on a first-principles description of the inter-atomic interactions within the elements of the proteins. PEEFs use atomic-level molecular dynamics force fields such as CHARMM or GROMACS [3, 4], and the same parameterization for the energy functions regardless of the target structure. The use of PEEF relies on an accurate physical model able to correctly describe the properties of the underlying small molecules within the context of a protein structure. This way, protein characteristics such as secondary structure propensities and the polar/non-polar amino acid pattern emerge from the individual properties of the protein constituent elements [5].

Rotamer libraries are commonly used to reduce the number of side-chain conformations in the modeling of the folded state. In known protein structures, some values of the internal dihedral angles of the side chains appear more frequently than others. Rotamer libraries contain, for each residue, a collection of frequently occurring rotamers (rotational isomers), which might also include local effects such as a dependency on the backbone φ and ψ angles [6].

PEEFs have been successfully used in the past for the design of proteins with targeted structures and functions [7]. The cores of these redesigned proteins usually show a striking similarity with native-like sequences; nevertheless, the sequences on the surface of the protein are distinctively non-native [8, 9].
Previously, it has been suggested that this failure to recover native-like sequences on the surface of the protein may be attributed either to an inadequate treatment of the contribution of entropy to the free energy of the folded state or to an inaccuracy of the solvation models used to simulate the interaction of the protein with its aqueous environment [7, 9]. It is still an open question whether such designed proteins with non-natural surfaces could lead to folded structures. We propose that the difficulties posed by a possibly inadequate solvation model could be circumvented by looking for sequences able to fold into the targeted structure, but for which the computational method increases the effect of the electrostatic interactions within the protein core during the folding process. We aim at a highly accurate description of the electrostatic interactions within the protein core, so that we can compensate for the effect of having a poor description of the interaction with the aqueous environment. Therefore, our optimization process ranks the sequences based on their folding energy, but only considers those sequences where the main change in free energy during the folding process is due to the electrostatic interactions within its core, and these interactions will be the main driving force of the folding process. For these sequences, the effect of the inaccurate estimation of the solvation energy during the computational design procedure will be less than for those sequences where the solvation energy plays the key role along the folding path.

Effective dielectric constants, adapted to describe the charge–charge interactions within the proteins, have been shown to be efficient for predicting protein stability [10]. However, we used a first-principles approach, which implies not adding any new parameters. Therefore, we proposed to increase the size of the rotamer library to better satisfy the electrostatic interactions. We chose to use a tailor-made rotamer library that takes into account the electrostatic context of each amino acid and further refined the rotamer conformation using an electrostatic-driven optimization. This electrostatic enhancement only affects the construction of the rotamer library since the ranking of the explored sequences was still done using a first-principles-derived PEEF, which considers water as the solvent. However, the enhancement of the rotamer library allows a better scoring of the sequences with a high number of electrostatic interacting pairs.

Using this approach we performed two redesigns of the three-helix bundle of staphylococcal protein A. This type of domain is one of the smallest autonomously folding bundles in nature, containing only 58 amino acids, and the solution structures of Domain B and its related Domain Z have been solved by nuclear magnetic resonance (NMR) [11–13]. In the first design, the only criterion to rank the sequences was the computed value of the folding free energy. For the second design, which was independent of the first, we made the same considerations, but also considered our previous knowledge on the hydrophobic surface patterning of native proteins.
As we were mainly interested in the comparison between both approaches, we further simplified the design by using a reduced alphabet consisting of only four amino acids: isoleucine (I), lysine (K), glutamic acid (E) and alanine (A). Such reduced alphabets have been shown to contain enough variability to lead to foldable structures in some reported cases [14–16].

2 Materials and methods

2.1 Computational methodology

The computational protein design software, DESIGNER, solves the inverse folding problem [5, 7, 9] and, departing from a high resolution structure, analyses the space of possible sequences to search for those that are likely to fold into the desired target structure. For each analyzed sequence, DESIGNER uses the molecular mechanics force field CHARMM22 [17], together with experimental hydration coefficients for small molecules in water [18], to construct detailed atomic models of the folded and unfolded states. The folding free energy is evaluated as the difference between the free energy in these states. The implicit solvation model used by DESIGNER is based on the surface area of the atoms that is accessible to water. The goal of the design process is to find sequences minimizing the folding free energy. DESIGNER software has been successfully applied to the redesign of proteins and enzymes [1, 19, 20].
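The quantity being minimized can be restated schematically as follows. The symbols s (a candidate sequence), G, σ_i and A_i are notation introduced here for illustration; the surface-area form of the solvation term is the usual shape of an accessible-surface-area model and is an assumption, not a formula quoted from DESIGNER.

```latex
\Delta G_{\mathrm{fold}}(s) = G_{\mathrm{folded}}(s) - G_{\mathrm{unfolded}}(s),
\qquad
G_{\mathrm{solv}} \approx \sum_{i \in \mathrm{atoms}} \sigma_i \, A_i
```

Here A_i would be the solvent-accessible surface area of atom i and σ_i a hydration coefficient for its atom type; the design then searches for the sequence s with the lowest ΔG_fold.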

2.1.1 Rotamer library using electrostatic focusing

To construct the model of the folded state for each possible sequence, an initial local rotamer library that included the effect of the local backbone on the rotamer conformation was added at each residue position [21]. To further refine this library and increase the sampling of the electrostatic interactions in the overall design process (electrostatic focusing), we performed a local minimization of the structure of the rotamers. This process has been described elsewhere [22] and can be summarized as follows. For each pair of rotamers we performed an initial minimization of the structure, setting the backbone charges to zero to restrict the optimization to the interactions among the considered rotamers. Afterwards, a second minimization was done in a low-dielectric environment (with the dielectric constant of the medium artificially set to 8) to bias the search towards H-bond formation and to obtain the refined conformation of the residues. Only rotamer pairs with interaction energy lower than –5 kcal/mol were kept. The final number of rotamers was reduced through a clustering of the conformations, so that only rotamers differing by more than 20° in their conformational angles were stored for further use. Once the library had been refined, we restored the parameters defining our energy function to their initial values. This process was repeated for the four considered scaffolds, and in every case these libraries contained around 1500 rotamers in the first design, and 1200 in the second one (when only hydrophobic amino acids were allowed in the core positions).
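The last two steps of this procedure (the –5 kcal/mol cutoff and the 20° clustering) can be sketched as below. This is a minimal illustration, not the DESIGNER implementation: the function names, the representation of a rotamer as a tuple of dihedral angles, and the reading of "differing by more than 20°" as "at least one dihedral differs by more than 20°" are all assumptions, and the sketch treats rotamers of a single residue type at a single position.

```python
ENERGY_CUTOFF = -5.0  # kcal/mol, threshold quoted in the text
ANGLE_CUTOFF = 20.0   # degrees, clustering threshold quoted in the text


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def is_distinct(chi, kept, cutoff=ANGLE_CUTOFF):
    """True if `chi` differs from every kept rotamer by more than `cutoff`
    in at least one dihedral angle (assumed clustering criterion)."""
    return all(
        any(angle_diff(x, y) > cutoff for x, y in zip(chi, other))
        for other in kept
    )


def refine_library(minimized_pairs):
    """Filter minimized rotamer pairs and cluster the survivors.

    `minimized_pairs` is an iterable of (chi_a, chi_b, energy) tuples, where
    chi_a and chi_b are tuples of side-chain dihedrals (hypothetical format)
    and energy is the pair interaction energy in kcal/mol.
    """
    kept = []
    for chi_a, chi_b, energy in minimized_pairs:
        if energy >= ENERGY_CUTOFF:
            continue  # pair is not favourable enough, discard both rotamers
        for chi in (chi_a, chi_b):
            if is_distinct(chi, kept):
                kept.append(chi)
    return kept


if __name__ == "__main__":
    pairs = [
        ((60.0, 180.0), (-60.0, 170.0), -6.2),
        ((65.0, 175.0), (-60.0, 60.0), -7.0),   # first rotamer is a near-duplicate
        ((0.0, 0.0), (120.0, 120.0), -1.0),     # rejected by the energy cutoff
    ]
    print(refine_library(pairs))
```

A greedy pass like this depends on the order in which pairs are visited; a production clustering would more likely pick cluster representatives by energy.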

2.1.2 Combinatorial optimization

To select the sequence with the lowest folding free energy for each scaffold, we employed the Monte Carlo simulated annealing (MCSA) optimization algorithm, together with the Metropolis criterion to accept/reject solutions. The temperature followed an exponential cooling scheme, decreasing from the original value T0/R = 1 kcal/mol to the final value Tf/R = 0.01 kcal/mol. For each scaffold, we performed 200 independent optimization runs with 10⁵ iterations for the MCSA.
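The optimization loop described above can be sketched as follows. This is a generic simulated-annealing skeleton under stated assumptions, not DESIGNER code: the move set (mutating one random position), the toy energy function, and all names are hypothetical, and the temperature is expressed directly in energy units (kcal/mol) running from 1 to 0.01 as in the text.

```python
import math
import random


def toy_energy(seq):
    """Hypothetical stand-in for the folding free energy: rewards K at
    position i paired with E at i+4 (a helical salt-bridge spacing)."""
    return -sum(1.0 for i in range(len(seq) - 4)
                if seq[i] == "K" and seq[i + 4] == "E")


def anneal(energy, positions, alphabet="IKEA", t0=1.0, tf=0.01,
           n_iter=100_000, seed=0):
    """Monte Carlo simulated annealing with Metropolis acceptance and
    exponential cooling from t0 to tf over n_iter iterations."""
    rng = random.Random(seed)
    seq = [rng.choice(alphabet) for _ in range(positions)]
    e = energy(seq)
    best_seq, best_e = list(seq), e
    cooling = (tf / t0) ** (1.0 / max(n_iter - 1, 1))
    t = t0
    for _ in range(n_iter):
        i = rng.randrange(positions)
        old = seq[i]
        seq[i] = rng.choice(alphabet)          # trial point mutation
        e_new = energy(seq)
        delta = e_new - e
        # Metropolis criterion: always accept downhill, sometimes uphill
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e = e_new
            if e < best_e:
                best_seq, best_e = list(seq), e
        else:
            seq[i] = old                       # reject: undo the mutation
        t *= cooling
    return "".join(best_seq), best_e


if __name__ == "__main__":
    # One short run; the paper reports 200 independent runs per scaffold.
    print(anneal(toy_energy, positions=20, n_iter=20_000, seed=1))
```

Independent runs would differ only in `seed`; keeping the best result over all runs mirrors the "lowest folding free energy for each scaffold" selection.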

2.2 Peptide synthesis

The peptides Native (Domain B), IKEA and IKEAW35 (see Table 1 for sequences) were prepared on the CEM Liberty microwave-assisted peptide synthesizer using standard Fmoc chemistry [23]. Crude peptides were purified by reversed-phase high-pressure liquid chromatography (HPLC) and characterized using electrospray mass spectrometry (ESI-MS). All the peptides are amidated at the C terminus and have a free amine at the N terminus. Full descriptions of the processes and ESI-MS results are given in the Supporting information.

2.3 Circular dichroism spectroscopy

The circular dichroism (CD) spectra were acquired using a JASCO J-815 spectrometer with 0.1-cm strain-free quartz cuvettes. Sample preparation and parameters used to record the CD spectra are described in Supporting information. All the CD spectra are reported in molar ellipticity ([θ], deg·cm²·dmol⁻¹·residue⁻¹), which was calculated using the following equation:

$$[\theta] = \frac{\theta_{\mathrm{obs}}}{10 \times C \times l \times n}$$

where θobs is the observed ellipticity in millidegrees, l the optical path length of the cell in cm, C the peptide concentration in mol/L, and n the number of residues.
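As a worked example of this conversion, the short function below applies the equation directly. The function name and the input reading of –10 mdeg are hypothetical; the concentration (12.8 µM), path length (0.1 cm) and residue count (59, as in the Table 1 sequences that carry an N-terminal Trp) are taken from values quoted elsewhere in the article purely for illustration.

```python
def molar_ellipticity(theta_obs_mdeg, conc_mol_per_l, path_cm, n_residues):
    """Convert an observed ellipticity (millidegrees) into per-residue
    molar ellipticity in deg·cm²·dmol⁻¹·residue⁻¹."""
    return theta_obs_mdeg / (10.0 * conc_mol_per_l * path_cm * n_residues)


if __name__ == "__main__":
    # Hypothetical reading of -10 mdeg for a 12.8 µM, 59-residue peptide
    # in a 0.1-cm cuvette; the result is roughly -1.32e4.
    value = molar_ellipticity(-10.0, 12.8e-6, 0.1, 59)
    print(f"{value:.0f} deg cm^2 dmol^-1 residue^-1")
```

The factor of 10 accounts for the change from millidegrees to degrees and from mol/L to dmol/cm³.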

2.4 Analytical ultracentrifugation

Sedimentation analyses of the peptides Native, IKEA, and IKEAW35 were performed at a concentration of 70–100 µM in 50 mM phosphate buffer pH 7.4 with and without 5.57 M trifluoroethanol (TFE; 40% TFE). These experiments were performed at the Analytical Ultracentrifugation and Macromolecular Interactions Facility at Centro de Investigaciones Biológicas, CSIC (Madrid, Spain) using an XL-I analytical ultracentrifuge (Beckman Coulter Inc.) with a UV-VIS optics detection system, an An50Ti rotor and 12-mm double-sector centerpieces. A detailed description is given in the Supporting information.

Table 1. Sequences of the Native and designed peptides

Peptide    Sequence (residues 1–59; the original ruler marks positions 1, 10, 24, 41 and 59)
Native     _VDNKFNKEQQNAFWEILHLPNLNEEQRNGFIQSLKDDPSQSANLLAEAKKLNDAQAPK-NH2
IKEA       WVDNKFNEKQKEKKKKAEHLPKKNEKQKKGEKEKEKKEPEQAINKKKEIEKEEEKQAPK-NH2
IKEA2      _VDNKFNKKQEEIEKKIKHLPKKNEKQIKGAKEKIKKKPEQAINIEKEIKKIEEKQAPK-NH2
IKEAW35    _VDNKFNKKQEEIEKKIKHLPKKNEKQIKGAKEKWKKKPEQAINIEKEIKKIEEKQAPK-NH2

Code: Fix positions, N-caps, C-caps, Hydrophobic core, Designed positions. A color version of this table is available in the Supporting information (Table S1).

2.5 NMR spectroscopy

Peptide NMR samples were prepared in either 50 mM phosphate buffer pH 7.4 in 90/10 H2O/D2O or in a mixture of TFE/50 mM phosphate buffer pH 7.4/D2O (8/1.5/0.5). Concentrations are specified for each case. NMR spectra were collected at 25°C on a 500-MHz Bruker Avance III spectrometer equipped with a QXI probe with z axis gradients. Details regarding data acquisition and solvent suppression are given in the Supporting information.

3 Results

3.1 Computational design

3.1.1 Protein design algorithm

Our electrostatic focusing algorithm translates to an enlargement of the rotamer library. In this case, the initial rotamer libraries, which considered only the effect of the local backbone conformation, contained about 1000 rotamers for each of the different conformers. These libraries were enlarged by about 50% for the first design and by 20% for the second design, which was independent of the first. This process took less than 20 h for each conformer when running on a 3 GHz Intel machine. The enlargement of the libraries also translated into an increase in the computational time required to compute the interaction energies between each possible pair of rotamers in the final environment (dielectric constant set back to 1). It also increased the number of iterations needed to achieve convergence in the combinatorial optimization process, but the increase in the overall computation time for these steps was similar to what would have been expected when applying our algorithm to rotamer libraries of the mentioned size.

3.1.2 Scaffold structure

As a scaffold, we considered Domain B of the immunoglobulin G-binding protein A from Staphylococcus aureus. X-ray structures of protein A binding the human immunoglobulin G were obtained previously [24, 25], but there is strong evidence that the protein undergoes conformational changes upon binding [11]; therefore, we decided to focus on a structure showing the free conformation of the protein in solution. Different NMR structures of this protein and its related Domain Z have been deposited at the protein data bank (PDB) with codes 1BDC, 1BDD, 2SPZ, 1Q2N and 2JWD [11–13]. For our computational approach, we selected the most recent of them, 2JWD, showing the B domain with the mutation Y15W. The structure submitted to the database contains 29 conformers. Our methodology considers only one conformer at a time, so we decided to work with each conformer in parallel and combine the computational outcome for each of them. An initial analysis showed that conformers 1, 8, 16 and 24 are the ones showing the biggest differences in the three-helix bundles; the rest of them differ from these in the structure of the N- and C-terminal loops, so we chose the aforementioned conformers as representatives of the structure.

Despite the reduced size of the chosen protein (58 residues), the full design is a challenge of its own due to the huge size of the space of possible sequences (~10⁷⁰). Using a reduced alphabet, consisting of only four amino acids (I, K, E and A), we were able to reduce this space. We only redesigned the positions corresponding to the α-helices, a total of 50 positions (from position 7 to position 56). Amino acids in the N-cap positions (7N, 24N, 41Q) as well as in the third position after the N-cap (N-cap +3: 10Q, 27Q, 44N) have previously been shown to be critical to define the edges of the helices [12, 13]. As a result we decided to maintain the wild-type amino acids in these positions, although we allowed for conformational modifications in the computational modeling. Similar considerations led us to fix the amino acid content of the C-caps (19H, 20L, 56Q, 57A) of the helices. In addition, we only allowed mutations at positions not occupied by proline or glycine. In a second round of design and to ensure the correct formation of the hydrophobic–hydrophilic pattern, only I or A residues were allowed in core positions (those with solvent accessibility lower than 10%).
To summarize, we had 50 designed positions; in 12 of them (7, 10, 19, 20, 21, 24, 27, 30, 39, 41, 44 and 56) only conformational changes of the wild-type amino acids were allowed; in the first round of design we allowed all 38 remaining positions to mutate to members of a reduced I, K, E, A alphabet. In the second round of design, for 9 of them (13, 17, 28, 31, 35, 42, 45, 49, 52), we further restricted mutations to only I or A. The final space of sequences we had to explore in the second round of design had a size of 2⁹ × 4²⁹ ≈ 10²⁰ sequences for each of the four explored conformers (Fig. 1A).

We wanted to introduce a Trp residue for the experimental determination of the concentration (through UV-Vis spectroscopy techniques) of the synthesized peptide solutions. In the first design we did not want to interfere with the automatic ranking of the sequences done using only the folding free energy. Thus, to avoid any interference between the redesigned part of the sequence and the Trp residue, we decided to place the Trp at the starting position of the peptide. In the second design, we did indeed consider the hydrophobic propensity of each of the positions, to place the Trp in the one that would lead to the lowest folding free energy. Therefore, we explored the positions 10, 17, 28, 35, 42 and 49 to find the one most likely to accommodate a Trp residue.

Figure 1. (A) Structure of the B domain of the immunoglobulin G-binding protein A from S. aureus (PDB access code 2JWD). Blue indicates the positions that were not designed. Positions marked in magenta or red, corresponding to the N- and C-caps, retained their wild-type residues, but were allowed conformational changes. In the first design, positions in gray or orange were allowed to mutate to I, K, E or A, whereas in the second design, positions colored in orange (corresponding to the hydrophobic core) were only allowed to mutate to I or A. (B) Proposed structure of the designed sequence IKEA. The orange positions signal the core of the peptide. The blue line shows the stabilizing interaction between 13K and 31E. (C) Proposed structure of the designed sequence IKEAW35. The color coding is the same as in A. The peptide has been rotated to show position 35, with the Trp residue (marked in green). Blue lines show the network of interactions between the residues on the surface of the peptide.
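The sequence-space sizes implied by the position counts in Section 3.1.2 can be checked with a few lines of arithmetic. The function name is hypothetical; the counts (38 mutable positions, 9 of them restricted to I/A in the second round) come from the text.

```python
import math


def sequence_space(n_restricted, n_free, restricted_size=2, alphabet_size=4):
    """Number of sequences when `n_restricted` positions choose from
    `restricted_size` residues and `n_free` positions from `alphabet_size`."""
    return restricted_size ** n_restricted * alphabet_size ** n_free


if __name__ == "__main__":
    first_round = sequence_space(0, 38)    # all 38 positions: I, K, E or A
    second_round = sequence_space(9, 29)   # 9 core positions: I or A only
    for label, size in (("first", first_round), ("second", second_round)):
        print(f"{label} round: {size:.2e} (10^{math.log10(size):.1f})")
```

Restricting the nine core positions shrinks the search space by a factor of 2⁹ = 512 relative to the first round.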

3.1.3 Designed sequence

Upon analyzing the obtained sequences and considering the formation of H-bonds on the surface of the peptide, the appearance of clashes and the formation of the hydrophobic/hydrophilic pattern, we found that the scaffold providing the best results was conformer 16. The designed sequences are shown in Table 1. The sequence IKEA was obtained in the first round of design, and it showed a folding energy of –297 kcal/mol (when using the same scoring function for both cases). The sequence IKEA2 was obtained in the second round and it showed a folding energy of –238 kcal/mol. In addition, we iteratively repeated the procedure for the second designed sequence to introduce a Trp residue in the previously mentioned positions. The best results were obtained for position 35 (Table S2), and the sequence that was finally synthesized, IKEAW35, differed from IKEA2 by the mutation I35W. Among the 38 positions allowed to mutate, the optimization process introduced mutations in 31 of them; there was less than 10% sequence identity between the wild-type sequence and the designed proteins.
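The mutation count can be reproduced from the Table 1 sequences. The sketch below is illustrative only: the helper name is hypothetical, sequences are indexed from position 1 including the leading placeholder character, and the fixed positions are the twelve listed in Section 3.1.2. For IKEA2 this count comes out to 31 of the 38 designed positions.

```python
# Sequences copied from Table 1 (C-terminal -NH2 omitted); index 0 is position 1.
NATIVE = "_VDNKFNKEQQNAFWEILHLPNLNEEQRNGFIQSLKDDPSQSANLLAEAKKLNDAQAPK"
IKEA2 = "_VDNKFNKKQEEIEKKIKHLPKKNEKQIKGAKEKIKKKPEQAINIEKEIKKIEEKQAPK"
IKEAW35 = "_VDNKFNKKQEEIEKKIKHLPKKNEKQIKGAKEKWKKKPEQAINIEKEIKKIEEKQAPK"

# Positions 7-56 were modelled; these twelve kept their wild-type residue.
FIXED = {7, 10, 19, 20, 21, 24, 27, 30, 39, 41, 44, 56}
DESIGNED = [p for p in range(7, 57) if p not in FIXED]


def mutated_positions(reference, design, positions):
    """Return the 1-based positions where `design` differs from `reference`."""
    return [p for p in positions if reference[p - 1] != design[p - 1]]


if __name__ == "__main__":
    changed = mutated_positions(NATIVE, IKEA2, DESIGNED)
    print(f"{len(changed)} of {len(DESIGNED)} designed positions mutated")
    print("IKEA2 -> IKEAW35:", mutated_positions(IKEA2, IKEAW35, range(1, 60)))
```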

3.1.4 Designed sequence IKEA

A schematic of the obtained model can be seen in Fig. 1B. For this designed sequence, the expected hydrophobic/hydrophilic pattern was not found; instead we found a layer of charged amino acids (13K, 31E and 45K) in the core of the protein. Nevertheless, a stabilizing interaction is formed between residues 13K and 31E. The K at position 45 points towards the outside of the hydrophobic core, and we suspected that the hydrophobic part of its side chain might contribute to the stabilization of the hydrophobic core, whereas the hydrophilic part might contribute to the formation of the correct hydrophilic pattern on the external part of the protein. In our computational design approach, the modeling was done using a fixed backbone; therefore, the differences between the designed and the original structure appeared only in the orientation of the side chains, which led to a root mean square deviation (RMSD) between the native protein domain and the redesigned protein of 0.009 Å (see Supporting information, Fig. S1 for an overlay of the native protein and the designed protein IKEA).

3.1.5 Designed sequence IKEAW35

Although the core positions were forced to have only A or I, for the rest of them we allowed DESIGNER to choose between polar (K, E) and nonpolar residues (I, A). We found that the mutations chosen led to the correct hydrophobic/hydrophilic pattern, and to the formation of a network of electrostatic interactions between the K and E residues of the surface (Fig. 1C). The RMSD between the native protein domain and the redesigned protein was 0.010 Å (see Supporting information, Fig. S2 for an overlay of the native protein and the designed protein IKEAW35).


3.2 Peptide synthesis

The Native and designed peptides (see Table 1 for sequences) were synthesized using standard Fmoc protocols; they were deprotected and cleaved from the resin using the conventional conditions (95% trifluoroacetic acid/2.5% triisopropylsilane/2.5% H2O). In special cases, the final Fmoc deprotection was not carried out to keep the protecting group as a marker to facilitate HPLC purification of the target peptide from the crude peptide mixture. After purification, the Fmoc group was removed using mild basic conditions (10% triethylamine in H2O) and the peptides were subsequently purified by HPLC to obtain the final product with high purity as verified by analytical HPLC. All the peptides were amidated at the C terminus and had a free amine at the N terminus. The peptides were characterized using ESI-MS.

3.3 CD spectroscopy

The CD experiments were performed in phosphate buffer at pH 7.4. The spectra shown in Figs. 2A and B display a band centered at 200 nm characteristic of a random coil structure [26], thus indicating both peptides are unfolded at this pH. Further titrations by changing the pH, the peptide concentration or ionic strength did not modify the random coil structure of the peptides. On the other hand, addition of TFE to the solution induced a change in the peptide structures (Figs. 2A and B). As the amount of TFE was increased (from 20 to 98%), a shift of the initial band to higher wavelengths was observed with the subsequent appearance of two new minima at 208 and 220 nm. These features are characteristic of helical structures [26]. Moreover, an isodichroic point at 203 nm was observed for both peptides, which indicates the direct conversion of the initial random coil peptide into a more structured helical final peptide. At the end of the TFE titration, the designed peptides showed a similar ellipticity to that displayed by the Native peptide under the same experimental conditions (Fig. 2C). Overall, both designs (IKEA and IKEAW35) showed a very similar degree of helical structure at the same amounts of TFE (Supporting information, Fig. S3).
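The inference drawn from the isodichroic point can be made explicit with a two-state model. This is a textbook argument written in notation introduced here (f for the helical fraction, x for the TFE content, λ_iso for the isodichroic wavelength); it is not a fit reported by the authors.

```latex
[\theta](\lambda, x) = \bigl(1 - f(x)\bigr)\,[\theta]_{\mathrm{coil}}(\lambda) + f(x)\,[\theta]_{\mathrm{helix}}(\lambda)
\quad\Longrightarrow\quad
[\theta](\lambda_{\mathrm{iso}}, x) = [\theta]_{\mathrm{coil}}(\lambda_{\mathrm{iso}}) = [\theta]_{\mathrm{helix}}(\lambda_{\mathrm{iso}})
\;\;\text{for all } x
```

If only two species contribute, every spectrum in the titration passes through the wavelength where the two pure spectra cross (here 203 nm); a significantly populated third species with a different spectrum would generally blur that common crossing.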

3.4 Analytical ultracentrifugation

Sedimentation equilibrium experiments were carried out to determine the oligomerization state of the peptides at pH 7.4 (50 mM phosphate) in the absence and presence of 40% TFE. For safety reasons (TFE is a flammable solvent) and because of the change in density and viscosity of the solution [27], it was not possible to use higher concentrations of TFE. The Native peptide was used as a standard or control since the addition of 40% TFE basically did not affect its CD spectrum (Supporting information, Fig. S4). The experimental data were fitted to a single species model. In all the cases, the data fitted well (Supporting information, Fig. S5) and the molecular weights (MW) obtained for each peptide studied are reported in Table 2.

Figure 2. (A) CD spectra of solutions containing the first designed peptide (12.8 µM IKEA, 50 mM phosphate buffer pH 7.4, 25°C) with increasing amounts of TFE. (B) CD spectra of solutions containing the second designed peptide (13.0 µM IKEAW35, 50 mM phosphate buffer pH 7.4, 25°C) with increasing amounts of TFE. (C) Comparison of the CD spectra of designed peptides (without and with 98% of TFE) and the Native peptide (11.2–13.0 µM peptides, 50 mM phosphate buffer pH 7.4, 25°C).

Table 2. Sedimentation equilibrium experiments at 20°C

Peptide          MW theoretical   MW experimental
                                  50 mM PB pH 7.4   + 40% TFE
Native peptide   6662             8000 ± 300        9900 ± 1000
IKEA             7199             10 300 ± 100      10 900 ± 1700
IKEAW35          6992             9900 ± 100        11 500 ± 5000

PB, phosphate buffer.
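One simple way to read Table 2 is to divide each fitted molecular weight by the theoretical monomer mass. The sketch below does only that; it is not the analysis described in the Supporting information. The dictionary layout and function name are hypothetical, and the assignment of the fitted values to the buffer and 40% TFE columns follows the order in which the numbers appear in the extracted table, which is an assumption.

```python
THEORETICAL_MW = {"Native": 6662, "IKEA": 7199, "IKEAW35": 6992}

# Fitted single-species molecular weights (uncertainties omitted).
EXPERIMENTAL_MW = {
    "50 mM PB pH 7.4": {"Native": 8000, "IKEA": 10300, "IKEAW35": 9900},
    "+ 40% TFE": {"Native": 9900, "IKEA": 10900, "IKEAW35": 11500},
}


def apparent_stoichiometry(mw_experimental, mw_theoretical):
    """Ratio of fitted to theoretical mass; about 1 for a monomer, 2 for a dimer."""
    return mw_experimental / mw_theoretical


if __name__ == "__main__":
    for condition, values in EXPERIMENTAL_MW.items():
        for peptide, mw in values.items():
            ratio = apparent_stoichiometry(mw, THEORETICAL_MW[peptide])
            print(f"{condition:16s} {peptide:8s} {ratio:.2f}")
```

Comparing each designed peptide's ratio with that of the Native control under the same condition parallels the comparison made in the Discussion.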

3.5 NMR spectroscopy

The 1H-NMR spectrum of the native peptide recorded in phosphate buffer/D2O (90/10) shows a good dispersion of its amidic signals spanning a range of 1.6 ppm, consistent with folded, mostly alpha helical peptides/proteins in aqueous solution. The same spectrum of the designed peptide under the same experimental conditions exhibited only 0.6 ppm dispersion, indicating lack of structure. However, in TFE/phosphate buffer/D2O (8/1.5/0.5) a better dispersion of signals was obtained spanning about 1.2 ppm (Fig. 3), suggesting the presence of some structure in TFE conditions. To get more insight, 2D nuclear Overhauser effect spectroscopy (NOESY) and total correlation spectroscopy (TOCSY) NMR experiments were carried out for the designed peptide IKEAW35 at a concentration of 2 mM in TFE. The 2D NOESY spectra (Fig. 4) show, in the H(N)-H(N) region, dispersion of signals as well as about 30 pairs of amide to amide nuclear Overhauser effects (NOEs). In addition, about 50 peaks can be observed in the H(N)-Hα region of the TOCSY, while comparison of the NOESY and TOCSY spectra shows a number of H(N)-Hα(i-1) NOEs. The equivalent 2D NOESY spectrum of the unfolded peptide in an acidic aqueous solution was devoid of non-intraresidual NOE crosspeaks (Supporting information, Fig. S6). Efforts to complete the resonance assignment of IKEAW35 were hampered by the limited number of NOEs observed with respect to the expected ones, coupled with the high prevalence of K and E repeats in the sequence of the construct, which further aggravated the uncertainty of assignments. In the absence of resonance assignment we could not unambiguously estimate the tertiary structure of the construct. Due to the striking similarities between the designed sequences IKEA and IKEAW35, we decided not to work further on the characterization of IKEA, since we did not expect to find any further dissimilarities in their structure.

Figure 3. 1H-NMR spectra of the amidic region of (A) Native peptide in 50 mM phosphate buffer pH 7.4, (B) IKEAW35 in 50 mM phosphate buffer pH 7.4, (C) IKEAW35 in TFE/50 mM phosphate buffer/D2O (8/1.5/0.5) and (D) IKEA pH 3.4.

4 Discussion

One of the problems with implicit solvation in protein design is the excessive over-representation of charged and flexible side chains. We developed a computational protein design algorithm that finds the best side chain–side chain pairings by creating a refined library. This relies on the use of electrostatic focusing to increase the size of the rotamer library. We have applied our methodology to the problem of designing a three-helix bundle protein domain using a reduced alphabet. Importantly, we tested our algorithm by synthesizing sequences coding for those protein domains, and we characterized them with CD and NMR in different solvents. We did the testing with different solvents to evaluate the performance of using an extended rotamer library. As we expected that side chain packing would be enhanced in more hydrophobic media, the use of a refined library would decrease the number of false negatives in our algorithm, i.e. sequences coding for proteins that would fold in nature but not in the computer, due to the lack of enough rotamers to provide the correct electrostatic pairings. Notice that the folding in polar solvents does not require such refined libraries for the polar side chains, as normally they interact with the solvent.


Figure 4. NMR spectra of the designed peptide IKEAW35 (2 mM peptide) at 25 °C in TFE/50 mM phosphate buffer/D2O (8/1.5/0.5). Top: H(N)-Hα region of the H-H NOESY. Bottom left: H(N)-H(N) region of the H-H NOESY. Bottom right: Hα-H(N) region of the H-H TOCSY.

Our computational methodology is based on an enlargement of the initial rotamer library, using an effective dielectric constant to bias the search towards rotamers able to form H-bonds among their side chains. It enlarges the search space for the combinatorial optimization problem and is similar to the approach presented in [10]; however, since it is restricted to the construction of the rotamer library, it does not modify the objective function or the way the folding free energy is estimated. As such, our approach could be easily combined with any other approach to compute the free energy (e.g. knowledge-based PEEFs or EEEFs) as long as they rely on the use of rotamers to model the folded state of the


protein. The enlargement of the rotamer library is done on a case-by-case basis and considers the local environment of each rotamer. To adapt our method to refine the rotamer library used together with either knowledge-based PEEFs or EEEFs, the corresponding scoring methods for the rotamer optimization should be changed accordingly.

CD spectroscopy studies in aqueous solution at pH 7.4 reveal the expected α-helical structure for the Native peptide, but a random-coil conformation for both designed peptides, IKEA and IKEAW35 (Figs. 2A and B at 0% TFE). Modification of the pH, peptide concentration or ionic strength did not change the CD spectra. Consistent


with this, only the 1H-NMR spectrum of the Native peptide, recorded under similar experimental conditions, displayed the good dispersion of signals expected for a folded peptide (Fig. 3). In the presence of TFE, both designed peptides adopted the helical structure characteristic of the Native peptide, as shown by the CD spectra (Fig. 2C). Addition of TFE, a less polar solvent, to an aqueous solution of unfolded proteins or peptides is known to induce and stabilize α-helical secondary structure when these molecules already have a substantial propensity to adopt that conformation [28–30]. Additionally, the sets of CD spectra obtained as the TFE concentration increased (Figs. 2A and B) display an isodichroic point at 203 nm, suggesting a transition from random coil to helical structure. The 1H-NMR spectra obtained at 80% TFE show a good dispersion of signals that corroborates the CD results and indicates the formation of structure upon TFE addition. Nonetheless, these CD and NMR experiments do not reveal the oligomerization state of the final structure. To address this question, we carried out analytical ultracentrifugation experiments at the highest TFE concentration allowed (40%). The results obtained for the designed peptides, compared with those for the Native counterpart, indicate that the major species in solution were monomeric in both cases. Thus, aggregation into higher oligomeric states did not occur. Since at 40% TFE the CD spectra already showed 50% of the total ellipticity and a well-defined isodichroic point was observed at 203 nm, we can postulate that the final structure obtained at the highest TFE concentration is most likely a monomeric peptide with a high α-helical content.
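
The link between an isodichroic point and a coil-to-helix transition can be made concrete with a simple two-state sketch. It is offered as an illustration under assumptions, not as the analysis performed here: the observed ellipticity is modelled as a linear mixture of a coil and a helix basis spectrum, and every ellipticity value below is hypothetical. In such a model, all mixtures cross at the wavelength where the two basis spectra coincide, and the fraction of the transition completed follows from the ellipticity at a single wavelength such as 222 nm.

```python
def mix(theta_coil, theta_helix, helix_fraction):
    """Two-state ellipticity: linear mixture of coil and helix signals."""
    return (1.0 - helix_fraction) * theta_coil + helix_fraction * theta_helix


def transition_fraction(theta_obs, theta_coil, theta_helix):
    """Fraction of the coil-to-helix transition completed at one wavelength."""
    return (theta_obs - theta_coil) / (theta_helix - theta_coil)


# Hypothetical basis values (deg cm^2 dmol^-1) as (coil, helix) per wavelength.
# The two basis spectra are set equal at 203 nm to mimic an isodichroic point.
basis = {
    203: (-15000.0, -15000.0),
    208: (-8000.0, -32000.0),
    222: (-3000.0, -30000.0),
}

# Every mixture gives the same signal at 203 nm, whatever the helix fraction.
for f in (0.0, 0.25, 0.5, 1.0):
    print(f, mix(*basis[203], f), mix(*basis[222], f))

# A signal halfway between the coil and helix values corresponds to 50%.
halfway = mix(*basis[222], 0.5)
print(transition_fraction(halfway, *basis[222]))
```

A population of three or more spectrally distinct species would generally not preserve a single crossing point, which is why a well-defined isodichroic point is usually read as support for a two-state description.
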
One question that remained unanswered at this point was whether, under these conditions, the designed peptides would in fact adopt a single helical structure or whether they would fold into a single coiled coil structure. Considering that, during the computational design, the amino acids defining the N- and C-caps of the three helices as well as the Pro residues in the loops were kept constant, it could very well be that a coiled coil is formed. To obtain more details regarding the structure, we carried out 2D NOESY and TOCSY NMR experiments (Fig. 4). The resulting spectra of the peptide in TFE were consistent with a folded peptide. Considering the reduced number of amino acid types used in this design, the resulting signal dispersion is indicative of structure. The presence of a number of characteristic inter-residue NOEs further corroborated this. However, the number of NOEs observed was still smaller than expected, all the signals were broader than those of the native protein, and the dispersion of the aliphatic resonances was limited. These results could indicate the formation of a coiled coil. However, additional analysis of these results will be the aim of further studies.


It is important to note that during the first design, all amino acids from the reduced I, K, E, A alphabet were allowed in the designed positions. Thus, the sequence IKEA2 (related to IKEAW35 upon W35I) was among the sequences analyzed during that design, but it was ranked below the IKEA sequence because of the large difference in the estimated folding free energy between the two sequences (about 20%). This shows that the implicit solvation model used in the PEEF clearly underestimates the energetic cost of burying hydrophilic amino acids in the core of the protein. Of the designed sequences, IKEA and IKEAW35, only one, IKEAW35, has a native-like sequence, with the hydrophobic surface patterning of native proteins. However, both performed similarly under the different types of analysis that we carried out. Our electrostatically focused enlargement of the rotamer library succeeded in biasing our search towards sequences with a higher number of satisfied electrostatic interactions. The solvation model included within the PEEF is not accurate enough to distinguish between the two sequences, and in fact it ranks the non-native-like sequence IKEA higher than IKEAW35 because it fails to fully represent the interaction of the protein with the solvent. Nevertheless, our electrostatics-driven optimization method has proved its ability to select sequences that lead to foldable proteins, reproducing the structure of the known ones, and to overcome the limitations associated with the use of implicit solvation models. The results indicate that the targeted folded structure is achieved only at high concentrations of TFE. TFE has a dielectric constant ten times lower than that of water and is known to induce helical structure through H-bonds, and in this solvent there are no significant differences between the structures adopted by the native and the non-native-like sequences.
This indicates that our algorithm did improve the side chain-side chain interactions and biased our search towards pairs of rotamers able to form H-bonds among their side chains. The interactions with water are too dominant for our small protein fold (which does not contain a real hydrophobic core), overwhelming other terms. We can conclude that whether the targeted structure is achieved depends on the solvent, indicating that unsupervised computational approaches could be used to design proteins that fold only under a specific condition. Despite our knowledge of protein stability and function, it is still not possible to perform an automated computational design of proteins based solely on this knowledge. The use of an automated (unsupervised) methodology to design synthetic protein sequences from first principles would open new avenues in molecular biology, allowing for a predictive testing of our biological knowledge. The methodology we have presented here includes an electrostatic enhancement of the rotamer library; however, the exploration and ranking of the possible sequences


is done through a PEEF derived only from knowledge of the physical interactions among the atomic components of the protein. We expect that our general methodology will be useful for protein design in the context of hydrophobic solvents, which represent an important fraction of the situations in the cell.

O.I. acknowledges the financial support from the Fundação para a Ciência e a Tecnologia with co-participation of the European Community funds FEDER, POCI, QREN and COMPETE [PTDC/QUI-BIQ/098406/2008, PestOE/EQB/LA0004/2011 and the National Networks of Mass Spectrometry (REDE/1504/REM/2005) and NMR (REDE/1517/RMN/2005)]. O.I. and A.J. also acknowledge the financial support from the European Commission [FP7-Marie Curie International Reintegration Grant to O.I. and FP7-ICT-043338 (BACTOCOM) to A.J.], and the MICROSCILA (PRES UniverSud Paris) and Fondation pour la Recherche Medicale grants to A.J.

The authors declare no commercial or financial conflict of interest.

5 References

[1] Suárez, M., Tortosa, P., Carrera, J., Jaramillo, A., Pareto optimization in computational protein design with multiple objectives. J. Comput. Chem. 2008, 29, 2704–2711.
[2] Lazaridis, T., Karplus, M., Effective energy functions for protein structure prediction. Curr. Opin. Struct. Biol. 2000, 10, 139–145.
[3] Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J. et al., CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 1983, 4, 187–217.
[4] Berendsen, H. J. C., van der Spoel, D., van Drunen, R., GROMACS: A message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 1995, 91, 43–56.
[5] Wernisch, L., Hery, S., Wodak, S. J., Automatic protein design with all atom force-fields by exact and heuristic optimization. J. Mol. Biol. 2000, 301, 713–736.
[6] Dunbrack, R. L., Rotamer libraries in the 21st century. Curr. Opin. Struct. Biol. 2002, 12, 431–440.
[7] Suárez, M., Jaramillo, A., Challenges in the computational design of proteins. J. R. Soc. Interface 2009, 6 (Suppl. 4), S477–S491.
[8] Kuhlman, B., Baker, D., Native protein sequences are close to optimal for their structures. Proc. Natl. Acad. Sci. USA 2000, 97, 10383–10388.
[9] Jaramillo, A., Wernisch, L., Hery, S., Wodak, S. J., Folding free energy function selects native-like protein sequences in the core but not on the surface. Proc. Natl. Acad. Sci. USA 2002, 99, 13554–13559.
[10] Vicatos, S., Roca, M., Warshel, A., Effective approach for calculations of absolute stability of proteins using focused dielectric constants. Proteins 2009, 77, 670–684.
[11] Gouda, H., Torigoe, H., Saito, A., Sato, M. et al., Three-dimensional solution structure of the B domain of staphylococcal protein A: Comparisons of the solution and crystal structures. Biochemistry 1992, 31, 9665–9672.
[12] Tashiro, M., Tejero, R., Zimmerman, D. E., Celda, B. et al., High-resolution solution NMR structure of the Z domain of staphylococcal protein A. J. Mol. Biol. 1997, 272, 573–590.
[13] Zheng, D., Aramini, J. M., Montelione, G. T., Validation of helical tilt angles in the solution NMR structure of the Z domain of Staphylococcal protein A by combined analysis of residual dipolar coupling and NOE data. Protein Sci. 2004, 13, 549–554.
[14] Schafmeister, C. E., LaPorte, S. L., Miercke, L. J. W., Stroud, R. M., A designed four helix bundle protein with native-like structure. Nat. Struct. Biol. 1997, 4, 1039–1046.
[15] Riddle, D. S., Santiago, J. V., Bray-Hall, S. T., Doshi, N. et al., Functional rapidly folding proteins from simplified amino acid sequences. Nat. Struct. Biol. 1997, 4, 805–809.
[16] Marshall, S. A., Mayo, S. L., Achieving stability and conformational specificity in designed proteins via binary patterning. J. Mol. Biol. 2001, 305, 619–631.
[17] MacKerell, A. D. Jr., Bashford, D., Bellott, M., Dunbrack, R. L. Jr. et al., All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 1998, 102, 3586–3616.
[18] Ooi, T., Oobatake, M., Nemethy, G., Scheraga, H. A., Accessible surface areas as a measure of the thermodynamics parameters of hydration of peptides. Proc. Natl. Acad. Sci. USA 1987, 84, 3086–3090.
[19] Ogata, K., Jaramillo, A., Cohen, W., Briand, J. et al., Automatic sequence design of major histocompatibility complex class I binding peptides impairing CD8+ T cell recognition. J. Biol. Chem. 2003, 278, 1281–1290.
[20] Glykys, D., Szilvay, G., Tortosa, P., Suárez, M. et al., Pushing the limits of automatic computational protein design: Design, expression, and characterization of a large synthetic protein based on a fungal laccase. Syst. Synth. Biol. 2011, 5, 1–14.
[21] Dunbrack, R. L., Karplus, M., Backbone-dependent rotamer library for proteins: Application to side-chain prediction. J. Mol. Biol. 1993, 230, 543–547.
[22] Suárez, M., Tortosa, P., García-Mira, M. M., Rodríguez-Larrea, D. et al., Using multi-objective computational design to extend protein promiscuity. Biophys. Chem. 2010, 147, 13–19.
[23] Chan, W. C., White, P. D. (Eds.), Fmoc Solid Phase Peptide Synthesis: A Practical Approach. Oxford University Press, 2000.
[24] Graille, M., Stura, E. A., Corper, A. L., Sutton, B. J. et al., Crystal structure of a Staphylococcus aureus protein A domain complexed with the Fab fragment of a human IgM antibody: Structural basis for recognition of B-cell receptors and superantigen activity. Proc. Natl. Acad. Sci. USA 2000, 97, 5399–5404.
[25] Hogbom, M., Eklund, M., Nygren, P. A., Nordlund, P., Structural basis for recognition by an in vitro evolved affibody. Proc. Natl. Acad. Sci. USA 2003, 100, 3191–3196.
[26] Kelly, S. M., Jess, T. J., Price, N. C., How to study proteins by circular dichroism. Biochim. Biophys. Acta 2005, 1751, 119–139.
[27] MacPhee, C. E., Perugini, M. A., Sawyer, W. H., Howlett, G. J., Trifluoroethanol induces the self-association of specific amphipathic peptides. FEBS Lett. 1997, 416, 265–268.
[28] Merutka, G., Stellwagen, E., Analysis of peptides for helical prediction. Biochemistry 1989, 28, 352–357.
[29] Zhong, L., Johnson, W. C. Jr., Environment affects amino acid preference for secondary structure. Proc. Natl. Acad. Sci. USA 1992, 89, 4462–4465.
[30] Kemmink, J., Creighton, T. E., Effects of trifluoroethanol on the conformations of peptides representing the entire sequence of bovine pancreatic trypsin inhibitor. Biochemistry 1995, 34, 12630–12635.


ISSN 1860-6768 · BJIOAM 8 (7) 753–868 (2013) · Vol. 8 · July 2013
