Chapter 17

Modeling Peptide–Protein Interactions

Nir London, Barak Raveh, and Ora Schueler-Furman

Abstract

Peptide–protein interactions are prevalent in the living cell and form a key component of the overall protein–protein interaction network. These interactions are drawing increasing interest due to their part in signaling and regulation, and are thus attractive targets for computational structural modeling. Here we report an overview of current techniques for the high resolution modeling of peptide–protein complexes. We dissect this complicated challenge into several smaller subproblems, namely: modeling the receptor protein, predicting the peptide binding site, sampling an initial peptide backbone conformation, and the final refinement of the peptide within the receptor binding site. For each of these conceptual stages, we present available tools, approaches, and their reported performance. We summarize with an illustrative example of this process, highlighting the success and current challenges still facing the automated blind modeling of peptide–protein interactions. We believe that the upcoming years will see considerable progress in our ability to create accurate models of peptide–protein interactions, with applications in binding-specificity prediction, rational design of peptide-mediated interactions and the usage of peptides as therapeutic agents.

Key words: Peptide docking, Peptide modeling, Rosetta FlexPepDock, Peptide–protein interactions, Peptide–protein complexes, Peptide binding

Andrew J.W. Orry and Ruben Abagyan (eds.), Homology Modelling: Methods and Protocols, Methods in Molecular Biology, vol. 857, DOI 10.1007/978-1-61779-588-6_17, © Springer Science+Business Media, LLC 2012

1. Introduction

Protein–protein interactions are one of the driving forces of the living cell. A large and important subset of these interactions is mediated by a short, flexible linear peptide that binds to a globular receptor and may form a modular binding motif (1). It has been estimated that between 15 and 40% of all protein–protein interactions are mediated by a short linear peptide (1, 2). Interactions that are mediated by flexible peptides play key roles in major cellular processes, predominantly in signaling and regulatory networks (3), but also in cell localization, protein degradation, and immune response (1, 3). Due to their cardinal role in regulatory interactions, flexible peptides are in many cases implicated in human disease and cancer (3). Consequently, these peptides provide an attractive starting point as leads for the design of inhibitory peptides and small molecule drugs (4–7).

In vivo, these linear peptides are not necessarily independent molecules, but rather appear within disordered regions at protein termini (8), in-between domains (9), or as flexible loops that bulge out of structured domains and mediate a protein–protein interaction (10). Short peptide molecules may also be created in vivo by proteolytic digestion of precursor molecules (11, 12), or they can be synthesized for in vitro studies or as small drug molecules (13).

Flexible peptides, as intrinsically disordered proteins, often lack a distinct fold in their unbound state, and upon encountering their target (the receptor), they go through simultaneous binding and folding (induced fit model) (9, 14–16), or go through an equilibrium-shift towards preexisting bound conformations (conformation sampling model) (16–18). Their size may vary from short dipeptides that can be likened to small ligand molecules, to flexible peptides dozens of amino acids long, which wrap around the entire perimeter of their receptors (19).

This review aims to summarize the state of the art in modeling the interactions of flexible peptides at high resolution.
As this problem involves many degrees of freedom of both the flexible peptide and the receptor, it is conceptually convenient to divide it into several consecutive steps, in line with prevalent approaches for modeling (20) and docking (21) of globular proteins:

(1) Model receptor structure: create an initial model of the receptor (if its structure has not been solved yet).
(2) Predict binding site: locate potential binding sites on the receptor surface.
(3) Build initial model of peptide: create a set of models of plausible peptide backbone conformations (with or without considering the receptor).
(4) Model and refine peptide–receptor complex structure: optimize the initial model of the peptide at the receptor binding site (based on steps 1–3) and refine it into a high-resolution model. Note that in this last step, the peptide and receptor conformations may change considerably to increase their binding energy.

Figure 1 presents an overview of the process, and Table 1 summarizes the different tools available for each step. The above four steps are not necessarily completely distinct and might rather depend on each other, since the final conformation of the peptide (and sometimes even of the receptor) is stabilized or even induced by the interaction between the two (16). Nonetheless, these rough guidelines make it easier to tackle this complicated problem in a modular fashion.

Fig. 1. Modular architecture of modeling peptide–protein interactions. An overview of the four conceptual stages in the high-resolution modeling of peptide–protein interactions.

Fortunately, for several well-studied systems (e.g., kinases, MHC proteins, PDZ, SH3, and WW domains), a solved structure of the peptide binding domain in complex with other peptide sequences can be used as a template for subsequent refinement, by simply threading the desired sequence onto the solved peptide backbone. Even in these cases, the last step of refinement is often very important: as in any homology model, the template peptide structure may differ from the target peptide structure to a varying degree, from slight side-chain reorientation (22) to massive backbone rearrangements (23, 24).

Throughout this chapter, we cover the existing approaches for modeling peptide–protein interactions following the steps described above. We include examples of recent applications for the modeling of peptide–protein interactions and discuss some prominent open problems in this field. Finally, we provide the reader with a list of major structural datasets of peptide interactions that have been used to characterize the unique properties of peptide–protein interactions as well as to evaluate existing methods.


Table 1. Summary of methods for modeling peptide–protein interactions

A. Prediction of peptide binding sites

Name | Description | Availability | Reference
PepSite | Peptide binding location predictor; includes partial peptide orientation in the pocket | http://www.russell.embl.de/pepsite/ | (42)
FTmap | Solvent mapping of the receptor surface. Correlates well with peptide binding sites | http://ftmap.bu.edu/ | (46)
CASTp | Protein surface pocket detector. Peptides tend to bind to the largest pocket | http://sts.bioengr.uic.edu/castp/index.php | (44)
AnchorsMap | Predictor of anchoring residues for peptide or protein binding interfaces | N/A | (47)

B. Peptide backbone conformational sampling approaches

Approach | Description | Reference
Molecular dynamics (MD) | MD has been used to recover the structure of peptides in solution. This works well when the peptide adopts a stable conformation in the absence of the receptor | (53–55)
Monte Carlo (MC) | MC has been used to sample the structure of stable peptides | (56–58)
Fragment-based approaches | Several studies have shown that short peptides have local preferences to adopt a specific conformation based on their sequence. This makes it possible to use solved structures of similar sequences in a different context to predict the peptide's conformation | (65, 67)
Extended conformation | When no other data is available, the extended conformation is often a good starting point for the peptide conformation | (24, 27)

C. High-resolution modeling of peptide–protein complexes

Name | Description | Sampling method | Availability | Reference
FlexPepDock | High-resolution refinement of peptide–protein interactions | Monte-Carlo with minimization; implemented in Rosetta | Rosetta 3.2; http://flexpepdock.furmanlab.cs.huji.ac.il/ | (27)
DynaDock | High-resolution refinement of peptide–protein interactions | Optimized potential molecular dynamics | Upon request | (28)
AutoDock | Global docking of small molecules and short peptides | Grid based, followed by genetic algorithm-based minimization | http://autodock.scripps.edu/ | (75)
MOLS | Global docking and refinement of short peptides | Orthogonal Latin-square sampling | Upon request | (79)

D. Modeling selected systems

System | Constraints | Reference
MHC/peptide | Two peptide anchoring residues bind in specific pockets | (23, 81–86, 100)
PDZ/peptide | The C-terminal residue is anchored at specific location | (24, 88, 89, 102)

Datasets of protein-complex structures

Name | Size | Resolution | Peptide lengths | Availability | Reference
PepX | 1,431 (505 unique clusters) | X-ray < 2.5 Å | 5–35 | http://pepx.switchlab.org | (94)
peptiDB | 103 unique clusters | X-ray < 2.0 Å | 5–15 | London et al. (supplemental information) | (26)
3did | 829 (not clustered) | N/A | N/A | http://3did.irbbarcelona.org | (95)


2. Modeling the Receptor Protein

When docking a peptide (or any ligand) to a receptor protein, structures may be available for the receptor protein in its free form (unbound docking), or in complex with other peptide sequences (cross-docking). In more difficult cases, we would have to resort to homology modeling using the methods covered extensively in other chapters of this book or even ab initio modeling. As in protein–protein docking and ligand docking, the success of docking to unbound structures, cross-docking templates, and homology models depends on the extent to which the receptor structure changes upon binding, mainly at the binding site (25). In previous work, we have shown that the backbone conformation of the receptor protein does not change substantially ([…] 100 Å²; see, for example, Fig. 2). In most of these cases (18/22), this pocket was the largest pocket available on the protein surface. (2) Binding of a specific peptide residue into a small hole: 47% of the peptides in the entire dataset were found to bind to a small pocket instead (pocket area < 100 Å²); in these cases, one of the peptide's side chains is buried in this pocket in a knob-hole fashion. However, even when the peptide latches onto a small pocket, this is still, in general, the largest pocket available on the protein (29/40 cases). Our analysis further revealed that α-helical peptides tend to bind using the knob-hole strategy, whereas β-strand peptides prefer pockets. Either way, it turns out that finding the largest pockets on a receptor surface can provide useful guidance for peptide binding sites (see Note 2).

Fig. 2. Peptides tend to bind in large pockets on protein surfaces. An antagonist peptide (in red cartoon representation) in complex with the EphB4 receptor (in white surface representation; PDB: 2BBA). The largest pocket on the protein surface as detected by CASTp (44) is shown in dark gray mesh. Such a pocket can be used to focus the modeling of peptide-protein interactions to the relevant region.
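The largest-pocket heuristic described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Pocket` record, the function names, and the example areas are hypothetical and are not the output format of CASTp or any other pocket detector. The only value taken from the text is the 100 Å² area cutoff; how a pocket of exactly 100 Å² should be classified is an assumption made here.

```python
from dataclasses import dataclass

# Area cutoff between "large pocket" and "knob-hole" binding, taken from the
# 100 Å² threshold quoted in the text.
LARGE_POCKET_AREA = 100.0  # Å²


@dataclass(frozen=True)
class Pocket:
    """Hypothetical record for one detected surface pocket."""
    pocket_id: str
    area: float  # pocket area in Å²


def largest_pocket(pockets):
    """Return the pocket with the largest area, the suggested site to focus on."""
    if not pockets:
        raise ValueError("no pockets supplied")
    return max(pockets, key=lambda p: p.area)


def binding_mode(pocket):
    """Classify a pocket by the area cutoff.

    Assumption: a pocket of exactly 100 Å² is counted as large.
    """
    return "large-pocket" if pocket.area >= LARGE_POCKET_AREA else "knob-hole"


if __name__ == "__main__":
    # Made-up areas for illustration only.
    detected = [Pocket("p1", 42.0), Pocket("p2", 310.5), Pocket("p3", 97.0)]
    site = largest_pocket(detected)
    print(site.pocket_id, binding_mode(site))
```

In practice the pocket list would come from parsing the output of a pocket detection tool; the sketch only illustrates how the resulting areas could be used to pick and label a candidate site.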

3.3. Small-Molecule Mapping: FTmap (46) (Availability: http://ftmap.bu.edu/) and ANCHORSMAP (47)

The original purpose of FTmap (Fourier-Transform Maps) was the mapping of potential solvent binding sites on a protein surface. The server docks small organic molecules on the target protein surface using the Fourier-Transform approach (48), finds favorable binding positions, and clusters the conformations of all predictions. The clusters are then ranked according to their average free energy. Low-energy clusters are grouped into consensus sites, and the largest consensus sites were shown to locate active or ligand binding sites (46). We have recently shown (Raveh et al. (27) and unpublished data) that these clusters can also serve as good predictors of peptide binding sites for peptide anchoring residues. In as yet unpublished results, we found that in 82% of the cases, there was at least one molecule cluster that approximately correlated to one of the peptide side chains (at least four atoms were found within 2 Å of the atoms of a single side chain). In 71% of those examples, an even more accurate match was found (at least three atoms were located within 0.7 Å of the atoms of a single side chain).

Another method that looks for binding sites of peptide anchor residues is ANCHORSMAP (47), which was shown to locate the peptide anchor binding sites on the PDZ domain and in the protein–peptide complex kinase/PKI, and has recently been applied to characterize the specificity of Thr and Ser kinase binding grooves (104).

We are currently working to combine the different approaches for binding-site prediction (pocket detection, small-molecule mappings, and other features extracted from peptide–protein complex datasets) to devise an integrated machine-learning-based classifier that would predict peptide binding sites, in analogy to similar approaches for predicting binding sites for globular proteins and small molecules.

4. Modeling the Initial Backbone Conformation of the Peptide

Most state-of-the-art tools available for modeling and refining the final peptide–receptor complex require an initial conformation of the peptide backbone as part of their input, except for the case of very short peptides made of 2–4 amino acids (49). In the absence of template structures for the target peptide–protein interaction, the initial peptide backbone conformation has to be modeled by other means. We have recently shown that the Rosetta FlexPepDock tool (see below) can model peptide–protein complexes accurately if the initial peptide backbone conformation deviates from the native peptide by at most 50° in terms of φ/ψ torsion-angle RMSD (27), meaning that the initial peptide model should at least approximate the correct native secondary structure.

According to an induced fit model of peptide recognition, a peptide would fold only upon binding to its partner (14) (reviewed in ref. 16). This model suggests that even for building an initial model of the peptide backbone, the effect of the receptor protein on the peptide backbone conformation must be taken into account. In contrast, the conformational sampling model assumes that the peptide in its free form samples an ensemble of conformations that includes the native, bound peptide conformation. According to this model, the presence of the receptor molecule only shifts the equilibrium further towards the bound form. The conformational sampling model was shown to apply to interactions between intrinsically disordered domains that exist as molten globules in their free state (17, 50) (reviewed in ref. 16). Also, it is known that small peptides that are stabilized by short-range hydrogen bonds, such as β-hairpin peptides (51) and α-helical peptides (52), may adopt a stable secondary structure already in their free form to a varying degree. This suggests that an initial set of potential peptide backbone conformations, modeled from sequence preferences alone, could well serve as input to subsequent peptide refinement within the receptor environment.

To the best of our knowledge, no generic well-tested tool for conformational sampling of peptide conformations in the context of peptide docking has yet been designed. However, different approaches have been used to address free peptide conformational sampling. Molecular dynamics (MD), for instance, has been used to predict the structure of α-helical and β-hairpin peptides (53, 54) and to study their energy landscape (55). Other sampling methods have also been used for exploring the structures of free peptide molecules. These include Monte-Carlo-based approaches (56–58), which often sample the conformation space more effectively than MD, as well as density-guided importance sampling (59) and simulated annealing-coupled replica exchange molecular dynamics (60).
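The 50° φ/ψ torsion-angle RMSD tolerance quoted at the start of this section can be checked for a candidate starting conformation with a short calculation. The sketch below is one plausible way to compute such a measure, assuming that every residue contributes both a φ and a ψ angle and that angular differences are wrapped to account for periodicity; the exact definition used in ref. 27 may differ (for instance in how terminal residues are treated), and the function names are hypothetical.

```python
import math


def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped to the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_rmsd(model, native):
    """Root-mean-square deviation (degrees) over paired (phi, psi) angles.

    `model` and `native` are equal-length lists of (phi, psi) tuples.
    Assumption: all phi and psi values are defined for every residue.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty torsion lists of equal length")
    total = 0.0
    for (phi_m, psi_m), (phi_n, psi_n) in zip(model, native):
        total += angle_diff(phi_m, phi_n) ** 2 + angle_diff(psi_m, psi_n) ** 2
    return math.sqrt(total / (2 * len(model)))


def within_refinement_range(model, native, cutoff=50.0):
    """True if the start deviates from the native by at most `cutoff` degrees."""
    return torsion_rmsd(model, native) <= cutoff
```

Wrapping the differences matters because torsion angles are periodic: angles of 170° and −170° differ by 20°, not 340°. Such a check is of course only possible in benchmarking settings where the native conformation is known.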
Sequence-based fragment libraries extracted from PDB structures have been very successful for de novo protein fold prediction (61, 62), loop modeling (63), and other applications (64). Voelz et al. (65) have used replica exchange molecular dynamics (REMD) simulations on 872 different 8-mer, 12-mer, and 16-mer peptide fragments from 13 proteins to examine the extent to which conformations of peptide fragments in water predict native conformations (native contacts) in globular proteins (extending a similar study on a smaller scale by Ho and Dill (66)). Using this scheme, they achieved accuracy of up to 63% in the prediction of native contacts for 8-mers, 71% for 12-mers, and 76% for 16-mers. It seems reasonable that these results would also hold for peptide–protein interactions, as Vanhee et al. (67) recently showed that bound peptides often emulate backbone fragments of monomer proteins. Therefore, already-solved structures can be a good source for estimating the interacting peptide backbone conformation. Preliminary results of an ongoing study in our group show that at least in some specific cases, sequence similarity can be used to detect correct protein segments from structures in the Protein Data Bank (68), although there are many exceptions (see Note 3). Based on these results and on the Rosetta fragment libraries approach (62), we have developed and calibrated FlexPepDock ab initio, an extension of the FlexPepDock refinement protocol described in detail below. FlexPepDock ab initio fully samples the peptide conformational space while docking the peptide to a given site on the protein receptor (105). This protocol has significantly increased the number of peptide–protein interactions that can now be modeled at high accuracy.

Using ideal secondary structure geometry for initial peptide conformation. As the tools used for the final modeling of the peptide–protein complex require only an approximate initial model of the peptide backbone, it might suffice to specify the correct secondary structure composition of the peptide. We have recently shown that for a wide range of peptide–protein interactions, good results can be obtained using the Rosetta FlexPepDock method if we start from an ideally extended initial peptide backbone conformation, even if the native peptide conformation deviates substantially from ideal extended geometry (27). Similar results were shown previously for PDZ domains, which also bind peptides in extended-like conformation (24). It is plausible that if native peptides are, e.g., helical, then an initial conformation with ideal helix geometry would be suitable for the final docking step, although this has not yet been tested. We note that the secondary structure propensity of a peptide in its free form can be inferred from experimental methods such as CD spectroscopy (69) or from sequence preferences alone and therefore may provide the necessary information for creating sufficiently good initial peptide models.
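As a minimal illustration of what "ideal secondary structure geometry" could mean in practice, the sketch below assigns the same idealized (φ, ψ) pair to every residue of a peptide. The torsion values are common textbook approximations and are an assumption of this sketch; the text does not state which values were used for the ideal extended conformation. The function name and the example sequence are likewise hypothetical, and turning the torsions into atomic coordinates is left to a modeling package.

```python
# Idealized backbone torsions in degrees. These are textbook approximations,
# assumed here for illustration; they are not taken from the chapter.
IDEAL_TORSIONS = {
    "extended": (-135.0, 135.0),
    "helix": (-57.0, -47.0),
}


def ideal_backbone(sequence, secondary_structure="extended"):
    """Return a list of (residue, phi, psi) for an idealized starting model."""
    try:
        phi, psi = IDEAL_TORSIONS[secondary_structure]
    except KeyError:
        raise ValueError(
            f"unknown secondary structure: {secondary_structure!r}"
        ) from None
    return [(residue, phi, psi) for residue in sequence]


if __name__ == "__main__":
    # Hypothetical five-residue peptide, modeled as ideally extended.
    for residue, phi, psi in ideal_backbone("KETWV"):
        print(residue, phi, psi)
```

A torsion list of this kind would then be handed to a structure-building tool to generate coordinates before refinement within the receptor binding site.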
Finally, we note that, in some cases, NMR spectroscopy can be used to determine the structure of the bound peptide molecule (70, 71), even if for technical reasons the structure of the receptor protein or the relative orientation of the peptide and the receptor cannot be determined (due to, e.g., the size of the receptor).

5. Modeling and Refinement of the Peptide–Protein Complex

Given a known binding site, whether from experimental data or based on prediction, and an estimated conformation for the peptide, be it based on a homologue, predicted as described above, or even a linear representation of the peptide in its binding pocket, we now have reached the last and most critical step of modeling peptide–protein interactions: the high-resolution refinement of the peptide within the binding pocket. Again, there is no exact line between "refinement" and "docking" and different tools can reach near-native solutions starting from different representations of the system. This is not a trivial stage, since it has to tackle the sampling of many degrees of freedom. Usually, full flexibility will be given to the peptide backbone and side chains, and some level of flexibility will be sampled for the receptor protein. Moreover, correct selection of the best model is also a hard task, given the large conformational space and rugged energy landscape. In this section, we briefly review methods for the high-resolution modeling of peptide–protein interactions and their performance on various benchmarks.

5.1. Rosetta FlexPepDock (27, 105) (Availability: Rosetta Releases 3.2 and later; Web server at http://flexpepdock.furmanlab.cs.huji.ac.il/ (101))

Rosetta FlexPepDock is a high-resolution protocol for refining peptide–protein complexes implemented in the Rosetta modeling suite framework. Given a coarse model of the interaction (either based on homology modeling or generated using the approaches described above), FlexPepDock performs a Monte-Carlo-Minimization-based approach to refine all of the peptide's degrees of freedom (rigid body orientation, backbone and side chain flexibility) as well as the protein receptor side-chain conformations. FlexPepDock was thoroughly benchmarked against a set of perturbed peptide–protein complexes and an effective range of sampling was defined. For peptides with initial backbone (bb) RMSD of up to 5.5 Å, FlexPepDock is able to create near-native models (peptide bb-RMSD