Catch Me if You Can: Challenges and Applications of ... - SAGE Journals


V. Tinnefeld, A. Sickmann and R. Ahrends, Eur. J. Mass Spectrom. 20, 99–116 (2014). Received: 15 August 2013; Accepted: 20 December 2013; Publication: 28 February 2014

EUROPEAN JOURNAL OF MASS SPECTROMETRY Special Issue dedicated to Professor Michael Linscheid on the Occasion of his 65th Birthday

Catch me if you can: challenges and applications of cross-linking approaches

Verena Tinnefeld (a), Albert Sickmann (a,b) and Robert Ahrends (a,*)

(a) Leibniz-Institut für analytische Wissenschaften – ISAS – e.V., Otto-Hahn-Str. 6b, 44227 Dortmund, Germany. E-mail: [email protected]

(b) Department of Chemistry, College of Physical Sciences, University of Aberdeen, Aberdeen AB24 3UE, Scotland, UK

Biomolecular complexes are the groundwork of life and the basis for cell signaling, energy transfer, motion, stability and cellular metabolism. Understanding the underlying complex interactions on the molecular level is an essential step to obtain a comprehensive insight into cellular and systems biology. For the investigation of molecular interactions, various methods, including Förster resonance energy transfer, nuclear magnetic resonance spectroscopy, X-ray crystallography and yeast two-hybrid screening, can be utilized. Nevertheless, the most reliable approach for structural proteomics and the identification of novel protein-binding partners is chemical cross-linking. The rationale is that upon forming a covalent bond between a protein and its interaction partner (protein, lipid, RNA/DNA, carbohydrate) the native complex state is “frozen” and accessible for detailed mass spectrometric analysis. In this review we provide a synopsis on cross-linker design, chemistry, pitfalls, limitations and novel applications in the field, and feature an overview of current software applications.

Keywords: mass spectrometry, cross-linking, protein interaction, cross-linking reagents, chemical cross-linking, structural proteomics, analysis software, metabolic cross-linking

ISSN: 1469-0667 doi: 10.1255/ejms.1259 © IM Publications LLP 2014 All rights reserved

Analytical challenges in cross-linking analysis

The combination of chemical cross-linking with mass spectrometry (MS) is a challenging but emerging field for investigating biomolecular interactions, such as protein–protein, protein–RNA/DNA and protein–lipid binding, and was especially fostered by the introduction of soft ionization techniques in MS during the last decades.1,2 The cross-linking approach itself was introduced in the early 1970s, and the first experiments were conducted with dimethyl suberimidate (DMS), which is still a widely used amine-specific cross-linker.3 At that time, the analysis of the cross-linked products was limited to electrophoretic and photometric methods, such as sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and light scattering, and was thus unable to reveal the actual interaction site. Nowadays, with the implementation of MS to study biomolecular interactions, not only the involved proteins and so-far unknown binding partners, but also the contact sites, can be identified and, in principle, quantified. Nevertheless, chemical cross-linking remains a challenging field with several bottlenecks to overcome (I to VI).

I. A cross-linking experiment is system dependent, and careful sample preparation is crucial to maintain native complex formation, so that the complex of interest remains functional prior to cross-linking.

II. In general, experiments have to be thoroughly planned to avoid unwanted side reactions or hydrolysis of the reagents and to achieve a maximum yield of the anticipated cross-linked products. Therefore, for every system the cross-linking agents and controls have to be carefully chosen and the reaction conditions adjusted and optimized.


Figure 1. Cross-link types. Possible types of cross-linked products: type 0 (mono-link, dead-end), type 1 (intramolecular link), type 2 (intermolecular link) and type 3 (mixed product or multi-link).

III. However, the most challenging part is the cross-link analysis and the interpretation of the MS-derived data. The cross-link products can be divided into: mono-links (type 0), where just one reactive site of the reagent has reacted with the target (dead end); intramolecular cross-links (type 1), where the cross-link connects two different areas of the same molecule; intermolecular cross-links (type 2), where the agent forms a covalent bond between two distinct target molecules; and mixed products (type 3), where multiple cross-link types are found within a single product. Therefore, a standard nomenclature, such as the one introduced by Schilling et al.,4 is pivotal to clearly define the obtained cross-link products (Figure 1).

IV. A major drawback in cross-linking experiments is the analysis of the cross-linked products themselves. Here, multiple challenges have to be overcome: first, the usually low-abundance product of interest must be detected among the bulk of unlinked molecules; second, the cross-linked product must be analytically accessible; and, third, the different cross-link types must be experimentally distinguished from each other. For the analysis of protein interaction sites, it is most often the cross-links between interacting subunits or binding partners (intermolecular, type 2) that matter. Unfortunately, mono-links (dead ends, type 0) and intramolecular cross-links are usually generated as well. Whereas intramolecular cross-links can be used for validation of crystal structures5 or to define new docking parameters and modeling restraints for protein complexes,6 mono-links provide nearly no information.7 Instead, they can interfere with the analysis of useful inter/intramolecular cross-links by increasing sample complexity and background.

V. Another challenge in cross-linking approaches is the computational identification of the investigated product.
Manual MS data interpretation is inconvenient and time consuming, as the low abundance of cross-links renders their assignment difficult;8 moreover, two series of fragment ions derived from a single branched product must be assigned to their origin to identify the interaction sites. To overcome these problems, dedicated data-interpretation software tailored to the cross-linker used and to the MS conditions (ionization, fragmentation method, fragmentation level) has to be combined with a database search. However, these results are just the initial step in verifying cross-link products.

VI. The validation of cross-linked products is a separate challenge. In further studies, the candidates have to be pinpointed with biochemical9 and bioinformatic6 approaches. For the structural validation of binding partners, several options are available: distance plausibility in three-dimensional models (e.g., PDB data in PyMOL10), Xwalk for cross-link prediction and validation11 or site-directed mutagenesis to exchange candidate amino acids to prove the authenticity of an identified cross-link.12

To address these challenges (I–VI) at the levels of sample preparation, cross-link formation, analysis and identification, different experimental strategies (e.g., for cross-link design, analysis and computational identification) have to be applied for a successful experiment; they are discussed in the following paragraphs.
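The distance-plausibility check mentioned under point VI can be illustrated with a minimal sketch. It assumes that a cross-link is plausible when the straight-line distance between the two linked residues in a three-dimensional model does not exceed the spacer length plus an allowance for side-chain flexibility. The coordinates, residue labels and the 13 Å allowance below are hypothetical values chosen for illustration; they are not taken from the review, and tools such as Xwalk use more elaborate distance measures than the Euclidean one shown here.

```python
import math

# Assumed values for illustration only (not taken from the review):
SPACER_LENGTH = 11.4         # Å, spacer arm of a hypothetical reagent
SIDE_CHAIN_ALLOWANCE = 13.0  # Å, assumed slack for two flexible side chains


def max_span(spacer=SPACER_LENGTH, allowance=SIDE_CHAIN_ALLOWANCE):
    """Largest residue-to-residue distance the reagent is assumed to bridge."""
    return spacer + allowance


def is_plausible(coord_a, coord_b, limit=None):
    """True if the Euclidean distance between two 3D points is within the limit."""
    if limit is None:
        limit = max_span()
    return math.dist(coord_a, coord_b) <= limit


# Hypothetical residue pairs with made-up model coordinates (in Å).
HYPOTHETICAL_LINKS = {
    ("K12", "K48"): ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),    # 5 Å apart
    ("K12", "K77"): ((0.0, 0.0, 0.0), (20.0, 21.0, 0.0)),  # 29 Å apart
}


def check_links(links):
    """Map each residue pair to a plausibility flag."""
    return {pair: is_plausible(a, b) for pair, (a, b) in links.items()}


if __name__ == "__main__":
    for pair, ok in check_links(HYPOTHETICAL_LINKS).items():
        print(pair, "plausible" if ok else "violates the distance limit")
```

A candidate that violates the limit is not necessarily wrong; it may point to conformational flexibility or to an artifact and would need the biochemical follow-up described above.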

Cross-linking workflow

In general, two different cross-linking approaches can be distinguished: the bottom-up approach, which assesses a complex hydrolysate, and the top-down approach, in which the entire complex is analyzed. In both cases the cross-linking itself can be carried out in vitro or in vivo.13 While in vitro cross-linking is usually based on the classical reaction of pre-purified proteins and protein complexes, in vivo cross-linking is carried out in living cells, in a more complex analytical environment. However, the initial steps of the top-down and bottom-up approaches are accomplished in a very similar fashion (Figure 2).

In step 1, the complex between the interacting partners has to be formed, and the complex should still be in its native state to avoid artificial results. Artificial results can occur if the target protein or protein complex loses its native state or if, by chance, interactions that arise only in non-natural environments and concentrations are covalently fixed.14 Another point to be considered is the chance that the bound reagent alters the conformation of the target, so that misleading links are found. Additionally, in vitro cross-linking of co-purifying complexes involves the danger of losing transient and weak binding partners.15 In vivo cross-linking overcomes these risks but, by contrast, needs elaborate enrichment steps after the cross-linking experiment to reduce the complexity.16 As mentioned before, a careful validation, including the comparison with existing structure models,14 may allow artifacts to be identified. In particular, structural changes of the target can be investigated by testing its functionality or binding behavior. Artifacts may be avoided by choosing an environment and a reagent–target ratio that do not disturb the target in its native state. However, this needs to be adjusted for every experiment to find reaction conditions that work well.
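The reagent–target ratio is usually explored as a small titration. The following sketch shows the arithmetic, using the typical ranges quoted later in this section (5–10 µM protein, 20- to 500-fold molar excess of cross-linker). The function names, the 25 mM stock concentration and the 100 µL reaction volume are assumptions made for illustration, and the small volume change caused by adding the stock is ignored.

```python
def crosslinker_concentration_um(protein_um, molar_excess):
    """Final cross-linker concentration (µM) for a given fold molar excess."""
    return protein_um * molar_excess


def stock_volume_ul(final_um, reaction_volume_ul, stock_mm):
    """Volume of stock (µL) to add, from C1 * V1 = C2 * V2.

    The stock concentration is given in mM and converted to µM here.
    """
    return final_um * reaction_volume_ul / (stock_mm * 1000.0)


def titration(protein_um, excesses, reaction_volume_ul, stock_mm):
    """One (excess, final µM, stock µL) tuple per tested molar excess."""
    rows = []
    for excess in excesses:
        final_um = crosslinker_concentration_um(protein_um, excess)
        rows.append((excess, final_um,
                     stock_volume_ul(final_um, reaction_volume_ul, stock_mm)))
    return rows


if __name__ == "__main__":
    # Hypothetical experiment: 10 µM protein, 100 µL reactions, 25 mM stock.
    for excess, final_um, volume in titration(10.0, [20, 50, 100, 500], 100.0, 25.0):
        print(f"{excess}-fold excess: {final_um:.0f} µM, add {volume:.1f} µL stock")
```

Each condition would then be compared against a control without cross-linker, for example by SDS-PAGE, before a ratio is fixed for the actual experiment.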


Figure 2. Workflow of the bottom-up (left) and top-down (right) approach for in vitro cross-linking experiments. After complex formation (1) the cross-linking reaction is carried out (2). The cross-linking experiment has to be optimized (3) to reach the maximum yield of cross-link products. After proteolysis (4) and optional enrichment (5), the cross-linked compounds are analyzed by MS (6), and the interaction partners and the cross-link sites are identified by data analysis (7) using dedicated cross-link software.

In the second step, the cross-linking experiment is performed and the two interaction partners are covalently coupled. For an in vitro experiment the order of steps 1 and 2 can be inverted, since often at least one binding partner is pre-purified and can be modified with the cross-linking reagent prior to complex formation. This can reduce the number of introduced mono-links, for example if an ultraviolet (UV) reactive cross-linker is used.17 It also increases the chance of cross-linking residues that would no longer be accessible after complex formation. Typical protein concentrations for in vitro experiments are 5–10 µM; this “low” concentration may decrease the yield of cross-linked products, but helps to avoid protein aggregation and therefore artificial results.18 Cross-linker reagents are commonly used in 20- to 500-fold molar excess. For the classical bottom-up approach the sample is injected in the femtomole to picomole range for MS analysis, while for the less sensitive top-down experiments samples in the nanomole range19 are needed.

In many cases the reaction conditions have to be further optimized after the initial cross-linking (step 3). Important parameters are: solvent/buffer composition, temperature, pH, concentration of interaction partners and cross-linking reagents, as well as the reaction time. For structural analysis it is desirable to have many cross-links within the target to obtain as many binding constraints as possible. By contrast, for targeted cross-linking or interaction studies fewer cross-links, even only one, can hold all the information needed. An effective and straightforward way to monitor the success of the reaction is the use of SDS-PAGE to analyze the protein cross-linking yield. Another possibility is the analysis of the undigested reaction mixture with matrix-assisted laser desorption/ionization (MALDI)-MS and its comparison with the control and with different reaction conditions. Here the intensities and variance of the products can be compared in in vitro studies, but the complex must be pre-purified prior to MS analysis, which limits general application.

Up to this point, all steps are carried out in the same fashion for both cross-linking approaches. Now the investigator has to decide whether a bottom-up or a top-down analysis of the cross-linked complex or complexes should be carried out.19 These two approaches are inherently different: whereas the top-down approach allows the whole cross-linked complex to be analyzed, the bottom-up approach relies on hydrolysis of the complex and is therefore, from the analytical perspective, often more accessible and more applicable to different biological systems. With a bottom-up approach the cross-linked complex is hydrolyzed (step 4) by the use of endoproteinases such as trypsin,20 AspN,21 GluC,9 LysC22 or chymotrypsin,23 or by chemical cleavage (e.g., cyanogen bromide24).
The peptides thus generated allow (step 5) for the subsequent purification and enrichment of cross-linking products by affinity,25 size exclusion26 or ion-exchange chromatography,27 which are optional prior to MS analysis but facilitate the identification of the low-abundance cross-linked products. In step 6, the MS analysis of a cross-linked peptide or protein is a two-phase process. First, the precursor mass of the cross-linked product has to be determined, and then the fragment ions of the isolated precursor must be acquired. In the top-down approach19 the intact protein complex is purified in the gas phase (step 5), the precursor isolated and the entire complex fragmented by tandem mass spectrometry (MS/MS), allowing internally cross-linked protein monomers to be distinguished from covalent dimers by mass. Both approaches have their advantages and limitations. As the bottom-up approach offers more opportunities to change and optimize the conditions for successful cross-link identification (e.g., change of protease, enrichment strategy, cross-link identification) and peptides are more efficiently accessible by liquid chromatography (LC)-MS than proteins, the bottom-up strategy is preferentially applied. In step 7, subsequent to MS analysis, the obtained precursor ion and fragment ion data are computationally analyzed and interpreted by specialized software, which can identify the interacting partners and the exact cross-link site by a database search28 or de novo sequencing.29 The approach described allows, in general, the investigation of complexes independent of whether the protein interaction partner is a protein, lipid, RNA, DNA or sugar.

Generally, for top-down and in vitro experiments the purity of the target molecules is crucial to avoid artifactual cross-links with non-target proteins and charge-state competition during the electrospray ionization (ESI) process. To be able to analyze an intact protein complex in gas-phase MS experiments, the quantity of the purified complex is as important as its purity. Since the sensitivity is much lower (and the detection limit correspondingly higher) than in bottom-up approaches,30 a purity of more than 80% and concentrations around 0.5 to 1 g l⁻¹ are required for large protein complexes (>100 kDa).30 Higher concentrations than those for bottom-up strategies are also needed because larger molecules are not transferred into the gas phase under the most efficient conditions,31 precursor intensities are distributed over multiple charge states32 and larger molecules generate a higher number of fragment ions, which implies that a higher precursor quantity is needed.
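The precursor determination in step 6 rests on simple mass bookkeeping, which the following sketch illustrates for an intermolecular (type 2) cross-link between two peptides. The peptide masses are hypothetical, and the linker mass shift of 138.06808 Da is the value commonly quoted for DSS/BS3 cross-links; it is an assumption of this example and does not come from the review. The second function shows how one neutral mass spreads over several charge states in ESI.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

# Assumed linker mass shift (commonly quoted for DSS/BS3; not from the review).
ASSUMED_LINKER_SHIFT = 138.06808  # Da


def crosslinked_mass(peptide_a, peptide_b, linker_shift=ASSUMED_LINKER_SHIFT):
    """Neutral monoisotopic mass of two peptides joined by one cross-link."""
    return peptide_a + peptide_b + linker_shift


def mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion for a given neutral mass and charge z."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_state_series(neutral_mass, charges=(2, 3, 4, 5)):
    """m/z values at which the same precursor appears for each charge state."""
    return {z: mz(neutral_mass, z) for z in charges}


if __name__ == "__main__":
    # Hypothetical peptide masses (Da), chosen only for illustration.
    mass = crosslinked_mass(1000.5, 1200.6)
    print(f"neutral mass: {mass:.4f} Da")
    for z, value in charge_state_series(mass).items():
        print(f"z = {z}: m/z {value:.4f}")
```

Search software performs the same calculation in reverse: it looks for pairs of candidate peptides whose masses, plus the linker shift, match the measured precursor within the instrument's mass tolerance.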

Cross-linker design

Cross-linking reagents are available in many structural variations; they differ in length, functional groups, presence of reaction sites and special properties, such as isotope and affinity tags or cleavage sites, as described in Bioconjugate Techniques.33 In this review, we preferentially highlight some of the well-established designs and properties of reagents commonly used for cross-linking. The simplest design and shortest cross-linkers are zero-length cross-linkers [Figure 3(a)], such as EDC [1-ethyl-3-(3-dimethylaminopropyl) carbodiimide hydrochloride]. They act as coupling agents without adding atoms and are often utilized to identify protein–protein interactions at a large scale.34,35 Unfortunately, it is not possible to introduce functional groups such as isotope labels and affinity tags, which impedes the analysis and data interpretation. However, avoiding an additional linker can be advantageous if the cross-linker itself disturbs the complex formation or further analysis.36 Additionally, zero-length cross-linkers are advantageous for modeling approaches, because the likelihood of bond formation is limited to close spatial proximity.37

Figure 3. Overview of cross-linker designs and reactions. (a) Example of a common zero-length cross-linker. (b) Protein cross-linker with a homobifunctional (orange), heterobifunctional (orange, purple), trifunctional (orange, purple, brown) or multifunctional (orange, purple, brown, blue) cross-linker design; functions such as affinity, cleavable or isotopic tags can be introduced by the third functionality or via the spacer group (blue). (c) Protein–lipid cross-linker with functional sites utilized for cross-linking (orange) or for enrichment (purple). (d) Protein–RNA cross-linking reaction sites of nucleobases and amino acids.131

However, most protein cross-linkers are bifunctional [Figure 3(b)]: they carry two functional groups that allow cross-linking between different amino acid residues. Furthermore, tri- and multifunctional reagents are available that contain additional functional groups to facilitate identification and/or enrichment of the cross-linked interaction partners. Thus, trifunctional cross-linkers often include a tag, which allows affinity enrichment before MS analysis.25 Biotin tags, for instance, enable purification of the cross-linked product by streptavidin-affinity chromatography (Kd = 4 × 10⁻¹⁴ M).25 Especially for in vivo cross-linking, reagents with affinity tags such as biotin should be preferred, because they facilitate the enrichment of the cross-linking product from a highly complex background after cell lysis.38 Some trifunctional cross-linkers contain cleavable sites within their affinity tags. After enrichment, cleavage of the tag is used to elute the cross-linked product from the stationary phase and simultaneously to remove the tag prior to MS analysis, which promotes ionization and fragmentation of the cross-link.33

In general, bifunctional linkers can be divided into homobifunctional cross-linkers, with two identical functional groups, and heterobifunctional ones, with two different functional groups separated by a spacer [Figure 3(b)]. Homobifunctional cross-linkers have specificity toward one amino acid residue, whereas heterobifunctional cross-linkers react with different residues. Notably, the length of the spacer arm determines the radius in which the reaction can occur and consequently provides important information about molecular distance restraints. Green et al.
gave a comprehensive overview of the distances that several cross-linking reagents typically span.39 Many homo- and heterobifunctional cross-linkers with spacer arm lengths up to 35 Å are available.40,41 Although the absolute distance between two linked residues can hardly be proven by cross-linking, owing to the flexibility of the residues and of the cross-linking reagent, the use of cross-linkers of different lengths provides a hint of the distance between two residues. Novak et al. showed that the use of reagents with different spacers (5.8–11.4 Å) allowed the distance restraints within ubiquitin to be determined more precisely.42

In addition, further special properties can be incorporated into cross-linker reagents to enhance the identification of a cross-linked complex. One method commonly applied to discriminate a non-cross-linked from a cross-linked molecule is stable isotope or chemical coding of the formed product by the reagent used. Another functionality that can be introduced into the spacer is a cleavable site. The corresponding cleavage can be performed chemically43 and/or photochemically44 prior to MS analysis (e.g., reduction of disulfides33), during ionization (MALDI45) or in gas-phase fragmentation experiments [e.g., collision-induced dissociation (CID) cleavable,46 infrared multiphoton dissociation47]. The latter is advantageous as the mass-to-charge ratio (m/z) of the interacting sites and the cleaved cross-linked products can be analyzed separately.45

To study protein–metabolite interactions, a slightly different cross-linker design is used. Often the reactive cross-linking group is directly incorporated into the metabolic protein-binding partner [Figure 3(c)] by chemical synthesis or a native polymerization process.48 To avoid steric hindrance of binding, the chosen functionality must be sufficiently small not to disturb the metabolite–protein interaction.
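Stable isotope coding, as described above, makes cross-linked species appear as doublets with a fixed spacing in the mass spectrum, which can be searched for directly. The sketch below is a minimal illustration of that idea: it scans a list of m/z values for pairs separated by the label's mass difference divided by the charge. The d0/d4 label (four deuterium atoms, about 4.0251 Da), the peak list and the tolerance are hypothetical and not taken from the review; real software would also compare intensities and isotope patterns.

```python
# Assumed mass difference of a hypothetical d0/d4-coded reagent:
# 4 x (mass of deuterium - mass of hydrogen) = 4 x 1.006277 Da.
ASSUMED_LABEL_SHIFT = 4.025108  # Da


def find_doublets(peaks, charge, mass_shift=ASSUMED_LABEL_SHIFT, tol=0.01):
    """Return (light, heavy) m/z pairs spaced by mass_shift / charge.

    peaks: iterable of m/z values; tol: absolute m/z tolerance.
    """
    spacing = mass_shift / charge
    ordered = sorted(peaks)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            diff = heavy - light
            if diff > spacing + tol:
                break  # peaks are sorted, so later ones are even further away
            if abs(diff - spacing) <= tol:
                pairs.append((light, heavy))
    return pairs


if __name__ == "__main__":
    # Hypothetical peak list (m/z); only the first two peaks form a doublet at z = 3.
    hypothetical_peaks = [500.250, 501.592, 600.300, 700.100]
    print(find_doublets(hypothetical_peaks, charge=3))
```

Because unmodified peptides carry no label, they produce no such doublets, which is what allows the coded cross-link products to be picked out of a complex background.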

Functional groups and side reactions

The most crucial parts of a cross-linking reagent are the reactive groups that form covalent bonds between residues. As only a limited number of residues are sufficiently reactive to form novel covalent bonds, the number of applicable organic reactions is limited. In this section we focus on common functional groups of cross-linking reagents, their targets and unwanted side reactions.

Although cross-linking has many applications, protein–protein cross-linking is most widely used, and therefore particular functional groups were designed to react specifically with amino acid residues, targeting thiols, amines and carboxylates. Owing to the frequency of amino groups in an average protein,49 homobifunctional amine cross-linkers are most prominent, especially imidoesters and N-hydroxysuccinimide (NHS) esters, both of which have been in use for over 40 years.50,51 They react primarily with amines such as the N-terminus and lysine side chains. Since lysine is one of the most abundant amino acids,49 especially on the surface of proteins, lysine residues are easily accessible for cross-linking. The reactivity of these reagents toward nucleophiles, such as primary and secondary amines, increases with rising pH. Both types of amine cross-linkers also display side reactivity toward hydroxyls (e.g., serine, tyrosine, threonine) at pH 7.5. It was shown that under alkaline conditions NHS esters react preferentially with lysines and the N-terminus, while under acidic conditions the N-terminus and tyrosines react preferentially.52,53 One major challenge is the reactivity toward water, as hydrolysis deactivates the cross-linker and prevents any other reaction. Thus, relatively high protein and cross-linker concentrations and freshly prepared solutions are crucial for an effective cross-linking experiment. Also, the pH of the cross-linking reaction should be in the range pH 7–9 to prevent hydrolysis and destabilization of the amine cross-link.
Well-known representatives of imidoesters are dimethyl adipimidate (DMA), dimethyl pimelimidate (DMP) and dimethyl suberimidate (DMS), which are all, by nature of their structure, water soluble. Common NHS esters are not water soluble, but this problem can be solved by adding sulfo groups.54 Prominent examples are disuccinimidyl suberate (DSS) and its water-soluble analog bis(sulfosuccinimidyl) suberate (BS3). For both amine-reactive groups many more reagents are available, which differ only in the length of the spacer arm.33 There are many other reactive groups for amine–amine cross-linking, but compared with imidoesters and NHS esters they are rarely used, so that even newly developed cross-link approaches come back to these long-known chemical functionalities.43,45

Based on nucleophilicity, the most reactive amino acid residue is cysteine. However, in proteins cysteines are less abundant than lysines, by a factor of three for eukaryotes and of six for eubacteria,49,55 and they readily undergo modifications (e.g., oxidation, palmitoylation,56 prenylation57 and S-nitrosylation58) and form disulfide bonds in vitro. Thus, they are less commonly used as targets for cross-linking experiments, but remain the second most applied chemistry. Next to maleimides and methanethiosulfonate,9 pyridyl disulfides are used for sulfhydryl cross-linking. Most of these react rapidly and result in stable bonds with high yield.33 Maleimides are very specific for sulfhydryls at pH 6.5–7.5, but at a higher pH side reactions with amines can occur. Under optimal conditions, the thiol adds to the maleimide double bond in a nucleophilic addition and the cross-link is formed. Bis-maleimidohexane is an example of a simple homobifunctional cross-linker with a hexane spacer arm. In addition to maleimides, pyridyl disulfides are frequently applied for thiol cross-linking.33 DPDPB {1,4-di-[3′-(2′-pyridyldithio)propionamido]butane}, a pyridyl disulfide, forms new disulfide bonds by releasing pyridine-2-thione. The released pyridine-2-thione can be monitored during the reaction by its characteristic absorbance (343 nm),33 which can be utilized for the optimization of the reaction conditions.
In general, sulfhydryl-reactive groups are commonly used in heterobifunctional cross-linkers in combination with functionalities targeting primary amines, such as NHS esters or imidoesters.33 Another application of thiol-reactive cross-linkers is in targeted approaches: here homobifunctional forms of these cross-linkers are used to target single-cysteine variants of proteins. In these proteins, native cysteines are mutated into other amino acids and single cysteines are introduced, so that only the desired cross-link between the single-cysteine mutation sites is obtained. Friedhoff’s group is using this approach successfully to investigate the DNA mismatch repair system.59

Less common are carboxyl-reactive cross-linkers, such as zero-length carbodiimides. This group of coupling agents mediates the amide/phosphoramidate linkage of primary amines with carboxylates (e.g., aspartic acid, glutamic acid, C-terminus) or phosphates and is therefore not limited to protein–protein cross-linking (e.g., oligonucleotide–protein cross-links).33 In addition to the specificities already named, further amino acid residues can be modified, including arginine, histidine and tyrosine.33 Arginine can be targeted specifically by 1,2-dicarbonyls such as glyoxal derivatives. Histidine reacts under alkaline conditions with bis-diazonium derivatives, but at too high a pH (>8.0) tyrosines are also attacked.33 Specific reactions with these amino acids are therefore quite limited, and the amine-, sulfhydryl- and carboxyl-directed chemistries described above should be preferred for cross-linking experiments.
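The pH windows quoted in this section can be collected into a small lookup to check a planned buffer against the chosen chemistry. The sketch below encodes only the ranges stated in the text (pH 7–9 for amine-reactive NHS esters and imidoesters, pH 6.5–7.5 for maleimides); the dictionary keys and function names are hypothetical, and the windows are treated as hard limits purely for illustration.

```python
# pH windows as quoted in the text; the key names are illustrative only.
PH_WINDOWS = {
    "nhs_ester": (7.0, 9.0),    # amine-reactive
    "imidoester": (7.0, 9.0),   # amine-reactive
    "maleimide": (6.5, 7.5),    # sulfhydryl-specific in this range
}


def compatible_groups(ph):
    """Reactive groups whose quoted pH window contains the given pH."""
    return sorted(name for name, (low, high) in PH_WINDOWS.items()
                  if low <= ph <= high)


def shared_window(group_a, group_b):
    """Overlap of two pH windows, or None if they do not overlap."""
    low = max(PH_WINDOWS[group_a][0], PH_WINDOWS[group_b][0])
    high = min(PH_WINDOWS[group_a][1], PH_WINDOWS[group_b][1])
    return (low, high) if low <= high else None


if __name__ == "__main__":
    print(compatible_groups(8.5))
    print(shared_window("nhs_ester", "maleimide"))
```

For a heterobifunctional reagent that combines an NHS ester with a maleimide, the overlap of the two quoted ranges is narrow (pH 7.0–7.5), which illustrates why the buffer has to be chosen carefully when both groups are meant to react specifically.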

Besides site-specific reagents, cross-linkers with less specificity are also used. A common cross-linker for in vivo cross-linking is formaldehyde, because it easily enters the cell and is rather nonspecific in reactivity (primary/secondary amines and amides with active hydrogen33) but, in contrast, is specific for physical proximity (2 Å). Additionally, it inactivates enzyme function and therefore “freezes” the cell in its current state.15

Besides chemical cross-linkers targeting specific amino acid residues and nonspecific cross-linkers such as formaldehyde, another important group of cross-linking reagents are photoreactive nonspecific cross-linkers. These reagents are largely unreactive prior to light activation and therefore highly controllable.33 The most prominent reactive groups are (halogenated) aryl azides, benzophenones, anthraquinones, diazo compounds and diazirine compounds.60 Only heterobifunctional photo cross-linkers (amine reactive and photoreactive) are commercially available; two of them are presented here. A well-established example is ANB-NOS (N-5-azido-2-nitrobenzoyloxysuccinimide), which contains an NHS ester and a nitroaryl azide. The photoreactive group is activated by UV light at 320–350 nm, which does not affect the protein61 but activates the cross-linker. A more recent development are cross-linkers with a diazirine as the photoreactive group (NHS–diazirine),62 which is activated at 330–370 nm. Diazirines are far more stable in daylight than nitroaryl azides and are therefore easier to handle. Several NHS–diazirine cross-linkers of different spacer-arm length and cleavability can be purchased. The photoreactive site is not specific to certain amino acids, which increases the chance of forming a cross-link.
However, with these heterobifunctional reagents an additional clean-up step in the cross-link reaction is needed to prevent the formation of dead-end mono-links during the photoactivation.63 Although this increases the amount of labor, it is at the same time an easy way to prevent mono-links in general. Furthermore, since photo cross-linkers can also be introduced into metabolites, such as sugars and lipids, they are also suited to studying protein–metabolite binding and interaction in vivo.

More sophisticated is the use of photoactivatable derivatives of natural amino acids for in vivo experiments, which are incorporated by a synthetic transfer RNA (tRNA)–aminoacyl-tRNA synthetase pair.64 This has the advantage that nearly any modification can be incorporated into a protein, but the need for synthetic tRNA is a huge challenge. An easier concept is realized with the photoactivatable diazirine derivatives of leucine and methionine, which can be added directly to cell cultures. With this “feed and flash” strategy, in vivo cross-linking becomes easily accessible for a global analysis.65 Another in vivo approach with a photoreactive cross-linker was presented in 2009 by Yan et al.66 They developed a new reagent [5-(4-benzoylbenzamido)-4′,5′-di(1,3,2-dithiarsolan-2-yl) fluorescein], which they named targeted releasable affinity probe (TRAP). This reagent contains a highly specific affinity site, which labels an engineered tagging site in a target protein, and a photoreactive benzophenone site, which captures the binding partner. After the reaction the target protein is released to facilitate the analysis of the binding partner modified by TRAP. Unfortunately, poor labeling specificity and cellular toxicity, as well as undesired palmitoylation and oxidation of the engineered tagging site, negatively affect the applicability.67

The cross-linking of protein–RNA/DNA, protein–lipid or protein–sugar interactions typically does not use the traditional bifunctional cross-linker design. For lipid–protein cross-linking, most often the same chemistry is applied as in protein photo cross-linking. Photoactivatable cholines, phospholipids or cholesterols are added as probes to the in vitro or in vivo systems68 to elucidate potential protein-binding candidates. By incorporating an additional azide group into the lipid cross-linker and using click chemistry,69 fluorescent tags for detection or biotin tags for enrichment can be introduced into the protein–lipid complex [Figure 3(c)]. For RNA/DNA–protein cross-linking, either the natural photoreactivity of the nucleobases [Figure 3(d)] is utilized, or photoreactive base analogs [e.g., 4-thiouridine (4SU)], which improve the yield of the cross-link product,70 are incorporated. RNA–protein cross-linking is possible with all amino acids except proline, but high reactivity is achieved especially with cysteine, lysine, methionine, phenylalanine, tryptophan and tyrosine.70 Carbohydrate–protein cross-linking has a long history in the generation of protein–sugar bioconjugates: to cross-link sugar moieties to amino acid residues, aldehydes are formed from primary and secondary alcohols, which can then be coupled to the target via a hydrazine derivative.71,72 A more recent and analytically more accessible technique is the use of photoactivatable cross-linking sugars.
Here, synthetic monosaccharides bearing a diazirine functionality are metabolically incorporated into sugar polymers attached to glycoproteins and, through UV activation, cross-linked to interacting proteins in their proximity.48 Even though protein–metabolite cross-linking is not widely used, it holds strong potential, since key technologies such as lipidomics73 are emerging and metabolomic control is moving into the scientific focus.74

Many cross-linking reagents are commercially available. Pierce (www.piercenet.com) offers a wide variety of classical cross-linkers with different target specificity, spacer arm length and isotope labeling as well as the photoreactive amino acids L-photo-leucine/-methionine. More sophisticated reagents are provided by CreativeMolecules (www.creativemolecules.com), which offers in particular isotope-labeled and cleavable reagents as well as linkers with affinity tags. Different cross-linkers with amine and sulfhydryl specificity can also be purchased from Toronto Research Chemicals (www.trc-canada.com).

Advanced strategies for cross-linking analysis

To overcome the analytical challenges in cross-linking approaches several key requirements must be fulfilled after product formation. First, the cross-linked product is generally of low abundance and has to be enriched and isolated from non-cross-linked components. Second, the cross-linked product has to be clearly identifiable against the background of non-cross-linked molecules in an MS-based experiment. Finally, the design of the cross-link should facilitate identification and subsequent computational analysis. For the enrichment of the cross-linked product different chromatography-based techniques can be applied. Most common for smaller-scale experiments are biotin-affinity enrichment strategies25 as they allow for a specific enrichment. However, biotin tags can compromise the MS/MS identification by introducing additional fragment ions and influencing the fragmentation process of the cross-linked peptides. To avoid this perturbation, cleavable biotin tags, which can be cleaved off prior to MS analysis, can be applied. A different approach for enrichment is to use the properties of the cross-linked peptides themselves. An average fully tryptic peptide has a molecular mass below 1500 Da and a charge state of +2 under acidic conditions, whereas cross-linked peptides easily exceed both. To enrich these branched molecules, techniques such as size exclusion chromatography (SEC) and strong cation exchange chromatography (SCX) can be applied, which separate the cross-linked peptides from unmodified peptides and mono-links.75 Leitner et al. demonstrated that the SEC enrichment strategy yields a threefold increase in the cross-link identification rate, and when combined with a multiprotease digest this can be increased to a factor of four for the analysis of the 20S proteasome of Schizosaccharomyces pombe.76 A slightly different strategy for cross-link separation compared to the classical LC-MS bottom-up approach is the use of ion mobility spectrometry (IMS).
Via ESI-IMS-MS the cross-linked products can be separated in the IMS device.77 The separation is based on differences in the mobility of the ions, which depends on their mass and geometry, while they drift through an inert gas under a weak electric field.78 This even allows the separation of ions with the same mass but different geometry caused by the position of the cross-link. Even after enrichment the main challenge remains to identify the cross-link in a reaction mixture. Soft ionization techniques such as ESI and MALDI introduce the analytes into the gas phase, and strategies based on mass differences help to identify cross-links against a background of non-cross-linked molecules. A common strategy to distinguish the cross-link product from the background is the introduction of a mass code to the cross-linked complex. This can be accomplished either by the cross-linker itself or at the analyte level. To introduce a mass code the cross-linking experiment is either performed twice under identical conditions with different stable-isotope-coded cross-linkers, which have the same physicochemical properties in the experiment but clearly differ in mass, or performed once with a mixture of light- and heavy-coded cross-linkers. After pooling samples, the subsequent

Figure 4. Use of cleavable, isotopically labeled cross-linkers to distinguish between type 0, 1 and 2 cross-links. The cross-linking reaction is carried out twice, once with a stable-isotope-coded cross-linker D12-DGS and once with a noncoded DGS cross-linker. The two samples are combined, digested and analyzed by MALDI-TOF-MS. All cross-link products will produce a doublet with peaks 12 Da apart. To distinguish type 0, 1 and 2 cross-links from each other, in a second reaction the cross-linker is chemically cleaved. The derived spectra will display different doublets or mass shifts, making them unique for the individual cross-link type. Dead-end links produce a doublet with 4 Da mass differences and a mass shift of minus 162 Da (succinyl and the ethylene glycol group are cleaved off), while intra-protein cross-links have a doublet with 8 Da mass differences after cleavage and will display a mass shift of minus 62 Da (ethylene glycol). Inter-protein cross-links show two doublets with 4 Da mass differences after cleavage and an additional significant mass shift as one of the peptides is cleaved off.

MS experiment detects all cross-linked peptides as doublets with the corresponding mass difference, such that they can be easily distinguished from non-cross-linked peptides or proteins (Figures 4 and 5). Müller et al. introduced this technique in 2001 with fourfold deuterated (heavy) and

Figure 5. Chemical cross-link coding. MALDI-TOF spectrum of SC-MutL (single cysteine variant of MutL) after cross-linking with MTS-BP-Bio {2-[Nα-benzoylbenzoicamido-N6-(6-biotinamidocaproyl)-L-lysinylamido]ethyl methanethiosulfonate}, affinity purification and chemical coding with NMM and NEM, which creates a characteristic pattern with a 14 Da mass difference.9 (a) MALDI-TOF-MS of the MutL–MutL cross-link after trypsin digestion and (b) after the digestion with trypsin and Glu-C. The cross-link of the MutL dimer is indicated with X and the control peptide for the chemical modification with K1-Bio.

nondeuterated (light) cross-linkers to investigate the Op18–tubulin complex.79 Heavy and light forms of the linker were mixed in a 1 : 1 ratio, so that specific doublets with 4 Da mass differences on the MS1 level facilitated discovery and identification. Notably, such approaches also enable the quantitative comparison of different states of a purified complex. Chemical coding of cross-linked peptides with structurally similar molecules is an inexpensive alternative to stable-isotope labeling. Friedhoff and co-workers used N-methylmaleimide (NMM) and N-ethylmaleimide (NEM) to label a thiol group in their cross-linker following a successful cross-linking reaction.9 Since these labels differ in mass by 14 Da, characteristic doublets are created in the mass spectrum (Figure 5), analogous to the effect of isotopically coded reagents. One clear drawback, however, is that, in contrast to stable-isotope incorporation, the chemical properties of the added moieties are slightly different. Consequently, this technique can easily be combined with MALDI, whereas it is not suitable for LC-MS: the differentially labeled species show large retention-time shifts, are separated during LC and are therefore ionized with different efficiencies. Besides the previously discussed techniques to facilitate identification on the reagent level (e.g., stable-isotope and chemical coding), additional strategies include the introduction of a mass code directly on a linked peptide. Digestion with trypsin in 18O-enriched water is one possibility. Here, upon cleavage trypsin incorporates heavy oxygen atoms at the C-terminus of each peptide, which occurs twice in cross-linked peptides. By measuring a mixture of the sample digested in normal water and in 18O-water, the cross-linked peptides

are easily identified by the characteristic mass shift.80 The mass code can also be introduced by N-terminal labeling with reagents coded with stable isotopes. However, in this approach lysine residues will react with the reagents, thus interfering with trypsin-based digestion. Therefore, Petrotchenko et al. established a strategy with distinct isotope-labeled cross-linkers and N-terminal labeling,81 which enabled them to differentiate between intermolecular cross-linked peptides and free lysine-containing peptides. A more recent approach is the XChem-Finder workflow by the group of Zhou, which also includes 18O-labeling.29 In this workflow, they combined 18O-labeling for the detection of cross-linked peptides with the determination of the cross-linked peptide sequences by a classical database search and de novo sequencing. By combining partial sequences and sequence tags, full sequences and the functional groups involved are deduced. With this approach they discovered 14 thioether peptides in IgG2 that had not been reported before. Besides mass coding, chemically cleavable cross-linking reagents are very useful for identification and subsequent data analysis. The protein interaction reporter (PIR) strategy published by Tang and Bruce in 2005 exploits the advantages of cleavable cross-linkers.82 The PIR cross-linker contains two labile bonds and therefore reporter ions are also produced. At first glance it is not obvious why breaking the initially formed bond could be helpful, but the branched structures of cross-linked peptides result in complex MS/MS fragment spectra, which complicate the identification of linked peptides. The use of a cleavable cross-linker enhances the fragmentation and consequently sequencing as well,83 since individual peptides are analyzed. Furthermore, this reduces the computational analysis time for the subsequent database search.
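As a rough illustration of this search-space reduction, the following Python sketch compares the number of candidates a search engine has to consider for linked peptide pairs with the number for linear peptides. It assumes n candidate peptides and counts unordered pairs, including a peptide linked to a second copy of itself; the function names and the values of n are chosen only for illustration.

```python
def pair_search_space(n: int) -> int:
    """Unordered peptide pairs, self-pairs included: (n^2 + n) / 2."""
    return (n * n + n) // 2


def linear_search_space(n: int) -> int:
    """After cleavage of the linker each peptide is searched on its own."""
    return n


# Arbitrary database sizes, for illustration only.
for n in (100, 1_000, 10_000):
    print(f"n={n}: pairs={pair_search_space(n)}, linear={linear_search_space(n)}")
```

The quadratic growth of the pair count is the reason why reducing the problem to linear peptides matters most for large sequence databases.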
The time required for a database search is reduced considerably because, instead of mass combinations of two linked peptides [following (n² + n)/2 for n candidate peptides], only linear peptides have to be searched for in a complex mixture. Additionally, the reporter ion makes it possible to distinguish between dead-end and inter-/intramolecular cross-links, because dead-end reporter ions will have one site hydrolyzed and therefore a higher mass. A similar development is achieved with a cross-linker that is isotopically labeled and photo-cleavable. Additional characteristic fragments are produced, which allow further distinction of the different types of cross-linked products formed (types 0, 1, 2 and 3). Petrotchenko et al. demonstrated this technique with the cross-linker H/D12-DGS [ethylene glycol bis(succinimidylsuccinate)].43 The main principle is shown in Figure 4. Less prominent is the top-down approach for cross-linked proteins. For this strategy analysis by ESI Fourier-transform ion cyclotron resonance (ESI-FT-ICR), or the faster and more sensitive orbitrap analyzer, is chosen,84 while for fragmentation electron capture dissociation should be applied.34 Kruppa et al. chose the top-down approach for the analysis of ubiquitin cross-linked with DSS by FT-ICR-MS19 to show its advantages: after cross-linking no additional steps prior to MS

were required (e.g., digestion, enrichment or LC separation of cross-linked and non-cross-linked peptides), since it is possible to isolate singly cross-linked proteins in the FT analyzer by mass (“gas-phase purification”32). A novel analytical strategy, embedded in the top-down approach, is gas-phase cross-linking, in which covalent links are formed in an ion/ion gas-phase reaction between peptide cations and cross-linker anions.85 Webb et al. performed gas-phase intramolecular cross-linking of ubiquitin with sulfo-NHS esters and subsequently used CID to localize the formed links. They demonstrated an extended top-down approach in the gas phase,86 which has both advantages and disadvantages compared to in-solution cross-linking. On the one hand, hydrolysis and other side reactions are avoided; on the other hand, lower reactivity and a lower probability of forming a cross-link were observed.86 Owing to the simplicity of the workflow, the top-down approach is compatible with automation and high throughput. However, in general top-down proteomics still has some issues to overcome, such as impaired sensitivity in MS analysis and a much lower number of available bioinformatic tools.87 Further constraints are the low sample complexity and the limited size and number of protein complexes that can be investigated. Despite all these strategies for cross-link analysis, data interpretation is still a challenge. In the next section some possibilities for automated data analysis are presented.
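The isotope-coding strategy described in this section reduces, at the data level, to finding pairs of peaks separated by a known mass difference. The following Python sketch shows one simple way to do this. It is a minimal illustration, not the procedure of any cited study: the function name, the tolerance and the peak list are hypothetical, and the peaks are assumed to be singly charged or already deconvoluted (for charge state z the spacing on the m/z axis would be the mass difference divided by z).

```python
from bisect import bisect_left, bisect_right


def find_doublets(masses, delta, tol=0.01):
    """Return (light, heavy) peak pairs separated by delta +/- tol (in Da)."""
    peaks = sorted(masses)
    pairs = []
    for light in peaks:
        # Binary search for all peaks inside the expected heavy-partner window.
        lo = bisect_left(peaks, light + delta - tol)
        hi = bisect_right(peaks, light + delta + tol)
        for heavy in peaks[lo:hi]:
            pairs.append((light, heavy))
    return pairs


# Mass difference between four deuterium and four hydrogen atoms (Da).
D4_SHIFT = 4 * (2.014102 - 1.007825)

# Invented peak list for illustration; two doublets are hidden in it.
peaks = [1045.52, 1049.545, 1210.61, 1388.70, 1392.725, 1501.80]
for light, heavy in find_doublets(peaks, D4_SHIFT, tol=0.01):
    print(f"candidate cross-link doublet: {light} / {heavy}")
```

The same function can be called with other spacings mentioned in the text, for example about 14 Da for the NMM/NEM chemical code, provided the tolerance is matched to the mass accuracy of the instrument.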

Data analysis of cross-linked proteins

After the successful reaction and MS/MS analysis the next challenge is data analysis with specially tailored software. Even with the numerous tools available today, data analysis is still a major bottleneck in the field, since software that fits all purposes and approaches is not yet available. If data analysis is conducted using a bottom-up strategy, the applied software utilizes MS and MS/MS data and a protein sequence database. To date, more than 20 different search engines for cross-linking analysis are available and were reviewed recently by Mayne and Patterton.88 With this extensive selection of programs, it is up to the user to find the best fit for the application depending on the experimental and instrumental settings (Table 1). A small selection of programs that performed well in our hands and are freely available online is discussed in the following paragraphs. One of the latest programs published is pLink.89 It is a Windows command-line program that requires MS and MS/MS data in mgf (Mascot generic format) and protein sequence(s) in fasta format.90 Search conditions are set in a text file (plink.ini); common cross-linkers and enzymes are predefined, but user-defined parameters can be added as well. The results are presented in a browser or in Excel files. pLink identifies mono-links, intra-links and inter-links and supports searches with isotope-labeled cross-linkers. Depending on settings, data amount and computer performance a search

Table 1. Overview of programs to analyze cross-linking MS and MS/MS data.

| Software | Online | References | Additional notes |
|---|---|---|---|
| pLink | Yes | 89 | Downloadable program, no user interface |
| CrossWork | Yes | 91 | Downloadable program |
| MassMatrix | Yes | 92 | Web-based application or download |
| xQuest | Yes | 94, 97 | Isotope-labeled cross-linkers, web-based or download |
| GPMAW | Yes | 98 | Commercially available |
| CLPM | Yes | 99 | Web-based application |
| PROWL’s PeptideMap | Yes | 100 | Especially for disulfide bridges, web-based application |
| SearchXLinks | No | 101 | Especially for disulfide bridges |
| CrossSearch | No | 102 | Web-based application |
| iXLINK | No | 103 | For 18O-labeled samples |
| ProCrossLink | Yes | 104 | For 18O-labeled samples |
| X-Link | No | 105 | Request from developer |
| X-Links | No | 106 | Request from developer |
| ASAP | No | 26 | Web-based application |
| MSX-3D | Yes | 107 | Web-based application |
| VIRTUALMSLAB | No | 108 | Request from developer |
| Crux | Yes | 109 | Downloadable program |
| MS2Assign | No | 4 | Web-based application |
| X!Link | No | 110 | Request from developer |
| MS2Links | No | 111 | For protein–nucleic acid complexes |
| MS2PRO | No | 19 | Request from developer |
| XLink-Identifier | Yes | 7 | Web-based application |
| ICC-CLASS | Yes | Creative molecules* | Especially for isotope-labeled cross-linkers |
| xComb | Yes | 112 | Prepares data for search with standard search engines |
| MS-Bridge | Yes | Prospector** | Web-based application |
| StavroX | Yes | 113 | Downloadable program/web-based application (Java) |

*www.creativemolecules.com, **http://prospector.ucsf.edu

of a single file can take less than ten minutes on a standard personal desktop computer (64-bit processor, 8 GB memory). CrossWork is a software package from MassAI Bioinformatics.91 This program provides a user interface and nearly the same features as pLink. The time needed for one search is also similar (64-bit processor, 8 GB memory). The analysis includes two steps: searching and validation. In the search step, MS data (mgf) and a protein sequence (fasta) are loaded and the parameters for enzyme, cross-linker (isotope labeled or unlabeled) and MS conditions are set. In the validation step a scoring algorithm (CWscore) is employed, which assesses the quality of the potential hits, considering different features (e.g., overall sequence coverage, number of complementary b/y and b/a ion pairs). In this way, the program differentiates between mono-links, intra-links and inter-links. MassMatrix is a web-based application.92 After free registration, MS and MS/MS data (mzXML93) and a protein database (fasta) are uploaded. Many protein databases are also provided online (e.g., IPI human, IPI mouse, NCBI human).
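The programs discussed here all start from peak lists and protein sequences. The core idea of such a bottom-up search can be sketched in Python as follows. This is a deliberately simplified illustration under stated assumptions and not the algorithm of pLink, CrossWork or MassMatrix: the protein sequence is hypothetical, only lysine–lysine links with a DSS-type linker (assumed mass addition 138.06808 Da) are considered, protein N-termini, mono-links, modifications and MS/MS scoring are ignored, and all function names are invented for the example.

```python
import re
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
DSS_LINK = 138.06808  # assumed mass added by a DSS/BS3-type cross-link


def tryptic_peptides(seq, missed=1):
    """In silico trypsin digest: cleave after K/R unless followed by P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]
    peptides = set()
    for i in range(len(pieces)):
        for j in range(i, min(i + missed, len(pieces) - 1) + 1):
            peptides.add("".join(pieces[i:j + 1]))
    return sorted(peptides)


def peptide_mass(pep):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE[aa] for aa in pep) + WATER


def match_pairs(peptides, precursor_mass, linker=DSS_LINK, ppm=10.0):
    """List peptide pairs whose summed mass plus linker fits the precursor."""
    # A linked lysine is not cleaved by trypsin, so it must be internal.
    linkable = [p for p in peptides if "K" in p[:-1]]
    hits = []
    for a, b in combinations_with_replacement(linkable, 2):
        theo = peptide_mass(a) + peptide_mass(b) + linker
        if abs(theo - precursor_mass) / theo * 1e6 <= ppm:
            hits.append((a, b, theo))
    return hits


protein = "MKTAYIAKQRGSKLVEFR"  # hypothetical sequence, for illustration only
peptides = tryptic_peptides(protein, missed=1)
# Simulated precursor mass of one inter-peptide cross-link.
target = peptide_mass("TAYIAKQR") + peptide_mass("GSKLVEFR") + DSS_LINK
for a, b, mass in match_pairs(peptides, target):
    print(f"{a} x {b}: {mass:.4f} Da")
```

Real search engines additionally score the MS/MS fragments of each candidate pair, which is where most of the computing time is spent.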

After defining the search settings, the search is started and results can be viewed in a browser or downloaded as an Excel file. Found cross-links are shown in a heat map. The duration of the upload and search depends on the server on which MassMatrix runs. In the best case, the search is fast (