Plastid Proteomics in Higher Plants: Current State ... - Plant Physiology

Update on Plastid Proteomics in Higher Plants

Plastid Proteomics in Higher Plants: Current State and Future Goals¹

Klaas J. van Wijk* and Sacha Baginsky

Department of Plant Biology, Cornell University, Ithaca, New York 14853 (K.J.v.W.); and Martin-Luther-Universität Halle-Wittenberg, Institut für Biochemie, 06120 Halle, Germany (S.B.)

SIGNIFICANCE OF PLASTIDS IN PLANT BIOLOGY

Plastids are plant cell organelles with many essential functions in plant metabolism. Among these are photosynthesis, amino acid and fatty acid biosynthesis, and the synthesis of several secondary metabolites. All plastids originate from undifferentiated proplastids, which are restricted to meristematic tissues and undifferentiated cells. Depending on the tissue, proplastids can develop into different plastid types (e.g. amyloplasts in storage tissue, chloroplasts in photosynthetic tissues, and chromoplasts in fruits and flowers). Other specialized plastid types include gerontoplasts, the plastids of senescent leaves that are important for resource allocation; oleoplasts, which are oil storage plastids in olive (Olea europaea); and etioplasts, the final stage of proplastid development in photosynthetic tissues in the dark (Wise, 2006). Finally, plastid types can possibly specialize to different degrees depending on cell type, developmental state, and (a)biotic conditions. An extreme case is that of the highly specialized C4 chloroplasts in bundle-sheath and mesophyll cells in the maize (Zea mays) leaf, which differ strongly in proteome composition (Friso et al., 2010). With the ability to develop and differentiate, plastids add versatile biosynthetic capacity to the plant cell and are responsible for unique biosynthetic pathways that make plants unrivaled biochemical factories that are essential for life on earth. Thus, significant research efforts are under way that aim at understanding plastid biology in depth. A decade ago, the first plastid proteomics study was published and the potential of plastid proteomics was outlined (van Wijk, 2000). Since then, proteomics of plastids and plant (sub)proteomes has delivered on its promise. Here, we provide an update on the current status of plastid proteome research a decade after the first reports.

¹ This work was supported by the National Science Foundation (grant nos. MCB–1021963, IOS–0701736, and IOS–0922560), by the Swiss National Science Foundation (grant no. 31003A_127202), and by the Martin-Luther-University Halle-Wittenberg.

* Corresponding author; e-mail [email protected].

www.plantphysiol.org/cgi/doi/10.1104/pp.111.172932

ADVANCES IN PLASTID PROTEOMICS AND PROTEOMICS TECHNOLOGY

Plant and plastid proteomics are now well-established scientific disciplines with many laboratories contributing to their progress. Not surprisingly, a significant fraction of the plastid proteome is characterized today, and available information includes protein quantities, protein interactions, and posttranslational modifications (PTMs), as will be briefly highlighted in this update. However, several of the challenges for plastid proteomics outlined 10 years ago still exist today, including the detection of low-abundance proteins (e.g. more than 10,000-fold lower than Rubisco) and capturing the dynamics of plastid proteomes. Much of the progress has been driven by technology development and improved genomics resources. The main differences between proteomic technologies today and 10 years ago are the much improved sensitivity (routinely at 1–50 fmol), the accelerated duty cycle (now tandem mass spectrometry [MS/MS] scans within a few hundred ms), the improved mass accuracy (down to a few ppm for peptides), and the increased resolution (up to 100,000) of the latest generation mass spectrometers. Furthermore, coupling of nano-liquid chromatography (LC) with MS/MS is now routine, and split-free nano-LC systems now deliver low flow rates for nanospray ionization with excellent reproducibility. Also important are improved software tools for the reliable identification of peptides based on MS/MS spectra along with statistically sound estimates of false discovery rates in large data sets. With the maturation of proteomics work flows, quantitative information for plastid proteins became available. These new technologies, in combination with the availability of multiple sequenced plant genomes, now allow for answering more comprehensive and sophisticated questions as compared with a decade ago (Baginsky, 2009; Gstaiger and Aebersold, 2009; Schulze and Usadel, 2010; Walther and Mann, 2010).
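To make the mass accuracy figure concrete, the following minimal Python sketch computes the relative mass error, in ppm, between an observed and a theoretical m/z value. The function names, the example m/z values, and the 5-ppm tolerance are illustrative assumptions; they are not taken from any of the cited studies or from a particular search engine.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 5.0) -> bool:
    """True if the absolute mass error does not exceed tol_ppm.

    The default of 5 ppm is a hypothetical threshold chosen for illustration.
    """
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical peptide ion: theoretical m/z 1000.000, observed m/z 1000.003.
    print(round(ppm_error(1000.003, 1000.0), 3))
    print(within_tolerance(1000.003, 1000.0))
```

A narrow ppm tolerance of this kind reduces the number of candidate peptides that can match a given precursor mass, which is one reason why improved mass accuracy supports more reliable identifications.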
In this update, we review progress on plastid proteomics and lay out a series of challenges that can be addressed within the next few years. Table I provides an overview of Web-based plant proteomics resources that are relevant for plastid proteomics. We will center this update on the concept of a plastid protein atlas. Box 1 provides examples of the wide range of queries that a high-quality plastid protein atlas should ultimately be able to answer. For more exhaustive overviews on proteomics of plastids, we refer to two recent review articles and the references therein (Baginsky, 2009; Agrawal et al., 2010); space restrictions do not allow us to cite recent literature more extensively.

Plant Physiology®, April 2011, Vol. 155, pp. 1578–1588, www.plantphysiol.org © 2011 American Society of Plant Biologists

Table I. Plant and plastid proteomics databases that provide information and tools for finding plastid proteins in Arabidopsis and other plant species, as well as functional annotation, posttranslational modifications, peptide information, and spectral data

Abbreviations for species are as follows: At, Arabidopsis; Ca, bell pepper; Cr, the green alga C. reinhardtii; Mt, Medicago truncatula; Nt, tobacco; Os, rice; St, potato; Zm, maize.

Database: AtProteome (http://gator.mascproteomics.org/)
Species: At
Main Purpose/Objective: Experimental information about identified proteotypic peptides, protein abundance in different organs, and spectral/peptide evidence for gene models
In-House Experimental Plant Material: Different organs of At
Type of In-House Experimental Information: In-house MS-based identification: peptides, ion scores, ppm, Mowse scores, meta information
Functional Annotation (Name and Function): TAIR
Subcellular Localization: None, but detailed organ information
Predictors: None

Database: PPDB (http://ppdb.tc.cornell.edu)
Species: At, Zm, Os
Main Purpose/Objective: Curated information for all proteins and protein models in At and Zm, including protein information and functional annotation; experimental information about leaf and subcellular fractions with MS-based identification details, including spectral counts and PTMs
In-House Experimental Plant Material: Zm and At leaves and chloroplast fractions (e.g. stroma, thylakoids, lumen, plastoglobules, etc.); for Zm, also BSC- and MC-specific chloroplasts to study C4 effects; for At, also different mutant background in ecotype Columbia
Type of In-House Experimental Information: In-house MS-based identification: peptides, ion scores, ppm, Mowse scores, meta information
Functional Annotation (Name and Function): Manual curation of name, functional annotation by MapMan
Subcellular Localization: Chloroplast stroma, thylakoid, envelope
Predictors: None

Database: AT_CHLORO (http://www.grenoble.prabi.fr/at_chloro/)
Species: At
Main Purpose/Objective: In-house analysis of At chloroplast proteome and its substructures (envelope, stroma, thylakoid) with detailed proteomic information (peptides, molecular mass, retention times, identification statistics)
In-House Experimental Plant Material: Chloroplast fractions (envelope, stroma, thylakoid)
Type of In-House Experimental Information: In-house MS-based identification: peptides, ion scores, ppm, Mowse scores, meta information
Functional Annotation (Name and Function): Manual curation of name, functional annotation by MapMan
Subcellular Localization: Chloroplast stroma, thylakoid, envelope
Predictors: None

Database: plprot (http://fgczatproteome.unizh.ch)
Species: At, Os, Nt, Ca
Main Purpose/Objective: Proteome analyses of Os etioplasts, At chloroplast, Ca chromoplasts, and the undifferentiated proplastid-like organelles of Nt BY2 cells, plastid type-specific functions
In-House Experimental Plant Material: Different plastid types from various species
Type of In-House Experimental Information: Peptide identifications, homologs identified in other plastid types, interactive two-dimensional PAGE from differently illuminated etioplasts
Functional Annotation (Name and Function): None
Subcellular Localization: Identifications from isolated plastids
Predictors: None

Database: SUBA (http://suba.plantenergy.uwa.edu.au/)
Species: At
Main Purpose/Objective: Facilitate subcellular protein localization analysis based on different public prediction tools, proteomics papers, and GFP/yellow fluorescent protein localization studies; allow combinatorial queries on the contained data
In-House Experimental Plant Material: None; literature only
Type of In-House Experimental Information: Not applicable
Functional Annotation (Name and Function): Links to TAIR, AmiGO, and UniPROT
Subcellular Localization: Users can employ various queries to generate answers
Predictors: Multiple localization predictors

Database: PhosPhAt (http://phosphat.mpimp-golm.mpg.de/)
Species: At
Main Purpose/Objective: Phosphoproteome information from published and unpublished sources, identified peptides, or ions with annotated phosphorylation sites (where available); provides a P-site prediction tool
In-House Experimental Plant Material: At phosphoproteome at different conditions
Type of In-House Experimental Information: Specific information about peptide properties, annotated biological function, as well as the analytical context; provides the phosphopeptide spectrum
Functional Annotation (Name and Function): TAIR
Subcellular Localization: None
Predictors: Plant specific P-predictor (pSer, pThr, pTyr)

Database: RIPP-DB (http://database.riken.jp/sw/links/en/ria102i/)
Species: At, Os
Main Purpose/Objective: Plant phosphoproteome database with information on phosphopeptides by LC-MS/MS-based shotgun phosphoproteomics
In-House Experimental Plant Material: Os and At cell cultures
Type of In-House Experimental Information: Described in associated papers
Functional Annotation (Name and Function): Hyperlinks to other databases
Subcellular Localization: None
Predictors: None

Database: ProMex (http://promex.pph.univie.ac.at/promex)
Species: At, Cr, Mt, St
Main Purpose/Objective: Mass spectral reference database of tryptic peptides from plant proteomes
In-House Experimental Plant Material: Different sources
Type of In-House Experimental Information: Display of MS/MS spectra with annotation
Functional Annotation (Name and Function): None
Subcellular Localization: None
Predictors: None

DEVELOPMENT OF A PLASTID PROTEIN ATLAS

The concept of a protein atlas was established several years ago, in particular for the human proteome (http://www.proteinatlas.org/). This concept involves the generation of protein inventories for each organ and subcellular localization tagged with additional protein information, such as splice variants, PTMs, protein-protein interactions, etc. Similar efforts are under way to collect all available information for plants to generate a plant proteome atlas that includes proteome information for the different plant organs and their organelles. Here, we concentrate on the plastid proteome atlas. In order to disseminate biologically useful information, such a plastid atlas should include the following: (1) protein accessions for each plastid type, including cellular specialization and subplastid localization; (2) information on peptide coverage of each identified protein and possibly different gene models; (3) steady-state protein abundance under a set of well-defined (a)biotic conditions as well as developmental states; (4) protein-protein interactions, protein-nucleotide assemblies, and oligomeric state; (5) reversible and irreversible PTMs; and (6) bioinformatics information such as subcellular localization predictions and network information. Plastids are among the best characterized cell organelles at the proteome level, and a quality chloroplast protein atlas is now emerging. However, the plastid protein atlas is far from complete, and strategies to improve proteome coverage and in-depth characterization must be developed and implemented. In the following paragraphs, we will review the status of each of the six components of the plastid protein atlas and outline strategies for improvement.

Improved Protein Inventories for Each Plastid Type, Including Cellular Specialization and Subplastid Localization

The predicted size of the combined proteome of all plastid types ranges from 2,000 to 3,500 proteins in Arabidopsis (Arabidopsis thaliana), representing about 7% to 12% of all predicted protein-encoding genes. However, only about 1,200 proteins are currently recognized as being plastid localized (see the Plant Proteome Database [PPDB] at http://ppdb.tc.cornell.edu). Comparing this experimental plastid proteome data set with the predicted plastid proteome showed that, in particular, plastid proteins involved in signaling and plastid gene expression and RNA metabolism are strongly underrepresented. There are several reasons why a significant percentage of plastid proteins has not yet been recognized: (1) low abundance in chloroplasts (i.e. their detection is obscured by highly abundant photosynthetic proteins); (2) specific expression in a certain plastid type other than chloroplast; (3) expression only under very specific conditions (developmental state, abiotic condition, or biotic challenge); or (4) too few ionizable tryptic peptides (e.g. transmembrane proteins with very short loops and tails or very small or basic proteins). Plastid proteome coverage can be improved by using better MS instrumentation with higher sensitivity, accuracy, and faster duty cycle, the use of alternative enzymes for protein digestion, more specific (e.g. affinity-based) fractionation of plastid proteomes, or increased efforts to analyze a more diverse set of plastid types, including heterotrophic plastids.

Box 1. Examples of the wide range of queries that a high-quality plastid protein atlas should ultimately be able to answer. Current answers for Arabidopsis proteins are provided with reference to the databases and resources listed in Table I. Information for maize and rice is available in a subset of the databases (Table I). This box also serves to better identify the lack of information and challenges for plastid proteome research and resource development for the immediate future.

However, as analytical sensitivity increases with these additional efforts, the challenge to distinguish between true positive and false positive plastid proteins increases as well. Based on the last decade of plastid proteome research, it is clear that objective filtering strategies for false positive identification and/or assignment to plastids are essential. The most practical solution involves repeated analysis of independent plastid preparations and the use of quantitative protein information for improved filtering of the identified proteins, based on two steps: (1) repeat observations in independent plastid preparations, because proteins that are observed at high frequency across these preparations are more likely to be bona fide plastid proteins; and (2) combine proteome information from unfractionated tissue and different purified organelles to recognize false positives that accumulate to higher levels in other subcellular locations, which requires quantitative information about relative protein abundance. Such relative quantification for these different sample types ideally should be done with the same experimental work flow, and a good example is available from the Localization of Organelle Proteins by Isotope Tagging (LOPIT) technique, which uses relative protein quantification along density gradients to assign proteins to organelles by association (Dunkley et al., 2006). The “frequency” filter is based on the assumption that nonplastid contaminants or false positive identifications are random events; therefore, this first filter does not remove systematic false positives, such as high-abundance cytosolic proteins, which can contaminate isolated chloroplasts.

A small percentage of plastid proteins are also located elsewhere in the cell, and approximately 50 dual-targeted proteins have been discovered for Arabidopsis so far (Carrie et al., 2009). Most of these have shared locations with mitochondria and are involved in plastid or mitochondrial gene expression (e.g. tRNA synthetases); however, shared localizations with the nucleus, peroxisome, or cytosol have also been described. Detection of such dual locations requires independent information, typically from image analysis using fluorescent fusion proteins and ideally also from phenotypical analysis of mutants.

Collection of all available protein localization data from individual functional studies, as well as proteomics studies, is an important tool in the conclusive assignment of proteins to the plastid. For instance, it helps to recognize abundant proteins often identified in dozens of proteomics papers as potential contaminants. The SUBA database (http://suba.plantenergy.uwa.edu.au/) collects the information that is available for Arabidopsis about the localization of a certain protein (e.g. MS/MS data, GFP localization, and prediction tools) and allows assembling lists of organellar proteins with self-defined reliability criteria. The PPDB accumulates similar information for Arabidopsis as well as maize and combines it with in-house MS/MS-based quantitative information on total leaf extracts and isolated plastid fractions (stored in PPDB) to manually evaluate this information and make a manual assignment for subcellular localization. This manual curation step, using a conservative threshold (i.e. no call is made unless there is deemed sufficient evidence), has proven to result in high-confidence localization calls, as judged by comparisons with subsequent independent experimental localization studies by GFP fusions and image analysis.

Another way to help complete the plastid proteome inventory is to analyze plastid types specialized for specific tasks in their resident tissue (organ or cell type), because they differ considerably in their protein composition. However, this is challenging for Arabidopsis, since its seeds and flowers are small and it does not develop storage organs. Thus, organelle isolation is often impractical, and proteome analyses are better performed at the level of the entire organ, as illustrated by the analysis of plastids in seeds (Chen et al., 2009). Several groups tried to circumvent this problem by using different plant species (e.g. tobacco [Nicotiana tabacum], bell pepper [Capsicum annuum], spinach [Spinacia oleracea], pea [Pisum sativum], wheat [Triticum aestivum], potato [Solanum tuberosum], tomato [Solanum lycopersicum], or Brassica rapa) for the analysis of amyloplasts, chromoplasts, proplastids, and leucoplasts (Agrawal et al., 2010). However, so far, this has not significantly increased the number of identified plastid proteins, in part due to the lack of complete genome sequence information. Exceptions are rice (Oryza sativa) and maize, because good-quality genome annotation is available for these two organisms, and the coverage of the plastid proteome of maize is now quite comparable to that of Arabidopsis, in part because cell type-specific chloroplasts, specialized for specific functions, were included (Friso et al., 2010). Importantly, this allowed the identification of C4-specific metabolic chloroplast envelope transporters and also helped identify many new subunits of the elusive thylakoid NADPH dehydrogenase complex involved in cyclic electron flow (Bräutigam et al., 2008; Majeran et al., 2008).

The spatial distribution of proteins within chloroplasts has been the target of several proteome analyses, originally starting with the thylakoid lumen and peripheral soluble thylakoid proteins (Peltier et al., 2000), followed by systematic analyses of the thylakoid and envelope membrane proteomes, the soluble stroma proteome, specialized thylakoid-associated lipoprotein particles designated plastoglobules, and proteins associated with the plastid chromosome (Baginsky, 2009; Agrawal et al., 2010). A recent study separated the Arabidopsis chloroplast proteome into soluble proteins and thylakoid and envelope membrane proteins (Ferro et al., 2010).
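The two-step filtering strategy described earlier in this section (observation frequency across independent preparations, followed by a comparison of relative abundance across sample types) could be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used by PPDB, SUBA, or LOPIT: the function names, the protein identifiers, the abundance values, and the threshold of two observations are all hypothetical.

```python
from collections import Counter


def frequency_filter(preparations, min_observations=2):
    """Step 1: keep proteins observed in at least `min_observations`
    independent plastid preparations (each preparation is a set of IDs)."""
    counts = Counter(protein for prep in preparations for protein in set(prep))
    return {protein for protein, n in counts.items() if n >= min_observations}


def enrichment_filter(candidates, plastid_abundance, other_samples):
    """Step 2: drop candidates whose relative abundance is higher in any
    other sample type (e.g. total leaf or another organelle) than in plastids."""
    kept = set()
    for protein in candidates:
        in_plastid = plastid_abundance.get(protein, 0.0)
        highest_elsewhere = max(
            (sample.get(protein, 0.0) for sample in other_samples.values()),
            default=0.0,
        )
        if in_plastid > 0.0 and in_plastid >= highest_elsewhere:
            kept.add(protein)
    return kept


if __name__ == "__main__":
    # Hypothetical identifications from three independent plastid preparations.
    preparations = [
        {"plastid_A", "plastid_B", "cytosolic_X"},
        {"plastid_A", "cytosolic_X", "random_Y"},
        {"plastid_A", "plastid_B", "cytosolic_X"},
    ]
    # Hypothetical relative abundances (fraction of total signal per sample).
    plastid = {"plastid_A": 0.020, "plastid_B": 0.004, "cytosolic_X": 0.001}
    others = {
        "total_leaf": {"plastid_A": 0.008, "plastid_B": 0.001, "cytosolic_X": 0.010},
        "mitochondria": {"cytosolic_X": 0.002},
    }
    frequent = frequency_filter(preparations)
    print(sorted(frequent))
    print(sorted(enrichment_filter(frequent, plastid, others)))
```

In this toy data set, the abundant contaminant `cytosolic_X` passes the frequency filter because it is seen in every preparation, and it is removed only by the second, abundance-based step. This mirrors the point made above that the frequency filter alone does not remove systematic false positives.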
In the study by Ferro et al. (2010), protein localization to each subcompartment was based on the abundance distribution of identified proteins in different purified fractions. Information about the protein composition of the chloroplast subcompartments is available in PPDB and AT_CHLORO (http://www.grenoble.prabi.fr/at_chloro/). Because of space constraints in this update, we refer the reader to the most recent and comprehensive review with extensive literature citations (Agrawal et al., 2010) instead of discussing the original literature in this report.

Discovery and Significance of Gene Models

Many genes have more than one annotated gene model; in some cases the different models only affect untranslated 5′ and 3′ ends, whereas in others they affect the actual translated region. Multiple models arise from different transcription start sites or from alternative splicing (AS). AS has received considerable attention at the transcript level, in particular since new generation sequencing techniques now allow for large-scale detection of AS. At least 20% of plant genes have one or more alternative transcript isoforms. The majority of these AS events have not been functionally characterized, but evidence suggests that AS participates in important plant functions, including stress response, and may impact domestication and trait selection. Alternative transcription start sites or AS can result in proteins with different N or C termini or internal protein regions, potentially affecting subcellular localization and functions. Indeed, one of the mechanisms for dual targeting is that two different proteins that differ in their N termini are generated from a single gene (Peeters and Small, 2001). Matching MS data to these different gene models can help to identify the most relevant predicted protein forms. In the PPDB (for Arabidopsis and maize) and AtProteome (http://www.pep2pro.ethz.ch), peptide identification data are projected on each gene model, allowing evaluation of the most relevant models. However, a systematic analysis of the consequences of AS at the plant proteome level has not been carried out; this is not surprising, given the challenges associated with obtaining the nearly complete sequence coverage (i.e. the percentage of primary amino acid sequence for which peptides are detected) that is required to distinguish different gene models.

In the case of MS-based quantitative proteomics, decisions have to be made on how to handle protein models. For instance, one model may have more matched peptides than another model. One solution is to select only the information for the highest scoring model or, alternatively, to collect and sum all matched peptides for all protein models of a gene. In practice, this may not affect most quantifications, but it is important to systematically implement a chosen procedure. The van Wijk laboratory consistently selected the higher scoring protein model (calculated across all samples for the specific analysis), and if there was no difference in protein score between models, the model with the lowest digit was selected (Friso et al., 2010).
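The two ways of handling protein models described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that model identifiers have the form `GENE.N`, where `N` is the model number (interpreted here as the "digit" used for tie-breaking), and that each spectral count has been assigned to exactly one model; the identifiers, scores, counts, and function names are hypothetical and are not taken from the cited studies.

```python
from collections import defaultdict


def gene_of(model_id):
    """Split an identifier such as 'GENE1.2' into ('GENE1', 2)."""
    gene, _, model_number = model_id.rpartition(".")
    return gene, int(model_number)


def select_best_model(scores):
    """Per gene, keep the highest-scoring model; on a tie in score,
    keep the model with the lowest model number."""
    best = {}
    for model_id, score in scores.items():
        gene, number = gene_of(model_id)
        key = (-score, number)  # smaller key = higher score, then lower number
        if gene not in best or key < best[gene][0]:
            best[gene] = (key, model_id)
    return {gene: model_id for gene, (_, model_id) in best.items()}


def sum_counts_per_gene(spectral_counts):
    """Alternative: sum spectral counts over all models of a gene and
    drop the model information."""
    totals = defaultdict(int)
    for model_id, count in spectral_counts.items():
        gene, _ = gene_of(model_id)
        totals[gene] += count
    return dict(totals)


if __name__ == "__main__":
    scores = {"GENE1.1": 250.0, "GENE1.2": 310.0, "GENE2.1": 90.0, "GENE2.2": 90.0}
    counts = {"GENE1.1": 4, "GENE1.2": 7, "GENE2.1": 3}
    print(select_best_model(scores))
    print(sum_counts_per_gene(counts))
```

Whichever rule is chosen, encoding it as a single function applied to every data set is one simple way to keep the procedure systematic and transparent.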
Other laboratories sum all spectral counts for a gene and remove the model information (Baerenfaller et al., 2008). Either method has its merits, and it is important that the applied procedure be transparent.

Protein Abundance within the Plastid

The range of protein accumulation levels in plant organs and within the plastid likely spans up to approximately 10 orders of magnitude. Using one-dimensional gel separation, followed by in-gel digestion and the latest generation of tandem mass spectrometers for untargeted (“shotgun”) analysis with data-dependent acquisition, proteins are typically identified within an abundance range of 5 to maximally 6 orders of magnitude. Mapping plastid protein abundance is important for understanding the composition of protein complexes, the functionalities of plastid membranes and plastid particles such as plastoglobules or nucleoids, as well as plastid metabolism and metabolic flux. In addition, as discussed in the previous section, relative protein abundance measurements are also an important tool to evaluate if proteins are indeed plastid localized.

When discussing protein quantification, we must distinguish between (1) measuring protein mass or protein concentration within a sample and (2) comparing relative protein concentrations (or mass) of the same protein between different samples. The latter case is often referred to as measuring differential protein expression or “functional proteomics” [e.g. when studying the effect of (a)biotic stress, developmental processes, or mutants]. Most (plant) protein quantification studies relate to differential expression (functional proteomics). In this section, we will discuss the first case, whereas the second case is briefly discussed below (see Employing the Plastid Proteome Atlas for Functional Analysis).

The two strategies that have so far been employed to map protein abundance within the plastid are (1) image analysis of stained two-dimensional gels and (2) MS-based quantification using spectral counting. Quantification using two-dimensional gel electrophoresis with isoelectric focusing (IEF) as the first dimension was used in most gel-based studies (e.g. for the thylakoid lumen [Schubert et al., 2002] or soluble proteins in rice etioplasts [Kleffmann et al., 2007]); however, in most other studies, this was applied to “functional proteomics.” Two-dimensional gel electrophoresis with native gel electrophoresis as the first dimension was used to determine a quantitative map of soluble chloroplast proteins and their oligomeric states in the stroma of Arabidopsis (Peltier et al., 2006). In a subsequent study, Arabidopsis stromal proteins were quantified using MS-based spectral counting (Zybailov et al., 2008). Both complementary procedures were also carried out for chloroplast membranes and stromal fractions of isolated bundle sheath and mesophyll cells of maize leaves (Majeran et al., 2008).
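As a rough illustration of how spectral counts can be turned into an estimate of relative abundance within a single sample (case 1 above), the following Python sketch divides each protein's spectral count by its length and expresses the result as a fraction of the sample total. This length-based normalization is one common choice and is assumed here for simplicity; it is not necessarily the normalization used in the cited studies, and the protein names, counts, and lengths are hypothetical.

```python
def relative_abundance(spectral_counts, lengths):
    """Length-normalized spectral counts, scaled to sum to 1 within a sample.

    spectral_counts: protein -> number of matched MS/MS spectra
    lengths:         protein -> protein length in amino acids
    """
    per_length = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(per_length.values())
    return {p: value / total for p, value in per_length.items()}


if __name__ == "__main__":
    # Hypothetical proteins: A and B have equal counts, but B is half as long,
    # so B is estimated to be twice as abundant on a molar basis.
    counts = {"A": 100, "B": 100, "C": 50}
    lengths = {"A": 500, "B": 250, "C": 250}
    for protein, fraction in relative_abundance(counts, lengths).items():
        print(protein, round(fraction, 3))
```

Because longer proteins yield more peptides, raw spectral counts favor large proteins; dividing by length (or by the number of predicted observable peptides) is a simple way to correct for this when comparing different proteins within one sample.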
The advantage of IEF-based two-dimensional gels lies mostly in the higher resolution of IEF compared with native gels; however, IEF gels systematically lead to (often strong) underestimation of higher molecular mass proteins and hydrophobic proteins, whereas proteins with extreme pI (less than 4 or greater than 10) are harder to resolve. For the mapping of absolute protein abundances, including membrane proteins, colorless native or blue native gels are thus the better alternatives. Directly comparing image- and MS-based methodologies showed that image-based quantification is very limited in the number of proteins that can be accurately quantified, because protein spots need to be fully separated from other spots to avoid quantifying protein mixtures. Furthermore, the quantification is significantly affected by the amino acid composition, because current dyes bind in particular to basic residues, leading to overestimation or underestimation of proteins, depending on the amino acid composition. MS-based quantification allows for the quantification of a much larger number of proteins, typically resulting in a higher dynamic range. However, highly abundant proteins (e.g. the approximately 10–20 most abundant proteins in a sample) are often underestimated because of the necessary use of data-dependent acquisition (for numbers, see Zybailov et al., 2008), whereas proteins quantified with low numbers of MS/MS spectra can show quite large sample-to-sample variation. In general, proteins are most accurately quantified if multiple unique peptides are detected, each in high numbers. Conversion of protein mass quantification (either by image analysis- or MS-based quantification) to protein concentration requires normalization by either the number of predicted tryptic peptides within the relevant mass window (in the case of MS-based quantification) or by protein length or mass (for both image- and MS-based quantification). Despite the advantages of MS-based quantification described above, two-dimensional PAGE still has a place in quantitative proteomics, in particular for the analysis of protein complexes and because it provides an immediate visible overview of the proteome.

The “gold standard” for protein abundance measurements is to spike the sample with isotope-labeled proteins or proteotypic peptides, referred to as “isotope dilution” (Brun et al., 2009). These peptides can be generated by in vitro synthesis or by expression as a concatamer of proteotypic peptides after construction of a synthetic gene, QconCAT. Both methods require significant investments and typically are applied to smaller numbers of proteins; therefore, these techniques are currently not practical for the quantification of hundreds of proteins and have so far been applied only to targeted analysis of selected plastid pathways (Wienkoop et al., 2010). However, efforts are under way to establish QconCAT to determine the stoichiometry of the Clp protease complex and for the quantification of specific plastid (plant) metabolic pathways or plastid processes.

Protein-Protein Interactions, Protein-Nucleotide Assemblies, and Oligomeric State

To carry out their metabolic, structural, or signaling functions, many plastid proteins form transient or stable interactions with other proteins. Few undirected systematic protein interaction studies have been carried out for soluble stromal complexes, either by native gel electrophoresis (below 800 kD) or by chromatography (above 800 kD; Olinares et al., 2010); these two complementary studies provide an overview of the oligomeric state of more than 1,000 proteins. In particular, protein assemblies larger than 800 kD are dominated by functions in plastid gene expression, including nucleoids, mRNA metabolism, and ribosomes. The interaction of plastid proteins with DNA or RNA constitutes a regulatory network of gene expression. The largest structures, of several megadaltons, are nucleoids, also known as transcriptionally active chromosomes, which contain several copies of plastid DNA and dozens of DNA- and RNA-binding proteins, including proteins likely regulating nucleoid activities through reduction/oxidation or phosphorylation (Pfalz et al., 2006). Envelope membrane-protein complexes are dominated by the translocon complexes at the inner and outer envelope membranes. These import complexes are functionally relatively well characterized by a variety of techniques, including blue-native gels (Kikuchi et al., 2009, and refs. therein). The abundant photosynthetic protein complexes in the thylakoid membrane have been a target for biochemical research for several decades and are now well characterized through a number of methodologies. Most proteins in these complexes have been identified and characterized by MS, and for some of them, PTMs have been determined by intact protein MS (Whitelegge, 2004). More detailed protein-protein interaction studies, using either coimmunoprecipitation or affinity purification from transgenic plants that express tagged proteins, are needed to better characterize the plastid interactome. This will help in particular to better understand the regulation of metabolism and plastid gene expression and to build reliable protein interaction networks to complement the plastid proteome atlas.

Reversible and Irreversible PTMs

Most proteins undergo reversible and sometimes irreversible modifications. Large-scale analysis of PTMs, using a high-resolution, high-accuracy LTQ-Orbitrap mass spectrometer, was carried out for chloroplast membranes and stroma as well as total leaf extracts, and the frequencies of many PTMs were calculated (Zybailov et al., 2009). This analysis provides a framework for search parameters and the use of retention times for improved assignment of PTMs in large-scale proteomics and helps in distinguishing artificial modifications from those with a biological relevance. For nucleus-encoded plastid proteins, the most typical irreversible in vivo modification is proteolytic cleavage of the N-terminal transit peptide, the cTP. In the case of most plastid-encoded proteins, typically the N-terminal Met is removed by methionine aminopeptidases, which have been identified in plastids. Another frequent N-terminal modification that occurs after the removal of N-terminal targeting information is N-terminal acetylation. Because N-terminal acetylation requires in situ enzyme activity, it provides a reliable determination of the N terminus and thus valuable information about the processing site for transit peptides of imported chloroplast proteins. Thus, N-terminal acetylation allows mapping the in vivo N termini of plastid and cytosolic proteins. Kleffmann et al. (2007) established the in vivo N terminus for a small set of proteins from rice etioplasts and found that there is good agreement between the detected N-terminal peptide and the predicted processing peptidase cleavage site. Similarly, Zybailov et al. (2008) identified a larger set of N-terminal acetylated proteins in Arabidopsis chloroplasts and provided additional context information for the processing protease cleavage site, also indicating that the predicted cleavage site is one residue off from the actual cleavage site. Improvements for cleavage site prediction should be possible based on the now available larger training set.

PTMs often determine enzymatic activities and rapidly adjust enzyme activity to the requirements of cellular metabolism; protein abundance likely corresponds to maximal (theoretical) activity but is not always a good indicator for in vivo enzyme activity and its net contribution to cell metabolism. It is well established that reversible phosphorylation and reduction/oxidation (e.g. through the action of different types of plastid thioredoxins) are key regulators of plastid metabolism as well as plastid gene expression (Dietz and Pfannschmidt, 2010). Several proteomics studies identified thioredoxin targets by affinity chromatography, whereas other redox proteomics approaches used diagonal electrophoresis under reducing and oxidizing conditions to identify proteins under redox control in vivo (Dietz and Pfannschmidt, 2010). These analyses demonstrated that many chloroplast functions are regulated by thioredoxin-mediated disulfide/dithiol exchange or by currently unknown redox modulators. Among these functions are isoprenoid and tetrapyrrole biosynthesis, starch biosynthesis and degradation, gene expression, protein folding and degradation, and vitamin biosynthesis. Redox targets in the thylakoid lumen were identified, and inhibition of the activity of the xanthophyll cycle enzyme violaxanthin deepoxidase by reduction (i.e. dithiol generation) was established (Dietz and Pfannschmidt, 2010).

Over the last few years, two thylakoid-associated kinases (STN7 and STN8) as well as a thylakoid-associated phosphatase (TAP38/PPH1) have been identified, and their functions were investigated by functional analysis of Arabidopsis mutants (Lemeille and Rochaix, 2010).
The reversible phosphorylation system at the thylakoid membrane regulates photosynthetic state transitions to optimize light absorption as well as long-term light adaptation. A total of 175 phosphorylated chloroplast proteins were identified, with 80% Ser and 20% Thr phosphorylation but no Tyr phosphorylation. One of the thylakoid kinases, STN7, was found to be an abundant phosphoprotein in vivo, suggesting the existence of kinase cascades in the chloroplast. Information about the exact site of phosphorylation was used to extract kinase motifs that are useful footprints for kinase activity in vivo (Reiland et al., 2009). Cumulative evidence for plant proteome phosphorylation is collected in various databases, such as the PhosPhAt database for Arabidopsis (http://phosphat.mpimp-golm.mpg.de/).

Subcellular Localization Predictions and Network Information

The distribution of cellular functions to distinct cell organelles is an important organization principle that needs to be understood to model metabolic and protein interaction networks and to make predictions at the systems scale. Thus, analyses of the protein composition of cell organelles were reported for virtually all plant cell organelles or membranes (Baginsky, 2009; Agrawal et al., 2010). At present, plant modeling and systems analysis approaches with subcellular organelles suffer from incomplete proteome identification and annotation. More complete organelle inventories will strengthen modeling efforts, and higher network consistencies should be obtained. In order to make a contribution to model quality, however, protein localization data should have low false positive rates (e.g. below 1%). Therefore, conservative assignment of protein subcellular localization in papers and public databases is better than overassignment of proteins, particularly since it is not really possible to associate a P value with a subcellular localization assignment based on experimental data. Thus, the community’s goal should be a plastid proteome atlas with high sensitivity and a very low false positive rate.

In addition to the experimental organelle proteome analysis, subcellular localization prediction is a possible source of information for “missing” plastid proteins, even if suboptimal. The generation of software routines to predict subcellular protein localization for plants, other eukaryotes, as well as prokaryotes has been in progress for well over a decade, in particular inspired by the increasing number of protein inventories for different subcellular localizations. These inventories provide essential training and test sets. Whereas the prediction of N-terminal signal peptides for signal recognition particle-dependent targeting to the endoplasmic reticulum is rather accurate and sensitive, prediction of plastid localization is much less satisfactory and still attracts considerable attention.
A consensus prediction combining several predictors using a naive Bayes method was suggested to improve both sensitivity and specificity for plastid and mitochondrial proteins (Schwacke et al., 2007). In the last 2 to 3 years, several new localization predictors (e.g. AtSubP, Subchlo, RSLpred, MultiP, Plant-mPLoc) were published for plants, mostly focusing on Arabidopsis. While each predictor may have advantages over the others, it is not clear that their prediction has a better true positive discovery rate for plastid proteins (i.e. a higher sensitivity) at a lower false positive discovery rate (i.e. a better specificity) than the most popular predictor, TargetP (http://www.cbs.dtu.dk/services/TargetP/). TargetP is still the most commonly used predictor for plastid as well as plant mitochondrial localization; it predicts not only localization but also the cTP and mTP cleavage sites. There is still some controversy regarding the true positive prediction rate of TargetP, which was found to differ between experimental data sets. While plastid proteome studies from the van Wijk laboratory and others reported true positive prediction rates in the range of 85%, consistent with the benchmark tests obtained during TargetP training, other groups found much lower prediction rates on their plastid protein set (Armbruster et al., 2011). Higher TargetP true positive rates (sensitivity) are usually observed when proteins that were not repeatedly detected in plastid preparations were eliminated, while also applying conservative thresholds for protein identification (see below). Importantly, sets of detected low-abundance Arabidopsis proteins (several orders of magnitude lower than Rubisco large subunit; e.g. those involved in RNA metabolism) have similar true positive prediction rates as high-abundance proteins (Olinares et al., 2010). However, proteins located in the outer plastid envelope membrane or those reversibly associated with the outer envelope should be excluded from such prediction analysis because they do not possess a cleavable N-terminal plastid-targeting sequence. The main shortcoming of TargetP is the high false positive rate (low accuracy), likely around 35%, leading to an overprediction for plastid proteins. The current sensitivity and accuracy of TargetP are clearly not perfect, and the much larger sets of established subcellular proteomes for Arabidopsis (and to a lesser degree also maize and rice) should be useful to improve the performance of plastid localization predictors. In addition, it is quite likely that a subset of nucleus-encoded plastid proteins have atypical targeting information. For instance, it has been shown for a few plastid proteins that they are targeted to the plastid via the endoplasmic reticulum and that the N terminus of these precursor proteins contains a secretory signal peptide, followed by a cTP (Villarejo et al., 2005). However, scanning for signal peptides of approximately 1,000 established plastid proteins in Arabidopsis suggested that probably very few proteins take this route (Zybailov et al., 2008).
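The rates discussed here can be made concrete with a small calculation. The following Python sketch is illustrative only: the accession names and set sizes are invented toy values (chosen so that the sensitivity comes out at 85%), not data from any of the cited studies, and the function name is hypothetical. It also reports two different error measures, because a "false positive rate" can be expressed either relative to all non-plastid reference proteins or relative to all proteins predicted as plastid-localized; which of these a given study reports is an assumption that should be checked against that study.

```python
def localization_metrics(predicted, known_plastid, known_other):
    """Score a localization predictor against two curated reference sets.

    predicted      -- accessions the predictor calls plastid-localized
    known_plastid  -- accessions with experimentally supported plastid location
    known_other    -- accessions with experimentally supported non-plastid location
    """
    predicted = set(predicted)
    tp = len(predicted & known_plastid)   # plastid proteins correctly predicted
    fn = len(known_plastid - predicted)   # plastid proteins the predictor missed
    fp = len(predicted & known_other)     # non-plastid proteins predicted as plastid
    tn = len(known_other - predicted)     # non-plastid proteins correctly rejected

    def ratio(numerator, denominator):
        # Avoid a ZeroDivisionError when a reference set is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),          # true positive rate
        "false_positive_rate": ratio(fp, fp + tn),  # relative to non-plastid references
        "false_discovery": ratio(fp, tp + fp),      # relative to all plastid predictions
    }


# Toy reference sets with hypothetical accessions (not real gene identifiers).
known_plastid = {f"P{i:02d}" for i in range(20)}
known_other = {f"N{i:02d}" for i in range(30)}

# Toy predictor output: 17 of the 20 plastid proteins plus 9 non-plastid proteins.
predicted = {f"P{i:02d}" for i in range(17)} | {f"N{i:02d}" for i in range(9)}

metrics = localization_metrics(predicted, known_plastid, known_other)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because outer envelope proteins lack a cleavable transit peptide, one would remove them from `known_plastid` before scoring a transit peptide-based predictor, in line with the exclusion recommended above.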
It is possible that there is yet another pathway (or recognition system) for protein translocation across the envelope that accounts for the imperfect true positive rate; the recent finding of an envelope-localized SEC system may be relevant here (Skalitzky et al., 2011). Finally, it may be optimal to develop and test localization software for specific species, plant families, or even clades. For instance, monocotyledons such as rice, sorghum (Sorghum bicolor), and maize may have systematically different protein targeting information as compared with dicotyledons such as Arabidopsis, tobacco, pea, and spinach. Indeed, systematic analyses of established rice plastid proteins as well as rice orthologs for Arabidopsis chloroplast proteins showed that Ala instead of Ser or Thr is overrepresented in the cTP (Kleffmann et al., 2007; Zybailov et al., 2008).

With detailed information about the enzymatic inventory of organelles, their specific contribution to metabolism and signaling is also accessible to large-scale modeling approaches. Genome-scale metabolic networks for the C3 and C4 plants Arabidopsis and maize, respectively, as well as the green alga Chlamydomonas reinhardtii were constructed that take into account compartmentalization and allow assessment of the specific contribution of cell organelles to metabolism (Dal’Molin et al., 2010). Large-scale protein-protein interaction networks also benefit significantly from knowledge about the colocalization of proteins in the same organelle. This information decreases false discovery rates in large-scale interaction data sets for Arabidopsis, thereby increasing the reliability of predicted interaction networks. Progress has been made in the assembly of plant organellar phosphorylation networks and, for chloroplasts, in particular for the (de)phosphorylation-driven movement of light-harvesting complexes in the thylakoid membrane (termed state transitions; Lemeille and Rochaix, 2010). Studies in nonplant species have shown that, using phosphoproteomics information, it is possible to infer in vivo kinase activities from phosphorylation motifs to provide information about kinase/substrate relationships and, together with localization information, construct in vivo phosphorylation networks. Thus, protein inventories of cell organelles are important constraints in constructing signal transduction networks. Last but not least, publicly available and reliable protein subcellular localization will be helpful and cost effective in the functional analysis of genes and proteins, as it removes the need to determine the localization of each protein anew.
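The use of colocalization as a constraint on interaction data can be illustrated with a minimal Python sketch. Everything in it is assumed for illustration: the protein names, the localization annotations, and the function name are hypothetical, and real studies may weight or score localization evidence rather than apply a hard filter as done here.

```python
def filter_by_colocalization(candidate_pairs, localization):
    """Split candidate interaction pairs by whether both partners share a compartment.

    candidate_pairs -- iterable of (protein_a, protein_b) tuples
    localization    -- dict mapping protein -> set of compartment names
    Returns (kept, dropped) as two lists of pairs.
    """
    kept, dropped = [], []
    for a, b in candidate_pairs:
        # Proteins without an annotation get an empty set and never match.
        shared = localization.get(a, set()) & localization.get(b, set())
        if shared:
            kept.append((a, b))
        else:
            dropped.append((a, b))
    return kept, dropped


# Hypothetical localization annotations (a protein may have several).
localization = {
    "protA": {"plastid"},
    "protB": {"plastid", "mitochondrion"},
    "protC": {"nucleus"},
}

# Hypothetical candidate interactions from a large-scale screen.
candidate_pairs = [("protA", "protB"), ("protA", "protC"), ("protB", "protD")]

kept, dropped = filter_by_colocalization(candidate_pairs, localization)
print("kept:", kept)
print("dropped:", dropped)
```

In this sketch, pairs involving a protein of unknown localization are dropped. That is a conservative design choice; an alternative would be to keep such pairs in a separate "unresolved" list until localization data become available.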

EMPLOYING THE PLASTID PROTEOME ATLAS FOR FUNCTIONAL ANALYSIS AND SYSTEMS BIOLOGY

Even if the plastid protein atlas is not complete, it does provide a rich source of information and a great tool for detailed functional studies. Table I lists the available proteomics resources with relevance to plastid biology, and Box 1 provides a number of example questions that can be addressed with the available tools. Now that the subcellular localization of many proteins is known, it is possible to analyze the qualitative and quantitative effects of mutations of specific organelles without actually purifying these organelles. For instance, quantitative comparative proteome analysis of chloroplasts from wild-type and different chloroplast Clp protease mutants was done using MS-based quantification of total Arabidopsis leaf extracts without actually isolating chloroplasts (Kim et al., 2009). The advantages of characterizing quantitative effects on the chloroplast proteome through analysis of total leaf extracts, rather than through analysis of isolated chloroplasts, are that (1) mutants with strong growth defects can be analyzed, because isolation of chloroplasts from such mutants can be very hard or even practically impossible; and (2) more accurate results are obtained for chloroplast mutants with heterogeneity in their leaf phenotype (often with strongest phenotypes in the youngest leaves), because isolation of chloroplasts from such leaves could result in selection of a subset of chloroplast phenotypes not representing the overall chloroplast population. Furthermore, such subcellular proteome information for maize allowed others to help resolve the kinetics of organelle biogenesis, the formation of cellular structures, and metabolism during maize leaf development and C4 cellular differentiation (Majeran et al., 2010). The current generation of mass spectrometers has sufficient sensitivity and throughput to detect and quantify a high number of chloroplast proteins even in complex mixtures. Furthermore, such a “total leaf” approach can be helpful for analyses of dynamic PTMs that preclude lengthy organelle isolation procedures (Reiland et al., 2009), in particular if no inhibitors can be applied to prevent change in such PTMs. With a plastid protein atlas for Arabidopsis and maize at hand, it can be expected that large-scale comparisons of chloroplast proteomes, their PTMs, and interaction networks under different conditions and in different genetic backgrounds or developmental states will provide novel insights into plastid biology.

DEPOSITION OF PROTEOMICS AND MS INFORMATION IN PUBLIC REPOSITORIES

Most published plastid proteomics studies of Arabidopsis provide tables containing lists of the identified proteins using standardized, nonredundant accession numbers provided through The Arabidopsis Information Resource (TAIR). For other plant species, this is more varied, either because there is no sequenced genome or significant EST collection available or because the databases searched, such as those of the National Center for Biotechnology Information, contain redundant sets of accessions (e.g. older and newer versions of genes); this can complicate the incorporation of such data sets by other laboratories. However, submission of the underlying mass spectra with associated metadata to public repositories such as the Proteomics Identifications Database (PRIDE; http://www.ebi.ac.uk/pride) will allow other laboratories to make use of these studies. Even for Arabidopsis and other new model (crop) species such as maize and rice, it is important that the mass spectral data be deposited, for instance to help improve search engines, improve genome annotation, or allow for comparative analysis by other laboratories. Indeed, several journals (e.g. Molecular and Cellular Proteomics and Nature Biotechnology) now require the submission of mass spectral data to such public repositories, as is customary for microarray or RNAseq data sets. Further, more detailed descriptions of experimental conditions and acquisition parameters are outlined in the Minimum Information About a Proteomics Experiment guidelines, which are enforced by several journals. We strongly support following these standards and the deposition of mass spectral data (e.g. converted MGF files) into PRIDE or other repositories.
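As an illustration of what such a converted peak list contains, the Python sketch below reads spectra from text in Mascot Generic Format (MGF), in which each spectrum is enclosed by `BEGIN IONS` and `END IONS` lines, carries `KEY=value` parameters such as `TITLE`, `PEPMASS`, and `CHARGE`, and lists one m/z and intensity pair per line. The function name and the toy spectrum are invented for this example, and the reader is deliberately minimal: it ignores parameters outside spectrum blocks and is not a substitute for an established, validated parser.

```python
def read_mgf(lines):
    """Parse MGF-formatted lines into a list of spectra.

    Each spectrum is a dict with "params" (KEY -> string value)
    and "peaks" (list of (m/z, intensity) float tuples).
    """
    spectra = []
    current = None  # spectrum being read, or None when outside a block
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: m/z followed by an optional intensity
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((mz, intensity))
    return spectra


# Toy spectrum with invented values, for illustration only.
example_lines = [
    "BEGIN IONS",
    "TITLE=toy spectrum 1",
    "PEPMASS=500.25 1200.0",
    "CHARGE=2+",
    "100.1 10",
    "200.2 20.5",
    "END IONS",
]

spectra = read_mgf(example_lines)
for spectrum in spectra:
    print(spectrum["params"]["TITLE"], len(spectrum["peaks"]), "peaks")
```

A quick check of this kind (number of spectra, presence of precursor mass and charge) can be useful before deposition, but the repository's own validation tools remain the authoritative test of a submission.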

CONCLUSION

Proteomics of chloroplasts and other plastid types has provided extensive protein inventories as well as information about PTMs, protein abundances, and protein interactions. Proteomics and MS technologies feeding into plastid proteome information now allow system-level analysis of chloroplast biology, including chloroplast development, signaling, and interaction networks. For the reasons detailed above, we consider a high-quality plastid proteome atlas a milestone in the quest for biologically meaningful systems biology approaches. Together with parallel efforts for other organelles (e.g. mitochondria and peroxisomes), this will help to drive a better understanding of plant growth and development and help to realize the potential of plant systems biology.

ACKNOWLEDGMENTS

We thank the members of our laboratories for discussions and feedback on the manuscript. Furthermore, we sincerely apologize to all colleagues whose work could not be cited because of space constraints.

Received January 20, 2011; accepted February 21, 2011; published February 24, 2011.

LITERATURE CITED

Agrawal GK, Bourguignon J, Rolland N, Ephritikhine G, Ferro M, Jaquinod M, Alexiou KG, Chardot T, Chakraborty N, Jolivet P, et al (October 29, 2010) Plant organelle proteomics: collaborating for optimal cell function. Mass Spectrom Rev http://dx.doi.org/10.1002/mas.20301
Armbruster U, Pesaresi P, Pribil M, Hertle A, Leister D (2011) Update on chloroplast research: new tools, new topics, and new trends. Mol Plant 4: 1–16
Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941
Baginsky S (2009) Plant proteomics: concepts, applications, and novel strategies for data interpretation. Mass Spectrom Rev 28: 93–120
Bräutigam A, Hoffmann-Benning S, Weber AP (2008) Comparative proteomics of chloroplast envelopes from C3 and C4 plants reveals specific adaptations of the plastid envelope to C4 photosynthesis and candidate proteins required for maintaining C4 metabolite fluxes. Plant Physiol 148: 568–579
Brun V, Masselon C, Garin J, Dupuis A (2009) Isotope dilution strategies for absolute quantitative proteomics. J Proteomics 72: 740–749
Carrie C, Giraud E, Whelan J (2009) Protein transport in organelles: dual targeting of proteins to mitochondria and chloroplasts. FEBS J 276: 1187–1195
Chen M, Mooney BP, Hajduch M, Joshi T, Zhou M, Xu D, Thelen JJ (2009) System analysis of an Arabidopsis mutant altered in de novo fatty acid synthesis reveals diverse changes in seed composition and metabolism. Plant Physiol 150: 27–41
Dal’Molin CG, Quek LE, Palfreyman RW, Brumbley SM, Nielsen LK (2010) C4GEM, a genome-scale metabolic model to study C4 plant metabolism. Plant Physiol 154: 1871–1885
Dietz KJ, Pfannschmidt T (December 30, 2010) Novel regulators in photosynthetic redox control of plant metabolism and gene expression. Plant Physiol http://dx.doi.org/10.1104/pp.110.170043
Dunkley TP, Hester S, Shadforth IP, Runions J, Weimar T, Hanton SL, Griffin JL, Bessant C, Brandizzi F, Hawes C, et al (2006) Mapping the Arabidopsis organelle proteome. Proc Natl Acad Sci USA 103: 6518–6523
Ferro M, Brugière S, Salvi D, Seigneurin-Berny D, Court M, Moyet L, Ramus C, Miras S, Mellal M, Le Gall S, et al (2010) AT_CHLORO, a comprehensive chloroplast proteome database with subplastidial localization and curated information on envelope proteins. Mol Cell Proteomics 9: 1063–1084
Friso G, Majeran W, Huang M, Sun Q, van Wijk KJ (2010) Reconstruction of metabolic pathways, protein expression, and homeostasis machineries across maize bundle sheath and mesophyll chloroplasts: large-scale quantitative proteomics using the first maize genome assembly. Plant Physiol 152: 1219–1250
Gstaiger M, Aebersold R (2009) Applying mass spectrometry-based proteomics to genetics, genomics and network biology. Nat Rev Genet 10: 617–627
Kikuchi S, Oishi M, Hirabayashi Y, Lee DW, Hwang I, Nakai M (2009) A 1-megadalton translocation complex containing Tic20 and Tic21 mediates chloroplast protein import at the inner envelope membrane. Plant Cell 21: 1781–1797
Kim J, Rudella A, Ramirez Rodriguez V, Zybailov B, Olinares PD, van Wijk KJ (2009) Subunits of the plastid ClpPR protease complex have differential contributions to embryogenesis, plastid biogenesis, and plant development in Arabidopsis. Plant Cell 21: 1669–1692
Kleffmann T, von Zychlinski A, Russenberger D, Hirsch-Hoffmann M, Gehrig P, Gruissem W, Baginsky S (2007) Proteome dynamics during plastid differentiation in rice. Plant Physiol 143: 912–923
Lemeille S, Rochaix JD (2010) State transitions at the crossroad of thylakoid signalling pathways. Photosynth Res 106: 33–46
Majeran W, Friso G, Ponnala L, Connolly B, Huang M, Reidel E, Zhang C, Asakura Y, Bhuiyan NH, Sun Q, et al (2010) Structural and metabolic transitions of C4 leaf development and differentiation defined by microscopy and quantitative proteomics in maize. Plant Cell 22: 3509–3542
Majeran W, Zybailov B, Ytterberg AJ, Dunsmore J, Sun Q, van Wijk KJ (2008) Consequences of C4 differentiation for chloroplast membrane proteomes in maize mesophyll and bundle sheath cells. Mol Cell Proteomics 7: 1609–1638
Olinares PD, Ponnala L, van Wijk KJ (2010) Megadalton complexes in the chloroplast stroma of Arabidopsis thaliana characterized by size exclusion chromatography, mass spectrometry, and hierarchical clustering. Mol Cell Proteomics 9: 1594–1615
Peeters N, Small I (2001) Dual targeting to mitochondria and chloroplasts. Biochim Biophys Acta 1541: 54–63
Peltier JB, Cai Y, Sun Q, Zabrouskov V, Giacomelli L, Rudella A, Ytterberg AJ, Rutschow H, van Wijk KJ (2006) The oligomeric stromal proteome of Arabidopsis thaliana chloroplasts. Mol Cell Proteomics 5: 114–133
Peltier JB, Friso G, Kalume DE, Roepstorff P, Nilsson F, Adamska I, van Wijk KJ (2000) Proteomics of the chloroplast: systematic identification and targeting analysis of lumenal and peripheral thylakoid proteins. Plant Cell 12: 319–341
Pfalz J, Liere K, Kandlbinder A, Dietz KJ, Oelmüller R (2006) pTAC2, -6, and -12 are components of the transcriptionally active plastid chromosome that are required for plastid gene expression. Plant Cell 18: 176–197
Reiland S, Messerli G, Baerenfaller K, Gerrits B, Endler A, Grossmann J, Gruissem W, Baginsky S (2009) Large-scale Arabidopsis phosphoproteome profiling reveals novel chloroplast kinase substrates and phosphorylation networks. Plant Physiol 150: 889–903
Schubert M, Petersson UA, Haas BJ, Funk C, Schröder WP, Kieselbach T (2002) Proteome map of the chloroplast lumen of Arabidopsis thaliana. J Biol Chem 277: 8354–8365
Schulze WX, Usadel B (2010) Quantitation in mass-spectrometry-based proteomics. Annu Rev Plant Biol 61: 491–516
Schwacke R, Fischer K, Ketelsen B, Krupinska K, Krause K (2007) Comparative survey of plastid and mitochondrial targeting properties of transcription factors in Arabidopsis and rice. Mol Genet Genomics 277: 631–646
Skalitzky CA, Martin JR, Harwood JH, Beirne JJ, Adamczyk BJ, Heck GR, Cline K, Fernandez DE (2011) Plastids contain a second sec translocase system with essential functions. Plant Physiol 155: 354–369
van Wijk KJ (2000) Proteomics of the chloroplast: experimentation and prediction. Trends Plant Sci 5: 420–425
Villarejo A, Burén S, Larsson S, Déjardin A, Monné M, Rudhe C, Karlsson J, Jansson S, Lerouge P, Rolland N, et al (2005) Evidence for a protein transported through the secretory pathway en route to the higher plant chloroplast. Nat Cell Biol 7: 1224–1231
Walther TC, Mann M (2010) Mass spectrometry-based proteomics in cell biology. J Cell Biol 190: 491–500
Whitelegge JP (2004) Mass spectrometry for high throughput quantitative proteomics in plant research: lessons from thylakoid membranes. Plant Physiol Biochem 42: 919–927
Wienkoop S, Weiss J, May P, Kempa S, Irgang S, Recuenco-Munoz L, Pietzke M, Schwemmer T, Rupprecht J, Egelhofer V, et al (2010) Targeted proteomics for Chlamydomonas reinhardtii combined with rapid subcellular protein fractionation, metabolomics and metabolic flux analyses. Mol Biosyst 6: 1018–1031
Wise RR (2006) The diversity of plastid form and function. In RR Wise, JK Hoober, eds, The Structure and Function of Plastids, Vol 23. Springer, Dordrecht, The Netherlands, pp 3–26
Zybailov B, Rutschow H, Friso G, Rudella A, Emanuelsson O, Sun Q, van Wijk KJ (2008) Sorting signals, N-terminal modifications and abundance of the chloroplast proteome. PLoS ONE 3: e1994
Zybailov B, Sun Q, van Wijk KJ (2009) Workflow for large scale detection and validation of peptide modifications by RPLC-LTQ-Orbitrap: application to the Arabidopsis thaliana leaf proteome and an online modified peptide library. Anal Chem 81: 8015–8024

Plant Physiol. Vol. 155, 2011