Technique review Proteomics approaches to biomarker detection

Ming Zhou is a senior scientist in the Laboratory of Proteomics and Analytical Technologies at SAIC-Frederick Inc., Frederick, MD. Thomas P. Conrads is the Director of the Mass Spectrometry Centre at SAICFrederick Inc. The Mass Spectrometry Center is part of the Laboratory of Proteomics and Analytical Technologies. Timothy D. Veenstra is the Director of the Laboratory of Proteomics and Analytical Technologies, SAICFrederick Inc. His laboratory focuses on the application of state of the art mass spectrometry technology for the discovery of novel diagnostic biomarkers for diseases such as cancer.

Technique review Proteomics approaches to biomarker detection Ming Zhou, Thomas P. Conrads and Timothy D. Veenstra Date received: 31st January 2005

Abstract The development of mass spectrometry (MS) technologies has brought the ability to gather massive amounts of data characterising the proteomes of complex mixtures. A major focus in proteomics is to leverage this data-gathering capability to conduct comparative analyses of biofluids from healthy and disease-affected patients for the identification of highly specific biomarkers and/or the development of MS-based diagnostic platforms. Much effort has gone into optimising the biofluid proteome coverage that can be obtained using these technologies, leaving proteomics poised to make an important impact in disease diagnostics in the future.

INTRODUCTION Keywords: biomarkers, serum, mass spectrometry, proteomics

Timothy D. Veenstra, SAIC-Frederick, Inc., National Cancer Institute at Frederick, P.O. Box B, Bldg. 469, Rm. 160, Frederick, MD 21702, USA Tel: +1 301 846 7286 Fax: +1 301 846 6037 E-mail: [email protected]

The meteoric rise in the focus on proteomics technologies has resulted in an unprecedented ability to identify hundreds, if not thousands, of proteins in complex biological mixtures. While the development of this technology was grounded in the analysis of simple prokaryotes1 and eukaryotes,2 much of the attention has turned to applying the same methods to analyse biofluids such as serum, plasma and urine. For example, the number of proteins that have now been identified in blood rose exponentially to well over 1,000 in 2004. While several different approaches are described, it is not yet clear as to which of these, or any other, method is the best one for biomarker discovery. It is likely that no single method will be universally used in the future to discover biomarkers, and the path chosen may greatly depend on the specific disease or previous knowledge of putative biomarkers.

LARGE-SCALE IDENTIFICATION OF BIOFLUID PROTEOMES Before comparative proteomics studies of biofluids for the purpose of identifying

differentially abundant proteins that may serve as disease-specific biomarkers could proceed, methods had to be developed to allow for the comprehensive coverage of such proteomes. Plasma and serum represent the two most important biofluids in which the search for biomarkers has primarily been focused.3 Both of these samples, however, represent a challenge to proteomic characterisation. On the positive side, they have a very high protein concentration (ie 60–80 mg/ml), so that only a very small aliquot would seemingly be needed for global mass spectrometry (MS)-based analysis. On the negative side, a great majority (ie 99 per cent) of this protein content comprises only 22 proteins. Therefore, direct analysis of serum or plasma by MS results in the repeated identification of proteins such as albumin, transferrin and immunoglobulins (Igs). A reasonable hypothesis would suggest that diseasespecific biomarkers would be present within the 1 per cent of proteins that make up the low-abundance component of plasma or serum. Unfortunately, the dynamic range of protein concentration between the low- and high-abundance proteins is thought to span approximately

HENRY STEWART PUBLICATIONS 1473-9550. B R I E F I N G S I N F U N C T I O N A L G E N O M I C S A N D P R O T E O M I C S . VOL 4. NO 1. 69–75. MAY 2005

69

Zhou, Conrads and Veenstra

ten orders of magnitude, which is much greater than the two orders of magnitude that MS measurements are constrained to. Fortunately, investigators have explored many different fractionation methods prior to MS analysis to allow lowabundance proteins to be identified in serum or plasma analysis. These fractionation methods have included immunodepletion, chromatography and/ or electrophoresis.4–6 Fittingly, because of its historically invaluable role in the development of proteomics, two-dimensional polyacrylamide gel electrophoresis (2DPAGE) fractionation played a major part in one of the first large-scale studies that showed the ability to identify hundreds of proteins within plasma.6 Although 2DPAGE has excellent resolution, the plasma sample still needed to be immunodepleted of the highest-abundant proteins (ie albumin, haptoglobins, transferrins, transthyretin, Æ-1-antitrypsin, Æ-1-acid glycoprotein, hemopexin and Æ-2macroglobulin) and sequentially fractionated by anion-exchange and sizeexclusion chromatography prior to its analysis by 2D-PAGE (Figure 1). This fractionation scheme resulted in a total of 74 2D-PAGE gels being run and the subsequent visualisation of approximately 20,000 protein spots. Identification of many of these spots resulted in the net identification of 350 unique proteins, including interleukin-6, metallothionein II, cathepsins and peptide hormones, known to be present in serum at a concentration of less than 10 ng/ml. Almost concurrently, while the abovedescribed immunodepletion/ chromatography/2D-PAGE plasma analysis was being conducted, other investigators were conducting non-gel (or solution-based) methods to characterise plasma. One of the first studies utilised Igdepletion, multi-dimensional chromatography and MS to characterise the serum proteome.4 Serum was depleted of Igs and then digested with trypsin to create a complex mixture of peptides. These peptides were initially

70

separated into 60 fractions using strong cation-exchange chromatography followed by reverse-phase liquid chromatography (RPLC)-MS/MS analysis of each fraction. Almost 500 proteins were identified using this method, including such species as prostate-specific antigen (PSA), which was believed to be present in the serum used in this study at a concentration of approximately 1 pg/ml. Many other global studies have subsequently been conducted with the aim of characterising serum or plasma proteins, bringing the current state of the art coverage up to approximately 1,500 proteins. Developments in fractionation and MS instrumentation have been primarily responsible for this increase in the number of proteins that can be characterised within biofluids such as serum and plasma.

INVESTIGATION OF SUBBIOFLUID PROTEOMES Although they comprise proteins and other biological molecules, biofluids are not typically thought of in the same context as cells or tissues. Cells and tissues are thought of as packages of biomolecules interacting in concert with one another to result in the accomplishment of a series of different processes. Biofluids, however, are typically considered to be ‘rivers’ of proteins that are transported through the body, acting independently, unless they encounter a cell or tissue to which they are supposed to signal for a physiological event to occur. The authors’ laboratory has recently taken a different perspective on how serum and plasma can be analysed on a semi-global level for better characterisation of low-abundance proteins, thereby increasing the chances of finding disease-specific biomarkers. In this study, it was hypothesised that proteins in the blood interact with one another, forming complexes of proteins through both specific and non-specific interactions. Obviously, this hypothesis is not too surprising, as albumin is known as a ‘sticky’ protein that acts as a transporter


Proteomics approaches to biomarker detection

Six gels

Immunodepleted serum

Serum

Anion-exchange chromatography

.

.. .

One gel

One gel

.

. . . . . . . . . . . .

Strong cation-exchange chromatography

.

. .. . .. .. .. . . .. . . . . . . .. .. .. . . .. . . . .

2D-PAGE

Sixty-six gels

Figure 1: Proteome-wide characterisation of human plasma using a combined immunodepletion/chromatographic/twodimensional polyacrylamide gel electrophoresis (2D-PAGE) fractionation strategy followed by mass spectrometry identification. High-abundance protein-depleted serum was serially fractionated using anion and strong cation exchange chromatography, resulting in a total of 74 fractions that were separated and visualised on 2D-PAGE gels. Analysis of the accumulative 20,000 spots resulted in the identification of 350 unique proteins

for blood components such as hormones, cytokines and lipoproteins.7 The second hypothesis was that by targeting and extracting high-abundance proteins, it would be possible to enrich for lowerabundance proteins that complexed to these. Again, it is the authors’ belief that characterisation of this low-abundance fraction optimises the probability of finding novel disease biomarkers. To prove these hypotheses, a ‘sub-biofluid’ fractionation procedure was used, in which a series of antibodies directed against six high-abundance blood proteins

(ie albumin, IgA, IgG, IgM, apolipoprotein and transferrin) were used specifically to capture these proteins, as well as proteins associated with each.8 After a non-stringent washing, the targeted protein, along with all of the species that bound to it, was removed from the antibody column. Since this mixture is ‘contaminated’ with a single highly abundant protein (ie the antibodytargeted protein), the sample was then diluted fivefold in buffered 20 per cent acetonitrile and then passed through a 30 kDa cutoff ultrafiltration membrane to


71


collect the low molecular weight species. These mixtures were then digested with trypsin and analysed by RPLC-MS/MS (Figure 2). Over 200 unique proteins were found to be associated with these high-abundant proteins. Of these, 12 proteins that are currently utilised as experimental or clinical biomarkers were identified, including the known serum proteins PSA, pregnancy plasma protein A, meningioma-expressed antigen and dihydropteridine reductase (Table 1).

NH2 Lys

Bind

Other clinically relevant biomarkers, such as bone morphogenetic protein 3b, prostate transglutaminase, paraneoplastic antigen MA1, glycosylasparaginase, coagulation factor VII precursor, ryanodine receptor 2 and acid sphingomyelinase-like phosphodiesterase 3a, that were identified in this study had not previously been identified in other fully global analyses of blood proteome samples. Obviously, it is very difficult to reach any conclusions on the specificity of the interactions between these biomarkers

NH2 Lys

Cross-link

Protein G

Protein G

Protein G

Incubate serum

Wash

Elute Protein G

Centrifuge

30 kDa MWCO ultrafiltration

Protein G

Trypsin digest m/z

Figure 2: Method for identification of low-abundance proteins bound to common, high-abundance proteins in serum. In this technique, an antibody is covalently coupled to protein G and is then used to extract a specific high-abundance protein (eg albumin, immunoglobulin G, transferrin etc) from serum. A low-stringency wash is used to enable the co-isolation of low-abundance proteins that are bound to the targeted protein. To remove the high-abundance protein, the mixture is passed through a 30 kDa ultrafiltration membrane and then digested with trypsin. The resultant peptides are then identified by liquid chromatography coupled online with tandem mass spectrometry

72



Table 1: Clinically relevant biomarkers that were found to be associated with one or more highly abundant proteins in serum Protein

Association

Marker specificity

Glycosylasparaginase Paraneoplastic antigen MA1 Meningioma-expressed antigen 6/11 Dihydropteridine reductase Coagulation factor VII precursor Acid sphingomyelinase-like phosphodiesterase 3a Prostate transglutaminase Pregnancy plasma protein A Hsc70-interacting protein Ryanodine receptor 2 Bone morphogenetic protein 3b Prostate-specific antigen

Albumin IgA Albumin Apolipoprotein Albumin Apolipoprotein IgA, IgG IgG, IgM Albumin Albumin IgA Albumin, IgG

I-cell disease Paraneoplastic neurological syndrome Meningioma Tetrahydrobiopterin deficiency Liver cirrhosis Bladder tumours Prostate cancer Down’s syndrome Cell growth Diabetes Osteophytosis Prostate cancer

Ig, immunoglobulin

and the high-abundance proteins in blood; however, this study was the first to show that a potential archive of diagnostic information may be bound to large, highabundant circulatory proteins.

NARROWING DOWN WHERE TO LOOK IN THE HAYSTACK Discovery-driven biomarker identification, in which there is previous knowledge of what one is looking for, is analogous to looking for a needle in a haystack. Relying solely on fractionation and MS to identify disease-specific biomarkers is going to require studies in which samples from healthy and diseaseaffected patients are quantitatively or qualitatively compared. Up to this point, MS-based study design has been focused on generating lists of proteins identified in control samples and comparing it with a list of proteins identified in samples obtained from disease-affected individuals. It is easy to envision the laboriousness of such approaches and the ‘data overload’ that can result. The process of discovering biomarkers would be expedited by finding ways to narrow down what part of the biofluid ‘haystack’ needs to be searched to locate the biomarker ‘needle’. Processes that could narrow down the section of the proteome that needs to be assayed to identify biomarkers include

cell-based assays to localise a desired activity or subcellular fractionation to isolate specific cellular organelles. While not all biomarkers may enable the use of orthogonal techniques, any test that can point to the fraction of a biofluid that contains a relevant biomarker is invaluable. This point is well illustrated in the identification of the anti-proliferative factor (APF), a urinary biomarker for patients with interstitial cystitis (IC).9 IC is a chronic bladder disorder that affects approximately one million Americans, yet there is no definitive diagnostic marker for IC.10 The functional presence of APF had previously been recognised, as the application of urine from IC patients to bladder epithelial cells inhibited their proliferation in vitro.11 By combining this cell-based assay with a series of chromatographic steps, a specific fraction that contained APF could be narrowed down and characterised by RPLC-MS/ MS. As shown in panel (A) of Figure 3, the MS base peak chromatogram for the APF-active fraction is much simpler than that which would be seen for an unfractionated urine sample. While four putative structures were determined through de novo sequencing of the tandem MS spectra corresponding to APF, each of these pointed towards APF as being a glycosylated nonapeptide. The correct sequence of APF was then determined by


73


B) Relative abundance (%)

100 80 60 40 20

0

10

20

30

Retention time (min)

40

D)

C) GalNac

Gal

Sialic acid

Thr-Val-Pro-Ala-Ala-Val-Val-Val-Ala

1482.8

100

Relative abundance (%)

Relative abundance (%)

A)

80 60 40 20 1000

100 80 60 40

1200

1400

m/z

1600

1800

2000

826.5 y6 TVPA AVVVA b3 b4 b5 b6 b7 b8

620.4

638.4

Galactose GalNAc Sialic Acid

m/z=203 m/z=162

737.5

1191.5

529. 3 20 440.4

500

700

m/z=291

1029.5

900

m/z

1,100 1,300 1,500

Figure 3: Identification of a urinary biomarker for interstitial cystitis (IC). (A) Urine from IC patients was fractionated and aliquots measured using a cell-based assay to identify those that contained anti-proliferative factor (APF) activity. (B) Once the aliquot containing the desired activity was identified, it was analysed by microcapillary reversed phase liquid chromatography coupled with tandem mass spectrometry (MS/MS). (C) De novo sequencing of the MS/MS spectrum followed by subsequent validation of the proposed structure enabled (D) the biomarker (ie APF) for IC to be identified as a nine-residue sialoglycopeptide

using a combination of BLAST searches and lectin-binding studies, thereby showing the biomarker to be a sialoglycopeptide in which the peptide domain had 100 per cent homology to a peptide contained within the sixth transmembrane domain of frizzled-8, a Wnt ligand receptor.9 Unfortunately, biomarker discovery is never complete at the discovery phase. Validation studies were then designed not only to show that the correct structure had been formulated but, more importantly, that the proposed molecule was present in samples obtained from IC patients but not from matched healthy controls.

CONCLUSIONS Science is in the infancy of a new revolution. While it can be argued that the human genome project was invaluable for the genomic information that resulted

74

from it, perhaps as important is the new mentality that it has brought to biological science. In this new way of thinking, entire systems of genes, transcripts, proteins and metabolites are analysed in single studies. The challenge is how to flesh out the most valuable information presented within these, often enormous, amounts of data. Biomarker discovery using MS-based proteomics is directly within the centre of this paradigm. While the discovery of APF as a biomarker for IC was able to rely on the use of a cellbased assay, it is more likely that most biomarkers will not have the luxury of the use of such a test for their discovery. The most common study design relies on the immense data-gathering capabilities of MS as an aid to identifying proteomic differences between biofluids obtained from healthy and disease-affected patients. While there will be struggles ahead, the



progress that has been made in the past two years, just in the characterisation of biofluid proteomes, suggests that MS will be a leading technology in the discovery of disease-specific biomarkers in the near future. Acknowledgments This project has been funded, in whole or in part, with federal funds from the National Cancer Institute, National Institutes of Health, USA, under Contract No. NO1-CO-12400. By acceptance of this paper, the publisher or recipient acknowledges the right of the US Government to retain a non-exclusive, royalty-free licence and to any copyright covering the paper. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organisations imply endorsement by the US Government.

4.

Adkins, J. N., Varnum, S. M., Auberry, K. J. et al. (2002), ‘Toward a human blood serum proteome: Analysis by multidimensional separation coupled with mass spectrometry’, Mol. Cell. Proteomics, Vol. 1, pp. 947–955.

5.

Chan, K. C., Lucas, D. A., Schaefer, C. F. et al. (2004), ‘Analysis of the human serum proteome’, Clin. Prot., Vol. 1, pp. 101–226.

6.

Pieper, R., Gatlin, C. L., Makusky, A. J. et al. (2003), ‘The human serum proteome: Display of nearly 3700 chromatographically separated protein spots on two-dimensional electrophoresis gels and identification of 325 distinct proteins’, Proteomics, Vol. 3, pp. 1345–1364.

7.

Dea, M. K., Hamilton-Wessler, M., Ader, M. et al. (2002), ‘Albumin binding of acylated insulin (NN304) does not deter action to stimulate glucose uptake’, Diabetes, Vol. 51, pp. 762–769.

8.

Zhou, M, Lucas, D. A., Chan, K. C. et al. (2004), ‘An investigation into the human serum interactome’, Electrophoresis, Vol. 25, pp. 1289–1298.

9.

Keay, S., Szekely, Z., Conrads, T. P. et al. (2004), ‘An antiproliferative factor from interstitial cystitis patients is a frizzled 8 protein-related sialoglycopeptide’, Proc. Natl. Acad. Sci. USA, Vol. 101, pp. 11803–11808.

References 1.

Lipton, M. S., Pasa-Tolic, L., Anderson, G. A. et al. (2002), ‘Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags’, Proc. Natl. Acad. Sci. USA, Vol. 99, pp. 11049–11054.

2.

Washburn, M. P., Wolters, D. and Yates, 3rd, J. R. (2001), ‘Large-scale analysis of the yeast proteome by multidimensional protein identification technology’, Nat. Biotechnol., Vol. 19, pp. 242–247.

3.

Anderson, N. L. and Anderson, N. G. (2002), ‘The human plasma proteome: History, character, and diagnostic prospects’, Mol. Cell. Proteomics, Vol. 1, 845–867.

10. Warren, J. W. and Keay, S. K. (2002), ‘Interstitial cystitis’, Curr. Opin. Urol., Vol. 12, pp. 69–74. 11. Keay, S., Zhang, C. O., Shoenfelt, J. L. and Chai, T. C. (2003), ‘Decreased in vitro proliferation of bladder epithelial cells from patients with interstitial cystitis’, Urology, Vol. 61, pp. 1278–1284.


75