Multivariate Statistical Tools for the Evaluation of Proteomic 2D-maps

Current Proteomics, 2007, 4, 53-66


Multivariate Statistical Tools for the Evaluation of Proteomic 2D-maps: Recent Achievements and Applications

Emilio Marengo*, Elisa Robotti and Marco Bobba

Department of Environmental and Life Sciences, University of Eastern Piedmont, Via Bellini 25/G, 15100 Alessandria, Italy

Abstract: Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) maps are an indispensable tool in many fields connected with proteome research, such as the development of new diagnostic assays or new drugs. Unfortunately, the information contained in the maps is often so complex that recognising and extracting it usually requires sophisticated statistical treatment. Statistics enters many phases of 2D-PAGE map handling, from spot detection to map matching, as well as the extraction and rationalisation of useful information. This review reports the most recent achievements in the field of statistical tools applied to proteome research by two-dimensional gel electrophoresis (2D-GE). The first section briefly describes the theoretical aspects of the multivariate methods most widely adopted in this field, such as Principal Component Analysis, Cluster Analysis, Classification methods and Artificial Neural Networks. The most recent applications are then described, covering the analysis of spot volume datasets from standard differential analysis as well as the direct analysis of 2D-map images. Applications of multivariate tools to the analysis of DNA and RNA profiles are also reported.

Key Words: Principal component analysis, classification methods, linear discriminant analysis, soft-independent model of class analogy, image analysis, moment functions, fuzzy logic, spot volume data.

*Address correspondence to this author at the Department of Environmental and Life Sciences, University of Eastern Piedmont, Via Bellini 25/G, 15100 Alessandria, Italy; Tel: +39 0131 360272; Fax: +39 0131 360250; E-mail: [email protected]

1570-1646/07 $50.00+.00 ©2007 Bentham Science Publishers Ltd.

INTRODUCTION

Two-dimensional gel electrophoresis (2D-GE) has undergone rapid development in the last few years for the separation and analysis of protein extracts in many fields of proteomic research, e.g. clinical chemistry, botany, microbiology, toxicology, food security and control. In spite of being a very powerful tool for protein analysis, 2D-GE is characterised by low reproducibility, owing mainly to the complexity of the specimen and of the instrumental technique used to obtain the final electrophoretic maps. The same limitation also affects one-dimensional (1D) gel electrophoresis (Righetti et al., 2001). The sample, which covers a wide range of properties, structures and molecular weights, contributes to the complexity of the final map. In addition, the instrumental technique itself (from sample preparation to the electrophoretic run) can further affect the reproducibility of 2D-GE.

These limitations make it necessary to use dedicated software packages to analyse the information contained in two-dimensional maps (2D-maps), so that the intrinsic uncertainty of the technique can be taken into account in some way. Many software packages for the comparison of 2D-maps have become available in the last few years (PDQuest, Progenesis, Melanie, Z3, Phoretix, Z4000, etc.) (Anderson et al., 1981; Mahon et al., 2001; Rubinfeld et al., 2003). All the available commercial solutions present advantages and disadvantages (Almeida et al., 2005; Campostrini et al., 2005; Molloy et al., 2003; Moritz et al., 2003; Raman et al., 2002; Rosengren et al., 2003; Voss et al., 2000; Wheelock et al., 2005), but almost all of them are based on a multi-step procedure that analyses sets of 2D-maps starting from the digitalised images of the gels, obtained by laser densitometry, phosphor imaging or a CCD camera.

The analysis of digitalised images involves several steps (described here with particular reference to the PDQuest system; Garrels et al., 1979, 1984, 1989):

1) Scanning: each gel image is converted into pixel data; each pixel is characterised by x-y coordinates indicating its position on the 2D-image and a Z value corresponding to its signal intensity (optical density value, OD).

2) Filtering images: a pre-processing step that eliminates noise, background effects, specks and imperfections.

3) Automated spot detection: a step that identifies the spots present on each gel independently. The operator has to select the faintest spot (to set the sensitivity and minimum peak value), the smallest spot (to set the size scale parameter) and the largest spot (to set the maximum size of the spots to be detected). A final smoothing is applied to remove spots close to the background level. Spots are located on the gel image (i.e. each spot is identified by x-y coordinates indicating its position), replaced by ideal Gaussian distributions and quantified by the sum of the OD values within each Gaussian distribution.

4) Matching of protein profiles: sets of 2D-gels can be edited and matched to one another in a “match set”. Each spot is matched to the same spot in all of the other gels of the set


under investigation. For this purpose, landmarks are needed: these are reference spots that PDQuest uses to align and position match set members for matching. Identifying the landmarks sets some parameters that account for the distortions among the gels to be compared.

5) Normalisation: applied to the maps to compensate for gel-to-gel variations due to sample preparation and loading, as well as staining and destaining procedures, etc.

6) Differential analysis: allows the comparison of different sets of 2D-maps, e.g. control and diseased samples. Within each group of 2D-maps, a “sample group” is created containing the average values of all the spots identified. The groups are compared through their “sample groups” to find differentially expressed proteins. Usually, only spots showing a two-fold variation (100% variation) are accepted as significantly changed. This threshold avoids accepting differences that arise from the large experimental error rather than from actual systematic variations.

7) Statistical analysis: applied to identify the differentially expressed proteins. Statistical analysis is usually based on Student’s t-test (p