Chemical Composition, Sensory Properties ... - Wiley Online Library

Chemical Composition, Sensory Properties, Provenance, and Bioactivity of Fruit Juices as Assessed by Chemometrics: A Critical Review and Guideline

Acácio A.F. Zielinski, Charles W.I. Haminiuk, Cleiton A. Nunes, Egon Schnitzler, Saskia M. van Ruth, and Daniel Granato

Abstract: The use of univariate, bivariate, and multivariate statistical techniques, such as analysis of variance, multiple comparisons of means, and linear correlations, has spread widely in the area of Food Science and Technology. However, the use of supervised and unsupervised statistical techniques (chemometrics) in order to analyze and model experimental data from physicochemical, sensory, metabolomics, quality control, nutritional, microbiological, and chemical assays in food research has gained more space. Therefore, we present here a manuscript with theoretical details, a critical analysis of published work, and a guideline for the reader to check and propose mathematical models of experimental results using the most promising supervised and unsupervised multivariate statistical techniques, namely: principal component analysis, hierarchical cluster analysis, linear discriminant analysis, partial least square regression, k-nearest neighbors, and soft independent modeling of class analogy. In addition, the overall features, advantages, and limitations of such statistical methods are presented and discussed. Published examples are focused on sensory, chemical, and antioxidant activity of a wide range of fruit juices consumed worldwide.

Introduction

Fruit juices are consumed worldwide, not only for their flavor, taste, and freshness, but also for their beneficial health effects when consumed regularly. As fruit juices are relevant sources of polyphenolic compounds and carotenoids, many people are becoming aware of the importance of including them in their daily diet. For example, commercial purple (also known as red) grape juices have shown inhibitory effects on low-density lipoprotein oxidation that are comparable to those of red wines, and their antioxidant activity appears to

MS 20131708 Submitted 11/18/2013, Accepted 01/17/2014. Author Zielinski is with Graduate Program of Food Engineering, Federal University of Paraná. R. Cel. Francisco Heráclito dos Santos 210, Polytechnic Campus, CEP 81531-980, Curitiba, PR, Brazil and Food Science and Technology, Graduate Program, State Univ. of Ponta Grossa. Av. Carlos Cavalcanti, 4748, 84030–900, Uvaranas Campus, Ponta Grossa, Paraná, Brazil. Author Haminiuk is with Graduate Program of Food Technology (PPGTA) – Federal University of Technology – Paraná, Campo Mourão Campus, Via Rosalina Maria dos Santos, 1233, Campo Mourão, CEP 87301-899, Campo Mourão, Paraná, Brazil. Author Nunes is with Dept. of Food Science, Federal Univ. of Lavras, CP 3037, 37200–000, Lavras, Minas Gerais, Brazil. Authors Zielinski, Schnitzler, and Granato are with Food Science and Technology, Graduate Program, State Univ. of Ponta Grossa. Av. Carlos Cavalcanti, 4748, 84030–900, Uvaranas Campus, Ponta Grossa, Paraná, Brazil. Authors van Ruth and Granato are with Inst. of Food Safety, RIKILT, Wageningen Univ. and Research Centre, P.O. Box 230, 6700 AE, Wageningen, the Netherlands. Direct inquiries to author Granato (E-mail: [email protected]).

be distributed among all the phenolic constituents, which is also consistent with the data on wine (Frankel and others 1998). In addition, some in vivo and ex vivo studies, including protocols using different types of animals, cells, and humans (Park and others 2004, 2009; Jung and others 2006; Shanmuganayagam and others 2007; Dani and others 2009; Lacerda and others 2012; Sun and others 2012; Macedo and others 2013), have shown that polyphenolic compounds present in fruit-based beverages (juices and fermented beverages), such as cis- and trans-resveratrol, gallic acid, quercetin, and catechins, act as potent free-radical scavengers and metal chelators by increasing the activity and expression of endogenous antioxidant enzymes, such as catalase and superoxide dismutase, thereby reducing plasma free-radical release and damage to DNA. In this regard, the measurement of the in vitro antioxidant activity of fruit juices is an important and complementary approach for the determination of biochemical pathways and chemical substances that exert such activity in in vivo models, since research has shown a statistically significant correlation between in vivo and in vitro studies (Rodrigo and others 2005; Macedo and others 2013). Data treatment is an important and decisive step to make inferences about differences between conventional and organic juices that come from different regions (Granato and others 2014a) or to differentiate products subjected to distinct technological operations based on chemical markers (Braga and others 2013), for example. It is very common to find research that applies univariate and bivariate methods, such as linear correlation analysis

300 Comprehensive Reviews in Food Science and Food Safety r Vol. 13, 2014

© 2014 Institute of Food Technologists®

doi: 10.1111/1541-4337.12060

Chemometrics, analytics, and fruit juices . . .

Figure 1–General overview on how to use chemometrics to assess geographical origin, biological activity, and sensory properties of fruit juices.

based on Pearson's product-moment coefficient and one-factor analysis of variance (ANOVA), to determine relationships between total or individual phenolic compounds and in vitro and in vivo antioxidant or other biological properties in fruit juices (Dávalos and others 2005; Yildirim and others 2005; Dani and others 2007; Vargas and others 2008; Vedana and others 2008; Gollücke and others 2009; Burin and others 2010). However, this 1-dimensional (1D) approach is not as suitable as it seems, because it fails to capture multivariate associations among experimental data and to give insight into the intrinsic characteristics of complex food matrices. In order to overcome this serious limitation, chemometric methods, which are multivariate by nature, have been recognized as valuable tools in Food Science and Technology (Wu and others 2010; Baiano and Terracone 2011; Boggia and others 2013; Granato and others 2014b). Indeed, chemometrics is the science of extracting as much information as possible from chemical/biological systems, such as in vitro, ex vivo, and in vivo protocols, by data-driven means (Giacomino and others 2011; Alezandro and others 2013; Macedo and others 2013). It is an interfacial discipline that uses multivariate statistical techniques, theoretical and applied mathematics, and computer sciences to address problems in sensory analysis, food chemistry, food technology, and consumer science among other types of analytical data (Berrueta and others 2007; Correia and Ferreira 2007; Varela and Ares 2012). A simple and illustrative overview on how chemometrics can be used in food research is shown in Figure 1, that is, data need to be collected according to an appropriate experimental design and the product's properties can be assessed by using a

suitable statistical/mathematical tool, then inferences can be made using the graphs and tables generated by these methods. Research has shown that chemometrics is an important tool to monitor the quality of fruit-based products (Meléndez and others 2013). Chemometric techniques have also been used to assure the authenticity and quality of grape-based products, such as red wines (Arvanitoyannis and others 1999), to classify the geographical origin of foodstuffs based on chemical compounds (Liu and others 2006; Kallithraka and others 2007) and sensory properties (Kallithraka and others 2001), and to assess the typicality of red wines coming from different growing regions (Granato and others 2012), among other applications. Tamborra and Esti (2010) used a 3D model of principal component analysis (PCA) to check for chemical markers to differentiate grape varieties grown in southern Italy. The authors verified that the relative content of grape glycosidic precursors from various terpene families, as well as the contents of shikimic acid, cyanidin-3-O-glucoside, trans-caftaric acid, and trans-coutaric acid, were helpful varietal discriminating factors. On the basis of the above, and taking into consideration the fact that multivariate statistical analysis has been widely applied to solve problems and increase the understanding of data structure in food research, the objectives of this review article are: (1) to provide an overview of some of the chemometric techniques most used in Food Science and Technology; (2) to show their application to assess the geographical origin, sensory properties, and bioactivity (antioxidant activity) of fruit juices consumed worldwide; and (3) to present a guideline together with some common software used to perform chemometric analyses.


Table 1–Example on how to build a data matrix for chemometrics application using Statistica, Chemoface, and Action software.

Samples | Response 1 | Response 2 | Response 3 | Response 4 | Response 5 | Response . . . | Response n
Fruit juice 1 | | | | | | |
Fruit juice 2 | | | | | | |
Fruit juice 3 | | | | | | |
Fruit juice 4 | | | | | | |
Fruit juice 5 | | | | | | |
Fruit juice 6 | | | | | | |
Fruit juice . . . | | | | | | |
Fruit juice n | | | | | | |

In each cell, for a given sample and response variable, one should enter the mean value; that is, even if the experiment was conducted in triplicate (physicochemical analysis) or more than 50 measurements were made (sensory analysis), PCA/HCA can be performed using only the mean value.
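As a minimal sketch of the layout in Table 1, the matrix of means could be assembled in Python as shown below. The sample names, response names, and replicate values are hypothetical and serve only to illustrate the structure (samples in lines, responses in columns, one mean value per cell).

```python
from statistics import mean

# Hypothetical triplicate measurements: sample -> response -> replicate values.
replicates = {
    "Fruit juice 1": {"pH": [3.4, 3.5, 3.6], "total_phenolics_mg_L": [410.0, 420.0, 430.0]},
    "Fruit juice 2": {"pH": [3.9, 4.0, 4.1], "total_phenolics_mg_L": [250.0, 260.0, 270.0]},
}

def build_data_matrix(replicates):
    """Return (sample names, response names, matrix of mean values)."""
    samples = list(replicates)
    responses = list(next(iter(replicates.values())))
    # One line per sample, one column per response, each cell holding the mean.
    matrix = [[mean(replicates[s][r]) for r in responses] for s in samples]
    return samples, responses, matrix

samples, responses, matrix = build_data_matrix(replicates)
for name, row in zip(samples, matrix):
    print(name, row)
```

The resulting list of lists has the same orientation as Table 1 and can be exported to a spreadsheet or passed on to a preprocessing step.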

Chemometric Methods

As previously mentioned, univariate methods such as ANOVA, followed by a multiple comparison test of means (Tukey, Duncan, Fisher, and so on), are sometimes not sufficient to understand data structure. In sensory analysis, it is common not to observe differences among means when the liking of an attribute is assessed, because the data have great variability; however, if an Internal Preference Map is applied to the data, remarkable differences can be observed in a 2D scatter plot. Thus, more sophisticated statistical methods should be used to establish associations and make inferences between different test samples with respect to a list of responses used to characterize such products. These multivariate statistical methods, namely chemometrics, are divided into 2 main categories: unsupervised statistical techniques, also known as multivariate exploratory techniques (Correia and Ferreira 2007), and supervised statistical techniques, which provide classification indices for a large set of samples and/or response variables (Berrueta and others 2007). All these methods concern the extraction of relevant information from chemical data by mathematical and statistical tools, and they are widely used in food research to build empirical mathematical models that are, for instance, capable of predicting the values of important properties that are not directly measurable, of predicting the values of future samples based on only some simple response variables, or simply of studying relationships among samples and analytical determinations in large data sets. According to Varmuza and Filzmoser (2008), some common issues that can be successfully addressed by chemometrics are the classification of the origins of samples (from analytical, sensory, or spectroscopic data), the prediction of a property or bioactivity of a chemical compound or food extract, and the evaluation of the process status in technological processes.
In addition, chemometrics has been successfully applied to assess the authenticity and typicality of foods, such as edible oils and fats, meat products, wine, milk, juices, and also products that present the claims “Protected Designation of Origin (PDO),” “Protected Geographical Indication (PGI),” and “Traditional Specialty Guaranteed (TSG)” on the label (Granato and others 2010; Cruz and others 2011; Oliveri and Downey 2012; Fabani and others 2013; Nunes 2014; Zhao and others 2014). An important and decisive step in performing chemometric analysis of a data set is the organization of the experimental results; that is, the matrix of results should be defined and structured prior to the use of the software. Initially, the mean values of each sample need to be placed in lines, and the columns should represent the response variables. A simple and practical example of how to build the data matrix in Statistica, Chemoface, and Action software is presented in Table 1; that is, each column corresponds to one response and each line contains the mean values of one sample for the respective responses. Although there are some criticisms regarding this issue, this is the most common way to build the data matrix for chemometrics application. In order to facilitate the understanding of the most used multivariate statistical techniques (chemometrics), Table 2 lists examples of how analytical determinations were statistically evaluated: the preprocessing of response variables, the type of measurements and statistical approach used, the type and number of samples, and the most important results. Figure 2 contains practical steps one should follow in order to analyze experimental results (sensory, biochemical, nutritional, physicochemical, and rheological) using multivariate statistical techniques. It is known that when fruit-based products such as juices and fermented beverages are analyzed, many responses may be obtained using diverse analytical methods, such as sensory acceptance, rheological properties, content of vitamins, phenolic compounds, essential minerals, carotenoids, and bioactivity toward microorganisms and free radicals, among others.
Therefore, there are many ways to express the results, and different values are obtained. For example, for oxygen radical absorbance capacity, the magnitude of the result


Table 2–Examples of multivariate statistical methods applied to the analysis of fruit juices.

Type of fruit juice | Number of samples | Analytical measurement | Instrumental technique | Statistical approach | Preprocessing of data | Result | Reference
Tomato | 12 | Volatile compounds and sensory analysis (quantitative descriptive analysis [QDA]) | SPME/GC-MS | PCA | Autoscaling | PC1 and PC2 explained 68.61% of total variance | Vallverdú-Queralt and others (2013)
Sweet orange, tangerine, lemon, and grapefruit | 83 | Phenolic compounds (flavanones, flavones, flavonols, hydroxycinnamic acids, and coumarins) | HPLC-DAD-MS/MS | PCA, HCA, LDA, PCR, PLS | Autoscaling | The 1st 4 PCs explained 84% of total variance. Four clusters were formed with similarity index between 0.4 and 0.6 | Abad-García and others (2012)
Apricot | 26 | Carbohydrates, organic acids, amino acids, phenolic compounds, furanic compounds, and color parameters | HPLC and colorimeter | PCA | Autoscaling | PC1 and PC2 explained 66% of the total variance | Versari and others (2008)
Mandarin | 9 | Mass spectra | PTR-MS | PCA | Original spectra | PC1 and PC2 explained 79.83% of total variance | van Ruth and others (2008)
Grape | 196 | Sugars (glucose, fructose, and sucrose) and acids (tartaric and malic) | HPLC | PCA | Autoscaling | PC1 and PC2 explained 63.7% of total variance | Liu and others (2006)
Pomegranate | 8 | Antioxidant capacity (2,2-diphenyl-1-picrylhydrazyl [DPPH], 2,2'-azino-bis-3-ethylbenzthiazoline-6-sulfonic acid [ABTS], and ferric ion reducing antioxidant power [FRAP]), total phenolic compounds, total anthocyanin content | Spectra reaction | PCA, HCA | Autoscaling | PC1 and PC2 explained 93% of the total variance. Three clusters were formed | Çam and others (2009)
Orange | 482 | Al, Ba, B, Ca, Co, Cu, Fe, Li, Lu, Mg, Mn, Mo, Ni, P, K, Rb, Si, Na, Sr, Sn, Ti, V, Zn | ICP-AES, ICP-MS | PCA | Autoscaling | PCA showed the separation of samples in 3 groups | Simpkins (2000)
Tomato | 33 | Phenolic compounds (hydroxycinnamic acids, flavonols, and flavanones) and antioxidant capacity (ABTS and DPPH) | HPLC-ESI-MS/MS and spectra reaction | PCA | Autoscaling | PC1 and PC2 explained 85.30% of total variance | Vallverdú-Queralt and others (2012)
Pomegranate | 15 | pH, titratable acidity, total soluble solids, sugars, organic acids, vitamin C, phenolic compounds, total phenols, antioxidant capacity, color measures, and sensory analysis | HPLC, spectra reaction, and colorimeter | PCA, HCA | Autoscaling | The 1st 3 PCs explained 74.28% of total variance. Two clusters were formed | Mena and others (2011)
Japanese quince (Chaenomeles) | 21 | pH, density, soluble solids, viscosity, turbidity, titratable acidity, vitamin C, phenolic compounds, proteins, insoluble solids, and polysaccharides | HPLC, spectra reaction | PCA | Autoscaling | PC1 and PC2 explained 63.50% of total variance | Ros and others (2004)
Grape | 212 | Spectra | NIR, MIR, and Vis spectroscopy | PCA, LDA, PLS-DA | Original spectra | About 80% to 89% of correct classification | Cozzolino and others (2012)
Citrus | 170 | Spectra | Ultraviolet-visible (UV-Vis) spectrophotometry | PCA, SIMCA, LDA, PLS-DA | Smoothing (Savitzky–Golay) | 100% of correct classification | Freitas and others (2013)
Lemon | 44 | Br, As, Na, Rb, La, Cr, Sc, Fe, Co, Zn, and Sb | Instrumental neutronic activation analysis | LDA | Autoscaling | 93.2% of correct classification | Pellerano and others (2008)
Apple | 68 | Spectra | MIR and NIR spectroscopy | LDA, PLS-DA | 1st derivative | 77% to 100% of correct classification | Reid and others (2005)

(Continued)


Table 2–Continued

Type of fruit juice | Number of samples | Analytical measurement | Instrumental technique | Statistical approach | Preprocessing of data | Result | Reference
Apple | 40 | Volatile compounds | SPME/GC-MS | LDA, PLS-DA | Original relative peak areas | 80% to 92.5% of correct classification | Reid and others (2004)
Orange | 150 | Spectra | 1H NMR | PLS regression | Various | About 3.47% of misclassification | Vigneau and Thomas (2012)
Orange, apple, and grapefruit | 84 | Metabolomics | HPLC-MS | LDA | Pareto scaling | 100% of correct classification | Vaclavik and others (2012)
Guduchi (Tinospora cordifolia) | 9 | Phytochemicals (jatrorrhizine, mangoflorine, menisperine, columbamine, berberine, tinosporoside) and mass spectra profile | UPLC-QTOF-MS | PLS-DA | Original data | 88.9% of correct classification | Shirolkar and others (2013)
Apple | 36 | Polyphenol compounds (procyanidins, hydroxycinnamic derivatives, and dihydrochalcones) | HPLC | LDA, PLS-DA, SIMCA, KNN | Original data | 97% to 100% of correct classification | Mangas and others (1997)
Orange | 900 | Electrode responses | Electronic tongue | PLS-DA, PCA, ANN | Autoscaling | 100% of correct classification | Ciosek and others (2005)
Orange, pear, peach, and apricot | 84 | Electrode responses | Electronic tongue | PCA, PLS-DA | Mean centering and 1st derivative | 100% of correct classification | Martina and others (2007)
Cherry | 131 | Spectra | 1H NMR | PCA, LDA, PLS-DA, SIMCA | Original data | 96.7% of correct classification | Longobardi and others (2013)
Apple, blueberry, cranberry, Concord grape, and plum | 52 | Spectra | FTIR | PCA, HCA, SIMCA | Mean centering, 2nd derivative, normalization, and smoothing (Savitzky–Golay) | 100% of correct classification | He and others (2007)
Apple | 704 | Spectra | MIR | PCA, PLS, KNN | 1st derivative, smoothing (Savitzky–Golay) | 82.4% to 96.5% of correct classification | Kelly and Downey (2005)
Apple | 60 | Spectra | Fluorescence spectra | PCA, PLS, SIMCA | Original data | Two different clusters classified | Seiden and others (1996)
Orange | 26 | Spectra | 1H NMR | PCA, DA, KNN | Fitting | 98.8% of correct classification | Vogels and others (1996)
Orange | 36 | Electrode responses | Electronic tongue | PCA, CDA, CA, SIMCA | Original data | 97.99% of correct classification | Liu and others (2012)
White grape | 25 | Electrode responses | Electronic tongue | PCA, SIMCA | Mean centering | 83% to 86% of correct classification | Gutiérrez-Capitán and others (2013)

is about 5000 to 40000 mmol Trolox equivalent per liter of juice, while the content of phenolic compounds is usually about 100 to 1000 mg/L, and pH varies from 3.0 to 5.5. Therefore, it is easy to observe the existence of different units to express the results and a standardization of the whole data set is undoubtedly required. In this sense, the variables with higher values present greater statistical importance in the modeling, as compared to variables that present lower values, such as pH and water activity. Thus, it is important to equalize the statistical importance of all response variables, the so-called preprocessing of data, and there are many methods to accomplish this objective: mean centering, variance centering, smoothing, and autoscaling. The latter method is often used for continuous quantitative responses, and smoothing is used for spectroscopic data. In a general way, data transformation is also required to make its distribution fairly symmetrical and to give each response variable (column) the same weight and the same prior importance in the analysis (Wold and others 2001). In accordance with Engel and

others (2013), if preprocessing of raw data is not carried out properly, it may introduce extra variation into the data set. Therefore, this initial step is critical because it directly influences the outcome of the chemometric application and the correctness and success of the experiments. Thus, prior to using any chemometric method, it is important to apply one of the above-mentioned methods to make the data distributions fairly symmetrical. In practice, regardless of the data distribution, results are transformed by centering them (removing the mean value of each feature) and then scaling them (dividing nonconstant features by their standard deviation). This method is known as autoscaling, and experimental results can be transformed into z-scores (transformed data) by using Equation 1:

$$Z_{ij} = \frac{X_{ij} - \bar{X}_j}{s_j} \qquad (1)$$

where $Z_{ij}$ is the standardized value for each value of the response, $X_{ij}$ represents the original value for the object (i) of measured


[Figure 2: flowchart. Experimental results (physical, nutritional, sensory, metabolomics, microbiological, physicochemical) are analyzed either by univariate/bivariate statistical methods or by multivariate statistical methods (chemometrics), the latter when many variables and few samples need to be analyzed simultaneously (see Besten et al. 2012 and Besten et al. 2013). Chemometric path: select the response variables; standardize the results (autoscaling, mean centering, variance centering, smoothing, etc.); then apply either unsupervised pattern recognition methods (principal component analysis [PCA], hierarchical cluster analysis [HCA]) for the analysis of multivariate correlations within response variables and a previous classification of samples, shown as dendrograms and/or two- or three-dimensional graphs, or supervised pattern recognition methods (linear discriminant analysis [LDA], partial least squares discriminant analysis [PLS-DA], soft independent modeling by class analogy [SIMCA], K-nearest neighbors [KNN]) for the prediction of sample classes (numerical).]

Figure 2–Suggestion on how to analyze experimental results using multivariate statistical techniques (chemometrics).

attribute (j), $\bar{X}_j$ is the mean value of the variable j, and $s_j$ is the standard deviation of the response variable j. In contrast to autoscaling, variance scaling does not center the data: it only equalizes the variance of each response variable, thus reducing the influence of responses with large variance on the data set. This procedure is performed by dividing each observation (result) by the standard deviation of the whole column (Equation 2):

$$Z_{ij} = \frac{X_{ij}}{s_j} \qquad (2)$$

Mean centering is another preprocessing procedure that is often used in multivariate calibration models, especially in food chemistry. This method calculates the mean value of each response variable (column) and subtracts it from each result (in each line), as given by Equation 3:

$$Z_{ij} = X_{ij} - \bar{X}_j \qquad (3)$$
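Equations 1 to 3 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the example values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator is intended.

```python
from statistics import mean, stdev

def _column_stats(matrix):
    """Mean and sample standard deviation (n - 1, an assumption) of each column."""
    columns = list(zip(*matrix))
    return [mean(c) for c in columns], [stdev(c) for c in columns]

def autoscale(matrix):
    """Equation 1: subtract the column mean, then divide by the column standard deviation."""
    means, sds = _column_stats(matrix)
    # Constant columns (s = 0) cannot be scaled; they are set to 0 after centering.
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
            for row in matrix]

def variance_scale(matrix):
    """Equation 2: divide by the column standard deviation without centering."""
    _, sds = _column_stats(matrix)
    return [[x / s if s > 0 else x for x, s in zip(row, sds)] for row in matrix]

def mean_center(matrix):
    """Equation 3: subtract the column mean from each value."""
    means, _ = _column_stats(matrix)
    return [[x - m for x, m in zip(row, means)] for row in matrix]

# Hypothetical juices with three responses on very different scales:
# antioxidant capacity, total phenolics (mg/L), and pH.
data = [
    [5000.0, 100.0, 3.0],
    [20000.0, 500.0, 4.0],
    [35000.0, 900.0, 5.0],
]
z = autoscale(data)
for row in z:
    print(row)
```

In this toy example all three columns end up spanning the same range after autoscaling, so the response with the largest raw values no longer dominates the others.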

For example, if the researcher needs to analyze spectroscopic data, the mean spectrum, calculated from all spectra in the set, is subtracted from each spectrum. By removing the mean from each spectrum, the differences among samples are enhanced in terms of both concentration and spectral response, thus leading to calibration models that render more accurate predictions. For a spectral data set, smoothing can also be appropriate in order to reduce the noise and remove narrow spikes. Differentiation can also be used to extract relevant information and to correct the baseline. The most used technique for smoothing and differentiation is the Savitzky–Golay algorithm, which is a local polynomial regression. The principle is that data for small wavelength intervals can be fitted by a polynomial and that the fitted values are a better estimate than those measured because some noise has been removed. For each point j in the spectrum, with value xj, a weighted sum of the neighboring values is calculated. The weights determine whether a smoothing is performed or a derivative is calculated. The number of neighbors and the degree of the polynomial control the strength of smoothing (Varmuza and Filzmoser 2008): using more data points results in more smoothing, whereas using higher-degree polynomials is equivalent to using less smoothing, since high-order polynomials can twist and turn more to follow the details of the data (Mark and Workman Jr. 2007). Although the final choice is largely empirical, the quadratic function is the most commonly used (Adams 2004). There are no established worldwide guidelines on when to employ or avoid a certain preprocessing method, so it is necessary to test some of the above-mentioned methods and check their suitability to explain the experimental data. Other preprocessing methods have already been reviewed by Engel and others (2013).
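The weighted-sum idea behind Savitzky–Golay filtering can be illustrated with the classic 5-point quadratic case. The sketch below is a simplified illustration, not the procedure of any cited study: the window size is fixed at 5 points, the tabulated convolution weights for a quadratic polynomial are hard-coded, the two points at each end of the spectrum are simply dropped, and the example spectrum is hypothetical.

```python
# Tabulated 5-point Savitzky-Golay weights for a quadratic polynomial.
SMOOTH_WEIGHTS = (-3, 12, 17, 12, -3)   # normalization factor 35
DERIV_WEIGHTS = (-2, -1, 0, 1, 2)       # normalization factor 10 (unit spacing)

def savitzky_golay_5(values, weights, norm):
    """Weighted sum over a sliding 5-point window; returns interior points only."""
    half = len(weights) // 2
    result = []
    for j in range(half, len(values) - half):
        window = values[j - half:j + half + 1]
        result.append(sum(w * v for w, v in zip(weights, window)) / norm)
    return result

def smooth(values):
    """Smoothed values for points 2 .. n-3 of the input."""
    return savitzky_golay_5(values, SMOOTH_WEIGHTS, 35)

def first_derivative(values, spacing=1.0):
    """First derivative for points 2 .. n-3, assuming equally spaced points."""
    return [d / spacing for d in savitzky_golay_5(values, DERIV_WEIGHTS, 10)]

# Hypothetical noisy spectrum with one narrow spike at position 4.
spectrum = [1.0, 1.1, 0.9, 1.0, 3.0, 1.0, 1.1, 0.9, 1.0]
smoothed = smooth(spectrum)
print(smoothed)
```

The same function yields either a smoothed signal or a derivative depending only on the weights, which is the point made in the text. A wider window would smooth more strongly, at the cost of needing a different table of weights.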

Unsupervised statistical techniques: principal component analysis (PCA) and hierarchical cluster analysis (HCA)

PCA represents one of the most frequently used chemometric tools, mainly due to its attractive features, such as the generation of a 2D or 3D graph that facilitates the understanding of similarities and differences among samples. PCA is an unsupervised technique that reduces the dimensionality of the original data matrix while retaining the maximum amount of variability (percentage of explained variance). It permits the visualization of the original arrangement of juices in an n-dimensional space (usually 2 or 3 dimensions) by identifying the directions in which most of the variance is retained, allowing the relationship between variables and observations to be studied and the data structure to be recognized. It is, therefore, possible to explain differences among fruit juices by means of the factors obtained from the generalized correlation matrix of the data sets, and at the same time to determine which variables contribute most to such differentiation (Cruz and others 2013). An important advantage of PCA in food research is that not only physicochemical, rheological, and chemical information can be analyzed, but also data from consumers obtained by hedonic tests, for example. In this regard, internal preference maps, which are multidimensional projective maps, can be generated to assess individual consumer preferences, as well as to evaluate trends in the population used in the study, leading to useful information for product development and for targeting advertisement campaigns (Worch 2013; Vigneau and others 2014).
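The idea of extracting principal components from the correlation matrix can be sketched as follows. This is a didactic sketch under stated assumptions: the data are autoscaled first, the eigenvectors are obtained by simple power iteration with deflation (dedicated software typically uses more robust decompositions), and the data matrix and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def autoscale(X):
    """Center each column and divide it by its sample standard deviation."""
    cols = list(zip(*X))
    mu = [mean(c) for c in cols]
    sd = [stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mu, sd)] for row in X]

def correlation_matrix(Z):
    """Correlation matrix of autoscaled data Z (samples in lines)."""
    n, p = len(Z), len(Z[0])
    return [[sum(Z[i][j] * Z[i][k] for i in range(n)) / (n - 1)
             for k in range(p)] for j in range(p)]

def leading_eigenpair(C, iterations=500):
    """Largest eigenvalue and its eigenvector by power iteration."""
    p = len(C)
    v = [float(j + 1) for j in range(p)]  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(C[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[j] * C[j][k] * v[k] for j in range(p) for k in range(p))
    return eigenvalue, v

def pca(X, n_components=2):
    """Return scores, loadings, and the fraction of variance explained per PC."""
    Z = autoscale(X)
    C = correlation_matrix(Z)
    p = len(C)
    eigenvalues, loadings = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(C)
        eigenvalues.append(lam)
        loadings.append(v)
        # Deflation: remove the component just found before extracting the next.
        C = [[C[j][k] - lam * v[j] * v[k] for k in range(p)] for j in range(p)]
    scores = [[sum(z * l for z, l in zip(row, v)) for v in loadings] for row in Z]
    explained = [lam / p for lam in eigenvalues]  # total variance of autoscaled data is p
    return scores, loadings, explained

# Hypothetical matrix: 4 juices (lines) by 3 responses (columns).
X = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, -1.0],
    [4.0, 8.0, 1.0],
]
scores, loadings, explained = pca(X)
print([round(e, 3) for e in explained])
```

Plotting the two columns of `scores` against each other gives the usual 2D score plot, while `loadings` indicates which responses contribute most to each component.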
It is important to emphasize that the use of this multivariate approach over traditional univariate statistics (ANOVA followed by multiple means comparison tests such as Tukey or Fisher) is more relevant and scientifically sound, because basic inferential methods assume that all consumers present the same behavior (preference) and that a single mean value is suitable and representative of the whole set of consumers, which is not the case. Applications of PCA for physicochemical, metabolomics, quality control, chemical, nutritional, biochemical, and sensory data are widely available in the literature (Alezandro and others 2011; Rodriguez-Campos and others 2011; Ellendersen and others 2012; Alezandro and others 2013; Besten and others 2013; Cheng and others 2013; Fernandes and others 2013; Granato and others 2014c).

HCA comprises an unsupervised classification procedure that involves a collection of statistical methods for grouping objects that behave similarly or show similar characteristics. The initial assumption is that the nearness of objects in the space defined by the variables reflects the similarity of their properties (Giacomino and others 2011). Typically, in clustering methods, all the samples within a cluster are considered to present similar characteristics. The primary goal of HCA is to display data in such a way that natural clusters and patterns can be shown in a 2D plot (known as a dendrogram). This graphical representation allows the visualization of clusters and correlations between either samples or variables simultaneously, which is an important advantage over univariate methods (Cruz and others 2011). In sensory analysis and consumer research, clustering techniques are widely employed to segment the population based not only on preferences or degree of liking of some food attribute such as aroma, bitterness, juiciness, creaminess, color, and taste, but also on nutrition information and possible health-promoting benefits of foods (Marafon and others 2011; Onwezen and others 2012; Visschers and others 2013). After consumers are grouped in different clusters, differences in sensory preferences and other characteristics of the consumers (demographic and economic information, for example) can be compared using either one-way ANOVA or multivariate analysis of variance (MANOVA), adopting the formed clusters as the independent variable. If the probability value of this test is below the established α (usually α = 0.05), a post hoc test (Fisher, Tukey, Bonferroni, or Duncan) can be applied using the mean values for each attribute to investigate which clusters differ from each other (Zhang and others 2008); that is, it is possible to evaluate which response variable is responsible for the segmentation of consumers. Other practical applications of cluster analysis for physicochemical, biochemical, nutritional, and sensory data are widely available in the literature (Zhang and others 2010; Granato and others 2011b; Aprea and others 2012; Bernués and others 2012; Meng and others 2012; Zielinski and others 2014).
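The agglomerative principle behind HCA, in which the nearest objects are merged first, can be sketched as below. The sketch assumes Euclidean distance and single linkage (the closest pair of members defines the distance between two clusters); both are illustrative choices rather than recommendations from the text, and the sample coordinates are hypothetical.

```python
from math import dist

def hca(samples, n_clusters, linkage=min):
    """Agglomerative clustering; linkage=min is single, linkage=max is complete."""
    clusters = [[i] for i in range(len(samples))]  # start with one cluster per sample
    merges = []  # (cluster A, cluster B, distance): the information a dendrogram displays
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist(samples[i], samples[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

# Hypothetical preprocessed samples described by two responses.
samples = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
clusters, merges = hca(samples, n_clusters=2)
print(clusters)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

The list of merges, ordered by increasing distance, contains what a dendrogram draws: which groups were joined and at which level of dissimilarity.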

Supervised statistical techniques: linear discriminant analysis (LDA), partial least squares discriminant analysis (PLS-DA), k-nearest neighbors (KNN), soft independent modeling of class analogy (SIMCA)

PLS regression generalizes and combines features from PCA and multiple regression in a multivariate calibration method. Relationships between a matrix X of p predictor variables and a matrix Y of q response variables are modeled by using a new set of a latent variables (with a ≤ p). Then, a PLS algorithm is used to calculate the X score, weight, and loading matrices, the Y score and weight matrices, the PLS coefficients matrix, and the X and Y residuals. The most commonly used algorithms are the nonlinear iterative partial least squares (NIPALS) and the statistically inspired modification of PLS (SIMPLS) (de Jong 1993). The optimal number of latent variables is evaluated by cross-validation, that is, the training data set is divided into smaller data sets and parallel models are developed on all groups by sequentially removing components and evaluating the effect on the predictive ability of the model. The results can finally be validated against an independent (external) test set (Parente 2011). PLS is particularly useful to predict a dependent variable from a large set of independent variables (descriptors). PLS-DA consists of a PLS regression where the dependent variable is a categorical one, namely, a set of dummy variables describing the categories and expressing the class membership. The Y matrix is a set of binary (0 or 1) variables describing the categories; that is, the number of dependent variables is equal to the number of categories, and the PLS-DA is then run as if Y were a continuous matrix. The model performance is based on the rate of correct classification. A similar approach is used in LDA, in which a linear function is used to model descriptors against categorical dependent variables (Vandeginste and others 1998).
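A minimal sketch of the PLS-DA idea is given below for the simplest case of two classes. It is a simplification of what the text describes: a single 0/1 dummy variable is used instead of one dummy variable per category, the latent variables are computed with a NIPALS-type PLS1 procedure on mean-centered data, the number of latent variables is fixed rather than chosen by cross-validation, and the 0.5 decision threshold, the function names, and the data are all hypothetical.

```python
from math import sqrt
from statistics import mean

def fit_pls1(X, y, n_components=1):
    """PLS1 (single response) by NIPALS-type deflation on mean-centered data."""
    x_mean = [mean(c) for c in zip(*X)]
    y_mean = mean(y)
    E = [[x - m for x, m in zip(row, x_mean)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                               # y residuals
    n, p = len(E), len(E[0])
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]   # weights
        norm = sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]   # scores
        tt = sum(v * v for v in t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p_load, q))
    return x_mean, y_mean, components

def predict_pls1(model, x):
    """Predicted value of the dummy variable for a new sample x."""
    x_mean, y_mean, components = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p_load, q in components:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p_load)]
    return y_hat

def classify(model, x, threshold=0.5):
    """Assign class 1 when the predicted dummy value reaches the threshold."""
    return 1 if predict_pls1(model, x) >= threshold else 0

# Hypothetical training set: two descriptors, classes coded 0 and 1.
X = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [8.0, 9.0], [9.0, 8.0], [8.5, 8.5]]
y = [0, 0, 0, 1, 1, 1]
model = fit_pls1(X, y, n_components=1)
print(classify(model, [1.2, 1.8]), classify(model, [9.0, 9.0]))
```

In practice the number of latent variables would be selected by cross-validation and the classification rate confirmed on an external test set, as described in the text.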
Both PLS-DA and LDA have been used to classify fruit juices, aiming to mainly assess provenance/origin and authentication/adulteration of samples. The analytical descriptors most commonly used are spectral data from near-infrared (NIR) and mid-infrared (MIR), in addition to data from high-performance liquid chromatography (HPLC), gas chromatography (GC), and electronic tongue (E-tongue).
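As an illustration of the dummy coding described above, the following minimal Python sketch builds the binary Y matrix from a list of class labels. The function names (`dummy_code`, `assign_class`) and the example labels are hypothetical, and assigning a sample to the class with the largest predicted Y value is a common convention assumed here, not a rule stated in the studies cited.

```python
def dummy_code(labels):
    """Return (class_names, Y), where Y has one 0/1 column per class."""
    class_names = sorted(set(labels))
    column = {name: j for j, name in enumerate(class_names)}
    Y = [[0] * len(class_names) for _ in labels]
    for i, label in enumerate(labels):
        Y[i][column[label]] = 1  # 1 marks membership of sample i in its class
    return class_names, Y


def assign_class(predicted_row, class_names):
    """Assumed decision rule: pick the class with the largest predicted value."""
    best = max(range(len(class_names)), key=lambda j: predicted_row[j])
    return class_names[best]


# Hypothetical labels for three juice samples
class_names, Y = dummy_code(["Jonagold", "Bramley", "Jonagold"])
```

In a PLS-DA workflow, the matrix returned by `dummy_code` would be regressed on the descriptors as if it were continuous, and `assign_class` would then be applied to each row of predicted values.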

Comprehensive Reviews in Food Science and Food Safety, Vol. 13, 2014

© 2014 Institute of Food Technologists®

KNN is based on the determination of the distances between an unknown object and each of the objects of the training set. The k nearest objects to the unknown sample are selected and a majority rule is applied: the unknown is classified in the group to which the majority of the k objects belong. The choice of k is optimized by calculating the prediction ability with different k values. The method presents several advantages: it is mathematically simple; it is free from statistical assumptions, such as the normal distribution of the variables; and its effectiveness does not depend on the spatial distribution of the classes (Berrueta and others 2007). For the KNN analysis, the inverse square of the Euclidean distance can be used as the criterion for calculating the distance between samples, and the number of neighbors (k) is selected after studying the success in classification with different k values. This technique should be applied to a training set with all the samples.

SIMCA is the most used class-modeling technique. In SIMCA, each category is independently modeled using PCA, and it can be described by a different number of principal components. The number of principal components for each class in the training set is determined by cross-validation; in this way, a sufficient number of principal components is retained to account for most of the variation within each class. Another approach considers that each class is bounded by a region of space, which represents a confidence level (usually 95%) that a particular object belongs to a class. The discriminatory power measures how well a variable discriminates between 2 classes. This differs from the modeling power in the sense that a variable that models 1 class well does not necessarily discriminate 2 groups effectively (Berrueta and others 2007).

The reliability of the classification models achieved for each category should be studied by 2 indexes: recognition ability (percentage of the members of the training set correctly classified) and prediction ability (percentage of the members of the test set correctly classified by using the rules developed in the training step). SIMCA models should also be evaluated by their sensitivity (the percentage of objects belonging to the category that are correctly identified by the mathematical model) and specificity (the percentage of objects foreign to the category that are classified as foreign). The classification rules achieved by the supervised chemometric techniques are validated by dividing the complete data set into a training set and an evaluation (test) set. Samples are usually randomly assigned to a training set, consisting of 75% of samples, and a test set, composed of the remaining 25% of samples. Other divisions including 60% or 65% of samples in the training set have also been demonstrated to be effective. Such a division allows for a sufficient number of samples in the training set and a representative number of members in the test set (Granato and others 2011a).

Software for chemometrics

Due to the large size of the data sets and the complexity of calculations, the success of chemometric methods is dependent on specialized software. There are currently a number of specialized statistical packages for chemometric calculations, which are available for purchase, and there are also a large number of cost-free packages. In general, those that are easier to use are commercial versions (XLSTAT, Addinsoft Inc., New York, N.Y., U.S.A.; Statistica, Statsoft Inc., Tulsa, Okla., U.S.A.; Pirouette, Infometrix Inc., Bothell, Wash., U.S.A.; Unscrambler, CAMO Software Inc., Oslo, Norway). On the other hand, some cost-free licensed versions (including Scilab – INRIA: France; R – R Development Core Team: Austria) are still developing user-friendly graphical interfaces and usually require some command line programming. Although there are some toolboxes with graphical interfaces that can facilitate the use of these programs (Olivieri and others 2004; Clerc and others 2008), they are specific to a particular chemometric method. A free software package (Chemoface) with a user-friendly graphical interface for chemometric analyses was recently launched (Nunes and others 2012); it can solve a number of chemometric problems, including supervised and unsupervised methods, as well as design of experiments and response surface methodology.

WEKA (Waikato Environment for Knowledge Analysis) can be used to run numerous types of algorithms in order to preprocess and classify experimental results, perform regression analysis, clustering, and association rule mining, and provide a multivariate visualization of data. This software is available free under the GNU General Public License (Witten and others 2011). The main disadvantage of this software is related to the addition of mathematical and statistical functions/techniques, so the user has to combine WEKA with other software in order to analyze data. Likewise, if one uses the Statistica software, many multivariate statistical tools need to be purchased prior to data analysis, which may be a limiting factor for analysts.

Action software, which was developed by Brazilian scientists, is a free statistical package available to download (Portal Action 2013). This software is available in Portuguese and in English; it has a user manual indicating the steps to be followed to perform each analysis and it also has suitable graphics output. It is the 1st statistical system to utilize the R platform together with Microsoft Excel. Action software may be used to perform basic statistics (descriptive and inferential data analysis) as well as unsupervised statistical techniques (cluster analysis). One main disadvantage of this cost-free software is that it contains only HCA, so no other exploratory and classification techniques, including PCA and PLS-DA, are implemented. Moreover, when HCA is performed, the user needs to choose the number of clusters to be formed, which can be considered a drawback in data analysis.

R is the most used and comprehensive mathematical and statistical software for many practical and economic reasons: it is free of cost and offers a wide variety of models (nonlinear, linear, basic statistics, regression analysis, classification methods, cluster analysis, among others). Moreover, the graphics outputs have a high definition. Although very attractive and useful, R requires some computational skills because it is based on programming, which may limit its use. Although these statistical software packages are available and have attractive outputs, their advantages do not seem to have much influence on the popularity of the methods: multivariate statistical methods have been used mainly by chemometricians and seem to be far from being routinely used in analytical laboratories.

Fruit Juice Studies

Provenance and geographical origin

Nowadays, the origin of food products (mainly olive oils, different types of cheeses, chocolates, and wines) is monitored in order to assess the product quality. Indeed, assuring the product’s origin is a basic and intrinsic aspect with which to assess its typicality and, therefore, it represents a tool to monitor quality. In this regard, multiple analytical parameters, such as mineral content, phenolic compounds, organic acids content, and instrumental color, have been used as response variables to analyze commercial products in a wide range of countries (Cozzolino and others 2003; Pillonel and others 2003; Cosio and others 2006).

Sugar and acid concentrations of 98 grape juice samples were analyzed by PCA over 2 successive years, totaling 196 samples. A scatter plot of all genotypes was constructed, where PC1 explained 41.4% and PC2 22.3% of total variance. It was possible to observe the formation of 3 groups of cultivars based on their genetic characteristics or use purpose. The juices produced with the hybrid of Vitis labrusca and Vitis vinifera were associated with high sugars (principally sucrose) and low acids. Among the Vitis vinifera varieties, it was observed that wine grapes in general had more acids and sugars than table grapes (Liu and others 2006). All analytical measurements present an intrinsic variability, and when a large set of samples (n > 20) is analyzed for various quality parameters, PCA may not be able to explain most of the variance, as demonstrated by this study. However, this unsupervised technique is very useful to study data structure.

The origin of fruits differentiates their physicochemical composition. Ros and others (2004) characterized 21 samples of chaenomeles juice by physicochemical analysis. PCA was used to verify the differences between the juices; PC1 and PC2 were able to explain 63.50% of total variability and it was possible to observe the separation of the juices obtained from Lithuanian and Latvian genotypes, indicating a difference in the physicochemical composition of the juices. When the data set is somewhat limited, a high percentage of the variance is retained in the model and the scatter plot can be used to qualitatively differentiate samples from different locations, for example.
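The percentages of variance attributed to PC1 and PC2 in such studies are the eigenvalues of the covariance (or correlation) matrix expressed as a share of their sum. A minimal Python sketch for the simplest case of 2 measured variables is given here; the function name `explained_variance_2d` and the example values are hypothetical, and the covariance matrix is used without autoscaling, which is an assumption rather than the preprocessing adopted in the cited works.

```python
import math


def explained_variance_2d(x, y):
    """Percent of total variance carried by PC1 and PC2 for two variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample variances and covariance (n - 1 in the denominator)
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Closed-form eigenvalues of the 2 x 2 matrix [[sxx, sxy], [sxy, syy]]
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [half_trace + radius, half_trace - radius]
    total = sxx + syy
    return [100 * value / total for value in eigenvalues]


# Hypothetical sugar and acid readings for four juices
pc_percent = explained_variance_2d([10.1, 12.3, 11.0, 14.2], [0.9, 0.6, 0.8, 0.5])
```

With more than 2 variables the same ratio applies, but the eigenvalues are obtained numerically; when many weakly correlated variables are measured, the 1st components necessarily carry a smaller share, which is consistent with the modest percentages reported for larger data sets.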
Garcia-Wass and others (2000) analyzed 36 orange juice concentrates that originated from different countries using pyrolysis mass spectrometry (MS). PCA was effective in distinguishing the origin of the orange juices, separating the juices of Brazilian and Israeli origin. Therefore, pyrolysis MS, along with multivariate analysis, showed the potential to determine the origin of commercial juices.

Visible (VIS), NIR, and MIR spectroscopy, combined with pattern recognition methods, have been used to differentiate grape juices of Australian Chardonnay and Riesling varieties (Cozzolino and others 2012). Overall, LDA models correctly classified 86% and 80% of the grape juices, according to variety, using attenuated total reflectance (ATR)–MIR and VIS–NIR, respectively. When both vintages (2006 and 2008) were analyzed using VIS–NIR, the LDA model correctly classified 80% of the samples. The PLS-DA models produced overall rates of 83% and 90% of correct classification using VIS–NIR and ATR–MIR, respectively. In both LDA and PLS-DA, the misclassified samples were identified as those obtained from the smaller winery.

Headspace volatile compounds were used as descriptors to classify 40 apple juices on the basis of apple variety (Jonagold and Bramley) and applied heat treatment (heated/nonheated samples) (Reid and others 2004). A total of 18 volatile compounds were identified. PLS-DA gave 92.5% correct classification of the apple juices, according to both variety and heat treatment. For the LDA model, 87.5% and 80% of samples were correctly classified according to apple variety and heat treatment, respectively. Subsequently, the authors also used MIR and NIR spectroscopy to differentiate between apple juices on the basis of apple variety (Jonagold, Bramley, Golden Delicious, and Elstar), in addition to heat treatment (Reid and others 2005).
PLS-DA provided 77.2% (for both MIR and NIR data) of correct classification of the apple juices according to heat treatment. The model for variety correctly classified 78.3% to 100% using MIR data and 82.4% to 100% using NIR data. In general, the Elstar variety presented the lowest percentages of correct classification. LDA models were also evaluated, but a poor classification performance (about 50% of correct classification) was obtained.

Data from multielement analysis of lemon juices from different Argentinian regions were used to develop a reliable method for tracing the origin of lemon juices (Pellerano and others 2008). A total of 44 lemon juice samples were selected from 3 different areas of the northwest region of Argentina. Eleven elements were determined (Br, As, Na, Rb, La, Cr, Sc, Fe, Co, Zn, and Sb) and LDA was used to classify the juices. According to the LDA results, 93.2% of the original grouped cases were correctly classified, and 84.1% of the cross-validated grouped cases were correctly classified. To verify the reliability of the model, the method was tested using known samples as unknowns. In particular, a set of 3 samples, composed of 1 sample from each production area, was randomly removed 5 times, and the model was recalculated. In all cases, the samples were correctly classified. However, the ability of the model to classify new samples was not tested using an external validation data set, which limits the evaluation of the performance of the LDA method. It is important to note that LDA gives an indication of the classification of samples based on analytical data and, in this sense, if one aims at a more accurate statistical classification, PLS-DA may be a more suitable tool.

Gan and others (2014) used atmospheric pressure chemical ionization-MS (APCI-MS) analysis for the classification of 202 monovarietal clarified apple juices in relation to their geographical provenance and cultivar. Before the application of multivariate analysis for the classification of the apple juices, the data set was divided randomly into 2 lots.
The 1st lot, used to build the classification model, was composed of 143 samples, 70.8% of the total (internal validation set), and the other lot, used for validation, was formed of 59 samples (external validation). PLS-DA was applied to classify the clarified juices and it was able to correctly classify 100% and 94.2% of apple juices in relation to cultivar and geographical origin, respectively. Thus, PLS-DA was demonstrated to be a viable tool for exploring and interpreting APCI-MS data.

Nontargeted hydrogen nuclear magnetic resonance (¹H NMR) fingerprinting was used by Longobardi and others (2013) for the characterization of the geographical origin of 131 Italian sweet cherry juices. Multivariate statistical methods were used to distinguish the provenance from 2 different regions of Italy (Emilia Romagna and Puglia). PCA was applied to the data but only about 21% of the variance was explained by the 1st 3 principal components and, therefore, this method was not considered good enough to differentiate samples from different origins. On the other hand, using SIMCA it was possible to correctly classify 96.7% of cherry juices of the 2 origins. The model was verified by cross-validation and external validation, which gave performance rates of 83.7% and 87.2%, respectively. The results showed that the experimental data contained excellent information with which to construct appropriate models. This paper is a good example that, in some cases, PCA is not the best statistical approach to check for similarities/dissimilarities among samples.
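The random division of a data set into training and external validation lots, as in the 143/59 split reported by Gan and others (2014), can be sketched in Python as follows. This is only an illustrative sketch: the function names, the seed, and the toy class labels are hypothetical, and the split is not stratified by class, which a real study may require.

```python
import random


def split_train_test(samples, train_fraction=0.75, seed=0):
    """Randomly assign samples to a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def percent_correct(true_classes, predicted_classes):
    """Percentage of samples whose predicted class matches the true class."""
    hits = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return 100.0 * hits / len(true_classes)


# 202 hypothetical sample identifiers, about 70.8% kept for model building
train_ids, test_ids = split_train_test(range(202), train_fraction=0.708, seed=1)
```

Applied to the training set, `percent_correct` corresponds to the recognition ability of a model; applied to the held-out test set, it corresponds to the prediction ability.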

Authentication and adulteration

Adulteration of juices is an illegal practice adopted by some manufacturers and suppliers, who usually add cheaper products to highly valued fruit products. This practice can affect the nutritional value, the sensory properties, and the chemical composition of the original juice, and it also violates consumer rights and expectations. Therefore, the monitoring of the authenticity of food products, including fruit juices, represents a demanding and necessary task to be performed by regulatory agencies (governmental bodies), industry, and academic researchers.

In the Australian context, adulteration of orange juice includes the substitution of orange peel extract for juice, the labeling of juice as “national” when in truth it contains a quantity of imported product, and the substitution of orange juice by other juices. In order to differentiate the type of orange juice and peel extract, 482 samples of Australian and Brazilian orange juices and peel extracts were studied regarding the concentration of 22 trace elements using inductively coupled plasma-atomic emission spectrometry (ICP-AES) and ICP-MS. The PCA of the products showed a differentiation between them that could be associated with differences in soil and rootstock (Simpkins and others 2000).

Abad-García and others (2012) characterized the phenolic compounds of citrus juices from Spanish cultivars by HPLC-diode array detector-MS/MS and evaluated the data using chemometric techniques. First, all the phenolic compounds determined (49 different phenolic compounds) were analyzed by cluster analysis (CA) and PCA. The similarities of the samples were calculated on the basis of Euclidean distance, and Ward’s method was used to establish clusters. The dendrogram showed 5 clusters at a similarity of 0.6. The 1st cluster contained only lemon juices; the 2nd and 4th clusters were made up of tangerine juices; the 3rd cluster contained sweet orange juices; and the last cluster contained grapefruit juices. Using PCA, it was possible to explain 70% of the total variance with 3 principal components (PCs).
In the PCA plot, it was possible to observe 5 different groups, confirming the CA. The next step was to repeat the tests (PCA and CA) considering only the variables that had a concentration higher than 10 mg/L (15 phenolic compounds). It was found that there were extreme variations in the phenolic profile between the grapefruit and lemon juices compared with the sweet orange and tangerine juices, thus distorting the CA and PCA. Excluding the grapefruit and lemon juices, CA and PCA were performed for the sweet orange and tangerine juices. At a level of similarity from 0.4 to 0.6, 4 clusters were suggested: 1 group contained sweet orange juices, and there were 3 groups of tangerine juices (clusters formed in relation to subclass type). The tridimensional PCA plot showed a natural separation into 4 well-separated groups, which confirmed the CA. These 15 variables were able to explain almost all the information present in the full data set, and the exclusion of the other variables offers several advantages for the development of fast methods to guarantee fruit juice authenticity.

Physicochemical analyses (°Brix, ash, minerals, pH, acidity, aminic nitrogen, vitamin C, and nitrogen) associated with chemometrics were used by Vaira and others (1999) for the authentication of orange juice. The analyses were performed on 41 juice samples and the first 3 PCs of the PCA were able to explain 73.32% of total variance. PCR was performed with °Brix as the dependent variable for the detection of adulteration of orange juice by dilution or addition of sugar. The models constructed were tested with the theoretical dilution of the juices, and they were effective for the detection of adulteration in juices at 15% to 30% dilution.
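The cluster analyses summarized in this section rely on Euclidean distance and Ward's method. A minimal Python sketch of agglomerative clustering with Ward's criterion is given here; the function names and the toy data are hypothetical, and the procedure stops at a user-chosen number of clusters rather than producing a dendrogram with similarity levels, which is a simplification of what the cited studies report.

```python
def centroid(points, members):
    """Mean position of the points whose indices are listed in members."""
    n = len(members)
    return [sum(points[i][d] for i in members) / n for d in range(len(points[0]))]


def ward_cost(points, a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(points, a), centroid(points, b)
    squared_distance = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(points, n_clusters):
    """Merge the cheapest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        _, i, j = min(
            (ward_cost(points, clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical two-variable profiles for five juices
groups = ward_clusters([(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)], 3)
```

This naive version recomputes every pairwise cost at each step, so it is suitable only for small data sets; variables on different scales would normally be autoscaled before the distances are computed.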
Jandric and others (2014) studied the authenticity of fruit juices using ultra performance liquid chromatography coupled to quadrupole time-of-flight (UPLC-QTOF) MS through a metabolomic approach. PCA score plots showed segregation between pure pineapple juice and pineapple juice adulterated at a level of 1% with orange, grapefruit, apple, pomelo, and clementine juice. Clear separation was also observed between pure orange juice and orange juice adulterated at a level of 1% (added grapefruit, apple, pomelo, and clementine juice). A separation was also observed between pure grapefruit juice and grapefruit juice adulterated at a level of 5% (added apple, pomelo, and clementine juice). The results from this work showed that metabolomic data analyzed by chemometrics are potential screening tools for the rapid detection of juice adulteration (Oms-Oliu and others 2013).

Vigneau and Thomas (2012) investigated the use of ¹H NMR spectroscopic profiling, associated with a PLS approach, in order to classify authentic and adulterated orange juices. An experimental database of 150 samples of authentic orange juices and orange juices adulterated with a known percentage of clementine juice was evaluated. Six preprocessing strategies and various variable selection procedures were evaluated; the lowest error rate (3.47% of misclassification) was obtained using logarithm and Pareto scaling as preprocessing and variable selection based on backward interval PLS or on a genetic algorithm.

Likewise, Le Gall and others (2011) used ¹H NMR spectroscopy and multivariate analysis to detect adulteration with pulp wash in orange juice. A total of 263 authentic and 50 pulp wash juices were used to create the models. PCA was applied to the training set samples, and the separation between the authentic and pulp wash groups was used to create classification models. The LDA model built with 6 PCs correctly classified 94% of juices.

A low-cost flow-batch analyzer based on ultraviolet-visible spectrophotometry was used to classify citrus juices with respect to brand (Freitas and others 2013).
This study involved 150 samples of 6 commercial brands of processed citrus juice and 20 samples of fresh citrus juice, all taken from different lots. Samples were diluted with water in the proportion 1:70 (v/v). By excluding the saturation region at lower wavelengths, the working range was set to 232 to 358 nm. All samples evaluated were correctly classified by LDA and PLS-DA, whereas only 5 samples were not correctly classified by SIMCA, confirming the discriminatory power of the spectral data.

The HPLC-MS technique was used for comprehensive metabolomic fingerprinting of 84 samples of fruit juices prepared from expensive (orange) or relatively low-priced (apple, grapefruit) fruits (Vaclavik and others 2012). The metabolomic data were used as descriptors for the authentication of the samples, employing chemometrics. The classification ability of the LDA model was 100% and ranged from 92.9% to 94.1% for cross-validation. When the orange–apple and orange–grapefruit admixtures prepared at the 10% adulteration level were removed from the data set, a correct classification of 100% in both training and validation stages was achieved. The final LDA model enabled the reliable detection of a 15% addition of apple and grapefruit juice to orange juice; the presence of orange juice in grapefruit and apple juice was detectable at 25% and 10%, respectively.

He and others (2007) used MIR spectroscopy for the rapid authentication of 12 different juices of apple, blueberry, cranberry, Concord grape, and their blends. The spectra of the phenol-rich fraction of the juices were used to construct the SIMCA model for pattern recognition and 100% correct classification was achieved. None of the juice blends was assigned by the model as pure juice, supporting the predictive value of the phenol-rich fraction. For cross-validation and external validation, zero misclassification was achieved by the phenol-rich fraction model.
It is important to note that when the number of samples is limited, the classification of samples may be overestimated and results might have been different if more samples were added to the experiment.

A rapid authentication of Concord grape juice using Fourier transform infrared spectroscopy (FTIR) was performed by Snyder and others (2014). Discrimination between the 64 samples of grape juices was determined using SIMCA; the variability within the studied variety was minimal compared with the variability between the different varieties of grapes, which demonstrated good discrimination for the varietal model. The interclass distance values for this study varied from 17 to 41 for the grape juice variety. Concord grape juice was best discriminated from red grape juice, and least discriminated from Niagara grape juice. SIMCA was shown to be suitable for discriminating different varieties of grape juice.

Kelly and Downey (2005) evaluated cane syrup, high fructose corn syrup, beet sucrose, and a synthetic solution of fructose, glucose, and sucrose as potential adulterants in apple juices using MIR. KNN was used for the classification of 224 authentic and 480 adulterated juices. Correct classification rates of 96.5%, 93.9%, 92.2%, and 82.4% were found for partially inverted cane syrup, beet sucrose, high fructose corn syrup, and a synthetic solution of fructose, glucose, and sucrose, respectively. In a similar work, Downey and Kelly (2004) assessed adulteration (10% to 75% w/w) of strawberry and raspberry purées by apple by means of NIR. Results were analyzed by SIMCA and PLS. Classification models that presented 75% accuracy (strawberry) and 95% accuracy (raspberry) were obtained, and the authors found that the detection limit of adulterant in those purées was 20% w/w for strawberry and 25% w/w for raspberry.

¹H NMR was applied as a screening tool for the authenticity of orange juices by Vogels and others (1996).
Using 3 category classes, (1) authentic juices, (2) pulp wash, and (3) other nonauthentic juices, KNN analysis was performed and 98.8% of correct classification was observed with 2 PCs and 3, 4, 9, or 10 nearest neighbors. Likewise, Belton and others (1998) used PCA and LDA on ¹H NMR spectra of apple juices (n = 26) from Spartan, Russet, and Bramley varieties and obtained a good classification of juice samples (24 of 26 samples), where malic acid and sucrose contents were the main chemical markers that accounted for that classification.
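The KNN rule applied in studies such as Kelly and Downey (2005) and Vogels and others (1996), namely a majority vote among the k nearest training samples, can be sketched in Python as follows. The function names and the toy data are hypothetical; neighbors vote with equal weight (no inverse-square distance weighting), and leave-one-out evaluation is assumed here as one possible way of comparing k values.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, unknown, k):
    """Assign the unknown sample to the majority class of its k nearest neighbors."""
    distances = sorted(
        (math.dist(x, unknown), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k):
    """Percent of samples correctly classified when each is left out in turn."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_classify(rest_X, rest_y, X[i], k) == y[i]
    return 100.0 * hits / len(X)


# Hypothetical scores on 2 PCs for authentic and adulterated juices
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y = ["authentic"] * 3 + ["adulterated"] * 3
accuracy_by_k = {k: leave_one_out_accuracy(X, y, k) for k in (1, 3, 5)}
```

Comparing the entries of `accuracy_by_k` shows why k has to be tuned: with only 3 samples per class, a k as large as 5 forces the vote to be dominated by the other class.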

Sensory properties

Sensory analysis is a multidisciplinary science and has been widely used in Food Science and Technology to assess a product’s quality with respect to color, overall aspect, creaminess, taste, and odor, among other intrinsic quality parameters (Cruz and others 2010). In this regard, because products present differences, not only because of remarkable discrepancies in raw materials, but also because of technological operations employed in their production, outputs from different producers/origins are expected to present distinctive sensory characteristics. These differences, perceived or not, may be used to discriminate the product if one uses appropriate statistical techniques. Not only univariate methods such as ANOVA, but also chemometrics have been employed to analyze and extract as much information as possible from sensory data (Lawless and Heymann 2010).

Because of increasing demand, and in order to satisfy consumer acceptance, it is necessary to analyze the volatile profile and sensory properties of tomato juice. PCA was used to correlate the volatile compounds determined by solid phase microextraction (SPME)/GC-MS with the sensory parameters (obtained by quantitative descriptive analysis) of 12 commercial tomato juices commercialized in Italy and Spain. The 1st 2 PCs were able to explain 68.60% of total variance (40.81% for PC1 and 27.80% for PC2), where positive values of PC1 were correlated with 2-ethyl-1-hexanol, furfural, 2-isobutylthiazole, 1-nonanol, 3-methyl-1-butanol, 1,6-octadien-3,7-dimethyl-3-ol, and (E)-geranyl acetone, and negative values of PC1 were associated with saltiness, red intensity, and odor. The positive direction of PC2 was related to (Z)-3-hexenol, hexanal, and vegetable notes. Conventional tomato juices were arranged in a closer grouping; organic juices were more dispersed, and generally showed a higher concentration of volatiles (Vallverdú-Queralt and others 2013).

van Ruth and others (2008) used PCA to evaluate the mass spectral data of the volatile profiles of mandarin juices collected by proton transfer reaction (PTR)-MS, comparing juice in its natural form with concentrated juice reconstituted only with water and with concentrated juice reconstituted with water and added natural juice. The first 2 principal components were able to explain 79.83% of the total variance and the PCA suggested that the volatile characteristics of natural juice are lost during the concentration of the juice. Although the results seem to be obvious, PTR-MS coupled to chemometrics proved to be suitable to verify the effect of technological operations on the volatile compounds of juices.

Ciosek and others (2005) presented a strategy of data analysis for artificial taste and odor systems for orange juices using a sensor array. Five brands of orange juice were analyzed by an E-tongue and the responses were used as descriptors to classify the samples with respect to brand. Different approaches to data analysis were compared, for example, direct analysis (raw data analyzed by PLS-DA or artificial neural network [ANN]) and 2-stage analysis (PCA outputs analyzed by ANN or PLS-DA outputs analyzed by ANN). When PLS-DA was combined with ANN, the final model resulted in 100% correct classification of the juice brands for both training and test samples. In a similar study, Martina and others (2007) used an E-tongue to classify juices from different fruits (orange, pear, peach, and apricot) and brands (3 brands of orange juice). In all cases, 100% of correct classification was verified for both training and test samples.

A potentiometric E-tongue was used to discriminate orange beverages of different brands in order to evaluate its recognition ability in beverage detection (Liu and others 2012). Using the E-tongue coupled with SIMCA, it was possible to classify the samples from 6 brands of orange beverages with 97.99% correct classification. However, SIMCA did not distinguish 4 types of orange beverages and, therefore, this method showed a certain limitation when used to discriminate the samples.

A multisensor system was applied for the characterization of white grape juices by Gutiérrez-Capitán and others (2013). Data analysis was performed by multivariate analysis, and the SIMCA method was used to classify the juices. Three different models, for the Albariño, Muscat à Petit Grains Blanc, and Palomino varieties, were built using an unsupervised statistical technique (PCA). All 3 models were formed by 2 PCs, which explained 86%, 83%, and 86% of total variance, respectively. Therefore, the SIMCA method demonstrates a high potential for discriminating and classifying white grape varieties.

Chemical composition

The chemical composition of juices, especially phenolic compounds, is highly influenced by a variety of factors, such as maturity, variety, growing region, water stress, pathogenicity, and other agronomic and technological conditions (Fuleki and Ricardo-da-Silva 2003). Phenolic compounds have been successfully used to assess the authenticity and typicality of grape-based beverages, such as wine and juice (Garcia-Parrilla and others 1997). The phenolic content characterizes the varieties and may also provide useful information to establish differences among products from different origins, cultivars, and vintages (Granato and others 2012).

Three types of Italian commercial apricot juices (organic, concentrate, and conventional) were characterized in terms of carbohydrates, organic acids, amino acids, phenolic compounds, and furanic compounds by HPLC, and color parameters (a* coordinate). PCA on the chemical composition of juices explained 66% of the total variance with 2 principal components. The 1st component was associated with 5-hydroxymethylfurfural, malic acid, fructose, chlorogenic acid, and the a* coordinate (redness), while the 2nd component was associated with threonine and asparagine. The organic juices were separated from the other juices because of higher levels of the compounds associated with the 1st component (Versari and others 2008).

Braga and others (2013) used GC-flame ionization detection to measure the volatile composition of 18 apple juices from Fuji Suprema, Lis Gala, and Gala varieties, as well as apple fermented beverages; PCA together with HCA was used to classify samples and to understand how volatile compounds behave after fermenting the apple juice. By using PCA, 2 PCs explained more than 50% of data variability and a clear separation between juice and fermented beverage was observed. This result was confirmed by HCA and no further supervised statistical technique was necessary to explain the experimental results.
In order to demonstrate diversity among Spanish pomegranates, Mena and others (2011) performed the phytochemical characterization of juices from 15 pomegranate cultivars and used PCA and HCA to describe the differences between the samples. The 1st 3 principal components of PCA explained 74.28% of total variance; the 1st component showed a positive correlation with the b* coordinate, chroma, hue, lightness (L*), and juice yield, and a negative correlation with total phenolic compounds, ferric-reducing antioxidant power, the 2,2′-azinobis-3-ethylbenzthiazoline-6-sulfonic acid (ABTS) assay, and total monomeric anthocyanins. PC2 was positively correlated with TA, citric acid, and 2,2-diphenyl-1-picrylhydrazyl (DPPH), and negatively correlated with pH, malic acid, and maturity index. PC3 correlated positively with sugar and the a* coordinate and negatively with total punicalagins. HCA, based on the Euclidean distance and Ward’s method, was used to verify the similarity between the samples and grouped the juices into 2 clusters (represented graphically by a dendrogram). The 1st cluster included juices of Spanish origin, while the 2nd included juices from the rest of the world; the difference was shown by the phytochemical composition of the samples.
Cross-flow filtration techniques were employed to clarify apple juices, and the changes in polyphenol composition were monitored throughout the clarification process by HPLC (Mangas and others 1997). LDA, based on polyphenol composition, correctly classified 97% of ultrafiltered samples and 100% of microfiltered ones, so a good discrimination between ultrafiltered and microfiltered juices was verified. It was concluded that technological factors related to the clarification process of apple juice, such as membrane type, temperature, and process time, significantly influenced the profile of phenolic compounds, especially the hydroxycinnamic acids and procyanidin B1.
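The clustering step described for the pomegranate juices (Euclidean distance with Ward’s method) can be sketched as follows. This is a naive illustration written from the textbook definition of Ward’s criterion, in which the two clusters whose merger least increases the within-cluster sum of squares are joined at each step. It is not the software used by the cited authors, and the function name `ward_clusters` and the toy data are hypothetical.

```python
import numpy as np

def ward_clusters(X, n_clusters):
    """Agglomerative clustering with Ward's criterion (naive version).

    Starts with one cluster per sample and repeatedly merges the pair
    of clusters with the smallest increase in within-cluster sum of
    squares: n_a * n_b / (n_a + n_b) * ||centroid_a - centroid_b||^2.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = X[clusters[a]].mean(axis=0)
                cb = X[clusters[b]].mean(axis=0)
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * np.sum((ca - cb) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy data: two well-separated groups of three samples (not real juices)
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])
groups = ward_clusters(X, n_clusters=2)
print(groups)
```

The sketch returns only the final partition. A dendrogram additionally records the order and cost of every merge, which dedicated statistical packages provide.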

Simó and others (2004) used micellar electrokinetic chromatography with laser-induced fluorescence to determine the amino acid content of orange juices, and discriminant analysis was used to classify the juices into 3 types: nectar, juice reconstituted from concentrate, and pasteurized juice not from concentrate. The classification of 26 standard juices was based on the content of the 3 principal amino acids determined (L-arginine, L-asparagine, and gamma-aminobutyric acid). These 3 variables correctly classified 100% of the standard samples, including under cross-validation procedures.
Fluorescence spectra of 120 apple juice samples were obtained with the objective of evaluating the picking date of apples and correlating it with traditional harvest indices (Seiden and others 1996). A SIMCA model was applied to the spectra and it was possible to classify the samples as either Jonagold or Elstar in 2 different clusters, observed in a 2D scatter plot. The models were not able to predict the picking date of the apples in relation to the amount of titratable acids in the juices. When the soluble solids data were added, it was possible to separate the models by picking day (1, 2, and 7 for Jonagold).
Blanco-Gomis and others (1998) used the contents of some amino acids and riboflavin, together with pattern recognition statistical methods (KNN, LDA, PCA, SIMCA, PLS, and Bayes analysis), to typify 35 clarified and nonclarified apple juices. PCA was used to analyze the data structure and more than 84% of the variance was explained by 3 PCs, but no clear separation was achieved. SIMCA and Bayes analysis were deemed suitable to typify juices based on the use (or not) of membrane technology.
Duarte and others (2002) used FTIR-ATR spectroscopy coupled with chemometrics (PCA and PLS) to monitor the sugars (sucrose, fructose, and glucose) of mango juices during ripening. Ripening was evaluated on days 1, 3, 5, 9, 13, 17, 19, and 21, with 3 juices analyzed at each stage.
PCA of the FTIR spectra was performed to verify the separation among the juices as a function of ripening, with 99% of the variance explained by 2 PCs. Along PC1, the juices processed from fruits of the 1st days (1 to 5 d) were clearly separated from those of the other days. A PLS1 model developed with standard sugars was applied to the 1st derivative of the mango juice spectra; it predicted sugar concentrations in the juices with prediction errors (% root-mean-square error of prediction) of 5.1% for sucrose, 6.7% for fructose, and 8.0% for glucose up to the 9th d, rising to 25% for the final days of ripening.
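The prediction errors quoted for the PLS1 model are expressed as a percentage root-mean-square error of prediction. The text does not state how the percentage is normalized, so the sketch below assumes one common convention, the RMSEP divided by the mean reference value. The function name `rmsep_percent` and the numbers are hypothetical and are not data from Duarte and others (2002).

```python
import math

def rmsep_percent(reference, predicted):
    """Root-mean-square error of prediction as a percentage.

    Assumption: the error is normalized by the mean of the reference
    values; other conventions (for example, the reference range) exist.
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("reference and predicted must be non-empty and equal in length")
    n = len(reference)
    squared_errors = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    rmsep = math.sqrt(sum(squared_errors) / n)
    mean_reference = sum(reference) / n
    return 100.0 * rmsep / mean_reference

# Hypothetical sugar concentrations: reference method vs. model prediction
reference = [10.0, 20.0, 30.0]
predicted = [11.0, 19.0, 31.0]
print(round(rmsep_percent(reference, predicted), 2))  # 5.0
```

Here every prediction is off by 1 unit, so the RMSEP is 1 and, with a mean reference value of 20, the relative error is 5%.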

Antioxidant activity and other functional properties
Many studies indicate that reactive oxygen species and, consequently, oxidative stress are closely associated with a diverse assortment of diseases (Choe and Min 2009). Oxidative stress has been variously shown as depressed levels of antioxidant substances, low levels of endogenous enzymes (which form part of the antioxidant defense system), and increased levels of oxidation products (malondialdehyde and hydroperoxides, for example) (Macedo and others 2013). Antioxidants are compounds that, at low concentrations, significantly delay or inhibit the oxidation of oxidizable substrates (Halliwell and Gutteridge 2001). As antioxidant power is directly influenced by the phenolic composition and by interactions between such components, the bioactivity of fruit juices toward free radicals can also be used to assess their quality. Eight pomegranate juices were investigated by PCA and CA, based on antioxidant capacity measured by DPPH (EC50, AA), ABTS, and β-carotene-linoleate assays, total phenolic content (TPC), and total anthocyanin content (TAC). The dimensions of the data set were reduced to 2 components and this explained 93% of

the total variance. PC1 was responsible for 67.4% of the total information and correlated highly with ABTS, TPC, EC50, and β-carotene-linoleate. PC2 accounted for 25.9% of total variance and correlated highly with TAC and AA. The pomegranate juices were also grouped in clusters to see the dissimilarity between the samples, and the Euclidean distance and Ward’s method were used to classify the juices into 3 groups; 2 groups were formed of 3 samples each, and the other group contained 2 samples (Çam and others 2009). This study shows that even when the number of samples is limited, chemometrics may still be effective when a large number of response variables characterize the products.
Polyphenol content in fruits is influenced by cultivation type (conventional or organic production) and harvesting conditions. PCA was used to evaluate the differences between 33 samples of conventional and organic tomato juices. The 1st 2 principal components explained 85.30% of total variance; PC1 accounted for 77.43% and PC2 accounted for 7.87% of the total variation. This analysis enabled the authors to separate all samples into 2 groups, organic and conventional juices, and this graphical approach was suitable to differentiate the antioxidant capacity and phenolic compounds of tomato juices (Vallverdú-Queralt and others 2012).
Feng and others (2012) determined flavonoids (hesperidin, nobiletin, and tangeretin) by RP-HPLC in 25 samples of citrus fruit juices and commercial beverages. The results were submitted to PCA, and 2 PCs explained 91% of total variance, with the juices separated into 3 groups in accordance with the botanical classification (orange, mandarin, and hybrid). For the commercial juices, another PCA was carried out and the results showed that juices marketed as ‘pure juices’ lay on the opposite side of the scatter plot from the juices containing 10%, 15%, and 20% of fruit juice.
The juice of Tinospora cordifolia (a fruit popularly known as guduchi), which is an herbaceous vine of the Menispermaceae family, is well known for its immunomodulatory and adaptogenic properties. PLS-DA was used in a study to evaluate the changes in phytochemical composition of this juice during refrigerated storage. Juice samples were analyzed on days 0, 15, and 30 using UPLC-QTOF/MS. The most abundant metabolites were degraded by 70% to 80% over a period of 15 d and by 99% in 30 d. The PLS-DA model correctly classified 100% of the samples from 0 to 15 d of storage, while juices stored for 30 d had 66.7% correct classifications. This poorer classification was caused by the incorrect assignment of 1 sample stored for 30 d, which was attributed to the extensive degradation of samples stored for 15 d. The same was confirmed by comparing the abundances of ions of identified compounds in the juices, where samples stored for 15 d showed very low abundance compared to 0 d of storage (Shirolkar and others 2013).
Not only results from in vitro experiments but also those from ex vivo and in vivo protocols can be subjected to multivariate statistical techniques. Alezandro and others (2013) used PCA based on 9 biomarkers of oxidative stress in diabetic rats supplemented with 0 (control group), 1.0, or 2.0 g dry weight/kg of body weight of jaboticaba (Myrciaria jaboticaba) juice powder. The rats were clearly separated into different groups by 2 principal components, and 63.60% of the variability was explained by a 2D projection. The authors concluded that PCA was highly effective in showing differences among groups after 40 d of supplementation with jaboticaba fruit extract.
Likewise, Macedo and others (2013) used PCA to show the effect of red wine supplementation in Wistar rats fed a high-fat diet. The authors chose 8 biomarkers of oxidative stress to classify the rats according to the experimental design (LOW: wine with low in vitro antioxidant activity; MED: wine with intermediate antioxidant activity; HIGH: wine with high antioxidant activity). Two PCs explained almost 71% of data variability, and PCA was a suitable statistical approach to highlight differences between the groups treated with wine and the control group (water).

Table 3–Overall features, advantages, and limitations of the use of some multivariate statistical techniques used in Food Science and Technology.

Principal component analysis (PCA)
Overall features: Technique that reduces the dimensionality of the original data matrix, retaining the maximum amount of variability. A minimum of 3 variables should be assessed in a high number of samples, preferably n > 20.
Advantage: A simple analysis makes it possible to visualize the original data in an n-dimensional space by identifying the directions in which most of the information is retained.
Disadvantage: PCA cannot be used to classify objects, unlike what many research papers have shown. For this purpose, even if a clear separation is observed in a bi-dimensional plot, a supervised technique should be used in parallel.

Hierarchical cluster analysis (HCA)
Overall features: An unsupervised classification procedure that involves a collection of statistical methods that group objects that behave similarly or show similar characteristics. It is recommended to use at least 3 variables and 5 × 2^k cases (objects). The dendrogram for responses does not provide quantitative information about the association among response variables.
Advantage: The graphical representation (dendrogram) generated by HCA allows the visualization of clusters and correlations between either samples or variables simultaneously. These graphs present the data from high-dimensional row spaces in a way that facilitates the interpretation of experimental results.
Disadvantage: One potential limitation of HCA is deciding how many clusters should be derived from the data. Since HCA is based on algorithms, solutions may be nonunique.

Linear discriminant analysis (LDA)
Overall features: Model is based on a linear function that is used to model response variables (quantitative descriptive analysis, for example) against categorical-dependent variables (geographical origin, for example). LDA seeks to reduce dimensionality while preserving as much of the class discriminatory information as possible.
Advantage: This multivariate method may be used for several response variables and presents low error rates if the data follow a normal distribution.
Disadvantage: LDA is, in essence, a parametric statistical method; that is, if the distributions are significantly non-Gaussian, the LDA projections may not preserve the complex structure in the data needed for classification. LDA is extremely sensitive to outliers. LDA will fail when the discriminatory information is not in the mean but rather in the variance of the data.

K-nearest neighbors (KNN)
Overall features: Model is based on the classes of the k closest objects. Model optimization is based on an optimal value of k, which is selected by cross-validation procedures (the k giving the lowest classification error).
Advantage: Very simple implementation, robust to noisy training data, effective if training data are large.
Disadvantage: Biased by the value of k, computationally very intensive, easily fooled by irrelevant attributes.

Partial least square discriminant analysis (PLS-DA)
Overall features: Model is based on an algorithm that searches for latent variables with a maximum covariance with the classes. Model optimization is based on an optimal number of latent variables, which is selected by cross-validation procedures (aiming at the lowest classification error).
Advantage: Robustness in the face of data noise and missing data, not affected by multicollinearity because its components are orthogonal, easy to implement variable selection methods.
Disadvantage: Random noise creeps in as more latent variables are added; requires sufficient calibration samples for each class to enable effective determination of discriminant thresholds; the classes are divided using linear partitions in the classification space, which can be problematic if the class separation is nonlinear.

Soft independent modeling of class analogy (SIMCA)
Overall features: Model is based on a collection of G PCA models, one for each of G classes. Model optimization is based on the choice of the number of principal components retained for each class by cross-validation.
Advantage: Robust in cases where the different classes involve discretely different analytical responses; handles better the cases where the within-class variance structure is quite different for each class.
Disadvantage: The models are calculated with the aim of describing variation within each class, and no attempt is made to find directions that separate classes.
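Table 3 notes that the k of a KNN model is selected by cross-validation as the value giving the lowest classification error. A minimal sketch of that selection follows, assuming leave-one-out cross-validation, Euclidean distance, and majority voting. The function names and the toy data are hypothetical and do not come from any of the cited studies.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Predict the class of x by majority vote among its k nearest neighbors."""
    dist = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dist, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

def loo_error(X, y, k):
    """Leave-one-out classification error for a given k."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    wrong = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i  # hold out sample i
        if knn_predict(X[keep], y[keep], X[i], k) != y[i]:
            wrong += 1
    return wrong / len(X)

def select_k(X, y, candidates):
    """Return the candidate k with the lowest error (smallest k on ties)."""
    errors = {k: loo_error(X, y, k) for k in candidates}
    best_k = min(errors, key=lambda k: (errors[k], k))
    return best_k, errors

# Toy data: two well-separated classes of five samples each (not real juices)
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.2], [0.9, 0.8],
     [4.0, 4.1], [4.2, 3.9], [3.8, 4.0], [4.1, 4.2], [3.9, 3.8]]
y = [0] * 5 + [1] * 5
best_k, errors = select_k(X, y, candidates=[1, 3, 5])
print(best_k, errors)
```

Odd candidate values are used so that votes between 2 classes cannot tie. In practice the variables would usually be autoscaled first, because KNN distances are dominated by the variables with the largest numerical range.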

Summary of the application of chemometrics
Table 3 was designed to summarize the theoretical properties and to highlight some important characteristics of the chemometric tools described in this review, in order to help the reader choose the appropriate multivariate statistical technique for a given set of experimental data. Not only the advantages but also the drawbacks of each chemometric tool are described, and these limitations should be considered before such methods are applied to experimental data. Although we have pinpointed the interest and advantages of using multivariate statistical techniques in Food Science and Technology, companies as well as governmental regulatory agencies face a considerable number of hurdles in adopting these methods routinely: a lack of personnel trained to analyze and interpret statistical results; disinterest in statistical analysis; and software packages that are mostly expensive to implement across the different sectors of a food company (quality control, R&D), usually require constant updates, and sometimes require the purchase of add-on modules/components. All these limitations (commercial and related to the training of personnel) still hinder the use and dissemination of chemometrics in food research.

Concluding Remarks
The potential of chemometric techniques in the analysis of fruit juices has been widely demonstrated. These multivariate statistical methods are useful, not only to detect adulteration, to authenticate, and to assess provenance and origin, but also to explore sensory and chemical properties, thus establishing the typicality of diverse fruit juices. In this general sense, chemometrics is an important, effective, and very applicable tool for the quality control of fruit juices, which both the industry and inspection agencies can adopt to monitor fruit juice products, in addition to evaluating results in the field of research. We have shown that many sophisticated statistical techniques have been applied by scientists in diverse fields of Food Science and Technology. It is noteworthy that users need to have a fundamental and theoretical knowledge of the chosen method as well as of its applicability and limitations. Just as with any other classical statistical method, including basic statistics, the use of chemometrics requires an understanding of the principles of the method and of the meaning of the individual input parameters, as well as a critical evaluation of the results obtained.

Acknowledgments
The authors thank UEPG, UFPR, CNPq, and CAPES (CAPES/PNPD and CAPES/CNPq – Science Without Borders scholarship – D. Granato – Process # 8771–13–8) for financial support to conduct this research.

References
Abad-García B, Berrueta LA, Garmón-Lobato S, Urkaregi A, Gallo B, Vicente F. 2012. Chemometric characterization of fruit juices from Spanish cultivars according to their phenolic compounds contents: I. citrus fruit. J Agric Food Chem 60:3635–44.
Adams MJ. 2004. Chemometrics in analytical spectroscopy. Cambridge: Royal Society of Chemistry.
Alezandro MR, Granato D, Lajolo FM, Genovese MI. 2011. Nutritional aspects of second generation soy foods. J Agric Food Chem 59:5490–7.
Alezandro MR, Granato D, Genovese MI. 2013. Jaboticaba (Myrciaria jaboticaba (Vell.) Berg), a Brazilian grape-like fruit, improves plasma lipid profile in streptozotocin-mediated oxidative stress in diabetic rats. Food Res Intl 54:650–9.
Aprea E, Corollaro ML, Betta E, Endrizzi I, Demattè ML, Biasioli F, Gasperi F. 2012. Sensory and instrumental profiling of 18 apple cultivars to investigate the relation between perceived quality and odour and flavour. Food Res Intl 49:677–86.
Arvanitoyannis IS, Katsota MN, Psarra EP, Soufleros EH, Kallithraka S. 1999. Application of quality control methods for assessing wine authenticity: use of multivariate analysis (chemometrics). Trends Food Sci Tech 10:321–36.
Baiano A, Terracone C. 2011. Varietal differences among the phenolic profiles and antioxidant activities of seven table grape cultivars grown in the south of Italy based on chemometrics. J Agric Food Chem 59:9815–26.
Belton PS, Colquhoun IJ, Kemsley EK, Delgadillo I, Roma P, Dennis MJ, Sharman M, Holmes E, Nicholson JK, Spraul M. 1998. Application of chemometrics to the 1H NMR spectra of apple juices: discrimination between apple varieties. Food Chem 61:207–13.
Bernués A, Ripoll G, Panea B. 2012. Consumer segmentation based on convenience orientation and attitudes towards quality attributes of lamb meat. Food Qual Pref 26:211–20.
Berrueta LA, Alonso-Salces RM, Heberger K. 2007. Supervised pattern recognition in food analysis. J Chromatogr A 1158:196–214.
Besten MA, Nunes DS, Wisniewski Jr A, Sens SL, Granato D, Simionatto EL, Scharf DR, Dalmarco JB, Matzenbacher NI. 2013. Chemical composition of volatiles from male and female specimens of Baccharis trimera collected in 2 distant regions of southern Brazil: a comparative study using chemometrics. Quim Nova 36:1096–100.
Blanco-Gomis D, Fernandez-Rubio P, Guitiérrez-Alvarez MD, Magas-Alonso JJ. 1998. Use of high-performance liquid chromatographic-chemometric techniques to differentiate apple juices clarified by microfiltration and ultrafiltration. Analyst 123:125–9.

Boggia R, Casolino MC, Hysenaj V, Oliveri P, Zunin P. 2013. A screening method based on UV–Visible spectroscopy and multivariate analysis to assess addition of filler juices and water to pomegranate juices. Food Chem 140:735–41.
Braga CM, Zielinski AAF, Silva KM, Souza FKF, Pietrowski GAM, Couto M, Granato D, Wosiacki G, Nogueira A. 2013. Classification of juices and fermented beverages made from unripe, ripe and senescent apples based on the aromatic profile using chemometrics. Food Chem 141:967–74.
Burin VM, Falcão LD, Gonzaga LV, Fett R, Rosier JP, Bordignon-Luiz MT. 2010. Color, phenolic content and antioxidant activity of grape juice. Cienc Tecnol Aliment 30:1027–32.
Çam M, Hişil Y, Durmaz G. 2009. Classification of eight pomegranate juices based on antioxidant capacity measured by four methods. Food Chem 112:721–6.
Cheng H, Qin ZH, Guo XF, Hu XS, Wu JH. 2013. Geographical origin identification of propolis using GC–MS and electronic nose combined with principal component analysis. Food Res Intl 51:813–22.
Choe E, Min DB. 2009. Mechanisms of antioxidants in the oxidations of foods. Compr Rev Food Sci Food Safety 8:345–58.
Ciosek P, Brzózka Z, Wróblewski W, Martinelli E, Di Natale C, D’Amico A. 2005. Direct and two-stage data analysis procedures based on PCA, PLS-DA and ANN for ISE-based electronic tongue: effect of supervised feature extraction. Talanta 67:590–6.
Clerc F, Farrusseng D, Mirodatos C. 2008. OptiCat: a versatile open-source optimization platform for experimental design. Chemometr Intell Lab 93:167–71.
Correia PRM, Ferreira MMC. 2007. Non-supervised pattern recognition methods: exploring chemometrical procedures for evaluating analytical data. Quím Nova 30:481–7.
Cosio MS, Ballabio D, Benedetti S, Gigliotti C. 2006. Geographical origin and authentication of extra virgin olive oils by an electronic nose in combination with artificial neural networks. Anal Chim Acta 567:202–10.
Cozzolino D, Smyth HE, Gishen M. 2003. Feasibility study on the use of visible and near-infrared spectroscopy together with chemometrics to discriminate between commercial white wines of different varietal origins. J Agric Food Chem 51:7703–8.
Cozzolino D, Cynkar W, Shah N, Smith P. 2012. Varietal differentiation of grape juice based on the analysis of near- and mid-infrared spectral data. Food Anal Method 5:381–7.
Cruz AG, Cadena RS, Walter EHM, Mortazavian AM, Granato D, Faria JAF, Bolini HMA. 2010. Sensory analysis: relevance for prebiotic, probiotic, and synbiotic product development. Compr Rev Food Sci Food Safety 9:358–73.
Cruz AG, Souza SS, Celeghini MS, Ferreira MMC, Granato D, Sant’Ana AS, Faria JA. 2011. Monitoring the authenticity of Brazilian UHT milk: a chemometric approach. Food Chem 124:692–5.
Cruz AG, Cadena RS, Alvaro MBVB, Sant’Ana AS, Oliveira CAF, Bolini HMA, Faria JAF, Ferreira MM. 2013. Assessing the use of different chemometric techniques to discriminate low-fat and full-fat yogurts. LWT-Food Sci Technol 50:210–4.
Dani C, Oliboni LS, Vanderlinde R, Bonatto D, Salvador M, Henriques JAP. 2007. Phenolic content and antioxidant activities of white and purple juices manufactured with organically or conventionally-produced grapes. Food Chem Toxicol 45:2574–80.
Dani C, Oliboni LS, Umezu FM, Pasquali MAB, Salvador M, Moreira JCF, Henriques AP. 2009. Antioxidant and antigenotoxic activities of purple grape juice, organic and conventional, in adult rats. J Med Food 12:1111–8.
Dávalos A, Bartolomé B, Gómez-Cordovés C. 2005. Antioxidant properties of commercial grape juices and vinegars. Food Chem 93:325–30.
Downey G, Kelly D. 2004. Detection and quantification of apple adulteration in diluted and sulfited strawberry and raspberry purees using visible and near-infrared spectroscopy. J Agric Food Chem 52:204–9.
Duarte IF, Barros A, Delgadillo I, Almeida C, Gil AM. 2002. Application of FTIR spectroscopy for the quantification of sugars in mango juice as a function of ripening. J Agric Food Chem 50:3104–11.
Ellendersen LSN, Wosiacki G, Granato D, Guergoletto KB. 2012. Development and sensory profile of a probiotic beverage from apple fermented with Lactobacillus casei. Engr Life Sci 12:475–85.
Engel J, Gerretzen J, Szyman E, Jansen JJ, Downey G, Blanchet L, Buydens LMC. 2013. Breaking with trends in preprocessing? Trends Anal Chem 50:96–106.

Fabani MP, Ravera MJA, Wunderlin DA. 2013. Markers of typical red wine varieties from the Valley of Tulum (San Juan-Argentina) based on VOCs profile and chemometrics. Food Chem 141:1055–62.
Feng X, Zhang Q, Cong P, Zhu Z. 2012. Simultaneous determination of flavonoids in different citrus fruit juices and beverages by high-performance liquid chromatography and analysis of their chromatographic profiles by chemometrics. Anal Methods 4:3748–53.
Fernandes A, Barreira JCM, Antonio AL, Santos PMP, Martins A, Oliveira BPP, Ferreira ICFR. 2013. Study of chemical changes and antioxidant activity variation induced by gamma-irradiation on wild mushrooms: comparative study through principal component analysis. Food Res Intl 54:18–25.
Frankel EN, Bosanek CA, Meyer AS, Silliman K, Kirk LL. 1998. Commercial grape juices inhibit in vitro oxidation of human low-density lipoproteins. J Agric Food Chem 46:834–8.
Freitas SKB, Nascimento ECL, Dionízio AGG, Gomes AA, Araújo MCU, Galvão RKH. 2013. A flow-batch analyzer using a low cost aquarium pump for classification of citrus juice with respect to brand. Talanta 107:45–8.
Fuleki T, Ricardo-da-Silva JM. 2003. Effects of cultivar and processing method on the contents of catechins and procyanidins in grape juice. J Agric Food Chem 51:640–6.
Gan HH, Soukoulis C, Fisk I. 2014. Atmospheric pressure chemical ionisation mass spectrometry analysis linked with chemometrics for food classification, a case study: geographical provenance and cultivar classification of monovarietal clarified apple juices. Food Chem 146:149–56.
Garcia-Parrilla MC, Gonzalez GA, Heredia FJ, Troncoso AM. 1997. Differentiation of wine vinegars based on phenolic composition. J Agric Food Chem 45:3487–92.
Garcia-Wass F, Hammond D, Mottram DS, Gutteridge CS. 2000. Detection of fruit juice authenticity using pyrolysis mass spectroscopy. Food Chem 69:215–20.
Giacomino A, Abollino O, Malandrino M, Mentasti E. 2011. The role of chemometrics in single and sequential extraction assays: a review. Part II. Cluster analysis, multiple linear regression, mixture resolution, experimental design and other techniques. Anal Chim Acta 688:122–39.
Gollücke APB, Catharino RR, Souza JC, Eberlin MN, Tavarez DQ. 2009. Evolution of major phenolic components and radical scavenging activity of grape juices through concentration process and storage. Food Chem 112:868–73.
Granato D, Katayama FCU, Castro IA. 2010. Assessing the association between phenolic compounds and the antioxidant activity of Brazilian red wines using chemometrics. LWT-Food Sci Technol 43:1542–9.
Granato D, Branco GF, Faria JAF, Cruz AG. 2011a. Characterization of Brazilian lager and brown ale beers based on color, phenolic compounds, and antioxidant activity using chemometrics. J Sci Food Agric 91:563–71.
Granato D, Katayama FCU, Castro IA. 2011b. Phenolic composition of South American red wines classified according to their antioxidant activity, retail price and sensory quality. Food Chem 129:366–73.
Granato D, Katayama FCU, Castro IA. 2012. Characterization of red wines from South America based on sensory properties and antioxidant activity. J Sci Food Agric 92:526–33.
Granato D, Calado VMA, Jarvis B. 2014a. Observations on the use of statistical methods in Food Science and Technology. Food Res Intl 55:137–49.
Granato D, Oliveira CC, Calado VMA, Ares G. 2014b. Statistical approaches to assess the association between phenolic compounds and the in vitro antioxidant activity of Camellia sinensis and Ilex paraguariensis teas. Crit Rev Food Sci Nut. doi: 10.1080/10408398.2012.750233.
Granato D, Oliveira CC, Caruso MSF, Nagato LAF, Alaburda J. 2014c. Feasibility of different chemometric techniques to differentiate commercial Brazilian sugarcane spirits based on chemical markers. Food Res Intl. doi: 10.1016/j.foodres.2013.09.044.
Gutiérrez-Capitán M, Santiago JL, Vila-Planas J, Llobera A, Boso S, Gago P, Martínez MC, Jiménez-Jorquera C. 2013. Classification and characterization of different white grape juices by using a hybrid electronic tongue. J Agric Food Chem 61:9325–32.
Halliwell B, Gutteridge JMC. 2001. Free-radicals in biology and medicine. New York, N.Y.: Oxford Univ. Press.
He J, Rodrigues-Saona LE, Giusti M. 2007. Mid-infrared spectroscopy for juice authentication: rapid differentiation of commercial juices. J Agric Food Chem 55:4443–52.

Jandric Z, Roberts D, Rathor MN, Abrahim JA, Islam M, Cannavan A. 2014. Assessment of fruit juice authenticity using UPLC-ToF MS: a metabolomics approach. Food Chem 148:7–17.
de Jong S. 1993. SIMPLS: an alternative approach to partial least squares regression. Chemometr Intell Lab Sys 18:251–63.
Jung KJ, Wallig MA, Singletary KW. 2006. Purple grape juice inhibits 7,12-dimethylbenz[a]anthracene (DMBA)-induced rat mammary tumorigenesis and in vivo DMBA-DNA adduct formation. Cancer Lett 233:279–88.
Kallithraka S, Arvanitoyannis IS, Kefalas P, El-Zajouli A, Soufleros E, Psarra E. 2001. Instrumental and sensory analysis of Greek wines; implementation of principal component analysis (PCA) for classification according to geographical origin. Food Chem 73:501–14.
Kallithraka S, Mamalos A, Makris D. 2007. Differentiation of young red wines based on chemometrics of minor polyphenolic constituents. J Agric Food Chem 55:3233–9.
Kelly JFD, Downey G. 2005. Detection of sugar adulterants in apple juice using Fourier transform infrared spectroscopy and chemometrics. J Agric Food Chem 53:3281–6.
Lacerda D, Buchner I, Gomez R, Henriques JA, Dani C, Funchal C. 2012. Antioxidant and hepatoprotective effect of organic and conventional grape juice in rats supplemented with high fat diet. Free Radical Bio Med 53:S96–7.
Lawless HT, Heymann H. 2010. Sensory evaluation of food: principles and practices. New York, N.Y.: Springer. p 596.
Le Gall G, Puaud M, Colquhoun IJ. 2001. Discrimination between orange juice and pulp wash by 1H nuclear magnetic resonance spectroscopy: identification of marker compounds. J Agric Food Chem 49:580–8.
Liu HF, Wu BH, Fan PG, Li SH, Li LS. 2006. Sugar and acid concentrations in 98 grape cultivars analyzed by principal component analysis. J Sci Food Agric 86:1526–36.
Liu M, Wang J, Li D, Wang M. 2012. Electronic tongue coupled with physicochemical analysis for the recognition of orange beverages. J Food Quality 35:429–41.
Longobardi L, Ventrella A, Bianco A, Catucci L, Cafagna I, Gallo V, Mastrorilli P, Agostiano A. 2013. Non-targeted 1H NMR fingerprinting and multivariate statistical analyses for the characterization of the geographical origin of Italian sweet cherries. Food Chem 141:3028–33.
Macedo LFL, Rogero MM, Guimarães JP, Granato D, Lobato LP, Castro IA. 2013. Effect of red wines with different in vitro antioxidant activity on oxidative stress of high-fat diet rats. Food Chem 137:122–9.
Mangas JJ, Suarez B, Picinelli A, Moreno J, Blanco D. 1997. Differentiation by phenolic profile of apple juices prepared according to membrane techniques. J Agric Food Chem 45:4777–84.
Marafon AP, Sumi A, Granato D, Alcântara MR, Tamime AY, Oliveira MN. 2011. Effects of partially replacing skimmed milk powder with dairy ingredients on rheology, sensory profiling, and microstructure of probiotic stirred-type yogurt during cold storage. J Dairy Sci 94:5330–40.
Mark H, Workman Jr J. 2007. Chemometrics in spectroscopy. London: Academic Press.
Martina V, Ionescu K, Pigani L, Terzi F, Ulrici A, Zanardi C, Seeber R. 2007. Development of an electronic tongue based on a PEDOT-modified voltammetric sensor. Anal Bioanal Chem 387:2101–10.
Meléndez E, Ortiz MC, Sarabia LA, Íñiguez M, Puras P. 2013. Modeling phenolic and technological maturities of grapes by means of the multivariate relation between organoleptic and physicochemical properties. Anal Chim Acta 761:53–61.
Mena P, García-Viguera C, Navarro-Rico J, Moreno DA, Bartual J, Saura D, Martí N. 2011. Phytochemical characterization for industrial use of pomegranate (Punica granatum L.) cultivars grown in Spain. J Sci Food Agric 91:1893–906.
Meng JF, Xu TF, Qin MY, Zhuang XF, Fang YL, Zhang ZW. 2012. Phenolic characterization of young wines made from spine grape (Vitis davidii Foex) grown in Chongyi County (China). Food Res Intl 49:664–71.
Nunes CA. 2014. Vibrational spectroscopy and chemometrics to assess authenticity, adulteration and intrinsic quality parameters of edible oils and fats. Food Res Intl. doi: 10.1016/j.foodres.2013.08.041. In press.
Nunes CA, Freitas MP, Pinheiro ACM, Bastos SC. 2012. Chemoface: a novel free user-friendly interface for chemometrics. J Braz Chem Soc 23:2003–10.
Oliveri P, Downey D. 2012. Multivariate class modeling for the verification of food-authenticity claims. Trends Anal Chem 35:74–86.

© 2014 Institute of Food Technologists®

Olivieri AC, Goicoechea HC, Inon FA. 2004. MVC1: an integrated MatLab toolbox for first-order multivariate calibration. Chemometr Intell Lab 73:189–97.
Oms-Oliu G, Odriozola-Serrano I, Martín-Belloso O. 2013. Metabolomics for assessing safety and quality of plant-derived food. Food Res Intl 54:1172–83.
Onwezen MC, Reinders MJ, van der Lans IA, Sijtsema SJ, Jasiulewicz A, Guardia MD, Guerrero L. 2012. A cross-national consumer segmentation based on food benefits: the link with consumption situations and food perceptions. Food Qual Pref 24:276–86.
Parente E. 2011. Multivariate statistical tools for chemometrics. In: Fuquay JW, Fox PF, McSweeney PLH, editors. Encyclopedia of dairy sciences. San Diego: Academic Press. p 93–108.
Park YK, Kim JS, Kang MH. 2004. Concord grape juice supplementation reduces blood pressure in Korean hypertensive men: double-blind, placebo controlled intervention trial. Biofactors 22:145–7.
Park YK, Lee SH, Park E, Kim JS, Kang MH. 2009. Changes in antioxidant status, blood pressure, and lymphocyte DNA damage from grape juice supplementation. Ann NY Acad Sci 1171:385–90.
Pellerano RG, Mazza SS, Marigliano RA, Marchevsky EJ. 2008. Multielement analysis of Argentinean lemon juices by instrumental neutronic activation analysis and their classification according to geographical origin. J Agric Food Chem 56:5222–5.
Pillonel L, Luginbühl W, Picque D, Schaller E, Tabacchi R, Bosset JO. 2003. Analytical methods for the determination of the geographic origin of Emmental cheese: mid- and near-infrared spectroscopy. Eur Food Res Technol 216:174–8.
Portal Action. 2013. User’s Manual. Available from: http://www.portalaction.com.br/. Accessed 2013 August 30.
Reid LM, O’Donnell CP, Kelly JD, Downey G. 2004. Preliminary studies for the differentiation of apple juice samples by chemometric analysis of solid-phase microextraction-gas chromatographic data. J Agric Food Chem 52:6891–6.
Reid LM, Woodcock T, O’Donnell CP, Kelly JD, Downey G. 2005. Differentiation of apple juice samples on the basis of heat treatment and variety using chemometric analysis of MIR and NIR data. Food Res Intl 38:1109–15.
Rodrigo R, Castillo R, Carrasco R, Huerta P, Moreno M. 2005. Diminution of tissue lipid peroxidation in rats is related to the in vitro antioxidant capacity of wine. Life Sci 76:889–900.
Rodriguez-Campos J, Escalona-Buendía HB, Orozco-Avila I, Lugo-Cervantes E, Jaramillo-Flores ME. 2011. Dynamics of volatile and non-volatile compounds in cocoa (Theobroma cacao L.) during fermentation and drying processes using principal components analysis. Food Res Intl 44:250–8.
Ros JM, Laencina J, Hellín P, Jordán MJ, Vila R, Rumpunen K. 2004. Characterization of juice in fruits of different Chaenomeles species. LWT-Food Sci Technol 37:301–7.
van Ruth SM, Frasnelli J, Carbonell L. 2008. Volatile flavour retention in food technology and during consumption: juice and custard examples. Food Chem 106:1385–92.
Seiden P, Bro R, Poll L, Munck L. 1996. Exploring fluorescence spectra of apple juice and their connection to quality parameters by chemometrics. J Agric Food Chem 44:3202–5.
Shanmuganayagam D, Warner TF, Krueger CG, Reed JD, Folts JD. 2007. Concord grape juice attenuates platelet aggregation, serum cholesterol and development of atheroma in hypercholesterolemic rabbits. Atherosclerosis 190:135–42.
Shirolkar A, Gahlaut A, Hooda V, Dabur R. 2013. Phytochemical composition changes in untreated stem juice of Tinospora cordifolia (W) Mier during refrigerated storage. J Pharm Res 7:1–6.
Simó C, Martín-Alvarez P, Barbas C, Cifuentes A. 2004. Application of stepwise discriminant analysis to classify commercial orange juices using chiral micellar electrokinetic chromatography-laser induced fluorescence data of amino acids. Electrophoresis 25:2885–91.
Simpkins WA, Louie H, Wu M, Harrison M, Goldberg D. 2000. Trace elements in Australian orange juices and other products. Food Chem 71:423–33.
Snyder AB, Sweeney CF, Rodriguez-Saona LE, Giusti M. 2014. Rapid authentication of concord juice concentration in a grape juice blend using Fourier-transform infrared spectroscopy and chemometric analysis. Food Chem 147:295–301.

Vol. 13, 2014 r Comprehensive Reviews in Food Science and Food Safety 315

Sun T, Chen QY, Wu LJ, Yao XM, Sun XJ. 2012. Antitumor and antimetastatic activities of grape skin polyphenols in a murine model of breast cancer. Food Chem Toxicol 50:3462–7.
Tamborra P, Esti M. 2010. Authenticity markers in Aglianico, Uva di Troia, Negroamaro and Primitivo grapes. Anal Chim Acta 660:221–6.
Vaclavik L, Schreiber A, Lacina O, Cajka T, Hajslova J. 2012. Liquid chromatography–mass spectrometry-based metabolomics for authenticity assessment of fruit juices. Metabolis 8:793–803.
Vaira S, Mantovani VE, Robles JC, Sanchis JC, Goicoechea HC. 1999. Use of chemometrics: principal component analysis (PCA) and principal component regression (PCR) for the authentication of orange juice. Anal Lett 32:3131–41.
Vallverdú-Queralt A, Medina-Remón A, Casals-Ribes I, Lamuela-Raventos RM. 2012. Is there any difference between the phenolic content of organic and conventional tomato juices? Food Chem 130:222–7.
Vallverdú-Queralt A, Bendini A, Tesini F, Valli E, Lamuela-Raventos RM, Toschi TG. 2013. Chemical and sensory analysis of commercial tomato juices present on the Italian and Spanish markets. J Agric Food Chem 61:1044–50.
Vandeginste BGM, Massart DL, Buydens LMC, De Jong S, Lewi PJ, Smeyers-Verbeke J. 1998. Handbook of chemometrics and qualimetrics: Part B. Chapter 33. Amsterdam: Elsevier.
Varela P, Ares G. 2012. Sensory profiling, the blurred line between sensory and consumer science. A review of novel methods for product characterization. Food Res Intl 48:893–908.
Vargas PN, Hoelzel SC, Rosa CS. 2008. Determinação do teor de polifenóis totais e atividade antioxidante em sucos de uva comerciais. Alim Nutr 19:11–5.
Varmuza K, Filzmoser P. 2008. Introduction to multivariate statistical analysis in chemometrics. 1st ed. Boca Raton, Fla.: CRC Press.
Vedana MIS, Ziemer C, Miguel OG, Portella AC, Candido LMB. 2008. Efeito do processamento na atividade antioxidante de uva. Alim Nutr 19:159–65.
Versari A, Parpinello GP, Mattioli AU, Galasi S. 2008. Characterization of Italian commercial apricot juices by high-performance liquid chromatography analysis and multivariate analysis. Food Chem 108:334–40.
Vigneau E, Thomas F. 2012. Model calibration and feature selection for orange juice authentication by 1H NMR spectroscopy. Chemometr Intell Lab 117:22–30.

Vigneau E, Charles M, Chen M. 2014. External preference segmentation with additional information on consumers: a case study on apples. Food Qual Pref 32:83–92.
Visschers VHM, Hartmann C, Leins-Hess R, Dohle S, Siegrist M. 2013. A consumer segmentation of nutrition information use and its relation to food consumption behaviour. Food Pol 42:71–80.
Vogels JTWE, Terwel L, Tas AC, van den Berg F, Dukel F, van der Greef J. 1996. Detection of adulteration in orange juices by a new screening method using proton NMR spectroscopy in combination with pattern recognition techniques. J Agric Food Chem 44:175–80.
Witten IH, Frank E, Hall MA. 2011. Data mining: practical machine learning tools and techniques. Available from: www.cs.waikato.ac.nz/ml/weka. Accessed 2013 September 5.
Wold S, Sjöström M, Eriksson L. 2001. PLS-regression: a basic tool of chemometrics. Chemometr Intell Lab 58:109–30.
Worch T. 2013. PrefMFA, a solution taking the best of both internal and external preference mapping techniques. Food Qual Pref 30:180–91.
Wu D, He Y, Nie P, Cao F, Bao Y. 2010. Hybrid variable selection in visible and near-infrared spectral analysis for non-invasive quality determination of grape juice. Anal Chim Acta 659:229–37. Available from: http://www.sciencedirect.com/science/article/pii/S0003267009015566.
Yildirim HK, Akçay YD, Güvenç U, Altindisli A, Sözmen EY. 2005. Antioxidant activities of organic grape, pomace, juice, must, wine and their correlation with phenolic content. Intl J Food Sci Tech 40:133–42.
Zhang X, Dagevos H, He Y, van der Lans I, Zhai F. 2008. Consumption and corpulence in China: a consumer segmentation study based on the food perspective. Food Pol 33:37–47.
Zhang X, Huang J, Qiu H, Huang Z. 2010. A consumer segmentation study with regards to genetically modified food in urban China. Food Pol 35:456–62.
Zhao M, Downey G, O’Donnell CP. 2014. Detection of adulteration in fresh and frozen beefburger products by beef offal using midinfrared ATR spectroscopy and multivariate data analysis. Meat Sci 96:1003–11.
Zielinski AAF, Haminiuk CWI, Alberti A, Nogueira A, Demiate IM, Granato D. 2014. A comparative study of the phenolic compounds and the in vitro antioxidant activity of different Brazilian teas using multivariate statistical techniques. Food Res Intl. doi: 10.1016/j.foodres.2013.09.010.

