
Journal of Applied Research on Medicinal and Aromatic Plants 9 (2018) 124–131


Journal of Applied Research on Medicinal and Aromatic Plants journal homepage: www.elsevier.com/locate/jarmap

Artificial neural network and multiple regression analysis models to predict essential oil content of ajowan (Carum copticum L.)


Mohsen Niazian a,b,⁎, Seyed Ahmad Sadat-Noori a, Moslem Abdipour c

a Department of Agronomy and Plant Breeding Science, College of Aburaihan, University of Tehran, Tehran-Pakdasht, Iran
b Department of Tissue and Cell Culture, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), 3135933151 Karaj, Iran
c Kohgiluyeh and Boyerahmad Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Yasuj, Iran

ARTICLE INFO

Keywords: Ajowan; Artificial neural network; Multiple linear regressions; Oil content; Selection criteria; Trachyspermum ammi L.

ABSTRACT

Ajowan is an important medicinal plant that grows in arid and semi-arid regions of central Europe, India, Egypt, Iran, Iraq, Afghanistan, and Pakistan. Essential oil is the most widely consumed product of ajowan in the pharmaceutical and food industries, and accurate prediction of oil content is one of the main goals of ajowan breeding programs. Two methods, artificial neural network (ANN) and multiple linear regression (MLR), were used to predict the oil content of ajowan from readily measurable plant characters. Based on simple correlation analysis, four characters (number of rays, number of pedicels, number of flowers per umbellet, and number of umbellets in an umbel) were selected as input variables for both the ANN and MLR models. The essential oil content of ajowan was well predicted by an ANN with the SigmoidAxon transfer function and two hidden layers, with a root mean square error (RMSE) of 0.192%, a mean absolute error (MAE) of 0.112% and a determination coefficient (R2) of 0.901. The ANN performed better than the MLR model, which had an RMSE of 0.262 and an R2 of 0.748. Based on the stepwise regression and ANN analyses, the most important characters for oil content of ajowan were the number of umbellets in an umbel and the number of flowers per umbellet, and these traits can be used as selection criteria for essential oil content of ajowan.

1. Introduction

Ajowan (Carum copticum L.) is one of the important industrial medicinal plants of the Apiaceae family and can be used in raw or processed form in traditional medicine or the modern pharmaceutical industry (Niazian et al., 2017a). The plant grows mainly in arid and semi-arid regions of eastern India, the northwest, central and eastern parts of Iran, central Europe, Iraq, Afghanistan, Pakistan and Egypt (Ashraf and Orooj, 2006; Boskabady et al., 2014; Joshi, 2000; Moosavi et al., 2015). Ajowan seeds contain an essential oil with about 50% thymol, which has strong germicidal, anti-spasmodic and fungicidal effects (Ashraf and Orooj, 2006). Many medicinal and aromatic plants do not have stable production in their growing areas and are usually wild-harvested to meet demand (Dalkani et al., 2012; Niazian et al., 2017b). Hence, stable production of medicinal plants, in both quality and quantity, is important for meeting the growing demands of the pharmaceutical industry. Seed is the most important part of ajowan. A positive correlation between seed yield and essential oil content has been reported in ajowan (Fadaei Heidari et al., 2016), so seed yield and essential oil content are the most important breeding objectives in this plant. Seed yield and oil content are quantitative, complex traits that are controlled by many genes and strongly affected by environmental conditions (Dalkani et al., 2011), which leads to the low heritability of these traits. For such traits, indirect selection through yield components and their associations is the first choice of plant breeders, since it helps them to improve the desired traits indirectly (Dalkani et al., 2011). Several methods are available for the analysis of yield components, and the choice depends on the objectives of the project. Techniques such as analysis of variance, simple correlation coefficients, multiple regression and path analysis are usually used to analyze yield components (Fraser and Eaton, 1983). One of the simplest methods that helps to better understand yield components and

Abbreviations: ANN, artificial neural network; BY, biological yield; IL, average internodal length; LL, leaf length; MAE, mean absolute error; MLP, multi-layer perceptron; MLR, multiple linear regressions; MSE, mean square error; NFU, number of flowers per umbellet; NP, number of pedicels; NR, number of rays; NU, number of umbels; NUU, number of umbellets in an umbel; OC, oil content; PH, plant height; RMSE, root mean square error; SPSY, single plant seed yield
⁎ Corresponding author at: Department of Agronomy and Plant Breeding Science, College of Aburaihan, University of Tehran, Pakdasht-Tehran, Iran.
E-mail address: [email protected] (M. Niazian).
https://doi.org/10.1016/j.jarmap.2018.04.001
Received 5 November 2017; Received in revised form 3 April 2018; Accepted 5 April 2018; Available online 24 April 2018
2214-7861/ © 2018 Elsevier GmbH. All rights reserved.

Table 1 Means of all recorded characteristics of ajowan in 2014 and 2015 growing seasons.

No. Ecotype          PH (cm)           NR              NP              NFU             NU              NUU
                    2014    2015    2014    2015    2014    2015    2014    2015    2014    2015    2014    2015
1   Karaj          86.66   90.66   10.66    8.33    5.00    5.00   14.33   14.32   21.00   22.00    7.66    7.33
2   Esfahan        93.33  106.33    9.33    6.33    6.00    6.00   13.00   13.33   17.00   17.00    5.66    4.00
3   Ardabil1      105.33  109.00    6.33    6.33    6.00    6.00   11.33   10.66   17.33   17.00    5.00    5.00
4   Flavarjan      80.00   77.33    7.33    8.00    6.00    5.00   11.66   11.33   18.33   18.41    5.66    5.30
5   Ardabail2      51.00   47.00    6.66    6.33    3.66    4.00   12.00   12.66   15.66   17.66    3.00    4.00
6   Hamadan        50.66  46.667    5.33    6.00    4.33    5.00    9.00    9.00   18.33   17.00    5.66    4.33
7   Shahedye       99.00   78.33    8.33    6.66    4.33    5.00   14.60   14.10   21.00   20.66    7.66    6.66
8   Sadogh         83.33   83.33    5.66    4.66    5.00    6.00   11.00   12.00   19.00   19.00    5.00    4.00
9   Ghazvin        62.66   71.33    7.33    7.66    4.66    5.00   13.00   14.66   21.66   22.66    5.00    4.33
10  Yazd           76.66   92.66    7.33    6.66    6.00    5.00   13.66   12.30   19.33   16.66    5.66    5.66
11  Rafsanjan1     45.00   57.00    5.66    6.66    5.30    6.00    9.00    8.33   14.00   14.66    3.33    2.20
12  Rafsanjan2     80.00   85.33    6.00    7.66    5.30    4.66   14.00   14.00   19.66   19.33    4.10    4.00
13  Yazd2          82.33   93.33    6.66    6.66    7.00    4.66   12.30   12.00   19.33   20.33    5.00    4.66
14  Sarbisheh      39.00   52.00    8.33    8.33    5.30    6.00   12.33   13.00   21.00   21.33    5.00    5.00
15  Birjand1       53.33   58.66    8.33    7.33    5.00    5.33    8.33   10.00   18.00   20.00    4.33    5.00
16  Birjand2       57.66   71.33    6.33    5.33    5.00    4.30   11.00   11.66   17.66   17.33    4.33    5.66
17  Ghaen          40.00   50.00   10.66    8.00    5.00    6.66   11.33   14.66   23.33   23.33    7.33    7.90
18  Boshroye       61.00   56.66   10.66    7.00    5.00    5.33   11.66   14.20   21.66   21.66    5.00    5.00
19  Sarbishe2      95.00   81.00    8.33    7.00    5.00    5.66   13.00    8.33   19.66   19.00    3.00    4.66
20  Ghoom          70.00   63.00    5.33    6.33    4.66    4.66   12.66   13.33   20.00   16.33    5.33    5.00
21  Shiraz         59.00   58.33   10.66    7.00    5.00    5.00   14.20   14.00   21.66   23.45    7.33    7.00
22  Arak           30.33   28.33    8.66    6.33    4.00    4.00    9.33    8.33   17.66   16.66    3.00    4.00
23  Marvdasht      66.33   77.33    5.66    6.00    5.00    5.00    7.32    9.33   16.20   16.33    4.66    4.33

Table 1 (continued)

No. Ecotype          LL (cm)         IL (cm)          BY (g)         SPSY (g)          OC (%)
                    2014    2015    2014    2015    2014    2015    2014    2015    2014    2015
1   Karaj           3.66    4.00    7.66   7.166  206.96  197.67  100.33  102.97    4.33    4.35
2   Esfahan         4.66    4.00    9.33    9.33  147.80  135.87   41.46   50.33    3.76    3.20
3   Ardabil1        5.66    4.66    6.66   10.66   99.60   99.10   36.10   35.20    2.17    2.23
4   Flavarjan       5.66    6.00    5.60    6.50   78.10   87.33   39.76   45.83    2.41    2.52
5   Ardabail2       4.00    4.66    5.00    5.63  118.26   110.8   51.66   44.77    2.66    2.56
6   Hamadan         5.66    6.00    7.66    6.66   16.90   23.00   14.63   17.73    1.60    1.23
7   Shahedye        4.33    4.33    5.60    7.33  183.80  181.33   85.00   76.40    3.46    3.07
8   Sadogh          3.66    3.66    6.60    5.33  184.10  162.33   71.93   65.00    2.25    1.83
9   Ghazvin         4.33    4.00    7.00    6.83   34.80   65.00   15.66   24.37    1.40    1.20
10  Yazd            4.33    4.66    5.60    6.00  167.16   158.6   38.23   54.07    2.78    2.50
11  Rafsanjan1      4.66    6.00    7.66    4.00   40.10   36.67   16.80   15.30    1.56    1.43
12  Rafsanjan2      4.00    4.33    7.33    7.50  166.60  158.07   46.90   52.27    2.61    2.51
13  Yazd2           7.33    7.00    8.66    7.33  145.60  137.03   52.40   56.93    2.56    2.68
14  Sarbisheh       3.33    3.50    7.33    8.66   46.26   43.33   20.76   23.20    1.39    1.10
15  Birjand1        5.66    4.50    4.50    7.16   49.23   54.27   10.53   30.90    2.16    2.16
16  Birjand2        3.66    3.83    5.00    5.00  131.79  146.93   48.46   45.83    2.66    2.58
17  Ghaen           4.00    4.33    7.66    6.00  261.56  277.77  111.06   78.96    3.81    3.68
18  Boshroye        3.83    4.50    9.33    6.50  187.43  166.77   80.80   84.07    3.53    3.74
19  Sarbishe2       3.66    5.00    6.66    8.66  108.83  110.93   35.70   37.10     1.9    1.51
20  Ghoom           5.00    4.33    6.66    8.00   91.56   95.33   40.70   40.15    2.16    2.16
21  Shiraz          3.33    4.66    4.66    6.33  177.00  165.43  162.93  143.47    4.13    3.47
22  Arak            1.33    2.66    7.00    4.83   32.23   49.00   10.33   13.80    1.55    1.23
23  Marvdasht       4.83    5.33    7.66    8.00   77.43   78.07   15.00   16.23    1.19    1.04

PH: plant height, NR: number of rays, NP: number of pedicels, NFU: number of flowers per umbellet, NU: number of umbels, NUU: number of umbellets in an umbel, LL: leaf length, IL: average internodal length, BY: biological yield, SPSY: single plant seed yield, OC: oil content.

M. Niazian et al.




growing seasons. In each experiment, 23 Iranian ajowan ecotypes (Table 1) were cultivated in a randomized complete block design (RCBD) with three replicates. Each plot contained four rows 7 m long, with 60 cm between rows and 10 cm between plants within a row. In each experiment, oil content (OC) and associated characters, namely plant height (PH), number of rays (NR), number of pedicels (NP), number of umbels (NU), number of umbellets in an umbel (NUU), number of flowers per umbellet (NFU), leaf length (LL), average internodal length (IL), biological yield (BY) and single plant seed yield (SPSY), were recorded separately. The number of flowers was counted on randomly selected umbellets. For essential oil extraction, a 20 g sample consisting of a mixture of seeds of four accessions from each ecotype was ground with an electric grinder; the resulting fine powder was added to 500 ml of distilled water on a heater at 100 °C, and the oil was extracted for 2.5 h using a Clevenger-type apparatus (Noori et al., 2017). Each extraction was conducted in three replications. The percentage of extracted essential oil was calculated on the basis of the ground seed. These field data were used for training and testing the ANN and MLR models.
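The field data feed a random 70/30 training/testing partition (Section 2.2). A minimal sketch of such a partition over sample indices is given here for illustration; the function name `split_indices`, the fixed seed and the use of Python's `random` module are assumptions of this sketch, not the NeuroSolutions procedure used in the study.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=42):
    """Randomly partition sample indices into training and testing sets."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    rng.shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


# 23 ecotypes x 4 accessions x 3 replications x 2 years
n_samples = 23 * 4 * 3 * 2
train_idx, test_idx = split_indices(n_samples)
print(n_samples, len(train_idx), len(test_idx))
```

Splitting indices rather than the records themselves keeps the input traits and the oil content values of each sample together when they are stored in separate lists.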

assists in effective selection is correlation coefficient analysis (Mishra et al., 2015). Because the relationship between two traits may depend on a third trait, the correlation coefficient alone is not informative enough to explain cause-and-effect relationships among variables (Bahmani et al., 2015). A reasonable explanation of the observed correlations can, however, be obtained with path analysis (Bahmani et al., 2015). Path analysis, a statistical method for testing cause-and-effect relationships among correlated variables, has been used frequently in medicinal plants (Bhandari and Gupta, 1991; Cosge et al., 2009; Dalkani et al., 2011; Lal, 2007). The main problem of regression-based models is that they cannot explain highly nonlinear and complex relationships between seed yield and its components (Emamgholizadeh et al., 2015). To overcome this problem, artificial intelligence (AI) models such as artificial neural networks (ANN), gene expression programming (GEP) and adaptive neuro-fuzzy inference systems (ANFIS) have been used in recent years (Azamathulla and Ghani, 2011; Emamgholizadeh et al., 2013a,b; Iquebal et al., 2014; Samadianfard et al., 2014; Shahinfar et al., 2012; Silva et al., 2014; Niazian et al., 2018a,b). An artificial neural network is an intelligent model that acts like the human brain (Tufail et al., 2008). Neurons, or nodes, are the simple processing elements that together make up an ANN. The neurons of an ANN are connected to each other through direct communication links. Each communication link has its own weight, made by transfer functions (Safa et al., 2015). The weights of the communication links represent the information used by the network to solve the problem (Emamgholizadeh et al., 2015). Furthermore, different training algorithms can be used, depending on the training convergence of the ANN (Govindaraju, 2000a,b).
One of the ANN types commonly used in agricultural research is the multi-layer perceptron (MLP) (Emamgholizadeh et al., 2015; Naroui Rad et al., 2015; Safa et al., 2015). The MLP consists of an input layer, one or more hidden layers and an output layer (Emamgholizadeh et al., 2015). An MLP uses nonlinear functions to transform m inputs into n outputs. Two important factors that affect the performance of an MLP are the number of hidden layers and the type of transfer function. One or two hidden layers are sufficient in most cases (Erzin et al., 2008). Various criteria are used to assess the performance of ANNs with different hidden layers and transfer functions, but MSE, RMSE, MAE and R2 are the most widely used (Safa et al., 2015). Because many herbal medicines are free from side effects, interest in plant products has increased considerably all over the world (Ashraf and Orooj, 2006), but this increasing demand requires the production of medicinal plants in stable quantity and quality. In ajowan, essential oil content and composition can be affected by genetic as well as climatic factors and soil conditions (Rahimmalek et al., 2017). High variability of genetic and climatic factors can lead to the non-deterministic and non-linear nature of the developmental processes of biological entities (Prasad et al., 2016). Conventional statistical approaches such as correlation coefficients, multiple linear regression and path analysis can be used for the indirect improvement of quantitative and complex traits, but these methods have shortcomings, and more sophisticated methods such as artificial neural networks can help to understand these complex and unpredictable developmental phenomena of biological systems.
The objectives of the present study were (a) to predict the essential oil content of ajowan using the artificial neural network method, (b) to compare the predictions with those of the conventional regression-based method, and (c) to find the most important selection criteria for essential oil content of Iranian ajowan ecotypes using the ANN and MLR models.

2.2. Statistical analysis

Before the analysis of variance, raw data were tested for normality using SAS software (Ver. 9.1) (Cary, 2004), and the analysis of variance was then conducted on the data collected in each year. The combined means matrix of ecotypes and traits was used to calculate simple Pearson correlation coefficients among all investigated characters, also using SAS. The ANN computations were carried out in NeuroSolutions version 5.0 (www.nd.com), with essential oil content as the dependent variable and the remaining traits as independent variables. The ANN was trained and tested on 552 samples, collected from four accessions of each of the 23 ajowan ecotypes studied over two years (2014 and 2015) in three replications. The 552 field data points were randomly divided into two sets: 70% (386 data points) for training and 30% (166 data points) for testing the network. An overview of the mean of each trait in the two years of the experiment is presented in Table 1.

2.3. Artificial neural network

Before training the neural network, the input data were normalized so that they were converted to numbers between 0 and 1. Because the SigmoidAxon threshold function, whose output lies between 0 and 1, was selected for the processing elements (neurons) in the hidden layer, the input data were also scaled to between 0 and 1. For inputs near 0 or 1, the changes in the neuron weights are minimal, whereas for inputs near 0.5 the neurons respond faster to the input signal (Wosten et al., 1999). For this reason, the data were normalized in such a way that the mean of each data series was 0.5, using the following equation (Eq. (1)):

$$x_{norm} = 0.5\left(\frac{x_0 - \bar{x}}{x_{max} - x_{min}}\right) + 0.5 \tag{1}$$
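As an illustration, Eq. (1) can be applied to a data series as in the following minimal sketch. The function name `normalize` is hypothetical, and the example values are taken to be the 2014 NUU means of the first five ecotypes in Table 1 (an assumption about the table layout).

```python
def normalize(values):
    """Scale a data series with Eq. (1): 0.5 * (x - mean) / (max - min) + 0.5."""
    mean = sum(values) / len(values)
    span = max(values) - min(values)
    if span == 0:
        raise ValueError("all values are identical; Eq. (1) is undefined")
    return [0.5 * (x - mean) / span + 0.5 for x in values]


# Assumed to be the 2014 NUU means of the first five ecotypes in Table 1
nuu = [7.66, 5.66, 5.00, 5.66, 3.00]
scaled = normalize(nuu)
print([round(s, 3) for s in scaled])
```

With this form of scaling the mean of the series maps to 0.5 and every value stays within 0 and 1, while the scaled series itself spans a width of 0.5 rather than the full unit interval.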

In Eq. (1), $x_{norm}$ is the normalized value of the input $x_0$, $\bar{x}$ is the mean of the data, and $x_{max}$ and $x_{min}$ are the maximum and minimum values of the data, respectively. A multi-layer perceptron ANN with the Levenberg-Marquardt learning algorithm and eight different activation functions (SigmoidAxon, Axon, LinearSigmoidAxon, TanhAxon, BiasAxon, LinearAxon, LinearTanhAxon and SoftMaxAxon; Table 2) was used in the present study. The data for the four independent variables NR, NP, NFU and NUU (which were significantly correlated with essential oil content) were

2. Materials and methods

2.1. Plant material and field experiments

Two field experiments were conducted at the agricultural research field of the College of Aburaihan, University of Tehran, in the 2014 and 2015



hidden layers were assessed with RMSE (Eq. (2)), MAE (Eq. (3)) and R2 (Eq. (4)).

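Table 2 Neuron activation functions.

Activation function    Formula
SigmoidAxon            $f(x_i, w_i) = \dfrac{1}{1 + \exp[-x_i^{lin}]}$
Axon                   $f(x_i, w_i) = x_i$
LinearSigmoidAxon      $f(x_i, w_i) = \begin{cases} 0 & x_i^{lin} < 0 \\ 1 & x_i^{lin} > 1 \\ x_i^{lin} & \text{else} \end{cases}$
TanhAxon               $f(x_i, w_i) = \tanh[x_i^{lin}]$
BiasAxon               $f(x_i, w_i) = x_i + w_i$
LinearAxon             $f(x_i, w_i) = \beta x_i + w_i$
LinearTanhAxon         $f(x_i, w_i) = \begin{cases} -1 & x_i^{lin} < -1 \\ 1 & x_i^{lin} > 1 \\ x_i^{lin} & \text{else} \end{cases}$
SoftMaxAxon            $f(x_i, w_i) = \dfrac{\exp[x_i^{lin}]}{\sum_j \exp[x_j^{lin}]}$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i - P_i)^2} \tag{2}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}|O_i - P_i| \tag{3}$$

$$R^2 = \frac{\left[\sum_{i=1}^{n}(O_i - \bar{O})(P_i - \bar{P})\right]^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2 \sum_{i=1}^{n}(P_i - \bar{P})^2} \tag{4}$$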

where n is the number of data points, Oi is the observed value, Pi is the predicted value, and the bar denotes the mean of the variable.
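The three criteria can be computed as in the following minimal sketch, which reads R2 as the squared Pearson correlation between observed and predicted values (one common reading of Eq. (4)). The function names are hypothetical, the observed values are taken from the OC column of Table 1, and the predicted values are invented purely for illustration.

```python
import math


def rmse(obs, pred):
    """Root mean square error, Eq. (2)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error, Eq. (3)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Determination coefficient read as the squared Pearson correlation, Eq. (4)."""
    o_bar = sum(obs) / len(obs)
    p_bar = sum(pred) / len(pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    ss_o = sum((o - o_bar) ** 2 for o in obs)
    ss_p = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (ss_o * ss_p)


observed = [4.33, 3.76, 2.17, 2.41]   # oil content (%), first four OC values of Table 1
predicted = [4.10, 3.50, 2.40, 2.30]  # hypothetical model output, for illustration only
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

Because squared errors weight large deviations more heavily, RMSE is never smaller than MAE for the same set of predictions.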

2.4. Multiple linear regressions

To obtain a multiple regression model for essential oil content from the independent variables, a stepwise regression analysis was performed using SAS software. The stepwise regression was carried out with the same input and output data used for the MLP-ANN model. Independence of errors is one of the assumptions of regression; in the present study it was evaluated with the Durbin-Watson test, and a Durbin-Watson value of 1.85 indicated that the errors were independent. The variance inflation factor (VIF) of the independent variables in model 4 (Table 6) showed that these variables were not collinear. The Kolmogorov-Smirnov test, also run in SAS, was used to assess the normality of the dependent variable. It showed that the data were not normally distributed, so a logarithmic transformation was applied to normalize them.

3. Results and discussion

3.1. Simple correlation coefficient

The simple correlation analysis showed that biological yield and single plant seed yield had the highest correlations with essential oil content of Iranian ajowan ecotypes (Table 3). Number of flowers per umbellet and number of umbellets in an umbel were significantly correlated with oil content at the 1% probability level (r = 0.67 and 0.61, respectively). Number of rays and number of pedicels were also significantly correlated with oil content, at the 5% probability level (Table 3). Fadaei Heidari et al. (2016) reported a high positive correlation between essential oil and 1000-seed weight in different Iranian ajowan ecotypes. Ghanshyam et al. (2015) used genotypic and phenotypic correlation

Fig. 1. An umbel of the ajowan medicinal plant with its components as inputs of the ANN and MLR models.

directly fed to the input layer of the ANN, and the predicted OC was produced in the output layer. Fig. 1 shows an umbel of ajowan with the four inputs. Different transfer functions and numbers of hidden layers were tested to find the optimal performance of the final model. The performance of the ANN with different transfer functions and different

Table 3 Correlation coefficient analysis for investigated traits in Iranian ecotypes of ajowan.

Character  PH        NR        NP        NFU       NU        NUU       LL        IL        BY        SPSY      OC
PH         1.00
NR         −0.03ns   1.00
NP         0.32ns    0.19ns    1.00
NFU        0.29ns    0.50*     0.60**    1.00
NU         −0.03ns   0.77**    0.40*     0.68**    1.00
NUU        0.16ns    0.54**    0.17ns    0.57**    0.56**    1.00
LL         0.23ns    −0.14ns   0.26ns    −0.37ns   −0.37ns   −0.06ns   1.00
IL         0.93**    0.13ns    0.47*     0.30ns    0.30ns    0.17ns    0.05ns    1.00
BY         0.41*     0.41*     0.41*     0.70**    0.50*     0.53**    −0.31ns   0.1ns     1.00
SPSY       0.28ns    0.44*     0.61**    0.69**    0.52**    0.53**    −0.27ns   −0.09ns   0.93**    1.00
OC         0.29ns    0.43*     0.40*     0.67**    0.39ns    0.61**    −0.12ns   0.10ns    0.84**    0.82**    1.00

PH: plant height, NR: number of rays, NP: number of pedicels, NFU: number of flowers per umbellet, NU: number of umbels, NUU: number of umbellets in an umbel, LL: leaf length, IL: average internodal length, BY: biological yield, SPSY: single plant seed yield, OC: oil content.
* Significant at 5% probability level.
** Significant at 1% probability level.
ns Not significant.



Table 4 The performance of the artificial neural network model with different transfer functions to predict essential oil content (%) of ajowan.

Transfer function     Training                      Testing
                      R2 a     RMSE b   MAE c       R2 a     RMSE b   MAE c
SigmoidAxon           0.90     0.19     0.11        0.88     0.23     0.14
Axon                  0.05     0.36     0.23        0.04     0.44     0.20
LinearSigmoidAxon     0.12     0.48     0.34        0.10     0.35     0.38
TanhAxon              0.86     0.20     0.12        0.82     0.23     0.16
BiasAxon              0.07     0.51     0.45        0.05     0.53     0.40
LinearAxon            0.07     0.46     0.34        0.06     0.39     0.34
LinearTanhAxon        0.59     0.27     0.22        0.51     0.29     0.25
SoftMaxAxon           0.00     0.51     0.67        0.00     0.54     0.69

a Determination coefficient. b Root mean square error. c Mean absolute error.

Table 6 Stepwise regression analysis of essential oil content (dependent variable) and other morphological traits (independent variables) of Iranian ajowan ecotypes.

Step   Variable entered   Variables in model    Partial R-Square a   R-Square b
1      NUU                NUU                   0.3121               0.3121
2      NFU                NUU, NFU              0.2244               0.5365
3      NR                 NUU, NFU, NR          0.1705               0.7070
4      NP                 NUU, NFU, NR, NP      0.0702               0.7772

Durbin-Watson value = 1.854
VIF for all variables (VIF < 5)
Tolerance for all variables (TOL < 1)

NFU: number of flowers per umbellet, NP: number of pedicels, NR: number of rays, NUU: number of umbellets in an umbel, TOL: tolerance, VIF: variance inflation factor. a Partial determination coefficient. b Determination coefficient.

coefficients analyses to find the interrelationships among single plant seed yield, oil content and four yield component traits (number of secondary branches per plant, number of umbels per plant, number of umbellets per umbel and harvest index) in 28 diverse Indian germplasm lines of ajowan, and did not report any significant correlation between essential oil content and the other investigated traits. In the present study, however, the essential oil content of Iranian ajowan ecotypes was significantly correlated with the investigated morphological and yield traits. According to the correlation analysis, oil content of ajowan was strongly associated with morphological characteristics, namely number of rays, number of pedicels, number of flowers per umbellet, and number of umbellets in an umbel, so these variables were used to estimate the OC of ajowan in both the ANN and MLR models.
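The correlation screening behind this choice of inputs can be illustrated with a minimal sketch of the Pearson coefficient. The function name `pearson_r` is hypothetical, and the two series are assumed to be the 2014 NUU and OC means of the 23 ecotypes in Table 1; because Table 3 was computed from the combined means of both years, the value obtained here is not expected to reproduce the tabulated coefficient exactly.

```python
import math


def pearson_r(x, y):
    """Simple Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Assumed column assignment: 2014 means of the 23 ecotypes from Table 1
nuu_2014 = [7.66, 5.66, 5.00, 5.66, 3.00, 5.66, 7.66, 5.00, 5.00, 5.66, 3.33, 4.10,
            5.00, 5.00, 4.33, 4.33, 7.33, 5.00, 3.00, 5.33, 7.33, 3.00, 4.66]
oc_2014 = [4.33, 3.76, 2.17, 2.41, 2.66, 1.60, 3.46, 2.25, 1.40, 2.78, 1.56, 2.61,
           2.56, 1.39, 2.16, 2.66, 3.81, 3.53, 1.9, 2.16, 4.13, 1.55, 1.19]

r = pearson_r(nuu_2014, oc_2014)
print(f"r(NUU, OC) = {r:.2f}")
```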

3.2. Essential oil content prediction using MLP/ANN model

After the input variables had been chosen, the efficiency of the ANN was assessed with different numbers of hidden layers and different transfer functions. The lowest RMSE and MAE and the highest R2 values were achieved with the SigmoidAxon function, followed by the TanhAxon function, in both training and testing (Table 4), whereas the least accurate ANN model was obtained with the SoftMaxAxon transfer function. The accuracy of the ANN model in terms of R2, RMSE and MAE for different numbers of hidden layers showed that the best result was achieved with two hidden layers (Table 5). With two hidden layers, RMSE = 0.19%, MAE = 0.11% and R2 = 0.90 were obtained in the training stage and RMSE = 0.23%, MAE = 0.14% and R2 = 0.88 in the testing stage. The complexity of the ANN structure, the total number of input and output units, the number of samples used in training, the extent of noise in the sample set, and the training algorithm are the factors that can affect the number of hidden layers and hidden units in an ANN (Erzin et al., 2008; Movagharnejad and Nikzad, 2007; Naroui Rad et al., 2015; Panda et al., 2010; Zhang et al., 2011). The capability of an ANN model to produce high-level statistical data depends on the hidden layer (Naroui Rad et al., 2015). From the results in Tables 4 and 5 it can be deduced that the SigmoidAxon transfer function with two hidden layers gives the best ANN model for predicting essential oil content of ajowan. According to the scatter and box plots, there was no significant difference between measured and predicted OC in the ANN model in either the training (Fig. 2a and b) or the testing dataset (Fig. 3a and b). The same distribution and the same median of the measured and predicted data in the box plots of both the training and testing stages show the power of the ANN in predicting ajowan OC.

3.3. Oil content prediction using MLR model

Stepwise regression for the MLR model was conducted using SAS software to find the respective contributions of the four input variables (number of rays, number of pedicels, number of flowers per umbellet, and number of umbellets in an umbel) to the total variation in oil content of ajowan. The same input and output data used for the ANN model were applied to the regression model. The regression model computed by SAS was as follows:

$$OC = -3.417 + 0.042\,NR + 0.094\,NP + 0.154\,NFU + 0.641\,NUU \tag{5}$$
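As a worked example, the fitted equation can be evaluated for a single set of trait values. In the sketch below the function name `predict_oc` is hypothetical and the inputs are assumed to be the 2014 means of the first ecotype in Table 1; whether the fitted output is on the original or the log-transformed scale is not stated, so the result is illustrative only.

```python
def predict_oc(nr, n_p, nfu, nuu):
    """Evaluate the fitted MLR model for oil content from the four input traits."""
    return -3.417 + 0.042 * nr + 0.094 * n_p + 0.154 * nfu + 0.641 * nuu


# Assumed to be the 2014 means of the first ecotype in Table 1
oc_hat = predict_oc(nr=10.66, n_p=5.00, nfu=14.33, nuu=7.66)
print(round(oc_hat, 2))  # 4.62
```

The coefficient of NUU is the largest of the four, so a change of one umbellet per umbel shifts the predicted value more than a unit change in any other input.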

In Eq. (5), OC is oil content, NR is the number of rays, NP is the number of pedicels, NFU is the number of flowers per umbellet, and NUU is the number of umbellets in an umbel. The MLR formulation (Eq. (5)) shows the importance and effect of the independent variables on the dependent variable, that is, how the oil content of ajowan changes with different values of NR, NP, NFU, and NUU. The stepwise regression analysis showed that the highest partial R-square belonged to NUU, which entered the model in the first step (Table 6). NFU, NR and NP entered the regression model in the subsequent steps, in that order (Table 6). Bahmani et al. (2015) used stepwise regression to find the contribution of independent variables to the total variation in essential oil content of fennel (Foeniculum vulgare L.) and reported that 32.46, 3.63 and 2.33% of the total variance of oil content were explained by number of leaves, plant height and days to 50% flowering, respectively. Scatter and box plots were also used to compare observed and predicted OC values from the MLR model in both the training and testing datasets. Fig. 4 shows the ability of the MLR model to predict OC in the training (R2 = 0.77) and testing (R2 = 0.74) stages. These values were lower than those obtained with the ANN model (R2 = 0.90 and R2 = 0.88, respectively). According to the scatter and box plots for the training (Fig. 4a and b) and testing stages (Fig. 4c and d) of the MLR model, the distribution and

Table 5 The performance of the artificial neural network model with different hidden layers to predict essential oil content (%) of ajowan.

Number of hidden layer(s)   Training                      Testing
                            R2 a     RMSE b   MAE c       R2 a     RMSE b   MAE c
1                           0.86     0.22     0.14        0.85     0.25     0.16
2                           0.90     0.19     0.11        0.88     0.23     0.14
3                           0.88     0.21     0.12        0.86     0.23     0.14
4                           0.85     0.28     0.22        0.82     0.31     0.21
5                           0.82     0.31     0.27        0.79     0.33     0.29

a Determination coefficient. b Root mean square error. c Mean absolute error.



Fig. 2. Measured and predicted oil content of ajowan in ANN model. (a) Scatter plot of measured and predicted oil content in training stage of ANN. (b) Box plot of measured and predicted oil content in training stage of ANN.

other statistical parameters of the predicted values differed from those of the measured values, and it is clear that the ANN model is more powerful than the MLR model in predicting oil content of ajowan (Figs. 2 and 3). Emamgholizadeh et al. (2015) used ANN and MLR models to predict seed yield of sesame (Sesamum indicum L.) from five independent variables and reported that the ANN model performed better than the MLR model in seed yield prediction. Mokarram and Bijanzadeh (2016) used an MLR model along with two ANN models, MLP and radial basis function (RBF), to predict biological yield (BY) and yield (Y) of barley and reported that the MLP model had the highest R2 values for predicting BY and Y.

3.4. Sensitivity analysis of the governing variables on the oil content

The sensitivity test is the final step of ANN modelling and helps to select the most important inputs (Nourani and Sayyah Fard, 2012). To find the input variables with the greatest effect on oil content of ajowan, sensitivity tests were conducted for both the ANN and MLR models. The highest RMSE (0.442%) and MAE (0.316%) and the lowest R2 (0.483) were obtained for the ANN model without NUU (Table 7). In the MLR model, too, the highest RMSE (0.512%) and MAE (0.401%) and the lowest R2 (0.403) corresponded to the model without NUU (Table 7). These results agree with those of the stepwise regression (Table 6) and indicate that the number of umbellets in an umbel is the most important independent variable that can greatly affect the oil content of ajowan. In another study, an ANN was applied to predict final fruit weight and to select important variables in an Iranian population of melon (Cucumis melo L.), and flesh diameter was reported as an important independent variable affecting final fruit weight (Naroui Rad et al., 2015).

4. Conclusion

Although various multivariate regression-based models are available for finding the associated traits that most strongly affect desired low-heritability quantitative traits in plants, these models cannot interpret complex relationships between dependent and independent variables, or complex inter- and intra-relations among the independent variables, especially when nonlinear relations are prevalent. ANN is one of the powerful methods that can help plant breeders to overcome these shortcomings. In the present study, an ANN model was applied to predict oil content and to find the most important characters affecting the OC of ajowan. According to the different analyses, the MLP model with the SigmoidAxon transfer function and two hidden layers was the best model for predicting OC of ajowan, and the ANN predicted better than the MLR model. Based on the results of the ANN and stepwise regression, the number of umbellets in an umbel was the most important variable affecting oil content of ajowan.

Fig. 3. Measured and predicted oil content of ajowan in ANN model. (a) Scatter plot of measured and predicted oil content in testing stage of ANN. (b) Box plot of measured and predicted oil content in testing stage of ANN.



Fig. 4. Measured and predicted oil content of ajowan in MLR model. (a) Scatter plot of measured and predicted oil content in training stage of MLR. (b) Box plot of measured and predicted oil content in training stage of MLR. (c) Scatter plot of measured and predicted oil content in testing stage of MLR. (d) Box plot of the measured and predicted oil content in testing stage of MLR.
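The sensitivity test summarised in Table 7 refits each model with one input removed and compares the resulting fit statistics. The following minimal sketch shows that leave-one-input-out loop for an ordinary least squares model only; the data are synthetic and hypothetical (not the field measurements), the helper names are assumptions, and for an ANN the same loop would wrap the network training step instead of `fit_r2`.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_r2(rows, y):
    """Fit ordinary least squares with an intercept and return R2."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, d)) for d in design]
    y_bar = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def sensitivity(data, y, names):
    """R2 of the full model and of each model refitted without one input."""
    result = {"all inputs": fit_r2(data, y)}
    for drop, name in enumerate(names):
        reduced = [[v for j, v in enumerate(row) if j != drop] for row in data]
        result["without " + name] = fit_r2(reduced, y)
    return result


# Hypothetical synthetic data in which NUU is given the strongest effect
random.seed(0)
names = ["NR", "NP", "NFU", "NUU"]
data = [[random.uniform(5, 11), random.uniform(4, 7),
         random.uniform(7, 15), random.uniform(2, 8)] for _ in range(60)]
y = [0.05 * r[0] + 0.1 * r[1] + 0.15 * r[2] + 0.6 * r[3] + random.gauss(0, 0.2)
     for r in data]

scores = sensitivity(data, y, names)
for label, value in scores.items():
    print(f"{label}: R2 = {value:.2f}")
```

The input whose removal lowers R2 the most is read as the most influential one, which is the same logic used to interpret the rows of Table 7.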

References

Table 7 The sensitivity analysis of the governing variables on the essential oil content. Method

| Model inputs | ANN R² (a) | ANN RMSE (b) | ANN MAE (c) | MLR R² (a) | MLR RMSE (b) | MLR MAE (c) |
|---|---|---|---|---|---|---|
| The best ANN/MLR (with NR, NP, NFU, NUU as input) | 0.88 | 0.23 | 0.14 | 0.74 | 0.26 | 0.18 |
| MLR/ANN without NUU | 0.48 | 0.44 | 0.31 | 0.40 | 0.51 | 0.40 |
| MLR/ANN without NFU | 0.59 | 0.40 | 0.28 | 0.47 | 0.43 | 0.30 |
| MLR/ANN without NR | 0.64 | 0.36 | 0.23 | 0.54 | 0.38 | 0.24 |
| MLR/ANN without NP | 0.70 | 0.31 | 0.18 | 0.63 | 0.33 | 0.21 |

ANN: artificial neural network; MLR: multiple linear regression; NFU: number of flowers per umbellet; NP: number of pedicels; NR: number of rays; NUU: number of umbellets in an umbel. (a) Determination coefficient. (b) Root mean square error. (c) Mean absolute error.
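As an illustrative reading of the table, and not part of the authors' own analysis, the short sketch below ranks the four inputs by how far R² falls when each one is left out of the model. The R² values are copied from the table; the dictionary layout and the helper name `r2_drop` are assumptions made for this example.

```python
# R² values copied from the sensitivity-analysis table above.
# Keys name the input that was removed; "none" is the full four-input model.
r2 = {
    "ANN": {"none": 0.88, "NUU": 0.48, "NFU": 0.59, "NR": 0.64, "NP": 0.70},
    "MLR": {"none": 0.74, "NUU": 0.40, "NFU": 0.47, "NR": 0.54, "NP": 0.63},
}


def r2_drop(scores):
    """Return (removed input, loss in R²) pairs, largest loss first.

    Hypothetical helper: a larger loss suggests the input mattered more.
    """
    full = scores["none"]
    drops = {
        name: round(full - value, 2)
        for name, value in scores.items()
        if name != "none"
    }
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


# One ranking per model type.
ranking = {model: r2_drop(scores) for model, scores in r2.items()}

for model, ranked in ranking.items():
    print(model, ranked)
```

Under this simple measure both models give the same ordering (NUU, NFU, NR, NP), which agrees with the paper's conclusion that number of umbellets in an umbel and number of flowers per umbellet are the most influential traits.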

Acknowledgements

The authors thank the Research Institute of Forests and Rangelands of Iran for procuring the seeds. They are also grateful to Dr. M.H. Asare, secretary of the science and technological development staff of medicinal plants and traditional medicine of the Islamic Republic of Iran, for his kind support of the ajowan project.


