Computer-Aided Civil and Infrastructure Engineering 15 (2000) 440–458

Selection of Methodology for Neural Network Modeling of Constitutive Hystereses Behavior of Soils Imad A. Basheer* Transportation Laboratory, Engineering Service Center, California Department of Transportation, 5900 Folsom Boulevard, Sacramento, California 95819, U.S.A.

* E-mail: [email protected].

Abstract: Classic constitutive modeling of geomaterials based on the elasticity and plasticity theories suffers from limitations pertaining to formulation complexity, idealization of behavior, and excessive empirical parameters. This article capitalizes on the modeling capabilities of neural networks as substitutes for the classic approaches. Neural network–based modeling overcomes the difficulties encountered in understanding the underlying microscopic processes governing the material's behavior by redirecting the effort into learning the cause-effect relations from behavioral examples. Several methodologies are presented and cross-compared for effectiveness in approximating a theoretical hysteresis model resembling stress-strain behavior. The most effective methodology was used in modeling the constitutive behavior of an experimentally tested soil and produced models that simulated the real behavior of the soil with high accuracy. Although these models are empirical, they are retrainable and thus, unlike classic constitutive models, can be revised and generalized easily when new data become available.

1 INTRODUCTION

Constitutive models describing the relationships between stresses and strains of materials are crucial elements in the design and analysis of engineering systems. The traditional approaches to modeling the constitutive behavior of geomaterials such as soils are normally based on the elasticity and plasticity theories.2–4,6,7 In the last few decades,

modeling of the soil constitutive behavior has undergone extensive progress, stimulated by the advancements in digital computing and the evolution of efficient numerical solution techniques. Unfortunately, the marked progress in both computing and numerical solutions far exceeds our ability to accurately model the real behavior of soils. Depending on the accuracy sought and the available level of understanding of the physical processes, constitutive models tend to be extremely sophisticated. In order to achieve tractable and practical models, one usually resorts to idealizing the soil behavior and applying many assumptions and simplifications to the underlying processes. Unfortunately, such practices often lead to models that are unrepresentative of the real behavior of the soil being modeled. When such models are employed for solving a soil-structure interaction problem, for instance, erroneous results are expected irrespective of the sophistication and accuracy of the numerical technique used in the solution. Other limitations inherent in classic constitutive modeling are described in Basheer.1 Given the complexity involved in formulating even a specialized model, developing generalized constitutive models applicable to a wide variety of soils, physical and loading conditions, stress paths, etc. is virtually impossible.

The constitutive behavior of soils may include monotonic loading and cyclic loading, either with or without loading reversal (e.g., extension to compression). In civil engineering systems, monotonically loaded elements are not common. The monotonic behavior is relatively simple to model, even using the classic techniques. The materials involved in the vast majority of civil engineering systems

© 2000 Computer-Aided Civil and Infrastructure Engineering. Published by Blackwell Publishers, 350 Main Street, Malden, MA 02148, USA, and 108 Cowley Road, Oxford OX4 1JF, UK.

are subject to cyclic loading-unloading conditions, such as soils supporting pavements, bridge abutments, and railroad foundations. The cyclic behavior is more challenging to model due to the abrupt changes in the stress-strain relationships, the hystereses induced on unloading and reloading, and the associated changes in soil physical properties that affect the material behavior thereafter. The limitations of classic constitutive modeling pertain to (1) lack of understanding of the underlying processes governing the behavior, (2) deficiencies inherent in the theories used to explain the behavior, and (3) complexity involved in model formulation based on these theories. One alternative approach that could circumvent these limitations is to develop empirically based numeric models from a large body of data describing the real behavior of soils as obtained from laboratory experiments or field observations. Such an approach would be based exclusively on extracting the knowledge directly from the observed behavior, without the need for a precise understanding of the many underlying processes that govern the behavior.

Artificial neural networks (ANNs) are particularly relevant in this regard because of their extraordinary mapping capabilities when the cause-effect relationships between the system state variables are not precisely understood or the governing behavior is not fully supported by well-defined theories. ANNs are universal approximators16 capable of learning an arbitrary phenomenon and discovering the associations between system parameters. Because the networks can be trained to learn the real constitutive behavior, neither idealization of the processes nor simplifying assumptions are needed. This results in better models and, consequently, more logical and accurate solutions to the problems analyzed. Neural networks are trained on samples of behavior representative of the problem domain of interest.
Once the behavior is learned, the network is able to predict the behavior for untrained situations within the same problem domain. Moreover, the flexibility of neural networks as approximators allows the modeler to develop generalized models applicable to a wide spectrum of scenarios. The effect of a large number of parameters (e.g., soil properties, loading and physical conditions) can be included simply by expanding (diversifying) the set of examples used in network training. Unlike classic techniques, increasing the level of complexity (generalization) of these empirical models comes only at the minimal cost of obtaining more data, while the model-development approach itself remains at the same level of sophistication. The modeling of the constitutive behavior of materials such as soils and concrete using neural networks has been attempted recently.1,8–10,15,19 The available literature is deficient in several aspects that will be addressed in this article, such as applicable modeling techniques, factors affecting the sensitivity of the techniques, the cyclic behavior of clay, etc.

In this article, several methodologies that may be used as frameworks in neural network–based modeling of the constitutive behavior of materials are examined in relation to their accuracy in approximating the real behavior of soil, as well as their modeling simplicity. The methodologies are evaluated in approximating a theoretically obtained hysteresis model resembling the constitutive cyclic behavior of soils. The best methodology is then applied to real data obtained experimentally from a clayey soil.

2 NEURAL NETWORKS

Neural networks are computing devices made up of a large number of simple, highly interconnected processing elements (neurons) that emulate abstractly the structure and operation of the biologic nervous system. They operate in parallel, learn from examples, and process information through their dynamic state in response to external signals. There are many different types and architectures of neural networks, varying fundamentally in the way they learn, the details of which are well documented in the literature.11,12,18 The backpropagation neural network is the most commonly used, owing to its precisely defined and well-understood learning law and its distinguished ability to generalize. The architecture of a backpropagation network may contain two or more layers. A simple two-layer network consists only of an input layer that contains the input variables of the problem and an output layer containing the solution of the problem. This type of network is a satisfactory approximator only for linear problems. For approximating (mapping) nonlinear systems, however, additional intermediate (hidden) processing layers are usually employed to handle the problem's nonlinearity and complexity. Although it depends on the complexity of the function or process being modeled, one hidden layer may be sufficient to map an arbitrary function to any degree of accuracy.16 Hence, a three-layer architecture was adopted for the present study.
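As a small numerical aside (not part of the original article), the need for a nonlinear squashing function in the hidden layer can be seen from the fact that stacked purely linear layers collapse into a single linear map; the weight matrices below are arbitrary, for illustration only:

```python
import numpy as np

# Illustrative sketch: two stacked *linear* layers are equivalent to one
# linear layer, so without a nonlinear activation a hidden layer adds no
# mapping power, and nonlinear problems remain out of reach.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # hypothetical input-to-hidden weights
W2 = rng.standard_normal((2, 4))   # hypothetical hidden-to-output weights
x = rng.standard_normal(3)

two_layer = W2 @ (W1 @ x)          # "network" with a linear hidden layer
one_layer = (W2 @ W1) @ x          # single equivalent linear layer
assert np.allclose(two_layer, one_layer)
```

This is why the hidden nodes apply a nonlinear squashing function such as the simple logistic.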
Figure 1 shows the typical structure of a fully connected three-layer network with n input nodes (neurons), m hidden nodes, and r output nodes. The neurons of any one layer are connected to all the neurons in the succeeding and preceding layers, but no links are allowed within the same layer. The importance of the connection (link) between any two neurons is characterized by the assigned weight it carries. Initially, these weights are assigned randomly; however, during training, they change dynamically in such a way as to produce the best approximation to the problem solution. First, an example is presented to the “uneducated” network, the input vector part of the example is propagated forward, and a network solution (output) is obtained. The deviation of the network solution from the target value is computed, and using the backpropagation “learning law,”

Fig. 1. A typical three-layer, fully connected, feedforward error-backpropagation network with n nodes in the input layer, m nodes in the hidden layer, and r nodes in the output layer.

the error is propagated backward, beginning at the output layer, to adjust the connection weights. Back at the input layer, the signal may be refed forward into the network using the new weights, and the procedure is repeated such that the same error will not occur twice. This so-called training process is applied to the many training examples presented to the network until an optimal weight matrix is obtained that represents the best approximation of the process being modeled.

A multilayer network (e.g., Figure 1) is designed to receive an input signal vector X = [x0, x1, ..., xi, ..., xn]T, where X ∈ Rn+1 and x0 = 1, and is required to produce an estimated output vector Ŷ, hopefully as close as possible to the (unknown) system's actual (target) output Y = [y1, y2, ..., yk, ..., yr]T, where Y ∈ Rr. Since no interaction is allowed between the nodes of the same layer, each output (e.g., the kth output) is analyzed independently. Assuming that the network has already been trained, it is activated by (1) feeding the input vector X forward through the links connecting the input to hidden layers, represented by W = [w1, w2, ..., wz]T, where z = m × (n + 1) and W ∈ Rm×(n+1), (2) processing the input vector at the hidden nodes, (3) passing the result further through the links connecting the hidden nodes to the kth output node, represented by the vector V = [v1, v2, ..., vz]T, where z = m + 1 and V ∈ R(m+1), and finally, (4) processing it at the kth output node to produce a solution. The network approximation ŷk for the kth actual output yk is computed from

ŷk = σ( Σ_{j=0}^{m} v_{kj} σ( Σ_{i=0}^{n} w_{ji} xi ) )    (1)

where x0 = 1.0, σ is an activation (squashing) function such as the simple logistic [σ(ξ) = 1/(1 + e−ξ)], and w_{j0} = θj and v_{k0} = θk are, respectively, the input and hidden layers' thresholds that determine the firing limits of the neurons.1,11,12,18

Network learning is normally judged by the network's ability to respond within a prespecified tolerance to a set of untrained examples. Although there are many factors in a successful network development, one of three situations may occur in a given training phase: (1) the network weights converge to one global minimum, which assures the best solution to the problem; (2) the weights get stuck in a local minimum on the multidimensional rugged error hypersurface, yielding an inferior solution; or (3) an intermediate situation arises in which the weights are quasi-optimal due to limitations inherent in the learning law or the selection of the training parameters. Depending on the problem being solved, the level of accuracy sought, and the computational resources, a quasi-optimal solution generally may be acceptable. The backpropagation learning algorithm11,12,18 combines the feedforward of the input signal, the backpropagation of error, and the adjustment of weights that is at the heart of network development. A simplified step-by-step mathematical treatment of the backpropagation algorithm is given in Basheer.1

Selection of data for network training is another crucial element in successful network development. The problem domain normally is divided into three sets (or subsets), each containing a distinct collection of examples describing the phenomenon under investigation. A network with an arbitrarily chosen architecture is trained on data of a compact set Sp containing P exemplars sampled from the problem domain set S. A subset Sq, called the test set, containing a sufficient number Q of exemplars drawn from S, is also formed such that Sq does not contain any member of Sp.
This set is used to examine the network generalization during the training process and to control potential overfitting through overparameterization as the hidden layer is expanded. Because Sq is used during network development, a third subset distinct from both Sp and Sq, called the validation set Sv, also may be formed and subsequently used to further verify the model generalization. The relationships between the various sets necessary for effective network development are S = Sp ∪ Sq ∪ Sv and Sp ∩ Sq = Sp ∩ Sv = Sq ∩ Sv = φ (the empty set). Ideally, a mapping network should minimize the error function(s) on all of Sp, Sq, and Sv (i.e., on the exhaustive problem domain S). The success of a backpropagation network as a trainable approximator hinges on many other issues pertaining to network training, architecture, and data treatment that are well discussed in the published literature1,11,12,18 and will not be repeated herein.
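The activation sequence described above can be sketched in code. This is an illustrative reconstruction, not the author's implementation; following Equation (1), each weight matrix carries its thresholds in column 0, and the logistic squashing function is used throughout:

```python
import numpy as np

def logistic(z):
    """Logistic squashing function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, V):
    """One forward pass of a fully connected three-layer network.

    x : (n,) input vector; the bias input x0 = 1 is prepended here.
    W : (m, n+1) input-to-hidden weights (column 0 holds thresholds theta_j).
    V : (r, m+1) hidden-to-output weights (column 0 holds thresholds theta_k).
    """
    x_aug = np.concatenate(([1.0], x))   # x0 = 1 bias term
    h = logistic(W @ x_aug)              # hidden-layer activations
    h_aug = np.concatenate(([1.0], h))   # bias node feeding the output layer
    return logistic(V @ h_aug)           # network estimate y_hat

# Example with n = 2 inputs, m = 3 hidden nodes, r = 1 output (random weights)
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3))
V = rng.standard_normal((1, 4))
y_hat = forward(np.array([0.5, -0.2]), W, V)
assert y_hat.shape == (1,) and 0.0 < y_hat[0] < 1.0
```

Training would adjust W and V by backpropagating the output error, as outlined above.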

3 MAPPING TECHNIQUES

This article examines several mapping techniques that could be used in modeling the constitutive response of soils, especially soils with complex hysteresis behavior, as well as in other similar problems. The hysteresis behavior is of special importance in this context because a methodology that handles it can also be applied to the simpler monotonic behavior, as will be shown later. The most elementary method for training a network is the "data as is" procedure, which attempts to map the X vectors onto Y vectors without any special treatment of the data. This approach is called herein direct mapping (DM). Essentially, this approach may work satisfactorily for hystereses- and anomaly-free data, such as those generated from a one-to-one function f[1:1]. Direct mapping as a means of neural network training normally fails in approximating hysteresis functions or data1 and thus will not be considered in this article. To alleviate the limitations of DM, special provisions for the standard method of training, the method of data presentation to the network, and data preprocessing prior to training may all be essential factors in developing accurate approximators. Therefore, this section focuses primarily on the techniques that can be employed for developing reasonably reliable models of hysteresis behavior. The methodologies discussed here are categorized as (1) data preprocessing-oriented or (2) data presentation-oriented. The former category involves operating on the data prior to training by the standard DM; essentially, such pretreatment strives to transform the data from any arbitrary degree of hysteresis into f[1:1]. The latter category involves modifying the way the data are presented to the network during the training process. In the following subsections, five proposed methods are discussed and tested for their effectiveness as platforms for developing neural network approximators.
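A minimal illustration (with invented numbers, not data from the article) of why DM fails on hysteresis: inside an unload-reload loop, a single strain value corresponds to more than one stress value, so no single-valued function σ = f(ε) can reproduce the data:

```python
# A strain value inside the hysteresis loop corresponds to more than one
# stress value, so no single-valued map sigma = f(eps) can fit the data.
loop = [  # hypothetical (strain, stress) pairs around an unload-reload loop
    (0.2, 0.18), (0.6, 0.45), (1.0, 0.62),   # virgin loading up to eps_u
    (0.8, 0.42), (0.6, 0.31),                # unloading branch
    (0.8, 0.47), (1.0, 0.62),                # reloading branch
]
by_strain = {}
for eps, sig in loop:
    by_strain.setdefault(eps, set()).add(sig)
multivalued = {eps: sigs for eps, sigs in by_strain.items() if len(sigs) > 1}
# eps = 0.6 and eps = 0.8 each map to two stresses -> f is [1:m], not [1:1]
assert multivalued  # direct mapping (DM) cannot resolve these branches
```

The five methods below either transform such data into one-to-one form or change how they are presented to the network.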

3.1 Function labeling (FL)

Labeling involves pretreating the segments that restrict the function (or data) from being f[1:1] by assigning distinctive indicators such as real, integer, or binary numbers to them. If the function is one-to-many, f[1:m], then FL attempts to approximate an equivalent combination of one-to-one subfunctions fi[1:1] such that

f[1:m] ≡ ∪_{i=1}^{N} fi[1:1]    (2)

Here, ∪ is the logic union operator, N is the total number of f[1:1] subfunctions, and fi is defined on the subdomain di ≡ [xiL, xiU], where di ⊂ D, D refers to the simulation or problem domain, and superscripts L and U denote, respectively, the lower and upper limits of subdomain di. The f[1:m] function may be converted into f[1:1] by expanding each input feature vector to include an additional indicator subvector. With these expanded feature vectors, the DM training method is expected to generate networks capable of reasonably approximating the function defined on the given domain.

3.2 Function fragmentation (FF)

Function fragmentation requires that each problematic segment of the complex function be treated in isolation from the others. Mapping (e.g., using DM) is then performed independently for each segment, yielding a number of ANNs equal to the number of segments. The basic difference between this methodology and FL is that FF maps each segment independently, whereas FL maps all the labeled segments into a single ANN. The FF operation may be expressed as

f[1:m] ≡ ∪_{i=1}^{N} ANNi,  ∀ di ⊂ D    (3)

A better accuracy for the mapping networks is realized if the subfunctions are all f[1:1]. Thus the FF method aims to convert the original one-to-many function into many one-to-one subfunctions. The entire domain is then simulated by combining predictions from all the ANNs. One advantage of this approach is that it enables mapping of each module to a different degree of accuracy depending on its complexity, thus reducing computation (learning) time. The limitations of this approach pertain to (1) the imprecise rules governing the way the functions or the data are partitioned (fragmented), (2) the large number of networks that need to be trained/tested independently, and (3) the nonunique predictions at the boundary points separating two adjacent segments due to nonuniform network accuracy.

3.3 Quasi-sequential dynamic mapping (QSD)

The QSD mapping approach is a sequential reproduction/recognition technique that uses historical states of the function (or data) to predict (forecast) future states. The dynamic nature of this representation provides an advantage over other methods in that it does not require any provision for the function/data. In training, the data are fed to the network as pairs of input vectors describing the state at previous times (lags) and output vectors representing future state(s). The number of lags (memory steps) needed for data representation is strictly problem-dependent. In order to train a multilayer network, the data are initially encoded as

y(i + 1) = f{ [y(i − k), x(i − k)]; [y(i − k + 1), x(i − k + 1)]; ...; [y(i), x(i)]; x(i + 1); rl }    (4)

where f is the function to be approximated by the ANN, i is the step number, k is the lag (k ≥ 1), the rl's are independent variables (indicators) characterizing the function, and y(j) is the value of f for x(j) as obtained from the data, j ∈ {i − k, i − k + 1, ..., i − 1, i}. Once the network has been trained, it is used to predict (forecast) future states [e.g., y(i + 1)] from knowledge of the past [e.g., y(i)]. In order to use the forecasting networks for prediction, several important issues must be addressed. The developed network may be used as (1) a sequential recognition network, where a future state of the system is determined for a given previous condition, or (2) a sequential reproduction model, in which the network is used to simulate the entire behavior starting from a known state(s) [e.g., initial condition(s)] and proceeding incrementally by allowing the network to reproduce on its own all the remaining future states. In the latter scheme, the network is run recursively by repeatedly feeding the response simulated at the output units back into the input layer. Each cycle advances the system one step ahead (either temporally or spatially). The sequential reproduction is thus a dynamic system capable of using raw data irrespective of their complexity and hysteresis nature. Another apparent advantage of sequential mapping is that inverse models are easily derived from the same "forward" data. This is so because a property at a given state depends only on previous states, regardless of whether this property belongs to the dependent or independent vector. A possible drawback of sequential/dynamic systems, however, is the potential need for prolonged training and/or huge network sizes. The reason is that unless the developed network is precise, the prediction error may compound each time the network is recursively used to advance the predictions one step ahead.
At a later stage in the simulation, the model may become totally unrepresentative of the system. This situation is exacerbated even further when the functions (or data) contain hystereses or abrupt changes in trend.

3.4 True sequential dynamic mapping (TSD)

This method is similar to the QSD mapping; however, both training and testing are carried out in a sequential manner. Here, the network is trained on data where the input vector matches that of the experimental training data only at the initial stage (first run). The network is then used to predict an output (of a particular data set), which in turn is copied back to its registered slots in the input vector. The actual output vector is used only to modify the weights of the network using the backpropagation learning laws. For example, consider the case where the data are represented as vector pairs similar to Equation (4). Initially, the network is presented with the initial conditions, which are identical to the first training pattern in the training database. With an initialized network, the output vector is calculated by activating the network in the usual manner [the computed output

may be dissimilar to the given (target) output]. The error signal computed at the output side is then used to modify the weight vector in a backward propagation scheme. Prior to the second iteration cycle (and assuming exemplar-by-exemplar training), the input vector is adjusted such that the output produced in the previous cycle is copied to its corresponding locations in the input vector. This training scheme is based on the Jordan method13 and is one form of recurrent network trained by backpropagation. It is thought to provide a better stochastic search on the error hypersurface due to the continuous changes in the input pattern. The search is forced to continuously traverse the error surface in all possible directions to locate a global minimum. Therefore, unlike the static QSD training, which updates the weights based on a change in the error alone, this scheme modifies the weights based on both the error and the change in the input vector. Because of the dynamic nature of the simulation, this approach has essentially the same advantages and limitations as the QSD approach.

3.5 Hybrid modeling (HM)

Depending on the complexity and degree of nonlinearity of the system to be modeled, a combination of the various data-processing (e.g., FL, FF) and training (e.g., QSD, TSD) methodologies may be needed to obtain accurate approximation models. The selection among these methodologies may be driven by experience and experimentation with the data. Besides being problem-dependent, any combination of specific methodologies should be evaluated in light of accuracy, availability of computational resources, and training time.

4 SIMULATION EXPERIMENTS

The simulations conducted in this study are aimed at examining the proposed methodologies as platforms useful for approximating the cyclic constitutive behavior of soils. A typical cyclic stress-strain curve for soils is given in Figure 2.
A material typically is loaded from an initial state (e.g., a stress-free, strain-free condition) until a certain level of stress or strain is reached and then unloaded fully (i.e., to zero stress) or partially, followed by reloading. The various parts of a single-cycle curve, comprising virgin loading, unloading, reloading, and postloading, are shown in Figure 2. Postloading commences when the stress level after reloading reaches the level at which the unloading occurred. A number of unload-reload-postload cycles may be applied, as dictated by the problem, until the material fails. A monotonically loaded material produces a monotonic stress-strain curve in which the virgin loading part continues until failure is reached.

Fig. 2. A stress-strain curve with one unload-reload cycle and the labels used in classifying the loading stages.

4.1 Mathematical function

To evaluate the suitability of the proposed methodologies as platforms for developing neural network approximators, a hysteresis mathematical function that would produce a curve similar to Figure 2 is selected. Because of its relevance to constitutive modeling, a mathematical (theoretical) stress-strain model was formulated and used in the analysis. The reason for choosing a theoretical model rather than directly evaluating the methodologies on experimental data is to eliminate the potential interference of experimental and measurement errors (noise) with the neural network's mapping ability and the limitations of the methodology. This choice also was stimulated by the fact that the methodologies may vary considerably in their sensitivity to noise in the data. The mathematical cyclic stress-strain model was derived from the simple monotonic Konder's expression.14,17 The Konder model is a hyperbolic (nonlinear) representation used in approximating the one-dimensional monotonic constitutive behavior of soils. An expression for the model is5

σ(ε) = ε / (a + bε)    (5)

where σ is the normal stress and ε is the axial strain. The empirical (fitting) constants a and b denote, respectively, the reciprocals of the initial tangent modulus [(dσ/dε)ε=σ=0] and the stress at infinite strain (i.e., the ultimate stress). The constants a and b are normally obtained by fitting a linear transformation of Equation (5) to experimental data.5 The simple Konder model (Equation 5) was used to generate the virgin loading and postloading parts of a typical stress-strain curve. Next, to add an unload-reload cycle,

new portions were appended to the original curve by considering the following: (1) Empirical constants. The σ-ε Konder curve data were obtained for cases with a = 1 [i.e., initial tangent modulus (dσ/dε)ε=σ=0 = 1/a = 1.0], while the constant b ranged from 0.5 to 3.0 with increments of 0.5, for a total of six stress-strain curves. (2) Unloading. The unloading for all six curves was started at ε = 1.0 irrespective of σ. The strain at which unloading was initiated is denoted by εu, and the corresponding stress level is σu. Unloading was assumed to proceed at a slope equal to the initial tangent modulus (1/a = 1.0) until σ = σu/2 was reached, regardless of the strain. Had the unloading been continued at the same rate beyond this level, a stress-free state (i.e., σ = 0) would have been achieved at ε = εp. However, in order to obtain an open unloading-reloading loop, unloading was continued, alternatively, from the point with σ = σu/2 at a slope such that the state of free stress occurs at ε = εp/2. (3) Reloading. The reloading was continued straight upward to coincide with (εu, σu) and continued thereafter (i.e., postloading) along the same original curve.

Because these single-cycle stress-strain curves were obtained from the simple Konder model, they are referred to herein as extended Konder's curves (EKCs). A diagrammatic representation of an EKC with one appended unload-reload loop is given in Figure 3. This curve is expressed mathematically according to Equation (6), which will be referred to as the extended Konder's model (EKM), from which the data for the simulation experiments were generated:

σ(ε, εu, εp, σu) =
    ε/(1 + bε)                      if 0 ≤ ε ≤ εu              (virgin loading)
    ε − εp                          if (εu + εp)/2 ≤ ε ≤ εu    (unloading, part 1)
    (σu/εu)(ε − εp/2)               if εp/2 ≤ ε ≤ (εu + εp)/2  (unloading, part 2)
    [2σu/(2εu − εp)](ε − εp/2)      if εp/2 ≤ ε ≤ εu           (reloading)
    ε/(1 + bε)                      if ε ≥ εu                  (postloading)    (6)

The preceding set of equations produces a complete five-segment σ-ε curve with a single open unloading-reloading loop. As illustrated in Figure 3, these segments are OA (virgin loading), AB (unloading, part 1), BC (unloading, part 2), CA (reloading), and AD (postloading). The virgin loading and postloading parts are monotonically increasing functions, continuous and differentiable throughout the domain where they reside. The function within the loop region is f[1:3], discontinuous, and nondifferentiable. As demonstrated in Figure 3, there are three discontinuities along the loading-unloading course, namely, at points A, B, and C, that produce abrupt changes in trend that may be difficult to capture by DM ANNs.

Fig. 3. The extended Konder's stress-strain curve with one hysteresis loop.

4.2 Data generation

The six EKCs used in the neural network simulations were generated for 0 ≤ ε ≤ 10, with a total of 160 (σ, ε) data points representing each curve. In all model-development phases, five EKCs (b = 0.5, 1.0, 2.0, 2.5, and 3.0), comprising a total of 800 (σ, ε) points, were used for training. The network generalization was tested on one EKC (b = 1.5) with a total of 160 (σ, ε) points.

4.3 Network optimization

The method of adaptive network expansion explained in Basheer1 was used in the simulations. Briefly, this procedure involves incrementally expanding the hidden layer (from one hidden node), followed by training and simultaneous testing of network performance. This procedure is repeated for every training epoch and exemplar presented to the network until all the prespecified error criteria are satisfied. In each network evolution, the matrix of converged connection weights for the current network is used to initialize the next larger network.

4.4 Performance measures

The accuracy of each simulation was evaluated by determining the overall agreement (or disagreement) between the ANN-predicted and the corresponding actual output stress values. The measures included the sum of squared errors (SSE) for all curves combined and the statistical coefficient of determination R², relevant to a zero-intercept, unit-slope linear model Ŷ = Y. An R² = 1.00 denotes perfect agreement, and 0.00 represents absolutely uncorrelated values.

4.5 Results
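The agreement measures of Section 4.4 can be sketched as follows. The article does not spell out the exact R² convention used for the zero-intercept, unit-slope line Ŷ = Y, so the common definition R² = 1 − SSE/Σ(y − ȳ)² is assumed here:

```python
import numpy as np

def sse(y, y_hat):
    """Sum of squared errors between actual and ANN-predicted stresses."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sum((y - y_hat) ** 2))

def r_squared(y, y_hat):
    """Coefficient of determination about the line y_hat = y.

    Assumed convention (not stated explicitly in the article):
    R^2 = 1 - SSE / sum((y - mean(y))^2); R^2 = 1 means perfect agreement.
    """
    y = np.asarray(y, float)
    return 1.0 - sse(y, y_hat) / float(np.sum((y - y.mean()) ** 2))

# Illustrative stresses (invented numbers, not the article's data)
y     = [0.0, 0.2, 0.4, 0.5, 0.6]
y_hat = [0.0, 0.21, 0.39, 0.52, 0.58]
assert sse(y, y) == 0.0 and r_squared(y, y) == 1.0
assert 0.99 < r_squared(y, y_hat) < 1.0
```

The per-curve average SSEPC used below is simply the combined SSE divided by the number of curves.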

4.5.1 FL. The FL technique incorporated assigning three labels to identify three segments of the σ-ε curve. The segments and their labels (L) are as follows: L = 0 for both 0 ≤ ε ≤ εu (virgin loading) and εu ≤ ε ≤ 10 (postloading); L = −1 for εp/2 ≤ ε < εu (unloading); and L = +1 for εp/2 ≤ ε < εu (reloading). With these labels, the simulation experiment changes from finding a network that substitutes for the function f in σ = f(ε, b) into finding a network that approximates the function f in σ = f(ε, b, L). Training on the revised data is performed using DM to produce a dual-function network that performs both classification (to locate the segment) and interpolation (to predict untrained data within each segment). For the EKM (Equation 6), the most appropriate network architecture was found to be 3-30-1, referring to 3 nodes (i.e., ε, b, L) in the input layer, 30 nodes in the hidden layer, and one node (i.e., σ) in the output layer. The accuracy of the 3-30-1 network was determined as follows: SSE\R² = 0.9007\0.998 for the training curves and 0.2855\0.965 for the test curve. In order to compare prediction accuracy between the training and test curves, the SSE was averaged per curve (denoted SSEPC), which gave SSEPC = 0.1801 for training and SSEPC = 0.2855 for testing. Figure 4a–f compares the theoretical curves with their ANN-predicted analogues. The agreement demonstrates a relatively satisfactory capability of the FL modeling approach; however, there is room for substantial improvement in prediction.

4.5.2 FF. The function fragmentation (FF) method requires that the EKC-generated data be preprocessed by decomposing the curves into four fragments, namely, OA, ABC, CA, and AD (see Figure 3). Each fragment was then trained independently to approximate the function f in σ = f(ε, b) within its relevant domain by using the direct mapping approach. This resulted in four ANNs (ANN1 to ANN4), with R² exceeding 0.997 for the training data and 0.954 for the test data.
The predictions from the four networks were combined to generate the entire EKCs. The five training curves were predicted with SSE varying between 0.0143 and 0.0526. The average SSEPC = 0.027 for the training curves and 0.029 for the test curve. Excellent agreement was found between the theoretical curves and those predicted by the combined four modules, as demonstrated in Figure 5. The FF-based model clearly outperforms the FL because each of the four smaller networks was trained only to perform interpolation of the function within its monotonic region (i.e., no training for classification).

4.5.3 QSD. In the QSD modeling, the one-point and two-point schemes for encoding and representing the training data were used. The objective of these modeling experiments is to design ANNs that approximate as closely as


Fig. 4. Results of simulation for the EKCs σ = f (ε, b, L, a = 1.0) based on 3-30-1 FL ANN with comparison with theoretical curves. (a–e) Training curves for b = 0.5, 1.0, 2.0, 2.5, and 3.0, respectively, and (f) test curve for b = 1.5.


Fig. 5. Results of simulations using four FF-based ANNs for the EKCs σ = f (ε, b, a = 1.0) in comparison with the theoretical curves. (a–e) Training curves for b = 0.5, 1.0, 2.0, 2.5, and 3.0, respectively, and (f) test curve for b = 1.5.


possible the function f in the sequential equations:

σ(i+1) = f{[ε(i), σ(i)]; ε(i+1); b}    (7a)

σ(i+2) = f{[ε(i+1), σ(i+1)]; [ε(i), σ(i)]; ε(i+2); b}    (7b)
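The two encodings can be illustrated by showing how training exemplars are cut from a (σ, ε) sequence; a sketch with hypothetical function names:

```python
def one_point_exemplars(eps, sig, b):
    # Eq. (7a): inputs [eps(i), sig(i), eps(i+1), b] -> target sig(i+1);
    # four input parameters per exemplar.
    return [([eps[i], sig[i], eps[i + 1], b], sig[i + 1])
            for i in range(len(eps) - 1)]

def two_point_exemplars(eps, sig, b):
    # Eq. (7b): inputs [eps(i+1), sig(i+1), eps(i), sig(i), eps(i+2), b]
    # -> target sig(i+2); six input parameters per exemplar.
    return [([eps[i + 1], sig[i + 1], eps[i], sig[i], eps[i + 2], b], sig[i + 2])
            for i in range(len(eps) - 2)]
```

A curve with n points thus yields n − 1 one-point exemplars or n − 2 two-point exemplars, which is why the six 160-point curves produce 960 one-point targets at most one step shifted.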

Equation (7a) represents the one-point scheme with four input parameters, and Equation (7b) represents the two-point scheme with six input parameters. In one simulation phase, an ANN was trained incrementally using the one-point scheme to an architecture of 4-10-1. The agreement between the true and ANN-predicted σ(i+1) data for all 960 points is shown in Figure 6, with SSE = 0.4013 and R2 = 0.991 for the training and test data combined. Because this network predicts (accurately) a future state from a given (correct) state, I call it a correct-in, correct-out (CICO) network.

Two modules are necessary for the simulations: the neural network module containing the network developed from the data and the delay/recursion module that receives the predictions, stores them, lags the data and predictions, and activates the network module. A simplified schematic of the simulation unit is illustrated in Figure 7. A dynamic simulation is run starting from the known initial conditions (ε0, σ0) = (0, 0) to predict the next state, which is then used to predict the next, and so on until the last state (e.g., material failure) is reached.

Results of the simulations for the training and test curves are shown in Figure 8a–f. In general, predictions using this methodology are not as accurate as those obtained with FL and FF. The QSD mapping of the six curves yielded SSE = 100.08 and SSEPC = 19.6 for the training curves and 1.76 for the test curve. Note that the network predicted the test curve with better accuracy than the training curves. It is obvious how the network predictions degraded from SSE = 0.4013 (for the 4-10-1 CICO network) to SSE = 100.08. The network begins with relatively accurate mapping only at the very early stage of simulation, and then the predictions continue to deviate from the true values due to error accumulation attributed to the recursive nature of this approach.
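The two-module simulation unit of Figure 7 reduces to a short feedback loop. In this sketch, `predict` is a stand-in for the trained CICO network (any one-step model with the Equation (7a) signature will do), so the function names and the toy model in the usage note are assumptions, not the article's code:

```python
def simulate(predict, strain_path, b, eps0=0.0, sig0=0.0):
    # Recursive (dynamic) simulation in the spirit of Figure 7: start
    # from the known initial state and feed each predicted stress back
    # in as the "current" state for the next step.
    eps_i, sig_i = eps0, sig0
    stresses = [sig0]
    for eps_next in strain_path[1:]:
        sig_next = predict(eps_i, sig_i, eps_next, b)  # neural network module
        stresses.append(sig_next)
        eps_i, sig_i = eps_next, sig_next              # delay/recursion module
    return stresses
```

For instance, with a toy linear one-step model `lambda e, s, en, b: s + b * (en - e)`, `simulate(..., [0, 1, 2, 3], 2)` returns [0.0, 2.0, 4.0, 6.0]. With a trained network, any one-step error is fed back and compounds, which is exactly the error accumulation noted above.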
The mapping accuracy does not seem to follow a trend in relation to simulating the unload-reload loop. It is believed that the network accuracy plays a major role in the recursive simulation process. Therefore, in order to examine the effect of the accuracy of the CICO ANN, the current network was allowed to expand to 20 hidden nodes. The comparison between the true σ(i+1) and the predictions of the 4-20-1 CICO ANN is presented in Figure 9, with SSE = 0.1663 for the combined training and test data (compare with 0.4013 for the 4-10-1 ANN). Comparing Figure 6 with Figure 9, it is seen that the overall accuracy of the CICO network increased only slightly, especially at higher values of stress. Surprisingly, however, the dynamic simulation improved substantially for the training

Fig. 6. Agreement between the QSD-trained 4-10-1 ANN predictions and theoretical data obtained from the EKCs σ = f (ε, b, a = 1.0). The agreement plot combines results of training and test data.

Fig. 7. Schematic of simulation unit used in simulating the functions and data from sequentially trained networks.

and test curves, as evidenced from Figure 10a–f. The combined simulation SSE dropped from 100.08 for the 4-10-1 network to 9.948 for the 4-20-1 network. This demonstrates the high sensitivity of the simulation to the accuracy of QSD networks.

An additional simulation experiment was run using QSD combined with FL (i.e., HM). Here, a four-element label vector was inserted into the input vector with label assignment as follows: (1, 0, 0, 0) for virgin loading, (0, 1, 0, 0) for unloading, (0, 0, 1, 0) for reloading, and (0, 0, 0, 1) for postloading. The total number of inputs to the network is now eight. An optimal network with 10 hidden nodes (i.e., 8-10-1) was obtained. This network was then used to


Fig. 8. Simulation results of the EKCs σ = f (ε, b, a = 1.0) using QSD 4-10-1 ANN in comparison with theoretical curves. (a–e) Training curves for b = 0.5, 1.0, 2.0, 2.5, and 3.0, respectively, and (f) test curve for b = 1.5. Overall six-curve SSE ≈ 100.08.


recursively simulate the six EKCs, and the results of the simulations are shown in Figure 11a–f. The six-curve SSE was calculated as 1.019, a reduction from 100.08 for the 4-10-1 network and 9.948 for the 4-20-1 network. This indicates a significant improvement in the simulations with the segment-labeled network. Finally, simulations using the two-point scheme encoding did not demonstrate a noticeable improvement in prediction over those of the one-point scheme. The results of these simulations are not presented in this article.

Fig. 9. Agreement between predictions of QSD-trained 4-20-1 ANN and theoretical data obtained from EKCs. The agreement plot combines training and test data.

4.5.4 TSD. Several training techniques within this modeling methodology were investigated. These included (1) combined adaptive and fixed-architecture network training and (2) hybrid mapping using TSD + FL and QSD + FL. Table 1 summarizes the results of five simulations, in which all comparisons are based on SSEPC. In the fixed-architecture network training, all networks were designed to include 10 hidden nodes and were trained to 5000 iterations. Also, one network was trained adaptively to 10 hidden nodes to examine the effect of adaptive weights on the final network performance. Finally, one network was trained on data without labels for comparison with the networks trained on labeled data. The following was observed (refer to Table 1): (1) labeled networks resulted in better predictions than unlabeled networks, (2) TSD outperformed QSD modeling, (3) a mixture of TSD and QSD training did not result in better networks, and (4) fixed-architecture training outperformed adaptive training, but only slightly. Agreement plots of the best network (experiment no. 2, Table 1) are shown in Figure 12a–f, with a computed overall six-curve SSE = 0.165. This is to be compared with the six-curve SSE = 1.019 for the labeled-QSD ANN simulations (see Figure 11).

4.6 Comparing the methodologies

In order to select the best methodology, the five tested methodologies should be cross-compared with regard to both prediction accuracy and simplicity of model development. The criterion for selecting the best prediction methodology is straightforward and depends solely on the SSE in predicting both the training and test curves. However, simplicity of model development is subjective, and thus only a qualitative measure may be used. Therefore, another measure of effectiveness is introduced that reflects the degree of involvement (DI) in data processing prior to training. This includes labeling of problematic segments, subdividing regions into subregions, presenting the data in historical format, etc. The DI, which is best judged by the modeler, is assigned as follows: 1 = little (minimal) involvement; 2 = little to moderate; 3 = moderate; 4 = moderate to high; and 5 = high (extreme) involvement.

Table 2 presents the performance of the various methodologies in modeling the EKCs. The six-curve SSE represents the grand total for all six curves (five training curves and one test curve). As seen from Table 2, the labeled TSD (i.e., TSD + FL) approach outperforms the other approaches in modeling the EKCs. The FF approach is almost as accurate as the labeled TSD, but it is too involved and suffers from serious limitations, as discussed previously. Therefore, the labeled TSD (HM) approach will be adopted for modeling the constitutive hysteresis behavior of a clayey soil observed in real laboratory testing.

5 APPLICATION TO SOIL

A high-plasticity CH fat clay was selected for the σ-ε testing, with the following physical properties obtained from the relevant standard tests: liquid limit (LL) = 51.2 percent, plastic limit (PL) = 22.5 percent, fraction finer than 2 µm = 42 percent, Harvard optimal moisture content (OMC) = 19.8 percent, and maximum dry density (MDD) = 1.685 g/cm3.
Unconfined compression tests were conducted (using a strain rate of 2 percent per minute) on soil samples prepared at various densities γd and water contents w. Table 3 summarizes the experimental operating conditions and the distribution of the tested samples. Set no. I included 26 σ-ε curves representing samples prepared dry (i.e., w < OMC) and wet (w > OMC) of optimum, each with one unload-reload cycle. Set no. II included 22 σ-ε curves for samples prepared dry of optimum, each with two unload-reload cycles. A tabulated summary of the test results such as peak stresses, strains at peak, residual


Fig. 10. Simulation results of EKCs σ = f (ε, b, a = 1.0) using QSD-trained 4-20-1 ANN with comparison to theoretical curves. (a–e) Training curves for b = 0.5, 1.0, 2.0, 2.5, and 3.0, respectively, and (f) test curve for b = 1.5. The overall six-curve SSE = 9.948.


Fig. 11. Simulation results for EKCs σ = f (ε, b, a = 1.0) using QSD-labeled 8-10-1 ANN with comparison to theoretical curves. (a–e) Training curves for b = 0.5, 1.0, 2.0, 2.5, and 3.0, respectively, and (f) test curve for b = 1.5. The overall six-curve SSE = 1.019.


Table 1 Accuracy of the several dynamic models of extended Konder's stress-strain curves

Exp. no.   ANN architecture     Method of training   SSE (TRN)   SSE (TST)   Average SSEPC
1          Fixed + labels       QSD                  0.865       0.154       0.170
2          Fixed + labels       TSD                  0.126       0.039       0.028
3          Fixed + labels       TSD + QSD*           0.191       0.032       0.037
4          Adaptive + labels    TSD                  0.142       0.159       0.050
5          Fixed + no labels    TSD/QSD*             8.650       1.891       1.757

*Using sequential/static training ratio of 2.

strengths, and all experimental curves is given in Basheer.1 One strain-controlled ANN-based constitutive model was developed for each category.

Labeling of the curves proceeded by using a three-element vector to denote the stage of loading on the curve, namely, virgin loading, unloading, and reloading. Postloading was combined with the reloading part because it only added an extra discontinuity (and difficulty) in the σ-ε curve. The label vector is denoted by (λ1, λ2, λ3), with the assignment as shown in Figure 2. In both networks, the input and output layers were essentially the same. The input layer consisted of eight nodes, namely, (1) the compaction conditions (w and γd), (2) the three-element stage label vector, (3) the current states of strain and stress, and (4) the future strain for which the stress (the only model output) is to be predicted. A mathematical neural network expression for the strain-controlled network is

σ(i+1) = ANN[w, γd, λ1, λ2, λ3, ε(i), σ(i), ε(i+1)]    (8)

A network representing the expression in Equation (8) is shown in Figure 13, with one transitional input node and one output feeding back into the input layer. For the one-cycle σ-ε model of clay, training of the labeled TSD networks was done using the training and test data summarized in set no. I of Table 3. The optimal network was obtained at 20 hidden nodes (denoted 8-20-1). The network was then used in the simulation unit (Figure 7) to recursively simulate all the training, test, and validation σ-ε curves starting from the initial free-stress/free-strain state (σ0, ε0) = (0, 0). It is to be recognized that the test and validation curves represent cases different from those used in training in either the compaction conditions or the unloading strain. The simulations resulted in R2 = 0.97 for all training curves, 0.96 for test curves, and 0.95 for validation curves. Examples of such simulations are shown in Figure 14a, b, and c. As seen, there is good agreement between the experimental and predicted curves, indicating that the one-cycle network has learned, reasonably well, the hidden features embedded in the cyclic σ-ε curves at and away from the hysteresis.
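The eight-node input vector of Equation (8) can be assembled as follows. This is an illustrative sketch: the particular one-hot assignment of (λ1, λ2, λ3) to stages is an assumption (the article defers it to Figure 2, which is not reproduced here), and the function names are mine:

```python
STAGE_LABELS = {
    # Three-element stage vector (lambda1, lambda2, lambda3); the
    # postloading part is merged into reloading, as in the text.
    # This particular assignment is assumed, not taken from Figure 2.
    "virgin":    (1, 0, 0),
    "unloading": (0, 1, 0),
    "reloading": (0, 0, 1),
}

def input_vector(w, gamma_d, stage, eps_i, sig_i, eps_next):
    # Eight inputs of Eq. (8): compaction conditions (w, gamma_d),
    # stage label vector, current (strain, stress) state, and the
    # future strain for which the stress is to be predicted.
    l1, l2, l3 = STAGE_LABELS[stage]
    return [w, gamma_d, l1, l2, l3, eps_i, sig_i, eps_next]
```

During recursive simulation, the predicted stress is fed back as `sig_i` for the next call, mirroring the output-to-input feedback shown in Figure 13.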

For the two-cycle model, training revealed an optimal network topology identical to that of the single-cycle model (i.e., 8-20-1). Good agreement was also observed between the experimental and network-simulated curves, with R2 = 0.96, 0.95, and 0.98 for all the training, test, and validation curves, respectively. Figure 14d, e, and f shows sample comparisons between the simulated and corresponding experimental σ-ε curves. Again, the simulated curves nearly matched their experimental analogues at and away from the unload-reload loops.

A two-cycle σ-ε curve essentially comprises a monotonic loading curve prior to the first unloading and a one-cycle curve prior to the second unloading. This implies that a well-trained two-cycle (or higher-order) neural model should contain mapping power sufficient to accurately simulate the monotonic and one-cycle curves. Also, when the unload-reload cycles do not induce a drastic change in the constitutive behavior of the material, any cyclic model (one-cycle or higher) can simulate reasonably well the stress-strain curves regardless of the number of unload-reload cycles introduced, including the monotonic loading curve. Moreover, an arbitrary cyclic model should also be able to predict partial unloading even though the network has never been trained on such incompletely unloaded curves. The generalization capability (i.e., universality) of cyclic models is an interesting feature of ANN modeling that makes such models quite flexible and tractable tools. The explanation and verification of the generalization capabilities of ANN-based constitutive models are presented in Basheer.1

As a verification experiment, the stress-strain behavior of the soil was examined under three unload-reload cycles. The behavior was then simulated using the previously developed one-cycle model. Figure 15 shows the simulation of the stress-strain curve, which of course constitutes a validation test.
It is apparent that the one-cycle model is able to predict the higher-order cyclic curve with reasonably good accuracy.

6 SUMMARY AND CONCLUDING REMARKS

The presence of hystereses such as those typical of the constitutive behavior of materials poses a challenge to any modeling approach, whether conventional or neural network–based. The hystereses impart nonuniqueness to the data or the function being modeled, thus requiring special treatment of the data prior to modeling or the use of a special modeling technique. In this article, several techniques that may serve as frameworks for developing neural network–based models for approximating hystereses data were discussed. The similarity among the various methods pertains to the removal of the hysteresis effect from the data or the function by either


Fig. 12. Simulation results of EKCs σ = f (ε, b, a = 1.0) using TSD-labeled 8-10-1 ANN with comparison to theoretical curves. (a–e) Training curves for b = 0.5, 1.0, 2.0, 2.5, and 3.0, respectively, and (f) test curve for b = 1.5. The overall six-curve SSE = 0.1650.


Table 2 SSE and DI for the methodologies used in simulating the extended Konder stress-strain curves

Mapping approach   Training SSEPC   Test SSEPC   Six-curve SSE   DI
FL                 0.180            0.286        1.186           2
FF                 0.028            0.029        0.169           4
QSD                1.856            0.670        9.949           2
HM = TSD + FL      0.025            0.039        0.164           2
HM = QSD + FL      0.173            0.154        1.019           2
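The selection logic behind Table 2 can be expressed compactly. The numeric values below come from Table 2, while `best_methodology` and its `max_di` threshold are my own illustrative construction, not part of the article:

```python
# Six-curve SSE and degree-of-involvement (DI) scores from Table 2.
RESULTS = {
    "FL":            (1.186, 2),
    "FF":            (0.169, 4),
    "QSD":           (9.949, 2),
    "HM = TSD + FL": (0.164, 2),
    "HM = QSD + FL": (1.019, 2),
}

def best_methodology(results, max_di=3):
    # Pick the lowest-SSE approach among those whose data-preparation
    # effort (DI) is acceptable.  With the hypothetical threshold
    # max_di = 3, the involved FF approach is screened out, mirroring
    # the article's choice of the labeled TSD (HM) approach over the
    # nearly as accurate but more involved FF.
    acceptable = {k: v for k, v in results.items() if v[1] <= max_di}
    return min(acceptable, key=lambda k: acceptable[k][0])
```

Note that with these numbers the labeled TSD approach wins even without the DI screen, since its six-curve SSE (0.164) edges out FF (0.169).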

processing the data or by employing sequential data presentation, in which one state is assumed to be influenced by at least one previous state. The various methodologies were cross-compared in relation to their effectiveness in approximating a theoretical cyclic constitutive model derived from the simple monotonic Konder stress-strain expression. The reason for using a theoretical function is to eliminate the potential interference of noise-corrupted experimental data with the capacity of the methodology and/or the limitations of the neural network approach. A special hybrid approach that combines true sequential dynamic (TSD) training and labeling of problematic segments of the function resulted in the best approximation of the theoretical model. This hybrid approach recursively simulates the stress-strain behavior beginning from the initial zero-stress, zero-strain state.

Fig. 13. Schematic of constitutive ANN-based model for the cyclic behavior of soil.

The hybrid modeling approach was then applied to approximate the stress-strain behavior of a clayey material tested in uniaxial compression. A large number of tests were obtained that varied in compaction conditions, the strain at which the soil was unloaded, and the number of unload-reload cycles. The designed networks demonstrated high ability in simulating the real behavior of the soil in all stages of loading, in the neighborhood of the hysteresis and away from it. The high accuracy obtained with this new class of models is very important because of the deficiency of the classic techniques of constitutive modeling in achieving such accuracy.

Moreover, the neural network approach is flexible and can be used to derive generalized models of soils simply by using additional data featuring all the variables to be included in the model, such as soil properties. Another feature of the neural network model is model inversion, a quite difficult task in classic modeling. Here, inversion is simply performed by switching the input with the output vectors and retraining. The simplicity of inversion is due to the nature of the sequential mapping, which overcomes the nonuniqueness in the function. A few inversion simulations of constitutive models using this technique can be found in Basheer.1

Future studies should be directed toward further investigation of the effectiveness of the TSD training approach in modeling three-dimensional cyclic loading. This requires special representation of the stress-strain data in the three loading axes. A constitutive model could be further expanded by using stress-strain curves of a variety of soils with a wide range of physical properties and operating conditions. Such a model would be useful, when implemented in numerical solution techniques, in solving earth-related engineering problems.
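The input/output switching on which the inversion relies can be sketched at the data level. This is a hypothetical example built on the one-point scheme of Equation (7a), with function names of my own; the article performs the swap on its full input vectors before retraining:

```python
def forward_pairs(eps, sig, b):
    # Forward exemplar (Eq. 7a): predict the next stress from the
    # current state and the next strain.
    return [([eps[i], sig[i], eps[i + 1], b], sig[i + 1])
            for i in range(len(eps) - 1)]

def inverse_pairs(eps, sig, b):
    # Inverted exemplar: swap the roles of the future strain and the
    # future stress, so a retrained network predicts the strain
    # required to reach a target stress.
    return [([eps[i], sig[i], sig[i + 1], b], eps[i + 1])
            for i in range(len(eps) - 1)]
```

Because each exemplar is conditioned on the current (ε, σ) state, the inverse mapping stays single-valued even across the hysteresis loop, which is why the swap-and-retrain step is sufficient.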

Table 3 Physical conditions of the soil and number of samples tested with one and two unloading-reloading cycles (TRN, training; TST, testing; VAL, validation)
Number of stress-strain curves or stress-strain data points (patterns)

Set   Water content   Strain at unloading   Total    Total    TRN      TRN      TST      TST      VAL      VAL
no.   w (%)           εu (%)                curves   points   curves   points   curves   points   curves   points
I     15–23           1.0–5.0               26       1570     16       980      6        370      4        220
II    16–20           1.0–4.0               22       1246     11       634      6        335      5        277


Fig. 14. Samples of comparison between experimental and simulated cyclic stress-strain curves. (a–c) Training, testing, and validation for single-cycle curves. (d–f) Training, testing, and validation for double-cycle curves. 1 psi = 6.895 kN/m2 .


Fig. 15. Prediction of three-cycle σ -ε curve with the one-cycle neural constitutive model.

DISCLAIMER

The contents of this article reflect the author's views and do not reflect the official views or practices of the California Department of Transportation.

REFERENCES

1. Basheer, I. A., Neuromechanistic-based modeling and simulation of constitutive behavior of fine-grained soils, Ph.D. dissertation, Kansas State University, Manhattan, 1998.
2. Chen, W. F., Constitutive Equations for Engineering Materials, Vol. 2: Plasticity and Modeling, Elsevier, New York, 1994.
3. Chen, W. F. & Baladi, G. Y., Soil Plasticity: Theory and Implementation, Elsevier, New York, 1985.
4. Chen, W. F. & Saleeb, A. F., Constitutive Equations for Engineering Materials, Vol. 1: Elasticity and Modeling, Elsevier, New York, 1994.
5. Daniel, D. E. & Olson, R. E., Stress-strain properties of compacted clays, Journal of Geotechnical Engineering, ASCE, 100 (10) (1974), 1123–36.
6. Desai, C. S., Phan, H. V. & Sture, S., Procedure, selection and application of plasticity models for a soil, International Journal of Numerical and Analytical Methods in Geomechanics 5 (1981), 295–311.
7. Desai, C. S. & Siriwardane, H. J., Constitutive Laws for Engineering Materials with Emphasis on Geologic Materials, Prentice-Hall, Englewood Cliffs, NJ, 1984.
8. Ellis, G. W., Yao, C., Zhao, R. & Penumadu, D., Stress-strain modeling of sands using artificial neural networks, Journal of Geotechnical Engineering, ASCE, 121 (5) (1995), 429–35.
9. Ghaboussi, J., Garrett, J. H., Jr. & Wu, X., Knowledge-based modeling of material behavior with neural networks, Journal of Engineering Mechanics, ASCE, 117 (1) (1991), 132–53.
10. Ghaboussi, J. & Sidarta, D. E., New nested adaptive neural network (NANN) for constitutive modeling, Computers and Geotechnics 22 (1) (1998), 29–52.
11. Hassoun, M. H., Fundamentals of Artificial Neural Networks, MIT Press, Cambridge, MA, 1995.
12. Hecht-Nielsen, R., Neurocomputing, Addison-Wesley, Reading, MA, 1990.
13. Jordan, M. I., Attractor dynamics and parallelism in a connectionist sequential machine, in Proceedings of the 8th Annual Conference of the Cognitive Science Society, Amherst, MA, 1986, pp. 531–46.
14. Konder, R. L., Hyperbolic stress-strain response: Cohesive soils, Journal of Soil Mechanics and Foundation Engineering Division, ASCE, 89 (1) (1963), 115–43.
15. Penumadu, D. & Zhao, R., Triaxial compression behavior of sand and gravel using artificial neural networks (ANN), Computers and Geotechnics 24 (1999), 207–30.
16. Poggio, T. & Girosi, F., Networks for approximation and learning, Proceedings of the IEEE 78 (9) (1990), 1481–97.
17. Prevost, J. H. & Kean, C. M., Shear stress-strain curve generation from simple material parameters, Journal of Geotechnical Engineering, ASCE, 116 (8) (1990), 1255–63.
18. Simpson, P. K., Artificial Neural Systems: Foundations, Paradigms, Applications, and Implementations, Pergamon Press, New York, 1991.
19. Zhu, J., Zaman, M. & Anderson, S., Modeling of soil behavior with a recurrent neural network, Canadian Geotechnical Journal 35 (1998), 858–72.