Combining Global and Local Optimization Techniques for Automatic History Matching Production and Seismic Data

S. Mantica and A. Cominelli, ENI-Agip, and G. Mantica, INFN-INFM, U. dell'Insubria

Summary

Gradient-based optimization techniques are increasingly adopted by the oil industry for computer-aided history matching because of the great time savings they can offer over conventional trial-and-error approaches. However, these methods lead to the identification of a single set of parameters, thus neglecting the inherent nonuniqueness of the solution of the underlying inverse problem. In this paper we propose a new approach that couples a chaotic sampling of parameter space with a local minimization technique. Through the evolution of a nonlinear dynamical system, we identify several points to be used successively as initial guesses for a local gradient-based optimizer. This provides a series of alternative matched models with different production forecasts, which improves the understanding of the possible reservoir behaviors. The validity of this approach has been proven on a synthetic reservoir derived from a real West African field.

Introduction

The search for an optimal set of simulation parameters to match reservoir production and time-lapse seismic data is usually a significant problem. Typically, it requires the minimization of a nonconvex, least-squares objective function in a parameter space populated by many local minima. Starting from a physically reasonable point, gradient-based methods1,2 allow a fast descent to the closest minimum. The drawback of this approach is potentially twofold. First, the reduction in the cost function may be unsatisfactory, and could only be improved with a redefinition of some of the initial parameters. Second, even when the match is acceptable, only a single forecast scenario is produced. The nonconvex nature of the history-match problem can be conceptually better tackled using stochastic global optimization techniques, in which parameter space is explored by randomly generated trajectories until a satisfactory minimum is reached. In this framework, entrapment around local minima is avoided by ad hoc hill-climbing rules.
A typical approach of this kind is the well-known simulated annealing3 (SA) method, in which uphill moves are accepted in accordance with a thermally driven Metropolis rule.4 A cooling of the fictitious statistical-mechanical system defined in this way guarantees convergence toward the global minimum. Many authors have already proposed global minimization approaches, either stochastic or deterministic, in the field of reservoir simulation (see Ouenes et al.5 for a review). In particular, simulated annealing,6 the tunneling method,7 genetic algorithms,8 and hybrid approaches9 all seem very promising. Unfortunately, however, global convergence—even to an approximation of the solution—usually requires a huge number of iterations. This price is often too high for the reservoir history-match problem, in which each evaluation of the objective function is computationally expensive.
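As a concrete illustration of the SA mechanics just described, the sketch below uses a Gaussian proposal whose scale tracks the temperature, a Metropolis accept/reject test, and geometric cooling. This is a generic, minimal example, not the implementation used in any of the cited references; the function name and cooling schedule are assumptions for illustration.

```python
import math
import random

def sa_minimize(f, x0, temp=1.0, cooling=0.99, steps=2000, seed=0):
    """Minimal 1-D simulated-annealing sketch (illustrative only):
    Gaussian proposals, Metropolis acceptance, geometric cooling."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for _ in range(steps):
        x_new = x + rng.gauss(0.0, temp)   # proposal scale tied to temperature
        f_new = f(x_new)
        # downhill moves are always accepted; uphill moves pass a
        # thermally driven Metropolis test
        if f_new < fx or rng.random() < math.exp(-(f_new - fx) / temp):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling                    # anneal: slowly freeze the system
    return best_x, best_f
```

Note that the short-circuit in the acceptance test means the exponential is only evaluated for uphill moves, so its argument is never positive and cannot overflow.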

Copyright © 2002 Society of Petroleum Engineers This paper (SPE 78353) was revised for publication from paper SPE 66355, first presented at the 2001 SPE Reservoir Simulation Symposium, Houston, 11–14 February. Original manuscript received for review 29 March 2001. Revised manuscript received 28 November 2001. Manuscript peer approved 21 December 2001.

June 2002 SPE Journal

Intuitively at least, some sort of mating between local and global techniques should lead to a reasonable compromise between the slow convergence of the latter and the fast, but locally trapped, nature of the former. We might think of generating some (short) trajectories in parameter space, which would hopefully get close to some minima of the history-match problem. A reasonably selected sample of points on these trajectories could then be used to start a good local gradient-based optimizer. Certainly, this is not an entirely new idea; many minimization libraries employ the technique of multiple starting points. What is new in our approach is the attempt to improve dynamically—and not stochastically—on the "quality" of these starting points. The way in which we hope to achieve this goal will be evident in a moment.

SA, surely the first candidate as global counterpart in a coupled approach, is in our opinion inadequate for our aim, at least in the known implementations. At the beginning of the evolution, when the annealing temperature is high, SA samples parameter space merely as an undirected random walk. It is only the slow descent of the temperature that drives the path toward the global minimum at later times. But even so, the Markovian property of SA implies that all the information on the structure of the objective function, obtained all along the minimization path, is discarded. This waste of information is critical when the number of function evaluations is considered a key issue for the problem under study. We then propose a totally different approach: the initial seeds for the local minimization should be provided by a nonlinear dynamical system, driven by the values of the objective function.
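The multiple-starting-points idea mentioned above can be caricatured as a multistart loop: generate candidate seeds, run a local optimizer from each, and keep every matched model. In the sketch below, a plain uniform random sampler stands in for the dynamically generated seeds, and local_descent is a crude derivative-free stand-in for a gradient-based optimizer; both helpers are hypothetical illustrations, not the method proposed in this paper.

```python
import random

def local_descent(f, x, step=0.1, iters=200):
    """Crude 1-D local minimizer (stand-in for a gradient-based optimizer):
    probe both directions, halve the step when neither probe improves."""
    for _ in range(iters):
        fx = f(x)
        if f(x + step) < fx:
            x += step
        elif f(x - step) < fx:
            x -= step
        else:
            step *= 0.5        # refine near the local minimum
    return x, f(x)

def multistart(f, lo, hi, n_seeds=10, seed=0):
    """Run the local optimizer from several seeds and rank the results,
    so alternative local minima (alternative matched models) are kept."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_seeds):
        x0 = rng.uniform(lo, hi)   # stand-in for a smarter seed generator
        results.append(local_descent(f, x0))
    return sorted(results, key=lambda r: r[1])
```

For a multimodal objective such as f(x) = (x² − 1)², different seeds descend into different basins, which is precisely why the quality of the starting points matters.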
This method, originally developed in the context of a fractal reconstruction problem,10 has been applied to simple synthetic history-match problems in a previous paper.11 A trajectory is generated in parameter space according to a nonlocal rule, by which the point x_{n+1} is obtained as a function, B_λ, of all the previously generated points:

x_{n+1} = B_λ(x_1, . . . , x_n). . . . . . . . . . (1)

In the above, λ = an external parameter, which will be defined later on. Of course, B_λ is also constructed taking into account the values of the objective function f at these points. As it stands, B_λ is a deterministic process, as is the gradient method. The difference is that we do not want this process to aim straight at the supposed minimum; rather, we would like it to wander in space, spending longer spans of time where the objective function is lower, thereby learning its structure and making better and better guesses of where the global minimum might lie. These guesses should then provide us with the initial points for the gradient technique. We propose to carry out such a complex task via a "chaotic" system, that is, one with trajectories that have maximal algorithmic complexity, and which can therefore encode the structure of the objective function. Within these bounds, the actual implementation, and the definition of the rule B_λ, is a matter of problem engineering. We present in this paper a particular realization of this idea that has performed satisfactorily on the problem at hand. The paper proceeds as follows: in the next section we review some basic properties of stochastic minimization techniques, with particular focus on SA, and we introduce the method of chaotic optimization. Next, we apply the proposed approach on a model

problem. Finally, our comments and perspectives for future work conclude the paper.

Chaotic Optimization of Complex Systems

To emphasize the properties of chaotic sampling, it is worth reviewing, by contrast, some features of SA-type methods. A sequence of points x_1, . . . , x_n in parameter space, called "accepted" points, is iteratively constructed. As shorthand notation, let f_1, . . . , f_n be the sequence of values of f at these points: f_i = f(x_i). A temperature parameter λ_n is also defined. A new search point x* is generated using a random-choice function g, which depends both on λ_n and on the last accepted point x_n:

x* = g(x_n; λ_n). . . . . . . . . . (2)

The function g(.; λ_n) can be defined as a distribution centered around x_n with standard deviation proportional to λ_n (e.g., a Gaussian or a Cauchy-type distribution). The new point x* is added to the list of accepted points if f(x*) < f_n, or if a Metropolis condition is fulfilled:

x_{n+1} ≡ x* if rnd < exp[−(f* − f_n)/λ_n], . . . . . . . . . . (3)

where f* = f(x*) and rnd = a random number generated using a uniform distribution in [0,1]. During the iterations, the temperature λ_n is usually decreased, following a logarithmic rule (classical SA) or a hyperbolic one (fast SA), whenever the corresponding states are considered sufficiently sampled according to a user-defined criterion. The SA procedure just outlined has some interesting convergence properties,12 which ultimately derive from the fact that the neighborhood of any point in parameter space is visited infinitely often in time, but without any estimate of the rate at which the sequence approaches the solution. This makes the method too costly from a computational point of view for problems such as history matching, in which each function evaluation is a CPU-time-consuming process. Moreover, as is evident from Eq.
2, only the last accepted point determines the process, and all the previously computed points and related function values are discarded. The approach we adopt here, on the contrary, tries to take advantage of all the collected information in a fixed, finite number of steps. Consider the dynamical system defined by Eq. 1, where the following properties are verified:

• Both sequences, x_1, . . . , x_n and f_1, . . . , f_n, contribute to B_λ.
• λ is a parameter that plays the same role as the temperature in the SA framework.
• For any λ > 0, the motion generated by B_λ in parameter space is exponentially unstable.
• At iteration count n, the motion is attracted by the points with low values of f in the sequence x_1, . . . , x_n.
• As λ diminishes, the attracting effect is amplified.

A typical trajectory of a system so defined should, at least in principle, closely examine regions of low objective function. At the same time, the requirement of exponential instability should suffice to avoid trapping in a local minimum. One could also think of annealing this system by lowering the parameter λ during the evolution. As a result, the system B_λ should build up trajectories in which the parameter-space structure of the objective function is encoded. Indeed, in the theory of dynamical systems it can be proven that an unstable trajectory has maximal complexity; as a consequence, it represents the "best" possible "sampling" of f. Finally, a judicious choice of a subset of the points on the trajectory should give the starting points for a gradient minimization algorithm. The problem remains of deriving a dynamical system with these properties. We adopt here the following implementation: let the set of n points, x_1, . . . , x_n, in parameter space be given. The last point, x_n, is the present point in the exploration of parameter space. Define a pseudoelectric charge q_i for each of the n points as

q_i =
  1                          if i = n,
  0                          if n − p ≤ i < n,
  −exp[(f_n − f_i)/λ_n]      if 0 < i < n − p and f_n > f_i,
  f_i − f_n                  if 0 < i < n − p and f_n ≤ f_i. . . . . . . . . . (4)

In the above, p