Grid Technology in Financial Planning - A Methodology for Portfolio Structuring

Jing Tang¹, Meng Hiot Lim¹, and Yew Soon Ong²
1 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
{pg04159923, emhlim}@ntu.edu.sg
2 School of Computer Engineering, Nanyang Technological University, Singapore 639798
[email protected]

ABSTRACT
This report outlines the development of a grid-enabled computational methodology for solving the portfolio optimization problem. Unlike the classical treatment of a set of mean-variance optimized portfolios as a deterministic efficient frontier, we adopt a novel treatment of the frontier as a distribution of points on a statistical front. This is made possible by our stochastic algorithms, specifically an evolutionary programming (EP) approach that is able to derive meaningful portfolios for the statistical front. More importantly, the EP approach is inherently a parallel search algorithm, making it well suited to exploiting the immense computational resources that a grid computing architecture can provide. To derive a statistically meaningful front, one requires a great number of points to form the efficient band. With a grid computing framework, one is able to farm out such computationally expensive tasks and derive "interesting" portfolios to create the band. Such a capability provides the necessary backbone for developing further analytic methodologies for portfolio rebalancing, an important aspect of financial planning. This is significant since, in recent years, financial planning has permeated the mindset of the population, and the level of awareness is set to increase with the population's growing affluence.

1. Introduction
A portfolio refers to a collection or basket of tradable assets or securities held by an investor. The composition of a so-called optimal portfolio is formulated so as to achieve a good rate of return with the least risk exposure possible. Hence it is a matter of trading off between maximizing return and minimizing risk. To put it simply, an investor who has a greater propensity to stomach risk will logically seek a premium in terms of a higher rate of return for the greater risk exposure. The level of risk is controlled through diversification, while the optimal proportioning of investment capital across a broad spectrum of securities maximizes the rate of return. Portfolio management is thus an important facet of financial management. In a modern day scenario where superior information technology infrastructure is the norm, efficient and reliable computing methodologies are therefore crucial. Markowitz [1952] laid the foundation of a framework for mean-variance portfolio optimization. The theory put forth a methodology for deriving an optimal portfolio based on the concept of minimizing the risk associated with a level of expected return, the so-called mean-variance optimized portfolio. Accordingly, the higher the return expected of a portfolio, the higher the risk associated with the securities held in the portfolio. Conceptually, one is to allocate investment capital across a few classes of assets or securities in a manner such that the risk exposure is minimized. To make use of this approach, one has to identify the efficient frontier associated with the assets in the portfolio. So far, the critical line algorithm has been the most widely used, and probably the only, approach for computing the efficient frontier [Markowitz et al. 1992][Perold, 1984]. With the efficient frontier, the problem reduces to the mathematical computation of a set of weights describing the apportionment of capital across the various assets, based on an expected mean rate of return and variance represented as a point on the frontier. For example, assume that one is to allocate investment capital across N securities. From the efficient frontier, we identify a point P=(a,b) on the frontier as an investor's mean-variance efficient portfolio. With the portfolio P, one is able to achieve a maximum expected return equal to b for a risk exposure specified by the variance a. It is therefore necessary to completely solve for a set of weights w1,


w2, ..., wN to fully define how the investor should commit his investment across the various assets. Usually, we require Σ wi = 1, meaning that the investor's wealth is fully invested. For example, Lagrangian optimization can be used to solve for the set of weights analytically [Campbell et al. 1997]. Each weight wi, with |wi| < 1, specifies the proportion of capital to be invested in asset i. Negative values of weights denote shorting of securities in the portfolio. For the purpose of illustrating our technique, we contend only with positive values of weights, although shorting of securities can be easily accommodated in our approach. Typically in portfolio management, the basic philosophy calls for sufficient diversification of the assets held in the portfolio, consistent with the traditional investment adage of "never put all your eggs in one basket". Our intention in this paper is to put forward a versatile approach that is suitable for solving large-scale asset allocation, with the flexibility to incorporate practical constraints. The context of size in this case refers to the number of assets in the portfolio, indicated by the value N. The stochastic technique described in this paper is based on evolutionary programming, a potentially powerful methodology that is capable of offering a multitude of alternative well-diversified portfolios. Evolutionary programming [Fogel 1995] is an approach based on the Darwinian evolutionary metaphor of selection and adaptation. Unlike the genetic algorithm, its more popular counterpart, EP relies primarily on mutation-type search and exploratory mechanisms as its basic search engine. The genetic algorithm, on the other hand, takes full advantage of crossover-type operations as its main exploratory mechanism. The main strength of EP, which it shares with evolution strategies, is its ability to exploit direct linkages between parents and offspring through cycles of mutation-based regeneration of the population pool. As a search technique, it is versatile and powerful for solving optimization problems, most notably in function approximation and combinatorial optimization. EP is inherently a parallel computational search methodology. When applied to solve the portfolio optimization problem, the algorithm lends itself to parallelization in a grid computing environment. With this added computational horsepower, one is able to solve problems that are computationally demanding. In this respect, the derivation of a reliable and statistically meaningful efficient frontier, which forms the cornerstone of asset allocation in risk management, can greatly benefit from a grid computing framework. The next section briefly presents the basics of mean-variance portfolio optimization and the algorithm for computing the efficient frontier, highlighting the coding structure and operations specific to the needs of our problem. Subsequently, in Section 3, we implement and demonstrate the applicability of our sequential algorithm by applying it to a set of real data involving 24 classes of securities, as well as the parallel algorithm for the multiple EP. Finally, we end the paper by summarizing the contributions of this report and discussing, from a practical perspective, the implications of constraint handling within the context of our approach.
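For concreteness, the short C sketch below (not part of the original algorithm, helper name hypothetical) checks the fully invested, long-only constraint described above, i.e. that the weights are non-negative and sum to one.

#include <math.h>

/* Illustrative helper: check that a candidate allocation w[0..N-1] is
 * fully invested and long-only (sum of weights is 1, no weight negative). */
int is_valid_allocation(const double *w, int N) {
    double sum = 0.0;
    for (int i = 0; i < N; i++) {
        if (w[i] < 0.0)               /* a negative weight would mean shorting */
            return 0;
        sum += w[i];
    }
    return fabs(sum - 1.0) < 1e-9;    /* fully invested: weights sum to 1 */
}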

2. Architecture design
For a portfolio P, the expected return Rp of the portfolio is essentially a function of two components:

Rp = f(µp, σp)    (1)

In Eq. (1), µp refers to the average return attributed to the weighted average returns of all the securities in the portfolio. The second term σp represents the risk exposure of the portfolio. Calculating the average return is relatively straightforward. As shown in Eq. (2), it is basically the sum of the averages weighted according to the contribution of each security in the portfolio:

µp = w1µ1 + w2µ2 + ... + wNµN    (2)

In Eq. (2), µi denotes the mean expected return for asset i in the portfolio. Correspondingly, wi denotes the weightage of asset i in the portfolio. The second component of Eq. (1) corresponds to the risk associated with the portfolio. The overall portfolio risk σp, which is measured in terms of the standard deviations (or variances) of the individual assets, is computed as per Eq. (3) below:


σp² = Σi Σj wi wj σi σj ρij    (3)

Eq. (3) is useful in describing the risk associated with a portfolio as being dependent on the weights, the risk of the individual assets, and the correlation between the various assets. Collectively, the term σiσjρij can simply be referred to as the covariance between assets i and j. One way to estimate the covariance is to compute it from the historical rate of return data of both assets. For a portfolio, the weights corresponding to the allocation of capital to assets are represented as real numbers less than one, although in our algorithm implementation we actually make use of an integer representation for practical considerations. Hence, a solution coding describing the weight vector Wp for allocating capital involving N assets can be represented as a sequence of integer values, written symbolically as follows:

Wp = (I1, I2, ..., IN)    (4)

The integer values can be converted to weight ratios through normalization. For each structure, the solution must satisfy the constraint ΣIi = K, where K is the base value for normalization.

Algorithm: create_efficient_frontier
// g = generation counter
// P(g) = population pool at generation g
// M = number of points for plotting efficient frontier
// T = iteration limit
for (i = 1 to M)
    g := 0;
    initial_population P(0);
    evaluate P(0)
    while (g < T)
        create_new_population P'(g)
            pop_mutate P(g)
            elitism
        evaluate P'(g)
        fitness_adjustment P'(g)
        P(g) = P'(g)
        If (fittest_individual < threshold_VTR)
            Compute_variance
            Update best_variance
            If (var_count >= limit)
                break;
            Else
                continue;
        g := g + 1
    Write_output (weights, exp_return, variance)
End;

Figure 1: Pseudo-code of the EP algorithm for approximating the efficient frontier.

The pseudo-code in Figure 1 describes the basic structure of the program. The primary evaluation of fitness for each solution in the population pool is based on the rate of return specified in Eq. (2). For each set of weights, the primary fitness is evaluated based on how closely the expected rate of return is satisfied. The evaluation function for computing the fitness of a solution structure denoted as s is as follows:

fitness(s) = λ e^(-|VTR - µ(Ws)|)    (5)

In Eq. (5), λ is a constant scaling factor, while the exponent is the absolute difference between VTR (the desired target value of return) and µ(Ws), the average return as determined by Eq. (2). Ws refers to the


weights which are decoded from the structure s. The fitness levels of the individuals in the population are also dynamically adjusted by scaling the fitness with respect to the average, minimum and maximum fitness values in the population. We specify a threshold or tolerance, chosen in the range of 0.00001 to 0.01, as the basis for achieving the expected rate of return computed by means of Eq. (2). For each solution within the specified threshold, the secondary fitness based on the variance criterion specified by Eq. (3) is computed. Since the minimum variance is not known, the parameters used in the evaluation of the secondary fitness are adjusted dynamically as and when better solutions are uncovered. There are basically three types of operations employed in creating a new population with sufficient diversity while maintaining focus on good solution candidates, so as to effect convergence towards desirable solutions fairly quickly. The primary mutation-type operation is based on a biased stochastic selection type of asexual reproduction. In each generation, the selected parent is altered, the extent of which is stochastically determined. The guiding principle behind the remaining two mutation operations is to focus the search on regions that will yield the optimum solution. In one type of mutation, the operation involves culling a significant portion of the population pool and replacing them with the fittest ones in the population. Subsequently, a two-point exchange operation is carried out. For the other mutation operator, the level of modification of each individual structure is determined based on its fitness level. In the pseudo-code of Figure 1, we categorize these three types of mutation operations simply as "pop_mutate P(g)". The elitism strategy ensures that a certain number (>1) of the fittest individuals in the population pool are retained in the next generation. In our approach, we consider the efficient frontier as a distribution of points plotted on axes of mean versus variance. Our justification for such an approach is based on the approximation of the data used in the determination of the efficient frontier. To begin with, the data (expected return and risk of assets) are approximations derived from past historical data. It is therefore not meaningful to derive an absolute efficient frontier; it makes more sense to derive a statistical approximation of the frontier. In this context, the concept of a statistical efficient frontier band is more appropriate. In plotting the efficient frontier band, the more points we consider, the more statistically meaningful the efficient frontier becomes. But calculating many points sequentially takes up the overwhelming bulk of the computation. Considering the time incurred in calculating more points, the natural response is to apply a parallel methodology, where the calculation of points is distributed to multiple machines. Parallelism is thus considered a desirable feature of any framework for optimization of computationally expensive problems. In the present algorithm, it is straightforward to achieve parallelism, since the computation for each point of the efficient frontier can be conducted independently, each carrying out a separate evolutionary programming run on its own.
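To make the evaluation concrete, the following C sketch illustrates Eqs. (2)-(5) under the integer coding of Eq. (4). The function names and the flattened covariance layout are illustrative assumptions, not the authors' actual implementation; λ, K and VTR are supplied by the caller.

#include <math.h>

/* Decode an integer-coded structure I[0..N-1] (summing to K) into weights, Eq. (4). */
void decode_weights(const int *I, double *w, int N, int K) {
    for (int i = 0; i < N; i++)
        w[i] = (double)I[i] / (double)K;          /* normalization to weight ratios */
}

/* Expected portfolio return, Eq. (2). */
double portfolio_return(const double *w, const double *mu, int N) {
    double r = 0.0;
    for (int i = 0; i < N; i++)
        r += w[i] * mu[i];
    return r;
}

/* Portfolio variance, Eq. (3); cov is the N x N covariance matrix
 * (cov[i*N + j] = sigma_i * sigma_j * rho_ij), stored row-major.   */
double portfolio_variance(const double *w, const double *cov, int N) {
    double v = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            v += w[i] * w[j] * cov[i * N + j];
    return v;
}

/* Primary fitness, Eq. (5): how closely the expected return matches VTR. */
double primary_fitness(const double *w, const double *mu, int N,
                       double lambda, double VTR) {
    return lambda * exp(-fabs(VTR - portfolio_return(w, mu, N)));
}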

3. Implementation detail & Demonstration
To demonstrate the parallel algorithm, we employed NetSolve, a computational platform which facilitates grid-based heterogeneous computing in a transparent and efficient manner. Parallelism is achieved by wrapping the evolutionary programming for each point on a NetSolve server. Hence, using the farming client application programming interface, the evolutionary programming for each point of the efficient frontier can be readily executed in parallel on remote servers. Even though we used a centralized database, it should be noted that NetSolve has capabilities for distributed data storage on remote servers. Although our present demonstration is centered around NetSolve as the supporting infrastructure, it is clear that the whole approach can be easily configured to take advantage of a grid network.

3.1 Sequential algorithm
The algorithm outlined was implemented in the C programming language. In order to minimize rounding-off errors, all mathematical calculations are carried out in double precision. To test our algorithm and approach, we make use of a historical data set consisting of monthly rates of return for 24 classes of securities. The historical data covers a period of 10 years from June 1985 to May 1996 [Harvey, 1999]. Using a spreadsheet program, we converted the data to annual rates of return and computed the mean and covariance matrix involving the 24 classes of securities. With the mean rate of return for each security and the covariance matrix, we compute the efficient frontier using the EP algorithm described earlier.
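As a sketch of this preprocessing step, the C function below estimates the mean returns and the sample covariance matrix from a table of historical returns. The data layout and function name are assumptions for illustration, not the spreadsheet procedure used in the paper.

/* Estimate mean returns and the covariance matrix from T periods of
 * historical returns. R is T x N, row-major: R[t*N + i] is the return of
 * asset i in period t. mu has length N; cov is N x N, row-major.        */
void estimate_inputs(const double *R, int T, int N, double *mu, double *cov) {
    for (int i = 0; i < N; i++) {
        mu[i] = 0.0;
        for (int t = 0; t < T; t++)
            mu[i] += R[t * N + i];
        mu[i] /= T;                               /* sample mean return */
    }
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            double c = 0.0;
            for (int t = 0; t < T; t++)
                c += (R[t * N + i] - mu[i]) * (R[t * N + j] - mu[j]);
            cov[i * N + j] = c / (T - 1);         /* sample covariance */
        }
    }
}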


[Figure 2 plot: Mean (y-axis, 0 to 0.14) versus Var (x-axis, 0 to 0.3), with efficient frontiers for the 1986-1996, 1986-1995 and 1986-1994 data sets.]

Figure 2: Efficient frontiers obtained from data of different years.

Figure 2 shows the efficient frontiers computed for the 24 classes of securities from data of different years. The terminating criterion is set at 1000 iterations. From our observations, the algorithm can usually converge to a solution in significantly fewer iterations, thus providing a close approximation of the efficient frontier. For the plots in the figure, which are based on 100 points, the computation of the efficient frontier takes almost 150 seconds on a personal computer with a 1.9 GHz clock speed. The plots of the efficient frontier in Fig. 2 show that the program can approximate the frontier consistently and reliably. We do not expect it to be computationally demanding to perform multiple runs of the EP algorithm. From the frontiers obtained through various runs of the program, one can subsequently make use of a simple curve smoothing routine to derive a smoothly defined efficient frontier. This in our view is secondary, since it is a well-known fact that statistical estimation of return and risk is by itself prone to uncertainty and error. It is also not necessary if we treat the efficient frontier as a statistical band consisting of points (each point representing a mean-variance efficient portfolio) that are able to achieve the optimal mean-variance objective without any significant deviation.

3.2 Parallel algorithm
The simulations of the parallel algorithm were carried out on a cluster of Pentium IV 1.9 GHz platforms. Each machine is equipped with 256 MB of RAM and runs the Red Hat Linux 7.0 operating system. For each benchmark problem, we conduct 10 simulation runs to obtain the statistical result. Unlike the sequential algorithm, we consider 100 points and 200 points in the parallel algorithm. Figure 3 shows the performance of the parallel algorithm in terms of computation time on 1 to 11 machines. For both 100 points and 200 points, the total CPU time reduces accordingly as the number of machines increases. The reduction in computation time is more pronounced as the number of points on the efficient frontier increases. This allows users to derive the efficient frontier in a more time-efficient manner, and points to the commercial potential of providing the efficient frontier to users that are required to operate in a competitive market.


[Figure 3 plot: Time (seconds, 0 to 300) versus number of machines (1 to 11), with curves for 100 requests and 200 requests.]

Figure 3: The reduction of computation time for the parallel algorithm of the efficient frontier.
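The paper farms the per-point EP runs out to NetSolve servers through the farming client API; since those calls are not reproduced here, the sketch below uses POSIX threads as a generic stand-in to show that each frontier point is an independent unit of work. The target-return range, the stub run_ep_for_point, and all other names are hypothetical.

#include <pthread.h>
#include <stdio.h>

#define M 100   /* number of frontier points to compute */

typedef struct {
    double target_return;   /* VTR for this frontier point       */
    double variance;        /* best variance found by the EP run */
} PointJob;

/* Stub standing in for a full EP run (Figure 1); in the actual system this
 * unit of work is wrapped and executed on a remote NetSolve server.       */
static double run_ep_for_point(double target_return) {
    return target_return * target_return;   /* placeholder value only */
}

static void *worker(void *arg) {
    PointJob *job = (PointJob *)arg;
    job->variance = run_ep_for_point(job->target_return);
    return NULL;
}

int main(void) {
    pthread_t threads[M];
    PointJob jobs[M];

    /* Spread target returns across an illustrative range; each point is an
     * independent EP run, so all M jobs can execute in parallel.           */
    for (int i = 0; i < M; i++) {
        jobs[i].target_return = 0.02 + i * (0.12 / (M - 1));
        pthread_create(&threads[i], NULL, worker, &jobs[i]);
    }
    for (int i = 0; i < M; i++) {
        pthread_join(threads[i], NULL);
        printf("VTR=%.4f  variance=%.6f\n",
               jobs[i].target_return, jobs[i].variance);
    }
    return 0;
}

In the grid setting, each worker call would simply be replaced by a remote invocation of the wrapped EP solver on a NetSolve server.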

4. Conclusion
EP can be a powerful tool for financial market applications, in particular for effective portfolio management. With the EP algorithm, the efficient frontier for a cluster of assets or securities can be easily approximated. This is essential as decisions on the allocation of capital using the mean-variance approach hinge on the efficient frontier. Performance-wise, the algorithm was shown to be reliable and fast, making it potentially useful for very large-scale asset allocation and also for situations involving dynamic asset reallocation. Aiming to improve the efficiency and effectiveness of the algorithm, we implemented the multiple evolutionary programming algorithm embedded in a grid-enabled solver, NetSolve. Combining the financial model with advanced computing technology is a challenge in itself. To verify the performance of the parallel implementation of the multiple EP, we ran simulations for both 100 points and 200 points. The simulations show that the total CPU time reduces accordingly as the number of machines increases. The successful application of the multiple evolutionary programming algorithm within a grid computing environment for providing the efficient frontier demonstrates its potential in solving other computationally demanding financial problems. It serves as a good example of a financial application that takes advantage of advanced computing technology. We envisage that our approach can be a significant step towards solving other financial problems efficiently, with considerable commercial potential.

References
Campbell, J. Y., Lo, A. & MacKinlay, A. C. [1997], "The Econometrics of Financial Markets," Princeton University Press.
Fogel, D. B. [1995], "Evolutionary Computation: Toward a New Philosophy of Machine Intelligence," IEEE Press, Piscataway, NJ.
Harvey, C. [1999], Duke University, permission to use data; original source of data – Ibbotson Associates.
Markowitz, H. [1952], "Portfolio Selection," Journal of Finance, vol. 7, pp. 77-91.
Markowitz, H., Todd, P., Xu, G. & Yamane, Y. [1992], "Fast Computation of Mean-Variance Efficient Sets Using Historical Covariances," Journal of Financial Engineering, vol. 1, no. 2, pp. 117-132.
Perold, A. F. [1984], "Large-Scale Portfolio Optimization," Management Science, vol. 30, no. 10, October.
