Optimisation in Financial Engineering
An essay on 'good' solutions and misplaced exactitude

Manfred Gilli (University of Geneva and Swiss Finance Institute)
Enrico Schumann (University of Geneva)

This paper is based on Manfred Gilli's leçon d'adieu, given at the Conférence Luigi Solari 2009 in Geneva. Both authors gratefully acknowledge financial support from the EU Commission through MRTN-CT-2006-034270 COMISEF. Version: 9 February 2010.

Der Mangel an mathematischer Bildung gibt sich durch nichts so auffallend zu erkennen wie durch maßlose Schärfe im Zahlenrechnen. (Carl Friedrich Gauß)

'Nothing reveals a lack of mathematical education so conspicuously as boundless precision in numerical calculation.'

Imagine you take a trip to the Swiss city of Geneva, known for international organisations, expensive watches and, not least, its lake. After leaving the train station, you ask a passerby how far it is to the lakefront. You are told 'Oh, just this direction, along the Rue du Mont-Blanc. It's 512.9934 meters.' This is not a made-up number: Google Earth allows you to track your path with such precision. We measured ten times, however, and obtained a range between roughly 500 and 520 meters for this particular route. (Unfortunately, had you really tried to take this route, during most of 2009 you would have found the street blocked by several construction sites.)

When we are told 'it's 500 meters to the lake', we know that this should rather mean about, say, between 400 and 600 meters; we intuitively translate the point estimate into a range of reasonable outcomes. In other fields, we sometimes seem to lack such an understanding. In this short article, we shall look at one such field, financial engineering. We will argue that misplaced precision can sometimes be found here, and we will discuss it in the context of financial optimisation.

In Section 1 we discuss the different types of approximation that are present in financial modelling. In Section 2 we move to optimisation: we discuss a particular class of optimisation methods, so-called heuristics, and their application and appropriateness through a concrete example from asset allocation. Section 3 concludes. Throughout this paper, we use the words precise and exact for the numerical quality of a quantity; we use accurate if a quantity is precise and economically meaningful.

1 Financial Modelling

In setting up and solving an optimisation model, we necessarily commit a number of approximation errors. (A classic reference on the analysis of such errors is von Neumann and Goldstine [1947]; see also the discussion in Morgenstern [1963, ch. 6].)


'Errors' here does not mean that something went wrong; these errors occur even when all procedures work as intended. The first approximation is from the real problem to the model. For instance, we may move from actual prices in actual time to a mathematical description of the world in which both prices and time are continuous (ie, infinitely small steps are possible). Such a model, if it is to be empirically meaningful, needs a link to the real world, which comes in the form of data, or parameters that have to be forecast, estimated, simulated or approximated in some way. Again, we have a likely source of error, for the available data may or may not reflect the true, unobservable process well.

When we solve such models on a computer, we approximate a solution; such approximations are the essence of numerical analysis. At the lowest level, errors come with the mere representation of numbers. A computer can represent only a finite set of numbers exactly; any other number has to be rounded to the closest representable number, hence we have what is called roundoff error. Then, many functions (eg, the logarithm) cannot be computed exactly on a computer but need to be approximated. Operations like differentiation or integration, in their mathematical formulation, require a 'going to the limit', ie, we let quantities tend to zero or infinity. That is not possible on a computer, where every quantity must stay finite; hence we have so-called truncation error. For optimisation models, we may incur a similar error: some algorithms, in particular the methods that we describe below, are stochastic, hence we do not (in finite time) obtain the model's 'exact' solution, but only an approximation (notwithstanding other numerical errors).

In sum, we can roughly divide our modelling into two steps: from reality to the model, and then from the model to its numerical solution. Unfortunately, large parts of the quantitative-finance literature seem concerned only with assessing the quality of the second step, from model to implementation, and attempt to improve there. In the past, a certain division of labour was necessary: the economist created the model, and the computer engineer put it into numerical form. But today, there is little distinction left between the researcher who creates the model and the numerical analyst who implements it. Modern computing power allows us to solve incredibly complex models on our desktops. (John von Neumann and Herman Goldstine, in the above-cited paper, describe the inversion of 'large' matrices, where large meant n > 10. In a footnote, they 'anticipate that n ∼ 100 will become manageable' (fn. 12). Today, Matlab inverts a 100×100 matrix on a normal desktop PC in a millisecond. But please, do not solve equations by matrix inversion.) But then, of course, the responsibility for checking the reasonableness of the model and its solution lies – at all approximation steps – with the financial engineer, and evaluating problems only at the second step falls short of what is required: any error in this step must be set into context; we need to compare it with the error introduced in the first step. This is, even conceptually, much more difficult. Even if we accepted a model as 'true', the quality of the model's solution would be limited by the attainable quality of the model's inputs. Appreciating these limits helps us decide how 'exact' a solution we actually need.
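
To make the two numerical error types just described concrete, here is a small Python illustration (ours, not part of the original text): the first part shows roundoff error in the representation of 0.1; the second shows truncation error in a forward-difference approximation of a derivative, which roundoff eventually spoils as the step size shrinks.

```python
# Two tiny illustrations (ours, not from the paper) of the numerical
# errors just described.
import numpy as np

# Roundoff: 0.1 has no exact binary representation, so adding it ten
# times does not give exactly 1.0.
s = sum(0.1 for _ in range(10))
print(s == 1.0)                      # False
print(abs(s - 1.0))                  # on the order of 1e-16

# Truncation: a forward difference approximates the derivative of exp
# at x = 1.  The truncation error shrinks with the step size h, but for
# very small h the roundoff in exp(x + h) - exp(x) takes over.
x = 1.0
for h in (1e-2, 1e-5, 1e-8, 1e-12):
    approx = (np.exp(x + h) - np.exp(x)) / h
    print(f"h = {h:.0e}   error = {abs(approx - np.exp(x)):.2e}")
```
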
This decision is relevant for many problems in financial engineering, since we generally face a trade-off between the precision of a solution and the effort required to obtain it (most obviously, computing time). Surely, the numerical precision with which we solve a model matters; we need reliable methods. Yet, empirically, there must be a required-precision threshold for any given problem: improvements beyond this level no longer translate into gains for the actual problem, only into costs (increased computing time or development effort). For many finance problems, we suspect, this required precision is not high.


Example 1  In numerical analysis, the sensitivity of a problem can be described as follows: if we perturb an input to a model, the change in the model's solution should be proportional; if the impact is far larger, the problem is called sensitive. Sensitivity is often not a numerical problem; it rather arises from the model or the data. In finance, many models are sensitive. Figure 1 shows the S&P 500 from 31 December 2008 to 31 December 2009, ie, 253 daily prices. The index level rose by 23%; by 23.45%, to be more precise (from 903.25 to 1115.10). But does it make sense to report this number to such precision?

Figure 1. Left: The S&P 500 in 2009. Right: Annual returns after jackknifing 2 observations. The vertical line gives the realised return.

Suppose we randomly pick two observations – less than one percent of the daily returns – and delete them; then we again compute the yearly return. Repeating this jackknifing 5 000 times, we end up with a distribution of returns; it is pictured in the right panel of Figure 1. The median return is about 23%, but the 10th percentile is 20% and the 90th percentile is 27% (the minimum is only about 11%, the maximum 34%!). Apparently, tiny differences like adding or deleting a couple of days cause very meaningful changes. This sensitivity has been documented in the literature (for instance in Acker and Duck [2007] and Dimitrov and Govindaraj [2007]), but it is often overlooked or ignored. Hence the precision with which point estimates are sometimes reported must not be confused with accuracy. We may still be able to give qualitative findings (like 'this strategy performed better than that one'), but we should not make single numbers overly precise; we need robustness checks. Returns are the empirical building blocks of many models. If these simple calculations are already that sensitive, we should not expect more complex computations to be more accurate.

Example 2  The theoretical pricing of options, following the papers of Black, Scholes and Merton in the 1970s, is motivated by an arbitrage argument according to which we can replicate an option by trading in the underlier and a riskless bond. A replication strategy prescribes holding a certain quantity of the underlier, the delta. The delta changes with time and with moves in the underlier's price, hence the options trader needs to rebalance his positions. Suppose you live in a Black–Scholes–Merton world. You have just sold a one-month call (strike and spot price are 100, no dividends, the riskfree rate is 2%, volatility is constant at 30%), and you wish to hedge the position. There is one deviation from Black–Scholes–Merton, though: you cannot hedge continuously, but only at fixed points in time (see Kamal and Derman [1999]). We simulate 100 000 paths of the stock price and delta-hedge along each path. We compute two types of delta: one is the delta as precise as Matlab can compute it; the other is rounded to two digits (eg, 0.23 or 0.67).

Figure 2. Payoff of replicating portfolios with delta to double precision (left), and delta to 2 digits (right).

The following table shows the volatility of the hedging error (ie, the difference between the achieved payoff and the contractual payoff), in % of the initial option price. (It is often helpful to scale option prices, eg, price to underlier, or price to strike.) Figure 2 shows replicated option payoffs.

frequency of rebalancing    with exact delta    with delta to two digits
once per day                     18.2%                 18.2%
five times per day                8.3%                  8.4%

The volatility of the profit-and-loss is practically the same, so even in the model world, nothing is lost by not computing delta to a high precision. Yet in research papers on option pricing, we often find prices and Greeks to 4 or even 6 decimals.

Here is a typical counterargument: 'True, for one option we don't need much precision. But what if we are talking about one million options? Then small differences matter.' We agree; but the question is not whether differences matter, but whether we can meaningfully compute them. (Your accountant may disagree. Here is a simple rule: whenever you sell an option, round up; when you buy, round down.) Between buying one share of IBM stock and buying one million shares, there is an important difference: you take more risk. We can rephrase our initial example: you arrive at the train station in Geneva, and ask for the distance to Lake Zurich.
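
For readers who want to experiment, here is a compact reconstruction of the discrete-hedging experiment of Example 2 (our sketch, not the code behind the table above): we simulate under the risk-neutral drift and use 21 and 105 rebalancing dates per month as stand-ins for 'once per day' and 'five times per day', so the numbers should come out close to, though not identical with, those reported.

```python
# Discrete delta hedging of a sold one-month at-the-money call, once
# with the 'exact' Black-Scholes delta and once with the delta rounded
# to two digits (a sketch under simplifying assumptions).
import numpy as np
from scipy.stats import norm

S0, K, r, vol, T = 100.0, 100.0, 0.02, 0.30, 1.0 / 12.0
npaths = 100_000

def d1(S, tau):
    return (np.log(S / K) + (r + 0.5 * vol**2) * tau) / (vol * np.sqrt(tau))

def call_price(S, tau):
    return S * norm.cdf(d1(S, tau)) - K * np.exp(-r * tau) * norm.cdf(d1(S, tau) - vol * np.sqrt(tau))

def hedging_error(steps, round_delta):
    rng = np.random.default_rng(42)               # same paths for both deltas
    dt = T / steps
    S = np.full(npaths, S0)
    delta = norm.cdf(d1(S, T))
    if round_delta:
        delta = np.round(delta, 2)
    cash = call_price(S0, T) - delta * S          # premium received minus shares bought
    for i in range(1, steps + 1):
        z = rng.standard_normal(npaths)
        S = S * np.exp((r - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * z)
        cash = cash * np.exp(r * dt)              # interest on the cash account
        tau = T - i * dt
        if tau > 0:                               # rebalance at all dates before expiry
            new_delta = norm.cdf(d1(S, tau))
            if round_delta:
                new_delta = np.round(new_delta, 2)
            cash -= (new_delta - delta) * S
            delta = new_delta
    payoff = np.maximum(S - K, 0.0)
    return delta * S + cash - payoff              # achieved minus contractual payoff

for steps, label in [(21, "once per day"), (105, "five times per day")]:
    for rd in (False, True):
        err = hedging_error(steps, rd)
        print(f"{label:20s} {'rounded' if rd else 'exact':8s} "
              f"{100 * err.std() / call_price(S0, T):5.1f}% of option price")
```
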

2 Optimisation in Financial Engineering

Heuristics

The obsession with precision is also found in financial optimisation; researchers strive for exact solutions, better still if in closed form. Finding such exact solutions is not at all straightforward; for most problems it is not possible. Importantly, optimisation methods like linear or quadratic programming put – in exchange for exact solutions – considerable constraints on the problem formulation; we often must shape the problem such that it can be solved by such methods. Thus, we get a precise solution, but at the price of possibly incurring more approximation error at an earlier stage.

An example from portfolio optimisation can illustrate this point. Markowitz [1959, ch. 9] compares two risk measures, variance and semi-variance, along the dimensions cost, convenience, familiarity and desirability; he concludes that variance is superior in terms of cost, convenience and familiarity. For variance, we can compute the exact solution to the portfolio selection problem; for semi-variance, we can only approximate the solution. But with today's computing power – the computing power we have on our desktops – we can test whether, even with an inexact solution for semi-variance, the gains in desirability are worth the effort.

To solve such a problem, we can use optimisation heuristics. The term heuristic is used in different fields with different, though related, meanings. In mathematics, it is used for derivations that are not provable (sometimes even incorrect), but lead to correct conclusions nonetheless. (The term was made famous by George Pólya [1957].) Psychologists use the word for simple 'rules of thumb' for decision making. The term acquired a negative connotation through the works of D. Kahneman and A. Tversky in the 1970s, since their 'heuristics and biases' programme involved a number of experiments that showed the apparent suboptimality of such simple decision rules. More recently, however, an alternative interpretation of these results has been advanced, see for instance Gigerenzer [2004, 2008]: studies indicate that while simple rules underperform in stylised settings, they yield (often surprisingly) good results in more realistic situations, in particular in the presence of uncertainty.

The term heuristic is also used in computer science; Pearl [1984, p. 3] describes heuristics as methods or rules for decision making that are (i) simple and (ii) give good results sufficiently often. In numerical optimisation, heuristics are methods that aim to provide good and fast approximations to optimal solutions [Michalewicz and Fogel, 2004]. Conceptually, they are often very simple; implementing them rarely requires high levels of mathematical sophistication or programming skills. Heuristics are flexible: we can easily add, remove or change constraints, or modify the objective function. Well-known examples of such techniques are Simulated Annealing and Genetic Algorithms. Heuristics employ strategies that differ from classical optimisation approaches, but exploit the processing power of modern computers; in particular, they include elements of randomness. Consequently, the solution obtained from such a method is only a stochastic approximation of the optimum; we trade off approximation error at the solution step against approximation error when formulating the model. Thus, heuristics are not 'better' methods than classical techniques; the question is rather when to use which approach [Zanakis and Evans, 1981]. In finance, heuristics are appropriate. (Maringer [2005] gives an introduction and presents several case studies.)
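
To give a flavour of how simple such a method can be, the following is a toy Threshold Accepting sketch in Python (an illustration only, not the implementation behind the results reported below): it minimises the semi-variance of a long-only portfolio on simulated returns and ignores the cardinality and holding-size constraints used in the next section.

```python
# Toy Threshold Accepting for a long-only minimum-semi-variance
# portfolio (illustration only; data and settings are made up).
import numpy as np

rng = np.random.default_rng(7)
R = rng.normal(0.0004, 0.01, size=(500, 30))     # fake daily returns: 500 days, 30 assets

def semi_variance(w):
    rp = R @ w                                   # portfolio returns
    shortfall = np.minimum(rp - rp.mean(), 0.0)  # returns below the mean (one common convention)
    return np.mean(shortfall**2)

def neighbour(w, step=0.01):
    """Shift a little weight from one asset to another, keeping the
    budget constraint and non-negativity intact."""
    w = w.copy()
    i, j = rng.choice(len(w), 2, replace=False)
    eps = min(step * rng.random(), w[i])
    w[i] -= eps
    w[j] += eps
    return w

def threshold_accepting(n_rounds=10, n_steps=2000):
    w = np.full(R.shape[1], 1.0 / R.shape[1])    # start from equal weights
    f = semi_variance(w)
    # thresholds shrink to zero: early on we accept mild impairments,
    # later only improvements
    thresholds = np.linspace(0.1 * f, 0.0, n_rounds)
    for tau in thresholds:
        for _ in range(n_steps):
            w_new = neighbour(w)
            f_new = semi_variance(w_new)
            if f_new - f < tau:                  # accept if not much worse
                w, f = w_new, f_new
    return w, f

w_opt, f_opt = threshold_accepting()
print("semi-variance:", f_opt, " max weight:", w_opt.max().round(3))
```
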

Minimising downside risk

In this section, we consider a concrete example: portfolio optimisation. Our first aim is to evaluate the precision provided by a heuristic technique. To do that, we need to compare the in-sample quality of a solution with its out-of-sample quality. Then, we compare several selection criteria for portfolio optimisation and discuss the robustness of the results.

Required precision

We use a database of several hundred European stocks to run a backtest for a simple portfolio strategy: minimise semi-variance, subject to (i) the number of assets in the portfolio must be between 20 and 50, and (ii) any weight of an included asset must lie between 1% and 5%.
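
For completeness, here is a small helper (again an illustration, not code from the study) that checks these two constraints for a candidate weight vector; in a heuristic such as the Threshold Accepting sketch above, a check like this could be used to reject or penalise infeasible candidates.

```python
# Feasibility check for the constraints above: between 20 and 50 assets
# held, each held asset weighted between 1% and 5%, weights summing to one.
import numpy as np

def feasible(w, k_min=20, k_max=50, w_min=0.01, w_max=0.05, tol=1e-8):
    held = w > tol                                   # assets actually included
    ok_cardinality = k_min <= held.sum() <= k_max
    ok_weights = np.all((w[held] >= w_min - tol) & (w[held] <= w_max + tol))
    ok_budget = abs(w.sum() - 1.0) < 1e-6
    return bool(ok_cardinality and ok_weights and ok_budget)

# example: 40 assets at 2.5% each is feasible
print(feasible(np.repeat(0.025, 40)))                # True
```
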

Figure 3. Out-of-sample risk in % (left) and out-of-sample risk-adjusted return (right), plotted against in-sample rank. The grey dots give the actual portfolios; the dark line is a local average.

We construct a portfolio using data from the last year, hold the portfolio for three months and record its performance; then we rebalance. In this manner, we 'walk forward' through the data, which span the period from January 1999 to March 2008. (Details can be found in Gilli and Schumann [2009].)

The solution to this optimisation problem cannot be computed exactly. We use a heuristic method called Threshold Accepting. This method, however, returns only stochastic solutions: running it twice on the same data set will lead to different optimal portfolios. With this method, we face an explicit trade-off between computing time and precision: when we let the algorithm search for longer, we get better solutions on average. We execute, for the same data set, 600 optimisation runs with differing numbers of iterations, and obtain 600 solutions that differ in their in-sample precision; higher in-sample precision is associated with more computing time. We rank these portfolios by their in-sample objective function (ie, in-sample risk), so that rank 1 is the best portfolio and rank 600 the worst.

The left-hand panel of Figure 3 shows the resulting out-of-sample risk of the portfolios, sorted by in-sample rank. We observe an encouraging picture: as the in-sample risk goes down, so does the out-of-sample risk. In other words, increasing the precision of the in-sample solutions does improve the out-of-sample quality of the model – at least up to a point: for the best 200 portfolios or so, the out-of-sample risk is practically the same. So once we have a 'good' solution, further improvements are only marginal. We also compute risk-adjusted returns (Sortino ratios with a required return of zero), shown in the right-hand panel; the pattern is similar, though much noisier.
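
(As an aside, the risk-adjusted return used here is the Sortino ratio with a required return of zero. A minimal helper, with the annualisation convention being our own assumption, looks as follows.)

```python
# Sortino ratio with a required return of zero (our own helper; it
# assumes a vector of periodic returns with at least one observation
# below the required return, and annualises with sqrt(periods)).
import numpy as np

def sortino(returns, periods_per_year=252, required=0.0):
    excess = np.asarray(returns) - required
    downside = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    return np.sqrt(periods_per_year) * excess.mean() / downside
```
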

Figure 4. Upper panel: out-of-sample performance of one euro invested in the rank-1 portfolio. Lower panel: out-of-sample performance after jackknifing.

The following table gives details (all annualised); the numbers in parentheses are the standard deviations of the out-of-sample results.

ranks       average risk in %    average risk-adjusted return
all           9.55 (1.67)          0.48 (0.19)
1–50          8.03 (0.04)          0.66 (0.05)
51–100        8.07 (0.05)          0.66 (0.05)
101–150       8.09 (0.04)          0.64 (0.05)
151–200       8.17 (0.08)          0.63 (0.07)
201–300       8.51 (0.13)          0.59 (0.08)
301–400       9.16 (0.24)          0.49 (0.11)
401–500      10.94 (0.40)          0.35 (0.09)
501–600      12.49 (0.55)          0.19 (0.11)

This randomness, it must be stressed, follows from the optimisation procedure: in each of our 600 runs, we obtained slightly different portfolios, and each portfolio maps into a different out-of-sample performance. For the best portfolios, the improvements are minuscule; for example, the average risk per year of the best 50 portfolios is 4 basis points lower than the risk of the next-best 50 portfolios.

To judge the relevance of this randomness introduced by our numerical technique, we need to compare it with the uncertainty coming from the data. To build intuition, we jackknife the out-of-sample paths just as we did in Example 1 above. An example is illustrated in Figure 4: in the upper panel, we picture the out-of-sample performance of one euro invested in the best portfolio (rank 1; this path corresponds to the left-most grey dot in Figure 3); in the lower panel, we see several paths computed after randomly selecting and deleting one percent of the data points.

The average risk of the best 50 portfolios in our tests was 8.03% per year, with a standard deviation of 4 basis points. Jackknifing one percent of the data, we obtain risks between 7.75% and 8.17%; jackknifing five percent, we get risks between 7.58% and 8.29%, far greater than the randomness introduced by our method. For risk-adjusted return – in which we are naturally more interested – things are even worse. The best 50 portfolios had an average risk–return ratio of 0.66.

Figure 5. Annualised returns in % and annualised Sharpe ratios for different portfolio selection strategies (minimum variance, upside-potential ratio, Value-at-Risk).

Just jackknifing the paths of these portfolios by one percent, we already get a range between 0.42 and 0.89. In sum, heuristics may introduce approximation error into our analysis, but it is swamped by the sensitivity of the problem with respect to even slight changes in the data. Hence, objecting to heuristics because they do not provide exact solutions is not a valid argument in finance.

Robustness checks

We run backtests for a large number of alternative selection criteria, among them partial moments (eg, semi-variance), conditional moments (eg, Expected Shortfall), and quantiles (eg, Value-at-Risk). In the study described above, we only investigated the approximation errors of the optimisation method, compared with the errors coming from the data. Now we wish to compare different models. We implement a backtest as described above, but add a robustness check, again based on a jackknifing of the data: suppose a small number of in-sample observations were randomly selected and deleted (we delete 10% of the data). The data have changed, and hence the composition of the computed portfolio will change. If the portfolio selection strategy is robust, we should expect the resulting portfolio to be similar to the original one, as the change in the historical data is only small; we would also expect the new portfolio to exhibit a similar out-of-sample performance. Repeating this procedure many times, we obtain a sampling distribution of portfolio weights, and consequently also a sampling distribution of out-of-sample performance.

We do not compare the differences in the portfolio weights, since it is difficult to judge what a given norm of the difference between two weight vectors means in practice. Rather, we look at the changes in out-of-sample results. This means that for any computed quantity we are interested in, we have a distribution of outcomes. Figure 5 gives some examples for different strategies. (More details can be found in Gilli and Schumann [forthcoming].) The figure shows the out-of-sample returns of three strategies: minimum variance, the upside-potential ratio [Sortino et al., 1999], and Value-at-Risk; we also plot Sharpe ratios.

Portfolios constructed with the upside-potential ratio, for instance, have a median return that is more than a percentage point higher than the return of the minimum-variance portfolio; Sharpe ratios are also higher. Even Value-at-Risk seems better than its reputation. But most remarkable is the range of outcomes: given a 10% perturbation of the in-sample data, returns differ by more than 5 percentage points per year.
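
The jackknife perturbation used here (and in Example 1) is easy to implement. The following generic sketch (ours, with fake data standing in for our database) repeatedly deletes a fraction of the observations, recomputes a statistic of interest and collects the distribution of outcomes.

```python
# Generic jackknife perturbation: delete a random fraction of the
# observations, recompute the statistic, repeat (illustration only).
import numpy as np

rng = np.random.default_rng(1)

def jackknife(data, statistic, fraction=0.10, trials=5000):
    n = len(data)
    drop = int(round(fraction * n))
    results = np.empty(trials)
    for t in range(trials):
        keep = np.sort(rng.choice(n, n - drop, replace=False))
        results[t] = statistic(data[keep])
    return results

# stand-in data: one year of fake daily returns
returns = rng.normal(0.001, 0.012, 253)
total_return = lambda r: np.prod(1.0 + r) - 1.0

dist = jackknife(returns, total_return, fraction=0.01)
print("10th / 50th / 90th percentile of total return:",
      np.round(np.percentile(dist, [10, 50, 90]), 3))
```
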

3 Conclusion

In this article, we have discussed the precision with which financial models are handled, in particular optimisation models. We have argued that precision is only required up to a level that is justified by the overall accuracy of the model. Hence, the required precision should be analysed specifically, so as to better appreciate the usefulness and the limitations of a model.

Our discussion may appear trivial; everyone knows that financial markets are noisy and that models are not perfect. Yet the question of the appropriate precision of models with regard to their empirical application is rarely discussed explicitly. In particular, it is rarely discussed in university courses on financial economics and financial engineering. Again, some may argue that the errors are understood implicitly (just as '500 meters' means 'between 400 and 600 meters'), or that in any case more precision does no harm; here we disagree. We seem to have a built-in incapacity to intuitively appreciate randomness and chance, hence we strive for ever more precise answers. All too easily, then, precision is confused with accuracy; and acting on the former instead of the latter may lead to painful consequences.

References

Daniella Acker and Nigel W. Duck. Reference-Day Risk and the Use of Monthly Returns Data. Journal of Accounting, Auditing and Finance, 22(4):527–557, 2007.
Valentin Dimitrov and Suresh Govindaraj. Reference-Day Risk: Observations and Extensions. Journal of Accounting, Auditing and Finance, 22(4):559–572, 2007.
Gerd Gigerenzer. Fast and Frugal Heuristics: The Tools of Bounded Rationality. In Derek J. Koehler and Nigel Harvey, editors, Blackwell Handbook of Judgment and Decision Making, chapter 4, pages 62–88. Blackwell Publishing, 2004.
Gerd Gigerenzer. Why Heuristics Work. Perspectives on Psychological Science, 3(1):20–29, 2008.
Manfred Gilli and Enrico Schumann. Optimal enough? COMISEF Working Paper Series No. 10, 2009.
Manfred Gilli and Enrico Schumann. Risk–Reward Optimisation for Long-Run Investors: an Empirical Analysis. European Actuarial Journal, forthcoming. URL http://www.actuaries.org/Munich2009/Programme_EN.cfm.
Michael Kamal and Emanuel Derman. When You Cannot Hedge Continuously: The Corrections of Black–Scholes. Risk, 12(1):82–85, 1999.
Dietmar Maringer. Portfolio Management with Heuristic Optimization. Springer, 2005.
Harry M. Markowitz. Portfolio Selection. Wiley, New York, 1959.
Zbigniew Michalewicz and David B. Fogel. How to Solve It: Modern Heuristics. Springer, 2004.
Oskar Morgenstern. On the Accuracy of Economic Observations. Princeton University Press, 2nd edition, 1963.
Judea Pearl. Heuristics. Addison-Wesley, 1984.
George Pólya. How to Solve It. Princeton University Press, 2nd edition, 1957.
Frank Sortino, Robert van der Meer, and Auke Plantinga. The Dutch Triangle. Journal of Portfolio Management, 26(1):50–58, 1999.
John von Neumann and Herman H. Goldstine. Numerical Inverting of Matrices of High Order. Bulletin of the American Mathematical Society, 53(11):1021–1099, 1947.
Stelios H. Zanakis and James R. Evans. Heuristic "Optimization": Why, When, and How to Use It. Interfaces, 11(5):84–91, 1981.
