Forecasting Gold Price: A Comparative Study

Course of Financial Econometrics FIRM

Alessio Azzutti, University of Florence

Abstract: This paper evaluates how well a variety of existing forecasting techniques (six methods) provide accurate forecasts of the gold price. Special consideration is given to whether these techniques can outperform the random walk, which is used as the benchmark in the comparison. Interestingly, the results show that only the ARIMA model outperforms the random walk at every horizon, and that on average the ARIMA model provides the best forecasts, with the lowest root mean squared error over the 36 monthly forecasting horizons. Moreover, data on four other precious metals (silver, platinum, palladium and rhodium) were used to fit a multiple linear regression model; again, the optimal ARIMA model reported better results.

1 Introduction

Gold serves several functions in the world economy, and its links with financial and macroeconomic variables are well established (Pierdzioch et al., 2014). It has monetary value and is sought after by central banks as part of their international reserves, which fulfil many purposes (Gupta et al., 2014). It has industrial uses and can be transformed into jewellery. In modern finance, it is used as a hedge against inflation and as a safe haven during crises. In fact, unlike stocks and bonds, gold has consistently been regarded as a less risky asset. Gold also has many other distinctive features. Its supply accumulates over the years, and annual global physical production can be as small as 2% of total supply; in contrast to other commodities, therefore, its annual production may not sway its price as much as other factors do.

Given the significance of gold in the modern world, the ability to provide accurate forecasts of its future price is of primary importance. Moreover, there are benefits from finding the model that forecasts the gold price more accurately than others do. Out-of-sample forecasts offer an informational advantage to monetary authorities and policymakers, hedge fund managers and international portfolio managers, who can use them in gauging future inflation, estimating demand for jewellery, assessing investment in precious metals and other commodities, and anticipating the future movement of the dollar exchange rate.

The figure below shows the time series of the gold price from January 1970 to August 2014. In general, it shows an increasing trend over the whole period and appears to portray exponential growth over the last 15 years. A first look at the figure shows signs of two major shocks, after 1980 and 2010, which create structural breaks in the time series.

The aim of this paper is to evaluate the use of a variety of forecasting models representing both parametric and nonparametric techniques for obtaining accurate forecasts for the price of gold.

Whilst various metrics exist for comparing two different out-of-sample forecasts, the paper relies on the Root Mean Squared Error (RMSE). Although the RMSE criterion is still gaining popularity, it is briefly described at the outset so that the reader has a clear understanding of the results reported in this paper. In the very recent past, the RMSE criterion has been adopted as a popular measure in a range of forecasting studies (e.g., Altavilla and De Grauwe, 2010; Hassani et al., 2009, 2013; Beneki and Silva, 2013). The RMSE can be computed as follows:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left(\hat{y}_{T+h,i} - y_{T+h,i}\right)^2}{n}},$$

where $\hat{y}_{T+h,i}$ is the $i$-th $h$-step-ahead forecast obtained from the model fitted to the data, $y_{T+h,i}$ is the corresponding actual value and $n$ is the number of forecasts.

This paper shows the results obtained by applying several different forecasting techniques over 36 horizons, from 1 month ahead to 36 months ahead. This range of horizons makes it possible to capture both the short-run and medium-run effectiveness of a given forecasting model at accurately predicting the future price of gold. The models evaluated in the study include an Autoregressive integrated moving average (ARIMA), an Autoregressive fractionally integrated moving average (ARFIMA), an Exponential smoothing (ETS), an Exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components (TBATS) and a Multiple linear regression (MLR) with the monthly prices of four other precious metals as explanatory variables. Each out-of-sample forecasting result is then compared with that of a Random Walk (RW) model. The results from the six competing models, assessed on the basis of the lowest average Root Mean Squared Error (RMSE), are reported in this paper.

Note that, unlike most of the existing literature on forecasting the price of gold, which analyses the role of financial and macroeconomic variables in predicting the gold price, this study concentrates primarily on univariate approaches. In fact, the univariate approach avoids the problem of choosing macroeconomic and financial variables that define the state of the world economy, given that gold is a globally traded asset. Thereafter, a multiple regression model that includes the prices of four other precious metals (silver, platinum, palladium and rhodium) is used and compared with the univariate approaches. Finally, it is worth noting that the approaches described above can handle non-stationarity in the data; hence the price of gold itself is forecast, rather than gold returns as is done in the literature (Shafiee and Topal, 2010)¹.

The remainder of the work is organized as follows. Section 2 describes the methodology underlying the various forecasting techniques, whilst Section 3 is dedicated to an analysis of the data. Section 4 reports the empirical results and the paper concludes in Section 5.
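The RMSE criterion defined earlier in this section can be sketched in a few lines of Python. This is only an illustration: the numbers are made up, and the ratio of a model's RMSE to that of the random walk is an assumed convenience for reading the comparison, not a quantity defined in the text.

```python
import math

def rmse(forecasts, actuals):
    """Root mean squared error between paired forecasts and realised values."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical h-step-ahead forecasts and realised prices (illustrative only).
model_forecasts = [1250.0, 1262.0, 1240.0, 1275.0]
random_walk_forecasts = [1245.0, 1255.0, 1268.0, 1251.0]
realised_prices = [1255.0, 1268.0, 1251.0, 1266.0]

rmse_model = rmse(model_forecasts, realised_prices)
rmse_rw = rmse(random_walk_forecasts, realised_prices)

# Assumed summary: a ratio below 1 means the model beats the benchmark
# at this horizon.
relative_rmse = rmse_model / rmse_rw
```

In a study with 36 horizons, the same function would be applied once per horizon to the forecasts made for that horizon.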

2 Methodology

2.1 Forecasting Models

Random Walk

The random walk model is used as a benchmark, since it is widely accepted that a forecasting technique recommended for a particular task should at least be more accurate than a random walk. In brief, today's value of gold is the forecast for tomorrow's value. If the series being fitted by a random walk has an average trend that is expected to continue in the future, a so-called drift term may be included.
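A minimal sketch of the random walk benchmark, with and without drift, is given below. The function name and the sample values are hypothetical, and the drift is assumed to be estimated as the average historical one-period change.

```python
def random_walk_forecast(series, horizon, drift=False):
    """h-step-ahead random walk forecasts for h = 1..horizon.

    Without drift every forecast equals the last observation. With drift,
    h times the average historical one-period change is added.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    last = series[-1]
    # The mean of the one-period changes telescopes to (last - first) / (n - 1).
    step = (series[-1] - series[0]) / (len(series) - 1) if drift else 0.0
    return [last + h * step for h in range(1, horizon + 1)]

prices = [100.0, 102.0, 101.0, 105.0, 108.0]  # illustrative values, not gold data
no_drift = random_walk_forecast(prices, horizon=3)
with_drift = random_walk_forecast(prices, horizon=3, drift=True)
```

With these values the no-drift forecasts stay flat at the last observation, while the drift version extends the average upward change of 2 per period.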

Autoregressive Integrated Moving Average (ARIMA)

ARIMA models are the most general class of models for forecasting a time series that can be made «stationary» by differencing, perhaps in conjunction with logging or deflating if necessary. The random variable is viewed as a combination of signal and noise. An ARIMA model can be thought of as a filter that tries to separate the signal from the noise; the signal is then extrapolated into the future to obtain forecasts. The ARIMA forecasting equation for a stationary time series is a linear equation in which the predictors consist of lags of the dependent variable and/or lags of the forecast errors. A nonseasonal ARIMA model is classified as an ARIMA(p,d,q) model, where:

• p is the number of autoregressive terms,
• d is the number of nonseasonal differences,
• q is the number of lagged forecast errors in the prediction equation.
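The role of d can be illustrated with a short sketch of differencing and of mapping forecasts of the differenced series back to price levels. The helper names and the sample series are hypothetical and chosen only for illustration.

```python
def difference(series, d=1):
    """Apply first differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def undifference_once(last_level, diff_forecasts):
    """Turn forecasts of first differences back into forecasts of the level."""
    levels, level = [], last_level
    for change in diff_forecasts:
        level += change
        levels.append(level)
    return levels

prices = [100.0, 103.0, 107.0, 112.0, 118.0]  # illustrative series with a trend
first_diff = difference(prices, d=1)
second_diff = difference(prices, d=2)

# Suppose a model fitted to the differenced series forecasts changes of 7 and 8.
level_path = undifference_once(prices[-1], [7.0, 8.0])
```

Here one round of differencing leaves a series that still trends, while a second round yields a constant series, which is the kind of behaviour a unit root test is meant to detect when choosing d.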

¹ Shafiee and Topal (2010) address the issue of non-stationary gold prices by proposing a model that has three components: a long-term trend reversion component, a diffusion component and a jump or dip component.

The optimal ARIMA model, referred to as automatic-arima (the auto.arima function), is obtained through the forecast package for R. A more detailed description of the algorithm underlying automatic-arima can be found in Hyndman and Khandakar (2008). The general Box-Jenkins model for y is written as:

$$y_t^* = \phi_1 y_{t-1}^* + \phi_2 y_{t-2}^* + \dots + \phi_p y_{t-p}^* + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q},$$

where $y_t^*$ is the series after differencing, the $\phi_i$ and $\theta_j$ are unknown parameters and the $\varepsilon_t$ are i.i.d. normal errors with zero mean. Once the number of differences (d) has been determined through a unit root test, the values of p and q are chosen by minimising the Akaike Information Criterion (AIC):

$$AIC = -2\log(L) + 2(p + q + P + Q + k),$$

where P and Q are the orders of the seasonal autoregressive and moving average terms, k = 1 if the constant c ≠ 0 and 0 otherwise, and L is the maximised likelihood of the fitted model.
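The order selection step can be sketched as follows, using the AIC in its standard form (minus twice the maximised log-likelihood plus twice the number of estimated parameters). The candidate orders and log-likelihood values are hypothetical; in practice they would come from fitting each candidate model, and the actual search in the forecast package is stepwise rather than a plain minimum over a fixed list.

```python
def aic(log_likelihood, p, q, P=0, Q=0, has_constant=False):
    """AIC = -2 log(L) + 2 (p + q + P + Q + k), with k = 1 if a constant is fitted."""
    k = 1 if has_constant else 0
    return -2.0 * log_likelihood + 2.0 * (p + q + P + Q + k)

# Hypothetical maximised log-likelihoods for candidate (p, q) orders,
# all fitted to the same differenced series (same d).
candidates = {
    (0, 0): -1210.4,
    (1, 0): -1198.7,
    (1, 1): -1198.1,
    (2, 1): -1197.9,
}

scores = {order: aic(loglik, *order) for order, loglik in candidates.items()}
best_order = min(scores, key=scores.get)
```

The larger models here raise the log-likelihood only slightly, so the penalty term makes the more parsimonious candidate the preferred one.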

Autoregressive Fractionally Integrated Moving Average (ARFIMA)

The study relies on the ARFIMA modelling process provided through the forecast package in R. Once again, the modelling algorithm automatically selects p and q for an ARFIMA(p,d,q) model based on the Hyndman and Khandakar (2008) algorithm, whilst d and the remaining parameters are estimated using the Haslett and Raftery (1989) algorithm.
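What distinguishes ARFIMA from ARIMA is that d may take non-integer values. As a rough illustration, the fractional difference operator (1 − B)^d can be expanded into weights on past observations by the standard binomial recursion. The sketch below is not the Haslett and Raftery estimation procedure; the function names and the truncation at the start of the sample are assumptions made for illustration.

```python
def frac_diff_weights(d, n_weights):
    """Weights w_k in (1 - B)^d = sum_k w_k B^k, from w_0 = 1 and
    w_k = w_{k-1} * (k - 1 - d) / k."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights

def frac_diff(series, d):
    """Fractionally difference a series, truncating the expansion at the
    first observation (an assumption; other start-up treatments exist)."""
    w = frac_diff_weights(d, len(series))
    return [sum(w[k] * series[t - k] for k in range(t + 1))
            for t in range(len(series))]

weights_04 = frac_diff_weights(0.4, 4)              # slowly decaying weights
ordinary = frac_diff([1.0, 3.0, 6.0], 1.0)          # d = 1: ordinary differences
```

With d = 1 the weights collapse to 1 and −1, so the operator reduces to the ordinary first difference used in ARIMA (apart from the first value, which has no predecessor).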

Exponential Smoothing (ETS)

The ETS technique incorporates the foundations of exponential smoothing and is made available through the forecast package for the R software. ETS overcomes a limitation of earlier exponential smoothing models, which did not provide a method for easy calculation of prediction intervals (Makridakis, Wheelwright and Hyndman, 1998). The ETS model from the forecast package considers the error, trend and seasonal components, along with over 30 possible options, and chooses the best exponential smoothing model by optimising the initial values and parameters via maximum likelihood estimation (MLE) and selecting the best model based on the AIC. A detailed description of ETS can be found in Hyndman and Athanasopoulos (2013). Figure 2 summarises in table format the ETS formulae that are evaluated in the forecast package to select the best model to fit the data. Note that (Hyndman and Athanasopoulos, 2013):

• $\ell_t$ denotes the series level at time t,
• $b_t$ denotes the slope,
• $s_t$ denotes the seasonal component of the series,
• m denotes the number of seasons in a year,
• α, β, γ and ϕ are smoothing parameters,
• $\phi_h = \phi + \phi^2 + \dots + \phi^h$ and $h_m^+ = [(h - 1) \bmod m] + 1$.

Figure 2: Formulae for recursive calculations and point forecasts (Hyndman and Athanasopoulos, 2013).
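The two auxiliary quantities in the list above can be sketched as follows, together with the point forecast of one member of the ETS family, the additive damped trend method with optional additive seasonality. The function names and the state values are hypothetical; in practice the level, slope and seasonal states would come from the fitted recursions.

```python
def phi_h(phi, h):
    """Damping sum phi + phi^2 + ... + phi^h."""
    return sum(phi ** j for j in range(1, h + 1))

def h_m_plus(h, m):
    """Seasonal index [(h - 1) mod m] + 1, cycling through 1..m."""
    return (h - 1) % m + 1

def damped_additive_forecast(level, slope, phi, h, last_season=None):
    """Point forecast level + phi_h * slope (+ seasonal state, if supplied).

    last_season holds the m most recent seasonal states, oldest first,
    so index h_m_plus(h, m) - 1 selects s_{t-m+h_m^+}.
    """
    forecast = level + phi_h(phi, h) * slope
    if last_season is not None:
        m = len(last_season)
        forecast += last_season[h_m_plus(h, m) - 1]
    return forecast

# Hypothetical final states, for illustration only.
three_ahead = damped_additive_forecast(level=100.0, slope=2.0, phi=0.5, h=3)
```

Setting ϕ = 1 makes the damping sum equal to h, which recovers an undamped linear trend.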

Exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components (TBATS)

The TBATS model is an exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components. The result is a technique aimed at providing accurate forecasts for time series with complex seasonality. A detailed description of the TBATS model can be found in De Livera et al. (2011) and is therefore not reproduced here. Although gold prices are commonly regarded as non-seasonal, seasonal models are included to verify this characteristic of the gold price time series.
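Of the TBATS components, the Box-Cox transformation is the simplest to show in isolation. The sketch below uses the standard one-parameter form for positive data; the function names are hypothetical, and the choice of λ, which TBATS estimates from the data, is not covered.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value: log(y) if lam == 0,
    otherwise (y^lam - 1) / lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def inv_box_cox(w, lam):
    """Inverse transform, used to map forecasts back to the price scale."""
    return math.exp(w) if lam == 0 else (lam * w + 1.0) ** (1.0 / lam)

price = 1250.0                       # illustrative value
transformed = box_cox(price, 0.5)
recovered = inv_box_cox(transformed, 0.5)
```

A value of λ = 1 leaves the series unchanged apart from a shift, while λ = 0 corresponds to modelling the logarithm of the price.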

Multiple linear regression (MLR)

The general form of a multiple linear regression is

$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \dots + \beta_k x_{k,i} + e_i,$$

where $y_i$ is the variable to be forecast and $x_{1,i}, \dots, x_{k,i}$ are the k predictor variables. Each of the predictor variables must be numerical. The coefficients $\beta_1, \dots, \beta_k$ measure the effect of each predictor after taking account of the effect of all other predictors in the model. Thus, the coefficients measure the marginal effects of the predictor variables. As for simple linear regression, when forecasting we require the following assumptions for the errors $(e_1, \dots, e_N)$:

• The errors have mean zero;
• The errors are uncorrelated with each other;
• The errors are uncorrelated with each predictor $x_{j,i}$.

As mentioned before, the gold price in the next period is forecast from the information available today, namely the current prices of the other four precious metals (silver, platinum, palladium and rhodium). Thus, the multiple regression under analysis takes the following form:

$$GOLD_t = \beta_0 + \beta_S\, SILVER_{t-1} + \beta_{PL}\, PLATINUM_{t-1} + \beta_{PA}\, PALLADIUM_{t-1} + \beta_R\, RHODIUM_{t-1} + e_t.$$
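A lagged regression of this form can be sketched with ordinary least squares via the normal equations. Everything below is illustrative: the helper names are hypothetical, the series are small synthetic numbers rather than metal prices, and the target is generated without noise so that the fit can be checked against known coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_lagged_ols(target, predictors):
    """Regress target[t] on an intercept and each predictor at t - 1."""
    rows = [[1.0] + [series[t - 1] for series in predictors]
            for t in range(1, len(target))]
    y = target[1:]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def forecast_next(coefs, latest_predictors):
    """One-step-ahead forecast from the most recent predictor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], latest_predictors))

# Synthetic stand-ins for silver, platinum, palladium and rhodium.
predictors = [
    [1.0, 2.0, 0.0, 1.0, 3.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 2.0, 3.0, 1.0, 0.0, 2.0, 1.0],
    [2.0, 0.0, 1.0, 0.0, 2.0, 3.0, 1.0, 2.0],
    [1.0, 3.0, 3.0, 0.0, 0.0, 1.0, 2.0, 0.0],
]
true_coefs = [10.0, 2.0, 0.5, 0.3, 0.1]  # assumed, used only to build the target
gold = [12.0]  # the first value is never used as a regression target
for t in range(1, 8):
    gold.append(forecast_next(true_coefs, [s[t - 1] for s in predictors]))

coefs = fit_lagged_ols(gold, predictors)
next_forecast = forecast_next(coefs, [s[-1] for s in predictors])
```

Because the synthetic target contains no error term, the fitted coefficients match the assumed ones up to rounding; with real prices the residuals would also need to be checked against the assumptions listed above.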

3 Data

The data used in this study are the prices of gold, silver, platinum, palladium and rhodium. The price of gold is determined through trading in the gold and gold derivatives markets; however, a procedure known as the Gold Fixing in London, which dates from September 1919, provides a daily benchmark price to the industry. The afternoon fixing was introduced only in 1968 to provide a price when US markets are open, given the time difference. The other precious metals use a similar pricing model. The monthly adjusted close price of gold from August 1992 to September 2014 (266 observed prices) is used for the analysis. The data are freely available at www.kitco.com. Data for all five metals could be gathered only from August 1992 onwards, whereas the endpoint of the sample is driven purely by data availability at the time of conducting this study. The analysis evaluates out-of-sample forecasts for horizons from h = 1 step up to h = 36 steps ahead, and thereby captures both the short-run and long-run forecasting abilities of the given forecasting models. The table below presents some descriptive statistics to help in understanding the structure of the data and the time series.

Table 1: Descriptive statistics for metals (Aug 1992 – Sep 2014)

Series      Gold     Silver   Platinum   Palladium   Rhodium
Mean        661.68   11.29    883.24     369.80      1804.36
Median      394.05   5.93     698.50     319.00      1122.50
SD          456.68   9.33     504.61     229.45      1865.83
CV (%)      69.02    82.60    57.13      62.05       103.41
Skewness    1.08     1.51     0.56       0.82        2.24

S-W(p)