JOURNAL OF GEOPHYSICAL RESEARCH, VOL. 117, D21110, doi:10.1029/2012JD018015, 2012

Forecasting the number of extreme daily events out to a decade ahead

Rosie Eade,1 Emily Hamilton,1 Doug M. Smith,1 Richard J. Graham,1 and Adam A. Scaife1

Received 27 April 2012; revised 27 August 2012; accepted 28 September 2012; published 15 November 2012.

[1] The predictability of daily temperature and precipitation extremes is assessed out to a decade ahead using the Met Office Decadal Prediction System. Extremes are defined using a simple percentile based counting method applied to daily gridded observation data sets and corresponding model forecasts. We investigate moderate extremes, with a 10% probability of occurrence, ensuring they are frequent enough for robust skill analysis while having sizable impacts. We quantify the predictability of extremes, assess the impact of initialization, and compare with the predictability of the mean climate. We find modest but significant skill for seasonal predictions of temperature extremes in most regions (global area-average correlation of 0.3) and for precipitation extremes over the USA (area-average correlation of 0.2). The skill of both temperature and European rainfall extremes improves for multiyear forecast periods, as longer averaging periods reduce the impact of unpredictable short-term variations, capitalizing on predictable trends from external forcings. For 5 year periods out to a decade ahead, root-mean square temperature errors are reduced by 20% compared to use of climatology in most regions, apart from the southeastern USA. Initialization improves forecast skill for temperature and precipitation extremes on seasonal timescales in most regions. However, there is little improvement beyond the first year suggesting that skill then arises largely from external forcings. The skill for extremes is generally similar to, but slightly lower than, that for the mean. However, extremes can be more skillful than the mean, for example, USA cold nights, where trends in extremes are greater than for the mean. Citation: Eade, R., E. Hamilton, D. M. Smith, R. J. Graham, and A. A. Scaife (2012), Forecasting the number of extreme daily events out to a decade ahead, J. Geophys. Res., 117, D21110, doi:10.1029/2012JD018015.

1 Met Office Hadley Centre, Exeter, UK.

Corresponding author: R. Eade, Met Office Hadley Centre, Exeter EX1 3PB, UK. ([email protected])

Published in 2012 by the American Geophysical Union.

1. Introduction

[2] Changes in extreme weather can have a much greater impact on society than changes in the mean climate. For example, heat waves, especially increases in numbers of warm nights, can lead to increased human mortality [McMichael and Githeko, 2001] and heavy rainfall can lead to high casualties and economic loss through flooding [Kunkel et al., 1999]. Indeed, the impact of climate change is likely to be most acute when natural variability and climate change combine to produce changes in the frequency of extreme weather events. However, previous assessments of monthly to decadal forecasts have mainly focused on the mean. This is partly due to the need to assess high spatial resolution observation data sets and model output in order to quantify extreme events, but also due to the low frequency of extreme events, inherent from their definition, making robust verification more difficult. Of course, skillful prediction of mean changes would be expected to imply skillful predictions of extremes, since a shift in the mean is likely to be accompanied by changes in the probabilities of extremes. Although this is the case for seasonal predictions of moderately extreme temperatures [Hamilton et al., 2012], previous studies have made it clear that it cannot simply be assumed that changes in extremes can be inferred from changes in the mean. For example, Hegerl et al. [2004] show that model projections of changes in mean temperature are significantly different to changes in extremes (annual hottest and coldest day) over about half of the globe. Furthermore, record daily high temperatures have been shown by Meehl et al. [2009] to vary differently to record low temperatures in the USA, leading to a robust increase in their ratio. This is also apparent over Australia where changes in mean annual temperature are largely consistent with changes in the frequency of maximum temperature records but not with the smaller changes in minimum temperature records [Trewin and Vermont, 2010]. Annual extremes vary differently to the mean where global warming significantly changes surface properties, such as regions of retreating sea ice and snow cover [Kharin and Zwiers, 2000, 2005]. Changes in the intensity of high precipitation extremes, especially in the


tropics and sub-tropics, tend to exceed those of the mean [Kharin et al., 2007], in line with a reduced probability of wet days. It is therefore important to quantify the accuracy of model forecasts through direct assessment of predictions of extremes and we do this on timescales ranging from seasonal to decadal.

[3] Some work has been done to identify trends in extremes over the second half of the 20th century from observations [e.g., Frich et al., 2002; Alexander et al., 2006], and over multidecadal periods into the future using Global Climate Models (GCMs) [e.g., Kharin et al., 2007; Menéndez and Carril, 2010]. Experiments have been carried out to identify how soon changes in extremes may become detectable above weather noise; indeed, Russo and Sterl [2011] suggest extreme temperature changes may already be apparent. However, there are large uncertainties in future projections [Clark et al., 2010; Kharin et al., 2007], and very few studies have investigated the potential for forecasting changes in extremes on seasonal to decadal timescales. Robertson et al. [2009] found moderate skill in seasonal predictions of the frequency of wet days over Indonesia, but did not look at extreme wet days. Zeng et al. [2010] found modest skill in seasonal prediction of winter extreme precipitation over Canada using a regression method on lagged reanalysis data. A recent study using the Met Office seasonal prediction system (GloSea4 [Arribas et al., 2011]) assessed the skill of temperature extreme predictions on seasonal timescales [Hamilton et al., 2012], results of which are discussed in section 4.

[4] Here we extend the approach of Hamilton et al. [2012] to assess the accuracy of model forecasts of the proportion of extreme days in a period, out to a decade ahead, and to investigate precipitation as well as temperature. We make comparisons of the predictability of extremes with that of the mean, for which decadal forecasts are already in operation. Furthermore, we investigate the impact of initializing forecasts with observations on the predictability of extremes. This paper is organized as follows. Section 2 describes the model and observation data used, Section 3 describes the methodology and Section 4 discusses the results, with conclusions presented in Section 5.

2. Data

[5] We assess forecast skill in the Met Office Decadal Prediction System (DePreSys) [Smith et al., 2007, 2010] which uses the third Hadley Centre climate model (HadCM3) [Gordon et al., 2000]. HadCM3 is a coupled dynamical GCM, initialized from the observed state of the ocean and atmosphere to enable potential prediction of internal variability. The model also includes external forcings, using historic and projected changes in anthropogenic greenhouse gas and aerosol concentrations, volcanic aerosol concentrations and solar irradiance. We use an updated version of HadCM3 (the same as Smith et al. [2010]) which differs from the original version (used by Smith et al. [2007]) in that flux adjustments are applied. It also includes both the direct and first indirect (cloud albedo) effect of aerosols, simulated interactively [Collins et al., 2010].


[6] We analyze daily model output from hindcasts (retrospective forecasts) generated as part of the EU ENSEMBLES Project [Doblas-Reyes et al., 2010; Smith et al., 2010]. Each hindcast consists of 9 ensemble members using different combinations of perturbed model parameters to sample model uncertainty [Collins et al., 2006, 2010]. Seasonal skill is assessed from 6 month long hindcasts starting on 1st of February, May, August and November in each year from November 1960 to August 2006 inclusive. The 4 standard seasons, December–February, March–May, June–August and September–November (DJF, MAM, JJA, SON), at a lead-time of 1 month are assessed in these seasonal hindcasts. The hindcasts starting each November were extended to 10 years in length allowing skill out to a decade ahead to be assessed. The impact of initialization is assessed in these decadal hindcasts by comparing DePreSys to parallel hindcasts, referred to here as NoAssim, made using exactly the same external forcing factors but starting in 1860 with initial conditions taken from coupled model simulations of pre-industrial climate [Smith et al., 2007, 2010]. Using observed values of solar irradiance and volcanic aerosols beyond the hindcast start date will overestimate the skill of real forecasts because future changes of these are not known. We therefore use projected changes; estimating the total solar irradiance by repeating the previous eleven-year solar cycle, and volcanic aerosol concentration by using an exponential decay function from that at the start date (with a time scale of one year). Thus the effects of volcanic eruptions that occur during a hindcast are not included, although those that occurred before the hindcast start date are included.

[7] We assess the skill of the model ensemble mean output relative to the following gridded observation data sets: HadGHCND daily minimum and maximum temperature, with the mean of the minimum and maximum used to calculate monthly mean temperature [Caesar et al., 2006] (http://www.metoffice.gov.uk/hadobs/); E-OBS daily precipitation for Europe [Haylock et al., 2008], developed for the European Union Framework 6 ENSEMBLES project (http://eca.knmi.nl/download/ensembles/download.php); NCEP daily precipitation for the USA [Higgins et al., 1996, 2010]. These observational data sets have variable amounts of missing data, both spatially and temporally. To reduce sampling errors due to missing observations when calculating observed monthly means or counting numbers of extremes, a daily data threshold is applied such that at least 90% of the relevant daily data needed to make that monthly observed data value must be present at a particular grid point for it to be included in the analysis. This can lead to a whole season or year of missing data in some regions, thus when assessing the accuracy of the forecasts, a further annual threshold is applied. This requires that at least 75% of annual observations relating to the assessed hindcasts must be present for a grid point to be included in the skill plots. For example, if fewer than 35 of the 46 hindcasts can be included in the assessment due to missing observations, the result is not plotted for that grid point. In practice this only applies to the observations from the last decade, for which there is reduced coverage in some years; over small regions of Asia and South America in the


HadGHCND temperature data set and the far east of the European region in the E-OBS precipitation data set.
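As a concrete illustration of the two completeness thresholds described above, the following minimal Python sketch (not the authors' code; the array and function names are hypothetical) applies a 90% daily-completeness rule when forming a monthly value and a 75% annual-completeness rule when deciding whether a grid point enters the skill assessment.

import numpy as np

def monthly_value(daily_values, min_fraction=0.9):
    # A monthly value is only computed if at least 90% of the daily data
    # needed for that month are present at this grid point.
    present = np.isfinite(daily_values)
    if present.mean() < min_fraction:
        return np.nan
    return daily_values[present].mean()

def include_grid_point(verifying_values, n_hindcasts=46, min_fraction=0.75):
    # A grid point is kept in the skill maps only if at least 75% of the
    # observed values verifying the hindcasts exist (e.g., 35 of 46 starts).
    return np.isfinite(verifying_values).sum() >= min_fraction * n_hindcasts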

3. Methodology

[8] A wide variety of extreme definitions have been considered in previous studies. Frich et al. [2002] identified a number of these as a core set to be used for independent and robust monitoring of changes in climate, leading to a global land data set of observed extremes [Alexander et al., 2006]. These include percentiles (e.g., daily temperature above the 90th percentile of its distribution), thresholds (e.g., total number of frost days [Meehl et al., 2004]), duration (e.g., number of consecutive dry days), intensity (e.g., intensity of warm spells, defined as the sum of degrees above a threshold for those days that are above that threshold) and absolute values (e.g., annual max temperature). Tebaldi et al. [2006] showed that considering projections of precipitation intensity and dry days together highlights tendencies for heavier rainfall events that would not be visible from changes in single indices or mean precipitation, suggesting that it can be beneficial to consider multiple indices. Other studies have investigated record high or low daily temperature [e.g., Meehl et al., 2009; Rahmstorf and Coumou, 2011; Trewin and Vermont, 2010] and return periods (the time between an extreme event of specified intensity reoccurring). Here we adopt the approach of Hamilton et al. [2012] and assess the predictability of the proportion of extreme days in a given period, using a simple percentile based counting method.

[9] We assess relatively moderate extremes so that they are frequent enough for robust skill analysis while still having potentially important impacts (for example, as defined by the Expert Team on Climate Change Detection, Monitoring and Indices, ETCCDMI [Alexander et al., 2006], and as set out in the IPCC Fourth Assessment Report, WG1 [Trenberth et al., 2007]). Moderate 1 in 10 day extreme events are defined regionally using a percentile approach. Akin to Hamilton et al. [2012] we assess daily temperature warmer than the 90th percentile (warm events) or colder than the 10th percentile (cold events), using in turn both daily maximum (Tmax90 and Tmax10) or minimum values (Tmin90 and Tmin10). We extend the work of Hamilton et al. [2012] by additionally assessing extreme wet events (Precip90), defined using the 90th percentile of daily total precipitation after dry days have been removed. For observations, dry days were defined as those for which precipitation was less than or equal to 0.1 mm/day [e.g., Vlcek and Huth, 2009]. However, this definition is not necessarily appropriate for models. Instead, the threshold for model dry days was taken as equal in percentile to the threshold of 0.1 mm/day within the observed distribution. Assessment of low rainfall extremes (linked to drought) would likely require a duration-based definition, and is left for future work.

[10] Model biases are inevitable, and must be taken into account when assessing skill. To do this we compare observed and forecast anomalies relative to their respective climatologies. As in previous studies [Smith et al., 2007, 2010] model climatologies and percentile thresholds were computed from independent simulations of the 20th century forced by anthropogenic (greenhouse gases, and aerosols) and natural (volcanic aerosols and solar irradiance) factors.


This is the standard approach for anomaly initialized hindcasts [Smith et al., 2007, 2012b], avoiding the need for bias-corrections based on the hindcasts themselves. The climatological period was 1958–2001, although the analysis was also carried out using the period 1979–2001 and the results showed little sensitivity to this change. We note that some bias is expected even with the anomaly initialization approach, due to sampling errors and initialization shocks [Smith et al., 2012b]. Bias adjustments based on the hindcasts might therefore improve skill further, but differences in skill, though generally small, can be sensitive to the use of "in sample" data for bias correction [Smith et al., 2012b].

[11] The proportion of extreme days within a month can be found by simply counting the number of days above/below the thresholds. However, when counting the number of extreme days in a longer period, the static monthly percentile thresholds are discontinuous at the end of each month. This means that consecutive days on the transition between two months may be very similar but only one may be counted as extreme due to the different thresholds. An alternative method is to calculate moving daily thresholds over a rolling 5-day window centered on each day [e.g., Kenyon and Hegerl, 2008], but this was found to give a very noisy signal. To avoid this we fitted a smooth curve through the monthly thresholds at each grid point to provide continuous moving daily thresholds. This was achieved using a Fast Fourier Transform (FFT) with a half power of eight, applied to two periods of the seasonal cycle of monthly thresholds in order to maintain continuity at the annual boundaries. A half power of 8 was chosen to capture the multiple peaks in the seasonal cycle of precipitation while minimizing noise. For temperature the same method was used for consistency, despite its simpler seasonal cycle for which a lower half power could have been chosen. An advantage of this FFT method is that each daily threshold is calculated from a much larger sample size of daily values than would be possible if they were calculated over a rolling 5-day window. A disadvantage of this FFT method is that the percentiles represented no longer exactly correspond to 10%. The actual percentiles are about 9% for temperature extremes and 5% for precipitation extremes; however Hamilton et al. [2012] found that results were fairly insensitive to percentiles used. Note that the precipitation percentile is lower because the thresholds are calculated using the recommended method of first removing dry days [e.g., Groisman and Knight, 2008], due to the skewed nature of the distribution of precipitation toward its lower limit of zero, and inconsistencies in observation records of low rainfall days (such as drizzle totals). The sensitivity of the two methods (FFT derived moving daily thresholds versus static monthly thresholds) was examined, and found to give similar levels of skill and consistent results for all parts of the analysis. Thus only the moving daily threshold results are presented here.
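The threshold construction and counting described above can be sketched as follows (Python with numpy; not the authors' code). Reading the FFT "half power of eight" as retaining only the lowest eight harmonics of the doubled seasonal cycle is an assumption, as are the month-centre positions used for interpolation; the dry-day mapping follows the percentile matching described for Precip90.

import numpy as np

def smooth_daily_thresholds(monthly_thresholds, n_harmonics=8):
    # Two copies of the 12 monthly percentile thresholds are interpolated to a
    # 730-day series (values placed at month centres), low-pass filtered by
    # keeping only the lowest FFT harmonics, and the central 365 days returned,
    # so the smoothed cycle stays continuous across the year boundary.
    month_centres = np.arange(24) * (730.0 / 24.0) + 730.0 / 48.0
    two_cycles = np.interp(np.arange(730.0), month_centres,
                           np.tile(monthly_thresholds, 2), period=730.0)
    spectrum = np.fft.rfft(two_cycles)
    spectrum[n_harmonics + 1:] = 0.0
    smoothed = np.fft.irfft(spectrum, n=730)
    return smoothed[182:547]

def proportion_of_extreme_days(daily_values, daily_thresholds, warm=True):
    # Fraction of days in the period beyond the smoothly varying threshold.
    exceed = daily_values > daily_thresholds if warm else daily_values < daily_thresholds
    return float(np.mean(exceed))

def model_dry_day_threshold(obs_precip, model_precip, obs_cutoff=0.1):
    # Dry-day cut-off for the model, taken at the percentile that 0.1 mm/day
    # occupies in the observed daily precipitation distribution.
    percentile = 100.0 * np.mean(obs_precip <= obs_cutoff)
    return np.percentile(model_precip, percentile)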


[12] We assess the accuracy of hindcasts of mean temperature and precipitation anomalies alongside proportions of extreme days counted in a given period (season, year or multiyear period). We also compare DePreSys to the uninitialized model NoAssim, and to persistence (of anomalies). Persistence is defined as the most recent n-months of observations immediately before each hindcast start date, where n is equal to the number of months in the forecast period. This was found to be the best measure of persistence for these variables, akin to Hamilton et al. [2012], though results using alternative definitions showed little sensitivity to the change.

Table 1. Area-Weighted Average of Spearman's Correlation: Seasonal Hindcasts^a

Seasonal Temperature Skill – Global Land
Season   Extreme Definition   Extremes DePreSys   Extremes Persistence   Mean DePreSys
DJF      Tmin10               0.28                0.19                   0.31
DJF      Tmin90               0.32                0.19                   0.31
DJF      Tmax10               0.20                0.12                   0.31
DJF      Tmax90               0.26                0.14                   0.31
MAM      Tmin10               0.30                0.22                   0.36
MAM      Tmin90               0.40                0.29                   0.36
MAM      Tmax10               0.27                0.14                   0.36
MAM      Tmax90               0.34                0.19                   0.36
JJA      Tmin10               0.45                0.25                   0.47
JJA      Tmin90               0.52                0.25                   0.47
JJA      Tmax10               0.30                0.13                   0.47
JJA      Tmax90               0.36                0.16                   0.47
SON      Tmin10               0.25                0.23                   0.34
SON      Tmin90               0.41                0.32                   0.34
SON      Tmax10               0.21                0.16                   0.34
SON      Tmax90               0.33                0.18                   0.34
Average                       0.33                0.20                   0.37
NoAssim (DJF only)            0.21                                       0.25

Seasonal Precipitation Skill – Europe Land
Season   Extreme Definition   Extremes DePreSys   Extremes Persistence   Mean DePreSys
DJF      Precip90             0.11                0.02                   0.09
MAM      Precip90             0.15                0.10                   0.12
JJA      Precip90             0.07                0.04                   0.14
SON      Precip90             0.03                0.05                   0.08
Average                       0.07                0.04                   0.07
NoAssim (DJF only)            0.01                                       0.01

Seasonal Precipitation Skill – USA Land
Season   Extreme Definition   Extremes DePreSys   Extremes Persistence   Mean DePreSys
DJF      Precip90             0.24                0.09                   0.32
MAM      Precip90             0.26                0.16                   0.27
JJA      Precip90             0.22                0.03                   0.23
SON      Precip90             0.09                0.05                   0.10
Average                       0.20                0.09                   0.23
NoAssim (DJF only)            0.02                                       0.09

^a Area-weighted average of Spearman's correlation for model hindcasts of extremes (third column), persistence hindcasts of extremes (fourth column) and model hindcasts of the mean (fifth column). "Seasonal Temperature Skill – Global Land" is for temperature over global land (North America, Eurasia, Australia and Argentina) with respect to HadGHCND observations, "Seasonal Precipitation Skill – Europe Land" is for precipitation over Europe land (with respect to ENSEMBLES-OBS), and "Seasonal Precipitation Skill – USA Land" is for precipitation over USA land (with respect to NCEP observations). Values in bold indicate that the correlation for DePreSys extremes is significantly different to zero (third column), persistence (fourth column) or the mean (fifth column); see section 3 for details of significance tests.

[13] To assess hindcast accuracy we first compute the Spearman's rank correlation coefficient (SCC, defined in the same way as the more commonly used Pearson's correlation, but on the rank of data values rather than the data values themselves). This avoids making assumptions about the normality, linearity or continuity of the sample which would not be appropriate for discrete variables such as the proportion of extreme days in a period [Wilks, 2006]. We also compute the standardized root-mean squared-error (SRMSE, root-mean squared-error divided by the standard deviation of the observations)

SRMSE = \sqrt{\sum_{i} (M_i - O_i)^2 / N} \; \Big/ \; \sqrt{\sum_{i} (O_i - \bar{O})^2 / N}

where Mi and Oi are the model and observed values over N time steps and Ō is the mean of the observations. This is not shown for all cases as it gave very similar results to correlation. To reduce noise, after extreme output had been calculated, skill analysis was performed over regions of 17.5° latitude × 18.75° longitude, with a spatial missing data threshold of 0.5 applied to the corresponding observations. We tested the significance of differences in correlation and SRMSE using a bootstrap resampling method with 5000 permutations, similar to that outlined by Smith et al. [2010]. Each hindcast ensemble being compared was randomly sampled over time (in blocks of 5 consecutive temporal points to account for autocorrelation) to create bootstrap samples (for each grid point in turn). The differences in skill measures were calculated for 5000 such paired bootstrap samples and 5 and 95 percentile confidence limits identified, such that a skill measure for the first hindcast is deemed significantly different to the second if zero is outside this range. On figures, such points are identified by stippling. We also display tables showing area-weighted averages (using simple cosines) of such correlation maps to enable a simple comparison of individual seasons and extreme definitions without the use of excessive numbers of figures. The significance of these values is tested as above, but by computing the area average correlation for bootstrap samples of entire paired fields. Significantly different values are indicated in bold in the tables.
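For illustration, the skill measures and significance test described above could be computed along the following lines (Python with numpy and scipy; not the authors' code, and details such as the random seed and the handling of partial blocks are arbitrary choices).

import numpy as np
from scipy.stats import spearmanr

def scc(model, obs):
    # Spearman's rank correlation coefficient between hindcasts and observations.
    rho, _pvalue = spearmanr(model, obs)
    return rho

def srmse(model, obs):
    # Standardized RMSE: 1 is no better than climatology, ~sqrt(2) for an
    # uncorrelated forecast with the observed distribution.
    rmse = np.sqrt(np.mean((model - obs) ** 2))
    return rmse / np.sqrt(np.mean((obs - obs.mean()) ** 2))

def area_weighted_mean(corr_map, lats):
    # Cosine-of-latitude weighted average of a (lat, lon) map, ignoring
    # missing points.
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(corr_map)
    ok = np.isfinite(corr_map)
    return np.average(corr_map[ok], weights=weights[ok])

def bootstrap_skill_difference(skill, fc1, fc2, obs, n_boot=5000, block=5, seed=0):
    # Resample time in blocks of 5 consecutive points (paired for both hindcast
    # sets) and return the 5th and 95th percentiles of the skill difference;
    # the difference is deemed significant if zero lies outside this range.
    n = len(obs)
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block))
    diffs = np.empty(n_boot)
    for k in range(n_boot):
        starts = rng.integers(0, n - block + 1, size=n_blocks)
        idx = np.concatenate([np.arange(s, s + block) for s in starts])[:n]
        diffs[k] = skill(fc1[idx], obs[idx]) - skill(fc2[idx], obs[idx])
    return np.percentile(diffs, [5, 95])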

4. Results

[14] In the following sections we summarize the skill of temperature extreme hindcasts by concatenating all four definitions together (Tmax90, Tmax10, Tmin90, Tmin10; i.e., as a vector of length four times that of a single definition), although analysis has also been carried out for each definition separately (Tables 1 and 2). For rainfall extremes, we only consider extreme wet events (Precip90). Correlation contour plots are shown for DePreSys predictions of the proportion of extreme days in a period relative to corresponding observations. Difference plots compare this correlation directly with that of hindcasts of the mean or using the alternative models NoAssim (the same as DePreSys but without initialization), persistence (the observed n-month period immediately before each hindcast start date, where n is the length of the forecast period) or climatology. Correlation differences are calculated in Fisher space. For correlation, high positive values indicate good skill. Plots of SRMSE show regions of skill very similar to those for correlations and are therefore omitted in general. However, we do include some SRMSE plots for illustration. In these, low values indicate good skill. Climatology would give a value of 1, and hindcasts with the same distribution as observations but zero correlation (which therefore offer no potential skill) have a value of √2 [e.g., Collins, 2002].

Table 2. Area-Weighted Average of Spearman's Correlation: Multiyear Hindcasts^a

Multiyear Temperature Skill – Global Land
Hindcast Period   Extreme Definition   Extremes DePreSys   Extremes Persistence   Extremes NoAssim   Mean DePreSys
Year 1            Tmin10               0.50                0.31                   0.50               0.52
Year 1            Tmin90               0.60                0.42                   0.59               0.52
Year 1            Tmax10               0.37                0.15                   0.36               0.52
Year 1            Tmax90               0.47                0.26                   0.44               0.52
Year 2            Tmin10               0.48                0.34                   0.51               0.48
Year 2            Tmin90               0.56                0.43                   0.61               0.48
Year 2            Tmax10               0.34                0.13                   0.33               0.48
Year 2            Tmax90               0.41                0.24                   0.43               0.48
Years 2–6         Tmin10               0.78                0.64                   0.79               0.80
Years 2–6         Tmin90               0.83                0.68                   0.84               0.80
Years 2–6         Tmax10               0.68                0.35                   0.69               0.80
Years 2–6         Tmax90               0.70                0.42                   0.73               0.80
Years 5–9         Tmin10               0.77                0.64                   0.77               0.80
Years 5–9         Tmin90               0.84                0.66                   0.84               0.80
Years 5–9         Tmax10               0.70                0.39                   0.69               0.80
Years 5–9         Tmax90               0.73                0.45                   0.73               0.80
Year 1            All 4-Defs.^b        0.49                0.29                   0.48               0.52
Year 2            All 4-Defs.          0.45                0.30                   0.48               0.48
Years 2–6         All 4-Defs.          0.74                0.55                   0.75               0.80
Years 5–9         All 4-Defs.          0.73                0.49                   0.72               0.80

Multiyear Precipitation Skill – Europe Land
Hindcast Period   Extreme Definition   Extremes DePreSys   Extremes Persistence   Extremes NoAssim   Mean DePreSys
Year 1            Precip90             0.08                0.07                   0.01               0.01
Year 2            Precip90             0.12                0.06                   0.08               0.01
Years 2–6         Precip90             0.26                0.04                   0.32               0.17
Years 5–9         Precip90             0.23                0.08                   0.20               0.14

Multiyear Precipitation Skill – USA Land
Hindcast Period   Extreme Definition   Extremes DePreSys   Extremes Persistence   Extremes NoAssim   Mean DePreSys
Year 1            Precip90             0.19                0.08                   0.09               0.25
Year 2            Precip90             0.02                0.12                   0.06               0.03
Years 2–6         Precip90             0.08                0.07                   0.01               0.15
Years 5–9         Precip90             0.24                0.11                   0.10               0.19

^a As for Table 1, but for 1-year and 5-year hindcasts. Values in bold indicate that the correlation for DePreSys extremes is significantly different to zero (third column), persistence (fourth column), NoAssim (fifth column) or the mean (sixth column); see section 3 for details.
^b Computed by concatenating all four temperature extreme definitions, as in Figure 1.

[15] Comparison of DePreSys with NoAssim allows the impact of initialization to be assessed. Note, however, that NoAssim hindcasts were only available for the November start dates, so that a comparison of seasonal skill between DePreSys and NoAssim is only possible for DJF. Assessment of skill is, of course, limited to the regions covered by the observations; northern hemisphere land, Australia and Argentina for temperature, Europe and the USA for precipitation. Figures and area averages of correlation and SRMSE maps are thus restricted to these regions.

4.1. Seasonal Predictability

[16] We find that seasonal temperature extremes (proportion of extreme days in a season, relative to the moving daily thresholds as described in section 3), over all four standard seasons, are reasonably predictable at a lead-time of 1-month (Figure 1 and Table 1). DePreSys is significantly more skillful than climatology and persistence over much of the globe (Figures 1a and 1c, noting that positive values of correlation differences indicate greater skill for DePreSys extreme hindcasts). This is in agreement with results using Met Office seasonal forecasts (GloSea4 [Hamilton et al., 2012]). The values of SCC are themselves fairly modest, with an area average of 0.3 (Table 1). There is some variation over the different seasons and extreme definitions, with area average SCCs ranging from 0.20 to 0.52 (Table 1). However, analysis of the robustness of these differences, and corresponding physical mechanisms, is left for future work. In agreement with Hamilton et al. [2012], the skill of seasonal temperature extremes is similar to that of seasonal mean temperature, but generally lower (Figure 1b and Table 1). This similarity is because the frequencies of moderate temperature extremes are highly correlated with the mean [Hamilton et al., 2012]. We have also investigated the predictability of wet extremes over Europe and the USA, and find some areas where skill is significantly better than persistence over the USA (Figures 2a and 2c). The skill is lower than that for temperature and not significantly

Figure 1. Skill of seasonal hindcasts of temperature extremes. (a) The Spearman's Correlation Coefficient (SCC) between observed and forecast anomalies of the proportion of daily extremes, computed by concatenating all four definitions (Tmax90, Tmin90, Tmax10 and Tmin10) and all four standard seasons (DJF, MAM, JJA, and SON) at a forecast lead time of one month; see section 4 for further details. (b) SCC of extremes (as in Figure 1a) minus the SCC for predictions of the seasonal mean. (c) SCC of extremes (as in Figure 1a) minus the SCC of persistence forecasts (most recent 3-months of observations immediately before each hindcast start date; see section 3). In all figures, each 2.5° latitude by 3.75° longitude grid point shows values for the 17.5° by 18.75° region centered on that grid point. Stippling identifies regions where correlations for extremes are significantly different to zero (Figure 1a), the mean (Figure 1b), and persistence (Figure 1c) (for method, see section 3). For clarity, only alternate points are marked with stippling.


Figure 2. Skill of seasonal hindcasts of wet extremes (Precip90). (a) SCC of the proportion of daily wet extremes, (b) SCC of extremes minus the SCC for predictions of the seasonal mean, and (c) SCC of extremes minus the SCC of persistence forecasts (as in Figure 1).

different to that for mean precipitation (Figure 2b). The skill appears to be greater over the USA than Europe (area average SCC of 0.2 compared to 0.1, Table 1), although further analysis of the physical mechanisms is needed to gain confidence in this result. [17] Seasonal predictions of temperature and precipitation extremes are clearly improved in some regions through assimilation of initial conditions (Figure 3), though analysis is for winter only here due to availability of NoAssim hindcasts. For temperature the improvements are mainly at the higher latitudes, but also Australia, the eastern coast of the USA and parts of East Asia and China. However, initialization shows no improvement for other regions, and in fact shows slightly lower skill in central Asia, Europe and central USA, suggesting that the overall skill in these regions seen in Figure 1a arises from external forcing factors. Indeed, Hamilton et al. [2012] found that removal of the climate change signal significantly reduced the skill of seasonal temperature extremes in central Asia (though also China and eastern Canada) but with no significant effect in the rest of North America and Eurasia. For wet extremes, forecasts are improved by initialization over most of the USA and Europe, although there is possibly some degradation in Eastern Europe. Despite the clear improvement through initialization, the overall skill over Europe is very low (Figure 2a). 4.2. Interannual to Decadal Predictability [18] Figures 4, 5, 6, and 7 show the skill of predictions of the proportion of extreme days in annual and multiannual periods, relative to the moving daily thresholds as described in section 3. Each row in these figures shows results for a particular forecast period, in the same format as Figures 1, 2

and 3. The first two rows represent year 1 and year 2, i.e., annual hindcasts (December–November) at lead-times 1 and 13 months respectively. The last two rows represent predictions of the five-year periods years 2–6 and years 5–9, i.e., 5-year hindcasts at lead-times 13 and 49 months respectively.

[19] Predictions of temperature extremes in annual and multiannual periods are significantly more skillful than climatology over much of the globe, i.e., correlations significantly greater than 0 (Figure 4a) and SRMSE significantly less than 1 (Figure 6a). There is also significant improvement over persistence in many regions (Figure 4c, positive values of correlation differences indicating greater skill). Correlation is higher than for seasonal timescales, and increases for multiyear periods (area average SCCs are 0.5 for years 1 and 2, and 0.8 for years 2–6 and 5–9; Table 2), partly because averaging over longer periods reduces the impact of unpredictable short-term variability. We note that the higher correlations are also partly due to trends, and the absolute values of correlation should be interpreted carefully. For a clear assessment of the general climate trend, we look at the trend in 5-year mean temperature. We find that the model represents the spatial variability reasonably well compared to the observations, with lower trends over Australia, southeast Asia and Argentina compared to Europe and northern Asia (Figures 8a and 8b). This is reflected in SRMSE values of 0.8 or less throughout much of the globe (Figure 8d), representing a 20% improvement over climatology (which has an expected value of 1). However, the model trend is much higher than observed in the southeastern USA (Figure 8c), leading to large values of SRMSE for mean temperature (Figure 8d) and temperature extremes (Figure 6a), especially hot days (not shown). This is consistent with the region of observed lack of warming that Pan et al. [2004] refer to as a "warming hole," associated with regional scale feedbacks, involving soil moisture, atmospheric circulation and the surface energy balance. Such feedbacks are also dependent on land use changes [Misra et al., 2012], which are not well represented in our model. Recent studies suggest that decadal internal dynamic variability is also an important driver, largely originating from sea surface temperatures in the tropical Pacific and possibly the northern Atlantic [Kunkel et al., 2006; Meehl et al., 2012].

Figure 3. The impact of initialization on seasonal predictions of (a) temperature and (b) precipitation extremes in northern hemisphere winters (December 1960 to February 2006). The figures are constructed as Figures 1c and 2c, but for the SCC of DePreSys minus the SCC of NoAssim (see section 4.1 for further details).

Figure 4. Skill of annual and multiannual hindcasts of temperature extremes. (a) SCC of the proportion of daily extremes in the 1 or 5-year period. (b) SCC of extremes minus the SCC of the annual or multiannual mean. (c) SCC of extremes minus the SCC of persistence forecasts (most recent 12 (60) months of observations immediately before each annual (5-year) hindcast start date; see section 3). Rows are for Year 1, Year 2, Years 2–6 and Years 5–9, i.e., for annual hindcasts at lead-times 1 and 13 months, and 5-year hindcasts at lead-times 13 and 49 months respectively. Stippling identifies regions where correlations for extremes are significantly different to zero (Figure 1a), the mean (Figure 1b), and persistence (Figure 1c) (for method, see section 3). For clarity, only alternate points are marked with stippling.

Figure 5. Skill of annual and multiannual hindcasts of wet extremes. (a) SCC of the proportion of daily wet extremes, (b) SCC of extremes minus the SCC for predictions of the seasonal mean, and (c) SCC of extremes minus the SCC of persistence forecasts (as in Figure 4).


Figure 6. Skill of annual and multiannual predictions of (a) temperature and (b) precipitation extremes in terms of Standardized Root-Mean squared-Error (SRMSE). Rows are for Year 1, Year 2, Years 2–6 and Years 5–9 (as in Figures 4a and 5a). Stippling identifies regions where SRMSEs for extremes are significantly different to one (for method, see section 3).

[20] As was the case for seasonal timescales, the skill in terms of correlation for temperature extremes is comparable to that for the mean, but generally lower especially at longer lead times (Figure 4b). There is some significant improvement through initialization in year 1, especially in Australia and Canada (Figure 7a). However, initialization shows very little impact beyond the first year, suggesting that the predictability on these timescales arises from external forcings (Figure 7a).

Figure 7. The impact of initialization on annual and multiannual predictions of (a) temperature and (b) precipitation extremes. The figures are constructed as Figures 4c and 5c, but for the SCC of DePreSys minus the SCC of NoAssim (see section 4.2 for further details).

Figure 8. The trend for 5-year mean temperature from (a) the DePreSys ensemble mean for hindcast years 2–6 and (b) the HadGHCND observations; (c) the difference between the two, i.e., Figure 8a minus Figure 8b; and (d) the SRMSE of DePreSys 5-year mean temperature (again for hindcast years 2–6; stippling identifies regions where SRMSEs are significantly different to one).

[21] As for seasonal timescales, annual and multiannual predictions of wet extremes have low skill; indicated by correlations much lower than for temperature extremes (Figure 5a compared to Figure 4a). There are some regions of significant skill with correlation significantly higher than climatology over the USA and northern Europe (Figure 5a) though not in terms of SRMSE (Figure 6b). There is improvement over persistence, especially in northern Europe for 5-year periods, though also in the USA (Figure 5c). The high correlations in the USA for years 5–9 are somewhat surprising given the low correlations in years 2–6 (Figure 5a), with improvement also apparent in terms of SRMSE (Figure 6b). A similar signal of higher correlation in eastern USA (though lower correlation in the west) is found for the NoAssim (uninitialized) hindcasts for both years 5–9 and years 2–6 (not shown, but can be inferred from Figure 7b), suggesting that it is an externally forced signal that has somehow been suppressed in the initialized hindcasts for the earlier period, possibly due to initialization shock [Smith et al., 2012b]. The skill in the USA in year 1 arises mainly through initialization, shown by correlations for DePreSys significantly greater than for NoAssim. For Europe this difference has the same sign but is not shown as significant (Figure 7b). In both regions the skill beyond the first year arises mainly from external forcings, shown by correlations for DePreSys not significantly different to NoAssim (Figure 7b). For years 2–6 correlation is higher over Europe than the USA (area average SCC of 0.3 compared to 0.1). Further analysis is needed to understand the physical mechanisms responsible for the increased skill over Europe, as this may arise from coincidental trends due to forcings in DePreSys and the increasing winter NAO in observations which is known to have driven a systematic

increase in precipitation extremes across Northern Europe [Scaife et al., 2008].

Figure 9. Observed signal-to-noise ratio of trends in (a) 5-year wet extremes and (b) 5-year mean precipitation. The plots show trends computed over the period 1960–2011 divided by their associated standard deviations of annual values (after detrending; see section 4.2 for further details). (c) The signal-to-noise ratio of wet extremes minus that of mean precipitation (i.e., Figure 9a minus Figure 9b).

Figure 10. The SCC for (a) hot extremes (Tmax90) and (b) cold extremes (Tmin10) minus the SCC for mean temperature, for DePreSys hindcast years 2–6. Also shown is the signal-to-noise ratio of temperature extremes minus that of mean temperature, as in Figure 9c but for (c) hot extremes (Tmax90) and (d) cold extremes (Tmin10); see section 4.2 for further details.

[22] Interestingly, though correlations for wet extremes are fairly low, they are significantly higher than for the mean on decadal timescales over parts of northern Europe (Figure 5b). Indeed, mean precipitation loses its skill beyond the seasonal timescale over much of Europe (not shown), while extremes retain low but potentially useful levels of positive correlation and also low values of SRMSE (Figure 6b). This suggests that the signal-to-noise ratio is higher for extremes than for the mean. This could occur, for example, if either the variability or the trend in extremes, relative to unpredictable variations, is greater than for the mean. Since, as discussed above, the predictability for extreme wet days over Europe arises mainly from external forcings beyond year 1, we focus on the signal-to-noise ratio of the trend as a possible reason for the higher predictability of extremes than the mean seen in Figure 5 in northern Europe. We compare the observed trends in 5-year mean precipitation and 5-year wet extremes in Figure 9. These trends have been normalized by their detrended standard deviations of annual mean precipitation and wet extremes respectively (i.e., after subtraction of the linear trends calculated for each grid box), which provide a measure of the noise, or unpredictable variability. We find that the signal-to-noise ratio for wet extremes is indeed higher than for the mean over northern Europe (Figure 9c), providing some support for the higher predictability in extremes than the mean. For temperature on decadal timescales, we also find similar relationships between differences in correlations between extremes and the mean and their respective differences in normalized trends for hot and cold events (Figure 10). For example, the correlation for extremes is higher than for the mean for Tmax90 over northern Eurasia and for Tmin10 over the USA, corresponding to regions of higher normalized trends in extremes compared to the mean. Conversely, regions with lower normalized trends in extremes tend to provide lower predictability of extremes. These results suggest that analysis of the skill of extremes may occasionally provide information that is complementary to the skill of the mean if there are robust physical reasons to expect the trend in extremes, relative to noise, to be greater than for the mean.
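A minimal Python sketch of the normalized-trend (signal-to-noise) diagnostic described above, assuming one annual time series per grid box; taking the "trend" as the total change implied by a least-squares fit over 1960–2011 is one plausible reading of the text, not a statement of the authors' exact procedure.

import numpy as np

def trend_signal_to_noise(years, annual_values):
    # Least-squares linear trend, expressed as the total change over the
    # period, divided by the standard deviation of the detrended annual
    # values (the "noise" of unpredictable variability).
    slope, intercept = np.polyfit(years, annual_values, 1)
    detrended = annual_values - (slope * years + intercept)
    signal = slope * (years[-1] - years[0])
    return signal / detrended.std(ddof=1)

# Hypothetical usage at one grid box, comparing wet extremes with the mean:
# years = np.arange(1960, 2012)
# sn_difference = (trend_signal_to_noise(years, wet_extreme_series)
#                  - trend_signal_to_noise(years, mean_precip_series))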

Figure 11. Skill of annual and multiannual predictions of (a) temperature and (b) precipitation extremes in terms of reliability, displayed as the percentage of years where observations are contained within the model's 90% confidence interval (see end of section 4.2). Rows are for Year 1, Year 2, Years 2–6 and Years 5–9 (as in Figures 4a and 5a).

[23] The focus of this study was to assess the potential of using global climate models to predict changes in extremes on decadal time scales, rather than to carry out a full probabilistic assessment of forecasts (which would not be very robust with just 9 ensemble members). To take a first look at the reliability of such forecasts, we have calculated the 90% confidence interval of our hindcasts based on the standard deviation of ensemble members (assuming normality). We then assessed the number of times that corresponding observations fall within this interval over the period (Figure 11). Ideally this would occur 90% of the time. For temperature extremes, we find that the observations are too often outside the ensemble confidence interval (Figure 11a), suggesting that the model is overconfident, with true forecast uncertainties being underestimated, as is often found for seasonal forecasts [Weigel et al., 2009]. However, this is not the case everywhere; for temperature extremes over Europe (Figure 11a) and precipitation extremes over both Europe and the USA (Figure 11b) observations are too often inside the ensemble confidence interval, suggesting that the true forecast uncertainties are being overestimated. Other studies [e.g., Goddard et al., 2012] find similar imperfections in ensemble spread, suggesting improvements in models and initialization strategies are required.
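The reliability check described in the preceding paragraph can be sketched as follows (Python; illustrative only), with the 90% interval built from the ensemble mean and standard deviation under the normality assumption stated above.

import numpy as np
from scipy.stats import norm

def coverage_of_90pct_interval(ensemble, observations):
    # ensemble: array of shape (n_times, n_members); observations: (n_times,).
    # Returns the fraction of times the observation lies inside the Gaussian
    # 90% interval implied by the ensemble spread. Values well below 0.9
    # indicate overconfidence; values well above 0.9 suggest forecast
    # uncertainty is overestimated.
    mean = ensemble.mean(axis=1)
    spread = ensemble.std(axis=1, ddof=1)
    half_width = norm.ppf(0.95) * spread      # two-sided 90%: +/- 1.645 sd
    inside = np.abs(observations - mean) <= half_width
    return float(inside.mean())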

5. Summary and Concluding Remarks

[24] We have assessed the skill in terms of correlation and error for predicting moderate 1 in 10 day extreme hot and cold temperature and wet rainfall events on timescales from a season to a decade ahead. This was achieved by analyzing daily observed and model data, and comparing the proportion of extreme days in observations with the proportion of extreme days predicted by the Met Office Decadal Prediction System (DePreSys). Our assessment is restricted by the availability of observations to global-land (the northern hemisphere, Australia and Argentina) for temperature, and Europe and the USA for rainfall.

[25] On average, seasonal temperature extremes are significantly more skillful than persistence over much of the globe. Their skill is fairly modest, with an area average of maps of Spearman's correlation coefficient (SCC) for DePreSys over global-land of 0.3 (significantly higher than zero and persistence, as in Table 1) and slightly lower than for the mean, consistent with a companion study [Hamilton et al., 2012]. There is also some skill for wet extremes, especially over the USA, although the correlation values are lower than for temperature (with an area average SCC of 0.2).

[26] The skill of both temperature and rainfall extremes improves for multiyear periods, with mean SCCs of 0.8 for temperature (global land) and 0.2 for European rainfall in years 2–6 and 5–9. This is partly because a longer averaging period reduces the impact of unpredictable short-term variations, and also capitalizes on predictable trends from external forcings.

[27] Interestingly, the skill for extremes is sometimes higher than for the mean. This is seen especially for multiyear extremes, including rainfall over Europe, hot extremes over northern Eurasia, and cold extremes over the USA. Regions with higher skill for extremes than the mean generally correspond to regions where the trends in extremes are greater than they are for the mean, pointing to a higher signal-to-noise ratio as a possible explanation. Analysis of extremes may therefore occasionally provide


complementary information to analysis of the mean if there are robust physical reasons to expect the trend in extremes, relative to noise, to be greater than for the mean.

[28] Initialization of the model with observations improves the skill in some regions for both temperature (higher latitudes, Australia, and China) and rainfall extremes (USA and Europe) on seasonal timescales. However, there is very little improvement beyond the first year suggesting that skill on these timescales in our model arises from external forcings. Thus, while it is encouraging that there is significant predictability of extremes for the coming years, the impact of initialization is disappointing. A similar result was presented by Smith et al. [2010] who found no improvement from initialization in predictions of 5-year mean surface temperature over land. However, Smith et al. [2010] did find significant improvements in key regions of the ocean, including the Atlantic sub-polar gyre and the tropical Pacific, leading to improved predictions of Atlantic hurricane frequency. Many studies [e.g., Zhang and Delworth, 2006; Knight et al., 2006; McCabe et al., 2004; Smith et al., 2012a] suggest that skillful predictions of multiyear ocean temperatures in these regions may also lead to predictions of important climate impacts over land, including rainfall over the Sahel and parts of North and South America, and European temperatures. The absence of such improvements in this study and in previous work [Smith et al., 2010] might be caused by errors in the atmospheric response to tropical Pacific and North Atlantic ocean temperatures simulated by our decadal prediction model (HadCM3), which is soon to be updated to HadGEM3 [Walters et al., 2011]. Improved skill may therefore be possible in the future through further model developments, especially those that improve the representation of global teleconnections through which ocean conditions influence land regions. Nevertheless, skill arising purely from external forcings suggests that useful multiannual predictions of temperature extremes are already a viable possibility in many regions, although further assessment of the sensitivity to different forcing factors would be prudent before relying on such forecasts.

[29] Acknowledgments. This study was supported by the Joint UK DECC/Defra Met Office Hadley Centre Climate Programme (GA01101) and the EU FP7 COMBINE project. We acknowledge the HadGHCND data set produced by the Hadley Centre and National Climate Data Center. We acknowledge the E-OBS data set from the EU-FP6 project ENSEMBLES (http://www.ensembles-eu.org) and the data providers in the ECA&D project (http://eca.knmi.nl). CPC US Unified Precipitation data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.cdc.noaa.gov/.

References Alexander, L. V., et al. (2006), Global observed changes in daily climate extremes of temperature and precipitation, J. Geophys. Res., 111, D05109, doi:10.1029/2005JD006290. Arribas, A., et al. (2011), The GloSea4 Ensemble Prediction System for Seasonal Forecasting, Mon. Weather Rev., 139(6), 1891–1910, doi:10.1175/2010MWR3615.1. Caesar, J., L. Alexander, and R. Vose (2006), Large-scale changes in observed daily maximum and minimum temperatures: Creation and analysis of a new gridded data set, J. Geophys. Res., 111, D05101, doi:10.1029/2005JD006280. Clark, R. T., J. M. Murphy, and S. J. Brown (2010), Do global warming targets limit heatwave risk?, Geophys. Res. Lett., 37, L17703, doi:10.1029/ 2010GL043898. Collins, M. (2002), Climate predictability on interannual to decadal time scales: The initial value problem, Clim. Dyn., 19, 671–692, doi:10.1007/ s00382-002-0254-8.


Collins, M., B. B. B. Booth, G. Harris, J. M. Murphy, D. M. H. Sexton, and M. J. Webb (2006), Towards quantifying uncertainty in transient climate change, Clim. Dyn., 27, 127–147, doi:10.1007/s00382-006-0121-0. Collins, M., B. B. B. Booth, B. Bhaskaran, G. R. Harris, J. M. Murphy, D. M. H. Sexton, and M. J. Webb (2010), Climate model errors, feedbacks and forcings: A comparison of perturbed physics and multimodel ensembles, Clim. Dyn., 36(9–10), 1737–1766, doi:10.1007/ s00382-010-0808-0. Doblas-Reyes, F. J., A. Weisheimer, T. N. Palmer, J. M. Murphy, and D. Smith (2010), Forecast quality assessment of the ENSEMBLES seasonalto-decadal Stream 2 hindcasts, ECMWF Tech. Memo. 621, 45 pp., Eur. Cent. for Medium-Range Weather Forecasts, Reading, U. K. Frich, P., L. V. Alexander, P. Della-Mara, B. Gleason, M. Haylock, A. M. G. Klein Tank, and T. Peterson (2002), Observed coherent changes in climatic extremes during the second half of the twentieth century, Clim. Res., 19, 193–212, doi:10.3354/cr019193. Goddard, L., et al. (2012), A verification framework for interannual-todecadal predictions experiments, Clim. Dyn., doi:10.1007/s00382-0121481-2, in press. Gordon, C., C. Cooper, C. A. Senior, H. Banks, J. M. Gregory, T. C. Johns, J. F. B. Mitchell, and R. A. Wood (2000), The simulation of SST, sea ice extents and ocean heat transports in a version of the Hadley Centre coupled model without flux adjustments, Clim. Dyn., 16, 147–168, doi:10.1007/s003820050010. Groisman, P. Y., and R. W. Knight (2008), Prolonged dry episodes over the conterminous United States: New tendencies emerging during the last 40 years, J. Clim., 21, 1850–1862, doi:10.1175/2007JCLI2013.1. Hamilton, E., R. Eade, R. J. Graham, A. A. Scaife, D. M. Smith, A. Maidens, and C. MacLachlan (2012), Forecasting the number of extreme daily events on seasonal timescales, J. Geophys. Res., 117, D03114, doi:10.1029/ 2011JD016541. Haylock, M. R., N. Hofstra, A. M. G. Klein Tank, E. J. Klok, P. D. Jones, and M. New (2008), A European daily high-resolution gridded data set of surface temperature and precipitation. J. Geophys. Res., 113, D20119, doi:10.1029/2008JD010201. Hegerl, G. C., F. W. Zwiers, P. A. Stott, and V. V. Kharin (2004), Detectability of anthropogenic changes in annual temperature and precipitation extremes, J. Clim., 17, 3683–3700, doi:10.1175/1520-0442(2004) 0172.0.CO;2. Higgins, R. W., J. E. Janowiak, and Y.-P. Yao (1996), A gridded hourly precipitation data base for the United States (1963–1993), NCEP/Clim. Predict. Cent. Atlas 1, 46 pp., Natl. Cent. for Environ. Predict., College Park, Md. Higgins, R. W., V. E. Kousy, V. B. S. Silva, E. Becker, and P. Xie (2010), intercomparison of daily precipitation statistics over the united states in observations and in NCEP reanalysis products, J. Clim., 23(17), 4637–4650, doi:10.1175/2010JCLI3638.1. Kenyon, J., and G. C. Hegerl (2008), Influence of modes of climate variability on global temperature extremes, J. Clim., 21(15), 3872–3889, doi:10.1175/2008JCLI2125.1. Kharin, V. V., and F. W. Zwiers (2000), Changes in the extremes in an ensemble of transient climate simulations with a coupled atmosphere– ocean GCM, J. Clim., 13(21), 3760–3788, doi:10.1175/1520-0442 (2000)0132.0.CO;2. Kharin, V. V., and F. W. Zwiers (2005), Estimating extremes in transient climate change simulations, J. Clim., 18(8), 1156–1173, doi:10.1175/ JCLI3320.1. Kharin, V. V., F. W. Zwiers, X. Zhang, and G. C. 
Hegerl (2007), Changes in temperature and precipitation extremes in the IPCC ensemble of global coupled model simulations, J. Clim., 20(8), 1419–1444, doi:10.1175/ JCLI4066.1. Knight, J. R., C. K. Folland, and A. A. Scaife (2006), Climatic impacts of the Atlantic Multidecadal Oscillation, Geophys. Res. Lett., 33, L17706, doi:10.1029/2006GL026242. Kunkel, K. E., R. Pielke Jr., and S. A. Changnon (1999), Temporal fluctuations in weather and climate extremes that cause economic and human health impacts: A review, Bull. Am. Meteorol. Soc., 80, 1077–1098, doi:10.1175/1520-0477(1999)0802.0.CO;2. Kunkel, K. E., X.-Z. Liang, J. Zhu, and Y. Lin (2006), Can CGCMs simulate the twentieth-century “warming hole” in the central United States?, J. Clim., 19, 4137–4153, doi:10.1175/JCLI3848.1. McCabe, G. J., M. A. Palecki, and J. L. Betancourt (2004), Pacific and Atlantic Ocean influences on multidecadal drought frequency in the United States, Proc. Natl. Acad. Sci. U. S. A., 101, 4136–4141, doi:10.1073/pnas.0306738101. McMichael, A., and A. Githeko (2001), Human health, in Climate Change 2001: Impacts, Adaptation and Vulnerability, edited by J. McCarthy et al., pp. 451–486, Cambridge Univ. Press, Cambridge, U. K.


Meehl, G. A., C. Tebaldi, and D. Nychka (2004), Changes in frost days in simulations of 21st century climate, Clim. Dyn., 23, 495–511, doi:10.1007/s00382-004-0442-9. Meehl, G. A., C. Tebaldi, G. Walton, D. Easterling, and L. McDaniel (2009), The relative increase of record high maximum temperatures compared to record low minimum temperatures in the U.S., Geophys. Res. Lett., 36, L23701, doi:10.1029/2009GL040736. Meehl, G. A., J. M. Arblaster, and G. Branstator (2012), Mechanisms contributing to the warming hole and the consequent U.S. east-west differential of heat extremes, J. Clim., 25, 6394–6408, doi:10.1175/ JCLI-D-11-00655.1. Menéndez, C. G., and A. F. Carril (2010), Potential changes in extremes and links with the Southern Annular Mode as simulated by a multi-model ensemble, Clim. Change, 98(3–4), 359–377, doi:10.1007/s10584-0099735-7. Misra, V., J.-P. Michael, R. Boyles, E. P. Chassignet, M. Griffin, J. J. O’Brien (2012), Reconciling the spatial distribution of the surface temperature trends in the southeastern United States, J. Clim., 25, 3610–3618, doi:10.1175/JCLI-D-11-00170.1. Pan, Z., R. W. Arritt, E. S. Takle, W. J. Gutowski Jr., C. J. Anderson, and M. Segal (2004), Altered hydrologic feedback in a warming climate introduces a “warming hole,” Geophys. Res. Lett., 31, L17109, doi:10.1029/ 2004GL020528. Rahmstorf, S., and D. Coumou (2011), Increase of extreme events in a warming world, Proc. Natl. Acad. Sci. U. S. A., 108, 17,905–17,909, doi:10.1073/pnas.1101766108. Robertson, A. W., V. Moron, and Y. Swarinoto (2009), Seasonal predictability of daily rainfall statistics over Indramayu district, Indonesia, Int. J. Climatol., 29(10), 1449–1462, doi:10.1002/joc.1816. Russo, S., and A. Sterl (2011), Global changes in indices describing moderate temperature extremes from the daily output of a climate model, J. Geophys. Res., 116, D03104, doi:10.1029/2010JD014727. Scaife, A. A., C. K. Folland, L. V. Alexander, A. Moberg, and J. R. Knight (2008), European climate extremes and the North Atlantic Oscillation, J. Clim., 21(1), 72–83, doi:10.1175/2007JCLI1631.1. Smith, D. M., S. Cusack, A. W. Colman, C. K. Folland, G. R. Harris, and J. M. Murphy (2007), Improved surface temperature prediction for the coming decade from a global climate model, Science, 317, 796–799, doi:10.1126/science.1139540. Smith, D. M., R. Eade, N. J. Dunstone, D. Fereday, J. M. Murphy, H. Pohlmann, and A. A. Scaife (2010), Skilful multi-year predictions of Atlantic hurricane frequency, Nat. Geosci., 3, 846–849, doi:10.1038/ ngeo1004. Smith, D. M., A. A. Scaife, and B. Kirtman (2012a), What is the current state of scientific knowledge with regard to seasonal and decadal forecasting?, Environ. Res. Lett., 7, 015602, doi:10.1088/1748-9326/7/1/015602. Smith, D. M., R. Eade, and H. Pohlmann (2012b), A comparison of fullfield and anomaly initialization for seasonal to decadal climate prediction, Clim. Dyn., in press. Tebaldi, C., K. Hayhoe, J. M. Arblaster, and G. A. Meehl (2006), Going to the extremes: An intercomparison of model-simulated historical and future changes in extreme events, Clim. Change, 79, 185–211, doi:10.1007/s10584-006-9051-4. Trenberth, K. E., et al. (2007), Observations: Surface and atmospheric climate change, in Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, edited by S. Solomon et al., pp. , 235–336, Cambridge Univ. Press, Cambridge, U. K. Trewin, B., and H. 
Vermont (2010), Changes in the frequency of record temperatures in Australia, 1957–2009, Aust. Meteorol. Oceanogr. J., 60, 113–119. Vlcek, O., and R. Huth (2009), Is daily precipitation Gamma-distributed?: Adverse effects of an incorrect use of the Kolmogorov–Smirnov test, Atmos. Res., 93(4), 759–766. Walters, D. N., et al. (2011), The Met Office Unified Model Global Atmosphere 3.0/3.1 and JULES Global Land 3.0/3.1 configurations, Geosci. Model Dev. Discuss., 4(2), 1213–1271, doi:10.5194/gmdd-4-1213-2011. Weigel, A. P., M. A. Liniger, and C. Appenzeller (2009), Seasonal ensemble forecasts: Are recalibrated single models better than multimodels?, Mon. Weather Rev., 137, 1460–1479, doi:10.1175/2008MWR2773.1. Wilks, D. S. (2006), Statistical Methods in the Atmospheric Sciences, 2nd ed., pp. 23–70, Academic, San Diego, Calif. Zeng, Z., W. W. Hsieh, A. Shabbar, and W. R. Burrows (2010), Seasonal prediction of winter extreme precipitation over Canada by support vector regression, Hydrol. Earth Syst. Sci. Discuss., 7(3), 3521–3550, doi:10.5194/hessd-7-3521-2010. Zhang, R., and T. L. Delworth (2006), Impact of Atlantic multidecadal oscillations on India/Sahel rainfall and Atlantic hurricanes, Geophys. Res. Lett., 33, L17712, doi:10.1029/2006GL026267.
