J. Dairy Sci. 91:3627–3638 doi:10.3168/jds.2007-0945 © American Dairy Science Association, 2008.

Comparison of Random Regression Models with Legendre Polynomials and Linear Splines for Production Traits and Somatic Cell Score of Canadian Holstein Cows

J. Bohmanova,*¹ F. Miglior,†‡ J. Jamrozik,* I. Misztal,§ and P. G. Sullivan‡

*Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
†Agriculture and Agri-Food Canada, Dairy and Swine Research and Development Centre, Sherbrooke, Quebec, J1M 1Z3, Canada
‡Canadian Dairy Network, Guelph, Ontario, N1K 1E5, Canada
§Department of Animal and Dairy Science, University of Georgia, Athens 30602

ABSTRACT

A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. Effects common to all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat, and protein yields and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds, comprising 96,756 test-day records, was drawn to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate the performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at the extremes of lactation than the model with Legendre polynomials. Differences among models in goodness of fit, measured by percentages of squared bias, correlations between predicted and observed records, and residual variances, were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved with the spline models with 5 and 6 knots than with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance under the model comparison criteria considered.

Received December 12, 2007. Accepted June 3, 2008.
¹Corresponding author.

Key words: random regression model, Legendre polynomial, linear spline

INTRODUCTION

Random regression model (RRM) methodology is commonly used for routine genetic evaluation of production traits in dairy cattle (Interbull, 2007). The basic idea of RRM is to fit average lactation curves for a subpopulation together with animal-specific curves describing deviations from the average curves. Several functions can be used to fit fixed and random regressions. Early applications made use of parametric and lactation-shape functions such as the Ali and Schaeffer (1987) and Wilmink (1987) functions. However, those functions were subsequently replaced by orthogonal Legendre polynomials (LP), because the former were unable to model the peak of the lactation curve properly and tended to converge slowly owing to high correlations among coefficients (Schaeffer, 2004). Because LP are orthogonal, models using them as regressions have better convergence properties than models with parametric or lactation-shape functions (Pool et al., 2000). However, fitting the shape of lactation appropriately requires higher order polynomials, which can yield implausibly high additive genetic variances at the extremes of lactation and negative correlations between the most distant test days (Jamrozik et al., 2001; López-Romero et al., 2004). The overestimation of variances at the extremes of lactation is often attributed to the lack of asymptotes of Legendre polynomials, to their poor fit of data at the extremes of lactation (López-Romero et al., 2004), or to between-herd variation in the shape of the lactation curve not being accounted for (Gengler and Wiggans, 2001; Jamrozik et al., 2001; de Roos et al., 2004).

Table 1. Description of the data set used for estimation of variance components (VC) and of the 2 data sets used for genetic evaluation, the first with test-day records until August 2006 (D06) and the second with test-day records until August 2001 (D01)

Item                                  VC            D06            D01
Number of test-day records            96,756        45,120,202     26,832,479
Number of cows                        6,094         2,650,096      1,564,228
Number of test-day records per cow    16.0          17.0           17.1
Number of HTD¹ classes                3,915         3,593,917      2,503,244
Number of animals in pedigree         18,178        3,695,277      2,517,531

                                      Mean   SD     Mean   SD      Mean   SD
Days in milk                          161    95     161    95      162    95
Milk yield (kg)                       28.8   9.2    27.8   8.8     26.1   8.5
Fat yield (kg)                        1.04   0.33   1.02   0.34    0.96   0.32
Protein yield (kg)                    0.93   0.36   0.89   0.26    0.84   0.25
SCS                                   2.5    1.7    2.2    2.0     2.1    1.9

¹HTD = herd-test-day.

Recently, splines have been advocated as a good alternative to LP for analyzing test-day yields in RRM (White et al., 1999; Druet et al., 2003; Silvestre et al., 2005), mainly because of their limited sensitivity to the data (individual records influence only a specific part of the function) and their greater flexibility in fitting lactation curves (Misztal, 2006). Splines are piecewise functions consisting of independent segments that are joined at so-called knots (de Boor, 1978). The segments are described by low-order polynomials. The simplest spline function is a linear spline, in which the segments are fitted by linear polynomials. The coefficients of a linear spline are simple interpolation coefficients between the 2 knots adjacent to the record and 0 for all other knots. Because at most 2 coefficients are nonzero for a given record, the system of equations with splines is sparser than with LP (Misztal, 2006).

White et al. (1999) applied cubic splines to the analysis of test-day (TD) records of milk production. These functions consist of 2 terms: a) a linear interpolation common to all segments and b) second derivatives at internal knots defining the deviation from the overall slope in each interval. While the first 2 coefficients (intercept and slope) are allowed to have different variances and are assumed to be correlated, the second derivatives are assumed to have the same variance and to be independently distributed. In contrast to linear splines, this leads to a highly structured covariance matrix. A review of the theory of spline functions and descriptions of different families of splines is given by Meyer (2005).

The Canadian test-day model (CTDM) is a 12-trait random regression model in which TD milk, fat, and protein yields and SCS from the first 3 lactations are analyzed (Schaeffer et al., 2000). Random regressions (additive genetic and permanent environmental) are based on LP of order 4. Only TD records from 5 through 305 DIM are considered in the model. Because less than 40% of Canadian Holstein cows end their lactation before 305 DIM, it is of interest to include in the CTDM the additional TD records recorded after 305 DIM. Such an evaluation is expected to give a more accurate overall 305-d EBV because it uses more information. However, it can also increase the bias of the evaluation, because the additional records are more likely to be influenced by the cow's pregnancy, which is currently not considered in the CTDM. The objectives of this study were 1) to estimate variance components; 2) to compare genetic evaluations from an RRM based on LP with those from RRM based on linear splines with 4, 5, and 6 knots, using a wide range of model comparison criteria; and 3) to identify the best model for routine genetic evaluation of TD milk, fat, and protein yields and SCS recorded from 5 through 365 DIM in the first 3 lactations.
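As an illustration of what "LP of order 4" means in practice, the sketch below evaluates the 5 Legendre covariates (degrees 0 to 4) for a record at a given DIM using the standard three-term (Bonnet) recurrence. Two details are assumptions, not taken from this paper: DIM is mapped linearly onto [−1, 1] over 5 to 365 DIM, and each polynomial is scaled by sqrt((2n + 1)/2), a normalization often used in RRM work. The function name is hypothetical.

```python
import math


def legendre_covariates(dim, order=4, dim_min=5.0, dim_max=365.0):
    """Normalized Legendre covariates for one record at `dim` days in milk.

    Assumptions (not from the paper): DIM is mapped linearly to [-1, 1]
    over [dim_min, dim_max], and P_n is scaled by sqrt((2n + 1) / 2).
    """
    x = -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)
    p = [1.0, x]  # P_0(x), P_1(x)
    # Bonnet recurrence: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    p = p[: order + 1]
    return [math.sqrt((2 * n + 1) / 2.0) * p[n] for n in range(order + 1)]


# Every one of the q = 5 covariates is generally nonzero for a given record.
print(legendre_covariates(30))
```

Unlike the linear-spline covariables, most or all of these values are nonzero for any DIM, which is why equations built on LP are denser than those built on linear splines.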

Table 2. Description of random regression models¹

Model   Type of function       q²   Position of knots (DIM)
LEG     Legendre polynomials   5    —
SPL4    Linear splines         4    (5, 125, 245, 365)
SPL5    Linear splines         5    (5, 65, 125, 245, 365)
SPL6    Linear splines         6    (5, 65, 125, 245, 305, 365)

¹LEG = model with Legendre polynomials; SPL4–6 = models with linear splines with 4 to 6 knots.
²Number of covariates: order of polynomial + 1 in the model with Legendre polynomials, or number of knots in the models with linear splines.
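The number of covariates q drives the size of the random regression covariance matrices: G and P are each (q × 3 × 4) by (q × 3 × 4), for 3 lactations and 4 traits, and a symmetric matrix of dimension d has d(d + 1)/2 distinct elements. A short calculation over the q values in Table 2 (the dictionary and function names are only illustrative):

```python
# q (number of covariates) per model, as listed in Table 2.
Q = {"LEG": 5, "SPL4": 4, "SPL5": 5, "SPL6": 6}
N_LACTATIONS, N_TRAITS = 3, 4


def covariance_size(q):
    """Dimension of G (or P) and its number of distinct (co)variances."""
    d = q * N_LACTATIONS * N_TRAITS
    return d, d * (d + 1) // 2


for model, q in Q.items():
    d, n_unique = covariance_size(q)
    print(f"{model}: G is {d} x {d}, {n_unique} distinct elements")
# LEG: G is 60 x 60, 1830 distinct elements
# SPL4: G is 48 x 48, 1176 distinct elements
# SPL5: G is 60 x 60, 1830 distinct elements
# SPL6: G is 72 x 72, 2628 distinct elements
```

Because LEG and SPL5 share q = 5, they have the same number of (co)variance parameters, whereas SPL6 adds roughly 44% more per matrix than LEG.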

Journal of Dairy Science Vol. 91 No. 9, 2008


Figure 1. Posterior mean estimates of additive genetic variance of daily milk yield in first, second, and third lactation (LEG = model with Legendre polynomials; SPL4–6 = model with linear splines with 4 to 6 knots).
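In an RRM, the additive genetic variance at DIM t for one trait and lactation is typically obtained as z(t)′Gz(t) over the corresponding block of G. For a linear spline, z(t) puts weight 1 − w on knot i and w on knot i + 1, with w rising linearly in t, so between 2 knots the variance is a quadratic (parabolic) function of t. The sketch below uses a hypothetical 2-knot block with unit variance at both knots and correlation r between them; these numbers are illustrative, not estimates from this study.

```python
def spline_variance(w, g_ii, g_jj, g_ij):
    """Variance z' G z at a point between knots i and j = i + 1.

    w = (t - T_i) / (T_j - T_i) is the linear-spline weight on knot j,
    so z(t) restricted to the 2 knots is (1 - w, w).
    """
    return (1 - w) ** 2 * g_ii + 2 * w * (1 - w) * g_ij + w ** 2 * g_jj


# Hypothetical block: variance 1.0 at both knots, correlation r.
for r in (0.0, 0.5, 0.9, 1.0):
    mid = spline_variance(0.5, 1.0, 1.0, r)  # equals (1 + r) / 2
    print(f"r = {r}: variance at knots = 1.0, midway = {mid:.3f}")
# r = 0.0: variance at knots = 1.0, midway = 0.500
# r = 0.5: variance at knots = 1.0, midway = 0.750
# r = 0.9: variance at knots = 1.0, midway = 0.950
# r = 1.0: variance at knots = 1.0, midway = 1.000
```

With equal knot variances, the midpoint variance is (1 + r)/2 times the knot variance, so the dip between knots shrinks as the correlation between adjacent knots increases.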

MATERIALS AND METHODS

Data

Variance components were estimated using a data set created by randomly sampling 50 herds with at least 50 lactating cows in one year from the Canadian Holstein database provided by Canadian Dairy Network. The data consisted of 96,756 TD milk, fat, and protein yield and SCS records of 6,094 cows from their first 3 lactations, initiated from 1988 through 2006. Only TD records from 5 through 365 DIM with all 4 traits present on the TD were included. All cows were required to have a first-lactation record. The pedigree file contained 18,178 animals (Table 1). To evaluate the predictive ability of the models and the stability of EBV, 2 additional data sets were considered (Table 1). Data set D01 contained all TD records available until August 2001, and data set D06 contained all TD records available until August 2006.
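The record-selection rules above can be expressed as a simple filter. The sketch below is hypothetical: the record layout and field names are invented for illustration, herd sampling is not shown, and it assumes the first-lactation requirement is checked after the other edits have been applied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TestDay:
    # Hypothetical record layout; field names are illustrative only.
    cow_id: str
    lactation: int
    dim: int
    milk: Optional[float]
    fat: Optional[float]
    protein: Optional[float]
    scs: Optional[float]


def select_records(records: List[TestDay]) -> List[TestDay]:
    """Keep TD records at 5-365 DIM in lactations 1-3 with all 4 traits."""
    kept = [
        r for r in records
        if 5 <= r.dim <= 365
        and 1 <= r.lactation <= 3
        and None not in (r.milk, r.fat, r.protein, r.scs)
    ]
    # Keep only cows that still have at least one first-lactation record.
    first_lactation_cows = {r.cow_id for r in kept if r.lactation == 1}
    return [r for r in kept if r.cow_id in first_lactation_cows]
```

For example, a second-lactation record of a cow without any first-lactation record is dropped even if it passes the DIM and trait checks.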

Models

The data were analyzed with 4 multiple-trait, multiple-lactation RRM. The general model common to all 4 was (Schaeffer et al., 2000):

$$ y_{ijklmo} = HTD_{ijk} + \sum_{n=1}^{q} \alpha_{ijln} z_n(t) + \sum_{n=1}^{q} \beta_{ijmn} z_n(t) + \sum_{n=1}^{q} \gamma_{ijmn} z_n(t) + e_{ijklmo}, $$

where $y_{ijklmo}$ was the oth test-day record of the mth cow for trait i (TD milk, fat, or protein yield, or SCS) in lactation j; $HTD_{ijk}$ was the kth herd-test-date effect for trait i and lactation j; $\alpha_{ijln}$ was the nth fixed regression coefficient for trait i and lactation j specific to the lth region-age-season class; q is the number of covariates; $\beta_{ijmn}$ was the nth random regression coefficient for the additive genetic effect of cow m for trait i and lactation j; $\gamma_{ijmn}$ was the nth random regression coefficient for the permanent environmental effect of cow m for trait i and lactation j; $z(t)$ was a vector of q covariates describing the shape of the lactation curve for the fixed and random regressions, evaluated at t DIM; and $e_{ijklmo}$ is the residual.

Figure 2. Posterior mean estimates of additive genetic variance of daily somatic cell score in first, second, and third lactation (LEG = model with Legendre polynomials; SPL4–6 = models with linear splines with 4 to 6 knots).

Twelve classes of residual variance, each spanning 30 d, were defined for each lactation. The fixed and random regressions were fitted either with LP of order 4 (LEG) or with linear splines with 4 (SPL4), 5 (SPL5), or 6 (SPL6) knots. In the SPL4 model, the 4 knots were located at 5, 125, 245, and 365 DIM (Table 2), i.e., equally spaced 120 DIM apart across lactation. The SPL5 model was created by adding a knot at 65 DIM, which created 2 shorter (60 DIM) intervals at the beginning of lactation and gave the spline function more flexibility to fit the peak of lactation. Compared with SPL5, the SPL6 model had one additional knot, at 305 DIM. SPL6 was the most complex and SPL4 the least complex model; the LEG and SPL5 models had the same number of parameters.

Linear splines were constructed by fitting a first-degree polynomial between each pair of adjacent knots. Coefficients of linear splines were calculated as interpolation coefficients between the 2 knots adjacent to the record and were 0 for all other knots. Let T be the vector of knots; then the covariables of the linear splines for a record at DIM t located between knots $T_i$ and $T_{i+1}$ were calculated as

$$ z_i(t) = \frac{T_{i+1} - t}{T_{i+1} - T_i}, \qquad z_{i+1}(t) = \frac{t - T_i}{T_{i+1} - T_i} = 1 - z_i(t), \qquad z_k(t) = 0 \ \text{for}\ k \notin \{i, i+1\}. $$

For a record located at knot i, the covariables were defined as $z_i = 1$ and $z_k = 0$ for all $k \neq i$. The vector z therefore had at most 2 nonzero elements, and the sum of its elements was equal to 1. For instance, for a record at t = 30 in a model with 4 knots located at T = {5, 65, 245, 365}, the vector of covariables was

$$ z(30) = \left\{ \frac{65 - 30}{65 - 5}, \frac{30 - 5}{65 - 5}, 0, 0 \right\} = \{0.58, 0.42, 0, 0\}. $$

Figure 3. Posterior mean estimates of residual variance of daily milk yield in first, second, and third lactation (LEG = model with Legendre polynomials; SPL4–6 = models with linear splines with 4 to 6 knots).

In matrix notation, each model can be described as

$$ y = Xb + Zu + Wp + e, $$

where y was a vector of observations; b was a vector of HTD effects and regression coefficients for region-age-season classes; u, p, and e were vectors of additive genetic, permanent environmental, and residual values, respectively; and X, Z, and W were incidence matrices.
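A minimal sketch of the covariable definition above: for a record at DIM t between knots T_i and T_{i+1}, only those 2 knots receive nonzero weights, and a record exactly at a knot receives weight 1 there. The function name is illustrative, and records are assumed to lie between the first and last knots. For t = 30 with knots {5, 65, 245, 365}, the nearer knot (5 DIM) gets (65 − 30)/60 ≈ 0.58 and the knot at 65 DIM gets (30 − 5)/60 ≈ 0.42.

```python
from bisect import bisect_right


def spline_covariates(t, knots):
    """Linear-spline covariables z(t) for a record at DIM t.

    Assumes `knots` is sorted and knots[0] <= t <= knots[-1].
    """
    if not knots[0] <= t <= knots[-1]:
        raise ValueError("t must lie between the first and last knot")
    z = [0.0] * len(knots)
    if t == knots[-1]:  # record exactly at the last knot
        z[-1] = 1.0
        return z
    i = bisect_right(knots, t) - 1  # knots[i] <= t < knots[i + 1]
    width = knots[i + 1] - knots[i]
    z[i] = (knots[i + 1] - t) / width
    z[i + 1] = (t - knots[i]) / width  # equals 1 - z[i]
    return z


print([round(v, 2) for v in spline_covariates(30, [5, 65, 245, 365])])
# [0.58, 0.42, 0.0, 0.0]
```

Because at most 2 entries are nonzero, each record contributes to only 2 coefficients per regression, which is the source of the sparsity discussed in the Introduction.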

The data were assumed to follow

$$ y \mid b, u, p, R \sim MVN(Xb + Zu + Wp, R) $$

and

$$ \operatorname{var}\begin{bmatrix} u \\ p \\ e \end{bmatrix} = \begin{bmatrix} G \otimes A & 0 & 0 \\ 0 & P \otimes I & 0 \\ 0 & 0 & R \end{bmatrix}, $$

where R was a block diagonal matrix with blocks of size 4 × 4 specific to a particular DIM interval and lactation. Residuals were assumed to have the same variance within residual intervals and heterogeneous variances between intervals, and residuals for different DIM were assumed to be uncorrelated. Matrices G and P were the random regression covariance matrices for the genetic and permanent environmental effects, respectively. The size of both matrices was (q × 3 × 4) by (q × 3 × 4), where q is the size of the vector of random regression coefficients, 3 is the number of lactations, and 4 is the number of traits. The matrix A was the additive genetic relationship matrix, and I was an identity matrix.

Figure 4. Posterior mean estimates of residual variance of daily somatic cell score in first, second, and third lactation (LEG = model with Legendre polynomials; SPL4–6 = models with linear splines with 4 to 6 knots).

A Bayesian approach via Gibbs sampling was applied to estimate model parameters. A single long chain of 100,000 samples was generated for each model. The first 20,000 samples were discarded as burn-in, and the remaining 80,000 samples were used to compute posterior means of model parameters. Convergence of the Gibbs chains was monitored by visual inspection of plots of the samples. The prior distributions of the model parameters were

$$ \begin{aligned} b &\propto \text{constant}, \\ u \mid \Sigma_g &\sim MVN(0, \Sigma_g \otimes A), \\ p \mid \Sigma_p &\sim MVN(0, \Sigma_p \otimes I), \\ \Sigma_g \mid \nu_g, S_g^2 &\sim IW(\nu_g, \nu_g S_g^2), \\ \Sigma_p \mid \nu_p, S_p^2 &\sim IW(\nu_p, \nu_p S_p^2), \end{aligned} $$

where $\Sigma_g$ and $\Sigma_p$ (that is, G and P) were the genetic and permanent environmental covariance matrices, respectively; MVN denoted the multivariate normal distribution; and IW denoted the inverse Wishart distribution with ν degrees of belief and scaling factor S. Values adopted for the scaling factors ($S_g^2$ and $S_p^2$) were obtained by combining results from previous studies that considered TD records with DIM ≤ 305 with assumptions made about variances past 305 DIM. Conservative degrees of belief were chosen to represent vague prior information.

Two genetic evaluations were carried out for each of the 4 compared models, using the variance components previously estimated from the variance component data set. The first evaluation (EVAL01) was based on D01, and the second (EVAL06) used D06. Mixed model equations were solved by iteration on data with a preconditioned conjugate gradient algorithm using a block diagonal preconditioner (Lidauer et al., 1999). The convergence criterion was defined as the average relative difference between the left- and right-hand sides of the mixed model equations and was required to be …

Table 3. Estimates of the posterior expectation of the Bayesian deviance (D̄(Θ)), the Bayesian deviance evaluated at the posterior mean of the parameters (D(Θ̄)), the effective number of parameters (pD), the deviance information criterion (DIC), and the rank of models by DIC (Rank)¹

Model   D̄(Θ)      D(Θ̄)       pD        DIC       Rank
LEG     117,051   −21,707    138,757   255,808   2
SPL4    138,603   2,674      135,929   274,532   4
SPL5    77,240    −103,742   180,982   258,924   3
SPL6    13,113    −183,500   196,613   236,646   1

¹LEG = model with Legendre polynomials; SPL4–6 = models with linear splines with 4 to 6 knots.

… combines both goodness of fit and the degree of parameterization of the model and can be calculated easily from the samples generated by a Markov chain Monte Carlo (MCMC) simulation. The DIC was defined by Spiegelhalter et al. (2002) as …

… 120 DIM)] was calculated as follows:

$$ AD = \frac{\sum_{i=1}^{97} \left( \mathrm{ebv01}_i - \mathrm{ebv06}_i \right)}{97}. $$

As for ERP, ebv06 were shifted by subtracting the average change in EBV from EVAL01 to EVAL06.
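Table 3 reports D̄(Θ), D(Θ̄), pD, and DIC. Under the standard definition of Spiegelhalter et al. (2002), pD = D̄ − D(Θ̄) and DIC = D̄ + pD, where D̄ is the mean deviance over the retained MCMC samples. The sketch below follows that definition; whether the authors used exactly this form is an assumption here. The example passes a one-element list standing in for the posterior mean deviance of SPL4 and reproduces that row of Table 3.

```python
from statistics import fmean


def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (mean deviance, pD, DIC) following Spiegelhalter et al. (2002).

    deviance_samples: deviance, -2 log p(y | theta), evaluated at each
    retained MCMC sample.
    deviance_at_posterior_mean: deviance evaluated at the posterior mean
    of the parameters.
    """
    d_bar = fmean(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean  # effective number of parameters
    return d_bar, p_d, d_bar + p_d            # DIC = D_bar + pD


# SPL4 row of Table 3: D_bar = 138,603 and D(theta_bar) = 2,674.
print(dic([138_603.0], 2_674.0))  # (138603.0, 135929.0, 274532.0)
```

Lower DIC is preferred; because pD is added to the mean deviance, a model is rewarded for better fit only to the extent that the improvement outweighs its extra effective parameters.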

RESULTS AND DISCUSSION

Trends of daily additive genetic variances of milk yield and SCS are shown in Figures 1 and 2, respectively. Additive genetic variance functions from LEG followed the typical U shape, with high variances at the beginning and end of lactation compared with the middle of lactation. Additive genetic variance functions of SPL had a parabolic shape between knots. The overall shape of the spline variance function was influenced by the number of knots and by the correlations between random regression coefficients at the knots: functions were smoother in intervals with higher correlations between knots. The parabolic shape of the variance function has no biological basis and is an artifact specific to linear splines. However, because test-day records are evenly distributed across lactation, the overestimation and underestimation of variances should cancel out across lactation and, therefore, should not have a significant effect on genetic evaluation. The overall shapes of the variance functions were similar among models for milk, fat, and protein yield. However, LEG had noticeably higher variances at the extremes of lactation than SPL. Variances of LEG for SCS were higher than variances of SPL along the entire lactation. The implausibly high variances at the extremes of lactation were reduced by SPL but were not completely removed. This finding suggests that the edge effect is not caused only by the type of function used for the regressions but could also be caused by the smaller number of TD records at the periphery of lactation and by factors not accounted for in the model, such as preferential treatment, between-herd variation in the shape of the lactation curve (Gengler and Wiggans, 2001; Jamrozik et al., 2001; de Roos et al., 2004), and the effect of pregnancy.

Residual variance increased with lactation number, was highest at the beginning of lactation, and gradually decreased with later DIM (Figures 3 and 4). None of the models provided the lowest or highest overall residual variance for all traits and lactations. Nevertheless, LEG and SPL6 had the lowest residual variances of all models at the end of lactation for all traits and lactations.

Comparison of models based on DIC favored SPL6, which was the most complex model (Table 3). The DIC of the SPL6 model was 7% lower than that of the second-ranking model, LEG. This indicates that the SPL6 model is not overparameterized relative to LEG. LEG ranked second, with a DIC just 1% lower than that of SPL5. SPL4 ranked last, with a DIC 7% higher than that of LEG.

As shown in Table 4, the highest average daily heritabilities for milk (0.45), fat (0.36), and protein (0.42) yields were observed in first lactation with SPL6 and in the second and third lactations with LEG. The LEG model had the largest heritabilities for SCS of all models in all 3 lactations (0.21, 0.25, and 0.33). However, the differences among models were relatively small. Posterior standard deviations of average daily heritability ranged between 0.001 and 0.013.

Comparison of the goodness of fit of the models based on RV identified SPL4 as the worst model for all traits and lactations (Table 5). This model had higher RV than LEG for milk in first (15%), second (13%), and third (17%) lactation. On the other hand, SPL6 was the best model in terms of RV. Both SPL6 and SPL5 had smaller RV than LEG for all traits and lactations. The trend of RV across lactation, where each point represents the average RV of a 30-d interval, is shown in Figure 5. In all models, the highest RV was observed in the interval of 36 to 65 DIM and the lowest in the interval of 336 to 365 DIM. Of all compared models, SPL6 had the lowest RV across the whole lactation. LEG had a relatively constant RV across all intervals. SPL4 had the highest RV of all models between 5 and 185 DIM. At the end of lactation (186 to 365 DIM), SPL5 had the highest RV of all models.

Figure 5. Residual variance (RV) of first-parity milk yield calculated as the variance of differences between observed and predicted records in data set D06 (LEG = model with Legendre polynomials; SPL4–6 = models with linear splines with 4 to 6 knots).

Figure 6 shows an example of a lactation curve of one region-age-season class. All models with splines predicted the highest milk yield at the very beginning of lactation and therefore did not reproduce the typical shape of the lactation curve with a peak around 35 to 45 DIM, as the LEG model did. This indicates that the location of the second knot (65 DIM) is not optimal for fitting the peak of lactation with the fixed regression and that this knot should instead be located at an earlier DIM. The shape of the lactation curve after the peak is almost linear, and therefore SPL and LEG give similar trends. The nearly constant rate of decline in milk yield after the peak suggests that a spline function with just 2 knots after the peak would be sufficient. This cannot be generalized to random regressions, because random regressions fit not only the mean but also variances, and variance functions oscillate more than the average lactation curve.

Figure 6. Lactation curve of the first age-parity class in the D06 data set (LEG = model with Legendre polynomials; SPL4–6 = models with linear splines with 4 to 6 knots).

Differences among models in PSB and RHO were much smaller than in RV but also indicated SPL6 as the best and SPL4 as the worst model (Table 5). The values of PSB, RHO, and RV for SCS were similar between SPL5 and SPL6.

As given in Table 6, differences among models in ERP were not significant in first lactation for milk, fat, and protein. In the second and third lactations, all SPL were better than LEG, and SPL6 had the smallest ERP. The SPL4 model, the model with the fewest parameters, surprisingly had smaller ERP than LEG and also than SPL5. For SCS, ERP values of SPL were smaller than those of LEG not only in the second and third but also in the first lactation. The lowest ERP for SCS were found with SPL6, being 6, 6, and 10% smaller than with LEG in first, second, and third lactation, respectively.

Table 6. Percentage decrease (−) or increase (+) of the models' error of prediction (ERP) compared with the LEG model¹

         Lactation 1                 Lactation 2                 Lactation 3
Model    Milk  Fat  Protein  SCS     Milk  Fat  Protein  SCS     Milk  Fat  Protein  SCS
LEG      0     0    0        0       0     0    0        0       0     0    0        0
SPL4     −1    1    0        −3      −3    −4   −3       −1      −4    −6   −4       −3
SPL5     0     1    1        −4      −1    −2   0        −2      −1    −3   0        −4
SPL6     −1    0    −1       −6      −4    −7   −4       −6      −3    −7   −3       −10

¹LEG = model with Legendre polynomials; SPL4–6 = models with linear splines with 4 to 6 knots.

Decline in bulls' EBV after their first official genetic evaluation is currently one of the major concerns of the Canadian dairy industry. As given in Table 7, the SPL5 and SPL6 models gave a smaller average decline in 305-d EBV from EVAL01 to EVAL06 for bulls newly proven in EVAL01 than the LEG model. This was especially apparent in the second and third lactations, where the decline was 44% (fat in third lactation with SPL6) to 99% smaller than with LEG. On the other hand, significantly larger drops in EBV between the 2 genetic evaluations were observed with SPL4 than with LEG.

The SPL6 model was the most complex model and, as expected, took the longest time to converge (17 d). EVAL06 with this model was 5 and 7 d slower than EVAL06 with LEG and SPL5, respectively. In a routine genetic evaluation, 1) solutions from previous evaluations are used as starting values (this study used zeros as starting values), and 2) less strict convergence criteria are applied. In practice, therefore, the time differences between the compared models may be smaller than those found in the present study. The LEG model converged in the shortest total CPU time. Its convergence rate was better than that of SPL4, which had fewer parameters than LEG. The slower convergence of SPL compared with LEG was probably caused by higher correlations among knots than among the coefficients of Legendre polynomials.

Table 7. Average decline in EBV (305-d EBV for milk, fat, and protein and average daily EBV for SCS) from EVAL01 to EVAL06 for bulls newly proven in EVAL01¹

         Lactation 1                      Lactation 2                     Lactation 3
Model    Milk   Fat    Protein  SCS      Milk   Fat    Protein  SCS     Milk   Fat    Protein  SCS
LEG      −123   −1.0   −1.2     −0.007   −233   −3.2   −3.6     0.084   −223   −2.7   −3.1     0.050
SPL4     −145   −3.5   −4.4     −0.014   −149   −4.4   −3.4     0.026   −138   −5.8   −3.2     0.015
SPL5     −106   −0.5   −0.6     0.002    −94    0.3    0.8      0.035   −98    −1.1   0.4      0.013
SPL6     −103   −0.7   −0.8     0.001    −89    −0.2   0.8      0.035   −97    −1.6   0.3      0.022

¹LEG = model with Legendre polynomials; SPL4–6 = models with linear splines with 4 to 6 knots; EVAL01 = genetic evaluation using test-day (TD) records available until August 2001; EVAL06 = genetic evaluation using TD records available until August 2006.
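The average decline summarized in Table 7 follows the AD formula given in the Materials and Methods: for each bull newly proven in EVAL01 (97 bulls in this study), the EVAL06 EBV is shifted by the average change in EBV between evaluations, and the differences from the EVAL01 EBV are averaged. In this sketch the shift is passed in as a parameter, because how it was computed is not detailed here; the function name and the EBV values in the example are made up.

```python
from statistics import fmean


def average_decline(ebv01, ebv06, shift):
    """AD = sum_i (ebv01_i - (ebv06_i - shift)) / n over the same bulls.

    shift: average change in EBV from EVAL01 to EVAL06, subtracted from
    each EVAL06 EBV before comparison (treated here as a given input).
    """
    if len(ebv01) != len(ebv06):
        raise ValueError("EBV lists must refer to the same bulls")
    return fmean(a - (b - shift) for a, b in zip(ebv01, ebv06))


# Hypothetical 305-d milk EBV (kg) for 3 bulls; in the study n = 97.
print(average_decline([500, 300, 400], [470, 290, 380], shift=10))  # 30.0
```

Removing the general shift first means AD reflects changes specific to the newly proven bulls rather than the overall movement of the genetic base between evaluations.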
This study showed that spline functions with 6 knots are superior to LP of order 4 for fitting TD records with DIM up to 365 d in terms of goodness of fit and stability of 305-d EBV. Models with spline functions whose knots are located at DIM different from those used in this study could provide a better fit, but large changes in overall model performance are not expected with data in which test-day records are evenly distributed across lactation, because the differences will cancel out on average. Additional research is needed to identify the optimal number and placement of knots for both fixed and random regressions. A fixed regression with denser knots at the beginning and sparser knots in the middle and end of lactation could provide smoother curves than those in this study. Cubic splines or linear splines with a smoothing parameter can create smoother curves than plain linear splines (de Boor, 1978). However, these functions were not tested against the currently used LP, because their covariance matrix is highly structured and implementing them in the CTDM would not be straightforward. The use of different functions for the additive genetic and permanent environmental effects (spline functions for one random component and Legendre polynomials for the other, or splines for both components but with different knots) was not investigated. López-Romero et al. (2004) suggested that only functions of the same order of fit should be considered, to give both components an equal opportunity of variation. Models with LP of order higher than 4 were not considered. In a preliminary study (unpublished), such models showed extremely high additive genetic variances in early and late lactation and large oscillatory patterns in the middle of lactation, in agreement with the results of López-Romero et al. (2004).

CONCLUSIONS

The SPL6 model had the best goodness of fit, measured by PSB, RHO, and RV, among the 4 compared models. No significant differences in ERP for milk, fat, and protein were found among models in first lactation, but noticeably smaller ERP were obtained with SPL than with LEG in second and third lactation.
In the case of SCS, SPL performed better in terms of ERP in all 3 lactations. The most stable EBV were obtained with SPL6. The DIC criterion also identified the SPL6 model as the most plausible model. Additive genetic variances at the extremes of lactation were smaller with SPL than with LEG. SPL5 and SPL6 were better models than LEG, especially for SCS. In general, SPL6 had the best overall performance among all compared models. Based on the results of this study, the best model for genetic evaluation of TD records with ≤ 365 DIM is the RRM using linear splines with 6 knots.

ACKNOWLEDGMENTS

This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET: http://www.sharcnet.ca). Funding was provided by the DairyGen Council of Canadian Dairy Network and NSERC of Canada. Three anonymous reviewers are thanked for their helpful comments and suggestions.

REFERENCES

Ali, T. E., and L. R. Schaeffer. 1987. Accounting for covariances among test day milk yields in dairy cows. Can. J. Anim. Sci. 67:637–644.
de Boor, C. 1978. A Practical Guide to Splines. Springer-Verlag, New York, NY.
de Roos, A. P., A. G. Harbers, and G. de Jong. 2004. Random herd curves in a test-day model for milk, fat, and protein production of dairy cattle in the Netherlands. J. Dairy Sci. 87:2693–2701.
Druet, T., F. Jaffrezic, D. Boichard, and V. Ducrocq. 2003. Modeling lactation curves and estimation of genetic parameters for first lactation test-day records of French Holstein cows. J. Dairy Sci. 86:2480–2490.
Gengler, N., and G. R. Wiggans. 2001. Variance of effects of lactation stage within herd by herd yield. J. Dairy Sci. 84(Suppl. 1):216. (Abstr.)


Interbull. 2007. Interbull routine genetic evaluation for dairy production traits, August 2007. http://www-interbull.slu.se/eval/aug07.html Accessed Nov. 30, 2007.
Jamrozik, J., D. Gianola, and L. R. Schaeffer. 2001. Bayesian estimation of genetic parameters for test day records in dairy cattle using linear hierarchical models. Livest. Prod. Sci. 71:223–240.
Lidauer, M., I. Stranden, E. A. Mantysaari, J. Poso, and A. Kettunen. 1999. Solving large test-day models by iteration on data and preconditioned conjugate gradient. J. Dairy Sci. 82:2788–2796.
López-Romero, P., R. Rekaya, and M. J. Carabano. 2004. Bayesian comparison of test-day models under different assumptions of heterogeneity for the residual variance: The change point technique versus arbitrary intervals. J. Anim. Breed. Genet. 121:14–25.
Meyer, K. 2005. Random regression analyses using B-splines to model growth of Australian Angus cattle. Genet. Sel. Evol. 37:473–500.
Misztal, I. 2006. Properties of random regression models using linear splines. J. Anim. Breed. Genet. 123:74–80.
Pool, M. H., L. L. Janss, and T. H. Meuwissen. 2000. Genetic parameters of Legendre polynomials for first parity lactation curves. J. Dairy Sci. 83:2640–2649.
Schaeffer, L. R. 2004. Application of random regression models in animal breeding. Livest. Prod. Sci. 86:35–45.
Schaeffer, L. R., J. Jamrozik, G. J. Kistemaker, and B. J. Van Doormaal. 2000. Experience with a test-day model. J. Dairy Sci. 83:1135–1144.
Silvestre, A. M., F. Petim-Batista, and J. Colaco. 2005. Genetic parameter estimates of Portuguese dairy cows for milk, fat, and protein using a spline test-day model. J. Dairy Sci. 88:1225–1230.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde. 2002. Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B Stat. Methodol. 64:583–639.
Sullivan, P. G., J. W. Wilton, L. R. Schaeffer, G. J. Jansen, J. A. B. Robinson, and O. B. Allen. 2005. Genetic evaluation strategies for multiple traits and countries. Livest. Prod. Sci. 92:195–205.
White, I. M. S., R. Thompson, and S. Brotherstone. 1999. Genetic and environmental smoothing of lactation curves with cubic splines. J. Dairy Sci. 82:632–638.
Wilmink, J. B. M. 1987. Adjustment of lactation yield for age at calving in relation to level of production. Livest. Prod. Sci. 16:335–345.