
SPSS Syntax for Applying Rules for Combining Multivariate Estimates in Multiple Imputation

Joost R. van Ginkel
Leiden University
February 24, 2014

1 Files in this zip-file

• MI-mul2manual.pdf: this file.
• MI-mul2.sps: SPSS syntax file. This is a read-only file.
• runMI-mul2.sps: SPSS syntax file. This file may be modified to suit your needs (see below).
• incomplete2_imp.sav: A multiple-imputation data set in SPSS format containing the responses of 300 'respondents' to 41 items, denoted V1, ..., V41. The original incomplete data set is a simulated data set from a simulation study by Van Ginkel, Van der Ark, & Sijtsma (2007). The data file contains the original incomplete data set, plus five completed versions of the incomplete data set. The different versions are indicated by an additional variable imputation_, which contains the data set number (0 indicating the original incomplete data set). The five completed versions were created using multiple Two-Way imputation for separate scales (Van Ginkel, Van der Ark, & Sijtsma, 2007). Variables V1 to V40 have ordered answer categories ranging from 0 to 4; variable V41 is a dichotomous variable with values 1 and 2. Five percent of the scores are missing. Missing values are indicated by a comma.
• ExampleMultinomial.sav: An example SPSS data file containing the results of five multinomial logistic regression analyses resulting from five completed data sets.
• ExampleLogistic.sav: An example SPSS data file containing the results of five binary logistic regression analyses resulting from five completed data sets.
• ExampleOrdinal.sav: An example SPSS data file containing the results of five ordinal regression analyses resulting from five completed data sets.
• ExampleMixed.sav: An example SPSS data file containing the results of five regression analyses resulting from five completed data sets.
• ExampleMixedParameters.sav: An example SPSS data file containing regression coefficients of five multilevel analyses resulting from five completed data sets.
• ExampleMixedCovariances.sav: An example SPSS data file containing covariance matrices of the regression coefficients of five multilevel analyses resulting from five completed data sets.
• ExampleMixedRepeated.sav: A merged example SPSS data file containing the results of both ExampleMixedParameters.sav and ExampleMixedCovariances.sav.

2 About the SPSS Syntax

2.1 The Purpose

The file MI-mul2.sps is the second version of the syntax file MI-mul.sps. The SPSS syntax allows a researcher to draw inferences for multivariate estimates from a data set with missing values, where the missing values are estimated multiple times using multiple imputation (Rubin, 1987). When drawing inferences from multiple imputation, a distinction must be made between univariate estimates, for example, a regression coefficient of a continuous or dichotomous variable, and multivariate estimates, for example, multiple regression coefficients of one categorical variable with multiple categories. To test whether such a categorical variable has a significant effect, one overall test must be carried out which tests all coefficients of this variable against 0 simultaneously. Thus, for multivariate estimates, different rules for multiple imputation apply than for univariate estimates. The SPSS file MI-mul2.sps performs the calculations for combining the results of multivariate estimates. It should be noted that SPSS has a built-in procedure for combining the results of a multiple-imputation data set. However, this procedure is limited to univariate estimates.

The most important difference from the old file MI-mul.sps is that this new version is more user-friendly, and that it has been programmed such that the procedure is more compatible with the built-in procedures of SPSS. The old file MI-mul.sps can still be downloaded from the same link as MI-mul2.sps.

2.2 The Method

For multivariate estimates, the parameters from the statistical analysis are combined using the following procedure. Suppose $Q$ is a $k \times 1$ vector of parameter estimates (for example, a set of regression coefficients of one categorical variable), and we have $t = 1, \ldots, m$ completed versions of an incomplete data set. Let $\hat{Q}^{(t)}$ be the estimate of $Q$ in completed data set $t$, and let $U^{(t)}$ be the covariance matrix associated with $\hat{Q}^{(t)}$. For the $m$ completed data sets, the overall estimate of vector $Q$ is the mean of the $m$ vectors:

$$\bar{Q} = \frac{1}{m} \sum_{t=1}^{m} \hat{Q}^{(t)}. \qquad (1)$$

The covariance matrix associated with $\bar{Q}$ has two parts: a between-imputation part and a within-imputation part. The within-imputation covariance matrix is computed as the mean of the $m$ covariance matrices:

$$\bar{U} = \frac{1}{m} \sum_{t=1}^{m} U^{(t)}. \qquad (2)$$

The between-imputation covariance matrix is computed as

$$B = \frac{1}{m-1} \sum_{t=1}^{m} \left(\hat{Q}^{(t)} - \bar{Q}\right)\left(\hat{Q}^{(t)} - \bar{Q}\right)^{\top}. \qquad (3)$$

Define the relative increase in variance $r_1$ due to the missing data, averaged across the components of $Q$, as

$$r_1 = (1 + m^{-1})\,\mathrm{tr}(B \bar{U}^{-1})/k. \qquad (4)$$

The total covariance matrix is computed as

$$T = (1 + r_1)\,\bar{U}. \qquad (5)$$

The estimate $\bar{Q}$ is tested against parameter value $Q_0$ using a statistic which is computed as follows:

$$D_1 = (\bar{Q} - Q_0)^{\top} T^{-1} (\bar{Q} - Q_0)/k. \qquad (6)$$

The p-value for testing $Q = Q_0$ is

$$p = P(F_{k,\nu_1} \geq D_1). \qquad (7)$$

Define $t = k(m - 1)$ (from here on, $t$ no longer denotes the data set index). The number of degrees of freedom is

$$\nu_1 = 4 + (t - 4)\left[1 + (1 - 2t^{-1})\,r_1^{-1}\right]^2 \qquad (8)$$

for $t > 4$, and

$$\nu_1 = t\,(1 + k^{-1})(1 + r_1^{-1})^2/2 \qquad (9)$$

for $t \leq 4$.

The idea behind this procedure is that the overall test for estimate $\bar{Q}$ is corrected for the extra uncertainty caused by the missing data. For more information about multiple imputation, we refer to Schafer (1997) and Rubin (1987).
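As a small worked illustration of Equations 4, 8, and 9, consider hypothetical values that are not taken from the example data: $k = 2$ coefficients, $m = 5$ imputations, and $\mathrm{tr}(B\bar{U}^{-1}) = 5/6$. The second computation keeps $r_1$ fixed and only changes $m$ to 3, purely to show when Equation 9 applies.

```latex
\begin{align*}
r_1   &= \left(1 + \tfrac{1}{5}\right)\frac{5/6}{2} = 0.5, \\
t     &= k(m-1) = 2 \cdot 4 = 8 > 4
        \quad\Rightarrow\quad \text{Equation 8 applies}, \\
\nu_1 &= 4 + (8-4)\left[1 + \left(1 - \tfrac{2}{8}\right)\frac{1}{0.5}\right]^2
       = 4 + 4\,(2.5)^2 = 29. \\[1ex]
\text{With } m = 3:\quad
t     &= 2 \cdot 2 = 4 \leq 4
        \quad\Rightarrow\quad \text{Equation 9 applies}, \\
\nu_1 &= 4\left(1 + \tfrac{1}{2}\right)\left(1 + \tfrac{1}{0.5}\right)^2 / 2
       = 4 \cdot 1.5 \cdot 9 / 2 = 27.
\end{align*}
```

In both cases the statistic $D_1$ would then be referred to an $F$ distribution with $k = 2$ and $\nu_1$ degrees of freedom.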

2.3 Disclaimer and Bugs

It should be emphasized that this SPSS syntax is distributed without any warranty on the part of the author. Although the SPSS syntax has been tested thoroughly, one can never fully exclude the possibility of errors. The author appreciates suggestions and reports of detected errors (please enclose SPSS data file). All correspondence can be sent to Joost R. van Ginkel, Leiden University, Faculty of Social and Behavioural Sciences, PO Box 9555, 2300 RB Leiden, The Netherlands [email protected]

3 Using the SPSS Syntax

3.1 Preparing your SPSS File with Parameter Estimates

The use of the syntax file MI-mul2.sps will be illustrated with an incomplete data set named incomplete2.sav. This data set is not included in this zip-file, but its data can be found at the top of the data sheet of incomplete2_imp.sav (the data set with variable imputation_ = 0). The analysis examples in this manual include variables V41 and V17, and the sum scores of items V1 to V10, items V11 to V20, items V21 to V30, and items V31 to V40. Variable V41 is a dichotomous background variable, variable V17 is an item with five answer categories which is chosen only for illustrative purposes, and variables V1 to V40 are items from a 40-item questionnaire, in which each consecutive set of ten items forms a subscale.

Because the data are incomplete, and we want to handle the missing data using multiple imputation, we must perform the analyses in several steps. Before performing statistical analyses, we must impute the data five times (see file incomplete2_imp.sav). Thus, we get five plausible complete versions of the incomplete data set. We compute the sum scores of the items that together form a subscale. This can either be done using the menu (task bar: Transform, Compute) or using the syntax:

    COMPUTE score1 = sum(V1 to V10).
    COMPUTE score2 = sum(V11 to V20).
    COMPUTE score3 = sum(V21 to V30).
    COMPUTE score4 = sum(V31 to V40).
    EXECUTE.

Now that we have computed the sum scores, we can perform statistical analyses for each of the completed data sets separately, and combine the results into one overall analysis. The steps that need to be taken are described below.

3.1.1 Split File

In order to combine the results of a statistical analysis carried out on multiply imputed data sets, we must first ensure that the analysis in question is carried out for each imputed data set separately, rather than for the total multiply imputed data set at once. We use the SPSS 'Split File' option for this purpose (task bar: Data, Split File). We choose the variable imputation_ as grouping variable, and click OK. See Figures 1 and 2. If we now perform statistical analyses, they will be carried out for each completed data set separately.

3.1.2 The OMS Option

To obtain a data file that contains the results of the separate analyses, a number of steps are taken:

1. We select the OMS option in SPSS (task bar: Utilities, OMS Control Panel). See Figure 3.
2. Within OMS we select Tables in the Output Types menu.
3. The type of analysis we want to perform is selected. Figure 4 shows an example of Mixed Models, which can be used for various linear models such as linear regression analysis, analysis of variance, or multilevel analysis. Thus, we select Mixed in the Command Identifiers menu.
4. The types of estimates from the output that we want to write to a data file are specified. As already noted in Section 2.2, we need a set of parameter estimates and a covariance matrix for combining the results of statistical analyses. To this end, we first select Covariance Matrix in the Table Subtypes for Selected Commands menu (see Figure 4). Next, we scroll down to Parameter Estimates in the Table Subtypes for Selected Commands menu. We click on Parameter Estimates while holding the Ctrl-key. Note that for other types of analyses the parameter estimates and covariance matrices may be found under different names. Examples of other analyses are given in Section 4.
5. We choose the option that will write the results to a data file (Output Destinations menu, File). By clicking Options and choosing SPSS Statistics Data File in the Format menu, the destination file will be an SPSS data file.
6. A name for the data file is specified (here, Example.sav). See Figure 5.
7. Finally, we click Add and click OK twice. All subsequent analyses that are carried out using Mixed Models are now written to a file named Example.sav until the specified OMS request is ended.

Figure 1: Choosing the Split File Option in SPSS.

Figure 2: Choosing imputation_ as Grouping Variable.

Figure 3: Choosing the OMS Option.

Figure 4: Choosing Options within OMS.

3.1.3 Carrying out the Analysis

After the OMS command is specified, a statistical analysis is carried out and its results are written to a file. This requires the following steps:

1. Carry out the analysis of interest. To continue the earlier example in Section 3.1.2, an analysis in Mixed Models is performed (task bar: Analyze, Mixed Models, Linear). Note that for other examples other analyses must be selected. Because of the split file option (see Section 3.1.1) the analysis is carried out for each of the imputed data sets separately.
2. After carrying out the analysis of interest we return to the OMS option (task bar: Utilities, OMS Control Panel). We click on the request that was created earlier, we click End and click OK twice. The results of the five separate analyses are now written to a new data file. Figures 6 and 7 show what a file with parameters and covariance matrices could look like. Figure 6 shows part of the parameter estimates in the file (variable Estimate). When scrolling down and scrolling to the right, the variables representing the covariance matrices become visible (Figure 7).

Figure 5: Choosing the Destination File within OMS.

Figure 6: Example of a File with Parameter Estimates (Showing the Parameter Estimates).

Figure 7: Example of a File with Parameter Estimates (Showing the Covariance Matrices).
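The menu steps of Sections 3.1.1 to 3.1.3 can also be written as syntax. The following is a minimal sketch and not part of the distributed files. It assumes that the split variable is named imputation_, that the results go to C:\DataFiles\Example.sav, and that the model has score1 as the dependent variable with V41 and V17 as factors; the dependent variable and the order of the effects are assumptions and should be adapted to the analysis at hand. LAYERED corresponds to the 'Compare groups' choice in the Split File dialog, which is likewise an assumption here.

```spss
* Sketch only; the variable names, file path, and model below are assumptions.
SORT CASES BY imputation_.
SPLIT FILE LAYERED BY imputation_.

* Route the two MIXED tables needed for pooling to an SPSS data file.
OMS
  /SELECT TABLES
  /IF COMMANDS=['Mixed'] SUBTYPES=['Covariance Matrix' 'Parameter Estimates']
  /DESTINATION FORMAT=SAV OUTFILE='C:\DataFiles\Example.sav'.

* SOLUTION prints the parameter estimates and COVB their covariance matrix.
MIXED score1 BY V41 V17
  /FIXED = V41 V17 V41*V17
  /PRINT = SOLUTION COVB.

* End the OMS request so that the data file is written.
OMSEND.
```

Note that OMSEND without further specification ends all active OMS requests, which is adequate when only one request is open.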

3.2 Syntax Options

After saving the results, we open the syntax file runMI-mul2.sps (task bar: File, Open, Syntax). The SPSS syntax file looks like

    INCLUDE '{path}MI-mul2.sps'.
    RULESMIMUL FILE = '{path + filename}'
     /ESTIMATE = {estimates}
     /COV = {covariance matrices}
     /LEVELSIND = {number of levels independent variables}
     /M = {number of completed data sets}.

This file calls the file MI-mul2.sps, which performs the actual computations. The file runMI-mul2.sps may be modified to suit your needs. The options that may be specified are described in the next sections.

3.2.1 Specifying the File Names

In the first two lines the paths of the file MI-mul2.sps and of the file with the estimates have to be specified. For example, if MI-mul2.sps is located in C:\Program Files\SPSS\MI-mul2.sps and the file with the estimates has been saved to C:\DataFiles\Example.sav, the first two lines must be changed into:

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\Example.sav'

3.2.2 Specifying the Parameter Estimates and Covariance Matrices

After specifying the file names, the next two lines

     /ESTIMATE = {estimates}
     /COV = {covariance matrices}

are modified to suit your needs. The SPSS data file that includes the parameter estimates and the covariance matrices of the parameter estimates is usually built up as follows: the parameter estimates (model coefficients) are represented by one variable in the SPSS file, and the covariance matrices are represented by a number of variables. The names of the variables representing the parameter estimates and covariance matrices depend on the type of analysis [Mixed Models, (multinomial) logistic regression, ordinal regression]. Suppose the variable that contains the parameter estimates is called Estimate, and the variables that contain the covariance matrices are variables Intercept to V174V412. Lines 3 and 4 then become

     /ESTIMATE = Estimate
     /COV = Intercept to V174V412

3.2.3 Specifying the Number of Levels of the Independent Variables

In the next line, the number of levels of each independent variable is specified. Suppose we have one independent variable with 2 levels, one independent variable with 5 levels, and the interaction between both variables with 2 × 5 = 10 levels. This is specified as

     /LEVELSIND = 2,5,10

A continuous variable is specified as a variable having one level. The number of levels specified for a variable is actually the total number of parameters of this variable in the model, including the redundant ones. Whenever the OMS option writes the parameters and the covariance matrix of the parameters to a file, the variable containing the parameters (Figure 6) and the variables that together form the covariance matrices (Figure 7) also include the redundant parameters, indicated by a 0. Because a continuous variable has only one parameter and no redundant ones, it is specified as a variable with one level. Note that the levels must be specified in the same order as the order of the effects in the file.

3.2.4 Specifying the Number of Levels of the Dependent Variable (Optional)

In some analyses, such as multinomial logistic regression, the dependent variable is categorical. When combining the results of such analyses, the number of levels of the dependent variable has to be specified as well. This is done with the optional subcommand /LEVELSDEP, which has to be added as an additional line that is not included in the runMI-mul2.sps file. Unlike for the /LEVELSIND subcommand (see Section 3.2.3), redundant parameters should not be counted here. Thus, if the dependent variable has 5 categories, this is specified as:

     /LEVELSDEP = 4

(the number of categories minus the reference category). The default value is 1, which can be used when the dependent variable is dichotomous or continuous.

3.2.5 Specifying the Number of Levels of the Intercept (Optional)

Another type of analysis with a categorical dependent variable is ordinal regression. In these models the dependent variable is ordinal and has several levels, and the model has several intercepts. The number of intercepts equals the number of categories of the dependent variable minus 1 (the reference category). Suppose the dependent variable has 5 categories. This is specified in an additional line, not included in the runMI-mul2.sps file:

     /LEVELSINT = 4

By default, the value of the /LEVELSINT subcommand equals 1. This default value may be used when the dependent variable is either dichotomous or continuous.

3.2.6 Adjusted Number of Degrees of Freedom (Optional)

By default, the syntax file computes combined F-tests with error degrees of freedom using Equations 8 and 9. Using this approximation, the number of degrees of freedom may sometimes exceed the number of error degrees of freedom of the separate analyses of the five imputed data sets. This is because the calculation of the number of degrees of freedom in Equations 8 and 9 is based on the assumption that the sample size is sufficiently large for the asymptotic normal approximation (Schafer, 1997, p. 108). However, for smaller samples, we need an adjusted number of degrees of freedom smaller than the number of degrees of freedom of the five separate analyses. If we want to use an adjusted number of error degrees of freedom, we should specify this in the syntax. In the example of Section 3.1.3 the variable df (Figure 6) contains the number of degrees of freedom of each effect. By adding the following line to the syntax:

     /DF = df

the analyses are performed with an adjusted number of degrees of freedom smaller than the number of error degrees of freedom of each imputed data set separately (290 in the example). This loss of degrees of freedom compared to 290 represents the extra uncertainty caused by the missing data. The formula for the adjusted number of degrees of freedom is lengthy and is not given here; the interested reader is referred to Reiter (2007).

Finally, it may be noted that by default the number of degrees of freedom is approximated using Equations 8 and 9, but this can also be specified explicitly by means of:

     /DF = 0

where the zero indicates that the standard approximation is used.

3.2.7 Specifying the Number of Imputations

The number of imputations needs to be specified as well. Suppose the data were imputed 5 times. The line:

     /M = {number of completed data sets}.

then has to be changed into:

     /M = 5.
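Putting Sections 3.2.1 to 3.2.7 together, the edited runMI-mul2.sps for the running example might look as follows. This is a sketch assembled from the example values used in those sections (the paths of Section 3.2.1, the variable names of Section 3.2.2, the levels of Section 3.2.3, and five imputations); whether these values match a particular results file has to be checked against that file.

```spss
* Sketch assembled from the example values in Sections 3.2.1 to 3.2.7.
INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
RULESMIMUL FILE = 'C:\DataFiles\Example.sav'
 /ESTIMATE = Estimate
 /COV = Intercept to V174V412
 /LEVELSIND = 2,5,10
 /M = 5.
```

The optional subcommands of Sections 3.2.4 to 3.2.6 are left out here, so their default values apply. In the examples of Section 4 they are placed before the /M line.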

4 Examples

In the next sections we show specific examples of analyses that are combined using the MI-mul2.sps file. For each example we use the completed data file incomplete2_imp.sav.

4.1 Multinomial Logistic Regression

In this example of multinomial logistic regression we take variables V41, score1 (Section 3.1), and the interaction of score1 × V41 as independent variables, and V17 as the dependent variable. The following steps are taken:

1. A split file is carried out as described in Section 3.1.1.
2. We set up an OMS request as described in Section 3.1.2. However, in step 3 we select Nominal Regression in the Command Identifiers menu. In the Table Subtypes for Selected Commands menu (step 4) we select Asymptotic Covariance Matrix and Parameter Estimates.
3. A full factorial multinomial logistic regression is carried out (task bar: Analyze, Regression, Multinomial Logistic) for the five completed data sets. The parameter estimates $\hat{Q}^{(t)}$ and their covariance matrices have to be displayed in the output as well. How this is done is shown in Figure 8.
4. We end the OMS command (Section 3.1.3, step 2).
5. File ExampleMultinomial.sav contains the results of these analyses. To combine the results of these analyses into one final result, we proceed as follows. The variable that contains the parameter estimates is B, and the variables that contain the covariance matrices are @0_Intercept to @3_V412score1. Assuming the files are located in the same directories as in Section 3.2.1, lines 1 to 4 become

        INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
        RULESMIMUL FILE = 'C:\DataFiles\ExampleMultinomial.sav'
         /ESTIMATE = B
         /COV = @0_Intercept to @3_V412score1

   Furthermore, a continuous variable (here, variable score1) always has one level. Thus, line 5 becomes

         /LEVELSIND = 1,2,2

   (two levels for both variable V41 and the interaction score1 × V41). An additional line is added to the syntax, in which the number of levels of the dependent variable is specified (see Section 3.2.4). Thus, the final syntax looks like this:

        INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
        RULESMIMUL FILE = 'C:\DataFiles\ExampleMultinomial.sav'
         /ESTIMATE = B
         /COV = @0_Intercept to @3_V412score1
         /LEVELSIND = 1,2,2
         /LEVELSDEP = 4
         /M = 5.

6. Run the syntax file (task bar: Run, All) and the combined results of the five separate analyses will appear in the output.

Figure 8: Selecting the Parameters and the Covariance Matrix in Multinomial Logistic.

Furthermore, it is important to note that in multinomial logistic regression the degrees of freedom must not be specified. The example data file does have a variable that contains degrees of freedom, but those are the degrees of freedom of the effects, whereas the adjustment in multiple imputation concerns the error degrees of freedom. Standard statistical tests for multinomial logistic regression do not have error degrees of freedom.
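Steps 1 to 4 of this example can also be run from syntax. The sketch below is an illustration under stated assumptions rather than the commands used to create ExampleMultinomial.sav: the split variable name imputation_, the output path, and the order of the effects in /MODEL (chosen here to match /LEVELSIND = 1,2,2) are assumptions, and the reference category of V17 is left at the NOMREG default.

```spss
* Sketch only; names, path, and effect order are assumptions.
SORT CASES BY imputation_.
SPLIT FILE LAYERED BY imputation_.

OMS
  /SELECT TABLES
  /IF COMMANDS=['Nominal Regression']
      SUBTYPES=['Asymptotic Covariance Matrix' 'Parameter Estimates']
  /DESTINATION FORMAT=SAV OUTFILE='C:\DataFiles\ExampleMultinomial.sav'.

* COVB requests the asymptotic covariance matrix of the parameter estimates.
NOMREG V17 BY V41 WITH score1
  /MODEL = score1 V41 V41*score1
  /PRINT = PARAMETER COVB.

OMSEND.
```

After running such commands, it is worth opening the resulting file to confirm that the order of the effects agrees with the order given in /LEVELSIND.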

4.2 Binary Logistic Regression

For binary logistic regression we cannot use the standard option in SPSS (task bar: Analyze, Regression, Binary Logistic) because this option does not provide an asymptotic covariance matrix of the parameter estimates, which we need for combining the results. Instead, we use multinomial logistic regression for this purpose too. By using a binary outcome variable and specifying the first answer category as the reference category, the resulting analysis is equivalent to a binary logistic regression. Thus, we can proceed in the same way as in the multinomial logistic regression example in Section 4.1. File ExampleLogistic.sav contains the results of five logistic regression analyses applied to the five completed data sets. The independent variables of these analyses are score1, V17, and the interaction of score1 × V17. The dependent variable is V41. The syntax that combines the analyses into one result looks like this:

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\ExampleLogistic.sav'
     /ESTIMATE = B
     /COV = @0_Intercept to @3_V412score1
     /LEVELSIND = 1,5,5
     /LEVELSDEP = 1
     /M = 5.

Figure 9: Selecting the Parameters and the Covariance Matrix in Ordinal.

Run the syntax file (task bar: Run, All) and the combined results of the five separate analyses will appear in the output. It may be noted that the penultimate line (/LEVELSDEP = 1) can also be omitted because the default number of levels of the dependent variable (minus the reference category) is 1 (see Section 3.2.4).
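The analysis step described in this section can be sketched as a NOMREG command with the first category of V41 as the reference category. This is only an illustration: the order of the effects in /MODEL is an assumption, and the command is meant to be run with the split file of Section 3.1.1 active, after an OMS request for Nominal Regression has been set up (Section 3.1.2) and before that request is ended.

```spss
* Sketch only; the effect order is an assumption.
* BASE=FIRST makes the first category of V41 the reference category.
NOMREG V41 (BASE=FIRST ORDER=ASCENDING) BY V17 WITH score1
  /MODEL = score1 V17 V17*score1
  /PRINT = PARAMETER COVB.
```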

4.3 Ordinal Regression

As a third example, we will show how the results of ordinal regression are combined. Again we carry out an analysis with score1, V41, and the interaction of score1 × V41 as the independent variables, and V17 as the dependent variable, only now we assume that variable V17 is ordinal rather than nominal (which makes sense because variable V17 is supposed to be a rating-scale item). The OMS options for this analysis are PLUM in the Command Identifiers menu, and Asymptotic Covariance Matrix and Parameter Estimates in the Table Subtypes for Selected Commands menu (see Section 3.1.2, step 4). Next, we carry out a full factorial ordinal regression (task bar: Analyze, Regression, Ordinal) for the five completed data sets. The parameter estimates and their covariance matrices have to be displayed in the output as well. See Figure 9 for how this is done. We end the OMS request. File ExampleOrdinal.sav contains the results of these analyses.

Again we assume that MI-mul2.sps and ExampleOrdinal.sav are located in the same directories as specified in Section 3.2.1. As pointed out in Section 3.2.5, ordinal regression models have several intercepts, the number of which must be specified with the optional /LEVELSINT subcommand. The complete syntax file for combining the results of the ordinal regressions is given by:

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\ExampleOrdinal.sav'
     /ESTIMATE = Estimate
     /COV = V170 to V412score1
     /LEVELSINT = 4
     /LEVELSIND = 1,2,2
     /M = 5.

When we run the syntax (task bar: Run, All) the combined results of the five separate ordinal regressions will appear in the output.
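The OMS request and the ordinal regression of this section can be sketched in syntax as follows. The sketch assumes that the split file of Section 3.1.1 is already active; the output path, the logit link, and the order of the effects in /LOCATION (chosen to match /LEVELSIND = 1,2,2) are assumptions rather than settings documented in this manual.

```spss
* Sketch only; path, link function, and effect order are assumptions.
OMS
  /SELECT TABLES
  /IF COMMANDS=['PLUM']
      SUBTYPES=['Asymptotic Covariance Matrix' 'Parameter Estimates']
  /DESTINATION FORMAT=SAV OUTFILE='C:\DataFiles\ExampleOrdinal.sav'.

* COVB requests the asymptotic covariance matrix of the parameter estimates.
PLUM V17 BY V41 WITH score1
  /LINK = LOGIT
  /LOCATION = score1 V41 V41*score1
  /PRINT = PARAMETER COVB.

OMSEND.
```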

4.4 Analysis of Variance

In the following two sections we will discuss two examples of analysis of variance (ANOVA). Combining the results of ANOVA is less straightforward than combining the analyses that were shown in the previous examples, because ANOVA models use a different parameterization than (logistic) regression models. Van Ginkel and Kroonenberg (2014) showed that the results of ANOVA on multiply imputed data sets may be combined by carrying out the ANOVA as a regression analysis. However, rather than using the standard dummy coding for the independent variables as in most regression models, we have to use effect coding (Edwards, 1985, pp. 146-150). Unfortunately, effect coding is not implemented in SPSS, so we have to recode the predictors of the ANOVA ourselves. First we will discuss an example of ANOVA with independent measures. Second, an example of repeated measures ANOVA is discussed.

4.4.1 Analysis of Variance with Independent Measures

As stated earlier, combining the results of ANOVA requires the use of effect coding rather than dummy coding. Suppose a categorical variable has C categories. Effect coding splits this variable into k = C − 1 variables (one for each category, minus the reference category), where the variable for category c has a value of 1 if the respondent belongs to category c, −1 if (s)he belongs to the reference category, and 0 otherwise. Below we describe the steps needed for combining the results of ANOVA, using variables V17, V41 and the interaction of V17 × V41 as predictors, and score1 as the dependent variable.

1. We create effect coded variables using the menu (task bar: Transform, Compute/Recode) or using the syntax:

        RECODE V41 (1=-1) (2=1) INTO V41EC.
        RECODE V17 (0=1) (4=-1) (ELSE=0) INTO V17EC0.
        RECODE V17 (1=1) (4=-1) (ELSE=0) INTO V17EC1.
        RECODE V17 (2=1) (4=-1) (ELSE=0) INTO V17EC2.
        RECODE V17 (3=1) (4=-1) (ELSE=0) INTO V17EC3.
        COMPUTE V17V41EC0 = V41EC*V17EC0.
        COMPUTE V17V41EC1 = V41EC*V17EC1.
        COMPUTE V17V41EC2 = V41EC*V17EC2.
        COMPUTE V17V41EC3 = V41EC*V17EC3.
        EXECUTE.

   Here, we arbitrarily chose category 1 as the reference category of V41 and category 5 (value 4) as the reference category of V17. Other choices for a reference category will lead to the exact same results. The results of the above syntax commands are displayed in Figure 10.
2. A split file as described in Section 3.1.1 is carried out.
3. We set up an OMS request as described in Section 3.1.2.
4. We carry out the regression analysis with the effect coded variables.
   • The covariance matrix of the parameter estimates is not provided in the standard regression or ANOVA procedure in SPSS. Therefore we must turn to the procedure Mixed Models (see Section 3.1.3, step 1). This procedure performs multilevel analysis but may also be used for simple regression models.
   • Since we do not have repeated measures, we click Continue without specifying any variables for Subjects and for Repeated.
   • After clicking Continue, the effect coded variables are entered as covariates (not as factors) and variable score1 is entered as the dependent variable.


Figure 10: Effect Coded Variables.

   • We specify all effect coded variables as fixed effects (button Fixed). Use only main effects because the interaction between variables V17 and V41 is already represented by the product variables. See Figure 11.
   • Once the fixed effects have been specified, we click Continue. We click on Statistics, select Parameter estimates and Covariances of parameter estimates, and click Continue. See Figure 12.
   • When clicking OK the analysis is run.
5. We end the OMS command (see Section 3.1.3, step 2).
6. File ExampleMixed.sav contains the results of these analyses. Variable Estimate contains the regression coefficients and variables Intercept to V17V41EC3 contain the covariance matrices.
   • Again we assume that the files are located in the same directories as in Section 3.2.1. Thus, lines 1 to 4 of the file runMI-mul2.sps become

        INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
        RULESMIMUL FILE = 'C:\DataFiles\ExampleMixed.sav'
         /ESTIMATE = Estimate
         /COV = Intercept to V17V41EC3

Figure 11: Specifying the Fixed Effects in Mixed Models.

Figure 12: Selecting the Parameters and the Covariances of the Parameters in Mixed Models.

   • A difference between ANOVA and the other examples is that, because of the use of manually created effect coded variables, redundant parameters are not displayed in variable Estimate and variables Intercept to V17V41EC3. This has consequences for the way we must proceed. Unlike in the other examples, the number of levels specified for each variable in the /LEVELSIND subcommand should not include the redundant categories. Since the independent variable V41 has two categories, there is only one nonredundant category for this variable. Variable V17 has 5 categories and hence 4 nonredundant categories. Finally, the interaction V41 × V17 has 1 × 4 = 4 nonredundant categories. Thus, line 5 of syntax file runMI-mul2.sps must be changed into:

         /LEVELSIND = 1,4,4

   • Furthermore, unlike Wald tests in ordinal or (multinomial) logistic regression, F-tests in ANOVA have error degrees of freedom, which have to be adjusted downwards using the /DF subcommand (see Section 3.2.6).
   • Finally, we specify the number of completed data sets, so the complete syntax looks like this:

        INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
        RULESMIMUL FILE = 'C:\DataFiles\ExampleMixed.sav'
         /ESTIMATE = Estimate
         /COV = Intercept to V17V41EC3
         /LEVELSIND = 1,4,4
         /DF = df
         /M = 5.

7. Run the syntax file (task bar: Run, All) and the combined ANOVA results will appear in the output.

4.4.2 Repeated Measures Analysis of Variance

Finally, we show an example of repeated measures ANOVA. Repeated measures ANOVA is the most difficult analysis to combine using the syntax. Besides the difficulties of standard ANOVA (using Mixed Models rather than the standard General Linear Model procedure, and using effect coding instead of dummy coding), repeated measures ANOVA has two additional difficulties. Firstly, applying Mixed Models to repeated-measures data requires the data to be stored as a long file. In other words, the repeated measures have to be represented by different records rather than different variables. Secondly, using the OMS option as described in the previous examples will fail when used for repeated measures ANOVA.

For this example, we use the total scores of the subscales, score1 to score4 (see Section 3.1), as the levels of a within-subjects factor. We carry out a repeated measures ANOVA with subscale as a within-subjects factor, and variable V41 as a between-subjects factor. The steps that are needed to combine the results of repeated measures ANOVA, together with their difficulties, are described below.

1. Firstly, the data are restructured such that the four test scores are represented as one variable, which we will call score, and the repeated measures are represented by different records. To this end the following steps are taken:
   • We use the "Restructure" option (Data, Restructure). Because we want the four test scores of each respondent to be separate records, we choose Restructure selected variables into cases and click Next.
   • Next, we choose the number of variables we want to restructure. Since we want the four test scores to form one variable, we choose One and click Next.
   • After clicking Next we have to choose the variable that represents a case (Case Group Identification). Since we have no variable containing case numbers, we choose Use case number. When choosing this option, SPSS will create a variable id (optionally, you may change the variable name) in the restructured file, of which the value equals the case number in the original file. In this example, we call the variable to be transposed score (other names are possible as well). We enter score1 to score4 as the variables to be transposed. Because we need variable V41 for the analysis and variable imputation_ for the split file option, we choose these variables as fixed variables. These variables will be maintained in the restructured file. Once the above steps have been taken, we click Next.
   • We choose the number of index variables. Here we choose one, namely one variable that contains the subscale number.
   • After clicking Next we have to choose a name for the index variable. In this example we name the index variable subscale, and click Next twice.


Figure 13: Restructured Data Set.

• Finally, we finish the restructuring by clicking Finish. The data have now been restructured into a file that can be used in Mixed Models. See Figure 13.

2. We now carry out a split file as described in Section 3.1.1.

3. We need to create effect-coded variables for the between-subjects factor V41 and the within-subjects factor Subscale. This may be done as follows, using either the menu (task bar: Transform, Recode into different variables/Compute) or the syntax:

* Effect coding of V41: category 2 is coded 1, category 1 is coded -1.
RECODE V41 (2=1) (1=-1) INTO V41EC.
* Effect coding of Subscale: category 4 is coded -1 on all three variables.
RECODE Subscale (1=1) (4=-1) (ELSE=0) INTO SubscaleEC1.
RECODE Subscale (2=1) (4=-1) (ELSE=0) INTO SubscaleEC2.
RECODE Subscale (3=1) (4=-1) (ELSE=0) INTO SubscaleEC3.
* Interaction terms: products of the effect-coded variables.
COMPUTE V41Subscale1EC = V41EC*SubscaleEC1.
COMPUTE V41Subscale2EC = V41EC*SubscaleEC2.
COMPUTE V41Subscale3EC = V41EC*SubscaleEC3.
EXECUTE.

4. Before carrying out the analyses we must first perform an OMS command. Unfortunately, for repeated measures ANOVA this is much more complicated than for other analyses. One property of repeated measures ANOVA is that it contains one or more random effects. When a model contains random effects, writing the results as described in Section 3.1.2 will fail. Writing the parameter estimates to a data file is not a problem, but writing the covariance matrices to the same data set is. Therefore, we must turn to a different strategy.
• We carry out steps 1 to 3 from Section 3.1.2.
• In step 4 from Section 3.1.2 we only select Parameter Estimates in the Table Subtypes for Selected Commands menu.
• Next, we save the covariance matrices of the parameter estimates to a second file. To this end we again carry out steps 1 to 3 from Section 3.1.2.
• Now in step 4 we only select Covariance Matrix in the Table Subtypes for Selected Commands menu.
• After this has been done, we perform step 5 with one modification: after clicking on Options, we must click on All dimensions in a single row (below Table Pivots). See Figure 14. Writing the covariance matrices to a file in the presence of random effects is only possible when all dimensions are stored in one single row.
• After this, we carry out steps 6 to 7 from Section 3.1.2. Note that in step 6 we have to specify a new file name so that the file with parameter estimates will not be overwritten. The covariance matrices of the parameter estimates are now written to a new file.

5. We can now carry out the analysis:
• As in ANOVA with independent measures, use Mixed Models (see Section 3.1.3, step 1).
• Since we have repeated measures, we have to specify the variable containing the case numbers right after entering the Mixed Models procedure. In our example this is variable id; see Figure 15. Once this has been done, we click Continue.
• Again, the effect-coded variables are entered as covariates and variable score is entered as the dependent variable.
• We specify all effect-coded variables as fixed effects (Fixed). Use only main effects, because the interaction between V41 and subscale is already represented by the effect-coded product variables. See Figure 16. Click Continue.


Figure 14: Choosing the option All dimensions in a single row.

• When we have repeated measures, we have to specify random effects as well (Random). With one within-subjects factor we have to include the intercept and select id as the subjects variable. See Figure 17. After that we click Continue.
• Finally, we click on Statistics, select Parameter estimates and Covariances of parameter estimates, and click Continue. See Figure 12. We are now ready to run the analysis (OK).

6. We end the OMS commands. Note that there are two OMS commands here which need to be ended, one for the parameters and one for the covariance matrix of the parameters. Thus, we have to carry out step 2 in Section 3.1.3 two times.

7. We open the file that contains the covariance matrices (example file ExampleMixedCovariances.sav). The file looks as in Figure 18. The file consists of two rows. We only need the first row, which contains the covariances of the parameter estimates of the fixed effects, so we delete the second row. This may be done either manually or by using the menu (task bar: Data, Select Cases).
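Steps 1, 2 and 4 to 6 above can also be written as syntax instead of being carried out through the menus. The following is a minimal sketch under several assumptions, not the syntax that the menus are guaranteed to paste: the data set indicator is assumed to be named Imputation_, the file paths are placeholders, the OMS subtype labels are copied from the menu names used above and may differ between SPSS versions, /COLUMNS SEQUENCE=[RALL CALL LALL] is assumed to be the syntax counterpart of All dimensions in a single row, and REML estimation with a variance components structure for the random intercept is assumed.

```spss
* Step 1: restructure the four subscale scores into one record per subscale.
* Imputation_ is an assumed name for the data set indicator variable.
VARSTOCASES
  /ID=id
  /MAKE score FROM score1 score2 score3 score4
  /INDEX=subscale(4)
  /KEEP=V41 Imputation_
  /NULL=KEEP.

* Step 2: split the file by data set.
SORT CASES BY Imputation_.
SPLIT FILE LAYERED BY Imputation_.

* Step 3: run the RECODE and COMPUTE commands for effect coding here.

* Step 4: one OMS request per output file (file paths are placeholders).
OMS
  /SELECT TABLES
  /IF COMMANDS=['Mixed'] SUBTYPES=['Parameter Estimates']
  /DESTINATION FORMAT=SAV OUTFILE='C:\DataFiles\ExampleMixedParameters.sav'.
OMS
  /SELECT TABLES
  /IF COMMANDS=['Mixed'] SUBTYPES=['Covariance Matrix']
  /DESTINATION FORMAT=SAV OUTFILE='C:\DataFiles\ExampleMixedCovariances.sav'
  /COLUMNS SEQUENCE=[RALL CALL LALL].

* Step 5: random-intercept model with the effect-coded variables as covariates.
MIXED score WITH V41EC SubscaleEC1 SubscaleEC2 SubscaleEC3
    V41Subscale1EC V41Subscale2EC V41Subscale3EC
  /FIXED=V41EC SubscaleEC1 SubscaleEC2 SubscaleEC3
    V41Subscale1EC V41Subscale2EC V41Subscale3EC | SSTYPE(3)
  /METHOD=REML
  /PRINT=COVB SOLUTION
  /RANDOM=INTERCEPT | SUBJECT(id) COVTYPE(VC).

* Step 6: end both OMS requests.
OMSEND.
```

An OMSEND command without further specifications ends all active OMS requests, so a single command covers both output files. If the resulting files do not look like Figure 18, the menu route described above remains the reference.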

Figure 15: Specifying the Variable that Contains the Case Numbers.

Figure 16: Specifying the Fixed Effects in Mixed Models.


Figure 17: Specifying the Random Effects in Mixed Models.

Figure 18: File Containing the Covariance Matrices of the Repeated Measures ANOVA.


8. What follows next is the most difficult part of the procedure. Suppose there are p parameters and m imputed data sets. The file needs to be restructured such that the covariance matrices are represented by p variables containing the covariance matrices of the m imputed data sets, plus those of the original incomplete data set. The following steps must be taken:
• We use the Restructure option (see also step 1 of this section). When entering the Restructure option, we have to specify the number of variables, which is the number of parameters in the repeated-measures ANOVA model (eight in this example). After this has been specified, we click Next.
• Next we have to enter the variables containing the covariances of the intercept for each imputed data set. This requires quite some searching in the variable list. The search is easier when the variable list displays variable names rather than variable labels (right-click on one of the names in the variable list and select Display Variable Names). The variable names that appear all have the same structure, namely: @{imputation number} {parameter1} {parameter2} or, in case of the original data: Original {parameter1} {parameter2}. The Target Variable list contains the names of the 8 variables that will appear in the new file. By default, they are called trans1 to trans8. For variable trans1 we enter all variables for which {parameter1} equals Intercept, for both the original data and all imputed data sets. These variables contain the variances of the intercept and the covariances of the intercept with the other parameters. The resulting variable in the restructured file will consequently include all variances and covariances of the intercept. Before proceeding, we assign a useful name to this variable, for example, Intercept. See Figure 19.
• We go to the next variable, trans2; see Figure 20. This variable should contain the covariances of the effect of variable V41 in the repeated-measures ANOVA. These are all variables for which {parameter1} equals V41EC. Note that the name V41EC applies only to this specific example. The name of {parameter1} depends on the name that was given in step 3 of this section. After all these variables have been entered, we give a name to the variable in the new file, for example V41EC. See Figure 21.
• This process of entering variables and renaming the resulting variable is continued until variable trans8. At each new variable, the {parameter1} part of the variable name differs. Once everything has been entered, we click Next.
• We specify the number of index variables. Since we do not need any index variable we specify None and click Next.
• Next, we have to specify what we would like to do with the variables not selected. Since we do not need the remaining variables (TableNumber to Label) we can choose Drop variable(s) in the new data file and click Next.
• When clicking Finish the data will be restructured.
• Save the restructured file.

Figure 19: Selecting the Variables Containing the (Co)variances of the Intercept.

Figure 20: Going to the Next Variable in the Variables to be Transposed List.

Figure 21: Selecting the Variables Containing the (Co)variances of the Effect of Variable V41.

9. We open the file containing the parameter estimates (example file ExampleMixedParameters.sav). We merge this file with the restructured file that contains the covariances (task bar: Data, Merge Files, Add Variables) and save the merged file. The example file ExampleMixedRepeated.sav contains the result of this merge. This file is ready to use for combining the results of repeated-measures ANOVA.
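Steps 7 and 9 can also be carried out with syntax. The sketch below assumes that the files are stored in C:\DataFiles and that the restructured file from step 8 was saved under the hypothetical name RestructuredCovariances.sav; neither is prescribed by this manual. Step 8 itself is left to the menu, because the names of the variables to be transposed depend on the model.

```spss
* Step 7: keep only the first row (covariances of the fixed effects).
GET FILE='C:\DataFiles\ExampleMixedCovariances.sav'.
SELECT IF ($CASENUM = 1).
EXECUTE.

* Step 8 (restructuring) is done through the menu as described above.
* The result is assumed to be saved as RestructuredCovariances.sav.

* Step 9: add the covariance variables to the parameter estimates file.
GET FILE='C:\DataFiles\ExampleMixedParameters.sav'.
MATCH FILES
  /FILE=*
  /FILE='C:\DataFiles\RestructuredCovariances.sav'.
EXECUTE.
SAVE OUTFILE='C:\DataFiles\ExampleMixedRepeated.sav'.
```

Because MATCH FILES is used here without a BY subcommand, the records of the two files are combined in sequential order. It is therefore worth checking that both files list the data sets and the parameters in the same order before merging.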

10. Variable Estimate contains the parameter estimates, and variables Intercept to V41Subscale3EC contain the covariance matrices. Thus, lines 1 to 4 of the file runMI-mul2.sps become

INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
RULESMIMUL FILE = 'C:\DataFiles\ExampleMixedRepeated.sav'
 /ESTIMATE = Estimate
 /COV = Intercept to V41Subscale3EC

As in ANOVA for independent measures, redundant parameters are not displayed in the file. Thus, here we should not include the redundant categories in the /LEVELSIND subcommand. As already noted in the example from Section 4.4, V41 has two categories, so there is only one nonredundant category for this variable. Variable Subscale has 4 levels and hence 3 nonredundant categories. Finally, the interaction V41 × subscale has 1 × 3 = 3 nonredundant categories. Thus, line 5 of syntax file runMI-mul2.sps becomes:

 /LEVELSIND = 1,3,3

As in ANOVA for independent measures, the F-tests have error degrees of freedom, which have to be specified in the /DF subcommand (see Section 3.2.6). The variable containing the degrees of freedom is df. The complete syntax looks like this:

INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
RULESMIMUL FILE = 'C:\DataFiles\ExampleMixedRepeated.sav'
 /ESTIMATE = Estimate
 /COV = Intercept to V41Subscale3EC
 /LEVELSIND = 1,3,3
 /DF = df
 /M = 5.

11. Run the syntax file (task bar: Run, All) and the combined repeated-measures ANOVA will appear in the output.

References

Edwards, A. L. (1985). Multiple regression analysis and the analysis of variance and covariance. New York: Freeman.

Reiter, J. P. (2007). Small-sample degrees of freedom for multi-component significance tests with multiple imputation for missing data. Biometrika, 94, 502-508.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.

Van Ginkel, J. R., & Kroonenberg, P. M. (2014). Analysis of variance of multiply imputed data. Multivariate Behavioral Research, 49, 78-91.

Van Ginkel, J. R., Van der Ark, L. A., & Sijtsma, K. (2007). Multiple imputation for item scores when test data are factorially complex. British Journal of Mathematical and Statistical Psychology, 60, 315-337.