Using SPSS in the Social Sciences [1] - John Abbott College

Follow the steps outlined here to get an idea of how SPSS works. This handout is intended as a shortcut, or quick reference, to the rudimentary tasks available in SPSS for the type of material covered in Quantitative Methods. It assumes, among other things, a working familiarity with the Windows® environment and knowledge of the basic curriculum in Quantitative Methods. Note also that the screen captures will look similar to those seen in SPSS version 15.0.1.1.

Accessing SPSS

From the Start menu, click on Programs – SPSS 15.0. Once opened you should see something that looks like Exhibit 1 (shown below). The 5 options it offers will become clearer once you are more familiar with SPSS. At this point, however, you should click cancel, leaving you with a blank sheet similar to that of Excel.

[1] Adapted from A Student Guide to SPSS 11.0 for Windows® by Ken Deal. Prepared by David Desjardins and revised by Elaine Clendinneng.

Entering Data

Suppose you have a series of grades for a class test ranging from 0 to 100 (as is usually the case) and would like to enter them into SPSS to find the mean and standard deviation for the class results. Further, once the data have been entered into SPSS, assume the file will be named grades.sav (note that the .sav extension is the default file association for SPSS input files). In SPSS, the data (individual variables) are input in columns in the data view while the variables themselves are defined in the variable view. Begin by clicking on the variable view tab at the bottom of the SPSS window (shown below).

Once completed, you should see a blank version of the screen capture below (the capture shows the relevant information already filled in). Under Name, define the variable of interest. Note that the name can only be alphanumeric, without special characters or spaces. If you include any disallowed characters the name will not be accepted. Here, the data will be numeric in nature (with the default actions listed below). There are other possible selections (shown in the lower screen capture), the most common (after numeric) being string. The string definition will allow any characters to be used up to a maximum of 250. You can also change the type of variable you define (numeric is the default) by clicking on the button that appears after Numeric in the Type column. You are now ready to input the data. Click on the data view tab at the bottom of the sheet and begin to insert the data under the heading grades. To switch from data view to variable view and back, you can click the data view / variable view tabs at the bottom of the window at any time during your SPSS session.

Saving and Opening SPSS files

Save your data to a floppy disk, to a USB drive, or as you normally would in the Windows® environment. To open an existing SPSS input file click file – open – data as shown below. The corresponding file manager should open for retrieval of your saved file. Ensure that the Look in: area shows the drive where the file was saved, for example 3½ Floppy (A:).

Finding Descriptive Statistics

Using the data in grades.sav we turn our attention to finding a number of descriptive statistics. The most common ones used in Quantitative Methods are the sample mean, the sample standard deviation (s), and the five number summary (minimum, first quartile (Q1), median (M), third quartile (Q3), maximum). To find the mean, s, and the five number summary, click on analyze – descriptive statistics – frequencies. A figure similar to the one presented below will appear. Move the appropriate variable(s) into the variables column by highlighting the variable(s) and clicking on the arrow button. Once done, to obtain the desired descriptive statistics, click on the statistics box at the bottom of the dialog box. The associated dialog box is shown below.

To obtain the desired statistics, just check off the corresponding item of interest in the dialog box. Once satisfied with your selections, click continue and the frequencies: statistics dialog box will disappear. Then in the corresponding frequencies dialog box, click ok, and the results should be placed into an SPSS output window. The result should look as shown on the next page. Note that the percentiles printed for 25, 50 and 75 will represent Q1, M, and Q3, respectively.
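For readers who want to check the SPSS output by hand, the same statistics can be sketched in Python. This is only an illustration: the grades are invented (they are not the contents of grades.sav), the function name describe is hypothetical, and quartile conventions differ between packages, so the interpolated quartiles may not match SPSS to the last decimal.

```python
import statistics

def describe(values):
    """Return the sample mean, sample standard deviation, and five number summary."""
    # method="exclusive" interpolates at position (n + 1) * p in the sorted data
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "mean": statistics.mean(values),
        "s": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
        "min": min(values),
        "Q1": q1,
        "M": median,
        "Q3": q3,
        "max": max(values),
    }

# Hypothetical grades, invented for illustration
grades = [55, 62, 68, 70, 71, 75, 78, 80, 85, 91]
summary = describe(grades)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The percentiles at 25, 50 and 75 correspond to Q1, M and Q3, as in the SPSS frequencies output.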

Histogram of the Frequency Distribution

To find the histogram of the frequency distribution click graphs – legacy dialogs – histogram. Click grades, and then click the arrow button to move grades into the variable box. If you want to include titles, click titles to open the titles window. Type in the desired information and click continue, and then click ok. The histogram should appear in the output window.
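The counting behind a histogram can be sketched in a few lines of Python. The grades and the bin width of 10 are assumptions made for illustration, and histogram_counts is a hypothetical helper; SPSS chooses its own intervals, so its bars may be grouped differently.

```python
from collections import Counter

def histogram_counts(values, bin_width=10):
    """Count how many values fall in each interval [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical grades, invented for illustration
grades = [55, 62, 68, 70, 71, 75, 78, 80, 85, 91]
counts = histogram_counts(grades)
for start, count in counts.items():
    # One asterisk per observation gives a rough text histogram
    print(f"{start:3d}-{start + 9:3d} | {'*' * count}")
```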

Shortcut for all Descriptive Statistics, Histogram, and Stem and Leaf Display

1. Click on analyze – descriptive statistics – explore to open the explore box. Click grades, and then click the arrow button to move grades into the Dependent list box. To separate the results obtained by a characteristic such as gender, the gender variable should be placed in the factor list box.
2. Click statistics to open the statistics box and make sure that Percentiles is checked. Click continue. Note that the percentiles printed for 25, 50 and 75 will represent Q1, M, and Q3, respectively.
3. Once done, click on plots to open the plots box and make sure that both Stem and leaf and Histogram are checked. Click continue.
4. Click ok. The desired result should be printed in the output window.
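To see what a stem and leaf display contains, the grouping can be sketched in Python. The sketch assumes whole-number grades between 0 and 100, uses invented data, and the helper name stem_and_leaf is hypothetical; the SPSS display adds frequencies and may split stems differently.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole-number values by tens digit (stem) and units digit (leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical grades, invented for illustration
grades = [55, 62, 68, 70, 71, 75, 78, 80, 85, 91]
display = stem_and_leaf(grades)
for stem, leaves in display.items():
    print(f"{stem:2d} | {''.join(str(leaf) for leaf in leaves)}")
```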

Regression and Correlation

Correlation Coefficient (Pearson's r): To find the correlation coefficient in SPSS, follow these steps.

1. Click analyze – correlate – bivariate.
2. The Bivariate Correlations dialog box should open.
3. Click the variables under consideration to highlight them, then click the arrow button to move these variables over to the Variables box on the right side. Ensure that the Pearson box is checked, since this will give you the Pearson r correlation coefficient that is normally used.
4. Click ok, and you should be able to find the correct correlation coefficient in the output window.

The sequence to follow is shown below.

If the data in question are age (x) and price (y) as in the file regress.sav you would see the output shown below. Note that the level of significance (or p-value) is provided to test for statistical significance; here the correlation is statistically significant at the 1% level.
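The Pearson coefficient itself is the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. A minimal Python sketch follows; the age and price values are invented and are not the contents of regress.sav, and pearson_r is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: age in years (x) and price in thousands (y)
age = [1, 2, 3, 4, 5]
price = [20, 17, 15, 12, 11]
r = pearson_r(age, price)
print(f"r = {r:.4f}")
```

A value of r close to -1, as here, indicates a strong negative linear relationship: price falls as age rises.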

Linear Regression: To find the regression equation in SPSS, follow these steps.

1. Click analyze – regression – linear.
2. The Linear Regression dialog box should open.
3. Click the appropriate variable for the dependent variable (y), then click the arrow button to move the variable into the dependent box on the right hand side. Repeat for the independent variable (x), clicking the arrow button to move the variable into the independent box on the right hand side.
4. If you are interested in determining the correlation coefficient along with other descriptive statistics, click on statistics in the Linear Regression dialog box, make sure that descriptives is checked off, then click continue.
5. Click ok, and you should be able to find the appropriate regression values for the slope (b) and intercept (a) in the output window.

The sequence to follow is shown below.

For the age (x) and price (y) data, if we run the regression price = a + b age (ignoring the error term (e) for the time being), or more compactly y = a + bx, where a is the intercept and b is the slope of the regression line, there is a myriad of potential output to wade through. This can prove problematic for the first time user of SPSS, or the new student. Some of the potential problem areas include Descriptive Statistics, Correlations, Variables Entered/Removed, Model Summary, and ANOVA (Analysis of Variance). These items provide the descriptive statistics along with the coefficient of determination (r²) and the correlation coefficient (r). These are interesting to note but can be found in other ways. What is of interest is the value of a and b from the regression equation. The relevant portion of the output is shown below.

The values of a and b are given under the Unstandardized Coefficients column. The intercept (a) is the value next to (constant) and the slope (b) is the value next to the other (independent) variable, agex.
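The two coefficients can be checked with the usual least squares formulas, b = Sxy / Sxx and a = ȳ - b x̄. The Python sketch below uses invented age and price values rather than the handout's data file, and least_squares is a hypothetical helper name.

```python
def least_squares(x, y):
    """Return (a, b) for the fitted line y = a + b*x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data: age in years (x) and price in thousands (y)
age = [1, 2, 3, 4, 5]
price = [20, 17, 15, 12, 11]
a, b = least_squares(age, price)
print(f"price = {a:.2f} + ({b:.2f}) * age")
```

Here a plays the role of the value next to (constant) and b the value next to the independent variable in the Unstandardized Coefficients column.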

Scatterplot: To prepare a scatterplot in SPSS, follow these steps.

1. Click graphs – legacy dialogs and then click scatter/dot.
2. The Scatterplot dialog box should open. Click simple, then click define.
3. The Simple Scatterplot dialog box should open. Click the arrow buttons to move the correct variables over to the right side under X-axis and Y-axis for the independent and dependent variables respectively.
4. Click ok, and your scatterplot should appear in an output window. Note that you can also add titles by clicking on the titles option.

The sequence to follow is shown below.

To plot the least squares regression line: Begin by generating the scatterplot as described above, then follow these directions.

1. Double click on the generated scatterplot in the SPSS output window. This will allow you to access the chart editor.
2. Once in the chart editor, click on elements and select fit line at total.
3. Close the chart editor and your regression line should appear in the output window.

Hypothesis Testing

T-Test – Single Sample

Assume that we are using the same data as at the beginning, from the file grades.sav, and would like to know if the mean of the population (to be defined) in question is statistically different from 65. Here, it should be noted that the value 65 is an arbitrary value assigned by the user or researcher. This is the case of the single sample t-test for the mean. Begin with the data entered into the input window. Once there click on Analyze – Compare Means – One Sample T Test. This sequence is shown below.

If done correctly, the One-Sample T Test dialog box should open as shown below. To complete the t-test, click on grades, then click the arrow button to move the variable to the right side column under Test Variable(s), and supply a value for the hypothesized mean next to Test Value. Once completed click ok.

For this particular test, the result is given below. Relative to a first course in Quantitative Methods, some of the information is superfluous. Specifically, the Mean Difference and 95% Confidence Interval of the Difference are not required. The points of interest are the t value (under t) at –1.593 and the degrees of freedom (df) for the t-distribution, 19. With this information you could easily obtain the critical value of the t-distribution for an appropriate significance level (usually denoted α in most textbooks) to determine whether or not the result is statistically significant (i.e. accept H0 or Ha as the case warrants). If you use the p-value approach, the p-value is also provided under Sig. (2-tailed) at 0.128. As this value is larger than .05, the mean is not statistically different from 65 in this case.
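The t value reported by SPSS is the difference between the sample mean and the test value divided by the standard error, t = (x̄ - μ0) / (s / √n), with n - 1 degrees of freedom. The Python sketch below uses invented grades, and the helper names are hypothetical; the critical value 2.093 is the two-tailed 5% value for 19 degrees of freedom from a standard t table.

```python
import math
import statistics

def one_sample_t(values, test_value):
    """Return (t, df) for a single sample t-test of the mean against test_value."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t = (statistics.mean(values) - test_value) / standard_error
    return t, n - 1

def is_significant(t, critical_value):
    """Two-tailed decision: significant when |t| exceeds the critical value."""
    return abs(t) > critical_value

# Hypothetical grades, invented for illustration
grades = [60, 62, 64, 66, 68]
t, df = one_sample_t(grades, 65)
print(f"t = {t:.3f}, df = {df}")

# The handout's reported result: t = -1.593 with df = 19
handout_significant = is_significant(-1.593, 2.093)
print("Handout result significant at 5%:", handout_significant)
```

The critical value decision for the handout's result agrees with the p-value approach: 0.128 is larger than .05.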

Paired t-Test – Dependent Samples

Suppose you had grade data for the same students from two separate tests and wanted to know if the means for the two tests were statistically different from one another. As was the case with the single sample t-test, begin by clicking on Analyze – Compare Means – Paired-Samples T Test as shown below.

The resulting dialog box is shown below. The key here is to highlight the two variables whose means you would like to compare. In doing so, the two variables chosen will appear next to Variable 1: and Variable 2: under Current Selections. Then, click on the arrow button to move the selection to the Paired Variables: column. Once done, click on ok and the resulting output should be in the output window.

As was the case for the single sample t-test (results below), much of the information goes beyond what is needed in a first course in Quantitative Methods. However, the required information is readily available. The t value (under t) is –0.387 and the degrees of freedom (df) for the t-distribution are 19. With this information you could easily obtain the critical value of the t-distribution for an appropriate significance level (usually denoted α in most textbooks) to determine whether or not the result is statistically significant (i.e. accept H0 or Ha as the case warrants). If you use the p-value approach, the p-value is also provided under Sig. (2-tailed) at 0.783 (not statistically different in this case).
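A paired t-test is a single sample t-test applied to the differences within each pair, testing whether the mean difference is zero. The Python sketch below makes that explicit; the two sets of test grades are invented and paired_t is a hypothetical helper name.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equal-length lists of scores."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    t = statistics.mean(differences) / standard_error
    return t, n - 1

# Hypothetical grades for the same four students on two tests
test1 = [70, 65, 80, 75]
test2 = [72, 64, 83, 75]
t, df = paired_t(test1, test2)
print(f"t = {t:.3f}, df = {df}")
```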

Independent Samples T-Test

For an independent samples t-test, click on Analyze – Compare Means – Independent Samples T-Test.

In the Test Variable(s) box enter the variable for which a comparison of means is required. The grouping variable, in this case gender, should be defined with value labels in the variable view. In this particular example 1=male 2=female. Click on Define Groups. Enter 1 for Group 1 and enter 2 for Group 2. Click on Continue and the Define Groups dialog box will close. Then click on OK in the Independent Samples T-Test box.

The following output is generated. The interpretation is similar to that of the Paired-Samples T Test. Assume equal variances for your results.

Group Statistics

     gender    N    Mean       Std. Deviation    Std. Error Mean
y    male      6    23.5000    1.87083            .76376
     female    6    26.6667    6.12100           2.49889

Independent Samples Test

Levene's Test for Equality of Variances (y, equal variances assumed): F = 15.003, Sig. = .003

t-test for Equality of Means (y)

                                                                                                     95% Confidence Interval of the Difference
                               t         df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower       Upper
Equal variances assumed        -1.212    10       .253              -3.16667          2.61300                 -8.98880    2.65546
Equal variances not assumed    -1.212    5.926    .272              -3.16667          2.61300                 -9.57983    3.24650
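The t and df values in both rows of the Independent Samples Test output can be reproduced from the Group Statistics alone. The Python sketch below does this with the pooled variance formula (equal variances assumed) and the Welch-Satterthwaite formula (equal variances not assumed); the function names are hypothetical, while the numbers are those printed in the Group Statistics table.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """t and df assuming equal variances (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """t and df without assuming equal variances (Welch-Satterthwaite)."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    standard_error = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / standard_error, df

# Values taken from the Group Statistics table (male, then female)
male = (6, 23.5000, 1.87083)
female = (6, 26.6667, 6.12100)

t_pooled, df_pooled = pooled_t(*male, *female)
t_welch, df_welch = welch_t(*male, *female)
print(f"Equal variances assumed:     t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Equal variances not assumed: t = {t_welch:.3f}, df = {df_welch:.3f}")
```

With equal group sizes the two t values coincide, which is why both rows of the SPSS table show the same t and differ only in df and the quantities derived from it.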

Hypothesis Testing – χ² Test of Independence

The χ² test of independence statistically checks whether two variables are related in some way or not (independent under H0 and dependent under Ha). As a starting point, assume that we already have data on gender and favourite subject area as given in the input file fav.sav. The usual setup is in the form of a contingency table with expected frequencies computed from the observed frequencies. With the observed and expected frequencies, we then readily compute the χ² statistic and compare it to a specified critical value for a χ² with (r-1)(c-1) degrees of freedom. Unfortunately, SPSS cannot deal directly with the contingency table as usually seen in textbooks. For the usual setup seen in texts, Excel is a better starting point. To start click on Analyze – Descriptive Statistics – Crosstabs. The resulting menu combination is shown below.

Once done, the Crosstabs dialog box opens (below). Begin by selecting the appropriate variable to be in the rows (gender of the respondent in this case) by highlighting that variable and clicking the appropriate arrow button to place that variable under the Row(s) column. Do the same for the other variable to be placed under Column(s). The dialog box is shown below.

To begin the χ² test of independence, first click on the Statistics box in the Crosstabs dialog box to open the Crosstabs: Statistics dialog box. There are a multitude of possibilities for tests that can be performed. Make sure the Chi-square box is checked off and click continue, and the box will close. Next, click on the Cells… box to open the Crosstabs: Cell Display dialog box. Here you can choose to display the observed and expected counts by checking off the appropriate boxes next to Observed and Expected. Click continue, and the box will close. It is common to include percentages in terms of one or more variables within the context of this analysis. To do this, check off the appropriate Row, Column, or Total under Percentages in the Crosstabs: Cell Display dialog box.

After closing the Statistics and Cell Display dialog boxes shown above, you will then be left with the Crosstabs dialog box only. Click on ok to perform the statistical calculations. Results for this specific test of independence are presented on the next page. As can be seen, the output provided begins with a summary of the cases processed for each of the variables. The crosstabulation in the next table is the standard contingency table that most individuals see in textbooks. It provides the observed and expected values along with the marginal totals for each row and column. Note that if you were to ask for percentages, they would have been provided in this table as well.

The next output table is the actual statistical result. As was the case with the output from regression, this testing procedure provides additional information that may or may not be of value. In addition, if you were to check off some of the other statistical tests in the Crosstabs: Statistics dialog box, they would have been presented here also. The information for the test of independence involves the computed χ² value, given as 1.613. You could compare this value to a predetermined critical value for a χ² variable with 2 degrees of freedom (df) (specific to this example). If you use the p-value approach, the p-value is also provided under Asymp. Sig. (2-sided) at 0.446 (not statistically significant, so the variables are treated as independent in this case).
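The textbook calculation that Crosstabs automates can be sketched in Python: each expected count is (row total × column total) / grand total, and χ² sums (observed - expected)² / expected over all cells. The observed counts below are invented (they are not the contents of fav.sav) and chi_square_independence is a hypothetical helper name. The last lines use the fact that, for exactly 2 degrees of freedom, the p-value equals exp(-χ²/2), which lets us check the handout's reported figures.

```python
import math

def chi_square_independence(observed):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical counts: 2 genders (rows) by 3 favourite subject areas (columns)
observed = [[10, 15, 5],
            [12, 10, 8]]
chi2, df = chi_square_independence(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")

# Check of the handout's result (valid for df = 2 only): chi-square = 1.613
handout_p = math.exp(-1.613 / 2)
print(f"p-value for the handout's chi-square: {handout_p:.3f}")
```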

Notes on the Output Window

Saving and printing the output file is done in the standard Windows® manner. Be cautious when printing the output file. In SPSS, since there is a modified Table of Contents on the left side of the output window for ease of navigation, if a particular item is highlighted, only the highlighted portion will be printed. If you would like to return to an output file from a previous session of SPSS, you should click on file – open – output. If you are familiar with the Windows® environment, there are numerous shortcuts that can be followed for file and data manipulation. In this case, the .spo extension is associated with SPSS output files, but is not available as an option for the file open shortcut. When dealing with SPSS output files, as long as the output window remains open, all subsequent work done in the current SPSS session will be placed into the output window in sequence.