
Introduction to SPSS for Windows


Excerpted from SPSS Manual

Prepared by

Dr. Hisham S. Abou-Auda

This handout is for SPSS release 13.0.


SPSS 13.0 is a comprehensive system for analyzing data. SPSS can take data from almost any type of file and use them to generate tabulated reports, charts, and plots of distributions and trends, descriptive statistics, and complex statistical analyses. SPSS makes statistical analysis more accessible for the beginner and more convenient for the experienced user. Simple menus and dialog box selections make it possible to perform complex analyses without typing a single line of command syntax. The Data Editor offers a simple and efficient spreadsheet-like facility for entering data and browsing the working data file.

Starting SPSS
To start SPSS:
From the Windows Start menu choose: Programs > SPSS for Windows > SPSS for Windows

When you start a session, you see the Data Editor window.
Figure 1-1 Data Editor window (Data View)

Variable Display in Dialog Boxes


Either variable names or longer variable labels will appear in list boxes in dialog boxes. Additionally, variables in list boxes can be ordered alphabetically or by their position in the file. In this guide, we will display variable labels in alphabetical order within list boxes. For a new user of SPSS, this provides a more complete description of variables in an easy-to-follow order. Since the default setting within SPSS is to display variable labels in file order, we will change this before accessing data.
From the menus choose:
Edit > Options...

Select Display labels in the Variable Lists group on the General tab. Also select Alphabetical. Click OK, and then click OK to confirm the change.

Opening a Data File


Before you can analyze data, you need some data to analyze. To open a data file: From the menus choose:
File > Open > Data...

Alternatively, you can use the Open File button on the toolbar.

This opens the Open File dialog box.

Open File dialog box

By default, SPSS-format data files (.sav extension) are displayed. You can display other file formats using the Files of Type drop-down list.
Click Open to open the SPSS data file.

Variable Labels

The data file is displayed in the Data Editor. If you put the mouse cursor on a variable name (the column headings), a more descriptive variable label is displayed if one has been defined for that variable. By default, the actual data values are displayed. To display labels: From the menus choose:
View > Value Labels

Alternatively, you can use the Label tool on the toolbar.

Descriptive value labels are now displayed. This makes it easier to interpret the responses.
Value labels displayed in the Data Editor

Running an Analysis
The Analyze menu contains a list of general reporting and statistical analysis categories. Most of the categories are followed by an arrow, which indicates that there are several analysis procedures available within the category; they will appear on a submenu when the category is selected. We'll start with a simple frequency table (table of counts). From the menus choose:
Analyze > Descriptive Statistics > Frequencies...

Frequencies dialog box

Select (click) the variable Income category.

Variable labels and names in the Frequencies dialog box

A more complete description of each variable pops up when the cursor is over it. The variable name (in square brackets) is inccat, and it has the variable label Income category. If there were no variable label, only the variable name would appear in the list box. In the dialog box, you choose the variables you want to analyze from the source list on the left and move them into the Variable(s) list on the right. The OK button, which runs the analysis, is disabled until at least one variable is placed in the Variable(s) list. Additional labeling information can be easily obtained for any variable on the list by clicking on the variable name with the right mouse button.
Click the right mouse button on Income category [inccat], and then click (left mouse button) Variable Information. Click the down arrow on the Value Labels drop-down list.

Figure 1-10 Defined labels for income variable

All of the defined value labels for the variable are displayed. Click Gender [gender] in the source variable list, and then click the right-arrow button to move the variable into the target Variable(s) list. Click Income category [inccat] in the source list, and then click the right arrow button again.

Variables selected for analysis

A pound sign (#) icon next to the variable name indicates that the variable is numeric. An icon with the letter A indicates that the variable is a string (alphanumeric) variable, which may contain both letters and numbers. A less-than sign (left angle bracket) indicates that the variable is a short string, containing eight or fewer characters.
Click OK to run the procedure.

Viewing Results
Viewer window

Results are displayed in the Viewer window.

Creating Charts
Although some statistical procedures can create high-resolution charts, you can also use the Graphs menu to create charts. For example, you could create a chart that shows the relationship between wireless telephone service and PDA (personal digital assistant) ownership. From the menus choose:
Graphs > Bar...

Click Clustered, and then click Define.

Define Clustered Bar dialog box

Scroll down the source variable list and select Wireless service [wireless] as the Category Axis variable. Select Owns PDA [ownpda] as the Define Clusters By variable. Click OK to create the chart.

Bar chart displayed in Viewer window

Exiting SPSS
To exit SPSS: From the menus choose:
File > Exit

Click No if you get an alert asking if you want to save your results.

Statistics Coach
The Statistics Coach can help to guide you through the process of finding the procedure that you want to use. From the menus choose:
Help > Statistics Coach

Statistics Coach, first step

The Statistics Coach presents a series of questions designed to find the appropriate procedure. The first question is simply "What do you want to do?" For example, if you want to summarize data: Select Summarize, describe, or present data. Then click Next.

Statistics Coach, first step

Data type selection

The next question asks about the type of data you want to summarize. If you're unsure, each choice displays different examples. Select Continuous, numeric data (interval, ratio).

Selecting a different data type

The example changes to reflect your choice. If you're still unsure, you can:

Click More Examples.

A new example of the same data type is displayed. If the examples don't provide enough information, you can: Click Help.
Select Tables and numbers and click Next.

Statistics Coach, final step

Select Individual case listings within categories.

When the Statistics Coach has enough information, the Next button changes to Finish. When you click Finish, the dialog box for the selected procedure opens automatically, and a Help topic for the procedure is also displayed.

This is a custom Help topic, based on your selections in the Statistics Coach. Since some dialog boxes perform numerous functions, more than one path in the Statistics Coach may lead to the same dialog box, but the instructions in the Help topic may be different. Click Tell me more in the Help topic to get more detailed information.

Using the Data Editor


The Data Editor displays the contents of the active data file. The information in the Data Editor consists of variables and cases. In Data View, columns represent variables and rows represent cases (observations). In Variable View, each row is a variable, and each column is an attribute associated with that variable. Variables are used to represent the different types of data that you have compiled. A common analogy is that of a survey. The response to each question on a survey is equivalent to a variable. Variables come in many different types, including numbers, strings, currency, and dates.

Entering Numeric Data


Data can be entered into the Data Editor, which may be useful for small data files or for making minor edits to larger data files. Click the Variable View tab at the bottom of the Data Editor window. Define the variables that are going to be used. In this case, only three variables are needed: age, marital status, and income.

In the first row of the first column, type age. In the second row, type marital. In the third row, type income.

New variables are automatically given a numeric data type. If you don't enter variable names, unique names are automatically created. However, these names are not descriptive and are not recommended for large data files. Click the Data View tab to continue entering the data. The names that you entered in Variable View are now the headings for the first three columns in Data View. Begin entering data in the first row, starting at the first column.

In the age column, type 55. In the marital column, type 1. In the income column, type 72000. Move the cursor to the first column of the second row to add the next subject's data. In the age column, type 53. In the marital column, type 0. In the income column, type 153000.

Currently, the age and marital columns display decimal points, even though their values are intended to be integers. To hide the decimal points in these variables: Click the Variable View tab at the bottom of the Data Editor window. Select the Decimals column in the age row and type 0 to hide the decimal. Select the Decimals column in the marital row and type 0 to hide the decimal.

Defining Data
In addition to defining data types, you can also define descriptive variable and value labels for variable names and data values. These descriptive labels are used in statistical reports and charts.

Adding a Variable Label


Labels are meant to provide descriptions of variables. These descriptions are often longer versions of variable names. Labels can be up to 256 characters long. These labels are used in your output to identify the different variables. Click the Variable View tab at the bottom of the Data Editor window. In the Label column of the age row, type Respondent's Age. In the Label column of the marital row, type Marital Status. In the Label column of the income row, type Household Income.

Adding Value Labels for Numeric Variables


Value labels provide a method for mapping your variable values to a string label. In the case of this example, there are two acceptable values for the marital variable. A value of 0 means that the subject is single, and a value of 1 means that he or she is married. Click the Values cell for the marital row, and then click the button that appears in the cell to open the Value Labels dialog box. The value is the actual numeric value. The value label is the string label applied to the specified numeric value. Type 0 in the Value field. Type Single in the Value Label field.
Click Add to add this label to the list.

Value Labels dialog box

Repeat the process, this time typing 1 in the Value field and Married in the Value Label field. Click Add, and then click OK to save your changes and return to the Data Editor.
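Conceptually, a set of value labels is a lookup table from stored codes to display strings. The following is a minimal Python sketch of that idea, assuming a plain dictionary stands in for the labels defined in the Value Labels dialog box; the names marital_labels and display are hypothetical, not part of SPSS.

```python
# Hypothetical stand-in for the value labels defined for the marital variable.
marital_labels = {0: "Single", 1: "Married"}


def display(value, labels):
    """Return the label for a value, or the raw value if no label is defined."""
    return labels.get(value, value)


# The two cases entered earlier: marital = 1, then marital = 0.
marital = [1, 0]
print([display(v, marital_labels) for v in marital])  # ['Married', 'Single']
```

The stored data are unchanged; only what is shown differs, which is why switching value labels on and off in Data View does not alter any values.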

These labels can also be displayed in Data View, which can help to make your data more readable. Click the Data View tab at the bottom of the Data Editor window. From the menus choose:
View > Value Labels

The labels are now displayed in a list when you enter values in the Data Editor. This has the benefit of suggesting a valid response and providing a more descriptive answer.

Examining Summary Statistics for Individual Variables


Level of Measurement
Different summary measures are appropriate for different types of data, depending on the level of measurement:

Categorical. Data with a limited number of distinct values or categories (for example, gender or marital status). Also referred to as qualitative data. Categorical variables can be string (alphanumeric) data or numeric variables that use numeric codes to represent categories (for example, 0 = Unmarried and 1 = Married). There are two basic types of categorical data:

Nominal. Categorical data where there is no inherent order to the categories. For example, a job category of sales isn't higher or lower than a job category of marketing or research.

Ordinal. Categorical data where there is a meaningful order of categories, but there isn't a measurable distance between categories. For example, there is an order to the values high, medium, and low, but the distance between the values can't be calculated.

Scale. Data measured on an interval or ratio scale, where the data values indicate both the order of values and the distance between values. For example, a salary of $72,195 is higher than a salary of $52,398, and the distance between the two values is $19,797. Also referred to as quantitative or continuous data.

Summary Measures for Categorical Data


For categorical data, the most typical summary measure is the number or percentage of cases in each category. The mode is the category with the greatest number of cases. For ordinal data, the median (the value above and below which half the cases fall) may also be a useful summary measure if there is a large number of categories. The Frequencies procedure produces frequency tables that display both the number and percentage of cases for each observed value of a variable. From the menus choose:
Analyze > Descriptive Statistics > Frequencies...

Select Owns PDA (ownpda) and Owns TV (owntv) and move them into the Variable(s) list.
Categorical variables selected for analysis

Click OK to run the procedure.

Frequency tables

The frequency tables are displayed in the Viewer window. They reveal that only about 21% of the people own PDAs, but almost everybody owns a TV (99.2%). This might not be an interesting finding in itself, although it might be interesting to find out more about the small group of people who do not own a television.
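The counts, percentages, and mode that summarize a categorical variable are simple to compute by hand. A minimal Python sketch, using a small made-up list of responses rather than the actual demo.sav data, and ignoring missing values (so Percent and Valid Percent would coincide):

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows sorted by value, and the mode."""
    counts = Counter(values)
    n = len(values)
    rows = [(v, c, 100.0 * c / n) for v, c in sorted(counts.items())]
    top = max(counts.values())
    # If several values tie for the largest count, report the smallest of them.
    mode = min(v for v, c in counts.items() if c == top)
    return rows, mode


# Made-up responses for Owns PDA (0 = No, 1 = Yes); not the demo.sav data.
ownpda = [0, 0, 1, 0, 1, 0, 0, 0]
rows, mode = frequency_table(ownpda)
for value, count, percent in rows:
    print(value, count, f"{percent:.1f}%")
print("mode:", mode)
```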

Summary Measures for Scale Variables


There are many summary measures available for scale variables, including:

Measures of central tendency. The most common measures of central tendency are the mean (arithmetic average) and median (value above and below which half the cases fall).

Measures of dispersion. Statistics that measure the amount of variation or spread in the data include the standard deviation, minimum, and maximum.

Open the Frequencies dialog box again. Click Reset to clear any previous settings. Select Household income in thousands (income) and move it into the Variable(s) list.
Scale variable selected for analysis

Click Statistics. Select Mean, Median, Std. deviation, Minimum, and Maximum.

Frequencies Statistics dialog box

Click Continue. Deselect Display frequency tables in the main dialog box. (Frequency tables are usually not useful for scale variables since there may be almost as many distinct values as there are cases in the data file.)
Click OK to run the procedure.

The Frequencies Statistics table is displayed in the Viewer window.


Frequencies Statistics table
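The five statistics requested above can be reproduced with Python's standard statistics module. This sketch uses made-up income values, not demo.sav, and assumes the sample standard deviation (denominator n - 1), which is what the Frequencies procedure reports.

```python
import statistics


def describe(values):
    """Mean, median, sample standard deviation (n - 1 denominator), min, and max."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Made-up household incomes in thousands; not the demo.sav data.
income = [72, 153, 28, 45, 60]
print(describe(income))
```

Note how the single large value (153) pulls the mean (71.6) well above the median (60); this is one reason to look at both measures of central tendency for scale variables.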

Computing New Variables


Using a wide variety of mathematical functions, you can compute new variables based on highly complex equations. In this example, however, we will simply compute a new variable that is the difference between the values of two existing variables. The data file demo.sav contains a variable for the respondent's current age and a variable for the number of years at current job. It does not, however, contain a variable for the respondent's age at the time he or she started that job. We can create a new variable that is the computed difference between current age and number of years at current job, which should be the approximate age at which the respondent started that job. From the menus in the Data Editor window choose:
Transform > Compute...

For Target Variable, enter jobstart. Select Age in years (age) in the source variable list and click the arrow button to copy it to the Numeric Expression text box. Click the minus (-) button on the calculator pad in the dialog box (or press the minus key on the keyboard). Select Years with current employer (employ) and click the arrow button to copy it to the expression.
Compute Variable dialog box

Note: Be careful to select the correct employment variable. There is also a recoded categorical version of the variable, which is not what you want. The numeric expression should be age-employ, not age-empcat. Click OK to compute the new variable. The new variable is displayed in the Data Editor. Since the variable is added to the end of the file, it is displayed in the far right column in Data View and in the last row in Variable View.

New variable displayed in the Data Editor
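Compute evaluates the numeric expression once per case. A minimal Python sketch of what age-employ does, with made-up cases and None standing in for a missing value; it assumes the usual rule that an arithmetic expression with a missing operand produces a missing (system-missing) result.

```python
def compute_jobstart(age, employ):
    """jobstart = age - employ for each case; None (missing) if either operand is missing."""
    result = []
    for a, e in zip(age, employ):
        # An arithmetic expression with a missing operand yields a missing result.
        result.append(None if a is None or e is None else a - e)
    return result


# Made-up cases; None marks a missing value.
age = [55, 53, 40]
employ = [12, 30, None]
print(compute_jobstart(age, employ))  # [43, 23, None]
```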

Using Functions in Expressions


You can also use predefined functions in expressions. More than 70 built-in functions are available, including:

Arithmetic functions
Statistical functions
Distribution functions
Logical functions
Date and time aggregation and extraction functions
Missing-value functions
Cross-case functions
String functions
Compute Variable dialog box displaying function grouping

Functions are organized into logically distinct groups, such as a group for arithmetic operations and another for computing statistical metrics. For convenience, a number of commonly used system variables, such as $TIME (current date and time), are also included in appropriate function groups. A brief description of the currently selected function (in this case, SUM) or system variable is displayed in a reserved area in the Compute Variable dialog box.
Pasting a function into an expression. To paste a function into an expression:

Position the cursor in the expression at the point where you want the function to appear.
Select the appropriate group from the Function group list. The group labeled All provides a listing of all available functions and system variables.
Double-click the function in the Functions and Special Variables list (or select the function and click the arrow adjacent to the Function group list).

The function is inserted into the expression. If you highlight part of the expression and then insert the function, the highlighted portion of the expression is used as the first argument in the function.

Editing a function in an expression. The function is not complete until you enter the arguments, represented by question marks in the pasted function. The number of question marks indicates the minimum number of arguments required to complete the function.

Highlight the question mark(s) in the pasted function.
Enter the arguments. If the arguments are variable names, you can paste them from the variable list.

Using Conditional Expressions


You can use conditional expressions (also called logical expressions) to apply transformations to selected subsets of cases. A conditional expression returns a value of true, false, or missing for each case. If the result of a conditional expression is true, the transformation is applied to that case. If the result is false or missing, the transformation is not applied to the case. To specify a conditional expression: Click If in the Compute Variable dialog box. This opens the If Cases dialog box.
If Cases dialog box

Select Include if case satisfies condition. Enter the conditional expression.

Most conditional expressions contain at least one relational operator, as in:


age>=21

or
income*3<100

In the first example, only cases with a value of 21 or greater for Age (age) are selected. In the second example, Household income in thousands (income) multiplied by 3 must be less than 100 for a case to be selected. You can also link two or more conditional expressions using logical operators, as in:
age>=21 | ed>=4

or
income*3<100 & ed=5

In the first example, cases that meet either the Age (age) condition or the Level of education (ed) condition are selected. In the second example, both the Household income in thousands (income) and Level of education (ed) conditions must be met for a case to be selected.
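Because a condition can be true, false, or missing, the logical operators follow three-valued rules. The Python sketch below assumes the standard rules (true OR missing is true, false AND missing is false, anything else involving missing is missing) and uses None for a missing value; check the SPSS Command Syntax Reference for the exact behavior in your release. The helper names or3, and3, and at_least are hypothetical.

```python
def or3(a, b):
    """Three-valued OR: True if either side is True; missing (None) if undecided."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def and3(a, b):
    """Three-valued AND: False if either side is False; missing (None) if undecided."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def at_least(value, threshold):
    """Relational test value >= threshold; a missing value gives a missing result."""
    return None if value is None else value >= threshold


# Hypothetical cases as (age, ed) pairs; None marks a missing value.
cases = [(25, 2), (18, 4), (None, 5), (19, 1)]

# age>=21 | ed>=4: a case is selected only when the condition is True.
selected = [c for c in cases if or3(at_least(c[0], 21), at_least(c[1], 4)) is True]
print(selected)  # [(25, 2), (18, 4), (None, 5)]
```

The third case is selected even though its age is missing, because its education condition alone makes the OR true.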

Additional Statistical Procedures


This chapter contains brief examples for selected statistical procedures. The procedures are grouped according to the order in which they appear on the Analyze menu. The examples are designed to illustrate sample specifications required to run a statistical procedure. In the examples in this chapter, you must run the procedures to see the output. The examples use the data file demo.sav, except for the following:

The paired-samples t test example uses the data file dietstudy.sav, which is a hypothetical data file containing the results of a study of the Stillman diet.
The correlation examples use Employee data.sav, which contains historical data about a company's employees.
The exponential smoothing example uses the data file inventor.sav, which contains inventory data collected over a period of 70 days.

For information about individual items in a dialog box, click Help. If you want to locate a specific statistic, such as percentiles, use the Index or Search facility in the Help system. For additional information about interpreting the results obtained by running these procedures, consult a statistics or data analysis textbook.

Summarizing Data
The Descriptive Statistics submenu on the Analyze menu provides techniques for summarizing data with statistics and charts.

Frequencies
From the menus choose:
Analyze > Descriptive Statistics > Frequencies...

This opens the Frequencies dialog box.

Select Years with current employer (employ) and move it to the Variable(s) list. Deselect the Display frequency tables check box. (If you leave this item selected, the output shows an entry for every distinct value of the variable, which can make a very long table.) Click Charts to open the Frequencies Charts dialog box.
Frequencies Charts dialog box

Select Histograms and With normal curve, and then click Continue. To select summary statistics, click Statistics in the Frequencies dialog box. This displays the Frequencies Statistics dialog box.


Frequencies Statistics dialog box

Select Mean, Std. deviation, and Maximum, and then click Continue. Click OK in the Frequencies dialog box to run the procedure.

The Viewer shows the requested statistics and a histogram in standard graphics format. Each bar in the histogram represents the number of employees within a range of five years, and the year values displayed are the range midpoints. As requested, a normal curve is superimposed on the chart.
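Each histogram bar is a count of cases falling in one interval. This Python sketch groups made-up values into five-year intervals and labels each by its midpoint; it assumes the intervals start at 0 and include their lower edge, whereas SPSS chooses its own bin boundaries, so its bars may differ.

```python
from collections import Counter


def bin_counts(values, width=5, start=0):
    """Count values in [start + k*width, start + (k+1)*width), keyed by interval midpoint."""
    counts = Counter()
    for v in values:
        k = int((v - start) // width)  # index of the interval containing v
        midpoint = start + k * width + width / 2
        counts[midpoint] += 1
    return dict(sorted(counts.items()))


# Made-up years with current employer; not the demo.sav data.
employ = [0, 3, 4, 5, 9, 12, 14, 22]
print(bin_counts(employ))  # {2.5: 3, 7.5: 2, 12.5: 2, 22.5: 1}
```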

More about Summarizing Data


There are many ways to summarize data. For example, to calculate medians or percentiles, use the Frequencies procedure or the Explore procedure. Here are some additional methods:

Descriptives. For income, you can calculate standard scores, sometimes called z scores. Use the Descriptives procedure and select Save standardized values as variables.

Crosstabs. You can use the Crosstabs procedure to display the relationship between two or more categorical variables.

Summarize procedure. You can use the Summarize procedure to write to your output window a listing of the actual values of age, gender, and income of the first 25 or 50 cases. To run the Summarize procedure, from the menus choose:
Analyze > Reports > Case Summaries...
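The standard (z) scores saved by the Descriptives procedure express each value as a number of standard deviations from the mean. A minimal Python sketch with made-up values, assuming the sample standard deviation (n - 1 denominator) is used:

```python
import statistics


def z_scores(values):
    """Standardize each value: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


# Made-up household incomes in thousands.
income = [20, 40, 60]
print(z_scores(income))  # [-1.0, 0.0, 1.0]
```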

Comparing Means
The Compare Means submenu on the Analyze menu provides techniques for displaying descriptive statistics and testing whether differences are significant between two means for both independent and paired samples. You can also test whether differences are significant among more than two independent means by using the One-Way ANOVA procedure.

Means
In the demo.sav file, several variables are available for dividing people into groups. You can then calculate various statistics in order to compare the groups. For example, you can compute the average (mean) household income for males and females. To calculate the means, use the following steps: From the menus choose:
Analyze > Compare Means > Means...

This opens the Means dialog box.


Means dialog box (layer 1)

Select Household income in thousands (income) and move it to the Dependent List. Select Gender (gender) and move it to the Independent List in layer 1. Click Next. This creates another layer.

Means dialog box (layer 2)

Select Owns PDA (ownpda) and move it to the Independent List in layer 2. Click OK to run the procedure.
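With two layers, the Means procedure reports statistics for every combination of the layer variables. This Python sketch computes only the cell means for each (gender, ownpda) pair from made-up data; the 'f' and 'm' codes are assumptions, and the subtotals and other statistics that SPSS also shows are left out.

```python
from collections import defaultdict
from statistics import mean


def layered_means(income, gender, ownpda):
    """Mean income for each (gender, ownpda) combination."""
    groups = defaultdict(list)
    for y, g, p in zip(income, gender, ownpda):
        groups[(g, p)].append(y)
    return {key: mean(vals) for key, vals in sorted(groups.items())}


# Made-up cases; not the demo.sav data.
income = [50, 70, 40, 90, 60]
gender = ["f", "f", "m", "m", "m"]
ownpda = [0, 1, 0, 1, 1]
print(layered_means(income, gender, ownpda))
```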

Paired-Samples T Test
When the data are structured in such a way that there are two observations on the same individual or observations that are matched by another variable on two individuals (twins, for example), the samples are paired. In the data file dietstudy.sav, the beginning and final weights are provided for each person who participated in the study. If the diet worked, we expect that the participants' weights before and after the study would be significantly different. To carry out a t test of the beginning and final weights, use the following steps: Open the data file dietstudy.sav, which can be found in the \tutorial\sample_files\ subdirectory of the directory in which you installed SPSS. From the menus choose:
Analyze > Compare Means > Paired-Samples T Test...

This opens the Paired-Samples T Test dialog box.


Paired-Samples T Test dialog box

Click Weight (wgt0). The variable is displayed in the Current Selections group (below the variable list). Click Final weight (wgt4). The variable is displayed in the Current Selections group. Click the arrow button to move the pair to the Paired Variables list. Click OK to run the procedure. If there are rows of asterisks in some columns of the output, double-click the table and drag the column borders to make the columns wider. The results show that the final weight is significantly different from the beginning weight, as indicated by the small probability displayed in the Sig. (2-tailed) column of the Paired Samples Test table.
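The paired t statistic is the mean of the case-by-case differences divided by its standard error, with n - 1 degrees of freedom. The Python sketch below uses made-up weights, not dietstudy.sav. It stops at t and df: the two-tailed significance needs the t distribution's cumulative distribution function, which the Python standard library does not provide.

```python
import math
import statistics


def paired_t(before, after):
    """Paired-samples t statistic and degrees of freedom for the before - after differences."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1


# Made-up beginning and final weights for four participants.
wgt0 = [180, 200, 170, 190]
wgt4 = [176, 194, 168, 186]
t, df = paired_t(wgt0, wgt4)
print(round(t, 3), df)  # 4.899 3
```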

More about Comparing Means


The following examples suggest some ways in which you can use other procedures to compare means.

Independent-Samples T Test. When you use a t test to compare means of one variable across independent groups, the samples are independent. Males and females in the demo.sav file can be divided into independent groups by the variable Gender (gender). You can use a t test to determine if the mean household incomes of males and females are the same.

One-Sample T Test. You can test whether the household income of people with college degrees differs from a national or state average. Use Select Cases on the Data menu to select the cases with Level of Education (ed) >= 4. Then, run the One-Sample T Test procedure to compare Household income in thousands (income) and the test value 75.
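Both tests compare a difference in means with its standard error. The Python sketch below covers the one-sample t statistic and the independent-samples statistic under the equal-variances (pooled) assumption; SPSS also reports an equal-variances-not-assumed row, which is not reproduced here. The data are made up, and significance levels are omitted because the standard library has no t distribution.

```python
import math
import statistics


def one_sample_t(values, test_value):
    """t = (mean - test value) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - test_value) / se, n - 1


def pooled_t(x, y):
    """Independent-samples t assuming equal variances, with nx + ny - 2 degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2


# Made-up incomes (thousands) compared with the test value 75.
print(one_sample_t([70, 80, 90], 75))
# Made-up incomes for two independent groups.
print(pooled_t([1, 2, 3], [4, 5, 6]))
```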

One-Way ANOVA. The variable Level of Education (ed) divides respondents into five independent groups by level of education. You can use the One-Way ANOVA procedure to test whether Household income in thousands (income) means for the five groups are significantly different.
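One-way ANOVA compares the spread of the group means with the spread of cases within groups. A minimal Python sketch of the F statistic and its degrees of freedom, using made-up groups; the significance level, which requires the F distribution, is not computed.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return F = MS(between) / MS(within) and its two degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Made-up incomes for three education groups.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```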

ANOVA Models
The General Linear Model submenu on the Analyze menu provides techniques for testing univariate analysis-of-variance models. (If you have only one factor, you can use the One-Way ANOVA procedure on the Compare Means submenu.)

Univariate Analysis of Variance


The GLM Univariate procedure can perform an analysis of variance for factorial designs. A simple factorial design can be used to test if a person's household income and job satisfaction affect the number of years with current employer. From the menus choose:
Analyze > General Linear Model > Univariate...

This opens the Univariate dialog box.


Univariate dialog box

Select Years with current employer (employ) as the dependent variable. Select Income category in thousands (inccat) and Job satisfaction (jobsat) as fixed factors. Click OK to run the procedure.

In the Tests of Between-Subjects Effects table, you can see that the effects of income and job satisfaction are statistically significant and that the observed significance level of the interaction of income and job satisfaction is 0.000 (that is, less than 0.0005 when rounded to three decimal places). For further interpretation, consult a statistics or data analysis textbook.

Correlating Variables
The Correlate submenu on the Analyze menu provides measures of association for two or more numeric variables. The examples in this topic use the data file Employee data.sav.

Bivariate Correlations
The Bivariate Correlations procedure computes statistics such as Pearson's correlation coefficient. Correlations measure how variables or rank orders are related.

Correlation coefficients range in value from -1 (a perfect negative relationship) to +1 (a perfect positive relationship). A value of 0 indicates no linear relationship. For example, you can use Pearson's correlation coefficient to see if there is a strong linear association between Current Salary (salary) and Beginning Salary (salbegin) in the data file Employee data.sav.
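Pearson's coefficient is the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. A minimal Python sketch with made-up salary pairs, not Employee data.sav:

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up beginning and current salaries (thousands).
salbegin = [15, 18, 21, 30]
salary = [28, 35, 40, 57]
print(round(pearson_r(salbegin, salary), 4))
```

A perfectly increasing straight-line relationship gives +1 and a perfectly decreasing one gives -1, matching the range described above.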

Partial Correlations
The Partial Correlations procedure calculates partial correlation coefficients that describe the relationship between two variables while adjusting for the effects of one or more additional variables. You can estimate the correlation between Current Salary (salary) and Beginning Salary (salbegin), controlling for the linear effects of Months since Hire (jobtime) and Previous Experience (prevexp). The number of control variables determines the order of the partial correlation coefficient. To carry out this Partial Correlations procedure, use the following steps: Open the Employee data.sav file. It is usually in the directory where SPSS is installed. From the menus choose:
Analyze > Correlate > Partial...

This opens the Partial Correlations dialog box.


Partial Correlations dialog box

Select Current Salary (salary) and Beginning Salary (salbegin) and move them to the Variables list. Select Months since Hire (jobtime) and Previous Experience (prevexp) and move them to the Controlling For list. Click OK to run the procedure. The output shows a table of partial correlation coefficients, the degrees of freedom, and the significance level for the pair Current Salary (salary) and Beginning Salary (salbegin).
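A partial correlation can be built up one control variable at a time with the recursive formula r(xy.z) = (r(xy) - r(xz) r(yz)) / sqrt((1 - r(xz)^2)(1 - r(yz)^2)), where each term is itself already adjusted for any earlier controls. The Python sketch below applies it to made-up columns stored in a dictionary (the column names are hypothetical); the degrees of freedom for k control variables are n - 2 - k.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def partial_r(data, x, y, controls):
    """Correlation of columns x and y controlling for the listed columns (recursive formula)."""
    if not controls:
        return pearson_r(data[x], data[y])
    z, rest = controls[-1], controls[:-1]
    r_xy = partial_r(data, x, y, rest)
    r_xz = partial_r(data, x, z, rest)
    r_yz = partial_r(data, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data; with two controls, use e.g. partial_r(data, "x", "y", ["z", "w"]).
data = {"x": [2, 1, 4, 3, 6], "y": [1, 3, 2, 5, 4], "z": [1, 2, 3, 4, 5]}
print(round(partial_r(data, "x", "y", ["z"]), 4))
```

Here x and y are weakly positively correlated overall, but once the shared trend in z is removed, the partial correlation is strongly negative: controlling for other variables can change the picture substantially.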

Regression Analysis
The Regression submenu on the Analyze menu provides regression techniques.

Linear Regression
The Linear Regression procedure examines the relationship between a dependent variable and a set of independent variables. You can use it to predict a person's household income (the dependent variable) from independent variables such as age, number in household, and years with employer. From the menus choose:
Analyze > Regression > Linear...

This opens the Linear Regression dialog box.

Linear Regression dialog box

Select Household income in thousands (income) and move it to the Dependent list. Select Age in years (age), Number of people in household (reside), and Years with current employer (employ) and move them to the Independent(s) list. Click OK to run the procedure. The output contains goodness-of-fit statistics and the partial regression coefficients for the variables.

Examining fit. To see how well the regression model fits your data, you can examine the residuals and other types of diagnostics that this procedure provides. In the Linear Regression dialog box, click Save to see a list of the new variables that you can add to your data file. If you generate any of these variables, they will not be available in a later session unless you save the data file.

Methods. If you have collected a large number of independent variables and want to build a regression model that includes only variables that are statistically related to the dependent variable, you can select a method from the drop-down list. For example, if you select Stepwise in the above example, only variables that meet the criteria in the Linear Regression Options dialog box are entered in the equation.
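The regression coefficients are the least-squares solution of the normal equations (X'X)b = X'y, where X holds a column of 1s for the constant plus one column per independent variable. The Python sketch below solves them by Gauss-Jordan elimination for made-up data built to follow y = 1 + 2*x1 + 3*x2 exactly; it reports coefficients only, not the standard errors, R squared, or other fit statistics that SPSS prints.

```python
def ols(y, predictors):
    """Least-squares coefficients [constant, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(xs) for xs in zip(*predictors)]  # design matrix with a constant
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = []
    for i in range(p):
        xtx_row = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        xty = sum(r[i] * yi for r, yi in zip(rows, y))
        a.append(xtx_row + [xty])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Made-up data satisfying y = 1 + 2*x1 + 3*x2.
x1 = [0, 1, 2, 3, 1]
x2 = [1, 0, 1, 2, 3]
y = [4, 3, 8, 13, 12]
print([round(c, 6) for c in ols(y, [x1, x2])])
```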

Nonparametric Tests
The Nonparametric Tests submenu on the Analyze menu provides nonparametric tests for one sample or for two or more paired or independent samples. Nonparametric tests do not require assumptions about the shape of the distributions from which the data originate.

Chi-Square
The Chi-Square Test procedure is used to test hypotheses about the relative proportion of cases falling into several mutually exclusive groups. You can test the hypothesis that people who participated in the survey occur in the same proportions of gender as the general population (50% males, 50% females). In this example, you will need to recode the string variable Gender (gender) into a numeric variable before you can run the procedure. From the menus choose:
Transform > Automatic Recode...

This opens the Automatic Recode dialog box.

Automatic Recode dialog box

Select the variable Gender (gender) and move it to the Variable -> New Name list. Type gender2 in the New Name text box, and then click the New Name button. Click OK to run the procedure.

This creates a new numeric variable called gender2, which has a value of 1 for females and a value of 2 for males. Now a chi-square test can be run with a numeric variable. From the menus choose:
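Automatic Recode assigns consecutive integers to the distinct values in ascending sort order, which is why the female code sorts first. A minimal Python sketch, assuming the string codes are 'f' and 'm' (hypothetical here):

```python
def automatic_recode(values):
    """Map each distinct value to 1, 2, ... in ascending sort order."""
    codes = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [codes[v] for v in values], codes


gender = ["m", "f", "f", "m", "f"]
gender2, mapping = automatic_recode(gender)
print(mapping)  # {'f': 1, 'm': 2}
print(gender2)  # [2, 1, 1, 2, 1]
```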
Analyze > Nonparametric Tests > Chi-Square...

This opens the Chi-Square Test dialog box.

Chi-Square Test dialog box

Select Gender (gender2) as the test variable. Select All categories equal, since, in the general population of working age, the number of males and females is approximately equal.


Click OK to run the procedure.

The output shows a table of the expected and residual values for the categories. The significance of the chi-square test is 0.6. Consult a statistics or data analysis textbook for more information on interpretation of the statistics.
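The goodness-of-fit statistic sums (observed - expected)^2 / expected over the categories, with k - 1 degrees of freedom. The Python sketch below uses made-up counts, not demo.sav, with all categories expected equal. For the one-degree-of-freedom case only, the upper-tail probability can be computed exactly as erfc(sqrt(x/2)), since a chi-square variable with 1 df is a squared standard normal.

```python
import math


def chi_square_equal(observed):
    """Goodness-of-fit chi-square against equal expected counts in every category."""
    n, k = sum(observed), len(observed)
    expected = n / k
    residuals = [o - expected for o in observed]
    statistic = sum(r * r / expected for r in residuals)
    return statistic, k - 1, expected, residuals


def p_value_df1(statistic):
    """Upper-tail probability for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Made-up counts of females and males.
stat, df, expected, residuals = chi_square_equal([52, 48])
print(stat, df, expected, residuals)
print(round(p_value_df1(stat), 3))
```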
