PASW Statistics 17 (SPSS 17)

INFORMATION TECHNOLOGY SERVICES

California State University, Los Angeles

www.youtube.com/mycsula

Version 1.0 Winter 2010

Table of Contents

Introduction Part 1
Downloading the Data Files
Starting PASW Statistics
The PASW Statistics Window
Data View
Variable View
Creating a Data File
Defining Variables
Data Entry
Descriptive Statistics
Frequency Analysis
Crosstabs
Data Manipulation
Select Cases
Splitting a File
Find and Replace
Reporting
Appendix

Introduction Part 2
Downloading the Data Files
Null Hypothesis
Statistical Tests
Tests of Significance
Correlations
Paired-Samples T Test
Independent-Samples T Test
Multiple Response Sets
Multiple Response Frequencies
Multiple Response Crosstabs
Data Manipulation
Copying and Pasting Variable Properties
Inserting Variables and Cases
Deleting Variables and Cases
Merging Data Files
Creating the Data File for Merging
Inputting the Data in Variable View
Merging the Data Files
Appendix

Introduction Part 3
Downloading the Data Files
Simple Regression
Scatter Plot
Predicting Values of Dependent Variables
Predicting This Year's Sales with Simple Regression Model
Multiple Regression
Predicting Values of Dependent Variables
Predicting This Year's Sales with Multiple Regression Model
Data Transformation
Computing
Polynomial Regression
Regression Analysis
Analyzing the Results
Chart Editing
Adding a Line to the Scatter Plot
Manipulating the Scales on X- and Y-axes
Adding a Title to the Chart
Adding Colors to the Chart
Filling a Background Color

Introduction Part 4
Downloading the Data Files
Chi-Square
Chi-Square Test for Goodness-of-Fit
With Fixed Expected Values
With Fixed Expected Values and within a Contiguous Subset of Values
With Customized Expected Values
One-Way Analysis of Variance
Post Hoc Tests
Two-Way Analysis of Variance
Importing/Exporting Microsoft Excel and PowerPoint
Using Scripting for Redundant Statistical Analyses

Introduction Part 1

PASW stands for Predictive Analytics Software. This program can be used to analyze data

collected from surveys, tests, observations, etc. It can perform a variety of data analyses and

presentation functions, including statistical analysis and graphical presentation of data. Among

its features are modules for statistical data analysis. These include 1) descriptive statistics, such

as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and

multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis,

cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for

survey research, though by no means is it limited to just this topic of exploration.

This handout (Descriptive Statistics) introduces basic skills necessary to run PASW Statistics. It

includes how to create a data file and run descriptive statistics. It is especially tailored to answer

three research questions formulated in the sample survey questionnaire, eventually giving users

an overview of how PASW Statistics can be used for survey research. The three research

questions formulated in the sample survey are as follows:

1. What kind of computer do people prefer to own?

2. What color do people prefer for their computer?

3. Is computer color preference different between genders?

Downloading the Data Files

This handout includes sample data files that can be used for hands-on practice. The data files are

stored in a self-extracting archive. The archive must be downloaded and executed in order to

extract the data files.

• The data files used with this handout are available for download at

http://www.calstatela.edu/its/training/datafiles/pasw17p1.exe.

• Instructions on how to download and extract the data files are available at

http://www.calstatela.edu/its/docs/download.php.

Starting PASW Statistics

The following steps are for starting PASW

Statistics 17 using the computers in the Open

Access Labs (OALs). The steps for starting

the program at home or on other computers

may be slightly different.

To start PASW Statistics 17:

1. Click the Start button, point to All

Programs, point to Course Work,

point to SPSS Inc, point to PASW

Statistics 17, and select PASW

Statistics 17. The PASW Statistics 17

dialog box opens (see Figure 1).

2. Click the Cancel button to create a

new data file.

Figure 1 - PASW Statistics 17 Dialog Box

For additional SPSS help, visit http://www.youtube.com/mycsula

The PASW Statistics Window

The Data Editor window opens with two view tabs: Data View and Variable View. The Data

View is used for data input, and the Variable View is used for adding variables and defining

variable properties (e.g., modifying attributes of variables). As displayed in Figure 2, the Data

Editor window includes several components. The Title bar displays the name of the current file

and the application. The Menu bar allows you to access various commands that are grouped

according to function. The Toolbar provides shortcuts to commonly used menu commands.

Figure 2 - PASW Statistics Data Editor Window

DATA VIEW

When PASW Statistics is launched, the Data Editor window opens in Data View, which looks

similar to a Microsoft Excel spreadsheet (which is just an array of rows and columns). The

difference is that the rows and columns in Data View are referred to as cases and variables,

respectively (see Table 1).

Table 1 - Elements in Data View

Element Description

Variable

Each column represents a variable. Any survey questionnaire item or test

item can be a variable. Commonly defined variable types are numeric or

string. When defining variables as numeric, users need to specify decimal

places. Variable names can be up to 64 characters long and must start

with a letter. Make variable names meaningful and easily recognizable.

Case

Each row represents a case. The participants in the study can be cases. For

example, if 100 participants are involved in your study, then 100 cases (or

rows) of information should be generated. Responses to the question

items should be entered consistently from left to right for each participant.

Cell

A cell is an intersection between cases and variables. Each response to a

survey question should be entered in a cell for each participant according

to the defined variable data types.

VARIABLE VIEW

Variable View is where variables are defined by assigning variable names and specifying the

attributes, such as data type (String, Date, Numeric, etc.), value labels, and measurement

scales (Nominal, Ordinal, or Scale). Users can think of Variable View as the backbone

structure for the Data View; data cannot be entered nor viewed without first defining variables in

Variable View (see Table 2).

Table 2 - Elements in Variable View

Element Description

Variable Name

PASW Statistics will initially give a default variable name (var00001) that

users can change. It is recommended to assign a brief and meaningful

name to variables (e.g., Name, Gender, and GPA).

Variable Type

The variable type determines how the cases are entered. Generally, text-

based characters are of String type and number-based characters are of

Numeric type. For example, if a user has a variable called Name,

then its variable type should be String. Similarly, a variable named

GPA should be a Numeric type with (normally two) decimal places.

Value Labels

Value labels allow users to describe what each coded value of a variable stands for. For example, if the variable Gender stores the codes 1 and 2, others may not know which code means female. To avoid misinterpretation, value labels can be used to define each code (e.g., 1 = female and 2 = male). The variable name itself is described in the separate Label column.

Creating a Data File

Creating a new PASW Statistics data file consists of two stages: (1) defining variables and (2)

entering the data. Defining the variables involves multiple processes and requires careful

planning. Once the variables have been defined, the data can then be added.

DEFINING VARIABLES

First, variable names based on your research questionnaire need to be assigned. If variable names

are not assigned, PASW Statistics will assign default names that may not be recognizable.

Second, the Type attribute should be specified for each variable. If necessary, assign labels to

values to help all users of the file understand the data better.

To define variables (example):

1. Click the Variable View tab at the lower left corner of the Data Editor window (see

Figure 3).

2. Type [Name] in the first cell under the Name column and press the [Enter] key.

3. Under the Type column, click the ellipses button. The Variable Type dialog box opens

(see Figure 4).

4. Select the String option.

5. Click the OK button.

Figure 3 - Variable View Tab

Figure 4 - Variable Type Dialog Box

6. Type [Gender] in row two under the Name column.

7. Activate the cell in row two under the Decimals column and change the entry to 0

using the spin box.

8. Type [What is your gender?] in row two under the Label column.

9. Click the ellipses button in row two under the Values column. The Value Labels dialog

box opens (see Figure 5).

10. Type [1] in the Value: box.

11. Type [female] in the Label: box.

12. Click the Add button.

13. Repeat steps 10-12 using a value of [2] and a label of [male].

Figure 5 - Value Labels Dialog Box (Gender)

14. Click the OK button.

15. Type [GPA] in row three under the Name column and press the [Enter] key.

16. Type [Age] in row four under the Name column.

17. Click row four under the Decimals column and change the entry to 0 using the spin

box.

18. Type [What is your age?] in row four under the Label column.

19. In row four under the Values column, click the ellipses button. The Value Labels dialog

box opens (see Figure 6).

20. Type [1] in the Value: box.

21. Type [19 or younger] in the Label: box.

22. Click the Add button.

23. Repeat steps 20-22 for values [2] through [5] and label them as shown in Table 3 (you

may also refer back to the sample questionnaire). See Figure 6 for the results.

24. Click the OK button.

Table 3 - Value Labels

Value Label

2 20-23

3 24-27

4 28-31

5 32 or over

Figure 6 - Value Labels Dialog Box (Age)
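
The value labels defined in the steps above amount to a lookup from a numeric code to a piece of text. The following is a minimal Python sketch of that idea, not PASW Statistics syntax. The codes and labels are the ones used in this handout; the dictionary and function names are hypothetical and chosen only for illustration.

```python
# Value labels as code-to-text lookups (illustrative only; not PASW syntax).
GENDER_LABELS = {1: "female", 2: "male"}
AGE_LABELS = {
    1: "19 or younger",
    2: "20-23",
    3: "24-27",
    4: "28-31",
    5: "32 or over",
}


def label_for(code, labels):
    """Return the label for a code, or the bare code as text if it is unlabeled."""
    return labels.get(code, str(code))


print(label_for(1, GENDER_LABELS))  # labeled code
print(label_for(5, AGE_LABELS))     # labeled code
print(label_for(9, AGE_LABELS))     # unlabeled code falls back to "9"
```

An unlabeled code falls back to the bare number, which is how an undefined value label shows up in output tables as a number without any explanation.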

DATA ENTRY

After defining the variables, users can enter data for each case. If variables are defined as having

a Numeric data type, then numeric data should be entered. PASW Statistics will only accept

numeric values (digits 0-9, optionally with a sign and a decimal point) for a Numeric data type. If variables are defined as String data, any

keyboard character can be entered.

To enter data:

1. Click the Data View tab at the lower left corner of the Data Editor window (see Figure

7).

2. Click in a cell and type the corresponding data. The entry will also appear in the Cell

Editor (see Figure 8).

Figure 7 - Data View Tab

Figure 8 - Data Entry

Descriptive Statistics

After data has been entered, users may begin analyzing the data by using descriptive statistics.

Descriptive statistics are the most commonly used statistics for summarizing data frequency or

measures of central tendency (mean, median, and mode).

Research Question # 1

What kind of computer do people prefer to own?

FREQUENCY ANALYSIS

We can use frequency analysis to answer the first research question. Frequency analysis is a

descriptive statistical method that shows the number of occurrences of each response chosen by

the respondents. When using frequency analysis, PASW Statistics can also calculate the mean,

median, and mode to help users analyze the results and draw conclusions. The following

example will use a frequency analysis to answer Research Question # 1 (What kind of computer

do people prefer to own?) using the data collected from our sample survey (see Appendix).

To perform frequency analysis:

1. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.

2. Locate and open the Part 1.sav file.

3. Click the Analyze menu, point to Descriptive Statistics, and select Frequencies (see

Figure 9). The Frequencies dialog box opens (see Figure 10).

4. Select the variable(s) to be analyzed. In this case, select the variable Computer

Owned from the list box on the left.

5. Click the transfer arrow button. The selected variable is moved to the Variable(s): list

box.

6. Select the Display frequency tables check box if necessary.

Figure 9 - Frequency Analysis from Analyze Menu

Figure 10 - Frequencies Dialog Box

7. Click the Statistics button. The Frequencies: Statistics dialog box opens (see Figure

11).

8. Select the Mean, Median, and Mode check boxes in the Central Tendency section; select

the Std. deviation check box in the Dispersion section.

Figure 11 - Frequencies: Statistics Dialog Box

9. Click the Continue button. This returns you to the Frequencies dialog box.

10. Click the OK button. An Output Viewer window opens and displays the statistics and

frequency table (see Figure 12). The columns of the table Computer Owned display the

Frequency, Percent, Valid Percent, and Cumulative Percent for each different

type of computer owned.

Figure 12 - Frequencies Output
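
The four columns of the frequency table can be reproduced by hand. The sketch below is plain Python rather than PASW Statistics syntax, and the response list is invented for illustration (it is not the Part 1.sav data). It assumes that Percent is computed over all cases, while Valid Percent and Cumulative Percent are computed over non-missing cases only.

```python
from collections import Counter

# Hypothetical coded answers to survey question 7 (not the Part 1.sav data).
# None stands for a missing answer.
responses = [3, 3, 1, 2, 3, 4, 3, None, 2, 3]
LABELS = {1: "Toshiba", 2: "Apple", 3: "IBM or Compatible", 4: "Other", 5: "None"}


def frequency_table(values):
    """Return rows of (code, frequency, percent, valid percent, cumulative percent)."""
    total = len(values)                            # all cases, including missing
    valid = [v for v in values if v is not None]   # non-missing cases only
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for code in sorted(counts):
        freq = counts[code]
        percent = 100.0 * freq / total
        valid_percent = 100.0 * freq / len(valid)
        cumulative += valid_percent
        rows.append((code, freq, percent, valid_percent, cumulative))
    return rows


for code, freq, pct, valid_pct, cum_pct in frequency_table(responses):
    print(f"{LABELS[code]:<18} {freq:>3} {pct:6.1f} {valid_pct:6.1f} {cum_pct:6.1f}")
```

The row with the highest frequency corresponds to the mode reported in the Statistics table.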

The measures of central tendency (mean, median, and mode) can be used to summarize various

types of data. Mode can be used for nominal data, such as computer type, computer color,

ethnicity, etc. Mean or median can be used for interval/ratio data, such as test scores, age, etc.

The median is the better choice for data with a skewed distribution, because extreme values pull the mean toward the tail.
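
A short sketch of how the three measures behave, using Python's standard statistics module. The numbers and color names are invented for illustration. With one very large value in the list, the mean moves well away from the typical value while the median does not; for nominal data only the mode is meaningful.

```python
import statistics

# Hypothetical monthly incomes with one very large value (right-skewed).
incomes = [1000, 1200, 1300, 1400, 9000]
mean_income = statistics.mean(incomes)      # pulled upward by the large value
median_income = statistics.median(incomes)  # stays at the middle value

# Hypothetical nominal data: categories have no numeric meaning, so use the mode.
colors = ["beige", "black", "black", "gray", "black"]
most_common_color = statistics.mode(colors)

print(mean_income, median_income, most_common_color)
```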

Answer to Research Question # 1

What kind of computer do people prefer to own?

Answer: IBM or Compatible

Explanation: Look at question # 7 in the Sample Survey. Notice that option # 3 is IBM or

Compatible. In the output Statistics table, the mode for Computer Owned is 3, which is

IBM or Compatible. In addition, the frequency analysis results for Computer Owned

indicate that 49 out of 80 people own an IBM or Compatible computer. This can be

considered their preference.

Research Question # 2

What color do people prefer for their computer?

CROSSTABS

Crosstabs are used to examine the relationship between two variables. To answer the second

research question, users will need to analyze two variables: Computer Owned and Color

(which indicates color preference). Using crosstabs will show the intersection between these two

variables and reveal the computer type and color preferred by most people.

To perform a crosstabs analysis:

1. In Data View, click the Analyze menu, point to Descriptive Statistics, and select

Crosstabs (see Figure 13). The Crosstabs dialog box opens.

2. Select the variable Computer Owned from the list box on the left.

3. Click the transfer arrow button to move it to the Row(s): list box.

4. Select the variable color (see Figure 14).

5. Click the transfer arrow button to move it to the Column(s): list box.

6. Click the OK button. An Output Viewer window opens and displays two tables: Case

Processing Summary and the Crosstabulation matrix (see Figure 15).

Figure 13 - Crosstab Analysis from Analyze Menu

Figure 14 - Crosstabs Dialog Box

Figure 15 - Crosstabs Output
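
A crosstabulation is a count of how often each combination of a row value and a column value occurs. The sketch below shows the idea in plain Python; it is not PASW Statistics syntax, the function name is hypothetical, and the code pairs are invented rather than taken from the sample data file.

```python
from collections import Counter

# Hypothetical (computer owned, color) code pairs, one per respondent.
cases = [(3, 1), (3, 1), (3, 2), (2, 4), (1, 2), (3, 5), (2, 1)]


def crosstab(pairs):
    """Count how often each (row value, column value) combination occurs."""
    cells = Counter(pairs)
    row_values = sorted({r for r, _ in pairs})
    col_values = sorted({c for _, c in pairs})
    # Combinations that never occur are reported as 0.
    table = {r: {c: cells[(r, c)] for c in col_values} for r in row_values}
    return table, row_values, col_values


table, rows, cols = crosstab(cases)
print("     " + " ".join(f"{c:>3}" for c in cols))
for r in rows:
    print(f"{r:>3}: " + " ".join(f"{table[r][c]:>3}" for c in cols))
```

Each printed row is one category of the row variable, and each column is one category of the column variable, which mirrors the layout of the Crosstabulation matrix.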

Answer to Research Question # 2

What color do people prefer for their computer?

Answer: IBM or Compatible in beige color

Explanation: As shown in the Crosstabulation matrix above, IBM or Compatible is the

most preferred computer type from the row variable (Computer Owned). From the column

variable (color), beige is shown as the most preferred color. Therefore, you can conclude

that most people prefer IBM or Compatible computers that are in beige color.

Data Manipulation

Data files are not always ideally organized in a form to meet specific needs. For example, users

may wish to select a specific subject or split the data file into separate groups for analysis.

SELECT CASES

If you have two or more subject groups in your data and you want to analyze each group in isolation, you can use the select cases option. For example, the data we are currently analyzing has both male and female participants. If you wish to analyze only female cases, select cases based on the Gender variable and set the condition to female (Gender = 1).

To select cases for analysis:

1. Click the Data menu and select Select Cases (see Figure 16). The Select Cases dialog

box opens (see Figure 17).

2. Click the If condition is satisfied option.

3. Click the If button. The Select Cases: If dialog box opens.

4. Select the variable Gender in the left list box.

5. Click the transfer arrow button to move it to the right text box.

6. Click the = button.

7. Click the 1 button.

8. Click the Continue button. This takes you back to the Select Cases dialog box.

9. Click the OK button. This takes you back to Data View. All males will be excluded from

the statistical analysis.

10. Rerun the crosstabs analysis by following steps 1-6 of the Crosstabs section of this

handout.

11. Click the OK button. The Output Viewer window updates (see Figure 18).

Figure 16 - Select Cases from Data Menu

Figure 17 - Select Cases Dialog Box
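
Conceptually, the condition Gender = 1 acts as a filter: cases that fail the condition are left out of the analysis but are not deleted from the data. A minimal Python sketch of that behavior follows. The case records and the function name are hypothetical; Gender uses the coding defined earlier in this handout (1 = female, 2 = male).

```python
# Hypothetical cases; Gender is coded 1 = female, 2 = male.
cases = [
    {"Name": "A", "Gender": 1, "Color": 5},
    {"Name": "B", "Gender": 2, "Color": 2},
    {"Name": "C", "Gender": 1, "Color": 1},
]


def select_cases(rows, condition):
    """Return the rows that satisfy the condition; the original list is unchanged."""
    return [row for row in rows if condition(row)]


females = select_cases(cases, lambda row: row["Gender"] == 1)
print([row["Name"] for row in females])
print(len(cases))  # all cases are still present in the data
```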

From the cross tabulation in the Output Viewer window in Figure 18 below, look at the column

for the most preferred color and the row for the computer types. Since we selected only female

cases, what is the computer color most preferred by women? Ten women chose IBM or

Compatible with color option 5. Thus, you may conclude that most female participants prefer

the color 5 for IBM or Compatible computers. However, what does 5 represent? This

problem arose because the value 5 was not labeled as Other. Moreover, even if it were

labeled Other, it does not indicate any particular color, making it difficult to draw a

conclusion. In order to avoid such problems, it is suggested that you provide a blank space where

participants can specify Other color preferences besides the ones specified in the survey

questionnaire.

Figure 18 - Select Cases Output

Example:

What kind of color do you like to have for your computer?

1. Beige 2. Black 3. Gray 4. White 5. Other __________

Research Question # 3

Is computer color preference different between genders?

SPLITTING A FILE

To answer the third research question, we need to split the file. You can analyze one particular

group of subjects using the select cases option. However, if you wish to compare the response or

performance differences by groups within one variable, it is best to use the split files option.

To split a file for analysis:

1. Turn off the select cases option.

2. Click the Data menu and select Select Cases. The Select Cases dialog box opens.

3. Select the All cases option.

4. Click the OK button. Notice that the male cases that were excluded are now all included

in the data file.

5. Click the Data menu and select Split File (see Figure 19). The Split File dialog box

opens (see Figure 20).

Figure 19 - Split File from Data Menu

Figure 20 - Split File Dialog Box

6. Select the variable Gender from the left list box.

7. Select the Compare groups option.

8. Click the transfer arrow button to move the variable Gender to the Groups Based

on: list box.

9. Click the OK button.

10. Rerun the crosstabs analysis by following steps 1-6 of the Crosstabs section of this

handout.

11. Click the OK button. The Output Viewer window crosstabulation table opens (see

Figure 21).

Figure 21 - Split File Output Data
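
Splitting a file and rerunning crosstabs is equivalent to grouping the cases by the split variable and then counting combinations within each group. The sketch below illustrates this in plain Python with invented data; the function name is hypothetical and the code is not PASW Statistics syntax.

```python
from collections import Counter, defaultdict

# Hypothetical (gender, computer owned, color) codes, one tuple per respondent.
cases = [(1, 3, 5), (1, 3, 5), (1, 2, 1), (2, 3, 2), (2, 3, 2), (2, 1, 4)]


def split_crosstab(rows):
    """Group cases by gender, then count (computer, color) pairs in each group."""
    groups = defaultdict(Counter)
    for gender, computer, color in rows:
        groups[gender][(computer, color)] += 1
    return groups


for gender, cells in sorted(split_crosstab(cases).items()):
    (computer, color), count = cells.most_common(1)[0]
    print(f"gender={gender}: most frequent cell is "
          f"computer={computer}, color={color} ({count} cases)")
```

Comparing the most frequent cell of each group side by side is the same comparison made when reading the split crosstabulation output.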

Answer to Research Question # 3

Is computer color preference different between genders?

Answer: Yes

Explanation: There is a difference in computer color preference based on gender. From the crosstabulation output, females prefer IBM or Compatible computers in the Other color over beige, black, gray, or white. The male group prefers IBM or Compatible computers in black.

FIND AND REPLACE

In PASW Statistics, the Find and Replace function offers an efficient way to locate and change values. Users can use both Find and Replace in Data View; in Variable View, only the Find function is available.

To use the Find and Replace function:

1. Click the Edit menu and select Find. The Find and Replace dialog box opens (see

Figure 22).

2. In the Find: box, type [Clinton].

3. Select the Replace check box to replace Clinton with another word.

4. Click in the Replace with: box, and type the name [Cliff].

5. Click the Show Options button.

6. Under Match to, select the Entire cell option.

7. Click the Replace All button.

Figure 22 - Find and Replace Dialog Box (Data View)

NOTE: Under the Match to section of the Find and Replace dialog box (see Figure 22),

Contains means PASW Statistics will find each instance of the word/phrase/number appearing in

a cell, whether or not it is the only information enclosed. The Entire cell option will find the

word/phrase/number only when it matches the entire contents of a cell. The Begins with and Ends

with options find cells that begin or end with the text entered by the user.
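
The four Match to options can be pictured as four different string tests. The Python sketch below is only an illustration of the matching logic, not of how PASW Statistics is implemented. The function names are hypothetical, matching is assumed to be case-sensitive, and a replace is assumed to change only the found text inside a matching cell.

```python
def matches(cell, text, mode="contains"):
    """Return True if the cell matches the search text under the given mode."""
    if mode == "contains":
        return text in cell
    if mode == "entire cell":
        return cell == text
    if mode == "begins with":
        return cell.startswith(text)
    if mode == "ends with":
        return cell.endswith(text)
    raise ValueError(f"unknown mode: {mode}")


def replace_all(cells, find, replacement, mode="entire cell"):
    """Replace the found text in every matching cell; other cells are unchanged."""
    return [c.replace(find, replacement) if matches(c, find, mode) else c
            for c in cells]


names = ["Clinton", "Clinton Jr.", "Bill Clinton", "Cliff"]  # hypothetical cells
print(replace_all(names, "Clinton", "Cliff"))              # exact-match cells only
print(replace_all(names, "Clinton", "Cliff", "contains"))  # any cell containing it
```

With the Entire cell option, only the cell that holds exactly Clinton changes, which is why the steps above select that option before clicking Replace All.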

Reporting

Once the statistical analysis is complete, the final step is to create a report. In the report, you may

include PASW Statistics output (e.g., graphs and tables) for supporting your analysis. Using the

Copy and Paste functions, the tables/graphs generated in PASW Statistics can be copied from the

Output Viewer window and pasted into a Microsoft Word document without having to create

new tables or graphs.

To create a report using Microsoft Word:

1. In the Output Viewer window, right-click a table. A box appears around the table and a red arrow appears to its left, indicating that the table is selected.

2. Select Copy from the shortcut menu.

3. Open Microsoft Word.

4. Right-click in the Word document and select Paste from the shortcut menu. The table is

pasted into the Word document.

Appendix

SAMPLE SURVEY

Research Questions

1. What kind of computer do people prefer to own?

2. What color do people prefer for their computer?

3. Is computer color preference different between genders?

Survey Questions

1. What is your name? ____________________________

2. What is your gender? ____________________________

3. What is your G.P.A.? ____________________________

4. What is your age?

1. 19 or younger 2. 20-23 3. 24-27 4. 28-31 5. 32 or over

5. How much do you make in a month?

1. Less than $1000 2. $1000-$1499 3. $1500-$1999 4. $2000-$2499 5. Over $2500

6. What is your class standing?

1. Freshman 2. Sophomore 3. Junior 4. Senior 5. Graduate

7. What kind of computer do you own?

1. Toshiba 2. Apple 3. IBM or Compatible 4. Other 5. None

8. What kind of computer have you used?

1. IBM or Compatible 2. Apple 3. Toshiba 4. Other 5. None

9. What color do you like to have for your computer?

1. Beige 2. Black 3. Gray 4. White 5. Other

Introduction Part 2

PASW stands for Predictive Analytics Software. This program can be used to analyze data

collected from surveys, tests, observations, etc. It can perform a variety of data analyses and

presentation functions, including statistical analysis and graphical presentation of data. Among

its features are modules for statistical data analysis. These include 1) descriptive statistics, such

as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and

multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis,

cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for

survey research, though by no means is it limited to just this topic of exploration.

This handout (Test of Significance) introduces 1) several data entry and data manipulation

techniques that help you save time, 2) basic skills to perform tests of significance, such as

correlations and t tests, and 3) an introduction to multiple response sets. The step-by-step

instructions will help you understand how to interpret the output of your tests from data supplied

by your research question(s). Follow the steps carefully to get appropriate results. Please note

that a slightly different process might yield unexpected and complicated results. This is a

continuation of the PASW Statistics Descriptive Statistics handout.

Downloading the Data Files

This handout includes sample data files that can be used for hands-on practice. The data files are

stored in a self-extracting archive. The archive must be downloaded and executed in order to

extract the data files.

• The data files used with this handout are available for download at

http://www.calstatela.edu/its/training/datafiles/pasw17p2.exe.

• Instructions on how to download and extract the data files are available at

http://www.calstatela.edu/its/docs/download.php.

Null Hypothesis

The null hypothesis (H0) represents a theory that has been presented, either because it is believed

to be true or because it is to be used as a basis for an argument. It is a statement that has not been

proven. It is also important to realize that the null hypothesis is the statement of no difference.

For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is

no better, on average, than the current drug (in other words, the new drug exhibits the same

behavior as the old drug). The null hypothesis (Hu) and the alternative hypothesis (H1) can be

stated as:

H!: There is no difference between the two drugs.

H": Theie is a significant uiffeience between the two uiugs.

Special consideration is given to the null hypothesis. This is due to the fact that the null

hypothesis relates to the statement being tested, whereas the alternative hypothesis relates to the

statement to be accepted if and when the null is rejected.

The final conclusion, once the test has been carried out, is always given in terms of the null hypothesis. The result is either "Reject H0 in favor of H1" or "Do not reject H0"; the conclusion is never "Reject H1" or "Accept H1."

If the conclusion is "Do not reject H0," this does not necessarily mean that the null hypothesis is true. It only suggests that there is not sufficient evidence against H0 in favor of H1. Rejecting the null hypothesis then suggests that the alternative hypothesis may be true.

NOTE: The null hypothesis essentially states that the given cases or items under consideration are statistically the same or exhibit the same behavior without any significant difference. The alternative hypothesis states that the given cases exhibit different behavior or that they have a statistically significant difference.
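
The decision rule described above can be sketched in a few lines of Python. This is only an illustration and is not part of PASW Statistics: the function name conclude is hypothetical, and the default significance level of 0.05 is an assumption taken from the cutoff used in the explanations in this handout.

```python
def conclude(p_value: float, alpha: float = 0.05) -> str:
    """Phrase a test result in terms of the null hypothesis (H0).

    alpha is the significance level; 0.05 is assumed here as the default.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        # Enough evidence against H0.
        return "Reject H0 in favor of H1"
    # Not enough evidence; this does not prove that H0 is true.
    return "Do not reject H0"


print(conclude(0.001))
print(conclude(0.20))
```

Note that the function never returns "Accept H1" or "Reject H1"; both possible outcomes are worded in terms of H0.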

Statistical Tests

Statistics is a set of mathematical techniques used to summarize research data and determine

whether the data supports a proposed hypothesis. PASW Statistics includes tools that can be used

to analyze variables and determine the strength and nature of the relationship between two

variables and whether the means (averages) of two data sets (samples) are statistically the same

or different.

Tests of Significance

The following examples are sample research questions that can be answered using PASW

Statistics analytical methods.

CORRELATIONS

A correlation is a statistical device that measures the strength or degree of a supposed linear association between two or more variables. One of the more common measures used is the Pearson correlation, which estimates the relationship between two interval variables.

Research Question # 1

Is there a relationship between academic performance and Internet access?

H0: There is no relationship between academic performance and Internet access.

H1: There is a significant relationship between academic performance and Internet access.

To run a correlation analysis:

1. Locate and open the Part 2.sav file.

2. Click the Analyze menu, point to Correlate, and select Bivariate. The Bivariate

Correlations dialog box opens (see Figure 23).

3. Select the variables active, posttest, and gpa in the list box on the left.

4. Click the transfer arrow button to move them to the Variables: list box.

5. Select the Pearson check box and the Two-tailed option if necessary.

6. Click the OK button. The Output Viewer window opens with a Correlations table

(see Figure 24).

Figure 23 - Bivariate Correlations Dialog Box

Figure 24 - Bivariate Correlations Output Table

The Answer to Research Question # 1

Is there a relationship between academic performance and Internet access?

Answer: Yes

Explanation: As shown in Figure 24 above, the correlation coefficient for the relationship between active and posttest is 0.476, which falls between 0.4 and 0.7. The correlation coefficient for the relationship between active and gpa is 0.448, which also falls between 0.4 and 0.7. The results from these analyses indicate that there is a moderate, positive relationship between academic performance and Internet access.
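
For readers who want to see what the Pearson coefficient measures, the sketch below computes it by hand in Python. The function names pearson_r and describe_strength are hypothetical, and the labels "weak" and "strong" for values outside the 0.4-0.7 range are assumptions made for illustration; only the "moderate" band comes from this handout.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r):
    """Label the size of r; only the 0.4-0.7 band is from the handout."""
    size = abs(r)
    if size < 0.4:
        return "weak"      # assumed label
    if size <= 0.7:
        return "moderate"
    return "strong"        # assumed label


print(describe_strength(0.476))  # coefficient for active and posttest
print(describe_strength(0.448))  # coefficient for active and gpa
```

The sign of r gives the direction of the relationship, and its absolute value gives the strength, which is why describe_strength works on abs(r).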

PAIRED-SAMPLES T TEST

A Paired-Samples T Test is used to test whether an observed difference between two means is statistically significant. To run a t test, the following assumptions should be met: the data 1) are normally distributed, 2) form a large data set, and 3) contain no outliers. If any of these assumptions is not met, a nonparametric test should be used.

Research Question # 2

Is there an instructional effect taking place in the computer class?

H0: There is no influence of using the Internet on academic achievement for this class.

H1: There is an influence of using the Internet on academic achievement for this class.

The null hypothesis is that Internet familiarity does not influence academic achievement in the computer class. The variables that reflect academic achievement are pretest and posttest.

To run a Paired-Samples T Test:

1. Click the Analyze menu, point to Compare Means, and select Paired-Samples T

Test. The Paired-Samples T Test dialog box opens (see Figure 25).

2. Select the variables pretest and posttest in the list box on the left.

3. Click the transfer arrow button to move them to the Paired Variables: list box.

4. Click the OK button. The Output Viewer window opens (see Figure 26).

Figure 25 - Paired-Samples T Test Dialog Box

The Answer to Research Question # 2

Is there an instructional effect taking place in the computer class?

Figure 26 - Paired-Samples T Test Output Table

Answer: Yes

Explanation: The observed mean difference is -4.5172. Since the value of t is -3.820 with a Sig. of 0.001 (which is less than 0.05), the mean difference (-4.5172) between pretest and posttest is statistically significant and the null hypothesis is rejected. Therefore, it can be inferred that there was an instructional effect taking place in the computer class.
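
The t value in a paired-samples test comes from the differences within each pair. The sketch below shows the arithmetic in Python using the standard library only. The scores are made up for illustration (they are not the workshop data), and the function name paired_t is hypothetical. The difference is taken as first variable minus second variable, which matches the negative mean difference reported above when posttest scores are higher.

```python
import math
import statistics


def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_error, n - 1


# Hypothetical scores for five students.
pretest = [10, 12, 9, 14, 11]
posttest = [13, 15, 10, 18, 12]

mean_diff, t, df = paired_t(pretest, posttest)
print(mean_diff, t, df)
```

The significance value (Sig.) is then looked up from the t distribution with df degrees of freedom; PASW Statistics does this automatically in the output table.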

INDEPENDENT-SAMPLES T TEST

An Independent-Samples T Test is used to determine the likelihood that two independent data

samples came from populations that have identical means. If this were true, then the difference

between the means should be equal to zero. The null hypothesis in this case would be that the

two means are equal.

Two variables are required in the data set. One variable is the measured parameter; examples include weight, height, or frequency. The second variable divides the data set into two groups. In the example below, Light and Dark are the groups whose means will be compared.

Research Question # 3

Is there a difference in the average number of seedlings grown in the light

and those grown in the dark?

In this example, 20 Petri dishes each contained 10 celery seeds. Ten of the dishes were kept in

the dark for one week; the other 10 were placed under a grow light for the same amount of time.

At the end of the week, the number of seeds that sprouted was counted in each dish.

H0: Variance (light) = variance (dark).

H1: Variance (light) ≠ variance (dark).

H0: There is no difference between seedlings under the light and in the dark (μ (light) = μ (dark)).

H1: There is a significant difference between seedlings under the light and in the dark (μ (light) ≠ μ (dark)).

NOTE: The first set of hypotheses tests the variances, while the second set tests the means. The variances have to be equal before we can determine if the means are equal.

NOTE: Variance is the arithmetic mean of the squared deviations from the mean; it shows how far the individual values are from the mean. If the variances are equal, users can move on to the T Test. If the variances are not equal, users will have to do more testing.
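
The definition of variance in the note above can be checked with a short Python sketch. The data values and the function name mean_squared_deviation are made up for illustration. Dividing by n follows the definition as worded in the note; the sample variance, which divides by n - 1, is shown next to it for comparison because it is the form usually used when working with samples.

```python
import statistics


def mean_squared_deviation(values):
    """Variance as the arithmetic mean of squared deviations (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = mean_squared_deviation(data)  # divides by n
sample_variance = statistics.variance(data)         # divides by n - 1

print(population_variance, sample_variance)
```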

To run the Independent-Samples T Test:

1. Locate and open the Seedlings.sav file.

2. In Data View, click the Analyze menu, point to Compare Means, and select

Independent-Samples T Test. The Independent-Samples T Test dialog box opens (see

Figure 27).

3. Select the Seedlings variable in the list box on the left.

4. Click the transfer arrow button to move the variable to the Test Variable(s): list box.

5. Select the Treatment variable in the list box on the left.

6. Click the transfer arrow button to move the variable to the Grouping Variable: list box.

7. Click the Define Groups button. The Define Groups dialog box opens (see Figure 28).

8. Enter [0] in the Group 1: box, enter [1] in the Group 2: box, and then click the Continue

button.

9. Click the OK button. The Output Viewer window opens with several tables, including

an Independent-Samples Test table (see Figure 29).

Figure 27 - Independent-Samples T Test Dialog Box

Figure 28 - Define Groups Dialog Box

The Answer to Research Question # 3

Is there a difference in the average number of seedlings grown in the light

and those grown in the dark?

Figure 29 - Independent-Samples T Test Output

Answer: Yes

Explanation: The mean difference in seedlings sprouted between the two treatments (light and

dark) was -2.900. The value of t, which is -3.179, was statistically significant (p=0.005).

Therefore, the null hypothesis is rejected.
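
As a rough illustration of where the t value comes from when equal variances are assumed, the sketch below computes a pooled-variance t statistic in Python. The seedling counts are invented (they are not the contents of Seedlings.sav), the function name independent_t is hypothetical, and which treatment is coded as group 1 is an assumption.

```python
import math
import statistics


def independent_t(group1, group2):
    """Pooled-variance t test: (mean difference, t statistic, degrees of freedom)."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 values")
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled = ((n1 - 1) * statistics.variance(group1)
              + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return mean_diff, mean_diff / std_error, n1 + n2 - 2


# Hypothetical numbers of sprouted seeds per dish.
dark = [2, 3, 4]    # assumed to be group 1
light = [5, 6, 7]   # assumed to be group 2

mean_diff, t, df = independent_t(dark, light)
print(mean_diff, t, df)
```

A negative mean difference simply means that group 1 has the smaller mean; the sign depends on the order in which the groups are defined.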

Multiple Response Sets

Very often, a survey will contain questions where the respondent is allowed to select more than

one answer. Managing such questions in PASW Statistics can produce some difficulty. Each

response in a multiple response question should be coded as a separate variable and then grouped

under a multiple response set of variables. The multiple response set can then be analyzed using

frequency counts or crosstabs.

To define a multiple response set of variables:

1. Locate and open the Airlines.sav file.

2. In Data View, click the Analyze menu, point to Multiple Response, and select Define

Variable Sets (see Figure 30). The Define Multiple Response Sets dialog box opens

(see Figure 31).

Figure 30 - Define Variable Sets from Analyze Menu

Figure 31 - Define Multiple Response Sets Dialog Box

3. Select the American, TWA, United, USAir, and Other airline variables and

move them to the Variables in Set: list box.

4. Make sure the Dichotomies option is selected and enter [1] in the Counted value: box.

5. Type [Airlines] in the Name: box.

6. Type [Airline frequency of response] in the Label: box.

7. Click the Add button. The set is created as $Airlines and listed in the Multiple

Response Sets: list box.

8. Click the Close button.

MULTIPLE RESPONSE FREQUENCIES

It is possible to obtain the answer by running a frequency analysis for each of the airline

variables. The result of such an analysis will only provide an overall raw frequency for each

response and will not allow percentage comparisons between the different airlines. A frequency

analysis that uses a multiple response set will provide an appropriate response with concise

output.

Research Question # 4

In a survey of airline passengers, which airline was selected as having been

flown most often in the previous six months?

To analyze the frequency of response for each variable in a multiple response set:

1. Click the Analyze menu, point to Multiple Response, and select Frequencies. The

Multiple Response Frequencies dialog box opens (see Figure 32).

2. Select the multiple response set labeled $Airlines and move it to the Table(s) for: list

box.

3. Click the OK button. An Output Viewer window opens with the frequency analysis (see

Figure 33).

Figure 32 - Multiple Response Frequencies Dialog Box

The Answer to Research Question # 4

In a survey of airline passengers, which airline was selected as having been

flown most often in the previous six months?

Figure 33 - Airline Frequency Analysis Output

Answer: United

Explanation: As seen in the Output Viewer window, there were 18 people surveyed and 44 total responses generated. Of the 44 total responses, United was selected most often with 12 responses (representing 27.3%, the largest portion of the total responses).
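
The percentage in the explanation is the count for one airline divided by the total number of responses (12 / 44 ≈ 27.3%). The Python sketch below shows how a multiple response frequency table can be tallied from dichotomy variables. The three respondents are invented for illustration, and the function name multiple_response_frequencies is hypothetical; only the variable names and the counted value of 1 come from the steps above.

```python
AIRLINES = ["American", "TWA", "United", "USAir", "Other"]


def multiple_response_frequencies(cases, variables, counted_value=1):
    """Tally a multiple dichotomy set; returns (table, total responses, valid cases)."""
    counts = {name: 0 for name in variables}
    valid_cases = 0
    for case in cases:
        selected = [name for name in variables if case.get(name) == counted_value]
        if selected:
            valid_cases += 1  # respondent picked at least one option
        for name in selected:
            counts[name] += 1
    total_responses = sum(counts.values())
    table = {}
    for name in variables:
        table[name] = {
            "count": counts[name],
            "pct_of_responses": 100 * counts[name] / total_responses if total_responses else 0.0,
            "pct_of_cases": 100 * counts[name] / valid_cases if valid_cases else 0.0,
        }
    return table, total_responses, valid_cases


# Hypothetical answers from three respondents (1 = flew this airline).
sample = [
    {"American": 1, "TWA": 0, "United": 1, "USAir": 0, "Other": 0},
    {"American": 0, "TWA": 0, "United": 1, "USAir": 1, "Other": 0},
    {"American": 1, "TWA": 0, "United": 1, "USAir": 0, "Other": 0},
]

table, total_responses, valid_cases = multiple_response_frequencies(sample, AIRLINES)
for name, row in table.items():
    print(name, row)
```

Because each respondent can select several airlines, the percentages of cases can add up to more than 100, while the percentages of responses always add up to 100.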

MULTIPLE RESPONSE CROSSTABS

Without the use of a multiple response set, each airline would have to be analyzed against the

variable that the passengers used to identify themselves as being afraid of flying. This would

require the use of a crosstab analysis. However, the overall results would not allow for easy

comparison between each of the airlines. The best way to answer the question would be to

include the multiple response set into a crosstab analysis.

Research Question # 5

In a survey of airline passengers, which airline was selected most often by

those passengers who identified themselves as afraid to fly?

To incorporate a multiple response set into a crosstab analysis:

1. Click the Analyze menu, point to Multiple Response, and select Crosstabs. The

Multiple Response Crosstabs dialog box opens (see Figure 34).

Figure 34 - Multiple Response Crosstabs Dialog Box

2. Select the FearFactor variable as the Row(s): variable and the $Airlines multiple

response set as the Column(s): variable.

3. Select the FearFactor variable after it is designated as the Row(s): variable. The

Define Ranges button becomes active.

4. Click the Define Ranges button. The Multiple Response Crosstabs: Define Variable

Ranges dialog box opens (see Figure 35).

Figure 35 - Multiple Response Crosstabs: Define Variable Ranges Dialog Box

5. Enter [0] in the Minimum: box and [1] in the Maximum: box for the FearFactor

variable.

6. Click the Continue button.

7. Click the Options button. The Multiple Response Crosstabs: Options dialog box opens

(see Figure 36).

8. Select the Cases option and then click the Continue button.

9. Click the OK button. The Output Viewer window opens with the crosstab results (see

Figure 37).

Figure 36 - Multiple Response Crosstabs: Options Dialog Box

The Answer to Research Question # 5

In a survey of airline passengers, which airline was selected most often by

those passengers who identified themselves as afraid to fly?

Figure 37 - Multiple Response Crosstabs Output

Answer: USAir

Explanation: Of the 18 people surveyed, ten identified themselves as being afraid to fly. Within

that group of survey respondents, USAir was the airline selected most often (seven times).
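
The crosstab above counts, for each value of FearFactor, how many respondents selected each airline. A minimal Python sketch of that tally follows. The respondents are invented, the function name multiple_response_crosstab is hypothetical, and treating 1 as "afraid to fly" is an assumption (the handout only states that FearFactor ranges from 0 to 1).

```python
AIRLINES = ["American", "TWA", "United", "USAir", "Other"]


def multiple_response_crosstab(cases, row_variable, set_variables, counted_value=1):
    """Count selections of each set variable within each row-variable value."""
    table = {}
    for case in cases:
        row = table.setdefault(case[row_variable], {name: 0 for name in set_variables})
        for name in set_variables:
            if case.get(name) == counted_value:
                row[name] += 1
    return table


# Hypothetical respondents; FearFactor 1 is assumed to mean "afraid to fly".
sample = [
    {"FearFactor": 1, "American": 0, "TWA": 0, "United": 1, "USAir": 1, "Other": 0},
    {"FearFactor": 1, "American": 0, "TWA": 0, "United": 0, "USAir": 1, "Other": 0},
    {"FearFactor": 0, "American": 1, "TWA": 0, "United": 1, "USAir": 0, "Other": 0},
]

crosstab = multiple_response_crosstab(sample, "FearFactor", AIRLINES)
afraid = crosstab[1]
most_selected = max(afraid, key=afraid.get)
print(most_selected, afraid[most_selected])
```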

Data Manipulation

PASW Statistics also provides tools to make data manipulation a simple task.

COPYING AND PASTING VARIABLE PROPERTIES

Copying and pasting is very useful when the same properties need to be given to different

variables.

To copy and paste variable properties:

1. Click the File menu, point to New, and select Data.

2. Click the Variable View tab at the lower left corner of the Data Editor window (see

Figure 38).

Figure 38 - Variable View Tab

3. Type [active] in the first cell under the Name column and press the [Enter] key.

4. Click in the first cell under the Decimals column and decrease the entry to 0.

5. Click in the first cell under the Values column and click the Ellipses button. The Value Labels dialog box opens (see Figure 39).

6. Type [1] in the Value: box.

7. Type [Strongly Disagree] in the Label: box.

8. Click the Add button.

9. Assign [2], [3], and [4] for [Disagree], [Agree], and [Strongly Agree], respectively, by

repeating steps 6-8 for each value added (see Figure 39).

Figure 39 - Value Labels Dialog Box

10. Click the OK button.

11. Switch back to Data View (see Figure 40).

12. Click the active variable heading to highlight the column.

13. Click the Edit menu and select Copy to copy the properties of the variable active.

14. Highlight the variables that should receive the same properties by clicking the header of the first variable and dragging the pointer across to the last header (see Figure 41 and Figure 42).

15. Click the Edit menu and select Paste. The copied properties of the variable active will

be applied to the target variables, and the Data View and Variable View will change (see

Figure 43 and Figure 44).

Figure 40 - Data View Tab

Figure 41 - Selected Variable

Figure 42 - Selecting Target Variables

Figure 43 - Data View Showing New Variables

Figure 44 - Variable View Showing New Variables

INSERTING VARIABLES AND CASES

By using Insert Variable and Insert Cases, variables and cases can be added into any location

of the data file in a simple, straightforward manner. Assume that one wants to insert a new

variable named midterm between pretest and posttest and use it for test score data. The

following instructions describe how to insert a new variable and make it available for Numeric

data type.

To insert a variable:

1. Switch to Data View.

2. Click the posttest variable heading to highlight the column.

3. Click the Edit menu and select Insert Variable. A new variable is inserted to the left of

the highlighted variable (posttest).

NOTE: The new variable is created with a default name VAR00001 which can be changed

later.

4. To define the properties of the new variable, double-click the variable heading. The

Variable View is activated for the new variable.

5. Type [midterm] in the Name column of the new variable.

6. Change the variable type if desired.

In the same manner, it is possible to insert cases in a particular location in Data View. For instance, assume that a case should be inserted between cases 10 and 11 for a particular student's record. By following the instructions below, one case will be inserted after the 10th case.

To insert cases (example):

1. Switch to Data View.

2. Click row number 11 to highlight the case.

3. Click the Edit menu and select Insert Cases. A new case is inserted above case 11.

DELETING VARIABLES AND CASES

Variables and cases can be deleted by using the Clear command.

To delete a variable or case:

1. In Data View, click the variable heading or the case number to highlight what will be

deleted.

2. Click the Edit menu and select Clear. The variable or case is deleted.

Merging Data Files

The merging data files function is useful for users who store each of their topics in separate files

and eventually need or want to combine them together. This allows users to import data from one

file into another as long as both sets of data (from each file) contain a common identifier for each

of the cases that the user wishes to combine.

An identifier has no meaning other than to distinguish each case from the others and to identify the corresponding cases in the additional data files. This identifier can be a unique value, number, or letter combination applied to each case.

NOTE: The variables do not have to be the same across data files.

CREATING THE DATA FILE FOR MERGING

Scenario: A psychological focus group on campus needs to create a file for a longitudinal study of ten students on campus. Each file will contain the same students but four different focal points of study, one per question. Over the five-year span of the study, the ten students will be asked twelve questions each year (one a month), and the same questions will be asked each year. At the end of the year, the three files will be combined into an annual questionnaire file to be properly analyzed.

The merging data files function can be used to satisfy this requirement.

Inputting the Data in Variable View

Files must be created first before being merged.

To create a data file for merging:

1. Click the File menu, point to New, and select Data.

2. Once the new file has been created, select the Variable View tab.

3. Name the first variable [ID] so that it serves as the identifier variable, and press the [Enter] key.

4. Change the Type attribute by clicking the ellipses button and selecting the String option

from the Variable Type dialog box.

5. Change the width to [10] and click the OK button.

6. Click in the second variable cell, type [January], and press the [Enter] key.

7. Change the Type attribute to String.

8. In the Label attribute, type [What pet would you like to own?] (see Figure 45).

9. Repeat steps 6 through 8 to enter the data in Table 4.

Figure 45 - Define Variables in Variable View

Table 4 - Variables for Case Study

Month Attribute   Type     Length   Label Attribute
February          String   10       What is your favorite shape?
March             String   12       It is 1:30pm, what are you eating?
April             String   12       What is your preferred beverage?

10. Once this information has been defined in Variable View, switch by clicking the Data

View tab to enter the corresponding case information.

11. Enter [Alfred] in case 1 of the ID variable, [Bethel] in case 2 of the ID variable, down to

[Jessie] in case 10 of the ID variable. Enter the corresponding information according to

Table 5. See Figure 46 for the results.

Table 5 - Input Case Information

Case  ID         January    February   March         April
1     Alfred     Dog        Star       Pizza         Water
2     Bethel     Cat        Square     Fruit         Soda Pop
3     Chris      Cat        Triangle   Veggies       Grape Juice
4     Dante      Dog        Rectangle  Sandwich      Orange Juice
5     Erica      Tiger      Oval       Chips         Aloe Water
6     Fernando   Tarantula  Circle     Calzon        Beer
7     Grenadine  Dog        Octagon    Salad         White Wine
8     Harold     Bees       Polygon    Soup          Naked Juices
9     Isadora    Turtle     Rhombus    PandaExpress  V8 Juice
10    Jessie     Hamster    Oval       Egg Salad     Lemonade

Figure 46 - Input Case Information

12. Save the file by clicking the File menu and selecting Save. The Save Data As dialog box

opens.

13. Select the Desktop as the destination and type [Merge 1] in the File name: text box.

14. Click the Save button.

15. Close the Output Viewer window.

MERGING THE DATA FILES

To merge data files, all files must have a common variable. The common variable in this case is

ID.

To merge data files: (First, make sure the files have the same IDs.)

1. Open the files Merge 2 and Merge 3 and check for consistency across all of the IDs.

2. Minimize the Merge 2 and Merge 3 data files.

3. Once back in the Merge 1 file, click the Data menu, point to Merge Files, and select

Add Variables (see Figure 47).

Figure 47 - Data Menu When Selecting Add Variables

4. The Add Variables to Merge 1.sav dialog box opens. Select the An external PASW

Statistics data file option and click the Browse button (see Figure 48).

Figure 48 - Add Variables to Merge 1.sav Dialog Box

5. Locate and select the Merge 2 data file and click the Open button.

6. Click the Continue button. The Add Variables from Merge 2.sav dialog box opens (see

Figure 49).

7. Select the Match cases on key variables in sorted files check box.

8. From the Excluded Variables: list box, select ID>(+) (see Figure 49), and using the transfer arrow button, move it to the Key Variables: box.

Figure 49 - Add Variable from Merge 2.sav Dialog Box

9. Click the OK button. A warning message dialog box opens (see Figure 50).

Figure 50 - Sorting Warning Dialog Box

10. Click the OK button to close the warning message. The finished product should look like

Figure 51.

Figure 51 - Merged 1 and 2 Files

11. Repeat steps 3-10 for the Merge 3 file.
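
Conceptually, Add Variables matches the cases in two files on the key variable and places the variables of the external file next to those of the active file. The Python sketch below illustrates that idea with plain dictionaries. It is a simplification and not how PASW Statistics is implemented: the function name add_variables is hypothetical, cases that appear only in the external file are ignored here, and the May variable and its values are invented because the contents of Merge 2 are not listed in this handout.

```python
def add_variables(active_file, external_file, key="ID"):
    """Match cases on key and add the external file's variables to each case."""
    external_by_key = {row[key]: row for row in external_file}
    merged = []
    # Sort on the key first, as the merge expects sorted files.
    for row in sorted(active_file, key=lambda r: r[key]):
        combined = dict(row)
        for name, value in external_by_key.get(row[key], {}).items():
            # Keep the active file's value when a variable exists in both files.
            combined.setdefault(name, value)
        merged.append(combined)
    return merged


# Two cases from Table 5 (deliberately unsorted).
merge_1 = [
    {"ID": "Bethel", "January": "Cat"},
    {"ID": "Alfred", "January": "Dog"},
]
# Hypothetical contents for the second file.
merge_2 = [
    {"ID": "Alfred", "May": "Blue"},
    {"ID": "Bethel", "May": "Green"},
]

merged = add_variables(merge_1, merge_2)
for case in merged:
    print(case)
```

This is also why the IDs must be consistent across files: a case whose ID is spelled differently in the second file finds no match and receives no new values.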

Appendix

QUESTIONNAIRE

This survey is designed to investigate relationships between Internet access and academic

success. It consists of three parts: questions related to the background information of the

respondent, questions about Internet use patterns, and several open-ended questions. Please

select appropriate answers that best describe your activities on the Internet as truthfully as

possible. The results of this study will be used anonymously for the PASW Statistics Part 2: Test

of Significance workshop.

Background Information

1. Age: ____________________________

2. Major: ___________________________

3. G.P.A.: __________________________

4. Monthly Income: __________________

Internet Access

5. Do you have a computer at home?

1. Yes 2. No

6. Where do you surf on the Internet? (You can circle more than one option for this question.)

1. At school 2. At home 3. At work 4. Other ____________

7. How long do you stay online per day?

1. Less than 30 minutes 2. 1-2 hours 3. More than two hours

Questions 8 through 19 are designed to investigate the frequency and types of activities on the Internet. These questions use a 4-point Likert scale ranging from strongly disagree to strongly agree. Please circle the option that best describes your activities on the Internet.

SD: Strongly Disagree

D: Disagree

A: Agree

SA: Strongly Agree

SD D A SA

8. I am a very active Internet surfer. 1 2 3 4

9. I surf the Internet to look for articles for research papers. 1 2 3 4

10. I surf the Internet to read current news. 1 2 3 4

11. I use the Internet only to e-mail my friends, family, and professors. 1 2 3 4

12. I surf the Internet to check movie schedules. 1 2 3 4

13. I surf the Internet to look for personal information (e.g., yellow pages). 1 2 3 4

14. I surf the Internet to look for job openings. 1 2 3 4

15. I use the Internet to play games. 1 2 3 4

16. I use the Internet to download forms and files (e.g., income tax forms). 1 2 3 4

17. I surf the Internet to improve my computer skills. 1 2 3 4

18. I surf the Internet to purchase books. 1 2 3 4

19. I surf the Internet to purchase other merchandise (e.g., video tapes, clothes, computers). 1 2 3 4

Question 20 is an open-ended question.

20. Are there any other Internet activities that are not included in this survey? If so, please

describe them below.

____________________________________________________________________

____________________________________________________________________

____________________________________________________________________

____________________________________________________________________

Introduction Part 3

PASW stands for Predictive Analytics Software. This program can be used to analyze data

collected from surveys, tests, observations, etc. It can perform a variety of data analyses and

presentation functions, including statistical analysis and graphical presentation of data. Among

its features are modules for statistical data analysis. These include 1) descriptive statistics, such

as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and

multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis,

cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for

survey research, though by no means is it limited to just this topic of exploration.

This handout (Regression Analysis) provides basic instructions on how to answer research questions and test hypotheses through the use of linear regression (a technique which examines the relationship between a dependent variable and a set of independent variables). The value of the dependent variable (e.g., a salesperson's total annual sales) can be predicted based on its relationship to the independent variables used in the analysis (e.g., age, education, and years of experience). The two research questions proposed for this workshop are as follows:

1. How much will each salesperson make this year?

2. Who will qualify for a $1,000 bonus?

Downloading the Data Files

This handout includes sample data files that can be used for hands-on practice. The data files are

stored in a self-extracting archive. The archive must be downloaded and executed in order to

extract the data files.

• The data files used with this handout are available for download at http://www.calstatela.edu/its/training/datafiles/pasw17p3.exe.

• Instructions on how to download and extract the data files are available at http://www.calstatela.edu/its/docs/download.php.

Simple Regression

Simple regression estimates how the value of one dependent variable (Y) can be predicted based

on the value of one independent variable (X). The linear equation for simple regression is as

follows:

Y = aX + b

Simple regression can answer the following research question:

Research Question # 1

How much will each salesperson make this year?

SCATTER PLOT

A scatter plot displays the nature of the relationship between two variables. It is recommended to

run a scatter plot before performing a regression analysis to determine if there is a linear

relationship between the variables. If there is no linear relationship (i.e., points on a graph are

not clustered in a straight line), there is no need to run a simple regression.

To run a scatter plot:

1. Start PASW Statistics 17.

2. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.

3. Locate and open the Regression.sav file.

4. Click the Graphs menu, point to Legacy Dialogs, and select Scatter/Dot (see Figure 52). The Scatter/Dot dialog box opens (see Figure 53).

NOTE: To estimate the relationship between two variables, select the Simple Scatter plot.

Figure 52 - Graphs Menu When Selecting Scatter/Dot

Figure 53 - Scatter/Dot Dialog Box

5. If necessary, select the Simple Scatter option, and then click the Define button (see Figure 53). The Simple Scatterplot dialog box opens (see Figure 54).

Figure 54 - Simple Scatterplot Dialog Box

6. Select the variable Last year sales [lastsale] from the list box on the left.

7. Click the first transfer arrow button to move the variable to the Y Axis: box.

8. Select the variable Years of experience [yearexpe] from the list box on the left.

9. Click the second transfer arrow button to move the variable to the X Axis: box.

10. Click the OK button. The Output Viewer window opens with a scatter plot of the

variables (see Figure 55).

NOTE: A graph similar to Figure 55 will be displayed in the Output Viewer window. This scatter

plot indicates that there is a linear relationship between the variables Last year sales and Years

of experience.

The next step is to find a line that best accommodates the pattern of points in this scatter plot.

The steps on how to enhance graph appearance are included in the last section of this handout.

Figure 55 - Scatter Plot

PREDICTING VALUES OF DEPENDENT VARIABLES

Since it is known that a linear relationship exists between the two variables, the regression analysis can be performed to predict this year's sales.

To run a simple regression analysis:

1. Switch to the Data Editor window.

2. Click the Analyze menu, point to Regression, and select Linear (see Figure 56). The

Linear Regression dialog box opens.

Figure 56 - Analyze Menu When Selecting Linear

3. Select the variable Last year sales [lastsale] from the variable list box on the left and

move it to the Dependent: box by clicking the first transfer arrow button (see Figure 57).

Figure 57 - Linear Regression Dialog Box

4. Select the variable Years of experience [yearexpe] from the variable list box on the

left and move it to the Independent(s): box by clicking the second transfer arrow button.

5. Click the OK button.

The following tables present the results of a simple regression. R Square (.918) indicates that

this model accounts for almost 92% of the total variation in the data (see Figure 58).

Figure 58 - Model Summary Output

Figure 59 - Coefficients Output

The slope and the y-intercept as seen in Figure 59 should be substituted in the following linear equation to predict this year's sales: Y = aX + b, where a is the slope and b is the y-intercept. In this case, the values of a, b, X, and Y are as follows:

a = 1954.658

b = 440.987

X = Years of experience (values of independent variable)

Y = Last year sales (values of dependent variable)
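The slope a, the y-intercept b, and R Square reported by the Linear Regression procedure can be reproduced by hand for a small data set. The following is a minimal sketch in Python, not part of the PASW Statistics workflow; the function name fit_line and the sample values are hypothetical and are not taken from the handout's data file.

```python
def fit_line(xs, ys):
    """Return (a, b, r_square) for the least-squares line Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by the variance of X.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # R Square = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical values: years of experience and last year's sales.
years = [1, 2, 3, 4, 5, 6]
sales = [2500, 4300, 6400, 8100, 10300, 12100]
a, b, r_square = fit_line(years, sales)
print(f"a = {a:.3f}, b = {b:.3f}, R Square = {r_square:.3f}")
```

An R Square close to 1 means the fitted line accounts for most of the variation in the dependent variable.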

PREDICTING THIS YEAR'S SALES WITH SIMPLE REGRESSION MODEL

To predict this year's sales for each salesperson, the values of a and b should be substituted into the following linear equation:

Y = aX + b

Last year sales = (a * yearexpe) + b

This year sales = (1954.658 * yearexp2) + 440.987

a = 1954.658

b = 440.987

X = Years of experience [yearexp2]

Y = This year sales

NOTE: The new independent variable, yearexp2, is used instead of yearexpe in order to predict this year's sales.
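As an illustration, the prediction equation above can be written as a small Python function. This is only a sketch of the arithmetic that the Compute Variable step performs; the function name predict_simple and the sample input are hypothetical.

```python
# Coefficients taken from the Coefficients output (Figure 59).
SLOPE = 1954.658      # a
INTERCEPT = 440.987   # b


def predict_simple(yearexp2):
    """Predict this year's sales from years of experience: Y = aX + b."""
    return SLOPE * yearexp2 + INTERCEPT


# Hypothetical salesperson with 10 years of experience.
print(f"${predict_simple(10):,.0f}")
```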

To predict this year's sales using the Compute Variable function:

1. Switch to the Data Editor window.

2. Click the Transform menu and select Compute Variable. The Compute Variable

dialog box opens (see Figure 60).

3. In the Target Variable: box, type [Simple].


Figure 60 - Compute Variable Dialog Box

4. In the Numeric Expression: box, enter the following equation by typing or selecting

from the dialog box keypad:

[1954.658 * yearexp2 + 440.987]

NOTE: It is recommended to select the variable yearexp2 directly from the variable list box

on the left of the Compute Variable dialog box to prevent typing mistakes.

5. Click the OK button. The results will be displayed in the Simple column in Data View

(see Figure 61).

Figure 61 - Simple Regression Results

To change the data type for the new variable Simple:

1. Click the Variable View tab at the lower left corner of the Data Editor window (see Figure 62).


Figure 62 - Variable View Tab

2. Locate the variable Simple and click the Ellipses button under the Type column. The Variable Type dialog box opens (see Figure 63).

3. Select the Dollar option, and then select the $###,###,### format (12 digits width with 0

decimal places).

Figure 63 - Variable Type Dialog Box

4. Click the OK button, and then click the Data View tab.

Figure 64 - Simple Regression Prediction

NOTE: The predictions of this year's sales for each salesperson are computed under the new variable named Simple, as shown in Figure 64.

Multiple Regression

Multiple regression estimates the coefficients of the linear equation that best predicts the value of the dependent variable when there is more than one independent variable. For example, it is possible to predict a salesperson's total annual sales (the dependent variable) based on independent variables such as age, education, and years of experience. The linear equation for multiple regression with two independent variables is as follows:

Z = aX + bY + c
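To make the estimation concrete, the coefficients a, b, and c can be found by solving the least-squares normal equations. The sketch below is a minimal pure-Python illustration, not the algorithm PASW Statistics is documented to use; the function names solve and fit_plane and the sample data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution.
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_plane(xs, ys, zs):
    """Return (a, b, c) minimizing the squared error of Z = aX + bY + c."""
    rows = [[float(x), float(y), 1.0] for x, y in zip(xs, ys)]
    # Normal equations: (A^T A) coefficients = A^T z.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    a, b, c = solve(ata, atz)
    return a, b, c


# Hypothetical data generated from Z = 2X + 3Y + 1.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]
zs = [2 * x + 3 * y + 1 for x, y in zip(xs, ys)]
print(fit_plane(xs, ys, zs))
```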

PREDICTING VALUES OF DEPENDENT VARIABLES

The previous section demonstrated how to predict this year's sales (the dependent variable) based on one independent variable (number of years of experience) by using simple regression analysis. Similarly, this year's sales (the dependent variable) can be predicted from more than one independent variable, such as Years of experience and Years of education, by using multiple regression analysis.

To run multiple regression analysis:

1. Click the Analyze menu, point to Regression, and select Linear. The Linear

Regression dialog box opens (see Figure 65).

2. From the variable list box, select Last year sales [lastsale] as a dependent variable and move it to the Dependent: box by clicking the first transfer arrow button.

3. From the variable list box, select Years of experience [yearexpe] and Years of education [educatio] and move them to the Independent(s): box by clicking the second transfer arrow button.

4. Click the OK button.

NOTE: If there are variables in the Independent(s): or Dependent: boxes, click the Reset button

before performing steps 2 and 3 above.

Figure 65 - Linear Regression Dialog Box

Figure 66 - Model Summary Output for Multiple Regression

NOTE: The table should look similar to Figure 66. R Square = .976 indicates that this model accounts for almost 98% of the total variation in the data.

Figure 67 - Multiple Regression Output


The slopes and the y-intercept shown in Figure 67 should be substituted into the following linear equation to predict this year's sales: Z = aX + bY + c

In this case, the values of a, b, c, X, Y, and Z are as follows:

a = 1874.5

b = 609.391

c = (-8510.838)

X = Years of experience (independent variable)

Y = Years of education (independent variable)

Z = This year sales (dependent variable)

As indicated in the output table, the coefficient for Years of experience is 1874.5 and the coefficient for Years of education is 609.391.

PREDICTING THIS YEAR'S SALES WITH MULTIPLE REGRESSION MODEL

To predict this year's sales for each salesperson, the values of a, b, and c should be substituted into the following linear equation: Z = aX + bY + c

This year sales = 1874.5 * Years of experience + 609.391 * Years of education + (-8510.838)
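The same equation can be sketched as a Python function to check individual predictions by hand. The function name predict_multiple and the sample inputs are hypothetical; the coefficients are the ones listed above.

```python
# Coefficients taken from the multiple regression output (Figure 67).
A = 1874.5       # slope for Years of experience
B = 609.391      # slope for Years of education
C = -8510.838    # y-intercept


def predict_multiple(yearexp2, educatio):
    """Predict this year's sales: Z = aX + bY + c."""
    return A * yearexp2 + B * educatio + C


# Hypothetical salesperson: 10 years of experience, 16 years of education.
print(f"${predict_multiple(10, 16):,.0f}")
```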

To predict this year's sales using the multiple regression model:

1. Switch to the Data Editor window.

2. Click the Transform menu and select Compute Variable. The Compute Variable

dialog box opens (see Figure 68).

3. Click the Reset button.

4. In the Target Variable: box, type [multiple].

5. In the Numeric Expression: box, enter the following equation by typing or selecting

from the dialog box keypad:

[1874.5 * yearexp2 + 609.391 * educatio - 8510.838]

Figure 68 - Compute Variable Dialog Box


6. Click the OK button. The results will be displayed in the multiple column in Data View (see Figure 69).

Figure 69 - Multiple Regression Results

NOTE: The predictions of sales for each salesperson using two independent variables are listed under the

new variable named multiple.

Data Transformation

Situations may arise where data transformation is useful. Most data transformations can be done with the Compute command. Using this command, the data file can be manipulated to suit various statistical procedures.

Research Question # 2

Who will earn a $1,000 bonus?

COMPUTING

Since each person's yearly sales were already predicted, those who made at least $2,000 more than the predicted values, obtained via multiple regression analysis, will receive $1,000 as a bonus. Using the Compute command, the salespeople who met the criterion can be easily located by comparing the values of this year's actual sales with the predictions from the multiple regression analysis computed in the previous lesson.

The first step in determining who will receive a bonus is to calculate the difference between this year's actual sales and the prediction of this year's sales from the multiple regression analysis.

To predict who will qualify for the bonus:

1. Open the Bonus.sav file.

2. If the Save As dialog box opens, click the No button.

3. Click the Transform menu and select Compute Variable. The Compute Variable dialog box opens (see Figure 70).

4. In the Target Variable: box, type [bonus].

5. In the Numeric Expression: box, type [1000].


Figure 70 - Compute Variable Dialog Box

6. Click the If button. The Compute Variable: If Cases dialog box opens (see Figure 71).

7. Select the Include if case satisfies condition: option.

8. Enter the following expression by typing or selecting from the dialog box keypad:

[thissale - multiple >= 2000]

Figure 71 - Compute Variable: If Cases Dialog Box

NOTE: It is recommended that you select the variables and the >= sign directly from the variable

list box and keypad provided in the dialog box to prevent mistakes.


9. Click the Continue button, and then click the OK button.
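The conditional computation in the steps above can be sketched in Python as follows. This is an illustration only: the function name bonus and the sample figures are hypothetical, and None is used here as a stand-in for a case that does not satisfy the condition and therefore receives no bonus value.

```python
def bonus(thissale, multiple):
    """Return 1000 when actual sales exceed the prediction by at least 2000."""
    if thissale - multiple >= 2000:
        return 1000
    return None  # condition not met: no bonus value is assigned


# Hypothetical cases: (actual sales, predicted sales from multiple regression).
cases = [(52000, 50000), (51999, 50000), (47000, 50000)]
for actual, predicted in cases:
    print(actual, predicted, bonus(actual, predicted))
```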

NOTE: Salespersons #49 Jason and #44 Ivett are two of the sales personnel who qualify for a $1,000 bonus because their sales were at least $2,000 over their predicted sales from the last lesson (see Figure 72).

Figure 72 - Bonus Results

Polynomial Regression

This type of regression involves fitting a dependent variable (Y_i) to a polynomial function of a single independent variable (X_i). The regression model is as follows (see Table 6 for the meaning of the variables):

Y_i = a + b_1 X_i + b_2 X_i^2 + b_3 X_i^3 + ... + b_k X_i^k + e_i

Table 6 - Breakdown of the Variables

Variable    Meaning

a           Constant

b_j         The coefficient for the independent variable to the jth power

e_i         Random error term

REGRESSION ANALYSIS

To look at the growth relationship between weight and age:

1. Open the Growth.sav file.

2. Click the Analyze menu, point to Regression, and select Curve Estimation. The

Curve Estimation dialog box opens to define the parameters of the analysis (see Figure

73).

3. Transfer the wght variable to the Dependent(s): box and the age variable to the

Independent Variable: box.

NOTE: The weight (dependent) variable is what is being predicted using the age (independent)

variable.

4. Deselect the Plot models check box.

5. Select the Display ANOVA table check box.

6. Under Models, deselect the Linear check box and select the Cubic check box.

7. Click the OK button.


Figure 73 - Curve Estimation Dialog Box

Analyzing the Results

This cubic model has an R^2 of 99.567% (see Figure 74). The F-ratio indicates a highly significant fit. The best-fitting cubic polynomial is given by the following equation (where Y_i is weight and X_i is age):

Y_i = 0.052 - 0.017 X_i + 0.010 X_i^2 - 0.001 X_i^3 + e_i

Multiple regression can be used to fit polynomials of higher order. If X is the independent variable, use the Transform and Compute options of the Data Editor (as discussed earlier in this lesson) to create new variables X2 = X*X, X3 = X*X2, X4 = X*X3, etc., then use these new variables (X, X2, X3, X4, etc.) as a set of independent variables for a multiple regression analysis.
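The construction of the power variables can be illustrated with a short Python sketch that mirrors the repeated multiplication X2 = X*X, X3 = X*X2, and so on. The function name power_terms and the sample ages are hypothetical.

```python
def power_terms(x, order):
    """Return [X, X2, X3, ...] up to the given order by repeated multiplication."""
    terms = [x]
    for _ in range(order - 1):
        # Each new variable is X times the previous one (X3 = X * X2, etc.).
        terms.append(x * terms[-1])
    return terms


# Hypothetical ages; each row holds X, X2, X3, X4 for one case.
for age in [1, 2, 3]:
    print(age, power_terms(age, 4))
```

Each row produced this way corresponds to one case's set of independent variables for the multiple regression.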


Figure 74 - Polynomial Regression Summary Results

Chart Editing

During the final stage of research, enhancing the appearance of charts and figures can help readers understand statistics that might otherwise seem confusing. Editing a chart directly in PASW Statistics also saves the time and effort of copying and pasting it into another program to modify its features. The following steps explain some useful methods to enhance the appearance of a chart.

ADDING A LINE TO THE SCATTER PLOT

Adding a straight line to fit the scattered pattern of a data chart can help emphasize the linear

relationship among the data.

To add a line to the scatter plot:

1. Click the Graphs menu, point to Legacy Dialogs, and select Scatter/Dot.

2. Select the Simple Scatter option, and then click the Define button.

3. Transfer the age variable to the X Axis: box and the wght variable to the Y Axis:

box, and then click the OK button. A chart appears in the Output Viewer window.

4. Double-click the chart in the Output Viewer window to modify it. The Chart Editor window opens (see Figure 75).

5. Right-click a chart marker (see Figure 76) and select Add Fit Line at Total from the shortcut menu.

6. Under Fit Method, select the Cubic option, and then click the Apply button.

7. Close the Chart Editor window.

NOTE: Notice that a straight (linear) fit line does not capture the way the data curve, but the cubic fit line is almost a perfect fit (see Figure 77).


Figure 75 - Chart Editor Window

Figure 76 - Chart Markers

Figure 77 - Adding a Fit Line to the Scatter Plot

MANIPULATING THE SCALES ON X- AND Y-AXES

The X-axis and Y-axis can be adjusted to enhance the overall appearance and readability of the

chart. Various elements of the axes can be manipulated, such as scale, ticks and grids, number

format, and axis label.

To manipulate the scales on the X-axis:

1. If necessary, open the Regression.sav file.

2. Run the scatter plot where the Y-axis is Last year sales and the X-axis is Years of

experience.

3. Double-click the chart to open the Chart Editor window.

4. Click the Select the X axis button on the Standard toolbar to manipulate the X-axis.

The Properties dialog box opens.

5. Select the Scale tab (see Figure 78).

6. Change the value in the Lower margin (%): box to 0.

7. Select the Labels & Ticks tab (see Figure 79).

8. In the Major Ticks section, select the Display ticks check box.

9. Click the Style arrow and select Inside from the list.


Figure 78 - X-axis Properties Dialog Box: Scale Tab

Figure 79 - X-axis Properties Dialog Box: Labels & Ticks Tab

10. Click the Show Grid Lines button on the Standard toolbar to show the Properties

dialog box.

11. Select the Grid Lines tab, select the Major ticks only option, click the Apply button, and then click the Close button (see Figure 80).

12. Click the Select the Y axis button on the Standard toolbar to manipulate the Y-axis.

The Properties dialog box opens.

13. Select the Scale tab (see Figure 81).

Figure 80 - Properties Dialog Box: Grid Lines Tab


Figure 81 - Y-axis Properties Dialog Box: Scale Tab

14. Change the value in the Lower margin (%): box to 0.

15. Click the Apply button, and then click the Close button.

Figure 82 - Before Manipulating the X-axis

Figure 83 - After Manipulating the X-axis

ADDING A TITLE TO THE CHART

Adding a title to the chart is a simple process that enhances the chart's appearance.

To add a title to a chart:

1. In the Chart Editor window, click in a blank area outside the first chart to select the

whole chart, then move the mouse pointer to one of the selection handles until it becomes

a two-headed arrow.

2. Drag the mouse pointer to reduce the chart size.

3. Click the Insert a text box button on the Standard toolbar. The text box appears

above the chart and the Properties dialog box opens.

4. Type Relationship Between Last Year Sales and Years of Experience in the text box.

5. Click the border of the text box to select it.


6. Select the Text Style tab in the Properties dialog box, select a color for the title text, click

the Apply button, and then click the Close button.

7. Click the Bold button on the Standard toolbar, and change the Font Size to 12.

8. Resize the text box to fit the text.

9. If necessary, resize the chart to display the title at the top of the chart (see Figure 84).

Figure 84 - Adding a Title to the Chart

ADDING COLORS TO THE CHART

All elements on the chart can be colored differently to add emphasis or distinguish between

elements.

To add colors to a chart:

1. In the Chart Editor window, select the chart element to change or add color to, such as one of the plots (see Figure 85).

2. Click the Show Properties Window button on the Standard toolbar. The Properties dialog box opens (see Figure 86).

3. Select the Marker tab, and then select a color from the color palette.

4. To change the marker type, click the Type arrow in the Marker section and select a symbol from the menu (see Figure 86).

5. View the changes in the Preview section.

6. Click the Apply button, and then click the Close button.


Figure 85 - Adding Color to the Chart

Figure 86 - Properties Dialog Box

FILLING A BACKGROUND COLOR

The background color can also be filled to make the chart stand out.

To fill in a background color:

1. Click inside a blank area of the chart to select the entire chart area (see Figure 87).

2. Click the Show Properties Window button on the Standard toolbar. The Properties

dialog box opens.

3. Select the Fill swatch.

4. Click the Pattern arrow and select a background pattern.

5. Click the Apply button, and then click the Close button.

Figure 87 - Filling a Background Color


Introduction Part 4

PASW stands for Predictive Analytics Software. This program can be used to analyze data

collected from surveys, tests, observations, etc. It can perform a variety of data analyses and

presentation functions, including statistical analysis and graphical presentation of data. Among

its features are modules for statistical data analysis. These include 1) descriptive statistics, such

as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and

multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis,

cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for

survey research, though by no means is it limited to just this topic of exploration.

This handout (Chi-Square and ANOVA) introduces basic skills for performing hypothesis tests

utilizing Chi-Square test for Goodness-of-Fit and generalized pooled t tests, such as ANOVA.

The step-by-step instructions will guide the user in performing tests of significance using

PASW Statistics and help the user understand how to interpret the output for research questions.

Downloading the Data Files

This handout includes sample data files that can be used for hands-on practice. The data files are

stored in a self-extracting archive. The archive must be downloaded and executed in order to

extract the data files.

• The data files used with this handout are available for download at http://www.calstatela.edu/its/training/datafiles/pasw17p4.exe.

• Instructions on how to download and extract the data files are available at http://www.calstatela.edu/its/docs/download.php.

Chi-Square

The Chi-Square (χ²) test is a statistical tool used to examine differences between nominal or categorical variables. The Chi-Square test is used in two similar but distinct circumstances:

• To estimate how closely an observed distribution matches an expected distribution, also known as the Goodness-of-Fit test.

• To determine whether two random variables are independent.

CHI-SQUARE TEST FOR GOODNESS-OF-FIT

This procedure can be used to perform a hypothesis test about the distribution of a qualitative

(categorical) variable or a discrete quantitative variable having only finite possible values. It

analyzes whether the observed frequency distribution of a categorical or nominal variable is

consistent with the expected frequency distribution.

With Fixed Expected Values

Research Question # 1

Can the hospital schedule discharge support staff evenly throughout the week?

A large hospital schedules discharge support staff assuming that patients leave the hospital at a

fairly constant rate throughout the week. However, because of increasing complaints of staff

shortages, the hospital administration wants to determine whether the number of discharges

varies by the day of the week.


H0: Patients leave the hospital at a constant rate (there is no difference between the discharge rates for each day of the week).

To perform the analysis:

1. Start PASW Statistics 17.

2. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.

3. Navigate to the data files folder, select the chi-hospital.sav file, and then click the

Open button.

Before the Chi-Square test is run, the observed values need to be declared.

To declare the observed values:

1. Click the Data menu and select Weight Cases. The Weight Cases dialog box opens

(see Figure 88).

Figure 88 - Weight Cases Dialog Box

2. Select the Weight cases by option.

3. Select the Average Daily Discharges [discharge] variable and transfer it to the

Frequency Variable: box.

4. Click the OK button.

To perform the analysis:

1. Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square. The

Chi-Square Test dialog box opens (see Figure 89).


Figure 89 - Chi-Square Test Dialog Box

2. Select the Day of the Week [dow] variable and transfer it to the Test Variable List: box

(see Figure 89).

3. Click the OK button. The Output Viewer window opens (see Figure 90).

Figure 90 - Chi-Square Frequencies Output Table

Figure 91 - Chi-Square Test Statistics Output Table


Reporting the analysis results:

H0: Rejected in favor of H1.

H1: Patients do not leave the hospital at a constant rate.

Explanation: Figure 91 indicates that the calculated χ² statistic, for six degrees of freedom, is 29.389. Additionally, it indicates that the significance value (0.000) is less than the usual threshold value of 0.05. This suggests that the null hypothesis, H0 (patients leave the hospital at a constant rate), can be rejected in favor of the alternate hypothesis, H1 (patients leave the hospital at different rates during the week).
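For readers who want to see how the statistic is formed, the following Python sketch computes the Chi-Square Goodness-of-Fit statistic when every category has the same expected count (the sum of the observed counts divided by the number of categories). The function name chi_square_equal and the daily counts are hypothetical and are not the values in chi-hospital.sav.

```python
def chi_square_equal(observed):
    """Return (statistic, degrees_of_freedom) with equal expected counts."""
    n = sum(observed)
    k = len(observed)
    expected = n / k  # same expected count for every category
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, k - 1


# Hypothetical average daily discharges, Sunday through Saturday.
discharges = [44, 78, 90, 94, 89, 110, 84]
statistic, df = chi_square_equal(discharges)
print(f"Chi-Square = {statistic:.3f}, df = {df}")
```

With seven days of the week, the test has 7 - 1 = 6 degrees of freedom, which matches the degrees of freedom reported in the output table.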

With Fixed Expected Values and within a Contiguous Subset of Values

By default, the Chi-Square test procedure builds frequencies and calculates an expected value

based on all valid values of the test variable in the data file. However, it may be desirable to

restrict the range of the test to a contiguous subset of the available values, such as weekdays only

(Monday through Friday).

Research Question # 2

The hospital requests a follow-up analysis: can staff be scheduled assuming that patients discharged on weekdays only (Monday through Friday) leave at a constant daily rate?

H0: Patients discharged on weekdays only (Monday through Friday) leave at a constant daily rate.

To run the analysis:

1. Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square. The

Chi-Square Test dialog box opens.

2. Select the Use specified range option (see Figure 89).

3. Enter [2] in the Lower: box and [6] in the Upper: box.

4. Click the OK button. The Output Viewer window opens (see Figure 92 and Figure 93).

Notice that the test range is restricted to Monday through Friday.

Figure 92 - Chi-Square (Subset) Frequencies Output Table

Figure 93 - Test Statistics Output Table

NOTE: The expected values are equal to the sum of the observed values divided by the number of

rows, while the observed values are the actual numbers of patients discharged.

Reporting the analysis results:

H0: Do not reject. Patients discharged on weekdays only (Monday through Friday) leave at a constant daily rate.


Explanation: Figure 92 indicates that on average, about 92 patients were discharged from the

hospital each weekday. The rate for Mondays was below average and the rate for Fridays was

greater than average. Figure 93 indicates that the calculated value of the Chi-Square statistic was

5.822 at four degrees of freedom. Because the significance level (0.213) is greater than the

rejection threshold of 0.05, H0 (patients were discharged at a constant rate on weekdays) could

not be rejected.

Using the Chi-Square test procedure, it was determined that the rate at which patients were

discharged from the hospital was not constant over the course of an average week. This was

primarily due to a greater number of discharges on Fridays and fewer discharges on Sundays.

When the range of the test was restricted to weekdays, the discharge rates appeared to be more

uniform. Staff shortages could be corrected by adopting separate weekday and weekend staff

schedules.

With Customized Expected Values

Research Question # 3

Does first-class mailing provide quicker response time than bulk mail?

A manufacturer tries first-class postage for direct mailings, hoping for faster responses than with

bulk mail. Order takers record how many weeks each order takes after mailing.

H0: First-class and bulk mailings do not result in different customer response times.

Before the Chi-Square test is run, the cases must be weighted. Because this example compares

two different methods, one method must be selected to provide the expected values for the test

and the other will provide the observed values.

To weight the cases:

1. Open the chi-mail.sav file.

2. Click the Data menu and select Weight Cases. The Weight Cases dialog box opens.

3. Select the Weight cases by option.

4. Select the First Class Mail [fcmail] variable and transfer it to the Frequency Variable:

box.

5. Click the OK button.

To run the analysis:

1. Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square. The

Chi-Square Test dialog box opens.

2. Select the Week of Response [week] variable and transfer it to the Test Variable List:

box.

3. Select the Values: option in the Expected Values section.

4. Enter [6] in the Values: box.

5. Click the Add button.

6. Repeat steps 4 and 5, adding the values [15.1], [18], [12], [11.5], [9.8], [7], [6.1], [5.5],

[3.9], [2.1], and [2] (in that order).

7. Click the OK button. The Output Viewer window opens.

NOTE: The expected frequencies in this example are the response percentages that the firm has

historically obtained with bulk mail.
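When the expected values are supplied as percentages, they can be rescaled to expected counts before the statistic is computed. The Python sketch below illustrates that idea under the assumption that the listed values are treated as relative proportions; the function name chi_square_custom and the observed counts are hypothetical, while the twelve expected values are the ones entered in the steps above.

```python
# Expected response percentages entered in the Values: box, in order.
BULK_MAIL_PERCENTAGES = [6, 15.1, 18, 12, 11.5, 9.8, 7, 6.1, 5.5, 3.9, 2.1, 2]


def chi_square_custom(observed, expected_weights):
    """Return (statistic, expected_counts, degrees_of_freedom)."""
    if len(observed) != len(expected_weights):
        raise ValueError("observed and expected_weights must have equal length")
    n = sum(observed)
    total_weight = sum(expected_weights)
    # Rescale the weights so the expected counts add up to the observed total.
    expected = [n * w / total_weight for w in expected_weights]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected, len(observed) - 1


# Hypothetical first-class response counts for weeks 1 through 12.
first_class = [10, 22, 17, 13, 11, 8, 6, 5, 4, 2, 1, 1]
statistic, expected, df = chi_square_custom(first_class, BULK_MAIL_PERCENTAGES)
print(f"Chi-Square = {statistic:.3f}, df = {df}")
```

With twelve response weeks, the test has 12 - 1 = 11 degrees of freedom.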


Figure 94 - First-Class/Bulk Mail Week of Response

Figure 95 - Week of Response Test Statistics

Reporting the analysis results:

H0: Do not reject. There was no statistical difference between customer response times using first-class mailing and customer response times using bulk mailing.

Explanation: The manufacturer hoped that first-class mail would result in quicker customer

response. As indicated in Figure 94, the first two weeks indicated different response times of

four and seven percentage points, respectively. The question was whether the overall differences

between the two distributions were statistically significant.

The Chi-Square statistic was calculated to be 12.249 at eleven degrees of freedom (see Figure

95). The significance value (p) associated with the data was 0.345, which was greater than the

threshold value of 0.05. Hence, H0 was not rejected because there was no significant difference

between first-class and bulk mailings. The first-class mail promotion did not result in response

times that were statistically different from standard bulk mail. Therefore, bulk postage was more

economical for direct mailings.

One-Way Analysis of Variance

One-way analysis of variance (One-Way ANOVA) procedures produce an analysis for a

quantitative dependent variable affected by a single factor (independent variable). Analysis of

variance is used to test the hypothesis that several means are equal. This technique is an

extension of the two-sample t test. It can be thought of as a generalization of the pooled t test.

Instead of two populations (as in the case of a t test), there are more than two populations or

treatments.

Research Question # 4

Which of the alloys tested would be appropriate for creating an underwater sensor array?


To create an underwater sensor array, four different alloys are tested for corrosion resistance.

Five plates of the same size of each alloy are placed underwater for 60 days. After 60 days, the

number of corrosion pits on each plate is measured.

H0: The four alloys exhibit the same kind of behavior and are not different from one another.

To run One-Way ANOVA:

1. Open the alloy.sav file.

NOTE: Each case within the One-Way ANOVA data file represents one of the 20 metal plates

(five plates of four different alloys) and is characterized by two variables. One variable assigns a

numeric value to the alloy. The other variable is used to quantify the number of pits on the plate

after being underwater for 60 days (see Figure 96).

Figure 96 - Alloy Data File

2. In Data View, click the Analyze menu, point to Compare Means, and select One-Way

ANOVA. The One-Way ANOVA dialog box opens (Figure 97).

Figure 97 - One-Way ANOVA Dialog Box

3. Select the pits variable from the box on the left and transfer it to the Dependent List:

box (see Figure 97).

4. Select the Alloy [alloy] variable from the box on the left and transfer it to the Factor:

box (see Figure 97).

5. Click the Options button. The One-Way ANOVA: Options dialog box opens (see

Figure 98).


Figure 98 - One-Way ANOVA: Options Dialog Box

6. Select the Descriptive, Homogeneity of variance test, and Means plot check boxes.

7. Click the Continue button.

8. Click the OK button. The Output Viewer window opens.

Figure 99 - ANOVA Descriptive Output

Figure 100 - Output for Test of Homogeneity of Variances

Figure 101 - ANOVA Output

Reporting the analysis results:

H0: Reject in favor of H1.

H1: The four alloys do not exhibit the same kind of behavior. They are statistically different from one another.

Explanation: Figure 99 lists the means, standard deviations, and individual sample sizes of each alloy. Figure 100 provides the degrees of freedom and the significance level for the test of homogeneity of variances; df1 is one less than the number of sample alloys (4-1=3) and df2 is the difference between the total sample size and the number of sample alloys (20-4=16). Figure 101 lists the sums of squares, degrees of freedom, and mean squares for the variation between and within the alloy groups. In Figure 101, the Between Groups variation 6026.200 is due to differences between the group means. If sample means are close to each other, this value is small. The Within Groups variation 335.600 is due to differences within individual samples. The Mean Square values are calculated by dividing each Sum of Squares value by its respective degrees of freedom (df). The table also lists the F statistic 95.768, which is calculated by dividing the Between Groups Mean Square by the Within Groups Mean Square. The significance level of 0.000 is less than the threshold value of 0.05 and indicates that the null hypothesis can be rejected, leading to the conclusion that the alloys are not all the same.
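The arithmetic behind the ANOVA table can be checked with a short Python sketch. The first function recomputes the F statistic from the sums of squares and degrees of freedom quoted above; the second shows, for a small hypothetical data set, where those sums of squares come from. Both function names and the sample groups are hypothetical.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """F = (Between Groups Mean Square) / (Within Groups Mean Square)."""
    return (ss_between / df_between) / (ss_within / df_within)


def one_way_anova(groups):
    """Return (ss_between, ss_within, df1, df2, F) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between Groups: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within Groups: spread of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df1 = len(groups) - 1
    df2 = len(all_values) - len(groups)
    return ss_between, ss_within, df1, df2, f_ratio(ss_between, df1, ss_within, df2)


# Values quoted from the ANOVA output (Figure 101).
print(round(f_ratio(6026.200, 3, 335.600, 16), 3))

# Hypothetical pit counts for three small samples.
print(one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```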

Post Hoc Tests

In ANOVA, if the null hypothesis is rejected, then it is concluded that there are differences between the means (μ_1, μ_2, ..., μ_a). It is useful to know specifically where these differences exist. Post hoc testing identifies these differences. Multiple comparison procedures look at all possible pairs of means and determine if each individual pairing is the same or statistically different. In an ANOVA with a treatments, there will be a*(a-1)/2 possible unique pairings, which could mean a large number of comparisons.
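As a quick check of the pairing count, the following Python sketch enumerates the unique pairs for four treatments using the standard library. The variable names are hypothetical; the four treatments correspond to the four alloys in this example.

```python
from itertools import combinations

# Four treatments, labeled 1 through 4 as in the alloy example.
treatments = [1, 2, 3, 4]
a = len(treatments)

# Every unordered pair of distinct treatments is one comparison.
pairs = list(combinations(treatments, 2))

print(pairs)
print(len(pairs), a * (a - 1) // 2)
```

For four treatments, both the enumeration and the formula give six comparisons.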

Research Question # 5

Is the mean difference between alloy sets statistically significant?

The previous null hypothesis was rejected, leading to the conclusion that not all the alloys exhibit the same behavior. The next part of the analysis is to determine if the mean difference between individual alloy sets is statistically significant.

H0: μ_1 = μ_2 = ... = μ_a

H1: At least one of the means differs from the others.

To run post hoc tests:

1. In Data View, click the Analyze menu, point to Compare Means, and select One-Way

ANOVA. The One-Way ANOVA dialog box opens (see Figure 102).

Figure 102 - One-Way ANOVA Dialog Box

2. Click the Post Hoc button. The One-Way ANOVA: Post Hoc Multiple Comparisons

dialog box opens (see Figure 103).


3. Select the LSD check box, click the Continue button, and then click the OK button. The

Output Viewer window opens.

NOTE: LSD stands for Least Significant Difference, which compares the means one pair at a time.

Figure 103 - One-Way ANOVA: Post Hoc Multiple Comparisons Dialog Box

Figure 104 - Multiple Comparisons Output

Figure 105 - Means Plot

Reporting the analysis results:


H0: Reject in favor of H1.

H1: At least one of the means is different.

Explanation: Figure 104 shows the results of comparing pairs of means between different alloy

sets. Each row indicates the difference between the two corresponding treatments. Alloys 1

and 4 have a mean difference of 2.4 (a relatively small value). Also, the significance level of

0.420 indicates that the null hypothesis cannot be rejected for the comparison of alloys 1 and

4.

There is no statistically significant difference between them. Alloy pairs 1 and 2, 1 and

3, 2 and 3, 2 and 4, and 3 and 4 have large mean differences with significance

values of 0.000. In these cases, the null hypothesis can be rejected, leading to the conclusion

that they are statistically different. Also, the means plot (see Figure 105) shows that

alloys 1 and 4 have mean numbers of pits very close to

each other. Because alloys 1 and 4 have the lowest mean number of corrosion pits, they are

the best candidates for the array. Depending on the relative costs of the two alloys, the one that is

more cost effective can be selected to construct the array.
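
The decision rule applied to each row of the Multiple Comparisons table can be sketched as follows. The significance values are the ones quoted in the explanation above (0.000 is the rounded value shown in the output); the function name classify_pairs is an assumption made for this illustration.

```python
ALPHA = 0.05  # threshold value used throughout this handout

# Significance values quoted in the explanation of Figure 104
pairwise_sig = {
    (1, 2): 0.000, (1, 3): 0.000, (1, 4): 0.420,
    (2, 3): 0.000, (2, 4): 0.000, (3, 4): 0.000,
}

def classify_pairs(sig_by_pair, alpha=ALPHA):
    """Split pairs into (statistically different, not different)."""
    different = sorted(p for p, sig in sig_by_pair.items() if sig < alpha)
    not_different = sorted(p for p, sig in sig_by_pair.items() if sig >= alpha)
    return different, not_different

different, not_different = classify_pairs(pairwise_sig)
print("Reject the null hypothesis for:", different)
print("Cannot reject the null hypothesis for:", not_different)
```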

Two-Way Analysis of Variance

Two-way analysis of variance (Two-Way ANOVA) is an extension of the one-way analysis of

variance. Instead of testing a single independent variable, it tests two independent variables

(factors) at once, together with their interaction. Compared with a one-variable design, a

two-variable design is more efficient and can increase the statistical power of the test.

Research Question # 6

Will typing ability and test method affect student test scores?

To answer the question, an essay final is given to the class. Two test methods are used: half the

students are assigned to write the final in a blue book and the other half on notebook

computers. In addition, the students are partitioned into three groups, namely: no typing ability,

some typing ability, and highly skilled at typing. After evaluating the final, the mean score of

each group is examined.

H0: Typing ability and test method do not affect student test scores.

H1: Typing ability and test method do affect student test scores.

To run Two-Way ANOVA:

1. Open the Two-Way-ANOVA.sav file (see Figure 106).


Figure 106 - Two-Way ANOVA Data File

2. In Data View, click the Analyze menu, point to General Linear Model, and select

Univariate (see Figure 107). The Univariate dialog box opens (see Figure 108).

Figure 107 - Analyze Menu When Selecting Univariate

Figure 108 - Univariate Dialog Box

3. Select the SCORE variable from the box on the left and transfer it to the Dependent

Variable: box (see Figure 108).

4. Select the ABILITY and METHOD variables from the box on the left and transfer

them to the Fixed Factor(s): box (see Figure 108).

5. Click the Options button. The Univariate: Options dialog box opens (see Figure 109).

6. Select the Descriptive statistics check box.

7. Click the Continue button.

8. Click the OK button. The Output Viewer window opens (see Figure 110 and Figure

111).


Figure 109 - Univariate: Options Dialog Box

Figure 110 - ANOVA Descriptive Output Table

Figure 111 - Output Table for Tests of Between-Subjects Effects


Reporting the analysis results:

H0: Reject in favor of H1 for Ability and the interaction between Ability and Method (Ability*Method).

H1: Typing ability and test method affect student test scores.

Explanation: Figure 110 lists the means and standard deviations for the three ability levels under the two

methods. Students who have some typing ability and use the computer method achieve the

highest mean score (mean=36.67). As indicated in Figure 111, because the significance value of

Method (0.901) is more than the threshold value (0.05), it can be concluded that the Method

factor alone does not affect test scores. The significance values of Ability (0.033) and the

interaction between the two factors Ability*Method (0.047) are less than the threshold value

(0.05), leading to the conclusion that Ability and the combination of Ability and Method

(Ability*Method) do affect student test scores.
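
To show where the three F statistics in a Tests of Between-Subjects Effects table come from, here is a minimal Python sketch for a balanced two-factor design (equal numbers of scores in every cell). The scores are hypothetical and are not taken from Two-Way-ANOVA.sav; the function name two_way_anova_f is an assumption. For balanced data these sums of squares agree with the usual textbook decomposition; the significance values would additionally require the F distribution.

```python
from statistics import mean

def two_way_anova_f(cells):
    """F statistics for a balanced two-factor design.

    cells[i][j] holds the scores for level i of factor A and level j of
    factor B; every cell must contain the same number of scores (n >= 2).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    a_means = [mean([x for cell in row for x in cell]) for row in cells]
    b_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    # Main-effect, interaction, and error sums of squares
    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )

    # Each F is a Mean Square divided by the error Mean Square
    ms_error = ss_error / (a * b * (n - 1))
    return {
        "A": (ss_a / (a - 1)) / ms_error,
        "B": (ss_b / (b - 1)) / ms_error,
        "A*B": (ss_ab / ((a - 1) * (b - 1))) / ms_error,
    }

# Hypothetical scores: 3 ability levels (rows) x 2 test methods (columns)
scores = [
    [[20, 24], [18, 22]],
    [[26, 30], [34, 38]],
    [[28, 32], [27, 31]],
]
print(two_way_anova_f(scores))
```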

Importing/Exporting Microsoft Excel and PowerPoint

PASW Statistics can be used to analyze data in a Microsoft Excel spreadsheet. PASW Statistics

provides the ability to directly import an Excel spreadsheet into the Data Editor and

automatically create variables based on the column headings in the spreadsheet. PASW Statistics

can also export output from the Output Viewer window to Microsoft PowerPoint.

To import an Excel spreadsheet into PASW Statistics:

1. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens (see

Figure 112).

2. Click the Files of type: arrow and select the Excel (*.xls, *.xlsx, *.xlsm) file type.

3. Locate and open the demo.xls file. The Opening Excel Data Source dialog box opens

(see Figure 113).

For additional handouts, visit http://www.youtube.com/mycsula.

Figure 112 - Open Data Dialog Box

Figure 113 - Opening Excel Data Source Dialog Box

4. Click the OK button. PASW Statistics will process and read the Excel file and convert all

first row column headings into variables using the best approximation for the variable

attributes (see Figure 114 and Figure 115).

NOTE: If the Excel file contains multiple worksheets, select the desired worksheet by clicking

the Worksheet: arrow. Additionally, if only a specific range of cells in the worksheet is to be

imported, the range must be specified in the Range: box.
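
The idea of turning first-row column headings into variables with approximated attributes can be illustrated with a short Python sketch. This is not how PASW Statistics works internally: the rule used here (a column is Numeric if every non-empty value parses as a number, otherwise String) is an assumption, and the sample data and the name infer_variables are hypothetical.

```python
import csv
import io

def _is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def infer_variables(rows):
    """Use the first row as variable names and guess each variable's type."""
    header, *data = rows
    variables = []
    for col, name in enumerate(header):
        values = [row[col] for row in data if row[col] != ""]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        variables.append({"name": name, "type": "Numeric" if is_numeric else "String"})
    return variables

# Hypothetical spreadsheet contents, written as CSV text for illustration
sheet = "id,gender,score\n1,F,82.5\n2,M,77\n3,F,\n"
rows = list(csv.reader(io.StringIO(sheet)))
variables = infer_variables(rows)
print(variables)
```

After a real import, the guessed attributes can be reviewed and corrected in Variable View.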

Figure 114 - Excel File


Figure 115 - Excel File Imported into PASW Statistics

The reverse situation may also arise, where data in a PASW Statistics file must be analyzed

using Excel. This can be accomplished by exporting the contents of the Data Editor into an

Excel spreadsheet.

To export PASW Statistics data into an Excel spreadsheet:

1. In the Data Editor, click the File menu and select Save As. The Save Data As dialog

box opens (see Figure 116).

2. Click the Save as type: arrow and select the Excel 97 through 2003 (*.xls) or the

Excel 2007 (*.xlsx) file type.

NOTE: Selecting the Write variable names to spreadsheet check box will cause PASW Statistics

to write the variable names as column headings in the spreadsheet.

NOTE: If only certain variables from the Data Editor are desired in the spreadsheet, the user can

click the Variables button and select/deselect variables in the Save Data As: Variables dialog

box (see Figure 117).


Figure 116 - Save Data As Dialog Box

Figure 117 - Save Data As: Variables Dialog Box

3. Select a destination drive by clicking the Look in: arrow.

4. Enter a name for the Excel file in the File name: box.

5. Click the Save button.

NOTE: Upon completion, the Output Viewer window will open with a report summarizing the

details and results of the export operation (see Figure 118).

Figure 118 - PASW Statistics Export Output Report

To export PASW Statistics Output charts into a PowerPoint slide:

1. In the Output Viewer window, click the table to be exported. A box appears around the

table, with a red arrow to its left.

2. Click the File menu and select Export. The Export Output dialog box opens (see

Figure 119).

3. Click the Type: arrow and select the PowerPoint (*.ppt) file type.

4. Click the Browse button. The Save File dialog box opens (see Figure 120).

5. Select a destination drive by clicking the Look in: arrow.

6. Enter a name for the PowerPoint file in the File name: box.

7. Click the Save button.

8. Click the OK button.


Figure 119 - Export Output Dialog Box

Figure 120 - Save File Dialog Box

Using Scripting for Redundant Statistical Analyses

Every statistical analysis used by PASW Statistics is executed through a special programming

language. The specific code used for each analysis can be captured, stored as a script file, and

edited if necessary. A series of scripts in a script file can be run either individually or all at the

same time. Scripting automates a series of statistical analyses that are performed on a data file

whose data changes but which always contains the same variables. Scripts are captured and edited in the

PASW Statistics Syntax Editor window.


The following example illustrates the usefulness of capturing, storing, and running scripts. The

data for the example is taken from a classroom setting for a class that lasts one week. At the end

of each week, data is compiled for each student. The variables in the set include the subject

name, gender, pretest scores, posttest scores, grade point average, computer ownership, and

method of administering examinations for that individual. Each week, a report is generated that

answers a series of questions about the class from the previous week. The questions answered

and the statistical analyses used are the same every week, as described in Table 7.

Table 7 - Scripted Questions and Statistical Techniques

Question | Statistical Technique(s) to Answer Question
Does the data set include equal numbers of each gender and each test method? | Split the file; Crosstabs
Is there a difference between the male and female pretest scores? | Select all cases; Independent-Samples T Test
Is there a difference between the male and female posttest scores? | Independent-Samples T Test
Is there a difference between the overall pretest and posttest scores? | Paired-Samples T Test
Do gender, computer ownership, and test method affect test scores? | Three-Way ANOVA
Do gender, computer ownership, and test method affect test scores differently depending on gender? | Split the file; Two-Way ANOVA
Is there a linear relationship between the pretest and posttest scores for each gender? | Scatter plot graph with file split
Can pretest scores predict posttest scores for each gender? | Simple regression with file split
Is there an overall linear relationship between pretest and posttest scores? | Select all cases; Scatter plot graph
Can pretest scores predict posttest scores? | Simple regression

To construct a script file that will automatically run the analyses:

1. Open the ClassData.sav file.

2. Click the Edit menu and select Options. The Options dialog box opens (see Figure

121).

3. Click the Viewer tab, select the Display commands in the log check box, click the Apply

button, and then click the OK button.

NOTE: The script file is built by performing each statistical analysis in the desired order. All

analyses must be performed manually one time while the file is being built. In the current

example, the file will first be split, and then a crosstab table will be constructed.

4. Click the Data menu and select Split File. The Split File dialog box opens.

5. Select the Compare groups option and transfer the gender variable to the Groups

Based on: box.

6. Click the Paste button to add the command to the script file. The Split File dialog box

closes and the PASW Statistics Syntax Editor window opens with the pasted command

displayed (see Figure 122).


7. In the PASW Statistics Data Editor window, click the Analyze menu, point to

Descriptive Statistics, and select Crosstabs. The Crosstabs dialog box opens.

Figure 121 - Options Dialog Box

Figure 122 - PASW Statistics Syntax Editor Window

8. Move the gender variable to the Row(s): box and the method variable to the

Column(s): box.

9. Click the Paste button. The Crosstabs dialog box closes and the command is pasted in

the PASW Statistics Syntax Editor window (see Figure 123). The first question in

Table 7 has been entered into the script file.

NOTE: Scripts for each of the remaining analytical techniques would be entered into the script

file by using the Paste button in each dialog box after the parameters were set.

Figure 123 - PASW Statistics Syntax Editor Window

Figure 124 - Save Syntax As Dialog Box

10. Save the script file by clicking the File menu in the PASW Statistics Syntax Editor

window and selecting Save As. The Save Syntax As dialog box opens (see Figure 124).

11. Enter the location and name for the file and click the Save button.
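
For orientation, the text that ends up in the script file after steps 4 through 9 looks roughly like the commands assembled by the Python sketch below. Treat the exact command text as an assumption: the subcommands that version 17 pastes depend on the dialog settings and may include more options than shown here. The variable names gender and method come from the steps above; the helper name build_syntax is hypothetical.

```python
def build_syntax(split_var, row_var, col_var):
    """Assemble command-syntax text for a split-file crosstab (a sketch)."""
    commands = [
        # Split File with "Compare groups" sorts the file, then layers output by group
        f"SORT CASES BY {split_var}.",
        f"SPLIT FILE LAYERED BY {split_var}.",
        # Crosstabs with one row variable and one column variable
        f"CROSSTABS /TABLES={row_var} BY {col_var} /CELLS=COUNT.",
    ]
    return "\n".join(commands) + "\n"

syntax = build_syntax("gender", "gender", "method")
print(syntax)
```

Each command ends with a period, which is how the Syntax Editor tells where one command stops and the next begins.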

PASW Statistics script files have the .sps file extension. PASW Statistics provides several options

for running a script file: the Run menu of the PASW Statistics Syntax Editor contains

commands for All, Selection, Current, and To End.


To run an existing script file:

1. In the Data Editor window, click the File menu, point to Open, and select Syntax

(see Figure 125). The Open Syntax dialog box opens.

2. Locate and open the WeeklyAnalysis.SPS syntax file. The PASW Statistics Syntax

Editor window opens with the script displayed.

3. In the PASW Statistics Syntax Editor, click the Run menu and select All (see Figure

126). Every command in the script file will execute and the results will be displayed in

the Output Viewer window.

NOTE: If the Display commands in the log check box in the Viewer tab of the Options dialog

box remains selected, individual script commands will appear with the output in the Output

Viewer window.

Figure 125 - File Menu When Selecting Syntax

Figure 126 - Run (Syntax) Menu
