

Session 4

Using Compute to Create New Variables

Basic command
Mathematical Expressions
Example
Missing Values and Compute
The Count Command
Practical session 4


SESSION 4: Using Compute to Create New Variables

The Compute command allows you to create new variables and assign
values to these variables for each case.

Basic Command

New variable = mathematical expression

Upon selecting ...

Transform
Compute ...

the Compute Variable dialog screen will appear and prompt you for a
new variable name (the Target Variable) and a Numerical Expression.
Before we look at an example, I am going to describe some of the
different forms of numerical expressions.
Mathematical Expressions

The mathematical expression can be:

A variable

AGEGROUP = AGE

This allows you to create a copy of another variable. If you then recoded
the copy you would effectively be recoding into different variables.

A constant

TOTINC = 0

This may be useful if you want to set a variable to 0, such as TOTINC
(total income), before you go on to use a more complicated
COMPUTE command to calculate the actual total income.

A mathematical expression can include an arithmetic operator

+ addition
- subtraction
* multiplication
/ division
** exponentiation ("to the power of")

Some examples

TOTINC = WAGES + BONUS

YEARS = MONTHS/12

SQDOCTOR=DOCTOR**2

BYEAR = 87 - RAGE

In the last example, we can discover the (approximate) birth year of the
respondents in the 1987 Social Attitude Survey, knowing their age
(RAGE).
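
The arithmetic in these examples can be checked by hand. The following is a minimal sketch in Python (not SPSS syntax) for a single case; all of the values are made up for illustration and do not come from the survey.

```python
# Hypothetical values for one case (illustrative assumptions only).
wages, bonus = 12000, 1500
months = 30
doctor = 4
rage = 42  # respondent's age at the time of the 1987 survey

totinc = wages + bonus    # addition: total income
years = months / 12       # division: months converted to years
sqdoctor = doctor ** 2    # exponentiation: value squared
byear = 87 - rage         # approximate two-digit birth year

print(totinc, years, sqdoctor, byear)
```

Each line corresponds to one of the expressions above, evaluated for one respondent; SPSS repeats the same calculation for every case in the file.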


Mathematical expressions can also contain functions

Arithmetic Functions

e.g. LG10, SQRT

LGINCOME = LG10(TOTINC)

This will calculate the base-10 log of the variable TOTINC and put the
value into the new variable LGINCOME.

Statistical Functions

e.g. SUM, MEAN

TOTINC=SUM(WAGES,BONUS)

This is another way of calculating the total income from two other
variables, WAGES and BONUS.


AVERAGE=MEAN(MARK1 TO MARK5)

This computes a new variable AVERAGE which is the mean of MARK1,
MARK2, MARK3, MARK4, MARK5. The keyword TO can be used to
indicate that variables (in the order they are found in the file) in between
the two named in the command should also be included.
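
To make the role of TO concrete, here is a small sketch in Python (not SPSS syntax). It assumes a single hypothetical case stored as a dictionary whose key order stands in for the order of variables in the data file; the helper `to_range` and all values are illustrative inventions.

```python
import math
from statistics import mean

# Hypothetical case; key order mimics the order of variables in the file.
case = {"wages": 1000.0, "bonus": 0.0,
        "mark1": 55, "mark2": 60, "mark3": 65, "mark4": 70, "mark5": 75}

def to_range(case, first, last):
    """Values of every variable from `first` to `last`, in file order."""
    names = list(case)
    i, j = names.index(first), names.index(last)
    return [case[name] for name in names[i:j + 1]]

totinc = sum([case["wages"], case["bonus"]])        # like SUM(WAGES,BONUS)
lgincome = math.log10(totinc)                       # like LG10(TOTINC)
average = mean(to_range(case, "mark1", "mark5"))    # like MEAN(MARK1 TO MARK5)

print(totinc, lgincome, average)
```

Because TO depends on file order, a variable stored between MARK1 and MARK5 would be included in the mean whatever its name.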

Logical Functions

These evaluate as 'true' or 'false'. If 'true', then the new computed
variable will be given the value 1. If 'false', the new variable is given the
value 0.

e.g. Any, Range

RETIRED=RANGE(age,65,98)

Only those aged between 65 and 98 for the variable age will be given the
value 1 for the new variable, RETIRED; the rest of the cases are given 0.


LEAPYEAR=ANY(year,1964,1968,1972,1976,1980,1984,1988,1992)

Here if the value of the first argument (the variable year) matches any of
the remaining arguments (1964, 1968 etc.) then the statement evaluates
as 'true' and the new variable LEAPYEAR becomes equal to the value 1.
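
The 1/0 coding produced by these two functions can be sketched in Python (not SPSS syntax). The function names `spss_range` and `spss_any` are hypothetical stand-ins. The sketch assumes that both ends of the range are included and that all inputs are valid values; missing data is not modelled here.

```python
def spss_range(value, low, high):
    """Return 1 if low <= value <= high (ends included), otherwise 0."""
    return 1 if low <= value <= high else 0

def spss_any(value, *candidates):
    """Return 1 if value equals any of the candidates, otherwise 0."""
    return 1 if value in candidates else 0

leap_years = (1964, 1968, 1972, 1976, 1980, 1984, 1988, 1992)

retired = spss_range(70, 65, 98)       # a 70-year-old falls inside the range
leapyear = spss_any(1972, *leap_years) # 1972 is in the list
print(retired, leapyear)
```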


Example

Let's suppose we collected information for each respondent on how much
income they earned and also how much they earned in bonus payments
each year. Figure 4.1 shows the data for 4 hypothetical people :


Figure 4.1

Suppose we wish to create a new variable, let's call it totinc, which will be
their total income.

In SPSS we select:

Transform
Compute variable...

and then indicate the new variable name, totinc, in the Target Variable
box and the mathematical expression, wages+bonus, in the Numerical
Expression box. This expression can be built up by clicking on the
appropriate variables and symbols on the calculator pad, or simply by
typing in the Numerical Expression box (see Figure 4.2).

Clicking on OK will get SPSS to carry out this calculation and create a
new variable in the Data Window.

Let us do this for the data in Figure 4.1.

File
Open
Data
H:\My Documents\spss data\Fig4_1.sav
OK

We then click on the Transform menu item followed by

Compute variable

Target Variable totinc, Numerical Expression wages+bonus

Figure 4.2

Then click on OK.

The Data Window will then look like this :


Figure 4.3
Missing Values and Compute

Figure 4.4 shows the data for 8 hypothetical people. The missing value
for both wages and bonus is -1.00 and the dots in the data indicate blank
data, i.e. system missing values.


Figure 4.4

If we carried out the same compute command as above we would arrive
at the following data for the new variable, totinc:


Figure 4.5

Notice that valid data for totinc=wages+bonus only appears where there
is valid data for both wages and bonus.

However, another way of carrying out this computation would be to use
the function, SUM (wages, bonus) (see Figure 4.6).
Figure 4.6

Choosing the Statistical option in the Function group gives a list of
statistical functions. We choose Sum.

We would get the following results for the new variable, this time called
totinc2 (Figure 4.7).


Figure 4.7

Notice now that totinc2 contains a valid value even if one of the
component variables, wages or bonus, is missing. If both are missing,
then totinc2 contains a system missing value.

This shows the basic difference between the treatment of missing values
when using arithmetic operators and functions in the Compute command.
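
The difference can be sketched in Python (not SPSS syntax). The sketch assumes that a missing value of either kind (user-defined -1.00 or system missing) is represented by `None`; the helpers `add` and `spss_sum` and the four cases are hypothetical.

```python
def add(a, b):
    """Like wages+bonus: the result is missing if either operand is missing."""
    if a is None or b is None:
        return None
    return a + b

def spss_sum(*values):
    """Like SUM(...): add the valid values; missing only if none are valid."""
    valid = [v for v in values if v is not None]
    return sum(valid) if valid else None

# Hypothetical (wages, bonus) pairs; None marks a missing value.
cases = [(10000, 500), (12000, None), (None, 300), (None, None)]
for wages, bonus in cases:
    print(wages, bonus, add(wages, bonus), spss_sum(wages, bonus))
```

Only the first case gets a value from the arithmetic operator, whereas the function returns a value for the first three cases and is missing only for the last.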

You can force the SUM and MEAN function to only use cases with
complete or prespecified sets of information.

For instance :

totinc=SUM.2(wages,bonus)

Here the .2 in SUM.2 specifies that only if both wages and bonus have
valid data will the new variable totinc contain valid data. The result is the
same as if the expression had been wages+bonus.

With

average=MEAN.5(mark1 to mark5)

only if each mark is a valid value will a valid mean mark be calculated.
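
The effect of the numeric suffix can be sketched in Python (not SPSS syntax). As before, `None` stands for any missing value, and `sum_n` and `mean_n` are hypothetical helpers whose first argument plays the role of the number after the dot.

```python
def sum_n(min_valid, *values):
    """Like SUM.n(...): valid only if at least min_valid arguments are valid."""
    valid = [v for v in values if v is not None]
    if not valid or len(valid) < min_valid:
        return None
    return sum(valid)

def mean_n(min_valid, *values):
    """Like MEAN.n(...): valid only if at least min_valid arguments are valid."""
    valid = [v for v in values if v is not None]
    if not valid or len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)

print(sum_n(2, 100, 50))                  # both valid
print(sum_n(2, 100, None))                # one missing, so the result is missing
print(mean_n(5, 50, 60, 70, 80, None))    # only four valid marks
print(mean_n(4, 50, 60, 70, 80, None))    # four valid marks are enough here
```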


The Count command

Another way to create a new variable is to use the count command. This
command will create a new variable which will count the occurrence of
specified values across several variables.

Suppose we are using H:\My Documents\spss data\bsas91b.sav, and
that we are interested in how many environmental issues people consider
quite serious (value 3) or very serious (value 4). There are
nine questions about how seriously people think about environmental
issues in the data, envir1 to envir9. Therefore a count of nine would
mean that the person considers every issue a serious problem.

In order to do this, select:

Transform
Count values within cases...

The Count Occurrences of Values within Cases dialog box appears
and here we type in the new variable name, numenv (and a label if
required) and then select all nine envir variables and transfer them to the
variables box (see Figure 4.8).

We then click on Define Values and the next dialog box allows us to
indicate the values we wish to include in our count. In this case, the
values 3 (Quite serious) and 4 (Very serious) are included as a range.
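
The counting rule can be sketched in Python (not SPSS syntax). The helper `count_values` and the two respondents are hypothetical, and `None` stands for a system missing value.

```python
def count_values(case, names, low, high):
    """Count how many of the named variables hold a value from low to high."""
    return sum(1 for name in names
               if case[name] is not None and low <= case[name] <= high)

names = [f"envir{i}" for i in range(1, 10)]   # envir1 ... envir9

# Two hypothetical respondents (illustrative values only).
answered = dict(zip(names, [4, 3, 2, 1, 3, 4, 8, 0, None]))
no_data = dict(zip(names, [None] * 9))

print(count_values(answered, names, 3, 4))   # four answers are 3 or 4
print(count_values(no_data, names, 3, 4))    # nothing to count, so 0
```

Note that the second respondent, who has no valid answers at all, still receives a count of 0 rather than a missing value.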

Figure 4.8

The resulting frequency of the new variable, numenv is as follows :

numenv

                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   .00           1707      58.5            58.5                 58.5
        1.00             8        .3              .3                 58.8
        2.00            16        .5              .5                 59.3
        3.00            25        .9              .9                 60.2
        4.00            44       1.5             1.5                 61.7
        5.00            68       2.3             2.3                 64.0
        6.00            95       3.3             3.3                 67.3
        7.00           193       6.6             6.6                 73.9
        8.00           453      15.5            15.5                 89.4
        9.00           309      10.6            10.6                100.0
        Total         2918     100.0           100.0

Figure 4.9

However, this frequency table seems to be telling us that a lot of people
did not care about the environment. If we look at the original code book
we see that a large number of people are SYSTEM missing or missing for
these variables and therefore should be excluded from the count. We
shall discover how to perform a conditional data modification using IF in
the next session.


Practical session 4

The British Social Attitudes Survey includes a set of variables about
respondents' opinion about the seriousness of various environmental
pollutants and damage (noise from aircraft, lead from petrol, industrial
waste in rivers and seas, waste from nuclear electricity stations, industrial
fumes in the air, noise and dirt from traffic, acid rain, aerosol chemicals
and loss of rain forests). Respondents were asked to indicate for each of
these whether they thought the effect on the environment was not at all
serious (code 1), not very serious (code 2), quite serious (code 3), very
serious (code 4) or that they did not know (code 8) or did not reply (code
0). The answers are recorded in variables called ENVIR1 to ENVIR9.
One way of getting an overall summary score for a respondent's attitude
to the environment would be to sum the scores on these nine variables.
This can be done with the COMPUTE command in which a new variable,
ENVIRALL, is set to the sum total of the scores on each of the ENVIR
variables, for each respondent.

Start up SPSS in the usual way.

Retrieve the SPSS data file H:\My Documents\spss data\bsas91b.sav.

Then create the COMPUTE command to calculate the new variable
envall1 by using the following numerical expression:

envir1+envir2+envir3+envir4+envir5+envir6+envir7+envir8+envir9

Produce a bar chart for this variable. To plot a bar chart, click on Charts
in the Frequencies dialog box


Figure 4.10

and then click on Bar Charts, Figure 4.11


Figure 4.11

Repeat the compute command but this time calculate a new variable,
envall2, using the following numerical expression:

SUM (envir1 to envir9)

Then produce another frequency count and Bar Chart so that you can
compare the distribution of envall1 and envall2.


Are there any differences between the bar charts? Why? Which form of
the compute command would be best to use?

Save your output as Exer4.spo.

Exit SPSS.
