
# Basic statistics exercises:

I was asked by the faculty to include some take-home exercises in this class to show you how to
use Excel (a program you presumably all have!) to do basic statistical comparisons. Many of you
may already know how to do all of these; however, these exercises ensure that everyone is able
to do some basic types of analyses before leaving this class. Although we won’t really be going
over these in class, the exercises will be graded, so that I can ensure you all are able to do them.
Since they are fairly simple, they are an easy “A” for part of the class.

These assignments occur over the course of several weeks, and you will continue to use the same
dataset for all of them. You each will be given your own, unique dataset, so you won’t get the
same answers as one another. Be sure to report your dataset # on all assignments, or I won’t
know what your answers should have been! The particular instructions are below.

In all cases, you have data from two groups of participants (groupA & groupB), each of whom
gave you results from three different tasks (taskA, taskB, and taskC). If it makes it easier for
you, you can rename them (e.g., adults with and without hearing loss on tasks of speech
perception and word recognition; children with and without SLI on tasks of working memory
and grammar production, what-have-you). Since you’ll continue to use the same set of data for
multiple exercises, it is best to enter it all into a spreadsheet that you continue to use. (That way,
the assignments after the first one won’t take more than a few minutes to do.) I would
recommend the following format:
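
| Participant | Group | TaskA | TaskB | TaskC |
|---|---|---|---|---|
| 1 | A | score | score | score |
| 2 | A | score | score | score |
| 3 | A | score | score | score |
| etc. | | | | |
| 16 | B | score | score | score |
| 17 | B | score | score | score |
| 18 | B | score | score | score |
| etc. | | | | |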

For each exercise, there are just a few numbers you actually need to report back to me – I don’t
need the full file, or an Excel printout. Indeed, I don’t want that – just hand in a sheet of paper
with the answers. You will presumably get all the answers correct – if you miss any, then you
should come show me what you did, or email me your spreadsheet.

Often, I ask you to do the same thing two ways. Admittedly, I wouldn’t know if you did or not.
But the idea is that these exercises will walk you through the steps so that you could do things
either way in the future – so while I have no way of checking, it’s really to your benefit to try and
do the analyses both ways.

Also, if any result gives you the answer in scientific format (a number including an E, then
a negative sign, then 2 digits), you can get the number in a more typical format by selecting the
cell, going to Format -> Cells, picking Number, and then telling it how many digits after the
decimal point you want (at least as many as the number given by the 2 digits after the E).
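
For anyone who wants to see what that reformatting does to the number, here is a minimal Python sketch. The value 3.2E-05 is made up for illustration, and the variable names are hypothetical.

```python
# Hypothetical value, as a spreadsheet might display it: 3.2E-05
p_value = 3.2e-05

# Five decimal places (matching the "05" after the E) shows the first nonzero digit.
five_places = f"{p_value:.5f}"

# One more decimal place shows the second digit as well.
six_places = f"{p_value:.6f}"

print(five_places)
print(six_places)
```

So the 2 digits after the E are the minimum number of decimal places worth asking for; adding one or two more keeps extra precision.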

## Exercise one: Descriptive statistics

Part A. Find the mean, standard deviation, and standard error for each of the 4
task/group combinations using tasks A & B (ignoring task C): (groupAtaskA,
groupBtaskA, groupAtaskB, groupBtaskB).

To do a mean: the formula is =average(cella:cellz)

Type in =average( and then highlight the section you want, then add the end-paren.

Standard deviation works the same way, but it is stdev, not average.

Standard error is what we typically use in figures for the error bars, so it’s an important
statistic to be able to calculate in our field, but Excel does not give it directly. The formula for
standard error is the standard deviation (which you just did) divided by the square root of the
sample size (which you know). The formula for square root is =SQRT(_).
If you didn’t want to count out the sample size, you could have Excel count the cells for
you: =count(cella:cellz). So the full formula would be
=(stdev(cella:cellz)/sqrt(count(cella:cellz)))
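
If it helps to see the same arithmetic outside of Excel, here is a minimal Python sketch of the mean, standard deviation, and standard error. The scores are made up for illustration (they are not from anyone's dataset), and the variable names are hypothetical.

```python
import math
import statistics

# Hypothetical scores for one task/group combination.
scores = [72, 85, 90, 66, 78]

mean = statistics.mean(scores)   # plays the role of =average(range)
sd = statistics.stdev(scores)    # sample standard deviation (n - 1 in the denominator)
n = len(scores)                  # plays the role of =count(range)
se = sd / math.sqrt(n)           # standard error = standard deviation / sqrt(sample size)

print(mean, sd, se)
```

Note that `statistics.stdev` is the sample standard deviation, which is the same quantity Excel's stdev function is documented to return.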

Part B. Other descriptive statistics.

The other way to do descriptive statistics is with the Analysis ToolPak.
Go to tools -> Data analysis and then select descriptive statistics
(If you don’t see data analysis as an option under tools, it means you haven’t added in that
option to Excel. Instead, your menu probably has “add-ins” as an option. Click on that. You
have to opt to have the statistics add-in installed; it comes with Excel, but is only added on if you
select it. So click on add-ins, and then select Analysis ToolPak).
Highlight the section you want the statistics for. You can opt to have it tell you the results in
another worksheet (the default), in an entirely new file, or as part of the same worksheet if you
give it an output range (so it doesn’t write over something else you need). Just do it in a new
worksheet. This gives you a number of other measures as well.
Using this method, ignore group, and find the mean, median, standard deviation,
standard error, mode, kurtosis, and range (min/max) for the task as a whole.
(Note: you could have gotten such results individually with functions, too. The functions
would be =MIN, =MAX, =MODE, etc.)
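
As a rough cross-check on what the descriptive statistics output contains, here is a Python sketch that computes the same kinds of measures by hand. The scores are hypothetical, and the kurtosis line uses the sample excess kurtosis formula that I believe matches what Excel documents for its KURT function; treat that as an assumption rather than a guarantee.

```python
import math
import statistics

# Hypothetical scores for one task, ignoring group.
scores = [55, 60, 60, 72, 81, 90]

n = len(scores)
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)       # most frequent value
sd = statistics.stdev(scores)        # sample standard deviation
se = sd / math.sqrt(n)               # standard error
lowest, highest = min(scores), max(scores)
value_range = highest - lowest

# Sample excess kurtosis (assumed to match Excel's KURT; needs n >= 4).
z4 = sum(((x - mean) / sd) ** 4 for x in scores)
kurtosis = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * z4 \
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

print(mean, median, mode, sd, se, lowest, highest, value_range, kurtosis)
```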

Part C. Counts. One particularly useful function for grading is to have Excel tell you
how many As, Bs, Cs, etc. you have on an exam. This isn’t really a stats thing, but is good to
know how to do anyway. This can be done with the COUNTIF function. Pretend for the
moment that all of the scores in task C were exam grades. Let’s say you want to know how many
people got an A (90% or better). At the bottom of the column, enter =COUNTIF( Then you
have to highlight the section you want it to count over (i.e., the column). Add a comma after that.
Now you need to give the “rule” for what it should count. In this case, let’s say 90 is an A. So it
would be ">=90" (greater than or equal to 90). Then add the end paren. It should count the
number of scores >=90. Now, to do the Bs, you can do the same thing, but then you have to
subtract out the As. That is, if you type =countif(range, ">=80") it would give you As and Bs
combined, so you need to subtract the earlier cell (or make the condition be that the score is both
>=80 & <90). Using either approach, tell me the number of As and Bs in the column.
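
The two approaches to counting Bs can be sketched in Python as follows. The grades are made up, and the variable names are hypothetical; the point is only that subtracting and using a two-sided condition give the same count.

```python
# Hypothetical exam grades standing in for the task C column.
grades = [95, 88, 91, 79, 84, 90, 67]

# Equivalent of =countif(range, ">=90"): the As.
num_a = sum(1 for g in grades if g >= 90)

# Equivalent of =countif(range, ">=80"): As and Bs combined.
a_and_b = sum(1 for g in grades if g >= 80)

# Approach 1: subtract out the As.
num_b_by_subtraction = a_and_b - num_a

# Approach 2: require both conditions at once (>=80 and <90).
num_b_by_condition = sum(1 for g in grades if 80 <= g < 90)

print(num_a, num_b_by_subtraction, num_b_by_condition)
```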

## Exercise two: correlations

To do this exercise, we’re going to ignore the group difference.

Part A. Determine if there is a correlation between scores on task A and scores on
task B for all 30 people.

Go to tools -> Data analysis and then select correlation

Click in the section called “input range”, and then highlight your entire two columns for
A&B (don’t highlight C). Make sure the “grouped by” says columns, not rows.
The value you want is the correlation between column 1 and column 2 (the one between
any column and itself is, of course, 1).

Part B. Next, find the correlations among all three. To do this, do the same thing,
but select all three columns. You should get a larger table this time, with 3 values (one
identical to the one before).

Part C. Just like with the other tests, there is also a function for this. Try it out by
doing the correlation for only group A (15 people) on tasks A & B:
=correl(range1, range2).
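
For reference, the number these correlation tools report is the Pearson correlation coefficient. A minimal Python sketch of that calculation is below; the function name `pearson_r` and the example scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same number of scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical task A and task B scores for the same five people.
task_a = [61, 70, 75, 82, 90]
task_b = [58, 66, 79, 80, 94]
print(pearson_r(task_a, task_b))
```

Each position in the two lists has to belong to the same participant, just as each row of the spreadsheet does.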

## Exercise three: t-tests

To do this exercise, we’re going to ignore some group & task differences…. And, we’re
also going to ignore task C at first.

To do both parts A & C, you need to decide which type of t-test you want to do (paired or
unpaired). Go to tools -> Data analysis and select the right choice. (Assume equal variance for
this). In each case, it asks you for the range for variable 1 and variable 2: click in the box, then
highlight the appropriate region in your spreadsheet.

Part A. Assume for the moment that you only had one group of 30 people, rather than
two groups of 15 each. Each person still does both tasks, A & B. Do a t-test to determine
if the tasks give different results (i.e., if one task is easier).

Part B. Now do a t-test to see if task A was different than “chance” where chance is
50%. (A one-tailed t-test). Excel won’t do this, but it’s a simple formula. For a one group t-test,
you take the mean of the group you tested, subtract the known value (here, chance), and divide
by the standard error. To see if this is significant, use your table. Do the same thing for task B,
as well.
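
The one-group formula just described can be sketched in Python like this. The function name `one_sample_t` and the scores are hypothetical, and the sketch only produces the t-value and degrees of freedom; you would still look up significance in a table.

```python
import math
import statistics


def one_sample_t(scores, known_value):
    """t = (group mean - known value) / standard error, with df = n - 1."""
    n = len(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    t_value = (statistics.mean(scores) - known_value) / standard_error
    return t_value, n - 1


# Hypothetical percent-correct scores compared against chance (50%).
scores = [50, 60, 70]
t_value, df = one_sample_t(scores, 50)
print(t_value, df)
```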

Part C. Assume that A & B were two different studies (rather than two different tasks);
each study has two groups of people who did the same single task. Do two t-tests (one for each
experiment) in which you determine whether the two groups of 15 people differed.

Part D. There is another way to do a t-test in excel, though. Excel also has a t-test
function. If you’re entering data in once, and then analyzing it, the way we did it above is fine.
But if you’re testing a series of participants and looking at the data repeatedly over the course of
the study, you’d have to keep doing this over and over. (It doesn’t automatically update as you
add in new data). So in that case, you’re better off doing the t-test function. It just isn’t quite as
intuitive, in some ways. To demonstrate this, I want you to compare tasks A & B to C (using
paired t-tests) using the ttest function. This is a function of the format
= TTEST(array1,array2,tails,type)
So click in the cell where you want the results. Type in the equal sign, then ttest, then
highlight the first set of cells you want analyzed, then the second set. For tails, you enter 1 for a
1-tailed test, 2 for a 2-tailed test. (Do 2-tailed ones here). Then for type, you enter 1 for paired, 2
for unpaired (assuming equal variance). Since this is a function, it updates itself (recalculates)
every time you change the data or add more in. But, it actually doesn’t tell you the t-value itself
– it gives you p (the probability). If you want the t-value, you need to do a second function,
called TINV, on the results of the ttest function. This function is in the format
=TINV(probability,degrees_freedom).
So in a neighboring cell, type in =TINV( Then, for the probability, click on the cell that
gave you the TTEST result. For df, you need to enter in the number yourself (for a paired test,
that is the number of pairs minus 1).
So first, do this for A vs. C, and then again for B vs. C. Report both the probability and
t-test results in the correct manner. Then, do it for A vs. B. You should get the same thing you
got in part A, above. But then change the numbers for the following 5 subjects’ A values to be
10 points higher than they were: Harry, Barney, Sally, David, Fred. The t-value and probability
should change. Report those, as well. (Remember to go back and undo the data changes
afterwards!)
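
Since the TTEST function only hands back p, it can be useful to know how the t-value itself is built for the two "type" options mentioned above. The Python sketch below computes the t-value and degrees of freedom for a paired test (type 1) and for an unpaired test assuming equal variance (type 2). The function names and the example numbers are hypothetical, and the sketch does not compute p.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t-test: work on the per-person differences. df = pairs - 1."""
    if len(first) != len(second):
        raise ValueError("paired data need the same number of scores")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def unpaired_equal_variance_t(group1, group2):
    """Unpaired t-test with a pooled variance. df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_variance = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t_value = (statistics.mean(group1) - statistics.mean(group2)) / standard_error
    return t_value, df


# Hypothetical scores, only to show the calls.
print(paired_t([5, 7, 9], [4, 5, 6]))
print(unpaired_equal_variance_t([1, 2, 3], [4, 5, 6]))
```

The sign of t depends on which set of scores you list first; TINV always returns the positive value.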

## Exercise four: 1-way ANOVA

Excel does not allow you to do a repeated-measures 1-way ANOVA. It allows you to do an
ANOVA, but it presumes that the data is NOT repeated. So we're going to pretend, for the
moment, that your dataset consists of 3 groups of 30 subjects each - that is, that you tested 90
subjects, 30 in each task.
Determine if the three tasks differ from one another. (Again, this is pretending that
you had 3 separate groups of subjects - if you'd really done the repeated measures design in
your dataset, you'd need to use a stats package other than Excel!!!)
Go to tools -> Data analysis and select ANOVA: single factor. Tell it to group by column
(so you have 3 groups of 30 people each in your output). You would report this as
F(df between, df within) = F-value, p < p-value.
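
To make the pieces of that report concrete, here is a minimal Python sketch of where the F-value and the two df numbers come from in a single-factor ANOVA with separate groups. The function name `one_way_anova` and the tiny example groups are hypothetical, and the sketch does not compute p.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for independent groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three separate groups.
f_value, df_between, df_within = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_value, df_between, df_within)
```

With 3 groups of 30 people each, the same logic gives df between = 2 and df within = 87.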

## Exercise five: Fisher’s exact test.

Ok, this one doesn’t use Excel, but Fisher’s is so easy to use, that you should all be aware
of its existence, particularly because distributional questions come up fairly often. There are
many websites that will do a Fisher’s for you; one you can try is at
http://www.langsrud.com/fisher.htm
You have 30 subjects in your data set, with 15 in each group. Does one group have more
males than females? Report the probability of this being significant….
(You could also do a Chi-squared test, if your Ns aren’t too small – which Excel can do.)
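
If you are curious what such a website is doing, here is a Python sketch of a two-sided Fisher's exact test for a 2x2 table. The function name is hypothetical and the counts are made up. The two-sided p here sums the probabilities of every table that is no more likely than the one observed, which is one common convention; a given website may define its two-sided value differently, so small discrepancies are possible.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Add up every table that is as unlikely as, or less likely than, the observed one.
    return sum(
        table_probability(x)
        for x in range(smallest, largest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )


# Hypothetical counts: group A has 9 males and 6 females, group B has 5 and 10.
print(fisher_exact_two_sided(9, 6, 5, 10))
```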

## Exercise six: 2-way ANOVA

Although Excel has two-way ANOVAs as an option, these actually only do the case where you
have one observation per cell. Since you have 15 observations per cell, it will not work with
your data set. To do anything beyond a 1-way ANOVA, you really need a stats package. All
stats packages are different, and although all of you have access to SPSS via BSOS’s software,
using SPSS is getting outside of the topic of this course…. So, I’m not going to have you
actually do a 2-way ANOVA. But let’s assume you’d done so, and found a significant
interaction. Do the follow-up t-tests you’d need to do to explain that significant interaction, and
then explain the findings in text (i.e., tell me where the interaction comes from, IN PLAIN
ENGLISH (a sentence or two explanation of what it really means)). (HINT: you’ve already
done some of those tests).

## Exercise seven: arcsin transformations

As noted in class, percentages are not actually Gaussian, so really should not be used in most
statistical tests.

Let’s pretend all your values are percentages. Take Task C, and do arcsine transformations of
each number – then report the average of these arcsin values. How much does this differ from
the average of the “percentages” you’d had before?

Note: Excel does arcsines in radians, not in degrees, so you need to account for this!
Step 1: Turn each percentage into a proportion (i.e., divide by 100)
Step 2: Take the square root of each number. =SQRT(cell)
Step 3: Take the arcsine of each number. =ASIN(cell)
Step 4: Convert into degrees. =DEGREES(cell)
Or, to do this all at once, the formula is:
=DEGREES(ASIN(SQRT(cell/100)))
If you want to test that your formula works, the transformation of 100 should give you 90, the
transformation of 10 should give you 18.4-something.
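
The same four steps can be written as a short Python function, which also reproduces the two check values above. The function name `arcsine_transform` and the example list are hypothetical.

```python
import math


def arcsine_transform(percent):
    """Arcsine transformation of a percentage, with the result in degrees."""
    proportion = percent / 100        # Step 1: percentage -> proportion
    root = math.sqrt(proportion)      # Step 2: square root
    radians = math.asin(root)         # Step 3: arcsine (in radians)
    return math.degrees(radians)      # Step 4: radians -> degrees


print(arcsine_transform(100))
print(arcsine_transform(10))

# Hypothetical task C "percentages": average before and after transforming.
percents = [40, 55, 70, 85]
mean_percent = sum(percents) / len(percents)
mean_transformed = sum(arcsine_transform(p) for p in percents) / len(percents)
print(mean_percent, mean_transformed)
```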