
In biology we often need to use statistics to analyse experimental results. In AS biology we need to learn to use descriptive

statistics to summarise data.

Merlin

The best way to carry out statistical calculations is to use computer software. In school we have "Merlin", which is an Excel

add-in that performs all the statistics easily. To use Merlin:

1. Double-click on the file "Merlin.xla" (your teacher will tell you where to find it).

2. If you get a warning message about macros then click on "enable macros", or Merlin won't work.

3. Excel will start with Merlin installed. There will be a new "Merlin" menu item.

4. Open a new or existing Excel file and use the Merlin functions just like normal Excel functions. You can also plot

charts from the Merlin menu.

5. You can get help on all Merlin's functions using Merlin help in the Merlin menu.

Merlin is completely free and can be copied to any computer, so you can use it at home.

Descriptive Statistics


to get replicates. Replicate measurements in biology

are rarely identical, due to random errors and

natural variation. If enough measurements are

repeated they can be plotted on a histogram, like the

one on the right. This usually shows a normal

distribution, with most of the repeats close to some

central value. Many biological phenomena follow this pattern, e.g. people's heights, the number of peas in a pod and the breathing rate of insects.

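[Figure: histogram of replicate values with a normal distribution curve; the mean and the 95% CI on either side of it are marked. X axis: values; Y axis: number of times each value occurs.]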

The central value of the normal distribution curve is the mean (also known as the arithmetic mean or average). But how

reliable is this mean? If the data are all close together, then the mean is probably good, but if they are scattered widely, then

the calculated mean may not be very reliable. The spread of the replicates is given by the standard deviation (SD), and the more spread out the replicates are, the larger the SD. The accuracy (or confidence) of the calculated mean is given by the 95% confidence interval (CI). You can be 95% confident that the real mean lies somewhere in the range: calculated mean ± CI. For example if a calculated mean is 10 with a CI of 2, then we are confident that the real mean lies somewhere in the range 8 to 12, and there is only a 5% chance that it lies outside this range. Whenever you calculate a mean you should also calculate a confidence interval to indicate the quality of your replicates. The confidence interval should be shown on a chart as an error bar.

[Figure: two sets of replicates compared. One has low variability: data close together, small confidence limit, mean is reliable. The other has high variability: data scattered, large confidence limit, mean is unreliable.]
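To see what goes into a mean and its 95% CI, here is a minimal Python sketch. It assumes the CI is the usual t-based half-width, t × SD / √n; the handout does not say exactly how Merlin's =CI function is calculated, so treat this as an illustration rather than a copy of Merlin. The function name mean_and_ci and the three replicate values are made up for the example.

```python
import math
import statistics

# Two-tailed 95% critical values of Student's t for 1 to 20 degrees of freedom.
T_95 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
        2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086]


def mean_and_ci(replicates):
    """Return (mean, half-width of the 95% confidence interval)."""
    n = len(replicates)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample SD (n - 1 in the denominator)
    df = n - 1
    # Beyond the table, fall back to the normal value 1.96 (slightly too small).
    t = T_95[df - 1] if df <= len(T_95) else 1.96
    return mean, t * sd / math.sqrt(n)


# Illustrative replicates (not data from this handout).
mean, ci = mean_and_ci([8, 10, 12])
print(f"mean = {mean:.2f}, 95% CI = +/-{ci:.2f}")
```

With only three replicates the CI comes out large compared with the mean, which is the "unreliable mean" situation described above; more replicates, or less scattered ones, shrink it.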

Non-Normal Data

Sometimes replicate data are not normally-distributed, and a histogram of replicates doesn't give a symmetrical, bell-shaped

curve. This can happen with arbitrary scales like "1-5", calculated data and data sets with extreme outliers. In this case it's

meaningless to calculate a mean or a CI, so instead of a mean you calculate the median; instead of a CI you calculate the

interquartile range; and instead of a bar chart you draw a box plot. This shows the median as a central line; the interquartile

range as a box; and the maximum and minimum values as "whiskers".
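The numbers behind a box plot can be sketched in a few lines of Python. This is only an illustration: the function name median_and_quartiles and the scores are made up, and the "inclusive" quartile method is an assumption, since the handout does not say which quartile rule Merlin uses.

```python
import statistics


def median_and_quartiles(values):
    """Return (median, lower quartile, upper quartile)."""
    # quantiles() with n=4 returns the three cut points between the quarters.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


# Illustrative scores on a 1-5 scale (not data from this handout).
scores = [1, 2, 3, 4, 5]
median, q1, q3 = median_and_quartiles(scores)
print("median:", median)
print("interquartile range:", q1, "to", q3)
print("whiskers:", min(scores), "to", max(scores))
```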

HGS Biology

NCM/01/06

A2 statistics page 2

This first example shows two data sets summarised and

plotted as a bar chart. Cell B11 contains the formula

=MEAN(B2:B10); and cell B12 contains the formula

=CI(B2:B10). These formulae were dragged across to

column C to save typing. Excel will always return the

results of a calculation to about 8 decimal places of

precision. This is meaningless, and cells with calculated

results should always be formatted to a more sensible

precision (Format menu > Cells > Number tab >

Number). Also take time to format the table nicely by

adjusting the column widths, alignment, etc.

The bar chart was plotted by Merlin > Bar Chart. The categories range is the names in cells B1:C1; the values range is the means in cells B11:C11 and the error bar range is the CIs in cells B12:C12. Only the means are plotted, not the raw data. Type "mean values" into the value axis title box and press OK.

These two groups have the same mean but different confidence intervals. In group A the CI is small compared to the mean,

so the data are reliable and you can be confident that the real mean is close to your calculated mean. But in group B the CI

is large compared to the mean, so the data are unreliable, as the real mean could be quite far away from your calculated

mean.

This second example shows the effect of enzyme concentration on the rate

of reaction, which is best presented as a scatter graph. Each measurement

was repeated 5 times and a mean and CI calculated in rows 8 and 9. The

scatter graph was plotted by Merlin > Scatter Graph. The X range is B2:F2;

the values range is B8:F8 and the error bar range is B9:F9. The X axis title

is cell B1 and Y axis title is "rate of reaction".

We want to draw a line of best fit, so we click on straight trend-line, and

select "force through origin", since we know that there is zero rate with no

enzyme. A line can also be drawn joining the points, if that is appropriate.

A line of best fit should always go through the error bars.

This example shows some data on a 1-5 score. The median is

calculated with the formula =MEDIAN(B3:B10) in cell B11, and

dragged to column C. The box plot was plotted by Merlin > Box

Plot. The values range is the raw data in cells B3:C10; the Raw

replicate data option should be selected; the categories range is the

names in cells B2:C2; and the value axis title is cell B1.

Once you have drawn a graph, you can now change any aspect of it by double-clicking (or sometimes right-clicking) on the

part you want to change. For example you can: move and re-shape the graph; change the shape and size of the markers

(dots); or change the axes scales and tick marks.


Questions

1. The effect of three different fertilisers on growth of wheat seedlings was investigated by measuring the heights of wheat

seedlings after treatment with fertiliser. Five plants were measured with each fertiliser, with the following results:

Height of seedling (m)

          Fertiliser A    Fertiliser B    Fertiliser C
plant
plant
plant
plant
plant
mean
CI

[The numerical values of this table are missing.]

Use Merlin to calculate the means and 95% confidence limits of these results, then plot a bar chart graph of the mean

results with error bars.

2. In an investigation into the rate of photosynthesis in Elodea the number of bubbles given off in one minute was counted

under different light intensities. Each measurement was repeated 5 times. Use Merlin to calculate the means and 95%

confidence intervals of these results, then plot a graph of the mean results with error bars and a line of best fit.

No. of bubbles per min

          light intensity (Lux)
trial
trial
trial
trial
trial
mean
CI

[The numerical values of this table are missing.]

3. The measured body temperature for a group of 30 students is 36.6 ± 0.95 °C. Does this result agree with the human body temperature of 37.5 °C given in the textbook?

4. (You can choose many or none.)

(a) There is a linear relation

(b) There is a curved relation

(c) The point at X=30 is higher than the point at X=40

(d) The line of best fit must go through the origin

(e) The point at X=40 is an anomaly.

(f) No conclusions could be drawn from this data.

5. What conclusion can you draw from the bar chart on the right?

[Figures for questions 4 and 5: a graph with an axis marked from 10 to 70, and a bar chart with an axis labelled "Height".]

In A2 biology we need to learn how to use statistics to analyse experimental results. Sometimes experimental results are not

very clear and so it is difficult to make a firm conclusion. In these cases an appropriate statistical test can help to clarify the

results so that a valid conclusion can be made. Statistics can therefore be used to extract the maximum amount of

information from experimental data, and are an essential tool of experimental biology.

There are three stages in using a statistical test: choosing a test; carrying it out; and making a conclusion. Before we choose

a test though, we need to learn a little about different kinds of test and different kinds of data.

[Figure: classification of statistics. Descriptive statistics (e.g. mean, median, standard deviation, population index, etc.); statistics that test whether a null hypothesis is correct, divided into comparative statistics for two or more sets of data (e.g. t-test, χ²-test, etc.) and association statistics for sets of data (e.g. correlation, regression).]

Descriptive statistics are used to summarise data so you can simplify them and plot a graph. This is what you did in AS,

and includes the mean and 95% confidence interval. For non-normal data you may instead have to calculate the median and

interquartile range.

Inferential Statistics test a statement called the null hypothesis, and return a probability (called a P-value). Strictly, the P-value is the probability of getting results at least as extreme as yours if the null hypothesis were true, so it tells you how consistent your data are with the null hypothesis. The null hypothesis is a mathematical statement and is fixed for a given test (the exact null hypothesis is given for each test on the next few pages). It has nothing to do with (and can be quite different from) any scientific hypothesis you may be making about the result of the experiment. The lower the P-value, the less consistent the data are with the null hypothesis, and in biology we usually take 0.05 (or 5%) as the cut-off. This may seem very low, but it reflects the fact that biology experiments are expected to produce quite varied results.

If P < 5% (0.05) then we reject the null hypothesis, and conclude that there is a significant difference or association.

If P ≥ 5% (0.05) then we accept the null hypothesis, and conclude that there is no significant difference or association.
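The two rules above amount to a single comparison, which can be written as a short Python sketch. The function name conclude and the example P-values are made up for illustration.

```python
def conclude(p_value, alpha=0.05):
    """Apply the 5% cut-off to a P-value and return the wording of the conclusion."""
    if not 0 <= p_value <= 1:
        raise ValueError("a P-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis: significant difference or association"
    return "accept the null hypothesis: no significant difference or association"


# Illustrative P-values, printed as percentages.
for p in (0.02, 0.05, 0.30):
    print(f"P = {p:.0%}: {conclude(p)}")
```

Note that a P-value of exactly 5% falls on the "accept" side of the cut-off.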

There are two kinds of inferential statistics:

Comparative statistics are used to compare different sets of data to see if they are different (e.g. is this group bigger than

that group?). The null hypothesis states that there is no difference between the sets of data. You must also choose between

matched-sample experiments and independent sample experiments. In a matched-sample experiment each measurement from the first group matches up with a corresponding measurement in the other groups, perhaps because they were all made on the same subject (e.g. a "before and after" experiment). Otherwise you have an independent sample experiment. If it isn't obvious which value matches with which, then it's probably not matched.

Association statistics are used to look for an association (or correlation) between two sets of data (e.g. if this goes up does

that go up?). The null hypothesis states that there is no association between the sets. A scatter graph (or mosaic chart for

categoric data) of one factor against the other (without a line of best fit) indicates the association. If both factors increase

together then there is a positive correlation; if one factor decreases when the other increases then there is a negative

correlation; and if the scatter graph has apparently random points then there is no correlation:


[Figure: three scatter graphs of variable 2 against variable 1, showing a positive correlation, a negative correlation and no correlation.]

First there is a P-value, which tests the null hypothesis that there is no correlation. If P<5% then the null hypothesis is

rejected and there is a significant correlation (with the strength indicated by the correlation coefficient); but if P>5%

then we accept the null hypothesis that there is no correlation (and the value of the correlation coefficient is

meaningless).

Second there is a correlation coefficient, which gives the strength of the association. It varies from 0 (no correlation) to ±1 (perfect correlation). Positive values indicate a positive correlation while negative values indicate a negative correlation. The larger the absolute value (positive or negative), the stronger the correlation.

A correlation does not necessarily mean that there is a causal relationship between the factors (i.e. changes in one factor

cause the changes in the other). The changes may both be caused by a third factor, or it could be just coincidence. Further

controlled studies would be needed to find out.

There are different statistical tests designed for different kinds of data. Data can be classified as quantitative (numbers) or

qualitative (words).

[Figure: classification of data. Quantitative data (numbers) are either normally distributed, or not normally distributed or ordinal. Qualitative data (words) either can be ranked (ordinal data) or cannot be ranked (nominal data: use frequency tests).]

Most quantitative measurements (e.g. length, mass, temperature, rates, counts) are normally-distributed, especially if you

have a large number of repeats. For normally-distributed data you can use the most powerful Parametric Tests.

Other quantitative measurements are not normally-distributed. These include arbitrary scales like "1-5", calculated data

and data sets with extreme outliers. In this case the parametric tests are invalid, so choose Non-Parametric Tests.

Some qualitative data can be ranked, so can be replaced by numerical ranks (e.g. big, medium, small become 1, 2, 3).

These can then be analysed using Non-Parametric Tests.

Finally, the data can simply be categories that cannot be ranked (e.g. colours, shapes, species). We can't do maths on

categoric data, but we can count the numbers in each category to give frequencies, and then compare these observed

frequencies with some expected frequencies using Frequency Tests.


All the different statistical tests you will come across are summarised in this table. Don't panic, you don't need to learn this!

                           Parametric Tests       Non-Parametric Tests    Frequency Tests
                           (for normal data)      (for ordinal data)      (for nominal data)

Descriptive Statistics
  Summarise Data           Mean, Standard         Median, Quartiles
                           Deviation, 95% CI
  Chart                    Bar Chart              Box Plot

Comparative Statistics
  Independent Samples
    2 groups               Unpaired t-Test        Mann-Whitney U-Test     Chi-Squared Test
    >2 groups              Anova                  Kruskal-Wallis Test     or G-Test
  Matched Samples
    2 groups               Paired t-Test          Wilcoxon Test
    >2 groups              Matched Anova          Friedman Test

Association Statistics
  Chart                    Scatter Graph          Scatter Graph           Mosaic Chart
                           Pearson Correlation    Spearman Correlation    Chi-Squared Test
                                                                          of Association
  Details of Linear        Linear Regression
  Relationship
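The table can also be read as a simple look-up, sketched here in Python. The function name choose_test and its arguments are made up for this illustration (this is not how Merlin's "Choose a Test" works internally), and it assumes the second row in each pair of the table is for more than two groups.

```python
COMPARATIVE_TESTS = {
    # (data type, matched samples?, more than two groups?) -> test
    ("normal", False, False): "Unpaired t-Test",
    ("normal", False, True): "Anova",
    ("normal", True, False): "Paired t-Test",
    ("normal", True, True): "Matched Anova",
    ("ordinal", False, False): "Mann-Whitney U-Test",
    ("ordinal", False, True): "Kruskal-Wallis Test",
    ("ordinal", True, False): "Wilcoxon Test",
    ("ordinal", True, True): "Friedman Test",
}


def choose_test(data, purpose, matched=False, groups=2):
    """Return the test named in the table.

    data:    "normal", "ordinal" or "nominal"
    purpose: "compare" (comparative) or "associate" (association)
    """
    if data not in ("normal", "ordinal", "nominal"):
        raise ValueError("data must be 'normal', 'ordinal' or 'nominal'")
    if purpose not in ("compare", "associate"):
        raise ValueError("purpose must be 'compare' or 'associate'")
    if data == "nominal":
        if purpose == "compare":
            return "Chi-Squared Test or G-Test"
        return "Chi-Squared Test of Association"
    if purpose == "associate":
        return "Pearson Correlation" if data == "normal" else "Spearman Correlation"
    return COMPARATIVE_TESTS[(data, matched, groups > 2)]


# A "before and after" experiment scored on a 1-5 scale:
print(choose_test("ordinal", "compare", matched=True, groups=2))
```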

1. Choose a Test
While planning your investigation, choose a suitable statistical test using Merlin's "Choose a Test" or the table above.

2. Hypothesis
Look up the test in this document. State the null hypothesis as precisely as you can for your experiment; e.g. "There is no difference between the means of the plant heights in the two areas". This statement is what the statistical test will actually test.

3. Obtain Results
Carry out the investigation and obtain results. For hypothesis-testing we need as many replicate measurements as possible, and as a guide, aim for at least 10 replicates and preferably 20 in each set.

4. Present the Data
Present your raw data in a neat results table. Use Merlin to calculate descriptive statistics, like mean and 95% CI, and plot an appropriate graph.

5. Carry out the Statistical Test
Type the Merlin formula for your chosen test into an empty cell e.g. =TTESTP(B3:B12,C3:C12). This will return the P-value for that test. It's a good idea to format this cell as a percentage (Format menu > Cells > Number tab > Percentage), so for example a P-value of 0.02 appears as 2%.

6. Make a Conclusion
State whether you accept or reject the null hypothesis, and write a sentence explaining exactly what that means in this case. For example if P was < 5% then you would reject the null hypothesis and say that the plants in group A are significantly taller than the plants in group B. Or if P was > 5% then you would accept the null hypothesis and say that as far as you can tell from the data, there is no significant difference between the heights of the two groups of plants. The wording of the conclusion is important, so use these examples as a guide and think carefully about what you are saying. Note that if P > 5% we haven't proved the null hypothesis, but since our data are consistent with it, we accept it.

There are lots of examples of conclusions on the next five pages. This is a statistics reference guide, which describes each

of the tests in detail.


Unpaired t-Test

(t-test)

This test is used to compare two sets of data, and it tests the null hypothesis that the two sets have the same mean. The data must be normally-distributed and there must be at least 10 replicates (and preferably many more). If P<5% then the null hypothesis is rejected and there is a significant difference between the two means.

The Merlin function is =TTESTU(range1, range2). In this example the effects of two fertilisers on the yield of potatoes are compared. The formula =TTESTU(B3:B12,C3:C12) is typed into cell B15 and formatted as a percentage. The P value is >5% so we accept the null hypothesis and conclude that there is no significant difference between the two fertilisers.
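For those curious about what lies behind the P-value, this Python sketch calculates the t statistic for two independent samples. It assumes the standard equal-variance (pooled) form of the test; the handout does not say which form TTESTU uses, and the function name unpaired_t and the yields are made up. Turning t into a P-value needs the t-distribution, which is left to Merlin or a table here.

```python
import math
import statistics


def unpaired_t(sample1, sample2):
    """Return (t, degrees of freedom) for the pooled two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df  # pooled variance
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))     # SE of the difference in means
    return (m1 - m2) / se, df


# Illustrative yields for two fertilisers (not data from this handout).
fertiliser_a = [27, 20, 16, 18, 22, 19, 23, 21, 17, 19]
fertiliser_b = [28, 19, 18, 21, 24, 20, 25, 27, 29, 21]
t, df = unpaired_t(fertiliser_a, fertiliser_b)
# With 18 degrees of freedom, |t| must exceed 2.101 for P < 5% (two-tailed).
print(f"t = {t:.2f} with {df} degrees of freedom")
```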

Paired t-Test

This test is used to compare two sets of paired (matched) data, and it tests the null hypothesis that the mean difference between the pairs is zero. The data must be in pairs, they must be normally-distributed and there must be at least 10 replicates (and preferably many more). If P<5% then the null hypothesis is rejected and there is a significant difference between the two sets.

The Merlin function is =TTESTP(range1, range2). This example compares pulse rate before and after eating a large meal. Because each individual had their pulse measured before and after the meal, the data are paired. The formula =TTESTP(B3:B12,C3:C12) is typed into cell B15 and formatted as a percentage. The P value is <5% so we reject the null hypothesis and conclude that pulse rate is significantly higher after a meal.
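The paired test works on the differences within each pair, as this Python sketch shows. The function name paired_t and the pulse rates are made up for illustration, and the P-value step (which needs the t-distribution) is again left to Merlin or a table.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees of freedom) for the paired t-test."""
    if len(before) != len(after):
        raise ValueError("paired data need the same number of values in each set")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all the differences are identical, so t is undefined")
    # t = mean difference divided by its standard error.
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


# Illustrative pulse rates for the same five people (not data from this handout).
before = [70, 65, 80, 72, 68]
after = [74, 70, 82, 79, 71]
t, df = paired_t(before, after)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Because only the differences matter, large variation between individuals does not hide a consistent change within each individual.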

Anova

This test is used to compare two or more sets of data, and it tests the null hypothesis that the sets have the same mean. The data must be normally-distributed and there must be at least 10 replicates (and preferably many more). If P<5% then the null hypothesis is rejected, and at least one of the sets is significantly different.

The Merlin function is =ANOVA(range), where each column in the range is a different set. This example compares the effect of 3 different colours of light on the rate of photosynthesis in Elodea, measured by the length of oxygen bubble produced in a given time. The formula =ANOVA(B3:D12) is typed into cell B15 and formatted as a percentage. The P value is <5% so we reject the null hypothesis and conclude that at least one of the colours is significantly different. From the means, green must be significantly lower than the others.


Matched Anova

This test is used to compare two or more sets of matched data, and it tests the null hypothesis that the mean difference between the sets is zero. The data must be matched, they must be normally-distributed and there must be at least 10 replicates (and preferably many more). If P<5% then the null hypothesis is rejected and at least one of the sets is significantly different.

The Merlin function is =ANOVAM(range), where each column in the range is a different set. This example compares the mass of food eaten by 8 deer at four different times of year. The formula =ANOVAM(B3:E10) is typed into cell B13 and formatted as a percentage. The P value is <5% so we reject the null hypothesis and conclude that at least one of the months is significantly different. From the means, significantly less food is eaten in May and August.

Mann-Whitney U-Test

This test is used to compare two sets of data, and it tests the null hypothesis that the two sets have the same median. The data can be any form so long as they can be ranked and there must be at least 10 replicates (and preferably many more). If P<5% then the null hypothesis is rejected and there is a significant difference between the two medians.

The Merlin function is =UTEST(range1, range2). This example compares the abundance of brown algae (measured on a 1-5 score) on two different shores. The formula =UTEST(B3:B12,C3:C12) is typed into cell B13 and formatted as a percentage. The P value is <5% (just!) so we reject the null hypothesis and conclude that there are significantly more algae on the sheltered shore.
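The idea of a test based on ranks can be sketched in Python. This is an illustration only: it uses the large-sample normal approximation for the P-value, with no correction for ties or continuity, so it will not match Merlin's =UTEST exactly. The function names and the scores are made up.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def mann_whitney_u(sample1, sample2):
    """Return (U, approximate two-tailed P-value)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)                        # the smaller of the two U values
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (u - mean_u) / sd_u                          # never positive, as u <= mean_u
    return u, min(1.0, 2 * NormalDist().cdf(z))


# Illustrative 1-5 abundance scores on two shores (not data from this handout).
exposed = [1, 2, 2, 3, 1, 2, 3, 2, 1, 3]
sheltered = [3, 4, 2, 5, 4, 3, 4, 5, 3, 4]
u, p = mann_whitney_u(exposed, sheltered)
print(f"U = {u}, P = {p:.1%}")
```

Only the order of the values is used, which is why the test still works for arbitrary scales like "1-5".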

Wilcoxon Test

This test is used to compare two sets of paired (matched) data, and tests the null hypothesis that the median difference between the pairs is zero. The data must be in pairs, can be any form that can be ranked, and there must be at least 10 replicates (and preferably many more). If P<5% then the null hypothesis is rejected and there is a significant difference between the two sets.

The Merlin function is =WILCOXON(range1, range2). This example compares memory scores (number of words recalled from a text) before and after drinking alcohol. The formula =WILCOXON(B3:B12,C3:C12) is typed into cell C13 and formatted as a percentage. The P value is <5% so we reject the null hypothesis and conclude that memory is significantly worse after drinking alcohol.


Kruskal-Wallis Test

This test is used to compare two or more sets of data, and it tests the null hypothesis that the sets have the same median. The data can be any form that can be ranked and there must be at least 10 replicates (and preferably many more). If P<5% then the null hypothesis is rejected and there is a significant difference between at least two of the sets.

The Merlin function is =KWTEST(range), where each column in the range is a different set. This example compares decay rates of three species of leaf, measured by % of leaf area remaining after 8 weeks' burial. The formula =KWTEST(B3:D11) is typed into cell B12 and formatted as a percentage. The P value is <5% so we reject the null hypothesis and conclude that there is a significant difference between at least two of the leaves. From the box plot, the beech leaves must decay significantly more slowly than the other two species.

Friedman Test

This test is used to compare two or more sets of matched data, and it tests the null hypothesis that the median difference between the sets is zero. The data can be any form that can be ranked but must be matched, and there must be at least 10 replicates (and preferably many more). If P<5% then the null hypothesis is rejected and at least one of the sets is significantly different.

The Merlin function is =FRIEDMAN(range), where each column in the range is a different set. This example compares the symptoms of patients (on a score system) before and after treatment with a drug. The formula =FRIEDMAN(B3:D12) is typed into cell B13 and formatted as a percentage. The P value is <5% so we reject the null hypothesis and conclude that at least one of the days is significantly different. From the medians, there is a significant drop in symptoms after treatment, so the drug works.

Chi-Squared Test

This test is used for frequencies of categoric data, and it tests the null hypothesis that there is no difference between the observed and expected frequencies. If P<5% then the null hypothesis is rejected and there is a significant difference between the frequencies. The Excel function is =CHITEST(obsrange, exprange). There are different ways of calculating the expected frequencies:

Sometimes the expected frequencies can be calculated

from a quantitative theory such as Mendel's laws of

genetics. In this example the frequencies of flower

colours from a genetic cross are compared to an

expected 3:1 ratio. The expected frequencies can be

calculated from the total number of observations (929)

using simple Excel formulae. The formula

=CHITEST(B2:B3,C2:C3) is typed into cell C5 and

formatted as a percentage. The P value is >5% so we

accept the null hypothesis and conclude that the

observed data are consistent with Mendel's law.


Other times the expected frequencies can be calculated

by assuming that the frequencies in all the categories

should be the same. In this example the frequencies of

boys and girls born in a hospital over a period of time

are compared to an expected 1:1 ratio. The expected

frequencies can be calculated from the total number of

observations (445) using simple Excel formulae. The

formula =CHITEST(B2:B3,C2:C3) is typed into cell

C5 and formatted as a percentage. The P value is >5%

so we accept the null hypothesis and conclude that

there is no difference between male and female births.
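The calculation behind a test of this kind can be sketched in Python for the simplest case of two categories (1 degree of freedom), where the P-value has a simple closed form. The function name is made up. The counts used are Mendel's published flower-colour results (705 purple and 224 white, which total 929); the handout only gives the total, so treating these as the counts in the genetic cross example is an assumption.

```python
import math


def chi_squared_two_categories(observed, expected):
    """Return (chi-squared, P-value) for two categories (1 degree of freedom)."""
    if len(observed) != 2 or len(expected) != 2:
        raise ValueError("this sketch only handles two categories")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


observed = [705, 224]                       # assumed counts, see above
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]   # expected 3:1 ratio
chi2, p = chi_squared_two_categories(observed, expected)
print(f"chi-squared = {chi2:.3f}, P = {p:.1%}")
```

The expected frequencies are calculated from the total in the same way as the "simple Excel formulae" mentioned above: the total multiplied by each fraction of the ratio.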

G-Test

This test is also used for frequencies of categoric data, and it can be used whenever the chi-squared test can be used. Many statisticians prefer the G-test to the chi-squared test. It has the same null hypothesis as the chi-squared test. The Merlin function is =GTEST(obsrange, exprange).

For the genetic cross example above GTEST gives the P-value 53.04%.

For the sex of baby example above GTEST gives the P-value 6.45%.

So in both cases the conclusion is the same.
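The G statistic is calculated from the same observed and expected frequencies, using logarithms instead of squared differences: G = 2 × Σ observed × ln(observed / expected). A minimal Python sketch for two categories follows. The function name is made up, and the counts are again Mendel's published 705 and 224 (an assumption, since the handout only gives the total of 929); with them the sketch gives a P-value of roughly 53%, in line with the figure quoted above.

```python
import math


def g_test_two_categories(observed, expected):
    """Return (G, P-value) for two categories (1 degree of freedom)."""
    if len(observed) != 2 or len(expected) != 2:
        raise ValueError("this sketch only handles two categories")
    # Categories with an observed count of zero contribute nothing to G.
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    return g, math.erfc(math.sqrt(g / 2))


observed = [705, 224]                       # assumed counts, see above
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]   # expected 3:1 ratio
g, p = g_test_two_categories(observed, expected)
print(f"G = {g:.3f}, P = {p:.1%}")
```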

Pearson Correlation

This test is used to test for a correlation between two sets of normally-distributed data. The correlation coefficient is called r and is calculated using the function =PEARSON(range1, range2). The corresponding P-value, which tests the null hypothesis that r=0, is calculated using the function =PEARSONP(range1, range2).

In this example the heights of 10 fathers are compared with those of their sons. A scatter graph is plotted and the formula =PEARSONP(B2:B12,C2:C12) is typed into cell C13 and formatted as a percentage and the formula =PEARSON(B2:B12,C2:C12) is typed into cell C14. The P value is <5% so the null hypothesis is rejected and we conclude that there is a significant positive correlation with a strength of 0.72, so tall fathers do have tall sons.
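The correlation coefficient r can be calculated by hand from the deviations of each value from its mean, as in this Python sketch. The function name pearson_r and the heights are made up for illustration; the P-value is left to =PEARSONP.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative heights in cm (not data from this handout).
fathers = [168, 172, 175, 178, 180, 183]
sons = [170, 171, 178, 176, 184, 181]
print(f"r = {pearson_r(fathers, sons):.2f}")
```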

Spearman Correlation

This test is used to test for a correlation between two sets of non-normal data. The correlation coefficient is called rs and is calculated using the function =SPEARMAN(range1, range2). The corresponding P-value, which tests the null hypothesis that rs=0, is calculated using the function =SPEARMANP(range1, range2).

In this example the social status of hens (measured by their pecking order) is compared with their mass. A scatter graph is plotted and the formula =SPEARMANP(B2:B11,C2:C11) is typed into cell C12 and formatted as a percentage and the formula =SPEARMAN(B2:B11,C2:C11) is typed into cell C13. The P value is <5% so the null hypothesis is rejected and we conclude that there is a significant negative correlation with a strength of -0.77, so big hens are higher up the pecking order.
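The Spearman coefficient is simply the Pearson coefficient calculated on the ranks of the data instead of the raw values, which this Python sketch illustrates. The function names and the data are made up, and tied values are given the mean of their ranks (an assumption about how ties should be handled).

```python
import math


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rs(xs, ys):
    """Return Spearman's rs: the Pearson coefficient of the ranked data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rx, ry))
    sxx = sum((x - mean_x) ** 2 for x in rx)
    syy = sum((y - mean_y) ** 2 for y in ry)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: position in the pecking order (1 = top) and mass in kg.
pecking_order = [1, 2, 3, 4, 5, 6]
mass = [2.9, 3.1, 2.6, 2.7, 2.2, 2.4]
print(f"rs = {spearman_rs(pecking_order, mass):.2f}")
```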


Chi-Squared Test of Association

This test is used to test for an association between two factors that are measured by categoric data. The two sets of categories are the rows and columns of a contingency table, which contains the frequencies. The association coefficient is called Cramer's V and is calculated using the Merlin function =CRAMER(range). The corresponding P-value is called the χ² test of association and it tests the null hypothesis that there is no association (i.e. Cramer's V=0). It is calculated using the Merlin function =CHIASSOC(range).

This example investigates whether there are more nests in birch or sycamore trees, in other words whether there is an association between tree species and birds' nests. The formula =CHIASSOC(B2:C3) is typed into cell B4 and formatted as a percentage and the formula =CRAMER(B2:C3) is typed into cell B5. The P value is <5% so the null hypothesis is rejected and we conclude that there is a significant association, though it is only weak, with a strength of 0.22.
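In a test of association the expected frequencies come from the table itself: each expected value is (row total × column total) / grand total. The Python sketch below calculates χ² and Cramer's V in this way. It assumes the standard formula V = √(χ² / (n × (k − 1))), where k is the smaller of the number of rows and columns, with no continuity correction; whether Merlin does exactly this is not stated in the handout. The function name and the counts are made up.

```python
import math


def chi_squared_association(table):
    """Return (chi-squared, Cramer's V) for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    smaller = min(len(row_totals), len(col_totals))
    return chi2, math.sqrt(chi2 / (n * (smaller - 1)))


# Illustrative counts (not data from this handout):
# rows are tree species, columns are trees with and without nests.
table = [[12, 28],
         [22, 18]]
chi2, v = chi_squared_association(table)
print(f"chi-squared = {chi2:.2f}, Cramer's V = {v:.2f}")
```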

Linear Regression

This is used to describe a linear relationship between two sets of data. It is used when you already know that one variable causes the changes in the other variable (i.e. there is a causal relationship). Regression fits a straight line to the data, and gives the values of the intercept and slope (or gradient) of that line (a and b in the equation y = a + bx).

The Merlin function =REGRESS(xrange, yrange, flag) returns values for the slope and intercept as well as their 95% confidence intervals. The flag value determines whether the intercept is fixed at zero. REGRESS is an array function, so a square of 4 cells is selected and the function is entered with Ctrl-Shift-Enter.

In this example the absorbance of a yeast cell suspension is plotted against its cell concentration from a cell counter. The formula =REGRESS(A2:A12,B2:B12,0) was typed into cells B15:C16 and entered as an array formula. The intercept was fixed at zero because 0 cells have 0 absorbance. The straight trendline was also plotted on the scatter graph. The regression can then be used to make quantitative predictions. For example, we could predict that a sample with an absorbance of 1.37 has a cell concentration of 9 × 10⁷ cells cm⁻³.
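The slope and intercept can be found by least squares, as in this Python sketch, which also shows how a fitted line is used to make a prediction. The function names, the through_origin argument and the data are made up for illustration; unlike REGRESS, the sketch does not calculate confidence intervals.

```python
def fit_line(xs, ys, through_origin=False):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    if through_origin:
        # With the intercept fixed at zero the slope is sum(xy) / sum(x^2).
        slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return 0.0, slope
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_x(y, intercept, slope):
    """Read the line backwards: the x value that gives a chosen y."""
    return (y - intercept) / slope


# Illustrative data (not from this handout): cell concentration in units of
# 10^7 cells per cm3 and the absorbance measured for each suspension.
concentration = [0, 2, 4, 6, 8, 10]
absorbance = [0.00, 0.31, 0.59, 0.92, 1.20, 1.52]
a, b = fit_line(concentration, absorbance, through_origin=True)
print(f"slope = {b:.3f} absorbance units per 10^7 cells per cm3")
print(f"absorbance 1.37 -> concentration {predict_x(1.37, a, b):.1f} x 10^7 cells per cm3")
```

Fixing the intercept at zero only makes sense when you know the line must pass through the origin, as with the yeast example above.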


Excel Tips

Take time to tidy all results tables, as Excel's default formatting isn't very good e.g. line up titles with values.

Format all numbers to an appropriate number of decimal places (Format menu > Cells > Number tab > Number).

Format P-values as percentages (Format menu > Cells > Number tab > Percentage). This automatically multiplies the

P-value by 100 and adds the % sign to make small P values easier to read and understand.

Use Merlin for charts, even if Excel provides the same chart (e.g. a scatter graph). The format of Merlin charts is better

for scientific data. Even so, take time to tidy up all charts, adjusting the size, shape, colour, font size, etc. to make the

chart clear.

Never use an Excel line chart.

For each of these investigations, choose the best statistics test. In some cases there may be more than one equally good

answer, generally if you can't tell whether the dependent variable is normally distributed or not.

1. […] growing in open and shaded areas is measured to investigate the effect of light intensity on leaf area.

2. […] and edge of a stream was compared at 12 stations along the stream.

3. […] barley seeds affected the seeds themselves, batches of 100 seeds were exposed to three different doses of radiation (none/low/high) and then planted to see if the seeds germinated or not.

4. Thirteen plants were kept in large sealed glass bottles and the carbon dioxide concentration inside was measured over a period of 24h to see if the decrease in carbon dioxide per hour during the day was different from the increase in carbon dioxide per hour during the night.

5. […] seedlings and yield of seeds per plant?

6. […] produced by a plant and mean mass per seed?

7. […] incidence of a childhood disease in their town. Given the known national incidence of the disease, how can they tell if the frequency in their town is different?

8. […] different fields.

9. […] Body Mass Index.

10. […] measured lichen diversity at several locations in a town to see if there was a relation between distance from town centre and diversity index.

11. In order to find the best preparation of a vaccine, three different forms of the vaccine were injected into the same 20 healthy volunteers on three separate occasions, and the concentration of antibodies in the blood was measured.

12. To investigate the effect of handling by humans on lab rats, the activity of 10 handled and 10 unhandled rats was scored on a 1-5 scale (quiet to very active).


Statistics Problems

[…]

(a) choose an appropriate statistical test, giving reasons for your choice.

(b) state the null hypothesis

(c) carry out the test in Merlin (plotting a chart if appropriate)

(d) clearly state the conclusion to the problem.

[…] or you can copy and paste the Excel tables and charts into a Word document (use paste special - Excel object).

1. […] and 10 patients another drug. The number of hours of relief from symptoms was measured with the following results:

[Table: Drug A | Drug B. Values missing.]

[…] normally-distributed.

2. In an investigation into behaviour in woodlice a choice chamber was set up with four different environments. 100 woodlice were introduced into the choice chamber and their location after 2 minutes was noted. The results were:

[Table: Environment (dry dark, dry light, wet dark, wet light) | no. of woodlice. Values missing.]

[…] significant?

3. […] concentration of polluting ions was measured at six different sites by conductivity, and a diversity index was calculated for the species present. The diversity index is calculated from biotic data, so is not normally distributed. The higher the index, the more species are present.

[Table: Site | Conductivity | Diversity index. Values missing.]

[…] diversity, and how strong is it?

4. Bacteria were grown in three different kinds of milk, and their growth measured by change in pH (since the bacteria produce lactic acid from the milk sugars). The larger the pH change, the more the growth. Since these numbers are small, they cannot be assumed to be normally-distributed. 10 cultures were set up for each milk, but 3 did not yield any results.

[Table: Culture number | Full-fat milk | Semi-skimmed milk | UHT milk. Values missing.]

[…] the 3 milks?

5. […] reaction was investigated with the following results.

[Table: Enzyme concentration (mM) | Rate (arbitrary units). Values missing.]

[…] data, and find the slope of this line. Use the slope to predict the rate at an enzyme concentration of 0.7 mM.

6. The flow rate of a river was measured at seven stations on two different days.

[Table: Station | Day | Day. Values missing.]

[…] two days? Assume the flow rates are not normally-distributed.

7. In an investigation on a rocky shore the widths and heights of limpets were compared.

[Table: Limpet | Width (mm) | Height (mm). Values missing.]

[…] and the number of beetles collected was counted.

[Table: Trap | Deciduous woodland | Coniferous woodland. Values missing.]

[…] of beetles trapped in the two woods? (Since these counts are so small, they cannot be assumed to be normally-distributed.)

[…] sides of a group of trees were compared.

[Table: Tree | Area of moss on north side of tree (mm²) | Area of moss on south side of tree (mm²). Values missing.]

[…] south sides? Assume the areas are normally-distributed.

[…] phenotypes were recorded in the F2 generation:

[Table: Yellow round seeds | Yellow wrinkled seeds | Green round seeds | Green wrinkled seeds. Values missing.]

[…] 9:3:3:1. Do these observed results agree with the expected ratio?

[…] flowers was recorded over a period of time.

[Table: rows beetles, flies, wasps; columns white flowers, yellow flowers, blue flowers. Values missing.]

[…] of insect?

of insect?

HGS Biology

NCM/06/03