Statistical analysis
1
CORE
Science is concerned with the systematic study of
the natural world around us. Biology is particularly
concerned with the study of living organisms and includes
all levels of life from molecules to ecosystems. In the study
of biology, scientists make careful observations and in [...]
measurement.

[Figure: axis label "Volume of gas (mL)"]
[Figure: axis label "Mass of organism (kg)"]
Error bars can be displayed for the values of both variables. In the Excel-generated graph in Figure 102 the mass of the marine organisms is measured using scales with a precision of 2 kg. The volume of the marine organisms is measured with a precision of 0.006 m³. Notice the error bars are small and they all fall on the trend line (see Topic 1.1.6), indicating that the measurements are reliable.

(Please refer to Chapter 2 - Data collection and processing of The IBID Student Guide Biology for a detailed discussion about errors or uncertainties and precision versus accuracy.)

Example 2

When calculating a mean of ratios (for example, percentages) for several groups of different sizes, the ratio for the combined total of all the groups is not the mean of the proportions for the individual groups.

For example, if 40 rats from a batch of 100 are male, this implies 40% are male. If 120 rats from a batch of 240 are male, this implies 50% are male. The mean percentage of males, (50 + 40)/2 = 45%, is not the percentage of males in the two groups combined, because there are 40 + 120 = 160 males in a total of 340, which is approximately 47%.
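The arithmetic in Example 2 can be checked with a few lines of Python. This is a minimal sketch that uses only the rat counts quoted in the example; the variable names are illustrative.

```python
# (number of males, batch size) for the two batches of rats in Example 2
batches = [(40, 100), (120, 240)]

# Percentage of males in each batch, then the plain mean of those percentages
percentages = [100 * males / total for males, total in batches]
mean_of_percentages = sum(percentages) / len(percentages)

# Percentage of males when both batches are pooled into one group
total_males = sum(males for males, _ in batches)
total_rats = sum(total for _, total in batches)
pooled_percentage = 100 * total_males / total_rats

print(f"Mean of the two percentages: {mean_of_percentages:.1f}%")
print(f"Percentage in the combined group: {pooled_percentage:.1f}%")
```

The pooled figure is pulled towards 50% because the larger batch contributes more animals to the combined total.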
Figure 108
The screen shots in Figures 105 to 109 and the instructions
describe how to enter the data from Figure 104 and perform
summary statistics on a TI graphical calculator (one of
the recommended calculators for the IB Mathematics
courses).
Press STAT
Figure 109
1 10.2 16.8
2 11.6 19.7
3 9.7 18.5
4 13.3 22.5
The output is shown in Figure 112 for the Group 1 small
shell data. The mean and standard deviation have been
manually highlighted in bold.
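As a cross-check, the main values in Figure 112 can be recomputed with Python's standard `statistics` module. This is only a sketch, and it rests on one assumption: the data rows listed above give four Group 1 values (10.2, 11.6, 9.7 and 13.3), so the fifth value, 8.3, is inferred here from the Minimum, Sum and Count reported in Figure 112 rather than read from the table.

```python
import math
import statistics

# Group 1 shell lengths. Four values come from the listed data rows;
# 8.3 is an inference from the Minimum (8.3), Sum (53.1) and Count (5)
# reported in Figure 112, not a row copied from the data table.
group1 = [10.2, 11.6, 9.7, 13.3, 8.3]

mean = statistics.mean(group1)
median = statistics.median(group1)
variance = statistics.variance(group1)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(group1)       # sample standard deviation
std_error = std_dev / math.sqrt(len(group1))
value_range = max(group1) - min(group1)

print(f"Mean               {mean:.2f}")
print(f"Standard Error     {std_error:.9f}")
print(f"Median             {median}")
print(f"Standard Deviation {std_dev:.9f}")
print(f"Sample Variance    {variance:.3f}")
print(f"Range              {value_range:.1f}")
```

The sample (n - 1) forms of variance and standard deviation are used because they are the ones that agree with the Excel figures.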
Mean                10.62
Standard Error      0.852877482
Median              10.2
Mode                #N/A
Standard Deviation  1.907092027
Sample Variance     3.637
Kurtosis            -0.229700866
Skewness            0.411499631
Range               5
Minimum             8.3
Maximum             13.3
Sum                 53.1
Count               5

Figure 112 Output data for shell length (generated in Excel)

1.1.3 State that the term standard deviation is used to summarize the spread of values around the mean, and that 68% of the values fall within one standard deviation of the mean.
IBO 2007

If repeated measurements of continuous biological variables, such as the height or weight of humans from a large population, are plotted, a close approximation to a normal distribution is obtained. The normal distribution has some very special characteristics.

Figure 113 A normal distribution curve (axis label: frequency; the mean is marked)

If the data is normally distributed then 68% (approximately two thirds) of the sample have heights which are within 10 cm of 180 cm, that is, 68% of the sample have heights between 170 cm and 190 cm.

In addition 95% of the sample heights lie within two standard deviations of the mean. In this example, two standard deviations = 2 × 10 cm = 20 cm. In other words 95% of the sample have heights between 160 and 200 cm. Figure 114 illustrates these two values.

[Figure 114: normal curve, axis label "frequency", with the mean and the SD intervals marked]
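The 68% and 95% figures quoted for the heights example can be checked against an ideal normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) with the mean of 180 cm and standard deviation of 10 cm taken from the text. It describes a perfect normal curve, not a real sample.

```python
from statistics import NormalDist

# Heights example from the text: mean 180 cm, standard deviation 10 cm
heights = NormalDist(mu=180, sigma=10)

# Fraction of an ideal normal population within one and two SDs of the mean
within_one_sd = heights.cdf(190) - heights.cdf(170)
within_two_sd = heights.cdf(200) - heights.cdf(160)

print(f"Within 1 SD (170-190 cm): {within_one_sd:.1%}")
print(f"Within 2 SD (160-200 cm): {within_two_sd:.1%}")
```

The computed fractions are about 68.3% and 95.4%, which the text rounds to 68% and 95%.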
[...] of standard deviation indicates a wider spread around the mean. Figure 115 shows Gaussian curves for the frequency distributions of two statistical populations with differing standard deviations (spreads).

[Figure 115: two Gaussian curves, axis label "frequency of occurrence", SD1 < SD2]

14.16         6.06
8.13          12.53
6.79          15.45
11.06         15.64
5.83          15.19
10.73         14.93
6.68          7.94
5.02          8.28
10.37         12.65

Standard deviation    Standard deviation
2.761473899           3.545349066
Mean                  Mean
8.963333333           11.36
Move the cursor over the respective sections and enter ten
(for the mean) and indicate that the data is in L1.
1.1.6 Explain that the existence of a correlation does not establish that there is a causal relationship between two variables.
IBO 2007

Biological data should always be plotted to show the relationship between two sets of data. A line graph should be plotted if the independent variable is under the control of the student performing the investigation. If both variables are dependent (that is, measured) then the values should be plotted in the form of a scattergram.

Regression and correlation are methods used when testing relationships between samples of variables. If one variable is known or assumed to be dependent on the other in a linear manner then a linear regression technique is used to determine the line of best fit.

A correlation coefficient can then be calculated which indicates how well the experimental data fit the line of best fit. Correlation coefficients are expressed as a number between -1 and 1. A positive coefficient indicates a positive relationship between the two variables, while a negative coefficient indicates a negative relationship.

A positive correlation means that if the value of X increases, the value of Y tends to increase as well. A positive correlation could be found between the amount of sugar in a jelly and the number of bacterial colonies growing on the jelly after a fixed amount of time.

A negative correlation means that if the value of X increases, the value of Y tends to decrease. A negative correlation could be found between the amount of salt in a jelly and the number of bacterial colonies growing on the jelly after a fixed amount of time.

The closer the value is to -1 or 1, the stronger the relationship between the variables, that is, the less scatter there would be about a line of best fit. A coefficient of 0 implies that there is no linear relationship between the variables.

Figure 123 shows examples of correlation with linear regression lines. In (i) and (ii) the correlation is good; for (i) the correlation is positive and the correlation coefficient is close to 1; for (ii) the correlation is negative and the correlation coefficient is close to -1; in (iii) there is weak positive correlation and the correlation coefficient would be close to zero.

Care must be taken when interpreting correlation coefficients, because if two variables are highly correlated it does not necessarily mean that one causes the other. In statistical terms, correlation does not imply causation.

There are three possible relationships between two variables, X and Y:
Causation: Changes in X cause changes in Y.
Common response: Both X and Y respond to changes in some unobserved variable.
Confounding: The effect of X on Y is mixed up with the effects of other variables on Y.

[Figure 123: panels (i), (ii) and (iii); axis labels x and f]
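The correlation coefficient and line of best fit described in this section can be computed directly from their definitions. The sketch below is one way to do so in plain Python. The jelly measurements are hypothetical values invented for illustration (they are not data from this chapter), and the function names `pearson_r` and `best_fit_line` are likewise illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: additive per jelly (g) and colonies counted afterwards
additive_g = [0, 1, 2, 3, 4]
colonies_with_sugar = [2, 5, 7, 10, 14]   # invented: rises with sugar
colonies_with_salt = [14, 10, 7, 5, 2]    # invented: falls with salt

r_sugar = pearson_r(additive_g, colonies_with_sugar)
r_salt = pearson_r(additive_g, colonies_with_salt)
slope, intercept = best_fit_line(additive_g, colonies_with_sugar)

print(f"Sugar: r = {r_sugar:.3f}, line of best fit y = {slope:.2f}x + {intercept:.2f}")
print(f"Salt:  r = {r_salt:.3f}")
```

A coefficient near 1 or -1 from such a calculation describes only how closely the points follow a straight line; as the text notes, it does not show that one variable causes the change in the other.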
Exercises
Drug A Drug B
61.6 39.3
64.6 26.3
55.6 32.4
45.2 21.5
50.6 60.3
70.5 24.3
67.7 36.4
57.5 47.4
66.5 33.2
42.3 57.2
Figure 124