
LESSON 1 A REVIEW OF STATISTICS

This is a review of basic statistical concepts, at the end of which you should be able to:

1. Explain the need for statistics in agricultural experiments

2. Explain the use of graphical methods to describe the distribution of variables.

3. Compute and compare various measures of central tendency and dispersion to describe a population or sample.

The need for statistics

Agricultural research often deals with plant and animal studies either in the laboratory or in the field. However, plants and animals tend to vary from each other even when they are treated similarly. Examples of such variations are:

Differences in the heights of plants grown under similar conditions

Differences in insect damage of plants similarly treated

Differences in the life cycle of a parasite raised under similar conditions

Differences in the diameters of bacteria-free zones in similar Petri dishes

Some of the factors giving rise to such differences include:

Inherent genetical differences between plants or animals

Soil heterogeneity

Climatic variations

Variation in cultural practices

Competition for nutrients, moisture, and sunlight

Mechanical errors

Therefore it is expected that plants or animals similarly treated will yield different measurements of variables such as plant height, fresh yield, or stem diameter.

On the other hand, the objective of most agricultural or biological experiments is to determine the plant or animal responses to different treatments. For example, we may wish to determine if there is any difference in the yield of kai lan grown under two different levels of nitrogen. An experiment was carried out with three plots for each level of nitrogen and the results are tabulated below:

              Plot 1   Plot 2   Plot 3   Average
Nitrogen 1     126      132      135     131.00
Nitrogen 2     136      131      136     134.33

Yield of kai lan (gm per plant)
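The averages in the table, and the plot-to-plot spread within each nitrogen level, can be checked with a short Python sketch. The variable names are illustrative only, and the range is used here simply as a quick indication of variation between similarly treated plots.

```python
from statistics import mean

# Yields of kai lan (gm per plant), copied from the table above.
yields = {
    "Nitrogen 1": [126, 132, 135],
    "Nitrogen 2": [136, 131, 136],
}

# Average yield for each nitrogen level.
averages = {level: mean(plots) for level, plots in yields.items()}

# Spread (largest minus smallest) between plots given the same treatment.
plot_ranges = {level: max(plots) - min(plots) for level, plots in yields.items()}

# Difference between the two treatment averages.
difference = averages["Nitrogen 2"] - averages["Nitrogen 1"]

for level in yields:
    print(f"{level}: average = {averages[level]:.2f}, range = {plot_ranges[level]}")
print(f"Difference between averages = {difference:.2f}")
```

The spread between plots of the same nitrogen level is larger than the difference between the two averages, which is why the two sources of variation have to be separated.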

While the average yield seems to be higher for kai lan grown under nitrogen 2, note that the plot 2 yield of 131 under nitrogen 2 is lower than the plot 2 and plot 3 yields under nitrogen 1. Notice also that there is considerable variation between plots of the same nitrogen level. As a result, the difference between the average yields of the two nitrogen levels may be partly due to the nitrogen treatments themselves and partly due to naturally occurring variation between plots similarly treated.

The experimenter will have to:

1. separate these two sources of variations

2. obtain “an estimate of the true difference caused by the treatments” (Mead et al., 2003).

3. have some assurance of how accurate this estimated difference is

Statistics, which may be considered as “A body of concepts and methods that is used to collect and interpret data concerning a particular area of investigation and to draw conclusions in situations where uncertainty and variation exist”, has been used in the past seventy years to help researchers achieve the above three objectives.

Some basic statistical terms

Statistical Variable

An observable, countable or measurable characteristic that exists and differs from one individual to another in a population. Examples of variables are:

1. Highest academic qualifications of 1000 employees of a certain organization.

2. The number of rhizomes of 100 lallang plants.

3. The weight of 200 watermelons.

Types of variables

Variables may be discrete or continuous.

Discrete variable: one which can only assume values which are countable. Examples: number of fruitlets per oil palm bunch, number of spores in a medium, number of insects caught in a trap.

Continuous variable: one which can assume any value within an interval. Examples: weight in gm (10.2 gm, 10.24 gm, 10.2436 gm), height of high school students in cm.

When we look at a distribution of heights we may mistake height for a discrete variable because heights may be recorded as 165 cm, 166 cm, 170 cm, etc. This is because we tend to round off a person’s height to the nearest cm, or to the precision of the instrument used to measure it. A person growing from 164 cm to 165 cm must grow continuously through 164.1 cm, 164.15 cm, etc. before reaching 165.00 cm.

Types of Measurements

Variables may also be measured or counted using scale, ordinal, or nominal measurement.

Scale measurement: the strongest of the three. Used to measure continuous variables. Makes use of a unit of measurement. There are two subtypes: interval scale and ratio scale.

Interval scale: has a zero that does not mean nil or absence, e.g. 0 °C does not mean the absence of temperature.

Ratio scale: has a natural zero. Examples: height in centimetres, weight in grams.

Ordinal: values are ranked, e.g. small, medium, large.

Nominal: the weakest of the three. Values are arbitrarily assigned to 2 or more groupings. Example: sex, coded 1 for Male, 2 for Female.

Population

This refers to the entire collection of individuals that possess some measurable, countable or observable characteristics that we wish to study. For example, if we wish to study the plant height of a certain variety of paddy, the heights of all paddy plants of that variety constitute the population. We may also regard all plants of that variety of paddy as the population.

Frequently it is not possible to investigate the entire population either because it is too large or it is infinite or the test is destructive.

Sample

This is a small part of the population that is studied to gain information about the population. An important characteristic of a sample is that it must be representative of the population. Data resulting from experiments are usually considered as sample data.

Experimental Unit

An individual member of the population that is subjected to treatment and measurement. It is also known as a sampling unit. It may be a single plant, a collection of several polybags of plants, a plot of several rows of plants or an entire field.

Observation (Case)

All measurements made on a single experimental unit are together known as an observation or case.

Selection of sample

As mentioned earlier, a sample is a part of a population that is studied to gain information about it. As such, it must be representative of the population. One way to ensure this is to draw a random sample so that each and every individual in the population has an equal chance of being selected into the sample. This can be achieved by using the random number function in statistical packages, the random number generator in scientific calculators, or random number tables.
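As an illustration of drawing a random sample with a software random number function, the following minimal Python sketch selects 20 individuals from a hypothetical population of 500 numbered plants. The population size, sample size and seed are assumptions made for the example, not values from this lesson.

```python
import random

# Hypothetical population: 500 plants identified by the numbers 1 to 500.
population = list(range(1, 501))

# A fixed seed is used only so that the example is repeatable.
rng = random.Random(2024)

# Draw 20 plants without replacement; every plant has the same chance
# of being selected into the sample.
sample = rng.sample(population, k=20)

print(sorted(sample))
```

Sampling without replacement means that no plant can appear twice in the same sample.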

Descriptive statistics

As a result of the inherent variability of plant and animal materials, different ways of describing the “distribution” of observed or measured variables will have to be used if the data recorded is to be of any use to the recipient.

Frequency Distribution

The following table gives the weights (gm) of 144 carrots harvested from a particular field plot. Very little can be seen from the raw data about how carrot weight is distributed.

Original data

 405  109  549   72   32  248  221  234  287   15  556  251
 388  166  106  257   62  325  673  610  709  494   93  173
 498  194  499  144   93  203  179   70  717  185  132  258
 373  380  353  640  509   74  238  149   36   95   54  484
 529  147  376  662   86  217  380  589  113  496  657   28
  78  172  318  120  296 1017  197  202  116  129  223  813
  66  201  292   59  341  606   97  284  915  512  325  663
 290   87   39  429  643  191  305  258  252  385  121  318
 216   66  288  670  390  100  579  106  245  426  436  393
 599  874  597  656   80  175  106  543  340  134  124  157
 325   55  274   18   71  211  331  254   87   77  771  325
 749   21  446   88  283  212   31  286  343   91  505   87

A frequency table may be used to group the original data into a number of equal sized classes and then to count the number of carrots with weights falling within each class.

Class            Frequency    Greater than    Frequency    p
0.0-66.7            14             0.0           144       1.00
66.8-133.4          29            66.7           130       0.90
133.5-200.0         14           133.4           101       0.70
200.1-266.8         19           200.0            87       0.60
266.9-333.5         17           266.8            68       0.47
333.6-400.0         12           333.5            51       0.35
400.1-466.9          5           400.0            39       0.27
467.0-533.6          9           466.9            34       0.24
533.7-600.0          7           533.6            25       0.17
600.1-667.0          8           600.0            18       0.13
667.1-733.7          4           667.0            10       0.07
733.8-800.0          2           733.7             6       0.04
800.1-867.1          1           800.0             4       0.03
867.2-933.8          2           867.1             3       0.02
933.9-1000.0         0           933.8             1       0.01
1000.1-1067.2        1          1000.0             1       0.01
1067.3-1133.9        0          1067.2             0       0.00
1134.0-1200.0        0

From the above frequency table, it seems that carrot weight is not normally distributed: there are more light carrots and fewer heavy ones. In addition, by using the greater than frequency counts, it can be quickly seen that 17% of the carrots weigh more than 533.6 gm each. Therefore, if a carrot is randomly selected from the group of 144, the probability that it weighs more than 533.6 gm is 0.17. Similarly, the probability that it weighs between 533.6 gm and 600.0 gm is 7/144, or about 0.05 (subtracting the rounded proportions, 0.17 - 0.13, gives 0.04).
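The greater than counts and proportions can be derived from the class frequencies alone. The Python sketch below is one possible way of doing so; the variable names are illustrative. Working from the counts rather than from the rounded proportions gives 7/144 for the probability that a carrot weighs between 533.6 gm and 600.0 gm.

```python
from itertools import accumulate

# Upper class limits (gm) and class frequencies copied from the frequency table.
upper_limits = [66.7, 133.4, 200.0, 266.8, 333.5, 400.0, 466.9, 533.6, 600.0,
                667.0, 733.7, 800.0, 867.1, 933.8, 1000.0, 1067.2, 1133.9, 1200.0]
frequencies = [14, 29, 14, 19, 17, 12, 5, 9, 7, 8, 4, 2, 1, 2, 0, 1, 0, 0]

total = sum(frequencies)                     # number of carrots
cumulative = list(accumulate(frequencies))   # carrots at or below each upper limit

# "Greater than" count and proportion for each upper class limit.
greater_than = {limit: total - cum for limit, cum in zip(upper_limits, cumulative)}
proportion = {limit: count / total for limit, count in greater_than.items()}

print(f"P(weight > 533.6) = {proportion[533.6]:.2f}")

# Probability of a weight in the class 533.7-600.0, from unrounded proportions.
between = proportion[533.6] - proportion[600.0]
print(f"P(533.6 < weight <= 600.0) = {between:.3f}")
```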

Graphical presentation

[Figure: histogram of carrot weight. x-axis: Weight (0 to 1000); y-axis: Frequency (0 to 30). Mean = 301.9, Std. Dev. = 221.313, N = 144]

Histogram showing the distribution of weight among the 144 carrots.

The information in the frequency table above can be presented graphically using a histogram that has a “normal” curve overlay (created using SPSS). The width of each column is the same and the frequency of each class is represented by the height of its column.

Measures of central tendency and dispersion

The above histogram clearly shows that while the weights of the 144 carrots tend towards some central value, there is also considerable variation (or dispersion) between them. The various measures of central tendency are:

1. mode

2. median

3. mean (average)

The various measures of dispersion are:

1. range

2. sum of deviation

3. sum of absolute deviation

4. sum of squares of deviation

5. variance

6. standard deviation

7. coefficient of variation

Task 1: Define or describe each of the above 10 measures and discuss their relative usefulness. Make sure that you can differentiate between parameters and statistics.
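As a starting point for Task 1, the ten measures listed above can be computed for a small data set. The Python sketch below uses five hypothetical plant heights (cm) that are not taken from this lesson, and it assumes the sample form of the variance (divisor n - 1); the population form divides by n instead.

```python
import statistics

# Hypothetical sample of five plant heights (cm), for illustration only.
data = [12, 15, 15, 18, 20]
n = len(data)

# Measures of central tendency.
mode = statistics.mode(data)        # most frequent value
median = statistics.median(data)    # middle value of the sorted data
mean = statistics.mean(data)        # arithmetic average

# Measures of dispersion.
deviations = [x - mean for x in data]
data_range = max(data) - min(data)
sum_dev = sum(deviations)                       # always zero, apart from rounding
sum_abs_dev = sum(abs(d) for d in deviations)
sum_sq_dev = sum(d ** 2 for d in deviations)
variance = sum_sq_dev / (n - 1)                 # sample variance (divisor n - 1)
std_dev = variance ** 0.5
cv = 100 * std_dev / mean                       # coefficient of variation, in percent

print(f"mode = {mode}, median = {median}, mean = {mean}")
print(f"range = {data_range}, sum of deviations = {sum_dev}")
print(f"sum of absolute deviations = {sum_abs_dev}, sum of squares = {sum_sq_dev}")
print(f"variance = {variance}, standard deviation = {std_dev:.3f}, CV = {cv:.2f}%")
```

The sum of deviations is zero for any data set, which is why the absolute or squared deviations are needed to measure dispersion.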

The normal distribution

The distribution of weight among the 144 carrots is clearly “skewed” towards lower weights. However, if more and more carrots were weighed and the classes were made narrower, we are likely to have a normal distribution which is bell-shaped, symmetrical about the mean and can be completely defined by its mean and standard deviation.

The normal curve is such that approximately 68% of the distribution is within 1 standard deviation of the mean and about 95% is within 2 standard deviations of the mean. In addition, the two tails extend infinitely in both directions without ever touching the x axis.
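These two proportions can be checked numerically. The sketch below uses `NormalDist` from the Python standard library (Python 3.8 or later); the population with mean 12 and standard deviation 3 is included only as an illustrative example.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
z = NormalDist(mu=0, sigma=1)

within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

# The same proportions hold for any mean and standard deviation,
# for example a population with mean 12 and standard deviation 3.
a = NormalDist(mu=12, sigma=3)
within_1_sd_a = a.cdf(12 + 3) - a.cdf(12 - 3)

print(f"within 1 standard deviation:  {within_1_sd:.4f}")
print(f"within 2 standard deviations: {within_2_sd:.4f}")
print(f"within 1 standard deviation (mean 12, SD 3): {within_1_sd_a:.4f}")
```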

Most biological and agricultural variables are approximately normally distributed. However, if we were to plot a normal distribution curve for each variable, numerous different normal curves would be produced.

[Figure: normal distribution curves for three populations labelled A, B and C; x-axis from 0 to 45]

The three populations shown above are distributed normally. Populations A and C have the same standard deviation (σ) of 3 units but they have different means (µ) of 12 and 15 units respectively. Population B differs from both A and C in terms of µ and σ because it has a µ of 18 units and a σ of 6 units.

Task 2: Confirm that about 68% of the population lies within 1 σ of µ for populations A and B.

The standard normal curve

Instead of producing a normal curve for each variable, it would be more convenient to develop a standard normal curve which does not depend on the value of the mean and standard deviation. Instead of considering the distribution of x which has a mean of µ and a standard deviation of σ, we consider the distribution of Z which is defined as Z = (x - µ)/σ. Z is normally distributed with a mean of 0 and a standard deviation of 1, and the probability of x lying within any range can be found by first converting the x values into Z scores.

[Figure: Standard Normal Curve; x-axis: Standard Deviation from mean (-5.00 to 5.00); y-axis from 0 to 0.45]

Task 3: The heights of adult Malaysian males are known to be normally distributed with a mean of 155 cm and a standard deviation of 4.5 cm. If a Malaysian adult male is randomly selected, what is the probability that his height is:

1. greater than 160 cm?

P(x > 160) = P(Z > (x - µ)/σ) = P(Z > (160 - 155)/4.5) = P(Z > 1.11) =

2. between 158 and 162 cm?

P(158 < x < 162) = P((158 - 155)/4.5 < Z < (162 - 155)/4.5) = P(0.67 < Z < 1.56) =

3. between 150 cm and 160 cm?

P(150 < x < 160) = P((150 - 155)/4.5 < Z < (160 - 155)/4.5) = P(-1.11 < Z < 1.11) =
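The conversion to Z scores in Task 3 can be checked in software instead of with a printed table of the standard normal curve. The sketch below is one possible approach using `NormalDist` from the Python standard library; the helper `z_score` is a hypothetical name introduced for the example.

```python
from statistics import NormalDist

# Height distribution stated in Task 3: mean 155 cm, standard deviation 4.5 cm.
mu, sigma = 155, 4.5
heights = NormalDist(mu=mu, sigma=sigma)
std_normal = NormalDist()  # standard normal curve: mean 0, standard deviation 1


def z_score(x):
    """Convert a height x (cm) into a Z score, Z = (x - mu) / sigma."""
    return (x - mu) / sigma


# 1. P(x > 160)
p_above_160 = 1 - std_normal.cdf(z_score(160))

# 2. P(158 < x < 162)
p_158_to_162 = std_normal.cdf(z_score(162)) - std_normal.cdf(z_score(158))

# 3. P(150 < x < 160)
p_150_to_160 = std_normal.cdf(z_score(160)) - std_normal.cdf(z_score(150))

# Working directly on the height scale gives the same probability.
assert abs(p_above_160 - (1 - heights.cdf(160))) < 1e-12

for label, p in [("P(x > 160)", p_above_160),
                 ("P(158 < x < 162)", p_158_to_162),
                 ("P(150 < x < 160)", p_150_to_160)]:
    print(f"{label} = {p:.4f}")
```

Results read from a printed Z table may differ slightly in the last decimal place because the Z scores there are rounded to two decimals.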

Sampling distribution

Most experiments are carried out on samples of the target population. As such, it is important to look into the distribution of sample means and its relationship to the distribution of the original variable under study. Some properties of the distribution of sample means (sampling distribution) are:

1. The mean of the sample means is expected to be the same as the mean of the original distribution

2. The sample mean should be a better estimate of the population mean than a single individual. As such, the spread (variance) of a sampling distribution is expected to be less than that of the original distribution.

If a variable x is distributed with a population mean of µ and a population variance of σ², samples of size n will result in sample means (x̄) which will be distributed with a mean of µ and a variance of σ²/n.

The square root of the variance of sample means, σ/√n (which can be regarded as the standard deviation of sample means), is called the standard error of the mean.

3. If the original distribution is normal, means of samples drawn from it will be normally distributed regardless of the sample size. Even if the original variable is not normally distributed, the means of samples drawn from it tend to be normally distributed if the sample size is sufficiently large. This characteristic behavior of the sample mean is described by the Central Limit Theorem.

Task 4: A die (dadu) has six faces and each number has an equal chance of showing when the die is thrown.

a. How is the number shown distributed in the original population?

b. Throw a die 10 times and record the mean as a single sample mean. Repeat this 100 times to get 100 sample means. Use SPSS to determine the histogram and distribution curve of the sample means.

c. Compare the distribution of the original population with that of the samples of 10. (Refer to Appendix 1 for the frequency distributions.)
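Where physical dice are not available, Task 4 can be simulated. The following Python sketch is a simulation used in place of throwing a real die and in place of SPSS; the seed, the helper `sample_mean` and the class width of 0.5 for the text histogram are assumptions made for the example.

```python
import random
from collections import Counter
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed so that the run is repeatable


def sample_mean(n_throws):
    """Throw a fair die n_throws times and return the mean of the numbers shown."""
    return mean([rng.randint(1, 6) for _ in range(n_throws)])


# 100 samples of 10 throws each, giving 100 sample means.
sample_means = [sample_mean(10) for _ in range(100)]

# Original population: each face 1 to 6 is equally likely (a uniform distribution).
faces = [1, 2, 3, 4, 5, 6]
population_mean = mean(faces)
population_sd = (sum((f - population_mean) ** 2 for f in faces) / 6) ** 0.5

print(f"population mean = {population_mean}, population SD = {population_sd:.3f}")
print(f"mean of sample means = {mean(sample_means):.2f}")
print(f"SD of sample means   = {stdev(sample_means):.2f}")
print(f"expected SE of mean  = {population_sd / 10 ** 0.5:.2f}")

# Simple text histogram of the sample means, in classes of width 0.5.
counts = Counter(int(m * 2) / 2 for m in sample_means)
for lower in sorted(counts):
    print(f"{lower:.1f} to <{lower + 0.5:.1f}: {'*' * counts[lower]}")
```

The sample means should pile up around 3.5 even though each face of the die is equally likely in the original population.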

If the 144 carrots are considered the entire population of interest, we can randomly select samples of size 6 each and compute the mean of each random sample. This has been done, and the sample means of 48 samples of size 6 are tabulated below.

379  195  367  316  180  347  315  309
330  236  286  335  374  217  323  320
301  249  242  289  364  271  380  324
236  261  217  459  272  257  388  176
336  377  334  280  261  466  280  273
288  364  497  234  159  308  300  224

The frequency table of the sample means is shown below:

Class      Frequency    Class      Frequency
151-175        1        326-350        5
176-200        3        351-375        4
201-225        3        376-400        4
226-250        5        401-425        0
251-275        6        426-450        0
276-300        6        451-475        2
301-325        8        476-500        1

[Figure: histogram of the sample means. x-axis: Samwght (200.00 to 500.00); y-axis: Frequency (0 to 10). Mean = 302.00, Std. Dev. = 73.23788, N = 48]

Histogram showing the distribution of the means of 48 samples of size 6.

Note that the distribution of the sample means is closer to normal than that of the original values. Note also that the mean of the sample means (302.00) is very close to the population mean (301.9), and that the standard deviation of the sample means, 73.24, is reasonably close to the expected standard error of the mean:

S.E. of mean = σ/√n = 221.313/√6 = 90.35
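The figures quoted for the 48 sample means can be recomputed from the tabulated values. The Python sketch below assumes, as the text does, that the reported standard deviation of 221.313 is used as σ for the population of 144 carrots; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# The 48 sample means tabulated above (each is the mean weight of 6 carrots).
sample_means = [
    379, 195, 367, 316, 180, 347, 315, 309,
    330, 236, 286, 335, 374, 217, 323, 320,
    301, 249, 242, 289, 364, 271, 380, 324,
    236, 261, 217, 459, 272, 257, 388, 176,
    336, 377, 334, 280, 261, 466, 280, 273,
    288, 364, 497, 234, 159, 308, 300, 224,
]

population_sd = 221.313   # Std. Dev. of the 144 carrot weights, as reported above
n = 6                     # number of carrots in each sample

mean_of_means = mean(sample_means)
observed_sd = stdev(sample_means)        # spread of the 48 sample means
expected_se = population_sd / sqrt(n)    # standard error of the mean, sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means:.2f}")
print(f"observed SD of sample means = {observed_sd:.2f}")
print(f"expected S.E. of mean = {expected_se:.2f}")
```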