
LESSON 1 A REVIEW OF STATISTICS

This is a review of basic statistical concepts, at the end of which you should be able to:

1. Explain the need for statistics in agricultural experiments.
2. Explain the use of graphical methods to describe the distribution of variables.
3. Compute and compare various measures of central tendency and dispersion to describe a population or sample.

The need for statistics

Agricultural research often deals with plant and animal studies either in the laboratory or in the field. However, plants and animals tend to vary from each other even when they are treated similarly. Examples of such variations are:

- Differences in the heights of plants grown under similar conditions
- Differences in insect damage of plants similarly treated
- Differences in the life cycle of a parasite raised under similar conditions
- Differences in the diameters of bacteria-free zones in similar Petri dishes

Some of the factors giving rise to such differences include:

- Inherent genetic differences between plants or animals
- Soil heterogeneity
- Climatic variations
- Variation in cultural practices
- Competition for nutrients, moisture, and sunlight
- Mechanical errors

Therefore it is expected that plants or animals similarly treated will yield different measurements of variables such as plant height, fresh yield, or stem diameter.

On the other hand, the objective of most agricultural or biological experiments is to determine the plant or animal responses to different treatments. For example, we may wish to determine if there is any difference in the yield of kai lan grown in two different levels of nitrogen. An experiment was carried out with three plots for each level of nitrogen and the results tabulated below:

Yield of kai lan (gm per plant)

              Plot 1   Plot 2   Plot 3   Average
Nitrogen 1     126      132      135     131.00
Nitrogen 2     136      131      136     134.33

While the average yield seems to be higher for kai lan grown under nitrogen 2, note that the plot 2 yield of 131 under nitrogen 2 is lower than the plot 2 and plot 3 yields under nitrogen 1. Notice also that there is considerable variation between plots of the same nitrogen level. As a result, the observed difference between the yields of the two levels of nitrogen may be partly due to a real difference between nitrogen levels and partly due to naturally occurring variation between plots similarly treated.
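To make the two sources of variation concrete, the following minimal Python sketch computes the mean yield of each nitrogen level and the range of yields within each level. Only the six yields come from the kai lan table; the variable and function names are illustrative assumptions.

```python
from statistics import mean

# Yields of kai lan (gm per plant) for plots 1 to 3, copied from the table.
yields = {
    "Nitrogen 1": [126, 132, 135],
    "Nitrogen 2": [136, 131, 136],
}

def summarise(plots):
    """Return the mean yield and the range (max - min) across plots."""
    return {"mean": mean(plots), "range": max(plots) - min(plots)}

summary = {level: summarise(plots) for level, plots in yields.items()}

# Difference between the two treatment means.
difference = summary["Nitrogen 2"]["mean"] - summary["Nitrogen 1"]["mean"]

for level, result in summary.items():
    print(f"{level}: mean = {result['mean']:.2f}, range = {result['range']}")
print(f"Difference between treatment means = {difference:.2f}")
```

The difference between the treatment means (3.33 gm) is smaller than the range of yields within either nitrogen level (9 gm and 5 gm), which illustrates why the two sources of variation have to be separated before any conclusion is drawn.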

The experimenter will have to (Roger Mead, et al, 2003):

1. separate these two sources of variation
2. obtain an estimate of the true difference caused by the treatments
3. have some assurance of how accurate this estimated difference is

Statistics, which may be considered as "a body of concepts and methods that is used to collect and interpret data concerning a particular area of investigation and to draw conclusions in situations where uncertainty and variation exist", has been used in the past seventy years to help researchers achieve the above three objectives.

Some basic statistical terms

Statistical Variable

An observable, countable or measurable characteristic that exists and differs from one individual to another in a population. Examples of variables are:

1. Highest academic qualifications of 1000 employees of a certain organization.
2. The number of rhizomes of 100 lallang plants.
3. The weight of 200 watermelons.

Types of variables

Variables may be discrete or continuous.

Discrete variable: one which can only assume values which are countable. Examples: number of fruitlets per oil palm bunch, number of spores in a medium, number of insects caught in a trap.

Continuous variable: one which can assume any value within an interval. Examples: weight, which may be recorded as 10.2 gm, 10.24 gm or 10.2436 gm depending on the precision used; height of high school students in cm. When we look at a distribution of heights we may mistake height for a discrete variable because heights may be recorded as 165 cm, 166 cm, 170 cm, etc. This is because we tend to round off a person's height to the nearest cm, or to the precision of the instrument used to measure it. A person growing from 164 cm to 165 cm must grow continuously through 164.1 cm, 164.15 cm, etc. before he reaches 165.00 cm.

Types of Measurements

Variables may also be measured or counted using scale, ordinal, or nominal measurement.

Scale measurement: the strongest of the three. Used to measure continuous variables. Makes use of a unit of measurement. Two subtypes: interval scale and ratio scale.

- Interval scale: has a zero that does not necessarily mean zero or nil, e.g. 0 °C does not mean the absence of temperature.
- Ratio scale: has a natural zero. Examples: height in centimetres, weight in grams.

Ordinal measurement: values are ranked, e.g. as small, medium, large.

Nominal measurement: the weakest of the three. Values are arbitrarily assigned to 2 or more groupings. Example: sex, coded 1 for Male, 2 for Female.

Population

This refers to the entire collection of individuals that possess some measurable, countable or observable characteristics that we wish to study. For example, if we wish to study the plant height of a certain variety of paddy, the heights of all paddy plants of that variety make up the population. We may also regard all plants of that variety of paddy as the population.

Frequently it is not possible to investigate the entire population, because it is too large, because it is infinite, or because the test is destructive.

Sample

This is a small part of the population that is studied to gain information about it. An important characteristic of a sample is that it must be representative of the population. Data resulting from experiments are usually considered as sample data.

Experimental Unit

An individual member of the population that is subjected to treatment and measurement. It is also known as a sampling unit. It may be a single plant, a collection of several polybags of plants, a plot of several rows of plants, or an entire field.

Observation (Case)

All the measurements made on a single experimental unit are together known as an observation or case.

Selection of sample

As mentioned earlier, a sample is a part of a population that is studied to gain information about it. As such, it must be representative of the population. One way to ensure this is to draw a random sample so that each and every individual in the population has an equal chance of being selected into the sample. This can be achieved by using the random number function in statistical packages, the random number generator in scientific calculators, or random number tables.
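As an illustration of drawing a random sample with software, here is a minimal Python sketch using the standard library's random module. The population of 144 numbered plants, the sample size of 6, and the function name are assumptions made only for this example.

```python
import random

def draw_random_sample(population_ids, sample_size, seed=None):
    """Select sample_size distinct individuals, each with an equal chance."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population_ids, sample_size)

# Hypothetical population: plants numbered 1 to 144.
plant_ids = list(range(1, 145))

selected = draw_random_sample(plant_ids, 6, seed=1)
print("Plants selected for the sample:", sorted(selected))
```

Sampling is done without replacement, so no plant can appear in the sample twice.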

Descriptive statistics

As a result of the inherent variability of plant and animal materials, different ways of describing the distribution of observed or measured variables have to be used if the recorded data are to be of any use to the recipient.

Frequency Distribution

The following table shows the weights (gm) of 144 carrots harvested from a particular field plot. Very little can be seen from the raw data about how carrot weight is distributed.

Original data

405 388 498 373 529 78 66 290 216 599 325 749
109 166 194 380 147 172 201 87 66 874 55 21
549 106 499 353 376 318 292 39 288 597 274 446
72 257 144 640 662 120 59 429 670 656 18 88
32 62 93 509 86 296 341 643 390 80 71 283
248 325 203 74 217 1017 606 191 100 175 211 212
221 673 179 238 380 197 97 305 579 106 331 31
234 610 70 149 589 202 284 258 106 543 254 286
287 709 717 36 113 116 915 252 245 340 87 343
15 494 185 95 496 129 512 385 426 134 77 91
556 93 132 54 657 223 325 121 436 124 771 505
251 173 258 484 28 813 663 318 393 157 325 87

A frequency table may be used to group the original data into a number of equal sized classes and then to count the number of carrots with weights falling within each class.

Class             Frequency   Greater than   Freq    p
0.0-66.7              14           0.0        144    1.00
66.8-133.4            29          66.7        130    0.90
133.5-200.0           14         133.4        101    0.70
200.1-266.8           19         200.0         87    0.60
266.9-333.5           17         266.8         68    0.47
333.6-400.0           12         333.5         51    0.35
400.1-466.9            5         400.0         39    0.27
467.0-533.6            9         466.9         34    0.24
533.7-600.0            7         533.6         25    0.17
600.1-667.0            8         600.0         18    0.13
667.1-733.7            4         667.0         10    0.07
733.8-800.0            2         733.7          6    0.04
800.1-867.1            1         800.0          4    0.03
867.2-933.8            2         867.1          3    0.02
933.9-1000.0           0         933.8          1    0.01
1000.1-1067.2          1        1000.0          1    0.01
1067.3-1133.9          0        1067.2          0    0.00
1134.0-1200.0          0

The "Greater than" columns give the number (Freq) and the proportion (p = Freq/144) of carrots weighing more than the stated value in gm.

From the above frequency table, it seems that carrot weight is not normally distributed: there are more light carrots and fewer heavy ones. In addition, by using the "greater than" frequency counts, it can be quickly seen that 17% of the carrots weigh more than 533.6 gm each. Therefore, if a carrot is randomly selected from the group of 144, the probability that it weighs more than 533.6 gm is 0.17. Similarly, the probability that it weighs between 533.6 gm and 600.0 gm is about 0.04 (0.17 - 0.13, using the rounded proportions).
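The "greater than" counts and proportions can be derived directly from the class frequencies. The Python sketch below is one way to do this; the function and variable names are illustrative assumptions, and the last lower bound (1133.9) is inferred from the final class, 1134.0-1200.0.

```python
# Class frequencies copied from the frequency table (18 classes, 144 carrots).
frequencies = [14, 29, 14, 19, 17, 12, 5, 9, 7, 8, 4, 2, 1, 2, 0, 1, 0, 0]

# "Greater than" values: each is the upper limit of the preceding class.
lower_bounds = [0.0, 66.7, 133.4, 200.0, 266.8, 333.5, 400.0, 466.9, 533.6,
                600.0, 667.0, 733.7, 800.0, 867.1, 933.8, 1000.0, 1067.2, 1133.9]

def greater_than_counts(freqs):
    """For each class, count the items in that class and all heavier classes."""
    counts = []
    remaining = sum(freqs)
    for f in freqs:
        counts.append(remaining)
        remaining -= f
    return counts

total = sum(frequencies)
counts = greater_than_counts(frequencies)
proportions = [c / total for c in counts]

for bound, count, p in zip(lower_bounds, counts, proportions):
    print(f"> {bound:7.1f} gm: {count:3d} carrots, p = {p:.2f}")

# Probability of a weight above 533.6 gm and up to 600.0 gm.
p_533_to_600 = proportions[8] - proportions[9]
print(f"P(533.6 < weight <= 600.0) = {p_533_to_600:.3f}")
```

Working with unrounded proportions gives 7/144, or about 0.049, for the last probability; the 0.04 quoted in the text results from subtracting proportions that had already been rounded to two decimals.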

Graphical presentation

[Figure: histogram of carrot weight. Horizontal axis: Weight (0 to 1000); vertical axis: Frequency (0 to 30). Mean = 301.9, Std. Dev. = 221.313, N = 144.]
Histogram showing the distribution of weight among the 144 carrots.

The information in the frequency table above can be presented graphically using a histogram that has a normal curve overlay (created using SPSS). The width of each column is the same and the frequency of each class is represented by the height of its column.

Measures of central tendency and dispersion

The above histogram clearly shows that while the weights of the 144 carrots tend towards some central value, there is also considerable variation (or dispersion) between them.

The various measures of central tendency are:

1. mode
2. median
3. mean (average)

The various measures of dispersion are:

1. range
2. sum of deviations
3. sum of absolute deviations
4. sum of squares of deviations
5. variance
6. standard deviation
7. coefficient of variation

Task 1: define or describe each of the above 10 measures and discuss their relative usefulness. Make sure that you can differentiate between parameters and statistics.
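As a companion to Task 1, the following Python sketch computes the ten measures for a small, hypothetical sample of five plant heights. The data and the function name are assumptions for illustration only; the variance uses the n - 1 divisor, i.e. it treats the values as a sample (a statistic) rather than a whole population (a parameter).

```python
import math
import statistics

def describe(values):
    """Compute measures of central tendency and dispersion for a sample."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    sum_of_squares = sum(d * d for d in deviations)
    variance = sum_of_squares / (n - 1)        # sample variance
    std_dev = math.sqrt(variance)
    return {
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": mean,
        "range": max(values) - min(values),
        "sum of deviations": sum(deviations),  # always zero about the mean
        "sum of absolute deviations": sum(abs(d) for d in deviations),
        "sum of squares of deviations": sum_of_squares,
        "variance": variance,
        "standard deviation": std_dev,
        "coefficient of variation (%)": 100 * std_dev / mean,
    }

# Hypothetical plant heights in cm.
heights = [12, 15, 15, 18, 20]
measures = describe(heights)

for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

The sum of deviations about the mean is zero for any data set, which is why the absolute or squared deviations are used to measure dispersion instead.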

The normal distribution

The distribution of weight among the 144 carrots is clearly skewed towards lower weights. However, if more and more carrots were weighed and the classes were made narrower, we are likely to have a normal distribution, which is bell-shaped, symmetrical about the mean, and completely defined by its mean and standard deviation.

The normal curve is such that approximately 68% of the distribution is within ±1 standard deviation from the mean and about 95% is within ±2 standard deviations from the mean. In addition, the two tails are supposed to extend infinitely without touching the x axis.
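The 68% and 95% figures can be checked numerically. This short Python sketch uses the error function from the standard math module, relying on the known identity that the proportion of a normal distribution within k standard deviations of its mean is erf(k/√2); the function name is an assumption for illustration.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within ±{k} standard deviation(s): {proportion_within(k):.4f}")
```

The result does not depend on the particular mean or standard deviation, so it holds for every normal distribution.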

Many biological and agricultural variables are approximately normally distributed. However, since each variable has its own mean and standard deviation, plotting a normal distribution curve for each variable would produce numerous different normal distributions.

[Figure: normal curves of three populations labelled A, B and C. Horizontal axis from 0.00 to 45.00; vertical axis from 0 to 0.14.]

The three populations shown above are distributed normally. Populations A and C have the same σ of 3 units but they have different means (μ) of 12 and 15 units respectively. Population B differs from both A and C in terms of μ and σ because it has a μ of 18 units and a σ of 6 units.

Task 2: confirm that about 68% of the population lies within ±1σ from μ for populations A and B.

The standard normal curve

Instead of producing a normal curve for each variable, it would be more convenient to develop a standard normal curve which does not depend on the value of the mean and standard deviation. Instead of considering the distribution of x, which has a mean of μ and a standard deviation of σ, we consider the distribution of Z, which is defined as Z = (x - μ)/σ. Z is normally distributed with a mean of 0 and a standard deviation of 1, and the probability of x lying within any range can be found by first converting the x values into Z scores.
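The conversion to Z scores can be sketched in Python as follows. The mean of 50 and standard deviation of 10 are hypothetical values chosen only for illustration, and the function names are assumptions; the cumulative probability of the standard normal curve is obtained from the error function rather than from a printed table.

```python
import math

def z_score(x, mu, sigma):
    """Convert a value x into its Z score: Z = (x - mu) / sigma."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """P(Z < z) for the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_greater(x, mu, sigma):
    """P(X > x) for a normal variable with mean mu and standard deviation sigma."""
    return 1 - standard_normal_cdf(z_score(x, mu, sigma))

def prob_between(lower, upper, mu, sigma):
    """P(lower < X < upper) for a normal variable."""
    return (standard_normal_cdf(z_score(upper, mu, sigma))
            - standard_normal_cdf(z_score(lower, mu, sigma)))

# Hypothetical variable: mean 50 units, standard deviation 10 units.
mu, sigma = 50, 10
print(f"Z score of 65 = {z_score(65, mu, sigma):.2f}")
print(f"P(X > 65) = {prob_greater(65, mu, sigma):.4f}")
print(f"P(40 < X < 60) = {prob_between(40, 60, mu, sigma):.4f}")
```

The same three functions apply to any normally distributed variable once its mean and standard deviation are known.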

[Figure: Standard Normal Curve. Horizontal axis: standard deviations from the mean (-5.00 to 5.00); vertical axis from 0 to 0.45.]

Task 3: the heights of adult Malaysian males are known to be normally distributed with a mean of 155 cm and a standard deviation of 4.5 cm. If a Malaysian adult male is randomly selected, what is the probability that his height is:

1. greater than 160 cm?

P(x > 160) = P(Z > (x - μ)/σ) = P(Z > (160 - 155)/4.5) = P(Z > 1.11) =

2. between 158 cm and 162 cm?

P(158 < x < 162) = P((158 - 155)/4.5 < Z < (162 - 155)/4.5) = P(0.67 < Z < 1.56) =

3. between 150 cm and 160 cm?

P(150 < x < 160) = P((150 - 155)/4.5 < Z < (160 - 155)/4.5) = P(-1.11 < Z < 1.11) =

Sampling distribution

Most experiments are carried out on samples of the target population. As such, it is important to look into the distribution of sample means and its relationship to the distribution of the original variable under study. Some properties of the distribution of sample means (sampling distribution) are:

1. The mean of the sample means is expected to be the same as the mean of the original distribution.

2. The sample mean should be a better estimate of the population mean than a single individual. As such, the spread (variance) of a sampling distribution is expected to be less than that of the original distribution. If a variable x is distributed with a population mean of μ and a population variance of σ², samples of size n will result in sample means (x̄'s) which will be distributed with a mean of μ and a variance of σ²/n. The square root of the variance of sample means, σ/√n (which can be regarded as the standard deviation of sample means), is called the standard error of the mean.

3. If the original distribution is normal, means of samples drawn from it will be normally distributed regardless of the sample size. Even if the original variable is not normally distributed, the means of samples drawn from it tend to be normally distributed if the sample size is sufficiently large. This characteristic behavior of the sample mean is known as the Central Limit Theorem.
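The three properties above can be explored by simulation. The Python sketch below draws repeated samples of size 6 from a hypothetical, strongly skewed (exponential) population with a mean of 300 units, and compares the mean and standard deviation of the sample means with μ and σ/√n. The population, the seed, the number of samples and all names are assumptions made for this illustration only.

```python
import math
import random

def simulate_sample_means(rng, population_mean, sample_size, n_samples):
    """Draw n_samples samples from an exponential population; return their means."""
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0 / population_mean) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

def mean_and_sd(values):
    """Return the mean and the sample standard deviation (n - 1 divisor)."""
    n = len(values)
    m = sum(values) / n
    variance = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, math.sqrt(variance)

rng = random.Random(42)      # fixed seed so the run is repeatable
population_mean = 300.0
population_sd = 300.0        # for an exponential distribution, sd equals the mean
sample_size = 6

sample_means = simulate_sample_means(rng, population_mean, sample_size, 5000)
mean_of_means, sd_of_means = mean_and_sd(sample_means)
expected_se = population_sd / math.sqrt(sample_size)

print(f"Mean of sample means = {mean_of_means:.1f} (population mean = {population_mean})")
print(f"SD of sample means   = {sd_of_means:.1f} (sigma / sqrt(n) = {expected_se:.1f})")
```

Increasing sample_size makes the sample means both less variable and more nearly normal in distribution, which can be checked by plotting a histogram of sample_means.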

Task 4: a die (dadu) has six faces and each number has an equal chance of showing when the die is thrown.

a. How is the number shown distributed in the original population?
b. Throw a die 10 times and record the mean as a single sample mean. Repeat this 100 times to get 100 sample means. Use SPSS to determine the histogram and distribution curve of the sample means.
c. Compare the distribution of the original population with that of the samples of 10.
Footnote 1: Refer to Appendix 1 for the frequency distributions.

If the 144 carrots are considered to be the entire population of interest, we can randomly select samples of size 6 and compute the mean of each random sample. This has been done, and the means of 48 samples of size 6 are tabulated below.

379 374 236 261 195 217 261 466 367 323 217 280
316 320 459 273 180 301 272 288 347 249 257 364
315 242 388 497 309 289 176 234 330 364 336 159
236 271 377 308 286 380 334 300 335 324 280 224

The frequency table of the sample means is shown below:

Class      Frequency
151-175        1
176-200        3
201-225        3
226-250        5
251-275        6
276-300        6
301-325        8
326-350        5
351-375        4
376-400        4
401-425        0
426-450        0
451-475        2
476-500        1

[Figure: histogram of the sample means. Horizontal axis: Samwght (200.00 to 500.00); vertical axis: Frequency. Mean = 302.00, Std. Dev. = 73.23788, N = 48.]
Histogram showing the distribution of the means of 48 samples of size 6

Note that the distribution of the sample means is more normal than that of the original values. Note also that the mean of the sample means (302.00) is very close to the population mean (301.9), and that the standard error of the mean of 73.23788 is quite close to the expected value of σ/√n:

S.E. of mean = σ/√n = 221.313/√6 = 90.35
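The figures quoted above can be reproduced from the 48 tabulated sample means. The following Python sketch does so with the standard statistics module; the variable names are illustrative assumptions, and the standard deviation is computed with the n - 1 divisor.

```python
import math
import statistics

# Means of the 48 samples of size 6, copied from the table of sample means.
sample_means = [
    379, 374, 236, 261, 195, 217, 261, 466, 367, 323, 217, 280,
    316, 320, 459, 273, 180, 301, 272, 288, 347, 249, 257, 364,
    315, 242, 388, 497, 309, 289, 176, 234, 330, 364, 336, 159,
    236, 271, 377, 308, 286, 380, 334, 300, 335, 324, 280, 224,
]

population_sd = 221.313   # Std. Dev. of the 144 carrot weights, as quoted in the text
sample_size = 6

mean_of_means = statistics.mean(sample_means)
observed_se = statistics.stdev(sample_means)          # SD of the sample means
expected_se = population_sd / math.sqrt(sample_size)  # sigma / sqrt(n)

print(f"Mean of sample means    = {mean_of_means:.2f}")
print(f"Observed standard error = {observed_se:.5f}")
print(f"Expected standard error = {expected_se:.2f}")
```

With only 48 samples, some gap between the observed and the expected standard error is to be expected from sampling variation alone.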
