
Fundamentals of

Statistics
Outline of the Presentation
Introduction to Statistics
Basic terms used in statistics
Measure of Central Tendency
Measure of Dispersion
Correlation
Regression Analysis
Testing of Hypothesis
Design of Experiment
Introduction to Statistics
Statistics is a tool in the hands of mankind to
translate complex facts into simple and
understandable statements of fact.
Statistics is used in three senses:
Statistics as numerical figures or data.
Statistics as a science.
Statistics as measures based on samples.
Basic Terms
Data - a collection of measurements
Population - all possible data
Sample - the collected data
Variable - a value that may change within
the scope of a given problem
Types of Variables
Discrete Variable - can assume only certain
distinct, countable values
Continuous Variable - can assume any value
within a given range
Measure of Central Tendency
A single expression, representing the whole
group, is selected which may give a fairly
adequate idea about the whole group. This single
expression in statistics is known as the average.
Averages generally lie in the central part of the
distribution and, therefore, they are also called
measures of central tendency.

Features of a Good Average
It should be rigidly defined so that different persons may not
interpret it differently.
It should be easy to understand and easy to calculate.
It should be based on all the observations of the data.
It should be easily subjected to further mathematical calculations.
It should not be unduly affected by the extreme values.
It should be easy to interpret.
It should have sampling stability. This means that if one takes different
samples from the same population, the average of any sample should
turn out to be approximately the same as those of the other samples.
Types of Measures of Central Tendency
Arithmetic Average or Arithmetic Mean
Median
Mode
Arithmetic Mean
Arithmetic mean is obtained by dividing
the sum of the values of all items of a
series by the number of items in that
series.
Arithmetic Mean is denoted by X̄.
Practical Steps Involved in the Computation of
Arithmetic Mean in Case of An Individual Series
by Direct Method
Step 1 - Treat the given values of variable as X.
Step 2 Enter the given values in a column headed as X.
Step 3 - Add together all the values of variable X and obtain the total,
i.e., ΣX.
Step 4 - Apply the following formula:

X̄ = ΣX / N

Where, X̄ = Arithmetic Mean
ΣX = Sum of all values of variable X
N = Number of observations
Practical Steps Involved in the Computation of Arithmetic Mean in
Case of An Individual Series When Deviations are Taken from the
Assumed Mean
Step 1 - Treat the given values of variable as X.
Step 2 Enter the given values in a column headed as X.
Step 3 Take any value as Assumed Mean (denoted as A)
Step 4 Take the deviations of the variable X from the assumed mean A and
denote these deviations (X-A) by d and enter the same in a column
headed as d.
Step 5 - Obtain the sum of these deviations, i.e., Σd.
Step 6 - Apply the following formula:

X̄ = A + Σd / N

Where, X̄ = Arithmetic Mean, A = Assumed Mean
Σd = Sum of deviations
N = Number of observations
Example
From the following data, calculate Arithmetic
Mean:

Serial No.   1    2    3    4    5    6
Yield        5   15   25   35   45   55
Solution by Direct Method

Serial No.   Yield (X)
1            5
2            15
3            25
4            35
5            45
6            55
N = 6        ΣX = 180

Mean: X̄ = ΣX / N = 180 / 6 = 30
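The direct-method calculation above can be reproduced in Python (a minimal sketch; the function name is my own):

```python
def arithmetic_mean(values):
    """Direct method: total of all values of X divided by N."""
    return sum(values) / len(values)

# the six yield observations from the example
yields = [5, 15, 25, 35, 45, 55]
mean_yield = arithmetic_mean(yields)  # 180 / 6 = 30.0
```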
Solution by Assumed Mean Method

Serial No.   Yield (X)   d = (X - 25)
1            5           -20
2            15          -10
3            25          0
4            35          10
5            45          20
6            55          30
N = 6                    Σd = 30

Mean: X̄ = A + Σd / N = 25 + 30 / 6 = 30
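The assumed-mean steps can be sketched the same way (function name my own); whatever value is chosen for A, the result agrees with the direct method:

```python
def mean_by_assumed(values, assumed):
    """Assumed-mean method: X-bar = A + (sum of deviations d) / N."""
    deviations = [x - assumed for x in values]  # d = X - A
    return assumed + sum(deviations) / len(values)

yields = [5, 15, 25, 35, 45, 55]
m = mean_by_assumed(yields, 25)  # 25 + 30/6 = 30.0
```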
Practical Steps Involved in the Computation of Arithmetic
Mean in Case of Discrete Series by Direct Method
Step 1- Treat the given values of variable as X and frequencies as f
Step 2 - Enter the given values of variable X in a column headed as X
Step 3 - Enter the given frequencies f in a column headed as f and
obtain the sum of these frequencies, i.e., N or Σf.
Step 4 - Multiply the variable of each row by the respective
frequency, denote these products by fX, and enter them in a
column headed as fX.
Step 5 - Obtain the sum of these products, i.e., ΣfX.
Step 6 - Apply the following formula:

X̄ = ΣfX / N

Where, X̄ = Arithmetic Mean
ΣfX = Sum of products of frequency and value of variable
N = Σf = Sum of frequencies
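A sketch of the discrete-series (frequency-weighted) mean; the values and frequencies below are hypothetical, chosen only to illustrate the formula:

```python
def mean_discrete(values, freqs):
    """Discrete series, direct method: X-bar = sum(f * X) / sum(f)."""
    n = sum(freqs)  # N = sum of frequencies
    return sum(f * x for x, f in zip(values, freqs)) / n

# hypothetical series: X = 10, 20, 30 with frequencies 2, 3, 5
m = mean_discrete([10, 20, 30], [2, 3, 5])  # (20 + 60 + 150) / 10 = 23.0
```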
Computation of Arithmetic Mean in Case of
Discrete Series by Assumed Mean Method



X̄ = A + Σfd / N

Where, X̄ = Arithmetic Mean,
A = Assumed Mean
Σfd = Sum of products of deviations and frequencies
N = Σf = Sum of frequencies
Computation of Arithmetic Mean in case of a
Continuous Series by Direct Method



X̄ = Σfm / N

Where, X̄ = Arithmetic Mean
Σfm = Sum of products of mid-points and frequencies
N = Σf = Sum of frequencies
Mid-Point (m) = (Lower limit + Upper limit) / 2
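For a continuous series the same idea applies with class mid-points; a sketch with hypothetical classes:

```python
def mean_continuous(class_limits, freqs):
    """Continuous series, direct method: X-bar = sum(f * m) / sum(f),
    where m is the mid-point of each class."""
    mids = [(lo + hi) / 2 for lo, hi in class_limits]
    return sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)

# hypothetical classes 0-10, 10-20, 20-30 with frequencies 1, 2, 2
m = mean_continuous([(0, 10), (10, 20), (20, 30)], [1, 2, 2])
# mid-points 5, 15, 25 -> (5 + 30 + 50) / 5 = 17.0
```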
Computation of Arithmetic Mean in case of a
Continuous Series by Assumed Mean Method

X̄ = A + Σfd / N

(here d = m - A, the deviation of each class mid-point from the
assumed mean)

Computation of Arithmetic Mean by
Step Deviation Method

X̄ = A + (Σfd' / N) × c

Where c = Class interval and d' = (m - A) / c
Median
Median is the value of the variable that divides the series
into two equal parts in such a way that half of the items
lie above this value and the remaining half lie below this
value. Median is called a positional average because it
is based on the position of a given observation in a series
arranged in an ascending or descending order and the
position of the median is such that an equal number of
items lie on either side of it. Median is usually denoted
by Md
Computation of Median
Individual Series
Arrange the n values of the given variable in ascending (or
descending) order of magnitude.
Case I. When n is odd
In this case the ((n + 1)/2)th term is the median:
Md = ((n + 1)/2)th term
Case II. When n is even
In this case, there are two middle terms, the (n/2)th and the
(n/2 + 1)th. The median is the average of these two terms, i.e.,
Md = [(n/2)th term + (n/2 + 1)th term] / 2
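Both cases of the individual-series median can be sketched as:

```python
def median_individual(values):
    """Median of an individual series: sort the values, then take the
    middle term (odd n) or the average of the two middle terms (even n)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # the ((n + 1)/2)th term
    return (s[mid - 1] + s[mid]) / 2   # average of (n/2)th and (n/2 + 1)th

md_odd = median_individual([7, 3, 5])      # 5
md_even = median_individual([7, 3, 5, 9])  # (5 + 7) / 2 = 6.0
```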
Discrete Series
In this case, the values of the variable are
arranged in ascending or descending order
of magnitude. A table is prepared showing
the corresponding frequencies and
cumulative frequencies.
Median (Md) = ((n + 1)/2)th value
Continuous Series
In this case the data is given in the form of a frequency table with class-
interval, etc., and the following formula is used to calculate the median.



Md = L + ((n/2 - C) / f) × i

Where
L = lower limit of the class in which the median lies,
n = total number of frequencies, i.e., n = Σf,
f = frequency of the class in which the median lies,
C = cumulative frequency of the class preceding the median class,
i = width of the class interval of the class in which the median lies.
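The continuous-series formula can be sketched as follows (the class data is hypothetical):

```python
def median_continuous(class_limits, freqs):
    """Grouped median: Md = L + ((n/2 - C) / f) * i, where the median
    class is the first whose cumulative frequency reaches n/2, C is the
    cumulative frequency before it, and i is its class width."""
    n = sum(freqs)
    cum = 0
    for (lo, hi), f in zip(class_limits, freqs):
        if cum + f >= n / 2:
            return lo + ((n / 2 - cum) / f) * (hi - lo)
        cum += f

# hypothetical classes 0-10 (f=2), 10-20 (f=5), 20-30 (f=3); n = 10
md = median_continuous([(0, 10), (10, 20), (20, 30)], [2, 5, 3])
# L = 10, C = 2, f = 5, i = 10 -> 10 + ((5 - 2) / 5) * 10 = 16.0
```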
Mode
Mode is that value in a series which occurs most frequently. In a
frequency distribution mode is that variate which has the maximum
frequency.
Types of Modal Series
A series of observation may have one or more modes
Unimodal series The series of observations which contains only one
mode.
Bimodal Series The series of observations which contains two
modes is called a bimodal series.
Trimodal Series The series of observations which contains three
modes is called a trimodal series.
Ill-defined Mode If a series of observations has more than one mode
then the mode is said to be ill-defined.
Computation of mode in case of
Continuous Series
Mode = L + ((f1 - f0) / (2f1 - f0 - f2)) × i

Where, L = Lower limit of the modal class
f1 = Frequency of the modal class
f0 = Frequency of the pre-modal class
f2 = Frequency of the post-modal class
i = Class interval of the modal class
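A sketch of the grouped-mode formula in Python (data hypothetical; the modal class is simply the one with the highest frequency):

```python
def mode_continuous(class_limits, freqs):
    """Grouped mode: Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * i."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # modal class index
    lo, hi = class_limits[k]
    f1 = freqs[k]                                   # modal class frequency
    f0 = freqs[k - 1] if k > 0 else 0               # pre-modal frequency
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # post-modal frequency
    return lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

# hypothetical classes 0-10 (f=3), 10-20 (f=7), 20-30 (f=2)
mo = mode_continuous([(0, 10), (10, 20), (20, 30)], [3, 7, 2])
# 10 + (7 - 3) / (14 - 3 - 2) * 10 = 10 + 40/9
```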
Measure of Dispersion
Dispersion (also known as Scatter, Spread or Variation)
measures the extent to which the items vary from some
central value. It may be noted that the measures of
dispersion measure only the degree (i.e. the amount of
variation) but not the direction of the variation. The
measures of dispersion are also called averages of the
second order because these measures give an average of
the differences of various items from an average.
Suppose there are three series of nine items each as
follows:

Series A Series B Series C
40 36 1
40 37 9
40 38 20
40 39 30
40 40 40
40 41 50
40 42 60
40 43 70
40 44 80
Total =360 360 360
Mean= 40 40 40
Measure of Dispersion
Range
Quartile Deviation
Mean Deviation
Standard Deviation

Standard Deviation
Standard Deviation is the square root of the arithmetic
mean of the squares of the deviations of all items of
the distribution from the arithmetic mean.
Standard Deviation:

σ = √( Σ(x - x̄)² / n )
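The definition translates directly (here applied to the yield data from the arithmetic-mean example, whose mean is 30):

```python
import math

def std_dev(values):
    """Square root of the mean squared deviation from the arithmetic mean."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / n)

s = std_dev([5, 15, 25, 35, 45, 55])  # deviations ±25, ±15, ±5 from 30
```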
Correlation
Correlation is the relationship that exists between two or more
variables. If two variables are related to each other in such a way
that change in one creates a corresponding change in the other, then
the variables are said to be correlated. Some examples of such
relationships are as follows:
Relationship between the quantum of rainfall and yield of wheat.
Relationship between the price of commodity and demand of
commodity.
Relationship between the age of husband and age of wife.
Relationship between the dose of insulin and blood sugar.
Types of Correlation
Positive or Negative
Simple, Multiple or Partial
Linear or Non-linear
Positive or Negative Correlation
Positive : When the values of two variables move in the same
direction, i.e. when an increase in the value of one variable is
associated with an increase in the value of the other variable and vice
versa, correlation is said to be positive.
Negative : When the values of the two variables move in opposite
directions, so that with an increase in the values of one variable the
values of the other variable decrease, and with a decrease in the
values of one variable the values of the other variable increase,
correlation is said to be negative.
Simple, Multiple and Partial Correlation
Simple: In simple correlation we study only two
variables. Eg: price and demand.
Multiple : In multiple correlation we study together the
relationship between three or more factors.
Eg: Production, rainfall and use of fertilizers.
Partial: In partial correlation more than two factors are
involved but correlation is studied only between two
factors and the other factors are assumed to be constant.



Linear and Non-linear (Curvi-linear)
Correlation
Linear: The correlation between two variables is said to
be linear if corresponding to a unit change in the value
of one variable there is a constant change in the value of
the other variable.
Non-linear : The correlation between two variables is
said to be non-linear or curvilinear if corresponding to a
unit change in the value of one variable the other
variable does not change at a constant rate but at a
fluctuating rate.
Method to Study Correlation
Scatter or Dot Diagram Method
Karl Pearson's Coefficient of Correlation
Scatter or Dot Diagram Method

Scatter diagram is a graphical method of showing the correlation between
the two variables.
Practical Steps involved in the preparation of the Graph
Step 1: Show the time horizon along the horizontal axis OX and the
variables X and Y along the vertical axis OY.
Step 2: Plot a dot for each of the individual values of the X variable
and join these plotted dots to obtain a curve.
Step 3: Plot a dot for each of the individual values of the Y variable
and join these plotted dots to obtain a curve.
Step 4: Observe both curves and form an idea about the direction of
correlation.

Example
From the following information draw a
graph and indicate whether the correlation
is positive or negative





Time             1    2    3    4    5    6
X (Rs in lakh)   10   20   30   40   50   60
Y (Rs in lakh)   20   30   30   30   40   50

[Graph: X and Y (Rs. in lakh) plotted against time; both curves rise
together, indicating positive correlation.]
Karl Pearson's Coefficient of
Correlation
Karl Pearson (1857-1936) was a great statistician. He
gave the following mathematical formula for measuring
the magnitude of correlation coefficient between two
variables. If X and Y are two variables, then the
correlation coefficient is given by






ρ(X, Y) = Cov(X, Y) / √( Var(X) × Var(Y) )

ρ(X, Y) = Σᵢ₌₁ⁿ (xᵢ - x̄)(yᵢ - ȳ) / √( Σᵢ₌₁ⁿ (xᵢ - x̄)² × Σᵢ₌₁ⁿ (yᵢ - ȳ)² )
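The formula can be sketched directly in Python; here it is applied to the X and Y series from the scatter-diagram example, where the curves suggested positive correlation:

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient: covariance of X and Y divided by the
    product of their standard deviations."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    vx = sum((x - xbar) ** 2 for x in xs)
    vy = sum((y - ybar) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

r = pearson_r([10, 20, 30, 40, 50, 60], [20, 30, 30, 30, 40, 50])
# r comes out close to +0.93: high positive correlation
```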
Interpretation
The value of ρ lies between -1 and +1.

Value of ρ               Interpretation
If ρ = +1                There exists perfect positive correlation
                         between the variables.
If ρ = -1                There exists perfect negative correlation
                         between the variables.
If ρ = 0                 There exists no relationship between the
                         variables.
If +0.75 ≤ ρ < +1        There exists high positive correlation
                         between the variables.
If -1 < ρ ≤ -0.75        There exists high negative correlation
                         between the variables.
If +0.50 ≤ ρ < +0.75     There exists moderate positive correlation
                         between the variables.
If -0.75 < ρ ≤ -0.50     There exists moderate negative correlation
                         between the variables.
If 0 < ρ < +0.50         There exists low positive correlation
                         between the variables.
If -0.50 < ρ < 0         There exists low negative correlation
                         between the variables.

Regression Analysis
Regression is the measure of average relationship
between two or more variables in terms of the original
units of the data.
There are two types of variables in regression analysis
Independent variable (Regressor or Predictor)
Variable which influences the value or is used for
prediction.
Dependent variable (Regressed or Explained
variable)
Variable whose value is influenced or is to be predicted.
Types of Regressions
Simple Regression: The regression analysis
confined to the study of only two variables at a
time is called the simple regression.
Multiple Regression : The regression analysis
for studying more than two variables at a time is
known as multiple regression.
Linear or Non-linear Regression
Linear Regression
If the curve is a straight line, then there is a linear
regression between the variables under study.
The relationship between the two variables x and y is linear.
In order to estimate the best average values of the two
variables, two regression equations are required and they
are used separately.
Contd.
One equation is used for estimating the value of
x variable for a given value of y variable and the
second equation is used for estimating the value
of y variable for a given value of x variable.
The assumption is that one is an independent
variable and the other is a dependent variable,
and vice versa.
Non-linear Regression
If a curve of regression is not a straight line, i.e., not a
first degree equation in the variable x and y, then it is
called a non-linear or curvilinear regression.
In this case the regression equation will have a
functional relation between the variables x and y
involving terms in x and y of degree higher than one,
i.e., involving terms of the type x², y², x³, y³, xy, etc.
Lines of Regression
Line of regression of X on Y

X - X̄ = r (σx / σy) (Y - Ȳ)

It can also be put in the form:
X = a + bY
Where a = intercept of the line (i.e. value of the dependent
variable when the value of the independent variable is zero)
b = slope of the line (i.e. the amount of change
in the value of the dependent variable per unit change in the
independent variable).
Line of Regression of Y on X

Y - Ȳ = r (σy / σx) (X - X̄)

It can also be put in the form:
Y = a + bX
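A sketch of fitting the line of regression of Y on X; note that the slope b = r·(σy/σx) reduces to Cov(X, Y)/Var(X). The data is borrowed from the scatter-diagram example for illustration:

```python
def regression_y_on_x(xs, ys):
    """Line of regression of Y on X in the form Y = a + b*X,
    with b = r * (sigma_y / sigma_x) = Cov(X, Y) / Var(X)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    varx = sum((x - xbar) ** 2 for x in xs) / n
    b = cov / varx       # slope
    a = ybar - b * xbar  # intercept: the line passes through (X-bar, Y-bar)
    return a, b

a, b = regression_y_on_x([10, 20, 30, 40, 50, 60], [20, 30, 30, 30, 40, 50])
```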

Testing of Hypothesis

Hypothesis testing is a process of making
a decision on whether to accept or reject
an assumption about the population
parameter on the basis of sample
information at a given level of
significance.
Null Hypothesis
A hypothesis which is tested under the assumption that it is true called a null
hypothesis.
Symbol: It is denoted by H0
Example: The null hypothesis may be that the population mean (μ) is 50
and is set up as follows:
H0 : μ = 50

Alternative Hypothesis
Alternative hypothesis is the hypothesis which differs from the null hypothesis.
Symbol : It is denoted by H1
Example : If the population mean is 50, an alternative hypothesis may be any one of
the following three:
H1 : μ ≠ 50
H1 : μ > 50
H1 : μ < 50

Level of Significance
Level of significance is the maximum
probability of rejecting the null hypothesis
when it is true.
Symbol : It is denoted by α.
Usefulness : It is used as a guide in
decision-making.
Critical Region or Rejection Region
The critical region CR, or rejection region RR, is a set of
values of the test statistic for which the null hypothesis
is rejected in a hypothesis test. That is, the sample space
for the test statistic is partitioned into two regions; one
region (the critical region) will lead us to reject the null
hypothesis H0, the other will not. So, if the observed
value of the test statistic is a member of the critical
region, we conclude "Reject H0"; if it is not a member of
the critical region then we conclude "Do not reject H0"
Contd.
The critical region may be represented by
a portion of the area under the normal
curve in the following two ways:
By two tailed tests
By one tailed tests
Two-Tailed Tests
A two-sided test is a statistical hypothesis test in which the values for
which we can reject the null hypothesis, H0 are located in both tails of
the probability distribution.

One-Tailed Test
A one-sided test is a statistical hypothesis test in which the values
for which we can reject the null hypothesis, H0 are located entirely
in one tail of the probability distribution.
Type I or Type II Errors
The decision to accept or reject null hypothesis H0 is made on the basis of the
information supplied by the sample data. There is always a chance of
committing an error. There are two possible types of error in the test of a
hypothesis.
Type I Error - This is the error committed by the test in rejecting a true null
hypothesis. The probability of committing a Type I Error is denoted by α.
Type II Error - This is the error committed by the test in accepting a false null
hypothesis. The probability of committing a Type II Error is denoted by β.
Table showing Errors or Correct
Decision in Test of Significance
Hypothesis   Accept H0               Reject H0
True         Correct Decision        Type I Error (α)
False        Type II Error (β)       Correct Decision
Power of the Test
Power of the test is the probability of rejecting a
false null hypothesis.
It can be calculated as follows:
Power of the Test = 1- Probability of Type II
Error
Degree of Freedom
The number of degrees of freedom generally refers
to the number of independent observations in a
sample minus the number of population parameters
that must be estimated from sample data.
or
Degrees of freedom describes the number of values
in the final calculation of a statistic that are free to
vary.
Practical Steps Involved in the
Testing Hypothesis
Step1- Specify the Null and Alternative
Hypothesis:

Type of Test           Null Hypothesis (H0)   Alternative Hypothesis (H1)
(a) Two-tailed test    H0 : μ = 50            H1 : μ ≠ 50
(b) One-tailed test
    (i) Right-tail     H0 : μ ≤ 50            H1 : μ > 50
    (ii) Left-tail     H0 : μ ≥ 50            H1 : μ < 50
Contd
Step2- Specify the appropriate test statistic to be used
Example : Test statistics to be used for different tests

Test Statistic         Used for Test
(i) Z-test             For test of hypothesis involving a large
                       sample, i.e., n > 30
(ii) t-test            For test of hypothesis involving a
                       small sample, i.e., n ≤ 30
(iii) Chi-square test  For testing the discrepancy between
                       observed and expected frequencies, without any
                       reference to a population parameter
(iv) F-test            For testing the equality of sample variances
Contd
Step 3 Specify the level of significance such as
5% or 1%.
Step 4 - Compute the value of the test statistic (e.g. t,
F, Chi-square, Z) used in testing.
Step 5 Find the critical value of the test statistic
used at the selected level of significance from the
table of respective statistics distribution.


Contd
Step 6 Specify the decision as follows:
Since the computed value is greater than the critical value, we reject
the null hypothesis (H0) and conclude that the difference is
significant and it could not have arisen due to fluctuations of
random sampling.
or
Since the computed value is less than the critical value, we accept
the null hypothesis (H0) and conclude that the difference is not
significant and it could have arisen due to fluctuations of random
sampling.

t- test
t-test is used when
The sample size is 30 or less
The variance of the population is unknown
The sample is a random sample
The population is normal and selection of items
is independent
Application of t- test
To test the significance of the mean of a
random sample.
To test the significance of the difference
between means of two independent
samples.
To test the significance of the difference
between the means of two dependent
samples or paired observations.
To test the significance of an observed
correlation coefficient.
Test for specified mean of a small sample when
population standard deviation is unknown

t = (X̄ - μ) √n / S,   where S = √( Σ(X - X̄)² / (n - 1) ),   d.f. = n - 1

Test for difference between the means of two
independent random samples

t = [ (X̄1 - X̄2) / S ] × √( n1 n2 / (n1 + n2) ),   d.f. = n1 + n2 - 2
where S² = [ (n1 - 1)S1² + (n2 - 1)S2² ] / (n1 + n2 - 2)

Test for difference between the means of two dependent
samples

t = d̄ √n / S,   where S² = ( Σd² - n d̄² ) / (n - 1),   d.f. = n - 1
d = difference between the values of a pair, d̄ = mean of these differences

Test for an observed correlation coefficient

t = r √(n - 2) / √(1 - r²),   d.f. = n - 2
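The one-sample case can be sketched as follows; the sample data is hypothetical, testing against the null value μ = 50 used in the earlier examples:

```python
import math

def t_one_sample(sample, mu0):
    """t = (X-bar - mu0) * sqrt(n) / S with S computed using n - 1;
    returns the statistic and its degrees of freedom n - 1."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu0) * math.sqrt(n) / s, n - 1

# hypothetical small sample (n = 8) tested against mu = 50
t, df = t_one_sample([48, 52, 51, 49, 53, 47, 54, 50], 50)
```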
Chi- Square Test
Chi-Square test is a technique to examine whether a
given discrepancy (i.e. value of chi-Square) between
theory and observation is considered to be significant.
Chi-Square is given by:

χ² = Σ (O - E)² / E

Where O = observed frequency
E = expected frequency
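A sketch of the statistic; the observed frequencies below are hypothetical (60 rolls of a die, with 10 expected per face):

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

x2 = chi_square([8, 12, 9, 11, 10, 10], [10] * 6)
# (4 + 4 + 1 + 1 + 0 + 0) / 10 = 1.0
```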
Conditions for the Application of
Chi-Square Test
Random Samples The sample must be drawn
at random from the target population.
Independent Observations The observations of
each sample must be independent
At least 50 observations Each sample must
contain at least 50 observations
Data in original Units The data must be
expressed in original units
At least 5 frequencies In any one cell, there
must be at least 5 frequencies

F- Test or Analysis of Variance (ANOVA)
The analysis of variance or F-Test is a
technique used for testing the significance
of the difference among more than two
sample means and to make inferences
about whether such samples are drawn
from the populations having the same
mean.
F-test is based on the ratio rather than
difference between variances.
F is obtained by:

F = s1² / s2²

or

F = [ Σ(X1 - X̄1)² / (n1 - 1) ] / [ Σ(X2 - X̄2)² / (n2 - 1) ]
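A sketch of the variance ratio (samples hypothetical); by convention the larger variance is placed in the numerator so that F ≥ 1:

```python
def f_ratio(sample1, sample2):
    """F = s1^2 / s2^2 with each s^2 = sum((X - X-bar)^2) / (n - 1);
    the larger variance goes in the numerator."""
    def var(s):
        n = len(s)
        m = sum(s) / n
        return sum((x - m) ** 2 for x in s) / (n - 1)
    v1, v2 = var(sample1), var(sample2)
    return max(v1, v2) / min(v1, v2)

# hypothetical samples with variances 8 and 2
f = f_ratio([0, 4], [0, 2])  # F = 8 / 2 = 4.0
```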
Assumptions of Analysis of Variance or F test
Each sample is drawn randomly from a
normal population and the sample
statistics tend to reflect the characteristics
of the population
The populations from which the samples
are drawn have the same means and variances,
i.e.

μ1 = μ2 = μ3 = ...... = μn
σ1² = σ2² = σ3² = ...... = σn²
Use of F- test
For test of hypothesis of equality between
two variances.
For test of hypothesis of equality among
several sample means.
Analysis of Variance
Analysis of variance is the ratio of 2
variances
i. Between Samples
ii. Within Samples
Its purpose is to find out the influence of
different forces working on them.
It is used for agricultural experiments,
for natural sciences, for physical
sciences etc.

Classification Model
One way classification model
Two way classification model
One Way Classification Model

One way classification model is designed
to study the effect of one factor in an
experiment.
It is designed to test the null hypothesis
that the arithmetic means of the
population from which the k samples are
randomly drawn are equal to one another.


H0 : μ1 = μ2 = μ3 = ...... = μk
Practical Steps Involved in One Factor Analysis
of Variance
Step 1 - We set H0 : σ1² = σ2², H1 : σ1² ≠ σ2²
Step 2 - Calculate the mean of each sample,
i.e., X̄1, X̄2, ......, X̄k, and the grand average X̿ as follows:

X̿ = (ΣX1 + ΣX2 + ...... + ΣXk) / (N1 + N2 + ...... + Nk)

Step 3 - Calculate the differences between the
means of the various samples and the grand
average.
Step 4 - Square these differences and obtain
their total, i.e., (X̄1 - X̿)², for each sample.
Contd.
Step 5 - Calculate the sum of squares
between the samples (SSB) as follows:

SSB = N1(X̄1 - X̿)² + N2(X̄2 - X̿)² + N3(X̄3 - X̿)² + .........

(each squared difference weighted by the size of its sample)
Step 6 - Calculate the differences between
the various items in a sample and the
mean value of the respective sample.
Step 7 - Calculate the sum of squares
within the samples (SSW) as follows:

SSW = Σ(X1 - X̄1)² + Σ(X2 - X̄2)² + Σ(X3 - X̄3)² + ......
Contd.
Step 8 - Prepare the ANOVA table as follows:

Source of     Sum of    Degrees of   Mean              Computed      Table value
Variation     Squares   freedom      Squares           value of F    of F
Between       SSB       c - 1        MSB =             F =
Samples                              SSB/(c - 1)       MSB/MSW
Within        SSW       n - c        MSW =
Samples                              SSW/(n - c)
Total         SST       n - 1
Contd.
Step 9 Interpretation

Case Interpretation
If the computed value of F is
greater than the table value of F
The difference in the variances is
significant and it could not have
arisen due to fluctuations of
random sampling and hence we
reject H0
If the computed value of F is less
than the table value of F
The difference in the variances is
not significant and it could have
arisen due to fluctuation of random
sample and hence we accept H0
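Steps 2-9 can be condensed into a short sketch (the three treatment samples are hypothetical; SSB here weights each squared difference by its sample size, the standard form of the between-samples sum of squares):

```python
def one_way_anova_f(*samples):
    """One-factor ANOVA: SSB from group means vs the grand average,
    SSW from items vs their own group mean, F = MSB / MSW."""
    all_vals = [x for s in samples for x in s]
    n, c = len(all_vals), len(samples)
    grand = sum(all_vals) / n
    ssb = sum(len(s) * (sum(s) / len(s) - grand) ** 2 for s in samples)
    ssw = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
    msb, msw = ssb / (c - 1), ssw / (n - c)
    return msb / msw

# hypothetical yields under three treatments
f = one_way_anova_f([10, 12, 14], [15, 17, 19], [20, 22, 24])
# SSB = 150, SSW = 24, MSB = 75, MSW = 4 -> F = 18.75
```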
Two way Classification Model
Two way classification model is designed to study the effects of two
factors simultaneously in the same experiment.
ANOVA Table

S.V              Sum of    Degrees of    Mean Squares             Variance Ratio
                 Squares   freedom
Between          SSC       c - 1         MSC = SSC/(c - 1)        F1 = Greater variance /
Columns                                                           Smaller variance
Between          SSR       r - 1         MSR = SSR/(r - 1)        F2 = Greater variance /
Rows                                                              Smaller variance
Residual/Error   SSE       (c-1)(r-1)    MSE = SSE/((c-1)(r-1))
Total            SST       rc - 1
Design of Experiment
Design of Experiment was developed by
R. A. Fisher in the early twentieth century.
Design of Experiment originally focused on improving
agricultural experimentation.
Design of Experiment is also used in many
industrial sectors in the development and
optimization of manufacturing processes.
Definition of Design of Experiment
Design of Experiment (DOE) is a structured approach to efficiently
characterize, improve, and optimize a process or product by
collecting, analyzing and interpreting data. The methodology is
used to
Plan and analyze experiments to study several factors
simultaneously.
Reduce experimental cost, time and resources.
Establish cause and effect relationships between process inputs and
process outputs.
Identify key factors that have the greatest and least impact on
product or process performance.
Improve yield, reliability and performance.
Terminology in Experimental Design
Experiment: An experiment is a device or a means of getting an
answer to the problem under study.
Treatment : Various objects of comparison in a comparative
experiment are termed as treatments. Eg. In field experimentation
different fertilizers or different varieties of crop or different methods
of cultivation are the treatments.
Experimental Unit : The smallest division of the experimental
material to which we apply the treatments and on which we make
observations on the variable under study. In a field experiment the
plot of land is the experimental unit.
Contd
Experimental Error: The variation in experimental
units that have been exposed to the same treatment is
attributed to experimental error. This variability is due to
uncontrollable factors, or noise factors.
The variation arises due to
The inherent variability in the experiment material to
which treatments are applied.
The lack of uniformity in the methodology of
conducting the experiment.
Lack of representativeness of the sample to the
population under study.
Replication : Number of times a particular treatment is
repeated or executed in a design.
Principles of Design of Experiment
Replication: Repetition of the experiment under similar conditions.
The number of replications should not be less than 4.
Benefits
Reduces experimental error
Increases the precision of estimates
Randomization: Random process to assign experimental units to
treatments.
Local control (Error control): Process of reducing the
experimental error by dividing the whole block into homogenous
subgroups.
Completely Randomized Design
(CRD)
This design is based on the principles of randomization and replication.
In this design, the basic assumption is that the field is homogenous. So,
treatments are allocated in a pure random manner to the different plots.
Limitations
It is suitable only for small number of treatments and for homogenous
experimental material.
It does not consider the principle of local control.
The randomization is not restricted in any direction to ensure that the
units receiving a particular treatment are similar to those receiving the
other treatments; because of this, the total variation among the
experimental units is included in the residual variance. This makes the
design less efficient and less sensitive.

Randomized Block Design (RBD)
In a field experiment this design is used when the
experimental material is not homogeneous and the
fertility gradient is in one direction only.
It is a simple method to control experimental
error.
In this method the whole experimental material
is grouped into relatively homogeneous strata or
subgroups.
The treatments are applied randomly within each
stratum or block.
Example
If a farmer wishes to know the effect of 4
different types of fertilizers on crop yield,
he can profitably use the technique of
randomized block design.
Suppose the experiment is to be
conducted on 4 blocks of land each having
5 plots.
Now he can use fertilizers A, B, C, D on a
random sample basis on the 5 plots of
land in the first block.
Contd.
Similarly on the remaining 3 blocks also,
he will use these four fertilizers on the
basis of random sampling in the five plots
in each block.
The use of the random sampling technique
would nullify the effect of soil fertility on
output because each fertilizer is used (on the
basis of random sampling) on different
plots and in different blocks, which may
have different soil fertility.
Contd
Random sampling technique would ensure
that each fertilizer is used on different
types of plots with varying fertility.

This experiment would yield better results
because the effect of soil fertility is
nullified and we can judge the
effectiveness of each fertilizer more
correctly.
Contd
On the basis of random sampling the
farmer may obtain the following
arrangement.




Limitation: It is not suitable for a large
number of treatments.

Blocks \ Plots   1   2   3   4   5
1                A   C   B   D   B
2                B   D   A   C   A
3                D   C   C   A   B
4                A   B   C   D   D
Latin Square Design

With the Latin Square design we are able to
control variation in two directions.
Treatments are arranged in rows and columns
Each row contains every treatment.
Each column contains every treatment.
The most common sizes of LS are 5x5 to 8x8
Advantages of the LS Design

We can control variation in two directions.
Increased efficiency as compared to the
CRD and RBD.
The analysis remains relatively simple
even with missing observations.

Example
Suppose we have 4 treatments to be compared and blocks are in
perpendicular directions. Also, each block must have 4 plots for 4
treatments under study (one for each). Thus, there will be total 16
plots in the form of 4 rows and 4 columns. The treatments are then
applied at random, such that each treatment appears but once in
each row and each column. Obviously, there can be a number of
arrangements. A particular layout may be as follows:
A C D B
C A B D
D B C A
B D A C







Thanks!
