
AMERICAN CONFERENCE ON APPLIED MATHEMATICS (MATH '08), Harvard, Massachusetts, USA, March 24-26, 2008

Calculating the sample size of multivariate populations: Norms of representation
CONSTANTINOS N. TSIANTIS
Department of Energy Technology
Technological Educational Institution of Athens
Agiou Spyridonos Street, Aegaleo 12210 Athens
GREECE
cotsiant@tee.gr

Abstract: - This article addresses the problem of representation of a mathematical space and treats the problem of sampling as a problem of representation. It distinguishes between population representation and statistical representation, and considers statistical representation as the product of population representation and a statistical (behavioural) factor. It derives the formula of representation of a population $N$ (consisting of $m$ classes with a given number of subjects per class), giving the sample size $n_{ath}$. It then derives the formula of statistical representation $n_{bath}$ as the product of $n_{ath}$ and the statistical factor. Application of the first formula justifies the number of representatives at the Vouli of ancient Athens, while application of the second formula gives results similar to those in the statistical literature.
Key-Words: - Representation, Sample, Athenian Norm, Statistical Factor, Allocation, Stratification, Minimum Variance.

1 Introduction
The existence of a mathematical formula giving the number of persons required to represent a community of citizens is a matter of high significance for the political and social sciences in a democratic society. Such a formula has not been known to date, even though one was probably used in determining the number of representatives for the parliament (Vouli) of ancient Athens.
The calculation of sample size is one of the central issues of statistics and influences both the validity of research outcomes and the research cost. Modern statistics has provided us with formulas and tables for determining the sample size required to make comparisons among population groups [1],[2],[3] by using the concept of effect size and the assumption of a normal distribution for the measurable characteristics of subjects. However, the effect size and the validity of the normality assumption are not usually known beforehand [4] and, hence, previous statistical data are required. The same demand holds also for the method of stratified sampling, which accompanies the principle of representation and the condition for minimum variance.
During the last decades, computer advancements and programming (nQuery, PASS, SAT, etc.) have provided a spectrum of numerical methods for facing the problem of sampling. However, this plethora of approaches has established a relativism of solutions and has triggered a time-consuming process of choosing among alternatives. This situation is not functional for the practitioner of statistics in the various scientific fields (social sciences, education, psychology, biostatistics, environmental sciences, physical sciences, etc.) and, in addition, it preserves an undesirable relativism in mathematical epistemology, which demands definite, closed-form solutions for its problems.
This article is an endeavour to consider the problem of sample size as a problem of representation and to develop closed-form mathematical formulas which reduce relativism and restrict the number of assumptions underlying the calculation of sample size.

ISSN: 1790-5117


ISBN: 978-960-6766-47-3


2 Problem

2.1 The representation of a population
A population of size $N$ consists of $m$ mutually exclusive and exhaustive classes of subjects, with $N_1$ subjects in class 1, $N_2$ subjects in class 2, ..., and $N_m$ subjects in class $m$. What is the value of $n$ and the synthesis $n_1, n_2, \dots, n_m$ of a sample, drawn randomly from the population, such that the minimum valid representation of the population is achieved?

2.2 The statistical representation
A population of size $N$ consists of $m$ mutually exclusive and exhaustive classes of subjects, with $N_1$ subjects in class 1, $N_2$ subjects in class 2, ..., and $N_m$ subjects in class $m$. The subjects are measured in terms of some variable of interest $X$, and the statistical parameters for the population and its classes (i.e., means and SDs) are considered known. A sample of size $n$, consisting of $n_1, n_2, \dots, n_m$ subjects per class respectively, is drawn randomly from the population. What is the value of $n$ and its allocation ($n_1, n_2, \dots, n_m$) such that, on the basis of the available information (data), the minimum valid representation of the population is achieved?

3 Problem Solution

3.1 The principle of representation
Statistical methodology has used various ideas and strategies to extract a sample from a population. A guiding principle for doing this is the principle of representation, suggested and developed by pioneers in the field [5],[6]. Related to this principle are the principle of random sampling and the method of stratification.
The principle of representation implies, in essence, the demand of similarity between the synthesis of a space of interest (population) and the synthesis of its representative space (sample). This similarity is expressed by the equality of the respective class proportions in the space of interest and in the representative space: $w_i = \omega_i$ ($i = 1, \dots, m$). The same thing can be stated probabilistically as follows: the probability that an element of class $i$ of the space is found in it equals the probability that an element of representative class $i$ is found in the representative space.

3.2 Solution of the problem of population representation

3.2.1 Notations-Definitions
$N$: the total size of the population, consisting of $m$ classes of subjects
$m$: the number of classes, the same in the population and the sample
$N_1, N_2, \dots, N_m$: the number of subjects per population class
$$N_1 + N_2 + \dots + N_m = N \qquad \text{(Eq.1)}$$
$w_1, w_2, \dots, w_m$: the percentage of each class in the population
$$w_1 = \frac{N_1}{N},\quad w_2 = \frac{N_2}{N},\quad \dots,\quad w_m = \frac{N_m}{N} \qquad \text{(Eq.2)}$$
$$w_1 + w_2 + \dots + w_m = 1 \qquad \text{(Eq.3)}$$
$n$: the sample size (under calculation)
$n_1, n_2, \dots, n_m$: the number of subjects per class in the sample
$$n_1 + n_2 + \dots + n_m = n \qquad \text{(Eq.4)}$$
$\omega_1, \omega_2, \dots, \omega_m$: the percentage of each class in the sample
$$\omega_1 = \frac{n_1}{n},\quad \omega_2 = \frac{n_2}{n},\quad \dots,\quad \omega_m = \frac{n_m}{n} \qquad \text{(Eq.5)}$$
$$\omega_1 + \omega_2 + \dots + \omega_m = 1 \qquad \text{(Eq.6)}$$
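The notation of Section 3.2.1 can be illustrated numerically. The following minimal Python sketch uses hypothetical class sizes and a hypothetical sample size, neither of which is taken from the article. It computes the population proportions of Eq.2, checks Eq.3, and confirms that a sample allocated in proportion to class size has sample proportions (Eq.5) equal to the population proportions, as the principle of representation demands. The function names are assumptions made for illustration only.

```python
# Illustrative sketch only: class sizes and sample size are hypothetical.

def class_proportions(class_sizes):
    """Return w_i = N_i / N for each class (Eq.2)."""
    total = sum(class_sizes)  # N = N_1 + ... + N_m (Eq.1)
    return [size / total for size in class_sizes]


def proportional_allocation(class_sizes, n):
    """Return n_i = w_i * n, so that n_1 + ... + n_m = n (Eq.4)."""
    return [w_i * n for w_i in class_proportions(class_sizes)]


population_classes = [200, 300, 500]  # hypothetical N_1, N_2, N_3
sample_size = 50                      # hypothetical n

w = class_proportions(population_classes)
n_i = proportional_allocation(population_classes, sample_size)
omega = [size / sample_size for size in n_i]  # sample proportions (Eq.5)

print("w     =", w)
print("n_i   =", n_i)
print("omega =", omega)
```

The sketch keeps the per-class sample sizes as real numbers; rounding them to whole subjects is a separate practical step that the notation above does not address.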

3.2.2 Deriving the Athenian norm of representation ($n_i$ proportional to $N_i$)
The solution of Problem 2.1 (find the sample size $n$ and its synthesis $n_1, n_2, \dots, n_m$ representing a population $N$ with synthesis $N_1, N_2, \dots, N_m$) is formulated as follows.
If $n$ is the required size of the sample and $n_1$ the number of subjects included in the first sample class, then the proportion of subjects of class 1 in the sample is
$$\omega_1 = \frac{n_1}{n} \qquad \text{(Eq.7)}$$


The probability that a subject (of whatever class) from the population $N$ is represented in the sample $n$ is
$$p = \frac{n}{N} \qquad \text{(Eq.8)}$$
The probability $p_{11}$ that a subject of class 1 from the population $N$ is represented in the sample $n$ equals the product
$$p_{11} = P_{prob1}\, p \qquad \text{(Eq.9)}$$
where $P_{prob1}$ is the probability that a subject of class 1 is found in the population $N$, that is
$$P_{prob1} = \frac{N_1}{N} = w_1 \qquad \text{(Eq.10)}$$
Thus, the probability $p_{11}$, through Eqs.8 and 10, becomes
$$p_{11} = w_1 \frac{n}{N} \qquad \text{(Eq.11)}$$
By applying Eq.11 to the $n_1$ subjects of class 1, we obtain the probability $p_{1n}$ that $n_1$ subjects of class 1 of the population $N$ are represented in the sample $n$ (no matter in which sample class), that is
$$p_{1n} = n_1 w_1 \frac{n}{N} \qquad \text{(Eq.12)}$$
But, according to the principle of representation, the above probability must be equal to the probability implied by the respective class proportion $\omega_1 = n_1/n$ (Eq.5). That is,
$$\frac{n_1}{n} = n_1 w_1 \frac{n}{N} \qquad \text{(Eq.13)}$$
Therefore, for the $n_1$ subjects of class 1 (through Eq.13) we obtain
$$n^2 w_1 = N \qquad \text{(Eq.14)}$$
We repeat the above process for all the classes of the population, which are mutually exclusive and exhaustive. Thus, for the $i = 1, 2, \dots, m$ classes consisting of $N_1, N_2, \dots, N_m$ subjects respectively (having proportions $w_1, w_2, \dots, w_m$), the following system of equations is formed:
$$n^2 w_1 = N,\quad n^2 w_2 = N,\quad \dots,\quad n^2 w_m = N \qquad \text{(Eq.15)}$$
By multiplying the above equations side by side, we obtain the relation
$$n^{2m}\, w_1 w_2 \cdots w_m = N^m \qquad \text{(Eq.16)}$$
From Eq.16 we then obtain directly the size $n$ of the sample:
$$n = \sqrt{\frac{N}{\sqrt[m]{w_1 w_2 \cdots w_m}}} \qquad \text{(Eq.17)}$$
The above formula (Eq.17) indicates that the sample size for the case of population representation depends not only upon the size of the population $N$, but also upon the way that $N$ is distributed among the population classes, as indicated by the $w_i$ (Eq.2). Since, as is explained below (see Application 3.2.3), Eq.17 justifies the number of representatives making up the Vouli (Parliament) of ancient Athens, it is called the Athenian norm of representation, and $n$ is denoted here by the symbol $n_{ath}$.
The synthesis of the sample $n_1, n_2, \dots, n_m$ (the number of subjects per class in the sample) is then given by the equation:
$$n_i = w_i\, n_{ath} = \frac{N_i}{N}\, n_{ath}, \quad i = 1, 2, \dots, m \qquad \text{(Eq.18)}$$

3.2.3 Application 1
The Athenian parliament (Vouli) was established by Solon in 594 B.C. and originally consisted of 400 men (one hundred men from each of the four tribes). Cleisthenes (508 B.C.) expanded the number of representatives to 500 (50 men from each of the 10 municipalities/demoi of Attica). Membership was restricted during that time to the top three of the original four property classes (the nobles/Pentacosiomedimnoi, the knights/Hippes and the farmers/Zeugitae, not the Thetes) and to male citizens over the age of thirty.
According to Sinclair, the number of citizens in the city of ancient Athens (males, females and children) is estimated at 120000 around 480 B.C. and at 160000-170000 around 431 B.C. (beginning of the Peloponnesian war). The number of male citizens who had completed the 30th year of age and were permitted to participate in the Vouli was about 30000 in 480 B.C. and 40000 in 431 B.C. [7].
The combined percentage of the top two classes, according to Glotz [8], was about 6.0% of the male citizens ($w_1 + w_2 = 0.06$, with $w_2$ about 3%). The majority of citizens were small farmers (Zeugitae), whose percentage $w_3$ can be derived by the equation


$$w_3 = 1 - w_1 - w_2.$$
If we apply the method of population representation to the above three civic classes, letting $N$ range from 30000 to 40000 and $w_1$ and $w_2$ from 0.0255 to 0.0325 (using some computer program), then an area of possible solutions is identified which, for $w_3 = 0.94$, approaches the number of 500 representatives (465 for $N = 30000$, 502 for $N = 35000$ and 537 for $N = 40000$).

3.3 Solution for the problem of statistical representation

3.3.1 Additional Notation
$X$: the variable of interest
$\mu$: the mean for the whole population
$\mu_1, \mu_2, \dots, \mu_m$: the mean for each population class
$\sigma$: the standard deviation for the whole population
$\sigma_1, \sigma_2, \dots, \sigma_m$: the standard deviation for each population class
$\bar{x}$: the mean for the whole sample
$\bar{x}_1, \bar{x}_2, \dots, \bar{x}_m$: the mean for each sample class
$s$: the standard deviation for the whole sample
$s_1, s_2, \dots, s_m$: the standard deviation for each sample class

3.3.2 Deriving the norm of statistical representation: $n_i$ proportional to $N_i \sigma_i$ (Neyman allocation)
The procedure followed for deriving the fundamental formula (Eq.17) can be repeated appropriately to derive the formula for the case of statistical representation. This representation incorporates, besides the population representation, the statistical factor, which expresses the measurable characteristics (behaviour) of subjects in relation to some variable of interest $X$. We select here the standard deviation (SD) as the statistical parameter representing these characteristics (behavioural data).
To proceed with the solution, we consider $N$ and $\sigma$ as elements of two distinct independent subspaces, upon which we can work separately, demanding at the same time that their product be represented by the sample $n$.
For this purpose we rewrite Eq.10 ($P_{prob1} = N_1/N$) and Eq.11 ($p_{11} = w_1 n/N$), taking into account that:
1) The standard deviation $\sigma$ of a variable $X$ (like its variance $\sigma^2$) has two sources: i) the standard deviations $\sigma_1, \sigma_2, \dots, \sigma_m$ of subjects within each class, and ii) the standard deviations between classes.
2) The proportion by which an SD unit from class $i$ ($i = 1, 2, \dots, m$) contributes to the sum of SDs within classes is
$$\lambda_i = \frac{\sigma_i}{\sigma_1 + \sigma_2 + \dots + \sigma_m} \qquad \text{(Eq.19)}$$
where
$$\lambda_1 + \lambda_2 + \dots + \lambda_m = 1 \qquad \text{(Eq.20)}$$
Through this definition of the $\lambda_i$ we have the same unit measuring the SDs within classes and, simultaneously, the probability (proportion) by which the behavioural component of each class contributes to the overall behaviour (sum of SDs).
3) The sum $\sigma_1 + \sigma_2 + \dots + \sigma_m$ participates with percentage $(\sigma_1 + \sigma_2 + \dots + \sigma_m)/\sigma$ in the SD of the total population. One unit of the within-classes SD participates, therefore, with percentage $1/\sigma$.
4) The probability that the SD proportion $\lambda_1$ (of class 1) is present in the SD of the total population equals, then, the product $\lambda_1 \cdot (1/\sigma)$.
5) We consider now an element of class 1 from the subspace $N_1 \lambda_1$. The probability that this element is present in the total space $N\sigma$ equals the product of the probabilities of its constituent subspaces, that is: $(N_1/N) \cdot (\lambda_1/\sigma)$. In other words
$$P_{prob1} = \frac{N_1 \lambda_1}{N \sigma} \qquad \text{(Eq.21)}$$
6) Then, the probability $p_{11}$ that an element of class 1 of the space $N\sigma$ is represented in the sample $n$ equals $p_{11} = P_{prob1} \cdot (n/N)$. Having $n_1$ elements, the respective probability becomes $p_{1n} = n_1 P_{prob1}\, n/N$.
7) But, according to the principle of representation, the above probability must correspond to the first sample class and be equal to the respective proportion $\omega_1 = n_1/n$. That is,
$$\frac{n_1}{n} = n_1\, \frac{N_1 \lambda_1}{N \sigma}\, \frac{n}{N} \qquad \text{(Eq.22)}$$
and, therefore,
$$n^2 = \frac{N^2 \sigma}{N_1 \lambda_1} = \frac{N^2 \sigma}{(w_1 N)\, \lambda_1} = \frac{N \sigma}{w_1 \lambda_1} \qquad \text{(Eq.23)}$$
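The chain from Eq.19 to Eq.23 can be checked numerically for a single class. The following minimal Python sketch uses hypothetical class sizes, within-class SDs and an assumed overall SD, none of which come from the article. It computes the SD proportions of Eq.19, the class-1 probability of Eq.21 and the value of $n^2$ given by Eq.23, and then verifies that this $n$ satisfies Eq.22 after dividing both sides by $n_1$. All variable names are illustrative assumptions.

```python
import math

# Hypothetical inputs (not from the article): class sizes, within-class SDs,
# and an assumed SD for the whole population.
class_sizes = [200, 300, 500]   # N_1, N_2, N_3
class_sds = [2.0, 3.0, 5.0]     # SDs within each class
sigma = 4.0                     # assumed SD of the whole population

N = sum(class_sizes)
w = [size / N for size in class_sizes]            # Eq.2
lam = [sd / sum(class_sds) for sd in class_sds]   # SD proportions, Eq.19

# Eq.21: probability attached to an element of class 1
p_prob1 = class_sizes[0] * lam[0] / (N * sigma)

# Eq.23: the value of n^2 implied by class 1
n_squared = N * sigma / (w[0] * lam[0])
n = math.sqrt(n_squared)

# Eq.22 with n_1 cancelled on both sides: 1/n = P_prob1 * n / N
lhs = 1.0 / n
rhs = p_prob1 * n / N

print("sum of SD proportions:", sum(lam))
print("n^2 from Eq.23:", n_squared)
print("Eq.22 sides:", lhs, rhs)
```

Because $n_1$ cancels in Eq.22, the check does not depend on how many subjects of class 1 are drawn.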


By repeating the above process for all classes $i = 1, \dots, m$, a system of $m$ equations is formed:
$$n^2 = \frac{N \sigma}{w_i \lambda_i}, \quad i = 1, \dots, m \qquad \text{(Eq.24)}$$
The coexistence of the above equations provides the required sample size. For this case of statistical representation we denote it by the symbol $n_{bath}$:
$$n_{bath} = \sqrt{\frac{N}{\sqrt[m]{w_1 w_2 \cdots w_m}}}\; \sqrt{\frac{\sigma}{\sqrt[m]{\lambda_1 \lambda_2 \cdots \lambda_m}}} = n_{ath}\, f \qquad \text{(Eq.25)}$$
Eq.25 implies that the sample size for the case of statistical representation is the product of the population representation, expressed by $n_{ath}$, and of the statistical-behavioural factor $f$ defined by the equation
$$f = \sqrt{\frac{\sigma}{\sqrt[m]{\lambda_1 \lambda_2 \cdots \lambda_m}}} \qquad \text{(Eq.26)}$$
The statistical factor $f$, since $\sigma$ is usually unknown, can be replaced by the respective sample factor (which can be calculated from some previous measurement)
$$f_S = \sqrt{\frac{s}{\sqrt[m]{\lambda_1 \lambda_2 \cdots \lambda_m}}} \qquad \text{(Eq.27)}$$
The synthesis of the representative space (sample) $n_1, n_2, \dots, n_m$, in order to be similar here to the synthesis of the space $N\sigma$ (principle of representation), must follow the equations:
$$n_i = \frac{N_i \sigma_i}{N_1 \sigma_1 + \dots + N_m \sigma_m}\, n_{bath}, \quad i = 1, 2, \dots, m \qquad \text{(Eq.28)}$$
It is well known that the sample allocation expressed by Eq.28 is the one which ensures minimum variance [6],[9].
The calculation of sample size for the case of statistical representation requires, therefore, in addition to the $N_i$, knowledge of the SDs. Since the SDs of the population classes $\sigma_1, \sigma_2, \dots, \sigma_m$ are usually unknown, they can be replaced by their respective sample estimates $s_1, s_2, \dots, s_m$. If the SDs for the sample classes are not given, then a slightly smaller sample size is calculated when these are taken as equal. For this case, the behavioural factor becomes $f_S = \sqrt{m s}$ and the per-class sample size becomes $n_i = w_i n_{bath}$. What is needed here is only knowledge of the SD for the whole population or the sample.
The allocation $n_i = w_i n_{bath}$ reflects the case of simple stratified sampling, which ensures a variance smaller than that of simple random sampling but bigger than that of optimal allocation (Eq.28). It is obvious that when the statistical factor becomes unity, Eq.25 degenerates into Eq.17 and $n_{bath} = n_{ath}$.

3.3.3 Application 2: Comparison to Deming's example
Deming [6], in his classical book Some theory of sampling (pp.233-4), gives an application example of the method of Neyman's sampling ($n_i$ proportional to $N_i \sigma_i$).

Table 1. Description of the Universe

| Stratum limits in terms of total assets (1000$) | Number of corporations $N_i$ | Estimated average net income (x 1000$) | Standard deviation of net income (x 1000$) $\sigma_i$ |
|---|---|---|---|
| Unknown | 5600 | 1 | 5 |
| Under 50 | 28700 | 1 | 5 |
| 50-99 | 11100 | 5 | 8 |
| 100-249 | 13000 | 15 | 20 |
| 250-499 | 7500 | 50 | 65 |
| 500-999 | 5100 | 100 | 130 |
| 1000-4999 | 5800 | 300 | 390 |

Deming's example is rephrased here as follows: A program is planned with the purpose of collecting financial data (such as sales, market cost of goods, income) from American manufacturing corporations. To check the reliability (accuracy) of the data being collected, the project administration decided to set the net income of each corporation as the controlling criterion of data reliability. A sample was therefore designed with the purpose of estimating the precision of net income and, in consequence, the accuracy of the remaining financial parameters.
Deming's calculations of sample size (7600) and synthesis are given in Table 2.
To compare the method of statistical


representation, proposed in this article, we calculated the sample size through the following steps:
(1) Calculation of the proportions $w_i = N_i / N$, for $i = 1, 2, \dots, m$ and $m = 7$.
(2) Calculation of the $\lambda_i$, by Eq.19.
(3) Calculation of $n_{ath} = 805.3336$, by Eq.17.
(4) Calculation of the $n_i$, by Eq.18 ($n_i = w_i n_{ath}$).
(5) Calculation of the variance (on the basis of the above $n_i$) through the general equation of variance (not that of minimum variance)
$$\sigma^2 = \sum_{i=1}^{m} w_i^2\, \frac{\sigma_i^2}{n_i}\, \frac{N_i - n_i}{N_i - 1} \qquad \text{(Eq.29)}$$
This provided SD = 4.0138.
(6) Calculation of the statistical-behavioural factor, by Eq.26 (Eq.27): $f_S = 9.4660$.
(7) Calculation of the product $n_{bath} = n_{ath} f_S$, giving a total sample size $n_{bath} \approx 7623$, and
(8) Allocation of $n_{bath}$ according to Eq.28.
The results for both methods and the percent of divergence between the calculations are given in Table 2.

Table 2. Comparison of sample size calculations

| Stratum limits in terms of total assets (1000$) | Deming's calculation $n_i$ | Calculation through the proposed formula $n_{bath,i}$ | Percent of divergence between calculations |
|---|---|---|---|
| Unknown | 54 | 54 | 0 |
| Under 50 | 277 | 278 | 0.36% |
| 50-99 | 172 | 172 | 0 |
| 100-249 | 502 | 504 | 0.39% |
| 250-499 | 942 | 945 | 0.32% |
| 500-999 | 1281 | 1285 | 0.31% |
| 1000-4999 | 4372 | 4384 | 0.27% |
| All classes | 7600 | 7622 | 0.29% |

4 Conclusion
1. The principle of representation may reshape many types of sampling problems. In this article, by distinguishing the problem of population representation from that of statistical representation, we found results similar to those in the statistical literature.
2. The problem of population representation (required minimum size) has a unique solution, which is provided by the Athenian norm of representation ($n_{ath}$). The second problem, that of statistical representation, includes the problem of population representation, and its solution automatically incorporates the condition of minimum variance permitted by the available information. If the population representation is known, the statistical representation can be achieved with good accuracy by knowing only the SD of the population or of the sample as a whole.
3. The proposed formula for population representation is straightforward, does not require the statistical factor, is easily applicable and introduces a new point of view for the political and social sciences. This formula is considered fundamental, since $n_{ath}$ is the decisive multiplication factor in the formula of statistical representation $n_{bath} = n_{ath} f$. The allocation of subjects implied by the last formula comes directly from the application of the representation principle (not from an optimization procedure), a fact which, together with the automatic satisfaction of the condition of minimum variance, also generates promising ideas for the field.

References:
[1] Cohen, Jacob, Statistical power analysis for the behavioural sciences, New York: Academic Press, 1969, [231, 243, 248, 252, 314].
[2] Kirk, Roger E., Introductory Statistics, Wadsworth Publishing, 1978.
[3] Julious, Steven A., Tutorial in Biostatistics: Sample sizes for clinical trials with Normal data, Statist. Med., 2004; 23:1921-1986.
[4] Conover, W. J., Practical Nonparametric Statistics, 2nd ed., John Wiley & Sons, 1980.
[5] Neyman, Jerzy, On the two different aspects of representative method: The method of stratified sampling and the method of purposive selection, Journal of Royal Statistical Society, Vol.97, No.4 (1934), pp.558-625.
[6] Deming, W. E., Some theory of sampling, New York: Dover Publications, 1966, p.226-230 (originally published by John Wiley in 1950).
[7] Sinclair, R. K., Democracy and participation in Athens, Cambridge University Press, 1988.
[8] Glotz, Gustave, Ancient Greece at Work (tr. by M. R. Dobie and E. M. Riley), New York: Barnes & Noble, 1968.
[9] Tryfos, Peter, Sampling methods for applied research, New York: John Wiley, 1996, p.98.
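As a supplementary sketch, steps (1) to (8) of Application 2 (Section 3.3.3) can be retraced in a few lines of Python from the class sizes and standard deviations of Table 1. The code assumes a particular reading of the formulas: Eq.17 as the square root of $N$ divided by the geometric mean of the class proportions, and Eq.27 as the square root of the SD from Eq.29 divided by the geometric mean of the SD proportions of Eq.19. The helper `geometric_mean` and all variable names are illustrative choices, not part of the article.

```python
import math

# Table 1 of the article (Deming's example): class sizes and within-class SDs.
N_i = [5600, 28700, 11100, 13000, 7500, 5100, 5800]
sigma_i = [5, 5, 8, 20, 65, 130, 390]


def geometric_mean(values):
    """m-th root of the product of m positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


N = sum(N_i)
w = [size / N for size in N_i]                   # step (1), Eq.2
lam = [sd / sum(sigma_i) for sd in sigma_i]      # step (2), Eq.19
n_ath = math.sqrt(N / geometric_mean(w))         # step (3), Eq.17 as read here
n_prop = [w_k * n_ath for w_k in w]              # step (4), Eq.18

# Step (5), Eq.29: variance under the proportional allocation of step (4).
variance = sum(
    w_k ** 2 * (sd ** 2 / n_k) * (size - n_k) / (size - 1)
    for w_k, sd, n_k, size in zip(w, sigma_i, n_prop, N_i)
)
sd_total = math.sqrt(variance)

f_s = math.sqrt(sd_total / geometric_mean(lam))  # step (6), Eq.27 as read here
n_bath = n_ath * f_s                             # step (7), Eq.25

# Step (8), Eq.28: allocation proportional to N_i * sigma_i.
products = [size * sd for size, sd in zip(N_i, sigma_i)]
allocation = [p / sum(products) * n_bath for p in products]

print(f"n_ath  = {n_ath:.2f}")
print(f"SD     = {sd_total:.4f}")
print(f"f_S    = {f_s:.4f}")
print(f"n_bath = {n_bath:.1f}")
print("allocation =", [round(a) for a in allocation])
```

Under this reading the printed values should come out close to those quoted in steps (3), (5), (6) and (7) and to the right-hand column of Table 2; small differences in the last digit can arise from where rounding is applied.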
