Item shape No. of items
4
6
5
3
3
Total 21
Organization criteria:
-shape
-color
-dimension
Basic terms
The term statistica comes from the Italian word stato (state). From it a new term, statista, later appeared, meaning a person who does business with the state.
The term statistica therefore first signified a collection of facts useful to a statista. With this meaning, statistica was used in Italy in the 16th century and later in France, Holland and Germany.
Today the term signifies more than facts regarding the state, and it is used in almost every domain.
Statistical population - the set of items under study, well delimited in space and time, and characterized by its size and structure;
Statistical unit - the fundamental item of a statistical population, which may be characterized by a set of specific characteristics. These characteristics are the subject of the research;
Sample - the set of statistical units extracted from a statistical population in order to be studied.
Grouping variable - a characteristic that allows us to organize the statistical units of a population into homogeneous classes, or to study how other variables change in time or space.
Statistical data - the values of the grouping variables, determined by measuring them on a scale. They are used to study the statistical units of a population.
1. Grouping variables
Types: the grouping variables can be classified by the following criteria:
a) by content:
a1. attributive variables - attributes (characteristics) of the statistical units of a population, used to organize them into homogeneous classes.
Example: gender, age, profession, productivity, seniority, salary …
Time-moment variables refer to time intervals shorter than or equal to a day.
Time-interval variables refer to time intervals longer than a day.
For this type of variable, the arithmetical operations must make sense!
continuous - these can take any numeric value within a specific interval.
Example: average grade, weight, salary, productivity in lei etc.
b2. qualitative variables - have values expressed in words. They are used to distinguish between categories.
Example: gender, profession, eye color, education level, nationality etc.
Scales used to measure the values of grouping variables
a) Nominal scale – the values measured on this scale only allow us to categorize the elements of a population.
With these values we cannot construct hierarchies among the elements of a population!
Example: Using the variable eye color we can divide the students of a year of study into four categories.
Based on these values we can only say that the students are distributed among the four categories; we cannot say that the students in a specific category come first in some hierarchy. The colors are items of a set for which sorting has no meaning.
b) Ordinal scale – the values measured on this scale allow us to construct hierarchies.
Example: We can record consumers' preference for a specific product using values such as: the best, good, normal, less normal, the worst. We can replace these values with numbers:
1 = the best,
2 = good,
3 = normal,
4 = less normal,
5 = the worst.
Using these values we cannot say that a product rated the best (1) is three times better than a product rated normal (3), or five times better than a product rated the worst (5), even when the ratings come from the same consumer.
The ordinal scale does not allow us to determine the distance between two values!
c) Interval scale – the values measured on this scale can be used to compute proportions between intervals, each interval being measured from the 0 value (the origin of the scale) to the position of a value.
The values themselves cannot be used directly in proportions, because the 0 value is set by convention and does not signify the absence of the studied phenomenon.
An easy example of this kind of scale is the way we measure time during a day. 00:00 does not mean the absence of time. We cannot say that 08:00 a.m. is twice as large as 04:00 a.m., but we can say that the interval between 00:00 and 08:00 is twice as long as the interval between 00:00 and 04:00.
Another example is the scale used to measure temperature in degrees Celsius or Fahrenheit. 0°C does not signify the absence of heat. Likewise, we cannot say that 60°C is twice as hot as 30°C, but we can say that raising the temperature of an object from 0°C to 60°C requires twice as much heat as raising it from 0°C to 30°C.
d) Ratio (proportional) scale – the most complete type of scale. The values measured on this scale can be used in all types of arithmetical operations. On this scale, the 0 value is an absolute zero and means the absence of the studied phenomenon.
Example: 0 lei means the absence of money; 100 lei means twice as much money as 50 lei.
2. Statistic Series
Definition: Statistical series – a parallel between two or more datasets, at least one of which refers to the grouping variable.
Types of statistical series:
a) By the number of grouping variables included, statistical series can be categorized as:
simple series - constructed as a parallel between two datasets and including only one grouping variable;
complex series - constructed as a parallel between more than two datasets and including at least one grouping variable.
b) By the type of grouping variable, statistical series can be categorized as:
o distribution series;
o time series;
o space series.
2. Distribution series
Conditions:
1. Distribution series can be constructed only by using attributive grouping variables.
Types of distribution series:
1. By the number of grouping variables included:
- with one grouping variable - simple series (one-dimensional)
- with two or many grouping variables - complex series (two-dimensional,
three-dimensional etc.)
2. By the way of grouping the grouping variable values:
- with values grouped by intervals - distribution series by intervals
- with values grouped by variants - distribution series by variants
Grouping variable (X)  Absolute frequency (fi)  Relative frequency (pi)  Increasing cumulative (icfi)  Decreasing cumulative (dcfi)
IL1(=xmin)-SL1         f1                       p1=f1/N                  icf1=f1                       dcf1=N
SL1-SL2                f2                       p2=f2/N                  icf2=icf1+f2                  dcf2=dcf1-f1
SL2-SL3                f3                       p3=f3/N                  icf3=icf2+f3                  dcf3=dcf2-f2
SL3-SL4(=xmax)         f4                       p4=f4/N                  icf4=icf3+f4=N                dcf4=dcf3-f3=f4
Total                  N=f1+f2+f3+f4            1                        *                             *
Note: fi is the absolute frequency (number of cases); icfi and dcfi are the increasing and decreasing cumulative absolute frequencies. The inferior (or superior) limit is included in the interval.
Example: Simple distribution series by variants
The distribution of students from year I, Accounting, by grades from the Basic Statistics exam in 2007
Grade  No. of students
2      4
3      6
4      10
5      14
6      18
7      19
8      17
9      14
10     8
Total  110
Example: Simple distribution series by intervals
The distribution of workers from the SC CRS CONSTRUCT SRL company by monthly gross salary
Salary (lei)  No. of workers
600-800       1
800-1000      7
1000-1200     12
1200-1400     11
1400-1600     5
1600-1800     4
Total         40
Note. Inferior limit included in interval.
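The frequency columns defined for a distribution series (pi, icfi, dcfi) can be computed mechanically from the absolute frequencies. The sketch below is only an illustration: it uses the workers-by-salary example, and the variable names (`f`, `p`, `icf`, `dcf`) are chosen here to mirror the table headers, not taken from the course.

```python
# Workers by monthly salary (lei), from the distribution-by-intervals example.
intervals = ["600-800", "800-1000", "1000-1200",
             "1200-1400", "1400-1600", "1600-1800"]
f = [1, 7, 12, 11, 5, 4]           # absolute frequencies f_i

N = sum(f)                         # total number of cases
p = [fi / N for fi in f]           # relative frequencies p_i = f_i / N

# Increasing cumulative frequencies: icf_1 = f_1, icf_i = icf_(i-1) + f_i
icf = []
running = 0
for fi in f:
    running += fi
    icf.append(running)

# Decreasing cumulative frequencies: dcf_1 = N, dcf_i = dcf_(i-1) - f_(i-1)
dcf = []
remaining = N
for fi in f:
    dcf.append(remaining)
    remaining -= fi

for row in zip(intervals, f, p, icf, dcf):
    print("{:<10} {:>3} {:>6.3f} {:>4} {:>4}".format(*row))
```

The last increasing cumulative frequency equals N and the last decreasing one equals the last absolute frequency, which is a quick check that the table was filled in correctly.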
2. Two-dimensional distribution series
$$\sum_{i=1}^{n} F_{x_i} = \sum_{j=1}^{n} F_{y_j} = \sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij} = N$$
How to construct two-dimensional distribution series
Raw data (table columns: Crt. No., Hourly productivity, Salary)
If there is a dependency relationship between the two grouping variables, the independent variable is placed in the first column of the table and the dependent variable in the first row.
We construct and fill the table for the two-dimensional distribution:
2. Time series
Conditions:
1. Can be constructed only on the basis of a time grouping variable.
2. The values of the grouping variable must be ordered chronologically.
3. Must contain enough values to capture the tendency of the studied phenomenon.
4. The values of the studied variables must refer to the same space.
Time-interval statistic series
Production of primary energy (thousand tonnes oil equivalent)
Year  Coal  Oil   Gas   Hydro- and nuclear-electrical energy
2005  5793  5326  9536  3101
2006  6477  4897  9395  2961
2007  6858  4651  9075  3264
2008  7011  4619  8982  4233
2009  6477  4390  8964  4242
2010  5903  4186  8705  4618
Source: The Romania's Yearbook 2011
Time-moments statistic series
a) Exchange rates (lei)
Date        euro    dollar USA
17.02.2012  4,3533  3,3100
20.02.2012  4,3535  3,2903
21.02.2012  4,3550  3,2903
22.02.2012  4,3602  3,2954
23.02.2012  4,3557  3,2714
24.02.2012  4,3535  3,2524
27.02.2012  4,3525  3,2468
Source: RNB
b) Stock of diesel (tonnes), Gas station 3
Date        Stock
17.02.2012  1500
20.02.2012  *
21.02.2012  1250
22.02.2012  *
23.02.2012  1300
24.02.2012  1100
27.02.2012  1450
2. Space series
Conditions
1. Can be constructed only based on space grouping variables.
2. Must contain a sufficient number of values for capturing the modifications of the
studied variables in space.
3. The values of the studied variables must refer to the same period of time.
Grouping variables and the statistical series they produce:
- attributive variables
  - quantitative, continuous: distribution series by intervals
  - quantitative, discrete: distribution series by variants or by intervals
  - qualitative: distribution series by variants
- time variables
  - time-moments: time-moments series
  - time-intervals: time-intervals series
[Figure: distribution series example. Active social and economic operators from the national economy by size class, 2010.]
1. The constructive elements of a statistical chart
2. Charts for distribution series
3. Charts for time series
4. Charts for space series
5. Statistical charts for comparisons
6. Statistical charts for structures
7. Other statistical charts
1. Constructive elements of a statistical chart
a. Chart title – summarizes the chart's content in a short, clear text
b. The chart scale – an essential element of a statistical chart. The scale ensures the proportionality of the indicators represented in the chart.
Types of scales
1. By shape:
- linear scale
- nonlinear scale
- logarithmic scale
c. The gridlines
d. The chart figure
e. The legend
f. The explicative note
2. Charts for distribution series
A. The histogram
B. The frequency polygon
C. The curve of cumulative frequency (Ogive)
The histogram by rectangles
Weight (kg)  Number of packages  Increasing cumulative frequencies
40 – 45      7                   7
45 – 50      26                  33
50 – 55      27                  60
55 – 60      37                  97
60 – 65      43                  140
65 – 70      34                  174
70 – 75      27                  201
75 – 80      11                  212
Total        212                 *
[Figure: histogram by rectangles. Title: The evolution of number of packages transported by SC Pegasus SRL in January 2009. X axis: Weight (kg); Y axis: Number of packages.]
The histogram by sticks
[Figure: histogram by sticks for the same distribution of packages by weight (SC Pegasus SRL, January 2009). X axis: Weight (kg); Y axis: Number of packages.]
The frequency polygon
[Figure: frequency polygon for the same distribution of packages by weight (SC Pegasus SRL, January 2009). X axis: Weight (kg); Y axis: Number of packages.]
The curve of cumulative frequency (Ogive)
[Figure: ogive for the same distribution of packages by weight (SC Pegasus SRL, January 2009). X axis: Weight (kg); Y axis: increasing cumulative frequencies, from 0 to 250.]
2. Charts for two-dimensional distribution series
         Age groups
Gender   under 15  15-30  30-45  45-60  over 60
Male     120       200    255    180    100
Female   115       205    280    250    200
[Figure: two charts of this distribution, with the age groups and the two genders (Male, Female) on the horizontal axes and frequencies from 0 to 300 on the vertical axis.]
3. Charts for time series
A. Chronogram (Line charts)
[Figure: chronogram for the years 2004 to 2011. Rows of the accompanying table that survive (year, value): 2006, 748; 2007, 805; 2008, 983; 2010, 901; 2011, 705.]
[Figure: chronogram of the same series for the years 2004 to 2011, drawn with a value axis that does not start at zero (lowest tick shown: 550).]
C. Column charts
[Figure: column chart of the same series for the years 2004 to 2011.]
D. Radial polar diagram
Month  Ice cream sales (mil. lei)
Jan    5
Feb    6
Mar    9
Apr    10
May    16
Jun    30
Jul    35
Aug    18
Sep    10
Oct    8
Nov    7
Dec    6
[Figure: radial polar diagram with one axis per month, Jan to Dec. The radius r of the reference circle is derived from xmax and xmin.]
Sectorial polar diagram
[Figure: sectorial polar diagram of the same monthly ice cream sales (mil. lei), Jan to Dec.]
4. Charts for space series
A. Column diagram
[Figure: column diagram of population by county (Dolj, Gorj, Mehedinţi, Olt, Vâlcea); value axis from 0 to 800000.]
County     Population
Dolj       762142
Gorj       401021
Mehedinţi  332673
Olt        523291
Vâlcea     438388
[Figure: horizontal bar diagram of population by county; value axis from 0 to 800000.]
C. Cartogram
Region             Value (%)
Crişana Maramureş  7
Moldova            7
Transilvania       8
Banat              11
Muntenia           10
Dobrogea           15
Oltenia            13
Bucureşti          9
Average = 9%
[Figure: cartogram of the regions above. Legend: below average, above average.]
Source: Studiul statistic al pieţei muncii din regiunea Oltenia. C. Radu, N. Vasilescu, C. Ionaşcu, Editura Sitech, Craiova, 2005, p. 121.
D. Cartodiagram
Region      Value (%)
Nord-West   92,3
Nord-East   92,4
Center      91,6
West        92,8
South-East  89,6
South-West  93,1
South       90,1
Bucureşti   91,2
National average = 91,6%
[Figure: cartodiagram of the regions above. Legend: below average, above average.]
Source: Studiul statistic al pieţei muncii din regiunea Oltenia. C. Radu, N. Vasilescu, C. Ionaşcu, Editura Sitech, Craiova, 2005, p. 121.
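5. Comparison diagrams by two-dimensional shapes
A. The rectangle. Case 2
Area of the rectangle: A = L × w (length × width), which corresponds to P = N × w (production = number of workers × productivity).
Company  Production (P), mil. lei  No. of workers (N)  Productivity (w), thousands lei
A        350                       200                 1750
B        250                       156                 1603
[Figure: two rectangles. A: N = 200, w = 1750, P = 350 mil. lei. B: N = 156, w = 1603, P = 250 mil. lei.]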
[Figures: further two-dimensional shape comparisons of companies A and B, based on the same table of production, number of workers and productivity.]
5. Comparison diagrams by three-dimensional shapes
D. The parallelepiped
Volume = Length × width × height
Company  Production (P), thousands lei  No. of workers (N)  Productivity (w), thousands lei  Unit cost (c), thousands lei/unit
A        8750                           200                 1750                             25
B        7200                           150                 1600                             30
[Figure: two parallelepipeds. A: N = 200 workers, w = 1750 units. B: N = 150 workers, w = 1600 units.]
E. The cylinder
[Figure: two cylinders comparing company A (P = 350 mil. lei, N = 200, w = 1750 thousands lei) with company B (P = 250 mil. lei, N = 156, w = 1603 thousands lei).]
F. Sphere
[Figure: two spheres comparing company A (P = 350 mil. lei, N = 200, w = 1750 thousands lei) with company B (P = 250 mil. lei, N = 156, w = 1603 thousands lei).]
6. Diagrams for structures
[Figures: structure diagrams of one distribution. Legend categories: food, appliances, clothing, other products. Shares shown: 54,54%, 27,27%, 13,64%, 4,55%.]
7. Other statistical charts
Balance diagram
Example: For a storage of goods we know the following data:
N1=3000 pcs.; I = 500 pcs.; E = 1500 pcs. ; N2 = 2000 pcs.
[Figure: balance diagram with the bars N1, I, E and N2; value axis from 500 to 3500.]
Ages pyramid
[Figure: ages pyramid. Source: The Romania's Yearbook 2009.]
Charts by type of series (for data from which a series can be constructed):
Distribution series:
• histogram
• frequency polygon
• cumulative frequency curve
Time series without seasonality:
• chronogram
• chronogram with gap
• column diagram
2. Rules for positioning and color the elements of a chart
1. The mode
2. The quantiles
3. The mean
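The mode
Definition: the value of the studied variable that has the maximum absolute frequency.
• Because it needs the values of the absolute frequencies, it can be calculated for distribution series.
Calculus
- Mathematical calculus:
$$Mo = L_i + k \cdot \frac{\Delta_1}{\Delta_1 + \Delta_2}$$
where $L_i$ is the inferior limit of the modal interval, $k$ is the size of the interval, $\Delta_1$ is the difference between the frequency of the modal interval and that of the preceding interval, and $\Delta_2$ is the difference between the frequency of the modal interval and that of the following interval.
- Graphic calculus:
[Figure: histogram with fi on the vertical axis and xi from 100 to 170 on the horizontal axis; Mo is read on the xi axis.]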
The quantiles
Definition: position indicators that allow us to split a dataset into a specific number of equal-size parts.
Types
• Quartiles - split a dataset into 4 equal parts. There are 3 quartiles.
• Deciles - split a dataset into 10 equal parts. There are 9 deciles.
• Percentiles - split a dataset into 100 equal parts. There are 99 percentiles.
Calculus
$$x_{c_i/n_p} = L_i + k \cdot \frac{\frac{c_i}{n_p}\sum f_j - Sf_p}{f_{c_i/n_p}}$$
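The median
Definition: the value that splits an ascending or descending ordered dataset into two equal parts.
Calculus
• for simple series (previously ordered ascending or descending):
- with an odd number of values: the place of Me is $\frac{n+1}{2}$
- with an even number of values: $Me = \frac{x_{n/2} + x_{n/2+1}}{2}$
• for simple distribution series:
- Mathematical calculus:
$$Me = L_i + k \cdot \frac{\frac{\sum f_i}{2} - Sf}{f_{Me}}$$
where $L_i$ is the inferior limit of the median interval, $k$ is the size of the interval, $Sf$ is the sum of the frequencies of the intervals that precede the median interval, and $f_{Me}$ is the frequency of the median interval.
- Graphic calculus:
[Figure: curve of increasing cumulative frequencies, from which Me is read.]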
Because phenomena are diverse and complex, in practice we use many types of means:
- arithmetic mean
- harmonic mean
- geometric mean
- quadratic mean
- chronologic mean
The arithmetic mean – subtypes, calculus
Subtypes:
- unweighted - used when each value of the studied variable occurs only once.
- weighted - used when the values of the studied variable repeat, that is, when we know the absolute frequency of each value.
Calculus:
- unweighted: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
- weighted: $\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$
Applicability
The arithmetic mean is used when the studied phenomenon changes by almost constant amounts (in arithmetic progression) and therefore shows a linear trend. It is the type of mean most often used in practice.
The arithmetic mean – properties
I. Properties for verifying the accuracy of the calculation
1. The arithmetic mean always lies between the minimum and the maximum value of the studied variable.
$$x_{min} \le \bar{x} \le x_{max}$$
2. The sum of the deviations of the variable's values from their mean is always zero.
- for the unweighted mean: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
- for the weighted mean: $\sum_{i=1}^{n} (x_i - \bar{x}) f_i = 0$
II. Properties that simplify the calculation
1. If we start from a variable X (x1, x2, … xn) with mean $\bar{x}$ and construct another variable X', with values obtained by subtracting the same constant a from the X values, meaning X'(x1-a, x2-a, … xn-a), the mean of X' is equal to $\bar{x} - a$:
$$\bar{x}' = \frac{\sum x'_i}{n} = \frac{\sum (x_i - a)}{n} = \frac{\sum x_i - na}{n} = \frac{\sum x_i}{n} - \frac{na}{n} = \bar{x} - a$$
2. If we start from a variable X (x1, x2, … xn) with mean $\bar{x}$ and construct a new variable X'', with values obtained by dividing the X values by a constant k, meaning X''(x1/k, x2/k, … xn/k), the mean of X'' is equal to $\bar{x}/k$:
$$\bar{x}'' = \frac{\sum x''_i}{n} = \frac{\sum \frac{x_i}{k}}{n} = \frac{1}{k} \cdot \frac{\sum x_i}{n} = \frac{\bar{x}}{k}$$
3. By combining the two properties above, which are also valid for the weighted arithmetic mean, we obtain a formula for simplified calculus:
$$\bar{x} = \frac{\sum_{i=1}^{n} \frac{x_i - a}{k} f_i}{\sum_{i=1}^{n} f_i} \cdot k + a$$
The mean of a binomial (dichotomous) variable
A binomial (dichotomous) variable is a variable that has only two alternative values.
Example: a) quality of a product: good or scrap; b) the status of a student after an exam: passed or not passed; c) gender: male, female.
Value    Absolute frequency  Relative frequency
x1 = 1   f1                  p1 = p
x2 = 0   f2                  p2 = q
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{x_1 f_1 + x_2 f_2}{f_1 + f_2} = x_1 \frac{f_1}{f_1 + f_2} + x_2 \frac{f_2}{f_1 + f_2} = 1 \cdot p + 0 \cdot q = p$$
Harmonic mean
Applicability
Formula
- unweighted: $\bar{x}_h = \frac{n}{\sum \frac{1}{x_i}}$
- weighted: $\bar{x}_h = \frac{\sum f_i}{\sum \frac{1}{x_i} f_i}$
Harmonic mean - properties
1. When the arithmetic mean and the harmonic mean are calculated for the same dataset, they always satisfy the relation:
$$\bar{x}_h \le \bar{x}$$
2. If two variables X and Y are related by $X = \frac{1}{Y}$, the same relation holds between their means, $\bar{X} = \frac{1}{\bar{Y}}$, where one of the means is arithmetic and the other is harmonic.
The quadratic mean
Applicability
1. The quadratic mean is used when the studied phenomenon changes approximately in exponential progression (for example, when growth is slower at the beginning of the series and becomes more pronounced towards the end). It is used in the analysis of exponential trends.
2. It is used as a mathematical model for one of the synthetic indicators of variance: the standard deviation σ.
Formula
- unweighted: $\bar{x}_q = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n}}$
- weighted: $\bar{x}_q = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 f_i}{\sum_{i=1}^{n} f_i}}$
Properties
1. When the arithmetic mean and the quadratic mean are calculated for the same dataset, they always satisfy the relation:
$$\bar{x} \le \bar{x}_q$$
The geometric mean
Applicability
1. The geometric mean is used when the studied phenomenon changes approximately in geometric progression.
2. It is frequently used when the differences between the values of the studied variable are larger at the beginning of the series and become smaller towards its end.
3. It is used as a mathematical model for calculating one of the synthetic indicators of chronological series (the average index of dynamics).
Formula:
- unweighted: $\bar{x}_g = \sqrt[n]{\prod_{i=1}^{n} x_i}$
- weighted: $\bar{x}_g = \sqrt[\sum f_i]{\prod_{i=1}^{n} x_i^{f_i}}$
Properties
1. When the arithmetic mean and the geometric mean are calculated for the same dataset, they always satisfy the relation:
$$\bar{x}_g \le \bar{x}$$
2. When the arithmetic, geometric, quadratic and harmonic means are calculated for the same dataset, they always satisfy the relation:
$$\bar{x}_h \le \bar{x}_g \le \bar{x} \le \bar{x}_q$$
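The ordering of the four means can be checked numerically on any dataset of positive values. The sketch below is only an illustration: it uses the values 8, 9, 10, 11, 12 from the transport example in this course, and the function names are chosen here for readability.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def harmonic_mean(xs):
    # n divided by the sum of reciprocals; assumes all values are positive
    return len(xs) / sum(1 / x for x in xs)

def geometric_mean(xs):
    # n-th root of the product, computed through logarithms
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def quadratic_mean(xs):
    # square root of the mean of squares
    return math.sqrt(sum(x * x for x in xs) / len(xs))

values = [8, 9, 10, 11, 12]
h = harmonic_mean(values)
g = geometric_mean(values)
a = arithmetic_mean(values)
q = quadratic_mean(values)
print(h, g, a, q)
```

For these values the four results come out in increasing order, with the arithmetic mean equal to 10.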
The chronologic mean
Applicability:
The chronologic mean is used exclusively for time-moments series.
1. unweighted chronologic mean – for time-moments series whose moments are regularly spaced in time (the periods between any two consecutive time-moments are equal).
2. weighted chronologic mean – for time-moments series whose moments are irregularly spaced in time (at least one of the periods between two consecutive time-moments differs from the rest).
Formula
- unweighted: $\bar{x}_c = \frac{\sum_{i=1}^{k} \bar{x}_i}{n-1}$
- weighted: $\bar{x}_c = \frac{\sum_{i=1}^{k} \bar{x}_i t_i}{\sum_{i=1}^{k} t_i}$
where:
$\bar{x}_i = \frac{x_i + x_{i+1}}{2}$ - the moving average of two consecutive terms, i = 1, …, k
n - the number of terms of the statistic series
k = n - 1 - the number of time periods
$t_i$ - the period of time, usually expressed in days, between time moments i and i+1.
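As an illustration of the weighted chronologic mean, the sketch below applies the formula to the diesel stock series from the time-moments example, keeping only the dates that have a recorded value. Skipping the dates marked with * is an assumption made here, and `chronological_mean` is a hypothetical helper name.

```python
from datetime import date

def chronological_mean(moments, values):
    """Weighted chronologic mean of a time-moments series.

    moments: dates in chronological order
    values:  level of the variable at each moment
    """
    numerator = 0.0
    total_days = 0
    for i in range(len(values) - 1):
        pair_average = (values[i] + values[i + 1]) / 2   # x-bar_i
        t_i = (moments[i + 1] - moments[i]).days         # days between moments
        numerator += pair_average * t_i
        total_days += t_i
    return numerator / total_days

# Diesel stock, recorded dates only (irregularly spaced moments).
moments = [date(2012, 2, 17), date(2012, 2, 21), date(2012, 2, 23),
           date(2012, 2, 24), date(2012, 2, 27)]
stock = [1500, 1250, 1300, 1100, 1450]

mean_stock = chronological_mean(moments, stock)
print(mean_stock)

# With equally spaced moments the same function gives the unweighted form.
regular = chronological_mean(
    [date(2012, 1, 1), date(2012, 1, 2), date(2012, 1, 3)], [10, 20, 30])
```

When all periods are equal, the weights cancel and the result is the simple average of the pairwise averages, which is the unweighted chronologic mean.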
Types of mean by type of series
Dataset from which a series can be constructed:
- distribution series: weighted arithmetic mean
- time series:
  - time-intervals: unweighted arithmetic mean, unweighted quadratic mean, unweighted geometric mean
  - time-moments, equal periods (=): unweighted chronological mean
  - time-moments, unequal periods (≠): weighted chronological mean
- space series
Dataset from which a series cannot be constructed (special cases):
Example:
• we know the xi values
• we don't know the fi absolute frequencies
• we know the values xi·fi, which are:
  − equal → unweighted harmonic mean
  − different → harmonic mean as a transformed form of the weighted arithmetic mean
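The calculus of the mode and median for distribution series
Productivity (pieces)  Number of employees
100 - 110              5
110 - 120              10
120 - 130              15
130 - 140              25
140 - 150              20
150 - 160              15
160 - 170              10
Total                  100
The mode:
$$Mo = L_i + k \cdot \frac{\Delta_1}{\Delta_1 + \Delta_2} = 130 + 10 \cdot \frac{25 - 15}{(25 - 15) + (25 - 20)} = 136.67$$
The median:
$$Me = L_i + k \cdot \frac{\frac{\sum f_i}{2} - Sf}{f_{Me}} = 130 + 10 \cdot \frac{\frac{100}{2} - 30}{25} = 138$$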
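The two calculations above can be sketched in code as follows. This is a minimal illustration that assumes equal interval sizes and a single modal interval; treating a missing neighbouring interval as frequency 0 is also an assumption made here, and the function names are hypothetical.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a distribution series by intervals: Mo = Li + k * d1 / (d1 + d2)."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])        # modal interval
    prev_f = freqs[m - 1] if m > 0 else 0                     # assumption: 0 if absent
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0        # assumption: 0 if absent
    d1 = freqs[m] - prev_f
    d2 = freqs[m] - next_f
    return lower_limits[m] + width * d1 / (d1 + d2)

def grouped_median(lower_limits, width, freqs):
    """Median of a distribution series by intervals: Me = Li + k * (sum(f)/2 - Sf) / f_Me."""
    half = sum(freqs) / 2
    cumulated = 0                                             # Sf: frequencies before the interval
    for lower, f in zip(lower_limits, freqs):
        if cumulated + f >= half:                             # median interval found
            return lower + width * (half - cumulated) / f
        cumulated += f

# Productivity (pieces) and number of employees from the example above.
lower_limits = [100, 110, 120, 130, 140, 150, 160]
employees = [5, 10, 15, 25, 20, 15, 10]

mode = grouped_mode(lower_limits, 10, employees)
median = grouped_median(lower_limits, 10, employees)
print(round(mode, 2), round(median, 2))
```

Both values fall in the 130 - 140 interval, as in the worked example.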
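Types of mean: formulas
Type           Unweighted                                   Weighted
• arithmetic   $\bar{x} = \frac{\sum x_i}{n}$               $\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$
• harmonic     $\bar{x}_h = \frac{n}{\sum \frac{1}{x_i}}$   $\bar{x}_h = \frac{\sum f_i}{\sum \frac{1}{x_i} f_i}$
• geometric    $\bar{x}_g = \sqrt[n]{\prod x_i}$            $\bar{x}_g = \sqrt[\sum f_i]{\prod x_i^{f_i}}$
• quadratic    $\bar{x}_q = \sqrt{\frac{\sum x_i^2}{n}}$    $\bar{x}_q = \sqrt{\frac{\sum x_i^2 f_i}{\sum f_i}}$
• chronologic  $\bar{x}_c = \frac{\sum \bar{x}_i}{k}$       $\bar{x}_c = \frac{\sum \bar{x}_i t_i}{\sum t_i}$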
The arithmetic mean – properties
Example: International transport, a = 1,1
Variable                                              Values                     Mean
Initial variable X: gross weight (tonnes)             8    9    10   11   12     10
Transformed variable X' = X - a: net weight (tonnes)  6,9  7,9  8,9  9,9  10,9   8,9
Example: Consumption, k = 2
Variable                                                                  Values                 Mean
Initial variable X: unit consumption before upgrade (kg/piece)            8   9    10  11   12   10
Transformed variable X'' = X/k: unit consumption after upgrade (kg/piece) 4   4,5  5   5,5  6    5
Example of harmonic mean use:
Case: the xi · fi values are equal
Product  Sale price per unit (xi)  Collected value (V = xi · fi)
A        10                        800
B        20                        800
The average sale price is:
$$\bar{x}_h = \frac{n}{\sum \frac{1}{x_i}} = \frac{2}{\frac{1}{10} + \frac{1}{20}} = 13.33$$
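The harmonic mean appears here because the quantities sold (fi) are unknown and only the collected values V = xi · fi are given. The sketch below shows, under that reading, that dividing the total value by the total quantity recovered from V/xi gives the same result as the unweighted harmonic mean when the V values are equal. The function name is hypothetical.

```python
def average_price_from_values(prices, collected):
    """Average price when only prices x_i and values V_i = x_i * f_i are known.

    The unknown quantities are recovered as f_i = V_i / x_i, so the result is
    sum(V_i) / sum(V_i / x_i), a harmonic mean weighted by V_i.
    """
    return sum(collected) / sum(v / x for x, v in zip(prices, collected))

prices = [10, 20]         # sale price per unit, products A and B
collected = [800, 800]    # collected value for each product

avg_price = average_price_from_values(prices, collected)

# Unweighted harmonic mean, valid here because the collected values are equal.
unweighted_harmonic = len(prices) / sum(1 / x for x in prices)
print(round(avg_price, 2), round(unweighted_harmonic, 2))
```

If the collected values differed between products, only the first function would still give the correct average price.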
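The arithmetic mean – examples
Grades              4  5   6   7  8  9  10
Number of students  9  10  10  2  2  1  1
Unweighted (grades only):
$$\bar{x} = \frac{4 + 5 + 6 + 7 + 8 + 9 + 10}{7} = 7$$
Weighted (grades and number of students):
$$\bar{x} = \frac{4 \cdot 9 + 5 \cdot 10 + 6 \cdot 10 + 7 \cdot 2 + 8 \cdot 2 + 9 \cdot 1 + 10 \cdot 1}{35} = 5,57$$
For a distribution with $\sum x_i f_i = 2950$ and $\sum f_i = 1000$:
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{2950}{1000} = 2,95$$
Simplified calculus for the same distribution, with a = 2,75 and k = 0,5:
$$\bar{x} = \frac{\sum \frac{x_i - a}{k} f_i}{\sum f_i} \cdot k + a = \frac{400}{1000} \cdot 0,5 + 2,75 = 2,95$$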
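The grades example can be reproduced in a few lines, together with the simplified calculus and the zero-sum property of the deviations. This is a sketch only: the constants a = 6 and k = 2 are hypothetical values picked for the illustration, since any a and any nonzero k give the same mean.

```python
grades = [4, 5, 6, 7, 8, 9, 10]
students = [9, 10, 10, 2, 2, 1, 1]      # absolute frequencies f_i

# Unweighted mean: every grade counted once.
unweighted = sum(grades) / len(grades)

# Weighted mean: every grade counted as many times as it occurs.
weighted = sum(x * f for x, f in zip(grades, students)) / sum(students)

# Simplified calculus: mean of (x - a) / k, then multiply by k and add a back.
a, k = 6, 2                              # hypothetical constants
simplified = (sum((x - a) / k * f for x, f in zip(grades, students))
              / sum(students)) * k + a

# Verification property: weighted deviations from the mean sum to zero.
deviation_sum = sum((x - weighted) * f for x, f in zip(grades, students))

print(unweighted, round(weighted, 2), round(simplified, 2))
```

The weighted mean is lower than the unweighted one because most students have grades of 4, 5 or 6.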
Example of time-moments series
$$R^{y}_{t/1} = \left(I^{y}_{t/1} - 1\right) \cdot 100$$
$$R^{y}_{t/t-1} = \left(I^{y}_{t/t-1} - 1\right) \cdot 100$$
Dynamics Indicators
Average indicators
- Absolute
• Average level of the studied variable ($\bar{y}$) - determined by taking into account the type of time series.
If the time series is by intervals, the arithmetic mean must be used.
If the time series is by moments, the weighted or unweighted chronological mean must be used.
- Relative
• Average index of dynamics: $\bar{I}_y = \sqrt[n-1]{\prod I^{y}_{t/t-1}}$
• Average rhythm of dynamics: $\bar{R}_y = \left(\bar{I}_y - 1\right) \cdot 100$
• Average absolute value of a percent: $\bar{A} = \overline{va}(1\%) = \frac{\bar{\Delta}}{\bar{R}}$, where $\bar{\Delta}$ is the average absolute change
Dynamics Indicators (absolute and relative) – example
           Production value   Absolute change        Index                  Rhythm
Month (t)  (Y), mil. lei      ΔY n/1    ΔY n/n-1     IY n/1    IY n/n-1     RY n/1    RY n/n-1
1          430                *         *            *         *            *         *
2          380                -50,00    -50,00       0,88      0,88         -11,63    -11,63
3          400                -30,00    20,00        0,93      1,05         -6,98     5,26
4          410                -20,00    10,00        0,95      1,03         -4,65     2,50
5          360                -70,00    -50,00       0,84      0,88         -16,28    -12,20
6          340                -90,00    -20,00       0,79      0,94         -20,93    -5,56
7          380                -50,00    40,00        0,88      1,12         -11,63    11,76
Sum/Product 2700              *         -50,00       *         0,88         *         *
Dynamics Indicators (average) – example
Indicators                           Level (y)  Absolute change (Δ)  Index (I)  Rhythm (R)
Average absolute                     385,71     -8,33                *          *
Average relative                     *          *                    0,9796     -2,04
Average absolute value of a percent  *          4,083                *          *
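The indicators in the two tables can be recomputed from the seven production values. The sketch below is an illustration with variable names chosen here; it treats the series as a time-interval series when averaging the level, which is an assumption consistent with the 385,71 shown in the table.

```python
import math

y = [430, 380, 400, 410, 360, 340, 380]    # monthly production, mil. lei
n = len(y)

# Absolute changes: against the first term (fixed base) and the previous term (chain base).
abs_fixed = [y[t] - y[0] for t in range(1, n)]
abs_chain = [y[t] - y[t - 1] for t in range(1, n)]

# Indices and rhythms on the same two bases.
idx_fixed = [y[t] / y[0] for t in range(1, n)]
idx_chain = [y[t] / y[t - 1] for t in range(1, n)]
rhythm_fixed = [(i - 1) * 100 for i in idx_fixed]
rhythm_chain = [(i - 1) * 100 for i in idx_chain]

# Average indicators.
avg_level = sum(y) / n                                  # arithmetic mean of the levels
avg_abs_change = sum(abs_chain) / (n - 1)               # average absolute change
avg_index = math.prod(idx_chain) ** (1 / (n - 1))       # geometric mean of chain indices
avg_rhythm = (avg_index - 1) * 100
avg_value_of_percent = avg_abs_change / avg_rhythm

print(round(avg_level, 2), round(avg_abs_change, 2),
      round(avg_index, 4), round(avg_rhythm, 2), round(avg_value_of_percent, 3))
```

The last figure differs slightly in the third decimal from the 4,083 in the table, because the table divides the already rounded values -8,33 and -2,04.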
Time series adjustment
Time series adjustment = determining the model that best approximates the tendency of a time series.
Utility:
Once we have determined the model of the tendency:
- we can use it to make forecasts
- we can use it to determine missing values of the time series (interpolation)
1. Graphic method – based on visually identifying an adequate tendency model by testing several known types of model against the chronogram or the historiogram of the analyzed time series.
[Figure: the initial data plotted together with a linear trend, an exponential trend and a geometric trend.]
2. Mechanical methods – based on mathematical relations between the terms of the time series, which totally or partially reduce the random fluctuations in the empirical data so that the tendency model can be identified.
Staggered average method – based on computing averages of 2, 3 or more successive terms of the time series, without reusing any of them, and then using these new terms instead of the initial data to determine the tendency.
Month (t)  Y    Staggered average
1          430  405 (months 1-2)
2          380
3          400  405 (months 3-4)
4          410
5          360  350 (months 5-6)
6          340
7          380
[Figure: Y and the staggered averages, months 1 to 7; value axis from 300 to 440.]
Moving averages
Month (t)  Y    Moving average (with the next month)
1          430  405,00
2          380  390,00
3          400  405,00
4          410  385,00
5          360  350,00
6          340  360,00
7          380
[Figure: Y and the moving averages, months 1 to 7; value axis from 300 to 440.]
These methods do not completely remove the random fluctuations.
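The two mechanical methods above differ only in how far the averaging window advances: by its full size for staggered averages, by one term for moving averages. The sketch below illustrates this with a window of two terms; the function names and the `size` parameter are chosen here for the illustration.

```python
def staggered_averages(y, size=2):
    """Averages of consecutive, non-overlapping groups of `size` terms.

    Terms left over at the end of the series (fewer than `size`) are not averaged.
    """
    return [sum(y[i:i + size]) / size
            for i in range(0, len(y) - size + 1, size)]

def moving_averages(y, size=2):
    """Averages of every window of `size` consecutive terms, advancing one term at a time."""
    return [sum(y[i:i + size]) / size
            for i in range(len(y) - size + 1)]

y = [430, 380, 400, 410, 360, 340, 380]    # monthly production

staggered = staggered_averages(y)
moving = moving_averages(y)
print(staggered)
print(moving)
```

With seven terms and a window of two, the staggered method leaves the last month unpaired, as in the table above.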
Average absolute change method – based on a recurrence relation between any term of the time series, the average absolute change and the first term of the series. The relation is used to calculate a new value for each term, and these new values are then used instead of the initial data to determine the tendency.
$$y_t = y_1 + \bar{\Delta} \cdot (t - 1)$$
Month (t)  Y           Adjusted value
1          430 = y1    430
2          380         421,67
3          400         413,33
4          410         405,00
5          360         396,67
6          340         388,33
7          380         380,00
This method completely removes the random fluctuations. It gives good results in the case of a linear tendency.
Average index method – based on a recurrence relation between any term of the time series, the average index and the first term of the series. The relation is used to calculate a new value for each term, and these new values are then used instead of the initial data to determine the tendency.
$$y_t = y_1 \cdot \bar{I}^{\,t-1}$$
Month (t)  Y           Adjusted value
1          430 = y1    430,00
2          380         421,23
3          400         412,64
4          410         404,23
5          360         395,98
6          340         387,91
7          380         380,00
This method completely removes the random fluctuations. It gives good results in the case of a geometric tendency.
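Both recurrence relations above need only the first term, the last term and the number of terms, because the average absolute change equals (yn − y1)/(n − 1) and the average index equals the (n − 1)-th root of yn/y1. The sketch below recomputes the two adjusted columns under that observation; the variable names are chosen here.

```python
y = [430, 380, 400, 410, 360, 340, 380]    # monthly production
n = len(y)

# Average absolute change and average index, from the first and last terms.
avg_change = (y[-1] - y[0]) / (n - 1)
avg_index = (y[-1] / y[0]) ** (1 / (n - 1))

# Adjusted values; range(n) supplies t - 1 = 0, 1, ..., n - 1.
linear_fit = [y[0] + avg_change * step for step in range(n)]
geometric_fit = [y[0] * avg_index ** step for step in range(n)]

for month, (lin, geo) in enumerate(zip(linear_fit, geometric_fit), start=1):
    print(month, round(lin, 2), round(geo, 2))
```

By construction both adjusted series start at 430 and end at 380, so they pass through the first and last observed values only.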
Time series adjustment – Analytical methods
Types of models
- Adjustment models
- Self-projective models
- Explanatory models
A. Adjustment models
- Additive models: $Y_t = T_t + S_t + C_t + u_t$
- Multiplicative models
where:
$u_t$ - disturbance
B. Self-projective models
$$Y_t = f(Y_{t-1}, Y_{t-2}, \ldots, u_t)$$
C. Explanatory models
Time series adjustment – Trend adjustment methods
The analytical methods for adjusting time series are based on the idea of finding the mathematical function that best approximates the tendency of the real data.
A criterion is needed to select, from the set of functions tested, the one that best approximates the evolution of the real data.
The most widely used criterion is the minimization of the differences between the real data and the data calculated through the tested functions (the least squares criterion).
Testing mathematical functions, to see how well they approximate the evolution of the real data, involves several stages:
1. For each function, the following steps are carried out:
a. The general model of the function to be tested is established;
b. The function is particularized by determining the values of its parameters so that the approximation is as precise as possible. Where the function allows it, the calculation can be done by the simplified method.
c. The function is used to calculate a corresponding value for each real term;
d. The sum of the errors generated by using the function to approximate the evolution of the real data is determined, as the sum of the squared differences between the real values and those calculated through the function;
2. The sums of errors generated by each function are compared, and the function with the smallest errors is chosen.
Notation used:
y - the real values of the variable studied through the time series
Yt - the values associated with the studied variable, calculated through the tested function
t - the ranks of the time periods included in the series
a, b, c, ... - the parameters of the tested function
Linear function
a. $Y_t = a + b \cdot t$
b. The parameters of the function (a and b) can be determined by applying the least squares criterion to the function:
$$\sum (y - Y_t)^2 = \min \;\Rightarrow\; \sum (y - a - b \cdot t)^2 = \min$$
The sum above can reach its minimum if the first-order partial derivatives with respect to the parameters a and b are zero.
Differentiating the least squares expression with respect to the parameter a gives:
$$\sum y - n \cdot a - b \sum t = 0 \;\Rightarrow\; \sum y = n \cdot a + b \sum t$$
then with respect to the parameter b:
$$\sum y \cdot t - a \sum t - b \sum t^2 = 0 \;\Rightarrow\; \sum y \cdot t = a \sum t + b \sum t^2$$
Both equations can be combined into a system, which can be solved by any known method.
$$\begin{cases} \sum y = n \cdot a + b \sum t \\ \sum y \cdot t = a \sum t + b \sum t^2 \end{cases}$$
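A quick way to solve the system uses determinants:
$$\begin{cases} \sum y = n a + b \sum t \\ \sum y t = a \sum t + b \sum t^2 \end{cases}$$
Build the following determinants:
$$\Delta = \begin{vmatrix} n & \sum t \\ \sum t & \sum t^2 \end{vmatrix}, \quad \Delta_a = \begin{vmatrix} \sum y & \sum t \\ \sum y t & \sum t^2 \end{vmatrix}, \quad \Delta_b = \begin{vmatrix} n & \sum y \\ \sum t & \sum y t \end{vmatrix}$$
$$a = \frac{\Delta_a}{\Delta}, \quad b = \frac{\Delta_b}{\Delta}$$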
▲
Another quick way to solve the system uses the simplified calculation. This means giving t (the time variable) particular values such that $\sum t = 0$.
Thus, if:
- the series has an odd number of terms, t is assigned the following values:
  - 0 for the term at the center of the series
  - -1, -2, -3... for the terms before the central term of the series. The values are assigned starting from the central term toward the first term of the series.
  - 1, 2, 3... for the terms after the central term of the series. The values are assigned starting from the central term toward the last term of the series.
- the series has an even number of terms, t is assigned the following values:
  - -1, 1 for the two terms at the center of the series
  - -3, -5, -7... for the terms before the central terms
  - 3, 5, 7... for the terms after the central terms
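The assignment rule can be sketched as a small Python helper. This is an illustrative sketch, not part of the original material: the function names are hypothetical, and the closed forms a = Σy/n and b = Σty/Σt² follow from setting Σt = 0 in the normal equations of the linear function.

```python
def centered_time_codes(n):
    """Return time codes t with sum(t) == 0 for a series of n terms."""
    if n % 2 == 1:
        half = n // 2
        return list(range(-half, half + 1))   # ..., -2, -1, 0, 1, 2, ...
    return list(range(-(n - 1), n, 2))        # ..., -3, -1, 1, 3, ...


def simplified_linear_trend(y):
    """With sum(t) == 0 the two normal equations decouple."""
    t = centered_time_codes(len(y))
    a = sum(y) / len(y)
    b = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    return a, b


# Industrial production series (bn. lei), 1993-1999, from the worked example.
y = [410, 430, 450, 460, 490, 509, 530]
a, b = simplified_linear_trend(y)
print(centered_time_codes(7), round(a, 4), round(b, 5))
```

With centered codes the slope b is the same as with t = 1, 2, ..., n, but the intercept a now refers to the central period of the series rather than to t = 0.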
▲
Worked example
(y = value of industrial production, bn. lei)

Year    y      t    t²    y·t     Yt       (y - Yt)²
1993    410    1    1     410     408.64   1.84
1994    430    2    4     860     428.57   2.04
1995    450    3    9     1350    448.50   2.25
1996    460    4    16    1840    468.43   71.04
1997    490    5    25    2450    488.36   2.70
1998    509    6    36    3054    508.29   0.51
1999    530    7    49    3710    528.21   3.19
Total   3279   28   140   13674   3279     83.571

$$\Delta = \begin{vmatrix} 7 & 28 \\ 28 & 140 \end{vmatrix} = 196, \quad \Delta_a = \begin{vmatrix} 3279 & 28 \\ 13674 & 140 \end{vmatrix} = 76188, \quad \Delta_b = \begin{vmatrix} 7 & 3279 \\ 28 & 13674 \end{vmatrix} = 3906$$
$$a = \frac{76188}{196} = 388.7143, \quad b = \frac{3906}{196} = 19.92857$$
$$Y_t = 388.7143 + 19.92857 \cdot t$$
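The worked example can be checked with a short Python sketch of the determinant method. The function names are hypothetical; the data are the seven values of the example.

```python
def det2(m11, m12, m21, m22):
    """Determinant of the 2x2 matrix [[m11, m12], [m21, m22]]."""
    return m11 * m22 - m12 * m21


def linear_trend_by_determinants(y, t):
    """Solve the normal equations of Yt = a + b*t with determinants."""
    n = len(y)
    st = sum(t)
    st2 = sum(ti * ti for ti in t)
    sy = sum(y)
    sty = sum(ti * yi for ti, yi in zip(t, y))
    delta = det2(n, st, st, st2)
    delta_a = det2(sy, st, sty, st2)
    delta_b = det2(n, sy, st, sty)
    return delta_a / delta, delta_b / delta, (delta, delta_a, delta_b)


y = [410, 430, 450, 460, 490, 509, 530]
t = [1, 2, 3, 4, 5, 6, 7]
a, b, dets = linear_trend_by_determinants(y, t)
sse = sum((yi - (a + b * ti)) ** 2 for yi, ti in zip(y, t))
print(dets, round(a, 4), round(b, 5), round(sse, 3))
```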
[Figure: line chart of the value of industrial production, 1993-1999, showing the real values y and the adjusted values Yt; vertical axis from 400 to 540]
Second-order parabolic function: $Y_t = a + b \cdot t + c \cdot t^2$
The function has three parameters a, b, c. Their values can be determined in the same way as for the linear function, by applying the least squares criterion. This gives the system:
$$\begin{cases} \sum y = n a + b \sum t + c \sum t^2 \\ \sum y t = a \sum t + b \sum t^2 + c \sum t^3 \\ \sum y t^2 = a \sum t^2 + b \sum t^3 + c \sum t^4 \end{cases}$$
The system can be solved by the simplified calculation method in only one situation, but other methods can be used instead: determinants, substitution, elimination...
Exponential function: $Y_t = a \cdot b^t$
Taking logarithms:
$$\log Y_t = \log (a \cdot b^t) = \log a + \log b^t = \log a + t \cdot \log b$$
then the following substitutions are made:
$$\log Y_t = u, \quad \log a = v, \quad \log b = z$$
in which case the expression of the function becomes:
$$u = v + z \cdot t$$
The resulting expression is similar to that of the linear function. It is solved in the same way as the linear function, determining the parameters v and z. Then:
$$a = 10^v, \quad b = 10^z$$
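The substitution can be sketched in Python as follows. This is a minimal sketch assuming base-10 logarithms and strictly positive values of y; the series used here is hypothetical, built as 2·3^t so that the recovered parameters are easy to check.

```python
import math


def exponential_trend(y, t):
    """Fit Yt = a * b**t by fitting the line u = v + z*t to u = log10(y)."""
    u = [math.log10(yi) for yi in y]          # requires every y > 0
    n = len(y)
    st = sum(t)
    st2 = sum(ti * ti for ti in t)
    su = sum(u)
    stu = sum(ti * ui for ti, ui in zip(t, u))
    delta = n * st2 - st * st
    v = (su * st2 - st * stu) / delta         # v = log a
    z = (n * stu - st * su) / delta           # z = log b
    return 10 ** v, 10 ** z


t = [1, 2, 3, 4]
y = [6, 18, 54, 162]                          # hypothetical series: 2 * 3**t
a, b = exponential_trend(y, t)
print(round(a, 6), round(b, 6))
```

Note that the least squares criterion is applied here to the logarithms of the values, not to the values themselves, so the fit is not identical to a direct least squares fit of the exponential.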
Adjusting time series - Determining the seasonal component
Determining the cycle component Ct requires very long time series.
Usually this is not possible. In this situation, its determination can be dropped.
Determining the seasonal component is possible, however. Proceed as follows:
A. The additive model
$$Y_t = T_t + S_t + u_t$$
1. After determining the trend function Tt with the methods presented so far, the seasonal component can be isolated:
$$y_t - T_t = S_t + u_t$$
- The seasonal component St of a time series can be presented as a function of the form:
$$S_t = c_1 S_1 + c_2 S_2 + c_3 S_3 + \ldots + c_j S_j + \ldots + c_m S_m$$
where: cj - coefficients that measure the changes at the level of each season j, j = 1..m
Sj - indicator variable of the season
$$y_t - T_t - S_t = u_t, \quad \sum u_t = 0$$
Adjusting time series - Prediction
4. Compute the average error of the estimate:
$$\sigma_E = \sqrt{\frac{\sum_{t=1}^{n} u_t^2}{n - 2}}$$
5. Fix the desired precision of the estimate by determining the coefficient $t_{\alpha, n-2}$ from the table of the Student distribution:
$$Y_t = T_t + S_t + u_t$$
giving t values that continue those corresponding to the series.
Statistical sampling is usually used when we study a phenomenon or process in a population for which:
- we don't have data for all its elements
- we can't study all the elements because doing so damages them totally or partially
- we want to obtain the maximum amount of information quickly and at the lowest cost
▲
Introduction
In which fields is statistical sampling used?
Statistical sampling is used very often in the following fields of activity:
- marketing: market research on consumer behavior, the supply of and demand for goods, etc.
- industry: production quality, quality of raw materials, statistical setting of equipment
- sociology: the study of individuals' behavior
- medicine: treatment efficacy, determining the optimal dosage of drugs
- agriculture: estimating the quantity and quality of production before harvest
- other: characterizing the standard of living, the quality of TV or radio programs, opinion surveys, etc.
▲
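- Average of the X variable, in the sample and in the population:
$$\bar{x}_s = \frac{\sum x_i}{n}, \quad \bar{x}_0 = \frac{\sum x_i}{N}$$
- Variance of the X variable, in the sample and in the population:
$$s^2 = \frac{\sum (x_i - \bar{x}_s)^2}{n - 1}, \quad \sigma_0^2 = \frac{\sum (x_i - \bar{x}_0)^2}{N}$$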
▲
Sampling methods
To obtain more precise results from statistical sampling, the sample must satisfy the condition of representativeness, meaning:
The sample must reproduce as closely as possible the structure of the population from which it was extracted.
To extract the sample we can use one of the following methods:
I. Random sampling
a) Pure random sampling
b) Systematic sampling
II. Nonrandom sampling
▲
a) Pure random sampling:
There are two variants of this method:
a1. with repetition. The sample is formed by extracting items from the population one by one. After each extraction the item is recorded in the sample and then returned to the population. In this case, the volume of the population stays constant during the extraction of the sample, and the probability of extracting any given item from the population is also constant.
a2. without repetition. This is the same method as above, with one difference: after each extraction the item is not returned to the population.
▲
a2. Random sampling without repetition
This method has the following characteristics:
- the population volume becomes smaller and smaller during the extraction of the sample;
- the probability of extracting any one item from the population rises during the extraction:
$$p_1 = \frac{1}{N};\; p_2 = \frac{1}{N-1};\; p_3 = \frac{1}{N-2};\; \ldots;\; p_i = \frac{1}{N-i+1};\; \ldots$$
- after the last extraction, N-n items remain in the population.
Because an item cannot be extracted into the sample more than once, this method produces smaller errors than the previous method.
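The two variants of pure random sampling can be sketched with Python's standard `random` module. The function names and the population of ten numbered items are illustrative assumptions; `random.Random.choice` draws with replacement when called repeatedly, and `random.Random.sample` draws without replacement.

```python
import random
from fractions import Fraction


def sample_with_repetition(population, n, rng):
    """Each drawn item is returned to the population before the next draw."""
    return [rng.choice(population) for _ in range(n)]


def sample_without_repetition(population, n, rng):
    """Drawn items are not returned, so no item can appear twice."""
    return rng.sample(population, n)


def extraction_probabilities(N, n):
    """p_i = 1 / (N - i + 1) for the i-th draw without repetition."""
    return [Fraction(1, N - i + 1) for i in range(1, n + 1)]


rng = random.Random(0)                 # fixed seed, only for repeatability
population = list(range(1, 11))        # hypothetical items numbered 1..10
with_rep = sample_with_repetition(population, 4, rng)
without_rep = sample_without_repetition(population, 4, rng)
probs = extraction_probabilities(10, 4)
print(with_rep, without_rep, probs)
```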
▲
b) Systematic sampling:
It is used when the population is already ordered by some criterion (example: the students of a faculty ordered by their identification number, the fruit trees in an orchard, etc.).
To use this method, we first calculate a numbering step (k), the ratio between the population volume and the sample volume:
$$k = \frac{N}{n}$$
The first item is then chosen at random among the first k items, for example by drawing one of k numbered tickets from an urn.
Example:
Supposing that the ticket with number 4 was extracted from the urn, the sample will be formed from the following items: 4, k+4, 2k+4, 3k+4, ..., (n-1)k+4
▲
II. Nonrandom sampling:
This method can be used when the studied population has a small number of items. In this case, using random methods to extract the sample produces a bigger error than having a sampling specialist select the sample subjectively.
Under these conditions the specialist can, based on experience, extract a representative sample of the population.
▲
Statistical sampling indicators
In statistical sampling we can encounter errors related to the process of collecting and processing the data, as well as errors specific to each type of sampling method used.
- random errors - which appear no matter how rigorously we organize the sampling and process the collected data. These errors arise because we can never extract a perfectly representative sample of the studied population.
▲
If, to study a variable X, we extracted all possible random samples using random sampling with repetition, calculated the corresponding average ($\bar{x}_s$) for each sample, and then calculated the variance of these averages around the average calculated at the level of the population ($\bar{x}_0$), we would obtain the variance of the sample averages from the population average (the average error of representativeness):
$$\sigma^2_{\bar{x},\,\text{with rep.}} = \frac{\sum (\bar{x}_s - \bar{x}_0)^2 f_i}{\sum f_i}$$
Between the population variance ($\sigma_0^2$) and the variance of the sample averages from the population average ($\sigma^2_{\bar{x},\,\text{with rep.}}$) the following relation exists:
$$\sigma_0^2 = n \cdot \sigma^2_{\bar{x},\,\text{with rep.}}$$
From this relation we can extract the average error of representativeness with repetition:
$$\sigma_{\bar{x},\,\text{with rep.}} = \sqrt{\frac{\sigma_0^2}{n}}$$
When we don't know the value of the population variance ($\sigma_0^2$) and n > 100, we can use the sample variance ($s^2$) instead, with good results:
$$\sigma_{\bar{x},\,\text{with rep.}} = \sqrt{\frac{\sigma_0^2}{n}} \approx \sqrt{\frac{s^2}{n}} \approx \sqrt{\frac{s^2}{n - 1}}$$
▲
Between the variance of the sample averages from the population average obtained with random sampling with repetition ($\sigma^2_{\bar{x},\,\text{with rep.}}$) and the one obtained without repetition ($\sigma^2_{\bar{x},\,\text{without rep.}}$) the following relation exists:
$$\frac{\sigma^2_{\bar{x},\,\text{with rep.}}}{\sigma^2_{\bar{x},\,\text{without rep.}}} = \frac{N - 1}{N - n}$$
N-1 - the number of items in the population at the end of the extraction of the sample using random sampling with repetition;
N-n - the number of items in the population at the end of the extraction of the sample using random sampling without repetition.
Thus:
$$\sigma^2_{\bar{x},\,\text{without rep.}} = \sigma^2_{\bar{x},\,\text{with rep.}} \cdot \frac{N - n}{N - 1} = \frac{\sigma_0^2}{n} \cdot \frac{N - n}{N - 1}$$
If the volume of the population is big, we can approximate (N-1) with N, and from the previous relation we obtain:
$$\sigma_{\bar{x},\,\text{without rep.}} = \sqrt{\frac{\sigma_0^2}{n} \left(1 - \frac{n}{N}\right)}$$
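The two average errors of representativeness can be compared with a short Python sketch. The function names and the numeric values (variance 400, n = 100, N = 1000) are hypothetical and chosen only to show the effect of the correction factor.

```python
import math


def average_error_with_repetition(variance, n):
    """sqrt(variance / n): sampling with repetition."""
    return math.sqrt(variance / n)


def average_error_without_repetition(variance, n, N, exact=True):
    """Apply the correction (N - n)/(N - 1), or 1 - n/N when N is large."""
    correction = (N - n) / (N - 1) if exact else 1 - n / N
    return math.sqrt(variance / n * correction)


variance, n, N = 400.0, 100, 1000      # hypothetical values
with_rep = average_error_with_repetition(variance, n)
without_exact = average_error_without_repetition(variance, n, N)
without_approx = average_error_without_repetition(variance, n, N, exact=False)
print(with_rep, round(without_exact, 4), round(without_approx, 4))
```

Both corrected values are smaller than the value with repetition, which agrees with the remark that sampling without repetition produces smaller errors.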
▲
The exact value of the average error of representativeness can be determined only if we extract all possible samples and calculate the errors generated by using each sample's average instead of the population average. In practice we never extract all the possible samples; that is why we use an estimation indicator.
Maximum admissible error:
$$\Delta_{\bar{x}} = z \cdot \sigma_{\bar{x}}$$
where:
z - the argument of the cumulative normal distribution function:
$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-z}^{z} e^{-\frac{u^2}{2}} \, du$$
$\Phi(z)$ - the probability used to guarantee the results of sampling
Using the maximum admissible error we can determine a confidence interval for the population average ($\bar{x}_0$):
$$\bar{x}_0 \in [\bar{x}_s - \Delta_{\bar{x}};\; \bar{x}_s + \Delta_{\bar{x}}]$$
One of the frequently encountered problems in statistical sampling is determining the sample volume that ensures a previously established maximum admissible error is respected. In this case the sample volume can be determined starting from the expression of the maximum admissible error, which has a specific form for each sampling method.
Example: for random sampling with repetition it has this form:
$$\Delta_{\bar{x},\,\text{with rep.}} = z \cdot \sigma_{\bar{x},\,\text{with rep.}} = z \sqrt{\frac{\sigma_0^2}{n}}, \quad \text{from where} \quad n = \frac{z^2 \sigma_0^2}{\Delta_{\bar{x},\,\text{with rep.}}^2}$$
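A minimal Python sketch of these relations for random sampling with repetition follows. The function names and all numeric inputs are hypothetical; z = 1.96 is the usual value for a two-sided probability of about 0.95 under the normal distribution, and the sample volume is rounded up because it must be a whole number of items.

```python
import math


def max_admissible_error(z, variance, n):
    """Delta = z * sqrt(variance / n), sampling with repetition."""
    return z * math.sqrt(variance / n)


def confidence_interval(sample_mean, z, variance, n):
    """Interval [mean - Delta; mean + Delta] for the population average."""
    delta = max_admissible_error(z, variance, n)
    return sample_mean - delta, sample_mean + delta


def sample_volume_with_repetition(z, variance, delta):
    """n = z^2 * variance / Delta^2, rounded up to a whole item."""
    return math.ceil(z ** 2 * variance / delta ** 2)


# Hypothetical inputs: sample mean 50, variance 100, n = 100, z = 1.96.
low, high = confidence_interval(50.0, 1.96, 100.0, 100)
needed = sample_volume_with_repetition(1.96, 100.0, 2.0)
print(round(low, 2), round(high, 2), needed)
```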
▲
Types of statistical sampling
Combining these factors gives the following most important types of statistical sampling:
1) simple random sampling: - with repetition
- without repetition
2) stratified sampling: - with repetition
- without repetition
3) cluster sampling, usually organized without repetition, because it usually operates with a small number of clusters.
▲
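Simple random sampling
Indicators and their calculus:
1. Average error of representativeness
   With repetition: $\sigma_{\bar{x}} = \sqrt{\dfrac{\sigma_0^2}{n}} \approx \sqrt{\dfrac{s^2}{n - 1}}$
   Without repetition: $\sigma_{\bar{x}} = \sqrt{\dfrac{\sigma_0^2}{n} \left(1 - \dfrac{n}{N}\right)}$
2. Maximum admissible error
   With repetition: $\Delta_{\bar{x}} = z \cdot \sigma_{\bar{x}}$
   Without repetition: $\Delta_{\bar{x}} = z \cdot \sigma_{\bar{x}}$
3. Sample volume
   With repetition: $n = \dfrac{z^2 \sigma_0^2}{\Delta_{\bar{x}}^2}$
   Without repetition: $n = \dfrac{z^2 \sigma_0^2}{\Delta_{\bar{x}}^2 + \dfrac{z^2 \sigma_0^2}{N}}$
If the studied variable is binomial, then the following relations must be used for the variance ($\sigma^2$) and the standard deviation ($\sigma$):
$$\sigma^2 = p \cdot q = p(1 - p), \quad \sigma = \sqrt{p \cdot q} = \sqrt{p(1 - p)}$$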
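The sample volume without repetition and the binomial variance can be combined in a short Python sketch. The scenario is hypothetical: a proportion p = 0.5, z = 1.96, a maximum admissible error of 0.05 and a population of 10,000 items; the function names are illustrative.

```python
import math


def binomial_variance(p):
    """Variance of a binomial (yes/no) variable: p * q = p * (1 - p)."""
    return p * (1 - p)


def sample_volume_with_repetition(z, variance, delta):
    """n = z^2 * variance / Delta^2, rounded up."""
    return math.ceil(z ** 2 * variance / delta ** 2)


def sample_volume_without_repetition(z, variance, delta, N):
    """n = z^2 * variance / (Delta^2 + z^2 * variance / N), rounded up."""
    return math.ceil(z ** 2 * variance / (delta ** 2 + z ** 2 * variance / N))


variance = binomial_variance(0.5)      # hypothetical proportion p = 0.5
n_with = sample_volume_with_repetition(1.96, variance, 0.05)
n_without = sample_volume_without_repetition(1.96, variance, 0.05, 10_000)
print(n_with, n_without)
```

For the same precision, the variant without repetition needs a smaller sample, and the difference grows as the sample becomes a larger share of the population.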
▲
Stratified sampling
It is used when the population is not homogeneous. In these situations the population is organized into homogeneous groups.
To respect the representativeness condition, the sample must be formed by extracting from each group a number of items proportional to the volume of that group.
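Indicators and their calculus:
1. Average error of representativeness
   With repetition: $\sigma_{\bar{x}} = \sqrt{\dfrac{\sigma_0^2}{n}} \approx \sqrt{\dfrac{s^2}{n - 1}}$
   Without repetition: $\sigma_{\bar{x}} = \sqrt{\dfrac{\sigma_0^2}{n} \left(1 - \dfrac{n}{N}\right)}$
2. Maximum admissible error
   With repetition: $\Delta_{\bar{x}} = z \cdot \sigma_{\bar{x}}$
   Without repetition: $\Delta_{\bar{x}} = z \cdot \sigma_{\bar{x}}$
3. Sample volume
   With repetition: $n = \dfrac{z^2 \sigma_0^2}{\Delta_{\bar{x}}^2}$
   Without repetition: $n = \dfrac{z^2 \sigma_0^2}{\Delta_{\bar{x}}^2 + \dfrac{z^2 \sigma_0^2}{N}}$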
▲
Cluster sampling
It is used when the population is formed of complex items (clusters) rather than individual items (example: juice bottles packed in boxes). In this case the sample is formed by extracting cluster by cluster (sets of items) and not item by item.
Indicators and their calculus (without repetition):
1. Average error of representativeness: $\sigma_{\bar{x}} = \sqrt{\dfrac{\delta^2}{r} \cdot \dfrac{R - r}{R - 1}}$
3. Sample volume: $r = \dfrac{R z^2 \delta^2}{(R - 1) \Delta_{\bar{x}}^2 + z^2 \delta^2}$
▲
Random sampling with repetition
[Figure: a population of items numbered 1 to 10 and a sample drawn from it with repetition, in which some items appear more than once]
▲
Random sampling without repetition
[Figure: a population of items numbered 1 to 10 from which a sample is drawn without repetition]
▲
Systematic sampling
From a population of 1500 students enrolled in a faculty matriculation register we extract a sample of 10%: N = 1500, n = 150.
$$k = \frac{N}{n} = \frac{1500}{150} = 10$$
We put tickets numbered from 1 to 10 into an urn and extract only one. Supposing that the ticket with the number 8 was extracted, we determine the rest of the students in the sample using the relation (i - 1)k + 8:
(1 - 1)·10 + 8 = 8, for i = 1
(2 - 1)·10 + 8 = 18, for i = 2
(3 - 1)·10 + 8 = 28, for i = 3
.............................................
(n - 1)k + 8 = 1498, for i = n
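The example can be reproduced with a few lines of Python. This sketch assumes that N is an exact multiple of n, as in the example; the function name is hypothetical.

```python
def systematic_sample(N, n, start):
    """Return the item numbers (i - 1) * k + start for i = 1..n, with k = N // n."""
    k = N // n
    if not 1 <= start <= k:
        raise ValueError("start must be one of the first k item numbers")
    return [(i - 1) * k + start for i in range(1, n + 1)]


sample = systematic_sample(N=1500, n=150, start=8)
print(sample[:3], sample[-1], len(sample))
```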
▲
1. Elementary methods
2. Analytic methods
3. Linear correlation
4. Non linear correlation
5. Nonparametric correlation
Introduction
The synthetic expression of the intensity of the causal link between phenomena is called correlation.
The phenomena between which a causal determination exists can be in one of the following situations:
- cause - when it determines the appearance or modification of other phenomena;
- effect - when it is a result of the effects generated by other phenomena.
The variables that describe these two categories of phenomena can be:
- Cause variable (independent, factorial) - when it characterizes a cause phenomenon
- Effect variable (dependent, resultative) - when it characterizes an effect phenomenon.
▲
Types of correlations
▲
Elementary methods
1. Correlation table method
a. The existence of a correlation between the cause and effect variables is shown by the frequencies being grouped into a strip with a specific shape.
b. The direction of the correlation is given by the diagonal where the strip is placed.
c. The strength of the correlation is given by the width of the strip.
d. The shape of the correlation is given by the shape of the strip.
▲
2. Graphic methods
Correlogram
a. The existence of a correlation between the cause and effect variables is indicated by a value of the α angle different from 0° or 90°.
b. The direction of the correlation is given by the tendency line.
c. The intensity of the correlation is given by the size of the α angle. The correlation intensity is at its maximum when α is 45° for direct correlation or 135° for inverse correlation.
2. For all the tested models we compare the sums of errors calculated as above and we choose the model with the smallest error:
$$\sum (y - Y_x)^2 = \min$$
▲
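Linear correlation
Simple linear regression: $Y_x = a + bx$
[Figure: regression line of Y against X with intercept a, rising for b > 0 and falling for b < 0]
$$\begin{cases} \sum y_i = n a + b \sum x_i \\ \sum x_i y_i = a \sum x_i + b \sum x_i^2 \end{cases}$$
$$\Delta = \begin{vmatrix} n & \sum x \\ \sum x & \sum x^2 \end{vmatrix}, \quad \Delta_a = \begin{vmatrix} \sum y & \sum x \\ \sum xy & \sum x^2 \end{vmatrix}, \quad \Delta_b = \begin{vmatrix} n & \sum y \\ \sum x & \sum xy \end{vmatrix}$$
$$a = \frac{\Delta_a}{\Delta}, \quad b = \frac{\Delta_b}{\Delta}$$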
▲
Example (lei, monthly, per household)

Year    Total income (x)   Total expenses (y)   x²           x·y          Yx        (y - Yx)²
2001    521.79             516.52               272264.80    269514.97    533.55    289.9618
2002    658.51             651.66               433635.42    429124.63    654.23    6.626353
2003    795.09             781.45               632168.11    621323.08    774.80    44.26912
2004    1085.79            1049.94              1178939.92   1140014.35   1031.40   343.5792
2005    1212.18            1149.33              1469380.35   1393194.84   1142.97   40.43068
2006    1386.32            1304.66              1921883.14   1808676.25   1296.69   63.53853
2007    1686.74            1541.96              2845091.83   2600885.61   1561.88   396.6701
Total   7346.42            6995.52              8753363.58   8262733.73   6995.52   1185.076

$$\Delta = \begin{vmatrix} 7 & 7346.42 \\ 7346.42 & 8753363.58 \end{vmatrix} = 7303658.24$$
$$\Delta_a = \begin{vmatrix} 6995.52 & 7346.42 \\ 8262733.73 & 8753363.58 \end{vmatrix} = 532817643.00, \quad a = \frac{532817643.00}{7303658.24} = 72.952$$
$$\Delta_b = \begin{vmatrix} 7 & 6995.52 \\ 7346.42 & 8262733.73 \end{vmatrix} = 6447108.08, \quad b = \frac{6447108.08}{7303658.24} = 0.883$$
$$Y_x = 72.952 + 0.883x$$
▲
Example
$$Y_x = 72.952 + 0.883x$$
▲
2. The intensity of linear correlation is determined by using the Pearson linear correlation coefficient ($r_{y,x}$):
$$r_{y,x} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n \, \sigma_x \sigma_y}$$
$$r_{y,x} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n \sum x_i^2 - \left(\sum x_i\right)^2\right] \left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$
- The Pearson linear correlation coefficient ($r_{y,x}$) takes values in the interval [-1; 1].
- The intensity of the linear correlation increases as the coefficient approaches the extremes of this interval.
- Negative values mean inverse correlation and positive values mean direct correlation.
Observation: A value of 0 for the linear correlation coefficient means there is no linear correlation between the cause and effect variables, but it does not exclude a nonlinear correlation!
Example. Using the data from the previous table we obtain: ry,x = 0.99927
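The second (computational) form of the coefficient can be sketched in Python and applied to the household income and expenses data of the example. The function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient, computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi * xi for xi in x)
    sy2 = sum(yi * yi for yi in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return numerator / denominator


# Monthly income (x) and expenses (y) per household, 2001-2007, in lei.
income = [521.79, 658.51, 795.09, 1085.79, 1212.18, 1386.32, 1686.74]
expenses = [516.52, 651.66, 781.45, 1049.94, 1149.33, 1304.66, 1541.96]
r = pearson_r(income, expenses)
print(round(r, 5))
```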
▲
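Nonlinear correlation
1. Regression
Quadratic function: $Y_x = a + bx + cx^2$
$$\begin{cases} \sum y_i = n a + b \sum x_i + c \sum x_i^2 \\ \sum x_i y_i = a \sum x_i + b \sum x_i^2 + c \sum x_i^3 \\ \sum x_i^2 y_i = a \sum x_i^2 + b \sum x_i^3 + c \sum x_i^4 \end{cases}$$
$$\Delta = \begin{vmatrix} n & \sum x & \sum x^2 \\ \sum x & \sum x^2 & \sum x^3 \\ \sum x^2 & \sum x^3 & \sum x^4 \end{vmatrix}, \quad \Delta_a = \begin{vmatrix} \sum y & \sum x & \sum x^2 \\ \sum xy & \sum x^2 & \sum x^3 \\ \sum x^2 y & \sum x^3 & \sum x^4 \end{vmatrix}$$
$$\Delta_b = \begin{vmatrix} n & \sum y & \sum x^2 \\ \sum x & \sum xy & \sum x^3 \\ \sum x^2 & \sum x^2 y & \sum x^4 \end{vmatrix}, \quad \Delta_c = \begin{vmatrix} n & \sum x & \sum y \\ \sum x & \sum x^2 & \sum xy \\ \sum x^2 & \sum x^3 & \sum x^2 y \end{vmatrix}$$
$$a = \frac{\Delta_a}{\Delta}; \quad b = \frac{\Delta_b}{\Delta}; \quad c = \frac{\Delta_c}{\Delta}$$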
▲
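Hyperbolic function: $Y_x = a + \dfrac{b}{x}$
$$\begin{cases} \sum y_i = n a + b \sum \dfrac{1}{x_i} \\ \sum \dfrac{y_i}{x_i} = a \sum \dfrac{1}{x_i} + b \sum \dfrac{1}{x_i^2} \end{cases}$$
Exponential function: $Y_x = a b^x$
$$\log Y = \log (a b^x) = \log a + \log b^x = \log a + x \log b$$
$$z = \log Y, \quad u = \log a, \quad w = \log b$$
$$z = u + wx$$
$$\begin{cases} \sum z = n u + w \sum x \\ \sum xz = u \sum x + w \sum x^2 \end{cases}$$
$$a = 10^u, \quad b = 10^w$$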
2. The intensity of correlation
To determine the strength of the correlation between the variables x (cause) and y (effect), whichever linear or nonlinear function is used, we use the correlation ratio, which is based on decomposing the overall variance into factorial variances.
General variance. Summarizes the total variation of the y variable, resulting from the simultaneous action of all influencing factors.
$$\sigma_y^2 = \frac{\sum (y_i - \bar{y})^2}{n}$$
Explained variance. Summarizes the variation of the y (effect) variable explained by the influence of the variable x (cause) included in the correlation couple.
$$\sigma_{Y_x}^2 = \frac{\sum (Y_{x_i} - \bar{y})^2}{n}$$
Non-explained variance. Summarizes the variation of the y (effect) variable that cannot be explained by the influence of the variable x (cause) included in the correlation couple.
$$\sigma_{y,Y_x}^2 = \frac{\sum (y_i - Y_{x_i})^2}{n}$$
Correlation ratio:
$$R = \sqrt{R^2}, \quad R^2 = \frac{\sigma_{Y_x}^2}{\sigma_y^2} = 1 - \frac{\sigma_{y,Y_x}^2}{\sigma_y^2}, \quad R \in [0; 1]$$
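The decomposition can be illustrated with a short Python sketch that reuses the expenses (y) and the adjusted values (Yx) from the table of the linear regression example. The function names are hypothetical. For a linear function fitted by least squares the correlation ratio coincides with the absolute value of the Pearson coefficient, so the result should be close to the 0.99927 reported there.

```python
import math


def variance_decomposition(y, fitted):
    """Return the general, explained and non-explained variances."""
    n = len(y)
    mean_y = sum(y) / n
    general = sum((yi - mean_y) ** 2 for yi in y) / n
    explained = sum((fi - mean_y) ** 2 for fi in fitted) / n
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
    return general, explained, unexplained


def correlation_ratio(y, fitted):
    """R = sqrt(1 - non-explained variance / general variance)."""
    general, _, unexplained = variance_decomposition(y, fitted)
    return math.sqrt(1 - unexplained / general)


# Expenses (y) and adjusted values (Yx) from the linear regression table.
y = [516.52, 651.66, 781.45, 1049.94, 1149.33, 1304.66, 1541.96]
fitted = [533.55, 654.23, 774.80, 1031.40, 1142.97, 1296.69, 1561.88]
general, explained, unexplained = variance_decomposition(y, fitted)
R = correlation_ratio(y, fitted)
print(round(R, 5))
```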
Nonparametric correlation
Rank - the position of each value of the variables X and Y of the correlation couple in the set it belongs to, ordered ascending or descending.
ui - ranks of the values xi, from the ordered set x1, x2, …, xn;
wi - ranks of the values yi, from the ordered set y1, y2, …, yn;
Spearman coefficient
$$\rho = 1 - \frac{6 \sum d^2}{n (n^2 - 1)}$$
where:
d - the difference between the ui and wi ranks;
n - the number of terms of the statistical series.
The relation gives accurate results as long as the premises used to obtain it are satisfied:
$$\sum u_i = \sum w_i$$
The ranks ui and wi are unique, not repeated in the set they belong to. If this latter condition is not satisfied, proceed as follows: a repeated value of xi or yi is kept only once in its set, and its corresponding value is the average of the other variable's values that correspond to the repeated value.
Kendall coefficient
$$\tau = \frac{2S}{n(n - 1)} = \frac{2(P - Q)}{n(n - 1)}$$
where:
P - the total number of wi ranks bigger than the current rank, counted among the ranks that follow it, with the pairs ordered by ui.
Q - the total number of wi ranks smaller than the current rank, counted in the same way.
Example (n = 8):
$$\rho = 1 - \frac{6 \sum d^2}{8 (8^2 - 1)} = 0.994, \quad \tau = \frac{2 (26 - 2)}{8 (8 - 1)} = 0.857$$
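Both rank coefficients can be sketched in Python for series without repeated ranks. The function names are hypothetical, and so are the ranks used below: they are not the data behind the example, but were chosen so that P = 26 and Q = 2, the values that appear in the Kendall calculation for n = 8.

```python
def spearman(u, w):
    """Spearman coefficient for two lists of untied ranks."""
    n = len(u)
    d2 = sum((ui - wi) ** 2 for ui, wi in zip(u, w))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def kendall(u, w):
    """Kendall coefficient; returns (tau, P, Q) for untied ranks."""
    ordered_w = [wi for _, wi in sorted(zip(u, w))]   # order pairs by u
    n = len(ordered_w)
    p = q = 0
    for i in range(n):
        for j in range(i + 1, n):
            if ordered_w[j] > ordered_w[i]:
                p += 1      # following rank is bigger than the current one
            elif ordered_w[j] < ordered_w[i]:
                q += 1      # following rank is smaller than the current one
    return 2 * (p - q) / (n * (n - 1)), p, q


u = [1, 2, 3, 4, 5, 6, 7, 8]            # hypothetical ranks of x
w = [1, 2, 4, 3, 5, 6, 8, 7]            # hypothetical ranks of y
rho = spearman(u, w)
tau, P, Q = kendall(u, w)
print(round(rho, 3), round(tau, 3), P, Q)
```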