1 Introduction
Important equations for this video:
\[ X = \{x_1, \ldots, x_n\} \]
\[ \mu_x = \frac{1}{n} \sum_{i=1}^{n} x_i \]
\[ \sigma_x^2 = \frac{1}{n} \left[ \sum_{i=1}^{n} (x_i - \mu_x)^2 \right] \]
The symbol $\mu_x$ is the mean of $X$, and $\sigma_x^2$ is the variance of $X$. The standard deviation, which is the square root of the variance, is denoted $\sigma_x$.
2 Mean
Example:
\[ Z = \{1, 5, 12\} \]
\[ |Z| = 3 \]
\[ \mu_z = \frac{1 + 5 + 12}{3} = \frac{18}{3} = 6 \]
The mean $\mu_z$ is also denoted $\mu(z)$ or simply $\mu$.
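The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch; the function name `mean` is an illustrative choice and does not come from the lecture.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)

Z = [1, 5, 12]
mu_z = mean(Z)
print(len(Z), sum(Z), mu_z)  # 3 18 6.0
```

Python's `/` always returns a float, which is why the mean prints as `6.0` rather than `6`.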
Symbolic example:
\[ Y = \{y_1, y_2, y_3, y_4\} \]
\[ \mu_y = \frac{1}{4} (y_1 + y_2 + y_3 + y_4) = \frac{1}{4} \left( \sum_{i=1}^{4} y_i \right) \]
Data Science Math Skills
Paul Bendich and Daniel Egger
Duke University
For a general data set
\[ X = \{x_1, x_2, \ldots, x_n\}, \]
the variable $i$ in the sum is a counter. The variable $n$ is a number, the size of the set, which tells you when to stop counting.
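The roles of the counter $i$ and the stopping value $n$ can be made explicit with a loop. The sketch below is one possible way to write the summation. The function name `mean_with_counter` and the sample values in `Y` are hypothetical, chosen only for illustration.

```python
def mean_with_counter(x):
    """Compute (1/n) * sum_{i=1}^{n} x_i using an explicit counter i."""
    n = len(x)             # n tells the loop when to stop counting
    total = 0
    i = 1                  # the counter starts at 1, as in the formula
    while i <= n:
        total += x[i - 1]  # Python lists are 0-indexed, so x_i is x[i - 1]
        i += 1
    return total / n

Y = [2, 4, 6, 8]  # hypothetical values for y1..y4, not taken from the lecture
print(mean_with_counter(Y))  # 5.0
```

In everyday Python, `sum(x) / len(x)` does the same job. The explicit counter is shown only to mirror the sigma notation.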
3 Mean centering
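\[ Z = \{1, 5, 12\} \]
\[ \mu_z = 6 \]
[Figure: number line R with ticks at 0, 1, 5, 6, and 12.]
\[ Z' = \{1 - 6,\ 5 - 6,\ 12 - 6\} = \{-5, -1, 6\} \]
\[ \mu_{z'} = 0 \]
[Figure: number line R with ticks at -5, -1, 0, and 6.]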
Mean centering produces a new data set in which the points keep the same relationships to one another (the same order and the same distances between them), but the mean is zero.
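Mean centering is a one-line operation in code. The following is a minimal sketch; the function name `mean_center` is an assumption made for illustration. It also checks that the gaps between neighboring points are the same before and after centering.

```python
def mean_center(values):
    """Subtract the mean from every value and return the result as a new list."""
    mu = sum(values) / len(values)
    return [v - mu for v in values]

Z = [1, 5, 12]
Z_centered = mean_center(Z)
print(Z_centered)                         # [-5.0, -1.0, 6.0]
print(sum(Z_centered) / len(Z_centered))  # 0.0

# Gaps between neighboring points are unchanged by centering.
print(Z[1] - Z[0], Z[2] - Z[1])                                      # 4 7
print(Z_centered[1] - Z_centered[0], Z_centered[2] - Z_centered[1])  # 4.0 7.0
```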
4 Variance
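\[ Z = \{1, 5, 12\}, \qquad \mu_z = 6 \]
\[ W = \{5, 6, 7\}, \qquad \mu_w = 6 \]
[Figure: number line R with ticks at 0, 1, 5, 6, and 12; the points 5, 6, and 7 of W are marked above it.]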
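$Z$ and $W$ have the same mean, but $Z$ is more spread out, so $\sigma_z$ should be greater than $\sigma_w$.
\begin{align*}
\sigma_w^2 &= \frac{1}{3} \left[ \sum_{i=1}^{3} (w_i - \mu_w)^2 \right] \\
&= \frac{1}{3} \left[ (5 - 6)^2 + (6 - 6)^2 + (7 - 6)^2 \right] \\
&= \frac{1}{3} \left[ (-1)^2 + 0^2 + 1^2 \right] \\
&= \frac{2}{3}
\end{align*}
\[ \sigma_w = \sqrt{\frac{2}{3}} \]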
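The variance of $W$ can be reproduced exactly in Python by using fractions instead of floats. This is a sketch under the same $1/n$ definition used here; the function name `population_variance` is an illustrative assumption.

```python
from fractions import Fraction
import math

def population_variance(values):
    """Return (1/n) * sum of squared deviations from the mean, as an exact Fraction."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum((v - mu) ** 2 for v in values) / n

W = [5, 6, 7]
var_w = population_variance(W)
print(var_w)             # 2/3
print(math.sqrt(var_w))  # the standard deviation, roughly 0.816
```

Working with `Fraction` keeps the result as 2/3 rather than a rounded decimal, which makes it easy to compare against the hand calculation.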
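\begin{align*}
\sigma_z^2 &= \frac{1}{3} \left[ \sum_{i=1}^{3} (z_i - \mu_z)^2 \right] \\
&= \frac{1}{3} \left[ (1 - 6)^2 + (5 - 6)^2 + (12 - 6)^2 \right] \\
&= \frac{1}{3} \left[ (-5)^2 + (-1)^2 + 6^2 \right] \\
&= \frac{62}{3}
\end{align*}
\[ \sigma_z = \sqrt{\frac{62}{3}} \]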
$\sigma_z^2 \gg \sigma_w^2$, so $Z$ is much more spread out than $W$.
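As a cross-check, Python's standard `statistics` module provides `pvariance` and `pstdev`, which divide by $n$ and so match the definition used in these notes. The sketch below compares the two data sets; the variable names are illustrative.

```python
import statistics

Z = [1, 5, 12]
W = [5, 6, 7]

# pvariance and pstdev use the population form (divide by n), matching the 1/n formula.
# statistics.variance divides by n - 1 instead and would give different numbers.
var_z = statistics.pvariance(Z)
var_w = statistics.pvariance(W)

print(var_z, var_w)                                # roughly 20.67 and 0.67
print(statistics.pstdev(Z), statistics.pstdev(W))  # roughly 4.55 and 0.82
print(var_z / var_w)                               # roughly 31
```

The variance of $Z$ comes out about 31 times that of $W$, consistent with 62/3 versus 2/3.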