
Descriptive Statistics and Probability Distributions

Munmun Biswas

Dept. of Statistics, Brahmananda Keshab Chandra College

July 9, 2019

M.Biswas (BKC College) Descriptive Statistics and Probability Distributions July 9, 2019 1 / 37
Introduction

“Statistics is the universal tool of inductive inference, research in natural
and social sciences, and technological applications.
Statistics, therefore, must always have purpose, either in the pursuit of
knowledge or in the promotion of human welfare.”
- Prasanta Chandra Mahalanobis

A Roadmap: Collection of Data - Summarization of Data - Analysis
of Data - Interpretation of Data towards a VALID DECISION.

Applications of Statistics in Real-World Problems

Finance - correlation and regression, index numbers, time series
analysis, volatility modelling.
Marketing - hypothesis testing, chi-square tests, nonparametric
statistics.
Development Studies - gender inequality, child health, poverty:
econometrics and applied statistics.
Forecasting - weather forecasting using time series (TS).
Self-driving cars, robotics, AI, and image and video processing also
use statistics extensively.

Outline:

Descriptive Statistics
Data Type
Data Representation
Data Summary
Data Shape
Bivariate Data
Probability Distributions
Probability
Random Variable
Probability Distribution Function
Expectation and Variance
Moments
Binomial, Poisson
Normal Distribution

Descriptive Statistics

Descriptive Statistics: Data Type

Quantitative data have numerical values. Such data are also called
variables.
Examples: height, weight, number of defectives in a particular lot of
products, number of accidents at a traffic signal per week
A variable is discrete when it takes only isolated (countable) values in
a given interval
A variable is continuous when it may take any value in an interval
Qualitative (or categorical) data do not have numerical values. Such
data are also called attributes.
Examples: blood groups, gender, exam grades, income groups
Categories that can be ordered are classified as ordinal data;
otherwise they are called nominal data.

Questions

Examples of continuous variables?
Examples of discrete variables?
Examples of ordinal data?
Examples of nominal data?

Graphical Representation

Line Diagram: Discrete time series data
Example: monthly electricity consumption

Bar Diagram: Discrete variable, Categorical data
Vertical bars
Horizontal bars
Multiple bars
Divided bar

Bar Diagrams
Bar Diagram: To represent Discrete variable, Categorical data
Examples:

Continuous data: Histogram

A histogram is the graphical representation of the frequency distribution of
continuous data.

Test scores   Frequency (f)   Class width (d)   Frequency density (r = f/d)
0-10           7              10                0.70
10-30         11              20                0.55
30-50         14              20                0.70
50-60         16              10                1.60
60-80         20              20                1.00
80-100         9              20                0.45
100-120        3              20                0.15
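
The frequency densities in the table can be reproduced with a few lines of Python. This is only a sketch: the class limits and frequencies are copied from the table, while the function name `frequency_density` is an illustrative choice, not something the slides define.

```python
# Class intervals (lower, upper) and frequencies copied from the table above.
classes = [(0, 10, 7), (10, 30, 11), (30, 50, 14), (50, 60, 16),
           (60, 80, 20), (80, 100, 9), (100, 120, 3)]


def frequency_density(lower, upper, freq):
    """Return frequency divided by class width, r = f / d."""
    return freq / (upper - lower)


densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

for (lo, hi, f), r in zip(classes, densities):
    print(f"{lo:>3}-{hi:<3}  f={f:>2}  d={hi - lo:>2}  r={r:.2f}")
```

Because the classes have unequal widths, a histogram should use r (not f) as the bar height, so that the area of each bar is proportional to its frequency.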

Frequency Distribution

A frequency distribution is a form of data summarization. It is obtained by
first dividing the range of (sample) values into a number of class-intervals,
and then classifying the values into these intervals. A frequency
distribution thus consists of a set of class-intervals and the frequency of
values in each of them.
It helps in comprehending both the location and the dispersion of the data.

Data Summary

Location / Central Tendency - a measure of the centre point of a data set
Measures of location: Mean, Median, Mode, Quartiles
Spread / Dispersion - a measure of the spread of a data set around
its centre
Measures of dispersion: Variance, Range, Quartile deviation
Shape - a measure of the symmetry of a data set around its centre.
The data may be symmetric or asymmetric. Asymmetric data can be
positively or negatively skewed.

Measures of Location

Arithmetic Mean: simple average or weighted average
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$
OR $\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k}$
Mode: the most frequently occurring data point in a data set
Median: the middle-most point of a data set arranged in
ascending or descending order
Example:
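
As a hedged illustration of the three measures of location, the sketch below uses Python's standard `statistics` module. The data values are hypothetical and were chosen only for this sketch; they do not come from the slides.

```python
from statistics import mean, median, mode

# Hypothetical raw data (an assumption for illustration only).
data = [2, 3, 3, 5, 7, 8, 9]

simple_mean = mean(data)      # (x1 + ... + xn) / n
middle = median(data)         # middle value of the sorted data
most_common = mode(data)      # most frequently occurring value

# The same data written as distinct values x_i with frequencies f_i,
# to show that the weighted formula gives the same mean.
values = [2, 3, 5, 7, 8, 9]
freqs = [1, 2, 1, 1, 1, 1]
weighted_mean = sum(x * f for x, f in zip(values, freqs)) / sum(freqs)

print(simple_mean, middle, most_common, weighted_mean)
```

Both formulas give the same mean here because the frequency table is just a compressed form of the raw data.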

Spread Measures

Range is the difference between the maximum and minimum data
values: $R(x) = x_{\max} - x_{\min}$
Variance or Standard Deviation tells us how individual data points
are spread around the mean:
$V(x) = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n}$
The standard deviation is the square root of the variance, $\sqrt{V(x)}$.

If the graph represents the delivery time of an item to different customers, then
individual customers' experiences of the delivery time will differ.
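
A minimal sketch of the spread measures, assuming a small set of hypothetical delivery times (in days) that is not taken from the slides. `pvariance` and `pstdev` from the standard library divide by n, matching the variance formula on this slide.

```python
from statistics import pstdev, pvariance

# Hypothetical delivery times in days (an assumption for illustration only).
times = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(times) - min(times)   # R(x) = x_max - x_min
variance = pvariance(times)            # mean squared deviation, divisor n
std_dev = pstdev(times)                # square root of the variance

print(data_range, variance, std_dev)
```

Two data sets can share the same mean and still differ a lot in range and variance, which is why location alone does not describe a customer's likely experience.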
Different Shapes

Symmetric data: Mean = Median = Mode
Positively skewed data (typically): Mean > Median > Mode
Negatively skewed data (typically): Mean < Median < Mode

Inter Quartile Range

The lower quartile Q1 is the value such that 1/4th of the observations
fall below it and 3/4ths fall above it.
The middle quartile Q2 is the median
The third quartile Q3 is the value such that 3/4ths of the
observations fall below it and 1/4th above it.
The Inter Quartile Range (IQR) is the difference between the third
quartile and the first quartile.
Thus IQR = Q3 − Q1
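
The quartiles and the IQR can be computed as in the sketch below. The data are hypothetical, and the choice of `method="inclusive"` in `statistics.quantiles` is an assumption: several quartile conventions exist and they can give slightly different values on small samples.

```python
from statistics import median, quantiles

# Hypothetical sample (an assumption for illustration only).
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

# n=4 asks for the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)
print(q2 == median(data))   # the middle quartile is the median
```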

Box Plot

Box plots are a simple means of providing a useful picture of how the
data are distributed.
To draw a box plot:
Determine Q1, Q3 and IQR; the box extends from Q1 to Q3
A line is drawn at the median to divide the box
Two lines, known as whiskers, are drawn outward from the box.
One line extends from the top edge of the box at Q3 to either xmax or
Q3 + 1.5 × IQR, whichever is lower. Another line extends downward from
the bottom edge of the box at Q1 to either xmin or Q1 − 1.5 × IQR,
whichever is greater.
The end points of the whiskers are known as the upper and lower adjacent
values
Values that fall outside the adjacent values are candidates for
consideration as outliers. They are plotted as small circles (◦).
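
The whisker rule described above can be sketched numerically. The function name `box_plot_summary`, the sample data, and the `"inclusive"` quartile convention are all assumptions made for this illustration; the whisker end points follow the rule as stated on this slide.

```python
from statistics import quantiles


def box_plot_summary(data):
    """Return the quantities needed to draw a box plot by the rule above."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Upper whisker ends at x_max or Q3 + 1.5*IQR, whichever is lower.
    upper = min(max(data), q3 + 1.5 * iqr)
    # Lower whisker ends at x_min or Q1 - 1.5*IQR, whichever is greater.
    lower = max(min(data), q1 - 1.5 * iqr)
    outliers = [x for x in data if x < lower or x > upper]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "lower_whisker": lower, "upper_whisker": upper,
            "outliers": outliers}


# Hypothetical sample with one unusually large value.
summary = box_plot_summary([1, 3, 4, 6, 7, 8, 10, 12, 30])
print(summary)
```

In this sample the value 30 lies beyond the upper whisker, so it would be plotted separately as a candidate outlier.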

Probability Distribution

Probability

Probability is a mathematical language for quantifying uncertainty.
Some terminology:
The sample space Ω is the set of possible outcomes of an experiment
Points ω in Ω are called sample outcomes
Subsets of Ω are called events
Example: A coin is tossed twice. Then Ω = {HH, HT, TH, TT}. The
event A that the first toss is a head is A = {HH, HT}

Definition of Probability

A function P that assigns a real number P(A) to each event A is a
probability distribution or a probability measure if it satisfies the
following three axioms:
Axiom 1: $P(A) \ge 0$ for every A
Axiom 2: $P(\Omega) = 1$
Axiom 3: If $A_1, A_2, \dots$ are pairwise disjoint then
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

Few Properties of Probability

$P(\phi) = 0$
$A \subset B \Rightarrow P(A) \le P(B)$
$0 \le P(A) \le 1$
$P(A^c) = 1 - P(A)$
$A \cap B = \phi \Rightarrow P(A \cup B) = P(A) + P(B)$
For any two events A and B, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
If Ω is finite and each outcome is equally likely, then $P(A) = \frac{|A|}{|\Omega|}$,
where |.| denotes the cardinality of a set

Independent Events, Conditional Probability

Two events A and B are independent if $P(AB) = P(A)P(B)$, where AB denotes $A \cap B$

The conditional probability of event A given B is defined as
$P(A \mid B) = \frac{P(AB)}{P(B)}$, if $P(B) > 0$

If A and B are independent (and $P(B) > 0$) then $P(A \mid B) = P(A)$
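
These definitions can be checked on the two-toss sample space Ω = {HH, HT, TH, TT} from the earlier slide, assuming a fair coin so that all four outcomes are equally likely. The event B ("second toss is a head") and the helper `prob` are introduced only for this sketch.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: HH, HT, TH, TT (assumed equally likely).
omega = ["".join(t) for t in product("HT", repeat=2)]


def prob(event):
    """P(event) = |event| / |omega| for equally likely outcomes."""
    return Fraction(len(event), len(omega))


A = {w for w in omega if w[0] == "H"}   # first toss is a head
B = {w for w in omega if w[1] == "H"}   # second toss is a head (assumed event)

p_ab = prob(A & B)                       # P(AB)
independent = p_ab == prob(A) * prob(B)  # product rule for independence
p_a_given_b = p_ab / prob(B)             # P(A|B) = P(AB) / P(B)

print(independent, p_a_given_b, prob(A))
```

Using `Fraction` keeps the probabilities exact, so the equality test for independence is not affected by floating-point rounding.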

Random Variables

A random variable (r.v.) is a mapping X : Ω → R that assigns a real
number X(ω) to each outcome ω
Example: A coin is tossed three times.
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Let X be
the number of heads in the three tosses

ω         HHH  HHT  HTH  HTT  THH  THT  TTH  TTT
X(ω)      3    2    2    1    2    1    1    0
P({ω})    1/8  1/8  1/8  1/8  1/8  1/8  1/8  1/8

The probability distribution of the random variable X is given by

x          0    1    2    3
P(X = x)   1/8  3/8  3/8  1/8
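
The distribution of X can be derived mechanically by enumerating the eight outcomes and counting heads. The sketch assumes a fair coin, as the 1/8 probabilities in the example imply.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 outcomes of three tosses, e.g. "HHT".
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# X(omega) = number of heads; count how many outcomes map to each value.
counts = Counter(w.count("H") for w in outcomes)

# Each outcome has probability 1/8, so P(X = x) = (number of outcomes) / 8.
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```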

Probability Distribution of a RV

If X is a discrete r.v., $f(x) = P(X = x)$ is called the probability mass
function (p.m.f.) of X
For a function f(x) to be the p.m.f. of a r.v. X it must satisfy
i) $f(x) \ge 0$
ii) $\sum_i f(x_i) = 1$, where $x_1, x_2, \dots$ are all possible values taken by X
If the r.v. X is continuous and there exists a function f(x) such that
$f(x) \ge 0$ for all x, $\int_{-\infty}^{\infty} f(x)\,dx = 1$, and for every $a \le b$,
$P(a < X < b) = \int_a^b f(x)\,dx$,
then f(x) is called the probability density function
(pdf) of X

Expectation and Variance

The expected value / mean / first moment of X:

$$\mu = E(X) = \begin{cases} \sum_x x f(x) & \text{if } X \text{ is discrete} \\ \int x f(x)\,dx & \text{if } X \text{ is continuous} \end{cases}$$

If there is a large number of iid draws $X_1, X_2, \dots, X_n$ from the
probability distribution of X then $E(X) \approx \sum_{i=1}^{n} X_i / n$ (Weak Law of
Large Numbers)
Any function $Y = r(X)$ of a r.v. X is also a r.v. and
$E(Y) = \int r(x) f(x)\,dx$
The variance of a r.v. X, denoted by $\sigma^2$, is defined by
$V(X) = E(X - \mu)^2 = \int (x - \mu)^2 f(x)\,dx$
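
For a discrete r.v. the integrals become sums over the p.m.f. The sketch below applies this to the number of heads in three fair coin tosses (the p.m.f. from the random-variable example); the helper name `expectation` is an assumption made for illustration.

```python
from fractions import Fraction

# p.m.f. of X = number of heads in three fair coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8),
       2: Fraction(3, 8), 3: Fraction(1, 8)}


def expectation(pmf, r=lambda x: x):
    """E[r(X)] = sum over x of r(x) * f(x) for a discrete r.v."""
    return sum(r(x) * p for x, p in pmf.items())


mu = expectation(pmf)                              # E(X)
var = expectation(pmf, lambda x: (x - mu) ** 2)    # E(X - mu)^2
second_moment = expectation(pmf, lambda x: x * x)  # E(X^2)

print(mu, var, second_moment - mu ** 2)
```

The last printed value uses the shortcut $V(X) = E(X^2) - \mu^2$, which agrees with the definition.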

Binomial Distribution: Example

Suppose a company producing bulbs packs them in lots of size 10.
The company has a 5% chance of producing a defective bulb.
Therefore each lot may contain some defective bulbs.
Define X to be the number of defective bulbs in a randomly chosen
lot.
What are the values X can take?
X may take value $x \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$
Check $P(X = x) = \binom{10}{x} (0.05)^x (1 - 0.05)^{10-x}$ for $x \in \{0, 1, \dots, 10\}$
(assuming bulbs are defective independently of one another)
Check $\sum_{x=0}^{10} \binom{10}{x} (0.05)^x (1 - 0.05)^{10-x} = 1$
$X \sim \text{Binomial}(10, 0.05)$
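
The two "Check" steps can be verified numerically. This is a minimal sketch using only the standard library; the function name `binomial_pmf` is an illustrative choice, and the "at least one defective" probability is an extra quantity computed here, not one the slides ask for.

```python
from math import comb

n, p = 10, 0.05   # lot size and defect probability from the example


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
total = sum(probs)                 # should be 1 up to rounding error
p_at_least_one = 1 - probs[0]      # chance a lot has any defective bulb

for x, f in enumerate(probs):
    print(f"P(X = {x:>2}) = {f:.6f}")
print(f"sum = {total:.12f}, P(X >= 1) = {p_at_least_one:.4f}")
```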

Binomial Distribution

$X \sim \text{Binomial}(n, p)$, where n is a positive integer and $0 < p < 1$, if
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n-x}$ for $x \in \{0, 1, \dots, n\}$
Check $E(X) = \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n-x} = np$
Check $V(X) = E(X^2) - (np)^2 = np(1 - p)$
It is a symmetric distribution if $p = 1/2$
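
A numerical check of the mean, the variance and the symmetry claim, sketched with the standard library only. The helper names and the particular values n = 10, p = 0.05 (borrowed from the bulb example) and n = 8 for the symmetry test are assumptions made for illustration.

```python
from math import comb, isclose


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_moments(n, p):
    """Return (E(X), V(X)) computed directly from the p.m.f."""
    pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(x * f for x, f in enumerate(pmf))
    second = sum(x * x * f for x, f in enumerate(pmf))   # E(X^2)
    return mean, second - mean ** 2


mean, var = binomial_moments(10, 0.05)
print(mean, var)   # compare with np = 0.5 and np(1 - p) = 0.475

# Symmetry when p = 1/2: P(X = x) equals P(X = n - x).
symmetric = all(isclose(binomial_pmf(x, 8, 0.5), binomial_pmf(8 - x, 8, 0.5))
                for x in range(9))
print(symmetric)
```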

Poisson Distribution

X is said to have a Poisson distribution with parameter $\lambda > 0$ if
$P(X = x) = e^{-\lambda} \frac{\lambda^x}{x!}$ for $x \in \{0, 1, 2, \dots\}$
Check $\sum_{x=0}^{\infty} e^{-\lambda} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1$
The Poisson distribution can be seen as an approximation to the Binomial
distribution when $n \to \infty$ and $p \to 0$ such that $np$ stays finite ($np \to \lambda$).
Example: Let X be the number of accidents at a particular
traffic signal in a year. Suppose the signal has a history of 5
accidents per year on average. Then $X \sim \text{Poisson}(5)$

Check $E(X) = \sum_{x=0}^{\infty} x e^{-\lambda} \frac{\lambda^x}{x!} = \lambda$
$V(X) = E(X - \lambda)^2 = \lambda$
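
The Poisson checks can be approximated numerically by truncating the infinite sums. In this sketch λ = 5 comes from the traffic-signal example, while the truncation point (100 terms) and the binomial parameters n = 1000, p = 0.005 used for the comparison are assumptions chosen for illustration.

```python
from math import comb, exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam ** x / factorial(x)


lam = 5
xs = range(100)   # truncation: terms beyond x = 99 are negligible for lam = 5

total = sum(poisson_pmf(x, lam) for x in xs)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(total, mean, var)

# Binomial(n, p) with large n, small p and np = 5, compared at x = 5.
n, p = 1000, 0.005
binom_at_5 = comb(n, 5) * p ** 5 * (1 - p) ** (n - 5)
print(binom_at_5, poisson_pmf(5, lam))
```

The two probabilities at x = 5 agree to about three decimal places, which illustrates the approximation without proving it.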

Normal Distribution

Most frequently used probability distribution in Statistics
Many natural phenomena approximately follow the Normal distribution, e.g.
human characteristics such as weights, heights and IQs
A continuous r.v. X is said to follow the Normal distribution with
parameters $\mu$ and $\sigma^2$ if it has pdf
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
Symmetric distribution: Mean = Median = Mode = $\mu$
Variance = $\sigma^2$

Standard Normal Distribution

If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$. Z is then called the Standard
Normal Variable
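
Standardization can be illustrated with `statistics.NormalDist` from the Python standard library. The parameters µ = 100, σ = 15 and the point x = 130 are hypothetical values chosen for this sketch, and `normal_pdf` is a hand-written version of the density formula used only as a cross-check.

```python
from math import exp, isclose, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) written out from the formula."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


mu, sigma = 100, 15     # hypothetical parameters (assumption)
x = 130

z = (x - mu) / sigma    # standardized value Z = (X - mu) / sigma

dist = NormalDist(mu, sigma)
std = NormalDist()      # standard normal N(0, 1)

# P(X <= x) computed directly and through the standard normal variable.
p_direct = dist.cdf(x)
p_via_z = std.cdf(z)

print(z, p_direct, p_via_z)
print(isclose(normal_pdf(x, mu, sigma), dist.pdf(x)))
```

Because every normal probability reduces to a statement about Z, a single table (or function) for N(0, 1) is enough.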

Central Limit Theorem

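Theorem (Central Limit Theorem)

Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with mean $\mu$ and variance
$\sigma^2$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then

$$Z_n = \frac{\bar{X}_n - \mu}{\sqrt{V(\bar{X}_n)}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} Z \sim N(0, 1)$$

that is, $Z_n$ converges in distribution to a standard normal variable as $n \to \infty$.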
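
A small simulation can make the theorem plausible, though it is not a proof. The sketch draws from Uniform(0, 1), which has mean 1/2 and variance 1/12; the sample size n = 50, the 2000 replications and the fixed seed are arbitrary assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

rng = random.Random(0)            # fixed seed so the run is repeatable
n, reps = 50, 2000                # sample size and number of replications
mu, sigma = 0.5, sqrt(1 / 12)     # mean and sd of Uniform(0, 1)


def standardized_mean(n):
    """Z_n = sqrt(n) * (sample mean - mu) / sigma for one sample of size n."""
    xbar = fmean(rng.random() for _ in range(n))
    return sqrt(n) * (xbar - mu) / sigma


zs = [standardized_mean(n) for _ in range(reps)]

z_mean = fmean(zs)                # should be near 0
z_var = pvariance(zs)             # should be near 1
share_within_2 = sum(abs(z) < 2 for z in zs) / reps   # about 0.95 for N(0, 1)

print(z_mean, z_var, share_within_2)
```

Even though the individual draws are uniform rather than normal, the standardized sample means behave approximately like N(0, 1) draws.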

Thank You
