
Data Analysis

Quantitative Data Analysis


for Undergraduate Students

Yasser Elshayeb, PhD


yasser.elshayeb@aucegypt.edu

09/24/2020
Data Analysis Course Plan
• Introduction and Setup (Session I)
 What is Data Analysis? Randomness as a rule for life?
 Tools that will be used throughout the course
 Data: Data banks, Data files, etc.
• Basic Statistics (reminder) (Session I)
 Random Numbers, Probability Theory
 Basic Statistical measures: Mean, Median, Mode, variance, Standard Deviation, etc.
 Correlation
 Some Statistical Distributions and their usage
• Data Analysis (Session II)
 Data Matrix
 Data Analysis life cycle
 Typology of Data Analysis Techniques
 Data Preparation
• Visual Data Analysis (Session III)
• Exploratory methods of data Analysis (Session IV)
 Principal Component Analysis
 Clustering Analysis
• Inferential / Predictive Data Analysis (Session V)
 Regression Analysis
 Analysis of Variance
• Other techniques of Data Analysis (Session VI)
 Time Series Analysis
 Spatial Data Analysis
 Case Based Reasoning
 Etc.
©yasser.elshayeb@aucegypt.edu
Data Analysis Course Plan
• Sessions:
 Session I: August 8th 2020
 Session II: August 11th 2020
 Session III: August 15th 2020
 Session IV: August 18th 2020
 Session V: August 22nd 2020
 Session VI: August 25th 2020
• Evaluation:
 Quizzes during the sessions and after
 End of course project
• Tools:
 Excel
 Tableau online
 R

Data Analysis Plan: Session II
• Statistics & Probability
• The Data Matrix
• Data Preparation
• Geometry of the Data Matrix
• Typology and Cycle of Data Analysis

Data Analysis Sampling

Data Analysis Probability I

$P(E) = \dfrac{\text{number of times } E \text{ occurs}}{\text{number of possible times}}$

Data Analysis Probability II
• Mutually exclusive events: P(E1 AND E2) = 0, so P(E1 OR E2) = P(E1) + P(E2)
• Independent events: P(E1 AND E2) = P(E1) * P(E2)
• General addition rule (any two events): P(E1 OR E2) = P(E1) + P(E2) - P(E1 AND E2)
• The conditional probability of event A occurring given that event B has already occurred is denoted by P(A|B) and can be calculated as P(A|B) = P(A AND B) / P(B), provided P(B) > 0
• Complementary events: P(A) = 1 - P(A’)
• P(at least one of two independent events occurs) = P(A OR B) = 1 - P(A’ AND B’) = 1 - P(A’) * P(B’)
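The rules above can be checked by enumerating a small sample space. The sketch below is only an illustration: it assumes two fair six-sided dice, and the events `A`, `B`, `C` and the helper `prob` are hypothetical names that do not come from the slides.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: the 36 equally likely outcomes of two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(E) = number of outcomes in E / number of possible outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

A = lambda o: o[0] == 6   # first die shows 6
B = lambda o: o[1] == 6   # second die shows 6 (independent of A)
C = lambda o: o[0] == 1   # first die shows 1 (mutually exclusive with A)

p_a, p_b, p_c = prob(A), prob(B), prob(C)
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_or_b = prob(lambda o: A(o) or B(o))
p_a_or_c = prob(lambda o: A(o) or C(o))
p_a_given_b = p_a_and_b / p_b   # conditional probability P(A|B)

print("P(A AND B) =", p_a_and_b, " P(A) * P(B) =", p_a * p_b)
print("P(A OR B)  =", p_a_or_b, " 1 - P(A') * P(B') =", 1 - (1 - p_a) * (1 - p_b))
print("P(A OR C)  =", p_a_or_c, " P(A) + P(C) =", p_a + p_c)
print("P(A|B)     =", p_a_given_b)
```

Exact fractions are used instead of floating-point numbers so that both sides of each rule can be compared for strict equality.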

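Data Analysis Central Limit Theorem (and the Law of Large Numbers)
• With large numbers the whole world looks NORMAL: the mean of many independent observations is approximately normally distributed, and it settles near the true mean as the sample grows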
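A small simulation can make this idea concrete. The sketch below is an illustration under stated assumptions: the uniform distribution on [0, 1), the sample size `n = 50`, the 5000 repetitions and the fixed seed are all arbitrary choices that are not taken from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable

def sample_mean(n):
    """Mean of n independent draws from the uniform distribution on [0, 1)."""
    return statistics.fmean(random.random() for _ in range(n))

n = 50                                      # size of each sample (assumed)
means = [sample_mean(n) for _ in range(5000)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# Theory: uniform(0, 1) has mean 1/2 and variance 1/12,
# so the standard deviation of a sample mean is sqrt(1 / (12 n)).
expected_sd = (1 / (12 * n)) ** 0.5

# Share of sample means within one theoretical standard deviation of 1/2;
# for a normal distribution this share is roughly 0.68.
within_one_sd = sum(abs(m - 0.5) <= expected_sd for m in means) / len(means)

print(mean_of_means, sd_of_means, expected_sd, within_one_sd)
```

Although a single uniform draw is far from bell-shaped, the sample means cluster around 0.5 with a spread close to the theoretical value, which is the behaviour the theorem describes.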

Data Analysis Binomial Distribution I
• Concerns cases where each experiment has only two possible outcomes (success or failure) and the experiments are independent
• IF the probability of success is P(S) = S, THEN the probability of failure is P(F) = F = 1 - S
• P(x) = proportion of batches of n experiments with x successes:

$P(x) = \dfrac{n!}{x!\,(n-x)!}\, S^{x} (1-S)^{n-x}, \quad x = 0, 1, \ldots, n$

• Mean of the binomial distribution = n.S
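The binomial formula can be evaluated directly. The sketch below is a minimal illustration: the function name `binomial_pmf` is hypothetical, and the values S = 0.5625 and n = 8 are the ones quoted on the Binomial Distribution II slide.

```python
from math import comb

def binomial_pmf(x, n, s):
    """P(x) = n! / (x! (n - x)!) * s**x * (1 - s)**(n - x)."""
    return comb(n, x) * s**x * (1 - s) ** (n - x)

n, s = 8, 0.5625
pmf = [binomial_pmf(x, n, s) for x in range(n + 1)]

total = sum(pmf)                               # probabilities should sum to 1
mean = sum(x * p for x, p in enumerate(pmf))   # should equal n * s

for x, p in enumerate(pmf):
    print(x, round(p, 4))
print("sum =", total, " mean =", mean, " n * S =", n * s)
```

Summing x * P(x) over all x reproduces the stated mean n.S, which is a quick sanity check on the formula.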

Data Analysis Binomial Distribution II

[Figure: bar chart of the binomial probabilities P(x) for x = 0 to 8, with S = 0.5625 and n = 8; vertical axis from 0.00 to 0.30]
Data Analysis Poisson Distribution I
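• Used as a model for the number of times an “event” occurs during a specified time period or in a particular region of space

$f(x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots, \quad \lambda > 0$

• Mean of the Poisson distribution = λ
• The Poisson distribution approximates the binomial with λ = n.S: IF n ≥ 100, S ≤ 0.01 and λ = n.S ≤ 20, THEN

$\dfrac{n!}{x!\,(n-x)!}\, S^{x} (1-S)^{n-x} \approx \dfrac{e^{-\lambda}\lambda^{x}}{x!}$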
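The approximation of the binomial by the Poisson distribution can be checked numerically. In the sketch below the function names and the values n = 1000, S = 0.005 are hypothetical choices, picked only to give a large n and a small S; they are not from the slides.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, s):
    """Binomial probability of x successes in n independent experiments."""
    return comb(n, x) * s**x * (1 - s) ** (n - x)

def poisson_pmf(x, lam):
    """f(x) = exp(-lam) * lam**x / x!"""
    return exp(-lam) * lam**x / factorial(x)

n, s = 1000, 0.005   # assumed values: large n, small S
lam = n * s          # lambda = n.S

# Largest pointwise gap between the two distributions for x = 0..20.
max_gap = max(
    abs(binomial_pmf(x, n, s) - poisson_pmf(x, lam)) for x in range(21)
)

for x in range(0, 11, 2):
    print(x, round(binomial_pmf(x, n, s), 5), round(poisson_pmf(x, lam), 5))
print("largest gap:", max_gap)
```

The two columns agree closely, which is why the Poisson form is often preferred when n is large: it needs only λ rather than n and S separately.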
Data Analysis Poisson Distribution II

[Figure: bar chart of Poisson probabilities for x = 0 to 12; vertical axis from 0.00 to 0.24]

Data Analysis Plan of the Presentation
• General Concepts:
 The Data Matrix
 Data Preparation
 Geometry of the Data Matrix
 Typology
• Uni-, Bi-, and Multi-Variate “Visual” Data Analysis
• MultiVariate Data Analysis: Structure
 Principal Component Analysis and Factor Analysis
 Clustering Analysis
• MultiVariate Data Analysis: Prediction
 ANOVA
 Regression Analysis
 Discriminant Analysis
 Canonical Analysis
• Special Analysis (Case Based Reasoning, Time Series, etc.)

Data Analysis Data Analysis:
General Concepts

Data Analysis The Data Matrix 1
• Contains a large number of Variables (Parameters) and Cases (Individuals)
• Independent variables (IVs) and dependent variables (DVs)
• All Variables are correlated with one another to varying degrees
• All Individuals are similar to one another to varying degrees
• Variables are:
 Quantitative
o Continuous (Measurements)
o Discrete (Counts)
o Ordinal (Scale)
 Qualitative (Descriptive)
• It is the result of Experimental and Non-Experimental (Correlational and Survey) Research

Data Analysis The Data Matrix 2
[Diagram: the data matrix X, with Cases as rows and Variables as columns; dimensions labelled (r, c) and (n, p)]
• Variables: Continuous, Categorical, Binary, Text, etc.
• Data: Measurements, Ranges, Chronologies, Proximities, Binaries, Frequencies, Multiple Modalities, Grades

Data Analysis Data Preparation
• Discretization
Transformation of continuous values to Ranges
• Coding
Transformation of Qualitative values to numerical values
• Normalization
Value = (value - µ) / STD, where µ is the mean and STD the standard deviation
• Missing Values
 Omission
 Mean Value
 Estimation
 Etc.
• Outliers
 Detection
 Patterns
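The preparation steps listed above can be sketched on a toy data set. Everything in the sketch is an assumption made for illustration: the `ages` and `colors` values, the range boundaries, the choice of mean imputation for the missing value, and the 2-standard-deviation outlier threshold are not prescribed by the slides.

```python
import statistics

# Hypothetical raw data: None marks a missing value.
ages = [23.0, 35.0, None, 41.0, 29.0, 120.0]
colors = ["red", "blue", "red", "green", "blue", "red"]

# Missing values: replace each gap with the mean of the observed values.
observed = [a for a in ages if a is not None]
mean_age = statistics.fmean(observed)
filled = [mean_age if a is None else a for a in ages]

# Discretization: map each continuous value to a range label.
def to_range(age):
    if age < 30:
        return "under 30"
    if age < 60:
        return "30-59"
    return "60+"

ranges = [to_range(a) for a in filled]

# Coding: map each qualitative value to an integer code.
codes = {label: i for i, label in enumerate(sorted(set(colors)))}
coded = [codes[c] for c in colors]

# Normalization: (value - mean) / standard deviation.
mu = statistics.fmean(filled)
sd = statistics.pstdev(filled)
z = [(a - mu) / sd for a in filled]

# Outlier detection (assumed rule): more than 2 standard deviations from the mean.
outliers = [a for a, score in zip(filled, z) if abs(score) > 2]

print(ranges)
print(coded)
print([round(score, 2) for score in z])
print(outliers)
```

The order of the steps matters here: the missing value is filled before normalization so that the mean and standard deviation are computed on a complete column.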

Data Analysis Geometry of the Data Matrix
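The data matrix X has n rows (cases) and p columns (variables); $x_{ij}$ is the value of variable j for case i:

$X = \begin{pmatrix} x_{11} & \cdots & x_{1j} & \cdots & x_{1p} \\ \vdots & & \vdots & & \vdots \\ x_{i1} & \cdots & x_{ij} & \cdots & x_{ip} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nj} & \cdots & x_{np} \end{pmatrix}$

• Each row i is a point $x_i = (x_{i1}, \ldots, x_{ij}, \ldots, x_{ip})$ in $R^p$
• Each column j is a point $x_j = (x_{1j}, \ldots, x_{ij}, \ldots, x_{nj})$ in $R^n$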
Data Analysis Data Matrix: Basic Statistics
• Mean
• Variance and Standard Deviation
• Covariance
• Correlation and Matrix of Correlations
• Eigenvalues
• Eigenvectors
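The statistics listed above can be computed for a small data matrix as follows. This is a sketch under stated assumptions: the matrix `X` is made-up data, the covariance uses the n - 1 denominator, and power iteration is used here only as a simple stand-in for finding the largest eigenvalue and its eigenvector. The slides do not name a method, and in practice a numerical library routine would normally be used.

```python
import math

# Hypothetical data matrix: n = 5 cases (rows), p = 3 variables (columns).
X = [
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 0.5],
    [4.0, 7.9, 1.5],
    [5.0, 10.1, 0.8],
    [6.0, 12.0, 1.2],
]
n, p = len(X), len(X[0])
cols = [[row[j] for row in X] for j in range(p)]

# Mean of each variable.
means = [sum(c) / n for c in cols]

def cov(a, b, ma, mb):
    """Sample covariance of two columns (n - 1 denominator)."""
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

# Covariance matrix, standard deviations and matrix of correlations.
C = [[cov(cols[i], cols[j], means[i], means[j]) for j in range(p)] for i in range(p)]
sds = [math.sqrt(C[j][j]) for j in range(p)]
R = [[C[i][j] / (sds[i] * sds[j]) for j in range(p)] for i in range(p)]

def dominant_eigenpair(M, iterations=500):
    """Power iteration: largest eigenvalue of M and its unit eigenvector."""
    size = len(M)
    v = [1.0] * size
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(a * b for a, b in zip(v, Mv))   # Rayleigh quotient
    return eigenvalue, v

lam1, v1 = dominant_eigenpair(R)

print("means:", means)
print("correlations:", [[round(r, 3) for r in row] for row in R])
print("largest eigenvalue:", round(lam1, 4), "eigenvector:", [round(x, 3) for x in v1])
```

Because the eigenvalues of a correlation matrix sum to p, the largest eigenvalue divided by p gives the share of total variance carried by the first principal direction.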

Data Analysis Typology 2
[Diagram: classification tree of Multivariate Methods]
• Dependence Methods, split by the number of dependent variables (One Dependent Variable, Several Dependent Variables, Multiple Dependent and Independent Variable Relationships) and by whether the variables are Metric or Nonmetric
 Methods shown: Multiple Regression, Multiple Correlation, Multiple Discriminant Analysis, Linear Probability Models, Conjoint Analysis, MANOVA, Metric Canonical Analysis, Nonmetric Canonical Analysis with Dummy Variables, Structural Equation Modeling
• Interdependence Methods, split into Metric and Nonmetric
 Methods shown: Factor Analysis, Cluster Analysis, Metric Multidimensional Scaling, Nonmetric Multidimensional Scaling, Correspondence Analysis
Data Analysis Data Analysis Strategy
[Diagram: the data analysis cycle, with the following stages]
• Problem or Research Question
• Data Collection
• Data Preparation
• Data Exploration
• Synthesis and Patterns
• Pattern Evaluation
• Prediction and Classification
• Elaboration and Communication
• Decision
Data Analysis
Thank you
