09/24/2020
Data Analysis Course Plan
• Introduction and Setup (Session I)
What is Data Analysis? Randomness as a rule for life?
Tools that will be used throughout the course
Data: Data banks, Data files, etc.
• Basic Statistics (reminder) (Session I)
Random Numbers, Probability Theory
Basic Statistical measures: Mean, Median, Mode, variance, Standard Deviation, etc.
Correlation
Some Statistical Distributions and their usage
• Data Analysis (Session II)
Data Matrix
Data Analysis life cycle
Typology of Data Analysis Techniques
Data Preparation
• Visual Data Analysis (Session III)
• Exploratory Methods of Data Analysis (Session IV)
Principal Component Analysis
Clustering Analysis
• Inferential / Predictive Data Analysis (Session V)
Regression Analysis
Analysis of Variance
• Other techniques of Data Analysis (Session VI)
Time Series Analysis
Spatial Data Analysis
Case Based Reasoning
Etc.
©yasser.elshayeb@aucegypt.edu
Data Analysis Course Plan
• Sessions:
Session I: August 8th 2020
Session II: August 11th 2020
Session III: August 15th 2020
Session IV: August 18th 2020
Session V: August 22nd 2020
Session VI: August 25th 2020
• Evaluation:
Quizzes during the sessions and after
End of course project
• Tools:
Excel
Tableau online
R
Data Analysis Plan: Session II
• Statistics & Probability
• The Data Matrix
• Data Preparation
• Geometry of the Data Matrix
• Typology and Cycle of Data Analysis
Data Analysis Sampling
Data Analysis Probability I
Data Analysis Probability II
• Mutually exclusive events: P(E1 AND E2) = 0, so P(E1 OR E2) = P(E1) + P(E2)
• Independent events: P(E1 AND E2) = P(E1) * P(E2)
• Any two events (general addition rule): P(E1 OR E2) = P(E1) + P(E2) - P(E1 AND E2)
• Conditional probability: the probability of event A occurring given that event B
has already occurred is denoted by P(A|B) and is calculated
as P(A|B) = P(A AND B) / P(B), provided P(B) > 0
• Complementary events: P(A) = 1 - P(A')
• P(at least one of two independent events occurs) = P(A OR B) =
1 - P(A' AND B') = 1 - P(A') * P(B')
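The rules above can be checked by brute-force enumeration. The sketch below assumes a sample space of two fair dice; the dice example and the helper names (`prob`, `both`, `either`, `negate`) are illustrative choices and are not taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: the 36 equally likely outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate over outcomes."""
    favourable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favourable, len(OUTCOMES))

def both(a, b):
    return lambda o: a(o) and b(o)      # A AND B

def either(a, b):
    return lambda o: a(o) or b(o)       # A OR B

def negate(a):
    return lambda o: not a(o)           # A' (complement)

first_six = lambda o: o[0] == 6         # independent of second_six
second_six = lambda o: o[1] == 6
total_two = lambda o: sum(o) == 2       # mutually exclusive with total_twelve
total_twelve = lambda o: sum(o) == 12

# Mutually exclusive events: P(E1 AND E2) = 0, P(E1 OR E2) = P(E1) + P(E2)
p_exclusive_and = prob(both(total_two, total_twelve))
p_exclusive_or = prob(either(total_two, total_twelve))

# Independent events: P(E1 AND E2) = P(E1) * P(E2)
p_independent_and = prob(both(first_six, second_six))

# General addition rule and conditional probability
p_or = prob(either(first_six, second_six))
p_first_given_second = prob(both(first_six, second_six)) / prob(second_six)

# At least one six, via complements: 1 - P(A') * P(B')
p_at_least_one = 1 - prob(negate(first_six)) * prob(negate(second_six))

print(p_exclusive_or, p_independent_and, p_or, p_first_given_second, p_at_least_one)
```

Using `Fraction` keeps every probability exact, so each identity can be compared with `==` rather than with a floating-point tolerance.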
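Data Analysis Central Limit Theorem (and the Law of Large Numbers)
• Central Limit Theorem: the mean of a large number of independent observations is approximately NORMALLY distributed, whatever the distribution of the individual observations (provided their variance is finite)
• Law of Large Numbers: as the number of observations grows, the sample mean converges to the population mean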
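A small simulation can make this visible. The sketch below assumes draws from a Uniform(0, 1) distribution, a sample size of 50, 2000 repetitions, and a fixed seed; all of these are illustrative choices, not values from the course.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Return `trials` sample means, each computed from `n` Uniform(0, 1) draws."""
    return [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(trials)
    ]

rng = random.Random(42)                  # fixed seed so the run is repeatable
means = sample_means(n=50, trials=2000, rng=rng)

centre = statistics.fmean(means)         # theory: 0.5, the Uniform(0, 1) mean
spread = statistics.stdev(means)         # theory: sqrt(1/12) / sqrt(50)

# For a normal distribution about 68% of values lie within one
# standard deviation of the mean; the sample means should be close to that.
within_one_sd = sum(abs(m - centre) <= spread for m in means) / len(means)

print(round(centre, 3), round(spread, 4), round(within_one_sd, 3))
```

The individual draws are flat (uniform), not bell-shaped, yet their means cluster around 0.5 with roughly the spread and coverage a normal distribution predicts.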
Data Analysis Binomial Distribution I
• Concerns cases where each experiment has only two possible outcomes
(success or failure) and the experiments are independent of one another
• IF the probability of success is P(S) = S, THEN
the probability of failure is P(F) = F = 1 - S
• P(x) = proportion of batches of n experiments with x successes:
$$P(x) = \frac{n!}{x!\,(n-x)!}\, S^{x} (1-S)^{n-x}, \qquad x = 0, 1, \ldots, n$$
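The binomial formula translates directly into code. This is a minimal sketch using only the standard library; the function name `binomial_pmf` is an assumption, and the parameters S = 0.5625 and n = 8 are the ones quoted for the plot on the following slide.

```python
from math import comb

def binomial_pmf(x, n, s):
    """P(x) = n! / (x! (n - x)!) * S^x * (1 - S)^(n - x) for x = 0, 1, ..., n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * s ** x * (1 - s) ** (n - x)

n, s = 8, 0.5625
pmf = [binomial_pmf(x, n, s) for x in range(n + 1)]

total = sum(pmf)                                    # probabilities sum to 1
mean = sum(x * p for x, p in enumerate(pmf))        # equals n * S
most_likely = max(range(n + 1), key=lambda x: pmf[x])

for x, p in enumerate(pmf):
    print(x, round(p, 4))
```

`math.comb(n, x)` computes the coefficient n! / (x! (n - x)!) exactly, which avoids overflow from evaluating the factorials separately.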
Data Analysis Binomial Distribution II
[Figure: bar chart of the binomial probabilities P(x) for x = 0 to 8, with S = 0.5625 and n = 8; vertical axis from 0.00 to 0.30]
Data Analysis Poisson Distribution I
• Used as a model for the number of times an “event” occurs during
a specified time period or in a particular region of space.
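$$f(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$
• Mean of the Poisson distribution = λ, and λ = n·S > 0
IF
$$n \ge 100, \quad S \le 0.01, \quad \lambda = n \cdot S \le 20$$
THEN
$$\frac{n!}{x!\,(n-x)!}\, S^{x} (1-S)^{n-x} \approx \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$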
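The Poisson approximation to the binomial stated on this slide can be checked numerically. The sketch below takes the slide's values n = 100 and S = 0.01 as the test case; the function names are assumptions made for illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """f(x) = e^(-lambda) * lambda^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

def binomial_pmf(x, n, s):
    """P(x) = n! / (x! (n - x)!) * S^x * (1 - S)^(n - x)."""
    return comb(n, x) * s ** x * (1 - s) ** (n - x)

n, s = 100, 0.01
lam = n * s                      # lambda = n * S

# Largest absolute difference between the two distributions over x = 0..n.
max_gap = max(
    abs(binomial_pmf(x, n, s) - poisson_pmf(x, lam)) for x in range(n + 1)
)

# Mean of the Poisson distribution, summed over enough terms to be accurate.
poisson_mean = sum(x * poisson_pmf(x, lam) for x in range(n + 1))

for x in range(5):
    print(x, round(binomial_pmf(x, n, s), 4), round(poisson_pmf(x, lam), 4))
```

With these parameters the two columns agree to about two decimal places, which is the sense in which the Poisson distribution stands in for the binomial when n is large and S is small.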
Data Analysis Poisson Distribution II
[Figure: bar chart of Poisson probabilities f(x) for x = 0 to 12; vertical axis from 0.00 to 0.24]
Data Analysis Plan of the Presentation
• General Concepts:
The Data Matrix
Data Preparation
Geometry of the Data Matrix
Typology
• Uni-, Bi-, and Multivariate “Visual” Data Analysis
• Multivariate Data Analysis: Structure
Principal Component Analysis and Factor Analysis
Clustering Analysis
• Multivariate Data Analysis: Prediction
ANOVA
Regression Analysis
Discriminant Analysis
Canonical Analysis
• Special Analysis (Case Based Reasoning, Time Series, etc.)
Data Analysis Data Analysis:
General Concepts
Data Analysis The Data Matrix 1
• Contains a large number of Variables (Parameters) and Cases (Individuals)
• Independent variables (IVs) and dependent variables (DVs)
• All Variables are correlated with one another to varying degrees
• All Individuals are similar to one another to varying degrees
• Variables are:
Quantitative
o Continuous (Measurements)
o Discrete (Counts)
o Ordinal (Scale)
Qualitative (Descriptive)
• It is the result of Experimental and Non-Experimental (Correlational and
Survey) Research.
Data Analysis The Data Matrix 2
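[Figure: the data matrix X with n rows (Cases) and p columns (Variables), of size (n,p); a cell is addressed by its row and column (r,c)]
• Variables: Continuous, Categorical, Binary, Text, etc.
• Data: Measurements, Ranges, Chronologies, Proximities, Binaries, Frequencies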
Data Analysis Data Preparation
• Discretization
Transformation of continuous values to Ranges
• Coding
Transformation of Qualitative values to numerical values
• Normalization
Normalized value = (value - µ) / σ, where µ is the mean and σ the standard deviation (STD) of the variable
• Missing Values
Omission
Mean Value
Estimation
Etc.
• Outliers
Detection
Patterns
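The preparation steps listed above can be sketched in a few small functions. Everything here is illustrative: the function names, the use of `None` to mark a missing value, the population standard deviation, and the z-score cut-off for outliers are assumptions, not choices stated in the course.

```python
import statistics

def discretize(value, edges):
    """Discretization: index of the range containing `value` (edges ascending)."""
    return sum(value >= edge for edge in edges)

def encode(labels):
    """Coding: map each qualitative value to an integer, in order of appearance."""
    codes = {}
    for label in labels:
        codes.setdefault(label, len(codes))
    return [codes[label] for label in labels], codes

def impute_mean(values):
    """Missing values: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def normalize(values):
    """Normalization: (value - mean) / standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)       # population standard deviation (assumed)
    return [(v - mu) / sd for v in values]

def outliers(values, threshold=3.0):
    """Outlier detection: values whose |z-score| exceeds `threshold` (assumed rule)."""
    return [v for v, z in zip(values, normalize(values)) if abs(z) > threshold]

ages = impute_mean([23, 35, None, 41, 29])           # missing age becomes 32.0
age_band = discretize(35, edges=[30, 40])            # ranges: <30, 30-39, >=40
colour_codes, colour_map = encode(["red", "blue", "red"])
z_scores = normalize(ages)
flagged = outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 50], threshold=2.5)

print(ages, age_band, colour_codes, flagged)
```

Mean imputation and a z-score threshold are only the simplest options under "Missing Values" and "Outliers"; omission, model-based estimation, or pattern-based detection would each need a different function.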
Data Analysis Geometry of the Data Matrix
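$$X = \begin{pmatrix} x_{11} & \cdots & x_{1j} & \cdots & x_{1p} \\ \vdots & & \vdots & & \vdots \\ x_{i1} & \cdots & x_{ij} & \cdots & x_{ip} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nj} & \cdots & x_{np} \end{pmatrix}$$
• Each row i is an individual: a point $x_i = (x_{i1}, \ldots, x_{ij}, \ldots, x_{ip})$ in $R^p$
• Each column j is a variable: a point $x_j = (x_{1j}, \ldots, x_{ij}, \ldots, x_{nj})$ in $R^n$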
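The two readings of the data matrix, rows as points in R^p and columns as points in R^n, can be shown on a toy example. The 3 x 2 matrix below is hypothetical data chosen so that the distances come out as whole numbers.

```python
from math import dist

# Hypothetical data matrix: n = 3 individuals (rows), p = 2 variables (columns).
X = [
    [1.0, 2.0],
    [4.0, 6.0],
    [1.0, 3.0],
]
n, p = len(X), len(X[0])

individuals = [tuple(row) for row in X]   # n points, each in R^p
variables = list(zip(*X))                 # p points, each in R^n (the transpose)

# Distance between individuals 1 and 2, measured in the space of variables (R^p).
d_individuals = dist(individuals[0], individuals[1])

# Distance between variables 1 and 2, measured in the space of individuals (R^n).
d_variables = dist(variables[0], variables[1])

print(n, p, d_individuals, d_variables)
```

Similarity between individuals is a distance between rows, while the relationship between variables is read from the columns; the same numbers serve both views.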
Data Analysis Data Matrix: Basic Statistics
• Mean
• Variance and Standard Deviation
• Covariance
• Correlation and Matrix of Correlations
• Eigenvalues
• Eigenvectors
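The statistics listed above can be computed by hand for two variables. The sketch below uses sample formulas (dividing by n - 1) and made-up data, both of which are assumptions. For the eigenvalues it relies on the closed form for a 2 x 2 correlation matrix [[1, r], [r, 1]], whose eigenvalues are 1 + r and 1 - r; larger matrices would need a numerical library.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def variance(xs):
    return covariance(xs, xs)

def std_dev(xs):
    return math.sqrt(variance(xs))

def correlation(xs, ys):
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

def eigen_of_2x2_correlation(r):
    """Eigenvalue/eigenvector pairs of the correlation matrix [[1, r], [r, 1]]."""
    s = 1 / math.sqrt(2)
    return [(1 + r, (s, s)), (1 - r, (s, -s))]

# Hypothetical data: two variables measured on five individuals.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
correlation_matrix = [[1.0, r], [r, 1.0]]
eigen = eigen_of_2x2_correlation(r)

print(mean(x), variance(x), covariance(x, y), round(r, 4))
print([round(value, 4) for value, _ in eigen])
```

The eigenvalues sum to 2, the number of variables, and the larger one gives the share of total variance carried by the first principal axis.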
Data Analysis Typology 2
[Figure: classification tree of Multivariate Methods, branching on Metric versus Nonmetric measurement scales]
Methods named in the figure include: Structural Equation Modeling; Multiple Regression; Multiple Discriminant Analysis; MANOVA; Canonical Analysis; Canonical Analysis with Dummy Variable; Conjoint Analysis; Linear Probability Models; Factor Analysis; Cluster Analysis; Metric Multidimensional Scaling; Nonmetric Multidimensional Scaling; Correspondence Analysis
Data Analysis Data Analysis Strategy
[Figure: cycle diagram of the data analysis strategy]
Stages shown: Problem or Research Question; Data Preparation; Data Exploration; Synthesis and Patterns; Pattern Evaluation; Elaboration; Communication; Decision
Data Analysis
Thank you