ASST4403
Lecture 12: DATA ANALYSIS
Learning outcomes
Present data visually and numerically, e.g. with a histogram
Identify distributions from data by means of, e.g., a histogram
Perform simple linear regression
How confident are we? How much can we trust the results? How well have we done?
Estimating parameters
Histogram
Frequency distribution
Frequency distribution: data presented as class intervals and their corresponding frequencies
Range: the difference between the largest and the smallest data values
Number of classes (bins): Sturges' rule: select a bin size such that there are about $1 + \log_2 n$ non-empty bins ($n$ is the number of data values)
Class midpoint: average of the class endpoints
Relative frequency: the ratio of the frequency of the class interval to the total frequency
Cumulative frequency: running total of the class frequencies of the frequency distribution
Class values: 6.5 to 8.8 in steps of 0.1
Frequencies (in table order): 0 8 0 1 0 0 15 0 14 0 0 3 0 3 0 0 9 0 3 0 0 2 1 0 1
n=60 data values, range = 2.3, class width=0.1, number of classes (bins) = 25
[Figure: histogram of frequency against Variable, 6.5 to 8.8]
(If we use Sturges' rule, the number of classes (bins) is $1 + \log_2 n = 1 + \log_2 60 \approx 7$, so the class width should be $2.3/7 \approx 0.33$)
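The Sturges' rule arithmetic on this slide can be checked with a short sketch. The function names `sturges_bins` and `class_width` are illustrative only, and rounding to the nearest integer is an assumption that matches the rounded values quoted on the slides.

```python
import math

def sturges_bins(n):
    """Number of bins suggested by Sturges' rule, 1 + log2(n), rounded to the nearest integer (assumed rounding)."""
    return round(1 + math.log2(n))

def class_width(data_range, n_bins):
    """Equal class width needed to cover data_range with n_bins classes."""
    return data_range / n_bins

# Values from the slide: n = 60 data values, range = 2.3
bins_60 = sturges_bins(60)
width_60 = class_width(2.3, bins_60)
print(bins_60, round(width_60, 2))
```

The same two functions apply to any other data set once its size and range are known.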
Histogram
Graph of relative frequencies, representing the underlying distribution (PDF).
Construct a histogram:
Sort data in ascending order
Find data range (max - min)
Decide on the number of intervals (bins) of equal size and the bin size (trial and error, there is no best number)
Sturges' rule: select a bin size such that there are about $1 + \log_2 n$ non-empty bins ($n$ is the number of samples)
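The construction steps above can be sketched in plain Python. The data in `sample` are hypothetical values made up for illustration (the slides do not list raw data), `build_histogram` is an illustrative name, and using Sturges' rule as the default bin count is only one possible starting point for the trial and error the slide mentions.

```python
import math

def build_histogram(data, n_bins=None):
    """Return (bin edges, counts) for equal-width bins over the data range."""
    values = sorted(data)                      # step 1: sort ascending
    lo, hi = values[0], values[-1]
    data_range = hi - lo                       # step 2: range = max - min
    if data_range == 0:
        raise ValueError("all values are identical; no range to bin")
    if n_bins is None:                         # step 3: choose number of bins
        n_bins = round(1 + math.log2(len(values)))  # Sturges' rule (assumed default)
    width = data_range / n_bins
    counts = [0] * n_bins
    for x in values:
        # The largest value falls exactly on the last edge; keep it in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical sample, not taken from the lecture
sample = [12, 15, 11, 19, 22, 14, 13, 18, 25, 16, 17, 14, 21, 13, 15, 30]
edges, counts = build_histogram(sample)
relative = [c / len(sample) for c in counts]   # relative frequencies for plotting
print(counts)
```

Passing a different `n_bins` and comparing the resulting shapes is a simple way to see how sensitive the picture is to the bin choice.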
Histogram
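$$f(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}$$
[Figure: histogram with fitted normal PDF; the value 100 appears on the plot]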
Reproduced with courtesy from Jo Sikorska
Histogram
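$$f(t) = \lambda e^{-\lambda t}$$
[Figure: histogram with fitted exponential PDF; the value 0.5 appears on the plot]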
Reproduced with courtesy from Jo Sikorska
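To compare a histogram of relative frequencies with a candidate distribution, the density can be evaluated at the class midpoints. The sketch below assumes the two densities shown on these slides are the standard normal and exponential PDFs; the parameter names `mu`, `sigma` and `lam` and the function names are illustrative.

```python
import math

def normal_pdf(t, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def exponential_pdf(t, lam):
    """Exponential density with rate lam; zero for negative t."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

# Evaluate each density at a few points (parameter values chosen for illustration)
normal_values = [normal_pdf(t, mu=0.0, sigma=1.0) for t in (-1.0, 0.0, 1.0)]
exponential_values = [exponential_pdf(t, lam=0.5) for t in (0.0, 1.0, 2.0)]
print(normal_values)
print(exponential_values)
```

A density value multiplied by the class width approximates the relative frequency expected in that class, which is what makes the visual comparison meaningful.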
Lower End   Upper End   Frequency
-           999         30
1000        1999        4
2000        -           1
[Figure: histogram of Frequency for these classes]
Class values: 20 to 1790 in steps of 118
Frequencies (in table order): 1 4 9 4 6 4 0 2 0 1 0 1 0 1 1 0 1
[Figure: histogram of Frequency against Variable, 20 to 1790]
6 classes (proper)
Frequencies (in class order): 18 12 1 3 0 1
[Figure: histogram of Frequency for the 6 classes]
An exponential distribution?
n=35 data values, range = 2005, class width=400, number of classes (bins) = 6
(Using Sturges' rule, the number of classes (bins) is $1 + \log_2 n = 1 + \log_2 35 \approx 6$, so the class width should be $2005/6 \approx 334$)
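As a small illustration of the relative and cumulative frequency definitions given earlier, the sketch below applies them to the six class frequencies listed for this example (18, 12, 1, 3, 0, 1). The variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies of the 6-class example (n = 35)
frequencies = [18, 12, 1, 3, 0, 1]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total frequency
relative = [f / total for f in frequencies]

# Cumulative frequency: running total of the class frequencies
cumulative = list(accumulate(frequencies))

for f, rel, cum in zip(frequencies, relative, cumulative):
    print(f"{f:3d}  {rel:.3f}  {cum:3d}")
```

The steep drop after the first two classes is the feature that prompts the question of whether an exponential distribution fits.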
Example: histogram
A normal distribution?
Simple regression
Process of constructing a mathematical model or function to predict/determine one variable by another
Simple regression = linear regression, two variables
Dependent variable = the variable to be predicted, y
Independent variable (explanatory variable) = predictor, x
How well does it fit? Find the coefficient of correlation r (as close to 1 as possible)
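$$y = mx + b$$
[Figure: straight line with intercept $b$ on the y-axis and slope $m$, the tangent of the line's angle to the x-axis]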
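The slide does not say how $m$ and $b$ are chosen. Assuming the usual ordinary least squares fit, with $\bar{x}$ and $\bar{y}$ denoting the sample means (notation introduced here, not on the slide), the slope, intercept and correlation coefficient can be written as:

```latex
\begin{align}
m &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
b &= \bar{y} - m\,\bar{x} \\
r &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \,\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\end{align}
```

Under this choice the fitted line always passes through the point $(\bar{x}, \bar{y})$, and $r$ has the same sign as $m$.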
Example
http://phoenix.phys.clemson.edu/tutorials/excel/regression.html
Example
Individual   Annual income ($000)   Weekly time on National Direct Calls (minutes)
1            23                     69
2            29                     95
3            29                     102
4            35                     118
5            42                     126
6            46                     125
7            50                     138
8            54                     178
9            64                     156
10           66                     184
11           76                     176
12           78                     225
Slope Intercept r
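One way to obtain the slope, intercept and r for this table is an ordinary least squares fit, sketched below in plain Python. Treating annual income as the independent variable x and weekly call time as the dependent variable y is an assumption based on the axis label of the slide's plot; `linear_regression` is an illustrative name.

```python
import math

# Data from the table: annual income ($000) and weekly call time (minutes)
income = [23, 29, 29, 35, 42, 46, 50, 54, 64, 66, 76, 78]
minutes = [69, 95, 102, 118, 126, 125, 138, 178, 156, 184, 176, 225]

def linear_regression(x, y):
    """Least squares fit of y = m*x + b; returns (slope, intercept, r)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r = sxy / math.sqrt(sxx * syy)   # coefficient of correlation
    return slope, intercept, r

slope, intercept, r = linear_regression(income, minutes)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, r = {r:.3f}")
```

Spreadsheet functions such as those in the tutorial linked above should give the same three numbers for the same data, which makes this a convenient cross-check.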
[Figure: plot of the data; axis label: Annual income ($000)]