Lecture 7
László Pótó
From the sample to the population…
Remember: biometrics is about drawing conclusions from
the collected data (the sample) about the unknown population.
Typical questions:
- Do given lab data (of a group of patients) differ from the
„healthy” value? (What is the expected value for healthy people?)
- Is a measuring tool/process accurate enough (pipette, drug content
of pills, box of sugar, and so on…)?
- Does a complete series of measurements prove that the
values are over a certain limit (air or water pollution, …)?
The problem: how to draw a conclusion from x̄ and s_x to µ and σ?
x̄ and s_x (and ‘n’, so the measures of the sample) are known…, but
what about µ and σ? So: which population does the sample come from?
Two methods: - estimation
- hypothesis testing
The estimation
Point estimation: x̄ → µ and s_x → σ
We did this when we supposed, based on the body height data of the 50 students,
that µ = 170 cm and σ = 8 cm for the population.
Interval estimation:
x̄, s_x → a, b (two values), so that we can say that
µ is inside the (a, b) interval with a given probability.
This „given probability” is the confidence of the estimation.
It is chosen close to 100%, like 99%, 95%, 90%.
Some such intervals with a given confidence are already known:
for the body height data (µ = 170 cm, σ = 8 cm), from last week's table:
µ ± 2σ: 154-186 cm (95%)       µ ± 3σ: 146-194 cm (99.7%)
for the means of samples of 16 data (µ = 170 cm, σ/√n = 2 cm):
µ ± 2σ/√n: 166-174 cm (95%)    µ ± 3σ/√n: 164-176 cm (99.7%)
[Figure: distribution of the sample means with the 68%, 95% and 99.7% ranges, marked in units of z × σ/√n]
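The intervals quoted above can be recomputed directly. The sketch below is only an illustration and not part of the lecture; the function name `range_around` is an assumption, and the rounded multipliers 2 and 3 are used as in the text.

```python
import math

def range_around(mu, spread, k):
    """Return the interval mu ± k*spread as a (low, high) tuple."""
    return (mu - k * spread, mu + k * spread)

mu, sigma, n = 170.0, 8.0, 16      # body height values used in the lecture
se = sigma / math.sqrt(n)          # standard error of the mean: 8/4 = 2 cm

print(range_around(mu, sigma, 2))  # single data, about 95%
print(range_around(mu, sigma, 3))  # single data, about 99.7%
print(range_around(mu, se, 2))     # means of 16 data, about 95%
print(range_around(mu, se, 3))     # means of 16 data, about 99.7%
```

The only change between the two pairs of intervals is the spread: σ for single data, σ/√n for the means.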
Let’s take all the 100 different means and draw the ±2σ/√n (95%) interval
around each of them!
How many of those 100 intervals contain µ?
95 cases (the interval contains µ)
5 cases (it does not)
The centers (means) of some of those ±2σ/√n intervals are located
inside the µ ± 2σ/√n range. These will contain µ, but some means
are outside and those will not. ‘Out of 100’ means: those 95 „inside”
intervals would contain µ (confidence), but 5 would not (error risk).
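The „95 out of 100” idea can be tried with a small simulation. This is a sketch under assumptions that are not in the lecture: the data are drawn from a normal population with µ = 170 cm and σ = 8 cm, samples have n = 16, and the function name `coverage` and the seed are arbitrary choices.

```python
import math
import random

def coverage(mu=170.0, sigma=8.0, n=16, repeats=100, seed=1):
    """Count how many mean ± 2*sigma/sqrt(n) intervals contain mu."""
    rng = random.Random(seed)
    half_width = 2 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits

print(coverage())                # around 95 of 100, varies with the seed
print(coverage(repeats=10000))   # around 95% of 10000
```

With only 100 repeats the count fluctuates by a few cases; a larger number of repeats settles near 95%.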
2nd step: σ is unknown, so replace σ/√n by s_x/√n
The S.E. can be smaller or larger than σ/√n,
so how about the confidence?
[Figure: the intervals drawn with s_x/√n around the sample means, on an axis from 164 to 176 cm; 95 cases, 5 cases]
How many of the 95 will be shorter, and how many of those 5 longer?
(based on the binomial distribution…)
The confidence is decreasing!
It depends on the sample size!
df      t (95%)
8       2.31
10      2.23
15      2.13
20      2.09
50      2.01
1000    1.96
z = 1.96
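A minimal sketch of how the tabulated multipliers enter the interval x̄ ± t × s_x/√n. The first column of the table is read here as degrees of freedom (an assumption, although it matches the t15 = 2.13 value used later in the lecture); the function name `confidence_interval_95` and the sample values are hypothetical.

```python
import math

# 95% t multipliers copied from the table above, keyed by degrees of freedom
T_95 = {8: 2.31, 10: 2.23, 15: 2.13, 20: 2.09, 50: 2.01, 1000: 1.96}

def confidence_interval_95(mean, sd, n):
    """Return mean ± t * sd/sqrt(n); df = n - 1 must be a key of T_95."""
    t = T_95[n - 1]
    half_width = t * sd / math.sqrt(n)
    return (mean - half_width, mean + half_width)

# hypothetical sample: 16 body heights with mean 170 cm and S.D. 8 cm
low, high = confidence_interval_95(170.0, 8.0, 16)
print(round(low, 2), round(high, 2))
```

For n = 16 the half-width is 2.13 × 2 cm instead of 1.96 × 2 cm: the interval has to be wider because s_x only estimates σ.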
Checking the method: try the opposite hypothesis at the 1st point
The hypothesis testing – 2.
Hypothesis testing in biometrics
„The drug content of 16 pills…” example.
Mean: 102.1 mg, S.D.: 4 mg. Can the expected value be 100 mg?
1, Suppose that µ = 100 mg is true! There is no significant difference, the
difference is just by chance! This is the „null” hypothesis: H0
2, Let’s choose 5% as the low end of „probable”. This „border for
decision” is α. So let it be α = 0.05 now.
3, If µ = 100 mg, then how probable is it that the mean of 16 data
would differ from 100 mg by at least 2.1 mg?
- As we saw last week: the difference between the mean and µ is t × S.E. (here
S.E. = 4 mg/√16 = 1 mg), where „t” follows the t distribution with df = n-1 (here 15).
- In our case t = 2.1/1 = 2.1 (times the S.E.). On the t15 curve, ±2.13 (figure!)
would „cut off” a 5% area (probability), so the probability of a difference of
„at least 2.1 times” the S.E. is > 5%. So p > 0.05 (= „probable”) (figure)
4, Decide about the hypothesis („µ = 100 mg”).
Because p > α at point 3 („probable”), do not reject it!
5, Conclusion: the mean is not significantly different from the
hypothetical expected value. So µ can be 100 mg!
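The arithmetic of the five steps can be sketched as follows. The numbers are those of the pill example; the function name `one_sample_t` is an assumption, and the decision uses the tabulated border 2.13 rather than an exact p value.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """Return (t, standard error) for H0: expected value = mu0."""
    se = sd / math.sqrt(n)
    return (mean - mu0) / se, se

t, se = one_sample_t(mean=102.1, sd=4.0, n=16, mu0=100.0)
t_critical = 2.13                  # 5% border on the t15 curve, as quoted above
reject_h0 = abs(t) >= t_critical   # |t| below the border means p > 0.05

print(round(t, 2), se, reject_h0)
```

Here |t| stays just under the border, so H0 is not rejected, in line with the conclusion above.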
What did we do here?
We checked how different the mean is from a hypothetical („H0”)
expected value (in S.E. units: „t” times).
When the difference „t” is big
(= the area under the t curve outside of ±t is small, which means that
a difference of at least this size has a small probability if H0 is true),
then our sample (the fact) is against our hypothesis (the null hypothesis).
See the everyday life model of hypothesis testing: reject the null hypothesis!
When the difference „t” is small
(= the area under the t curve outside of ±t is big, which means that
a difference of at least this size has a large probability if H0 is true),
then our sample (the fact) is not against our hypothesis
(the null hypothesis).
See the everyday life model of hypothesis testing: do not reject the null hypothesis!
The probability (area) can be calculated from „t” (and n) using the
probability density function. By computer: an exact „p =” value, or from a table: a „p <” limit.
This is the one sample t test.
Another (special) case
The effect of diet + training was checked: did it lower the blood cholesterol? The
lab data of the 12 patients (2 datasets, but in paired arrangement…):
serial    1    2    3    4    5    6    7    8    9   10   11   12
before  201  231  221  260  228  237  326  235  240  267  284  201
after   200  236  216  233  224  216  296  195  207  247  210  209
diff     -1    5   -5  -27   -4  -21  -30  -40  -33  -20  -74    8
The „difference”: x̄ = -20.17, s_x = 23.13, S.E. = s_x/√n = 23.13/√12 = 6.68
- If this probability (p) is not less than a predefined limit (α), there is
no reason to reject the null hypothesis.
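The summary values of the differences can be checked from the table, and the same one sample t calculation can then be applied to them. This is a sketch, not the lecture's own worked solution: the border value 2.20 for df = 11 is a standard t table value that does not appear in the slides, and the variable names are assumptions.

```python
import math
import statistics

before = [201, 231, 221, 260, 228, 237, 326, 235, 240, 267, 284, 201]
after = [200, 236, 216, 233, 224, 216, 296, 195, 207, 247, 210, 209]

diff = [a - b for a, b in zip(after, before)]
mean_d = statistics.mean(diff)
sd_d = statistics.stdev(diff)          # sample S.D. (n - 1 in the denominator)
se_d = sd_d / math.sqrt(len(diff))
t = mean_d / se_d                      # H0: the expected difference is 0

t_critical = 2.20                      # assumed table value: 5% border, df = 11
print(round(mean_d, 2), round(sd_d, 2), round(se_d, 2), round(t, 2))
print(abs(t) >= t_critical)
```

Working on the differences turns the two paired datasets into a single sample, which is why the one sample method applies.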
1, H0: the treatment was ineffective → B(12, 0.5)   2, α = 0.05 (border value).
3, How probable is „at least that difference” from the expected k = 6?
(Here k = 2: only 2 of the 12 differences are positive.)
This probability is 2 × (p(k=0) + p(k=1) + p(k=2)) (figure)
= 2 × (0.02% + 0.29% + 1.61%) = 2 × 1.93% = 3.86%. It is less than α = 5%.
4, Here p < α, so reject H0.
5, Conclusion: the diet + training was effective.
The difference was significant.
This method is the sign test.
Note, please: a normal distribution was not supposed! The method can be applied even in those
cases when the data are not normally distributed (the t test is not applicable); this test works well there.
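The binomial probabilities of the sign test can be recomputed with a few lines. This is a minimal sketch: the function name `sign_test_two_sided` is an assumption, and the two-sided probability is taken as twice the smaller tail, as in the calculation above.

```python
from math import comb

def sign_test_two_sided(k, n):
    """Two-sided sign test p value under B(n, 0.5)."""
    k = min(k, n - k)                                    # fold to the smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

diff = [-1, 5, -5, -27, -4, -21, -30, -40, -33, -20, -74, 8]
k_positive = sum(d > 0 for d in diff)                    # number of increases
p = sign_test_two_sided(k_positive, len(diff))
print(k_positive, round(100 * p, 2))                     # k and p in percent
```

Only the signs of the differences are used, not their sizes, which is why no assumption about the distribution of the data is needed.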
Goals for the 7th week - What was it today?
The 3-step „road” to last week's method of statistical inference:
the interval estimation (understand it)
The confidence interval for the expected value:
x̄ ± t × s_x/√n (calculate it)
The hypothesis testing
an everyday life model for the 5 steps of the method (was there rain?)
1. Formulate the starting hypothesis (H0).
2. What is the limit between the „small” and the „large” probability (α)?
3. What is the probability of observing
„at least this difference” from µ (when H0 holds)?
   t is the size, p is the probability (significance).
4. Decide about H0: when p < α, reject… (‘not probable…’);
   when p ≥ α, do not reject… (‘probable…’).
5. Conclusion (based on H0, what is the meaning of the decision?)
3 methods: One sample (+ paired) t tests and the sign test
Coming next: compare the two methods, errors, …
From the textbooks :