BIOSTATISTICS
brothers
P.S.S.Sundar Rao, J.Richard. An introduction to Biostatistics,3rd
Introduction Definition Essential features of statistics Collection of Data Sources of data Sampling
Presentation of data
Measures of central tendency Variability & its measure The normal curve Probability conclusion
Test of significance
Etymology:
Italian word
Definitions:
research methodology.
It is a pervasive force on which the entire
concepts in statistics.
These are the mathematical methods by which the
relationship observed in the data occurred purely by chance or was there a real relationship between the variables, thereby testing the hypothesis proposed at the start of the study.
They constitute a common yardstick that can be
John Graunt
tabulating & interpreting the data related to living organisms & human beings. Application of statistics to health problems.
Health statistics: public/community health.
Medical statistics: medicine.
Vital statistics: demography.
Dental statistics: dentistry.
To define what is normal or healthy in a population. Ex: pulse rate per minute.
To find the statistical difference between the means of two variables. Ex: mean blood pressure of two cricket teams after a cricket match.
To find the correlation between two variables. Ex: female literacy rate and infant mortality rate.
To find the usefulness of sera and vaccines in the field. Ex: % of deaths among the vaccinated compared to % of deaths among the non-vaccinated.
To test the efficacy of different treatments. Ex: medical management and surgical management of angina patients.
To assess the state of oral health in the community and to determine the availability and utilization of dental care facilities.
To indicate the basic factors underlying the state of oral health by diagnosing the community, and to find solutions to such problems.
To determine the success or failure of specific oral health care programs, or to evaluate the programme action.
To promote health legislation and to create administrative standards for oral health.
Definition: A collective recording of observations, either numerical or otherwise. Two broad categories:
Data
Qualitative data
Nominal Dichotomous Ordinal
Quantitative data
Discrete
Continuous
Types:
sources
Primary
Secondary
Interval scale: no absolute zero. Eg: Centigrade scale of temperature.
Ratio scale
measurement scales
Examples Recording blood groups a) O
b) A
c) B
d) AB
a) Caries others
have a clearly implied direction but the data are not measured on a measurement scale
Examples Severity of patient perceived pain
population
categorical data
The first regular census in India: 1881
Recent census in India: March, 2001
Census Act: 1948
accuracy
Variable:
a name denoting a condition, occurrence or effect that can assume different values
Divided: subgroups, classes. Classes have lowest and highest values.
Class interval: difference between the upper and lower limit of a class. Eg: in the class 5 - 14, 5 is the lower limit and 14 is the upper limit; class interval = 14 - 5 = 9.
Frequency: the number of units belonging to each group of the variable.
Frequency distribution table: a way of presenting data in tables.
Extremely useful
TYPES OF DIAGRAMS:
Bar Diagram: qualitative data.
Multiple Bar: qualitative data.
Component Bar Diagram: qualitative data.
Frequency Polygon: quantitative data.
Pie Diagram: qualitative data.
frequencies
Basic rules :
self explanatory simple and consistent with the data. values of the variables - on horizontal or X-axis and
No too many lines on the graph, should not look The scale of presentation right hand top corner
The scale of division of the two axes should be the details of the variables and frequencies
a single variable. Eg: sex-wise, or with respect to time or region. Each category of the variable has a set of bars of the same width corresponding to the different sections, without any gap in between; the length of each bar corresponds to the frequency.
subgroups.
between different major groups of observations, then bars are drawn for each group with the same length, either as 1 or 100%. These are then divided according to the sub-group proportion in each major group.
The frequency of the group is shown in a circle. Degree of angle denotes the frequency. Instead of comparing the length of bar , the areas
time simplest type X-axis, - hours, days, weeks, months or years Y-axis- value of any quantity pertaining to X-axis,
Used for quantitative data of continuous type.
A bar diagram without gaps between the bars.
Represents a frequency distribution.
X-axis: the size of an observation is marked. Starting from 0, the limit of each class interval is marked, the width corresponding to the width of the class interval in the frequency distribution.
Y-axis: the frequencies are marked. A rectangle is drawn above each class interval with height proportional to the frequency of that interval.
[Figure: chart with axis label "Weight in KGs"]
characteristic.
Standard Deviation
single estimate of a series of data that summarizes the data is known as the parameter and one such parameter is
to facilitate comparison
Arithmetic mean- mathematical estimate. Median - positional estimate. Mode- based on frequency.
Types:
Should be easy to understand and compute. should be based on each and every item in the series. should not be affected by extreme observations
Ungrouped data:
Mean = (Sum of all the observations in the data) / (Number of observations in the data)
[Figure: two data sets plotted on number lines. First data set (axis 0 to 10): Mean = 5. Second data set (axis 0 to 14): Mean = 6.]
[Figure: the same two data sets on number lines. First data set (axis 0 to 10): Median = 5. Second data set (axis 0 to 14): Median = 5.]
In an ordered array, the median is the middle number.
If n or N is odd, the median is the middle number.
If n or N is even, the median is the average of the two middle numbers.
[Figure: two data sets plotted on number lines. First data set (axis 0 to 14): Mode = 9. Second data set (axis 0 to 6): No Mode.]
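The three measures can be checked quickly with Python's standard `statistics` module. This is only a sketch: the observations below are hypothetical (they are not the data behind the slides' figures) and were chosen so that adding one extreme value moves the mean but not the median.

```python
import statistics

# Hypothetical observations (assumed for illustration, not taken from the slides).
data = [1, 3, 5, 7, 9]
with_extreme = [1, 3, 5, 7, 14]   # same data with one extreme value

mean_a = statistics.mean(data)              # sum of observations / number of observations
mean_b = statistics.mean(with_extreme)
median_a = statistics.median(data)          # middle value of the ordered array (odd n)
median_b = statistics.median(with_extreme)
median_even = statistics.median([1, 3, 5, 7])          # even n: average of the two middle values
modes = statistics.multimode([1, 2, 2, 3, 9, 9, 9])    # most frequent value(s)

print("means:", mean_a, mean_b)        # the extreme value pulls the mean upward
print("medians:", median_a, median_b)  # the median does not move
print("median (even n):", median_even, "mode(s):", modes)
```

`multimode` returns a list, so a data set in which every value occurs once comes back with all values listed, which is how "no mode" shows up in this sketch.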
mean
median
mode
Variation Range
Interquartile
Standard Deviation
Population Standard Deviation
Variance
Population Variance Sample Variance
Range
distributed
[Figure: two data sets plotted on the same axis, each with Range = 12 - 7 = 5]
mean for Dr A = 32/8 = 4 days
mean for Dr B = 32/8 = 4 days
mean for Dr C = 32/8 = 4 days
Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
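Deviations from the mean, (x - x̄):
Dr A = -2, 0, -1, 0, 2, 2, -2, 1 = 0
Dr B = 0, 1, 0, -1, 0, 1, -1, 0 = 0
Dr C = -1, -1, 4, -1, -1, -1, -1, 0 = 0
Sum of squared deviations, Σ(x - x̄)²: Dr A = 18, Dr B = 4, Dr C = 22
Thus, variance (dividing by n = 8): Dr A = 18/8 = 2.25, Dr B = 4/8 = 0.5, Dr C = 22/8 = 2.75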
Most important measure of variation.
Shows variation about the mean.
Root Mean Square Deviation.
So for Dr A = 1.5, Dr B = 0.7, Dr C = 1.66.
Has the same units as the original data.
Sample standard deviation (for smaller samples, < 30):
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
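The worked figures for Dr A and Dr B can be reproduced with a short sketch. It assumes the raw observations equal the stated mean of 4 days plus each listed deviation (the slides give only the deviations), and it shows both divisors side by side: n, which matches the 2.25 and 0.5 quoted in the text, and n - 1, which matches the sample formula.

```python
import statistics

MEAN_DAYS = 4  # mean stated in the text for each doctor (32 / 8)

# Deviations from the mean as listed in the text. Rebuilding the raw
# observations as mean + deviation is an assumption of this sketch.
deviations = {
    "Dr A": [-2, 0, -1, 0, 2, 2, -2, 1],
    "Dr B": [0, 1, 0, -1, 0, 1, -1, 0],
}

results = {}
for doctor, devs in deviations.items():
    values = [MEAN_DAYS + d for d in devs]
    results[doctor] = {
        "sum_sq": sum(d * d for d in devs),       # sum of squared deviations
        "var_n": statistics.pvariance(values),    # divide by n
        "sd_n": statistics.pstdev(values),
        "var_n1": statistics.variance(values),    # divide by n - 1 (sample formula)
        "sd_n1": statistics.stdev(values),
    }
    print(doctor, results[doctor])
```

With only 8 observations the two divisors give visibly different answers (for Dr A, 18/8 versus 18/7), which is why the text singles out the n - 1 form for samples smaller than 30.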
Data set | Mean | s
Data A | 15.5 | 3.338
Data B | 15.5 | 0.9258
Data C | 15.5 | 4.57
(Each data set is plotted on an axis running from 11 to 21.)
Summarizes the deviations of a large distribution.
Indicates whether the variation from the mean is by chance or real.
Helps in finding the standard error.
Helps in finding the suitable size of sample.
Standard
summary
Compare relative variability Variation of same character in two or more series compare the variability of one character in two different
expressing in percentage
CV = (S.D. / Mean) x 100
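A minimal sketch of the coefficient of variation, which expresses the SD as a percentage of the mean so that series measured in different units can be compared. The function name and both data series are hypothetical, and the sample SD (n - 1 divisor) is an assumed choice.

```python
import statistics

def coefficient_of_variation(values):
    """Return the SD expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical series in different units (assumed for illustration).
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"CV of height: {cv_height:.2f}%")
print(f"CV of weight: {cv_weight:.2f}%")
```

Because the CV is unit-free, the larger percentage identifies the relatively more variable character, here weight.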
curve is bell shaped. The curve is symmetrical about the middle point. The mean is located at the highest point of the curve measures of central tendency coincide. Maximum number of observations is at the value of the
correspond to the number of observations between any 2 values of the variate, in terms of a relationship between the mean and the SD:
a) Mean ± 1 S.D. covers 68.3% of the observations;
b) Mean ± 2 S.D. covers 95.4% of the observations;
c) Mean ± 3 S.D. covers 99.7% of the observations.
This relationship is used for fixing the confidence interval.
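The coverage figures for the normal curve can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library on a standard normal curve (mean 0, SD 1); the same proportions hold for any mean and SD.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

coverage = {}
for k in (1, 2, 3):
    # Area under the curve between mean - k SD and mean + k SD
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"Mean ± {k} S.D. covers {coverage[k]:.1%} of the observations")

# Number of SDs that encloses the central 95% of the curve
z_95 = standard_normal.inv_cdf(0.975)
print(f"Central 95% lies within ± {z_95:.2f} S.D.")
```

The last value, about 1.96, is the multiplier used for 95% confidence limits.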
Normal distribution law forms the basis for various
tests of significance.
Non-symmetrical distribution
Positively skewed
The further apart the mean and median, the more the distribution is skewed.
Left-Skewed
Mean < Median < Mode
Symmetric
Mean = Median =Mode
Right-Skewed
Mode < Median < Mean
relative frequency or probable chances of occurrence with which an event is expected to occur on an average
Expressed as p.
Ranges from 0 to 1.
When p = 0, there is no chance of the event happening.
When p = 1, there is a 100% chance of the event happening.
Methods to estimate the difference b/w estimates of samples two hypothesis are made:
population
The difference found is accidental & arises out of sampling
variations
States that sample result is different than the hypothetical value of population To minimize errors the sampling distribution or area under normal curve is divided into two regions or zones mean
1. Zone of acceptance: samples falling in the area of mean ± 1.96 SE; null hypothesis accepted
2. Zone of rejection: samples falling in the shaded area beyond mean ± 1.96 SE; null hypothesis rejected
Type I error:
the null hypothesis is wrongly rejected, i.e. it is rejected even though it is true
Type II error:
the null hypothesis is wrongly accepted, i.e. it is accepted even though it falls in the zone of rejection
not a serious error; needs only confirmation of the result by changing the level of significance
Null hypothesis | Accept it | Reject it
True | Correct decision | Type I error
False | Type II error | Correct decision
freedom
In unpaired t test of difference between 2 means
df = n1+n2-2
where n1 and n2 are the numbers of observations in the two samples.
In paired t- test df = n-1
Standard error of mean = SD of the means of several samples from the same population.
SE = SD of observations in the sample / √(number of observations in the sample)
Variation in biological observation
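As an illustration of the standard error of the mean, the sketch below divides the sample SD by the square root of the sample size and then builds limits of mean ± 1.96 SE. The sample values and the function name are hypothetical; the 1.96 multiplier is the normal-curve value and is applied here even though the sample is small.

```python
import math
import statistics

def standard_error(values):
    """SE of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical sample (assumed for illustration).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)
se = standard_error(sample)
limits = (mean - 1.96 * se, mean + 1.96 * se)

print(f"mean = {mean}, SE = {se:.4f}")
print(f"mean ± 1.96 SE = {limits[0]:.3f} to {limits[1]:.3f}")
```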
probability 1
Measure on a scale
0.25
P < 0.001: very highly significant
P < 0.01: highly significant
P < 0.05: significant
P > 0.05: not significant
compared, it becomes essential to find whether the difference observed between the 2 groups is because of sampling variation or some other factor. Method: Tests of Significance
Tests of significance
Parametric tests:
Their model specifies certain conditions about the parameters of the population from which the research sample is drawn. Used for quantitative data.
Nonparametric tests or distribution free tests:
Their model does not specify conditions about the parameters of the population from which the research sample is drawn. Used for qualitative data.
Parametric tests:
Z-test
t-test
Analysis of variance (ANOVA)
Non-parametric tests:
Chi-square test
Wilcoxon signed rank test
Mann-Whitney U test
Spearman's correlation test
McNemar's test
Fisher's exact probability test
Z-test
rejection region, i.e. you check whether the parameter of interest is larger (or smaller) than a given value. Two-sided tests are used when we test a parameter for equivalence to a certain value. Deviations from that value in both directions are rejected.
Used for large samples (> 30).
The difference observed between the sample estimate and that of the population is expressed in terms of SE.
The ratio between the observed difference and the SE is called Z.
Z = difference in means / SE of mean
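A minimal sketch of the Z ratio for one large sample compared with a population mean. All numbers and the function name are hypothetical; the two-sided P value is read from the standard normal curve via `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, population_mean, sd, n):
    """Return (Z, two-sided P) for a large-sample comparison of means."""
    se = sd / math.sqrt(n)                       # SE of the mean
    z = (sample_mean - population_mean) / se     # difference in means / SE
    p = 2 * (1 - NormalDist().cdf(abs(z)))       # area in both tails
    return z, p

# Hypothetical figures (assumed): sample of 100 with mean 82 and SD 10,
# compared with a population mean of 80.
z, p = z_test(sample_mean=82, population_mean=80, sd=10, n=100)
print(f"Z = {z:.2f}, P = {p:.4f}")
```

Here Z exceeds 1.96, so the sample mean falls outside mean ± 1.96 SE and P is below 0.05.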
Designed by W. S. Gosset.
Used in case of small samples.
Ratio of the observed difference between the means of two small samples to the SE of the difference in the same.
When each individual gives a pair of observations, the paired t test is used to test for a difference in the pairs of values.
twice within the same person - before vs. after. For example, Did the systolic blood pressure change significantly from the scene of the injury to admission? Univariate, Matched, Interval, Normal, 2 groups.
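The paired comparison can be sketched as follows: take the before-minus-after difference for each person, then divide the mean difference by its standard error, with df = n - 1. The blood pressure readings and the function name are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Hypothetical systolic blood pressure readings for 6 patients (assumed).
before = [140, 150, 135, 160, 145, 155]
after = [136, 144, 133, 152, 140, 150]

t_value, df = paired_t(before, after)
print(f"t = {t_value:.3f} with df = {df}")
```

The computed t is then compared with the t-table value for the same degrees of freedom; for df = 5 at P = 0.05 (two-sided) that value is 2.571.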
The most commonly used statistical test. Developed by Karl Pearson Used for qualitative data To test whether the difference in distribution of
Group | 0-1 | 2-3 | 4-5 | Total
First group | 30 | 15 | 5 | 50
Second group | 20 | 15 | 15 | 50
Total | 50 | 30 | 20 | 100
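A sketch of the chi-square calculation on these counts, assuming they form a 2 x 3 table (two groups across the classes 0-1, 2-3 and 4-5, with row totals 50 and 50 and column totals 50, 30 and 20). Expected frequencies are row total x column total / grand total. The closed-form P value used below is valid only for 2 degrees of freedom.

```python
import math

# Observed counts, read as a 2 x 3 table (an assumption of this sketch).
observed = [
    [30, 15, 5],    # first group: classes 0-1, 2-3, 4-5
    [20, 15, 15],   # second group
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 the upper-tail probability of chi-square is exp(-x / 2).
p_value = math.exp(-chi_square / 2) if df == 2 else None

print(f"chi-square = {chi_square:.2f}, df = {df}, P = {p_value:.4f}")
```

No expected frequency here is below 5 (the smallest is 10), so no correction for small cells is needed in this example.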
Drawbacks:
frequency in any one cell is less than 5. Correction is done by subtracting 0.5 from |O - E| (Yates's correction)
Paired samples :
Wilcoxon signed rank test [Matched pairs test] :
Find the differences between each pair of values & assign rank to the differences from the smallest to the largest without regard to sign. In case there are ties, then we would assign each of the tied observation the mean of the ranks which they jointly occupy.
then put to the corresponding ranks & the test statistic T is calculated which happens to be the smaller of the two sums. [The sum of the negative ranks & the sum of the positive ranks] or smaller than the table value in order to be considered significant.
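The ranking procedure for the signed rank test can be sketched in a few lines. The data and the function name are hypothetical, and dropping pairs with zero difference is an assumed convention that the text does not state.

```python
def wilcoxon_t(first, second):
    """Return (T, sum of positive ranks, sum of negative ranks)."""
    # Pairs with no difference are dropped (assumed convention).
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ordered = sorted(diffs, key=abs)          # rank without regard to sign

    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        mean_rank = (i + 1 + j + 1) / 2       # tied values share the mean rank
        for k in range(i, j + 1):
            ranks[k] = mean_rank
        i = j + 1

    # Signs are put back on the ranks and the two sums are formed.
    positive = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(positive, negative), positive, negative

# Hypothetical paired observations (assumed for illustration).
before = [140, 150, 135, 160, 145, 155, 148]
after = [136, 144, 137, 152, 140, 150, 149]

t_stat, pos_sum, neg_sum = wilcoxon_t(before, after)
print(f"T = {t_stat}, positive ranks = {pos_sum}, negative ranks = {neg_sum}")
```

The two rank sums always add up to n(n + 1)/2 for the n non-zero differences, which gives a quick arithmetic check.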
Unpaired samples:
Mann-Whitney test [U test]:
Used to determine whether two
Applies under very general conditions. Rank the data jointly taking them as
to the values of the 1stsample [R1] & 2nd sample [R2] separately, then work out the test statistic.
i.e.
$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$$
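A sketch of the U statistic: both samples are ranked jointly, the ranks belonging to each sample are summed to give R1 and R2, and U is computed in the standard form U = n1·n2 + n1(n1 + 1)/2 - R1. The data and the function name are hypothetical.

```python
def mann_whitney_u(sample1, sample2):
    """Return (U, R1, R2) using joint ranks, with mean ranks for ties."""
    pooled = sorted(list(sample1) + list(sample2))

    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j + 1) / 2   # tied values share the mean rank
        i = j + 1

    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank_of[v] for v in sample1)   # sum of ranks of the 1st sample
    r2 = sum(rank_of[v] for v in sample2)   # sum of ranks of the 2nd sample
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return u, r1, r2

# Hypothetical unpaired samples (assumed for illustration).
group1 = [3, 5, 8, 10]
group2 = [1, 2, 4, 6, 7]

u_stat, r1, r2 = mann_whitney_u(group1, group2)
print(f"U = {u_stat}, R1 = {r1}, R2 = {r2}")
```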
making, not decision making itself.
They do not explain why the difference exists.
Results are based on probabilities and as such cannot be expressed with full certainty.
Inferences based on them cannot be said to be entirely correct evidence concerning the truth of the hypothesis.
Compares more than two samples.
Compares variation between the classes as well as within the classes.
For such comparisons there is a high chance of error using the t or Z test.
One-way: used to compare 3 or more means from independent groups. Is the age different between White, Black, Hispanic patients?
Two-way: used to compare 2 or more means by 2 or more factors. Is the age different between Males and Females, With and Without Pneumonia?
relationship
The closer to 1, the stronger the positive linear relationship
(X - x̄)(Y - ȳ)
Does not prove whether one variable alone can cause the change in the other.
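For illustration, the correlation coefficient can be computed from the products of deviations (X - x̄)(Y - ȳ), using the standard Pearson form: the sum of those products divided by the square root of the product of the two sums of squared deviations. The paired values and the function name are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical paired observations (assumed for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

A value close to 1 indicates a strong positive linear relationship, but it says nothing about which variable, if either, causes the change in the other.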
Thank you