BIOSTATISTICS
Contents:
Introduction
Definition
Essential features of statistics
Collection of data
Sources of data
Sampling
Uses of statistics in dental science
Presentation of data
Measures of central tendency
Variability and its measures
The normal curve
Probability
Tests of significance
Conclusion
Etymology:
Italian word statista: means statesman.
German word Statistik: means political state.
Definitions:
Statistics: principles and methods for the collection, presentation, analysis and interpretation of numerical data.

Statistics: the science and art of dealing with variation in such a way as to obtain reliable results.

Statistics is an important & integral part of
research methodology.
It is a pervasive force on which the entire
spectrum of clinical decision making is dependent.

Tests of significance are one of the central
concepts in statistics.
These are the mathematical methods by which the
probability of an observed difference occurring by chance is found.

Hence they are used to estimate whether the relationship observed in the data occurred purely by chance or whether there is a real relationship between the variables, thereby testing the hypothesis proposed at the start of the study.
They constitute a common yardstick that can be understood by many people and also communicate essential information about a research project.

John Graunt: father of health statistics.
Biostatistics: the method of collecting, organizing, analyzing, tabulating and interpreting data related to living organisms and human beings; the application of statistics to health problems.
Health statistics: public/community health.
Medical statistics: medicine.
Vital statistics: demography.
Dental statistics: dentistry.
To define what is normal or healthy in a population. Ex: pulse rate per minute.
To find the statistical difference between the means of two variables. Ex: mean blood pressure of two cricket teams after a cricket match.
To find the correlation between two variables. Ex: female literacy rate and infant mortality rate.
To assess the usefulness of sera and vaccines in the field. Ex: % of deaths among the vaccinated compared with % of deaths among the non-vaccinated.
To test the efficacy of different treatments. Ex: medical management and surgical management of angina patients.
Uses of statistics in dental science:
To assess the state of oral health in the community and to determine the availability and utilization of dental care facilities.
To indicate the basic factors underlying the state of oral health by diagnosing the community, and to find solutions to such problems.
To determine the success or failure of specific oral health care programmes, or to evaluate the programme action.
To promote health legislation and to create administrative standards for oral health.
Data
Definition: a collective recording of observations, either numerical or otherwise.
Two broad categories:
Qualitative data: nominal, dichotomous, ordinal
Quantitative data: discrete, continuous
Def: a collective recording of observations, either numeric or otherwise, is called data.
Types: qualitative, quantitative
Sources: primary, secondary
Methods of collecting data: census, sampling
Interval scale: no absolute zero. Eg: Centigrade scale of temperature.
Ratio scale: has a true (absolute) zero. Eg: Kelvin temperature scale. Most common type of quantitative data.
Nominal data: naming or categorical variables that have no measurement scale.
Examples:
Recording blood groups: a) O b) A c) B d) AB
Reasons for extraction of teeth: a) caries b) periodontitis c) therapeutic d) others
Ordinal (ranked) data
Characterized in terms of more than two categories that have a clearly implied direction, but the data are not measured on a measurement scale.
Examples:
Severity of patient-perceived pain: a) no pain b) mild pain c) moderate pain d) severe pain
Esthetic concerns of children: a) satisfied b) neutral c) not satisfied
Dichotomous data (binary variables)
The variable can have only two values.
May or may not be directional.
Examples: sex of the respondents; presence or absence of dental disease in a village population.
Nominal, ordinal and dichotomous data can be called categorical data.
Census
Def: the total process of collecting, compiling and publishing demographic, economic and social data pertaining, at a specified time or times, to all persons in a country or delimited territory.
The first regular census in India: 1881
Recent census in India: March 2001
Census Act: 1948
Functions: records the demographic, social and economic conditions of people; provides baseline data.
Advantages: complete information.
Disadvantages: expensive, time consuming, needs more manpower, lesser accuracy.
Objective of classification of data:
to make the data simple, concise, meaningful, interesting and helpful in further analysis.
Two main methods of presenting data:
tabulation and diagrams.
Data are classified on the following bases:
Geographical, i.e. area-wise, e.g. cities, districts etc.
Chronological, i.e. on the basis of time.
Qualitative, i.e. according to some attribute.
Quantitative, i.e. in terms of magnitude.
The two elements of classification are the variable and the frequency.
Variable: a name denoting a condition, occurrence or effect that can assume different values.
A variable is divided into subgroups or classes, each having a lowest and a highest value.
Class interval: the difference between the upper and lower limits of a class. Eg: in the class 5-14, 5 is the lower limit and 14 is the upper limit; class interval = 14 - 5 = 9.
Frequency: the number of units belonging to each group of the variable.
Frequency distribution table: a way of presenting data in tables.
Frequency distribution table
The title of the table is named at the bottom.
The number of class intervals should be between 5 and 20.
The class intervals should be of equal width.
Class limits should be clearly defined to avoid ambiguity, e.g. 0-4, 5-9, 10-14, etc.
Rows and columns should be clearly defined, with headings.
Units of measurement should be specified.
If the data are not original, the source of the data should be mentioned at the bottom of the table.
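The rules for class intervals can be illustrated with a minimal Python sketch that counts frequencies in equal-width, non-overlapping classes. The function name `frequency_table`, the `ages` values and the class width of 5 are hypothetical and chosen only for illustration; the example is kept tiny, so it has fewer classes than the 5 to 20 recommended for real tables.

```python
def frequency_table(values, start, width):
    """Count how many integer values fall in each equal-width class.

    Classes are start..start+width-1, start+width..start+2*width-1, and so on,
    so the class limits never overlap (e.g. 0-4, 5-9, 10-14).
    """
    table = {}
    for v in values:
        index = (v - start) // width      # which class the value belongs to
        lower = start + index * width     # lower class limit
        upper = lower + width - 1         # upper class limit
        label = f"{lower}-{upper}"
        table[label] = table.get(label, 0) + 1
    return table


# Hypothetical ages (in years) of 12 patients; not data from the slides
ages = [1, 3, 4, 5, 7, 8, 8, 9, 10, 12, 13, 14]
table = frequency_table(ages, start=0, width=5)

print("Age (years) | Frequency")
for label in sorted(table, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>11} | {table[label]}")
```

Integer class limits are assumed here; continuous measurements would need a rule for values that fall exactly on a boundary.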
Diagrams are extremely useful because they:
are attractive to the eyes,
give a bird's eye view of the entire data,
have a lasting impression,
facilitate comparison of data relating to different time periods and regions.
TYPES OF DIAGRAMS:
Bar diagram: qualitative data.
Multiple bar diagram: qualitative data.
Component bar diagram: qualitative data.
Proportional bar diagram: qualitative data.
Pie diagram: qualitative data.
Line diagram: change in the value of a variable over time.
Histogram: quantitative data of continuous type.
Frequency polygon: quantitative data.
Cartograms or spot maps: geographical distribution of frequencies.
Basic rules:
Diagrams should be self-explanatory, simple and consistent with the data.
The values of the variable are shown on the horizontal line or X-axis and the frequency on the vertical line or Y-axis.
There should not be too many lines on the graph; it should not look clumsy.
The scale of presentation should be given at the right-hand top corner of the graph.
The scale of division of the two axes should be proportional.
The details of the variables and frequencies should be presented on the axes.
SIMPLE BAR DIAGRAM:
Represents qualitative data.
Shows only one variable.
The width of the bars remains the same; the length varies according to the frequency in each category.
Bars may be vertical or horizontal.
Limitation: represents only one classification, so it cannot be used for comparison.
MULTIPLE BAR DIAGRAM:
Used to compare qualitative data with respect to a single variable, eg: sex-wise or with respect to time or region.
Each category of the variable has a set of bars of the same width corresponding to the different sections, without any gap in between.
The length of each bar corresponds to the frequency.
COMPONENT BAR DIAGRAM:
Represents qualitative data.
Shows both the number of cases in the major groups and the number in the subgroups simultaneously.
A rectangle is drawn for the cases of each major group; each rectangle is then divided according to the number in the subgroups.
PROPORTIONAL BAR DIAGRAM:
Represents qualitative data.
Used to compare only the proportions of subgroups between different major groups of observations. Bars are drawn for each group with the same length, either as 1 or 100%, and are then divided according to the subgroup proportions in each major group.
PIE DIAGRAM:
The frequencies of the groups are shown as segments of a circle.
The angle of each segment denotes the frequency.
Instead of comparing the lengths of bars, the areas of the segments are compared.
LINE DIAGRAM:
Useful to study changes in the value of a variable over time.
The simplest type of diagram.
X-axis: time in hours, days, weeks, months or years.
Y-axis: the value of any quantity pertaining to the X-axis.
HISTOGRAM:
Used for quantitative data of continuous type.
A bar diagram without gaps between the bars; it represents a frequency distribution.
X-axis: the size of the observation is marked. Starting from 0, the limit of each class interval is marked, the width corresponding to the width of the class interval in the frequency distribution.
Y-axis: the frequencies are marked. A rectangle is drawn above each class interval with height proportional to the frequency of that interval.
FREQUENCY POLYGON:
Represents the frequency distribution of quantitative data and is used to compare two or more frequency distributions.
A point is marked over the mid-point of each class interval, corresponding to the frequency.
The points are connected by straight lines.
The first and last points are joined to the mid-points of the previous and next class respectively.
To compare two or more frequency distributions, lines of different types are drawn on the same graph.
[Figure: height and weight of 20 students of CODS. X-axis: height in feet; Y-axis: weight in kg.]
CARTOGRAMS OR SPOT MAPS:
Show the geographical distribution of frequencies of a characteristic.
PICTOGRAMS:
Pictures representing the values of items are called pictograms.
They are the most useful way of presenting data to people who cannot understand other forms of presentation.
Summary measures
Central tendency: mean, median, mode
Variation: range, variance, standard deviation
MEASURES OF CENTRAL TENDENCY:
A single estimate of a series of data that summarizes the data is known as a parameter; one such parameter is the measure of central tendency.
Objective:
to condense the entire mass of data
to facilitate comparison
Types:
Arithmetic mean: mathematical estimate.
Median: positional estimate.
Mode: based on frequency.
Requirements of a measure of central tendency:
It should be easy to understand and compute.
It should be based on each and every item in the series.
It should not be affected by extreme observations (either too small or too large values).
It should be capable of further statistical computations.
It should have sampling stability.
ARITHMETIC MEAN:
The simplest measure of central tendency.
Ungrouped data:
Mean = (sum of all the observations in the data) / (number of observations in the data)
[Figure: two dot plots on a number line. First plot, axis 0 to 10: mean = 5. Second plot, axis 0 to 14: mean = 6.]
MEDIAN:
A robust measure of central tendency.
Not affected by extreme values.
In an ordered array, the median is the middle number.
If n or N is odd, the median is the middle number.
If n or N is even, the median is the average of the two middle numbers.
[Figure: two dot plots on a number line. First plot, axis 0 to 10: median = 5. Second plot, axis 0 to 14: median = 5.]
MODE:
The value that occurs most often.
Not affected by extreme values.
Used for either numerical or categorical data.
There may be no mode.
There may be several modes.
[Figure: two dot plots on a number line. First plot, axis 0 to 14: mode = 9. Second plot, axis 0 to 6: no mode.]
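A short Python sketch of the three measures of central tendency, using the standard `statistics` module. The `scores` values are hypothetical (they are not taken from the slides) and include one extreme observation to show how it pulls the mean but not the median or mode.

```python
import statistics

# Hypothetical number of decayed teeth in 9 children; 20 is an extreme value
scores = [1, 2, 2, 3, 4, 4, 4, 5, 20]

mean = statistics.mean(scores)        # sum of observations / number of observations
median = statistics.median(scores)    # middle value of the ordered data
modes = statistics.multimode(scores)  # most frequent value(s); there may be several

# The same data without the extreme observation
trimmed = scores[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)   # even n: average of the two middle values

print("mean:", mean, "median:", median, "mode(s):", modes)
print("without the extreme value -> mean:", mean_trimmed, "median:", median_trimmed)
```

`statistics.multimode` needs Python 3.8 or later; it returns every value when all values occur equally often, which corresponds to the "no mode" case.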
Measures of variation:
Range
Interquartile range
Variance: population variance, sample variance
Standard deviation: population standard deviation, sample standard deviation
RANGE:
A measure of variation.
The difference between the largest and the smallest observations:
$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
Ignores the way in which the data are distributed.
[Figure: two dot plots with differently distributed data; in both, Range = 12 - 7 = 5.]
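VARIANCE:
Shows variation about the mean.
Ex: observations (in days) for three doctors:
Dr A = 2, 4, 3, 4, 6, 6, 2, 5
Dr B = 4, 5, 4, 3, 4, 5, 3, 4
Dr C = 3, 3, 8, 3, 3, 3, 4, 5
Mean for Dr A = 32/8 = 4 days
Mean for Dr B = 32/8 = 4 days
Mean for Dr C = 32/8 = 4 days
Sample variance:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
(For sample more than 30)
Deviations from the mean, $(x - \bar{x})$:
Dr A = -2, 0, -1, 0, 2, 2, -2, 1; sum = 0
Dr B = 0, 1, 0, -1, 0, 1, -1, 0; sum = 0
Dr C = -1, -1, 4, -1, -1, -1, 0, 1; sum = 0
Sum of squared deviations, $\sum (x - \bar{x})^2$:
Dr A = 18, Dr B = 4, Dr C = 22
Thus the variance, dividing here by n = 8 rather than by n - 1:
Dr A = 18/8 = 2.25
Dr B = 4/8 = 0.5
Dr C = 22/8 = 2.75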
STANDARD DEVIATION:
The most important measure of variation.
Shows variation about the mean.
Root mean square deviation, i.e. the square root of the variance.
So for Dr A = 1.5, Dr B = 0.7, Dr C = 1.66.
Has the same units as the original data.
Sample standard deviation (for smaller samples, <30):
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
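The Dr A, Dr B and Dr C figures can be checked with a few lines of Python. This sketch computes the sum of squared deviations, then the variance and standard deviation with both divisors: n (which reproduces the worked values 2.25, 0.5 and 2.75) and n - 1 (the sample formula). The dictionary name `days` and the helper `sum_of_squares` are illustrative names, not part of the slides.

```python
import math

# Observations (in days) for the three doctors, as listed in the example
days = {
    "Dr A": [2, 4, 3, 4, 6, 6, 2, 5],
    "Dr B": [4, 5, 4, 3, 4, 5, 3, 4],
    "Dr C": [3, 3, 8, 3, 3, 3, 4, 5],
}


def sum_of_squares(xs):
    """Sum of squared deviations of the observations from their mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


results = {}
for doctor, xs in days.items():
    n = len(xs)
    ss = sum_of_squares(xs)
    var_n = ss / n           # divisor n, as in the worked example
    var_n1 = ss / (n - 1)    # divisor n - 1, as in the sample formula
    results[doctor] = {
        "ss": ss,
        "var_n": var_n,
        "sd_n": math.sqrt(var_n),
        "var_n1": var_n1,
        "sd_n1": math.sqrt(var_n1),
    }
    print(f"{doctor}: SS={ss:.0f}  variance(n)={var_n:.2f}  SD(n)={math.sqrt(var_n):.2f}  "
          f"variance(n-1)={var_n1:.2f}  SD(n-1)={math.sqrt(var_n1):.2f}")
```

All three doctors have the same mean of 4 days, so only the measures of variation distinguish them.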
[Figure: three dot plots on an axis from 11 to 21, all with the same mean but different spread.]
Data A: mean = 15.5, s = 3.338
Data B: mean = 15.5, s = 0.9258
Data C: mean = 15.5, s = 4.57
Uses of standard deviation:
Summarizes the deviations of a large distribution.
Indicates whether the variation from the mean is by chance or real.
Helps in finding the standard error.
Helps in finding the suitable size of sample.
Standard deviation is only interpretable as a summary measure for variations having approximately symmetric distributions.
COEFFICIENT OF VARIATION:
Used to compare relative variability, i.e. the variation of the same character in two or more series.
Used to compare the variability of one character in two different groups having different magnitudes of values, or to compare two characters in the same group, by expressing the variability as a percentage.
$C.V. = \frac{S.D.}{\text{mean}} \times 100$
The higher the C.V., the greater the variability.
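A minimal sketch of the coefficient of variation in Python, comparing two characters measured in different units in the same group. The means and standard deviations below are hypothetical numbers chosen for easy arithmetic, not values from the slides.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = S.D. x 100 / mean, expressed as a percentage."""
    return 100 * sd / mean


# Hypothetical group: height has mean 160 cm and SD 8 cm,
# weight has mean 60 kg and SD 6 kg
cv_height = coefficient_of_variation(sd=8, mean=160)
cv_weight = coefficient_of_variation(sd=6, mean=60)

print(f"C.V. of height: {cv_height:.1f}%")
print(f"C.V. of weight: {cv_weight:.1f}%")
print("More variable character:", "weight" if cv_weight > cv_height else "height")
```

Because the C.V. has no units, the two characters can be compared directly even though one SD is in centimetres and the other in kilograms.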
THE NORMAL CURVE:
The curve is bell shaped.
The curve is symmetrical about the middle point.
The mean is located at the highest point of the curve, and the measures of central tendency coincide.
The maximum number of observations is at the value of the variable corresponding to the mean.
The number of observations gradually decreases on either side, with very few observations at the extreme points.
The area under the curve between any two points, which corresponds to the number of observations between any two values of the variate, can be expressed in terms of a relationship between the mean and the SD:
a) Mean ± 1 S.D. covers 68.3% of the observations;
b) Mean ± 2 S.D. covers 95.4% of the observations;
c) Mean ± 3 S.D. covers 99.7% of the observations.
This relationship is used for fixing confidence intervals.
The normal distribution law forms the basis for various tests of significance.
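The coverage figures for the normal curve can be reproduced from the error function in Python's standard `math` module. This is a sketch of the standard result that the proportion within mean ± k SD equals erf(k / √2); the function name `coverage` is illustrative.

```python
import math


def coverage(k):
    """Proportion of a normal distribution lying within mean ± k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Mean ± {k} S.D. covers {coverage(k) * 100:.1f}% of the observations")

# The 1.96 multiplier used for the zone of acceptance corresponds to about 95%
print(f"Mean ± 1.96 S.D. covers {coverage(1.96) * 100:.1f}% of the observations")
```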
SKEWED DISTRIBUTION:
Non-symmetrical distribution.
Mean, median and mode are not the same.
Negatively skewed: extreme scores at the lower end; mean < median < mode.
Positively skewed: extreme scores at the higher end; mean > median > mode.
The further apart the mean and median, the more the distribution is skewed.
Measures of shape:
Describe how data are distributed: symmetric or skewed.
Left-skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-skewed: Mode < Median < Mean
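PROBABILITY:
The relative frequency or probable chance of occurrence with which an event is expected to occur on an average.
Expressed as p.
Ranges from 0 to 1: when p = 0, there is no chance of the event happening; when p = 1, there is a 100% chance of the event happening.
$p = \frac{\text{number of events occurring}}{\text{total number of trials}}$
q = negative probability, i.e. the probability of the event not occurring ($q = 1 - p$).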
Methods to estimate the difference between the estimates of samples. Two hypotheses are made:
1. Null hypothesis or hypothesis of no difference
2. Alternative hypothesis of significant difference
1. Null hypothesis or hypothesis of no difference [Ho]
Asserts that there is no real difference between the sample and the general population.
The difference found is accidental and arises out of sampling variations.
2. Alternative hypothesis of significant difference
States that the sample result is different from the hypothetical value of the population.
To minimize errors, the sampling distribution or area under the normal curve is divided into two regions or zones:
1. Zone of acceptance: if the sample falls within mean ± 1.96 SE, the null hypothesis is accepted.
2. Zone of rejection: if the sample falls in the shaded area beyond mean ± 1.96 SE, the null hypothesis is rejected.
Type I error:
The null hypothesis is rejected when it is true, i.e. it is rejected even though the sample falls in the zone of acceptance.
Serious error.
Type II error:
The null hypothesis is wrongly accepted, i.e. it is accepted when it is false, even though the sample falls in the zone of rejection.
Not a serious error; needs only confirmation of the result by changing the level of significance.
                            Accept it           Reject it
Null hypothesis is true     Correct decision    Type I error
Null hypothesis is false    Type II error       Correct decision
DEGREES OF FREEDOM:
Defined as the number of independent values in a sample.
Eg: when there are 10 values, there are 9 choices or degrees of freedom.
In the unpaired t-test of the difference between 2 means, df = n1 + n2 - 2, where n1 and n2 are the numbers of observations in the two samples.
In the paired t-test, df = n - 1.
STANDARD ERROR:
Standard error of mean = SD of the means of several samples from the same population.
$SE = \frac{\text{SD of observations in the sample}}{\sqrt{\text{number of observations in the sample}}}$
Variation in biological observation
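A minimal Python sketch of the standard error of the mean, SE = SD / √n, estimated from a single sample. The `pulse` readings are hypothetical, and `statistics.stdev` is used for the SD, which applies the n - 1 divisor.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample SD divided by the square root of n."""
    sd = statistics.stdev(sample)           # sample SD with divisor n - 1
    return sd / math.sqrt(len(sample))


# Hypothetical pulse rates (beats per minute) of 9 adults
pulse = [68, 72, 70, 74, 66, 70, 72, 68, 70]

se = standard_error(pulse)
print(f"mean = {statistics.mean(pulse)}, SD = {statistics.stdev(pulse):.3f}, SE = {se:.3f}")
```

The SE shrinks as the sample size grows, which is why larger samples give more stable estimates of the mean.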
An impossible event has probability 0.
An event which must occur has probability 1.
Measured on a scale from 0 to 1 (marked at 0.25, 0.5 and 0.75): event impossible (0), event unlikely to happen, event equally likely to happen or not (0.5), event certain (1).
P < 0.001: very highly significant
P < 0.01: highly significant
P < 0.05: significant
P > 0.05: not significant
Whenever 2 sets of observations have been compared, it becomes essential to find whether the difference observed between the 2 groups is because of sampling variation or any other factor.
Method: tests of significance.
Tests of significance
Parametric tests:
Their model specifies certain conditions about the parameters of the population from which the research sample is drawn. Used for quantitative data.
Nonparametric tests or distribution free tests:
Their model does not specify conditions about the parameters of the population from which the research sample is drawn. Used for qualitative data.
Parametric tests:
Large sample tests: Z-test
Small sample tests: t-test (independent sample t-test, paired t-test), F-test [ANOVA]
Non-parametric tests:
Chi-square test
Wilcoxon signed rank test
Mann-Whitney U test
Spearman's correlation test
McNemar's test
Fisher's exact probability test
Tests are used to compare:
a sample mean with the population mean
the means of two samples
a sample proportion with the population proportion
the proportions of two samples
the association between two attributes
One-sided tests have one rejection region, i.e. you check whether the parameter of interest is larger (or smaller) than a given value.
Two-sided tests are used when we test a parameter for equality to a certain value; deviations from that value in either direction lead to rejection.
Z-TEST:
Used for large samples (> 30).
The difference observed between the sample estimate and that of the population is expressed in terms of SE.
The ratio between the observed difference and the SE is called Z.
$Z = \frac{\text{difference in means}}{\text{SE of mean}}$
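A sketch of the Z calculation in Python for comparing a sample mean with a population mean. The function name `z_test` and the numbers are hypothetical; the two-sided probability is taken from the normal curve using `math.erfc`, which is an addition for illustration rather than a step described in the slides.

```python
import math


def z_test(sample_mean, population_mean, sd, n):
    """Return Z = (difference in means) / (SE of mean) and the two-sided P value."""
    se = sd / math.sqrt(n)                            # standard error of the mean
    z = (sample_mean - population_mean) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))    # area beyond ±|z| under the normal curve
    return z, p_two_sided


# Hypothetical: 100 adults with mean pulse 74, compared with a population mean of 72, SD 10
z, p = z_test(sample_mean=74, population_mean=72, sd=10, n=100)

print(f"Z = {z:.2f}, P = {p:.4f}")
print("Beyond mean ± 1.96 SE (zone of rejection):", abs(z) > 1.96)
```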
t-TEST:
Designed by W.S. Gosset.
Used in case of small samples.
t is the ratio of the observed difference between the means of two small samples to the SE of that difference.
When each individual gives a pair of observations, the paired t-test is used to test for the difference in the pairs of values.
PAIRED t-TEST:
Used to compare the average for measurements made twice within the same person, before vs. after.
For example: did the systolic blood pressure change significantly from the scene of the injury to admission?
Univariate, matched, interval, normal, 2 groups.
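A minimal sketch of the paired t statistic in Python: the mean of the paired differences divided by its standard error, with df = n - 1. The blood pressure readings are hypothetical, and the resulting t would still have to be compared with a t table at the chosen level of significance.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two measurements per individual."""
    diffs = [b - a for a, b in zip(first, second)]    # one difference per individual
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return mean_diff / se_diff, n - 1


# Hypothetical systolic blood pressure (mm Hg) of 6 patients
scene = [120, 130, 125, 140, 135, 128]        # at the scene of the injury
admission = [126, 134, 129, 150, 137, 132]    # on admission

t, df = paired_t(scene, admission)
print(f"t = {t:.3f} with {df} degrees of freedom")
```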
CHI-SQUARE TEST:
The most commonly used statistical test.
Developed by Karl Pearson.
Used for qualitative data.
Tests whether the difference in the distribution of attributes in different groups is due to sampling variation or otherwise.
Eg: oral hygiene instructions and development of new cavities.

Group                                   Number of new cavities
                                        0-1    2-3    4-5    Total
No. who received instructions           30     15     5      50
No. who did not receive instructions    20     15     15     50
Total                                   50     30     20     100
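The chi-square statistic for the oral hygiene example can be worked out in a few lines of Python. Expected frequencies are computed as (row total x column total) / grand total, and χ² is the sum of (O - E)² / E over all cells. The P value line uses the fact that, for 2 degrees of freedom only, the upper-tail probability equals exp(-χ²/2); for other tables a chi-square table would be needed.

```python
import math

# Observed frequencies: columns are 0-1, 2-3 and 4-5 new cavities
observed = [
    [30, 15, 5],     # received oral hygiene instructions
    [20, 15, 15],    # did not receive instructions
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)          # (rows - 1) x (columns - 1)
p_value = math.exp(-chi_square / 2)                        # valid for df = 2 only

print(f"chi-square = {chi_square:.2f}, df = {df}, P = {p_value:.4f}")
```

All expected frequencies here (25, 15 and 10 in each row) are at least 5, so no correction is needed for this table.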
Drawbacks:
Tells us about the association but fails to measure the strength of association.
The test is unreliable if the expected frequency in any one cell is less than 5. Correction is done by subtracting 0.5 from |O - E| (Yates's correction).
Not applicable when there is 0 or 1 in any of the cells [resort to Fisher's exact probability test].
Paired samples:
Wilcoxon signed rank test [matched pairs test]:
Find the differences between each pair of values and assign ranks to the differences from the smallest to the largest without regard to sign. In case of ties, assign each of the tied observations the mean of the ranks which they jointly occupy.
The actual signs of the differences are then put to the corresponding ranks and the test statistic T is calculated, which is the smaller of the two sums [the sum of the negative ranks and the sum of the positive ranks].
The calculated value must be equal to or smaller than the table value in order to be considered significant.
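The ranking steps of the signed rank test can be sketched in Python as follows. The helper names `mean_ranks` and `wilcoxon_t` and the plaque scores are hypothetical. Pairs with a zero difference are dropped before ranking, which is a common convention assumed here and not stated in the slides.

```python
def mean_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    # A value first appears at 0-based position index(v) and occupies count(v) ranks
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def wilcoxon_t(first, second):
    """Test statistic T: the smaller of the positive and negative rank sums."""
    diffs = [b - a for a, b in zip(first, second) if b != a]   # assumption: zero differences dropped
    ranks = mean_ranks([abs(d) for d in diffs])                # rank without regard to sign
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(positive, negative)


# Hypothetical plaque scores of 8 patients before and after treatment
before = [10, 12, 9, 14, 11, 13, 8, 15]
after = [8, 13, 6, 10, 11, 9, 9, 10]

T = wilcoxon_t(before, after)
print("T =", T)
```

T is then compared with the table value for the number of non-zero differences.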
Unpaired samples:
Mann-Whitney test [U test]:
Used to determine whether two independent samples have been drawn from the same population.
Applies under very general conditions.
Rank the data jointly, taking them as belonging to a single sample, in either ascending or descending order.
Now find the sum of the ranks assigned to the values of the 1st sample [R1] and the 2nd sample [R2] separately, then work out the test statistic, i.e.
$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$
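A short Python sketch of the U calculation, using the standard form U = n1·n2 + n1(n1 + 1)/2 - R1. The helper names and the two samples are hypothetical; the data are ranked jointly in ascending order, and the statistic is also computed from R2 as a check, since the two values must add up to n1·n2.

```python
def mean_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def mann_whitney_u(sample1, sample2):
    """Return U computed from R1 and, as a check, U computed from R2."""
    n1, n2 = len(sample1), len(sample2)
    ranks = mean_ranks(sample1 + sample2)    # rank both samples jointly
    r1 = sum(ranks[:n1])                     # sum of ranks of the 1st sample
    r2 = sum(ranks[n1:])                     # sum of ranks of the 2nd sample
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


# Hypothetical scores from two independent groups
group1 = [3, 5, 6, 8]
group2 = [1, 2, 4, 7, 9]

u1, u2 = mann_whitney_u(group1, group2)
print("U from R1 =", u1, "| U from R2 =", u2)
```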
Fisher's exact probability test:
Used in place of the χ² test if:
there is 0 or 1 in any of the cells, or any expected value is < 1;
any cell frequency is < 5, or more than 20% of the expected frequencies are < 5.
Limitations of tests of significance:
Tests are only useful aids for decision making, not decision making itself.
They do not explain the reasons why the difference exists.
Results are based on probabilities and as such cannot be expressed with full certainty.
Inferences based on them cannot be said to be entirely correct evidence concerning the truth of the hypothesis.
ANALYSIS OF VARIANCE (ANOVA):
Compares more than two samples.
Compares variation between the classes as well as within the classes.
For such comparisons there is a high chance of error using the t or Z test.
One-way ANOVA: used to compare 3 or more means from independent groups. Eg: is the age different between White, Black and Hispanic patients?
Two-way ANOVA: used to compare 2 or more means by 2 or more factors. Eg: is the age different between males and females, with and without pneumonia?
CORRELATION COEFFICIENT:
Measures the strength of the linear relationship between two quantitative variables.
Denoted by the letter r.
Ranges between -1 and +1.
The closer to -1, the stronger the negative linear relationship.
The closer to +1, the stronger the positive linear relationship.
The closer to 0, the weaker any linear relationship.
Pearson's correlation coefficient:
$r = \frac{\sum (X - \bar{x})(Y - \bar{y})}{\sqrt{\sum (X - \bar{x})^2 \sum (Y - \bar{y})^2}}$
Does not prove whether one variable alone can cause the change in the other.
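A minimal Python sketch of Pearson's correlation coefficient computed directly from its definition. The function name `pearson_r` and the paired values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def pearson_r(xs, ys):
    """r = sum((X - x̄)(Y - ȳ)) / sqrt(sum((X - x̄)²) * sum((Y - ȳ)²))."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations on two quantitative variables
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}")

# A perfectly decreasing relationship gives r = -1
r_negative = pearson_r([1, 2, 3], [3, 2, 1])
print(f"r = {r_negative:.1f}")
```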
Thank you