

BIOSTATISTICS


Introduction
Definition
Essential features of statistics
Collection of data
Sources of data
Sampling
Uses of statistics in dental science
Presentation of data
Measures of central tendency
Variability and its measure
The normal curve
Probability
Conclusion
Tests of significance

Etymology:
Italian word: statista, meaning a statesman (relating to the political state).
German word: Statistik.

Definitions:
Statistics: the principles and methods for the collection, presentation, analysis and interpretation of numerical data.
Statistics is also defined as the science and art of dealing with variation in such a way as to obtain reliable results.

Statistics is an important and integral part of research methodology. It is a pervasive force on which the entire spectrum of clinical decision making depends.

Tests of significance are among the central concepts in statistics. They are the mathematical methods by which the probability of an observed difference occurring by chance is found.

Hence they are used to estimate whether the relationship observed in the data occurred purely by chance or reflects a real relationship between the variables, thereby testing the hypothesis proposed at the start of the study.
They constitute a common yardstick that can be understood by many people, and they communicate essential information about a research project.

John Graunt: father of health statistics.

Biostatistics: the method of collecting, organizing, analyzing, tabulating and interpreting data related to living organisms and human beings; the application of statistics to health problems.
Health statistics: public/community health.
Medical statistics: medicine.
Vital statistics: demography.
Dental statistics: dentistry.

To define what is normal or healthy in a population. Ex: pulse rate per minute.
To find:
a) the statistical difference between the means of two groups. Ex: mean blood pressure of two cricket teams after a cricket match.
b) the correlation between two variables. Ex: female literacy rate and infant mortality rate.
c) the usefulness of sera and vaccines in the field. Ex: % of deaths among the vaccinated compared with % of deaths among the non-vaccinated.
To test the efficacy of different treatments. Ex: medical versus surgical management of angina patients.

To assess the state of oral health in the community and to determine the availability and utilization of dental care facilities.
To indicate the basic factors underlying the state of oral health by diagnosing the community, and to find solutions to such problems.
To determine the success or failure of specific oral health care programs, i.e. to evaluate programme action.
To promote health legislation and help create administrative standards for oral health.

Data
Definition: a collective recording of observations, either numerical or otherwise. Data fall into two broad categories:
Qualitative data: nominal, dichotomous, ordinal
Quantitative data: discrete, continuous

Types of data: qualitative, quantitative
Sources of data: primary, secondary
Methods of collecting data: census, sampling

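Quantitative data are measured on one of two scales:
Interval scale: has no absolute zero. Eg. the centigrade (Celsius) scale of temperature.
Ratio scale: has a true (absolute) zero. Eg. the Kelvin temperature scale. This is the most common type of quantitative data.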

Nominal data: naming or categorical variables that have no measurement scale.
Examples:
Recording blood groups: a) O b) A c) B d) AB
Reasons for extraction of teeth: a) caries b) periodontitis c) therapeutic d) others

Ordinal (ranked) data: variables with more than two categories that have a clearly implied direction (order), but are not measured on a measurement scale.
Examples:
Severity of patient-perceived pain: a) no pain b) mild pain c) moderate pain d) severe pain
Esthetic concerns of children: a) satisfied b) neutral c) not satisfied

Dichotomous data (binary variables): the variable can have only two values; it may or may not be directional.
Examples: sex of the respondents; presence or absence of dental disease in a village population.
Nominal, ordinal and dichotomous data can be called categorical data.

Census: the total process of collecting, compiling and publishing demographic, economic and social data pertaining, at a specified time or times, to all persons in a country or delimited territory.

The first regular census in India: 1881
Recent census in India: March 2001
Census Act: 1948
Functions: to describe the demographic, social and economic conditions of people; to provide baseline data.
Advantages: complete information.
Disadvantages: expensive, time-consuming, needs more manpower, lesser accuracy.

Objective of classification of data: to make the data simple, concise, meaningful, interesting and helpful in further analysis.
There are two main methods of presenting data: tabulation and diagrams.

Data can be classified on the following bases:
Geographical, i.e. area-wise, e.g. cities, districts etc.
Chronological, i.e. on the basis of time.
Qualitative, i.e. according to some attribute.
Quantitative, i.e. in terms of magnitude.
The two elements of classification are the variable and the frequency.

Variable: a name denoting a condition, occurrence or effect that can assume different values. A variable is divided into subgroups (classes), each having a lowest and a highest value.
Class interval: the difference between the upper and lower limits of a class. Eg: in the class 5-14, 5 is the lower limit and 14 is the upper limit; class interval = 14 - 5 = 9.
Frequency: the number of units belonging to each group of the variable.
Frequency distribution table: a way of presenting data in a table, showing the frequency of each class.
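A frequency distribution table can be sketched in a few lines of Python. This is an illustrative sketch only: the function `frequency_table`, its parameter `size` (the number of whole-number values in each class) and the observations are hypothetical, and the code assumes non-negative integer data grouped into classes written as 0-4, 5-9, 10-14.

```python
from collections import Counter


def frequency_table(values, size=5, start=0):
    """Count integer observations in equal classes such as 0-4, 5-9, 10-14."""
    # Index of the class each observation falls into (0 for 0-4, 1 for 5-9, ...)
    counts = Counter((v - start) // size for v in values)
    rows = []
    for k in range(max(counts) + 1):
        lower = start + k * size
        upper = lower + size - 1
        # Classes with no observations are still listed, with frequency 0
        rows.append((f"{lower}-{upper}", counts.get(k, 0)))
    return rows


# Hypothetical observations, e.g. number of decayed teeth per child
data = [2, 7, 3, 12, 5, 9, 1, 14, 6, 8]
table = frequency_table(data)
for class_limits, frequency in table:
    print(f"{class_limits:>6}  {frequency}")
```

Listing empty classes with a frequency of 0 keeps the class intervals continuous, which the histogram and frequency polygon rely on.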

Frequency distribution table:
Title of the table: given at the bottom.
The number of class intervals should be between 5 and 20.
Class intervals should be of equal width.
Class limits should be clearly defined to avoid ambiguity, e.g. 0-4, 5-9, 10-14, etc.
Rows and columns should be clearly defined, with headings.
Units of measurement should be specified.
If the data are not original, the source of the data should be mentioned at the bottom of the table.

Diagrams are extremely useful: they are attractive to the eyes, give a bird's-eye view of the entire data, have a lasting impression, and facilitate comparison of data relating to different time periods and regions.

TYPES OF DIAGRAMS:
Bar diagram: qualitative data
Multiple bar diagram: qualitative data
Component bar diagram: qualitative data
Proportional bar diagram: qualitative data
Pie diagram: qualitative data
Histogram: quantitative data of continuous type
Frequency polygon: quantitative data
Line diagram: quantitative data (changes over time)
Cartogram or spot map: geographical distribution of frequencies

Basic rules:
A diagram should be self-explanatory, simple and consistent with the data.
Values of the variable are shown on the horizontal (X) axis and the frequency on the vertical (Y) axis.
There should not be too many lines on the graph; it should not look clumsy.
The scale of presentation is given at the right-hand top corner of the graph.
The scale of division of the two axes should be proportional.
The details of the variables and frequencies should be presented on the axes.

SIMPLE BAR DIAGRAM:
Represents qualitative data for only one variable.
The width of the bars remains the same; the length varies according to the frequency in each category.
Bars can be vertical or horizontal.
Limitation: represents only one classification and cannot be used for comparison.

MULTIPLE BAR DIAGRAM:
Used to compare qualitative data with respect to a single variable, e.g. sex-wise, or with respect to time or region. Each category of the variable has a set of bars of the same width, one for each section, with no gap between them; the length of each bar corresponds to the frequency.

COMPONENT BAR DIAGRAM:
Represents qualitative data, showing both the number of cases in the major groups and in their subgroups simultaneously. A bar is drawn for the cases of each major group, and each rectangle is divided according to the number in the subgroups.

PROPORTIONAL BAR DIAGRAM:
Represents qualitative data. When only the proportions of subgroups are to be compared between different major groups of observations, a bar of the same length (representing 1 or 100%) is drawn for each group and then divided according to the subgroup proportions in that major group.

PIE DIAGRAM:
The frequency of each group is shown as a segment of a circle; the angle of the segment denotes the frequency. Instead of comparing the lengths of bars, the areas of segments are compared.
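The angle of each segment is the group's share of 360 degrees. A minimal sketch; the function `pie_angles` and the extraction counts are hypothetical, chosen only to echo the "reasons for extraction" example given earlier.

```python
def pie_angles(frequencies):
    """Return the angle (in degrees) of each segment of a pie diagram."""
    total = sum(frequencies.values())
    # Each group gets a share of 360 degrees proportional to its frequency
    return {group: f * 360 / total for group, f in frequencies.items()}


# Hypothetical counts of teeth extracted, by reason
extractions = {"Caries": 50, "Periodontitis": 30, "Therapeutic": 15, "Others": 5}
angles = pie_angles(extractions)
print(angles)
```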

LINE DIAGRAM:
Useful to study changes of values in the variable over time; it is the simplest type of diagram. X-axis: hours, days, weeks, months or years. Y-axis: the value of any quantity pertaining to the X-axis.

HISTOGRAM:
Represents quantitative data of continuous type as a frequency distribution; it is a bar diagram without gaps between the bars.
X-axis: the size of an observation is marked. Starting from 0, the limits of each class interval are marked, the width corresponding to the width of the class interval in the frequency distribution.
Y-axis: the frequencies are marked. A rectangle is drawn above each class interval with height proportional to the frequency of that interval.

FREQUENCY POLYGON:
Represents the frequency distribution of quantitative data and is used to compare two or more frequency distributions.
A point is marked over the mid-point of each class interval, corresponding to the frequency.
The points are connected by straight lines. The first and last points are joined to the mid-points of the previous and next classes respectively.
To compare two or more frequency distributions, lines of different types are drawn on the same graph.
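The construction rule for the polygon's vertices can be written out directly. A minimal sketch, assuming at least two classes of equal width; the function `polygon_points` and the frequencies are hypothetical.

```python
def polygon_points(classes, frequencies):
    """Vertices of a frequency polygon for equal-width classes.

    classes: list of (lower, upper) class limits; frequencies: matching counts.
    """
    width = classes[1][0] - classes[0][0]  # assumes equal class width
    mids = [(lower + upper) / 2 for lower, upper in classes]
    points = list(zip(mids, frequencies))
    # Close the polygon at the mid-points of the (empty) previous and next classes
    return [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]


points = polygon_points([(0, 4), (5, 9), (10, 14)], [3, 5, 2])
print(points)
```

Joining these points with straight lines gives the polygon; drawing several such lines on one graph allows distributions to be compared.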

Fig.--. Height and weight of 20 students of CODS (X-axis: height in feet; Y-axis: weight in kg).

CARTOGRAM (SPOT MAP):
Shows the geographical distribution of frequencies of a characteristic.

PICTOGRAM:
Pictures representing the value of items are called pictograms. They are a most useful way of representing data to people who cannot understand conventional charts.

Summary measures:
Central tendency: mean, median, mode
Variation: range, variance, standard deviation

A single estimate of a series of data that summarizes the data is known as a parameter, and one such parameter is the measure of central tendency.
Objectives: to condense the entire mass of data; to facilitate comparison.
Types:
Arithmetic mean: mathematical estimate.
Median: positional estimate.
Mode: based on frequency.
A good measure of central tendency:
should be easy to understand and compute;
should be based on each and every item in the series;
should not be affected by extreme observations (either too small or too large values);
should be capable of further statistical computation;
should have sampling stability.

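Arithmetic mean:
The simplest measure of central tendency.
Ungrouped data:
Mean = (sum of all the observations of the data) / (number of observations in the data), i.e. $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
(Figure: two data sets plotted on number lines; mean = 5 for the first and mean = 6 for the second, whose scale extends to 14.)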
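A minimal sketch of the ungrouped-data formula. The two data sets are hypothetical and differ only in their largest value, which shows how one extreme observation shifts the mean.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


# Hypothetical data sets that differ only in their largest value
a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 10]
print(arithmetic_mean(a), arithmetic_mean(b))  # the extreme value pulls the mean up
```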

Median:
A robust measure of central tendency: it is not affected by extreme values.
(Figure: the same two data sets; median = 5 for both.)
In an ordered array, the median is the middle number. If n (or N) is odd, the median is the middle number; if n (or N) is even, the median is the average of the two middle numbers.
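The odd/even rule translates directly into code. A minimal sketch with hypothetical data; Python's standard `statistics.median` is used only as a cross-check.

```python
import statistics


def median(values):
    """Middle value of the ordered array (mean of the two middle values if n is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 10]  # same as a, except for one extreme value
c = [1, 2, 3, 4]      # even number of observations
print(median(a), median(b), median(c))
# The standard library gives the same results
assert [median(x) for x in (a, b, c)] == [statistics.median(x) for x in (a, b, c)]
```

Unlike the mean, the median of `a` and `b` is the same, because the extreme value only changes the end of the ordered array.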

Mode:
The value that occurs most often.
Not affected by extreme values.
Used for either numerical or categorical data.
There may be no mode; there may be several modes.
(Figure: one data set with mode = 9 and one data set with no mode.)
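A sketch that returns every mode, so that "no mode" and "several modes" are both visible. The rule that a data set in which every value occurs only once has no mode is an assumption made for this illustration; the function `modes` and the data are hypothetical.

```python
from collections import Counter


def modes(values):
    """All values that occur most often; an empty list means there is no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs only once: no mode (assumed convention)
    return [v for v, c in counts.items() if c == top]


print(modes([2, 5, 9, 9, 9, 11, 14]))     # one mode
print(modes([0, 1, 3, 4, 5, 6]))          # no mode
print(modes([1, 1, 2, 3, 3]))             # two modes
print(modes(["O", "A", "O", "B", "AB"]))  # categorical data (blood groups)
```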


Measures of variation:
Range
Interquartile range
Variance: population variance, sample variance
Standard deviation: population standard deviation, sample standard deviation

Range:
A measure of variation: the difference between the largest and the smallest observations:
Range = $X_{\text{largest}} - X_{\text{smallest}}$
It ignores the way in which the data are distributed.
(Figure: two data sets on an axis from 7 to 12, distributed differently but both with range = 12 - 7 = 5.)
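A minimal sketch of the range, using two hypothetical data sets that share the same extremes but are spread very differently between them.

```python
def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)


# Two hypothetical data sets with the same range but different spread
evenly_spread = [7, 8, 9, 10, 11, 12]
clustered = [7, 12, 12, 12, 12, 12]
print(data_range(evenly_spread), data_range(clustered))  # both 5
```

Because only the two extreme values enter the calculation, the range cannot distinguish these two distributions.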

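Variance:
Shows variation about the mean. Ex: observations (in days) for three doctors:
Dr A = 2, 4, 3, 4, 6, 6, 2, 5
Dr B = 4, 5, 4, 3, 4, 5, 3, 4
Dr C = 3, 3, 8, 3, 3, 3, 4, 5
Mean for Dr A = 32/8 = 4 days; mean for Dr B = 32/8 = 4 days; mean for Dr C = 32/8 = 4 days.
Sample variance:
$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$
(For samples larger than 30, dividing by n instead of n - 1 makes little difference.)
Deviations $(x - \bar{x})$: Dr A = -2, 0, -1, 0, 2, 2, -2, 1 (sum = 0); Dr B = 0, 1, 0, -1, 0, 1, -1, 0 (sum = 0); Dr C = -1, -1, 4, -1, -1, -1, 0, 1 (sum = 0).
Sums of squared deviations $\sum(x - \bar{x})^2$: Dr A = 18, Dr B = 4, Dr C = 22.
Dividing by n = 8: Dr A = 18/8 = 2.25, Dr B = 4/8 = 0.5, Dr C = 22/8 = 2.75.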
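The worked example can be reproduced with the three doctors' data given above. This sketch prints the sum of squared deviations, the variance with divisor n (as in the worked example) and the sample variance with divisor n - 1 for comparison; the function name `sum_of_squares` is hypothetical.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


doctors = {
    "A": [2, 4, 3, 4, 6, 6, 2, 5],
    "B": [4, 5, 4, 3, 4, 5, 3, 4],
    "C": [3, 3, 8, 3, 3, 3, 4, 5],
}
for name, days in doctors.items():
    ss = sum_of_squares(days)
    n = len(days)
    # Divisor n reproduces the worked example; divisor n - 1 gives the sample variance
    print(name, ss, ss / n, round(ss / (n - 1), 2))
```

With only eight observations per doctor, the two divisors give noticeably different results (for Dr A, 2.25 versus about 2.57).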

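Standard deviation:
The most important measure of variation. It shows variation about the mean and is also called the root mean square deviation. It has the same units as the original data.
Sample standard deviation (for smaller samples, n < 30):
$S = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$
Taking the square roots of the variances computed for the three doctors (divisor n): Dr A = 1.5, Dr B = 0.71, Dr C = 1.66.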
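The root mean square deviation can be computed directly from its name. This sketch reuses the doctors' data from the variance example; the helper `root_mean_square_deviation` is hypothetical, while `statistics.stdev` is the standard-library sample SD (divisor n - 1), shown for comparison.

```python
import math
import statistics


def root_mean_square_deviation(values):
    """Square root of the mean squared deviation from the mean (divisor n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


doctors = {
    "A": [2, 4, 3, 4, 6, 6, 2, 5],
    "B": [4, 5, 4, 3, 4, 5, 3, 4],
    "C": [3, 3, 8, 3, 3, 3, 4, 5],
}
for name, days in doctors.items():
    sd_n = root_mean_square_deviation(days)  # as in the worked example
    sd_n1 = statistics.stdev(days)           # sample SD, divisor n - 1
    print(name, round(sd_n, 2), round(sd_n1, 2))
```

`statistics.pstdev` uses the divisor n and should agree with the hand-written helper.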

(Figure: three data sets plotted on an axis from 11 to 21, all with mean = 15.5 but with different spread: Data A, s = 3.338; Data B, s = 0.9258; Data C, s = 4.57.)

Uses of standard deviation:
Summarizes the deviations of a large distribution.
Indicates whether the variation from the mean is by chance or real.
Helps in finding the standard error.
Helps in finding a suitable sample size.
The standard deviation is readily interpretable as a measure of variation only for approximately symmetric distributions.

Coefficient of variation:
Used to compare relative variability, i.e. the variation of the same character in two or more series. It compares the variability of one character in two different groups having different magnitudes of values, or of two characters in the same group, by expressing the SD as a percentage of the mean:
$\text{C.V.} = \frac{\text{S.D.}}{\text{Mean}} \times 100$
The higher the C.V., the greater the variability.
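A minimal sketch of the C.V. formula. The two groups, their SDs and their means are hypothetical, chosen so that the group with the smaller SD turns out to be relatively more variable.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = S.D. x 100 / mean, i.e. the SD as a percentage of the mean."""
    return sd * 100 / mean


# Hypothetical groups with very different magnitudes of values
cv_1 = coefficient_of_variation(sd=12, mean=120)  # 10.0 %
cv_2 = coefficient_of_variation(sd=6, mean=40)    # 15.0 %
print(cv_1, cv_2)  # group 2 is relatively more variable despite its smaller SD
```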

Normal curve:
The curve is bell-shaped and symmetrical about the middle point.
The mean is located at the highest point of the curve, and the measures of central tendency (mean, median and mode) coincide.
The maximum number of observations is at the value of the variable corresponding to the mean; the number of observations gradually decreases on either side, with very few observations at the extreme points.
The area under the curve between any two points corresponds to the number of observations between those two values of the variate, and can be expressed in terms of the mean and the SD:
a) Mean ± 1 S.D. covers 68.3% of the observations;
b) Mean ± 2 S.D. covers 95.4% of the observations;
c) Mean ± 3 S.D. covers 99.7% of the observations.
This relationship is used for fixing confidence intervals.
The normal distribution law forms the basis for various tests of significance.
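The percentages above follow from the standard normal distribution: the proportion within mean ± k SD equals erf(k/√2). A short sketch using the standard `math.erf` function; the helper name `proportion_within` is hypothetical.

```python
import math


def proportion_within(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3, 1.96):
    print(k, round(100 * proportion_within(k), 1))
```

The value for k = 1.96 (95.0%) is the one used for the zone of acceptance in tests of significance.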

Skewed distribution: a non-symmetrical distribution, in which the mean, median and mode are not the same.
Negatively skewed: extreme scores at the lower end; mean < median < mode.
Positively skewed: extreme scores at the higher end; mean > median > mode.
The further apart the mean and median, the more skewed the distribution.

Shape: describes how data are distributed. Measures of shape: symmetric or skewed.
Left-skewed: mean < median < mode
Symmetric: mean = median = mode
Right-skewed: mode < median < mean
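A rough check of the direction of skew compares the mean with the median, following the ordering rules above. This is a rule-of-thumb sketch, not a formal measure of skewness; the function `describe_shape` and both data sets are hypothetical.

```python
import statistics


def describe_shape(values):
    """Compare mean, median and mode to judge the direction of skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)
    if mean > median:
        shape = "positively (right) skewed"
    elif mean < median:
        shape = "negatively (left) skewed"
    else:
        shape = "symmetric (mean = median)"
    return mean, median, mode, shape


# Hypothetical data: one large extreme value, then one small extreme value
print(describe_shape([1, 2, 2, 2, 3, 3, 4, 5, 13]))
print(describe_shape([1, 9, 10, 10, 11, 11, 11, 11]))
```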

Probability:
The relative frequency, or probable chance of occurrence, with which an event is expected to occur on average.
Expressed as p; it ranges from 0 to 1. When p = 0 there is no chance of the event happening; when p = 1 there is a 100% chance of it happening.
p = (number of events occurring) / (total number of trials)
q = 1 - p is the negative probability, i.e. the probability of the event not occurring.
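A minimal sketch of p and q. The function `probability` and the counts (30 of 120 children) are hypothetical.

```python
def probability(events, trials):
    """p = number of times the event occurred / total number of trials."""
    p = events / trials
    q = 1 - p  # probability of the event not occurring
    return p, q


# Hypothetical: 30 of 120 examined children had a given condition
p, q = probability(30, 120)
print(p, q)  # 0.25 0.75
```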

Tests of significance: methods to estimate the difference between the estimates of samples. Two hypotheses are made:
1. Null hypothesis, or hypothesis of no difference (H0): asserts that there is no real difference between the sample and the general population. The difference found is accidental and arises from sampling variation.
2. Alternative hypothesis, of significant difference: states that the sample result is different from the hypothetical value of the population.
To minimize errors, the sampling distribution (the area under the normal curve) is divided into two regions or zones:
1. Zone of acceptance: samples within mean ± 1.96 SE; the null hypothesis is accepted.
2. Zone of rejection: samples in the shaded area beyond mean ± 1.96 SE; the null hypothesis is rejected.

Type I error:
The null hypothesis is rejected when it is true, i.e. a sample that falls in the zone of rejection purely by chance leads to rejection of a true null hypothesis. This is a serious error.

Type II error:
The null hypothesis is wrongly accepted when it is false, i.e. the sample falls in the zone of acceptance even though a real difference exists. This is not considered a serious error; it needs only confirmation of the result, e.g. by changing the level of significance.

Decision       Null hypothesis is true    Null hypothesis is false
Accept it      Correct decision           Type II error
Reject it      Type I error               Correct decision

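Degrees of freedom:
Defined as the number of independent values in a sample.
Eg: when there are 10 values with a fixed total, only 9 can vary freely, giving 9 degrees of freedom.
In the unpaired t-test of the difference between two means, df = n1 + n2 - 2, where n1 and n2 are the numbers of observations in the two samples.
In the paired t-test, df = n - 1, where n is the number of pairs.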

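Standard error of mean: the SD of the means of several samples from the same population.
SE = (SD of the observations in the sample) / √(number of observations in the sample), i.e. $SE = \frac{SD}{\sqrt{n}}$
While the SD measures variation in biological observations, the SE measures the variation of sample means.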
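A minimal sketch of the SE formula, combined with the mean ± 1.96 SE limits used for the zone of acceptance. The sample (n = 100, mean 120, SD 10) and the function name `standard_error` are hypothetical.

```python
import math


def standard_error(sd, n):
    """SE of the mean = SD of the observations / square root of n."""
    return sd / math.sqrt(n)


# Hypothetical sample: n = 100 observations, mean 120, SD 10
se = standard_error(sd=10, n=100)
lower, upper = 120 - 1.96 * se, 120 + 1.96 * se
print(se, round(lower, 2), round(upper, 2))  # 1.0 118.04 121.96
```

Quadrupling the sample size halves the SE, which is why larger samples give narrower limits.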

An impossible event has probability 0; an event which must occur has probability 1. Probability is measured on a scale:
0: impossible
0.25: unlikely to happen
0.5: as likely to happen as not
0.75: likely to happen
1: certain
P < 0.001: very highly significant
P < 0.01: highly significant
P < 0.05: significant
P > 0.05: not significant

Whenever two sets of observations have been compared, it becomes essential to find whether the difference observed between the two groups is due to sampling variation or to any other factor. The method used for this is the test of significance.

Tests of significance:
Parametric tests: their model specifies certain conditions about the parameters of the population from which the research sample is drawn. Used for quantitative data.
Nonparametric (distribution-free) tests: their model does not specify conditions about the parameters of the population from which the research sample is drawn. Used for qualitative data.

Parametric tests:
Large sample tests: Z-test, chi-square test. Used to compare a sample mean with the population mean, the means of two samples, a sample proportion with the population proportion and the proportions of two samples, and to test the association between two attributes.
Small sample tests: t-test (independent sample t-test, paired t-test), F-test [ANOVA].

Nonparametric tests: chi-square test, Wilcoxon signed rank test, Mann-Whitney U test, Spearman's correlation test, McNemar's test, Fisher's exact probability test.

One-sided tests have one rejection region, i.e. you check whether the parameter of interest is larger (or smaller) than a given value. Two-sided tests are used when we test a parameter for equality with a certain value; deviations from that value in either direction lead to rejection of the null hypothesis.

Z-test:
Used for large samples (n > 30). The difference observed between the sample estimate and that of the population is expressed in terms of SE. The ratio between the observed difference and the SE is called Z:
$Z = \frac{\text{difference in means}}{SE}$
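A sketch of a one-sample Z-test comparing a sample mean with a population mean. All numbers are hypothetical; the two-sided P value uses the standard identity P = erfc(|Z|/√2) for the standard normal distribution.

```python
import math


def z_test_mean(sample_mean, population_mean, sd, n):
    """Z = (sample mean - population mean) / SE, for large samples (n > 30)."""
    se = sd / math.sqrt(n)
    z = (sample_mean - population_mean) / se
    # Two-sided P value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = z_test_mean(sample_mean=122, population_mean=120, sd=10, n=100)
print(round(z, 2), round(p, 4))  # |Z| > 1.96, so P < 0.05
```

Since Z lies beyond ±1.96, this hypothetical sample falls in the zone of rejection.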

t-test:
Designed by W.S. Gosset. Used in the case of small samples. t is the ratio of the observed difference between the means of two small samples to the SE of that difference.
When each individual gives a pair of observations, the paired t-test is used to test for a difference between the paired values.
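A sketch of the unpaired (independent samples) t-test in its pooled-variance (Student) form, with df = n1 + n2 - 2 as stated under degrees of freedom. The function `unpaired_t` and both samples are hypothetical.

```python
import math
import statistics


def unpaired_t(x, y):
    """Student's t for two independent small samples (pooled SD)."""
    n1, n2 = len(x), len(y)
    # Pooled variance from the two sample variances (each with divisor n - 1)
    pooled = ((n1 - 1) * statistics.variance(x)
              + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y)) / se_diff
    return t, n1 + n2 - 2


t, df = unpaired_t([10, 12, 11, 13, 14], [8, 9, 10, 9, 9])
print(round(t, 3), df)
```

The calculated t (about 3.87) would then be compared with the table value, which is 2.306 for df = 8 at P = 0.05 (two-sided).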

Paired t-test:
Used to compare the average of measurements made twice within the same person, before vs. after. For example: did the systolic blood pressure change significantly from the scene of the injury to admission?
Conditions: univariate, matched, interval data, normal distribution, 2 groups.
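The paired t-test works on the differences within each person: t = mean difference / SE of the differences, with df = n - 1. A minimal sketch; the function `paired_t` and the blood pressure readings are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t-test: mean of the differences / SE of the differences."""
    d = [x - y for x, y in zip(before, after)]
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)
    return statistics.mean(d) / se, n - 1


# Hypothetical systolic BP (mm Hg) at the scene of injury and at admission
scene = [140, 150, 135, 160, 145]
admission = [130, 145, 130, 150, 140]
t, df = paired_t(scene, admission)
print(round(t, 3), df)
```

The result (t about 5.72 with df = 4) would be compared with the table value, 2.776 for df = 4 at P = 0.05 (two-sided).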

Chi-square test:
The most commonly used statistical test. Developed by Karl Pearson. Used for qualitative data, to test whether the difference in the distribution of attributes in different groups is due to sampling variation or otherwise.

e.g. Oral hygiene instructions and development of new cavities:

                                       Number of new cavities
Group                                  0-1     2-3     4-5     Total
No. who received instructions          30      15      5       50
No. who did not receive instructions   20      15      15      50
Total                                  50      30      20      100
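The chi-square statistic for the oral hygiene table above can be sketched as follows: expected frequency E = row total × column total / grand total, then χ² = Σ(O - E)²/E with df = (rows - 1)(columns - 1). The function name `chi_square` is hypothetical; the counts are those of the table.

```python
def chi_square(observed):
    """Pearson's chi-square for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: received / did not receive instructions; columns: 0-1, 2-3, 4-5 new cavities
table = [[30, 15, 5], [20, 15, 15]]
chi2, df = chi_square(table)
print(chi2, df)
```

Here χ² = 7.0 with 2 degrees of freedom, which exceeds the table value of 5.99 at P = 0.05.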

Drawbacks:
It tells us about the association but fails to measure the strength of the association.
The test is unreliable if the expected frequency in any one cell is less than 5. A correction, Yates' correction, is then made by subtracting 0.5 from |O - E|: $\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$
It is not applicable when there is 0 or 1 in any of the cells [resort to Fisher's exact probability test].

Paired samples:
Wilcoxon signed rank test [matched pairs test]:
Find the differences between each pair of values and rank the differences from the smallest to the largest without regard to sign. If there are ties, assign each tied observation the mean of the ranks which they jointly occupy.
The actual signs of the differences are then attached to the corresponding ranks, and the test statistic T is calculated as the smaller of the two sums (the sum of the negative ranks and the sum of the positive ranks).
The calculated value must be equal to or smaller than the table value to be considered significant.
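The ranking steps above can be sketched in Python. Dropping pairs with a zero difference is a common convention assumed here (the slides do not mention it); the functions `average_ranks` and `wilcoxon_t` and the paired scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every position holding the same value as position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_t(first, second):
    """Return T (smaller of the positive and negative rank sums) and n pairs used."""
    # Zero differences are dropped (assumed convention, not stated in the slides)
    d = [x - y for x, y in zip(first, second) if x != y]
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return min(w_plus, w_minus), len(d)


# Hypothetical paired scores for 8 patients, before and after treatment
before = [10, 8, 12, 9, 7, 11, 6, 10]
after = [7, 9, 8, 7, 9, 6, 5, 10]
T, n = wilcoxon_t(before, after)
print(T, n)
```

T is then compared with the table value for the number of pairs actually ranked (here 7).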

Unpaired samples:
Mann-Whitney test [U test]:
Used to determine whether two independent samples have been drawn from the same population. It applies under very general conditions.
Rank the data jointly, taking them as belonging to a single sample, in either ascending or descending order.
Now find the sums of the ranks assigned to the values of the 1st sample (R1) and the 2nd sample (R2) separately, then work out the test statistic:
$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$
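A sketch of the U statistic from the formula above, with joint ranking in ascending order and tied values given their mean rank. The functions `average_ranks` and `mann_whitney_u` and both samples are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """U = n1*n2 + n1(n1 + 1)/2 - R1, with R1 the rank sum of sample 1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # joint ranking, ascending
    r1 = sum(ranks[:n1])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1  # the two U values always add up to n1 * n2
    return u1, u2


u1, u2 = mann_whitney_u([12, 15, 11, 18, 14], [20, 17, 22, 16, 19, 21])
print(u1, u2, min(u1, u2))
```

Conventionally, the smaller of the two U values is compared with the table value for the given n1 and n2.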

Fisher's exact probability test:
Used in place of the χ² test if:
there is 0 or 1 in any of the cells, or any expected value is < 1;
any cell frequency is < 5, or more than 20% of the expected frequencies are < 5.
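For a 2 x 2 table, Fisher's test computes exact probabilities from the hypergeometric distribution with the row and column totals held fixed. A sketch of the common two-sided version, which sums the probabilities of all tables no more likely than the observed one; the function `fisher_exact_2x2` and the table with small counts are hypothetical. Integer counts are compared, so there is no rounding problem when checking ties.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact P for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Number of tables with top-left cell x, given the fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)


p = fisher_exact_2x2(1, 9, 11, 3)
print(round(p, 4))
```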

Limitations of tests of significance:
Tests are only useful aids to decision making, not decision making itself.
They do not explain why the difference exists.
Results are based on probabilities and so cannot be expressed with full certainty.
Inferences based on them cannot be said to be entirely correct evidence concerning the truth of the hypothesis.

Analysis of variance (ANOVA):
Compares more than two samples, comparing the variation between the classes as well as within the classes. For such comparisons there is a high chance of error if t- or Z-tests are used.
One-way ANOVA is used to compare three or more means from independent groups. Is the age different between White, Black and Hispanic patients?
Two-way ANOVA is used to compare two or more means by two or more factors. Is the age different between males and females, with and without pneumonia?
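One-way ANOVA compares the variation between groups with the variation within groups: F = (SS between / (k - 1)) / (SS within / (n - k)). A minimal sketch; the function `one_way_anova` and the three groups of values are hypothetical.

```python
import statistics


def one_way_anova(*groups):
    """F = mean square between groups / mean square within groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical values for three independent groups
f, df_b, df_w = one_way_anova([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f, df_b, df_w)
```

The F value (12.0 with 2 and 6 degrees of freedom) would be compared with the table value of F, which is about 5.14 at P = 0.05 for these degrees of freedom.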

Correlation:
Measures the strength of the linear relationship between two quantitative variables.
Denoted by the letter r; ranges between -1 and +1.
The closer to -1, the stronger the negative linear relationship.
The closer to +1, the stronger the positive linear relationship.
The closer to 0, the weaker any linear relationship.
Pearson's correlation coefficient:
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$
Correlation does not prove whether one variable alone can cause the change in the other.
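A direct translation of Pearson's formula. The function `pearson_r` and the paired values are hypothetical.

```python
import math


def pearson_r(x, y):
    """r = sum((X - mean X)(Y - mean Y)) / sqrt(sum((X - mean X)^2) * sum((Y - mean Y)^2))."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 3))
```

On Python 3.10 or later, `statistics.correlation` should give the same value. As noted above, a high r alone does not show that one variable causes the change in the other.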

Thank you
