
© Attribution Non-Commercial (BY-NC)


categorize, classify, manipulate, and present a set of data in a concise way to make it suitable for …

Raw data are measurements or variables that have not been organized, summarized, or otherwise manipulated.

Objectives of data organization, summarization, and manipulation:

- To see the similarity and dissimilarity of objects.
- To see the important features of the collected data.
- To prepare data for summarization and analysis.

Victory College, Faculty of Health Science, Department of

Public Health Officer, Biostatistics Lecture Note Prepared By 1

8/12/2010

Minlikalew D. (B.Sc.)

Descriptive statistics include:

- Frequency distributions.
- Tables.
- Graphs.
- Numerical summary measures:
  - Measures of central tendency.
  - Measures of variability.

Before summarizing, organizing, categorizing/classifying, displaying/presenting, and analyzing data, we need to understand:

- The concept of data.
- The concept of variable.
- The concept of measurement and measurement scale.

Data

- Are facts or information that support reasoning.
- Are a collection of observations on one or more variables.
- Are the raw material of statistics.
- Are information collected from a source.

There are different criteria for classifying data into groups.

A. Based on the nature of the variable on which the data are collected:

I. Qualitative/Categorical/Non-numerical data: data collected on a qualitative variable and obtained simply by noting the possession of a certain attribute or characteristic.

Example:

- Breast feeding status (exclusive, partial, none).
- Whether the mother was employed (yes, no).
- Marital status (single, married, divorced, widowed).

Nominal data are categorical data where the order of the categories is arbitrary. A good example is race/ethnicity, with values 1=White, 2=Hispanic, 3=American Indian, 4=Black, 5=Other; the numeric codes are only labels. Certain statistical concepts are meaningless for nominal data. For example, it would be silly to ask what the mean and standard deviation are for race/ethnicity.
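As a minimal Python sketch of this point (the codes and the relabelling are hypothetical), swapping two arbitrary category codes changes the "mean" of nominal data, while the mode still identifies the same category:

```python
from statistics import mean, mode

# Hypothetical race/ethnicity responses coded 1=White, 2=Hispanic,
# 3=American Indian, 4=Black, 5=Other (the codes are labels only).
codes = [1, 4, 2, 1, 5, 1, 4, 2, 1, 3]

# An equally valid, arbitrary coding of the same categories (swap 1 and 5).
relabel = {1: 5, 2: 2, 3: 3, 4: 4, 5: 1}
recoded = [relabel[c] for c in codes]

# The "mean" depends on the arbitrary coding, so it carries no information.
print(mean(codes), mean(recoded))

# The mode points to the same category under either coding.
print(mode(codes), mode(recoded), relabel[mode(codes)] == mode(recoded))
```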


Ordinal data: are categorical data where there is a logical

ordering to the categories. A good example is the Likert scale

that you see on many surveys: 1=Strongly disagree;

2=Disagree; 3=Neutral; 4=Agree; 5=Strongly agree. While

computation of a median is easily justified for ordinal data,

some statisticians have reservations about computing a mean

for ordinal data.
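A small Python sketch of summarizing ordinal data by the median, using hypothetical Likert responses. Choosing `median_low` from the standard `statistics` module is this sketch's own assumption: it always returns an observed category instead of a value halfway between two categories.

```python
from statistics import median, median_low

LIKERT = {1: "Strongly disagree", 2: "Disagree", 3: "Neutral",
          4: "Agree", 5: "Strongly agree"}

# Hypothetical responses from 10 respondents (ordinal codes 1-5).
responses = [4, 5, 3, 4, 2, 3, 5, 1, 3, 4]

# With an even number of responses the ordinary median can fall between
# two categories (here 3.5), which has no label on an ordinal scale.
print(median(responses))

# median_low returns the lower of the two middle values: a real category.
mid = median_low(responses)
print(mid, LIKERT[mid])
```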

II. Quantitative/Numerical data: data collected on quantitative variables and obtained by counting or measurement.

Quantitative/numerical data consist of both continuous and discrete data types.

a. Continuous data: consist of both interval and ratio data.

Interval data are continuous data where differences are interpretable, but where there is no "natural" zero. A good example is temperature in degrees Fahrenheit. Ratios are meaningless for interval data: you cannot say, for example, that one day is twice as hot as another day.
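To see why ratios are meaningless on an interval scale, the sketch below (hypothetical readings of 40 °F and 80 °F; the helper `f_to_c` is introduced here for illustration) converts the same two temperatures to Celsius and compares ratios and differences:

```python
def f_to_c(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

day1_f, day2_f = 40.0, 80.0

# In °F the second reading looks "twice" the first ...
ratio_f = day2_f / day1_f
# ... but the same two temperatures in °C give a ratio of about 6,
# because each scale puts its zero at an arbitrary point.
ratio_c = f_to_c(day2_f) / f_to_c(day1_f)

# Differences keep their meaning: 40 °F apart is always 40 * 5/9 °C apart.
diff_c = f_to_c(day2_f) - f_to_c(day1_f)

print(ratio_f, round(ratio_c, 2), round(diff_c, 2))
```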


Ratio data are continuous data where both differences and ratios are interpretable. Ratio data have a natural zero. A good example is birth weight in kg.

The distinction between interval and ratio data is subtle, but fortunately it is often not important. Certain specialized statistics, such as the geometric mean and the coefficient of variation, can only be applied to ratio data.
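A minimal Python sketch, assuming Python 3.8+ for `statistics.geometric_mean` and using hypothetical birth weights and temperatures, of why the coefficient of variation is reserved for ratio data: changing the unit of ratio data leaves it unchanged, while changing the unit of interval data (which moves the zero) does not. The helper `cv` is hypothetical.

```python
from statistics import geometric_mean, mean, stdev

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical birth weights in kg (ratio scale: 0 kg means no weight at all).
weights_kg = [2.8, 3.1, 3.4, 2.5, 3.9, 3.0]
weights_g = [w * 1000 for w in weights_kg]  # the same data in grams

# A change of unit only rescales ratio data, so the CV is unchanged and
# the geometric mean is rescaled by the same factor (1000).
print(round(cv(weights_kg), 4), round(cv(weights_g), 4))
print(round(geometric_mean(weights_kg), 3), round(geometric_mean(weights_g), 1))

# Interval data: converting °F to °C shifts the zero, so the CV changes.
temps_f = [98.2, 98.6, 99.1, 100.4]
temps_c = [(t - 32) * 5 / 9 for t in temps_f]
print(round(cv(temps_f), 4), round(cv(temps_c), 4))
```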

b. Discrete data: quantitative data collected on a discrete variable.

B. Based on the source from which the data are collected:

I. Primary data: data collected by the investigators themselves. Such data are original in character and are mostly generated by a census or sample survey conducted by individuals or research institutions.

II. Secondary data: data obtained from secondary sources, for example journals, reports, government publications, and publications of professionals and research organizations.

Source of data

There are different sources of data on health and health-related conditions. These are:

- Health surveys.
- Vital statistics.
- Health service records.
- Census.

Systems for collecting data

1. Regular system: registration of events as they become available.
2. Ad hoc system: a form of survey to collect information that is not available on a regular basis.

Data collection techniques/methods

There are different methods of data collection. To select the appropriate method, we need to consider the following points.

Selection of a data collection method is based on:

- The nature of the investigation: whether the study is qualitative or quantitative.
- The resources available and the relevance of the information.
- The acceptability and accuracy of the method.
- The research interest: what the study focuses on and covers.
- Familiarity with the procedure.
- The characteristics of the study population, which are also influencing factors.

Based on the above selection points, the methods are:

For qualitative data:

1. Focus group discussion.
2. In-depth interview (unstructured/semi-structured).
3. Observation (participant/non-participant).
4. Case studies.
5. Rapid appraisal techniques.
6. Nominal group techniques.
7. Delphi techniques and life histories.

For quantitative data:

1. Face-to-face interview.
2. Self-administered questionnaire.
3. Postal (mail) and telephone interviews.
4. Measuring height, length, weight, BMI, MUAC, chest circumference, head circumference, blood pressure, Hgb, Hct.
5. Using available information (record review), e.g. mortality reports, morbidity reports.

Decision-makers need information that is:

– Relevant,
– Timely,
– Accurate, and
– Usable.

Data collection techniques can be compared in terms of their advantages and disadvantages.

Summary of each data collection technique

Using available information
Advantages:
• Is inexpensive, because the data are already there.
• Permits examination of trends over the past.
Disadvantages:
• Data are not always easily accessible.
• Ethical issues concerning confidentiality may arise.
• Information may be imprecise or incomplete.
• Data collection may not be standardized.

Observing
Advantages:
• Gives more detailed and context-related information.
• Permits collection of information on facts not mentioned in the questionnaire.
Disadvantages:
• Ethical issues concerning confidentiality or privacy may arise.
• Observer bias may occur (the observer may only notice what interests him or her).
• The presence of the data collector can influence the situation observed.
• Thorough training of research assistants is required.

Interviewing
Advantages:
• Is suitable for use with illiterates.
• Permits clarification of questions.
• Has a higher response rate than written questionnaires.
Disadvantages:
• The presence of the interviewer can influence responses.
• Reports of events may be less complete than information gained through observations.

Small-scale flexible interview
Advantages:
• Permits collection of in-depth information and exploration of spontaneous remarks by respondents.
Disadvantages:
• The interviewer may inadvertently influence the respondents.
• Open-ended data are difficult to analyze.

… may be missed because spontaneous remarks by respondents are usually not recorded or explored.

Administering written questionnaires
Advantages:
• Less expensive.
• Permits anonymity and may result in more honest responses.
• Does not require research assistants.
• Eliminates bias due to phrasing questions differently with different respondents.
Disadvantages:
• Cannot be used with illiterate respondents.
• There is often a low rate of response.
• Questions may be misunderstood.

Variable

A variable is a characteristic that takes different values in different persons, places, or things: any aspect of an individual or object that is measured (e.g., BP) or recorded (e.g., age, sex) and can take different values. There may be one variable in a study or many.

E.g., a study of the treatment outcome of TB.

Variables can be broadly classified into:

A. Categorical (or qualitative) variables.
B. Quantitative (or numerical) variables.

A. Categorical (or qualitative)

Variables that cannot be measured numerically but can be divided into different categories are called qualitative or categorical variables.

A categorical variable cannot assume a numerical value but can be classified into non-numerical categories according to a set of rules.

The notion of magnitude is absent or only implicit.

A variable that has only two categories is called binary or dichotomous, e.g. sex. A variable with more than two categories is called polytomous, e.g. occupational status.

It can be:

1. Nominal: variables with no inherent order or ranking sequence, e.g. numbers used as names (group 1, group 2, ...), gender, etc.

2. Ordinal: variables with an ordered series, e.g. "greatly dislike, moderately dislike, indifferent, moderately like, greatly like". Numbers assigned to such variables indicate rank/order only; the "distance" between the numbers has no meaning.

B. Quantitative (or numerical) variables

A variable that can assume numerical values and is measured numerically. Quantitative data measure either "how much?" or "how many?" of something, i.e. a set of observations where each observation is a number that represents an amount or a count.

A quantitative variable has the notion of magnitude. It can be:

1. Discrete

- It can only take a limited number of separate values (usually whole numbers).
- It is characterized by gaps or interruptions in the values.
- The values are not just labels but actual measurable quantities.

Example:

- The number of episodes of diarrhoea a child has had in a year. You cannot have 12.5 episodes of diarrhoea.
- The number of accidents.
- The number of students in this class.
- The number of cars, etc.

2. Continuous

- It can take an infinite number of possible values in any given interval.
- It does not have gaps or interruptions.

Example: weight, income, age, time, etc.

3. Interval

- No true zero, e.g. 88 degrees is not necessarily double the temperature of 44 degrees.
- Equally spaced values, e.g. temperature: the difference between 66 and 67 degrees is taken to be the same as the difference between 76 and 77 degrees.

4. Ratio variables

Variables with equal intervals and a true zero point, e.g. age.

5. Independent variable

It is a hypothesized cause or influence on a dependent

variable. This might be a variable that you control, like a

treatment, or a variable not under your control, like an

exposure.

6. Dependent variable

The variable that you believe might be influenced or

modified by some treatment or exposure or the variable

you are trying to predict. Sometimes the dependent

variable is called the outcome variable.


Whether a variable is dependent or independent depends on the context of the study. For example, a variable that is dependent in one study may be independent in another.

Measurement and Measurement Scale

Measurement: the assignment of numbers or names to objects or events according to a set of rules. Not all measurements are the same.

Measurement scale: the way in which variables/numbers are defined and categorized. It describes the degree of precision with which a characteristic is measured.

Depending on the nature of the variable and the set of rules used to measure it, there are four scales of measurement.

Each scale of measurement has certain properties, which in turn determine which statistical analyses are appropriate.

1. Nominal scale

- The simplest and lowest (weakest) level of measurement, in which the values fall into unordered categories or classes.
- Uses names, labels, or symbols to assign each measurement; the numbers have NO meaning.
- Always measures qualitative data.

Characteristics to be fulfilled:

- Categories should be mutually exclusive.
- Categories should be exhaustive.
- The names or symbols can be interchanged without altering essential information.

Example: blood type, sex, race, marital status, eye color, type of tar, university attended, occupation, residence, etc.

2. Ordinal scale

- Assigns each measurement to one of a limited number of categories that are ranked in terms of order.
- The differences among categories are not necessarily equal and are often not even measurable.
- Although non-numerical, the categories can be considered to have a natural ordering.
- It is the next higher level of measurement.
- It is usually used for qualitative data.
- It is subjective in nature.
- Many health care variables are ordinal in nature.

Example: patient status, cancer stages, social class, pain level, dehydration status, Glasgow coma scale, etc.

3. Interval scale

- Measured on a continuum, and differences between any two numbers on the scale are of known size.
- It assigns each measurement to one of an unlimited number of categories.

- It has no true zero point: "0" is arbitrarily chosen and does not reflect the absence of temperature.
- The distance between adjacent values is equal and fixed, but ratios between values are not meaningful.
- It is used for truly quantitative data.

Examples: body temperature in °F or °C, directions in degrees, time of day, IQ.

4. Ratio scale

- Measurement begins at a true zero point and the scale has equal spacing.
- It is the highest level of measurement.
- It has a true zero point.
- Used for purely quantitative data.

Examples: height, weight, BP, etc.

Levels of measurement, from lowest to highest: Nominal, Ordinal, Interval, Ratio.

Summary of each measurement scale

Nominal: Objects with the same scale value are the same on some attribute. The values of the scale have no 'numeric' meaning in the way that you usually think about numbers.

Ordinal: Objects with a higher scale value have more of some attribute. The intervals between adjacent scale values are indeterminate. Scale assignment is by the property of "greater than," "equal to," or "less than."

Interval: Intervals between adjacent scale values are equal with respect to the attribute being measured. E.g., the difference between 8 and 9 is the same as the difference between 76 and 77.

Ratio: There is a true zero point for the scale. Ratios are equivalent, e.g., the ratio of 2 to 1 is the same as the ratio of 8 to 4.
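The cumulative nature of the four scales can be sketched as a small Python lookup. The assignment of summaries to levels is an illustrative assumption built from points made earlier in these notes (mode for nominal data, median for ordinal data, geometric mean and coefficient of variation only for ratio data); it follows the conservative view that a mean is not computed for ordinal data.

```python
# Levels of measurement, lowest to highest.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Summaries that first become meaningful at each level (illustrative choice).
NEW_AT_LEVEL = {
    "nominal": ["frequency", "mode"],
    "ordinal": ["median"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["geometric mean", "coefficient of variation"],
}

def permissible(scale):
    """Summaries meaningful for `scale`: its own plus those of all lower levels."""
    level = SCALES.index(scale)
    summaries = []
    for s in SCALES[: level + 1]:
        summaries.extend(NEW_AT_LEVEL[s])
    return summaries

for s in SCALES:
    print(s, "->", ", ".join(permissible(s)))
```

Each higher scale keeps every property of the scales below it, which is why the lookup accumulates rather than replaces.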

Methods of Data Organization and Presentation

In most cases, useful information is not immediately evident from a mass of unsorted data; unsorted data do not by themselves impart information.

Data organization: condensing information in a way that shows patterns of variation clearly.

Precise methods of analysis can be decided upon only when the characteristics of the data are understood. For this primary objective, different techniques of data organization are used.

Objectives of data organization

- To see the similarity and dissimilarity of objects.
- To see the important features of the collected data.
- To prepare data for summarization and analysis.

The methods of organizing and presenting (describing) data differ depending on whether the data/variable being organized and presented is numerical or categorical.

1. Describing categorical variables includes:

A. Tables of frequency distributions
– Frequency
– Relative frequency
– Cumulative frequencies

B. Charts
– Bar charts
– Pie charts

Frequency Distributions

• Frequency: the number of times each observation (for individual data) or each class interval (for grouped data) occurs.

• Frequency distribution: an arrangement of data in a table that shows the possible values of the data with the corresponding frequency or class frequency. A simple and effective way of summarizing categorical data is to construct a frequency distribution table.
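A frequency distribution table for a categorical variable can be built by counting each category. The sketch below uses Python's `collections.Counter` on hypothetical breast-feeding status data and also shows each frequency as a percentage of the total:

```python
from collections import Counter

# Hypothetical breast-feeding status of 12 infants (a categorical variable).
status = ["exclusive", "partial", "none", "exclusive", "exclusive", "partial",
          "exclusive", "none", "partial", "exclusive", "exclusive", "partial"]

counts = Counter(status)  # frequency of each category
n = len(status)

print(f"{'Status':<10}{'Frequency':>10}{'Percent':>9}")
for category in ["exclusive", "partial", "none"]:
    f = counts[category]
    print(f"{category:<10}{f:>10}{100 * f / n:>9.1f}")
print(f"{'Total':<10}{n:>10}{100.0:>9.1f}")
```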


Advantages:

- Allows the data to be more easily appreciated.
- Allows quick comparisons to be drawn.
- Allows the data to be arranged in the form of a table or in one of a number of different graphical forms.

Types of frequency distribution

I. Simple frequency distribution: a table showing the frequency of each observed value.

Hospital stay (days) of 50 patients in a medical ward (hypothetical data)

Hospital stay (days) (xi) | Frequency (fi) (number of patients)
0 | 5
1 | 10
2 | 2
4 | 23
5 | 5
7 | 5

In this table the number of days of hospital stay represents the variable under consideration, the number of persons represents the frequency, and the whole distribution is called a simple frequency distribution.

i. Array (ordered array)

- A serial arrangement of numerical data in ascending or descending order.
- It is the first step in organizing data.
- It is appropriate when the number of observations is greater than 6 and less than 20.
- It makes it possible to know quickly the smallest and largest measurements and the range of the observations.
- It is the simplest method.

Example: Raw data: 5, 6, 4, 9, 11, 0, 3, 8.

When these data are put in an ordered array: 0, 3, 4, 5, 6, 8, 9, 11.
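Producing an ordered array is simply sorting. A short Python sketch using the raw data of the example, which also reads off the smallest value, the largest value, and the range:

```python
raw = [5, 6, 4, 9, 11, 0, 3, 8]

ascending = sorted(raw)                  # ordered array, ascending
descending = sorted(raw, reverse=True)   # ordered array, descending

smallest, largest = ascending[0], ascending[-1]
print(ascending)
print("smallest:", smallest, "largest:", largest, "range:", largest - smallest)
```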


E.g. qualitative variables: frequency distribution of non-numerical information.

Example: future plan for infant feeding of HIV-positive mothers attending the ANC unit.

Mothers' feeding plan | No. of mothers
Replacement | 50
Mixed feeding | 30
Nursery | 50
Total | 230

II. Grouped Frequency Distribution

It is a way of representing large sets of data in class intervals.

STEPS IN CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION

1. Choosing the classes (first put the data in an ordered array).
2. Sorting (or tallying) the data into these classes.
3. Counting the number of items in each class.
4. Displaying the results in the form of a chart or table.

1. Choosing the classes.

When data consisting of a large number of observations are divided into groups that have defined upper and lower limits, each group is called a class. The size of the class is called the class interval.

Choosing a suitable classification involves:

a. Determining the appropriate number of classes/class intervals.

The number of classes/class intervals is determined by:

I. Non-statistical/convenience method: choose not fewer than 6 and not more than 20 classes; the average is 15. Fewer than 6 classes summarize the data too much and cause loss of information, while more than 20 classes do not meet the objective of data organization. The exact number used in a given situation depends mainly on the number of measurements or observations to be grouped.

II. Statistical method: choose the number of classes using Sturges' formula:

K = 1 + 3.322 log10(n)

where n = number of observations.

Example: if the sample size is 275, how many class intervals are needed?

K = 1 + 3.322 log10(275)

K = 1 + 3.322(2.439) = 9.10 ≈ 9
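Sturges' rule is easy to evaluate directly. A minimal Python sketch (the function name `sturges_classes` is hypothetical) that reproduces the calculation for n = 275 and for n = 40; rounding to the nearest whole number is an assumption, and the result remains only a guide:

```python
import math

def sturges_classes(n):
    """Sturges' rule K = 1 + 3.322*log10(n); returns (exact K, rounded K)."""
    k = 1 + 3.322 * math.log10(n)
    return k, round(k)

for n in (40, 275):
    k, k_rounded = sturges_classes(n)
    print(f"n = {n}: K = {k:.2f} -> {k_rounded} classes")
```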


Note:

- Sturges' rule should not be regarded as final but only as a guide. The number of classes it specifies may be increased or decreased for convenient or clear presentation.
- Classes should be mutually exclusive and should not overlap.
- We must make sure that the smallest and largest values fall within the classification and that no value can fall into a gap between classes.

b. Determining the class width.

The class width, denoted by W, is the same for every class:

W = R/K = (Xmax − Xmin)/K

where W = width of the class
R = range
Xmax = the largest value in the observations
Xmin = the smallest value in the observations
K = the number of classes.

Example:

– Leisure time (hours) per week for 40 college students:

23 24 18 14 20 36 24 26 23 21 16 15 19 20 22 14 13

10 19 27 29 22 38 28 34 32 23 19 21 31 16 28 19 18

12 27 15 21 25 16

K = 1 + 3.322 log10(40) = 6.32 ≈ 6

Maximum value = 38, minimum value = 10

Width = (38 − 10)/6 = 4.67 ≈ 5
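The steps above can be sketched in Python for the leisure-time data. Rounding the width up with `math.ceil` and starting the first class at the minimum value are assumptions of this sketch, chosen so that all values fall inside the classes:

```python
import math

# Leisure time (hours per week) of the 40 college students in the example.
hours = [23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22, 14, 13,
         10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19, 21, 31, 16, 28, 19, 18,
         12, 27, 15, 21, 25, 16]

k = round(1 + 3.322 * math.log10(len(hours)))     # Sturges: 6.32 -> 6 classes
width = math.ceil((max(hours) - min(hours)) / k)  # 28 / 6 = 4.67 -> 5

# Build whole-number class limits starting at the minimum value and
# count how many observations fall in each class.
classes = []
lower = min(hours)
for _ in range(k):
    upper = lower + width - 1
    freq = sum(lower <= h <= upper for h in hours)
    classes.append((lower, upper, freq))
    lower += width

for lo, hi, f in classes:
    print(f"{lo}-{hi}: {f}")
```

If the range happened to divide exactly by K, the largest value would fall just outside the last class, so the width would have to be increased by one; this is the check on the smallest and largest values described in the note on Sturges' rule.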


c. Determining the true limits/class boundaries.

Class limits: the smallest and largest values that can go into a class are regarded as its limits; they are the lower and upper class limits.

True limits/class boundaries are limits determined mathematically so that the intervals of a continuous variable are continuous in both directions and no gap exists between classes. The true limits are what the tabulated limits would correspond to if one could measure exactly.

- True limits/class boundaries are used to close the gaps between class intervals.
- They are obtained by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit. This is a simple convention.
- Each class has a lower and an upper true limit.

d. Determining the class mark.

The class mark, denoted by Xc, is the midpoint of each class. The formula is:

Xc = (UTL + LTL)/2

where Xc = class mark
UTL = upper true limit
LTL = lower true limit.
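Applying the ±0.5 convention and the class-mark formula to the six classes of width 5 starting at 10 that follow from the leisure-time example (K = 6, W = 5, minimum 10; whole-number data) gives the Python sketch below. Each upper true limit equals the next lower true limit, so no gap remains:

```python
# Class limits for the leisure-time example (whole-number data, width 5).
limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]

# True limits: subtract 0.5 from the lower limit and add 0.5 to the upper one.
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in limits]

# Class mark: Xc = (UTL + LTL) / 2, the midpoint of each class.
marks = [(ltl + utl) / 2 for ltl, utl in boundaries]

for (lower, upper), (ltl, utl), xc in zip(limits, boundaries, marks):
    print(f"{lower}-{upper}  true limits {ltl}-{utl}  class mark {xc}")
```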

2. Sorting (or tallying) the data into these classes.

Tally marks are small vertical bars used in a frequency table to represent the number of times a particular event has appeared in the collected data. If a particular value has occurred four times in a class, we put four tally marks (////); for the fifth occurrence we draw a cross tally mark through the four (////) to complete a block of five. When it occurs for the sixth time, we leave a space and start a new block with another tally mark. If we use only continuous tally bars, like (//////), counting may become confusing and lead to mistakes.

3. Counting the number of items in each class.

Relative frequency is the frequency of each class interval (fi) divided by the total frequency (n). For grouped data, n = ∑fi.
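As a short Python sketch, the relative frequencies of the hospital-stay simple frequency distribution (50 patients) are each fi divided by n = ∑fi, and together they sum to 1:

```python
# Simple frequency distribution from the hospital-stay table:
# days of stay (xi) -> number of patients (fi).
stay = {0: 5, 1: 10, 2: 2, 4: 23, 5: 5, 7: 5}

n = sum(stay.values())  # n = sum of fi = 50
rel_freq = {days: f / n for days, f in stay.items()}

for days, r in rel_freq.items():
    print(f"{days} days: fi = {stay[days]:>2}, relative frequency = {r:.2f}")
```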


Cumulative frequencies are obtained when the frequencies of two or more classes are added up. They help to find the total number of items whose values are less than or greater than some value. They can be:

- Less than cumulative frequency distribution: the cumulation starts from the lowest value of the variable and proceeds to the highest. This is the most common one.
- More than cumulative frequency distribution: the cumulation is from the highest to the lowest value.

Cumulative relative frequency: computed by adding successive relative frequencies of interest. It is also possible to calculate the cumulative relative frequency (frc) by dividing the cumulative frequency (fc) by the total frequency (n), i.e. frc = fc/n for each class.
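A Python sketch of less-than and more-than cumulative frequencies and the corresponding cumulative relative frequencies. The class frequencies (5, 11, 12, 7, 3, 2) are what tallying the 40 leisure-time values into the classes 10-14, 15-19, ..., 35-39 gives; `itertools.accumulate` does the running sums:

```python
from itertools import accumulate

# Class frequencies from tallying the leisure-time data (n = 40).
classes = ["10-14", "15-19", "20-24", "25-29", "30-34", "35-39"]
freq = [5, 11, 12, 7, 3, 2]
n = sum(freq)

less_than = list(accumulate(freq))              # cumulate lowest -> highest
more_than = list(accumulate(freq[::-1]))[::-1]  # cumulate highest -> lowest
crf_less = [fc / n for fc in less_than]         # frc = fc / n
crf_more = [fc / n for fc in more_than]

for row in zip(classes, freq, less_than, more_than, crf_less, crf_more):
    print("{:<6} {:>3} {:>4} {:>4} {:>6.3f} {:>6.3f}".format(*row))
```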


Exercise: Construct a grouped frequency distribution for the following data. Ages of patients (years) (n = 60) in a diabetic clinic in Addis Ababa, January 2000:

19,82,98,78,30,26,32,66,87,81,40,48,70,61,69,58,60,53,28,54,47,40,
80,56,36,53,65,28,90,95,45,32,34,36,20,62,51,20,17,26,70,81,39,63,
33,66,61,77,41,55,76,70,42,67,22,75,24,50,50,44.

Based on the above data, construct a table that contains:

1. Class interval/class.
2. Class boundary.
3. Class mark.
4. Tally mark.
5. Frequency.
6. Relative frequency:
   a. Less than relative frequency.
   b. Greater than relative frequency.
7. Cumulative relative frequency:
   a. Less than crf.
   b. Greater than crf.

Statistical Tables

A statistical table is an orderly and systematic presentation of numerical data in rows and columns.

o Rows are horizontal arrangements of data, and the row heading is termed the stub.
o Columns are vertical arrangements of data, and their heading is called the caption.

Both simple and grouped frequency distributions can be put in statistical tables.

Almost any quantitative information can be

organized into a table.

Tables are useful for demonstrating patterns,

exceptions, differences, and other relationships.

In addition, tables usually serve as the basis for

preparing more visual displays of data, such as

graphs and charts, where some of the detail may be

lost.


Parts of table

1. Table number:

– Serially numbered.

– Should be written in the center at the top.

2. Title:

– Should be written in the center at the top of the table below the

table number.

3. Caption:

– Refers to the name of the column heading.

– Is written at the center of the column.


4. Stub:

– Refers to the name of the row heading.

– Written at the extreme left.

5. Body of the table:

– The numerical data expressed in the table.

– When the body is empty, it is called a dummy table (table shell) and the variables are termed dummy variables.

6. Head note:

– Short statement about all or major parts of the table.

– Written below the title in brackets.


7. Foot note:

– If any clarification is needed about the parts of a table.

– Written at the bottom of the table.

– Indicate source of data.

The following structure shows the placements of

various parts of a table.


Common Rules of Constructing Tables

Although there are no hard and fast rules to follow, the following

general principles should be addressed in constructing tables.

1. It should be as simple as possible.

2. It should be self-explanatory. To create a table

that is self-explanatory, follow the guidelines below:

I. Title should be clear and to the point.

II. The title should state what the table is about, and where and when the data were collected.


III. Precede the title with a table number.

IV. Label each row and each column clearly and concisely, and include the units of measurement for the data. Limit the number of variables to three or fewer.

V. Totals should be shown either in the top row and the

first column or in the last row and last column. If you

show percents (%), also give their total (always 100).

VI. Explain any code, abbreviation, symbol, or exclusion in a footnote.


VII. Note the source of the data in a footnote if the data are

not original.

VIII. Put the title at the top of the table.

IX. Numerical entries of zero should be written explicitly rather than indicated by a dash. Dashes are reserved for missing or unobserved data.

X. In cross-tabulated data (variables put as row and column

headings), the dependent variable should be the column

heading and the independent variable should be the row

heading.


3. If the data describe a qualitative variable, the categories are listed in alphabetical order or in order of importance.

4. If the data are time bound (classified by time of occurrence), they should be arranged in chronological order, from the earliest to the latest or vice versa.

5. If the data represent places, they may be arranged in alphabetical order or in terms of geographic location.


Types of table

Based on the purpose for which the table is designed and the complexity of the relationship, a table can be one of two types:

A. Simple frequency table.

B. Cross tabulation.


A. Simple frequency table (one-way table):

• Is used when the individual observations involve only a single variable.

• The denominators for the percentages are the sum of all observed frequencies.

Example: Table X: Overall immunization status of children in Adami Tullu Woreda, Feb. 1999.

Immunization status    Number    Percent
Not immunized          75        35.7
…                      …         …
Fully immunized        78        37.2
Total                  210       100.0
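A one-way table like Table X can be tallied directly from raw records. The minimal Python sketch below assumes a hypothetical list of recorded statuses (not the Adami Tullu data); the helper name `frequency_table` is illustrative. It shows that every percentage uses the grand total as its denominator.

```python
from collections import Counter

def frequency_table(observations):
    """Return (category, count, percent) rows and the total; percents use the grand total."""
    counts = Counter(observations)
    n = sum(counts.values())
    rows = [(cat, f, round(100 * f / n, 1)) for cat, f in counts.most_common()]
    return rows, n

# Hypothetical records, for illustration only
status = ["Not immunized"] * 3 + ["Fully immunized"] * 5 + ["Incompletely immunized"] * 2
rows, total = frequency_table(status)
for category, number, percent in rows:
    print(f"{category:<24} {number:>6} {percent:>7}")
print(f"{'Total':<24} {total:>6} {100.0:>7}")
```

Because the percentages are rounded individually, their printed sum can differ slightly from 100; the Total row still shows 100.0, as the table rules recommend.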


B. Cross tabulation:

Is used to obtain the frequency distribution of one variable within the subsets of another variable.

The choice of denominator depends on which variable's distribution is to be compared across the subsets of the other variable.

There are two types:

I. Two-way table.

II. Higher order table.
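To illustrate the denominator choice, the sketch below cross-tabulates hypothetical (marital status, TT status) pairs, putting the independent variable in the rows as the table rules advise, and computes row percentages so that each row's own total is the denominator. The data and the helper name `row_percentages` are invented for illustration.

```python
from collections import Counter

def row_percentages(pairs):
    """Cross-tabulate (row, column) pairs; each cell is a percentage of its row total."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {}
    for r in rows:
        row_total = sum(cells[(r, c)] for c in cols)  # denominator = this row's total
        table[r] = {c: round(100 * cells[(r, c)] / row_total, 1) for c in cols}
    return table

# Hypothetical data: (marital status, TT immunization status)
pairs = ([("Married", "Immunized")] * 6 + [("Married", "Not immunized")] * 4
         + [("Single", "Immunized")] * 1 + [("Single", "Not immunized")] * 3)
for status, pct in row_percentages(pairs).items():
    print(status, pct)
```

Each row then sums to 100, so the distribution of TT status can be compared across marital-status groups of different sizes.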


I. Two-way table:

Shows two variables/characteristics and is formed when either the caption or the stub is divided into two or more parts.

Example: Table Y: TT immunization by marital status of the women of childbearing age, Addis Ababa town, 2006.

Source: Mikael A. et al. Tetanus Toxoid immunization coverage among women of child bearing age in Assendabo town; Bulletin of JIHS, 1996, 7(1): 13-20


II. Higher order table:

Used when it is desired to represent three or more characteristics/variables in a single table.

Example: Table Z: Distribution of Health Professionals by Sex and Residence.


Diagrammatic representation of data

• An appropriately drawn graph allows readers to rapidly obtain an overall grasp of the data presented.

• Well-designed graphs can be a powerful means of communicating a great deal of information using visual techniques.

• Poorly designed graphs not only fail to convey the message effectively but also often mislead and confuse.


Importance of Diagrammatic Representation

Attractiveness.

They help in deriving the required information in

less time and without any mental strain.

They facilitate comparison.

They show unsuspected events and lead to action.

Memorization.


Limitations of diagrammatic presentation:

• Fail to show slight differences.

• They are not precise; they provide only approximate information.

• They are not suitable for all statistical data.

• They are not used when comparison is unnecessary or impossible.


General rules that are commonly accepted about

construction of graphs:

1. Self-explanatory and as simple as possible.

2. Titles are usually placed below the graph and should again answer the questions What? Where? When?

3. Legends or keys should be used to differentiate variables if more than one is shown.

4. Axis labels should be placed so that they read from the left side and from the bottom.


5. The units in to which the scale is divided should be

clearly indicated.

6. The numerical scale representing frequency must start at

zero or a break in the line should be shown.

The choice of a particular form among the different possibilities depends on personal preference and/or the type of data. Bar charts and pie charts are commonly used for qualitative or discrete quantitative data; histograms and frequency polygons are used for quantitative continuous data.


Common types of diagrammatic representations

1. Bar graph

It is the easiest and most adaptable general-purpose

chart.

A bar graph is especially suitable for nominal and ordinal data.

The heights of bars represent the value of the

frequency (actual number or percentage) for each

category.


The categories are represented on the baseline (x-axis) at regular intervals, and the corresponding frequencies or relative frequencies are represented on the y-axis (ordinate) in a vertical bar diagram, and vice versa in a horizontal bar diagram.


Tips for constructing bar graph:

1. Whenever possible, construct a bar diagram on graph paper.

2. All bars drawn in any single study should be of the same width.

3. Leave equal spaces between the different bars.

4. All the bars should rest on the same line, called the base, on the x-axis.

5. Whenever possible, it is advisable to draw bars in order of magnitude.


6. Label both axes clearly.

7. The scale should start from zero.

8. Divided bars may be used to show the component parts.
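Several of these tips can be illustrated without a plotting library. This text-only sketch of a horizontal bar chart uses hypothetical diagnosis counts and an illustrative helper name, `text_bar_chart`. It sorts bars by magnitude, gives every bar the same thickness (one text row), and scales bar lengths from zero so that length is proportional to frequency.

```python
def text_bar_chart(counts, max_width=40):
    """Return lines of a horizontal bar chart sorted by magnitude, scaled from zero."""
    largest = max(counts.values())
    lines = []
    for label, value in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        length = round(max_width * value / largest)  # proportional to value; scale starts at 0
        lines.append(f"{label:<12}|{'#' * length} {value}")
    return lines

# Hypothetical admitting diagnoses in a pediatric ward
diagnoses = {"Pneumonia": 30, "Diarrhoea": 20, "Malaria": 10, "Other": 5}
print("\n".join(text_bar_chart(diagnoses)))
```

Starting the scale at a nonzero value would exaggerate the differences between bars, which is why tip 7 matters.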


Types of bar graph

A. Simple bar chart:

– It is a one-dimensional diagram in which the bar represents the whole of the magnitude.

– The height or length of each bar indicates the size (frequency) of the figure represented.

Example: Fig. X: Distribution of pediatric patients in a hospital ward by type of admitting diagnosis in Hospital X, Jan 2000.


B. Double bar graph:

Used to depict two variables.

Example: … marital status of women 15-49 years, Asendabo town, 1996.


C. Multiple bar chart:

– Represents the relationships among more than two variables.

– The component figures (bars) are shown as separate bars adjoining each other.

– The height of each bar represents the actual value of the component figure.

Example: Fig. X’: Prevalence of cough in school children by smoking history of children and their parents, Town A, Jan 2000.


D. Sub-divided (component) bar graph:

It is also called a segmented bar graph. If a given magnitude can be split up into subdivisions, or if there are different quantities forming the subdivisions of the totals, simple bars may be subdivided in the ratio of the various subdivisions to exhibit the relationship of the parts to the whole. The order in which the components are shown in one bar is followed in all bars used in the diagram.

Such graphs are constructed when each total is built up from two or more component figures.


Sub-divided (component) bar graphs are of two types:

I. Actual component bar diagrams:

The overall height of the bars and the individual component lengths represent actual figures.

Example: Fig. Y’: TT Immunization status by marital status of women 15-49 years, Asendabo town, 1996.


II. Percentage component bar diagrams:

The individual component lengths represent the percentage that each component forms of the overall total. Note that a series of such bars will all have the same total height, i.e., 100 percent.

Example: Fig. Z: TT Immunization status by marital status of women 15-49 years, Asendabo town, 1996.
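The conversion behind a percentage component bar can be sketched in a few lines: each group's component counts are divided by that group's own total, so every bar reaches 100. The counts and the helper name `percentage_components` below are hypothetical.

```python
def percentage_components(groups):
    """Convert each group's component counts into percentages of that group's total."""
    result = {}
    for group, components in groups.items():
        total = sum(components.values())
        result[group] = {name: 100 * count / total for name, count in components.items()}
    return result

# Hypothetical TT immunization counts by marital status
groups = {
    "Married": {"Immunized": 120, "Not immunized": 80},
    "Single": {"Immunized": 15, "Not immunized": 35},
}
for group, parts in percentage_components(groups).items():
    print(group, {k: round(v, 1) for k, v in parts.items()}, "total:", round(sum(parts.values()), 1))
```

An actual component bar diagram would instead plot the raw counts, so the bars would have different heights (200 and 50 here).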


2. Pie chart

Useful for qualitative or quantitative discrete data.

Shows the relative frequency of each category by dividing a circle into sectors so that the areas of the sectors are proportional to the frequencies.

Appropriate for variables having up to six categories, because the circle should not be divided into more than six sectors.


Methods of constructing a pie chart:

– Construct a frequency table.

– Change the frequencies into percentages ((f/n) × 100).

– Change the percentages into degrees, where degrees = (percentage/100) × 360 = (f/n) × 360.

– Draw a circle and divide it accordingly.

Example: Fig. X: Distribution of causes of death of females in England & Wales, 1999.
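The conversion from frequencies to sector angles can be checked with a short sketch. The causes and counts are hypothetical and `pie_angles` is an illustrative name. Each angle is (f/n) × 360, which equals (percentage/100) × 360, so the angles always add up to 360 degrees.

```python
def pie_angles(freqs):
    """Return {category: (percentage, degrees)} with degrees = (f / n) * 360."""
    n = sum(freqs.values())
    out = {}
    for category, f in freqs.items():
        percentage = 100 * f / n
        degrees = 360 * f / n  # same as (percentage / 100) * 360
        out[category] = (percentage, degrees)
    return out

# Hypothetical causes of death (counts)
deaths = {"Circulatory": 50, "Cancer": 30, "Respiratory": 15, "Other": 5}
for cause, (pct, deg) in pie_angles(deaths).items():
    print(f"{cause:<12} {pct:5.1f}% {deg:6.1f} degrees")
```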


3. Histogram

Is a special kind of bar graph.

Useful for quantitative continuous data.

Is a frequency distribution with continuous class intervals that has been turned into a graph.

The area of each rectangle represents the frequency of the corresponding class interval.

To avoid crowding, you can label the class intervals by their midpoints.
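The table behind a histogram can be built by counting values into continuous class intervals. This sketch uses hypothetical measurements and class boundaries, half-open intervals [lower, upper), and an illustrative helper name, `histogram_table`. It takes bar height as frequency ÷ width, so that area (height × width) equals frequency as stated above; with equal widths, height is simply proportional to frequency.

```python
def histogram_table(values, boundaries):
    """Count values into classes [b[i], b[i+1]); report midpoint and height = f / width."""
    rows = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        f = sum(lower <= v < upper for v in values)
        width = upper - lower
        # height chosen so that area (height * width) equals the class frequency
        rows.append((lower, upper, (lower + upper) / 2, f, f / width))
    return rows

# Hypothetical measurements and class boundaries
values = [2.1, 2.4, 2.7, 3.0, 3.2, 3.3, 3.8, 4.1, 4.5, 5.9]
boundaries = [2.0, 3.0, 4.0, 5.0, 6.0]
for lower, upper, mid, f, height in histogram_table(values, boundaries):
    print(f"{lower}-{upper}  midpoint={mid}  f={f}  height={height}")
```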


In addition to simplifying a complex data set, a histogram is important in depicting the shape (symmetric/skewed) and the location of central tendency (“averages”) of a frequency distribution.

Example: … values (μmol/min/ml) obtained from 35 workers exposed to pesticides.

Source: Knapp RG, Miller MC III: Clinical Epidemiology and Biostatistics: The National Medical Series for Independent Study. Williams & Wilkins, 1992, Baltimore, Maryland.


4. Frequency polygon

A frequency polygon is obtained by connecting the midpoints of the tops of adjacent rectangles (cells) of the histogram with line segments.

When the polygon is continued to the x-axis just outside the range of the data, the total area under the polygon equals the total area under the histogram.

It is not essential to draw a histogram in order to obtain a frequency polygon.


It can be drawn without erecting the rectangles of a histogram, as follows:

Methods of constructing a frequency polygon:

Mark the scale with the numerical values of the mid-points of the intervals.

Erect ordinates on the mid-points of the intervals, the length (altitude) of each ordinate representing the frequency of the class on whose mid-point it is erected. Join the tops of the ordinates and extend the connecting lines down to the horizontal scale.
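The polygon's vertices can be computed directly from the class mid-points, without drawing rectangles. The sketch below assumes hypothetical equal-width classes (width 5) and illustrative helper names. It adds a zero-frequency point one class width beyond each end and checks, using the trapezoid rule, that the area under the polygon equals the histogram area.

```python
def polygon_points(midpoints, freqs):
    """Vertices of a frequency polygon closed to the x-axis one class width beyond each end."""
    width = midpoints[1] - midpoints[0]  # assumes equal class widths
    xs = [midpoints[0] - width] + list(midpoints) + [midpoints[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

def area_under(points):
    """Trapezoid-rule area under a piecewise-linear curve."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical classes 10-14, 15-19, 20-24, 25-29 (width 5)
midpoints = [12, 17, 22, 27]
freqs = [4, 10, 7, 3]
pts = polygon_points(midpoints, freqs)
print(pts)
print(area_under(pts), 5 * sum(freqs))  # polygon area vs histogram area: 120.0 120
```

The two areas agree because each triangle cut off from a histogram bar is exactly replaced by a congruent triangle added under the polygon.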


Figure (left): Example of a frequency polygon drawn from a histogram. 2087 mothers with <5 children, Adami Tulu, 2003 (variable N1AGEMOTH; Mean = 27.6, Std. Dev = 6.13, N = 2087).

Figure (right): Example of a frequency polygon drawn without a histogram. Age of women at the time of marriage (No. of women plotted against class mid-points 12, 17, 22, 27, 32, 37, 42, 47).


5. Ogive Curve (The Cumulative Frequency Polygon)

Sometimes it may be necessary to know the number of items whose values are more or less than a certain amount. To get this information, the frequency distribution must be changed from a ‘simple’ to a ‘cumulative’ distribution.

An ogive turns a cumulative frequency distribution into a graph.

Ogives are much more common than frequency polygons.


To construct an Ogive curve:

I) Compute the cumulative frequency of the distribution.

II) Prepare a graph with the cumulative frequency on the vertical axis and the true upper class limits (class boundaries) of the intervals scaled along the x-axis (horizontal axis). The true lower limit of the lowest class interval is also included on the x-axis scale; this is also the true upper limit of the next lower interval, which has a cumulative frequency of 0.
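The points of an ogive can be computed from a frequency table. The sketch below reuses the age distribution of the 169 subjects from the mean and median examples in this section (classes 10-19 to 60-69). It assumes integer class limits, so each true boundary is the stated limit ± 0.5. The helper name `ogive_points` is illustrative.

```python
from itertools import accumulate

def ogive_points(class_limits, freqs):
    """Return (true upper boundary, cumulative frequency) points, starting from 0."""
    # Assumes integer class limits such as (10, 19), so true boundaries are limit +/- 0.5
    start = class_limits[0][0] - 0.5
    uppers = [upper + 0.5 for _, upper in class_limits]
    cum = list(accumulate(freqs))
    return [(start, 0)] + list(zip(uppers, cum))

limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
freqs = [4, 66, 47, 36, 12, 4]
print(ogive_points(limits, freqs))
# [(9.5, 0), (19.5, 4), (29.5, 70), (39.5, 117), (49.5, 153), (59.5, 165), (69.5, 169)]
```

Plotting these points and joining them with straight lines gives the ogive. The point at 9.5 with cumulative frequency 0 is the true upper limit of the empty interval below the first class.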


Example: Construct an ogive for the data below.

Table X: Heart rate of patients admitted to Hospital D, 2000.

Figure: Ogive (cumulative frequency curve) … admitted to Hospital B, 2000.


Numerical Summary Measures

MCT (Measure of Central Tendency)

A frequency distribution is a general picture of the

distribution of a variable.

But it cannot indicate the average value or the spread of the values.

On the scale of values of a variable there is a certain

stage at which the largest number of items tend to

cluster.


Since this stage is usually in the centre of the distribution, the tendency of the statistical data to become concentrated at a certain value is called “central tendency”.

The various methods of determining the point about

which the observations tend to concentrate are called

MCT (Measure of Central Tendency).

The objective of calculating MCT is to determine a

single figure which may be used to represent the whole

data set.


In that sense it is an even more compact description

of the statistical data than the frequency distribution.

Since an MCT represents the entire data set, it facilitates comparison within one group or between groups of data.

Characteristics of a good MCT:

An MCT is good or satisfactory if it possesses the following characteristics:

1. It should be based on all the observations.

2. It should not be affected by the extreme values.


3. It should be as close to the maximum number of values as

possible.

4. It should have a definite value.

5. It should not be subjected to complicated and tedious calculations.

6. It should be capable of further algebraic treatment.

7. It should be stable with regard to sampling.

The three most common measures of central tendency are:

–Mean, Median, and Mode.


Arithmetic Mean

The arithmetic mean is the measure of central location

you are probably most familiar with.

It is the arithmetic average and is commonly called simply

“mean” or “average.”

In formulas, the arithmetic mean is usually represented as μ for the population mean and x̄, read as “x-bar”, for the sample mean.

It is the sum of all the observations divided by the total

number of observations.


General formula

a) Ungrouped data

If x1, x2, ..., xn are n observed values, then

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

b) Grouped data

In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:


$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$

where,

k = the number of class intervals.

mi = the mid-point of the ith class interval.

fi = the frequency of the ith class interval.


Properties of the Arithmetic Mean:

• For a given set of data there is one and only one

arithmetic mean (uniqueness).

• Easy to calculate and understand (simplicity).

• Influenced by each and every value in a data set.

• Greatly affected by the extreme values (Sensitivity).

So, the mean is an excellent measure of central tendency when the distribution is symmetric (normally or approximately normally distributed).


• Algebraic sum of the deviations of the given values

from their arithmetic mean is always zero (Center of

gravity).

• In grouped data, if any class interval is open, the arithmetic mean cannot be calculated.

• It is not appropriate for either nominal or ordinal data.

• The sum of the squares of the deviations from the arithmetic mean is less than that computed from any other point.


Advantages;

1) It is based on all values given in the distribution.

2) It is the most easily understood.

3) It is the most amenable to algebraic treatment.

Disadvantages;

1) Overly sensitive to extreme values.

2) When the distribution has open-ended classes, its computation would be based on assumptions and therefore may not be valid.

3) Sometimes it may even give a value that looks ridiculous (e.g., an average of 2.4 children per family, a number no family can actually have).


Example 1: The heart rates for n=10 patients were as

follows (beats per minute):

167, 120, 150, 125, 150, 140, 40, 136, 120, 150

What is the arithmetic mean for the heart rate of

these patients?

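Ans. x̄ = (167 + 120 + 150 + 125 + 150 + 140 + 40 + 136 + 120 + 150)/10 = 1298/10 = 129.8 beats per minute.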


Example 2:Compute the mean age of 169 subjects

from the grouped data.

Class interval Mid-point (mi) Frequency (fi) mifi

10-19 14.5 4 58.0

20-29 24.5 66 1617.0

30-39 34.5 47 1621.5

40-49 44.5 36 1602.0

50-59 54.5 12 654.0

60-69 64.5 4 258.0
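The grouped-data formula can be applied to Example 2 in a few lines of Python. The mid-points and frequencies are copied from the table above; `grouped_mean` is an illustrative name. The sum of the mifi column is 5810.5 over n = 169 subjects.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(m_i * f_i) / sum(f_i), assuming values sit at class mid-points."""
    return sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)

midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freqs = [4, 66, 47, 36, 12, 4]
print(round(grouped_mean(midpoints, freqs), 2))  # 5810.5 / 169 -> 34.38
```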


Median

It is the middle value of an observation when the observations

are listed in an increasing or decreasing order.

a)Ungrouped data

The median is the value which divides the data set into two

equal parts.

If the number of values is odd, the median will be the middle

value when all values are arranged in order of magnitude with ½

of the observations being larger than the median value, and ½

smaller.


When the number of observations is even, there is no single

middle value but two middle observations. In this case the

median is the mean of these two middle observations, when

all observations have been arranged in the order of their

magnitude.
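The odd/even rule for ungrouped data can be sketched in Python and checked against the standard library's `statistics.median`. The example uses the heart rates from Example 1 of the mean section. With n = 10, the median is the mean of the 5th and 6th ordered values (136 and 140). The helper name `median` is illustrative.

```python
import statistics

def median(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]
print(median(heart_rates), statistics.median(heart_rates))  # 138.0 138.0
print(median([5, 1, 3]))  # odd n: 3
```

The median (138) lies above the mean of the same data (1298/10 = 129.8) because the single low value of 40 pulls the mean down but barely affects the median.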


b) Grouped data

In calculating the median from grouped data, we

assume that the values within a class-interval are

evenly distributed through the interval.

– The first step is to locate the class interval that contains the median.

– To do so, find n/2 and select the class interval with the smallest cumulative frequency that is at least n/2.

– Then use the following formula:


$$\tilde{x} = L_m + \frac{\frac{n}{2} - F_c}{f_m} \times W$$

where,

Lm = true lower class boundary of the interval containing the median.

Fc = cumulative frequency of the interval just above (i.e., preceding) the median class interval.

fm = frequency of the interval containing the median.

W = class interval width.

n = total number of observations.


Example. Compute the median age of 169 subjects

from the grouped data.

Class interval Mid-point (mi) Frequency (fi) Cum. freq

10-19 14.5 4 4

20-29 24.5 66 70

30-39 34.5 47 117

40-49 44.5 36 153

50-59 54.5 12 165

60-69 64.5 4 169

Total 169


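Ans.

n/2 = 169/2 = 84.5

n/2 = 84.5 falls in the 3rd class interval (30-39), the first whose cumulative frequency (117) is at least 84.5.

True lower limit (Lm) = 29.5, true upper limit = 39.5, so W = 10

Frequency of the class (fm) = 47

(n/2 – Fc) = 84.5 – 70 = 14.5

Median = 29.5 + (14.5/47) × 10 ≈ 29.5 + 3.09 = 32.59 years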
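The grouped-median steps (find n/2, locate the first class whose cumulative frequency reaches it, then interpolate) can be sketched in Python. The example uses the true lower boundaries and frequencies of the age table above; `grouped_median` is an illustrative name.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median of grouped data: Lm + ((n/2 - Fc) / fm) * W."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cum = 0  # cumulative frequency of the classes before the current one (Fc)
    for lm, fm in zip(lower_bounds, freqs):
        if cum + fm >= half:  # first class whose cumulative frequency reaches n/2
            return lm + (half - cum) / fm * width
        cum += fm

true_lower = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freqs = [4, 66, 47, 36, 12, 4]
print(round(grouped_median(true_lower, freqs, 10), 2))  # 32.59
```

The interpolation assumes, as stated above, that values are spread evenly within the median class.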


Properties of the median;

• There is only one median for a given set of data

(uniqueness).

• The median is easy to calculate.

• Median is a positional average and hence it is

insensitive to very large or very small values.

• Median can be calculated even in the case of open

end intervals.

• It is determined mainly by the middle points and

less sensitive to the remaining data points

(weakness).


• It is not a good representative of data if the number of

items is small.

• The median can be used as a summary measure for

ordinal, discrete and continuous data, in general however,

it is not appropriate for nominal data.

Advantages

1)It is easily calculated and is not much disturbed by

extreme values.

2)It is more typical of the series.


3) The median may be located even when the data are

incomplete.

4) The median is closer to reality and more representative than the mean.

Disadvantages

1. The median is not so well suited to algebraic

treatment as the arithmetic, geometric and

harmonic means.

2. It is not so generally familiar as the arithmetic mean


Mode

• The mode is the most frequently occurring value among all the

observations in a set of data.

• It is not influenced by extreme values.

• It is possible to have more than one mode or no mode.

• It is not a good summary of the majority of the data.

• The mode can be used as a summary measure for

nominal, ordinal, discrete and continuous data, in

general however, it is more appropriate for nominal

and ordinal data.


Any value of a variable at which the distribution reaches a peak is called a mode. Most distributions encountered in practice have one peak and are described as unimodal.

Figure: Diagrammatic presentation of mode.


a) Ungrouped data

• It is a value which occurs most frequently in a set

of values.

• If all the values are different there is no mode, on

the other hand, a set of values may have more than

one mode.


Example 1:

• Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6

• Mode is 4 “Unimodal”

Example 2:

• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8

• There are two modes – 2 & 5

• This distribution is said to be “bi-modal”

Example 3:

• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12

• No mode, since all the values are different
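The three examples can be reproduced with a small helper (the name `modes` is illustrative). Python's `statistics.multimode` (3.8+) is close, but it returns every value when all values are distinct, whereas these notes say there is then no mode. The sketch below therefore returns an empty list in that case.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values; [] when every value occurs once (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6]))        # [4]  unimodal
print(modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8]))  # [2, 5]  bimodal
print(modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12]))  # []  no mode
```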


b) Grouped data

• To find the mode of grouped data, we usually refer

to the modal class, where the modal class is the

class interval with the highest frequency.

• If a single value for the mode of grouped data must

be specified, it is taken as the mid-point of the

modal class interval.


Also we can use this formula:

$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times C$$

Where;

L = the (true) lower limit of the modal class

d1 = the difference between the frequency of the modal class and that of the preceding class

d2 = the difference between the frequency of the modal class and that of the succeeding class

C = the class interval (width) of the modal class.
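As a sketch, the formula can be applied to the age data used in the mean and median examples (modal class 20-29, f = 66). The code assumes true lower boundaries (19.5 for that class, as in the median example) and a frequency of 0 outside the first and last classes. The helper name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Mode = L + d1 / (d1 + d2) * C for the class with the highest frequency."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # index of the modal class
    before = freqs[i - 1] if i > 0 else 0               # assumes 0 outside the table
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    if d1 + d2 == 0:
        raise ValueError("modal class is not a distinct peak")
    return lower_bounds[i] + d1 / (d1 + d2) * width

true_lower = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freqs = [4, 66, 47, 36, 12, 4]
print(round(grouped_mode(true_lower, freqs, 10), 2))  # 19.5 + 62/81 * 10 -> 27.15
```

The formula pulls the mode from the class mid-point (24.5) toward the neighbouring class with the larger frequency (30-39 here).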


Properties of mode;

• The mode can be used as a summary measure for

nominal, ordinal, discrete and continuous data, in general

however, it is more appropriate for nominal and ordinal

data.

• It is not affected by extreme values.

• It can be calculated for distributions with open end

classes.

• Often its value is not unique.

• The main drawback of mode is that often it does not exist.

• It is an average of position.


Advantages

1. Since it is the most typical value it is the most

descriptive average.

2. Since the mode is usually an “actual value”, it indicates

the precise value of an important part of the series.

3. Used for categorical data to describe the most frequent

category.

4. Not affected by extreme values.

5. Easy to understand


Disadvantages

1. Unless the number of items is fairly large and the

distribution reveals a distinct central tendency, the

mode has no significance.

2. It is not capable of mathematical treatment.

3. In a small number of items the mode may not exist.

4. Sometimes there may be more than one mode.


Exercise: The table below shows the protein intake of different families. Find the mean, median, and mode.

Protein intake per consumption unit per day (g)    Mid-point (mi)    No. of families (fi)    mifi    Cum. freq
15-25    20    30    600     30
25-35    30    40    1200    70
…
75-85    80    10    800     400


Measures of Dispersion

MCT are not enough to give a clear understanding about the

distribution of the data.

We need to know something about the variability or spread of

the values — whether they tend to be clustered close together,

or spread out over a broad range.

Measures of Dispersion: Measures that quantify the

variation or dispersion of a set of data from its central

location.


Dispersion refers to the variety exhibited by the

values of the data.

The amount may be small when the values are close

together.

If all the values are the same, there is no dispersion.

Other synonymous terms for measures of dispersion:

– “Measure of Variation”

– “Measure of Spread”

– “Measures of Scatter”


Measures of dispersion include:

– Range

– Inter-quartile range

– Variance

– Standard deviation

– Coefficient of variation


1. Range (R)

• The difference between the largest and smallest

observations in a sample.

• Range = Maximum value – Minimum value

Example –

– Data values: 5, 9, 12, 16, 23, 34, 37, 42

– Range = 42-5 = 37


• Being determined by only the two extreme

observations, use of the range is limited because it

tells us nothing about how the data between the

extremes are spread.

• Further, interpretation of the range depends on the number of observations:

– when the number of observations increases, the range can get larger.


2. Percentiles, Quartiles and Inter-quartile Range

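• The quartiles are the values that divide the ordered distribution into four parts such that there is an equal number of observations in each part.

– Q1 = the [(n+1)/4]th ordered observation

– Q2 = the [2(n+1)/4]th ordered observation

– Q3 = the [3(n+1)/4]th ordered observation

• The inter-quartile range is the difference between the third and the first quartiles:

– IQR = Q3 - Q1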


• Although the inter-quartile range is sometimes a useful descriptive measure, it is difficult to work with mathematically and can vary considerably from sample to sample.

• Percentiles divide the ordered data into 100 parts with an equal number of observations in each part.

• It follows that the 25th percentile is the first quartile, the 50th percentile is the median and the 75th percentile is the third quartile.
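Percentiles generalise the quartile rule. This sketch takes the pth percentile to be the value at position p(n+1)/100 of the ordered data, with linear interpolation between neighbours. That is an assumed extension of the (n+1)/4 quartile positions, not a rule stated in this note. The function name `percentile` is hypothetical. The sketch checks that the 25th, 50th and 75th percentiles match Q1, the median and Q3. Python's standard `statistics.median` serves as an independent check of the 50th percentile.

```python
import math
import statistics


def percentile(values, p):
    """pth percentile (0 < p < 100) at position p(n+1)/100 in the ordered data,
    with linear interpolation between neighbouring observations (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    position = min(max(p * (n + 1) / 100, 1), n)  # clamp to the observed positions
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return float(ordered[lower - 1])
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))  # 20.0 23.0 27.0
print(statistics.median(data))  # 23.0
```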


Cont…d

3. Variance

• A good measure of dispersion should make use of all the data.

• Intuitively, a good measure could be derived by combining, in some way, the deviations of each observation from the mean.

• The variance achieves this by averaging the squared deviations from the mean.


Cont…d

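• The sample variance of the set x₁, x₂, ..., xₙ of n observations with mean x̄ is

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

• The deviations from the mean always sum to zero, so they cannot simply be averaged. Instead, we square the deviations, add them, and divide the sum by n − 1 to get the sample variance.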
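The formula can be followed step by step in Python. The sketch computes the mean and forms the deviations. It confirms that the deviations sum to zero, apart from floating-point rounding. It then squares and adds the deviations and divides by n − 1. It uses the 10-number data set from the standard deviation example below. The function name `sample_variance` is hypothetical.

```python
def sample_variance(values):
    """S^2 = sum((x_i - mean)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    sum_of_squares = sum(d * d for d in deviations)
    return sum_of_squares / (n - 1)


data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
mean = sum(data) / len(data)

# The raw deviations cancel out: their sum is zero up to rounding error.
print(abs(sum(x - mean for x in data)) < 1e-9)  # True
print(round(sample_variance(data), 2))  # 21.88
```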


Cont…d

4. Standard Deviation

• Because it is built from squared deviations, the variance is expressed in the square of the observations' units. This limits its usefulness as a descriptive statistic.

• Taking the square root of the variance gives the standard deviation (SD), a measure of dispersion in the original units.

Example: We use the data set of 10 numbers (see page 29):

19 21 20 20 34 22 24 27 27 27

– The range = 34 – 19 = 15

– The first quartile is 20 and the third quartile is 27

– The inter-quartile range = 27 – 20 = 7.

– The variance is 21.88

– The SD = √21.88 = 4.68.
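The whole worked example can be checked with Python's standard `statistics` module (Python 3.8 or later). `statistics.variance` and `statistics.stdev` divide by n − 1, matching the sample formula for S². `statistics.quantiles` uses its default `method='exclusive'`. That method places the cut points using the (n+1)-based positions used in this note, with linear interpolation.

```python
import statistics

data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]

data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # default method='exclusive'
variance = statistics.variance(data)  # sample variance, divisor n - 1
sd = statistics.stdev(data)  # square root of the sample variance

print(data_range)          # 15
print(q1, q3, q3 - q1)     # 20.0 27.0 7.0
print(round(variance, 2))  # 21.88
print(round(sd, 2))        # 4.68
```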


Cont…d

5. Coefficient of variation

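When we want to compare the variability of two sets of data, the standard deviation can be misleading. It measures absolute variation, so problems arise when the data sets are measured in different units or have very different means.

The coefficient of variation expresses the standard deviation relative to the mean. It is unit-free and is the preferred measure for comparing variability in such cases. Do not compare standard deviations directly between groups whose units or means differ.

$$\text{CV} = \frac{\text{standard deviation}}{\text{mean}}$$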
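Below is a minimal sketch of the coefficient of variation, using the sample standard deviation. CV is often multiplied by 100 and reported as a percentage. It is generally used only for positive data, because a mean near zero makes the ratio unstable. The second data set is hypothetical: the same 10 numbers multiplied by 10, as if recorded in a unit ten times smaller. Rescaling multiplies the SD by 10 but leaves the CV unchanged. The function name `coefficient_of_variation` is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (requires a positive mean)."""
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("CV is only meaningful here for data with a positive mean")
    return statistics.stdev(values) / mean


data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
rescaled = [10 * x for x in data]  # hypothetical: same data in a unit ten times smaller

# The SDs differ by a factor of 10...
print(round(statistics.stdev(data), 2), round(statistics.stdev(rescaled), 2))  # 4.68 46.77
# ...but the relative variation is the same.
cv = coefficient_of_variation(data)
print(round(100 * cv, 1))  # 19.4 (per cent)
print(round(100 * coefficient_of_variation(rescaled), 1))  # 19.4
```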


Thank you!

Enjoy it.
