
Basic Statistical Concepts for Nurses

Date of last revision April 25, 2011

"Statistics are simply tools that researchers employ to help answer research
questions."


• Measurement
• Population, Variable, Sample
• Hypothesis
• Types of errors
• Statistical Power
• Sampling
• Descriptive Statistics
• Inferential statistics
• Parametric and Non-parametric Tests
• Appropriate Statistical tests
• Common Statistical tests
• Multivariate Analysis
• Tips on Choosing the appropriate test
• Computer Aided Analysis

------------------------------------------------------------------------------

Introduction

As the context of health care changes with advances in pharmaceutical services and
technology, nurses and other health care professionals need to be prepared to respond in
knowledgeable and practical ways. Health information is very often expressed in
statistical terms to make it concise and understandable. Statistics plays a vitally
important role in research: statistics help to answer important research questions, and
the answers to such questions further our understanding of the field and provide a basis
for academic study. The researcher therefore needs to understand which tools are suitable
for a particular research study. A basic understanding of statistical concepts is essential
for healthcare professionals, because it enables them to read and evaluate reports and
other literature and to undertake independent research investigations by selecting the
most appropriate statistical test for their problems. The purpose of analyzing data in a
study is to describe the data in meaningful terms.

Descriptive approach and inferential approach

Depending on the kinds of variables identified (nominal, ordinal, interval, and ratio) and
the design of a particular study, a number of statistical techniques are available to
analyze data. There are two approaches to the statistical analysis of data: the descriptive
approach and the inferential approach. Descriptive statistics convert data into a picture
of the information that is readily understandable. The inferential approach helps to
decide whether the outcome of the study is a result of factors planned within the design
of the study or is determined by chance. The two approaches are often used sequentially:
first, the data are described with descriptive statistics, and then additional statistical
manipulations are done through inferential statistics to make inferences about the
likelihood that the outcome was due to chance. When the descriptive approach is used,
terms such as mean, median, mode, variation, and standard deviation are used to
communicate the analysis of the data. When the inferential approach is used, probability
values (P) are used to communicate the significance or lack of significance of the results
(Streiner & Norman, 1996).

Measurement

Measurement is defined as “assignment of numeral according to rules” (Tyler 1963:7).

Regardless of the variables under study, in order to make sense of the data collected, each
variable must be measured in such a way that its magnitude or quantity is clearly
identified. The specific strategy for a particular study depends upon the research
problem, the sample under study, the availability of instruments, and the general
feasibility of the project (Brockopp & Hastings-Tolsma, 2003). A variety of measurement
methods are available for use in nursing research. Four measurement scales are used:
nominal, ordinal, interval, and ratio.

The nominal level of measurement

The nominal level of measurement is the most primitive or lowest level of classifying
information. Nominal variables consist of named categories of people, events, and other
phenomena; the categories are exhaustive and mutually exclusive. These categories are
discrete and noncontinuous. For nominal measurement, the admissible statistical
operations are counting frequencies and calculating percentages, proportions, the mode,
and the coefficient of contingency.
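
The counting operations allowed for nominal data can be shown with Python's standard library. This is a minimal sketch on hypothetical data (blood groups of ten patients); it computes frequencies, proportions, percentages, and the mode, but not the coefficient of contingency, which needs a cross-tabulation of two nominal variables.

```python
from collections import Counter

# Hypothetical nominal data: ABO blood group recorded for ten patients.
blood_groups = ["O", "A", "B", "O", "AB", "O", "A", "O", "B", "A"]

n = len(blood_groups)
counts = Counter(blood_groups)                        # frequency of each category
proportions = {g: c / n for g, c in counts.items()}   # share of the sample
percentages = {g: 100 * c / n for g, c in counts.items()}
mode_group, _ = counts.most_common(1)[0]              # the most frequent category

for group, count in counts.most_common():
    print(f"{group}: {count} ({percentages[group]:.0f}%)")
print("Mode:", mode_group)
```

Note that nothing here adds or averages the category labels themselves: only counting is meaningful at the nominal level.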

The ordinal level of measurement

The ordinal level of measurement is second in terms of its refinement as a means of
classifying information. Ordinal implies that the values of variables can be rank-ordered
from highest to lowest.

Interval Level of Measurement

Interval level of measurement is quantitative in nature. The individual units are
equidistant from one point to the next, but interval data do not have an absolute zero.
Temperature measured in Celsius or Fahrenheit is an example. Interval measurement is the
third level of measurement in terms of the complexity of the statistical techniques that
can be used to analyze the data. Variables at this level of measurement are assessed
incrementally, and the increments are equal.

Ratio Level of Measurement

Ratio level of measurement is characterized by variables that are assessed incrementally
with equal distances between the increments and a scale that has an absolute zero. Ratio
variables exhibit the characteristics of ordinal and interval measurement and can also be
compared as ratios: one value can be described as two or three times another, or as
one-third or one-quarter of it, and so on. Variables such as time, length, and weight are
measured on ratio scales, although they can also be measured using a nominal or ordinal
scale. The mathematical properties of interval and ratio scales are very similar, so the
same statistical procedures are used for both scales.
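
The practical difference between interval and ratio scales can be checked with a unit conversion. In this minimal sketch the temperatures, weights, and the conversion factor of 2.20462 lb per kg are illustrative values: a "ratio" of Celsius temperatures changes when the same temperatures are expressed in Fahrenheit or Kelvin, whereas a ratio of weights is the same in kilograms or pounds.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15          # Kelvin has an absolute zero

# Interval scale: the "ratio" of two temperatures depends on the unit chosen,
# so 20 °C is not meaningfully "twice as hot" as 10 °C.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
ratio_k = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Differences survive a change of unit (only the size of the unit changes).
diff_c = 20 - 10
diff_f = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)

# Ratio scale: weight has a true zero, so ratios do not depend on the unit.
KG_TO_LB = 2.20462
ratio_kg = 80 / 40
ratio_lb = (80 * KG_TO_LB) / (40 * KG_TO_LB)

print(ratio_c, round(ratio_f, 2), round(ratio_k, 2))  # 2.0 1.36 1.04
print(diff_c, diff_f)                                 # 10 18.0
print(ratio_kg, round(ratio_lb, 6))                   # 2.0 2.0
```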

Errors of measurement

When a variable is measured, there is potential for errors to occur. Sources of
measurement error include instrument clarity, variations in administration, situational
variations, response set bias, transitory personal factors, response sampling, and
instrument format.

Population, Sample, Variable

Population is defined as the entire collection of a set of objects, people, or events in a
particular context: the entire group of persons or objects that is of interest to the
investigator. In statistics, a population means any collection of individual items or units
that is the subject of investigation, that is, the collection of all items about which
statements will be made. This might include all patients with schizophrenia in a
particular hospital, or all depressed individuals in a certain community.
Characteristics of a population that differ from individual to individual are called
variables. A variable is a concept (construct) that has been defined so specifically that
precise observation, and therefore measurement, can be accomplished. Length, age,
weight, temperature, and pulse rate are a few examples of variables.

The sample is a subset of the population selected by the investigator to participate in a
research study; it refers to a subset of observations selected from the population.
It would be unusual for an investigator to want to describe only the patients with
schizophrenia in one particular hospital, and it is unlikely that an investigator will
measure every depressed person in a community. Because it is rarely practicable to obtain
measures of a particular variable from all the units in a population, the investigator has
to collect information from a smaller group, or subset, that represents the group as a
whole. This subset is called a sample. Each unit in the sample provides a record, such as a
measurement, which is called an observation. The sample should represent the population
with respect to the critical characteristics the investigator plans to study.

Dependent and independent variables

An independent variable is the presumed cause of the dependent variable, which is the
presumed effect. The independent variable explains or accounts for variations in the
dependent variable: a change in the independent variable results in a change in the
other variable. In experiments, the independent variable is the variable manipulated by
the experimenter. A dependent variable is one that changes in relationship to changes
in another variable. A variable that is dependent in one study may be independent in
another. An intervening variable is one that comes between the independent and
dependent variables.

Hypothesis

A hypothesis is a statement or declaration of the expected outcome of a research study. It
is based on a logical rationale and has empirical possibilities for testing. Hypotheses are
formulated in experimental research, and in some non-experimental correlational studies
hypotheses may also be developed. Normally, there are four elements in a hypothesis:

• (1) dependent and independent variables,
• (2) some type of relationship between the independent and dependent variables,
• (3) the direction of the change, and
• (4) the subjects, i.e., the population being studied.

A hypothesis has been defined as “A tentative assumption made in order to draw out and
test its logical or empirical consequences” (Webster 1968).

Standards in formulating a hypothesis (Ahuja, R. 2001):

• It should be empirically testable, whether it is right or wrong.
• It should be specific and precise.
• The statements in the hypothesis should not be contradictory.
• It should specify the variables between which the relationship is to be established.
• It should describe one issue only.

Characteristics of a Hypothesis

Characteristics of a hypothesis (Treece & Treece, 1989):

• It is testable
• It is logical
• It is directly related to the research problem
• It is factually or theoretically based
• It states a relationship between variables
• It is stated in such a form that it can be accepted or rejected

A directional hypothesis predicts an outcome in a particular direction, whereas a
nondirectional hypothesis simply states that there will be a difference between the groups.
A study involves two hypotheses: the research hypothesis and the null hypothesis. The null
hypothesis is formed for the statistical purpose of negating it. If the research hypothesis
states that there is a positive correlation between smoking and cancer, the null hypothesis
states that there is no relation between smoking and cancer. It is easier to negate a
statement than to establish it.

The null hypothesis is a statistical statement that there is no difference between the
groups under study. A statistical test is used to determine how probable the observed
results would be if the null hypothesis were true; if that probability is small enough, the
null hypothesis is rejected. In other words, inferential statistics are used in an effort to
reject the null hypothesis, thereby showing that a difference does exist. The null
hypothesis is a technical necessity when using inferential statistics, because the criterion
for the decision, statistical significance, is defined in relation to it.
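
The logic of "how probable would this result be if the null hypothesis were true" can be made concrete with a small exact permutation test. This is a swapped-in illustration, not a procedure the text prescribes, and the scores are hypothetical: if there is truly no group difference, the group labels are arbitrary, so we count how many relabellings of the pooled scores give a mean difference at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference between two means.

    Under the null hypothesis the group labels carry no information, so every
    way of splitting the pooled scores into groups of the original sizes is
    equally likely. The p-value is the share of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so that ties are not lost to floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical improvement scores for three treated and three control patients.
treated = [5, 6, 7]
control = [1, 2, 3]
print(permutation_p_value(treated, control))  # 0.1
```

Even though the two groups do not overlap at all, only 2 of the 20 possible splits are this extreme, so p = 0.1 and the null hypothesis would be retained at the .05 level; with three patients per group, the test simply has too little information.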

Types of errors

When the null hypothesis is rejected, the observed differences between groups are deemed
unlikely to have arisen by chance alone. For example, if drug A is compared with a placebo
for its effects on depression and the null hypothesis is rejected, the investigator
concludes that the observed differences most likely are not explainable simply by
sampling error. The key word in these statements is probable. When offering this
conclusion, the investigator has the odds on his or her side. However, what are the
chances of the statement being incorrect?

In statistical inference there is no way to say with certainty that rejection or retention of
the null hypothesis was correct. There are two types of potential errors. A type I error
occurs when the null hypothesis is rejected when indeed it should have been retained; a
type II error occurs if the null hypothesis is retained when indeed it should have been
rejected.
Type I Error

Type I errors occur when the null hypothesis is rejected but should have been retained,
such as when a researcher decides that two means are different. He or she might conclude
that the treatment works or that the groups are not sampled from the same population,
whereas in reality the observed differences are attributable only to sampling error. In a
conservative scientific setting, type I errors should be made rarely, because there is a
great disadvantage to advocating treatments that do not really work.

The probability of a type I error is denoted by the Greek letter alpha (α). Because of the
desire to avoid type I errors, statistical models have been created so that the investigator
has control over the probability of a type I error. At the .05 significance (alpha) level, a
type I error is expected to occur in 5 percent of all tests in which the null hypothesis is
true; at the .01 level, it is expected in 1 percent of such tests. Thus, at the .05 α level,
one type I error is expected, on average, in every 20 independent tests of true null
hypotheses, and at the .01 α level, one in every 100 such tests.
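
A short simulation can show what an alpha level controls. This sketch assumes a two-sided one-sample z-test with a known population standard deviation of 1, applied to samples from a population whose mean really is 0, so the null hypothesis is true and every rejection is by definition a type I error. The sample size, number of trials, and seed are arbitrary choices; `statistics.NormalDist` requires Python 3.8 or later.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha, n=10, trials=20_000, seed=1):
    """Share of two-sided z-tests that reject a null hypothesis which is TRUE."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    rejections = 0
    for _ in range(trials):
        # The population mean really is 0 (with SD 1), so every rejection
        # counted here is a type I error.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / n ** 0.5)  # sample mean / standard error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

print(type_i_error_rate(0.05))  # roughly 0.05
print(type_i_error_rate(0.01))  # roughly 0.01
```

Over many tests the rejection rate settles close to alpha, which is the "one in 20" and "one in 100" described above.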

Type II Error

The motivation to avoid a type I error might increase the probability of making a second
type of error, in which the null hypothesis is retained when it is actually false. For
example, an investigator may conclude that a treatment does not work when it is actually
efficacious. This is a type II error, and its probability is symbolized by the Greek letter
beta (β).

Statistical Power

There are several maneuvers that increase control over the probabilities of the different
types of errors and of correct decisions. One type of correct decision is rejecting the
null hypothesis and being correct in that decision. Power is defined as the probability of
rejecting the null hypothesis when it is false; it equals 1 - β. Ultimately, the statistical
evaluation will be more meaningful if it has high power.

It is particularly important to have high statistical power when the null hypothesis is
retained. Retaining the null hypothesis with high power gives the investigator more
confidence in stating that differences between groups were non-significant. One factor
that affects power is the sample size: as the sample size increases, power increases. The
larger the sample, the greater the probability that a correct decision will be made in
rejecting or retaining the null hypothesis.

Another factor that influences power is the significance level. As the significance (alpha)
level is raised, power increases. For instance, if the .05 level is selected rather than the
.01 level, there will be a greater chance of rejecting the null hypothesis. However, there
will also be a higher probability of a type I error. Conversely, reducing the chance of a
type I error also reduces the chance of correctly identifying a real difference (power).
Thus, the safest manipulation to increase power without affecting the probability of a
type I error is to increase the sample size.

The third factor affecting power is effect size. The larger the true differences between
two groups, the greater the power. Experiments attempting to detect a very strong effect,
such as the impact of a very potent treatment, might have substantial power even with
small sample sizes, whereas the detection of subtle effects may require very large samples
to achieve reasonable statistical power. It is worth noting that not all statistical tests
have equal power: the probability of correctly rejecting a false null hypothesis is higher
with some statistical methods than with others. For example, nonparametric statistics are
typically less powerful than parametric statistics.
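
The three factors above (sample size, significance level, and effect size) can be seen together in the power formula for a simple test. This is a minimal sketch for a two-sided one-sample z-test with a known standard deviation; `effect_size` is a hypothetical parameter name for the true difference divided by the population standard deviation. Real studies usually rely on dedicated power-analysis software and on the test actually planned.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test.

    effect_size: true difference from the null value divided by the
    population standard deviation (a standardized effect size).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5  # where the test statistic is centred when H0 is false
    # Probability that the statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (10, 30, 100):
    print(n, round(z_test_power(0.3, n), 2), round(z_test_power(0.3, n, alpha=0.01), 2))
```

The printed power rises with n, is lower at α = .01 than at .05, and would be higher for a larger effect size. With an effect size of 0 the function returns exactly alpha, because then every rejection is a type I error.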

Sampling

Sampling is the process of selecting a fraction of the sampling units (i.e., collections
with specified dimensions) of the target population for inclusion in a study. Sampling can
be either probability sampling or non-probability sampling.

Probability Sampling or Random sampling

Probability sampling, also called random sampling, is a selection process that gives each
participant the same probability of being selected. It is the process of selecting samples
based on probability theory, which describes the likelihood that events occur by chance.
Random sampling is the best method for ensuring that a sample is representative of the
larger population. Random sampling can be simple random sampling, stratified random
sampling, or cluster sampling.
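
Simple and stratified random sampling can be sketched with Python's `random` module. The sampling frame (patient IDs on two wards) and the 20 percent sampling fraction are hypothetical, and the stratified version assumes proportional allocation (the same fraction from every stratum). Cluster sampling, which selects whole groups rather than individuals, is not shown.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every unit has the same chance of selection; sampling is without replacement."""
    return random.Random(seed).sample(population, n)

def stratified_random_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from each stratum (proportional allocation)."""
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        k = round(len(units) * fraction)
        sample.extend(rng.sample(units, k))
    return sample

# Hypothetical sampling frame: 60 patient IDs on two wards.
wards = {
    "medical": [f"M{i:02d}" for i in range(40)],
    "surgical": [f"S{i:02d}" for i in range(20)],
}
frame = wards["medical"] + wards["surgical"]

srs = simple_random_sample(frame, 12, seed=7)
strat = stratified_random_sample(wards, 0.2, seed=7)
print(len(srs), len(strat))  # 12 12
```

The stratified sample is guaranteed to contain 8 medical and 4 surgical patients, whereas the simple random sample may, by chance, over- or under-represent one ward.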

Nonprobability sampling

Nonprobability sampling is a selection process in which the probability that any one
individual or subject is selected is not equal to the probability that another individual or
subject is chosen. The probability of inclusion and the degree to which the sample
represents the population are unknown. The major problem with nonprobability sampling
is that sampling bias can occur. Nonprobability sampling can be convenience sampling,
purposive sampling, or quota sampling.
Sampling Error (Standard Error)

Sampling error refers to the discrepancies that inevitably occur when a small group
(sample) is selected to represent the characteristics of a larger group (population). It is
defined as the difference between a parameter and an estimate of that parameter derived
from a sample (Lindquist, 1968:8). The means and standard deviations calculated from the
data collected on a given sample will generally not be the same as those derived from data
collected from the entire population. It is this discrepancy between the characteristics of
the sample and those of the population that constitutes sampling error.
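
Sampling error can be observed directly when the whole population is known. In this sketch the population (resting pulse rates of 40 patients, 60 to 99 beats/min) is hypothetical; each random sample of 10 gives an estimate of the population mean, and the sampling error is taken here as estimate minus parameter (the sign convention is a choice).

```python
import random
from statistics import mean

# Hypothetical population: resting pulse rates (beats/min) of 40 patients.
population = list(range(60, 100))
mu = mean(population)                 # the population parameter: 79.5

rng = random.Random(42)
errors = []
for _ in range(1_000):
    sample = rng.sample(population, 10)
    errors.append(mean(sample) - mu)  # sampling error = estimate - parameter

print(mu)
print(min(errors), max(errors))       # individual samples miss in both directions
print(round(mean(errors), 2))         # but the errors average out near zero
```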

Descriptive statistics

Descriptive statistics are techniques which help the investigator to organize, summarize
and describe measures of a sample. Here no predictions or inferences are made regarding
population parameters. Descriptive statistics are used to summarize observations and to
place these observations within context. The most common descriptive statistics include
measures of central tendency and measures of variability.

Central tendency or “measures of the middle”

Three measures of central tendency are commonly used: the mean, the median, and the
mode. They are calculated to identify the average, the most typical, and the most common
values, respectively, among the data collected. The mean is the arithmetic average, the
median is the point representing the 50th percentile in a distribution, and the mode is the
most common score. Sometimes these measures are all the same; on other occasions they
differ. The mean, median, and mode are the same when the distribution of scores is normal,
but under most circumstances they will not be exactly the same. The mode is the most
likely to misrepresent the underlying distribution and is rarely used in statistical
analysis. The mean and the median are the most commonly reported measures of central
tendency.

The major consideration in choosing between them is how much weight should be given to
extreme scores. The mean takes into account each score in the distribution; the median
finds only the halfway point. Because the mean represents all subjects and has desirable
mathematical properties, it is typically favored in statistical analysis. Despite the
advantages of the mean, the median also has some advantages. In particular, the median is
largely unaffected by outlier cases, whereas the mean moves in the direction of the
outliers. Thus, the median is often used when the investigator does not want scores at the
extremes of the distribution to have a strong impact. The median is also valuable for
summarizing data from a measure that might be insensitive at the higher ranges of the
scale. For instance, a very easy test may have a ceiling effect and fail to show the true
ability of some test-takers. A ceiling effect occurs when a test is too easy to measure the
true ability of the best students. Thus, if some scores stack up at the extreme, the median
may be more accurate than the mean: had the high scores not been bounded by the highest
obtainable score, the mean might actually have been higher.
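
The different sensitivity of the mean and the median to extreme scores is easy to demonstrate with the standard library. The scores below are hypothetical; one extreme value is appended to an otherwise tight set of scores.

```python
from statistics import mean, median

# Hypothetical test scores, then the same scores plus one extreme value.
scores = [10, 12, 12, 13, 14]
with_outlier = scores + [60]

print(mean(scores), median(scores))                        # 12.2 12
print(round(mean(with_outlier), 2), median(with_outlier))  # 20.17 12.5
```

A single outlier pulls the mean up by about 8 points, while the median moves by only half a point.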

The mean, median, and mode are exactly the same in a normal distribution. However, not
all distributions of scores have a normal or bell-shaped appearance. The highest point in a
distribution of scores is called the modal peak. A distribution with the modal peak off to
one side or the other is described as skewed. The word skew literally means "slanted."
The direction of skew is determined by the location of the tail or flat area of the
distribution. Positive skew occurs when the tail goes off to the right of the distribution.
Negative skew occurs when the tail or low point is on the left side of the distribution. The
mode is the most frequent score in the distribution. In a skewed distribution, the mode
remains at the peak whereas the mean and the median shift away from the mode in the
direction of the skewness. The mean moves furthest in the direction of the skewness, and
the median typically falls between the mean and the mode. The mode is the best measure of
central tendency for nominal variables, the median is the best measure for ordinal
variables, and the mean is the best measure when interval or ratio scales are used.
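
The ordering of the three measures in a positively skewed distribution can be checked on a small hypothetical example: lengths of hospital stay, where most stays are short but a few long stays form a tail to the right.

```python
from statistics import mean, median, mode

# Hypothetical lengths of hospital stay (days): a few long stays create a right tail.
stay_days = [2, 2, 2, 3, 3, 4, 5, 7, 12]

print(mode(stay_days), median(stay_days), round(mean(stay_days), 2))  # 2 3 4.44
```

The mode stays at the peak (2 days), the median falls in between (3 days), and the mean is pulled furthest toward the tail (about 4.44 days), as described above.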

Measures of Variability

If there were no variability within populations, there would be no need for statistics: a
single item or sampling unit would tell us all that we need to know about the population as
a whole. Three indices are used to measure variation or dispersion among scores: (1) range,
(2) variance, and (3) standard deviation (Cozby, 2000). The range describes the difference
between the largest and smallest observations made; the variance and standard deviation
are based on the average difference or deviation of observations from the mean.

Measures of central tendency, such as the mean and median, are used to summarize
information. They are important because they provide information about the average
score in the distribution. Knowing the average score, however, does not provide all the
information required to describe a group of scores. In addition, measures of variability
are required. The simplest method of describing variability is the range, which is simply
the difference between the highest score and lowest score.

Another statistic, known as the interquartile range, describes the interval of scores
bounded by the 25th and 75th percentile ranks, that is, the range of scores that represents
the middle 50 percent of the distribution. In contrast to ranges, which are used
infrequently in statistical analysis, the variance and standard deviation are used
commonly. Since the mean is the average score in a distribution, the sum of the deviations
around the mean will always equal zero. Yet, in order to understand the characteristics of a
distribution of scores, some estimate of deviation around the mean is important. Squaring
the deviations around the mean before summing them yields a meaningful index: the
variance is the sum of the squared deviations around the mean divided by the number of
cases.
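
The two facts in the paragraph above, that raw deviations always cancel out and that the variance is the mean of the squared deviations, can be verified step by step. The readings are hypothetical; `statistics.pvariance` also divides by the number of cases, so it should agree with the hand calculation.

```python
from statistics import pvariance

# Hypothetical systolic readings (mmHg) for five patients.
scores = [118, 122, 125, 130, 135]
n = len(scores)
mean_score = sum(scores) / n                      # 126.0

deviations = [x - mean_score for x in scores]     # deviation scores
squared = [d ** 2 for d in deviations]
variance = sum(squared) / n                       # divide by the number of cases

print(sum(deviations))    # 0.0: raw deviations always cancel out
print(variance)           # 35.6
print(pvariance(scores))  # same value from the standard library
```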

Range

Range is the simplest method of examining variation among scores and refers to the
difference between the highest and lowest values produced. It shows how wide the
distribution is over which the measurements are spread. For continuous variables, the
range is the arithmetic difference between the highest and lowest observations in the
sample. In the case of counts or whole-number measurements, 1 should be added to the
difference because the range is inclusive of the extreme observations. The range takes
account of only the most extreme observations. It is therefore limited in its usefulness,
because it gives no information about how observations are distributed. The interquartile
range is the interval between the lower (first) quartile and the upper (third) quartile, that
is, the middle 50% of the scores.
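
The range, the inclusive range for whole-number data, and the interquartile range can be computed as follows. The pain scores are hypothetical. Quartiles are a known source of small disagreements: textbooks and statistical packages use several slightly different rules, and this sketch simply uses the default (`method="exclusive"`) of Python's `statistics.quantiles`.

```python
from statistics import quantiles

# Hypothetical pain scores (whole numbers on a 0-10 scale) from twelve patients.
pain = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9]

simple_range = max(pain) - min(pain)   # highest minus lowest: 9 - 2
inclusive_range = simple_range + 1     # counts both extremes: the 8 values 2..9

# Quartile cut points (statistics.quantiles defaults to method="exclusive").
q1, q2, q3 = quantiles(pain, n=4)
iqr = q3 - q1                          # spread of the middle 50% of scores

print(simple_range, inclusive_range, q1, q3, iqr)  # 7 8 3.25 6.75 3.5
```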

Variance

The variance is a very useful statistic and is commonly employed in data analysis.
However, its calculation requires finding the squared deviations around the mean rather
than the simple or absolute deviations around the mean. Thus, the variance is expressed in
the squared units of the original measurements. Taking the square root of the variance
puts the observations back into their original metric; this square root is known as the
standard deviation. The standard deviation is an approximation of the average deviation
around the mean. Although it is not technically equal to the average deviation, it gives an
approximation of how much a typical score deviates from the mean. One method for
calculating the variance is first to calculate the deviation scores, which always sum to
zero, and then to square them, sum the squares, and divide by the number of cases.
Variance is the square of the standard deviation; conversely, the standard deviation is the
square root of the variance.
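
The claim that the standard deviation approximates, but does not equal, the average deviation can be checked numerically. In this sketch the weights are hypothetical, and "average deviation" is taken to mean the mean absolute deviation from the mean.

```python
import math

# Hypothetical body weights in kg.
weights = [60, 64, 68, 72, 86]
n = len(weights)
m = sum(weights) / n                                  # 70.0 kg

variance = sum((w - m) ** 2 for w in weights) / n     # in kg squared
sd = math.sqrt(variance)                              # back in kg
mean_abs_dev = sum(abs(w - m) for w in weights) / n   # the literal average deviation

print(variance, round(sd, 2), mean_abs_dev)           # 80.0 8.94 7.2
```

The standard deviation (8.94 kg) is close to the mean absolute deviation (7.2 kg) but somewhat larger, because squaring gives extra weight to the one large deviation (86 kg).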

Standard Deviation

The standard deviation is the most widely applied measure of variability. When
observations have been obtained from every item or sampling unit in a population, the
symbol for the standard deviation is σ (lowercase sigma); this is a parameter of the
population. When it is calculated from a sample, it is symbolized s. The standard deviation
of a distribution of scores is the square root of the variance. Large standard deviations
suggest that scores do not cluster around the mean; they are probably widely scattered.
Similarly, small standard deviations suggest that there is very little difference among
scores.
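
Python's `statistics` module provides both versions of the standard deviation. The text does not give a separate formula for s; the sketch below reflects the common convention (used by `statistics.stdev` and most statistical software) that the sample standard deviation divides the sum of squared deviations by n - 1 rather than by N. The pulse rates are hypothetical.

```python
from statistics import pstdev, stdev

# Hypothetical pulse rates, treated first as a whole population,
# then as a sample drawn from a larger population.
pulse = [62, 70, 74, 78, 86]

sigma = pstdev(pulse)  # population SD: squared deviations divided by N
s = stdev(pulse)       # sample SD: squared deviations divided by n - 1

print(sigma, round(s, 2))  # 8.0 8.94
```
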
Normal Distribution

The normal distribution is a mathematical construct which suggests that naturally
occurring observations follow a given pattern. The pattern is the normal curve, which
places most observations near the mean and fewer observations at either extreme. This
curve, or bell-shaped distribution, reflects the tendency of the observations concerning a
specific variable to cluster in a particular manner.

The normal curve can be described for any set of data given the mean and standard
deviation of the data and the assumption that the characteristic under study is normally
distributed within the population. In a normal distribution, about 68% of observations fall
within one standard deviation of the mean, about 95% fall within two standard deviations
of the mean, and about 99.7% fall within three standard deviations of the mean.
Theoretically, the range of the curve is unlimited.
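
The percentages quoted for the normal curve follow from its cumulative distribution function, available in Python 3.8+ as `statistics.NormalDist`. The share within k standard deviations is the area below +k minus the area below -k. Note that the share below +3 SD alone, `std_normal.cdf(3)`, is about 99.87%; that is a one-sided figure, not the share within three standard deviations.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def within_k_sd(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```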

Standard Scores

One of the problems with means and standard deviations is that their meanings are not
independent of context. For example, a mean of 45.6 means little unless the scale and the
spread of the scores are known. The Z-score is a transformation into standardized units
that provides a context for the interpretation of scores: it is the difference between the
score and the mean, divided by the standard deviation. To make comparisons between
groups, standard scores rather than raw scores can be used. Standard scores enable the
investigator to examine the position of a given score by measuring its deviation from the
mean of all scores in standard deviation units.
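
A Z-score makes scores from differently scaled tests comparable. In this sketch the exam marks for two courses are hypothetical, and the population standard deviation (`pstdev`) of each reference group is used as the divisor.

```python
from statistics import mean, pstdev

def z_score(x, values):
    """Z = (score - mean) / standard deviation of the reference group."""
    return (x - mean(values)) / pstdev(values)

# Hypothetical exam marks in two courses with different means and spreads.
anatomy = [50, 55, 60, 65, 70]
pharmacology = [35, 40, 50, 60, 65]

# One student scored 65 in anatomy and 60 in pharmacology.
print(round(z_score(65, anatomy), 2))       # 0.71
print(round(z_score(60, pharmacology), 2))  # 0.88

# Any complete set of Z-scores has mean 0 and standard deviation 1.
z_all = [z_score(x, anatomy) for x in anatomy]
print(round(mean(z_all), 10), round(pstdev(z_all), 10))
```

Although the raw pharmacology mark is lower, it lies further above its group mean in standard deviation units, so relative to their classmates the student did better in pharmacology.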

Most often, the units on the x axis of the normal distribution are in Z-units. Any variable
transformed into Z-units will have a mean of 0 and a standard deviation of 1. Translation
of Z-scores into percentile ranks is accomplished using a table for the standard normal
distribution. Certain Z-scores are of particular interest in statistics and psychological
testing. The Z-score 1.96 represents the 97.5th percentile in a distribution whereas -1.96
represents the 2.5th percentile. A Z-score of less than -1.96 or greater than +1.96 falls
outside of a 95 percent interval bounding the mean of the Z-distribution. Some statistical
definitions of abnormality view these defined deviations as cutoff points. Thus, a person
who is more than 1.96 standard deviations (Z-units) from the mean on some attribute might
be regarded as abnormal. In addition to the interval bounded by 95 percent of the cases,
the interval including 99 percent of all cases is also commonly used in statistics.
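
The standard normal table lookups described above can be reproduced with `statistics.NormalDist` (Python 3.8+). The helper names `percentile_rank` and `outside_central_interval` are hypothetical, introduced only for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()

def percentile_rank(z):
    """Percentile rank of a Z-score under the standard normal curve."""
    return 100 * std_normal.cdf(z)

def outside_central_interval(z, coverage=0.95):
    """True if z lies outside the central interval holding `coverage` of cases."""
    cutoff = std_normal.inv_cdf(1 - (1 - coverage) / 2)
    return abs(z) > cutoff

print(round(percentile_rank(1.96), 1), round(percentile_rank(-1.96), 1))        # 97.5 2.5
print(round(std_normal.inv_cdf(0.975), 2), round(std_normal.inv_cdf(0.995), 2))  # 1.96 2.58
print(outside_central_interval(2.2), outside_central_interval(2.2, coverage=0.99))  # True False
```

A Z-score of 2.2 falls outside the 95 percent interval but inside the 99 percent interval, whose cutoff is about ±2.58.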

Confidence Intervals

In most statistical inference problems the sample mean is used to estimate the population
mean. Each sample mean is considered to be an unbiased estimate of the population
mean. Although the sample mean is unlikely to be exactly the same as the population
mean, repeated random samples will form a sampling distribution of sample means. The
mean of the sampling distribution is an unbiased estimate of the population mean.
However, taking repeated random samples from the population is also difficult and
expensive. Instead, it is necessary to estimate the population mean based on a single
sample; this is done by creating an interval around the sample mean.

The first step in creating this interval is finding the standard error of the mean, which is
the standard deviation divided by the square root of the sample size. Statistical inference
is used to estimate the probability that the population mean will fall within some defined
interval. Because sample means are (approximately) normally distributed around the
population mean, the sample mean is most probably near the population value. However,
the sample mean may be an overestimate or an underestimate of the population mean.
Using information about the standard error of the mean, it is possible to put a single
observed mean into context.
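
The idea of a sampling distribution of means, and of the standard error as its spread, can be checked by simulation. This sketch assumes a hypothetical, roughly normal population of 10,000 values (mean 200, SD 30, generated with a fixed seed), draws 2,000 random samples of 25, and compares the observed spread of the sample means with the standard deviation divided by the square root of the sample size.

```python
import random
from statistics import mean, pstdev, stdev

rng = random.Random(3)

# Hypothetical population of 10,000 values, roughly normal (mean 200, SD 30).
population = [rng.gauss(200, 30) for _ in range(10_000)]
sigma = pstdev(population)

n = 25
# Draw many independent random samples and keep each sample's mean.
sample_means = [mean(rng.sample(population, n)) for _ in range(2_000)]

print(round(stdev(sample_means), 2))  # observed spread of the sample means
print(round(sigma / n ** 0.5, 2))     # standard error predicted by sigma / sqrt(n)
```

The two printed values should be close (both around 6), which is why a single sample's standard error can stand in for the spread of the repeated samples that are too costly to collect.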

The ranges that are likely to capture the population mean are called confidence intervals.
Confidence intervals are bounded by confidence limits, the values that mark the two ends
of the interval. A confidence interval is defined as a range of values with a specified
probability of including the population mean, and it is typically associated with a certain
probability level. For example, 95 percent confidence intervals constructed in this way
include the population mean in about 95 of every 100 samples, and a 99 percent confidence
interval is expected to capture the true mean in 99 of every 100 cases. Creating a
confidence interval requires a mean, a standard error of the mean, and the Z-value
associated with the interval.
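
The three ingredients named above (a mean, a standard error of the mean, and a Z-value) combine as mean ± Z × standard error. This is a minimal sketch on hypothetical data; it uses the Z multiplier to match the text, although for a sample as small as ten a t multiplier with n - 1 degrees of freedom would normally be preferred and gives a somewhat wider interval.

```python
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Mean +/- z * standard error, using the normal (Z) multiplier."""
    n = len(sample)
    m = mean(sample)
    sem = stdev(sample) / n ** 0.5                 # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # 1.96 for 95%, about 2.58 for 99%
    return m - z * sem, m + z * sem

# Hypothetical lengths of stay (days) for ten patients.
stay = [4, 6, 5, 7, 3, 5, 6, 4, 8, 5]
lo95, hi95 = confidence_interval(stay)
lo99, hi99 = confidence_interval(stay, level=0.99)
print(f"mean {mean(stay):.1f}, 95% CI {lo95:.2f} to {hi95:.2f}, 99% CI {lo99:.2f} to {hi99:.2f}")
```

The 99 percent interval is wider than the 95 percent interval: greater confidence of capturing the population mean is bought with a less precise range.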

Inferential statistics

Inferential statistics are mathematical procedures which help the investigator to predict or
infer population parameters from sample measures. This is done by a process of inductive
reasoning based on the mathematical theory of probability (Fowler, J., Jarvis, P. &
Chevannes M. 2002).

Probability

The idea of probability is basic to inferential statistics. The goal of inferential
statistical techniques is the same: to determine as precisely as possible the probability of
an occurrence. Probability can be regarded as quantifying the chance that a stated outcome
of an event will take place. Here, probability refers to the likelihood that the differences
between groups under study are the result of chance. Probability theory expresses the
chance of any given event out of all possible outcomes, and the probabilities of a complete
set of mutually exclusive outcomes add up to 1. When a coin is tossed it has two outcomes,
head or tail, with a 0.5 chance for a head and a 0.5 chance for a tail; added together, these
two chances give 1. Similarly, if a class has fifty students, the chance that a given student
will come first in the class is 1 in 50 (i.e., .02), assuming every student is equally likely to
do so. By convention, probability values fall on a scale between 0 (impossibility) and 1
(certainty), but they are sometimes expressed as percentages, so the ‘probability’ scale has
much in common with the proportion scale. The chance of committing a type I error is
controlled by comparing the probability value (p) obtained when testing the hypothesis
with a preset alpha value. In the behavioural sciences, .05 is conventionally taken as the
alpha value for testing the hypothesis. When more stringent criteria are required, .01 or
.001 is used as the alpha value.
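
The coin and classroom examples can be written with exact fractions, which makes the "probabilities of mutually exclusive outcomes add up to 1" rule visible without rounding. The equal-chance assumption for the fifty students is the same one stated in the text.

```python
from fractions import Fraction

# A fair coin: two mutually exclusive outcomes whose probabilities sum to 1.
coin = {"head": Fraction(1, 2), "tail": Fraction(1, 2)}
print(sum(coin.values()))     # 1

# Fifty students, each assumed equally likely to come first in the class.
p_first = Fraction(1, 50)
print(float(p_first))         # 0.02
print(float(1 - p_first))     # 0.98: probability that a given student does not come first
```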

Statistical Significance (alpha level)

The level of significance (or alpha level) is set in advance as the maximum probability the
investigator is prepared to accept that the difference between the groups has occurred by
chance rather than in response to the manipulation of variables. The decision of whether the
null hypothesis should be rejected depends on the level of error that can be tolerated, and this
tolerance is expressed as the level of significance or alpha level. The usual alpha level is 0.05,
although levels of 0.01 or 0.001 may be used when a high level of accuracy is required. In
testing the significance of obtained statistics, if the investigator rejects the null hypothesis
when it is in fact true, he commits a type I (alpha) error; if the investigator accepts the null
hypothesis when it is in fact false, he commits a type II (beta) error (Singh, 2002).
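
Simulation can illustrate the meaning of alpha. In the hypothetical sketch below, every simulated study draws both groups from the same population (mean 120, SD 10, 30 subjects per group, all assumed values). The null hypothesis is therefore true, and every rejection at alpha = .05 is a type I error. For simplicity it uses a two-sample z-test with a known SD. Across many studies, the proportion of type I errors should come out close to alpha.

```python
import math
import random
from statistics import NormalDist, mean

def two_sample_z_p(a, b, sigma):
    """Two-tailed p value for a z-test of two means when the population SD is known."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(42)
alpha, sigma, n, trials = 0.05, 10.0, 30, 4000
type_i_errors = 0
for _ in range(trials):
    # Both groups come from the SAME population, so the null hypothesis is true
    group1 = [random.gauss(120, sigma) for _ in range(n)]
    group2 = [random.gauss(120, sigma) for _ in range(n)]
    if two_sample_z_p(group1, group2, sigma) < alpha:
        type_i_errors += 1   # rejecting a true null hypothesis is a type I error

print(type_i_errors / trials)   # should be close to alpha (0.05)
```

Lowering `alpha` to 0.01 reduces the type I error rate accordingly, at the cost of a higher chance of type II errors when a real difference exists.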

Parametric and Non-parametric Tests

Parametric and non-parametric tests are both commonly employed in behavioral research.

Parametric Tests

A parametric test is one that specifies certain conditions about the parameters of the
population from which a sample is taken. Such statistical tests are considered to be more
powerful than non-parametric tests and should be used if their basic requirements or
assumptions are met. The assumptions for using parametric tests are:

• The observations must be independent.
• The observations must be drawn from a normally distributed population.
• The samples must be drawn from populations with equal variances (homogeneity of
variance); this condition is especially important when the samples are small.
• The variables must be expressed on interval or ratio scales.
• The variables under study should be continuous.

Examples of parametric tests are the t-test, z-test and F-test.

Non-parametric tests

A non-parametric test is one that does not specify any conditions about the parameters of the
population from which the sample is drawn. These tests are therefore called distribution-free
statistics. Non-parametric tests still assume that the observations are independent, and many
of them assume that the variable under study has underlying continuity. Conditions that call
for a non-parametric statistical test are:

• The shape of the distribution of the population from which a sample is drawn is
not known to be normal.
• The variables have been quantified on the basis of nominal measures (or
frequency counts).
• The variables have been quantified on the basis of ordinal measures or ranking.
• A non-parametric test should be used only when parametric assumptions cannot
be met.

Common non-parametric tests

• Chi-square test
• Mann-Whitney U test
• Rank difference methods (Spearman rho and Kendall’s tau)
• Coefficient of concordance (W)
• Median test
• Kruskal-Wallis test
• Friedman test

Tips on using appropriate tests in experimental design

Two unmatched (unrelated) groups, experimental and control (e.g. patients receiving a
structured therapeutic intervention for depression and a control group of patients on routine
care):

• See the distribution, whether normal or non-normal.
• If normal, use a parametric test (independent t-test).
• If non-normal, use a non-parametric test (Mann-Whitney U test), or try to make the
data approximately normal through a transformation such as the natural log and
re-check the distribution. (Converting scores to z-scores changes their scale but not
the shape of their distribution, so it does not by itself make data normal.)

Two matched (related) groups, pre-post design (the same group is rated before the
intervention and rated again after the period of intervention, i.e. two ratings of the same
or related group):

• See the distribution, whether normal or non-normal.
• If normal, use the parametric paired t-test.
• If non-normal, use the non-parametric Wilcoxon signed-rank test.

More than two unmatched (unrelated) groups (for example three groups:
schizophrenia, bipolar and control group):

• See the distribution, whether normal or non-normal.
• If normally distributed, use the parametric one-way ANOVA.
• If non-normal, use the non-parametric Kruskal-Wallis test.

More than two matched (related) groups (for example, ratings made at different times
during an ongoing intervention: t1, t2, t3, t4 …):

• See the distribution, whether normal or non-normal.
• If the data are normal, use the parametric repeated measures ANOVA.
• If the data are non-normal, use the non-parametric Friedman test.
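
The four scenarios above amount to a small decision table. The helper below is a hypothetical illustration; the name `choose_test` and its arguments are not from any package. It encodes only the two questions the tips ask (matched or unmatched, normal or non-normal). It leaves aside other considerations such as measurement level and sample size.

```python
def choose_test(n_groups, matched, normal):
    """Return the test suggested by the tips for comparing groups.

    n_groups: number of groups (or repeated ratings) being compared
    matched:  True if the groups are related (e.g. the same patients rated repeatedly)
    normal:   True if the distribution looks normal
    """
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    if n_groups == 2:
        if matched:
            return "paired t-test" if normal else "Wilcoxon signed-rank test"
        return "independent t-test" if normal else "Mann-Whitney U test"
    if matched:
        return "repeated measures ANOVA" if normal else "Friedman test"
    return "one-way ANOVA" if normal else "Kruskal-Wallis test"

# Depression example: intervention vs routine care, with skewed scores
print(choose_test(2, matched=False, normal=False))   # Mann-Whitney U test
```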

Matched (related) and unmatched (unrelated) observations

When analyzing bivariate data such as correlations, a single sample unit gives a pair of
observations representing two different variables. Observations that are uniquely linked in
this way are said to be matched or paired. For example, the systolic blood pressures of 10
patients measured before administration of a drug and those of another 10 patients measured
after administration are unmatched. However, the measurements of the same 10 patients
before and after administration of the drug are matched. A more sensitive analysis is possible
when the observations are matched, because each patient serves as his or her own control.

Common Statistical tests

Chi-square (χ²) Test (analyzing frequencies)

The chi-square test is one of the most important non-parametric tests. Guilford (1956) has
called it the ‘general-purpose statistic’. Chi-square tests are widely referred to as tests of
homogeneity, randomness, association, independence and goodness of fit. The chi-square
test is used when the data are expressed as frequencies, proportions or percentages. The
test applies only to discrete data, but any continuous data can be reduced to categories in
such a way that they can be treated as discrete data. The chi-square statistic is used to
evaluate the relative frequency or proportion of events in a population that fall into
well-defined categories. For each category, there is an expected frequency that is obtained
from knowledge of the population or from some other theoretical perspective. There is also
an observed frequency for each category, obtained from observations made by the
investigator. The chi-square statistic expresses the discrepancy between the observed and
the expected frequencies.
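
The discrepancy is summarised as χ² = Σ (O − E)² / E, summed over all categories. A minimal sketch with hypothetical data follows: 60 patients choose among three clinic times, and the choices are tested against the equal probability hypothesis (E = 20 in each category). The critical value 5.991 is the standard tabulated chi-square for 2 degrees of freedom at the .05 level.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical preferences of 60 patients for three clinic times
observed = [30, 18, 12]
# Equal probability hypothesis: each category is expected to receive 60 / 3 = 20
expected = [sum(observed) / len(observed)] * len(observed)

stat = chi_square(observed, expected)
df = len(observed) - 1          # k - 1 degrees of freedom for a one-way test
critical_05 = 5.991             # tabulated chi-square, df = 2, alpha = .05
print(round(stat, 2), df, stat > critical_05)   # 8.4 2 True
```

Because 8.4 exceeds 5.991, the equal probability hypothesis would be rejected for these counts.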

The chi-square test has several uses:

1. It can be used to test the equal probability hypothesis (the hypothesis that the
frequencies in all the given categories are equal).

2. It can be used to test the significance of the independence hypothesis (the hypothesis
that one variable is not affected by or related to another variable, so that the two
variables are independent).

3. It can be used to test a hypothesis regarding the normal shape of a frequency
distribution (goodness of fit).

4. It is used in testing the significance of several statistics, such as the phi coefficient,
the coefficient of concordance, and the coefficient of contingency.

5. In the chi-square test, the frequencies we observe are compared with those we expect
on the basis of some null hypothesis. If the discrepancy between the observed and
expected frequencies is great, the calculated test statistic will exceed the critical value at
the appropriate number of degrees of freedom, and the null hypothesis is rejected in favor
of some alternative. The mastery of the method lies not so much in the computation of the
test statistic itself as in the calculation of the expected frequencies.

6. The chi-square statistic does not give any information regarding the strength of a
relationship: it only conveys the existence or non-existence of a relationship between the
variables investigated. To establish the extent and nature of the relationship, additional
statistics such as phi, Cramér’s V, or the contingency coefficient can be used
(Brockopp & Hastings-Tolsma, 2003).
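
Item 5 identifies the calculation of expected frequencies as the key step. For a contingency table, the expected count in each cell is (row total × column total) / grand total. The sketch below uses a hypothetical 2×2 table of intervention and control patients who did or did not improve. It also computes Cramér’s V, one of the strength-of-association measures in item 6, as V = √(χ² / (N × (min(rows, columns) − 1))). For a 2×2 table, V equals the absolute value of phi.

```python
import math

def expected_frequencies(table):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square, df, Cramer's V) for an r x c contingency table."""
    expected = expected_frequencies(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    n_rows, n_cols = len(table), len(table[0])
    df = (n_rows - 1) * (n_cols - 1)
    n = sum(sum(row) for row in table)
    cramers_v = math.sqrt(stat / (n * (min(n_rows, n_cols) - 1)))
    return stat, df, cramers_v

# Hypothetical: rows = intervention / control, columns = improved / not improved
table = [[30, 10],
         [18, 22]]
print(expected_frequencies(table))        # [[24.0, 16.0], [24.0, 16.0]]
stat, df, v = chi_square_independence(table)
print(round(stat, 2), df, round(v, 3))    # 7.5 1 0.306
```

With df = 1, the tabulated value at .05 is 3.841, so the uncorrected χ² of 7.5 would be significant. The value of V (about 0.31) then indicates how strong the association is. This sketch does not apply Yates’ correction.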

Tips on analyzing frequencies

• All versions of the chi-square test compare the agreement between a set of
observed frequencies and those expected if some null hypothesis is true.
• The tests may be applied to counts of objects in categories on a nominal scale, or in
unambiguous intervals on a continuous scale, such as successive days or months.
• Apply Yates’ correction in the chi-square test when there is only one degree of
freedom, i.e. in a ‘one-way’ test with only two categories and in a 2×2 contingency table.
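
Yates’ correction subtracts 0.5 from each absolute difference |O − E| before squaring, which makes the test with one degree of freedom more conservative. The minimal sketch below uses a hypothetical 2×2 table (intervention/control by improved/not improved). Clamping the corrected difference at zero is a safeguard added here, not part of the basic formula.

```python
def yates_chi_square(table):
    """Yates-corrected chi-square for a 2 x 2 contingency table (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Subtract 0.5 from |O - E| (never going below zero) before squaring
            diff = max(abs(observed - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat

table = [[30, 10], [18, 22]]   # hypothetical intervention / control outcomes
print(round(yates_chi_square(table), 3))   # 6.302
```

For this table, the uncorrected χ² is 7.5. The correction lowers it to about 6.30, which still exceeds the tabulated 3.841 for df = 1 at the .05 level.
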

Testing normality of data

Parametric statistical techniques depend upon the mathematical properties of the normal
curve. They usually assume that samples are drawn from populations that are normally
distributed. Before adopting a statistical test, it is therefore essential to determine whether
the data are normal or non-normal. The normality of data can be checked in two ways: by
plotting the data to see whether they look normal, or by using formal statistical tests. The
commonest of these is the Kolmogorov-Smirnov test, whose null hypothesis is that the data
are normally distributed. If the P value is not significant (> .05), a parametric test can ideally
be used to analyze the data; if it is significant (< .05), a non-parametric test should be used.
The Shapiro-Wilk test is another widely used test of normality and is often preferred for
small samples. Statistical packages like SPSS can be used to perform these tests.
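
The sketch below shows what the Kolmogorov-Smirnov test measures by computing its D statistic. D is the largest vertical gap between the sample’s cumulative proportion and a normal curve with the sample’s own mean and standard deviation. The sketch stops at D, because turning D into a P value requires tables or software. In addition, since the mean and SD are estimated from the same data, Lilliefors-corrected critical values are appropriate; SPSS, for example, reports this test with a Lilliefors significance correction. Both data sets are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(data):
    """Kolmogorov-Smirnov D: largest gap between the sample's empirical CDF
    and a normal CDF with the sample's own mean and standard deviation."""
    xs = sorted(data)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # Compare the normal CDF with the empirical step just after and just before x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

symmetric = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.1, 6.4, 7.0]
skewed = [1, 1, 1, 2, 2, 3, 5, 9, 20]
d_sym = ks_statistic_normal(symmetric)
d_skew = ks_statistic_normal(skewed)
print(round(d_sym, 3), round(d_skew, 3))   # the skewed sample gives the larger D
```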

t-test and z-test (comparing means)

In experimental sciences, comparisons between groups are very common. Usually, one
group is the treatment, or experimental group, while the other group is the untreated, or
control group. If patients are randomly assigned to these two groups, it is assumed that
they differ only by chance prior to treatment. Differences between groups after the
treatment are usually used to estimate treatment effect. The task of the statistician is to
determine whether any observed differences between the groups following treatment
should be attributed to chance or to the treatment. The t-test is commonly used for this
purpose. There are several different types of t-test:

Types of t-Tests

• Comparison of a sample mean with a hypothetical population mean.
• Comparison between two scores in the same group of individuals.
• Comparison between observations made on two independent groups.

The t-test and z-test are parametric inferential statistical techniques used when two means
are to be compared. They are used to test the null hypothesis that there is no difference in
means between the two groups. The reporting of the results of a t-test generally includes
the df, t-value, and probability level. A t-test can be one-tailed or two-tailed: if the
hypothesis is directional, a one-tailed test is generally used, and if the hypothesis is
non-directional, a two-tailed test is used. As a common rule of thumb, the t-test is used when
the sample size is less than 30 and the z-test when the sample size is more than 30.
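
The minimal sketch below implements two of the t-test types listed above using the standard textbook formulas. It covers a one-sample t with df = n − 1, and the pooled-variance (Student) independent t with df = n1 + n2 − 2. When the group variances clearly differ, Welch’s version is a common alternative; it is not shown here. The scores are hypothetical. The critical values in the comment are the standard two-tailed .05 values from t tables.

```python
import math
from statistics import mean, variance

def one_sample_t(data, mu0):
    """t for comparing a sample mean with a hypothetical population mean; df = n - 1."""
    n = len(data)
    se = math.sqrt(variance(data) / n)
    return (mean(data) - mu0) / se, n - 1

def independent_t(a, b):
    """Pooled-variance t for two independent groups; df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

# Hypothetical outcome scores
intervention = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
control = [10, 11, 9, 12, 10, 13, 11, 10, 12, 11]

print(one_sample_t(intervention, 12))   # (3.0, 9)
t, df = independent_t(intervention, control)
# Compare |t| with the tabulated two-tailed value: 2.262 for df = 9, 2.101 for df = 18
print(round(t, 2), df)
```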

There are dependent and independent t-tests. The formula for calculating t differs
depending on whether the samples involved are dependent or independent. Samples are
independent when there are two groups such as an experimental and a control group.
Samples are dependent when the participants from two groups are paired in some
manner. The form of the t-test that is used with a dependent sample may be termed as
paired, dependent, matched, or correlated (Brockopp & Hastings-Tolsma, 2003).
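
The choice of form matters. It also illustrates the earlier point that matched observations allow a more sensitive analysis. The hypothetical sketch below analyses the same before/after blood pressures in two ways. The paired t works on each patient’s own difference. The pooled independent t wrongly treats the readings as two separate groups, so the large variation between patients stays in the error term.

```python
import math
from statistics import mean, stdev, variance

def paired_t(before, after):
    """Dependent (paired) t computed on within-subject differences; df = n - 1."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

def pooled_t(a, b):
    """Independent-groups t with pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2

# Hypothetical systolic BP of the same 10 patients before and after a drug
before = [150, 142, 160, 138, 155, 147, 165, 140, 152, 158]
after = [145, 138, 154, 135, 149, 143, 158, 136, 147, 152]

print(paired_t(before, after))   # large t: each patient acts as his or her own control
print(pooled_t(before, after))   # much smaller t for the same mean difference of 5 mmHg
```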

Degrees of freedom (df)

Degrees of freedom (df) is a mathematical concept that describes the number of events or
observations that are free to vary. For each statistical test there is a formula for calculating
the appropriate degrees of freedom (for example, n - 1 for a one-sample or paired t-test).

Mann-Whitney U-test

The Mann-Whitney U test is a non-parametric substitute for the parametric t-test for
comparing the medians of two unmatched samples. For application of the U test, data must
be obtained on an ordinal or interval scale. For example, the Mann-Whitney U test can be
used to compare the median time taken to perform a task by a sample of subjects who had
not drunk alcohol with that of another sample who had drunk a standardized volume of
alcohol. The test is used to examine group differences when the data are non-normal and the
groups are independent, and it can be applied to groups of equal or unequal size.

Some key points about using the Mann-Whitney U test are:

• This test can be applied to interval data (measurements), to counts of things, to
derived variables (proportions and indices) and to ordinal data (rank scales, etc.).
• Unlike some test statistics, the calculated value of U has to be equal to or smaller than
the tabulated critical value in order to reject the null hypothesis.
• The test is for a difference in medians. It is a common error to record a statement like
‘the Mann-Whitney U test showed a significant difference in means’. There is, however,
no need to calculate the medians of each sample to do the test.
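
The minimal sketch below computes U for the alcohol example. It pools the two samples and ranks them, giving tied values the average of their ranks. It then sums the ranks of one sample and converts that rank sum to U. The timings (in seconds) are hypothetical. The sketch stops at U, which is then compared with a tabulated critical value (5 for two samples of six at the two-tailed .05 level).

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of 1-based positions i..j
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Mann-Whitney U: the smaller of U1 and U2 from the pooled ranking."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    r1 = sum(ranks[:n1])               # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

no_alcohol = [12, 14, 11, 15, 13, 10]   # hypothetical task times (s)
alcohol = [18, 16, 20, 17, 15, 19]
print(mann_whitney_u(no_alcohol, alcohol))   # 0.5
```

A U of 0.5 is well below the critical value of 5, so for these hypothetical data the two groups would differ significantly.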

Wilcoxon test for matched pairs

The Wilcoxon test for matched pairs is a non-parametric test for comparing the medians
of two matched samples. It calls for a test statistic T whose probability distribution is
known. The observations must be made on an interval scale; it is not possible to use this
test on ordinal measurements. The test is for a difference in medians, and it assumes that
the samples have been drawn from parent populations that are symmetrically, but not
necessarily normally, distributed.
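
The minimal sketch below computes T. It takes each pair’s difference and drops zero differences, a common convention. It then ranks the absolute differences, averaging tied ranks, and sums the ranks of the positive and the negative differences separately. T is the smaller of the two sums. It is compared with the tabulated critical value for n, the number of non-zero differences. The heart rate data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def signed_rank_t(before, after):
    """Wilcoxon T for matched pairs, plus the number of non-zero differences."""
    diffs = [b - a for b, a in zip(before, after) if b != a]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), len(diffs)

# Hypothetical resting heart rates of 8 patients before and after relaxation training
before = [72, 80, 68, 90, 76, 85, 79, 88]
after = [70, 74, 69, 82, 71, 85, 73, 80]
T, n = signed_rank_t(before, after)
print(T, n)   # 1.0 7
```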

Pearson Product-Moment Correlation Coefficient

The Pearson product-moment correlation method is a parametric test and a common
method of assessing the association between two variables under study. In this test an
estimation of at least one parameter is involved, measurement is at an interval level, and
it is assumed that the variables under study are normally distributed within the population.
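
The coefficient is r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² × Σ(y − ȳ)²). The minimal sketch below implements this standard formula with hypothetical paired scores (weekly exercise sessions and a wellbeing score). r runs from −1 (perfect negative) through 0 (no linear association) to +1 (perfect positive).

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation for paired interval-level data."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

sessions = [1, 2, 3, 4, 5]    # hypothetical exercise sessions per week
wellbeing = [2, 4, 5, 4, 5]   # hypothetical wellbeing scores
print(round(pearson_r(sessions, wellbeing), 3))   # 0.775
```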

Spearman Rank Correlation Coefficient

Spearman’s rho is a non-parametric test and the counterpart of the parametric Pearson r.
Spearman’s Rank Correlation Technique is used when the conditions of the Product
Moment Correlation Coefficient do not apply. This test is widely used by health scientists;
it uses the ranks of the x and y observations, and the raw data themselves are discarded.
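
The minimal sketch below converts each variable to ranks and applies the classic formula rho = 1 − 6Σd² / (n(n² − 1)), where d is the difference between each pair of ranks. This formula assumes there are no tied values, so the sketch refuses tied data. With ties, a common approach is to give tied values their average rank and apply Pearson’s formula to the ranks. The anxiety and pain scores are hypothetical.

```python
def ranks_no_ties(values):
    """Rank values 1..n; this simple version assumes there are no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("this simple version assumes no tied values")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = ranks_no_ties(x), ranks_no_ties(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

anxiety = [3.2, 1.5, 4.8, 2.0, 6.1, 5.5]   # hypothetical scores
pain = [10, 14, 30, 12, 40, 25]
print(round(spearman_rho(anxiety, pain), 3))   # 0.714
```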

Tips on using correlation tests

• When observations of one or both variables are on an ordinal scale, or are
proportions, percentages, indices or counts of things, use Spearman’s Rank
Correlation Coefficient. The number of units in the sample, i.e. the number of
paired observations, should be between 7 and 30.
• When observations are measured on an interval scale, the Product Moment
Correlation Coefficient should be considered. Sample units must be obtained
randomly, and the data (x and y) should follow a bivariate normal distribution.
• The relationship between the variables should be rectilinear (a straight line), not
curved. Certain mathematical transformations (e.g. logarithmic transformation)
will ‘straighten up’ curved relationships.
• A strong and significant correlation does not necessarily mean that one variable is
the cause of the other. It is possible that some additional, unidentified factor is the
underlying source of variability in both variables.
• Correlations measured in samples estimate correlations in the populations. A
correlation in a sample is not ‘improved’ or strengthened by obtaining more
observations; however, larger samples may be required to confirm the statistical
significance of weaker correlations.

Regression Analysis

Regression analysis is often used to predict the value of one variable given information
about another variable. The procedure can describe how two continuous variables are
related. Regression analysis is used to examine relationships among continuous variables
and is most appropriate for data that can be plotted on a graph. Data are usually plotted,
so that the independent variable is seen on the horizontal (x) axis and the dependent
variable on the vertical (y) axis. The statistical procedure for regression analysis includes
a test for the significance of the relationship between two variables. Given a significant
relationship between two variables, knowledge of the value of the independent variable
permits a prediction of the value of the dependent variable.
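
The minimal sketch below fits the least-squares line y = a + bx, with slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b x̄. It then uses the line to predict the dependent variable. The data (hours of health education against a knowledge score) are hypothetical. The significance test of the relationship that regression software reports alongside the line is not included.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

hours = [1, 2, 3, 4, 5, 6]          # independent variable (x axis), hypothetical
score = [52, 55, 61, 64, 68, 72]    # dependent variable (y axis), hypothetical
a, b = fit_line(hours, score)

def predict(x_new):
    """Predicted knowledge score for a given number of hours, using the fitted line."""
    return a + b * x_new

print(round(a, 2), round(b, 3), round(predict(7), 1))   # 47.8 4.057 76.2
```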

One-Way Analysis of Variance (ANOVA)

When there are three or more samples, and the data from each sample are thought to be
normally distributed, analysis of variance (ANOVA) may be the technique of choice. One-way
analysis of variance, developed by R. A. Fisher, is a parametric inferential statistical test that
enables investigators to compare two or more group means. The reporting of the results
includes the df, F value and probability level. ANOVA is of two types: simple (one-way)
analysis of variance and complex (two-way) analysis of variance. One-way ANOVA is an
extension of the t-test that permits the investigator to compare more than two means
simultaneously.

Researchers studying two or more groups can use ANOVA to determine whether there
are differences among the groups. For example, nurse investigators who want to assess
the levels of helplessness among three groups of patients (long-term, acute care and
outpatients) can administer an instrument designed to measure levels of helplessness and
then calculate an F ratio. If the F ratio is sufficiently large, the conclusion can be drawn that
there is a difference between at least two of the means.

The larger the F ratio, the more likely it is that the null hypothesis can be rejected. Other
tests, called post hoc comparisons, can be used to determine which of the means differ
significantly. Fisher’s LSD, Duncan’s new multiple range test, the Newman-Keuls test,
Tukey’s HSD, and Scheffé’s test are the post hoc comparison tests most frequently used
following ANOVA. In some instances a post hoc comparison is not necessary because the
means of the groups under consideration readily convey the differences between the
groups (Brockopp & Hastings-Tolsma, 2003).
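
The F ratio is the between-groups mean square divided by the within-groups mean square, with k − 1 and N − k degrees of freedom. The minimal sketch below applies this to hypothetical helplessness scores for the three patient groups in the example above. The tabulated F for df = (2, 12) at the .05 level is about 3.89. Post hoc comparisons are not included.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for two or more independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Between-groups sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

long_term = [24, 27, 22, 26, 25]   # hypothetical helplessness scores
acute = [19, 21, 18, 22, 20]
outpatient = [15, 17, 14, 16, 18]
f, df_b, df_w = one_way_anova(long_term, acute, outpatient)
print(round(f, 2), df_b, df_w)   # 33.47 2 12
```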

Kruskal-Wallis test (more than two samples)

The Kruskal-Wallis test is a simple non-parametric test to compare the medians of three
or more samples. Observations may be interval measurements, counts of things, derived
variables, or ordinal ranks. If there are only three samples, then there must be at least five
observations in each sample. Samples do not have to be of equal size. The statistic K
(written H in many other texts) is used to indicate the test value.
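
The statistic is K = 12 / (N(N + 1)) × Σ(R²/n) − 3(N + 1). In this formula, R is the rank sum of each sample within the pooled ranking, n is that sample’s size, and N is the total number of observations. The minimal sketch below omits the correction for ties that software usually applies. With at least five observations per sample, K is commonly compared with chi-square on k − 1 degrees of freedom (5.991 for df = 2 at .05). The scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis(*groups):
    """Kruskal-Wallis statistic (K, often written H), without a correction for ties."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])   # ranks belonging to this sample
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

group_a = [24, 27, 23, 26, 25]   # hypothetical scores
group_b = [19, 21, 18, 22, 20]
group_c = [15, 17, 14, 16, 13]
print(round(kruskal_wallis(group_a, group_b, group_c), 2))   # 12.5
```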

Multivariate Analysis

Two-way or Factorial Analysis of Variance

Factorial analysis of variance permits the investigator to analyze the effects of two or
more independent variables on the dependent variable (one-way ANOVA is used with
one independent variable and one dependent variable). The term factor is interchangeable
with independent variable, and factorial ANOVA therefore refers to the idea that data
having two or more independent variables can be analyzed using this technique.

Analysis of Covariance (ANCOVA)

ANCOVA is an inferential statistical test that enables investigators to adjust statistically
for group differences that may interfere with obtaining results that relate specifically to
the effects of the independent variable(s) on the dependent variable(s).

Multivariate Analysis

Multivariate analysis refers to a group of inferential statistical tests that enable the
investigator to examine multiple variables simultaneously. Unlike the other techniques
described above, these tests permit the investigator to examine several dependent and
several independent variables at the same time.

Choosing the appropriate test

If the data fulfill the requirements of parametric assumptions, any of the parametric tests
that suit the purpose can be used. On the other hand, if the data do not fulfill the
parametric requirements, any of the non-parametric statistical tests that suit the
purpose can be selected. Other factors that decide the selection of appropriate
statistical tests are the number of independent and dependent variables, and the nature of
the variables (whether nominal, ordinal, interval or ratio). When the variables are interval
measures and there is more than one independent variable for a dependent variable,
multiple correlation is the most appropriate statistic. On the other hand, when there is only
one independent and one dependent variable, both interval measures, Pearson r may be
used. With ordinal and nominal measures, non-parametric statistics are the common choice.

Computer Aided Analysis

The availability of computer software has greatly facilitated the execution of most
statistical techniques. The many statistical packages run on different types of platforms or
computer configurations. For general data analysis the Statistical Package for the Social
Sciences (SPSS), the BMDP series, and the Statistical Analysis System (SAS) are
recommended. These are general-purpose statistical packages that perform essentially all
the analyses common to biomedical research. In addition, a variety of other packages
have emerged.

SYSTAT runs on both IBM-compatible and Macintosh systems and performs most of the
analyses commonly used in biomedical research. The popular SAS program has been
redeveloped for Macintosh systems and is sold under the name JMP. Other commonly
used programs include Stata, which is excellent for the IBM-compatible computers. The
developers of Stata release a regular newsletter providing updates, which makes the
package very attractive. StatView is a general-purpose program for the Macintosh
computer.

Newer versions of StatView include an additional program called Super ANOVA, which
is an excellent set of ANOVA routines. StatView is user-friendly and also has superb
graphics. For users interested in epidemiological analyses, Epilog is a relatively low-cost
program that runs on the IBM-compatible platforms. It is particularly valuable for rate
calculations, analysis of disease-clustering patterns, and survival analysis. GB-STAT is a
low-cost, multipurpose package that is very comprehensive.

SPSS (Statistical Package for the Social Sciences) is one of the most popular computer
programs for data analysis. This software provides a comprehensive set of flexible tools
that can be used to accomplish a wide variety of data analysis tasks (Einspruch, 1998).
SPSS is available on a variety of platforms. The latest product information and free
tutorials are available at www.spss.com.

Computer software programs that provide easy access to highly sophisticated statistical
methodologies represent both opportunities and dangers. On the positive side, no serious
researcher need be concerned about being unable to utilize precisely the statistical
technique that best suits his or her purpose, and to do so with the kind of speed and
economy that was inconceivable just two decades ago. The danger is that some
investigators may be tempted to employ after-the-fact statistical manipulations to salvage
a study that was flawed to start with, or to extract significant findings through use of
progressively more sophisticated multivariate techniques.

References & Bibliography

1. Ahuja R (2001). Research Methods. Rawat Publications, New Delhi, pp. 71-72.
2. Brockopp D Y & Hastings-Tolsma M (2003). Fundamentals of Nursing Research, 3rd
edition. Jones and Bartlett, Boston.
3. Cozby P C (2000). Methods in Behavioral Research, 7th edition. Mayfield Publishing
Co., Toronto.
4. Kerr A W, Hall H K & Kozub S A (2002). Doing Statistics with SPSS. Sage
Publications, London.
5. Einspruch E L (1998). An Introductory Guide to SPSS for Windows. Sage
Publications, Calif.
6. Fowler J, Jarvis P & Chevannes M (2002). Practical Statistics for Nursing and
Health Care. John Wiley & Sons, England.
7. Guilford J P (1956). Fundamental Statistics in Psychology and Education. McGraw-Hill
Book Co., New York.
8. Lindquist E F (1968). Statistical Analysis in Educational Research. Oxford and IBH
Publishing Co., New Delhi.
9. Singh A K (2002). Tests, Measurements and Research Methods in Behavioural
Sciences. Bharati Bhawan, New Delhi.
10. Singleton R A & Straits B (1999). Approaches to Social Research, 3rd edition. Oxford
University Press, New York.
11. Streiner D & Norman G (1996). PDQ Epidemiology, 2nd edition. Mosby, St. Louis.
12. Baker T L (1988). Doing Social Research. McGraw-Hill Book Co., New York.
13. Treece E W & Treece J H (1989). Elements of Research in Nursing. The C.V. Mosby
Co., St. Louis.
14. Tyler L E (1963). Tests and Measurements. Prentice Hall, Englewood Cliffs, New
Jersey, (a) p. 7; (b) p. 14.
15. Chalmers T C, Celano P, Sacks H & Smith H (1983). Bias in treatment assignment in
controlled clinical trials. N Engl J Med 309:1358.
16. Cohen J (1988). Statistical Power Analysis for the Behavioral Sciences. Erlbaum,
Hillsdale, NJ.
17. Cook T D & Campbell D T (1979). Quasi-experimentation: Design and Analysis
Issues for Field Studies. Rand McNally, Chicago.
18. Daniel W W (1995). Biostatistics: A Foundation for Analysis in the Health
Sciences, 6th edition. Wiley, New York.
19. Daniel W W (1990). Applied Nonparametric Statistics, 2nd edition. PWS-Kent, Boston.
20. Dawson-Saunders B & Trapp R G (1994). Basic and Clinical Biostatistics, 2nd
edition. Appleton & Lange, Norwalk, CT.
21. Edwards L K (ed.) (1993). Applied Analysis of Variance in Behavioral Science.
Marcel Dekker, New York.
22. Efron B & Tibshirani R (1991). Statistical data analysis in the computer age.
Science 253:390.
23. Jaccard J & Becker M A (1997). Statistics for the Behavioral Sciences, 3rd edition.
Brooks/Cole Publishing Co., Pacific Grove, CA.
24. Keppel G (1991). Design and Analysis. Prentice-Hall, Englewood Cliffs, NJ.
25. Kaplan R M & Grant I (200). Statistics and Experimental Design. In Kaplan &
Sadock's Comprehensive Textbook of Psychiatry, 7th edition.
26. McCall R (1994). Fundamental Statistics for Psychology, 6th edition. Harcourt Brace
Jovanovich, New York.
27. Pett M A (1997). Nonparametric Statistics for Health Care Research: Statistics for
Small Samples and Unusual Distributions. Sage Publications, Thousand Oaks, CA.
28. Sacks H, Chalmers T C & Smith H (1982). Randomized versus historical controls
for clinical trials. Am J Med 72:233.
29. Ware M E & Brewer C L (eds.) (1999). Handbook for Teaching Statistics and
Research Methods, 2nd edition. Erlbaum, Mahwah, NJ.