
Hypothesis Testing

The main purpose of statistics is to test a hypothesis. For example, you might run an
experiment and find that a certain drug is effective at treating
headaches. But if you can’t repeat that experiment, no one will take your results
seriously. A good example of this was the cold fusion discovery, which petered into
obscurity because no one was able to duplicate the results.

Contents:

1. What is a Hypothesis?
2. What is Hypothesis Testing?
3. Hypothesis Testing Examples (One Sample Z Test).
4. Hypothesis Test on a Mean (TI 83).
5. Bayesian Hypothesis Testing.
6. More Hypothesis Testing Articles

What is a Hypothesis?
Andreas Cellarius hypothesis, showing the planetary motions.

A hypothesis is an educated guess about something in the world around you. It should
be testable, either by experiment or observation. For example:
• A new medicine you think might work.
• A way of teaching you think might be better.
• A possible location of new species.
• A fairer way to administer standardized tests.
It can really be anything at all as long as you can put it to the test.

What is a Hypothesis Statement?


If you are going to propose a hypothesis, it’s customary to write a statement. Your
statement will look like this:
“If I…(do this to an independent variable)….then (this will happen to the dependent
variable).”
For example:

• If I (decrease the amount of water given to herbs) then (the herbs will increase
in size).
• If I (give patients counseling in addition to medication) then (their overall
depression scale will decrease).
• If I (give exams at noon instead of 7) then (student test scores will improve).
• If I (look in this certain location) then (I am more likely to find new species).
A good hypothesis statement should:

• Include an “if” and “then” statement (according to the University of California).
• Include both the independent and dependent variables.
• Be testable by experiment, survey or other scientifically sound technique.
• Be based on information in prior research (either yours or someone else’s).
• Have design criteria (for engineering or programming projects).

What is Hypothesis Testing?

Hypothesis testing in statistics is a way for you to test the results of a survey or
experiment to see if you have meaningful results. You’re basically testing whether your
results are valid by figuring out the odds that your results have happened by chance. If
your results may have happened by chance, the experiment won’t be repeatable and so
has little use.

Hypothesis testing can be one of the most confusing aspects for students, mostly
because before you can even perform a test, you have to know what your null
hypothesis is. Often, those tricky word problems that you are faced with can be
difficult to decipher. But it’s easier than you think; all you need to do is:

1. Figure out your null hypothesis,
2. State your null hypothesis,
3. Choose what kind of test you need to perform,
4. Either reject or fail to reject the null hypothesis.

What is the Null Hypothesis?


If you trace back the history of science, the null hypothesis is always the accepted fact.
Simple examples of null hypotheses that are generally accepted as being true are:

1. DNA is shaped like a double helix.
2. There are 8 planets in the solar system (excluding Pluto).
3. Taking Vioxx can increase your risk of heart problems (a drug now taken off
the market).

How do I State the Null Hypothesis?


You won’t be required to actually perform a real experiment or survey in elementary
statistics (or even disprove a fact like “Pluto is a planet”!), so you’ll be given word
problems from real-life situations. You’ll need to figure out what your hypothesis is
from the problem. This can be a little trickier than just figuring out what the accepted
fact is. With word problems, you are looking to find a fact that is nullifiable (i.e.
something you can reject).

Hypothesis Testing Examples #1: Basic


Example
A researcher thinks that if knee surgery patients go to physical therapy twice
a week (instead of 3 times), their recovery period will be longer. The average
recovery time for knee surgery patients is 8.2 weeks.
The hypothesis statement in this question is that the researcher believes the average
recovery time is more than 8.2 weeks. It can be written in mathematical terms as:
H1: μ > 8.2
Next, you’ll need to state the null hypothesis (See: How to state the null hypothesis).
That’s what will happen if the researcher is wrong. In the above example, if the
researcher is wrong then the recovery time is less than or equal to 8.2 weeks. In math,
that’s:
H0: μ ≤ 8.2

Rejecting the null hypothesis


Ten or so years ago, we believed that there were 9 planets in the solar system. Pluto
was demoted as a planet in 2006. The null hypothesis of “Pluto is a planet” was
replaced by “Pluto is not a planet.” Of course, rejecting the null hypothesis isn’t always
that easy — the hard part is usually figuring out what your null hypothesis is in the
first place.

Hypothesis Testing Examples (One Sample Z Test)
The one sample z test isn’t used very often (because we rarely know the actual
population standard deviation). However, it’s a good idea to understand how it works
as it’s one of the simplest tests you can perform in hypothesis testing. In English class
you got to learn the basics (like grammar and spelling) before you could write a story;
think of one sample z tests as the foundation for understanding more complex
hypothesis testing. This page contains two hypothesis testing examples for one sample
z-tests.

One Sample Hypothesis Testing Examples: #2
A principal at a certain school claims that the students in his school are above average
intelligence. A random sample of thirty students’ IQ scores has a mean of 112.5.
Is there sufficient evidence to support the principal’s claim? The mean population IQ is
100 with a standard deviation of 15.
Step 1: State the Null hypothesis. The accepted fact is that the population mean is
100, so: H0: μ=100.

Step 2: State the Alternate Hypothesis. The claim is that the students have above
average IQ scores, so:
H1: μ > 100.
The fact that we are looking for scores “greater than” a certain point means that this
is a one-tailed test.

Step 3: Draw a picture to help you visualize the problem.


Step 4: State the alpha level. If you aren’t given an alpha level, use 5% (0.05).

Step 5: Find the rejection region area (given by your alpha level above) from
the z-table. An area of .05 is equal to a z-score of 1.645.

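Step 6: Find the test statistic using this formula:
z = (x̄ − μ0) / (σ/√n)
where x̄ is the sample mean, μ0 is the population mean under the null hypothesis, σ is
the population standard deviation and n is the sample size.
For this set of data: z = (112.5 − 100) / (15/√30) = 4.56.

Step 7: If the Step 6 value is greater than the Step 5 value, reject the null hypothesis. If
it’s less than the Step 5 value, you cannot reject the null hypothesis. In this case, it is
greater (4.56 > 1.645), so you can reject the null.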
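
The one-tailed z-test in this example can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the function name `one_sample_z` is made up for illustration and is not part of the original example.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return the z test statistic for a one-sample z-test."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))


alpha = 0.05
# Right-tailed critical value: the z-score with an area of alpha to its right.
z_critical = NormalDist().inv_cdf(1 - alpha)

# Numbers from the IQ example: sample mean 112.5, population mean 100,
# population standard deviation 15, sample size 30.
z = one_sample_z(sample_mean=112.5, pop_mean=100, pop_sd=15, n=30)

# Reject the null hypothesis when the test statistic exceeds the critical value.
reject_null = z > z_critical

print(f"z = {z:.2f}, critical value = {z_critical:.3f}, reject H0: {reject_null}")
```

The critical value comes from the inverse of the standard normal CDF, which plays the role of the z-table lookup.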

One Sample Hypothesis Testing Examples: #3
Blood glucose levels for obese patients have a mean of 100 with a standard deviation of
15. A researcher thinks that a diet high in raw cornstarch will have a positive or
negative effect on blood glucose levels. A sample of 30 patients who have tried the raw
cornstarch diet have a mean glucose level of 140. Test the hypothesis that the raw
cornstarch had an effect.

Step 1: State the null hypothesis: H0: μ = 100

Step 2: State the alternate hypothesis: H1: μ ≠ 100

Step 3: State your alpha level. We’ll use 0.05 for this example. As this is a two-tailed
test, split the alpha into two.
0.05/2 = 0.025

Step 4: Find the z-score associated with your alpha level. You’re looking for the area
in one tail only. The z-score for 0.975 (1 − 0.025 = 0.975) is 1.96. As this is a two-tailed
test, you would also be considering the left tail (z = −1.96).

Step 5: Find the test statistic using this formula:
z = (x̄ − μ0) / (σ/√n)
z = (140 − 100) / (15/√30) = 14.61.

Step 6: If Step 5 is less than −1.96 or greater than 1.96 (Step 4), reject the null
hypothesis. In this case, it is greater, so you can reject the null.
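
The two-tailed version can be sketched the same way. The snippet below is an illustration only, using Python's standard library; the variable names are not from the original example. It splits alpha across both tails and checks whether the test statistic (about 14.6 here) falls outside the two critical values.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
# Two-tailed test: put alpha/2 in each tail.
z_upper = NormalDist().inv_cdf(1 - alpha / 2)  # right-tail critical value
z_lower = -z_upper                             # left-tail critical value

# Numbers from the blood glucose example.
sample_mean, pop_mean, pop_sd, n = 140, 100, 15, 30
z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))

# Reject the null hypothesis if z lands in either tail.
reject_null = z < z_lower or z > z_upper

print(f"z = {z:.2f}, critical values = {z_lower:.2f} and {z_upper:.2f}, "
      f"reject H0: {reject_null}")
```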

*This process is made much easier if you use a TI-83 or Excel to calculate the z-score
(the “critical value”).

Hypothesis Testing Examples: Mean (Using TI 83)
You can use the TI 83 calculator for hypothesis testing, but the calculator won’t figure
out the null and alternate hypotheses; it’s up to you to read the question and enter
them into the calculator.

Sample problem: A sample of 200 people has a mean age of 21 with a population
standard deviation (σ) of 5. Test the hypothesis that the population mean is 18.9 at α
= 0.05.

Step 1: State the null hypothesis. In this case, the null hypothesis is that the population
mean is 18.9, so we write:
H0: μ = 18.9

Step 2: State the alternative hypothesis. We want to know if our sample, which has a
mean of 21 instead of 18.9, really is different from the population, therefore our
alternate hypothesis:
H1: μ ≠ 18.9

Step 3: Press Stat then press the right arrow twice to select TESTS.

Step 4: Press 1 to select 1:Z-Test…. Press ENTER.

Step 5: Use the right arrow to select Stats.

Step 6: Enter the data from the problem:
μ0: 18.9
σ: 5
x̄: 21
n: 200
μ: ≠μ0

Step 7: Arrow down to Calculate and press ENTER. The calculator shows the p-value:
p = 2.87 × 10⁻⁹

This is smaller than our alpha value of .05. That means we should reject the null
hypothesis.
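
If you don’t have a TI 83 to hand, the same two-tailed z-test can be approximated in Python. This is a sketch rather than a reproduction of the calculator’s internals: the function name `two_tailed_z_test` is hypothetical, and the p-value is computed with the complementary error function, so the last digits may differ slightly from the calculator’s display.

```python
from math import erfc, sqrt


def two_tailed_z_test(sample_mean, mu0, sigma, n):
    """Return (z, p_value) for a two-tailed one-sample z-test."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z. Using erfc avoids the
    # round-off you get from computing 1 - cdf for very small tail areas.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Inputs from the sample problem: x̄ = 21, μ0 = 18.9, σ = 5, n = 200.
z, p_value = two_tailed_z_test(sample_mean=21, mu0=18.9, sigma=5, n=200)

alpha = 0.05
print(f"z = {z:.2f}, p = {p_value:.2e}, reject H0: {p_value < alpha}")
```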
Bayesian Hypothesis Testing: What is it?

Image: Los Alamos National Lab.

Bayesian hypothesis testing helps to answer the question: Can the results from a test or
survey be repeated?
Why do we care if a test can be repeated? Let’s say twenty people in the same village
came down with leukemia. A group of researchers find that cell-phone towers are to
blame. However, a second study found that cell-phone towers had nothing to do with
the cancer cluster in the village. In fact, they found that the cancers were completely
random. If that sounds impossible, it actually can happen! Clusters of cancer can
happen simply by chance. There could be many reasons why the first study was faulty.
One of the main reasons could be that they just didn’t take into account that
sometimes things happen randomly and we just don’t know why.

P Values.
It’s good science to let people know if your study results are solid, or if they could have
happened by chance. The usual way of doing this is to test your results with a p-value.
A p-value is a number that you get by running a hypothesis test on your data: it is the
probability of seeing results at least as extreme as yours if the null hypothesis were true.
A p-value of 0.05 (5%) or less is usually taken as enough to call your results statistically
significant. However, there’s another way to test the strength of your results: Bayesian
hypothesis testing.

Bayesian Hypothesis Testing.


Traditional testing (the type you probably came across in elementary stats or AP stats)
is called Non-Bayesian. It is based on how often an outcome happens over repeated runs
of the experiment. It’s an objective view of whether an experiment is repeatable.
Bayesian hypothesis testing is a subjective view of the same thing. It takes into account
how much faith you have in your results. In other words, would you wager money on
the outcome of your experiment?

Differences Between Traditional and Bayesian Hypothesis Testing.
Traditional testing (non-Bayesian) reasons about what would happen if you repeated the
sampling over and over, while Bayesian testing does not. The main difference between the
two is in the first step of testing: stating a probability model. In Bayesian testing you add
prior knowledge to this step. It also requires use of a posterior probability, which is the
conditional probability given to a random event after all the evidence is considered.
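
To make the idea of prior and posterior probability concrete, here is a small sketch of Bayes’ rule applied to two competing hypotheses. The prior and likelihood numbers are made up purely for illustration, and the function name `posterior_probabilities` is hypothetical.

```python
def posterior_probabilities(priors, likelihoods):
    """Apply Bayes' rule: posterior is proportional to prior * likelihood."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalized.values())  # total probability of the data
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical numbers, chosen only for illustration.
priors = {"H0": 0.5, "H1": 0.5}          # belief before seeing the data
likelihoods = {"H0": 0.02, "H1": 0.30}   # probability of the data under each

posterior = posterior_probabilities(priors, likelihoods)
for hypothesis, probability in posterior.items():
    print(f"P({hypothesis} | data) = {probability:.4f}")
```

Changing the priors changes the posterior, which is exactly where the prior knowledge enters the calculation.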

Arguments for Bayesian Testing.


Many researchers think that it is a better alternative to traditional testing, because it:

1. Includes prior knowledge about the data.
2. Takes into account personal beliefs about the results.

Arguments against.
1. Including prior data or knowledge isn’t justifiable.
2. It is difficult to calculate compared to non-Bayesian testing.

Null Hypothesis Overview


The null hypothesis, H0, is the commonly accepted fact; it is the opposite of
the alternate hypothesis. Researchers work to reject, nullify or disprove the null
hypothesis. Researchers come up with an alternate hypothesis, one that they think
explains a phenomenon, and then work to reject the null hypothesis.

Why is it Called the “Null”?


The word “null” in this context means that it’s a commonly accepted fact that
researchers work to nullify. It doesn’t mean that the statement is null itself! (Perhaps
the term should be called the “nullifiable hypothesis” as that might cause less
confusion).

Why Do I need to Test it? Why not just prove an alternate one?
The short answer is, as a scientist, you are required to; it’s part of the scientific process.
Science uses a battery of processes to prove or disprove theories, making sure that any
new hypothesis has no flaws. Including both a null and an alternate hypothesis is one
safeguard to ensure your research isn’t flawed. Not including the null hypothesis in your
research is considered very bad practice by the scientific community. If you set out to
prove an alternate hypothesis without considering it, you are likely setting yourself up
for failure. At a minimum, your experiment will likely not be taken seriously.

Example
Not so long ago, people believed that the world was flat.

Null hypothesis, H0: The world is flat.


Alternate hypothesis: The world is round.
Several scientists, including Copernicus, set out to disprove the null hypothesis. This
eventually led to the rejection of the null and the acceptance of the alternate. Most
people accepted it; the ones that didn’t created the Flat Earth Society! What would
have happened if Copernicus had not disproved it and merely proved the alternate?
No one would have listened to him. In order to change people’s thinking, he first had to
prove that their thinking was wrong.

How to State the Null Hypothesis


How to State the Null Hypothesis from a Word Problem
You’ll be asked to convert a word problem into a hypothesis statement in statistics that
will include a null hypothesis and an alternate hypothesis. Breaking your problem into
a few small steps makes these problems much easier to handle.

How to State the Null Hypothesis


Example Problem: A researcher thinks that if knee surgery patients go to physical
therapy twice a week (instead of 3 times), their recovery period will be longer. The average
recovery time for knee surgery patients is 8.2 weeks.
Hypothesis testing is vital to test patient outcomes.

Step 1: Figure out the hypothesis from the problem. The hypothesis is usually hidden in
a word problem, and is sometimes a statement of what you expect to happen in the
experiment. The hypothesis in the above question is “I expect the average recovery
period to be greater than 8.2 weeks.”

Step 2: Convert the hypothesis to math. Remember that the average is sometimes
written as μ.

H1: μ > 8.2

Broken down into (somewhat) English, that’s H1 (The hypothesis): μ (the average) > (is
greater than) 8.2

Step 3: State what will happen if the hypothesis doesn’t come true. If the recovery
time isn’t greater than 8.2 weeks, there are only two possibilities, that the recovery
time is equal to 8.2 weeks or less than 8.2 weeks.

H0: μ ≤ 8.2

Broken down again into English, that’s H0 (The null hypothesis): μ (the average) ≤ (is
less than or equal to) 8.2

How to State the Null Hypothesis: Part Two
But what if the researcher doesn’t have any idea what will happen?
Sample Problem: A researcher is studying the effects of a radical exercise program on
knee surgery patients. There is a good chance the therapy will improve recovery time,
but there’s also the possibility it will make it worse. The average recovery time for knee
surgery patients is 8.2 weeks.

Step 1: State what will happen if the experiment doesn’t make any difference. That’s
the null hypothesis–that nothing will happen. In this experiment, if nothing happens,
then the recovery time will stay at 8.2 weeks.

H0: μ = 8.2

Broken down into English, that’s H0 (The null hypothesis): μ (the average) = (is equal to)
8.2

Step 2: Figure out the alternate hypothesis. The alternate hypothesis is the opposite of
the null hypothesis. In other words, what happens if our experiment makes a
difference?

H1: μ ≠ 8.2

In English again, that’s H1 (The alternate hypothesis): μ (the average) ≠ (is not equal
to) 8.2

That’s How to State the Null Hypothesis!

Independent Variable (Treatment Variable): Definition and Uses

Contents:

1. Independent Variable
2. Predictor Variable

1. Independent Variable Definition.


Independent variables are variables that stand on their own and aren’t affected by
anything that you, as a researcher, do. You have complete control over which
independent variables you choose. During an experiment, you usually choose
independent variables that you think will affect dependent variables, which are
variables that can be changed by outside factors. If a variable is classified as a control
variable, it may be thought to alter either the independent variable or dependent
variable but it isn’t the focus of the experiment.

Example: you want to know how calorie intake affects weight. Calorie intake is your
independent variable and weight is your dependent variable. You can choose the
calories given to participants, and you see how that independent variable affects the
weights. You may decide to include a control variable of age in your study to see if it
affects the outcome.

The above graph shows the independent variable of male or female plotted on the
x-axis. “Male” or “Female” is unchangeable by you, the researcher, or anything you can
perform in your experiment. On the other hand, the dependent variable of “mean
vocabulary scores” is potentially changed by which independent variable is assigned. In
other words, the mean vocabulary scores depend on the independent variable: whether
the participant is male or female.

Another way of looking at independent variables is that they cause something (or are
thought to cause something). In the above example, the independent variable is calorie
consumption. That’s thought to cause weight gain (or loss).

Independent Variables: Other Names and Uses.
Independent variables (inputs) are fed into your machine (i.e. your experiment) to see
what comes out. Source: UNM.EDU

Independent variables are also called the “inputs” for functions. They are traditionally
plotted on the x-axis of a graph. In statistics, an independent variable is also sometimes
called:

• A controlled variable.
• An explanatory variable.
• An exposure variable (in reliability theory).
• A feature (in machine learning and pattern recognition).
• An input variable.
• A manipulated variable.
• A predictor variable.
• A regressor (in regression analysis).
• A risk factor (in medical statistics).

What is a Predictor Variable?


Predictor variables are used in regression analysis.

A predictor variable has essentially the same meaning as an independent variable. It’s
plotted on the x-axis, and it affects a dependent variable. However, it’s not exactly the
same, as you use the term in very specific situations:
• In regression analysis, where the predictor variable is also called a regressor.
The other variable (comparable to the dependent variable) is called a criterion
variable.
• In non-experimental studies, where it is the presumed “cause.” For example,
scores on a math test indicate an aptitude for engineering. “Scores on the math
test” are the predictor variables and engineering aptitude is the criterion variable.

Types of Predictor Variable.


The two main types are:

1. Quantitative Predictors, which have a numerical value (e.g. 5.5, 800, 2K) for
categories like age, height, test scores or weight.
2. Qualitative Predictors, which do not have numerical values. Used for categories
like gender, socioeconomic status, political affiliation or geographic location.
A common workaround to working with qualitative predictors is to assign them to a
numerical class when performing correlational studies. For example, if you were
performing a study that was looking at the effect of sex and income, you might assign
the following classes:

• Woman(1).
• Man(2).
• Transgender woman(3).
• Transgender man(4).
When you only have two classes coded 0 or 1, it’s called a dummy variable. Dummy
variables can make it easier to understand the results from a regression analysis. Other
codings, like 2/3 or 8/9 can also be used (they just make the output more difficult to
comprehend).
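
As a rough illustration of dummy coding, the sketch below turns a two-class qualitative predictor into 0/1 codes. The group labels and the helper name `dummy_code` are hypothetical, chosen only to show the idea.

```python
def dummy_code(values, reference):
    """Code a two-class predictor as 0 (reference class) or 1 (other class)."""
    classes = set(values)
    if len(classes) > 2:
        raise ValueError("a single dummy variable can only code two classes")
    return [0 if value == reference else 1 for value in values]


# Hypothetical two-class predictor.
groups = ["placebo", "treatment", "treatment", "placebo"]
codes = dummy_code(groups, reference="placebo")

print(codes)  # [0, 1, 1, 0]
```

With 0/1 coding, a regression coefficient for the dummy variable can be read as the difference between the two classes, which is why other codings are harder to interpret.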

Multiple Predictor Variables


Some regression models can include dozens of predictor variables. That’s a model
that Professor David Dranove of the Kellogg School of Management calls the “kitchen
sink” regression method. It’s possible for thousands of potential predictor variables to
make up a data set, so care should be taken in choosing which ones you use for your
analysis. There are several reasons for this, one of which is the more variables you
throw into the mix, the weaker your model.
Some rules of thumb for choosing variables:

• Select a maximum of one predictor variable for every five observations, if your
predictive model is good.
• Use a maximum of one predictor variable for every ten observations if your
predictive model is weak, or if you have a slew of variables to choose from.
• If you have categorical variables, treat each included one as half of a normal
predictor.
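
These rules of thumb are easy to turn into a quick check. The sketch below is one possible reading of them, not a standard formula; the function names and the example counts are assumptions made for illustration.

```python
def max_predictors(n_observations, strong_model):
    """Upper limit on predictors: one per 5 observations for a strong
    model, one per 10 for a weak model (or a large pool of candidates)."""
    observations_per_predictor = 5 if strong_model else 10
    return n_observations // observations_per_predictor


def effective_predictor_count(n_numeric, n_categorical):
    """Count each categorical predictor as half of a normal predictor."""
    return n_numeric + 0.5 * n_categorical


# Hypothetical study with 100 observations.
n_obs = 100
limit_strong = max_predictors(n_obs, strong_model=True)
limit_weak = max_predictors(n_obs, strong_model=False)

# Hypothetical model with 6 numeric and 8 categorical predictors.
used = effective_predictor_count(n_numeric=6, n_categorical=8)
within_weak_limit = used <= limit_weak

print(f"limits: {limit_strong} (strong) / {limit_weak} (weak), using {used}")
```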

Levels of Independent Variable


While you might study one IV for a science fair project, it is more common to have
many levels of the same IV. You can think of a “level” as a sub type of the IV. For
example, you might be studying weight loss for three different diets: Atkins, Paleo, and
Vegan. The three diets are the three levels of Independent Variable. Or, you could have
an experiment where you are comparing two treatments: placebo and experimental. In
that case, you have two levels.

Dependent Variable: Definition and Examples



Contents:

1. Dependent Variable General Definition.
2. Dependent Variable Examples.
3. Test Your Understanding.
4. Dependent Variable Definition (Statistical Modeling).
5. Dependent Variables in Psychology.
6. Dependent Variables in Cross Tabs.
7. Other Names for the Dependent Variable.
8. Outcome Variable.

Dependent Variable Definition.


The dependent variable (DV) is just like the name sounds; it depends upon some factor
that you, the researcher, control. For example:
• How well you perform in a race depends on your training.
• How much you weigh depends on your diet.
• How much you earn depends upon the number of hours you work.
Whatever event you are expecting to change is always the dependent variable. In the
first example above race performance is the variable you would expect to change if you
changed your training, so that’s the dependent variable. In the second example, the
dependent variable is weight and in the third example the dependent variable is the
amount earned.

If you have trouble figuring out which of your variables is the independent one, and
which is the dependent one, try inserting the variables into the following sentence:

“(Independent variable) causes a change in (Dependent Variable) and it isn’t
possible that (Dependent Variable) could cause a change in (Independent
Variable).”
When you run an experiment (I’m using the word “experiment” here loosely…it could
be as simple as taking a survey or it could involve a complex scientific experiment),
your independent variable stays fixed. In the next graph, the independent
variable(IV) is the grade level and the dependent variable is the food rating. You can see
that the food rating depends on what grade a student is (it looks like the higher grade
levels have pickier eaters or perhaps students who choose their food more carefully).

Source: NIH.GOV.

Potential Confusion
You, the researcher, define your variables when you set up your experiment.
Your hypothesis statement is what determines whether a variable is dependent or
independent. Any variable can be an independent variable (IV) or dependent
variable(DV). For example, let’s say you are interested in studying the health benefits of
walking. You write the following two hypothesis statements:
1. A more nutritious diet leads to more daily walking.
2. More daily walking leads to increased happiness.
Both of the statements above are valid (assuming they correctly describe what you are
trying to test with your experiment). However, walking is the DV in statement 1 and
the IV in statement 2.

Example: The Brain as both Dependent and Independent Variables


Much research has been conducted in the past that treats the brain as an IV. For
example, the brain has a direct effect on behavior. However, more recent research has
shown that the brain can also be a DV. For example, biofeedback is a type of learned
behavior that helps you to control stress responses, like heart rate and muscle tension.
The behavior makes subtle (and possibly permanent) changes in the brain. With
biofeedback, the brain is the dependent variable, as it depends upon the behaviors
practiced during biofeedback sessions. Although this is another example of how
confusing the definition of an IV or DV can be, it also highlights how important it is to
craft a good hypothesis statement for your experiment. Remember: the outcome of
your experiment (i.e. your dependent variable) depends on how well you craft your
hypothesis statement!

Dependent Variable Examples.


Example 1: A study finds that reading levels are affected by whether a person is born
in the U.S. or in a foreign country. The IV is where the person was born and the DV is
their reading level. The reading level depends on where the person was born.

Example 2: “In nonexperimental research, where there is no experimental
manipulation, the IV is the variable that ‘logically’ has some effect on a DV. For example,
in the research on cigarette-smoking and lung cancer, cigarette-smoking, which has
already been done by many subjects, is the independent variable.” (Kerlinger, 1986,
p.32) Lung cancer “depends” on smoking.

Tip: If you have trouble figuring out which of your variables is the independent one, and
which is the dependent one, try inserting the variables into the following sentence:

“(Independent variable) causes a change in (Dependent Variable) and it isn’t
possible that (Dependent Variable) could cause a change in (Independent
Variable).”
Taking the two examples above, see how illogical it sounds to switch the places of the IV
and DV in the closing statement of each example:
1. Where a person is born depends on their reading level.
2. Smoking “depends” on lung cancer.
Like most things in life though, if only it was that easy. Sometimes, it doesn’t work just
to switch the phrase around to see if it works or not. Take the following two examples:

Example 3: A researcher studies how different drug doses affect the progression of a
disease and compares the intensity and frequency of symptoms when different doses
are given. The IV is the dose given and the DV is the intensity and frequency of
symptoms. The intensity and frequency of symptoms “depends” on the dose of drug
given.

Example 4: You are studying how tutoring affects SAT scores. Your independent
variable(IV) is tutoring and the dependent variable(DV) is test scores. The test scores
“depend” on the tutoring.

Switching them around also (sort of) makes sense:

• Dose of drug given depends on the intensity and frequency of symptoms.
• Tutoring depends on test scores.
That said, if you know what the hypothesis statement is–in other words, you know
what is being tested–then you can decide which of the two versions makes sense. This is
one reason why it’s vital to craft a very clear hypothesis statement.

Dependent Variable Definition (Statistical Modeling)
Statistical modeling is where you develop a model that fits a set of observed data. The
definition for the dependent variable(DV) in statistical modeling is essentially the same
basic definition as the one used in general math and science: it’s a variable that
“depends” on the independent variable(IV). However, instead of a hypothesis statement,
you have a model that contains both variables. The DV represents the model’s output or
outcome that you are studying. It is usually given the letter “y” and is traditionally
graphed on the y-axis. The IV represents the potential causes for variation in the model.
It is usually given the letter “x” and is graphed on the x-axis.
Polynomial regression results in a curved line. The dependent variable is graphed
on the y-axis.

The dependent variable is also called a response variable or endogenous variable in
statistical modeling.
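
As a small illustration of a model that contains both variables, the sketch below fits a straight line y = slope·x + intercept by ordinary least squares. The data points and the function name `fit_line` are made up for the example; a straight line is used here only because it is the simplest case.

```python
from statistics import mean


def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = fit_line(x, y)
print(f"y = {slope:.1f} * x + {intercept:.1f}")
```

Here the fitted model’s output (y) is the dependent variable, and x is the input that is thought to drive it.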
Test Your Understanding.


For each question, choose the dependent variable. A tip for completing this quiz is to first
choose the two main variables from the statement. Then figure out which one is the DV
(it’s the one that depends on the other one).

Q1: You are conducting an experiment to see if exposure to more sunlight increases
happiness levels for workers who typically spend the entire day in windowless offices.

1. Sunlight.
2. Happiness level.
3. Windowless offices.
4. Time of day.

Q2: An experiment in a climate-controlled greenhouse concludes that water level,
fertilizer and nutrient level in soil affect how tall plants grow. Plants grew an average
of 12″ taller if treated with optimal resources.

1. The greenhouse.
2. Water level, fertilizer and nutrient levels.
3. How tall the plants grow.
4. Optimal resources.

Q3: A researcher suspects that a cholera outbreak is happening because of tainted wells
in the city. Most of the cases are clustered around public wells that draw their water
from the underground aquifer.

1. The underground aquifer.
2. Cholera.
3. Wells.
4. The City.

Q4: Studies have shown that condom use is effective in controlling the spread of HIV.
However, studies also show that a combination of two HIV medications (tenofovir and
emtricitabine) can also control the spread of the disease.

1. Tenofovir.
2. Emtricitabine.
3. Both 1 and 2.
4. HIV.

Original map by John Snow showing the clusters of cholera cases in the London
epidemic of 1854
Solution to Q1:

Q1: The correct answer is 2, happiness level. Happiness levels depend upon the amount
of sunlight. If you try any of the other combinations, none make sense in the statement
“x depends on y.” For example, “sunlight depends on happiness” doesn’t make a whole
lot of sense. Plus, the clue was in the hypothesis statement itself (exposure to more
sunlight increases happiness).

Solution to Q2:

Q2: The correct answer is 3, how tall the plants grow (how tall the plants grow
depends on the resources used).

Solution to Q3:

Q3: The correct answer is 2, cholera. The cholera outbreak depends upon (i.e. is a result
of) the polluted water supply from the aquifer.

Solution to Q4:

Q4: The correct answer is 4, HIV. Controlling the spread of HIV depends upon condom
use and the medications listed.

Dependent Variables in Psychology.


“In psychology studies, the dependent variable is usually a measurement of some aspect
of the participants’ behavior. The IV is called independent because it is free to be varied
by the experimenter. The DV is called dependent because it is thought to depend (at
least in part) on the manipulations of the IV.” (Weiten, 2013)

Put another way, the dependent variable is the variable that is being measured by you,
the experimenter. In psychology, the DV is often a score of some type. For example, a
score on a memorization task, an IQ test, or a depression scale.

Multiple Dependent Variables.


It’s common in psychology to investigate multiple dependent variables at the same time.
Research can be a difficult process to set up–from gathering participants to obtaining
funding and permissions–so making your research as broad as possible has many
benefits. Researchers Simone Schnall and colleagues investigated how feeling disgust
affected the harshness level of people’s moral judgment. The harshness of moral
judgment was the DV, but several other DVs were measured, like how disgust affected
people’s willingness to dine at a restaurant.


Dependent Variables in Contingency Tables.

A contingency table is a way to summarize the relationship between several categorical
variables. The word “contingency” here means the same as “dependent,” so what the
table does is organize your dependent data. With contingency tables, the DV is usually
placed in rows and the IV is usually placed in columns.

A simple contingency table. Image: Michigan Dept. of Agriculture.

For example, let’s say you were investigating how health is affected by age,
socioeconomic status, or heart disease. The levels of an independent variable (e.g. age
0-18, 18-64, 65+) are placed in the columns. Health (perhaps measured on a scale from 1
to 10 with 10 being the best) is placed in the rows. Placing your data in this
standardized format makes it easier to interpret results.
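
As a minimal sketch of this layout, the snippet below counts observations into a table with the DV in rows and the IV in columns. The observations, the "good"/"poor" health labels and the function name `contingency_table` are hypothetical and chosen only for illustration; they do not come from the table pictured above.

```python
from collections import Counter

# Hypothetical observations: (age band = IV, health rating = DV).
observations = [
    ("0-18", "good"), ("0-18", "good"),
    ("18-64", "good"), ("18-64", "poor"),
    ("65+", "poor"), ("65+", "poor"), ("65+", "good"),
]

def contingency_table(pairs):
    """Count (IV, DV) pairs into a nested dict: rows are DV levels, columns are IV levels."""
    counts = Counter(pairs)
    iv_levels = sorted({iv for iv, _ in pairs})
    dv_levels = sorted({dv for _, dv in pairs})
    # Counter returns 0 for combinations that never occur.
    return {dv: {iv: counts[(iv, dv)] for iv in iv_levels} for dv in dv_levels}

table = contingency_table(observations)
for dv, row in table.items():
    print(dv, row)
```

Each printed row is one level of the dependent variable, and each key inside it is one column of the independent variable.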

Other Names for the Dependent Variable.

A dependent variable is also called:

 An experimental variable.
 An explained variable.
 A measured variable.
 An outcome variable.
 An output variable.
 A responding variable.
 A regressand (in regression analysis).
 A response variable.

Outcome variable.

What is an Outcome variable?


The terms outcome variable and dependent variable are often used synonymously.
However, they are not exactly the same: the outcome variable is defined as the presumed
effect in a non-experimental study, whereas the dependent variable is the presumed
effect in an experimental study [1].

Experimental vs. Non-experimental Studies.

In an experimental study, the researcher controls the allocation of resources to study
participants. A non-experimental study is more like an observational study; the
researcher takes a look at what the participants are exposed to and then categorizes
the individuals based on those exposures. Data registries and case studies are two
examples of non-experimental studies.

A simple example: let’s say you were interested in whether snack foods improved test
scores. In an experimental study you could separate students into two groups, feed one
group snacks while taking a test and deny the other group (the control group) access to
food. In the non-experimental case, you would find a group of students (say, in an
entire college) and separate the students into those who eat snacks during a test and
those who do not. You could then observe their performance on a test.

Measuring the Outcome Variable.


As outcome variables are involved in non-experimental studies, it’s practically
impossible to put a numeric value on an outcome. Instead, non-numeric techniques are
used [2]:

 Expert opinion.
 One or more case reports.
 Program evaluations. These are studies designed to see whether a program is
meeting its goals.
 Quality improvement methods (Plan-Do-Study-Act), used to measure or
redefine standards.
 Case control studies; performed after an event has happened. Data is gathered
and the researcher attempts to find the cause based on this historical data.
 Cohort studies: similar to case control but the participants are gathered before
any event has happened. For example, a group of 1,000 people age 40-50 might
be studied for 10 years to see who develops heart disease.
Hypothesis Testing
In statistics, a hypothesis has to be set and defined during a statistical survey or research study.
It is termed a statistical hypothesis and is an assumption about a population parameter. It is not
certain, however, that this hypothesis will always prove to be true. Hypothesis testing refers to the
predefined formal procedures that statisticians use to decide whether to accept or reject
hypotheses. It can be defined as the process of choosing between hypotheses about a particular
probability distribution on the basis of observed data.

Hypothesis testing is a core topic in statistics. In research, a hypothesis is a tentative but
important explanation of a phenomenon. The null hypothesis is the hypothesis that the researcher
sets out to challenge. Generally, the null hypothesis represents the current explanation or view of
a feature that the researcher is going to test. Hypothesis testing includes the tests used to
determine which outcomes would lead to rejection of the null hypothesis at a specified level of
significance. This helps to establish whether the results contain enough information to overturn
the conventional wisdom on which the null hypothesis is based.

Hypothesis testing is used in the context of a research study to evaluate and analyze the results
of that study. Let us learn more about this topic.

What is Hypothesis Testing?



Hypothesis testing is one of the most important concepts in statistics. A statistical hypothesis is an
assumption about a population parameter. This assumption may or may not be true. The
methodology employed by the analyst depends on the nature of the data used and the goals of the
analysis. The goal is to either reject or fail to reject (often loosely called "accept") the null
hypothesis.

Hypothesis Testing Terms


Given below are some of the terms used in hypothesis testing:

1. Test Statistic

The decision whether to accept or reject the null hypothesis is based on this value. The test
statistic is computed from a defined formula based on a distribution such as t, z or F. If the
calculated test statistic is less than the critical value (for a two-sided test, compare its absolute
value), we accept the hypothesis; otherwise, we reject it.

Hypothesis Testing Formula

The z test statistic is used for testing the mean of a large sample. The test statistic is given by

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population
standard deviation and $n$ is the sample size.
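
The formula can be sketched in a few lines of Python. This is only an illustration: the function name `z_statistic` and the numbers passed to it are made up for the example and are not taken from the text.

```python
import math

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """One-sample z statistic: (x_bar - mu) / (sigma / sqrt(n))."""
    if n <= 0 or pop_sd <= 0:
        raise ValueError("n and pop_sd must be positive")
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical values for illustration only.
z = z_statistic(sample_mean=52.0, pop_mean=50.0, pop_sd=10.0, n=25)
print(z)
```

Here the standard error is 10 / 5 = 2, so a sample mean two units above the hypothesized mean gives a z value of 1.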

2. Level of Significance

The level of significance is the probability of rejecting the null hypothesis when it is true, fixed
in advance as the threshold for the test. The level of significance is denoted by $\alpha$.

3. Critical Value

The critical value is the value that divides the range of the test statistic into two regions: the
acceptance region and the rejection region. If the computed test statistic falls in the rejection
region, we reject the hypothesis. Otherwise, we accept the hypothesis. The critical value depends
upon the level of significance and the alternative hypothesis.

4. One Sided or Two Sided Hypothesis

The alternative hypothesis is one sided if it states that the parameter is larger than, or that it is
smaller than, the null hypothesis value. It is two sided if it states only that the parameter is
different from the null hypothesis value. The null hypothesis is usually tested against an
alternative hypothesis (H1). The alternative hypothesis can take one of three forms:

1. H1: B1 > 1, a one-sided alternative hypothesis.
2. H1: B1 < 1, also a one-sided alternative hypothesis.
3. H1: B1 ≠ 1, a two-sided alternative hypothesis. That is, the true value is either greater or
less than 1.
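
To see how the critical value depends on both the level of significance and the form of the alternative hypothesis, here is a small sketch for a z test. It assumes a standard normal test statistic; the function name `critical_values` and the labels "greater", "less" and "two-sided" are illustrative choices, not terms from the text.

```python
from statistics import NormalDist

def critical_values(alpha, alternative):
    """Return (lower, upper) z cutoffs; None means no rejection region on that side."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # H1: parameter > null value
        return (None, std_normal.inv_cdf(1 - alpha))
    if alternative == "less":         # H1: parameter < null value
        return (std_normal.inv_cdf(alpha), None)
    if alternative == "two-sided":    # H1: parameter != null value
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

for alt in ("greater", "less", "two-sided"):
    print(alt, critical_values(0.05, alt))
```

With a level of significance of 0.05, the two-sided cutoffs come out close to ±1.96, while each one-sided cutoff is close to 1.645 in absolute value, because the whole of alpha sits in a single tail.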

5. P - Value

The P-value is the probability that the test statistic takes a value as extreme as, or more extreme
than, the one actually observed, assuming that the null hypothesis is true. In other words, it is the
probability of seeing the observed difference, or a greater one, just by chance if the null
hypothesis is true. The larger the P-value, the weaker the evidence against the null hypothesis.
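
As a rough sketch, the P-value for a z statistic could be computed as below, assuming the statistic follows a standard normal distribution when the null hypothesis is true. The function name `p_value` and the `alternative` labels are illustrative and not part of the original text.

```python
from statistics import NormalDist

def p_value(z, alternative="two-sided"):
    """Probability of a z value at least as extreme as the observed one under H0."""
    cdf = NormalDist().cdf  # standard normal cumulative distribution
    if alternative == "greater":
        return 1 - cdf(z)
    if alternative == "less":
        return cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

print(p_value(1.96))             # close to 0.05
print(p_value(0.5, "greater"))   # a larger P-value: weaker evidence against H0
```
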
Hypothesis Benefits and Process

Hypotheses give the following benefits:

1. They establish the focus and direction for a research effort.
2. Their development helps the researcher shape the purpose of the research activity.
3. They establish which variables will be measured in a study and which will not.
4. They require the researcher to provide an operational definition of the variables of
interest.

Process of Hypothesis Testing

1. State the hypotheses of interest
2. Select the appropriate test statistic
3. State the level of statistical significance
4. State the decision rule for rejecting / not rejecting the null hypothesis
5. Collect the data and complete the needed calculations
6. Choose to reject / not reject the null hypothesis

Errors in Research Testing:

It is possible to make two types of errors while drawing conclusions in research:

Type 1: When we accept the research hypothesis even though the null hypothesis is actually
correct.

Type 2: When we reject the research hypothesis even though the null hypothesis is actually incorrect.

Purpose of Hypothesis Testing



Hypothesis testing begins with a hypothesis made about a population parameter. Data are then
collected from an appropriate sample, and the information obtained from the sample is used to
decide how likely it is that the hypothesized population parameter is correct. The purpose of
hypothesis testing is not to question the computed value of the sample statistic but to make a
judgement about the difference between that sample statistic and a hypothesized population
parameter.

Hypothesis Testing Steps



The five steps of hypothesis testing can be illustrated in the context of testing a specified value
for a population proportion. The procedure for hypothesis testing is given below:

1. Set up a null hypothesis and an alternative hypothesis.
2. Decide on the test criterion to be used.
3. Calculate the test statistic using the given values from the sample.
4. Find the critical value at the required level of significance and, where the test requires
them, degrees of freedom.
5. Decide whether to accept or reject the hypothesis. If the calculated test statistic value is
less than the critical value, we accept the hypothesis; otherwise we reject the hypothesis.
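
The five steps can be sketched for a population proportion as follows. This is a minimal illustration that assumes a two-sided test and a sample large enough for the normal approximation, so no degrees of freedom are needed. The function name `proportion_z_test` and the sample figures are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-sided z test of H0: p = p0 against H1: p != p0 (normal approximation)."""
    # Steps 1-2: the hypotheses and the z criterion are fixed by the function itself.
    p_hat = successes / n
    # Step 3: test statistic, using the standard error implied by H0.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Step 4: critical value at the required level of significance.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: reject H0 when |z| reaches the critical value.
    reject = abs(z) >= z_crit
    return z, z_crit, reject

# Hypothetical sample: 60 successes out of 100, testing p0 = 0.5.
z, z_crit, reject = proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z, 3), round(z_crit, 3), reject)
```

In this made-up sample the statistic is about 2, which exceeds the critical value of about 1.96, so the null hypothesis would be rejected at the 0.05 level.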

Different Types of Hypothesis:


There are 5 different types of hypothesis as follows:

1) Simple Hypothesis

If a hypothesis specifies the population completely, that is, both its functional form and its
parameters, it is called a simple hypothesis.

Example:

The hypothesis “Population is normal with mean 15 and standard deviation 5” is a simple
hypothesis.

2) Composite Hypothesis or Multiple Hypothesis

If a hypothesis does not specify the population completely, leaving one or more parameters
undefined, it is called a composite hypothesis or multiple hypothesis.

Example:

The hypothesis “Population is normal with mean 15” is a composite or multiple hypothesis,
because the standard deviation is left unspecified.

3) Parametric Hypothesis

A hypothesis that specifies only the parameters of the probability density function is called a
parametric hypothesis.

Example:

The hypothesis “Mean of the population is 15” is a parametric hypothesis.

4) Non Parametric Hypothesis

If a hypothesis specifies only the form of the density function in the population, it is called a
non-parametric hypothesis.

Example:

The hypothesis “Population is normal” is non-parametric.

5) Null and Alternative Hypothesis

A null hypothesis can be defined as a statistical hypothesis that is stated for acceptance. It is
the original hypothesis. Any hypothesis other than the null hypothesis is called an alternative
hypothesis. When the null hypothesis is rejected, we accept the alternative hypothesis. The null
hypothesis is denoted by H0 and the alternative hypothesis is denoted by H1.

Example:

When we want to test whether the population mean is 30, the null hypothesis is “Population mean
is 30” and the alternative hypothesis is “Population mean is not 30”.
Logic of Hypothesis Testing

The logic of hypothesis testing is similar to the principle of "presumed innocent until proven
guilty". In hypothesis testing, we assume that the null hypothesis is true until the sample data
provide convincing evidence otherwise. A hypothesis test is a statistical method that uses sample
data to evaluate a hypothesis about a population.

The logic underlying the hypothesis testing procedure is as follows:

1. The hypothesis concerns the value of a population parameter.
2. Before selecting a sample, we use the hypothesis to predict the characteristics that the
sample should have.
3. Obtain a random sample from the population.
4. Finally, compare the obtained sample data with the prediction made from the hypothesis.
The hypothesis is reasonable if the sample mean is consistent with the prediction; otherwise
the hypothesis is judged to be wrong.

Type I Error and Type II Error



Rejecting the null hypothesis when it is true is called a Type I error, whereas accepting the null
hypothesis when it is false is called a Type II error. The probability of a Type I error is denoted
by $\alpha$ and the probability of a Type II error is denoted by $\beta$.

Example:

Suppose a toy manufacturer and its main supplier agreed that the quality of each shipment will
meet a particular benchmark. Our null hypothesis is that the quality is 90%. If we reject the
shipment when the quality is in fact 90% or greater, we have committed a Type I error. If we accept
the shipment when the quality is in fact less than 90%, we have committed a Type II error.

Power of the Test


Power of a test is defined as the probability that the test will reject the null hypothesis when the
alternative hypothesis is true.
For a fixed level of significance, if we increase the sample size, the probability of Type II error
decreases, which in turn increases the power. So to increase the power, the best method is to
increase the sample size.

1. Only one of the Type I error or the Type II error is possible at a time.
2. The power of a test is defined as 1 minus the probability of Type II error. Power = $1 - \beta$.
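
The link between sample size and power can be checked numerically. The sketch below assumes a one-sided (upper-tail) z test with known standard deviation. The function name `power_one_sided_z` and the means, standard deviation and sample sizes are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tail z test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)        # critical value for the test
    shift = (mu1 - mu0) * math.sqrt(n) / sigma     # true effect in standard-error units
    beta = std_normal.cdf(z_alpha - shift)         # probability of a Type II error
    return 1 - beta                                # power = 1 - beta

for n in (25, 100, 400):
    print(n, round(power_one_sided_z(mu0=50, mu1=52, sigma=10, n=n), 3))
```

With the level of significance held fixed, the printed power rises as n grows, which matches the statement that a larger sample reduces the probability of a Type II error.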

Hypothesis Testing Procedure



There are five important steps in the process of hypothesis testing:

Step 1: Identifying the null hypothesis and alternative hypothesis to be tested.

Step 2: Identifying the test criterion to be used


Step 3: Calculating the test criterion based on the values obtained from the sample

Step 4: Finding the critical value with required level of significance and degrees of freedom

Step 5: Concluding whether to accept or reject the null hypothesis.

Multiple Hypothesis Testing



The problem of multiple hypothesis testing arises when more than one hypothesis has to be
tested simultaneously for statistical significance. Multiple hypothesis testing occurs in a wide
variety of fields and for a variety of purposes.

Multiple hypothesis testing can alternatively be viewed as a multiple decision problem. When
considering multiple testing problems, the concern is with Type I errors when the hypotheses are
true and Type II errors when they are false. Procedures are evaluated on criteria involving the
balance between these errors.
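
To illustrate why Type I errors become a concern, the sketch below computes the chance of at least one false rejection across several tests. It assumes the tests are independent and that every null hypothesis is true, which is a simplification. The function name `familywise_error_rate` is illustrative.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one Type I error in m independent tests at level alpha."""
    if not 0 <= alpha <= 1 or m < 1:
        raise ValueError("alpha must be in [0, 1] and m must be at least 1")
    # Each test avoids a false rejection with probability (1 - alpha).
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20):
    print(m, round(familywise_error_rate(0.05, m), 3))
```

Under these assumptions, running 20 tests at the 0.05 level gives roughly a 64% chance of at least one false rejection, even though each individual test has only a 5% chance.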

Bayesian Hypothesis Testing



Bayesian hypothesis testing involves specifying a hypothesis and collecting evidence that supports
or does not support it. The amount of evidence can be used to specify the degree of belief in a
hypothesis in probabilistic terms. The probability of a hypothesis can become very high or very
low. Hypotheses with a high probability are accepted as true, and those with a low probability are
rejected as false.

Bayesian hypothesis testing works just like any other type of Bayesian inference. Let us consider
the case where we are considering only two hypotheses, $H_1$ and $H_2$.

The posterior probabilities $P(H_1 \mid \vec{x})$ and $P(H_2 \mid \vec{x})$ are

$$P(H_1 \mid \vec{x}) = \frac{P(\vec{x} \mid H_1)\,P(H_1)}{P(\vec{x})}$$

$$P(H_2 \mid \vec{x}) = 1 - P(H_1 \mid \vec{x})$$

The probability of our data $P(\vec{x})$ takes into account the possibility of each hypothesis under
consideration being true:

$$P(\vec{x}) = P(\vec{x} \mid H_1)\,P(H_1) + P(\vec{x} \mid H_2)\,P(H_2)$$
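
The two-hypothesis case can be sketched directly from these formulas. The function name `posterior_h1` and the likelihood and prior values are hypothetical; in practice the likelihoods would come from a model of the data under each hypothesis.

```python
def posterior_h1(likelihood_h1, likelihood_h2, prior_h1):
    """Posterior P(H1 | x) when H1 and H2 are the only two hypotheses."""
    prior_h2 = 1 - prior_h1
    # P(x) = P(x | H1) P(H1) + P(x | H2) P(H2)
    evidence = likelihood_h1 * prior_h1 + likelihood_h2 * prior_h2
    if evidence == 0:
        raise ValueError("the data have zero probability under both hypotheses")
    return likelihood_h1 * prior_h1 / evidence

# Hypothetical inputs: the data are four times as likely under H1, with equal priors.
p_h1 = posterior_h1(likelihood_h1=0.8, likelihood_h2=0.2, prior_h1=0.5)
p_h2 = 1 - p_h1
print(round(p_h1, 3), round(p_h2, 3))
```

With equal priors, the posterior for H1 here is 0.4 / (0.4 + 0.1) = 0.8, leaving 0.2 for H2.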

Level of Significance in Hypothesis Testing



Hypothesis testing follows this procedure:

 Specify the null and alternative hypotheses.
 Specify a value for $\alpha$.
 Collect the sample data and determine the weight of evidence for rejecting the null
hypothesis.

This weight, given in terms of probability, is called the observed level of significance (p-value)
of the statistical test. It is the probability of obtaining a value of the test statistic that is as
likely or more likely to reject $H_0$ as the actual observed value of the test statistic, assuming
that the null hypothesis is true.

If the p-value is small, then the sample data fail to support the null hypothesis and we reject
$H_0$. If the p-value is large, then we fail to reject the null hypothesis.

Hypothesis Testing Example



Given below are some of the examples on hypothesis testing.

Solved Example

Question: XYZ Company, with a very small turnover, is taking feedback on permanent
employees. During the feedback process, it was found that the average age of XYZ
employees is 20 years. The accuracy of this figure was checked by taking a random sample of
one hundred workers, whose average age turned out to be 19 years with a standard deviation
of 2 years. Should XYZ continue to make its claim, or should it make changes?
Solution:

1. Specify the hypothesis

$H_0$: $\mu = 20$ (twenty) years
$H_1$: $\mu \neq 20$ (twenty) years

2. State the Significance Level: Since the company would like to maintain its present
message to new human resources, XYZ selects a fairly weak significance
level ($\alpha$ = 0.05). Because this is a two-tailed analysis, half of the alpha will be
assigned to each tail of the distribution. In this condition the critical values are Z =
+1.96 and -1.96.

3. Specify the decision rule: If the calculated value of Z $\geq$ 1.96 or
Z $\leq$ -1.96, the null hypothesis will be rejected.
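
The text stops at the decision rule, so the remaining calculation can be sketched from the figures given in the question. This sketch assumes that the sample standard deviation of 2 years may stand in for the population standard deviation, since the sample of 100 is large. The function name `one_sample_z` is illustrative.

```python
import math

def one_sample_z(sample_mean, claimed_mean, sd, n):
    """z statistic comparing a sample mean with a claimed population mean."""
    return (sample_mean - claimed_mean) / (sd / math.sqrt(n))

# Figures from the question: claimed mean 20, sample mean 19, sd 2, n = 100.
z = one_sample_z(sample_mean=19, claimed_mean=20, sd=2, n=100)

# Decision rule from step 3: reject H0 if Z >= 1.96 or Z <= -1.96.
reject_null = z >= 1.96 or z <= -1.96
print(z, reject_null)
```

The standard error is 2 / 10 = 0.2, so z is about -5, which lies well inside the rejection region. Under these assumptions the sample does not support the claim that the average age is 20 years.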
