Ch4reliabilityvalidity 100226115436 Phpapp01

Reliability and Validity
Scales clarify the characteristics of measurement processes.
Scales indicate which statistical procedures are appropriate.
Nominal: categories without order. Examples: colors, gender, political party, nationality.
Ordinal: categories with order. Examples: size (S, M, L), social class, agreement (strong, some, low, none).
Interval: distance between categories is meaningful. Examples: temperature (in °F or °C), ACT scores, shoe size, IQ.
Ratio: scale has an absolute zero. Examples: age, income, all rates and percents, vacation time.
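To make the link between level of measurement and appropriate statistics concrete, here is a minimal sketch in Python. The pairings in the table are common textbook conventions, not taken from the slides, and the names `PERMISSIBLE_SUMMARIES` and `can_use` are hypothetical.

```python
# Illustrative mapping only: these pairings are common textbook conventions,
# not something stated in the slides.
PERMISSIBLE_SUMMARIES = {
    "nominal": ["mode", "frequency counts"],
    "ordinal": ["mode", "frequency counts", "median", "percentiles"],
    "interval": ["mode", "frequency counts", "median", "percentiles",
                 "mean", "standard deviation"],
    "ratio": ["mode", "frequency counts", "median", "percentiles",
              "mean", "standard deviation", "ratios"],
}


def can_use(level, statistic):
    """Return True if the statistic is listed for the given level of measurement."""
    return statistic in PERMISSIBLE_SUMMARIES[level]


print(can_use("ordinal", "median"))  # median needs only ordered categories
print(can_use("nominal", "mean"))    # a mean of category labels has no meaning
```

Each level keeps every summary allowed at the levels below it and adds new ones, which is why the lists grow from nominal to ratio.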
Levels of Measurement: Learn them by playing the game!
Does my measurement procedure give the same accurate measurement each time it is used?
Reliability is consistency in measurement.
Does this procedure or test yield the same results if you repeat the measurement, so long as conditions have not changed? How do we know?
Stern Tone Variator, from The Archives of the History of American Psychology
Validity is truth in measurement.
Does this procedure or test actually measure the construct or dimension it is intended to measure? How do we know?
Lavery Psychograph, from The Archives of the History of American Psychology
Reliable: the pattern shows that the shots hit the same part of the target each time; they are consistent. Not valid: the goal is to hit the center of the target, but the shots are not in that area.
Valid: the pattern is evenly distributed around the correct goal (the center), so the person probably tried to hit the correct place. Not reliable: the shots are off the mark in every possible direction; they are not consistent.
Not reliable: the shots are not tightly clustered together; they are not consistent. Not valid: to the extent there is any pattern, it is not at the true target, the center.
Reliable: the darts land close together. The red player can reliably hit the same part of the target. Valid: the darts are clustered at the center, where they were aimed.
Bullseye!! by modenadude at http://www.flickr.com/photos/modenadude/3280286776
All of our research uses data.

Data are gathered through measurement procedures.
The scores only have meaning if they measure what they are supposed to measure (valid) and do so with accuracy and consistency (reliable).

Evaluating whether data are reliable and valid is a key element in applying research findings.

We use statistical techniques in evaluating the reliability and validity of data.
Reliability
Does the value observed and recorded accurately reflect the true value of the object?
Test by measuring the object multiple times or ways.
Every researcher must either use a known instrument, or test and demonstrate the reliability of a new tool.
The literature search is a huge labor-saving device. Using a known instrument improves research quality.

Validity
Does the value observed and recorded reflect the concept and dimension of interest?
Test by comparing with other data or similar processes.
Every researcher must either use a known instrument, or test and demonstrate the validity of a new tool.
The literature search is a huge labor-saving device. Using a known instrument improves research quality.
Unreliable measurement tools introduce error.
Reliability improves with a new tool or method.
Test-retest is the simplest way to assess reliability.
It can be used whenever the process of measuring will not, by itself, affect the data.
Test with the correlation of the first scores with the second scores.
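As a minimal sketch of the test-retest check, the following Python computes the Pearson correlation between two administrations of the same measure. The scores are made-up illustration data, and `pearson_r` is a hypothetical helper written here so the example needs no external library.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length, at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Note: fails with ZeroDivisionError if either list has no variation.
    return sxy / sqrt(sxx * syy)


# Made-up scores for six people measured twice under unchanged conditions.
first_scores = [12, 15, 9, 20, 17, 11]
second_scores = [13, 14, 10, 19, 18, 10]

r = pearson_r(first_scores, second_scores)
print(f"test-retest correlation: {r:.2f}")
```

A value near 1 suggests that people keep roughly the same relative standing across the two administrations; what counts as high enough depends on the field and the purpose of the measure.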
Reliable measurements allow researchers to test their theories and hypotheses.
The more error in the data, whether random or systematic, the less likely they are to find true and significant results.
Researchers in psychology and human service fields spend months and years developing a set of questions or observations that has high reliability.
The meaning of questions is unclear or produces random answers.
Raters are not adequately trained in the rating method.
Some of the questions or items measure a subtly different dimension; they don't go with the others.
Instructions may be unclear or inconsistent, even if the test questions are fine.
Outside events may be having an effect.
The testing situation requires the use of different forms of the same tool.
Examples: school and licensing exams.
Learning occurs in testing, so the same test can't be repeated.
Test by computing the correlation of two or more forms, taken under the same circumstances.
Abstract or complex dimensions can't be measured directly.
Several related items or observations are more likely to get an accurate result.
Need to verify that all the items relate to the same dimension.
Cronbach's alpha (α) is commonly reported.
Interpret its strength on the same scale as a correlation.
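The slides name Cronbach's alpha without giving its formula. A minimal sketch, assuming the standard definition alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items, could look like this. The responses are made-up illustration data and `cronbach_alpha` is a hypothetical helper name.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of k items.

    Each item is a list of scores, one per respondent, in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Made-up responses: three items answered by five respondents.
items = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 5],
]

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Alpha rises when the items vary together, which is the sense in which it checks that all the items relate to the same dimension.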
The observation process is skilled and requires individual judgment.
Trained researchers make observations of the same subject independently.
Test with the correlation of ratings (interval or ratio) or percent agreement (nominal or ordinal).
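For nominal or ordinal ratings, percent agreement can be sketched in a few lines of Python. The ratings below are made-up illustration data and `percent_agreement` is a hypothetical helper name. Note that simple percent agreement does not correct for agreement that would occur by chance.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of cases on which two raters gave the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of the same length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Made-up independent ratings of the same five observations by two raters.
rater_a = ["belted", "belted", "unbelted", "belted", "unbelted"]
rater_b = ["belted", "unbelted", "unbelted", "belted", "unbelted"]

agreement = percent_agreement(rater_a, rater_b)
print(f"percent agreement: {agreement:.0f}%")
```

For interval or ratio ratings, the correlation between the two raters' scores would be used instead.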
Taking In the View by Randy Son of Robert at http://www.flickr.com/photos/randysonofrobert/2384256036/
The measurement process needs accuracy across all possible outcomes.
Ceiling effect: the process cannot measure extreme high scores.
Floor effect: the process cannot measure extreme low scores.
Both are scale attenuation problems.
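One rough way to screen for these effects is to check how many scores pile up at the ends of the scale. The sketch below is an assumption-laden illustration, not a method from the slides: `boundary_shares` is a hypothetical helper, the quiz scores are made up, and how large a share counts as a problem is a judgment call.

```python
def boundary_shares(scores, minimum, maximum):
    """Return (floor_share, ceiling_share): fractions of scores at the scale limits."""
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    floor_share = sum(s <= minimum for s in scores) / n
    ceiling_share = sum(s >= maximum for s in scores) / n
    return floor_share, ceiling_share


# Made-up scores on a 0-10 quiz that was too easy for this group.
quiz_scores = [10, 10, 9, 10, 8, 10, 7, 10]

floor_share, ceiling_share = boundary_shares(quiz_scores, minimum=0, maximum=10)
print(f"at floor: {floor_share:.0%}, at ceiling: {ceiling_share:.0%}")
```

When a large share of people sit at the maximum, the scale cannot distinguish among them, which is the attenuation the slide describes.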
The real world is messy.
It is not always clear how to sort out all the reliability questions.
Reliability issues are intertwined with validity.
Learn by seeing what competent researchers do.
math problems for girls by woodleywonderworks at http://www.flickr.com/photos/wwworks/3597217248/
W. Andrew Harrell describes his study
Two controversies are connected to Dr. Harrell's study:
Do physical traits such as beauty have an evolutionary impact (i.e., people are more likely to have children and pass along their genes)?
Isn't beauty a subjective judgment, not a trait that can be objectively measured in a research study?

University of Alberta (2005, April 13). Researchers Show Parents Give Unattractive Children Less Attention. ScienceDaily. Retrieved July 25, 2009, from http://www.sciencedaily.com/releases/2005/04/050412213412.htm
Dr. Harrell addresses both of those questions in this 4-minute audio clip.
Direct observation of seat belt safety.
Previously validated measures of beauty.
Two trained researchers evaluate attractiveness (inter-rater reliability).
Two different trained researchers observe safety (inter-rater reliability and avoiding bias).
All researchers should use identical instructions.
A larger number of items (questions) will provide a more stable measure of a complex dimension.
Some questions will be eliminated because they evoke varied and inconsistent responses.
The items need to cover the entire range of the dimension in order to observe the extreme values.
Reliability may change if observations are made in very different populations or situations (e.g., college students vs. seniors in assisted living).
Does my measurement procedure give a measurement of the construct or variable that I intend? Or is it measuring something else?
Validity is an over-arching concern of research.

Measurement: are the observations directly and truly linked to the dimension or concept claimed?
Research design: how well does the experiment or study control the situation so that we are confident that the relationships or results observed were due to the impact of the independent variable?
In this course, we consider validity in measurement now and validity in research design later.
Face validity
Content validity
Are all aspects of the dimension or concept covered?
Are any aspects over- or under-emphasized?
Does the measure differentiate this dimension from other similar ones?
Improving content validity
Thorough search of the literature.
Consult with experts who disagree with your perspective.
Manuscripts and checklists by Muffett at http://www.flickr.com/photos/calliope/173797447/
A measure is valid if it has a strong relationship to an external criterion.
A music audition is a valid measure if it selects the better players over those with less ability (concurrent validity).
The GRE is a valid measure if people who do well on the GRE succeed in graduate school (predictive validity).
It is often as hard to demonstrate the link between the test and the criterion as between the test and the dimension.
GRE flashcards by NEPMET at http://www.flickr.com/photos/blahman/2168064272/
The dimension to be measured is a construct, an abstract idea related to a group of interrelated variables.
The construct itself might be socially constructed:
Classic studies in obedience are being re-interpreted.
Some cultures lack a word or idea for schizophrenia.
Researchers make their case for validity, but must be open to reconsideration.
Milgram's obedience experiment
Incorrect theory produced this psychograph. It yielded mostly random data.

The procedure measures a dimension, but not the one intended by the researcher.

The procedure measures a dimension, but there is more than one interpretation of its meaning.
Lavery Psychograph, from The Archives of the History of American Psychology
Both reliability and validity are measured on a continuum, evaluated in terms of degrees.
If a measurement has very low reliability, it can't be valid because it is not even accurate.
The maximum possible validity coefficient is the square root of the reliability coefficient.
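The square-root ceiling can be illustrated with a short Python sketch. It assumes the classical test theory bound, in which the correlation between a measure and a criterion cannot exceed the square root of the product of their reliabilities; with a perfectly reliable criterion this reduces to the square root of the measure's reliability. The function name `max_validity` and the example values are hypothetical.

```python
from math import sqrt


def max_validity(reliability, criterion_reliability=1.0):
    """Upper bound on a validity coefficient under classical test theory.

    Assumes both arguments are reliability coefficients between 0 and 1.
    """
    if not (0 <= reliability <= 1 and 0 <= criterion_reliability <= 1):
        raise ValueError("reliabilities must be between 0 and 1")
    return sqrt(reliability * criterion_reliability)


# A test with reliability 0.81 against a perfectly reliable criterion.
print(round(max_validity(0.81), 2))

# The ceiling drops further when the criterion is itself measured with error.
print(round(max_validity(0.64, criterion_reliability=0.81), 2))
```

The bound is a ceiling, not an estimate: a highly reliable measure can still have low validity if it measures the wrong dimension.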
Especially in inferential statistics, when relationships are tested, the analyst must remember to examine the quality of measurement.
