
THEORY OF MEASUREMENT:

Everything You Wanted To Know About Classroom Assessment But Were Afraid To Ask

Alexander Beaujean and William Shiu


Baylor Psychometric Laboratory
http://homepages.baylor.edu/psychometric_lab

2008 Teaching Colloquy, Department of Religion

TABLE OF CONTENTS
Definitions
Test Design
Test Score Properties: Reliability and Validity
Cognitive Processes
Some Item Types
Developing the Test
Take Home Message


Definitions


DEFINITIONS

Test
(noun) Etymology:

Middle English, vessel in which metals were assayed [analyzed], potsherd, from Anglo-French test, tees pot, Latin testum earthen vessel; akin to Latin testa earthen pot, shell

Definition:
(1): a procedure, reaction, or reagent used to identify or characterize a substance or constituent
(2): something (as a series of questions or exercises) for measuring the skill, knowledge, intelligence, capacities, or aptitudes of an individual or group

(Test. (2008). In Merriam-Webster Online Dictionary. Retrieved September 26, 2008, from http://www.merriam-webster.com/dictionary/test)


DEFINITIONS

(Achievement) Test:

A collection of items or tasks used to measure an underlying construct of interest, the results (i.e., test scores) of which allow for decisions based on the construct's level


DEFINITIONS

Item:

Genesis is the first book of the Bible. T/F

Item stem: the statement ("Genesis is the first book of the Bible.")
Item response: the answer the examinee gives (T or F)


DEFINITIONS

Construct:
A trait/attribute/quality that is not operationally defined; a latent entity whose level and relationships with other objects (either latent or manifest) can only be inferred

Latent:

Extant, but not perceivable by bodily senses

Cronbach & Meehl (1955)


Test Design


TEST DESIGN

Test Philosophy:

What will (and will not) your test measure?

About what construct do you hope to make inferences?

What is required for your test to measure that construct?


TEST DESIGN

Context + Person Ability/Trait (Construct) → Cognitive Process(es) → Item Response


TEST DESIGN

Test Purpose: What information do you want to obtain from this test, and what decision(s) do you need to make from this information?


TEST DESIGN

Examinee Population:

For whom is this test intended?


TEST DESIGN

Constraints
Time to take the test
Platform: paper vs. computer (security/standardization)
Location

Administration
Entire group vs. subgroups vs. individual


Test Score Properties: Reliability and Validity


RELIABILITY

Reliability

Do the test scores measure their construct consistently?
Contributors to inconsistency:
Random error (varies from examinee to examinee)
Systematic error (consistent for all examinees)
Effects can be innocuous or severe, depending on the purpose of the test


RELIABILITY

Estimation
0 < reliability < 1
Published tests: .80-.95
Classroom tests: .50

Methods
Correlation between 2 administrations (of the same test)
Correlation among test items: internal consistency (e.g., coefficient α)

See Frisbie (1988)
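Coefficient α can be computed directly from an examinees-by-items score matrix: α = k/(k−1) · (1 − Σ item variances / total-score variance). A minimal sketch in Python (NumPy assumed available; the function name is ours):

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha from an examinees x items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                            # number of items
    item_vars = scores.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Example: 4 examinees x 3 dichotomous items
print(cronbach_alpha([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```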


RELIABILITY

Influences on Reliability Estimates

Length (quantified by the Spearman-Brown formula; see the sketch below)
Dimensionality: how many constructs is the test measuring?
Item difficulty
Item discrimination: how much more likely is a correct response from examinees high on the construct than from examinees low on it?
Heterogeneity of the examinees
Student factors (motivation, testwiseness)
Time allotment
Security
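Of these influences, length is the one with a simple closed form: the Spearman-Brown prophecy formula predicts the reliability of a lengthened (or shortened) test from the current estimate. A small sketch (function name ours):

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor,
    assuming the added items are parallel to the existing ones."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a .50-reliability classroom test predicts roughly .67:
print(round(spearman_brown(0.50, 2), 2))  # 0.67
```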


VALIDITY

Validity

Are the test scores measuring the intended construct?
Validity is an argument, for which you need multiple strands of evidence, e.g.:
Do the scores appear to measure the intended construct?
Do experts think they measure the intended construct?
Do they relate as expected to other measures, both ones that measure the same thing and ones that measure different things? (illustrated below)
Do they predict outcomes of interest?
Do the test's items have a basis in the curriculum?

See AERA/APA/NCME (1999)
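The convergent/discriminant strand can be checked with simple correlations. A toy sketch with simulated scores (all names and data invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
our_test = rng.normal(size=50)
same_construct = our_test + rng.normal(scale=0.5, size=50)   # convergent measure
other_construct = rng.normal(size=50)                        # discriminant measure

print(np.corrcoef(our_test, same_construct)[0, 1])    # should be high (~.9)
print(np.corrcoef(our_test, other_construct)[0, 1])   # should be near 0
```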



Cognitive Processes


ASSESSMENT PROCESS
Good Classroom Assessments Flow From the Class's Instructional Objectives/Learning Outcomes and Allow Inferences About the Construct of Interest


ASSESSMENT PROCESS

Learning Objectives → Cognitive Processes → Item Responses → Test Scores → Inference About the Construct


BLOOMS TAXONOMY

Bloom (1956): six levels, ordered from lowest to highest in development/difficulty:
Knowledge → Comprehension → Application → Analysis → Synthesis → Evaluation


BLOOMS TAXONOMY

Level 1-Knowledge
Recall information
Some item stems: recall, recite, list, label, define, identify, quote, who, what, when, where, tell, describe, relate, locate, write, find, state, name

Examples:
Define consubstantiation.
Who was Constantine?
When were the first Crusades?
List the five points of Calvinism.


BLOOMS TAXONOMY

Level 2-Comprehension

Understand information
Some item stems: demonstrate, explain, describe, interpret, summarize, cause-effect, outline, discuss, distinguish, restate, translate

Example:
Why did Paul write to the church at Philippi?
(a) To address the issue of rivals and uphold his apostleship
(b) To preserve the view of justification by faith
(c) To emphasize that under salvation by Christ, Jews and Gentiles are brought together


BLOOMS TAXONOMY

Level 3-Application
Use information
Some item stems: demonstrate, apply, calculate, illustrate, show, construct, interview, solve, use, complete, examine, classify

Example:

Translate the following into English:



BLOOMS TAXONOMY

Level 4- Analysis

Examine/break apart information
Some item stems: explain, connect, classify, categorize, compare, analyze, distinguish, examine, contrast, investigate, separate

Examples:
Compare Plato's Republic with Lenin's April Theses.
Which of the following names of God is most different from the other three: (a) JEHOVAH (b) ELOHIM (c) KURIOS (d) DESPOTES


BLOOMS TAXONOMY

Level 5- Synthesis
Create with information
Some item stems: combine, integrate, modify, hypothesize, abstract, create, design, invent, compose, predict, plan, imagine, propose, devise, formulate, conjecture

Example:

Conjecture about Stephen's response to Paul (né Saul), were they to have met after Paul's Roman imprisonment.


BLOOMS TAXONOMY

Level 6- Evaluation
Combine previous information and skills to make a judgment
Some item stems: judge, select, choose, decide, justify, debate, verify, argue, recommend, assess, discuss, rate, prioritize, determine

Example:

Appraise Calvin's Institutes in light of Oberman's The Dawn of the Reformation.
Who deserves precedence as the earliest Baptist church in North America: Roger Williams's Providence church or John Clarke's Newport church? Support your answer with scholarly sources.

Some Item Types


ASSESSMENT PROCESS

Learning Objectives → Cognitive Processes → Item Responses → Test Scores → Inference About the Construct


ITEM TYPE #1 TRUE-FALSE


Example: Augustine wrote The Confessions. T/F?

Pros:
Convenient to write
Easy to score
Allows flexibility in content coverage

Cons:
Limited in cognitive processes covered
Guessing
Student response sets


ITEM TYPE #1 TRUE-FALSE


Best Practice:

Make the statements as short and specific as possible
One idea per statement
Avoid trivial information
Use positive statements instead of negative, and always avoid double negatives
Do not use opinion statements unless they are attributed to someone
Length should not differ between true and false statements
Use approximately equal numbers of true and false statements


ITEM TYPE #2 MULTIPLE CHOICE RESPONSE


Example: Who is famous for his 95 Theses?


(a) Pope Leo X; (b) Martin Luther; (c) Johann Eck

Pros:
"Best answer" format is more flexible than unequivocal true/false
Allows different cognitive processes in item response
Guessing is less of a factor than with true/false
Easy to score

Cons:
Writing good distracters (wrong response alternatives) takes a large amount of time
Guessing is still possible


ITEM TYPE #2 MULTIPLE CHOICE RESPONSE


Best Practice:

Item stems should: (a) have autonomous meaning, (b) present as much of the item as possible, and (c) contain no irrelevant material
Avoid negative item stems
All item responses should be grammatically compatible with their stem and of approximately equal length
There should be only one correct/best answer
Distracters should be plausible
Avoid clues in the item stem
Avoid "none of the above" and "all of the above" response options

ITEM TYPE #3 MATCHING


Example: Match the philosopher with their work:

A. Plato        _A_ 1. The Socratic Dialogues
B. Aristotle    _C_ 2. None
C. Socrates     _B_ 3. Organon
D. Euclid       _D_ 4. The Elements
E. Zeno         _E_ 5. Reminiscences of Crates

Pros:
Can cover much material in the content domain
Easy to administer

Cons:
Limited in cognitive processes covered
Difficult to find homogeneous material
Difficult to develop a good, plausible set of responses

ITEM TYPE #3 MATCHING


Best Practice:
Use homogeneous material
Have unequal numbers of stems and responses
Place responses in numerical or alphabetical order
Explicitly state the basis for finding a match
Place all items/responses on the same page


ITEM TYPE #4 FILL IN THE BLANK


Example: Martin Buber edited the _______, a Zionist periodical. (Die Welt)

Pros:
Very minimal guessing
Easy to construct item stems

Cons:
Must be scored by hand, with the possibility of multiple correct responses
Assesses only factual knowledge


ITEM TYPE #4 FILL IN THE BLANK


Best Practice:
Make the item require a short, specific response
Do not take item stems directly from textbooks
Questions are better than incomplete statements
Right- or left-justify the item response blanks, and make them the same size for all items
Only one blank per item


ITEM TYPE #5A SHORT RESPONSE


Example: List the Beatitudes.

Pros:
Can measure complex learning objectives and cognitive processes
Minimizes cheating

Cons:
Scoring can be subjective
Limited sampling of content


ITEM TYPE #5B ESSAY


Example: Explain how Nietzsche's notion of the "will to power" is a response to Schopenhauer's "will to live."
(Your answer should be no longer than 2 pages and should cite scholarly sources. It will be evaluated on your analysis of the cited scholarship and the skill with which the essay is organized.)

Pros:

Can help students connect related ideas


Responding can (possibly) be a learning exercise itself

Can measure complex objectives & processes

Cons:
Relies on both writing skills and content familiarity
Scoring is subjective, which lowers score reliability
Limited sampling of content


ITEM TYPE #5 SHORT ANSWER/ESSAY


Best Practice:

Use only for learning outcomes that require non-objective assessment
Map the questions directly onto learning objectives
Inform respondents of the grading criteria (e.g., content knowledge, thought organization)
Make the examinee's writing task explicit
Estimate the time needed for an appropriate answer
Give all examinees the same (or equivalent) questions; avoid optional questions
Outline the expected answer in advance
Develop a rubric that allocates points in the desired manner before administering the exam

Developing the Test


TEST SPECIFICATIONS

Content Domain

How do topics within the content area relate to each other, and how does knowledge in the area build?

Cognitive Skills/Processes Required to Answer Each Item

Distribution of Content Areas and Cognitive Skill Demands throughout the Test


TEST SPECIFICATIONS

For Classroom Evaluations, You Want Your Tests to Map onto Your Instructional Objectives/Learning Outcomes

Instructional Objective/Learning Outcome:
I. Demonstrates skill in critical thinking
   A. Comprehends relevant antecedents to historical events

Corresponding Test Item:
Name three precipitating events to the First Crusade.


TABLE OF SPECIFICATIONS
Major Content Area (rows) by Instructional Objective (columns); cell entries are numbers of items:

                                               Knowledge  Comprehension  Application  Analysis  Synthesis  Total Items
Early Christian Writers in the West                2            3             2           1         2          10
Luther and the Beginning of the Reformation        3            2             3           3         4          15
Liberal Protestantism in Modernity                 3            2             2           2         1          10
Total Items                                        8            7             7           6         7          35

Row totals give each content area's weight; column totals give each instructional objective's weight.
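The content and objective weights are just the marginal totals of the blueprint divided by the test length. A minimal sketch using the counts from the table above (the dict layout is ours, not from the slides):

```python
blueprint = {  # content area -> items per instructional objective
    "Early Christian Writers in the West":
        {"Knowledge": 2, "Comprehension": 3, "Application": 2, "Analysis": 1, "Synthesis": 2},
    "Luther and the Beginning of the Reformation":
        {"Knowledge": 3, "Comprehension": 2, "Application": 3, "Analysis": 3, "Synthesis": 4},
    "Liberal Protestantism in Modernity":
        {"Knowledge": 3, "Comprehension": 2, "Application": 2, "Analysis": 2, "Synthesis": 1},
}

total = sum(sum(row.values()) for row in blueprint.values())  # 35 items

content_weight = {area: sum(row.values()) / total for area, row in blueprint.items()}
objective_weight = {}
for row in blueprint.values():
    for objective, n in row.items():
        objective_weight[objective] = objective_weight.get(objective, 0) + n / total

print(content_weight)    # Luther unit carries 15/35 ≈ 43% of the test
print(objective_weight)  # Knowledge carries 8/35 ≈ 23%
```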

TEST LENGTH
No correct length
Depends on:
Administration time
Examinees
Scores needed
Content coverage
Item types used
Desired reliability


TEST ORGANIZATION

Directions

Be explicit:
Give the time allowed to take the test
Give directions for responding
Give the point allocation (weighting) if it differs across items

Item Grouping

If there are different item types on the test:
Group items by content area only if needed
Put the same item types together
Within a type, place items in order from simpler to more complex


TEST/ITEM SCORING

Points to Consider
Allow for partial credit?
Should content areas be weighted equally?
Should learning objectives be weighted equally?
If a test is made of multiple subtests, is each graded autonomously or is the test graded as a whole?

e.g., if Jane misses all 10 of the Liberal Protestantism in Modernity questions but gets the other 25 items correct, can she still pass the test? (One possible rule is sketched below.)
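The Jane question comes down to whether you impose a per-area floor in addition to an overall cut score. A hypothetical sketch (the cut scores, names, and rule are invented for illustration, not a recommendation):

```python
def passes(scores_by_area, overall_cut=0.60, area_floor=0.30):
    """scores_by_area maps content area -> (points earned, points possible)."""
    earned = sum(e for e, m in scores_by_area.values())
    possible = sum(m for e, m in scores_by_area.values())
    if earned / possible < overall_cut:
        return False
    # Per-area floor: every content area must clear a minimum proportion.
    return all(e / m >= area_floor for e, m in scores_by_area.values())

jane = {"Early Christian Writers": (10, 10),
        "Luther and the Reformation": (15, 15),
        "Liberal Protestantism": (0, 10)}
print(passes(jane))  # False: 25/35 overall, but one area is at 0%
```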


TEST/ITEM ANALYSIS

A Multiple-Item Test Provides Much Information

Item difficulties (e.g., percent who passed the item)
Do they differ by content area?
Do they differ by instructional objective?

Distracters
Are high scorers endorsing a distracter more than the correct answer?

Discrimination
How well does an item discriminate high scorers from low scorers? (see the sketch below)

Are there omitted items or items not reached?
Is there a pattern in those items?

Reliability Calculations & Validity Evidence
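Difficulty and discrimination are easy to compute from a 0/1 response matrix. A minimal sketch (the discrimination index used here is the corrected item-total correlation; function name ours):

```python
import numpy as np

def item_analysis(responses):
    """responses: examinees x items array of 0/1 scores."""
    responses = np.asarray(responses, dtype=float)
    difficulty = responses.mean(axis=0)       # proportion who passed each item
    totals = responses.sum(axis=1)
    discrimination = np.empty(responses.shape[1])
    for j in range(responses.shape[1]):
        rest = totals - responses[:, j]       # total score excluding item j
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination

diff, disc = item_analysis([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]])
print(diff)   # [0.75, 0.5, 0.25]
print(disc)   # low or negative values flag items worth reviewing
```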



TEST/ITEM ANALYSIS

For More Information:


EDP 5340: Measurement/Evaluation
Chapter 13 of Hollis-Sawyer, Thornton, Hurd, & Condon (2008)
Chapter 14 of Linn & Miller (2005)
Chapter 6 of Urbina (2004)
LERTAP program [http://www.assess.com/]


Take Home Message


TAKE HOME MESSAGE


Be Mindful in Test Construction
Be Purposeful in Item Selection and Development


Questions?


REFERENCES
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA/APA/NCME]. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Bloom, B. S. (1956). Taxonomy of educational objectives, Handbook I: The cognitive domain. New York: David McKay.

Brennan, R. L. (Ed.). (2006). Educational measurement (4th ed.). Westport, CT: Praeger.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Frisbie, D. A. (1988). Reliability of scores from teacher-made tests. Educational Measurement: Issues and Practice, 7, 25-35. [free: http://www.ncme.org/pubs/items/ITEMS_Mod_3.pdf]


REFERENCES
Hollis-Sawyer, L., Thornton, G. C., Hurd, B., & Condon, M. E. (2008). Exercises in psychological testing (2nd ed.). Boston: Allyn & Bacon.

Linn, R. L., & Miller, M. D. (2005). Measurement and assessment in teaching (9th ed.). Upper Saddle River, NJ: Pearson.

Urbina, S. (2004). Essentials of psychological testing. Hoboken, NJ: John Wiley & Sons.

