Cut Score
A selected point on the score scale of a test. The cut score separates students into various categories, such as passing and failing. The process of setting a cut score is called standard setting, which is a multi-stage, judgmental process.
March 27, 2012
Nomenclature
Cutoff scores
Cutting scores
Cut points
Cutscores
Passing scores
Classification scores
Achievement levels
Mastery levels
Criterion levels
Performance levels
Performance standards
Standards

(ETS, 2005)
Why is there a need to set cut scores?
What benefits are expected from the use of cut scores?
What decisions will be made on the basis of the cut scores?
How are those decisions being made now in the absence of cut scores?
There's no objective way to set cut scores. Cut scores are constructed, not found. They are based on judgments about people, student work, or test items. Judges depend on subjective, internalized norms about what people can do.
Errors of Classification
Some people who deserve to pass will fail. Some people who deserve to fail will pass. Raising or lowering the cut score to reduce one type of error will increase the other type of error. There's no way to prove that a cut score is correct.
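The tradeoff described above can be illustrated with a small simulation. This is an illustrative sketch, not part of the original slides: the scores and "truly qualified" flags below are invented example data, and in a real testing program an examinee's true status is not directly observable.

```python
# Illustrative sketch (hypothetical data): moving the cut score trades
# one classification error for the other.

def error_counts(scores_with_status, cut):
    """Count false fails (qualified but below cut) and
    false passes (unqualified but at or above cut)."""
    false_fail = sum(1 for s, qualified in scores_with_status
                     if qualified and s < cut)
    false_pass = sum(1 for s, qualified in scores_with_status
                     if not qualified and s >= cut)
    return false_fail, false_pass

# (score, truly_qualified) pairs -- invented examinees
examinees = [(55, False), (60, False), (62, True), (65, False),
             (68, True), (70, True), (72, False), (75, True),
             (80, True), (85, True)]

for cut in (60, 65, 70, 75):
    ff, fp = error_counts(examinees, cut)
    print(f"cut={cut}: false fails={ff}, false passes={fp}")
```

As the cut rises from 60 to 75 on this toy data, false passes fall from 3 to 0 while false fails climb from 0 to 3, with no cut eliminating both.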
1. Select judges.
2. Teach judges about cut scores.
3. Define borderline knowledge/skills for a particular task/assessment.
4. Train judges in the use of the method.
5. Implement the cut score study.
6. Document the results.
Judges
Be qualified
Know subject and population
Be representative and diverse
Be acceptable to stakeholders
Be willing to follow procedures
Teach them about test purpose, cut score purpose, and consequences of passing and failing.
Defining Borderline
1. Make sure judges understand what the assessment measures and how scores will be used.
2. Ask judges to describe in their own words a person whose knowledge and skills would represent the borderline between acceptable and unacceptable levels of the knowledge/skills the assessment measures.
Examples of Borderlines
Worst reader who should get a diploma / best reader who should not
Worst surgeon who should be licensed / best surgeon who should not
Worst essay that deserves a score of 6 / best essay that deserves a 5
Highest bacteria count in safe water / lowest bacteria count in polluted water
Judge Training
Overview of the standard setting method: What is it? Why do it? How is it done?
Practice using the standard setting method.
Observe judges. Correct errors. Answer all questions.
Practice more until all judges are …
Selecting a Method
There's no one best method. Different methods will result in different cut scores. Some methods are better for certain types of assessments and assessment situations.
People
Small assessment situations where faculty have strong knowledge of students' abilities
Student work
Judges examine each question on an assessment and estimate the probability that a borderline student would answer the question correctly.
Or: Imagine a group of 100 borderline students and estimate how many of them would answer the question correctly.
Discuss probabilities. Aim for a range of 10–15 percentage points per question. Judges can change judgments based on discussion. Sum each judge's probabilities for all questions to get a recommended cut score for each judge. Average the judges' recommended cut scores to arrive at an average cut score for minimum competency.
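The Modified Angoff arithmetic just described is easy to sketch in code. The judges and their item probabilities below are hypothetical; the function simply sums each judge's item probabilities and then averages the judges' recommended cut scores.

```python
# Minimal sketch of the Modified Angoff arithmetic described above.
# All ratings are hypothetical: each inner list is one judge's estimated
# probability that a borderline student answers each question correctly.

def angoff_cut_score(ratings_by_judge):
    """Sum each judge's item probabilities, then average across judges."""
    per_judge = [sum(ratings) for ratings in ratings_by_judge]
    return sum(per_judge) / len(per_judge), per_judge

ratings = [
    [0.70, 0.55, 0.80, 0.40, 0.65],  # Judge 1
    [0.60, 0.50, 0.85, 0.45, 0.60],  # Judge 2
    [0.75, 0.60, 0.75, 0.35, 0.70],  # Judge 3
]

cut, per_judge = angoff_cut_score(ratings)
print(per_judge)      # each judge's recommended cut score (raw points)
print(round(cut, 2))  # averaged cut score for minimum competency
```

On this made-up panel the three judges recommend 3.10, 3.00, and 3.15 raw-score points, averaging to about 3.08 out of 5 items.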
[Table: example item-probability ratings for Judge 1]
Modified Angoff
Pros:
Most researched method
Stands up to court challenges
Does not require student data, so it can be carried out prior to test administration

Cons:
A lot of data entry
Difficult cognitive task
Does not work well with heavily open-ended/performance-based tests
Faculty consider all they know about a population of students.
Faculty predict individual students' level of performance on an assessment (e.g., expected passing/failing, proficient/nonproficient) without reference to scores.
Obtain assessment scores.
Another Example
Graph the scores of expected-passing and expected-failing students as two separate distributions. The point at which the two distributions intersect is the cut score.
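One rough way to code the idea above is to tally both score distributions and take the lowest score at which expected-pass students match or outnumber expected-fail students, a crude stand-in for the graphical intersection point. All data here are hypothetical, and real studies typically smooth the distributions before locating the crossing.

```python
# Rough sketch (hypothetical data) of the Contrasting Groups idea:
# find where the expected-pass distribution overtakes the
# expected-fail distribution.

from collections import Counter

def contrasting_groups_cut(fail_scores, pass_scores):
    """Return the lowest score at which expected-pass students match or
    outnumber expected-fail students -- a simple stand-in for the point
    where the two graphed distributions intersect. Assumes a single
    crossing; real studies smooth the distributions first."""
    fail_counts = Counter(fail_scores)
    pass_counts = Counter(pass_scores)
    for score in sorted(set(fail_scores) | set(pass_scores)):
        if pass_counts[score] >= fail_counts[score] and pass_counts[score] > 0:
            return score
    return None

expected_fail = [50, 55, 55, 60, 60, 60, 65, 65, 70]   # hypothetical
expected_pass = [60, 65, 65, 70, 70, 70, 75, 80, 85]   # hypothetical

print(contrasting_groups_cut(expected_fail, expected_pass))  # prints 65
```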
Contrasting Groups
Pros:
Uses real data, not conjecture
Uses external validation info in addition to test scores
Easy to explain

Cons:
Inconvenient to get judgments of people
Relies on human judgment without examining actual test performance
Need scores before cut is set
Pre-work:
Performance level categories, as well as performance level descriptors for each performance level, must be established and agreed upon. Scoring of a large sample of student work must occur before standard-setting can begin.
Select 40 to 50 intact samples of student work to represent the range of student performance on an assessment.
Range-Finding phase:
Identical sets of student work are provided to each judge. Judges are asked to independently categorize the student work samples based on the performance level descriptors, without any discussion.
This process reveals which work samples (e.g., Graduation Portfolios) generate the most agreement and which generate the most disagreement.
Pinpointing phase:
Judges examine sets of work about which they disagreed in the rangefinding phase, along with additional work samples representing those same score intervals. Judges assign performance levels to these work samples.
The minimum score for each performance level is precisely "pinpointed" by determining the score …
Body of Work
Pros:
Uses real data, not conjecture
More intuitive; panelists are not asked to imagine a hypothetical minimally competent examinee or to estimate the …

Cons:
Need scores before cut is set
Requires a lot of prior preparation, volumes of materials
Grueling work
Setting the Operational Cut Score
Only a legally authorized entity (e.g., policy makers) can authorize the use of a cut score. Once authorized, a study cut score becomes an operational cut score.
The Standards of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education specify that the following be included in the standard setting process:
Document how judges were selected, their qualifications and training, the procedures used, whether or not judges' ratings were independent, the level of …
Seek opinions on cut scores. Find out what happened to people who failed.
Is there evidence that any people who passed are really unqualified? What were the consequences of misclassification errors?
Comments? Questions?
Discussion Questions
How could you use standard setting in your setting? Which standard setting approach might you utilize? Why? What challenges do you anticipate in implementing this approach? How might you address these challenges?
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.
Cizek, G. J. (1996). An NCME instructional module on setting passing scores. Educational Measurement: Issues and Practice, 15(2).
Cizek, G. J., Bunch, M. B., & Koons, H. (Winter 2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4).
Horn, C., Ramos, M., Blumer, I., & Madaus, G. (2000). Cut scores: Results may vary. National Board on Educational Testing and Public Policy Monographs, 1(1). Chestnut Hill, MA: Boston College.
Livingston, S. A., & Zieky, M. J. (1982). Excerpts from Passing Scores: A Manual for Setting Standards of Performance on Educational and Occupational Tests. Princeton, NJ: Educational Testing Service.
Maine Department of Education. Measured measures: Technical considerations for developing a local assessment system. Augusta, ME.
Pitoniak, M. Considerations in Setting Performance Standards (Cutscores). Training session at the National Council on Measurement in Education conference, Montreal, Canada.