
NATIONAL INSTITUTE OF MENTAL HEALTH AND NEURO SCIENCES

(NIMHANS)

(Institute of National Importance)

CLASS PRESENTATION

Subject: Nursing Education

Topics: 1. Standardised and non-standardised tests

2. Criteria used in selecting or developing a test

3. Test construction

4. Rating scales and checklists

5. Objective structured clinical examination

Presented to,

Dr. Sailaxmi Gandhi

Associate Professor,

NIMHANS,

Presented by,

Limiya Varghese,

1st year M.Sc (Psy) nursing

NIMHANS

INTRODUCTION
Written tests are divided into two types: standardized and non-standardized. A non-standardized test is also known as a teacher-made test. According to Pearson Education's 2003 Assessment Report, standardized (standards-based) tests are developed by the state, must align with academic standards, and set performance levels that place each student into one of three categories: Basic, Proficient, and Advanced. Non-standardized and alternative assessment is the traditional form of assessment, in which teachers construct questions, evaluate student responses, assign and check homework, and informally assess student progress every day.

STANDARDIZED TEST

DEFINITION

A standardized test is a test that is administered and scored in a consistent, or "standard", manner. Standardized tests are designed so that the questions, conditions for administering, scoring procedures, and interpretations are consistent, and they are administered and scored in a predetermined, standard manner.

A standardized test is any form of test that (1) requires all test takers to answer the same questions, or a selection of questions from a common bank of questions, in the same way, and that (2) is scored in a standard or consistent manner, which makes it possible to compare the relative performance of individual students or groups of students.
TYPES OF STANDARDIZED TEST
Achievement Test
Aptitude Test
Personality Test
Intelligence Test
Prognostic Test
1) Achievement tests are designed to measure the knowledge and skills students learned in school or to determine the academic progress they have made over a period of time. The tests may also be used to evaluate the effectiveness of schools and teachers, or to identify the appropriate academic placement for a student, i.e., what courses or programs may be deemed most suitable, or what forms of academic support they may need. Achievement tests are backward-looking in that they measure how well students have learned what they were expected to learn.

2) Aptitude tests attempt to predict a student's ability to succeed in an intellectual or physical endeavour by, for example, evaluating mathematical ability, language proficiency, abstract reasoning, motor coordination, or musical talent. Aptitude tests are forward-looking in that they typically attempt to forecast or predict how well students will do in a future educational or career setting. Aptitude tests are often a source of debate, since many question their predictive accuracy and value.

3) A personality test is a questionnaire or other standardized instrument designed to reveal aspects of an individual's character or psychological makeup. Projective forms, such as sentence-completion tests, ask the subject to complete a sentence stem in his or her own words, revealing inner concepts of self, objects or environment. Examples: "I like ...", "My main problem is ...", "My mother always ...".

4) Intelligence test: the French psychologist Alfred Binet is regarded as the father of the intelligence-test construction movement. Intelligence tests may be classified as follows.
Based on the number of individuals to be tested:
1) Individual tests - one individual is tested at a time.
2) Group tests - a group of individuals is tested at a time.
Based on the form of the test:
1) Verbal tests - use language or paper and pencil.
2) Non-verbal tests - for example, performance tests.

5) A prognostic test is used to predict achievement in a specific subject area.

CHARACTERISTICS OF STANDARDIZED TEST

Constructed by a test expert or specialist.
Provides specific directions for administering and scoring the test.
Covers broad areas of objectives and content.
Items are selected carefully, ensuring the validity, reliability and usefulness of the test.
The procedure of administration is standardized.
The test has clear directions and is motivating and encouraging for students.
A scoring key is provided.
The test manual provides norms for the test.
The content and procedure are fixed and standard.
It provides a standardized frame of reference for determining individual performance.

ADVANTAGES OF STANDARDIZED TEST


Standardized tests are practical, they're easy to administer, and they consume
less time to administer versus other assessments.
Standardized testing results are quantifiable. By quantifying students'
achievements, educators can identify proficiency levels and more easily
identify students in need of remediation or advancement.
Standardized tests are scored via computer, which frees up time for the
educator.
Since scoring is completed by computer, it is objective and not subject to
educator bias or emotions.
Standardized testing allows educators to compare scores to students within the
same school and across schools. This information provides data on not only
the individual student's abilities but also on the school as a whole. Areas of
school-wide weaknesses and strengths are more easily identifiable.
Standardized testing provides a longitudinal report of student progress. Over
time, educators are able to see a trend of growth or decline and rapidly
respond to the student's educational needs.

DISADVANTAGES OF STANDARDIZED TEST


Standardized test items do not always parallel typical classroom skills and behaviours. Because questions must be generalizable to the entire population, most items assess general knowledge and understanding.
Since general knowledge is assessed, educators cannot use standardized test
results to inform their individual instruction methods. If recommendations are
made, educators may begin to 'teach to the test' as opposed to teaching what is
currently in the curriculum or based on the needs of their individual
classroom.
Standardized test items do not assess higher-level thinking skills.

Standardized test scores are greatly influenced by non-academic factors, such
as fatigue and attention.

PURPOSE OF STANDARDIZED TESTING

Standardized tests allow the comparison of test takers from different areas of the state, the
country and the world. What a test can do depends on whether it is well-designed for a
particular purpose. Well-designed tests can provide results that can be used in a variety of
meaningful ways, such as:

Purpose - Test results can be used to:

Licensure or certification: verify whether someone has the necessary knowledge and skills to be a qualified practitioner or to be given advanced standing in an occupation or profession.
Admissions: inform decisions about which people should be selected for entrance to an educational institution.
Placement: determine which courses or level of a course a student should take.
Employment: inform decisions on the hiring, placement and promotion of potential and current employees.
Curriculum-based end-of-course testing: determine whether students have mastered the objectives of the course taken.
Exit testing: find out whether students have learned the amount necessary to graduate from a level of education.
Policy tools: provide data to policymakers that help them make decisions regarding funding, class size, curriculum adjustments, teacher development and more.
Course credit: indicate whether a student should receive credit for a course he or she did not take, through demonstration of knowledge of the course content.
Accountability: hold various levels of the education system responsible for test results that indicate whether students have learned what they should have learned.

DESIGN AND SCORING OF STANDARDIZED TEST


In practice, standardized tests can be composed of multiple-choice and true-false questions. Such items can be scored inexpensively and quickly by scoring special answer sheets by computer or manually. Some tests also have short-answer or essay-writing components that are assigned a score by independent evaluators who use rubrics (rules or guidelines) and anchor papers (examples of papers for each possible score) to determine the grade to be given to a response. Most assessments, however, are not scored by people; people are used to score items that cannot easily be scored by computer (e.g. essays).
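As a simple illustration of machine scoring (the answer key and responses below are hypothetical, not drawn from any real test), each response sheet is compared item by item against a key:

# Hypothetical sketch of computer scoring: compare each response sheet to an answer key.
answer_key = ["A", "C", "B", "D", "A"]
responses = {
    "student_01": ["A", "C", "B", "D", "B"],
    "student_02": ["A", "B", "B", "D", "A"],
}
for student, answers in responses.items():
    score = sum(1 for given, correct in zip(answers, answer_key) if given == correct)
    print(f"{student}: {score}/{len(answer_key)}")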

SCORING ISSUES
There can be issues with human scoring, which is one reason for the preference given to computer scoring. Agreement between scorers can vary between 60 and 85 percent depending on the test and the scoring session. Sometimes two or more scorers read each paper to improve reliability, though this does not eliminate the possibility of the same response receiving different scores.
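A minimal sketch of how scorer agreement might be computed is shown below; the essay scores are purely hypothetical.

# Percent agreement between two scorers on the same set of essay responses (hypothetical data).
scorer_a = [4, 3, 5, 2, 4, 3, 5, 4]
scorer_b = [4, 3, 4, 2, 4, 2, 5, 4]

matches = sum(1 for a, b in zip(scorer_a, scorer_b) if a == b)
percent_agreement = 100 * matches / len(scorer_a)
print(f"Scorer agreement: {percent_agreement:.0f}%")   # 75% for this example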
SCORE
There are two types of standardized test score interpretation: norm-referenced score interpretation and criterion-referenced score interpretation. Norm-referenced score interpretations compare test takers to a sample of peers. Criterion-referenced score interpretations compare test takers to a criterion (a formal definition of content), regardless of the scores of other examinees. Norm-referenced score interpretations are associated with traditional education, which measures success by rank-ordering students using a variety of metrics, including grades and test scores, while standards-based assessments are based on the belief that all students can succeed if they are assessed against high standards which are required of all students regardless of ability or economic background.
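The contrast can be illustrated with a small sketch; the raw score, peer scores and 80% mastery criterion below are hypothetical values chosen only for illustration.

# Hypothetical illustration: the same raw score interpreted two ways.
peer_scores = [55, 62, 70, 48, 75, 66, 58, 80, 72, 64]   # norm group (hypothetical)
raw_score = 72
total_items = 100            # criterion-referenced: mastery defined here as 80% of the domain

# Norm-referenced interpretation: standing relative to peers.
percentile = 100 * sum(s <= raw_score for s in peer_scores) / len(peer_scores)

# Criterion-referenced interpretation: proportion of the defined content mastered.
percent_mastered = 100 * raw_score / total_items

print(f"Percentile rank among peers: {percentile:.0f}")
print(f"Content mastered: {percent_mastered:.0f}% -> criterion met: {percent_mastered >= 80}")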

NON- STANDARDIZED TEST


Non-standardized assessment looks at an individual's performance and does not produce scores that allow us to compare that performance to another's. It allows us to obtain specific information about the student, and this can be in different formats.

A non-standardized test is one that allows for an assessment of an individual's abilities or performance but does not allow for a fair comparison of one student to another. This type of test contrasts with standardized tests, such as state educational development tests and college entrance exams like the SAT and ACT.

Examples: any type of oral (verbal) examination is typically a non-standardized test. These are commonly used in thesis presentations or final project presentations in classes. Essay tests are typically non-standardized, though some standardized tests include assessments of writing ability. Various other quizzes or exams prepared by instructors, in which multiple responses are acceptable, are non-standardized tests.

Non Standardized Testing Is Informal Testing

Another term for non-standardized testing is informal testing. These tests are classroom tests and are usually developed by the teacher rather than by an outside group of testers. These classroom tests assess students' learning over a period of time or after a particular unit of study. A score of 80% on a multiple-choice test after reading a short story is a non-standardized score because it does not tell us how the student did in relation to his peers.

Criterion Referenced Measurement

This is also a form of non-standardized testing. The desired level of achievement is the criterion. Criterion-referenced measurements are used to measure learning for a particular student. This way the teacher can document whether or not learning is taking place. Evidence of learning, or its absence, is readily apparent, and the focus is on the performance of an individual student, as opposed to norm-referenced tests.

Domain-referenced tests are similar to criterion-referenced tests. Performance is measured against a well-defined body (domain) of knowledge or tasks, and the focus is on the individual.

Forms of Non-Standardized Testing

Forms include portfolios, interviews, informal questioning, group discussions, oral tests,
quick pop quizzes, and exhibitions of work, projects and performance exams.

With portfolios, the student gathers his work over a period of time, and the teacher evaluates the work against a scoring guideline. The student is encouraged to reflect on his work, which enhances the learning process. Performance exams are tests given to all students and are based on a student performing some task, such as writing an essay or giving an oral presentation. These tasks are created by the teachers who teach the students, and so the exams drive the curriculum. It makes more sense for those doing the teaching to create the tests.

Accountability and Non-Standardized Tests

Parents and the community have a right to know how students are doing; therefore, non-standardized tests need to show how well schools and students are doing. Teachers are constantly assessing their students, and by doing so they are constantly adjusting and changing their teaching to meet individual students' needs. There can still be accountability with non-standardized assessment that provides parents, local officials and state officials with the information they need. Teachers can be in constant touch with parents through the internet, by calling, by parent conferences, and by sending home progress reports and samples of work.

Success with Non-Standardized Testing

The key questions to ask of any kind of assessment are: "What is the purpose of this assessment?" and "Is this purpose meaningful and worthwhile?" If these questions are constantly referred to and addressed, then the assessment itself is important, and this helps teachers address what is important to learn. It is a kind of backwards design. Ultimately the goal is to help students learn, and to help them learn the information and the skills that are important.

Limitations of non-standardized tests


Tests are often ambiguous.
They are either too short or too lengthy.
Supervision is often not proper.
Answer books are not marked with care.

DIFFERENCE BETWEEN NON-STANDARDIZED AND STANDARDIZED TESTS

Learning outcomes and content measurement: non-standardized tests evaluate the outcomes and content of the local curriculum; standardized tests evaluate outcomes and content common to a number of colleges.
Quality of test items: the quality of non-standardized items is lower than that of standardized tests; the quality of standardized test items is high.
Reliability: usually unknown for non-standardized tests; high for standardized tests.
Administration and scoring: for non-standardized tests a uniform procedure is possible but usually flexible; for standardized tests the administration and scoring procedures are standardized.
Interpretation of scores: non-standardized scores can be compared and interpreted only in the context of the local situation; standardized scores are compared to norm groups.

CRITERIA USED IN SELECTING OR DEVELOPING A TEST

Various aspects of students' behaviour are evaluated in schools and colleges, such as the diagnosis of learning difficulties, achievement of desired behaviour as stated in the objectives, and appraisal of the personal and social development of students. Regardless of the area of behaviour being evaluated or the use to be made of the results, all of the various tests and procedures used for evaluation of a programme should possess certain common criteria. The most essential criteria include:

VALIDITY
RELIABILITY
OBJECTIVITY
USABILITY
VALIDITY

The validity of a test is the degree to which it measures what it is intended to measure. This is the most important quality of a test. Validity is a matter of degree rather than all-or-none: a test may be moderately valid or highly valid. Validity is always specific to the particular purpose and group for which the test is used. To be valid, a measuring instrument must be both relevant and reliable; a test cannot be valid unless it is reliable, although high reliability does not by itself guarantee validity.

Types of Validity

Several types of validity have been identified and used in educational and psychological measurement. They are:

Content validity
Criterion related validity.
Construct validity.
Face validity.
Formative validity.

1) Content Validity

Content validity may be defined as the extent to which a test measures a representative sample of the subject-matter content and the behavioural changes under consideration. The content of a course includes both the subject-matter content and the instructional objectives and behavioural changes expected as a result of learning. Content validity also means that the measurement includes the attitudes, interests, and personal and social adjustment of the student. The focus of content validity, then, is on the adequacy of the sample and not on the appearance of a test (face validity).

To make sure that content validity is obtained, Gronlund recommends certain procedures. They are:

a) The major subject matter content and behavioural changes expected are listed from
the formulated objectives.
b) These subject matter topics and types of behavioural changes expected are weighted
in terms of their relative importance.
c) Prepare a table of specification from the weighted content and behavioural changes.
d) Construct the achievement test in accordance with the table of specification. The
closer the test corresponds to the specification indicated in the table the higher the
degree of content validity.

2) Criterion Related Validity

Criterion-related validation may be defined as the process of determining the extent to which test performance is related to some other valued measure of performance. It indicates to what degree the subject's performance on the measurement tool and the subject's actual behaviour are related. The two forms of criterion-related validity are concurrent and predictive.

1. Concurrent validity: the relationship between scores on the measuring tool and a criterion available at the same time, in the present situation. Concurrent validity reflects the existing status of the individual rather than predicting a future outcome.
2. Predictive validity: the degree of correlation between the measure of a concept and some future measure of the same concept. Predictive validity is the extent to which a test can predict the future performance of students; such tests are used for classification and selection purposes, e.g. aptitude tests (a small computational sketch follows below).
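In practice a predictive validity coefficient is usually reported as the correlation between the test and the later criterion measure. The sketch below is illustrative only; the aptitude scores and course marks are hypothetical, and the correlation helper is an assumed routine, not part of any cited source.

# Hypothetical sketch: predictive validity as the correlation between an
# entrance aptitude test and later course performance.
from statistics import mean

aptitude = [52, 61, 70, 45, 80, 66]       # scores at admission (hypothetical)
course_marks = [58, 65, 74, 50, 85, 70]   # marks obtained later (hypothetical)

def pearson_r(x, y):
    # Pearson product-moment correlation coefficient.
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

print(f"Predictive validity coefficient: {pearson_r(aptitude, course_marks):.2f}")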

3) Construct Validity

Construct validity may be defined as the extent to which test performance can be interpreted in terms of certain psychological constructs. The process of construct validation involves identifying and classifying the factors which influence test scores so that test performance can be interpreted meaningfully.

4) Face Validity

Face validity ascertains that the measure appears to be assessing the intended construct. Stakeholders can easily assess face validity. Although this is not a very scientific type of validity, it may be essential in enlisting the motivation of stakeholders: if they do not believe the measure is an accurate assessment of the ability, they may become disengaged from the task.

5) Formative Validity

When applied to outcomes assessment, formative validity refers to how well a measure provides information that helps improve the program under study.

Factors Influencing Validity

1. Unclear directions result in low validity.
2. If reading vocabulary is poor, the student fails to reply to the test item even if he or she knows the answer.
3. Unclear or difficult sentences may be hard for the reader to comprehend and will affect the validity of the test.
4. Use of inappropriate items.
5. Medium of expression.
6. Extraneous factors such as style of expression, legibility, handwriting and length of the answer.
7. Inappropriate time limit.
8. If a student cannot understand a test item and guesses the response, this lowers the validity of the test item.

Measures to Improve Validity

Validity can be improved by:

1. Carefully matching the test with the learning objectives, content and teaching methods.
2. Increasing the sample of objectives and content areas included in any given test.
3. Using methods that are appropriate for the objectives specified.
4. Ensuring adequate security and supervision to avoid cheating in the examination.
5. Improving the reliability of the test.

RELIABILITY

Reliability is the term used to indicate the consistency with which a test measures what it is designed to measure. It should yield similar results when used on two separate occasions, provided the other variables remain constant; in other words, it is the consistency of evaluation results from one measurement to another. The three main attributes of a reliable scale are:

Stability: the ability to produce the same results with repeated testing.
Homogeneity: all the items in a tool measure the same concept or characteristic.
Equivalence: the tool produces the same results when equivalent or parallel instruments or procedures are used.

Methods of Estimating Reliability

The size of the reliability coefficient is related to the methods of estimating reliability in the
following way:

Split-half method: yields the largest reliability coefficients reported for a given test; the estimate can be inflated by factors such as speed. It is a sub-type of internal-consistency reliability. The process of obtaining split-half reliability begins by splitting in half all items of a test that are intended to probe the same area of knowledge (e.g. World War II) in order to form two sets of items. The entire test is administered to a group of individuals, the total score for each set is computed, and finally the split-half reliability is obtained by determining the correlation between the two total set scores (a small computational sketch follows this list).
Test-retest method: yields medium to large reliability coefficients for a given test; the coefficient becomes smaller as the time interval between tests is increased. It is a measure of reliability obtained by administering the same test twice, over a period of time, to a group of individuals. The scores from Time 1 and Time 2 can then be correlated in order to evaluate the stability of the test over time.
Equivalent-forms method (without time interval): yields medium to large reliability coefficients for a given test.
Equivalent-forms method (with time interval): yields the smallest reliability coefficients for a given test; the coefficient becomes smaller as the time interval between forms is increased.
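The sketch below illustrates the split-half estimate (with the Spearman-Brown correction to the full test length) and the test-retest estimate. All scores are hypothetical, and the correlation routine is an assumed helper rather than part of any cited source.

# Hypothetical sketch of two reliability estimates.
from statistics import mean

def pearson_r(x, y):
    # Pearson correlation between two lists of scores.
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Split-half: each student's total on the odd items vs the even items (hypothetical).
odd_half = [4, 3, 5, 2, 4, 5]
even_half = [3, 3, 4, 2, 5, 5]
r_half = pearson_r(odd_half, even_half)
# Spearman-Brown correction estimates the reliability of the full-length test.
split_half_reliability = 2 * r_half / (1 + r_half)

# Test-retest: total scores for the same students on two occasions (hypothetical).
time_1 = [34, 28, 40, 22, 37, 31]
time_2 = [33, 30, 39, 24, 36, 30]
test_retest_reliability = pearson_r(time_1, time_2)

print(f"Split-half reliability (corrected): {split_half_reliability:.2f}")
print(f"Test-retest reliability: {test_retest_reliability:.2f}")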

Types of Reliability

1. Scorer reliability
2. Content reliability
3. Temporal reliability

Scorer reliability: concerns the degree of agreement between two scorers of the same test answers (also called inter-scorer reliability). It also covers the degree of consistency with which the same scorer grades the same test answers on two different occasions.

Content reliability: deals with the ability of all the items of a test to measure competencies in the same general content area.

Temporal reliability: the third major dimension of reliability; as the name suggests, it concerns the stability of the results of a test over time. For a test to be reliable it must yield consistent results over time.

Factors Influencing Reliability

1. Data collection method.
2. Interval between testing occasions.
3. Test length: longer tests are more reliable.
4. Speed of the method.
5. Group homogeneity.
6. Difficulty of the items.
7. Ambiguous wording.
8. Inconsistency in test administration.
9. Objective scoring is more reliable than subjective scoring.

Measures to Improve Reliability

1. Limiting subjectivity of all kinds.
2. Ensuring that questions are clear.
3. Ensuring that expected answers are definite and objective.
4. Checking that the time limits are adequate.
5. Giving simple, clear and unambiguous instructions.
6. Keeping choice within a test paper to a minimum.
7. Conducting the test under identical and ideal examination conditions.
8. When using less reliable methods, increasing the number of questions, observations or the examination time.

Relationship between Validity and Reliability


A. Validity is truthfulness, while reliability is trustworthiness. Both are aspects of test efficiency.
B. A test cannot be valid unless it is reliable.
C. Validity includes reliability.

OBJECTIVITY

Objectivity in measurement helps to increase test validity and reliability. It is the extent to which independent and competent examiners agree on what constitutes a good answer for each of the items of a measuring instrument. Most standardized tests of aptitude and achievement are high in objectivity.

Advantages of Objectivity

Objective tests are more reliable.
They enable a more extensive survey of content in a given time than any other type of test.
The time element is reduced.
A greater number of items can be tested.
They increase the validity of the test.

Disadvantages of Objectivity

More time is required to prepare good objective questions.
They provide little or no opportunity to measure the student's ability to organise and express ideas.

USABILITY

It is important that a test is practical for its purpose. This implies such factors as the time taken to conduct the test, the cost of using it, and its practicability for everyday use. It includes practical considerations such as cost, mode of administration, scoring procedures, availability of the test, and ease of interpretation of test scores.

Errors of Measurement

There are two types of error involved in measurement: random (chance) error and systematic (constant) error. Reliability is concerned mainly with random error, while systematic error affects validity. The three components of a measurement score are the true score (T), the observed score (O) and the error score (E).
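In classical test theory these components are related by the simple equation O = T + E: the observed score equals the true score plus an error component. Random error makes E fluctuate unpredictably from one testing occasion to the next, whereas systematic error shifts the observed score in a constant direction.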

TEST CONSTRUCTION

There are four steps to constructing a test. These steps and procedures help us to produce a
valid, reliable and objective standardized test. The four main steps are:
1. Planning the Test

2. Preparing the Test

3. Try out the Test

4. Evaluating the Test

STEP-1: PLANNING THE TEST

Planning the test is the first important step in test construction. The main goal of the evaluation process is to collect valid, reliable and useful data about the student.

Therefore, before preparing any test, we must keep in mind:
(1) What is to be measured?

(2) What content areas should be included and

(3) What types of test items are to be included.

Therefore the first step includes three major considerations.

1. Determining the objectives of testing.

2. Preparing test specifications.

3. Selecting appropriate item types.

1. Determining the Objectives of Testing:


A test can be used for different purposes in the teaching-learning process. It can be used to measure entry performance, progress during the teaching-learning process, and the mastery level achieved by the students. Tests serve as a good instrument to measure the entry performance of the students: they answer the questions of whether the students have the requisite skills to enter the course and what previous knowledge the pupils possess. Therefore it must be decided whether the test will be used to measure entry performance or the previous knowledge acquired by the students on the subject.

Tests can also be used for formative evaluation. This helps to carry on the teaching-learning process, to find out immediate learning difficulties and to suggest remedies. Tests are also used to assign grades or to determine the mastery level of the students; these summative tests should cover the whole range of instructional objectives and content areas of the course. Attention must therefore be given to this aspect while preparing a test.

2. Preparing Test Specifications:


The second important step in test construction is to prepare the test specifications. In order to be sure that the test will measure a representative sample of the instructional objectives and content areas, we must prepare test specifications; an elaborate design is therefore necessary for test construction. One of the most commonly used devices for this purpose is the table of specification, or blueprint.

Preparation of the Table of Specification/Blueprint:

Preparation of the table of specification is the most important task in the planning stage. It acts as a guide for test construction. The table of specification, or blueprint, is a three-dimensional chart showing the list of instructional objectives, content areas and types of items in its dimensions.

It includes three major steps:

(i) Determining the weightage to different instructional objectives.

(ii) Determining the weightage to different content areas.

(iii) Determining the item types to be included.

(i) Determining the weightage to different instructional objectives:

There is a vast array of instructional objectives, and we cannot include all of them in a single test. In a written test we cannot measure the psychomotor and affective domains; we can only measure the cognitive domain. It is also true that different subjects do not contain learning objectives such as knowledge, understanding, application and skill in equal proportion. Therefore it must be planned how much weightage is to be given to the different instructional objectives. While deciding this we must keep in mind the importance of the particular objective for that subject or chapter.

(ii) Determining the weightage to different content areas:

The second step in preparing the table of specification is to outline the content area. It indicates the areas in which the students are expected to show their performance. It helps to obtain a representative sample of the whole content area. It also prevents repetition or omission of any unit.


(iii) Determining the item types:


The third important step in preparing the table of specification is to decide the appropriate item types. Items used in test construction can broadly be divided into two types: objective-type items and essay-type items. For some instructional purposes the objective-type items are most efficient, whereas for others the essay questions prove more satisfactory.

Appropriate item types should be selected according to the learning outcomes to be measured. For example, when the outcome is writing or naming, supply-type items are useful; if the outcome is identifying a correct answer, selection-type or recognition-type items are useful. The teacher must therefore decide on and select appropriate item types as per the learning outcomes.
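Bringing the three steps together, a simplified blueprint for a short unit test might look like the table below. The content areas, objective weights and mark allocations are hypothetical and are shown only to illustrate the layout.

Content area     Knowledge   Understanding   Application   Total marks
Unit I               4             6              2            12
Unit II              3             5              4            12
Total                7            11              6            24

Each cell (for example, Unit I / Understanding: 6 marks) is then filled with the planned number and type of items, such as six one-mark multiple-choice items or one six-mark short-essay question.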

STEP-2. PREPARING THE TEST:


After planning preparation is the next important step in the test construction. In this step the
test items are constructed in accordance with the table of specification. Each type of test item
need special care for construction.

The preparation stage includes the following three functions:


(i) Preparing test items.

(ii) Preparing instruction for the test.

(iii) Preparing the scoring key.

(i) Preparing the Test Items:


Preparation of test items is the most important task in the preparation step. Therefore care
must be taken in preparing a test item. The following principles help in preparing relevant
test items.

1. Test items must be appropriate for the learning outcome to be measured:

The test items should be designed so that they measure the performance described in the specific learning outcomes; that is, the test items must be in accordance with the performance described in the specific learning outcome.

2. Test items should measure all types of instructional objectives and the whole content area:
The items in the test should be prepared so that they cover all the instructional objectives (knowledge, understanding and thinking skills) and match the specific learning outcomes and subject-matter content being measured.

3. The test items should be free from ambiguity:

The item should be clear. Inappropriate vocabulary should be avoided. The items should be
so worded that all pupils understand the task.

4. The test items should be of appropriate difficulty level:

The test items should be of a proper difficulty level so that they can discriminate properly. If an item is meant for a criterion-referenced test, its difficulty level should be as indicated by the statement of the specific learning outcome. In a norm-referenced test the main purpose is to discriminate between pupils according to achievement, so the test should be designed to produce a wide spread of test scores. Therefore the items should not be so easy that everyone answers them correctly, nor so difficult that everyone fails to answer them. The items should be of average difficulty level.

5. The test item must be free from technical errors and irrelevant clues:
Sometimes there are unintentional clues in the statement of an item which help the pupil to answer correctly, for example grammatical inconsistencies, verbal associations, extreme words (ever, seldom, always) and mechanical features (the correct statement being longer than the incorrect ones). Therefore, while constructing a test item, care must be taken to avoid such clues.

6. Test items should be free from racial, ethnic and sexual bias:
The items should be universal in nature. Care must be taken to make items culture-fair. While portraying roles, all sections of society should be given equal importance. The terms used in a test item should have a universal meaning for all members of the group.

(ii) Preparing Instruction for the Test:


This is the most neglected aspect of test construction. Generally everybody gives attention to the construction of test items, and test makers often do not attach directions to the test items.

However, the validity and reliability of the test items depend to a great extent upon the instructions for the test. N.E. Gronlund has suggested that the test maker should provide clear-cut directions about:
a. The purpose of testing.

b. The time allowed for answering.

c. The basis for answering.

d. The procedure for recording answers.

e. The methods to deal with guessing.

Direction about the Purpose of Testing:


A written statement about the purpose of the testing maintains the uniformity of the test.
Therefore there must be a written instruction about the purpose of the test before the test
items.

Instruction about the time allowed for answering:


Clear-cut instructions must be given to the pupils about the time allowed for the whole test. It is also better to indicate the approximate time required for answering each item, especially in the case of essay-type questions. The test maker should therefore carefully judge the amount of time, taking into account the types of items, the age and ability of the students, and the nature of the learning outcomes expected. Experts are of the opinion that it is better to allow more time than to deprive a slower student of the chance to answer the questions.

Instructions about basis for answering:


The test maker should provide specific directions about the basis on which the students will answer the items. The directions must clearly state whether the students are to select the answer or supply the answer. In matching items, the basis for matching the premises and responses (e.g. states with capitals, or countries with products) should be given. Special directions are necessary for interpretive items. In essay-type items, clear directions must be given about the type of response expected from the pupils.

Instruction about recording answer:


Students should be instructed where and how to record their answers. Answers may be recorded on separate answer sheets or on the test paper itself. If they are to answer on the test paper itself, they must be directed whether to write the correct answer or to indicate the correct answer from among the alternatives. Where separate answer sheets are used, the direction may be given either on the test paper or on the answer sheet.

Instruction about guessing:

Directions must be provided to the students on whether or not they should guess on uncertain items in the case of recognition-type test items. If nothing is stated about guessing, bold students will guess on these items while others will answer only those items of which they are confident; the bold pupils will by chance answer some items correctly and secure higher scores. Therefore a direction should be given that permits considered guesses but discourages wild guessing.

(iii) Preparing the Scoring Key:


A scoring key increases the reliability of a test, so the test maker should provide the procedure for scoring the answer scripts. Directions must be given on whether the scoring will be done with a scoring key (when the answer is recorded on the test paper) or with a scoring stencil (when the answer is recorded on a separate answer sheet), and on how marks will be awarded to the test items.

In the case of essay-type items it should be indicated whether to score with the point method or with the rating method. In the point method, each answer is compared with a set of ideal answers in the scoring key, and a given number of points is then assigned.

In the rating method, the answers are rated on the basis of degrees of quality, which determine the credit assigned to each answer. Thus a scoring key helps to obtain consistent data about the pupils' performance. The test maker should therefore prepare a comprehensive scoring procedure along with the test items.

STEP-3. TRY OUT OF THE TEST:

Once the test is prepared, the next step is to confirm the validity, reliability and usability of the test. Try-out helps us to identify defective and ambiguous items, to determine the difficulty level of the test, and to determine the discriminating power of the items.

Try out involves two important functions:


(a) Administration of the test.
(b) Scoring the test.
(a) Administration of the test:
Administration means administering the prepared test to a sample of pupils. The effectiveness of the final form of the test depends upon fair administration. Gronlund and Linn have stated that the guiding principle in administering any classroom test is that all pupils must be given a fair chance to demonstrate their achievement of the learning outcomes being measured. This implies that the pupils must be provided with a congenial physical and psychological environment at the time of testing, and that any other factor that may affect the testing procedure should be controlled.

Physical environment means proper seating arrangements, proper light and ventilation, and adequate space for invigilation. Psychological environment refers to those aspects which influence the mental condition of the pupil; steps should therefore be taken to reduce the anxiety of the students. The test should not be administered just before or after a major occasion such as the annual sports or the annual drama.

One should follow the following principles during the test administration:
1. The teacher should talk as little as possible.

2. The teacher should not interrupt the students at the time of testing.

3. The teacher should not give any hints to any student who has asked about any item.

4. The teacher should provide proper invigilation in order to prevent the students from
cheating.

(b) Scoring the test:


Once the test is administered and the answer scripts are obtained, the next step is to score them. A scoring key may be used when the answer is recorded on the test paper itself; a scoring key is a sample answer script on which the correct answers are recorded.

When the answer is recorded on a separate answer sheet, a scoring stencil may be used for scoring the items. A scoring stencil is a sample answer sheet on which the correct alternatives have been punched; by placing the scoring stencil on the pupil's answer sheet, the correct answers can be marked. For essay-type items, separate instructions for scoring each learning objective may be provided.

Correction for guessing:


When pupils do not have sufficient time to answer the test, or are not ready to take the test, they tend to guess the correct answers on recognition-type items.

In that case, to eliminate the effect of guessing, a correction formula is applied.
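A commonly used correction for guessing on multiple-choice items is: corrected score = R - W / (n - 1), where R is the number of right answers, W is the number of wrong answers, and n is the number of options per item. For true-false items (n = 2) this reduces to R - W.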

STEP-4. EVALUATING THE TEST:
Evaluating the test is the final and a very important step in the test-construction process. Evaluation is necessary to determine the quality of the test and the quality of the responses. Quality of the test refers to how good and dependable the test is (its validity and reliability). Quality of the responses concerns which items do not fit well in the test. Evaluation also enables us to judge the usability of the test in the general classroom situation.

Evaluating the test involves following functions:


(a) Item analysis.

(b) Determining validity of the test.

(c) Determining reliability of the test.

(d) Determining usability of the test.


(a) Item analysis:
Item analysis is a procedure which helps us to find out the answers to the following
questions:
a. Whether the items function as intended?

b. Whether the test items have appropriate difficulty level?

c. Whether the item is free from irrelevant clues and other defects?

d. Whether the distracters in multiple choice type items are effective?

The item analysis data also helps us:


a. To provide a basis for efficient class discussion of the test result

b. To provide a basis for the remedial works

c. To increase skill in test construction

d. To improve class-room discussion.
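Two indices are commonly computed during item analysis: the difficulty index (the proportion of examinees answering the item correctly) and the discrimination index (how well the item separates high and low scorers). The sketch below uses hypothetical upper-group and lower-group responses for a single item.

# Hypothetical sketch of basic item analysis for one test item.
# 1 = answered correctly, 0 = answered incorrectly.
upper_group = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]   # top scorers on the whole test
lower_group = [1, 0, 0, 0, 1, 0, 1, 0, 0, 1]   # bottom scorers on the whole test

# Difficulty index: proportion of all examinees answering the item correctly.
difficulty = (sum(upper_group) + sum(lower_group)) / (len(upper_group) + len(lower_group))

# Discrimination index: difference between the proportions correct in the two groups.
discrimination = sum(upper_group) / len(upper_group) - sum(lower_group) / len(lower_group)

print(f"Difficulty index: {difficulty:.2f}")         # 0.60 for this example
print(f"Discrimination index: {discrimination:.2f}")  # 0.40 for this example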

(b) Determining Validity of the Test:


At the time of evaluation it is estimated to what extent the test measures what the test maker intends it to measure.

(c) Determining Reliability of the Test:


The evaluation process also estimates to what extent a test is consistent from one measurement to another; otherwise the results of the test cannot be dependable.

(d) Determining the Usability of the Test:


The try-out and the evaluation process indicate to what extent a test is usable in the general classroom condition, that is, how far the test is usable from the administration, scoring, time and economy points of view.

CHECKLIST

Checklists are lists of items or performance indicators requiring dichotomous responses such as satisfactory or unsatisfactory, pass or fail, yes or no, present or absent.

Gronlund (2005) describes a checklist as an inventory of measurable performance dimensions or products, with a place to record a simple yes-or-no judgement.

A checklist is a grouping of items by which something may be confirmed or verified; it can be called a behavioural inventory. It is basically a method of recording whether a particular attribute is present or absent, or whether an action has or has not taken place. It consists of a listing of the steps, activities or behaviours which the observer records as the individual performs them. The educational and instructional objectives should be kept in mind when preparing and using a checklist.

Characteristics of checklist

Observe one respondent at a time.
Clearly specify the characteristics of the behaviour to be observed.
Use only carefully prepared checklists when more complex traits are involved.
The observer should be trained in how to observe, what to observe, and how to record the observed behaviour.
Use a checklist only when you are interested in ascertaining whether a particular characteristic is present or absent.
Construction of Checklists

While constructing or preparing checklists, the following points should be kept in mind:

Express each item in clear, simple language.
Avoid lifting statements verbatim from the text.
Avoid negative statements wherever possible.
Review the items independently.
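Recording against a checklist is then a matter of marking each listed step as done or not done and, if desired, counting the steps completed. The sketch below is illustrative only; the hand-washing steps are hypothetical examples, not a prescribed procedure.

# Hypothetical sketch: recording a dichotomous checklist for a hand-washing procedure.
checklist = {
    "Removes jewellery and watch": True,
    "Wets hands before applying soap": True,
    "Lathers all surfaces for at least 20 seconds": False,
    "Rinses with fingers pointing downward": True,
    "Dries hands with a clean towel": True,
}

completed = sum(checklist.values())
print(f"Steps performed: {completed}/{len(checklist)}")
for step, done in checklist.items():
    print(f"  [{'Yes' if done else 'No '}] {step}")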
Utilization of Checklists
Use a checklist only when you are interested in ascertaining whether a particular trait or characteristic is present or absent.
Use only carefully prepared checklists for more complex kinds of traits.
Observe only one student at a time and confine your observation to the points specified in the checklist.
Have a separate checklist for each student.
The observer must be trained in how to observe, what to observe and how to record the observed behaviour. To make a valid judgement he should omit recording those behaviours for which he has insufficient information.
Checklists require the observer to judge whether certain behaviours of the student in clinical practice have taken place. They can be used most effectively when the components of clinical performance can be specified, so that it is possible for the observer simply to note whether the prescribed behaviour has taken place or not. The actual physical and psychomotor skills involved can then be precisely stated.

Merits of Checklists.
Short and easy to assess and record.
Useful for evaluation of specific, well-defined behaviours; commonly used in clinical and simulated laboratory settings.
They can be used for both process and procedure evaluation.
They are adaptable to most subject-matter areas.
They allow inter-individual comparisons to be made on a common set of traits or characteristics.

The checking process implies that standards and criteria are available for gauging the items. The inspection procedure requires scrutiny of the behaviour under investigation. Checklists are most useful for determining the status of tangible items, such as the inventory and maintenance of equipment and supplies. They have the advantage that the items to be observed can be determined in advance and the same criteria will be used in each situation. There is no guarantee, however, that the observed behaviour is a persistent one or that the procedure will provide a representative picture of the individual being evaluated.

It is recommended that only significant behaviours essential for a successful performance are included on the checklist.

Limitations
They do not indicate the quality of performance; only a limited component can be assessed.
Only the presence or absence of an attribute, behaviour or performance parameter can be assessed; the degree of accuracy of the performance cannot.
They have limited use in qualitative research studies.

RATING SCALE
Rating means the judgement of one person by another. A rating scale is a method by which we systematize the expression of opinion concerning a trait. The rating scale is a common evaluation tool used in describing observed skills and performance. More than merely noting the presence or absence of a behaviour, it locates the behaviour at a point on a continuum and involves judgements regarding quantitative and qualitative abilities. In simple words, it consists of a set of characteristics or qualities to be judged and a scale for indicating the degree to which each attribute is present.

A rating scale can be defined as a standardized method of recording the interpretation of behaviour, based entirely on observation and strictly in line with the educational objectives.

A rating scale is a device used to evaluate situations or characteristics that can occur or be present in varying degrees, rather than merely be present or absent. It is an instrument designed to facilitate the appraisal of a number of traits or characteristics by reference to a common quantitative scale of values.

Rating scales resemble checklists but are used when finer discriminations are required. Instead of merely indicating the presence or absence of a trait or characteristic, they enable us to indicate the degree to which a trait is present. A rating scale provides a systematic procedure for obtaining, recording and reporting the observer's judgements. It may be filled out while the observation is being made, immediately after the observation, or, as is often the case, long after the observation.

A rating scale consists of a set of characteristics or qualities to be judged and some type of scale for indicating the degree to which an attribute is present.

Types of Rating Scales

Various types of rating scales that are commonly used are:

Descriptive rating scales


Graphic rating scales
Numerical rating scales
Behaviourally anchored rating scale.

Descriptive Rating Scales

These rating scales use descriptive phrases to identify the points on a graphic scale. The descriptions are brief phrases that convey, in behavioural terms, how pupils behave at different steps along the scale for each trait. The rater selects the description most applicable to the person. A space for comments is also frequently provided to enable the rater to clarify the rating or to record behavioural incidents pertinent to the rating.

Graphic Rating Scales

The rater indicates the performer's standing in respect of each trait by placing a check mark at an appropriate point along a line. Each trait is followed by a horizontal line, and the rating is made by placing a tick on the line. A set of categories identifies specific positions along the line, but the rater can also mark between these points. The degrees of each characteristic are arranged so that the rater can make as fine a distinction as he wishes.

Numerical Rating Scales

In this type, the extent or degree to which a particular attribute is present in an individual is indicated by a number. The observer puts a tick or circle on the number corresponding to the degree to which the student possesses that attribute. Each number is given a verbal description that remains constant for a particular characteristic. It includes numbers against which a list of behaviours is evaluated. This is not a very reliable tool because of the inconsistent values attributed to the numbers; this can be partially overcome by adding a few descriptive terms. The numerical rating scale is useful when the characteristics or qualities to be rated can be classified into a limited number of categories and when there is general agreement about the category represented by each number.
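A small sketch of how numerical ratings might be recorded and summarized is given below; the traits and values are hypothetical.

# Hypothetical sketch of a numerical rating scale (1 = never ... 5 = always).
ratings = {
    "Communicates clearly with the patient": 4,
    "Maintains aseptic technique": 5,
    "Documents care accurately": 3,
}

average = sum(ratings.values()) / len(ratings)
for trait, value in ratings.items():
    print(f"{value}/5  {trait}")
print(f"Average rating across traits: {average:.1f}/5")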

Behaviourally Anchored Rating Scales(BARS)

BARS is an acronym for behaviourally anchored rating scales, sometimes known as BES, behavioural expectation scales. A BARS evaluates behaviour relevant to the specific demands of the job and provides examples of specific job behaviours corresponding to good, average and poor performance. This reduces the amount of personal judgement needed by the rater. Separate BARS are needed for each job. BOS is an acronym for behavioural observation scales; this system capitalizes on some of the strengths of BARS while avoiding some of its disadvantages. The evaluator lists a number of critical incidents for each performance dimension and rates the extent to which the behaviour has been observed, on a five-point scale ranging from "almost never" to "almost always".

Principles in preparing Rating Scales


Rating scales should directly relate to learning objectives.
They need to be confined to performance areas that can be observed.
Three to seven rating positions may need to be provided.
The scale may be made more objective, valid and reliable by getting a pooled rating
from more than one observer.
All rates should be oriented to the specific scale as well as the process of rating in
general.
As errors are common due to subjective judgement made by the observer, rater
should be conscious enough to avoid them.
A rating scale provides the instructor with a convenient form on which to record
judgements indicating the degree of student performance. This differs from a
checklist in that it allows for more discrimination in judging behaviours as compared
with dichotomous Yes or No options.

Characteristics of rating scale
Rating scales are value judgements about the attributes of one person by another person.
These scales are generally developed to make quantitative judgements about qualitative attributes.
A rating scale provides more flexibility to judge the level of performance or the presence of attributes among subjects.
Guilford (1954) identified the following basic characteristics, which must be taken care of while constructing a rating scale:
Clarity - the rating scale must be constructed using short, concise statements in simple and unambiguous language.
Relevance - the statements in the rating scale should be relevant to the phenomenon and exactly in accordance with the variable under study.
Variety - while developing a rating scale, monotony in the statements must be avoided and variety in the statements must be ensured.
Objectivity - statements in the rating scale must be objective in nature, so that it is convenient for the researcher to judge the attributes or performance of the subjects under study.
Uniqueness - each statement constructed in the rating scale must be unique in itself, so that attributes can be judged appropriately.
Advantages of Rating Scales
Rating scales are easy to administer and score.
They can be used for a large number of students.
They have a wide range of application.
They give a clear feedback to the students.
They are used to evaluate skills, product outcomes, activities, interests, attitudes and
personal characteristics.
It is used to observe and record qualitative and quantitative judgements about
observed performance.
They tend to be adaptable and flexible.
They are efficient and economical in the use of time.
They can help to reduce the subjectivity and unreliability that are usually associated
with the observation method.
It will direct observation toward specific aspects of behaviour.
It will provide a common frame of reference for comparing all pupils on the same set
of characteristics.
It will provide a convenient method for recording the observer's judgements.
Disadvantages of Rating Scales

A degree of subjectivity is inevitable when using rating scales.
Misuse can result in decreased objectivity.

OBJECTIVE STRUCTURED CLINICAL EXAMINATION


(OSCE)

The objective structured clinical examination (OSCE) is a modern type of examination often used in the health sciences (e.g. medicine, physical therapy, radiography, nursing, pharmacy, dentistry) to test clinical skill performance and competence in skills such as communication, clinical examination, medical procedures and prescription, exercise prescription, joint mobilisation/manipulation techniques, and interpretation of results.

OSCE Design

An OSCE usually comprises a circuit of short stations, in which each candidate is examined on a one-to-one basis by one or two impartial examiners with either real or simulated patients. Each station has a different examiner, as opposed to the traditional method of clinical examination where a candidate would be assigned to one examiner for the entire examination. Candidates rotate through the stations, completing all the stations on their circuit; in this way, all candidates take the same stations. It is considered an improvement over traditional examination methods because the stations can be standardised, enabling fairer peer comparison, and complex procedures can be assessed without endangering patients' health.

As the name suggests, an OSCE is designed to be:

Objective - all candidates are assessed using exactly the same stations with the same marking scheme. In an OSCE, candidates get marks for each step on the mark scheme that they perform correctly, which makes the assessment of clinical skills more objective rather than subjective, unlike traditional examinations in which one or two examiners decide whether the candidate passes or fails based on their subjective assessment of the candidate's skills.

Structured - stations in OSCEs have very specific tasks. Where simulated patients are used, detailed scripts are provided to ensure that the information they give is the same for all candidates, including the emotions the patient should display during the consultation. Instructions are carefully written to ensure that the candidate is given a very specific task to complete. The OSCE is carefully structured to include parts from all elements of the curriculum as well as a wide range of skills.

Clinical examination - the OSCE is designed to assess the application of clinical and theoretical knowledge. Where theoretical knowledge is required (for example, answering questions from the examiner at the end of the station), the questions are standardised and the candidate is asked only questions that are on the mark sheet.

OSCE Marking

Marking in OSCEs is done by the examiner. One of the ways an OSCE is made objective is by having a detailed mark scheme and a standard set of questions. For example, a station concerning the demonstration to a simulated patient of how to use a metered dose inhaler (MDI) would award points for specific actions performed safely and accurately. The examiner can often vary the marks depending on how well the candidate performed each step. If a simulated patient is used, they are often asked to add marks depending on the candidate's approach. At the end, the examiner is often asked to rate the candidate as pass/borderline/fail (or sometimes as excellent/good/pass/borderline/fail).

Many centres allocate each station an individual pass mark. The sum of the pass marks of all the stations determines the overall pass mark for the OSCE.
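A minimal sketch of that overall pass-mark calculation is shown below; the station names, pass marks and candidate scores are hypothetical.

# Hypothetical sketch: overall OSCE pass mark as the sum of station pass marks.
station_pass_marks = {"History taking": 12, "Inhaler technique": 10, "Wound dressing": 14}
candidate_scores = {"History taking": 15, "Inhaler technique": 9, "Wound dressing": 16}

overall_pass_mark = sum(station_pass_marks.values())
overall_score = sum(candidate_scores.values())
result = "pass" if overall_score >= overall_pass_mark else "fail"
print(f"Candidate total {overall_score} vs required {overall_pass_mark}: {result}")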

Preparation

Preparation for OSCEs is very different from preparing for an examination of theory. In an OSCE, clinical skills are tested rather than pure theoretical knowledge. It is essential to learn the correct clinical methods and then practise repeatedly until the methods are perfected. Marks are awarded for each step in the method; hence it is essential to dissect the method into its individual steps, learn the steps, and then learn to perform the steps in sequence. It is often very helpful to practise in small groups with colleagues, setting a typical OSCE scenario and timing it, with one person role-playing a patient, one person performing the task and (if possible) one person observing and commenting on technique, or even role-playing the examiner using a simple mark sheet.

How is the OSCE done? The exam day

Although OSCEs are conducted in many different settings, depending on the exam purpose, the organizing institution and the available facilities, they all share similar procedures. On the examination day, you will go through the following steps in sequence:

1. Registration:

The first step is registration. You will:

Show your examination invitation card and identification.
Be reminded about the exam rules.
Be checked for permitted and prohibited items.
Receive your exam envelope, which contains your ID badge, stickers, a pencil, and a notebook or clipboard (both with numbered blank papers), etc.

2. Orientation:

The next step is orientation. An orientation video may be shown. Here:

The exam format, procedures and policies will be reviewed.
You will be introduced to your team and team leader.
You will be instructed about your starting station and how to proceed.
Your questions will be answered (questions are not allowed beyond this step).

3. Escorting to the exam position:

Now it is exam time. You will be escorted to your station. You will wait by the assigned room door until a long bell/buzzer announces the start of the exam.

4. Station instruction time:

This is one or two minutes to read the instructions about the station's situation, patient and required tasks. Read carefully. At the next bell/buzzer, enter the room.

5. The encounter:

Start your encounter with the simulated patient (SP). This is a 5-20 minute encounter. Perform the required tasks. Stop at the next bell/buzzer.

6. Post-encounter period: next is a question period, and practice differs here. Some OSCEs have no post-encounter period. Some assign one or two minutes of the encounter period to oral questions asked by the examiner inside the exam room; no further communication with the SP is allowed. Others have written questions to be answered on paper or computer outside the exam room for 5-10 minutes. At the next long bell/buzzer the current station ends and the next station begins; you have to proceed to the next station quickly, as this is the same long bell/buzzer described in step 4.

7. Repeat Steps 4 to 6:

Steps 4 to 6 will be repeated until you have been in all the stations. Some OSCEs will offer
one or two short rest periods.

8. Exam end / escorting to the dismissal area: the exam is over.

You will be escorted back to the dismissal area for signing out. You will be asked to hand back everything you received at sign-in: the ID badge, remaining stickers, all the papers and the pencil. You may also be asked to remain without outside contact for some time (sometimes hours) for exam-security reasons.

CONCLUSION
Testing, measurement and evaluation play an important role in all educational institutions, including nursing educational institutions. Student achievement is profoundly influenced by the evaluation practices used by teachers in the classroom. The nurse educator is responsible for evaluating students in order to improve their classroom and clinical performance.

REFERENCES

1. Basheer SP. Textbook of Nursing Education. 1st ed. New Delhi: Emmess Medical Publishers; 2012. p. 219-225.
2. Sudha R. Textbook of Nursing Education. 1st ed. New Delhi: Jaypee Brothers Medical Publishers; 2013. p. 180-190.
3. Gronlund NE, Linn RL. Measurement and Evaluation in Teaching. New York: Macmillan; 1985. p. 395.
4. King EC. Affective Education in Nursing. Maryland: Aspen Publication; 1984. p. 80-84.
5. Elakkuvana Bhaskara Raj D, Bhaskar N. Textbook of Nursing Education. 2nd ed. Bangalore: Emmess Publishers; 2015.

