(NIMHANS)
CLASS PRESENTATION
3. Test construction
Presented to,
Associate Professor,
NIMHANS,
Presented by,
Limiya Varghese,
NIMHANS
INTRODUCTION
Written tests are divided into two types: standardized and non-standardized. A
non-standardized test is also known as a teacher-made test. According to Pearson
Education's 2003 Assessment Report, standards-based (standardized) tests are
developed by the state, must align with academic standards, and set performance
levels that place each student into one of three categories: Basic, Proficient,
and Advanced. Non-standardized and alternative assessment is the traditional form
of assessment, in which teachers construct questions, evaluate student responses,
assign and check homework, and informally assess student progress every day.
Standardized test
DEFINITION
identify the appropriate academic placement for a student, i.e., what courses or
programs may be deemed most suitable, or what forms of academic support they may
need. Achievement tests are backward-looking in that they measure how well
students have learned what they were expected to learn.
4) Intelligence test: French psychologist Alfred Binet is regarded as the father
of the intelligence-test construction movement.
Based on the number of individuals to be tested
1) Individual tests: one individual is tested at a time.
2) Group tests: a group of individuals is tested at a time.
Based on the form of the test
1) Verbal tests: use language, or paper and pencil.
2) Non-verbal tests: for example, performance tests.
The procedure of administration is standardized.
The test has clear directions and should motivate and encourage students.
A scoring key is provided.
The test manual provides norms for the test.
The conditions of administration are fixed.
Specific directions are given for administering and scoring the test.
It consists of standard content and procedure.
It provides a standardized frame of reference for determining individual
performance.
Standardized test scores are greatly influenced by non-academic factors, such
as fatigue and attention.
Standardized tests allow the comparison of test takers from different areas of the state, the
country and the world. What a test can do depends on whether it is well-designed for a
particular purpose. Well-designed tests can provide results that can be used in a variety of
meaningful ways, such as:
papers (examples of papers for each possible score) to determine the grade to be
given to a response. Most assessments, however, are not scored by people; people
are used to score only those items that cannot easily be scored by computer
(e.g., essays).
SCORING ISSUES
There can be issues with human scoring, which is one reason for the preference
given to computer scoring. Agreement between scorers can vary between 60 and 85
percent, depending on the test and the scoring session. Sometimes two or more
scorers read each paper to improve reliability, though this does not eliminate
the same test response receiving different scores.
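As a rough illustration of how an agreement figure such as the 60-85 percent quoted above is computed, the sketch below calculates the exact-agreement rate between two scorers; the scores are invented for illustration.

```python
# Hypothetical scores assigned by two independent scorers to ten essays.
scorer_a = [4, 3, 5, 2, 4, 3, 5, 4, 2, 3]
scorer_b = [4, 3, 4, 2, 4, 3, 5, 3, 2, 3]

# Exact-agreement rate: the fraction of papers given identical scores.
agreement = sum(a == b for a, b in zip(scorer_a, scorer_b)) / len(scorer_a)
print(f"Exact agreement: {agreement:.0%}")  # 80% here, within the quoted range
```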
SCORE
There are two types of standardized test score interpretation: norm-referenced
score interpretation and criterion-referenced score interpretation.
Norm-referenced score interpretations compare test takers to a sample of peers.
Criterion-referenced score interpretations compare test takers to a criterion (a
formal definition of content), regardless of the scores of other examinees.
Norm-referenced score interpretations are associated with traditional education,
which measures success by rank-ordering students using a variety of metrics,
including grades and test scores, while standards-based assessments rest on the
belief that all students can succeed if they are assessed against high standards
required of all students regardless of ability or economic background.
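The two interpretations can be contrasted on the same raw score. A minimal sketch, with an invented norm group and an assumed mastery cut score:

```python
# Norm-referenced vs criterion-referenced interpretation of one raw score.
peer_scores = [55, 62, 68, 70, 71, 74, 78, 80, 85, 90]  # hypothetical norm group
raw_score = 74
cut_score = 70  # assumed criterion (mastery standard)

# Norm-referenced: where does the examinee stand relative to peers?
percentile = sum(s < raw_score for s in peer_scores) / len(peer_scores) * 100

# Criterion-referenced: has the examinee met the fixed standard?
mastery = raw_score >= cut_score

print(f"Percentile rank: {percentile:.0f}")  # 50
print(f"Meets criterion: {mastery}")         # True
```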
Examples: any type of verbal exam is typically a non-standardized test. These are
commonly used in thesis presentations or final project presentations in classes.
Essay tests are typically non-standardized, though some standardized tests
include assessments of writing ability. Various other types of quizzes or exams
prepared by instructors, in which multiple responses are acceptable, are
non-standardized tests.
Another term for non-standardized testing is informal testing. These are
classroom tests and are usually developed by the teacher as opposed to some group
of outside testers. These classroom tests assess students' learning over a period
of time or after a particular unit of study. A score of 80% on a multiple-choice
test after reading a short story is a non-standardized score because it does not
tell us how the student did in relation to his peers.
This is also a form of non-standardized testing. The desired level of achievement
is the criterion. Criterion-referenced measurements are used to measure learning
for a particular student, so the teacher can document whether or not learning is
taking place. Evidence of learning, or its absence, is readily apparent, and the
focus here is on the performance of an individual student, as opposed to
norm-referenced tests.
Forms include portfolios, interviews, informal questioning, group discussions, oral tests,
quick pop quizzes, and exhibitions of work, projects and performance exams.
With portfolios the student gathers his work over a period of time, and the teacher will
evaluate the work based on a scoring guideline. The student is encouraged to reflect on his
work, which enhances the learning process. Performance exams are tests given to all students
and are based on a student performing some task, like writing an essay, or giving an oral
presentation. These tasks are created by the teachers who teach the students, and so the exams
drive the curriculum. It makes more sense for those doing the teaching to create the tests.
Parents and the community have a right to know how students are doing; therefore,
non-standardized tests need to show how well schools and students are doing.
Teachers are constantly assessing their students, and by doing so they are
constantly adjusting and changing their teaching to meet individual students'
needs. There can still be accountability
with non-standardized assessment that provides parents, local officials, and state officials
with the information needed. Teachers can be in constant touch with parents through the
internet, by calling, by parent conferences and by sending home progress reports and samples
of work.
The key questions to ask with any kind of assessment are: "What is the purpose of
this assessment?" and "Is this purpose meaningful and worthwhile?" If these
questions are constantly referred to and addressed, then the assessment in itself
is important, and this helps teachers address what is important to learn. It is a
kind of backwards design. Ultimately the goal is to help students learn, and to
help them learn the information and the skills that are important.
TEST
Learning outcomes and content measurement: teacher-made tests are used to
evaluate the outcomes and the content of the local curriculum, while standardized
tests are used to evaluate the outcomes and content common to a number of
colleges.
TEST
Various aspects of students' behaviour are evaluated in schools and colleges,
such as diagnosis of learning difficulties, achievement of desired behaviour as
stated in objectives, and appraisal of the personal and social development of
students. Regardless of the area of behaviour being evaluated or the use to be
made of the result, all of the various tests and procedures used for the
evaluation of a programme should possess certain common criteria. The most
essential criteria include:
VALIDITY
RELIABILITY
OBJECTIVITY
USABILITY
VALIDITY
The validity of a test is the degree to which it measures what it is intended to
measure. This is the most important aspect of a test. Validity is a matter of
degree: a test may be moderately valid, highly valid, and so on. Validity is
always specific to a particular test. To be valid, a measuring instrument must be
both relevant and reliable; a test cannot be valid unless it is also reliable,
although high reliability does not by itself guarantee validity.
Types of Validity
Five types of validity have been identified and used in educational and
psychological measurement. They are:
Content validity.
Criterion-related validity.
Construct validity.
Face validity.
Formative validity.
1) Content Validity
Content validity may be defined as the extent to which a test measures a
representative sample of the subject-matter content and the behavioural changes
under consideration. The content of a course includes both subject-matter content
and the instructional objectives and behavioural changes expected as a result of
learning. Content validity also means that the measurement includes the
attitudes, interests, and personal and social adjustment of students. The focus
of content validity, then, is on the adequacy of the sample and not on the
appearance of a test (face validity).
In order to make sure that content validity is obtained, Gronlund recommends
certain procedures:
a) List the major subject-matter content and expected behavioural changes from
the formulated objectives.
b) Weight these subject-matter topics and types of expected behavioural changes
in terms of their relative importance.
c) Prepare a table of specifications from the weighted content and behavioural
changes.
d) Construct the achievement test in accordance with the table of
specifications. The closer the test corresponds to the specifications indicated
in the table, the higher the degree of content validity.
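Step (c), the table of specifications, can be sketched as a simple cross-tabulation of content weightages against objective weightages; the topic names, weights, and test length below are assumptions for illustration only.

```python
# Assumed content areas, objective weightages, and test length.
content_weights = {"Anatomy": 0.40, "Physiology": 0.35, "Pathology": 0.25}
objective_weights = {"Knowledge": 0.5, "Understanding": 0.3, "Application": 0.2}
total_items = 40

# Items per cell = total items x content weight x objective weight.
table = {
    (topic, obj): round(total_items * cw * ow)
    for topic, cw in content_weights.items()
    for obj, ow in objective_weights.items()
}
for (topic, obj), n in table.items():
    print(f"{topic:12s} {obj:14s} {n:2d} items")
```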
2) Criterion-Related Validity
Criterion-related validation may be defined as the process of determining the
extent to which test performance is related to some other valued measure of
performance. It indicates the degree to which the subject's performance on the
measurement tool and the subject's actual behaviour are related. Two forms of
criterion-related validity are concurrent and predictive.
1. Concurrent validity: the relationship between scores on a measuring tool and
a criterion available at the same time, in the present situation. Concurrent
validity diagnoses the existing status of the individual rather than predicting
a future outcome.
2. Predictive validity: the degree of correlation between the measure of a
concept and some future measure of the same concept. Predictive validity is the
extent to which a test can predict the future performance of students; such
tests are used for classification and selection purposes, e.g., aptitude tests.
3) Construct Validity
Construct validity may be defined as the extent to which test performance can be
interpreted in terms of certain psychological constructs. The process of
construct validation involves identifying and classifying the factors which
influence test scores, so that test performance can be interpreted meaningfully.
4) Face Validity
It ascertains that the measure appears to be assessing the intended construct
under study. Stakeholders can easily assess face validity. Although this is not
a very scientific type of validity, it may be an essential component in
enlisting the motivation of stakeholders: if stakeholders do not believe the
measure is an accurate assessment of the ability, they may disengage from the
task.
5) Formative Validity
When applied to outcomes assessment, formative validity indicates how well a
measure provides information that helps improve the programme under study.
5. Medium of expression.
6. Extraneous factors like style of expression, legibility, handwriting, and
length of the answer.
7. An inappropriate time limit.
8. A student's inability to understand a test item, leading to guessing, which
lowers the validity of the test item.
RELIABILITY
Reliability is the term used to indicate the consistency with which a test
measures what it is designed to measure. It should yield similar results when
used on two separate occasions, provided the other variables remain constant; in
other words, it reflects how consistent the evaluation results are from one
measurement to another. The three main attributes of a reliable scale are:
The size of the reliability coefficient is related to the methods of estimating reliability in the
following way:
Split-half method: yields the largest reliability coefficients reported for a
given test; the estimate can be inflated by factors such as speed. It is a
subtype of internal-consistency reliability. The process of obtaining split-half
reliability begins by splitting in half all items of a test that are intended to
probe the same area of knowledge (e.g., World War II) in order to form two sets
of items. The entire test is administered to a group of individuals, the total
score for each set is computed, and finally the split-half reliability is
obtained by determining the correlation between the two total set scores.
Test-retest method: medium to large reliability coefficients are reported for a
given test, and they become smaller as the time interval between tests is
increased. It is a measure of reliability obtained by administering the same
test twice over a period of time to a group of individuals. The scores from
Time 1 and Time 2 can then be correlated in order to evaluate the test for
stability over time.
Equivalent-forms method (without time interval): medium to large reliability
coefficients are reported for a given test.
Equivalent-forms method (with time interval): the smallest reliability
coefficients are reported for a given test, and they become smaller as the time
interval between forms is increased.
Types of Reliability
1. Scorer reliability
2. Content reliability
3. Temporal reliability
Scorer reliability: concerns the degree of agreement between two scorers of the
same test answer (inter-scorer reliability). It also deals with the degree of
consistency in grading the same test answer by the same scorer on two different
occasions.
Content reliability: deals with the ability of all the items of a test to
measure competencies in the same general content area.
Temporal reliability: the third major dimension of reliability; as the name
suggests, it concerns the stability of the results of a test over time. For a
test to be reliable, it must yield consistent results over time.
Measures to Improve Reliability
OBJECTIVITY
Objectivity in measurement helps to increase test validity and reliability. It
is the extent to which independent and competent examiners agree on what
constitutes a good answer for each of the items of a measuring instrument. Most
standardized tests of aptitude and achievement are high in objectivity.
Advantages of Objectivity
Disadvantages of Objectivity
USABILITY
It is important that a test is practical for its purpose. This involves such
factors as the time taken to conduct the test, the cost of using it, and its
practicability for everyday use. It includes practical considerations such as
cost, mode of administration, scoring procedures, availability of the test, and
test-score interpretation.
Errors of Measurement
Two types of error are of concern in measurement: random or chance error, and
systematic or constant error. Reliability is concerned with random error, while
validity is affected by systematic error. The three components of a measurement
score are the true score (T), the observed score (O), and the error score (E),
related by O = T + E.
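In classical test theory the observed score is the true score plus random error, and reliability can be viewed as the proportion of observed-score variance that is true-score variance; this standard result is assumed here, since the text names only the three components. A small simulation (all parameters invented):

```python
import random
import statistics

random.seed(1)
true_scores = [random.gauss(60, 10) for _ in range(2000)]  # T
errors = [random.gauss(0, 5) for _ in range(2000)]         # E (random error)
observed = [t + e for t, e in zip(true_scores, errors)]    # O = T + E

# Reliability as the share of observed variance due to true-score variance.
reliability = statistics.variance(true_scores) / statistics.variance(observed)
print(f"Estimated reliability: {reliability:.2f}")  # close to 100/125 = 0.8
```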
TEST CONSTRUCTION
There are four steps to constructing a test. These steps and procedures help us to produce a
valid, reliable and objective standardized test. The four main steps are:
1. Planning the Test
Planning the test is the first important step in test construction. The main
goal of the evaluation process is to collect valid, reliable and useful data
about the student. Therefore, before preparing any test, we must keep in mind:
(1) What is to be measured?
1. Determining the objectives of testing.
Tests can also be used for formative evaluation, which helps to carry on the
teaching-learning process, to find out immediate learning difficulties, and to
suggest remedies. Tests are also used to assign grades or to determine the
mastery level of students. Such summative tests should cover the whole range of
instructional objectives and content areas of the course. Therefore attention
must be given to this aspect while preparing a test.
(i) Determining the weightage to different instructional objectives.
There is a vast array of instructional objectives, and we cannot include them
all in a single test. In a written test we cannot measure the psychomotor and
affective domains; we can only measure the cognitive domain. It is also true
that different subjects do not contain learning objectives like knowledge,
understanding, application and skill in equal proportion. Therefore it must be
planned how much weightage is to be given to different instructional objectives.
While deciding this, we must keep in mind the importance of the particular
objective for that subject or chapter.
(ii) Determining the weightage to different content areas:
The second step in preparing the table of specification is to outline the content area. It
indicates the area in which the students are expected to show their performance. It helps to
obtain a representative sample of the whole content area. It also prevents repetition or
omission of any unit.
Weightage of a topic:
useful. The teacher must therefore decide and select appropriate item types as
per the learning outcomes.
2. Test items should measure all types of instructional objectives and the whole
content area:
The items in the test should be prepared so that they cover all the
instructional objectives (knowledge, understanding, thinking skills) and match
the specific learning outcomes and subject-matter content being measured.
The item should be clear. Inappropriate vocabulary should be avoided. The items should be
so worded that all pupils understand the task.
5. The test item must be free from technical errors and irrelevant clues:
Sometimes there are unintentional clues in the statement of an item which help
the pupil to answer correctly: for example, grammatical inconsistencies, verbal
associations, extreme words (never, seldom, always), and mechanical features
(the correct statement being longer than the incorrect ones). Therefore, while
constructing a test item, careful steps must be taken to avoid such clues.
6. Test items should be free from racial, ethnic and gender bias:
The items should be universal in nature. Care must be taken to make each item
culture-fair. When portraying roles, all sections of society should be given
equal importance. The terms used in a test item should have a universal meaning
to all members of the group.
The validity and reliability of the test items depend to a great extent upon the
instructions for the test. N.E. Gronlund has suggested that the test maker
should provide clear-cut directions about:
a. The purpose of testing.
c. The basis for answering.
In the case of recognition-type test items, directions must be provided to
students as to whether or not they should guess on uncertain items. If nothing
is stated about guessing, then bold students will guess these items while others
will answer only those items of which they are confident, so the bold pupils
will by chance answer some items correctly and secure a higher score. Therefore
a direction should be given that permits informed guessing but discourages wild
guessing.
In the case of essay-type items, it should be indicated whether scoring will use
the point method or the rating method. In the point method, each answer is
compared with a set of ideal answers in the scoring key, and then a given number
of points is assigned.
In the rating method, the answers are rated on the basis of degrees of quality,
which determines the credit assigned to each answer. Thus a scoring key helps to
obtain consistent data about the pupil's performance. The test maker should
therefore prepare a comprehensive scoring procedure along with the test items.
measured. It implies that the pupils must be provided a congenial physical and
psychological environment during the time of testing, and any other factor that
may affect the testing procedure should be controlled.
Physical environment means proper seating arrangement, proper light and
ventilation, and adequate space for invigilation. Psychological environment
refers to those aspects which influence the mental condition of the pupil.
Therefore steps should be taken to reduce the anxiety of the students; the test
should not be administered just before or after a great occasion like annual
sports or the annual drama.
One should follow the following principles during the test administration:
1. The teacher should talk as little as possible.
2. The teacher should not interrupt the students at the time of testing.
3. The teacher should not give any hints to any student who has asked about any item.
4. The teacher should provide proper invigilation in order to prevent the students from
cheating.
When answers are recorded on a separate answer sheet, a scoring stencil may be
used for scoring the items. A scoring stencil is a sample answer sheet in which
the correct alternatives have been punched; by placing the stencil over the
pupil's answer script, the correct answers can be marked. For essay-type items,
separate instructions for scoring each learning objective may be provided.
In that case to eliminate the effect of guessing the following formula is used:
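The formula itself is lost at the page break. The conventional correction-for-guessing formula, assumed here, is S = R - W/(n - 1), where R is the number right, W the number wrong, and n the number of options per item:

```python
# Conventional correction-for-guessing formula (assumed; the original
# formula is missing from the text): S = R - W / (n - 1).
def corrected_score(right: int, wrong: int, options: int) -> float:
    return right - wrong / (options - 1)

# A pupil answers 60 items right and 20 wrong on a 4-option test;
# omitted items do not enter the formula.
print(corrected_score(60, 20, 4))  # 60 - 20/3, about 53.33
```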
STEP -4. EVALUATING THE TEST:
Evaluating the test is the most important step in the test construction process.
Evaluation is necessary to determine the quality of the test and the quality of
the responses. Quality of the test refers to how good and dependable the test is
(its validity and reliability). Quality of the responses concerns which items
fit poorly in the test. Evaluation also enables us to judge the usability of the
test in the general classroom situation.
c. Whether the item is free from irrelevant clues and other defects?
c. To increase skill in test construction
CHECKLIST
Checklists are lists of items or performance indicators requiring dichotomous
responses, such as satisfactory or unsatisfactory, pass or fail, yes or no,
present or absent.
Characteristics of checklist
Observe one respondent at a time.
Clearly specify the characteristics of the behaviour to be observed.
Use only a carefully prepared checklist, and avoid overly complex traits.
The observer should be trained in how to observe, what to observe, and how to
record the observed behaviour.
Use a checklist only when you are interested in evaluating a particular
characteristic.
Construction of Checklists
Merits of Checklists
Short and easy to assess and record.
Useful for the evaluation of specific, well-defined behaviours; commonly used in
clinical and simulated laboratory settings.
They can be used for both process and procedure evaluation.
They are adaptable to most subject-matter areas.
They allow inter-individual comparisons to be made on a common set of traits or
characteristics.
The checking process implies that standards and criteria are available for
gauging items. The inspection procedure requires scrutiny of the behaviour under
investigation. Checklists are most useful for determining the status of tangible
items, such as inventory and maintenance of equipment and supplies. They have
the advantage that the items to be observed can be determined in advance, and
the same criteria are used in each situation. There is no guarantee, however,
that the observed behaviour is a persistent one, or that the procedure will
provide a representative picture of the individual being evaluated.
Limitations
They do not indicate the quality of performance; only a limited component can be
assessed.
Only the presence or absence of an attribute, behaviour or performance parameter
can be assessed; the degree of accuracy of the performance cannot.
They have limited use in qualitative research studies.
RATING SCALE
Rating means the judgement of one person by another. A rating scale is a method
by which we systematize the expression of opinion concerning a trait. The rating
scale is a common evaluation tool used in describing observed skills and
performance. More than noting the presence or absence of a behaviour, it locates
the behaviour at a point on a continuum and involves judgements regarding
quantitative and qualitative abilities. In simple words, it consists of a set of
characteristics or qualities to be judged and a scale for indicating the degree
to which each attribute is present.
A rating scale is a device used to evaluate situations or characteristics that
can occur or be present in varying degrees, rather than merely being present or
absent as in a checklist. It is an instrument so designed as to facilitate the
appraisal of a number of traits or characteristics by reference to a common
quantitative scale of values.
Rating scales resemble checklists but are used when finer discriminations are
required. Instead of merely indicating the presence or absence of a trait or
characteristic, a rating scale enables us to indicate the degree to which a
trait is present. It provides a systematic procedure for obtaining, recording,
and reporting the observer's judgements, and may be filled out while the
observation is being made, immediately afterwards, or, as is often the case,
long after the observation.
A rating scale consists of a set of characteristics or qualities to be judged
and some type of scale for indicating the degree to which each attribute is
present.
These types of rating scales use descriptive phrases to identify the points on a
graphic scale. The descriptions are brief phrases that convey, in behavioural
terms, how pupils behave at different steps along the scale. The rater selects
the one most applicable to the person. A space for comments is also frequently
provided, enabling the rater to clarify the rating or to record behavioural
incidents pertinent to the rating.
The rater indicates the performer's standing with respect to each trait by
placing a check mark at an appropriate point along the line. Each trait is
followed by a horizontal line, and the rating is made by placing a tick on the
line. A set of categories identifies specific positions along the line, but the
rater can also mark between these points. Here the degrees of each
characteristic are arranged so that the rater can make as fine a distinction as
he wishes.
In this scale, the extent or degree to which a particular attribute is present
in an individual is indicated by numbers. The observer puts a tick or circle on
the number corresponding to the degree to which the student possesses that
attribute. Each number is given a verbal description that remains constant for a
particular characteristic. It includes numbers against which a list of
behaviours is evaluated. This is not a very reliable tool because of the
inconsistent values attributed to the numbers, which can be partially overcome
by adding a few descriptive terms. The numerical rating scale is useful when the
characteristics or qualities to be rated can be classified into a limited number
of categories and when there is general agreement concerning the category
represented by each number.
BARS is an acronym for behaviourally anchored rating scales, sometimes known as
BES, behavioural expectation scales. A BARS evaluates behaviour relevant to the
specific demands of the job and provides examples of specific job behaviours
corresponding to good, average and poor performance. This reduces the amount of
personal judgement needed by the rater. A separate BARS is needed for each job.
BOS is an acronym for behavioural observation scales. This system capitalizes on
some of the strengths of BARS while avoiding some of the disadvantages. The
evaluator lists a number of critical incidents for each performance dimension
and rates the extent to which each behaviour has been observed on a five-point
scale ranging from "almost never" to "almost always".
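A BOS-style dimension score can be sketched as the sum (or mean) of the five-point frequency ratings over the listed critical incidents; the incidents and ratings below are invented for illustration.

```python
# Hypothetical critical incidents rated on a five-point frequency scale
# (1 = almost never ... 5 = almost always).
incident_ratings = {
    "explains procedure to patient": 4,
    "maintains aseptic technique": 5,
    "documents findings promptly": 3,
}
dimension_score = sum(incident_ratings.values())
mean_rating = dimension_score / len(incident_ratings)
print(dimension_score, round(mean_rating, 2))  # 12 4.0
```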
Characteristics of rating scale
Rating scales are value judgements about the attributes of one person by another
person. These scales are generally developed to make quantitative judgements
about qualitative attributes. Rating scales provide flexibility in judging the
level of performance or the presence of attributes among subjects.
Guilford (1954) identified the following basic characteristics, which must be
taken care of while constructing a rating scale.
Clarity- A rating scale must be constructed using short, concise statements in
simple and unambiguous language.
Relevance- The statements in the rating scale should be relevant to the
phenomenon and exactly in accordance with the variable under study.
Variety- While developing a rating scale, monotony in the statements must be
avoided and variety among the statements must be ensured.
Objectivity- Statements in the rating scale must be objective in nature, so
that it is convenient for the researcher to judge the attributes or performance
of the subjects under study.
Uniqueness- Each statement in the rating scale must be unique in itself, so
that attributes can be judged appropriately.
Advantages of Rating Scales
Rating scales are easy to administer and score.
They can be used for a large number of students.
They have a wide range of application.
They give a clear feedback to the students.
They are used to evaluate skills, product outcomes, activities, interests, attitudes and
personal characteristics.
It is used to observe and record qualitative and quantitative judgements about
observed performance.
They tend to be adaptable and flexible.
They are efficient and economical in the use of time.
They can help to reduce the subjectivity and unreliability that are usually associated
with the observation method.
It will direct observation toward specific aspects of behaviour.
It will provide a common frame of reference for comparing all pupils on the same set
of characteristics.
It will provide a convenient method for recording the observer's judgements.
Disadvantages of Rating Scales
A degree of subjectivity is inevitable while using rating scales.
Misuse can result in decreased objectivity.
OBJECTIVE STRUCTURED CLINICAL EXAMINATION (OSCE)
The objective structured clinical examination (OSCE) is a modern type of
examination often used in the health sciences (e.g., medicine, physical therapy,
radiography, nursing, pharmacy, dentistry) to test clinical skill performance
and competence in skills such as communication, clinical examination, medical
procedures/prescription, exercise prescription, joint mobilisation/manipulation
techniques and interpretation of results.
OSCE Design
An OSCE usually comprises a circuit of short stations, in which each candidate
is examined on a one-to-one basis by one or two impartial examiners and either
real or simulated patients. Each station has a different examiner, as opposed to
traditional methods of clinical examination where a candidate would be assigned
to one examiner for the entire examination. Candidates rotate through the
stations, completing all the stations on their circuit; in this way, all
candidates take the same stations. It is considered an improvement over
traditional examination methods because the stations can be standardised,
enabling fairer peer comparison, and complex procedures can be assessed without
endangering patients' health.
Objective: all candidates are assessed using exactly the same stations with the
same marking scheme. In an OSCE, candidates get marks for each step on the mark
scheme that they perform correctly, which makes the assessment of clinical
skills more objective rather than subjective, as opposed to one or two examiners
deciding whether the candidate passes or fails based on their subjective
impression of the candidate's skills.
Structured: stations in OSCEs have very specific tasks. Where simulated patients
are used, detailed scripts are provided to ensure that the information they give
is the same for all candidates, including the emotions that the patient should
display during the consultation. Instructions are carefully written to ensure
that the candidate is given a very specific task to complete. The OSCE is
carefully structured to include parts from all elements of the curriculum as
well as a wide range of skills.
Clinical examination- the OSCE is designed to assess the application of clinical and theoretical
knowledge. Where theoretical knowledge is required (for example, answering questions from the
examiner at the end of a station), the questions are standardised and the candidate is asked
only questions that appear on the mark sheet.
OSCE Marking
Marking in an OSCE is done by the examiner. One way an OSCE is made objective is by
having a detailed mark scheme and a standard set of questions. For example, a station
concerning the demonstration to a simulated patient of how to use a metered dose inhaler
(MDI) would award points for specific actions that are performed safely and accurately.
The examiner can often vary the marks depending on how well the candidate performed each
step. If a simulated patient is used, they are often asked to add marks depending on the
candidate's approach. At the end, the examiner is often asked to rate the candidate as
pass/borderline/fail (or sometimes as excellent/good/pass/borderline/fail).
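The checklist-style marking described above can be sketched in a few lines of code. The step names and point values below are purely hypothetical, invented for illustration; real OSCE mark schemes are set by the examining institution.

```python
# Illustrative sketch of checklist-based OSCE station marking.
# Step names and point values are hypothetical, not from a real mark scheme.
MDI_MARK_SCHEME = {
    "checks inhaler and expiry date": 1,
    "shakes inhaler before use": 1,
    "asks patient to exhale fully": 1,
    "demonstrates slow inhalation while pressing canister": 2,
    "advises holding breath for about 10 seconds": 1,
}

def score_station(steps_performed):
    """Award points only for checklist steps performed safely and accurately."""
    return sum(points
               for step, points in MDI_MARK_SCHEME.items()
               if step in steps_performed)

print(score_station({"shakes inhaler before use",
                     "asks patient to exhale fully"}))  # prints 2
```

The point of the checklist is that any examiner marking the same performance against the same scheme should arrive at the same score.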
Many centres allocate each station an individual pass mark. The sum of the pass marks of all
the stations determines the overall pass mark for the OSCE.
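That aggregation rule can be shown in a short sketch; the station names and pass marks below are hypothetical values chosen only to illustrate the arithmetic.

```python
# Illustrative sketch: the overall OSCE pass mark as the sum of the
# individual station pass marks. All values are hypothetical.
station_pass_marks = {
    "history taking": 12,
    "MDI demonstration": 8,
    "blood pressure measurement": 10,
}

overall_pass_mark = sum(station_pass_marks.values())
print(overall_pass_mark)  # prints 30

def passes_overall(candidate_total):
    """Candidate passes the OSCE when their total meets the overall pass mark."""
    return candidate_total >= overall_pass_mark

print(passes_overall(31))  # prints True
```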
Preparation
Preparation for an OSCE is very different from preparing for an examination on theory. In an
OSCE, clinical skills are tested rather than pure theoretical knowledge. It is essential to learn
correct clinical methods and then practise repeatedly until one perfects them. Marks
are awarded for each step in the method; hence it is essential to dissect the method into its
individual steps, learn the steps, and then learn to perform them in sequence. It is often
very helpful to practise in small groups with colleagues, setting up a typical OSCE scenario and
timing it, with one person role-playing a patient, one person performing the task and (if possible)
one person observing and commenting on technique, or even role-playing the examiner
using a simple mark sheet.
Although OSCEs are conducted in many settings, varying with the exam purpose, the
organizing institution, and the available facilities, they all share similar procedures. On the
examination day, you will go through the following steps in sequence:
1. Registration:
2. Orientation:
Exam format, procedures and policies will be reviewed.
You will be introduced to your team and team leader.
You will be instructed about your starting station and how to proceed.
Your questions will be answered (questions are not allowed beyond this step).
3. Exam Start:
Now it is exam time. You will be escorted to your station. You will wait by the assigned room
door until a long bell/buzzer announces the start of the exam.
4. Reading the Station Instructions:
You have one or two minutes to read the instructions about the station's situation, patient, and
required tasks. Read carefully. At the next bell/buzzer, enter the room.
5. The Encounter:
Start your encounter with the SP (simulated patient). This is a 5-20 minute encounter. Perform
the required tasks. Stop at the next bell/buzzer.
6. Post-Encounter Period: Next is a question period, and practices differ here. Some
OSCEs have no post-encounter period. Some assign one or two minutes of the
encounter period to oral questions asked by the examiner inside the exam room; no
further communication with the SP is allowed. Others have written questions to be answered on
paper or computer outside the exam room for 5-10 minutes. At the next long bell/buzzer, the
current station ends and the next station begins. You have to proceed to the next
station quickly, as this is the same long bell/buzzer described at step 4.
7. Repeat Steps 4 to 6:
Steps 4 to 6 will be repeated until you have been in all the stations. Some OSCEs will offer
one or two short rest periods.
8. Sign-Out:
You will be escorted back to the dismissal area for signing out. You will be asked to hand
back everything you received at sign-in: the ID badge, remaining stickers, all the papers,
and the pencil. You may also be asked to remain without outside contact for some time
(sometimes hours) for exam security reasons.
CONCLUSION
Testing, measurement and evaluation play an important role in all educational institutions,
including nursing educational institutions. Student achievement is profoundly influenced
by the evaluation practices used by teachers in the classroom. The nurse educator is
responsible for evaluating students in order to improve their classroom and clinical performance.