
Course: Teacher Education in Pakistan (8602)

Semester: Autumn, 2017


Level: B. Ed
Assignment No. 2

Q 1: State different methods to enhance the reliability of a measurement tool, and explain each method by providing examples.
Answer:
Reliability is a measure of the consistency of a metric or a method. Every metric or method we use,
including things like methods for uncovering usability problems in an interface and expert judgment, must
be assessed for reliability. In fact, before you can establish validity, you need to establish reliability. Here
are the four most common ways of measuring reliability for any empirical method or metric:
 inter-rater reliability
 test-retest reliability
 parallel forms reliability
 internal consistency reliability
Because reliability has its roots in educational measurement (think standardized tests), many of the terms we use to assess reliability come from the testing lexicon. But don’t let bad memories of testing allow you to dismiss their relevance to measuring the customer experience.
Inter-Rater Reliability
The extent to which raters or observers respond the same way to a given phenomenon is one measure of
reliability. Where there’s judgment there’s disagreement. Even highly trained experts disagree among
themselves when observing the same phenomenon. Kappa and the correlation coefficient are two
common measures of inter-rater reliability. Some examples include:
 Evaluators identifying interface problems
 Experts rating the severity of a problem
For example, we found that the average inter-rater reliability of usability experts rating the severity of usability problems was r = .52. You can also measure intra-rater reliability, whereby you correlate multiple scores from one observer. In that same study, we found that the average intra-rater reliability when judging problem severity was r = .58 (which is generally considered low reliability).
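To make the idea concrete, here is a minimal Python sketch of one common inter-rater statistic, Cohen's kappa, for two raters judging the same set of problems. The ratings and the function name are invented for illustration; they are not taken from the study cited above.

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    rater1, rater2 = np.asarray(rater1), np.asarray(rater2)
    labels = np.union1d(rater1, rater2)
    p_observed = np.mean(rater1 == rater2)                      # observed agreement
    p_chance = sum(np.mean(rater1 == lab) * np.mean(rater2 == lab)
                   for lab in labels)                           # agreement expected by chance
    return (p_observed - p_chance) / (1 - p_chance)

# Two experts rating the severity of the same 10 problems (1 = minor ... 3 = severe), made-up data
expert_a = [3, 2, 3, 1, 2, 2, 3, 1, 1, 2]
expert_b = [3, 2, 2, 1, 2, 3, 3, 1, 2, 2]
print(round(cohens_kappa(expert_a, expert_b), 2))  # prints 0.54 for this made-up data
```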
Test-Retest Reliability
Do customers provide the same set of responses when nothing about their experience or their attitudes
has changed? You don’t want your measurement system to fluctuate when all other things are static. Have
a set of participants answer a set of questions (or perform a set of tasks). Later (by at least a few days,
typically), have them answer the same questions again. When you correlate the two sets of measures, look
for very high correlations (r > 0.7) to establish retest reliability. As you can see, there’s some effort and
planning involved: you need participants to agree to answer the same questions twice. Few
questionnaires measure test-retest reliability (mostly because of the logistics), but with the proliferation of
online research, we should encourage more of this type of measure.
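As an illustration, the retest correlation can be estimated with a simple Pearson correlation between the two administrations. The scores below are made up for this sketch:

```python
import numpy as np

# Satisfaction scores from the same 8 participants, two weeks apart (hypothetical data)
time_1 = np.array([4.0, 3.5, 5.0, 2.5, 4.5, 3.0, 4.0, 3.5])
time_2 = np.array([4.5, 3.0, 5.0, 2.5, 4.0, 3.5, 4.0, 4.0])

r = np.corrcoef(time_1, time_2)[0, 1]    # Pearson correlation between the two administrations
print(f"test-retest r = {r:.2f}")         # look for r > 0.7
```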

Parallel Forms Reliability
Getting the same or very similar results from slight variations on the question or evaluation method also
establishes reliability. One way to achieve this is to have, say, 20 items that measure one construct
(satisfaction, loyalty, usability) and to administer 10 of the items to one group and the other 10 to another
group, and then correlate the results. You’re looking for high correlations and no systematic difference in
scores between the groups.
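One simple way to sketch this in code, assuming both ten-item forms are answered by the same respondents, is to correlate the two form totals. Everything below (the latent "trait", the noise level, the sample size) is simulated data invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_items = 50, 20

# Hypothetical data: each person's 20 item responses are driven by one underlying
# construct plus noise, so the two 10-item forms should agree.
trait = rng.normal(size=(n_people, 1))
responses = trait + 0.8 * rng.normal(size=(n_people, n_items))

form_a = responses[:, :10].sum(axis=1)   # total score on the first 10 items
form_b = responses[:, 10:].sum(axis=1)   # total score on the remaining 10 items

r = np.corrcoef(form_a, form_b)[0, 1]    # look for a high correlation between forms
print(f"parallel-forms r = {r:.2f}")
```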
Internal Consistency Reliability
This is by far the most commonly used measure of reliability in applied settings. It’s popular because it’s
the easiest to compute using software—it requires only one sample of data to estimate the internal
consistency reliability. This measure of reliability is described most often using Cronbach’s alpha
(sometimes called coefficient alpha).
It measures how consistently participants respond to one set of items. You can think of it as a sort of
average of the correlations between items. Cronbach’s alpha ranges from 0.0 to 1.0 (a negative alpha
means you probably need to reverse some items). Since the late 1960s, the minimally acceptable measure
of reliability has been 0.70; in practice, though, for high-stakes questionnaires, aim for greater than 0.90.
For example, the SUS has a Cronbach’s alpha of 0.92.
The more items you have, the more internally reliable the instrument, so to increase internal consistency
reliability, you would add items to your questionnaire. Since there’s often a strong need to have few items,
however, internal reliability usually suffers. When you have only a few items, and therefore usually lower
internal reliability, having a larger sample size helps offset the loss in reliability.
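A minimal sketch of the calculation, using the usual formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score), is shown below. The responses and the function name are invented for illustration:

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array with one row per respondent and one column per questionnaire item."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                              # number of items
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Five respondents answering a four-item scale (hypothetical 1-5 ratings)
data = [[4, 5, 4, 4],
        [3, 3, 2, 3],
        [5, 5, 5, 4],
        [2, 2, 3, 2],
        [4, 4, 4, 5]]
print(round(cronbach_alpha(data), 2))
```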
In Summary
Here are a few things to keep in mind about measuring reliability:
 Reliability is the consistency of a measure or method over time.
 Reliability is necessary but not sufficient for establishing a method or metric as valid.
 There isn’t a single measure of reliability; instead, there are four common measures of consistent responses.
 You’ll want to use as many measures of reliability as you can (although in most cases one is
sufficient to understand the reliability of your measurement system).
 Even if you can’t collect reliability data, be aware of the ways in which low reliability may affect the validity of your measures, and ultimately the veracity of your decisions.
{===============}
Q 2: Explain the effects of content and face validity on the performance of the examinees. Also highlight the factors affecting the validity of a measurement tool, providing examples to support the answer.
Answer:
Conventional views of validity (Cronbach, 1971)
 Face validity: Face validity simply means that the validity is taken at face value. As a check on face
validity, test/survey items are sent to teachers or other subject matter experts to obtain suggestions
for modification. Because of its vagueness and subjectivity, psychometricians abandoned this concept long ago. However, outside the measurement arena, face validity has come back in another form. While discussing the validity of a theory, Lacity and Jansen (1994) define validity as making common sense, and being persuasive and seeming right to the reader. For Polkinghorne (1988), the validity of a theory refers to results that have the appearance of truth or reality.
The internal structure of things may not concur with their appearance, and professional knowledge often runs counter to common sense. The criteria of validity in research should go beyond "face," "appearance," and "common sense."
 Content validity: In the context of content validity, we draw an inference from the test scores to a larger domain of items similar to those on the test. Thus, content validity is concerned with sample-population representativeness, i.e. the knowledge and skills covered by the test items should be representative of the larger domain of knowledge and skills.
For example, computer literacy includes skills in operating systems, word processing, spreadsheets, databases, graphics, the internet, and many others. However, it is difficult, if not impossible, to administer a test covering all aspects of computing. Therefore, only several tasks are sampled from the universe of computer skills.
Content validity is usually established by content experts. Take computer literacy as an example again. A
test of computer literacy should be written or reviewed by computer science professors because it is
assumed that computer scientists know what is important in their own discipline. At first glance, this approach looks similar to the validation process of face validity, yet there is a subtle difference. In
content validity, evidence is obtained by looking for agreement in judgments by judges. In short, face
validity can be established by one person but content validity should be checked by a panel, and thus
usually it goes hand in hand with inter-rater reliability.
However, this approach has some drawbacks. Usually experts tend to take their knowledge for granted
and forget how little other people know. It is not uncommon that some tests written by content experts
are extremely difficult.
Second, very often content experts fail to identify the learning objectives of a subject. Take the following
question in a philosophy test as an example:
What is the time period of the philosopher Epicurus?
a. 341-270 BC
b. 331-232 BC
c. 280-207 BC
d. None of the above

This type of question tests the ability to memorize historical facts, not the ability to philosophize or any form of logical reasoning. The content expert may argue that "historical facts" are important for a student to
further understand philosophy. Let's change the subject to computer science and statistics. Look at the
following two questions:
When was the founder and CEO of Microsoft, William Gates III, born?
a. 1949
b. 1953
c. 1957
d. None of the above

Which of the following statements is true about ANOVA?
a. It was invented by R. A. Fisher in 1914
b. It was invented by R. A. Fisher in 1920
c. It was invented by Karl Pearson in 1920
d. None of the above

Any computer scientist or statistician would be hard pressed to accept that the above questions fulfill content validity. As a matter of fact, the memorization approach is still a common practice among instructors.
Further, sampling knowledge from a larger domain of knowledge involves subjective values. For example,
a test regarding art history may include many questions on oil paintings, but fewer questions on watercolor
paintings and photography because of the perceived importance of oil paintings in art history.
Content validity is sample-oriented rather than sign-oriented. A behavior is viewed as a sample when it
is a subgroup of the same kind of behaviors. On the other hand, a behavior is considered a sign when it is
an indicator or a proxy of a construct (Goodenough, 1949). Construct validity and criterion validity, which
will be discussed later, are sign-oriented because both of them indicate behaviors that are different from
those of the test.
{===============}
Q 3: Write down learning outcomes for any unit of English for 9th class and develop an essay-type test item with a rubric, 5 multiple choice questions, and 5 short questions for the written learning outcomes.
Answer:
Humans are linguistic animals. Language is the most fundamental and pervasive tool we have for
interpreting our world and communicating with others as we act in and attempt to transform that world.
Whether they pursue an emphasis in literature or writing, English majors gain a deeper understanding of
the resources of the written word. Both literature and writing courses help students explore how writers
use the creative resources of language (in fiction, poetry, nonfiction prose, and drama) to explore the entire range of human experience. English courses help students build skills of analytical and interpretive argument; become careful and critical readers; practice writing, in a variety of genres, as a process of
intellectual inquiry and creative expression; and ultimately to become more effective thinkers and
communicators who are well-equipped for a variety of careers in our information-intensive society.
Specific learning outcomes for English courses include the following:
1. Reading: Students will become accomplished, active readers who appreciate ambiguity and complexity,
and who can articulate their own interpretations with an awareness and curiosity for other perspectives.
2. Writing skills and process: Students will be able to write effectively for a variety of professional and
social settings. They will practice writing as a process of motivated inquiry, engaging other writers’ ideas
as they explore and develop their own. They will demonstrate an ability to revise for content and edit for
grammatical and stylistic clarity. And they will develop an awareness of and confidence in their own voice
as a writer.
3. Sense of Genre: Students will develop an appreciation of how the formal elements of language and
genre shape meaning. They will recognize how writers can transgress or subvert generic expectations, as
well as fulfill them. And they will develop a facility at writing in appropriate genres for a variety of
purposes and audiences.
4. Culture and History: Students will gain a knowledge of the major traditions of literatures written in
English, and an appreciation for the diversity of literary and social voices within–and sometimes
marginalized by–those traditions. They will develop an ability to read texts in relation to their historical
and cultural contexts, in order to gain a richer understanding of both text and context, and to become
more aware of themselves as situated historically and culturally.
5. Critical Approaches: Students will develop the ability to read works of literary, rhetorical, and cultural
criticism, and deploy ideas from these texts in their own reading and writing. They will express their own
ideas as informed opinions that are in dialogue with a larger community of interpreters, and understand
how their own approach compares to the variety of critical and theoretical approaches.
6. Research Skills: Students will be able to identify topics and formulate questions for productive inquiry;
they will identify appropriate methods and sources for research and evaluate critically the sources they
find; and they will use their chosen sources effectively in their own writing, citing all sources appropriately.
7. Oral communication skills: Students will demonstrate the skills needed to participate in a conversation
that builds knowledge collaboratively: listening carefully and respectfully to others’ viewpoints; articulating
their own ideas and questions clearly; and situating their own ideas in relation to other voices and ideas.
Students will be able to prepare, organize, and deliver an engaging oral presentation.
8. Valuing literature, language, and imagination: Students will develop a passion for literature and
language. They will appreciate literature’s ability to elicit feeling, cultivate the imagination, and call us to
account as humans. They will cultivate their capacity to judge the aesthetic and ethical value of literary
texts–and be able to articulate the standards behind their judgments. They will appreciate the expressive
use of language as a fundamental and sustaining human activity, preparing for a life of learning as readers
and writers.
{===============}
Q 4: Describe the measures of variation and elaborate on how these measures can be utilized in the interpretation of test results, providing examples where necessary.
Answer:

I was going to write this about intelligence and intelligence quotients, but that got really complicated
really fast. So, let's talk about obesity instead, because you're more likely to hear about the rising rates of
obesity rather than the rising IQs. (Unless you go on to do psychological assessments. Then I would
recommend you look up the Flynn Effect and the Idiocracy Theory.) But this lesson is about weight and
understanding the descriptions of it.
Weight, like so many other things, is not static or unchanging. Not everyone who is 6 feet tall is 200
pounds - there is some variability. When reporting these numbers or reviewing them for a project, a
researcher needs to understand how much difference there is in the scores. This is where we will look
at measures of variability, which are statistical procedures to describe how spread out the data is. They
are:
 Range: a single number representing the spread of the data, the difference between the largest and smallest score
 Standard deviation: a number representing how far, on average, each score lies from the mean
 Variance: the square of the standard deviation, a single number indicating how spread out the data is
When trying to understand how spread out the data is, we, as researchers, need to differentiate and know
the difference between population and sample. A population is defined as the complete collection to be
studied, like all the police officers in your city. A sample is defined as a section of the population and
would be a selection of police officers you are studying. This can be anywhere from 1% to 99% of them.
When researchers do psychological experiments, they often must work with samples, because to find
everyone in the population is nearly impossible. If you want a population data set, such as the world's
weight, for example, that would be about seven billion data points. If you wanted the population data set
of everyone in California, then that means you need about 33 million data points. In my own town, this is
about 100,000 people. The trick is trying to make your sample data look like the population, which means you need measures of how variable your data is compared to the estimated population.
Measures of variation are quantities that express the amount of variation in a random variable (compare measures of location). Variation is sometimes described as spread or dispersion to distinguish it from systematic trends or differences. Measures of variation are either properties of a probability distribution or sample estimates of them.
The range of a sample is the difference between the largest and smallest value. The interquartile range is
potentially more useful. If the sample is ranked in ascending order of magnitude, two values of x may be
found, the first of which is exceeded by 75% of the sample, the second by 25%; their difference is the
interquartile range. An analogous definition applies to a probability distribution.
The variance is the expectation (or mean) of the square of the difference between a random variable and its mean; it is of fundamental importance in statistical analysis. The variance of a continuous distribution with mean μ is ∫ (x − μ)² f(x) dx, where f is the probability density function, and is denoted by σ². The variance of a discrete distribution is Σᵢ (xᵢ − μ)² P(xᵢ) and is also denoted by σ². The sample variance of a sample of n observations with mean x̄ is s² = Σᵢ (xᵢ − x̄)² / (n − 1). The value (n − 1) corrects for bias.
The standard deviation is the square root of the variance, denoted by σ (for a distribution) or s (for a
sample). The standard deviation has the same units of measurement as the mean, and for a normal distribution about 5% of the distribution lies more than about two standard deviations from the mean (roughly 2.5% in each tail). The standard deviation of the distribution of an estimated quantity is termed the standard error.
The mean deviation is the mean of the absolute deviations of the random variable from the mean.
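A short Python sketch, using made-up test scores, shows how these measures can be computed together when interpreting a set of results:

```python
import numpy as np

scores = np.array([62, 55, 70, 48, 66, 73, 59, 64, 51, 68])  # hypothetical test scores

data_range = scores.max() - scores.min()            # range
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1                                       # interquartile range
sample_var = scores.var(ddof=1)                     # sample variance, (n - 1) in the denominator
sample_sd = scores.std(ddof=1)                      # standard deviation, same units as the mean
mean_dev = np.mean(np.abs(scores - scores.mean()))  # mean (absolute) deviation

print(f"mean = {scores.mean():.1f}, range = {data_range}, IQR = {iqr:.1f}")
print(f"s^2 = {sample_var:.1f}, s = {sample_sd:.1f}, mean deviation = {mean_dev:.1f}")
```

In interpreting test results, two classes can share the same mean while differing sharply in spread: a small standard deviation indicates that scores cluster near the mean, while a large one signals a wide range of achievement around that same average.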

{===============}
Q 5: Write notes on the following:
a) Trends in assessment
Answer:
Educational assessment is the systematic process of documenting and using empirical data on knowledge, skills, attitudes, and beliefs to refine programs and improve student learning. Assessment data can be obtained from directly examining student work to assess the achievement of learning outcomes, or can be based on data from which one can make inferences about learning. Assessment is often used interchangeably with testing, but it is not limited to tests. Assessment can focus on the individual learner, the
learning community (class, workshop, or other organized group of learners), a course, an academic
program, the institution, or the educational system as a whole (also known as granularity). The word
'assessment' came into use in an educational context after the Second World War.
As a continuous process, assessment establishes measurable and clear student learning outcomes, provides a sufficient amount of learning opportunities to achieve these outcomes, implements a systematic way of gathering, analyzing, and interpreting evidence to determine how well student learning matches expectations, and uses the collected information to inform improvement in student learning.
The final purpose of assessment practices in education depends on the theoretical framework of the
practitioners and researchers, their assumptions and beliefs about the nature of the human mind, the origin of
knowledge, and the process of learning.
{===============}
b) Use of computer in testing
Answer:
Technology can be used for assessment purposes at various levels ranging from the management of the
assessment information to a fully automated assessment system. Using technology for the management
of assessment information can enable information to be presented in different ways to meet the needs of
different audiences (such as teachers, students, course organisers and external examiners). Not only the
quality of presentation of reports but more importantly the range and scope of their content can be
improved by utilising technology for this purpose. At the other extreme, in a fully automated assessment system, all aspects of the system, from the assessment which the student completes to the processing and administration of the marks, including the overall management of assessment information, are technology-based.
Assessment strategy should be considered during course design; here it is useful to ask ourselves:
 What kind of things do we want our students to learn?
 What opportunities will be provided?
 What assessment tasks will be set?
 What methods of assessment will be used?
Analysing the types of learning which we require to take place has significant implications for the
instructional design, assessment strategies and methods. Since assessment involves measuring, it gives rise to problems in: choosing a valid assessment instrument; finding a suitable unit of measurement; ensuring the test is measuring what it is supposed to measure; scorer reliability, especially if more than one marker is involved; and using valid statistical methods and drawing valid inferences from measures.
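As a simple illustration of the fully automated end of that range, the sketch below scores a short objective test and reports the mark immediately. The item labels, answer key, and function name are invented for this example:

```python
# A minimal sketch of automated objective-test scoring (all items and keys are invented).
answer_key = {"Q1": "b", "Q2": "d", "Q3": "a", "Q4": "c", "Q5": "b"}

def score_attempt(responses, key=answer_key):
    """Return the number of correct answers and a per-item correctness report."""
    report = {item: responses.get(item) == correct for item, correct in key.items()}
    return sum(report.values()), report

student_responses = {"Q1": "b", "Q2": "a", "Q3": "a", "Q4": "c", "Q5": "d"}
total, report = score_attempt(student_responses)
print(f"Score: {total}/{len(answer_key)}")   # immediate feedback for student and teacher
print(report)                                # per-item results for later analysis
```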
Why change current practice?
 Drive to find effective new and improved assessment methods.
 Focus on Quality Assurance.
 The changing aims of HE (mix of academic, vocational and general transferable skills).
 Emphasising deep learning rather than shallow.
 Interest in using new teaching methods.
 Need for greater cost effectiveness.
 To save staff time.
Advantages in the use of IT for Assessment
 More frequent formative and summative assessment.
 Staff can be alerted sooner to adapt their teaching.
 Can spend less time marking.
 Self-assessment; in the student's own time, at their own pace, when they are ready.
 Increased student confidence.
 Students like rapid results.
{===============}
c) Use of e-portfolio
Answer:
An electronic portfolio (also known as an eportfolio, e-portfolio, digital portfolio, or online portfolio) is a
collection of electronic evidence assembled and managed by a user, usually on the Web. Such electronic
evidence may include input text, electronic files, images, multimedia, blog entries, and hyperlinks. E-
portfolios are both demonstrations of the user's abilities and platforms for self-expression. If they are
online, users can maintain them dynamically over time.
One can regard an e-portfolio as a type of learning record that provides actual evidence of achievement.
Learning records are closely related to the learning plan, an emerging tool which individuals, teams,
communities of interest, and organizations use to manage learning. To the extent that a
personal learning environment captures and displays a learning record, it may also operate as an
electronic portfolio.
E-portfolios, like traditional portfolios, can facilitate students' reflection on their own learning, leading to
more awareness of learning strategies and needs. Comparative research by M. van Wesel and A. Prop
between paper-based portfolios and electronic portfolios in the same setting tentatively suggests that use
of an electronic portfolio may lead to better learning outcomes.
{===============}
