Chapter 5

The characteristics of a good test.

An evaluation is expected to provide feedback on learning as a whole. Therefore, the test used as its
instrument must be of good quality and accountable. A good test has several characteristics: (1) validity,
(2) reliability, and (3) other features.

1. Validity

Validity concerns the suitability of a test as a measuring instrument for the main target that needs to
be measured. For example, if we want to measure reading ability, there is no benefit in conducting a
speaking test or any other test. People differ in their abilities. A reading ability test is valid,
relevant, and suitable only for measuring reading ability, not for speaking or other abilities. The rest
of this section examines the nature of validity and how validity is verified.

2. The nature of validity

Validity can be identified by studying how well a test conforms to content, criteria, and constructs.
Content validity can be demonstrated by compiling a grid that compares the test tasks with the content
they are meant to cover. Criterion validity is demonstrated by comparing test scores with scores on
another measure. Construct validity rests on a theoretical account that explains behavior.

Discussions of validity also distinguish logical validity from empirical validity. Logical validity, or
validity of reasoning, relies on proof through reasoning or argumentation about how well the test suits
its objectives. Empirical validity uses empirical data, namely the results of statistical calculations on
the scores of two tests, and can be divided into concurrent validity and predictive validity. Finally,
face validity is proof of validity based solely on the impression the test gives from the outside.

3. Verification of validity

The validity of a test does not arise on its own; it has to be worked out, explained, and proven. Anyone
who compiles and develops a test needs to explain and verify that the test is indeed valid.

Empirical verification of validity is done by presenting empirical evidence in the form of a correlation
calculated between two series of scores obtained by the same group of test takers on two different
tests, namely the test being validated and a similar test whose validity has already been proven.
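The correlation described above can be sketched in a few lines of Python. This is only an illustration: it assumes the Pearson product-moment correlation is the coefficient intended (the text does not name one), and the function name `pearson_r` and both score lists are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long series of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score series of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a series has no variation")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores of the same five test takers on two tests.
new_test = [55, 62, 70, 78, 90]          # the test being validated
established_test = [50, 65, 68, 80, 88]  # a similar test with proven validity
validity_coefficient = pearson_r(new_test, established_test)
print(round(validity_coefficient, 2))
```

A coefficient close to 1 suggests that the two tests rank the test takers in much the same way. How high the value must be before the new test is accepted as valid is a judgment the text does not specify.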

4. Reliability

A test is a measuring instrument whose results are used to make various important decisions, so it is
expected to produce results that are consistent, stable, and dependable. Reliability is expressed only
empirically, using statistical figures.

5. Kinds of reliability

The kinds of reliability are:

1. Test-retest reliability

This type of reliability rests on the assumption that a reliable test will produce the same results (in
the form of scores), or results that are not significantly different, when it is administered again to
the same group of test takers. The repetition is carried out after an interval arranged so that the test
takers do not answer the second time solely from memory of what they did when taking the same test
before.

2. Equivalent-forms reliability or alternate-forms reliability.

Equivalent-forms reliability, or alternate-forms reliability, requires two equivalent tests that are
administered to one and the same group of participants. Equivalent tests are tests that are the same in
every respect except the test items themselves. The main aspects that must be the same are the main
target of the test, the test type, the context, the number of test items, the level of difficulty, and
the sequence of test items.

3. Split-half reliability

To calculate split-half reliability, the test is held once, without repetition. After the work of all
test takers has been scored, the scores are divided into two parts with the same number of test items,
for instance 10 items in each part if the test has 20 items in all.
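A minimal sketch of a split-half calculation follows. The text does not say how the items are divided or how the two halves are combined, so two common choices are assumed here: an odd-even split of the items, and the Spearman-Brown correction to estimate the reliability of the full-length test from the correlation between the halves. All names and the response data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long series of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation of two halves."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per test taker, one column per test item."""
    # Assumed odd-even split: items 1, 3, 5, ... versus items 2, 4, 6, ...
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd_half, even_half))


# Hypothetical right (1) / wrong (0) answers of five test takers on four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(responses), 2))
```

The odd-even split is used instead of a first-half and second-half split because later items are often harder or left unanswered, which would make the two halves less comparable. Either split fits the description in the text.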

4. Cronbach alpha reliability

This type of reliability, Cronbach alpha, has two uses:

(1) It can be used as an alternative to calculating reliability with equivalent tests or split halves.

(2) It is especially suited to tests in the form of an essay, such as composition writing.
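The text gives no formula, so the sketch below uses the standard definition of Cronbach alpha: the number of parts k, divided by k - 1, times one minus the ratio of the summed part variances to the variance of the total scores. Population variance (`statistics.pvariance`) is assumed; some textbooks use the sample variance instead, which leaves alpha unchanged as long as it is used throughout. The three essay components and the ratings are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per test taker, one column per item or rated component."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach alpha needs at least two items or components")
    # Variance of each column (item) across test takers.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total score per test taker.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical essay ratings (1-5) of five writers on three components:
# content, organization, language.
essay_ratings = [
    [4, 3, 4],
    [3, 3, 2],
    [5, 4, 5],
    [2, 2, 3],
    [4, 4, 4],
]
print(round(cronbach_alpha(essay_ratings), 2))
```

Because the columns may hold any numeric rating rather than only right or wrong, the same function works for essay-type tests, which matches the second use named above.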

5. Cronbach alpha reliability for writing skill

6. Kuder-Richardson reliability

Calculating reliability with the K-R method requires the test to be administered only once. The answer
to each test item is scored as a dichotomy: a right answer scores 1 and a wrong answer scores 0. Two
versions of the formula are often used: (a) K-R20 and (b) K-R21, which is easier to calculate and is
widely used for tests in teaching.
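The two formulas are not written out in the text. The sketch below uses their standard forms, with k the number of items, p the proportion of right answers on an item, and M the mean total score. Population variance is assumed, and the function names and response data are hypothetical.

```python
from statistics import mean, pvariance


def kr20(responses):
    """K-R20 from item-level answers: one row per test taker, 1 = right, 0 = wrong."""
    n = len(responses)
    k = len(responses[0])
    total_variance = pvariance([sum(row) for row in responses])
    # Sum over items of p * q, where p is the proportion right and q = 1 - p.
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / total_variance)


def kr21(k, mean_score, variance):
    """K-R21 from summary figures only; it treats all items as equally difficult."""
    return k / (k - 1) * (1 - mean_score * (k - mean_score) / (k * variance))


# Hypothetical answers of four test takers on three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
totals = [sum(row) for row in responses]
print(round(kr20(responses), 2))
print(round(kr21(3, mean(totals), pvariance(totals)), 2))
```

K-R21 needs only the number of items, the mean, and the variance of the total scores, which is why it is the easier of the two for classroom use. When items differ in difficulty it gives a lower figure than K-R20.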

7. Rater reliability

Rater reliability applies in particular fields, including the language field, where the ability to be
tested cannot appropriately be realized as test items. Such a test does not consist of test items. Two
series of scores are needed to determine the level of this type of reliability. It can be determined as
(1) intra-rater reliability, which is based on the scoring of the work of a number of test takers by one
rater, done twice. This can be done with test takers' work in the form of a written text, which must be
read twice.

(2) Inter-rater reliability. Here the level of reliability is determined from two series of scores given
by two raters to the same group of test takers. The two raters can score at the same time, as happens
with speaking skills, where a test taker's performance can be scored by more than one examiner.

8. Estimated reliability

Estimated reliability, which is described as a shortcut for calculating the level of reliability,
estimates that level from the number of test items, the average score, and the standard deviation. To
calculate the reliability level of a test, it is first necessary to determine the level of test
difficulty from the average score. The test is categorized as easy when the average score is between 70%
and 90% correct. When the average score is only between 50% and 70% correct, the test is categorized as
difficult.

Other features

In addition to validity and reliability as the two main characteristics of a test, there are other
features whose application sometimes needs to be considered, especially (1) practicality and (2)
financial burden.

Practicality is the simplicity, as opposed to the complexity, of administering the test: it should not
require places, tools, or procedures that are extremely complicated, including the way questions are
answered and assignments are done.

The financial burden, meanwhile, indicates the need to consider the financial consequences that must be
borne by the test organizer and, even more so, by the test taker. Whether the cost falls on the
organizers or on individuals, it should not reduce anyone's chance of participating in a test solely
because the funding burden is too heavy.