
NAME     : ACHMAD MUTTAQIEN
NIM      : F12111060
CLASS    : SEMINAR ON ELT
SEMESTER : SEVEN

1. How to use KR21
2. How to use the one-tailed and two-tailed t-test
3. How to use effect size
4. How to use the CAR design and the experimental design
5. How to know whether an instrument is valid, supported by expert voices

1. KUDER RICHARDSON 21 (KR21)


To know the reliability of a test we can use KR21. A test is said to be reliable if its results are consistent. Reliability in research concerns the replicability and consistency of the methods, conditions, and results. Reliable data is evidence that you can trust: if someone else did the same experiment, they would get the same result.
The KR21 formula can be used only if it can be assumed that the items are of equal difficulty.
The formula is :

KR21 = \frac{K}{K-1}\left(1 - \frac{M(K - M)}{K s^2}\right)

where

s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}  or, equivalently,  s^2 = \frac{\sum (x - \bar{x})^2}{n}

Where K= the number of items in the test; M = the mean of the test scores; and s = the standard
deviation of the test scores.
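
As a rough illustration, here is a minimal Python sketch of the formula above. The function name kr21 and the score data are invented for illustration, not taken from the source.

```python
from statistics import mean, pstdev

def kr21(scores, k):
    """Kuder-Richardson 21 reliability estimate for a k-item test."""
    m = mean(scores)     # M: the mean of the test scores
    s = pstdev(scores)   # s: the standard deviation (n-denominator form)
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * s ** 2))

# Example: total scores of 8 examinees on a 20-item test
print(kr21([12, 15, 9, 14, 11, 16, 13, 10], k=20))
```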

Reliability coefficient

Expresses a relationship, but this time between scores of the same individuals on the same instrument at two different times, or between two parts of the same instrument. Reliability is related to these parts: if scores have large error components, reliability is low; but if there is little error in the scores, reliability is high.
A reliability coefficient can take on values from 0 to 1.0, inclusive. Conceptually, if a reliability coefficient were 0, there would be no "true" component in the observed score; the observed score would consist entirely of error. On the other hand, if the reliability coefficient were 1.0, the observed score would contain no error.

Coefficient     Reliability of the test
0.00 - 0.19     Negligible (N)
0.20 - 0.39     Low (L)
0.40 - 0.59     Moderate (M)
0.60 - 0.79     Substantial (S)
0.80 - 1.00     High to very high (H)
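
If this qualification is needed in code, a small helper might look like the sketch below; the function name is hypothetical and the thresholds simply follow the table above.

```python
def classify_reliability(r):
    """Map a reliability coefficient (0..1) to the table's qualification."""
    if r < 0.20: return "Negligible (N)"
    if r < 0.40: return "Low (L)"
    if r < 0.60: return "Moderate (M)"
    if r < 0.80: return "Substantial (S)"
    return "High to very high (H)"

print(classify_reliability(0.85))  # High to very high (H)
```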

Threats to and Techniques for Improving Reliability Indices


1. Threats to Reliability
a. Group homogeneity: When a test is given to a very similar (homogeneous) group, the
resulting scores are closely clustered and the reliability coefficient will be low. The more
heterogeneous the examinee group, the higher the correlation coefficient.
b. Time limits: The rate at which an examinee works will systematically influence
performance, as some will finish the test and some will not.
c. Test or measure length: If a test is too short, then the reliability coefficient will be low.
d. Scoring errors (e.g., inconsistent scoring) will depress a reliability estimate. Keep
scoring simple and consistently applied.
e. Item Quality: Poorly constructed test items introduce ambiguity into the testing situation,
thus affecting examinee performance.

f. Other threats: These include differences in content across test or measure forms; administration, examinee, and/or scoring errors; guessing; and effects of memory, practice, boredom, etc.
2. To Improve Reliability
a. Group homogeneity: Ensure that the group is as heterogeneous, with respect to the skill or
content being tested, as is natural.
b. Time limits: Ensure there is enough time for almost all examinees to finish.
c. Test or measure length: Ensure the test is long enough to accurately reflect the content or
[assessment] domain. Test length tends to increase the magnitude of the reliability
coefficient. Test length is a balance between time and content. It is wisest to select the
most important content and test for that. Leave the minor points alone.
d. Test or measure items: Use only high quality test items which conform to generally
accepted editorial guidelines. Prepare examinees by teaching test taking strategies. Well
written items will contribute significantly to reliability.
e. Examinees or respondents: Poorly motivated, fatigued, uncomfortable examinees will
perform below their capability. Examinees should be encouraged to get sufficient sleep
and food; the testing environment should be comfortable; and everything possible should
be done to motivate examinees.
f. Item difficulty: If a test is too easy or too difficult, the coefficient is reduced as variance
is squeezed out. Target p-values should be between .40 and .60, generally.
2. T-TEST
The statistical test used to determine whether the difference between a sample mean and another mean is significant is called the t-test; it is used when comparing sample means and only the sample standard deviations are known.
Use of a t-test can help us to decide whether the difference between the conditions is
"real" or whether it is due merely to chance fluctuations from one time of testing to another. The
t-test enables us to decide whether the mean of one condition is really different from the mean of
another condition.
The t-test, and any statistical test of this sort, consists of three steps:
1. define the null and alternate hypotheses;
2. calculate the t-statistic for the data;
3. compare t_calc to the tabulated t-value for the appropriate significance level and degrees of freedom. If t_calc > t_tab, we reject the null hypothesis and accept the alternate hypothesis; otherwise, we accept the null hypothesis.
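
A minimal Python sketch of these three steps, using scipy.stats; the two groups of scores are invented for illustration.

```python
from scipy import stats

group1 = [72, 75, 78, 80, 69, 74, 77]
group2 = [65, 70, 68, 72, 66, 71, 69]

# Step 1: H0: mean1 == mean2; H1: mean1 != mean2 (two-tailed)
# Step 2: calculate the t-statistic for the data
t_calc, p_value = stats.ttest_ind(group1, group2)

# Step 3: compare t_calc to the tabulated t at alpha = .05,
# df = n1 + n2 - 2 (comparing p_value to alpha is equivalent)
alpha = 0.05
t_tab = stats.t.ppf(1 - alpha / 2, df=len(group1) + len(group2) - 2)
if abs(t_calc) > t_tab:
    print(f"t = {t_calc:.2f} > {t_tab:.2f}: reject the null hypothesis")
else:
    print(f"t = {t_calc:.2f} <= {t_tab:.2f}: fail to reject the null hypothesis")
```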
Separate Variance Formula
Use the separate variance formula if:

If n_1 = n_2 and s_1^2 = s_2^2, then df = n_1 + n_2 - 2
If n_1 = n_2 and s_1^2 \neq s_2^2, then df = n_1 - 1 or n_2 - 1
If n_1 \neq n_2 and s_1^2 \neq s_2^2, then df = n_1 - 1 or n_2 - 1

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}

that is, the difference between means divided by the square root of the variances over the sample sizes.

Where
\bar{x}_1 = mean of sample 1
\bar{x}_2 = mean of sample 2
n_1 = number of subjects in sample 1
n_2 = number of subjects in sample 2
s_1^2 = variance of sample 1 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1}
s_2^2 = variance of sample 2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2}
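
A sketch of this computation in Python, mirroring the formula above; the variances use the n-denominator form given in the definitions, and the two samples are illustrative.

```python
from statistics import mean, pvariance
from math import sqrt

x1 = [23, 25, 28, 30, 26]        # illustrative sample 1
x2 = [20, 22, 19, 24, 21, 23]    # illustrative sample 2

# t = (x1bar - x2bar) / sqrt(s1^2/n1 + s2^2/n2)
t = (mean(x1) - mean(x2)) / sqrt(pvariance(x1) / len(x1) +
                                 pvariance(x2) / len(x2))
print(f"t = {t:.2f}")
```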

Pooled Variance Formula
Use the pooled variance formula if:

If n_1 = n_2 and s_1^2 = s_2^2, then df = n_1 + n_2 - 2
If n_1 \neq n_2 and s_1 = s_2, then df = n_1 + n_2 - 2
If n_1 = n_2 and s_1^2 \neq s_2^2, then df = n_1 - 1 or n_2 - 1

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}\right]\left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]}}
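
The same pooled computation written out in Python, step by step; the data are again illustrative.

```python
from statistics import mean
from math import sqrt

x1 = [23, 25, 28, 30, 26]        # illustrative sample 1
x2 = [20, 22, 19, 24, 21, 23]    # illustrative sample 2
n1, n2 = len(x1), len(x2)

ss1 = sum((v - mean(x1)) ** 2 for v in x1)   # sum of (x1 - x1bar)^2
ss2 = sum((v - mean(x2)) ** 2 for v in x2)   # sum of (x2 - x2bar)^2

pooled = ((ss1 + ss2) / (n1 + n2 - 2)) * (1 / n1 + 1 / n2)
t = (mean(x1) - mean(x2)) / sqrt(pooled)
print(f"t = {t:.2f}, df = {n1 + n2 - 2}")
```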

Correlated Data Formula
If the samples are related (two measures from the same subjects or matched pairs), the correlated data formula is used:

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} - 2r\left[\dfrac{s_1}{\sqrt{n_1}}\right]\left[\dfrac{s_2}{\sqrt{n_2}}\right]}}
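
For correlated samples such as pretest and posttest scores from the same students, scipy's ttest_rel computes the paired t on the score differences, which is algebraically equivalent to the formula above; the data here are invented for illustration.

```python
from scipy import stats

pretest  = [60, 65, 58, 70, 62, 68]   # same students, before
posttest = [66, 70, 63, 74, 65, 71]   # same students, after

t_calc, p_value = stats.ttest_rel(pretest, posttest)
print(f"t = {t_calc:.2f}, p = {p_value:.3f}")
```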

In choosing the correct formula, it is fairly easy to determine whether the sample sizes are equal: the numbers of subjects are either the same or they are not. However, to determine whether the variances are homogeneous, use the formula F = s^2(largest) / s^2(smallest). We compare the calculated F value to the F table value at the .05 or .01 level of significance, with n_1 - 1 and n_2 - 1 degrees of freedom. If the calculated value \geq table value, then the variances are not equal; if the calculated value < table value, then the variances are equal.
The degrees of freedom are calculated differently depending upon whether the ns and ss are equal or not, so we must check the degrees of freedom corresponding to the formula we use. To test the hypothesis, we compare the calculated value to the table value for the significance level we have chosen. If the calculated value \geq table value, we reject the null hypothesis and conclude the difference is greater than that expected by chance. If the calculated value < table value, we fail to reject the null hypothesis and conclude this amount of difference could have been the result of chance.
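
A sketch of this variance-ratio check in Python; the larger variance goes in the numerator, and the degrees of freedom follow the same order. The samples are illustrative.

```python
from statistics import pvariance
from scipy import stats

x1 = [23, 25, 28, 30, 26]
x2 = [20, 22, 19, 24, 21, 23]

v1, v2 = pvariance(x1), pvariance(x2)
if v1 >= v2:                      # F = s^2(largest) / s^2(smallest)
    f_calc, dfn, dfd = v1 / v2, len(x1) - 1, len(x2) - 1
else:
    f_calc, dfn, dfd = v2 / v1, len(x2) - 1, len(x1) - 1

f_tab = stats.f.ppf(0.95, dfn, dfd)   # F table value at the .05 level
print("variances not equal" if f_calc >= f_tab else "variances equal")
```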

3. EFFECT SIZE
Effect size is a simple measure for quantifying the difference between two groups, or the same group over time, on a common scale. In an educational setting, effect size is one way to measure the effectiveness of a particular intervention. Effect size enables us to measure both the improvement (gain) in learner achievement for a group of learners and the variation of student performances, expressed on a standardised scale. By taking into account both improvement and variation, it provides information about which interventions are worth having.
There are many ways in which to use effect sizes. The effect size can be used to:
1. Investigate the effectiveness of a particular intervention for a defined group of students.
2. Compare the effectiveness of different interventions.
3. Evaluate the growth over time.
The effect size is just the standardised mean difference between the two groups. In other words:

Effect Size = \frac{\text{Mean of Experimental Group} - \text{Mean of Control Group}}{\text{Standard Deviation}}

If it is not obvious which of two groups is the 'experimental' (i.e. the one which was given
the 'new' treatment being tested) and which the 'control' (the one given the 'standard' treatment or no treatment - for comparison), the difference can still be calculated. In this case, the 'effect
size' simply measures the difference between them, so it is important in quoting the effect size to
say which way round the calculation was done. The 'standard deviation' is a measure of the
spread of a set of values. Here it refers to the standard deviation of the population from which the
different treatment groups were taken. In practice, however, this is almost never known, so it
must be estimated either from the standard deviation of the control group, or from a 'pooled'
value from both groups.
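
A minimal Python sketch of this calculation, estimating the unknown population spread with a pooled standard deviation; the group data are invented for illustration.

```python
from statistics import mean, stdev
from math import sqrt

experimental = [78, 82, 75, 88, 80, 84]   # illustrative scores
control      = [70, 74, 68, 76, 72, 71]

# Pooled SD for equal-sized groups: sqrt((s1^2 + s2^2) / 2)
sd_pooled = sqrt((stdev(experimental) ** 2 + stdev(control) ** 2) / 2)
d = (mean(experimental) - mean(control)) / sd_pooled
print(f"effect size d = {d:.2f}")
```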
The meaning of effect size varies by context, but the standard interpretation offered by
Cohen (1988) is:
0.8 = large (8/10 of a standard deviation unit)
0.5 = moderate (1/2 of a standard deviation)

0.2 = small (1/5 of a standard deviation)


The criteria of Effect Size

Effect Size (ES)      Qualification
ES < 0.2              Low
0.2 ≤ ES < 0.8        Moderate
ES ≥ 0.8              High

See the examples below:

Effect size example 1 (using a t-test): p < 0.05, or significant results
Among 7th graders in Lowndes County Schools taking the CRCT reading exam (N = 336), there was a statistically significant difference between the two teaching teams, team 1 (M = 818.92, SD = 16.11) and team 2 (M = 828.28, SD = 14.09), t(98) = -3.09, p < 0.05, CI.95 = -15.37, -3.35. Therefore, we reject the null hypothesis that there is no difference in reading scores between teaching teams 1 and 2. Further, Cohen's effect size value (d = 0.62) suggested a moderate to high practical significance.

Effect size example 2 (using a t-test): p > 0.05, or not significant results
Among 7th graders in Lowndes County Schools taking the CRCT science exam (N = 336), there was no statistically significant difference between female students (M = 834.00, SD = 32.81) and male students (M = 841.08, SD = 28.76), t(98) = -1.15, p > .05, CI.95 = -19.32, 5.16. Therefore, we fail to reject the null hypothesis that there is no difference in science scores between females and males. Further, Cohen's effect size value (d = 0.09) suggested low practical significance.

*population (N), mean (M), and standard deviation (SD) for both samples, the statistical value (t or F), degrees of freedom (df), significance (p), and confidence interval (CI.95)
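
The d in example 1 can be re-derived from the reported summary statistics; this short check uses the pooled-SD form from the previous sketch.

```python
from math import sqrt

m1, sd1 = 818.92, 16.11   # team 1, from example 1
m2, sd2 = 828.28, 14.09   # team 2, from example 1

sd_pooled = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
print(f"d = {(m2 - m1) / sd_pooled:.2f}")   # ~0.62, as reported
```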

4. CLASSROOM ACTION RESEARCH DESIGN AND EXPERIMENTAL RESEARCH DESIGN

-CLASSROOM ACTION RESEARCH DESIGN
Classroom action research follows the same steps as the general scientific model, although in a more informal manner. CAR methods also recognize that the researcher is, first and foremost, the classroom teacher and that the research cannot be allowed to take precedence over student learning. The CAR process can be conceptualized as a seven-step process. (For more detailed information about conducting CAR research, see authors such as Bell, 1993; Sagor, 2000; and Hubbard and Power, 1993.)
Step one: Identify a question or problem. This question should be something related to student learning in your classroom. For example, would a different type of assignment enhance student understanding? Would a strict attendance policy result in better test scores? Would more time spent in cooperative learning groups help students understand concepts at a higher level? The general model might be "what is the effect of X on student learning?"
Since the goal of CAR is to inform decision-making, the question or problem should look at something under teacher control, such as teaching strategies, student assignments, and classroom activities. The problem should also be an area in which you are willing to change. There is no point in conducting a CAR project if you have no intention of acting on your findings. Larger institutional questions might be tackled, if the institution is committed to change.
Finally, the question or problem should be feasible in terms of time, effort, and resources. In general, this means to think small: to look at one aspect of teaching in a single course. Angelo and Cross (1993) suggest that you NOT start with your "problem class" but rather start with a class that is progressing fairly well. As you become more comfortable with CAR methods, you may attempt more complicated projects.
Step two: Review literature. You need to gather two types of information, background literature and data. The literature review may be much less extensive than in traditional research, and the use of secondary sources is sufficient. Sources such as Cross and Steadman (1996) or Woolfolk (2000) will often provide background information on learning, motivation, and classroom management topics. Another source is the Educational Resources Information Center (ERIC) database, which contains references to a huge number of published and unpublished manuscripts. You can search the ERIC database at http://ericir.syr.edu/. Your campus' teaching and learning center should also have many useful resources.

Step three: Plan a research strategy. The research design of a CAR study may take many forms, ranging from a pretest-posttest design to a comparison of similar classes to a descriptive case study of a single class or student. Both quantitative and qualitative methods are appropriate. The tightly controlled experimental designs of traditional research are rarely possible in a natural classroom setting, so CAR relies on the triangulation of data to provide validity. To triangulate, collect at least three types of data (such as student test scores, teacher evaluations, and observations of student behavior). If all data point to the same conclusions, you have some assurance of validity.
Step four: Gather data. CAR tends to rely heavily on existing data such as test scores, teacher evaluations, and final course grades. You might also want to collect other data. See Angelo and Cross (1993) for a wonderful array of classroom assessment techniques. (Be sure to check with your Institutional Review Board for policies regarding the use of human subjects. Most CAR with adult students will be exempt from review as long as you do not identify individual students.)
Step five: Make sense of the data. Analyze your data, looking for findings with practical significance. Simple statistical analyses of quantitative data, such as simple t-tests and correlations, are usually sufficient. Tables or graphs are often very helpful. Qualitative data can be analyzed for recurring themes, citing supporting evidence. Practical significance, rather than statistical significance, is the goal.
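
As a sketch of the "simple statistical analyses" this step describes, the example below runs a t-test between two assignment types and a correlation between attendance and final grades; all names and data are invented for illustration.

```python
from scipy import stats

scores_old = [72, 68, 75, 70, 74, 69]   # scores under the old assignment
scores_new = [78, 74, 80, 77, 79, 76]   # scores under the new assignment
t, p = stats.ttest_ind(scores_old, scores_new)

attendance = [10, 12, 8, 14, 11, 13]    # sessions attended
grades     = [70, 78, 65, 85, 74, 81]   # final grades
r, p_r = stats.pearsonr(attendance, grades)

print(f"t = {t:.2f} (p = {p:.3f}); r = {r:.2f} (p = {p_r:.3f})")
```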
Step six: Take action. Use your findings to make decisions about your teaching strategies. Sometimes you will find that one strategy is clearly more effective, leading to an obvious choice. Other times, strategies may prove to be equally effective. In that situation, you may choose the strategy that you prefer or the one that your students prefer.
Step seven: Share your findings. You can share your findings with peers in many ways. You may submit your report to JoSoTL, which has a special section for CAR reports. These articles will typically be from 4 to 8 pages, shorter than the typical traditional research report. Most CAR reports are appropriate for submission to the ERIC database (instructions for submission can be found on the ERIC website at http://ericfac.piccard.csc.com/submitting). You might also share your work at conferences such as the International Conference for Teacher-Researchers (http://www.educ.ubc.ca/ictr2001/) or at regional conferences for your discipline. Most disciplines sponsor a journal on teaching, although CAR may be too informal to meet publication requirements.
Judging the quality of CAR projects
Although CAR projects are not as comprehensive as traditional educational research, their quality can still be assessed using the guidelines of Glassick et al. (1997) in Scholarship Assessed. I recently worked with colleagues to develop an evaluation plan for the CAR projects of K-12 teachers in a local school district (Mettetal, Bennett and Smith, 2000). The resulting rubric has been adapted for JoSoTL and is used by our reviewers for CAR, traditional research, and essays (http://www.iusb.edu/~josotl/rubric/rubric.htm).

-EXPERIMENTAL RESEARCH DESIGN
We are concerned with the analysis of data generated from an experiment. It is wise to take time
and effort to organize the experiment properly to ensure that the right type of data, and enough of
it, is available to answer the questions of interest as clearly and efficiently as possible. This
process is called experimental design.
The specific questions that the experiment is intended to answer must be clearly identified before
carrying out the experiment. We should also attempt to identify known or expected sources of
variability in the experimental units since one of the main aims of a designed experiment is to
reduce the effect of these sources of variability on the answers to questions of interest. That is,
we design the experiment in order to improve the precision of our answers.
(Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Experimental designs are often touted as the most "rigorous" of all research designs or, as the
"gold standard" against which all other designs are judged. In one sense, they probably are. If
you can implement an experimental design well (and that is a big "if" indeed), then the
experiment is probably the strongest design with respect to internal validity. Why? Recall that
internal validity is at the center of all causal or cause-effect inferences. When you want to determine whether some program or treatment causes some outcome or outcomes to occur, then
you are interested in having strong internal validity. Essentially, you want to assess the
proposition:
If X, then Y
or, in more colloquial terms:
If the program is given, then the outcome occurs
Unfortunately, it's not enough just to show that when the program or treatment occurs the
expected outcome also happens. That's because there may be lots of reasons, other than the
program, for why you observed the outcome. To really show that there is a causal relationship,
you have to simultaneously address the two propositions:
If X, then Y
and
If not X, then not Y
Or, once again more colloquially:
If the program is given, then the outcome occurs
and
If the program is not given, then the outcome does not occur
If you are able to provide evidence for both of these propositions, then you've in effect isolated
the program from all of the other potential causes of the outcome. You've shown that when the
program is present the outcome occurs and when it's not present, the outcome doesn't occur. That
points to the causal effectiveness of the program.

5. INSTRUMENT VALIDITY
We could say that one way to validate an instrument is to know and understand everything about the participants or subjects at whom it is aimed. We then need to understand our purposes and the interrelation between the purpose, the technique, and the participants in question. We cannot use the same method for every kind of participant: a valid role-playing instrument might be appropriate for an elementary school student, but the result will be different if it is applied to a senior high school student.

Below are some ways to validate an instrument before using it in research.
Protect construct validity. A construct is the behavior or outcome a researcher seeks to
measure within a study, often revealed by the independent variable. Therefore, it is important to
operationalize or define the construct precisely. For example, if you are studying depression but
only measure the number of times a person cries, your construct is not valid and your research
will likely be skewed.
Protect internal validity. Internal validity refers to how well your experiment is free of outside influence that could taint its results. Thus, a research instrument that takes students' grades into account but not their developmental age is not a valid determinant of intelligence. Because the grades on a test will vary across different age brackets, a valid instrument should control for differences and isolate true scores.
Protect external validity. External validity refers to how well your study reflects the real
world and not just an artificial situation. An instrument may work perfectly with a group of white
male college students but this does not mean its results are generalizable to children, blue-collar
adults or those of varied gender and ethnicity. For an instrument to have high external validity, it
must be applicable to a diverse group of people and a wide array of natural environments.
Protect conclusion validity. When the study is complete, researchers may still invalidate their data by making a conclusion error. Essentially, there are two types to guard against. A Type I error is claiming a relationship exists between experimental variables when the correlation is merely the result of flawed data. Conversely, a Type II error is concluding there is no relationship when, in fact, there is.
Read more: http://www.ehow.com/how_2277596_validate-research-instrument.html
