
Running head: A CRITIQUE OF THE CAEL ASSESSMENT

A Critique of the CAEL Assessment


Cat Migliore
EDU 5399
Dr. Marzieh H. Tafaghodtari
University of Ottawa
February 29, 2016

Introduction

Many English proficiency tests exist to determine whether English as a Second Language (ESL) students have the literacy skills required to attend a postsecondary institution. These include the CanTEST, the Common Entrance Test (CET), and the Canadian Academic English Language Assessment (CAEL), to name a few. For the purposes of this critical analysis, we will look solely at the CAEL assessment in terms of its validity and reliability, through an examination of scoring procedures, topic effect, motivation, test anxiety, and diagnostic assessment. This critical assessment will demonstrate that although the CAEL assessment has high validity, reliability, and impact, it has low practicality.
Carleton University created the Canadian Academic English Language (CAEL) assessment in 1987 (Paragon Testing Enterprises [PTE], 2015). According to Malone (2010), the CAEL is recognized as a reputable program in Canada and the U.S.A. The CAEL is a criterion-referenced, high-stakes standardized test that determines whether or not English as a Second Language (ESL) learners have the English proficiency skills required to attend college and university programs in Canada (Malone, 2010). The CAEL can also be used to assess the skills of ESL learners in an intensive English program, as well as ESL learners who have jobs but are seeking to refine their literacy skills and attain a certificate in order to receive a promotion. The test costs around $200 Canadian, takes approximately three hours to complete, and test-takers must be at least 14 years old (Malone, 2010). It assesses four skill domains (reading, writing, listening, and speaking) and includes comprehension questions, multiple-choice items, charts, matching tasks, audio passages, and item recall. It can be considered an alternative to other high-stakes standardized tests, such as the TOEFL and CET, because it is topic-based and incorporates construct-related responses (Malone, 2010).

Literature Review - Validity and Reliability


Reliability can be determined in four different ways: test-retest, parallel form, inter-rater, and internal consistency, and validity errors can be seen as either systematic or random (Airasian, Engemann, & Gallagher, 2012). According to Malone (2010), the validity of the CAEL is assessed by how well it answers the specific question: "Does the test taker have the ability to meet the English language requirements of the university degree programs?" The four CAEL domains previously mentioned are created using introductory postsecondary course requirements, which helps ensure validity. Test developers attend lectures, observe students, and analyze materials related to these courses through a variety of methods, including "surveys, interviews, classroom observation and analys[e]s of university-level assignments and exams" (Malone, 2010, p. 633). These developers also use criterion-related validity to compare CAEL scores with scores on the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS) (Malone, 2010). Validity is also determined through the use of a post-CAEL survey, which encourages test-takers to discuss their experiences with the test (Malone, 2010).
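
To make the idea of criterion-related validity concrete, the sketch below estimates it as a simple Pearson correlation between paired scores on two instruments. This is an illustration only: the scores are invented and do not come from Malone (2010) or from actual CAEL or TOEFL data.

```python
# Illustrative sketch of criterion-related validity: correlating hypothetical
# CAEL scores with scores from a reference test (e.g., the TOEFL) for the
# same test-takers. All values below are invented for demonstration.
import numpy as np

cael_scores = np.array([40, 50, 50, 60, 60, 70, 70, 80, 80, 90])      # hypothetical
toefl_scores = np.array([61, 74, 71, 80, 84, 92, 95, 101, 105, 110])  # hypothetical

# Pearson's r: a strong positive correlation would support the claim that
# the two instruments tap a similar underlying construct.
r = np.corrcoef(cael_scores, toefl_scores)[0, 1]
print(f"Criterion-related validity estimate (Pearson's r): {r:.2f}")
```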
However, the test typically uses Carleton University introductory courses as its foundational structure, and thus might not reflect introductory courses in all postsecondary institutions (Malone, 2010). To combat validity issues, the CAEL assessment has 14 parallel forms; is frequently updated; its scorers are highly trained; and the developers often perform studies to assess its validity and reliability (Malone, 2010). One of these studies tracked test-takers' GPAs after successful completion of the CAEL. According to this study, 79 students scored above 70% in their introductory classes, and only two of those 79 failed courses (Malone, 2010). However, as Malone points out, the developers do not publicly release how many participants this involved, or details about them, unless requested. This means that the results cannot be generalized to all CAEL assessment environments (Malone, 2010).
Malone (2010) explains that reliability is determined "by test domain" (p. 634). For example, within the speaking domain, "each speaking performance is rated by one rater with regular checks for interrater reliability" (Malone, 2010, p. 634). Overall, rater scores vary by approximately 10%. When such variances occur, multiple raters collaborate and agree upon a final score (Malone, 2010). Reliability is also determined through "construct irrelevant variance" (Malone, 2010, p. 634). For example, in order to eliminate topic advantage, each question has the answer hidden within it; in other words, additional knowledge is not needed to find the correct answer.
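
The minimal sketch below illustrates the kind of rater-discrepancy check described above, assuming (hypothetically) a 0-90 score scale, a second rater who spot-checks performances, and adjudication for score pairs that differ by more than the roughly 10% variance Malone (2010) mentions. The scale, procedure, and scores are assumptions for illustration; the actual CAEL scoring workflow is not published in this level of detail.

```python
# Hypothetical inter-rater check: flag spot-checked performances whose two
# ratings differ by more than 10% of an assumed 0-90 scale for adjudication.
SCALE_MAX = 90                 # assumed score ceiling (illustrative)
THRESHOLD = 0.10 * SCALE_MAX   # the ~10% variance noted by Malone (2010)

# Invented (rater_1, rater_2) scores for five spot-checked performances.
paired_scores = [(60, 62), (70, 80), (50, 51), (80, 68), (40, 43)]

for i, (r1, r2) in enumerate(paired_scores, start=1):
    if abs(r1 - r2) > THRESHOLD:
        # In the procedure Malone describes, multiple raters would now
        # collaborate and agree upon a final score.
        print(f"Performance {i}: discrepancy {abs(r1 - r2)} exceeds "
              f"{THRESHOLD:.0f}; send to adjudication")
    else:
        print(f"Performance {i}: ratings agree within tolerance")
```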
Jennings, Fox, Graves, and Shohamy (1999) discuss how topic advantage can decrease an instrument's validity and reliability. Topic advantage can be due to comprehension, interest, and prior knowledge of a subject (Jennings et al., 1999). Jennings et al. (1999) investigated the results of CAEL test-takers who were able to choose their test topic (out of five topics in total) in comparison to test-takers who were assigned a topic. Participants included 254 ESL CAEL test-takers from various cultural backgrounds, including China, Iran, and Korea (Jennings et al., 1999). The methodology analyzed four dependent variables, "proficiency level, reading response, lecture response and essay response" (Jennings et al., 1999, p. 437), against the independent variables (the test domains: reading, writing, listening, and speaking). Half of the participants chose their topic, the other half did not, and the researchers randomly assigned participants to the groups. The study can be considered authentic, as it tested students on regular CAEL testing days (Jennings et al., 1999). After completing the CAEL test, students completed a questionnaire about their experiences with and perceptions of the test.
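
As a simplified stand-in for the group comparison just described, the sketch below tests whether mean scores differ between a choice and a non-choice condition using an independent-samples t-test. This is not the authors' actual statistical analysis, and the scores are invented for illustration.

```python
# Hypothetical comparison of essay scores between test-takers who chose
# their topic and those who were assigned one (invented data).
from scipy import stats

choice_group = [62, 70, 65, 71, 68, 74, 66, 69]
no_choice_group = [64, 68, 63, 72, 67, 70, 65, 71]

# A large p-value here would be consistent with the study's finding that
# the two groups performed similarly.
t_stat, p_value = stats.ttest_ind(choice_group, no_choice_group)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```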
Jennings et al. (1999) concluded that the choice and non-choice groups had relatively similar performance outcomes, meaning that the two conditions were comparable. Thus, the results from this study validate the CAEL test, in that topic effect does not impact test-takers' performances on the test. However, within the questionnaire, test-takers stated that time is the most influential test component, and that having choice on tests is important to them (Jennings et al., 1999). This suggests that when students are not given enough time to complete a test, time acts as a systematic error, resulting in many students feeling rushed and unable to complete the test to the best of their ability. Also, many test-takers expected a topic effect to occur, even though results showed that the difference was small (Jennings et al., 1999). The results invite further examination of the pattern of choice and of whether proficiency level could affect the results (Jennings et al., 1999). Although this article is over a decade old, it remains relevant because it helps eliminate topic effect as a potential validity issue.

Another reliability issue involves motivation and test anxiety, which are considered mutable test factors because when one changes, it directly affects the other (Cheng et al., 2014). For example, if a student is highly motivated to perform well on a test, their test anxiety is likely to be lower. Cheng et al. (2014) use the term "ability-in-language-user-in-context" when referring to how social, educational, and economic influences affect one's motivation, anxiety, and thus performance on a high-stakes standardized test, including how these factors affect the validity of test development and scoring. Their research focuses on test-takers' motivation and anxiety across three high-stakes English proficiency standardized tests (the CAEL, CET, and GEPT). It examines the social, economic, and educational factors that can cause systematic errors depending on a test-taker's gender, stakeholders, and the test itself (Cheng et al., 2014). Cheng et al. (2014) ask the question: "What are the interrelationships between motivation and test anxiety constructs in terms of language test performance across the three testing contexts after controlling for personal variables of gender and age?" (p. 307). They note that prior research tends to examine motivation and anxiety separately from one another, but that these two factors need to be researched together because of their impact on society. Canada continues to receive immigrants, and if these immigrants choose to study at the postsecondary level, we need to ensure that our English proficiency tests are valid, as these test scores affect admission, program enrollment, graduation, and sometimes even immigration (Cheng et al., 2014).
The researchers used a "nested research design" methodology, which compared the CAEL, CET, and GEPT (Cheng et al., 2014, p. 307). They used a four-part questionnaire that focused on motivation, anxiety, perceptions, and the purpose of the test. The questions used a rating scale from 1 (unimportant) to 6 (very important) and addressed topics such as perceptions, the student's background, and length of time spent in Canada. Results from the questionnaire indicated that CAEL test-takers have the highest amotivation and low levels of intrinsic motivation, that males tend to be more confident than females, and that older female test-takers performed better on the CAEL. In comparison to CET and GEPT test-takers, CAEL test-takers found the test more important to themselves and their parents, for acceptance into postsecondary programs and future job endeavors, and found the test purpose more important than the other test groups did, which the researchers found surprising, given that the other tests have a greater international reputation and can also be completed internationally (Cheng et al., 2014).
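
As a concrete illustration of how responses on such a 1-6 rating scale might be summarized by test group, consider the short sketch below. The group labels mirror the three tests in the study, but the response values are hypothetical, and the real four-part instrument contained many more items.

```python
# Summarizing hypothetical 1 (unimportant) to 6 (very important) ratings
# for one questionnaire item, grouped by testing context.
from statistics import mean

responses = {
    "CAEL": [2, 3, 2, 4, 3, 2],   # invented intrinsic-motivation ratings
    "CET":  [4, 5, 4, 3, 5, 4],
    "GEPT": [3, 4, 4, 5, 3, 4],
}

for test, ratings in responses.items():
    print(f"{test}: mean rating {mean(ratings):.2f} on the 1-6 scale")
```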
By comparing the CAEL assessment to other high-stakes standardized English proficiency tests, this study provided more evidence that CAEL test-takers are heavily influenced by social, educational, and economic factors that can cause random errors during test performances, such as fear of test failure. Although Cheng et al. (2014) use various scales and methods to determine the validity and reliability of their study, this does not guarantee the validity of the CAEL test itself. Instead, the results show that the perceptions of test-takers affect their overall performance on tests, and that motivation and test anxiety are random errors affecting CAEL assessment results. Despite these results, the CAEL assessment can still be considered to have high validity and reliability, because random errors will always occur (and are difficult to control), whereas systematic errors reveal discrepancies within a test's structure.
The last study examined in this critical analysis focuses on the CAEL test as a form of diagnostic assessment, which helps test-takers refine their literacy skills. Diagnostic assessment is a type of formative assessment, which can help build autonomy, as it is a constructivist, student-centered approach to learning (Doe, 2015). Formative assessments collect and synthesize information needed to help improve learning during instruction, whereas summative assessments use this same information to assign grades and make decisions about students' learning (Airasian et al., 2012). Diagnostic assessment determines what learners already know about a topic, and thus guides learning. As CAEL test-takers are ESL students, some do not have experience using formative assessments; therefore, they might not have the metacognitive skills or strategies required to help them improve their independent learning (Doe, 2015).
The CAEL assessment provides learners with feedback on their test results, and if the target user, the student, does not use the feedback, this nonuse "poses a threat to the validity of an interpretative argument for a test," which states that the test use is beneficial (Doe, 2015, p. 113). This study asked 47 EAP program participants, with various cultural backgrounds, a series of questions regarding diagnostic assessment and whether or not their CAEL test feedback helped them improve their literacy skills (Doe, 2015). The researchers used a questionnaire that combined Likert-scale items with yes/no and open-ended questions. One of the questions posed was, "How much do you think the diagnostic information presented in the learner profile reflects your actual language ability?" and 35 out of 38 participants who answered this question did not entirely agree (Doe, 2015, p. 118). This suggests a discrepancy between the participants' perceptions of themselves and their skills as assessed by the test. However, Doe states that, overall, test-takers found the CAEL test feedback to be a useful source for helping them improve their literacy (Doe, 2015). This indicates that after completion of the CAEL test (regardless of whether or not one has succeeded), the feedback can inform test-takers on how to improve.
Critical Analysis
So far, this paper has identified several studies that question the validity and reliability of the CAEL assessment. The first study discussed how CAEL developers consider the test valid and reliable. Malone (2010) suggests that CAEL assessment developers demonstrate parallel-form reliability (because there are 14 versions of the test) and inter-rater reliability (as there are multiple raters who are highly trained and work together to resolve scoring issues). The test is also considered valid because it strives to assess the same literacy skills required for postsecondary introductory courses (Malone, 2010). However, as the test was created based on Carleton University courses, does it generalize consistently? The test might not be applicable to all introductory courses across Canada. Also, Malone does not discuss the subjectivity of having multiple raters, which can cause random errors in testing. The second study discussed topic effect as a potential validity issue; however, results determined that the CAEL assessment has construct-related validity, which means that students need to know the form of the test, rather than specific content (Jennings et al., 1999). The third study revealed that motivation and test anxiety are random validity errors that can affect a test-taker's performance, but these errors are more difficult to eliminate (as it is easier to control systematic errors). Also, random errors affect all types of high-stakes standardized tests, not just the CAEL (Cheng et al., 2014). The last article discusses how the CAEL test can be used as a diagnostic assessment for students after the test. Therefore, because the test has multiple purposes (it permits individuals to study at a postsecondary institution and provides individuals with feedback that will enhance their English language learning), it can be considered more valid.
The aforementioned articles demonstrate that the CAEL assessment has high reliability and validity. However, none of the articles mentioned how home literacy practices affect validity. For example, ESL learners struggle to identify literacy practices outside of school (such as reading emails, searching the web, or being interviewed) (Zheng et al., 2011). Therefore, if these students have not practiced standardized test constructs, or do not know how literacy is defined in Canada, they might not succeed on the test.
The CAEL test also lacks pluralistic assessment, which refers to being responsive to cultural diversity (Airasian et al., 2012). For example, although most test-takers are foreign (Jennings et al., 1999), none of the articles mention whether or not the test topics are culturally inclusive. It is acknowledged that most of the students come from foreign countries and need to learn the test's constructs, but are these topics reflective of cultures outside of the Eurocentric domain? Although Jennings et al.'s (1999) study revealed that topic effect does not increase student success on the test, topic effect (in this case, including multiculturalism within a test) could contribute to a student's motivation towards completing the test, as having relatable material might make them more intrinsically motivated (Ryan & Deci, 2000).
Overall, although the CAEL test has high validity and reliability, its practicality is quite low. For example, although this test is more accommodating than other tests, as it is topic-based and includes construct-related responses, it does not have the same international popularity that other tests have. Due to this international unfamiliarity, test-takers who wish to study abroad would benefit from taking a test with greater international recognition (Malone, 2010). The CAEL assessment has high impact because failure on the test can result in having to spend more money to retake it, or can affect admission requirements, immigration, and visa applications. Also, successful completion of the test determines one's acceptance into a college or university. In summary, the CAEL assessment is a valid and reliable instrument for ESL learners who wish to enhance their current skills in order to get a job promotion or study at the postsecondary level in Canada. Those wishing to seek an education in other countries would benefit from taking a test with a higher international reputation.

References
Airasian, P., Engemann, J., & Gallagher, T. (2012). Classroom assessment: Concepts and applications.
Whitby, ON: McGraw-Hill Ryerson.

Cheng, L., Klinger, D., Fox, J., Doe, C., Jin, Y., & Wu, J. (2014). Motivation and test anxiety in test
performance across three testing contexts: The CAEL, CET, and GEPT. TESOL Quarterly,
48(2), 300-330.
Doe, C. (2015). Student interpretations of diagnostic feedback. Language Assessment Quarterly, 12(1),
110-135.
Jennings, M., Fox, J., Graves, B., & Shohamy, E. (1999). The test-takers' choice: An investigation of the
effect of topic on language-test performance. Language Testing, 16(4), 426-456.
Malone, M. E. (2010). Test review: Canadian Academic English Language (CAEL) Assessment.
Language Testing, 27(4), 631-636.
Paragon Testing Enterprises. (2015). CAEL Assessment. Retrieved February 28, 2016, from
https://www.cael.ca/paragon-has-acquired-the-cael-assessment/
Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic
motivation, social development, and well-being. American Psychologist, 55(1), 68-78.
Zheng, Y., Klinger, D. A., Cheng, L., Fox, J., & Doe, C. (2011). Test-takers' background,
literacy activities, and views of the Ontario Secondary School Literacy Test. Alberta
Journal of Educational Research, 57(2), 115-136.
