Irish Educational Studies, 2016
http://dx.doi.org/10.1080/03323315.2016.1147973

Agreeableness and Conscientiousness as predictors of university students’ self/peer-assessment rating error

Parviz Birjandi^a and Masood Siyyari^b*

^a Department of English Language Teaching, Science and Research Branch, Islamic Azad University, Tehran, Iran; ^b Science and Research Branch, Hesarak, Ashrafi Esfahani Highway, Tehran, Iran

(Received 24 March 2014; accepted 22 January 2016)

This paper presents the results of an investigation into the role of two personality
traits (i.e. Agreeableness and Conscientiousness from the Big Five personality
traits) in predicting rating error in the self-assessment and peer-assessment of
composition writing. The average self/peer-rating errors of 136 Iranian English
major undergraduates assigned to two equal-sized groups, one doing self-
assessment and the other peer-assessment over nine sessions, were correlated with
and regressed against their measures of Agreeableness and Conscientiousness.
The results indicated that Agreeableness and Conscientiousness only explained
7% of the variation in self-rating error. Including both Agreeableness and
Conscientiousness in a model to predict peer-rating error explained 16% of the
variation. When Agreeableness and Conscientiousness were both included in a
multiple regression analysis, only Agreeableness was found to significantly
predict peer-rating error. Based on these findings, some conclusions are drawn,
and pedagogical implications are discussed.
Keywords: agreeableness; conscientiousness; self/peer-assessment; rating error

*Corresponding author. Email: m.siyyari@srbiau.ac.ir
© 2016 Educational Studies Association of Ireland

Introduction
With the advent of educational assessment as opposed to traditional summative psy-
chometric testing, assessment in support of learning became one of the major goals to
pursue in education (Gipps 1994). To accomplish the goals of educational assessment,
alternative means of assessment are considered most effective; these include the use of
checklists, videotapes, audiotapes, teacher observations, journals, logs, conferences,
portfolios, self-assessment, and peer-assessment (Brown and Hudson 1998; McKay
2006).
Among the alternative means of assessment, self- and peer-assessment have
attracted much attention in recent years owing to the growing emphasis on learner
independence and autonomy, as well as power sharing between teachers and learners
in education (Boud and Falchikov 2007; Taras 2008). Self- and peer-assessment have
been viewed as having significant pedagogical values since they are cognitively
demanding tasks which require and encourage intelligent self-questioning, post hoc
reflection, learners’ ownership and management of learning processes, sense of per-
sonal responsibility and accountability, self-efficacy, and meta-cognition (Topping
1998, 2003). Self-assessment and peer-assessment also require less time to conduct in
the classroom and involve the students in the process of assessment, which by itself can
lead to increased learner autonomy, self-confidence, and higher motivation (Dochy,
Segers, and Sluijman 1999; Harris 1997; Oscarson 1989; Topping 1998). Examples
of empirical studies supporting the pedagogical benefits of self- and peer-assessment
abound in the literature. Detailed reviews of these studies can be found in works
by Dochy, Segers, and Sluijman (1999), Falchikov and Goldfinch (2000), and
Topping (1998, 2003).
Despite their significance, self- and peer-assessment elude full integration in
different settings (Cassidy 2007), including the field of language teaching and assess-
ment. Several reasons for this have been reported in the literature; however, the
main reason lies in teachers’ doubt about the ability of the learners to assess themselves
accurately and objectively (Cassidy 2007; Falchikov and Goldfinch 2000; Oscarson
1989). Research findings on the accuracy and reliability of self- and peer-assessment
have also added to the uncertainty of teachers and administrators about the learners’
ability to do self- and peer-assessment reliably since these findings are not consistent
(Dochy, Segers, and Sluijman 1999; Patri 2002; Pope 2005; Topping 1998).
The literature review of self/peer-assessment indicates that some factors and vari-
ables can account for bias and inaccuracy in self/peer-assessment (Falchikov and
Goldfinch 2000). Probably the most important of these variables are:

(1) the seniority and ability level of self/peer-assessors (e.g. Blanche 1988; David-
son and Henning 1985; Falchikov and Boud 1989; Falchikov and Goldfinch
2000; Heilenmann 1990; Jafarpur and Yamini 1995; Janssen-van Dieten
1989; Topping 1998);
(2) clarity of marking criteria and student training (Brantmeier and Vanderplank
2008; Brown 2005; Oldfield and MacAlpine 1995; Orsmond, Merry, and
Reiling 2000; Smith, Cooper, and Lancaster 2002; Topping et al. 2000);
(3) the self- or peer-assessed subject area (Falchikov and Boud 1989);
(4) the number of students involved in peer-assessment (Stefani 1994);
(5) the setting where self- or peer-assessment is conducted (i.e. classroom vs. pro-
fessional settings) (Falchikov and Goldfinch 2000);
(6) assessment criteria specificity and the choice of scoring method (i.e. analytic
vs. holistic scoring) (Lejk and Wyvill 2001, 2002; MacAlpine 1999; Miller
2003);
(7) feedback presence (Taras 2001);
(8) cultural norms (Matsuno 2009);
(9) self/peer-rater’s gender (Andrade et al. 2009; Falchikov and Magin 1997;
Langan et al. 2008; Pope 2005);
(10) learning styles (Cassidy 2007; Vickerman 2009);
(11) stress, attitudinal issues, affectivity, and personality traits (AlFallay 2004;
Blanche 1988; Dlaska and Krekeler 2008; Pope 2005).

The above factors and variables can be classified as either situational deter-
minants or individual determinants of self/peer-assessment accuracy. Situational
determinants, such as assessment setting, ability level, assessment method, or assess-
ment criteria have the potential to be controlled and improved to enhance the accuracy
of self- and peer-assessment; however, individual determinants like learning styles,
attitude, and personality cannot be controlled as situational determinants are.
Based on the arguments of Bernardin, Cooke, and Villanova (2000) and Yun et al.
(2005), phenomena such as rating elevation (i.e. restricting ratings to the positive end
of the continuum on a rating scale), rating leniency (overestimation), and rating severity
(underestimation), which are all different manifestations of rating error and inaccur-
acy, are immune to any control since the literature indicates that they are stable charac-
teristics in raters which can be predicted via measuring individual differences including
personality traits (e.g. Borman and Hallam 1991; Kane et al. 1995; Tziner et al. 2002,
as cited in Yun et al. 2005). These rating behaviours are in fact stable since the
personality traits affecting them are stable by nature. Support for the stability
of personality traits independently of age and context can be found in Costa and
McCrae’s (1997) review of many cross-sectional and longitudinal studies on this issue.
Given the above points, this study aimed at investigating the relationship between
two selected personality traits and the degree of rating error in self- and peer-assess-
ment in a language learning context. In order to narrow this study down to some specific
personality traits which might be more likely to predict self- and peer-assessment
rating error, the related literature needed to be reviewed in more depth since the
area of personality traits and how they may affect self- and peer-assessment has
received far too little attention so far.

Personality traits
The literature of personality psychology indicates that information processing func-
tions such as perception, attention, memory, and speeded response needed in cognitive
tasks relate to personality traits (Matthews, Deary, and Whiteman 2009). Similarly,
self/peer-assessment, which also involve conscious decision-making and processing
of information, could logically be assumed to be influenced by personality traits as
well. In this regard, Bernardin, Cooke, and Villanova (2000), Yun et al. (2005), and
Grahek (2007) also consider a role for personality traits in rating accuracy. For the
above reasons, affective variables and their impact on assessment and learning have
been investigated over the past years by some researchers. For instance, studies by
Clement, Gardner, and Smythe (1977), MacIntyre (1992), MacIntyre, Noels, and
Clement (1997), and Gregersen and Horwitz (2002) have investigated self-rating
and anxiety. AlFallay (2004) has also studied the role of some selected psychological
and personality traits of learners of English as a foreign language, namely motivation
types, self-esteem, anxiety, and motivational intensity in the accuracy of self- and peer-
assessments. Nonetheless, there are still many other affective and personality factors
which could be investigated in terms of their role in language learning and assessment,
and in particular self- and peer-assessment.
Among the many personality traits which might have direct bearing on rating
accuracy in self/peer-assessment, Agreeableness and Conscientiousness in the five-
factor theory of personality (Costa and McCrae 1992) were chosen to be investigated
in this study. There were two main reasons why these personality traits were selected
for investigation in this study. First, the personality traits in the five-factor theory of
personality have especially come to the fore in professional and educational psychology
as a result of Costa and McCrae’s extensive research in the area of personality traits,
utilizing every major personality theory. Costa and McCrae’s research has resulted
in a five-factor theory of personality which comprises Neuroticism, Extraversion,
Openness to experience, Agreeableness, and Conscientiousness. Their theory of per-
sonality has undergone different validation studies and has incorporated traits in
other theories of personality including the 16 factors in Cattell’s model and the
three in Eysenck’s (Giordano 2008). Moreover, most personality psychologists have
supported Costa and McCrae’s five-factor theory since cross-cultural support and
stability over time have been observed for their theory and test (Feist and Feist 2006).
The questionnaires to measure these traits have been widely employed in many set-
tings such as counselling, clinical psychology, psychiatry, behavioural medicine, health
psychology, vocational counselling, industrial/organizational psychology, and more
importantly in personality research. The field of language teaching and assessment,
however, seems not to have seen many studies involving the use of the Big Five. To
name a few studies in this regard, MacIntyre and Charos (1996) found that all of
the Big Five’s dimensions were significant predictors of language-related attitudes,
which in turn predict motivation and willingness to communicate, both of which are
directly related to foreign language learning.
Tamir (2005) also found that those high in Neuroticism can choose to increase
their level of worry when they are expected to perform demanding tasks. Moreover,
he found that this preferred increase in worry level may be also beneficial to
performance.
The second reason for directing the focus of this study on two of the Big Five per-
sonality traits was that the definitions of some of these traits, and the resultant
liabilities and characterizations, seem quite relevant for explaining the extent of
learners’ error in self-assessment and peer-assessment. For instance, Conscientious-
ness in the Big Five is believed to be associated with several instances of behaviour,
such as the ‘reliability’ (Costa and McCrae 1992) of a person’s conduct. It sounds
reasonable to hypothesize that a reliable person, as his/her Conscientiousness drives
him/her to be so, may assess himself/herself or his/her peers more reliably and
objectively.
Agreeableness, too, is associated with some characteristics and dispositions which
might predict rating error in self/peer-assessment. Some of these characteristics such as
altruism, compliance, and trustworthiness might lead a person to follow the teacher’s
instructions on how to carry out an assessment task more accurately based on the
assessment criteria. The literature also suggests that Agreeableness is positively associ-
ated with job and learning performance (e.g. Cortina et al. 1992; Mohammed,
Mathieu, and Bartlett 2002; Salgado 1997; Touze 2005). In light of the above
points, it might then be logical to expect that those high in Agreeableness may have
a better performance in learning how to conduct self/peer-assessment. In addition
to the above points, what should be considered with caution is that self/peer-assess-
ment are usually considered two sides of the same coin in educational settings;
that is to say, they are usually employed by the teachers not assuming any difference
between the two. However, the literature suggests that different mental, social, and
affective mechanisms may be at play. For instance, Nanine et al. (2010) argue that
peer-assessment as opposed to self-assessment is a social and interpersonal activity
involving interactional processes especially due to the fact that students are supposed
to rate their own peers and classmates with whom they may have social relationships of
different natures. At the same time, it should be also considered that according to
Costa and McCrae (1992), Agreeableness is a trait which has more to do with
interpersonal tendencies; therefore, it can be hypothesized that a trait such as Agree-


ableness may influence rating accuracy in self-assessment and peer-assessment differ-
ently. This hypothesis is partly in agreement with Kane et al.’s (1995; as cited in Yun
et al. 2005) suggestion that ‘raters’ Agreeableness may predict leniency when raters
anticipate future interaction with the ratees after the ratings … Thus, raters high on
Agreeableness may show less accurate ratings when they expect to provide face-to-
face feedback to ratees than when they do not’ (p. 98). Definitely, peer-assessment
also involves a kind of rating after which the peer-raters have social interaction with
one another and their ratees; therefore, Agreeableness may affect rating accuracy in
peer-assessment differently from that in self-assessment.
In sum, motivated by the above reasoning and background information, this study
investigated if Agreeableness and Conscientiousness have any relationship with rating
error in the self-assessment and peer-assessment of writing skill. And if so, to what
extent can these two personality traits predict self-rating and peer-rating error? Natu-
rally, the more that is known about the nature of self/peer-assessment and what factors
may influence the extent of learners’ rating error in doing self/peer-assessment, the
more awareness the teachers and educators will have of how to use the scores resulting
from self/peer-assessment in their teaching and assessment practice.

Method
Participants
The participants of this study consisted of 136 Iranian male and female adult under-
graduate students studying different English language majors, including English lit-
erature, English translation, and English language teaching, at Allameh Tabataba’i
University, the South Tehran Teacher Training Branch of Islamic Azad University,
and Alborz Higher Education Institute. The participants were aged between 18 and
29, and the needed data for this study were collected from the participants while
attending a two-credit 16-week course on academic composition writing normally
offered to the students in the third term of the bachelor’s programme. Since intact
classes were used, the classes were arbitrarily assigned to self-assessment and peer-
assessment groups by using a ‘semi-randomization procedure’ (Mackey and Gass
2005, 143). Table 1 shows how the participants were assigned to the two groups. In
the beginning of the course, the participants were all informed of the purpose of the
research, and they were at liberty to avoid answering the personality questionnaires;

Table 1. Participants’ assignment to groups.

University                                        Peer-assessment group           Self-assessment group
Allameh Tabataba’i University                     n = 33 (18 females, 15 males)   n = 0
South Tehran Teacher Training Branch of
  Islamic Azad University                         n = 0                           n = 35 (19 females, 16 males)
Alborz Higher Education Institute                 n = 35 (18 females, 17 males)   n = 33 (20 females, 13 males)
Total                                             n = 68                          n = 68
however, self- or peer-assessment exercises were part of the main common activities of
the current course and of other similar courses taught in the past by the same instructors, and
the extent to which the students took self- or peer-assessment seriously counted for
10% of their final term score. It should be noted that the students’ rating
accuracy was never considered in their term score, but their active participation in
self/peer-assessment was always encouraged and rewarded in the final term score.
Their whole term score was in fact a composite score including scores for a final
achievement test, class participation, and assignments. Given these points, the self-
or peer-assessment data of those students who had also answered the questionnaire
were included in this study. Fortunately, except for six students who had to drop out of
the course for reasons unrelated to the course or this research, the rest of the students
agreed to have their data collected and analysed in this study.
Instruments
To collect the necessary data for this study, the following instruments
were required.

Writing scale
The writing scale employed for scoring the compositions of the participants was the
English as a Second Language (ESL) composition profile by Jacobs et al. (1981)
which has formerly been used for teacher-, self-, and peer-assessment by Saito and
Fujita (2004) and Matsuno (2009) as well. Jacobs et al. (1981) have provided impress-
ive indices on the reliability and validity of this scale, including interrater reliability
coefficients between two, three, and four raters (ranging from .85 to .93), intercorrela-
tions of the components of the scale (ranging between .64 and .89), Cronbach’s coeffi-
cient alpha (.89), and results from a differential groups construct validity. For the
purpose of the present study too, the same indices were computed on a sample of
30 compositions, the results of which demonstrated coefficients above .80.
To conduct a differential-groups construct validity study, the same participants’
scores, which were collected in the beginning of a writing course, were compared
with their own scores at the end of the writing course via a paired-samples t-test,
which indicated a significant average increase from pretest to posttest; t (30) =
−6.72, p < .01, d > 1.
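For readers who want to reproduce this kind of check, the following is a minimal sketch of a paired-samples t-test with Cohen’s d for paired data; the score arrays are hypothetical placeholders, not the study’s data.

```python
# A minimal sketch of the differential-groups validity check reported above:
# a paired-samples t-test on the same students' pretest and posttest
# composition scores, with Cohen's d for paired data.
# The score arrays are hypothetical placeholders, not the study's data.
import numpy as np
from scipy import stats

pretest = np.array([61, 55, 70, 58, 66, 72, 49, 63])
posttest = np.array([68, 64, 75, 66, 71, 80, 58, 70])

t_stat, p_value = stats.ttest_rel(pretest, posttest)
diff = posttest - pretest
cohens_d = diff.mean() / diff.std(ddof=1)  # effect size for paired samples

print(f"t({len(diff) - 1}) = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")
```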
In the present study, the ESL composition profile (Jacobs et al. 1981) was used for
both raters’ ratings and students’ self- and peer-ratings. It should be noted that this
scale was originally accompanied by scoring rubrics and brief descriptors for every
key word and component to do with writing ability (i.e. content, organization, voca-
bulary, language use, and mechanics); however, all the descriptors and the components
of writing ability were even further explained and illustrated by the researcher in a sep-
arate pamphlet for both the participants and raters. This explanatory pamphlet was
mainly based on books by Jacobs et al. (1981), Hughey et al. (1983), Arnaudet and
Barrett (1990), and Zemach and Stafford-Yilmaz (2008). The participants’ pamphlet
differed to some extent from that of the raters since the scale was translated into
Persian for the students, and the wording of the descriptors was simpler and less tech-
nical with more examples. Finally, a set of anchor scripts receiving the different band
scores for each writing component on the scale was appended to both pamphlets.
These anchor scripts were actually sample compositions from students who had for-
merly taken the course, and the raters had rated them with very high inter-rater
reliability. Provision of such an annotated scale was based on the works and sugges-
tions of many researchers in the literature of assessment (e.g. Andrade and Valtcheva
2009; Brown 2005), who believe that self/peer-assessment will prove accurate and ped-
agogically valuable when students know what it is that they are assessing and what per-
formance is acceptable on the task.

Proficiency test
Since language proficiency has been found to be a strong source of variation in rating
accuracy of self- and peer-assessment (Blanche 1988; Davidson and Henning 1985; Hei-
lenmann 1990; Janssen-van Dieten 1989), it was necessary to have groups of partici-
pants which were not significantly different in terms of language proficiency level. To
this aim, the participants’ proficiency level was determined by means of the Oxford Pla-
cement Test (OPT). This test was chosen on the grounds that according to Allan (2004),
OPT has been carefully calibrated against the proficiency levels based on the Common
European Framework of Reference for Languages (CEF) and some major international
examinations such as TOEFL. Each test is divided into two sections (Listening and
Grammar), and each section consists of 100 items. These sections are integrated with
reading skills and vocabulary in context at the same time. Despite the test’s impressive
item facility values, discrimination indices, item and inter-test reliability, concurrent
validity, and predictive validity, the concurrent validity of the test was further estab-
lished by calculating the Pearson correlation coefficient between 32 participants’
scores on subskills of the OPT and a retired paper-based TOEFL (Table 2). These par-
ticipants were, in fact, students of other courses taught at the same universities who had
demographic information similar to the final participants of the present study.
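A minimal sketch of such a concurrent-validity check, assuming nothing more than two total-score vectors for the same examinees, is given below; the numbers are hypothetical placeholders, not the study’s data.

```python
# Concurrent validity as described above: a Pearson correlation between the
# same examinees' scores on two tests. Hypothetical placeholder data.
import numpy as np
from scipy import stats

opt_total = np.array([132, 145, 120, 160, 110, 150, 138, 125])
toefl_total = np.array([520, 560, 490, 600, 470, 575, 540, 500])

r, p = stats.pearsonr(opt_total, toefl_total)
print(f"r = {r:.2f}, p = {p:.4f}, n = {len(opt_total)}")
```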

Personality traits inventory
Agreeableness and Conscientiousness are measured by two subscales included in the
inventories for measuring the Big Five. There are two inventories for measuring
the Big Five, one called The Revised NEO Personality Inventory (NEO-PI-R), and
the other The NEO Five-Factor Inventory (NEO-FFI). Both of these inventories
are scored on a Likert scale of 1–5. The difference between the two lies in the fact

Table 2. Correlations between OPT and TOEFL subskills and total scores.

                       TOEFL structure   TOEFL listening   TOEFL reading   TOEFL total
OPT grammar      r     .71**             .83**             .91**           .89**
                 p     <0.01             <0.01             <0.01           <0.01
OPT listening    r     .72**             .87**             .92**           .91**
                 p     <0.01             <0.01             <0.01           <0.01
OPT total        r     .72**             .86**             .92**           .90**
                 p     <0.01             <0.01             <0.01           <0.01

Note: n = 32. **p < .01.


that the NEO-PI-R consists of 240 items, which allows a comprehensive assessment of
adult personality on all the above-mentioned personality dimensions and their facets;
however, the NEO-FFI consists of 60 items and is recommended for use under time
constraints, when only global information about personality is required.
For the purpose of this study, the Persian translation of the form S of the NEO-
FFI, approved and made available by its publisher, Psychological Assessment Resources,
Inc., was given to the participants since time did not allow for the administration of
the NEO-PI-R.
Despite some reliability and validity evidence for the Persian version of NEO-FFI,
such as the evidence found by Garousi, Mehryar, and Ghazi Tabatabayi (2001) which
included Cronbach’s Alpha coefficients ranging between 0.66 and 0.87, and criterion-
related validity with coefficients between 0.65 and 0.76, a pilot study was carried out to
collect some validity and reliability evidence for the scale.
To begin with, a total of 60 undergraduate students, broadly similar to the
final participants of the study, were chosen for the pilot study. The coefficient alphas for
the Agreeableness and Conscientiousness dimensions of the Persian NEO-FFI were
computed. Compared to the coefficients reported by Costa and McCrae (1992)
(between .86 and .95) and other studies, and given that the alphas
were found to be .83 and .89 for Agreeableness and Conscientiousness, respectively, it
was concluded that these two subscales were internally consistent.
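As an illustration of the internal-consistency index used here, the following is a minimal sketch of Cronbach’s alpha computed from a respondents-by-items matrix; the simulated responses are hypothetical, not the pilot data.

```python
# A minimal sketch of the internal-consistency index used above: Cronbach's
# coefficient alpha computed from a respondents-by-items matrix.
# The simulated responses are hypothetical, not the pilot data.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = respondents, columns = Likert items."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(0)
trait = rng.integers(1, 6, size=(60, 1))                  # common trait signal
responses = np.clip(trait + rng.integers(-1, 2, (60, 12)), 1, 5)

print(f"alpha = {cronbach_alpha(responses):.2f}")
```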
Furthermore, the retest reliability of the subscales was investigated by asking the
same participants of the pilot study to retake the tests three weeks after the first admin-
istration. Retest reliability coefficients for the Agreeableness and Conscientiousness
subscales were .80 and .83, respectively, which are again acceptable in comparison to
the range (between .79 and .83) reported by Costa and McCrae (1992).

Procedure
Personality tests administration
After collecting the above validity evidence, the Agreeableness and Conscientiousness
tests were administered to the final participants of the study, and alpha for each trait
was calculated. The alphas this time were .85 and .88 for Agreeableness and Con-
scientiousness, respectively.

Proficiency test administration
In the beginning of the course, the OPT was administered to the students of the groups
to determine their level of general English proficiency. Descriptive statistics on the
groups’ proficiency scores are presented in Table 3.

Table 3. Descriptive statistics on groups’ proficiency scores.

                         n     Min   Max   Mean     SD
Peer-assessment group    68    62    188   135.97   28.78
Self-assessment group    68    78    182   130.02   26.10

Note: Maximum possible proficiency score = 200.


According to the OPT manual, the participants’ average language proficiency
level lay on the borderline between Lower Intermediate Modest User and Upper Inter-
mediate Competent User (based on the OPT), and B1 Threshold Independent User and
B2 Vantage Independent User (based on CEF) (Allan 2004). Since general English
proficiency is an important factor in determining writing performance, it was ideal
to have similar groups in terms of general English proficiency; therefore, the profi-
ciency means of the groups were compared. Since the Kolmogorov–Smirnov and
Shapiro–Wilk tests showed the data were not normally distributed (p < .05), the non-
parametric Kruskal–Wallis Test was used to compare the proficiency means of the
groups, which showed the groups were not significantly different; H = 1.84, df = 2,
p > .05.
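A minimal sketch of this sequence of checks (per-group normality tests followed by a non-parametric comparison) is given below, with hypothetical score arrays standing in for the intact classes’ OPT scores.

```python
# A minimal sketch of the comparison described above: Shapiro-Wilk normality
# checks per group, then a non-parametric Kruskal-Wallis test.
# The arrays are hypothetical stand-ins for the intact classes' OPT scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = {
    "class A": rng.integers(62, 189, size=33),
    "class B": rng.integers(62, 189, size=35),
    "class C": rng.integers(62, 189, size=68),
}

for name, scores in groups.items():
    w, p = stats.shapiro(scores)   # p < .05 suggests non-normality
    print(f"{name}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

h, p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis H = {h:.2f}, df = {len(groups) - 1}, p = {p:.3f}")
```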
Rater/instructor training
One of the researchers of this study as well as two English major instructors, who were
experienced English language teachers at institute and university levels and held
Master’s and Bachelor’s degrees in TEFL, rated the writing performances of the partici-
pants. The rater training was conducted in several sessions by the researcher, who acted
as the leader in the training process, based on the procedures of Educational Testing
Service elaborated on by Weigle (2002) and the guidelines outlined by Jacobs et al.
(1981). To check the interrater reliability of the raters after the training, 30 compositions
randomly gathered from the same study participants were rated by the raters, and the
interrater reliability was calculated via intraclass correlation, which turned out to be
.92. It should be noted that the same raters were also trained to act as the instructors
of the writing courses. The training was done by holding several training and coordi-
nation meetings so that all the classes would have similar treatment.
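As a sketch of how such an interrater reliability figure can be computed, the following uses the third-party pingouin library’s intraclass correlation; the library choice and the simulated ratings are illustrative assumptions, since the study itself reports only the resulting coefficient (.92).

```python
# A sketch of one way to obtain an interrater reliability figure like the one
# above: an intraclass correlation over 30 compositions rated by three raters.
# The pingouin library and the simulated ratings are illustrative assumptions.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(2)
true_quality = rng.normal(70, 10, size=30)      # 30 compositions

rows = []
for rater in ["rater1", "rater2", "rater3"]:
    noise = rng.normal(0, 3, size=30)           # rater-specific error
    for comp_id, score in enumerate(true_quality + noise):
        rows.append({"composition": comp_id, "rater": rater, "score": score})
df = pd.DataFrame(rows)

icc = pg.intraclass_corr(data=df, targets="composition",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC"]])
```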

Self/peer-assessment training and practice
After the administration of the pretest, the writing course actually started with a two-
hour session on the basics of composition writing such as writing introductions/con-
clusions, topic sentences, supporting sentences, coherence, and cohesion. In the second
session, the ESL composition profile, accompanied by the related pamphlet containing
the full descriptors, illustrations, and anchor scripts, was introduced to the students.
The third session was also spent on the scale elaboration, and then sample compo-
sitions including the ones written on the pretest were given to the students to be
rated based on the scale and the anchor scripts. The students’ ratings were then com-
pared with those of the raters, and the rating ambiguities were discussed and resolved
by the instructors during the session.
After the sessions spent on the introduction of the scale by the instructors and
rating practice by the students, one method of composition development was intro-
duced to the students every session (e.g. comparison/contrast, description, cause
and effect, etc.). Having done the book exercises, the students were given a choice
of two topics for writing. In the peer-assessment group, the participants exchanged
their compositions with those of their peers for peer-assessment; however, the partici-
pants of the self-assessment group rated their own compositions. This was done for
nine sessions afterwards, since nine composition development methods in all were
introduced to the students.
Every session, the participants’ compositions from the previous session were rated
by the raters, and the necessary feedback was given to the students. At times, some
sample compositions were also read aloud by the students to be rated by both the tea-
chers and students together in the class. Moreover, the participants in the peer-assess-
ment group compared their own ratings with those of the raters every session. It should
be noted that an effort was made to ensure the groups received an equal amount of
instruction on composition writing.

Data analysis
The analysis of the data in this study was carried out using the Statistical Package for
the Social Sciences (SPSS for Windows, version 11), which involved the calculation of
descriptive statistics and the running of paired-samples t tests, the Kruskal–Wallis test,
correlation, and linear regression. The numerical data for this study came from
composition performance scores given by three raters. The final scores were the mean
of the three ratings, rounded to the closest integer. When a rater’s score varied
significantly from the mean (i.e. by more than one standard deviation), the deviant
rater rated the composition again.
To come up with the rating error scores, the difference between the self/peer-ratings
and the criterion scores (mean of the three raters’ ratings) were calculated for all ses-
sions. The mean of all these differences were considered as the average rating error
scores of the participants. Logically, the higher the rating error, the lower the rating
accuracy. It should be noted that in a study related to this one, the students’ improve-
ment in rating accuracy and writing performance was measured, which indicated sig-
nificant gains from the pretest to the posttest. For further details, the interested
reader is referred to Siyyari (2010).
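A minimal sketch of this rating-error computation follows. It assumes the session-level differences are taken as absolute values, an assumption consistent with the non-negative error scores reported later in Table 4; all numbers are hypothetical.

```python
# A minimal sketch of the rating-error computation described above. It assumes
# the per-session difference is taken as an absolute value, consistent with the
# non-negative error scores reported in Table 4. All numbers are hypothetical.
import numpy as np

# rows = sessions, columns = the three trained raters' scores for one student
rater_scores = np.array([
    [70, 74, 72],
    [65, 66, 70],
    [80, 78, 79],
])
student_ratings = np.array([75, 60, 83])     # the student's self/peer-ratings

criterion = rater_scores.mean(axis=1)        # criterion score per session
errors = np.abs(student_ratings - criterion) # per-session rating error
average_rating_error = errors.mean()         # the score used in the analyses

print(f"per-session errors: {np.round(errors, 2)}")
print(f"average rating error: {average_rating_error:.2f}")
```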
Before employing multiple regression, the assumptions made by this statistic were
checked. As regards sample size requirements, different authors have provided differ-
ent criteria in multiple regression. In this study, Tabachnick and Fidell’s (2001, 117)
formula to calculate the necessary sample size was employed. Based on this formula
(n > 50 + 8m, where m = number of independent variables), the sample size for each
group (n = 68) proved large enough for generalizability.
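The rule of thumb can be checked in a couple of lines; the sketch below simply instantiates the formula for the two predictors used here.

```python
# A worked check of Tabachnick and Fidell's rule of thumb cited above:
# the sample size should exceed 50 + 8m, where m is the number of predictors.
m = 2                         # predictors: Agreeableness, Conscientiousness
required = 50 + 8 * m         # = 66
n_per_group = 68

verdict = "adequate" if n_per_group > required else "too small"
print(f"required n > {required}; actual n = {n_per_group}: {verdict}")
```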
In doing the multiple regression analysis, the data for both groups were
screened for issues such as outliers (by checking the Mahalanobis distances), normal-
ity (by checking the normal probability plot), as well as linearity and homoscedasti-
city (checked via the residual scatter plots).

Results
To investigate the predictability of self-/peer-assessment rating error through Con-
scientiousness and Agreeableness, standard (simultaneous) multiple regression was
employed. The reason for employing this type of multiple regression was twofold.
First, simultaneous multiple regression allowed for seeing not only how much variance
the independent variables (personality traits) as a group could explain in the depen-
dent variable (rating error), but also how much unique variance each independent vari-
able could explain in the dependent variable. The second reason was that the
researchers lacked sufficient theoretical grounds on which to base the order of
entering the independent variables (Table 4).
Table 4. Descriptive statistics on personality and rating error.

                                           n     Min     Max     Mean      SD
Peer-assessment group  Agreeableness       68    25.00   75.00   62.6618   13.96436
                       Conscientiousness   68    25.00   75.00   61.2353   14.02569
                       Rating error        68    .00     22.63   5.8982    4.15679
Self-assessment group  Agreeableness       68    25.00   75.00   63.8529   12.08584
                       Conscientiousness   68    25.00   75.00   63.4118   13.82993
                       Rating error        68    .03     24.25   7.7106    6.26920
Valid N (listwise) = 68.
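As a concrete illustration of the analysis reported in the following paragraphs, here is a minimal sketch of a standard (simultaneous) multiple regression with tolerance/VIF collinearity diagnostics. The study used SPSS; this statsmodels version runs on simulated placeholder data, not the study’s dataset.

```python
# A minimal sketch of the standard (simultaneous) multiple regression reported
# below: rating error regressed on Agreeableness and Conscientiousness entered
# together, with tolerance/VIF as collinearity diagnostics.
# Simulated placeholder data, not the study's dataset.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 68
agree = rng.normal(62, 14, n)
consc = 0.6 * agree + rng.normal(25, 11, n)          # correlated predictors
rating_error = 15 - 0.15 * agree - 0.05 * consc + rng.normal(0, 3.5, n)

X = sm.add_constant(pd.DataFrame({"Agreeableness": agree,
                                  "Conscientiousness": consc}))
model = sm.OLS(rating_error, X).fit()
print(model.summary())          # R-squared, F test, B, t, and p values

# VIF per predictor (column 0 is the constant); tolerance = 1 / VIF.
for i, name in enumerate(X.columns[1:], start=1):
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
```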
In order to have an acceptable regression model, the multicollinearity assumption
must be met. According to Tabachnick and Fidell (2001), this assumption is
violated when there is a strong correlation (above .7) between the inde-
pendent variables. As Table 5 shows, this r is .62 and .58 for the peer-assessment and
self-assessment groups, respectively, which obviates the need to omit or combine the
variables.
Moreover, significant correlations are seen between the dependent and indepen-
dent variables (preferably above .3) in the peer-assessment group (−.39 and −.33),
which is, according to Pallant (2001), sufficient to let us proceed safely with the main
multiple regression analysis. In the self-assessment group, however, no significant
relationship is seen between the dependent and independent variables (.012 and
−.20), so little predictive contribution can be expected in the case of the
self-assessment group. As will be seen, the independent variables in the self-assessment
group, taken together or separately, made no significant contribution to prediction.
Table 6 presents the regression analysis results for the peer-assessment group. As
the R2 value shows, the model including both independent variables explains 16% of the
variance in the dependent variable, a medium to large effect size (Cohen
1992). To assess the significance of R2, the ANOVA result is checked, which
shows that the independent variables together can significantly predict and explain var-
iance in the dependent variable, F(2,65) = 6.46, p < .01. To examine the contribution
of each independent variable separately, the beta values are checked, which show that
Agreeableness contributes more and, unlike Conscientiousness, makes a
statistically significant unique contribution to the equation based on the t-test
results (p < .05), with a medium effect size (r2 = .15). In
other words, the beta value for Agreeableness indicates that a one standard devi-
ation increase in the independent variable (i.e. Agreeableness) is associated with a
decrease of .3 of a standard deviation in the dependent variable (i.e. peer-
assessment rating error), with the other independent variable (i.e. Conscientiousness)
held constant.
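The interpretation of the beta values above can be made concrete by refitting the same hypothetical model on z-scored variables, one standard way to obtain standardized coefficients:

```python
# Standardized (beta) coefficients as interpreted above: refitting the same
# hypothetical model on z-scored variables makes each slope the expected change,
# in standard deviations of rating error, per one-SD change in a predictor with
# the other predictor held constant. Data simulated as in the earlier sketch.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def zscore(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std(ddof=1)

rng = np.random.default_rng(3)
n = 68
agree = rng.normal(62, 14, n)
consc = 0.6 * agree + rng.normal(25, 11, n)
rating_error = 15 - 0.15 * agree - 0.05 * consc + rng.normal(0, 3.5, n)

Xz = sm.add_constant(pd.DataFrame({"Agreeableness": zscore(agree),
                                   "Conscientiousness": zscore(consc)}))
betas = sm.OLS(zscore(rating_error), Xz).fit().params
print(betas)   # the slopes are the standardized beta coefficients
```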
Table 7 presents the regression analysis results for the self-assessment group. Based
on the R2 value, the model including both independent variables explains only 7% of the
variance in the dependent variable, a very small effect size (Cohen 1992). The ANOVA
result shows that the independent variables together cannot significantly predict and
explain variance in the dependent variable, F(2,65) = 2.49, p > .05.
Table 5. Intercorrelations for self/peer-rating error and predictor variables.

                                             Rating error   Agreeableness   Conscientiousness
Peer-assessment group   Peer-rating error    –              −.39**          −.33*
                        Agreeableness                       –               .62**
                        Conscientiousness                                   –
Self-assessment group   Self-rating error    –              .01             −.20
                        Agreeableness                       –               .58**
                        Conscientiousness                                   –

Note: N = 68. *p < .05. **p < .01.
To consider the independent variables’ contributions separately, the beta values are
checked, which indicates that only Conscientiousness makes a statistically signifi-
cant unique contribution to the equation (p < .05). This contribution, however, is
significant only relative to Agreeableness, and the effect size is very small, as the
initial correlation between Conscientiousness and self-rating error showed (r2 = .04).

Discussion
The data analysis above indicated that Agreeableness and Conscientiousness had a
very weak and insignificant relationship with rating error in the self-assessment of
writing performance, thus not allowing for any prediction of self-rating error via
these two personality traits.
In the case of peer-assessment, the Pearson correlations showed that Agreeableness
and Conscientiousness had a relatively significant negative correlation with rating
error in the peer-assessment of writing performance. Further analysis of the data
through multiple regression analysis also showed that Agreeableness and Conscien-
tiousness, taken together, could significantly predict and explain variance in peer-
assessment rating error; however, when Agreeableness and Conscientiousness were
both included in a model, only Agreeableness was found to predict peer-rating variance
significantly. That may reflect the strong correlation between the two variables.

Table 6. Regression analysis results for the peer-assessment group.

                     Unstandardized          Standardized                          Collinearity
                     coefficients            coefficients                          statistics
                     B        Std. error     Beta       t        Sig.             Tolerance   VIF
(Constant)           14.56    2.46                      5.92     <0.01**
Agreeableness        −.15     .07            −.30*      −2.09    .04*             .61         1.62
Conscientiousness    −.06     .06            −.14       −.99     .32              .61         1.62

Note: R2 = .16.
*p < .05. **p < .01.
Table 7. Regression analysis results for the self-assessment group.

                     Unstandardized          Standardized                          Collinearity
                     coefficients            coefficients                          statistics
                     B        Std. error     Beta       t        Sig.             Tolerance   VIF
(Constant)           11.29    4.50                      2.50     .01
Agreeableness        .16      .12            .20        1.39     .16              .65         1.52
Conscientiousness    −.23     .10            −.32*      −2.23    .02*             .65         1.52

Note: R2 = .07.
*p < .05.
What can be initially understood from the above findings is that despite the
surface similarities between self- and peer-assessment, rating error in self-assessment
and peer-assessment may be different constructs working through different
mental, social, and affective mechanisms, since they showed differential relation-
ships with Agreeableness and Conscientiousness.
The first possible reason for this finding might be that peer-assessment as opposed
to self-assessment is a social and interpersonal activity involving interactional pro-
cesses (Nanine et al. 2010), and by the same token, the personality traits investigated
in this study may take effect in interpersonal activities including peer-assessment. This
reasoning is partly supported by Costa and McCrae’s (1992) definitions of Agreeable-
ness in which they assert that this trait has more to do with the interpersonal ten-
dencies. In other words, whenever an individual is involved in an interpersonal
activity, this trait affects the course of actions taken in conducting that activity.
Having reviewed labels given by different theorists, Graziano and Eisenberg (1997)
have also associated Agreeableness with interpersonal terms and concepts such as
‘social interest’, ‘cooperation’, ‘empathy’, ‘selflessness’, ‘identification with others’,
‘friendly compliance’, and ‘conforming to other wishes’ (795–796).
In support of the social, interpersonal nature of peer-assessment, Topping (2003)
explains that social processes, a dominant source of influence in peer-assessment
rather than in self-assessment, can affect the reliability and validity
of peer-assessment. Some of these social processes, which seem to be absent in self-
assessment, revolve around issues such as friendship bonds, power processes, and
group popularity levels of individuals. These issues, which are seemingly absent in
self-assessment, can be considered reasons for Agreeableness to have significant cor-
relations with peer-rating error rather than self-rating error, as the findings above
suggested.
More support and justifications for the above finding can also be found by consid-
ering the descriptions of the predispositions revealed by individuals with extreme levels
of Agreeableness. As indicated in Table 5, a negative correlation was observed between
Agreeableness and peer-rating error within the peer-assessment group. The higher the
level of Agreeableness, the lower the rating error in peer-assessment. Although, given
the medium effect size, this conclusion may be too strong to draw, Agreeableness’s significant
negative correlation with rating error, in comparison to Conscientiousness, and its sig-
nificant power to predict peer-rating error, need more attention. In this regard,
considering Costa and McCrae’s (1992) definition of Agreeableness and its facets can
be helpful. Agreeableness, which Costa and McCrae (1992) generally define in terms of
altruism, sympathy, and helpfulness, comprises the following facets: Trust, Straightforward-
ness, Altruism, Compliance, Modesty, and Tender-mindedness. It seems that all these
instances of behaviour and predisposition can help support the
finding above, especially with regard to the fact that during the instruction and treat-
ment of the current study, the teachers always encouraged the students to be fair and
do the peer-ratings based on the criteria without any favouritism towards their peers. It
can logically be expected that individuals who were more altruistic, cooperative, compliant,
trustworthy, and generally more agreeable with their teachers tended to be more accurate
raters than others.
In explaining the facets of Agreeableness, Costa and McCrae (1992) further
provide clear descriptions of low or high scorers on this scale, which are worth consid-
ering. They describe the Straightforwardness facet as the tendency to be
honest, sincere, and ingenuous. Logically, a person with these characteristics is
expected to be more willing to do the peer-assessment honestly if he or she is asked
to do so by the teacher. Altruism, as another facet of Agreeableness, is also defined
as the tendency to assist others with their problems. Since the teachers in the
present study explained to the students that sharing assessment responsibility with
the students was one of the purposes of peer-assessment and the current study, the
highly agreeable participants were probably more altruistic and tried to do their
share of the assessment with less rating error.
Some studies on the relationship between Agreeableness and learning and job per-
formance in the literature can also be cited here to justify the predictive power of
Agreeableness in peer-rating error. The present study involved the performance of stu-
dents on an assessment task which, in the first place, required some learning in terms
of the assessment criteria and then actual performance on the assessment task. It
seems reasonable to think that the better the students learn to get familiar with the
assessment criteria and do the rating, the lower their rating error. What might have
helped the students to do so better may be Agreeableness, which has been found to
have a significant relationship with learning and job performance in different contexts.
For instance, Cortina et al. (1992) and Salgado (1997) found Agreeableness to be a
valid predictor of learning in different jobs. Also, Mohammed, Mathieu, and Bartlett
(2002) found that Agreeableness is a valid predictor of job performance in different
contexts. If this is the case, then one can ask why this trait did not contribute to
better performance (i.e. lower rating error) in self-assessment in the present study.
One possible answer may lie in the fact that Agreeableness, as a trait con-
cerned with interpersonal tendencies, can contribute to better rating performance
only when it interacts with the social interactional nature of peer-assessment, which
is seemingly absent in self-assessment.
Among the very few studies similar to the present one, whose findings may have a
bearing on the current study’s results, is Grahek’s (2007), which revealed that
Agreeableness was positively correlated with rating leniency or overestimation. More-
over, Bernardin, Cooke, and Villanova (2000), and Yun et al. (2005) had formerly
found somewhat similar results. Their findings are in general justified on the ground
that later face-to-face interaction in the form of feedback meetings of raters and
ratees caused raters high on Agreeableness to provide more elevated and overestimated
(and as a result less accurate) ratings than raters low on Agreeableness. They also add that
more agreeable raters would not want to provide a ratee with negative feedback for
fear that the ratee may react negatively in his or her later interaction with the rater.
This finding is evidently in contrast to the finding of the current study since peer-
assessment is also a kind of rating practice which involves later interaction of raters
and ratees who are usually classmates or schoolmates; however, as Yun et al. (2005)
reason, many factors can moderate the relationship between rating accuracy and per-
sonality. These studies focused on one aspect of rating error, that is, lenient rating or
overestimation, but the findings of the present study are concerned with rating error
including both underestimation and overestimation. Moreover, these studies were
different from the current study in terms of their context or the instruments used
(e.g. checklist or opinion questionnaire). Furthermore, it could be argued that,
contrary to Yun et al.’s (2005) reasoning, raters high on Agreeableness would like to
be more agreeable to their teachers rather than to their peers, and since the teachers
would naturally want more accurate peer-ratings, the peer-raters would do their best
to rate their peers more accurately in order to meet the expectations of their teachers.
As the findings of the study indicated, there was a statistically significant negative
correlation between Conscientiousness and peer-rating error, but when included in a
model with another strongly correlated variable (i.e. Agreeableness), it was no
longer statistically significant. It should also be remembered that including both
Agreeableness and Conscientiousness in a model to predict peer-rating error explained
16% of the variation. When Agreeableness and Conscientiousness were both included
in a multiple regression analysis, only Agreeableness was found to significantly predict
peer-rating error. How these two traits together may have resulted in the above finding
can be recommended for further research. Of course, in Yun et al.’s (2005) study,
similar insignificant correlations were found, which was justified by reasoning that
when no incentive is provided for rating accuracy, and when the raters are not evalu-
ated and held accountable for the accuracy of their ratings, those high on Con-
scientiousness would not bother to provide the most accurate ratings possible because
being evaluated well is no longer important to these individuals. Similarly, in this
study too, the students’ rating accuracy was never considered in their term score
(i.e. no evaluation was made of their rating accuracy in either self- or peer-
assessment); therefore, the context of the present study was not relevant for this trait.

Conclusions
Agreeableness and Conscientiousness, either separately or in combination,
seem to have little effect on self-rating error given the circumstances under
which this study was conducted. This finding, however, by no means implies that self-assess-
ment rating accuracy can never be influenced by personality traits, since the literature
shows that personality itself may interact with other variables such as social settings
(e.g. Yun et al. 2005), which can result in findings other than what is expected or
hypothesized. Only further comprehensive research can shed light on this
issue. This weak relationship in self-assessment contrasts with the other
finding, that Agreeableness and Conscientiousness in combination, and Agree-
ableness alone, can significantly predict peer-rating error. The main implication
of this finding is to emphasize practising peer-assessment in educational settings
accompanied by the use of personality inventories to measure traits, including Agree-
ableness and Conscientiousness as well as any other influencing personality trait that
can cause and predict variation in peer-rating error (e.g. social variables). Despite the
educational benefits of practising peer-assessment, consideration of personality traits
in peer-assessment can help teachers and administrators use the scores obtained from
peer-assessment of learners for any important testing decision with more awareness
and caution. Of course, this can only be done when further evidence from studies
with more representative and larger samples has been collected in terms of the
predictive power of personality traits in peer-assessment rating error. Apart from
employing more representative samples, it is suggested that further research be done
under different social and psychological circumstances such as when self-raters and
peer-raters are held accountable for the accuracy of their ratings. This and other
factors may moderate the way personality traits may influence rating accuracy in
self/peer-assessment.
Finally, the results of this study can give researchers a better understanding of the
differential interactive relationships between rater personality and rating format.
Although much research has been conducted considering the role of personality in per-
formance rating by expert raters, this is the first study to examine the relationship
between personality and rating accuracy in self/peer-assessment. Therefore, it is
suggested that more research be done considering the moderating and interactive
effect of other individual differences along with other social factors to broaden our
understanding of the potential sources and causes of error in rating in self/peer-
assessment.

Notes on contributors
Parviz Birjandi is a full professor of TEFL and applied linguistics at Science and Research
Branch of Islamic Azad University (SRBIAU) in Tehran, Iran. He received his M.A. from
Colorado State University and his Ph.D. from the University of Colorado at Boulder in English
Education. He has supervised numerous MA and Ph.D. theses and has taught different
English language and applied linguistics courses at BA, MA and PhD levels for over 30 years. He
has also published numerous national middle school, high school, and university textbooks
and international articles. His major interests are language assessment, research methodology,
and first language acquisition.

Masood Siyyari is an assistant professor of TEFL and applied linguistics at Science and
Research Branch of Islamic Azad University in Tehran, Iran. He received his PhD in applied
linguistics from Allameh Tabataba’i University in Tehran, Iran. Currently, he supervises MA
theses and teaches graduate courses in applied linguistics and translation studies at SRBIAU.
His main areas of research include language assessment and second language acquisition.

References
AlFallay, I. 2004. “The Role of Some Selected Psychological and Personality Traits of the Rater
in the Accuracy of Self- and Peer-Assessment.” System 32: 407–425.
Allan, D. 2004. Oxford Placement Test 1. Oxford: Oxford University Press.
Andrade, H. G., and A. Valtcheva. 2009. “Promoting Learning and Achievement Through Self-
Assessment.” Theory Into Practice 48: 12–19.
Andrade, H., X. Wang, Y. Du, and R. Akawi. 2009. “Rubric-Referenced Self-Assessment and
Self-Efficacy for Writing.” The Journal of Educational Research 102: 287–302.
Arnaudet, M. L., and M. E. Barrett. 1990. Paragraph Development: A Guide for Students of
English (2nd ed.). Englewood Cliffs: Prentice Hall.
Bernardin, H. J., D. K. Cooke, and P. Villanova. 2000. “Conscientiousness and Agreeableness as
Predictors of Rating Leniency.” Journal of Applied Psychology 85: 232–236.
Blanche, P. 1988. “Self-Assessment of Foreign Language Skills: Implications for Teachers and
Researchers.” RELC Journal 19: 75–96.
Borman, W. C., and G. L. Hallam. 1991. “Observation Accuracy for Assessors of Work-sample
Performance: Consistency across Task and Individual-differences Correlations.” Journal of
Applied Psychology 76: 11–18.
Boud, D., and N. Falchikov. 2007. Rethinking Assessment in Higher Education: Learning for the
Longer Term. London: Routledge.
Brantmeier, C., and R. Vanderplank. 2008. “Descriptive and Criterion-Referenced Self-
Assessment with L2 Readers.” System 36: 456–477.
Brown, A. 2005. “Self-Assessment of Writing in Independent Language Learning Programs:
The Value of Annotated Samples.” Assessing Writing 10: 174–191.
Brown, J. D., and T. Hudson. 1998. “The Alternatives in Language Assessment.” TESOL
Quarterly 32: 653–675.
Cassidy, S. 2007. “Assessing ‘Inexperienced’ Students’ Ability to Self-Assess: Exploring Links
with Learning Style and Academic Personal Control.” Assessment & Evaluation in Higher
Education 32: 313–330.
Clement, R., R. Gardner, and P. Smythe. 1977. “Motivational Variables in Second Language
Acquisition: A Study of Francophones Learning English.” Canadian Journal of Behavioral
Science 9: 123–133.
Cohen, J. 1992. “A Power Primer.” Psychological Bulletin 112: 155–159.
Cortina, J. M., M. L. Doherty, N. Schmitt, G. Kaufman, and R. G. Smith. 1992. “The Big Five
Personality Factors in the IPI and MMPI: Predictors of Police Performance.” Personnel
Psychology 45: 119–140.
Costa, P. T., and R. R. McCrae. 1992. Revised NEO Personality Inventory (NEO-PI-R) and
NEO Five-Factor Inventory (NEO-FFI): Professional Manual. Odessa, FL: Psychological
Assessment Resources.
Costa, P. T., and R. R. McCrae. 1997. “Longitudinal Stability of Adult Personality.” In
Handbook of Personality Psychology, edited by R. Hogan, J. Johnson, and S. Briggs, 269–
290. San Diego, CA: Academic Press.
Davidson, F., and G. Henning. 1985. “A Self-Rating Scale of English Proficiency: Rasch Scalar
Analysis of Items and Rating Categories.” Language Testing 2: 164–179.
Dlaska, A., and C. Krekeler. 2008. “Self-Assessment of Pronunciation.” System 36: 505–516.
Dochy, F., M. Segers, and D. Sluijsmans. 1999. “The Use of Self-, Peer and Co-Assessment in
Higher Education: A Review.” Studies in Higher Education 24: 330–350.
Falchikov, N., and D. Boud. 1989. “Student Self-Assessment in Higher Education: A Meta-
Analysis.” Review of Educational Research 59: 395–430.
Falchikov, N., and J. Goldfinch. 2000. “Student Peer Assessment in Higher Education: A Meta-
Analysis Comparing Peer and Teacher Marks.” Review of Educational Research 70: 287–322.
Falchikov, N., and D. Magin. 1997. “Detecting Gender Bias in Peer Marking of Students’
Group Progress Work.” Assessment & Evaluation in Higher Education 22: 385–396.
Feist, J., and G. J. Feist. 2006. Theories of Personality. 6th ed. New York, NY: McGraw-Hill.
Garousi, M. T., A. H. Mehryar, and M. Ghazi Tabatabayi. 2001. “Application of the NEO-PI-
R Test and Analytic Evaluation of Its Characteristics and Factorial Structure Among Iranian
University Students.” Journal of Humanities 11: 173–198.
Giordano, P. J. 2008. “Personality Psychology.” In 21st Century Psychology: A Reference
Handbook, edited by S. F. Davis and W. Buskist, 402–412. Thousand Oaks, CA: Sage.
Gipps, C. V. 1994. Beyond Testing: Towards a Theory of Educational Assessment. London: The
Falmer Press.
Grahek, M. 2007. Personality and Rater Leniency: Comparison of Broad and Narrow
Measures of Conscientiousness and Agreeableness. Master’s thesis, Denton, TX:
University of North Texas.
Graziano, W. G., and N. Eisenberg. 1997. “Agreeableness: A Dimension of Personality.” In
Handbook of Personality Psychology, edited by R. Hogan, S. Briggs, and J. Johnson,
795–824. San Diego, CA: Academic Press.
Gregersen, T., and E. Horwitz. 2002. “Language Learning and Perfectionism: Anxious and
Non-Anxious Language Learners’ Reactions to Their Own Oral Performance.” Modern
Language Journal 86: 562–570.
Harris, M. 1997. “Self-Assessment of Language Learning in Formal Settings.” ELT Journal 51:
12–20.
Heilenman, L. K. 1990. “Self-Assessment of Second Language Ability: The Role of Response
Effects.” Language Testing 7: 174–201.
Hughey, J. B., D. R. Wormuth, V. F. Hartfiel, and H. L. Jacobs. 1983. Teaching ESL
Composition: Principles and Techniques. Rowley, MA: Newbury House.
Jacobs, H. L., S. A. Zingraf, D. R. Wormuth, V. F. Hartfiel, and J. B. Hughey. 1981. Testing ESL
Composition: A Practical Approach. Rowley, MA: Newbury House.
Jafarpur, A., and M. Yamini. 1995. “Do Self-Assessment and Peer-Rating Improve with
Training?” RELC Journal 26: 63–85.
Janssen-van Dieten, A. 1989. “The Development of a Test of Dutch as a Second Language: The
Validity of Self-Assessments by Inexperienced Subjects.” Language Testing 6: 30–46.
Kane, J. S., H. J. Bernardin, P. Villanova, and J. Peyrefitte. 1995. “Stability of Rater Leniency:
Three Studies.” Academy of Management Journal 38: 1039–1051.
Langan, A. M., D. M. Shuker, W. R. Cullen, D. Penney, R. F. Preziosi, and C. P. Wheater. 2008.
“Relationships Between Student Characteristics and Self, Peer and Tutor Evaluations of Oral
Presentations.” Assessment & Evaluation in Higher Education 33: 179–190.
Lejk, M., and M. Wyvill. 2001. “Peer Assessment of Contributions to a Group Project: A
Comparison of Holistic and Category-Based Approaches.” Assessment & Evaluation in
Higher Education 26: 62–72.
Lejk, M., and M. Wyvill. 2002. “Peer Assessment of Contributions to a Group Project: Student
Attitudes to Holistic and Category-Based Approaches.” Assessment & Evaluation in Higher
Education 27: 569–577.
MacAlpine, J. M. K. 1999. “Improving and Encouraging Peer Assessment of Student
Presentations.” Assessment & Evaluation in Higher Education 24: 15–25.
MacIntyre, P. 1992. Anxiety, Language Learning and Stages of Processing. PhD diss., London,
Canada: University of Western Ontario.
MacIntyre, P. D., and C. Charos. 1996. “Personality, Attitudes, and Affect as Predictors of
Second Language Communication.” Journal of Language and Social Psychology 15: 3–26.
MacIntyre, P., K. Noels, and R. Clement. 1997. “Biases in Self-Ratings of Second Language
Proficiency: The Role of Language Anxiety.” Language Learning 47: 265–287.
Mackey, A., and S. M. Gass. 2005. Second Language Research: Methodology and Design.
Mahwah, NJ: Lawrence Erlbaum.
Matsuno, S. 2009. “Self-, Peer-, and Teacher-Assessments in Japanese University EFL Writing
Classrooms.” Language Testing 26: 75–100.
Matthews, G., I. J. Deary, and M. C. Whiteman. 2009. Personality Traits. 3rd ed. New York,
NY: Cambridge University Press.
McKay, P. 2006. Assessing Young Language Learners. Cambridge: Cambridge University
Press.
Miller, P. J. 2003. “The Effect of Scoring Criteria Specificity on Peer and Self-Assessment.”
Assessment & Evaluation in Higher Education 28: 383–394.
Mohammed, S., J. E. Mathieu, and A. L. Bartlett. 2002. “Technical-Administrative Task
Performance, Leadership Task Performance, and Contextual Performance: Considering the
Influence of Team- and Task-Related Composition Variables.” Journal of Organizational
Behavior 23: 795–814.
van Gennip, N. A. E., M. S. R. Segers, and H. H. Tillema. 2010. “Peer Assessment as a
Collaborative Learning Activity: The Role of Interpersonal Variables and Conceptions.”
Learning and Instruction 20: 280–290.
Oldfield, K., and J. M. K. MacAlpine. 1995. “Peer and Self-Assessment at Tertiary Level: An
Experiential Report.” Assessment & Evaluation in Higher Education 20: 125–132.
Orsmond, P., S. Merry, and K. Reiling. 2000. “The Use of Student Derived Marking Criteria in
Peer and Self-Assessment.” Assessment & Evaluation in Higher Education 25: 23–38.
Oscarson, M. 1989. “Self-Assessment of Language Proficiency: Rationale and Implications.”
Language Testing 6: 1–13.
Pallant, J. 2001. SPSS Survival Manual. Philadelphia: Open University Press.
Patri, M. 2002. “The Influence of Peer Feedback on Self- and Peer-Assessment of Oral Skills.”
Language Testing 19: 109–131.
Pope, N. 2005. “The Impact of Stress in Self- and Peer-Assessment.” Assessment & Evaluation in
Higher Education 30: 51–63.
Saito, H., and T. Fujita. 2004. “Characteristics and User Acceptance of Peer Rating in EFL
Writing Classrooms.” Language Teaching Research 8: 31–54.
Salgado, J. F. 1997. “The Five Factor Model of Personality and Job Performance in the
European Community.” Journal of Applied Psychology 82: 30–43.
Siyyari, M. 2010. On the Cross Relationship between Self-/Peer-Assessment Rating Accuracy
and Personality Traits, and the Contribution to Writing Performance. PhD diss., Tehran, Iran:
Allameh Tabataba’i University.
Smith, H., A. Cooper, and L. Lancaster. 2002. “Improving the Quality of Undergraduate Peer
Assessment: A Case for Student and Staff Development.” Innovations in Education &
Teaching International 39: 71–81.
Stefani, L. A. J. 1994. “Peer, Self and Tutor Assessment: Relative Reliabilities.” Studies in
Higher Education 19: 69–75.
Tabachnick, B. G., and L. S. Fidell. 2001. Using Multivariate Statistics. 4th ed. New York: Allyn
& Bacon.
Tamir, M. 2005. “Don’t Worry, Be Happy? Neuroticism, Trait-Consistent Affect Regulation,
and Performance.” Journal of Personality and Social Psychology 89: 449–461.
Taras, M. 2001. “The Use of Tutor Feedback and Student Self-Assessment in Summative
Assessment Tasks: Towards Transparency for Students and for Tutors.” Assessment &
Evaluation in Higher Education 26: 605–614.
Taras, M. 2008. “Issues of Power and Equity in Two Models of Self-Assessment.” Teaching in
Higher Education 13: 81–92.
Topping, K. J. 1998. “Peer Assessment Between Students in Colleges and Universities.” Review
of Educational Research 68: 249–276.
Topping, K. J. 2003. “Self and Peer Assessment in School and University: Reliability, Validity
and Utility.” In Optimizing New Modes of Assessment: In Search of Qualities and Standards,
edited by M. Segers, F. Dochy, and E. Cascallar, 55–87. Dordrecht: Kluwer Academic.
Topping, K. J., E. F. Smith, I. Swanson, and A. Elliot. 2000. “Formative Peer Assessment of
Academic Writing Between Postgraduate Students.” Assessment & Evaluation in Higher
Education 25: 149–166.
Touze, P. A. 2005. “Personality and Prediction of Performance in the Workplace.” Le travail
humain 68: 37–53.
Tziner, A., K. R. Murphy, and J. N. Cleveland. 2002. “Does Conscientiousness Moderate the
Relationship between Attitudes and Beliefs Regarding Performance Appraisal and Rating
Behavior?” International Journal of Selection and Assessment 10: 218–224.
Vickerman, P. 2009. “Student Perspectives on Formative Peer Assessment: An Attempt to
Deepen Learning?” Assessment & Evaluation in Higher Education 34: 221–230.
Weigle, S. C. 2002. Assessing Writing. Cambridge: Cambridge University Press.
Yun, G. J., L. M. Donahue, N. M. Dudley, and L. A. McFarland. 2005. “Rater Personality,
Rating Format, and Social Context: Implications for Performance Appraisal Ratings.”
International Journal of Selection and Assessment 13: 97–107.
Zemach, D. E., and L. Stafford-Yilmaz. 2008. Writers at Work: The Essay. Cambridge:
Cambridge University Press.
