
Computers & Education 60 (2013) 325–334

Contents lists available at SciVerse ScienceDirect

Computers & Education


journal homepage: www.elsevier.com/locate/compedu

Is learner self-assessment reliable and valid in a Web-based portfolio environment for high school students?

Chi-Cheng Chang a,*, Chaoyun Liang b,1, Yi-Hui Chen a

a Department of Technology Application and Human Resource Development, National Taiwan Normal University, No. 162, He-Ping E. Road, Section 1, Taipei 106, Taiwan
b Department of Information Communication, Yuan Ze University, No. 135, Yuan-Tung Road, Chung-Li, Taoyuan 32003, Taiwan

Article info

Article history:
Received 3 September 2011
Received in revised form 16 May 2012
Accepted 18 May 2012

Keywords:
Self-assessment
Portfolio
Web-based Portfolio
Portfolio assessment
Reliability
Validity

Abstract

This study explored the reliability and validity of Web-based portfolio self-assessment. Participants were 72 senior high school students enrolled in a computer application course. The students created learning portfolios, viewed peers' work, and performed self-assessment on the Web-based portfolio assessment system. The results indicated that: 1) there was a high level of consistency between students' two assessment results; 2) the self-assessment results and teacher-assessment results were highly consistent, without significant differences; 3) the results of self-assessment and the end-of-course examination were highly consistent, implying that Web-based portfolio self-assessment can faithfully reflect learning achievements, i.e. it is a reliable and valid assessment method.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction
A learning portfolio refers to an assessment tool that purposefully collects learning processes over a period of time in a way that reflects academic achievements, including evidence of progress, reflective thinking, learning outcomes, and so on. Traditional learning portfolios are relatively ineffective for showcasing and viewing, since huge piles of paper documents are usually gathered over time. Yancey (2001) recognized the benefits of the Web format for learning portfolios, which was anticipated to gain more and more popularity. According to Chang (2001) and Chang and Tseng (2009a), the Internet has markedly contributed to learning portfolios in terms of storage and management, enabling learners to have more opportunities to share and view peer work. In addition, it helps teachers gain a deeper understanding of each student and enhances classroom interaction.
Portfolio assessment is defined as the examination of a systematic collection of student work that documents a student's endeavors, growth, and achievements (Chang, 2008). In short, it is used to assess students' achievements by reviewing their learning portfolios. Moreover, in a Web-based context the assessment process is carried out through the Internet, where participants undertake a variety of activities, such as self- and peer-assessment, peer portfolio reviews, teacher evaluation, and so on (Chang & Tseng, 2009b). Gadbury-Amyot et al. (2003) asserted that the use of self-assessment, usually available in portfolio assessment, is instrumental in developing college students' expertise. As part of Web-based portfolio assessment, students are required to perform self-assessment, for which they have to reflect on past performance according to the information gathered in their own learning portfolios, such as to what extent learning goals were achieved, or which problem areas need to be improved.
Although a substantial body of literature exists on peer-assessment, relatively little empirical attention has been directed toward self-assessment. The reliability and validity of this method are also controversial topics that need to be explored. After summarizing previous studies, Oskay, Schallies, and Morgil (2008) validated portfolio assessment as a powerful and trustworthy approach; however, they offered no further discussion of self-assessment issues. In Knowles, Holton, and Swanson's (2005) view, self-assessment research oftentimes reports acceptable degrees of reliability but poor validity.

* Corresponding author. Tel.: +886 2 77343439; fax: +886 2 23921015.
E-mail addresses: samchang@ntnu.edu.tw (C.-C. Chang), Liangiccylian@saturn.yzu.edu.tw (C. Liang), mchen316@gmail.com (Y.-H. Chen).
1 Tel.: +886 3 4638800x2230; fax: +886 3 4638233.
0360-1315/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.compedu.2012.05.012


In contrast, a number of researchers have discovered decent levels of consistency between self-based and teacher-based ratings (Liang & Tsai, 2010; Sadler & Good, 2006; Sung, Chang, Chiou, & Hou, 2005). The divergent findings were probably due to participants' various educational levels, the content of the course, and the training provided for self-assessment. Hence, the reliability of self-assessment may be improved by increasing assessors' understanding of the assessment method and scoring criteria, providing training and opportunities for self-assessment, improving the validity of the scoring criteria, and increasing the number of assessors and the assessment sample, etc. (Derham & DiPerna, 2007; Gadbury-Amyot, 2003; Sulzen, Young, & Hannin, 2008; Sung et al., 2005).
Compared with their paper counterparts, the reliability and validity of computerized assessments appear to be subject to many influential factors, e.g. technological support (Bouzidi & Jaillet, 2009). Given that electronic portfolio assessment has not yet been fully explored, the present study attempted to examine the reliability and validity of Web-based self-assessment. Table 1 illustrates the research questions and statistical methods.
2. Literature review
2.1. Web-based portfolio assessment
Portfolios allow assessors to track learning processes and outcomes, covering affective and cognitive variables and individual differences of learners (Baturay & Daloglu, 2010; Zawacki-Richter & Hanft, 2011). That is to say, portfolio assessment is a method that authentically evaluates a learner's performance by examining the learning process as reflected in his/her portfolio content. Hult (2001) pointed out that students are more likely to identify problem areas by themselves during the development of learning portfolios. In light of metacognitive theory, self-monitoring ability empowers learners to become aware of their progress as well as their strengths and weaknesses by constantly uncovering and solving problems. In accordance with this view, Luca and McMahon (2006) regarded Web-based portfolios as an online tool for cultivating self-monitoring strategies. This coincides with Chang and Tseng's (2009b) findings, which revealed that students benefit from the use of Web-based portfolios in a way that sharpens their self-assessment skills. In summary, online self-monitoring and self-assessment play crucial roles in Web-based portfolio assessment.
2.2. Reliability and validity of self-assessment
2.2.1. Reliability
In assessment studies, reliability involves the consistency of assessment results, i.e. the extent to which assessment results are accurate and trustworthy. In Yu's (2002) definition, it refers to the consistency or stability among repeated assessment results on the same item over a period of time; an assessment is considered reliable if consistency is found between two or more assessment results. If assessment scores are to be meaningful, a reasonable level of consistency and reliability has to be detected (Chang, 2002).
When it comes to self-assessment, reliability is further defined as the consistency or stability of an assessment examined at different points of time and in different contexts. Knowles et al. (2005) noted that good reliability tends to occur in self-assessment, considering that consistency is very likely to be achieved within an individual even across different time periods. With respect to statistical methods, Pearson's product–moment correlation is appropriate for measuring the correlation coefficient between two self-assessment results (Chang, 2002; Derham & DiPerna, 2007; Gadbury-Amyot et al., 2003; Lenze, 2004; Lin, Liu, & Yuan, 2001; Liu, Lin, Chiu, & Yuan, 2001), while the inter-rater reliability of an assessment scale can be examined using Cronbach's α (Derham & DiPerna, 2007).
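As a minimal illustration of the test-retest procedure described above, the following Python sketch (not taken from the study; the data and column names are hypothetical) computes Pearson's product–moment correlation between two rounds of self-assessment scores for one aspect.

```python
# Minimal sketch: test-retest reliability via Pearson's product-moment
# correlation between two self-assessment rounds (hypothetical data).
import pandas as pd
from scipy import stats

# Each row is a student; the two columns hold that student's two
# self-assessment scores for one aspect (column names are illustrative).
scores = pd.DataFrame({
    "artifact_round1": [3.5, 4.0, 4.5, 3.0, 4.0, 3.5],
    "artifact_round2": [3.5, 4.5, 4.5, 3.0, 3.5, 4.0],
})

r, p = stats.pearsonr(scores["artifact_round1"], scores["artifact_round2"])
print(f"test-retest r = {r:.2f}, p = {p:.3f}")  # a high r implies stable self-ratings
```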
2.2.2. Validity
In the present study, we described validity as the degree to which an assessment is able to evaluate the intended properties or functions, or the accuracy of assessment results. It is also defined as the valid level of an assessment score, that is, how appropriate the score is for making a strategic decision (Yu, 2002).
Knowles et al. (2005) cast doubt on self-assessment results because raters' self-satisfaction might decrease grading accuracy. It is desirable that self-assessment validity research incorporate an exterior criterion, e.g. assessment results from teachers or teacher assistants (Bouzidi & Jaillet, 2009; Cho, Schunn, & Wilson, 2006; Falchikov & Goldfinch, 2000; Sluijsmans, Dochy, & Moerkerke, 1999; Yu, 2002). Sung et al. (2005) concluded that self-assessment was consistent with teacher-assessment. Similarly, Sadler and Good (2006) identified a high level of consistency between self- and teacher-based scores. In summary, researchers have so far held opposing views on self-assessment validity.

Table 1
Research purposes, questions, and statistical methods.

| Research purpose | Research question | Treatment procedure | Statistical method |
|---|---|---|---|
| Self-assessment reliability | Are the two self-assessment results consistent within an individual rater? | Test-retest consistency | Pearson's correlation |
| Self-assessment validity | Are self-assessment results consistent with teacher-assessment results? Are there patent differences between the two variables? | Criterion-related validity (teacher-assessment results as exterior criteria) | Pearson's correlation and t-test |
| Self-assessment validity | Are self-assessment results sufficient to examine learning achievements, i.e. are self-assessment results consistent with end-of-course exam results? | Criterion-related validity (exam results as exterior criteria) | Pearson's correlation |


Gadbury-Amyot et al. (2003) investigated the correlation between portfolio-based scores and end-of-course exam scores by treating the latter as exterior criteria; a high correlation is taken to indicate a strong level of validity. In our view, self-assessment validity can be effectively measured by using an exterior criterion based on teacher-assessment or exam results. To ensure a valid exterior criterion, it is essential to have a large number of teacher raters who are preferably well trained and experienced (Chang, 2002). Moreover, if an exterior criterion is to be meaningful, further analysis of the teachers' inter-rater consistency needs to be provided.
3. Method
3.1. Participants
The participants were 79 students in a computer course at a senior high school. In this study, 72 students' portfolios (34 males and 38 females) were considered suitable for statistical analysis. The study lasted 12 weeks, with 3 h of class per week. The students were taught two units of the computer course, addressing Word Processing: Page Setup (Unit 1) and File Editing (Unit 2). In the experiments, the data collected from the learning portfolios were based on the coursework. The students performed portfolio creation, inspection, and self-assessment via the Web-based portfolio assessment system. Teachers, on the other hand, employed the system to monitor learning progress as well as to review learning performance.
3.2. Development of assessment rubrics
Rubrics are a set of items or indicators used for assessing students' portfolios. Rubrics are also called performance criteria or standards, and are used to assess human performance. Baturay and Daloglu (2010) used a scoring rubric for assessing students' portfolios in an online English language course. To establish content validity, the assessment rubric was created with reference to relevant papers and revised repeatedly based on comments from the teacher and experts. After the pre-test experiment, we finalized the rubric, as many scholars suggest, by taking students' opinions into account in order to enhance their motivation as well as engagement (Bloxham & West, 2004; Price & O'Donovan, 2006; Prins, Sluijsmans, Kirschner, & Strijbos, 2005; Tsai, Lin, & Yuan, 2002). The rubric consisted of 27 scoring criteria covering six aspects: Portfolio Creation, Learning Goal, Artifact, Reflection, Attitude, and Other. Scores ranged from 1 to 5 in 0.5 increments in order to distinguish the assessment results precisely.
In Portfolio Creation, the evaluation of portfolios focuses on content completeness, appropriateness, richness, and organization and presentation. The detailed questions are as follows:
1. Information completeness: Is the information in my portfolio complete as required?
2. Content appropriateness: Is the content directly related to the purpose of the portfolio?
3. Content richness: In addition to the requirements, is extra information provided to enhance the content of the portfolio, e.g. achievement test results, classroom anecdotes or observations, course activity outcomes, or link sharing?
4. Organization and presentation: Is the layout attractive and clear, and does it enhance the information presented?
In Learning Goal, the focus was on the extent to which a student's learning goals were fulfilled. The questions are as follows:
1. Personal goal setting: Do I establish proper goals and objectives based on my ability, self-expectations, etc.?
2. Teaching goals: Do I achieve the teaching goals related to the course subject?
3. Progress and accomplishments: Do I show progress and improvement in learning attitudes and academic performance?
4. Goal attainment: Are my learning goals achieved?

The Artifact aspect was concerned with the accuracy, appropriateness, richness, completeness, difficulty, and originality of uploaded coursework; different versions of assignments were also taken into account. The detailed questions are as follows:
1. Correctness: Does my artifact have adequate correctness?
2. Appropriateness: Does my artifact have adequate appropriateness?
3. Richness and completeness: Does my artifact have adequate richness and completeness?
4. Difficulty level: Does my artifact demonstrate adequate difficulty and elaborateness?
5. Creativity and originality: Does my artifact demonstrate adequate creativity and originality?
6. Comprehension of course content: Does my artifact demonstrate adequate comprehension of the course content?
7. Process of artifact production: Does my artifact demonstrate the process of development, e.g. draft, original, and revised versions of the work?

Students' reflections on learning goals, artifacts, learning attitudes, and achievements were assessed in Reflection. Learners are expected to demonstrate the ability to identify their weaknesses and provide useful remedies. To encourage productive reflective writing, assessment scores were not negatively affected if students specified mistakes or improper behaviors in the learning process. The questions are as follows:
1. Reflection on learning goals: Does the reflection relate to my learning goals?
2. Reflection on artifacts: Does the reflection describe the process and outcomes of my artifact development?
3. Reflection on learning achievements: Does the reflection demonstrate my learning achievements, weaknesses, and progress?
4. Reflection on attitudes: Does the reflection demonstrate my learning attitudes and growth?
5. Reflection on peer performance: Does the reflection present suitable self-expectations after reviewing the content of peer portfolios?
6. Reflection on feedback: Does the reflection present adequate self-expectations and improvement according to teacher and peer feedback?


The Attitude aspect examined a student's performance in portfolio inspection, peer assessment and feedback, link sharing, and the quantity and quality of online participation. The questions are as follows:
1. Online inspection and viewing, peer-assessment, feedback giving: Do I make appropriate comments and feedback according to the
content of peer portfolios?
2. Resource sharing: Do I share relevant learning resources on the assessment system?
3. Online forum, knowledge sharing, idea exchange, and Q&A: Do I practice online discussion, knowledge sharing, idea exchange and
problem solving on the assessment system?
4. Quantity of online interaction: Do I participate in sufficient online interaction, e.g. the amount and duration of portfolio reviews, length and number of posts, etc.?
5. Quality of online interaction: Do I participate in high-quality online interaction, e.g. active online discussion and constructive peer
feedback or critiques?
Our study adopted online self-assessment because, with the features of web hyperlinks, it tends to support social interaction by offering a wide range of two-way activities that are not available in conventional learning portfolios (Yancey, 2001). Portfolio Creation measures students' ability to take advantage of various file formats to present their work, e.g. audio/visual, Word, PowerPoint. In Reflection, two additional criteria were included, looking into one's reflection on reviewing peer portfolios and on teacher/peer feedback. The Other aspect measured students' learning processes and growth according to supplemental evidence. The question in this aspect is as follows:
1. Supporting examples or evidence: Is additional evidence given to demonstrate the student's learning process or academic growth?

3.3. Analysis of assessment aspects


The pre-test assessment rubrics were examined using item analysis. The t-values comparing the high-scoring (top 27%) and low-scoring (bottom 27%) groups reached significance, implying that each rubric had good discrimination and should be retained. The Pearson's correlations between rubric means and the overall mean were significant, showing high consistency among the rubrics, which should therefore be retained.
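The item analysis just described can be reproduced in outline with the following hedged Python sketch; the data layout (one column per rubric item, one row per student) and the function name are assumptions made for illustration, not the authors' code.

```python
# Minimal sketch: item analysis of the pre-test rubrics.
# Discrimination: independent t-test between the top and bottom 27% groups;
# consistency: Pearson correlation between each item and the overall mean.
import pandas as pd
from scipy import stats

def item_analysis(items: pd.DataFrame) -> pd.DataFrame:
    """items: one column per rubric item, one row per student (assumed layout)."""
    total = items.mean(axis=1)                 # overall mean per student
    k = max(int(round(len(items) * 0.27)), 1)  # size of the 27% groups
    order = total.sort_values()
    low_idx, high_idx = order.index[:k], order.index[-k:]

    rows = []
    for col in items.columns:
        t, p_t = stats.ttest_ind(items.loc[high_idx, col], items.loc[low_idx, col])
        r, p_r = stats.pearsonr(items[col], total)
        rows.append({"item": col, "t_high_vs_low": t, "p_t": p_t,
                     "item_total_r": r, "p_r": p_r})
    return pd.DataFrame(rows)  # retain items with significant t and r
```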
3.4. Validity of assessment rubrics
The Kaiser–Meyer–Olkin (KMO) values for each rubric were greater than 0.5, implying that factor analysis could be applied (Table 2). An approach to factor analysis, Principal Component Analysis (PCA), was further used to examine construct validity. Considering that all factors (or dimensions) had a certain degree of correlation with one another, oblique rotation was used when conducting the PCA. The chi-square approximation of Bartlett's test of sphericity reached significance, confirming the existence of common factors among the rubrics and thus the applicability of factor analysis. The results of the factor analysis indicated that the factor loadings of one rubric under Attitude were lower than 0.3; that rubric was therefore flagged for removal. However, a further factor analysis revealed that all rubrics yielded factor loadings greater than 0.3 with validity coefficients greater than 0.1. Therefore, all rubrics were retained.
Five constructs with eigenvalues higher than 1 were extracted. The explained variance of each aspect was greater than 45%, which meant all rubrics were acceptably valid. The Artifact aspect demonstrated the highest explained variance; Attitude, the least.
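For readers who wish to replicate these construct-validity checks, a possible workflow is sketched below using the third-party factor_analyzer package; the package choice, function names, and data layout are assumptions, since the paper does not state which software was used.

```python
# Minimal sketch (assumed tooling): KMO, Bartlett's sphericity test, and a
# principal-component factor analysis with oblique (oblimin) rotation.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                              calculate_kmo)

def construct_validity(ratings: pd.DataFrame, n_factors: int = 5):
    """ratings: one column per rubric item, one row per student (assumed layout)."""
    chi2, p = calculate_bartlett_sphericity(ratings)   # common factors exist if p is small
    _, kmo_overall = calculate_kmo(ratings)            # sampling adequacy (> 0.5 desired)
    fa = FactorAnalyzer(n_factors=n_factors, method="principal", rotation="oblimin")
    fa.fit(ratings)
    loadings = pd.DataFrame(fa.loadings_, index=ratings.columns)  # keep items with |loading| > 0.3
    eigenvalues, _ = fa.get_eigenvalues()               # retain factors with eigenvalue > 1
    return kmo_overall, (chi2, p), loadings, eigenvalues
```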
3.5. Reliability of assessment
Both the pre-test and the formal test achieved high reliability, with overall Cronbach's α values greater than 0.7, revealing that the rubrics had a high degree of internal consistency. Hence, all rubrics were retained (Table 3).
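A compact way to compute the internal-consistency estimate reported in Table 3 is shown below; this is a generic Cronbach's α implementation from the standard formula, not the authors' script, and the assumed input is one column per rubric item within an aspect.

```python
# Minimal sketch: Cronbach's alpha from the standard formula
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)).
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """items: one column per rubric item within an aspect, one row per student."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```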
3.6. Web-based portfolio assessment system
On the assessment system, students were allowed to develop portfolios by selecting entry buttons and filling out forms according to the instructions. To keep the assessment consistent, all students were presented with identical entry buttons and portfolio forms.

Table 2
Factor analysis of the pre-test for assessment rubrics.

| Aspect | KMO | Explained variance (%) |
|---|---|---|
| Portfolio Creation | 0.67 | 56.26 |
| Learning Goal | 0.84 | 58.10 |
| Artifact | 0.89 | 66.71 |
| Reflection | 0.80 | 50.64 |
| Attitude | 0.69 | 45.41 |
| Other^a | N/A | N/A |
| Overall | 0.86 | 72.09 |

^a Other includes only one question, not applicable for factor analysis.


Table 3
Cronbach's α values of the pre-test and formal test.

| Aspect | Pre-test | Formal test |
|---|---|---|
| Portfolio Creation | 0.77 | 0.70 |
| Learning Goal | 0.83 | 0.75 |
| Artifact | 0.86 | 0.90 |
| Reflection | 0.80 | 0.87 |
| Attitude | 0.71 | 0.71 |
| Other | N/A | N/A |
| Overall | 0.93 | 0.93 |

The toolbar menu on the left-hand side includes the following options:
1. Guidance: guidelines for the development and assessment of portfolios.
2. Portfolio Creation (students only): student profile, learning goal setting, assignment uploading, reflective writing, etc.
3. Portfolio Assessment: teacher ratings are available on teacher login, while students are linked directly to the self- and peer-assessment pages where they view and evaluate other people's portfolios.
4. Portfolio Scores: the section where students may look up their grades from self-, peer-, and teacher-assessment; examples of outstanding work are also displayed here.
5. Course Overview: a general class description and detailed lesson plans.
The top toolbar menu provides a number of functions as below (Fig. 1):
1. Bulletin Board: up-to-date news and announcements.
2. Discussion Forum: students' ideas or comments on related topics, such as portfolio creation or course content.
3. Profile: student personal information.
4. Portfolio Preview by User: portfolios viewed by student.
5. Portfolio Preview by Entry: portfolios viewed by assessment entry.
6. E-mail: contact information of all users.
7. Unit Change: switching between the two course units.

The complete list of student names is shown in the Portfolio Assessment section, where students perform self-assessment by clicking the button next to their own names. After that, a number of options are available on the webpage, including Profile, Learning Goals, Reflections, Unit Work, Other Entries, Scoring, Teacher Feedback, Peer Feedback, Participation Records, etc. In the Scoring section, students can assign grades on the Likert scale and write feedback in the text boxes below (Fig. 2).

Fig. 1. The Web-based self-assessment system.


Fig. 2. Confirmation of self-assessment results.

3.7. Procedure of experiment


The experiment comprised a pre-test and a formal test. To elevate reliability and validity, we adopted several approaches before the formal test began, e.g. increasing the raters' skills and familiarity with the rubrics. This study employed the full portfolio assessment designed by Chang and Tseng (2009a, 2009b). Outside of class, the students were involved in a number of course activities: goal setting, reflection writing, work submission (including preliminary, revised, and final versions of coursework), etc. The teachers tracked students' learning progress through the assessment system.
During the first week, training on the assessment criteria and system operation was provided. The teacher gave a clear picture of learning portfolios and their construction, and provided detailed scoring guidelines highlighting a rater's integrity, fairness, full understanding of the grading standards, evidence-based grading, etc. After that, the teacher assistants guided students in accessing the assessment system and undertaking self-assessment, with one-on-one tutorials for individuals who needed them. The students also exchanged ideas and commented on two students' self-assessment results randomly selected by the teacher. It is believed that these class activities and trainings had a positive influence on the practice of self-assessment.
For the analysis of test-retest reliability, there were a total of four self-assessment results, because each student performed self-grading twice after each course unit ended. There was a one-week interval between the two self-assessment activities. The teachers (one instructor and three teacher assistants), on the other hand, did not begin to assess portfolios until the two course units had finished (Table 4).
4. Results and discussion
4.1. Are the two self-assessment results consistent within an individual rater?
Table 5 summarizes the Pearson's correlations of students' two self-assessment results, which were highly consistent and significant. In other words, the self-generated portfolio scores yielded a high degree of consistency. This finding supports Zalatan's (2001) statement that learners who undertake self-assessment tend to immerse themselves in the process, enabling them to develop reflective thinking ability. The greatest correlation coefficient appeared in Portfolio Creation; Learning Goal had the second greatest; Reflection had the lowest, indicating that self-based ratings were least consistent in the Reflection aspect. In research on paper portfolio reliability, Gadbury-Amyot et al. (2003) measured correlation coefficients between two raters ranging from 0.28 to 0.6, slightly lower than our statistical results. However, their findings were based on portfolio scores determined by teachers, whereas our research was conducted in a self-assessment context.
4.2. Are self-assessment results consistent with teacher-assessment results? Are there significant differences between the two variables?
The Pearson's correlation coefficients and t-test results are illustrated in Table 6. In Artifact and Overall, the two assessment methods were significantly correlated, which suggested a high degree of consistency. Nevertheless, the correlations were insignificant in the other aspects; that is to say, self and teacher scoring demonstrated inconsistency in the other four aspects.

Table 4
Procedure of experiment.

| Week | Subject | Task |
|---|---|---|
| Week 1 (pre-test) | Teacher | 1. Provide instructions on how to create learning portfolios. 2. Explain the scoring criteria. 3. Prepare class activities and course content. |
| | Assistant | 1. Test the system and create accounts for all users. 2. Help users get familiar with the system through hands-on activities. |
| | Student | 1. Gain a basic understanding of portfolio development. 2. Learn how to perform self-assessment. 3. Get familiar with the system. |
| Week 2–3 | Teacher | 1. Deliver the course (Unit 1). 2. Monitor students' portfolio creation. |
| | Assistant | 1. System maintenance and management. 2. Monitor students' portfolio creation. |
| | Student | Develop portfolios on the system after classes. |
| Week 4 | Teacher | 1. Deliver the course (Unit 1). 2. Monitor students' portfolio creation. |
| | Assistant | Monitor students' portfolio creation. |
| | Student | 1. Develop portfolios on the system. 2. Submit preliminary and final versions of assignments. |
| Week 5 | Teacher | Monitor students' submission status. |
| | Assistant | Send reminders for late work. |
| | Student | Revise and upload assignments. |
| Week 6 | Teacher | Grade portfolios. |
| | Assistant | Grade portfolios. |
| | Student | Undertake self-assessment (twice) and peer-assessment. |
| Week 7 (formal test) | Teacher | 1. Prepare class activities and course content for Unit 2. 2. Specify the problems encountered in Unit 1 and provide proper solutions. |
| | Assistant | Collect the data, and analyze reliability and validity to modify the scoring criteria. |
| | Student | Be aware of the problems that occurred in Unit 1. Get familiar with the updated scoring criteria. |
| Week 8–9 | Teacher / Assistant / Student | See Week 2–3. |
| Week 10 | Teacher / Assistant / Student | See Week 4. |
| Week 11 | Teacher / Assistant / Student | See Week 5. |
| Week 12 | Teacher / Assistant / Student | See Week 6. |
| Week 13 | Assistant | Gather the data and evaluate the reliability and validity. |

Assuming that teacher-based scores are appropriate validity criteria, self-assessment in general can be considered a valid method.
In testing the difference between the two variables, insignificance was identified in four aspects (Portfolio Creation, Artifact, Attitude, and Overall), which means self- and teacher-based scores were consistent in those aspects. On the contrary, Learning Goal and Reflection yielded significant differences between the two assessment methods. More specifically, Reflection yielded the largest effect size, with the second largest in Learning Goal and the smallest in Portfolio Creation. Overall, judging from the mean scores, self-grading was more lenient than teacher-grading. It is particularly worth noting that the mean teacher-assessment scores were higher in the Portfolio Creation, Artifact, and Reflection aspects. The factors behind this phenomenon need to be further explored.
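The comparison reported in Table 6 can be approximated with the following hedged sketch; the paper does not specify its effect-size formula, so Cohen's d for paired samples (mean difference divided by the standard deviation of the differences) is assumed here purely for illustration.

```python
# Minimal sketch (assumed formulas): paired t-test between self- and
# teacher-assessment scores plus a Cohen's-d-style effect size.
import numpy as np
from scipy import stats

def compare_raters(self_scores, teacher_scores):
    self_scores = np.asarray(self_scores, dtype=float)
    teacher_scores = np.asarray(teacher_scores, dtype=float)
    t, p = stats.ttest_rel(self_scores, teacher_scores)   # paired t-test
    diff = self_scores - teacher_scores
    d = diff.mean() / diff.std(ddof=1)                    # assumed effect-size formula
    return t, p, d
```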
4.3. Are self-assessment results sufficient to examine learning achievements, i.e. are self-assessment results consistent with end-of-course exam results?
The Pearson's correlation coefficients between self-assessment scores and exam scores revealed that the two variables were highly consistent (Table 7), and high levels of consistency were found in each aspect. This implied that consistency indeed existed; thus, self-assessment results were appropriate for evaluating students' learning accomplishments.
Table 5
The Pearson's correlations of the two self-assessment results.

| Aspect | Correlation coefficient (significance) |
|---|---|
| Portfolio Creation | 0.99 (0.003***) |
| Learning Goals | 0.93 (0.022**) |
| Artifacts | 0.87 (0.011**) |
| Reflections | 0.78 (0.023**) |
| Attitudes | 0.85 (0.033**) |
| Other | 0.82 (0.028**) |
| Overall | 0.81 (0.001***) |

**p < 0.01, ***p < 0.001.


Table 6
The Pearson's correlation and t-test of self-assessment and teacher-assessment results.

| Aspect | Correlation coefficient (significance) | Self mean (SD) | Teacher mean (SD) | t value (significance) | Effect size |
|---|---|---|---|---|---|
| Portfolio Creation | 0.19 (0.270) | 4.02 (0.57) | 4.05 (0.40) | 0.309 (0.759) | 0.061 |
| Learning Goal | 0.28 (0.102) | 3.93 (0.53) | 3.69 (0.24) | 2.860 (0.007**) | 0.583 |
| Artifact | 0.49 (0.002**) | 3.70 (0.44) | 3.75 (0.26) | 0.775 (0.443) | 0.138 |
| Reflection | 0.32 (0.061) | 3.82 (0.52) | 4.11 (0.32) | 3.289 (0.002**) | 0.672 |
| Attitude | 0.07 (0.685) | 3.62 (0.54) | 3.44 (0.40) | 1.555 (0.129) | 0.379 |
| Other | 0.18 (0.373) | 3.54 (0.53) | 3.59 (0.42) | 0.867 (0.065) | 0.105 |
| Overall | 0.83 (0.001**) | 3.82 (0.46) | 3.74 (0.26) | 1.058 (0.297) | 0.214 |

**p < 0.01.

Gelinas (1998) made a statement in accordance with this finding: there is a positive correlation between learners' portfolio scores and their academic performance. According to our statistical analysis, Artifact and Portfolio Creation were the two aspects with the greatest correlation coefficients, and Other had the smallest. In other words, learning achievements were most likely to be accurately assessed in the Artifact aspect.
4.4. Analysis of results
As far as reliability is concerned, there was high consistency between the two self-assessment results within each individual student (Table 5). That is to say, the self-assessment was sufficiently reliable, and this finding is aligned with Knowles et al.'s (2005) view. When it comes to validity, based on the Pearson's correlation and t-test analyses, students' self-assessment results were significantly and highly correlated with teacher-assessment results (Table 6) and exam results (Table 7). Hence, self-assessment had a good degree of validity. This conclusion is supported by a number of research results (Sadler & Good, 2006; Sung et al., 2005) but contradicts the findings of Knowles et al. (2005). To sum up, self-assessment is a method with appropriate reliability and validity. Although the greatest and lowest reliability and validity were distributed across aspects, the Artifact aspect tended to achieve higher levels, and lower levels were usually found in Attitude (Table 8).
To date, validity and reliability studies related to portfolio assessment have been conducted in both conventional and electronic learning contexts. Most researchers have discovered good validity and reliability (Cho et al., 2006), although opposite views have been proposed. If reliability and validity studies are to be conclusive, related investigations must cover large samples, sufficient course units, and participants from diverse educational backgrounds (Bouzidi & Jaillet, 2009). Even though a low level of reliability was discovered, Sulzen et al. (2008) identified sufficient validity and concluded that increasing the number of raters was effective for improving reliability. It is also noted that scoring results may be adversely affected, since the practice of online assessment requires a considerable amount of time and effort (Ballantyne, Hughes, & Mylonas, 2002; Davies, 2000; Lin et al., 2001; Miller, 2003; Sung et al., 2005; Sung, Lin, Lee, & Chang, 2003; Topping, Smith, Swanson, & Elliot, 2000; Tsai & Liang, 2009; Tsai et al., 2002; Wen & Tsai, 2006; Yang & Tsai, 2010). According to Sung et al. (2003) and Sung et al. (2005), repeated practice of assessment helps to narrow the gap between student-based and teacher-based portfolio scores. This view is based on the assumption that students are more likely to sharpen their rating abilities when provided with sufficient practice, which in turn makes higher validity possible. Nevertheless, a sensitive issue needs to be dealt with: abundant assessment activities tend to guarantee advances in grading skills, but they also impose a heavy burden on students.
In Falchikov and Goldfinch's (2000) research, similar scores were not necessarily found in each aspect, despite the fact that student-based portfolio scores were usually consistent with teacher-based scores. These findings parallel our statistical analyses. Bouzidi and Jaillet (2009), in contrast, held that there was potential consistency between student and teacher grading in distinct aspects. To sum up, teacher-assessment should not be superseded by self-assessment in either paper- or Web-based contexts, regardless of the results of reliability and validity analyses.
5. Discussion and conclusion
In this study, we concluded that self-assessment is reliable and valid based on the consistency detected between students' two self-based portfolio scores. To raise reliability and validity, it is useful to give student raters rigorous training and an in-depth understanding of portfolio assessment (Gadbury-Amyot, 2003; Oskay et al., 2008; Sulzen et al., 2008). In our view, students' age, usually associated with one's cognition and mental maturity, may also play an influential role in self-assessment validity.
Table 7
The Pearson's correlation between self-based portfolio scores and end-of-course exam scores.

| Aspect | Correlation coefficient | Significance |
|---|---|---|
| Portfolio Creation | 0.67 | 0.000*** |
| Learning Goal | 0.64 | 0.003** |
| Artifact | 0.68 | 0.000*** |
| Reflection | 0.66 | 0.000*** |
| Attitude | 0.61 | 0.006*** |
| Other | 0.24 | 0.008* |
| Overall | 0.71 | 0.023** |

*p < 0.05, **p < 0.01, ***p < 0.001.


Table 8
Comparison of self-assessment correlation coefficients.

| Aspect | Two self-assessment results | Self-assessment and teacher-assessment | Self-assessment and learning achievements |
|---|---|---|---|
| Portfolio Creation | 1*** | 4 | 2*** |
| Learning Goal | 2*** | 3 | 4*** |
| Artifact | 3*** | 1** | 1*** |
| Reflection | 6** | 2 | 3*** |
| Attitude | 4** | 6 | 5** |
| Other | 5** | 5 | 6* |

*p < 0.05, **p < 0.01, ***p < 0.001.

Among all aspects, the highest consistency level was found in Portfolio Creation, meaning that self-assessment in this aspect was more likely to achieve consistency. Conversely, Reflection held the lowest level, implying that self-assessment in Reflection was less likely to be consistent.
Overall, self-grading showed good consistency with teacher-grading. We conclude that the self-assessment of senior high school students had adequate validity, provided teacher-based scores are valid criteria. However, some aspects demonstrated poor validity, e.g. Portfolio Creation, Learning Goal, Reflection, and Attitude, in which significant differences or inconsistency existed between self-assessment and teacher-assessment. In response to these validity issues, many researchers have proposed effective ways to improve validity. Bouzidi and Jaillet (2009) suggested that portfolio assessment incorporate proper scoring criteria, well-developed and specific rubrics, technological support, and sufficient examples of portfolio work. It is also believed that concrete and clear rubrics are particularly beneficial for valid assessment results (Gadbury-Amyot, 2003; Oskay et al., 2008). In the present study, we discovered that students appeared to have a better grasp of concrete assessment criteria (e.g. Artifact) than of abstract criteria (e.g. Learning Goal, Reflection). According to our findings, self-assessment results and exam results were significantly consistent, implying that Web-based portfolio self-assessment is adequate for reflecting learning achievements. Therefore, Web-based portfolio self-assessment should be regarded as a valid method.
Given the timeframe and external factors such as graduation and the college entrance examinations, the duration of our experiments did not exceed three months. It is preferable that investigations into portfolio assessment extend over a longer period of time in order to enhance the authenticity of the assessment. Furthermore, only two of the students' assignments were examined for the pre-test and formal test. We hope prospective studies will incorporate more data, focusing on the diversity of individual learners' portfolio documents. Gadbury-Amyot (2003) concluded that larger samples tend to elicit good reliability; we therefore recommend that future studies expand the sample size, making the results more convincing and trustworthy.
Additionally, the Reflection aspect examined reflective thinking on one's learning goals, artifacts, learning attitudes, and achievements. In terms of the rubrics, those who were aware of their own shortcomings would receive higher scores, e.g. "I should improve my attitude and devote more effort to creating portfolios." In this case, does such a reflection adversely affect the chance of earning higher scores because the student seems to make negative comments on her own learning attitude? Or is it possible to develop a more complete scoring method by adopting additional indices or mechanisms? These are unanswered questions deserving further exploration.
References
Ballantyne, R., Hughes, K., & Mylonas, A. (2002). Developing procedures for implementing peer assessment in large classes using an action research process. Assessment & Evaluation in Higher Education, 27(5), 427–441.
Baturay, M. H., & Daloglu, A. (2010). E-portfolio assessment in an online English language course. Computer Assisted Language Learning, 23(5), 413–428.
Bloxham, S., & West, A. (2004). Understanding the rules of the game: marking peer assessment as a medium for developing students' conceptions of assessment. Assessment & Evaluation in Higher Education, 29(6), 721–733.
Bouzidi, L., & Jaillet, A. (2009). Can online peer assessment be trusted? Educational Technology & Society, 12(4), 257–268.
Chang, C.-C. (2001). A study on the evaluation and effectiveness analysis of web-based learning portfolio (WBLP). British Journal of Educational Technology, 32(4), 435–458.
Chang, L. L. (2002). The analysis of portfolio assessment reliability and validity: a case study of elementary school students. Education and Psychology, 25, 1–34.
Chang, C.-C. (2008). Enhancing self-perceived effects using Web-based portfolio assessment. Computers in Human Behavior, 24(4), 1753–1771.
Chang, C.-C., & Tseng, K.-H. (2009a). Using a Web-based portfolio assessment system to elevate project-based learning performances. Interactive Learning Environments, 16(2), 1–20.
Chang, C.-C., & Tseng, K.-H. (2009b). Use and performances of web-based portfolio assessment. British Journal of Educational Technology, 40(2), 358–370.
Cho, K., Schunn, C., & Wilson, R. (2006). Validity and reliability of scaffolded peer assessment of writing from instructor and student perspectives. Journal of Educational Psychology, 98(4), 891–901.
Davies, P. (2000). Computerized peer assessment. Innovations in Education and Training International, 37(4), 346–355.
Derham, C., & DiPerna, J. (2007). Digital professional portfolios of preservice teaching: an initial study of score reliability and validity. International Journal of Technology and Teacher Education, 15(3), 363–381.
Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: a meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3), 287–322.
Gadbury-Amyot, C. C. (2003). Validity and reliability of portfolio assessment of competency in a baccalaureate dental hygiene program. Unpublished doctoral dissertation. Kansas City, Missouri: University of Missouri.
Gadbury-Amyot, C. C., Kim, J., Palm, R. L., Mills, E., Noble, E., & Overman, P. (2003). Validity and reliability of portfolio assessment of competency in a baccalaureate dental hygiene program. Journal of Dental Education, 67(9), 991–1002.
Gelinas, A. M. (1998). Issue of reliability and validity in using portfolio assessment to measure foreign language teacher performance. Unpublished doctoral dissertation. Columbus, Ohio: Ohio State University.
Hult, C. (2001). Using on-line portfolios to assess English majors at Utah State University. In B. L. Cambridge, S. Kahn, D. P. Tompkins, & K. B. Yancey (Eds.), Electronic portfolios: Emerging practice in student, faculty, and institutional learning (pp. 60–70). Washington, DC: American Association for Higher Education Press.
Knowles, M. S., Holton, E. F., III, & Swanson, R. A. (2005). The adult learner (6th ed.). Burlington, MA: Elsevier.
Lenze, J., et al. (2004). Inter-rater reliability in the evaluation of electronic portfolios: a survey of empirical research results. In R. Ferdig (Ed.), Proceedings of Society for Information Technology & Teacher Education International Conference 2004 (pp. 164–169). Chesapeake, VA: AACE.
Liang, J. C., & Tsai, C. C. (2010). Learning through science writing via online peer assessment in a college biology course. Internet and Higher Education, 13, 242–247.
Lin, S. J., Liu, Z. F., & Yuan, S. M. (2001). Web-based peer assessment: feedback for students with various thinking-styles. Journal of Computer Assisted Learning, 17(4), 420–432.
Liu, Z. F., Lin, S. J., Chiu, C. H., & Yuan, S. M. (2001). Web-based peer review: the learner as both adapter and reviewer. IEEE Transactions on Education, 44(3), 246–251.


Luca, J., & McMahon, M. (2006). Developing multidisciplinary teams through self-assessment, supported with online tools. In E. Pearson, & P. Bohman (Eds.), Proceedings of Ed-Media (pp. 1855–1860). Norfolk, VA: AACE.
Miller, P. J. (2003). The effect of scoring criteria specificity on peer and self-assessment. Assessment & Evaluation in Higher Education, 28(4), 383–394.
Oskay, O., Schallies, M., & Morgil, I. (2008). A closer look at findings from recent publication. Hacettepe University Journal of Education, 35, 263–272.
Price, M., & O'Donovan, B. (2006). Improving performance through enhancing student understanding of criteria and feedback. In C. Bryan, & K. Clegg (Eds.), Innovative assessment in higher education (pp. 100–109). London, UK: Routledge.
Prins, F. J., Sluijsmans, D. M. A., Kirschner, P. A., & Strijbos, J.-W. (2005). Formative peer assessment in a CSCL environment: a case study. Assessment & Evaluation in Higher Education, 30(4), 417–444.
Sadler, P., & Good, E. (2006). The impact of self- and peer-grading on student learning. Educational Assessment, 11(1), 1–31.
Sluijsmans, D., Dochy, F., & Moerkerke, G. (1999). Creating a learning environment by using self-, peer-, and co-assessment. Learning Environments Research, 1(3), 293–319.
Sulzen, J., Young, M., Hannin, R., et al. (2008). Reliability and validity of an ecologically-grounded student teacher electronic portfolio rubric. In K. McFerrin (Ed.), Proceedings of Society for Information Technology & Teacher Education International Conference 2008 (pp. 153–159). Chesapeake, VA: AACE.
Sung, Y.-T., Chang, K.-E., Chiou, S.-K., & Hou, H.-T. (2005). The design and application of a web-based self- and peer-assessment system. Computers & Education, 45(2), 187–202.
Sung, Y.-T., Lin, C.-S., Lee, C.-L., & Chang, K.-E. (2003). Evaluating proposals for experiments: an application of web-based self-assessment and peer-assessment. Teaching of Psychology, 30(4), 331–334.
Topping, K. J., Smith, E. F., Swanson, I., & Elliot, A. (2000). Formative peer assessment of academic writing between postgraduate students. Assessment & Evaluation in Higher Education, 25(2), 149–169.
Tsai, C.-C., & Liang, J.-C. (2009). The development of science activities via on-line peer assessment: the role of scientific epistemological views. Instructional Science, 37(3), 293–310.
Tsai, C.-C., Lin, S.-J., & Yuan, S.-M. (2002). Developing science activities through a networked peer assessment system. Computers & Education, 38(1–3), 241–252.
Wen, M.-L., & Tsai, C.-C. (2006). University students' perceptions of and attitudes toward (online) peer assessment. Higher Education, 51(1), 27–44.
Yancey, K. B. (2001). General patterns and the future. In B. L. Cambridge, S. Kahn, D. P. Tompkins, & K. B. Yancey (Eds.), Electronic portfolios: Emerging practice in student, faculty, and institutional learning (pp. 83–87). Washington, DC: American Association for Higher Education Press.
Yang, Y.-F., & Tsai, C.-C. (2010). Conceptions of and approaches to learning through online peer assessment. Learning and Instruction, 20(1), 72–83.
Yu, M. N. (2002). Educational testing and assessment. Taipei, Taiwan: Psychological Publishing.
Zalatan, K. A. (2001). Electronic portfolios in a management major curriculum. In B. L. Cambridge, S. Kahn, D. P. Tompkins, & K. B. Yancey (Eds.), Electronic portfolios: Emerging practice in student, faculty, and institutional learning (pp. 44–52). Washington, DC: American Association for Higher Education Press.
Zawacki-Richter, O., & Hanft, A. (2011). Validation of competencies in e-portfolios: a qualitative analysis. International Review of Research in Open and Distance Learning, 12(1), 42–60.
