
Disabil Rehabil, 2015; 37(3): 231–238
ISSN 0963-8288 print/ISSN 1464-5165 online
http://informahealthcare.com/dre
© 2015 Informa UK Ltd. DOI: 10.3109/09638288.2014.913702

ASSESSMENT PROCEDURE

The performance quality rating scale (PQRS): reliability, convergent validity, and internal responsiveness for two scoring systems

Rose Martini1, Jorge Rios2, Helene Polatajko3, Timothy Wolf4, and Sara McEwen2,5

1School of Rehabilitation Sciences, University of Ottawa, Ottawa, Ontario, Canada; 2Sunnybrook Research Institute, St. John's Rehab Research Program, Toronto, Ontario, Canada; 3Department of Occupational Science and Occupational Therapy, University of Toronto, Toronto, Ontario, Canada; 4Program in Occupational Therapy, Washington University School of Medicine in St. Louis, MO, USA; and 5Department of Physical Therapy, University of Toronto, Toronto, Ontario, Canada

Abstract

Purpose: The performance quality rating scale (PQRS) is an observational measure of performance quality of client-selected, personally meaningful activities. It has been used inconsistently with different scoring systems, and there have been no formal publications on its psychometric properties. The purpose of this study was to test and compare the psychometric properties of two PQRS scoring systems in two populations. Methods: A secondary analysis of video recorded participant-selected activities from previous studies involving either adults living with stroke or children diagnosed with developmental coordination disorder (DCD) was conducted. Three pairs of raters scored the video recorded performances with PQRS operational definitions (PQRS-OD) and a generic rating system (PQRS-G). Results: For inter-rater reliability, PQRS-OD ICCs were substantial, ranging from 0.83 to 0.93, while the PQRS-G ICCs were moderate, ranging from 0.71 to 0.77. Test–retest reliability was substantial (ICC > 0.80) for both rating systems across all rater pairs. Internal responsiveness was high for both rating systems. Convergent validity with the Canadian Occupational Performance Measure (COPM) was inconsistent, with scores ranging from low to moderate. Conclusion: Both scoring systems have demonstrated that they are reliable and have good internal responsiveness. The PQRS-OD demonstrated greater consistency across raters and is more sensitive to clinically important change than the PQRS-G, and should be used when greater accuracy is required. Further exploration of validity with actual rather than perceived performance measures is required.

Keywords: Developmental coordination disorder, outcome measures, performance quality rating scale, PQRS, psychometric properties, stroke

History: Received 1 May 2013; Revised 2 April 2014; Accepted 7 April 2014; Published online 28 April 2014

Implications for Rehabilitation

• The PQRS provides clinicians and researchers with a means of measuring actual performance of self-selected activities.
• Rehabilitation scientists are encouraged to improve our understanding of the relationship between perceived and actual performance as a means to improve interventions and outcomes.

Address for correspondence: Sara McEwen, PT, PhD, Sunnybrook Research Institute, St. John's Rehab Research Program, 285 Cummer Avenue, Toronto, Ontario, Canada M2M 2G1. Tel: 416-226-6780, ext. 7957. E-mail: sara.mcewen@utoronto.ca

Introduction

Combining task-specific training with a client-centred approach, wherein the client self-selects the tasks or goals to be the focus of an intervention, has demonstrated efficacy in several populations, including people with stroke, brain injury, and developmental coordination disorder (DCD) [1–4]. However, measuring changes in the ability to perform client-selected goals has proven challenging, usually relying on client self-report, such as using the Canadian occupational performance measure (COPM) [5], or combining self-report with standards for the specific goal, such as using goal attainment scaling (GAS) [6].

The performance quality rating scale (PQRS) is an observational, video-based tool intended to measure the actual performance of individual client-selected activities. We distinguish that the PQRS measures actual performance rather than capacity using definitions provided by Lemmens et al. [7]; they suggest that actual performance is the objectively detectable performance in the current environment, whereas capacity is the highest possible level of functioning in a standardized environment. Martini and Polatajko first described the PQRS in 1998, when the tool was developed to evaluate children's performance in self-selected activities in a single case experimental series testing an intervention for children with DCD [8]. It was described as a 6-point (0–5) scale used to examine both the activity steps and the performance quality, with scoring based on objective, operationalized criteria developed for each activity. In 2001, Miller et al. modified the PQRS, making it a 10-point performance rating scale to better complement the 10-point client-rated COPM [4].

Subsequently, the PQRS has been used in other intervention studies [2,3,9–11]; however, it has been used inconsistently, with different scoring systems, and a formal evaluation of its psychometric properties has not been conducted. In fact, in Lemmens and colleagues' recent review of upper extremity assessments for stroke and cerebral palsy, the PQRS was excluded specifically because of the lack of published psychometric properties [7].

The PQRS has been used with either generic scoring, or with specific operational definitions for different rating levels for each participant-selected skill. With generic scoring, such as used by Miller et al. [4], a rating of "1" indicates that the skill is not done at all, and "10" indicates that the skill is performed very well. Alternatively, Martini and Polatajko [8] and McEwen et al. [2,3] developed operational definitions for PQRS ratings for specific skills. For example, if the skill was tying shoelaces, an operational definition for a rating of "2" may be "crosses laces but cannot make a knot"; descriptions of progressing skill level are given for at least every other rating level [2,3,8].

The PQRS has generally been used in conjunction with the COPM. In the COPM, the therapist administers a semi-structured interview to identify and prioritize individualized, meaningful, real-life performance issues that are subsequently developed into the goals/activities that are the focus of treatment. The COPM also measures clients' self-perceived changes in performance and performance satisfaction of identified activities [5]. The PQRS complements the COPM by assessing actual rather than perceived performances of the activities that were identified. Each chosen activity is executed without verbal or physical guidance in a non-standardized environment, i.e. any appropriate space within a treatment facility, community, or home. An independent observer, using a 10-point scale, then rates video recordings of the activity.

PQRS inter-rater reliability has previously been reported, but within intervention studies, and with no reporting of the methods used for evaluation [3,4]. No direct comparisons between generic ratings and specific operational definitions have been made using the same population with the same goals. Developing operational definitions for PQRS ratings for individual, client-selected activities for all participants in a research study or all clients in a therapist's practice takes time and is therefore costly, but may deliver more measurement precision than using a generic 10-point scale. Furthermore, using the PQRS as a tool for research in interventions with participant self-selected activities necessitates the availability of clearly standardized administration procedures and good psychometric properties. Therefore, the objective of this study was to estimate and compare the reliability, convergent validity, and internal responsiveness of two PQRS rating systems in two populations, adults with stroke and children with DCD.

Methods

To meet the objective above, three pairs of raters were asked to watch video-recorded activity performances from the two patient populations, and to score performances using both a generic rating system (PQRS-G) and a system using specific operational definitions (PQRS-OD). This was a secondary analysis of video-recorded task performances from participants in previous intervention studies evaluating the efficacy of the Cognitive Orientation to daily Occupational Performance (CO-OP) treatment approach with either adults living with stroke or children diagnosed with DCD. The stroke participants, described in two previously published studies [2,3], comprised six community-dwelling adults between the ages of 38 and 56, four men and two women, all more than 1 year post-stroke, and with pre-intervention impairment levels ranging from 2/7 to 6/7 on the Chedoke-McMaster Stroke Assessment Impairment Inventory [12]. The children were participants in a study on the use of CO-OP in a group format described in [13] and in Canadian Association of Occupational Therapists Conference presentations in 2011 and 2012. They were between 7 and 11 years of age, six boys and two girls, and all performed below the 15th percentile on the Movement Assessment Battery for Children – 2 [14].

The investigators selected videos for inclusion in this study that had associated COPM scores and that depicted an activity that was amenable to video analysis. For example, a video of a stroke patient doing a yoga breathing activity, which was perceived as too subtle to be assessed on video, was not included.

Three pairs of blinded raters were used: two trained research assistants without a health-care professional background (RA), two registered occupational therapists (OT), and two occupational therapy professional master's students (ST). None of the raters had ever been involved in the treatment of the patients. Within each pair, one rater was randomly chosen to be the A Rater, and the other automatically became the B Rater.

Instruments

The COPM is an outcome measure that, through a semi-structured interview, is designed to (a) identify and prioritize occupational performance issues or goals, and (b) measure clients' self-perceived changes in occupational performance and satisfaction with their performance of identified goals over time [5]. The COPM has demonstrated test–retest reliability of 0.89 in people with stroke [15], and has been used with children to identify goals [4,8,16]. The COPM has been shown to be sensitive to changes over time in a client's perceived occupational performance and satisfaction in client-identified activities [17]. A change of two points or more on the COPM is considered clinically significant [18].

The PQRS rates performance on a 10-point scale, with a score of 1 indicating "can't do the skill at all" and 10 indicating "does the skill very well" [4]. An independent observer rates performances from video recorded trials of each skill at all assessment points. In this study, we compared two versions of PQRS scoring: PQRS-OD, in which operational definitions were developed for at least every other numerical rating for each goal selected by participants, and PQRS-G, in which a generic rating system was used. Table 1 shows both systems. In this instance, all operational definitions were developed or edited by either RM (child definitions) or SM (adult definitions).

Procedure

Two rating periods were established, with a 2-week washout period in between. During the first rating period, A Raters used the PQRS-G system for the first half of the videos, and PQRS-OD for the second half of the videos. During the second rating period, 2 weeks later, this was reversed, and A Raters used the PQRS-OD system for the first half of the videos and the PQRS-G system for the second half. B Raters did the opposite (Figure 1). At the end of both rating periods, all raters had watched each video twice, had used both rating systems, and had used each rating system first in one rating period or the other. This last procedure was enacted in case there was an effect of having used one rating system or the other first.

One set of videos was used to explore inter-rater reliability, convergent validity, and responsiveness, and a second set of videos was used to explore test–retest reliability. For inter-rater reliability, convergent validity, and responsiveness, 20 pairs of videos (40 videos in total) were viewed. Ten pairs (20 videos) were of adults and 10 pairs (20 videos) were of children. Each pair consisted of a pre-intervention video recording of a self-selected activity and a post-intervention video recording of the same self-selected activity, performed by the same research participant.
Table 1. Comparison of PQRS-OD and PQRS-G rating systems for tying a tie.

Example of PQRS-OD rating scale and definitions

Rating  Operational definitions for tying a neck tie
1       Hands do not grasp tie and attempt to tie it.
2       Knot is partially completed.
4       Knot is completed, but loosely. Uses teeth for part of the process, and/or bottom tail of tie is longer than top tail, and/or tie bottom is not sitting at belt (3/3 or 2/3).
6       Knot is completed loosely. Uses teeth for part of the process OR top tail of tie is longer than bottom tail OR tie bottom is NOT sitting at belt (1/3).
8       Knot is completed loosely. Top tail of tie is longer than bottom tail, and tie sits roughly at belt.
10      Tie is tied quickly, with no errors or false starts. Knot is taut, top tail of tie is longer than bottom tail, and tie bottom sits roughly at belt.

PQRS-G rating scale

Instructions: Watch the video. Rate the performance on a scale of 1 to 10, with 1 being "cannot perform the task at all" and 10 "can perform the task well". Use the table below as a guideline to assist you in rating the performance. Please note that only ONE score is assigned. The video may be watched a second time, but not more than twice.

Quality takes into account, as applicable to the task, timeliness, accuracy, safety, and overall quality of the performance or product. The rating is based on the average of the completeness and quality ratings, rounded to the nearest whole number with X.5 being rounded up. For example, an observed performance scored as 75% task completion (6) for Completeness and poor (4) for Quality of Performance or Product receives an average rating of 5 [(6 + 4)/2]. Scores of 3, 5, 7, and 9 are to be used at the rater's discretion for observed performances that are marginally better or worse than something previously observed, but do not quite meet the criteria for the next defined category.

Rating  Completeness            Quality of performance or product
1       0% task completion      n/a
2       25% task completion     Very poor
3       –                       –
4       50% task completion     Poor
5       –                       –
6       75% task completion     Moderate
7       –                       –
8       100% task completion    Good
9       –                       –
10      100% task completion    Excellent
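As a worked illustration of the PQRS-G averaging rule above, the short sketch below (an illustration only; the function name is ours and nothing here comes from the published protocol) averages a completeness rating and a quality rating and rounds averages ending in .5 upward, reproducing the (6 + 4)/2 = 5 example from the instructions.

```python
from decimal import Decimal, ROUND_HALF_UP

def pqrs_g_score(completeness_rating: int, quality_rating: int) -> int:
    """Average the PQRS-G completeness and quality ratings (each 1-10),
    rounding averages ending in .5 up, as in the Table 1 instructions.

    Decimal is used because Python's built-in round() rounds halves to the
    nearest even number, which would not match the "X.5 rounded up" rule."""
    mean = Decimal(completeness_rating + quality_rating) / 2
    return int(mean.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# Worked example from the instructions: 75% completion (6), poor quality (4) -> 5.
print(pqrs_g_score(6, 4))  # 5
# An average ending in .5 is rounded up: (6 + 7) / 2 = 6.5 -> 7.
print(pqrs_g_score(6, 7))  # 7
```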

Figure 1. Visual representation of the study design.

Time 1
  A Rater: 1. PQRS-G (inter-rater reliability: 10 pairs of videos, videos 1–20; test–retest reliability: 5 individual videos, 21–25). 2. PQRS-OD (inter-rater reliability: 10 pairs of videos, videos 26–45; test–retest reliability: 5 individual videos, 46–50).
  B Rater: 1. PQRS-OD (inter-rater reliability: 10 pairs of videos, videos 1–20; test–retest reliability: 5 individual videos, 21–25). 2. PQRS-G (inter-rater reliability: 10 pairs of videos, videos 26–45; test–retest reliability: 5 individual videos, 46–50).

2-week washout period

Time 2
  A Rater: 1. PQRS-OD (inter-rater reliability: 10 pairs of videos, videos 1–20; test–retest reliability: 5 individual videos, 46–50). 2. PQRS-G (inter-rater reliability: 10 pairs of videos, videos 26–45; test–retest reliability: 5 individual videos, 21–25).
  B Rater: 1. PQRS-G (inter-rater reliability: 10 pairs of videos, videos 1–20; test–retest reliability: 5 individual videos, 46–50). 2. PQRS-OD (inter-rater reliability: 10 pairs of videos, videos 26–45; test–retest reliability: 5 individual videos, 21–25).

As part of previously conducted intervention studies, the research participants had been interviewed by a therapist using the standardized COPM interview to select personally meaningful activities for assessment and intervention. The order of the pre- and post-intervention videos was randomized within each pair, and the videos within a pair were always viewed in sequence. In other words, if the self-selected activity was skipping rope, both the pre-intervention and post-intervention videos of skipping rope would be viewed back-to-back, but the rater may or may not have seen the pre-intervention video first. Each video recording had a corresponding COPM score that had been previously assigned by the research participant during the respective intervention study.

For test–retest reliability, an additional 10 individual videos of self-selected activities were viewed, five of adults and five of children. These videos were not paired, as pre- and post-intervention pairs were not required for assessment of test–retest reliability. At each rating period, A Raters scored five videos with the PQRS-OD and the other five videos with the PQRS-G, repeating the procedure at the second rating period. The B Raters used the same videos and procedures, but employed the alternate PQRS to the one used by the A Raters.

The file names of the video recordings were designed to indicate the order in which the raters should view them, and the video files' metadata was altered to ensure the raters could not determine the chronological order of the video pairs. Videos were transferred to memory sticks encrypted using TrueCrypt (copyright © 2003–2013 TrueCrypt Developers Association; http://www.truecrypt.org). Each memory stick contained videos for a specific rating period and rating scale. To accommodate the schedules of the raters across two study sites, the memory sticks were given to the raters in advance, and when a rater was ready to begin a rating period, they were e-mailed the passwords to the appropriate memory sticks. Raters were instructed to rate each video to the best of their ability using the assigned system, but not to review each video more than twice to decide on the score. This instruction was given to minimize practice effects.

Ethics approval for the present study was obtained from the Research Ethics Boards of the participating institutions.

Analysis

Data were cleaned and examined for discrepancies. A decision to remove some videos was made based on feedback from raters. If a video was identified by raters as problematic, it was removed only if there were between-rater differences of 4 points or more across more than one pair of raters and with both rating systems. This resulted in five videos being removed, leaving 35 videos for inter-rater reliability analysis. Since pairs of videos (pre-test and post-test) were required for the validity and responsiveness analyses, an additional three videos were removed for those analyses, leaving 32 videos (16 pairs or tasks: 9 by children and 7 by adults). For test–retest reliability one video was removed, leaving nine videos of different skills, four of adults and five of children. The decision to remove these videos from the analysis was made to reduce measurement error based on the quality of the videos rather than on issues with the scoring systems. Table 2 lists all skills that were viewed.

Table 2. Skills viewed on videos.

Inter-rater reliability and responsiveness videos
  Children: Putting on and tying a hockey helmet; Tying shoe laces (4 different children); Jumping rope; Stopping with rollerblades; Riding a bicycle; Swinging on a swing
  Adults: Carrying objects with affected arm; Getting in and out of a chair; Buttoning a shirt; Using a computer mouse; Putting on a jacket; Sewing; Gardening; Climbing stairs; Cutting fruit; Riding a bicycle

Test–retest reliability videos
  Children: Swinging on a swing; Jumping rope alone; Brushing teeth; Lacing skates; Running
  Adults: Clipping nails; Walking; Tying a tie; Photography

Test–retest and inter-rater reliability were calculated using the intra-class correlation coefficient (ICC). To determine inter-rater reliability, two-way mixed model ICCs with absolute agreement were calculated for pairs of raters (OTs, RAs and STs) for both the PQRS-OD and the PQRS-G scoring systems. While the PQRS is an ordinal measure, Norman and Streiner (2008) have argued that using a weighted kappa for ordinal scales is identical to a two-way mixed ICC, and the two may be substituted interchangeably [19]. The following benchmarks proposed by Shrout (1998) were used for the interpretation of inter-rater reliability coefficients: 0.00 to 0.10 (virtually none); 0.11 to 0.40 (slight); 0.41 to 0.60 (fair); 0.61 to 0.80 (moderate); and 0.81 to 1.0 (substantial) [20].
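To make the reliability computation concrete, the following sketch estimates a single-rater, absolute-agreement ICC from a videos-by-raters matrix of scores using the two-way ANOVA mean squares (McGraw and Wong's ICC(A,1), the coefficient corresponding to a two-way model with absolute agreement), and maps the result onto Shrout's benchmarks. The data, function names, and use of numpy are our own illustrative assumptions, not the authors' analysis code.

```python
import numpy as np

def icc_a1(scores: np.ndarray) -> float:
    """Single-rater, absolute-agreement ICC (McGraw & Wong's ICC(A,1))
    computed from two-way ANOVA mean squares.

    `scores` is an (n videos) x (k raters) array of PQRS ratings."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # one mean per video
    col_means = scores.mean(axis=0)   # one mean per rater
    ss_total = ((scores - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def shrout_label(icc: float) -> str:
    """Map an ICC onto Shrout's (1998) descriptive benchmarks."""
    cuts = [(0.10, "virtually none"), (0.40, "slight"), (0.60, "fair"),
            (0.80, "moderate"), (1.00, "substantial")]
    return next(label for cut, label in cuts if icc <= cut)

# Hypothetical ratings: six videos scored by one pair of raters.
ratings = np.array([[3, 4], [7, 8], [2, 2], [9, 9], [5, 6], [8, 7]], dtype=float)
icc = icc_a1(ratings)
print(f"ICC(A,1) = {icc:.2f} ({shrout_label(icc)})")
```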
Intra-class correlation coefficients were also calculated to evaluate convergent validity between PQRS and COPM performance and satisfaction scores at pre-test and post-test. Convergent validity of items is the degree to which theoretically similar concepts are related [21].

Internal responsiveness is the ability of a measure to detect change over a specified time frame [22]. The PQRS is an ordinal scale, so a significant change finding was determined using the non-parametric Wilcoxon signed rank test. It should be noted that, in their comparison of the Wilcoxon signed rank test with the t-test, Meek, Ozgur and Dunning (2007) concluded that the t-test may be preferred over the Wilcoxon signed rank test, as the t-test seemed to possess higher power when the sample sizes are small, even though its assumptions were violated [23]. As such, a paired t-test between pre and post scores was used to confirm significant change.

The effect size statistic provides direct information on the magnitude of change in a measure; as such, it is widely recommended for use as an indicator of responsiveness [22]. The effect size for an ordinal scale was calculated as r = Z/√N.
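A minimal sketch of the pre-post testing just described: a Wilcoxon signed-rank test confirmed by a paired t-test, with the ordinal effect size computed as r = Z/√N. Because scipy reports the signed-rank statistic rather than Z, Z is taken here from the usual large-sample approximation; the data are invented and the choice of approximation is ours.

```python
import numpy as np
from scipy import stats

# Hypothetical pre- and post-intervention PQRS scores for 16 task pairs.
pre = np.array([2, 3, 4, 2, 3, 5, 1, 4, 3, 2, 6, 3, 2, 4, 3, 5])
post = np.array([7, 8, 9, 6, 7, 9, 5, 8, 8, 6, 9, 7, 6, 8, 7, 9])

# Non-parametric test of the pre-post change (the PQRS is ordinal) ...
w_stat, w_p = stats.wilcoxon(pre, post)
# ... confirmed with a paired t-test, as described above.
t_stat, t_p = stats.ttest_rel(post, pre)

# Ordinal effect size r = Z / sqrt(N), with Z from the standard large-sample
# approximation to the signed-rank statistic (no tie or continuity correction).
d = post - pre
n = np.count_nonzero(d)                 # zero differences are discarded
mu_w = n * (n + 1) / 4
sigma_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = abs(w_stat - mu_w) / sigma_w
r = z / np.sqrt(n)

print(f"Wilcoxon p={w_p:.4f}, paired t={t_stat:.2f} (p={t_p:.4g}), z={z:.2f}, r={r:.2f}")
```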
Relative efficiency (RE) is another estimate of change in a measure. It allows one to determine whether one measure is more or less efficient for measuring change than another, and it can be computed for any pair of instruments. An RE < 1 means that the instrument was a less efficient tool for measuring change, while an RE > 1 means that the instrument was a more efficient tool for measuring change.

Husted et al. (2000) recommended that this be complemented by another measure. The standardized response mean (SRM) was therefore used to confirm the paired t-test and RE index findings. The SRM is the ratio of the observed change to the standard deviation of the change scores [22]. Norman (2010) argues that parametric statistics are robust with respect to violations of assumptions regarding sample size and normal distribution and hence can be applied to ordinal ratings [19]. As such, the SRM, based on the paired t-test calculations, was applied to ordinal PQRS ratings. Calculated SRM effect sizes were compared to Hopkins' more conservative Likert-scale approach to predetermined values of effect size representing responsiveness as: trivial (0–0.20), small (0.20–0.60), moderate (0.60–1.20), large (1.20–2.0), very large (2.0–4.0), or "nearly perfect" (4.0 or greater).
The ability of a scale to detect clinically relevant changes over time, or responsiveness, is influenced by its test–retest reproducibility [24]. It is important to calculate changes in the means from the measurements obtained at test–retest in order to determine what change arises from typical variation. A change, following an intervention, that is smaller than typical variation is usually not clinically important [25]. Minimal detectable change, or smallest real difference (SRD), is the smallest change in score that can be interpreted as a real change in a client [24]. SRD can be used as a threshold to determine whether changed scores for a particular client illustrate real improvement, rather than improvement due to measurement error or chance variation [26].

A change, such as in task performance between pre-test and post-test, is based on the change score and its error size. As such, SRD was calculated from the standard error of measurement (SEM) with the formula SRD = 1.96 × √2 × SEM [24]. SEM was calculated by taking the square root of the mean square error term from the analysis of variance [25].
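As an illustration of the SRD computation, the sketch below takes the SEM as the square root of the error mean square from a two-way (video × occasion) ANOVA on hypothetical test–retest ratings and applies SRD = 1.96 × √2 × SEM, the standard formulation from the cited SRD literature; whether the original analysis arranged the ANOVA exactly this way is an assumption on our part.

```python
import numpy as np

def sem_from_anova(scores: np.ndarray) -> float:
    """SEM = sqrt(error mean square) from a two-way (row x column) ANOVA,
    e.g. videos x rating occasions for test-retest data."""
    n, k = scores.shape
    grand = scores.mean()
    ss_total = ((scores - grand) ** 2).sum()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return np.sqrt(ms_error)

# Hypothetical test-retest PQRS ratings: five videos scored twice by one rater.
test_retest = np.array([[3, 4], [7, 7], [5, 6], [8, 8], [2, 3]], dtype=float)
sem = sem_from_anova(test_retest)
srd = 1.96 * np.sqrt(2) * sem
print(f"SEM = {sem:.2f}, SRD = {srd:.2f}")
# A pre-post change larger than the SRD can be read as real change (95% confidence).
```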
Results

Reliability

Table 3 displays ICC correlations between each pair of raters, as well as an overall correlation across all raters, for both inter-rater and test–retest reliability. Based on Shrout's (1998) classification, all inter-rater correlations for the PQRS-OD were substantial, with ICCs varying from 0.83 to 0.93, while the PQRS-G displayed moderate inter-rater correlations, with ICCs varying from 0.71 to 0.77. To determine whether inter-rater reliabilities were influenced by the population performing the tasks, inter-rater correlations were also calculated per population (Table 3). For both children and adults, the ICC coefficients indicated substantial inter-rater correlations for the PQRS-OD, whereas most correlations for the PQRS-G were in the moderate range.

While test–retest reliability is slightly higher for the PQRS-G for the RAs and OTs, test–retest reliability was higher for the PQRS-OD for the students. Regardless, test–retest reliability is substantial for both the PQRS-G and the PQRS-OD for all categories of raters.

Table 3. Intra-class correlation coefficients between rater pairs and SRDs.

                           Children            Adult               Combined
Rater pairs                PQRS-G   PQRS-OD    PQRS-G   PQRS-OD    PQRS-G   PQRS-OD
Inter-rater reliabilities
Research assistants        0.87a    0.93a      0.66b    0.92a      0.77b    0.93a
OT students                0.75b    0.90a      0.80a    0.96a      0.76b    0.92a
Occupational therapists    0.68b    0.84a      0.77b    0.98a      0.71b    0.88a
Total                      0.76b    0.88a      0.74b    0.94a      0.74b    0.91a
Test–retest reliabilities
Research assistants        0.92a    0.94a      0.52c    0.39d      0.86a    0.81a
OT students                0.94a    0.99a      0.78b    0.99a      0.88a    0.99a
Occupational therapists    0.95a    1.0a       0.90a    0.68b      0.99a    0.88a
Total                      0.93a    0.99a      0.77b    0.67b      0.88a    0.88a
SRD
Research assistants        2.91     0.90       2.96     5.10       6.23     8.17
OT students                2.63     0.90       3.66     0.89       2.83     1.61
Occupational therapists    2.13     –e         2.23     2.91       3.35     1.36
Total                      2.55     0.70       2.85     4.24       2.69     2.99

aSubstantial agreement; bModerate agreement; cFair agreement; dSlight agreement; eNot possible to calculate as the standard error of the difference was 0; SRD, smallest real difference = 1.96 × √2 × SEM.

Convergent validity

The correlations between both versions of the PQRS and the COPM Performance Score are noted in Table 4. The negative ICC scores should be interpreted as indicating a low intra-class correlation, whereby the correlated pair of scores varies as much as any two randomly selected pairs of scores [27]. It seems that little convergent validity exists between the PQRS and the COPM's performance component. Examining the adult and child populations separately, correlations were similar at pre-test. At post-test, correlations varied from slight to moderate for children and from virtually none to slight for adults. Correlations of change scores varied from virtually none to fair for children and from slight to moderate for adults. Findings for both children and adults confirmed the general lack of convergent validity between the COPM performance scores and the PQRS scores.

Table 4. Intra-class correlation coefficients between PQRS scores and COPM scores.

                           COPM performance      COPM satisfaction
Rater                      PQRS-G   PQRS-OD      PQRS-G   PQRS-OD
Pre-test
Research assistants        0.08     0.37         0.12     0.23
OT students                0.10c    0.14         0.13     0.10
Occupational therapists    0.07     0.18         0.07     0.21
Post-test
Research assistants        0.37b    0.53a        0.14b    0.24b
OT students                0.44a    0.33b        0.16b    0.25b
Occupational therapists    0.20b    0.49a        0.04c    0.29b
Change scores
Research assistants        0.35     0.14         0.57     0.58
OT students                0.20     0.36         0.47     0.44
Occupational therapists    0.17     0.12         0.17     0.21

aFair; bSlight; cVirtually none; negative ICC scores should be interpreted as indicating a low intra-class correlation.

The ICCs for the COPM satisfaction component varied from virtually none to slight for both versions of the PQRS for both pre-test and post-test correlations. Slightly higher correlations, slight to fair, are noted for the correlations of change scores. A fair level of convergent validity is noted between the change scores rated by the RAs and STs and the participants' satisfaction change scores. At post-test, correlations for children were virtually none for both versions of the PQRS, while for adults correlations varied from virtually none to fair with the PQRS-G and slight to fair with the PQRS-OD. For change scores, correlations varied from virtually none to moderate for children and virtually none to fair for adults. While adults demonstrated a greater correlation between satisfaction scores and PQRS scores than the children, there does not appear to be one version of the PQRS for which good convergent validity was observed consistently.

Internal responsiveness

All paired variances, for all paired raters and both versions of the PQRS, were found not to differ significantly, as verified using the Pitman-Morgan test, a test of variance for paired samples [28,29]. The results of the paired t-tests, confirmed by the Wilcoxon signed ranks tests, are presented in Table 5. All paired t-tests and Wilcoxon signed rank tests, for both PQRS-G and PQRS-OD, were significant, indicating that the task performance change from pre-test to post-test was significant.

Table 5. Testing for pre-post intervention differences: paired t-tests and Wilcoxon signed ranks tests (z).

                           Time 1                                  Time 2
                           Median (range)  IQR   Mean   SD         Median (range)  IQR   Mean   SD       z     t      p
PQRS-OD: Pre-Post (n = 33)
Research assistants        3 (5)           2     3.06   1.52       8 (6)           3     7.42   2.0      4.7   9.53   <0.001
OT students                3 (5)           2     3.06   1.50       8 (8)           3.5   7.64   2.16     4.8   9.64   <0.001
Occupational therapists    2 (5)           2     2.79   1.47       8 (6)           2.5   7.15   1.89     4.7   10.1   <0.001
PQRS-G: Pre-Post (n = 31)
Research assistants        3 (9)           3     3.72   2.36       8 (7)           2     7.47   1.98     4.1   6.58   <0.001
OT students                4 (8)           4.8   4.03   2.62       8 (6)           3.5   8.06   1.66     4.6   7.50   <0.001
Occupational therapists    2.5 (6)         1.8   2.91   2.91       7 (8)           3     6.63   2.15     4.7   8.49   <0.001

IQR, interquartile range; p, probability value for both z and t values.
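The Pitman-Morgan check on paired variances mentioned above can be carried out with the standard identity that var(X) = var(Y) for paired samples exactly when (X + Y) and (X − Y) are uncorrelated, so a Pearson test of that correlation (t with n − 2 degrees of freedom) provides the test. The sketch below uses invented data; it is not the authors' code.

```python
import numpy as np
from scipy import stats

def pitman_morgan(x: np.ndarray, y: np.ndarray):
    """Pitman-Morgan test of equal variances for paired samples.

    Equality of var(x) and var(y) is equivalent to zero correlation between
    (x + y) and (x - y), so the Pearson zero-correlation test supplies the
    test statistic and p-value."""
    r, p = stats.pearsonr(x + y, x - y)
    return r, p

pre = np.array([3.0, 2, 4, 3, 2, 5, 1, 4])
post = np.array([7.0, 8, 9, 6, 7, 9, 5, 8])
r, p = pitman_morgan(pre, post)
print(f"r(sum, diff) = {r:.2f}, p = {p:.3f}")  # for these data p is well above 0.05
```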

Effect sizes (Z/√N) and relative efficiencies are presented in Table 6. Large effect sizes were obtained for both the PQRS-OD and the PQRS-G (r > 0.72). With respect to relative efficiencies, all scores are less than 1, indicating that the PQRS-OD is the more efficient tool for measuring change than the PQRS-G. This is confirmed by the SRM in Table 6. For both the PQRS-G and the PQRS-OD the various effect sizes are large (greater than 1), which suggests that the observed changes in task performance are clinically meaningful. Middel and van Sonderen (2002) report that "to give clinically relevant meaning to change scores gained on two different points in time" (p. 13), individual clients' perceptions should be considered more explicitly [30]. In keeping with this suggestion, the RE and SRM were again calculated after having removed the only item that did not have a clinically meaningful change of two points on the COPM (data not shown), and the PQRS-OD was still found to be the more responsive tool.

Table 6. Relative efficiencies, parametric effect sizes (SRM), and non-parametric effect sizes (responsiveness).

                           SRM                   Responsiveness
Rater pairs       RE       PQRS-G   PQRS-OD      PQRS-G   PQRS-OD
Research assistants        0.69     1.11     1.59         0.73     0.82
OT students                0.79     1.30     1.62         0.83     0.82
Occupational therapists    0.83     1.44     1.70         0.81     0.84
All raters                 0.76     1.28     1.68         0.79     0.82

RE, relative efficiency = t(PQRS-G)/t(PQRS-OD); SRM, standardized response mean = mean change score/SD of the change scores; responsiveness = Z/√N.

SRDs for the different raters and PQRS versions are presented in Table 3. When a change score (pre-post) is greater than the SRD, a true change can be ascertained with 95% confidence [31]. Generally, smaller SRD values are obtained using the PQRS-OD than the PQRS-G. For children, the SRDs for the PQRS-OD varied from 0.69 to 0.89. This indicates that with the PQRS-OD a change in score of 1 can indicate a true change in children. With the PQRS-G, SRDs for children varied from 2.13 to 2.91, indicating that a change in score of 3 is needed to be 95% sure that the change is real and not due to measurement error. With adults, it seems that a greater change score is required to ensure that it is a true change. For the PQRS-OD, adult SRDs varied from 0.89 for OT students to 5.10 for research assistants, while for the PQRS-G, SRDs varied from 2.23 for occupational therapists to 3.66 for OT students. In general, for both the PQRS-OD and the PQRS-G, greater SRDs are noted for research assistants than for occupational therapy students and OTs.

Discussion

This is the first study to formally estimate the psychometric properties of the PQRS, an observational system for rating the performance quality of client-selected activities. Two different PQRS scoring systems were compared using a variety of client-selected activities performed by both adults and children. These results (obtained with two distinct populations and a wide variety of self-selected activities derived from COPM interviews) indicate that using the PQRS-OD provides better inter-rater reliability than the PQRS-G, overall as well as per population. Reliability data indicated that both versions of the PQRS are reliable, a requisite characteristic for a measure to be responsive. Both the PQRS-G and the PQRS-OD demonstrated the ability to detect change in general. When comparing the two versions, the RE coefficients point to the PQRS-OD as being more sensitive to general and clinically important changes over time than the PQRS-G. The PQRS-OD generally obtained smaller SRDs, confirming that smaller changes detected with the PQRS-OD may be interpreted as a real change with 95% confidence, thus corroborating that it is a more sensitive instrument than the PQRS-G.

These data suggest that the PQRS-OD is a precise tool that permits a cross-comparison of a wide range of client-chosen goals and activities. On the other hand, the PQRS-G provides moderate inter-rater reliability, substantial test–retest reliability, and large effect sizes demonstrating good internal responsiveness, indicating that it is an adequate measure for use in clinical settings, or in research projects when taking the time and resources required to write operational definitions is not feasible.
The PQRS was designed to loosely mirror the COPM, to provide an additional measure of self-selected activity performance using ratings of actual observed performances rather than self-perceived performance. The COPM is used to elicit the activities that will subsequently be performed, videotaped, and rated. We hypothesized that PQRS and COPM scores would be highly correlated because the PQRS activities were derived from the COPM interview and were thus matched. This did not prove to be true. While the COPM has demonstrated low convergent validity with other measures in the past, this was presumed to be because the COPM items are defined by the client, whereas the items in other measures are predetermined and can differ greatly from what the COPM is measuring [32]. That issue does not help to explain the low convergent validity between the COPM and PQRS in this study, as the PQRS items came directly from the COPM. The low convergent validity in the current study is better explained by the idea that actual performance and perceived performance may be different constructs. Chan and Lee (1997) found virtually zero correlation between perceived performance (COPM) and actual performance measures, providing support for this idea [33]. Thus, to explore the convergent validity of the PQRS, other measures of actual performance should be used. For example, the Functional Arm Activity Behavioral Observation System (FAABOS) makes use of random samples of video recordings from stroke and brain injury survivors' homes to evaluate actual upper extremity performance, and could perhaps be used as a starting point to assess PQRS validity [34].

Previous use of the PQRS has identified certain usability issues, generally related to whether or not the activity of interest can be seen adequately on video [3]. This issue resurfaced in this study; for example, in the "brushing teeth" goal, some raters reported that the video did not show the inside of the mouth, making performance quality difficult to rate. Some additional issues were also described by raters. Lack of clarity about the specifics of some goals made using PQRS-G ratings difficult. For example, with the "sewing" goal, it was not clear whether that meant to sew a single stitch or to sew two pieces of fabric together, making it difficult for the raters to judge performance completeness. Another issue, reported by some raters with both rating systems, arose when the video showed more than one attempt of the same task, particularly if the attempts had markedly different performance quality. For instance, in the "skipping rope" goal, the video showed the child missing the first attempt at jumping, but then showed the child readjusting herself and completing 10 consecutive jumps. It was not clear to raters how to manage the observation of different trials within one video clip. A third issue was raised with the PQRS-OD. At times, raters described occurrences when the video did not match the criteria in the operational definition. For example, for bike riding, one video clip began when the participant was already on the bike, but the PQRS-OD criteria specified the amount of assistance required to mount the bike.

To mitigate these problems, the following recommendations regarding PQRS administration are made:

• When using the PQRS-G, the goal/activity should be well-specified and should have a degree of measurability to it. For example, recording the goal as "skipping" is too vague, whereas recording "skipping a minimum of 10 jumps" may provide the rater with sufficient additional information to improve rating accuracy.
• For both rating systems, care should be taken while video recording and making video clips to show only one performance attempt, the entire performance of the task, and close-ups of important task details where possible. In a clinical setting, performances can be rated during a live observation of the activity rather than using a video recording.
• If the goal/activity cannot easily be observed on video, PQRS ratings can be done with a physical product, such as handwriting samples, or by superimposing another objective rating system onto PQRS scores, such as the time it takes for task completion.

Study limitations

As we conducted a secondary analysis of existing video data, the video clips were not always ideal, as described above. While it is likely that the reliability scores would have been higher with perfect videos, and lower if the lowest quality videos had not been removed from the analysis, we applied strict criteria for removing certain videos, designed to strike a balance between ideal videos and videos that mirror the realities of clinical research. While we did ensure a wide range of activities, some activities such as bike riding and shoelace tying were self-selected by more than one participant, increasing raters' familiarity with a particular task. Also, raters were required to view the same video on two occasions, thus increasing familiarity with that particular video, which may have influenced the rating on the second occasion. Nevertheless, it is expected that the 2-week washout period was sufficient to guard against this potential practice effect. Another limitation was that only 10 videos were used for test–retest reliability. While this is a small number, correlations were substantial; as such, it is unlikely that more samples would result in a substantive change. As all videos came from a small sample of six adults and eight children, the results may not be generalizable to a broader population. Additional testing is recommended.

The PQRS-OD requires that someone, usually the clinical or research therapist, take the time to write clear operational definitions in line with the client's self-selected COPM goal. It is possible that definitions written by different individuals would have resulted in different psychometric properties. However, in this study the definitions were prepared by two different research therapists (R.M. and S.M.) with different styles, suggesting that reliability remains high even with varying definition development styles.

Future validation studies should use other actual performance measures, and should be expanded to include other properties such as predictive validity. In addition, the relationship between the PQRS and an individual's overall level of functioning should be examined.

Conclusions

Overall, the PQRS showed substantial reliability and internal responsiveness in two different populations and across raters with very different backgrounds. The PQRS-OD had better results than the PQRS-G for inter-rater reliability, internal responsiveness, and SRD, and is therefore recommended when greater precision is desired. The PQRS-G demonstrated acceptable inter-rater reliability, substantial test–retest reliability, and good internal responsiveness, and is therefore acceptable to use when preparing operational definitions is not feasible. Further validation of the PQRS against actual performance and overall functional measures is required.

Acknowledgements

The authors would like to thank Dr. Carolyn Baum for her contributions to the development of the PQRS-G scoring system and for her helpful comments on the manuscript, and Dianne Cirone, Tammy Craig, Alisa Grigorovich, Mindy Simon, Elise Pomerleau, and Christina Lamontagne for conducting the ratings.

Declaration of interest

This study was partially funded by the Canadian Institutes of Health Research (FRN #111200). The authors report no conflict of interest.

References

1. Dawson DR, Gaya A, Hunt A, et al. Using the cognitive orientation to daily occupational performance approach with adults with traumatic brain injury. Can J Occup Ther 2009;76:115–27.
2. McEwen SE, Polatajko HJ, Huijbregts MPJ, Ryan JD. Inter-task transfer of meaningful, functional skills following a cognitive-based treatment: results of three multiple baseline design experiments in adults with chronic stroke. Neuropsychol Rehabil 2010;20:541–61.
3. McEwen SE, Polatajko HJ, Huijbregts MP, Ryan JD. Exploring a cognitive-based treatment approach to improve motor-based skill performance in chronic stroke: results of three single case experiments. Brain Inj 2009;23:1041–53.
4. Miller L, Polatajko HJ, Missiuna C, et al. A pilot of a cognitive treatment for children with developmental coordination disorder. Human Mov Sci 2001;20:183–210.
5. Law M, Baptiste S, McColl MA, et al. The Canadian occupational performance measure: an outcome measure for occupational therapy. Can J Occup Ther 1990;57:82–7.
6. Forbes DA. Goal attainment scaling. A responsive measure of client outcomes. J Gerontol Nurs 1998;24:34–40.
7. Lemmens RJ, Timmermans AA, Janssen-Potten YJ, et al. Valid and reliable instruments for arm-hand assessment at ICF activity level in persons with hemiplegia: a systematic review. BMC Neurol 2012;12:21.
8. Martini R, Polatajko H. Verbal self-guidance as a treatment approach for children with developmental coordination disorder: a systematic replication study. Occupat Therap J Res 1998;18:157–81.
9. Phelan S, Steinke L, Mandich A. Exploring a cognitive intervention for children with pervasive developmental disorder. Can J Occup Ther 2009;76:23–8.
10. Polatajko HJ, McEwen SE, Ryan JD, Baum CM. Pilot randomized controlled trial investigating cognitive strategy use to improve goal performance after stroke. Am J Occup Ther 2012;66:104–9.
11. Rodger S, Ireland S, Vun M. Can cognitive orientation to daily occupational performance (CO-OP) help children with Asperger's syndrome to master social and organisational goals? Br J Occup Ther 2008;71:23–32.
12. Gowland C, VanHullenaar S, Torresin W, et al. Chedoke-McMaster stroke assessment: development, validation, and administration manual. Hamilton, ON: Chedoke-McMaster Hospitals and McMaster University; 1995.
13. Martini R, Mandich A, Green D. Implementing a modified cognitive orientation to daily occupational performance approach for use in a group format. Br J Occup Ther 2014;77:214–19.
14. Henderson SE, Sugden DA, Barnett AL. Movement assessment battery for children – second edition (Movement ABC-2). London, UK: The Psychological Corporation; 2007.
15. Phipps S, Richardson P. Occupational therapy outcomes for clients with traumatic brain injury and stroke using the Canadian occupational performance measure. Am J Occup Ther 2007;61:328–34.
16. Law M, Darrah J, Pollock N, et al. Focus on function – a randomized controlled trial comparing two rehabilitation interventions for young children with cerebral palsy. BMC Pediatr 2007;7:31.
17. Eyssen IC, Beelen A, Dedding C, et al. The reproducibility of the Canadian Occupational Performance Measure. Clin Rehabil 2005;19:888–94.
18. Law M, Baptiste S, Carswell-Opzoomer A, et al. Canadian Occupational Performance Measure, 3rd ed. Ottawa, ON: CAOT Publications ACE; 1998.
19. Norman GR, Streiner DL. Biostatistics: the bare essentials, 3rd ed. Toronto, Canada: BC Decker; 2007.
20. Shrout PE. Measurement reliability and agreement in psychiatry. Stat Methods Med Res 1998;7:301–17.
21. Bragante KC, Nascimento DM, Motta NW. Evaluation of acute radiation effects on mandibular movements of patients with head and neck cancer. Rev Bras Fisioter 2012;16:141–7.
22. Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol 2000;53:459–68.
23. Meek GE, Ozgur C, Dunning K. Comparison of the t vs. Wilcoxon signed-rank test for Likert scale data and small samples. J Modern Appl Statist Meth 2007;6:91–106.
24. Beckerman H, Roebroeck ME, Lankhorst GJ, et al. Smallest real difference, a link between reproducibility and responsiveness. Qual Life Res 2001;10:571–8.
25. Lexell JE, Downham DY. How to assess the reliability of measurements in rehabilitation. Am J Phys Med Rehabil 2005;84:719–23.
26. Lu WS, Wang CH, Lin JH, et al. The minimal detectable change of the simplified stroke rehabilitation assessment of movement measure. J Rehabil Med 2008;40:615–19.
27. Kenny DA, Kashy DA, Cook WL. Dyadic data analysis. New York: Guilford Publications; 2006.
28. Jones G, Noble ADL, Schauer B, Cogger N. Measuring the attenuation in a subject-specific random effect with paired data. J Data Sci 2009;7:179–88.
29. Pitman-Morgan test: test the difference between correlated variances. Available from: http://how2stats.blogspot.ca/2011/06/testing-difference-between-correlated.html [last accessed 14 Oct 2013].
30. Middel B, van Sonderen E. Statistical significant change versus relevant or important change in (quasi) experimental design: some conceptual and methodological problems in estimating magnitude of intervention-related change in health services research. Int J Integr Care 2002;2:e15.
31. Wang CY, Sheu CF, Protas EJ. Test-retest reliability and measurement errors of six mobility tests in the community-dwelling elderly. Asian J Gerontol Geriatr 2009;4:8–13.
32. Cup EH, Scholte op Reimer WJ, Thijssen MC, van Kuyk-Minis MA. Reliability and validity of the Canadian occupational performance measure in stroke patients. Clin Rehabil 2003;17:402–9.
33. Chan CCH, Lee TMC. Validity of the Canadian occupational performance measure. Occupat Therap Int 1997;4:231–49.
34. Uswatte G, Hobbs Qadri L. A behavioral observation system for quantifying arm activity in daily life after stroke. Rehabil Psychol 2009;54:398–403.
