ISSN 0963-8288 print/ISSN 1464-5165 online
ASSESSMENT PROCEDURE
Abstract

Purpose: The performance quality rating scale (PQRS) is an observational measure of performance quality of client-selected, personally meaningful activities. It has been used inconsistently with different scoring systems, and there have been no formal publications on its psychometric properties. The purpose of this study was to test and compare the psychometric properties of two PQRS scoring systems in two populations. Methods: A secondary analysis of video recorded participant-selected activities from previous studies involving either adults living with stroke or children diagnosed with developmental coordination disorder (DCD) was conducted. Three pairs of raters scored the video recorded performances with PQRS operational definitions (PQRS-OD) and a generic rating system (PQRS-G). Results: For inter-rater reliability, PQRS-OD ICCs were substantial, ranging from 0.83 to 0.93, while the PQRS-G ICCs were moderate, ranging from 0.71 to 0.77. Test–retest reliability was substantial, >0.80 (ICC), for both rating systems across all rater pairs. Internal responsiveness was high for both rating systems. Convergent validity with the Canadian Occupational Performance Measure (COPM) was inconsistent, with scores ranging from low to moderate. Conclusion: Both scoring systems demonstrated that they are reliable and have good internal responsiveness. The PQRS-OD demonstrated greater consistency across raters, is more sensitive to clinically important change than the PQRS-G, and should be used when greater accuracy is required. Further exploration of validity with actual rather than perceived performance measures is required.

Keywords

Developmental coordination disorder, outcome measures, performance quality rating scale (PQRS), psychometric properties, stroke

History

Received 1 May 2013; Revised 2 April 2014; Accepted 7 April 2014; Published online 28 April 2014
Subsequently, the PQRS has been used in other intervention studies [2,3,9–11]; however, it has been used inconsistently, with different scoring systems, and a formal evaluation of its psychometric properties has not been conducted. In fact, in Lemmens and colleagues' recent review of upper extremity assessments for stroke and cerebral palsy, the PQRS was excluded specifically because of its lack of published psychometric properties [7].

The PQRS has been used either with generic scoring or with specific operational definitions for different rating levels for each participant-selected skill. With generic scoring, such as that used by Miller et al. [4], a rating of "1" indicates that the skill is not done at all, and "10" indicates that the skill is performed very well. Alternatively, Martini and Polatajko [8] and McEwen et al. [2,3] developed operational definitions for PQRS ratings for specific skills. For example, if the skill was tying shoelaces, an operational definition for a rating of "2" may be "crosses laces but cannot make a knot"; descriptions of progressing skill level are given for at least every other rating level [2,3,8]. The PQRS has generally been used in conjunction with the

...group format described in [13] and in Canadian Association of Occupational Therapists Conference presentations 2011 and 2012. They were between 7 and 11 years, six boys and two girls, and all performed below the 15th percentile on the Movement Assessment Battery for Children – 2 [14].

The investigators selected videos for inclusion in this study that had associated COPM scores and that depicted an activity amenable to video analysis. For example, a video of a stroke patient doing a yoga breathing activity, which was perceived as too subtle to be assessed with video, was not included.

Three pairs of blinded raters were used: two trained research assistants without a health-care professional background (RA), two registered occupational therapists (OT), and two occupational therapy professional masters students (ST). None of the raters had ever been involved in the treatment of the patients. Within each pair, one rater was randomly chosen to be the A Rater, and the other automatically became the B Rater.

Instruments
Instructions: Watch video. Rate the performance on a scale of 1 to 10, with 1 being ‘‘cannot perform the task at all’’ and 10 ‘‘can perform the
task well’’. Use the table below as a guideline to assist you in rating the performance. Please note that only ONE score is assigned. The video may
be watched a second time, but not more than twice.
Quality takes into account, as applicable to the task, timeliness, accuracy, safety, and overall quality of performance or product.
The rating is based on the average of completeness and quality, rounded to the nearest whole number, with X.5 rounded up. For example, an observed performance scored as 75% task completion (6 for Completeness) and poor (4 for Quality of Performance/Product) receives an average rating of 5 [(6 + 4)/2].
Scores of 3, 5, 7, and 9 are to be used at the rater's discretion for observed performances that are marginally better or worse than something previously observed, but do not quite meet the criteria for the next defined category.
Rating   Completeness             Quality
5        (rater's discretion)
6        75% task completion      Moderate
7        (rater's discretion)
8        100% task completion     Good
9        (rater's discretion)
10       100% task completion     Excellent
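The averaging-and-rounding rule above can be sketched in a few lines. This is a minimal illustration, not code from the study; the function name is hypothetical, and math.floor(x + 0.5) is used because Python's built-in round() rounds halves to even rather than up.

```python
import math

def pqrs_rating(completeness: int, quality: int) -> int:
    """Combine the two sub-scores (each 1-10) into one rating.

    The rating is the average of completeness and quality, rounded to the
    nearest whole number with .5 rounded UP (Python's round() uses
    banker's rounding, so floor(x + 0.5) is used instead).
    """
    return math.floor((completeness + quality) / 2 + 0.5)

# The worked example from the scoring instructions:
# 75% completion (6) and poor quality (4) average to 5.
print(pqrs_rating(6, 4))   # -> 5
print(pqrs_rating(7, 4))   # 5.5 rounds up -> 6
```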
A Rater
  1. PQRS-OD
     - inter-rater reliability: 10 pairs of videos (videos 1–20)
     - test–retest reliability: 5 individual videos (videos 46–50)
  2. PQRS-G
     - inter-rater reliability: 10 pairs of videos (videos 26–45)
     - test–retest reliability: 5 individual videos (videos 21–25)

B Rater
  1. PQRS-G
     - inter-rater reliability: 10 pairs of videos (videos 1–20)
     - test–retest reliability: 5 individual videos (videos 46–50)
  2. PQRS-OD
     - inter-rater reliability: 10 pairs of videos (videos 26–45)
     - test–retest reliability: 5 individual videos (videos 21–25)

(Test–retest videos were rated again at Time 2.)
234 R. Martini et al. Disabil Rehabil, 2015; 37(3): 231–238
As part of previously conducted intervention studies, the research participants had been interviewed by a therapist using the standardized COPM interview to select personally meaningful activities for assessment and intervention. The order of the pre- and post-intervention videos was randomized within each pair, and the videos within a pair were always viewed in sequence. In other words, if the self-selected activity was skipping rope, both the pre-intervention and post-intervention videos of skipping rope would be viewed back-to-back, but the rater may or may not see the pre-intervention video first. Each video recording had a corresponding COPM score that had been previously assigned by the research participant during the respective intervention study.

For test–retest reliability, an additional 10 individual videos of self-selected activities were viewed, five of adults and five of children. These videos were not paired, as pre- and post-intervention pairs were not required for assessment of test–retest reliability. At each rating period, A Raters scored five videos with the PQRS-OD and the other five videos with the PQRS-G, repeating the procedure at the second rating period. The B Raters used the same videos and procedures, but employed the alternate PQRS to that used by the A Raters.

Table 2. Skills viewed on videos.

Inter-rater reliability and responsiveness videos
  Children: putting on and tying a hockey helmet; tying shoe laces (4 different children); jumping rope; stopping with rollerblades; riding a bicycle; swinging on a swing
  Adults: carrying objects with affected arm; getting in and out of a chair; buttoning a shirt; using a computer mouse; putting on a jacket; sewing; gardening; climbing stairs; cutting fruit; riding a bicycle

Test–retest reliability videos
  Children: swinging on a swing; jumping rope alone; brushing teeth; lacing skates
  Adults: clipping nails; walking; tying a tie; photography; running
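The presentation scheme described above (pairs viewed back-to-back, with the pre/post order randomized within each pair) can be sketched as follows. This is an illustrative sketch only; the file names and helper name are hypothetical.

```python
import random

# Hypothetical file names; each tuple is (pre-intervention, post-intervention).
pairs = [
    ("skipping_rope_pre.mp4", "skipping_rope_post.mp4"),
    ("tying_laces_pre.mp4", "tying_laces_post.mp4"),
]

def build_playlist(video_pairs, rng=random):
    """Keep each pre/post pair back-to-back, but randomize which of the
    two clips the rater sees first."""
    playlist = []
    for pre, post in video_pairs:
        clip_order = [pre, post]
        rng.shuffle(clip_order)      # the rater may or may not see "pre" first
        playlist.extend(clip_order)  # the pair is always viewed in sequence
    return playlist

print(build_playlist(pairs))
```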
The file names of the video recordings were designed to indicate the order in which the raters should view them, and the video files' metadata was altered to ensure the raters could not determine the chronological order of the video pairs. Videos were transferred to memory sticks encrypted using TrueCrypt¹. Each memory stick contained videos for a specific rating period and rating scale. To accommodate the schedules of the raters across two study sites, the memory sticks were given to the raters in advance, and when a rater was ready to begin a rating period, they were e-mailed the passwords to the appropriate memory sticks. Raters were instructed to rate each video to the best of their ability using the assigned system, but not to review each video more than twice to decide on the score. This instruction was given to minimize practice effects.

Ethics approval for the present study was obtained from the Research Ethics Boards of the participating institutions.

Analysis

Data were cleaned and examined for discrepancies. A decision to remove some videos was made based on feedback from raters. If a video was identified by raters as problematic, it was removed only if there were between-rater differences of 4 points or more across more than one pair of raters and with both rating systems. This resulted in five videos being removed, leaving 35 videos for inter-rater reliability analysis. Since pairs of videos (pre-test and post-test) were required for validity and responsiveness analysis, an additional three videos were removed for those analyses, leaving 32 videos (16 pairs or tasks: 9 by children and 7 by adults). For test–retest reliability one video was removed, leaving nine videos of different skills, four of adults and five of children. The decision to remove these videos from the analysis was made to reduce measurement error based on the quality of the videos rather than on issues with the scoring systems. Table 2 lists all skills that were viewed.

Test–retest and inter-rater reliability were calculated using the intra-class correlation coefficient (ICC). To determine inter-rater reliability, two-way mixed model ICCs with absolute agreement were calculated for pairs of raters (OTs, RAs and STs) for both the PQRS-OD and the PQRS-G scoring systems. While the PQRS is an ordinal measure, Norman and Streiner (2008) have argued that using a weighted kappa for ordinal scales is identical to a two-way mixed ICC, and the two may be substituted interchangeably [19]. The following benchmarks proposed by Shrout (1998) were used for the interpretation of inter-rater reliability coefficients: 0.00 to 0.10 (virtually none); 0.11 to 0.40 (slight); 0.41 to 0.60 (fair); 0.61 to 0.80 (moderate); and 0.81 to 1.0 (substantial) [20].

Intra-class correlation coefficients were also calculated to evaluate convergent validity between PQRS and COPM performance and satisfaction scores at pre-test and post-test. Convergent validity of items is the degree to which theoretically similar concepts are related [21].

Internal responsiveness is the ability of a measure to detect change over a specified time frame [22]. The PQRS is an ordinal scale, so a significant change finding was determined using the non-parametric Wilcoxon signed rank test. It should be noted that in their comparison of the Wilcoxon signed rank test with the t-test, Meek, Ozgur and Dunning (2007) concluded that the t-test may be preferred over the Wilcoxon signed rank test, as the t-test seemed to possess higher power than the Wilcoxon signed-rank test when sample sizes are small, even though its assumptions were violated [23]. As such, a paired t-test between pre and post scores was used to confirm significant change.

The effect size statistic provides direct information on the magnitude of change in a measure; as such, it is widely recommended for use as an indicator of responsiveness [22]. The effect size for an ordinal scale was calculated as r = Z/√N.

Relative efficiency (RE) is another estimate of change in a measure. It allows one to determine whether one measure is more or less efficient for measuring change than another, and it can be computed for any pair of instruments. An RE < 1 means that the instrument was a less efficient tool for measuring change, while an RE > 1 means that the instrument was a more efficient tool for measuring change.

Husted et al. (2000) recommended that this be complemented by another measure. The standardized response mean (SRM) was therefore used to confirm the paired t-test and RE index findings. The SRM is the ratio of the observed change and the standard deviation reflecting the variability of the change scores [22]. Norman (2010) argues that parametric statistics are robust with respect to violations of assumptions regarding sample size and normal distribution, and hence can be applied to ordinal ratings [19]. As such, the SRM, based on the paired t-test calculations, was applied to ordinal PQRS ratings. Calculated SRM effect sizes were compared to Hopkins' more conservative Likert-scale approach to predetermined values of effect size, representing responsiveness as: trivial (0–0.20), small (0.20–0.60), moderate (0.60–1.20), large (1.20–2.0), very large (2.0–4.0) or "nearly perfect" (4.0 or greater).

¹Copyright © 2003–2013 TrueCrypt Developers Association; http://www.truecrypt.org

DOI: 10.3109/09638288.2014.913702

Convergent validity

The correlations between both versions of the PQRS and the
COPM Performance Score are noted in Table 4. The negative ICC scores should be interpreted as indicating a low intra-class correlation, whereby the correlated pair of scores vary as much as any two randomly selected pairs of scores [27]. It seems that little convergent validity exists between the PQRS and the COPM's performance component. Examining the adult and child populations separately, correlations were similar at pre-test. At post-test, correlations varied from slight to moderate for children and from virtually none to slight for adults. Correlations of change scores varied from virtually none to fair for children and from slight to moderate for adults. Findings for both children and adults confirmed the general lack of convergent validity between the COPM performance scores and the PQRS scores.

The ICCs for the COPM satisfaction component varied from virtually none to slight for both versions of the PQRS for both pre-test and post-test correlations. Slightly higher correlations, slight to fair, are noted for the correlations of change scores.

The ability of a scale to detect clinically relevant changes over time, or responsiveness, is influenced by its test–retest reproducibility [24]. It is important to calculate changes in the means from the measurements obtained in test–retest in order to determine what change arises from typical variation. A change, following an intervention, that is smaller than typical variation is usually not clinically important [25]. Minimal detectable change, or smallest real difference (SRD), is the smallest change in score that can be interpreted as a real change in a client [24]. SRD can be used as a threshold to determine whether changed scores for a particular client illustrate real improvement, rather than improvement due to measurement error or chance variation [26]. A change, such as in task performance between pre-test and post-test, is based on the change score and its error size. As such, the SRD was calculated using the standard error of measurement (SEM) with the following formula: SRD = 1.96 × √2 × SEM [24]. The SEM was calculated by taking the square root of the mean square error term from the analysis of variance [25].
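The SRD calculation just described can be sketched as follows. The ratings are synthetic, the function name is hypothetical, and the two-way (subject × occasion) ANOVA error term is computed by hand rather than with a statistics package.

```python
import math

def srd_from_test_retest(time1, time2):
    """SRD = 1.96 * sqrt(2) * SEM, where SEM is the square root of the
    error mean square from a two-way (subject x occasion) ANOVA."""
    n = len(time1)                               # subjects, each rated twice
    grand = (sum(time1) + sum(time2)) / (2 * n)
    ss_total = sum((x - grand) ** 2 for x in time1 + time2)
    ss_subj = 2 * sum(((a + b) / 2 - grand) ** 2 for a, b in zip(time1, time2))
    ss_occ = n * ((sum(time1) / n - grand) ** 2 + (sum(time2) / n - grand) ** 2)
    ms_error = (ss_total - ss_subj - ss_occ) / (n - 1)   # (n-1)*(2-1) df
    sem = math.sqrt(ms_error)
    return 1.96 * math.sqrt(2) * sem

# Synthetic PQRS ratings of the same five videos at two rating periods:
print(round(srd_from_test_retest([3, 5, 6, 8, 4], [4, 5, 6, 7, 4]), 2))  # -> 1.39
```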
Results

Reliability

Table 3 displays ICC correlations between each pair of raters, as well as an overall correlation for all raters, for both inter-rater and test–retest reliability. Based on Shrout's (1998) classification, all inter-rater correlations for the PQRS-OD displayed substantial inter-rater correlations, with ICCs varying from 0.83 to 0.93, while the PQRS-G ICCs were moderate, varying from 0.71 to 0.77.

Table 4. Intra-class correlation coefficients between PQRS scores and COPM scores.

                         COPM performance       COPM satisfaction
Rater                    PQRS-G   PQRS-OD       PQRS-G   PQRS-OD
Pre-test
Research assistants      0.08     0.37          0.12     0.23
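For illustration, the single-rater, two-way ICC with absolute agreement (the ICC(A,1) of McGraw and Wong, which corresponds to the "two-way, absolute agreement" option in common statistics packages) can be computed from a video-by-rater score matrix. The ratings below are synthetic, not data from the study.

```python
def icc_a1(scores):
    """Single-rater, two-way ICC with absolute agreement (ICC(A,1)),
    computed from the classic ANOVA mean squares. `scores` holds one row
    per subject (video) and one column per rater."""
    n, k = len(scores), len(scores[0])          # subjects, raters
    grand = sum(x for row in scores for x in row) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    rater_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_rater = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_r = ss_subj / (n - 1)                    # between-subject mean square
    ms_c = ss_rater / (k - 1)                   # between-rater mean square
    ms_e = (ss_total - ss_subj - ss_rater) / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# A constant 1-point bias between two raters lowers absolute agreement:
print(round(icc_a1([[1, 2], [2, 3], [3, 4]]), 2))  # -> 0.67
```

Note that an absolute-agreement ICC penalizes systematic rater bias, whereas a consistency ICC would score the same data as 1.0.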
Table 5. Testing for pre-post intervention differences: paired t-tests and Wilcoxon signed ranks tests (z).

                           Time 1                            Time 2
                           Median (range)  IQR  Mean  SD     Median (range)  IQR  Mean  SD     z     t      p

PQRS-OD: Pre-Post (n = 33)
  Research assistants      3 (5)           2    3.06  1.52   8 (6)           3    7.42  2.0    4.7   9.53   <0.001
  OT students              3 (5)           2    3.06  1.50   8 (8)           3.5  7.64  2.16   4.8   9.64   <0.001
  Occupational therapists  2 (5)           2    2.79  1.47   8 (6)           2.5  7.15  1.89   4.7   10.1   <0.001

PQRS-G: Pre-Post (n = 31)
  Research assistants      3 (9)           3    3.72  2.36   8 (7)           2    7.47  1.98   4.1   6.58   <0.001
  OT students              4 (8)           4.8  4.03  2.62   8 (6)           3.5  8.06  1.66   4.6   7.50   <0.001
  Occupational therapists  2.5 (6)         1.8  2.91  2.91   7 (8)           3    6.63  2.15   4.7   8.49   <0.001
A fair level of convergent validity is noted between the change scores rated by the RAs and STs and the participants' satisfaction change scores. At post-test, correlations for children were virtually none for both versions of the PQRS, while for adults correlations varied from virtually none to fair with the PQRS-G and slight to fair with the PQRS-OD. For change scores, correlations varied from virtually none to moderate for children and virtually none to fair for adults. While adults demonstrated a greater correlation between satisfaction scores and PQRS scores than the children, there does not appear to be one version of the PQRS for which great convergent validity was observed consistently.

Table 6. Relative efficiencies, parametric effect sizes (SRM), and non-parametric effect sizes (responsiveness).

                           SRM                     Responsiveness
Rater pairs        RE      PQRS-G   PQRS-OD        PQRS-G   PQRS-OD
Research assistants        0.69    1.11     1.59   0.73     0.82
OT students                0.79    1.30     1.62   0.83     0.82
Occupational therapists    0.83    1.44     1.70   0.81     0.84
All raters                 0.76    1.28     1.68   0.79     0.82

RE, relative efficiency = t(PQRS-G)/t(PQRS-OD); SRM, standardized response mean = mean change score / SD of the change scores; responsiveness = Z/√N.
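The indices reported in Table 6 can be sketched as follows. The helper names are hypothetical, the RE direction (t for the PQRS-G divided by t for the PQRS-OD) is inferred from the reported values, and the SRM/paired-t arithmetic uses only the standard definitions.

```python
import math
from statistics import mean, stdev

def srm(pre, post):
    """Standardized response mean: mean change / SD of the change scores."""
    d = [b - a for a, b in zip(pre, post)]
    return mean(d) / stdev(d)

def paired_t(pre, post):
    """Paired t statistic: mean change / standard error of the change."""
    d = [b - a for a, b in zip(pre, post)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

def relative_efficiency(t_generic, t_operational):
    """RE = t(PQRS-G) / t(PQRS-OD); RE < 1 marks the PQRS-G as the less
    efficient instrument (direction inferred from Table 6)."""
    return t_generic / t_operational

def responsiveness(z, n):
    """Non-parametric effect size r = Z / sqrt(N)."""
    return z / math.sqrt(n)

# Reproducing Table 6's research-assistant row from the Table 5 statistics:
print(round(relative_efficiency(6.58, 9.53), 2))  # -> 0.69
print(round(responsiveness(4.7, 33), 2))          # -> 0.82
```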
Internal responsiveness
All paired variances, for all paired raters and both versions of the PQRS, were found not to differ significantly, as verified using the Pitman-Morgan test, a test of variance for paired samples [28,29]. The results of the paired t-tests, confirmed by the Wilcoxon signed ranks tests, are presented in Table 5. All paired t-tests, and Wilcoxon signed rank tests, for both the PQRS-G and the PQRS-OD were significant, indicating that the task performance change from pre-test to post-test was significant. Effect sizes (Z/√N) and relative efficiencies are presented in Table 6. Large effect sizes were obtained for both the PQRS-OD and the PQRS-G (r > 0.72). With respect to relative efficiencies, all scores are less than 1, indicating that the PQRS-OD is the more efficient tool for measuring change. This is confirmed by the SRM in Table 6. For both the PQRS-G and the PQRS-OD the various effect sizes are large (greater than 1), which suggests that the observed changes in task performance are clinically meaningful. Middel and van Sonderen (2002) report that "to give clinically relevant meaning to change scores gained on two different points in time" (p. 13) individual clients' perceptions should be considered more explicitly [30]. In keeping with this suggestion, the RE and SRM were again calculated after having removed the only item that did not have a clinically meaningful change of two points on the COPM (data not shown), and the PQRS-OD was still found to be the more responsive tool.

SRDs for the different raters and PQRS versions are presented in Table 3. When a change score (pre-post) is greater than the SRD, a true change can be ascertained with a 95% confidence interval [31]. Generally, smaller SRD values are obtained using the PQRS-OD than the PQRS-G. For children, the SRDs for the PQRS-OD varied from 0.69 to 0.89. This indicates that with the PQRS-OD a change in score of 1 can indicate a true change in children. With the PQRS-G, SRDs with children varied from 2.13 to 2.91, indicating that a change in score of 3 is needed to be 95% sure that the change is real and not due to measurement error. With adults, it seems that a greater change score is required to ensure that it is a true change. For the PQRS-OD, adult SRDs varied from 0.89 for OT students to 5.10 for research assistants, while for the PQRS-G, SRDs varied from 2.23 for occupational therapists to 3.66 for OT students. In general, for both the PQRS-OD and the PQRS-G, greater SRDs are noted for research assistants than for occupational therapy students and OTs.

Discussion

This is the first study to formally estimate the psychometric properties of the PQRS, an observational system for rating performance quality of client-selected activities. Two different PQRS scoring systems were compared using a variety of client-selected activities performed by both adults and children. These results (obtained with two distinct populations and a wide variety of self-selected activities derived from COPM interviews) indicate that using the PQRS-OD provides better inter-rater reliability than the PQRS-G, overall as well as per population. Reliability data indicated that both versions of the PQRS are reliable, a requisite characteristic for a measure to be responsive. Both the PQRS-G and the PQRS-OD demonstrated the ability to detect change in general. When comparing the two versions, the RE coefficients point to the PQRS-OD as being more sensitive to general and clinically important changes over time than the PQRS-G. The PQRS-OD generally obtained smaller SRDs, confirming that smaller changes detected with the PQRS-OD may be interpreted as a real change with 95% confidence, thus corroborating that it is a more sensitive instrument than the PQRS-G.

These data suggest that the PQRS-OD is a precise tool that permits cross-comparison of a wide range of client-chosen goals and activities. On the other hand, the PQRS-G provides moderate inter-rater reliability, substantial test–retest reliability, and large effect sizes demonstrating good internal responsiveness, indicating that it is an adequate measure for use in clinical settings, or in research projects when taking the time and resources required to write operational definitions is not feasible.
The PQRS was designed to loosely mirror the COPM, to provide an additional measure of self-selected activity performance using ratings of actual observed performances rather than self-perceived performance. The COPM is used to elicit the activities that will subsequently be performed, videotaped, and rated. We hypothesized that PQRS and COPM scores would be highly correlated because the PQRS activities were derived from the COPM interview and were thus matched. This did not prove to be true. While the COPM has demonstrated low convergent validity with other measures in the past, this was presumed to be because the COPM items are defined by the client, whereas the items in other measures are predetermined and can differ greatly from what the COPM is measuring [32]. That issue does not help to explain the low convergent validity between the COPM and PQRS in this study, as the PQRS items came directly from the COPM. The low convergent validity in this current study is better explained by the idea that actual performance and perceived performance may be different constructs. Chan and Lee (1997) found virtually zero correlation between perceived performance

If the goal/activity cannot easily be observed on video, PQRS ratings can be done with a physical product, such as handwriting samples, or by superimposing another objective rating system onto PQRS scores, such as the time it takes for task completion.

Study limitations

As we conducted a secondary analysis of existing video data, the video clips were not always ideal, as described above. While it is likely that the reliability scores would have been higher with perfect videos, and lower if the lowest quality videos had not been removed from the analysis, we applied strict criteria for removing certain videos, designed to strike a balance between ideal videos and videos that mirror the realities of clinical research. While we did ensure a wide range of activities, some activities, such as bike riding and shoelace tying, were self-selected by more than one participant, increasing raters' familiarity with a particular task. Also, raters were required to view the same video on two
Declaration of interest

This study was partially funded by the Canadian Institutes of Health Research (FRN #111200). The authors report no conflicts of interest.

References

1. Dawson DR, Gaya A, Hunt A, et al. Using the cognitive orientation to daily occupational performance approach with adults with traumatic brain injury. Can J Occup Ther 2009;76:115–27.
2. McEwen SE, Polatajko HJ, Huijbregts MPJ, Ryan JD. Inter-task transfer of meaningful, functional skills following a cognitive-based treatment: results of three multiple baseline design experiments in adults with chronic stroke. Neuropsychol Rehabil 2010;20:541–61.
3. McEwen SE, Polatajko HJ, Huijbregts MP, Ryan JD. Exploring a cognitive-based treatment approach to improve motor-based skill performance in chronic stroke: results of three single case experiments. Brain Inj 2009;23:1041–53.
4. Miller L, Polatajko HJ, Missiuna C, et al. A pilot of a cognitive treatment for children with developmental coordination disorder. Hum Mov Sci 2001;20:183–210.
5. Law M, Baptiste S, McColl MA, et al. The Canadian occupational performance measure: an outcome measure for occupational therapy. Can J Occup Ther 1990;57:82–7.
6. Forbes DA. Goal attainment scaling. A responsive measure of client outcomes. J Gerontol Nurs 1998;24:34–40.
7. Lemmens RJ, Timmermans AA, Janssen-Potten YJ, et al. Valid and reliable instruments for arm-hand assessment at ICF activity level in persons with hemiplegia: a systematic review. BMC Neurol 2012;12:21.
8. Martini R, Polatajko H. Verbal self-guidance as a treatment approach for children with developmental coordination disorder: a systematic replication study. Occupat Therap J Res 1998;18:157–81.
9. Phelan S, Steinke L, Mandich A. Exploring a cognitive intervention for children with pervasive developmental disorder. Can J Occup Ther 2009;76:23–8.
10. Polatajko HJ, McEwen SE, Ryan JD, Baum CM. Pilot randomized controlled trial investigating cognitive strategy use to improve goal performance after stroke. Am J Occup Ther 2012;66:104–9.
11. Rodger S, Ireland S, Vun M. Can cognitive orientation to daily occupational performance (CO-OP) help children with Asperger's syndrome to master social and organisational goals? Br J Occup Ther 2008;71:23–32.
12. Gowland C, VanHullenaar S, Torresin W, et al. Chedoke-McMaster stroke assessment: development, validation, and administration manual. Hamilton, ON: Chedoke-McMaster Hospitals and McMaster University; 1995.
13. Martini R, Mandich A, Green D. Implementing a modified cognitive orientation to daily occupational performance approach for use in a group format. Br J Occup Ther 2014;77:214–19.
14. Henderson SE, Sugden DA, Barnett AL. Movement assessment battery for children – second edition (Movement ABC-2). London, UK: The Psychological Corporation; 2007.
15. Phipps S, Richardson P. Occupational therapy outcomes for clients with traumatic brain injury and stroke using the Canadian occupational performance measure. Am J Occup Ther 2007;61:328–34.
16. Law M, Darrah J, Pollock N, et al. Focus on function – a randomized controlled trial comparing two rehabilitation interventions for young children with cerebral palsy. BMC Pediatr 2007;7:31.
17. Eyssen IC, Beelen A, Dedding C, et al. The reproducibility of the Canadian Occupational Performance Measure. Clin Rehabil 2005;19:888–94.
18. Law M, Baptiste S, Carswell-Opzoomer A, et al. Canadian Occupational Performance Measure, 3rd ed. Ottawa, ON: CAOT Publications ACE; 1998.
19. Norman GR, Streiner DL. Biostatistics: the bare essentials, 3rd ed. Toronto, Canada: BC Decker; 2007.
20. Shrout PE. Measurement reliability and agreement in psychiatry. Stat Methods Med Res 1998;7:301–17.
21. Bragante KC, Nascimento DM, Motta NW. Evaluation of acute radiation effects on mandibular movements of patients with head and neck cancer. Rev Bras Fisioter 2012;16:141–7.
22. Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol 2000;53:459–68.
23. Meek GE, Ozgur C, Dunning K. Comparison of the t vs. Wilcoxon signed-rank test for Likert scale data and small samples. J Modern Appl Statist Meth 2007;6:91–106.
24. Beckerman H, Roebroeck ME, Lankhorst GJ, et al. Smallest real difference, a link between reproducibility and responsiveness. Qual Life Res 2001;10:571–8.
25. Lexell JE, Downham DY. How to assess the reliability of measurements in rehabilitation. Am J Phys Med Rehabil 2005;84:719–23.
26. Lu WS, Wang CH, Lin JH, et al. The minimal detectable change of the simplified stroke rehabilitation assessment of movement measure. J Rehabil Med 2008;40:615–19.
27. Kenny DA, Kashy DA, Cook WL. Dyadic data analysis. New York: Guilford Publications; 2006.
28. Jones G, Noble ADL, Schauer B, Cogger N. Measuring the attenuation in a subject-specific random effect with paired data. J Data Sci 2009;7:179–88.
29. Pitman-Morgan test: test the difference between correlated variances. Available from: http://how2stats.blogspot.ca/2011/06/testing-difference-between-correlated.html [last accessed 14 Oct 2013].
30. Middel B, van Sonderen E. Statistical significant change versus relevant or important change in (quasi) experimental design: some conceptual and methodological problems in estimating magnitude of intervention-related change in health services research. Int J Integr Care 2002;2:e15.
31. Wang CY, Sheu CF, Protas EJ. Test-retest reliability and measurement errors of six mobility tests in the community-dwelling elderly. Asian J Gerontol Geriatr 2009;4:8–13.
32. Cup EH, Scholte op Reimer WJ, Thijssen MC, van Kuyk-Minis MA. Reliability and validity of the Canadian occupational performance measure in stroke patients. Clin Rehabil 2003;17:402–9.
33. Chan CCH, Lee TMC. Validity of the Canadian occupational performance measure. Occupat Therap Int 1997;4:231–49.
34. Uswatte G, Hobbs Qadri L. A behavioral observation system for quantifying arm activity in daily life after stroke. Rehabil Psychol 2009;54:398–403.