net/publication/256061891
2 authors, including:
Simone Kauffeld
Technische Universität Braunschweig
All content following this page was uploaded by Simone Kauffeld on 09 July 2016.
Introduction
Professional training is costly for contemporary organizations (e.g. Grossman & Salas,
2011). In 2010, for example, US organizations invested a total of approximately
Figure 1: Scales of the Q4TE (framework following Wang & Wilcox, 2006; Kirkpatrick &
Kirkpatrick, 2006).
Kirkpatrick’s organizational level seems to be most difficult for trainees to assess (Wang
& Wilcox, 2006). To our knowledge, there is no questionnaire that covers all four levels
of Kirkpatrick’s evaluation framework in a time-efficient manner while also being applicable
to a wide variety of training contents and psychometrically examined.
In the present paper, we address this issue by developing the Q4TE, a time-efficient
and widely applicable self-report measure especially for practitioners. The Q4TE covers
all four levels of Kirkpatrick’s evaluation framework (see Figure 1). Level 1, reaction, is
assumed to be multidimensional and often divided into affective responses and utility
judgments (Alliger et al., 1997; Tracey et al., 2001). Therefore, the first level of the Q4TE
is divided into global satisfaction with the training and perceived training utility. Level
2, learning, refers to the skills and knowledge acquired in a training (e.g. Wang &
Wilcox, 2006). In the Q4TE, we focus on knowledge, which refers to participants’
perceived knowledge acquisition. Level 3, behavior, refers to changes in behavior as a
consequence of training participation (Kirkpatrick & Kirkpatrick, 2006, p. 22). In the
Q4TE, we measure application to practice, which refers to the extent to which the
training contents are applied at work (Aguinis & Kraiger, 2009). Level 4, organizational
results, is kept rather unspecified in the Kirkpatrick model (Alliger et al., 1997). To
clarify this, the Q4TE takes into account that there are three main aspects relevant for
evaluating organizational results: qualitative, temporal and financial impact of training
participation (Wang & Wilcox, 2006). Because costs and financial impact are difficult to assess
with self-report items, level 4 of the Q4TE primarily covers the qualitative impact and,
to a lesser extent, the temporal impact. In line with the multifoci perspective (e.g. concerning
organizational citizenship behavior; Lavelle et al., 2007), the Q4TE distinguishes between
individual and global organizational results. We thereby account for the fact that
training may have an effect on the organization that is, in turn, reflected at the level of the
individual employee (individual organizational results) and of the whole organization (global
organizational results). It is important to note that the Q4TE scales knowledge, application
to practice, individual organizational results and global organizational results measure
perceived training benefits. For simplicity, however, they are henceforth referred to
without this specification.
Study overview
To investigate the psychometric properties of the Q4TE, we use three studies with a
total of n = 1134 employees. In study 1, the Q4TE is developed. In study 2, we address
the first research question and examine the underlying factor structure of the Q4TE.
Finally, in study 3, we address the second and third research questions and explore the
relationship between the Q4TE scales and transfer to practice.
Data analysis. To investigate the stability of the reduced form, sample 1 (n = 408) was
randomly split into two subsamples at a ratio of about 60:40 using the
Predictive Analytics SoftWare (PASW) random case selection procedure. Subsample A
(n = 251) was used for item reduction and subsample B (n = 157) for the
investigation of stability. The 60:40 ratio was chosen because
subsample B served to examine a less complex model. First, to
explore the factor structure that best represents the data, an
exploratory factor analysis (EFA) was performed on subsample A. Second, we conducted
a confirmatory factor analysis (CFA) on subsample A, taking into account the EFA results and
theoretical assumptions (distinguishing between single Q4TE scales, cf. Alliger et al.,
1997). By means of CFA, we further reduced the number of items per scale to
obtain a final Q4TE form with two items per scale. Item selection was based on statistical
(e.g. factor loadings and modification indices) and nonstatistical properties (item
wording; cf. Kline, 1986). Third, the resulting Q4TE form was reexamined via CFA
on subsample B to assess its stability.
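The random split described above can be illustrated with a short sketch. It does not reproduce the PASW random case selection procedure; it swaps in Python's standard `random` module, and the function name `split_sample`, the case identifiers 1 to 408 and the seed are hypothetical choices made only for illustration. The subsample sizes (251 and 157) are the ones reported in the text.

```python
import random


def split_sample(case_ids, n_first, seed=0):
    """Shuffle the case identifiers and cut them into two disjoint subsamples.

    Hypothetical helper: n_first cases go to the first subsample,
    the remaining cases to the second.
    """
    rng = random.Random(seed)      # seeded generator so the split is repeatable
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_first], shuffled[n_first:]


# Sample 1 has n = 408 cases; IDs 1..408 are placeholders, not real data.
sub_a, sub_b = split_sample(range(1, 409), n_first=251, seed=2013)

print(len(sub_a), len(sub_b))                       # 251 157
print(round(len(sub_a) / (len(sub_a) + len(sub_b)), 3))  # share of subsample A
```

Fixing the seed makes the split reproducible, which matters when item reduction on subsample A and the stability check on subsample B are rerun later.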
EFA was conducted with PASW 18 (SPSS Inc., Chicago, IL), and CFA was conducted
with Mplus 6 (Muthén & Muthén, 1998–2010). As model evaluation should
be based on multiple criteria (Byrne, 2005), we used the ratio of χ² to degrees of
freedom (d.f.), the root mean square error of approximation (RMSEA), the comparative
fit index (CFI) and the standardized root mean square residual (SRMR) (Schweizer, 2010;
see Schermelleh-Engel et al., 2003 for cutoff values). For all CFA analyses, we applied a
maximum likelihood estimator robust to non-normally distributed data (MLR; Muthén & Muthén,
1998–2010).
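Two of these criteria can be recomputed by hand from a reported χ², its degrees of freedom and the sample size. The sketch below is an illustration only: the RMSEA point-estimate formula with N in the denominator is an assumption (some programs use N - 1, and robust estimators may apply further corrections), and the function names are hypothetical. The input values are those reported in this paper for the six-factor model in study 2 (χ² = 100.40, d.f. = 39, n = 287).

```python
import math


def chi2_df_ratio(chi2, df):
    """Ratio of the chi-square statistic to its degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate, assuming sqrt(max(chi2 - df, 0) / (df * n)).

    Assumption: n (not n - 1) in the denominator; software differs on this.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * n))


# Study 2, model 1 as reported in the paper: chi2 = 100.40, d.f. = 39, n = 287
print(round(chi2_df_ratio(100.40, 39), 2))   # 2.57
print(round(rmsea(100.40, 39, 287), 3))      # 0.074
```

When χ² is smaller than its degrees of freedom, the `max(..., 0)` term sets the RMSEA to zero, which is why several well-fitting models in the fit table show an RMSEA of 0.000.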
Level: Reaction. Scale: Satisfaction
  Item: Ich werde das Training in guter Erinnerung behalten.
        I will keep the training in good memory.
  Source: Adaptation from Bihler (2006, p. 200)
  Item: Das Training hat mir sehr viel Spaß gemacht.
        I enjoyed the training very much.
  Source: Additional item developed in our work group (see also Brown, 2005)

Level: Reaction. Scale: Utility
  Item: Das Training bringt mir für meine Arbeit sehr viel. [b]
        The training is very beneficial to my work. [b]
  Source: Additional item developed in our work group (see also Mathieu et al., 1992)
  Item: Die Teilnahme am Training ist äußerst nützlich für meine Arbeit.
        Participation in this kind of training is very useful for my job.
  Source: Adaptation of the initial Q4TE form by Kauffeld et al. (2009)

Level: Learning. Scale: Knowledge
  Item: Ich weiß jetzt viel mehr als vorher über die Trainingsinhalte.
        After the training, I know substantially more about the training contents than before.
  Source: Adaptation from Deschler et al. (2005, p. 34)
  Item: In dem Training habe ich sehr viel Neues gelernt.
        I learned a lot of new things in the training.
  Source: Following the initial Q4TE form by Kauffeld et al. (2009)

Level: Behavior. Scale: Application to practice
  Item: Die im Training erworbenen Kenntnisse nutze ich häufig in meiner täglichen Arbeit.
        In my everyday work, I often use the knowledge I gained in the training.
  Source: Adaptation of the initial Q4TE form by Kauffeld et al. (2009)
  Item: Es gelingt mir sehr gut, die erlernten Trainingsinhalte in meiner täglichen Arbeit anzuwenden.
        I successfully manage to apply the training contents in my everyday work.
  Source: Following Gnefkow (2008, p. 263)

Level: Organizational results. Scale: Individual
  Item: Seit dem Training bin ich mit meiner Arbeit zufriedener.
        Since the training, I have been more content with my work.
  Source: Additional item developed in our work group (see also Ironson et al., 1989)
  Item: Durch die Anwendung der Trainingsinhalte hat sich meine Arbeitsleistung verbessert.
        My job performance has improved through the application of the training contents.
  Source: Following Ong et al. (2004)

Level: Organizational results. Scale: Global
  Item: Durch die Anwendung der Trainingsinhalte konnten Arbeitsabläufe im Unternehmen vereinfacht werden.
        Overall, it seems to me that the application of the training contents has facilitated the work flow in my company.
  Source: Adaptation of the initial Q4TE form by Kauffeld et al. (2009)
  Item: Durch das Training hat sich das Unternehmensklima verbessert.
        Overall, it seems to me that the organizational climate has improved due to the training.
  Source: Adaptation of the initial Q4TE form by Kauffeld et al. (2009)
Note: Adaptation means a maximum of three words of the original item wording was changed (e.g. to
adapt the item to the training field). Following means item content of the original item was used as basis
for item development.
[a] If required for research purposes, researchers can additionally use the scale self-efficacy (not depicted
here). This scale was part of the initial Q4TE and contains two items which are adapted from Schyns and
von Collani (2002).
[b] The tense of this item was adapted for our retrospective study.
Scale development. For ML estimation, a ratio of at least five cases per free
parameter is recommended (Bentler & Chou, 1987). Therefore, the subsample sizes would
not have been sufficient for a single model based on the remaining 20 items (CFA on
subsample A) or for a single model based on the reduced form (CFA on subsample B). Due to this
constraint, we specified two separate CFAs (covering short- and long-term outcomes,
respectively) for each subsample (see Figure 2 for subsample A analysis).
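The cases-per-parameter argument can be made concrete with a small calculation. The sketch below is not taken from the paper: it assumes a simple-structure CFA in which every item loads on exactly one factor, all factors are freely correlated, factor variances are fixed for scaling, and item intercepts are not counted. Under these assumptions the free parameters are the loadings, the residual variances and the factor covariances. The function names are hypothetical; the item and factor counts (20 items and 6 scales before reduction, 12 items in the reduced form) follow the text.

```python
def free_parameters(n_items, n_factors):
    """Free parameters of a simple-structure CFA with correlated factors.

    Assumption: one loading and one residual variance per item, plus one
    covariance per pair of factors (factor variances fixed, no intercepts).
    """
    factor_covariances = n_factors * (n_factors - 1) // 2
    return n_items + n_items + factor_covariances


def degrees_of_freedom(n_items, n_factors):
    """Unique variances/covariances of the items minus the free parameters."""
    moments = n_items * (n_items + 1) // 2
    return moments - free_parameters(n_items, n_factors)


def cases_per_parameter(n_cases, n_items, n_factors):
    """Ratio of sample size to free parameters (rule of thumb: at least 5)."""
    return n_cases / free_parameters(n_items, n_factors)


# One joint model per subsample would fall below five cases per parameter:
print(round(cases_per_parameter(251, 20, 6), 2))  # subsample A, 20 items: 4.56
print(round(cases_per_parameter(157, 12, 6), 2))  # subsample B, 12 items: 4.03

# Separate short-term models stay well above the threshold:
print(round(cases_per_parameter(251, 12, 3), 2))  # subsample A, 12 items
print(round(cases_per_parameter(157, 6, 3), 2))   # subsample B, 6 items

# The same counting rule gives 51 d.f. for 12 items on 3 factors.
print(degrees_of_freedom(12, 3))                  # 51
```

Under this counting rule, the degrees of freedom also come out at 6 for a three-factor model with two items per factor and at 39 for twelve items on six factors.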
[Figure 2 path diagrams: (a) SAT with indicators s1–s5, UT with u1–u3, KNOW with k1–k4; (b) APP with a1–a3, I-OR with i1–i3, G-OR with g1–g2.]
Figure 2: Specified CFA model for (a) short-term evaluation and (b) long-term evaluation in
subsample A (n = 251) of study 1 (error terms are not depicted). SAT = satisfaction,
UT = utility, KNOW = knowledge, APP = application to practice, I-OR = individual
organizational results, G-OR = global organizational results.
Model | Items | χ² | d.f. | χ²/d.f. | RMSEA | CFI | SRMR

Study 1 (n = 408)
Subsample A (n = 251): short-term evaluation
3-factor model (SAT with 5 items, UT with 3 items, KNOW with 4 items) | 12 | 316.84 | 51 | 6.21 | 0.144 | 0.898 | 0.041
3-factor model (SAT with 2 items, UT with 2 items, KNOW with 2 items) | 6 | 4.68 | 6 | 0.78 | 0.000 | 1.000 | 0.007
Subsample A (n = 251): long-term evaluation
3-factor model (APP with 3 items, I-OR with 3 items, G-OR with 2 items) | 8 | 66.20 | 17 | 3.89 | 0.107 | 0.953 | 0.035
3-factor model (APP with 2 items, I-OR with 2 items, G-OR with 2 items) | 6 | 3.59 | 6 | 0.60 | 0.000 | 1.000 | 0.011
Subsample B (n = 157): short-term evaluation
3-factor model (SAT with 2 items, UT with 2 items, KNOW with 2 items) | 6 | 7.19 | 6 | 1.20 | 0.036 | 0.997 | 0.015
Subsample B (n = 157): long-term evaluation
3-factor model (APP with 2 items, I-OR with 2 items, G-OR with 2 items) | 6 | 4.76 | 6 | 0.79 | 0.000 | 1.000 | 0.017

Study 2 (n = 287)
Model 1: 6 latent, intercorrelated factors (SAT, UT, KNOW, APP, I-OR, G-OR) | 12 | 100.40 | 39 | 2.57 | 0.074 | 0.966 | 0.030
Model 2: 6 latent first-order factors (SAT, UT, KNOW, APP, I-OR, G-OR) and 2 latent second-order factors (S-TE and L-TE) following Wang and Wilcox (2006) | 12 | 167.51 | 47 | 3.56 | 0.095 | 0.933 | 0.051
Model 3: 4 latent, intercorrelated factors following Kirkpatrick’s (1967) four-level model | 12 | 215.81 | 48 | 4.50 | 0.110 | 0.907 | 0.051
identified a reduced form with a total of six items based on modification indices and
residual variances in combination with inspection of item wording. The reduced model
with two items per scale obtained a good fit to subsample A (see Table 2). Investigation
of the stability of this solution in subsample B also provided good model fit (see Table 2).
We successfully identified a Q4TE form with two items per scale, which makes
time-efficient training evaluations possible. However, the present results still had to be
cross-validated and examined in one CFA model comprising both short- and long-term
evaluation. This was realized in study 2.
Measures. The Q4TE was measured with the two items per scale identified in study 1,
using an 11-point response scale (see study 1).
Data analysis. CFA using the MLR estimator in Mplus 6 (Muthén & Muthén, 1998–2010)
was applied to investigate the stability of the underlying factor structure.
Figure 3: CFA models for investigating the underlying factor structure in study 2
(n = 287). Error terms are not depicted. SAT = satisfaction, UT = utility,
KNOW = knowledge, APP = application to practice, I-OR = individual organizational
results, G-OR = global organizational results, Short-TE = short-term evaluation,
Long-TE = long-term evaluation.
Measures. The Q4TE was measured with the two items per scale identified in study
1 and cross-validated in study 2, using an 11-point response scale (see study 1).
Transfer to practice was measured with the item ‘Have you been able to transfer
training contents to practice?’, which was answered with yes or no (an adaptation of
Kauffeld et al., 2008, 2009). Moreover, we measured transfer quantity as the number
of steps transferred to practice (Kauffeld et al., 2008, 2009; Kauffeld & Lehmann-Willenbrock,
2010) and used it as a more elaborate measure of transfer. The participants
were asked to write down up to 10 training contents they had been able to
transfer to practice.
Data analysis. We investigated group differences between employees who could transfer
training contents to practice and those who could not by means of two separate
multivariate analyses of covariance (MANCOVAs) in PASW (covering short-
and long-term outcomes, respectively). To investigate the relationship between the
Q4TE scales and transfer quantity, we conducted a multiple regression analysis in
PASW.
Note: Internal consistency values calculated with Cronbach’s α are presented diagonally in parentheses.
* p < 0.05, ** p < 0.01 (2-sided significance).
[a] No internal consistency value was calculated for transfer and transfer quantity (one item each).
[b] Transfer: 1 = yes, 0 = no.
[c] Gender: 1 = female, 2 = male.
[d] 1 = closed skills, 2 = both (open and closed skills) and 3 = open skills.
[e] Kendall’s τ correlations are depicted because type of training content is an ordinal variable.
M = mean, SD = standard deviation.
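As a reminder of how an internal consistency value of this kind is obtained, the following minimal sketch computes Cronbach's α from raw item scores using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). It is not the authors' PASW computation, the function name is hypothetical, and the two response vectors are made-up numbers that do not come from any of the three studies.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item score lists (one list per item).

    All lists must have the same length: one score per respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("all items need the same number of respondents")
    totals = [sum(person) for person in zip(*items)]   # sum score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of five people to the two items of one scale
item_1 = [8, 6, 9, 3, 7]
item_2 = [7, 6, 10, 4, 6]
print(round(cronbach_alpha([item_1, item_2]), 2))
```

With only two items per scale, α depends entirely on how strongly the two items covary, so each Q4TE scale value reflects a single item pair.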
Variable | Step 1 β | Step 2 β
Covariates
Age | 0.13 | 0.13*
Organizational tenure | 0.06 | 0.00
Course duration | 0.08 | 0.03
Correlates (independent variables)
Satisfaction (SAT) | – | 0.12
Utility (UT) | – | -0.05
Knowledge (KNOW) | – | 0.05
Application to practice (APP) | – | 0.26**
Individual org. results (I-OR) | – | 0.11
Global org. results (G-OR) | – | 0.03
R² | 0.04 | 0.22
R²adj | 0.03 | 0.20
F | 4.88** | 11.63**
Note: Multiple regression analysis using the enter method (n = 391), listwise deletion of missing data.
β = standardized regression coefficient. We included age, organizational tenure, and course duration
as covariates in step 1, and all Q4TE variables in step 2.
* p < 0.05, ** p < 0.01 (2-sided significance).
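The adjusted R² values in the regression table can be checked against the unadjusted ones. The sketch below uses the standard adjustment formula, 1 − (1 − R²)(n − 1)/(n − k − 1), with n = 391 and k = 3 predictors in step 1 and k = 9 in step 2. Treating the covariates and Q4TE scales as 3 and 9 predictors follows the table note, the function name is hypothetical, and the inputs are the two-decimal R² values as reported, so the results are only accurate to rounding.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a regression with n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 391  # cases after listwise deletion, as stated in the table note

step_1 = adjusted_r2(0.04, n, k=3)   # covariates only
step_2 = adjusted_r2(0.22, n, k=9)   # covariates plus six Q4TE scales

print(round(step_1, 2))       # 0.03
print(round(step_2, 2))       # 0.2
print(round(0.22 - 0.04, 2))  # increase in R-squared from step 1 to step 2: 0.18
```

Both recomputed values agree with the reported adjusted R² of 0.03 and 0.20.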
professional training evaluation measures: it covers short- and long-term training outcomes
(cf. Wang & Wilcox, 2006) and provides high usability in terms of time efficiency
(cf. Aguinis & Kraiger, 2009). Moreover, it shows promising psychometric properties
(cf. Aiken & Groth-Marnat, 2006).
Our analyses yielded a time-efficient measure for summative training evaluation
that is generalizable to diverse training contents and contexts. We established sound
psychometric properties and demonstrated good or at least satisfactory internal consistency
values for all Q4TE scales (Nunnally & Bernstein, 1994, p. 265). In study 1,
the final Q4TE form was identified by means of EFA and CFA. In study
2, CFA results clearly support a model with six latent factors (satisfaction, utility,
knowledge, application to practice, individual organizational results and global
organizational results) over two competing models (following either Wang & Wilcox,
2006 or Kirkpatrick & Kirkpatrick, 2006). Addressing our first research question,
study 2 results underscore the importance of distinguishing single training
outcomes (e.g. satisfaction and utility). However, if evaluation data have to be
aggregated at a higher level in future studies (e.g. if a model is otherwise too complex),
EFA results in study 1 indicate that the distinction between short- and long-term
evaluation following Wang and Wilcox (2006) is appropriate for aggregation. By
contrast, study 2 results did not show sufficient fit for a model in line with Wang and
Wilcox (2006). Nor did we find sufficient fit for a model following Kirkpatrick
and Kirkpatrick (2006); only the SRMR value was acceptable for both models. Yet
detailed inspection of the CFA results showed slightly better fit for the
model in line with Wang and Wilcox (2006) than for the model following Kirkpatrick’s (1967)
framework. In sum, our analyses clearly support a six-factor solution (satisfaction,
utility, knowledge, application to practice, individual organizational results and
global organizational results) and hint at the appropriateness of distinguishing
between short- and long-term outcomes (cf. Wang & Wilcox, 2006) if aggregating
evaluation data is necessary.
Limitations
The present study has several limitations. First, the psychometric examination of the
Q4TE relied entirely on computer-based, cross-sectional, retrospective samples. As all
scales were measured at the same level of specificity and at the same time, the
intercorrelations between the Q4TE scales were higher than the values reported
in several meta-analyses based on Kirkpatrick’s four-level framework (see Alliger et al.,
1997). To reduce the potential bias inherent in the present research design, future
research should include a time lag between the short-term (e.g. satisfaction) and long-term
evaluation scales (e.g. application to practice; see Podsakoff et al., 2012). However,
the retrospective online samples used in the three studies offered the opportunity to
obtain three diverse data sets from different organizations and training programs,
while avoiding missing data. These design characteristics were important for our
research aim of developing an inventory that is not training-specific, but widely applicable
to professional training evaluation.
Second, the Q4TE consists of self-report items only, which can be a source of common
method bias (e.g. Podsakoff et al., 2012). One way to deal with common method
bias is to apply subsequent statistical procedures (for an overview, see Podsakoff et al.,
2012). However, there is still a scientific debate on whether and how to apply
statistical procedures for dealing with common method bias (e.g. Conway & Lance,
2010). As Conway and Lance (2010) pointed out, ‘[n]o post hoc statistical correction
procedure can be recommended until additional research evaluates the relative effectiveness
of those that have been proposed’ (p. 332). Furthermore, assessing level 3
(behavior) and level 4 (organizational results) by means of self-reports runs contrary
to some recommendations, e.g. to use behavioral observations to measure level 3
(e.g. Wang & Wilcox, 2006). However, using self-report measures seems appropriate
because the participant himself or herself is widely regarded as a valid data source for
many psychological constructs (Spector, 2006). For example, several studies have
shown that self-report measures reflect specific learning outcomes appropriately (for
an overview, see Kraiger et al., 1993). Furthermore, using standardized self-report
questionnaires is the only way to get a quick overview of organization-wide
References
Aguinis, H. and Kraiger, K. (2009), ‘Benefits of training and development for individuals and
teams, organizations, and society’, Annual Review of Psychology, 60, 451–74.
Aiken, L. R. and Groth-Marnat, G. (2006), Psychological Testing and Assessment, 12th edn (Boston,
MA: Pearson Education).
Alliger, G. M. and Janak, E. A. (1989), ‘Kirkpatrick’s levels of training criteria: thirty years later’,
Personnel Psychology, 42, 331–42.
Alliger, G. M., Tannenbaum, S. I., Bennett, W. Jr, Traver, H. and Shotland, A. (1997), ‘A meta-
analysis of the relations among training criteria’, Personnel Psychology, 50, 341–58.
Alvarez, K., Salas, E. and Garofano, C. M. (2004), ‘An integrated model of training evaluation and
effectiveness’, Human Resource Development Review, 3, 385–416.
Arthur, W. Jr, Bennett, W. Jr, Edens, P. S. and Bell, S. T. (2003), ‘Effectiveness of training in
organizations: a meta-analysis of design and evaluation features’, Journal of Applied Psychology,
88, 234–45.
Baldwin, T. T. and Ford, J. K. (1988), ‘Transfer of training: a review and directions for future
research’, Personnel Psychology, 41, 63–105.
Bates, R. A. (2004), ‘A critical analysis of evaluation practice: the Kirkpatrick model and the
principle of beneficence’, Evaluation and Program Planning, 27, 341–7.
Bentler, P. M. and Chou, C.-P. (1987), ‘Practical issues in structural modeling’, Sociological Methods
& Research, 16, 78–117.
Bergkvist, L. and Rossiter, J. R. (2007), ‘The predictive validity of multiple-item versus single-item
measures of the same constructs’, Journal of Marketing Research, 44, 175–84.
Bihler, W. (2006), Weiterbildungserfolg in betrieblichen Lehrveranstaltungen: Messung und
Einflussfaktoren im Bereich Finance & Controlling [Success of Advanced Training in Operating Courses:
Measurement and Determinants in Finance and Controlling] (Wiesbaden: Dt. Univ.-Verlag).
Blau, G., Gibson, G., Bentley, M. and Chapman, S. (2012), ‘Testing the impact of job-related
variables on a utility judgment training criterion beyond background and affective reaction
variables’, International Journal of Training and Development, 16, 54–66.
Blume, B. D., Ford, J. K., Baldwin, T. T. and Huang, J. L. (2010), ‘Transfer of training: a meta-
analytic review’, Journal of Management, 36, 1065–105.
Broad, M. L. (1997), ‘Overview of transfer of training: from learning to performance’, Performance
Improvement Quarterly, 10, 2, 7–21.
Brown, K. G. (2005), ‘An examination of the structure and nomological network of trainee
reactions: a closer look at “smile sheets” ’, Journal of Applied Psychology, 90, 991–1001.
Byrne, B. M. (2005), ‘Factor analytic models: viewing the structure of an assessment instrument
from three perspectives’, Journal of Personality Assessment, 85, 17–32.
Combs, J., Liu, Y., Hall, A. and Ketchen, D. (2006), ‘How much do high-performance work
practices matter? A meta-analysis of their effects on organizational performance’, Personnel
Psychology, 59, 501–28.