August 2006
There is an old adage commonly cited in management circles: "If you can't measure it, you can't manage it." Most executives would likely agree that managing their organization's training function is an essential responsibility of the enterprise. Therefore, measuring and evaluating the effectiveness of that function must be a priority of the professionals charged with this responsibility. Yet, a major challenge faces these professionals: how best to perform such measurement and evaluation, and report the results in a timely, cost-effective, and useful manner. Where can they find a method or system to address this challenge?

© 2006 The eLearning Guild. All rights reserved. http://www.eLearningGuild.com
Many training professionals turn to Kirkpatrick's four levels because it has become an industry standard for evaluating training programs over the course of forty-seven years in the literature. First described by Donald Kirkpatrick in 1959, this standard provides a simple taxonomy comprising four criteria of evaluation (Kirkpatrick originally called them steps or segments, but over the years they have become known as levels). The structure of the four-level taxonomy suggests that each level after the first succeeds from the prior level. The first level measures the student's reaction to the training, and the second level what the student learned. The third level measures change in on-the-job behavior due to the training, and the fourth, the results in terms of specific business and financial goals and objectives for the organization. Theoretically, one level of evaluation leads to the next.

Yet, despite its status as an industry standard, many studies, including one conducted by Kirkpatrick himself, have shown that the full taxonomy is not widely used beyond the first two levels. This pattern of usage means that training practitioners might not be fully measuring, and therefore effectively managing, the impact that training and development has on two of the most important reasons for funding and providing resources for training in the first place: improvements in workplace performance and positive business or organizational results.

Several important questions come up. Why aren't all the levels of the taxonomy, as described by Kirkpatrick, used more widely by training professionals? If the measurement of training is a critical task, and the industry boasts of a standard for evaluation that is almost fifty years old, then why does so much impor-
RESEARCH REPORT / Kirkpatrick's Four Levels of Training Evaluation
Demographics
We asked our respondents to identify themselves and their organizations by five attributes: their role in their organization, the size of their organization, the type of their organization, their organization's primary business focus, and the department they work for. This section presents the demographic data of our survey sample.

This survey, like all other Guild surveys, was open to Guild Members and Associates as well as to occasional website visitors. These surveys are completed by accessing the survey link on the homepage of the Guild website. Naturally, Guild Members and Associates are more likely than non-members to participate, because each of the more than 22,100 Members and Associates receives an email notifying them of the survey and inviting them to participate. For this reason, we can classify this survey as a random sample because all Members have an opportunity to participate, and their participation is random.
[Chart: Demographics — departments cited by 2% of respondents each: Sales or Marketing, Engineering or Product Development, Customer Service, Research and Development]
sources and training departments sponsored and conducted most training, just as they still do in most organizations, training programs and courses were almost exclusively classroom-based and led by an instructor or subject matter expert. Computer-assisted self-study was still in its infancy, and the possibility of blending the classroom experience with pre-class and post-class asynchronous e-Learning was literally decades away. In addition, human capital development as a strategy for competitive advantage did not enjoy the same level of acceptance that it does today, and there was far less need to provide employees with continuing education for professional development in order to maintain a knowledgeable and skilled workforce. As a result, there was much less job security in the training department. Finally, the task of training evaluation was

tinue existing programs, Kirkpatrick argues that the third reason is "... to justify the existence of the training department" (Kirkpatrick, 1994, p. 18). Therefore, one of Kirkpatrick's primary objectives was to give training professionals some guidelines and suggestions for showing their management that the efforts of the training department had value and were worth its cost.

In these articles, Kirkpatrick proposed that evaluating training is a four-step process, with each step leading to the next in succession from one to four. He named and defined the four steps or segments as (1) "reaction" or "how well trainees like a particular program"; (2) "learning" or "a measure of the knowledge acquired, skills improved, or attitudes changed due to training"; (3) "behavior" or "a measure of the extent to which participants change their on-the-job behavior because of training"; and (4) "results" or "a measure of the final results that occur due to training, including increased sales, higher productivity, bigger profits, reduced costs, less employee turnover, and improved quality" (Kirkpatrick, 1996, pp. 54-56). Kirkpatrick describes these steps in quite general terms, yet he readily acknowledges that the level of work and expertise required by each successive step in the evaluation process is more complex and difficult than in its predecessor step.

Kirkpatrick concludes his presentation of the four steps with the hope that "... the training directors who have read and studied these articles are now clearly oriented on the problems and approaches in evaluating training" (Kirkpatrick, 1996, p. 59). Kirkpatrick describes his four steps as an orientation, a way of breaking down a complex process involving many variables and data collection challenges into four clearly delineated and logically ordered parts, which are theoretically sequential in nature but only loosely connected in practice. For example, he wants practitioners to see that they can get started with the evaluation process by completing the relatively simple task of measuring the students' reaction to a course. At the same time he recognizes that the information gleaned in the succeeding steps will be relatively more significant even as the steps become more difficult to design and implement. He suggested, "When training directors effectively measure participants' reactions and find them favorable, they can feel proud. But they should also feel humble; the evaluation has only just begun" (p. 55).

In anticipation of the criticism that was yet to come, Kirkpatrick points out quite clearly that a positive evaluation of one of the steps does not guarantee or even imply that there will be a positive evaluation in another step. In doing so, he admits that there may not be a correlation among the results of the four steps of evaluation, but as will be shown, he often implies that there should be, without offering a theoretical or researchable basis for such a claim.

"Even though a training director may have done a masterful job measuring trainees' reactions, that's no assurance that any learning has taken place. Nor is that an indication that participants' behavior will change because of the training. Still farther away is any indication of results that one can attribute to the training" (p. 55).

Kirkpatrick also acknowledges that evaluation at steps 3 (behavior) and 4 (results) is more difficult than at steps 1 (reaction) and 2 (learning) because these steps require "... a more scientific approach and the consideration of many factors ..." (p. 58), such as motivation to improve, work environment, and opportunity to practice the newly acquired knowledge or skills. He refers to the problem of the "separation of variables," which raises the question of what other factors, in addition to the training, might have affected the behavior and results. These intervening variables certainly impact results at Levels 3 and 4, but they are not necessarily within the purview or the range of experience of most training evaluation practitioners. Kirkpatrick is clear that "Eventually, we may be able to measure human relations training in terms of dollars and cents. But at the present time, our research techniques are not adequate" (p. 59).

These four articles lay the groundwork for a simple approach to evaluating training that Kirkpatrick hoped would be enough to get training professionals started. He did not know that this approach would become the de facto industry standard in the ensuing decades. His aim was simpler:

"It's hoped that the training directors who have read and studied these articles are now clearly oriented on the problems and approaches in evaluating training. We training people should carefully analyze future articles to see whether we can borrow the techniques and procedures described" (p. 59).

Kirkpatrick wanted to jump-start the industry with his four simple steps to evaluation in the hope that practitioners would work things out as they used this approach, buying time and resources as they evolved and refined the practice.

The findings presented in this report provide a glimpse of how far today's training practitioners, as represented by Members and Associates of The eLearning Guild community, have evolved and refined the practice.
7a. Level 1: "Reaction — How students react to the training" — Average Rating 4.34; 85% "Always" or "Frequently"
7b. Level 2: "Learning — The extent to which students change attitudes, improve knowledge, and/or increase skill as a result of the training" — Average Rating 3.57; 57%
7c. Level 3: "Behavior — The extent to which on-the-job behavior or performance has changed and/or improved as a result of the training" — Average Rating 2.65; 20%
7d. Level 4: "Results — The extent to which desired business and/or organizational results have occurred as a result of the training" — Average Rating 2.11; 13%
These results are similar to those of many studies taken over the years since Kirkpatrick's 1968 research, including several recent studies published by the Guild (e.g., Metrics: Learning Outcomes and Business Results and the Metrics and Measurement 2005 Research Report). The data of such studies generally show that usage of the Kirkpatrick four levels declines with each succeeding level, and that usage of Levels 3 and 4 is consistently below 50%, and in many cases at the lower levels reported in these findings. Note in chart 7c that Level 3 evaluations are "Never" or "Rarely" conducted by 47% of respondents' organizations, and in chart 7d that Level 4 evaluations are "Never" or "Rarely" conducted by an even larger 74%. Granted, there are other evaluation methods and systems, and some of these organizations may use them instead of Kirkpatrick. The point remains, however, that even after almost fifty years in practice, usage of Levels 3 and 4 has not grown as significantly as Kirkpatrick might have hoped.
7a. Kirkpatrick Level 1: "Reaction — How students react to the training" — Average Rating: 4.34
7b. Kirkpatrick Level 2: "Learning — The extent to which students change attitudes, improve knowledge, and/or increase skill as a result of the training" — Average Rating: 3.57
7c. Kirkpatrick Level 3: "Behavior — The extent to which on-the-job behavior or performance has changed and/or improved as a result of the training" — Average Rating: 2.65 (5 = Always 6%; 4 = Frequently 14%; 3 = Sometimes 33%; 2 = Rarely 34%; 1 = Never 13%)
7d. Kirkpatrick Level 4: "Results — The extent to which desired business and/or organizational results have occurred as a result of the training" — Average Rating: 2.11 (5 = Always 4%; 4 = Frequently 9%; 3 = Sometimes 13%; 2 = Rarely 41%; 1 = Never 33%)
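As a quick arithmetic check, the average ratings and the "Never"/"Rarely" shares discussed in the text can be recomputed directly from the 7c and 7d response distributions. A minimal Python sketch (the helper names are ours, not the report's):

```python
# Recompute summary statistics from the published response distributions
# for charts 7c (Level 3, Behavior) and 7d (Level 4, Results).
# Percentages are on a 1-5 frequency scale, where 5 = Always and 1 = Never.

def mean_rating(dist):
    """Weighted mean of a {rating: percent} distribution."""
    return sum(rating * pct for rating, pct in dist.items()) / 100

def combined_share(dist, ratings):
    """Total percentage of responses falling in the given set of ratings."""
    return sum(dist[r] for r in ratings)

level3 = {5: 6, 4: 14, 3: 33, 2: 34, 1: 13}   # chart 7c
level4 = {5: 4, 4: 9, 3: 13, 2: 41, 1: 33}    # chart 7d

# The means land within 0.01 of the reported 2.65 and 2.11; the small gap
# comes from the distributions themselves being rounded to whole percents.
print(f"{mean_rating(level3):.2f}")    # 2.66
print(f"{mean_rating(level4):.2f}")    # 2.10
print(combined_share(level3, (1, 2)))  # 47, the "Never" or "Rarely" share
print(combined_share(level4, (1, 2)))  # 74
```

The "Never"/"Rarely" sums reproduce the 47% and 74% figures cited in the discussion of charts 7c and 7d exactly.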
Q8. Summary of Average Ratings and Percentages of the reasons why respondents' organizations use Kirkpatrick Level 3 evaluations (Average Rating on a 1 - 5 scale; Percentage "Highly Important" or "Very Important")

8a. To demonstrate the actual impact that training has on employee on-the-job performance — 4.17; 80%
8b. To gain information on how to improve future training programs — 4.02; 78%
8c. To determine that the desired change in employee on-the-job performance has been achieved — 4.01; 74%
8d. To decide whether to continue or discontinue a training program — 3.22; 44%
8e. To justify the budget allocated to the design and delivery of training — 3.18; 42%
8f. To justify the existence of the training department by showing how it contributes to the organization's objectives and goals — 3.17; 44%
Our respondents whose organizations use Level 3 indicate that the most important reason to do so is "To demonstrate the actual impact that training has on employee on-the-job performance." This reason is followed closely by "To gain information on how to improve future training programs" and "To determine that the desired change in employee on-the-job performance has been achieved." One of Kirkpatrick's three reasons, "To justify the existence of the training department ...," is the least important. Perhaps these organizations are more sophisticated in their approach to employee development and, as such, the justification of the training department is implicit, while the organization's desire to measure and manage its impact on employee on-the-job performance is strong and well supported.
[Detail charts: rating distributions for reasons 8a through 8f — 8a. To demonstrate the actual impact that training has on employee on-the-job performance; 8b. To gain information on how to improve future training programs; 8c. To determine that the desired change in employee on-the-job performance has been achieved; 8d. To decide whether to continue or discontinue a training program; 8e. To justify the budget allocated to the design and delivery of training; 8f. To justify the existence of the training department by showing how it contributes to the organization's objectives and goals]
We asked our respondents to rate on a scale of 1 - 5 the value to their organization of the data obtained from Kirkpatrick Level 3 evaluations in terms of measuring a) the effectiveness of training programs and b) the desired change in employee on-the-job performance.
Q9. Summary of Average Ratings and Percentages of the Value of Evaluation Data in Terms of Measuring Two Outcomes (Average Rating on a 1 - 5 scale; Percentage "Highly Valuable" or "Very Valuable")
Those respondents whose organizations use Kirkpatrick Level 3 evaluation report that the data they obtain is quite valuable both in terms of measuring "The desired change in employee on-the-job performance" and "The effectiveness of training programs." Significantly, 0% of respondents report that these data have no value, and very few (3% to 5%) indicate that they are not very valuable.

These high levels of data value for such a large group hint at several possibilities. First, our sample population of Level 3 practitioners must be following some best practices in order to obtain this quality of data and then to apply those data to the proper evaluation criteria. Second, these data and the best practices followed may be associated with the specific intervening variables measured during the process (See Question 10). Third, it would seem that, if done properly, Level 3 evaluation is well worth doing.
Detailed Average Ratings and Percentages of the Value of Evaluation Data in Terms of Measuring Two Outcomes

9a. The desired change in employee on-the-job performance
9b. The effectiveness of training programs
We asked our respondents to rate on a scale of 1 - 5 the extent to which their organizations' Kirkpatrick Level 3 evaluations include consideration of each of several intervening variables.

One of the difficulties of evaluating the effectiveness of training programs at the level of "behavior" or "performance" is that so many different variables outside of the training program purview may affect achieving or not achieving the desired outcomes. In an attempt to determine the extent to which Level 3 practitioners consider some of these variables in the evaluation process, we provided respondents with a selection of five intervening variables.
Q10. Summary of Average Ratings and Percentages of Frequency of Consideration of Intervening Variables When Conducting Kirkpatrick Level 3 Evaluations (Average Rating on a 1 - 5 scale; Percentage "Always" or "Frequently")
These findings show that while all five of the given variables are commonly measured as part of Level 3 evaluations (a point to be remembered in terms of the high value of data obtained — See Question 9), there are slight differences in frequency among them.

"Successful learning" is the variable our respondents' organizations most often consider in the evaluation process — in other words, the results of a Level 2 evaluation. Thus, demonstrating, rather than assuming, a correlation between Level 2 and Level 3 outcomes is a primary consideration for successful evaluation practitioners.

However, we note that our respondents' organizations give the same level of attention to "Whether the student has the opportunity to apply what was learned in practice and/or on-the-job situations." By doing so, the evaluators are likely to make the connection between "learning" and "performing" by assessing whether sufficient practice time has been allowed outside the "classroom" for the student to reinforce and retain the learning in the arena of real performance.
[Detail charts: response distributions for two of the intervening variables — first chart: 5 = Always 31%, 4 = Frequently 42%, 3 = Sometimes 22%, 2 = Rarely 4%, 1 = Never 1%; second chart: 5 = Always 34%, 4 = Frequently 37%, 3 = Sometimes 23%, 2 = Rarely 6%, 1 = Never 0%]

10c. Whether the student perceives that the training has satisfied his/her need for performance-related learning
10d. Whether the student is motivated to transfer learning to on-the-job performance

[Detail charts for 10c and 10d: one distribution shows 5 = Always 21%, 4 = Frequently 30%, 3 = Sometimes 33%, 2 = Rarely 13%, 1 = Never 3%; the other survives only in fragments, including 1 = Never 2%]
We asked our respondents to rate on a scale of 1 - 5 the degree of challenge presented by each of several issues that their organization may have dealt with in order to use Kirkpatrick Level 3 evaluation. These issues are among those commonly cited in the literature, by Kirkpatrick and others, as obstacles to using Level 3.
Q11. Summary of Average Ratings and Percentages of the Challenges of Implementing Kirkpatrick Level 3 (Average Rating on a 1 - 5 scale; Percentage "Highly Challenging" or "Very Challenging")
If the findings presented for Questions 8 to 10 provide some indication that Level 3 evaluators find value in the results of their practice, and hint at some of the reasons why they derive this value, then it is worth examining what issues they had to deal with in honing their practice and achieving the results. As indicated by the low percentage of Level 3 usage (See Question 7), and the observations of training evaluation experts, including Kirkpatrick himself, Level 3 evaluation is not easy. These data give us some perspective on where the difficulties lie.

We see that the average "challenge" rating for all of the issues faced falls somewhere between "fairly challenging" and "very challenging." Relatively speaking, however, we note that "time required" and "access to the data required" stand out, and these two selections seem underscored by the fact that making Level 3 evaluation a priority for training professionals is also quite challenging.
11a. The time required to conduct Level 3 evaluations — Average Rating: 3.60
11b. Gaining access to the data required to conduct a Level 3 evaluation — Average Rating: 3.46
11c. Making Level 3 evaluations a priority for HRD and training professionals — Average Rating: 3.37
11d. The expertise required to conduct Level 3 evaluations — Average Rating: 3.28
11e. Gaining management support for Level 3 evaluations — Average Rating: 3.16
11f. The cost of conducting Level 3 evaluations — Average Rating: 3.07
Question 12. The Reasons Why Organizations Do Not Use Kirkpatrick Level 3 Evaluations

Note: We asked respondents whose organizations "Never" or "Rarely" use Kirkpatrick Level 3 to answer Question 12 because this question pertains specifically to non-usage of Kirkpatrick Level 3 evaluations. Respondents whose organizations "Sometimes," "Frequently," or "Always" use Kirkpatrick Level 3 evaluations did not answer Question 12.

We asked our respondents to rate on a scale of 1 - 5 the relative importance of each of several reasons why their organization never, or only rarely, uses Kirkpatrick Level 3. We provided respondents with seven reasons that their organizations might not use Level 3 evaluation. Note that these reasons relate directly to the challenging issues faced by those respondents who do use Level 3 evaluations (See Question 11).
Q12. Summary of Average Ratings and Percentages of the Reasons Why Organizations Do Not Use Kirkpatrick Level 3 Evaluation (Average Rating on a 1 - 5 scale; Percentage "Highly Important" or "Very Important")

12a. Difficulty accessing the data required for a Level 3 evaluation — 3.79; 65%
12b. No management support to conduct Level 3 evaluation — 3.76; 63%
12c. Too time consuming to conduct Level 3 evaluation — 3.63; 57%
12d. Level 3 evaluation is not considered a relatively important or urgent priority for the training department — 3.27; 46%
12e. Too costly to conduct Level 3 evaluation — 3.11; 38%
12f. We do not have the required expertise to conduct Level 3 evaluation — 2.78; 30%
12g. Levels 1 and/or 2 evaluations are all that is needed to determine effectiveness of training programs — 2.11; 14%
The top two reasons for not using Kirkpatrick Level 3 evaluation reported by our respondents whose organizations do not use Level 3 are "Difficulty accessing the data required ..." and "No management support ..." The first reason corresponds to the second-rated challenge reported by respondents whose organizations do use Level 3 evaluation, "Gaining access to the data required" (See Question 11).

The time required to conduct Level 3 evaluations seems to be a much more significant reason not to do so than the cost of conducting such evaluations. Again, this finding corresponds to the relative challenge of time and cost as issues faced by those who do use Level 3.

One reason in particular does not seem to be much of a factor. We see from these results that few organizations forgo Level 3 evaluations because they believe "Levels 1 and/or 2 evaluations are all that is needed to determine effectiveness of training programs."
12a. Difficulty accessing the data required for a Level 3 evaluation — Average Rating: 3.79
12b. No management support to conduct Level 3 evaluation — Average Rating: 3.76
12c. Too time consuming to conduct Level 3 evaluation — Average Rating: 3.63 (5 = Highly important 28%; 4 = Very important 29%; 3 = Fairly important 26%; 2 = Not very important 12%; 1 = Not at all important 5%)
12d. Level 3 evaluation is not considered a relatively important or urgent priority for the training department — Average Rating: 3.27 (5 = Highly important 24%; 4 = Very important 22%; 3 = Fairly important 22%; 2 = Not very important 20%; 1 = Not at all important 12%)

[An additional distribution shown on this page — 5 = Highly important 6%; 4 = Very important 8%; 3 = Fairly important 19%; 2 = Not very important 26%; 1 = Not at all important 41% — is consistent with item 12g]
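The response distributions on this page can be tied back to specific Question 12 items by recomputing each distribution's weighted mean and finding the closest published average rating. A small Python sketch of that consistency check (ours, not part of the report):

```python
# Match each response distribution (5 = Highly important ... 1 = Not at all
# important) to the Question 12 item whose published average rating it
# most closely reproduces.

published = {"12a": 3.79, "12b": 3.76, "12c": 3.63, "12d": 3.27,
             "12e": 3.11, "12f": 2.78, "12g": 2.11}

distributions = [
    {5: 28, 4: 29, 3: 26, 2: 12, 1: 5},
    {5: 24, 4: 22, 3: 22, 2: 20, 1: 12},
    {5: 6, 4: 8, 3: 19, 2: 26, 1: 41},
]

for dist in distributions:
    mean = sum(rating * pct for rating, pct in dist.items()) / 100
    # Nearest published average identifies the item (ties would need care).
    item = min(published, key=lambda k: abs(published[k] - mean))
    print(f"mean {mean:.2f} -> {item}")
```

Run as written, the three distributions come out at means of 3.63, 3.26, and 2.12, matching items 12c, 12d, and 12g respectively (the 0.01 gaps reflect the published percentages being rounded to whole percents).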
Question 13. The reasons why respondents' organizations use Kirkpatrick Level 4.

In regard to their organization's use of Kirkpatrick Level 4, we asked our respondents to rate on a scale of 1 - 5 the importance of each of several reasons why their organization uses Kirkpatrick Level 4 to evaluate training programs.

We provided respondents with a selection of six reasons why organizations might use Kirkpatrick Level 4, including three reasons proposed by Kirkpatrick himself: to gain information on how to improve future programs, to decide whether to continue existing programs, and to justify the existence of the training department. To these three we added two reasons concerning measurement of the specific criteria of Level 4 (business results) and one reason concerning justification of the training budget.
Q13. Summary of Average Ratings and Percentages of the reasons why respondents' organizations use Kirkpatrick Level 4 (Average Rating on a 1 - 5 scale; Percentage "Highly Important" or "Very Important")

13a. To demonstrate the actual impact that training has on business results — 4.10; 76%
13b. To determine that the desired change in business results has been achieved — 4.09; 80%
13c. To gain information on how to improve future training programs — 3.91; 71%
13d. To justify the budget allocated to the design and delivery of training — 3.50; 51%
13e. To decide whether to continue or discontinue a training program — 3.43; 52%
13f. To justify the existence of the training department by showing how it contributes to the organization's objectives and goals — 3.20; 43%
Our respondents whose organizations use Level 4 indicate that the most important reason to do so is "To demonstrate the actual impact that training has on business results." This reason is followed closely by "To determine that the desired change in business results has been achieved" and "To gain information on how to improve future training programs." One of Kirkpatrick's three reasons, "To justify the existence of the training department ...," is the least important.

These findings parallel those presented for Question 8, in which we asked respondents why their organizations use Level 3 evaluation. As we noted in that case, these organizations may be more sophisticated in their approach to employee development and, as such, the justification of the training department is implicit, and the organization's desire to measure and manage its impact on business results is strong and well supported.

We might conclude from the results of both Questions 8 and 13 that Kirkpatrick's three key reasons for conducting training evaluation are not the primary motivations for doing Level 3 or Level 4 evaluations. It would seem that unless an organization has a strong desire to specifically measure the actual criteria of Levels 3 and 4 (employee performance and business results), the traditional Kirkpatrick rationale for evaluation might not be enough to drive usage of Level 3 and 4 evaluations. This possibility, strongly supported by these data, provides one explanation for the infrequency of usage of Levels 3 and 4 relative to Levels 1 and 2.
Detailed Average Ratings and Percentages of the Reasons why respondents' organizations use Kirkpatrick Level 4

13a. To demonstrate the actual impact that training has on business results — Average Rating: 4.10
13b. To determine that the desired change in business results has been achieved — Average Rating: 4.09
13c. To gain information on how to improve future training programs — Average Rating: 3.91
13d. To justify the budget allocated to the design and delivery of training — Average Rating: 3.50
13e. To decide whether to continue or discontinue a training program — Average Rating: 3.43 (5 = Highly important 13%; 4 = Very important 39%; 3 = Fairly important 32%; 2 = Not very important 11%; 1 = Not at all important 5%)
13f. To justify the existence of the training department by showing how it contributes to the organization's objectives and goals — Average Rating: 3.20 (5 = Highly important 15%; 4 = Very important 28%; 3 = Fairly important 24%; 2 = Not very important 27%; 1 = Not at all important 6%)
Q14. Summary of Average Ratings and Percentages of the Value of Level 4 Evaluation Data in Terms of Measuring (Average Rating on a 1 - 5 scale; Percentage "Highly Valuable" or "Very Valuable"):

14a. The desired business and/or organizational results — 4.08; 74%
14b. The effectiveness of training programs — 3.97; 68%
Those respondents whose organizations use Kirkpatrick Level 4 evaluation report that the data they obtain is quite valuable both in terms of measuring "The desired business and/or organizational results" and "The effectiveness of training programs." Significantly, only 1% of respondents report that these data have no value, and only 2% indicate that they are not very valuable.

These high levels of data value for such a large group hint at several possibilities. First, our sample population of Level 4 practitioners must be following some best practices in order to obtain this quality of data and then to apply those data to the proper evaluation criteria. Second, these data, and the best practices followed, may be associated with the specific intervening variables measured during the process (See Question 15). Third, it would seem that, if done properly, Level 4 evaluation is well worth doing.

Detailed Average Ratings and Percentages of the Value of Level 4 Evaluation Data

14a. The desired business and/or organizational results
14b. The effectiveness of training programs
Question 15. Consideration of Intervening Variables When Conducting Kirkpatrick Level 4 Evaluations

We asked our respondents to rate on a scale of 1 - 5 the extent to which their organization's Kirkpatrick Level 4 evaluations include consideration of each of several intervening variables.

One of the difficulties of evaluating the effectiveness of training programs at the level of "business or organizational results" is that so many different variables outside of the training program purview may affect achieving or not achieving the desired outcomes. In an attempt to determine the extent to which Level 4 practitioners consider some of these variables in the evaluation process, we provided respondents with a selection of intervening variables.
Q15. Summary of Average Ratings and Percentages of Frequency of Consideration of Intervening Variables When Conducting Kirkpatrick Level 4 Evaluations (Average Rating on a 1 - 5 scale; Percentage "Always" or "Frequently")
These findings show that while all six of the given variables are commonly measured as part of Level 4 evaluations (a point to be remembered in terms of the high value of data obtained — See Question 14), there are slight differences in frequency among them.

"Alignment of training with business results" is the variable our respondents' organizations most often consider in the Level 4 evaluation process — in other words, how well the design of a training program responds to the demands of the business itself. This result supports the notion that the most effective use of the four levels begins with consideration of the desired business results and works backwards to the training.

However, we note that our respondents' organizations also give a high level of attention to factors clearly outside the purview of training. By doing so, these evaluators are likely to make a more realistic connection between "learning," "performing," and "results" by weighing other variables, such as stakeholder support, employee motivation, and the competitive climate, and judging their impact. Clearly, Level 4 evaluation requires the ability to evaluate factors that are outside of, yet work along with, the training program.
R E S E A R C H R E P O R T / August 2006
Detailed Average Ratings of Intervening Variables Considered When Conducting Kirkpatrick Level 4 Evaluations:
15a. Alignment of training with desired business results. Average Rating: 4.11
15b. Stakeholder support for achieving desired business results. Average Rating: 3.78
15c. Impact of employee behavior or motivation on desired business results. Average Rating: 3.78
15d. Organizational capability for achieving desired business results. Average Rating: 3.76
Q16. Summary of Average Ratings and Percentages of the Challenges of Implementing Kirkpatrick Level 4
(Columns: Average Rating, Scale 1 - 5; Percentage "Highly Challenging" or "Very Challenging")
16a. Gaining access to the data required to conduct Level 4 evaluations: 3.77 (63%)
16b. The time required to conduct Level 4 evaluations: 3.75 (63%)
16c. The expertise required to conduct Level 4 evaluations: 3.49 (50%)
16d. The cost of conducting Level 4 evaluations: 3.36 (47%)
16e. Gaining management support for Level 4 evaluations: 3.10 (39%)
16f. Making Level 4 evaluations a priority for HRD and training professionals: 3.09 (38%)
16g. Overcoming the belief or opinion that Levels 1 and/or 2 evaluations are sufficient to determine the effectiveness of training: 2.72 (29%)
If the findings presented for Questions 13 to 15 provide some indication that Level 4 evaluators find value in the results of their prac-
tice, and hint at some of the reasons why they derive this value, then it is worth examining what issues they had to deal with in honing
their practice and achieving the results. As indicated by the low percentage of Level 4 usage (See Question 7), and the observations of
training evaluation experts, including Kirkpatrick himself, Level 4 evaluation is not easy. These data give us some perspective on where
the difficulties lie.
We see that the average “challenge” rating for all but one of the issues faced falls somewhere between “Fairly challenging” and “Very
challenging.” Relatively speaking, however, we note that “access to the data required” and “time required” stand out as they did for
Level 3 evaluations (See Question 11). However, we note “expertise required” is a more significant challenge (3.49 — 50%) for Level 4
evaluations than for Level 3 (See Question 11: 3.28 — 43%).
In regard to the challenges for both Level 3 and Level 4 evaluations, we note that "Overcoming the belief or opinion that Levels 1 and/or 2 evaluations are sufficient to determine the effectiveness of training" rates on average as "Not very challenging." Many experts have criticized Kirkpatrick's four-level approach because many training practitioners assume that positive outcomes at Levels 1 and 2 imply positive outcomes at Levels 3 and 4, and that these "higher level" evaluations are therefore not necessary. These data show that for those evaluating at Levels 3 and 4, such assumptions are not much of an obstacle.
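The summary statistics used throughout this report, an average rating on a 1 - 5 scale and the percentage of respondents choosing the top two ratings, can be reproduced from a raw response distribution. The sketch below is illustrative only; the function name and the response counts are hypothetical, not data from the survey instrument itself.

```python
# Compute the two summary statistics reported for each survey item:
# the average rating (scale 1 - 5) and the percentage of respondents
# choosing the top two ratings (e.g. "Highly" or "Very" challenging).

def summarize(counts):
    """counts maps a rating (1-5) to the number of respondents choosing it."""
    total = sum(counts.values())
    # Weighted mean of the ratings.
    average = sum(rating * n for rating, n in counts.items()) / total
    # "Top-two-box" share: respondents who answered 4 or 5.
    top_two = 100 * (counts.get(4, 0) + counts.get(5, 0)) / total
    return round(average, 2), round(top_two)

# Hypothetical distribution for one item (1,000 respondents):
counts = {5: 90, 4: 200, 3: 240, 2: 290, 1: 180}
avg, pct = summarize(counts)
print(avg, pct)  # prints: 2.73 29
```

Note that the two statistics answer different questions: the average locates the typical response on the scale, while the top-two-box percentage shows how large the strongly agreeing group is, which is why the report presents both side by side.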
Detailed Average Ratings and Percentages of the Challenges of Implementing Kirkpatrick Level 4:
16a. Gaining access to the data required to conduct Level 4 evaluations. Average Rating: 3.77
16b. The time required to conduct Level 4 evaluations. Average Rating: 3.75
16c. The expertise required to conduct Level 4 evaluations. Average Rating: 3.49
16d. The cost of conducting Level 4 evaluations. Average Rating: 3.36
16e. Gaining management support for Level 4 evaluations. Average Rating: 3.10
16f. Making Level 4 evaluations a priority for HRD and training professionals. Average Rating: 3.09
16g. Overcoming the belief or opinion that Levels 1 and/or 2 evaluations are sufficient to determine the effectiveness of training. Average Rating: 2.72 (9% Highly challenging; 20% Very challenging; 24% Fairly challenging; 29% Not very challenging; 18% Not at all challenging)
Question 17. Reasons Why Organizations Do Not Use Kirkpatrick Level 4 Evaluation
We asked our respondents to rate on a scale of 1 - 5 the relative importance of each of several reasons why their organization never, or only rarely, uses Kirkpatrick Level 4. We provided respondents with seven reasons that their organizations might not use Level 4 evaluation. Note that these reasons relate directly to the challenging issues faced by those respondents who do use Level 4 evaluations (See Question 16).
Q17. Summary of Average Ratings and Percentages of the Reasons Why Organizations Do Not Use Kirkpatrick Level 4 Evaluation
(Columns: Average Rating, Scale 1 - 5; Percentage "Highly Important" or "Very Important")
17a. Difficulty accessing the data required for a Level 4 evaluation: 4.07 (74%)
17b. Too time consuming to conduct Level 4 evaluation: 3.81 (65%)
17c. No management support to conduct Level 4 evaluation: 3.63 (59%)
17d. Level 4 evaluation is not considered a relatively important or urgent priority for the training department: 3.39 (48%)
17e. Too costly to conduct Level 4 evaluation: 3.38 (47%)
17f. We do not have the required expertise to conduct Level 4 evaluation: 3.11 (42%)
17g. Levels 1 and/or 2 evaluations are all that is needed to determine effectiveness of training programs: 2.32 (17%)
The top two reasons for not using Kirkpatrick Level 4 evaluation reported by our respondents whose organizations do not use Level 4 are "Difficulty accessing the data required ..." and "Too time consuming to conduct ...". These reasons correspond to the top two challenges reported by respondents whose organizations do use Level 4 evaluation, "Gaining access to the data required" and "The time required" (See Question 16).
A lack of management support for Level 4 evaluation as well as low urgency and prioritization by the training department are also sig-
nificant inhibitors to using Level 4 evaluations.
One reason in particular does not seem to be much of a factor. These results show that few organizations forgo Level 4 evaluations because they believe "Levels 1 and/or 2 evaluations are all that is needed to determine effectiveness of training programs."
Detailed Average Ratings and Percentages of the Reasons Why Organizations Do Not Use Kirkpatrick Level 4 Evaluation:
17a. Difficulty accessing the data required for a Level 4 evaluation. Average Rating: 4.07
17b. Too time consuming to conduct Level 4 evaluation. Average Rating: 3.81
17c. No management support to conduct Level 4 evaluation. Average Rating: 3.63 (33% Highly important; 26% Very important; 21% Fairly important; 12% Not very important; 8% Not at all important)
17d. Level 4 evaluation is not considered a relatively important or urgent priority for the training department. Average Rating: 3.39 (23% Highly important; 25% Very important; 27% Fairly important; 16% Not very important; 9% Not at all important)
17g. Levels 1 and/or 2 evaluations are all that is needed to determine effectiveness of training programs. Average Rating: 2.32 (7% Highly important; 10% Very important; 24% Fairly important; 24% Not very important; 35% Not at all important)
Question 19. Rate on a scale of 1 to 5 the importance of competitive pressures in your organization's market sector as a factor in establishing your organization's level of expenditure on training for employees.
Average Rating: 3.27 (16% Highly important; 28% Very important; 30% Fairly important; 19% Not very important; 7% Not at all important)
Does competitive pressure drive expenditure on training? For 74% of our respondents, this factor is at least fairly important in the funding process. In addition, the findings show that as this factor increases in importance for an organization, so too does its usage of Kirkpatrick Levels 3 and 4.

Question 20. Rate on a scale of 1 to 5 the importance of your organization's need to maintain a knowledgeable and skilled work force as a factor in establishing your organization's level of expenditure on training for employees. (Select only one)
Average Rating: 4.14 (45% Highly important; 33% Very important; 17% Fairly important; 3% Not very important; 2% Not at all important)
Does the need for a knowledgeable and skilled workforce drive expenditure on training? For 95% of our respondents, this factor is at least fairly important in the funding process. In addition, the findings show that as this factor increases in importance for an organization, so too does its usage of Kirkpatrick Levels 3 and 4.
Summary
Most training professionals would likely agree that the practice of training evaluation has come a long way since Kirkpatrick first published on the topic in 1959 and gave the industry his four-step taxonomy, which, for better or worse, later became known as the four levels model. Yet, despite Kirkpatrick's own hopes, use of this taxonomy rarely extends beyond the first two levels because of the many difficult challenges raised by Level 3 and 4 evaluation. Nonetheless, this research report shows that those organizations that do meet these challenges derive significant value from the data obtained from their Level 3 and 4 evaluation efforts, especially in terms of measuring the impact of training on employee on-the-job performance and desired business results. Significantly, these findings show that those organizations that do use Levels 3 and 4 are also likely to cite the importance of competitive pressures and their need for a knowledgeable and skilled workforce as driving factors in the funding of their training programs.
References:
Alliger, G. M., & Janak, E. A. (1989). Kirkpatrick’s levels of training criteria: thirty years later. Personnel Psychology, 42(2), 331-342.
Catalanello, R. F., & Kirkpatrick, D. L. (1968). Evaluating Training Programs — The State of the Art. Training and Development Journal,
22(5), 2-9.
Holton, E. F. (1996). The Flawed Four-Level Evaluation Model. Human Resource Development Quarterly, 7(1), 5-21.
Kirkpatrick, D. L. (1959). Techniques for evaluating training programs. Journal of ASTD, 13(11), 3-9.
Kirkpatrick, D. L. (1959). Techniques for evaluating training programs: Part 2 — Learning. Journal of ASTD, 13(12), 21-26.
Kirkpatrick, D. L. (1960). Techniques for evaluating training programs: Part 3 — Behavior. Journal of ASTD, 14(1), 13-18.
Kirkpatrick, D. L. (1960). Evaluating training programs: Part 4 — Results. Journal of ASTD, 14(2), 28-32.
Kirkpatrick, D. L. (1976). Evaluation of Training. In R. L. Craig (Ed.), Training & Development Handbook (Second ed., pp. 18-11:18-27).
New York: McGraw-Hill Book Company.
Kirkpatrick, D. L. (1977). Evaluating training programs: evidence vs. proof. Training and Development Journal, 31(11), 9-12.
Kirkpatrick, D. L. (1994). Evaluating Training Programs: The Four Levels (First ed.). San Francisco: Berrett-Koehler.
Kirkpatrick, D. L. (1998). Evaluating Training Programs: The Four Levels (Second ed.). San Francisco: Berrett-Koehler Publishers, Inc.
Kirkpatrick, D. L., & Kirkpatrick, J. D. (2005). Transferring Learning to Behavior. San Francisco: Berrett-Koehler Publishers, Inc.
Newstrom, J. W. (1978). Catch-22: the problems of incomplete evaluation of training. Training and Development Journal, 32(11), 22-24.
Newstrom, J. W. (1995). Evaluating Training Programs: The Four Levels. Human Resource Development Quarterly, 6(3), 317-320.
O’Driscoll, T., Sugrue, B., & Vona, M. K. (2005). The C-Level and the Value of Learning. TD, 7.
Pulichino, J. (2004). Metrics: Learning Outcomes and Business Results Research Report. Santa Rosa: The eLearning Guild.
Pulichino, J. (2005). Metrics and Measurement 2005 Research Report. Santa Rosa: The eLearning Guild.
Pulichino, J. (2006). Usage and Value of Kirkpatrick’s Four Levels. Unpublished Dissertation, Pepperdine University, Malibu.
This survey generated responses from over 550 Members and Associates.
Guild members represent a diverse group of instructional designers, content developers, Web developers, project managers, contractors, consultants, and managers and directors of training and learning services — all of whom share a common interest in e-Learning design, development, and management. Members work for organizations in the corporate, government, academic, and K-12 sectors. Members also include employees of e-Learning product and service providers, as well as consultants, students, and self-employed professionals.
More than 22,100 Members and Associates of this growing, worldwide community look to the Guild for timely, relevant, and objective
information about e-Learning to increase their knowledge, improve their professional skills, and expand their personal networks.
The eLearning Guild's Learning Solutions Magazine is the premier weekly online publication of The eLearning Guild. Learning Solutions provides practical strategies and techniques for designers, developers, and managers of e-Learning.
The eLearning Guild organizes a variety of industry events focused on participant learning.