Measuring & Evaluating Learning
Measuring and evaluating learning has earned a place among the critical issues in the
Workplace Learning and Performance (WLP) field. For almost a decade, this topic has been on
conference agendas and at professional meetings. Journals and newsletters regularly devote increasing print space to the concept. At ASTD, a 500-member professional group (the ROI Network) has been organized to exchange information on measurement and evaluation. More than 25 books provide significant coverage of the topic. Even top executives have developed a greater appetite for evaluation data.
Although the interest in the topic has heightened and much progress has been made, it is
still an issue that challenges even the most sophisticated and progressive WLP departments.
While some professionals argue that it is very difficult to have a successful evaluation process,
others are quietly and deliberately implementing very effective evaluation systems. The latter
group is gaining tremendous support from the senior management team and is making much
progress. Regardless of the position taken on the issue, the reasons for measurement and
evaluation are intensifying. Almost all WLP professionals share a concern that they must
eventually show results on their learning investment. Otherwise, funds may be reduced or the
WLP department may not be able to maintain or enhance its present status and influence in the
organization.
The dilemma surrounding the evaluation of learning is a source of frustration for many senior executives – even within the WLP field itself. Most executives realize that learning is a
basic necessity when organizations are experiencing significant growth or increased competition.
They intuitively feel that there is value in providing learning opportunities, logically anticipating
a payoff in important bottom-line measures such as productivity improvements, quality
enhancements, cost reductions, time savings, and customer service. Yet the frustration comes
from the lack of evidence to show that the process is really working. While results are assumed
to exist and learning programs appear to be necessary, more evidence is needed or funding may
be adjusted in the future. A comprehensive measurement and evaluation process represents the
most promising way to show this accountability in a logical, rational approach.
* This is an updated and modified version of material that has been published in several publications, including Return on Investment in Training and Performance Improvement Programs, 2nd ed., by Jack J. Phillips (Butterworth-Heinemann, Boston, MA, 2003), and Make Training Evaluation Work by Jack Phillips, Patti P. Phillips, and Toni K. Hodges (ASTD, Alexandria, VA, 2004).
Global Evaluation Trends
Measurement and evaluation have been changing and evolving – in both private and public sector organizations and across cultures – not only in the USA, but throughout the developed countries. The following trends have been identified in our research:
• Organizations are increasing their investment in measurement and evaluation with best
practice groups spending 3-5% of the learning and development budget on measurement
and evaluation.
• Organizations are moving up the value chain, away from measuring reaction and learning
to measuring application, impact, and occasionally ROI.
• The increased focus on measurement and evaluation is largely driven by the needs of the
clients and sponsors of learning projects, programs, initiatives, and solutions.
• Evaluation is an integral part of the design, development, delivery, and implementation of
programs.
• A shift from a reactive approach to a more proactive approach is developing, with
evaluation addressed early in the cycle.
• Measurement and evaluation processes are systematic and methodical, often designed
into the delivery process.
• Technology is significantly enhancing the measurement and evaluation process, enabling
large amounts of data to be collected, processed, analyzed, and integrated across
programs.
• Evaluation planning is becoming a critical part of the measurement and evaluation cycle.
• The implementation of a comprehensive measurement and evaluation process usually
leads to increased emphasis on the initial needs analysis.
• Organizations with comprehensive measurement and evaluation have enhanced their
program budgets.
• Organizations without comprehensive measurement and evaluation have reduced or
eliminated their program budgets.
• The use of ROI is emerging as an essential part of the measurement and evaluation mix.
It is a fast-growing metric – 70-80% of companies have it on their wish lists.
• Many successful examples of comprehensive measurement and evaluation applications
are available in all types of organizations and cultures.
These trends are creating a tremendous demand for more information, resources, knowledge, and
skills in the measurement and evaluation processes.
Measurement and Evaluation Challenges
Although interest in measurement and evaluation is increasing, why aren’t organizations doing
more? The barriers to conducting meaningful evaluation can be summarized as 12 basic
challenges:
Too many theories and models. Since Kirkpatrick provided his four levels of
evaluation in the late 1950s, dozens of evaluation books have been written just for the learning
and development community. Add to this the dozens of evaluation books written primarily for
the social sciences, education, and government organizations. Then, add in the 25-plus models
and theories for evaluation offered to practitioners to help them measure the contribution of
learning and development, each claiming a unique approach and a promise of addressing
evaluation woes and bringing world peace.
Models are too complex. Evaluation can be a difficult issue. Because situations and
organizations are different, implementing an evaluation process across multiple programs and
organizations is quite complex. The challenge is to develop models that are theoretically sound,
yet simple and usable.
Lack of understanding of evaluation. It hasn’t always been easy for practitioners to
learn this process. Some books on the topic have over 600 pages, making it impossible for a
practitioner to absorb just through reading. Not only is it essential for the evaluator to
understand evaluation processes, but also the entire learning and development staff must learn
parts of the process and understand how it fits into their role. To remedy this situation, it is
essential for the organization to focus on how expertise is developed and disseminated within the
organization.
The search for statistical precision. The use of complicated statistical models is very
confusing and difficult to absorb for most practitioners. Statistical precision is needed when a
high-stakes decision is being made and when plenty of time and resources are available.
Otherwise, very simple statistics are appropriate.
Evaluation is considered a post-program activity. When evaluation is considered an
add-on activity, it loses the power to deliver the needed results. The most appropriate way to use
evaluation is to consider it early – prior to program development – at the time of conception.
With this approach, an evaluation is conducted efficiently, and the quality and quantity of the data collected are enhanced.
Failure to see the long-term payoff of evaluation. Developing the long-term payoff of
evaluation requires examining multiple rationales for pursuing evaluation. Evaluation can be
used to:
• determine success in accomplishing WLP program objectives,
• prioritize resources for WLP,
• enhance the accountability of WLP,
• identify the strengths and weaknesses of the learning and development process,
• compare the costs to the benefits for a WLP program,
• decide who should participate in future WLP programs,
• test the clarity and validity of tests, cases, and exercises,
• identify which participants were the most successful with the WLP program,
• reinforce major points made to the participant,
• improve the quality of learning and development,
• assist in marketing future programs,
• determine whether the program was an appropriate solution for the specific need, and
• establish a database that can assist management in making decisions.
Lack of support from key stakeholders. Important customers, who need and use
evaluation data, sometimes don’t provide the support needed to make the process successful.
Specific steps must be taken to win support and secure buy-in from key groups including senior
executives and the management team. Executives must see that evaluation produces valuable
data to improve programs and validate results. When the stakeholders understand what’s
involved, they may offer more support.
Evaluation hasn’t delivered the data senior managers want. Today, clients and
sponsors are asking for data beyond reaction and learning. They need data on the application of
new skills on the job and the corresponding impact in the business units. Sometimes they want
ROI data for major programs. They are requesting data about the business impact of learning –
both from short-term and long-term perspectives. Ultimately, these executives are the ones who
must continue funding learning and development. If the desired data are not available, future
funding could be in jeopardy.
Improper use of evaluation data. Improper use of evaluation data can lead to four
major problems:
• Too many organizations do not use evaluation data at all. Data are collected, tabulated,
catalogued, filed, and never used by any particular group other than the individual who
initially collected the data.
• Data are not provided to the appropriate audiences. Analyzing the target audiences and
determining the specific data needed for each group are important steps when
communicating results.
• Data are not used to drive improvement. If not part of the feedback cycle, evaluation falls
short of what it is intended to accomplish.
• Data are used for the wrong reasons – to take action against an individual or group, or to withhold funds, rather than to improve processes. Sometimes data are used in political ways to gain power or advantage over another person.
Lack of consistency. For evaluation to add value and be accepted by different
stakeholders, it must be consistent in its approach and methodology. Tools and templates need to
be developed to support the method of choice to prevent perpetual reinvention of the wheel.
Without this consistency, evaluation consumes too many resources and raises too many concerns
about the quality and credibility of the process.
A lack of standards. Closely related to consistency is the issue of standards. Standards are rules for making evaluation consistent, stable, and equitable. Without standards, there is little credibility in the process and little stability in the outcomes.
Sustainability. A new model or approach often has a short life. It is not sustained.
Evaluation must be integrated into the organization so that it becomes routine and lasting. To
accomplish this, the evaluation process must gain the respect of key stakeholders at the outset. The
evaluation process must be well documented, and stakeholders must accept their responsibilities
to make it work. Without sustainability, evaluation will be on a roller-coaster ride, where data are
collected only when programs are in trouble and less attention is provided when they are not.
Benefits of Measurement and Evaluation
Although the benefits of measurement and evaluation may appear obvious, several
distinct and important payoffs can be realized.
Respond to requests and requirements. Today’s executives and administrators need
information about application and implementation in the workplace and the corresponding
impact on key business measures. In some cases, they are asking for ROI analysis. Developing a
comprehensive measurement and evaluation system is the best strategy to meet these requests
and requirements.
Justify budgets. Some WLP functions use evaluation data to support a requested budget
while others use evaluation data to prevent the budget from being slashed, or--in drastic cases--
eliminated entirely. Additional evaluation data can show where programs add value and where
they do not. This approach can lead to protecting successful programs as well as pursuing new
programs.
Improve program design. A comprehensive evaluation system should provide
information to improve the overall design of a program, including the critical areas of learning
design, content, delivery method, duration, timing, focus, and expectations. These processes may
need to be adjusted to improve learning, especially during implementation of a new program.
Identify and improve dysfunctional processes. Evaluation data can determine whether
the upfront analysis was conducted properly, thereby aligning the program with the
organizational needs. Additional evaluation data can indicate whether interventions are needed
other than learning and development. Finally, evaluation data can help pinpoint inadequacies in
implementation systems and identify ways to improve them.
Enhance the transfer of learning. Learning transfer is perhaps one of the biggest
challenges facing the learning and development field. Research shows that 60 to 90 percent of the job-related skills and knowledge acquired in a program are still not being implemented on the job. A comprehensive evaluation system can identify specific barriers to the use of learning.
Evaluation data can also highlight supportive work environments that enable learning transfer.
Eliminate unnecessary or ineffective projects or programs. Evaluation processes can
provide rational, credible data to help support the decision to either implement a program or
discontinue it. In reality, if the program cannot add value, it should be discontinued. One caveat:
Eliminating programs should not be a principal motive or rationale for increasing evaluation
efforts. Although it is a valid use of evaluation data, program elimination is often viewed more
negatively than positively.
Expand or implement successful programs. The flipside of eliminating programs is
expanding their presence or application. Positive results may signal the possibility that a
program’s success in one division or region can be replicated in another division if a similar need
exists.
Enhance the respect and credibility of the WLP staff. Collecting and using evaluation
data – including application, impact, and ROI – builds respect for learning and respect for the
WLP staff. Appropriate evaluation data can enhance the credibility of the WLP function when
the data reveal the value added to the organization.
Satisfy client needs. Satisfying clients is a critical challenge. If the client is not pleased
with the data, he or she may decline the opportunity to use the WLP function in the future. If the
client is satisfied, he or she may repeat the use of WLP programs and even recommend them to
others.
Increase support from managers. Immediate managers of participants need convincing data about the success of learning. They often do not support these processes because they see little value in taking employees away from the job to be involved in a program with no evident connection to their business unit. Data showing how learning helps them to achieve their objectives will influence their support.
Strengthen relationships with key executives and administrators. Senior executives
must perceive WLP as a business partner that can be invited to the table for important
decisions and meetings. A comprehensive measurement and evaluation process can show the
contribution of the WLP function and help strengthen this relationship.
Set priorities for learning and development. A comprehensive measurement system
can help determine which programs and projects represent the highest priority. Evaluation data
can show the payoff or potential payoff of important and expensive programs, or those
supporting strategic objectives.
Reinvent WLP. Measurement and evaluation reveal the extent of alignment between
learning and development and the business, driving increased alignment in the future. This
alignment requires a continuous focus on critical organizational needs and results that can and
should be obtained from programs and projects. Ultimately, this reinvents workplace learning
and performance.
Alter management’s perceptions of WLP. Middle-level managers often see learning
as a necessary evil. A comprehensive evaluation process may influence these managers to view
learning as a contributing process and an excellent investment. It can also help shift the
perception of learning from a dispensable activity to an indispensable value-adding process.
Achieve a monetary payoff. In some situations, an actual monetary value can be
calculated for investing in measurement and evaluation. This is particularly true with the
implementation of ROI where many organizations have calculated “the ROI on the ROI.” They
determine the payoff of investing in a comprehensive measurement and evaluation process – the
ROI methodology. The payoff is developed by detailing specific economies, efficiencies, and
direct cost savings generated by the evaluation process.
These key benefits, inherent with almost any type of impact evaluation process, make
additional measurement and evaluation an attractive challenge for the WLP function.
Measurement and Evaluation Myths
Several myths persist about what measurement and evaluation can do and how it can or should be implemented in organizations. Following is a list of myths with the appropriate clarification:
Measurement and evaluation is too expensive. Cost is usually the first issue to surface
when considering additional measurement and evaluation. Many practitioners see evaluation
adding cost to an already lean budget that is regularly scrutinized. In reality, when the cost of
evaluation is compared to the budget, a comprehensive measurement and evaluation system can
be implemented for less than 5 percent of the total direct learning and development or WLP
budget.
Evaluation takes too much time. Parallel with the concern about cost is the actual time
involved in evaluation--time to design evaluation instruments, collect data, process the data, and
communicate results to a variety of groups. Dozens of shortcuts are available to help reduce the
total time requirements for evaluation.
If senior management does not require additional measurement, there is no need to
pursue it. Sometimes senior executives fail to ask for results because they think that the data are
not available. They may assume that results can’t be produced. Paradigms are shifting, not only
within the WLP context, but within senior management groups as well. Senior managers are
beginning to request higher-level data that show application, impact, and even ROI.
Measurement and evaluation is a passing fad. While some practitioners regard the
move to more evaluation as a passing fad that will soon go away, accountability is a concern
now. Many organizations are being asked to show the value of programs. Studies show this
trend is going to continue and escalate.
Evaluation generates only one or two types of data. Although some evaluation
processes generate a single type of data (reaction-level data, for example), many evaluation
models and processes generate a variety of data, offering a balanced approach based on both
qualitative and quantitative data. Some models generate as many as six different types of
qualitative and quantitative data collected at different timeframes and from different sources.
Evaluation cannot be easily replicated. With so many different evaluation processes
available, this issue becomes an understandable concern. In theory, any process worthy of
implementation is one that can be replicated from one study to another. Fortunately, many
evaluation models offer a systematic process with certain guiding principles or operating
standards to increase the likelihood that two different evaluators will obtain the same results.
Evaluation is too subjective. Subjectivity of evaluation has become a concern in part
because of the studies conducted using estimates and perceptions that have been published and
presented at conferences. The fact is that many studies are precise and are not based on
estimates. Estimates usually represent the worst-case scenario or approach.
Impact evaluation is not possible for soft skills programs, only for technical and
hard skills. This assumption is often based on the concern that soft skills programs are
sometimes difficult to measure. Practitioners have a problem understanding how to measure the
success of leadership, team-building, and communication programs, for example. What they
often misunderstand is that soft skills learning and development programs can and should drive
hard data items such as output, quality, cost, and time.
Evaluation is more appropriate for certain types of organizations. Although
evaluation is easier in certain types of programs, generally, evaluation can be used in any setting.
Comprehensive measurement systems are successfully implemented in health care, nonprofit,
government, and educational areas in addition to traditional service and manufacturing
organizations. Another concern expressed by some is that only large organizations have a need
for measurement and evaluation. Although this may appear to be the case (because large
organizations have large budgets), in reality, evaluation can work in the smallest organizations; it
has to be scaled down to fit the situation.
It is not always possible to isolate the effects of learning on impact data. Several
methods are available to isolate the effects of learning on outcome data. The challenge is to
select an appropriate isolation technique for the resources available and the accuracy needed in
the particular situation.
Because the WLP staff has no control over participants after they complete a
program, a process for measuring the on-the-job improvement should not be used. This
myth is fading as organizations realize the importance of measuring results of workplace
learning solutions. Systems and processes can be implemented to influence application.
Expectations can be created so that participants anticipate a follow-up and provide appropriate
data.
A participant is rarely responsible for the failure of programs. Too often participants
are allowed to escape accountability for their learning experience. It is too easy for participants
to claim that the program was not supported by their manager, it did not fit the culture of the
work group, or that the systems or processes are in conflict with the skills and processes
presented in the program. Today, participants are being held more accountable for the success of
learning in the workplace.
Evaluation is only the evaluator’s responsibility. Some organizations assign an
individual or group primary responsibility for evaluation. When that is the case, other
stakeholders assume that they have no responsibility for evaluation. In today’s climate,
evaluation must be a shared responsibility. All stakeholders are involved in some aspect of
analyzing, designing, developing, delivering, implementing, coordinating, or organizing a
program.
Successful evaluation implementation requires a degree in statistics or evaluation. It
is not a requirement to have a degree or possess some special skill or knowledge. Eagerness to
learn, willingness to analyze data, and a desire to make improvements in the organization are
primary requirements. With these requirements met, most individuals can learn how to properly
implement evaluation.
Negative data are always bad news. Negative data provide a rich source of
information for improvement. An effective evaluation system can pinpoint what went wrong so
that changes can be made. Barriers to success as well as enablers of success can be identified. It
will generate conclusions that show what must be changed to make the process more effective.
Key Steps and Issues
The value chain concept shows how value is developed and also provides data from different
perspectives. Some stakeholders are interested in knowing about the inputs so that they can be
managed and made more efficient; others are interested in reaction; still others are interested in
learning. More recently, clients and sponsors are more interested in actual behavior change
(application) and the corresponding business impact, while a few stakeholders are concerned
about the actual return on investment.
Evaluation Planning
Evaluation must be planned – overall and individually – with each program. When
evaluation is conducted only at reaction levels, not much planning is involved, but as evaluation
moves up the value chain, increased attention and effort need to be placed on planning. During
the typical planning cycle, it is helpful to review the purpose of evaluation for the specific
solutions and determine where the evaluation will stop on the value chain. The feasibility of
evaluating at different levels is explored and two planning documents are developed when the
evaluation migrates to application, impact, and ROI: the data collection plan and the analysis
plan. These documents are sometimes used in combination, but are often developed separately.
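To make the planning documents more concrete, the sketch below shows how a few rows of a data collection plan might be captured in code. This is only an illustration: the field names, program details, and values are hypothetical, and actual plan formats vary by organization.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DataCollectionPlanItem:
    """One row of a data collection plan (hypothetical field names)."""
    level: str           # e.g., "Application", "Impact", "ROI"
    objective: str       # what the program is expected to achieve at this level
    measures: List[str]  # how success will be measured
    method: str          # questionnaire, interview, performance monitoring, etc.
    source: str          # participants, managers, business records
    timing: str          # when data will be collected

# Hypothetical plan for a sales training program
plan = [
    DataCollectionPlanItem(
        level="Application",
        objective="Use the consultative selling process with customers",
        measures=["Frequency of use", "Success with use"],
        method="Follow-up questionnaire",
        source="Participants",
        timing="3 months after program",
    ),
    DataCollectionPlanItem(
        level="Impact",
        objective="Increase monthly sales in the participants' region",
        measures=["Monthly sales per representative"],
        method="Business performance monitoring",
        source="Sales records",
        timing="3-6 months after program",
    ),
]

# Print a compact summary of the plan
for item in plan:
    print(f"{item.level}: {item.objective} -> {item.method} ({item.timing})")
```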
Objectives
One of the most important developments in measurement and evaluation is the creation
of higher levels of objectives. Program objectives correspond with the different levels on the
value chain. Ideally, objectives should be in place up to the highest level desired for evaluation. Essentially, the levels of objectives are:
• reaction and satisfaction objectives,
• learning objectives,
• application and implementation objectives,
• impact objectives, and
• ROI objectives.
Collecting Data
One important issue is the timing of data collection. In some cases, pre-program
measurements are taken to compare with post-program measures and, in some cases, multiple
measures are taken. In other situations, pre-program measurements are not available and specific
follow-ups are still taken after the program. The important issue is to determine the timing for
the follow-up evaluation.
Another important issue is the method used. Data are collected using the following
methods:
Surveys are taken to determine the extent to which participants are satisfied with the
program, have learned skills and knowledge, and have utilized various aspects of the
program.
Questionnaires are usually more detailed than surveys and can be used to uncover a
wide variety of data. Participants provide responses to several types of open-ended and
forced response questions.
Tests are conducted to measure changes in knowledge and skills. Tests come in a wide
variety of formal (criterion-referenced tests, performance tests and simulations, and skill
practices) and informal (facilitator assessment, self assessment, and team assessment)
methods.
On-the-job observation captures actual skill application and use. Observations are
particularly useful in customer service training and are more effective when the observer
is either invisible or transparent.
Interviews are conducted with participants to determine the extent to which learning has
been utilized on the job.
Focus groups are conducted to determine the degree to which a group of participants has
applied the training to job situations.
Action plans and program assignments are developed by participants during the
program and are implemented on the job after the program is completed. Follow-ups
provide evidence of program success.
Performance contracts are developed by the participant, the participant’s supervisor,
and the facilitator who all agree on job performance outcomes.
Business performance monitoring is useful when various performance records and
operational data are examined for improvement.
The important challenge in data collection is to select the method or methods appropriate for
the setting and the specific program, within the constraints of the organization.
Analysis
Evaluation requires analysis. Even if the evaluation stops at Level 1, analysis is required,
usually involving simple averages and standard deviations. As organizations progress up the
value chain, additional analyses are required. In some cases, not only are the averages and
standard deviations used, but simple hypothesis testing and correlations may be required;
however, these are very unusual situations. For the most part, analysis is simply tabulating,
organizing, and integrating data and then presenting results in meaningful ways for the audience
to understand and appreciate.
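As a minimal sketch of this kind of simple tabulation, the Python example below computes averages and standard deviations for a few Level 1 questions. The ratings are invented reaction scores on a 1-5 scale, used only to show the mechanics.

```python
from statistics import mean, stdev

# Hypothetical Level 1 (reaction) ratings on a 1-5 scale, one list per question
reaction_data = {
    "Relevance to my job": [4, 5, 4, 3, 5, 4, 4],
    "Intent to use the material": [5, 5, 4, 4, 5, 3, 4],
    "Overall satisfaction": [4, 4, 3, 4, 5, 4, 4],
}

# Tabulate simple averages and standard deviations for each question
for question, ratings in reaction_data.items():
    print(f"{question}: mean = {mean(ratings):.2f}, "
          f"std dev = {stdev(ratings):.2f}, n = {len(ratings)}")
```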
Several techniques are available to isolate the effects of learning on impact data. A control group arrangement is used to isolate learning’s impact. With this strategy, one group participates in a program, while another similar group does not. The difference in the performance of the two groups is attributed to the program. When properly set up and implemented, the control group arrangement is the most effective way to isolate the
effects of learning and development.
Trend lines and forecasting are used to project the values of specific output variables as
if the learning program had not been undertaken. The projection is compared to the
actual data after the program is conducted, and the difference represents the estimate of
the impact of learning. Under certain conditions, this strategy can accurately isolate the
impact of learning.
Participants or managers estimate the amount of improvement related to the learning
and development program. With this approach, participants or managers are provided
with the total amount of improvement, on a pre- and post-program basis, and are asked to
indicate the percent of the improvement that is actually related to the program.
Other experts, such as customers, provide estimates of the impact of learning on the
performance variable. Because the estimates are based on previous experience, the
experts must be familiar with the type of program and the specific situation.
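A minimal sketch of two of these strategies appears below: a control group comparison and a participant-estimate adjustment. All of the numbers are invented for illustration; in practice the figures come from business records and follow-up questionnaires.

```python
# --- Control group arrangement (hypothetical monthly output figures) ---
trained_group_output = 1150   # average output per person, group that attended the program
control_group_output = 1000   # average output per person, similar group that did not
improvement_from_program = trained_group_output - control_group_output
print(f"Improvement attributed to the program: {improvement_from_program} units per month")

# --- Participant estimate (hypothetical) ---
total_improvement = 200            # total post-program improvement, units per month
percent_related_to_program = 0.60  # participant's estimate of the share caused by the program
estimated_program_effect = total_improvement * percent_related_to_program
print(f"Estimated program effect: {estimated_program_effect:.0f} units per month")
```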
Converting data to monetary values is necessary for determining the monetary benefits of a learning program. The process is challenging, particularly with soft data, but it can be accomplished methodically using one or more techniques.
Calculating the Return on Investment
When the ROI is actually developed, it should be calculated systematically, using
standard formulas. Two formulas are available. The benefits/cost ratio is the program benefits
divided by cost. In formula form it is:
BCR = Program Benefits / Program Costs
The return on investment uses the net benefits divided by program costs. The net benefits
are the program benefits minus the program costs. In formula form, the ROI becomes:

ROI (%) = (Net Program Benefits / Program Costs) x 100
This is the same basic formula used in evaluating other investments where the ROI is
traditionally reported as earnings divided by investment. An example of the benefits/cost ratio
and ROI is illustrated below. A training program is delivered to 50 participants. Consider that, following the training program, the first-year program benefits (from Level 4 business impact data) are found to be $300,000 from the 50 participants, and the fully loaded cost to train these 50 participants is $200,000.

BCR = $300,000 / $200,000 = 1.50:1

ROI (%) = ($300,000 - $200,000) / $200,000 x 100 = 0.50 x 100 = 50%
The ROI calculation – net benefits ($300,000 minus $200,000) divided by total costs – yields an ROI of 50%. This is what is earned after the $200,000 spent on the program is recovered. The ROI calculation accounts for the program cost and shows the resulting net gain.
The BCR calculation above uses total benefits in the numerator. Therefore, the expressed BCR of 1.50:1 does not account for recovering the money expended. This is why, when the same values are used, the BCR will always be 1 greater than the ROI expressed as a ratio. The BCR of 1.50:1 in the example means that for every dollar we spend, we get back $1.50 (one dollar and fifty cents). One dollar has to pay for the investment, so the net gain is $0.50 (fifty cents), as expressed in the ROI.
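The same arithmetic can be expressed in a few lines of code. The sketch below simply reproduces the example above, with the assumed figures of $300,000 in first-year benefits and $200,000 in fully loaded costs; it is an illustration, not a prescribed tool.

```python
def bcr(program_benefits: float, program_costs: float) -> float:
    """Benefits/cost ratio: total program benefits divided by total program costs."""
    return program_benefits / program_costs

def roi_percent(program_benefits: float, program_costs: float) -> float:
    """ROI (%): net benefits (benefits minus costs) divided by costs, times 100."""
    return (program_benefits - program_costs) / program_costs * 100

benefits = 300_000   # first-year program benefits (Level 4 business impact data)
costs = 200_000      # fully loaded program costs

print(f"BCR = {bcr(benefits, costs):.2f}:1")          # 1.50:1
print(f"ROI = {roi_percent(benefits, costs):.0f}%")   # 50%
```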
Intangible Benefits
In addition to monetary benefits, programs produce intangible, non-monetary benefits, such as:
• improved teamwork,
• improved customer service,
• reduced complaints, and
• reduced conflicts.
During analysis, hard data such as output, quality, and time are usually converted to
monetary values. The conversion of soft data is attempted; however, if the process used for conversion is too subjective or inaccurate and the resulting values lose credibility in the process, then the data are listed as an intangible benefit with the appropriate explanation. For some
programs, intangible, non-monetary benefits are extremely valuable, often carrying as much
influence as the hard data items.
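As a rough illustration of the conversion step, the sketch below converts a time-savings measure to money. All figures are hypothetical; in practice the unit values must come from the organization’s own records or credible experts, and a soft measure that cannot be converted credibly is simply reported as an intangible benefit.

```python
# Hypothetical conversion of a hard data item (time savings) to money
hours_saved_per_week = 2.0    # reported by each participant (assumed)
loaded_hourly_rate = 38.00    # fully loaded compensation rate (assumed)
participants = 50
weeks_per_year = 48

annual_value = hours_saved_per_week * loaded_hourly_rate * participants * weeks_per_year
print(f"Annual monetary value of time savings: ${annual_value:,.0f}")

# Soft measures with no credible monetary value are reported as intangible benefits
intangible_benefits = ["improved teamwork", "improved customer service",
                       "reduced complaints", "reduced conflicts"]
print("Intangible benefits:", ", ".join(intangible_benefits))
```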
Reporting Data
This very critical step often lacks the proper attention and planning needed to make it successful. It involves developing appropriate information in the form of impact studies, executive summaries, one-page summaries, and other brief reports. The heart of the step includes the
different techniques used to communicate to a wide variety of target audiences. In most
situations, several audiences are interested in and need the information. Careful planning to
match the communication method with the audience is essential to ensure that the message is
understood and appropriate actions follow.
Operating Standards
To ensure consistency and replication of evaluation studies, operating standards should be
developed and applied in the measurement and evaluation process. It is extremely important for
the results of an evaluation to stand alone and not vary depending on the individual conducting
the study. The operating standards detail how each step and issue of the process will be
addressed. Examples of general standards are:
These specific standards not only serve as a way to consistently address each step, but
also provide a much needed conservative approach to the analysis. A conservative approach will
build credibility with the target audience.
Implementation Issues
A variety of organizational issues and events will influence the successful
implementation of measurement and evaluation. These issues must be addressed early to ensure
that evaluation is successful. Specific topics or actions may include:
Measurement and evaluation can fail or succeed based on these implementation issues.
Final Thoughts
There is almost universal agreement that more attention is needed for measurement and
evaluation. Its use is expanding. The payoff is huge. The process is neither very difficult nor impossible. The approaches, strategies, and techniques are not overly complex and can be useful
in a variety of settings. The combined and persistent efforts of practitioners and researchers will
continue to refine the techniques and create successful applications.