Dale Brethower

Preview

There are three common pitfalls that make Level IV evaluation difficult or impossible. If the pitfalls are avoided by use of three evaluation strategies and three evaluation tactics, Level IV evaluation is easy to do and very low in cost. The three pitfalls are: (1) attempting after-the-fact evaluation; (2) attempting to answer an unanswerable evaluation question; (3) doing training that is disconnected from strategic or current business issues or from current workplace performance.

The strategies that support Level IV evaluation are: (1) establish truth-in-labeling standards relevant to training objectives and evaluate accordingly; (2) include evaluation throughout the training design, delivery, and support process; (3) focus on relevant, important, and answerable evaluation questions. Three evaluation tactics that support Level IV evaluation are: (1) stakeholder evaluation panels, (2) success case evaluation, and (3) action project evaluation.

Why Do Level IV Evaluation?

The major reason for doing Level IV evaluation is to help HRD professionals do their jobs competently. Though it is well established that most organizations neither understand nor support Level IV evaluation, some HRD professionals elect to do it anyway. This article is written for such pioneers.

What Is Level IV Evaluation?

We can define levels of evaluation, following Kirkpatrick's lead, by the nature of the evaluation questions we ask: Level I—Do trainees like the training? Level II—Do trainees learn the material? Level III—Do trainees use what they learned? Level IV—Does using it do any good? The questions reflect commonly held values: HRD professionals like to run training programs that people like, learn from, find useful, and use beneficially. Defining the levels in terms of evaluation questions focuses attention on exactly what the evaluation is about, a benefit that is especially important in Level IV evaluation.

Does using what was learned do any good? That question fairly begs for criteria and dimensions of "goodness." Does performance improve (as measured by timeliness, quality, or cost)? Are the performance goals achieved? Do people who use what was learned perform better than those who do not? Do people like to use what they learned? Is using what was learned supported in the workplace? Does using what was learned have unintended positive or negative effects? Do people perceive using what was learned as consistent with the company culture or the image the company wishes to project? These questions all concern matters that stakeholders might be interested in, and many of them are Level IV questions.

How Can We Do Level IV Evaluation?

We can do Level IV evaluation readily under the right conditions, but it is nigh impossible under other conditions. The ease or difficulty of Level IV evaluation depends upon the circumstances, not the intentions or desires of the HRD practitioner. Those of us who want to do Level IV evaluation should, I believe, pick our spots and do it when it is easy rather than attempting it when it is very difficult or impossible; we should also work to arrange conditions so that it is possible more frequently. To do Level IV evaluation successfully, we must avoid three common pitfalls and can use three strategies and three tactics appropriate to the situations in which we work.
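Because the levels are defined by the questions they answer, an evaluation plan can be written down as little more than a list of agreed questions. The sketch below (in Python, purely illustrative; the structure and the Level IV sub-questions are assumptions, not a prescribed format from this article or from Kirkpatrick) shows how compact such a plan can be.

```python
# Purely illustrative: one way to keep each level of evaluation tied to an
# explicit question, so everyone agrees in advance what will be answered.
# The wording and structure here are assumptions, not a prescribed format.

EVALUATION_QUESTIONS = {
    "Level I": "Do trainees like the training?",
    "Level II": "Do trainees learn the material?",
    "Level III": "Do trainees use what they learned?",
    "Level IV": "Does using it do any good?",
}

# Hypothetical Level IV sub-questions a stakeholder group might agree on in
# advance; any real list would come from the business priority at hand.
LEVEL_IV_SUBQUESTIONS = [
    "Did performance improve (timeliness, quality, cost)?",
    "Were the performance goals achieved?",
    "Did each player (training, engineering, supervision) deliver as promised?",
]

def evaluation_plan() -> str:
    """Render a simple checklist of the questions the evaluation will answer."""
    lines = [f"{level}: {question}" for level, question in EVALUATION_QUESTIONS.items()]
    lines += [f"  Level IV detail: {q}" for q in LEVEL_IV_SUBQUESTIONS]
    return "\n".join(lines)

print(evaluation_plan())
```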
Three Common Pitfalls

The pitfalls are common and, under some circumstances, impossible to avoid.

Pitfall One: Post Hoc Evaluation

The first pitfall is attempting Level IV evaluation after the fact, after the training program has been designed and implemented. Evaluation as an afterthought is both silly and common. We can illustrate just how silly it is with simple thought experiments: What do you suppose would happen to an airplane manufacturer that built and sold 500 airplanes and then said, "Oh, by the way, let's find out whether any of them fly!" What do you suppose would happen to a pharmaceutical company that released new drugs and then said, "Oh, by the way, let's find out if any of them work or have harmful side effects!" What do you suppose would happen to a surgical department that introduced new surgical procedures and then said, "Oh, by the way, let's find out what happens to the patients later on!" What do you suppose would happen to an HRD department that launched its training programs and then said, "Oh, by the way, let's find out if any of them do any good?" (My thought is that the airplane manufacturer and the drug manufacturer would be out of business and their executives jailed in short order. The surgical department and the HRD department, on the other hand, would be fine, engaging in standard practices.)

After-the-fact evaluation can be very difficult if conditions aren't set up properly. For example, testing conditions for the airplanes or the drugs are established carefully and in advance, with special attention to the specific data needed to answer specific evaluation questions: What is the optimal dosage? The lethal dosage? Will the airplane fly at slow speeds in good weather? Will it pull out of a shallow dive? Of course, these evaluation questions should all have been answered during development, where highly controlled conditions can be established; but even the field testing and clinical trials must be carefully preplanned.

Pitfall Two: Unanswerable Questions

People commonly ask this Level IV evaluation question: Did the training cause the performance improvement? (Did the training cause the reduction in defects? Did the training cause the increase in safe lifting behaviors? Did the sales training cause the increase in sales?) The question seems reasonable until we think it through. It would be nice, according to this thinking, to say with conviction, "Our training program caused a 15% increase in sales!" or "Our training program caused a 5% reduction in lost-time accidents!" or "Our training program caused a 53% reduction in wasted time during meetings!" But the causal question, on reflection, turns out to be both impossible to answer and irrelevant to practical issues. It is impossible for all the reasons people commonly say it is: other variables are at work. Just do a thought experiment and see how many variables you can list that might have influenced sales, or accidents, or utilization of meeting time. Did one thing, the training program, cause the performance improvement? Never!

Does inability to answer the causal question mean that Level IV evaluation is pointless? Not at all. Here are some meaningful Level IV evaluation questions: Did performance improve? (If it didn't, we know the training didn't cause it to improve; if it did, we know that several things came together to support the result.) Did the training program do what it promised?
(Did the people exit the program having acquired proficiency in safe lifting, saying that safe lifting was important to them, and predicting that they would lift safely, at least for a few hours, on the job?) Did engineering do what it promised? (Were new safety switches attached to the machines? Were safety standards developed and monitored?) Did supervisors do what they had promised? (Did they monitor safe lifting, comment positively when it occurred and correctively when it did not? Did they hold meetings with the workers to discourage the macho garbage?)

Any major organizational result can be achieved only through the efforts of numerous people in numerous departments or functions. Does it make sense to evaluate to see whether or not the result occurred? Of course. Does it make sense to evaluate to see whether each of the players performed as promised? Of course. But would we conclude, for example, that the Chicago Bulls won their most recent championship because of Michael Jordan? No. If he could have caused the win all by himself, why have all those other VEPs (Very Expensive Players) take the court? Could they have won without Michael Jordan? Maybe or maybe not—it's a great conversation question but a very poor evaluation question. (Would you, as the general manager, want to run the experiment to find out? The experiment could be performed, but how often should we evaluate a decision alternative that isn't even open for consideration?) It is important to know whether a result occurs, and it is important to know whether each player delivers as promised, but it makes no sense to ask, "Did training cause the result?" We already know that no one variable, by itself, causes an organizationally significant result. Instead of pursuing the unanswerable, we should focus on useful and answerable Level IV evaluation questions.

Pitfall Three: Disconnected Training

Some training is done in response to needs assessments that consist of asking people what sort of training they'd like. The proper evaluation question for such training is a Level I question: Did they like the training they got? If user satisfaction is the only goal, user satisfaction is what we should measure. I suppose one could ask, "Did people want training that yielded a Level IV impact?" but that seems an odd approach. For example, there is no reason to imagine that people's responses to a training wants survey this year will accurately predict next year's critical business issues or the upcoming strategic issues. When training is not connected to business and strategic issues—and much of it is not—it makes little sense to do an evaluation to determine whether or not it is having a beneficial Level IV effect.

It makes sense to do Level IV evaluation of Level IV objectives. For example, if reducing lost-time accidents is a current business priority, then it makes sense to know whether accidents were reduced (but it doesn't make sense to ask whether the training program caused the result all by itself). If increasing sales revenues is a current business priority, then it makes sense to know whether or not sales increase (and it makes sense to know whether training delivered as promised). But it doesn't make sense to do Level IV evaluation for certain other purposes.
For example, if the business purpose is to do mandated safety training, it makes sense to know whether the training offered meets legal requirements; evaluating to determine whether training that meets legal requirements has an effect on accidents isn't relevant to the business purpose. My preference is that the business purpose should be to fulfill the intent of the mandate as well as the letter, but that is another issue entirely. The point here is that, like it or not, not all training has a Level IV purpose. Some training is disconnected from business performance issues; it should either be connected or the disconnect acknowledged, and Level IV evaluation should not be attempted.

Three Evaluation Strategies

Three evaluation strategies can help avoid the three pitfalls, and three evaluation tactics can be used to implement them. The strategies are to establish "truth in labeling" standards for training, to focus evaluation only on useful and answerable questions, and to include evaluation throughout the training process.

Truth in Labeling

The notion here is simply to be conservative and honest in stating training goals and to evaluate accordingly. For example, if fulfilling a legal requirement is the purpose, then evaluate only to document fulfillment of the requirement. Requirements are typically written in terms of the number of hours of training given and the topics covered, so the evaluation question is "Did the training, as implemented, meet requirements?" If, in addition, you choose to promise that people will learn certain things, add a Level II question: To what extent did trainees master the material taught? If, in addition, you choose to promise that people will use what they learn, add Level III evaluation questions: Did trainees attempt to use what they learned? Did the workplace environment support those attempts? If, in addition, you choose to promise that people will like the training, add Level I evaluation questions: Do you perceive the material as relevant or useful? Do you like what you learned? Did you enjoy the learning experience? If, in addition, you choose to promise that training will contribute to the solution of a performance problem or the attainment of a business goal, add Level IV evaluation questions: Was the problem solved or the goal attained? Did the trainees and others do their parts?

Getting before-the-fact agreement on the evaluation questions to be answered is very helpful to training designers and sets up the evaluation. For example, a designer who assumes that performance improvement is desired when compliance is the only goal, and a designer who assumes that compliance is the goal when performance improvement is desired, will both deliver training that is unsatisfactory. The point here is simply that we should promise what we will deliver and deliver what we promise. Evaluation should verify that this is being done and, if desired, answer other legitimate evaluation questions, such as: Were there detectable unintended effects?

Focus on Important and Answerable Evaluation Questions

The way to avoid unanswerable questions is to focus on answerable questions. Causal questions are unanswerable (I would argue they are unanswerable in principle, because everything has multiple causes), but many impact questions are answerable, for example: Did standard business measures improve? Did the people involved keep their promises? Avoiding unimportant evaluation questions is a bit harder.
To avoid unimportant questions, evaluators should ask questions about the proposed questions, for example: How much would it be worth to have an answer to that question? What decisions will be made based upon the answer? If you knew the answer to that question, what would you do differently? Is there any answer to that question that would influence your decision making?

Incorporate Evaluation Throughout the Training Process

The most common alternative to after-the-fact evaluation is to do no careful evaluation at all. Unfortunately, that tactic tends to yield unsystematic after-the-fact evaluation: "I heard that training program was excellent." Or, "I heard that training program was a bummer." Evaluation can't be escaped; the evaluation-by-rumor model is quite common. A better practice is to include evaluation throughout the training process. Three of the four examples below show how.

Three Evaluation Tactics

Three evaluation tactics are especially useful in doing Level IV evaluation: (1) working with a stakeholder panel, (2) doing success case evaluation, and (3) integrating evaluation into action projects.

A Stakeholder Evaluation Panel

A stakeholder evaluation panel can be established, sometimes easily, by saying, "I'll need the help of some key people to ensure that the project is successful. Who are the people who ...? I can meet with them individually, but some of the time it would be helpful to meet as a group." The job of the evaluation panel—or advisory committee, or whatever it's labeled—is to make decisions at key points in the design, implementation, and support of the training effort. For example, the panel might be asked to sign off on a statement of the purpose of the training, on the outcome objectives, and so forth. Simply going to members of the panel, individually or as a group, showing them something, and asking, "Will this do the job? How can we do it better?" gets their involvement quite naturally.

Success Case Evaluation

Success case evaluation asks this question: "When the training works best, how well does it work?" The name comes from the practice of selecting success cases, people who actually apply what was learned in the training successfully. We discovered the method quite by accident, noticing that some trainees volunteered success stories when we encountered them. It was a short step to collecting information from them systematically about the results achieved, a rare example of successfully doing Level IV evaluation after the fact. Success case methodology can be used proactively by predicting who will successfully use the learning and working closely with them afterward to get information on successes and obstacles. The method has a high yield of useful information and is low in cost. An evaluation question worth adding later, but harder to answer, is "How many people become success cases?" The examples described below illustrate the method in more detail. As you will see, success case methodology does a very limited job well.

Action Projects

Action projects can be used to get trainees to do Level IV evaluation for you. The tactic is to devote a significant portion of a training program to developing projects people will do on the job. Building results tracking (evaluation) into the projects enables trainees and anyone else to monitor their success in using what they learn.
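Where an action project already has a quantifiable measure, the built-in results tracking and the success-case tally can amount to a few lines of record keeping. The sketch below is a minimal illustration under assumed names, fields, and thresholds; it is not an instrument from any of the programs described in this article, just one way to keep two answerable questions in view: Did performance improve? How many people become success cases?

```python
# A rough sketch of "building results tracking into the projects" and of the
# success-case tally described above. All names, fields, and thresholds are
# hypothetical assumptions for illustration, not the article's instruments.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ActionProject:
    trainee: str
    metric: str                     # e.g., "defects per 1,000 units"
    baseline: float                 # measured before the project starts
    follow_ups: List[float] = field(default_factory=list)  # periodic measurements

    def improvement(self) -> float:
        """Relative improvement of the latest follow-up over the baseline."""
        if not self.follow_ups or self.baseline == 0:
            return 0.0
        return (self.baseline - self.follow_ups[-1]) / self.baseline

def success_cases(projects: List[ActionProject], threshold: float = 0.10) -> List[ActionProject]:
    """Projects showing at least `threshold` improvement -- the "success cases"."""
    return [p for p in projects if p.improvement() >= threshold]

# Usage: answer "Did performance improve?" and "How many people become success
# cases?" without claiming that training alone caused the result.
projects = [
    ActionProject("supervisor A", "defects per 1,000 units", baseline=40, follow_ups=[31, 28]),
    ActionProject("supervisor B", "defects per 1,000 units", baseline=35, follow_ups=[36]),
]
wins = success_cases(projects)
print(f"{len(wins)} of {len(projects)} projects show measurable improvement")
```

Classroom settings can use the same idea with softer measures, as the following example shows.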
For example, participants in a workshop on instructional delivery can devise checklists to get evaluation data from participants in the next course they teach: "Please help me with a self-improvement project I'm working on. This checklist shows several things I believe I should do as an instructor. Would you fill out the checklist for me at the end of the day?" This gets at Level III issues: do they use what they learned, that is, does their instructional behavior improve? The next step, also built into their projects, can get more directly at Level IV: student performance on an examination or during a key case study can be monitored to see whether it improves as instructor behavior improves. Some action projects, such as cost reduction and process improvement projects, are quite easy to use as vehicles for Level IV evaluation. The more such vehicles HRD professionals can incorporate into training, the more Level IV evaluation they can do. Indeed, we can argue that action projects should always contain mechanisms for determining whether or not the projects are successful.

Four Examples of Level IV Evaluation

Here are four examples of Level IV evaluation that illustrate the strategies and tactics just described. Their common features are discussed following the examples.

Level IV Evaluation in Public Workshops

A number of years ago a group of us taught public management workshops at a major university. We used a telephone survey to ask people how they were using what they'd learned and what results they had gotten. The people who were getting the best results were eager to talk with us, share successes, and ask us questions about problems they encountered. We learned about the impact of the workshop when it worked well, and we learned answers to other useful questions, such as "What obstacles do people encounter?" "How do they deal with the obstacles?" "How many of the workshop graduates report successes?" and "How many of the reports are vague protestations and how many are supported by hard data?" We also interacted with successful workshop graduates when they returned for advanced workshops, when they called us for advice, and when we did consulting projects with them. We were interested in the successes and obstacles they had experienced, not in proving that we had taught them how to succeed (or fail). The information we obtained was rigorous enough to guide continuous improvement of the workshops. The evaluation also yielded many real-life examples, which we incorporated into the workshops, thereby increasing (Level I) satisfaction measures as well as the quality of the action projects people designed and implemented.

Level IV Evaluation in a Quality Initiative

The financial division of a large, privately owned international corporation sought to improve the quality of financial reports and other services to its internal customers.
The HRD department worked with the division to train the financial professionals in techniques for identifying "customer" requirements and improving their work processes to better meet those requirements. A task force from the division worked with HRD staff to plan and implement the training intervention. The HRD staff person incorporated evaluation into the design by the simple device of raising relevant questions and offering to help collect information that would answer them, for example, "How will we know whether people have the time or other support they need to complete the action projects?" The divisional task force, possibly aided by such questions or by suggestions of the HRD advisor, instituted trainee satisfaction questionnaires, evaluation of proposed action projects, follow-up coaching and documentation, and end-of-project reports for the action projects. They decided the end-of-project reports should include data on both service quality improvement and cost avoidance. Thus, the task force was instrumental in guiding the Level IV evaluation, which showed that all projects were implemented, some with greater impact than others, and that the projects netted a positive return on the time and money invested.

Level IV Evaluation in Supervisory Training

A consulting firm contracted to do supervisory training in which principles of supervision were taught and implemented in action projects. Supervisors attending the course and their bosses were interviewed prior to the course and, with the help of the consultant, identified one or more action projects that they would work on during the training. The training introduced specific principles, and then the supervisors worked on applying them in the action projects. The course met for about an hour per week over several weeks; the instructor went out on coaching expeditions between sessions. The action projects had built-in measures of performance improvement. For some of the performance measures, though not all, dollar values of the projects were measured or could be estimated with reasonable precision. Level IV data came from the action projects.

Level IV Evaluation in a Service Function

Most colleges and universities offer reading improvement or study skills courses tailored to high-risk students. Some, perhaps 100 of the several thousand, use an approach called Learning-to-Learn that involves Level IV evaluation. Level IV evaluation was used in developing the approach and is used in implementing it. During the course, students apply what they are learning to their other courses. Instructional staff work with students to evaluate the products they generate, a task made easier by the fact that many of the products (e.g., quizzes, homework assignments, term papers) are submitted for grades in other courses. Level IV evaluation data include grade averages before, during, and after the Learning-to-Learn course, credit hours completed, and persistence to graduation. Credit hours completed (versus courses dropped) improve, an economic loss to the college or university and an economic gain to students; similarly, graduates of the course go on to graduate from the institution at higher rates than other students, an economic gain to the college or university. In a typical implementation of the course, the "hard" data are collected through occasional studies or on an exception basis. For example, an administrator in a college that teaches the course to all incoming students noticed a decline in graduation percentages a few years after the course had been implemented.
The administrator shared these data with the instructional staff (who had indeed drifted into doing their own thing in the course) and did a little implementation evaluation to ensure the quality of the course and an increase in graduation rates.

Discussion of the Four Examples

All four examples use action projects and some form of success case methodology. The action projects are an integral part of the instruction rather than an add-on. (With the exception of the public workshops, training occurs over several weeks to allow for implementation of the projects during as well as after the training.) All four examples focus on relevant and answerable questions and include evaluation throughout the training cycle. An evaluation panel was explicitly used in one of the examples and could have been, probably should have been, used in the others. The four examples were selected to show the practicality and utility of Level IV evaluation in a variety of settings: public workshops, a training program run by a consulting firm, an in-house training program, and a course taught within colleges and universities.

References

Brethower, D.M., and Smalley, K.A. (1992). "Converting to Performance-based Instruction." Performance & Instruction, 31(4), 27-32.

Brinkerhoff, R.O. (1983). "The Success-case Method: A High-yield, Low-cost Evaluation Technique." Training and Development Journal, 37(8), 58-59.

Brinkerhoff, R.O. (1988). Achieving Results from Training: How to Evaluate Human Resource Development to Strengthen Programs and Increase Impact. San Francisco: Jossey-Bass.

Brinkerhoff, R.O., Formella, L., and Smalley, K.A. (1994). "Total Quality Management Training for White-collar Workers." In J.J. Phillips (Ed.), Measuring Return on Investment (pp. 45-54). Alexandria, VA: American Society for Training and Development.

Heiman, M., and Slomianko, J. (1992). Success in College and Beyond. Allston, MA: Learning to Learn, Inc.

Copyright © 1997 by McGraw-Hill. All rights reserved.