Kirkpatrick Model: Complete Guide to Training Evaluation [2026]

The Kirkpatrick Model has been the dominant framework for training evaluation since Donald Kirkpatrick developed it in 1959. Updated by Kirkpatrick's son James into the "New World" Kirkpatrick Model, it provides four levels of measurement that together reveal whether training actually works. Although the model is widely known, most organizations measure only Level 1 (learner satisfaction) and miss the most important questions about training effectiveness.

Organizations that systematically measure all four levels make dramatically better L&D decisions, invest 2–3x more effectively, and demonstrate clear business ROI. This guide covers each level in depth, how to measure it practically, and how to build an evaluation program that actually drives decision-making.

The Four Levels

Level 1: Reaction

What it measures: Did participants like the training?

Timing: Immediately after training.

Methods:

Post-training surveys
Focus groups
NPS-style ratings
Qualitative feedback

Key metrics:

Overall satisfaction
Content relevance
Facilitator effectiveness (if applicable)
Willingness to recommend
Engagement during session

Common questions:

How relevant was this training to your role?
How would you rate the instructor/content?
How likely are you to recommend this training?
What was most valuable?
What should be improved?

Level 2: Learning

What it measures: Did participants acquire the intended knowledge, skills, or attitudes?

Timing: Immediately before and after training (pre/post), and ideally again at 30 days.

Methods:

Pre/post knowledge assessments
Skill demonstrations
Confidence self-assessments
Behavior role-plays

Key metrics:

Pre vs. post test score improvement
Skill demonstration competency
Attitude shift (for attitudinal training)
Confidence level changes

Best practices:

Tie assessments to learning objectives
Measure at the appropriate level of Bloom's taxonomy
Test knowledge with a variety of methods
Follow up at 30 days to test retention

Level 3: Behavior

What it measures: Did participants apply what they learned on the job?

Timing: 30-90 days after training (some behaviors take longer to emerge).

Methods:

Observation (supervisor, peer, self)
360° feedback
Work product review
Customer/stakeholder feedback
Behavior frequency surveys

Key metrics:

Specific behavior frequency
Skill application quality
Sustained behavior change (6+ months)
Barriers to application

Example behaviors tracked:

For leadership training: frequency of 1:1s, feedback conversations
For customer service training: use of specific frameworks
For compliance training: incident reporting rates, policy adherence
For sales training: methodology adoption, sales activities

Critical: Behavior change requires manager reinforcement, coaching, and opportunity to practice. Training alone rarely produces sustained behavior change.

Level 4: Results

What it measures: Did training produce the intended business outcomes?

Timing: 6+ months after training (allow for behavior change to produce business results).

Methods:

Business metrics comparison
Control group analysis
ROI calculation
Strategic goal attainment

Key metrics:

Revenue impact
Cost reduction
Quality improvements
Customer satisfaction
Employee engagement
Retention rates
Productivity improvements

Example results tracked:

For leadership training: team engagement scores, team retention, team productivity
For sales training: revenue per rep, deal size, win rate
For compliance training: incident rate, audit findings, penalty avoidance
For customer service training: CSAT, churn rate, first-contact resolution

The "New World" Kirkpatrick Model

James Kirkpatrick (Donald's son), together with Wendy Kirkpatrick, introduced the updated model in 2010:

Key Enhancements

Reverse the design process: Traditional evaluation planning moves through the levels in order, 1→2→3→4. New World Kirkpatrick starts with Level 4 (what business result do we need?) and works backward.

Introduce "Required Drivers": Training alone rarely produces behavior change. "Required drivers" are the reinforcement systems needed:

Manager reinforcement
Accountability
Recognition
Practice opportunities
Coaching

Leading indicators: Measure "leading indicators" that predict Level 4 results before they fully materialize.

Focus on Level 3: The "New World" emphasizes Level 3 (behavior) as the critical link between training and results.

Building an Evaluation Program

Step 1: Define Level 4 First

Start with the business result you want:

"Reduce customer churn by 15%"
"Improve team engagement score from 65 to 75"
"Cut safety incidents by 40%"

Step 2: Identify Required Behaviors

What behaviors must change to produce that result?

For churn reduction: specific service interactions, complaint handling, proactive outreach
For engagement: manager behaviors (feedback, 1:1s, recognition)
For safety: hazard reporting, PPE use, safety observations

Step 3: Design Learning for Those Behaviors

Work backward to learning objectives that support the behaviors. See ADDIE model guide.

Step 4: Plan Measurement for Each Level

Level 1 (Reaction): Post-session surveys
Level 2 (Learning): Pre/post assessments
Level 3 (Behavior): 30/60/90 day follow-ups with observation
Level 4 (Results): Business metrics comparison at 6 months
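
A measurement plan like the one above can be turned into a simple calendar. The sketch below is illustrative only: the checkpoint labels, the `build_schedule` helper, and the use of 180 days to approximate "6 months" are assumptions, not part of the Kirkpatrick Model itself.

```python
from datetime import date, timedelta

# Days after the training date for each checkpoint.
# 180 days is an assumed stand-in for "6 months".
CHECKPOINTS = [
    ("Level 1 (Reaction): post-session survey", 0),
    ("Level 2 (Learning): pre/post assessments", 0),
    ("Level 3 (Behavior): follow-up with observation", 30),
    ("Level 3 (Behavior): follow-up with observation", 60),
    ("Level 3 (Behavior): follow-up with observation", 90),
    ("Level 4 (Results): business metrics comparison", 180),
]


def build_schedule(training_date: date) -> list:
    """Return (due_date, activity) pairs sorted by due date."""
    schedule = [
        (training_date + timedelta(days=offset), label)
        for label, offset in CHECKPOINTS
    ]
    return sorted(schedule, key=lambda item: item[0])


if __name__ == "__main__":
    for due, activity in build_schedule(date(2026, 3, 2)):
        print(due.isoformat(), activity)
```

Putting the dates on a calendar before launch makes it harder for the later checkpoints to be forgotten once the training itself is over.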

Step 5: Build Required Drivers

Manager briefings and coaching support
Recognition systems
Practice opportunities
Accountability systems
Consequences for non-application

Step 6: Execute and Iterate

Launch training with full evaluation program
Capture data at each level
Report and discuss findings
Adjust program based on what's working

Measurement Methods in Detail

Level 1: Reaction Measurement

Standard post-training survey:

Rate your agreement (1-5):

The content was relevant to my role
The training was engaging
I learned something useful
I would recommend this training
The instructor was effective (if applicable)

Open-ended:

What was most valuable?
What should be improved?
How might you apply this?

Benchmarks:

Typical satisfaction: 3.8-4.2 / 5
Strong training: 4.3+ / 5
"Delight" territory: 4.5+ / 5

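To show how survey ratings map onto the benchmarks above, here is a minimal sketch. The function names and the sample responses are hypothetical, and averages that fall between 4.2 and 4.3 are treated as "typical", which is an assumption because the benchmark list leaves that gap undefined.

```python
from statistics import mean


def satisfaction_band(avg: float) -> str:
    """Map a mean 1-5 rating onto the benchmark bands listed above."""
    if avg >= 4.5:
        return "delight"
    if avg >= 4.3:
        return "strong"
    if avg >= 3.8:
        return "typical"
    return "below typical"


def summarize(responses: dict) -> dict:
    """Return {survey item: (mean rating, band)} for each item."""
    summary = {}
    for item, ratings in responses.items():
        avg = round(mean(ratings), 2)
        summary[item] = (avg, satisfaction_band(avg))
    return summary


if __name__ == "__main__":
    # Hypothetical responses from five participants.
    sample = {
        "The content was relevant to my role": [5, 4, 4, 5, 4],
        "The training was engaging": [4, 3, 4, 4, 4],
    }
    for item, (avg, band) in summarize(sample).items():
        print(f"{item}: {avg} ({band})")
```

Reporting each survey item separately, rather than one overall average, shows which part of the experience pulled the score up or down.
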
Level 2: Learning Measurement

Pre-test approach:

Assess knowledge BEFORE training
Assess same content AFTER training
Calculate gain scores
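
As a rough illustration of gain scores, the sketch below computes the raw gain (post minus pre) and also a normalized gain, which divides the raw gain by the room the learner had to improve. The normalized version is an addition not mentioned above, and the function names and the 100-point scale are assumptions.

```python
from typing import Optional


def raw_gain(pre: float, post: float) -> float:
    """Simple gain score: points gained between pre-test and post-test."""
    return post - pre


def normalized_gain(pre: float, post: float, max_score: float = 100.0) -> Optional[float]:
    """Share of the possible improvement that was actually achieved.

    Returns None when the pre-test score is already at the maximum,
    because there is no room left to improve.
    """
    headroom = max_score - pre
    if headroom <= 0:
        return None
    return (post - pre) / headroom


if __name__ == "__main__":
    # Two hypothetical learners with different starting points.
    for pre, post in [(40, 70), (80, 90)]:
        print(pre, post, raw_gain(pre, post), normalized_gain(pre, post))
```

In this example the first learner gains 30 points and the second gains 10, yet both closed half of their remaining gap. Looking at both numbers avoids penalizing learners who started with high scores.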

Sample pre/post question: "When handling a customer complaint, the first step in the LEAPS framework is to:"

A) Solve the problem immediately
B) Listen actively
C) Document the interaction
D) Escalate to manager

Beyond knowledge:

Skills: demonstration assessments
Attitudes: pre/post surveys
Confidence: self-rating

Level 3: Behavior Measurement

Observation approach:

Supervisor observes specific behaviors
Frequency counts over defined period
Quality assessments of behaviors
Compared pre/post training
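
The observation approach can be summarized with very little code. In this sketch, each person has a list of weekly behavior counts recorded before and after training. The data, names, and the `frequency_change` helper are all hypothetical.

```python
from statistics import mean


def frequency_change(pre_counts: dict, post_counts: dict) -> dict:
    """Per-person change in average weekly behavior count (post minus pre).

    Only people observed in both periods are included.
    """
    return {
        person: mean(post_counts[person]) - mean(pre_counts[person])
        for person in pre_counts
        if person in post_counts
    }


def share_improved(changes: dict) -> float:
    """Fraction of observed people whose average frequency went up."""
    if not changes:
        return 0.0
    return sum(1 for delta in changes.values() if delta > 0) / len(changes)


if __name__ == "__main__":
    # Hypothetical counts of feedback conversations per week.
    pre = {"manager_a": [1, 0, 1, 2], "manager_b": [2, 2, 1, 3]}
    post = {"manager_a": [3, 2, 3, 4], "manager_b": [2, 1, 2, 3]}
    changes = frequency_change(pre, post)
    print(changes, share_improved(changes))
```

A frequency count says nothing about quality, so it works best alongside the quality assessments listed above.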

Self-report approach:

Learners report frequency of behaviors
Less reliable but easier to collect
Best combined with other methods

360° approach:

Multiple perspectives (manager, peer, direct report, self, customer)
Rated against behavioral competencies
Pre/post comparison

Digital tracking:

System usage metrics (for behaviors visible in systems)
Document/email analysis
Process compliance data

Level 4: Results Measurement

Control group comparison (gold standard):

Trained group vs. untrained control
Compare outcomes over time
Attribute differences to training

Pre/post with controls:

Before training metrics
After training metrics
Control for other variables
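
One common way to combine a control group with pre/post data is a difference-in-differences estimate: subtract the control group's change from the trained group's change. The technique is named here as an illustration rather than something this guide prescribes, and it assumes both groups would have followed the same trend without training. The scores below are made up.

```python
from statistics import mean


def diff_in_diff(trained_pre, trained_post, control_pre, control_post) -> float:
    """Change in the trained group minus change in the control group."""
    trained_change = mean(trained_post) - mean(trained_pre)
    control_change = mean(control_post) - mean(control_pre)
    return trained_change - control_change


if __name__ == "__main__":
    # Hypothetical CSAT scores per team, before and after the training period.
    effect = diff_in_diff(
        trained_pre=[70, 72, 68],
        trained_post=[80, 82, 78],
        control_pre=[71, 69, 70],
        control_post=[74, 72, 73],
    )
    print(effect)
```

Here the trained teams improved by 10 points and the control teams by 3, so roughly 7 points of the improvement can be tentatively attributed to training rather than to whatever else changed over the same period.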

Benchmark comparison:

Industry benchmarks
Internal benchmarks (other teams)
Historical trends

Examples:

Sales rep trained in Q1 vs. similar rep hired in Q2 without training
Team receiving leadership development vs. control team
Stores receiving customer service training vs. not

Common Evaluation Mistakes

Mistake 1: Only Measuring Level 1

"Smile sheets" tell you almost nothing about training effectiveness.

Fix: Measure at least Levels 1, 2, and 3 for all significant training. Level 4 for strategic initiatives.

Mistake 2: No Pre-Training Measurement

Without a baseline, you can't measure change.

Fix: Pre-training assessment of knowledge, skills, behaviors, and business metrics.

Mistake 3: Stopping Measurement After 30 Days

Behavior change emerges over 3-6 months. Business results often take 6-12 months.

Fix: Sustained measurement over 6+ months post-training.

Mistake 4: Not Controlling for Other Variables

Many things affect business outcomes. Attributing results to training alone is tricky.

Fix: Control groups when possible. Isolate variables. Use multiple methods.

Mistake 5: Evaluation as Afterthought

Evaluation designed after training is rolled out rarely captures what matters.

Fix: Design evaluation during training design (as part of ADDIE Design phase).

Mistake 6: Ignoring Required Drivers

Training with no reinforcement fails at Level 3/4 regardless of quality.

Fix: Invest in reinforcement systems alongside training.

ROI Calculation

Level 4 sometimes includes ROI calculation:

ROI = (Benefits - Costs) / Costs × 100%

Benefits to capture:

Revenue increase attributed to training
Cost reductions (fewer errors, less turnover, etc.)
Avoided losses (safety, compliance)
Productivity gains

Costs to include:

Training development and delivery
Participant time (labor cost)
Travel and logistics
Technology and platforms
Ongoing evaluation

Typical corporate training ROI: 3:1 to 7:1 for well-designed programs. Poor programs produce negative ROI.
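
To make the formula concrete, here is a small worked sketch. All figures are invented for illustration, and the function names are hypothetical. It also prints the benefit-to-cost ratio, since ratios such as 3:1 are often quoted alongside ROI percentages. Under the formula above, a 3:1 benefit-to-cost ratio corresponds to 200% ROI.

```python
def training_roi_percent(benefits: float, costs: float) -> float:
    """ROI = (Benefits - Costs) / Costs x 100%."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs * 100


def benefit_cost_ratio(benefits: float, costs: float) -> float:
    """Benefits divided by costs, e.g. 3.0 means 3:1."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return benefits / costs


if __name__ == "__main__":
    # Hypothetical cost items, matching the categories listed above.
    costs = {
        "development_and_delivery": 40_000,
        "participant_time": 25_000,
        "travel_and_logistics": 5_000,
        "technology": 6_000,
        "evaluation": 4_000,
    }
    total_costs = sum(costs.values())   # 80,000
    total_benefits = 240_000            # hypothetical attributed benefits
    print(training_roi_percent(total_benefits, total_costs))
    print(benefit_cost_ratio(total_benefits, total_costs))
```

The hardest input is the benefits figure: only the portion of an improvement that can reasonably be attributed to training belongs in it.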

See how to measure training ROI for detailed methodology.

Alternative Evaluation Frameworks

Phillips ROI Methodology

Adds Level 5 to Kirkpatrick: ROI calculation.

Brinkerhoff Success Case Method

Focus on top and bottom performers:

Who succeeded most with training? Why?
Who failed? Why?
What can we learn from both?
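
The first step of a success-case study, picking who to interview, can be sketched as a simple ranking. The outcome metric, names, and `pick_success_cases` helper are hypothetical. The method itself relies on the interviews that follow, which this code does not cover.

```python
def pick_success_cases(outcomes: dict, n: int = 2):
    """Return (top, bottom) lists of n participants ranked by outcome.

    If there are fewer than 2*n participants, the two lists overlap.
    """
    ranked = sorted(outcomes.items(), key=lambda kv: kv[1], reverse=True)
    top = [name for name, _ in ranked[:n]]
    bottom = [name for name, _ in ranked[-n:]]
    return top, bottom


if __name__ == "__main__":
    # Hypothetical change in win rate (percentage points) after sales training.
    outcomes = {"rep_a": 12, "rep_b": 3, "rep_c": 8, "rep_d": -1, "rep_e": 15}
    top, bottom = pick_success_cases(outcomes)
    print("Interview as successes:", top)
    print("Interview as non-successes:", bottom)
```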

Kaufman's Model

Expands Kirkpatrick to include societal impact (Level 5).

CIRO Model

Context, Input, Reaction, Outcome: an alternative European framework.

Most organizations use Kirkpatrick as foundation and adapt as needed.

FAQs

Do I need to measure all 4 levels for every training?

No. Minor/short training: Levels 1 and 2 sufficient. Major investments: all 4. High-stakes/expensive: all 4 with rigorous methodology.

What about Level 0? (Learning events attendance)

Some frameworks include "Level 0" for baseline metrics like attendance, completion rates, and cost per learner. These are useful operational metrics, but they don't reveal effectiveness.

How long after training should I measure Level 3?

30 days minimum. 90 days better. 6 months for sustained behavior.

Can AI help with evaluation?

Yes:

Automated survey analysis (NLP on open-ended responses)
Behavior pattern detection in system data
Predictive analytics (which learners need additional support)
Personalized follow-up

What if I can't do control groups?

Use pre/post comparison, benchmark against similar groups, or use time-series analysis. Control groups are ideal but not always feasible.

Getting Started with Konstantly

Konstantly's analytics support Kirkpatrick evaluation natively:

Level 1: Built-in satisfaction surveys
Level 2: Pre/post assessment comparison
Level 3: Long-term engagement and behavior tracking via integrations
Level 4: Business metric correlation through data export

Free Plan

10 users, 5 courses
Basic evaluation tools

Business Plan — $24/month

Advanced analytics
Survey tools
API for data export

Enterprise Plan

Full analytics suite
Custom reporting

Ready to evaluate training at all four levels? Start free today — or contact our team.
