Kirkpatrick Model: Complete Guide to Training Evaluation [2026]
Master Kirkpatrick's 4 levels of training evaluation: Reaction, Learning, Behavior, Results. Practical templates and measurement methods.
The Kirkpatrick Model has been the dominant framework for training evaluation since Donald Kirkpatrick developed it in 1959. Updated by Kirkpatrick's son James into the "New World" Kirkpatrick Model, it provides four levels of measurement that together reveal whether training actually works. Despite being widely known, most organizations only measure Level 1 (learner satisfaction) — missing the most important questions about training effectiveness.
Organizations that systematically measure all four levels make dramatically better L&D decisions, invest 2–3x more effectively, and demonstrate clear business ROI. This guide covers each level in depth, how to measure it practically, and how to build an evaluation program that actually drives decision-making.
The Four Levels
Level 1: Reaction
What it measures: Did participants like the training?
Timing: Immediately after training.
Methods:
- Post-training surveys
- Focus groups
- NPS-style ratings
- Qualitative feedback
Key metrics:
- Overall satisfaction
- Content relevance
- Facilitator effectiveness (if applicable)
- Willingness to recommend
- Engagement during session
Common questions:
- How relevant was this training to your role?
- How would you rate the instructor/content?
- How likely are you to recommend this training?
- What was most valuable?
- What should be improved?
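The "How likely are you to recommend" question is often scored NPS-style. The sketch below assumes the conventional 0-10 scale, with 9-10 counted as promoters and 0-6 as detractors. The function name and the sample ratings are illustrative, not taken from any particular survey tool.

```python
def net_promoter_score(ratings):
    """Return an NPS-style score (-100 to 100) from 0-10 ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)   # 9-10
    detractors = sum(1 for r in ratings if r <= 6)  # 0-6
    # Passives (7-8) count toward the total but toward neither group.
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical responses from one training cohort
sample_ratings = [10, 9, 9, 8, 7, 6, 10, 4, 9, 8]
nps = net_promoter_score(sample_ratings)
print(f"NPS: {nps:.0f}")
```

With these made-up ratings there are 5 promoters and 2 detractors out of 10 responses, so the score is 30.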
Level 2: Learning
What it measures: Did participants acquire the intended knowledge, skills, or attitudes?
Timing: Immediately before and after training (pre/post), and ideally again at 30 days.
Methods:
- Pre/post knowledge assessments
- Skill demonstrations
- Confidence self-assessments
- Behavior role-plays
Key metrics:
- Pre vs. post test score improvement
- Skill demonstration competency
- Attitude shift (for attitudinal training)
- Confidence level changes
Best practices:
- Tie assessments to learning objectives
- Measure at the appropriate Bloom's level
- Test knowledge with a variety of methods
- Follow up at 30 days to test retention
Level 3: Behavior
What it measures: Did participants apply what they learned on the job?
Timing: 30-90 days after training (some behaviors take longer to emerge).
Methods:
- Observation (supervisor, peer, self)
- 360° feedback
- Work product review
- Customer/stakeholder feedback
- Behavior frequency surveys
Key metrics:
- Specific behavior frequency
- Skill application quality
- Sustained behavior change (6+ months)
- Barriers to application
Example behaviors tracked:
- For leadership training: frequency of 1:1s, feedback conversations
- For customer service training: use of specific frameworks
- For compliance training: incident reporting rates, policy adherence
- For sales training: methodology adoption, sales activities
Critical: Behavior change requires manager reinforcement, coaching, and opportunity to practice. Training alone rarely produces sustained behavior change.
Level 4: Results
What it measures: Did training produce the intended business outcomes?
Timing: 6+ months after training (allow for behavior change to produce business results).
Methods:
- Business metrics comparison
- Control group analysis
- ROI calculation
- Strategic goal attainment
Key metrics:
- Revenue impact
- Cost reduction
- Quality improvements
- Customer satisfaction
- Employee engagement
- Retention rates
- Productivity improvements
Example results tracked:
- For leadership training: team engagement scores, team retention, team productivity
- For sales training: revenue per rep, deal size, win rate
- For compliance training: incident rate, audit findings, penalty avoidance
- For customer service training: CSAT, churn rate, first-contact resolution
The "New World" Kirkpatrick Model
James Kirkpatrick (Donald's son), together with Wendy Kirkpatrick, introduced the updated model in 2010:
Key Enhancements
Reverse the design process: Traditional evaluation planning moves through the levels in order, 1→2→3→4. New World Kirkpatrick starts with Level 4 (what business result do we need?) and works backward.
Introduce "Required Drivers": Training alone rarely produces behavior change. "Required drivers" are the reinforcement systems needed:
- Manager reinforcement
- Accountability
- Recognition
- Practice opportunities
- Coaching
Leading indicators: Measure "leading indicators" that predict Level 4 results before they fully materialize.
Focus on Level 3: The "New World" emphasizes Level 3 (behavior) as the critical link between training and results.
Building an Evaluation Program
Step 1: Define Level 4 First
Start with the business result you want:
- "Reduce customer churn by 15%"
- "Improve team engagement score from 65 to 75"
- "Cut safety incidents by 40%"
Step 2: Identify Required Behaviors
What behaviors must change to produce that result?
- For churn reduction: specific service interactions, complaint handling, proactive outreach
- For engagement: manager behaviors (feedback, 1:1s, recognition)
- For safety: hazard reporting, PPE use, safety observations
Step 3: Design Learning for Those Behaviors
Work backward to learning objectives that support the behaviors. See ADDIE model guide.
Step 4: Plan Measurement for Each Level
- Level 1 (Reaction): Post-session surveys
- Level 2 (Learning): Pre/post assessments
- Level 3 (Behavior): 30/60/90 day follow-ups with observation
- Level 4 (Results): Business metrics comparison at 6 months
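A per-level measurement plan is easier to follow once it is turned into dated checkpoints. The sketch below is one way to do that. It assumes "6 months" means 180 days after the training date. The checkpoint labels, the function name, and the example date are hypothetical.

```python
from datetime import date, timedelta

# (label, days after the training date). The 180-day offset is an
# assumption standing in for "6 months". The Level 2 pre-assessment
# happens before the session and is not scheduled here.
CHECKPOINTS = [
    ("Level 1 (Reaction): post-session survey", 0),
    ("Level 2 (Learning): post-assessment", 0),
    ("Level 3 (Behavior): 30-day follow-up with observation", 30),
    ("Level 3 (Behavior): 60-day follow-up with observation", 60),
    ("Level 3 (Behavior): 90-day follow-up with observation", 90),
    ("Level 4 (Results): business metrics comparison", 180),
]


def measurement_schedule(training_date):
    """Return a list of (due_date, label) pairs for one training session."""
    return [
        (training_date + timedelta(days=offset), label)
        for label, offset in CHECKPOINTS
    ]


schedule = measurement_schedule(date(2026, 1, 15))
for due, label in schedule:
    print(due.isoformat(), label)
```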
Step 5: Build Required Drivers
- Manager briefings and coaching support
- Recognition systems
- Practice opportunities
- Accountability systems
- Consequences for non-application
Step 6: Execute and Iterate
- Launch training with full evaluation program
- Capture data at each level
- Report and discuss findings
- Adjust program based on what's working
Measurement Methods in Detail
Level 1: Reaction Measurement
Standard post-training survey:
Rate your agreement (1-5):
- The content was relevant to my role
- The training was engaging
- I learned something useful
- I would recommend this training
- The instructor was effective (if applicable)
Open-ended:
- What was most valuable?
- What should be improved?
- How might you apply this?
Benchmarks:
- Typical satisfaction: 3.8-4.2 / 5
- Strong training: 4.3+ / 5
- "Delight" territory: 4.5+ / 5
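To compare survey results with the benchmark bands above, average each 1-5 item and then average across items. This is a minimal sketch with made-up responses. The item keys and the `classify_satisfaction` helper are assumptions for illustration.

```python
from statistics import mean


def classify_satisfaction(avg):
    """Map an average 1-5 rating onto the benchmark bands listed above."""
    if avg >= 4.5:
        return "delight"
    if avg >= 4.3:
        return "strong"
    if avg >= 3.8:
        # The source bands leave 4.2-4.3 undefined; this sketch
        # treats that gap as "typical".
        return "typical"
    return "below typical"


# Hypothetical responses: one list of 1-5 ratings per survey item
responses = {
    "relevant_to_role": [5, 4, 4, 5, 3],
    "engaging": [4, 4, 5, 4, 4],
    "learned_something_useful": [5, 5, 4, 5, 5],
}

item_means = {item: mean(scores) for item, scores in responses.items()}
overall = mean(list(item_means.values()))
print(f"Overall: {overall:.2f} ({classify_satisfaction(overall)})")
```

Reporting the per-item means alongside the overall figure shows which statement is pulling the average up or down.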
Level 2: Learning Measurement
Pre-test approach:
- Assess knowledge BEFORE training
- Assess same content AFTER training
- Calculate gain scores
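A gain score can be as simple as post minus pre. The sketch below also computes a normalized gain (gain divided by the room left to improve), a common convention that this guide does not prescribe. The learner names and scores are hypothetical, and scores are assumed to be percentages out of 100.

```python
def gain_scores(pre, post, max_score=100):
    """Return (raw_gain, normalized_gain) for one learner.

    normalized_gain is None when the pre-test score is already at the
    maximum, because there is no headroom to divide by.
    """
    raw = post - pre
    headroom = max_score - pre
    normalized = raw / headroom if headroom > 0 else None
    return raw, normalized


# Hypothetical (pre, post) percentage scores
scores = {
    "learner_a": (40, 70),
    "learner_b": (60, 90),
    "learner_c": (80, 85),
}

results = {name: gain_scores(pre, post) for name, (pre, post) in scores.items()}
for name, (raw, normalized) in results.items():
    print(name, raw, normalized)
```

Learners A and B both gain 30 points, but B closes more of the remaining gap (0.75 against 0.5). That difference is why the normalized figure can be worth reporting next to the raw one.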
Sample pre/post question: "When handling a customer complaint, the first step in the LEAPS framework is to:"
- A) Solve the problem immediately
- B) Listen actively
- C) Document the interaction
- D) Escalate to manager
Beyond knowledge:
- Skills: demonstration assessments
- Attitudes: pre/post surveys
- Confidence: self-rating
Level 3: Behavior Measurement
Observation approach:
- Supervisor observes specific behaviors
- Frequency counts over defined period
- Quality assessments of behaviors
- Compared pre/post training
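Frequency counts become comparable once they are expressed as a rate over the same observation window. A minimal sketch, assuming equal 4-week windows before and after training. The behavior, the counts, and the function name are made up for illustration.

```python
from statistics import mean


def weekly_rate(counts, weeks):
    """Average number of observed behaviors per person per week."""
    return mean(counts) / weeks


# Hypothetical: feedback conversations held by each of four managers,
# counted over a 4-week window before and after training
WEEKS = 4
pre_counts = [2, 1, 3, 2]
post_counts = [4, 3, 5, 4]

pre_rate = weekly_rate(pre_counts, WEEKS)
post_rate = weekly_rate(post_counts, WEEKS)
change_pct = 100 * (post_rate - pre_rate) / pre_rate
print(f"{pre_rate:.2f} -> {post_rate:.2f} per week ({change_pct:+.0f}%)")
```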
Self-report approach:
- Learners report frequency of behaviors
- Less reliable but easier to collect
- Best combined with other methods
360° approach:
- Multiple perspectives (manager, peer, direct report, self, customer)
- Rated against behavioral competencies
- Pre/post comparison
Digital tracking:
- System usage metrics (for behaviors visible in systems)
- Document/email analysis
- Process compliance data
Level 4: Results Measurement
Control group comparison (gold standard):
- Trained group vs. untrained control
- Compare outcomes over time
- Attribute differences to training
Pre/post with controls:
- Before training metrics
- After training metrics
- Control for other variables
Benchmark comparison:
- Industry benchmarks
- Internal benchmarks (other teams)
- Historical trends
Examples:
- Sales rep trained in Q1 vs. similar rep hired in Q2 without training
- Team receiving leadership development vs. control team
- Stores receiving customer service training vs. not
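One common way to combine a control group with pre/post metrics is a difference-in-differences estimate: the change in the trained group minus the change in the control group. This guide does not prescribe that method, and it relies on the assumption that both groups would have followed the same trend without training. The CSAT figures below are hypothetical.

```python
def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Estimate the training effect as trained change minus control change."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


# Hypothetical average CSAT scores for stores with and without
# customer service training, measured before and after the rollout
effect = difference_in_differences(
    treated_pre=72.0,
    treated_post=80.0,
    control_pre=71.0,
    control_post=74.0,
)
print(f"Estimated effect: {effect:+.1f} CSAT points")
```

Here the trained stores improved by 8 points and the control stores by 3, so about 5 points are attributed to training rather than to whatever else changed over the period.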
Common Evaluation Mistakes
Mistake 1: Only Measuring Level 1
"Smile sheets" tell you almost nothing about training effectiveness.
Fix: Measure at least Levels 1, 2, and 3 for all significant training. Level 4 for strategic initiatives.
Mistake 2: No Pre-Training Measurement
Without a baseline, you can't measure change.
Fix: Pre-training assessment of knowledge, skills, behaviors, and business metrics.
Mistake 3: Stopping Measurement After 30 Days
Behavior change emerges over 3-6 months. Business results often take 6-12 months.
Fix: Sustained measurement over 6+ months post-training.
Mistake 4: Not Controlling for Other Variables
Many things affect business outcomes. Attributing results to training alone is tricky.
Fix: Control groups when possible. Isolate variables. Use multiple methods.
Mistake 5: Evaluation as Afterthought
Evaluation designed after training is rolled out rarely captures what matters.
Fix: Design evaluation during training design (as part of the ADDIE Design phase).
Mistake 6: Ignoring Required Drivers
Training with no reinforcement fails at Level 3/4 regardless of quality.
Fix: Invest in reinforcement systems alongside training.
ROI Calculation
Level 4 sometimes includes ROI calculation:
ROI = (Benefits - Costs) / Costs × 100%
Benefits to capture:
- Revenue increase attributed to training
- Cost reductions (fewer errors, less turnover, etc.)
- Avoided losses (safety, compliance)
- Productivity gains
Costs to include:
- Training development and delivery
- Participant time (labor cost)
- Travel and logistics
- Technology and platforms
- Ongoing evaluation
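The ROI formula can be applied to the benefit and cost categories above as a quick worked example. All figures and dictionary keys below are hypothetical. In practice the hard part is deciding how much of each benefit can be attributed to training.

```python
def training_roi(benefits, costs):
    """Return ROI as a percentage: (benefits - costs) / costs * 100."""
    total_benefits = sum(benefits.values())
    total_costs = sum(costs.values())
    if total_costs <= 0:
        raise ValueError("total costs must be positive")
    return (total_benefits - total_costs) / total_costs * 100


# Hypothetical annual figures in dollars
benefits = {
    "revenue_increase": 120_000,
    "cost_reductions": 40_000,
    "avoided_losses": 15_000,
    "productivity_gains": 25_000,
}
costs = {
    "development_and_delivery": 30_000,
    "participant_time": 12_000,
    "travel_and_logistics": 3_000,
    "technology_and_platforms": 3_000,
    "ongoing_evaluation": 2_000,
}

roi_pct = training_roi(benefits, costs)
print(f"ROI: {roi_pct:.0f}%")
```

With $200,000 in benefits against $50,000 in costs, the ROI is 300%.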
Commonly cited returns for well-designed corporate training programs range from 3:1 to 7:1. Poor programs produce negative ROI.
See how to measure training ROI for detailed methodology.
Alternative Evaluation Frameworks
Phillips ROI Methodology
Adds Level 5 to Kirkpatrick: ROI calculation.
Brinkerhoff Success Case Method
Focus on top and bottom performers:
- Who succeeded most with training? Why?
- Who failed? Why?
- What can we learn from both?
Kaufman's Model
Expands Kirkpatrick to include societal impact (Level 5).
CIRO Model
Context, Input, Reaction, Outcome: an alternative framework that originated in Europe.
Most organizations use Kirkpatrick as foundation and adapt as needed.
FAQs
Do I need to measure all 4 levels for every training?
No. For minor or short training, Levels 1 and 2 are sufficient. For major investments, measure all 4. For high-stakes or expensive programs, measure all 4 with rigorous methodology.
What about Level 0 (attendance at learning events)?
Some frameworks include "Level 0" for baseline metrics like attendance, completion rates, and cost per learner. These are useful as operational metrics but don't reveal effectiveness.
How long after training should I measure Level 3?
30 days minimum. 90 days better. 6 months for sustained behavior.
Can AI help with evaluation?
Yes:
- Automated survey analysis (NLP on open-ended responses)
- Behavior pattern detection in system data
- Predictive analytics (which learners need additional support)
- Personalized follow-up
What if I can't do control groups?
Use pre/post comparison, benchmark against similar groups, or use time-series analysis. Control groups are ideal but not always feasible.
Getting Started with Konstantly
Konstantly's analytics support Kirkpatrick evaluation natively:
- Level 1: Built-in satisfaction surveys
- Level 2: Pre/post assessment comparison
- Level 3: Long-term engagement and behavior tracking via integrations
- Level 4: Business metric correlation through data export
Free Plan
- 10 users, 5 courses
- Basic evaluation tools
Business Plan — $24/month
- Advanced analytics
- Survey tools
- API for data export
Enterprise Plan
- Full analytics suite
- Custom reporting
Related Resources
- ADDIE Model Complete Guide
- Bloom's Taxonomy Guide
- Adult Learning Theory Guide
- Learning Analytics Complete Guide
- How to Measure Training ROI
- Training Needs Analysis Guide
Ready to evaluate training at all four levels? Start free today — or contact our team.