Adoption is the metric everyone tracks. But what does it actually tell you? That someone logged in. That a button was clicked. Industrial-organizational psychology research on training transfer has shown for years that exposure alone does not produce behavior change. You need reinforcement, feedback loops, and environmental support. The organizations that scale AI successfully are not measuring adoption. They are measuring learning velocity: how fast teams integrate new capabilities into how they actually work.
The hypothesis: current adoption metrics, namely usage rates, frequency, and satisfaction scores, are measuring the wrong construct. What organizations need to track is learning velocity: the rate at which teams incorporate AI capabilities into core task performance.
Three Takeaways
First, adoption metrics inherit a fundamental attribution error. When we measure whether someone used a tool, we attribute the outcome to the individual. But Baldwin and Ford (1988) demonstrated in their influential model of training transfer that the work environment accounts for a substantial portion of whether learned behaviors persist. If the team structure, incentive system, or managerial expectations do not reinforce AI-augmented work, individual adoption is fragile regardless of how many times someone logs in.
Second, learning velocity captures something adoption metrics miss: integration depth. Argyris (1977) distinguished between single-loop learning (correcting errors within existing frames) and double-loop learning (questioning the frames themselves). An employee who uses an AI tool to do the same task faster is engaged in single-loop learning. An employee who uses the tool and then redesigns how they approach the task entirely is engaged in double-loop learning. Both show up as "adoption." Only one represents transformative integration.
Third, measurement systems shape behavior. Campbell's Law, articulated by Donald Campbell (1979), states that the more a quantitative indicator is used for decision-making, the more subject it becomes to corruption pressures and the more likely it is to distort the process it is intended to monitor. When organizations tie performance reviews or team metrics to AI adoption numbers, they create incentives for performative use: opening the tool, generating outputs that go unused, checking the box without changing the work.
The Longer View
The Kirkpatrick Model (1959), originally developed for training evaluation, offers a useful parallel. Kirkpatrick proposed four levels: reaction (did they like it?), learning (did they acquire knowledge?), behavior (did they change what they do?), and results (did it affect organizational outcomes?). Most AI adoption metrics operate at Level 1, occasionally Level 2. Almost none systematically measure Level 3 or Level 4. The organizations that do are the ones reporting genuine return on AI investment.
From measurement theory in psychometrics, the concept of construct validity asks whether we are measuring what we think we are measuring (Cronbach & Meehl, 1955). "AI adoption" as typically operationalized (logins, sessions, feature usage) has face validity but questionable construct validity. The construct we actually care about, productive integration of AI into work, requires different indicators: changes in task approach, shifts in time allocation, new outputs that were previously impossible.
My Two Cents
I have watched organizations celebrate 90% adoption rates while their AI programs delivered almost no measurable business value. The numbers looked impressive in the quarterly review. But when we dug into what was actually happening, most usage was shallow, habitual, and disconnected from the work that mattered. The teams that quietly achieved deep integration, often with lower headline adoption numbers, were the ones driving real outcomes. We were measuring the wrong thing and making decisions accordingly.
Consider building a learning velocity dashboard alongside your adoption metrics. Track three things: how many teams have changed a workflow because of AI (behavior change), how many new outputs are being produced that were previously impossible (capability expansion), and how fast the gap between AI deployment and workflow integration is closing over time (velocity). These are harder to measure. They are also the only metrics that tell you whether your investment is working.
Read to Learn More
Academic: Baldwin, T. T., & Ford, J. K. (1988). Transfer of training: A review and directions for future research. Personnel Psychology, 41(1), 63-105.
Industry: Davenport, T. H., & Ronanki, R. (2018). Artificial intelligence for the real world. Harvard Business Review, 96(1), 108-116.
References
Argyris, C. (1977). Double loop learning in organizations. Harvard Business Review, 55(5), 115-125.
Baldwin, T. T., & Ford, J. K. (1988). Transfer of training: A review and directions for future research. Personnel Psychology, 41(1), 63-105.
Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67-90.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281-302.
Kirkpatrick, D. L. (1959). Techniques for evaluating training programs. Journal of the American Society of Training Directors, 13(11), 3-9.