Anthropic published its AI Fluency Index on February 23, 2026. The study tracked 11 observable behaviors across 9,830 Claude conversations using the 4D AI Fluency Framework developed by Dakan and Feller. Two findings stood out. First, users who iterated and refined, treating initial outputs as starting points, exhibited more than double the rate of other fluency behaviors. Second, when AI produced polished artifacts like code or documents, users became more directive but less evaluative: 3.1 percentage points less likely to question reasoning, 5.2 points less likely to identify missing context (Swanson et al., 2026).
I have been thinking about these constructs for two years. When my colleagues and I extended the UTAUT framework to study AI adoption among 2,257 professionals (Wolfe et al., 2025), we found that 56% had adopted AI tools, but the variance in how deeply adoption translated into workflow change was enormous. The question we kept circling is the same one Anthropic is now asking: what separates people who use AI from people who use AI well? Anthropic calls the answer fluency. I want to examine what that construct contains, what it misses, and what it means for collaborative intelligence.
The hypothesis: AI fluency as operationalized in the Anthropic study captures real variation in how individuals interact with AI, but it measures a construct closer to individual proficiency than to the collaborative intelligence mechanisms (specifically co-evolution) that determine whether human-AI interaction produces durable capability growth.
Three Takeaways
First, the iteration finding is the strongest signal in the study, and it maps directly onto the distinction between single-loop and double-loop learning that I have argued is the key to understanding AI value. Argyris (1977) distinguished single-loop learning (correcting errors within existing frames) from double-loop learning (questioning the frames themselves). Users who accept the first output are single-loop: the AI produced something adequate, move on. Users who iterate approach double-loop territory: evaluating whether the frame is right, whether the question was well-posed, whether the output reflects what they actually need. Anthropic found iterative conversations were 5.6 times more likely to involve questioning AI reasoning. That is not marginal. It is a qualitative shift. In our UTAUT extension (Wolfe et al., 2025), we found a similar gap: organizational level predicted who adopted AI, but not who integrated it in ways that changed outcomes. Fluency may be the individual-level construct that explains that variance.
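To make that kind of finding concrete: here is a minimal sketch of how a rate ratio like Anthropic's 5.6x figure could be computed from labeled conversation data. The labels, field names, and toy numbers are my invention for illustration, not Anthropic's published methodology.

```python
# Hypothetical sketch: a rate ratio between iterative and non-iterative
# conversations. Labels are assumed to come from some upstream classifier.
from dataclasses import dataclass

@dataclass
class Conversation:
    iterative: bool             # did the user refine the initial output?
    questioned_reasoning: bool  # did the user challenge the AI's reasoning?

def rate(convos, flag):
    """Share of conversations matching `flag` where the behavior occurred."""
    subset = [c for c in convos if flag(c)]
    return sum(c.questioned_reasoning for c in subset) / len(subset)

def rate_ratio(convos):
    """How much more often iterative conversations question reasoning."""
    return rate(convos, lambda c: c.iterative) / rate(convos, lambda c: not c.iterative)

# Toy data: 3 of 4 iterative vs 1 of 4 non-iterative conversations
# question reasoning, giving a ratio of 3.0.
sample = (
    [Conversation(True, True)] * 3 + [Conversation(True, False)] * 1 +
    [Conversation(False, True)] * 1 + [Conversation(False, False)] * 3
)
print(rate_ratio(sample))  # → 3.0
```

The point of the sketch is what the statistic is: a ratio of conditional behavior rates, which depends entirely on how "iterative" is labeled upstream.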
Second, the artifact finding confirms what I have been calling the polished output problem, and it connects directly to Shaw et al.'s cognitive surrender research. When AI produces something that looks finished, people stop evaluating. Anthropic found that artifact conversations showed higher directive behaviors but lower discernment: users were less likely to question reasoning, check facts, or identify gaps. Shaw et al. (2026) documented this in controlled experiments: participants with AI access adopted outputs with minimal scrutiny, and when the AI was deliberately inaccurate, participant accuracy dropped 15 percentage points. They called it cognitive surrender. Anthropic's finding is the naturalistic version observed in real conversations. The implication is direct: the more capable AI becomes at producing polished outputs, the more critical it becomes to design workflows that maintain human evaluative judgment. Fluency without discernment is not fluency. It is sophisticated delegation.
Third, what the study cannot capture (by its own admission) may matter more than what it does, and this is where the construct needs to be extended toward collaborative intelligence. Anthropic assessed 11 of 24 behavioral indicators. The 13 unobservable behaviors include honesty about AI's role in work, evaluating outputs through external channels, and considering consequences of sharing AI-generated content. These are not peripheral. They determine whether fluency operates within an ethical and organizational context or exists as a purely individual skill. More importantly, the study measures fluency at a single point within individual conversations. It cannot tell us whether fluency develops over time, whether capability changes because of sustained interaction, or whether fluent users reshape the systems around them. That longitudinal, bidirectional dynamic is co-evolution, and it remains unmeasured.
The Longer View
Psychometrics provides the foundational question. Cronbach and Meehl (1955) established that construct validity requires demonstrating that an instrument captures the phenomenon it claims to capture, not just a correlate. Anthropic's fluency index has strong face validity: the behaviors it tracks are intuitively related to effective AI use. The construct validity question is whether these behaviors predict outcomes that matter (improved decision quality, sustained performance improvement) or correlate with effectiveness without causing it. This is the same critique I have made of adoption metrics: measuring behavior is not the same as measuring the construct the behavior represents.
Organizational learning theory identifies the level-of-analysis gap. Fluency as measured here is an individual construct. But organizational value from human-AI collaboration is produced at the workflow, team, and system level. Nonaka and Takeuchi (1995) showed that knowledge creation depends on dynamic interaction between tacit and explicit knowledge across organizational levels. An individually fluent user in an organization with no mechanisms for sharing what they learn produces individual value but not organizational capability. The construct connecting individual fluency to organizational value is co-evolution: individual learning with AI feeding back into systemic improvement.
My Two Cents
I want to be precise about what Anthropic got right. This is serious measurement work: privacy-preserving analysis, reliability tested across days and languages, limitations honestly acknowledged, methodology published transparently. The 4D framework is a genuine contribution. And the two core findings (that iteration predicts fluency and that polished outputs suppress evaluation) align with what organizational psychology predicts and what I have observed in enterprise AI programs.
What I want to push on is the construct's boundary. Fluency is necessary but not sufficient. An organization full of individually fluent AI users who do not share what they learn, do not reshape workflows, and do not feed evaluative insights back into the systems they use will plateau. The construct bridging individual fluency and organizational value is collaborative intelligence, and the mechanism producing compounding returns is co-evolution. Anthropic has built the baseline. The next measurement challenge is whether fluency leads to co-evolution, or whether it can exist without it.
If you are an AI researcher: extend fluency measurement longitudinally. Track the same users over months. Ask whether fluency behaviors increase, transfer across tasks, and change the organizational systems around them. If you are an AI leader: do not treat fluency as a training problem. Build the workflows and knowledge-sharing infrastructure that translate individual fluency into organizational capability. Anthropic gives you the what. The how is an organizational design problem.
Read to Learn More
Academic: Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.
Industry: Swanson, K., Bent, D., Huang, S., Ludwig, Z., Dakan, R., & Feller, J. (2026). Anthropic education report: The AI fluency index. Anthropic.
References
Argyris, C. (1977). Double loop learning in organizations. Harvard Business Review, 55(5), 115–125.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.
Nonaka, I., & Takeuchi, H. (1995). The knowledge-creating company. Oxford University Press.
Shaw, S. D., et al. (2026). Thinking (fast, slow, and artificial): How AI is reshaping human reasoning and the rise of cognitive surrender. OSF Preprints. https://doi.org/10.31234/osf.io/yk25n
Swanson, K., Bent, D., Huang, S., Ludwig, Z., Dakan, R., & Feller, J. (2026). Anthropic education report: The AI fluency index. Anthropic.
Wolfe, D., Price, M., Choe, A., Kidd, F., & Wagner, H. (2025). Revisiting UTAUT for the age of AI: Understanding employees' AI adoption and usage patterns through an extended UTAUT framework. arXiv preprint arXiv:2510.15142.