Anthropic Introduces an “AI Fluency Index” to Measure How Well People Use AI — Not Just How Much
A new education report finds that the strongest marker of “good AI use” is iteration and refinement, while polished outputs like code and apps can quietly reduce users’ critical checking.

What Happened (Facts)
Anthropic published a new education report on 23 Feb 2026 titled “The AI Fluency Index.” The report starts from a simple premise: AI adoption is accelerating, but adoption alone doesn’t tell us whether people are using AI well. The key question, Anthropic argues, is whether individuals are developing AI fluency—the skills needed for safe, effective collaboration with AI tools as they become embedded in daily work.
Anthropic says earlier education research looked at how students and educators use Claude (students for reports and lab analysis; educators for lesson materials and routine automation). This new report tries to measure something more general: how AI users build skill and judgment over time.
How Anthropic measured “AI fluency”
Anthropic’s methodology uses the 4D AI Fluency Framework, developed by Professors Rick Dakan and Joseph Feller in collaboration with Anthropic. The framework defines 24 behaviors intended to represent effective human–AI collaboration.
However, Anthropic notes that only 11 of these behaviors are directly observable in Claude chat logs on Claude.ai (and Claude Code). The remaining 13 behaviors include important ethical and responsible-use dimensions—such as disclosure about AI’s role in work and considering downstream consequences—which typically happen outside the chat window and are harder to measure at scale.
For this initial index, Anthropic analyzed 9,830 anonymized, multi-turn conversations on Claude.ai over a 7-day window in January 2026, using a “privacy-preserving analysis tool.” Each conversation was scored for whether each of the 11 observable behaviors was present or absent, and a single conversation could include multiple behaviors. Anthropic says it checked stability across days and languages and found the results were consistent.
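The scoring scheme described above can be sketched in a few lines. This is an illustrative sketch, not Anthropic's actual pipeline: each conversation gets a set of present/absent behavior labels, a single conversation can carry several, and the iterative vs non-iterative comparison reduces to a mean count of the other behaviors. The behavior names and the toy data are assumptions for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical subset of the 11 observable behaviors; names are assumptions.
BEHAVIORS = [
    "iteration_and_refinement",
    "questioning_reasoning",
    "identifying_missing_context",
    "fact_checking",
]

@dataclass
class ConversationScore:
    # Binary presence/absence: a behavior is in the set or it is not.
    behaviors: set[str] = field(default_factory=set)

    def count_other_behaviors(self) -> int:
        # Fluency behaviors beyond iteration itself, as in the
        # 2.67 vs 1.33 comparison in the report.
        return len(self.behaviors - {"iteration_and_refinement"})

def mean_other_behaviors(scores: list[ConversationScore], iterative: bool) -> float:
    # Average count of other behaviors in iterative (or non-iterative) chats.
    group = [s for s in scores
             if ("iteration_and_refinement" in s.behaviors) == iterative]
    return sum(s.count_other_behaviors() for s in group) / len(group)

# Toy data only — the real analysis runs over anonymized chat logs.
scores = [
    ConversationScore({"iteration_and_refinement", "questioning_reasoning",
                       "fact_checking"}),
    ConversationScore({"iteration_and_refinement",
                       "identifying_missing_context"}),
    ConversationScore({"fact_checking"}),
]
```

On this toy data, iterative conversations average more additional behaviors than non-iterative ones, mirroring the shape (though not the numbers) of the reported finding.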
The outcome is framed as a baseline: the AI Fluency Index, which Anthropic intends to use as a reference point to track how AI collaboration changes as models and users evolve.
Two key findings in the report
Anthropic says it found two main patterns:
1) Iteration and refinement correlates strongly with other fluency behaviors
85.7% of conversations showed iteration and refinement (users building on previous exchanges rather than taking the first response and moving on).
Conversations with iteration and refinement showed substantially higher rates of other fluency behaviors.
On average, iterative conversations exhibited 2.67 additional fluency behaviors, compared with 1.33 in non-iterative conversations—about double.
Iterative conversations were 5.6× more likely to include questioning the model’s reasoning, and 4× more likely to include identifying missing context.
Anthropic interprets this as evidence that treating AI as a thought partner—rather than a one-shot answer machine—is associated with better habits.
2) When Claude produces “artifacts,” users become more directive but less evaluative
Anthropic defines “artifacts” broadly: outputs such as apps, code, documents, interactive tools, and similar deliverables.
12.3% of conversations involved artifacts.
In these artifact conversations, users were more likely to:
clarify their goal (+14.7 percentage points)
specify a format (+14.5pp)
provide examples (+13.4pp)
iterate (+9.7pp)
In short, users direct more upfront when they want a concrete output.
But Anthropic found a drop in discernment behaviors:
users were less likely to identify missing context (-5.2pp)
less likely to check facts (-3.7pp)
less likely to ask for the model’s reasoning (-3.1pp)
Anthropic notes this is particularly important because its Economic Index found that models struggle more on complex tasks—exactly the kinds of tasks that often produce artifacts.
Anthropic’s practical guidance to users
Based on these patterns, Anthropic suggests three habits to improve AI fluency:
Stay in the conversation (treat first output as a draft; iterate)
Question polished outputs (when it looks finished, pause and check accuracy, missing context, and reasoning)
Set collaboration terms up front (only about 30% of conversations include users explicitly instructing Claude how to interact; Anthropic suggests prompts like “push back if my assumptions are wrong” or “tell me what you’re uncertain about”)
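The third habit maps naturally onto a system prompt. Below is a minimal sketch of "setting collaboration terms up front" as a request payload; the model name and the `build_request` helper are assumptions for illustration, and the snippet only constructs the payload rather than calling any API.

```python
# Interaction terms Anthropic suggests, stated once, up front.
COLLABORATION_TERMS = (
    "Push back if my assumptions are wrong. "
    "Tell me what you're uncertain about."
)

def build_request(user_prompt: str, model: str = "claude-sonnet-4-5") -> dict:
    # Hypothetical helper: the collaboration terms ride along as the
    # system prompt so they govern the whole conversation, not one turn.
    return {
        "model": model,  # placeholder model name — an assumption
        "max_tokens": 1024,
        "system": COLLABORATION_TERMS,
        "messages": [{"role": "user", "content": user_prompt}],
    }

request = build_request("Review this rollout plan for gaps.")
```

The design point is simply that stating the terms once in the system prompt beats repeating them per message.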
Limitations acknowledged
Anthropic highlights several constraints:
The sample is from Claude.ai users in one week and likely skews toward early adopters.
Only 11 of 24 framework behaviors were measured (the rest are outside-chat ethical behaviors).
Behaviors are classified as binary (present/absent), which misses nuance.
Some evaluation may happen silently (or off-platform) rather than in-chat.
Findings are correlational, not causal.
Anthropic says future work will include cohort analyses (new users vs experienced), qualitative research into unobservable behaviors, and exploration of causal interventions. It also plans a deeper study of Claude Code, which has a different, developer-heavy user base.
What It Means (Analysis)
Anthropic’s report is interesting because it treats “AI skill” as a measurable behavioral pattern rather than a vibe. And its central finding—iteration predicts fluency—has implications beyond Claude: it describes a general truth about interacting with powerful tools.
1) “Fluency” is becoming the real digital divide
Early in the internet era, the divide was access: who had a connection. Later it became literacy: who could search, verify, and evaluate sources. Anthropic’s framing suggests the next divide is AI fluency: who can steer models effectively and maintain skepticism.
Two people can use the same AI tool; one becomes faster and smarter, the other becomes more passive and error-prone. The difference isn’t the model—it’s the collaboration habits.
2) The “polished output trap” is the most important warning
The report’s most consequential finding may be the artifact pattern: when the output looks like a finished product (code, app, document), users become less likely to question it. That’s a psychological effect, not a technical one. “It looks done” becomes a proxy for “it’s correct.”
This matters because modern models increasingly produce outputs that are aesthetically complete even when logically fragile. If the AI makes the output feel professional, people may assume it is safe. That’s precisely when errors become dangerous—especially in domains like coding, policy, health information, or compliance docs where mistakes don’t look obvious.
3) Iteration isn’t just better prompting—it’s better governance
Anthropic’s “stay in the conversation” advice is more than a productivity tip. Iteration creates checkpoints: it forces the user to re-engage, clarify assumptions, and interrogate uncertainty. In a sense, it functions like a lightweight audit process.
If organizations want safer AI adoption, this suggests training shouldn’t focus only on “prompt templates.” It should teach a habit loop:
draft → challenge → refine → verify → finalize
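The habit loop above can be made concrete as explicit checkpoints around a draft. All function names and the stand-in checks below are hypothetical; each stage is a placeholder for a human re-engaging with the output, not a real automated audit.

```python
def challenge(text: str) -> list[str]:
    # Stand-in: flag overconfident claims; a human would also question
    # reasoning and look for missing context here.
    return ["unsupported claim"] if "certainly" in text else []

def refine(text: str, issues: list[str]) -> str:
    # Stand-in: address the flagged issue instead of accepting draft one.
    return text.replace("certainly", "likely")

def verify(text: str) -> bool:
    # Stand-in for fact-checking before finalizing.
    return "certainly" not in text

def finalize(text: str) -> str:
    return text.strip()

def habit_loop(draft: str, max_rounds: int = 3) -> str:
    """draft → challenge → refine → verify → finalize, with checkpoints."""
    output = draft
    for _ in range(max_rounds):
        issues = challenge(output)
        if not issues:
            break
        output = refine(output, issues)
    assert verify(output), "do not finalize unverified output"
    return finalize(output)
```

The point of the structure is that "finalize" is unreachable without passing the challenge and verify checkpoints, which is the audit-like property the report's advice gestures at.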
4) The missing half of fluency is ethics—and it’s the hardest to measure
Anthropic acknowledges that 13 of 24 behaviors aren’t visible in the chat interface—and those include the most socially consequential ones (disclosure, downstream consequences, responsible sharing). That’s a major limitation, but also an honest one.
It hints at a future problem: companies may optimize for the measurable behaviors (iteration, formatting, instruction clarity) while overlooking the harder-to-measure ones (accountability, disclosure, harm consideration). In other words, we risk building a world full of “effective” AI users who are not necessarily “responsible” AI users.
5) This is also a product signal
Finally, it’s worth reading the report as product strategy. If Anthropic can show that good collaboration habits correlate with better outcomes, it can justify building tools that encourage iteration, prompt users to verify, or nudge them to set interaction terms. The report provides a research-backed rationale for UI and policy changes that push users toward safer patterns—without having to frame it as moral policing.



