Anthropic Warns of “Industrial-Scale” Claude Distillation Attacks by Rival AI Labs
The company says DeepSeek, Moonshot, and MiniMax generated 16M+ Claude exchanges via 24,000 fraudulent accounts—raising concerns about safety “stripping,” export controls, and a widening gray market for frontier-model access.

What Happened (Facts)
In a post dated 23 Feb 2026, Anthropic says it uncovered three large-scale campaigns aimed at illicitly extracting (“distilling”) Claude’s capabilities to improve competitors’ models. Anthropic attributes the campaigns to DeepSeek, Moonshot, and MiniMax, alleging they collectively generated over 16 million exchanges with Claude using roughly 24,000 fraudulent accounts, violating Anthropic’s terms of service and regional access restrictions.
Anthropic explains that distillation is a common and legitimate training technique: a smaller “student” model is trained on the outputs of a stronger “teacher” model. Frontier labs often distill their own models to make smaller, cheaper versions for customers. The concern here is illicit distillation—using another company’s model outputs to accelerate capability gains “at a fraction of the time and cost” of building those capabilities independently.
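To make the underlying technique concrete, here is a toy sketch of legitimate distillation, with every model and number invented for illustration: a small "student" classifier is trained by gradient descent to match a "teacher's" soft output distributions, rather than any ground-truth labels.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical "teacher": a fixed linear model whose soft outputs
# become the training targets (the essence of distillation).
X = rng.normal(size=(200, 5))
W_teacher = rng.normal(size=(5, 3))
teacher_probs = softmax(X @ W_teacher)

# Student: a fresh model trained on the teacher's soft labels via
# gradient descent on cross-entropy against those soft targets.
W_student = np.zeros((5, 3))
lr = 0.5
for _ in range(300):
    student_probs = softmax(X @ W_student)
    grad = X.T @ (student_probs - teacher_probs) / len(X)
    W_student -= lr * grad

# After training, the student's output distribution tracks the teacher's.
mean_abs_gap = np.abs(softmax(X @ W_student) - teacher_probs).mean()
print(f"mean |student - teacher| probability gap: {mean_abs_gap:.4f}")
```

The point of the sketch is that the student never sees the teacher's weights, only its outputs—which is exactly why API access alone is enough to distill, and why labs treat large-scale output harvesting as a security problem.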
Anthropic says these campaigns are increasing in “intensity and sophistication,” and argues the threat is broader than any single company or region—requiring “rapid, coordinated action” across industry, cloud providers, policymakers, and the wider AI community.
Why Anthropic says it matters
Anthropic argues that illicitly distilled models are likely to lack critical safety safeguards, creating potential national security risks. The post claims US AI labs build systems intended to prevent misuse in areas like bioweapons development and malicious cyber activity, and that distillation could copy “capabilities” while shedding protections. Anthropic also warns that if such models are then used by military, intelligence, or surveillance systems—or open-sourced—the risks multiply.
Export controls angle
Anthropic connects distillation to export controls on advanced chips. It argues distillation undermines export controls by allowing foreign labs (including those “subject to the control of the Chinese Communist Party,” per the post) to close capability gaps without building everything from scratch. At the same time, Anthropic claims the ability to execute distillation at scale still requires access to advanced compute—so detecting these attacks should reinforce the rationale for restricting high-end chip access.
What Anthropic says it found
Anthropic says each campaign followed a broadly similar playbook:
Use fraudulent accounts and proxy services to access Claude at scale while evading detection
Generate large volumes of highly structured, repetitive prompts designed to extract specific capabilities rather than serve normal user needs
Focus on Claude’s “most differentiated capabilities”: agentic reasoning, tool use, and coding
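The playbook above hinges on traffic that is unusually templated. A crude heuristic for that signal—with the similarity measure, threshold, and sample prompts all invented for illustration—is to compare how self-similar an account's prompts are versus typical organic usage:

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two prompts (0 = disjoint, 1 = identical)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def mean_self_similarity(prompts: list[str]) -> float:
    """Average pairwise Jaccard similarity across one account's prompts."""
    pairs = list(combinations(prompts, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical traffic: templated extraction prompts vs. varied user prompts.
extraction = [
    "Explain your reasoning step by step for task 101: sort a list",
    "Explain your reasoning step by step for task 102: parse a date",
    "Explain your reasoning step by step for task 103: merge two dicts",
]
organic = [
    "Help me draft an email to my landlord about a broken heater",
    "What's a good beginner recipe for sourdough bread?",
    "Summarize the plot of Hamlet in two sentences",
]

SUSPICION_THRESHOLD = 0.4  # invented cutoff for the sketch
print(mean_self_similarity(extraction) > SUSPICION_THRESHOLD)
print(mean_self_similarity(organic) > SUSPICION_THRESHOLD)
```

Real detection would presumably use learned classifiers over richer features, but the intuition is the same: extraction pipelines fill in a template, and templates leave statistical fingerprints.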
Anthropic claims it attributed each campaign “with high confidence” using indicators such as IP address correlation, request metadata, infrastructure indicators, and—in some cases—corroboration from industry partners.
DeepSeek (150,000+ exchanges)
Anthropic says DeepSeek targeted:
reasoning across diverse tasks
rubric-based grading tasks (to make Claude function like a reward model for reinforcement learning)
censorship-safe alternatives to policy-sensitive queries
Anthropic also describes coordinated traffic patterns consistent with “load balancing” across accounts. It says it observed prompts attempting to make Claude articulate internal reasoning step-by-step to generate “reasoning training data” at scale.
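Load balancing across accounts leaves its own statistical signature: per-account request volumes that are far more uniform than organic usage, which tends to be heavy-tailed. A hypothetical check (all counts invented) via the coefficient of variation:

```python
import statistics

def coefficient_of_variation(counts: list[int]) -> float:
    """Std dev / mean of per-account request counts; near 0 = suspiciously even."""
    mean = statistics.mean(counts)
    return statistics.pstdev(counts) / mean

# Invented daily request counts per account.
balanced_pool = [1010, 990, 1005, 995, 1000, 1002]  # traffic spread evenly
organic_pool = [3, 40, 1, 250, 12, 7]               # heavy-tailed, as usual

print(round(coefficient_of_variation(balanced_pool), 3))
print(round(coefficient_of_variation(organic_pool), 3))
```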
Moonshot AI (3.4M+ exchanges)
Anthropic says Moonshot targeted:
agentic reasoning and tool use
coding and data analysis
computer-use agent development
computer vision
Anthropic claims Moonshot used “hundreds” of fraudulent accounts across multiple access pathways, making detection harder. It also alleges Moonshot later tried a more targeted approach to reconstruct Claude’s reasoning traces.
MiniMax (13M+ exchanges)
Anthropic says MiniMax focused on:
agentic coding
tool use and orchestration
Anthropic claims it detected this campaign while still active, gaining visibility into the lifecycle from data generation to model launch. It also says MiniMax pivoted quickly when Anthropic released a new model—redirecting significant traffic within 24 hours to extract capabilities from the latest system.
How distillers allegedly got access
Anthropic says it does not offer commercial access to Claude in China (or to certain subsidiaries). It alleges labs circumvented this using commercial proxy services that resell access at scale, operating “hydra cluster” networks—large pools of fraudulent accounts that distribute traffic across Anthropic’s API and third-party cloud platforms. When one account is banned, another replaces it, and the proxy traffic is mixed with normal traffic to blend in.
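One simplified, hypothetical way to surface a “hydra cluster” of the kind described: link accounts that share infrastructure (here, source IPs—the account names and telemetry are invented) and extract connected components; a component far larger than normal account/IP overlap would merit review.

```python
from collections import defaultdict

def cluster_accounts(account_ips: dict[str, set[str]]) -> list[set[str]]:
    """Group accounts into connected components linked by shared source IPs."""
    ip_to_accounts = defaultdict(set)
    for acct, ips in account_ips.items():
        for ip in ips:
            ip_to_accounts[ip].add(acct)

    seen, clusters = set(), []
    for acct in account_ips:
        if acct in seen:
            continue
        component, stack = set(), [acct]
        while stack:  # depth-first walk over accounts reachable via shared IPs
            cur = stack.pop()
            if cur in component:
                continue
            component.add(cur)
            for ip in account_ips[cur]:
                stack.extend(ip_to_accounts[ip] - component)
        seen |= component
        clusters.append(component)
    return clusters

# Invented telemetry: accounts a1-a4 chain together via shared proxy IPs;
# u1 is an unrelated ordinary user.
telemetry = {
    "a1": {"10.0.0.1", "10.0.0.2"},
    "a2": {"10.0.0.2", "10.0.0.3"},
    "a3": {"10.0.0.3"},
    "a4": {"10.0.0.1"},
    "u1": {"192.168.9.9"},
}
clusters = cluster_accounts(telemetry)
print(sorted(len(c) for c in clusters))  # one 4-account cluster, one singleton
```

This also illustrates why proxy services are hard to fight: if the cluster rotates IPs as aggressively as it rotates accounts, the linking signal weakens, and defenders have to fall back on behavioral features instead.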
Anthropic’s response measures
Anthropic says it is strengthening defenses via:
Detection: classifiers + behavioral fingerprinting to spot distillation patterns (including chain-of-thought elicitation patterns)
Intelligence sharing: sharing indicators with other labs, cloud providers, and relevant authorities
Access controls: stricter verification for common account-creation pathways
Countermeasures: product/API/model-level steps to reduce the usefulness of outputs for illicit distillation without harming legitimate customers
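Anthropic doesn’t describe its classifiers, but a minimal lexical sketch of the chain-of-thought-elicitation idea—where every pattern is invented here, and a real system would use a learned model rather than regexes—might look like:

```python
import re

# Invented patterns; a production classifier would be learned, not regex-based.
COT_ELICITATION = re.compile(
    r"(step[- ]by[- ]step|chain[- ]of[- ]thought|show (all )?your (reasoning|work)"
    r"|explain your reasoning|think aloud)",
    re.IGNORECASE,
)

def flags_cot_elicitation(prompt: str) -> bool:
    """Crude lexical check for prompts that fish for reasoning traces."""
    return bool(COT_ELICITATION.search(prompt))

print(flags_cot_elicitation("Walk me through this step by step and show your reasoning"))
print(flags_cot_elicitation("What's the weather like in Paris in May?"))
```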
What It Means (Interpretation)
Anthropic’s post is one of the clearest public claims yet that “model extraction” is evolving from isolated abuse into industrial competition tactics. If the numbers are accurate—16M+ exchanges via 24k accounts—this looks less like opportunistic scraping and more like an organized training pipeline.
1) The real strategic risk is “capabilities without constraints”
Anthropic’s strongest point is not that rivals copy answers—it’s that they may copy capabilities while discarding or weakening safety behaviors and governance. Even if distilled models are less capable overall, losing safety guardrails could make them more dangerous per unit of capability—especially if they become widely available or embedded into sensitive state systems.
2) The gray market for frontier-model access is becoming infrastructure
The “proxy services + hydra clusters” description suggests an ecosystem that may persist regardless of any single lab’s actions. If access brokering is profitable, the market will keep innovating around verification and bans. This becomes a platform-level issue—similar to spam or ad fraud—where enforcement is a constant arms race.
3) Attribution and incentives will be contested
Anthropic claims high-confidence attribution via metadata and partner corroboration, but the accused parties will almost certainly dispute those claims. The public framing also serves several of Anthropic’s own interests:
deterring abuse by showing detection capability
shaping policy debates around export controls
signaling to enterprise and government buyers that Anthropic takes model security seriously
None of that makes the claims false—but it does mean the post is both security disclosure and strategic messaging.
4) Defensive measures may reshape how frontier models are offered
If distillation attacks scale, labs may respond by tightening access and reducing “extractability.” That could mean more KYC-like verification, stricter rate limits, more aggressive anomaly detection, and potentially output transformations that preserve usefulness for humans but reduce training value for competitors. The tradeoff: security vs openness. Tighter controls can frustrate legitimate developers and researchers and may concentrate power further inside a few vendors.
5) The industry coordination problem is real—and hard
Anthropic calls for coordinated action among labs, cloud providers, and policymakers. That’s sensible, but difficult: competitors don’t naturally share intel, cloud providers may have conflicting incentives, and governments vary widely in priorities. Meanwhile, attackers only need one weak link—one proxy network, one leaky onboarding path, one under-monitored reseller—to keep operating.
Bottom line: This announcement is a sign that frontier AI competition is shifting from “who trains the biggest model” to “who can protect their model’s value.” If Anthropic is right, distillation attacks aren’t just IP theft—they’re a pathway for rapid capability diffusion without the safety norms and controls that frontier labs claim to be building.


