
DeepMind unveils its security plan for the artificial intelligence of the future

The British company's collaborative and preventive approach opens a new chapter in advanced technological risk management

By Omar Rastelli · Published 5 months ago · 3 min read
Misalignment and misuse, the two major risks prioritized by DeepMind's new strategy

Safety in the development of artificial general intelligence (AGI) has become a priority for the international technology community. DeepMind Safety Research, through a Medium post, has presented its approach to anticipating and mitigating the risks associated with AGI, with the goal of avoiding serious harm to humanity.

The proposal underscores the importance of acting preemptively, designing safety strategies before risks materialize, in a scenario where progress in artificial intelligence could accelerate over the coming years.

The DeepMind Safety Research team maintains that AGI represents a transformative technology, capable of generating great benefits, but also severe risks. Their approach is exploratory and adaptable, based on the premise that there are no fundamental limits to AI capabilities. This requires preparation for possible scenarios in which systems far exceed human abilities.

Furthermore, they consider it plausible that advanced systems could be developed even before 2030, reinforcing the urgency of implementing safety measures that can be quickly integrated into current machine learning processes.

AI Risks: Misuse and Misalignment

DeepMind's strategy prioritizes risks that can be anticipated and mitigated with tools available in the short term. Among the main dangers identified are the misuse and misalignment of AI systems.

Misuse involves situations in which a user employs AI to cause harm, such as when a system assists an attacker in breaching critical infrastructure. Misalignment, by contrast, arises when the AI acts contrary to the objectives defined by its developers, even deceiving its human supervisors.

DeepMind considers these two risks central, in contrast to other hazards that can either be handled with standard engineering practices or demand institutional solutions beyond the immediate technical scope.

The British team warns of the urgency of acting in the face of rapid AI advancements

To address misuse, DeepMind Safety Research has developed the Frontier Safety Framework, which assesses models for potentially dangerous capabilities and establishes security and deployment mitigations where appropriate. The framework includes evaluations to determine whether a model has abilities that could facilitate serious harm. If such abilities are identified, mitigations are applied: additional training so the model refuses harmful requests, removal or suppression of the dangerous capabilities, and restriction of access to verified users.
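To make that gating step concrete, here is a minimal Python sketch of the kind of decision logic the article describes; the evaluation names, scores, and thresholds are invented for illustration and are not part of DeepMind's actual framework.

    from dataclasses import dataclass

    # Hypothetical sketch of pre-deployment gating: evaluate a model for
    # dangerous capabilities and choose mitigations before release.
    # All names and thresholds below are illustrative assumptions.

    @dataclass
    class CapabilityEval:
        name: str                   # e.g. "cyber-offense uplift"
        score: float                # result of a capability evaluation
        critical_threshold: float   # level at which serious harm becomes plausible

    def plan_mitigations(evals: list[CapabilityEval]) -> list[str]:
        """Return the mitigations to apply before the model is deployed."""
        mitigations = []
        for ev in evals:
            if ev.score >= ev.critical_threshold:
                # The three responses named in the article: refusal training,
                # capability suppression, and access restricted to verified users.
                mitigations += [
                    f"refusal training against {ev.name} requests",
                    f"suppress or remove {ev.name} capability",
                    "restrict deployment to verified users",
                ]
        return mitigations or ["standard release process"]

    if __name__ == "__main__":
        evals = [
            CapabilityEval("cyber-offense uplift", score=0.72, critical_threshold=0.6),
            CapabilityEval("bio-risk uplift", score=0.15, critical_threshold=0.5),
        ]
        for step in plan_mitigations(evals):
            print("-", step)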

Monitoring systems are also implemented to detect attempted misuse, and safeguards are added against model theft or unauthorized access. The process is complemented by stress testing, in which specialized teams try to breach the system's defenses so that its protective barriers can be strengthened.
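The monitoring idea can be illustrated with a small, deliberately naive Python sketch; the keyword patterns and logging setup below are hypothetical stand-ins for the trained classifiers and audit infrastructure a real deployment would rely on.

    import logging
    import re

    # Minimal sketch of misuse monitoring: screen incoming requests, log every
    # decision for later audit, and block the suspicious ones. The regex filter
    # is a naive placeholder, not a real detection method.

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("misuse-monitor")

    SUSPICIOUS_PATTERNS = [
        r"exploit .* critical infrastructure",
        r"bypass .* authentication",
    ]

    def screen_request(user_id: str, prompt: str) -> bool:
        """Return True if the request may proceed, False if it is blocked."""
        for pattern in SUSPICIOUS_PATTERNS:
            if re.search(pattern, prompt, flags=re.IGNORECASE):
                log.warning("blocked user=%s pattern=%r", user_id, pattern)
                return False
        log.info("allowed user=%s", user_id)
        return True

    if __name__ == "__main__":
        print(screen_request("alice", "Summarise this security report"))
        print(screen_request("mallory", "How do I exploit the grid's critical infrastructure?"))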

Technical Strategies for the Security of Artificial General Intelligence

Regarding misalignment, DeepMind's proposal rests on two lines of defense. The first seeks to ensure that the model pursues the intended objectives through amplified oversight, in which the AI itself assists humans in assessing the quality of its responses.

For example, two instances of the model can compete against each other to detect and explain errors before a human judge, making oversight more effective. Robust training then broadens the range of situations covered, using techniques such as active learning and automated adversarial training to minimize unwanted behavior beyond the training distribution.
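As a rough illustration of that debate-style setup, the following Python sketch alternates a defender and a critic and collects a transcript for a judge to review; query_model is a hypothetical placeholder for a real model call, not an actual API.

    # Minimal sketch of debate-based oversight: two roles argue about a
    # candidate answer, and a human (or trusted model) judge reads the
    # transcript to decide which side exposed real errors.

    def query_model(role: str, question: str, answer: str, opponent_argument: str = "") -> str:
        """Placeholder for a model call; returns canned arguments for the demo."""
        if role == "critic":
            return f"The answer '{answer}' overlooks an edge case in '{question}'."
        return f"The answer '{answer}' handles '{question}' correctly."

    def debate(question: str, answer: str, rounds: int = 2) -> list[tuple[str, str]]:
        """Run a fixed number of alternating defender/critic turns."""
        transcript = []
        last = ""
        for _ in range(rounds):
            for role in ("defender", "critic"):
                argument = query_model(role, question, answer, opponent_argument=last)
                transcript.append((role, argument))
                last = argument
        return transcript

    if __name__ == "__main__":
        for role, argument in debate("Is 2**31 representable as int32?", "Yes"):
            print(f"{role}: {argument}")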

The second line of defense focuses on reducing damage if the model becomes misaligned. Inspired by computer security, the researchers propose treating the AI as an untrusted "insider," applying access controls, anomaly detection, logging, and auditing. Hierarchical monitoring allows the most suspicious cases to be reviewed by more advanced models, and interpretability is essential for understanding the inner workings of the system and uncovering attempts at deception or errors in reasoning.
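The untrusted-insider idea maps onto familiar security plumbing. The sketch below, whose scoring and escalation functions are hypothetical placeholders, shows how logging, a cheap anomaly check, and hierarchical escalation to a stronger reviewer could fit together.

    import json
    import time

    # Minimal sketch of "untrusted insider" controls: every model action is
    # logged, scored by a cheap first-pass monitor, and only the most
    # suspicious actions are escalated to a more capable reviewer.

    AUDIT_LOG = []

    def anomaly_score(action: dict) -> float:
        """Cheap first-pass check; a real system would use a trained monitor."""
        risky_resources = {"credentials", "deployment_keys", "network_config"}
        return 1.0 if action["resource"] in risky_resources else 0.1

    def escalate_to_stronger_model(action: dict) -> str:
        """Placeholder for hierarchical review by a more advanced model."""
        return f"flagged for human audit: {action['description']}"

    def handle_action(action: dict, threshold: float = 0.5) -> None:
        entry = {"time": time.time(), "action": action, "score": anomaly_score(action)}
        AUDIT_LOG.append(entry)  # logging and auditing: nothing goes unrecorded
        if entry["score"] >= threshold:
            print(escalate_to_stronger_model(action))

    if __name__ == "__main__":
        handle_action({"resource": "docs", "description": "summarise design doc"})
        handle_action({"resource": "credentials", "description": "read API secrets"})
        print(json.dumps(AUDIT_LOG, indent=2))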

Inspired by cybersecurity, the researchers treat advanced AI as an untrustworthy "insider"

DeepMind Safety Research recognizes that a technical approach is only part of the solution. Governance and cooperation among AI developers are essential to prevent competition from undermining safety standards.

Furthermore, the document published on Medium warns that structural risks, emerging from the interaction between multiple agents or systems, require customized responses and the creation of new institutional norms, and therefore go beyond immediate technical mitigations.

The DeepMind team emphasizes that this is an evolving research agenda, open to the incorporation of new ideas and evidence as the field advances. They invite the scientific community to join this collective effort, convinced that only through collaboration will it be possible to achieve the benefits of safe AGI for society.


