
The Only AI Trained On The Dark Web, Meet DarkBERT

A new language model has been trained on the Dark Web, the darkest section of the internet, in case you were concerned that the current generation of generative AIs is too pleasant and sympathetic.

By Najmoos Sakib · Published 3 years ago · 3 min read

The "Dark Web" is a portion of the Internet that is not indexed by search engines like Google and cannot be reached with a regular web browser. Specialized overlay network software such as Tor (The Onion Router) (Dingledine et al., 2004) is necessary to access it. In addition, Tor provides "hidden services" (also known as "onion services"), which are web services that conceal the IP addresses of both the client and the server (Biryukov et al., 2013).

The anonymity afforded by the Dark Web has a catch: many of the underground activities popular there are immoral or illegal, ranging from hosting content such as leaked data from breaches to drug transactions (Al Nabki et al., 2017; Jin et al., 2022). As a result, the Dark Web's appeal as a platform of choice for nefarious activity has piqued the curiosity of academics and security professionals alike.

Cybersecurity professionals and researchers now use natural language processing (NLP) techniques to combat the constantly evolving landscape of contemporary cyber threats. Cyber threat intelligence, also known as CTI (Liao et al., 2016; Bromiley, 2016), is a crucial component of contemporary cybersecurity research. It involves gathering evidence-based knowledge, such as indicators of compromise (IOCs), to mitigate new risks.
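To make the idea of IOCs concrete, here is a minimal Python sketch of pulling a few common indicator types out of free text. It is an illustration only and not the method used by the DarkBERT researchers: the `IOC_PATTERNS` table and the `extract_iocs` helper are hypothetical names, and the patterns are deliberately simplified (the IPv4 pattern, for instance, does not check that each number is below 256).

```python
import re

# Hypothetical, simplified patterns for three common indicator types.
# Real CTI tooling validates matches far more strictly than this sketch.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    # Version 3 onion addresses are 56 base32 characters plus ".onion".
    "onion": re.compile(r"\b[a-z2-7]{56}\.onion\b"),
}


def extract_iocs(text):
    """Return a dict mapping each indicator type to its sorted unique matches."""
    return {
        name: sorted(set(pattern.findall(text)))
        for name, pattern in IOC_PATTERNS.items()
    }


if __name__ == "__main__":
    sample = (
        "Payload 9e107d9d372bb6826bd81d3542a419d6 calls back to "
        "203.0.113.7 and later to 203.0.113.7 again."
    )
    print(extract_iocs(sample))
```

Simple pattern matching like this only finds indicators with a fixed shape. A language model trained on Dark Web text is aimed at the harder part of the job, which is understanding the slang and context around them.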

As a result, the application of NLP approaches to the Dark Web has expanded (Jin et al., 2022; Yoon et al., 2019; Choshen et al., 2019; Al Nabki et al., 2017; Al Nabki et al., 2019; Yuan et al., 2018). The Dark Web's ongoing use as a cybercrime platform makes it a desirable and vital arena for CTI study.

DarkBERT (yes, that's its real name) is a language model trained on Dark Web text so that it can be compared with a vanilla equivalent. The researchers wanted to know whether using the Dark Web as a dataset would give an AI better context on the language used there, making it more valuable both to researchers who trawl the Dark Web and to law enforcement agencies fighting cybercrime.

It also thoroughly scanned a location that most people would rather avoid and indexed its many domains, so thanks for doing your part, DarkBERT.

Most people never see the Dark Web because Google and other search engines ignore it, and because it can only be reached with specialist software such as Tor. It has developed quite a reputation for what happens there. Although there have been stories of torture chambers, contract killers, and other horrifying crimes, most of them have turned out to be hoaxes or scams designed to steal data by bypassing the browser security we all take for granted. Nevertheless, the Dark Web is a very important target for law enforcement, since cybercrime networks allegedly use it for anonymous communication.

To better understand the language used on the Dark Web, a team from South Korea crawled it through Tor and used the raw text they collected to train a language model. Once training was complete, they compared its performance with that of established models such as BERT and RoBERTa.

According to the results reported in the paper, DarkBERT beat the competition on every dataset, although not by much. All of the models are built on the same framework, so similar performance was to be expected. Even so, DarkBERT came out ahead on Dark Web text.

So what will be done with DarkBERT? The team expects it to be a potent tool for scanning the Dark Web for cybersecurity risks and monitoring forums to spot illegal behavior, though they hope it won't be given the nuclear launch codes. Let's just hope this doesn't give OpenAI any new ideas. The preprint, an early draft of a paper that has not yet undergone peer review, is available on arXiv.


About the Creator

Najmoos Sakib

Welcome to my writing sanctuary

I'm an article writer who enjoys telling compelling stories, sharing knowledge, and starting significant dialogues. Join me as we dig into the enormous reaches of human experience and the artistry of words.


    © 2026 Creatd, Inc. All Rights Reserved.