HOW AI IS USING TIKTOK DANCES TO SEE

REVOLUTIONARY AI

By adsbue • Published 3 years ago • 5 min read

Photo by Possessed Photography on Unsplash

Have you ever wondered how computers can "see" images and videos? Machine learning and artificial intelligence (AI) are the technologies behind this feat, but how do they learn to recognize objects, people, and movements in the real world? The answer lies in ground truth: the set of correct answers that a model relies on to learn and to be tested. Without good ground truth, an AI model can't learn anything useful.


But where do we get ground truth for tasks as complex as 3D image generation, scene reconstruction, and motion detection? One researcher found an unlikely source in TikTok videos. Yasamin Jafarian, a PhD student in computer vision and graphics at the University of Minnesota, collected 600 hand-picked TikToks to train her AI model. But why TikToks, you may ask? The answer lies in their diversity.


TikTok videos are not all trendy dances and lip-syncing, although those are certainly popular. They are also full of people, objects, and backgrounds that can help computers learn to see in 3D. For instance, Yasamin started her research with Renderpeople, a commercial library of 3D scans of people. However, she needed more varied backgrounds and movements to train her AI model to see people from different angles, poses, and contexts. TikToks provided that variety in spades: indoor and outdoor settings, casual and formal outfits, slow and fast movements.


To use TikToks as ground truth, Yasamin had to create a 3D model of each video from the phone's point of view. She used two programs to achieve this: one to remove the background and isolate the person, and another to estimate the person's movements in 3D space. The result was a 3D mesh representing the person's pose and location in the video. The AI model could then learn from these meshes and predict what a person would look like from a different viewpoint or with a different movement.
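The two-stage pipeline above can be sketched in miniature. The real system used dedicated neural networks for segmentation and 3D pose estimation; this toy Python version (the function names and the synthetic 8×8 "frame" are invented for illustration) just thresholds each pixel against a known background and uses the mask's centroid as a crude stand-in for a pose estimate:

```python
import numpy as np

def segment_person(frame, background, threshold=30):
    """Toy background removal: mark pixels that differ from a known
    background by more than `threshold` as belonging to the person."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return diff > threshold  # boolean mask

def estimate_position(mask):
    """Toy stand-in for pose estimation: return the centroid (row, col)
    of the segmented person."""
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

# Synthetic 8x8 grayscale "frame": background is 0, a bright "person"
# occupies a 2x2 patch.
background = np.zeros((8, 8), dtype=np.uint8)
frame = background.copy()
frame[2:4, 5:7] = 200

mask = segment_person(frame, background)
row, col = estimate_position(mask)
print(mask.sum(), row, col)  # 4 pixels; centroid at (2.5, 5.5)
```

A production pipeline would replace both functions with learned models, but the contract is the same: frame in, person's 3D location out.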


This process is akin to flipping over a flashcard with a question on one side and an answer on the other. In machine learning, the question is the input (e.g., a video or an image) and the answer is the output (e.g., a 3D mesh or a reconstructed scene). The more varied and accurate the answers, the better the model can learn and be tested. TikToks gave Yasamin a rich source of ground truth for training her AI model to recognize people in 3D across viewpoints and movements.
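The flashcard analogy maps directly onto supervised learning: a dataset is a stack of (question, answer) pairs. A minimal sketch in Python, with a made-up numeric rule (y = 3x + 1) standing in for the much harder video-to-mesh mapping:

```python
import numpy as np

# "Flashcards": each input x is paired with a ground-truth answer y.
# Here the hidden rule is y = 3x + 1; a real dataset would pair video
# frames with 3D meshes instead.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3 * x + 1

# Fit a line by least squares: the model "studies the flashcards".
A = np.stack([x, np.ones_like(x)], axis=1)
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# Quiz on an unseen question: the model should answer 3*10 + 1 = 31.
print(slope * 10 + intercept)
```

Good ground truth matters for exactly the reason the flashcard picture suggests: if the answers on the back of the cards are wrong or all alike, the model memorizes the wrong rule.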


But TikToks are not the only source of ground truth for AI models. Another example is the mannequin challenge, a viral video trend that involved groups of people freezing in place while a camera moved around them. This trend resulted in thousands of videos that provided an unusual form of ground truth: scenes with people who are not moving. This may seem counterintuitive for teaching AI models to recognize motion, but it actually helps them learn to recognize depth and perspective.


In the mannequin challenge videos, the camera moves around a group of people who are trying to stay still, creating a 3D representation of the scene from different angles. This is similar to what a set of cameras could achieve if they were placed around the scene to capture it from different viewpoints. However, using real cameras has limitations, such as the cost, the time, and the complexity of setting up such a system. Using mannequin challenge videos as ground truth allows researchers to approximate the real world while still controlling for certain variables.
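The geometric idea, that two viewpoints of the same still scene pin down where things sit in 3D, can be illustrated with a toy triangulation. This is a simplified sketch, not the researchers' actual method: it assumes two known camera positions along the phone's path and intersects their idealized viewing rays:

```python
import numpy as np

def triangulate(c1, d1, c2, d2):
    """Find the point where two camera viewing rays meet.
    Each ray is camera center c plus t times direction d. Solving
    c1 + t1*d1 = c2 + t2*d2 recovers the 3D point (least squares
    if the rays don't intersect exactly)."""
    A = np.stack([d1, -d2], axis=1)  # shape (3, 2)
    b = c2 - c1
    (t1, t2), *_ = np.linalg.lstsq(A, b, rcond=None)
    return c1 + t1 * d1

# Two positions along the moving phone's path, both looking at the
# same frozen person standing at (2, 3, 5).
target = np.array([2.0, 3.0, 5.0])
c1 = np.array([0.0, 0.0, 0.0])
c2 = np.array([4.0, 0.0, 0.0])
p = triangulate(c1, target - c1, c2, target - c2)
print(np.round(p, 6))  # recovers [2. 3. 5.]
```

The trick only works because the people are frozen: if the "person" moved between the two camera positions, the rays would point at two different targets and the depth estimate would be wrong, which is exactly why the mannequin challenge footage is so valuable.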

In recent years, the proliferation of digital technology has led to a significant increase in the amount of available data, which comes in many forms: text, images, audio, and video. One of the challenges faced by researchers in artificial intelligence (AI) is finding ways to efficiently collect and process this data in order to train machine learning models.

One innovative approach to gathering large amounts of video data is the use of social media challenges, such as the mannequin challenge. The mannequin challenge is a viral trend that began in 2016, in which people freeze in place while a video is being recorded, creating a tableau-like effect. These videos quickly became popular on social media, with millions of people posting their own versions of the mannequin challenge.

A team of researchers at Google saw the potential of using these videos to create a dataset for training machine learning models. They collected over 2,000 mannequin challenge videos from YouTube and other social media platforms, which amounted to over 100,000 frames of video data. The videos featured a wide range of scenes, including people in different poses and environments, such as offices, classrooms, and outdoor spaces.


The team then used a technique called transfer learning to train a machine learning model on this dataset. Transfer learning is a method of using a pre-trained model, such as a neural network, to perform a new task. In this case, the researchers used a pre-trained neural network that had been trained on a large dataset of images to identify objects and scenes. They then fine-tuned this model on the mannequin challenge dataset to teach it to recognize people frozen in different poses and environments.
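Transfer learning can be sketched in a few lines: keep the "pretrained" feature extractor frozen and fit only a new output layer for the new task. This is a linear-probe-style illustration, not Google's actual setup; the frozen random weights below stand in for a network genuinely pretrained on a large image dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" feature extractor: frozen weights, standing in for a
# network already trained on a large image dataset.
W_frozen = rng.normal(size=(8, 16))

def features(x):
    feats = np.tanh(x @ W_frozen)  # frozen layer, never updated
    return np.hstack([feats, np.ones((len(x), 1))])  # plus a bias term

# Data for the new task: targets depend on the first input dimension.
X = rng.normal(size=(100, 8))
y = X[:, 0]

# "Fine-tuning" here means fitting only a new last layer (by least
# squares) on top of the frozen features.
w, *_ = np.linalg.lstsq(features(X), y, rcond=None)
mse = np.mean((features(X) @ w - y) ** 2)
print(mse < np.var(y))  # the new head beats a constant predictor
```

The payoff is data efficiency: only the small new layer is trained, so far fewer labeled examples are needed than training a full network from scratch.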

The resulting model was able to identify people in the mannequin challenge videos with a high degree of accuracy, demonstrating the potential of social media challenges as a source of large-scale video data for machine learning.

The use of social media challenges to gather data is not limited to the mannequin challenge. Other challenges, such as the ALS Ice Bucket Challenge and the Harlem Shake, have also been used to gather large amounts of video data. These challenges often involve people performing a specific action or dance, which can be used to create a dataset for training machine learning models to recognize those actions.


One example of this is the work of researchers at the University of California, Berkeley, who used the Harlem Shake challenge to create a dataset for training machine learning models to recognize dance moves. They collected over 3,000 videos of people performing the Harlem Shake from YouTube and other sources, and used them to train a model to recognize the different components of the dance, such as the shaking and the jumping. This is how AI is capturing our movements and studying them.

Consider subscribing if you liked this article. Thank you so much for reading!


