
Humans are still better than AI at reading the room, says Johns Hopkins study

By Waqar Ahmad · Published 9 months ago · 3 min read

When it comes to truly “reading the room,” humans are still far ahead of even the most advanced AI systems. A recent study conducted by researchers at Johns Hopkins University sheds light on a critical shortcoming in today’s artificial intelligence technology—its inability to accurately interpret and respond to social interactions, especially in dynamic real-world scenarios.

What’s the Big Deal?

Let’s imagine a self-driving car approaching a busy intersection. A pedestrian steps onto the crosswalk but then stops to wave at someone. Will the pedestrian cross the street? Are they just saying hello? Or are they engaged in a conversation with someone and not intending to move? A human driver can quickly make sense of this social situation and react accordingly. Current AI systems, on the other hand, struggle to make these kinds of nuanced judgments.

This limitation isn't just a minor flaw—it’s a major obstacle for technologies like autonomous vehicles, service robots, and any AI system designed to interact with people in the real world. These machines need to recognize human intentions, interpret behaviors, and anticipate actions, not just detect movement or identify faces.

The Johns Hopkins Study: How It Worked

To test AI’s abilities, researchers designed an experiment involving three-second video clips. These short clips showed people doing everyday things—chatting, walking side-by-side, or just standing around. Human participants were asked to watch these clips and rate the nature of the interactions they saw. Were the people engaging with each other? Acting independently? Was there a social connection?

Then, more than 350 AI models—including language, video, and image models—were given the same challenge. They were asked to predict how humans would interpret each scene and how the human brain might respond.

The results? While humans largely agreed on the interpretations, AI systems fell short across the board. The models couldn’t reliably determine whether people were talking, about to move, or interacting at all.
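To make that comparison concrete, here is a minimal sketch (in Python) of how human-versus-model agreement on such ratings could be measured. The study's actual data and metrics aren't reproduced here; the 1–7 rating scale, the tiny made-up dataset, and the use of Spearman correlation are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical ratings of "how much are these people interacting?" (1-7 scale):
# rows are clips, columns are four human raters.
human_ratings = np.array([
    [6, 7, 6, 5],
    [2, 1, 2, 2],
    [5, 5, 6, 6],
    [1, 2, 1, 1],
    [7, 6, 7, 7],
    [3, 3, 2, 4],
])
model_predictions = np.array([4, 5, 3, 4, 3, 5])  # one AI model's rating per clip

# "Humans largely agreed": each rater correlates strongly with the average of the others.
consistencies = []
for i in range(human_ratings.shape[1]):
    others = np.delete(human_ratings, i, axis=1).mean(axis=1)
    rho, _ = spearmanr(human_ratings[:, i], others)
    consistencies.append(rho)
print("mean human-to-human agreement:", np.mean(consistencies))

# "AI fell short": the model's ratings bear little relation to the human consensus.
consensus = human_ratings.mean(axis=1)
rho, _ = spearmanr(model_predictions, consensus)
print("model-to-human agreement:", rho)
```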

Why AI Struggles with Social Context

So why are these advanced systems—many trained on massive datasets—failing such a basic human task?

The answer lies in how these models are built. Most current AI, especially visual AI, is based on neural networks inspired by the part of the human brain that processes static images. This approach has been incredibly successful for tasks like facial recognition or object detection. But social interactions aren’t static—they’re complex, dynamic, and full of subtle context clues.

“Real life isn’t a snapshot,” said Kathy Garcia, one of the lead researchers. “It’s a movie. And to understand that movie, you need to understand the relationships and context evolving in real time.”

AI that understands only frames—like still images—is missing the bigger picture. It can’t recognize the ongoing “story” in a scene. This becomes a major blind spot when the AI is used in environments where human interaction is constant and unpredictable.
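The contrast is easy to see in code. The toy sketch below (not the study's models) compares an image-style network that scores each frame of a clip independently with a video-style network whose convolutions span time as well as space; only the latter has any way to represent motion or the order of events. The architectures and dimensions are arbitrary placeholders.

```python
import torch
import torch.nn as nn

frames = torch.randn(1, 3, 8, 64, 64)   # one clip: 3 channels, 8 frames, 64x64 pixels

# Frame-based ("snapshot") model: a 2D CNN applied to each frame on its own.
image_net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
)
per_frame_scores = torch.stack(
    [image_net(frames[:, :, t]) for t in range(frames.shape[2])], dim=1
)  # shape (1, 8, 1): eight independent judgments, no notion of how the scene evolves

# Clip-based ("movie") model: a 3D CNN whose kernels cover time as well as space.
video_net = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=(3, 3, 3), padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(16, 1),
)
clip_score = video_net(frames)  # shape (1, 1): one judgment informed by motion over time
```

Scaled up, that is roughly the gap between the "snapshot" processing the researchers describe and the dynamic understanding they argue real-world AI needs.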

A Gap Between Machine Perception and Human Understanding

Interestingly, different types of AI models performed differently. Language models, those trained to understand and generate text, were better at predicting how people would judge the scenes when given written descriptions of the clips. Video models, meanwhile, were more successful at predicting the brain's neural responses to what was seen.

But even the best-performing models couldn’t match the combined behavioral and cognitive responses of human participants.
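For the neural side of that comparison, studies of this kind commonly use "encoding models": a linear map is fit from a model's features for each clip to the measured brain response, and the model is scored by how well it predicts responses to held-out clips. The sketch below is a generic illustration of that idea using synthetic data, not the paper's actual analysis or measurements.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_clips, n_features = 200, 64

# Synthetic stand-ins: one feature vector per clip from some AI model, and a
# response from one brain region for the same clips (simulated here as a noisy
# linear function of the features so the demo has signal to find).
model_features = rng.normal(size=(n_clips, n_features))
true_weights = rng.normal(size=n_features)
brain_response = model_features @ true_weights + rng.normal(scale=4.0, size=n_clips)

# Fit a regularized linear "encoding model" from features to responses and score
# it on held-out clips: better AI features -> better prediction of brain activity.
predicted = cross_val_predict(
    RidgeCV(alphas=[0.1, 1.0, 10.0]), model_features, brain_response, cv=5
)
print("held-out prediction correlation:", np.corrcoef(predicted, brain_response)[0, 1])
```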

“There’s something fundamental that machines are missing when it comes to how we, as humans, process and understand social scenes,” said Leyla Isik, the study’s senior author.

Implications for AI Development

This research highlights a crucial area for improvement in AI. While we’ve come a long way in teaching machines to recognize objects, follow commands, and even write essays, we haven’t yet cracked the code for social intelligence. And without that, AI will always be limited in how it operates alongside people.

Think about the implications for industries relying on AI:

Autonomous Vehicles: Misreading a pedestrian’s intention could lead to accidents.

Healthcare Assistants: Failing to recognize patient distress or emotion can be dangerous.

Customer Service Bots: Misunderstanding tone or social cues can lead to frustration or poor service.

Where Do We Go from Here?

The takeaway is clear: If we want AI to truly integrate into society, we need to design models that don’t just see or hear but understand. This means taking inspiration from how the human brain processes dynamic interactions—not just static visuals.

Future AI systems will likely need hybrid architectures, blending video, audio, language, and even emotional signals to truly grasp the social fabric of human interaction. It’s a tough nut to crack, but a necessary one if AI is to evolve from a useful tool into a genuinely intelligent companion.
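As a rough sketch of what "blending" those signals could look like in the simplest case, the snippet below embeds each modality separately, concatenates the embeddings, and predicts a single social judgment from the joint representation. The class name, dimensions, and late-fusion design are illustrative assumptions, not a proposal from the study.

```python
import torch
import torch.nn as nn

class SocialSceneFusion(nn.Module):
    """Hypothetical late-fusion head over precomputed per-modality embeddings."""

    def __init__(self, video_dim=512, audio_dim=128, text_dim=768, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(video_dim + audio_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # e.g. "are these people interacting?" score
        )

    def forward(self, video_emb, audio_emb, text_emb):
        # Concatenate the modality embeddings and map them to one judgment.
        return self.fuse(torch.cat([video_emb, audio_emb, text_emb], dim=-1))

model = SocialSceneFusion()
score = model(torch.randn(1, 512), torch.randn(1, 128), torch.randn(1, 768))
```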

The Johns Hopkins study is a wake-up call for developers, researchers, and tech companies. There’s still a long way to go before AI can match our social instincts, but recognizing the problem is the first step toward solving it.

artificial intelligence, evolution, fact or fiction, transhumanism

About the Creator

Waqar Ahmad

I am a computer science graduate and have been a professional freelancer since 2007. I have worked as a content and article writer for more than 10 years. My aim is to provide the best content, backed by thorough research.
