How AI Learns to Recognize Human Emotions in Pictures

2025-07-10 · Manuel G. Pascual · 4 minute read
Artificial Intelligence
Machine Learning
Psychology

While machines are incapable of feeling emotions or empathizing with people, a new study reveals something startling: they behave as if they understand them. According to the paper, published in Royal Society Open Science, when modern multimodal large language models are asked to rate the emotions evoked by images, their responses are remarkably similar to those of human volunteers.

Unlike traditional LLMs trained only on text, these multimodal systems are built from billions of images paired with detailed text descriptions. This creates a model that correlates pixels with words, allowing the AI to answer sophisticated questions about visual scenes. The researchers wanted to know whether these systems could judge the emotional content of images, a crucial step toward ensuring AI responses align with human values and mitigating the risk of biased or inappropriate reactions.
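The study does not describe how these models were trained internally, but a common recipe for learning this kind of pixel-word correlation is contrastive alignment in the style of CLIP: encode images and captions into a shared vector space and reward matched pairs for landing close together. The sketch below is only a toy illustration of that idea, with random projections standing in for real encoders; every name and number in it is invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for an image encoder and a text encoder. In a real
# multimodal model these would be deep networks; random projections
# are enough to illustrate the shared embedding space.
def embed_image(pixels: np.ndarray, W: np.ndarray) -> np.ndarray:
    v = W @ pixels.ravel()
    return v / np.linalg.norm(v)  # unit-normalize

def embed_text(token_counts: np.ndarray, W: np.ndarray) -> np.ndarray:
    v = W @ token_counts
    return v / np.linalg.norm(v)

# Fake batch: 4 images (8x8 grayscale) paired with 4 captions (vocab of 50).
images = rng.random((4, 8, 8))
captions = rng.random((4, 50))
W_img = rng.standard_normal((16, 64))
W_txt = rng.standard_normal((16, 50))

img_emb = np.stack([embed_image(x, W_img) for x in images])
txt_emb = np.stack([embed_text(t, W_txt) for t in captions])

# Pairwise cosine similarities; contrastive training would push the
# diagonal (matched image-caption pairs) up and the off-diagonal down.
similarity = img_emb @ txt_emb.T
print(np.round(similarity, 2))
```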

The study's conclusion is clear: AI ratings show a high correlation with average human ratings. This suggests that modern AI can develop sophisticated representations of emotional concepts through natural language, even without explicit training on emotion recognition.

The Experiment: Pitting AI Against Human Judgment

To test this, the researchers used three of the most advanced multimodal systems available today: OpenAI's GPT-4o, Google's Gemini Pro, and Anthropic's Claude Sonnet. Each model was prompted to act like a human participant in a psychological experiment.

They were then shown a series of images and asked to rate them on several scales:

  • Valence: How negative or positive the scene was (1-9).
  • Arousal: Whether the scene provoked relaxation or alertness.
  • Motivational Direction: Whether it made them want to avoid or approach the scene.
  • Basic Emotions: The extent to which the image evoked happiness, anger, fear, sadness, disgust, or surprise.

These AI-generated ratings were compared against the responses of 204 human participants who evaluated 362 photos from the Nencki Affective Picture System (NAPS), a database containing a wide range of positive, neutral, and unpleasant images.
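The paper's exact prompts are not reproduced here, but the setup is easy to approximate with any multimodal API. Below is a minimal sketch using the OpenAI Python SDK; the system prompt, rating instruction, and image URL are hypothetical stand-ins, not the study's actual materials.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative prompt: ask the model to play a participant and rate one
# image on the valence scale described above (1 = very negative,
# 9 = very positive). The wording is hypothetical, not the study's.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a participant in a psychological experiment.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Rate the valence of this scene from 1 (very "
                         "negative) to 9 (very positive). Reply with a "
                         "single number."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        },
    ],
)
print(response.choices[0].message.content)  # a single rating, e.g. "7"
```

Repeating this per image and per scale yields a matrix of model ratings that can be compared against the human averages.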

Surprising Results: AI's Ratings Align with Humans

The findings showed a striking similarity between the judgments of machines and people. According to the study, GPT-4o's responses correlated particularly well with the human ratings, with coefficients between 0.77 and 0.90 (where 1.0 is a perfect match). Claude also performed strongly, scoring 0.63-0.90, though it sometimes refused to answer due to its safety protocols. Gemini's performance was slightly lower but still remarkably close to human responses, with scores ranging from 0.55 to 0.86.
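Those figures are correlation coefficients computed between each model's ratings and the averaged human ratings, scale by scale. A minimal sketch of such a computation follows, shown with Pearson's r and invented numbers; the paper's exact statistic and data may differ.

```python
from scipy.stats import pearsonr

# Hypothetical valence ratings for five images (invented numbers, not
# data from the study): mean human rating vs. one model's rating.
human_mean = [2.1, 7.8, 5.0, 3.4, 8.6]
model      = [2.5, 7.2, 5.5, 3.0, 8.9]

r, p_value = pearsonr(human_mean, model)
print(f"Pearson r = {r:.2f}")  # values near 1.0 mean close agreement
```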

How Does AI Learn Emotion Without Feeling It?

How can multimodal systems achieve this without any capacity for genuine feeling? Alberto Testolin, a co-author of the study, points to the training data. "We tend to think that image-text pairs contain purely visual semantic information, such as ‘image of a field of sunflowers,’" he explains. "Our research suggests that textual descriptions are much richer, allowing us to infer the emotional status of the person who wrote the entry."

In essence, the AI doesn't understand emotion; it recognizes patterns in the language used to describe emotional scenes. As Professor José Miguel Fernández Dols, who was not involved in the study, notes, a machine with access to text describing typical human reactions to certain stimuli can mimic those judgments: it processes the adjectives, adverbs, and verbs associated with descriptions of a particular type of image.
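To make that idea concrete, here is a deliberately crude caricature of the mechanism: scoring a caption's valence from a hand-made lexicon of emotion-laden words. Real models learn such associations at vastly greater scale and subtlety; the lexicon and weights below are entirely invented.

```python
# Naive illustration: emotion inferred from descriptive language alone,
# never felt. The word scores are invented for this example.
VALENCE_LEXICON = {
    "beautiful": +2, "serene": +2, "smiling": +1,
    "gloomy": -2, "terrifying": -3, "injured": -2,
}

def crude_valence(caption: str) -> float:
    """Average the valence scores of known words in the caption."""
    words = caption.lower().split()
    scores = [VALENCE_LEXICON[w] for w in words if w in VALENCE_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(crude_valence("A serene beach at sunset with smiling children"))  # positive
print(crude_valence("A gloomy alley with an injured dog"))              # negative
```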

A Crucial Distinction: Emulation is Not Emotion

The authors stress a critical point: an AI's ability to emulate human ratings does not mean it can think or feel. Human emotional responses are complex and varied, whereas an AI provides an averaged, probabilistic response. The study states, "‘reading about emotions’ is qualitatively different from having direct emotional experiences."

This taps into a wider, more controversial debate in the AI field. While some companies sell facial recognition systems that claim to detect emotions, much of the scientific community disputes the idea of universal emotional expressions, highlighting the significant role culture plays in how we show and interpret feelings. The researchers call for more investigation into these cultural differences.

Professor Fernández Dols concludes that these findings are a topic for reflection: "everyday language is a logical construct that can be perfectly coherent, persuasive, informative, and even emotional without any brain speaking."
