New AI Leaderboard Ranks Gemini First, ChatGPT Eighth

2025-09-18 | Alex Hughes | 3-minute read
Tags: AI, Chatbots, Technology

In the competitive world of artificial intelligence, a constant stream of benchmarks and tests aims to determine which model is the most powerful. These tests often focus on technical capabilities like mathematical problem-solving, reasoning, and even medical knowledge. While models like GPT-4 are known for scientific reasoning, others such as Gemini and Claude may excel at adapting to new concepts.

However, these technical benchmarks often overlook a critical factor: the user experience. A new ranking system has emerged to fill this gap, focusing purely on which AI models people genuinely prefer to use.

A New Way to Rank AI: The Humaine Leaderboard

UK-based tech company Prolific has introduced a unique AI leaderboard named Humaine. Instead of evaluating models on automated tasks, Humaine gathers data directly from people. The study involved 21,352 participants from the UK and the US, who were each asked to interact with two different AI models and report which one provided a better experience.

This approach allowed Prolific to not only determine an overall winner but also to analyze preferences across different demographics, including age, ethnicity, and political views in both countries. The feedback was categorized to rank models on core task performance, reasoning, communication, fluidity, and trust and ethics.
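To make the head-to-head idea concrete, here is a minimal sketch of how pairwise preferences could be turned into a ranking. It is not Prolific's actual method, which the article does not describe in detail; it simply ranks models by the share of comparisons they win. The function name `rank_by_win_rate`, the model names, and the votes are all made up for illustration.

```python
from collections import defaultdict


def rank_by_win_rate(comparisons):
    """Rank models by the fraction of head-to-head comparisons they won.

    `comparisons` is an iterable of (winner, loser) pairs, one per
    participant vote. This is an illustrative simplification, not the
    scoring used by the Humaine leaderboard.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    # Win rate = comparisons won / comparisons the model appeared in.
    rates = {model: wins[model] / games[model] for model in games}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_a"),
]

ranking = rank_by_win_rate(votes)
for model, rate in ranking:
    print(f"{model}: {rate:.2f}")
```

A plain win rate ignores how strong each opponent was, so real preference leaderboards typically use a rating model instead. The same tally could also be run separately per demographic group to compare preferences across audiences.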

Figure: A chart showing the ranking of the top five AI chatbots.

The Surprising Results: Who Came Out on Top

The results were decisive. Google's Gemini 2.5 Pro emerged as the clear favorite, topping the charts in overall performance and nearly every subcategory. Whether it was young adults in the UK or voters over 55 in the US, the consensus pointed to Gemini as the best model to interact with.

The only category where Gemini was outranked was trust, ethics, and safety, where Grok-3 took the top spot. Following Gemini in the overall rankings were DeepSeek, Magistral Le Chat, and Grok, a lineup that differs significantly from most technical leaderboards.

Where Did ChatGPT and Claude Land?

Perhaps the biggest surprise was the placement of some of the industry's best-known names. ChatGPT, specifically the GPT-4.1 model, came in at a distant 8th place. The result was even more striking for Anthropic's Claude, whose two version 4 models landed in 11th and 12th place.

What This Means for the AI Landscape

So, should you stop using ChatGPT and switch to Gemini? Not necessarily. These results do not measure raw technical performance, an area where models from OpenAI and Anthropic often lead. Instead, the Humaine leaderboard provides a valuable new perspective focused on the human side of AI interaction.

It shows that technical power doesn't always translate to a better user experience. Models like Le Chat, which may not score highest on benchmarks, have a loyal following due to user trust and ease of communication. This study serves as an important reminder that as AI becomes more integrated into our lives, the quality of the human-AI interaction is just as important as the underlying technology.
