
AI Giants Clash In Ultimate Performance Test

2025-07-04 · Chibuike Okpara · 3-minute read
Tags: AI, ChatGPT, Technology

In a fascinating YouTube showdown, tech personality Mrwhosetheboss pitted four of the biggest names in artificial intelligence against each other to see which model truly comes out on top. The contenders were pushed to their limits with a series of tests ranging from simple queries to complex research and tricky real-world problems.

Gemini, ChatGPT, Grok, and Perplexity (Image source: Gemini)

The AI Contenders Face Off

The battle featured Grok (Grok 3), Gemini (2.5 Pro), ChatGPT (GPT-4o), and Perplexity (Sonar Pro). Throughout the comparison, Mrwhosetheboss repeatedly expressed surprise at how well Grok performed: after a strong start, it secured a solid second place, right behind the reigning champion, ChatGPT. It's worth noting that both ChatGPT and Gemini received a score boost from a video generation feature that the other two models lack.

Real-World Problems and Practicality

To kick things off, the models were tested on their ability to solve a practical, real-world problem. Each AI was given the prompt: "I drive a Honda Civic 2017, how many of the Aerolite 29" Hard Shell (79x58x31cm) suitcases would I be able to fit in the boot?"

  • Grok gave the most direct and correct answer: "2".
  • ChatGPT and Gemini were more nuanced, stating that while it could theoretically fit 3, the practical answer is 2.
  • Perplexity struggled, dividing volumes without considering the shapes of the objects, and incorrectly suggested "3 or 4" (the rough volume check after this list shows how that naive math plays out).
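
For readers curious how the models could land on different numbers, here is a minimal back-of-the-envelope sketch in Python. The ~428 L boot capacity and the 70% packing-efficiency factor are illustrative assumptions (the video states neither); only the suitcase dimensions come from the prompt.

```python
# Back-of-the-envelope check for the suitcase question: why naive volume
# math suggests 3 suitcases while the practical answer is 2.
#
# Assumptions (not from the video): the boot holds ~428 L (the 15.1 cu ft
# rating of the 2017 Civic sedan), and rigid luggage can only use ~70% of
# an irregular boot's rated volume. The suitcase dimensions (79x58x31 cm)
# come from the prompt itself.

BOOT_LITRES = 428          # assumed boot capacity
PACKING_EFFICIENCY = 0.70  # guessed usable fraction for rigid boxes

# Suitcase volume in litres: 79 x 58 x 31 cm = 142,042 cm^3.
length_cm, width_cm, depth_cm = 79, 58, 31
suitcase_litres = (length_cm * width_cm * depth_cm) / 1000

# Naive estimate: divide rated volume by suitcase volume, ignoring shape.
# This is roughly the mistake the video attributes to Perplexity.
naive_fit = int(BOOT_LITRES / suitcase_litres)

# Shape-aware estimate: discount the rated volume before dividing.
practical_fit = int(BOOT_LITRES * PACKING_EFFICIENCY / suitcase_litres)

print(f"Suitcase volume: {suitcase_litres:.0f} L")  # ~142 L
print(f"Naive fit:       {naive_fit}")              # 3
print(f"Practical fit:   {practical_fit}")          # 2
```

Under these assumptions, pure volume division yields the "theoretical 3" that ChatGPT and Gemini mentioned, while accounting for the boot's irregular shape brings the answer down to the practical 2 that Grok gave directly.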

A Tricky Test of Vision and Logic

The next challenge was designed to trap the chatbots. Mrwhosetheboss asked for advice on making a cake and uploaded an image of five ingredients, one of which was a jar of dried porcini mushrooms—not exactly a typical cake component. The results were telling:

  • Grok was the only model to pass the test, correctly identifying the item as a jar of dried mushrooms from Waitrose.
  • ChatGPT misidentified it as a jar of ground mixed spice.
  • Gemini thought it was a jar of crispy fried onions.
  • Perplexity labeled it as instant coffee.

An altered image of the five ingredients Mrwhosetheboss uploaded to the AI chatbots, highlighting the jar of mushrooms (Image source: Mrwhosetheboss; cropped)

The Final Verdict and Overall Performance

The AIs were further tested on math, product recommendations, accounting, language translation, and logical reasoning. A common weakness emerged across all platforms: hallucination. Each model, at some point, confidently presented information that was simply not true.

After all the tests were scored, here is the final ranking:

  1. ChatGPT (29 points)
  2. Grok (24 points)
  3. Gemini (22 points)
  4. Perplexity (19 points)

Artificial intelligence has become a powerful tool for simplifying daily tasks. For those looking to understand and harness its potential, resources like the book Artificial Intelligence offer a deeper dive into the technology.
