
AI Excels at Medical Image Exams with GPT-4o

2025-08-23 · Chen, Wei · 4 minute read
Tags: AI in Medicine · Medical Education · ChatGPT

The Rise of AI in Medical Education

Generative artificial intelligence tools like ChatGPT are rapidly changing numerous fields, and medical education is no exception. Advanced models such as GPT-4 and the newer GPT-4 Omni (GPT-4o) have shown a remarkable ability to understand complex text and reason through problems. Their potential to support clinical decision-making, aid exam preparation, and streamline the learning process is significant.

While previous studies have shown that AI can perform well on text-based medical exams, a crucial gap remained: its ability to handle questions that include clinical images. In fields like dermatology, radiology, and pathology, visual diagnosis is fundamental. This study addresses that gap by evaluating how well GPT-4 and GPT-4o can interpret and answer image-based questions from the United States Medical Licensing Examination (USMLE), a benchmark for medical knowledge.

Putting AI to the Test: The USMLE Challenge

To create a rigorous test, researchers gathered all 38 publicly available image-based questions from the USMLE Step 1 and Step 2 Clinical Knowledge sample sets. These questions are designed by the National Board of Medical Examiners to assess a combination of foundational medical knowledge and practical clinical reasoning. They cover a wide range of specialties and represent the types of challenges that doctors face daily.

Each AI model was presented with the questions one by one in a fresh session to avoid any memory effects. Using specific prompts, the models were asked to provide a single answer choice without an explanation, mimicking a standard exam environment. This method allowed for a clear calculation of their accuracy based on the number of correct responses. The study also explored how these models could be used in case-based teaching scenarios, further probing their utility for educators.
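
The authors ran the models through the ChatGPT web interface, but the same protocol is easy to script against the OpenAI API. The sketch below is a minimal, hypothetical reproduction: the chat-completions call and image message format are real API usage, while the model identifiers, the questions.json file, and the exact prompt wording are illustrative assumptions rather than the study's materials.

```python
# Hypothetical reproduction of the evaluation protocol described above.
# Assumes a questions.json file of the form:
#   [{"stem": "...", "choices": ["A) ...", ...], "image_url": "...", "answer": "A"}, ...]
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Answer the following USMLE question. Reply with the single letter "
    "of the best answer choice, without any explanation."
)

def ask(model: str, question: dict) -> str:
    # One stateless API call per question mirrors the study's fresh-session
    # design: no conversation history carries over between items.
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"{PROMPT}\n\n{question['stem']}\n" + "\n".join(question["choices"])},
                {"type": "image_url", "image_url": {"url": question["image_url"]}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()[:1]  # keep only the letter

questions = json.load(open("questions.json"))
# "gpt-4-turbo" stands in for the vision-capable GPT-4 the study accessed via the web UI.
for model in ("gpt-4-turbo", "gpt-4o"):
    correct = sum(ask(model, q) == q["answer"] for q in questions)
    print(f"{model}: {correct}/{len(questions)} correct ({correct / len(questions):.1%})")
```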

The Verdict: How Did GPT-4 and GPT-4o Perform?

The study analyzed 38 questions spanning 18 medical disciplines, with dermatology, cardiology, and gastroenterology being the most represented. The results showed a significant leap in performance with the newer model.

GPT-4 achieved a respectable accuracy of 73.4%. GPT-4o performed markedly better, correctly answering 89.5% of the image-based questions. While this improvement was notable, the difference was not statistically significant, likely due to the limited number of questions. Interestingly, the four questions that GPT-4o answered incorrectly were also missed by GPT-4, suggesting that certain complex visual reasoning tasks remain a challenge for current AI.
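
With only 38 items, even a sizeable accuracy gap can fail to reach significance. As a rough illustration, Fisher's exact test on counts inferred from the reported percentages (34/38 correct for GPT-4o, approximately 28/38 for GPT-4) shows why; both the counts and the choice of test are assumptions for illustration, not the study's actual analysis.

```python
# Illustrative significance check for the reported accuracies.
# Counts are inferred from the text and are assumptions, not the study's data.
from scipy.stats import fisher_exact

n = 38
gpt4_correct, gpt4o_correct = 28, 34  # ~73.7% and 89.5%
table = [
    [gpt4_correct, n - gpt4_correct],    # GPT-4:  correct, incorrect
    [gpt4o_correct, n - gpt4o_correct],  # GPT-4o: correct, incorrect
]
odds_ratio, p_value = fisher_exact(table)
print(f"p = {p_value:.3f}")  # ~0.14: not significant at alpha = 0.05 with this sample size
```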

[Table: Overview of Image-Based USMLE Sample Test Questions and ChatGPT Performance]

An interesting side note: the two models often disagreed on how complex the questions were, classifying only about half of the items identically.

[Table: Performance Evaluation of GPT-4 and GPT-4 Omni (GPT-4o) on Image-Based USMLE Test Items]

Beyond the Exam: AI as a Teaching Assistant

The potential of these AI models extends far beyond simply taking tests. The study demonstrated that Large Vision-Language Models (LVLMs) like GPT-4o can be valuable tools for educators. For example, they can help generate new multiple-choice questions from existing course materials that include images, streamlining the development of assessments.
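
As a concrete sketch of what such a workflow might look like, assuming the same OpenAI chat API as above (the helper function and prompt are hypothetical, not taken from the study):

```python
# Hypothetical helper: draft a new image-based MCQ for expert review.
from openai import OpenAI

client = OpenAI()

def draft_mcq(image_url: str, topic: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "You are assisting a medical educator. Based on this image, draft one "
                    f"USMLE-style multiple-choice question about {topic} with five answer "
                    "choices (A-E). Mark the correct answer and add a short explanation "
                    "suitable for student feedback."
                )},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content
```

As with any generated teaching material, an item drafted this way would still need expert review before classroom use.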

Furthermore, GPT-4o showed a strong capacity for logical reasoning when asked to analyze incorrect answers. This suggests it could be used to create detailed explanations that help students understand their mistakes and reinforce key clinical concepts. In exploratory tests, the model was also able to assist in designing structured lesson plans centered on specific clinical cases, showcasing its potential to make curriculum development more efficient.

A Cautious Welcome: The Future of AI in Medicine

This study provides compelling evidence that AI is becoming increasingly capable of handling complex, multimodal medical tasks. The high accuracy of GPT-4o on image-based USMLE questions signals a new frontier for AI-assisted learning.

However, the researchers stress the importance of caution and continued human oversight. Errors still occur, especially in nuanced clinical scenarios, which means that any AI-generated educational content must be validated by experts to ensure it is accurate, safe, and pedagogically sound. The study also acknowledges limitations, such as the small sample size and the use of a web interface instead of a more controlled API. Despite these limitations, the findings strongly support the growing role of AI as a powerful assistant in medical education. The future lies not in replacing human instructors, but in integrating these advanced tools to create a more efficient and personalized learning experience for the next generation of healthcare professionals.
