
ChatGPT 4o Passes Medical Exam for Eye Specialists

2025-07-28 · 3 minutes read
Artificial Intelligence
Medical Education
ChatGPT

The Rise of AI in Medical Training

Artificial intelligence, and large language models like ChatGPT in particular, is rapidly becoming an influential force in medical education and knowledge assessment. Previous studies have already highlighted the growing ability of AI to tackle complex medical exams, including the Final Medical Examination (LEK) and the Polish State Specialization Exam (PES) across various fields. This progress raises important questions about how AI can best be utilized as a supportive tool in the rigorous process of specialist training.

Putting ChatGPT-4o to the Test

To explore this potential, a recent study aimed to evaluate the performance of the latest model, ChatGPT-4o, by having it take the official Polish State Specialization Exam in ophthalmology from the Spring 2024 session. The challenge consisted of 120 multiple-choice questions, which were provided to the model in their original Polish language after it was familiarized with the exam's regulations.

The analysis focused on two key areas: the accuracy of the AI's answers, checked against the official answer key, and the model's own declared confidence level for each answer, rated on a scale of 1 to 5. To ensure a thorough review, the questions were also sorted into clinical and theoretical categories.
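The grading step described above can be sketched in a few lines: each model answer is compared against the official key, with the self-rated confidence (1 to 5) and the question category recorded alongside. All function names and data here are illustrative, not the study's actual materials.

```python
# Hypothetical grading sketch: compare model answers to an answer key and keep
# the model's self-rated confidence and the question category for later analysis.

def grade(responses, answer_key):
    """Return one record per question: correctness, confidence, category."""
    records = []
    for qid, (choice, confidence, category) in responses.items():
        records.append({
            "question": qid,
            "correct": choice == answer_key[qid],
            "confidence": confidence,   # model's self-rating, 1-5
            "category": category,       # "clinical" or "theoretical"
        })
    return records

# Tiny illustrative run with fabricated data (not the exam's questions):
key = {1: "A", 2: "C", 3: "B"}
resp = {1: ("A", 5, "clinical"), 2: ("B", 2, "theoretical"), 3: ("B", 4, "clinical")}
graded = grade(resp, key)
score = sum(r["correct"] for r in graded) / len(graded)
print(f"score: {score:.1%}")  # prints score: 66.7%
```

With the real exam, the same loop would run over 120 records and the score would be checked against the passing threshold.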

How Did the AI Perform?

The results were impressive. ChatGPT-4o correctly answered 94 out of the 120 questions, achieving a score of 78.3% and comfortably exceeding the required passing threshold. This demonstrates a high level of effectiveness in a specialized medical domain. Furthermore, a statistical analysis showed no significant performance gap between its handling of clinical and non-clinical (theoretical) questions, suggesting the model possesses a balanced and comprehensive knowledge base.
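The clinical-vs-theoretical comparison can be illustrated with a simple two-proportion z-test; the study's exact category split and test statistic are not given in this summary, so the counts below are fabricated (they merely sum to the reported 94 of 120) and the choice of test is an assumption.

```python
# Sketch of a two-proportion comparison, as one common way to check whether
# accuracy differs between two question categories. Counts are hypothetical.
from math import sqrt, erf

def two_proportion_z(k1, n1, k2, n2):
    """Two-sided z-test for equality of two proportions; returns (z, p_value)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal-CDF tail
    return z, p_value

# Fabricated clinical vs theoretical counts summing to the reported 94/120:
z, p = two_proportion_z(50, 63, 44, 57)
```

A p-value well above 0.05 on counts like these is consistent with the "no significant performance gap" finding reported in the study.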

Confidence is Key: A New Metric for AI Reliability

A particularly insightful part of the study was the analysis of the model's self-assessed confidence. The researchers found a strong and statistically significant relationship: correct answers came with markedly higher confidence ratings than incorrect ones. This suggests that the model's confidence rating could serve as a valuable indicator of its own accuracy, helping users gauge the reliability of the information it provides.
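One standard way to quantify the relationship between a binary outcome (correct/incorrect) and a numeric rating (confidence 1 to 5) is the point-biserial correlation. The study's summary does not name its statistical method, so this choice, and all data below, are assumptions for illustration only.

```python
# Minimal point-biserial correlation sketch: correlate 0/1 correctness flags
# with numeric confidence ratings, using only the standard library.
from statistics import mean, pstdev

def point_biserial(correct_flags, confidences):
    """Point-biserial correlation between a 0/1 variable and a numeric one."""
    n = len(confidences)
    conf_correct = [c for f, c in zip(correct_flags, confidences) if f]
    conf_wrong = [c for f, c in zip(correct_flags, confidences) if not f]
    p = len(conf_correct) / n                 # share of correct answers
    s = pstdev(confidences)                   # population SD of all ratings
    return (mean(conf_correct) - mean(conf_wrong)) / s * (p * (1 - p)) ** 0.5

# Fabricated example where correct answers tend to carry higher confidence:
flags = [1, 1, 1, 0, 1, 0, 1, 0]
conf = [5, 4, 5, 2, 4, 3, 5, 2]
r = point_biserial(flags, conf)
```

A correlation near +1 on such data mirrors the pattern the study reports: confidence tracks accuracy closely enough to serve as a reliability signal.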

The Future of AI in Medical Education

The study concludes that ChatGPT-4o has demonstrated significant potential as a tool in specialist medical education. The ability to not only provide answers but also signal the likelihood of their correctness is a major step forward. However, despite these promising results, the researchers stress that AI is not yet ready to replace human expertise. Before these models can be widely implemented in medical education, careful expert supervision and further research across a diverse range of medical fields are essential to ensure their safe and effective use.
