AI Tutors: Ophthalmology Student Study Aids Tested
AI in the Exam Room: Can ChatGPT and Bard Help Ophthalmology Students?
A recent study investigated how well two popular AI tools, ChatGPT 3.5 and Google Bard, could assist undergraduate medical students in preparing for ophthalmology short answer questions (SAQs). The core goal was to determine whether these AI models could serve as effective self-assessment resources.
How the AI Tutors Were Tested
To evaluate the two models, researchers randomly selected 261 short answer questions from past university examinations and publicly available ophthalmology question banks. To ensure relevance to current medical education standards in India, the questions were categorized according to the National Medical Commission's (NMC) competency-based medical education (CBME) curriculum, yielding three types of questions:
- Short note task-oriented questions (SNTO): 169 questions
- Short note reasoning questions (SNRQ): 15 questions
- Applied aspect SAQs (SN Applied): 77 questions
Notably, image-based questions were excluded from this study.
Before pitting the AI against the questions, a team of three ophthalmologists collaborated to create ideal model answers for every single question. These expert-crafted answers served as the benchmark. The same 261 questions were then fed to both ChatGPT 3.5 and Google Bard.
To gauge the quality of the AI-generated responses, the same three ophthalmologists independently evaluated each answer. They used a 3-point scoring system focusing on:
- Correct diagnosis (if applicable)
- Accuracy of content
- Relevance of the information provided
The scores from the three evaluators were compiled, and the data underwent statistical analysis to compare the overall and category-specific performance of ChatGPT 3.5 and Bard.
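The study's raw score data and analysis code are not reproduced here, but for readers curious how such a comparison might look in practice, below is a minimal sketch in Python. The per-question scores are simulated placeholders, and the Wilcoxon signed-rank test (via scipy) is an assumed, common choice for paired ordinal scores; the study's actual statistical method may differ.

```python
# Minimal sketch: per-question scores are simulated, not the study's data, and
# the Wilcoxon signed-rank test is an assumed (common) choice for paired scores.
import random

from scipy.stats import wilcoxon

random.seed(42)
N_QUESTIONS = 261

# Hypothetical compiled scores (0-3 points) for each model on the same questions.
chatgpt_scores = [random.choice([0, 1, 2, 3, 3, 3]) for _ in range(N_QUESTIONS)]
bard_scores = [random.choice([0, 1, 2, 3, 3, 3]) for _ in range(N_QUESTIONS)]

# Paired comparison across the same 261 questions (overall or per CBME category).
statistic, p_value = wilcoxon(chatgpt_scores, bard_scores)
print(f"Wilcoxon signed-rank: statistic={statistic:.1f}, p-value={p_value:.3f}")
```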
AI Scorecard: ChatGPT vs Bard
Out of a total possible score of 783 (which is 261 questions multiplied by 3 points per question), ChatGPT 3.5 achieved a score of 696, translating to an accuracy of 88.8%. Google Bard was close behind, scoring 685, or 87.5%.
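As a quick check on the arithmetic, the percentages follow directly from the raw scores (the small gap from the published 88.8% appears to be a rounding difference):

```python
# Reported raw scores converted to accuracy percentages (261 questions x 3 points).
max_score = 261 * 3          # 783 total possible points

print(f"ChatGPT 3.5: {696 / max_score:.1%}")  # ~88.9%, reported as 88.8%
print(f"Bard:        {685 / max_score:.1%}")  # 87.5%
```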
While the overall difference in performance between the two AI tools was not statistically significant, ChatGPT 3.5 demonstrated a significantly better performance in the 'short note task-oriented' (SNTO) category. This suggests ChatGPT might be more adept at handling questions that require specific, factual recall or procedural descriptions.
The Fine Print: Limitations and Errors
Despite the generally high scores, the study highlighted a critical concern: both AI models produced poor-quality or inadequate answers for a considerable number of questions. Specifically:
- ChatGPT 3.5 provided subpar answers for 50 questions, an error rate of 19%.
- Bard struggled with 44 questions, resulting in an error rate of 16.8%.
In some instances, the AI-generated responses were found to be lacking essential information, even for topics considered high-yield or fundamental in ophthalmology. This indicates that relying solely on AI for self-assessment could lead to gaps in student understanding.
The Verdict: Using AI Wisely in Medical Studies
The study concludes that both ChatGPT 3.5 and Bard are capable of generating largely accurate and relevant responses to ophthalmology short answer questions. ChatGPT 3.5 showed a slight edge, especially with task-oriented questions, hinting it might be a more effective self-assessment aid for undergraduate students in this context.
However, the researchers stress a crucial caveat: with roughly 17-19% of answers judged inadequate, AI-generated responses should not be used in isolation. Students must cross-reference information from AI tools against standard textbooks and verified academic resources.
These AI tools are likely best suited for rapid information retrieval during the initial phases of study or for getting a quick overview of a topic. They are not yet a substitute for traditional, validated learning materials and expert human guidance in medical education.