AI Tutors: Ophthalmology Student Study Aids Tested
AI in the Exam Room: Can ChatGPT and Bard Help Ophthalmology Students?
A recent study investigated how well two popular AI tools, ChatGPT 3.5 and Google Bard, could assist undergraduate medical students in preparing for ophthalmology short answer questions (SAQs). The core goal was to determine whether these AI models could serve as effective self-assessment resources.
How the AI Tutors Were Tested
To evaluate the two models, researchers randomly selected 261 short answer questions from past university examinations and publicly available ophthalmology question banks. To ensure relevance to current medical education standards in India, the questions were categorized according to the National Medical Commission's (NMC) competency-based medical education (CBME) curriculum, yielding three types of questions:
- Short note task-oriented questions (SNTO): 169 questions
- Short note reasoning questions (SNRQ): 15 questions
- Applied aspect SAQs (SN Applied): 77 questions
Notably, image-based questions were excluded from this study.
Before pitting the AI against the questions, a team of three ophthalmologists collaborated to create ideal model answers for every single question. These expert-crafted answers served as the benchmark. The same 261 questions were then fed to both ChatGPT 3.5 and Google Bard.
To gauge the quality of the AI-generated responses, the same three ophthalmologists independently evaluated each answer. They used a 3-point scoring system focusing on:
- Correct diagnosis (if applicable)
- Accuracy of content
- Relevance of the information provided
The scores from the three evaluators were compiled, and the data underwent statistical analysis to compare the overall and category-specific performance of ChatGPT 3.5 and Bard.
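The study's raw score data and analysis code are not reproduced here, but for readers curious how such a comparison might look in practice, below is a minimal sketch in Python. The per-question scores are simulated placeholders, and the Wilcoxon signed-rank test (via scipy) is an assumed, common choice for paired ordinal scores; the study's actual statistical method may differ.

```python
# Minimal sketch: per-question scores are simulated, not the study's data, and
# the Wilcoxon signed-rank test is an assumed (common) choice for paired scores.
import random

from scipy.stats import wilcoxon

random.seed(42)
N_QUESTIONS = 261

# Hypothetical compiled scores (0-3 points) for each model on the same questions.
chatgpt_scores = [random.choice([0, 1, 2, 3, 3, 3]) for _ in range(N_QUESTIONS)]
bard_scores = [random.choice([0, 1, 2, 3, 3, 3]) for _ in range(N_QUESTIONS)]

# Paired comparison across the same 261 questions (overall or per CBME category).
statistic, p_value = wilcoxon(chatgpt_scores, bard_scores)
print(f"Wilcoxon signed-rank: statistic={statistic:.1f}, p-value={p_value:.3f}")
```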
AI Scorecard: ChatGPT vs Bard
Out of a total possible score of 783 (which is 261 questions multiplied by 3 points per question), ChatGPT 3.5 achieved a score of 696, translating to an accuracy of 88.8%. Google Bard was close behind, scoring 685, or 87.5%.
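As a quick check on the arithmetic, the percentages follow directly from the raw scores (the small gap from the published 88.8% appears to be a rounding difference):

```python
# Reported raw scores converted to accuracy percentages (261 questions x 3 points).
max_score = 261 * 3          # 783 total possible points

print(f"ChatGPT 3.5: {696 / max_score:.1%}")  # ~88.9%, reported as 88.8%
print(f"Bard:        {685 / max_score:.1%}")  # 87.5%
```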
While the overall difference in performance between the two AI tools was not statistically significant, ChatGPT 3.5 demonstrated a significantly better performance in the 'short note task-oriented' (SNTO) category. This suggests ChatGPT might be more adept at handling questions that require specific, factual recall or procedural descriptions.
The Fine Print: Limitations and Errors
Despite the generally high scores, the study highlighted a critical concern: both AI models produced poor-quality or inadequate answers for a considerable number of questions. Specifically:
- ChatGPT 3.5 provided subpar answers for 50 questions, an error rate of 19%.
- Bard struggled with 44 questions, resulting in an error rate of 16.8%.
In some instances, the AI-generated responses were found to be lacking essential information, even for topics considered high-yield or fundamental in ophthalmology. This indicates that relying solely on AI for self-assessment could lead to gaps in student understanding.
The Verdict: Using AI Wisely in Medical Studies
The study concludes that both ChatGPT 3.5 and Bard are capable of generating largely accurate and relevant responses to ophthalmology short answer questions. ChatGPT 3.5 showed a slight edge, especially with task-oriented questions, hinting it might be a more effective self-assessment aid for undergraduate students in this context.
However, the researchers stress a crucial caveat: with roughly 17-19% of answers judged inadequate, AI-generated responses should not be used in isolation. Students must cross-reference information from AI tools against standard textbooks and verified academic resources.
These AI tools are likely best suited for rapid information retrieval during the initial phases of study or for getting a quick overview of a topic. They are not yet a substitute for traditional, validated learning materials and expert human guidance in medical education.