AI Teaches Doctors Professionalism: A Pilot Study
AI in Medical Training: A New Frontier for Professionalism
Artificial intelligence (AI), especially large language models such as ChatGPT, is rapidly changing clinical care and medical education. A recent pilot study examined one application: whether a professionalism curriculum generated by ChatGPT 3.5 could improve internal medicine residents' understanding of professionalism in a U.S. residency program. The approach aims to use AI to help future physicians build essential ethical and professional competencies.
Designing an AI-Powered Ethics Curriculum
The investigation, conducted as a single-group, pre-post intervention pilot study from August 2024 to February 2025, was exempt from IRB review (E24149). It involved internal medicine residents across all postgraduate years (PGY-1 to PGY-3).
The core of the study was a three-week professionalism curriculum, integrated into the residents' regular Friday ambulatory didactics so that it fit within their existing schedule. Each week, residents completed a new module built around case scenarios generated entirely by ChatGPT 3.5. The scenarios were written to align with the domains of the Penn State Questionnaire on Professionalism (PSQP), an established instrument for assessing professionalism.
To check the quality and appropriateness of the AI-generated content, three experienced faculty members reviewed each case scenario for clinical and ethical relevance before it was presented to residents. Participants completed one module per week on the Qualtrics platform and received immediate feedback.
The primary outcome measure was the validated 36-item PSQP, which residents completed anonymously before the curriculum began and again after it ended. Researchers compared pre- and post-intervention scores using unpaired t-tests, adjusted for clustering based on baseline characteristics, and ran sensitivity analyses on log-transformed scores. Subgroup analyses used propensity score matching and cluster-adjusted logistic regression. Statistical significance was set at p < 0.05.
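The pre/post comparison described in the methods can be illustrated with a minimal, stdlib-only Python sketch of a two-sample (unpaired) Student's t-test with pooled variance, together with the log transform used for a sensitivity analysis. This is not the authors' analysis code. It assumes equal variances in the two groups (the summary does not say which t-test variant was used), it omits the clustering adjustment and the propensity score matching entirely, and the function names (`unpaired_t_test`, `two_sided_p`, `log_transform`) are hypothetical. The two-sided p-value comes from the t distribution via the regularized incomplete beta function, evaluated with the standard continued-fraction (Lentz) method.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """Two-sided p-value for a t statistic with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def unpaired_t_test(pre, post):
    """Hypothetical helper: Student's t-test (pooled variance), post minus pre."""
    n1, n2 = len(pre), len(post)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(pre) + (n2 - 1) * variance(post)) / df
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    t = (mean(post) - mean(pre)) / se
    return t, df, two_sided_p(t, df)


def log_transform(scores):
    """Natural-log transform for a sensitivity analysis; scores must be > 0."""
    if any(s <= 0 for s in scores):
        raise ValueError("log transform requires strictly positive scores")
    return [math.log(s) for s in scores]


if __name__ == "__main__":
    # Hypothetical item scores, illustrative only and NOT study data.
    pre_scores = [3, 4, 2, 3, 4, 3, 2, 4]
    post_scores = [4, 4, 3, 5, 4, 3, 4, 5]
    t, df, p = unpaired_t_test(pre_scores, post_scores)
    t_log, _, p_log = unpaired_t_test(log_transform(pre_scores),
                                      log_transform(post_scores))
    print(f"raw: t={t:.2f}, df={df}, p={p:.3f}; log: t={t_log:.2f}, p={p_log:.3f}")
    print("significant at 0.05:", p < 0.05)
```

An unpaired test treats the pre and post groups as independent samples, which fits anonymous surveys whose responses cannot be linked to individual residents; a paired design would instead test the within-person differences.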
Key Findings from the AI Curriculum Pilot
A total of 37 residents completed the pre-intervention survey, and 33 completed the post-intervention survey. Participants had a mean age of 28.9 years (standard deviation 3.4), and the gender split was nearly even, with 18 men and 19 women. A majority (59%) were non-U.S. citizens. Participants were evenly distributed across the PGY-1, PGY-2, and PGY-3 levels. Covariate balance was achieved after matching participants by age and sex.
The study observed improvements across all professionalism domains after the intervention, but these changes were not statistically significant for the cohort as a whole. The direction of change was nonetheless positive: on most PSQP items, a larger proportion of residents selected "much" or "great deal" after the curriculum (61–77%) than before (35–70%). At the level of individual items, several improvements did reach statistical significance, including understanding the importance of corrective action (p = 0.006), recognizing the value of attending seminars (p = 0.003), and commitment to upholding scientific standards.
A Gendered Lens: Significant Gains for Female Residents
One of the most striking findings emerged when results were analyzed by gender. Female residents showed statistically significant improvements in several professionalism domains:
- Duty (p = 0.004)
- Accountability (p = 0.037)
- Honor (p = 0.028)
- Altruism (p = 0.017)
Importantly, some of these positive effects in female residents persisted even after statistical matching for baseline characteristics. In contrast, the study did not find any statistically significant changes in these domains among male residents.
Implications and the Path Forward for AI in Medical Education
This pilot study is among the first to evaluate a ChatGPT 3.5-generated professionalism curriculum using the validated PSQP. Although overall changes in professionalism scores did not reach statistical significance for the full cohort, the significant gains in specific domains among female residents are compelling. They suggest an educational benefit and point to a potential need for gender-responsive instructional design in medical education.
The curriculum's low-cost and scalable format presents a promising model. It could serve as a valuable template for medical institutions, particularly those with limited resources, seeking to implement or enhance their professionalism training programs.
The researchers conclude that further investigation is warranted. Future studies should ideally be multi-institutional, use paired designs (comparing the same individuals before and after the intervention), and include long-term follow-up to establish whether the effects of such AI-driven educational tools last. This initial exploration opens the door to more comprehensive research on how AI can effectively and equitably support the development of professionalism in the next generation of physicians.