AI in the OR: How ChatGPT Aids Surgical Students

2025-11-12•Cauley, Ryan P.•3 minutes read

Medical Education

Artificial Intelligence

Surgical Training

The High-Stakes Challenge for Aspiring Surgeons

For medical students aiming for a career in the competitive field of plastic and reconstructive surgery (PRS), subinternships are a crucial step. Being well-prepared for these rotations is essential, but there isn't a standard, one-size-fits-all method for students to get ready for complex surgical cases. Recently, large language models (LLMs) like ChatGPT have shown great potential in medical education, but their reliability for the specific demands of surgery has been an open question. A new study aimed to find out just how useful ChatGPT can be for students on their PRS rotations by testing its responses to common surgical questions.

Putting AI to the Test: The Study's Method

Researchers created a dataset of 267 questions covering a wide range of PRS subtopics. To make the test comprehensive, they selected three different procedures from each subtopic and asked 12 identical questions about each one. To give the AI context, every prompt started with the phrase, "I am a medical student preparing for my PRS rotation." Experts then scored the responses from GPT-4o on a five-point scale for accuracy, completeness, usefulness, relevance, and overall quality. When a response scored poorly, the team re-queried it using a newer model, OpenAI o1, to see whether performance had improved.
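The prompt design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual tooling: the function name `build_prompts`, the subtopic, the procedure names, and the question templates are all hypothetical placeholders. Only the context prefix is taken from the article.

```python
from itertools import product

# Context prefix quoted in the article; everything else here is a hypothetical placeholder.
CONTEXT_PREFIX = "I am a medical student preparing for my PRS rotation."


def build_prompts(procedures_by_subtopic, question_templates):
    """Pair every procedure with every question template and prepend the context prefix."""
    prompts = []
    for subtopic, procedures in procedures_by_subtopic.items():
        for procedure, template in product(procedures, question_templates):
            question = template.format(procedure=procedure)
            prompts.append({
                "subtopic": subtopic,
                "procedure": procedure,
                "prompt": f"{CONTEXT_PREFIX} {question}",
            })
    return prompts


# Hypothetical inputs for illustration only (not taken from the study).
example_procedures = {"example subtopic": ["procedure A", "procedure B", "procedure C"]}
example_templates = [
    "What are the indications for {procedure}?",
    "What reading material do you recommend on {procedure}?",
]

prompts = build_prompts(example_procedures, example_templates)
print(len(prompts))
print(prompts[0]["prompt"])
```

Asking the same templates about every procedure keeps the question wording constant, so differences in scores can be attributed to the procedure or subtopic rather than to how the question was phrased.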

The Verdict: ChatGPT's Performance Under the Microscope

The results were a mixed bag, revealing both the strengths and weaknesses of the AI. Right away, 21 responses had to be thrown out: when asked for reading material, the model cited fabricated, non-existent scientific articles. These responses scored an average of just 1.17. For the remaining responses, the average scores on the five-point scale were generally strong:

Accuracy: 4.12
Completeness: 3.88
Usefulness: 3.96
Relevance: 4.19
Overall Quality: 4.00

The AI performed best on questions about lymphatics but struggled most with the head and neck subtopic. Notably, it was much better at answering general educational questions than it was at detailing specific surgical procedures. When the low-scoring answers were re-queried with the newer o1 model, the scores improved significantly.
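The scoring workflow can be illustrated with a small Python sketch: drop responses that cite fabricated references, average each criterion over the rest, and flag low scorers for a re-query. The data below are invented for illustration and are not the study's scores. The record layout, the `summarize` function, and the `REQUERY_THRESHOLD` cutoff are assumptions, since the article does not say what counted as a low score.

```python
from statistics import mean

CRITERIA = ("accuracy", "completeness", "usefulness", "relevance", "overall")

# Hypothetical cutoff: the article does not state what counted as a low score.
REQUERY_THRESHOLD = 3.0


def summarize(responses):
    """Return per-criterion means over valid responses and the ids flagged for re-query."""
    # Responses that cited fabricated references are excluded before averaging.
    valid = [r for r in responses if not r["fabricated_refs"]]
    means = {c: round(mean(r["scores"][c] for r in valid), 2) for c in CRITERIA}
    requery = [r["id"] for r in valid if r["scores"]["overall"] < REQUERY_THRESHOLD]
    return means, requery


# Invented example records (not data from the study).
responses = [
    {"id": 1, "fabricated_refs": False,
     "scores": {"accuracy": 5, "completeness": 4, "usefulness": 4, "relevance": 5, "overall": 4}},
    {"id": 2, "fabricated_refs": True,
     "scores": {"accuracy": 1, "completeness": 1, "usefulness": 1, "relevance": 1, "overall": 1}},
    {"id": 3, "fabricated_refs": False,
     "scores": {"accuracy": 3, "completeness": 2, "usefulness": 2, "relevance": 3, "overall": 2}},
]

means, requery = summarize(responses)
print(means)
print(requery)
```

Separating the fabricated-reference responses before averaging mirrors how the article reports them: as their own group with a separate average, not folded into the main scores.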

Key Takeaways and the Future of AI in Surgery

The study concluded that ChatGPT is most effective at providing educational tools and background information, particularly in well-defined areas like lymphatics. However, it has clear limitations, especially when it comes to generating reliable literature and answering detailed procedural questions. The significant improvement seen with the newer o1 version is a promising sign. It suggests that as these models continue to be refined, LLMs will likely become increasingly valuable and reliable tools in surgical education.
