
DeepSeek Outperforms ChatGPT On Myopia Health Questions

2025-11-11 · Liu, Wen · 5 minute read
Artificial Intelligence
Healthcare
Ophthalmology

The Rise of AI in Healthcare

In an age where many turn to the internet for health advice before consulting a doctor, artificial intelligence (AI) chatbots are becoming the new frontier for medical information. These large language models (LLMs), like the well-known ChatGPT, can provide instant, human-like answers to complex questions. This accessibility is particularly relevant for common conditions like pediatric myopia, or nearsightedness, which has reached epidemic levels globally and is a major concern for parents seeking reliable guidance.

However, the accuracy of these AI tools is a critical concern. To that end, a new study delved into the performance of two leading AI chatbots: OpenAI's ChatGPT-4o and DeepSeek, a model developed within the Chinese AI ecosystem. The research aimed to determine how well these models could answer real-world questions about childhood myopia, offering valuable insights into their current capabilities and limitations as a source for health information.

How the AIs Were Tested

To create a fair and clinically relevant comparison, researchers designed a rigorous evaluation process. Here’s how they did it:

  • The Questions: A team of experienced ophthalmologists compiled a list of 30 frequently asked questions about childhood myopia, reflecting the common concerns of parents.
  • The Categories: These questions were sorted into six key domains: the causes (pathogenesis), symptoms (clinical manifestations), diagnosis, prevention, treatment, and long-term outlook (prognosis).
  • The Test: Each of the 30 questions was posed to both ChatGPT-4o and DeepSeek.
  • The Judges: A panel of three senior pediatric ophthalmologists independently graded every response. To eliminate bias, the answers were anonymized, so the experts didn't know which AI provided them.
  • The Scorecard: Responses were rated for accuracy on a three-point scale: "Good" (fully accurate and comprehensive), "Fair" (factually correct but missing some details), or "Poor" (containing factual errors that could be misleading).
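The study summary does not say exactly how the three experts' independent grades were combined per response, so as a rough illustration only, here is a minimal sketch of a blinded-grading tally. The model names, the sample ratings, and the majority-vote aggregation rule (falling back to the middle grade on a three-way tie) are all assumptions, not details from the paper:

```python
from collections import Counter

# Hypothetical ratings: three experts grade each anonymized response on the
# study's scale of "Good" / "Fair" / "Poor". Data below is made up.
ratings = {
    "model_A": [("Good", "Good", "Fair"), ("Poor", "Fair", "Poor")],
    "model_B": [("Good", "Good", "Good"), ("Fair", "Fair", "Good")],
}

def consensus(grades):
    """Majority grade across the three experts (an assumed tie-break rule)."""
    grade, n = Counter(grades).most_common(1)[0]
    return grade if n >= 2 else "Fair"  # middle grade on a three-way split

# Per-model counts of consensus grades
summary = {
    model: Counter(consensus(g) for g in responses)
    for model, responses in ratings.items()
}
```

Because the answers are anonymized before grading, the `ratings` keys would only be mapped back to the actual chatbots after scoring is complete.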

The Verdict: A Clear Winner Emerges

When the scores were tallied, the results showed a significant difference in performance between the two AI models.

Overall Accuracy

DeepSeek was the clear winner. It received a "Good" rating on 23 out of 30 questions (76.7%), demonstrating strong reliability across most topics. In contrast, ChatGPT-4o earned a "Good" rating on only 13 questions (43.3%).
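As a quick arithmetic check on those reported counts, a "Good" versus "not Good" 2×2 table can be tested for a difference in proportions. This is a sketch only: the `fisher_exact_two_sided` helper below is illustrative, and the original paper may have used a different statistical test.

```python
from math import comb

# Reported "Good" ratings out of 30 questions each
deepseek_good, chatgpt_good = 23, 13

assert round(deepseek_good / 30 * 100, 1) == 76.7
assert round(chatgpt_good / 30 * 100, 1) == 43.3

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test on the 2x2 table [[a, b], [c, d]]."""
    n, r1, c1 = a + b + c + d, a + b, a + c

    def prob(x):
        # Hypergeometric probability of x successes in the first row
        return comb(r1, x) * comb(n - r1, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    # Sum probabilities of all tables at least as extreme as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + 1e-12)

# "Good" vs. not "Good": DeepSeek (23 / 7) against ChatGPT-4o (13 / 17)
p_value = fisher_exact_two_sided(23, 7, 13, 17)
```

On these counts the two-sided p-value falls below the conventional 0.05 threshold, consistent with the study's description of a significant performance gap.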

A Shared Blind Spot: Treatment Information

Both models struggled significantly in the treatment domain. Their performance dropped when asked about specific, commercially available myopia control products. For example, both chatbots gave incorrect or incomplete answers about the brands of defocus-incorporated spectacle lenses (DISL), a modern treatment for slowing myopia progression. This suggests their training data lags behind the rapidly evolving, market-specific landscape of medical treatments.

This gap was also evident in questions about low-dose atropine, a common treatment. The models failed to capture crucial region-specific differences in available concentrations, highlighting a major limitation for users seeking localized health information.

Comprehensiveness

On a positive note, when the chatbots provided an accurate answer, it was generally also comprehensive. This indicates that when the underlying information is correct, the models are capable of delivering detailed and useful explanations for a general audience. However, the study also found that the human experts often disagreed with each other's scores, pointing to the inherent subjectivity in evaluating the quality of AI-generated medical advice.
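This summary does not say which agreement statistic the study used, but Fleiss' kappa is one standard way to quantify how much several raters agree beyond chance when grading items on a fixed scale. A minimal sketch, with entirely made-up rating counts:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for N items, each rated by n raters into k categories.

    `ratings` is a list of per-item vote counts, e.g. [good, fair, poor],
    with each item's counts summing to the number of raters n.
    """
    N = len(ratings)
    n = sum(ratings[0])
    k = len(ratings[0])
    # Overall proportion of votes falling in each category
    p_j = [sum(item[j] for item in ratings) / (N * n) for j in range(k)]
    # Observed pairwise agreement on each item
    P_i = [(sum(c * c for c in item) - n) / (n * (n - 1)) for item in ratings]
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical: 4 responses, 3 raters each, (Good, Fair, Poor) vote counts
toy = [[3, 0, 0], [2, 1, 0], [1, 2, 0], [0, 1, 2]]
kappa = fleiss_kappa(toy)  # ranges from -1 (worse than chance) to 1 (perfect)
```

A kappa near zero on real grading data would reflect exactly the kind of expert disagreement the study reports.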

What This Means for Patients and Parents

The study reveals that while advanced AI chatbots can be valuable tools for general education on health topics like myopia, they are not yet a substitute for professional medical advice. DeepSeek's superior performance is noteworthy, suggesting that newer, possibly more localized, LLMs can be highly competitive and even outperform established global models.

The key takeaway is that users should be cautious, especially when seeking information on treatments. The AIs' knowledge base is not always up to date with the latest clinical guidelines, product availability, or regional healthcare practices. This lag can leave users with outdated or irrelevant information, which could affect health decisions.

The Future of AI in Eye Care

This comparative analysis underscores both the immense potential and the current pitfalls of using AI for health communication. Both ChatGPT-4o and DeepSeek can deliver useful, accurate information on the basics of myopia, but their reliability wanes when it comes to specific and current treatment options.

To become truly dependable resources, these AI systems require continuous refinement. This includes regular updates to their knowledge bases with the latest research, localization of content to reflect regional standards of care, and domain-specific fine-tuning to improve precision. With rigorous oversight and ongoing development, AI chatbots promise to become powerful allies in promoting health literacy and supporting patients and their families in making informed decisions.


About this article

This blog post is a repurposed summary of a research article published in BMC Ophthalmology. To cite the original work:

Yao, J., Hsin, S.C., Li, L. et al. Benchmark analysis of myopia-related issues using large language models: a comparison of ChatGPT-4o and deepseek. BMC Ophthalmol 25, 632 (2025). https://doi.org/10.1186/s12886-025-04328-w
