
AI Models Compete In Eye Infection Diagnosis

2025-07-17 · Muhammad Hasnain, Khursheed Aurangzeb, Musaed Alhussein, Imran Ghani, Muhammad Hamza Mahmood · 3-minute read
Artificial Intelligence
Healthcare
Medical Research

The AI Models Under the Microscope

A recent study explored the capabilities of popular large language models (LLMs) in the medical field, specifically focusing on their ability to assist with information related to conjunctivitis. The research analyzed responses from ChatGPT and Deepseek to clinical questions, and also tested the image analysis skills of ChatGPT, Claude, and Deepseek to assess their potential diagnostic power. Using advanced prompt engineering techniques, the researchers evaluated how well these models could generate high-quality, accurate information about the common eye infection.

Evaluating Text-Based Diagnostic Assistance

When it came to providing detailed information on conjunctivitis, the models showed distinct strengths. The findings revealed that Deepseek excelled at delivering precise and comprehensive information on specific topics. It provided in-depth medical insights and detailed explanations, making it a potentially powerful tool for medical professionals seeking specialized knowledge.

In contrast, the ChatGPT model offered more generalized public information about the infection. While this makes it suitable for broader and less technical discussions, it lacked the clinical depth provided by Deepseek. The study also measured the "hallucination rate," or the frequency of generating incorrect information. Deepseek performed better with a 7% hallucination rate, compared to ChatGPT's 13%.
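To make the metric concrete, a hallucination rate of this kind is simply the share of graded responses flagged as containing incorrect information. A minimal sketch (illustrative only; the study's exact grading protocol is not described here, and the example numbers are made up):

```python
def hallucination_rate(flags):
    """Fraction of responses flagged as containing incorrect information.

    `flags` is a list of booleans: True means a grader judged the
    response to contain a hallucination.
    """
    if not flags:
        raise ValueError("no responses to evaluate")
    return sum(flags) / len(flags)

# Hypothetical run: 2 hallucinations out of 20 graded responses
rate = hallucination_rate([True, True] + [False] * 18)
print(f"{rate:.0%}")  # → 10%
```

By this measure, Deepseek's 7% rate means roughly 7 of every 100 graded responses contained an error, versus 13 of 100 for ChatGPT.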

The Verdict on Image-Based Diagnosis

The study also explored how these AI models handled visual data by asking them to analyze a dataset of images related to conjunctivitis. The results from this portion of the research were starkly different. Claude demonstrated perfect, 100% accuracy in its binary classification task, showing an exceptional ability to interpret medical images correctly.

This performance significantly outperformed ChatGPT, which achieved only 62.5% accuracy in the same task. Deepseek, despite its strong performance with text, showed limited capabilities in understanding the image dataset.
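For context, the accuracy figures reported here are the proportion of images each model labeled correctly in the binary task. A minimal sketch with fabricated labels (not the study's data), chosen so the result happens to match ChatGPT's 62.5%:

```python
def accuracy(predictions, labels):
    """Proportion of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("prediction/label count mismatch")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical run: 5 of 8 images classified correctly
preds = [1, 1, 0, 0, 1, 0, 1, 1]
truth = [1, 0, 0, 1, 1, 0, 1, 0]
print(f"{accuracy(preds, truth):.1%}")  # → 62.5%
```

Claude's perfect score corresponds to every prediction matching its label, i.e. an accuracy of 1.0.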

Key Takeaways for Medical Professionals

This comparative analysis highlights the varying strengths and weaknesses of different AI models in a medical context. Deepseek proves to be a strong contender for text-based medical research, while Claude shows remarkable potential for image-based diagnostics. The study serves as an insightful guide for scholars and health professionals, helping them choose the right AI tool for their specific needs and reminding them of the importance of verifying AI-generated information.

Study Details and Citation

This research is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution, or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and the original publication in this journal is cited.

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers.
